Face-to-face interactions are part of everyday life, ranging from family life to teamwork and global communities. Social psychologists have long studied these interactions with the aim of understanding behavior, motivations, and the emergence of interaction patterns. An organization is an environment rich in daily interactions, including structured periodic meetings, planning, brainstorming, negotiations, decision-making, and informal gatherings, and leaders play a key role in many of them. Leaders face problems, propose solutions, make decisions, and are often the main source of inspiration for employees. Identifying emergent leaders at early stages in organizations is a key issue in organizational behavior research, and a new problem in social computing. The study of this phenomenon requires the sensing of natural face-to-face interactions, the automatic extraction of behavioral cues, and reliable machine learning algorithms to identify emergent leaders.

In this thesis we present a computational approach to analyzing the emergence of leadership in small groups using multimodal audio and visual features. In this computational framework, we first present an analysis of how an emergent leader is perceived in newly formed, small groups. We present the ELEA (Emergent LEadership Analysis) corpus, collected with the aim of analyzing the emergence of leaders. We propose to analyze emergent leaders using a variety of nonverbal cues studied in social psychology and automatically extracted from audio and video streams. Our analysis addresses how the emergent leader is perceived by his/her peers in terms of speaking and visual activity, and how this perception relates to that of the most dominant person (including external observers' perceptions). We then propose to investigate which individual nonverbal channel (or combination of features from different channels) provides better inference of the emergent leader and related concepts, using unsupervised and supervised methods.
We use a supervised collective approach that adds relational information to the nonverbal cues, and compare its performance with that of supervised (non-collective) and unsupervised methods. We also propose to capture social visual attention patterns from features automatically extracted from video, in order to analyze who receives or gives the largest amount of visual attention in the group. Then, with the aim of understanding who receives the largest amount of visual attention while speaking and who has the highest visual dominance ratio (i.e., many occurrences of looking at others while speaking and few occurrences of looking at others while not speaking), we synchronize the audio and video streams to capture the speaking and attention activity patterns. We conclude our analysis by exploring the impact of verbal content (language style) in the interactions and its influence on the perception of emergent leaders. For the language style analysis, we propose to compute word categories extracted from manual transcriptions.
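As an illustrative sketch only (not the actual feature-extraction pipeline used in this thesis), the visual dominance ratio described above can be computed from two synchronized, frame-level binary streams: one indicating whether the participant is speaking, and one indicating whether the participant is looking at another group member. The function name and the toy data below are hypothetical.

```python
def visual_dominance_ratio(speaking, looking_at_others):
    """Ratio of time spent looking at others while speaking
    to time spent looking at others while not speaking.
    Both inputs are equal-length binary (0/1) sequences,
    assumed to be synchronized frame by frame."""
    assert len(speaking) == len(looking_at_others)
    look_while_speaking = sum(
        1 for s, l in zip(speaking, looking_at_others) if s and l
    )
    look_while_silent = sum(
        1 for s, l in zip(speaking, looking_at_others) if not s and l
    )
    if look_while_silent == 0:
        # Degenerate case: never looks at others while silent.
        return float("inf") if look_while_speaking else 0.0
    return look_while_speaking / look_while_silent

# Toy example: 10 synchronized frames for one participant.
speaking          = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
looking_at_others = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]

ratio = visual_dominance_ratio(speaking, looking_at_others)
# 5 frames of looking-while-speaking vs. 1 frame of looking-while-silent
```

A ratio above 1 indicates the participant looks at others mostly while holding the floor, the pattern associated with dominance in the social psychology literature referenced above.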