Visual behavior recognition is currently a highly active research area. This is due both to the scientific challenge posed by the complexity of the task and to the growing interest in its applications, such as automated visual surveillance, human-computer interaction, medical diagnosis, and video indexing and retrieval. Many approaches have been developed, whose complexity and underlying models depend on the goals of the targeted application. Most of these approaches separate the behavior recognition task into two sequential processes. The first is feature extraction, in which features considered relevant for recognition are extracted from the input image sequence. The second is the recognition itself, in which the extracted features are classified into pre-defined behavior classes. A drawback of such a two-pass procedure is that the recognition process depends heavily on the feature extraction process but cannot influence it. Consequently, a failure of feature extraction may impair correct recognition. This thesis focuses on the recognition of single-object behavior from monocular image sequences. We propose a general framework in which feature extraction and behavior recognition are performed jointly, allowing the two tasks to improve each other's results through collaboration and sharing of existing knowledge. This collaboration is achieved by introducing a probabilistic temporal model based on a Hidden Markov Model (HMM). In our formulation, behavior is decomposed into a sequence of simple actions, and each action is associated with a different probability of observing a particular set of object attributes within the image at a given time. Moreover, our model includes a probabilistic formulation of attribute (feature) extraction in terms of image segmentation.
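As an illustration of the temporal model described above, the following minimal sketch treats actions as hidden HMM states and recovers the most likely action sequence with the standard Viterbi recursion. It is not the adapted algorithm developed in this thesis: the per-frame attribute likelihoods are assumed to be given, rather than obtained jointly through segmentation, and the function names, the two-action setup, and all probability values are hypothetical.

```python
import math


def to_log(rows):
    """Convert a table of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


def viterbi(log_init, log_trans, log_obs):
    """Return the most likely sequence of action indices, one per frame.

    log_init[k]     : log prior of starting in action k
    log_trans[j][k] : log probability of moving from action j to action k
    log_obs[t][k]   : log likelihood of the attributes observed in frame t
                      under action k (assumed to be precomputed here)
    """
    n_actions = len(log_init)
    # score[k]: best log probability of any path ending in action k
    # at the current frame.
    score = [log_init[k] + log_obs[0][k] for k in range(n_actions)]
    backptr = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for k in range(n_actions):
            # Best predecessor action for ending in action k at frame t.
            best_j = max(range(n_actions),
                         key=lambda j: score[j] + log_trans[j][k])
            ptr.append(best_j)
            new_score.append(score[best_j] + log_trans[best_j][k] + log_obs[t][k])
        score = new_score
        backptr.append(ptr)
    # Backtrack from the best final action to the first frame.
    path = [max(range(n_actions), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical toy problem: two actions, four frames.
init = [math.log(0.5), math.log(0.5)]
trans = to_log([[0.9, 0.1],
                [0.1, 0.9]])
obs = to_log([[0.8, 0.2],
              [0.7, 0.3],
              [0.2, 0.8],
              [0.1, 0.9]])
actions = viterbi(init, trans, obs)
print(actions)
```

In this toy setting the decoded path assigns one action label to each frame, so the behavior is represented as a succession of action classes. Working in log space avoids numerical underflow on long sequences.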
Unlike existing approaches, segmentation takes into account the relative probabilities of each action, which are provided by the underlying HMM. In this context, we solve the joint problem of attribute extraction and behavior recognition by developing a variant of the Viterbi decoding algorithm adapted to our model. Within the derivation of the algorithm, we translate the probabilistic attribute extraction formulation into a variational segmentation model. The proposed model combines typical image- and contour-dependent energy terms with a term that encapsulates prior information supplied by the collaborating recognition process. This prior information is introduced through a competition between multiple prior terms, corresponding to the different action classes that may have generated the current image. As a result of our algorithm, the recognized behavior is represented as a succession of action classes corresponding to the images in the given seq