Object detection is a computer technology related to computer vision and that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including and video surveillance.
It is widely used in computer vision tasks such as , vehicle counting, activity recognition, face detection, face recognition, video object co-segmentation. It is also used in tracking objects, for example tracking a ball during a football match, tracking movement of a cricket bat, or tracking a person in a video.
Often, the test images are sampled from a different data distribution, making the object detection task significantly more difficult. To address the challenges caused by the domain gap between training and test data, many unsupervised domain adaptation approaches have been proposed. A simple and straightforward solution of reducing the domain gap is to apply an image-to-image translation approach, such as cycle-GAN. Among other uses, cross-domain object detection is applied in autonomous driving, where models can be trained on a vast amount of video game scenes, since the labels can be generated without manual labor.
Every object class has its own special features that help in classifying the class – for example all circles are round.
Object class detection uses these special features. For example, when looking for circles, objects that are at a particular distance from a point (i.e. the center) are sought. Similarly, when looking for squares, objects that are perpendicular at corners and have equal side lengths are needed. A similar approach is used for face identification where eyes, nose, and lips can be found and features like skin color and distance between eyes can be found.
Methods for object detection generally fall into either neural network-based or non-neural approaches.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
These lab sessions are designed as a practical companion to the course "Image Analysis and Pattern Recognition" by Prof. Thiran. The main objective is to learn how to use some important image processi
This course gives an introduction to the fundamental concepts and methods of the Digital Humanities, both from a theoretical and applied point of view. The course introduces the Digital Humanities cir
The course will cover different aspects of multimodal processing (complementarity vs redundancy; alignment and synchrony; fusion), with an emphasis on the analysis of people, behaviors and interaction
Le cours suivi propose une initiation aux concepts de base de la programmation impérative tels que : variables, expressions, structures de contrôle, fonctions/méthodes, en les illustrant dans la synta
Le cours suivi propose une introduction aux concepts de base de la programmation orientée objet tels que : encapsulation et abstraction, classes/objets, attributs/méthodes, héritage, polymorphisme, ..
Ce cours initie à la programmation en utilisant le langage C++. Il ne présuppose pas de connaissance préalable. Les aspects plus avancés (programmation orientée objet) sont donnés dans un cours suivan
Convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns feature engineering by itself via filters (or kernel) optimization. Vanishing gradients and exploding gradients, seen during backpropagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully-connected layer 10,000 weights would be required for processing an image sized 100 × 100 pixels.
In computer vision, object co-segmentation is a special case of , which is defined as jointly segmenting semantically similar objects in multiple images or video frames. It is often challenging to extract segmentation masks of a target/object from a noisy collection of images or video frames, which involves object discovery coupled with . A noisy collection implies that the object/target is present sporadically in a set of images or the object/target disappears intermittently throughout the video of interest.
Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology.
Turning pass-through network architectures into iterative ones, which use their own output as input, is a well-known approach for boosting performance. In this paper, we argue that such architectures offer an additional benefit: The convergence rate of the ...
2024
Deep learning has revolutionized the field of computer vision, a success largely attributable to the growing size of models, datasets, and computational power.Simultaneously, a critical pain point arises as several computer vision applications are deployed ...
This report presents a study on the development and application of a Region-based Convolutional Neural Network, Faster RCNN and a more complex one, TransVOD, to locate solar coronal jets using data from the Solar Dynamic Observatory (SDO). The study focus ...