Publication

Improving Object Detection under Domain Shifts

Vidit Vidit
2023
EPFL thesis
Abstract

Object detection plays a critical role in various computer vision applications, encompassing domains like autonomous vehicles, object tracking, and scene understanding. These applications rely on detectors that generate bounding boxes around known object categories, and the outputs of these detectors are subsequently utilized by downstream systems. In practice, supervised training is the predominant approach for training object detectors, wherein labeled data is used to train the models.

However, the effectiveness of these detectors in real-world scenarios hinges on the extent to which the training data distribution can adequately represent all potential test scenarios. In many cases, this assumption does not hold. For instance, a model is typically trained under a single environmental condition, but at test time it can encounter much more diverse conditions. Such discrepancies often occur because acquiring training data that covers diverse environmental conditions can be challenging. This disparity between the training and test distributions, commonly referred to as domain shift, deteriorates the detector's performance.

In the literature, various methods have been employed to mitigate the domain shift issue. One approach involves unsupervised domain adaptation techniques, where the model is adapted to perform well on the target domain by leveraging unlabeled images from that domain. Another avenue of research is domain generalization, which aims to train models that can generalize effectively across multiple target domains without direct access to data from those domains.

In this thesis, we propose unsupervised domain adaptation and domain generalization methods to alleviate domain shift. First, we introduce an attention-based module to obtain local object regions in single-stage detectors. Here we show the efficacy of a gradual transition from global image feature adaptation to local region adaptation. While this work mainly focuses on appearance shifts due to illumination or weather changes, in our second work we show that the gap introduced by differences in the camera setup and parameters is non-negligible as well. Hence, we propose a method to learn a set of homographies that allow us to learn robust features to bring two domains closer under such shifts. Both of these works have access to unlabeled data in the target domain, but sometimes even unlabeled data is scarce. To tackle this, in our third work, we propose a domain generalization method that leverages aligned image and text feature embeddings. We estimate the visual features of the target domain based on a textual prompt describing the domain.

