Publication

Learning and leveraging shared domain semantics to counteract visual domain shifts

Róger Bermúdez Chacón
2020
EPFL thesis
Abstract

One of the main limitations of artificial intelligence today is its inability to adapt to unforeseen circumstances. Machine Learning (ML), due to its data-driven nature, is particularly susceptible to this. ML relies on observations to learn implicit rules about the inputs, the outcomes, and the relationships among them, so as to solve a task. An unfortunate consequence of learning from observations, however, is that ML algorithms only ever see a partial, inevitably skewed version of the world. As a result, ML methods struggle to decide how to properly use their learned experience when the world as they know it changes. Domain adaptation, the paradigm followed in this thesis, is an area of ML research that tackles this problem: in the domain adaptation setup, an ML model must be adapted when domain shifts, that is, changes in the nature of the data, occur. This thesis addresses the domain adaptation problem in the particular context of visual applications, with the ultimate goal of reducing as much as possible the need for human intervention when training ML methods for visual tasks. We study the domain adaptation problem on two fronts.

A first idea is to harness the existing structure of the images. Despite visual differences, structural information related to the semantic content of an image is often preserved across images from different origins. We present a method based on Multiple Instance Learning that leverages such visual correspondences, even when the match is imperfect, to adjust the parameters of an ML model to new domains. We also introduce a self-supervised ML method that aggregates visual correspondences into a consensus heatmap. Because such heatmaps are good unsupervised proxies for real annotations, we use them as a supervisory signal. In addition, we propose a Two-Stream U-Net architecture that processes different domains simultaneously. The Two-Stream U-Net combines parameter regularization, distribution matching, and the self-supervised signal from the consensus heatmaps to bridge the performance gap between models operating on different image domains.

The second line of reasoning looks beyond the raw image information and instead maps images to compact latent representations that preserve image semantics. For this, we introduce multiflow networks: a neural architecture search paradigm that assigns different network capacity to different image domains in order to extract domain-agnostic latent representations. In the multiflow formalism, domain-specific learnable gates modulate the contribution of different operations to the encoding. The end result is latent encodings from different domains that do not suffer from the domain shift. Our results on biomedical image segmentation, object classification, and object detection validate the broad applicability of the methods introduced in this thesis.
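To make the Two-Stream U-Net idea above more concrete, the following is a minimal sketch of how such a combined objective could be written in PyTorch. The stream interface (each stream returning logits and intermediate features), the choice of binary cross-entropy, the first-moment matching term, and the loss weights are all assumptions made for illustration, not the thesis's exact formulation.

```python
import torch.nn.functional as F


def two_stream_loss(src_stream, tgt_stream, src_batch, tgt_batch,
                    consensus_heatmaps, lambda_w=1e-3, lambda_d=1e-1):
    """Illustrative combined objective for a two-stream segmentation setup.

    Assumes each stream maps a batch of images to (logits, features);
    this interface and the loss weights are hypothetical.
    """
    src_images, src_masks = src_batch          # labelled source domain
    tgt_images = tgt_batch                     # unlabelled target domain

    # Supervised segmentation loss on the source domain.
    src_logits, src_feats = src_stream(src_images)
    loss_sup = F.binary_cross_entropy_with_logits(src_logits, src_masks)

    # Self-supervised loss on the target domain: consensus heatmaps
    # act as an unsupervised proxy for real annotations.
    tgt_logits, tgt_feats = tgt_stream(tgt_images)
    loss_self = F.binary_cross_entropy_with_logits(tgt_logits, consensus_heatmaps)

    # Parameter regularization: keep corresponding weights of the two
    # streams close to each other.
    loss_reg = sum(F.mse_loss(p_s, p_t) for p_s, p_t in
                   zip(src_stream.parameters(), tgt_stream.parameters()))

    # Distribution matching: here, a simple first-moment alignment of
    # intermediate feature statistics across domains.
    loss_match = F.mse_loss(src_feats.mean(dim=0), tgt_feats.mean(dim=0))

    return loss_sup + loss_self + lambda_w * loss_reg + lambda_d * loss_match
```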

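Similarly, a minimal sketch of a gated multiflow building block, under the assumption that a block holds a few shared candidate operations and one learnable gate vector per domain; the class name, the candidate operations, and the sigmoid gating are illustrative choices rather than the thesis's actual design.

```python
import torch
import torch.nn as nn


class MultiflowBlock(nn.Module):
    """Illustrative multiflow block: shared operations, per-domain gates."""

    def __init__(self, channels, num_domains):
        super().__init__()
        # Candidate operations shared by all domains.
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.Conv2d(channels, channels, kernel_size=1),
        ])
        # One learnable gate per (domain, operation) pair.
        self.gate_logits = nn.Parameter(torch.zeros(num_domains, len(self.ops)))

    def forward(self, x, domain):
        # Domain-specific gates in [0, 1] modulate each operation's
        # contribution to the encoding.
        gates = torch.sigmoid(self.gate_logits[domain])
        return sum(g * op(x) for g, op in zip(gates, self.ops))
```

A domain-agnostic encoder could stack such blocks and route each image through them with its domain index, e.g. `block = MultiflowBlock(channels=64, num_domains=2)` followed by `y = block(x, domain=1)`, so that both domains share the operations while learning their own gate values.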