**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Publication# Learning to Reduce Annotation Load

Abstract

Modern machine learning methods and their applications in computer vision are known to crave for large amounts of training data to reach their full potential. Because training data is mostly obtained through humans who manually label samples, it induces a significant cost. Therefore, the problem of reducing the annotation load is of great importance for the success of machine learning methods. We study the problem of reducing the annotation load from two viewpoints, by answering the questions âWhat to annotate?â and âHow to annotate?â. The question âWhat?â addresses the selection of a small portion of the data that would be sufficient to train an accurate model. The question âHow? focuses on minimising the effort of labelling each datapoint. The question âWhat to annotate?â becomes particularly compelling if we can select data to be annotated in an iterative and adaptive way, a setting known as active learning (AL). The key challenge in AL is to identify the datapoints that are the most informative for the model at a given stage. We propose several techniques to address this challenge. Firstly, we consider the problem of segmenting natural images and image volumes. We take advantage of image priors, such as smoothness of objects of interest, and use them in a novel form of geometric uncertainty. Using this, we design an AL technique to efficiently annotate data that is tailored to segmentation applications. Next, we notice that no single manually-designed strategy outperforms others in every application and that often the burden of designing new strategies outweighs the benefits of AL. To overcome this problem we suggest learning an AL strategy from data by formulating the AL problem as a regression task that predicts the reduction in the generalisation error achieved by labelling each datapoint. This enables us to learn AL strategies from simulated data and to transfer them to new datasets. Finally, we turn towards non-myopic data-driven AL strategies. To this end, we formulate the AL problem as a Markov decision process and find the best selection policy using reinforcement learning. We design the decision process such that the policy can be learnt for any ML model and transferred to diverse application domains. Effectively addressing the question âHow to annotate?â is of no less importance as large cost savings can be achieved by labelling each datapoint more efficiently. This can be done with intelligent interfaces that interact with a human annotator. We make two contributions towards answering the question âHow?â. Firstly, we propose an efficient technique to annotate 3D image volumes for image segmentation. Annotating data in 3D is cumbersome and an obvious way to facilitate it is to select a subset of the data lying on a 2D plane. To find the optimal plane (i.e. the one containing the most informative datapoints) we design a branch-and-bound algorithm that quickly eliminates hypotheses about the optimal projection. Secondly, we propose an intelligent data annotation method to train object detectors. Instead of always asking the human annotator to draw bounding boxes in images, we detect automatically in which cases we can rely on the current detector and verify its proposal.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related concepts

Loading

Related publications

Loading

Related publications (110)

Loading

Loading

Loading

Detection of curvilinear structures has long been of interest due to its wide range of applications. Large amounts of imaging data could be readily used in many fields, but it is practically not possible to analyze them manually. Hence, the need for automated delineation approaches. In the recent years Computer Vision witnessed a paradigm shift from mathematical modelling to data-driven methods based on Machine Learning. This led to improvements in performance and robustness of the detection algorithms. Nonetheless, most Machine Learning methods are general-purpose and they do not exploit the specificity of the delineation problem. In this thesis, we present learning methods suited for this task and we apply them to various kinds of microscopic and natural images, proving the general applicability of the presented solutions.
First, we introduce a topology loss - a new training loss term, which captures higher-level features of curvilinear networks such as smoothness, connectivity and continuity. This is in contrast to most Deep Learning segmentation methods that do not take into account the geometry of the resulting prediction. In order to compute the new loss term, we extract topology features of prediction and ground-truth using a pre-trained network, whose filters are activated by structures at different scales and orientations. We show that this approach yields better results in terms of conventional segmentation metrics and overall topology of the resulting delineation.
Although segmentation of curvilinear structures provides useful information, it is not always sufficient. In many cases, such as neuroscience and cartography, it is crucial to estimate the network connectivity. In order to find the graph representation of the structure depicted in the image, we propose an approach for joint segmentation and connection classification. Apart from pixel probabilities, this approach also returns the likelihood of a proposed path being a part of the reconstructed network. We show that segmentation and path classification are closely related tasks and can benefit from the synergy.
The aforementioned methods rely on Machine Learning, which requires significant amounts of annotated ground-truth data to train models. The labelling process often requires expertise, it is costly and tiresome. To alleviate this problem, we introduce an Active Learning method that significantly decreases the time spent on annotating images. It queries the annotator only about the most informative examples, in this case the hypothetical paths belonging to the structure of interest. Contrary to conventional Active Learning methods, our approach exploits local consistency of linear paths to pick the ones that stand out from their neighborhood.
Our final contribution is a method suited for both Active Learning and proofreading the result, which often requires more time than the automated delineation itself. It investigates edges of the delineation graph and determines the ones that are especially significant for the global reconstruction by perturbing their weights. Our Active Learning and proofreading strategies are combined with a new efficient formulation of an optimal subgraph computation and reduce the annotation effort by up to 80%.

Related concepts (36)

Learning is the process of acquiring new understanding, knowledge, behaviors, skills, values, attitudes, and preferences. The ability to learn is possessed by humans, animals, and some machines; th

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforce

In and computer vision, image segmentation is the process of partitioning a into multiple image segments, also known as image regions or image objects (sets of pixels). The goal of segmentation is

In recent years, Machine Learning based Computer Vision techniques made impressive progress. These algorithms proved particularly efficient for image classification or detection of isolated objects. From a probabilistic perspective, these methods can predict marginals, over single or multiple variables, independently, with high accuracy.
However, in many tasks of practical interest, we need to predict jointly several correlated variables.
Practical applications include people detection in crowded scenes, image segmentation, surface reconstruction, 3D pose estimation and others. A large part of the research effort in today's computer-vision community aims at finding task-specific solutions to these problems, while leveraging the power of Deep-Learning based classifiers. In this thesis, we present our journey towards a generic and practical solution based on mean-field (MF) inference.
Mean-field is a Statistical Physics-inspired method which has long been used in Computer-Vision as a variational approximation to posterior distributions over complex Conditional Random Fields. Standard
mean-field optimization is based on coordinate descent
and in many situations can be impractical.
We therefore propose a novel proximal gradient-based
approach to optimizing the variational objective. It
is naturally parallelizable and easy to implement.
We prove its convergence, and then demonstrate that, in
practice, it yields faster convergence and often finds better
optima than more traditional mean-field optimization techniques.
Then, we show that we can replace the fully factorized distribution of mean-field by a weighted mixture of such distributions, that similarly minimizes the KL-Divergence to the true posterior. Our extension of the clamping method proposed in previous works allows us to both produce a more descriptive approximation of the true posterior and, inspired by the diverse MAP paradigms, fit a mixture of mean-field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean-fields.
One of the important properties of the mean-field inference algorithms is that the closed-form updates are fully differentiable operations. This naturally allows to do parameter learning by simply unrolling multiple iterations of the updates, the so-called back-mean-field algorithm. We derive a novel and efficient structured learning method for multi-modal posterior distribution based on the Multi-Modal Mean-Field approximation, which can be seamlessly combined to modern gradient-based learning methods such as CNNs.
Finally, we explore in more details the specific problem of structured learning and prediction for multiple-people detection in crowded scenes. We then present a mean-field based structured deep-learning detection algorithm that provides state of the art results on this dataset.

Though deep learning (DL) algorithms are very powerful for image processing tasks, they generally require a lot of data to reach their full potential. Furthermore, there is no straightforward way to impose various properties, given by the prior knowledge about the target task, on the features extracted by a DL model. Therefore, in this thesis we propose several techniques that rely on the power of graph representations to embed prior knowledge inside the learning process. This allows to reduce the solution space and leads to faster optimization convergence and higher accuracy in the representation learning.
In our first work, inspired by the ability of a human to correctly classify rotated, shifted or flipped objects, we propose an algorithm that permits to inherently encode invariance to isometric transformations of objects in an image. Our DL architecture is based on graph representations and consists of three novel layers, which we refer to as graph convolutional, dynamic pooling and statistical layers. Our experiments on the image classification tasks show that our network correctly recognizes isometrically transformed objects even though such types of transformation are not seen by the network at training time. Standard DL techniques are typically not able to succeed in solving such a problem without extensive data augmentation.
Then, we propose to exploit the properties of graph-based approaches to efficiently process images with various types of projective geometry. In particular, we are interested in increasingly popular omnidirectional cameras, which have a 360 degree field of view. Despite their effectiveness, such cameras create images with specific geometric properties, which require special techniques for efficient processing. We propose an efficient way of adjusting the weights of the graph edges to adapt the filter responses to the geometric image properties introduced by omnidirectional cameras. Our experiments prove that using the proposed graph with properly adjusted edge weights permits to reach better performance as compared to using regular grid graph with equal weights.
Finally, the approach described above relies on the isotropic filters, which work well within our transformation invariant architecture for image classification. However, for other problems (e.g. image compression) or even when used without dynamic pooling and statistical layers that are defined within the proposed architecture, these filters are unable to efficiently encode the information about the object. Thus, we introduce a different technique based on anisotropic filters that adapt their shape and size according to the omnidirectional image geometry. The main advantage of this approach compared to the previous one is the ability to encode the orientation of an image pattern, which is important for various tasks such as image compression. Our experiments show that our approach adapts to different image projective geometries and achieves state-of-the-art performance on image classification and compression tasks.
Overall we propose several methods, which combine the power of DL and graph signal processing towards incorporating prior information about the target task inside the optimization procedure. We hope that the research efforts presented in this thesis will help the development of efficient DL algorithms that can use various types of prior knowledge to make them efficient even when the available training data is scarce.