**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Predicting in Uncertain Environments: Methods for Robust Machine Learning

Résumé

One of the main goal of Artificial Intelligence is to develop models capable of providing valuable predictions in real-world environments. In particular, Machine Learning (ML) seeks to design such models by learning from examples coming from this same environment. However, the real world is most of the time not static, and the environment in which the model will be used can differ from the one in which it is trained. It is hence desirable to design models that are robust to changes of environments. This encapsulates a large family of topics in ML, such as adversarial robustness, meta-learning, domain adaptation and others, depending on the way the environment is perturbed.In this dissertation, we focus on methods for training models whose performance does not drastically degrade when applied to environments differing from the one the model has been trained in. Various types of environmental changes will be treated, differing in their structure or magnitude. Each setup defines a certain kind of robustness to certain environmental changes, and leads to a certain optimization problem to be solved. We consider 3 different setups, and propose algorithms for solving each associated problem using 3 different types of methods, namely, min-max optimization (Chapter 2), regularization (Chapter 3) and variable selection (Chapter 4).Leveraging the framework of distributionally robust optimization, which phrases the problem of robust training as a min-max optimization problem, we first aim to train robust models by directly solving the associated min-max problem. This is done by exploiting recent work on game theory as well as first-order sampling algorithms based on Langevin dynamics. Using this approach, we propose a method for training robust agents in the scope of Reinforcement Learning.We then treat the case of adversarial robustness, i.e., robustness to small arbitrary perturbation of the model's input. It is known that neural networks trained using classical optimization methods are particularly sensitive to this type of perturbations. The adversarial robustness of a model is tightly connected to its smoothness, which is quantified by its so-called Lipschitz constant. This constant measures how much the model's output changes upon any bounded input perturbation. We hence develop a method to estimate an upper bound on the Lipschitz constant of neural networks via polynomial optimization, which can serve as a robustness certificate against adversarial attacks. We then propose to penalize the Lipschitz constant during training by minimizing the 1-path-norm of the neural network, and we develop an algorithm for solving the resulting regularized problem by efficiently computing the proximal operator of the 1-path-norm term, which is non-smooth and non-convex.Finally, we consider a scenario where the environmental changes can be arbitrary large (as opposed to adversarial robustness), but need to preserve a certain causal structure. Recent works have demonstrated interesting connections between robustness and the use of causal variables. Assuming that certain mechanisms remain invariant under some change of the environment, it has been shown that knowing the underlying causal structure of the data at hand allows to train models that are invariant to such changes. Unfortunately, in many cases, the causal structure is unknown. We thus propose a causal discovery algorithm from observational data in the case of non-linear additive model.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Concepts associés (31)

Publications associées (105)

Neural network

A neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), a network of artificial neurons or nodes in the case of an artificial neur

Adversarial machine learning

Adversarial machine learning is the study of the attacks on machine learning algorithms, and of the defenses against such attacks. A survey from May 2020 exposes the fact that practitioners report a d

Optimisation robuste

L'optimisation robuste est une branche de l'optimisation mathématique qui cherche à résoudre un problème d'optimisation en prenant en compte les différentes sources d'incertitude de celui-ci.
Hi

Chargement

Chargement

Chargement

In the last decade, deep neural networks have achieved tremendous success in many fields of machine learning.However, they are shown vulnerable against adversarial attacks: well-designed, yet imperceptible, perturbations can make the state-of-the-art deep neural networks output incorrect results.Understanding adversarial attacks and designing algorithms to make deep neural networks robust against these attacks are key steps to building reliable artificial intelligence in real-life applications.In this thesis, we will first formulate the robust learning problem.Based on the notations of empirical robustness and verified robustness, we design new algorithms to achieve both of these types of robustness.Specifically, we investigate the robust learning problem from the optimization perspectives.Compared with classic empirical risk minimization, we show the slow convergence and large generalization gap in robust learning.Our theoretical and numerical analysis indicates that these challenges arise, respectively, from non-smooth loss landscapes and model's fitting hard adversarial instances.Our insights shed some light on designing algorithms for mitigating these challenges.Robust learning has other challenges, such as large model capacity requirements and high computational complexity.To solve the model capacity issue, we combine robust learning with model compression.We design an algorithm to obtain sparse and binary neural networks and make it robust.To decrease the computational complexity, we accelerate the existing adversarial training algorithm and preserve its performance stability.In addition to making models robust, our research provides other benefits.Our methods demonstrate that robust models, compared with non-robust ones, usually utilize input features in a way more similar to the way human beings use them, hence the robust models are more interpretable.To obtain verified robustness, our methods indicate the geometric similarity of the decision boundaries near data points.Our approaches towards reliable artificial intelligence can not only render deep neural networks more robust in safety-critical applications but also make us better aware of how they work.

Our brain continuously self-organizes to construct and maintain an internal representation of the world based on the information arriving through sensory stimuli. Remarkably, cortical areas related to different sensory modalities appear to share the same functional unit, the neuron, and develop through the same learning mechanism, synaptic plasticity. It motivates the conjecture of a unifying theory to explain cortical representational learning across sensory modalities. In this thesis we present theories and computational models of learning and optimization in neural networks, postulating functional properties of synaptic plasticity that support the apparent universal learning capacity of cortical networks. In the past decades, a variety of theories and models have been proposed to describe receptive field formation in sensory areas. They include normative models such as sparse coding, and bottom-up models such as spike-timing dependent plasticity. We bring together candidate explanations by demonstrating that in fact a single principle is sufficient to explain receptive field development. First, we show that many representative models of sensory development are in fact implementing variations of a common principle: nonlinear Hebbian learning. Second, we reveal that nonlinear Hebbian learning is sufficient for receptive field formation through sensory inputs. A surprising result is that our findings are independent of specific details, and allow for robust predictions of the learned receptive fields. Thus nonlinear Hebbian learning and natural statistics can account for many aspects of receptive field formation across models and sensory modalities. The Hebbian learning theory substantiates that synaptic plasticity can be interpreted as an optimization procedure, implementing stochastic gradient descent. In stochastic gradient descent inputs arrive sequentially, as in sensory streams. However, individual data samples have very little information about the correct learning signal, and it becomes a fundamental problem to know how many samples are required for reliable synaptic changes. Through estimation theory, we develop a novel adaptive learning rate model, that adapts the magnitude of synaptic changes based on the statistics of the learning signal, enabling an optimal use of data samples. Our model has a simple implementation and demonstrates improved learning speed, making this a promising candidate for large artificial neural network applications. The model also makes predictions on how cortical plasticity may modulate synaptic plasticity for optimal learning. The optimal sampling size for reliable learning allows us to estimate optimal learning times for a given model. We apply this theory to derive analytical bounds on times for the optimization of synaptic connections. First, we show this optimization problem to have exponentially many saddle-nodes, which lead to small gradients and slow learning. Second, we show that the number of input synapses to a neuron modulates the magnitude of the initial gradient, determining the duration of learning. Our final result reveals that the learning duration increases supra-linearly with the number of synapses, suggesting an effective limit on synaptic connections and receptive field sizes in developing neural networks.

Learning to embed data into a space where similar points are together and dissimilar points are far apart is a challenging machine learning problem. In this dissertation we study two learning scenarios that arise in the context of learning embeddings and one scenario in efficiently estimating an empirical expectation. We present novel algorithmic solutions and demonstrate their applications on a wide range of data-sets. The first scenario deals with learning from small data with large number of classes. This setting is common in computer vision problems such as person re-identification and face verification. To address this problem we present a new algorithm called Weighted Approximate Rank Component Analysis (WARCA), which is scalable, robust, non-linear and is independent of the number of classes. We empirically demonstrate the performance of our algorithm on 9 standard person re-identification data-sets where we obtain state of the art performance in terms of accuracy as well as computational speed. The second scenario we consider is learning embeddings from sequences. When it comes to learning from sequences, recurrent neural networks have proved to be an effective algorithm. However there are many problems with existing recurrent neural networks which makes them data hungry (high sample complexity) and difficult to train. We present a new recurrent neural network called Kronecker Recurrent Units (KRU), which addresses the issues of existing recurrent neural networks through Kronecker matrices. We show its performance on 7 applications, ranging from problems in computer vision, language modeling, music modeling and speech recognition. Most of the machine learning algorithms are formulated as minimizing an empirical expectation over a finite collection of samples. In this thesis we also investigate the problem of efficiently estimating a weighted average over large data-sets. We present a new data-structure called Importance Sampling Tree (IST), which permits fast estimation of weighted average without looking at all the samples. We show successfully the evaluation of our data-structure in the training of neural networks in order to efficiently find informative samples.