
# Controlling the Complexity and Lipschitz Constant improves Polynomial Nets

Abstract

While the class of Polynomial Nets demonstrates performance comparable to neural networks (NNs), it currently has neither a theoretical generalization characterization nor robustness guarantees. To this end, we derive new complexity bounds for the set of Coupled CP-Decomposition (CCP) and Nested Coupled CP-Decomposition (NCP) models of Polynomial Nets in terms of the $\ell_\infty$-operator norm and the $\ell_2$-operator norm. In addition, we derive bounds on the Lipschitz constant for both models to establish a theoretical certificate of their robustness. These theoretical results enable us to propose a principled regularization scheme, which we evaluate experimentally on six datasets, showing that it improves both the accuracy and the robustness of the models to adversarial perturbations. We also showcase how this regularization can be combined with adversarial training, resulting in further improvements.
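As a concrete illustration of the quantities being bounded, here is a minimal numpy sketch (not the paper's implementation) of the two operator norms and a simple penalty built from them; the function names and the additive penalty form are illustrative assumptions:

```python
import numpy as np

def linf_operator_norm(W):
    # l_inf -> l_inf operator norm of a matrix: maximum absolute row sum
    return float(np.max(np.sum(np.abs(W), axis=1)))

def l2_operator_norm(W):
    # l2 -> l2 operator norm: largest singular value (spectral norm)
    return float(np.linalg.svd(W, compute_uv=False)[0])

def operator_norm_penalty(weight_matrices, lam=1e-3):
    # illustrative regularizer: lam times the sum of l_inf operator
    # norms over the model's weight matrices
    return lam * sum(linf_operator_norm(W) for W in weight_matrices)

W = np.array([[1.0, -2.0], [3.0, 4.0]])
print(linf_operator_norm(W))  # max(|1|+|-2|, |3|+|4|) = 7.0
```

In practice such a penalty would be added to the training loss, shrinking the per-layer operator norms and, with them, the model's Lipschitz upper bound.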



Related concepts (15)


Neural network

A neural network can refer to a neural circuit of biological neurons (sometimes also called a biological neural network), or a network of artificial neurons or nodes in the case of an artificial neural network.

Complexity

Complexity characterises the behaviour of a system or model whose components interact in multiple ways and follow local rules, leading to non-linearity, randomness, collective dynamics, hierarchy, and emergence.

Convolutional neural network

A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns feature engineering by itself via filter (or kernel) optimization. Vanishing and exploding gradients, seen during backpropagation in earlier networks, are prevented by using regularized weights over fewer connections.

Related publications (24)

In the last decade, deep neural networks have achieved tremendous success in many fields of machine learning. However, they have been shown to be vulnerable to adversarial attacks: well-designed, yet imperceptible, perturbations can make state-of-the-art deep neural networks output incorrect results. Understanding adversarial attacks and designing algorithms to make deep neural networks robust against them are key steps towards building reliable artificial intelligence for real-life applications.

In this thesis, we first formulate the robust learning problem. Based on the notions of empirical robustness and verified robustness, we design new algorithms to achieve both types of robustness. Specifically, we investigate the robust learning problem from an optimization perspective. Compared with classic empirical risk minimization, we show the slow convergence and large generalization gap of robust learning. Our theoretical and numerical analysis indicates that these challenges arise, respectively, from non-smooth loss landscapes and from the model's fitting of hard adversarial instances. Our insights shed light on designing algorithms that mitigate these challenges. Robust learning has other challenges, such as large model-capacity requirements and high computational complexity. To address the model-capacity issue, we combine robust learning with model compression: we design an algorithm that obtains sparse and binary neural networks and makes them robust. To decrease the computational complexity, we accelerate the existing adversarial training algorithm while preserving its performance stability.

Beyond making models robust, our research provides other benefits. Our methods demonstrate that robust models, compared with non-robust ones, usually utilize input features in a way more similar to how human beings use them; hence robust models are more interpretable. To obtain verified robustness, our methods exploit the geometric similarity of the decision boundaries near data points. Our approaches towards reliable artificial intelligence not only render deep neural networks more robust in safety-critical applications but also make us better aware of how they work.
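The inner maximization of the robust learning problem is typically approximated by gradient-based attacks. A minimal sketch of the simplest such attack, a one-step fast gradient sign method (FGSM) on a linear classifier with logistic loss, is below; the linear model and the function name are illustrative assumptions, not the thesis's algorithms:

```python
import numpy as np

def fgsm_linear(w, b, x, y, eps):
    """One-step l_inf-bounded attack (FGSM) on a linear classifier
    f(x) = w.x + b with logistic loss; labels y are in {-1, +1}."""
    margin = y * (w @ x + b)
    # gradient of log(1 + exp(-margin)) with respect to the input x
    grad_x = -y * w / (1.0 + np.exp(margin))
    # move each coordinate by eps in the loss-increasing direction
    return x + eps * np.sign(grad_x)

w, b = np.array([1.0, 0.0]), 0.0
x, y = np.array([1.0, 1.0]), 1
x_adv = fgsm_linear(w, b, x, y, eps=0.1)
# the margin y*(w.x + b) drops from 1.0 to 0.9
```

Adversarial training then minimizes the loss on such perturbed inputs instead of (or in addition to) the clean ones, which is the empirical counterpart of the min-max formulation of robust learning.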

One of the main goals of Artificial Intelligence is to develop models capable of providing valuable predictions in real-world environments. In particular, Machine Learning (ML) seeks to design such models by learning from examples coming from that same environment. However, the real world is rarely static, and the environment in which a model is used can differ from the one in which it was trained. It is hence desirable to design models that are robust to changes of environment. This encapsulates a large family of topics in ML, such as adversarial robustness, meta-learning, domain adaptation and others, depending on the way the environment is perturbed.

In this dissertation, we focus on methods for training models whose performance does not drastically degrade when applied to environments differing from the one the model has been trained in. Various types of environmental changes are treated, differing in their structure or magnitude. Each setup defines a certain kind of robustness to certain environmental changes and leads to a certain optimization problem to be solved. We consider three different setups and propose algorithms for solving each associated problem using three different types of methods, namely min-max optimization (Chapter 2), regularization (Chapter 3) and variable selection (Chapter 4).

Leveraging the framework of distributionally robust optimization, which phrases robust training as a min-max optimization problem, we first aim to train robust models by directly solving the associated min-max problem. This is done by exploiting recent work on game theory as well as first-order sampling algorithms based on Langevin dynamics. Using this approach, we propose a method for training robust agents in the scope of Reinforcement Learning.

We then treat the case of adversarial robustness, i.e., robustness to small arbitrary perturbations of the model's input. It is known that neural networks trained using classical optimization methods are particularly sensitive to this type of perturbation. The adversarial robustness of a model is tightly connected to its smoothness, which is quantified by its so-called Lipschitz constant: this constant measures how much the model's output can change under any bounded input perturbation. We hence develop a method to estimate an upper bound on the Lipschitz constant of neural networks via polynomial optimization, which can serve as a robustness certificate against adversarial attacks. We then propose to penalize the Lipschitz constant during training by minimizing the 1-path-norm of the neural network, and we develop an algorithm for solving the resulting regularized problem by efficiently computing the proximal operator of the 1-path-norm term, which is non-smooth and non-convex.

Finally, we consider a scenario where the environmental changes can be arbitrarily large (as opposed to adversarial robustness) but must preserve a certain causal structure. Recent works have demonstrated interesting connections between robustness and the use of causal variables. Assuming that certain mechanisms remain invariant under a change of environment, it has been shown that knowing the underlying causal structure of the data at hand allows training models that are invariant to such changes. Unfortunately, in many cases the causal structure is unknown. We thus propose a causal discovery algorithm that works from observational data in the case of non-linear additive models.
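To make the 1-path-norm concrete, here is a small sketch for a one-hidden-layer ReLU network f(x) = W2·relu(W1·x); the closed form below (the sum over all input-to-output paths of the products of absolute weights) is standard, but the function name is ours and this is not the thesis's proximal algorithm:

```python
import numpy as np

def one_path_norm(W1, W2):
    # 1-path-norm of f(x) = W2 @ relu(W1 @ x): sum over every
    # input-to-output path of the product of absolute weights,
    # computed as the entrywise sum of |W2| @ |W1|.
    return float(np.sum(np.abs(W2) @ np.abs(W1)))

W1 = np.array([[1.0, -1.0], [2.0, 0.0]])
W2 = np.array([[1.0, -3.0]])
print(one_path_norm(W1, W2))  # |1|*|1| + |1|*|-1| + |-3|*|2| + |-3|*|0| = 8.0
```

For 1-Lipschitz activations such as ReLU, this quantity upper-bounds the network's Lipschitz constant with respect to the l_inf norm on the input, which is why penalizing it during training promotes robustness.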

Deep Neural Networks (DNNs) have achieved great success in a wide range of applications, such as image recognition, object detection, and semantic segmentation. Even though the discriminative power of DNNs is nowadays unquestionable, serious concerns have arisen ever since DNNs were shown to be vulnerable to adversarial examples crafted by adding imperceptible perturbations to clean images. The implications of these malicious attacks are even more significant for DNNs deployed in real-world systems, e.g., autonomous driving and biometric authentication. Consequently, an intriguing question that we aim to understand is the underlying behaviour of DNNs under adversarial attack.

This thesis contributes to a better understanding of the mechanism of adversarial attacks on DNNs. Our main contributions lie broadly in two directions: (1) we propose interpretable architectures, first to understand the reasons for the success of adversarial attacks and then to improve the robustness of DNNs; (2) we design intuitive adversarial attacks, both to mislead DNNs and to use as a tool to expand our present understanding of DNNs' internal workings and their limitations.

In the first direction, we introduce deep architectures that allow humans to interpret the reasoning process behind DNN predictions. Specifically, we incorporate bag-of-visual-words representations from the pre-deep-learning era into DNNs using an attention scheme. We identify key reasons for adversarial attack success and use these insights to propose an adversarial defense that maximally separates the latent features of discriminative regions while minimizing the contribution of non-discriminative regions to the final prediction.

The second direction deals with the design of adversarial attacks to understand DNNs' limitations in a real-world environment. To begin with, we show that existing state-of-the-art semantic segmentation networks, which achieve superior performance by exploiting context, are highly susceptible to indirect local attacks. Furthermore, we demonstrate the existence of universal directional perturbations that are quasi-independent of the input template but still successfully fool unknown siamese-based visual object trackers. We then identify that the mid-level filter banks across different backbones bear strong similarities and can thus be a potential common ground for attack. We therefore learn a generator that disrupts mid-level features with high transferability across different target architectures, datasets, and tasks. In short, our attacks highlight critical vulnerabilities of DNNs, which make their deployment challenging in real-world environments, even in the extreme case where the attacker is unaware of the target architecture or the target data used to train it.

Going beyond fooling networks, we also demonstrate the usefulness of adversarial attacks for studying the internal disentangled representations in self-supervised 3D pose estimation networks. We observe that adversarial manipulation of appearance information in the input image alters the pose output, indicating that the pose code contains appearance information and that disentanglement is far from complete. Besides the above contributions, an underlying theme that arises multiple times in this thesis is counteracting adversarial attacks by detecting them.