Publication

Estimating and Improving the Robustness of Attributions in Text

Ádám Dániel Ivánkay
2023
EPFL thesis
Abstract

End-to-end learning methods like deep neural networks have been the driving force behind the remarkable progress of machine learning in recent years. However, despite their success, the deployment of such networks in safety-critical use cases, such as healthcare, has been lagging. This is due to the black-box nature of deep neural networks: they take raw data as input and learn relevant features directly from the data, which makes their inference process hard to understand. To mitigate this, several explanation methods have been proposed, such as local linear proxy models, attribution methods, feature activation maps and attention mechanisms. However, many of these explanation methods, attribution maps in particular, tend not to fulfill certain desiderata of faithful explanations, in particular robustness: explanations should be invariant to imperceptible perturbations of the input that do not alter the inference outcome. The poor robustness of attribution maps to such input alterations is a key factor hindering trust in explanations and the deployment of neural networks in high-stakes scenarios.

While the robustness of attribution maps has been studied extensively in the image domain, it has not been researched in text domains at all. This is the focus of this thesis. First, we show that imperceptible, adversarial perturbations of attributions exist for text classifiers as well. We demonstrate this on five text classification datasets and a range of state-of-the-art classifier architectures. Moreover, we show that such perturbations transfer across model architectures and attribution methods, remaining effective even when the target model and explanation method are unknown. These initial findings demonstrate the need for a definition of attribution robustness that incorporates the extent to which the input sentences are altered, in order to differentiate between more and less perceptible adversarial perturbations.
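To make the attack setting concrete, the following is a minimal, self-contained sketch of a prediction-preserving attribution attack. Everything here (the vocabulary, word vectors, toy linear model, and synonym set) is a hypothetical illustration, not the thesis's actual models or attack algorithm:

```python
import numpy as np

# Toy linear classifier over fixed word vectors; all names and numbers
# below are hypothetical illustrations.
VOCAB = {
    "good":  np.array([1.0, 0.2]),
    "great": np.array([0.9, 0.6]),
    "fine":  np.array([0.8, -0.1]),
    "movie": np.array([0.1, 1.0]),
}
W = np.array([0.7, 0.3])  # decision weights

def predict(tokens):
    """Class 1 iff the mean word vector projects positively onto W."""
    x = np.mean([VOCAB[t] for t in tokens], axis=0)
    return int(W @ x > 0)

def attribution(tokens):
    """Per-token relevance; for a linear model, gradient-times-input
    reduces to each token's contribution W . v_t."""
    return np.array([W @ VOCAB[t] for t in tokens])

def attribution_attack(tokens, synonyms):
    """Greedy single-token substitution that keeps the prediction
    fixed while maximally shifting the attribution vector (L2)."""
    base_pred = predict(tokens)
    base_attr = attribution(tokens)
    best, best_shift = list(tokens), 0.0
    for i, tok in enumerate(tokens):
        for sub in synonyms.get(tok, []):
            cand = list(tokens)
            cand[i] = sub
            if predict(cand) != base_pred:
                continue  # the perturbation must not change the label
            shift = float(np.linalg.norm(attribution(cand) - base_attr))
            if shift > best_shift:
                best, best_shift = cand, shift
    return best, best_shift
```

Running `attribution_attack(["good", "movie"], {"good": ["great", "fine"]})` picks the synonym that preserves the positive prediction but moves the attribution mass the most, illustrating why prediction-invariant perturbations can still break explanations.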
Thus, we establish a new definition of attribution robustness, based on Lipschitz continuity, that reflects the perceptibility of such alterations. This allows the robustness of neural network attributions to be effectively quantified and compared. As part of this effort, we propose a set of metrics that effectively capture the perceptibility of perturbations in text. Then, based on our new definition, we introduce a novel attack that yields perturbations which alter explanations to a greater extent while being less perceptible.

Lastly, in order to improve attribution robustness in text classifiers, we introduce a general framework for training robust classifiers that generalizes current robust training objectives. We propose instantiations of this framework and show, in experiments on three biomedical text datasets, that attributions in medical text classifiers also lack robustness to small input perturbations. We then show that our instantiations successfully train networks with improved attribution robustness, outperforming baseline methods. Finally, we show that our framework performs better than or comparably to current methods in image classification as well, while being more general.

In summary, our work contributes significantly to quantifying and improving the attribution robustness of text classifiers, taking a step towards the safe deployment of state-of-the-art neural networks in real-life, safety-critical applications such as healthcare.
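A Lipschitz-continuity view of attribution robustness can be sketched as a worst-case ratio of attribution change to input change over sampled perturbations. The helper below is a hypothetical finite-sample illustration of that idea, not the thesis's exact estimator or text-perceptibility metrics:

```python
import numpy as np

def l2(u, v):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

def attribution_robustness(attr_fn, x, candidates, dist_in=l2, dist_attr=l2):
    """Finite-sample estimate of the local Lipschitz constant of the
    attribution map: the worst observed ratio of attribution change
    to input change over a set of candidate perturbed inputs."""
    base = attr_fn(x)
    worst = 0.0
    for x_p in candidates:
        d_in = dist_in(x, x_p)
        if d_in == 0.0:
            continue  # an identical input carries no information
        worst = max(worst, dist_attr(base, attr_fn(x_p)) / d_in)
    return worst

# For f(x) = sum(x**2), the gradient attribution is 2*x, which is
# exactly 2-Lipschitz, so the estimate equals 2 for any perturbation.
grad_attr = lambda x: 2.0 * np.asarray(x)
x0 = np.array([1.0, -1.0])
cands = [x0 + np.array([0.1, 0.0]), x0 + np.array([0.0, -0.2])]
score = attribution_robustness(grad_attr, x0, cands)
```

A smaller score indicates a more robust attribution map; replacing `l2` with a text-perceptibility distance would adapt the same ratio to discrete sentence perturbations.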

Related concepts (33)
Artificial neural network
Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
Recurrent neural network
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of information flow between its layers. In contrast to a uni-directional feedforward neural network, an RNN is bi-directional, allowing the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes RNNs applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
Deep learning
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. The methods used can be supervised, semi-supervised, or unsupervised.
Related publications (162)

Deep learning approach for identification of H II regions during reionization in 21-cm observations - II. Foreground contamination

Jean-Paul Richard Kneib, Emma Elizabeth Tolley, Tianyue Chen, Michele Bianco

The upcoming Square Kilometre Array Observatory will produce images of neutral hydrogen distribution during the epoch of reionization by observing the corresponding 21-cm signal. However, the 21-cm signal will be subject to instrumental limitations such as ...
Oxford Univ Press, 2024

Machine Learning for Modeling Stock Returns

Teng Andrea Xu

Throughout history, the pace of knowledge and information sharing has evolved into an unthinkable speed and media. At the end of the XVII century, in Europe, the ideas that would shape the "Age of Enlightenment" were slowly being developed in coffeehouses, ...
EPFL, 2024

Robust NAS under adversarial training: benchmark, theory, and beyond

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Yongtao Wu

Recent developments in neural architecture search (NAS) emphasize the significance of considering robust architectures against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching these robus ...
2024
Related MOOCs (31)
Neuronal Dynamics 2- Computational Neuroscience: Neuronal Dynamics of Cognition
This course explains the mathematical and computational models that are used in the field of theoretical neuroscience to analyze the collective dynamics of thousands of interacting neurons.
Neuronal Dynamics - Computational Neuroscience of Single Neurons
The activity of neurons in the brain and the code used by these neurons is described by mathematical neuron models at different levels of detail.
