Revisiting Character-level Adversarial Attacks for Language Models

Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, which have gained prominence for their use of gradient-based methods, are prone to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention because they cannot easily adopt popular gradient-based methods and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points with respect to the previous state of the art. Our implementation is available at github.com/LIONS-EPFL/Charmer.
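To make the idea of a query-based character-level attack concrete, here is a minimal Python sketch. It is not the Charmer algorithm: it is a naive greedy search that, at each step, queries a black-box scoring function on every string one character edit away (insertion, substitution, or deletion) and keeps the edit that most reduces the confidence in the true label. The names `edit_neighbours`, `greedy_char_attack`, `toy_score`, the small alphabet, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import string
from typing import Callable, Iterator, Optional

# Assumption: a small illustrative alphabet; a real attack would use a larger one.
ALPHABET = string.ascii_lowercase + " "


def edit_neighbours(text: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield each distinct string exactly one character edit away from `text`."""
    seen = {text}
    for i in range(len(text) + 1):
        candidates = []
        for ch in alphabet:
            candidates.append(text[:i] + ch + text[i:])          # insertion
            if i < len(text):
                candidates.append(text[:i] + ch + text[i + 1:])  # substitution
        if i < len(text):
            candidates.append(text[:i] + text[i + 1:])           # deletion
        for cand in candidates:
            if cand not in seen:
                seen.add(cand)
                yield cand


def greedy_char_attack(
    text: str,
    true_label: int,
    score_fn: Callable[[str, int], float],
    max_edits: int = 3,
) -> Optional[str]:
    """Greedy black-box search using only queries to `score_fn`.

    `score_fn(text, label)` is assumed to return the model's confidence in
    `label`. Returns a perturbed text whose confidence in `true_label` drops
    below 0.5, or None if none is found within `max_edits` edits.
    """
    current = text
    for _ in range(max_edits):
        # Query every single-edit neighbour and keep the most damaging one.
        current = min(edit_neighbours(current), key=lambda c: score_fn(c, true_label))
        if score_fn(current, true_label) < 0.5:
            return current
    return None


def toy_score(text: str, label: int) -> float:
    """Hypothetical stand-in for a sentiment model: keys on the word 'good'."""
    p_positive = 0.9 if "good" in text else 0.1
    return p_positive if label == 1 else 1.0 - p_positive


adversarial = greedy_char_attack("a good movie", true_label=1, score_fn=toy_score)
print(adversarial)
```

The exhaustive scan issues one query per candidate edit, so its cost grows with the text length times the alphabet size at every step. An efficient attack has to prune that candidate set, which this sketch does not attempt.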

Understanding generalization and robustness in modern deep learning

Infusing structured knowledge priors in neural models for sample-efficient symbolic reasoning

Explainable Face Verification via Feature-Guided Gradient Backpropagation