Multi-armed bandit

In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. This is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff dilemma. The name comes from imagining a gambler at a row of slot machines (sometimes known as "one-armed bandits"), who has to decide which machines to play, how many times to play each machine and in which order to play them, and whether to continue with the current machine or try a different machine. The multi-armed bandit problem also falls into the broad category of stochastic scheduling. In the problem, each machine provides a random reward from a probability distribution specific to that machine, that is not known a-priori. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls. The crucial tradeoff the gambler faces at each trial is between "exploitation" of the machine that has the highest expected payoff and "exploration" to get more information about the expected payoffs of the other machines. The trade-off between exploration and exploitation is also faced in machine learning. In practice, multi-armed bandits have been used to model problems such as managing research projects in a large organization, like a science foundation or a pharmaceutical company. In early versions of the problem, the gambler begins with no initial knowledge about the machines. Herbert Robbins in 1952, realizing the importance of the problem, constructed convergent population selection strategies in "some aspects of the sequential design of experiments". A theorem, the Gittins index, first published by John C.

Robust NAS under adversarial training: benchmark, theory, and beyond

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Yongtao Wu

Recent developments in neural architecture search (NAS) emphasize the significance of considering robust architectures against malicious data. However, there is a notable absence of benchmark evaluations and theoretical guarantees for searching these robus ...

2024

Revisiting Character-level Adversarial Attacks for Language Models

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Yongtao Wu, Elias Abad Rocamora

Adversarial attacks in Natural Language Processing apply perturbations in the character or token levels. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adv ...

2024

On the Privacy-Robustness-Utility Trilemma in Distributed Learning

Rachid Guerraoui, Nirupam Gupta, John Stephan, Youssef Allouah, Rafaël Benjamin Pinot

The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied ...

2023

Robust NAS under adversarial training: benchmark, theory, and beyond

Revisiting Character-level Adversarial Attacks for Language Models

On the Privacy-Robustness-Utility Trilemma in Distributed Learning

Graph Chatbot

Chat with Graph Search

On the Privacy-Robustness-Utility Trilemma in Distributed Learning

Revisiting Character-level Adversarial Attacks for Language Models

Robust NAS under adversarial training: benchmark, theory, and beyond