Publication

Robust Reinforcement Learning via Adversarial training with Langevin Dynamics

Abstract

We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents. Leveraging the powerful Stochastic Gradient Langevin Dynamics, we present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our algorithm consistently outperforms existing baselines, in terms of generalization across different training and testing conditions, on several MuJoCo environments. Our experiments also show that, even for objective functions that entirely ignore potential environmental shifts, our sampling approach remains highly robust in comparison to standard RL algorithms.
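As a rough illustration of the sampling idea mentioned in the abstract, the sketch below applies a Stochastic Gradient Langevin Dynamics style update to a toy two-player min-max problem. It is not the paper's algorithm, which works with policy gradients on MuJoCo tasks. The objective f(theta, omega) = 0.5*theta^2 + theta*omega - 0.5*omega^2, the step size, the inverse temperature and all function names are assumptions chosen only to keep the example small.

```python
import math
import random


def grad_theta(theta, omega):
    # Partial derivative of the toy objective with respect to theta.
    return theta + omega


def grad_omega(theta, omega):
    # Partial derivative of the toy objective with respect to omega.
    return theta - omega


def sgld_min_max(steps=5000, step_size=0.01, inv_temp=100.0, seed=0):
    """Noisy gradient descent-ascent on the toy objective (hypothetical helper).

    theta plays the minimizing player and omega the maximizing one.
    Each update adds Gaussian noise scaled by sqrt(2 * step_size / inv_temp),
    which is the usual Langevin noise scale.
    """
    rng = random.Random(seed)
    theta, omega = 2.0, -2.0  # arbitrary starting point, away from the saddle
    noise_scale = math.sqrt(2.0 * step_size / inv_temp)
    history = []
    for _ in range(steps):
        # Evaluate both gradients at the current point (simultaneous update).
        g_theta = grad_theta(theta, omega)
        g_omega = grad_omega(theta, omega)
        # Minimizing player: descent step plus injected noise.
        theta = theta - step_size * g_theta + noise_scale * rng.gauss(0.0, 1.0)
        # Maximizing player: ascent step plus injected noise.
        omega = omega + step_size * g_omega + noise_scale * rng.gauss(0.0, 1.0)
        history.append((theta, omega))
    return history


if __name__ == "__main__":
    tail = sgld_min_max()[2500:]
    print("mean theta:", sum(p[0] for p in tail) / len(tail))
    print("mean omega:", sum(p[1] for p in tail) / len(tail))
```

On this toy problem the iterates do not settle at a single point. They keep fluctuating around the saddle point (0, 0), so the players are effectively sampled from a distribution rather than optimized to a fixed solution. A smaller inv_temp makes the fluctuations wider.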

Related concepts (22)
Reinforcement learning from human feedback
In machine learning, reinforcement learning from human feedback (RLHF), also called reinforcement learning from human preferences, is a technique that trains a "reward model" directly from human feedback and uses the model as a reward function to optimize an agent's policy using reinforcement learning (RL) through an optimization algorithm like Proximal Policy Optimization. The reward model is trained before the policy is optimized, to predict whether a given output is good (high reward) or bad (low reward).
Reinforcement learning
In artificial intelligence, and more specifically in machine learning, reinforcement learning consists of an autonomous agent (a robot, a conversational agent, a character in a video game) learning which actions to take, from experience, so as to optimize a quantitative reward over time. The agent is immersed in an environment and makes its decisions based on its current state. In return, the environment provides the agent with a reward, which may be positive or negative.
Adversarial machine learning
Adversarial machine learning is the study of attacks on machine learning algorithms and of the defenses against such attacks. A survey from May 2020 found that practitioners report a dire need for better protection of machine learning systems in industrial applications. Most machine learning techniques are designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution (IID).
Related publications (32)

Residual-based attention in physics-informed neural networks

Nikolaos Stergiopulos, Sokratis Anagnostopoulos

Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains ...
Lausanne, 2024

Fusing Pre-existing Knowledge and Machine Learning for Enhanced Building Thermal Modeling and Control

Loris Di Natale

Buildings play a pivotal role in the ongoing worldwide energy transition, accounting for 30% of the global energy consumption. With traditional engineering solutions reaching their limits to tackle such large-scale problems, data-driven methods and Machine ...
EPFL, 2024

Reinforcement learning approach to control an inverted pendulum: A general framework for educational purposes

Jesus Sanchez Rodriguez

Machine learning is often cited as a new paradigm in control theory, but is also often viewed as empirical and less intuitive for students than classical model-based methods. This is particularly the case for reinforcement learning, an approach that does n ...
Public Library of Science, 2023
Related MOOCs (3)
Neuro Robotics
At the same time, several different tutorials on available data and data tools, such as those from the Allen Institute for Brain Science, provide you with in-depth knowledge on brain atlases, gene exp
Neurorobotics
The MOOC on Neuro-robotics focuses on teaching advanced learners to design and construct a virtual robot and test its performance in a simulation using the HBP robotics platform. Learners will learn t