Deep reinforcement learning
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs (e.g.
Reinforcement learning
Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize cumulative reward. It is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in that it does not need labelled input/output pairs to be presented, nor does it need sub-optimal actions to be explicitly corrected.
École des ponts ParisTech
École des Ponts ParisTech (originally called École nationale des ponts et chaussées or ENPC, also nicknamed Ponts) is a university-level institution of higher education and research in the field of science, engineering and technology. Founded in 1747 by Daniel-Charles Trudaine, it is one of the oldest and one of the most prestigious French Grandes Écoles. Historically, its primary mission has been to train engineering officials and civil engineers, but the school now offers a wide-ranging education including computer science, applied mathematics, civil engineering, mechanics, finance, economics, innovation, urban studies, environment and transport engineering.
Q-learning
Q-learning is a model-free reinforcement learning algorithm to learn the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations. For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state.
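As an illustration of the Q-learning entry, here is a minimal tabular sketch in Python. Everything specific in it is an assumption made for the example and not part of the entry: the five-state chain environment, the reward of 1 for reaching the last state, the learning rate `alpha`, the discount factor `gamma`, and the uniformly random behaviour policy. The agent never consults a transition model; it only observes sampled `(state, action, reward, next_state)` steps, which is what "model-free" refers to.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = move left (stays put at state 0), 1 = move right.
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Behaviour policy: uniformly random actions. Q-learning is
            # off-policy, so it can still learn greedy action values.
            action = rng.randrange(2)

            # Environment step (assumed dynamics, used only to sample data).
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: move Q(s, a) toward
            # reward + gamma * max_a' Q(s', a'), with no bootstrap at the goal.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            td_target = reward + gamma * best_next
            q[state][action] += alpha * (td_target - q[state][action])

            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-terminal state under the learned values."""
    return [row.index(max(row)) for row in q[:-1]]


q_table = q_learning()
policy = greedy_policy(q_table)
```

In this toy setting the learned greedy policy should choose "right" in every non-terminal state, since that is the shortest path to the only reward. A practical agent would typically replace the random behaviour policy with something like epsilon-greedy exploration and decay `alpha` over time.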
ParisTech
ParisTech is a cluster that brings together seven renowned grandes écoles based in Paris, France. It covers the whole spectrum of science, technology and management and has more than 12,000 students. The schools are united by their engineering training programmes, but they also offer Master's programmes, Advanced Masters (Mastères Spécialisés), several MBA programmes and a wide range of PhD programmes.