**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Lecture# Exploration versus Exploitation

Description

This lecture discusses the exploration-exploitation dilemma in reinforcement learning, where the challenge lies in balancing the need to explore new possibilities to find optimal actions with the desire to exploit known rewarding actions. It covers the issues of correct Q values estimation, the drawbacks of a greedy strategy, and practical approaches like epsilon-greedy methods. Through examples and simulations, the instructor illustrates how different strategies impact decision-making and performance in reinforcement learning tasks.

Login to watch the video

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

In course

Related concepts (4)

Instructor

CS-456: Artificial neural networks/reinforcement learning

Since 2010 approaches in deep learning have revolutionized fields as diverse as computer vision, machine learning, or artificial intelligence. This course gives a systematic introduction into influent

Greedy algorithm

A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time. For example, a greedy strategy for the travelling salesman problem (which is of high computational complexity) is the following heuristic: "At each step of the journey, visit the nearest unvisited city.

Greedy coloring

In the study of graph coloring problems in mathematics and computer science, a greedy coloring or sequential coloring is a coloring of the vertices of a graph formed by a greedy algorithm that considers the vertices of the graph in sequence and assigns each vertex its first available color. Greedy colorings can be found in linear time, but they do not, in general, use the minimum number of colors possible. Different choices of the sequence of vertices will typically produce different colorings of the given graph, so much of the study of greedy colorings has concerned how to find a good ordering.

Grundy number

In graph theory, the Grundy number or Grundy chromatic number of an undirected graph is the maximum number of colors that can be used by a greedy coloring strategy that considers the vertices of the graph in sequence and assigns each vertex its first available color, using a vertex ordering chosen to use as many colors as possible. Grundy numbers are named after P. M. Grundy, who studied an analogous concept for directed graphs in 1939. The undirected version was introduced by .

Approximation algorithm

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems (in particular NP-hard problems) with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time.