Publication

Minimizing Regret in Unconstrained Online Convex Optimization

Maryam Kamgarpour
2018
Conference paper
Abstract

We consider online convex optimizations in the bandit setting. The decision maker does not know the time- varying cost functions, or their gradients. At each time step, she observes the value of the cost function for her chosen action. The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and that of the optimal action computable had she known the cost functions a priori. We present a novel algorithm in order to minimize the regret in an unconstrained action space. Our algorithm hinges on the idea of introducing randomization to approximate the gradients of the cost functions using only their observed values. We establish an almost sure regret bound for the mean values of actions and an expected regret bound for the actions.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.