Minimizing Regret in Unconstrained Online Convex Optimization

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.

We consider online convex optimizations in the bandit setting. The decision maker does not know the time- varying cost functions, or their gradients. At each time step, she observes the value of the cost function for her chosen action. The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and that of the optimal action computable had she known the cost functions a priori. We present a novel algorithm in order to minimize the regret in an unconstrained action space. Our algorithm hinges on the idea of introducing randomization to approximate the gradients of the cost functions using only their observed values. We establish an almost sure regret bound for the mean values of actions and an expected regret bound for the actions.

Minimizing Regret in Unconstrained Online Convex Optimization

Graph Chatbot

Chat with Graph Search

A continuum approximation approach to the depot location problem in a crowd-shipping system

Optimal Matching of Random Parts

Modeling and mitigating supply chain disruptions as a bilevel network flow problem

Optimal Matching of Random Parts

Modeling and mitigating supply chain disruptions as a bilevel network flow problem

A continuum approximation approach to the depot location problem in a crowd-shipping system