CEU eTD Collection (2013); Zimin, Alexander: Online Learning in Markovian Decision Processes

CEU Electronic Theses and Dissertations, 2013
Author Zimin, Alexander
Title Online Learning in Markovian Decision Processes
Summary This thesis studies the theoretical properties of the Relative Entropy Policy Search (REPS) algorithm. We show that it is an instance of the Proximal Point Algorithm and, using this fact, develop applications to different learning problems that can be formulated using Markovian Decision Processes.
First, we survey the theory underlying the Proximal Point Algorithm and show how it is used in the context of online linear optimization.
Second, we apply the algorithm to the full-information and the bandit cases of the online stochastic shortest path problem. We show that this approach significantly improves upon previously known results.
Finally, we introduce O-REPS, a version of REPS applied to online learning in unichain MDPs in the full-information case. We prove that it enjoys an optimal regret bound with smaller additional terms than previously known bounds.
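For context, a minimal sketch of the proximal point update for online linear optimization with the relative entropy as the proximal term; the notation below is generic and not drawn from the thesis itself:

\[
  x_{t+1} = \operatorname*{arg\,min}_{x \in K} \; \eta \, \langle \ell_t, x \rangle + D(x \,\|\, x_t),
  \qquad
  D(x \,\|\, y) = \sum_i \Big( x_i \log \frac{x_i}{y_i} - x_i + y_i \Big),
\]

where $K$ is the decision set, $\ell_t$ is the loss vector revealed in round $t$, and $\eta > 0$ is a step-size parameter.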
Supervisor Györfi, László
Department Mathematics MSc
Full text https://www.etd.ceu.edu/2013/zimin_alexander.pdf
