CEU Electronic Theses and Dissertations, 2013
Author | Zimin, Alexander |
---|---|
Title | Online Learning in Markovian Decision Processes |
Summary | This thesis studies the theoretical properties of the Relative Entropy Policy Search (REPS) algorithm. We show that it is an instance of the Proximal Point Algorithm and, using this fact, develop applications to different learning problems that can be formulated using Markovian Decision Processes. First, we survey the theory underlying the Proximal Point Algorithm and show how it is used in the context of online linear optimization. Second, we apply the algorithm to the full-information and bandit cases of the online stochastic shortest path problem. We show that this approach vastly improves on the previously known results. Finally, we introduce O-REPS, a version of REPS applied to online learning in unichain MDPs in the full-information case. We prove that it enjoys an optimal bound on the regret with smaller additional terms than previously known bounds. |
Supervisor | Györfi, László |
Department | Mathematics MSc |
Full text | https://www.etd.ceu.edu/2013/zimin_alexander.pdf |