CEU eTD Collection (2013); Zimin, Alexander: Online Learning in Markovian Decision Processes

CEU Electronic Theses and Dissertations, 2013
Author Zimin, Alexander
Title Online Learning in Markovian Decision Processes
Summary This thesis studies the theoretical properties of the Relative Entropy Policy Search (REPS) algorithm. We show that it is an instance of the Proximal Point Algorithm and, using this fact, develop applications to different learning problems that can be formulated using Markovian Decision Processes.
First, we survey the theory underlying the Proximal Point Algorithm and show how it is used in the context of online linear optimization.
Second, we apply the algorithm to the full-information and the bandit cases of the online stochastic shortest path problem. We show that this approach significantly improves upon previously known results.
Finally, we introduce O-REPS, a version of REPS applied to online learning in unichain MDPs in the full-information case. We prove that it enjoys an optimal regret bound with smaller additional terms than previously known bounds.
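For context, a minimal sketch of the proximal point update for online linear optimization with the relative entropy as the proximal term; the notation below is generic and not drawn from the thesis itself:

\[
  x_{t+1} = \operatorname*{arg\,min}_{x \in K} \; \eta \, \langle \ell_t, x \rangle + D(x \,\|\, x_t),
  \qquad
  D(x \,\|\, y) = \sum_i \Big( x_i \log \frac{x_i}{y_i} - x_i + y_i \Big),
\]

where $K$ is the decision set, $\ell_t$ is the loss vector revealed in round $t$, and $\eta > 0$ is a step-size parameter.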
Supervisor Györfi, László
Department Mathematics MSc
Full text https://www.etd.ceu.edu/2013/zimin_alexander.pdf
