Iterative Lower Bound Optimization for Planning in Continuous MDPs
Speaker:

LOW Siow Meng
PhD Student
School of Computing and Information Systems
Singapore Management University
Date: 17 February 2022, Thursday
Time: 3:00pm - 3:30pm
Venue: This is a virtual seminar. Please register by 16 February; the Zoom link will be sent on the following day to those who registered.
We look forward to seeing you at this research seminar.

About the Talk
Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous Markov Decision Process (MDP) planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this talk, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP with respect to a locally tight lower-bounded objective. This reformulation has several advantages, such as the ability to reuse samples between iterations and a theoretical guarantee of a monotonically improving objective. We present three empirical experiments that demonstrate the superior sample efficiency and learning stability of the proposed method.
About the Speaker
LOW Siow Meng is a PhD candidate in Computer Science at the SMU School of Computing and Information Systems, supervised by Prof. Akshat KUMAR. His research focuses on model-based reinforcement learning and planning. Before starting his academic career, he worked as a data scientist and software consultant at several global technology MNCs.