Iterative Lower Bound Optimization for Planning in Continuous MDPs
Speaker:

LOW Siow Meng
PhD Student
School of Computing and Information Systems
Singapore Management University
Date: 17 February 2022, Thursday
Time: 3:00pm - 3:30pm
Venue: This is a virtual seminar. Please register by 16 February; the Zoom link will be sent on the following day to those who registered.
We look forward to seeing you at this research seminar.

About the Talk
Recent advances in deep learning have enabled optimization of deep reactive policies (DRPs) for continuous Markov Decision Process (MDP) planning by encoding a parametric policy as a deep neural network and exploiting automatic differentiation in an end-to-end model-based gradient descent framework. This approach has proven effective for optimizing DRPs in nonlinear continuous MDPs, but it requires a large number of sampled trajectories to learn effectively and can suffer from high variance in solution quality. In this talk, we revisit the overall model-based DRP objective and instead take a minorization-maximization perspective to iteratively optimize the DRP with respect to a locally tight lower-bounded objective. This reformulation has several advantages, such as the ability to reuse samples between iterations and a theoretical guarantee of a monotonically improving objective. We present three empirical experiments that demonstrate the superior sample efficiency and learning stability of the proposed method.
About the Speaker
LOW Siow Meng is a PhD candidate in Computer Science at the SMU School of Computing and Information Systems, supervised by Prof. Akshat KUMAR. His research focuses on model-based reinforcement learning and planning. Before starting his academic career, he worked as a data scientist and software consultant at several global technology MNCs.