Policy Gradient With Value Function Approximation For Collective Multiagent Planning
Speaker: NGUYEN Duc Thien, PhD Candidate, School of Information Systems, Singapore Management University
Date: November 24, 2017 (Friday)
Time: 2:00pm - 3:00pm
Venue: Meeting Room 5.1, Level 5, School of Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902

We look forward to seeing you at this research seminar.
About the Talk
Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDec-POMDP, where the collective behavior of a population of agents affects the joint reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDec-POMDP policies. Vanilla AC converges slowly on larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and we also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better-quality solutions than previous best approaches.
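The core ideas above (a critic decomposed additively over agents, trained from local reward signals) can be illustrated with a minimal sketch. This is not the paper's algorithm, just a hypothetical toy: a shared linear per-agent component q_w(s_m, a_m) whose sum approximates the joint action value, with each component regressed toward its agent's local reward in a one-step simplification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (hypothetical, for illustration): M agents, each described by
# a local state-action feature vector phi; the joint action value is
# approximated as a sum of shared per-agent components,
#   Q(s, a) ~= sum_m q_w(s_m, a_m).
M, D = 5, 4                  # number of agents, feature dimension
w = np.zeros(D)              # shared linear critic weights
alpha = 0.1                  # critic learning rate

def q_local(phi):
    """Per-agent action-value component q_w(s_m, a_m) = w . phi."""
    return w @ phi

def critic_update(phis, local_rewards):
    """Update the shared critic from local reward signals: each agent's
    component is regressed toward its own local reward (a one-step
    simplification of training the critic on local rewards)."""
    global w
    for phi, r in zip(phis, local_rewards):
        td_error = r - q_local(phi)
        w += alpha * td_error * phi

# Simulate training steps with random features and noiseless local rewards
# generated by a hidden target weight vector.
target = np.array([1.0, -0.5, 0.2, 0.0])
for _ in range(200):
    phis = rng.normal(size=(M, D))
    critic_update(phis, phis @ target)

# Decomposed estimate of the joint action value for the last step.
joint_q = sum(q_local(phi) for phi in phis)
```

Because every agent shares the same critic weights, the number of parameters stays fixed as the population grows, which is one reason this kind of decomposition scales to large agent counts.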
This is a pre-conference talk for Neural Information Processing Systems (NIPS 2017).
About the Speaker
NGUYEN Duc Thien is a fourth-year PhD candidate in Information Systems. Since 2014, he has been working under the supervision of Professor Lau Hoong Chuin and Assistant Professor Akshat Kumar on his PhD thesis topic, "Collective Multi-agent Planning and Inference", that is, finding agent policies in large agent populations. Before joining SMU as a PhD student, he received his Master's degree in Information Systems from SMU in 2013 and his Bachelor's degree in Mathematics from Vietnam National University in 2010.