PhD Dissertation Defense by NGUYEN Duc Thien | Reinforcement Learning for Collective Multi-agent Decision Making

Reinforcement Learning for Collective Multi-agent Decision Making

 

 

 

 


 

 

 


 


 

 

 

 

NGUYEN Duc Thien


 

PhD Candidate

School of Information Systems

Singapore Management University

 


 


 


 


Research Area


 

 

Dissertation Committee


 

Chairman


 

 

Committee Members


 

 

 

External Member


 

  • Qin Zheng, Deputy Department Director, Senior Scientist, Institute of High Performance Computing, A*STAR

     

 

 

 


 


 


 


 

 


Date


 

October 22, 2018 (Monday)

 

 


Time


 

1.30pm - 2.30pm

 

 


Venue


 

Meeting Room 4.4, Level 4,

School of Information Systems,

Singapore Management University,

80 Stamford Road

Singapore 178902

 

 

We look forward to seeing you at this research seminar.


 

 


 


 


 


 

 

 

About The Talk


 

In this thesis, we study reinforcement learning algorithms that collectively optimize decentralized policies in a large population of autonomous agents. One of the main bottlenecks in large multi-agent systems is the size of the joint trajectory of the agents, which grows quickly with the number of participating agents. Furthermore, the noise from actions executed concurrently by different agents in a large system makes it difficult for each agent to estimate the value of its own actions, a difficulty well known as the multi-agent credit assignment problem. To address these problems, we make the following contributions:


 

1. We formulate a sub-class of multi-agent systems as a Collective Decentralized Partially Observable Markov Decision Process (CDEC-POMDP), in which the collective behavior of a population of agents affects the joint reward and the environment dynamics. We show that in a CDEC-POMDP, the transition counts, which summarize the numbers of agents taking different local actions and transitioning from their current local states to new local states, are sufficient statistics for learning and optimizing the decentralized policy. This allows us to transform the original planning problem, which optimizes over the complex joint agent trajectory, into one that optimizes over compact count variables. In addition, samples of the counts can be drawn efficiently from multinomial distributions, which provides a faster way to simulate the multi-agent system and evaluate the planning policy.


 

2. To address the multi-agent credit assignment problem in CDEC-POMDPs, we propose the collective decomposition principle for designing the value function approximation and the decentralized policy update. Based on this principle, we design two classes of multi-agent reinforcement learning (MRL) algorithms, one for domains with local rewards and one for domains with global rewards. i) When the reward is decomposable into local rewards among agents, we exploit exchangeability in CDEC-POMDPs and propose a mechanism that estimates the individual value function from the sampled values of the counts and the average individual rewards. We use this count-based individual value function to derive a new actor-critic algorithm called fAfC that learns effective individual policies for agents. ii) When the reward is non-decomposable, we propose to estimate the individual contribution value of agents using partial derivatives of the joint value function with respect to the state-action counts. This is the basis for two algorithms, called MCAC and CCAC, that optimize individual policies in domains with non-decomposable rewards.
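The count-based simulation mentioned in the first contribution can be illustrated with a small sketch. This is not the thesis implementation: the function name, the array shapes, and the fixed transition table are assumptions made for illustration only. In a CDEC-POMDP the transition probabilities would generally depend on the current counts, which is omitted here for brevity. Instead of simulating each agent's trajectory, the sketch draws the number of agents per (state, action, next state) triple directly from multinomial distributions.

```python
import numpy as np


def sample_transition_counts(state_counts, policy, transition, rng):
    """Sample counts n[s, a, s'] for one time step (illustrative sketch).

    state_counts: shape (S,), number of agents in each local state.
    policy:       shape (S, A), each row is a distribution over actions.
    transition:   shape (S, A, S), last axis is a distribution over
                  next states. Assumed fixed here; in general it would
                  be recomputed from the current counts.
    """
    num_states, num_actions = policy.shape
    counts = np.zeros((num_states, num_actions, num_states), dtype=np.int64)
    for s in range(num_states):
        # Split the agents in state s across actions.
        action_counts = rng.multinomial(state_counts[s], policy[s])
        for a in range(num_actions):
            # Split the agents that took action a across next states.
            counts[s, a] = rng.multinomial(action_counts[a], transition[s, a])
    return counts


# Hypothetical toy problem: 1000 agents, 2 local states, 2 actions.
rng = np.random.default_rng(0)
state_counts = np.array([600, 400])
policy = np.array([[0.7, 0.3], [0.5, 0.5]])
transition = np.full((2, 2, 2), 0.5)

counts = sample_transition_counts(state_counts, policy, transition, rng)
# Summing over (s, a) gives the state counts for the next time step.
next_state_counts = counts.sum(axis=(0, 1))
```

The cost of one simulated step in this sketch depends on the number of local states and actions, not on the number of agents, which is the reason count sampling can be faster than simulating every agent individually.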

 

 

 

Speaker Biography


 

NGUYEN Duc Thien is a PhD candidate in Information Systems. Since 2014, he has been working under the supervision of Professor Lau Hoong Chuin and Assistant Professor Akshat Kumar on his PhD thesis topic, "Reinforcement Learning for Collective Multi-agent Decision Making", developing efficient algorithms to optimize multi-agent behavior in large systems. Before joining SMU as a PhD student, he received his Master's degree in Information Systems from SMU in 2013 and his Bachelor's degree in Mathematics from Vietnam National University in 2010.