Credit Assignment in Multiagent Reinforcement Learning for Large Agent Population
|

|
Arambam James Singh
PhD Candidate
School of Computing and Information Systems
Singapore Management University
|
Research Area
Dissertation Committee
Advisor
Co-Advisor
Committee Member
External Members
- Harold Soh Soon Hong, Assistant Professor, National University of Singapore
|
Date
11 August 2021 (Wednesday)
|
Time
10:00am - 11:00am
|
Venue
This is a virtual seminar. Please register by 09 August 2021; the Zoom link will be sent out the following day to those who have registered.
|
We look forward to seeing you at this research seminar.

|
About The Talk
Rapid growth in sectors such as finance and transportation is driving fast digitization of industrial processes. This creates a huge opportunity for next-generation artificial intelligence systems with multiple agents operating at scale. Multiagent reinforcement learning (MARL) is the field of study that addresses problems in multiagent systems. In this thesis, I develop and evaluate novel MARL methodologies that address the challenges of large-scale multiagent systems in a cooperative setting. One of the key challenges in cooperative MARL is the problem of credit assignment. Many previous approaches to the problem rely on each agent's individual trajectory, which limits scalability to a small number of agents. Our proposed methodologies are based solely on aggregate information, which makes them highly scalable.
The first part of this thesis investigates the challenges in the maritime traffic management (MTM) problem, one of the motivating domains for large-scale cooperative multiagent systems. The key research question is how to coordinate vessels in a heavily trafficked maritime environment to increase the safety of navigation by reducing traffic congestion. Because the MTM problem is an instance of cooperative MARL with a shared reward, it suffers from the credit assignment problem. We address it by developing a vessel-based value function using aggregate information, which performs effective credit assignment by evaluating the agent's policy while filtering out the contributions of other agents. Although this first approach achieved promising results, it has limited ability to handle variable-duration actions, which are a crucial feature of the problem domain. We address this challenge by developing a hierarchical learning-based approach. We introduce the notion of a meta action, a high-level action that takes a variable amount of time to execute. We also propose an individual meta value function using aggregate information, which effectively addresses the credit assignment problem.
We also develop a general approach to the credit assignment problem for large-scale cooperative multiagent systems in both discrete and continuous action settings. We extend a shaped-reward approach known as difference rewards (DRs) to address the credit assignment problem. DRs are an effective tool for tackling this problem, but their computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information.
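As a rough illustration of the idea, the sketch below computes a textbook difference reward, D_i = G(z) - G(z without agent i), from agent counts alone. It is not the method proposed in the thesis. The zone labels, the capacity-based congestion penalty, and the function names global_reward and difference_rewards are all hypothetical and chosen only for this example.

```python
from collections import Counter


def global_reward(counts, capacity=2):
    """Hypothetical shared reward: squared penalty for agents above a zone's capacity."""
    return -sum(max(0, n - capacity) ** 2 for n in counts.values())


def difference_rewards(counts, capacity=2):
    """Return D(zone) = G(counts) - G(counts with one agent removed from zone).

    Every agent in the same zone receives the same value, so the work grows
    with the number of occupied zones rather than with the number of agents.
    """
    g = global_reward(counts, capacity)
    rewards = {}
    for zone, n in counts.items():
        if n == 0:
            continue  # no agent here to assign credit to
        counterfactual = dict(counts)
        counterfactual[zone] = n - 1  # remove one agent from this zone
        rewards[zone] = g - global_reward(counterfactual, capacity)
    return rewards


# Seven agents summarized only by how many occupy each zone (aggregate information).
agent_zones = ["A", "A", "A", "A", "B", "C", "C"]
counts = Counter(agent_zones)
dr = difference_rewards(counts)
```

In this toy setup only agents in the over-capacity zone A receive a negative difference reward, while agents in zones B and C receive zero because removing any one of them would not change the shared reward.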
|
|
Speaker Biography
Arambam James Singh is a PhD candidate at the School of Computing and Information Systems, Singapore Management University, advised by Associate Professor Akshat Kumar and Professor Hoong Chuin Lau. His current research focuses mainly on deep reinforcement learning.
|
|