PhD Dissertation Proposal by Tanvi VERMA | Scalable Reinforcement Learning in Anonymous Multi-Agent Settings

Scalable Reinforcement Learning in Anonymous Multi-Agent Settings

Tanvi VERMA

PhD Candidate

School of Information Systems

Singapore Management University

Research Area

Intelligent Systems & Optimization

Dissertation Committee

Chairman

Associate Prof. Pradeep Reddy VARAKANTHAM

Committee Members

Prof. LAU Hoong Chuin (Co-Supervisor)

Associate Prof. CHENG Shih-Fen

External Member

Sarit Kraus, Professor, Bar-Ilan University

Date

October 29, 2018 (Monday)

Time

4.00pm - 5.00pm

Venue

Meeting Room 4.4, Level 4,

School of Information Systems,

Singapore Management University,

80 Stamford Road

Singapore 178902

We look forward to seeing you at this research seminar.

About The Talk

Efficient sequential matching of supply and demand is a problem of interest in many online-to-offline services. For instance, Uber, Lyft and Grab match taxis to customers, while Ubereats, Deliveroo and FoodPanda match restaurants to customers. In these online-to-offline service problems, the individuals responsible for supply (e.g., taxi drivers, delivery bike riders or delivery van drivers) earn more by being at the "right" place at the "right" time. In my thesis, I develop approaches that learn to guide individuals to be in the "right" place at the "right" time (to maximize revenue) in the presence of other, similar learning individuals.

A key characteristic of the domains of interest is that the interactions between individuals are anonymous, i.e., the outcome of an interaction (for example, competing for demand in the taxi domain) depends only on the number of agents involved and not on their identities. Hence, I model the learning problem as Anonymous Multi-Agent Reinforcement Learning (AyMARL) and focus on providing methods that let independent agents learn efficiently from limited local observations.
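The anonymity property described above can be illustrated with a small sketch. It is not the model from the thesis: the function name `zone_rewards`, the zone names, and the rule that demand in a zone is split equally among the agents present are all assumptions made for illustration. The point is only that each agent's reward is computed from per-zone agent counts, so exchanging which agent is where does not change the set of outcomes.

```python
from collections import Counter


def zone_rewards(agent_zones, demand_per_zone, fare=1.0):
    """Return the expected reward of each agent given its zone.

    Hypothetical rule (an assumption, not from the source): the demand
    in a zone is shared equally among all agents located in that zone.
    """
    # Only the number of agents per zone matters, not who they are.
    counts = Counter(agent_zones.values())
    rewards = {}
    for agent, zone in agent_zones.items():
        served = min(demand_per_zone.get(zone, 0), counts[zone])
        rewards[agent] = fare * served / counts[zone]
    return rewards


demand = {"north": 1, "south": 2}
before = zone_rewards({"a": "north", "b": "north", "c": "south"}, demand)
# Agents "a" and "c" exchange zones; the per-zone counts stay the same.
after = zone_rewards({"a": "south", "b": "north", "c": "north"}, demand)
```

In this toy setting the two agents in "north" each expect half a fare and the single agent in "south" expects a full fare, both before and after the exchange; only the labels attached to those rewards move.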

First, I develop a learning mechanism for independent agents to learn from offline trajectories of other agents. I show that agents using this mechanism perform extremely well when almost all the other agents follow stationary policies. I then propose a method of independent learning for the case where the agent is aware that other agents are also learning simultaneously. In this approach, the learning agents also consider the number of other agents present in their local observation. Experimental results on real-world data sets demonstrate that these approaches improve the efficiency of independent learners over existing approaches.

Speaker Biography

Tanvi VERMA is a PhD candidate in the School of Information Systems, Singapore Management University. She is part of the Intelligent Systems and Optimization Group and is advised by Associate Professor Pradeep Varakantham and Professor Hoong Chuin Lau. She received her B.Tech in Computer Science & Engineering from the National Institute of Technology (NIT), Warangal, India. She then worked as a software engineer at NetApp, Bangalore, before joining the PhD program at SMU in 2015. Her key research interests include Decision Making under Uncertainty, Reinforcement Learning and Multiagent Systems.
