PhD Dissertation Proposal by Tanvi VERMA | Scalable Reinforcement Learning in Anonymous Multi-Agent Settings
Scalable Reinforcement Learning in Anonymous Multi-Agent Settings
Tanvi VERMA


 

PhD Candidate

School of Information Systems

Singapore Management University

Research Area

Dissertation Committee

Chairman

Committee Members

External Member

  • Sarit Kraus, Professor, Bar-Ilan University

Date

October 29, 2018 (Monday)

Time

4.00pm - 5.00pm

Venue

Meeting Room 4.4, Level 4,
School of Information Systems,
Singapore Management University,
80 Stamford Road
Singapore 178902

We look forward to seeing you at this research seminar.
About The Talk


 

Efficient sequential matching of supply and demand is a problem of interest in many online-to-offline services: Uber, Lyft, and Grab match taxis to customers, while UberEats, Deliveroo, and FoodPanda match restaurants to customers. In these online-to-offline service problems, the individuals responsible for supply (e.g., taxi drivers, delivery cyclists, or delivery van drivers) earn more by being at the "right" place at the "right" time. In my thesis, I develop approaches that learn to guide individuals to be in the "right" place at the "right" time (to maximize revenue) in the presence of other, similarly learning individuals.


 

A key characteristic of the domains of interest is that interactions between individuals are anonymous: the outcome of an interaction (for example, competing for demand in the taxi domain) depends only on the number of agents involved, not on their identities. Hence, I model the learning problem as Anonymous MARL (AyMARL) and focus on providing learning methods that let independent agents learn efficiently from limited local observations.
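The anonymity property can be illustrated with a toy payoff model (a hypothetical sketch for intuition, not the model from the thesis): agents pick zones, and each agent's revenue depends only on how many agents chose its zone, never on which agents they are.

```python
# Toy illustration of anonymous interactions: the payoff for choosing a zone
# depends only on HOW MANY agents chose it, not on WHICH agents did.
# (Hypothetical model for intuition; names and demand values are made up.)
from collections import Counter

def zone_payoff(demand, n_agents_in_zone):
    """Expected revenue share when n agents compete for a zone's demand."""
    return demand / max(n_agents_in_zone, 1)

def payoffs(choices, demand_by_zone):
    counts = Counter(choices)  # identity-free summary of everyone's action
    return [zone_payoff(demand_by_zone[z], counts[z]) for z in choices]

# Three agents pick zones; only the per-zone counts matter to the outcome.
demand = {"A": 10.0, "B": 6.0}
print(payoffs(["A", "A", "B"], demand))  # [5.0, 5.0, 6.0]
```

Because outcomes depend only on counts, an agent can summarize all other agents by a single number per zone, which is what makes learning in such settings scalable.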


 

First, I develop a learning mechanism for independent agents to learn from offline trajectories of other agents, and show that it performs extremely well when almost all the other agents follow stationary policies. I then propose a method of independent learning for the case where each agent is aware that other agents are also learning simultaneously. In this approach, the learning agents also consider the number of other agents present in their local observation. Experimental results on real-world data sets demonstrate that these approaches improve the efficiency of independent learners over existing approaches.
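The idea of folding the local agent count into an independent learner's observation can be sketched as follows. This is a minimal tabular Q-learning sketch under assumptions of my own (class and variable names are hypothetical), not the speaker's implementation:

```python
# Minimal sketch of an independent learner whose state includes the number of
# other agents observed locally, exploiting anonymity without tracking
# identities. (Hypothetical illustration, not the thesis code.)
import random
from collections import defaultdict

class CountAwareQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(float)  # keyed by ((zone, n_others), action)
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, zone, n_others):
        """Epsilon-greedy action for the count-augmented state."""
        state = (zone, n_others)
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, zone, n_others, action, reward, next_zone, next_n):
        """Standard Q-learning update on the count-augmented state."""
        s, s2 = (zone, n_others), (next_zone, next_n)
        best_next = max(self.q[(s2, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(s, action)]
        self.q[(s, action)] += self.alpha * td_error
```

The design choice is that the state `(zone, n_others)` treats all configurations with the same local count as identical, which is exactly what the anonymity assumption licenses.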

 

 

 

Speaker Biography


 

Tanvi VERMA is a PhD candidate in the School of Information Systems, Singapore Management University. She is part of the Intelligent Systems and Optimization Group and is advised by Associate Professor Pradeep Varakantham and Professor Hoong Chuin Lau. She received her B.Tech in Computer Science & Engineering from the National Institute of Technology (NIT), Warangal, India, and then worked as a software engineer at NetApp, Bangalore before joining the PhD program at SMU in 2015. Her key research interests include decision making under uncertainty, reinforcement learning, and multiagent systems.