| |
| | Scalable Reinforcement Learning in Anonymous Multi-Agent Settings |

| Tanvi VERMA
PhD Candidate
School of Information Systems
Singapore Management University
| Research Area
Dissertation Committee
Chairman
Committee Members
External Member
- Sarit Kraus, Professor, Bar-Ilan University
|
| |
Date
October 29, 2018 (Monday) | Time
4.00pm - 5.00pm | Venue
Meeting Room 4.4, Level 4,
School of Information Systems,
Singapore Management University,
80 Stamford Road
Singapore 178902 | We look forward to seeing you at this research seminar.

|
|
|
| | About The Talk
Efficient sequential matching of supply and demand is a problem of interest in many online-to-offline services. Examples include Uber, Lyft and Grab for matching taxis to customers, and Ubereats, Deliveroo and FoodPanda for matching restaurants to customers. In these online-to-offline service problems, the individuals responsible for supply (e.g., taxi drivers, delivery bike riders or delivery van drivers) earn more by being at the "right" place at the "right" time. In my thesis, I develop approaches that learn to guide individuals to be in the "right" place at the "right" time (to maximize revenue) in the presence of other, similar learning individuals.
A key characteristic of the domains of interest is that the interactions between individuals are anonymous, i.e., the outcome of an interaction (for example, competing for demand in the taxi domain) depends only on the number of agents involved and not on their identities. Hence, I model the learning problem as Anonymous Multi-Agent Reinforcement Learning (AyMARL) and focus on learning methods that let independent agents learn efficiently from limited local observations.
First, I develop a learning mechanism for independent agents to learn from offline trajectories of other agents. I show that agents using this mechanism perform extremely well when almost all the other agents follow stationary policies. I then propose a method of independent learning for the case where the agent is aware that other agents are also learning simultaneously. In this approach, the learning agents also consider the number of other agents present in their local observation. Experimental results on real-world data sets demonstrate that these approaches improve the efficiency of independent learners over existing approaches. | | | Speaker Biography
Tanvi VERMA is a PhD candidate in the School of Information Systems, Singapore Management University. She is part of the Intelligent Systems and Optimization Group and is advised by Associate Professor Pradeep Varakantham and Professor Hoong Chuin Lau. She received her B.Tech in Computer Science & Engineering from the National Institute of Technology (NIT), Warangal, India. She then worked as a software engineer at NetApp, Bangalore, before joining the PhD program at SMU in 2015. Her key research interests include Decision Making under Uncertainty, Reinforcement Learning and Multiagent Systems. |
|