Pre-Conference Talks by Tanvi VERMA & Meghna LOWALEKAR

DATE :  July 5, 2019, Friday
TIME :  2.00pm - 2.40pm
VENUE :  Meeting Room 4.4, Level 4
  SMU School of Information Systems
  80 Stamford Road
  Singapore 178902

There are 2 talks in this session; each talk is approximately 20 minutes.

About the Talk(s)

Talk #1: Entropy Based Independent Learning in Anonymous Multi-Agent Settings

by Tanvi VERMA, PhD Candidate, School of Information Systems, Singapore Management University

Efficient sequential matching of supply and demand is a problem of interest in many online-to-offline services. Examples include Uber, Lyft and Grab for matching taxis to customers, and UberEats, Deliveroo and FoodPanda for matching restaurants to customers. In these online-to-offline service problems, individuals who are responsible for supply (e.g., taxi drivers, delivery bikes or delivery van drivers) earn more by being at the "right" place at the "right" time. We are interested in developing approaches that learn to guide individuals to be in the "right" place at the "right" time (to maximize revenue) in the presence of other similar "learning" individuals and with only local aggregated observations of other agents' states (e.g., only the number of other taxis in the same zone as the current agent).

Existing approaches in Multi-Agent Reinforcement Learning (MARL) either do not scale to the required population size (e.g., about 40,000 taxis/cars for a city like Singapore) or rely on assumptions of a common objective, action coordination or centralized learning that are not viable in these domains. A key characteristic of the domains of interest is that the interactions between individuals are anonymous, i.e., the outcome of an interaction (competing for demand) depends only on the number and not on the identity of the agents. We model these problems using the Anonymous MARL (AyMARL) model. To ensure scalability and individual learning, we focus on improving the performance of independent reinforcement learning methods, specifically Deep Q-Networks (DQN) and Advantage Actor Critic (A2C), for AyMARL. The key contribution of this paper is in employing the principle of maximum entropy to provide a general framework of independent learning that is both empirically effective (even with only local aggregated information about the agent population distribution) and theoretically justified.

Finally, our approaches provide a significant improvement over existing approaches in both joint and individual revenue, on a generic simulator for online-to-offline services and on a real-world taxi problem. More importantly, this is achieved with the least variance in revenues earned by the learning individuals, an indicator of fairness.

Talk #2: ZAC: A Zone pAth Construction Approach for Effective Real-Time Ride Sharing

by Meghna LOWALEKAR, PhD Candidate, School of Information Systems, Singapore Management University

Real-time ridesharing systems such as UberPool, Lyft Line and GrabShare have become hugely popular as they reduce costs for customers, improve per-trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the right requests to travel in available vehicles in real time, so that the objective (e.g., requests served, revenue or delay) is optimized. The most relevant existing work has focused on generating as many relevant feasible (with respect to available delay for customers) combinations of requests (referred to as trips) as possible in real time. Unfortunately, since the number of trips increases exponentially with vehicle capacity and the number of requests, such an approach has to employ ad hoc heuristics to identify relevant trips. To that end, we propose an approach that generates many zone (abstraction of individual locations) paths, where each zone path can represent multiple trips (combinations of requests), and assigns available vehicles to these zone paths to optimize the objective. The key advantage of our approach is that these zone paths are generated using a combination of offline and online methods, consequently allowing for the generation of many more relevant combinations in real time than competing approaches. We demonstrate that our approach outperforms (with respect to both objective and runtime) the current best approach for ridesharing on both real-world and synthetic datasets.

These are pre-conference talks for the International Conference on Automated Planning and Scheduling (ICAPS 2019).

About the Speaker(s)

Tanvi VERMA is a PhD candidate in the School of Information Systems, Singapore Management University. She is part of the Intelligent Systems and Decision Analytics (ISDA) Group and is advised by Associate Professor Pradeep Varakantham and Professor Hoong Chuin Lau. She received her B.Tech. in Computer Science & Engineering from the National Institute of Technology (NIT), Warangal, India. She then worked as a software engineer at NetApp, Bangalore, before joining the PhD program at SMU in 2015. Her key research interests include Decision Making under Uncertainty, Reinforcement Learning and Multiagent Systems.
   
Meghna LOWALEKAR is a PhD candidate in the School of Information Systems, Singapore Management University, working under the supervision of Associate Professor Pradeep Varakantham and Professor Patrick Jaillet. She received her B.Tech. and M.S. in Computer Science & Engineering from the International Institute of Information Technology (IIIT), Hyderabad, India. She then worked as a software engineer at Qualcomm and Microsoft in Hyderabad, India. She also worked as a research engineer in the School of Information Systems, Singapore Management University, prior to starting her PhD. She works in the area of Intelligent Systems & Decision Analytics (ISDA). Her key research interest lies in Online Matching of demand and supply in Stochastic Environments.
