
Pre-Conference Talk by PHAM Quang Anh | IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching

IOSTOM: Offline Imitation Learning from Observations via State Transition Occupancy Matching

Speaker(s):


PHAM Quang Anh
PhD Student
School of Computing and Information Systems
Singapore Management University

Date: 25 November 2025, Tuesday

Time: 11:30am – 12:00pm

Venue: Meeting room 5.1, Level 5, School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902

We look forward to seeing you at this research seminar.

Please register by 23 November 2025.

About the Talk

Offline Learning from Observation (LfO) focuses on enabling agents to imitate expert behavior using datasets that contain only expert state trajectories, together with separate transition data collected under suboptimal actions. This setting is both practical and critical in real-world scenarios where direct environment interaction or access to expert action labels is costly, risky, or infeasible. Most existing LfO methods approach this problem through state or state-action occupancy matching. They typically rely on pretraining a discriminator to differentiate between expert and non-expert states, which can introduce errors and instability, especially when the discriminator is poorly trained. While recent discriminator-free methods have emerged, they generally require substantially more data, limiting their practicality in low-data regimes. In this paper, we propose IOSTOM (Imitation from Observation via State Transition Occupancy Matching), a novel offline LfO algorithm designed to overcome these limitations. Our approach formulates a learning objective based on the joint state visitation distribution. A key distinction of IOSTOM is that it first excludes actions entirely from the training objective. Instead, we learn an implicit policy that models transition probabilities between states, resulting in a more compact and stable optimization problem. To recover the expert policy, we introduce an efficient action inference mechanism that avoids training an inverse dynamics model. Extensive empirical evaluations across diverse offline LfO benchmarks show that IOSTOM substantially outperforms state-of-the-art methods, demonstrating both improved performance and data efficiency.
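To make the central idea concrete: in the LfO literature, state-transition occupancy matching is often written as a divergence-minimization objective over consecutive state pairs. The following is a generic sketch of such an objective, not necessarily the exact formulation used in IOSTOM:

% Generic state-transition occupancy matching objective (illustrative sketch;
% the exact IOSTOM formulation may differ).
% d^\pi(s, s') is the joint visitation distribution over consecutive state
% pairs induced by policy \pi; d^E(s, s') is the expert's counterpart.
\[
\min_{\pi} \; D_{\mathrm{KL}}\!\left( d^{\pi}(s, s') \,\middle\|\, d^{E}(s, s') \right)
\]

Because the objective is defined over state pairs rather than state-action pairs, it can be estimated without expert action labels, which is what makes the learning-from-observations setting feasible.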

This is a pre-conference talk for the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025).

About the Speaker

Quang Anh PHAM is a first-year PhD student in Computer Science at the SMU School of Computing and Information Systems, supervised by Associate Professor Akshat Kumar and Assistant Professor Mai Anh Tien. His research interests include Artificial Intelligence (Imitation Learning, Reinforcement Learning, and Heuristic Search), Operations Research (Routing and Scheduling problems), and Combinatorial Optimization techniques (Metaheuristics, Integer Programming, and Dynamic Programming).