
Pre-Conference Talk by CHEN Changyu and Janaka Brahmanage



Pre-Conference Talk by CHEN Changyu and Janaka Brahmanage Chathuranga
DATE : 29 November 2023, Wednesday
TIME : 1:00pm - 2:00pm
VENUE : Meeting Room 5.1, Level 5, School of Computing and Information Systems, Singapore Management University, 80 Stamford Road, Singapore 178902

Please register by 28 November 2023, Tuesday.


This session consists of two talks of approximately 30 minutes each.
Both are pre-conference talks for the Thirty-Seventh Conference on Neural Information Processing Systems (NeurIPS 2023).

About the Talks

Talk #1: Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning
by CHEN Changyu, PhD Candidate

Many problems in Reinforcement Learning (RL) seek an optimal policy over large, discrete, multidimensional yet unordered action spaces; these include problems in the randomized allocation of resources, such as the placement of multiple security resources or emergency response units. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in closed mathematical form. The allocation nature of the problem also favors stochastic optimal policies, when they exist. In this work, we address these challenges by (1) applying a (state-)conditional normalizing flow to compactly represent the stochastic policy — the compactness arises because the network produces only one sampled action and the corresponding log-probability of the action, which is then used by an actor-critic method; and (2) employing an invalid-action rejection method (via a valid-action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments showing the scalability of our approach compared to prior methods and its ability to enforce arbitrary state-conditional constraints on the support of the action distribution in any state.
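The rejection idea described in the abstract can be illustrated with a toy sketch (this is not the speaker's implementation; the uniform sampler below merely stands in for the learned conditional flow, and all names are hypothetical): the policy proposes an allocation together with its log-probability, and a validity oracle rejects proposals until a valid one is drawn.

```python
import math
import random

def sample_allocation(n_slots, k, rng):
    """Toy stand-in for a conditional normalizing flow: sample an
    allocation of k resources to distinct slots, returning one action
    and its log-probability (uniform over the C(n, k) allocations)."""
    action = tuple(sorted(rng.sample(range(n_slots), k)))
    log_prob = -math.log(math.comb(n_slots, k))
    return action, log_prob

def is_valid(action):
    """Validity oracle: here, no two chosen slots may be adjacent.
    In general this constraint need not have a closed mathematical form."""
    return all(b - a > 1 for a, b in zip(action, action[1:]))

def sample_valid(n_slots, k, rng, max_tries=1000):
    """Invalid-action rejection: resample until the oracle accepts.
    An actor-critic update would then consume (action, log_prob)."""
    for _ in range(max_tries):
        action, log_prob = sample_allocation(n_slots, k, rng)
        if is_valid(action):
            return action, log_prob
    raise RuntimeError("no valid action found")

rng = random.Random(0)
action, log_prob = sample_valid(10, 3, rng)
```

In the actual method, the flow is trained with a modified policy gradient so that the base distribution itself shifts mass toward the valid region, rather than relying on rejection alone at deployment.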

Talk #2: FlowPG: Action-constrained Policy Gradient with Normalizing Flows 
by Brahmanage Janaka Chathuranga THILAKARATHNA, PhD Candidate

Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation-related decision-making problems. A major challenge in ACRL is to ensure that the agent takes a valid, constraint-satisfying action at each RL step. The commonly used approach of adding a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, we first use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is itself challenging; we develop multiple methods, based on Hamiltonian Monte Carlo and probabilistic sentential decision diagrams, for such action sampling under both convex and non-convex constraints. Third, we integrate the learned normalizing flow with the DDPG algorithm. By design, a well-trained normalizing flow transforms the policy output into a valid action without requiring an optimization solver. Empirically, our approach yields significantly fewer constraint violations (up to an order of magnitude on several instances) and is multiple times faster on a variety of continuous control tasks.
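The core idea — an invertible map that sends any latent sample into the feasible set by construction, replacing a projection solver — can be sketched with a hand-built bijection (a toy, not the learned flow from the talk; the unit-disk feasible set and all function names are assumptions for illustration):

```python
import math
import random

def to_feasible(z):
    """Invertible map from R^2 onto the open unit disk (toy feasible
    set). Stands in for a trained normalizing flow: every latent
    sample lands inside the constraint set, so no projection solver
    or optimization program is needed at action time."""
    s = 1.0 / (1.0 + math.hypot(*z))
    return (z[0] * s, z[1] * s)

def to_latent(a):
    """Exact inverse of to_feasible, as required for computing
    log-densities through the change-of-variables formula."""
    s = 1.0 / (1.0 - math.hypot(*a))
    return (a[0] * s, a[1] * s)

rng = random.Random(0)
z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # latent Gaussian sample
a = to_feasible(z)                               # valid action, by construction
z_back = to_latent(a)                            # exact round-trip
```

In the actual method this bijection is learned from samples of the feasible set (drawn via Hamiltonian Monte Carlo or probabilistic sentential decision diagrams) and then composed with a DDPG policy, so the policy's outputs are valid without per-step projection.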

About the Speakers

CHEN Changyu is a Ph.D. candidate in Computer Science at the SMU School of Computing and Information Systems, co-supervised by Prof. Pradeep Varakantham and Prof. Arunesh Sinha (Rutgers Business School). His research focuses on generative modeling and its application in reinforcement learning.

Janaka Brahmanage is a second-year PhD candidate in Computer Science, conducting research under the guidance of Associate Prof. Akshat Kumar at the SMU School of Computing and Information Systems. His research focuses on reinforcement learning and multi-agent systems.