Pre-Conference Talk by Mudiyanselage Dulanga Kaveesha WEERAKOON
DATE :  30 August 2022, Tuesday
TIME :  1:00pm - 2:00pm
VENUE :  Meeting room 5.1, Level 5
School of Computing and Information Systems,
Singapore Management University,
80 Stamford Road, Singapore 178902

 

There are two talks in this session; each talk is approximately 30 minutes.

About the Talks

Talk #1: COSM2IC: Optimizing real-time multi-modal instruction comprehension

Abstract: Supporting real-time, on-device execution of multi-modal referring instruction comprehension models is an important challenge in embodied Human-Robot Interaction. However, state-of-the-art deep learning models are resource-intensive and unsuitable for real-time execution on embedded devices. While model compression can reduce computational resources up to a point, further optimization results in a severe drop in accuracy. To minimize this loss in accuracy, this talk introduces the COSM2IC framework, whose lightweight Task Complexity Predictor uses multiple sensor inputs to assess instruction complexity and thereby dynamically switch between a set of models of varying computational intensity, so that computationally less demanding models are invoked whenever possible. To demonstrate the benefits of COSM2IC, we use a representative human-robot collaborative “table-top target acquisition” task to curate a new multi-modal instruction dataset in which a human issues instructions in a natural manner using a combination of visual, verbal, and gestural (pointing) cues.
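For intuition, here is a minimal Python/PyTorch sketch of the switching idea described in the abstract: a cheap predictor scores each instruction's complexity, and the instruction is routed to the least demanding model expected to handle it. The class and parameter names (predictor, light_model, heavy_model, threshold) are illustrative assumptions, not COSM2IC's actual interface.

import torch.nn as nn

# Hypothetical sketch of complexity-gated model switching in the spirit
# of COSM2IC. All names here are illustrative assumptions, not the
# authors' API.
class ComplexityGatedREC(nn.Module):
    """Route each instruction to a cheap or an expensive comprehension
    model based on a lightweight complexity estimate."""

    def __init__(self, predictor, light_model, heavy_model, threshold=0.5):
        super().__init__()
        self.predictor = predictor      # tiny network over multi-modal sensor features
        self.light_model = light_model  # low-cost comprehension model
        self.heavy_model = heavy_model  # accurate but expensive model
        self.threshold = threshold

    def forward(self, image, text, gesture):
        # Cheap pass first: estimate instruction complexity from the sensor inputs.
        complexity = self.predictor(image, text, gesture)
        if complexity.item() < self.threshold:
            # Simple instruction: the lightweight model suffices.
            return self.light_model(image, text, gesture)
        # Complex instruction: fall back to the heavyweight model.
        return self.heavy_model(image, text, gesture)

The design point of such a gate is that the predictor itself must be far cheaper than the saving it unlocks; otherwise the switching overhead erases the benefit.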

Talk #2: SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension

Abstract: Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. In this talk, I present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby ‘skipped’ visual scales are not completely eliminated but approximated with minimal additional computation.
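As a rough illustration of the soft-skip idea, the sketch below wraps one per-scale visual block so that a language-driven gate either executes it in full or substitutes a cheap approximation instead of dropping the scale outright. The names, shapes, and the 1x1-convolution stand-in are assumptions made for illustration, not the paper's implementation.

import torch.nn as nn

# Hypothetical sketch of a SoftSkip-style block; illustrative only.
class SoftSkipScale(nn.Module):
    """Run the full visual block for one scale, or, when the
    language-conditioned gate 'skips' this scale, replace its output
    with a minimal-cost approximation."""

    def __init__(self, block, channels):
        super().__init__()
        self.block = block  # expensive visual block for this scale
        # Minimal stand-in for the block: a single 1x1 convolution.
        self.approx = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats, execute: bool):
        if execute:
            return self.block(feats)  # full computation for this scale
        # Soft skip: approximate the block's output at minimal cost,
        # so downstream fusion still receives a usable feature map.
        return self.approx(feats)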

For talk #1: This is the pre-conference talk for the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022). Link: https://iros2022.org/

For talk #2: This is the pre-conference talk for the 30th ACM International Conference on Multimedia (ACM MM 2022). Link: https://2022.acmmm.org/

About the Speaker
 

Mudiyanselage Dulanga Kaveesha WEERAKOON is a fourth-year PhD student at the School of Computing and Information Systems, advised by Prof. Archan Misra. Weerakoon's research interests lie in Human-AI Collaboration, Multi-modal Sense-making, Pervasive Computing, and Referring Expression Comprehension (REC), with extensive work on multi-modal sense-making for Human-AI collaboration tasks on pervasive devices. In particular, Weerakoon has explored several static and dynamic optimization techniques on REC models to support human-AI collaborative object acquisition tasks with low energy and latency while maintaining comparable accuracy.