Workshop on Advances of Generative AI (GenAI)

SMU Mochtar Riady Auditorium | 27 April 2025 | 9:00 a.m. to 6:00 p.m.

OVERVIEW

The SMU 2025 Workshop on Advances of Generative Artificial Intelligence (GenAI) delves into the fundamental challenges of generative modeling and synthesis. We explore critical advances in compositional generation, latent space modeling, and conditional synthesis. Our focus spans from theoretical frameworks in generative learning to breakthroughs in multi-view consistency and semantic control in generation processes.

Distinguished speakers will address key research directions including structured generation of 3D human dynamics, memory-efficient generative inference, and generalizable generation paradigms. This event examines emerging challenges in controllable synthesis, distribution alignment, and compositional generalization across modalities. Join us in exploring how these advances are reshaping our understanding of generative modeling and its applications!


SPEAKERS

Siyu TANG

ETH Zurich

Anton van den Hengel

Australian Institute for Machine Learning

Xia (Ben) HU

Rice University

Yuki M. Asano

University of Technology Nuremberg

Yongqin XIAN

Google Zurich

Cong LU

Google DeepMind

Tatsuya Harada

The University of Tokyo

Fabio Galasso

Sapienza University of Rome

ORGANIZERS

Qianru SUN

Singapore Management University

Guansong PANG

Singapore Management University

Xinrun WANG

Singapore Management University

Zichen TIAN

Singapore Management University

Yunshan MA

Singapore Management University

Ee Peng Lim

Singapore Management University

SCHEDULE

Lecture

9:00 a.m. – 9:10 a.m.

Opening Remarks

Qianru SUN

Singapore Management University

🎉  Opening Remarks & Registration


Lecture

9:10 a.m. – 9:55 a.m.

Morning Session 1

Siyu TANG

ETH Zurich

To Be Determined

To Be Determined


Lecture

9:55 a.m. – 10:40 a.m.

Morning Session 2

Anton van den Hengel

Australian Institute for Machine Learning

To Be Determined

To Be Determined


Break

10:40 a.m. – 11:00 a.m.

Morning Session

☕️ Coffee Break


Lecture

11:00 a.m. – 11:45 a.m.

Morning Session 3

Xia (Ben) HU

Rice University

Efficient LLM Serving via Lossy Computation

Large language models (LLMs) have exhibited human-like conversational abilities. Yet scaling LLMs to longer contexts, such as extracting information from lengthy articles (one of the most fundamental tasks in healthcare applications), poses significant challenges. The primary issues are their inability to handle contexts beyond their pre-training lengths and system constraints that make deployment difficult, as memory requirements for inference grow with context length. The key insight for overcoming these challenges is that LLMs are extremely robust to the noise introduced by lossy computation, such as low-precision computation. Following this insight, we will discuss recent advancements in serving LLMs at scale, particularly in handling longer contexts. To address the algorithmic challenge, I will share our recent work on extending LLM context length to at least 8x longer by coarsening the positional information of distant tokens. To address the system challenge, I will discuss our recent efforts in quantizing the intermediate states of past tokens to 2-bit numbers, yielding an 8x memory reduction and a 3.5x wall-clock speedup without harming performance. Finally, I will highlight our latest projects applying LLMs in healthcare, particularly how we use long-context retrieval techniques to mitigate the hallucination problem in healthcare chatbots.
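To make the lossy-computation idea concrete, here is a minimal sketch of 2-bit groupwise quantization applied to cached key/value states, in the spirit of the KV-cache compression the abstract mentions. The uniform quantizer, group size, and tensor shapes are illustrative assumptions, not the speaker's actual method.

```python
import torch

def quantize_2bit(x: torch.Tensor, group_size: int = 64):
    """Uniformly quantize x to 2-bit codes (4 levels) per group of values."""
    orig_shape = x.shape
    groups = x.reshape(-1, group_size)
    lo = groups.min(dim=-1, keepdim=True).values
    hi = groups.max(dim=-1, keepdim=True).values
    scale = (hi - lo).clamp(min=1e-8) / 3.0           # codes 0..3
    codes = torch.round((groups - lo) / scale).clamp(0, 3).to(torch.uint8)
    return codes.reshape(orig_shape), scale, lo

def dequantize_2bit(codes, scale, lo, group_size: int = 64):
    groups = codes.reshape(-1, group_size).float() * scale + lo
    return groups.reshape(codes.shape)

# (batch, heads, past_tokens, head_dim) -- shapes are illustrative
kv = torch.randn(2, 8, 1024, 64)
codes, scale, lo = quantize_2bit(kv)
kv_hat = dequantize_2bit(codes, scale, lo)
print((kv - kv_hat).abs().mean())   # small reconstruction error
```

Storing the 2-bit codes (plus per-group scales and offsets) in place of 16-bit states is, roughly, what drives the memory savings the abstract refers to.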


Lecture

11:45 a.m. – 12:30 p.m.

Morning Session 4

Yongqin XIAN

Google Zurich

Improving Vision-Language Pretraining with Self-Distillation, Location-Aware Captioners, and Data Curation

Image-text pretraining on web-scale image-caption datasets has become the default recipe for open-vocabulary classification and retrieval models, thanks to the success of CLIP and its variants. However, the contrastive objective used by these models focuses only on image-text alignment and does not incentivise image feature learning for dense prediction tasks. In the first part, I will introduce SILC, a novel framework for vision-language pretraining. SILC improves image-text contrastive learning with the simple addition of local-to-global correspondence learning by self-distillation. In the second part, I will present LocCa, a simple visual pretraining method with location-aware captioners. LocCa uses an image-captioning task interface to teach a model to read out rich information, i.e., bounding box coordinates and captions, conditioned on the image pixel input. Finally, I will present ACED, a novel method that distills such large foundation models via active data curation.
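As a rough sketch of how an image-text contrastive objective can be combined with local-to-global self-distillation, the toy code below adds a DINO-style distillation term to a CLIP-style loss. The function names, temperatures, and loss weighting are assumptions for illustration; this is not SILC's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric CLIP-style image-text contrastive loss."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def distillation_loss(student_local, teacher_global, temperature=0.1):
    """Local crops (student) match the teacher's output on the global view."""
    t = F.softmax(teacher_global / temperature, dim=-1).detach()
    s = F.log_softmax(student_local / temperature, dim=-1)
    return -(t * s).sum(dim=-1).mean()

img_emb, txt_emb = torch.randn(32, 512), torch.randn(32, 512)
local_out, global_out = torch.randn(32, 4096), torch.randn(32, 4096)
loss = contrastive_loss(img_emb, txt_emb) + distillation_loss(local_out, global_out)
```

In self-distillation setups of this kind, the teacher is typically an exponential-moving-average copy of the student seeing the full image, while the student sees local crops; here both are stand-in tensors.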

Break

12:30 p.m. – 1:40 p.m.

🍽️ Lunch Break

Lecture

1:40 p.m. – 2:25 p.m.

Afternoon Session 1

Yuki M. Asano

University of Technology Nuremberg

Insights from Vision-Language Models and Post-Pretraining in Computer Vision

I will talk about how we can build on top of pre-trained foundation models to achieve better models for vision, language, audio, and multi-modal tasks. First, I will show that despite their strong performance, DINOv2 and other vision backbones often lack spatial understanding of images. To counteract this, we use NeCo, a new post-pretraining approach based on patch-nearest neighbors, which significantly improves the dense performance of this and any other model while using only 16 GPU hours. We will also learn how we can leverage videos to further improve the dense understanding of pre-trained image models such as DINO. Next, I will present our latest work showing that gradients from self-supervised losses can be successfully used as features to improve retrieval performance across vision, audio, and text. Finally, I will introduce a new method that allows training CLIP models with only 10 GPU hours by leveraging pre-trained unimodal encoders. We will find a surprising relationship between LLMs' performance and their visual understanding of the world.
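For intuition, here is a simplified take on a patch-nearest-neighbor consistency objective of the kind NeCo builds on: student and teacher patch features should distribute their similarity over a shared reference set in the same way. The soft formulation below is an illustrative stand-in; NeCo's actual sorting-based objective differs.

```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(student_patches, teacher_patches, references,
                              temperature=0.1):
    """Match student and teacher soft neighbor distributions over references."""
    refs = F.normalize(references, dim=-1)
    s = F.normalize(student_patches, dim=-1) @ refs.t()
    t = F.normalize(teacher_patches, dim=-1) @ refs.t()
    t_dist = F.softmax(t / temperature, dim=-1).detach()  # teacher is frozen
    s_log = F.log_softmax(s / temperature, dim=-1)
    return -(t_dist * s_log).sum(dim=-1).mean()

student = torch.randn(196, 768)   # e.g. 14x14 ViT patch features
teacher = torch.randn(196, 768)
refs = torch.randn(512, 768)      # reference patch bank
print(neighbor_consistency_loss(student, teacher, refs))
```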


Lecture

2:25 p.m. – 3:10 p.m.

Afternoon Session 2

Cong LU

Google DeepMind

Towards Fully Autonomous Open-Ended Scientific Discovery

A grand challenge in artificial intelligence is developing systems capable of open-ended learning and autonomous scientific discovery. This talk highlights recent progress toward fully autonomous AI-driven science, beginning with The AI Scientist, which automates the entire scientific process, from hypothesis generation through experimentation. We then discuss Automated Design of Agentic Systems (ADAS), demonstrating how meta-agents autonomously design and optimize large language model architectures, achieving strong performance in complex reasoning and problem-solving tasks. Lastly, we introduce Automated Capability Discovery (ACD), a method for systematically evaluating the extensive capabilities of foundation models. ACD employs one model to autonomously generate open-ended tasks to probe another model's abilities, uncovering numerous surprising capabilities and limitations in models such as GPT, Claude, and Llama. The talk concludes by exploring future directions and the potential of autonomous scientific exploration.
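The ACD loop described above can be caricatured in a few lines: a "scientist" model proposes tasks conditioned on what it has already discovered, a "subject" model attempts them, and outcomes are logged. Every function here is a hypothetical stub standing in for LLM calls; none of this is a real API.

```python
# Toy sketch of an Automated Capability Discovery-style loop.
# All functions are hypothetical stubs standing in for LLM calls.

def propose_task(history):
    # The real system conditions a foundation model on past tasks to stay novel.
    return f"task-{len(history)}"

def attempt_task(task):
    # The subject model would produce its answer here.
    return f"attempted {task}"

def judge(task, response):
    # A judge model (or the scientist itself) scores the attempt.
    return task in response

def capability_discovery(rounds=3):
    discovered = []                      # (task, response, success) log
    for _ in range(rounds):
        task = propose_task(discovered)
        response = attempt_task(task)
        discovered.append((task, response, judge(task, response)))
    return discovered

print(capability_discovery())
```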


Break

3:10 p.m. – 3:30 p.m.

Afternoon Session

☕️ Coffee Break


Lecture

3:30 p.m. – 4:15 p.m.

Afternoon Session 3

Tatsuya Harada

The University of Tokyo

To Be Determined

To Be Determined


Lecture

4:15 p.m. – 5:00 p.m.

Afternoon Session 4

Fabio Galasso

Sapienza University of Rome

Representing and Synthesizing Human Motion

Representation learning and generative models have recently thrived in both theory and applications. The capability of synthesizing human motion is foundational to virtual reality, animation, robotics, and biomechanics. In this talk, I will introduce our most recent work on learning representations of human motion. Moving beyond Euclidean geometry, I will present multi-modal representation learning approaches that exploit hyperbolic latent spaces to model uncertainty and hierarchical representations. I will also introduce models for generating human motions that can be steered and aligned with human preferences.
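For readers unfamiliar with hyperbolic latent spaces, the snippet below computes geodesic distances in the Poincaré ball, a standard model for hierarchy-aware embeddings. The distance formula is standard; the motion-embedding usage around it is purely illustrative, not the speaker's method.

```python
import torch

def poincare_distance(u, v, eps=1e-6):
    """Geodesic distance in the Poincaré ball (curvature -1)."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    nu = torch.clamp(1 - torch.sum(u * u, dim=-1), min=eps)
    nv = torch.clamp(1 - torch.sum(v * v, dim=-1), min=eps)
    return torch.acosh(1 + 2 * sq / (nu * nv))

def project_to_ball(x, max_norm=1 - 1e-5):
    """Clip Euclidean vectors to lie strictly inside the unit ball."""
    norm = x.norm(dim=-1, keepdim=True).clamp(min=1e-12)
    return x * torch.clamp(max_norm / norm, max=1.0)

z = project_to_ball(torch.randn(8, 16) * 0.3)   # 8 toy motion embeddings
print(poincare_distance(z[:1], z))               # distances to the first one
```

Because distances grow rapidly toward the boundary, generic concepts can sit near the origin while specific motions push outward, which is what makes this geometry attractive for modeling uncertainty and hierarchy.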

WORKSHOP VENUE & REGISTRATION

REGISTER NOW TO RESERVE YOUR SEAT!


Due to limited seating, priority will be given to registered attendees.

27 APRIL 2025

Mochtar Riady Auditorium
Singapore Management University
Administration Building
81 Victoria St, Singapore 188065

WORKSHOP CONTACT

For further enquiries, please contact: