Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance
Speaker: ZOU Xiandong, Ph.D. Candidate, School of Computing and Information Systems, Singapore Management University
Date: 23 June 2026, Tuesday
Time: 10:30am – 11:00am
Venue: Meeting room 4.4, Level 4, School of Computing and Information Systems 1, Singapore Management University, 80 Stamford Road, Singapore 178902
Please register by 22 June 2026. 
About the Talk
Speculative decoding accelerates inference for (M)LLMs, yet a training-decoding discrepancy persists: while existing methods optimize single greedy trajectories, decoding involves verifying and ranking multiple sampled draft paths. We propose Variational Speculative Decoding (VSD), which formulates draft training as variational inference over latent proposals (draft paths). VSD maximizes the marginal probability of target-model acceptance, yielding an ELBO that promotes high-quality latent proposals while minimizing divergence from the target distribution. To improve proposal quality and reduce variance, we incorporate a path-level utility and optimize via an Expectation-Maximization procedure. The E-step draws Monte Carlo samples from an oracle-filtered posterior, while the M-step maximizes a weighted likelihood using Adaptive Rejection Weighting (ARW) and Confidence-Aware Regularization (CAR). Theoretical analysis shows that VSD increases the expected acceptance length and speedup. Extensive experiments across LLMs and MLLMs show that VSD achieves up to a 9.6% speedup over EAGLE-3 and 7.9% over ViSpec, improving decoding efficiency.
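For readers new to the setting, the "expected acceptance length" that drives speedup can be illustrated with the standard analysis of speculative decoding from Leviathan et al. (2023), not VSD's own analysis. Assuming each of gamma drafted tokens is accepted independently with probability alpha, and verification stops at the first rejection, one target-model call yields on average (1 - alpha^(gamma+1)) / (1 - alpha) tokens. The sketch below computes that closed form and checks it by simulation; the function names are hypothetical, and the i.i.d. chain model is a simplification of the tree-structured drafts used by methods such as EAGLE-3.

```python
import random


def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Closed-form expected tokens per target verification call.

    Assumes i.i.d. per-token acceptance with probability alpha and a
    linear draft of gamma tokens. The sum runs over k = 0..gamma of
    alpha**k, i.e. accepted drafts plus the one token the target model
    always contributes (a correction on rejection, or a bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted, plus the bonus token.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def simulate_tokens_per_round(alpha: float, gamma: int,
                              trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same assumptions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        accepted = 0
        # Drafts are verified left to right; the first rejection ends the round.
        while accepted < gamma and rng.random() < alpha:
            accepted += 1
        total += accepted + 1  # +1 for the target model's own token
    return total / trials


if __name__ == "__main__":
    print(expected_tokens_per_round(0.8, 4))
    print(simulate_tokens_per_round(0.8, 4))
```

With alpha = 0.8 and gamma = 4 the closed form gives about 3.36 tokens per verification call, and the simulation should land close to it. Raising the effective acceptance rate lengthens this expectation, which is the kind of quantity the abstract's acceptance-length and speedup claims refer to.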
This is a Pre-Conference talk for the Forty-Third International Conference on Machine Learning (ICML 2026).
About the Speaker
Xiandong ZOU is a Ph.D. candidate in Computer Science at the School of Computing and Information Systems, Singapore Management University, under the supervision of Professor Pan Zhou. He is a member of the Language and Vision Lab (LV-Lab), directed by Professor Shuicheng Yan and Professor Pan Zhou. His research interests include AIGC, generative models, and machine learning.