Speculative Decoding

Research papers, repositories, and articles about speculative decoding

JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

JetSpec adds a new draft head that proposes large token trees in a single forward pass while keeping outputs consistent with the base model. On Qwen3 models it reaches speedups of up to ~9.6x on math tasks without degrading output quality, and it integrates with vLLM. If you serve heavy inference workloads, this is a must-read for cutting serving cost. ([huggingface.co](https://huggingface.co/papers/2606.18394))

Lanxiang Hu, Zhaoxiang Feng
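
For readers new to the topic, the draft-then-verify idea behind tree-based speculative decoding can be sketched in a few lines. This is a generic, greedy-decoding illustration and not JetSpec's actual algorithm or API: `Node`, `verify_tree`, and `toy_target` are hypothetical names, and the "target model" is a toy function standing in for a real LLM.

```python
from typing import Callable, List, Optional, Sequence


class Node:
    """One drafted token plus the alternative continuations drafted after it."""

    def __init__(self, token: int, children: Optional[List["Node"]] = None):
        self.token = token
        self.children = children or []


def verify_tree(
    prefix: Sequence[int],
    roots: List[Node],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Accept the longest drafted path that matches the target's greedy choices.

    Assumption: a real system scores every tree node in one batched target
    forward pass; here the target is queried step by step for clarity.
    """
    accepted: List[int] = []
    candidates = roots
    while True:
        want = target_next(list(prefix) + accepted)
        match = next((n for n in candidates if n.token == want), None)
        if match is None:
            # No drafted branch agrees: emit the target's own token and stop.
            accepted.append(want)
            return accepted
        accepted.append(match.token)
        candidates = match.children


def toy_target(tokens: List[int]) -> int:
    """Hypothetical stand-in for the base model: next token is last + 1 (mod 10)."""
    return (tokens[-1] + 1) % 10


# Drafted tree after the prefix [1, 2]: 3 -> (4 -> 9 | 5), or 7.
tree = [Node(3, [Node(4, [Node(9)]), Node(5)]), Node(7)]
result = verify_tree([1, 2], tree, toy_target)
print(result)  # [3, 4, 5]
```

The accepted tokens are exactly what plain greedy decoding with the target would have produced, which is the sense in which the output stays consistent with the base model. The speedup comes from accepting several tokens per verification step; a wider tree raises the chance that some branch matches.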