Speculative

Research papers, repositories, and articles about speculative decoding


DFlash: Block Diffusion for Flash Speculative Decoding

DFlash uses a small diffusion model to draft whole blocks of tokens in parallel, then has a larger model verify each block. It preserves output quality while generating over 6x faster than standard decoding on common LLMs.

Jian Chen, Yesheng Liang
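
The draft-then-verify loop described in the summary can be illustrated with a minimal sketch. This is not DFlash's implementation: the toy functions `toy_draft` and `toy_target` are hypothetical stand-ins for the diffusion drafter and the larger model, and the greedy acceptance rule (keep drafted tokens until the first disagreement with the target) is an assumption chosen for simplicity. A real system would also score the whole block in one parallel pass of the larger model rather than calling it token by token as this sketch does.

```python
from typing import Callable, List

Token = int
DraftFn = Callable[[List[Token], int], List[Token]]   # (prefix, block_size) -> drafted block
TargetFn = Callable[[List[Token]], Token]             # prefix -> target model's next token


def speculative_step(prefix: List[Token], draft_block: DraftFn,
                     target_next: TargetFn, block_size: int) -> List[Token]:
    """Draft one block, then keep the longest prefix the target agrees with."""
    drafted = draft_block(prefix, block_size)
    accepted: List[Token] = []
    for tok in drafted:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First disagreement: use the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Whole block accepted: the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft_block: DraftFn, target_next: TargetFn,
             block_size: int, max_new_tokens: int) -> List[Token]:
    """Repeat draft-then-verify steps until enough new tokens exist."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft_block, target_next, block_size))
    return out[:len(prompt) + max_new_tokens]


def greedy_decode(prompt: List[Token], target_next: TargetFn,
                  max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


# Hypothetical toy models, for illustration only.
def toy_target(prefix: List[Token]) -> Token:
    # "Large model": always counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[Token], block_size: int) -> List[Token]:
    # "Small drafter": mostly agrees with the target, but wrongly emits 0 instead of 5.
    block: List[Token] = []
    last = prefix[-1]
    for _ in range(block_size):
        nxt = (last + 1) % 10
        if nxt == 5:
            nxt = 0
        block.append(nxt)
        last = nxt
    return block


if __name__ == "__main__":
    fast = generate([0], toy_draft, toy_target, block_size=4, max_new_tokens=12)
    slow = greedy_decode([0], toy_target, max_new_tokens=12)
    print(fast == slow)
```

Under this acceptance rule, every emitted token is one the target would have chosen itself, so the output matches plain greedy decoding. Any speedup comes from how many drafted tokens are accepted per verification step.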