Be ahead of the curve
Research papers, repositories, and articles about speculative
DFlash uses a small diffusion model to draft whole blocks of tokens in parallel, then has a larger model verify them quickly. It preserves output quality while generating more than 6x faster than standard autoregressive decoding on common LLMs.
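The draft-then-verify loop described above can be illustrated with a toy sketch. This is not DFlash's implementation: `target_next`, `draft_block`, and `verify` are hypothetical stand-ins (simple integer functions rather than neural networks), and the sketch assumes greedy decoding, where a drafted token is accepted only if it matches what the larger model would have produced. It shows why output quality is preserved: the result is identical to decoding with the larger model alone, only reached in fewer rounds.

```python
from typing import List, Tuple

VOCAB = 50  # toy vocabulary size (assumption for this sketch)


def target_next(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large model: a deterministic greedy rule."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_block(prefix: List[int], k: int) -> List[int]:
    """Hypothetical stand-in for the small drafter: proposes k tokens at once.

    A real diffusion drafter would emit the block in parallel; this loop only
    imitates the target and injects an occasional wrong token.
    """
    block, ctx = [], list(prefix)
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 7 == 6:
            tok = (tok + 1) % VOCAB  # deliberately wrong draft token
        block.append(tok)
        ctx.append(tok)
    return block


def verify(prefix: List[int], block: List[int]) -> List[int]:
    """Accept the longest drafted prefix the target agrees with.

    A real system would score all positions in one batched forward pass;
    the loop here is for clarity only.
    """
    accepted, ctx = [], list(prefix)
    for tok in block:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # whole block accepted: one bonus token
    return accepted


def speculative_generate(prompt: List[int], n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate n_new tokens; returns the sequence and the number of verify rounds."""
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < n_new:
        out.extend(verify(out, draft_block(out, k)))
        rounds += 1
    return out[: len(prompt) + n_new], rounds


def greedy_generate(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


if __name__ == "__main__":
    spec, rounds = speculative_generate([1, 2, 3], 20)
    print(spec == greedy_generate([1, 2, 3], 20), rounds)
```

In this toy setting each round yields between one and `k + 1` tokens, so the number of verification rounds is always smaller than or equal to the number of generated tokens. Any real speedup depends on how often the drafter's block matches the larger model, which the sketch does not measure.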