Decoding

Research papers, repositories, and articles about decoding

Showing 1 of 1 items

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

ReFusion is a masked diffusion model for text that decodes in parallel over contiguous ‘slots’ instead of individual tokens. By combining diffusion-based planning with autoregressive infilling, it recovers much of the quality of strong autoregressive LLMs while massively speeding up generation and allowing KV-cache reuse. This is one of the more serious attempts to rethink LLM decoding beyond the usual left-to-right paradigm.

Jia-Nan Li, Jian Guan