Long Context

Research papers, repositories, and articles about long context

FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling

FlashPrefill discovers sparse attention patterns during the prefill phase and drops low-importance connections on the fly. The authors report large speedups on 256K-token contexts with accuracy that still matches the baseline.

Qihang Fan, Huaibo Huang
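
To make the idea of thresholding low-importance connections concrete, here is a minimal sketch in plain Python. It is not FlashPrefill's algorithm: the function name `keep_blocks`, the per-block scores, and the relative threshold `rel_threshold` are all assumptions made for illustration. It only shows the general shape of "score key blocks, then keep the ones above a cutoff".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keep_blocks(block_scores, rel_threshold=0.1):
    """Return indices of key blocks worth attending to.

    block_scores: one (hypothetical) importance score per key block.
    rel_threshold: assumed cutoff, relative to the largest block weight.
    """
    weights = softmax(block_scores)
    cutoff = rel_threshold * max(weights)
    return [i for i, w in enumerate(weights) if w >= cutoff]


# Blocks 0 and 2 dominate the attention mass; the rest fall below the cutoff.
kept = keep_blocks([4.0, 0.5, 3.5, -1.0, 0.0])
print(kept)
```

A relative cutoff like this adapts to each query's score distribution, which is one plausible reason to prefer thresholding over a fixed top-k budget.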

Information-Aware KV Cache Compression for Long Reasoning

InfoKV mixes attention scores with an information-theoretic signal that tracks how much a token affects future predictions. This lets the model drop uninformative tokens while keeping rare but important ones, improving long-context reasoning under tight memory. If KV cache growth is your bottleneck, this suggests a smarter eviction policy. ([huggingface.co](https://huggingface.co/papers/2606.26875))

Jushi Kai, Zhuiri Xiao
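
As a rough illustration of an eviction policy that blends two signals, the sketch below ranks cached tokens by a weighted mix of an attention score and a second "information" score. This is not InfoKV's actual formula: the function name `select_tokens_to_keep`, the min-max normalization, and the mixing weight `alpha` are assumptions chosen for the example.

```python
def normalize(values):
    """Min-max scale a list to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_tokens_to_keep(attn_scores, info_scores, budget, alpha=0.5):
    """Pick which cached token positions survive eviction.

    attn_scores: accumulated attention each token has received (hypothetical).
    info_scores: per-token information signal (hypothetical stand-in).
    budget: number of tokens the cache may retain.
    alpha: assumed weight on attention; (1 - alpha) goes to the info signal.
    """
    if len(attn_scores) != len(info_scores):
        raise ValueError("score lists must have the same length")
    a = normalize(attn_scores)
    b = normalize(info_scores)
    combined = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:budget])


attn = [0.9, 0.1, 0.05, 0.6]
info = [0.2, 0.1, 0.95, 0.3]
print(select_tokens_to_keep(attn, info, budget=2))             # blended ranking
print(select_tokens_to_keep(attn, info, budget=2, alpha=1.0))  # attention only
```

In this toy input, token 2 receives little attention but carries a high information score, so the blended ranking keeps it while the attention-only ranking evicts it.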

QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Describes the QwenLong-L1.5 post-training recipe for extending LLM context windows while keeping reasoning quality intact. The work focuses not just on positional encodings but also on memory management strategies and training curricula that keep long-context performance from collapsing. This is highly relevant for anyone trying to turn a baseline LLM into a stable long-context model without retraining from scratch.

Weizhou Shen, Ziyi Yang

EuroLLM-22B: Technical Report

EuroLLM-22B is a 22B-parameter open model focused on European languages, with long-context support and a detailed training recipe. It aims to give EU labs and companies a strong regional alternative to US-centric frontier models.

Miguel Moura Ramos, Duarte M. Alves

RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference

RRAttention keeps only a fraction of attention blocks by rotating which positions each head looks at in a round-robin pattern. It recovers almost full-attention accuracy while skipping about half the computation at 128K-token context lengths.

Siran Liu, Guoxia Wang
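
The round-robin idea can be pictured with a small static sketch: each head keeps every `stride`-th key block, offset by its head index, so neighboring heads cover different blocks. This is only an illustration of the shifting pattern, not RRAttention's dynamic selection; the function name `round_robin_blocks` and the `stride` parameter are assumptions.

```python
def round_robin_blocks(num_blocks, num_heads, stride=2):
    """For each head, list the key-block indices it keeps.

    Head h keeps block j when (j - h) is a multiple of stride, so each head
    touches roughly 1/stride of the blocks and the offset rotates per head.
    """
    return [
        [j for j in range(num_blocks) if (j - h) % stride == 0]
        for h in range(num_heads)
    ]


masks = round_robin_blocks(num_blocks=6, num_heads=4, stride=2)
for head, blocks in enumerate(masks):
    print(head, blocks)

# Together, the heads still cover every block.
covered = sorted(set(j for blocks in masks for j in blocks))
print(covered)
```

With `stride=2`, each head skips half of the blocks, which lines up with the roughly halved computation described above, while the union across heads still spans the full context.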

Diffusion Language Models Are Natively Length-Aware

Argues that diffusion-style language models naturally handle short and long prompts without special tricks. Points to a promising path for huge-context text models.

Vittorio Rossi, Giacomo Cirò