Attention
Research papers, repositories, and articles about attention
Showing 2 of 2 items
RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference
RRAttention keeps only a fraction of attention blocks by rotating which positions each head looks at in a round-robin pattern. It recovers almost full-attention accuracy while skipping about half the computation at 128K-token context lengths.
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
Claims an exact, error-free formulation of linear attention derived from a continuous-time view of transformer dynamics. The authors argue they can match the behavior of standard softmax attention while enjoying linear-time complexity, avoiding the approximation errors that plague many fast-attention variants. If the theory and practice hold up, this could become a key building block for large-context models and resource-constrained deployments.