
Theory

Research papers, repositories, and articles about theory


ZJU-LLMs/Foundations-of-LLMs

An open book and course materials on the foundations of large language models, covering theory, architectures, training, and deployment. With more than 14,000 stars, it is quickly becoming a go-to learning resource for people moving from ‘user’ to ‘builder’ of LLMs. If you want a structured, code-linked path into the internals of modern LLMs, this is a strong candidate.

14,400

Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics

Claims an exact, error-free formulation of linear attention derived from a continuous-time view of transformer dynamics. The authors argue they can match the behavior of standard softmax attention while enjoying linear-time complexity, avoiding the approximation errors that plague many fast-attention variants. If the theory and practice hold up, this could become a key building block for large-context models and resource-constrained deployments.

Jingdi Lei, Di Zhang
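
To make the linear-time claim concrete, here is a minimal sketch of generic kernelized linear attention. It is not the paper's error-free formulation: the feature map `elu(x) + 1` and all function names are assumptions chosen for illustration. It only shows the reordering trick that most linear-attention variants rely on: computing `phi(K)^T V` first avoids ever building the n x n score matrix.

```python
import math

def feature_map(x):
    # elu(x) + 1: a positive feature map (an assumed choice for this sketch)
    return x + 1.0 if x > 0 else math.exp(x)

def phi(m):
    return [[feature_map(x) for x in row] for row in m]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def quadratic_attention(q, k, v):
    """Kernel attention via the full n x n score matrix: O(n^2 * d)."""
    fq, fk = phi(q), phi(k)
    scores = matmul(fq, transpose(fk))              # n x n
    out = matmul(scores, v)                         # n x d_v
    return [[x / sum(s) for x in o] for o, s in zip(out, scores)]

def linear_attention(q, k, v):
    """Same quantity, reordered as phi(Q) @ (phi(K)^T @ V): O(n * d * d_v)."""
    fq, fk = phi(q), phi(k)
    kv = matmul(transpose(fk), v)                   # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]          # length-d normaliser
    out = matmul(fq, kv)                            # n x d_v
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in o] for o, z in zip(out, norms)]

# Toy inputs: 3 tokens, head dimension 2
Q = [[0.1, -0.4], [1.2, 0.3], [-0.7, 0.9]]
K = [[0.5, 0.2], [-0.3, 1.1], [0.8, -0.6]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_gap = max(abs(a - b) for rs, rf in zip(slow, fast) for a, b in zip(rs, rf))
print(max_gap)  # the two orderings agree up to floating-point rounding
```

The two functions agree with each other, but neither reproduces softmax attention. Closing that gap exactly is what the paper claims to do.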

Deep Delta Learning

The authors replace standard residual skip connections with a learnable "Delta" operator that can flexibly distort the identity path. This lets deep nets control how much old information to erase versus new information to write, improving how they model complex dynamics while keeping training stable.

Yifan Zhang, Yifeng Liu
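
The erase-versus-write idea can be sketched with a gated rank-1 update on a single feature vector. This is an illustrative assumption about what such an operator might look like, not the authors' definition: the function `delta_update`, the direction `k`, the gate `beta`, and the written value `v` are all hypothetical names for this sketch.

```python
import math

def delta_update(x, k, beta, v):
    """Hypothetical gated shortcut: x_new = x + beta * (v - <k, x>) * k.

    beta = 0 keeps x unchanged (a plain identity path);
    beta = 1 with v = 0 erases the component of x along k;
    beta = 2 with v = 0 reflects x across the hyperplane orthogonal to k.
    """
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]                      # work with a unit direction
    old = sum(a * b for a, b in zip(k, x))         # what x currently stores along k
    return [xi + beta * (v - old) * ki for xi, ki in zip(x, k)]

x = [3.0, 4.0]
k = [1.0, 0.0]

identity = delta_update(x, k, beta=0.0, v=0.0)     # nothing erased, nothing written
erased = delta_update(x, k, beta=1.0, v=0.0)       # component along k removed
reflected = delta_update(x, k, beta=2.0, v=0.0)    # component along k flipped
written = delta_update(x, k, beta=1.0, v=5.0)      # old value replaced by v
print(identity, erased, reflected, written)
```

In this sketch one scalar gate controls both how much of the old state is removed and how much new content is written. That is the trade-off the summary describes.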

Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

This paper dissects why reinforcement learning with verifiable rewards (RLVR) can improve math reasoning even when rewards look noisy or misleading. It shows how clipping and reward noise lower the entropy of the model’s outputs (its randomness) in useful ways and offers principles for designing better reasoning-focused training runs. ([ar5iv.org](https://ar5iv.org/abs/2512.16912))

Peter Chen, Xiaopeng Li
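
For readers unfamiliar with the two quantities in that summary, the sketch below shows the standard PPO-style clipped surrogate and the Shannon entropy of a token distribution. The exact objective the paper analyzes may differ. The function names and the `eps=0.2` default are assumptions for illustration.

```python
import math

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped objective for one sample.

    ratio is pi_new(a|s) / pi_old(a|s); the clip removes any incentive
    to move the ratio outside [1 - eps, 1 + eps].
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

def entropy(probs):
    """Shannon entropy in nats; lower means more deterministic outputs."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Positive advantage: gains are capped once the ratio exceeds 1 + eps
gain_capped = clipped_surrogate(ratio=1.5, advantage=1.0)
# Negative advantage: the penalty is floored once the ratio drops below 1 - eps
loss_floored = clipped_surrogate(ratio=0.5, advantage=-1.0)

uniform_entropy = entropy([0.25, 0.25, 0.25, 0.25])   # maximally random over 4 tokens
peaked_entropy = entropy([0.97, 0.01, 0.01, 0.01])    # nearly deterministic
print(gain_capped, loss_floored, uniform_entropy, peaked_entropy)
```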

GeoDM: Geometry-aware Distribution Matching for Dataset Distillation

Proposes GeoDM, a dataset distillation framework that performs distribution matching in a product space of Euclidean, hyperbolic, and spherical manifolds, with learnable curvature and weights. This geometry-aware approach yields lower generalization error bounds and consistently outperforms prior distillation methods by better aligning synthetic and real-data manifolds. ([arxiv.org](https://arxiv.org/abs/2512.08317))

Xuhui Li, Zhengquan Luo
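
As a rough illustration of what a product space of Euclidean, hyperbolic, and spherical components means, the sketch below computes textbook geodesic distances in each factor and combines them. It does not implement GeoDM's matching objective. The weighted combination in `product_dist`, the parameter names `K`, `c`, and `weights`, and the Poincaré-ball model for the hyperbolic factor are assumptions for this example.

```python
import math

def euclidean_dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def spherical_dist(x, y, K=1.0):
    """Geodesic distance on a sphere of curvature K > 0 (radius 1/sqrt(K))."""
    cos_angle = K * sum(a * b for a, b in zip(x, y))
    cos_angle = max(-1.0, min(1.0, cos_angle))      # guard against rounding
    return math.acos(cos_angle) / math.sqrt(K)

def hyperbolic_dist(x, y, c=1.0):
    """Geodesic distance in the Poincare ball of curvature -c (c > 0)."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    return math.acosh(arg) / math.sqrt(c)

def product_dist(x, y, weights=(1.0, 1.0, 1.0), K=1.0, c=1.0):
    """Weighted l2 combination of per-factor distances (assumed form).

    x and y are (euclidean_part, hyperbolic_part, spherical_part) tuples.
    """
    d_e = euclidean_dist(x[0], y[0])
    d_h = hyperbolic_dist(x[1], y[1], c)
    d_s = spherical_dist(x[2], y[2], K)
    w_e, w_h, w_s = weights
    return math.sqrt(w_e * d_e ** 2 + w_h * d_h ** 2 + w_s * d_s ** 2)

p = ([0.0, 0.0], [0.0, 0.0], [1.0, 0.0, 0.0])
q = ([3.0, 4.0], [0.5, 0.0], [0.0, 1.0, 0.0])
total = product_dist(p, q)
print(total)
```

In a learnable setting, `K`, `c`, and `weights` would be trained parameters rather than fixed constants, which is the part the summary attributes to GeoDM.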