Theory
Research papers, repositories, and articles about theory
ZJU-LLMs/Foundations-of-LLMs
An open book and course materials on the foundations of large language models, covering theory, architectures, training, and deployment. With >14k stars, it’s quickly becoming a go‑to learning resource for people trying to move from ‘user’ to ‘builder’ of LLMs. If you want a structured, code-linked path into the guts of modern LMs, this is a strong candidate.
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
Claims an exact, error-free formulation of linear attention derived from a continuous-time view of transformer dynamics. The authors argue their method can match the behavior of standard softmax attention while retaining linear-time complexity, avoiding the approximation errors that plague many fast-attention variants. If the theory and practice hold up, this could become a key building block for long-context models and resource-constrained deployments.
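The paper's exact construction isn't reproduced here, but a minimal sketch of the generic kernelized linear-attention family it improves on (the common elu(x) + 1 feature map of Katharopoulos et al.; all names below are illustrative) makes the complexity argument concrete: reordering the matrix products turns the O(n²) score matrix into an O(n) pass.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: the n x n score matrix costs O(n^2) time and memory.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v):
    # Kernelized linear attention with the elu(x) + 1 feature map.
    # Computing phi(K)^T V first yields a (d, d_v) summary, so the whole
    # pass is linear in sequence length n. This is the generic family,
    # which is only approximate -- not the paper's error-free formulation.
    phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
    kv = phi_k.transpose(-2, -1) @ v                                  # (d, d_v)
    normalizer = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)
    return (phi_q @ kv) / (normalizer + 1e-6)

# Same inputs; the linear variant trades exactness (in this family) for O(n) cost.
q = k = v = torch.randn(128, 64)
out_soft, out_lin = softmax_attention(q, k, v), linear_attention(q, k, v)
```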
GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
Proposes GeoDM, a dataset distillation framework that performs distribution matching in a product space of Euclidean, hyperbolic, and spherical manifolds, with learnable curvature and weights. This geometry-aware approach yields lower generalization error bounds and consistently outperforms prior distillation methods by better aligning synthetic and real-data manifolds. ([arxiv.org](https://arxiv.org/abs/2512.08317))
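GeoDM's actual loss isn't shown in the summary, so the following is only a rough, hypothetical illustration of what matching feature distributions across a weighted product of Euclidean, spherical, and hyperbolic factors with learnable curvature could look like; every class name, projection, and formula choice below is an assumption, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProductSpaceMatching(nn.Module):
    # Hypothetical sketch: match real vs. synthetic feature means in three
    # geometric factors, with learnable per-factor weights and curvature scales.
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(3))   # learnable factor weights
        self.log_c = nn.Parameter(torch.zeros(2))    # curvature scales (sphere, ball)

    @staticmethod
    def _ball(x):
        # Crude projection of features into the open unit (Poincare) ball.
        norm = x.norm(dim=-1, keepdim=True)
        return x * torch.tanh(norm) / (norm + 1e-9)

    def forward(self, real_feats, syn_feats):
        w = self.logits.softmax(0)
        c_s, c_h = self.log_c.exp()

        # Euclidean factor: squared distance between feature means.
        d_e = (real_feats.mean(0) - syn_feats.mean(0)).pow(2).sum()

        # Spherical factor: arc distance between mean unit directions.
        mr = F.normalize(real_feats, dim=-1).mean(0)
        ms = F.normalize(syn_feats, dim=-1).mean(0)
        cos = F.cosine_similarity(mr, ms, dim=0).clamp(-1 + 1e-6, 1 - 1e-6)
        d_s = torch.acos(cos) / c_s.sqrt()

        # Hyperbolic factor: Poincare-ball distance between mean embeddings.
        hr, hs = self._ball(real_feats).mean(0), self._ball(syn_feats).mean(0)
        num = 2 * (hr - hs).pow(2).sum()
        den = (1 - hr.pow(2).sum()) * (1 - hs.pow(2).sum())
        d_h = torch.acosh(1 + num / den.clamp_min(1e-9)) / c_h.sqrt()

        return w[0] * d_e + w[1] * d_s + w[2] * d_h
```

The design intuition, as the summary describes it, is that letting the model learn how much weight (and curvature) each geometry gets allows the matching loss to adapt to whichever manifold structure the real data actually lives on.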