Training
Research papers, repositories, and articles about training
huggingface/transformers
The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
Builds thousands of synthetic "computers" with realistic files and calendars to simulate month-long knowledge work for AI agents. Each run spans 8+ hours and ~2,000 steps, yielding dense signals for training long-horizon productivity agents. If you are designing office copilots or agent training curricula, copy this setup to cheaply generate rich experience data. ([arxiv.org](https://arxiv.org/abs/2604.28181))
Kling-Omni Technical Report
Kling-Omni is a unified system for generating and editing high-end video from text, images, and video context. Treat it as a reference design for next-gen multimodal world simulators.
DFlash: Block Diffusion for Flash Speculative Decoding
DFlash uses a small diffusion model to draft whole blocks of tokens in parallel, then lets a larger model quickly verify them. It keeps output quality while giving over 6x faster generation than standard decoding on common LLMs.
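The draft-then-verify loop can be sketched in its generic greedy form. This is a toy, not DFlash itself: `draft_block` and `target_next` are hypothetical stand-ins for the small drafter and the large model, and the diffusion drafter is not reproduced. A real system would check the whole block in one parallel forward pass; the sequential calls here are only for readability.

```python
from typing import Callable, List


def speculative_step(
    prefix: List[int],
    draft_block: Callable[[List[int], int], List[int]],
    target_next: Callable[[List[int]], int],
    block_size: int = 4,
) -> List[int]:
    """One greedy draft-and-verify step; returns the tokens to append."""
    proposed = draft_block(prefix, block_size)
    accepted: List[int] = []
    for tok in proposed:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Whole block accepted: the target contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


# Toy models (hypothetical): the target counts upward modulo 10;
# the drafter agrees except that it mispredicts whatever follows a 5.
def target_next(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft_block(seq: List[int], n: int) -> List[int]:
    out, last = [], seq[-1]
    for _ in range(n):
        last = 0 if last == 5 else (last + 1) % 10
        out.append(last)
    return out


def generate(prefix: List[int], n_tokens: int) -> List[int]:
    seq = list(prefix)
    while len(seq) - len(prefix) < n_tokens:
        seq += speculative_step(seq, draft_block, target_next)
    return seq[: len(prefix) + n_tokens]
```

Because every kept token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone. Any speedup depends on how often the drafter's blocks are accepted.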
Stronger Normalization-Free Transformers
Introduces Derf, a simple point-wise activation that replaces normalization layers like LayerNorm and RMSNorm while improving generalization across vision, speech, DNA sequence modeling, and GPT-style language models. The authors systematically study properties of point-wise functions, run a large-scale search, and show Derf outperforms prior normalization-free approaches (e.g., Dynamic Tanh) with similar or better stability. ([arxiv.org](https://arxiv.org/abs/2512.10938))
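For intuition about what "point-wise" buys you, here is a minimal sketch of Dynamic Tanh, the prior approach named above, next to a plain layer norm. It is not Derf, whose exact form is in the paper. Using scalar `alpha`, `gamma`, and `beta` is a simplifying assumption; in practice these are learned parameters.

```python
import math
from typing import List


def dynamic_tanh(
    x: List[float], alpha: float = 0.5, gamma: float = 1.0, beta: float = 0.0
) -> List[float]:
    """Element-wise gamma * tanh(alpha * x) + beta; no statistics over x."""
    return [gamma * math.tanh(alpha * v) + beta for v in x]


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Plain layer norm without affine terms; needs mean and variance of x."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]
```

Changing one feature leaves the other outputs of `dynamic_tanh` untouched, while every output of `layer_norm` shifts because the mean and variance change.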
Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning
A "reasoner" model and a "discriminator" model train together so the discriminator flags wrong steps in math solutions, not just wrong final answers. This joint training gives dense step-level rewards and boosts math benchmark scores for existing open models like DeepSeek-R1 distills without huge extra compute. ([ar5iv.org](https://ar5iv.org/abs/2512.16917))
Stronger Normalization-Free Transformers
HF highlights this work for showing that a carefully designed point-wise activation (Derf) can fully replace normalization layers in Transformers and still improve performance across multiple domains. For practitioners, it points toward simpler, potentially faster architectures that skip the per-token statistics LayerNorm and RMSNorm have to compute. ([huggingface.co](https://huggingface.co/papers/2512.10938))
Rethinking Agentic Reinforcement Learning In Large Language Models
Synthesizes the fast-growing literature on reinforcement learning for agent-style language models, from environment design to safety and compute limits. Argues the key shift is treating models as long-lived decision-makers, not one-shot text generators. If you’re planning big training runs for agents, use this as a design checklist, not just a citation. ([databubble.co](https://databubble.co/news/rethinking-agentic-reinforcement-learning-in-large-language-models?utm_source=openai))
MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs
MoEBlaze redesigns mixture‑of‑experts training to cut activation memory and data movement on GPUs. It claims over 4× speedups and 50% memory savings versus existing frameworks, which directly matters for anyone pushing bigger sparse models.
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
Describes the QwenLong-L1.5 post-training recipe for extending LLM context windows while keeping reasoning quality intact. The work focuses not just on positional encodings but also on memory management strategies and training curricula that keep long-context performance from collapsing. This is highly relevant for anyone trying to turn a baseline LLM into a stable long-context model without re‑training from scratch.
Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection
Builds on Direct Preference Optimization but tackles its weak learning signal when both preferred and rejected responses share similar flaws. RPO adds a hint-guided reflection step that encourages the model to produce more contrastive, informative preference pairs before optimizing them. The result is a more stable and data-efficient on-policy alignment pipeline that still avoids full RLHF/RLAIF complexity.
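As background, here is a minimal sketch of the standard DPO loss that RPO builds on, computed from sequence log-probabilities. The function and argument names are illustrative, and RPO's hint-guided reflection step is not shown.

```python
import math


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """-log(sigmoid(beta * margin)) for one preference pair."""
    # Margin: how much more the policy prefers the chosen response than
    # the frozen reference model does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # Numerically stable log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy treats both responses the way the reference does, the margin is zero and the loss sits at log 2. The loss only drops as the policy separates the pair, which is one way to see why more contrastive pairs carry more signal.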
Constructive Circuit Amplification: Improving Math Reasoning in LLMs via Targeted Sub-Network Updates
The authors find sparse "circuits" inside language models that drive math reasoning and selectively strengthen only those pieces. They report up to 11.4% accuracy gains while touching about 1.6% of model components, keeping other skills like MMLU almost unchanged. ([ar5iv.org](https://ar5iv.org/abs/2512.16914))
nanoGPT
Karpathy’s minimalist GPT training repo continues to trend, reflecting ongoing interest in from-scratch pretraining and fine-tuning for medium-sized LLMs. Still one of the best learning references if you want to understand the guts of GPT-style models. ([github.com](https://github.com/trending?since=daily))
Let ViT Speak: Generative Language-Image Pre-training
Trains a Vision Transformer to predict language tokens directly from image tokens using a standard language-model objective. Removes contrastive tricks and extra decoders while staying competitive on many multimodal benchmarks. If you maintain vision backbones for language models, this is a simpler pretraining recipe to test. ([huggingface.co](https://huggingface.co/papers/2605.00809))
Deep Delta Learning
The authors replace standard residual skip connections with a learnable "Delta" operator that can flexibly distort the identity path. This lets deep nets control how much old information to erase versus new information to write, improving how they model complex dynamics while keeping training stable.
Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
The Daily Papers summary underlines how clipping, entropy tricks, and spurious rewards interact in RL for reasoning. Read this before you copy any popular reward setup for math models.
Efficient Training on Multiple Consumer GPUs with RoundPipe
Introduces a new pipeline schedule that avoids tight weight sharing constraints across stages when customizing large models. Targets setups with several consumer GPUs and slow interconnects, squeezing more throughput from cheap hardware. If your lab or startup runs on gamer cards, this is immediately actionable. ([huggingface.co](https://huggingface.co/papers/2604.27085))
EasyV2V: A High-quality Instruction-based Video Editing Framework
EasyV2V upgrades text-controlled video editing by cleverly generating training pairs from existing experts and images. If you're building video tools, this paper is a recipe for better data and architectures.
Next-Embedding Prediction Makes Strong Vision Learners
Instead of predicting pixels or patches, this method predicts the next embedding in a learned space. Vision folks can plug this into pretraining to squeeze more out of ImageNet-scale data.
GPT-SoVITS
GPT-SoVITS is a hugely popular WebUI and pipeline for few-shot TTS and voice conversion, enabling convincing voice cloning with as little as 5 seconds to 1 minute of audio, plus dataset prep tools (separation, ASR, labeling) and multi-lingual support (EN/JA/KO/ZH/Cantonese). If you’re experimenting with custom voices, VTuber-style content, or rapid TTS prototyping on consumer GPUs, this is effectively the community standard toolkit. ([github.com](https://github.com/RVC-Boss/GPT-SoVITS?utm_source=openai))
Co-Evolving Policy Distillation
Unifies two popular post‑training styles and shows why naively merging many expert policies can lose capabilities. Proposes a bidirectional distillation loop where student and experts improve together. If you juggle multiple specialist models, this offers a more stable way to fold them into one. ([huggingface.co](https://huggingface.co/papers/2604.27083))
MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue
Introduces a new optimization rule for training chat agents over long conversations. The goal: steadier learning and more helpful dialogue without exploding token and compute costs.
Shared LoRA Subspaces for almost Strict Continual Learning
The paper shows that you can reuse a shared low-rank adapter space across many tasks instead of adding new adapters forever. That keeps performance high while holding down memory as models pick up new skills over time.
VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization
VIVA uses a vision-language model to encode instructions and a reward-optimized diffusion model to edit videos. Great blueprint for anyone mixing video generation with RL-style feedback.
End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer
Trains a tokenizer and autoregressive image model together, letting generation feedback directly improve the tokenization scheme. Hits state-of-the-art ImageNet 256×256 scores without guidance. If you build discrete image generators, this supports fusing tokenizer and generator into one training pipeline. ([huggingface.co](https://huggingface.co/papers/2605.00503))
What Matters in Data Curation for Multimodal Reasoning? Insights from the DCVLR Challenge
Using a NeurIPS data curation challenge, this paper shows that picking hard, aligned examples beats just adding more or more diverse data. For vision–language reasoning, curation quality matters more than dataset size.
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Posterior Behavioral Cloning shows how the way you pretrain policies can make downstream reinforcement learning far cheaper. Robotics teams can adopt this to cut expensive environment time.
Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model
Introduces Neuro-RIT, which looks at individual neurons during instruction tuning of language models for retrieval-heavy tasks. The aim is steadier answers when retrieved documents shift or are noisy.
Digital Metabolism: Decoupling Logic from Facts via Regenerative Unlearning -- Towards a Pure Neural Logic Core
This paper experiments with aggressively "forgetting" facts while preserving reasoning ability in a small Qwen model. The model loses targeted knowledge yet starts to lean harder on explicit reasoning steps.
cocoindex-io/cocoindex
A high-performance data transformation engine built for AI pipelines. It focuses on incremental processing, so you can keep large feature stores and training datasets in sync cheaply. ([github.com](https://github.com/trending))
The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level
Studies how mixture-of-experts language models actually route work between experts. Offers tools to inspect which expert fires and why, instead of treating MoE as a black box.
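To make "which expert fires" concrete, here is a minimal sketch of generic top-k softmax gating plus a per-expert load count. This is a common MoE routing scheme, assumed for illustration; it is not the paper's tooling, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def top_k_route(router_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_weight) for the k highest-scoring experts."""
    top = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:k]
    # Softmax over the selected experts only, shifted for stability.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def expert_load(batch_logits: List[List[float]], k: int = 2) -> Counter:
    """Count how many tokens in a batch are routed to each expert."""
    counts: Counter = Counter()
    for logits in batch_logits:
        for expert, _ in top_k_route(logits, k):
            counts[expert] += 1
    return counts
```

Logging `expert_load` per layer is a cheap first look at routing imbalance before reaching for heavier interpretability tools.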
Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection
Hi‑ZFO mixes gradient-based (first-order) updates on the tensors it ranks as important with gradient-free (zeroth-order) updates on the rest to escape bad minima. It aims for better fine-tuned models with less compute and more stable training than pure gradient methods.
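The idea of combining gradient-based and gradient-free updates can be sketched on a toy quadratic. Everything here is an assumption for illustration: the hand-picked `important` set, the two-point random-sign estimator, and the objective. Hi-ZFO's actual importance scoring and estimator are in the paper.

```python
import random
from typing import List, Set

TARGET = [1.0, -2.0, 0.5, 3.0, -1.5, 2.5]  # toy optimum, illustrative only


def loss_fn(w: List[float]) -> float:
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))


def grad_fn(w: List[float]) -> List[float]:
    return [2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]


def hybrid_step(
    w: List[float],
    important: Set[int],
    rng: random.Random,
    lr: float = 0.1,
    eps: float = 1e-3,
) -> List[float]:
    """First-order update on `important` indices, zeroth-order on the rest."""
    # Random sign direction restricted to the non-important coordinates.
    z = [0.0 if i in important else rng.choice((-1.0, 1.0)) for i in range(len(w))]
    w_plus = [wi + eps * zi for wi, zi in zip(w, z)]
    w_minus = [wi - eps * zi for wi, zi in zip(w, z)]
    # Two-point estimate of the directional derivative along z (no gradients).
    directional = (loss_fn(w_plus) - loss_fn(w_minus)) / (2.0 * eps)
    g = grad_fn(w)
    return [
        wi - lr * g[i] if i in important else wi - lr * directional * z[i]
        for i, wi in enumerate(w)
    ]


def run(steps: int = 200, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    w = [0.0] * len(TARGET)
    for _ in range(steps):
        w = hybrid_step(w, important={0, 1}, rng=rng)
    return w
```

The zeroth-order branch needs only two loss evaluations and no backward pass, which is where the compute and memory savings of this style of method come from.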
GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
Proposes GeoDM, a dataset distillation framework that performs distribution matching in a product space of Euclidean, hyperbolic, and spherical manifolds, with learnable curvature and weights. This geometry-aware approach yields lower generalization error bounds and consistently outperforms prior distillation methods by better aligning synthetic and real-data manifolds. ([arxiv.org](https://arxiv.org/abs/2512.08317?utm_source=openai))
ed-donner/llm_engineering
Companion repo for a course on building with large language models. Covers prompts, customization, retrieval, and deployment in runnable notebooks. If you onboard new engineers into AI, this is a solid starting curriculum. ([github.com](https://github.com/trending/jupyter-notebook?since=daily))
geoai
geoai is a Python package from the opengeos ecosystem that integrates deep-learning frameworks (PyTorch, Transformers, segmentation models) with geospatial tooling to handle everything from remote-sensing data download and tiling to training, inference, and interactive map visualization. It’s aimed at practitioners who want a higher-level, batteries-included stack for tasks like land-cover classification, building footprint extraction, and change detection, without reinventing all the GIS + ML plumbing. ([github.com](https://github.com/opengeos/geoai?utm_source=openai))
tinker-cookbook
tinker-cookbook provides practical, end‑to‑end examples of post‑training LLMs using Tinker, a managed fine‑tuning API from Thinking Machines Lab that handles distributed training while you control the algorithms and data. The repo includes recipes for instruction tuning, math reasoning, RLHF-style preference learning, tool use, prompt distillation, and multi-agent setups, making it a strong starting point if you want to fine‑tune open-weight models like Llama or Qwen without building your own training stack. ([github.com](https://github.com/thinking-machines-lab/tinker-cookbook?utm_source=openai))