Training
Research papers, repositories, and articles about training
Stronger Normalization-Free Transformers
Introduces Derf, a simple point-wise activation that replaces normalization layers like LayerNorm and RMSNorm while improving generalization across vision, speech, DNA sequence modeling, and GPT-style language models. The authors systematically study properties of point-wise functions, run a large-scale search, and show Derf outperforms prior normalization-free approaches (e.g., Dynamic Tanh) with similar or better stability. ([arxiv.org](https://arxiv.org/abs/2512.10938))
Stronger Normalization-Free Transformers
HF highlights this work for showing that a carefully designed point-wise activation (Derf) can fully replace normalization layers in Transformers and still improve performance across multiple domains. For practitioners, it points toward simpler, potentially faster architectures that skip the per-token mean and variance reductions normalization layers require. ([huggingface.co](https://huggingface.co/papers/2512.10938))
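To make the idea concrete, here is a minimal PyTorch sketch of a learnable point-wise module in the spirit of Dynamic Tanh and Derf. The exact Derf parametrization is not given in the summaries above; the erf-based form, the shared `alpha` input scale, and the per-channel affine parameters are illustrative assumptions rather than the paper's definition.

```python
# Sketch of a point-wise, normalization-free module. The erf-based form and
# the alpha/gamma/beta parametrization are assumptions, not the paper's Derf.
import torch
import torch.nn as nn

class PointwiseDerf(nn.Module):
    """Drop-in stand-in for LayerNorm: y = gamma * erf(alpha * x) + beta."""
    def __init__(self, dim: int, alpha_init: float = 1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))  # shared input scale
        self.gamma = nn.Parameter(torch.ones(dim))           # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))           # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Purely element-wise: no mean/variance reductions over the token.
        return self.gamma * torch.erf(self.alpha * x) + self.beta

# Usage: swap it in wherever a Transformer block would apply LayerNorm.
norm = PointwiseDerf(dim=768)
x = torch.randn(2, 16, 768)   # (batch, tokens, channels)
print(norm(x).shape)          # torch.Size([2, 16, 768])
```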
Reflective Preference Optimization (RPO): Enhancing On-Policy Alignment via Hint-Guided Reflection
Builds on Direct Preference Optimization but tackles its weak learning signal when both preferred and rejected responses share similar flaws. RPO adds a hint-guided reflection step that encourages the model to produce more contrastive, informative preference pairs before optimizing them. The result is a more stable and data-efficient on-policy alignment pipeline that still avoids full RLHF/RLAIF complexity.
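For context, the sketch below shows the standard DPO objective that RPO builds on, with the hint-guided reflection step represented only as a placeholder hook; the reflection prompt format and pair-regeneration logic shown here are assumptions, not the paper's implementation.

```python
# Minimal sketch of the DPO loss RPO extends; `reflect_pair` is a hypothetical
# stand-in for the hint-guided reflection step, not the paper's procedure.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed token log-probs of each response."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

def reflect_pair(prompt, chosen, rejected, generate_fn):
    """Hypothetical reflection hook: ask the model to critique the pair and
    regenerate a more contrastive preferred response before optimization."""
    hint = f"The preferred answer still shares flaws with the rejected one.\n{prompt}"
    improved = generate_fn(hint)   # assumed text-generation callable
    return improved, rejected

# Usage with dummy per-response log-probabilities:
pc = torch.tensor([-12.0, -9.5]); pr = torch.tensor([-13.0, -9.4])
rc = torch.tensor([-12.5, -9.8]); rr = torch.tensor([-12.8, -9.6])
print(dpo_loss(pc, pr, rc, rr))
```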
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
Describes the QwenLong-L1.5 post-training recipe for extending LLM context windows while keeping reasoning quality intact. The work focuses not just on positional encodings but also on memory management strategies and training curricula that keep long-context performance from collapsing. This is highly relevant for anyone trying to turn a baseline LLM into a stable long-context model without re‑training from scratch.
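As a rough illustration of what a long-context training curriculum looks like in practice, here is a toy staged schedule; the stage lengths, long-document mixing ratios, and step counts are invented for the example and are not the QwenLong-L1.5 recipe.

```python
# Toy sketch of a staged long-context curriculum. All numbers are illustrative
# assumptions, not the actual QwenLong-L1.5 post-training recipe.
from dataclasses import dataclass

@dataclass
class CurriculumStage:
    max_seq_len: int          # context window trained at this stage
    long_doc_fraction: float  # share of batches drawn from long documents
    steps: int

CURRICULUM = [
    CurriculumStage(max_seq_len=8_192,   long_doc_fraction=0.2, steps=2_000),
    CurriculumStage(max_seq_len=32_768,  long_doc_fraction=0.4, steps=2_000),
    CurriculumStage(max_seq_len=131_072, long_doc_fraction=0.6, steps=1_000),
]

def stage_for_step(step: int) -> CurriculumStage:
    """Map a global training step to its curriculum stage."""
    budget = 0
    for stage in CURRICULUM:
        budget += stage.steps
        if step < budget:
            return stage
    return CURRICULUM[-1]

print(stage_for_step(2_500).max_seq_len)  # 32768
```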
nanoGPT
Karpathy’s minimalist GPT training repo continues to trend, reflecting ongoing interest in from-scratch pretraining and fine-tuning for medium-sized LLMs. Still one of the best learning references if you want to understand the guts of GPT-style models. ([github.com](https://github.com/trending?since=daily))
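To give a flavor of those "guts", here is a tiny, self-contained causal self-attention block of the kind nanoGPT implements; the dimensions are arbitrary and this is a sketch rather than the repo's code.

```python
# Minimal causal self-attention block in the style of GPT models; not taken
# from nanoGPT, dimensions are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=64, n_head=4):
        super().__init__()
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # fused query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)     # output projection

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, tokens, head_dim)
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        # causal masking keeps each token from attending to future tokens
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

x = torch.randn(2, 8, 64)
print(CausalSelfAttention()(x).shape)  # torch.Size([2, 8, 64])
```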
GPT-SoVITS
GPT-SoVITS is a hugely popular WebUI and pipeline for few-shot TTS and voice conversion, enabling zero-shot voice cloning from roughly 5 seconds of reference audio and few-shot fine-tuning with about a minute, plus dataset prep tools (separation, ASR, labeling) and multi-lingual support (EN/JA/KO/ZH/Cantonese). If you’re experimenting with custom voices, VTuber-style content, or rapid TTS prototyping on consumer GPUs, this is effectively the community standard toolkit. ([github.com](https://github.com/RVC-Boss/GPT-SoVITS?utm_source=openai))
GeoDM: Geometry-aware Distribution Matching for Dataset Distillation
Proposes GeoDM, a dataset distillation framework that performs distribution matching in a product space of Euclidean, hyperbolic, and spherical manifolds, with learnable curvature and weights. This geometry-aware approach yields lower generalization error bounds and consistently outperforms prior distillation methods by better aligning synthetic and real-data manifolds. ([arxiv.org](https://arxiv.org/abs/2512.08317?utm_source=openai))
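The core idea can be sketched as matching synthetic and real feature distributions under several geometries at once, with learnable mixture weights. The per-geometry discrepancies below (Euclidean mean matching, matching of mean directions on the sphere, a crude norm-shrinking proxy for the hyperbolic case) and the softmax weighting are illustrative assumptions, not GeoDM's actual losses or curvature handling.

```python
# Illustrative sketch of geometry-aware distribution matching with learnable
# mixture weights; the discrepancy terms are assumptions, not GeoDM itself.
import torch
import torch.nn.functional as F

def euclidean_match(real, syn):
    return (real.mean(0) - syn.mean(0)).pow(2).sum()

def spherical_match(real, syn):
    # compare mean directions on the unit sphere
    r = F.normalize(real, dim=-1).mean(0)
    s = F.normalize(syn, dim=-1).mean(0)
    return (r - s).pow(2).sum()

def hyperbolic_match(real, syn, c=1.0):
    # crude proxy: compare means after a norm-shrinking map toward the Poincaré ball
    to_ball = lambda x: torch.tanh(c * x.norm(dim=-1, keepdim=True)) * F.normalize(x, dim=-1)
    return (to_ball(real).mean(0) - to_ball(syn).mean(0)).pow(2).sum()

geometry_logits = torch.zeros(3, requires_grad=True)  # learnable mixture weights

def geo_dm_loss(real_feats, syn_feats):
    w = torch.softmax(geometry_logits, dim=0)
    losses = torch.stack([
        euclidean_match(real_feats, syn_feats),
        spherical_match(real_feats, syn_feats),
        hyperbolic_match(real_feats, syn_feats),
    ])
    return (w * losses).sum()

real, syn = torch.randn(128, 32), torch.randn(16, 32, requires_grad=True)
print(geo_dm_loss(real, syn))  # scalar loss; gradients flow into `syn` and the weights
```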
tinker-cookbook
tinker-cookbook provides practical, end‑to‑end examples of post‑training LLMs using Tinker, a managed fine‑tuning API from Thinking Machines Lab that handles distributed training while you control the algorithms and data. The repo includes recipes for instruction tuning, math reasoning, RLHF-style preference learning, tool use, prompt distillation, and multi-agent setups, making it a strong starting point if you want to fine‑tune open-weight models like Llama or Qwen without building your own training stack. ([github.com](https://github.com/thinking-machines-lab/tinker-cookbook?utm_source=openai))
geoai
geoai is a Python package from the opengeos ecosystem that integrates deep-learning frameworks (PyTorch, Transformers, segmentation models) with geospatial tooling to handle everything from remote-sensing data download and tiling to training, inference, and interactive map visualization. It’s aimed at practitioners who want a higher-level, batteries-included stack for tasks like land-cover classification, building footprint extraction, and change detection, without reinventing all the GIS + ML plumbing. ([github.com](https://github.com/opengeos/geoai?utm_source=openai))