ArXiv Paper

MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs

Jiyuan Zhang, Yining Liu, Siqi Yan +6January 12, 2026

Summary

MoEBlaze redesigns mixture‑of‑experts training to cut activation memory and data movement on GPUs. It claims over 4× speedups and 50% memory savings versus existing frameworks, which directly matters for anyone pushing bigger sparse models.

Topics

training moe systems scaling

View Original View PDF

Related Content

huggingface/transformers

The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.

HuggingFace's Transformers: State-of-the-art Natural Language Processing

This 2019 paper launched the Transformers library, giving a clean API around many transformer models and pretrained checkpoints. It turned cutting-edge NLP into a reusable software layer that underpins most open-source LLM work today.

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Wraps real robots in a closed-loop system where coding agents iteratively reset scenes, run policies, check results, and improve code. If you’re serious about autonomous robot labs, this is basically a blueprint.