MoE

Research papers, repositories, and articles about mixture-of-experts (MoE) models


MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs

MoEBlaze redesigns mixture-of-experts training to reduce activation memory and data movement on GPUs. The authors report speedups of over 4× and memory savings of 50% compared with existing frameworks, which matters directly to anyone training larger sparse models.

Jiyuan Zhang, Yining Liu

The Expert Strikes Back: Interpreting Mixture-of-Experts Language Models at Expert Level

Studies how mixture-of-experts language models route work between experts. Offers tools to inspect which experts fire and why, rather than treating the model as a black box.

Jeremy Herbst, Jae Hee Lee