On May 27, 2026, Xiaomi’s MiMo large-model team announced that MiMo‑V2.5 series APIs are being permanently discounted, with top price cuts of up to 99% versus original rates. The move also unifies pricing across context window sizes and boosts Token Plan allowances by roughly 5–8x at the same price.
This article aggregates reporting from 5 news sources. The TL;DR is AI-generated from original reporting. Race to AGI's analysis provides editorial context on implications for AGI development.
Xiaomi’s decision to permanently slash MiMo‑V2.5 API prices by up to 99% is a dramatic escalation in the Chinese LLM price war. Combined with MiMo‑V2.5’s MIT‑licensed open‑source release and 1M‑token context window, this turns a frontier‑class model into near‑commodity infrastructure — especially attractive for agents and long‑context workloads. The technical note that SGLang HiCache and sliding‑window attention have cut KV‑cache data movement to about one‑seventh of previous levels explains how Xiaomi can afford this: inference efficiency gains are being passed straight through to developers.([chinaz.com](https://www.chinaz.com/2026/0527/1754957.shtml))
For the race to AGI, price is not a side detail; it’s a control knob on how much experimentation and deployment actually happens. When you can run multi‑hundred‑thousand‑token contexts for a fraction of previous cost, it becomes viable to let agents maintain long‑lived memories, operate over entire codebases, and continuously monitor streams of sensor data. Xiaomi is effectively betting that whoever commoditizes high‑end reasoning first will own the developer mindshare that sits on top of it. That puts direct pressure on domestic rivals like DeepSeek and Qwen, but also on Western providers whose enterprise pricing remains orders of magnitude higher.
The move also highlights a deeper geopolitical dynamic: while US policy increasingly focuses on restricting access to cutting‑edge chips, Chinese players are racing to differentiate on cost‑per‑token and open‑source friendliness. If they succeed in making “good enough” AGI‑adjacent capability almost free, it could accelerate global deployment even if absolute model quality lags slightly behind the very best proprietary systems.



