On May 25, 2026, ModelBest (面壁智能), Tsinghua University and the OpenBMB community released and open-sourced BitCPM-CANN, described as China’s first 1.58‑bit (ternary) large model trained end‑to‑end on Huawei Ascend hardware. The series includes 0.5B, 1B, 3B and 8B parameter models and reportedly delivers about 6× memory savings with 90–97.2% capability retention versus BF16, enabling 8B models to run on flagship smartphones.
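The memory claim can be sanity-checked with simple arithmetic. A ternary weight takes one of three values, so it carries log2(3) ≈ 1.58 bits of information, which is where the "1.58-bit" label comes from. The sketch below is a rough estimate, not a description of how BitCPM-CANN stores its weights. It counts weight storage only, ignoring activations, KV cache and runtime overhead, and it treats 1 GB as 10^9 bytes. Both are simplifying assumptions that do not come from the announcement, and the helper names are hypothetical.

```python
import math

# Assumptions (not from the announcement): only weight storage is counted,
# activations/KV cache/runtime overhead are ignored, and 1 GB = 1e9 bytes.
BF16_BITS = 16.0
TERNARY_BITS = math.log2(3)  # ~1.585 bits: information content of one {-1, 0, +1} weight


def weight_gb(params: float, bits_per_param: float) -> float:
    """Raw weight storage in GB for `params` parameters at `bits_per_param` bits each."""
    return params * bits_per_param / 8 / 1e9


def footprint(params: float, reported_saving: float = 6.0) -> dict:
    """Compare BF16, an ideal ternary encoding, and the reported ~6x saving."""
    bf16 = weight_gb(params, BF16_BITS)
    return {
        "bf16_gb": bf16,
        "ideal_ternary_gb": weight_gb(params, TERNARY_BITS),
        "reported_gb": bf16 / reported_saving,
    }


# The four model sizes named in the announcement.
for label, params in [("0.5B", 0.5e9), ("1B", 1e9), ("3B", 3e9), ("8B", 8e9)]:
    f = footprint(params)
    print(
        f"{label:>4}: BF16 {f['bf16_gb']:5.2f} GB | "
        f"ideal ternary {f['ideal_ternary_gb']:4.2f} GB | "
        f"at 6x {f['reported_gb']:4.2f} GB"
    )
```

For the 8B model, this gives 16 GB of BF16 weights versus roughly 2.7 GB at the reported 6× saving. That is a footprint that plausibly fits within the 12 GB or more of RAM common on recent flagship phones. An ideal ternary encoding would need only about 1.6 GB, a ratio of about 10×, so the reported figure leaves room for overhead. The article does not explain the gap. In low-bit models, a common reason is that some tensors are kept at higher precision.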
This article aggregates reporting from 2 news sources. The TL;DR is AI-generated from original reporting. Race to AGI's analysis provides editorial context on implications for AGI development.
BitCPM-CANN is one of the clearest examples yet of China pushing hard on efficient, frontier‑class inference without relying on Nvidia. A fully ternary, 1.58‑bit LLM family trained natively on Huawei Ascend shows that Chinese labs are not just copying Western model architectures. They are iterating on the hardware–algorithm co‑design that true edge deployment requires. With roughly 6× memory savings and a reported 90–97.2% of BF16 capability retained, running an 8B model locally on a phone stops being a marketing slide and starts to look like a real product path.([ithome.com](https://www.ithome.com/0/954/759.htm))
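The article does not say how BitCPM-CANN converts full-precision weights into ternary ones. The sketch below is a hedged illustration of what "1.58-bit (ternary)" means in practice. It follows the absmean quantizer published for Microsoft's BitNet b1.58:

- divide each weight by the tensor's mean absolute value,
- round the result,
- clip it to {-1, 0, +1}.

Whether BitCPM-CANN uses this exact recipe is an assumption. `quantize_ternary`, `dequantize` and `ternary_dot` are hypothetical helpers, not part of any released API.

```python
from typing import List, Tuple

# ASSUMPTION: BitCPM-CANN's actual quantizer is not described in the source.
# This follows the absmean scheme published for BitNet b1.58 purely as an illustration.


def quantize_ternary(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Map real weights to codes in {-1, 0, +1} plus one per-tensor scale gamma = mean(|w|)."""
    gamma = sum(abs(w) for w in weights) / len(weights)
    # Scale by gamma, round to the nearest integer (Python breaks ties to even),
    # then clip into [-1, 1].
    codes = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return codes, gamma


def dequantize(codes: List[int], gamma: float) -> List[float]:
    """Approximate the original weights as gamma * code."""
    return [gamma * c for c in codes]


def ternary_dot(codes: List[int], gamma: float, x: List[float]) -> float:
    """Dot product with ternary weights: add where the code is +1, subtract where it
    is -1, skip zeros, and apply the scale once at the end (no per-weight multiplies)."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return gamma * acc


w = [0.9, -0.05, -1.2, 0.3]
codes, gamma = quantize_ternary(w)
print(codes, round(gamma, 4))
print(ternary_dot(codes, gamma, [1.0, 2.0, 3.0, 4.0]))
```

With these inputs, the sketch maps the four weights to `[1, 0, -1, 0]` with a scale of about 0.6125. Every weight is -1, 0 or +1, so `ternary_dot` needs only additions and subtractions plus a single multiply by the scale. This is one reason ternary models suit memory- and power-limited edge chips. Production kernels also pack the codes densely and vectorize this loop, which the sketch does not attempt.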
For the race to AGI, ultra‑low‑bit models matter because they shift the constraint from raw FLOPs to clever representation. If teams can pack surprisingly strong cognition into small, quantized networks, ubiquitous on‑device agents become viable even without hyperscale data centers. This is especially strategic for China, which faces export controls on high‑end GPUs but controls its domestic handset and Ascend ecosystems. BitCPM‑CANN also undercuts the idea that only the US “big three” can drive core algorithmic innovation: the quantization stack, long‑context support and full training pipeline give domestic players a reusable foundation for future edge models.

