Chinese AI lab DeepSeek has released a new architecture, “mHC: Manifold-Constrained Hyper-Connections,” which it says stabilizes training for very large models while improving reasoning and reading performance. Chinese coverage, including a January 3 summary from Sina, says the paper’s internal experiments and wording strongly suggest that DeepSeek’s next flagship model, DeepSeek V4, has already finished training and could launch around Lunar New Year 2026.
DeepSeek’s mHC paper is one of the more consequential architecture tweaks we’ve seen since the original residual networks. By constraining hyper-connections to a manifold that preserves something close to identity mappings, DeepSeek is attacking a very practical frontier-model problem: as you widen residual streams and add more cross-layer connections, training becomes numerically unstable long before you hit your desired scale. mHC is essentially a recipe for packing more computation and communication into each layer without the model blowing up. ([arxiv.org](https://arxiv.org/abs/2512.24880?utm_source=openai))
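To make the idea concrete, here is a minimal PyTorch sketch of a hyper-connections-style block whose stream-mixing matrix is kept near an identity-preserving, constrained form. Everything in it is an illustrative assumption: the class name, the number of residual streams, and the row-softmax stand-in for the manifold constraint are not taken from the paper, which defines its own parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ManifoldConstrainedHyperConnections(nn.Module):
    """Illustrative hyper-connections-style block with n parallel residual
    streams. The stream-mixing matrix is kept on a simple constrained set
    (row-stochastic, initialized at the identity), so each block starts out
    as a plain residual connection and can only drift smoothly away from it.
    This is a sketch of the general idea, not DeepSeek's mHC parameterization.
    """

    def __init__(self, d_model: int, n_streams: int = 4, init_gate: float = 4.0):
        super().__init__()
        self.n = n_streams
        # Logits for the stream-mixing matrix; a strong diagonal makes the
        # row-softmax start close to the identity matrix.
        self.mix_logits = nn.Parameter(init_gate * torch.eye(n_streams))
        # Per-stream scalar gates on the sublayer output; zero-init keeps the
        # whole block an identity map at the start of training.
        self.out_gate = nn.Parameter(torch.zeros(n_streams))
        # The wrapped sublayer (an MLP here, purely for illustration).
        self.norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, seq, n_streams, d_model)
        # 1) Mix the residual streams with a row-stochastic matrix
        #    (the stand-in for the manifold constraint).
        mix = F.softmax(self.mix_logits, dim=-1)              # (n, n)
        mixed = torch.einsum("ij,bsjd->bsid", mix, streams)   # (b, s, n, d)
        # 2) Run the sublayer on the average of the mixed streams.
        x = self.norm(mixed.mean(dim=2))                      # (b, s, d)
        y = self.mlp(x)                                       # (b, s, d)
        # 3) Write the sublayer output back into each stream, scaled by a gate.
        gate = self.out_gate.view(1, 1, self.n, 1)
        return mixed + gate * y.unsqueeze(2)


# Example: a widened residual stream with 4 copies of a 64-dim hidden state.
block = ManifoldConstrainedHyperConnections(d_model=64, n_streams=4)
h = torch.randn(2, 16, 4, 64)   # (batch, seq, n_streams, d_model)
out = block(h)                  # same shape; near-identity at initialization
```

The design choice the sketch tries to capture is identity-at-initialization: with a strongly diagonal mixing matrix and zero-initialized output gates, each block begins as an ordinary residual connection, which is the property most commonly credited with keeping very deep, widened stacks numerically stable.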
If this technique works as advertised at DeepSeek’s internal scales, it lowers one of the key non‑compute barriers to frontier models: you don’t need exotic new optimization tricks for every scale jump. Combined with DeepSeek’s focus on kernel-level engineering, mHC looks like a reusable blueprint other labs can adopt, much like attention or residuals themselves. The strong hint that V4 is already trained means we’re likely to see a new open model that pushes reasoning benchmarks again, this time with a more scalable backbone.
Strategically, this reinforces DeepSeek’s position as the lab defining the open frontier of large models, forcing Western labs to respond on architecture and efficiency, not just raw FLOPs. If mHC diffuses widely, it could compress the gap between current systems and the sort of ultra‑deep, ultra‑wide networks people imagine for AGI, without requiring a proportional jump in compute budgets.


