On February 9, 2026, Chinese outlet Machine Heart Pro detailed Nvidia GEAR Lab’s DreamZero, a 14‑billion‑parameter world action model built on a video diffusion backbone. The system reportedly enables 7 Hz real‑time closed‑loop control and over 2x better generalization than state‑of‑the‑art vision‑language‑action baselines on unseen robotic tasks, with code and demos released publicly.
This article aggregates reporting from three news sources. The TL;DR is AI-generated from the original reporting. Race to AGI's analysis provides editorial context on the implications for AGI development.
DreamZero is one of the clearest signals yet that robotics is entering its “GPT‑2 moment.” Nvidia’s GEAR Lab is effectively treating the physical world the way large language models treat text: as a distribution to be modeled at scale, then queried with natural‑language prompts. By building a world action model on top of a pre‑trained video diffusion backbone, DreamZero learns both what the world will look like and which actions will make that happen, and can then generalize to new tasks, objects and even new robot bodies with minimal additional data. ([k.sina.com.cn](https://k.sina.com.cn/article_5952915705_162d248f906702hlvi.html))
For the race to AGI, this is strategically important. Language‑only systems hit a ceiling when they have to act in messy, continuous environments; general‑purpose agents will need something like DreamZero's cross‑embodiment transfer and zero‑shot control to be useful beyond screens. A 14B‑parameter model running at 7 Hz on real hardware shows that world‑scale robot models are now practical engineering projects, not science fiction. It also puts Nvidia in a powerful position: if it owns both the chips and the leading embodied foundation models, it can shape the ecosystem the way CUDA shaped deep learning a decade ago.
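To make the 7 Hz figure concrete, here is a minimal sketch of what a fixed‑rate closed‑loop controller looks like in general: observe, query the model, act, and sleep off the remainder of each ~143 ms control period. The policy stub and I/O hooks are purely illustrative assumptions; DreamZero's actual interface has not been published in the reporting summarized here.

```python
import time

def dummy_world_action_model(observation):
    """Placeholder policy standing in for a world action model like
    DreamZero (whose API is not public): maps the latest observation
    to a short action command."""
    return [0.0] * 7  # e.g., a hypothetical 7-DoF joint command

def control_loop(policy, get_observation, send_action, hz=7.0, steps=21):
    """Run a fixed-rate closed-loop controller at `hz` cycles per second.

    Each cycle: read an observation, query the policy, send the action,
    then sleep only the time left in the cycle so the loop holds its rate
    even when inference time varies.
    """
    period = 1.0 / hz
    for _ in range(steps):
        t0 = time.monotonic()
        action = policy(get_observation())
        send_action(action)
        time.sleep(max(0.0, period - (time.monotonic() - t0)))

# Minimal stand-ins for robot I/O so the sketch runs end to end.
log = []
control_loop(dummy_world_action_model,
             get_observation=lambda: None,
             send_action=log.append,
             hz=7.0, steps=3)  # three cycles at ~143 ms each
```

The point of the inference-time subtraction is that a 14B diffusion-backed model must finish each forward pass well inside the 143 ms budget; at 7 Hz there is no slack to hide slow inference behind the sleep.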