At CES 2026 in Las Vegas, Nvidia introduced the Rubin platform, a six‑chip AI supercomputing stack featuring the Vera CPU and Rubin GPU. The company claims Rubin will cut inference token costs by up to 10x and train mixture‑of‑experts models with a quarter of the GPUs required by its prior Blackwell platform.
This article aggregates reporting from six news sources; Race to AGI's analysis below provides editorial context on the implications for AGI development.
Rubin is Nvidia’s answer to the question of what AI infrastructure looks like once GPUs alone stop scaling linearly. By co‑designing six components—Vera CPU, Rubin GPU, NVLink 6 switch, Spectrum‑X Ethernet, ConnectX‑9 SuperNIC and BlueField‑4 DPU—into a single platform, Nvidia is essentially productizing an AI “factory in a rack.” The focus on agentic AI, massive context memory and MoE efficiency is a clear signal that the company is building for long‑horizon reasoning and trillion‑parameter systems, not just today’s chatbots.
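To make the MoE efficiency angle concrete: in a mixture‑of‑experts layer, a router sends each token to only a few of the available experts, so per‑token compute scales with the active experts rather than the total parameter count. The sketch below is a minimal, framework‑free illustration of top‑k routing; every size in it (expert count, hidden dimension, top‑k) is an illustrative assumption, not a Rubin specification.

```python
import numpy as np

# Minimal sketch of mixture-of-experts (MoE) top-k routing, illustrating
# why MoE activates only a fraction of total parameters per token.
# All sizes below are illustrative assumptions, not Rubin specs.

rng = np.random.default_rng(0)

num_experts = 64      # total experts in the layer (assumption)
top_k = 2             # experts activated per token (a common MoE choice)
d_model = 1024        # hidden size (assumption)

tokens = rng.normal(size=(8, d_model))            # a batch of 8 token vectors
router_w = rng.normal(size=(d_model, num_experts))

logits = tokens @ router_w                        # router scores per expert
topk_idx = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k experts per token

# Softmax over the selected experts' scores gives the mixing weights.
topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
gates = np.exp(topk_logits) / np.exp(topk_logits).sum(axis=-1, keepdims=True)

# Only top_k / num_experts of the expert parameters do work per token.
active_fraction = top_k / num_experts
print(f"experts touched per token: {topk_idx.shape[1]} of {num_experts}")
print(f"active expert-parameter fraction: {active_fraction:.3f}")
```

With expert parallelism, that active fraction is roughly the lever that determines how many GPUs a given training throughput requires, which is presumably what sits behind the "quarter of the GPUs" claim.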
For the race to AGI, Rubin pushes the bottleneck further away from raw flops and toward data, algorithms and safety. A 10x reduction in per‑token inference cost doesn’t just make existing models cheaper; it makes entirely new classes of always‑on, agentic systems economically viable. At the same time, integrating confidential computing and rack‑scale reliability shows Nvidia is anticipating hyperscale deployments where a million‑GPU footprint is realistic, not science fiction.
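A quick back‑of‑envelope makes the economics tangible. The numbers below (an agent's token throughput and today's per‑million‑token price) are illustrative assumptions, not figures from Nvidia or the cited reporting; only the 10x ratio comes from the announcement.

```python
# Back-of-envelope: what a 10x drop in per-token cost means for an
# always-on agent. All inputs are illustrative assumptions.

tokens_per_second = 50          # assumed steady throughput of one agent
seconds_per_day = 24 * 60 * 60

cost_per_mtok_today = 2.00      # assumed $/million tokens on current hardware
cost_per_mtok_rubin = cost_per_mtok_today / 10  # Nvidia's claimed 10x cut

daily_tokens = tokens_per_second * seconds_per_day
daily_cost_today = daily_tokens / 1e6 * cost_per_mtok_today
daily_cost_rubin = daily_tokens / 1e6 * cost_per_mtok_rubin

print(f"tokens/day: {daily_tokens:,.0f}")                     # 4,320,000
print(f"daily cost today:        ${daily_cost_today:,.2f}")   # ~$8.64
print(f"daily cost, 10x cheaper: ${daily_cost_rubin:,.2f}")   # ~$0.86
```

Under these assumptions, an agent running around the clock drops from roughly $3,150 to $315 per year, which is the difference between a niche deployment and something you attach to every workflow.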
Strategically, Rubin tightens Nvidia's grip on the full AI stack and deepens hyperscalers' dependence on it, even as Microsoft, AWS, and Google invest in their own silicon. Anyone betting on AGI timelines has to account for Rubin‑class systems massively expanding the capacity frontier in 2026–2028.