The US National Institute of Standards and Technology’s Center for AI Standards and Innovation (CAISI) has released an evaluation of DeepSeek V4 Pro, calling it the most capable Chinese model it has tested so far. The April 2026 tests, published May 1 and summarized on May 4, 2026, conclude that DeepSeek V4 Pro trails top US frontier models by roughly eight months in aggregate capability, despite strong scores in math, software engineering, and the natural sciences.
This is one of the first rigorous, government-backed comparisons of US and Chinese frontier models in the 2026 cycle, and the signal is clear: China’s best open-weight model is closing in but remains meaningfully behind. DeepSeek V4 Pro looks like a serious system, strong on math, code, and scientific reasoning, and substantially more cost-efficient than comparable US models. Even so, CAISI’s methodology pegs it at roughly GPT‑5-era capability, about eight months behind today’s US frontier. That is a narrower gap than many expected, but a gap nonetheless. ([nist.gov](https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro?utm_source=openai))
For the race to AGI, this points to a world where the US retains a capabilities lead while China grows increasingly competitive on cost and openness. An eight-month lag is close enough that any slowdown or misstep by US labs, whether regulatory, financial, or safety-driven, could let PRC labs close the gap in specific domains. Just as important, CAISI’s public benchmarking is itself new infrastructure: an institutionalized scoreboard for frontier performance. That will shape how governments, investors, and corporate buyers perceive model risk and value, and may influence where the next wave of compute and talent goes.