On May 5, 2026, the US Commerce Department’s Center for AI Standards and Innovation (CAISI) announced new agreements with Google DeepMind, Microsoft, and xAI to give the government early access to their AI models. The deals allow CAISI to run pre‑deployment evaluations for national security risks, extending earlier arrangements with OpenAI and Anthropic.
These CAISI agreements pull three of the most powerful AI developers, Google DeepMind, Microsoft, and xAI, into a structured pre‑release evaluation regime for frontier models. In practice, that means the US government will see capabilities and failure modes before the public does, and can run targeted red‑teaming focused on national‑security risks. It also normalizes the idea that deploying a top‑end model is conditional on outside scrutiny, not just a product‑launch decision inside a lab.
For the race to AGI, this is a double‑edged development. On one side, quasi‑mandatory pre‑release testing for a handful of top models could slow reckless launches and surface dangerous capabilities before they are widely available. It also builds an institutional memory of what “normal” and “abnormal” behavior look like in frontier systems, which is essential for governing agents, tool use, and autonomous cyber actions. On the other side, giving a single national government privileged early access concentrates information power. CAISI will become a high‑leverage chokepoint that shapes which risks count, which labs feel its influence, and how evaluation standards propagate internationally.
The bigger picture is that AI evaluation itself is becoming a strategic asset. Labs that can demonstrate strong performance under government‑designed tests will be better placed to win sensitive contracts and public trust, while smaller players may struggle to match the compliance overhead. That could entrench today’s leaders even as it makes catastrophic misuse less likely.

