On December 23, 2025, OpenAI told NewsBytes and other outlets that its ChatGPT Atlas AI browser will likely remain vulnerable to prompt injection attacks, even after new security updates. The company detailed a reinforcement‑learning–driven “automated attacker” it uses to discover exploits and said it is hardening Atlas with an adversarially trained agent model and multi‑layer safeguards.
This article aggregates reporting from 4 news sources. The TL;DR is AI-generated from original reporting. Race to AGI's analysis provides editorial context on implications for AGI development.
OpenAI is effectively conceding that agentic browsing will live with a chronic security vulnerability: prompt injection. The Atlas blog and follow‑on coverage paint a picture where even with adversarially trained models, layered defenses and a reinforcement‑learning “auto‑attacker,” there is no clean fix for malicious instructions hiding in emails, docs or web pages. That matters because Atlas‑style agents—LLMs that can click, type and move money—are the clearest bridge from today’s chatbots to the kind of semi‑autonomous systems many people associate with early AGI.
OpenAI’s response is to lean into automation on the defense side too: using its strongest models as red‑team attackers and closing a tight discovery‑to‑patch loop. This is an important proof of concept for how frontier labs might secure increasingly capable agents without freezing deployment. But it also underlines that as we add autonomy and tool access, the attack surface explodes faster than we can mathematically guarantee safety. Competitors like Anthropic, Google and Brave face the same structural problem, so whoever can iterate this kind of automated red teaming and hardening fastest will gain an edge in shipping powerful but trustworthy agents.



