Agentic Systems

Maturing | 30%

Autonomous agents, tool use, multi-agent collaboration, and extended autonomous action. AI that can act in the world.

agents, tool-use, computer-use, AutoGPT, multi-agent, function calling, autonomous systems

Papers: 224

Milestones: 120

Funding: $71.2B

Key Benchmarks

SWE-bench Verified

Real-world software engineering tasks from GitHub issues (500 validated samples)

95.5% | Human: 87%

Leader: Claude Mythos 5 | high saturation

AISI Cyber Evals

UK AI Safety Institute cybersecurity capability evaluations

73% | Human: 90%

Leader: Claude Mythos Preview | medium saturation

WebArena

Autonomous web navigation and task completion across 812 tasks

68% | Human: 78.2%

Leader: Claude Opus 4.6 | medium saturation

Recent Milestones

OpenAI Ships GPT-5.6 and ChatGPT Work Agents

On July 9, 2026, OpenAI broadly released its GPT‑5.6 model family (Sol, Terra and Luna) after a limited government‑review preview. The company also launched ChatGPT Work, an agentic productivity tool that combines ChatGPT and Codex to automate end‑to‑end knowledge work across apps and documents. The global rollout across ChatGPT and Codex takes place over the following 24 hours.

Jul 9, 2026 | release | Impact: 90/100

Claude Cowork Agents Go Cloud, Web, and Mobile

On July 8, 2026, TechRadar reported that Anthropic has enabled Claude Cowork sessions to run from mobile apps and a dedicated web portal, with workflows executing in the cloud by default. Anthropic also shared usage data showing Cowork is now used more for knowledge work than for coding.

Jul 8, 2026 | release | Impact: 70/100

Claude Cowork agent expands to web and mobile

Anthropic began rolling out its Claude Cowork agent to web and mobile on July 8, 2026, after initially limiting it to a desktop app. The expansion lets Max-tier users assign multi‑step tasks that Cowork can continue executing even when their laptop is closed.

Jul 8, 2026 | release | Impact: 70/100

Proofpoint embeds GPT‑5.5 in cyber defense

Cybersecurity company Proofpoint has been selected for OpenAI’s Daybreak Cyber Partner Program, allowing it to integrate GPT‑5.5 into its managed security products and workflows. The partnership focuses on using OpenAI models for defensive tasks like threat analysis, alert triage, and incident response while keeping direct model access with trusted partners.

Jul 5, 2026 | release | Impact: 70/100

Microsoft $2.5B Frontier Co for AI deployments

Microsoft has created a new business unit called Frontier Company to embed AI engineering teams directly inside large customer organizations. The group will deploy over 6,000 specialists and is backed by what reports describe as a $2.5 billion internal commitment to accelerate enterprise AI projects.

Jul 5, 2026 | funding | Impact: 80/100

Codex data proves the agent era is here

Quasa.io reports on a new OpenAI economic research paper, “The Shift to Agentic AI: Evidence from Codex,” which analyzes millions of Codex usage records. The study finds that by mid‑2026, 99.8% of internal OpenAI employee output tokens and over 60% of organisational tokens flowed through Codex agents rather than chat interfaces, with median employee token output rising 10–50x since late 2025.

Jul 4, 2026 | paper | Impact: 90/100

Microsoft Frontier Co. Bets $2.5B on Enterprise AI

On July 2, 2026 Microsoft announced Microsoft Frontier Company, a new operating business that will invest $2.5 billion and deploy around 6,000 engineers and industry experts to embed AI systems inside large enterprises. Spanish outlet CincoDías and Chinese tech site 36Kr reported the initiative in detail on July 4, 2026.

Jul 4, 2026 | funding | Impact: 70/100

LLM Nurse Safely Handles Cardiac Prep Calls

On July 4, 2026, npj Digital Medicine published a Mount Sinai Health System study evaluating a conversational AI assistant named Sofiya for pre‑procedure cardiac catheterization calls. Over roughly 1,600 patient calls, the customized LLM‑based agent completed about 88% of scripts successfully while reducing error rates between an initial stabilization phase and routine deployment.

Jul 4, 2026 | paper | Impact: 70/100

GenAI.mil Hits 1.7M DoD Users, 100K Agents

The US Department of Defense said on July 2, 2026 that its internal generative AI marketplace GenAI.mil now has nearly 1.7 million users and over 100,000 custom agents. Officials plan to onboard more commercial models and deploy the service at higher classification levels under an updated "commercial‑first" procurement approach.

Jul 2, 2026 | release | Impact: 80/100

Microsoft Pours $2.5B, 6K Staff into Frontier Co

On July 2, 2026, Microsoft announced a new business unit called Microsoft Frontier Company backed by a $2.5 billion investment and 6,000 engineers and industry specialists. The unit will embed teams inside customer organizations to design, deploy and operate AI systems built on Microsoft and third‑party models.

Jul 2, 2026 | funding | Impact: 80/100

Gemini Spark brings AI agents to macOS

Google’s Gemini Spark AI agent, previously available on the web and mobile, is now live in the Gemini app for macOS for Google AI Ultra subscribers in the US. A June 30 company blog post and July 1 coverage explain that Spark can automate multi‑step tasks across Mac files and apps, including remote runs initiated from a phone.

Jul 1, 2026 | release | Impact: 70/100

Claude Sonnet 5 brings near‑Opus agents to the masses

Anthropic released Claude Sonnet 5 on June 30, 2026 as its most agentic mid‑tier model, closing much of the gap to Opus 4.8 while remaining significantly cheaper. The model is now the default for most Claude plans and is also being rolled out via cloud partners such as Amazon Bedrock.

Jun 30, 2026 | release | Impact: 80/100

AWS pours $1B into agentic AI engineering

AWS announced a new Forward Deployed Engineering (FDE) organization on June 30, 2026, backed by a $1 billion investment to embed AI engineers directly inside customer teams. The group will build agentic AI systems on top of AWS services, aiming to compress deployment timelines from months to days while leaving customers self‑sufficient. ([aboutamazon.com](https://www.aboutamazon.com/news/aws/aws-1-billion-forward-deployed-ai-engineers))

Jun 30, 2026 | funding | Impact: 80/100

Claude Sonnet 5: cheaper near‑Opus agents

Anthropic released Claude Sonnet 5 on June 30, 2026 as its most agentic mid-size Claude model, narrowing the gap with Opus 4.8 at significantly lower prices. The model is now the default for free and Pro users and launches on the Claude API and platform with introductory pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026. ([anthropic.com](https://www.anthropic.com/news/claude-sonnet-5))

Jun 30, 2026 | release | Impact: 80/100

ARIA agent automates model experimentation loops

On June 29, 2026, CoreWeave announced ARIA, an AI research and iteration agent embedded into Weights & Biases’ Weave platform that reads experiment data, surfaces insights, and suggests model or agent improvements. ARIA enters preview alongside the general availability of Weave as CoreWeave deepens its tooling for AI developers. ([coreweave.com](https://www.coreweave.com/news/coreweave-aria-launches-as-an-ai-research-and-iteration-agent-with-autonomous-research-and-collaborative-intelligence))

Jun 29, 2026 | release | Impact: 70/100

Oracle Ships Agent Teams for Supply Chains

On June 29, 2026, Oracle announced four new Fusion Agentic Applications for Oracle Fusion Cloud Supply Chain & Manufacturing, built around coordinated teams of AI agents. ([prnewswire.com](https://www.prnewswire.com/news-releases/oracle-adds-new-fusion-agentic-applications-to-help-customers-improve-supply-chain-performance-302812443.html)) The tools focus on inventory planning, supplier qualification, production readiness and Kanban management, and are already available to Oracle Cloud SCM customers. ([zonebourse.com](https://www.zonebourse.com/actualite-bourse/oracle-annonce-quatre-nouvelles-applications-fusion-agentic-et-des-capacites-d-optimisation-des-stoc-ce7f5fded08cf62d))

Jun 29, 2026 | release | Impact: 70/100

GLM‑5.2: Open Chinese model hits frontier cyber

On June 28, 2026, multiple outlets reported that Zhipu AI’s open‑weight GLM‑5.2 model matches or beats Anthropic’s restricted Claude Mythos on security bug‑finding benchmarks, based on tests by security firm Semgrep. Analysts say the result, if confirmed, exposes gaps in US export controls that targeted Mythos-class capabilities while leaving competitive Chinese open models accessible.

Jun 28, 2026 | benchmark | Impact: 90/100

Gemini 3.5 Flash Learns to Use Your Computer

On June 28, 2026, Sina Finance reported that Google has moved its "computer use" tool from a separate Gemini 2.5 model into Gemini 3.5 Flash as a built‑in capability, allowing the model to see screens and operate browsers, mobile apps and desktop software. The feature is in public preview and targets long‑running enterprise automation workloads, with tight integration into Google Cloud. ([finance.sina.com.cn](https://finance.sina.com.cn/stock/usstock/summary/2026-06-28/doc-iniewwyh6700122.shtml))

Jun 27, 2026 | release | Impact: 80/100

Gemini 3.5 Flash Gets Built‑In Computer Control

Forbes reported on June 27, 2026 that Google has integrated its "computer use" capability directly into the Gemini 3.5 Flash model, allowing the AI to see screens and control browsers, desktops and mobile apps like a human user. The upgrade, rolled out on June 24 and highlighted publicly on June 27, targets developers and enterprises building agentic workflows that automate multi‑step tasks across devices.

Jun 27, 2026 | release | Impact: 70/100

Sakana, 360 Ship Mythos‑Class Orchestration Alternatives

On June 27, 2026, TechCrunch reported that China’s 360 Security Technology and Japan’s Sakana AI have unveiled new AI systems positioned as alternatives to Anthropic’s Mythos and Fable 5 models, which remain restricted under a U.S. export order. 360 introduced its Tulongfeng and Yitianzhen cybersecurity tools, while Sakana AI launched its Fugu orchestration model targeting Japanese enterprises seeking frontier capabilities without U.S. export‑control risk.

Jun 27, 2026 | release | Impact: 70/100

Leading Organizations

OpenAI

Anthropic

Microsoft

Google

ArXiv Categories

cs.AI, cs.MA, cs.LG, cs.SE

Related Frontiers

Science Robotics