Agentic Systems

Steady · 40%

Autonomous agents, tool use, multi-agent collaboration, and extended autonomous action. AI that can act in the world.

agents, tool-use, computer-use, AutoGPT, multi-agent, function calling, autonomous systems

140 Papers · 58 Milestones · $71.2B Funding · 3 Benchmarks

Key Benchmarks

SWE-bench Verified

Real-world software engineering tasks from GitHub issues (500 validated samples)

80.9% · Human: 87%
Leader: Claude Opus 4.5 · medium saturation

WebArena

Autonomous web navigation and task completion across 812 tasks

74.3% · Human: 78.2%
Leader: DeepSeek V3.2 · high saturation

AISI Cyber Evals

UK AI Safety Institute cybersecurity capability evaluations

58% · Human: 90%
Leader: Gemini 3 Pro · low saturation

Recent Milestones

Fujitsu rolls out self-evolving AI agent teams

On May 25, 2026, Fujitsu announced a self-evolving multi-AI agent technology built around its business-specific Takane large language model. The system lets multiple AI agents work as a team and continuously learn from operations, human feedback and policy or specification changes while automatically tuning prompts, evaluation criteria and models for domains such as manufacturing, healthcare, finance and public administration.

May 25, 2026 · release · Impact: 70/100

xAI Completes 1.5T Grok V9 Coding Model

On May 25, 2026, Elon Musk announced that xAI has completed training of the 1.5-trillion‑parameter Grok V9-Medium foundation model, now entering fine-tuning and reinforcement learning. The new model, trained with substantial Cursor coding data, is slated for public release in 2–3 weeks, while the current 0.5T V8-Small model serving production traffic is planned to be open-sourced by year-end.

May 25, 2026 · release · Impact: 70/100

EPAM, Anthropic sign multi‑year safe AI deal

On May 6, 2026, EPAM Systems announced a strategic multi‑year partnership with Anthropic to help Global 2000 clients build and scale enterprise AI using Claude models. EPAM is creating a 10,000‑strong Claude‑certified architect practice, with over 20,000 employees already trained via Anthropic Academy.

May 6, 2026 · funding · Impact: 70/100

Sinopec deploys Fenghuo industrial AI agents

On May 6, 2026, Sinopec announced its "Fenghuo" industrial AI agent, built on the company’s Great Wall large model, as a digital expert for petrochemical operations. The system can run simulations, process‑modeling tools and long‑duration tasks across roles like "Fenghuo Scientist" and "Fenghuo Engineer" in live production environments.

May 6, 2026 · release · Impact: 80/100

Perplexity launches AI terminal for finance pros

On May 6, 2026, Perplexity launched “Computer for Professional Finance,” an AI platform for analysts that combines licensed market data with pre‑built research workflows. The product integrates feeds from providers like Morningstar and PitchBook and emphasizes traceable outputs linked back to source documents.

May 6, 2026 · release · Impact: 70/100

Google preps Gemini ‘Remy’ 24/7 AI agent

On May 6, 2026, Moneycontrol reported that Google is internally testing an AI agent codenamed “Remy” that would let Gemini proactively manage emails, calendars and other tasks. The tool, described in internal documents, may be previewed at Google I/O but has not yet been officially announced.

May 6, 2026 · release · Impact: 70/100

China becomes mass testbed for AI agents

An AP feature published late on May 5, 2026 describes how more than 600 million people in China were using generative AI by December 2025, a 142% year‑on‑year increase. Chinese models’ weekly token usage has recently surpassed US models, turning China into a large‑scale testing ground for AI agents like OpenClaw and services from companies such as DeepSeek, Tencent, Alibaba and Baidu.

May 6, 2026 · breakthrough · Impact: 80/100

Meta readies agentic AI for daily tasks

On May 6, 2026, UAE daily Al Khaleej, citing the Financial Times and The Information, reported that Meta is internally testing a highly personalized AI assistant, powered by a new model called “Muse Spark,” designed to carry out users’ daily tasks. Meta is also training an internal agent codenamed “Hatch” and plans to integrate a separate AI shopping tool into Instagram before Q4 2026.

May 6, 2026 · release · Impact: 70/100

IBM Rolls Out Full Stack for Enterprise AI Agents

On May 5, 2026, IBM used its Think 2026 conference to announce a broad expansion of its enterprise AI stack, including multi‑agent orchestration, AI‑ready data, and an AI‑powered operations platform. IBM also made its Sovereign Core platform generally available to help enterprises and governments run AI in regulated, sovereignty‑constrained environments. ([newsroom.ibm.com](https://newsroom.ibm.com/2026-05-05-Think-2026-IBM-Delivers-the-Blueprint-for-the-AI-Operating-Model-as-the-AI-Divide-Widens))

May 5, 2026 · release · Impact: 80/100

OpenAI Fast-Tracks 2027 AI Agent Phone

On May 5, 2026, analyst Ming‑Chi Kuo reported that OpenAI is fast‑tracking development of its first "AI agent phone," now aiming for mass production as early as the first half of 2027. The device is expected to use a customized MediaTek Dimensity 9600 chip on TSMC’s N2P node and target tens of millions of units over 2027–2028.

May 5, 2026 · release · Impact: 80/100

HUMAIN & AWS Launch OS for AI Agents

On May 5, 2026, Saudi AI company HUMAIN announced HUMAIN ONE, an expanded strategic collaboration with Amazon Web Services to deploy enterprise‑grade AI agents on AWS infrastructure. The initiative aims to let public and private organizations run agentic generative AI with built‑in security, data sovereignty and regulatory compliance. ([news.nate.com](https://news.nate.com/view/20260505n05287?utm_source=openai))

May 5, 2026 · release · Impact: 70/100

OpenAI & Anthropic Launch $11.5B Deployment Arms

On May 5, 2026, Awesome Agents detailed how OpenAI and Anthropic have each launched private‑equity‑backed enterprise AI deployment companies, together valued at around $11.5 billion. OpenAI’s “The Deployment Company” and Anthropic’s new services firm will embed forward‑deployed engineers inside portfolio companies of major PE investors to roll out AI tools at scale. ([awesomeagents.ai](https://awesomeagents.ai/news/openai-anthropic-enterprise-deployment-jv/))

May 5, 2026 · funding · Impact: 90/100

OpenAI spins up $10bn enterprise Deployment Company

OpenAI has finalized The Deployment Company, a joint venture with private equity firms including TPG, Brookfield, Advent and Bain Capital, to push its models into portfolio companies at scale. The vehicle raised over $4 billion in outside capital on May 4, 2026, at a valuation of around $10 billion, positioning it as a dedicated enterprise AI deployment arm. ([thenextweb.com](https://thenextweb.com/news/openai-deployco-finalized-10-billion-joint-venture))

May 4, 2026 · funding · Impact: 90/100

Anthropic lands $1.5bn Claude deployment JV

Anthropic has launched a new AI-native enterprise services firm with Blackstone, Hellman & Friedman and Goldman Sachs as founding partners, backed by about $1.5 billion in committed capital. Announced on May 4, 2026, the venture will embed Claude engineers and models directly into mid-market companies, positioning itself as an alternative to traditional consulting for AI transformation. ([techcrunch.com](https://techcrunch.com/2026/05/04/anthropic-and-openai-are-both-launching-joint-ventures-for-enterprise-ai-services/))

May 4, 2026 · funding · Impact: 80/100

SpaceX IPO bid forces Wall St into Grok

On April 4, 2026, crypto-focused outlet MEXC News reported that SpaceX has confidentially filed for a U.S. IPO targeting up to a $1.75 trillion valuation and a raise of as much as $75 billion. Related coverage from Meyka highlighted reports that banks seeking senior roles on the deal are being asked to buy paid subscriptions to xAI’s Grok chatbot as part of their engagement.

Apr 4, 2026 · funding · Impact: 80/100

Linux Foundation Backs x402 Agent Payments

On April 3, 2026, MEXC reported that the Linux Foundation has launched the x402 Foundation to steward Coinbase’s x402 protocol as an open standard for AI agent payments. Google, Microsoft, Amazon Web Services, Visa, Mastercard, Stripe and others signaled support for the new foundation, which targets on‑chain and fiat payments initiated by AI agents.

Apr 3, 2026 · release · Impact: 90/100

Qwen3.6-Plus: 1M-token agentic coding model

On April 2, 2026, Alibaba’s Qwen unit released Qwen 3.6‑Plus, a new large language model with a 1‑million‑token context window aimed at “agentic coding” and multimodal code generation. The model is offered via Alibaba Cloud’s Model Studio with pricing tailored to enterprise workloads.

Apr 2, 2026 · release · Impact: 80/100

Claude Code leak reveals always-on dev agents

On April 1, 2026, multiple outlets reported that Anthropic accidentally shipped an npm update of its Claude Code CLI that exposed more than 500,000 lines of internal TypeScript source code. The leak, traced to a misconfigured source map in version 2.1.88, revealed unreleased features such as a background ‘Proactive’ or daemon mode and detailed agent orchestration internals. Anthropic confirmed the incident and said no customer data or credentials were exposed.

Apr 1, 2026 · release · Impact: 80/100

Claude Enters Live Military Targeting Stack

On March 8, 2026, Paraguay’s La Nación, citing AFP, detailed how AI-enabled systems like the US Maven Smart System, reportedly integrated with Palantir software and Anthropic’s Claude model, are accelerating target selection in the escalating conflict between the US, Israel and Iran. Experts say major militaries are heavily investing in AI across logistics, reconnaissance and electronic warfare, while raising unresolved questions about accountability when AI-assisted strikes hit civilian targets.

Mar 8, 2026 · release · Impact: 80/100

Claude Code hits 100% self-written codebase

Anthropic engineer Boris Cherny said on March 7, 2026 that Claude Code, the company’s AI coding assistant, now writes 100% of its own codebase, with humans providing direction and review. An OfficeChai report on March 8 recounts that Anthropic’s internal teams are already shipping thousands of lines per pull request generated entirely by Claude Code.

Mar 8, 2026 · release · Impact: 80/100

Leading Organizations

OpenAI
Anthropic
Microsoft
Google

ArXiv Categories

cs.AI, cs.MA, cs.LG, cs.SE