Back to Frontiers

Agentic Systems

High71%

Autonomous agents, tool use, multi-agent collaboration, and extended autonomous action. AI that can act in the world.

agentstool-usecomputer-useAutoGPTmulti-agentfunction callingautonomous systems
59
Papers
15
Milestones
$71.2B
Funding
3
Benchmarks

Key Benchmarks

SWE-bench Verified

Real-world software engineering tasks from GitHub issues (500 validated samples)

79.2%Human: 87%
Leader: Claude Opus 4.5medium saturation

WebArena

Autonomous web navigation and task completion across 812 tasks

71.2%Human: 78.2%
Leader: GPT-5 Browser Agentmedium saturation

AISI Cyber Evals

UK AI Safety Institute cybersecurity capability evaluations

58%Human: 90%
Leader: Gemini 3 Prolow saturation

Recent Papers

Recent Milestones

Gmail rolls out Gemini 3 AI Inbox

On January 10, 2026, Google began rolling out a Gemini 3–powered overhaul of Gmail that adds AI conversation summaries, natural-language search, and a new AI Inbox view that prioritises urgent messages and VIP contacts. The update is debuting for English-speaking users in the US, with advanced drafting and proofreading tools reserved for Google AI Pro and Ultra subscribers.

Jan 10, 2026releaseImpact: 70/100

DeepSeek V4 aims past GPT and Claude at code

Chinese startup DeepSeek is expected to launch its next-generation V4 AI model in mid‑February 2026, with a strong focus on software development tasks. According to reporting based on internal tests, V4 could outperform Anthropic’s Claude and OpenAI’s GPT series on coding benchmarks and handle exceptionally long programming prompts.

Jan 9, 2026releaseImpact: 80/100

Nvidia–Groq and Meta–Manus mark 2026 AI land grab

Spanish outlet ElNacional.cat reported from Barcelona at 5:30 a.m. CET on Jan. 4, 2026 that Nvidia is paying around $20 billion for a licensing and talent deal with Groq, while Meta is acquiring Chinese‑founded AI agent startup Manus for an estimated $2–3 billion. The column frames these moves as emblematic of an AI M&A boom and a prelude to blockbuster IPOs from SpaceX, OpenAI and Anthropic.([elnacional.cat](https://www.elnacional.cat/oneconomia/es/on-ia/semana-ia-inteligencia-artificial-va-compras_1532545_102.html))

Jan 4, 2026fundingImpact: 80/100

Claude co-builds Rue, a new systems language

Rust core contributor Steve Klabnik is building a new systems programming language called Rue, aimed at memory safety without garbage collection and simpler ergonomics than Rust. In a January 3, 2026 interview, he said Anthropic’s Claude wrote most of Rue’s ~70,000 lines of Rust code over two weeks, effectively acting as a co‑developer on the compiler.

Jan 3, 2026breakthroughImpact: 70/100

OpenAI unifies teams for audio‑first AI device

Dataconomy reports that OpenAI has consolidated engineering, product and research teams over the last two months to overhaul its audio models for a new ‘audio-first’ personal device targeted for about a year from now, citing The Information. The project, involving former Apple design chief Jony Ive, aims to ship hardware that acts as an AI companion using more natural, interruption-tolerant speech models.([dataconomy.com](https://dataconomy.com/2026/01/02/openai-unifies-teams-to-build-audio-device-with-jony-ive/))

Jan 2, 2026releaseImpact: 70/100

Gulf states emerge as third AI compute pole

Semafor reported on December 31, 2025 that 2025’s AI deals in the Gulf—spanning Saudi Arabia’s HUMAIN, Abu Dhabi’s G42 and MGX, and Qatar’s new Qai venture—map out aggressive national AI strategies. Qatar’s Qai plans to invest $20 billion in AI infrastructure with Brookfield, while HUMAIN struck chip deals with Nvidia and Qualcomm and MGX partnered with BlackRock on data centers. ([semafor.com](https://www.semafor.com/article/12/31/2025/gulf-ai-deals-in-2025-reveal-ambitious-national-strategies))

Dec 31, 2025fundingImpact: 80/100

Meta’s $2B+ Manus deal bets big on AI agents

Meta said on December 30, 2025 it is acquiring Singapore-based AI agent startup Manus, which runs a paid “general‑purpose” agent service for research, coding and other tasks. The company did not disclose terms, but multiple outlets report the deal is valued at more than $2 billion and will see Manus’ technology integrated into Meta AI across Facebook, Instagram and WhatsApp while Manus continues operating from Singapore.

Dec 30, 2025fundingImpact: 80/100

China Spots Agent + Infra Power Combo

Chinese outlet 投资界 (Pedaily) ran a December 30 wrap-up emphasizing Meta’s multi‑billion‑dollar acquisition of Manus and SoftBank’s $4 billion deal for DigitalBridge as landmark AI infrastructure and application bets. The piece frames both transactions as evidence of surging capital concentration around general AI agents and hyperscale data centers.([news.pedaily.cn](https://news.pedaily.cn/202512/559307.shtml))

Dec 30, 2025fundingImpact: 70/100

Meta Bets Big on Manus General AI Agent

Meta Platforms agreed on December 30 to acquire AI agent startup Manus, whose general-purpose agent is already generating over $100 million in annualized revenue. The company plans to integrate Manus’s autonomous ‘general agent’ technology into Meta AI and other consumer and business products, while continuing to offer Manus as a standalone service.([businesstimes.com.sg](https://www.businesstimes.com.sg/companies-markets/telcos-media-tech/meta-acquire-singapore-based-startup-manus-adding-agents-bolster-ai-bet))

Dec 30, 2025releaseImpact: 80/100

Claude Opus 4.5 Released

Anthropic releases Claude Opus 4.5, described as best model for coding, agents, and computer use.

Nov 24, 2025releaseImpact: 95/100

Claude Opus 4 Released

Anthropic releases Claude Sonnet 4 and Claude Opus 4 with enhanced agentic capabilities.

May 22, 2025releaseImpact: 90/100

OpenAI o3 Released

OpenAI releases o3 and o4-mini with advanced agentic capabilities.

Apr 16, 2025releaseImpact: 92/100

Anthropic MCP Protocol Released

Anthropic releases Model Context Protocol for standardized AI-tool integration.

Nov 25, 2024releaseImpact: 78/100

Claude Computer Use Released

Anthropic releases Claude Computer Use, enabling AI to control desktop applications autonomously.

Oct 22, 2024releaseImpact: 92/100

Devin AI Software Engineer Launch

Cognition launches Devin, the first AI software engineer capable of end-to-end development.

Mar 12, 2024releaseImpact: 80/100

Leading Organizations

OpenAI
Anthropic
Microsoft
Google

ArXiv Categories

cs.AIcs.MAcs.LGcs.SE

Related Frontiers