Back to Frontiers

Agentic Systems

Maturing30%

Autonomous agents, tool use, multi-agent collaboration, and extended autonomous action. AI that can act in the world.

agentstool-usecomputer-useAutoGPTmulti-agentfunction callingautonomous systems
95
Papers
35
Milestones
$71.2B
Funding
3
Benchmarks

Key Benchmarks

SWE-bench Verified

Real-world software engineering tasks from GitHub issues (500 validated samples)

74.9%Human: 87%
Leader: GPT-5medium saturation

WebArena

Autonomous web navigation and task completion across 812 tasks

74.3%Human: 78.2%
Leader: Deepseek v3.2high saturation

AISI Cyber Evals

UK AI Safety Institute cybersecurity capability evaluations

58%Human: 90%
Leader: Gemini 3 Prolow saturation

Recent Papers

Recent Milestones

Claude agents build 100k‑line C compiler in Rust

Techzine reported on February 10 that 16 instances of Anthropic’s Claude Opus 4.6, running as independent AI agents, collaboratively wrote roughly 100,000 lines of Rust code to create a working C compiler. The experiment, originally detailed by Anthropic researcher Nicholas Carlini, took about two weeks, cost around US$20,000 in API usage, and produced a compiler capable of building a Linux 6.9 kernel and major open‑source projects.

Feb 10, 2026breakthroughImpact: 90/100

Harvey Legal AI Chases $11B Valuation

TechCrunch reports that legal AI startup Harvey is in talks to raise about $200 million at an $11 billion valuation, led by Sequoia and Singapore’s GIC. The company, which sells LLM‑based tools to law firms, has already climbed from a $3 billion valuation in early 2025 to $8 billion after a $160 million round late last year.([techcrunch.com](https://techcrunch.com/2026/02/09/harvey-reportedly-raising-at-11b-valuation-just-months-after-it-hit-8b/))

Feb 9, 2026fundingImpact: 70/100

Claude Cowork Triggers SaaS Panic

On February 8, 2026, Chilean outlet G5noticias reported that the launch of Anthropic’s Claude Cowork automation tools triggered sell‑offs in software, legal and financial services stocks. Analysts fear enterprises could internalize tasks previously outsourced to SaaS and data providers.

Feb 8, 2026releaseImpact: 80/100

Goldman rolls out Claude agents across the bank

Goldman Sachs is rolling out AI agents built with Anthropic’s Claude model to automate trade accounting, compliance checks and client onboarding after six months of joint development. CIO Marco Argenti said on February 8, 2026 that the agents, already tested internally, will launch soon to support back-office staff handling a portion of the bank’s $2.5 trillion in assets under supervision.

Feb 8, 2026releaseImpact: 80/100

Goldman deploys Claude AI agents in back office

On February 7, 2026, Indian outlet The Financial Express reported that Goldman Sachs is rolling out Anthropic’s Claude‑based AI agents to automate accounting, paperwork, and client onboarding tasks after six months of co‑development. MLQ.ai and other tech outlets say the agents already support workflows tied to a portion of Goldman’s $2.5 trillion in assets under supervision and have cut onboarding times by about 30%. ([financialexpress.com](https://www.financialexpress.com/life/technology-goldman-sachs-adopts-anthropic-ai-agents-to-automate-accounting-and-back-office-tasks-4134437/))

Feb 7, 2026releaseImpact: 70/100

SoftBank bets on OpenAI Frontier for Japan Inc

On Feb. 6, 2026, SB OAI Japan and SoftBank announced they will base their “Crystal Intelligence” enterprise AI solution on OpenAI’s new Frontier platform.([softbank.jp](https://www.softbank.jp/corp/news/press/sbkk/2026/20260206_01/)) Frontier is being validated inside SoftBank, with a full rollout to Japanese enterprises targeted for 2026.

Feb 6, 2026releaseImpact: 70/100

OpenAI Frontier launches enterprise agent platform

On February 5, 2026, OpenAI announced Frontier, a new platform for enterprises to build, deploy and govern AI agents across their existing systems, initially available to a limited set of customers. TechCrunch reported the launch at 10:09 a.m. PST, noting early adopters including HP, Oracle, State Farm, Thermo Fisher and Uber.([openai.com](https://openai.com/index/introducing-openai-frontier/))

Feb 5, 2026releaseImpact: 80/100

Claude Opus 4.6 adds 1M context and agent teams

Anthropic launched Claude Opus 4.6 on February 5, 2026, adding “agent teams” that split large tasks across multiple coordinated agents and extending context to 1 million tokens. TechCrunch reported the release at 9:51 a.m. PST, highlighting gains in multi‑step coding, analytics and new PowerPoint integration.([techcrunch.com](https://techcrunch.com/2026/02/05/anthropic-releases-opus-4-6-with-new-agent-teams/))

Feb 5, 2026releaseImpact: 80/100

GPT‑5.3‑Codex pushes state-of-the-art coding

OpenAI released GPT‑5.3‑Codex, a new agentic coding model, on February 5, 2026, claiming state‑of‑the‑art performance on key coding and agent benchmarks and 25% faster execution than GPT‑5.2‑Codex. TechCrunch reported the launch at 12:01 p.m. PST, shortly after Anthropic unveiled its own upgraded Claude Opus 4.6 model for agentic coding.([openai.com](https://openai.com/index/introducing-gpt-5-3-codex/))

Feb 5, 2026releaseImpact: 90/100

ElevenLabs Raises $500M for Voice & Agentic AI

On February 5, 2026, ElevenLabs announced a $500 million Series D round at an $11 billion valuation, led by Sequoia Capital with heavy participation from Andreessen Horowitz and Iconiq. The company says it will use the capital to expand its ElevenAgents conversational AI platform, invest in “audio AGI” research and grow teams in hubs including Tokyo, Bengaluru, London and New York. ([prtimes.jp](https://prtimes.jp/main/html/rd/p/000000023.000160611.html))

Feb 5, 2026fundingImpact: 70/100

Roblox 4D AI Lets Players Spawn Live Objects

Roblox opened an “open beta” for its new 4D creation feature on February 4, 2026, letting users generate fully interactive objects rather than static 3D models. The system builds on Roblox’s Cube 3D generative model, which has already produced over 1.8 million user-created assets since launch.([techcrunch.com](https://techcrunch.com/2026/02/04/robloxs-4d-creation-feature-is-now-available-in-open-beta/))

Feb 4, 2026releaseImpact: 70/100

Claude Cowork Triggers $300B 'SaaSpocalypse

Between February 3–4, 2026, global software and data stocks lost roughly $300 billion in market value after Anthropic launched new Claude Cowork plugins, including a legal automation tool, that investors fear could undercut traditional software and IT services. Indian IT and BPO stocks fell up to 8–10%, while major U.S. and European software names also sold off sharply.([economictimes.indiatimes.com](https://economictimes.indiatimes.com/markets/stocks/news/rs-1-75-lakh-crore-saaspocalypse-for-it-stocks-explained-what-it-means-for-investors/articleshow/127900151.cms?from=mdr))

Feb 4, 2026releaseImpact: 80/100

Gemini Tests Full Android App Control

Chinese tech outlet IT之家 reports that strings in the Google app 17.4 beta reveal an experimental “Complete tasks with Gemini” feature, codenamed “bonobo,” enabling Gemini to control Android apps via screen automation. The prototype would let Gemini place orders or book rides inside specified apps, with Google warning users to closely supervise its actions.([finance.sina.com.cn](https://finance.sina.com.cn/tech/digi/2026-02-04/doc-inhkrtwh2936393.shtml))

Feb 4, 2026releaseImpact: 80/100

Alexa+ Becomes Free Prime-Scale AI Agent

On February 4, 2026, Amazon made Alexa+, its generative AI-powered assistant, available to all U.S. customers, bundling unlimited access into Prime memberships and offering a $19.99/month standalone tier. Alexa+ runs on large language models including Amazon Nova and Anthropic and adds agentic capabilities like booking services, managing calendars, and handling complex requests across devices.([techcrunch.com](https://techcrunch.com/2026/02/04/alexa-amazons-ai-assistant-is-now-available-to-everyone-in-the-u-s/))

Feb 4, 2026releaseImpact: 90/100

ElevenLabs raises $500M for voice & agent platform

ElevenLabs announced on February 4, 2026 that it raised a $500 million Series D round valuing the company at $11 billion, led by Sequoia Capital with major follow‑on investments from Andreessen Horowitz and ICONIQ. On February 5, 2026, outlets including eWeek and Music Business Worldwide reported the funding as one of the largest private generative‑AI deals to date.([elevenlabs.io](https://elevenlabs.io/blog/series-d))

Feb 4, 2026fundingImpact: 80/100

Zoom’s federated AI tops Humanity’s Last Exam

On January 21, 2026, UC Today reported that Zoom’s federated AI system scored 53.0 on the Humanity’s Last Exam benchmark, outperforming OpenAI’s GPT‑5.2 Pro and Google’s Gemini Deep Research. Zoom attributes the result to its architecture that routes queries across its own small models and partner models from OpenAI, Anthropic, Meta and others.

Jan 21, 2026benchmarkImpact: 70/100

Baidu tests multi-agent AI group chats

On January 21, 2026, Baidu’s Wenxin (Ernie) app confirmed internal testing of a new “multi-person, multi-agent” group chat feature. Company insiders said the tool is meant for task-focused AI collaboration, not to compete directly with WeChat.

Jan 21, 2026releaseImpact: 70/100

AI Slows Hiring at India’s IT Giants

The Register reports that India’s top four IT outsourcers — HCL, Infosys, TCS and Wipro — added just 3,910 staff over the past year, an unusually slow pace given historic quarterly hiring of 10,000+ workers. Management at all four firms told investors they are leaning heavily on AI to automate client work and streamline their own operations. ([theregister.com](https://www.theregister.com/2026/01/19/hcl_infosys_tcs_wipro_results/))

Jan 19, 2026releaseImpact: 70/100

Claude Cowork Shows Agents Hitting SaaS

Semafor reports that Anthropic’s new Claude Cowork agent has contributed to a selloff in SaaS software stocks, with investors worried about automation of white-collar workflows. The tool, launched January 12 and now going viral, lets Claude autonomously build apps, edit and organize local files, and handle complex office tasks with minimal input. ([semafor.com](https://www.semafor.com/article/01/18/2026/anthropics-new-ai-agent-claude-cowork-pushes-down-software-stocks?utm_source=openai))

Jan 18, 2026releaseImpact: 80/100

Claude agents power large cyber‑espionage op

On January 17, 2026, Cyber Magazine reported on Anthropic’s November 2025 disclosure that a Chinese state‑backed group used its Claude Code tool to automate 80–90% of a cyber‑espionage campaign against about 30 organizations worldwide. Anthropic says it detected the activity in mid‑September 2025, shut down the attackers’ access within 10 days and published a 13‑page technical report describing the AI‑orchestrated operation.([cybermagazine.com](https://cybermagazine.com/news/ai-agents-drive-first-large-scale-autonomous-cyberattack))

Jan 17, 2026breakthroughImpact: 80/100

Leading Organizations

OpenAI
Anthropic
Microsoft
Google

ArXiv Categories

cs.AIcs.MAcs.LGcs.SE

Related Frontiers