Agentic Systems
Autonomous agents, tool use, multi-agent collaboration, and extended autonomous action. AI that can act in the world.
Key Benchmarks
Recent Papers
R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning
Wanlong Liu, Bo Zhang, Chenliang Li +4 more
InCoder-32B-Thinking: Industrial Code World Model for Thinking
Jian Yang, Wei Zhang, Jiajun Wu +19 more
RouteGoT: Node-Adaptive Routing for Cost-Efficient Graph of Thoughts Reasoning
Yuhang Liu, Ruijie Wang, Yunlong Chu +4 more
MASFactory: A Graph-centric Framework for Orchestrating LLM-Based Multi-Agent Systems with Vibe Graphing
Yang Liu, Jinxuan Cai, Yishen Li +6 more
Reinforcement World Model Learning for LLM-based Agents
Xiao Yu, Baolin Peng, Ruize Xu +6 more
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
Haozhen Zhang, Quanyu Long, Jianzhu Bao +4 more
Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
Zhenxiong Yu, Zhi Yang, Zhiheng Jin +19 more
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
Johannes Kirmayr, Lukas Stappen, Elisabeth André
Optimization is Not Enough: Why Problem Formulation Deserves Equal Attention
Iván Olarte Rodríguez, Gokhan Serhat, Mariusz Bujny +3 more
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
Fangzhi Xu, Hang Yan, Qiushi Sun +16 more
Recent Milestones
SpaceX IPO bid forces Wall St into Grok
On April 4, 2026, crypto-focused outlet MEXC News reported that SpaceX has confidentially filed for a U.S. IPO targeting up to a $1.75 trillion valuation and a raise of as much as $75 billion. Related coverage from Meyka highlighted reports that banks seeking senior roles on the deal are being asked to buy paid subscriptions to xAI’s Grok chatbot as part of their engagement.
Linux Foundation Backs x402 Agent Payments
On April 3, 2026, MEXC News relayed that the Linux Foundation has launched the x402 Foundation to steward Coinbase’s x402 protocol as an open standard for AI agent payments. Google, Microsoft, Amazon Web Services, Visa, Mastercard, Stripe and others signaled support for the new foundation, which targets on‑chain and fiat payments initiated by AI agents.
Qwen3.6-Plus: 1M-token agentic coding model
On April 2, 2026, Alibaba’s Qwen unit released Qwen 3.6‑Plus, a new large language model with a 1‑million‑token context window aimed at “agentic coding” and multimodal code generation. The model is offered via Alibaba Cloud’s Model Studio with pricing tailored to enterprise workloads.
Claude Code leak reveals always-on dev agents
On April 1, 2026, multiple outlets reported that Anthropic accidentally shipped an npm update of its Claude Code CLI that exposed more than 500,000 lines of internal TypeScript source code. The leak, traced to a misconfigured source map in version 2.1.88, revealed unreleased features such as a background ‘Proactive’ or daemon mode and detailed agent orchestration internals. Anthropic confirmed the incident and said no customer data or credentials were exposed.
Claude Enters Live Military Targeting Stack
On March 8, 2026, Paraguay’s La Nación, citing AFP, detailed how AI-enabled systems like the US Maven Smart System, reportedly integrated with Palantir software and Anthropic’s Claude model, are accelerating target selection in the escalating conflict between the US, Israel and Iran. Experts say major militaries are investing heavily in AI across logistics, reconnaissance and electronic warfare, and that questions about accountability when AI-assisted strikes hit civilian targets remain unresolved.
Claude Code hits 100% self-written codebase
Anthropic engineer Boris Cherny said on March 7, 2026 that Claude Code, the company’s AI coding assistant, now writes 100% of its own codebase, with humans providing direction and review. An OfficeChai report on March 8 recounts that Anthropic’s internal teams are already shipping thousands of lines per pull request generated entirely by Claude Code.
AWS ships agentic AI into hospital back offices
Amazon Web Services has launched Amazon Connect Health, an AI agent platform that automates tasks like patient verification, appointment scheduling, documentation and medical coding for healthcare providers. AWS says the HIPAA‑eligible service plugs directly into electronic health record systems and is already in use at systems like UC San Diego Health and One Medical, with more capabilities rolling out over time. ([aboutamazon.com](https://www.aboutamazon.com/news/aws/amazon-connect-health-ai-healthcare))
Japan Pilots Gennai AI for 180K Officials
Japan’s Digital Agency said on March 7, 2026, that it will launch a large-scale test of its in-house “Gennai” generative AI platform with about 180,000 staff across all ministries and agencies starting in May. The pilot will evaluate AI for drafting, summarisation and workflow support before a planned full-scale rollout in fiscal 2027.([japantimes.co.jp](https://www.japantimes.co.jp/news/2026/03/07/japan/digital-agency-ai-test-may/?utm_source=openai))
AWS launches Amazon Connect Health AI agents
Amazon Web Services launched Amazon Connect Health on March 5, 2026 as a HIPAA-eligible agentic AI platform to automate patient verification, appointment management, ambient documentation, and medical coding for healthcare providers.([aws.amazon.com](https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-connect-health-agentic-ai-healthcare/?utm_source=openai)) The service is initially available in AWS US East (N. Virginia) and US West (Oregon) regions and integrates with Amazon Connect contact centers and EHR systems.([aws.amazon.com](https://aws.amazon.com/about-aws/whats-new/2026/03/amazon-connect-health-agentic-ai-healthcare/?utm_source=openai))
Claude agents build 100k‑line C compiler in Rust
Techzine reported on February 10, 2026, that 16 instances of Anthropic’s Claude Opus 4.6, running as independent AI agents, collaboratively wrote roughly 100,000 lines of Rust code to create a working C compiler. The experiment, originally detailed by Anthropic researcher Nicholas Carlini, took about two weeks, cost around US$20,000 in API usage, and produced a compiler capable of building a Linux 6.9 kernel and major open‑source projects.
Harvey Legal AI Chases $11B Valuation
TechCrunch reports that legal AI startup Harvey is in talks to raise about $200 million at an $11 billion valuation, led by Sequoia and Singapore’s GIC. The company, which sells LLM‑based tools to law firms, has already climbed from a $3 billion valuation in early 2025 to $8 billion after a $160 million round late last year.([techcrunch.com](https://techcrunch.com/2026/02/09/harvey-reportedly-raising-at-11b-valuation-just-months-after-it-hit-8b/))
Claude Cowork Triggers SaaS Panic
On February 8, 2026, Chilean outlet G5noticias reported that the launch of Anthropic’s Claude Cowork automation tools triggered sell‑offs in software, legal and financial services stocks. Analysts fear enterprises could internalize tasks previously outsourced to SaaS and data providers.
Goldman rolls out Claude agents across the bank
Goldman Sachs is rolling out AI agents built with Anthropic’s Claude model to automate trade accounting, compliance checks and client onboarding after six months of joint development. CIO Marco Argenti said on February 8, 2026 that the agents, already tested internally, will launch soon to support back-office staff handling a portion of the bank’s $2.5 trillion in assets under supervision.
Goldman deploys Claude AI agents in back office
On February 7, 2026, Indian outlet The Financial Express reported that Goldman Sachs is rolling out Anthropic’s Claude‑based AI agents to automate accounting, paperwork, and client onboarding tasks after six months of co‑development. MLQ.ai and other tech outlets say the agents already support workflows tied to a portion of Goldman’s $2.5 trillion in assets under supervision and have cut onboarding times by about 30%. ([financialexpress.com](https://www.financialexpress.com/life/technology-goldman-sachs-adopts-anthropic-ai-agents-to-automate-accounting-and-back-office-tasks-4134437/))
SoftBank bets on OpenAI Frontier for Japan Inc
On Feb. 6, 2026, SB OAI Japan and SoftBank announced they will base their “Crystal Intelligence” enterprise AI solution on OpenAI’s new Frontier platform.([softbank.jp](https://www.softbank.jp/corp/news/press/sbkk/2026/20260206_01/)) Frontier is being validated inside SoftBank, with a full rollout to Japanese enterprises targeted for 2026.
OpenAI Frontier launches enterprise agent platform
On February 5, 2026, OpenAI announced Frontier, a new platform for enterprises to build, deploy and govern AI agents across their existing systems, initially available to a limited set of customers. TechCrunch reported the launch at 10:09 a.m. PST, noting early adopters including HP, Oracle, State Farm, Thermo Fisher and Uber.([openai.com](https://openai.com/index/introducing-openai-frontier/))
Claude Opus 4.6 adds 1M context and agent teams
Anthropic launched Claude Opus 4.6 on February 5, 2026, adding “agent teams” that split large tasks across multiple coordinated agents and extending context to 1 million tokens. TechCrunch reported the release at 9:51 a.m. PST, highlighting gains in multi‑step coding, analytics and new PowerPoint integration.([techcrunch.com](https://techcrunch.com/2026/02/05/anthropic-releases-opus-4-6-with-new-agent-teams/))
GPT‑5.3‑Codex pushes state-of-the-art coding
OpenAI released GPT‑5.3‑Codex, a new agentic coding model, on February 5, 2026, claiming state‑of‑the‑art performance on key coding and agent benchmarks and 25% faster execution than GPT‑5.2‑Codex. TechCrunch reported the launch at 12:01 p.m. PST, shortly after Anthropic unveiled its own upgraded Claude Opus 4.6 model for agentic coding.([openai.com](https://openai.com/index/introducing-gpt-5-3-codex/))
ElevenLabs Raises $500M for Voice & Agentic AI
On February 5, 2026, ElevenLabs announced a $500 million Series D round at an $11 billion valuation, led by Sequoia Capital with heavy participation from Andreessen Horowitz and Iconiq. The company says it will use the capital to expand its ElevenAgents conversational AI platform, invest in “audio AGI” research and grow teams in hubs including Tokyo, Bengaluru, London and New York. ([prtimes.jp](https://prtimes.jp/main/html/rd/p/000000023.000160611.html))
Roblox 4D AI Lets Players Spawn Live Objects
Roblox launched an open beta of its new 4D creation feature on February 4, 2026, letting users generate fully interactive objects rather than static 3D models. The system builds on Roblox’s Cube 3D generative model, which has produced over 1.8 million user-created assets since launch.([techcrunch.com](https://techcrunch.com/2026/02/04/robloxs-4d-creation-feature-is-now-available-in-open-beta/))