Agentic Systems
Autonomous agents, tool use, multi-agent collaboration, and extended autonomous action. AI that can act in the world.
Recent Papers
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling
Chulun Zhou, Chunkang Zhang, Guoxin Yu +4 more
Adaptation of Agentic AI
Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi +15 more
VIVA: VLM-Guided Instruction-Based Video Editing with Reward Optimization
Xiaoyan Cong, Haotian Yang, Angtian Wang +4 more
AdaTooler-V: Adaptive Tool-Use for Images and Videos
Chaoyang Wang, Kaituo Feng, Dongyang Chen +8 more
VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer
Songqiao Hu, Zeyi Liu, Shuang Liu +3 more
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning
Zhe Liu, Runhui Huang, Rui Yang +6 more
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment
Mahir Labib Dihan, Tanzima Hashem, Mohammed Eunus Ali +1 more
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
Jingzhe Ding, Shengda Long, Changxin Pu +2 more
Memory in the Age of AI Agents
Yuyang Hu, Shichun Liu, Yanwei Yue +2 more
MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data
Zhenghao Zhu, Chuxue Cao, Sirui Han +4 more
Recent Milestones
Gmail rolls out Gemini 3 AI Inbox
On January 10, 2026, Google began rolling out a Gemini 3–powered overhaul of Gmail that adds AI conversation summaries, natural-language search, and a new AI Inbox view that prioritises urgent messages and VIP contacts. The update is debuting for English-speaking users in the US, with advanced drafting and proofreading tools reserved for Google AI Pro and Ultra subscribers.
DeepSeek V4 aims past GPT and Claude at code
Chinese startup DeepSeek is expected to launch its next-generation V4 AI model in mid‑February 2026, with a strong focus on software development tasks. According to reporting based on internal tests, V4 could outperform Anthropic’s Claude and OpenAI’s GPT series on coding benchmarks and handle exceptionally long programming prompts.
Nvidia–Groq and Meta–Manus mark 2026 AI land grab
Spanish outlet ElNacional.cat reported from Barcelona at 5:30 a.m. CET on Jan. 4, 2026 that Nvidia is paying around $20 billion for a licensing and talent deal with Groq, while Meta is acquiring Chinese‑founded AI agent startup Manus for an estimated $2–3 billion. The column frames these moves as emblematic of an AI M&A boom and a prelude to blockbuster IPOs from SpaceX, OpenAI and Anthropic. ([elnacional.cat](https://www.elnacional.cat/oneconomia/es/on-ia/semana-ia-inteligencia-artificial-va-compras_1532545_102.html))
Claude co-builds Rue, a new systems language
Rust core contributor Steve Klabnik is building a new systems programming language called Rue, aimed at memory safety without garbage collection and simpler ergonomics than Rust. In a January 3, 2026 interview, he said Anthropic’s Claude wrote most of Rue’s ~70,000 lines of Rust code over two weeks, effectively acting as a co‑developer on the compiler.
OpenAI unifies teams for audio‑first AI device
Dataconomy reports that OpenAI has consolidated engineering, product and research teams over the last two months to overhaul its audio models for a new ‘audio-first’ personal device targeted for about a year from now, citing The Information. The project, involving former Apple design chief Jony Ive, aims to ship hardware that acts as an AI companion using more natural, interruption-tolerant speech models. ([dataconomy.com](https://dataconomy.com/2026/01/02/openai-unifies-teams-to-build-audio-device-with-jony-ive/))
Gulf states emerge as third AI compute pole
Semafor reported on December 31, 2025 that 2025’s AI deals in the Gulf—spanning Saudi Arabia’s HUMAIN, Abu Dhabi’s G42 and MGX, and Qatar’s new Qai venture—map out aggressive national AI strategies. Qatar’s Qai plans to invest $20 billion in AI infrastructure with Brookfield, while HUMAIN struck chip deals with Nvidia and Qualcomm and MGX partnered with BlackRock on data centers. ([semafor.com](https://www.semafor.com/article/12/31/2025/gulf-ai-deals-in-2025-reveal-ambitious-national-strategies))
Meta’s $2B+ Manus deal bets big on AI agents
Meta said on December 30, 2025 it is acquiring Singapore-based AI agent startup Manus, which runs a paid “general‑purpose” agent service for research, coding and other tasks. The company did not disclose terms, but multiple outlets report the deal is valued at more than $2 billion and will see Manus’ technology integrated into Meta AI across Facebook, Instagram and WhatsApp while Manus continues operating from Singapore.
China Spots Agent + Infra Power Combo
Chinese outlet 投资界 (Pedaily) ran a December 30 wrap-up emphasizing Meta’s multi‑billion‑dollar acquisition of Manus and SoftBank’s $4 billion deal for DigitalBridge as landmark AI infrastructure and application bets. The piece frames both transactions as evidence of surging capital concentration around general AI agents and hyperscale data centers. ([news.pedaily.cn](https://news.pedaily.cn/202512/559307.shtml))
Meta Bets Big on Manus General AI Agent
Meta Platforms agreed on December 30 to acquire AI agent startup Manus, whose general-purpose agent is already generating over $100 million in annualized revenue. The company plans to integrate Manus’s autonomous ‘general agent’ technology into Meta AI and other consumer and business products, while continuing to offer Manus as a standalone service. ([businesstimes.com.sg](https://www.businesstimes.com.sg/companies-markets/telcos-media-tech/meta-acquire-singapore-based-startup-manus-adding-agents-bolster-ai-bet))
Claude Opus 4.5 Released
Anthropic releases Claude Opus 4.5, described by the company as the best model for coding, agents, and computer use.
Claude Opus 4 Released
Anthropic releases Claude Sonnet 4 and Claude Opus 4 with enhanced agentic capabilities.
OpenAI o3 Released
OpenAI releases o3 and o4-mini with advanced agentic capabilities.
Anthropic Model Context Protocol (MCP) Released
Anthropic releases the Model Context Protocol (MCP), an open standard for connecting AI models to external tools and data sources.
Claude Computer Use Released
Anthropic releases Claude Computer Use, enabling AI to control desktop applications autonomously.
Devin AI Software Engineer Launch
Cognition launches Devin, which it bills as the first AI software engineer capable of end-to-end development.