Foundation Models & Reasoning

Maturing (1%)

Core model architectures, training methods, chain-of-thought reasoning, and test-time compute scaling. The backbone of modern AI capabilities.

transformers, scaling laws, chain-of-thought, o1, reasoning, test-time compute, world models
Papers: 156
Milestones: 77
Funding: $97.2B
Benchmarks: 3

Key Benchmarks

GPQA Diamond

Graduate-level science questions requiring PhD-level expertise

94.3% (Human: 69.7%)
Leader: Gemini 3.1 Pro (high saturation)

MMLU-Pro

Massive Multitask Language Understanding - Pro version with 10 answer choices and harder reasoning

90.1% (Human: 89%)
Leader: Gemini 3 Pro (high saturation)

HLE (Humanity's Last Exam)

2,500 questions at the frontier of human knowledge across 100+ subjects

44.32% (Human: 95%)
Leader: GPT-5.4 Pro (low saturation)

Recent Milestones

Anthropic Books $200B in Google Cloud & TPUs

On May 7, 2026, ET CIO, citing Reuters and The Information, reported that Anthropic has agreed to spend about $200 billion over five years on Google Cloud services and TPU chips as part of a recent capacity deal. The commitment would account for more than 40% of Google Cloud’s disclosed revenue backlog, while Alphabet is separately planning to invest up to $40 billion in Anthropic.

May 7, 2026 | funding | Impact: 90/100

Anthropic + SpaceX secure mega‑cluster Colossus 1

On May 6, 2026, Anthropic announced a partnership with SpaceX to use all compute capacity at the Colossus 1 data center in Memphis, adding over 300 MW and 220,000+ Nvidia GPUs within a month. The deal lets Anthropic immediately double Claude Code’s five-hour rate limits, lift peak‑hour throttling for Pro/Max, and substantially raise Opus API rate limits.

May 6, 2026 | funding | Impact: 90/100

AI Big Three Cut Major Model Cycles to ~50 Days

On May 5, 2026, South Korea’s AJU PRESS reported that Google, OpenAI and Anthropic have compressed their average major model release cadence to roughly 50 days over the last six months. The analysis warns that this acceleration raises competitive barriers for Korean foundation‑model efforts despite new public funds.

May 5, 2026 | release | Impact: 80/100

Harvard Ties Neural Learning to Physics Theory

On May 5, 2026, TechXplore reported that Harvard researchers have built a simplified ridge‑regression model that uses renormalization theory to explain why large neural networks can generalize without overfitting. The toy model connects high‑dimensional statistical fluctuations to stable learning behavior, offering a potential theoretical foundation for deep learning scaling laws. ([techxplore.com](https://techxplore.com/news/2026-05-simple-physics-ai.html))

May 5, 2026 | paper | Impact: 70/100

GPT‑5.5 Instant Becomes Default ChatGPT Engine

On May 5, 2026, OpenAI began rolling out GPT‑5.5 Instant as the new default model for ChatGPT, replacing GPT‑5.3 Instant. The company says the model cuts hallucinated claims by over 50% on high‑stakes prompts while delivering faster, more concise, and more personalized responses.

May 5, 2026 | release | Impact: 80/100

OpenAI o1 matches ER doctors in Harvard trial

On May 4, 2026, Indian Express reported on a Harvard Medical School and Beth Israel Deaconess study showing OpenAI’s o1 and GPT‑4o models correctly identified exact or near‑exact diagnoses in 67% of emergency‑room cases at initial triage, versus 50–55% for attending physicians. With more patient data, o1’s accuracy rose to 82%, matching or beating doctors across later diagnostic stages. ([indianexpress.com](https://indianexpress.com/article/technology/artificial-intelligence/harvard-study-ai-doctors-emergency-room-trial-findings-10671938/))

May 4, 2026 | benchmark | Impact: 90/100

Anthropic taps $1.5B PE JV for Claude rollout

Anthropic is close to forming a roughly $1.5 billion joint venture with private equity giants Blackstone, Hellman & Friedman and Goldman Sachs to sell AI tools to portfolio companies, according to Reuters and Wall Street Journal reporting on May 4, 2026. The JV would see Anthropic and key investors each contribute hundreds of millions of dollars to commercialize Claude-based systems across private equity-backed firms. ([es.marketscreener.com](https://es.marketscreener.com/noticias/anthropic-ultima-una-joint-venture-de-1-500-millones-de-d-lares-con-firmas-de-wall-street-seg-n-e-ce7f58deda8af227?utm_source=openai))

May 4, 2026 | funding | Impact: 80/100

Korea backs Upstage with ₩560B sovereign AI bet

On May 3, 2026, South Korea’s National Growth Fund approved a direct equity investment of about 560 billion won (~$381 million) into AI startup Upstage. The funding will support Upstage’s Korean-language foundation models and a broader sovereign AI initiative, alongside related investments in a national AI computing center and strategic industries.

May 3, 2026 | funding | Impact: 80/100

NIST: DeepSeek V4 is top China model, 8 months behind US

The US National Institute of Standards and Technology’s Center for AI Standards and Innovation (CAISI) has released an evaluation of DeepSeek V4 Pro, calling it the most capable Chinese model tested so far. The April 2026 tests, published May 1 and summarized on May 4, 2026, conclude DeepSeek V4 trails top US frontier models by roughly eight months on aggregate capability, despite strong scores in math, software engineering and natural sciences. ([nist.gov](https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro?utm_source=openai))

May 1, 2026 | benchmark | Impact: 80/100

Anthropic locks 5GW Amazon compute through 2030s

On May 3, 2026, Caproasia reported that Amazon plans to invest up to $25 billion in Anthropic on top of earlier funding, in exchange for more than $100 billion in Claude‑related spend on AWS and up to 5 gigawatts of Trainium‑based compute over the next decade. The underlying April 20 announcements by Anthropic and Amazon described $5 billion in immediate equity, up to $20 billion in milestone‑based commitments, and a multi‑year cloud and chip partnership. ([caproasia.com](https://www.caproasia.com/2026/05/03/united-states-2-9-trillion-amazon-to-invest-25-billion-in-united-states-380-billion-ai-research-startup-anthropic-5-billion-on-20-4-26-20-billion-subject-to-milestones-anthropic-raised-30-b/))

Apr 20, 2026 | funding | Impact: 90/100

Gemma 4 vs Llama 4 vs Qwen 3.5: Open Titans

On April 5, 2026, US-based consultancy Lushbinary published an in-depth comparison of three flagship open-weight model families: Google DeepMind’s Gemma 4, Meta’s Llama 4 and Alibaba’s Qwen 3.5. The piece compares licensing, performance, context length, multimodality and deployment trade-offs for production use. ([lushbinary.com](https://www.lushbinary.com/blog/gemma-4-vs-llama-4-vs-qwen-3-5-open-weight-model-comparison/))

Apr 5, 2026 | release | Impact: 80/100

400B Open Reasoning Model Undercuts Claude

On April 3, 2026, Arcee AI released Trinity-Large-Thinking, an Apache 2.0–licensed 400B-parameter sparse Mixture-of-Experts reasoning model that activates 13B parameters per token. The model scores 91.9 on PinchBench, within two points of Anthropic’s Claude Opus 4.6, while Arcee prices output at $0.90 per million tokens, roughly 96% cheaper than Opus. Trinity-Large-Thinking is available via OpenRouter, DigitalOcean’s Agentic Inference Cloud and downloadable weights on Hugging Face.

Apr 3, 2026 | release | Impact: 80/100
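The sparsity and pricing figures in the Trinity-Large-Thinking item can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the inputs are the rounded numbers quoted above, and the "implied reference price" is merely what a 96% discount would imply, not a quoted Opus price.

```python
# Rounded figures taken from the Trinity-Large-Thinking item above.
TOTAL_PARAMS_B = 400        # total parameters, in billions
ACTIVE_PARAMS_B = 13        # parameters activated per token, in billions
OUTPUT_PRICE_PER_M = 0.90   # USD per million output tokens
QUOTED_DISCOUNT = 0.96      # "roughly 96% cheaper" (rounded in the source)


def active_fraction(active_b: float, total_b: float) -> float:
    """Fraction of the model's parameters used for each token."""
    return active_b / total_b


def implied_reference_price(price: float, discount: float) -> float:
    """Price p such that `price` is `discount` cheaper than p."""
    return price / (1.0 - discount)


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    ref = implied_reference_price(OUTPUT_PRICE_PER_M, QUOTED_DISCOUNT)
    print(f"Active parameters per token: {frac:.2%} of the model")
    # Ballpark only: the 96% figure is rounded, so this is not a list price.
    print(f"Implied comparison price: about ${ref:.2f} per million output tokens")
```

Because the discount is rounded, a one-point change in it (95% or 97%) moves the implied comparison price substantially, so the result should be read as an order-of-magnitude check only.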

China Labs Turn Token Sales Into Real Revenue

On April 3, 2026 at 14:49 local time in Shanghai, Xinhua reported that Chinese labs MiniMax, Zhipu AI and Moonshot AI are driving a global ‘token economy’ with rapid growth in API usage and overseas adoption. Zhipu AI’s 2025 revenue jumped 131.9% year‑on‑year with token sales up 292.6%, while MiniMax’s 2025 revenue rose 158.9%, with about 70% coming from international markets. Moonshot’s Kimi K2.5 model was recently adopted as the base engine for U.S. coding platform Cursor.

Apr 3, 2026 | funding | Impact: 70/100

Microsoft Pours $10B into Japan AI Stack

On April 3, 2026, Microsoft announced that it will invest $10 billion in Japan between 2026 and 2029 to expand AI data centers, strengthen cybersecurity and train one million engineers. The package includes partnerships with SoftBank and Sakura Internet to provide sovereign GPU infrastructure and in-country AI compute for Japanese customers.

Apr 3, 2026 | funding | Impact: 80/100

AI Soaks Up 81% of Record $300B Q1 VC

Multiple analyses published on April 2, 2026 report that global venture funding hit roughly $297–300 billion in Q1 2026, the highest quarter on record. Around $239–242 billion, or about 81%, went to AI companies, led by mega-rounds for OpenAI, Anthropic, xAI and Waymo.

Apr 2, 2026 | funding | Impact: 90/100
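Since both the AI total and the overall total in the item above are quoted as ranges, the "about 81%" share is itself a range. A minimal sketch of that arithmetic follows; the function name is hypothetical and the inputs are the rounded figures from the item, in billions of US dollars.

```python
def share_bounds(part_low: float, part_high: float,
                 total_low: float, total_high: float) -> tuple[float, float]:
    """Smallest and largest possible part/total given ranges on both."""
    return part_low / total_high, part_high / total_low


# Figures from the item above, in billions of US dollars.
ai_low, ai_high = 239, 242
total_low, total_high = 297, 300

low, high = share_bounds(ai_low, ai_high, total_low, total_high)
print(f"AI share of Q1 2026 venture funding: {low:.1%} to {high:.1%}")
# prints "AI share of Q1 2026 venture funding: 79.7% to 81.5%"
```

The quoted 81% sits near the top of this band, which suggests the underlying analyses paired the higher AI figure with the lower overall total.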

Record $297B Q1 as AI megadeals dominate

On April 1, 2026, TechCrunch reported that global startup funding hit $297 billion in Q1 2026, the largest quarter on record. The spike was driven by four outsized rounds, including massive financings for OpenAI, Anthropic, xAI and Waymo that together accounted for roughly two‑thirds of the total. Seed‑stage AI startups are also raising at historically rich valuations.

Apr 1, 2026 | funding | Impact: 90/100

OpenAI’s Record $122B Round Supercharges AGI Race

OpenAI said on March 31, 2026 that it closed a $122 billion funding round at an $852 billion post‑money valuation, the largest private tech raise on record. The round, anchored by Amazon, Nvidia, SoftBank and Microsoft, funds massive chip and data center expansion and comes as OpenAI reports a revenue run‑rate of $2 billion per month. Follow‑on coverage on April 1 from outlets across India, Europe, Latin America and the Middle East detailed the investor mix, retail participation and plans for an AI “superapp.”

Mar 31, 2026 | funding | Impact: 100/100

Hyperscalers Lock In Massive 2026 AI Capex

A March 8, 2026 sector report from Chinese brokerage Guosen Securities finds that Microsoft, Meta, Amazon and Google all sharply increased capital expenditure in Q4 2025, with aggressive 2026 guidance largely driven by AI infrastructure. Microsoft’s FY26 Q2 capex hit $37.5 billion, Google guided $175–185 billion for 2026 capex, and Amazon plans about $200 billion, with much of the spend earmarked for GPUs, custom AI chips and cloud AI services.

Mar 8, 2026 | funding | Impact: 90/100

Claude user surge challenges ChatGPT dominance

On March 7, 2026, AI Insider reported that Anthropic’s Claude app has overtaken ChatGPT in U.S. daily mobile downloads and reached 11.3 million daily active users, with more than 1 million sign‑ups per day since late February. The growth follows the Pentagon’s decision to label Anthropic a supply‑chain risk, even as Microsoft, Google and AWS reaffirmed they will keep offering Claude for non‑defense workloads. In a separate Mozilla partnership, Claude Opus 4.6 uncovered 22 Firefox security vulnerabilities in two weeks. ([theaiinsider.tech](https://theaiinsider.tech/2026/03/07/claude-surges-in-user-growth-and-enterprise-adoption-as-anthropic-challenges-pentagon-restrictions/))

Mar 7, 2026 | release | Impact: 80/100

Japan backs seven homegrown LLMs for government

On March 6, 2026, Japan’s Digital Agency announced it has selected seven domestically developed large language models, from NTT Data, Customer Cloud, KDDI/ELYZA, SoftBank, NEC, Fujitsu and Preferred Networks, for trial use in its “Government AI” platform GENNAI. A related press release from Customer Cloud confirmed at 13:44 JST that its CC Gov‑LLM is among the models to be evaluated for administrative workflows. ([digital.go.jp](https://www.digital.go.jp/news/10d55c63-b3e1-42b9-9cc5-93a06943ae0e))

Mar 6, 2026 | release | Impact: 70/100

Leading Organizations

OpenAI
DeepMind
Anthropic
Meta

ArXiv Categories

cs.LG, cs.AI, cs.CL
