Foundation Models & Reasoning
Core model architectures, training methods, chain-of-thought reasoning, and test-time compute scaling. The backbone of modern AI capabilities.
Recent Papers
Make Your LVLM KV Cache More Lightweight
Anonymous (ICLR and TMLR drafts; arXiv metadata lists named authors)
End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer
Wenda Chu, Bingliang Zhang, Jiaqi Han +4 more
Let ViT Speak: Generative Language-Image Pre-training
Yan Fang, Mengcheng Lan, Zilong Huang +7 more
LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning
Adam Ishay, Joohyung Lee
Characterizing the Consistency of the Emergent Misalignment Persona
Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants
Junan Hu, Jian Liu, Jingxiang Lai +6 more
Rethinking Agentic Reinforcement Learning In Large Language Models
Fangming Cui, Ruixiao Zhu, Cheng Fang +2 more
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
Tao Ge, Baolin Peng, Hao Cheng +1 more
In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks
Simon Dennis, Michael Diamond, Rivaan Patil +2 more
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
Yujun Wu, Dongxu Zhang, Xinchen Li +10 more
Recent Milestones
BitCPM-CANN Brings 1.58-Bit LLMs to Phones
On May 25, 2026, ModelBest (面壁智能), Tsinghua University and the OpenBMB community released and open-sourced BitCPM-CANN, described as China’s first 1.58‑bit (ternary) large model trained end‑to‑end on Huawei Ascend hardware. The series includes 0.5B, 1B, 3B and 8B parameter models and reportedly delivers about 6× memory savings with 90–97.2% capability retention versus BF16, enabling 8B models to run on flagship smartphones. ([ithome.com](https://www.ithome.com/0/954/759.htm))
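For a rough sense of scale, the memory figures above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not taken from the BitCPM-CANN release: it assumes weights dominate memory, that a ternary weight costs log2(3) ≈ 1.58 bits in the ideal case, and it uses hypothetical helper names (`weight_gib`, `ideal_compression`). The ideal ratio it computes is higher than the reported ~6×; one plausible reason, which the source does not state, is that some layers stay at higher precision or that weights are packed at 2 bits each.

```python
import math

BITS_BF16 = 16                 # bits per weight in BF16
BITS_TERNARY = math.log2(3)    # ~1.585 bits: information content of a value in {-1, 0, +1}
REPORTED_SAVINGS = 6.0         # "about 6x" figure quoted in the summary above


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB, ignoring activations and KV cache."""
    return n_params * bits_per_weight / 8 / 2**30


def ideal_compression() -> float:
    """Best-case BF16-to-ternary compression ratio under perfect packing."""
    return BITS_BF16 / BITS_TERNARY


if __name__ == "__main__":
    # Parameter counts from the summary; treated here as exact for illustration.
    sizes = {"0.5B": 0.5e9, "1B": 1e9, "3B": 3e9, "8B": 8e9}
    print(f"ideal compression vs BF16: {ideal_compression():.2f}x")
    for name, n in sizes.items():
        bf16 = weight_gib(n, BITS_BF16)
        ideal = weight_gib(n, BITS_TERNARY)
        reported = bf16 / REPORTED_SAVINGS
        print(f"{name}: BF16 {bf16:.2f} GiB, ideal ternary {ideal:.2f} GiB, "
              f"at reported 6x {reported:.2f} GiB")
```

Under these assumptions, the 8B model's weights drop from roughly 15 GiB in BF16 to a few GiB, which is consistent with the claim that it fits on a flagship phone.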
Anthropic Books $200B in Google Cloud & TPUs
On May 7, 2026, ET CIO, citing Reuters and The Information, reported that Anthropic has agreed to spend about $200 billion over five years on Google Cloud services and TPU chips as part of a recent capacity deal. The commitment would account for more than 40% of Google Cloud’s disclosed revenue backlog, while Alphabet is separately planning to invest up to $40 billion in Anthropic.
Anthropic + SpaceX secure mega‑cluster Colossus 1
On May 6, 2026, Anthropic announced a partnership with SpaceX to use all compute capacity at the Colossus 1 data center in Memphis, adding over 300 MW and 220,000+ Nvidia GPUs within a month. The deal lets Anthropic immediately double Claude Code’s five-hour rate limits, lift peak‑hour throttling for Pro/Max, and substantially raise Opus API rate limits.
AI Big Three Cut Major Model Cycles to ~50 Days
On May 5, 2026, South Korea’s AJU PRESS reported that Google, OpenAI and Anthropic have compressed their average major model release cadence to roughly 50 days over the last six months. The analysis warns that this acceleration raises competitive barriers for Korean foundation‑model efforts despite new public funds.
Harvard Ties Neural Learning to Physics Theory
On May 5, 2026, TechXplore reported Harvard researchers have built a simplified ridge‑regression model that uses renormalization theory to explain why large neural networks can generalize without overfitting. The toy model connects high‑dimensional statistical fluctuations to stable learning behavior, offering a potential theoretical foundation for deep learning scaling laws. ([techxplore.com](https://techxplore.com/news/2026-05-simple-physics-ai.html))
GPT‑5.5 Instant Becomes Default ChatGPT Engine
On May 5, 2026, OpenAI began rolling out GPT‑5.5 Instant as the new default model for ChatGPT, replacing GPT‑5.3 Instant. The company says the model cuts hallucinated claims by over 50% on high‑stakes prompts while delivering faster, more concise, and more personalized responses.
OpenAI o1 matches ER doctors in Harvard trial
On May 4, 2026, Indian Express reported on a Harvard Medical School and Beth Israel Deaconess study showing OpenAI’s o1 and GPT‑4o models correctly identified exact or near‑exact diagnoses in 67% of emergency‑room cases at initial triage, versus 50–55% for attending physicians. With more patient data, o1’s accuracy rose to 82%, matching or beating doctors across later diagnostic stages. ([indianexpress.com](https://indianexpress.com/article/technology/artificial-intelligence/harvard-study-ai-doctors-emergency-room-trial-findings-10671938/))
Anthropic taps $1.5B PE JV for Claude rollout
Anthropic is close to forming a roughly $1.5 billion joint venture with private equity giants Blackstone, Hellman & Friedman and Goldman Sachs to sell AI tools to portfolio companies, according to Reuters and Wall Street Journal reporting on May 4, 2026. The JV would see Anthropic and key investors each contribute hundreds of millions of dollars to commercialize Claude-based systems across private equity-backed firms. ([es.marketscreener.com](https://es.marketscreener.com/noticias/anthropic-ultima-una-joint-venture-de-1-500-millones-de-d-lares-con-firmas-de-wall-street-seg-n-e-ce7f58deda8af227?utm_source=openai))
Korea backs Upstage with ₩560B sovereign AI bet
On May 3, 2026, South Korea’s National Growth Fund approved a direct equity investment of about 560 billion won (~$381 million) into AI startup Upstage. The funding will support Upstage’s Korean-language foundation models and a broader sovereign AI initiative, alongside related investments in a national AI computing center and strategic industries.
NIST: DeepSeek V4 is top China model, 8 months behind US
The US National Institute of Standards and Technology’s Center for AI Standards and Innovation (CAISI) has released an evaluation of DeepSeek V4 Pro, calling it the most capable Chinese model tested so far. The April 2026 tests, published May 1 and summarized on May 4, 2026, conclude DeepSeek V4 trails top US frontier models by roughly eight months on aggregate capability, despite strong scores in math, software engineering and natural sciences. ([nist.gov](https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro?utm_source=openai))
Anthropic locks 5GW Amazon compute through 2030s
On May 3, 2026, Caproasia reported that Amazon plans to invest up to $25 billion in Anthropic on top of earlier funding, in exchange for more than $100 billion in Claude‑related spend on AWS and up to 5 gigawatts of Trainium‑based compute over the next decade. The underlying April 20 announcements by Anthropic and Amazon described $5 billion in immediate equity, up to $20 billion in milestone‑based commitments, and a multi‑year cloud and chip partnership. ([caproasia.com](https://www.caproasia.com/2026/05/03/united-states-2-9-trillion-amazon-to-invest-25-billion-in-united-states-380-billion-ai-research-startup-anthropic-5-billion-on-20-4-26-20-billion-subject-to-milestones-anthropic-raised-30-b/))
Gemma 4 vs Llama 4 vs Qwen 3.5: Open Titans
On April 5, 2026, US-based consultancy Lushbinary published an in-depth comparison of three flagship open-weight model families: Google DeepMind’s Gemma 4, Meta’s Llama 4 and Alibaba’s Qwen 3.5. The piece compares licensing, performance, context length, multimodality and deployment trade-offs for production use. ([lushbinary.com](https://www.lushbinary.com/blog/gemma-4-vs-llama-4-vs-qwen-3-5-open-weight-model-comparison/))
400B Open Reasoning Model Undercuts Claude
On April 3, 2026, Arcee AI released Trinity-Large-Thinking, an Apache 2.0–licensed 400B-parameter sparse Mixture-of-Experts reasoning model that activates 13B parameters per token. The model scores 91.9 on PinchBench, within two points of Anthropic’s Claude Opus 4.6, while Arcee prices output at $0.90 per million tokens, roughly 96% cheaper than Opus. Trinity-Large-Thinking is available via OpenRouter, DigitalOcean’s Agentic Inference Cloud and downloadable weights on Hugging Face.
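The sparsity and pricing figures in this item reduce to two small calculations, sketched below. The function names are hypothetical, and the reference price is only what the article's own "roughly 96% cheaper" claim implies; it is not taken from Anthropic's price list.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a sparse MoE model's parameters used per token."""
    return active_params / total_params


def implied_reference_price(price: float, discount: float) -> float:
    """Price of the comparison model implied by 'price is `discount` cheaper'."""
    return price / (1.0 - discount)


if __name__ == "__main__":
    # Figures as quoted in the summary above.
    frac = active_fraction(13e9, 400e9)
    ref = implied_reference_price(0.90, 0.96)
    print(f"active parameters per token: {frac:.2%}")
    print(f"implied comparison price: ${ref:.2f} per million output tokens")
```

The first number shows why a 400B-parameter model can be served at a low per-token price: only a small slice of the weights participates in each forward pass, although all of them still have to be held in memory.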
China Labs Turn Token Sales Into Real Revenue
On April 3, 2026 at 14:49 local time in Shanghai, Xinhua reported that Chinese labs MiniMax, Zhipu AI and Moonshot AI are driving a global ‘token economy’ with rapid growth in API usage and overseas adoption. Zhipu AI’s 2025 revenue jumped 131.9% year‑on‑year with token sales up 292.6%, while MiniMax’s 2025 revenue rose 158.9% with about 70% from international markets, and Moonshot’s Kimi K2.5 model was recently adopted as the base engine for U.S. coding platform Cursor.
Microsoft Pours $10B into Japan AI Stack
Microsoft announced on April 3, 2026 it will invest $10 billion in Japan between 2026 and 2029 to expand AI data centers, strengthen cybersecurity and train one million engineers. The package includes partnerships with SoftBank and Sakura Internet to provide sovereign GPU infrastructure and in-country AI compute for Japanese customers.
AI Soaks Up 81% of Record $300B Q1 VC
Multiple analyses published on April 2, 2026 report that global venture funding hit roughly $297–300 billion in Q1 2026, the highest quarter on record. Around $239–242 billion, or about 81%, went to AI companies, led by mega-rounds for OpenAI, Anthropic, xAI and Waymo.
Record $297B Q1 as AI megadeals dominate
On April 1, 2026, TechCrunch reported that global startup funding hit $297 billion in Q1 2026, the largest quarter on record. The spike was driven by four outsized rounds, including massive financings for OpenAI, Anthropic, xAI and Waymo that together accounted for roughly two‑thirds of the total. Seed‑stage AI startups are also raising at historically rich valuations.
OpenAI’s Record $122B Round Supercharges AGI Race
OpenAI said on March 31, 2026 it closed a $122 billion funding round at an $852 billion post‑money valuation, the largest private tech raise on record. The round, anchored by Amazon, Nvidia, SoftBank and Microsoft, funds massive chip and data center expansion and comes with OpenAI’s revenue run‑rate at a reported $2 billion per month. Follow‑on coverage on April 1 from outlets across India, Europe, Latin America and the Middle East detailed the investor mix, retail participation and plans for an AI “superapp.”
Hyperscalers Lock In Massive 2026 AI Capex
A March 8, 2026 sector report from Chinese brokerage Guosen Securities finds that Microsoft, Meta, Amazon and Google all sharply increased 2025 Q4 capital expenditure, with aggressive 2026 guidance largely driven by AI infrastructure. Microsoft’s FY26 Q2 capex hit $37.5 billion, Google guided $175–185 billion for 2026 capex, and Amazon plans about $200 billion, with much of the spend earmarked for GPUs, custom AI chips and cloud AI services.
Claude user surge challenges ChatGPT dominance
On March 7, 2026, AI Insider reported that Anthropic’s Claude app has overtaken ChatGPT in U.S. daily mobile downloads and reached 11.3 million daily active users, with more than 1 million sign‑ups per day since late February. The growth follows the Pentagon’s decision to label Anthropic a supply‑chain risk, even as Microsoft, Google and AWS reaffirmed they will keep offering Claude for non‑defense workloads. Separately, a Mozilla partnership saw Claude Opus 4.6 uncover 22 Firefox security vulnerabilities in two weeks. ([theaiinsider.tech](https://theaiinsider.tech/2026/03/07/claude-surges-in-user-growth-and-enterprise-adoption-as-anthropic-challenges-pentagon-restrictions/))