Alibaba has unveiled Qwen3.7‑Max, a proprietary flagship AI model designed for long‑running agentic coding tasks, and claims it outperforms OpenAI and Google models on coding benchmarks. In tests described on May 28, 2026, Alibaba says Qwen3.7‑Max autonomously optimized code for one of its AI chips over 35 hours, running 432 kernel tests and making more than 1,100 tool calls without human intervention.
This article aggregates reporting from 2 news sources. The TL;DR is AI-generated from original reporting. Race to AGI's analysis provides editorial context on implications for AGI development.
Qwen3.7‑Max pushes Alibaba deeper into the agent era, explicitly optimizing for long‑horizon, tool‑using workflows rather than chat UX. The reported 35‑hour autonomous run on an unfamiliar chip kernel (hundreds of tests, more than a thousand tool calls, and a 10x speedup) reads like a targeted demonstration that Chinese labs can field models capable of serious systems‑level reasoning, not just text synthesis. Even if some claims are marketing‑heavy, the direction is clear: Qwen is being positioned as a full‑stack coding and operations agent, not just another LLM. ([newsbytesapp.com](https://www.newsbytesapp.com/news/science/alibaba-says-its-new-ai-beats-chatgpt-gemini-in-coding/story))
Strategically, Alibaba is also making a sharp turn away from its earlier open‑weight Qwen releases: Qwen3.7‑Max is proprietary, API‑only and tightly integrated into Alibaba Cloud’s Model Studio. That mirrors Western moves around GPT‑5‑series and Claude Opus, and it suggests that the economic frontier is now in agent orchestration and hosting, not just model weights. For the race to AGI, that’s important because it shifts experimentation into environments controlled by a handful of hyperscalers, including Alibaba.
If Qwen3.7‑Max really does match or beat top US models on coding while being cheaper in its home region, it will give Chinese developers a strong local default. That increases the chance that future breakthrough agent architectures and tooling ecosystems emerge first around Chinese APIs, then get distilled back into Western stacks.



