Alibaba Cloud said revenue from model-as-a-service (MaaS) token usage grew 15-fold in the first five months of 2026. The company disclosed the figure at a May 20 launch event, saying monthly revenue from the segment has reached a nine-figure RMB sum. It tied the increase to one driver: artificial intelligence agents.

The company also released Qwen 3.7-Max, its latest flagship model, alongside a slate of other products. The launch came just one month after Qwen 3.6-Max, underscoring how quickly large models are being updated as agentic coding, the use of AI agents to write and execute code, becomes a central competitive front.

That pace is partly a response to OpenClaw’s popularity in February, which pushed model companies to improve coding performance for agent scenarios. Competition among large language models (LLMs) is increasingly centered on coding, and Alibaba needs a strong model in that field to maintain the competitiveness of its MaaS business.

Alibaba is not alone. That same day, Google held its developer conference in Silicon Valley, which similarly centered on cloud-based agentic AI. Its new chips, models, applications, and other products were all organized around agents.

For major cloud providers, agentic coding has become one of the clearest points of consensus in AI.

Alibaba Cloud’s answer began with the new Qwen Cloud website. It is designed for agents and, according to Alibaba Cloud, is the first standalone website the company has created for a single business in its 17-year history.

“Qwen Cloud is designed for agents, not humans,” Liu said. The idea came from an internal judgment Alibaba Cloud made at the end of 2024: over time, the primary users of cloud computing products would shift from human engineers to AI agents.

Previously, a developer or enterprise that wanted to deploy services on the cloud had to open the company’s website, register, navigate hundreds of product categories, choose machine types, configure networks, start instances, install environments, and connect APIs. Every step required human engineering judgment, creating a meaningful onboarding barrier.

Qwen Cloud changes that sequence. An agent first looks for a model, then tools and skills, and only afterward the underlying cloud resources. The order is reversed.

Alibaba Cloud saw one example after OpenClaw launched. An agent could automatically activate cloud computing resources within a single day, a process that previously took human engineers two weeks. “In the future, people will not need to activate these resources. Agents will automatically activate cloud computing resources in the background,” Liu said.

The website is only the entry point. Alibaba Cloud is adapting its stack around agents, from models and infrastructure to chips.

The clearest example is Qwen 3.7-Max, which arrived only one month after Qwen 3.6-Max. Alibaba has long built influence and credibility in open source, but compared with domestic rivals such as Z.ai’s GLM and Moonshot AI’s Kimi, its flagship model did not receive the full benefit of the surge that OpenClaw created. That made stronger coding performance a priority.

Compared with Qwen 3.6-Max Preview, Qwen 3.7-Max’s biggest upgrade is stronger long-horizon task capability. Alibaba Cloud said agents running on the model can independently execute complex tasks that span dozens of hours and more than 1,000 steps without human intervention.

The stronger a model’s long-horizon task capability, the more complex the tasks an agent can generally complete and the less human intervention it needs. This is also a competitive dimension for leading agent products such as Claude Code and Gemini Deep Research.

Zhou Jingren, CTO of Alibaba Cloud, gave one example: on a new chip platform from subsidiary T-Head Semiconductor, Qwen 3.7-Max reportedly used autonomous programming and more than 1,000 tool calls to evolve a key platform kernel, increasing inference speed tenfold from the original version.

In practice, that means a model can independently fix code defects like an experienced engineer while helping engineers develop complex functions.

Those capabilities also depend on adaptation at the chip and infrastructure layers. Alibaba Cloud said its new Zhenwu M890 AI chip, produced by T-Head, supports both training and inference. It has also deployed its ICN Switch 1.0 interconnect chip in supernode servers built for large-scale concurrent agent scenarios.

T-Head’s PPU shipments have reportedly exceeded 540,000 units, and the chips have begun providing inference services for AI applications such as Wukong and Meoo.

Why AI agents are driving token consumption

The shift matters commercially because AI agents do more than respond to prompts. In coding scenarios, a single task can consume 10 or even 100 times as many tokens as a typical chat interaction.
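As a rough illustration of that multiplier, the sketch below scales a baseline chat interaction by the two factors cited above. The baseline of 2,000 tokens per chat is a hypothetical value chosen only for the arithmetic. It does not come from Alibaba Cloud or 36Kr.

```python
# Hypothetical baseline: tokens used by one typical chat interaction.
# This number is an assumption for illustration, not a figure from the article.
CHAT_TOKENS = 2_000

# Multipliers cited in the article for agentic coding tasks.
MULTIPLIERS = (10, 100)


def coding_task_tokens(chat_tokens: int, multiplier: int) -> int:
    """Return the tokens one coding task uses at a given multiple of a chat."""
    return chat_tokens * multiplier


for m in MULTIPLIERS:
    tokens = coding_task_tokens(CHAT_TOKENS, m)
    print(f"{m}x a chat interaction: {tokens:,} tokens per coding task")
```

Whatever the true baseline is, the ratio is what matters commercially: one agentic coding task can stand in for tens or hundreds of chat sessions in billed usage.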

That has intensified market competition. The more often a model is called in agentic scenarios, the faster it can generate revenue. One apparent beneficiary is Anthropic. According to The Wall Street Journal, Anthropic’s revenue is expected to more than double in the second quarter to USD 10.9 billion.

Alibaba Cloud is trying to capture the same economics. In calendar year 2025, its revenue exceeded RMB 146.6 billion (USD 21.6 billion), up 28.6% year-on-year, with AI products contributing to the increase.

Alibaba CEO Eddie Wu said on the group’s latest earnings call that annual recurring revenue from AI model and application services, including the Model Studio platform, will exceed RMB 10 billion (USD 1.5 billion) in the June quarter and surpass RMB 30 billion (USD 4.4 billion) by year’s end.
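The USD figures in the two paragraphs above can be checked against the conversion rate stated in the note at the end of this article (RMB 6.80 = USD 1). The following is a minimal sketch of that arithmetic. The function name and the rounding to one decimal place are assumptions made for illustration.

```python
# Conversion rate taken from the note at the end of the article.
RMB_PER_USD = 6.80


def rmb_to_usd_billion(rmb_billion: float) -> float:
    """Convert an RMB amount (in billions) to USD billions, rounded to 0.1."""
    return round(rmb_billion / RMB_PER_USD, 1)


# RMB amounts (in billions) quoted in the article.
for rmb in (146.6, 10, 30):
    print(f"RMB {rmb} billion = USD {rmb_to_usd_billion(rmb)} billion")
```

At that rate, the three amounts work out to USD 21.6 billion, USD 1.5 billion, and USD 4.4 billion, matching the figures quoted.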

But Alibaba Cloud and ByteDance are taking different approaches in the token race.

“Token revenue mainly comes from two ends: LLMs represented by coding, and video models,” Liu said. “But over the past period, many people have mixed together the token growth from these two markets. That is inappropriate.”

ByteDance has taken the lead in video generation models. Citing institutional research, 36Kr estimated that after Seedance 2.0 became popular, ByteDance accounted for 80% of the video model market’s average daily token consumption. At the end of 2025, Volcano Engine reportedly set a target of more than RMB 10 billion (USD 1.5 billion) in MaaS revenue for 2026. After Seedance 2.0’s rise, that target was reportedly raised again.

By contrast, Alibaba Cloud’s advantage appears to be in LLMs. “Companies with developers need cloud services, so Alibaba Cloud’s existing customers, who certainly have developers, are almost all potential users of coding,” Liu said.

At the end of 2025, Alibaba Cloud set a goal of capturing 80% of incremental growth in the AI cloud market in 2026. It is now concentrating its resources on coding. “In the first five months of this year, we can say Alibaba Cloud has already captured 80% of incremental growth in the LLM market,” Liu said.

To support that goal, Alibaba Cloud is changing how it evaluates sales teams. The question is no longer who sells the largest token volume, but who sells the most valuable tokens.

Put simply, Alibaba Cloud is not pursuing token consumption generated by basic chat. Prices for that type of model usage have already fallen sharply.

Instead, one of Alibaba Cloud’s key metrics is the number of customer core business systems its models are connected to. It wants tokens that customers use to write code, make decisions, and run processes. Once models enter enterprise production workflows, token consumption rises, unit prices are higher, repurchases are more stable, and the revenue is higher quality.

Coding-related token consumption differs from video. Video model consumption is one-off: a video is generated, and the task ends.

By comparison, coding is closer to a self-evolving process. A model writes code, the code becomes an application, the application is deployed to the cloud, the running application calls the model, and the model generates more code.

The large model race has become a systems engineering contest. The coupling of chips, infrastructure, and large models is now one of the most important factors shaping training and inference efficiency. Commercial competition is accelerating as well, quickly testing which use cases have real value and feeding what is learned back into the models.

“Chips, models, and the cloud are now like gears that need to mesh together and spiral upward,” Liu said. If the future competition is about whether each chip can run more tokens, and higher-quality tokens, than competitors, “then we win.”

KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Deng Yongyi for 36Kr.

Note: RMB figures are converted to USD at a rate of RMB 6.80 = USD 1 based on estimates as of May 25, 2026, unless otherwise stated. USD conversions are presented for ease of reference and may not fully match prevailing exchange rates.