Volcano Engine bets better models, not lower prices, will decide the MaaS race

Over the past three years, Volcano Engine president Tan Dai has repeated the same cycle when setting revenue targets for his team’s model-as-a-service (MaaS) business.

At the start of each year, he sets an ambitious goal. The team thinks it is too difficult to hit. By midyear, the target starts to look achievable, and Volcano Engine raises it again.

At the start of 2026, ByteDance’s video model Seedance 2.0 gave Volcano Engine an early boost. As a latecomer to China’s cloud market, Volcano Engine has leaned into artificial intelligence to drive MaaS growth.

Since the second half of 2025, coding and video models have opened up more use cases for commercial production. That shift has shown the market that the boundaries of model capabilities remain hard to predict. For MaaS providers, top-tier models are now a central growth driver.

At Volcano Engine’s Force Conference on June 23, ByteDance unveiled its next-generation flagship model, Doubao-Seed-2.1 Pro. The release also means that, beyond video generation, ByteDance has addressed a gap in its model lineup: coding.

Tan describes Doubao-Seed-2.1 Pro as a model that has “truly earned a seat at the table” in coding and agent capabilities. On Terminal-Bench, a programming benchmark, Doubao-Seed-2.1 Pro is roughly on par with Claude Opus 4.7, according to Volcano Engine. It also performs well on long-horizon and complex tasks, reaching what Tan sees as the threshold for practical use.

That is the market Volcano Engine cares about most. Progress in coding and agent capabilities allows models to enter more core production workflows for enterprises and individuals. In other words, they can create more commercial value.

Alongside its flagship model, Volcano Engine announced a series of model updates, including the 4K version of Seedance 2.0, image generation model Seedream 5.0, Doubao speech generation model 1.0, and Seedance 2.5, which is scheduled for release in July.

"Video generation is also one path toward world models," Tan said. Because the Seedance model can precisely reproduce and understand the physical world, he said, high-quality visual data synthesis becomes more feasible, which accelerates research in areas such as embodied intelligence and autonomous driving.

Two years after 36Kr's first interview with Tan, large models appear to have moved from the experimental phase to broader adoption. They are no longer tools used only by a small group of early adopters. They are entering more people's work and daily lives.

Volcano Engine said usage of its MaaS products has grown sharply. Compared with the end of 2025, its average daily token consumption has risen 50% to 180 trillion, more than 1,500 times the level two years ago. The number of clients that have each consumed more than one trillion tokens cumulatively has also doubled to more than 200.

Tan said that, with the models released at the conference and others that may follow later this year, Volcano Engine has already raised its annual revenue target.

Behind that is a shift in model pricing. In 2024, Volcano Engine was one of the first companies to push large model prices down. Yet at this year’s conference, it no longer emphasized price cuts.

“In 2024, model prices were cut because models were only worth that much at the time,” Tan said. “Now that capabilities are stronger and models can help customers create greater value, they can command higher prices.”

That raises a larger question: when large models move into the core production workflows of more industries, what will AI change about the cloud industry?

At the end of 2024, someone asked Tan: If selling APIs can make money, why build a cloud business at all? Cloud was once seen as a high-growth business with a long runway. But after more than a decade of development, China’s cloud market had become saturated and overheated.

To Tan, the question is based on a false premise. MaaS and cloud solutions are not in opposition. He believes the future cloud market is likely to use agents for orchestration. Traditional cloud services will not disappear, but they will become part of AI cloud.

Volcano Engine and other cloud vendors now see AI as their main growth driver. Tan thinks this is natural: “If you went back to 2012, the year ByteDance was founded, would you have made a major push into PC search?”

The next question for Volcano Engine is how it can keep winning in the MaaS market. Tan does not yet have a complete answer. But one thing is clear, and it is also the hardest part: the company needs to keep its models competitive over the long term.

The following transcript has been edited and consolidated for brevity and clarity.

36Kr: Volcano Engine has grown quickly over the past year. What has been the driver?

Tan Dai (TD): Fundamentally, it is because models have unlocked real production-grade use cases and entered core production workflows. The more challenging and valuable a productivity use case or workflow is, the more value it creates once unlocked.

One main thread is video generation. Seedance is the first model globally to truly unlock commercial production use cases.

The other main thread is large language models (LLMs) and agents. Production-grade use cases were unlocked last year after Claude Opus 4.6 came out. An analysis by Cursor showed the ratio between two modes: agents completing code automatically and Tab-based code completion. Before Claude Opus 4.6 came out, Tab completion accounted for the larger share. After that, the ratio reversed. This shows that, with Opus 4.6, model capabilities improved significantly and could truly be used in production-grade coding and agent scenarios.

36Kr: How do you judge whether Seedance 2.0 has truly achieved commercial production?

TD: Before Seedance 2.0 came out, most video models were used to produce user-generated and professionally generated entertainment videos. They were relatively difficult to apply to serious creative use cases, such as film, TV dramas, and advertising.

We can also see this change in user consumption patterns. Previously, usage of video generation models was higher on weekends than on weekdays, similar to many entertainment-oriented consumer products. After Seedance 2.0 came out, that was no longer the case. Its weekday load is now more than twice its weekend load, which shows that people are using it for work.

Video generation is also one path toward world models, and it has significant application potential in physical industries. Seedance has been implemented in areas such as embodied intelligence, industrial manufacturing, and intelligent driving, providing new tools for business needs including data synthesis, scenario simulation, and process demonstrations.

36Kr: Before Seedance 2.0 came out, did you expect internally that it would become a breakout hit?

TD: I would not call it a breakout hit. We actually set an even more aggressive target. Looking at it now, meeting that target is still challenging.

36Kr: Why has Seedance 2.0 been able to achieve such strong results?

TD: It reflects our overall capabilities. To do video generation well, you need a strong language model as the foundation. Image generation and vision-language model (VLM) capabilities also need to be strong enough.

Seedance 2.0’s performance can be seen as built on Doubao’s capabilities. That is an important advantage we have over vertical companies that focus only on video models.

Another point is that content creation is very active in China by global standards. The fact that China was the first to produce the best video model is related to this.

36Kr: Some market players think competition in video generation has already been settled, with ByteDance occupying a dominant position. How do you see it?

TD: We are not at that stage yet. AI penetration in video generation remains very low overall.

Right now, the outside world is paying too much attention to Seedance’s short-term revenue and overlooking its technical value. Video generation is a relatively mature technical path that can be scaled up massively through unsupervised methods.

The Seedance model can precisely reproduce and understand the physical world. That makes high-quality visual data synthesis more feasible and accelerates research in areas such as embodied intelligence and autonomous driving. It will have tremendous application potential in physical industries.

Also, if AI truly creates value, it should not replace the past. It should make the entire industry bigger.

36Kr: At this Force Conference, Volcano Engine also released the new flagship model Doubao-Seed-2.1. How do you define this model?

TD: I think Doubao-Seed-2.1 Pro has already reached the standard for practical use. It is comparable to Claude Opus 4.6 and has crossed the threshold for usable agents.

Doubao-Seed-2.1 also marks the point at which we have truly earned a seat at the table in coding. This is very important. In China, there are still not many players that have really earned that seat.

36Kr: How would you define its usability?

TD: There are several characteristics:

First, strong coding capabilities. In the digital world, strong coding capabilities mean you can flexibly call scripts and tools, and your generalization capability is also strong.
Second, the ability to complete complex general agent tasks. That means being able to use tools better, handle long-horizon tasks, integrate well with memory, adapt to different harnesses and frameworks, and have strong VLM capabilities. Many inputs need to be processed visually, such as in computer use.
Third, the ability to be deployed at scale. If a model is very good but too expensive, that does not work. If latency is too high, for example, more than 20 milliseconds per output token, that does not work either. The model also needs to be able to support more services at scale.

Doubao-Seed-2.1 performs very well across these areas. In coding, it can even surpass Claude Opus 4.6. In terms of scalable deployment, the task mode that just launched on the Doubao app is built on Doubao-Seed-2.1.

36Kr: In coding use cases, when do you think Chinese-made models will have truly caught up?

TD: Around the second quarter of this year. Many model makers used to say they wanted to match this or that benchmark, but saying it is useless. If you have truly caught up, or even surpassed others, people will pay you. Annual recurring revenue will show whether you have actually done it.

36Kr: Compared with video generation, why does China seem to have moved more slowly in coding overall?

TD: First, globally, competition in LLMs is more intense. Second, we started later. Anthropic and OpenAI started much earlier, and coding was also a direction they defined and pushed into first. We started later, so it is normal that our overall progress is behind theirs. It is a very difficult thing to do.

36Kr: Seed originally had a separate coding model, SeedCode. Will you still develop it?

TD: With the release of Doubao-Seed-2.1, there is no longer a separate one. Coding and agent capabilities have both been integrated into the main version.

Models are iterating too quickly now. We do not want to wait one or two months to release a new version, so we now have a new series called Seed Evolving. It is based on Doubao-Seed-2.1 and will be updated every one or two weeks.

36Kr: Is this model mainly aimed at developers and optimized around coding and agents?

TD: Not only developers. Some enterprises want stable model performance. They do not want surprises, whether good or bad, so they can use Doubao-Seed-2.1 directly. But many others always want to use the latest and smartest version. Doubao-Seed-Evolving is designed to meet their needs. But it is not a guinea-pig version. It will go through very rigorous evaluation.

36Kr: You have now unlocked both main threads of production-grade use cases. Between LLMs and video generation, which do you think is more important?

TD: From my perspective, LLMs are actually more important. They have a larger space for value creation. Although Seedance currently sells more, I hope LLMs will become a larger part of the business later.

36Kr: There is a perception that as long as a model is sufficiently state of the art, it will sell well. How does Volcano Engine demonstrate its value?

TD: There is actually a lot we can do. The stronger the model capability, the greater the responsibility.

For example, Seedance 2.0 became popular before the Lunar New Year, but Volcano Engine’s API was launched at least two months later, not until April. What were we doing during that time? We were mainly working on copyright protection. We believe that, to do model inference well, guardrails are also very important.

With LLMs, their capabilities are fully unlocked through harnesses. Right now, Seedance still lacks a harness layer of its own. Recently, we have been thinking about how to work with different industries to build this harness layer for different models. We now have an FDE, or forward deployed engineer, team working with different industries on this.

36Kr: Can you elaborate? What kinds of harnesses do different industries need?

TD: For example, in film and television production, many digital assets have not been managed, and many skills have not been accumulated. These issues limit the model’s performance. We think that, in the future, something like Claude Code will also emerge in video creation.

36Kr: Does ByteDance need to build the video harness itself?

TD: We will work with everyone. The important thing is to find people who understand this.

In coding, programmers understand the work, so they can build the corresponding harness. In video creation, you also need to understand the creative workflow. That is why we have recently been trying to recruit directors for the team.

For example, Seedance’s newly launched “3D blockout pre-visualization” feature came from a suggestion by a well-known industry director who is also a Volcano Engine customer. In science fiction films, there is an important production process called a “blockout,” which is used to express scenes, character relationships, and how the plot unfolds. By using that as a reference, the model can directly generate the corresponding shot. We are also the first in the world to launch this feature in a video model.

36Kr: Do you think the capabilities of video models have not yet been fully unlocked?

TD: I would not put it that way. Video models have been implemented quickly in China because there is a very good intermediary layer. Video creation has always been very active in China, so creators can be highly sensitive to how these new technologies should be used. They serve as the intermediary layer that turns APIs into content and ultimately achieves commercial implementation.

Of course, Volcano Engine still has room to optimize in this process.

36Kr: Is the difficulty in implementing LLMs also due to the lack of an intermediary layer?

TD: If the software-as-a-service ecosystem were strong enough, it could serve as a bridge. China’s software-as-a-service foundation is relatively weak, so it is somewhat difficult for end enterprises to use APIs directly.

Looking at it the other way, that also means there is an opportunity to leapfrog.

36Kr: Is model implementation the issue enterprise customers care about most right now?

TD: Everyone believes in AI. But how to implement AI inside their own companies is indeed something people are still not very clear about.

One category of use cases has already gone from zero to one and become a more concrete demand question. For example, Seedance customers will tell us they want a 4K version.

Another category involves complex problems, especially for companies that are not digital natives. How they can ultimately generate business value through AI still requires us to work alongside them and provide advice.

36Kr: The boundaries of models are still blurry. How can you implement models well for enterprises when models are iterating so quickly?

TD: Implementation can happen in two modes: agile mode and steady-state mode. Steady-state mode abstracts workflows and transforms them with AI. Agile mode gives good tools to people and allows them to experiment broadly, then uses steady-state mode to implement the best practices that emerge. For agile mode, hands-on support is very important.

36Kr: What is still missing for the MaaS market to move into the next stage?

TD: It is still the models. Models need to get better. In addition, each industry needs to build the relevant harnesses.

36Kr: Volcano Engine’s token consumption is said to have kept rising this year. What is the latest situation?

TD: We certainly have more and more customers. We now have more than 1 million enterprise and individual users. Enterprise customers that have cumulatively consumed more than one trillion tokens have also reached 200. Last December, the figure was 100. Our customer retention is also very good. Of the original 100, basically none have churned.

36Kr: Which models are driving MaaS growth this year?

TD: Seedance is definitely still the largest piece, accounting for more than half. But the LLM side is also substantial.

Over the long term, LLMs will be the larger market. For example, if you build a video creation agent, it will definitely need to call LLMs in addition to Seedance.

36Kr: Does Volcano Engine still want to capture the existing traditional cloud computing market?

TD: If you went back to 2012, the year ByteDance was founded, would you have made a major push into PC search?

The AI-driven cloud market could be ten times larger than the traditional cloud market, and more workloads will run on it in the future. Perhaps the traditional cloud market is worth USD 100 billion now and will still be worth USD 100 billion in the future, but AI cloud may be worth USD 1 trillion.

36Kr: Where do you think MaaS barriers come from? If MaaS simply follows state-of-the-art models, its stickiness seems even lower than public cloud.

TD: Making models good is indeed very important, but that also means the barrier to earning a seat at the table is very high. The challenge of making models good is greater than the challenge of making cloud good. Cloud is more engineering-driven. Models require engineering capabilities, but they also need a group of people with genius-level ideas.

But I do not think MaaS is less sticky than cloud. It is just a question of stage. In the beginning, cloud only sold servers, which also had no stickiness. It became sticky only after many products emerged.

36Kr: When will MaaS services become sticky?

TD: If they are only used to improve coding efficiency and are not deeply integrated with production systems, the degree of application is shallow, and it is easy for them to lack stickiness. But if they become part of a company’s security systems or other core production systems, the degree of coupling becomes much greater.

36Kr: When the MaaS market first emerged, you mainly talked about token consumption. Some peers thought this was not fundamental enough. How do you see that?

TD: They all call themselves token business units now. Does that not mean they agree with this view?

36Kr: Right now, what metric matters most for MaaS?

TD: Token revenue is definitely the most important thing.

36Kr: We heard that Volcano Engine has been continuously raising its MaaS revenue target in the first half of the year. The latest number is RMB 15 billion (USD 2.2 billion). What is the actual figure?

TD: I will not discuss specific data. Judging by this growth trend, we can definitely complete the target we set at the start of the year, and we have indeed raised the target.

36Kr: Is Volcano Engine’s top priority still revenue scale?

TD: Since the first day we were founded, what we have always pursued is scale with gross margin. We do not want scale without gross margin. First, there needs to be gross margin. Second, on that basis, there needs to be scale.

36Kr: In the large model era, will we see profits sooner? Z.ai’s GLM has raised prices three times because demand has outstripped supply.

TD: If your model is good and creates high value, you can make more money.

In 2024, model prices were cut because models were only worth that much at the time. Now that capabilities are stronger and models can help customers create greater value, they can command higher prices.

36Kr: How should we understand the pricing structure for models?

TD: Pricing is not calculated backward from a target gross margin. It depends on how much value tokens can create. Your pricing needs to make customers' migration benefits at least two to three times their migration costs before they are willing to switch. For example, if shooting one second of an advertisement used to cost RMB 100 (USD 14.7), and using Seedance now costs only a few RMB, then it is definitely worth using.

Customers are very smart. They know that, compared with the past, the value created by the model is worth this price. Fundamentally, the model’s value has increased.

36Kr: How high a priority are overseas markets for Volcano Engine this year?

TD: We have always attached great importance to globalization. In the same year Volcano Engine was founded, we also established BytePlus to serve markets outside China.

36Kr: How much will investment in overseas markets increase this year?

TD: I do not think that depends on us. It depends on whether your model is competitive. If it is not competitive, no amount of investment will help.

36Kr: We heard that Seedance’s market share has already surpassed Google’s Veo and become number one globally. That clearly suggests it is competitive.

TD: It is number one in China and is also developing quickly overseas. Overseas markets already account for almost half of Seedance’s total business. But I am not sure about its global market ranking.

36Kr: How high will overseas markets rank in your business priorities this year?

TD: Many Chinese tooling companies in the creative space are going overseas, and we also encourage and accompany them as they expand internationally. Local customers overseas also use our products. For example, WPP’s global agencies cooperate with us in many places.

36Kr: Has the MaaS market already reached a stage of heated competition?

TD: It is hard to estimate. The MaaS market is still growing fivefold to tenfold every year. I am talking about incremental markets.

36Kr: So it is still the stage of growing the pie together.

TD: Yes. The competition is still localized.

36Kr: What is your key objective this year? Has it changed?

TD: It has not changed much. Our strategy is very stable: become the top MaaS player, refine the AI cloud-native product stack, and build the organization well.

One thing has changed. Two years ago, the goal was to expand market share in the cloud business. This year, it has changed to focusing on AI cloud.

36Kr: What has been the biggest change in you over the past two years?

TD: My English has improved a little because I have to meet many overseas customers.

36Kr: Do you have any new thoughts on organization and management?

TD: First, the most important thing is still to think clearly about strategy. Do not keep changing it. You need to take a longer-term view. Volcano Engine formed its AI strategy relatively early and has been relatively firm about it.

Second is talent. Diversity is very important. For example, Seedance recently needed to recruit directors to join the team. For AI to penetrate every industry, you cannot rely only on people with IT backgrounds.

Third, you need to keep reflecting on yourself. When people see a bit of success, it is easy for them to get carried away.

36Kr: What was your most recent reflection?

TD: Do not micromanage too much. I found that, for a period, I was micromanaging too much. Generally, when you feel very confident, it is easy to do that. But now there are too many details, and there are many things you do not understand well enough. It is better to interfere less and trust the team.

36Kr: When Volcano Engine was founded, it set a target of RMB 100 billion (USD 14.7 billion) in revenue by 2030. Will that timetable be moved forward?

TD: In 2025, we already moved that timetable forward. This year, we will move it forward a bit more.

36Kr: As the timetable keeps moving forward, do you think any estimates of the AI market’s size are still valid?

TD: This is how the past three years have been for us: At the start of the year, we set a very high target, and the team thinks it is too hard. Then, by midyear, we find that we have almost completed it, so we need to raise it. The changes brought by AI are bigger and faster than most people imagine.

KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Deng Yongyi for 36Kr.

Note: RMB figures are converted to USD at a rate of RMB 6.80 = USD 1, based on estimates as of July 2, 2026, unless otherwise stated. USD conversions are presented for ease of reference and may not fully match prevailing exchange rates.