What happens when a robotics company decides to build both a large model platform and an open ecosystem?

In March, Agibot crossed the threshold of 10,000 robots rolling off its production lines. Then, on April 17, at its partner conference, the company devoted the bulk of its presentation to new software products; hardware received far less attention.

Agibot unveiled six artificial intelligence models, seven productivity solutions, and, for the first time, its AIMA full-stack ecosystem technology system, short for “AI machine architecture.” Together with its hardware robots, these software products form Agibot’s “one robotic body, three intelligences” architecture.

In this framework, the “one body” refers to the robot’s physical form. The “three intelligences” are motion intelligence, interaction intelligence, and task intelligence:

  • Motion intelligence is the foundational layer, functioning as the actuator of the physical platform.
  • Interaction intelligence is a higher-order layer that serves as the entry point for human engagement.
  • Task intelligence is another higher-order layer that supplies productive labor.

“Without intelligence that is deeply coupled with the physical body, a robot is just a tool, not true embodied intelligence,” said Peng Zhihui, president and CTO of Agibot.

The key question is how to enable robots to do more than dance or perform flips based on prewritten code, and instead take on work autonomously in industrial, commercial, and household settings. The critical variable is robotic intelligence, and Agibot aims to become a platform for incubating that capability.

In motion intelligence, Agibot plans to launch two foundation models. The first is a whole-body motion-control foundation model that fuses sensing and control, enabling adaptive motion through environmental perception. The second is a generative motion-control foundation model that produces actions in real time through multimodal interaction, without preprogramming.

In interaction intelligence, Agibot will release WITA Omni 1.0 in the third quarter, building on its widely used WITA model. The company describes it as the industry’s first end-to-end embodied multimodal interaction model. It preserves information such as emotional tone, context, vocal inflection, and surroundings to enable natural, humanlike interaction, while supporting interruptions, mid-conversation interjections, and corrections.

Task intelligence is where Agibot is investing most heavily and where its algorithm talent is most concentrated. The company recently released a suite of products: the GO-2 model, which integrates a large-and-small-brain architecture; the GE-2 action world model; the open-source dataset Agibot World 2026; the Genie Sim 3.0 simulation platform; and Genie Studio 2.0.

In the third quarter, Agibot will also roll out the GO-3 model, which combines the ViLLA architecture with a world model architecture. Agibot said the model will support planning and simulation, as well as reasoning and execution for complex tasks, with a data scale tens to hundreds of times larger than GO-2’s.

At the partner conference, chairman and CEO Deng Taihua outlined the trajectory of embodied intelligence using an XYZ curve framework:

  • The X curve, spanning 2022–2025, represents the development and early adoption stage, when the industry moved from prototypes to scaled production. In 2023, Agibot launched its first humanoid robot, validating technical feasibility. By 2025, its mass production had reached 5,000 units, bringing robots closer to becoming viable products.
  • The Y curve, covering 2026–2030, represents the deployment growth stage. In March, Agibot’s 10,000th robot rolled off the line. Interaction and task intelligence are beginning to scale, and robot productivity is approaching human levels.
  • The Z curve, covering 2030 and beyond, represents the deployment and popularization stage, when embodied intelligence reaches a broader inflection point. In this phase, robots in manufacturing, logistics, services, and other sectors are expected to surpass human productivity. Their learning efficiency and rate of evolution may accelerate significantly, with the potential emergence of swarm intelligence.

Under this plan, Agibot aims to move through the X curve in three years, reaching RMB 1 billion (USD 146.4 million) in revenue; progress through the Y curve in five years, with 10,000 deployed units and RMB 10 billion (USD 1.5 billion) in revenue; and reach the Z curve in eight years, scaling with global ecosystem partners.

Peng said Agibot sees 2026 as a breakthrough year because “three factors come together,” pointing to advances in large models, improvements in robot hardware, and acceleration of the data flywheel:

  1. On large models, Peng said they are enabling robots to perceive and understand the world. More importantly, these models are no longer isolated algorithms but part of an open-source ecosystem, accelerating iteration across robotics.
  2. On hardware, Agibot has achieved large-scale robot production and can now operate its systems reliably around the clock.
  3. On data, Peng said: “The more robots are deployed, the faster the flywheel spins. The more data is collected, the stronger the models’ training capabilities become. Once that flywheel starts turning, it will generate exponential network effects. Agibot believes the flywheel will begin accelerating in 2026.”

Agibot’s strategy appears to be to mass-produce robot bodies, iterate on models, open up data, and build an open ecosystem platform. Peng described it as “the hardest path, but also the one with the greatest compounding returns.”

The shift toward openness across robotics companies in 2026, even among competitors, reflects a shared constraint: limited resources.

Large language and video generation models are consuming vast amounts of tokens. While such models can learn from text and video, embodied intelligence must learn through interaction with the physical world. Data scarcity has become a central bottleneck.

The day before the partner conference, Maniformer, an Agibot subsidiary, introduced a B2B data service platform for robotics companies.

“Who is the biggest consumer of tokens in the AI era? Not chat apps, not coding assistants, and not image or video generators, but embodied agents,” Peng said. “The task space for embodied agents spans both the digital and physical worlds. A robot operating continuously in the physical world consumes tokens at every moment.”

Robots have reached mass production, and large models have been developed. What remains missing is the data flywheel.

“GPT-5 used 100 trillion tokens of training data. One token is roughly equal to 0.75 English words. If an average person speaks 150 words per minute, a single person would need ten billion hours to produce that volume,” said Yao Maoqing, chairman and CEO of Maniformer. “Embodied intelligence is different. Even if you aggregate all high-quality data globally today, it amounts to only about 500,000 hours.”
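
Yao’s comparison is a back-of-envelope calculation, and the arithmetic holds. The sketch below reruns it in Python using only the figures quoted above; the token count, words-per-token ratio, and speaking rate are taken from the quote, not independently verified.

    # Back-of-envelope check of Yao Maoqing's estimate, using only
    # the figures quoted in the article (not independently verified).
    tokens = 100e12        # GPT-5 training data: 100 trillion tokens
    words = tokens * 0.75  # roughly 75 trillion English words
    minutes = words / 150  # minutes of speech at 150 words per minute
    hours = minutes / 60   # about 8.3 billion, on the order of ten billion
    print(f"{hours:.2e} hours of continuous speech")  # ~8.33e+09

Set against the roughly 500,000 hours of high-quality embodied data Yao cites, that is a gap of more than four orders of magnitude.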

In interviews after the conference, Peng returned to the issue of data scarcity.

“The data gap in embodied intelligence remains large, and it is a major bottleneck for the industry. The threshold for collecting this data is also high, because it requires real-world physical interaction and capturing variables such as friction and gravity,” Peng said. “That is why we continue to launch data collection products and commercial solutions, while actively building open data ecosystems.”

The open ecosystem is intended to address data shortages collectively, while also helping establish standards and reducing duplicated effort across the industry.

“The more open-source work there is, the easier it becomes to build an ecosystem. The more participants there are, the easier it is to establish de facto standards. That is one of the paths we are taking to advance standardization,” Peng said.

KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Wang Yuchan for 36Kr.

Note: RMB figures are converted to USD at a rate of RMB 6.83 = USD 1 based on estimates as of April 21, 2026, unless otherwise stated. USD conversions are presented for ease of reference and may not fully match prevailing exchange rates.