X Square Robot has completed a Series B funding round, raising nearly RMB 2 billion (USD 292.8 million), 36Kr reported. Xiaomi and HongShan co-led the investment.
The latest round builds on earlier financings secured by the robotics firm, which has attracted backing from major companies such as Meituan, Alibaba, and ByteDance. With Xiaomi’s participation, X Square Robot is now the only Chinese company in the embodied intelligence sector to have secured investment from all four of these companies.
Xiaomi is not new to embodied intelligence. Over the past two years, its disclosed investments have included ViTai Robotics, which focuses on tactile sensing; Xynova, which develops dexterous hands; RoboParty, which builds robot bodies; and SynapX, which focuses on models.
Beyond external investments, Xiaomi’s own robotics products are also advancing. This month, its CyberOne robot entered Xiaomi’s automotive factory for a live production trial aimed at validating flexible assembly in manufacturing.
On the model side, Xiaomi open-sourced its vision-language-action model, Xiaomi-Robotics-0, in February.
A source told 36Kr that X Square Robot is among the few companies in China that remain committed to building embodied intelligence foundation models fully in-house, and it has consistently adhered to an end-to-end approach.
More specifically, in its self-developed WALL-A model, X Square Robot maps visual, language, tactile, and action signals into a continuous sequence of high-dimensional tokens. These unified representations are then fed into a single transformer architecture, enabling multimodal inputs and synchronized outputs within one system.
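The general idea of a unified token sequence can be illustrated with a toy example. The sketch below is not WALL-A's implementation, whose details have not been published in this report. It only assumes that each modality arrives as a small set of feature vectors, that a separate linear projection maps each one into a shared token width, and that a single attention layer then operates over the concatenated sequence. All names, dimensions, and random weights here are hypothetical.

```python
import math
import random

random.seed(0)
D_MODEL = 8  # width of the shared token space (illustrative)

# Hypothetical raw feature sizes per modality; not taken from WALL-A.
RAW_DIMS = {"vision": 12, "language": 10, "tactile": 4, "action": 3}


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random values."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# One linear projection per modality into the shared token space.
PROJECTIONS = {name: rand_matrix(dim, D_MODEL) for name, dim in RAW_DIMS.items()}


def to_token_sequence(inputs):
    """Project every modality to D_MODEL and concatenate into one sequence."""
    sequence = []
    for name in RAW_DIMS:
        sequence.extend(matmul(inputs[name], PROJECTIONS[name]))
    return sequence


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over the whole sequence."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    scale = math.sqrt(D_MODEL)
    weights = []
    for row in scores:
        scaled = [s / scale for s in row]
        peak = max(scaled)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return matmul(weights, v), weights


# Toy inputs: 4 vision, 3 language, 2 tactile, and 1 action feature vectors.
COUNTS = {"vision": 4, "language": 3, "tactile": 2, "action": 1}
inputs = {name: rand_matrix(COUNTS[name], RAW_DIMS[name]) for name in RAW_DIMS}

tokens = to_token_sequence(inputs)
w_q, w_k, w_v = (rand_matrix(D_MODEL, D_MODEL) for _ in range(3))
mixed, weights = self_attention(tokens, w_q, w_k, w_v)

print(len(tokens), len(tokens[0]))
```

Because every token sits in the same sequence, each row of `weights` spans all four modalities, so a tactile token can attend directly to vision or language tokens without passing through a separate per-modality model.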
Wang Hao, co-founder and CTO of X Square Robot, said in an interview with 36Kr late last year that this type of natively multimodal unified representation helps reduce information loss when data moves across modalities. According to Wang, it also lets robots perceive the physical world more directly while keeping perception, decision-making, and execution synchronized in dynamic environments.
“The problem with the fine-tuning route is that once upstream models are no longer open-source, or once foundation model capabilities make a leap forward, all fine-tuning work could be overturned,” Wang said. “That makes it difficult to build a closed data loop and achieve scale effects.”
By contrast, an end-to-end unified architecture lays the groundwork for scaling embodied foundation models. In the future, as parameter counts and high-quality interaction data increase, model generalization, particularly zero-shot generalization, is expected to improve further.
Beyond its foundational technology stack, X Square Robot has also accelerated commercialization this year.
Last month, it partnered with 58 Daojia to launch a commercial home cleaning service that deploys robots in households, bringing embodied intelligence into real-world home settings at scale. The service, believed to be the first of its kind, is expected to roll out across multiple cities in the near term.
With data support from 58 Daojia, iterations of X Square Robot’s foundation model are expected to gain access to diverse, real-world datasets, creating a data flywheel in which deployment itself becomes part of the training process.
36Kr understands that beyond home service scenarios, X Square Robot also plans to expand into industrial manufacturing, logistics, and elder care, pushing embodied intelligence into broader real-world applications.
KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Qiu Xiaofen for 36Kr.