2024 has been a watershed year for robotics. With advancements in large models and embodied intelligence, a new generation of startups has emerged. Yet, interviews conducted by 36Kr reveal a peculiar pecking order in this rapidly evolving field.

“Software teams look down on hardware teams, and those working on large models dismiss reinforcement learning outright,” one insider quipped. This hierarchy reflects the philosophical divides shaping the strategies of China’s robotics entrepreneurs.

Wang Sheng, a partner at Innoangel Fund, categorizes robotics startups into three primary groups. Hardware-first teams—often led by engineers—focus on building motors, control systems, and the physical bodies of robots, ranging from humanoids to quadrupeds and robotic arms.

Software-first startups form the other two groups. One consists of artificial intelligence veterans pivoting into robotics, leveraging expertise in areas like computer vision and reinforcement learning. The other is a smaller, elite cohort specializing in large models, often viewed as the “top of the hierarchy.”

“Hardware companies in China barely touch AI,” said the founder of an embodied intelligence startup. Many hardware firms integrate open-source models rather than invest in proprietary AI because of cost constraints. Unitree Robotics, a hardware-driven player, epitomizes this approach. Founder Wang Xingxing has said the company limits its AI investment because of the high cost. “Robots are our foundation,” he said, even encouraging customers to replace the software while keeping Unitree’s hardware.

The debate over integrating intelligence into robotics highlights deeper industry tensions. One hardware representative lamented the lack of consensus on the “soft” aspects of robotics: how to distinguish a robot’s “brain” from its “cerebellum,” and how embodied intelligence should be built. These unresolved questions have left the industry fragmented among competing paradigms.

Meanwhile, software-focused firms are increasingly developing their own hardware, further fracturing the ecosystem. “Today’s hardware companies are basically video production teams,” joked several investors, alluding to the elaborate setups required for staged robot demos.

Indeed, demonstrations—whether of robots moving objects in factories or sorting shelf items—often create an illusion of artificial general intelligence. The reality, however, is far from it. Behind every flawless-looking demo lies extensive staging, including carefully orchestrated lighting and object placement to mask technical challenges. A slight environmental change can derail a robot’s performance entirely.

“Some demos succeed only once in 10,000 tries,” one insider revealed.

If large models are already being extensively applied in smartphones and computers, why can’t they adequately enhance robotic functionality? The answer lies in the limited application of AI among hardware companies, which typically rely on general-purpose language models. According to 36Kr, these models lack the “spatial intelligence” critical for robotics.

Additionally, because these models are trained on massive general-purpose datasets, they are prone to hallucinations that disrupt robotic task execution. “Language models have no bearing on robotics,” one expert said. “Their success rates in localized tasks are abysmal.”

The founder of an embodied intelligence firm highlighted a deeper challenge: no Chinese team has yet developed large models optimized specifically for robotics. A stopgap solution has been integrating a “cerebellum layer” between multimodal large models and robotic hardware. This layer breaks tasks into actionable steps—for instance, making coffee is divided into subtasks like “grab a cup,” “grind beans,” and “pour water.”
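The decomposition described above can be illustrated with a minimal Python sketch. It is not any company's actual implementation: the names `TASK_LIBRARY`, `decompose`, and `run_task` are hypothetical, a hand-written lookup table stands in for the planning that a multimodal model would do, and the `execute` callback stands in for the robot's low-level controllers. Only the coffee subtasks come from the example in the text.

```python
from typing import Callable, Dict, List

# Hypothetical, hand-written task table (an assumption for illustration).
# The coffee subtasks are the ones named in the article.
TASK_LIBRARY: Dict[str, List[str]] = {
    "make coffee": ["grab a cup", "grind beans", "pour water"],
}


def decompose(task: str) -> List[str]:
    """Return the predefined subtasks for a task, in execution order."""
    try:
        return list(TASK_LIBRARY[task])
    except KeyError:
        # No predefined breakdown exists, so the layer cannot act on the task.
        raise ValueError(f"no predefined subtasks for task: {task!r}") from None


def run_task(task: str, execute: Callable[[str], bool]) -> List[str]:
    """Run each subtask in order and stop at the first failure.

    `execute` is a stand-in for the low-level controller; it returns True
    when a subtask succeeds. The return value lists the completed subtasks.
    """
    completed: List[str] = []
    for subtask in decompose(task):
        if not execute(subtask):
            break
        completed.append(subtask)
    return completed
```

Under these assumptions, `run_task("make coffee", controller)` walks through the three steps in order and halts as soon as one fails. Any task missing from the table raises an error, which mirrors the limitation that every complex operation needs its subtasks defined in advance.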

While such coordination enables robots to execute tasks, it introduces new challenges. Complex operations require extensive predefined subtasks, and data scarcity remains a significant hurdle. A single motion, like grasping a cup, might need millions of high-quality data points to account for environmental variability.

Many entrepreneurs entered robotics expecting large models to revolutionize the field, only to discover significant gaps. The resulting fragmentation has become unsustainable, prompting a collective shift in industry thinking.

By late 2024, investment trends began to shift. Previously, investors equated robotics with humanoid hardware, and companies like Unitree and Zhiyuan Robotics saw valuations soar past USD 1 billion, making them “too expensive for most investors.”

Meanwhile, Chinese startups faced steep domestic fundraising challenges, forcing some to rethink their narratives. “Until recently, pitching end-to-end robotics was a non-starter,” said one founder. “Now, you can’t walk into a room without it.”

Pure hardware startups are now struggling to secure significant funding. “The market feels cold for hardware-only players,” said a founder of a startup specializing in humanoid joints.

Recent funding rounds confirm this shift. Globally, Skild AI and Physical Intelligence saw their valuations soar past USD 1 billion. In China, Galaxea AI secured investment from Ant Group, while X Square and Qianjue Technology closed significant rounds.

Innoangel Fund’s Wang noted that investors are pivoting from humanoid robots to embodied intelligence, emphasizing its potential to drive generalized robotic tasks. Even hardware manufacturers are rethinking their strategies. Once skeptical of generalization, they are now exploring foundational general-purpose models to build specialized capabilities.

The robotics industry has yet to reach a settled consensus. One thing, however, is clear: the future lies in integrating hardware and embodied intelligence. Neither can succeed alone.

“Whether you start from hardware or software, the endgame is the same,” an expert said. “It’s a race to achieve higher business efficiency. AGI-era robotics companies must master both AI and hardware—and, most importantly, foster mutual respect.”

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Qiu Xiaofen for 36Kr.