When OpenClaw swept across the world, AutoArk founder Zeng Xiaodong did not react with excitement. What he felt instead was something closer to relief, a sense of finally being seen.
“OpenClaw is essentially an agent framework. Underneath it is an agent that can write code, deployed on a cloud server or a computer,” he told 36Kr. “Our AI operating system is also, at its core, an agent framework, but it is the OpenClaw that runs on hardware at the edge. Robots, earbuds, and glasses can all run on this OS.”
That is what AutoArk has been building since its founding in 2024. Zeng has a name for the paradigm: “vibe hardware.” In his telling, it means using natural language to develop hardware, with artificial intelligence writing code, tuning drivers, and completing the full cycle of application development and deployment on its own.
AutoArk said it has built a product stack spanning hardware devices and its operating system, including the Eva OS platform, hardware products, and an ecosystem that includes the AI-driven education robot Qiduoduo.
More recently, AutoArk closed two consecutive pre-Series A funding rounds. Its investors include wearable brand Shokz, Guoruiyuan Fund, Everpine Capital, and Shanghai Angel Group, while CEC Capital served as financial adviser. Over the past year, it moved through four funding rounds and raised a cumulative nine-figure RMB sum.
Before founding AutoArk, Zeng spent a decade at Alibaba and Ant Group, where he incubated hardware products from zero to one, including facial recognition payment systems, Alipay Box, which sold in the tens of millions of units, and Tao Cafe, which he described as China’s first cashierless supermarket. Having lived through the first phase of the AI boom, from hardware products to systems, Zeng said he came away convinced that the biggest opportunity lay in building the kernel itself, or the OS. “But you can’t build a kernel from pure imagination,” he said. “So I needed to first build a hardware product end to end.”
AutoArk is not trying to become another large model company or a standalone hardware maker, Zeng said. Its goal is to build an OS framework for the next generation of intelligent terminals, so that devices in different form factors can run and evolve on a common foundation.
After OpenClaw, AI hardware development changed
After OpenClaw went viral, harness engineering also became a hot topic in Silicon Valley. As model capabilities improve, what determines whether an agent is useful is no longer the model alone, but the environment in which it runs. What tools can it call? How does it understand the current state? How is the feedback loop designed? That environment is the harness.
Eva OS is AutoArk’s answer to harness engineering for hardware. “You can think of Eva OS as a hardware version of the context model. It adds to the OS rather than replacing it,” Zeng said. In multiple conversations with 36Kr, he stressed that Eva OS is not trying to become another HarmonyOS. In the era of traditional operating systems such as Android, Linux, and ROS, the issue was not that AI was too weak. The missing piece, he argued, was an intermediate layer that could let AI capabilities run natively on hardware.
So what exactly does that intermediate layer need to do?
In the past, getting a complete AI hardware pipeline up and running at production-grade service levels required at least three people and two to three months of work, according to Zeng. With Eva OS installed, developers can describe requirements in natural language, and the AI, aware of the hardware environment it is operating in, writes the app on its own. On average, he said, it takes half an hour to turn an edge device into a real-time interactive AI system with memory and the ability to adjust in real time.
That development model depends on tight coupling between Eva OS and the hardware. If the AI does not understand the full context of the hardware it is running on, including chip compute capacity, which sensors are online, how much memory is left, and the connection status of peripherals, it cannot develop an app on top of it. Zeng described that as the hardest part of Eva OS, and its moat.
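One way to picture the hardware context Zeng describes is as a structured snapshot of device state that an on-board agent can read before deciding what it can run. The sketch below is purely illustrative; the field names and thresholds are assumptions for this article, not Eva OS's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class HardwareContext:
    """Illustrative snapshot of edge-device state an agent might consult.
    All field names here are hypothetical, not AutoArk's real interface."""
    chip: str                      # SoC identifier, e.g. "rk3588"
    tops_available: float          # remaining compute budget, in TOPS
    free_memory_mb: int            # free RAM on the device
    sensors_online: list[str] = field(default_factory=list)
    peripherals: dict[str, str] = field(default_factory=dict)  # name -> status

ctx = HardwareContext(
    chip="rk3588",
    tops_available=6.0,
    free_memory_mb=812,
    sensors_online=["mic_array", "camera"],
    peripherals={"robot_arm": "connected"},
)

# An agent could gate what it attempts on this snapshot, e.g. only
# run a vision pipeline if the camera is online and memory allows:
can_run_vision = "camera" in ctx.sensors_online and ctx.free_memory_mb > 512
```

The point of such a snapshot is that the agent never writes code blind: whether a peripheral like the robotic arm is actually connected is a field it can check, not a fact an engineer must relay.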
Recently, AutoArk partnered with an undisclosed robotic arm company, according to Zeng. After connecting a development board running Eva OS to the arm, the AI tuned the driver, fixed bugs, and then began autonomous exploration, he said. When an engineer issued the command, “Help me pick up a certain object,” Eva OS could write the program itself and learn through trial and error, Zeng said.
“Eva OS can do its own trial and error because it knows the connection state between the development board and the robotic arm. That is completely different from before,” Zeng said. In the old workflow, engineers had to read through driver documentation, troubleshoot hardware bugs one by one, and manually get everything working, while hardware remained stuck in the traditional OS model of preloaded apps and abstracted hardware.
To make this work on edge devices with limited compute, Eva OS uses an architecture that coordinates between the cloud and local hardware, according to AutoArk. It can deliver voice latency below 250 milliseconds and multimodal feedback below 350 milliseconds, compared with the roughly 600-millisecond voice latency that the company said is typical of industry-standard solutions.
The logic of Eva OS is straightforward: leave on the device whatever can be handled on the device, and send only complex reasoning to the cloud. High-frequency interactions such as speech recognition, text-to-speech, and visual perception run locally rather than being routed to the cloud every time. The edge-side model handles memory, execution, and interaction, remembering user habits, calling tools, and providing the interface. The cloud retains general knowledge and complex reasoning, while the edge turns those capabilities into something that can run on hardware. If the perception model runs entirely on the device, costs can fall by 70–92%, AutoArk estimates.
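The routing logic described above can be sketched in a few lines. The task names and the split below are assumptions drawn from the article's description, not AutoArk's implementation.

```python
# Illustrative edge/cloud routing: keep high-frequency perception and
# interaction on-device, send only complex reasoning to the cloud.
# The task vocabulary here is hypothetical.

LOCAL_TASKS = {
    "speech_recognition",   # ASR stays on-device
    "text_to_speech",       # TTS stays on-device
    "visual_perception",
    "memory_lookup",        # user habits, preferences
    "tool_call",            # device-side execution
}

def route(task: str) -> str:
    """Return where a task should run under the edge-first policy."""
    return "edge" if task in LOCAL_TASKS else "cloud"
```

Keeping the high-frequency tasks local is what drives both the latency figures and the cost estimate the company cites: a round trip to the cloud is paid only when general knowledge or complex reasoning is actually needed.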
AutoArk also plans to launch a new hardware terminal called Eva Pi, which Zeng described as a device that can write its own code and update itself.
Eva Pi integrates Eva OS and can sense the full hardware-side context in real time, including sensors, drivers, connection status, and runtime feedback, allowing it to natively develop, deploy, and iterate AI applications on the device itself. Eva OS 1.0 has been on the market for more than three months, and more than 2,500 enterprises and R&D organizations have already applied it to hardware product development, according to the company. Those use cases span wearables, desktop robots, in-car assistants, and robotic arms.
End-to-end models will determine whether AI OS can exist
In speaking with Zeng, one comes away with a clear sense of how firmly he believes in the end-to-end path.
In the first half of 2024, large model makers were still caught in the storm around language models, and the technical path for multimodal interaction had yet to converge. The only real point of reference was the launch demo for GPT-4o, but at the time OpenAI had not opened its API. Most AI hardware companies chose a pipeline approach instead, chaining together modules such as ASR (automatic speech recognition), large language models, and TTS (text-to-speech) into an assembly line that completes each task in sequence. That path was relatively mature and economically viable.
“But the problem was obvious,” Zeng said. “There was severe information loss between modules. Emotion, tone, and continuity all got lost. Latency piled up layer by layer, and you ended up with a lot of bugs to patch.”
He did not choose that route. Instead, with a team of seven, he spent nearly a year building an end-to-end multimodal foundation model that could run on hardware. That model became the basis of Eva OS.
The decision grew out of his experiences at Ant Group. While incubating facial recognition payment systems, Alipay Box, and Tao Cafe, he said he kept running into the same wall: a vast gap between AI algorithms and terminal hardware, with a missing middle layer.
The logic behind AutoArk’s choices is straightforward. Future end-to-end models must be able to run on a wide range of edge devices at far lower cost. That, Zeng argues, is where startups still have room to compete. A company focused only on vertical software models is too easily swallowed up by makers of foundation models.
One example is education. Starting in 2024, AutoArk began exploring AI-powered education through Qiduoduo. Most large model makers, by contrast, only entered the segment in the second half of 2024 or in 2025, according to Zeng, giving AutoArk a lead of roughly six months to a year.
Today, AutoArk’s self-developed end-to-end model uses a single model to handle speech recognition, speech synthesis, visual understanding, and language reasoning at once, which the company says reduces information loss. That in-house end-to-end approach has also opened the door to more hardware categories and use cases.
“Our edge-side model combines speech recognition and TTS into one model that handles both tasks,” Zeng said. “It does not require a GPU. It runs entirely on a CPU, and memory usage stays below one gigabyte.”
That matters in overseas markets, where devices such as earbuds and glasses often face unstable network conditions. Hardware running Eva OS can perform speech recognition, speech synthesis, and basic translation even without an internet connection, according to the company.
Qiduoduo, an education robot aimed at children aged three to ten, is the first commercial case for Eva OS. At present, the device’s user metrics exclude app usage and count only hardware-use duration for functions such as conversation and reading. By that measure, users spend an average of 145 minutes a day on the product, according to AutoArk.
The self-developed model also creates a different kind of user experience, Zeng said. Much of that comes from the fact that the end-to-end model does not pass through translation steps between modalities. Voice and visual signals connect directly to the language model. It can perceive emotion, catch contextual shifts in continuous conversations, and respond in a way that feels more human.
Cost matters just as much. AutoArk says its end-to-end model has reduced voice costs to one-twentieth of standard industry solutions. That allows Qiduoduo to be sold at a price point of around RMB 1,000 (USD 146.2), with no subscription fee afterward.
Two years ago, Zeng and his seven-person team placed a bet on an unproven path: end-to-end models for hardware. Two years later, Eva OS has reached its third iteration, and AutoArk is incubating more consumer hardware categories while partnering across a growing range of products, including AI glasses and AI earbuds.
The pace of model evolution still outstrips most people’s imagination. But Zeng said the larger wager is only just beginning.
The idea of building an operating system for AI hardware is not new. Around 2017, China saw a wave of robot OS startups trying to build an ecosystem the way Android did. Few succeeded. Tmall Genie and Xiaomi smart speakers quickly captured the entry point through subsidies, but what they defined was the smart speaker category rather than the agent category, squeezing out space for a vertical OS.
Cost pressures were even more acute. At the time, achieving high performance required smartphone-grade chips, which significantly increased costs. Even when devices sold for RMB 2,000–3,000 (USD 292.4–438.6), companies still lost money.
Zeng believes the opportunity now is fundamentally different from the period when AI was still nascent.
“For AI hardware OS, there is still no true winner anywhere in the world. The window may only last two or three years.”
In some ways, large models have redefined how people interact with hardware. Content no longer needs to be preloaded. AI can generate it. Low-power AI chips have fallen into a commercially viable cost range. End-to-end models, meanwhile, make it possible for a startup with a very small team to connect the entire chain from model to hardware.
“The timing, the conditions, and the people are all here now. Everything the last wave lacked, this wave has.”
Inside AutoArk, hardware iteration has accelerated to once a day, with AI assistance now in full swing, according to Zeng. To become a company capable of building an AI OS, he began pushing an organizational overhaul last year: everyone practices vibe coding. Whether engineer, product manager, or operations staff, everyone’s work is meant to converge at the code layer.
“When all of a company’s actions converge into vibe coding, all of your data becomes structured,” he said. “Once it is structured, real middle-layer optimization becomes possible.”
KrASIA features translated and adapted content that was originally published by 36Kr. This article was written by Deng Yongyi for 36Kr.
Note: RMB figures are converted to USD at rates of RMB 6.84 = USD 1 based on estimates as of April 9, 2026, unless otherwise stated. USD conversions are presented for ease of reference and may not fully match prevailing exchange rates.