When discussing the challenges faced by the semiconductor industry, what comes to mind first? Lithography machines? 5-nanometer technology? A perfectly square chip that we can’t produce?

Yes, but not entirely.

People often categorize semiconductor chips as part of the hardware industry. In reality, it is an industry in which hardware and software are tightly integrated, and the latter often plays the more significant role.

The hardware of a chip refers to the physical platform that runs instructions, including processors, memory, storage devices, and more. Chip-related terms like “transistor count” and “7-nanometer process” are examples of hardware parameters.

Software, on the other hand, includes firmware, drivers, operating systems, applications, operators, compilers, development tools, model optimization and deployment tools, and application ecosystems, among others. Software tells hardware how to respond to user instructions and how to process data and tasks, while also optimizing the use of hardware resources through specific algorithms and strategies. Terms like “x86 instruction set,” “deep learning operator,” and “CUDA platform” typically relate to software for chips.

Without hardware, software cannot execute. Yet, without software, hardware is just a pile of meaningless silicon.

In 2012, when the combination of deep learning and GPUs made a splash at the ImageNet competition, artificial intelligence became an overnight sensation globally, and the tech world turned its focus to the field. Nvidia, which had invested deeply in its CUDA computing platform, saw its stock price soar and became the dominant force of the new era.

Software, in turn, became the core technical barrier of the AI era.

To break Nvidia’s monopoly, former chip leader Intel and longtime rival AMD launched oneAPI and ROCm, respectively, to compete with CUDA. The Linux Foundation, together with Intel, Google, Qualcomm, Arm, Samsung, and others, formed the UXL Foundation, informally dubbed the “anti-CUDA alliance,” to develop an open-source software suite that lets AI developers program on any member company’s chips, in an attempt to replace CUDA as the preferred AI development platform.

Nvidia, for its part, has been fortifying its CUDA moat.

As early as 2021, Nvidia publicly stated that it prohibited the use of translation layers to run CUDA-based software on other hardware platforms. In March 2024, it escalated this to an outright ban by adding a clause directly to CUDA’s end-user license agreement.

For Chinese users, this ban hits harder.

In 2022, US export restrictions required Nvidia to stop supplying its high-end GPU chips to the Chinese market, effectively cutting off China’s channels for purchasing them.

Now that running CUDA software on other chips is also prohibited, what will Chinese AI companies do?

The rise of domestic AI chips in China

In fact, long before this ban was issued, Chinese chip companies had already been bracing themselves. In 2015, as the Chinese AI industry boomed, the four “AI dragons” emerged to lead the industry’s development.

During this AI wave sparked by interest in convolutional neural networks (CNNs), Chinese companies recognized the importance of producing AI chips domestically.

In this period, nearly a hundred Chinese AI chip companies emerged, including startups like Cambricon, Horizon Robotics, Biren Technology, and Houmo.ai, as well as tech giants like Huawei, Alibaba, and Baidu, along with traditional chip manufacturers and mining equipment manufacturers.

Everyone jumped on the bandwagon, and the industry flourished, seemingly with a common goal: to create an independent domestic AI chip ecosystem.

Amid this push, Chinese AI chip players realized early on how important software, tools, and ecosystems are for chips. They therefore invested significant time and effort in solving software-related problems while continuously upgrading and iterating their hardware products.

CUDA is a closed software platform, so building an original software stack from the ground up is key to breaking through the CUDA ecosystem barrier.

Overview of China’s AI chip software platforms

China’s AI chip startups are flourishing in the cloud, edge, and endpoint fields, each excelling in its own area. For example, Biren has developed the BIRENSUPA software platform, which includes a hardware abstraction layer, a programming model, the BRCC compiler, deep learning and general computing acceleration libraries, toolchains, support for mainstream deep learning frameworks, self-developed inference acceleration engines, and application SDKs for various scenarios. It is one of the few comprehensive AI software development platforms in China.

In addition, Cambricon, which focuses on cloud and automotive AI chips, has launched a foundational software platform. Houmo.ai, which specializes in integrated smart driving chips, has introduced the Houmo Dadao software platform. Moore Threads, which focuses on full-featured GPUs, has rolled out the MUSA SDK and an AI software platform. Iluvatar CoreX, which focuses on general-purpose GPUs (GPGPUs), has developed the Iluvatar CoreX software stack.
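
As a rough illustration of what the hardware abstraction layer and operator libraries in such platforms do, the Python sketch below registers the same operator for two backends and dispatches to whichever one is requested. It is a toy under stated assumptions: the names `register`, `dispatch`, `reference_cpu`, and `vendor_accel` are hypothetical and are not taken from BIRENSUPA, MUSA, CUDA, or any real SDK, and the “accelerator” kernel is plain Python standing in for a vendor kernel launch.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical registry: operator name -> backend name -> implementation.
_KERNELS: Dict[str, Dict[str, Callable[..., Vector]]] = {}


def register(op: str, backend: str):
    """Decorator that records an implementation of `op` for `backend`."""
    def decorator(fn: Callable[..., Vector]) -> Callable[..., Vector]:
        _KERNELS.setdefault(op, {})[backend] = fn
        return fn
    return decorator


@register("relu", "reference_cpu")
def relu_cpu(x: Vector) -> Vector:
    # Portable reference implementation of the ReLU operator.
    return [v if v > 0.0 else 0.0 for v in x]


@register("relu", "vendor_accel")
def relu_accel(x: Vector) -> Vector:
    # Stand-in for a vendor-specific kernel; a real stack would launch
    # device code here instead of running Python.
    return [max(v, 0.0) for v in x]


def dispatch(op: str, backend: str, *args) -> Vector:
    """Look up and run the implementation of `op` for the chosen backend."""
    impls = _KERNELS.get(op, {})
    if backend not in impls:
        raise NotImplementedError(f"operator {op!r} has no {backend!r} kernel")
    return impls[backend](*args)


if __name__ == "__main__":
    data = [-1.5, 0.0, 2.0]
    print(dispatch("relu", "reference_cpu", data))
    print(dispatch("relu", "vendor_accel", data))
```

In this toy model, application code calls only `dispatch`, so moving to a new chip means supplying kernels for the new backend rather than rewriting the application. The number of operators that need such kernels is one reason software stacks take years to mature.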

Unlike China’s early chip researchers who started from scratch, contemporary Chinese AI chip players are mostly backed by extensive industry experience and understand the critical importance of CUDA-like software tools for AI developers.

Therefore, from 2015 to 2022, these players strove to build their own hardware and software ecosystems. They managed to catch up to some extent internationally, but they remained significantly behind global giants like Nvidia, which did not stand idle and consolidated its position by leveraging its specialization in deep learning.

What no one anticipated, however, was that a new opportunity for change would arrive so soon. In November 2022, ChatGPT burst onto the scene, disrupting the industry’s balance once again.

Are large models a godsend?

With ChatGPT making a global splash in November 2022, large language models (LLMs) suddenly became the frontier technology pursued worldwide, far surpassing CNNs in popularity.

Some Chinese AI chip manufacturers saw this as a godsend: a chance to catch up with the competition.

What’s even more advantageous is that the technological foundation of LLMs is the Transformer network, which initially branched into three different paths: BERT (encoder-only), T5 (encoder-decoder), and GPT (decoder-only).

However, since the stunning debut of ChatGPT, GPT has become the absolute mainstream, leading the global AI industry to a unified understanding.

In the history of AI technology development, such a level of unity is almost unheard of.

The first-mover advantage of CUDA suddenly narrowed as a result.

Thanks to this rapid convergence, Chinese AI chip manufacturers were able to get started quickly on tuning and adapting large models for their hardware. More importantly, this time they were able to start on level ground with international players.

Currently, with Nvidia’s strict prohibition on running CUDA on other AI chip hardware platforms, coupled with the further tightening of US chip bans and the global shortage of computing power, Chinese large model software companies struggle to obtain the most cutting-edge GPU chips. Therefore, the first pain point to resolve is figuring out how to migrate existing large models to new computing platforms.

Given the urgent demand for computing clusters in large model training, Chinese AI chip companies are now committed to strengthening their cluster capabilities. Take Biren, with its GPGPU architecture, as an example. According to customer feedback shared with 36Kr, Biren’s SUPA completed practical application migrations in a short time with support from the software team, and its performance on mainstream open-source large models has shown promising results.

If AI chip manufacturers can provide easy-to-use and low-cost migration tools, and offer comprehensive model adaptation capabilities as well as mature cluster deployment experience, rapid implementation of large models becomes feasible.

Industry insiders also told 36Kr that several Chinese companies, including Biren, have already completed adaptation for most domestic open-source large models and have accumulated substantial experience in deploying 1,000-card clusters. As a result, the time needed to adapt the self-developed models of Chinese large model partners has shortened significantly.

36Kr also learned that, besides migrating quickly from CUDA to the SUPA ecosystem, large model companies can leverage Biren’s architectural features and SUPA’s capabilities to extend the CUDA ecosystem, further enhancing performance.

By developing everything from the bottom up, Biren can maximize its hardware advantages, ensuring that its software stack can always be optimized, iterated, and adjusted regardless of changes anywhere from the hardware to end applications.

Under the current restrictions, localization is being actively promoted not only at the chip layer but also across the large model software, computing power, and cloud computing layers.

AI chip companies, as the foundational building blocks of the overall AI ecosystem, seek to deeply cooperate with large model, framework, and cluster companies to maximize overall performance.

For example, Biren has partnered with framework developers like PaddlePaddle not only to meet enterprise users’ development needs in line with international standards, but also to tailor solutions specifically for the domestic environment, providing a smoother integration path for Chinese AI companies.

Simultaneously, Biren has also partnered with Chinese computing power optimization players like Infinigence, further driving the localization of AI computing in an efficient manner.

To resolve bottlenecks in the ecosystem, Biren is promoting its software platform by building computing power platforms, open-sourcing related tools and libraries, and opening upper-level models. It is carrying out joint adaptation and optimization with framework and large model partners to establish ecosystem collaborations, and it is also promoting adoption through research cooperation with universities, research institutions, and end customers.

Software is undoubtedly the most challenging barrier to break through, and it is currently the consensus focus of all major AI chip companies. By making concerted efforts across industry, academia, and research, breakthroughs can be achieved. For instance, Zhejiang University’s AI teaching platform Mo uses Biren’s hardware and software resources as the foundation for teaching practice, providing students with practical opportunities and sowing the seeds for the long-term development of the domestic software ecosystem.

Conclusion

Undoubtedly, computing power has become the battleground of the AI era, with its severe shortage constraining the development of AI technology in various countries.

OpenAI’s CEO Sam Altman once reportedly said that “computing power will be the currency of the future,” suggesting that the development of AI will turn into a massive power struggle between companies, organizations, and even countries.

Currently, Nvidia’s GPU hardware, thanks to its advantages in CUDA software, is highly sought after by the market and has long been in short supply. The rise of large models, succeeding CNN-based deep learning as the new generation of AI technology, gives China’s AI chip players a rare opportunity to play catch-up.

Looking back over the past two decades, Nvidia has been able to dominate the AI era because of its first-mover advantage in AI: it rode the wave of deep learning and used the CUDA platform to gain a strategic advantage over Intel.

Now, a brand new path is about to emerge again. But this time, Chinese companies will be ready.

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Xiao Xi for 36Kr.