The tech industry and stock markets have been trying to grasp how a small, relatively unknown Chinese company was able to develop a sophisticated artificial intelligence chatbot on par with OpenAI’s ChatGPT at a fraction of the cost.
One possible answer being floated in tech circles is distillation, an AI training method that uses bigger “teacher” models to train smaller but faster-operating “student” models.
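For readers unfamiliar with the mechanics, the sketch below shows what a single distillation training step typically looks like in PyTorch. It is a generic, minimal illustration of the teacher-student idea described above: the toy model sizes, temperature and loss weighting are placeholder values, and it does not represent how DeepSeek or OpenAI actually train their systems. In practice the teacher is a much larger pretrained model, and the student may only see the teacher's outputs rather than its weights.

```python
# Illustrative knowledge-distillation step (a generic sketch, not any company's actual code).
# A small "student" network learns to match the softened output distribution of a
# larger frozen "teacher" network, in addition to fitting the true labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for a large teacher model and a smaller student model.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0   # softens the teacher's probabilities (placeholder value)
alpha = 0.5         # weight between distillation loss and hard-label loss (placeholder value)

x = torch.randn(16, 32)               # a dummy batch of inputs
labels = torch.randint(0, 10, (16,))  # dummy ground-truth labels

with torch.no_grad():
    teacher_logits = teacher(x)       # the teacher is frozen; it only provides targets

student_logits = student(x)

# KL divergence between the softened teacher and student distributions,
# scaled by T^2 as in the classic distillation formulation.
distill_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

hard_loss = F.cross_entropy(student_logits, labels)
loss = alpha * distill_loss + (1 - alpha) * hard_loss

loss.backward()
optimizer.step()
```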
DeepSeek claims to have achieved a similar level of performance as OpenAI’s o1 model at a fraction of the cost through “optimized codesign of algorithms, frameworks, and hardware.”
This sparked a sharp selloff in tech shares as investors considered whether the Chinese company’s low-budget approach signaled the end of the AI investment race and of US tech giants’ dominance.
But questions soon arose, with some in the industry speculating that the company had piggybacked on OpenAI’s developments.
Such speculation was fueled when Bloomberg reported that Microsoft and OpenAI had launched a probe into whether DeepSeek improperly obtained data from OpenAI to train its own model. OpenAI told the Financial Times on January 28 that it has seen evidence of distillation, though it did not make that evidence public.
Microsoft and DeepSeek did not immediately respond to Nikkei Asia’s request for comment.
Distillation itself is not a new technique, nor is it necessarily controversial. Nvidia’s Minitron and Falcon 3, the latter developed by the Technology Innovation Institute in the UAE, both employed the technique, likely with their developers’ own families of large language models (LLMs) as teachers. The approach has become increasingly popular since 2024 amid demand from businesses wanting to use LLMs in their services.
Big LLMs, however, are “difficult to handle, and you would need a vast number of graphics processing units (GPUs) for [their] deployment,” said an engineer at an AI startup in Japan.
GPUs are the main reason AI systems are so expensive. Nvidia’s signature H100 chips, for example, can cost USD 30,000–35,000 each. Distillation drastically cuts development time and costs, and results in models that can operate faster than their bigger counterparts.
The issue for DeepSeek is whether its low-cost model is based more on distillation than innovation.
“There is a question on whether they’re able to use existing large language models to distill their results,” Kirk Boodry, an analyst at Astris Advisory Japan, told Nikkei Asia. “It seems to be coming up quite a bit in discussion. People are like, ‘I don’t know how much of this is really cutting edge.’”
Kazuhiro Sugiyama, consulting director at Omdia, is skeptical that DeepSeek could drastically disrupt the current AI ecosystem. Its impact is “temporary and limited,” he said, pointing out that although the Chinese company’s chatbot shows signs of impressive innovation, the industry still needs to verify how well it holds up.
Analysts have also questioned whether the Chinese chatbot was indeed developed with a fraction of the budget of Western counterparts.
“When people talk about [DeepSeek’s] headline numbers, like a couple months of development, [or spending] USD 6 million, what they’re talking about is this very specific [model],” said Boodry from Astris. “The numbers that people are throwing around are probably way too low.”
The Chinese company released a paper in December 2024 that set the figure for its V3 model at USD 5.6 million. This does not include costs associated with prior research and experiments. The training cost of OpenAI’s GPT-4, by comparison, is estimated to exceed USD 100 million.
Sugiyama said more companies will likely enter the race to develop LLMs, but the market position of the big players, including OpenAI, will probably not change. AI models will gradually “polarize,” he predicted, with big companies like Microsoft and Google continuing to invest in bigger and more powerful models to be used on their services, and smaller players developing smaller, cheaper and more efficient models that are tailored to specific markets.
Hype aside, engineers do not doubt that DeepSeek has accomplished something that deserves to be acknowledged.
Even if the company used distillation, that alone would not be enough to develop a functioning model, according to one engineer. “It will need know-how to utilize GPUs efficiently and also come up with a way to do complex training,” such as combining different models to produce a better answer.
Another AI engineer said he was “not surprised” that a company like DeepSeek suddenly appeared. “There is a big trend of reducing the size of the AI model. … Over time, there will be many ways to achieve this.”