On March 16, Baidu officially launched Ernie X1 and Ernie 4.5, two advanced large models now available for free on the Ernie Bot website and poised to become the next key drivers of artificial intelligence adoption.

From OpenAI to Grok, Google’s Gemini, and beyond, AI’s major players are constantly shuffling. The one constant? AI applications are still in the early stages of commercialization.

Baidu’s latest rollout signals its intention to prepare for the anticipated AI app boom in 2025, facilitated by multimodal AI.

Ernie 4.5 introduces a breakthrough in native multimodal learning, improving comprehension across text, images, and logic-based tasks. By modeling multiple modalities jointly, it also strengthens textual reasoning and logical inference. Benchmark results suggest that Ernie 4.5 outperforms OpenAI’s GPT-4.5 while offering API access at just 1% of GPT-4.5’s cost.

Meanwhile, Ernie X1 is an advanced reasoning model positioned against DeepSeek-R1. Baidu claims X1 delivers comparable performance on multiple benchmarks at only half the price of DeepSeek-R1. Unlike traditional models focused on content generation, X1 is engineered for decision-making, moving AI from a response tool toward a more interactive and analytical agent.

For instance, in narrative generation tests, X1 can craft intricate murder mystery plots from a simple background prompt. It also excels at mimicking the sharp, opinionated tone prevalent in Chinese social media, making it a potential tool for content creators seeking more engaging, localized AI-generated responses.

Since launching Ernie 4.0, Baidu has rapidly built an AI ecosystem, integrating AI-powered search, maps, cloud storage, and document services. Back in 2021, the company shifted its strategy from integrating AI and cloud to prioritizing an AI-first architecture.

The free release of Ernie 4.5 and X1 reflects cost reductions driven by technological advancements. This move also signals three key industry trends:

  1. Baidu has a rare advantage as one of the few AI firms with a four-layer architecture approach, encompassing foundational research, framework, model, and application. Its deep expertise in AI chips and infrastructure supports long-term commercialization efforts.
  2. China’s tech giants are competing not just on models but also on model-as-a-service (MaaS) platforms. Enterprises and developers can now access Ernie 4.5 APIs via Qianfan, and Ernie X1 will be added soon.
  3. AI adoption in China has reached a tipping point as businesses grow eager to embrace new AI technologies. Baidu, as a leader in the space, can lower entry barriers for developers and enterprises to accelerate AI industrialization.

A recent evaluation demonstrated Ernie 4.5’s notable ability to reduce AI hallucinations. This improvement stems from three techniques: FlashMask dynamic attention masking for accuracy, a heterogeneous multimodal mixture-of-experts (MoE) architecture for optimized reasoning, and a post-training process enhanced with self-feedback.

In testing, when presented with a single movie screenshot, Ernie 4.5 was able to accurately identify the film.

Its chart analysis and reasoning capabilities also highlight its enhanced multimodal generation abilities.

The industry’s interest in multimodal large models is no secret.

By jointly modeling images, video, and text, AI can unify semantic understanding and overcome the fragmented information issues of conventional chatbot-style models.

For example, the Model Context Protocol (MCP) has drawn significant attention for “vibe design” in Blender, where AI supports iterative design refinements through dialogue-based interactions, making it more of a controlled creative tool than a passive responder.

Similarly, Gemini’s multimodal capabilities in Google AI Studio—handling multi-turn conversations, image generation, and real-time editing—have positioned it as a go-to AI for developers.

Baidu’s Ernie 4.5 is ostensibly China’s first native multimodal model. Meanwhile, Ernie X1, also multimodal, particularly excels in Chinese Q&A, literary writing, document drafting, logical reasoning, complex calculations, and tool use.

X1’s strengths are rooted in progressive reinforcement learning, which improves its creativity, search, tool use, and logical inference across multiple domains. Additionally, its end-to-end training on chains of reasoning and action has enhanced its ability to perform deep search and tool calls, areas where many AI models still struggle.

On the cost side, Baidu has jointly optimized PaddlePaddle and Ernie across model compression, inference engines, and system architecture, shrinking the models and speeding up inference. This is why X1 costs only half as much as DeepSeek-R1.
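To make the pricing ratios concrete, here is a minimal Python sketch of the arithmetic. Only the two ratios (1% and one half) come from the figures reported above; the function names and the baseline spend of 1,000 are hypothetical placeholders, not real prices from any vendor.

```python
# Price ratios as reported in this article.
ERNIE_45_VS_GPT_45 = 0.01       # Ernie 4.5 API at 1% of GPT-4.5's cost
ERNIE_X1_VS_DEEPSEEK_R1 = 0.5   # Ernie X1 at half of DeepSeek-R1's cost


def relative_cost(baseline_cost: float, ratio: float) -> float:
    """Cost of the same workload on the cheaper model, given the baseline cost."""
    if baseline_cost < 0:
        raise ValueError("baseline_cost must be non-negative")
    return baseline_cost * ratio


def savings(baseline_cost: float, ratio: float) -> float:
    """Amount saved by switching from the baseline model to the cheaper one."""
    return baseline_cost - relative_cost(baseline_cost, ratio)


if __name__ == "__main__":
    # Hypothetical monthly API spend on the baseline model (illustrative only).
    baseline = 1000.0
    print("Ernie 4.5 vs GPT-4.5:", relative_cost(baseline, ERNIE_45_VS_GPT_45))
    print("Ernie X1 vs DeepSeek-R1:", relative_cost(baseline, ERNIE_X1_VS_DEEPSEEK_R1))
```

The sketch assumes the workload and token volume stay identical across models, which real deployments rarely guarantee, so it should be read as a rough comparison rather than a budget forecast.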

As the AI industry enters the era of deep reasoning, falling costs have eliminated one of the biggest barriers to adoption.

The two major challenges that enterprises face in AI deployment are high technical barriers and unsustainable costs.

Small and midsized businesses often struggle with AI costs, while larger enterprises, despite having technical teams, face high training expenses and complex adaptation challenges.

For years, the feasibility of AI adoption was unclear, leading to inefficient investments. But as AI models continue to improve, companies across industries are now fully embracing AI-driven transformation.

Baidu’s strategy of lowering costs and raising accessibility with Ernie 4.5 and X1 is aimed at addressing these pain points.

Baidu’s cloud AI business has grown steadily over the past year, suggesting that more enterprises are using MaaS platforms and foundational AI models to build applications without excessive upfront investment.

In March 2023, Baidu was the first Chinese firm to declare its commitment to rebuilding all of its products with AI. Since then, the company has invested heavily in next-gen foundational models, culminating in today’s native multimodal Ernie models.

2025 could be the breakout year for enterprise AI adoption, with precision and accuracy becoming paramount. Baidu, with Ernie 4.5 and X1, could yet be the one leading the charge.

KrASIA Connection features translated and adapted content that was originally published by 36Kr. This article was written by Xiao Xi for 36Kr.