Can Shengshu’s Vidu AI text-to-video generator outcompete its rivals?

A fresh name has emerged in the rapidly expanding landscape of text-to-video generation tools: Shengshu Technology’s Vidu AI. Launched globally on July 30, Vidu AI lets users convert text (in both Chinese and English) and images into crisp video clips of 4 or 8 seconds. The platform also comes with upscaling capabilities that bring those clips to full 1080p resolution, setting a high bar for quality.

https://console.kr-asia.com/wp-content/uploads/2024/09/Shengshu-Vidu-AI-Official-Video.mp4

Video source: Shengshu Technology.

Vidu’s arrival marks another step forward in China’s ambitions to develop generative artificial intelligence. Alongside Kuaishou’s Kling AI, MiniMax’s Hailuo AI, and a growing roster of others, Shengshu’s tool is joining the fray just as China’s AI developers work to match, if not outpace, more established international names like OpenAI’s Sora and Google’s Veo.

Media reports have highlighted several key features that make Vidu AI stand out. First is its efficiency—Vidu can reportedly generate a 4-second video clip in just 30 seconds, making it one of the fastest models of its kind.

There’s also the “reference-to-video” feature, which ensures consistency in subjects, settings, and visual styles across multiple clips—an especially useful tool for creators working in dynamic formats like films and games, where coherence is essential.

Vidu’s ability to generate anime-style videos has also drawn attention, and it could make the tool a preferred choice for creators aiming to replicate the distinct aesthetic of Japanese anime. But how does Vidu AI stack up in real-world tests?

How Vidu AI compares

To gauge Vidu AI’s performance, KrASIA put the tool through a series of prompts previously tested on Kling AI and Hailuo AI. The comparisons were designed to evaluate not only the quality of the generated videos but also the coherence, creativity, and speed of each tool.

When prompted to generate a video of a “realistic puppy driving a car,” Vidu AI delivered a clip that, while visually engaging, showed a toy-like car rather than a realistic one.

https://console.kr-asia.com/wp-content/uploads/2024/09/Shengshu-Vidu-AI-English-Puppy-Driver.mp4

Video generated by Vidu AI in response to a prompt requesting a “realistic puppy driving a car.”

The puppy was well-positioned behind the wheel, but it didn’t quite feel like the driver: it seemed placed there rather than actively interacting with the scene. Hailuo AI encountered a similar issue, struggling to achieve full realism. For prompts like this, more detailed inputs, such as the car model or dog breed, could produce a stronger output.

Next, Vidu AI was tasked with a more playful challenge: a “cute kitten eating lunch like a human.”

https://console.kr-asia.com/wp-content/uploads/2024/09/Shengshu-Vidu-AI-English-Kitten-Lunch.mp4

Video generated by Vidu AI in response to a prompt requesting a “cute kitten eating lunch like a human.”

Here, the tool performed on par with both Kling AI and Hailuo AI, producing a charming scene of a kitten mimicking human behavior at the table.