Chinese artificial intelligence and electronics company Mobvoi recently unveiled a new large model product called Sequence Monkey.

Sequence Monkey offers multimodal generation capabilities spanning six dimensions: knowledge, dialogue, mathematics, logic, reasoning, and planning. Beyond text generation, it supports tasks ranging from image generation and 3D content creation to voice synthesis and speech recognition.

The new product has enabled Mobvoi to develop a range of AI-powered solutions that cater to the needs of its client base, including consumers, enterprises, and creators. Examples of such solutions include:

  • Mozhuan Writing is an AI writing assistant designed to generate textual content for various contexts. The solution can optimize generated content based on user feedback, ratings, and preferences, and is equipped with machine learning capabilities to continuously improve its performance based on user data.
  • Yan Zhi Hua is an AI image content creation tool capable of text-to-image, image-to-image, and animated image generation, among other AI drawing capabilities. The tool supports customization of model styles and multi-user collaboration for enterprise-level projects.
  • Moyin Workshop, marketed overseas as DupDub, is an AI voiceover tool that lets creators produce content that integrates text and voice. Creators can build custom voices by describing the desired voice in text, or choose from thousands of voice styles. The tool can also adjust the emotional tone of a voice, with options such as calm, sad, and happy.
  • Weta365 is an AI platform for creating and livestreaming virtual human videos. Users can select from over 100 realistic 3D digital humans and over 100 voices, and can select different scenes, actions, and expressions to generate personalized short videos or live content.

A virtual character created using Mobvoi’s Weta365 platform for livestreaming. Image source: Weta365’s official website.

Four-phase evolution

Founded in 2012 by former Google scientist Zhifei Li, Mobvoi began with a focus on voice interaction technology, at a time when the release of Siri on the iPhone 4s had popularized intelligent voice assistants. The company sought to redefine the technology, developing a mobile search engine known as Voice Search. This was Mobvoi’s first phase of evolution, which made it one of China’s pioneers in voice recognition and natural language processing technology.

The second phase involved the integration of AI assistants into wearable hardware. In 2014, when smart wearables, including smartwatches, were not yet prevalent, Mobvoi introduced TicWatch, its first smart wearable product aimed at consumers. TicWatch pioneered the use of an AI voice assistant feature, which helped normalize AI voice assistants in the smart wearables industry.

The third phase began when OpenAI introduced the GPT-3 large language model in 2020 and Mobvoi saw the potential of AI-generated content. Leveraging its expertise in big data processing, the company began to develop a large language model named UCLAI and moved to commercialize AI-generated content. This was followed by the introduction of products like Sound Wizardry and Wonderful Element.

Sequence Monkey is an upgraded iteration of the UCLAI model, marking Mobvoi’s latest foray into the AI industry and the fourth phase of its evolution.

The challenges ahead

While the generative AI industry holds immense potential, it faces several issues related to reliability, scalability, ethical implications, and data privacy. In April 2023, the Cyberspace Administration of China released for public comment a draft regulation titled “Management Measures for Generative AI Services.” The regulation aims to establish supervisory measures for generative AI in terms of content compliance, algorithm model compliance, and operational compliance.

In addition, Mobvoi relies heavily on Volkswagen Group, its largest client, which contributed RMB 213 million in revenue, or 42.6% of its total revenue. Volkswagen has worked with Mobvoi since 2017 and became the latter’s most significant enterprise customer in 2022. The two companies collaborate to develop a variety of solutions. One example is Volkswagen Ask, which equips vehicles with AI-powered features using voice recognition, semantic analysis, and speech synthesis.

Mobvoi is actively diversifying its customer portfolio by seeking collaborations with other automotive companies and broadening its suite of AI products. It is also leveraging its AI-generated content platform to cultivate a community, aiming to solidify its market position.

In May 2023, Mobvoi submitted its IPO prospectus to the Stock Exchange of Hong Kong. According to the prospectus, Mobvoi has raised about USD 233 million to date across seven rounds of financing from February 2013 to September 2019, with investors including Sequoia Capital, Zhen Fund, Susquehanna International Group, Google, and Volkswagen Group. The company’s founder, Zhifei Li, remains its largest shareholder with a 26.72% stake.