Xiaomi launches three self-developed large models; Lei Jun says AI investment will exceed 16 billion yuan this year

On March 19, Xiaomi announced the launch of three self-developed large models: Xiaomi MiMo-V2-Pro, Xiaomi MiMo-V2-Omni and Xiaomi MiMo-V2-TTS. MiMo-V2-Pro and MiMo-V2-Omni have reportedly opened their API services.

According to Xiaomi's official introduction, MiMo-V2-Pro is the flagship text base model. It is designed for high-intensity agent workloads and focuses on reasoning, planning and tool calling. MiMo-V2-Omni is an omni-modal agent base model that natively integrates text, visual and audio perception, connecting the full path from understanding to execution. MiMo-V2-TTS is a large speech synthesis model. Its goal is to give agents a warm, emotionally expressive voice, forming the final link of the stack.

As the flagship base model, MiMo-V2-Pro is specially optimized for agent scenarios. It has undergone supervised fine-tuning and reinforcement learning across complex and diverse agent frameworks, which gives it stronger tool calling and multi-step reasoning capabilities so that it can carry a task through to a delivered result. Architecturally, the model has more than 1 trillion (1T) total parameters, of which 42B are active. It adopts an improved hybrid attention mechanism (Hybrid Attention), which greatly increases model capacity while preserving inference efficiency. Its context window has been expanded to 1 million tokens, enough to support ultra-long task chains and complex workflows.
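To put the parameter figures above in proportion, the following is a minimal sketch of the arithmetic. It treats the article's "more than 1 trillion" as exactly 1T, so the resulting fraction is an upper bound. The helper names are hypothetical, and the compute estimate uses the common rule of thumb of roughly 2 FLOPs per active parameter per token, which ignores attention cost; it is not a figure published by Xiaomi.

```python
# Figures quoted in the article. The total is stated only as "exceeds 1T",
# so 1e12 is a lower bound on the total (assumption for illustration).
TOTAL_PARAMS = 1.0e12
ACTIVE_PARAMS = 42e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters that are active per token."""
    return active_params / total_params


def approx_forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: about 2 FLOPs per active parameter.

    This is a widely used approximation, not a measured value.
    """
    return 2.0 * active_params


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Active share: at most {share:.1%}")
    print(f"Approx. forward FLOPs per token: {flops:.2e}")
```

Under these assumptions, at most about 4.2% of the parameters are active for any given token, which is the sense in which capacity can grow without a proportional increase in per-token inference cost.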

MiMo-V2-Omni and MiMo-V2-TTS, unveiled at the same time, supply the two remaining pieces of the puzzle: perception and expression. The core value of the former lies in aligning audio, images and video. The latter provides an emotional expression engine with fine-grained control, giving agents expressive capabilities closer to those of humans.

In terms of pricing, MiMo-V2-Pro's API price is lower than that of comparable competing products. For contexts up to 256K tokens, the price per million tokens is US$1 for input and US$3 for output; for contexts up to 1M tokens, it is US$2 for input and US$6 for output. MiMo-V2-Pro's API service is now officially open. MiMo-V2-Omni has also opened its API and supports a 256K context length, priced at US$0.4 per million input tokens and US$2 per million output tokens.
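The tiered prices above can be turned into a per-request cost estimate. The sketch below is illustrative only: the `Tier` type and `estimate_cost_usd` function are hypothetical names, 256K is taken as 256,000 tokens, and the tier is assumed to be chosen by the combined input and output token count of the request. Xiaomi's actual billing rules may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Tier:
    max_context_tokens: int        # upper bound of the tier (assumed inclusive)
    input_usd_per_million: float   # price per 1M input tokens
    output_usd_per_million: float  # price per 1M output tokens


# Prices as quoted in the article; tier boundaries are assumptions.
MIMO_V2_PRO_TIERS = (
    Tier(256_000, 1.0, 3.0),
    Tier(1_000_000, 2.0, 6.0),
)
MIMO_V2_OMNI_TIERS = (
    Tier(256_000, 0.4, 2.0),
)


def estimate_cost_usd(tiers: Sequence[Tier], input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request using the first tier that fits it."""
    context = input_tokens + output_tokens
    for tier in tiers:
        if context <= tier.max_context_tokens:
            return (
                input_tokens * tier.input_usd_per_million
                + output_tokens * tier.output_usd_per_million
            ) / 1_000_000
    raise ValueError(f"Request of {context} tokens exceeds the largest context tier")


if __name__ == "__main__":
    # 100K input + 10K output fits the 256K tier of MiMo-V2-Pro.
    print(round(estimate_cost_usd(MIMO_V2_PRO_TIERS, 100_000, 10_000), 4))
    # 500K input + 20K output falls into the 1M tier.
    print(round(estimate_cost_usd(MIMO_V2_PRO_TIERS, 500_000, 20_000), 4))
```

Under these assumptions, a MiMo-V2-Pro request with 100,000 input tokens and 10,000 output tokens costs about US$0.13, while one with 500,000 input tokens and 20,000 output tokens costs about US$1.12 because it is billed at the higher tier.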

Figure: API service pricing for MiMo-V2-Pro and Claude

In addition, Xiaomi has teamed up with five agent framework teams, OpenClaw, OpenCode, KiloCode, Blackbox and Cline, to offer one week of free API access, aiming to expand its reach in the developer community.

This is seen as an important signal that Xiaomi is betting fully on the agent era. Early that morning, Xiaomi founder Lei Jun posted on social media: "In the field of AI, our R&D and capital investment this year will exceed 16 billion yuan."

According to Lei Jun, MiMo-V2-Pro, a trillion-parameter large model, ranks eighth worldwide in Artificial Analysis's ranking of overall large model intelligence, and fifth worldwide when ranked by model brand. He added: "Our model has only just been completed and will be rapidly iterated and improved in the coming period."

It is worth noting that Luo Fuli, head of Xiaomi's MiMo large model team, also stated publicly on social media that the previously released "Hunter Alpha" was an internal test version of the flagship model MiMo-V2-Pro. Luo Fuli, born in 1995, has been dubbed an "AI talented woman" in the industry.

On March 11, a mysterious model code-named "Hunter Alpha" appeared on OpenRouter, the world's largest API aggregation platform. Within just seven days, the model's cumulative usage reportedly exceeded 1 trillion tokens, and it topped the platform's rankings for several consecutive days. It sparked heated discussion and was at one point mistaken for an early version of "DeepSeek V4".

Coincidentally, Luo Fuli once worked at DeepSeek. She started her career at Alibaba DAMO Academy, where she led the development of the multilingual pre-training model VECO and helped drive the open-sourcing of AliceMind. In 2022, Luo Fuli joined High-Flyer Quant, DeepSeek's parent company, to work on deep learning. She later served as a deep learning researcher at DeepSeek and took part in the development of DeepSeek-V2 and other models.

In December last year, Luo Fuli made her first public appearance at Xiaomi's "People, Cars, and Homes Ecosystem" partner conference.

On December 17, Lu Weibing, Xiaomi Group partner and group president, announced that Xiaomi's self-developed large AI model Xiaomi MiMo-V2-Flash had been officially released and open-sourced. Lu Weibing said at the time that Xiaomi had begun concentrated, high-intensity investment in AI, and that progress on large models and applications had "far exceeded expectations". Going forward, the company will focus on the core direction of "the deep integration of AI and the physical world."