The model is released under the MIT license, which permits free commercial use, and early industry tests have confirmed that it can run directly on consumer-grade hardware such as Apple's high-end Mac Studio.
AI researcher Awni Hannun said that the new DeepSeek-V3 model can run at 20 tokens per second on an Apple computer equipped with an M3 Ultra chip. This challenges the industry's earlier consensus that model capability and local operation are at odds, and it suggests that data centers are not a prerequisite for running large models.
Another AI researcher, Xeophon, wrote on X that after testing the new version of DeepSeek-V3 on an internal benchmark, he found it had made a huge leap on every metric tested. By his assessment, it is now the best non-reasoning model, surpassing Anthropic's Claude Sonnet 3.5.
Low-key but sensational
DeepSeek-V3-0324 arrived without a white paper or any publicity, accompanied only by an empty README file. This almost austere launch stands in sharp contrast to Silicon Valley's carefully curated product promotion.
DeepSeek's models are also open source and free for anyone to download and use, whereas Claude Sonnet, one of the best commercial models, requires a subscription costing $20 per month.
In addition, DeepSeek rethinks how large language models operate. Rather than using all of its parameters, the model activates only about 37 billion of them for any given input, drawn from so-called "expert" modules (a mixture-of-experts design), which greatly reduces computational requirements.
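The idea of activating only a few "expert" modules can be illustrated with a minimal sketch of generic top-k expert routing. This is not DeepSeek's actual router: the expert count, the gate scores, the toy experts, and the function names `top_k_route` and `moe_forward` are all assumptions made for illustration.

```python
import math

def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the weights sum to 1.
    peak = max(gate_scores[i] for i in chosen)
    exps = [math.exp(gate_scores[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

def moe_forward(x, experts, gate_scores, k):
    """Run only the chosen experts and blend their outputs by gate weight."""
    chosen, weights = top_k_route(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in zip(chosen, weights))
    return output, chosen, weights

# Hypothetical toy setup: 8 experts, each just scales its input.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up router output

output, chosen, weights = moe_forward(1.0, experts, gate_scores, TOP_K)
active_fraction = len(chosen) / NUM_EXPERTS
print(f"experts used: {chosen}, fraction of experts active: {active_fraction}")
```

In this toy setup only two of the eight experts run for the input, so the remaining six contribute no computation. The same principle, at far larger scale, is what lets a model with many parameters activate only a subset of them at a time.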
The model also features two other breakthrough technologies: Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP). MLA enhances the model's ability to maintain context across long texts, while MTP generates multiple tokens at each step instead of the usual one token at a time. Together, these innovations increase output speed by nearly 80%.
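The article does not say how the speedup of nearly 80% arises. One plausible reading, assumed here purely for illustration, is that each decoding step emits one guaranteed token plus extra predicted tokens that are kept only if they pass verification. The function name `expected_tokens_per_step` and the acceptance rate used below are hypothetical, not figures from the source.

```python
def expected_tokens_per_step(acceptance_rate, extra_tokens=1):
    """Expected tokens emitted per decoding step.

    One token is always produced. Each extra predicted token is kept only
    if it and every extra token before it were accepted.
    """
    expected = 1.0
    prob_all_accepted = 1.0
    for _ in range(extra_tokens):
        prob_all_accepted *= acceptance_rate
        expected += prob_all_accepted
    return expected

# Assumed acceptance rate of 0.8 for a single extra token per step.
speedup = expected_tokens_per_step(0.8, extra_tokens=1)
print(f"tokens per step relative to one-at-a-time decoding: {speedup:.2f}")
```

Under these assumptions, an acceptance rate of about 0.8 for one extra token yields roughly 1.8 tokens per step, which is the same order as the speedup quoted above. A rate of 0 falls back to ordinary one-token-at-a-time decoding.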
To a certain extent, DeepSeek embodies Chinese companies' relentless pursuit of efficiency under resource constraints, that is, how to achieve equal or better performance with limited computing resources. This demand-driven innovation has enabled China's artificial intelligence to shock the world within a few months.
The changes in DeepSeek's new model are also of great significance to the industry. On the one hand, the model greatly reduces the energy consumption and computing costs of large models, further shaking Wall Street's assumptions about the scale of investment that top-tier model infrastructure requires. On the other hand, the broad consensus on open source within China's artificial intelligence industry has rapidly advanced the domestic AI sector, steadily narrowing the gap with the world's top competitors.
Others believe that, given DeepSeek's rapid catch-up, the R2 model it plans to release in April may directly challenge OpenAI's long-publicized GPT-5 model. If that happens, the different approaches China and the United States take to developing artificial intelligence may meet in direct confrontation.