The SuperCLUE team has released its Chinese large-model evaluation results for the DeepSeek V4 series. DeepSeek-V4-Pro ranked first among domestic models on overall performance, with the Flash version close behind in second place, marking another breakthrough for domestic open-source models. The evaluation covers six dimensions: mathematical reasoning, scientific reasoning, code generation, agent task planning, instruction following, and hallucination control. The Pro version scored 70.98 points and the Flash version 68.82 points, both well ahead of other domestic models.


The DeepSeek V4 series adopts a new attention mechanism. All versions support context windows of up to one million tokens while reducing compute and memory usage, and they run more efficiently overall on domestic (Chinese-made) chips.

Compared with the previous-generation V3.2, both versions improve across the board. The Pro version's agent score rose by more than 20 points, mathematical reasoning by nearly 10 points, and instruction following by nearly 12 points, while hallucination control also improved significantly.


While keeping inference efficient, the Flash version also delivers significant gains in agent tasks and mathematical reasoning, offering outstanding cost-performance.


The Pro version (15 yuan per million tokens) is built for high performance and more stable hallucination control, making it suited to complex tasks and professional scenarios. The Flash version is faster and cheaper: its API price is only 1.25 yuan per million tokens, making it the more cost-effective option for everyday use.
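To make the price gap concrete, here is a minimal cost sketch using only the two list prices quoted above. It assumes a single flat rate per million tokens; any distinction between input and output tokens, cache-hit discounts, or time-of-day pricing is ignored, and the 200-million-token monthly workload is a hypothetical figure chosen purely for illustration.

```python
# Minimal cost sketch based on the quoted API list prices.
# Assumption: one flat price per million tokens per model (input/output
# split, cache discounts, and off-peak pricing are deliberately ignored).

PRICES_YUAN_PER_MILLION = {
    "Pro": 15.0,
    "Flash": 1.25,
}


def cost_yuan(model: str, tokens: int) -> float:
    """Return the cost in yuan of processing `tokens` tokens at the model's flat rate."""
    return PRICES_YUAN_PER_MILLION[model] * tokens / 1_000_000


if __name__ == "__main__":
    monthly_tokens = 200_000_000  # hypothetical workload: 200M tokens per month
    for name in PRICES_YUAN_PER_MILLION:
        print(f"{name}: {cost_yuan(name, monthly_tokens):.2f} yuan")
    ratio = PRICES_YUAN_PER_MILLION["Pro"] / PRICES_YUAN_PER_MILLION["Flash"]
    print(f"Pro costs {ratio:.0f}x as much per token as Flash")
```

At these list prices the hypothetical workload costs 3,000 yuan on Pro versus 250 yuan on Flash, a 12x difference per token regardless of volume.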

The evaluation also noted that the models still trail top overseas models in code generation and complex instruction execution. Overall, with balanced capabilities and affordable pricing, DeepSeek V4 has established itself in China's first tier, making it a strong choice for everyday office work, software development, content creation, and long-text processing.