DeepSeek quietly ships a "minor update" that approaches R1-level performance at one-fifth the cost

Last night, DeepSeek released the DeepSeek-V3-0324 model without warning. Although the company modestly described the update as a "minor version iteration", its measured performance far exceeded expectations. The model is significantly better at code generation and front-end development, and some of its capabilities are comparable to Claude 3.7 Sonnet, which has triggered heated discussion in the global AI community.

In the KCORES large model arena, DeepSeek-V3-0324 scored 328.3 points for coding ability, surpassing the standard version of Claude 3.7 Sonnet (322.3 points) and coming close to the thinking (chain-of-thought) version of Claude 3.7 Sonnet (334.8 points).

On the Aider LLM Leaderboard, DeepSeek-V3-0324 scored 55% on the multi-language benchmark, a significant improvement over V3 and slightly below R1. Among non-thinking (non-reasoning) models, it ranks second, behind Claude 3.7 Sonnet.

The test data also shows that, among the well-performing models, DeepSeek-V3-0324 has the lowest cost: only about one-fifth that of R1, which makes it exceptionally cost-effective.

In addition, Claude 3.7 Sonnet Thinking costs 33 times as much as DeepSeek-V3-0324, and o1 costs 167 times as much.
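To put the quoted cost figures side by side, here is a minimal Python sketch that normalizes everything to DeepSeek-V3-0324 = 1. The multipliers come from the figures above; the value 5 for R1 is an assumption derived from "about one-fifth", and the helper names `cost_ratio` and `savings_percent` are hypothetical, introduced only for illustration.

```python
# Relative benchmark cost, normalized so that DeepSeek-V3-0324 = 1.
# 33x and 167x are the multipliers quoted in the article; 5x for R1 is an
# assumption derived from "V3-0324 costs about one-fifth of R1".
RELATIVE_COST = {
    "DeepSeek-V3-0324": 1.0,
    "DeepSeek-R1": 5.0,
    "Claude 3.7 Sonnet Thinking": 33.0,
    "o1": 167.0,
}


def cost_ratio(model_a, model_b, table=RELATIVE_COST):
    """How many times more expensive model_a is than model_b."""
    return table[model_a] / table[model_b]


def savings_percent(model, baseline, table=RELATIVE_COST):
    """Percentage saved by using `model` instead of `baseline`."""
    return (1.0 - table[model] / table[baseline]) * 100.0


if __name__ == "__main__":
    for name in RELATIVE_COST:
        ratio = cost_ratio(name, "DeepSeek-V3-0324")
        print(f"{name}: {ratio:g}x the cost of DeepSeek-V3-0324")
    saved = savings_percent("DeepSeek-V3-0324", "DeepSeek-R1")
    print(f"Saving versus R1: about {saved:.0f}%")
```

Under these assumptions, the same table also gives indirect comparisons, for example `cost_ratio("o1", "DeepSeek-R1")`, which the article does not state directly.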

Currently on the DeepSeek official website, you only need to turn off the "Deep Thinking" option to use the new model.

The open-source weights are also available for download on Hugging Face:

https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/tree/main
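For readers who prefer to script the download, the following is a rough sketch using the `huggingface_hub` package's `snapshot_download` function. It assumes the package is installed (`pip install huggingface_hub`); the helper names `repo_id_from_url` and `download_config_only` and the local directory name are hypothetical. Because the full weights run to hundreds of gigabytes, the sketch restricts the download to the small JSON files in the repository.

```python
from urllib.parse import urlparse

MODEL_URL = "https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/tree/main"


def repo_id_from_url(url):
    """Extract the 'owner/name' repository id from a Hugging Face model URL."""
    parts = urlparse(url).path.strip("/").split("/")
    return "/".join(parts[:2])


def download_config_only(repo_id, local_dir="DeepSeek-V3-0324"):
    """Fetch only the JSON files (configuration, tokenizer, weight index).

    Remove `allow_patterns` to fetch everything, including the weights,
    which requires several hundred gigabytes of disk space.
    """
    # Imported lazily so that repo_id_from_url works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=["*.json"],
    )


if __name__ == "__main__":
    repo_id = repo_id_from_url(MODEL_URL)
    print("Repository:", repo_id)
    print("Files saved to:", download_config_only(repo_id))
```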

DeepSeek-V3-0324 has 685B parameters, a slight increase over the previous V3. It uses an MoE (Mixture of Experts) architecture and activates 37 billion parameters per token. According to community tests, DeepSeek-V3-0324 supports 4-bit quantization and, in that form, runs at more than 20 tokens/s on a 512 GB M3 Ultra Mac while occupying only 352 GB of disk space. The new model is released under the same MIT license as DeepSeek-R1, which permits free modification, commercial use, and model distillation, making it more open than the previous V3.
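As a rough sanity check on these figures, the sketch below estimates the raw weight size at different precisions and the share of parameters that is active at any one time. It is back-of-the-envelope arithmetic only: the function names are hypothetical, and the estimate ignores quantization metadata and any tensors kept at higher precision, which may plausibly account for the gap between the estimate and the reported 352 GB.

```python
TOTAL_PARAMS = 685e9   # total parameter count quoted in the article
ACTIVE_PARAMS = 37e9   # activated parameter count quoted in the article


def weight_size_gb(num_params, bits_per_param):
    """Raw weight size in GB (10^9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


def active_fraction(active=ACTIVE_PARAMS, total=TOTAL_PARAMS):
    """Fraction of the parameters that is activated."""
    return active / total


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_size_gb(TOTAL_PARAMS, bits)
        print(f"{bits}-bit weights: about {size:.1f} GB")
    print(f"Active share: about {active_fraction() * 100:.1f}%")
```

At 4 bits per parameter the raw estimate is 342.5 GB, which fits within the 512 GB of memory mentioned above, whereas the 8-bit estimate of 685 GB would not.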

Judging from the evaluation results, DeepSeek-V3-0324's front-end development performance is outstanding, comparable to that of top commercial models.

X blogger Deepanshu Sharma used only a simple prompt ("use HTML/CSS/JS to create a modern login page"), and the new V3 generated more than 800 lines of code in one shot that ran without errors. The result is comparable to Claude 3.7 Sonnet.

In the classic bouncing ball test, DeepSeek-V3-0324 not only outperformed R1; Deepanshu Sharma also judged that it produced the smoothest motion.

o3-mini initially looked good in this test, but it did not follow the physics correctly: in the middle of the video in particular, the ball did not respond properly to gravity.
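To illustrate what "responding properly to gravity" means in a test like this, here is a minimal one-dimensional sketch of a ball dropped onto a floor. It is not the prompt or code used in the tests described here; the function name `simulate_bounce` and all parameter values are assumptions chosen for illustration.

```python
def simulate_bounce(height=10.0, restitution=0.8, g=9.81, dt=0.001, steps=20000):
    """Simulate a ball dropped from rest and return its height at each step.

    Uses semi-implicit Euler integration: velocity is updated first, then
    position is updated with the new velocity.
    """
    y, vy = height, 0.0
    heights = []
    for _ in range(steps):
        vy -= g * dt              # gravity pulls the ball down on every step
        y += vy * dt
        if y < 0.0:               # the ball has reached the floor
            y = 0.0               # keep it from sinking below the floor
            vy = -vy * restitution  # reverse direction and lose some speed
        heights.append(y)
    return heights


if __name__ == "__main__":
    heights = simulate_bounce()
    print(f"Highest point: {max(heights):.2f} m")
    print(f"Height after {len(heights)} steps: {heights[-1]:.4f} m")
```

In a physically plausible run, the ball never passes through the floor, never rises above its release height, and each bounce is lower than the previous one. Those are the kinds of properties a viewer checks by eye in the generated animations.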

Deepanshu Sharma commented that DeepSeek-V3-0324 "performs like the only top-ranked non-reasoning model".

According to X user karminski-dentist, DeepSeek-V3-0324 also outperformed V3 in an upgraded 20-ball physics simulation test.

The comparison between the new V3 and the leading reasoning models is as follows:

In the Mars mission test, DeepSeek-V3-0324 has improved greatly. The planets and the legend are rendered correctly, and the launch and return window calculations are also much better.

Combining UI design and physics simulation, X user Parul Pandey used DeepSeek-V3-0324 through AnyChat to generate an interactive physics simulation interface that models water molecules.

In use, raising the temperature with the slider makes the molecules move faster and faster as they collide and rebound.

Prompt: Create an interactive simulation that shows water molecules forming and breaking hydrogen bonds, along with a temperature slider.

Regarding the technical difficulty of this upgrade, Reddit user pigeon57434 offered a relatively neutral interpretation: don't be too amazed at the size of this V3 upgrade, because RL (reinforcement learning) has great potential. Take QwQ-32B, for example. Even though it is roughly 20 times smaller, it performs almost as well as R1, or even better in some areas. It can be that powerful only because reasoning models still have a lot of room to scale, without even requiring a new base model. I bet that with more sophisticated techniques, one could easily get a DeepSeek-V2.5-based reasoning model to beat R1, let alone one based on this new V3.

In general, this combination of free access and high performance will put increasing pressure on closed-source commercial vendors such as OpenAI and Anthropic.

This DeepSeek update once again demonstrates the explosive potential of open-source models. Not only do its technical metrics approach those of top commercial AI, it also drives change in the industry through low cost and a high degree of freedom.

It is reasonable to speculate that this update may be a precursor to R2, following a release rhythm similar to that of V3 (24.12.16) → R1 (25.01.20). A stronger reasoning model, R2, may arrive within a few weeks.

As R2 approaches, the global AI competitive landscape may be reshuffled once again.