With late April only days away, anticipation around the release of the DeepSeek V4 large model is running high. Yesterday, the company's researchers unexpectedly updated the DeepGEMM operator library, a move widely read as a precursor to the V4 release. Evidently anticipating that reaction, they added a note after the update emphasizing that it relates only to DeepGEMM development and has nothing to do with any internal model release. In other words: don't read too much into it; this does not mean V4 is about to launch.

Yet the more DeepSeek insists on this, the more curious people become about V4. This round of DeepGEMM updates contains several notable features, and it is hard not to associate them with the V4 model.
Besides adding an FP8_FP4 mixed-precision operator and improving support for NVIDIA Blackwell, the update mainly introduces Mega MoE and HyperConnection. Mega MoE in particular could mark a major upgrade to the MoE architecture.
Mega MoE has many potential benefits, and explanations of it abound online. One analysis produced with Gemini suggests that V4 will have far more experts than the 256 routed experts in V3, possibly thousands. That would substantially improve V4's performance while preserving flexibility and without placing excessive demands on compute or GPU memory.
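More experts need not mean proportionally more compute because of how MoE layers route tokens. The following is a minimal sketch of generic top-k routing. It is not DeepGEMM's or DeepSeek's implementation. The function names `top_k_route`, `moe_forward` and `make_expert` are hypothetical, each "expert" is a toy scalar multiply standing in for a feed-forward block, and the gate uses a softmax over the selected scores as a simplification; V3's actual gating function differs. The sketch shows that only k experts run per token, whatever the pool size. For reference, DeepSeek-V3 routes each token to 8 of its 256 routed experts.

```python
import math
import random


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores into gates."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_weights, k):
    """Run one token through a toy MoE layer; return (output, number of experts executed)."""
    # Router score per expert: dot product of the token with that expert's router vector.
    scores = [sum(w * xi for w, xi in zip(rw, x)) for rw in router_weights]
    routed = top_k_route(scores, k)
    out = [0.0] * len(x)
    for idx, gate in routed:  # only the k selected experts are evaluated
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, len(routed)


def make_expert(scale):
    # Hypothetical toy expert: scales the input (stands in for an FFN).
    return lambda x: [scale * xi for xi in x]


if __name__ == "__main__":
    random.seed(0)
    dim, k = 8, 8
    x = [random.gauss(0, 1) for _ in range(dim)]
    for num_experts in (256, 4096):
        experts = [make_expert(random.random()) for _ in range(num_experts)]
        router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        _, used = moe_forward(x, experts, router, k)
        print(f"{num_experts} experts in total, {used} run for this token")
```

With k fixed, growing the pool from 256 to 4096 experts leaves the per-token expert work unchanged. The router's scoring loop still grows linearly with the number of experts, however, and every expert's weights still have to be stored somewhere.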

More importantly, the DeepGEMM update also hints at V4's parameter count. According to netizens, a single MoE layer holds roughly 25.37B parameters. If the model still has 60 layers, V4 would most likely be a 1.6T-parameter model; at the low end, a 48-layer version would come to about 1.25T.
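The estimate above is multiplication of a per-layer size by a layer count. The sketch below reproduces it in Python under two assumptions that the original estimate does not state: the netizens' 25.37B-per-layer figure is taken at face value, and every layer is counted as an MoE layer. `moe_params` is a hypothetical helper name. The product alone comes to roughly 1.52T and 1.22T. The gap up to the headline figures would have to come from non-MoE parameters such as attention and embeddings, or from rounding, which the estimate does not spell out.

```python
PER_LAYER_MOE_PARAMS = 25.37e9  # netizens' per-layer MoE estimate, as reported in the article


def moe_params(num_layers, per_layer=PER_LAYER_MOE_PARAMS):
    """MoE parameters only: per-layer size times number of MoE layers (assumes all layers are MoE)."""
    return num_layers * per_layer


if __name__ == "__main__":
    for layers, headline in ((60, 1.6e12), (48, 1.25e12)):
        moe = moe_params(layers)
        print(f"{layers} layers: MoE-only {moe / 1e12:.2f}T, "
              f"headline {headline / 1e12:.2f}T, gap {(headline - moe) / 1e9:.0f}B")
```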
Compared with earlier rumors that V4 would have 1T (one trillion) parameters, 1.6T would be 60% more than expected, so its performance is well worth looking forward to.
Even if 1.6T does not materialize, 1.25T would still be nearly double the 671 billion parameters of the current V3, so there is still plenty to anticipate. And if Mega MoE really can scale to thousands of experts, it would be a transformative milestone in the development of MoE-architecture large models.
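The growth figures quoted for the two scenarios can be checked directly. This short sketch uses a hypothetical helper `growth`. Its inputs are the article's headline estimates, the earlier 1T rumor, and V3's published total of 671B parameters. It shows that 1.6T is 60% above the rumor and that 1.25T is about 86% above V3, which is close to but short of a full doubling.

```python
V3_TOTAL = 671e9      # DeepSeek-V3 total parameter count
RUMORED_V4 = 1.0e12   # earlier rumor for V4: 1T parameters


def growth(new, old):
    """Percentage increase of `new` over `old`."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    print(f"1.6T over the 1T rumor: +{growth(1.6e12, RUMORED_V4):.0f}%")
    print(f"1.25T over V3's 671B:   +{growth(1.25e12, V3_TOTAL):.0f}%")
    print(f"1.6T over V3's 671B:    +{growth(1.6e12, V3_TOTAL):.0f}%")
```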