DeepSeek's Third Release: Open-Source DeepGEMM

At 9 a.m. today, DeepSeek continued to deliver on its Open Source Week commitment by open-sourcing DeepGEMM. Once the announcement post was shared, it quickly drew 21,000 views, a sign of its popularity at home and abroad. DeepGEMM is an efficient general matrix multiplication (GEMM) library focused on FP8. It supports both ordinary GEMMs and grouped GEMMs for Mixture-of-Experts (MoE) models, and can dynamically optimize resource allocation to improve compute efficiency.
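To make the idea of a grouped GEMM for MoE concrete, here is a minimal pure-Python sketch. It is not DeepGEMM's API: the names `matmul`, `grouped_gemm`, `expert_ids` and `expert_weights` are hypothetical, the data is made up for illustration, and FP8 quantization is not modeled at all. It only shows the shape of the computation, in which each token is multiplied by the weight matrix of the expert it was routed to.

```python
def matmul(a, b):
    """Plain GEMM on nested lists: (m x k) @ (k x n) -> (m x n)."""
    k = len(b)
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def grouped_gemm(tokens, expert_ids, expert_weights):
    """Multiply each token row by the weight matrix of its assigned expert.

    Hypothetical helper for illustration only; a real library would run
    all groups on the GPU instead of looping in Python.
    """
    # Collect the row indices that belong to each expert.
    groups = {}
    for idx, expert in enumerate(expert_ids):
        groups.setdefault(expert, []).append(idx)

    out = [None] * len(tokens)
    for expert, rows in groups.items():
        block = [tokens[i] for i in rows]               # gather one group
        result = matmul(block, expert_weights[expert])  # one GEMM per group
        for i, row in zip(rows, result):                # scatter back in order
            out[i] = row
    return out


# Made-up example: three tokens routed to two experts.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
expert_ids = [0, 1, 0]
expert_weights = {
    0: [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    1: [[2.0, 0.0], [0.0, 2.0]],  # expert 1: scale by 2
}
outputs = grouped_gemm(tokens, expert_ids, expert_weights)
print(outputs)
```

An ordinary GEMM is the special case in which every token belongs to the same group.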

The library is developed in CUDA and uses a lightweight just-in-time (JIT) compilation module that compiles kernels dynamically at runtime, so no precompilation is needed at install time.

Notably, DeepGEMM is designed to provide simple, efficient low-level support for training and inference of the DeepSeek-V3/R1 models. It is optimized specifically for Hopper-architecture GPUs (such as the H800), balancing high performance with low cost.

As the third release of Open Source Week, DeepGEMM continues DeepSeek's strategy of open-sourcing models and tools (such as FlashMLA), further lowering the barrier to applying high-performance computing technology.

DeepGEMM is the third project of DeepSeek's "Open Source Week" (February 24-28), following the earlier releases of FlashMLA (an efficient decoding kernel) and DeepEP (an expert-parallel communication library).