On February 26, the third day of its Open Source Week, DeepSeek announced that it had open-sourced DeepGEMM, an efficient FP8 GEMM library. DeepSeek's releases over the past three days have all been algorithm-related and fairly technical.
Chen Ran, founder of the large-model ecosystem community OpenCSG (Open Expression), offered China Business News an analogy: "In the past, DeepSeek simply handed over a car and told everyone it had a range of 900 kilometers. Now DeepSeek is digging deeper to show how it manages to drive 900 kilometers." DeepSeek's models achieve better results thanks to a set of algorithms and corresponding frameworks, and open-sourcing this "scaffolding" helps build the future ecosystem.
As for the keywords in this release, GEMM (General Matrix Multiplication) is a basic operation in linear algebra, and FP8 GEMM is matrix multiplication performed with 8-bit floating-point numbers. FP8 is a low-precision floating-point format suited to deep learning and high-performance computing. It reduces memory usage and bandwidth requirements while maintaining high computing efficiency.
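As a rough illustration of the operation described above, the following sketch implements a general matrix multiplication, C = alpha * A * B + beta * C, in plain Python and compares how many bytes a matrix occupies at different precisions. The names gemm and matrix_bytes are invented for this example and are not part of DeepGEMM, which runs optimized kernels on the GPU rather than Python loops.

```python
# Illustrative only: a naive GEMM and a storage comparison.
# gemm() and matrix_bytes() are hypothetical helper names, not DeepGEMM APIs.

def gemm(a, b, c=None, alpha=1.0, beta=0.0):
    """Return alpha * (a @ b) + beta * c for list-of-lists matrices."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if c is None:
        c = [[0.0] * n for _ in range(m)]
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out

# Storage per element for common floating-point widths.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "fp8": 1}

def matrix_bytes(rows, cols, dtype):
    """Bytes needed to store a rows x cols matrix in the given format."""
    return rows * cols * BYTES_PER_ELEMENT[dtype]

if __name__ == "__main__":
    print(gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    for dtype in ("fp32", "fp16", "fp8"):
        print(dtype, matrix_bytes(4096, 4096, dtype), "bytes")
```

The storage comparison shows where the memory and bandwidth savings come from: an FP8 matrix takes half the bytes of its FP16 counterpart and a quarter of the FP32 one.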
According to DeepSeek, DeepGEMM supports GEMM operations for both traditional dense models and MoE (Mixture of Experts) models. The code provides efficient support for V3/R1 training and inference and runs on hardware based on the NVIDIA Hopper architecture (such as the H100 GPU).
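The difference between a dense GEMM and the grouped GEMM used by MoE layers can be sketched as follows. This is a simplified pure-Python illustration that assumes top-1 routing (each token goes to exactly one expert); the function names and the routing scheme are assumptions for the example and do not reflect DeepGEMM's actual interface or DeepSeek's routing.

```python
# Illustrative only: dense GEMM versus a grouped (per-expert) GEMM.
# All names are hypothetical; top-1 routing is an assumption of this sketch.

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def dense_forward(tokens, weight):
    """Dense layer: every token is multiplied by the same weight matrix."""
    return matmul(tokens, weight)

def moe_grouped_forward(tokens, expert_ids, expert_weights):
    """MoE layer: tokens are grouped by expert, one GEMM per group."""
    groups = {}
    for idx, expert in enumerate(expert_ids):
        groups.setdefault(expert, []).append(idx)
    out = [None] * len(tokens)
    for expert, idxs in groups.items():
        block = [tokens[i] for i in idxs]               # gather this expert's tokens
        result = matmul(block, expert_weights[expert])  # one GEMM per expert
        for i, row in zip(idxs, result):                # scatter back to token order
            out[i] = row
    return out

if __name__ == "__main__":
    tokens = [[1, 0], [0, 1], [1, 1]]
    experts = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
    print(moe_grouped_forward(tokens, [0, 1, 0], experts))
```

Because the groups generally differ in size, an MoE layer produces many GEMMs of uneven shape, which is why dedicated grouped-GEMM kernels matter for this class of model.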
DeepSeek said that the code base can reach 1350+ FP8 TFLOPS (trillion floating-point operations per second) on NVIDIA Hopper architecture GPUs, making full use of the available computing power. The design is also very simple: there is only one core kernel function, at about 300 lines of code, yet it outperforms expert-tuned kernels on most matrix sizes.
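To put the throughput figure in perspective, the arithmetic can be sketched as below. Multiplying an M x K matrix by a K x N matrix is conventionally counted as 2 * M * N * K floating-point operations (one multiply and one add per term). The 1350 TFLOPS value is the figure quoted by DeepSeek; the matrix size is an arbitrary example, and the resulting time is an idealized lower bound rather than a measurement.

```python
# Illustrative only: back-of-the-envelope throughput arithmetic for a GEMM.

def gemm_flops(m, n, k):
    """Floating-point operations in an (m x k) @ (k x n) product."""
    return 2 * m * n * k

def ideal_seconds(m, n, k, tflops):
    """Time the GEMM would take if the given TFLOPS rate were sustained."""
    return gemm_flops(m, n, k) / (tflops * 1e12)

def achieved_tflops(m, n, k, seconds):
    """Throughput implied by a measured runtime, in TFLOPS."""
    return gemm_flops(m, n, k) / seconds / 1e12

if __name__ == "__main__":
    m = n = k = 4096          # arbitrary example size
    t = ideal_seconds(m, n, k, tflops=1350)
    print(f"{gemm_flops(m, n, k):,} FLOPs, about {t * 1e3:.3f} ms at 1350 TFLOPS")
```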
What is the impact of open-sourcing DeepGEMM? The reporter put this question to DeepSeek, which responded that DeepGEMM addresses the pain points of computing efficiency and resource consumption in large models through FP8 and hardware-level optimization, and in particular provides key support for deploying MoE models. Open-sourcing it not only accelerates the democratization of the technology but may also make it part of the "infrastructure" of the AI computing ecosystem, pushing the industry in a more efficient, lower-cost direction.
FP8 is an emerging standard for AI computing. Its efficiency can accelerate the training of models with hundreds of billions of parameters and reduce GPU memory requirements. When models are deployed on edge devices or in the cloud, FP8's low-precision calculations can significantly improve throughput and reduce costs. Open-sourcing DeepGEMM can therefore promote adoption of the FP8 ecosystem, lower the barrier to entry for developers, encourage more frameworks and models to adapt to FP8, and accelerate the industry's migration to low-precision computing.
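The trade-off behind FP8 can be sketched in a few lines. The example below simulates rounding to the E4M3 variant of FP8 (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448) and estimates weight storage for a model of a given size. Saturating out-of-range values to the maximum is an assumption of this sketch, as are the function names; production FP8 pipelines typically also apply scaling factors, which are omitted here.

```python
# Illustrative only: simulated FP8 (E4M3) rounding and a weight-memory estimate.
# Names are hypothetical; saturation on overflow is an assumption of this sketch.
import math

E4M3_MAX = 448.0      # largest finite E4M3 value
E4M3_MIN_EXP = -6     # exponent of the smallest normal value
MANTISSA_BITS = 3

def quantize_e4m3(x):
    """Round x to the nearest value representable in E4M3."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)                  # saturate instead of overflowing
    _, e = math.frexp(mag)                       # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)               # clamp into the subnormal range
    step = 2.0 ** (exp - MANTISSA_BITS)          # spacing between neighbors
    return sign * round(mag / step) * step       # round() breaks ties to even

def weight_gigabytes(num_params, bytes_per_param):
    """Approximate weight storage in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

if __name__ == "__main__":
    for value in (0.3, 1.0, 3.14159, 1000.0):
        print(value, "->", quantize_e4m3(value))
    params = 100_000_000_000                     # example: 100 billion parameters
    print("FP16:", weight_gigabytes(params, 2), "GB")
    print("FP8: ", weight_gigabytes(params, 1), "GB")
```

With only three mantissa bits, each value keeps roughly one significant decimal digit, which is the precision given up in exchange for halving the memory footprint relative to FP16.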
In addition, MoE models are difficult to deploy because of their computational complexity. The open-source DeepGEMM provides an efficient reference implementation, which may lead to more MoE applications (such as multimodal models and efficient on-device models).
Regarding DeepSeek's three consecutive days of open-source releases, Chen Ran told reporters, "We are quite shocked by it." DeepSeek's ultimate goal is to show how its R1 and V3 models were made. He believes that the algorithms DeepSeek has released so far are "scaffolding" in a sense. "We must give everyone 'scaffolding' so that everyone can keep following the technology path based on DeepSeek, and ultimately the industry can build an ecosystem on top of it."
In the long run, Chen Ran believes DeepSeek's open-source initiative is very meaningful: it provides model standards, tool standards, and an ecosystem cornerstone, so that the ecosystem can grow.
Chen Ran judged that DeepSeek's open-sourcing of its code may affect practitioners working on the AI Infra layer. "DeepSeek basically provides the technology stack and the models. It lacks the data, but others may be able to reproduce the data. People in the AI Infra layer will have to find new directions." But he also said this kind of open source is a double-edged sword: those who make good use of what DeepSeek has open-sourced may benefit. "If you don't use it well, you will be hit."
Some practitioners also told reporters that what DeepSeek has open-sourced is inference acceleration at the Infra layer. Open-sourcing DeepSeek's underlying technology will have an impact on practitioners, but it may not be very large.
"DeepSeek's impact on the industry has just begun, and no one can guess the outcome," said the above-mentioned practitioner.
DeepSeek previously announced that it would open-source five code libraries in succession, so two more are due this week. "Every line of code shared will become collective momentum that accelerates the development of the AI industry," DeepSeek said in the announcement.