MLPerf's latest GPT large-model inference results are out, and this Chinese computing-power company once again ranks first in the world, with performance up to 1.8 times that of the NVIDIA H100. As AIGC applications such as ChatGPT set off a wave of large models, the computing-power layer, as infrastructure, was the first industry to benefit.
However, high compute demand and high cost have become common pain points for enterprises deploying large models, and they threaten to hold back AI's progress: model parameter counts grow by the day while the compute supply bottleneck looms, creating a sharp contradiction between the two.
Exploring better computing-power solutions for large models has therefore become the industry's focus.
Recently, the authoritative international benchmark MLPerf announced its latest inference results. This is the first time MLPerf has included a GPT large-model inference test. Participation set a new record, with more than 13,500 performance results submitted by NVIDIA, Intel, Google, Qualcomm, and other companies.
In MLPerf Inference 3.1, the Moffett AI S30 accelerator card ranked first in the world on the GPT-J large model (6 billion parameters) in single-card, 4-card, and 8-card configurations.
This is Moffett AI's third consecutive MLPerf title: it previously took first place in MLPerf Inference 2.0 and 2.1.
Moffett AI S30 accelerator card
Moffett AI's results point to a feasible, innovative direction for large-model computing-power solutions.
The results demonstrate that hardware-software co-design, which matches AI models to the computing platform, can unlock greater computing-power potential. They also confirm once again that innovative technologies represented by sparse computing will be key to computing power in the era of large models.
Moffett AI competes in MLPerf's open division, which, according to organizer MLCommons, is designed to encourage innovation: entrants may explore ways to improve computing power through software-hardware co-design.
On MLPerf's GPT-J large model, the 12nm-process Moffett AI S30 accelerator card achieved an advantage of up to 1.8x over the 4nm-process H100's pure-hardware acceleration solution, through its original dual-sparsity algorithms combined with hardware co-design.
The GPT-J model in this evaluation is a generative AI model. The Moffett AI S30 delivered 170.59, 91.57, and 23.28 samples/s in 8-card, 4-card, and single-card configurations respectively, reaching 1.6x, 1.8x, and 1.8x the performance of the NVIDIA H100 and demonstrating the product's capability on AIGC tasks.
Three consecutive championships: Moffett AI was among the first to deliver an answer on large-model computing power, and its software-hardware co-design continues to innovate. Its products have withstood MLPerf's rigorous tests several times while opening a new path for the development of large-model computing power.
01
Sparse computing, the "potential stock" of large models, gains market recognition
Moffett AI's strong results come chiefly from software-hardware co-design built on sparse algorithms.
In the era of large models, the importance of sparse computing is self-evident: a model's size is directly related to its sparsification potential.
In other words, the larger the model, the more sparsity its algorithms are likely to contain, and the greater the speedup sparse computing can deliver. For typical large language models, sparse computing can bring speedups of dozens of times.
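The intuition behind those speedups can be sketched in a few lines. The toy example below is purely illustrative and is not Moffett AI's actual algorithm: it applies magnitude pruning to a random weight matrix, keeping only 1 weight in 32 (the same "32x sparsity" ratio cited in this article), and counts the arithmetic a sparsity-aware kernel would need versus a dense one.

```python
import numpy as np

# Illustrative sketch only, not Moffett AI's actual algorithm:
# shows why keeping 1 weight in 32 ("32x sparsity") cuts arithmetic.
rng = np.random.default_rng(0)
n = 1024

weights = rng.standard_normal((n, n))

# Magnitude pruning: keep only the largest 1/32 of weights, zero the rest.
keep = n * n // 32
threshold = np.sort(np.abs(weights), axis=None)[-keep]
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

dense_flops = 2 * n * n                      # one multiply-add per weight
sparse_flops = 2 * np.count_nonzero(pruned)  # only nonzero weights do work
print(f"theoretical speedup: {dense_flops / sparse_flops:.1f}x")
```

In practice the realized speedup depends on hardware support for irregular memory access, which is exactly why sparsity-aware chips and software-hardware co-design matter.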
Moffett AI's original dual-sparsity algorithms, combined with software-hardware co-design, make its Antoum® chip the world's first AI chip with a high sparsity ratio, supporting up to 32x sparsity. This is the key to Moffett AI's record-breaking results in this round of MLPerf.
The larger the model, the more pronounced the advantage of sparse computing. With the parameters of large models such as GPT now routinely reaching tens or hundreds of billions, Moffett AI's moat grows ever stronger.
Moffett AI's product strength and the broader trend toward sparse computing have won industry recognition: the company's commercialization has achieved one breakthrough after another, helping enterprises accelerate their AI applications.
Most recently, Moffett AI officially became one of the vendors supported by ByteMLPerf.
Source: ByteMLPerf website
Project address: https://github.com/bytedance/ByteMLPerf/blob/main/README.md
The Moffett AI computing platform currently supports large models at different parameter scales, including BLOOM, OPT, GPT-J, LLaMA, and Stable Diffusion.
At the same time, it offers high throughput, low latency, and low power consumption, easing the computing-power crunch and bringing enterprises large-model compute that is genuinely "easy to use" and "affordable".
02
Sparse computing brings fundamental change to computing power and helps large models develop
Moffett AI's sparse-computing solution not only eases today's computing-power problems but also opens new room for the sustainable development of AI.
Sparse computing reduces the amount of computation an AI model requires, which means large models can grow their parameter counts by orders of magnitude without a corresponding explosion in compute. The contradiction between parameter growth and the computing-power bottleneck can thus be fundamentally addressed.
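A back-of-the-envelope sketch makes the point concrete. The numbers below are assumptions for illustration (the common rule of thumb of roughly 2 FLOPs per parameter per token for dense transformer inference, and a hypothetical 60-billion-parameter model), not vendor data: a model 10x larger than GPT-J can still need fewer FLOPs per token once 32x weight sparsity is applied.

```python
# Back-of-the-envelope sketch with assumed numbers, not vendor data.
# Rule of thumb: dense transformer inference costs ~2 FLOPs per parameter
# per generated token.
def flops_per_token(params, kept_fraction=1.0):
    """kept_fraction: share of weights kept (1.0 = dense, 1/32 = 32x sparse)."""
    return 2 * params * kept_fraction

dense_6b   = flops_per_token(6e9)          # GPT-J-sized model, dense
sparse_60b = flops_per_token(60e9, 1 / 32) # hypothetical 10x larger, 32x sparse

print(f"dense 6B params:   {dense_6b:.2e} FLOPs/token")
print(f"sparse 60B params: {sparse_60b:.2e} FLOPs/token")
```

Under these assumptions the 10x larger sparse model does less arithmetic per token than the small dense one, which is the sense in which sparsity can decouple parameter growth from compute growth.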
At the same time, the reduced computation eases the pain points of high compute demand, high power consumption, and high cost for large models, achieving a win-win.
Moffett AI Antoum chip: the world's first AI chip with a high sparsity ratio, supporting up to 32x sparsity
The excellent results across three consecutive MLPerf rounds not only prove the strength of Moffett AI's products but also offer the industry a new insight: with technologies such as sparse computing, the development and application of large models can look forward to much broader space, accelerating the spread of AIGC and other applications across all walks of life.
03
About MLPerf
MLPerf was initiated by Turing Award winner David Patterson together with Google and top academic institutions such as Stanford and Harvard. It is the most authoritative and influential international AI performance benchmark, tracking and evaluating rapidly growing AI computing requirements and performance in a timely manner.