MLCommons has officially announced the MLPerf Inference v3.1 benchmark results, which cover GPT-J, a 6-billion-parameter large language model, as well as computer vision and natural language processing models. Intel's CPUs and AI accelerators performed well and are quite competitive in AI inference.

The MLCommons AI training results and Hugging Face benchmark results disclosed in June already showed that the Intel Gaudi2 AI accelerator can surpass the NVIDIA H100 on advanced vision-language models, making it arguably the only viable alternative to the NVIDIA H100/A100. The latest results confirm this again.

On the GPT-J model (GPT-J-99 and GPT-J-99.9), the Intel Gaudi2 accelerator delivered 78.58 queries/second in the server scenario and 84.08 samples/second in the offline scenario.

Compared with competing products, the H100 holds only a 1.09x (server) and 1.28x (offline) performance advantage over Gaudi2, while Gaudi2 holds a 2.4x (server) and 2x (offline) advantage over the A100.
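Putting the two paragraphs above together, the implied absolute throughputs of the H100 and A100 can be back-calculated from the Gaudi2 figures. The sketch below is illustrative arithmetic only: the Gaudi2 numbers come from the article, while the derived H100 and A100 values are estimates inferred from the stated ratios, not measured MLPerf results.

```python
# Gaudi2 GPT-J throughput as reported in the article.
gaudi2 = {"server": 78.58, "offline": 84.08}  # queries/s, samples/s

# H100 is reported as 1.09x (server) and 1.28x (offline) faster than Gaudi2.
h100 = {
    "server": gaudi2["server"] * 1.09,
    "offline": gaudi2["offline"] * 1.28,
}

# Gaudi2 is reported as 2.4x (server) and 2x (offline) faster than A100,
# so divide to estimate the A100's throughput.
a100 = {
    "server": gaudi2["server"] / 2.4,
    "offline": gaudi2["offline"] / 2.0,
}

print(f"Implied H100: {h100['server']:.1f} server, {h100['offline']:.1f} offline")
print(f"Implied A100: {a100['server']:.1f} server, {a100['offline']:.1f} offline")
# → Implied H100: 85.7 server, 107.6 offline
# → Implied A100: 32.7 server, 42.0 offline
```

In other words, under these ratios the H100 leads Gaudi2 by a single-digit percentage in the server scenario, while Gaudi2 roughly doubles the A100's throughput in both scenarios.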

It is worth mentioning that the results Gaudi2 submitted use the FP8 data type while still reaching 99.9% accuracy.

Gaudi2's software is updated every 6-8 weeks and will continue to improve MLPerf benchmark performance and expand model coverage.

At the same time, Intel submitted seven inference benchmarks based on the fourth-generation Xeon Scalable processors (Sapphire Rapids), including the GPT-J model.

The results show that the fourth-generation Xeon performs very well on general AI workloads, including vision, language processing, and speech and audio translation models, as well as the larger DLRMv2 deep learning recommendation model and the GPT-J model.

As of now, Intel remains the only vendor to submit public CPU results using industry-standard deep learning ecosystem software.

According to the latest results, when using GPT-J to summarize a press release of approximately 1,000-1,500 words, the fourth-generation Xeon can complete two paragraphs per second in offline mode and one paragraph per second in real-time server mode.

Also, Intel submitted MLPerf results for the Xeon CPU Max processor for the first time. It integrates up to 64GB of HBM2e high-bandwidth memory and is the only CPU able to achieve 99.9% accuracy on GPT-J, making it well suited to applications with extremely high accuracy requirements.
