At yesterday's GTC conference, NVIDIA unveiled a number of AI systems and officially launched a new LPU chip, the Groq 3 LPU, a product of its $20 billion acquisition of Groq's technology last year. Unlike AI GPUs, which focus on heavy model training, the Groq 3 LPU is designed for AI inference and offers low latency and long-context support. It can be paired with Vera Rubin to cover the entire AI workflow.

The good news is that this LPU chip is also expected to reach China. Foreign media, citing sources, report that NVIDIA is preparing to launch a Groq chip that can be sold in the Chinese market.

Unlike previous GPU chips, which had to have their performance reduced before they could be exported, this time the Groq chip will not have cut-down specifications, nor will it be a China-specific variant like the H20.

With no cut-down specifications and no special edition, such Groq chips should face far less resistance when sold in the Chinese market. The biggest question, however, is whether they can pass U.S. review, which depends on how Jensen Huang persuades the U.S. President.

That said, even if NVIDIA does not make a China-specific Groq chip, it is hard to say whether the Groq chip eventually launched in China will be the Groq 3 LPU released yesterday, because the latter's performance and specifications are already very strong.

The individual chip is called the LPU30. It is manufactured by Samsung and integrates 500MB of SRAM cache and 98 billion transistors, with FP8 performance of 1.2 PFLOPS. Its AI computing performance is far below that of the Rubin GPU, but its 150TB/s bandwidth is much higher than the 22TB/s of HBM4.
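To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the variable names are illustrative, and treating the quoted FP8 rate and bandwidths as sustained peak values is an assumption.

```python
# Figures quoted in the article for a single LPU30 chip.
# Assumption: these are peak values that can be compared directly.
LPU30_FP8_FLOPS = 1.2e15   # 1.2 PFLOPS of FP8 compute
LPU30_SRAM_BW = 150e12     # 150 TB/s SRAM bandwidth, in bytes per second
HBM4_BW = 22e12            # 22 TB/s HBM4 bandwidth quoted for comparison

# How many times faster the SRAM is than HBM4, by the quoted numbers.
bandwidth_ratio = LPU30_SRAM_BW / HBM4_BW

# FP8 operations the chip can perform per byte it reads from SRAM.
flops_per_byte = LPU30_FP8_FLOPS / LPU30_SRAM_BW

print(f"SRAM vs HBM4 bandwidth: {bandwidth_ratio:.1f}x")
print(f"FP8 FLOPs per byte of SRAM traffic: {flops_per_byte:.0f}")
```

By these numbers the SRAM is a little under seven times faster than HBM4, and the chip has about 8 FP8 operations available for every byte it reads, which fits the article's picture of a part built around bandwidth rather than raw compute.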

The Groq 3 LPU will ship in the form of the Groq 3 LPX rack, which can integrate 256 LPU30 chips with a total cache capacity of 128GB. Total memory bandwidth rises to 40PB/s, and interconnect bandwidth reaches 640TB/s.
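The rack-level figures can be roughly cross-checked against the per-chip numbers. The short Python sketch below simply multiplies the quoted per-chip values by 256; the variable names are illustrative, and the use of decimal units (1GB = 1000MB, 1PB = 1000TB) is an assumption.

```python
# Per-chip figures quoted for the LPU30, and the chip count quoted for the rack.
CHIPS_PER_RACK = 256
SRAM_PER_CHIP_MB = 500     # 500 MB of SRAM per chip
BW_PER_CHIP_TBPS = 150     # 150 TB/s of SRAM bandwidth per chip

# Assumption: decimal units, so divide by 1000 to go from MB to GB and TB to PB.
rack_sram_gb = CHIPS_PER_RACK * SRAM_PER_CHIP_MB / 1000
rack_bw_pbps = CHIPS_PER_RACK * BW_PER_CHIP_TBPS / 1000

print(f"Rack SRAM capacity: {rack_sram_gb:.0f} GB")
print(f"Rack aggregate SRAM bandwidth: {rack_bw_pbps:.1f} PB/s")
```

The capacity matches the quoted 128GB exactly. The aggregate bandwidth works out to 38.4PB/s, which suggests the 40PB/s figure is rounded, although that reading is an assumption.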

Overall, the LPU cannot match the GPU in AI computing power, but its SRAM bandwidth clearly beats HBM4 and its latency is lower. The two chips suit different workloads, so it is not a question of one replacing the other. LPU sales look set to take off in the future.