As AI models continue to grow, HBM may be unable to meet future demand for GPU memory capacity, prompting the industry to view GPU-driven storage architectures as a potential next technological frontier. Last year, it was reported that Nvidia was working separately with SK Hynix and Kioxia to develop AI SSDs, using purpose-built SSDs in place of HBM as a GPU memory expander. In addition, this year SK Hynix partnered with SanDisk on HBF (High Bandwidth Flash), a next-generation memory solution for the AI inference era that targets the same problem.

Nvidia plans to let GPUs access storage directly, which is expected to accelerate HBF

According to TrendForce reports, Nvidia is advancing development of a GPU direct-access storage architecture, and plans to introduce it starting with the Vera Rubin platform and enable the GIDS (GPU-Initiated Direct Storage Access) function. Industry observers believe this change may accelerate the development of HBF.

GIDS differs from the existing GDS (GPU Direct Storage) function in which component initiates the access. In GDS, the CPU issues the data request to the storage device, and the data is then transferred to the GPU. In GIDS, the GPU itself issues the request and accesses the storage device directly, bypassing the CPU and system DRAM in between.
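The distinction can be sketched as a toy model of the two read paths. This is only an illustration of the description above, not Nvidia's implementation: the `Step` type, the function names, and the step wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    actor: str   # component that performs the step
    action: str  # what that component does


def gds_read_path() -> List[Step]:
    # GDS as described in the text: the CPU issues the request,
    # and the data then ends up on the GPU.
    return [
        Step("CPU", "issue read request to storage device"),
        Step("storage", "return requested data"),
        Step("GPU", "receive data"),
    ]


def gids_read_path() -> List[Step]:
    # GIDS as described in the text: the GPU issues the request itself.
    return [
        Step("GPU", "issue read request to storage device"),
        Step("storage", "return requested data"),
        Step("GPU", "receive data"),
    ]


def cpu_involved(path: List[Step]) -> bool:
    # True if any step in the path is performed by the CPU.
    return any(step.actor == "CPU" for step in path)


print("GDS involves CPU:", cpu_involved(gds_read_path()))
print("GIDS involves CPU:", cpu_involved(gids_read_path()))
```

The only difference between the two paths in this model is the actor of the first step, which is the point the comparison makes.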

Both GIDS and GDS aim to overcome data transfer bottlenecks in traditional computing architectures, and Microsoft and AMD are rumored to be exploring similar approaches. The core problem is that the traditional data transfer path is inefficient: the CPU is structurally limited in how many threads it can process, while the GPU can generate tens of thousands of parallel threads. In addition, data transfer between the GPU and HBM currently accounts for about half of total system power consumption, which strengthens the case for an HBF architecture that brings ultra-high-speed NAND flash closer to the GPU to cope with future AI bottlenecks.

The emergence of GIDS may allow NAND flash to play a larger role in AI storage systems while easing the capacity pressure on HBM. This shift requires higher-performance NAND flash that can keep up with GPU processing speeds. NAND flash's advantage is its bit density, about 30 times that of DRAM, which yields far greater storage capacity in a similar footprint.
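As a rough illustration of what a 30-fold bit density means, the arithmetic can be sketched as follows. The 16 GB DRAM figure is a hypothetical example rather than a product specification, and the calculation ignores packaging, controller, and over-provisioning overhead.

```python
# Ratio taken from the text: NAND bit density is "about 30 times" that of DRAM.
NAND_TO_DRAM_BIT_DENSITY = 30


def nand_capacity_gb(dram_capacity_gb, density_ratio=NAND_TO_DRAM_BIT_DENSITY):
    """Capacity NAND could offer in the footprint that holds dram_capacity_gb of DRAM."""
    return dram_capacity_gb * density_ratio


hypothetical_dram_gb = 16  # hypothetical example value, not a real part
print(nand_capacity_gb(hypothetical_dram_gb))  # 480
```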

However, NAND flash has limited write endurance, whereas DRAM can be rewritten almost without limit. HBF is therefore considered better suited to storing AI model parameters, because this data remains essentially unchanged during inference and is accessed as a read-only workload.
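Why a read-only workload suits flash can be shown with a back-of-the-envelope endurance estimate. This is a simplified sketch: the program/erase cycle count and the rewrite rates below are hypothetical values chosen for illustration, and the model assumes perfect wear leveling with no write amplification.

```python
def lifetime_days(pe_cycles, full_rewrites_per_day):
    """Days until the rated program/erase cycles are used up.

    Simplification: each full rewrite consumes exactly one cycle
    on every cell (no write amplification, perfect wear leveling).
    """
    if full_rewrites_per_day <= 0:
        raise ValueError("full_rewrites_per_day must be positive")
    return pe_cycles / full_rewrites_per_day


HYPOTHETICAL_PE_CYCLES = 3000  # hypothetical figure, not a vendor rating

# Model weights replaced about once a month (read-mostly workload).
weights_days = lifetime_days(HYPOTHETICAL_PE_CYCLES, 1 / 30)

# Scratch data fully rewritten 100 times a day (write-heavy workload).
scratch_days = lifetime_days(HYPOTHETICAL_PE_CYCLES, 100)

print("read-mostly lifetime (days):", round(weights_days))
print("write-heavy lifetime (days):", round(scratch_days))
```

Under these assumed numbers the read-mostly case outlasts the write-heavy case by several orders of magnitude, which is the reasoning behind using HBF for model parameters rather than frequently rewritten data.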