The troubles of the GPU king

As the core beneficiary of the "sell shovels in a gold rush" logic, NVIDIA's record-breaking results have become the pillar of confidence supporting the generative AI market. Behind the excellent numbers, however, lies a more critical issue: limited production capacity means NVIDIA cannot meet market demand for GPUs. In August, media reported that H100 orders were already backlogged into Q1 or even Q2 of next year.

According to estimates by GPU Utils, conservative calculations suggest that total potential orders for NVIDIA GPUs may exceed US$20 billion, and the supply gap for the flagship H100 GPU is as high as 430,000 units.

NVIDIA CEO Jensen Huang also said bluntly:

"Our current shipments are nowhere near meeting demand."

Huang's difficulty lies in two key technologies that have become chokepoints for NVIDIA: CoWoS packaging and HBM memory.

SK Hynix and TSMC are the players holding NVIDIA's chokepoints

The H100, launched in September last year, is the most advanced GPU in Nvidia's product matrix.

Compared with its predecessor, the A100, its price is roughly 1.5-2 times higher, but its performance has made a qualitative leap: inference is 3.5 times faster and training 2.3 times faster; with server-cluster computing, training speed can be up to 9 times faster. In LLM training, it can shorten what used to be a week of work to about 20 hours.
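As a quick back-of-the-envelope check of that last claim (a sketch only; the assumption that "one week" means 168 hours of cluster training is ours, not the article's):

```python
# Sanity check of the "one week -> 20 hours" claim using the 9x cluster
# speedup quoted above. The 168-hour baseline is an illustrative assumption.

baseline_hours = 7 * 24          # "one week" of training on the older setup
cluster_speedup = 9              # claimed H100 cluster training speedup

h100_hours = baseline_hours / cluster_speedup
print(f"{baseline_hours} h / {cluster_speedup}x = {h100_hours:.1f} h")  # ~18.7 h, roughly 20 hours
```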

An NVIDIA H100 is composed mainly of three parts: the central H100 logic die, three HBM stacks on each side of it, and the outermost TSMC 2.5D CoWoS packaging frame that ties them together.


Among the three components, the logic die is the simplest to supply. It is produced mainly at TSMC's Fab 18 in Tainan on the 4N process node (in effect an enhanced 5nm process). Because downstream demand from PCs, smartphones, and non-AI data center chips on 5nm-class processes is weak, TSMC's 5nm-class capacity utilization is currently below 70%, so supplying the logic die is not a problem.

NVIDIA's main supply gap comes from the six HBM (High Bandwidth Memory) stacks flanking the logic die, and from the CoWoS (Chip on Wafer on Substrate) packaging that connects the logic die and the HBM.

HBM is a DRAM memory product based on 3D stacking: multiple DRAM dies are stacked vertically and interconnected through through-silicon vias (TSVs) and micro-bumps, breaking through existing performance limits, greatly increasing storage capacity, and producing a memory array with higher bandwidth, wider bit width, lower power consumption, and a smaller footprint.
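A rough sketch of why the wide, stacked interface matters, assuming the HBM3 spec figures of a 1024-bit interface per stack and 6.4 Gbit/s per pin (actual products may run below the spec maximum, and the stack counts here are illustrative):

```python
# Rough HBM3 bandwidth estimate. Assumptions: 1024-bit interface per stack and
# 6.4 Gbit/s per pin (HBM3 spec maximum); real products may clock lower.

bits_per_stack = 1024            # interface width of one HBM3 stack
gbit_per_pin = 6.4               # per-pin data rate, Gbit/s

stack_bw_gb_s = bits_per_stack * gbit_per_pin / 8      # GB/s for one stack
print(f"one HBM3 stack: ~{stack_bw_gb_s:.0f} GB/s")    # ~819 GB/s

# Several stacks placed around the GPU die multiply this into the
# multi-terabyte-per-second aggregate bandwidth of accelerators like the H100.
for stacks in (5, 6):
    print(f"{stacks} stacks: ~{stack_bw_gb_s * stacks / 1000:.1f} TB/s")
```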

Memory chips are critical to GPU performance, especially for the high-performance GPUs used to train AI. Both inference and training are memory-intensive workloads: as model parameter counts grow exponentially, weights alone push the largest models toward the terabyte scale. The ability to store and retrieve training and inference data from memory therefore sets the upper limit of GPU performance. The more large AI models and applications there are, the better it is for HBM manufacturers.
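A minimal sketch of that scaling argument: weight storage alone grows linearly with parameter count (the parameter counts below are round illustrative numbers, not specific products):

```python
# Weight-only memory footprint: parameters x bytes per parameter.
# FP16 weights take 2 bytes each; optimizer state and activations during
# training multiply the real requirement several times over.

def weight_footprint_tb(num_params: float, bytes_per_param: int = 2) -> float:
    """Terabytes needed just to hold the weights."""
    return num_params * bytes_per_param / 1e12

for params in (175e9, 1e12):     # 175-billion and 1-trillion parameters, illustrative
    print(f"{params:.0e} params -> {weight_footprint_tb(params):.2f} TB of weights (FP16)")
```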

Looking at the overall HBM market, the two Korean memory giants SK Hynix and Samsung dominate, with a combined market share of around 90%.


The HBM3 used on the NVIDIA H100, currently the most advanced HBM product, is supplied exclusively by SK Hynix.

HBM3 involves complex processes, high costs, and limited production capacity; in 2022 it accounted for only about 8% of the overall HBM market. As the only company in the world capable of mass-producing HBM3, SK Hynix firmly holds the H100's chokepoint, while the previous-generation A100/A800 and AMD's MI200 use the older HBM2E.

However, the memory chip industry is now upgrading from HBM2E to HBM3. According to TrendForce, HBM3's market share is expected to exceed 60% by 2024, and memory makers such as Samsung and Micron are actively positioning themselves and eyeing SK Hynix's share.


Advanced packaging is the technology that complements HBM memory: to use an HBM stack, advanced packaging is needed to connect the memory and the GPU.

The TSMC CoWoS advanced packaging used on the H100 is a 2.5D packaging technology.

The mainstream 2D packaging solution mounts all chips and passive components side by side on the surface of the substrate, like a flat jigsaw puzzle.


2.5D advanced packaging, by comparison, is like building blocks laid side by side on a shared base (the silicon interposer). The HBM stack of multiple DRAM dies can only be realized with advanced packaging.



TSMC's CoWoS advanced packaging combines CoW and oS: first, the chips are attached to a silicon interposer wafer through the Chip on Wafer (CoW) process, and the resulting CoW assembly is then attached to the substrate (on Substrate) to form CoWoS.


CoWoS greatly improves interconnect density and data transmission bandwidth while shrinking the package, but the process is also very complex, so it is used mainly in the high-end market.

According to media reports, TSMC's current CoWoS packaging capacity is about 8,000 wafers per month, expected to rise to 11,000 wafers per month by the end of this year and to roughly 14,500-16,600 wafers per month by the end of 2024. In other words, it will take almost a year and a half for capacity to double.
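A quick check of that doubling claim, using only the figures quoted above:

```python
# Capacity-ramp check: current vs projected end-of-2024 CoWoS wafer capacity
# (all figures from the media reports quoted in the text).

current = 8_000                          # wafers per month today
end_2024 = (14_500, 16_600)              # projected wafers per month, end of 2024

low, high = (x / current for x in end_2024)
print(f"growth over ~1.5 years: {low:.2f}x to {high:.2f}x")   # ~1.8x-2.1x, i.e. roughly a doubling
```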

As Moore's Law reaches its limits, advanced packaging will become mainstream

Solutions like HBM, in which multiple chips are stacked and then bonded together through advanced packaging, have become the mainstream design idea for high-end chips in the current market.

The reason is simple: advanced processes have now iterated to 7nm, 5nm, and 3nm; technology nodes keep shrinking, production technology and manufacturing processes grow ever more complex, and the capital required for integrated circuit manufacturing equipment keeps climbing.

Take 5nm and smaller processes as an example. At these nodes, wavelength limits mean ordinary (DUV) lithography machines can no longer deliver the required accuracy, and companies must turn to expensive EUV lithography machines, each costing up to 1.4 billion yuan.

Together with etching, thin-film deposition, and other equipment, total equipment spending for the 5nm process can reach US$3.1 billion, more than twice that of 14nm and about four times that of 28nm.

To remain cost-effective, chip manufacturers have had to find another path: shifting from improving transistor density and performance purely through process scaling to system-level chip design.

On the other hand, global computing demand has exploded over the past 10 years, exceeding the total of the previous 40. With growing demand from consumer electronics and automotive chips, even if manufacturing processes reach the theoretical physical limit of Moore's Law (1nm), they still will not satisfy the needs of future industrial applications.

Advanced packaging, because it can improve product performance and reduce costs at the same time, has become a key solution for the post-Moore era.

The huge demand generated by generative AI is already accelerating the iteration from traditional packaging to advanced packaging.

Morgan Stanley has pointed out that the AI wave is driving large-scale adoption of 2.5D and 3D advanced packaging; by 2030, advanced packaging will account for more than 60% of the entire packaging market.


According to estimates by Future Market Insights, the advanced packaging market, currently worth approximately US$31 billion, will keep expanding at a CAGR of 7.2% over the next ten years.

Morgan Stanley analysts also note that, because AI chip growth has exceeded expectations, 3D/2.5D advanced packaging is expected to grow extremely fast, with a CAGR of about 22% from 2021 to 2028.
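Applying the standard compound-growth formula to the two forecasts above gives a feel for their magnitude (a sketch; the forecasts themselves are the analysts', only the arithmetic is ours):

```python
# Compound growth: future_value = present_value * (1 + CAGR) ** years.

def project(value: float, cagr: float, years: int) -> float:
    return value * (1 + cagr) ** years

# Future Market Insights: ~$31B advanced packaging market at 7.2% CAGR for 10 years
print(f"market in 10 years: ~${project(31, 0.072, 10):.0f}B")    # ~$62B

# Morgan Stanley: 22% CAGR for 3D/2.5D packaging from 2021 to 2028 (7 years)
print(f"2.5D/3D growth 2021-2028: ~{project(1, 0.22, 7):.1f}x")  # ~4.0x
```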


The manufacturers holding NVIDIA's chokepoints have made a fortune

The two leaders in HBM memory and advanced packaging, SK Hynix and TSMC, are already reaping the rewards.

TrendForce data shows that although memory chip shipments and average selling prices have declined amid the consumer electronics downturn, HBM products have bucked the trend and kept growing, with prices rising all the way.

Media reports say that HBM orders at the two major memory makers, Samsung and SK Hynix, have grown rapidly since the beginning of 2023, and the price of HBM3, supplied exclusively by SK Hynix, has risen fivefold. As a high-margin product whose unit price is far above other memory specifications, HBM3 is enormously profitable. TrendForce predicts that, driven by the AI wave, overall HBM revenue will reach US$8.9 billion in 2024, an annual increase of 127%.

At the same time, with strong sales of the NVIDIA H100 and AMD MI300, TSMC's advanced packaging is also in short supply.

Morgan Stanley analysts said:

Based on our foundry supply chain checks, a single CoWoS-S wafer (and associated processes) sells for $6,000-$12,000, depending on customer/project size and design complexity. According to information disclosed at TSMC's Q2 earnings call, 6-7% of total 2023 revenue is expected to come from advanced packaging and testing.

We estimate that CoWoS may contribute approximately US$1 billion in revenue to TSMC this year. Given TSMC's continued CoWoS capacity expansion (capacity will double in 2024, according to data provided at TSMC's Q2 earnings call) and the current strong demand for AI chips, this number is likely to grow further. We therefore expect the CAGR of TSMC's CoWoS revenue to reach 40% from 2023 to 2027.
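The wafer prices and capacity figures quoted earlier allow a rough cross-check of that US$1 billion estimate (a sketch with a simplifying full-year-at-quoted-capacity assumption):

```python
# Cross-check of the ~$1B CoWoS revenue estimate: monthly wafer capacity
# times per-wafer price, annualized. Figures are the ones quoted in the text;
# running a full year at either capacity level is a simplifying assumption.

monthly_capacity = (8_000, 11_000)       # wafers/month, current vs end of 2023
price_per_wafer = (6_000, 12_000)        # USD per CoWoS-S wafer

low = monthly_capacity[0] * 12 * price_per_wafer[0] / 1e9
high = monthly_capacity[1] * 12 * price_per_wafer[1] / 1e9
print(f"implied annual CoWoS revenue: ${low:.1f}B to ${high:.1f}B")  # ~$0.6B-$1.6B, bracketing the ~$1B estimate
```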
