According to a report on February 24, although investors have at times had doubts, large technology companies, governments and venture capital firms are pouring money into artificial intelligence at an unprecedented rate. Understanding why requires a look at how the technology itself is evolving.

Artificial intelligence is currently shifting from traditional large language models to reasoning models and AI agents. Training a traditional large language model, the kind behind most free AI chatbots, consumes huge amounts of power and computing time. However, as the technology advances, the industry is quickly finding ways to reduce the resources required to run these models when users invoke them. In contrast, running a reasoning model, which is built on top of a large language model, consumes many times more computing power and electricity than running a traditional model.

Since OpenAI released its first reasoning model, o1, in September 2024, artificial intelligence companies have accelerated the launch of systems that can compete with it. These include DeepSeek R1, which shook up the entire artificial intelligence industry and sent valuations tumbling across many technology and energy companies. Last week, Elon Musk’s artificial intelligence startup xAI also launched its reasoning model, Grok 3.

DeepSeek's launch caused a degree of panic because it demonstrated that artificial intelligence models can be trained more cheaply, potentially reducing the need for data centers and expensive advanced chips. In practice, however, DeepSeek pushed the artificial intelligence industry more firmly toward resource-intensive reasoning models, which means that demand for computing infrastructure remains very strong.

Given their greater capabilities, reasoning models may soon become the default way people use artificial intelligence for a wide variety of tasks. OpenAI CEO Sam Altman has said that the next major upgrade to the company's artificial intelligence models will include advanced reasoning capabilities.

So why do reasoning models and related products, such as “deep research” tools and AI agents, require so many computing resources? The answer lies in how they work.

Kari Briski, vice president of product management for artificial intelligence at Nvidia, explained in a recent blog post that reasoning models usually consume more than 100 times the computing resources of traditional large language models. This is because a reasoning model has to talk to itself at length in a "chain of thought," and much of this reasoning is never shown to the user. The computing resources a model consumes are proportional to the number of tokens (words or word fragments) it generates, so if a reasoning model generates 100 times as many tokens as a conventional model, it will consume roughly 100 times the power and computing resources.
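The proportionality described above can be sketched in a few lines of Python. This is a minimal illustration, assuming compute scales linearly with the total number of tokens generated; the function name and the token counts are hypothetical and were chosen only to reproduce the 100-fold ratio, not taken from Nvidia's post.

```python
def relative_compute(answer_tokens: int, hidden_reasoning_tokens: int = 0) -> float:
    """Compute cost relative to a model that emits only the visible answer.

    Assumption: cost scales linearly with the total number of tokens generated.
    """
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive")
    total_tokens = answer_tokens + hidden_reasoning_tokens
    return total_tokens / answer_tokens


# Hypothetical token counts, for illustration only.
conventional = relative_compute(answer_tokens=200)
reasoning = relative_compute(answer_tokens=200, hidden_reasoning_tokens=19_800)

print(f"conventional model: {conventional:.0f}x, reasoning model: {reasoning:.0f}x")
```

In this sketch the user sees the same 200-token answer in both cases; the extra cost comes entirely from the reasoning tokens that stay hidden.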

When reasoning models are connected to the Internet, as the "deep research" models from Google, OpenAI and Perplexity are, resource consumption is even greater. And these models' demands on computing resources are only the beginning. Against this backdrop, Google, Microsoft and Meta plan to invest a combined total of at least US$215 billion in capital expenditures in 2025, most of which will go toward building artificial intelligence data centers. That is a 45% increase over their capital expenditures last year.

In January this year, with the release of the Chinese AI model DeepSeek R1, the cost of computing power per token (including electricity and hardware expenses) appeared set to fall off a cliff. DeepSeek showed in published papers that it could train and deploy its AI models far more efficiently than the methods previously disclosed by US AI labs.

On the surface, this seems to indicate that artificial intelligence's future demand for computing resources will fall significantly, perhaps to only one-tenth of current demand, or even less. But as reasoning models become widespread, the computing resources they need to answer queries are likely to rise sharply. In short, if new efficient models based on DeepSeek's techniques cut the computing power needed per query to one-tenth, while the popularity of reasoning models raises usage 100-fold, overall demand for computing power will still grow 10-fold.
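The arithmetic behind that estimate can be written out as a short Python sketch. It assumes the two effects simply multiply, which is how the article frames them; the function and parameter names are illustrative.

```python
def net_demand_multiplier(efficiency_gain: float, usage_growth: float) -> float:
    """Net change in total compute demand.

    efficiency_gain: factor by which compute per unit of work falls (10 means one-tenth).
    usage_growth: factor by which the amount of work requested rises.
    Assumption: the two effects are independent and multiply.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return usage_growth / efficiency_gain


# Figures from the article: compute needs cut to one-tenth, usage up 100-fold.
multiplier = net_demand_multiplier(efficiency_gain=10.0, usage_growth=100.0)
print(f"Net compute demand: {multiplier:.0f}x today's level")
```

With these inputs the result is a 10-fold increase; demand would shrink only if the efficiency gain exceeded the growth in usage.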

And this is just the starting point. As companies discover that new AI models are more capable, they will call on those models more and more often, shifting the demand for computing resources from training models to using them, which the AI industry calls "inference."

Tuhin Srivastava, CEO of Baseten, which provides artificial intelligence computing resources to other companies, said this shift toward inference is already underway. His clients include tech companies that use artificial intelligence in their apps and services, such as Descript, which lets content creators edit audio and video through transcripts, and PicnicHealth, a startup that processes medical records.

Srivastava said that as demand for his customers' own products grew rapidly, they found they needed more artificial intelligence processing power. He added: “Six months ago, we helped a customer reduce their computing resource requirements by 60%, but just three months later, their computing power consumption had exceeded the original level.”

Companies such as OpenAI, Google and Meta are still racing to train more capable AI models. Whatever the cost, their goal is to seize as much of the nascent artificial intelligence market as possible. “I think it’s likely that cutting-edge labs will need to continue to invest huge sums of money to advance cutting-edge technologies,” said Chris Taylor, CEO of Fractional AI. His company, like Baseten and many others in the booming AI ecosystem, relies on these cutting-edge models to serve its customers.

Venture capitalist and Theory Ventures founder Tomasz Tunguz predicts that over the next few years, new innovations and more AI-specific microchips could make artificial intelligence systems as much as a thousand times more efficient than they are today. Investors and big tech companies are betting that demand for artificial intelligence models is likely to grow dramatically over the next decade, driven by the popularity and rapid adoption of reasoning models.

"Every keystroke you make, or every syllable you speak into a microphone, every operating node will be processed in real time by at least one AI system," Tunguz said. If that were the case, he added, the AI market could soon be 1,000 times larger than it is now.