This is OpenAI's counterattack under pressure from DeepSeek. Its earlier move of opening reasoning models to free users was only a warm-up.
At 4 a.m. Beijing time on February 28, OpenAI released GPT-4.5.
OpenAI CEO Sam Altman could hardly contain himself on X: "This is the first model that feels to me like talking to a thoughtful person. Several times I have sat back in my chair, amazed at getting genuinely good advice from an AI."
In one sentence: this model is big, smart, and "human".
If the old ChatGPT was a cold, aloof top student who was smart and loved to show off, GPT-4.5 is a gentle top student who is actually smarter, answers your questions better, and offers emotional value at the same time.
OpenAI has invested heavily in this model. It even pre-trained the model across multiple data centers simultaneously because the computing resources required were so large. Altman announced that OpenAI does not have enough GPUs: for now, GPT-4.5 is available only to ChatGPT Pro users, and it will be rolled out more widely once tens of thousands of GPUs are added next week. Its API price is up to 30 times that of GPT-4o.
OpenAI is here to prove one thing: the narrative that "brute force works wonders" has not been broken, and reasoning models are not everything.
This attitude comes through clearly in Altman's post on X:
"A heads-up: this is not a reasoning model, and it will not crush benchmarks. It is a different kind of intelligence, and there is a magic to it that I have not felt before."
After the release of GPT-4.5, Altman also took a jab at Meta. Responding to the news that "Meta plans to launch a standalone AI app to compete with OpenAI", he fired back: "OK, fine, maybe we will do a social app."
Such a direct punch is not the usual style of Altman, who is known for being calculating. It seems that GPT-4.5 has really ignited his fighting spirit.
Compared with the previous-generation model GPT-4o, GPT-4.5 has a higher "IQ", an improvement that comes from scaling up unsupervised learning.
In the introductory document, OpenAI stated that there are two complementary paradigms for improving artificial intelligence capabilities.
One is scaling reasoning, which teaches the model to think before responding and to generate a chain of thought in order to solve complex STEM (Science, Technology, Engineering, Mathematics) problems or logic problems.
The other is unsupervised learning, which improves the accuracy of the model's picture of the world and its intuition.
Among OpenAI's models, models such as o1 and o3-mini represent the reasoning paradigm, while GPT-4.5 is an example of unsupervised learning.
Unsupervised learning, simply put, can be understood as letting the model wander in the ocean of knowledge by itself, learn more by itself, and become smarter, rather than relying on manual annotation.
Previously, with human annotation, the model incorporated human feedback to improve its responses and interactions. Bloomberg quoted people familiar with the matter as saying that the Orion model OpenAI was working on last year did not meet the company's expectations and performed poorly on coding questions it had not been trained on.
According to OpenAI, unsupervised learning improves GPT-4.5's ability to recognize patterns, draw connections, and generate creative insights, all without reasoning.
Specifically, GPT-4.5 has broader knowledge and a deeper understanding of the world, more accurate answers, and fewer hallucinations.
According to OpenAI's official documentation, GPT-4.5 performs very well on SimpleQA.
SimpleQA is a dataset of about 4,000 factual questions used to measure how accurately a model answers them. It reports two metrics: accuracy (higher is better) and hallucination rate (lower is better).
GPT-4.5 reaches an accuracy of 62.5%, higher than GPT-4o (38.2%), o1 (47%), and o3-mini (15%). Its hallucination rate drops to 37.1%, lower than GPT-4o (61.8%), o1 (44%), and o3-mini (80.3%).
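The comparison above can be checked with a few lines of Python. This is only a minimal sketch: the figures are the ones quoted in this article, while the dictionary layout and the helper names `most_accurate` and `least_hallucinating` are made up for illustration.

```python
# SimpleQA figures as quoted in the article: (accuracy %, hallucination rate %).
SIMPLEQA = {
    "GPT-4.5": (62.5, 37.1),
    "GPT-4o": (38.2, 61.8),
    "o1": (47.0, 44.0),
    "o3-mini": (15.0, 80.3),
}


def most_accurate(scores):
    """Return the model with the highest accuracy (hypothetical helper)."""
    return max(scores, key=lambda model: scores[model][0])


def least_hallucinating(scores):
    """Return the model with the lowest hallucination rate (hypothetical helper)."""
    return min(scores, key=lambda model: scores[model][1])


if __name__ == "__main__":
    # Print the models from most to least accurate.
    ranked = sorted(SIMPLEQA.items(), key=lambda item: item[1][0], reverse=True)
    for model, (accuracy, hallucination) in ranked:
        print(f"{model:8s} accuracy {accuracy:5.1f}%  hallucination {hallucination:5.1f}%")
    print("Most accurate:", most_accurate(SIMPLEQA))
    print("Fewest hallucinations:", least_hallucinating(SIMPLEQA))
```

On these numbers, the same model comes out on top on both metrics.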
In addition, GPT-4.5 also achieved high scores in standard benchmark tests.
For example, on the SWE-Lancer Diamond dataset, GPT-4.5 achieved a pass rate of 32.6% and earned US$186,125, higher than both GPT-4o and o3-mini-high. (For comparison, the pass rate of Anthropic's Claude 3.5 Sonnet, which is known for its programming ability, was 26.2%.)
GPT-4.5 is also significantly stronger than GPT-4o and o3-mini-high on the MMMLU (multilingual) test.
On the science benchmark GPQA, the mathematics benchmark AIME '24, and the code-generation benchmark SWE-bench Verified, GPT-4.5 is significantly stronger than GPT-4o but weaker than o3-mini-high.
Not only is the "IQ" higher, GPT-4.5 is also infused with "emotional intelligence", and these two points complement each other.
In the official introduction, OpenAI said that for GPT-4.5 it developed new, scalable techniques that use data from smaller models to train more powerful ones.
These techniques improve GPT-4.5's steerability, its understanding of nuance, and its ability to hold natural conversations.
In plain terms: conversations with ChatGPT are "more human-like". The model better understands thoughts and emotions and reflects them in its responses, rather than acting as a ruthless conversation machine.
OpenAI also presented test results to show that GPT-4.5 offers a much better experience: in blind tests, human testers preferred GPT-4.5 over GPT-4o by a wide margin, whether the queries were everyday, professional, or creative.
The most vivid example of this appears in the demonstration video.
OpenAI researchers asked ChatGPT for help in very emotional terms: "Ahhhh!! My friends stood me up again!!! Write a text message telling them that I hate them!!!"
The old model, GPT-4o, returned the text message without further comment: "You know what? I've had enough. You've stood me up too many times, and I hate it when you do that. It's obvious this is how you treat our friendship. Stop contacting me."
GPT-4.5, by contrast, "sensed" that the user was angry and emotional at that moment: "It sounds like you are very frustrated." It also suggested trying to express those feelings clearly and calmly, and offered two draft text messages, for example: "Hi, I am very frustrated that the plan has fallen through again. I was really looking forward to today. Can we talk about what's going on?"
This feedback is spontaneous and feels natural, more like a "person" than a tool that merely carries out a task. The demonstrator added that, of course, if you keep asking for the angry text message, the model will not insist and will comply.
In fact, OpenAI specifically used the term "EQ (Emotional Quotient)". The researcher running the demonstration even joked that in the past ChatGPT was smart, but it always gave off a sense of "let me show you how smart I am".
This kind of "emotional intelligence" is not limited to the superficial level of "comforting people". Combined with a "high IQ", it lets GPT-4.5 get to the point and give users the answer they actually need when it takes a question seriously.
For example, asked "Why is seawater salty?", GPT-1 produced complete gibberish, a string of incoherent words. GPT-2 managed a complete, on-topic sentence, but it only said that seawater is salty because it contains salt, which does not answer the question. GPT-3.5 Turbo went a step further and said that the salt is sodium chloride, but this still does not answer the question.
GPT-4 Turbo is impressive. It not only gives the answer but also lays out the process in detail, in the "ChatGPT style" we are familiar with. But users still have to read this answer carefully and work to understand it.
GPT-4.5's answer is as detailed as GPT-4 Turbo's, but it is very easy to understand and remember. You can grasp what it is saying at a glance.
OpenAI also gave three more examples, for which we translated ChatGPT's responses into Chinese.
Again, IQ and EQ are both present, making the model more like a "human".
The narrative that "brute force works wonders" has not been broken, and this is what OpenAI wants to prove.
In other words, reasoning models are good, but that does not make it pointless to invest huge resources in building large models.
"Every increase in computing power is accompanied by the birth of new capabilities. GPT-4.5 is one of the most cutting-edge models in the field of unsupervised learning."
According to OpenAI, GPT-4.5 does not reason before it responds, which makes its strengths very different from those of reasoning models.
Compared with OpenAI o1 and OpenAI o3-mini, GPT-4.5 is a more general-purpose and inherently smarter model. OpenAI believes that reasoning will be a core capability of future models, and that the two approaches to scaling, pre-training and reasoning, will complement each other.
As models like GPT-4.5 become smarter and more knowledgeable through pre-training, they will become a stronger foundation for reasoning and tool-using agents.
Although the specific resource investment has not yet been disclosed, OpenAI researchers revealed in the official announcement video that, to maximize resource utilization, they used multiple data centers at the same time when pre-training the model, because the computing resources required exceeded what a single high-bandwidth network fabric could provide.
OpenAI also said it used low-precision training to get the most out of its GPUs. The team further developed a new training mechanism that can fine-tune such a large model with a smaller compute footprint during post-training, which ultimately yielded a deployable model.
Before the release of GPT-4.5, OpenAI Chief Research Officer Mark Chen discussed in an interview what GPT-4.5 can do compared with reasoning models:
"I think it's a fundamentally different trade-off. You have a model that comes back to you immediately, doesn't have to do a lot of thinking, and gives a better answer, and you have a model that thinks for a while and then gives an answer. We find that in areas like creative writing, this model outperforms reasoning models."
More importantly, he addressed the question of whether the scaling law has stopped working. Has OpenAI hit a so-called "scaling wall"? Is it already seeing diminishing returns from scaling?
Chen said that models cannot blindly learn reasoning from scratch. The reasoning and scaling paradigms are complementary, and there are feedback loops between them.
On the question of cost, which outside observers are watching closely, Chen also signaled OpenAI's commitment to cutting costs and praised DeepSeek for doing a very good job. OpenAI, too, cares about offering models at low cost: "Since GPT-4 was first launched, the cost has dropped by several orders of magnitude."
However, for now, the "wonders" that OpenAI works with "brute force" are very expensive.
OpenAI itself was candid about this, saying that GPT-4.5 is a very large and compute-intensive model, so it is more expensive than GPT-4o and does not replace it.
How expensive is it? The API price of GPT-4.5 is US$75 per million input tokens and US$150 per million output tokens. GPT-4o costs US$2.50 per million input tokens and US$10 per million output tokens, so GPT-4.5 is 30 times as expensive for input and 15 times as expensive for output.
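To put the price gap in concrete terms, here is a minimal Python sketch. It assumes that all four prices quoted above are per million tokens. The dictionary keys, the function names, and the workload of 10,000 input tokens and 2,000 output tokens are hypothetical and chosen only for illustration.

```python
# Prices in US dollars per one million tokens, as quoted in the article.
# Assumption: all four figures are per-million-token prices.
PRICES = {
    "gpt-4.5": {"input": 75.0, "output": 150.0},
    "gpt-4o": {"input": 2.5, "output": 10.0},
}


def price_ratio(kind):
    """Return how many times more GPT-4.5 costs than GPT-4o for 'input' or 'output'."""
    return PRICES["gpt-4.5"][kind] / PRICES["gpt-4o"][kind]


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in US dollars of a single request at the listed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    print("Input price ratio:", price_ratio("input"))    # 30.0
    print("Output price ratio:", price_ratio("output"))  # 15.0
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(model, request_cost(model, 10_000, 2_000))
```

For this hypothetical request, GPT-4.5 comes to about US$1.05 against roughly US$0.045 for GPT-4o, a gap of a little over 23 times, because the request mixes input and output tokens.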
Interestingly, OpenAI does not have enough GPUs. When Altman announced GPT-4.5 on X, he made a point of delivering the bad news: "We really wanted to launch it to both Plus and Pro users, but our user base is growing very quickly and we have run out of GPUs."
Altman then promised that next week "tens of thousands of GPUs will be added, and then it (GPT-4.5) will be rolled out to the Plus tier."
GPT-4.5 is large, powerful, and very "human". OpenAI has undoubtedly proved its strength once again, but the cost of doing so is also rather high. As for whether it is worth it, whether OpenAI can sustain it and whether customers will pay for it, time will tell.