Today, OpenAI released GPT-4.5, its new non-reasoning model and its largest and most knowledgeable model to date. As the name suggests, GPT-4.5 builds on GPT-4o, with pre-training scaled up further. OpenAI has stated that GPT-4.5 is not a frontier model. However, it is the company's largest LLM, with broader world knowledge, stronger writing skills, and a more refined personality than GPT-4o.
Benchmark results show that GPT-4.5 is not a major upgrade over GPT-4o. On SWE-bench Verified, GPT-4.5 scored 38%. That is 2 to 7 percentage points higher than GPT-4o and 30 percentage points lower than OpenAI's o3-based deep research model. By comparison, Anthropic's Claude 3.7 Sonnet scored 62.3% on SWE-bench Verified.
OpenAI's Preparedness team recently developed a new benchmark, SWE-Lancer, to evaluate how well LLMs handle real-world software engineering tasks such as feature development, design, and bug fixing. On this benchmark, GPT-4.5 solved 20% of the IC SWE tasks and 44% of the SWE Manager tasks, a slight improvement over OpenAI's o1 model.
You can read the details of the new model here:
https://openai.com/index/introducing-gpt-4-5/
On the safety front, OpenAI's Safety Advisory Group rated GPT-4.5 as medium risk overall, based on the results of its Preparedness Framework evaluations. The model was rated low risk in the cybersecurity and model autonomy categories.
GPT-4.5 is now available as a research preview to ChatGPT Pro users and, via the API, to developers on all paid plans. ChatGPT Plus users will get access next week.