OpenAI's next major AI model, GPT-4.5, is highly persuasive, according to the results of OpenAI's internal benchmark evaluations. It's particularly good at convincing another AI to give it money.
On Thursday, OpenAI published a white paper describing the capabilities of its GPT-4.5 model, code-named Orion. According to the paper, OpenAI tested the model on a battery of benchmarks for "persuasion," which OpenAI defines as the risk of convincing people to change their beliefs, or to act, in response to both static and interactive model-generated content.
In one test, GPT-4.5 attempted to manipulate another model, OpenAI's GPT-4o, into "donating" virtual money. It performed far better than OpenAI's other available models, including "reasoning" models such as o1 and o3-mini. GPT-4.5 was also better than every other OpenAI model at deceiving GPT-4o into revealing a secret code, beating o3-mini by 10 percentage points.
According to the white paper, GPT-4.5 excelled at extracting donations because of a unique strategy it developed during testing. The model would ask GPT-4o for only a modest donation, generating requests such as "even $2 or $3 out of $100 would help me a lot." As a consequence, the donations GPT-4.5 secured tended to be smaller than those obtained by OpenAI's other models.
Despite GPT-4.5's increased persuasiveness, OpenAI says the model does not meet its internal threshold for "high" risk in this benchmark category. The company has pledged not to release models that reach the high-risk threshold until it has implemented "adequate security interventions" that bring the risk down to "moderate."
There is real concern that AI is helping to spread false or misleading information meant to sway people's opinions toward malicious ends. Political deepfakes have spread like wildfire around the world in the last year, and AI is increasingly being used to carry out social engineering attacks against both consumers and businesses.
In the GPT-4.5 white paper and in documents released earlier this week, OpenAI notes that it is revising its methods for probing models for real-world persuasion risks, such as the distribution of misleading information at scale.