OpenAI’s latest AI models have new protections against biological risks

OpenAI says it has deployed a new system that monitors its latest AI reasoning models, o3 and o4-mini, for prompts related to biological and chemical threats. According to OpenAI’s safety report, the system is designed to prevent these models from offering advice that could instruct someone on carrying out potentially harmful attacks.

OpenAI says o3 and o4-mini represent a meaningful capability increase over its previous models, and therefore pose new risks in the hands of malicious actors. According to OpenAI's internal benchmarks, o3 is notably more skilled at answering questions about creating certain types of biological threats. For this reason, and to mitigate other risks, OpenAI created a new monitoring system, which the company describes as a "safety-focused reasoning monitor."

The monitor, which is custom-trained to reason about OpenAI's content policies, runs on top of o3 and o4-mini. It is designed to identify prompts related to biological and chemical risks and instruct the models to refuse to offer advice on those topics.

To establish a baseline, OpenAI had red teamers spend approximately 1,000 hours flagging "unsafe" biorisk-related conversations from o3 and o4-mini. In a test that simulated the "blocking logic" of its safety monitor, the models declined to respond to risky prompts 98.7% of the time, according to OpenAI.

OpenAI acknowledges that its test did not account for people who might try new prompts after being blocked by the monitor, which is why the company says it will continue to rely in part on human monitoring.

According to OpenAI, o3 and o4-mini do not cross the company's "high risk" threshold for biorisks. However, OpenAI says that early versions of o3 and o4-mini proved more helpful than o1 and GPT-4 at answering questions about developing biological weapons.

Chart from the o3 and o4-mini system card (Screenshot: OpenAI)

According to OpenAI's recently updated Preparedness Framework, the company is actively tracking how its models could make it easier for malicious users to develop chemical and biological threats.

OpenAI is increasingly relying on automated systems to mitigate the risks posed by its models. For example, to prevent GPT-4o's native image generator from creating child sexual abuse material (CSAM), OpenAI says it uses a reasoning monitor similar to the one it deployed for o3 and o4-mini.

However, several researchers have raised concerns that OpenAI is not prioritizing safety as much as it should. Metr, one of the company's red-teaming partners, said it had relatively little time to test o3 on a benchmark for deceptive behavior. Meanwhile, OpenAI decided not to release a safety report for its GPT-4.1 model, which launched earlier this week.