GPT-5.2-Codex, billed by OpenAI as its most advanced agentic coding model, launched for paid ChatGPT users on Thursday. According to OpenAI, the new model scores 56.4% on SWE-Bench Pro, ahead of GPT-5.2's 55.6%. OpenAI's evaluations found that its cybersecurity capabilities have improved substantially. Although the model has not yet reached the "high" level, the company is preparing for future models to cross that threshold.

Altman said the models will be a net benefit to cybersecurity, that we are in the "real impact phase," and that OpenAI is beginning to explore trusted access programs for defensive cybersecurity work. OpenAI is piloting an invitation-only trusted access program for security professionals.

One week after releasing the GPT-5.2 series of models, OpenAI has made another move. On Thursday, December 18, Eastern Time, it launched GPT-5.2-Codex, a new-generation Codex model built on GPT-5.2. OpenAI describes it as its most advanced agentic coding model, focused on professional software engineering and defensive cybersecurity, and the release further consolidates the company's competitive position against Google Gemini in AI programming.

According to OpenAI, GPT-5.2-Codex delivers gains in coding performance, cybersecurity capabilities, and long-horizon task handling. The model scored 56.4% on SWE-Bench Pro and 64.0% on Terminal-Bench 2.0, setting new highs on both benchmarks. It became available to paid ChatGPT users across all Codex surfaces on the day of release, and API access is being rolled out.

OpenAI places particular emphasis on GPT-5.2-Codex's significant improvements in cybersecurity. CEO Sam Altman noted that earlier this month, a security researcher using the previous-generation model, GPT-5.1-Codex-Max, discovered and responsibly disclosed a vulnerability in React that could lead to source code exposure. OpenAI believes the new model has not yet reached a "high" level of cybersecurity capability, but the company is preparing for future models to cross this threshold.

OpenAI said that GPT-5.2-Codex was released to paid ChatGPT users across all Codex surfaces on Thursday, and that it is working to safely enable access for API users in the coming weeks. The company plans to maximize defensive impact while limiting the risk of abuse through a gradual rollout, deployment paired with safeguards, and close collaboration with the security community.

Thursday’s release continues OpenAI’s offensive in the field of AI programming.

When GPT-5.2 was released last week, OpenAI cited feedback from coding startups and said the model has "the most advanced agentic coding performance". It also disclosed that the Thinking version of GPT-5.2 achieved the highest score to date on the SWE coding ability test, making it OpenAI's first model to match or exceed the level of human experts. The move was seen as a direct response to the positive reviews Google Gemini 3 has received for its coding and reasoning capabilities.

Coding performance upgraded and optimized for large-scale real-world scenarios

GPT-5.2-Codex is an optimized version of GPT-5.2, specifically enhanced for agentic coding in Codex. OpenAI said the new model improves in three key areas: better long-horizon work through context compaction, stronger performance on project-scale tasks such as refactoring and migration, and better performance in Windows environments.

In benchmark testing, GPT-5.2-Codex scored 56.4% on SWE-Bench Pro, ahead of GPT-5.2 at 55.6% and GPT-5.1 at 50.8%. On Terminal-Bench 2.0, GPT-5.2-Codex scored 64.0%, compared with 62.2% for GPT-5.2 and 58.1% for GPT-5.1. SWE-Bench Pro requires a model to generate a patch in a given codebase that solves a real software engineering task, while Terminal-Bench 2.0 tests an AI agent's ability to complete tasks such as compiling code, training models, and setting up servers in a real terminal environment.
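To put these figures in perspective, the gaps can be worked out directly from the reported scores. The short Python sketch below is illustrative only and does not come from OpenAI: the names `SCORES` and `point_gains` are hypothetical, and the only inputs are the six percentages quoted above. It computes how many percentage points GPT-5.2-Codex leads each earlier model by on each benchmark.

```python
# Benchmark accuracies in percent, exactly as reported in the article.
# SCORES and point_gains are illustrative names, not part of any OpenAI tooling.
SCORES = {
    "SWE-Bench Pro": {"GPT-5.2-Codex": 56.4, "GPT-5.2": 55.6, "GPT-5.1": 50.8},
    "Terminal-Bench 2.0": {"GPT-5.2-Codex": 64.0, "GPT-5.2": 62.2, "GPT-5.1": 58.1},
}


def point_gains(scores, model="GPT-5.2-Codex"):
    """Return the percentage-point lead of `model` over every other model, per benchmark."""
    gains = {}
    for bench, results in scores.items():
        gains[bench] = {
            other: round(results[model] - value, 1)  # round away float noise
            for other, value in results.items()
            if other != model
        }
    return gains


if __name__ == "__main__":
    for bench, deltas in point_gains(SCORES).items():
        for other, delta in deltas.items():
            print(f"{bench}: GPT-5.2-Codex leads {other} by {delta} points")
```

On these numbers, the lead over GPT-5.2 is under two points on both benchmarks (0.8 and 1.8), while the lead over GPT-5.1 is between five and six points (5.6 and 5.9).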


GPT-5.2-Codex improves on long-context understanding, reliable tool calling, factuality, and native compaction, making it a more dependable partner for long-running coding tasks while remaining token-efficient during reasoning. Stronger vision performance lets GPT-5.2-Codex interpret screenshots, technical diagrams, and user interfaces more accurately, and quickly turn design mockups into functional prototypes.


OpenAI said that with these improvements, Codex can work in large code bases for long periods of time, maintain complete context, and more reliably complete complex tasks such as large-scale refactoring, code migration, and feature building, without losing track even if plans change or attempts fail.

Cybersecurity capabilities jump significantly as OpenAI prepares to cross the "high" threshold

Cybersecurity is another key area of progress for GPT-5.2-Codex. In its core cybersecurity evaluations, OpenAI observed a sharp jump in capability starting with GPT-5-Codex, another significant improvement with GPT-5.1-Codex-Max, and now a third jump with GPT-5.2-Codex.

In professional Capture the Flag evaluations, GPT-5.2-Codex demonstrated the ability to solve advanced, multi-step, real-world challenges that require professional-level cybersecurity skills. Under OpenAI's Preparedness Framework, GPT-5.2-Codex has not yet reached the "high" level of cybersecurity capability, but the company expects future AI models to continue along this trajectory and is planning and evaluating as though each new model could reach the "high" level.

A real-world case highlights the new model's potential for defensive cybersecurity. On December 11, the React team announced three security vulnerabilities affecting applications built with React Server Components. Andrew MacPherson, a principal security engineer at Privy, a Stripe company, had been using GPT-5.1-Codex-Max and the Codex CLI to study another serious vulnerability known as React2Shell. While guiding Codex through standard defensive security workflows, he unexpectedly discovered these previously unknown vulnerabilities and responsibly disclosed them to the React team.

Altman wrote on social media: "Last week, a security researcher using our previous generation (Codex) model discovered and disclosed a vulnerability in React that could lead to source code exposure. I believe these models will have a net benefit for cybersecurity, but as they improve, we are in the 'real impact phase.'"


Trusted access program launched to give security professionals special access

To balance capability gains against security risks, OpenAI has added safeguards at both the model and product levels to accompany the enhanced cybersecurity capabilities, including specialized safety training against harmful tasks and prompt injection, agent sandboxing, and configurable network access. Meanwhile, the company is piloting an invitation-only trusted access program.

The program will initially be open only to vetted security professionals and to organizations with clear professional cybersecurity use cases. Eligible participants will gain access to OpenAI's most capable models for defensive work, allowing them to carry out legitimate dual-use work such as vulnerability research or authorized red teaming, and removing restrictions that security teams may run into when emulating threat actors, analyzing malware, or stress testing critical infrastructure.

"We're starting to explore a trusted access program for defensive cybersecurity work," Altman said on

