OpenAI today announced the launch of two new open weight models for the AI ​​security field - gpt-oss-safeguard-120b and gpt-oss-safeguard-20b. These security classification models are optimized based on the previously released gpt-oss series of open models and are also open under the Apache 2.0 license, allowing anyone to freely use, modify and deploy them.

The biggest feature of the new model is that it provides developers with the ability to conduct inference and classification directly based on custom security policies, abandoning the "one-size-fits-all" security system. Developers can enter their own security policies and content to be detected during inference, and the model will classify based on the policies and give reasoning reasons. Policies can be changed as they are used and can be flexibly adjusted to improve performance. gpt-oss-safeguard can classify user messages, chat replies, and even complete conversations.

OpenAI points out that this new type of model is particularly suitable for the following situations:

  • Potential hazards are emerging or evolving, and policies need to adapt quickly;

  • Some areas are highly granular and difficult for traditional small classifiers to handle;

  • Developers lack a large number of high-quality samples and have difficulty training high-level classifiers for various risks on the platform;

  • Classification result quality and interpretability are prioritized over delayed performance.

It should be noted that gpt-oss-safeguard also has certain limitations. OpenAI stated that if the platform has a large number of labeled samples and can train traditional classifiers, the latter may still be better than gpt-oss-safeguard in complex or high-risk scenarios, and the customized model will be more accurate. In addition, this new model has slow processing speed and large resource consumption, making it unsuitable for large-scale content real-time screening.

Currently, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are available for free download:

https://huggingface.co/collections/openai/gpt-oss-safeguard