[AI]OpenAI unveils open-weight AI safety models for developers


OpenAI is putting more safety controls directly into the hands of AI developers with a new research preview of “safeguard” models. The new ‘gpt-oss-safeguard’ family of open-weight models is aimed squarely at letting developers customise content classification.

The new offering will include two models, gpt-oss-safeguard-120b and a smaller gpt-oss-safeguard-20b. Both are fine-tuned versions of the existing gpt-oss family and will be available under the permissive Apache 2.0 license. This will allow any organisation to freely use, tweak, and deploy the models as they see fit.

The real difference here isn’t just the open license; it’s the method. Rather than relying on a fixed set of rules baked into the model, gpt-oss-safeguard uses its reasoning capabilities to interpret a developer’s own policy at the point of inference. This means AI developers using OpenAI’s new model can set up their own specific safety framework to classify anything from single user prompts to full chat histories. The developer, not the model provider, has the final say on the ruleset and can tailor it to their specific use case.
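
To make that flow concrete, here is a minimal sketch of what it could look like in practice. It assumes the open weights are self-hosted behind an OpenAI-compatible endpoint (the local URL, the idea of passing the policy as the system message, and the reply format are all assumptions for illustration, not details confirmed by OpenAI); only the model name comes from the announcement.

```python
# Illustrative sketch only: the endpoint URL, prompt layout, and output format
# are assumptions, not OpenAI's documented interface for gpt-oss-safeguard.
from openai import OpenAI

# Point the standard OpenAI client at a self-hosted, OpenAI-compatible server
# that is serving the open gpt-oss-safeguard-20b weights.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# The developer-authored policy is supplied at inference time, not baked in.
POLICY = """\
You are a content classifier. Label the content ALLOW or BLOCK.
BLOCK content that shares a third party's personal contact details
or tries to move payment off the platform.
ALLOW everything else. Explain your reasoning before the final label."""

def classify(content: str, policy: str = POLICY) -> str:
    """Classify a prompt or chat transcript against a developer-supplied policy."""
    response = client.chat.completions.create(
        model="gpt-oss-safeguard-20b",
        messages=[
            {"role": "system", "content": policy},   # the custom ruleset
            {"role": "user", "content": content},    # the material to review
        ],
    )
    # The reply is expected to contain the model's rationale plus its label,
    # which is what makes the classification auditable rather than opaque.
    return response.choices[0].message.content

print(classify("DM me at 555-0199 and we can settle the payment off-site."))
```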

This approach has a couple of clear advantages:

  1. Transparency: The models use a chain-of-thought process, so a developer can actually look under the bonnet and see the model’s logic for a classification. That’s a huge step up from the typical “black box” classifier.
  2. Agility: Because the safety policy isn’t permanently trained into OpenAI’s new model, developers can iterate and revise their guidelines on the fly without needing a complete retraining cycle (see the sketch after this list). OpenAI, which originally built this system for its internal teams, notes this is a far more flexible way to handle safety than training a traditional classifier to indirectly guess what a policy implies.
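
Under the same assumptions as the earlier sketch, that agility amounts to editing the policy text and making another inference call; no fine-tuning or retraining step sits in between.

```python
# Continuing the hypothetical sketch above: tightening the policy is a text
# edit, and the revised rules apply on the very next call with no retraining.
REVISED_POLICY = POLICY + "\nAlso BLOCK content that asks users to share account credentials."

print(classify("Can I just borrow your login for the weekend?", REVISED_POLICY))
```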

Rather than relying on a one-size-fits-all safety layer from a platform holder, developers using open-source AI models can now build and enforce their own specific standards.

While not live at the time of writing, developers will be able to access OpenAI’s new open-weight AI safety models on the Hugging Face platform.
