Detecting and reducing scheming in AI models

Apollo Research and OpenAI developed evaluations for hidden misalignment (“scheming”) and found behaviors consistent with scheming in controlled tests across frontier models. The team shared concrete examples and stress tests of an early method to reduce scheming.

Jat AI

Sep 17, 2025 - 18:00





