Estimating worst-case frontier risks of open-weight LLMs

In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity.

Jat AI

Aug 5, 2025 - 19:00
