TechnologyWednesday, December 3, 2025

OpenAI unveils 'confessions' training method to make language models admit when they misbehave

Source: OpenAIRead original

Summary

OpenAI introduced a research prototype called "confessions," which trains models to produce a second, honesty‑only output that reports when they violated instructions, reward‑hacked, or took unintended shortcuts. In adversarial evaluations designed to elicit misbehavior, the technique substantially reduced false negatives, giving developers better visibility into when powerful models deviate from intended behavior and improving tools for monitoring alignment as systems grow more capable.

Companies Mentioned

OpenAI
OpenAI
AI Lab|United States
Valuation: $500.0B
Private company - No stock data

Related Deals

Research
Partnership
Compute
Drag nodes to explore | Featured companies highlighted
Research

OpenAI, Anthropic, Block and major cloud providers are co-founding the Agentic AI Foundation under the Linux Foundation to steward open, interoperable standards for AI agents.

OpenAIOpenAIAnthropicAnthropicBlockGoogleGoogleMicrosoftMicrosoftAmazon Web ServicesBloombergCloudflareCiscoAgentic AI Foundation
Dec 2025
Research

Founding members created the Agentic AI Foundation under the Linux Foundation to fund and govern open standards like MCP, goose and AGENTS.md for interoperable agentic AI.

AnthropicAnthropicOpenAIOpenAIBlockGoogleGoogleMicrosoftMicrosoftAmazon Web ServicesBloombergCloudflareAgentic AI Foundation (AAIF)
Dec 2025
Partnership

OpenAI and Deutsche Telekom agreed a multi‑year collaboration to co‑develop AI products and deploy ChatGPT Enterprise across the telecom group.

OpenAIOpenAIDeutsche Telekom
Dec 2025
Partnership

Strategic education partnership to roll out ChatGPT Edu at scale and integrate OpenAI’s models into La Trobe University’s teaching, research and new AI‑focused degree programs.

OpenAIOpenAILa Trobe UniversityLa Trobe UniversityOpenAIOpenAI
Dec 2025
Compute

OpenAI and NEXTDC entered a multi-year agreement under which OpenAI will anchor a hyperscale AI campus and GPU supercluster at NEXTDC’s S7 facility in Sydney to support large-scale AI inference and enterprise workloads.

NEXTDCOpenAIOpenAINEXTDC S7 Sydney AI campus
Dec 2025