Alignment & Safety

Maturing · 1%

Interpretability, constitutional AI, red teaming, and ensuring beneficial AGI. Making sure AI systems remain helpful, honest, and harmless.

interpretability · RLHF · constitutional-ai · red-teaming · alignment · safety · evals
35 Papers · 9 Milestones · $0 Funding · 1 Benchmark

Key Benchmarks

TruthfulQA

Measures a model's tendency to generate truthful answers across 817 questions.

Leader: Gemma 3 at 68.7% · Human baseline: 94% · low saturation
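For readers who want to reproduce a score like the one above, here is a minimal sketch of a TruthfulQA MC1 evaluation loop. It assumes the Hugging Face `truthful_qa` dataset (`multiple_choice` config); the `score_choice` callable is a hypothetical placeholder for however your model assigns a likelihood to each candidate answer, not part of any library.

```python
from typing import Callable
from datasets import load_dataset  # pip install datasets

def mc1_accuracy(score_choice: Callable[[str, str], float]) -> float:
    """Compute TruthfulQA MC1 accuracy.

    `score_choice(question, answer)` should return the model's
    log-likelihood (or any monotone score) for `answer` given
    `question`; it stands in for a real model call.
    """
    ds = load_dataset("truthful_qa", "multiple_choice")["validation"]  # 817 questions
    correct = 0
    for row in ds:
        choices = row["mc1_targets"]["choices"]  # candidate answers
        labels = row["mc1_targets"]["labels"]    # 1 marks the truthful answer
        scores = [score_choice(row["question"], c) for c in choices]
        # MC1: credit only if the single best-scoring choice is the truthful one
        if labels[scores.index(max(scores))] == 1:
            correct += 1
    return correct / len(ds)
```

MC1 counts a question as correct only when the single highest-scoring choice is the truthful one, which is part of why reported scores sit well below the 94% human baseline.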

Recent Milestones

New toolkit scores genAI dataset provenance

On December 31, 2025, Quantum Zeitgeist reported on a new compliance rating framework and open-source Python library for assessing data provenance in generative AI training datasets. The work, led by researchers at Imperial College London, aims to track origin, licensing and ethical safeguards as AI datasets scale exponentially.

Dec 31, 2025 · paper · Impact: 70/100
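The report does not describe the library's actual API, so the following is a hypothetical sketch of how a provenance record and compliance rating along those three axes (origin, licensing, ethical safeguards) might be structured. All field names and the equal-weight scoring rule are illustrative assumptions, not the Imperial College framework.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProvenanceRecord:
    source_url: str           # origin: where the data was collected from
    license: Optional[str]    # licensing: declared license, if any
    consent_documented: bool  # ethical safeguard: collection consent on record
    lineage_traceable: bool   # can each example be traced back to its origin?

def compliance_score(rec: ProvenanceRecord) -> float:
    """Toy rating in [0, 1]: one equal-weight check per provenance criterion."""
    checks = [
        bool(rec.source_url),
        rec.license is not None,
        rec.consent_documented,
        rec.lineage_traceable,
    ]
    return sum(checks) / len(checks)

# Example: a scraped dataset with a known license but no consent trail
rec = ProvenanceRecord("https://example.org/corpus", "CC-BY-4.0", False, True)
print(compliance_score(rec))  # 0.75
```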

OpenAI creates high‑stakes Head of Preparedness

OpenAI has posted a senior "Head of Preparedness" role responsible for evaluating and mitigating risks from its frontier AI systems. External reporting on December 29 details CEO Sam Altman’s warning that the job will be highly stressful and focused on cyber, biosecurity and mental‑health risks.

Dec 29, 2025 · paper · Impact: 70/100

OpenAI Elevates Extreme-Risk Preparedness

OpenAI is recruiting a new Head of Preparedness to lead its safety systems framework, offering an annual salary of $555,000 plus equity. The role will oversee threat models and mitigations for severe AI risks spanning cybersecurity, biosecurity and mental health, and was publicly highlighted by CEO Sam Altman, who called it a "stressful" but critical job ([businessinsider.com](https://www.businessinsider.com/challenges-of-openai-head-of-preparedness-role-2025-12)).

Dec 29, 2025 · release · Impact: 70/100

OpenAI Elevates Frontier Risk to Executive Level

On December 29, 2025, OpenAI publicly advertised a senior “Head of Preparedness” role to oversee emerging risks from its most advanced models, with reported compensation around $550,000–$555,000 plus equity. CEO Sam Altman described the job as a stressful, high‑stakes position focused on threats ranging from cybersecurity misuse to mental‑health harms and catastrophic scenarios.

Dec 29, 2025 · funding · Impact: 70/100

Anthropic RSP Framework Updated

Anthropic updates its Responsible Scaling Policy with ASL-3 deployment safeguards.

Oct 15, 2024 · release · Impact: 75/100

EU AI Act Enters into Force

The EU AI Act enters into force, establishing a comprehensive regulatory framework for AI.

Aug 1, 2024 · release · Impact: 85/100

OpenAI Superalignment Team Dissolution

OpenAI's superalignment team is dissolved after its co-leads depart, raising concerns about safety prioritization.

May 17, 2024 · breakthrough · Impact: 70/100

US AI Safety Institute Established

The US establishes the AI Safety Institute under NIST to develop AI safety standards and testing.

Feb 8, 2024 · funding · Impact: 80/100

UK AI Safety Institute Launch

The UK establishes the AI Safety Institute (AISI), announced at the Bletchley Park summit, for frontier AI safety research.

Feb 1, 2024 · funding · Impact: 82/100

Leading Organizations

Anthropic
DeepMind
OpenAI
UK AISI
MIRI

ArXiv Categories

cs.AI · cs.CY · cs.CR · cs.LG