Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
Summary
Introduces a red-team/blue-team framework to evaluate how well asynchronous monitoring can catch sabotage attempts by LLM-based software agents that edit real codebases. The authors systematically stress-test different monitoring strategies, modeling the interaction as an adversarial game between attacking agents and defensive monitors. This matters because asynchronous oversight is far cheaper than real-time gating, but until now its effectiveness against misaligned coding agents has been poorly understood.
Related Content
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification
OPV (Outcome-based Process Verifier) is a verifier model that inspects the rationale steps of long chains-of-thought via summarized outcomes, combining the strengths of outcome-based and process-based verification. Trained with an active learning loop, rejection fine-tuning, and RLVR, OPV reaches strong F1 on OPV-Bench and outperforms much larger models like Qwen3-Max-Preview at detecting reasoning errors.
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving
This work presents a long-horizon reasoning agent for Olympiad-level math that uses an Outcome-based Process Verifier (OPV) to supervise and clean up very long chains-of-thought. By summarizing and checking reasoning segments rather than only final answers, and training OPV via iterative active learning and RLVR, the system achieves new SOTA on a held-out benchmark while reducing annotation cost.
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground
T-pro 2.0 is an open-weight Russian large language model focused on hybrid reasoning: it can answer directly or emit explicit reasoning traces, and it’s optimized for low-latency inference via speculative decoding. Alongside the model, the authors release a Russian instruction corpus, a math benchmark, and an EAGLE-based inference stack, making it a practical foundation for Russian-language reasoning applications.
SynthID Detector: Identify content made with Google's AI tools
Google announces SynthID Detector, a web portal that lets you upload images, audio, video, or text generated with Google AI tools and automatically checks for imperceptible SynthID watermarks, highlighting which parts of the content are likely watermarked. For developers and media teams, it’s a turnkey authenticity check for content produced with models like Gemini, Imagen, Lyria, and Veo, designed to plug into editorial and trust-&-safety workflows. ([blog.google](https://blog.google/technology/ai/google-synthid-ai-content-detector/))