ArXiv Paper

SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning

Emre Can Acikgoz, Jinoh Oh, Jie Hao +7December 16, 2025

Summary

Argues that current task-oriented agents are over-optimized as passive followers and under-use conversation as an action. SpeakRL introduces a reinforcement-learning setup that rewards models for asking clarifying questions when the user’s intent is ambiguous, balancing ‘asking’ vs ‘acting’. On synthetic task-oriented dialogue scenarios, the trained agents substantially improve task completion rates without bloating the number of turns, suggesting that proactive clarification is a powerful, underused control knob.

Topics

llm agents dialogue rl

View Original View PDF

Related Content

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

OPV (Outcome-based Process Verifier) is a verifier model that inspects the rationale steps of long chains-of-thought via summarized outcomes, combining the strengths of outcome-based and process-based verification. Trained with an active learning loop, rejection fine-tuning, and RLVR, OPV reaches strong F1 on OPV-Bench and outperforms much larger models like Qwen3-Max-Preview at detecting reasoning errors.

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

This work presents a long-horizon reasoning agent for Olympiad-level math that uses an Outcome-based Process Verifier (OPV) to supervise and clean up very long chains-of-thought. By summarizing and checking reasoning segments rather than only final answers, and training OPV via iterative active learning and RLVR, the system achieves new SOTA on a held-out benchmark while reducing annotation cost.

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

T-pro 2.0 is an open-weight Russian large language model focused on hybrid reasoning: it can answer directly or emit explicit reasoning traces, and it’s optimized for low-latency inference via speculative decoding. Alongside the model, the authors release a Russian instruction corpus, a math benchmark, and an EAGLE-based inference stack, making it a practical foundation for Russian-language reasoning applications.

huggingface/transformers

The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.