ArXiv Paper

MOA: Multi-Objective Alignment for Role-Playing Agents

Chonghua Liao, Ke Wang, Yuchuan Wu, and 2 others

December 10, 2025

Summary

MOA is an RL framework that jointly optimizes multiple fine-grained rubrics for role-playing agents, such as persona consistency, domain knowledge, and dialogue quality, using multi-objective alignment and thought-augmented rollouts. An 8B model trained with MOA can match or surpass GPT-4o and Claude on the PersonaGym and RoleMRC benchmarks, suggesting that better objective design can make smaller models competitive with strong proprietary ones. ([huggingface.co](https://huggingface.co/papers/2512.09756))

Topics

alignment, rl, agents, evaluation

