ArXiv Paper

Rethinking Continual Experience Internalization for Self-Evolving LLM Agents

Jingwen Chen, Wenkai Yang, Shengda Fan +7June 5, 2026

Summary

This work studies how agents should turn past trajectories into lasting skills without collapsing or forgetting. It compares different ways to chunk experience, inject it back into the model, and train on it off-policy. If you’re building agents that learn over weeks or months, the design choices here matter more than raw model size.

Topics

agents training memory

View Original View PDF

Related Content

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

OPV (Outcome-based Process Verifier) is a verifier model that inspects the rationale steps of long chains-of-thought via summarized outcomes, combining the strengths of outcome-based and process-based verification. Trained with an active learning loop, rejection fine-tuning, and RLVR, OPV reaches strong F1 on OPV-Bench and outperforms much larger models like Qwen3-Max-Preview at detecting reasoning errors.

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

This work presents a long-horizon reasoning agent for Olympiad-level math that uses an Outcome-based Process Verifier (OPV) to supervise and clean up very long chains-of-thought. By summarizing and checking reasoning segments rather than only final answers, and training OPV via iterative active learning and RLVR, the system achieves new SOTA on a held-out benchmark while reducing annotation cost.

huggingface/transformers

The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.

opendatalab/MinerU

Pipeline that converts messy PDFs and Office docs into clean markdown or JSON tuned for LLM and agent workflows. It's quickly becoming a standard pre-processing tool. Plug it in if you're serious about document-heavy RAG. ([github.com](https://github.com/trending?since=daily))