ArXiv Paper

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Chaoyang Wang, Kaituo Feng, Dongyang Chen, and 8 others

December 18, 2025

Summary

AdaTooler-V teaches vision-language models when to call external tools, not just how. That cuts unnecessary tool calls, reducing costs while often boosting accuracy on vision tasks.

Topics

multimodal, agents, tools, efficiency

Related Content

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

OPV (Outcome-based Process Verifier) is a verifier model that inspects the rationale steps of long chains-of-thought via summarized outcomes, combining the strengths of outcome-based and process-based verification. Trained with an active learning loop, rejection fine-tuning, and RLVR, OPV reaches strong F1 on OPV-Bench and outperforms much larger models like Qwen3-Max-Preview at detecting reasoning errors.

Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

This work presents a long-horizon reasoning agent for Olympiad-level math that uses an Outcome-based Process Verifier (OPV) to supervise and clean up very long chains-of-thought. OPV summarizes and checks reasoning segments rather than only final answers, and is trained via iterative active learning and RLVR. The system achieves a new state-of-the-art result on a held-out benchmark while reducing annotation cost.

huggingface/transformers

The standard library for state-of-the-art models in text, vision, audio, and combined formats. If you build with open models, you almost certainly depend on this already.

opendatalab/MinerU

Pipeline that converts messy PDFs and Office docs into clean markdown or JSON tuned for LLM and agent workflows. It's quickly becoming a standard pre-processing tool. Plug it in if you're serious about document-heavy RAG. ([github.com](https://github.com/trending?since=daily))