ArXiv Paper

Over-Searching in Search-Augmented Large Language Models

Roy Xie, Deepak Gopinath, David Qiu +4

January 12, 2026

Summary

This work shows that search-augmented models often call search tools even when retrieval hurts answer quality or wastes tokens. It introduces a cost-aware metric and mitigation techniques, so teams can dial back needless retrieval instead of just adding more context.

Topics

rag, efficiency, evaluation


Related Content

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

This report compares seven frontier language and vision models across many safety tests, from basic benchmarks to adversarial red-teaming. It finds GPT-5.2 clearly safest overall while others trade off safety across languages, modalities, and threat models.

opendatalab/MinerU

Pipeline that converts messy PDFs and Office docs into clean markdown or JSON tuned for LLM and agent workflows. It's quickly becoming a standard pre-processing tool. Plug it in if you're serious about document-heavy RAG. ([github.com](https://github.com/trending?since=daily))

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.

ggml-org/llama.cpp

llama.cpp keeps pushing local LLM performance on CPUs and small GPUs. It's still the reference for running big models on modest hardware. If you care about running AI cheaply or on-device, you should track every major change here.