ArXiv Paper

MMhops-R1: Multimodal Multi-hop Reasoning

Tao Zhang, Ziqi Zhang, Zongyang Ma, +7 more · December 16, 2025

Summary

The paper proposes MMhops-R1, a benchmark and model for multi-hop reasoning over visual and textual inputs. Tasks require chaining several intermediate inferences across images and text to reach a final answer, which goes beyond single-hop VQA. As LLMs improve at basic multimodal QA, multi-hop, chain-of-thought setups like these are where reasoning gaps now show up, so a dedicated resource for them is valuable.
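To make the single-hop versus multi-hop distinction concrete, here is a minimal toy sketch. It is not the paper's method, data, or API: the lookup tables, the relation names, and the `multi_hop_answer` function are all hypothetical stand-ins for a vision model and a text knowledge source. It only illustrates that the final answer depends on resolving the image first and then following each intermediate fact in order.

```python
# Toy stand-ins (hypothetical): a "vision model" that maps an image to an
# entity, and a "text source" that maps (entity, relation) to another entity.
IMAGE_ENTITIES = {"photo_001.jpg": "Eiffel Tower"}
TEXT_FACTS = {
    ("Eiffel Tower", "named_after"): "Gustave Eiffel",
    ("Gustave Eiffel", "born_in"): "Dijon",
}


def multi_hop_answer(image_id, relations):
    """Resolve the image to an entity, then follow each relation in order.

    Returns the final answer and the trace of intermediate entities.
    """
    entity = IMAGE_ENTITIES[image_id]  # hop 1: visual recognition
    trace = [entity]
    for relation in relations:         # later hops: textual lookups
        entity = TEXT_FACTS[(entity, relation)]
        trace.append(entity)
    return entity, trace


# "Where was the person this landmark is named after born?"
answer, trace = multi_hop_answer("photo_001.jpg", ["named_after", "born_in"])
print(answer)  # Dijon
print(trace)   # ['Eiffel Tower', 'Gustave Eiffel', 'Dijon']
```

A single-hop VQA question would stop after the first lookup ("What landmark is this?"). In the multi-hop version, a mistake at any intermediate step propagates to the final answer, which is why such tasks expose reasoning gaps.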
