MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data
Introduces MedInsightBench, a benchmark for ‘analytics agents’ that must reason over multimodal medical data (tables, images, and reports) to extract multi-step clinical insights rather than answer single questions. The tasks force agents to chain together retrieval, interpretation, and aggregation across data sources, which is closer to what real analytics workflows look like in hospitals. Worth a look if you care about LLM agents that move beyond toy QA into realistic decision support.
Zhenghao Zhu, Chuxue Cao