From Macro to Micro: Benchmarking Microscopic Spatial Intelligence on Molecules via Vision-Language Models
MiSI-Bench introduces "Microscopic Spatial Intelligence", the ability to reason about invisible molecular 3D structures, and builds a large VLM benchmark spanning 163k QA pairs over 4k molecules. Current VLMs lag well behind humans on many tasks, but a fine-tuned 7B model can exceed human performance on some spatial transformations, highlighting both the promise of such models and the need for domain knowledge on the path to scientific AGI.
Zongzhao Li, Xiangzhe Kong