GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
Builds a carefully matched benchmark in which GUI agents and command-line agents solve identical desktop tasks under the same checks. Finds that GUI agents fail on long, brittle interactions, while CLI agents are limited by missing skills rather than raw intelligence. If you design computer-use stacks, this shows where to invest next. ([huggingface.co](https://huggingface.co/papers/2606.24551))
Xiao Zhou, Siyue Zhang