CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies
CoffeeBench simulates a small coffee supply chain in which LLM agents run farms, roasters, and retailers over 90 days. Different models show markedly different communication styles and profit profiles. If you care about economic alignment and multi-agent markets, CoffeeBench is a ready-made sandbox. ([huggingface.co](https://huggingface.co/papers/2606.16613))
Issa Sugiura, Daichi Hattori