Research papers, repositories, and articles about long-horizon tasks
OdysseyArena evaluates agents in environments where they must discover hidden rules over hundreds of steps rather than simply follow given instructions. Even top models struggle, which suggests that long-horizon discovery and strategic planning remain weak points.