When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models
Analyzes 67 frontier models and shows that any system that selects a single model’s answer can be correct only on questions where at least one model is correct, so its accuracy is capped at one minus the rate at which all models fail together. Provides practical bounds on how much routing or voting can help. If you're building ensemble or agent stacks, this is a hard ceiling worth computing before you invest in a router or voter. ([huggingface.co](https://huggingface.co/papers/2606.27288))
Josef Chen
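
To make the ceiling concrete, here is a minimal sketch of how one might compute it from per-question correctness data. It is not the paper's code: the function names (`co_failure_ceiling`, `best_single_accuracy`) and the toy correctness matrix are hypothetical, and it assumes you already have a boolean record of whether each model answered each question correctly.

```python
from typing import Sequence


def co_failure_ceiling(correct: Sequence[Sequence[bool]]) -> float:
    """Upper bound on accuracy for any system that picks one model's answer.

    correct[m][q] is True if model m answered question q correctly.
    The bound is 1 minus the fraction of questions every model gets wrong.
    """
    if not correct or not correct[0]:
        raise ValueError("need at least one model and one question")
    n_questions = len(correct[0])
    if any(len(row) != n_questions for row in correct):
        raise ValueError("every model must cover the same questions")
    # A question is a co-failure when no model answers it correctly.
    co_failures = sum(
        1 for q in range(n_questions) if not any(row[q] for row in correct)
    )
    return 1.0 - co_failures / n_questions


def best_single_accuracy(correct: Sequence[Sequence[bool]]) -> float:
    """Accuracy of the strongest individual model (the baseline to beat)."""
    return max(sum(row) / len(row) for row in correct)


# Toy data (made up for illustration): 3 models, 5 questions.
toy = [
    [True, True, False, False, False],
    [True, False, True, False, False],
    [False, True, True, False, True],
]

ceiling = co_failure_ceiling(toy)
baseline = best_single_accuracy(toy)
headroom = ceiling - baseline  # the most a perfect selector could add
print(f"ceiling={ceiling:.2f} baseline={baseline:.2f} headroom={headroom:.2f}")
```

The gap between the ceiling and the best single model is the most that a perfect selector could gain on this data. If that gap is small, a more elaborate router or voter cannot help much, whatever its design.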