Technical Article

MTEB Leaderboard: From a slow demo to feature-rich leaderboard

HuggingFace Blog, June 12, 2026

Summary

HuggingFace’s team rebuilt the MTEB embedding leaderboard to be much faster and more navigable. You can now slice models by task, filter aggressively, and actually pick the right embedding model instead of chasing a single score.
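To make "slice by task instead of chasing a single score" concrete, here is a minimal sketch in plain Python. It does not use the leaderboard's actual code or data: the `ModelRow` type, the `best_for_task` helper, the model names, and every number are hypothetical and invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRow:
    """One leaderboard row (hypothetical schema, not the real MTEB format)."""
    name: str
    params_millions: int          # model size in millions of parameters
    scores: dict                  # task type -> score, higher is better


# Toy rows: names and scores are made up purely to illustrate the idea.
ROWS = [
    ModelRow("model-a", 7000, {"Retrieval": 60.0, "Classification": 80.0, "Clustering": 50.0}),
    ModelRow("model-b", 300, {"Retrieval": 58.0, "Classification": 70.0, "Clustering": 45.0}),
    ModelRow("model-c", 100, {"Retrieval": 40.0, "Classification": 78.0, "Clustering": 48.0}),
]


def mean_score(row: ModelRow) -> float:
    """The 'single score' view: an unweighted mean over all task types."""
    return sum(row.scores.values()) / len(row.scores)


def best_for_task(rows, task: str, max_params_millions: Optional[int] = None) -> Optional[ModelRow]:
    """Filter by size, then rank by the score on one task type only."""
    candidates = [
        r for r in rows
        if task in r.scores
        and (max_params_millions is None or r.params_millions <= max_params_millions)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.scores[task])


if __name__ == "__main__":
    print("Best by mean score:", max(ROWS, key=mean_score).name)
    for task in ("Retrieval", "Classification"):
        pick = best_for_task(ROWS, task, max_params_millions=500)
        print(f"Best for {task} under 500M params:", pick.name if pick else "none")
```

In this toy data the largest model wins on the mean, yet once a size limit and a task filter are applied, a different small model wins for each task. That is the kind of selection a filterable leaderboard is meant to support.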

Related Content

SynthID Detector: Identify content made with Google's AI tools

Google announces SynthID Detector, a web portal that lets you upload images, audio, video, or text generated with Google AI tools and automatically checks for imperceptible SynthID watermarks, highlighting which parts of the content are likely watermarked. For developers and media teams, it’s a turnkey authenticity check for content produced with models like Gemini, Imagen, Lyria, and Veo, designed to plug into editorial and trust-and-safety workflows. ([blog.google](https://blog.google/technology/ai/google-synthid-ai-content-detector/))

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

This report compares seven frontier language and vision models across many safety tests, from basic benchmarks to adversarial red-teaming. It finds GPT-5.2 clearly safest overall while others trade off safety across languages, modalities, and threat models.

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

Introduces GauntletBench, a web-based testbed with video editors, workflow tools, 3D apps, and more, focused on tough perception and reasoning tasks. Even the best agents hit only ~19% success while non-expert humans clear 80%+. If you think your agent is "human level," try it here. ([huggingface.co](https://huggingface.co/papers/2606.14397))

Partnering with Mozilla to improve Firefox’s security

Anthropic used Claude Opus 4.6 to scan Firefox’s code and surfaced 22 new vulnerabilities, 14 rated high severity. The post lays out a playbook for pairing AI bug hunters with human maintainers safely.