The Curse of Multiple Mediators: Hidden Interaction Effects in Activation Patching
Summary
Shows that standard activation patching does not isolate a unit's direct influence: the measured effect also includes how that unit interacts with many others. These interaction terms can hide genuinely important neurons or make unimportant ones look important. If you run mechanistic interpretability experiments, the takeaway is to treat patching results with more skepticism. ([arxiv.org](https://arxiv.org/list/cs.LG/new))
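To make the point concrete, here is a minimal toy sketch in Python. It is not the paper's method or setup. The two-unit "model" `toy_output`, its weights, and the helper `patch_effect_a` are all hypothetical names invented for illustration. The output is assumed to be a direct term for each unit plus one pairwise interaction term, and we patch unit `a` from a clean run into a corrupted run.

```python
def toy_output(a, b, w_a=1.0, w_b=1.0, w_ab=0.0):
    """Hypothetical two-unit model: direct terms plus one interaction term."""
    return w_a * a + w_b * b + w_ab * a * b


def patch_effect_a(clean, corrupted, **weights):
    """Effect of patching unit `a` from the clean run into the corrupted run.

    Algebraically this equals (a_clean - a_corr) * (w_a + w_ab * b_corr),
    so it depends on the other unit's corrupted activation, not only on w_a.
    """
    a_clean, _ = clean
    a_corr, b_corr = corrupted
    patched = toy_output(a_clean, b_corr, **weights)
    baseline = toy_output(a_corr, b_corr, **weights)
    return patched - baseline


clean = (1.0, 1.0)      # (a, b) activations on the clean input (assumed values)
corrupted = (0.0, 2.0)  # (a, b) activations on the corrupted input (assumed values)

# Unit `a` has no direct effect (w_a = 0), yet patching it moves the output,
# because the interaction with `b` does all the work.
fake_importance = patch_effect_a(clean, corrupted, w_a=0.0, w_b=1.0, w_ab=1.0)

# Unit `a` has a real direct effect (w_a = 2), but the interaction term
# cancels it exactly at this value of `b`, so patching shows nothing.
hidden_importance = patch_effect_a(clean, corrupted, w_a=2.0, w_b=1.0, w_ab=-1.0)

print(fake_importance, hidden_importance)
```

In this toy setting the patching effect is a sum of a direct term and an interaction term, so a single patching number cannot tell the two apart without further experiments, such as varying the activation of the other unit.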
Related Content
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Wraps real robots in a closed-loop system where coding agents iteratively reset scenes, run policies, check results, and improve code. If you’re serious about autonomous robot labs, this is basically a blueprint.
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
Builds thousands of synthetic "computers" with realistic files and calendars to simulate month-long knowledge work for AI agents. Each run spans 8+ hours and ~2,000 steps, yielding dense signals for training long-horizon productivity agents. If you are designing office copilots or agent training curricula, copy this setup to cheaply generate rich experience data. ([arxiv.org](https://arxiv.org/abs/2604.28181))
HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
HYDRA-X unifies image and video tokenization inside a single vision transformer, then uses that tokenizer to drive a 7B multimodal model. It hits strong scores on both understanding and generation while proposing a better way to edit content by operating in the tokenizer’s latent space instead of inside the language model.