Diamond Maps: Efficient Reward Alignment via Stochastic Flow Maps
Summary
Diamond Maps reframe reward alignment as learning a transport map over model outputs instead of tweaking rewards token by token. This gives smoother, more sample-efficient updates and shows strong results across safety-style alignment tasks.
Topics
Related Content
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
The authors build SpatialClaw, a code-driven agent that uses a stateful Python kernel plus vision tools to solve 3D and 4D spatial puzzles. It beats prior spatial agents across 20 benchmarks and six vision-language backbones, showing that the action interface design can unlock much stronger spatial reasoning.
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Wraps real robots in a closed-loop system where coding agents iteratively reset scenes, run policies, check results, and improve code. If you’re serious about autonomous robot labs, this is basically a blueprint.
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
AgentDoG 1.5 is a small family of guardrail models trained on a detailed agent-risk taxonomy with surprisingly few samples. They can sit in front of powerful agents, flag dangerous actions, and run cheaply. If you build tool-using agents, this is emerging as a standard safety baseline to copy or test against.
Synthetic Computers at Scale for Long-Horizon Productivity Simulation
Builds thousands of synthetic "computers" with realistic files and calendars to simulate month-long knowledge work for AI agents. Each run spans 8+ hours and ~2,000 steps, yielding dense signals for training long-horizon productivity agents. If you are designing office copilots or agent training curricula, copy this setup to cheaply generate rich experience data. ([arxiv.org](https://arxiv.org/abs/2604.28181))