Reinforcement Learning

Research papers, repositories, and articles about reinforcement learning

MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue

Introduces a new optimization rule for training chat agents over long conversations. The goal: steadier learning and more helpful dialogue without exploding token and compute costs.

Naifan Zhang, Ruihan Sun

Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning

Posterior Behavioral Cloning shows that how a policy is pretrained can make downstream reinforcement learning finetuning far cheaper. Robotics teams can adopt it to cut expensive environment time.

Andrew Wagenmaker, Perry Dong