Reinforcement Learning
Research papers, repositories, and articles about reinforcement learning
MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue
Introduces a new optimization rule for training chat agents over long conversations. The goal: steadier learning and more helpful dialogue without exploding token and compute costs.
Naifan Zhang, Ruihan Sun
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Posterior Behavioral Cloning shows that how you pretrain a behavioral cloning policy can make downstream reinforcement learning finetuning far cheaper. Robotics teams can adopt this to cut expensive environment interaction time.
Andrew Wagenmaker, Perry Dong