Reward Models

Research papers, repositories, and articles about reward models

Discretizing Reward Models

Shows that continuous reward models often assign very different scores to equally good answers, an oversensitivity that encourages reward hacking and degrades the resulting policies. Clustering rewards into a few discrete levels using Monte Carlo dropout reduces this oversensitivity and leads to better RL outcomes. If you're training policies against reward models, this is a strong argument for discretizing their scores. ([huggingface.co](https://huggingface.co/papers/2606.21795))

Vijay Viswanathan, Shiqi Wang
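
To make the idea in the summary concrete, here is a minimal sketch of noise-aware discretization. It is not the paper's algorithm: the function name `discretize_rewards`, the `tolerance` parameter, and the greedy gap-based clustering rule are all assumptions made for illustration. It assumes you have already scored each response several times with dropout left active, so that each response comes with a small sample of scores.

```python
from statistics import mean, pstdev


def discretize_rewards(samples, tolerance=1.0):
    """Map each response to a discrete reward level (hypothetical helper).

    samples[i] holds the scores for response i from repeated stochastic
    forward passes, e.g. a reward model run with dropout left on.
    Assumption: responses whose mean scores differ by less than
    tolerance * (average per-response spread) are treated as equally good.
    """
    means = [mean(s) for s in samples]
    # Average per-response standard deviation serves as the noise scale.
    noise = mean(pstdev(s) for s in samples)
    # Walk the responses from lowest to highest mean score.
    order = sorted(range(len(samples)), key=lambda i: means[i])
    levels = [0] * len(samples)
    level = 0
    for prev, cur in zip(order, order[1:]):
        # Open a new level only when the gap exceeds the noise scale.
        if means[cur] - means[prev] > tolerance * noise:
            level += 1
        levels[cur] = level
    return levels


scores = [
    [0.9, 1.1, 1.0],    # response A
    [1.05, 0.95, 1.1],  # response B, statistically indistinguishable from A
    [3.0, 3.2, 2.9],    # response C, clearly better
]
print(discretize_rewards(scores))  # [0, 0, 1]
```

Responses A and B land in the same level because their mean scores differ by less than the sampling noise, while C gets its own level. One caveat of this greedy rule is chaining: a long run of closely spaced scores collapses into a single level even if its endpoints are far apart, so a real implementation would likely want a proper clustering step.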