Be ahead of the curve
Research papers, repositories, and articles about small models
This paper trains small reasoning models with rewards that check whether each intermediate step actually follows from the earlier ones. This reduces reward hacking, where the model produces long but logically broken chains of thought.
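To make the idea in the summary above concrete, here is a minimal toy sketch of a step-level reward. It is not the paper's method: the `Step` type, the `step_follows` check, and the `process_reward` function are all hypothetical names invented for illustration, and the "verifier" only handles integer arithmetic. It assumes a step is sound when its operands are already known (given in the problem or produced by an earlier verified step) and its arithmetic is correct.

```python
from dataclasses import dataclass

# Hypothetical toy representation of one reasoning step: "left op right = result".
@dataclass(frozen=True)
class Step:
    left: int
    op: str
    right: int
    result: int

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def step_follows(step: Step, known: set) -> bool:
    """Return True if both operands are already known and the arithmetic holds."""
    if step.left not in known or step.right not in known:
        return False
    if step.op not in OPS:
        return False
    return OPS[step.op](step.left, step.right) == step.result

def process_reward(givens: set, steps: list) -> float:
    """Fraction of steps that follow from the givens and earlier verified steps."""
    if not steps:
        return 0.0
    known = set(givens)
    valid = 0
    for step in steps:
        if step_follows(step, known):
            valid += 1
            known.add(step.result)  # only verified results become usable later
    return valid / len(steps)

givens = {2, 3, 4}

# A short chain in which every step follows from what came before.
sound_chain = [Step(2, "+", 3, 5), Step(5, "*", 4, 20)]

# A longer chain: a wrong first step, a step that builds on the wrong value,
# and one correct step.
padded_chain = [Step(2, "+", 3, 6), Step(6, "*", 4, 24), Step(2, "+", 3, 5)]

sound = process_reward(givens, sound_chain)
padded = process_reward(givens, padded_chain)
print(sound, padded)
```

Because the score is the fraction of verified steps rather than a count, adding more steps does not raise the reward unless those steps are themselves sound, which is the property the summary attributes to step-checking rewards. A real system would replace the arithmetic check with a learned or formal verifier.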