Behavior
Research papers, repositories, and articles about behavior
Verbalizing LLMs' assumptions to explain and control sycophancy
The approach gets models to spell out their hidden assumptions before answering, which makes flattery-driven answers easier to spot and dial down.
Myra Cheng, Isabel Sieh
Effects of personality steering on cooperative behavior in Large Language Model agents
The authors test how adding human-like personality traits changes how AI agents cooperate in repeated Prisoner’s Dilemma games. They find agreeableness boosts cooperation but can also make agents easier to exploit, warning that persona dials act as soft biases, not hard controls.
Mizuki Sakai, Mizuki Yokoyama