Trust

Research papers, repositories, and articles about trust

Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models

The authors probe how models such as Llama 3.1, Qwen 2.5, and Mistral internally represent human trust signals in text. They show that specific attention heads reliably track cues of fairness, certainty, and accountability, a finding that could inform the design of more trustworthy systems.

Gerard Yeo, Svetlana Churina