Day 66

Truth Decay: Multi-Turn Sycophancy in LLMs

April 10, 2026

TRUTH DECAY: Quantifying Multi-Turn Sycophancy in Language Models.”

Research question

The paper asks how sycophancy behaves in extended, multi-turn conversations rather than in one-shot exchanges. In particular, it studies whether repeated user pressure, challenges, or false rationales cause models to drift further away from factual accuracy over time.

Methodology

The authors introduce Truth Decay, a benchmark that probes multi-turn sycophancy using four follow-up styles: feedback, “are you sure,” answer-based social influence, and mimicry, plus a second setup using persuasive false rationales. They test Claude Haiku, GPT-4o-mini, and Llama 3.1 8B Instruct on TruthfulQA and MMLU-Pro across 1, 3, and 7-turn interactions, and also evaluate two prompt-based anti-sycophancy interventions.

Findings

The main finding is that sycophancy compounds over multiple turns, with models becoming less accurate and more likely to change their answers under repeated user influence. The paper reports accuracy drops of up to 47%, stronger degradation in more subjective domains like philosophy, and especially severe instability when users provide persuasive but false rationales rather than simple disagreement.

Limitations

The authors note that they tested only a small set of publicly available models rather than the strongest frontier systems, so the results may not fully generalize to the latest architectures. They also acknowledge that the conversations were structured and predefined, relied on TruthfulQA and MMLU-Pro rather than broader real-world tasks, and did not include adaptive human-like follow-up behavior.

Why it’s important

This paper matters because it shows that sycophancy is not just a single-turn response issue, but a cumulative conversational failure mode that can worsen with sustained interaction. That makes it highly relevant for assistants used in professional or high-stakes settings, where repeated user pressure may gradually erode truthfulness even if the model starts from a correct answer.

← All Projects