Day 97

FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations

Research questions. The paper asks whether retrieval-augmented generation (RAG) systems can become sycophantic by retrieving evidence that supports a user’s false premise, producing “hallucinations with citations.” It also asks whether deliberately searching for counter-evidence can reduce this failure mode.

Methodology. The author proposes FVA-RAG, which treats the first answer as a draft hypothesis, then retrieves “anti-context” using adversarial “Kill Queries” before a dual-verification step. The system is tested on TruthfulQA-Generation with 817 questions, frozen corpora, no live web calls, and equal retrieval budgets across methods.
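The draft-then-falsify loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names (`fva_answer`, `make_kill_queries`), the kill-query templates, the keyword-overlap retriever, the negation-based contradiction check, and the three-sentence corpus are all hypothetical stand-ins. The sketch also assumes that "dual verification" means the draft must survive a check against both the original context and the anti-context, which the summary does not spell out.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical frozen toy corpus; stands in for the paper's frozen corpora.
CORPUS = [
    "Many people believe cracking your knuckles will cause arthritis.",
    "Long-term studies found no link between cracking your knuckles and arthritis.",
    "Knuckle cracking releases gas bubbles in the joint fluid.",
]


def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for real text matching."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(CORPUS, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def make_kill_queries(draft: str) -> List[str]:
    """Hypothetical templates that search for evidence against the draft."""
    return [f"studies found no evidence that {draft}", f"myth: {draft}"]


def contradicts(draft: str, passage: str) -> bool:
    """Toy check (assumption): a negation word plus strong topical overlap."""
    p = tokenize(passage)
    return bool({"no", "not"} & p) and len(tokenize(draft) & p) >= 3


@dataclass
class Result:
    answer: str
    falsified: bool
    anti_context: List[str]


def fva_answer(
    question: str,
    draft_fn: Callable[[str, List[str]], str],
    revise_fn: Callable[[str, List[str]], str],
    k: int = 1,
) -> Result:
    # 1. Ordinary RAG pass: the first answer is only a draft hypothesis.
    context = retrieve(question, k)
    draft = draft_fn(question, context)

    # 2. Retrieve anti-context with the same per-query budget k.
    anti_context: List[str] = []
    for query in make_kill_queries(draft):
        for passage in retrieve(query, k):
            if passage not in anti_context:
                anti_context.append(passage)

    # 3. Assumed dual check: test the draft against context and anti-context.
    refuting = [p for p in context + anti_context if contradicts(draft, p)]
    if refuting:
        return Result(revise_fn(question, refuting), True, anti_context)
    return Result(draft, False, anti_context)


# A sycophantic drafter that simply accepts the user's false premise.
result = fva_answer(
    "Why does cracking your knuckles cause arthritis?",
    draft_fn=lambda q, ctx: "Cracking your knuckles causes arthritis.",
    revise_fn=lambda q, refuting: "The premise is not supported: " + refuting[0],
)
print(result.falsified, result.answer)
```

In this toy setup the plain retrieval pass surfaces only the premise-echoing passage, while the first kill query pulls in the passage that refutes it. That is the behavior the falsification step is meant to produce, though a real system would use a proper retriever and a model-based contradiction check.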

Findings. FVA-RAG achieves about 79.8 to 80.1 percent accuracy, outperforming prompted Self-RAG and CRAG variants, with falsification triggered on about one-quarter to one-third of queries. The key finding is that targeted counter-evidence retrieval helps prevent premise-confirming hallucinations.

Why it matters. The paper is important because RAG can make false answers look more credible when citations reinforce a mistaken user assumption. For sycophancy research, it offers a concrete mitigation strategy: make systems actively test user premises rather than only retrieve supporting evidence.
