Day 82

Verifying Chain-of-Thought Reasoning via Its Computational Graph

Research question

The paper asks whether reasoning errors in chain-of-thought outputs can be detected by inspecting the model’s internal computational process, rather than only judging the final answer or generated text. More specifically, it tests whether correct and incorrect reasoning steps leave different structural signatures in attribution graphs built from the model’s latent computation. 

Methodology

The authors introduce Circuit-based Reasoning Verification (CRV), a white-box verification method with four stages: it replaces the model's MLP modules with interpretable transcoders, constructs an attribution graph for each reasoning step, extracts graph-structural features, and trains a classifier to predict step correctness. They create step-labeled datasets for synthetic Boolean and arithmetic tasks, as well as for GSM8K, because existing text-only verifier datasets do not contain the model-specific computational traces this method needs.
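The last two stages of that pipeline (graph features, then a classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the edge-list graph format, the five features, the toy graphs and labels, and the nearest-centroid classifier are stand-ins, not the feature set, data, or classifier used in the paper.

```python
# Toy sketch: structural features from an attribution graph, then a
# stand-in classifier. All names and data below are hypothetical.

def graph_features(edges):
    """edges: list of (src, dst, weight) tuples for one reasoning step."""
    nodes = {n for src, dst, _ in edges for n in (src, dst)}
    n_nodes, n_edges = len(nodes), len(edges)
    in_degree = {}
    for _, dst, _ in edges:
        in_degree[dst] = in_degree.get(dst, 0) + 1
    # Directed-graph density: edges present / edges possible.
    density = n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    mean_abs_w = sum(abs(w) for _, _, w in edges) / n_edges if n_edges else 0.0
    max_in = max(in_degree.values(), default=0)
    return [float(n_nodes), float(n_edges), density, mean_abs_w, float(max_in)]

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean feature vector per label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Made-up graphs; label 1 = step marked correct, 0 = step marked incorrect.
toy_graphs = [
    ([("t0", "f1", 0.9), ("f1", "f2", 0.8), ("f2", "out", 0.7)], 1),
    ([("t0", "f1", 0.9), ("t1", "f1", 0.6), ("f1", "out", 0.9)], 1),
    ([("t0", "f1", 0.1), ("t0", "f2", 0.2), ("f1", "out", 0.1),
      ("f2", "out", 0.2), ("t0", "out", 0.1)], 0),
    ([("t0", "f1", 0.2), ("f1", "out", 0.1), ("t0", "out", 0.3),
      ("t1", "out", 0.2)], 0),
]

X = [graph_features(edges) for edges, _ in toy_graphs]
y = [label for _, label in toy_graphs]
centroids = fit_centroids(X, y)
predictions = [predict(centroids, x) for x in X]
```

The toy labels carry no claim about which structural patterns actually indicate errors; the sketch only shows the shape of the data flow from per-step graph to feature vector to correctness prediction.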

Findings

The paper finds that attribution-graph structure contains a strong signal of reasoning correctness, making it possible to verify reasoning through computational traces. It also finds that error signatures are domain-specific and provides evidence that these signatures are partly causal, since targeted interventions on individual transcoder features can correct some faulty reasoning. 
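To make the idea of a targeted intervention on a single transcoder feature concrete, here is a minimal sketch. It assumes a generic transcoder of the form decode(ReLU(encode(x))) and a hypothetical `scale` argument that rescales chosen feature activations before decoding; the weights are invented toy numbers, and this is not the paper's intervention procedure.

```python
# Toy sketch of feature-level intervention on a transcoder.
# Weights, shapes, and the `scale` interface are hypothetical.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transcoder_forward(x, W_enc, b_enc, W_dec, scale=None):
    """Encode x into sparse feature activations, optionally rescale
    selected features, then decode. scale: {feature_index: multiplier}."""
    pre = [a + b for a, b in zip(matvec(W_enc, x), b_enc)]
    acts = [max(0.0, p) for p in pre]  # ReLU feature activations
    if scale:
        acts = [a * scale.get(i, 1.0) for i, a in enumerate(acts)]
    return matvec(W_dec, acts), acts

# 2 inputs -> 3 features -> 2 outputs (toy values).
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -1.0]
W_dec = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]

x = [1.0, 2.0]
baseline_out, baseline_acts = transcoder_forward(x, W_enc, b_enc, W_dec)
# Suppress feature 2 only; all other features are left untouched.
patched_out, patched_acts = transcoder_forward(
    x, W_enc, b_enc, W_dec, scale={2: 0.0}
)
```

Comparing `baseline_out` with `patched_out` shows how much of the module's output depends on the suppressed feature, which is the kind of evidence a causal claim about a feature relies on.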

Limitations

The authors emphasize that CRV is computationally intensive and is not intended as a practical drop-in verifier for deployed systems. The method also requires full model access and modified interpretable surrogate models, so it is less applicable to closed-source frontier models or ordinary API-only agent settings. 

Why it’s important

This paper matters because it moves reasoning verification from surface-level judging toward mechanistic diagnosis of why reasoning fails. For your proposal, it is useful because it supports the broader argument that agent and reasoning evaluation should examine process traces, not just final answers, although its focus is internal computational graphs rather than external tool-use traces.
