Day 71

Aligning Large Language Models with Representation Editing: A Control Perspective.

Research question

The paper asks whether large language models can be aligned at inference time by directly steering their internal representations, instead of relying on full fine-tuning or prompt engineering alone. More specifically, it studies whether alignment can be framed as an optimal control problem over the hidden-state dynamics of an autoregressive language model. 
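
One way to write this framing down, as a sketch only: the symbols $h_t$ (hidden state at step $t$), $u_t$ (control signal), $y_t$ (sampled token), $f$ (the model's state update), $V$ (a value function scoring alignment), and $\lambda$ (regularization weight) are notation assumed here for illustration and may differ from the paper's.

```latex
\begin{aligned}
h_{t+1} &= f(h_t + u_t,\; y_t), \qquad y_t \sim p(\,\cdot \mid h_t + u_t) \\
u_t^{\star} &= \arg\max_{u_t}\; V(h_t + u_t) \;-\; \lambda \,\lVert u_t \rVert_2^2
\end{aligned}
```

The first line treats generation as a controlled stochastic system. The second picks the perturbation that raises predicted alignment, while the penalty term keeps the steered state close to the original one.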

Methodology

The authors propose RE-CONTROL, which treats an LLM as a discrete-time stochastic dynamical system and injects small control signals into hidden states during generation. They train a lightweight value function over representations using the Bellman equation, then use gradient-based optimization at test time to choose control signals that improve alignment while regularizing them to preserve generation quality. 
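
The test-time step can be illustrated with a minimal sketch. Everything here is a toy assumption rather than the authors' implementation: the value function is a hand-written `tanh` of a linear score instead of a trained network, the hidden state is a three-element list, and the names `value`, `value_grad`, `optimize_control`, `lr`, and `lam` are hypothetical.

```python
import math

def value(h, w, b):
    """Toy value function V(h) = tanh(w . h + b); stands in for the learned value model."""
    return math.tanh(sum(wi * hi for wi, hi in zip(w, h)) + b)

def value_grad(h, w, b):
    """Analytic gradient of V with respect to the hidden state h."""
    v = value(h, w, b)
    return [(1.0 - v * v) * wi for wi in w]

def optimize_control(h, w, b, steps=20, lr=0.5, lam=0.1):
    """Gradient ascent on J(u) = V(h + u) - lam * ||u||^2, starting from u = 0."""
    u = [0.0] * len(h)
    for _ in range(steps):
        steered = [hi + ui for hi, ui in zip(h, u)]
        g = value_grad(steered, w, b)
        # Ascent direction: value gradient minus the gradient of the L2 penalty.
        u = [ui + lr * (gi - 2.0 * lam * ui) for ui, gi in zip(u, g)]
    return u

# Assumed toy hidden state and value-model parameters.
h = [0.2, -0.4, 0.1]
w = [1.0, -0.5, 0.3]
b = -0.5

u = optimize_control(h, w, b)
steered = [hi + ui for hi, ui in zip(h, u)]
print("value before:", value(h, w, b))
print("value after: ", value(steered, w, b))
```

In a real model the gradient would come from automatic differentiation through the value network, and the steered state would be fed back into the decoder before the next token is sampled. The `lam` penalty plays the role of the regularizer that keeps the intervention small.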

Findings

The paper reports that RE-CONTROL outperforms existing test-time alignment methods such as prompting and guided decoding, while requiring far fewer resources than alignment through fine-tuning. It also claims strong generalization and computational efficiency because the learned value model is small and the intervention happens directly in representation space during decoding. 

Limitations

A key limitation is that the method still depends on training an auxiliary value model and running gradient-based optimization during inference, so it is not as simple as plain prompting. The paper also positions itself mainly against other test-time alignment methods and fine-tuning approaches, which means questions remain about robustness across broader deployment settings and more varied alignment objectives. 

Why it’s important

This paper matters because it offers a middle ground between expensive fine-tuning and weaker prompt-only control, giving a more flexible way to steer model behavior at test time. It is also important conceptually because it connects LLM alignment to control theory, suggesting that hidden-state interventions can be designed systematically rather than heuristically.
