Beyond linear steering: Unified multi-attribute control for language models

Amirali Abdullah, Narmeen Oozeer, Luke Marks and Fazl Barez

Published: May 30, 2025 | Last updated: April 04, 2026

Controlling multiple behaviors in large language models (LLMs) is challenging because different attributes can interfere with one another. Current linear steering methods assume that behaviors can simply be added together in the model's activation space, which often does not hold. They are also inefficient: each attribute requires its own dedicated tuning, which makes these methods difficult to manage and scale for complex, multi-attribute control.

K-Steering addresses these limitations. It trains a single non-linear classifier on the model's internal states and uses that classifier's gradients to compute intervention directions dynamically. This is more flexible than linear steering because it avoids the restrictive assumption of linearity and removes the need to store and tune a separate vector for each attribute. Multiple behaviors can be composed at inference time without any additional training, giving a unified and efficient way to control LLM outputs.
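The idea of gradient-based steering can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tiny two-layer classifier with hand-picked weights (W1, B1, W2, B2 are hypothetical stand-ins for a trained classifier), a three-dimensional "activation" h, and a simple objective that raises the logits of target attributes and lowers those of avoided ones. The function names, step size and number of steps are likewise illustrative choices.

```python
import math

# Hypothetical weights standing in for a classifier trained on activations.
# 3-dimensional activation -> 2 tanh hidden units -> 3 attribute logits.
W1 = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.6]]
B1 = [0.0, 0.1]
W2 = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]]
B2 = [0.0, 0.0, 0.0]


def hidden(h):
    """Non-linear hidden layer: tanh(W1 h + B1)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(W1, B1)]


def logits(h):
    """One logit per attribute."""
    t = hidden(h)
    return [sum(w * x for w, x in zip(row, t)) + b for row, b in zip(W2, B2)]


def objective(h, target, avoid):
    """Sum of target-attribute logits minus sum of avoided-attribute logits."""
    scores = logits(h)
    return sum(scores[k] for k in target) - sum(scores[k] for k in avoid)


def steering_direction(h, target, avoid):
    """Gradient of the objective with respect to the activation h."""
    t = hidden(h)
    # +1 for target attributes, -1 for avoided ones, 0 otherwise.
    c = [(k in target) - (k in avoid) for k in range(len(W2))]
    # Backpropagate through the output layer, the tanh, then the input layer.
    g_t = [sum(c[k] * W2[k][j] for k in range(len(W2))) for j in range(len(t))]
    g_z = [g * (1.0 - tj * tj) for g, tj in zip(g_t, t)]
    return [sum(g_z[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(h))]


def steer(h, target, avoid, alpha=0.1, steps=5):
    """Take a few gradient-ascent steps, recomputing the direction each time."""
    h = list(h)
    for _ in range(steps):
        g = steering_direction(h, target, avoid)
        h = [x + alpha * gi for x, gi in zip(h, g)]
    return h


if __name__ == "__main__":
    h0 = [0.2, -0.1, 0.4]
    # Two different attribute combinations, same classifier, no retraining.
    for target, avoid in [({0, 1}, {2}), ({2}, set())]:
        h1 = steer(h0, target, avoid)
        print(target, avoid, objective(h0, target, avoid),
              objective(h1, target, avoid))
```

In this sketch, choosing a different combination of attributes only changes the sets passed to `steer`; the classifier is reused as-is, and because the direction is recomputed from the current activation at every step, it is not restricted to a fixed sum of per-attribute vectors.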

Research submission here.