Distribution-aware feature selection for SAEs

Amirali Abdullah, Narmeen Oozeer, Nirmalendu Prakash, Michael Lan, and Alice Rigg

Published: August 29, 2025

TopK Sparse Autoencoders (SAEs) break down neural activations into understandable features, but they are inefficient because they reconstruct every token from the same fixed number of most active features. A variant, BatchTopK, improves on this by selecting the most active features across an entire batch of tokens rather than token by token. However, this can lead to an "activation lottery," where a few very high-magnitude features dominate the selection, potentially at the expense of lower-magnitude features that are more informative.
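The difference between the two selection rules, and the "activation lottery" it can produce, can be illustrated with a minimal sketch. The function names and toy activations below are hypothetical, and the sketch assumes the usual BatchTopK formulation in which a batch of n tokens shares a budget of k * n active entries; it covers only the selection step, not a full SAE.

```python
def topk_mask(acts, k):
    """Per-token TopK: keep the k largest activations in every row (token)."""
    mask = []
    for row in acts:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask


def batch_topk_mask(acts, k):
    """BatchTopK: keep the k * n_tokens largest activations in the whole batch.

    Assumption: the batch shares one budget of k * n_tokens entries.
    """
    n_tokens = len(acts)
    flat = [(value, i, j) for i, row in enumerate(acts) for j, value in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    mask = [[False] * len(row) for row in acts]
    for _, i, j in flat[: k * n_tokens]:
        mask[i][j] = True
    return mask


# Toy batch: 3 tokens x 4 features (made-up activation values).
acts = [
    [9.0, 8.0, 0.1, 0.2],  # token 0: two very large activations
    [0.3, 0.2, 0.1, 0.4],  # token 1: uniformly small activations
    [7.0, 0.1, 0.5, 6.0],  # token 2
]

per_token = topk_mask(acts, k=1)
per_batch = batch_topk_mask(acts, k=1)

# Number of active features each token receives under each rule.
active_per_token = [sum(row) for row in per_token]
active_per_batch = [sum(row) for row in per_batch]
print(active_per_token)
print(active_per_batch)
```

In this toy batch, per-token TopK gives every token exactly one feature, whereas BatchTopK spends its whole budget on the high-magnitude entries of tokens 0 and 2 and leaves token 1 with none.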

We have developed a new approach called Sampled-SAE. It first scores every candidate feature across a batch of data, then builds a smaller, curated "candidate pool" from the highest-scoring features and restricts selection to that pool. The size of the pool is controlled by a new parameter, l. By adjusting l, researchers can trade off globally important features against more specific, rare ones: a small l forces the model to use only the most globally consistent features, while a large l admits a wider variety of fine-grained features. This turns BatchTopK into a flexible, tunable approach that can be adjusted to the trade-offs a given task requires, such as prioritizing a model's performance over its interpretability.
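The score, pool, and select steps described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name is hypothetical, and both the scoring rule (L2 norm of each feature's activations over the batch) and the pool size (k * l features) are assumptions made for the example, since the summary does not specify them.

```python
import math


def sampled_batch_topk_mask(acts, k, l):
    """Pool-restricted BatchTopK selection (illustrative sketch only).

    Assumptions: features are scored by the L2 norm of their column,
    and the candidate pool holds the k * l highest-scoring features.
    """
    n_tokens, n_features = len(acts), len(acts[0])

    # Step 1: score every feature across the batch.
    scores = [
        math.sqrt(sum(acts[i][j] ** 2 for i in range(n_tokens)))
        for j in range(n_features)
    ]

    # Step 2: keep the highest-scoring features as the candidate pool.
    pool_size = min(n_features, k * l)
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    pool = ranked[:pool_size]

    # Step 3: batch-level top-(k * n_tokens) selection, restricted to the pool.
    candidates = [(acts[i][j], i, j) for i in range(n_tokens) for j in pool]
    candidates.sort(key=lambda t: t[0], reverse=True)
    mask = [[False] * n_features for _ in range(n_tokens)]
    for _, i, j in candidates[: k * n_tokens]:
        mask[i][j] = True
    return mask, sorted(pool)


# Toy batch: 3 tokens x 4 features (made-up activation values).
acts = [
    [9.0, 8.0, 0.1, 0.2],
    [0.3, 0.2, 0.1, 0.4],
    [7.0, 0.1, 0.5, 6.0],
]

mask_small, pool_small = sampled_batch_topk_mask(acts, k=1, l=1)
mask_large, pool_large = sampled_batch_topk_mask(acts, k=1, l=4)
print(pool_small, [sum(row) for row in mask_small])
print(pool_large, [sum(row) for row in mask_large])
```

With l = 1 in this toy batch, the pool contains only the single highest-scoring feature, so every token is reconstructed from that one globally dominant feature. With l = 4 the pool covers all four features, and the selection matches unrestricted batch-level top-k.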

Research submission here.