By Abir Harrasse, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks and Amir Abdullah
TinySQL: A progressive text-to-SQL dataset for mechanistic interpretability research
A new research project uses text-to-SQL generation to bridge the gap between interpreting simple AI models and interpreting complex ones. The task suits this goal because it combines real-world complexity with a clear, logical structure.
The researchers created the TinySQL dataset, which progresses from basic to advanced queries, and trained a range of models from 33M to 1B parameters on it. They then applied interpretability techniques to identify minimal circuits, meaning the smallest sets of neurons and connections that generate SQL.
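To make the idea of a "minimal circuit" concrete, here is a toy sketch of one common way to search for one: ablate components one at a time and keep only those whose removal hurts performance. This is not the authors' procedure. The component names, the contribution scores, the `score_fn` stand-in, and the greedy strategy are all hypothetical, chosen only to illustrate the concept.

```python
# Toy illustration of circuit discovery by greedy ablation.
# ASSUMPTION: all component names and contribution values are invented;
# a real study would measure task accuracy of an actual model instead.

# Hypothetical contribution of each component to SQL-generation performance.
CONTRIBUTIONS = {"L0H1": 0.60, "L1H0": 0.38, "L1H3": 0.01, "MLP2": 0.01}


def score_fn(active):
    """Stand-in for model performance when only `active` components run."""
    return sum(CONTRIBUTIONS[name] for name in active)


def find_minimal_circuit(components, score_fn, tolerance=0.05):
    """Greedily drop components whose removal costs at most `tolerance`."""
    baseline = score_fn(set(components))
    kept = set(components)
    for name in sorted(components):
        trial = kept - {name}
        # Keep the ablation only if performance stays close to the baseline.
        if baseline - score_fn(trial) <= tolerance:
            kept = trial
    return kept


circuit = find_minimal_circuit(CONTRIBUTIONS.keys(), score_fn)
print(sorted(circuit))
```

In this toy setup the two low-contribution components are pruned and the two high-contribution ones remain. A greedy search like this depends on the order in which components are tried, so it gives a small circuit rather than a guaranteed smallest one.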
The study compares these circuits across different SQL subskills and uses a layerwise logit lens to show how models compose queries step by step. The result is a framework for probing and comparing interpretability methods in a structured, progressively complex setting.
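For readers unfamiliar with the logit lens, the sketch below shows the core idea on toy numbers: take the intermediate state at each layer, project it onto the vocabulary with the unembedding weights, and see which token the model would predict at that depth. The three-token vocabulary, the identity unembedding matrix, and the per-layer states are made-up values, and the final layer normalization that real implementations usually apply is omitted for brevity.

```python
import math

# Toy logit lens. ASSUMPTION: vocabulary, weights, and residual states are
# invented for illustration and do not come from any TinySQL model.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(residual_by_layer, unembed, vocab):
    """Return (layer, top_token, probability) for each layer's state."""
    readout = []
    for layer, state in enumerate(residual_by_layer):
        # One logit per vocabulary token: dot product with that token's row.
        logits = [sum(h * w for h, w in zip(state, row)) for row in unembed]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=probs.__getitem__)
        readout.append((layer, vocab[best], probs[best]))
    return readout


vocab = ["SELECT", "FROM", "WHERE"]
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
residual_by_layer = [
    [0.2, 0.1, 0.0],  # early layer: weak preference
    [0.3, 0.9, 0.1],  # middle layer: prediction shifts
    [0.1, 2.5, 0.2],  # late layer: prediction sharpens
]

readout = logit_lens(residual_by_layer, unembed, vocab)
for layer, token, prob in readout:
    print(f"layer {layer}: {token} ({prob:.2f})")
```

Reading the output layer by layer shows the predicted token changing and then growing more confident with depth, which is the kind of step-by-step picture a layerwise logit lens is meant to give.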