An independent TransformerLens replication of Hanna, Liu, and Variengien's analysis of how GPT-2 Small performs a narrow "greater-than" behavior in year-completion prompts.
The canonical experiment in this repository is not general arithmetic. It asks whether GPT-2 assigns more probability to valid end years in prompts like:
The war lasted from the year 1732 to the year 17
For this prompt, completions 33 through 99 are valid under the task. The
main result is that patching identifies MLP layers 9 and 10 as the strongest
contributors, with attention layers around 7-9 routing year information into
that computation.
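As a worked illustration of the validity rule, the sketch below lists the two-digit completions that count as valid for a given start year. The helper name `valid_completions` is hypothetical and not part of this repository, and it assumes completions are the two-digit strings 00 through 99.

```python
def valid_completions(start_year: int) -> list[str]:
    """Return the two-digit completions strictly greater than the start year's last two digits."""
    yy = start_year % 100  # e.g. 1732 -> 32
    return [f"{y:02d}" for y in range(yy + 1, 100)]


completions = valid_completions(1732)
print(completions[0], completions[-1], len(completions))  # prints: 33 99 67
```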
git clone https://github.com/ashioyajotham/greater-than-circuit
cd greater-than-circuit
python -m venv venv
.\venv\Scripts\activate # Windows PowerShell
pip install -r requirements.txt
python run_hanna_analysis.py --n_examples 50 --device cpu
On first run, TransformerLens/Hugging Face may download GPT-2 Small. A CPU run is usable for smoke tests, but larger runs are much faster on CUDA.
For a quick smoke test:
python run_hanna_analysis.py --n_examples 1 --device cpu
Expected one-example smoke behavior:
Baseline Probability Difference: about 0.93
Top MLP layers: MLP9, MLP10, MLP8, MLP11
Top attention layers: L9, L7, L8
Primary reference:
Hanna, Liu, and Variengien (NeurIPS 2023), "How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model"
The original paper found that GPT-2 Small can perform a specific year-span completion task and that the behavior is concentrated in late MLPs, especially MLPs 9 and 10. This repo replicates that result with TransformerLens rather than the original rust-circuit stack.
This repository is best read as:
- a runnable replication of the Hanna-style year-completion experiment
- a small activation-patching codebase for exploring this circuit
- a starting point for further mechanistic interpretability experiments
It is not evidence that GPT-2 has general numerical reasoning. GPT-2 Small fails many arithmetic tasks, and this circuit does not imply robust less-than, subtraction, or symbolic comparison ability.
The main task uses year-span prompts:
Prompt: "The war lasted from the year 1732 to the year 17"
Target: More probability on tokens 33-99 than on tokens 01-32
Metric: PD = sum p(y > YY) - sum p(y <= YY)
The metric is Probability Difference, where YY is the two-digit start year.
GPT-2's tokenizer makes most two-digit years single tokens, which makes the
intervention setup relatively clean.
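The metric can be illustrated without a model. The sketch below is a standalone approximation, not the repository's `compute_probability_difference` (which takes logits and year token ids). It assumes a plain list of 100 probabilities in which index `i` holds the probability of the two-digit completion `i`; the function name `probability_difference` is hypothetical.

```python
def probability_difference(year_probs: list[float], start_yy: int) -> float:
    """PD = sum of p(y) for y > start_yy minus sum of p(y) for y <= start_yy."""
    if len(year_probs) != 100:
        raise ValueError("expected one probability per two-digit year (00-99)")
    if not 0 <= start_yy <= 99:
        raise ValueError("start_yy must be a two-digit year")
    above = sum(year_probs[start_yy + 1:])       # years strictly greater than YY
    at_or_below = sum(year_probs[:start_yy + 1])  # years up to and including YY
    return above - at_or_below


# All probability mass on "50" with start year 32: the model is fully correct.
peaked = [0.0] * 100
peaked[50] = 1.0
print(probability_difference(peaked, 32))  # prints: 1.0
```

Under this definition PD ranges from -1 (all mass on invalid years) to 1 (all mass on valid years), so a uniform distribution scores above zero whenever fewer than half the years are invalid.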
The corrupted baseline follows the paper's 01 dataset. Clean prompts vary the
start year, while corrupted prompts use 01 as the start year:
Clean: "The war lasted from the year 1732 to the year 17"
Corrupted: "The war lasted from the year 1701 to the year 17"
Because almost all years are greater than 01, this changes the model's
completion distribution in a controlled way.
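A minimal sketch of how such a clean/corrupted pair could be built from one template follows. The `TEMPLATE` string is taken from the example prompts above; the helper `make_prompt_pair` and its `noun` parameter are hypothetical and are not the repository's `YearPromptGenerator` API.

```python
TEMPLATE = "The {noun} lasted from the year {century}{yy:02d} to the year {century}"


def make_prompt_pair(start_year: int, noun: str = "war") -> tuple[str, str]:
    """Return (clean, corrupted) prompts that differ only in the start year's last two digits."""
    century, yy = divmod(start_year, 100)  # 1732 -> (17, 32)
    clean = TEMPLATE.format(noun=noun, century=century, yy=yy)
    # The corrupted prompt keeps the century but replaces the start year with 01.
    corrupted = TEMPLATE.format(noun=noun, century=century, yy=1)
    return clean, corrupted


clean, corrupted = make_prompt_pair(1732)
print(clean)
print(corrupted)
```

Keeping everything except the two start-year digits identical means the clean and corrupted prompts tokenize to the same length, which is what makes position-wise patching straightforward.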
For each candidate component, the script:
- Runs the clean prompt and caches activations.
- Runs the corrupted prompt.
- Patches one clean activation into the corrupted run.
- Measures how much the Probability Difference recovers.
High recovery means the patched component is causally involved in the behavior under this intervention.
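The four steps above can be sketched on a toy stand-in for the model. Everything here is illustrative: `COMPONENTS`, `run`, and `recovery` are hypothetical names, the scalar output stands in for the Probability Difference, and the normalization `(patched - corrupted) / (clean - corrupted)` is one common convention assumed for this sketch rather than confirmed as the script's exact formula.

```python
# Toy stand-in for a model: two parallel components write into one scalar output.
COMPONENTS = {
    "comp_a": lambda x: 3.0 * x,
    "comp_b": lambda x: 1.0 * x,
}


def run(x, patch=None):
    """Run the toy model; `patch` maps component names to replacement activations."""
    patch = patch or {}
    cache = {}
    for name, fn in COMPONENTS.items():
        # Use the patched activation if one is supplied, else compute it normally.
        cache[name] = patch[name] if name in patch else fn(x)
    return sum(cache.values()), cache


def recovery(clean_x, corrupted_x, name):
    """Fraction of the clean-vs-corrupted metric gap restored by patching one component."""
    clean_metric, clean_cache = run(clean_x)   # 1. clean run, cache activations
    corrupted_metric, _ = run(corrupted_x)     # 2. corrupted run
    patched_metric, _ = run(corrupted_x, {name: clean_cache[name]})  # 3. patch one activation
    # 4. normalize; assumes the clean and corrupted metrics differ.
    return (patched_metric - corrupted_metric) / (clean_metric - corrupted_metric)


scores = {name: recovery(1.0, 0.0, name) for name in COMPONENTS}
print(scores)  # prints: {'comp_a': 0.75, 'comp_b': 0.25}
```

In the real experiment the components are whole attention and MLP layer outputs, and the same loop runs once per layer to produce the ranking.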
The Hanna-style entry point reports the same qualitative structure as the paper:
| Aspect | Result in this implementation |
|---|---|
| MLPs (strongest layers) | MLP 9, MLP 10, MLP 8, MLP 11 |
| Attention (strongest layers) | Layers 7-9, especially layer 9 |
| Baseline task behavior | High positive Probability Difference |
The exact percentages vary with prompt sample, template choice, device, and implementation details. The important replication claim is qualitative: late MLPs, especially 9 and 10, dominate the recovered greater-than behavior.
greater-than-circuit/
|-- src/
| |-- __init__.py
| |-- model_setup.py # TransformerLens model loading
| |-- prompt_design.py # Exploratory True/False comparison prompts
| |-- prompt_design_hanna.py # Hanna-style year-completion prompts and PD metric
| |-- activation_patching.py # Core patching utilities
| |-- circuit_analysis.py # Component ranking and summaries
| |-- circuit_validation.py # Exploratory validation utilities
| `-- visualization.py # Plotting helpers
|-- tests/
| |-- test_model_setup.py
| |-- test_activation_patching.py
| `-- test_circuit_analysis.py
|-- notebooks/
| `-- quick_start_analysis.ipynb
|-- results/ # Generated outputs from exploratory runs
|-- main.py # Exploratory True/False comparison pipeline
|-- run_hanna_analysis.py # Canonical Hanna-style replication entry point
|-- requirements.txt
`-- pyproject.toml
Use run_hanna_analysis.py for the replication result. main.py is an older
exploratory pipeline for direct True/False number-comparison prompts; GPT-2
Small is weak on that task, so its results should not be used as the primary
replication claim.
import torch

from src.model_setup import ModelSetup
from src.prompt_design_hanna import (
    YearPromptGenerator,
    compute_probability_difference,
    get_year_token_ids,
)

# Load GPT-2 Small through the repository's model-loading helper.
setup = ModelSetup(device="cpu")
model = setup.load_model()
year_token_ids = get_year_token_ids(model)

# Build a small, reproducible set of year-span prompts.
generator = YearPromptGenerator(seed=42)
examples = generator.generate_balanced_year_dataset(n_examples=5, template_idx=0)

for example in examples:
    tokens = model.to_tokens(example.prompt_text)
    with torch.no_grad():
        logits = model(tokens)
    # Logits at the final position predict the two-digit end-year token.
    final_logits = logits[0, -1, :]
    pd = compute_probability_difference(
        final_logits,
        example.start_year,
        year_token_ids,
    )
    print(f"{example.prompt_text!r} -> PD={pd:.3f}")
The test suite uses mocks for most model-facing behavior, so it is much faster than the full activation-patching run:
python -m pytest
If the Windows pytest.exe launcher is broken, prefer:
venv\Scripts\python.exe -m pytest
- The code replicates the year-completion circuit structure, not general arithmetic.
- The generic True/False comparison pipeline in main.py is exploratory and can show weak baseline performance.
- The activation-patching script patches whole attention/MLP layer outputs; it is not a full path-patching reimplementation of every analysis in the paper.
- The mechanism inside MLPs 9 and 10 is not fully characterized here. The code identifies important components, but it does not explain exactly how those MLPs encode year order.
- Larger runs can be slow on CPU.
@inproceedings{hanna2023greater,
title={How does {GPT-2} compute greater-than?: Interpreting mathematical abilities in a pre-trained language model},
author={Hanna, Michael and Liu, Ollie and Variengien, Alexandre},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2023},
url={https://arxiv.org/abs/2305.00586}
}
Related work:
- Elhage et al. (2021), A Mathematical Framework for Transformer Circuits
- Wang et al. (2023), Interpretability in the Wild
- Conmy et al. (2023), Towards Automated Circuit Discovery for Mechanistic Interpretability
- Nanda (2022), TransformerLens
This project builds on the original greater-than circuit work by Michael Hanna, Ollie Liu, and Alexandre Variengien, and on TransformerLens and the broader mechanistic interpretability tooling ecosystem.
MIT License. See LICENSE.
@software{ashioya2025greaterthan,
title={Greater-Than Circuit in GPT-2 Small: A TransformerLens Replication},
author={Ashioya, Jotham Victor},
year={2025},
url={https://github.com/ashioyajotham/greater-than-circuit},
note={Independent replication of Hanna et al. (2023)}
}