The NBA's play-by-play API puts the calling official's name in the description field of
every foul. It's an unstructured string that programmatic consumers skip entirely. I parsed it across
13,278 games. The result is the first whistle-by-whistle attribution of shooting fouls to individual
NBA officials across full games. It was hiding in plain sight.
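As a rough illustration of what that parsing can look like, here is a minimal sketch. It assumes a description string shaped like "Horford S.FOUL (P1.T1) (J.Tiven)", where the last parenthesized group is the calling official and the group before it is a foul counter. That shape and the helper name shooting_foul_official are assumptions made for the example; this is not the ref-ball pipeline's actual parser.

```python
import re
from typing import Optional

# Assumed description shape (illustrative, not confirmed by the pipeline):
#   "<player> S.FOUL (P1.T1) (J.Tiven)"
# The final parenthesized group is taken to be the calling official.
TRAILING_PAREN = re.compile(r"\(([^()]+)\)\s*$")
# Foul counters such as "P1.T1" or "P3.PN" are not official names.
FOUL_COUNTER = re.compile(r"P\d+\.(T\d+|PN)")


def shooting_foul_official(description: str) -> Optional[str]:
    """Return the official's name for a shooting foul, or None if absent."""
    if not description or "S.FOUL" not in description.split():
        return None  # not a shooting foul under the assumed token format
    match = TRAILING_PAREN.search(description)
    if match is None:
        return None  # no trailing parenthesized group at all
    name = match.group(1).strip()
    if FOUL_COUNTER.fullmatch(name):
        return None  # only the counter is present, no official recorded
    return name


if __name__ == "__main__":
    print(shooting_foul_official("Horford S.FOUL (P1.T1) (J.Tiven)"))
```

Returning None instead of raising keeps rows with a missing or malformed name countable, which matters when the goal is coverage across thousands of games.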
The first question I tried to answer was whether refs who travel more and sleep less have higher error rates. They don't. So I asked a different one: what situations produce the errors that do happen? That became the Attention Load.
The Harden study is more personal. The 2023 Sixers-Celtics series left a permanent mark. Harden put the team on his back and willed it to two wins in Games 1 and 4, then completely shit the bed in Games 6 and 7. How does that happen? Does it happen to anyone else? After ruling out the obvious answers, one thing holds: a player is more likely to have a terrible playoff game if he loses his free throw attempts. What causes the FTA loss (defense, refereeing, or something else), I don't know yet. I'm working on it.
I'm including the stuff that didn't work too. That's part of the answer. The trigger taxonomy doesn't replicate. The box-score architecture model failed (R² = 0.128). The timing axis for foul classification was killed by the Giannis counterexample.
Dataset: CC-BY-4.0. Code: MIT. Officials are named by design. Anonymizing them would kill the point of the dataset.
The ref-ball pipeline runs from the repo Makefile. This site is generated by
site/scripts/build_site_data.py. The frontend never reads parquet.