Methodology · ref-ball

Layer 1 — what's done

Per-official shooting-foul rates and player×official interaction profiles across 13,278 games. The player-pair ANOVA comes back at p = 0.000003. Individual officials produce significantly different FTA outcomes for the same players.
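
As a rough illustration of the kind of test behind that p-value, here is a minimal sketch of a one-way ANOVA for a single player's FTA grouped by official. It is a simplification of the player-pair design described above, the function name and the sample numbers are hypothetical, and it reports only the F statistic and η² (no p-value, which would need an F distribution).

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a dict mapping group label -> list of values.

    Hypothetical helper: each group is one official, each value is one
    player's FTA in a game that official worked.
    """
    all_values = [v for vals in groups.values() for v in vals]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    # Variation explained by which official was on the crew.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    # Total variation around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_within = ss_total - ss_between

    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_squared = ss_between / ss_total
    return f_stat, eta_squared


# Made-up FTA counts for one player under three officials.
sample = {
    "official_a": [8, 9, 10],
    "official_b": [5, 6, 7],
    "official_c": [8, 8, 8],
}
f_stat, eta_sq = one_way_anova(sample)
print(f"F = {f_stat:.2f}, eta^2 = {eta_sq:.2f}")
```

η² is the share of total variance attributable to the grouping, which is the same effect-size measure quoted in the foul-type section.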

Defense adjustment

Raw FTA deltas are confounded by opponent defensive quality. If Harden plays the Grizzlies (elite defense) when Ransom is on the crew, is the FTA drop Ransom or Jaren Jackson Jr.? Defense-adjusted FTA/36 deltas control for opponent DefRtg when comparing games with vs. without each official. The adjustment barely moves the results (raw and adjusted deltas correlate at r = 0.98), but I apply it anyway so that opponent strength is ruled out as an explanation.
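
One way such an adjustment could work is sketched below: regress FTA/36 on opponent DefRtg, then compare the residuals in games with and without a given official. This is an assumed approach for illustration, not necessarily the exact adjustment used here, and the function names and tuple layout are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def adjusted_delta(games):
    """Defense-adjusted FTA/36 delta for one player and one official.

    games: list of (fta_per36, opp_def_rtg, official_present) tuples.
    Returns mean residual with the official minus mean residual without.
    """
    def_rtg = [g[1] for g in games]
    fta = [g[0] for g in games]
    intercept, slope = ols_fit(def_rtg, fta)

    # Residual = FTA/36 left over after accounting for opponent defense.
    residuals = [y - (intercept + slope * x) for x, y in zip(def_rtg, fta)]
    with_ref = [r for r, g in zip(residuals, games) if g[2]]
    without_ref = [r for r, g in zip(residuals, games) if not g[2]]
    return sum(with_ref) / len(with_ref) - sum(without_ref) / len(without_ref)


# Made-up games: (FTA/36, opponent DefRtg, official on crew?)
games = [
    (7.5, 105, False),
    (6.5, 105, True),
    (8.5, 115, False),
    (7.5, 115, True),
]
print(adjusted_delta(games))
```

A caveat on this design: if an official's assignments are strongly correlated with opponent quality, a single pooled regression absorbs part of the official effect into the slope, so a joint model with both terms would be the more careful choice.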

Foul-type specificity

Shooting fouls and non-shooting fouls are different skills. Per-official shooting-foul (SF) rate and non-shooting-foul (NSF) rate are not significantly correlated (r = 0.152, p = 0.15). After controlling for overall foul volume, the shooting-foul ANOVA effect survives intact (η² = 0.032 adjusted vs. 0.031 raw). This isn't just some refs calling more fouls overall. The effect is foul-type-specific, not a volume artifact. Personal fouls show the largest effect (η² = 0.064). Personal Take and Offensive Charge show no significant variance across officials.
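
The SF-vs-NSF check comes down to a Pearson correlation across officials. A minimal sketch, with a hypothetical function name and made-up per-official rates, looks like this:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up per-official rates (fouls per game), one entry per official.
sf_rate = [9.1, 10.4, 8.7, 11.2, 9.8]
nsf_rate = [12.0, 11.5, 12.3, 11.9, 12.8]
print(round(pearson_r(sf_rate, nsf_rate), 3))
```

An r near zero means an official who calls many shooting fouls is not predictably high or low on non-shooting fouls, which is what separates a foul-type effect from a general whistle-volume effect.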

Layer 2 — what's next

I hand-graded 300 shooting-foul clips to build a landing-foul classifier. LLM grading topped out at 55% precision. Fine-tuned VideoMAE got to 75% precision / 83% recall. Not enough to clear the gate. Once the classifier works, it unlocks per-official landing-foul rates, which would explain the why behind the suppressor and amplifier profiles. That's Paper 2.
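
For reference, precision and recall are computed from the classifier's confusion counts as in the sketch below. The counts are illustrative only, chosen to give ratios close to the ones quoted above; they are not the actual confusion matrix from the graded clips.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts.

    precision: of clips flagged as landing fouls, the share that really are.
    recall: of real landing fouls, the share the classifier flagged.
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Illustrative counts, not the real evaluation data.
precision, recall = precision_recall(true_pos=15, false_pos=5, false_neg=3)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision matters most for per-official rates: every false positive gets attributed to some official's profile, so a low-precision classifier would add noise to exactly the quantity being measured.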
