methodology — resolve research

methodology

Pre-registered before the season opens, honestly evaluated when it closes.

Resolve Research builds projection models for sports and research tools for the broader world. This page is the apex framework — the rules that hold across every model we ship. Per-sport methodology lives on each sport's own subdomain (linked below).

framework

What "honest" means here

Every projection is published with its model spec, its training window, and its out-of-sample (OOS) calibration number — before the predictions are evaluated. We don't tune to live results, we don't quietly swap models mid-season, and when a model misses we say so in the next release notes.
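One way to make "frozen and timestamped" checkable is to hash the spec and projection set together at lock time. The sketch below is only an illustration under assumed names: the function names, field names, and sample values are all hypothetical and do not describe the actual release tooling.

```python
import hashlib
import json
from datetime import datetime, timezone


def freeze_projection_set(spec: dict, projections: list) -> dict:
    """Build a pre-registration record: canonical payload, SHA-256 digest, UTC timestamp."""
    # Canonical JSON (sorted keys, fixed separators) so identical content always hashes the same.
    payload = json.dumps(
        {"spec": spec, "projections": projections},
        sort_keys=True,
        separators=(",", ":"),
    )
    return {
        "payload": payload,
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "frozen_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unmodified(record: dict) -> bool:
    """Re-hash the stored payload and compare it with the digest recorded at freeze time."""
    digest = hashlib.sha256(record["payload"].encode("utf-8")).hexdigest()
    return digest == record["sha256"]


# Hypothetical spec and projection values, for illustration only.
spec = {"engine": "example-elo", "train_seasons": [2015, 2022], "oos_seasons": [2023, 2023]}
projections = [{"game_id": "G1", "p_home_win": 0.61}]
record = freeze_projection_set(spec, projections)
print(record["sha256"], record["frozen_at"], is_unmodified(record))
```

Publishing the digest alongside the timestamp lets anyone confirm later that the evaluated projections are the ones that were locked; any post-lock edit changes the digest and so has to show up in the release notes.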

The general loop:

Pre-registration: the model spec and projection set are frozen and timestamped before the relevant games are played. Edits after the lock are flagged in release notes, not silently shipped.
Out-of-sample testing: every engine reports a holdout-set calibration number (Brier score, log-loss, or per-domain equivalent). If we can't measure it on data the model didn't see, we don't claim it works.
Equity-first roll-out: when we ship a new sport, the women's league ships at the same fidelity as the men's. WNBA, NWSL, PWHL, LPGA, NCAA-W, WTA: same engine class as their counterparts, same release cadence.
Open release notes: every model version, what changed, and the date it shipped. No silent edits.
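For reference, the two holdout metrics named in the loop above can be computed as follows. This is a minimal sketch using the standard definitions for binary outcomes; the probabilities and results are made-up illustration values, not output from any engine.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under the forecast probabilities."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) can never occur
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


# Hypothetical holdout forecasts (probability the home side wins) and results (1 = home win).
holdout_probs = [0.7, 0.4, 0.9, 0.2]
holdout_outcomes = [1, 0, 1, 0]

print(round(brier_score(holdout_probs, holdout_outcomes), 4))  # 0.075
print(round(log_loss(holdout_probs, holdout_outcomes), 4))     # 0.299
```

Lower is better for both. A constant 0.5 forecast scores a Brier of 0.25 and a log-loss of about 0.693 (ln 2), which is a useful floor to compare any reported number against.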

data

Where our data comes from

All projection inputs are public: official league box scores, schedule feeds, Elo-rateable match results, and (for tennis) the public Wimbledon / ATP / WTA draws. We do not buy proprietary tracking data, and we do not scrape paywalled sources. The trade-off is an honest one: we lose access to some tracking-data signal, but we keep the model auditable end-to-end.

Training windows vary by sport and are published on each sport's spoke. For each engine we note the data origin, the season range used to fit, and the season range held out for OOS evaluation.
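The fit/holdout bookkeeping described above can be sketched as a split by season. The function name, the `season` field, and the sample years are hypothetical; the sketch also assumes the holdout seasons come strictly after the fit seasons, which the per-sport pages may or may not require.

```python
def split_by_season(games, fit_range, holdout_range):
    """Split game records into fit and holdout sets using inclusive season ranges."""
    fit_start, fit_end = fit_range
    hold_start, hold_end = holdout_range
    # Assumption: holdout seasons follow the fit seasons, so no fitted season leaks into OOS.
    if fit_end >= hold_start:
        raise ValueError("fit and holdout season ranges must not overlap")
    fit = [g for g in games if fit_start <= g["season"] <= fit_end]
    holdout = [g for g in games if hold_start <= g["season"] <= hold_end]
    return fit, holdout


# Hypothetical game records, for illustration only.
games = [
    {"season": 2019, "game_id": "A"},
    {"season": 2021, "game_id": "B"},
    {"season": 2022, "game_id": "C"},
    {"season": 2023, "game_id": "D"},
]
fit_games, holdout_games = split_by_season(games, (2019, 2022), (2023, 2023))
print(len(fit_games), len(holdout_games))  # 3 1
```

Splitting on whole seasons rather than on random games keeps every holdout game later than every fitted game, so the reported OOS number reflects forecasting forward in time.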

bias control

How we avoid fitting to noise

Sports prediction is a high-variance, low-sample problem. The three guardrails we lean on:

Separate metric SE from prediction SE: the standard error (SE) and confidence interval on an input metric are not the same as those on the prediction it feeds. We label both and don't conflate them.
Per-archetype aging curves: players age differently by role. Slot-based aging beats one-size-fits-all curves; we publish the per-archetype splits.
Shrinkage for small samples: first-year college transitions, new-surface tennis appearances, and rookie-NBA priors all use shrinkage toward the relevant population mean. The shrinkage strength is a named lever in the spec, not a hidden constant.
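To illustrate the shrinkage guardrail, here is one common form: a weighted average whose weight on the player's own data grows with sample size. This is a sketch of a generic technique, not necessarily the estimator any engine uses; the function name, the `shrinkage_strength` parameter, and all numbers are hypothetical.

```python
def shrink_toward_mean(observed_mean, n_obs, population_mean, shrinkage_strength):
    """Blend an observed rate with the population mean.

    shrinkage_strength acts like a count of pseudo-observations at the population
    mean: the larger it is, the more data a player needs before their own rate dominates.
    """
    if n_obs < 0 or shrinkage_strength <= 0:
        raise ValueError("n_obs must be >= 0 and shrinkage_strength must be > 0")
    weight = n_obs / (n_obs + shrinkage_strength)
    return weight * observed_mean + (1 - weight) * population_mean


# Hypothetical rookie: 0.45 observed rate against a 0.35 population mean, strength 60.
small_sample = shrink_toward_mean(0.45, n_obs=20, population_mean=0.35, shrinkage_strength=60)
large_sample = shrink_toward_mean(0.45, n_obs=600, population_mean=0.35, shrinkage_strength=60)
print(round(small_sample, 4))  # 0.375
print(round(large_sample, 4))  # 0.4409
```

With 20 observations the estimate sits close to the population mean; with 600 it sits close to the observed rate. Exposing the strength as an explicit argument mirrors the idea of a named lever rather than a hidden constant.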

per-sport methodology

Sport-specific spokes

Each subdomain has its own methodology page detailing the engine, data sources, training window, and OOS calibration for that sport.

NBA (nba), WNBA (wnba), NFL (nfl), NHL (nhl), MLB (mlb), PGA / LPGA (pga), Futbol (futbol), Tennis (tennis).

reproducibility

Code, data, write-ups

Model code lives on GitHub. Long-form research write-ups (substantive findings, methodology deep-dives, failure post-mortems) are on Substack. Per-sport release notes are linked from each sport's subdomain home.

honesty notes

What we don't claim. We do not have access to private tracking data, biomechanical signals, or insider injury reports. The model knows what the league publishes, no more. When a player's status changes between projection lock and tip-off, the projection does not chase that change in real time — what's pre-registered is what we evaluate against.

Where the limits are. Single-game predictions in a 17-game season (NFL) have wide confidence intervals by construction. Tennis draws are dominated by 5-7 players in any era. Soccer Elo is calibrated on club football and ported to international matches with shrinkage; that port is a known weak point. We label these limits on each page, not just here.
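A rough sense of why short seasons give wide intervals comes from the binomial standard error of a win rate. The sketch below is a back-of-the-envelope illustration, not the interval method used by any engine; the function name is hypothetical, and the season lengths (17, 82, and 162 games) are used only as examples of short and long schedules.

```python
import math


def win_rate_standard_error(p, n_games):
    """Binomial standard error of an observed win rate over n_games independent games."""
    return math.sqrt(p * (1 - p) / n_games)


# Standard error for a true 50% team at a few illustrative season lengths.
for n_games in (17, 82, 162):
    se = win_rate_standard_error(0.5, n_games)
    print(n_games, round(se, 3))
# 17 0.121
# 82 0.055
# 162 0.039
```

At 17 games the standard error is roughly three times what it is at 162, so the same model evaluated over a short season will show much wider uncertainty bands.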