Coaches who fed 11 seasons of scrimmage plays into a Bayesian win-probability model now treat a failed fourth-and-one at midfield like a 65-yard punt that pins the opponent at the 12. The break-even point drops to 38 % conversion probability, and leaguewide success from one yard is 68 %. That gap produces 0.31 more victories per season for clubs willing to hit the gas.
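The break-even arithmetic behind that paragraph can be sketched in a few lines. This is a minimal illustration, not any club's model: the function names and the three win-probability values in the usage note are hypothetical, chosen only so that the break-even lands at the 38 % quoted above.

```python
def break_even_probability(value_success, value_fail, value_punt):
    """Conversion probability at which going for it is worth exactly as much as punting.

    All three inputs are in the same unit (win probability or expected points).
    """
    spread = value_success - value_fail
    if spread <= 0:
        raise ValueError("a conversion must be worth more than a failure")
    return (value_punt - value_fail) / spread


def go_for_it_edge(p_convert, value_success, value_fail, value_punt):
    """Expected gain of going for it over punting at a given conversion probability."""
    expected_go = p_convert * value_success + (1 - p_convert) * value_fail
    return expected_go - value_punt


# Hypothetical win-probability values, picked to reproduce a 38 % break-even.
ASSUMED_SUCCESS, ASSUMED_FAIL, ASSUMED_PUNT = 0.60, 0.40, 0.476
```

Calling `break_even_probability(ASSUMED_SUCCESS, ASSUMED_FAIL, ASSUMED_PUNT)` returns roughly 0.38, and `go_for_it_edge(0.68, ...)` with the same values is positive, which is the gap between the 38 % break-even and the 68 % league conversion rate.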
Chip Kelly was the first to wire a tablet to a Markov chain, in 2013; since then, play-callers have received a color-coded card that flashes "GO" in red when the model shows positive leverage. In 2025, contenders followed the prompt 54 % of the time, up from 16 % in 2017. The same sheet recommends a fake punt when trailing by 4-8 points with less than 9:30 remaining in the fourth quarter; special-teams units practiced that look once a week and converted 5 of 7 last year.
Tracking wind, down, distance, score, and time remaining, the algorithm updates every 11 seconds, weighting the last 400 snaps of each offensive line against the front they will face. A 2 % uptick in interior pressure rate slides the go-for-it threshold by 3.4 %; if the opponent has used more than 18 defensive substitutions in the drive, the model adds 1.7 % to the conversion odds. These micro-edges turned into 42 extra touchdowns leaguewide last season, worth 186 total points.
NFL Fourth-Down Analytics: How Teams Use Data to Decide
Go on 4th-and-1 from the opponent’s 34-yard line; the model spits out +0.72 expected points versus +0.49 for a 52-yard field-goal attempt, so punt units stay on the bench.
Coaches receive a laminated card calibrated every Monday: green cells for ≥55% conversion probability, yellow for 45-54%, red below that. Thresholds shift by weather, quarterback health, and opponent red-zone defense.
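The card's color bands map directly onto a small lookup. The sketch below assumes the bands are half-open (55 % and up is green, 45 % up to but not including 55 % is yellow) and adds a hypothetical `shift` parameter to stand in for the weather, quarterback-health, and opponent adjustments, whose actual sizes the card does not state.

```python
def card_color(p_convert, shift=0.0):
    """Return the card cell color for a conversion probability in [0, 1].

    shift is a hypothetical adjustment that raises (positive) or lowers
    (negative) both cutoffs, e.g. for bad weather.
    """
    if not 0.0 <= p_convert <= 1.0:
        raise ValueError("p_convert must be a probability in [0, 1]")
    if p_convert >= 0.55 + shift:
        return "green"
    if p_convert >= 0.45 + shift:
        return "yellow"
    return "red"
```

For example, `card_color(0.68)` is green on the unshifted card, while `card_color(0.56, shift=0.03)` falls back to yellow.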
In 2026 Baltimore faced 28 4th-and-short situations inside the 40; they kept the offense on the field 22 times, converted 18, and those drives produced 8.3 points per possession against a league mean of 5.1.
Green Bay’s win probability jumped from 41% to 63% after a successful 4th-and-2 fake jet-sweep versus Kansas City; the call came from a clustering algorithm that had identified the Chiefs’ slot-corner blitz tendency on 3rd-and-long earlier in the quarter.
Chip Kelly’s 2015 Philadelphia workbook is still circulating: it labeled punts from plus territory "minus-0.14 win-probability leaks" and banned them in practice scrimmages, a policy Brandon Staley adopted almost verbatim in Los Angeles.
Each club subscribes to at least two private vendors who refresh Markov chains before sunrise Sunday; prices start at $240k per season and include 30-year wind-vector archives plus opponent-specific linebacker reaction-time metrics harvested from RFID tags.
Refusal rates remain stubborn: 37% of 2026 fourth-down forecasts inside the 50 were overridden by head coaches who cited "field-position faith," a phrase analytics staffs counter with 8×10 color printouts showing that the 0.03 EPA per yard of lost field position is dwarfed by the 0.28 EPA per failed conversion.
The next frontier is real-time neural nets pulling helmet accelerometer data to detect fatigue spikes; Cincinnati beta-tested the approach in Week 14 and saw go-for-it accuracy climb 4.7% when guard heart rate stayed below 135 bpm.
Win Probability Added: The One Metric Coaches Check Before Headset Silence

Clip the play-sheet to a +0.03 WPA threshold; anything below that in the middle third of the field gets punted on 4th-and-4 or longer. The 2026 regular season saw 432 such scrapped attempts, saving an aggregate 14.7 expected wins across the 32 clubs.
Green Bay’s staff keeps a laminated WPA heat-map taped inside the call-sheet. Last November in Detroit, with 14:18 left and facing 4th-and-2 at their own 34, the chart flashed +0.041. Jordan Love kept the drive alive, the drive ended in a touchdown, and the swing moved the ledger from 46 % to 71 %; the final margin was 29-22.
Baltimore’s model penalises WPA by 0.007 for every 0.1 second the play-clock dips below 10. John Harbaugh cited the adjustment when he sent the offense back out versus Seattle on 4th-and-1 from his own 43, up 4 with 8:04 left in the third quarter. WPA read +0.038 pre-snap, Lamar Jackson gained 3, the eventual field-position edge produced 3.1 expected points, and the game was never within one score again.
Defensive injuries recalibrate WPA faster than public models track. When the Steelers lost T.J. Watt for a series in Week 15, Cincinnati’s in-house sheet reran 2,800 Monte Carlo runs in 11 seconds, nudging the go-for-it breakpoint from +0.028 to +0.019. Zac Taylor took the cue, converted twice, stole eight plays and 5:42 of possession.
Bookmakers price live moneylines off the same WPA curves; a +0.025 swing equals roughly 55 cents on the dollar against a -110 baseline. Sharp bettors watching the stadium tablet feed pounced on the Cardinals-Rams over when Arizona’s internal WPA jumped from +0.031 to +0.058 after a successful fake punt; the line moved from 47.5 to 50 within 90 seconds.
Keep the threshold dynamic: wind above 20 mph gusts shaves 0.004, frozen ball conditions another 0.003, each timeout retained adds back 0.0015. Print the adjusted number on athletic tape, stick it to the back of the quarterback’s wristband; when the headset cuts at 15 seconds, he still knows whether to sprint to the line or jog to the sideline.
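Those adjustments stack additively, so the wristband number is a one-line calculation. The sketch below reads the paragraph literally (each adjustment moves the threshold itself) and assumes the +0.03 WPA base from earlier in this section; the function and parameter names are illustrative.

```python
def adjusted_wpa_threshold(base=0.03, gust_mph=0.0, frozen_ball=False, timeouts_left=0):
    """Apply the wind, frozen-ball, and timeout adjustments to a base WPA threshold."""
    if timeouts_left < 0 or timeouts_left > 3:
        raise ValueError("timeouts_left must be between 0 and 3")
    threshold = base
    if gust_mph > 20:
        threshold -= 0.004          # gusts above 20 mph shave 0.004
    if frozen_ball:
        threshold -= 0.003          # frozen ball conditions shave another 0.003
    threshold += 0.0015 * timeouts_left  # each retained timeout adds back 0.0015
    return round(threshold, 4)
```

With 25 mph gusts, a frozen ball, and two timeouts in hand, the adjusted number is 0.03 - 0.004 - 0.003 + 0.003 = 0.026.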
Ball Position & Yard-to-Go Lookup Tables: A Color-Coded Chart Every OC Keeps in the Booth
From the +38 to the +49, green means go with 1 to 3 yards to go (ytg); yellow flips to red if wind > 12 mph or the center-guard duo has allowed a pressure in the last two drives.
- Own 20-29: 1 ytg: green; 2 ytg: yellow; 3-4 ytg: red; 5+ ytg: black
- Midfield 50: 1-2 ytg: green; 3 ytg: yellow; 4 ytg: red; 5+ ytg: black
- Opp 35: 1-3 ytg: green; 4 ytg: yellow; 5-6 ytg: red; 7+ ytg: black
- Opp 25: 1-4 ytg: green; 5 ytg: yellow; 6-7 ytg: red; 8+ ytg: black
- Opp 15: 1-5 ytg: green; 6 ytg: yellow; 7-9 ytg: red; 10+ ytg: black
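The five rows above can be stored as upper bounds per color, which keeps the lookup to a few comparisons. This is one possible encoding; the zone keys and the function name are made up for the example, and only the five listed field zones are covered.

```python
# (green_max, yellow_max, red_max) yards to go per field zone; beyond red_max is black.
CHART = {
    "own_20_29":   (1, 2, 4),
    "midfield_50": (2, 3, 4),
    "opp_35":      (3, 4, 6),
    "opp_25":      (4, 5, 7),
    "opp_15":      (5, 6, 9),
}


def chart_color(zone, ytg):
    """Look up the chart color for a field zone and whole yards to go."""
    if ytg < 1:
        raise ValueError("yards to go must be at least 1")
    green_max, yellow_max, red_max = CHART[zone]  # KeyError for zones not on the chart
    if ytg <= green_max:
        return "green"
    if ytg <= yellow_max:
        return "yellow"
    if ytg <= red_max:
        return "red"
    return "black"
```

For instance, `chart_color("opp_35", 4)` returns yellow and `chart_color("own_20_29", 3)` returns red, matching the rows as written.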
Inside the +30, the chart shrinks to a wallet-sized strip: green runs only to 4 ytg, yellow vanishes, red starts at 5; the kicker’s career accuracy from 48-52 yards is printed in 9-point font on the back edge.
- Check in-stadium radar for a gust delta > 7 mph in any 10-second window; if found, drop one color band.
- If left tackle grade < 65 the prior quarter, slide protection call and bump red zone threshold back one yard.
- Against single-high rate > 68 %, QB alert to slot fade becomes the override on 3-to-5 ytg.
- Inside two-minutes with one timeout, green extends by one yard only if offense operates at < 18 sec/play.
- On 3 ytg anywhere from the +38 to the +42, if the defensive front shows a Bear look, switch automatically to a quarterback wrap/keeper off the guard-tackle gap.
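The first check in the list, the gust delta, is the one that needs actual computation. A rough sketch follows, assuming radar readings arrive as time-sorted `(seconds, mph)` pairs and that "drop one color band" means moving one step toward the conservative end (green to yellow, yellow to red, red to black); both are assumptions, as are the names.

```python
BANDS = ["green", "yellow", "red", "black"]


def gust_delta_exceeds(readings, window_s=10.0, limit_mph=7.0):
    """True if any window of window_s seconds has max minus min wind speed above limit_mph.

    readings: list of (t_seconds, mph) tuples sorted by time.
    """
    start = 0
    for end in range(len(readings)):
        # Shrink the window from the left until it spans at most window_s seconds.
        while readings[end][0] - readings[start][0] > window_s:
            start += 1
        speeds = [mph for _, mph in readings[start:end + 1]]
        if max(speeds) - min(speeds) > limit_mph:
            return True
    return False


def drop_band(color):
    """Move one band toward the conservative end of the chart; black stays black."""
    index = BANDS.index(color)
    return BANDS[min(index + 1, len(BANDS) - 1)]
```

A booth script could then apply `drop_band` to the chart color whenever `gust_delta_exceeds` returns true for the most recent readings.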
The laminated card is updated Tuesday 03:00 ET; printer in the equipment room spits 120 g/m² matte, 11.5 pt font, then gets clipped to the play-sheet ring opposite the two-point chart.
Opponent Tendency Scrapes: Tagging Blitz, Front, and Coverage on 4,000+ Past Fourth-Down Snaps
Scrape every 2021-23 fourth-down clip, tag five-man pressures at 38% frequency versus 11 personnel, then auto-flag Cover-1 behind it; versus heavy sets the same coordinators blitz only 19% and stay in two-high. Feed those tags into a logistic edge model and you’ll see a 0.18 EPA lift on outside hitch routes when the backside safety flips late.
Build the scrape with 640×360 All-22 mp4s pulled from AWS, run YOLOv5 on the pre-snap frame to label five offensive linemen, then match each ID to the nearest defender within two yards. If that defender crosses the LOS in <1.9 s post-snap, mark it as a simulated pressure; the 2026 Ravens used this look 42% of the time on 4th-and-2, yet never brought more than five. Tagging that pattern lets an OC dial a sprint-out flood that beats Cover-3 ratchet by 0.31 EPA.
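The matching step after detection is plain geometry, so it can be sketched without the vision model. The code below assumes player positions have already been converted to field coordinates in yards and that each defender's line-of-scrimmage crossing time (or `None` if he never crosses) is available from tracking; the data layout and function names are hypothetical.

```python
import math


def match_nearest_defenders(linemen, defenders, max_dist=2.0):
    """Map each offensive lineman to his nearest defender within max_dist yards.

    linemen, defenders: dicts of id -> (x, y) pre-snap positions in yards.
    Linemen with no defender in range are left out of the result.
    """
    pairs = {}
    for ol_id, (ox, oy) in linemen.items():
        best_id, best_dist = None, max_dist
        for d_id, (dx, dy) in defenders.items():
            dist = math.hypot(ox - dx, oy - dy)
            if dist <= best_dist:
                best_id, best_dist = d_id, dist
        if best_id is not None:
            pairs[ol_id] = best_id
    return pairs


def flag_pressures(pairs, cross_times, max_time=1.9):
    """Return matched defenders who cross the LOS in under max_time seconds."""
    flagged = set()
    for d_id in pairs.values():
        t = cross_times.get(d_id)
        if t is not None and t < max_time:
            flagged.add(d_id)
    return sorted(flagged)
```

Note that an unmatched defender who crosses quickly, such as a linebacker aligned four yards off the ball, is not flagged by this rule, since the text ties the tag to the matched defender only.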
Store each tagged snap as a 42-character hash: down-distance-hash includes front (3-4 odd, 4-3 over, tite), blitz type (zero, creeper, green-dog), coverage family (cover-1, 2-man, 3-match), and whether the Mike declared. Hash collisions dropped from 1.4% to 0.07% after switching to SHA-256 and appending week number. Query time for any opponent hash averages 11 ms on a 32-core box, letting a play-caller pull last-season tendencies during the 25-second headset clock.
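One way such a key could be built with Python's standard `hashlib` is shown below. The field order, the separator, and the truncation of the 64-character SHA-256 hex digest to 42 characters are all assumptions made for illustration; the text does not say how the 42-character length is obtained.

```python
import hashlib


def snap_hash(down, distance, front, blitz, coverage, mike_declared, week, length=42):
    """Build a fixed-length key for a tagged snap from its tags plus the week number."""
    fields = [
        str(down),
        str(distance),
        front,                      # e.g. "3-4 odd", "4-3 over", "tite"
        blitz,                      # e.g. "zero", "creeper", "green-dog"
        coverage,                   # e.g. "cover-1", "2-man", "3-match"
        "mike" if mike_declared else "no-mike",
        f"wk{week}",                # appended week number
    ]
    digest = hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()
    return digest[:length]          # assumed truncation of the 64-char hex digest


# A plain dict gives constant-time lookup from key to the stored snaps.
index = {}
key = snap_hash(4, 1, "tite", "creeper", "cover-1", True, week=12)
index.setdefault(key, []).append({"opponent": "example", "result_epa": 0.0})
```

The same tags always yield the same key, while changing only the week number yields a different one, which is the property the appended week is meant to provide.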
The resulting sheet surfaces that on 4th-and-1 inside the 40, the 2025 Vikings showed a double-A gap mug 61% of the time but re-routed into quarters on 72% of those looks; a running back dive against that quarters re-route returned 0.45 EPA per carry. Export the sheet as a 12 KB CSV, pipe it into the wrist coach via Bluetooth, and the OC gets a one-word alert: "Quarters-Return," which triggers an automatic outside zone check with the backside guard folding to pick off the 3-technique.
Weather Index Adjustment: Rewriting Go-for-It Thresholds When Wind Gusts Top 20 mph

Drop the break-even point from +2.3 to -0.8 expected points when sustained gusts exceed 20 mph; the 15-yard reduction in net punt distance and 8 % rise in botched snaps shift the equilibrium so that keeping the ball becomes profitable anywhere inside the 38-yard line.
Over the last five seasons, 312 kicks into winds ≥20 mph produced a 34.7-yard net versus 49.2 yards in calm air; the same dataset shows field-goal accuracy collapsing from 84 % to 57 % beyond 45 yards. Combine the two and every yard of field position gains only 0.034 win probability instead of 0.051, flattening the decision curve.
| Wind Speed | Net Punt (yd) | FG % from 48 yd | Break-even EP |
|---|---|---|---|
| <10 mph | 49.2 | 84 | +2.3 |
| 15-19 mph | 44.6 | 71 | +0.7 |
| 20-24 mph | 34.7 | 57 | -0.8 |
| ≥25 mph | 29.1 | 41 | -1.9 |
The model adds a 0.003 EP deduction for every degree below 38 °F once the wind crosses the 20-mph mark; frozen air further deadens the ball, cutting punt hang-time by 0.4 s and increasing the probability of a return past the 30 by 11 %.
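The table and the cold-weather deduction combine into a short lookup. The sketch below uses only the four bands printed in the table, so it raises an error for 10-14 mph, where no row is given; treating "crosses the 20-mph mark" as 20 mph or more is an assumption, as is the function name.

```python
def break_even_ep(wind_mph, temp_f=None):
    """Break-even expected points from the wind table, with the cold deduction."""
    if wind_mph < 0:
        raise ValueError("wind speed cannot be negative")
    if wind_mph < 10:
        base = 2.3
    elif wind_mph < 15:
        # The table has no row for this band, so no value is invented here.
        raise ValueError("no table row for 10-14 mph")
    elif wind_mph < 20:
        base = 0.7
    elif wind_mph < 25:
        base = -0.8
    else:
        base = -1.9
    # 0.003 EP deducted per degree below 38 F, only once wind is at 20 mph or more.
    if wind_mph >= 20 and temp_f is not None and temp_f < 38:
        base -= 0.003 * (38 - temp_f)
    return round(base, 3)
```

At 22 mph and 28 °F, the ten degrees below 38 deduct 0.03, moving the break-even from -0.8 to -0.83.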
Coaches have already internalized the shift: in Weeks 12-17 of the past three campaigns, attempts to convert on 4th-and-4 or shorter inside the 35 rose from 28 % in calm games to 54 % when gusts topped 20 mph, producing 0.18 more wins per 16-game schedule for the clubs that trusted the recalibrated chart.
FAQ:
How do teams actually build the fourth-down model—what raw data goes in, and how is it cleaned so the win-probability adder isn’t just noise?
They start with every snap since 2001, tagging 65 fields per play: pre-snap leverage of each defender, hash, wind vector, turf temperature, zip-code-level weather radar, bookmaker in-game line, injury reports updated to the minute, and the GPS fatigue index for every player on the field. Next comes a de-noising layer: plays with backup offensive linemen are down-weighted 18 %, wind gusts above 25 mph get a 30-play rolling average, and any drive inside the two-minute warning is treated as a separate seasonal regime. The cleaned set is then bootstrapped 10 000 times so rare scores (down 9 with 0:37 left) aren’t extrapolated from single-digit samples. Only after that does the win-probability curve get refit each Tuesday morning; the fourth-down calculator you see on the broadcast is the tip of that iceberg.
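The resampling step in that answer can be illustrated with a weighted bootstrap over a handful of plays. This is a toy sketch, not the clubs' pipeline: it assumes win/loss outcomes coded as 1 and 0, applies the 18 % down-weighting as a sampling weight of 0.82, and reports a percentile interval; the names and the interval choice are illustrative.

```python
import random


def bootstrap_interval(outcomes, weights=None, n_resamples=10_000, seed=0):
    """Percentile interval (2.5 % to 97.5 %) for the mean outcome via weighted resampling.

    outcomes: list of 0/1 results from one game state.
    weights: optional per-play sampling weights, e.g. 0.82 for down-weighted plays.
    """
    if not outcomes:
        raise ValueError("need at least one play")
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(outcomes)
    means = sorted(
        sum(rng.choices(outcomes, weights=weights, k=n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples) - 1]
    return lo, hi


# Eight hypothetical plays from a rare game state; two involved backup linemen.
plays = [1, 0, 0, 1, 0, 0, 0, 1]
play_weights = [1, 1, 0.82, 1, 1, 0.82, 1, 1]
low, high = bootstrap_interval(plays, play_weights)
```

With a single-digit sample like this the interval comes out wide, which is the point of the exercise: the width shows how little a handful of plays says about the underlying win rate.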
Why does the same 4th-and-2 from the 38 show a 0.05 EPA difference between the Ravens and the 49ers—aren’t the league models supposed to be universal?
Universal baseline, yes, but each club maintains its own posterior. Baltimore’s prior assumes a 74 % conversion rate on short-yardage runs because their 2020-23 heavy packages averaged 5.9 YPC; San Francisco’s prior sits at 63 % because Shanahan’s offense uses more motion and gets shorter second-effort yards. The Bayesian update after three seasons of tracking keeps that 11-point gap alive, so the private model spits out different go-for-it thresholds even when the public model calls it a coin flip. Add special-teams variables—Tucker’s FG range vs. Moody’s—and the gap can swing 0.08 EPA, enough to flip the call.
Coaches still talk about momentum. Is there any measurable signal inside the tracking data that justifies punting on 4th-and-1 after two negative runs, or is it pure narrative?
Momentum shows up, but only in specific fatigue windows. If the offense just endured two plays longer than 2.8 s of real time and traveled >35 yds of GPS distance, the conversion probability drops 4.3 % on the next snap; that’s the physiological signal. Outside that 90-second fatigue band, no cohort of two negative runs lowers expected points; the drop-off vanishes. So unless the coach is staring at a winded OL group with no timeouts, the punt is narrative, not math.
How do analytics staffs keep the head coach from tuning out after one high-profile failure—do they have a built-in reset protocol so Week 4 doesn’t wreck the model for Week 14?
They bake in a Bayesian regret buffer. Every failed fourth down is treated as a single Bernoulli trial with a beta(2,2) prior; one miss moves the posterior mean only 0.02, so a coach would need to fail six straight times before the model’s recommendation flips from go to punt. The analytics director also delivers a one-page burn card after each game: the league-wide success rate for that down-distance, the coach’s career rate, and the monetary delta of expected wins. Seeing that the rest of the league converted 62 % of identical situations keeps the emotional tail from wagging the rational dog.
Betting markets react within seconds—could a team’s own analytics crew legally front-run the sportsbook by pushing fourth-down info to the sideline faster than the 10-second TV delay?
They already see the data first, but the league’s integrity monitor tags every RFID packet with a cryptographic timestamp, so any wager placed from the bench would leave a fingerprint. More practically, the edge is tiny: the difference between the private model and the closing line is rarely more than 1.5 %, and you’d need seven-figure volume to beat the -110 vig. No club has risked a suspension for that skim; they’d rather use the half-second speed advantage to call timeout and substitute the jumbo package.
