Projection Cascade - Methodology

1. What the cascade measures (and what it does not)

The projection cascade produces a per-player season-long fantasy-point projection plus its supporting decomposition. For every active MLB player the cascade outputs:

Headline projPoints - the median projection in fantasy points.
floorPoints and upsidePoints - the P10 and P90 of the simulated distribution.
pctBust - the fraction of simulations scoring below 60% of the median.
pctElite - the fraction of simulations landing in the player's own top 8% of simulated outcomes; it equals 8% for every player by construction (see §12).
projectionAudit - a block of roughly 30+ fields (batters) to ~53 fields (pitchers) containing all intermediate values: rate projections, workload tier, talent tier, master confidence, role retention, workload band, Statcast multipliers, lineup-slot context, and more.
Per-category rates - projK_rate, projER_rate, projH_rate, etc. (pitchers); projTB_rate, projR_rate, projSB_rate, etc. (batters).

Note on population scope

The cascade operates on the active MLB roster - which includes rookies (first-year MLB players who still retain rookie eligibility), MLB-active young players (recently graduated former prospects), and established veterans. It does NOT operate on minor-league prospects who have not yet appeared in MLB; those records are handled by the prospect-shadow model instead.

Per MLB's operational definition, "prospect" status is determined by retained rookie eligibility - a player loses prospect status once they exceed any one of these thresholds:

Position players: more than 130 MLB at-bats
Pitchers: more than 50 MLB innings pitched
More than 45 total active-roster days, excluding IL time and September roster expansion

Source: MLB rookie eligibility rules.

A player on the eligibility cusp may appear in BOTH the cascade and the shadow pipelines during their transition season - this is operationally correct (different model layers serving different scopes), not a duplication bug.

Honest data-coverage disclosure: the cascade does not currently track per-player MLB at-bats, MLB innings pitched, or active-roster service days as first-class fields. Cascade-vs-shadow routing is therefore based on whether the record has cascade-shape data (projectionAudit populated) rather than on a strict rookie-eligibility check. Five distinct concepts - prospect status, rookie eligibility, developmental stage, organizational value, fantasy relevance - are not enforced as separate fields in the cascade. See shadow methodology §1 for the detailed five-concept distinction.

What the cascade does NOT measure

Plainly listed here so any reader citing cascade output can interpret it correctly. These are real model gaps, not roadmap promises:

It does not model handedness or platoon splits. Lefty-vs-righty rate differentials are absent; the only platoon awareness is an explicit "platoon" workload-tier label that caps PA, not a rate split.
It does not use pitch-mix data in rate regression. Pitch-mix data is loaded into state but is not consumed by the cascade (see TODO at app.js line 491).
It does not use weather/park-weather data. parkWeather is loaded into state but never read by the cascade.
It does not project defensive value, base-running runs, or leverage-adjusted relief value. Bullpen leverage is computed for closer/relief role assignment but is NOT wired into save-rate projection.
It does not project plate-discipline correlation. In Monte Carlo, walks and strikeouts are drawn independently rather than correlated through chase rate or zone contact.
It does not apply position-specific aging curves. Aging is archetype-based via applyArchetypeAgingCurves (lines 14444-14455) but does not adjust per position.
It does not model minor-league prospects directly. Prospect records are handled by the separate prospect-shadow model.
It does not produce a single composite ranking. The cascade outputs projPoints for ranking; ranking-vs-value composition happens downstream in the fantasy-value layer.

2. Pipeline overview - two parallel branches

The cascade is invoked from render() at line 16353. It splits into two parallel branches that share the same general stage pattern but differ in the specific functions and rate categories:

render()                                                ← line 16353
  │
  ├─ Pitcher branch
  │     groupByPlayer(pitchRows)
  │     computePitcherProjection(rows, year, popModels)   ← line 13253
  │       │
  │       ├─ Anchor + projAge + careerIP + pitchQualified  (13254-13302)
  │       ├─ Rate projection + shrinkage + age adjust      (13336-13448)
  │       ├─ Workload (IP tier + durability + ramp)        (13450-13515)
  │       ├─ Role retention probability                    (13548-13562)
  │       ├─ Fantasy-points synthesis (K, ERA, H, BB,…)    (13564-13776)
  │       └─ projectionAudit assembly (~53 fields)         (13851-13923)
  │
  ├─ Batter branch
  │     groupByPlayer(batterRows)
  │     computeBatterProjection(rows, year, popModels)     ← line 14183
  │       │
  │       ├─ Anchor + projAge + position resolution        (14184-14220)
  │       ├─ Rate projection + shrinkage + bounds          (14228-14353)
  │       ├─ Workload (PA tier + rookie ramp + slot)       (14381-14441)
  │       ├─ Context multipliers (lineup, park, prot.)     (14379-14413)
  │       ├─ Statcast compression                          (14457-14599)
  │       ├─ Fantasy-points synthesis (R, TB, RBI, BB,…)   (14586-14610)
  │       └─ projectionAudit assembly (~30+ fields)        (14626-14683)
  │
  │   Both branches converge:
  ├─ applyAvailabilityAndRiskToProjection(proj, currentYear)  ← line 12604
  │     (injury / role / callup penalties on top of cascade output)
  │
  ├─ attachFantasyValueMetrics(proj)                          ← line 5773
  │     (master confidence, talent tier, fantasy value)
  │
  ├─ runMonteCarlo(proj, …)                                   ← line 4594
  │     (~10,000 simulations per player; produces percentiles,
  │      floorPoints, upsidePoints, pctBust, pctElite)
  │
  └─ state.lastProjectionAuditRows = filtered.slice()         ← line 16534
        (downstream surfaces, including the Player Explainer,
         read from this array.)

The cascade is not phase-numbered like the shadow model (P0-P5). It is organized by archetype (pitcher vs batter) and within each archetype by stage (anchor → rates → workload → context → fantasy points → audit). The shadow model's phase numbering reflects its incremental build history; the cascade's stage organization reflects the data dependencies between the steps.

3. Anchor & role determination

The first stage of each branch establishes a baseline "anchor" - the player's best recent career season - and resolves their projected age plus role.

3.1 Pitcher anchor (lines 13254-13302)

Identifies the best career year by composite score (innings pitched, rate quality, recency). Resolves projAge from birthdate plus current season. Determines whether the player qualifies for full rate projection (pitchQualified) versus reduced shrinkage (low-sample players). Pulls role state from role_overrides: rotation lock, closer lock, swingman flag, callup probability, injury status.

3.2 Batter anchor & position (lines 14184-14220)

Same pattern. Anchor selected from best recent career year. Position resolved with explicit override-precedence: role-override file > platform position > Statcast-derived position. The resolved positionsLabel drives downstream workload tier assignment (different PA defaults for catchers vs middle infielders vs corner outfielders).

4. Rate projection - shrinkage toward population models

The cascade projects each stat category as a per-PA or per-IP rate, then multiplies by projected workload to get totals. Rate projection is the core empirical-Bayesian layer of the cascade.

4.1 Pitcher rates (lines 13336-13448)

For each rate (K%, BB%, hit-rate, ER-rate, HR-proxy):

Compute observed rate from recent career sample.
Look up population mean for the player's bucket from popModels.pitchers (segmented by age band and SP/RP role).
Compute stabilization weight from sample size against the rate-specific stabilization point (K% stabilizes faster than HR%, etc.).
Blend: projected rate = observed × weight + population × (1 − weight).
Apply age curve: rate adjusts per projAge against the archetype-specific aging path.
Apply stuff-score nudge: a higher stuffScore (derived from velocity, IVB, extension, spin) lifts the K% projection within bounds.

Outputs: projK_rate, projER_rate, projH_rate, projBB_rate, projHRproxy_rate, kpctProj, stuffScore.
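A minimal sketch of the blend step above, shown for K%. The n / (n + M) weight form, the 70-batters-faced stabilization point, the additive age delta, and the stuff-nudge coefficient and bound are all illustrative assumptions, not values taken from app.js; the function names are hypothetical.

```python
def stabilization_weight(sample_size: float, stabilization_point: float) -> float:
    """Weight on the observed rate; assumes the common n / (n + M) form."""
    if sample_size <= 0:
        return 0.0
    return sample_size / (sample_size + stabilization_point)


def shrink_rate(observed: float, population: float,
                sample_size: float, stabilization_point: float) -> float:
    """projected = observed * w + population * (1 - w)."""
    w = stabilization_weight(sample_size, stabilization_point)
    return observed * w + population * (1.0 - w)


def project_k_rate(observed_k: float, pop_k: float, batters_faced: float,
                   age_delta: float = 0.0, stuff_score: float = 0.0,
                   stab_point: float = 70.0,        # hypothetical K% stabilization point
                   nudge_per_stuff: float = 0.05,   # hypothetical coefficient
                   max_nudge: float = 0.02) -> float:  # hypothetical bound
    """Hypothetical K% path: shrink, add an age delta, then a bounded upward stuff nudge."""
    rate = shrink_rate(observed_k, pop_k, batters_faced, stab_point)
    rate += age_delta
    nudge = min(max_nudge, max(0.0, nudge_per_stuff * stuff_score))
    return rate + nudge


# 70 BF at a 0.30 K% against a 0.22 population mean lands halfway between them
print(round(project_k_rate(0.30, 0.22, 70), 3))  # prints 0.26
```

Because the weight depends on the rate-specific stabilization point, the same sample pulls a fast-stabilizing rate such as K% further from the population mean than a slow one such as HR%.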

4.2 Batter rates (lines 14228-14353)

Same pattern, different categories: R, TB, RBI, BB, K, SB. Each rate is shrunk toward a position-bucketed population mean (catchers have different population means than middle infielders, etc.). Additional safeguards and caveats:

Hard bounds applied to prevent runaway projections (e.g., TB-rate is capped at a position-tier-specific maximum).
P99.5 shrinkage - any rate above the 99.5th percentile of historical rates for that bucket is shrunk back toward the 99th percentile. This is the cascade's analog of the shadow model's K9-floor defense.
Statcast adjustments are NOT applied at the rate-projection stage directly; they enter as multipliers in §7.

Outputs: projR_rate, projTB_rate, projRBI_rate, projBB_rate, projK_rate, projSB_rate.
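The two safeguards can be sketched as follows. The partial-pull fraction, the order (P99.5 shrink before the hard cap), and the bucket percentiles in the example are hypothetical; the text above does not say how far back toward P99 a rate is pulled.

```python
def p995_shrink(rate: float, p99: float, p995: float, pull: float = 0.5) -> float:
    """Pull a rate above the bucket's P99.5 partway back toward its P99.

    `pull` (the fraction of the distance to P99 removed) is a hypothetical parameter.
    """
    if rate <= p995:
        return rate
    return p99 + (rate - p99) * (1.0 - pull)


def bound_batter_rate(rate: float, p99: float, p995: float,
                      hard_cap: float, pull: float = 0.5) -> float:
    """Apply the P99.5 shrink, then the position-tier hard cap (order assumed)."""
    return min(p995_shrink(rate, p99, p995, pull), hard_cap)


# Hypothetical TB-per-PA bucket: P99 = 0.60, P99.5 = 0.65, hard cap = 0.68
print(bound_batter_rate(0.50, 0.60, 0.65, 0.68))  # below P99.5: untouched
print(bound_batter_rate(0.90, 0.60, 0.65, 0.68))  # shrunk to ~0.75, then capped
```

With a fixed pull the mapping jumps at P99.5 (a rate just above it can land below one just under it); a pull that tapers in would avoid that if it matters.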

5. Workload modeling - IP (pitcher) / PA (batter)

5.1 Pitcher workload (lines 13450-13515)

Assigns the pitcher to a workload tier based on role, injury history, archetype, and recent IP. The tier label is exposed as workloadTierLabel in the audit block; common values include workhorse, mid-rotation, back-end, swingman, closer, setup, middle relief.

The tier determines:

workloadOuts - base projected outs for the season.
workloadRampCap - a cap for pitchers returning from injury (the IL ramp).
durabilityMult - durability-adjusted multiplier (1.0 for healthy pitchers; reduced for injury-prone profiles).
archetypeDispersionMult - a multiplier on simulation variance: closer roles have higher dispersion than workhorse starters.

5.2 Pitcher role retention (lines 13548-13562)

A separate roleRetentionProbability multiplier representing the probability the pitcher stays in their assigned role through the season. Reads rotation locks, closer locks, depth-chart rank, injury risk, prior-season starts. A 0.8 retention probability scales final IP by 0.8 (the model expects 20% of innings to be lost to role change).
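The text specifies only that retention scales final IP; how workloadOuts, workloadRampCap, and durabilityMult combine with it is not spelled out. The sketch below assumes a cap-then-multiply order and the usual three outs per inning; the function name is hypothetical.

```python
from typing import Optional


def projected_ip(workload_outs: float, durability_mult: float,
                 role_retention: float,
                 ramp_cap_outs: Optional[float] = None) -> float:
    """Hypothetical composition of the §5.1-§5.2 workload factors into projected IP."""
    outs = workload_outs
    if ramp_cap_outs is not None:          # IL ramp: cap the season's outs
        outs = min(outs, ramp_cap_outs)
    outs *= durability_mult                # 1.0 for healthy profiles
    outs *= role_retention                 # e.g. 0.8 -> expect 20% of innings lost
    return outs / 3.0                      # 3 outs per inning


print(round(projected_ip(540, 1.0, 0.8), 1))       # 180-IP base, 0.8 retention
print(round(projected_ip(540, 0.9, 1.0, 360), 1))  # IL ramp cap plus durability discount
```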

5.3 Batter workload (lines 14381-14441)

Assigns the batter to a PA tier from a defined set: full_time, regular, part_time, platoon, backup, injury_replacement. The tier sets workloadPA as the base projection.

Rookie ramp adjusts PA downward for first-year players based on pedigree (rookieRampMult). Lineup slot is either taken from explicit override or guessed from position and tier (slotGuess). The guessed slot influences R/RBI multipliers (heart-of-order produces more runs and RBI per PA).
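A sketch of the batter workload step. The tier names come from the text; the PA defaults and the tier-to-slot guesses are hypothetical placeholders (the cascade also uses position when guessing a slot, which this sketch omits).

```python
from typing import Optional, Tuple

# Hypothetical PA defaults per tier; the cascade's real values live in app.js.
TIER_PA = {
    "full_time": 650, "regular": 560, "part_time": 400,
    "platoon": 350, "backup": 200, "injury_replacement": 150,
}

# Hypothetical slot guesses by tier only.
TIER_SLOT_GUESS = {
    "full_time": 3, "regular": 5, "part_time": 7,
    "platoon": 7, "backup": 9, "injury_replacement": 9,
}


def batter_workload(tier: str, rookie_ramp_mult: float = 1.0,
                    slot_override: Optional[int] = None) -> Tuple[float, int]:
    """Return (projected PA, lineup slot); an explicit slot override wins over the guess."""
    pa = TIER_PA[tier] * rookie_ramp_mult   # rookieRampMult < 1.0 trims first-year PA
    slot = slot_override if slot_override is not None else TIER_SLOT_GUESS[tier]
    return pa, slot


print(batter_workload("regular", rookie_ramp_mult=0.8))
print(batter_workload("full_time", slot_override=1))
```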

6. Context multipliers - park, lineup slot, lineup protection

Once base rates and workload are projected, context multipliers adjust the totals. On the batter side this happens at lines 14379-14413; the pitcher side applies analogous park adjustments inside the fantasy-point synthesis layer.

Park factors. Loaded from parkFactorsByTeamId (app.js line 2348). Applied as late multipliers (e.g., parkER, parkTB) to the rate × workload product; they are not used as inputs to the rate-projection stage.
Lineup-slot multiplier. Heart-of-order slots (3-4-5) boost R and RBI relative to leadoff or bottom-of-order slots. Applied via lm.r, lm.rbi, lm.pa objects.
Lineup protection. lineupProtMult applies a small multiplier when a strong run-producer hits behind the batter, modeling the "pitched-around" effect.
Fringe caps. For low-tier players, total projections are clamped to prevent runaway projections from unstable rates × moderate PA.

These multipliers are applied multiplicatively. They are NOT position-adjusted in any sophisticated way (e.g., the lineup-slot multiplier is the same for catchers and outfielders) - see §16.
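Because the multipliers are purely multiplicative, a category total reduces to a single product followed by the fringe clamp. The function and parameter names below are hypothetical; the comments map them to the names used above, and the example inputs are made up.

```python
from typing import Optional


def batter_context_total(rate: float, pa: float,
                         park_mult: float = 1.0,        # e.g. parkTB
                         slot_mult: float = 1.0,        # e.g. lm.r or lm.rbi
                         protection_mult: float = 1.0,  # lineupProtMult
                         fringe_cap: Optional[float] = None) -> float:
    """Rate x PA, then the context multipliers, then an optional fringe clamp."""
    total = rate * pa * park_mult * slot_mult * protection_mult
    if fringe_cap is not None:
        total = min(total, fringe_cap)
    return total


# Hypothetical R projection: 0.14 R/PA over 600 PA, hitter's park, heart-of-order slot
print(round(batter_context_total(0.14, 600, park_mult=1.05, slot_mult=1.10,
                                 protection_mult=1.02), 1))
```

Note that slot_mult takes the same value regardless of position, which is exactly the uniformity listed as a limitation in §16.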

7. Statcast compression layer (batters)

Lines 14457-14599. For batters, Statcast-derived features compress into multipliers applied to power and contact projections:

Power multiplier - composed of barrel rate, exit velocity (top 50%, average best speed), and hard-hit percentage. Combined via geometric mean rather than sum, to avoid double-counting (a player with high barrel and high hard-hit shares correlated signal).
Contact multiplier - composed of whiff percentage, out-of-zone swing percentage, and zone contact rates.
Speed multiplier - sprint speed, applied primarily to SB rate.
xwOBA regression - when actual wOBA exceeds xwOBA by a sample-size-dependent threshold, the projection regresses partway back to xwOBA. The threshold widens with smaller sample sizes (more tolerance for noise on small samples).
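A sketch of two pieces of this layer: the geometric-mean power combination and the one-sided xwOBA regression. The square-root widening of the threshold, the reference sample, the base threshold, and the pull fraction are hypothetical choices; the text says only that the threshold widens as samples shrink and that the regression is partial.

```python
import math


def geometric_mean(values):
    """exp(mean(log v)); all values must be positive multipliers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def power_multiplier(barrel_mult: float, ev_mult: float, hard_hit_mult: float) -> float:
    """Combine correlated power signals without stacking them additively."""
    return geometric_mean([barrel_mult, ev_mult, hard_hit_mult])


def regress_woba(woba: float, xwoba: float, pa: float,
                 base_threshold: float = 0.010,  # hypothetical gap tolerated at ref_pa
                 ref_pa: float = 600.0,          # hypothetical reference sample
                 pull: float = 0.5) -> float:    # hypothetical fraction of the gap removed
    """If wOBA beats xwOBA by more than a sample-dependent threshold, regress partway back."""
    threshold = base_threshold * math.sqrt(ref_pa / max(pa, 1.0))  # widens as PA shrinks
    gap = woba - xwoba
    if gap <= threshold:
        return woba
    return woba - pull * gap


# Three +10% signals combine to +10%, not the +30% an additive stack would give
print(round(power_multiplier(1.10, 1.10, 1.10), 3))
# The same 15-point overperformance is tolerated at 150 PA but regressed at 600 PA
print(regress_woba(0.365, 0.350, 150), round(regress_woba(0.365, 0.350, 600), 4))
```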

The Statcast layer is the batter cascade's most empirically-rich component. For pitchers, the analogous layer is the stuffScore nudge inside rate projection (§4.1) plus xERA blending inside fantasy-point synthesis (§8); the pitcher cascade does not have a dedicated Statcast compression block.

8. Fantasy-point synthesis

The synthesis stage converts projected rates × workload into per-category totals, then weights them by the league's scoring system into the headline projPoints.

8.1 Pitcher synthesis (lines 13564-13776)

For each scoring category:

K = projK_rate × workloadOuts, then rounded.
ERA = a blend of empirical ERA and xERA, the blend weight depending on sample stability.
H, BB = rates × outs, with hit-rate adjusted by park's hit factor.
W = a function of team win expectancy and the pitcher's quality (better pitchers on better teams accrue more wins per start).
L = the mirror image of W (weaker pitchers on weaker teams accrue more losses per start).
SV, BS = role-conditional (closer multiplier × team save opportunities).

The cascade applies a final sanity cap: if rawProjPts exceeds an archetype-specific ceiling, the result is hard-capped and the difference is recorded as spikeRisk (so the audit shows the model was constrained).
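A sketch of three pieces of pitcher synthesis: the ERA/xERA blend, weighting category totals by a scoring system, and the hard cap that records spikeRisk. The blend-weight form, the 150-IP stabilization constant, the scoring weights, the season totals, and the 600-point ceiling are all hypothetical.

```python
from typing import Dict, Tuple


def blended_era(era: float, xera: float, ip: float, stab_ip: float = 150.0) -> float:
    """Blend empirical ERA with xERA; more innings put more weight on ERA (assumed form)."""
    w = ip / (ip + stab_ip)
    return era * w + xera * (1.0 - w)


def raw_points(totals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weight per-category totals by the league's scoring system."""
    return sum(weights.get(cat, 0.0) * value for cat, value in totals.items())


def cap_points(raw: float, ceiling: float) -> Tuple[float, float]:
    """Hard-cap at an archetype ceiling; return (points, spikeRisk = amount removed)."""
    if raw > ceiling:
        return ceiling, raw - ceiling
    return raw, 0.0


# Hypothetical scoring weights and a 180-IP season
weights = {"OUT": 1.0, "K": 1.0, "ER": -2.0, "H": -0.5, "BB": -0.5, "W": 5.0, "L": -3.0}
ip = 180.0
era = blended_era(3.20, 3.60, ip)
totals = {"OUT": ip * 3, "K": 200, "ER": era * ip / 9.0, "H": 150, "BB": 50, "W": 13, "L": 8}
points, spike = cap_points(raw_points(totals, weights), ceiling=600.0)
print(round(era, 2), round(points, 1), spike)
```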

8.2 Batter synthesis (lines 14586-14610)

For each category (R, TB, RBI, BB, K, SB), totals are computed as rate × PA × context_multiplier. Aging effects and talent-ceiling clamps are applied as final modulators. talentCeiling is a per-tier maximum that prevents low-tier batters from projecting elite output even with favorable contextual factors.

The synthesis stage emits projPoints, rawProjPts, projPointsFloor, and projPointsCeiling. The Floor and Ceiling are first-pass estimates that are refined by the Monte Carlo layer (§9).

9. Monte Carlo simulation

runMonteCarlo() at line 4594. After the deterministic cascade produces a point estimate, ~10,000 simulations (configurable via MC_SIMS_FINAL or MC_SIMS_QUICK) are run per player to produce the full outcome distribution.

9.1 State-based simulation

Each simulation first draws a role state before drawing per-category outcomes:

healthy - full PA / IP draw against projected workload
injured - capped PA / IP based on injury severity priors
breakout - sampled from upper tail; young-player skewed
collapse - sampled from lower tail; old-player skewed
demoted - reduced PA / IP path
role_gain - promoted (e.g., setup → closer mid-season)

Breakout and collapse probabilities are age-conditional: young players carry ~12% breakout / 6% collapse priors; older players carry ~6% breakout / 24% collapse priors.
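The state draw can be sketched as a weighted categorical choice. The breakout/collapse priors come from the text; the young-versus-older split is simplified to a boolean, and the injured, demoted, and role_gain priors are hypothetical placeholders, with healthy taking the remainder.

```python
import random
from collections import Counter
from typing import Dict

# Hypothetical priors for the states the text does not quantify
OTHER_STATE_PRIORS = {"injured": 0.10, "demoted": 0.05, "role_gain": 0.03}


def state_priors(is_young: bool) -> Dict[str, float]:
    """Age-conditional breakout/collapse priors from §9.1; 'healthy' takes the remainder."""
    breakout, collapse = (0.12, 0.06) if is_young else (0.06, 0.24)
    healthy = 1.0 - breakout - collapse - sum(OTHER_STATE_PRIORS.values())
    return {"healthy": healthy, "breakout": breakout,
            "collapse": collapse, **OTHER_STATE_PRIORS}


def draw_state(rng: random.Random, is_young: bool) -> str:
    """Draw one role state before any per-category outcomes are drawn."""
    priors = state_priors(is_young)
    states = list(priors)
    return rng.choices(states, weights=[priors[s] for s in states], k=1)[0]


rng = random.Random(7)
counts = Counter(draw_state(rng, is_young=False) for _ in range(10_000))
print(counts.most_common(3))
```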

9.2 Per-category draws

Within each simulation, per-category rates are drawn from distributions centered at the projected rates with variance set by stabilization confidence. Categories are drawn independently (e.g., BB and K rates are not correlated through chase rate) - see §16.

9.3 Outputs

Percentiles - P10, P25, P50, P75, P90 of fantasy points across all simulations.
floorPoints - the P10.
upsidePoints - the P90.
medianPoints - the P50 (typically used as the displayed projection).
pctBust and pctElite - see §12.

The MC layer is the cascade's most computationally expensive stage. Quick mode (MC_SIMS_QUICK) is used for live UI; final mode (MC_SIMS_FINAL) is used when the user explicitly requests full recomputation.
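A stripped-down sketch of §9.2 and §9.3: per-category rates drawn independently from normal distributions around the projection, converted to points, and summarized as percentiles. The state draw from §9.1 is omitted, and the normal distributions, clipping at zero, linear-interpolation percentiles, and example inputs are assumptions.

```python
import random
from typing import Dict, List


def percentile(sorted_vals: List[float], p: float) -> float:
    """Linear-interpolation percentile of an already-sorted list, p in [0, 100]."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def simulate_points(proj_rates: Dict[str, float], rate_sds: Dict[str, float],
                    weights: Dict[str, float], workload: float,
                    n_sims: int = 10_000, seed: int = 0) -> Dict[str, float]:
    """Draw each category's rate independently, convert to points, summarize."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        pts = 0.0
        for cat, mu in proj_rates.items():
            # Independent draw per category: no BB/K correlation
            rate = max(0.0, rng.gauss(mu, rate_sds[cat]))
            pts += weights[cat] * rate * workload
        sims.append(pts)
    sims.sort()
    out = {f"P{p}": percentile(sims, p) for p in (10, 25, 50, 75, 90)}
    out["floorPoints"] = out["P10"]
    out["upsidePoints"] = out["P90"]
    out["medianPoints"] = out["P50"]
    return out


# Hypothetical batter: per-PA rates, spreads, and scoring weights over 600 PA
summary = simulate_points({"TB": 0.45, "BB": 0.09, "K": 0.22},
                          {"TB": 0.05, "BB": 0.015, "K": 0.03},
                          {"TB": 1.0, "BB": 1.0, "K": -1.0}, workload=600)
print({k: round(v, 1) for k, v in summary.items()})
```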

10. Talent-tier classification

computeTrueTalentTier() at line 15173. Assigns each player to one of five tiers:

Tier	Label	Pitcher gate (example)	Batter gate (example)
1	superstar	K% > 0.30 + IP > 180, or stuff score > 0.55	peak TB rate top 0.5% + sustained PA
2	elite	K% > 0.28 + IP > 150, or stuff > 0.40	peak TB top 2% + multiple seasons
3	quality starter / regular	K% > 0.22 + IP > 120	solid current-form + adequate sample
4	role / platoon	lower IP or marginal rates	limited PA tier or weak hitter grade
5	fringe	minimal MLB sample, no rate stability	fringe-roster signal

Tier assignment is multi-gate and conservative. A player needs to meet either current-form criteria OR peak-evidence criteria to qualify for tiers 1-2; meeting both produces high-confidence assignment.
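The pitcher column of the table can be read as a first-match cascade of gates. This sketch uses only the example thresholds shown; the split between tiers 4 and 5 (a minimum-IP cutoff) is hypothetical, and the real computeTrueTalentTier() draws on more signals, including the peak-evidence path.

```python
def pitcher_talent_tier(k_pct: float, ip: float, stuff_score: float,
                        min_role_ip: float = 40.0) -> int:  # hypothetical tier-4/5 split
    """Pitcher gates from the example column of the tier table; first match wins."""
    if (k_pct > 0.30 and ip > 180) or stuff_score > 0.55:
        return 1  # superstar
    if (k_pct > 0.28 and ip > 150) or stuff_score > 0.40:
        return 2  # elite
    if k_pct > 0.22 and ip > 120:
        return 3  # quality starter
    if ip >= min_role_ip:
        return 4  # role
    return 5      # fringe


print(pitcher_talent_tier(0.31, 190, 0.30), pitcher_talent_tier(0.25, 130, 0.20))
```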

11. Master confidence - seven-factor composite

computeMasterConfidence() at line 5729. Multiplicative composite:

masterConfidence = rosterConf
                 × roleConf
                 × sampleConf
                 × marketConf
                 × √survivorship
                 × √realPlayerProb
                 × spikeRiskMult

Component definitions:

rosterConf - probability of 40-man roster retention (computeRosterConfidence).
roleConf - probability the player gets the role the projection assumes (e.g., closer, full-time bat). Computed from rotation/closer locks and depth-chart rank.
sampleConf - stabilization-weighted confidence in the rate projections. Closer to 1.0 for players with multiple full seasons; closer to 0.5 for low-sample players.
marketConf - ADP/expert-consensus strength. Players with tight expert agreement and clear ADP receive higher confidence; players with wide expert disagreement receive lower confidence.
survivorship - adjustment for inherent survivor-bias in the dataset (we mostly observe players who succeeded enough to keep getting drafted/projected).
realPlayerProb - probability the record refers to a real, active player (filters out stale records).
spikeRiskMult - penalizes confidence when the projection was hard-capped (the spike-risk flag from §8.1).

The composite is multiplicative, so a single very-low component significantly reduces confidence. This is intentional: a player with great rate confidence but no role lock should not be projected with high overall confidence.
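The formula translates directly; the inputs in the example are made up. The square roots soften survivorship and realPlayerProb (0.81 contributes 0.9), while a weak role lock passes through at full strength.

```python
import math


def master_confidence(roster: float, role: float, sample: float, market: float,
                      survivorship: float, real_player_prob: float,
                      spike_risk_mult: float) -> float:
    """Multiplicative composite; survivorship and realPlayerProb enter under a square root."""
    return (roster * role * sample * market
            * math.sqrt(survivorship) * math.sqrt(real_player_prob)
            * spike_risk_mult)


# Hypothetical inputs: strong everywhere, then the same player without a role lock
strong = master_confidence(0.95, 0.90, 0.90, 0.85, 0.81, 1.0, 1.0)
no_role = master_confidence(0.95, 0.30, 0.90, 0.85, 0.81, 1.0, 1.0)
print(round(strong, 3), round(no_role, 3))
```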

12. Bust risk + elite-season probability

Both come from the Monte Carlo distribution (§9), not from separate computations.

pctBust (line 4801) - fraction of simulations scoring less than 60% of the median (P50). A high pctBust means the simulated distribution has heavy left-tail mass: the player has a meaningful chance of disastrously underperforming.
pctElite (line 4806) - fixed at 8% by construction. The elite threshold is defined as the cut point for the top 8% of the player's own simulated outcomes (the P92 of their distribution), and pctElite is the fraction of simulations above that threshold, so it comes out at 8% for every player. It is NOT a population-relative percentile. What differs between players is the fantasy-point level of the threshold, not pctElite itself; a high threshold means the upper 8% of that player's simulated outcomes was high, not that the player is exceptional.

The 8% elite threshold is a construction choice. The cascade uses it as a consistent yardstick for the upper tail; users should not read it as "8% chance this player is elite" without qualification.
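The construction is easy to check with two made-up distributions: pctBust differs between them, pctElite does not, and only the threshold separates the strong player from the weak one. The function name and the normal distributions are hypothetical; the 60% bust line and 8% elite share come from the text.

```python
import random
import statistics
from typing import List, Tuple


def bust_and_elite(sims: List[float], bust_frac: float = 0.60,
                   elite_share: float = 0.08) -> Tuple[float, float, float]:
    """Return (pctBust, pctElite, eliteThreshold) from one player's simulations."""
    s = sorted(sims)
    n = len(s)
    median = statistics.median(s)
    pct_bust = sum(1 for x in s if x < bust_frac * median) / n
    # Threshold: the last value below the player's own top `elite_share` (about the P92)
    threshold = s[int(round(n * (1.0 - elite_share))) - 1]
    pct_elite = sum(1 for x in s if x > threshold) / n
    return pct_bust, pct_elite, threshold


rng = random.Random(1)
star = [rng.gauss(400, 80) for _ in range(10_000)]    # hypothetical distributions
fringe = [rng.gauss(150, 60) for _ in range(10_000)]
print(bust_and_elite(star))
print(bust_and_elite(fringe))  # pctElite is 0.08 here too; only the threshold differs
```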

13. Audit assembly - what projectionAudit contains

The projectionAudit block (assembled at lines 13851-13923 for pitchers, 14626-14683 for batters) is the cascade's transparency surface. It is attached to each player record as rec.projectionAudit and is the primary data source for the Player Explainer's cascade-projection sections.

Field categories:

Category	Fields
Workload	`workloadTierLabel`, `workloadIP`, `workloadPA`, `ipLo`/`ipHi`, `paFloorApplied`
Talent	`talentTier`, `talentCeiling`, `projAge`, `breakoutMult`
Confidence	`masterConfidence`, `rosterConfidence`, `roleConfidence`, `sampleConfidence2`, `marketConfidence`, `orgTrust`
Rates (batter)	`projTBrate`, `projRrate`, `projSBrate`, `projBBrate`, `projKrate`, `peakTBrate`
Rates (pitcher)	`kpctProj`, `stuffScore`, `workloadOuts`
Risk	`callupRisk`, `platoonRisk`, `roleRetentionProbability`, `durabilityMult`, `archetypeDispersionMult`, `spikeRisk`
Role	`pCloser` (probability of closer role), `workloadRampCap` (IL ramp)

The audit block is read directly by the Explainer's _renderHeadline, _renderWhy, _renderSensitivity, and other render functions. Every number a user sees in those sections corresponds to a field in this block.

14. Required inputs

The cascade reads broadly from both per-player record fields and global state.

14.1 Batter rec fields

player_id, player_name, pa, year, r_run, b_total_bases, b_rbi, walk, strikeout, r_total_stolen_base, b_game, plus Statcast: xwoba, woba, barrel_batted_rate, hard_hit_percent, avg_best_speed, sprint_speed, whiff_percent, out_zone_swing_percent, avg_swing_speed.

14.2 Pitcher rec fields

player_id, player_name, p_out, year, p_strikeout, p_earned_run, p_total_hits, p_walk, p_win, p_loss, p_save, p_blown_save, p_starting_p, p_game_in_relief, plus Statcast: xera, p_era, k_percent, bb_percent, whiff_percent, barrel_batted_rate, hard_hit_percent.
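The field lists above can double as a pre-flight check. The sketch reports missing core and Statcast fields separately; how the cascade itself handles a missing value is not described here, so this only flags gaps. The names REQUIRED_FIELDS and missing_fields are hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

# Field names as listed in §14.1 and §14.2.
REQUIRED_FIELDS: Dict[str, Dict[str, List[str]]] = {
    "batter": {
        "core": ["player_id", "player_name", "pa", "year", "r_run", "b_total_bases",
                 "b_rbi", "walk", "strikeout", "r_total_stolen_base", "b_game"],
        "statcast": ["xwoba", "woba", "barrel_batted_rate", "hard_hit_percent",
                     "avg_best_speed", "sprint_speed", "whiff_percent",
                     "out_zone_swing_percent", "avg_swing_speed"],
    },
    "pitcher": {
        "core": ["player_id", "player_name", "p_out", "year", "p_strikeout",
                 "p_earned_run", "p_total_hits", "p_walk", "p_win", "p_loss",
                 "p_save", "p_blown_save", "p_starting_p", "p_game_in_relief"],
        "statcast": ["xera", "p_era", "k_percent", "bb_percent", "whiff_percent",
                     "barrel_batted_rate", "hard_hit_percent"],
    },
}


def missing_fields(rec: Mapping[str, object], kind: str) -> Tuple[List[str], List[str]]:
    """Return (missing core fields, missing Statcast fields) for a batter or pitcher rec."""
    spec = REQUIRED_FIELDS[kind]
    core = [f for f in spec["core"] if rec.get(f) is None]
    statcast = [f for f in spec["statcast"] if rec.get(f) is None]
    return core, statcast


print(missing_fields({"player_id": 1, "player_name": "X", "pa": 500}, "batter")[0][:3])
```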

14.3 Global state reads

role_overrides - explicit role locks, pedigree, callup/platoon risk
injury_map, injury_records - IL duration estimate, injury history
playerTeamEntry - team affiliation, team win%
parkFactorsByTeamId, parkWeather - park multipliers (weather data loaded but unused)
popModels.pitchers, popModels.batters - age-regressed rates by position/bucket
calibratedFvWeights, expertConsensus, adpData - confidence and value calibration

15. Version status

As of the latest methodology revision, the projection cascade carries a methodology version stamp on every output: projectionAudit.methodologyVersion = 'cascade-v1.0'. This closes the gap previously documented in this section.

Both pitcher and batter cascade audit objects (constructed in app.js at the two projectionAudit assignment sites) include the field. Cascade-derived numbers in the Player Explainer, the research-snapshot export, and the Operations workspace now all carry the version reference.

15.1 Version history

cascade-v1.0 - initial version stamp introduced. Corresponds to the methodology as published at the time of stamp introduction. Pre-v1.0 outputs lack a version field; for those, reproducing exact computations requires reconstructing the code state from that date by other means.

15.2 Version-bump policy

Future methodology changes that alter cascade computations should be paired with a version bump here AND in the projectionAudit.methodologyVersion string. Documentation drift between this page and the audit-block stamp should be treated as a real bug, not a cosmetic inconsistency.
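One way to catch drift mechanically is to compare each record's stamp against the version this page documents. A minimal sketch, treating records as plain dicts shaped like rec.projectionAudit; the constant and function names are hypothetical. Pre-v1.0 records are flagged too, since they carry no stamp.

```python
from typing import Iterable, List, Mapping

EXPECTED_METHODOLOGY_VERSION = "cascade-v1.0"  # keep in step with §15.1


def version_drift(records: Iterable[Mapping],
                  expected: str = EXPECTED_METHODOLOGY_VERSION) -> List[str]:
    """List player_ids whose projectionAudit stamp is missing or differs from `expected`."""
    drifted = []
    for rec in records:
        audit = rec.get("projectionAudit") or {}
        if audit.get("methodologyVersion") != expected:
            drifted.append(rec.get("player_id"))
    return drifted


print(version_drift([{"player_id": "a", "projectionAudit": {"methodologyVersion": "cascade-v1.0"}},
                     {"player_id": "b"}]))
```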

16. Limitations

Known limits of the cascade as it operates today. Each is a real model property, not an apology:

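Pre-v1.0 outputs carry no methodology version. Outputs since cascade-v1.0 are stamped (see §15); older outputs cannot be tied to a specific methodology revision.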
No handedness / platoon split projection. The only platoon-aware element is a "platoon" workload tier that caps PA; rate projections are platoon-blind.
Pitch-mix data loaded but unused. A pitch_mix field exists in state but the cascade's rate-projection stage does not read it (TODO at app.js line 491).
Park effects applied as late multipliers, not as rate-projection inputs. Park-adjusted rates would require a different shrinkage model; the current approach treats park as a post-hoc context modifier.
Weather data loaded but unused. parkWeather is fetched and stored but the cascade does not consume it.
Categories drawn independently in Monte Carlo. BB and K are not correlated through chase or zone-contact rate; TB and BB are not correlated through plate-discipline shape. Real players show correlations the simulation does not model.
Defensive value not projected. Batters are modeled as pure offensive contributors.
Bullpen leverage computed but not wired to SV projection. Leverage is used for closer-role assignment but does not affect projected save rate.
Aging archetype-based, not position-based. Catchers and outfielders use the same aging curve when their archetype matches, despite catchers' faster real-world decline.
Lineup-slot multiplier is uniform across positions. Hitting third in the order produces the same R/RBI lift for a catcher as for an outfielder.
Monte Carlo "elite" threshold is fixed at 8%. pctElite is the fraction of simulations in the player's own top 8%, so it equals 8% for every player; it is not a population-relative percentile. Users should not interpret pctElite, or a high elite threshold, as "this player is elite."
The cascade is invoked synchronously during render(). Re-projection happens on each full render; there is no incremental update.
Survivorship-bias adjustment is constant. The survivorship factor in master confidence is a fixed scalar, not data-driven per player.

17. Related methodology

Prospect-shadow model (v0.5) - the parallel model layer that runs on prospect records (vs the cascade which runs on MLB-active players). The two models do not share computation; the shadow model produces its output independently and attaches it to rec.shadow.
FYPD methodology (v1.0) - the first-year-player draft market observation layer (also independent of the cascade).
Product & design principles - platform-level discipline including §2 "Trust Before Mathematics" which motivates this methodology page's existence.

The three model layers (cascade, prospect-shadow, FYPD) are intentionally separated. Each has its own input scope, its own methodology, and its own output surface. Composing them into a single ranking is a downstream concern, not a model-layer concern.

17.1 Cross-surface vocabulary notes

Two words appear with structurally different meanings across the platform's model layers: "confidence" (used in masterConfidence here in the cascade, but also in shadow's confidenceBucket, identity-graph join confidence, injury reports, position attribution, and the disagreement signal) and "archetype" (used in the cascade's workload, aging, and volatility classifications, and separately in the shadow model's prospect-archetype buckets and the Operations portfolio archetypes). Each usage is correct within its own context; the overloading is at the platform level.

The cross-surface translation tables are documented in shadow methodology §16.4. Refer there before comparing "confidence" or "archetype" values across different surfaces.

18. How to cite

When citing cascade output in analytical writing:

Source: managr Projection Cascade
Methodology version: projectionAudit.methodologyVersion (e.g., cascade-v1.0); see §15. Pre-v1.0 outputs lack this stamp.
Date: the date the projection was computed.
Link to this methodology page.

Example: Per managr Projection Cascade v1.0 (computed 2026-05-17), Player X projects to 425 fantasy points (P10 280, P90 560) with master confidence 0.72 and talent tier 2.