Last summer, Major League Baseball kicked around a potential new rule requiring starting pitchers to throw at least six innings. The aesthetic goals were twofold: to restore the eminence of the starting pitcher, which has eroded in an era of pitch counts and bullpen games; and to reduce strikeouts, as pitchers are easier to hit when pacing themselves over 18 outs than when throwing as hard as they can until they get tired. I am supportive of the former goal and at least sympathetic to the latter. Yet I was not a fan of their solution, which I argued would upend the balance between hitting and pitching, create more injuries (not fewer, as the league hypothesized), and (given the specific scenarios in which managers could pull their starters early) cross the Rubicon for the sport’s integrity by giving teams incentives to intentionally give up runs.
But while the proposed six-inning-minimum rule is not viable, the Commissioner’s Office was onto something when they linked restoring pitcher-usage norms to reducing strikeouts. (Would that all their ideas demonstrated such awareness of the incentives they create.) The preponderance of whiffs in the modern game is both a cause (as throwing at higher intensity tires pitchers out sooner) and an effect (as managers realize the risk of leaving a pitcher in too long) of starters not pitching as deep into games. I made the same connection after last year’s postseason, when I blamed the spate of uncharacteristic relief-ace meltdowns on managers riding their top bullpen arms too hard. Though my analysis focused on the playoffs, I noted that the best way for teams to solve this was to put renewed emphasis on rotation durability all season long — it’s a hard skill to turn on for only a month at a time — which could in turn roll back the clock on the modern strikeout era.
This thought was rattling around in my head again as I was building the Simple WAR Calculator. Wins Above Replacement is designed to be kinder to starting pitchers than relievers, because a starter’s job is harder. But that difference is getting smaller. Pacing yourself for six or seven innings, as was the norm within my lifetime, seemed paradigmatically different from airing it out for one or two frames at a time. Now that a starter is more likely to pitch four or fewer innings than to record even a single out in the seventh — yes, that’s true! — it feels more like a matter of degree.

Shorter starts also mean increased workloads for the relievers. At a time when even the last pitcher on the staff probably throws 97 mph with a nasty slider, that may not sound like a problem. But going to the bullpen earlier risks watering down their effectiveness, as managers are forced into some combination of:
- Using less-reliable relievers for the outs the back-end arms can’t cover (or calling on high-leverage pitchers in lopsided games when they could otherwise get a night off)
- Leaving relievers in for longer, which could make them less effective (and leave them unavailable the next day)
- Creating suboptimal matchups for situational relievers who would ideally face only a specific segment of the opposing lineup
Conceptually I don’t think this is controversial. But if there were evidence of these phenomena leading to tangibly worse bullpen outcomes, it would mean that contemporary baseball analysis underrates starters who work deep into games — and would suggest a new paradigm for evaluating in-game strategy.
To investigate this, I used the baseballr package to scrape pitching linescores of every regular-season MLB game from 2022 to 2024 (this seemed like the most prudent timeframe for balancing sample robustness with representativeness) and tagged how many outs the starting pitcher recorded in each. I then calculated the leaguewide bullpen ERA given how deep into the game the starter went.1 Here’s how that looks:
The trend is noisy, as some of the samples are tiny (only five games went to extras after a starter threw nine full innings) and this measurement definitionally favors mid-inning pitching changes (the more outs there are when a reliever enters, the fewer opportunities for them to be charged with runs). Nevertheless, the results validate this theory. The three longest durations in the data also have the lowest ERAs. Every start length of 5.2 innings or more leads to an aggregate 3.80 ERA or better from the bullpen; all but one duration of 16 outs or fewer yields a reliever ERA over 3.90. If you regress the bullpen’s performance on when they were called into duty across every game in the dataset, each additional out a starter records is associated with a four-point improvement in bullpen ERA, which equates to 12 points of ERA improvement per inning.
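If you want to follow along at home, the aggregation and the regression boil down to something like the sketch below. The frame and column names are my own shorthand for the scraped linescores (one row per team-game), with the baseballr pulls themselves omitted:

```r
library(dplyr)

# Assumed frame `games`, one row per team-game from the scraped linescores:
#   starter_outs - outs recorded by the starting pitcher
#   bullpen_er   - earned runs charged to the relievers
#   bullpen_outs - outs recorded by the relievers

# Leaguewide bullpen ERA by start length (ERA = 27 * ER / outs)
bullpen_by_depth <- games %>%
  filter(bullpen_outs > 0) %>%   # complete games leave nothing to measure
  group_by(starter_outs) %>%
  summarise(n = n(), era = 27 * sum(bullpen_er) / sum(bullpen_outs))

# Per-game regression behind the rule of thumb
fit <- lm(27 * bullpen_er / bullpen_outs ~ starter_outs,
          data = filter(games, bullpen_outs > 0))
coef(fit)["starter_outs"]   # ~ -0.04: four points of bullpen ERA per out
```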
To pick out the two most-common start lengths, 41 percent of MLB starts over the last three years lasted either 15 or 18 outs. The league-wide bullpen ERA when the first reliever enters after five innings is 4.04. After six innings, it drops 33 points, to 3.71. The concept isn’t rocket science. It’s almost too tautological to be interesting. But people usually conceptualize saving the relievers’ bullets as an abstract aspiration, not as a managerial decision with a tangible impact. The freshness and flexibility of the bullpen is not a side effect of optimizing pitching strategy. It is a key component of it.
Obviously correlation is not causation, and it’s possible that I am getting this relationship backward. A starter going deep into a game presumably means they pitched well enough to keep it close, so the back-end arms will follow; an early hook is a sign of getting shelled, and the manager will turn to their less-reliable arms if the game appears out of reach. This is an important caveat. It’s also reductive as a full explanation. The median five-inning starter over the last three years allowed only two earned runs, and 72 percent of those starts featured three or fewer, so going to the bullpen in the sixth does not mean throwing in the towel. Conversely, in an age when calling for a reliever is seen as the risk-averse default option, leaving the starter in could be a sign that the score is not close enough for the lead to be in jeopardy. And even if the relationship between start length and game leverage were straightforward, it would be mitigated (albeit not fully balanced out) by the likelihood of the other team optimizing matchups in close games or resting their starters in blowouts.
A subtler but less-dismissible sample bias is that both metrics are instruments for roster strength. Teams’ winning percentages, bullpen ERAs, and rotation workloads are all strongly correlated with one another. But how many innings the rotation throws is also a function of starter quality: the more effectively they pitch, the longer they stay in the game. Better rotations lead to better teams, and better teams build better bullpens. A rebuilding club won’t put as much effort into either acquiring relievers or optimizing their usage. Which leaves us with a chicken-and-egg quandary. Do longer-lasting starters tend to have better relievers behind them? Or does reducing reliance on the bullpen let the manager use their weapons more effectively, which in turn helps them win games? I’m convinced that it’s at least partly the latter, though I acknowledge that the uncertainty leaves a looming caveat over my argument.
On the flipside, these numbers may undersell the first-order impact of start length on reliever performance due to gameplanning. Unexpected short hooks happen. Pitchers get hurt or shelled. But the manager has an idea of how deep they expect the starter to go: when they usually get tired, what their target pitch count is, and which matchups get dangerous the third (or even second) time through the order. Gone are the days of just sending a mop-up guy out to wear it (or throw seven perfect innings) when things get out of hand. The coaching and pitching staffs are presumably on the same page about the gameplan, and the fruits of that preparation are already baked into the data. If you split up two-inning-or-shorter starts by earned runs allowed as a quick proxy for which early exits were due to poor performance and which were designed openers, relief ERA after the starter gave up four or more runs was 71 points higher than in planned bullpen games.
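For reference, that split was nothing fancier than the sketch below, with starter_er as another assumed column in the same frame and the four-run cutoff as my rough proxy:

```r
# Two-inning-or-shorter starts, split by the earned-run proxy: four or more
# runs allowed reads as getting shelled, fewer as a planned short outing.
games %>%
  filter(starter_outs <= 6, bullpen_outs > 0) %>%
  mutate(exit = if_else(starter_er >= 4, "shelled", "planned")) %>%
  group_by(exit) %>%
  summarise(bullpen_era = 27 * sum(bullpen_er) / sum(bullpen_outs))
```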
Let’s take the numbers a step further. Given the rate at which relievers give up earned runs and how many innings they have to cover, we can calculate expected bullpen runs allowed based on how many outs the starter recorded. I will do this in two ways: one based on the raw results (which I will call observed), which is probably preferable for start durations with the largest sample sizes; and one based on the regression-derived rule of thumb that each starter inning is worth 12 points of reliever ERA (hereafter smoothed), which I would recommend when comparing less-common outing lengths.2
To explain what this means: the leaguewide 3.71 bullpen ERA when a starter goes six innings works out to 0.14 earned runs per out. With nine outs left to cover, you’d expect the bullpen to give up 1.24 runs per game. If the relievers sustained that pace over 12 outs, they would allow 1.65 earned runs. But since the post-five-inning ERA is 4.04, a bullpen covering four innings would instead allow 0.15 earned runs per out, which translates to 1.80 expected runs after the starter is pulled. Thus leaving your starter in for the sixth inning shaves an average of 0.56 runs off of what the relievers would allow. The smoothed difference is a little more conservative, with the math coming out to half a run.
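The same arithmetic in code, using the observed ERAs from above:

```r
era_after_5 <- 4.04   # bullpen ERA when the first reliever enters after 5 IP
era_after_6 <- 3.71   # ...and after 6 IP

(era_after_6 / 27) * 9                             # ~1.24 runs over 9 outs
(era_after_6 / 27) * 12                            # ~1.65 at that pace over 12 outs
(era_after_5 / 27) * 12                            # ~1.80 at the post-fifth pace
(era_after_5 / 27) * 12 - (era_after_6 / 27) * 9   # ~0.56 runs saved
```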
Putting ballpark numbers to this idea opens the door to strategic optimization. When a pitcher has made it through n innings (and isn’t fatigued to the point of injury risk), the manager should go to the bullpen if and only if they expect their starter to give up more runs in the [n+1]th inning than the relievers would if they had to stretch for those three extra outs. Staying with the sixth inning, the cost of going to the bullpen after 15 outs (0.56 earned runs) implies a breakeven point of a 5.04 ERA for leaving your pitcher in the game. In other words, if a starter can continue to pitch at a level befitting a sub-five ERA, you are better off letting them pitch the sixth inning than going to a league-average bullpen.
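As a rule of thumb, the breakeven is just nine times the marginal bullpen runs saved, since those three extra outs are one inning’s worth of work:

```r
# Breakeven starter ERA for the (n+1)th inning: leave him in if his expected
# performance over those three outs beats 9x the bullpen runs saved.
breakeven_era <- function(bullpen_runs_saved) 9 * bullpen_runs_saved
breakeven_era(0.56)   # observed sixth-inning estimate -> 5.04
breakeven_era(0.50)   # smoothed estimate -> 4.50 (the text's 4.51, before rounding)
```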
A 5.04 ERA is not a high bar for a big-league-caliber pitcher to clear. Even the more-conservative smoothed estimate yields a 4.51 breakeven ERA, which is notably mediocre against the 2024 league average of 4.08. More to the point, both figures are much higher than the 4.14 ERA starters posted in the sixth inning last year. This disparity suggests that modern MLB teams should leave their starters in longer — that the pendulum has swung too far, and preemptively pulling the pitcher merely defers (rather than prevents) your vulnerability on the mound. It is quantitative evidence, not mere aesthetic preference or Smoltzesque curmudgeonry, that managers are going to their bullpens too early.
Of course pitcher usage reverberates beyond a single game. Using a reliever today means they may be diminished tomorrow, or perhaps not available at all. If this theory were true, we would see the effect linger, with a fresh (or hampered) bullpen serving as a boost (or an obstacle) at least into the next day. And in fact we do:
Across all instances in the last three years in which a team played two games on back-to-back days (excluding doubleheaders and mid-game postponements or resumptions), a simple linear regression reveals that each inning a starter pitches today lowers their team’s relief ERA by seven points tomorrow. Assuming an average next-day start length of 15.5 outs, this translates to 0.03 runs of next-day benefit per rotation inning.3 That doesn’t sound like much, but a player who provided such value every game would be worth about half a WAR (and a mid-seven-figure salary) over a full season. This finding also supports start length being more than a mere latent variable for roster quality: while the connection with next-day performance makes sense for an instrument of team strength, the persistent-but-smaller-impact pattern is more befitting of it as a direct causal factor.
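For those checking the math, the conversion is straightforward:

```r
# Seven points of next-day bullpen ERA, converted to runs, assuming the
# average next-day start length of 15.5 outs cited above:
era_effect_per_inning <- 0.07               # 7 points of ERA per inning today
next_day_bullpen_ip   <- (27 - 15.5) / 3    # ~3.8 innings for tomorrow's pen
era_effect_per_inning / 9 * next_day_bullpen_ip   # ~0.03 runs per inning
```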
A manager navigating pitching changes in a close game may not care about tomorrow’s expected values. I hope you’ll be similarly uninterested in the specifics of the peaks and valleys in the chart below, where comparing pairs of small-sample aggregations creates a veritable turducken of noisy data. (I considered displaying only the smoothed results for durations outside the higher-confidence window to make the visualization less distracting, though I was afraid that would look like I had distorted the data. I did manually reset some uninterpretable negative ERA values to zero, though.) But we can use this data to answer questions like: If the goal is to optimize run prevention for both the current game and tomorrow’s, what is the breakeven level of expected performance at which point a team should ask today’s starter to get them three more outs?
Returning to our recurring hypothetical, if you factor in the ripple effects on the next day’s game, the range of estimated breakeven points for leaving your starter in for the sixth inning increases to 4.78 (from the fully smoothed estimates) at the low end and 5.65 (based on the raw results) at the high end. While you could theoretically construct a frequency distribution for pitcher stamina by which teams’ behavior represents a rational equilibrium within those thresholds, MLB starters’ collective 4.14 ERA in the sixth inning last year — nearly two thirds of a run shy of the most conservative estimated breakeven point, and a run and a half below the most-empirical one — strongly suggests that managers are going to their bullpens too soon.
If you buy these results — and even if you question my math, I think the directional impact is intuitive — there are at least three separate implications for how we think about baseball.
The first is that teams should let their starters pitch deeper into games. Obviously real-life decision-making would consider the specific circumstances, like how close the ongoing game is, how fresh your relievers are, and who’s warming up in the bullpen. You could also say that my focus on the sixth inning, a fulcrum point for whether a manager needs to cover innings or whether they can follow their game script, is cherry-picking. (Earlier pitching changes may be either preplanned or out of necessity; later in the game, the math of going to the bullpen gets more favorable.) Yet this is when these decisions happen. Last year starting pitchers averaged 5.1 innings, and lasted between 15 and 18 outs 53 percent of the time. The sample sizes in this range are relatively robust, which is important given how noisy the results are. And the broad takeaway, that there is a tangible cost to using the bullpen more than you have to, is not limited to a specific point in the game.
The second is that conventional value metrics underrate innings-eaters. Wins Above Replacement is a function of playing time: every inning pitched by a big-league-caliber player is an inning not delegated to an emergency call-up. But while dipping into the eponymous replacement-level pool is frictionless in theory, actual players take up roster spots. Shuttling in a fresh arm to help out a tired staff means sending another reliever down for two weeks, and you can’t make a waiver claim in the fourth inning because your starter got shelled in the third. If there is tangible second-order value in lightening the bullpen’s workload, then starters deserve credit not just for the innings they pitch but for the innings their teammates don’t have to pitch.
Finally, if teams found this convincing and wanted their starters to work deeper into games, there are ways to facilitate that. Reemphasizing the art of inducing weak contact to get outs faster, pocketing your secondaries in the early innings so you can give hitters a different look the second or third time through the order, and modulating your effort level to reduce stress on the arm would all put starters in better positions to last through the sixth inning — and also run counter to the sport’s well-documented trends towards higher velocities, more breaking balls, and a first principle of generating whiffs. It is a clearer throughline down the same corrective path I envisioned for playoff bullpen overuse. Could the end of the increasing-strikeout era be nigh? As poorly conceived as the league’s six-inning-minimum proposal was, they got one thing right: starters pitching deeper into games would lead to more balls in play. At least until the later innings, when the bullpen door finally swings open and a more-selective group of better-rested relievers comes in looking even nastier.
I’m not suggesting we rewind pitcher usage by decades. Starters averaged over six innings per outing as recently as 2011. Even in 2017 the average rounded to six instead of five. But I believe the league has blown past the optimal point of pitch-count and times-through-the-order caution, and the runs saved in the rotation are actually borrowed with interest from the bullpen. If nothing else, my suggested takeaway from these numbers is: When in doubt, leave the starter in. Then these overworked bullpens may finally get some relief.
1. A methodological note: In theory, using total runs is more appropriate for analyzing in-game strategy than including only earned runs. Here this is confounded by the stupidest rule in baseball: the extra baserunner who starts on second base in the 10th inning or later, and counts as an unearned run if they score. Thus RA9 is biased against deep-start situations, as a larger proportion of the bullpen’s workload is tainted by the so-called “Manfred man.” (For example, relievers who come in after 27-out starts, when every inning they pitch features a ghostrunner, had a 1.29 ERA but a 10.29 RA9.) I ran all these calculations in parallel using RA9 and got similar results when I excluded extra-inning games, but having to add that filter felt just as wrong as using the more-familiar ERA.
2. The smoothed results tend to skew slightly higher, mirroring the distribution of single-game ERA values on which the model is based. Regressing aggregated ERA on start length and weighting by the sample size of each increment yielded a more visually pleasing trendline and a more dramatic result (six points of bullpen ERA per starter out), but that felt like cherry-picking.
3. Holding the number of bullpen outs constant for tomorrow reduces the additive impact from what we see within the same game. This is also why the tomorrow-only trendlines appear so much flatter.