Post #36 – Endless Fields of Green

Intrinsic Value [Rethinking Player valuation systems]

January 17, 2024

Baseball is 90 percent mental. The other half is physical. - Yogi Berra

1. Introduction

On January 23, 2024, the Baseball Hall of Fame [HOF] will announce which players are being inducted into the 2024 HOF class [manager Jim Leyland has already been selected]. Until the announcement, most of the discussion about these inductions will concern two subjects: 1) how the eligible players are doing on Ryan Thibodaux’s Hall of Fame Tracker and 2) how these eligible players measure up under the Wins Above Replacement [WAR] evaluation system. While writing this blog, I have used the WAR method myself to evaluate players, even though I could not figure out a player’s WAR to save my life.* This is, to say the very least, sub-optimal. In this essay, I will look at the history of how WAR became: 1) so widely disseminated and 2) the pre-eminent Baseball player evaluation system, especially over Bill James’ Win Shares [WS] system. Every Baseball player evaluation system has shortcomings. If you are going to use such a method, an understanding of the system’s weak spots should be required. As far as WAR goes, there are definitely some ghosts in the machine. The flaws inherent in the WAR system may even be large enough to deny the qualified and elect the unworthy to the Baseball Hall of Fame. Thus, knowledge of WAR’s defects should be a pre-requisite before using it to evaluate any player for any reason.

*Figuring Slugging Percentage: basic Baseball math. Figuring WAR: Baseball physics.

2. A Brief History of Win Shares and WAR

In 2003, the baseball writer Bill James, the godfather of Sabermetrics [applying the scientific method to Baseball], published a book titled Win Shares. The Win Shares system was an all-encompassing player evaluation system. In his book, James explained his system and then got to the fun part: ranking all the players. Meanwhile, in the 1990s, a Baseball annual called the Baseball Prospectus had begun publication [it was inspired by Bill James’ own annual Baseball Abstracts of the 1980s]. In these Baseball Prospectus books, Keith Woolner developed his own player evaluation system which he called VORP [Value Over Replacement Player]. In subsequent Baseball Prospectus annuals, the VORP system mutated into WARP [Wins Above Replacement Player] and then into WAR itself. In the realm of ideas, these two player evaluation methods, Win Shares and WAR, competed in the early 2000s. And WAR won this competition rather easily. One of the main reasons that WAR became the preferred method for evaluating Baseball players was simple accessibility. In 2009, the WAR system results were adopted by and included with every player’s statistical profile on the Baseball Reference website [baseball-reference.com]. Baseball Reference had been founded in 2000 and, by 2005, had made all the printed Baseball Encyclopedias [BE] obsolete.* From 2009 on, WAR was right there on the source where virtually every Baseball fan goes to get their statistical fix. Meanwhile, the results of the Win Shares system were not readily accessible [though they could eventually be accessed on billjames.online]. Perhaps the first question to ask is: Did WAR win out because it was a better system than Win Shares or just because it was right there at your fingertips?

* The last Macmillan BE was published in 1996 and the last Total Baseball BE in 2004.

3. Explaining Win Shares

Bill James believed any system for player evaluation needed to be grounded in actual results. If the Boston Red Sox won 105 games in 1912, the players on that 1912 Red Sox team needed to each be credited with an individual share of those 105 wins. For reasons that escape me, James then decided to triple those 105 wins so that the 1912 Red Sox players had 315 partial Win Shares to be split amongst themselves. Using statistical formulas to measure each player’s defensive and offensive contributions, James then did precisely that: he credited each player on the 1912 Red Sox with their share of the 315 partial wins.* Critics of the Win Shares system immediately pointed out that the players should have also gotten some discredit for the 47 losses suffered by the 1912 Boston Red Sox too. James would later explain that the players did receive won-loss records, but these records were then simplified into a single number. After all, a single number to rank players was the objective. Bill James did his WS calculations for every Major League team throughout history. By far the most important thing about this system was that you couldn’t see the sausage being made [which, of course, is also even more true of the WAR system]. When evaluating Baseball players, the offensive side is the much easier task. For the most part, each batter hits and produces runs without any help [disregarding the small gray areas of driving runners in, sacrifice outs, intentional walks and so forth]. In Baseball, batting or offensive contributions are easy to measure. But the defensive side is quite a bit harder. How much of pitching is just team defense? How much of defense is just good (or bad) pitching? How much defense is simply illusions of the park, defensive schemes of the team, whether the pitcher is left-handed or right-handed, player positioning, catcher framing, elective plays, etc? 
Perhaps the best recommendation of them all for the Win Shares system was Bill James’ apparently lifelong obsession with quantifying Baseball defense.

*Tris Speaker led the 1912 BoSox with 51 Win Shares [or 17.0 full wins]. Smokey Joe Wood was second with 44 [or 14.67 full wins]. But the WAR evaluation method disagrees and lists Smokey Joe first with 11.1 WAR followed by his good friend Speaker at 10.4.
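
For readers who want to check the tripling arithmetic, here is a minimal sketch in Python. It uses only the figures quoted above [105 wins, 51 and 44 Win Shares]; the function names are my own invention for illustration and are not part of James’ published method, which involves far more than this bookkeeping.

```python
# Illustrative arithmetic only; the figures come from the text above.
WIN_SHARES_PER_WIN = 3  # James credits three Win Shares for each team win


def team_win_shares(team_wins: int) -> int:
    """Total Win Shares available to split among a team's players."""
    return team_wins * WIN_SHARES_PER_WIN


def win_shares_to_wins(win_shares: float) -> float:
    """Convert a player's Win Shares back into full wins."""
    return win_shares / WIN_SHARES_PER_WIN


red_sox_1912_pool = team_win_shares(105)
speaker_wins = win_shares_to_wins(51)
wood_wins = win_shares_to_wins(44)

print(red_sox_1912_pool)       # 315
print(round(speaker_wins, 2))  # 17.0
print(round(wood_wins, 2))     # 14.67
```

The sketch says nothing about how the 315 shares get divided among the players. That division is the hard part, and it is the part you cannot easily see.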

4. Win Shares Evaluated: The Problem of Luck

However, the Win Shares method had a very large problem buried in its calculations. The basic flaw of Win Shares as a Baseball player evaluation tool can be demonstrated with a hypothetical. Imagine that the exact same player performs for two different teams. Just for fun, imagine that this player is the star of each of these teams [bats .300, hits 30 homers, drives in 100 runs]. Both teams score 700 runs but also give up 700 runs during a 162-game season. The Pythagorean theorem of Baseball [Bill James’ formula for estimating wins from runs scored and runs allowed] tells us that each of these teams should have finished with a record of exactly 81 wins and 81 losses. But what if one of these teams gets lucky? Despite scoring exactly as many runs as they gave up, the team finishes with a record of 90-72. Meanwhile, the other team is unlucky and ends the season at 72-90. Random variations like these happen during Major League seasons all the time. Win Shares, which ties player value to the exact number of wins that his team gets, will conclude that the exact same player on these two teams is worth a completely different amount. In fact, the player on the lucky team will be judged to have been 25% better than the exact same player on the unlucky team.* This is nonsensical as a player evaluation, at least on a single season level. It is the exact same player and his value is only fluctuating by chance. But the accuracy of the Win Shares system obviously improves over multiple seasons and longer careers. Over time, luck washes out and Win Shares should become more and more accurate. Basically, Win Shares is a player evaluation system which is not very good at evaluating single seasons or short careers, but improves steadily over time with more seasons and longer careers. Strangely enough, I would have more faith in Win Shares’ evaluation of Willie Mays’ entire career than in its evaluation of any single season.

*In other words, the same exact player, such as our 30 HR/100 RBI/.300 star, would be given different Win Share amounts depending on the team’s finish. If his share of the team total stays constant, that is about 26.7 WS with the 72-90 team, 30 WS with the 81-81 team, and about 33.3 WS with the 90-72 club [33.3 is 25% more than 26.7]. But, in each case, he is exactly the same player.
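
The luck problem can be sketched in a few lines of Python. Two assumptions are mine and not anything from James’ book: the classic exponent of 2 in the Pythagorean formula [refined versions use other exponents], and the idea that the same player would claim the same fraction of his team’s Win Shares on either club. The function names are made up for the illustration.

```python
GAMES = 162
WIN_SHARES_PER_WIN = 3


def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and allowed.

    The exponent of 2 is the classic form and is an assumption here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def team_win_shares(team_wins: int) -> int:
    """Total Win Shares a team's players split among themselves."""
    return team_wins * WIN_SHARES_PER_WIN


expected_wins = pythagorean_win_pct(700, 700) * GAMES

unlucky_pool = team_win_shares(72)
expected_pool = team_win_shares(81)
lucky_pool = team_win_shares(90)

# If the same player claims the same fraction of each pool, his Win
# Shares scale with the pool, so this ratio is also his ratio.
lucky_vs_unlucky = lucky_pool / unlucky_pool

print(expected_wins)                            # 81.0
print(unlucky_pool, expected_pool, lucky_pool)  # 216 243 270
print(lucky_vs_unlucky)                         # 1.25
```

Nothing about the player changes between the 216-share team and the 270-share team. Only the size of the pie does.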

5. Another problem with Win Shares

And there is another major problem with Win Shares. Unlike the problem of random luck explained above, there have been (as far as I know) no articles addressing this flaw anywhere. Win Shares seems to underrate pitchers badly. This doesn’t necessarily mean that Win Shares is wrong in its evaluation of pitchers. But Win Shares evidently believes that pitchers are worth far less than does the WAR system or the most basic Baseball player evaluation system of them all, the Hall of Fame. In the Baseball Hall of Fame right now, 343 people have been elected. This breaks down as 84 pitchers, 186 position players, 40 executives, 23 managers, and 10 umpires. In other words, the Hall of Fame believes that 84 of the 270 best Baseball players of all time were pitchers [31.1%]. Of the top 270 players rated by WAR, 88 are pitchers [32.6%]. By WAR standards, the Hall of Fame seems to have a very slight bias towards hitters over pitchers. But the gap is well within any reasonable standard of random variation. Then we have Win Shares. Of the top 270 players rated by Win Shares, only 49 players are pitchers [a meagre 18.1%].* The question of whether WS is correct in downgrading all pitchers is beyond my mathematical capability. But, to say the least, it certainly feels completely wrong. On top of that, WS also seems to heavily favor pitchers from long ago and far away over their modern brethren. So, the question becomes: Why does the Win Shares system underestimate pitchers so badly and penalize modern pitchers even worse?

*It would be 50 pitchers if you counted John Ward, but Ward is not counted as a pitcher by either the Hall of Fame or WAR system. Of course, there is no easily accessible list of the current career WS leaders, so I may have missed a pitcher (or two at most). But, even if I did, the poor showing of pitchers in the WS system would still be remarkable.
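
The percentages above are simple enough to verify. Here is a short Python sketch that uses only the counts quoted in this section; the variable and function names are hypothetical, and the career lists themselves [especially the hand-compiled Win Shares list] are taken as given rather than checked.

```python
# Hall of Fame membership as broken down in the text above.
hall_of_fame = {
    "pitchers": 84,
    "position players": 186,
    "executives": 40,
    "managers": 23,
    "umpires": 10,
}

total_members = sum(hall_of_fame.values())
total_players = hall_of_fame["pitchers"] + hall_of_fame["position players"]


def pitcher_share(pitchers: int, players: int) -> float:
    """Pitchers as a percentage of a player list, to one decimal place."""
    return round(100 * pitchers / players, 1)


hof_share = pitcher_share(84, total_players)  # Hall of Fame
war_share = pitcher_share(88, total_players)  # top 270 by WAR
ws_share = pitcher_share(49, total_players)   # top 270 by Win Shares

print(total_members, total_players)    # 343 270
print(hof_share, war_share, ws_share)  # 31.1 32.6 18.1
```

Even if a pitcher or two were missed in compiling the Win Shares list, the share would only rise to about 18.5% or 18.9%, still far below the other two.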

6. Win Shares and the Question of Defense

The only way Win Shares can underestimate Pitcher Value [PV] would be to minimize the pitcher’s share of defensive credit. On the most basic level, the game of Baseball is 50% offense and 50% defense. One is the yin to the other’s yang. Offensive credit is easy to give. The production of offense in Baseball is easily broken down into individual units. But defense in Baseball is always a team effort that begins with the pitcher. Defensive credit is almost always shared. For example, a pitcher throws a good pitch in on a batter’s hands, breaks his bat, and induces a weak pop-up. The pop-up could be easily caught by the second baseman. But the shortstop runs over, calls for it instead, and elects to make an easy catch himself. Who gets credit? Obviously, the pitcher should get the lion’s share for this play. But what about the shortstop and second baseman? In many cases, defensive plays are elective plays. How do you cut up defensive credit for elective plays? In another example, the pitcher gives up an absolute rocket and the center fielder makes an outstanding play running it down. Who gets credit on this play? In this case, it’s the center fielder, not the pitcher, that should get the lion’s share of the credit. Every Baseball player evaluation system must decide how to apportion defensive credit, especially between pitchers and all other defensive players. Justin Verlander, the most accomplished of all the currently active pitchers, ranks exactly 63rd by WAR right now, but just 306th (more or less) by Win Shares.* WAR credits Verlander with being an inner circle Hall of Famer. Win Shares believes Verlander to be a very marginal HOF candidate. Does that WS result feel correct? It certainly feels wrong to me. If it is incorrect, there is very obviously a defensive apportionment problem with the WS system. WS may have shifted too much credit for the defense from the pitchers to the other players. 
It also means that all the other players, except for the pitchers, have had their defensive value increased across the board. It is also not the only problem with the WS defensive evaluation scheme. Every Baseball player evaluation system has to also address the Timeline problem.

*The source for career Win Shares is the Bill James website. To put it bluntly, the Win Shares finder there is a mess. It has everything from rampant mathematical mistakes to more serious errors like the two different Elmer Smiths having their careers combined. Verlander may actually be 307th or even 308th (I compiled the career WS list as carefully as I could but who knows).

7. The Pitching Timeline Problem

Throughout the history of Baseball, starting pitchers have thrown harder and harder and, because of this fact, have also hurled fewer and fewer innings. In the 19th Century [1871-1899], starting pitchers threw hard but probably almost never threw all out. Some pitchers would throw over 500 (even 600) innings in a season. But they still needed to save their arms. It would be fascinating to know how hard they threw. Were they consistently over 80 miles per hour [mph]? They were obviously throwing hard enough that the distance from the pitching box [later mound] was twice moved back by about 5 feet. In 1893, the traditional distance of 60 feet and 6 inches from pitching rubber to home plate was established. Innings pitched [IP] fell below 400 and continued to drop. In the early 20th century, there were obviously starting pitchers who threw above 90 mph. But they paced themselves, throwing hard in a pinch. However, as innings continued to drop, these pitchers threw harder more consistently. By the end of the 20th century, pitchers could lead the league with only 250 IP or so. In the 21st century, the league-leading IP totals dropped even further to just over 200 IP. Now, virtually every pitcher (starting or relieving) is throwing the ball as hard as humanly possible all the time, even over 100 mph. Were the “Old-Innings-Eating” pitchers of yesterday more valuable than the “Better-to-Burn-Out-than-Fade-Away” flamethrowers of today? By WAR and Win Shares, the ancient pitchers are evaluated as much more valuable than any modern pitcher. But the fact is that those ancient pitchers relied much more on their defense than today’s hurlers. A modern ace, such as Justin Verlander, deserves a much greater percentage of the defensive credit for his pitching than a 19th Century Hall of Famer like Charles Radbourn. In other words, modern pitchers are most surely getting shortchanged because the pitcher’s percentage of the defensive credit should not be constant over time. 
Of course, in evaluating any large group of Baseball players, there will always be timeline issues.

8. Explaining WAR

WAR is a complex statistical calculation that assigns each player a value after making determinations of that player’s contributions on Offense [batting] and Defense [pitching and fielding] and then subtracting the Replacement Value [RV] both offensively and defensively. The replacement value is yet another determined number and represents how much value a completely fungible or replaceable player would have. Once all these calculations are made, WAR adds up the player’s offense and defense for a season or a career, subtracts the replacement value, and gives the player his seasonal or overall score. In WAR, unlike Win Shares, it is possible to get a negative score (the player is worse than some random minor league player who could replace him). Of course, all this needs to be taken on faith. The inventors of this system are not even letting anyone know what is in the sausage, much less how it is being made. In fact, WAR is separated into offensive and defensive WAR on Baseball Reference. But you cannot just add these two figures up to get the overall WAR. Each figure includes a duplicated RV. It is almost like they don’t want anyone to check their work. The calculations underlying WAR are based on formulas that make the assumption that certain truths are unchangeable [such as replacement value and the value of the defensive spectrum, i.e. which positions are more or less valuable]. However, there is a problem with codifying your formulas about Baseball. Baseball mutates over time. The Replacement Value and the defensive spectrum valuations are not applicable throughout the ages and eras of Baseball.* This can be a major problem with the system.

*Third Base is the position usually used to illustrate how positional RV changes over time. In the early days of Baseball, third basemen needed to be quick and agile to field the numerous bunts of the time. Basically, they were good fielding players with the arm but not the range to play shortstop. But as home run hitting took over the game, bunting went all but extinct. Third basemen became players with the arm to play right field but athletic enough to also play the infield. The Replacement Values for these two sets of talents are completely different.

9. WAR Evaluated: The Problem of Compounding Errors

Basically, the main problem with WAR is the problem of Compounding Errors [CE]. WAR has so many calculations that, if the inherent errors do not cancel each other out, these flaws can compound each other and arrive at a truly odd result. The problem of these Compounding Errors is pretty much entirely on the Defensive side. The Designated Hitter penalty is a good example of one of these CEs. Of course, Designated Hitters [DH] hit for the Pitcher and do not play the field. They have no defensive value at all. Because of this, WAR gives the DH position a massive penalty, assuming DHs need to hit a ton to justify their value. But, in real life, the DH is a much harder job than it looks. Players with the proper focus to be DHs are hard to find. Usually, First Basemen [1B] hit better than DHs. But 1Bs are not penalized by WAR as badly as DHs. Over a career, this penalty compounds and a career long DH would be completely undervalued next to an equivalent 1B. WAR also seems to overvalue the top of the defensive spectrum (CF, SS and 2B, ignoring catchers). WAR’s undervaluing of poor defensive players and overvaluing of good defensive players can be demonstrated by two outfielders on this year’s HOF ballot. Gary Sheffield was admittedly not the best outfielder. But WAR compounds his poor defense until arriving at the odd conclusion that he is twice as bad as a normal DH. Meanwhile, Andruw Jones, a great center fielder [CF] who got fat and lost his range mid-career, rates as the best CF of all time. Although personal opinions make for bad arguments, I watched both men play often. I don’t remember Sheffield as the equivalent of a cripple in a wheelchair playing Baseball. I also don’t remember Andruw Jones, even at his peak, as the Baryshnikov of the outfield. He was very good until he gained the weight. But I never thought that he was incredibly better (or even better) than other CFs that I watched such as Paul Blair, Garry Maddox, Jim Edmonds, or Kevin Kiermaier. 
But WAR has, for better or for worse, adopted calculations that will excessively penalize poor defense and unreasonably overvalue good defense. Over a long career, this difference can compound and lead to some truly odd player evaluations.

10. WAR and another Defensive Problem

Other than Compounding Errors, there is another problem with Defense that needs to be addressed. Defensive value is often credited to a player although the player himself basically does not deserve it. An example would be the recently banned (or curtailed) shift. In the classic shift, an infielder is shifted onto the other side of the infield or into the opposite-side outfield between the outfielders and the infield. Basically, the shift is designed to thwart pull-happy hitters. A shifted infielder will generally make far more plays than an unshifted infielder. Player evaluation systems will then credit this infielder with all these extra plays, and he will seem to be a much more valuable player. But is it an individual or a team value? If the same exact player plays for a team that refuses to shift, does the player lose value? Is he a worse player? The answer would seem to be No. His “intrinsic value” as a Baseball player has not changed but his team value has risen. Another example would be the art of “pitch framing” by catchers. Some catchers, by the way they receive the ball, can trick the umpire into calling borderline balls as strikes. Is this valuable to the team? Absolutely. But is it an “intrinsic value” for catchers? No, it is an umpire weakness. Relatively soon, the art of pitch framing will probably be all but extinct after the Major Leagues adopt video reviews of strike calls. An “Intrinsic Value” cannot simply be erased. And then there is the problem of “elective defense.” Imagine your team has a great center fielder [CF]. This CF gets great jumps on the ball. He glides over to catch the ball. Many of these balls could be caught by either the right [RF] or left fielder [LF]. But, because this CF is the best outfielder on the team and is also considered the “captain” of the outfield, the CF consistently calls off the RF and/or LF and “elects” to catch the ball. 
Under Player Evaluation Systems, the CF gets all the extra credit for this “elective defense.” But does the CF deserve credit for these plays? On a team with a bad CF, these plays still get made, just by the RF or LF. In other words, a lot of defensive credit needs to be taken with a grain of salt.

11. Intrinsic Value

A decent Baseball Player Evaluation system should take these types of Defensive Value caveats under consideration. Defense is a team, not an individual value. Teams can even make choices that directly “devalue” their players under these evaluation systems. Last season [2023], there was a classic example of this type of situation. The St. Louis Cardinals promoted their prize rookie, Jordan Walker, to the Major Leagues. Walker had been a third baseman for his entire career but was blocked in St. Louis by Nolan Arenado, a probable future Hall of Famer. The Cardinals decided to move Walker to the outfield and let him play right [RF]. Walker was a decent hitter, and WAR credits him with 1.6 WAR on offense. But, with very little experience in the outfield, Walker was brutal out there in the sun field. On defense, WAR gives Walker a negative 2.1 WAR. In other words, a team decision made Walker a below average player according to WAR. If the Cardinals had posted Walker at third base [3B], it is very likely that Walker would have had a positive defensive WAR and been, by the Player Evaluation system’s estimation, a much more valuable player. Walker’s WAR rating definitely devalued him, but his “intrinsic value” as a Baseball player remains. The great Pete Rose is another player whose value is underrated in this way. Rose began his career as a 2B [1963-1966]. Then he played LF [1967, 1972-1974] and RF [1968-1971]. He moved to 3B [1975-1978] before finishing out his career as a 1B [1979-1986]. By WAR, Rose was about average at 2B, poor in RF, good in LF, brutal at 3B, and then washed up at 1B. In each case, Rose switched positions to help his team. But WAR gives him no credit for this. According to WAR, the versatility of Rose is not an asset. It is obvious that WAR would consider Rose a more valuable player if he had just played 2B or LF for most of his career. 
Perhaps someday a Player Evaluation system will be designed that can measure a Baseball Player’s Intrinsic Value and then credit Pete Rose for his versatility. But, until the supercomputer capable of making these calculations is invented (and programmed), Intrinsic Value will just remain the ideal.

12. Conclusions

Every Baseball player evaluation system has its flaws. Win Shares bases its evaluation method on games won during the season. Because simple luck can play a large part in this annual total, individual player Win Share totals for each single season can contain large errors. However, the Win Shares system improves its accuracy as more and more seasons of any player are included, and luck washes out. Win Shares also has the problem of devaluing pitchers’ contributions to the team across the board. Every pitcher, but especially the modern pitcher, is not credited with his full value. This missing defensive value is spread out among the rest of the team’s players. In other words, the defensive value given to every player, except the pitchers, by the Win Shares system should be reduced. On the other hand, the WAR system for evaluating Baseball Players is almost surely better at judging individual seasons. But WAR uses set formulas to generate its player evaluations and some of these formulas are off by a degree or two. In some cases, the errors in these various formulas compound with each other and the player career evaluations can get completely out of whack. It is also apparent that WAR overzealously rewards good to great defensive players while punishing poor to bad defensive players like they stole WAR’s wallet. Someday, a properly programmed computer will be able to correctly value Baseball players. Until that time, it is probably best to simply use every available tool to look at each player. But it may already be too late. The WAR system is ascendant and the Win Shares system is about to be thrown on the ash heap of history.* On January 23, 2024, several Baseball players may be elected to the Baseball Hall of Fame because their WAR score is good although their “Intrinsic Value” is not quite as high. It’s a damn shame.

*The Win Shares system seems to be going the way of the dodo bird. Bill James just published his last “Bill James Handbook” annual [the 2024 Walk-Off Edition]. His Bill James Online site, the only place, as far as I know, to get updated Win Share information, is shutting down (James seems to be retiring). Win Shares has conceded the field to WAR.

NEXT POST: Evaluating the 2024 Baseball Hall of Fame Candidates by using a combination of the WAR and Win Shares Player Evaluation Systems while trying to also consider the Player’s actual “Intrinsic Value.”

FUTURE POSTS: Exploring the concept of Intrinsic Value: 1) Who was a better player, Joe DiMaggio or Stan Musial, and 2) Who was the best player out of David Ortiz, Manny Ramirez, and Gary Sheffield?
