The Futility Infielder

A Baseball Journal by Jay Jaffe

I'm a baseball fan living in New York City. In between long tirades about the New York Yankees and the national pastime in general, I'm a graphic designer.

Tuesday, April 25, 2006

 

Da Doo Run Run

The latest Hit List is up at Baseball Prospectus, and for the first time in its brief life, the Yankees top the List despite their rather unremarkable record of 9-8. How can that be? I discovered something very interesting in my attempt to figure out why.

As you (hopefully) know by now, the Hit List rankings are based on four components that make up BP's Adjusted Standings, namely the actual, first-, second-, and third-order winning percentages. First-order winning percentage is based on a team's Pythagorean record, that is, their expected winning percentage based on the number of runs they score and allow. Second-order winning percentage is their Pythagorean record based on the number of runs they should have scored or allowed given the run elements (hits, walks, total bases, etc.) their offense or pitching has racked up, and normalized to adjust for park and league scoring levels. Third-order is their Pythagorean record based on those same normalized run elements, but also adjusting for the level of competition based on Equivalent Average allowed.
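For anyone who wants to see the arithmetic behind a first-order winning percentage, here's a minimal sketch. It assumes the classic Pythagorean exponent of 2; BP's Adjusted Standings use their own variant of the formula, so this won't reproduce their figures exactly. The function name and the run totals are hypothetical, chosen only for illustration.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed.

    exponent=2.0 is the classic assumption, not necessarily the value BP uses.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season totals, not taken from any real team.
even_team = pythagorean_pct(700, 700)   # scores as many as it allows
good_team = pythagorean_pct(800, 700)   # outscores its opponents by 100

print(f"even team: {even_team:.3f}")
print(f"good team: {good_team:.3f}")
```

The second- and third-order figures plug estimated runs into the same kind of formula in place of actual runs.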

The Yanks' overall Hit List Factor, the average of those four percentages, is .662. That's based on the following component winning percentages:
          W     L     PCT
Actual   9.0   8.0   .529
1st     11.6   5.4   .682
2nd     12.9   4.1   .759
3rd     11.5   5.5   .676
What these four figures are telling us is that first, the Yanks should have a much better record based on the number of runs they've scored and allowed; that's because they've piled up tons of runs in their wins, which have tended to be lopsided, while not scoring much in their losses, which have generally been close. Second, based on their hits, walks, total bases and such, the Yanks should have scored even more runs and allowed even fewer; it's that .759 percentage that elevates their overall Hit List Factor above those of the White Sox, Tigers and Mets, teams with better actual records at this point.
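To check the math, here's a short sketch that recomputes each percentage from the win and loss columns above and then averages the four. It assumes the Hit List Factor is a plain unweighted mean, which is how it's described here; the variable names are mine.

```python
# (wins, losses) for each component, copied from the table above.
components = {
    "actual": (9.0, 8.0),
    "first_order": (11.6, 5.4),
    "second_order": (12.9, 4.1),
    "third_order": (11.5, 5.5),
}


def winning_pct(wins, losses):
    """Wins divided by total decisions."""
    return wins / (wins + losses)


percentages = {name: winning_pct(w, l) for name, (w, l) in components.items()}

# Assumed: the Hit List Factor is the simple mean of the four percentages.
hit_list_factor = sum(percentages.values()) / len(percentages)

for name, pct in percentages.items():
    print(f"{name:>12}: {pct:.3f}")
print(f"Hit List Factor: {hit_list_factor:.3f}")
```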

The implication, backed up by plenty of research from my BP colleagues and others, is that a team's record will come to more closely approximate these indicators given larger sample sizes. In other words, the Yankees are better than the .529 team showing up in the AL East standings, and in time they should find their winning percentage closer to .600, if not .676; they're not likely to maintain the breakneck pace of scoring 6.24 runs per game, or over 1,000 for the season, but 900 runs is a very real possibility.

Now, so far I'm simply rehashing old news as far as the way the Hit List works. What's interesting is that when I looked more closely, I found that in addition to having the largest difference between actual runs scored and allowed (35 runs, one more than the White Sox), the Yankees have the largest differential between OPS (On-Base Percentage plus Slugging Percentage) and OPS allowed, and by a much wider margin. Through Sunday the Yanks had put up an OPS of .888 and allowed one of .679, a 209-point differential. That's 59 points more than the Tigers' differential, 85 points more than the White Sox's, and 89 points more than the Mets', the three teams whom the Yanks narrowly edge for the top spot.

That OPS differential has huge predictive power when it comes to the Hit List. Based on this year, the correlation between OPS differential and Hit List Factor (as I mention in the Brewers comment) is .942. Based on last year's numbers, it's even better, .958. Correlations don't come much higher than that, though it's not a mystery why it works so well; OPS uses the same ingredients -- run components like hits, walks, and total bases -- in a simpler brew than the Hit List Factor components. As it turns out, raw run differential has an even higher correlation, .962 based on this year's list, .972 based on last year's numbers. Again, not a huge surprise except for the magnitude, which is near perfection within the realm of baseball's statistical correlations. Either differential makes an excellent proxy for the Hit List rankings.

The take-home message is that both run components and simply raw run differentials explain roughly 89 to 94 percent (the square of the correlation, depending upon which figure you choose) of a team's ranking on the Hit List; the rest, a considerably narrower slice of the pie, is a combination of park effects, situational performance, luck, randomness, mojo, moxie, gumption, intestinal fortitude, voodoo, santeria, and those all-important intangibles which Derek Jeter grows the way a hipster grows sideburns. It's the runs, people. Da doo run run, da doo run run.
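The squaring step is easy to run for yourself. This sketch takes the four correlations quoted above and squares each one to get the share of variance explained; the dictionary labels are my own shorthand.

```python
# Correlations with Hit List Factor, as quoted in the post.
correlations = {
    "OPS differential, this year": 0.942,
    "OPS differential, last year": 0.958,
    "run differential, this year": 0.962,
    "run differential, last year": 0.972,
}

# Squaring a correlation gives the share of variance it accounts for.
variance_explained = {label: r ** 2 for label, r in correlations.items()}

for label, r_squared in variance_explained.items():
    print(f"{label}: {r_squared:.1%} explained")
```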

• • •

Last week was a banner week for Hall of Fame discussions, at least where the Jaffe WARP Score (JAWS) system is concerned. Over at Beyond the Box Score, Marc Normandin checked in on Luis Gonzalez's case on the occasion of Gonzo's becoming the 21st player to reach 500 doubles and 300 homers, while Joe Sheehan evaluated the struggling Jim Edmonds both at BP and at SI.com, where BP content is occasionally running in syndication.

Alas, Sheehan -- as he informed me after a reader pointed it out, long before I even saw the article -- botched the methodology, using the old five-consecutive-year WARP total as Edmonds' peak score instead of the best-seven-years-at-large total which I instituted this year. By those measures, Edmonds came up short. But Sheehan also made another, more subtle error by using the numbers that are currently on Edmonds' player card at BP but comparing them to the positional averages from my Hall of Fame ballot article in December. Clay Davenport, creator of the WARP (Wins Above Replacement Player) universe of player valuation, is a notorious tinkerer, and from time to time without announcing it, he'll dump a whole new set of data -- generally based on a more refined calibration of the replacement level with regard to fielding responsibilities -- on an unsuspecting public. Usually, the changes are undetectable to the casual observer, but this time they're rather drastic as far as some players are concerned. Looking at Edmonds and the Hall CFs:
OLD          BRAR  BRAA  FRAA  Career  Peak   JAWS
Edmonds       565   381    96    95.8  68.0   81.9
Avg HOF CF    715   466    -8   108.6  63.8   86.2

NEW          BRAR  BRAA  FRAA  Career  Peak   JAWS
Edmonds       555   371   106    98.3  78.2   88.3
Avg HOF CF    731   478     0   108.8  63.4   86.1
The averages (which are actually computed by throwing out the lowest player at each position to correct for the Veterans Committee's mistakes) for CFs haven't changed more than a hair, but Edmonds' totals have risen, particularly with regard to his peak score. His best seasons are more valuable than previously calculated, which puts him above the JAWS standards already; in fact, they are better than those of all but Willie Mays, Ty Cobb, Tris Speaker, Mickey Mantle and Joe DiMaggio among HOF CFs. Among active players, only Ken Griffey Jr. ranks above Edmonds. That's pretty staggering, and I'm not entirely sure I'm ready to accept that without checking with Clay to get a better understanding of the update. The one relevant figure not shown here, Fielding Runs Above Replacement, has changed more significantly for Edmonds, rising to 328 from 301, but I don't have a handle on how the Hall averages have changed in that regard. Nonetheless, it's clear that Edmonds derives a good chunk of his value from defense; not every player with eight Gold Gloves to his name has the advanced metrics to back it up. By the time things are all said and done, he should have a very good case for the Hall of Fame even if he never reaches 400 homers or some other round-numbered milestone.
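For those following along at home, every JAWS figure in the tables above matches the average of the Career and Peak columns. Here's a minimal sketch of that arithmetic using the Edmonds and Hall CF lines. The function and variable names are illustrative, and because the table values are already rounded, a recomputed score can differ from the listed one by a tenth.

```python
def jaws(career_warp, peak_warp):
    """Average of career WARP and peak WARP."""
    return (career_warp + peak_warp) / 2


# (career, peak) pairs copied from the OLD and NEW tables above.
rows = {
    "Edmonds (old)": (95.8, 68.0),
    "Avg HOF CF (old)": (108.6, 63.8),
    "Edmonds (new)": (98.3, 78.2),
    "Avg HOF CF (new)": (108.8, 63.4),
}
scores = {name: jaws(career, peak) for name, (career, peak) in rows.items()}

# A positive gap means Edmonds clears the average Hall of Fame center fielder.
gap_old = scores["Edmonds (old)"] - scores["Avg HOF CF (old)"]
gap_new = scores["Edmonds (new)"] - scores["Avg HOF CF (new)"]

print(f"old gap: {gap_old:+.1f}")
print(f"new gap: {gap_new:+.1f}")
```

The sign flip between the two gaps is the whole story: same player, same standard, different data set.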

Looking at Gonzalez and the JAWS standards for leftfielders as gathered by Normandin (I'm still waiting for my data dump):
OLD          BRAR  BRAA  FRAA  Career  Peak   JAWS
Gonzalez      582   322    74    93.8  55.8   74.8
Avg HOF LF    745   470   -15   105.2  59.7   82.4

NEW          BRAR  BRAA  FRAA  Career  Peak   JAWS
Gonzalez      584   321    88   100.7  58.3   79.5
Avg HOF LF    784   506    -3   110.0  61.1   85.6
The JAWS score for Hall LFs has risen by a few points, as has that of Gonzalez. In terms of FRAR (Fielding Runs Above Replacement), Gonzo's total shot up even more than Edmonds', from 258 to 317, helping to give him more wins. He's a bit closer to the JAWS standards but still short of them, and he probably won't make it.

The take-home message here is that the HOF averages have moved, which means I need to recalibrate JAWS unless industrious souls like Normandin step forward. I'm tickled pink that people are actually using a system I devised, but it's important not to mix apples and oranges in making the comparisons, so do check with me or crunch the numbers yourself rather than simply citing what may now be obsolete.

As an aside, I'd love it if the numbers were updated automatically; that's something I'm working towards in conjunction with BP's tech crew, but it's a low-level priority for them compared to switching servers, introducing the customized sortable stat reports, and all of that other good stuff, and I just spent my energy and tech-time chits getting the Hit List gears in working order.

As for JAWS, I've got a New York Sun piece on Bernie Williams' Hall of Fame case in the pipeline for Thursday publication, so we'll be beating this horse again later in the week.
