The Futility Infielder

A Baseball Journal by Jay Jaffe

I'm a baseball fan living in New York City. In between long tirades about the New York Yankees and the national pastime in general, I'm a graphic designer.

Friday, July 25, 2003

 

A Whole New Thang?

Most of us who start baseball blogs simply want to offer our opinions and responses to the issues of the day as well as our takes on what our favorite teams are doing. But Julien Headley, the author of Julien's Baseball Blog, is offering a new twist: the development of a statistical system that, if it does what he claims, might be to hitters what DIPS is to pitchers.

It's not easy to follow the development of a formula or system in a blog, where the oldest posts are at the bottom of an archived page, the newest stuff is piled on top, and the definitive initial presentation (à la Voros McCracken's unveiling of DIPS or Tangotiger's Base Runs) has yet to be written. Perhaps all of this is premature. But it's extremely intriguing, so I've spent some time sifting through what Julien has to say, not only so that I can understand it, but also in the hope of bringing his work to a wider audience which may offer him some suggestions or poke holes in his theory. This isn't meant as a shot at Julien, but rather an attempt to steer his work towards the kind of peer review which any work of sabermetric value needs. See Baseball Primer for a myriad of examples.

Here is a portion of Julien's first post, from about a month ago. Since the writer seems to have come from the e.e. cummings school of capitalization, I've taken the liberty throughout this article of capitalizing common statistical abbreviations such as OBP and SLG to make things more legible:
what do hitters like to do? get on base and move runners over (this is the starting point for this blog). the former is adequately measured by OBP, the latter by SLG. the thing is, both include a substantial ball-in-play component. balls in play are highly random. therefore, it takes a long time before the statistics have meaning (at least a whole season).

the idea here is to find meaning in smaller sample sizes. in contrast to balls in play, walks, strikeouts, and home runs quickly normalize to a level representative of players' abilities. thus our three stats, based on walks, strikeouts, and power.

basically we took the batting average out of OBP and SLG. OBP without batting average gives you walk percentage. SLG minus average is isolated power, a similar idea to our power percentage.

wait a minute isn't batting average important? yes, but we can interpolate it based on contact and power. this is where contact percentage comes in. you see, there are two aspects to hitting for average: making contact, and hitting the ball hard. these things are measured by CON and POW. thus, if you know these numbers, you can predict what the player's average should be, given a significant sample size.

here's the cool thing: you can use the same numbers for pitchers. this time you want the numbers to be low. walk percentage measures control, contact percentage measures strikeouts, and power percentage measures the ability to keep the ball in the park.

baserunning and fielding are important aspects that are not taken into account by our method. they will be added to the discussion.
Walks, strikeouts, and power -- if this all sounds familiar, it's because these categories almost exactly match the holy trinity of defense-independent outcomes on which DIPS focuses. "Balls in play are highly random" -- more DIPS. Julien's well aware of this.

Here are the formulae for Julien's three stats on the hitters side (I'm going to ignore pitchers for now, though Julien certainly hasn't):
WAL = (BB + HBP) / (AB + BB + HBP)

CON = (AB - K) / (AB)

POW = .273 + .285 * (TB - H) / (AB - K)
WAL is walks (plus hit-by-pitches) per plate appearance, CON is contact per at-bat, and POW is a predictor of hits per contact. According to Julien, the major league averages of WAL, CON, and POW are .100, .800, and .330, respectively. It should be noted that in the short time Julien's been running his blog, the POW concept has undergone some change; this stuff is still a work in progress. In his initial statement it was a power percentage, (TB - H) / (AB - K), which translated into extra bases per contact at-bat. But then Julien ran some regressions and found that hits on contact are easily predicted from that old POW by a linear formula. He revised POW to the new normalized version, and now claims that this suite can predict AVG, OBP, and SLG in the following manner:

AVG = CON * POW
OBP = WAL + (1-WAL) * CON * POW
SLG = CON * POW + ISO
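For anyone who would rather see the arithmetic than parse the formulas, here is a minimal Python sketch of the three stats and the three predictions, taking ISO as (TB - H) / AB. The class name, function names, and the sample batting line are my own inventions for illustration; none of them come from Julien's blog, and the sample numbers are made up rather than drawn from any real player.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one hitter (hypothetical container, not Julien's)."""
    ab: int   # at-bats
    h: int    # hits
    tb: int   # total bases
    bb: int   # walks
    hbp: int  # hit by pitch
    k: int    # strikeouts


def wal(b: BattingLine) -> float:
    # WAL = (BB + HBP) / (AB + BB + HBP)
    return (b.bb + b.hbp) / (b.ab + b.bb + b.hbp)


def con(b: BattingLine) -> float:
    # CON = (AB - K) / AB
    return (b.ab - b.k) / b.ab


def pow_stat(b: BattingLine) -> float:
    # POW = .273 + .285 * (TB - H) / (AB - K), the revised (normalized) version
    return 0.273 + 0.285 * (b.tb - b.h) / (b.ab - b.k)


def iso(b: BattingLine) -> float:
    # ISO = (TB - H) / AB
    return (b.tb - b.h) / b.ab


def predict(b: BattingLine) -> dict:
    """Predicted AVG, OBP, and SLG from WAL, CON, POW, and ISO."""
    avg = con(b) * pow_stat(b)
    return {
        "AVG": avg,
        "OBP": wal(b) + (1 - wal(b)) * avg,
        "SLG": avg + iso(b),
    }


# Made-up batting line, purely for illustration.
sample = BattingLine(ab=500, h=140, tb=230, bb=55, hbp=5, k=100)
predicted = predict(sample)
actual_avg = sample.h / sample.ab

for name, value in predicted.items():
    print(f"predicted {name}: {value:.3f}")
print(f"actual AVG:    {actual_avg:.3f}")
```

Setting the predicted AVG next to the actual H / AB is the comparison the whole system hangs on; whether the gap between the two really means what Julien says it means is exactly what remains to be shown.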
ISO is short for isolated power, a stat Bill James introduced in his Baseball Abstracts in the early '80s. It's extra bases per at bat; the formula is (TB - H) / AB. Again, according to Julien:
the numbers [WAL, CON, and POW] also have the advantage that they have meaning in small sample sizes. thus they can be used to predict the results of larger sample sizes. for example, over time batting average converges to CON*POW. that means after 30--40 games you can use CON*POW to see how well someone has actually been playing, and how lucky or unlucky he's been...

the numbers work for batters and pitchers. batters want them to be high; pitchers want them to be low. everyone can now be easily compared. major league average = .100 .800 .330.
These are bold statements to make, and I'm not going to be the one to tell you definitively that they work or they don't. Much as I love baseball statistics, and know which ones are important, that doesn't make me an expert on correlations, standard deviations, regressions, significance and the other stuff which makes a new-fangled stat such as DIPS or POW statistically valid. For Julien's work to gain acceptance in the sabermetric community, he'll need to give interested readers a deeper look into the method of his madness.

I'll open with a few questions of my own, which I hope Julien will answer:

• Where is the evidence that WAL, CON, and POW have meaning in small sample sizes -- that, as you say, "walks, strikeouts, and home runs quickly normalize to a level representative of players' abilities"?

• Those "major-league averages" cited for WAL, CON, and POW -- do they refer to 2003, the last few years, or a longer-range time period?

• As far as the predictive value of this suite, can we see some comparisons based on prior seasons to see where these formulae worked and where they did not?

Julien's taken the time to run the numbers on everybody who's gotten a significant amount of playing time this year, as well as some lists of the best, luckiest and unluckiest hitters and pitchers. Check it out, and keep an eye on this guy's stuff. I'll be back with another look soon.
