Saturday, March 29, 2008

Calling All Stat Geeks


Interestingly enough, I've been reading a few baseball blogs lately. It started with this blog dedicated to all things Royals, by Sam Mellinger of the Kansas City Star. As far as quick-hit information on my team goes, it's great. One thing I have really enjoyed about this blog is the links he provides to other baseball stories or blog posts around the country. I'm finding I'd much rather read his blog than go to the actual news stories on the Kansas City Star's website.

Mellinger's blog then led me to this blog. It's another Royals blog, but trust me -- if you enjoy baseball, you will get addicted to its posts. RSS feeds are great and all, but honestly, how many times a day is it healthy to check to see if there's a new post on a guy's blog?

My point in sharing these great baseball blogs with you is not to get you hooked on other blogs that are better (not to mention much more consistent), but to share a tiny revelation I had while reading them.

If you'll remember, my purpose in starting my blog was both to help well-intentioned friends understand why the game of baseball is so great, and to stir up unto remembrance those who already have a testimony of its greatness. I often did this by explaining odd or interesting details about the game. Well, it's been a while since I've been asked for any explanation, but today I found something that I had never seen or heard of: PETCO PECOTA.

Here's the context of the conversation. Rany was breaking down the Royals' opening day roster and discussing some of the late moves made by Dayton Moore, the Royals' GM (side note: Moore is a genius, and it won't be long before you hear his name mentioned with Theo Epstein's or Brian Cashman's as the premier General Manager in baseball). One such move was trading away a talented hitter. "Hitter" is used here, rather than player, because it was generally determined that there was no position for him. The subject of this player's development, or lack thereof, is a controversial one for Royals loons1 (1. I'm not there yet, 2. Yes, I just invented that term), as evidenced by the heated comments thread. It was here, amidst this wonderful exchange of ideas, that this odd little name kept popping up. I'd seen it before, but had skipped over it, just like the good old days of skipping over the technical jargon in the stacks of sociological journals I had to read in college. Finally, I could skip no longer and had to find out who this PECOTA guy was and why his opinion seemed to matter so much.

Bill Pecota was born February 16, 1960 in Redwood City, CA. He was drafted by the Kansas City Royals in 1981, and eventually played nine seasons in the Major Leagues, the first six with the team that drafted him. His lifetime batting average was .249, basically one hit in every four at-bats. He was an infielder who played mostly shortstop, then second base, then some third base; he even pitched a few innings in a pinch.

PECOTA, however, does not refer to Bill, but to Player Empirical Comparison and Optimization Test Algorithm. Of course, its creator, Nate Silver, does submit that the acronym does originate with the third-baseman's surname. Apparently, making names into complex acronyms to describe even more complex statistical breakdowns is a common practice, as PANKOVITS is the measurement developed and used by the Houston Astros, named in honor of former utility player Jim Pankovits. In case you were wondering, and I know you were, PANKOVITS stands for Player Analysis with Neutral Knowledge of Offensively Vital Information Tracking Statistics. Sure, it's a stretch, but c'mon.

The actual formulas used to come up with PECOTA have never been released, but the principles involved have been discussed. Basically, to determine a player's performance in the upcoming season, they look at what he has done so far, either in the majors, the minors, or both, and match that player with comparable players from the past. The four things used to determine "comparables" are below.

1. Production metrics – such as batting average, isolated power, and unintentional walk rate for hitters, or strikeout rate and groundball rate for pitchers.

2. Usage metrics – including career length and plate appearances or innings pitched.

3. Phenotypic attributes – including handedness, height, weight, career length (for major leaguers), and minor league level (for prospects).

4. Fielding Position (for hitters) or starting/relief role (for pitchers)

...In most cases, the database is large enough to provide a meaningfully large set of appropriate comparables. When it isn't, the program is designed to 'cheat' by expanding its tolerance for dissimilar players until a reasonable sample size is reached.
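
Since the real formulas have never been released, what follows is only my own toy sketch of the "expand the tolerance until the sample is big enough" idea, not PECOTA itself. Everything in it is an assumption made up for illustration: the `Player` fields, the plain Euclidean distance over three production metrics, and the `tolerance`, `step`, and `min_sample` values.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Player:
    # Hypothetical record holding three of the production metrics for a hitter.
    name: str
    batting_avg: float
    isolated_power: float
    walk_rate: float


def distance(a: Player, b: Player) -> float:
    """Euclidean distance over the three metrics (an assumed similarity measure)."""
    return sqrt(
        (a.batting_avg - b.batting_avg) ** 2
        + (a.isolated_power - b.isolated_power) ** 2
        + (a.walk_rate - b.walk_rate) ** 2
    )


def find_comparables(target, pool, tolerance=0.02, min_sample=3, step=0.01):
    """Return (matches, tolerance_used), widening the tolerance until
    at least min_sample players from the pool qualify."""
    # Never demand more comparables than the pool can supply.
    min_sample = min(min_sample, len(pool))
    while True:
        matches = [p for p in pool if distance(target, p) <= tolerance]
        if len(matches) >= min_sample:
            return matches, tolerance
        tolerance += step  # the 'cheat': accept less similar players
```

Calling `find_comparables(target, pool)` returns the matching players along with the tolerance that was finally needed, so you can see how far the search had to stretch to fill out the sample.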

Once the comparables are determined, you can then look at how the comparables did. For example, if the player you are "PECOTING" will be 28 this season, you would look to see how his comparables did when they were 28, and come up with a statistical distribution. It makes sense to me, especially this explanation of the results of the analysis by Nate Silver:

"What separates Pecota from the gaggle [good word, and possible future competition for Google] of projection systems that outsiders have developed over many decades is how it recognizes, even flaunts, the uncertainty of predicting a player's skills. Rather than generate one line of expected statistics, Pecota presents seven – some optimistic, some pessimistic – each with its own confidence level. The system greatly resembles the forecasting of hurricane paths: players can go in many directions, so preparing for just one is foolish."

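To make the "several lines instead of one" idea concrete, here is a rough sketch of turning the comparables' age-28 results into a spread of forecasts. It is a guess at the general approach, not the actual system: the function names are mine, the linear interpolation is just one common way to compute a percentile, and the seven percentile levels are an assumption.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("need at least one comparable result")
    rank = (pct / 100) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def forecast_lines(comparable_results, levels=(10, 25, 40, 50, 60, 75, 90)):
    """Map each percentile level (assumed values) to a forecast drawn from
    how the comparable players performed at the same age."""
    ordered = sorted(comparable_results)
    return {level: percentile(ordered, level) for level in levels}
```

Feed `forecast_lines` a list of, say, the batting averages the comparables posted at age 28, and the low levels give you the pessimistic lines while the high levels give you the optimistic ones.
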
PECOTA was released as a tool for Fantasy Baseball addicts looking for more accurate predictions for the upcoming season. As I said before, I don't do Fantasy Baseball anymore, so although I find it fascinating, I don't have much practical use for it. But, at least now I won't have to just skip over the guy's name every time I read one of those other blogs (which you should only read after reading The Perfect Game).

1 If fan is short for fanatic, loon is short for lunatic, which, in my estimation, is the next step to the funny farm.
