I’m currently taking a break from debugging my code (if you think it’s surprising i’m taking a break from work at 3:00 AM, you probably don’t know me very well), frankly because i’ve reached the point where i’m out of ideas on what could be wrong with it and need to stop thinking about it for a while. (Is there such a thing as coder’s block?) So, i’m going to use this caffeine-induced late-night work session to write a little bit about baseball, something i’ve been meaning to do for a while. I’m in the process of collecting different projection systems for the 2007 baseball season so i can combine them in a simple yet quantitative way that draws on the best aspects of each system. For those who are interested in how these projections are done, i’m going to describe the ones i’m going to use, ending with a description of my aggregate projection model. (Which doesn’t have a name yet, so if you’ve got any bright ideas, send them my way.) Today’s post is on PECOTA…
PECOTA (Player Empirical Comparison and Optimization Test Algorithm), whose projections are available for purchase at Baseball Prospectus, was primarily developed by Nate Silver, the same guy who developed what i consider to be the best model for predicting the outcome of the 2008 presidential election. Nate also recently wrote the first half of his annual take on the 50 best players in baseball, using (amongst other criteria) the predictions made by PECOTA. So what is PECOTA? Basically, PECOTA compares a player’s production to that of all the other players who have ever played the game, and then figures out which of them had career paths similar to that of the player whose future you’re trying to forecast. It then uses the career paths of those similar players to figure out what the player in question will do going forward. In doing this, PECOTA accomplishes two things: first, it takes into account the effects of aging; second, it acknowledges that different types of players age in different ways. (A guy who depends heavily on speed may age differently than one who relies primarily on his batting eye or on strength, or one who is fast and strong AND has a good eye; likewise, catchers may age differently than designated hitters or second basemen.) Now, there’s a lot more to PECOTA, as it also accounts for “luck” in various ways and includes regression to the mean… but the main guts of the model are how it forecasts the future performance of a player based on how the careers of similar past players have developed.
To use an analogy people are probably more familiar with, this would be similar to a weather forecast that made a bunch of measurements (such as surface temperature, barometric pressure, wind speeds, etc.), and then compared those measurements to a huge database of past measurements for the same region. Once the model found past days with similar characteristics, it could then look at what happened in the days immediately following those similar days in order to forecast the weather for the rest of the current week. The model wouldn’t have to make any calculations involving the physics and chemistry of the atmosphere, because that same physics and chemistry was there 10, 20, 50, and 100 years ago, influencing the weather in a way that is probably largely similar to how it will affect the weather in the upcoming week. This is an even bigger advantage for baseball, because the fundamental controls on what a player will do in the future are subject to the vagaries of psychology and biology at least as much as they are the products of physics and chemistry. In other words, not having to write the equations down helps a lot more when you have no idea what the equations are in the first place.
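To make the "find similar past cases and see what happened next" idea concrete, here is a minimal Python sketch. It is emphatically not PECOTA's actual algorithm, which is far more involved; it is just a plain nearest-neighbor average. The function names, the four features, and every number in it are made up for illustration, and the features are assumed to already be scaled to a common 0-to-1 range.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project_from_comparables(target, history, k=2):
    """Average the next-season outcome of the k most similar past players.

    target  -- feature tuple for the player being projected
    history -- list of (features, next_season_outcome) pairs
    """
    # Rank past player-seasons by similarity to the target.
    ranked = sorted(history, key=lambda record: distance(target, record[0]))
    nearest = ranked[:k]
    # No aging equations needed: just report what happened to the comparables.
    return sum(outcome for _, outcome in nearest) / len(nearest)

# Hypothetical data: (age, speed, batting eye, power), each scaled to 0-1,
# paired with an invented next-season OPS.
history = [
    ((0.30, 0.90, 0.40, 0.30), 0.760),  # young speedster
    ((0.35, 0.85, 0.45, 0.35), 0.740),  # another young speedster
    ((0.80, 0.20, 0.80, 0.70), 0.820),  # older, patient slugger
    ((0.75, 0.25, 0.75, 0.65), 0.800),  # another older slugger
]
target = (0.32, 0.88, 0.42, 0.32)  # a young, fast player to project

if __name__ == "__main__":
    print(project_from_comparables(target, history, k=2))
```

The young, fast target gets matched to the two speedsters rather than the sluggers, so his projection comes from how players like him fared. That is the "different types of players age in different ways" point in miniature. A real system would need far more history, a more careful similarity measure, and the luck and regression-to-the-mean adjustments mentioned earlier.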
As with all these approaches, this one has its weaknesses. One could think of how climate change could doom weather forecasts such as the one described above. By definition, climate change involves changing the way weather operates on the planet… so if things don’t act in a manner similar to how they used to act, then forecasting based on historical trends becomes a little silly. Similarly, one could see how changes to workout regimens, diets, ballpark effects, climate change, and chemical enhancement could all change the way players progress through their careers compared to past players. (Indeed, many have used this as “evidence” that guys who defied a “natural” aging curve, such as Barry Bonds and Roger Clemens, were steroid users. But more on that later… Maybe.) PECOTA’s other weakness is predicting the career paths of players without much major league service time. This makes sense; the more data one has, the better job one will be able to do projecting a trend into the future. To use the weather analogy again, if you told me the temperature in Seattle today, i probably wouldn’t be able to predict tomorrow’s temperature nearly as accurately as if you told me the temperature for each of the last 7 days. Finally, PECOTA can struggle in projecting playing time. Figuring out the playing time of a player really comes down to three things: the ability of that player relative to his competitors for at bats, the likelihood that the player (and his competitors) will be injured in the upcoming season, and the ability of a team’s manager to correctly assess the ability of their players. PECOTA doesn’t do as well as the predictions of fans for projecting playing time, and i suspect it’s because fans have a better grasp of the third of those issues: they follow the team enough to know that their manager loves, for example, playing scrappy, fast, defensive-minded, weak-hitting veterans. (Such managers may or may not like chewing on toothpicks.)
So, that’s PECOTA, and what i consider to be its major strengths and weaknesses. It’s arguably the best projection system out there (although i hope to change that), and it’s been my primary tool for analyzing how good particular Cubs players will be in an upcoming season and for making decisions for my fantasy baseball team (the Mars Pioneers, currently locked in a battle for a playoff spot in the Keystone Fantasy Baseball League). But hopefully, that’s about to change. The things PECOTA isn’t great at should be handled well by a different projection system, one that amounts to asking everyone, “So, what do you think Big Papi will do this year?” The wisdom of the fans will be the subject of my next projection post, and the other fifteen percent may be wiser than you’d expect (they’re definitely smarter than i expected).
One Comment
Note: the Mars Pioneers did finish strong in 2008 and secured a playoff spot, but were beaten in the first round.