Data Coverage

There is a lot of data on the site. Some of it is ours and much of it comes from other sources. This page attempts to give you a complete list of what we do and do not have on the site, including when stats were accumulated and when they are missing. Please let us know if there is an item you would like to see reported here.

Full Season Stats

We consider the start of major league baseball to be 1871. The game has changed a lot since then, and so has the recordkeeping. Below we summarize the seasons for which we have complete data. An entry of "NO" means the data is completely missing for that season, "Partial" means we have it for some leagues or some players in that season, and "YES" means everything is known. A "YES" doesn't mean there are no errors, just that we have a value for every entry.
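As a rough illustration of the three labels described above, here is a minimal sketch. The function name `coverage_label` and the idea of counting records are assumptions made for this example, not a description of how the site actually builds the table.

```python
def coverage_label(known: int, total: int) -> str:
    """Label one stat in one season as "NO", "Partial", or "YES".

    known -- hypothetical count of records (players or leagues) with a value
    total -- hypothetical count of records that should exist that season
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= known <= total:
        raise ValueError("known must be between 0 and total")
    if known == 0:
        return "NO"        # completely missing for the season
    if known == total:
        return "YES"       # a value exists everywhere (errors are still possible)
    return "Partial"       # only some leagues or some players are covered
```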




Minor Leagues


For 1974 on, we have complete play-by-play (PBP) accounts for all games. We likewise have complete accounts for all postseason and All-Star games. Pre-1973, we have a good deal of PBP, but a few games are missing and we can only present box scores for those games. This means that WPA, RE24, and other PBP-dependent stats are incomplete for those seasons. Below is a list of games for which we have full play-by-play or just box score data, along with the percentage of all games that year for which we have PBP.
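The per-year percentage described above can be sketched as follows. This is only an illustration: the function name `pbp_coverage` and the `(year, has_pbp)` input shape are assumptions for the example, not the site's actual data model.

```python
from collections import Counter


def pbp_coverage(games):
    """Summarize play-by-play coverage by year.

    games -- iterable of (year, has_pbp) pairs, one per game (hypothetical shape)
    Returns {year: (pbp_games, box_score_only_games, pct_with_pbp)}.
    """
    total = Counter()
    with_pbp = Counter()
    for year, has_pbp in games:
        total[year] += 1
        if has_pbp:
            with_pbp[year] += 1
    return {
        year: (
            with_pbp[year],
            total[year] - with_pbp[year],            # box score only
            100.0 * with_pbp[year] / total[year],    # percent of games with PBP
        )
        for year in sorted(total)
    }
```

Any stat that depends on PBP, such as WPA or RE24, would only cover the games counted in the first element of each tuple.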
Play-by-Play Coverage

Here are the affected teams, with the number of games for which they lack play-by-play (only for years in which we have any play-by-play).

Missing Play-by-Play

Hit Location Data and Batted Ball Type Data

Hit location Diagram from Retrosheet

Batted ball types include Line Drive, Fly Ball, Ground Ball, and Pop Up. Bunts are also noted.

Please note that this data is not 100% complete, and that locations and trajectories have been measured differently in different years. We have merged different sources whenever possible to make the dataset as complete as possible. Here is the coverage for the 50+ years of data that we have on hand. The table below looks at all balls in play and gives a breakout of the percentage of the time we know the trajectory (with the breakdown by type, to show how this has changed), the percentage of the time we know the location and who fielded the ball (this won't be 100%, as there is no fielder for home runs and for some other plays such as ground rule doubles), and the percentage of plays that result in air_outs or ground_outs. Even where the trajectory and location are not exactly known, we may still know the fielder (even for hits) and whether a ground ball or fly ball out was recorded and by whom.

Note: for 2000-2002, home runs were classified with empty batted ball types in our data source. We have reclassified all of these hits as fly balls. Probably 20% of these home runs should be line drives, and perhaps 1-2 per year should be ground balls. We realize this is a simplification, so please adjust your expectations of splits, etc. accordingly.
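The reclassification described in this note amounts to a simple rule, sketched below. The dictionary keys (`year`, `event`, `batted_ball_type`) and the codes `"HR"` and `"F"` are hypothetical names chosen for the example; the real data source may use different fields and codes.

```python
def reclassify_home_runs(plays):
    """Return copies of plays with untyped 2000-2002 home runs marked as fly balls.

    Each play is a dict with the hypothetical keys "year", "event",
    and "batted_ball_type" (None or "" when the source left it empty).
    """
    fixed = []
    for play in plays:
        play = dict(play)  # copy so the caller's data is not mutated
        is_untyped_hr = (
            2000 <= play["year"] <= 2002
            and play["event"] == "HR"
            and not play.get("batted_ball_type")
        )
        if is_untyped_hr:
            play["batted_ball_type"] = "F"  # assumed code for fly ball
        fixed.append(play)
    return fixed
```

Because every such home run becomes a fly ball, line drive and ground ball splits for those three seasons will run slightly low and fly ball splits slightly high.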

Hit Locations

Pitch Data

The pitch data is only given when we know the values for the entire game and for all plays in the game. This report does not include pitch type or velocity. Instead, it records the sequence of balls and strikes, fouls, swinging strikes, pitchouts, etc.

Please note that this data is not 100% complete, and we have merged several datasets when producing it. The data is essentially complete back to 1998, and there is a great deal of data from 1988 through 1997. Before 1988, only a few years have data. For example, Allan Roth of the Dodgers compiled such data for many, many Dodgers games from the '50s and '60s.

Below is the percentage of all plays in each season that are missing pitch sequence data.
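To make the whole-game rule concrete, here is a minimal sketch of how such a percentage could be computed: a game's pitch data counts only when every play in that game has a sequence. The field names (`season`, `game_id`, `pitch_sequence`) are assumptions for the example, and treating an empty string as missing is a simplification.

```python
from collections import defaultdict


def pct_plays_missing_pitches(plays):
    """Percent of plays per season without usable pitch sequence data.

    plays -- iterable of dicts with the hypothetical keys "season",
             "game_id", and "pitch_sequence" (a string, or None if unknown)
    """
    by_game = defaultdict(list)
    for play in plays:
        by_game[(play["season"], play["game_id"])].append(play["pitch_sequence"])

    total = defaultdict(int)
    missing = defaultdict(int)
    for (season, _game_id), sequences in by_game.items():
        total[season] += len(sequences)
        # One unknown play disqualifies every play in the game.
        if not all(sequences):
            missing[season] += len(sequences)
    return {season: 100.0 * missing[season] / total[season] for season in sorted(total)}
```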

Pitch Results

Weather Data

The weather data is based on conditions at the start of the game. Below we show, for each data set (temperature, wind speed and direction, etc.), the percentage of values that are known (neither null nor unknown). This data is included in the Retrosheet data files, is provided as is, and most certainly contains some errors. There is no weather data pre-1950.
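The "known" percentage for one weather field can be sketched like this. The function name `pct_known` and the default markers for unknown values are assumptions for the example; the source files may encode unknown values differently for each field.

```python
def pct_known(values, unknown_markers=("", "unknown")):
    """Percent of values that are neither None nor an unknown marker.

    values          -- one weather field across games, e.g. temperatures
    unknown_markers -- assumed placeholders that mean "not recorded"
    """
    values = list(values)
    if not values:
        return 0.0
    known = sum(
        1 for value in values
        if value is not None and value not in unknown_markers
    )
    return 100.0 * known / len(values)
```

If a field uses a numeric sentinel for unknown, that value can be passed in `unknown_markers` as well.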

Game Weather Data

