Plotting Roster Turnover

A grid of information showing offseason roster turnover over the history of the Columbus Crew

Since the end of the 2013 season, I have been compiling data about how teams change their roster between seasons. While the main part of that effort is still ongoing, I’d like to share some of what I’ve put together so far.

Download the spreadsheets, charts, CSV file, and other materials (2.2 MB ZIP archive)

Read the Examiner article discussing the recent roster changes

Scope of data

This first batch of information is focused solely on the Columbus Crew, across all official competitions. This includes league games, playoffs, US Open Cup games, and CONCACAF-sponsored international competitions such as the CONCACAF Champions League. It does not include friendlies, such as those between the Crew and Leeds United from 1997.

Across these official competitions, the Crew played 652 games through the end of 2013, and a total of 208 players appeared in them. Plotting each game’s lineup produces a waterfall chart that serves as the basis for everything that comes next. In the archive linked above, this source spreadsheet is named “Columbus Crew Lineup Grid 1996 – 2013.xlsx”.
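For readers who would rather build a similar grid in code than in Excel, here is a rough sketch of the idea. It is not the process used for the spreadsheet; the function name, the input format, and the choice to order players by first appearance (which is what I assume gives the chart its waterfall shape) are all illustrative assumptions.

```python
def build_lineup_grid(games):
    """Build a player-by-game appearance grid.

    `games` is a hypothetical input: a chronological list of
    (game_id, [player names]) tuples. Players are ordered by their
    first appearance, an assumption made for this sketch.
    """
    players = []
    seen = set()
    for _game_id, lineup in games:
        for name in lineup:
            if name not in seen:
                seen.add(name)
                players.append(name)

    # One row per player, one column per game; True means the player appeared.
    grid = [
        [name in lineup for _game_id, lineup in games]
        for name in players
    ]
    return players, grid


# Made-up example data, not taken from the Crew spreadsheet.
example_games = [
    ("game 1", ["Player A", "Player B"]),
    ("game 2", ["Player B", "Player C"]),
]
example_players, example_grid = build_lineup_grid(example_games)
```

Each row of the grid corresponds to one player and each column to one game, so later debuts start further to the right.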

The next step was to examine the gaps between years, totaling the number of players who returned or departed each offseason. The data from this investigation can be found in “Crew Turnover.xlsx”. There are four worksheets:

  1. “Data” contains summary information for each class of players (a class in this sense is the group of players who either left or stayed through each offseason). It also contains a list of every player from that season, sorted by playing time, and color coded to indicate departures.
  2. “Baseline” is a copy of the “Data” worksheet, used to provide a background set of plots in the “PT Dist” worksheet.
  3. “Summary” shows a bar chart depicting offseason departures after each season. These departures are calculated twice: once as the percentage of playing time accounted for by the departed players, and again as the percentage of the roster.
  4. “PT Dist” contains distribution curves for playing time among each year’s roster. Every player from a given year is shown as a dot along the curve, with departed players coded in red. Players who remained for the next year are shown in gray.
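The two departure percentages shown in the “Summary” worksheet can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet's actual formulas; the function name, the use of minutes as the measure of playing time, and the sample numbers are assumptions.

```python
def offseason_turnover(minutes_this_season, next_season_players):
    """Summarize departures after one season.

    `minutes_this_season` maps player name -> minutes played (an assumed
    measure of playing time). `next_season_players` is the set of players
    who appeared the following season.
    """
    if not minutes_this_season:
        raise ValueError("roster is empty")

    departed = {p for p in minutes_this_season if p not in next_season_players}
    total_minutes = sum(minutes_this_season.values())
    departed_minutes = sum(minutes_this_season[p] for p in departed)

    return {
        "departed": sorted(departed),
        # Share of the roster that left, by head count.
        "pct_roster": 100.0 * len(departed) / len(minutes_this_season),
        # Share of the season's playing time that left with them.
        "pct_playing_time": (
            100.0 * departed_minutes / total_minutes if total_minutes else 0.0
        ),
    }


# Made-up numbers for illustration only.
example = offseason_turnover(
    {"Player A": 900, "Player B": 600, "Player C": 300, "Player D": 200},
    {"Player A", "Player C", "Player E"},
)
```

Comparing the two figures shows whether a team lost regulars or fringe players: a high roster percentage with a low playing-time percentage means most of the departures came from the end of the bench.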

2013 Playing Time Distribution Curve

The rest of this archive includes the charts and plots generated by these spreadsheets, and a CSV export of the “Crew Turnover” spreadsheet. Each season’s playing time distribution curve has been rendered to a separate image (the 2013 curve can be seen above). Every player is color-coded on the curve: returning players appear as gray dots, while players who left the team appear as red dots. Each season’s plot also shows all 18 of the Crew’s seasons in the background, which should help put that year’s pattern in context.
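As a rough idea of how the points on one of these curves could be prepared outside of Excel, the sketch below ranks a season's players by playing time and assigns the same red and gray coding. It assumes the curve plots players in descending order of minutes, which may not match the spreadsheet exactly; the function name and sample data are hypothetical.

```python
def playing_time_curve(minutes, departed):
    """Return plot-ready points for one season.

    `minutes` maps player name -> minutes played; `departed` is the set of
    players who left after the season. Each point is a tuple of
    (rank, name, minutes, color), with rank 1 being the most-used player.
    """
    ranked = sorted(minutes.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, name, mins, "red" if name in departed else "gray")
        for rank, (name, mins) in enumerate(ranked, start=1)
    ]


# Made-up numbers for illustration only.
example_points = playing_time_curve(
    {"Player A": 900, "Player B": 600, "Player C": 300},
    {"Player B"},
)
```

The resulting list can be handed to any plotting tool, with rank on one axis and minutes on the other.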

Tools used

Most of this work was done in Excel, although the initial spreadsheet was generated from information stored in my master MySQL database. The data was extracted from that database using a series of Python and VBScript scripts. Some minimal image processing was also done in GIMP.

Still to come

Another aspect of this roster turnover investigation involves looking at turnover across all teams in league history. That effort is nearing completion, and the finished data will be released on this blog when it is ready.
