While Toronto and Seattle are preparing to face off in MLS Cup, the rest of the teams in Major League Soccer are turning their attention towards next season. Most immediately, teams are making decisions about which players should return and which should be let go.
Last summer, I published an article that examined the repeatability of success in Major League Soccer’s regular season. Using data from recent seasons, I concluded that “[a]n MLS team’s performance one year has very little relationship to its performance the following year.”
With the 2016 season starting tomorrow, it seems appropriate to build upon that analysis. This piece expands on last summer’s work in two ways. It does this first by adding all seasons of MLS to the data, and second by considering a new measure of success: advancement in the playoffs.
The data used in this article, and the R code to generate the illustrations, have been posted to GitHub. I encourage anyone interested to download the data and build upon it. The data is in CSV format for maximum portability.
Turning first to the league analysis, the plots below now include the final standings from all 20 seasons of Major League Soccer. There are 251 instances in league history of teams playing consecutive seasons*. Plotting all of these in a scatterplot, with first-year performance along the X axis and second-year performance along the Y axis, produces the following figure.
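The pairing step described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R used for the published analysis; the team names, seasons, and points-per-game values are made up, and the record layout is an assumption rather than the layout of the posted CSV.

```python
from collections import defaultdict

# Hypothetical records: (team, season, points per game). Values are made up.
records = [
    ("Team A", 2012, 1.50),
    ("Team A", 2013, 1.20),
    ("Team A", 2014, 1.65),
    ("Team B", 2013, 1.10),
    ("Team B", 2015, 1.40),  # no 2014 season for Team B, so no pair is formed
]


def consecutive_pairs(rows):
    """Return (team, first_season, ppg1, ppg2) for back-to-back seasons."""
    by_team = defaultdict(dict)
    for team, season, ppg in rows:
        by_team[team][season] = ppg

    pairs = []
    for team in sorted(by_team):
        seasons = by_team[team]
        for season in sorted(seasons):
            # A pair exists only when the same team also played the next season.
            if season + 1 in seasons:
                pairs.append((team, season, seasons[season], seasons[season + 1]))
    return pairs


pairs = consecutive_pairs(records)
for row in pairs:
    print(row)
```

Each resulting row supplies one point for the scatterplot: the first-year value goes on the X axis and the second-year value on the Y axis.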
    > model <- lm(data$PPG2 ~ data$PPG1)
    > summary(model)

    Call:
    lm(formula = data$PPG2 ~ data$PPG1)

    Residuals:
        Min      1Q  Median      3Q     Max
    -0.9891 -0.1992  0.0165  0.1830  0.7460

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  1.04390    0.08498  12.283  < 2e-16 ***
    data$PPG1    0.24374    0.06118   3.984 8.91e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.2972 on 249 degrees of freedom
    Multiple R-squared:  0.05992, Adjusted R-squared:  0.05615
    F-statistic: 15.87 on 1 and 249 DF,  p-value: 8.907e-05
The addition of all historical data does not appreciably change the result. The R-squared value for this model is a very small 0.0599, which is similar to the 0.0681 that was found last summer with a smaller dataset.
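For readers who want to see how the figures in the model summary relate to one another, the following Python sketch recomputes three of them from the others. It relies only on standard identities for a regression with a single predictor (the t value is the estimate divided by its standard error, and the F-statistic is the square of the slope's t value); the inputs are copied from the R output, so the results agree with it only up to rounding.

```python
import math

# Values copied from the summary(model) output in the article.
slope = 0.24374
slope_se = 0.06118
r_squared = 0.05992
n_pairs = 251        # consecutive-season pairs
n_predictors = 1     # first-year points per game

# t value for the slope: estimate divided by its standard error.
t_value = slope / slope_se

# With a single predictor, the F-statistic is the square of that t value.
f_statistic = t_value ** 2

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_pairs - 1) / (n_pairs - n_predictors - 1)

# The slope is positive, so the correlation is the positive root of R-squared.
correlation = math.sqrt(r_squared)

print(f"t value:            {t_value:.3f}")
print(f"F-statistic:        {f_statistic:.2f}")
print(f"adjusted R-squared: {adj_r_squared:.4f}")
print(f"correlation:        {correlation:.3f}")
```

The implied correlation between one season and the next is a little under 0.25, which is another way of seeing how weak the relationship is.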
Generally speaking, the conclusion from recent seasons holds up well when all seasons of MLS are examined. Team success in one year is a fairly poor predictor of success the next year. The relationship is statistically significant (p < 0.001 for the slope), so it is unlikely to be a figment of the data, but it leaves roughly 94% of the variation unexplained.
Turning our attention to the playoffs, the dataset now includes information about how far each team advanced in the postseason. Each team is recorded with one of six values, ranging from “Did Not Qualify” to “Champion”. For some parts of this analysis, numeric equivalents of these six levels are used, ranging from 0 (did not qualify) to 5 (champion). The distribution of these values is depicted below.
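The six-level coding can be sketched as follows, again in Python rather than the R used for the analysis. Only the two extreme labels are given in the text; the four intermediate names are assumed from the rounds mentioned in this article, and the labels in the posted CSV may differ. The sample results are made up.

```python
from collections import Counter

# Ordered from worst to best outcome. Only "Did Not Qualify" and "Champion"
# are named in the article; the four labels in between are assumptions.
PLAYOFF_LEVELS = [
    "Did Not Qualify",
    "Octofinal",
    "Quarterfinal",
    "Semifinal",
    "Finalist",
    "Champion",
]

# Numeric equivalents: 0 (did not qualify) through 5 (champion).
LEVEL_CODE = {name: code for code, name in enumerate(PLAYOFF_LEVELS)}

# Hypothetical outcomes for five team-seasons (not real data).
results = ["Champion", "Did Not Qualify", "Semifinal", "Did Not Qualify", "Finalist"]

codes = [LEVEL_CODE[r] for r in results]
distribution = Counter(results)

print(codes)
for level in PLAYOFF_LEVELS:
    print(f"{level:16s} {distribution[level]}")
```

Keeping the labels in an ordered list means the numeric codes preserve the ranking of the rounds, which is what makes the coded values usable in plots.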
It should be noted, here, that the structure of the playoffs has changed at two different points. After being an eight-team competition for the first 15 years, in 2011 an octofinal round was added that expanded the field to ten teams. In 2015 the competition was expanded again, to include 12 teams. Because of these expansions, the count for teams advancing to the octofinal stage is relatively small.
When we examine how team success in the playoffs changes from year to year, some interesting details emerge.
First, teams have seen radical changes in playoff fortune from one year to the next, but not so radical that past performance is completely meaningless. Teams that missed the playoffs in one year are – slightly more likely than not – going to miss the playoffs the next year. Teams that advanced to the quarterfinals or semifinals will probably reach the same general stage the next year (not missing the playoffs, but also not advancing to MLS Cup).
Yet beyond these very general statements, there are some interesting details. Teams that reach MLS Cup are only able to repeat that feat 20% of the time – and are eliminated before the quarterfinals at roughly the same rate. Curiously, no MLS Cup-winning team has ever been eliminated in the semifinals the next year – they all either lose before then, or make a second MLS Cup.
Another interesting phenomenon is how frequently teams have surged to MLS Cup from previous disappointment. The percentage of teams to win the championship after either not making the playoffs, or falling in the quarterfinals, is relatively even – approximately 5%. This is in marked contrast to the difference in these teams’ likelihood of missing the playoffs altogether.
One final quirk relates to the chances of a team’s appearing in MLS Cup. Based on these data points, the group most likely to play in the championship game is not the reigning champion – but the defeated finalist. However, while approximately 25% of losing finalists repeat their appearance the next year, those teams that do repeat are still more likely to lose a second time than to finally claim the title. Blame the New England Revolution and Houston Dynamo for these data points.
The varying fates of teams that reached different stages of the playoffs are separated in the following gallery of histograms. Each plot focuses on one level of playoff advancement.
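The percentages quoted in this section are conditional shares: of the teams that reached a given stage one year, what fraction reached each stage the next year. A minimal Python sketch of that tabulation is below. The transition pairs are made up for illustration, the stage labels are assumed, and the published figures come from the author's R code, not from this snippet.

```python
from collections import Counter, defaultdict

# Hypothetical (stage this year, stage next year) pairs. Not real data.
transitions = [
    ("Did Not Qualify", "Did Not Qualify"),
    ("Did Not Qualify", "Quarterfinal"),
    ("Did Not Qualify", "Did Not Qualify"),
    ("Finalist", "Finalist"),
    ("Finalist", "Did Not Qualify"),
    ("Finalist", "Champion"),
    ("Finalist", "Semifinal"),
]


def transition_shares(pairs):
    """For each first-year stage, return the share of each second-year stage."""
    counts = defaultdict(Counter)
    for first, second in pairs:
        counts[first][second] += 1

    shares = {}
    for first, row in counts.items():
        total = sum(row.values())
        shares[first] = {second: n / total for second, n in row.items()}
    return shares


shares = transition_shares(transitions)
for first, row in shares.items():
    for second, share in row.items():
        print(f"{first:16s} -> {second:16s} {share:.0%}")
```

Each first-year stage corresponds to one histogram in the gallery, with the shares for that stage as the bar heights.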
A note on significance
This last point also bears some explanation. Although there are now 20 years of history in this data, relatively few data points sit behind some of these individual categories, so these patterns may well be aberrations rather than emergent trends. This stands in contrast to the linear model for league success, where we can be relatively certain that a team’s performance in one year explains only about 6% of the variation in its performance the next year.
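One way to see how little a small category can tell us is to put an interval around an observed share. The Python sketch below uses the Wilson score interval, which is a swapped-in technique and not something the original analysis computed. The counts (5 repeat appearances out of 20 losing finalists) are hypothetical, chosen only to give a share of 25%.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, not taken from the article's data.
low, high = wilson_interval(successes=5, trials=20)
print(f"observed share: {5 / 20:.0%}")
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

With counts of this size the interval runs from roughly 11% to 47%, so an observed 25% is compatible with a wide range of underlying rates.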
With this analysis now expanded to include all of MLS history, we can say with relative certainty that team fortunes can change drastically from year to year. Yet this raises the question – what does determine team success? Are there factors that predict repeated success? How would we go about identifying them?
Thanks and footnotes
I owe sincere thanks to Katrin Anacker for helping me work through the analysis of the linear model in this piece, and also to Jason Little and William Rand for their helpful feedback.
* The final seasons for Tampa Bay, Miami, and Chivas USA are obviously excluded. Additionally, I have chosen to exclude the final season of San Jose before their relocation in 2005, classifying Houston as an expansion team.
When Columbus and Portland face off this evening in MLS Cup, it will be a clash between two of the better passing teams in the league. Both feature midfields heavy on ball control, with international-caliber players pulling the strings supported by a back line that likes to get forward.
In preparation for this game, I collected player-by-player passing summaries for each game the two teams played, starting from the last time they faced each other in late September. What I found indicates that fans of all stripes could be in for quite a treat.
This year’s American soccer season isn’t scheduled to end for a few weeks yet*, but my preparations for the off season have already started. Usually, the months between MLS Cup and the beginning of training camp are spent doing player scouting and filling in gaps in my data set. This year, however, I have something more ambitious planned.
I’m hoping to re-write my toolset in Python.
With the final weekend of the season coming up – dramatically branded Decision Day by the league office – this seems an appropriate time to update the Playing Time Evolution plots that I first published about a month ago.
The US Open Cup Final will be played tonight in Philadelphia. The game, between the hosting Union and Sporting Kansas City, will hopefully be an exciting finale to the 102nd edition of the tournament. This year’s event was the biggest in recent history, with 91 teams entering – including every American team from the three professional divisions.
For devotees of the tournament, the Open Cup is one of the uniquely attractive elements of soccer. In theory, any group of players could enter and see how they match up against all comers. The Cinderella runs of underdogs like the San Francisco Bay Seals, Cal FC, and the Rochester Rhinos (champions in 1999) are the sorts of plots that Hollywood screenwriters long for.
For all this magic, however, there is a frequent undertone from the top of the soccer pyramid that also gets dragged out every year. Search for the phrase “take the Open Cup seriously” and you will find a litany of articles from recent years, about almost every team in Major League Soccer.
Doth the writers protest too much?
In my last post, I introduced a visualization that illustrates how playing time evolves as a season progresses. The feedback for that post was compelling enough that I decided to produce similar plots for every team in Major League Soccer.
Columbus Crew SC will play host to FC Dallas this weekend in a game that will be nationally televised on Fox Sports 1. While the game is still several days away, and will feature two clubs in third place in their conferences and fighting for playoff seeding, much of the conversation ahead of the game has instead focused on absences.
The third home game of the Columbus Crew’s 2014 season is coming up this weekend. The team will play DC United – one of the team’s rivals in the early days of MLS, though the rivalry has faded significantly in recent years.