The US Open Cup Final will be played tonight in Philadelphia. The game, between the hosting Union and Sporting Kansas City, will hopefully be an exciting finale to the 102nd edition of the tournament. This year’s event was the biggest in recent history, with 91 teams entering – including every American team from the three professional divisions.
For devotees of the tournament, the Open Cup is one of the uniquely attractive elements of soccer. In theory, any group of players could enter and see how they match up against all comers. The Cinderella runs of underdogs like the San Francisco Bay Seals, Cal FC, and the Rochester Rhinos (champions in 1999) are the sorts of plots that Hollywood screenwriters long for.
For all this magic, however, there is a familiar refrain about the teams at the top of the soccer pyramid that gets dragged out every year. Search for the phrase “take the Open Cup seriously” and you will find a litany of articles from recent years, about almost every team in Major League Soccer.
Doth the writers protest too much?
There is certainly a perception among some that MLS teams don’t make the tournament a priority – at least until their regular-season fate is secure, or a trophy is within easy grasp. Some teams, despite pre-game proclamations, do seem to use the Open Cup to give reserve players some experience – but does this happen more than in other competitions?
Setting the scope
In order to explore this question with some degree of rigor, I conducted an analysis of several years’ worth of games by teams in Major League Soccer. The data set for this task includes every game played by a team from Major League Soccer since the beginning of 2013. That is a total of 2,194 games, by 21 teams, across five official competitions (MLS regular season, playoffs, US Open Cup, Canadian Championship, and Champions League). The games break down as follows:
Over this period, each of the 21 teams has played between one and three seasons (Chivas USA played in 2013 and 2014, while Orlando and New York City began in 2015). The 18 teams that have operated continuously during this period have each played between 104 and 125 games, providing a decent basis for analysis.
MLS regular-season games dwarf everything else. Fully 87% of the games in the data set – 1,900 in total – come from that competition. The other four competitions each account for between 1% and 6% of the data, as shown in the table below:
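| Competition | Games | Share of data set* |
|---|---|---|
| MLS Regular Season | 1,900 | 87% |
| CONCACAF Champions League | 81 | 4% |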
* Percentages do not add to 100% due to rounding
At the beginning of this effort, my hypothesis was that examining the lineups each team used would reveal a clear trend: teams vary their lineups more for Open Cup games than for regular-season games. While any team will vary the players it uses – for a variety of legitimate reasons – the deviations in Open Cup lineups would be measurably larger.
I made no such prediction about the other tournaments in which teams compete, although for the sake of completeness they are included in the data set. It would not be surprising to find that tournaments such as the Canadian Championship or CONCACAF Champions League are treated similarly, but this was not formally part of my hypothesis.
The specific instrument I decided to use to test this hypothesis is derived from the Herfindahl-Hirschman Index (HHI). I’ve written about this tool before – its application to soccer treats a team roster as a market, with players viewed as individual agents competing for market share (playing time). The HHI is the sum of the squared market shares of all participants, so it rises as playing time becomes concentrated among fewer players. An HHI score therefore allows an analyst to measure the degree to which playing time is spread out, or concentrated, among a roster of players.
This methodology is applied in the following way. Imagine a team playing a 10-game season. First, calculate the HHI score for that team’s final total of minutes played by each player. Then, for each game in the season, calculate an HHI score using every game except the targeted one: for game 1, compute the HHI score for games 2-10; for game 2, use game 1 and games 3-10; for game 3, use games 1-2 and 4-10; and so on. The difference between the two scores (the full-season HHI minus the HHI without the targeted game) reveals the influence that each game has on the overall score. Relatively large negative values indicate a game whose lineup deviated significantly from the average, positive values indicate that the targeted game reinforced the average lineup, and a value near zero indicates an average amount of deviation.
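As a minimal sketch, the standard HHI calculation applied to playing time looks like this in Python. Shares are expressed as fractions of the team’s total minutes, so scores fall between 0 and 1 rather than on the 0 to 10,000 scale used in antitrust work. That scaling, the function name `hhi`, and the toy rosters are assumptions for illustration, not details taken from the original scripts.

```python
def hhi(minutes_by_player):
    """Herfindahl-Hirschman Index of a team's playing time.

    Each player's "market share" is a fraction of the team's total minutes;
    the index is the sum of the squared shares. It ranges from 1/N (minutes
    split evenly among N players) up to 1 (one player takes every minute).
    """
    total = sum(minutes_by_player.values())
    if total <= 0:
        raise ValueError("no minutes recorded")
    return sum((m / total) ** 2 for m in minutes_by_player.values())


# Hypothetical toy rosters, not taken from the data set.
even = {"A": 90, "B": 90, "C": 90}     # minutes shared equally
skewed = {"A": 270, "B": 10, "C": 10}  # one player dominates
print(round(hhi(even), 4))    # 0.3333
print(round(hhi(skewed), 4))  # 0.8692
```

The more a coach leans on the same few players, the closer the score moves toward 1.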
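The leave-one-out procedure can be sketched in a few lines of Python. This is a minimal illustration that assumes each game is recorded as a dict mapping player to minutes and that a season contains at least two games. The data layout and the names `hhi` and `deviation_ratings` are hypothetical, not taken from the author’s scripts. A game’s rating is the full-season HHI minus the HHI with that game removed.

```python
from collections import Counter


def hhi(minutes):
    """Sum of squared shares of total minutes (shares as fractions)."""
    total = sum(minutes.values())
    return sum((m / total) ** 2 for m in minutes.values())


def deviation_ratings(games):
    """Leave-one-out HHI deviation for each game in one team's season.

    `games` is a list of dicts mapping player -> minutes in that game.
    For game i the rating is HHI(all games) - HHI(all games except i).
    """
    season = Counter()
    for game in games:
        season.update(game)  # accumulate season minutes per player
    season_hhi = hhi(season)

    ratings = []
    for game in games:
        without = season.copy()
        without.subtract(game)
        # Drop players whose minutes all came from the targeted game.
        without = {p: m for p, m in without.items() if m > 0}
        ratings.append(season_hhi - hhi(without))
    return ratings


# Hypothetical season: three games with the regular pair, one with reserves.
regular = {"A": 90, "B": 90}
reserves = {"C": 90, "D": 90}
season_games = [regular, regular, regular, reserves]
print([round(r, 4) for r in deviation_ratings(season_games)])
# [0.0347, 0.0347, 0.0347, -0.1875]
```

In this toy season, the three games with the regular pair score slightly positive. The reserve game scores strongly negative because including it spreads minutes across more players and pulls the season HHI down.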
The range of values produced by this analysis can be seen in the histogram depicted in Figure 3.
The distribution of values is slightly asymmetrical, with most values grouped close to zero. There is a longer tail to the left, indicating that a small number of games in the data set exhibited relatively large deviations from the average lineup.
Figure 4, below, shows the calculated deviations for all games in this dataset, separated by game number within the team’s season.
A note on the vertical axis of the box plots in this analysis: lineups which most closely resemble the “usual” lineup for a team exhibit little deviation, and those data points appear at the top of box plots like Figure 4. The more a lineup differs from the average, the more that game pulls the team’s overall HHI score down, and the lower it appears on the plot.
Using this procedure, each of the 2,194 games in the data set was assigned a Deviation rating. By comparing the Deviation ratings of all Open Cup games with those of regular-season games, the hypothesis could be tested using a standard analysis of variance, which asks whether the difference between the two groups’ average ratings is larger than the variation within each group would explain.
The range of lineup deviations in regular-season and Open Cup games can be found in Figure 5 below:
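For readers who want to see what the analysis of variance computes, here is a minimal Python sketch of the one-way F statistic. The function name and the toy ratings are hypothetical. The table reported below appears to be R’s `anova()` output, and this sketch is not the author’s code.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way analysis of variance.

    `groups` maps a label (e.g. a competition) to a list of Deviation
    ratings. Returns (F, df_between, df_within).
    """
    values = [x for g in groups.values() for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items()
    )
    # Within-group (residual) sum of squares: spread inside each group.
    ss_within = sum(
        (x - means[label]) ** 2 for label, g in groups.items() for x in g
    )
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical ratings, not taken from the data set.
ratings = {
    "league": [0.0, 1.0, 2.0],
    "open_cup": [-4.0, -3.0, -2.0],
}
print(one_way_anova_f(ratings))  # (24.0, 1, 4)
```

A large F means the gap between group averages is big relative to the scatter within groups. If SciPy is available, `scipy.stats.f_oneway` computes the same statistic along with its p-value.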
This plot appears to confirm the hypothesis that lineups used in the US Open Cup tend to differ dramatically from those used in the regular season. More precisely, the lineups used in Open Cup games tend to deviate more significantly from the “average lineup” than those used in league games.
Looking at the range of deviations in the above plot, individual data points could certainly be cherry-picked to argue against this hypothesis. Sporting Kansas City, notably, chose to use reserve players in their league game last weekend in order to save their best players for Wednesday’s final. Over the entire data set, however, it appears that the Open Cup is used as an opportunity to rotate in reserve players for game experience.
At the risk of diving too deeply into the statistical weeds, below is the analysis-of-variance table for a linear model fitted to the above data set. This step provides a helpful check on any visual ambiguity in the above plot.
```
Analysis of Variance Table

Response: deviations.test$Deviation
                            Df     Sum Sq    Mean Sq F value    Pr(>F)
deviations.test$MatchType    1 3.4732e-05 3.4732e-05  378.24 < 2.2e-16 ***
Residuals                 2021 1.8558e-04 9.2000e-08
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
The difference in lineup deviations illustrated in Figure 5 is statistically significant well beyond the 99.9% confidence level (p < 2.2e-16). The hypothesis is confirmed.
(This is probably a good time to note that the source code and data set used in this analysis are available on GitHub, and are linked at the bottom of this piece.)
Having confirmed our hypothesis, we can now look deeper into the data set. Do other competitions show the same difference from league games that the Open Cup does? Is the same difference exhibited by all teams in MLS, or do some vary their lineups more than others?
Looking first at all competitions, Figure 6 plots the range of deviations for all five official competitions, not just league and Open Cup games.
This plot suggests that the Open Cup is not the only competition in which teams change their lineups: the Canadian teams have used their national championship in the same way.
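The F value in the table can be reproduced by hand. In a one-way analysis of variance, each mean square is a sum of squares divided by its degrees of freedom. F is the ratio of the between-group mean square to the residual mean square:

$$
F = \frac{SS_{\text{MatchType}} / df_{\text{MatchType}}}{SS_{\text{Residuals}} / df_{\text{Residuals}}}
  = \frac{3.4732 \times 10^{-5} / 1}{1.8558 \times 10^{-4} / 2021}
  \approx \frac{3.4732 \times 10^{-5}}{9.183 \times 10^{-8}}
  \approx 378.2
$$

The residual mean square printed in the table (9.2000e-08) appears to be rounded for display. The unrounded value of roughly 9.18e-08 is what reproduces the reported F of 378.24.

The degrees of freedom also imply that the comparison covered 2,023 games. The total degrees of freedom, 1 + 2,021 = 2,022, is one fewer than the number of observations. R reports any p-value below its machine epsilon (about 2.2e-16) as “< 2.2e-16”, so the true p-value is simply too small to display.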
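Box plots like Figures 6 and 7 summarize each group’s ratings by its median and quartiles. The sketch below groups hypothetical Deviation ratings by competition and computes those statistics with Python’s standard `statistics.quantiles`. The row layout and the function name `quartiles_by_group` are assumptions. The software behind the figures may also use a different quantile convention, so its boxes could differ slightly.

```python
import statistics
from collections import defaultdict


def quartiles_by_group(rows):
    """Median and quartiles of Deviation ratings for each group.

    `rows` is an iterable of (group, rating) pairs, where the group might be
    a competition, a team, or a tournament round. Returns a dict mapping
    group -> (Q1, median, Q3), the statistics that form the box in a box
    plot. Each group needs at least two ratings.
    """
    by_group = defaultdict(list)
    for group, rating in rows:
        by_group[group].append(rating)
    return {
        group: tuple(statistics.quantiles(values, n=4, method="inclusive"))
        for group, values in by_group.items()
    }


# Hypothetical ratings, not taken from the data set.
rows = [
    ("MLS Regular Season", 0.0), ("MLS Regular Season", 1.0),
    ("MLS Regular Season", 2.0), ("US Open Cup", -4.0),
    ("US Open Cup", -3.0), ("US Open Cup", -2.0),
]
print(quartiles_by_group(rows))
# {'MLS Regular Season': (0.5, 1.0, 1.5), 'US Open Cup': (-3.5, -3.0, -2.5)}
```

Swapping the competition label for a team name gives the per-team view instead.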
Interestingly, there is a distinction between the CONCACAF Champions League and the MLS Playoffs. While both tournaments are selective, with only the best teams qualifying for participation, MLS teams exhibit very different behaviors when entering each. While lineup variation is significant for the Champions League – occurring at levels similar to the Open Cup and Canadian Championship – when it comes time to enter the playoffs teams exhibit the opposite tendency. Lineup variation decreases, and teams tend to turn to their usual starters more often.
Teams do not appear to treat the Open Cup uniformly, however. Plotting each team’s lineup deviations separately reveals some interesting distinctions – which can be seen in Figure 7.
Some teams, such as the Philadelphia Union, deviate very little when they choose their Open Cup lineups. Chicago and Chivas USA follow a similar trend. On the other hand, teams such as San Jose, New England, Houston, and Columbus follow a different strategy and change their lineups more significantly.
Deviations in lineups are more significant in early rounds of the competition, and appear to cease almost entirely by the time the semifinals begin. Reasons for this are not hard to guess: early-round games are more likely to be played against lower-division opponents, and the promise of a trophy (and ensuing qualification to a far-off Champions League competition) still seems remote. Additionally, the scheduling for these early round games (lately played in early June) places them before a roster freeze deadline, and either during or shortly before the summer transfer window. Coaches trying to make decisions about whether to keep marginal players may be more tempted in these games to give them what amounts to a “last chance to keep their spot”.
Some of these observations are in tension with one another, however. Close examination of the above two plots reveals that teams such as Philadelphia or Chicago deviate minimally from their preferred lineups even in the early rounds of competition.
Conclusions and future directions
While the hypothesis that MLS teams tend to deviate significantly from their “usual” lineup in Open Cup competition has been confirmed, a closer examination of data from the last three years reveals that this deviation is not uniform. Some teams exhibit greater willingness to experiment than others. These deviations also tend to decrease as the tournament progresses, suggesting that rhetoric about “taking the tournament seriously” becomes more believable as the Final approaches.
Additionally, the level of lineup deviation in the Open Cup is comparable to that seen in other competitions – including the CONCACAF Champions League. This similarity indicates that, while teams may take the opportunity to change their lineups when they play outside of league competition, they do not rotate more heavily for the Open Cup than for the other alternatives.
This metric also suggests directions for future work. One immediately apparent candidate would be to see whether deviations in a lineup have any effect on a team’s chances of winning. Such comparisons could be applied both to Open Cup and Champions League play, where the quality of opponents varies greatly, and to league play – where teams are at times challenged by fixture congestion and other factors which may encourage them to rotate their starting lineups.
Download the data behind this analysis
In the spirit of open data, the scripts and datasets used in this analysis have been posted to GitHub. I welcome any comments, corrections, and feedback about this piece.