Over the weekend I described the process I’m using to simulate MLS seasons and estimate the odds that a team reaches the playoffs. After using this tool for most of this year, I’ve recently tried extending this work to look at how team projections change during a season.
Every run of the prediction tool generates 10,000 sets of possible final standings – each starts from the current standings, and incrementally picks a result for every remaining game. The output looks a little like this:
By reading across each row, we can determine what the final table looks like in each simulation. By reading down each column, we can get a sense of what each team’s final points total might be.
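To make the idea concrete, here is a minimal sketch of that kind of simulation. It is not the actual prediction script: the function name, the three hypothetical teams, their standings and fixtures, and especially the fixed home/draw/away probabilities are all assumptions made up for illustration.

```python
import random
from collections import Counter


def simulate_season(current_points, remaining_games, n_runs=10_000, seed=None):
    """Return one dict of final points per simulated season (one "row" each)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        # Every run starts from the current standings.
        points = dict(current_points)
        for home, away in remaining_games:
            # Assumed, fixed outcome odds; a real model would vary these per game.
            outcome = rng.choices(
                ["home", "draw", "away"], weights=[0.45, 0.27, 0.28]
            )[0]
            if outcome == "home":
                points[home] += 3
            elif outcome == "away":
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        runs.append(points)
    return runs


# Invented standings and fixtures for three hypothetical teams.
standings = {"Team A": 40, "Team B": 38, "Team C": 35}
fixtures = [("Team A", "Team B"), ("Team C", "Team A"), ("Team B", "Team C")]

runs = simulate_season(standings, fixtures, n_runs=10_000, seed=42)

# Reading "down a column": how often each final total occurs for one team.
team_a_totals = Counter(run["Team A"] for run in runs)
```

Each dict in `runs` plays the role of one row of the output, and `team_a_totals` is the column for a single team.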
All of this, however, is just a snapshot – valid only until the next game ends.
It feels obvious, then, to want to look at how these projections change over the course of the season. Teams like Orlando or Chicago have seemed unbeatable at various points this season, only to slump later and be eclipsed by the Toronto juggernaut. Can we use a tool like this prediction script to get a sense for these shifts? I believe that we can.
This, then, has been the focus of my work over the last week or so. By adding another layer of iteration to the process, I’ve been able to generate a plot that looks like this:
How to read these plots
This approach begins on the first day of the season, and generates a set of 10,000 predictions for each day on which games are played – a total of 91 match days so far this year. Every set of simulations is then plotted in a column, reading from left (March 3rd, the day the season starts) to right (the latest match day, October 8th in this case).
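The extra layer of iteration can be sketched roughly as follows. This is an illustration under stated assumptions rather than the real script: it tracks a single hypothetical team, the `snapshots` data (match day number mapped to points so far and games left) is invented, and the win/draw/loss weights are placeholders.

```python
import random
from collections import Counter


def simulate_final_points(points_so_far, games_left, n_runs, rng):
    """Toy single-team simulator: each remaining game yields 3, 1, or 0 points."""
    totals = []
    for _ in range(n_runs):
        total = points_so_far
        for _ in range(games_left):
            # Assumed odds for win / draw / loss.
            total += rng.choices([3, 1, 0], weights=[0.40, 0.25, 0.35])[0]
        totals.append(total)
    return totals


def projections_by_match_day(snapshots, n_runs=10_000, seed=None):
    """Run one batch of simulations per match day; each batch is one plot column."""
    rng = random.Random(seed)
    columns = {}
    for day in sorted(snapshots):
        points_so_far, games_left = snapshots[day]
        totals = simulate_final_points(points_so_far, games_left, n_runs, rng)
        columns[day] = Counter(totals)  # final points total -> number of runs
    return columns


# Invented five-game mini season: match day -> (points so far, games left).
snapshots = {1: (0, 4), 2: (3, 3), 3: (4, 2), 4: (4, 1), 5: (7, 0)}
columns = projections_by_match_day(snapshots, n_runs=2_000, seed=7)
```

Plotting `columns` left to right, with point totals on the vertical axis and the counts as color intensity, gives the general shape of chart described here.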
The MLS schedule doesn’t follow a regular cadence – some weeks have games on Wednesdays, while others don’t. Some weekends have games on Friday, Saturday, and Sunday, while others don’t. In order to help call out the weekly structure of the schedule, I’ve added alternating shading on the timeline at the bottom of the plot that groups individual match days into different weeks. The months of the year are also displayed for easier comprehension.
The vertical axis contains possible point totals that a team can earn, ranging from 10 to 89 points. This spans the range of possible outcomes generated by all the various simulations that have been run. Combining these two axes, you can inspect the chart for Chicago and see that in the simulation for Wednesday, July 19th (the beginning of week 21), the Fire finished with 61 points in 823 of the 10,000 runs.
Because 61 points was the outcome that appeared most often, that point total is called out in the plot with a red box. Highlighting the most frequent outcome helps the eye read the changing fortunes of the team, particularly during the early portion of the season when there is still so much uncertainty in these predictions.
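Picking the highlighted cell amounts to finding the most common value in a column. A small sketch, assuming each column is stored as a `Counter` of point totals; the helper name, the toy counts, and the tie-breaking rule (the higher total wins) are my own assumptions, not necessarily how the plot resolves ties.

```python
from collections import Counter


def most_frequent_total(column):
    """Return (points, count, share of runs) for the most common final total."""
    # Ties on count are broken in favor of the higher point total (an arbitrary choice).
    points, count = max(column.items(), key=lambda item: (item[1], item[0]))
    share = count / sum(column.values())
    return points, count, share


# Invented toy column: final points total -> number of simulation runs.
column = Counter({59: 2, 60: 3, 61: 5})
highlight = most_frequent_total(column)
```

For scale, the Chicago example works out to 823 / 10,000 = 8.23% of runs, so even the most frequent outcome is far from a sure thing.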
The uncertainty in those early weeks is visible in several aspects of the chart. For one, the colors on the left side of the chart are much more muted, while on the right side of the chart they get much more intense. As the number of remaining games gets smaller, the range of possible outcomes also gets smaller – meaning that those 10,000 simulation runs are concentrated in a smaller and smaller number of outcomes.
Second, the spread of possible point totals gets much smaller. The simulation from March 4th (the second match day) returned point totals for Chicago that ranged from a low of 18 points to a high of 75. Compare that with the simulation from September 20th, which included a range of point totals that only stretched between 48 and 63 points.
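The narrowing can be put in numbers by taking the lowest and highest totals in a column. In this sketch only the endpoints (18 and 75, 48 and 63) come from the Chicago figures above; the counts and the middle values are invented placeholders, and the `spread` helper is hypothetical.

```python
from collections import Counter


def spread(column):
    """Return (lowest total, highest total, width) among totals that occurred."""
    totals = [points for points, count in column.items() if count > 0]
    low, high = min(totals), max(totals)
    return low, high, high - low


# Endpoints taken from the text; all counts are made up for illustration.
march_4 = Counter({18: 1, 47: 5, 75: 1})
september_20 = Counter({48: 1, 55: 5, 63: 1})

early = spread(march_4)       # (18, 75, 57)
late = spread(september_20)   # (48, 63, 15)
```

By this measure the band of possible finishes shrank from 57 points wide to 15.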
An example: summarizing the Chicago Fire
Let’s look again at the Chicago Fire:
Chicago held fairly stable during the first two months of the season, with most simulations predicting that they would finish with just under 50 points at the end of the season.
Their fortunes improved – dramatically – as summer arrived, however. The high-water mark came in the middle of July, with predictions from that period briefly suggesting that they would finish with just over 60 points. That 10-point swing was fueled by the team winning eight of nine games between May 13th and July 1st.
Following that charge up the standings, the team then hit a very rough string of results. Between July 5th and August 26th the team won only once in eight games, losing six. This skid dropped their projected points tally all the way down to 51 points going into the first match day in September. Chicago has since rebounded somewhat, and as of this writing they sit on 55 points.
All the teams
The following gallery includes plots for every team in Major League Soccer. Each tells a story that can be read in the same way as the Chicago Fire example above. Take a look, and let me know what you think.
Can the plots be made clearer? Would an interactive version be interesting?
I’d be curious to see these alongside graphs of the percentage of possible points earned at each stage of the season. If they’re similar (and I don’t know if they are!), it’s possible this is less of a predictive tool and simply more reflective of how the team is doing so far.
Hi Matthew! Congrats on your work, you really like statistics and soccer!!
I like them too, and I’m still learning Python programming, and machine learning in particular.
I used to do my Brazilian soccer statistics in Excel, but I believe that ML will help improve them.
I downloaded your Trapp code and I will work on it.
Thanks for sharing.