How I’m simulating playoff odds

Over the course of this season I’ve been occasionally running simulations of how the rest of the Major League Soccer season might end up. The feedback I’ve gotten from this work has been generally positive, and a number of people have asked about my methodology. This post is, finally, my attempt to explain the process I’ve been following.

The TL;DR version is this: I’m running a Monte Carlo simulation that randomly assigns a result to each remaining league game.

Version 1

The slightly longer story is that I started with a very simple approach, and then added more detail after a month or so. Rather than try to model team strength, streaks, or anything else that might predict a team’s final points total, I chose to estimate the season as a whole and treat the teams (and games) largely as black boxes.

I started by looking at every result in MLS from 2011 through 2016. Across these six years – a total of 1,955 games – the results broke down as follows:

  • 972 wins by the home team (~50%)
  • 533 draws (~27%)
  • 450 wins by the road team (~23%)

Using these counts, I then wrote a small Python script that simulates the remaining games in the season.

The script starts from the current standings and the list of remaining games. For each remaining game, it randomly selects a result, weighted by the percentages in the list above. At the end of the simulated season, it writes out the points earned by each team.

Repeat the last paragraph 10,000 times.
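The loop described above can be sketched in a few lines. This is a minimal illustration rather than the actual script: the function names, the team names, and the input format (a dictionary of current points plus a list of home/away pairs) are all assumptions made for the example.

```python
import random

# Historical result counts from the 1,955 MLS games (2011-2016) listed above.
RESULT_COUNTS = {"home": 972, "draw": 533, "away": 450}


def simulate_season(standings, remaining_games, rng):
    """Play out the remaining games once and return final points per team."""
    points = dict(standings)  # copy, so every run starts from the real table
    outcomes = list(RESULT_COUNTS)
    weights = list(RESULT_COUNTS.values())
    for home, away in remaining_games:
        result = rng.choices(outcomes, weights=weights, k=1)[0]
        if result == "home":
            points[home] += 3
        elif result == "away":
            points[away] += 3
        else:  # a draw gives each team one point
            points[home] += 1
            points[away] += 1
    return points


def run_simulations(standings, remaining_games, runs=10_000, seed=None):
    """Repeat the season simulation and collect every run's final points."""
    rng = random.Random(seed)
    return [simulate_season(standings, remaining_games, rng) for _ in range(runs)]


# Hypothetical inputs, for illustration only.
standings = {"Team A": 40, "Team B": 38, "Team C": 35}
remaining_games = [("Team A", "Team B"), ("Team C", "Team A"), ("Team B", "Team C")]
results = run_simulations(standings, remaining_games, runs=1_000, seed=1)
```

Passing the raw counts as weights avoids rounding the percentages by hand, since `random.choices` normalizes the weights itself.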

Taking the output of that script and working a little bit of Excel magic will leave you with a very rudimentary prediction for how teams might line up at the end of the season.

You can take the set of all predicted point totals for each team and do the usual math to generate an average point total, piecing together a table:

You can even go a bit further, counting how often each team ends up with each points total, and put together a more graphical exploration like this:
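Both summaries can also be computed without a spreadsheet. The sketch below assumes the simulation output is a list with one dictionary of final points per simulated season; that format, the function names, and the sample numbers are hypothetical.

```python
from collections import Counter
from statistics import mean


def average_points(results):
    """Mean final points per team across all simulated seasons."""
    teams = results[0].keys()
    return {team: mean(run[team] for run in results) for team in teams}


def points_distribution(results, team):
    """How often `team` finished on each points total."""
    return Counter(run[team] for run in results)


# Hypothetical output from three simulated seasons.
results = [
    {"Team A": 50, "Team B": 44},
    {"Team A": 47, "Team B": 44},
    {"Team A": 50, "Team B": 46},
]
print(average_points(results))
print(points_distribution(results, "Team A"))
```

The per-team `Counter` is the raw material for a histogram of points totals, which any plotting tool can draw.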

After a few weeks of using this script, however, I started to feel like I could (and should) do better.

The heart of this approach is how you assign percentages to each of the three possible results that a game can have. But not all games are the same. A poor team playing on the road has a lower chance of victory than a strong team does. A team playing short-handed, or on short rest, also faces different odds than a well-rested team with a full complement of players.

So I re-worked the script a bit.

Version 2

For the second iteration – which is the one I’m still using – I wanted to tweak the methodology quickly, without having to lose too much time in generating a sophisticated model. I briefly considered looking at in-game events, simulating a “shot funnel” that would look at how often each team shot the ball, how often a shot was on target, and how often their opponents saved shots that were put on goal – but that promised to be quite a bit of work, and meanwhile I had an embarrassingly simple script that could be improved much more easily.

I ended up taking the 1,955 game results from version 1, and adding only one additional data point: each team’s points per game (PPG) coming into the game, relative to its opponent’s. I grouped games into three categories: the home team having the better PPG, the away team having the better PPG, and the two teams having the same PPG.
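As a rough sketch of that grouping step, the code below labels each historical game by its PPG comparison and tallies the results per category. The row format (home PPG, away PPG, result), the category labels, and the sample rows are assumptions made for illustration.

```python
from collections import Counter


def ppg_category(home_ppg, away_ppg):
    """Label a game by which side came in with the better points per game."""
    if home_ppg > away_ppg:
        return "home_better"
    if home_ppg < away_ppg:
        return "away_better"
    return "same"


def tally_results(games):
    """Count results per category; each game is (home_ppg, away_ppg, result)."""
    counts = {"home_better": Counter(), "same": Counter(), "away_better": Counter()}
    for home_ppg, away_ppg, result in games:
        counts[ppg_category(home_ppg, away_ppg)][result] += 1
    return counts


# Hypothetical historical rows: (home PPG, away PPG, result).
games = [
    (1.8, 1.2, "home"),
    (1.0, 1.5, "draw"),
    (1.4, 1.4, "away"),
    (2.0, 1.1, "home"),
]
print(tally_results(games))
```

One caveat with this sketch: it compares PPG values as floats, so two teams only count as tied when the values are exactly equal. Comparing points and games played as integers would be more robust.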

Under this new classification system, the percentages for each result changed – not by a large amount, but noticeably:

When the home team has a better PPG

This accounted for 881 games in the training data.

  • 481 wins by the home team (~55%)
  • 229 draws (~26%)
  • 171 wins by the road team (~19%)

When the teams have the same PPG

This occurred 117 times in the training data.

  • 58 wins by the home team (~50%)
  • 26 draws (~22%)
  • 33 wins by the road team (~28%)

When the road team has a better PPG

There were 957 such games in the training data.

  • 433 wins by the home team (~45%)
  • 278 draws (~29%)
  • 246 wins by the road team (~26%)
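Put together, the three lists above become a lookup table for picking each simulated result. This is a minimal sketch, not the actual script: the category labels and the `pick_result` function are hypothetical, and it assumes the caller supplies each team’s PPG (whether PPG should be recomputed as the simulated season progresses is left open here).

```python
import random

# Result counts from the three lists above, keyed by which team has the better PPG.
COUNTS_BY_CATEGORY = {
    "home_better": {"home": 481, "draw": 229, "away": 171},
    "same": {"home": 58, "draw": 26, "away": 33},
    "away_better": {"home": 433, "draw": 278, "away": 246},
}


def pick_result(home_ppg, away_ppg, rng=random):
    """Draw one result, weighted by the counts for this game's PPG category."""
    if home_ppg > away_ppg:
        category = "home_better"
    elif home_ppg < away_ppg:
        category = "away_better"
    else:
        category = "same"
    counts = COUNTS_BY_CATEGORY[category]
    return rng.choices(list(counts), weights=list(counts.values()), k=1)[0]


# Sanity check: with the road team ahead on PPG, the sampled shares
# should land near 45% / 29% / 26%.
rng = random.Random(2017)
sample = [pick_result(1.1, 1.7, rng) for _ in range(10_000)]
print({r: sample.count(r) / len(sample) for r in ("home", "draw", "away")})
```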

With this somewhat more nuanced dataset, I tweaked my original script and compared the two approaches:

The adjustments had the expected effect: projected points for teams at the top of the standings rose slightly, while projections for teams at the bottom fell slightly. This seems more accurate than simply using the same odds in all conditions, so I kept using the updated version of the script.

Future versions


If you’ve read this far, you probably have some suggestions about how this approach can be improved. Here are a few things that I’ve considered:

1. Consider factors beyond PPG

The easiest change would be to add more factors than just PPG. Two possible candidates would be to look at fatigue, and how lineups change. Fatigue, at least when measured solely by “how many days ago did you play?” seems like it could be a fruitful area to explore:
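Measured that way, fatigue is just a date subtraction. The sketch below is only an illustration of the idea; the function names and the dates are hypothetical, and nothing here says how rest should actually change the odds.

```python
from datetime import date


def days_of_rest(previous_game, next_game):
    """Number of days between a team's previous game and its next one."""
    return (next_game - previous_game).days


def rest_advantage(home_previous, away_previous, game_day):
    """Positive when the home team has had more rest than the road team."""
    return days_of_rest(home_previous, game_day) - days_of_rest(away_previous, game_day)


# Hypothetical schedule: home team last played a week ago, road team three days ago.
game_day = date(2017, 8, 12)
print(rest_advantage(date(2017, 8, 5), date(2017, 8, 9), game_day))
```

A value like this could become another grouping variable, in the same way the PPG comparison splits games into categories.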

Predicting lineup changes is significantly harder, particularly given that team lineups aren’t announced until 45 minutes before kickoff. That makes long-term prediction impossible: if you run the script in June, there’s really no way to know whether a game in August will bring changes.

It might be possible to infer coaching preferences, however, using an approach like what I was exploring in August:

This would lessen the need to know which players might be changed, while still estimating whether team A would change by a small or large amount on average.

2. How do you break ties?

The problem with picking results as “home win, draw, or away win” is that you have no way of breaking ties accurately at the end of the season. The first tiebreaker in MLS is the total number of wins – which this approach would capture – but the second tiebreaker is goal difference (which would push me into #4, below).

I’m pretty sure that next year’s script will include the total-wins tiebreaker, but I’m not sure whether the remaining tiebreakers will be implemented.
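A total-wins tiebreaker mostly comes down to tracking wins alongside points during each simulated season and sorting on both. The sketch below assumes a hypothetical table format (points and wins per team) and ignores every later tiebreaker.

```python
def rank_teams(table):
    """Sort teams by points, then by total wins as the first tiebreaker."""
    return sorted(
        table,
        key=lambda team: (table[team]["points"], table[team]["wins"]),
        reverse=True,
    )


# Hypothetical end-of-season table: Team A and Team B are level on points.
table = {
    "Team A": {"points": 50, "wins": 14},
    "Team B": {"points": 50, "wins": 15},
    "Team C": {"points": 47, "wins": 13},
}
print(rank_teams(table))  # ['Team B', 'Team A', 'Team C']
```

Teams that are still level after wins keep their original order in this sketch, which is arbitrary. Resolving those cases properly would need goal difference.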

3. Extend the training dataset

MLS has existed for more than 20 years. Other leagues exist. Could predictions get better if I included more games in my training data?

Honestly, I’m somewhat hesitant to go too far down this path. MLS is unique in some ways, differing both from its domestic counterparts (travel details, distance traveled, length of season) and from foreign leagues (distance traveled, different league structures that could change playing incentives). Even past MLS seasons differ in some ways, as the league has grown and changed the length of its season. And I refuse to train a model on any games during the shootout era.

4. Peek inside individual games

Thus far I’ve specifically chosen to avoid simulating anything inside the game – as far as this approach is concerned, the stadium on game day is a black box into which two teams enter and a result emerges.

This is a design choice made largely because of the difficulty in modeling in-game events. Above I laid out one possible approach (dealing with a Drake-equation-esque series of consecutive probabilities when it comes to shots), but it isn’t yet clear to me that I have the skill necessary to avoid possible pitfalls in implementation. I’d much rather err on the side of caution, and accept a certain amount of inaccuracy, than risk getting too specific and see the whole process spin out of control.
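To make the “series of consecutive probabilities” idea concrete, here is a minimal sketch of a shot funnel: shots taken, times the share on target, times the share the opposing keeper fails to save. The function name and all of the input numbers are hypothetical, and a real model would need far more care than this.

```python
def expected_goals_per_game(shots_per_game, on_target_rate, opponent_save_rate):
    """Chain the funnel: shots -> shots on target -> shots not saved."""
    return shots_per_game * on_target_rate * (1.0 - opponent_save_rate)


# Hypothetical inputs: 12 shots a game, 35% on target, opponent saves 70% of those.
print(round(expected_goals_per_game(12, 0.35, 0.70), 2))
```

Every factor in that product is itself an estimate, so errors multiply through the chain. That is one way a more detailed model can end up less reliable than the black-box version.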


Hopefully this look “over my shoulder” is useful to someone. As I said in the beginning, I’ve gotten a steady stream of questions about how I was doing all of this, and I’ve intended to write up this explanation for several months. My intent is to tweak my approach during this offseason, and possibly integrate this workflow into Trapp for easier use.

If you have any feedback, suggestions, or other questions, I’d love to hear from you. The best way to reach me is on Twitter, although you can also comment below or send me an email at


I’ve posted the current script that I’m using on GitHub, if anyone wants to look over the details of how all of this is happening.
