Recent academic research from the Netherlands sheds light on this question. Jan Lasek and coworkers looked at a variety of world rankings in soccer and asked how well they predicted the results of 979 test matches, a huge sample set.
To test the rankings, they developed a method by which each ranking gave a “win probability” for a match. Then they looked at how far this probability deviated from the actual result of the match.
For example, suppose the United States is predicted to have a 54% chance to beat Mexico. If the match ends as a draw, the deviation of the prediction (0.54) from the result (0.5 for a draw) is 0.04. Taking the square of 0.04 gives a measure of the error. A win for the United States gives an error of (1.0 – 0.54) squared, while a loss results in an error of 0.54 squared.
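The arithmetic in this example can be sketched in a few lines of Python. The function name `squared_error` is an illustrative choice, not something taken from the paper.

```python
def squared_error(win_probability, result):
    """Squared deviation of a predicted win probability from the match result.

    result is 1.0 for a win, 0.5 for a draw and 0.0 for a loss,
    from the point of view of the team the probability refers to.
    """
    return (win_probability - result) ** 2


# United States given a 54% chance to beat Mexico
p = 0.54
draw_error = squared_error(p, 0.5)  # 0.04 squared
win_error = squared_error(p, 1.0)   # 0.46 squared
loss_error = squared_error(p, 0.0)  # 0.54 squared
```

Averaging this quantity over every match in a test set gives the mean squared error.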
Jan asked me to take part in the study with The Power Rank. I directly provided him with the win probability for the 979 matches in the test set.
The visual shows the results for the mean squared error. A smaller error implies a better predictor.
The horizontal bar gives a measure of the uncertainty in the error estimate. There is a 2 in 3 chance the true error is within the range of the bar.
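One common way to get a bar like this is to take one standard error on either side of the mean, which covers roughly two thirds of the probability under a normal approximation. The sketch below assumes that reading. The squared errors are made-up numbers for illustration, not values from the study.

```python
import math
import statistics


def mean_and_standard_error(errors):
    """Return the mean of the errors and the standard error of that mean."""
    mean = statistics.fmean(errors)
    std_err = statistics.stdev(errors) / math.sqrt(len(errors))
    return mean, std_err


# Hypothetical squared errors for five matches (illustration only)
errors = [0.0016, 0.2116, 0.2916, 0.09, 0.04]
mse, se = mean_and_standard_error(errors)

# Assumed reading of the bar: one standard error either side of the mean
low, high = mse - se, mse + se
```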
The authors also looked at a different error measure called the binomial deviance. However, the results are similar to the mean squared error.
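For reference, here is a minimal sketch of the textbook form of binomial deviance for a single match. The natural logarithm and the small clipping constant `eps` are assumptions; the paper may use a different log base or scaling.

```python
import math


def binomial_deviance(win_probability, result, eps=1e-12):
    """Binomial deviance of a predicted win probability for one match.

    result is 1.0 for a win, 0.5 for a draw and 0.0 for a loss.
    The probability is clipped away from 0 and 1 so the logarithm
    stays finite (eps is an assumed, arbitrary small constant).
    """
    p = min(max(win_probability, eps), 1.0 - eps)
    return -(result * math.log(p) + (1.0 - result) * math.log(1.0 - p))


# A confident prediction is punished much harder when it is wrong
confident_and_right = binomial_deviance(0.9, 1.0)
confident_and_wrong = binomial_deviance(0.9, 0.0)
```

Unlike squared error, this measure grows without bound as a prediction approaches certainty on the wrong outcome.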
For the curious soccer fan, the paper draws the following conclusions.
The FIFA rankings
FIFA, the international governing body for soccer, publishes the most popular international rankings. However, it’s just a table (3 points for a win, 1 for a draw, 0 for a loss) that attempts to account for strength of opponent and importance of the match.
The FIFA rankings do poorly at predicting the outcome of matches.
What did you expect from such a simple method? They account for strength of schedule by taking the rank of an opponent and subtracting it from 200. That might have been novel in 1863.
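To make the strength of schedule adjustment concrete, here is a minimal sketch. It assumes, purely for illustration, that the three ingredients are multiplied together; how FIFA actually combines them is not spelled out here, and the `importance` weight of 1.0 is a hypothetical value.

```python
def match_points(result_points, opponent_rank, importance):
    """Illustrative points for one match.

    result_points: 3 for a win, 1 for a draw, 0 for a loss.
    opponent_rank: the opponent's position in the rankings.
    importance: hypothetical weight for the type of match.
    Multiplying the factors is an assumption made for this sketch.
    """
    opponent_strength = 200 - opponent_rank
    return result_points * importance * opponent_strength


# Beating the 10th ranked team versus beating the 150th ranked team
strong_win = match_points(3, 10, 1.0)
weak_win = match_points(3, 150, 1.0)
```

Note that nothing in this calculation depends on the score of the match.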
While FIFA fails in ranking nations in men’s soccer, they do a better job for the women. The FIFA Women’s ranking uses an Elo type rating system that accounts for margin of victory. This information is critical in predicting match outcomes.
Margin of victory
The top 5 rankings for predicting matches use margin of victory in their calculations. Only one of the remaining rankings in the study (not shown in the visual) uses this information.
Two of the top rankings, the FIFA women’s rankings and EloRatings.net, do not use margin of victory in any kind of sophisticated way.
For example, a typical Elo ranking uses a 1, 0.5, or 0 for a win, draw or loss in a match respectively. Instead, the FIFA women’s rankings use a number between 0 and 1 for a match outcome based on the score. These numbers, which Lasek and coworkers show in Table 2 of their paper, appear to have no mathematical justification. However, the rankings perform well in prediction.
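A minimal sketch of a generic Elo update shows where the match outcome enters. The logistic curve with a scale of 400 is the usual chess convention, the `k` factor of 20 is a hypothetical choice, and the 0.9 used for a score-based outcome is a made-up value, not one from Table 2 of the paper.

```python
def expected_score(rating, opponent_rating):
    """Expected outcome (between 0 and 1) under the usual Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update_rating(rating, opponent_rating, actual, k=20.0):
    """Move the rating toward the actual outcome.

    actual is 1, 0.5 or 0 in a typical Elo system. A margin-aware
    variant passes some other number between 0 and 1 based on the score.
    """
    return rating + k * (actual - expected_score(rating, opponent_rating))


# Two evenly matched teams, so the expected score is 0.5
typical_win = update_rating(1500.0, 1500.0, 1.0)      # win counted as 1
score_based_win = update_rating(1500.0, 1500.0, 0.9)  # hypothetical value
```

The only difference between the two systems in this sketch is the value passed as `actual`.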
The Least Squares rankings and The Power Rank, two methods that naturally use margin of victory, were two of the other top systems.
The Elo++ rankings show the critical importance of margin of victory. This system won a Kaggle competition for ranking chess players. It has advanced features, such as giving less importance to matches in the distant past, and it uses a sophisticated regression method in its calculation.
However, it does not account for margin of victory. While its performance in predicting matches isn’t as bad as the FIFA rankings, it does not perform as well as the top 4 rankings.
The wisdom of crowds
The best method for predicting football matches was the Ensemble, which combined the predictions of the FIFA women’s rankings, EloRatings.net, The Power Rank and Least Squares.
The improvement from aggregation was significant. The ensemble of 4 rankings had an error 4.3% lower than the average error of the 4 systems.
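Here is a minimal sketch of how such an ensemble could work, assuming the combination is a simple average of the win probabilities (the paper may weight the systems differently). The four probabilities are hypothetical numbers for a single match.

```python
from statistics import fmean


def squared_error(win_probability, result):
    """Squared deviation of a win probability from the result (1, 0.5 or 0)."""
    return (win_probability - result) ** 2


def ensemble_probability(probabilities):
    """Combine several win probabilities by simple averaging (an assumption)."""
    return fmean(probabilities)


# Hypothetical win probabilities from four systems for one match
predictions = [0.50, 0.58, 0.62, 0.46]
result = 1.0  # the team in question won

average_member_error = fmean([squared_error(p, result) for p in predictions])
ensemble_error = squared_error(ensemble_probability(predictions), result)
```

Because squared error is a convex function of the prediction, the error of the averaged prediction can never exceed the average error of the individual systems on the same match.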
Others have aggregated the wisdom of many computers, a type of ensemble learning, to make predictions. Nate Silver uses 4 different college basketball rankings in his NCAA tourney predictions. I aggregated 7 preseason baseball predictions to forecast the 2014 season.
You’ll see a lot more of this from The Power Rank heading into football season.
More games or only recent games?
The FIFA rankings use a four year window to calculate rankings. With the turnover in players and coaches on national teams, this seems like a reasonable time span over which to evaluate a team.
But maybe a team just gets lucky over that time span. Four years means less than 80 games for most countries. Maybe an underachieving country like Argentina has had bad luck in world competition recently.
When Jan Lasek asked me to be a part of his study, I did two separate calculations. For each match, I used these sets of games in predicting the outcome.
Even though the first set contains fewer and more recent games than the second set, the two calculations had about the same predictive accuracy. The first appeared in the paper, but the second had a slightly smaller mean squared deviation.
Soccer teams don’t change much over time. Simon Kuper and Stefan Szymanski found the same result for England in the book Soccernomics. From 1980 through 2001, they found that the sequence of wins for the national team was identical to the random flipping of a coin.
Network research in rankings
Lasek and coworkers also studied the rankings from a paper by Park and Newman. They developed a ranking method based on their research in networks. The nodes in the network represent teams, and edges that connect nodes are games between the teams. The Power Rank uses the same concept.
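The network itself is easy to picture in code. This sketch only builds the structure of nodes and edges from a hypothetical list of games; it does not implement the Park and Newman ranking or The Power Rank algorithm.

```python
from collections import defaultdict


def build_network(games):
    """Map each team (node) to the opponents it has played (edges).

    games is a list of (team, opponent) pairs. A pair that has met
    twice appears twice in each team's list.
    """
    network = defaultdict(list)
    for team, opponent in games:
        network[team].append(opponent)
        network[opponent].append(team)
    return network


# Hypothetical games, for illustration only
games = [("United States", "Mexico"), ("Mexico", "Argentina")]
network = build_network(games)
```

In this toy network the United States and Argentina never played each other, but they are connected through Mexico.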
I’m not sure why, but the Park Newman method has a cult following. Maybe it’s because the paper is available for free on an archive, or that Mark Newman has a prestigious professorship in physics at the University of Michigan. But these rankings pop up everywhere. I even get random emails asking me about them.
However, the method does not use margin of victory, and it’s terrible at predicting football matches. It performs much worse than the FIFA rankings.
Check out the best international rankings
Lasek and coworkers highlight important aspects in ranking world soccer teams. However, their paper is not the last word on predicting matches.
The biggest problem with their method is using one win probability for a match. While this works for testing the predictive power of rankings, it does not get to the heart of football prediction: the probability for a win, loss and draw.
For more information on soccer prediction, please visit here.