On 11 June 2014, Bing announced a number of World Cup services that included Bing Predicts.
Starting today, if you search for “World Cup Predictions” or for any of the matches (both the preliminary group games and the later single-elimination rounds), we will display the chances of each respective team winning.
Bing models evaluate the interaction of:
- previous win/loss/tie record in qualification matches and other international competitions
- margin of victory in these contests, adjusted for location
- home field (for Brazil)
- proximity (South American teams)
- playing surface (hybrid grass)
- game-time weather conditions
Google used the Google Cloud Platform, including Google Cloud Dataflow to import all the data and Google BigQuery to analyse it, build a statistical model, and apply machine learning to predict the outcome of each match.
The Google prediction approach used data supplied by Opta. These data enabled Google “to examine how activity in previous games predicted performance in subsequent ones”. These data were combined with “a power ranking of relative team strength” and “a metric to stand in for home team advantage based on fan enthusiasm and the number of fans who had traveled to Brazil”.
On 11 July Google announced ahead of the Final “we’re not only ready to make our prediction, but we’re doing something a little extra for you data geeks out there. We’re giving you the keys to our prediction model so you can make your own model and run your own predictions”.
We’ve put everything on GitHub. You’ll find the IPython notebook containing all of the code (using pandas and statsmodels) to build the same machine learning models that we’ve used to predict the games so far. We’ve packaged it all up in a Docker container so that you can run your own Google Compute Engine instance to crunch the data. For the most up-to-date step-by-step instructions, check out the readme on GitHub.
Predictions at Your Fingertips
The Google Cloud Platform blog has a very open evaluation of its predictions at the World Cup. It tipped France to defeat Germany in the quarter-final.
World Cup teams are especially difficult to model because they play so few games together. … If data is the lifeblood of a good model, we suffered for want of more information.
We know that in the same environment, others fared better in their predictions (h/t Cortana; their model relies more on what betting markets are saying, whereas ours is an inductive model derived from game-play data).
This does identify the fundamental issue for predictions at tournaments rather than in a season of competition. How can a system be sufficiently dynamic to respond to short-term events?
My approach is very basic. For this World Cup I followed the World Football Elo Ratings. This led me, inevitably, to a Germany victory, but only after the semi-final.
- The Netherlands took Spain’s place in my rankings after their victory in the group game.
- I anticipated that Brazil would beat Germany.
- Once Germany defeated Brazil, they assumed the highest-ranked team position.
- I decided that South American advantage was not an issue in the Final.
13 of the 16 knockout games followed the Elo ratings. The real surprise for me was Costa Rica’s progress: they were ranked 89 points below Greece on the Elo ratings. Belgium overcame an Elo deficit of 8 points to defeat the USA. The Brazil v Germany game was a most remarkable overturn of the 67-point difference in their pre-tournament Elo ratings.
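The World Football Elo Ratings rest on the standard Elo expected-score curve, so those point gaps translate directly into pre-match expectations for the favourite. A quick sketch of the standard formula (an expected score, with draws folded in, rather than a pure win probability):

```python
def elo_expectation(rating_diff):
    """Expected score for the side holding an Elo advantage of rating_diff."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# The point gaps quoted above, favourite first:
for label, gap in [("Greece over Costa Rica", 89),
                   ("USA over Belgium", 8),
                   ("Brazil over Germany (pre-tournament)", 67)]:
    print(f"{label}: expected score {elo_expectation(gap):.3f}")
```

An 8-point gap barely moves the expectation off 0.5, which puts Belgium v USA in a different class of “upset” from the two larger overturns.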
I did not do my own Elo calculations during the tournament, which would have given me a much more dynamic model. Nor did I check the betting odds, though I do understand the importance of that market for agile prediction.
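Doing those calculations in-tournament would have meant applying the Elo update after each result. A minimal sketch, assuming the eloratings.net conventions as I understand them (K = 60 for World Cup finals matches and a goal-difference multiplier), and ignoring the site’s home-advantage offset and the draw case for brevity; the ratings below are illustrative, not the published 2014 figures:

```python
def elo_update(r_winner, r_loser, goal_diff, k=60):
    """One Elo update for a decisive result; k=60 is the World Cup finals weight."""
    # Goal-difference multiplier (eloratings.net convention, as I read it):
    # x1 for one goal, x1.5 for two, x1.75 for three, +1/8 per goal beyond that.
    if goal_diff <= 1:
        g = 1.0
    elif goal_diff == 2:
        g = 1.5
    else:
        g = 1.75 + (goal_diff - 3) / 8.0
    expected = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    delta = k * g * (1.0 - expected)      # zero-sum exchange of points
    return r_winner + delta, r_loser - delta

# Illustrative ratings reproducing the 67-point pre-tournament gap:
brazil, germany = 2100.0, 2033.0
germany, brazil = elo_update(germany, brazil, goal_diff=6)  # the 7–1 semi-final
print(round(germany), round(brazil))
```

Run after the semi-final, an update like this moves Germany above Brazil in one step — the dynamic re-ranking I applied by hand.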
As usual with my observations, I decided that the outcome of any one game was independent of referee, player selection and environmental conditions.
Wherever the Elo Ratings rule was broken in this World Cup, it gave me a great opportunity to look at much more granular data, and to go back to many of the pre-tournament predictions to examine their robustness.