Bing and Google: Predicting World Cup Football Success

Introduction

Last month, I looked at a number of predictions about performance at the 2014 FIFA World Cup.

I did not mention Bing and Google at that time.

Bing

On 11 June 2014, Bing announced a number of World Cup services, including Bing Predicts.

Starting today, if you search for “World Cup Predictions”, or any group matches (both preliminary as well as later in the single elimination rounds) we will display the chances of each respective team to win.

Bing models evaluate the interaction of:

  • previous win/loss/tie record in qualification matches and other international competitions
  • margin of victory in these contests, adjusted for location
  • home field (for Brazil)
  • proximity (South American teams)
  • playing surface (hybrid grass)
  • game-time weather conditions

There is more information about Bing’s prediction approach here. It includes David Rothschild’s approach on the integration of fundamental and prediction-market data.

Google

Google used the Google Cloud Platform: Google Cloud Dataflow to import the data, and Google BigQuery to analyse it, build a statistical model and use machine learning to predict the outcome of each match.

The Google prediction approach used data supplied by Opta. These data enabled Google “to examine how activity in previous games predicted performance in subsequent ones”. These data were combined with “a power ranking of relative team strength” and “a metric to stand in for home team advantage based on fan enthusiasm and the number of fans who had traveled to Brazil”.

On 11 July Google announced ahead of the Final “we’re not only ready to make our prediction, but we’re doing something a little extra for you data geeks out there. We’re giving you the keys to our prediction model so you can make your own model and run your own predictions”.

 
We’ve put everything on GitHub. You’ll find the IPython notebook containing all of the code (using pandas and statsmodels) to build the same machine learning models that we’ve used to predict the games so far. We’ve packaged it all up in a Docker container so that you can run your own Google Compute Engine instance to crunch the data. For the most up-to-date step-by-step instructions, check out the readme on GitHub.

Predictions at Your Fingertips

The Google Cloud Platform blog has a very open evaluation of its predictions at the World Cup. Google tipped France to defeat Germany in their Quarter Final game.

World Cup teams are especially difficult to model because they play so few games together. … If data is the lifeblood of a good model, we suffered for want of more information.

But …

we know that in the same environment, others fared better in their predictions (h/t Cortana; their model relies more on what betting markets are saying, whereas ours is an inductive model derived from game-play data).

This identifies the fundamental issue for predictions at tournaments rather than over a season of competition. How can a system be sufficiently dynamic to respond to short-term events?

My approach is very basic. For this World Cup I followed the World Football Elo Ratings. This led me to a Germany victory, but only after the semi-final.

  • The Netherlands took Spain’s place in the tournament after their victory in the Group game.
  • I anticipated that Brazil would beat Germany.
  • Once Germany defeated Brazil then they assumed the highest ranking team position.
  • I decided that South American advantage was not an issue in the Final.

13 of the 16 games in the Round of 16 followed the Elo ratings. The real surprise for me was Costa Rica’s progress. They were ranked 89 points below Greece on Elo ratings. Belgium overcame an Elo ratings deficit of 8 points to defeat the USA. The Brazil v Germany game was a most remarkable overturn of the 67-point difference in their pre-tournament Elo ratings.
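To put those rating gaps in context, here is a minimal sketch of the win expectancy formula that Elo systems use, as I understand the World Football Elo Ratings to apply it: We = 1 / (10^(-dr/400) + 1), where dr is the rating difference. The function name and the optional home bonus parameter are my own illustrative choices, and the gaps are the ones quoted above, taken from the higher-rated side.

```python
def win_expectancy(rating_gap, home_bonus=0.0):
    """Expected result (win = 1, draw = 0.5, loss = 0) for the side that is
    `rating_gap` points ahead. `home_bonus` is an optional adjustment in
    rating points (an assumption here; set it to 0 for a neutral venue)."""
    return 1.0 / (10 ** (-(rating_gap + home_bonus) / 400.0) + 1.0)


# Rating gaps quoted in the text, seen from the higher-rated side.
gaps = [
    ("Greece over Costa Rica", 89),
    ("USA over Belgium", 8),
    ("Brazil over Germany", 67),
]

for label, gap in gaps:
    print(f"{label}: expected result {win_expectancy(gap):.3f}")
```

None of these gaps makes the favourite anything close to a certainty, which is one reason a static pre-tournament ranking can be overturned so readily.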

I did not do my own Elo calculations during the tournament. This would have given me a much more dynamic model. I did not check the betting odds either, but I do understand the importance of this market for agile prediction.
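A game-by-game recalculation could look something like the sketch below. It assumes the update rule Rn = Ro + K × G × (W − We), with K = 60 for World Cup finals matches, a goal-difference multiplier G, and a 100-point bonus for a team playing at home. Those constants are my understanding of the published World Football Elo method and should be checked against the source. The function names and the two ratings in the example are hypothetical; they are chosen only to reproduce a 67-point gap and are not the teams' actual ratings.

```python
def win_expectancy(rating_diff):
    """Expected result (1 = win, 0.5 = draw, 0 = loss) for a rating difference."""
    return 1.0 / (10 ** (-rating_diff / 400.0) + 1.0)


def goal_multiplier(goal_margin):
    """Weight for the margin of victory (assumed values, see lead-in)."""
    if goal_margin <= 1:
        return 1.0
    if goal_margin == 2:
        return 1.5
    return (11 + goal_margin) / 8.0


def updated_rating(rating, opponent, result, goal_margin, venue_bonus=0.0, k=60.0):
    """Return the new rating after one game.

    result: 1 for a win, 0.5 for a draw, 0 for a loss.
    venue_bonus: +100 if this team is at home, -100 if the opponent is,
                 0 at a neutral venue (assumed convention).
    """
    expected = win_expectancy(rating + venue_bonus - opponent)
    return rating + k * goal_multiplier(goal_margin) * (result - expected)


# Hypothetical ratings with a 67-point gap; the lower-rated side wins away by six goals.
home, away = 2100.0, 2033.0
new_home = updated_rating(home, away, result=0, goal_margin=6, venue_bonus=100)
new_away = updated_rating(away, home, result=1, goal_margin=6, venue_bonus=-100)
print(f"home: {home:.0f} -> {new_home:.1f}, away: {away:.0f} -> {new_away:.1f}")
```

Because the points one side gains are the points the other loses, a single heavy upset is enough to swap the order of two closely ranked teams, which is the kind of short-term responsiveness a static pre-tournament list lacks.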

As usual with my observations, I decided that the outcome of any one game was independent of referee, player selection and environmental conditions.

Wherever the Elo Ratings rule was broken in this World Cup it did give me a great opportunity to look at much more granular data.

… and to go back to many of the pre-tournament predictions to look at their robustness.

Photo Credits

Bing Predictions (Frame Grab)

Google (Frame Grab)

Cortana Windows Phone (Tom Warren)

Weather, Risk, Analytics

I have been looking at probabilistic approaches to success in sport performance.

Whilst researching some ideas around rule-based behaviour I came across this advertisement.

My interest was piqued and I sought out WeatherBill.

David Friedberg is the company’s Chief Executive Officer. David was previously at Google, where he was one of the founding members of Google’s Corporate Development team. He managed a number of strategic projects for Google, including identifying and leading several of Google’s largest acquisitions. He has also served as a Business Product Manager for AdWords. He has a degree in Astrophysics from UC Berkeley.

Siraj Khaliq is the company’s Chief Technology Officer. Siraj worked at Google in multiple technical lead roles, from the company’s distributed computing infrastructure to the high-profile Google Book Search project and other offline content search initiatives. Siraj has an M.S. degree in Computer Science from Stanford University, and a B.A. (Hons.) in Computer Science from the University of Cambridge, England.

WeatherBill offers insurance policies that allow farmers to protect themselves from “losses caused by Mother Nature.”  There are “no claims to file, no adjustment needed—if bad weather happens, WeatherBill will send you a check automatically, within 10 days of the end of your policy period.”

I listened with interest to David Friedberg’s discussion of WeatherBill’s use of local data to assess and manage risk. He was a guest on Radio National’s Bush Telegraph.

I followed up with a visit to a post about WeatherBill’s use of Google Analytics.

WeatherBill has released news about recent investment in the company.

I am very interested in WeatherBill’s model for assessing and managing risk. I think it has some important resonance with discussions about home advantage in sport and more general discussions about performance in relation to ranking. The key appears to be rich local information in the context of a global system.

Photo Credits

Farmland 1

Tasmania Landscape