Predicting the Outcome of the 2014 FIFA World Cup


Introduction

Thirty-two teams have qualified for the 2014 FIFA World Cup in Brazil.
The teams are allocated into the following Groups (with links to EA Infographics for each team). The number in brackets by each team is their FIFA Ranking on 8 May 2014 (a new ranking will be published on 6 June).
Group A – Brazil (4), Croatia (20), Mexico (19), Cameroon (50)
Group B – Spain (1), Netherlands (15), Chile (13), Australia (59)
Group C – Colombia (5), Greece (10), Ivory Coast (21), Japan (47)
Group D – Uruguay (6), Costa Rica (34), England (11), Italy (9)
Group E – Switzerland (8), Ecuador (28), France (16), Honduras (30)
Group F – Argentina (7), Bosnia Herzegovina (25), Iran (37), Nigeria (44)
Group G – Germany (2), Portugal (3), Ghana (38), United States (14)
Group H – Belgium (12), Algeria (25), Russia (18), Korea Republic (55)
The Elo Rankings of these teams are:
Group A – Brazil (1), Croatia (21), Mexico (16), Cameroon (56)
Group B – Spain (2), Netherlands (5), Chile (10), Australia (33)
Group C – Colombia (8), Greece (20), Ivory Coast (22), Japan (24)
Group D – Uruguay (9), Costa Rica (32), England (6), Italy (11)
Group E – Switzerland (16), Ecuador (19), France (12), Honduras (45)
Group F – Argentina (4), Bosnia Herzegovina (25), Iran (34), Nigeria (30)
Group G – Germany (3), Portugal (7), Ghana (38), United States (13)
Group H – Belgium (14), Algeria (53), Russia (15), Korea Republic (42)

Predictions

Goal Scorers

Back in December 2013, Opta identified ten contenders for the leading goalscorer at the World Cup:

  1. Neymar (Brazil)
  2. Lionel Messi (Argentina)
  3. Radamel Falcao (Colombia)
  4. Cristiano Ronaldo (Portugal)
  5. Luis Suarez (Uruguay)
  6. Fred (Brazil)
  7. Robin van Persie (Netherlands)
  8. Sergio Aguero (Argentina)
  9. Gonzalo Higuain (Argentina)
  10. Mesut Ozil (Germany)

Winning Team

Goldman Sachs has published a prediction of the winning World Cup team. Their approach to prediction includes:

  • A stochastic model of the outcomes for each of the 64 World Cup games.
  • A regression analysis of all full international games from 1960 (using goals scored).
  • Difference in Elo rankings between the two teams (“the most powerful variable in the model”).
  • A country-specific dummy variable relating to World Cup play.
  • Home advantage (country and continent).
  • A Monte Carlo simulation with 100,000 draws.
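
To make the ingredients in this list a little more concrete, here is a minimal sketch in Python of how an Elo difference might feed a stochastic model of a single game, repeated over many draws. It is not the Goldman Sachs model. I have assumed that goals follow a Poisson distribution, and the function names, the `base` and `slope` coefficients and the two ratings are all made up for illustration.

```python
import math
import random


def expected_goals(elo_diff, base=1.3, slope=0.0015):
    """Mean goals for a team with the given Elo advantage (made-up coefficients)."""
    return base * math.exp(slope * elo_diff)


def poisson_draw(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    goals, product = 0, rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def simulate_match(elo_a, elo_b, draws=100_000, seed=2014):
    """Estimate win/draw/loss shares for team A over many simulated games."""
    rng = random.Random(seed)
    mean_a = expected_goals(elo_a - elo_b)
    mean_b = expected_goals(elo_b - elo_a)
    tally = {"a_wins": 0, "draw": 0, "b_wins": 0}
    for _ in range(draws):
        goals_a = poisson_draw(rng, mean_a)
        goals_b = poisson_draw(rng, mean_b)
        if goals_a > goals_b:
            tally["a_wins"] += 1
        elif goals_a < goals_b:
            tally["b_wins"] += 1
        else:
            tally["draw"] += 1
    return {outcome: count / draws for outcome, count in tally.items()}


# Hypothetical ratings, not taken from any published table.
print(simulate_match(2100, 1850))
```

A full tournament model would chain calls like this through the group and knockout stages and add the dummy variables and home advantage terms, none of which this sketch attempts.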

The Goldman Sachs model “does not use any information on the quality of teams or individual players that is not reflected in a team’s track record”.  The approach is “purely statistical”.
This approach yields the following outcomes.
[Image: Exhibit 3]
Their knockout stage scenario is:
[Image: Goldman Sachs knockout stage scenario]
The model predicts that Brazil, Germany, Argentina and Spain will reach the semi-finals, and that Brazil will beat Argentina in the final. Goldman Sachs proposes to “update these predictions after every game of the tournament on our portal”.
Andrew Yuan shared his World Cup predictions on Visual.ly earlier this week. He investigated factors “that are measurable, available, and can be good indicators of a match outcome”.
He has provided a detailed account of his methodology on github. Andrew has looked at the outcomes of 13,337 FIFA official matches since 1994 involving the 2014 World Cup teams. He looked at each team’s relative position in the FIFA ranking tables, the location where the match took place (home, away or neutral venue) and the proportion of matches won. His model uses logistic regression.
He has Brazil as his probable winners. I liked his link with the FIFA Ranking.
[Image: Andrew Yuan’s predictions]
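
For readers who have not met logistic regression, this is a minimal sketch in Python of the kind of model Andrew describes: a win probability driven by ranking difference and venue. It is not his code (his methodology is on github). The features, the toy match records and the fitting settings are all my own assumptions, and the data are invented.

```python
import math


def sigmoid(z):
    """Map any real number onto a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Win probability; the last weight is the intercept."""
    return sigmoid(sum(w * v for w, v in zip(weights, features)) + weights[-1])


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, won in zip(rows, labels):
            error = predict(weights, features) - won
            for j, value in enumerate(features):
                gradient[j] += error * value
            gradient[-1] += error
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Invented records: (ranking advantage / 10, venue) where venue is
# +1 for home, 0 for neutral and -1 for away; label 1 means a win.
rows = [(3.0, 1), (2.0, 0), (1.5, -1), (0.5, 1), (-0.5, 0),
        (-1.0, 1), (-2.0, -1), (-3.0, 0), (1.0, 0), (-1.5, -1)]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

weights = fit_logistic(rows, labels)
print(predict(weights, (2.0, 1)))    # better ranked side playing at home
print(predict(weights, (-2.0, -1)))  # worse ranked side playing away
```

In practice a library routine fitted to the real match records would replace the hand-written fitting loop.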
David Dormagen (2014) has presented a very clear account of a simulation model he developed to predict the outcome of the 2014 FIFA World Cup. His approach allows for the “integration of rating systems and rules where either no clear formula for a probability other than a win or loss exists or where the historical data is not enough to derive such a formula”. In addition, “We are also able to combine the results from different rating methods with user-given weights without influencing other calculations, such as the calculation of the draw-probability, the adjustment of the win expectancy for home teams, or the calculation of the expected goals”.
After 100,000 iterations of his simulator, David identified the following outcome (% is the number of times a team had a certain rank in the tournament):
[Image: David Dormagen’s simulation results]
David points out that the simulation followed the official Tournament rules, so the resulting distribution for each team takes into account Tournament-specific attributes such as the possibility of meeting stronger opponents in most of the matches. He adds “there are four clear favourites for the first place: Brazil, Spain, Argentina and Germany”.
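
The idea of combining rating methods with user-given weights and then counting outcomes over many iterations can be sketched in a few lines of Python. This is my own illustration and not David's simulator. The team labels, the two rating tables, the weights and the four-team knockout format are all hypothetical, and draws, home advantage and expected goals are left out.

```python
import random
from collections import Counter

# Two hypothetical rating methods scoring four placeholder teams.
METHOD_ONE = {"Team A": 90, "Team B": 70, "Team C": 60, "Team D": 40}
METHOD_TWO = {"Team A": 80, "Team B": 75, "Team C": 50, "Team D": 45}


def combined_win_expectancy(expectancies, weights):
    """Weighted average of the win expectancies from several rating methods."""
    return sum(e * w for e, w in zip(expectancies, weights)) / sum(weights)


def win_prob(team_a, team_b, weights=(2.0, 1.0)):
    """Chance that team_a beats team_b under the weighted combination."""
    per_method = [method[team_a] / (method[team_a] + method[team_b])
                  for method in (METHOD_ONE, METHOD_TWO)]
    return combined_win_expectancy(per_method, weights)


def simulate_knockout(teams, iterations=100_000, seed=1):
    """Share of iterations in which each team wins a single-elimination bracket."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(iterations):
        remaining = list(teams)
        while len(remaining) > 1:
            # Adjacent teams meet; the winner goes through to the next round.
            remaining = [a if rng.random() < win_prob(a, b) else b
                         for a, b in zip(remaining[::2], remaining[1::2])]
        titles[remaining[0]] += 1
    return {team: titles[team] / iterations for team in teams}


print(simulate_knockout(["Team A", "Team B", "Team C", "Team D"]))
```

Changing the `weights` tuple shifts how much each rating method counts without touching the bracket logic, which is the separation David's description points to.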

Winning with or against the odds?

Three different models have identified Brazil as the probable winners of the 2014 World Cup. Two agree on the four semi-finalists. Andrew Yuan has Portugal in his four ahead of Argentina.
What will be fascinating is whether any team can outperform their ‘destiny’.
The Elo Ratings on 29 May 2014 were:
[Image: Elo Ratings, 29 May 2014]
From an analysis perspective any performance that overcomes these probabilities will be great to examine in detail.
Ray Stefani has provided some additional food for thought. He has looked at FIFA Rankings as his guide and has examined the performance of the top four ranked teams going into each Tournament since 1994.
[Image: Rank matrix]
Ray’s summary is “No top rated team won, presenting a challenge for Spain in 2014. The second-rated teams won twice, good news for Germany. The fourth-rated teams were second twice, with host nation Brazil currently being fourth. Brazil is particularly well placed to win, given a host-nation advantage”.
France was ranked 18th in the world when it won as hosts in 1998.
This is the betting market on Oddschecker on 29 May for teams to reach the World Cup Final:
[Image: Oddschecker odds, 29 May]
There is some fascinating ensemble convergence here. Can any team outside the five identified find the momentum to win the World Cup away from home?

Photo Credit

World Cup (Marcel Canfield, CC BY-NC 2.0)
Brazil World Cup 1982 (Oyosan, CC BY-ND 2.0)

Postscript

Simon Gleave alerted me to Infostrada’s predictions for the World Cup. Infostrada “is developing various methodologies to forecast major sporting tournaments by implementing various techniques. The methodology we have used to forecast the Football World Cup is based on the Elo rating system”.
This approach:

  • Is based on all historical match results from all teams.
  • Updates the rating after each match to show current strength.
  • Teams gain points when winning and lose points when losing.
  • More points are gained for beating stronger opponents.
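
The four points above describe the standard Elo update, which can be sketched in Python as follows. Infostrada do not state their parameters in the passage I have quoted, so the K-factor of 40 and the 400-point scale are assumptions borrowed from the textbook form of the system, and the function names are mine.

```python
def expected_score(rating, opponent_rating):
    """Expected result (between 0 and 1) for a team against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=40.0):
    """Return both new ratings; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: the underdog gains more for a win than the favourite does.
print(update_ratings(1500, 1700, 1.0))
print(update_ratings(1700, 1500, 1.0))
```

Because the winner's gain equals the loser's loss, the total number of rating points is unchanged by each match.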

The outcome of Infostrada’s analysis of the knockout phase is:
[Image: Infostrada knockout phase results]
Infostrada will be updating their model throughout the Tournament.
I note that the closest game in the final sixteen games is Uruguay v Colombia. The winner of this game meets Brazil. The four semi-finalists are Brazil, Germany, Spain and Argentina.

Other sources of information

Simon Gleave’s Twitter observations on Goldman Sachs (and links to other criticisms).
James Grayson posted this response to the Goldman Sachs model on 29 May.
Calcio Cassini has looked at Examining World Cup Conventional Wisdom on 28 May.
Steve Fenn’s Twitter observations.
 
 
