The first round of group games has concluded at the 2018 FIFA World Cup.
38 goals were scored in these 16 games.
- Winning teams scored 24 goals.
- Losing teams scored 4 goals.
- There were 10 goals scored in drawn games.
The team that scored first did not lose in these 16 games.
The time intervals when these goals were scored were:
I have been following a naive Bayes approach to the probability of scoring in these time intervals.
| Time Interval (Minutes) | Prior to | Posterior after |
My prior probabilities for scoring by half (based on 2010 and 2014 World Cup performances) were:
- Score in first half: 0.40
- Score in second half: 0.60
My posterior probabilities after Round 1 are 0.36 (first half) and 0.64 (second half).
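The post does not say exactly how these posteriors were computed. One common way to update a two-outcome probability is a Beta-Binomial update, and a minimal sketch of that is below. The function name, the prior weight of 100 pseudo-goals, and the 12/26 split of the 38 goals between halves are all illustrative assumptions, not the figures behind 0.36 and 0.64.

```python
def update_half_probabilities(prior_first, prior_weight, goals_first, goals_second):
    """Beta-Binomial update for the share of goals scored in the first half.

    prior_first  -- prior probability that a goal falls in the first half
    prior_weight -- how many pseudo-goals the prior is worth (an assumption)
    goals_first, goals_second -- observed goal counts in each half
    """
    alpha = prior_first * prior_weight + goals_first
    beta = (1.0 - prior_first) * prior_weight + goals_second
    total = alpha + beta
    return alpha / total, beta / total


# Hypothetical inputs: the 0.40 prior is from the text above; the prior weight
# and the split of the 38 goals between halves are made up for illustration.
first, second = update_half_probabilities(0.40, 100, 12, 26)
print(round(first, 2), round(second, 2))
```

A larger prior weight keeps the posterior closer to the 2010 and 2014 figures; a smaller one lets the Round 1 goals dominate.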
There have been 16 different referees in Round 1. Szymon Marciniak (Poland) is the only referee not to have given a yellow card in a game (Argentina v Iceland). There has been one red card (Carlos Sanchez, Colombia).
The median number of fouls per game in Round 1 was 28. The range of fouls was from 18 to 43.
During the first round of games, the median ball in play time has been 56 minutes (a median of 58% of game time). The median length of games has been 96 minutes.
Tracking data suggest that three players covered more than 12,000 metres in a Round 1 game: Aleksandr Golovin (12,706); Christian Eriksen (12,262); Iury Gazinsky (12,240).
Japan v Colombia (The Japan Times, Twitter)
I am continuing to explore RStudio as a way to visualise data. I thought I would look at ball in play time in the first 14 games played in the 2018 World Cup.
My knowledge is very basic but finding packages like ggrepel makes the learning even more enjoyable.
“ggrepel provides geoms for ggplot2 to repel overlapping text labels:
Text labels repel away from each other, away from data points, and away from edges of the plotting area.”
The first plot above uses this code and includes
The second option:
Uses this code with
I am looking forward to more learning opportunities during the World Cup.
In May 2018, FIFA announced that Electronic Performance and Tracking Systems that comprise “two optical tracking cameras located on the media tribune” will track the positional data of players and ball at the 2018 World Cup.
These data, real-time positional data and video, are offered live at World Cup games on two devices: “one for the team analyst observing the match from the media tribune, another for the coaching team at the bench”.
Post-game, the positional data are made available on the FIFA World Cup website for secondary data analysis.
I have started to compile these data in a GitHub repository.
An example of the data is this matrix from the opening game of the tournament for Russian players:
The data available are:
- Player squad number
- Player name
- Distance covered in metres (total; when team in possession; when team not in possession)
- Percentage of time spent (opposition half; attacking third; penalty area)
- Number of sprints
- Top speed
- Percentage of time in activity zones 1 to 5
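For secondary analysis, one row of these data could be held in a small record type. The sketch below is only one possible layout: the class name, the field names and the `most_distance` helper are hypothetical, and the unit of top speed is left open because the list above does not state it. The example rows use only the total distances reported earlier in this post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PlayerTracking:
    """One player's post-game tracking row (field names are hypothetical)."""
    name: str
    distance_total_m: float
    squad_number: Optional[int] = None
    distance_in_possession_m: Optional[float] = None
    distance_not_in_possession_m: Optional[float] = None
    pct_opposition_half: Optional[float] = None
    pct_attacking_third: Optional[float] = None
    pct_penalty_area: Optional[float] = None
    sprints: Optional[int] = None
    top_speed: Optional[float] = None  # unit is not stated in the list above
    zone_pct: Optional[Tuple[float, float, float, float, float]] = None


def most_distance(players):
    """Return the player with the largest total distance covered."""
    return max(players, key=lambda p: p.distance_total_m)


# Only the totals reported in this post are filled in; other fields stay None.
players = [
    PlayerTracking("Aleksandr Golovin", 12706),
    PlayerTracking("Christian Eriksen", 12262),
    PlayerTracking("Iury Gazinsky", 12240),
]
top = most_distance(players)
print(top.name, top.distance_total_m)
```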
The activity zones are defined as:
- Zone 1: 0-7 km/h
- Zone 2: 7-15 km/h
- Zone 3: 15-20 km/h
- Zone 4: 20-25 km/h
- Zone 5: >25 km/h
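The zone boundaries above can be turned into a simple lookup. This is a minimal sketch, not FIFA's implementation: the names are hypothetical, and because the published ranges share their end points (7, 15, 20 and 25 km/h), it assumes that a speed exactly on a boundary belongs to the lower zone, which keeps Zone 5 strictly above 25 km/h.

```python
from bisect import bisect_left

# Upper bounds (km/h) of zones 1 to 4; anything faster falls in zone 5.
ZONE_UPPER_BOUNDS_KMH = [7, 15, 20, 25]


def activity_zone(speed_kmh):
    """Return the activity zone (1 to 5) for a speed in km/h.

    Assumption: a speed exactly on a boundary (e.g. 7 km/h) is placed in the
    lower zone, so zone 5 only starts above 25 km/h.
    """
    if speed_kmh < 0:
        raise ValueError("speed must be non-negative")
    return bisect_left(ZONE_UPPER_BOUNDS_KMH, speed_kmh) + 1


# Illustrative speeds only, not values taken from the tracking data.
zones = [activity_zone(s) for s in (3.0, 7.0, 12.5, 18.0, 25.0, 31.2)]
print(zones)
```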
After 14 games, Aleksandr Golovin’s 12,706 metres traversed remains the greatest distance recorded in a game by a single player.
The list of players who have covered most distance in metres per team in the games (with a link to the data) is:
A note about traversing
I am keen to connect 2018 technology with 1930s attempts to measure distances in sport contexts. The pioneers described movements as ‘traversing’ and provided distance estimates.
In his doctoral thesis, Lloyd Messersmith (1942:2) shared his data from basketball collected with a measuring device “which could be used in determining distances traversed” and that provided information about “distances traversed on offense and defense, and the effect of position played on distance traversed”.
FIFA World Cup (IIP Photo Archive, CC BY-SA 2.0)