Distances traversed by the 2018 #WorldCup Finalists

I have been looking at the player tracking data shared on the official World Cup website.

Aware of the limitations and possible inaccuracies of the data, I have compiled some basic R visualisations for France and Croatia.

Going into the Final, the FIFA data suggest that in the tournament, the French team has traversed approximately 607,462 metres (a median distance per game of 102,193 metres) and Croatia 723,909 metres (a median distance per game of 118,220 metres).
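A minimal sketch of how tournament totals and per-game medians like these can be computed in base R. The data frame `team_games`, its column names and every value in it are hypothetical placeholders of my own, not the FIFA figures.

```r
# Hypothetical per-game team totals in metres (illustrative values only,
# NOT the FIFA tracking data).
team_games <- data.frame(
  Team     = c("A", "A", "A", "B", "B", "B"),
  Distance = c(100000, 104000, 101000, 110000, 120000, 135000),
  stringsAsFactors = FALSE
)

# Total distance per team across all games
totals <- aggregate(Distance ~ Team, data = team_games, FUN = sum)

# Median distance per game for each team
medians <- aggregate(Distance ~ Team, data = team_games, FUN = median)

# Combine into one table with columns Distance_total and Distance_median
summary_table <- merge(totals, medians, by = "Team",
                       suffixes = c("_total", "_median"))
print(summary_table)
```

The median is less sensitive than the mean to a single long game, which matters when one team has played extra time more often than the other.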

All three of Croatia’s knockout games have required extra time. In the semi-final against England, Croatia traversed their greatest distance in any game (143,286 metres). In contrast, the data for France’s semi-final indicate that they traversed 101,609 metres.

I have aggregated a record of each team’s tracking data on GitHub.

Three French players (Antoine Griezmann, N'Golo Kanté and Paul Pogba) have tracking data of over 10,000 metres in knockout games (Griezmann and Kanté twice each, Pogba once). No French player exceeded 10,000 metres in the game against Argentina.

In contrast, nine Croatian players have traversed over 10,000 metres. Marcelo Brozovic covered 16,339 metres in the game against England.
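Counts like these can be produced by filtering the player records on a distance threshold. The sketch below is an assumption-laden illustration: `Player`, `Opponent` and `Distance` match the variables used in my plotting code, while the `Team` column and all of the values are hypothetical.

```r
# Hypothetical player-game records (illustrative values only).
players <- data.frame(
  Team     = c("A", "A", "A", "B", "B", "B"),
  Player   = c("P1", "P2", "P1", "P3", "P4", "P3"),
  Opponent = c("X", "X", "Y", "X", "X", "Y"),
  Distance = c(10500, 9800, 10200, 12500, 10100, 9900),
  stringsAsFactors = FALSE
)

# Keep only the player-games above the 10,000 metre threshold
over <- subset(players, Distance > 10000)

# Number of distinct players per team with at least one game over the threshold
players_over <- tapply(over$Player, over$Team, function(p) length(unique(p)))

# Number of games over the threshold for each player
games_over <- table(over$Player)

print(players_over)
print(games_over)
```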

The distances traversed by players from Croatia and France are:

In terms of distance order:

Players traversing > 13000 metres per game:

Under 11,000 metres per game:

Photo Credit

Why Brozovic could be the key man for Croatia (The Times, 10 July 2018)

Postscript

My R code is very basic. I am using the World Cup data to develop my use of R.

I am using R version 3.5.1 in RStudio. For the plots, I used the ggplot2 and ggrepel packages. I followed guidelines provided by Garrick Aden-Buie (Grammar) and Kamil Slowikowski (ggrepel).

An example

# install.packages(c("ggplot2", "ggrepel"))  # run once if the packages are not installed
library(ggplot2)
library(ggrepel)

# df is my source file (28 observations of 5 variables)

ggplot(df, aes(Opponent, Distance, label = Player)) +
    # label only the players above 13,000 metres, nudged up to the top of the plot
    geom_text_repel(
        data = subset(df, Distance > 13000),
        nudge_y = 17000 - subset(df, Distance > 13000)$Distance,
        segment.size = 0.2,
        segment.color = "grey50",
        direction = "x"
    ) +
    # highlight the same players in red
    geom_point(color = ifelse(df$Distance > 13000, "red", "black")) +
    scale_y_continuous(limits = c(10000, 17000)) +
    labs(
        title = "World Cup Finalists 2018",
        subtitle = "Distances Traversed in the Knockout Phase > 13000 Metres",
        x = "Opponent",
        y = "Distances Traversed Per Game (Metres)"
    )

Scoring patterns in #WorldCup Knockout Phases 2010-2018

The Quarter Finals are about to take place at the 2018 FIFA World Cup.

I have been using a naive Bayes approach to anticipate when goals might be scored in the knockout phase of the 2018 tournament.

I chose some priors from the outcomes of the 2010 and 2014 tournament knockout phases. The posteriors for these were:

2010

2014

My priors for 2018 were:

The posteriors for the Round of 16 were:

A comparison of Priors and Posteriors after Round of 16:

It will be interesting to see if this relationship changes in the forthcoming games … particularly with regard to extra time.
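For readers who want to try the prior-to-posterior step themselves, here is one possible sketch in R. It is not necessarily the calculation behind my charts: it treats goals as falling into three time periods and uses a simple Dirichlet-multinomial update, in which prior pseudo-counts are added to observed counts. The period names and all of the counts are hypothetical.

```r
# Hypothetical prior pseudo-counts of goals by period (illustrative only,
# NOT the 2010 or 2014 tournament figures).
prior <- c(first_half = 8, second_half = 10, extra_time = 2)

# Hypothetical goals observed in a new round of games
observed <- c(first_half = 3, second_half = 6, extra_time = 1)

# Conjugate Dirichlet-multinomial update: add observed counts to prior counts
posterior <- prior + observed

# Expected share of goals in each period before and after the update
prior_mean     <- prior / sum(prior)
posterior_mean <- posterior / sum(posterior)

comparison <- round(rbind(prior = prior_mean, posterior = posterior_mean), 3)
print(comparison)
```

The posterior from one round can then serve as the prior for the next, which is how the comparison can be carried forward through the quarter-finals, semi-finals and final.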

Introduction to R and ggplot2 with Scottish Hill Race Data

A photograph of a hill race in Scotland on Kirk Craigs

I have been fascinated by the impact that 35 records of Scottish hill races from 1984 have had since Anthony Atkinson published them in 1986.

I have produced a Google Doc to use these data as an introduction to R and ggplot2.
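As a taste of what such an introduction might look like, here is a short sketch. It assumes the records are the `hills` data set that ships with the MASS package (35 races with `dist` in miles, `climb` in feet and `time` in minutes). The plot itself is my own illustration, not an extract from the Google Doc.

```r
library(ggplot2)

# Assumption: the 35 records are the 'hills' data set from the MASS package
data(hills, package = "MASS")

# Record time against race distance, with a straight-line fit for reference
p <- ggplot(hills, aes(x = dist, y = time)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Scottish Hill Races, 1984",
    x = "Distance (miles)",
    y = "Record time (minutes)"
  )
print(p)
```

Races that sit well away from the fitted line are a natural starting point for a discussion of outliers and of adding `climb` as a second explanatory variable.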

I thought this might act as a microcontent resource for the OERu course in Sport Informatics and Analytics, particularly in regard to the pattern recognition (Using R) and audiences and messages (Visualising data) themes.

Photo Credit

Kirk Craigs Christmas Cracker (Ross Branigan, CC BY-NC 2.0)