#RWC2019: patterns after 29 games

Twenty-nine games have been completed at #RWC2019 (link). I have continued with my idea of characterising each game with a single number (link): a ratio of kicks and passes divided by lineouts and scrums, summarised across the tournament by its median. My hope is that this expresses the mobility of each game.
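The post does not spell out the arithmetic of the Ratio. One reading that is consistent with the tournament medians quoted later in the post is passes divided by kicks, divided in turn by lineouts over scrums: (258 / 58) / (25 / 13) ≈ 2.31. The sketch below assumes that reading. It is my interpretation, not a confirmed formula, and a median of ratios is not in general the ratio of medians, so this is only a consistency check. `game_ratio` is a hypothetical helper and the per-game counts are made up.

```r
# Assumed reading of the Ratio (an interpretation, not the post's confirmed formula):
#   (passes / kicks) / (lineouts / scrums)
game_ratio <- function(passes, kicks, lineouts, scrums) {
  (passes / kicks) / (lineouts / scrums)
}

# Consistency check with the tournament medians quoted in the post
round(game_ratio(passes = 258, kicks = 58, lineouts = 25, scrums = 13), 2)
#> [1] 2.31

# Made-up counts for three games, for illustration only (not World Rugby data)
games <- data.frame(
  game     = 1:3,
  passes   = c(240, 300, 200),
  kicks    = c(60, 50, 80),
  lineouts = c(24, 30, 20),
  scrums   = c(12, 15, 10)
)
games$ratio <- with(games, game_ratio(passes, kicks, lineouts, scrums))
games$ratio
#> [1] 2.00 3.00 1.25

# The tournament figure is the median of the per-game ratios
median(games$ratio)
#> [1] 2
```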

I am using data from the official World Rugby website (link) to curate my data for the tournament. I have noticed that these numbers change after the games have been played (particularly the number of passes), so I use the data available twenty-four hours after the end of each game as the record of that game. I am collecting the data in a Google Sheet (link) with a tab for each game played.

I have used ggplot to visualise the data, and I am using the data to help me improve my use of R. These visualisations include:

  • ggplot
  • geom_point
  • geom_vline
  • geom_smooth
  • labs
  • annotate
  • size
  • theme_minimal
  • a colour blind palette (link)
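A minimal sketch that puts most of the elements in the list above into one plot. The ratios are hypothetical values, not tournament data, and the vertical line and "outlier" label mark game 5 purely for illustration. The palette is assumed to be the widely used colour-blind-friendly palette from the Cookbook for R; the post links to its own source.

```r
library(ggplot2)

# Colour-blind-friendly palette (assumed: the Cookbook for R version)
cb_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Hypothetical per-game ratios, for illustration only
games <- data.frame(
  game  = 1:10,
  ratio = c(2.1, 2.6, 1.8, 2.4, 3.9, 2.2, 2.0, 2.7, 2.3, 1.5)
)

p <- ggplot(games, aes(x = game, y = ratio)) +
  geom_point(colour = cb_palette[6], size = 3) +        # one point per game; size enlarges it
  geom_vline(xintercept = 5, colour = cb_palette[7]) +  # mark a game of interest
  annotate("text", x = 5.3, y = 3.9, label = "outlier",
           hjust = 0, colour = cb_palette[7]) +         # left-aligned label beside the line
  labs(title = "Ratio per game", x = "Game number", y = "Ratio") +
  theme_minimal()

print(p)
```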

My visualisations of the 29 games, including identified outliers, are:

I have a Ratio for each game. The tournament median is 2.31 and is shown as a black geom_hline with the default size and line type.
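A sketch of the median reference line, using hypothetical ratios for eight games. When only `yintercept` is set, geom_hline falls back on its defaults: a solid black line of the default width.

```r
library(ggplot2)

# Hypothetical per-game ratios for eight games (illustration only)
games <- data.frame(
  game  = 1:8,
  ratio = c(2.0, 2.5, 1.7, 3.1, 2.2, 2.4, 1.9, 2.8)
)

ratio_median <- median(games$ratio)  # about 2.3 for these values

p <- ggplot(games, aes(x = game, y = ratio)) +
  geom_point() +
  # Only yintercept is set, so colour, line type and width are geom_hline defaults
  geom_hline(yintercept = ratio_median) +
  labs(x = "Game number", y = "Ratio") +
  theme_minimal()

print(p)
```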

The Ratio is also plotted with geom_smooth to show the trend in the data. I have used the loess method, which suits my small number of data points. The grey area is the confidence band around the fitted line, drawn by default at the 95% level. The band can be turned off with se = FALSE or set to a level you specify:
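A sketch of the three geom_smooth settings described above, on simulated ratios for 29 games (not World Rugby data). `se = FALSE` removes the grey band, and `level` changes the confidence level from its default of 0.95; an 80% band is narrower than a 95% one around the same loess fit.

```r
library(ggplot2)

# Simulated ratios for 29 games (illustration only, not World Rugby data)
set.seed(2019)
games <- data.frame(game = 1:29)
games$ratio <- 2.3 + 0.3 * sin(games$game / 5) + rnorm(29, sd = 0.3)

p_base <- ggplot(games, aes(x = game, y = ratio)) +
  geom_point() +
  theme_minimal()

# Default band: 95% confidence interval around the loess fit (grey ribbon)
p_95 <- p_base + geom_smooth(method = "loess", formula = y ~ x)

# No band, fitted line only
p_line_only <- p_base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Band at a level you choose, here 80%
p_80 <- p_base + geom_smooth(method = "loess", formula = y ~ x, level = 0.80)

print(p_95)
```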

Passes with a geom_hline set at the median number of passes (258):

Passes with a geom_smooth:

Kicks per game with a geom_hline set at the median of 58 kicks:

Kicks with geom_smooth:

Penalties and free kicks conceded, shown with geom_point and a geom_hline set at the median of 16 penalties and free kicks conceded per game:

Penalties and free kicks conceded presented with a geom_smooth function:
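The passes, kicks and penalty plots follow the same pattern, so one option is a small function that draws any per-game column either with its median line or with a loess smooth. `plot_metric` is a hypothetical helper, not code from the post, and the counts are simulated around the medians quoted above rather than taken from World Rugby.

```r
library(ggplot2)

# Hypothetical helper (not from the post): one plot function for any per-game metric
plot_metric <- function(data, metric, smooth = FALSE) {
  p <- ggplot(data, aes(x = game, y = .data[[metric]])) +  # .data picks the column by name
    geom_point() +
    labs(x = "Game number", y = metric) +
    theme_minimal()
  if (smooth) {
    p + geom_smooth(method = "loess", formula = y ~ x)
  } else {
    p + geom_hline(yintercept = median(data[[metric]], na.rm = TRUE))
  }
}

# Simulated counts for illustration only (not World Rugby data)
set.seed(29)
games <- data.frame(
  game      = 1:29,
  passes    = rpois(29, 258),
  kicks     = rpois(29, 58),
  penalties = rpois(29, 16)
)

p_passes_median <- plot_metric(games, "passes")
p_kicks_smooth  <- plot_metric(games, "kicks", smooth = TRUE)
print(p_passes_median)
```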

Lineouts and scrums have medians of 25 and 13 respectively per game:

Lineouts and scrums with a geom_smooth:
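Lineouts and scrums can share one plot if the data are stacked into a long format with a `type` column. In this sketch each set piece gets its own colour, median line and loess trend. The counts are simulated around the medians of 25 and 13 quoted above, and the two colours are assumed to come from the colour-blind palette.

```r
library(ggplot2)

cb_palette <- c("#E69F00", "#56B4E9")  # two colours from the colour-blind palette (assumed)

# Simulated counts for 29 games (illustration only, not World Rugby data)
set.seed(13)
n <- 29
set_pieces <- rbind(
  data.frame(game = 1:n, type = "Lineouts", count = rpois(n, 25)),
  data.frame(game = 1:n, type = "Scrums",   count = rpois(n, 13))
)

# One median per set-piece type
medians <- aggregate(count ~ type, data = set_pieces, FUN = median)

p <- ggplot(set_pieces, aes(x = game, y = count, colour = type)) +
  geom_point() +
  # geom_hline does not inherit the plot's aesthetics, so map colour here too
  geom_hline(data = medians, aes(yintercept = count, colour = type)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  scale_colour_manual(values = cb_palette) +
  labs(x = "Game number", y = "Count per game", colour = NULL) +
  theme_minimal()

print(p)
```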

Photo Credit

Lineout Win (World Rugby)

Cédric’s introduction to R ggplot

Cédric Scherer (link) has written a delightful guide to ggplot. His post is titled A ggplot2 Tutorial for Beautiful Plotting in R (link).

I worked through his post by looking at some of the data from the FIFA Women’s World Cup in France (link) earlier this year.

My exploration of Cédric’s suggestions was definitely of the trial and improvement kind. I found his post one of the best introductory guides to ggplot I have come across, and it helped me build on my eclectic learning journey with this form of visualisation.

The csv file I used for this exploration is available on GitHub (link) and is titled RefereesWWC.csv. My brief R record is:

I looked at five examples using the official data in FIFA’s Match Facts (link). I was mindful that the median ball in play time during the World Cup was 55 minutes and the median total game time was 97 minutes.

1. A geom_point of the referees who officiated at the World Cup and the FIFA record of ball in play time in minutes.

2. A geom_line and geom_point development of visualisation 1 that connects the games of referees who officiated at more than one game at the World Cup.

3. A geom_density_ridges visualisation of ball in play time and total game time.

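4. A smoothed trend from a generalised additive model (geom_smooth with method = "gam"); with fewer than 1,000 data points, geom_smooth would otherwise default to loess. An outlier, USA v Thailand, is labelled with annotate.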

5. A developed geom_density_ridges plot (from the ggridges package) that uses the theme_economist backdrop (from the ggthemes package). It uses temperature data to look at goals scored in the tournament.

This visualisation uses annotation to label particular games: two 0 v 0 games, the 13 v 0 game and two games with six goals.
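A sketch of items 2 and 4 in the list above, with hypothetical referees and simulated ball-in-play minutes (not FIFA data). Only referees with more than one game get a connecting geom_line. The trend uses method = "gam" with y ~ s(x, bs = "cs"), the formula ggplot2 itself uses for its large-data default, and needs the mgcv package installed. The annotation text is a placeholder, not a real match.

```r
library(ggplot2)  # method = "gam" below also needs the mgcv package installed

# Hypothetical referees and ball-in-play minutes (illustration only, not FIFA data)
refs <- data.frame(
  referee      = c("Referee A", "Referee A", "Referee B",
                   "Referee C", "Referee C", "Referee C"),
  match        = c(3, 20, 7, 11, 30, 45),
  ball_in_play = c(52, 57, 60, 49, 55, 58)
)

# Item 2: connect only the referees who officiated at more than one game
games_per_ref <- table(refs$referee)
repeat_refs   <- names(games_per_ref)[games_per_ref > 1]

p_refs <- ggplot(refs, aes(x = match, y = ball_in_play, colour = referee)) +
  geom_point(size = 2) +
  geom_line(data = subset(refs, referee %in% repeat_refs),
            aes(group = referee)) +
  labs(x = "Match number", y = "Ball in play (minutes)") +
  theme_minimal()

# Item 4: a generalised additive model trend, with one point labelled by annotate
set.seed(2019)
wwc <- data.frame(match = 1:52)
wwc$ball_in_play <- 55 + 3 * sin(wwc$match / 8) + rnorm(52, sd = 3)
top <- which.max(wwc$ball_in_play)

p_gam <- ggplot(wwc, aes(x = match, y = ball_in_play)) +
  geom_point() +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs")) +
  annotate("text", x = wwc$match[top], y = wwc$ball_in_play[top] + 1,
           label = "Hypothetical outlier") +
  labs(x = "Match number", y = "Ball in play (minutes)") +
  theme_minimal()

print(p_refs)
print(p_gam)
```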

I do recommend Cédric’s post unreservedly. It is a great way for us to develop our use of ggplot as a visualisation tool. The basic code I used for my post is available on GitHub (link).