#RWC2019: patterns after 29 games

29 games have been completed at #RWC2019 (link). I have continued with my idea of characterising the games played with a single number (link). This number is the median ratio of kicks and passes divided by lineouts and scrums. My hope is that this expresses the mobility of each game.

I am using data from the official World Rugby website (link) to curate my data for the tournament. I have noted that these numbers do change after the games have been played (particularly the number of passes). I have used the data as they stand twenty-four hours after the completion of each game as the record of that game. I am collecting the data in a Google Sheet (link) with a tab for each game played.

I have used ggplot to visualise the data, and I am using the data to help me improve my use of R. These visualisations use:

  • ggplot
  • geom_point
  • geom_vline
  • geom_smooth
  • labs
  • annotate
  • size
  • theme_minimal
  • a colour blind palette (link)
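As a rough sketch of how these pieces fit together, the block below builds one scatter plot from a small made-up data frame. The column names (Game, Ratio, Outlier), the values, and the two colours (taken from the widely used Okabe-Ito colour-blind-friendly palette) are assumptions for illustration. They are not the tournament data, and the palette is not necessarily the one linked above.

```r
library(ggplot2)

# Made-up values for illustration only; not the #RWC2019 data.
games <- data.frame(
  Game    = 1:8,
  Ratio   = c(2.1, 3.4, 0.8, 2.6, 4.3, 2.3, 1.9, 2.8),
  Outlier = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Two colours from the Okabe-Ito palette (an assumed choice of palette).
cb_colours <- c("FALSE" = "#0072B2", "TRUE" = "#D55E00")

p <- ggplot(games, aes(x = Game, y = Ratio, colour = Outlier)) +
  geom_point(size = 3) +                        # one point per game
  scale_colour_manual(values = cb_colours) +    # colour-blind-friendly colours
  annotate("text", x = 5, y = 4.5, label = "Outlier (> 4)") +
  labs(title = "Ratio by game", x = "Game", y = "Ratio") +
  theme_minimal()

# print(p) draws the plot in an interactive session.
```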

My visualisations of the 29 games, including identified outliers, are:

I have a Ratio for each game. The tournament median is 2.31 and is drawn as a black geom_hline at its default size and line type.
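A minimal sketch of a median line of this kind, assuming a data frame with one row per game and hypothetical columns Game and Ratio. The values are invented, so the median here is not the tournament's 2.31. Computing the median inside the code avoids typing the number by hand.

```r
library(ggplot2)

# Made-up ratios for illustration only; not the #RWC2019 data.
games <- data.frame(
  Game  = 1:8,
  Ratio = c(2.1, 3.4, 0.8, 2.6, 4.3, 2.3, 1.9, 2.8)
)

# Median of the per-game ratios
ratio_median <- median(games$Ratio)

p <- ggplot(games, aes(x = Game, y = Ratio)) +
  geom_point() +
  geom_hline(yintercept = ratio_median) +  # default line: solid, black
  theme_minimal()
```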

The Ratio is expressed with a geom_smooth function in order to see what the trends in the data look like. I have used the loess method with my small number of data points. The grey area expresses the confidence band for the smoothed line drawn with that method, and the confidence level is set by default at 95%. The band can be turned off with se = FALSE or set at a level you specify with the level argument.
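The three options described above can be sketched as follows. The data frame and its values are hypothetical, and the 0.99 level is just an example of a level other than the default.

```r
library(ggplot2)

# Made-up ratios for illustration only; not the #RWC2019 data.
games <- data.frame(
  Game  = 1:12,
  Ratio = c(2.1, 3.4, 1.8, 2.6, 4.2, 2.3, 0.9, 2.0, 2.8, 3.1, 1.2, 2.4)
)

base <- ggplot(games, aes(x = Game, y = Ratio)) +
  geom_point() +
  theme_minimal()

# Loess curve with the default 95% confidence band
p_default <- base + geom_smooth(method = "loess", formula = y ~ x)

# Same curve with the band switched off
p_no_band <- base + geom_smooth(method = "loess", formula = y ~ x, se = FALSE)

# Same curve with a wider, 99% band
p_wide <- base + geom_smooth(method = "loess", formula = y ~ x, level = 0.99)
```

With so few points the band is wide, which is a useful reminder not to read too much into the shape of the curve.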

Passes with a geom_hline set at the median number of passes (258):

Passes with a geom_smooth:

Kicks during the game, with a geom_hline set at the median of 58 kicks per game:

Kicks with geom_smooth:

Penalties and Free Kicks Conceded with geom_point with a geom_hline set at a median of 16 penalties and free kicks conceded per game:

Penalties and free kicks conceded presented with a geom_smooth function:

Lineouts and scrums have medians of 25 and 13 respectively per game:

Lineouts and scrums with a geom_smooth:

Photo Credit

Lineout Win (World Rugby)

Time at #FIFAWWC 2019

FIFA provided Match Facts for each of the games played at the 2019 Women’s World Cup (link). From these Facts it was possible to construct a time profile of the World Cup.

The data available suggest that the median ball in play time was 55 minutes. The median game time was 97 minutes. The ball was not in play for a median time of 43 minutes. Three of the games went to extra time (Norway v Australia, France v Brazil, Netherlands v Sweden).
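The arithmetic behind these three medians can be sketched in base R. The data frame, its column names and its values are invented for illustration and are not the FIFA Match Facts. Ball not in play is calculated game by game as game time minus ball in play time, and each median is then taken separately, which is why the median of time not in play need not equal the difference between the other two medians.

```r
# Made-up minutes for five games; not the FIFA data.
matches <- data.frame(
  match        = c("A v B", "C v D", "E v F", "G v H", "I v J"),
  game_time    = c(95, 97, 93, 125, 98),
  ball_in_play = c(52, 55, 60, 68, 50)
)

# Time when the ball was not in play, per game
matches$not_in_play <- matches$game_time - matches$ball_in_play

# Median of each time column
medians <- sapply(matches[c("game_time", "ball_in_play", "not_in_play")], median)

# Difference of medians, for comparison with the median of differences
difference_of_medians <- medians[["game_time"]] - medians[["ball_in_play"]]
```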

Ball in Play ranged from 41 minutes (Germany v Nigeria) to 73 minutes (Norway v Australia).

The geom_smooth profile of ball in play was:

I used a smoothing method to look at trends in the time data (link). The grey area visualises the confidence band around the smoothed line (a 95% confidence level by default). The confidence limits can be varied (link). In this example, I used Loess smoothing as I had fewer than 1000 data points.

The FIFA data made it possible to calculate ball not in play time.

In the Netherlands v Sweden game there were 63 minutes of time when the ball was not in play. This was an extra time game.

The total game length varied from 93 minutes (Japan v Scotland, Jamaica v Australia) to 135 minutes (France v Brazil). The three extra time games are indicated in red.
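One way to pick out extra time games in red is to map a logical column to colour. This is a sketch under assumptions: the data frame, the column names (game, game_time, extra_time) and the values are hypothetical, and it is not necessarily how the plot for this post was drawn.

```r
library(ggplot2)

# Made-up game lengths in minutes; not the FIFA data.
matches <- data.frame(
  game       = 1:6,
  game_time  = c(95, 97, 93, 135, 98, 128),
  extra_time = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

p <- ggplot(matches, aes(x = game, y = game_time, colour = extra_time)) +
  geom_point(size = 3) +
  # black for normal-time games, red for extra time games
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "red"),
                      name = "Extra time") +
  labs(x = "Game", y = "Game length (minutes)") +
  theme_minimal()
```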

The FIFA data were particularly helpful in constructing time profiles of the games. Data were presented for each half. Extra time data were included in the match report. As well as describing what occurred, these data raise important questions about ball in play time.

Photo Credit

France v Brazil (FIFA Live Blog)

#RWC2019: using geom_hline

In my investigation of single numbers to characterise performance in #RWC2019, I have been using ggplot to visualise the data from World Rugby (link).

In the visualisation below, I was keen to look at outliers <1 and >4. I found four games. A fifth game, Australia v Fiji, is a 1.20 game.

I used geom_hline() with a yintercept to draw lines at 1 and 4, specifying a range for the lines, their colour and their size (link):

geom_hline(yintercept = range(1, 4), color = 'coral', size = 1)

I included the original geom_hline() for the median ratio. My code for this was:

geom_hline(yintercept = 2.16)

I checked the accuracy of this median with median(df$Ratio) (the result was a median of 2.155, which I rounded up to 2.16).
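Putting the two geom_hline() calls together, a sketch with made-up data might look like the block below. range(1, 4) simply returns the smallest and largest of its arguments, so it is equivalent to c(1, 4). The vector form is used here because it extends to more than two lines. The data frame and its values are hypothetical, and the sketch assumes ggplot2 3.4.0 or later, where linewidth replaces size for lines.

```r
library(ggplot2)

# Made-up ratios for illustration only; not the #RWC2019 data.
games <- data.frame(
  Game  = 1:8,
  Ratio = c(2.1, 3.4, 0.8, 2.6, 4.3, 2.3, 1.9, 2.8)
)

# range() returns c(min, max), so this is the same as c(1, 4)
bounds <- range(1, 4)

p <- ggplot(games, aes(x = Game, y = Ratio)) +
  geom_point() +
  geom_hline(yintercept = median(games$Ratio)) +                    # median line
  geom_hline(yintercept = bounds, colour = "coral", linewidth = 1) +  # outlier bounds
  theme_minimal()

# Games outside the two bounds
outliers <- subset(games, Ratio < 1 | Ratio > 4)
```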

Photo Credit

Reaching to score (World Rugby)