Data science and women’s cycling

Amelia Barber has written a post that combines her loves of women’s cycling and data science (link).

Her post focuses on the demographics of the elite women’s road teams (46 teams registered with the UCI). For the post, Amelia scraped raw rider data (August 2019) from the Union Cycliste Internationale (UCI) website (link).

Amelia has shared the code she used for the post in a GitHub repository (link). The analysis and plots were done in R and the interactive plots were made using Plotly.

I see Amelia’s work as a great example of open sharing and of a desire to make women’s performances much more public.

Some years ago, I was involved in a project to film the final stages of women’s road races. At the time, there was very little, if any, multi-camera coverage of women’s races, and the aim of the project was to see if we could make finishes much more authentic in training and competition. This was in pre-drone days, and we managed, with appropriate permissions, to fly remote model aircraft fitted with video cameras to track the final stages of races and training.

The footage obtained started to transform performance and it led to many conversations across the sport about positioning, techniques and tactics.

I am looking forward to Amelia opening up these conversations too. I am keen also to see where her work in R, ggplot2 and Plotly will take her.

Photo Credit

Peloton (BBC Sport, Twitter)


Part Two of the post was published on 25 August 2019 (link)

Exploring kable

I have been looking at ways to present tables in R. I missed Neil Collins’ summary back in April (link), in which he included a reference to kable.

In the interim, I have been trying to use the gt package (link) but have struggled to format the tables produced.

kable has enabled me to produce tables and I note that it is “a very simple table generator. It is simple by design. It is not intended to replace any other R packages for making tables” (link). The basic table that heads this post is an example of a kable table.

The code I have used and the csv file from the Women’s World Cup are shared in a GitHub repository (link).

I found Hao Zhu’s (2019) guide to kable very helpful (link) and followed his guidelines closely … all of which worked. I thought about column names too and used JM’s RProgramming Guide to renaming columns in R (link). In my example, I used JM’s guide to reduce the 14 variables in my original data frame (df) to 4 in the shortened version (df1).
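As a rough illustration of that workflow, here is a minimal sketch of trimming a data frame, renaming its columns and passing it to kable. The data frame, its column names and its values are invented for this example and are not the Women’s World Cup data in my repository. I have used base R for the subsetting and renaming, which may differ from the approach in the guides linked above.

```r
# A minimal sketch: shorten a data frame, rename columns, print with kable.
# All column names and values are hypothetical, for illustration only.
library(knitr)

df <- data.frame(
  team_name      = c("Team A", "Team B", "Team C"),
  matches_played = c(3, 3, 3),
  goals_for      = c(5, 2, 4),
  goals_against  = c(1, 3, 4),
  yellow_cards   = c(2, 1, 0),
  corners        = c(12, 9, 15),
  stringsAsFactors = FALSE
)

# Keep only the four columns wanted in the table.
df1 <- df[, c("team_name", "matches_played", "goals_for", "goals_against")]

# Give the remaining columns reader-friendly names.
names(df1) <- c("Team", "Played", "Goals For", "Goals Against")

# Build the table; kable returns the table as lines of text.
tbl <- kable(df1, format = "markdown", align = "lrrr")
print(tbl)
```

kable also has a col.names argument, so the display names could be supplied there instead of renaming the columns in df1 itself.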

I am looking forward to exploring more of the kable functionality. I do find its intuitive characteristics very encouraging.


Earlier today, Mara Averick shared news of Claus Wilke’s cowplot package (link). Claus has an introduction to the new features of cowplot (link) and a more detailed set of all vignettes (link).

Whenever Mara shares anything that might add to work in sport, I try to work through the package or process she shares.

I have done this with cowplot today and have created a very basic GitHub repository to support this exploration of the cowplot package (link). I have used a very brief example from the 2019 NRL season. I use Elo rating points as the basis of what I do here (link).

My csv file has four data points for one team: the highest Elo points recorded thus far; the median for this team this season; the lowest points recorded; and the current week (week 19).

My cowplots included the following worked examples:

The inclusion of a cowplot logo:

Changed Width of Side-By-Side Plots:

Two plots with a shared title:
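For anyone wanting to try the second and third examples, here is a minimal sketch of side-by-side plots with changed widths and a shared title. The Elo values, labels and title text are made up for illustration and are not the figures in my csv file. The pattern of stacking a title row above a plot row is my reading of the cowplot vignettes, so treat it as one possible approach.

```r
# A minimal sketch of two cowplot layouts, using hypothetical Elo values.
library(ggplot2)
library(cowplot)

elo <- data.frame(
  measure = c("Highest", "Median", "Lowest", "Week 19"),
  points  = c(1560, 1510, 1470, 1525)
)
# Keep the measures in the order given rather than alphabetical order.
elo$measure <- factor(elo$measure, levels = elo$measure)

# Two simple views of the same four data points.
p1 <- ggplot(elo, aes(x = measure, y = points)) +
  geom_point(size = 3) +
  theme_cowplot()

p2 <- ggplot(elo, aes(x = measure, y = points)) +
  geom_col() +
  coord_cartesian(ylim = c(1400, 1600)) +
  theme_cowplot()

# Side-by-side plots, with the second plot twice as wide as the first.
plot_row <- plot_grid(p1, p2, labels = c("A", "B"), rel_widths = c(1, 2))

# A shared title drawn on an empty canvas.
title <- ggdraw() +
  draw_label("Elo points for one team (illustrative values)", fontface = "bold")

# Stack the title above the plot row, giving the title a thin strip.
combined <- plot_grid(title, plot_row, ncol = 1, rel_heights = c(0.1, 1))
print(combined)
```

The rel_widths and rel_heights arguments are relative proportions, so c(1, 2) gives the second plot two thirds of the row.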

I find it really helpful to work through examples in RStudio whenever possible. I am looking forward to exploring more of cowplot and Claus’s extremely helpful vignettes. I do see a lot of use of this package in my own work.