Data science and women’s cycling: a follow up

I read with great interest Amelia Barber’s post that combined her loves of women’s cycling and data science (link). Since reading her post and writing a brief reply (link), I have been investigating some Union Cycliste Internationale (UCI) result data.

I did find some women’s road race data available for download (link) and took the opportunity to download the results of the Liège-Bastogne-Liège race at the end of April 2019.

Like Amelia, I am interested in how I might use R with race data to make women’s road cycling more visible. For this race, I have produced a number of visualisations that report chronological age, which is one of the column headings in the data set.
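As a rough idea of what a first look at the age column might involve, here is a minimal sketch in base R. The data frame, its column names and the ages in it are hypothetical placeholders, not values from the UCI file, and this is not necessarily how my own plots were made.

```r
# Hypothetical placeholder data: these riders and ages are NOT from the UCI results.
results <- data.frame(
  rider = paste("Rider", 1:8),
  age   = c(22, 27, 31, 24, 29, 35, 26, 30),
  stringsAsFactors = FALSE
)

# Summarise the age column before plotting it.
median_age <- median(results$age)
age_range  <- range(results$age)

# A basic histogram of chronological age in five-year bins.
age_hist <- hist(
  results$age,
  breaks = seq(15, 45, by = 5),
  main   = "Chronological age of finishers (placeholder data)",
  xlab   = "Age (years)"
)
```

With the real download, `results` would instead come from reading the saved file, for example with `read.csv()`, and the column name would need to match the heading in that file.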

My first attempts include:

These are very basic visualisations of the data but they are for me a start in a public conversation about how we share the available data. The UCI has five pages of women’s road race data going back to October 2018 (link). I sense that an important conversation to have about these data will be web scraping and the possibilities afforded.

I look forward to sharing my analyses on GitHub (link).

Data science and women’s cycling

Amelia Barber has written a post that combines her loves of women’s cycling and data science (link).

Her post focuses on the demographics of the elite women’s road teams (46 teams registered with the UCI). For the post, Amelia scraped raw rider data (August 2019) from the Union Cycliste Internationale (UCI) website (link).

The code Amelia used for the post is shared by her in a GitHub repository (link). The analysis and plots were done in R and the interactive plots were made using Plotly.

I see Amelia’s work as a great example of open sharing and of a desire to make women’s performances much more public.

Some years ago, I was involved in a project to film the final stages of women’s road races. At the time, there was very little, if any, multi-camera coverage of women’s races and the aim of the project was to see if we could make finishes much more authentic in training and competition. This was in pre-drone days and we managed, with appropriate permissions, to fly remote model aircraft with video cameras to track the final stages of races and training.

The footage obtained started to transform performance and it led to many conversations across the sport about positioning, techniques and tactics.

I am looking forward to Amelia opening up these conversations too. I am keen also to see where her work in R, ggplot2 and Plotly will take her.

Photo Credit

Peloton (BBC Sport, Twitter)


Part Two of the post was published on 25 August 2019 (link)

Exploring kable

I have been looking at ways to present tables in R. I missed Neil Collins’ summary back in April (link) when he included reference to kable.

In the interim, I have been trying to use the gt package (link) but have struggled to format the tables produced.

kable has enabled me to produce tables and I note that it is “a very simple table generator. It is simple by design. It is not intended to replace any other R packages for making tables” (link). The basic table that heads this post is an example of a kable table.
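For anyone who has not tried it, a basic kable call looks something like the sketch below. The team names and numbers are hypothetical placeholders rather than rows from my csv file, and the code is only an illustration, not the code in my repository.

```r
library(knitr)

# Hypothetical placeholder data, NOT taken from the Women's World Cup csv file.
results <- data.frame(
  team   = c("Team A", "Team B", "Team C"),
  played = c(3, 3, 3),
  points = c(9, 4, 1),
  stringsAsFactors = FALSE
)

# kable() turns a data frame into a simple table.
# col.names sets the displayed headings; align gives one letter per column.
tbl <- kable(
  results,
  format    = "markdown",
  col.names = c("Team", "Played", "Points"),
  align     = "lrr"
)

print(tbl)
```

In an R Markdown document the `format` argument can usually be left out, as kable picks a format to suit the output type.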

The code I have used and the csv file from the Women’s World Cup are shared in a GitHub repository (link).

I did find Hao Zhu’s (2019) guide to kable very helpful (link) and followed his guidelines closely … all of which worked. I thought about column names too and used JM’s RProgramming Guide to renaming columns in R (link). In my example, I used JM’s guide to reduce 14 variables in my original data frame (df) to 4 in the shortened version (df1).
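The step from df to df1 can be sketched in base R as follows. The variable names and the four columns kept here are hypothetical, chosen only to show the idea, and this is one way of doing it rather than necessarily the approach in JM’s guide.

```r
# A stand-in for the original data frame: 14 variables with hypothetical names.
df <- as.data.frame(matrix(seq_len(28), nrow = 2, ncol = 14))
names(df) <- paste0("var", 1:14)

# Keep only the four columns of interest (hypothetical choice).
keep <- c("var1", "var3", "var5", "var7")
df1 <- df[, keep]

# Give the shortened data frame more readable column names (also hypothetical).
names(df1) <- c("Team", "Played", "Goals", "Points")

str(df1)
```

The shortened df1 can then be passed straight to kable, which uses the column names as table headings by default.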

I am looking forward to exploring more of the kable functionality. I do find its intuitive characteristics very encouraging.