I read with great interest Amelia Barber’s post that combined her loves of women’s cycling and data science (link). Since reading her post and writing a brief reply (link), I have been investigating some Union Cycliste Internationale (UCI) result data.
I found some women’s road race data available for download (link) and took the opportunity to download the results of the Liège-Bastogne-Liège race, held at the end of April 2019.
Like Amelia, I am interested in how I might use R with race data to make women’s road cycling more visible. For this race, my first visualisations report the riders’ chronological age, which is one of the column headings in the data set.
My first attempts include:
These are very basic visualisations of the data, but for me they are a start to a public conversation about how we share the available data. The UCI has five pages of women’s road race data going back to October 2018 (link). I sense that an important part of that conversation will be web scraping and the possibilities it affords.
I look forward to sharing my analyses on GitHub (link).
The FIFA live blog for each game records temperature and humidity.
After 20 games of the tournament, I thought I would explore how these data relate to ball-in-play time in each game.
The data and the R code I used are available on GitHub. This post is another example of my learning-out-loud approach to using R and RStudio.
Temperature and Humidity for each of the 20 games:
Humidity and Ball in Play Time:
Temperature and Ball in Play Time:
These ggplots were created with secondary data. As with all my World Cup posts, I am mindful that I have not investigated the validity and reliability of these data. I do make some basic assumptions about their face validity.