Data science and women’s cycling: a follow-up

I read with great interest Amelia Barber’s post that combined her loves of women’s cycling and data science (link). Since reading her post and writing a brief reply (link), I have been investigating some Union Cycliste Internationale (UCI) result data.

I did find some women’s road race data available for download (link) and took the opportunity to download the results of the Liège-Bastogne-Liège race at the end of April 2019.

Like Amelia, I am interested in how I might use R with race data to make women’s road cycling more visible. For this race, a number of my visualisations report chronological age, which is one of the column headings in the data set.
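A minimal sketch of that kind of age summary, in base R so that it runs without extra packages. The data frame, its column names and its values are placeholders of mine and not the downloaded race results; a real analysis would read the UCI file instead.

```r
# Hypothetical results table: column names and values are placeholders,
# not the real Liège-Bastogne-Liège results.
results <- data.frame(
  rank = 1:6,
  age  = c(24, 31, 27, 22, 35, 27)
)

# Five-number summary (plus mean) of chronological age.
age_summary <- summary(results$age)

# Count riders in five-year age bands; right = FALSE gives
# left-closed intervals such as [20,25).
age_bands <- table(
  cut(results$age, breaks = seq(20, 40, by = 5), right = FALSE)
)

print(age_summary)
print(age_bands)
```

Calling `hist(results$age)` or `barplot(age_bands)` on the same objects would give a first, very plain picture of the age distribution.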

My first attempts include:

These are very basic visualisations of the data, but for me they are a start in a public conversation about how we share the available data. The UCI has five pages of women’s road race data going back to October 2018 (link). I sense that an important conversation to have about these data will be about web scraping and the possibilities it affords.

I look forward to sharing my analyses on GitHub (link).

Women’s FIFA World Cup 2019: some game data

I have been using some basic R code to look at game data from the 2019 FIFA Women’s World Cup. I have tried out some of the colourblind-friendly palettes too (link).

Some of my data: FIFA provided a record in minutes of actual game time and total game time. I used the number of fouls awarded by referees as a background theme:

There were some temperature data and I used these to look at how much the ball was out of play (in minutes):

There were some humidity data:

The median time for ball in play at the 2019 FIFA Women’s World Cup was 55 minutes. I looked at three types of games: less than the median; greater than the median; and three extra time games:
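One way to sketch that three-way grouping in base R is shown here. The values are placeholders rather than FIFA's figures, and I am assuming the median is taken over all games and that extra time games are set aside first; the column names are hypothetical too.

```r
# Hypothetical ball-in-play times in minutes; placeholder values only.
games <- data.frame(
  game         = 1:7,
  ball_in_play = c(52, 58, 55, 61, 49, 57, 70),
  extra_time   = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Assumption: the median is computed over every game in the table.
median_bip <- median(games$ball_in_play)

# Extra time games form their own group; the rest are compared
# with the median. A game exactly on the median is labelled as such.
games$group <- ifelse(
  games$extra_time, "extra time",
  ifelse(games$ball_in_play < median_bip, "below median",
         ifelse(games$ball_in_play > median_bip, "above median",
                "at median"))
)

print(median_bip)
print(table(games$group))
```

The `group` column can then be used to facet or colour a plot of the games.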

My data for these visualisations are shared in a GitHub repository (link).

Temperature, humidity and ball in play time at 2018 FIFA #WorldCup after 20 games

There is a rich variety of data available on the 2018 FIFA World Cup website.

The FIFA live blog for each game records temperature and humidity.

After 20 games played in the tournament, I thought I would explore these data with regard to ball in play time in each game.

The data and the R code I used are available on GitHub. This post is another learning-out-loud approach to my use of R and RStudio.
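As a rough sketch of this kind of exploration (not the code in the repository), the relationship between weather and ball in play time can be described with correlations and a simple scatter plot. The data frame, its column names and its values are placeholders of mine, and base R is used in place of ggplot2 so that the example needs no extra packages.

```r
# Hypothetical game records; placeholder values, not FIFA live blog data.
wc <- data.frame(
  temperature  = c(18, 22, 25, 29, 31),  # degrees Celsius
  humidity     = c(70, 55, 60, 40, 35),  # percent
  ball_in_play = c(58, 56, 55, 52, 51)   # minutes
)

# Pearson correlation of each weather variable with ball in play time.
# The result is a 2 x 1 matrix with one row per weather variable.
correlations <- cor(wc[, c("temperature", "humidity")], wc$ball_in_play)
print(round(correlations, 2))

# Scatter plot of one weather variable against ball in play time,
# with a dashed least-squares line added for orientation.
plot_against_ball_in_play <- function(data, x) {
  plot(data[[x]], data$ball_in_play,
       xlab = x, ylab = "Ball in play (minutes)")
  abline(lm(data$ball_in_play ~ data[[x]]), lty = 2)
}
```

Calling `plot_against_ball_in_play(wc, "humidity")` draws the humidity plot. With only 20 games, any correlation is descriptive and should not be read as an effect of the weather.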

Temperature and Humidity for each of the 20 games:

Humidity and Ball in Play Time:

Temperature and Ball in Play Time:

These ggplots are created with secondary data. As with all my World Cup posts, I am mindful that I have not investigated the validity and reliability of these data. I do make some basic face validity assumptions about these data.

Photo Credit

_IGP5474 (Victor,  CC BY-SA 2.0)