Amelia Barber has written a post that combines her loves of women’s cycling and data science (link).
Her post focuses on the demographics of the elite women’s road teams (46 teams registered with the Union Cycliste Internationale, UCI). For the post, Amelia scraped raw rider data (August 2019) from the UCI website (link).
Amelia has shared the code for the post in a GitHub repository (link). She did the analysis and plots in R and made the interactive plots with Plotly.
I see Amelia’s work as a great example of open sharing and of a desire to make women’s performances much more public.
Some years ago, I was involved in a project to film the final stages of women’s road races. At the time, there was very little, if any, multi-camera coverage of women’s races and the aim of the project was to see if we could make finishes much more authentic in training and competition. This was in pre-drone days and we managed, with appropriate permissions, to fly remote model aircraft, with video cameras, to track the final stages of races and training.
The footage obtained started to transform performance and it led to many conversations across the sport about positioning, techniques and tactics.
I am looking forward to Amelia opening up these conversations too. I am keen also to see where her work in R, ggplot2 and Plotly will take her.
More and more sports and teams are advertising for data scientists. This week, for example, the Golden State Warriors have advertised two data scientist positions: one for a senior data scientist, the other for a data scientist. I think they provide a fascinating example of the experiences expected of a new generation of analysts.
The online job descriptions for the Golden State Warriors include the following (shared here verbatim):
Senior Data Scientist
The Golden State Warriors are looking for a savvy and innovative Senior Data Scientist to join our growing Basketball Operations team. In this role, you will have the opportunity to utilize your skills to develop and define data requirements and recommend data structure for BI applications. To be successful, you will need a strong data engineering background and the ability to thrive in a dynamic environment. You will be responsible for data analysis and model development.
This is an exciting opportunity to share your expertise and knowledge within a growing sports and entertainment organization that values your initiative, creativity and drive for results. We are seeking an individual who thrives in an environment that is ever-changing and full of diversity and pride themselves on community! This is full-time position located in Oakland, CA relocating to San Francisco later this summer.
The required skills and experiences anticipated in this role (note the ‘deep expertise’ qualifier) included:
A Bachelor’s degree in Computer Science, Statistics or STEM related field; Master’s degree preferred
5+ years of experience with back end engineering, SAS, MSSQL and other markdown languages (JSON, HTML, XML)
Deep expertise with standard SQL
Deep expertise (+4 years) with Python and IPython with focus on pandas
Deep expertise (+4 years) with statistical modeling
Deep expertise with scikit-learn, TensorFlow or Keras
Deep expertise with data visualization matplotlib/seaborn + (Tableau, Looker)
Experience parallelized data pipelining
Experience with cloud technologies Google Cloud or AWS preferred
Proficiency with Git, Slack, Confluence
The advertisement also noted that pursuant to the San Francisco Fair Chance Ordinance, “we will consider for employment qualified applicants with arrest and conviction records”.
In the online application, the job description for the data scientist is similar to the senior role. It does include this invitation:
Do you strive to build something new? If so, then we want to talk to you! This is an exciting opportunity to share your expertise and knowledge within a growing sports and entertainment organization that values your initiative, creativity and drive for results. We are seeking an individual who thrives in an environment that is ever-changing and full of diversity and pride themselves on community! This is full-time position located in Oakland, CA relocating to San Francisco later this summer.
I have spent much of the day reflecting on George Box’s (1979) paper Robustness in the strategy of scientific model building (link).
I have made a rather basic attempt to rewrite Figure 12 from that paper:
This guide reminded me of a more recent visualisation that I have found profoundly helpful. It is from the tidyverse (link):
Both address the way we might model behaviour. I note the inclusion of communication in the tidyverse visualisation and see remarkable potential in combining George’s parsimony and criticism with the iterations evident in the tidyverse.
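One way I find it helpful to think about combining parsimony, criticism and iteration is as a small loop in code. What follows is my own toy sketch in Python. It is not a reproduction of George's Figure 12 or of anything in the tidyverse. The data are synthetic, and the names `fit_constant`, `fit_line`, `lag1_autocorrelation` and `build_model`, along with the 0.3 threshold, are assumptions made purely for illustration. The loop tries the simplest candidate model first, criticises it by checking whether its residuals still carry structure, and moves to the next candidate only if they do.

```python
import random
import statistics


def fit_constant(xs, ys):
    """One parameter: predict the mean of y everywhere."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Two parameters: ordinary least-squares straight line."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def lag1_autocorrelation(values):
    """Correlation between each residual and the next (values ordered by x)."""
    mean_v = statistics.fmean(values)
    centred = [v - mean_v for v in values]
    denom = sum(c * c for c in centred)
    numer = sum(a * b for a, b in zip(centred, centred[1:]))
    return numer / denom


def build_model(xs, ys, candidates, threshold=0.3):
    """Try candidates from simplest to most complex (parsimony) and keep
    the first whose residuals show little leftover structure (criticism)."""
    history = []
    for name, fit in candidates:
        predict = fit(xs, ys)
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        structure = abs(lag1_autocorrelation(residuals))
        accepted = structure < threshold  # illustrative cut-off, an assumption
        history.append((name, round(structure, 2), accepted))
        if accepted:
            return name, history
    return None, history


# Synthetic data (an assumption for illustration): a straight line plus noise.
rng = random.Random(42)
xs = [i * 10 / 399 for i in range(400)]  # already sorted by x
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

candidates = [("constant", fit_constant), ("straight line", fit_line)]
chosen, history = build_model(xs, ys, candidates)

print("chosen model:", chosen)
for step in history:
    print(step)
```

With these synthetic data, the constant model is rejected because its residuals still trend with x, and the straight line is kept. The communication step in the tidyverse visualisation corresponds here to nothing more than printing the history of what was tried and why.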