In 2018, StatsBomb announced the release of free data on women’s football (link). The announcement included this observation:
Not only do we believe that analysis of the Women’s game deserves equal attention as the Men’s game, we know that by doing this better, we will improve the overall understanding of the game. We also want to encourage more Women to enter into Analytics, Technology and R&D …
The announcement included a reference to the StatsBomb Resource Centre (link). In 2019, StatsBomb provided open data from the Women’s World Cup (link) and indicated the importance of the use of R in deciphering these data (link).
We’d like this to be as approachable as possible for as many people as possible. We want you to feel comfortable jumping in and having a play around. With that in mind, we’ve put together a little primer for working with our data in the R programming language.
StatsBomb has created the StatsBombR package (link), which is shared as a repository on GitHub. The package requires a User Agreement (link) that notes “StatsBomb have made this data freely available and accessible to encourage and facilitate research and the shared analytical understanding of the game of Football. This is aimed to be a research tool, and is intended to be used as such”.
Information about the StatsBombR package can be found on GitHub (link). An example of the use of these data can be found in the FCrSTATS GitHub repository (link), including some getting started guidelines (link). Ryo Nakagawara has been using ggplots with some of these data (link) and has shared them as #TidyTuesday visualisations (link).
I was following some live data on the Australia v Brazil group game at the 2019 FIFA Women’s World Cup (link).
It was the first time I had noticed that a Live Win Probability was being used in this way and I decided to track it with a Google Sheet of my own (link).
By the time I had reached the graph, Brazil had scored a penalty in the 27th minute (Marta) and a second goal in the 38th minute (Cristiane). I am using an Elo measurement at this tournament to assess the probability of a team not losing if a higher rated team scores first. Based on these two goals, the probability of Brazil not losing was moving towards at least 0.8 out of 1.
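A minimal sketch of how a "not losing after scoring first" probability could be estimated from past results is shown below. It is an illustration under stated assumptions, not necessarily the calculation used for this tournament: the `MatchRecord` type, the function name, and the five sample records are all hypothetical and invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MatchRecord:
    """Hypothetical summary of one past match (not real data)."""
    first_scorer_higher_rated: bool  # did the higher Elo-rated team score first?
    first_scorer_lost: bool          # did the team that scored first go on to lose?


def prob_not_losing(records: Iterable[MatchRecord],
                    first_scorer_higher_rated: bool) -> Optional[float]:
    """Share of matching records in which the first scorer did not lose.

    Returns None when no record matches the requested condition.
    """
    relevant = [r for r in records
                if r.first_scorer_higher_rated == first_scorer_higher_rated]
    if not relevant:
        return None
    not_lost = sum(1 for r in relevant if not r.first_scorer_lost)
    return not_lost / len(relevant)


# Invented records, purely to show the arithmetic.
sample = [
    MatchRecord(first_scorer_higher_rated=True, first_scorer_lost=False),
    MatchRecord(first_scorer_higher_rated=True, first_scorer_lost=False),
    MatchRecord(first_scorer_higher_rated=True, first_scorer_lost=True),
    MatchRecord(first_scorer_higher_rated=True, first_scorer_lost=False),
    MatchRecord(first_scorer_higher_rated=False, first_scorer_lost=True),
]

estimate = prob_not_losing(sample, first_scorer_higher_rated=True)
```

With these invented records, three of the four matches in which the higher rated team scored first ended without that team losing, so `estimate` is 0.75. A real estimate would need actual match histories and Elo ratings taken before each game.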
As I watched the graph progress towards half time, Australia scored (Foord). I wondered how the probability graph might respond … and what both coaches might do at half time. I did think the late Australian goal in the half might add an interesting stage in the probability of game outcome given that Australia was the higher rated Elo team in this game.
At the start of the second half, the Brazilian coach replaced Marta with Ludmila (a loss of approximately 120 caps) and Formiga with Luana (Luana Bertolucci Paixão) (a loss of approximately 150 caps). Australia made no changes. Brazil conceded a long range goal in the 58th minute (Logarzo) and an own goal in the 66th minute (the goal was confirmed by the Video Assistant Referee (VAR, link)). Australia won the game 3v2.
I wondered how we might factor these dynamics into our visualisations, and how we might augment our machine intelligence with a reciprocal understanding of game playing in an activity that has its own momentum as well as general time series momentum. A paper by Michael Lopez and his colleagues (link) has set me off thinking about these dynamics.
Given the growth in the use of these visualisations, I do think these are very important conversations to be having now.