#FIFAWWC: Some visualisations from the 2019 data

I have continued to look at the data from the 2019 FIFA Women’s World Cup.

I have been using the data to explore the potential of ggplot2 as a visualisation tool. It has also helped me look at generalised linear models fitted using glm in R.

At the moment, I am using the glm function as a descriptive tool. I am going to follow David Little’s posts (link) as I move towards developing models of performance.
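As an illustration of that descriptive use, here is a minimal sketch of fitting a Poisson model with glm. The data frame matches and its columns fouls and temperature are hypothetical stand-ins for the tournament data, not the actual FIFA file.

```r
# A minimal sketch of using glm() descriptively, not predictively.
# `matches`, `fouls` and `temperature` are hypothetical stand-ins
# for columns in the tournament data.
matches <- data.frame(
  fouls       = c(18, 22, 25, 19, 30, 16),
  temperature = c(18, 21, 24, 22, 27, 19)
)

# A Poisson family is a natural starting point for count data such as fouls.
fit <- glm(fouls ~ temperature, family = poisson(link = "log"), data = matches)

# Read the coefficients as a description of this sample, not a forecast.
summary(fit)
```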

I have a GitHub repository for my FIFA data (link).

My most recent visualisations are:

Ball in Play and Ball Not In Play (in Minutes). I am interested in the dwell time in games, namely when the ball is not in play.

Total Game Time in Minutes. I am keen to see how long games last. In this tournament, three games went to extra time (link). The median total game time for this tournament was 97 minutes.

Ambient weather: temperature and fouls awarded. The FIFA Match Facts contain weather information. This gives an opportunity to explore some ambient data. The median temperature for this tournament was 22 degrees centigrade. The median number of fouls awarded was 20.

Ambient weather: humidity and fouls awarded. The FIFA data include a recording of humidity levels at the start of each game. The median humidity was 60%. The median number of fouls awarded was 20.

I have a particular interest in referee behaviour. In this visualisation I include two outliers: the longest and shortest ball in play time. For this tournament the median ball in play time was 55 minutes.
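These charts were built with ggplot2. The sketch below shows the general pattern for the temperature-and-fouls plot; the matches data frame (one row per game) and its columns are hypothetical stand-ins for the FIFA Match Facts data.

```r
library(ggplot2)

# A sketch of the temperature-and-fouls plot. The `matches` data frame
# (one row per game) and its columns are hypothetical.
matches <- data.frame(
  temperature = c(18, 21, 24, 22, 27, 19),
  fouls       = c(18, 22, 25, 19, 30, 16)
)

ggplot(matches, aes(x = temperature, y = fouls)) +
  geom_point() +
  labs(
    x     = "Temperature at kick-off (degrees Celsius)",
    y     = "Fouls awarded",
    title = "Ambient temperature and fouls awarded"
  )
```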

Using Shiny

Mitch Mooney (link) has created a Shiny application for netball. He has aggregated and curated 12,500 data points from publicly available sources.

Shiny is an R package that makes it possible to build interactive web apps straight from R.
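A minimal sketch of what such an app looks like is below. It assumes a hypothetical matches data frame; it is not Mitch's implementation, just the basic ui/server pattern Shiny uses.

```r
library(shiny)

# A minimal Shiny app: choose a team, see its rows in a table.
# The `matches` data frame and its columns are hypothetical.
matches <- data.frame(
  team  = c("USA", "Netherlands", "England", "Sweden"),
  fouls = c(12, 15, 10, 14)
)

ui <- fluidPage(
  selectInput("team", "Choose a team", choices = unique(matches$team)),
  tableOutput("team_summary")
)

server <- function(input, output) {
  output$team_summary <- renderTable({
    matches[matches$team == input$team, ]
  })
}

shinyApp(ui = ui, server = server)
```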

Mitch’s Shiny application is a remarkable resource for netball and it provides us with an important example of how to collect and share data. I see it (link) as a great way to support user interaction and inquiry. It is, for me, a powerful exercise in reader-receptivity.

There are thirty-one teams in Mitch’s database going back to 2013.

I share Mitch’s interest in Shiny as a way of making data public and encouraging reflection on those data. Many years ago, I was introduced to Wolfgang Iser (1991) and reader-receptivity criticism. Wolfgang suggested:

By putting the response-inviting structures of literary text under scrutiny, a theory of aesthetic response provides guidelines for elucidating the interaction between text and reader.

He adds:

If a literary text does something to its readers, it also simultaneously reveals something about them. Thus literature turns into a divining rod, locating our dispositions, desires, inclinations, and eventually our overall make-up.

It is this divining rod of dispositions that attracts me to Shiny and the sharing in which Mitch has engaged.

I have looked at Shiny for some time as a way to share data. Recently, I looked at goalkeeper heights at the FIFA Women’s World Cup in France (link). I have also looked at the esquisse package to share data (link).

My interest in Shiny was stimulated by the discovery of the New Zealand Tourism Dashboard (link), “a one-stop shop for all information about tourism”. The dashboard brings together a range of tourism data produced by the Ministry of Business, Innovation and Employment and Statistics New Zealand into an easy-to-use tool. The information is presented using dynamic graphs and data tables.

New Zealand government departments maintain fifteen web applications built with RStudio’s Shiny framework. Their main purpose is to make public data more available and accessible for non-specialist users (link).

I see Mitch’s contribution to this sharing as very important and I am delighted he has shared his link to the netball data.

Exploring dplyr

I have been continuing my trial and improvement work with the tidyverse, “an opinionated collection of R packages designed for data science” (link).

Today, I have been working through a dplyr vignette (link). I have been mindful for some time that this part of my R use needed significant improvement.

The vignette is really helpful; it guided me through some fundamental procedures I should have learned much earlier in my tidyverse work with data frames and tibbles (link).

The vignette points out that when working with data you must:

  • Figure out what you want to do.
  • Describe those tasks in the form of a computer program.
  • Execute the program.

The dplyr package:

  • Constrains your options, helping you think about data manipulation challenges.
  • Provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to translate your thoughts into code.
  • Uses efficient backends.

I have created a GitHub repository (link) to share this example. I have attached the csv file I used for the exercise. It is a file from the 2019 FIFA Women’s World Cup in France (link).

I enjoyed working through each of the basic verbs of data manipulation (see the sketch after this list):

  • filter(): select cases based on their values.
  • arrange(): reorder the cases.
  • select() and rename(): select variables based on their names.
  • mutate() and transmute(): add new variables that are functions of existing variables.
  • summarise(): condense multiple values to a single value.
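
A short sketch of these verbs in action, using a small hypothetical tibble in place of the attached csv; the column names team, fouls and ball_in_play are illustrative only.

```r
library(dplyr)

# A hypothetical tibble standing in for the attached csv;
# the columns team, fouls and ball_in_play are illustrative only.
matches <- tibble(
  team         = c("USA", "Netherlands", "England", "Sweden"),
  fouls        = c(12, 15, 10, 14),
  ball_in_play = c(58, 52, 61, 55)
)

filter(matches, fouls > 11)                             # select cases by value
arrange(matches, desc(ball_in_play))                    # reorder cases
select(matches, team, fouls)                            # pick variables by name
mutate(matches, fouls_per_min = fouls / ball_in_play)   # add a new variable
summarise(matches, median_fouls = median(fouls))        # condense to one value
```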

The syntax and function of all these verbs are very similar in dplyr (a chained example follows this list):

  • The first argument is a data frame.
  • The subsequent arguments describe what to do with the data frame. You can refer to columns in the data frame directly without using $.
  • The result is a new data frame.
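
That shared design is what makes chaining the verbs with the pipe feel natural. A sketch, using the same hypothetical tibble as above:

```r
library(dplyr)

# Every verb takes a data frame first, refers to columns directly (no `$`)
# and returns a new data frame, so the steps chain naturally with the pipe.
# `matches` is the same hypothetical tibble as in the previous sketch.
matches <- tibble(
  team         = c("USA", "Netherlands", "England", "Sweden"),
  fouls        = c(12, 15, 10, 14),
  ball_in_play = c(58, 52, 61, 55)
)

matches %>%
  filter(ball_in_play > 54) %>%
  mutate(fouls_per_min = fouls / ball_in_play) %>%
  arrange(desc(fouls_per_min))
```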

Photo Credit

Opening Game (FIFA)

Final (FIFA)