Trying visdat

I found a link to the visdat package on CRAN. In my ongoing learning journey in and with R, I am fascinated by the resources that are shared openly … in this case by Nicholas Tierney (link).

visdat “helps you visualise a dataframe and ‘get a look at the data’ by displaying the variable classes in a dataframe as a plot with vis_dat, and getting a brief look into missing data patterns using vis_miss.”

I tried it with a CSV file of data from the 2019 Asian Cup football tournament. The data include cards given by referees for fouls and other behaviours (including dissent). vis_dat confirmed that the incomplete columns are the red card and second yellow card variables. Not all cards are red cards or second yellow cards, and in my data set I use NA to indicate that a card has NOT been awarded.

An example of the first card given at the tournament:

My data are available as a Google Sheet (link).

The image at the start of this post was produced with vis_dat(). I used vis_miss() to visualise the missing data. The function “allows for missingness to be clustered and columns rearranged”.
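As a rough sketch of the workflow (not my exact script; the file name below is a placeholder for my Asian Cup CSV), the two calls look like this:

library(readr)
library(visdat)

# read the tournament card data (file name is a placeholder)
cards <- read_csv("2019-asian-cup-cards.csv")

# overview of variable classes and missing values
vis_dat(cards)

# missing data patterns, with rows clustered by missingness
vis_miss(cards, cluster = TRUE)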

I am delighted I found this package. I enjoyed reading Nicholas’s thank yous. This underscored for me what a remarkable community nourishes innovation in R.

Thank you to Ivan Hanigan who first commented this suggestion after I made a blog post about an initial prototype ggplot_missing, and Jenny Bryan, whose tweet got me thinking about vis_dat, and for her code contributions that removed a lot of errors.
Thank you to Hadley Wickham for suggesting the use of the internals of readr to make vis_guess work. Thank you to Miles McBain for his suggestions on how to improve vis_guess. This resulted in making it at least 2-3 times faster. Thanks to Carson Sievert for writing the code that combined plotly with visdat, and for Noam Ross for suggesting this in the first place. Thank you also to Earo Wang and Stuart Lee for their help in getting capturing expressions in vis_expect.
Finally thank you to rOpenSci and its amazing onboarding process, this process has made visdat a much better package, thanks to the editor Noam Ross (@noamross), and the reviewers Sean Hughes (@seaaan) and Mara Averick (@batpigandme).

In plain sight

The serendipity of finding Thomas Grisold and Alexander Kaiser’s (2017) paper (link), whilst looking for recent discussions about feedforward (link), prompted me to think about personal learning journeys.

Thomas and Alexander ask ‘How can unlearning initiate a deep learning process leading to the best version of our self?‘. The volume and quality of resources available to us make this a very important question.

Notwithstanding the debate about the concept of ‘unlearning’, two excellent links made me think about my ongoing quest to explore visualisation and a better version of my self as a data sharer and storyteller.

The first was Claus Wilke’s (2018) Fundamentals of Data Visualization (link). His welcome message notes that the book “is meant as a guide to making visualizations that accurately reflect the data, tell a story, and look professional”.

I found it through an alert to chapter 16 of the book, Visualizing Uncertainty (link). The alert came from Matthew Kay (link), whose own work on uncertainty visualisation has also been nudging me towards a better version of my self.

The second resource was Amy Cesal’s Sunlight Foundation Data Visualization Style Guidelines (link). Amy worked with Zander Furnas to develop the guidelines. There is a copy of these guidelines on GitHub (link).

Amy reflects on her use of a style guide:

Since having a style guide, I have to do less work on the majority of data visuals, because they are already 90% done when they are handed off, if they are handed off at all. I also spend less time testing for colorblindness and text readability, because I’m using pre-tested options. This way, I have more time to focus on larger projects that push the boundaries of our style guidelines, and really make the visuals exceptional.


Amy’s mention of boundaries is where my reading of Thomas and Alexander meets Claus and Amy.

Access to such outstanding visualisation resources disturbs a learned aesthetic. Thomas and Alexander note that:

Feedforward self-modelling involves constructing a desirable image of the self that represents achievements beyond the individual’s current capability. It yields the potential for improvement and rapid changes of behaviour.

Just as I was drafting this post, an alert to Cole Nussbaumer Knaflic’s (2018) accessible data viz is better data viz (link) appeared in my inbox. Cole observes “Often, when we are creating charts and graphs, we think of ourselves as the ideal user. This is not only a problem because we know more about the data than the target user, but because other users might have a different set of constraints than we do.”

My hope is that from the inspiration of these great resources, I can start a process of deep learning about how to share in plain sight … not as a New Year resolution but as an everyday practice.

Photo Credit

Photo by Ilya Ilford on Unsplash

What kind of game? Run scoring patterns leading into #WT20

The 2018 ICC Women’s World Twenty20 is being hosted by the West Indies. There are three games on the first day of the tournament, 9 November.

I have data from 58 Women’s ICC T20 games played in 2017 and 2018 prior to the 2018 World Twenty20. I will track the games in the West Indies and compare them to the median profiles from those 58 games. Of these, 26 were won by the team batting first and 32 by the team batting second.

My profiles are:

Overall

Winning Teams

Losing Teams

Naive Priors

From these data, I have identified priors for when runs are scored, which I will use to monitor game outcomes during the tournament.

My hope is that these priors give an early indication of game outcome … and enable me to explore outlier game outcomes.
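To make the idea of a naive prior concrete, here is a minimal sketch (the data frames games and overs and their columns are illustrative stand-ins, not my actual data set): the share of the 58 games won by the team batting first, and median runs per over for winning and losing innings.

library(dplyr)

# games: one row per game, with a logical column batting_first_won
prior_batting_first <- mean(games$batting_first_won)  # 26/58, roughly 0.45

# overs: one row per over of each innings, with the runs scored in that over
# and the result of that innings ("won" or "lost")
median_profiles <- overs %>%
  group_by(result, over) %>%
  summarise(median_runs = median(runs), .groups = "drop")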

Photo Credit

Frame grab, Star Sports video (Twitter)