#cssia17 Connecting and Sharing

I have been following up on some leads shared by Mara Averick. Two recent suggestions caught my attention as I try to improve the ways I share and connect.

The first was a post by Joris Muller about reproducible computational research for R users. In it, he explores a 2013 paper by Geir Sandve and colleagues that proposes ten rules for reproducible computational research. These rules are very pertinent to anyone seeking to use analytics to explore and share insights into performance in sport.

The ten rules are:

  1. Keep track of how every result was produced.
  2. Avoid manual data manipulation steps.
  3. Archive the exact versions of all external programs used.
  4. Version control all custom scripts.
  5. Record all intermediate results, in standardised formats when possible.
  6. For analyses that include randomness, note underlying random seeds.
  7. Always store raw data behind plots.
  8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected.
  9. Connect textual statements to underlying results.
  10. Provide public access to scripts, runs and results.
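Several of these rules need very little code to follow. Joris works in R; purely as an illustration, here is a minimal Python sketch of rules 5 and 6. It assumes a hypothetical bootstrap analysis of placeholder scores (not real match data): one explicit seed drives all the randomness, and that seed is recorded next to the result in a standard, machine-readable format (JSON).

```python
import json
import random


def run_analysis(seed: int) -> dict:
    """Hypothetical analysis: a bootstrap estimate of a mean score.

    The scores below are placeholder values, not real match data.
    """
    rng = random.Random(seed)  # Rule 6: a single, explicit seed drives all randomness
    scores = [12, 15, 9, 20, 17, 11]
    n_resamples = 1000
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(scores) for _ in scores]
        means.append(sum(resample) / len(resample))
    return {
        "seed": seed,  # Rule 6: note the seed next to the result it produced
        "n_resamples": n_resamples,
        "bootstrap_mean": sum(means) / n_resamples,
    }


# Rule 5: keep the intermediate result in a standardised format (JSON here)
result = run_analysis(seed=2017)
record = json.dumps(result, indent=2, sort_keys=True)
print(record)

# Re-running with the recorded seed reproduces exactly the same numbers
assert run_analysis(seed=result["seed"]) == result
```

Running it with a different seed will almost always give a slightly different estimate, which is exactly why the seed needs to be noted alongside the result.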

Joris concludes his post:

All the 10 rules proposed in the Sandve paper are reachable for an R user. Just by using R itself, the rmarkdown workflow and some organisational rules cover most of these rules. My basic reproducible workflow meets almost all the criteria with the notable exceptions of the software archive (but it’s work in progress with packrat) and the lack of public access (but I can’t share everything).

For an introduction to Joris’s workflow, you might find this post of interest.

The second lead from Mara focussed on reproducible behaviour too. Jenny Bryan shared her ideas about Naming Things back in 2015. This is one of the many resources Jenny has shared, and I have found her GitHub repositories immensely helpful. In her 2015 paper, Jenny proposes three principles for file names: machine readable, human readable and ‘plays well with default ordering’.
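To make the three principles concrete, here is a minimal Python sketch of one naming convention consistent with them. The date_sequence_description layout and the helper names make_file_name and parse_file_name are my own assumptions for illustration, not taken from Jenny’s paper: an ISO 8601 date and a zero-padded number come first so names sort chronologically, underscores separate fields so a program can split a name back into its parts, and a lower-case, hyphenated description keeps it readable for people.

```python
import re
from datetime import date


def make_file_name(when: date, seq: int, description: str, ext: str) -> str:
    """Build a name like 2017-09-28_01_match-data-cleaned.csv (hypothetical convention)."""
    # Human readable: lower-case words joined by hyphens, no spaces or punctuation
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Default ordering: ISO 8601 date first, then a zero-padded sequence number
    return f"{when.isoformat()}_{seq:02d}_{slug}.{ext}"


def parse_file_name(name: str) -> dict:
    """Machine readable: recover the metadata by splitting on '_' and '.'."""
    stem, ext = name.rsplit(".", 1)
    when, seq, slug = stem.split("_")
    return {"date": date.fromisoformat(when), "seq": int(seq), "slug": slug, "ext": ext}


names = [
    make_file_name(date(2017, 10, 3), 2, "Shiny app notes", "md"),
    make_file_name(date(2017, 9, 28), 10, "Match Data: Raw", "csv"),
    make_file_name(date(2017, 9, 28), 1, "match data cleaned", "csv"),
]
for name in sorted(names):
    print(name)
print(parse_file_name(names[1]))
```

Without the zero padding, a file numbered 10 would sort before one numbered 2 under plain alphabetical ordering; that is the ‘plays well with default ordering’ principle in miniature.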

The two leads sent me off thinking about how I might improve my practice. I am fascinated by Joris’s transparency about his workflow, and I see this approach as essential for sport analytics as we start to build cumulative rather than ‘ab initio’ research. I admire Jenny’s work immensely. For the past fifteen years, as I have moved all my resources to cloud-based storage, I have tried to use robust file naming conventions. I realise I am a long way from meeting Jenny’s three principles at the moment, but this will be a work in progress.

Mara Averick’s Twitter recommendations are becoming a very important way for me to connect with a community of practice. Discussing these two leads here is a way for me to make this process explicit … and to start a conversation about reproducible behaviours in sport analytics research and practice.

Photo Credits

Tree on campus (Keith Lyons, CC BY 4.0)

Mastodon: Sharing R Resources

I am delighted I have a Mastodon account (@KeithLyons). It provides a 500-character space for each toot.

It came to my rescue today.

I follow Mara Averick (@dataandme) on Twitter. After a couple of days offline, I found a treasure trove of links on her account.

I posted this:

I had hoped to use David Libeau’s WordPress plugin to embed my toot in the same way that tweets are embedded … but that remains a work in progress.

The links Mara shared that are of direct relevance to #cssia17 included:

R powered web applications with Shiny (a tutorial and cheat sheet with 40 example apps)

“Creating and running simple web applications is relatively easy and there are great resources for doing this. But when you want more control of the application functionality understanding the key concepts is challenging. To help you navigate the creation of satisfying Shiny applications we’ve assembled example code below that demonstrates some of the key concepts.”

Thinking About SH//FT in Sport Analytics


Earlier today, I received an alert to Mara Averick’s post on women in the sports data revolution.

Her thoughts sent me back to SH//FT (Shaping Holistic Inclusion in Future Technology), “a non-profit organization … providing equal opportunity– to be a foundation and platform that under represented groups can use to define their skill set, refine it, and become competitive in the job market”.

In her post, Mara discusses Nikita Taparia’s Women Are Being Left Behind by the Sports Data Revolution. It is a post about “sport stories we wish we could tell – but the data just isn’t there even at the highest level”.

Anyone interested in committing to SH//FT in sport analytics will find Nikita’s post fascinating. I am delighted I found Mara’s response to Nikita:

This is such a wonderful piece, and, realizing that it could take an epoch for me to craft a response worthy of it, I thought I’d just post responses to a few of the issues you pointed out.

These two posts and Allison McCann’s 2015 post, Hey, Nate: There Is No ‘Rich Data’ In Women’s Sports, make compelling reading.

There is a fourth post too: Sue Bird’s Analyze This. Sue observes:

I think there is also some subtext to the lack of data in women’s sports. Is the WNBA, for example, not worthy of a deep dive? Do women, as fans — who account for about 70 percent of our fanbase in arenas across the league — have less of a mind, or less of an interest in numbers, than their male counterparts?

She concludes:

One day, I won’t even have to tell my niece about how great Diana Taurasi was. The numbers will speak for themselves.

… and SH//FT happens.

Photo Credit

No data (Allison McCann, FiveThirtyEight)