Football, Python and R

A few weeks ago, I was introduced to FC Python on Twitter and followed up a link to FC Python’s blog.

I was delighted to read on the blog landing page:

FC Python is a project that aims to put accessible resources for learning basic Python, programming & data skills in the hands of people interested in sport. Whether you are a Sports Science student, a coach, or anyone with a passing interest in football – the tools shown across these pages will help you to get started with programming and using data with Python.

I think this is a wonderful approach to take.

Rob Carroll hosted an FC Python blog post (Why Programming Matters) on his Video Analyst blog on 9 February that extended the reach and appeal of FC Python’s work.

In that post, FC Python observes:

There has never been a more important time for Sports Science students to take responsibility for their own development through learning programming skills. With over 10,000 students (and growing) graduating from sports-related courses every year, the problems facing job seekers are well-documented. Taking the time to learn is by far the best investment that you can make to ensure that you’ll be towards the top of the application pile.

I agree absolutely with these sentiments.

My approach in recent years has been to create, aggregate and share open educational resources. My WikiEducator course, Sport Informatics and Analytics, has an R component. I have also added a Python page that points to FC Python’s inspirational sharing of Python programming and data skills.

My advocacy for Python is in part a lament.

When I worked at the Australian Institute of Sport, my colleague Bob Buckley was a Python specialist. I missed an important opportunity to accelerate and champion Bob’s work. It took me a decade to catch up with where he was in 2006. Bob moved on from the AIS shortly after I left. He is using his skills now as a Computational Genomic Specialist at John Curtin School of Medical Research at the Australian National University.

Shortly after finding FC Python, I was introduced to Tyler Bosch from the University of Minnesota.

I am sorry not to have found Tyler sooner. I am grateful to Jamie Coles for the introduction. Like FC Python, Tyler has a profound educational commitment to sharing. He ran an Introduction course for R in January and some of his resources are shared on Patreon.

I do try to monitor developments in R and in recent months have been guided by Mara Averick’s links. I shared some of these links in a post for the Irish Performance Analysis Exchange.

Yesterday, I discovered that FC Python had nurtured an R response, FC rSTATS. There is a blog site to accompany the Twitter account. On the home page is this acknowledgement:

The R conversion of @FC_Python. Not associated with the original but have given a thumbs up to convert their resources.

This is another important step in open sharing. It also provides a crosswalk for anyone interested in learning R and Python with association football data as the domain example.

The authors of FC Python and FC rSTATS have chosen to remain anonymous. This is a profound commitment to the essence of open educational resources. Each of us can make our own judgements about the probity of the material shared on each site.

For my part, I am in awe of what they are doing … and Tyler too.

Photo Credit

Racing ahead (Keith Lyons, CC BY 4.0)

#cssia17 Connecting and Sharing

I have been following up on some leads shared by Mara Averick. Two recent suggestions caught my attention as I try to improve the ways I share and connect.

The first was a post by Joris Muller about reproducible computational research for R users. In it he explores ideas shared in a 2013 paper written by Geir Sandve and colleagues. In that paper, the authors propose ten rules for reproducible computational research. These are very pertinent to those seeking to share and explore performance in sport using analytics insights.

The ten rules are:

  1. Keep track of how every result was produced
  2. Avoid manual data manipulation steps
  3. Archive the exact versions of all external programs used
  4. Version control all custom scripts
  5. Record all intermediate results in standardised formats when possible
  6. For analyses that include randomness, note underlying random seeds
  7. Always store raw data behind plots
  8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected
  9. Connect textual statements to underlying results
  10. Provide public access to scripts, runs and results
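As a small illustration of how some of these rules might look in practice, here is a minimal Python sketch that touches rules 3, 6 and 7. It is my own example and not taken from the Sandve paper or from Joris’s post. The function name run_analysis, the seed value and the simulated shot data (with an assumed 30% conversion rate) are all hypothetical.

```python
import json
import platform
import random


def run_analysis(seed: int) -> dict:
    """Simulate a tiny analysis and return the result with its provenance."""
    # Rule 6: use a generator seeded with a recorded value.
    rng = random.Random(seed)
    # Hypothetical data: ten simulated shots, 1 = goal, 0 = no goal.
    shots = [1 if rng.random() < 0.3 else 0 for _ in range(10)]
    return {
        "seed": seed,                                  # rule 6: note the seed
        "python_version": platform.python_version(),  # rule 3: record the version used
        "shots": shots,                                # rule 7: keep the raw data
        "goals": sum(shots),
    }


first = run_analysis(seed=2017)
second = run_analysis(seed=2017)

# Storing the record as JSON keeps it in a standard, shareable format.
print(json.dumps(first, indent=2))
print("Same seed gives same result:", first == second)
```

Running the function twice with the same seed should give an identical record, which is the point of noting the seed alongside the result.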

Joris concludes his post:

All the 10 rules proposed in the Sandve paper are reachable for a R user. Just by using R itself, the rmarkdown workflow and some organisational rules cover most of these rules. My basic reproductible workflow meet almost all the criterias with the notable exceptions of the software archive (but it’s work in progress with packrat) and the lack of public access (but I can’t share everything).

For an introduction to Joris’s workflow, you might find this post of interest.

The second lead from Mara focussed on reproducible behaviour too. Jenny Bryan shared her ideas back in 2015 about Naming Things. This is one of the many resources Jenny has shared. I have found her GitHub repositories immensely helpful. In Naming Things, Jenny notes three principles for file names: machine readable, human readable and ‘plays well with default ordering’.
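To make the three principles concrete for myself, here is a minimal Python sketch of one possible naming convention. It is my own interpretation, not Jenny’s code. The function make_filename, the choice of fields and the example names are hypothetical. I am assuming an ISO 8601 date first (so default sorting is chronological), underscores between fields and hyphens between words within a field.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lower-case the text and turn runs of other characters into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(when: date, sport: str, description: str, ext: str) -> str:
    """Build a name such as 2017-02-09_football_shot-map.csv."""
    return f"{when.isoformat()}_{slug(sport)}_{slug(description)}.{ext}"


names = [
    make_filename(date(2017, 3, 1), "Football", "Passing networks", "csv"),
    make_filename(date(2017, 2, 9), "Football", "Shot map", "csv"),
]

# Plays well with default ordering: a plain sort is chronological.
ordered = sorted(names)
print(ordered)

# Machine readable: the fields can be recovered by splitting on underscores.
stem = ordered[0].rsplit(".", 1)[0]
fields = stem.split("_")
print(fields)
```

The names remain human readable because each field is a plain word or date, while the fixed delimiters mean a script can pull the date, sport and description back out without guessing.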

The two leads sent me off thinking about how I might improve my practice. I am fascinated by Joris’s transparency with his workflow and I see this approach as essential for sport analytics as we start to extend cumulative rather than ‘ab initio’ research. I admire Jenny’s work immensely. I have tried to use some robust file naming conventions for the past fifteen years as I have sought to use cloud-based storage for all my resources. I realise I am a long way from meeting Jenny’s three principles at the moment but this will be a work in progress.

Mara Averick’s Twitter recommendations are becoming a very important way for me to connect with a community of practice. The two leads discussed here are a way for me to make this process explicit … and to initiate a conversation about reproducible behaviours in sport analytics research and practice.

Photo Credits

Tree on campus (Keith Lyons, CC BY 4.0)

Mastodon: Sharing R Resources

I am delighted I have a Mastodon account (@KeithLyons). It provides a 500-character space for each toot.

It came to my help today.

I follow Mara Averick (@dataandme) on Twitter. I have been offline for a couple of days and found a treasure trove of links on her account.

I posted this:

I had hoped to use David Libeau’s WordPress plugin to post my toot in the way that Twitter is embedded … but that remains a work in progress.

The links Mara shared that are of direct relevance to #cssia17 included:

R powered web applications with Shiny (a tutorial and cheat sheet with 40 example apps)

“Creating and running simple web applications is relatively easy and there are great resources for doing this. But when you want more control of the application functionality understanding the key concepts is challenging. To help you navigate the creation of satisfying Shiny applications we’ve assembled example code below that demonstrates some of the key concepts.”