The volume and quality of sport analytics writing fills me with awe.
Each day I try to aggregate, curate and share examples from this analytic turn. My activity is partial and selective. I am mindful of a whole community of practice doing similar work. I see many parallels between this activity and walking along striding edges.
My discoveries encourage me to continue with my learning journey and give me that Leonard Cohen feeling expressed in his Preface to the Chinese edition of Beautiful Losers:
So you can understand, Dear Reader, how privileged I feel to be able to graze, even for a moment, and with such meager credentials, on the outskirts of your tradition.
Two papers today reinforced my feeling of grazing on the outskirts.
Mark Taylor wrote about the intelligent use of numbers in football analysis. In his introduction, Mark notes “The use of data in analysing football allows us to take a more nuanced view of both the events that occur during a match and also to evaluate a side’s ongoing performance to make predictions about their future results”.
Mark’s nuanced view included:
- Awareness of over-fitting non-sustainable events and a flawed projection of a team’s underlying quality.
- How we might value goal-scoring and develop dynamic probabilities of win outcomes.
- The use of Poisson calculations.
What struck me about Mark’s observations was the layers of expertise he used to share his story with an imagined audience. He prompted me to return to some of the early discussions about Poisson distributions (Moroney, 1951; Reep, Pollard and Benjamin, 1971; Maher, 1982).
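To remind myself what a Poisson calculation can look like in practice, here is a minimal sketch in Python. It is not Mark's method: it assumes each side's goals follow independent Poisson distributions, and the scoring rates (1.5 and 1.1 goals per match) and the function names are hypothetical values chosen only for illustration.

```python
from math import exp, factorial


def poisson_pmf(goals, rate):
    """Probability of scoring exactly `goals` when the expected number is `rate`."""
    return exp(-rate) * rate ** goals / factorial(goals)


def match_outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Sum the joint probabilities of every scoreline up to `max_goals` per side.

    Assumes the two sides score independently, which is a simplification.
    Scorelines beyond `max_goals` are ignored, so the three probabilities
    add up to very slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for home_goals in range(max_goals + 1):
        for away_goals in range(max_goals + 1):
            p = poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)
            if home_goals > away_goals:
                home_win += p
            elif home_goals == away_goals:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Hypothetical expected goals for the home and away sides.
home_win, draw, away_win = match_outcome_probabilities(1.5, 1.1)
print(f"home win {home_win:.3f}, draw {draw:.3f}, away win {away_win:.3f}")
```

Re-running the function with updated rates as a match unfolds is one simple way to think about the dynamic win probabilities Mark describes, although his own approach may well differ.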
My second paper comes from a friend who asked me to comment on a draft paper he is writing. The paper uses secondary data to explore winning performances in rugby union. What struck me about this paper was the learning journey my friend has made from full-time athlete to service provider as an analyst.
My friend’s paper included:
- A classification model developed with the R package ‘randomForest’.
- The use of the R package ‘rfPermute’ to estimate the significance of importance metrics for a Random Forest model by permuting the response variable.
- Visualisation of partial dependency plots with the R package ‘pdp’.
- Z scores for McNemar’s test.
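
As a note to myself on the last item, here is a minimal sketch of one common way to express McNemar's test as a Z score. It is written in Python rather than R, and I am assuming the usual setting of two classifiers scored on the same matches. The counts `b` and `c` (cases where only one of the two models was correct) and the function name are hypothetical; my friend's paper may have computed the statistic differently, for example with a continuity correction.

```python
from math import erfc, sqrt


def mcnemar_z(b, c):
    """Z score and two-sided p-value for McNemar's test (no continuity correction).

    b: cases the first model classified correctly and the second did not.
    c: cases the second model classified correctly and the first did not.
    Cases where both models agree do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    z = (b - c) / sqrt(b + c)
    # Two-sided p-value from the standard normal approximation.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical discordant counts, for illustration only.
z, p_value = mcnemar_z(b=15, c=5)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Squaring the Z score gives the more familiar chi-squared form of the statistic, (b - c)² / (b + c).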
I spent a good part of the day exploring the data he shared. I was fascinated by his ease of use of R packages. He took me a long way from the comfort of my qualitative, ethnographic approach to performance observation and analysis.
His paper prompted me to ask him about the reproducibility of his work. I thought he might like another R opportunity … Przemysław Biecek and Marcin Kosiński’s (2017) archivist package, designed to improve the management of the results of data analysis.
The package enables:
- management of local and remote repositories which contain R objects and their meta-data
- archiving R objects to repositories
- sharing and retrieving objects (and their pedigree) by their unique hooks
- searching for objects with specific properties or relations to other objects
- verification of an object’s identity and the context of its creation.
My friend intends to produce a number of papers about winning performances. Given his expertise in R, the archivist package would seem to be a great way to share his work openly.
Grazing at the margins of Mark’s and my friend’s work gave me great delight. Both reminded me of the kind of learning required to be a polymath at this point in the analytic turn … and the navigation of paths along striding edges.
Striding edge (The Yes Man, CC BY 2.0)