Introduction
There is a lot of discussion at the moment about data analysis and its role in sport. In addition to social media conversations about data, a number of newsletters this week have raised data issues. These include discussions about data translators (link), futzing and moseying (link), and analysis as detective work (link).
These alerts sent me off to look at the work of John Tukey; Sara Alspaugh, Nava Zokaei, Andrea Liu, Cindy Jin and Marti A. Hearst; Anne Fisher; and Nathan Yau and Sean Taylor.
This post looks at some of the data issues they raise and why these issues matter for our discussion of the meta-issues we face in analysing performance in sport.
John Tukey
Exploratory data analysis
In 1972, John Tukey wrote about exploratory data analysis (link). He suggested that the process of analysing data had three phases:
- Exploratory
- Probabilistic
- Mustering and borrowing strength
In his paper he identified some principles for exploratory data analysis:
- Walk first, run later
- Do not wait for running shoes; start now
- Data analysis should be investigative
- Resistant techniques should be the usual beginning
- Analyses should come before summaries
- Present at least two different versions of analysis
- Looking at the data requires better numbers and better pictures
- Implicitly defined analyses are inevitable
- Data analysis is going to become more like biochemistry
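Tukey's principle that resistant techniques should be the usual beginning can be made concrete with a small example. The sketch below is an illustration, not Tukey's own procedure: it uses Python's standard `statistics` module, and the sprint times are hypothetical values invented for the example, with one mistimed run included as an outlier. That single bad value pulls the mean away from the bulk of the data, while the median and quartiles of a five-number summary stay with the typical runs.

```python
import statistics


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Quartiles use statistics.quantiles with the 'inclusive' method;
    Tukey's own hinges can differ slightly on small samples.
    """
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


# Hypothetical sprint times in seconds; the last value is a mistimed run.
times = [11.2, 11.4, 11.3, 11.5, 11.1, 11.3, 19.8]

# The mean is dragged upwards by the outlier; the median is resistant to it.
print("mean:   ", round(statistics.mean(times), 2))
print("median: ", statistics.median(times))
print("summary:", five_number_summary(times))
```

Looking at the mean and the median side by side is also a small instance of Tukey's advice to present at least two different versions of an analysis.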
John proposes that exploratory data analysis is actively incisive rather than passively descriptive. He places real emphasis on the discovery of the unexpected and believes this must become customary.
In a subsequent paper (1980), John talked about teaching data analysis and the need to address both exploratory and confirmatory data analysis. He suggests “we need to teach exploratory as an attitude, as well as some helpful techniques, and we probably need to teach it before confirmatory” (link) (my emphasis). This attitude involves asking questions of our practice:
- How are questions generated?
- How are designs guided?
- How is data collection monitored?
- How is analysis overseen?
John notes that the process of analysis needs both exploratory and confirmatory approaches.
Sara Alspaugh, Nava Zokaei, Andrea Liu, Cindy Jin and Marti A. Hearst
Futzing and Moseying
In 2018, Sara Alspaugh and her colleagues shared insights gained from thirty interviews with professional data analysts working in a range of environments (link). The semi-structured interviews were conducted in 2015. The terms futzing and moseying appear in the paper’s title, yet the text does not mention futzing at all and mentions moseying only once, as poking around. The literature suggests that futzing is “unstructured, playful, often experimental interaction” with technologies (link). David Holland and his colleagues (2001) propose that “futz” means “tinkering or fiddling experimentally with something.” They suggest that futzing “refers specifically to making changes to the state of the system, while observing the resulting behavior in order to determine how these relate and what combination of state values is needed to achieve the desired behavior”.
Sara Alspaugh and her colleagues note of data science:
This interdisciplinary field requires its practitioners to acquire diverse technical and mental skills, and be comfortable working with ill-defined goals and uncertain outcomes.
Their study considered exploration “to be open-ended information analysis, which does not require a precisely stated goal”. They observe that analysis activity exists “along a spectrum from exploratory to directed”.
Their interviews suggested that exploratory data analysis involves:
- Uncovering interesting or surprising results.
- Comparing data.
- Coming up with new questions or hypotheses.
- Engaging in exploration.
They suggest that their study showed that practitioners in the field do “ask questions of their data” in an exploratory fashion.
Data Translators
Anne Fisher (2019) notes that the job of data translator “requires a unique combination of skills, usually including both a strong grounding in data science and a talent for boiling complex ideas down to clear, practical choices”. Data translators bring with them a thorough knowledge of the business in which they are working.
One CEO Anne quotes observes that “we need a new generation of executives who understand how to manage and lead through data.”
In a world that is addressing machine learning and artificial intelligence, data translators are able to construct narratives with plots that include a beginning, a middle, and an end.
Michael Lieberman (2018) discussed the role of data translators and described it as “the must have role for the future”. He suggests a data translator is the connector between data scientists and executive decision-makers. Bernard Marr (2018) noted data translators are “specifically skilled at understanding the business needs of an organization and are data savvy enough to be able to talk tech and distill it to others in the organization in an easy-to-understand manner”.
Bernard suggests the skills a data translator needs are:
- A desire to ask questions and get a deeper understanding of issues.
- The confidence to challenge perceptions and biases of individuals at every level of the organization.
- A solid understanding of business requirements and vernacular.
- Analytics knowledge, or the desire to acquire it, to communicate effectively with data scientists.
- Passion to give others an advantage of understanding by using accessible language.
Analysis as Detective Work
Nathan Yau used a recent FlowingData post (link) to discuss analysis as detective work. He observed “I’ve been poking at some data the past couple of weeks and it’s got me thinking about how messy my process often is”. Nathan refers to John Tukey’s 1977 text and its description of exploratory data analysis as detective work.
Nathan points out:
Analysis isn’t step-by-step. Sometimes it’s a process of elimination. Sometimes it’s simplicity building towards complexity. Sometimes it’s a crapshoot. Usually it’s some mix of these things.
Nathan explores this detective activity through a discussion of Sean Taylor’s response to Nathan’s chart about commuting. At the end of the Twitter thread that explored the visualisation, Sean produced a visualisation of his own.
Nathan’s original stimulus was an interactive graphic (link).
Conclusion
This is a time of vibrant conversation about data science, roles and insights in sport. Recent alerts to discussions outside sport indicate that this debate has a much wider context.
My hope is that these conversations enable us to reflect on how pedagogy and practice fit into our lives. We are explorers, and on this journey we need to learn the skills of translation. We become detectives too, and can enjoy our futzing and moseying.