Data and coherent narratives

Peter Killeen (2018), in a paper that discusses the futures of experimental analysis of behavior, observes “we must learn that data have little value until embedded in a coherent narrative”.

The construction of this narrative has been a hot topic this week in conversations about data science activities.

One example is Evan Hansleigh’s discussion of sharing data used in Economist articles:

Releasing data can give our readers extra confidence in our work, and allows researchers and other journalists to check — and to build upon — our work. So we’re looking to change this, and publish more of our data on GitHub.

He adds:

Years ago, “data” generally meant a table in Excel, or possibly even a line or bar chart to trace in a graphics program. Today, data often take the form of large CSV files, and we frequently do analysis, transformation, and plotting in R or Python to produce our stories. We assemble more data ourselves, by compiling publicly available datasets or scraping data from websites, than we used to. We are also making more use of statistical modelling. All this means we have a lot more data that we can share — and a lot more data worth sharing.

Evan’s article concludes:

We plan to publish more of our data on GitHub in the coming months—and, where it’s appropriate, the analysis and code behind them as well. We look forward to seeing how our readers use and build upon the data reporting we do.

The availability of such shared resources, in Uzma Barlaskar’s terms, will enable us to be data-informed rather than data-driven. Uzma suggests:

In data driven decision making, data is at the center of the decision making. It’s the primary (and sometimes, the only) input. You rely on data alone to decide the best path forward. In data informed decision making, data is a key input among many other variables. You use the data to build a deeper understanding of what value you are providing to your users. (Original emphases)

Alejandro Díaz, Kayvaun Rowshankish, and Tamim Saleh share insights from McKinsey research on the use of artificial intelligence in business and note “the emergence of data analytics as an omnipresent reality of modern organizational life” and the consideration that might be given to “a healthy data culture”.

Alejandro, Kayvaun and Tamim suggest that such a culture:

  • Is a decision culture
  • Has ongoing commitment to and conversations about data initiatives
  • Stimulates bottom-up demand for data
  • Manages risk as a ‘smart accelerator’ for analytics processes
  • Supports change agents
  • Balances recruitment of specialists with retention of existing staff

Chris Lidner has looked at the profiles of data scientists who become part of an organisational data culture. He reports that “data scientists come from a wide variety of fields of study, levels of education, and prior jobs”. They hold a range of job titles too: data engineer, data analyst, software engineer, machine learning engineer, and data scientist.

The combination of these posts sent me back to re-read Chris Moran’s What Makes a Good Metric? published in August. I think Chris helps us think about our data narratives in the context of “audience, metrics, culture, and journalism”. He points us to Project as an example of valuing the impact of journalism to the information ecosystem.

This leads Chris to identify the characteristics of robust metrics that help us understand quality and impact:

  • Relevant
  • Measurable
  • Actionable
  • Reliable
  • Readable

He also reminds us to be conscious of Goodhart’s Law: any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

After reflecting on these aggregated ideas and discussions, I returned to Hadley Wickham and Garrett Grolemund’s data exploration diagram:

I wondered how this process might change if we start, as Peter Killeen suggested, with an awareness of how we might embed our narrative for a range of audiences in data intensive contexts.

Photo Credits

Basketball photo by William Topa on Unsplash
Person holding four photos photo by Josh Hild on Unsplash

Data intensive sport

Stephen Downes posted about the mutating metric machinery of higher education this morning.

His post contained links to Ben Williamson’s discussion of the mutating metric machinery and David Berry’s post on the data-intensive university.

Ben and David have insights to share with us as we deal with the use of data in sport contexts.

David starts his post with this observation about data-intensive society:

we now live within a horizon of interpretability determined in large part by the capture of data and its articulation in and through algorithms.

He defines data-intensive science as the fourth paradigm in scientific enquiry (the others are experimental, theoretical, and computational). David suggests:

we are on the verge of a new challenge for the university under the conditions of a society that is based increasingly upon digital knowledge and its economic valorisation.

David’s conclusion led me to think about the transformation of sport and the digital skills required. He argued:

a data-intensive university supports efforts to ensure a new spirit of discovery and the promotion of research through the use of computational techniques and practices which will transform the culture of departments in a university.

Ben notes that contemporary culture is increasingly defined by metrics. He discusses the emergence of a narrative in which higher education has “been made to resemble a market in which institutions, staff and students are all positioned competitively, with measurement techniques required to assess, compare and rank their various performances”.

Ben links his discussion to David Beer’s (2016) concept of metric power that “accounts for the long-growing intensification of measurement over the last two centuries to the current mobilization of digital or ‘big’ data across diverse domains of societies”.

Ben concludes “A form of mobile, networked fast policy is propelling metrics across the sector, and increasingly prompting changes in organizational and individual behaviours that will transform the higher education sector to see and act upon itself as a market”.

David and Ben’s observations and arguments resonate for me in the context of sport. As sport acquires more data in training and competition environments, it is a good time to reflect in a second-order way on data intensivity and behavioural change. David and Ben use their insights to investigate higher education, but as I read their posts I found myself substituting sport for their higher education contexts and thinking about performance and performativity.

Photo Credits

Photo by ev on Unsplash

Photo by Jovan on Unsplash


This is the first time I have used Unsplash photographs in a post. The Unsplash website has this statement:

All photos published on Unsplash can be used for free. You can use them for commercial and noncommercial purposes. You do not need to ask permission from or provide credit to the photographer or Unsplash, although it is appreciated when possible.

Even though credit isn’t required, Unsplash photographers appreciate a credit as it provides exposure to their work and encourages them to continue sharing. A credit can be as simple as adding their name with a link to their profile or photo.