Peter Killeen (2018), in a paper on the futures of the experimental analysis of behavior, observes that "we must learn that data have little value until embedded in a coherent narrative".
Constructing such narratives has been a hot topic in this week's conversations about data science. One example is Evan Hansleigh's discussion of sharing the data used in The Economist's articles:
Releasing data can give our readers extra confidence in our work, and allows researchers and other journalists to check — and to build upon — our work. So we’re looking to change this, and publish more of our data on GitHub.
Years ago, “data” generally meant a table in Excel, or possibly even a line or bar chart to trace in a graphics program. Today, data often take the form of large CSV files, and we frequently do analysis, transformation, and plotting in R or Python to produce our stories. We assemble more data ourselves, by compiling publicly available datasets or scraping data from websites, than we used to. We are also making more use of statistical modelling. All this means we have a lot more data that we can share — and a lot more data worth sharing.
Evan’s article concludes:
We plan to publish more of our data on GitHub in the coming months—and, where it’s appropriate, the analysis and code behind them as well. We look forward to seeing how our readers use and build upon the data reporting we do.
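The CSV-based analysis Evan describes (parse, group, aggregate) needs nothing beyond Python's standard library for small cases. Here is a minimal, hypothetical sketch; the dataset, values, and column names are all invented for illustration:

```python
import csv
import io

# Toy stand-in for a published dataset; the countries, years,
# and figures are invented for illustration.
RAW = """country,year,gdp_growth
A,2017,2.1
A,2018,2.4
B,2017,1.3
B,2018,1.1
"""

def mean_growth_by_country(raw_csv):
    """Average the gdp_growth column per country, rounded to 2 d.p."""
    samples = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        samples.setdefault(row["country"], []).append(float(row["gdp_growth"]))
    return {c: round(sum(v) / len(v), 2) for c, v in samples.items()}

print(mean_growth_by_country(RAW))  # {'A': 2.25, 'B': 1.2}
```

A newsroom pipeline would more likely reach for pandas or the tidyverse, but the shape of the work is the same, which is what makes the analysis code itself worth publishing alongside the data.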
The availability of such shared resources, in Uzma Barlaskar’s terms, will enable us to be data-informed rather than data-driven. Uzma suggests:
In data driven decision making, data is at the center of the decision making. It’s the primary (and sometimes, the only) input. You rely on data alone to decide the best path forward. In data informed decision making, data is a key input among many other variables. You use the data to build a deeper understanding of what value you are providing to your users. (Original emphases)
Alejandro Díaz, Kayvaun Rowshankish, and Tamim Saleh share insights from McKinsey research on the use of artificial intelligence in business. They note "the emergence of data analytics as an omnipresent reality of modern organizational life" and the attention that might be given to building "a healthy data culture".
Alejandro, Kayvaun and Tamim suggest that such a culture:
- Is a decision culture
- Has ongoing commitment to and conversations about data initiatives
- Stimulates bottom-up demand for data
- Manages risk as a ‘smart accelerator’ for analytics processes
- Supports change agents
- Balances recruitment of specialists with retention of existing staff
Chris Lidner has looked at the profiles of data scientists who become part of an organisational data culture. He reports that "data scientists come from a wide variety of fields of study, levels of education, and prior jobs". They hold a range of job titles too: data engineer, data analyst, software engineer, machine learning engineer, and data scientist.
The combination of these posts sent me back to re-read Chris Moran's What Makes a Good Metric?, published in August. I think Chris helps us think about our data narratives in the context of "audience, metrics, culture, and journalism". He points us to the Deepnews.ai project as an example of valuing the impact of journalism on the information ecosystem.
This leads Chris to identify the characteristics of robust metrics that help us understand quality and impact. He also reminds us to be conscious of Goodhart's Law: any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
As a result of reflecting on these aggregated ideas and discussions, I returned to the data-exploration workflow diagram from Hadley Wickham and Garrett Grolemund's R for Data Science, which runs import → tidy → transform ⇄ visualise ⇄ model → communicate.
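As a reminder of what those stages involve, here is a toy end-to-end pass through them, written in Python rather than the R the book uses. The data, field names, and "model" (a simple per-city average) are all invented for illustration, and the visualise step is skipped:

```python
from statistics import mean

# 1. Import: data as it might arrive from a file or API
#    (wide format, values still as strings).
raw = [
    {"city": "X", "2017": "10", "2018": "14"},
    {"city": "Y", "2017": "20", "2018": "18"},
]

# 2. Tidy: reshape wide -> long, one observation per row, typed values.
tidy = [
    {"city": r["city"], "year": int(y), "value": float(r[y])}
    for r in raw
    for y in ("2017", "2018")
]

# 3. Transform: derive a new variable from an existing one.
for obs in tidy:
    obs["decade"] = obs["year"] // 10 * 10

# 4. Model: here just a summary statistic per city.
by_city = {
    c: mean(o["value"] for o in tidy if o["city"] == c)
    for c in {"X", "Y"}
}

# 5. Communicate: a narrative line per city.
report = [f"{c}: average {v:.1f}" for c, v in sorted(by_city.items())]
print(report)  # ['X: average 12.0', 'Y: average 19.0']
```

Each stage hands a slightly more story-ready structure to the next, which is exactly where the "coherent narrative" Killeen asks for gets built.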
I wondered how this process might change if we start, as Peter Killeen suggested, with an awareness of how we might embed our narrative for a range of audiences in data-intensive contexts.