Three posts popped up recently that explored our understanding of data.
In one of them, Cassie Kozyrkov proposes “we need to learn to be irreverently pragmatic about data” (link).
Take a moment to realize how glorious it is to have a universal system of writing that stores numbers better than our brains do. When we record data, we produce an unfaithful corruption of our richly perceived realities, but after that we can transfer uncorrupted copies of the result to other members of our species with perfect fidelity. Writing is amazing! Little bits of mind and memory that get to live outside our bodies.
Cassie notes that when we analyse data, we are accessing someone else’s memories. If we regard ourselves as data analysts, then we are engaged in the discipline of making data useful (and in doing so we make decisions about analytics, statistics and machine learning). We can demystify data and talk simply about what we do, how we do it, and what we share.
After reading Cassie’s post, I followed up with Nick Barrowman’s (2018) Why Data Is Never Raw (link). He points out:
A curious fact about our data-obsessed era is that we’re often not entirely sure what we even mean by “data”: Elementary particles of knowledge? Digital records? Pure information? Sometimes when we refer to “the data,” we mean the results of an analysis or the evidence concerning a certain question. On other occasions we intend “data” to signify something like “reliable evidence” …
Like Cassie, Nick cautions against “the near-magical thinking about data”. He notes:
How data are construed, recorded, and collected is the result of human decisions — decisions about what exactly to measure, when and where to do so, and by what methods. Inevitably, what gets measured and recorded has an impact on the conclusions that are drawn.
We tend to think of data as the raw material of evidence. Just as many substances, like sugar or oil, are transformed from a raw state to a processed state, data is subjected to a series of transformations before it can be put to use. Thus a distinction is sometimes made between “raw” data and processed data, with “raw data” often seen as a kind of ground truth
Nick argues that when people use the term raw data “they usually mean that for their purposes the data provides a starting point for drawing conclusions”. (Original emphasis) He adds:
the context of data — why it was collected, how it was collected, and how it was transformed — is always relevant. There is, then, no such thing as context-free data, and thus data cannot manifest the kind of perfect objectivity that is sometimes imagined
By coincidence, I was also reading Will Koehrsen’s suggestions (link) for a non-technical reading list for data science, which starts with this introduction:
we can never reduce the world to mere numbers and algorithms. When it comes down to it, decisions are made by humans, and being an effective data scientist means understanding both people and data
I thought all three posts were excellent nudges to enhance our reflexive practice. They also reminded me of EH Carr’s (1961) discussion of historical ‘facts’. He noted that facts, far from being self-evident, are given their significance by historians, and selectively so. They are in effect “a selective system of cognitive orientations”.