Discussing data

A tilt-shift photograph of HTML code

Three posts popped up recently that explored our understanding of data.

In a recent post, Cassie Kozyrkov proposes “we need to learn to be irreverently pragmatic about data” (link).

She observes:

Take a moment to realize how glorious it is to have a universal system of writing that stores numbers better than our brains do. When we record data, we produce an unfaithful corruption of our richly perceived realities, but after that we can transfer uncorrupted copies of the result to other members of our species with perfect fidelity. Writing is amazing! Little bits of mind and memory that get to live outside our bodies.

Cassie notes that when we analyse data, we are accessing someone else’s memories. If we regard ourselves as data analysts, then we are engaged in the discipline of making data useful (and in doing so make decisions about analytics, statistics and machine learning). We can demystify data and talk simply about what we do, how we do it, and what we share.

After reading Cassie’s post, I followed up with Nick Barrowman’s (2018) Why Data Is Never Raw (link). He points out:

A curious fact about our data-obsessed era is that we’re often not entirely sure what we even mean by “data”: Elementary particles of knowledge? Digital records? Pure information? Sometimes when we refer to “the data,” we mean the results of an analysis or the evidence concerning a certain question. On other occasions we intend “data” to signify something like “reliable evidence” …

Like Cassie, Nick cautions against “the near-magical thinking about data”. He notes:

How data are construed, recorded, and collected is the result of human decisions — decisions about what exactly to measure, when and where to do so, and by what methods. Inevitably, what gets measured and recorded has an impact on the conclusions that are drawn.

He adds:

We tend to think of data as the raw material of evidence. Just as many substances, like sugar or oil, are transformed from a raw state to a processed state, data is subjected to a series of transformations before it can be put to use. Thus a distinction is sometimes made between “raw” data and processed data, with “raw data” often seen as a kind of ground truth

Nick argues that when people use the term raw data “they usually mean that for their purposes the data provides a starting point for drawing conclusions”. (Original emphasis) He adds:

the context of data — why it was collected, how it was collected, and how it was transformed — is always relevant. There is, then, no such thing as context-free data, and thus data cannot manifest the kind of perfect objectivity that is sometimes imagined

By coincidence, I was reading Will Koehrsen’s suggestions (link) for a non-technical reading list for data science that starts with this introduction:

we can never reduce the world to mere numbers and algorithms. When it comes down to it, decisions are made by humans, and being an effective data scientist means understanding both people and data

I thought all three posts were excellent nudges to enhance our reflexive practice. They reminded me also of EH Carr’s (1961) discussion of historical ‘facts’. He noted that far from being self-evident, historical facts are given their significance by historians, who select them. They are, in effect, “a selective system of cognitive orientations”.

Photo Credit

Photo by Markus Spiske on Unsplash

Making sense of data practices

Laura Ellis has been writing this week about solving business problems with data (link). The alert to her post came shortly after another link had taken me back to a presentation by Dan Weaving in 2017 on load monitoring in sport (link). A separate alert had drawn my attention to two Cassie Kozyrkov articles, one on hypotheses (link) and the second on what great data analysts do (link).

I have all these as tabs in my browser at the moment. They joined the tab holding David Snowden and Mary Boone’s (2007) discussion of a leader’s framework for decision-making (link).

These five connections make for fascinating reading. A good starting point, I think, is David and Mary’s visualisation that forms the reference point for the application of the Cynefin framework:

They observe “the Cynefin framework helps leaders determine the prevailing operative context so that they can make appropriate choices”.

The 2007 visualisation was modified in 2014 when ‘simple’ became ‘obvious’ (link). Disorder sits at the centre of the diagram, representing situations in which there is no clarity about which of the other domains applies:

In a book chapter published in 2000 (link), David notes “the Cynefin model focuses on the location of knowledge in an organization using cultural and sense making …”. Laura, Dan and Cassie provide excellent examples of this sense-making in their own cultural contexts.

Many of my colleagues in sport will appreciate this slide from Dan’s presentation that exhorts us “to adopt a systematic process to reduce data by understanding the similarity and uniqueness of the multiple measures we collect”:

… whilst being very clear about the time constraints to share the outcomes of this process with coaches.

Photo Credit

Arboretum – Bonsai (Meg Rutherford, CC BY 2.0)