What kind of game? Run scoring patterns leading into #WT20

The 2018 ICC Women’s World Twenty20 is being hosted by the West Indies. There are three games on the first day of the tournament, 9 November.

I have data from 58 Women’s ICC T20 games played in 2017–2018, before the 2018 World Twenty20. I will track the games in the West Indies and compare them with the median profiles from these 58 games. Of the 58 games, 26 were won by the team batting first and 32 by the team batting second.
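The post does not say how its median profiles are built. Here is a minimal sketch. It assumes each innings is stored as a list of cumulative runs at the end of each over, which is a hypothetical layout. The sketch takes the per-over median across innings and then reports how far a live innings sits above or below that median. The function names `median_profile` and `compare_to_profile` and the toy numbers below are illustrative only. They are not the author's data or code.

```python
from statistics import median


def median_profile(innings: list[list[int]]) -> list[float]:
    """Per-over median of cumulative runs across a set of innings.

    Assumed (hypothetical) layout: each innings is a list of cumulative runs
    at the end of each over. Innings that ended early (all out, or a chase
    completed) contribute only to the overs they actually reached.
    """
    max_overs = max(len(i) for i in innings)
    profile = []
    for over in range(max_overs):
        # Collect the cumulative total at this over from every innings that got this far.
        values = [i[over] for i in innings if len(i) > over]
        profile.append(median(values))
    return profile


def compare_to_profile(live: list[int], profile: list[float]) -> list[float]:
    """Runs above (+) or below (-) the median profile for each completed over."""
    # zip stops at the shorter list, so only completed overs are compared.
    return [runs - expected for runs, expected in zip(live, profile)]
```

Consider three made-up four-over innings. `median_profile([[5, 12, 20, 27], [8, 15, 21, 30], [6, 10, 18, 25]])` gives `[6, 12, 20, 27]`. A live innings of `[7, 11]` is then one run ahead after the first over and one run behind after the second. Dropping innings that ended early from later overs is one simple choice. Carrying an all-out total forward would be another.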

My profiles are:

Overall

Winning Teams

Losing Teams

Naive Priors

From these data, I have identified these priors for monitoring outcomes in the tournament and tracking when runs are scored.

My hope is that these priors give an early indication of game outcome … and enable me to explore outlier game outcomes.
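The counts above support a simple naive prior: the share of games won by the team batting first, 26 of 58. A minimal Python sketch follows. The optional Beta-binomial update starts from a uniform Beta(1, 1) and folds in new tournament results. It is an assumed illustration of how such a prior could be revised game by game. It is not a method described in the post, and the function names are hypothetical.

```python
from fractions import Fraction

# Counts from the post: 58 Women's ICC T20 games, 2017-2018.
BAT_FIRST_WINS = 26
BAT_SECOND_WINS = 32


def naive_prior(first_wins: int, second_wins: int) -> Fraction:
    """Proportion of games won by the team batting first."""
    return Fraction(first_wins, first_wins + second_wins)


def updated_estimate(first_wins: int, second_wins: int,
                     new_first: int, new_second: int) -> Fraction:
    """Posterior mean after adding new results (illustrative assumption).

    Starts from a uniform Beta(1, 1), treats the historical games as the
    first batch of evidence, then adds the new tournament results.
    """
    a = 1 + first_wins + new_first
    b = 1 + second_wins + new_second
    return Fraction(a, a + b)


prior = naive_prior(BAT_FIRST_WINS, BAT_SECOND_WINS)
print(f"P(bat first wins) = {prior} ≈ {float(prior):.3f}")
```

Running the sketch prints `P(bat first wins) = 13/29 ≈ 0.448`. Suppose, hypothetically, that teams batting first won two of the first three tournament games. Then `updated_estimate(26, 32, 2, 1)` would return 29/63, roughly 0.46.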

Photo Credit

Frame grab, Star Sports video (Twitter)

Data and coherent narratives

Peter Killeen (2018), in a paper that discusses the futures of experimental analysis of behavior, observes “we must learn that data have little value until embedded in a coherent narrative”.

The construction of this narrative has been a hot topic this week in conversations about data science activities.

One example is Evan Hansleigh’s discussion of sharing data used in Economist articles:

Releasing data can give our readers extra confidence in our work, and allows researchers and other journalists to check — and to build upon — our work. So we’re looking to change this, and publish more of our data on GitHub.

He adds:

Years ago, “data” generally meant a table in Excel, or possibly even a line or bar chart to trace in a graphics program. Today, data often take the form of large CSV files, and we frequently do analysis, transformation, and plotting in R or Python to produce our stories. We assemble more data ourselves, by compiling publicly available datasets or scraping data from websites, than we used to. We are also making more use of statistical modelling. All this means we have a lot more data that we can share — and a lot more data worth sharing.

Evan’s article concludes:

We plan to publish more of our data on GitHub in the coming months—and, where it’s appropriate, the analysis and code behind them as well. We look forward to seeing how our readers use and build upon the data reporting we do.

The availability of such shared resources, in Uzma Barlaskar’s terms, will enable us to be data-informed rather than data-driven. Uzma suggests:

In data driven decision making, data is at the center of the decision making. It’s the primary (and sometimes, the only) input. You rely on data alone to decide the best path forward. In data informed decision making, data is a key input among many other variables. You use the data to build a deeper understanding of what value you are providing to your users. (Original emphases)

Alejandro Díaz, Kayvaun Rowshankish, and Tamim Saleh share insights from McKinsey research on the use of artificial intelligence in business and note “the emergence of data analytics as an omnipresent reality of modern organizational life” and the consideration that might be given to “a healthy data culture”.

Alejandro, Kayvaun and Tamim suggest that such a culture:

  • Is a decision culture
  • Has ongoing commitment to and conversations about data initiatives
  • Stimulates bottom up demand for data
  • Manages risk as a ‘smart accelerator’ for analytics processes
  • Supports change agents
  • Balances recruitment of specialists with retention of existing staff

Chris Lidner has looked at the profiles of data scientists who become part of an organisational data culture. He reports “data scientists come from a wide variety of fields of study, levels of education, and prior jobs”. They have a range of job descriptions too: data engineer, data analyst, software engineer, machine learning engineer, and data scientist.

The combination of these posts sent me back to re-read Chris Moran’s What Makes a Good Metric?, published in August. I think Chris helps us think about our data narratives in the context of “audience, metrics, culture, and journalism”. He points us to the Deepnews.ai project as an example of valuing the impact of journalism on the information ecosystem.

This leads Chris to identify the characteristics of robust metrics that help us understand quality and impact:

  • Relevant
  • Measurable
  • Actionable
  • Reliable
  • Readable

He reminded us also that we should be conscious of Goodhart’s Law: any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

Reflecting on these ideas and discussions, I returned to Hadley Wickham and Garrett Grolemund’s diagram of the data exploration process:

I wondered how this process might change if we start, as Peter Killeen suggested, with an awareness of how we might embed our narrative for a range of audiences in data-intensive contexts.

Photo Credits

Basketball photo by William Topa on Unsplash
Person holding four photos photo by Josh Hild on Unsplash

Documenting and Sharing

Signal Noise, The Economist and Siemens have worked together to visualise the fan energy in FC Bayern Munich’s Allianz Arena.

The visualisation includes game timelines, fan energy, highlights, players, and social ripple, and it provides the user with a rich array of options.

I think this is a great example of the analytic turn in sport, and it highlights the data expertise available to sport.

Earlier this year, Signal Noise hosted a Data Obscura exhibition that explored the relationship between data and truth. The exhibition was launched with a panel discussion that considered whether transparency and truth should be the ultimate aim online, and asked “how much is ‘true enough’?”.

This interplay between practice, epistemology and ontology is fundamental to anyone contemplating a career in sport analytics at a time when:

Multiple filters are applied to the information that we see: algorithms distill a world of opinions to give us a distinct view of events, and authenticity is becoming an increasingly scarce commodity. (My emphasis) (Data Obscura, 2018)

This contemplation could lead to a consideration of epistemic cultures and the machineries of knowledge construction. Karin Knorr Cetina (1999) writes:

Everyone knows what science is about: it is about knowledge, the ‘objective’ and perhaps ‘true’ representation of the world as it really is. The problem is that no one is quite sure how scientists and other experts arrive at this knowledge. The notion of epistemic culture is designed to capture these interiorised processes of knowledge creation. It refers to those sets of practices, arrangements and mechanisms bound together by necessity, affinity and historical coincidence which, in a given area of professional expertise, make up how we know what we know. Epistemic cultures are cultures of creating and warranting knowledge.

This process involves what Maurizio Ferraris (2006) defines as ‘documentality’. For Maurizio, documents are social objects (such that they involve at least two persons) “characterised by the fact of being written: on paper, in a computer file, or simply in people’s heads”.

His theory develops in three different directions:

  • an ontology (“What is a document?”)
  • a technology (an explanation of how documents are distributed)
  • a pragmatics (an understanding of the efficient distribution of documents)

Sharing the Signal Noise, The Economist and Siemens venture into the Allianz Arena here has led me to reflect on learning journeys.

The volume and quality of data analysis opportunities position this generation of data analysts in sport in a very important ontological and pragmatic space.

There are more ways to share primary data and analysis than ever before. Each of us can make an informed and transparent decision about the machineries we choose for sharing information and stimulating conversations about knowledge and understanding.

In my case, I use the WordPress blog platform to connect ideas that strike me as important. I discovered news of the Signal Noise project on Twitter. The tweet came as I was re-reading Maurizio Ferraris and editing the Ethical Issues page of the wikiEducator course Sport Informatics and Analytics. In sharing this process openly, I am hopeful that readers can make informed decisions about authenticity and contemplate these issues as worthy of consideration.

Photo Credits

Frame grab, Reimagine the Game

FC Bayern (Twitter)