Dealing With Data Deluge

I have spent much of the last two days in conversations with coaches about personalising learning environments for athletes and their colleagues.
I think this ability to personalise coaching and modulate training is a characteristic of the (+) of the coach I discussed here.
A recurring theme in conversations has been the growth in pervasive sensing data in training and competition environments. I have become increasingly interested in how computational intelligence might help with these data.
Whilst contemplating these issues, I received an alert from The Scholarly Kitchen to Todd Carpenter’s Does All Science Need to be Preserved? Do We Need to Save Every Last Data Point?
In his post Todd observes:

There are at present few best practices for managing and curating data. Libraries have developed, over the decades, processes and plans for how to curate an information collection and to “de-accession” (i.e., discard) unwanted or unnecessary content. At this stage in the development of an infrastructure for data management, there is no good understanding of how to curate a data collection. This problem is compounded by the fact that we are generating far more data than we have capacity to store or analyze effectively.

He notes “the much deeper questions of large datasets and what to preserve, at what level of detail and granularity, and whether all data is equally important to preserve are questions that have yet to be fully addressed”.
Todd pointed to Kelvin Droegemeier's presentation, A Strategy for Dynamically Adaptive Weather Prediction: Cyberinfrastructure Reacting to the Atmosphere, given to the U.S. National Academies Board on Research Data and Information (a copy of the presentation is available here).
In his presentation, Kelvin asked a fundamental research question: "Can we better understand the atmosphere, educate more effectively about it, and forecast more accurately if we adapt our technologies and approaches to the weather as it occurs?"
To do so, Kelvin noted the need to "Revolutionize the ability of scientists, students, and operational practitioners to observe, analyze, predict, understand, and respond to intense local weather by interacting with it dynamically and adaptively in real time". He emphasised the need for adaptive systems and the provision of a service-oriented architecture.
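Transposed to a training environment, a dynamically adaptive approach might look something like the minimal sketch below, in which sampling becomes denser when the incoming signal turns volatile. Everything here, the sensor interface, the thresholds, the intervals, is an illustrative assumption of mine, not part of Kelvin's system:

```python
import statistics
from collections import deque

BASE_INTERVAL_S = 1.0      # relaxed sampling when nothing is happening
BURST_INTERVAL_S = 0.1     # dense sampling during a window of interest
WINDOW = 20                # samples used to judge recent variability
VARIANCE_THRESHOLD = 4.0   # above this, switch to dense sampling

def next_interval(recent: deque) -> float:
    """Choose the next sampling interval from recent signal variance."""
    if len(recent) < WINDOW:
        return BASE_INTERVAL_S
    if statistics.variance(recent) > VARIANCE_THRESHOLD:
        return BURST_INTERVAL_S
    return BASE_INTERVAL_S

def monitor(read_sensor, store):
    """Poll a sensor, adapting the polling interval to what the signal does.

    read_sensor and store are placeholders for whatever pervasive sensing
    hardware and storage a programme actually uses.
    """
    import time
    recent = deque(maxlen=WINDOW)
    while True:
        value = read_sensor()
        recent.append(value)
        store(value)
        time.sleep(next_interval(recent))
```

The point of the sketch is the feedback loop: the system decides how closely to observe based on what it has just observed, rather than recording everything at maximum resolution all the time.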
This architecture for Linked Environments for Atmospheric Discovery is outlined in slide 19 of his presentation.

In his discussion of Kelvin's presentation, and of the issue of data volume, Todd suggests:

One can certainly maintain the highest grain data if in retrospect it was an extraordinary discovery or event. However, if fine-grain detail was collected and nothing of consequence occurred, does that fine-grain detail need to be preserved? Probably not, without some other specific reason to do so. Obviously, this is a simplification, since you will want to retain some version of the data collected for re-analysis, but the raw data and the resolution of that data need not be preserved on an ongoing basis.

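A minimal sketch of the retention policy Todd describes might keep full-resolution data only in windows around flagged events and store a coarse summary of everything else. The event predicate, window size, and summary statistics below are illustrative assumptions, not anything Todd specifies:

```python
import statistics

def curate(samples, is_event, window=50):
    """Split a full-resolution recording into raw segments worth keeping
    and a summary of the rest.

    samples  -- the full-resolution recording (a list of numbers)
    is_event -- predicate flagging (index, sample) pairs of consequence;
                what counts as "of consequence" is the hard, domain question
    window   -- how many samples either side of an event to keep raw
    """
    keep = set()
    for i, s in enumerate(samples):
        if is_event(i, s):
            keep.update(range(max(0, i - window),
                              min(len(samples), i + window + 1)))

    raw = [(i, samples[i]) for i in sorted(keep)]
    discarded = [s for i, s in enumerate(samples) if i not in keep]
    summary = {
        "n": len(discarded),
        "mean": statistics.fmean(discarded) if discarded else None,
        "stdev": statistics.stdev(discarded) if len(discarded) > 1 else None,
    }
    return raw, summary
```

As Todd notes, the summary preserves something for re-analysis even where the raw resolution is discarded; the real curation decision is hidden inside is_event.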
I do think these issues matter for sport as growing volumes of pervasive sensing data are acquired. I see significant parallels between the prospective study of injury risk and Kelvin's discussion of local weather variation.
There are important issues related to curation too. I see Todd’s post as an excellent introduction to the granularity of data and the decisions we make about the costs and benefits of collecting and storing data. A recent paper (Balli and Korukoğlu, 2012) raises an interesting question about how early these data can be collected for talent identification purposes.
Photo Credit: The British Coach Giving a Few Weight Lifting Hints
