Seventeen years ago, Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth wrote:
Across a wide variety of fields, data are being collected and accumulated at a dramatic pace. There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).
I revisited their article in AI Magazine this week after several finds prompted me to think about the visual analytic turn in sport.
The first visualisation that grabbed my attention was an English Premier League fixture strength table prepared by Neil Kellie (shared with me by Julian Zipparo). Neil used Tableau Public for his visualisation.
Neil built his table by combining a static star rating with a form rating to give a score for each fixture. Because form changes from week to week, the table becomes dynamic as the season progresses. It has prompted me to think about how we weight the previous year’s ranking in a model.
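Neil’s formula is not given here, so the sketch below is only one way such a fixture score might be assembled. It assumes a star rating from 1 to 5 fixed before the season, and a form rating derived from points per game over a team’s last five matches and mapped onto the same 1 to 5 scale. The weights, the five-match window and the names `Team`, `form_rating` and `fixture_difficulty` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: how much the static pre-season star rating counts
# versus current form. These are illustrative values, not Neil Kellie's.
STAR_WEIGHT = 0.6
FORM_WEIGHT = 0.4


@dataclass
class Team:
    name: str
    stars: float              # static rating, assumed 1 (weak) to 5 (strong)
    recent_points: list[int]  # points per match: 3 win, 1 draw, 0 loss


def form_rating(team: Team, window: int = 5) -> float:
    """Map points per game over the last `window` matches onto the 1-5 star scale."""
    recent = team.recent_points[-window:]
    if not recent:
        return team.stars          # no matches played yet: fall back to the static rating
    ppg = sum(recent) / len(recent)  # between 0.0 and 3.0
    return 1 + 4 * ppg / 3           # between 1.0 and 5.0


def fixture_difficulty(opponent: Team) -> float:
    """Blend an opponent's static and form ratings into one fixture score."""
    return STAR_WEIGHT * opponent.stars + FORM_WEIGHT * form_rating(opponent)


if __name__ == "__main__":
    slumping = Team("Slumping", stars=4, recent_points=[0, 0, 0])
    print(round(fixture_difficulty(slumping), 2))  # 0.6 * 4 + 0.4 * 1
```

In this sketch `STAR_WEIGHT` is where the question of last season’s ranking shows up: a high value leans on pre-season reputation, while a low value lets current form dominate as results accumulate.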
The Economist added its weight to the Fantasy Football discussions with its post on 16 August. The post uses topological data-analysis software provided by Ayasdi to visualise Opta data on the different attributes of players. In an experimental interactive chart:
the data is divided into overlapping groups. These groups contain clusters of data—in this case footballers with similar attributes—which are visualised as nodes. Because the groups overlap, footballers can appear in more than one node; when they do, a branch is drawn between the nodes. Some nodes have multiple connections, whereas others have few or none.
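The passage describes, in outline, the kind of construction (often called Mapper) that underlies this style of topological data analysis. A toy version might look like the sketch below. It is not Ayasdi’s software. The players and their two attributes are invented, the “lens” that orders players is simply the first attribute, and clustering inside each overlapping group is plain single-linkage with a hypothetical distance threshold `eps`. All function names are illustrative.

```python
import math
from itertools import combinations


def overlapping_intervals(lo, hi, n, overlap):
    """Split [lo, hi] into n equal intervals, each widened on both sides by `overlap` x width."""
    width = (hi - lo) / n
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad) for i in range(n)]


def single_linkage(points, names, eps):
    """Group names whose points chain together within distance eps (connected components)."""
    unvisited = set(names)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {m for m in unvisited if math.dist(points[current], points[m]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are clusters of players; an edge joins nodes sharing a player."""
    values = {name: lens(p) for name, p in points.items()}
    lo, hi = min(values.values()), max(values.values())
    nodes = []
    for a, b in overlapping_intervals(lo, hi, n_intervals, overlap):
        members = [name for name, v in values.items() if a <= v <= b]
        for cluster in single_linkage(points, members, eps):
            if cluster not in nodes:  # simplification: identical clusters become one node
                nodes.append(cluster)
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2) if nodes[i] & nodes[j]]
    return nodes, edges


if __name__ == "__main__":
    # Invented players with two made-up attributes each.
    players = {"A": (0.0, 0.0), "B": (0.5, 0.0), "C": (1.0, 0.0), "D": (1.5, 0.0),
               "E": (2.0, 0.0), "F": (3.0, 0.0), "G": (0.5, 5.0)}
    nodes, edges = mapper_graph(players, lens=lambda p: p[0])
    for i, j in edges:
        print(sorted(nodes[i]), "--", sorted(nodes[j]))
```

With this toy data, players C and E each fall into two overlapping groups, so they link three nodes into a chain, while player G shares a lens value with B but differs sharply on the second attribute and ends up as a node with no branches, much as the quoted passage describes.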
There is a 2m 32s introduction to the Ayasdi Viewer on YouTube. Lum et al. (2013) illustrate their discussion of topology with an analysis of player roles in the NBA. Their insights received considerable publicity earlier this year (“this topological network suggests a much finer stratification of players into thirteen positions rather than the traditional division into five positions”).
Back at Tableau Public, I found news of a Fanalytics seminar. One of the presenters at the seminar is Adam McCann. Adam’s most recent blog post is a comparison of radar and parallel coordinate charts. Adam led me to a keynote address by Noah Iliinsky: Four Pillars of Data Visualization (46m YouTube video). Noah works in IBM’s Center for Advanced Visualization.
This snowball sample underscores for me just how many remarkable people are in the visualisation space. I am interested to learn that a number of these people are using Tableau Public … to share sport data.
In other links this week, Satyam Mukherjee shared his visualisation of batting partnerships in the first Ashes Test of 2013.
Simon Gleave’s post “26 Predictions: English Premier League forecasting laid bare” reminded me of the discussions that followed Nate Silver’s analysis of the 2012 US presidential election. I enjoyed Simon’s juxtaposition of 26 pre-season Premier League predictions, “13 which are at least partially model based, and 13 from the media. The models select Manchester City as title favourites but the journalists favour Chelsea”. Simon’s post introduced me to James Grayson and his reflections on predicting performance. I think Simon and James take a very impressive approach to data.
This week’s links have left me thinking about an idea I had back in 2005. I wondered at that time if I could become skilful enough to combine the insights offered by Edward Tufte and Usama Fayyad. More recently, I have been wondering if I could do that with the virtuosity that pervades Snow Fall.