Commonwealth Games 2018 Netball Tournament (#GC2018Netball)

There were 38 games played in the 2018 Commonwealth Games Netball Tournament. Only one of them, Scotland v Barbados, required extra time to determine the winner.

A box plot (using BoxPlotR) of the 37 games played in regular time is:

Centre lines show the medians; box limits indicate the 25th and 75th percentiles as determined by R software; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots; width of the boxes is proportional to the square root of the sample size. n = 37 sample points. Winning teams are shown in light green, losing teams in light blue.
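For readers who want to see how the elements in that caption are derived, here is a minimal sketch in Python. It assumes linear-interpolation percentiles (the rule R's quantile() uses by default); BoxPlotR's exact percentile rule may differ slightly. The function names and the scores are made up for illustration and are not tournament data.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    position = (len(sorted_values) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def box_plot_summary(values):
    """Return the median, box limits, whisker ends and outliers for one box."""
    data = sorted(values)
    q1, median, q3 = (quantile(data, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical scores for illustration only; NOT data from the tournament.
example_scores = [38, 45, 47, 50, 52, 55, 58, 60, 63, 98]
summary = box_plot_summary(example_scores)
print(summary)
```

In this made-up sample the score of 98 lies beyond the upper fence, so it would be drawn as a dot and the upper whisker would stop at 63.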

Overall, the median profiles and ranges in the 37 games were:

Photo Credit

Scotland v Barbados (Aaron Hurle, Twitter)

Using Flourish

Yesterday, I saw this tweet from Mara Averick.

I thought I would investigate and try out the Flourish visualisation platform with some data from the English Premier League.

I have looked at the momentum of each of the nine teams that have changed their manager during the 2017-2018 season. I use a very basic measure of momentum based upon results (1 for a win, 0 for a draw, -1 for a defeat).
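One way to turn those result scores into a momentum line is a running total, sketched below in Python. The running total is an assumption made for illustration (the scoring of 1, 0 and -1 is the only part taken from the text above), and the sequence of results is hypothetical rather than any team's actual record.

```python
from itertools import accumulate

# Scoring from the post: 1 for a win, 0 for a draw, -1 for a defeat.
POINTS = {"W": 1, "D": 0, "L": -1}


def momentum(results):
    """Running total of result scores, one value per match played."""
    return list(accumulate(POINTS[result] for result in results))


# Hypothetical run of results for illustration only.
example_results = ["W", "L", "L", "D", "W", "W"]
trend = momentum(example_results)
print(trend)
```

Plotting a list like this against match number for each team gives a line that rises with wins, stays flat with draws and falls with defeats.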

The visualisation can be found at:

Mara very kindly made a gif to overcome some issues in sharing the visualisation address on Twitter.

Basketball: archives and insights

On 19 December 2017, Google Cloud announced that it had become the official cloud partner for the National Collegiate Athletic Association (NCAA).

In the announcement, it was reported that:

the NCAA is migrating 80+ years of historical and play-by-play data, from 90 championships and 24 sports, to Google Cloud Platform

One of the first activities planned was to explore basketball data in preparation for the NCAA’s Women’s and Men’s Division I Basketball Tournaments held in March and April 2018 (March Madness).

More information about the partnership appeared in two posts on 30 March 2018. In the first post, Courtney Blacker reported a months-long experiment “to apply Google’s technologies to the NCAA’s treasure trove of data”.

We assembled a team of technicians, data scientists, and basketball enthusiasts (we call them ‘The Wolfpack’) who built a data processing workflow using Google Cloud Platform technologies like BigQuery and Cloud Datalab.

The aim of this approach was “to build models that look at influential factors on team performance”. During the tournaments, the Google Cloud team planned to “use our workflow to analyze our observations from the first half of each game against NCAA historical data to hone in on a stat-based prediction for the second half that we think is highly probable”. These predictions would be presented as a television advert during the half time break.

An example from the Kansas v Villanova semi-final game:

The video suggested there would be at least 26 assists in the second half (there were 28) and 55 shot attempts (there were 64). In the second semi-final, Michigan v Loyola-Chicago, the predictions were for 37 three-point attempts (there were 38) and 29 rebounds (there were 29).
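The gap between each semi-final prediction and the outcome can be laid out with a few lines of Python. This is only a sketch using the four figures quoted above; the variable and function names are my own.

```python
# (game, statistic, predicted, actual) as quoted in the text above.
semi_final_predictions = [
    ("Kansas v Villanova", "assists", 26, 28),
    ("Kansas v Villanova", "shot attempts", 55, 64),
    ("Michigan v Loyola-Chicago", "three-point attempts", 37, 38),
    ("Michigan v Loyola-Chicago", "rebounds", 29, 29),
]


def differences(rows):
    """Return actual minus predicted for each row, in the order given."""
    return [actual - predicted for _, _, predicted, actual in rows]


gaps = differences(semi_final_predictions)
for (game, stat, predicted, actual), gap in zip(semi_final_predictions, gaps):
    print(f"{game}: {stat} predicted {predicted}, actual {actual}, difference {gap:+d}")
```

In all four cases the actual figure matched or exceeded the prediction, with shot attempts in the first semi-final the furthest from the predicted value.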

The final had these suggestions:

The second post about the Google Cloud and NCAA partnership, written by Eric Schmidt and Allen Jarvis, provided a detailed account of the architecture that supported the data analysis. This illustrated “the importance of proper tooling to enable collaboration across multiple disciplines, including data engineering, data analysis, data science, quantitative analysis, and machine learning”.

The architecture for this service requires:

  1. A flexible and scalable data processing workflow to support collaborative data analysis.
  2. New analytic explorations through collaboratively developed queries and visualizations.
  3. Real-time predictive insights and analysis related to the games, modeled around NCAA men’s and women’s basketball.

Eric and Allen go through each of these points at length. Their account indicates what is becoming available to sport as we explore archives for insights.

They have an important message in their conclusion:

… better data preparation means better data analysis. Many organizations imagine diving in directly to predictive modeling without a critical examination of their data or existing analytic frameworks. If the greatest value is to be found in predictive insights, followed by analysis, supported by clean but raw data, you can imagine the amount of work required to get there as the inverse: a lot of data preparation that paves the way for better analysis, which in turn clears a path for good modeling.

The ball is in all our courts.

Photo Credits

March Madness 2009 (Andy Thrasher, public domain)

Gators are in the Final Four (Courtland, CC BY-NC-ND 2.0)