Six Weeks?

Last year, Ben Cronin wrote about the importance of the first six games of a football league season (link) in the context of positions at the end of the season.

Ben’s analysis of the English Premier League noted:

The winner of Premier League has been outside the top four after six games on only three occasions since the 38 game season was introduced in 1995/96 – (Manchester United moved from 10th to 1st in 2002/03, Manchester City won the league despite being 7th after six games in 2013/14 and Chelsea moved from 8th to 1st in 2016/17).

I followed up on Ben’s work with a look at six European leagues for the 2018-2019 seasons. Teams’ points per game averages after six games (PPG6) were:

After the latest rounds in all of the leagues, the average points per game (PPGn) are:

I looked at teams’ current performance relative to their week 6 positions (Change6n):
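
A minimal sketch of how these three measures could be computed. The team names and numbers are made up for illustration, and I am assuming here that Change6n is the number of table places gained or lost since week 6 (positive means a team has climbed). The names Snapshot, points_per_game, positions and position_change are hypothetical, and ties are broken by input order rather than by goal difference.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """A team's record at one point in the season (illustrative only)."""
    team: str
    played: int
    points: int


def points_per_game(s: Snapshot) -> float:
    """Average points per game: total points divided by games played."""
    return s.points / s.played


def positions(table: list[Snapshot]) -> dict[str, int]:
    """Rank teams by points per game, best first (1 = top).

    Assumption: ties keep their input order; goal difference is ignored.
    """
    ordered = sorted(table, key=points_per_game, reverse=True)
    return {s.team: rank for rank, s in enumerate(ordered, start=1)}


def position_change(week6: list[Snapshot], latest: list[Snapshot]) -> dict[str, int]:
    """Places gained (+) or lost (-) between week 6 and the latest round."""
    then, now = positions(week6), positions(latest)
    return {team: then[team] - now[team] for team in then}


# Made-up data for three hypothetical teams.
week6 = [Snapshot("A", 6, 15), Snapshot("B", 6, 12), Snapshot("C", 6, 9)]
latest = [Snapshot("A", 20, 40), Snapshot("B", 20, 45), Snapshot("C", 20, 20)]

for s in week6:
    print(s.team, "PPG6:", points_per_game(s))
for s in latest:
    print(s.team, "PPGn:", points_per_game(s))
print("Change6n:", position_change(week6, latest))
```

Using points per game rather than raw points keeps the comparison fair when teams in a league have played different numbers of games.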


In the 2017-2018 seasons in the six European leagues, four were won by the team leading their respective tables at week 6. The two exceptions were the Bundesliga and Serie A. In the Bundesliga, Bayern were third in week 6; the leaders then, Dortmund, finished 4th (29 points behind Bayern). In Serie A, Juventus were second on goal difference to Napoli in week 6; by the end of the season Napoli had moved to 2nd and were 4 points adrift of Juventus.

Data Fragments

The Centre Pompidou in Paris. People crossing the street with data in the picture.

I have managed to read three of the tantalising feeds I received yesterday.

The first was by Prateek Karkare on Decision Trees (link). I found his intuitive introduction very helpful. He started with some binary decision examples then moved on to classification, regression, and learning.

The second was Scott Berinato’s Data Science and the Art of Persuasion (link). In it, Scott observes that organisations:

still expect data scientists to wrangle data, analyze it in the context of knowing the business and its strategy, make charts, and present them to a lay audience. That’s unreasonable.

He proposes “rethinking how data science teams are put together, how they’re managed, and who’s involved at every point in the process, from the first data stream to the final chart shown”.

Scott explores a last mile problem that has existed for a century (“As the cathedral is to its foundation so is an effective presentation of facts to the data”) (link). Scott concludes that a better data science operation environment needs:

  • Define talents rather than team members (management, wrangling, analysis, domain expertise, design, storytelling)
  • Create a portfolio of talents
  • Share experiences and insights
  • Structure projects around talents

With this approach in place:

  • Assign a single, empowered stakeholder
  • Assign leading talent and support talent
  • Co-locate
  • Reuse and template

The third read was Susan Grajek’s The Student Genome Project (link). In her introduction, she observes:

In 2019, after a decade of preparing, colleges and universities stand on a threshold, eager to enter a new era of using technology to unlock our ability to apply data to advancing our missions. That threshold is similar to the one that science faced in the late 20th century: eager to begin using technology to put genetic information to use.

I thought this would resonate powerfully with sport contexts too. Note Susan’s point “We have a growing belief in the value and power of data to understand root causes and improve advice, decisions, and outcomes”.

This resonated very powerfully with me:

our sector faces a daunting preliminary task: we must understand the component parts (find the data, clean it, standardize it, safeguard it); integrate and manage those parts; and find the right tools for these tasks. Just as the big challenge facing genetics in the 1990s was foundational, so is the big challenge that confronts higher education and technology today. After almost a decade of attention and effort, we find ourselves still at the beginning of the data journey—needing to, in effect, “sequence” the data before we can apply it with any reliability or precision.

They are three data fragments but together they have provided me with another delightful day of exploration. I note them here as part of my learning portfolio.

Photo Credit

Photo by Curtis MacNewton on Unsplash

#AFLW 2019

The 2019 AFLW season starts on Saturday with the opening game between Geelong and Collingwood (link to fixtures).

I have some data from last year’s regular season (link) curated as secondary data from the official AFLW web site (link).

Median Profiles

A Violin Plot created with BoxPlotR (link). (W1Q is the winning team, L1Q is the losing team).

Plot information

These data have given me an opportunity to postulate some naive priors about when points will be scored in the 2019 season. The probabilities per quarter are based upon game outcome so that the labels ‘winning’ and ‘losing’ relate to the game not the quarter.
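
One way such naive priors could be derived is sketched below. This is an illustration under my own assumptions, not the method used for the plot: I treat the prior for a quarter as that quarter's share of all points scored by teams with the same game outcome. The function name quarter_priors and the scores are hypothetical, not AFLW data.

```python
def quarter_priors(games):
    """Share of points scored in each quarter, split by game outcome.

    games: list of (winner_quarters, loser_quarters) pairs, each a list of
    four per-quarter point totals. 'winning' and 'losing' refer to the
    result of the game, not of the individual quarter.
    """
    totals = {"winning": [0, 0, 0, 0], "losing": [0, 0, 0, 0]}
    for winner_quarters, loser_quarters in games:
        for q in range(4):
            totals["winning"][q] += winner_quarters[q]
            totals["losing"][q] += loser_quarters[q]

    priors = {}
    for label, per_quarter in totals.items():
        total = sum(per_quarter)
        # Assumption: with no points at all, fall back to a uniform prior.
        priors[label] = (
            [points / total for points in per_quarter] if total else [0.25] * 4
        )
    return priors


# Made-up scores for two hypothetical games (not real AFLW results).
example_games = [
    ([12, 4, 14, 10], [6, 4, 3, 7]),
    ([8, 6, 16, 10], [4, 6, 7, 3]),
]

for label, shares in quarter_priors(example_games).items():
    print(label, shares)
```

Because the shares are grouped by game outcome, a losing team's share can still be largest in a quarter that it won.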