Scoring first in football project: six European leagues

I have been monitoring scoring first in association football.

In the 2017-2018 season I used the Worldfootball.net website to record goal scoring timings in six European leagues: EPL, Ligue 1, Bundesliga, Serie A, Eredivisie and Primera.

I looked at the following events in games played in these leagues:

  • score first win (SFW)
  • score first draw (SFD)
  • score first lose (SFL)
  • 0 v 0 game (0goals)

I wondered if these measures could inform a naive Bayes approach to probabilistic behaviours in association football. I am thinking that for the 2018-2019 season I will use these prior probabilities to monitor teams’ progress.

For the six leagues combined, I have these median probabilities:

  • SFW 0.64
  • SFD 0.19
  • SFL 0.12
  • 0goals 0.07

I also have a measure for scoring first and not losing (SFNL), which is the sum of SFW and SFD. My median SFNL for the six leagues is 0.82.
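As a minimal sketch of how these four events and SFNL could be counted from match records: the record layout (first scorer, home goals, away goals) and the function names are my assumptions for illustration, and the sample games are made up. It computes simple proportions for one set of games, not the medians reported here.

```python
from collections import Counter


def classify(first_scorer, home_goals, away_goals):
    """Label one game as SFW, SFD, SFL or 0goals.

    first_scorer is "home", "away", or None for a goalless game
    (a hypothetical record layout, not the Worldfootball.net format).
    """
    if first_scorer is None:
        return "0goals"
    if home_goals == away_goals:
        return "SFD"
    winner = "home" if home_goals > away_goals else "away"
    return "SFW" if winner == first_scorer else "SFL"


def outcome_profile(games):
    """Return the share of games in each category, plus SFNL = SFW + SFD."""
    if not games:
        raise ValueError("need at least one game")
    counts = Counter(classify(*game) for game in games)
    n = len(games)
    profile = {key: counts[key] / n for key in ("SFW", "SFD", "SFL", "0goals")}
    profile["SFNL"] = profile["SFW"] + profile["SFD"]
    return profile


# Made-up records, for illustration only.
games = [
    ("home", 2, 1),  # home scored first and won: SFW
    ("away", 1, 1),  # away scored first, game drawn: SFD
    ("home", 1, 2),  # home scored first and lost: SFL
    (None, 0, 0),    # goalless game: 0goals
    ("away", 0, 3),  # away scored first and won: SFW
]
profile = outcome_profile(games)
```

Running the same function per league, or per week within a league, would give the observations from which a median could then be taken.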

My median probability profiles for each of the six leagues are:

League      SFW   SFD   SFL   0goals  SFNL
EPL         0.63  0.18  0.11  0.08    0.81
Ligue 1     0.64  0.19  0.12  0.05    0.83
Bundesliga  0.62  0.20  0.11  0.07    0.82
Serie A     0.65  0.15  0.13  0.07    0.80
Eredivisie  0.58  0.20  0.18  0.05    0.78
Primera     0.69  0.15  0.08  0.08    0.84

For the champions of each league, my probability profiles are:

Team        SFW   SFD   SFL   0goals  SFNL
Man City    0.76  0.03  0.05  0.05    0.79
PSG         0.68  0.03  0.00  0.05    0.72
Bayern      0.62  0.06  0.03  0.03    0.68
Juventus    0.68  0.05  0.03  0.08    0.73
PSV         0.59  0.09  0.00  0.03    0.68
Barcelona   0.63  0.08  0.00  0.03    0.71

For the bottom team in each of these leagues, my probability profiles are:

Team        SFW   SFD   SFL   0goals  SFNL
W Brom      0.16  0.18  0.11  0.11    0.34
FC Metz     0.11  0.13  0.08  0.03    0.24
FC Koln     0.12  0.09  0.15  0.06    0.21
Benevento   0.11  0.03  0.13  0.00    0.14
FC Twente   0.18  0.09  0.18  0.06    0.27
Malaga      0.13  0.03  0.08  0.08    0.16

I am hopeful that these data might be of interest to anyone undertaking a Bayes approach to goal scoring.

In my own work, I am keen to see how early we can confirm the likely trajectory of any team in each of these six leagues.
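One way this kind of monitoring might work is a Beta-Binomial update, sketched below. This is my illustration rather than a method described in the post: the function name, the prior strength of 10 pseudo-games and the team's early-season record are all assumptions. Treating the six-league SFNL median of 0.82 as a prior mean is also a simplification, since that figure is a share of all games, including goalless ones.

```python
def update_sfnl(prior_mean, prior_strength, not_lost, scored_first):
    """Beta-Binomial update for a team's rate of not losing after scoring first.

    prior_mean     -- prior belief about the rate (0.82 is used below)
    prior_strength -- how many pseudo-games the prior is worth (an assumption)
    not_lost       -- games in which the team scored first and did not lose
    scored_first   -- games in which the team scored first
    """
    if not 0 <= not_lost <= scored_first:
        raise ValueError("not_lost must be between 0 and scored_first")
    alpha = prior_mean * prior_strength + not_lost
    beta = (1 - prior_mean) * prior_strength + (scored_first - not_lost)
    posterior_mean = alpha / (alpha + beta)
    return alpha, beta, posterior_mean


# Hypothetical team: scored first in 6 games so far and avoided defeat in 3.
alpha, beta, posterior_mean = update_sfnl(0.82, 10, not_lost=3, scored_first=6)
# posterior_mean is (8.2 + 3) / (10 + 6) = 0.70
```

A weaker prior (a smaller prior_strength) lets a team's own results move the estimate sooner, which bears directly on how early a trajectory can be called.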

My next post in this series will share data from the last two FIFA World Cups.

Photo Credit

Supporters FCN (Manuel, CC BY-SA 2.0)

Six European Football Leagues Going into Christmas 2017

I have been following scoring patterns in six European football leagues (EPL, Ligue 1, Bundesliga, Serie A, Eredivisie and Primera) in the 2017-2018 season.

I have a particular interest in the outcome of scoring first and not losing in games in these leagues.

Prior to midweek games on 13 December 2017, the range of my data (n=875 games) thus far is:

In Ligue 1, the percentage of games in which the team that scored first did not lose has ranged between 86% and 89%. In Serie A, the range is 72% to 80%. The other four leagues fall between these two.

My BoxplotR visualisation of nine observations for these leagues is:

The box plot statistics are:

The EPL is about to enter an intense fixture period and I will be interested to observe any changes in pattern.

A separate project is to examine the games in which the team that scores first has lost (n=96 of the 875 games played). The Eredivisie has the largest number of these games (n=23 out of 134 games) and the Primera the smallest number (n=12 out of 150 games).
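The counts above can be turned into rates, with an interval to show how much uncertainty the sample sizes leave. This is a sketch only: the Wilson score interval is my choice of method and the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(count, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = count / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Score-first-and-lose counts quoted in the post: (games lost, games played).
sfl_counts = {
    "All six leagues": (96, 875),
    "Eredivisie": (23, 134),
    "Primera": (12, 150),
}

rates = {name: round(c / n, 3) for name, (c, n) in sfl_counts.items()}
intervals = {name: wilson_interval(c, n) for name, (c, n) in sfl_counts.items()}
# rates: All six leagues 0.11, Eredivisie 0.172, Primera 0.08
```

On these counts the Eredivisie rate is roughly twice the Primera rate, though with 134 and 150 games the intervals are wide.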

Photo Credit

Marco Verratti (PSG Officiel, Twitter)

Fireworks (AjaxDaily, Twitter)

Skill and chance in football goal scoring

These data appeared in my inbox this morning.

Their arrival took me back to Charles Reep and Bernard Benjamin’s observation fifty years ago:

an excess of shots by one team does not mean that, by chance, the other side will not get more goals and thus win the match (1968:585)

The team that managed 13 shots on goal was ranked 4th in Asia and 37th in the world by Elo rating (as of 4 September 2017). The team that had 1 shot on goal was ranked 12th in Asia and 108th in the world (as of 4 September 2017).

The game ended as a 2v1 win for the higher Elo-rated team.

Charles and Bernard concluded (shortly after the 1966 World Cup):

with rare exceptions (for example, the 1966 World Cup series) it takes 10 shots to score 1 goal. (1968:585)
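To put rough numbers on that ratio for a game like this one, here is a small binomial sketch. It assumes every shot is an independent trial with a 1 in 10 chance of a goal, and it applies the ratio to shots on goal, which may not be the shot definition Reep and Benjamin used. The function name is mine.

```python
from math import comb


def prob_at_least(goals, shots, p_goal=0.1):
    """Probability of scoring at least `goals` from `shots` independent shots."""
    return sum(
        comb(shots, k) * p_goal**k * (1 - p_goal) ** (shots - k)
        for k in range(goals, shots + 1)
    )


# Team with 13 shots on goal: chance of two or more goals.
p_two_from_thirteen = prob_at_least(2, 13)  # about 0.38
# Team with 1 shot on goal: chance of scoring from it.
p_one_from_one = prob_at_least(1, 1)  # 0.1
```

Under these assumptions, neither side's goal tally in this game would be especially surprising on its own.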

Prior to the 2v1 result, the higher Elo-rated team had scored 14 goals in 9 games; the lower Elo-rated team had scored 5 goals (and conceded 22) in 9 games.

The data come from a game played at home by the higher Elo-rated team.

I am hopeful that these kinds of data will change our language in training environments. We might stop talking about shooting practice and start conversations about goal scoring practice, rehearsal, scenarios and consequences.