I met Alan Ryder’s work when I started to compile an expected goals bibliography (link). His 2004 paper (link) is regarded as seminal in the study of shot quality. Its subtitle is “a methodology for the study of the quality of a hockey team’s shots allowed”.
In a subsequent paper (2007), Alan looked carefully at the data he used in his 2004 paper and discussed reliance on a data set and the potential biases involved. His discussion reminded me of the new look at data in the Charles Reep, Richard Pollard and Bernard Benjamin (1971) (link) paper on skill and chance. I think both papers underscore the importance of looking carefully at our data.
Alan has a website, Hockey Analytics (link), where he shares his work. On his About page, he observes:
While most people’s eyes glaze over when they look at numbers, statistics talk to me. I am a graduate of the University of Waterloo with a B. Math granted by the Department of Statistics. My degree is actually in Actuarial Science and Computer Science. This means that my statistical learning was more about modeling and probability than about statistical inference. These skills have been recently topped up while working for General Electric. GE has a quality program based on rigorous statistical inference and I have been trained as a “Black Belt” – an advanced quality specialist.
In his introduction to the 2004 paper, Alan notes that the paper “explores the measurement of ‘Shot Quality’. What do I mean by quality? If a shot is more ‘dangerous’, it is of higher quality. What is a ‘dangerous’ shot? It is a shot with a greater likelihood of becoming a goal”. He adds “a measure of shot quality would give us greater insight into the relative contribution of goaltending and defense”.
Alan used the NHL’s Real Time Scoring System to collect his data but noted “one has to accept that measurement error is present and address this fact when trying to use the data”. From these data Alan’s measure of shot quality “is the rate of conversion of shots into goals, or the probability of a goal under the studied circumstances. A higher goal probability means a higher shot quality”.
Alan said of his work “the real value of this analysis is in its ability to finely measure the quality of any given shot, to the best of our current ability. And if we can measure one shot, we can measure a whole game or a whole season for a given team”.
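As a rough sketch of the idea (not Alan’s actual model), shot quality can be estimated as the share of past shots in similar circumstances that became goals, and a game’s expected goals as the sum of those probabilities over its shots. The bucket labels (`close`, `far`), the function names and the toy numbers below are all invented for illustration.

```python
from collections import defaultdict


def conversion_rates(shots):
    """Return goals divided by shots for each bucket of circumstances."""
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [goals, shots]
    for bucket, is_goal in shots:
        tallies[bucket][0] += int(is_goal)
        tallies[bucket][1] += 1
    return {bucket: goals / count for bucket, (goals, count) in tallies.items()}


def expected_goals(shot_buckets, rates):
    """Add up the goal probability of every shot in a game or season."""
    return sum(rates[bucket] for bucket in shot_buckets)


# Invented history: 1 goal from 4 close shots, 1 goal from 10 far shots.
history = (
    [("close", True)] + [("close", False)] * 3
    + [("far", True)] + [("far", False)] * 9
)
rates = conversion_rates(history)

# A hypothetical game in which a team allows two close shots and one far shot.
game_total = expected_goals(["close", "close", "far"], rates)
```

The same sum works over a whole season, which is what makes a per-shot measure useful for comparing expected with actual goals.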
Alan concluded “the model to get to expected goals given the shot quality factors is simply based on the data. There are no meaningful assumptions made. The analytic methods are the classics from statistics and actuarial science”.
In 2007, Alan issued a Product Recall Notice for the 2004 paper (link). In it he discussed “data quality problems with the measurement of the quality of a hockey team’s shots taken and allowed”. Alan was concerned about the data source for his 2004 paper (the NHL’s Real Time Scoring System) and observed “I have been worried that there is a systemic bias in the data”. He added “I do think that it is a serious possibility that the scoring in certain rinks has a bias towards longer or shorter shots, the most dominant factor in a shot quality model”.
Alan observes that there are two standard uses of shot quality models:
- To balance our view of the relative impact of defense and goaltending.
- To assess expected versus actual offense.
Alan concluded that “we clearly have an issue of RTSS scorer bias. The problem appears to be brutal at Madison Square Garden but is clearly non-trivial elsewhere. The NHL needs a serious look at the consistency of this process” (my emphasis).
In the final paragraph of the 2007 paper, Alan writes “It’s not really a recall. Shot quality is not broken. Just don’t use it without understanding it. Use the road factors! Shot quality is a very powerful tool. Like any tool the user needs to understand (a) how to use it and (b) its limitations. My method for assessing shot quality relies on the underlying data”.
In addition to Alan’s work, Ken Krzywicki made an important contribution to the discussion of shot quality. For the 2005-2006 NHL regular season, Ken shared a logistic regression of his shot quality model (link). Ken used five predictor variables to analyse his data: distance; rebound; situation; shot after opponent turnover; and shot type. Each shot on goal was assigned a predicted probability of going in.
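To show how a logistic regression turns five predictors like these into a goal probability, here is a minimal sketch. The coefficients are invented and are not Ken’s fitted values, and I am assuming that “situation” means the manpower situation, reduced here to a power-play flag.

```python
import math

# Hypothetical coefficients, invented for illustration only.
COEF = {
    "intercept": -1.5,
    "distance_ft": -0.04,   # longer shots are less likely to score
    "rebound": 0.8,
    "power_play": 0.3,      # assumed meaning of "situation"
    "after_turnover": 0.2,
}
SHOT_TYPE = {"wrist": 0.0, "slap": -0.1, "backhand": -0.2, "tip": 0.3}


def goal_probability(distance_ft, rebound, power_play, after_turnover, shot_type):
    """Apply the logistic function to a weighted sum of the predictors."""
    z = (
        COEF["intercept"]
        + COEF["distance_ft"] * distance_ft
        + COEF["rebound"] * int(rebound)
        + COEF["power_play"] * int(power_play)
        + COEF["after_turnover"] * int(after_turnover)
        + SHOT_TYPE[shot_type]
    )
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical shots: a close rebound and a long slap shot.
close_rebound = goal_probability(10, True, False, False, "wrist")
long_slap = goal_probability(55, False, False, False, "slap")
```

In a real model the coefficients would be estimated from play-by-play data; the point here is only that each shot gets its own probability between 0 and 1.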
In 2010, Ken wrote about shot quality in the NHL (link). His methodology included “a 70% random sample of the 2009-10 NHL play-by-play (PBP) files from the RTSS scoring system and (x,y) coordinates obtained from the ESPN website, a logistic regression was run”.
Ken concluded “with an additional data source we were able to create additional variables for consideration when building a shot quality model, namely: shot angle and rebound push direction and degrees”.
His paper ended with this observation:
With a well-fit, robust model we were able to calculate predicted shooting and save percentages for shots on goal during the 2005-06 regular season. That is, we were able to assign shot quality, both for and against, to each shot. This data, examined at the skater, team and goalie level allowed us to make certain observations and inferences about how well a team (or player) did relative to shot quality.
Alan and Ken were at the forefront of the discussion of shot quality. Their detailed look at ice hockey raised important questions about shot quality and expected goals. Alan’s revisiting of the data he used also raises questions for us about the veracity of data and the potential biases that may exist. This is an issue in basketball too, where researchers have investigated bias in box scores (link).
Alan Ryder (Twitter)
Shot (SB Nation)
Basketball (Marcus Woo, Inside Science)