Bayesian additive regression trees

Asmi Toumi and Michael Lopez (2019) (link) have been investigating ice hockey. They presented their work at the New England Symposium on Statistics in Sport (link).

In their introduction Asmi and Michael note they use “two matching methods – propensity score matching and Bayesian additive regression trees” to “leverage player-tracking data to estimate the causal benefits due to zone-entry decisions”. Asmi and Michael note that both approaches “better account for the variables that affect entry choice”.
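To make the idea of matching concrete for myself, here is a minimal sketch of propensity score matching. It is not Asmi and Michael's code: the function names, the caliper value and the toy numbers are all my own assumptions for illustration. It assumes each entry already has an estimated propensity score (the probability of a carry-in given the confounders) and pairs each carry-in with the closest unmatched dump-in.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, control: lists of (unit_id, propensity_score, outcome).
    A pair is kept only if the scores differ by at most `caliper`.
    """
    available = list(control)
    pairs = []
    for t in treated:
        if not available:
            break
        # Closest remaining control unit on the propensity score
        best = min(available, key=lambda c: abs(c[1] - t[1]))
        if abs(best[1] - t[1]) <= caliper:
            pairs.append((t, best))
            available.remove(best)
    return pairs


def mean_outcome_difference(pairs):
    """Average treated-minus-control outcome across matched pairs."""
    return sum(t[2] - c[2] for t, c in pairs) / len(pairs)


# Hypothetical toy data: (id, propensity score, outcome). Not from the paper.
carry_ins = [("c1", 0.62, 1.0), ("c2", 0.35, 0.0), ("c3", 0.80, 1.0)]
dump_ins = [("d1", 0.60, 0.0), ("d2", 0.33, 0.0), ("d3", 0.10, 1.0)]

pairs = match_on_propensity(carry_ins, dump_ins)
effect = mean_outcome_difference(pairs)
print(len(pairs), effect)
```

In this toy example the third carry-in has no dump-in within the caliper, so it is left unmatched. A real analysis would estimate the scores from the tracking data and check covariate balance after matching.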

I was particularly interested in Asmi and Michael’s reference to Bayesian additive regression trees. I have been thinking about priors and posteriors for some time and I was intrigued as to how Asmi and Michael dealt with this in ice hockey. They refer to a 2012 paper by Theodore Hill and Marco Dall’Aglio (link) and their consideration of Bayesian posteriors.

The 2012 paper led me to Theodore’s 2011 discussion of conflation (link). His study of conflation was “motivated by a basic problem in science, namely, how best to consolidate the information from several independent experiments, all designed to measure the same unknown quantity”. It also led to Marco and Theodore’s 2018 paper on Bayesian posteriors (link).
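As a way of checking my own understanding of conflation, here is a small sketch for the special case where each experiment reports a normal distribution. In that case the conflation (the normalised product of the densities) is again normal, with a precision-weighted mean. The function name and the example numbers are my own assumptions, not taken from Theodore's paper.

```python
import math


def conflate_normals(estimates):
    """Conflate independent normal estimates of the same quantity.

    estimates: list of (mean, standard_deviation) pairs.
    Returns the (mean, standard_deviation) of the conflated normal.
    """
    # Precision is the inverse variance of each experiment
    precisions = [1.0 / (sd * sd) for _, sd in estimates]
    total_precision = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total_precision
    sd = math.sqrt(1.0 / total_precision)
    return mean, sd


# Two hypothetical experiments measuring the same unknown quantity
combined_mean, combined_sd = conflate_normals([(10.0, 1.0), (12.0, 2.0)])
print(combined_mean, combined_sd)
```

The more precise experiment pulls the combined mean towards its own value, and the combined standard deviation is smaller than either input.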

Asmi and Michael obtained zone-entry tracking data for the 2017-18 and 2018-19 NHL seasons. In total they identified 277,661 entries, of which 158,808 were carry-ins and 118,853 were dump-ins. Using Bayesian additive regression trees, they concluded that it is approximately 23% more beneficial for a forward to carry the puck into the zone.

Asmi and Michael discuss limitations in the conclusion of their study. They note: “To produce meaningful inferences on zone entry decision-making, we considered many confounders such as score differential, time left on the clock, entry player skill and entry player location. One confounder that we did not have access to is the location of all the other players on the ice, especially that of opposing players”.

I was delighted that Asmi and Michael shared their code from the paper on GitHub (link).

Hugh Chipman and his colleagues (2010) (link) provide a detailed account of Bayesian additive regression trees.

Men’s Ice Hockey

Jason Abrevaya and Robert McCulloch (2014) (link) discussed Bayesian additive regression trees in their analysis of penalty calls in the National Hockey League. Their paper helped me return from the detailed mathematical discussion in Hugh Chipman and colleagues’ paper. It also enabled me to think about de jure and de facto officiating.

I was also interested to discover that there is an R package, bartMachine, for Bayesian additive regression trees (link). Adam Kapelner and Justin Bleich (2014) (link) point out that the goal of bartMachine “is to provide a fast, extensive and user-friendly implementation accessible to a wide range of data analysts, and increase the visibility of BART to a broader statistical audience”.

These papers indicate how far I need to develop my thinking. I do hold some very basic views about priors and posteriors. These will be refined by my discovery of Bayesian additive regression trees and by thinking about how we might regard and report performance in sport. I will need to explore the bartMachine package and hope to find an instructive vignette about its use. Jason Abrevaya and Robert McCulloch’s paper has also encouraged me to revisit some of my work in officiating.

Photo Credit

Photo by Andy Hall on Unsplash

Face Off (Nottingham Trent University, CC BY-NC-ND 2.0)

