This post shares my chance encounter with work by Glenn Brier, Frank Harrell and David Spiegelhalter. It coincides with an email exchange I had with Tony Corke about how to share posterior outcomes in the context of prior statements of probability.
I saw a reference to a Brier score in David Glidden’s (2018) post on forecasting American football results (link) and followed up David’s reference to a Brier score in Wikipedia (link). This encouraged me to seek out some of Glenn Brier’s papers to find the origin of the score named after him.
An early paper was written in 1944 for the United States Weather Bureau. It was titled Verification of a forecaster’s confidence and the use of probability statements in weather forecasting (link). In the introduction to that paper, Glenn observed “one of the factors that has contributed to the difficulties and controversies of forecast verification is the failure to distinguish carefully between the scientific and practical objectives of forecasting” (1944:1). He proposed that:
- The value of forecasts can be enhanced by increased use of probability statements.
- The verification problem can be simplified if forecasts are stated in terms of probabilities.
He added “there is an inherent danger in any forecast if the user does not make use of (or is not provided with) the pertinent information regarding the reliability of the forecast” (1944:7). The sharing of information about the reliability of the forecast makes it possible to provide recommendations for action. Glenn concluded “the forecaster’s duty ends with providing accurate and unbiased estimates of the probabilities of different weather situations” (1944:10).
In a paper written in 1950, Verification of forecasts expressed in terms of probability, Glenn provided more detail about his work on probability statements and presented the details of his verification formula (link). He proposed “perfect forecasting is defined as correctly forecasting the event to occur with a probability of unity” with 100 percent confidence (1950:2).
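As a minimal sketch of that verification formula: in its original multi-category form, the score is the average, over all forecast occasions, of the squared differences between each forecast probability and what happened (1 for the class that occurred, 0 for the others). A perfect forecaster scores 0 and the worst possible score is 2. The function name and the example forecasts below are my own illustrations, not taken from Glenn's paper.

```python
def brier_score(forecasts, outcomes):
    """Multi-category Brier score (original 1950 form).

    forecasts: one list of class probabilities per occasion.
    outcomes:  index of the class that actually occurred on each occasion.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")

    total = 0.0
    for probs, observed in zip(forecasts, outcomes):
        for k, p in enumerate(probs):
            # The event indicator is 1 for the class that occurred, else 0.
            indicator = 1.0 if k == observed else 0.0
            total += (p - indicator) ** 2
    return total / len(forecasts)


# Hypothetical rain / no-rain forecasts for three days (illustrative only).
forecasts = [[0.7, 0.3], [0.2, 0.8], [0.9, 0.1]]
outcomes = [0, 1, 1]  # class 0 = rain, class 1 = no rain
print(brier_score(forecasts, outcomes))
```

Many modern descriptions of the binary case use only the probability of the event itself, which gives half the value of this two-class form and a range of 0 to 1.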
The 1950 paper raised the question of skill in forecasting. A decade later, Herbert Appleman (1959) initiated a discussion about how to quantify the skill of a forecaster (link). Glenn’s 1950 paper prompted Allan Murphy (1973), amongst others, to look closely at vector partitions in probability scores (link). Some time later, Tilmann Gneiting and Adrian Raftery (2007) considered scoring rules, prediction and estimation (link).
Frank Harrell fits into this kind of conversation in a thought-provoking way. Earlier this year he wrote to distinguish between classification and prediction (link). He proposes:
Whether engaging in credit risk scoring, weather forecasting, climate forecasting, marketing, diagnosis a patient’s disease, or estimating a patient’s prognosis, I do not want to use a classification method. I want risk estimates with credible intervals or confidence intervals. My opinion is that machine learning classifiers are best used in mechanistic high signal:noise ratio situations, and that probability models should be used in most other situations.
One of the key elements in choosing a method is having a sensitive accuracy scoring rule with the correct statistical properties. Experts in machine classification seldom have the background to understand this enormously important issue, and choosing an improper accuracy score such as proportion classified correctly will result in a bogus model.
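A small sketch may help show what is meant by a sensitive scoring rule. It compares proportion classified correctly with a binary Brier score for two hypothetical forecasters. The data, the 0.5 threshold and the function names are my assumptions for illustration and do not come from Frank's post.

```python
def proportion_correct(probs, outcomes, threshold=0.5):
    """Share of cases where the thresholded forecast matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(outcomes)


def binary_brier(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


# Hypothetical outcomes (1 = event occurred) and two sets of forecasts.
outcomes = [1, 1, 0, 1, 0]
cautious = [0.55, 0.60, 0.45, 0.55, 0.40]
confident = [0.95, 0.90, 0.05, 0.95, 0.10]

for name, probs in [("cautious", cautious), ("confident", confident)]:
    print(name, proportion_correct(probs, outcomes), binary_brier(probs, outcomes))
```

Both forecasters land on the right side of the threshold every time, so proportion classified correctly cannot separate them. The Brier score is lower (better) for the confident forecaster, whose probabilities sit closer to what actually happened.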
There is a reference in Frank’s paper to risk (“by not thinking probabilistically, machine learning advocates frequently utilize classifiers instead of using risk prediction models”) and a link to a David Spiegelhalter paper written in 1986, Probabilistic prediction in patient management and clinical trials (link). In that paper, David argued for “the provision of accurate and useful probabilistic assessments of future events” as a fundamental task for biostatisticians when collaborating in clinical or experimental medicine. Thirty-two years later, David is Winton Professor for the Public Understanding of Risk (link).
In 2011, David and his colleagues (Mike Pearson and Ian Short) discussed visualising uncertainty about the future (link). They describe probabilities as best treated “as reasonable betting odds constructed from available knowledge and information”. They identified three key concepts that can be used for evaluating techniques to display probabilistic predictions:
- Common sense and accompanying biases
- Risk perception and the role of personality and numeracy
- Type of graphic presentation used
David returned to the theme of uncertainty in 2014 and suggested “it will be vital to understand and promote uncertainty through the appropriate use of statistical methods rooted in probability theory” (link). Much of David’s recent work has focussed on the communication of risk. At the Winton Centre:
All too often, numbers are used to try to bolster an argument or persuade people to make a decision one way or the other. We want to ensure that both risks and benefits of any decision are presented equally and fairly, and the numbers are made clear and put in an appropriate context. We are here ‘to inform and not persuade’. (Link)
All these thoughts were running through my head when I decided to contact Tony Corke. I admire his work immensely. A couple of rapid emails helped me with a Brier Score issue for priors and posteriors from my Women’s T20 cricket data. I am not blessed with mathematical intelligence and Tony was very reassuring.
I am now off to research Solomon Kullback and Richard Leibler, who were writing a year after Glenn Brier’s 1950 paper (link).