Allan Roth


A ScoopIt alert to an article written by Andy McCue introduced me to Allan Roth.

Andy notes:

Allan Roth pushed the analysis of baseball statistics to a new level. He promoted himself into a place those other analysts only aspired to. Roth was the first to be employed full time by a major league team…

Andy’s article is an excellent account that will interest performance analysts, particularly the passages describing Allan’s attempts to find a position in baseball.

In addition to Andy’s article for SABR, the following might be of interest.

Induction into the Canadian Baseball Hall of Fame 2010

New York Times Obituary 1992

Mr. Roth, who charted every pitch and did the requisite calculations either in his head or with a simple calculator, insisted on working by hand throughout his career.

Branch Rickey and Sabermetrics 1954

He was widely regarded as “the first executive to see the value of using baseball statistics in putting together and running his teams”. While GM of the Brooklyn Dodgers, this realization inspired Rickey to hire a full-time statistician named Allan Roth in 1947. Only 26 years old at the time, the Montreal-born Roth was charged with recording every conceivable piece of data pertaining to the team and then synthesizing it into relevant strategy.

Baseball’s numbers revolution: a chronology

The First Baseball “Stats Man”

Allan Roth. “He was the guy who began it all.”


Allan was a contemporary of two people I have spent a great deal of time researching: Lloyd Messersmith and Charles Reep. I am sorry it has taken me so long to find Allan.

His journey resonates with that of many early performance analysts. He had insights he wanted to share and develop, but it took a brave person to create the opportunity for him to do so, particularly as his Canadian birth meant he could be seen as an ‘outsider’.

I hope this is the start of another history of ideas and of the social construction of knowledge. He is mentioned on Branch Rickey’s Wikipedia page but does not have a page of his own.

Photo Credit

Allan Roth and his books, tables and calculating machines (United States Library of Congress, no copyright restriction known.)


Growing Sport Analytics

This year I have been thinking a great deal about transforming performance and leading ahead of the curve. By coincidence, the year is ending with the screening of Moneyball.

I have been particularly interested in how secondary data can inform and support coaches and the coaching process.

One of the catalysts in my thinking has been Usama Fayyad. I heard Usama speak at a knowledge discovery in databases conference in Sydney in 2005. His 1996 paper From Data Mining to Knowledge Discovery in Databases co-written with Gregory Piatetsky-Shapiro and Padhraic Smyth was my first engagement with a domain of enquiry that has become a primary focus for me.

In their 1996 paper the authors point out that:

Across a wide variety of fields, data are being collected and accumulated at a dramatic pace. There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).

I spent much of the 1980s and 90s collecting data about performance in rugby union and a number of other sports. All of these data were collected with hand notation systems. My interest then and now is the pattern of observable individual and team behaviour.

I saw an early copy of Michael Lewis’s book Moneyball: The Art of Winning an Unfair Game (2003) and was intuitively attracted to the power of sabermetrics. I saw in Bill James’s work the passion that fired Charles Reep in his observations of association football. Michael, Bill and Charles had a predecessor in Hugh Fullerton, who in 1910 wrote about The Inside Game of baseball.

In his paper, Hugh Fullerton points out:

Last season (1909) I arranged with scorers to record hits of various kinds, and secured the scores thus kept on 40 Central League games, 26 American Association games, and fourteen college games to compare with major league scores kept in the same manner. In the college games one grounder in every 8 1/3 passed the infielders. In the Central League one in 10 7/12, in the American Association one in 12 2/43, and in the American National Leagues (45 games of my own scoring) one in every 15 3/16.

He adds that:

The figures were amazing, as they followed so closely the classification of the leagues. They proved that there is a reason for the “class”, but the proof is not found in the mathematics, but in two word (unless you hyphenate them), “team work.”
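Fullerton’s “one in N” figures are easier to compare when turned into rates. The Python below is a minimal sketch of that arithmetic, not anything Fullerton did himself: the league labels and the helper name `pass_rate` are my own, for illustration. It converts each mixed number into the share of grounders that passed the infielders.

```python
from fractions import Fraction

# Fullerton's 1909 figures: one grounder in every N passed the infielders.
# Mixed numbers such as 8 1/3 are written as whole + Fraction(num, den).
GROUNDERS_PER_HIT = {
    "College": 8 + Fraction(1, 3),
    "Central League": 10 + Fraction(7, 12),
    "American Association": 12 + Fraction(2, 43),
    "Major leagues (Fullerton's own 45 games)": 15 + Fraction(3, 16),
}


def pass_rate(grounders_per_hit: Fraction) -> Fraction:
    """Share of grounders that got through: the reciprocal of 'one in N'."""
    return 1 / grounders_per_hit


rates = {league: pass_rate(n) for league, n in GROUNDERS_PER_HIT.items()}

if __name__ == "__main__":
    for league, rate in rates.items():
        print(f"{league:42s} {float(rate):.1%}")
```

Run as written, it prints about 12.0% for the college games, 9.4% for the Central League, 8.3% for the American Association and 6.6% for the major league games. The rate falls in step with the classification of the leagues, which is the pattern Fullerton found so striking. Exact fractions are used so the mixed numbers are not rounded before the comparison.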

For Hugh Fullerton, the inside game is “the art of getting the hits that ‘he couldn’t have got anyhow’”.

102 years on from Hugh Fullerton’s research there is a growing community of practice in sport analytics. A recent example of this incandescence was a Sports Analytics conference in Manchester in November.

Speakers at the conference included: Bill Gerrard, Raffaele Poli, Simon Wilson, David Fallows, Gavin Fleig, Ed Sulley, Ian Lenagan, Rob Lowe, Fergus Connolly, Ian Graham, Nick Broad and Steve Houston.

Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth point out that knowledge discovery in databases (KDD) is “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.

I believe they have provided a fundamental guide to the family resemblances that characterise the analysis of sport performance:

Here, data are a set of facts (for example, cases in a database), and pattern is an expression in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; finding structure from data; or, in general, making any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the average value of a set of numbers.

I am hopeful that 2012 will provide opportunities to share data throughout the community of practice that is sport analytics.

Photo Credits


Australian bowler, Bill O’Reilly, demonstrates his famous grip

Conference Session 3: ACCSS

The Asian Conference of Computer Science in Sports (ACCSS) is being held at the Japan Institute of Sports Sciences. The third session of the conference comprised four papers and one keynote address. (Reports of a pre-conference workshop here, session one here and session two here.)

Early arrivals:

Takahiro Morishige’s paper on match analysis support for a collegiate men’s basketball team was presented by Hiroo Takahashi. (Hiroo is Takahiro’s Master’s supervisor.) The paper described how analysis software combined with an iPod Touch can support coach and athlete development.

Kiyoshi Osawa explored the computation of winning percentage in baseball, with reference to the effect of fielding errors.

I liked Kiyoshi’s use of animation to locate his paper within the game of baseball. I was very interested in his account of computational complexity. The analysis of the data was enabled by the Aida Laboratory at the National Institute of Informatics.

Hyongjun Choi presented his second paper at the conference. In it he presented a cluster analysis of performance data using an artificial intelligence technique, Kohonen self-organising maps, and discussed how he had used them.
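Hyongjun’s own implementation is not described here, but the general technique can be sketched. The Python below is a minimal, standard-library-only Kohonen self-organising map built on my own assumptions: a 3 x 3 grid, a learning rate and neighbourhood radius that shrink linearly, and two invented features standing in for performance data. The helper `make_demo_data` produces synthetic numbers, not data from the paper. Each sample pulls its best matching unit, and that unit’s grid neighbours, towards it, so similar performances end up on the same or adjacent nodes.

```python
import math
import random


def make_demo_data(seed=1):
    """Hypothetical, synthetic two-feature performance data scaled to 0-1.

    Two loose groups stand in for two styles of play; they are NOT real data.
    """
    rng = random.Random(seed)
    group_a = [[rng.gauss(0.2, 0.05), rng.gauss(0.3, 0.05)] for _ in range(20)]
    group_b = [[rng.gauss(0.8, 0.05), rng.gauss(0.7, 0.05)] for _ in range(20)]
    return group_a + group_b


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to sample x."""
    return min(weights, key=lambda node: squared_distance(weights[node], x))


def train_som(data, rows=3, cols=3, epochs=100, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular self-organising map.

    Returns a dict mapping grid coordinates (row, col) to weight vectors.
    """
    rng = random.Random(seed)
    # Initialise every node by copying a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - progress), 0.3)  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the grid, centred on the winning node.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    data = make_demo_data()
    weights = train_som(data)
    # Count how many samples each node wins; occupied nodes mark the clusters.
    counts = {}
    for x in data:
        node = best_matching_unit(weights, x)
        counts[node] = counts.get(node, 0) + 1
    for node in sorted(counts):
        print(node, counts[node])
```

The grid size, decay schedules and the 0.3 floor on the radius are illustrative choices rather than recommended settings; with real performance data the features would need scaling to comparable ranges before training.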

The final paper of this session was presented by Nobuyoshi Hirotsu, the chair of this session. Nobuyoshi discussed the relationship between data envelopment analysis and sabermetrics in the evaluation of batters in baseball.

The third session of the conference concluded with a keynote demonstration presentation by Hiroshi Inukai on computer games and sports.

Hiroshi provided an overview of the history of computer games in sports and explored the current state of gaming. He noted the opportunities that programmable physics offers eSports. He concluded with a discussion of the challenges facing eSports software.