An Inside Game of Baseball?


I have gone back to have a look at Hugh Fullerton’s insights into the inside game of baseball from 1910.

I noticed that Hugh observed:

Given the speed and direction of the ball and the speed of the player, it is possible to figure to a millionth of a watt where his hands will meet the ball …

Given the average speed of the infielders, it would be possible to calculate beforehand approximately the number of base hits each team will make in a season – if the players were automatons.
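As a rough illustration of the kind of calculation Hugh describes, here is a minimal sketch of where a fielder could meet a ball. It is not Hugh's method. It assumes a flat two-dimensional field, a ball moving at constant velocity, and a fielder who runs at a constant top speed with no reaction time. The function name `earliest_intercept` and all the numbers are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]


def earliest_intercept(ball_pos: Vec, ball_vel: Vec,
                       fielder_pos: Vec, fielder_speed: float
                       ) -> Optional[Tuple[float, Vec]]:
    """Return (time, meeting point) for the earliest interception, or None.

    Assumes constant ball velocity and a fielder who can run in any
    direction at fielder_speed from time zero (illustrative only).
    """
    dx = ball_pos[0] - fielder_pos[0]
    dy = ball_pos[1] - fielder_pos[1]
    vx, vy = ball_vel

    # Solve |d + v*t| = s*t, i.e. a*t^2 + b*t + c = 0.
    a = vx * vx + vy * vy - fielder_speed ** 2
    b = 2.0 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy

    if c == 0.0:
        return 0.0, ball_pos  # fielder is already at the ball

    if abs(a) < 1e-12:
        # Ball and fielder have equal speed: the equation is linear.
        if b >= 0.0:
            return None
        candidates = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None  # the fielder can never reach the ball's path in time
        root = math.sqrt(disc)
        candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]

    times = [t for t in candidates if t >= 0.0]
    if not times:
        return None
    t = min(times)
    return t, (ball_pos[0] + vx * t, ball_pos[1] + vy * t)


if __name__ == "__main__":
    # Hypothetical grounder hit straight at a fielder 30 m away.
    print(earliest_intercept((0.0, 0.0), (10.0, 0.0), (30.0, 0.0), 5.0))
```

A real fielder is not an automaton, which is Hugh's point: reaction time, anticipation and positioning all shift the answer.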

The prompt to return to the paper was a post by Dayn Perry about player tracking. He shared Mark Newman’s news about Major League Baseball Advanced Media’s plan for an ‘in-ballpark infrastructure designed to provide the first complete and reliable measurement of every play on the field and answer previously unanswerable analytics questions’.

This frame grab connects 2014 baseball with 1910.


With this outcome:


Photo Credits Frame Grabs


Hugh Fullerton's Inside Game

Back in the Summer of 1993 I spent some time in the Ursinus College Library.

I was looking for information about Lloyd Messersmith.

Eventually I found a copy of Lloyd’s 1942 Doctor of Education thesis ‘The Development of a Measurement Technique for Determining the Distances Traversed by Players in Basketball’. The bibliography contains twenty-one references, one of which is to a 1910 paper in the American Magazine by Hugh Fullerton.

Fortunately a librarian at Ursinus College tracked down the journal for me and I was able to make a copy from a microfiche reader.

The paper is twelve pages long (pp. 2-13 of the Magazine) and contains no references.

I think this paper is one of the foundation documents in the analysis of sport performance. Given that it is a century since the paper was published, I thought I would archive it here as a contribution to the sociology of knowledge of performance analysis in sport.

Each page of the paper is illustrated with a picture or diagram. The originals were in black and white.

The paper:

Pages 2 and 3 HFAMp0203 (11Mb)

Pages 4 and 5 HFAMp0405 (12Mb)

Pages 6 and 7 HFAMp0607 (12Mb)

Pages 8 and 9 HFAMp0809 (13Mb)

Pages 10 and 11 HFAMp1011 (11Mb)

Pages 12 and 13 HFAMp1213 (10Mb)

Twenty-one years later Lloyd Messersmith and Stephen Corey’s paper ‘The Distance Traversed by a Basketball Player’ appeared in the Research Quarterly of the American Association for Health, Physical Education and Recreation, 2, pp 57-60, May 1931. Like Hugh Fullerton’s paper, this paper has no references.

I see both papers as the genesis of the analysis of sport performance: a combined sixteen-page kick start to the field of study, 100 years ago and 80 years ago.

Photo Credits

Art Butler, St Louis NL

Carnegie Playground 5th Ave.

Growing Sport Analytics

I have been thinking a great deal about transforming performance and leading ahead of the curve this year. By coincidence this year is ending with the screening of Moneyball.

I have been interested in particular in how secondary data can inform and support coaches and the coaching process.

One of the catalysts in my thinking has been Usama Fayyad. I heard Usama speak at a knowledge discovery in databases conference in Sydney in 2005. His 1996 paper ‘From Data Mining to Knowledge Discovery in Databases’, co-written with Gregory Piatetsky-Shapiro and Padhraic Smyth, was my first engagement with a domain of enquiry that has become a primary focus for me.

In their 1996 paper the authors point out that:

Across a wide variety of fields, data are being collected and accumulated at a dramatic pace. There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).

I spent much of the 1980s and 90s collecting data about performance in rugby union and a number of other sports. All of these data were collected with hand notation systems. My interest then and now is the pattern of observable individual and team behaviour.

I saw an early copy of Michael Lewis’s book Moneyball: The Art of Winning an Unfair Game (2003) and was attracted intuitively to the power of sabermetrics. I saw in Bill James’s work the passion that fired Charles Reep in his observations of association football. Michael, Bill and Charles had a predecessor in Hugh Fullerton, who in 1910 wrote about The Inside Game of baseball.

In his paper, Hugh Fullerton points out:

Last season (1909) I arranged with scorers to record hits of various kinds, and secured the scores thus kept on 40 Central League games, 26 American Association games, and fourteen college games to compare with major league scores kept in the same manner. In the college games one grounder in every 8 1/3 passed the infielders. In the Central League one in 10 7/12, in the American Association one in 12 2/43, and in the American and National Leagues (45 games of my own scoring) one in every 15 3/16.

He adds that:

The figures were amazing, as they followed so closely the classification of the leagues. They proved that there is a reason for the “class”, but the proof is not found in the mathematics, but in two words (unless you hyphenate them), “team work.”

For Hugh Fullerton the inside game is “the art of getting the hits that ‘he couldn’t have got anyhow’”.
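Hugh's "one grounder in every N" figures are easier to compare as percentages. The sketch below converts the mixed fractions quoted above into the share of grounders that passed the infielders at each level. The league labels and function names are my own, and the code only restates his published figures; it does not add any new data.

```python
from fractions import Fraction

# "One grounder in every N passed the infielders", as quoted from Fullerton (1910).
GROUNDERS_PER_HIT = {
    "College": Fraction(8) + Fraction(1, 3),
    "Central League": Fraction(10) + Fraction(7, 12),
    "American Association": Fraction(12) + Fraction(2, 43),
    "Major leagues": Fraction(15) + Fraction(3, 16),
}


def pass_rate(one_in: Fraction) -> Fraction:
    """Share of grounders that get past the infield, given 'one in every N'."""
    return 1 / one_in


def pass_rates_percent(table):
    """Return each league's pass rate as a percentage rounded to one decimal."""
    return {league: round(float(pass_rate(n) * 100), 1)
            for league, n in table.items()}


if __name__ == "__main__":
    for league, pct in pass_rates_percent(GROUNDERS_PER_HIT).items():
        print(f"{league}: {pct}% of grounders passed the infielders")
```

On these figures the share falls from 12.0% in the college games to about 6.6% in the major leagues, which is the ordering by "class" that Hugh found so striking.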

102 years on from Hugh Fullerton’s research there is a growing community of practice in sport analytics. A recent example of this incandescence was a Sports Analytics conference in Manchester in November.

Speakers at the conference included: Bill Gerrard, Raffaele Poli, Simon Wilson, David Fallows, Gavin Fleig, Ed Sulley, Ian Lenagan, Rob Lowe, Fergus Connolly, Ian Graham, Nick Broad and Steve Houston.

Usama Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth point out that KDD is “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”.

I believe they have provided a fundamental guide to the family resemblances that characterise the analysis of sport performance:

Here, data are a set of facts (for example, cases in a database), and pattern is an expression in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; finding structure from data; or, in general, making any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the average value of a set of numbers.

I am hopeful that 2012 will provide opportunities to share data throughout the community of practice that is sport analytics.

Photo Credits


Australian bowler, Bill O’Reilly, demonstrates his famous grip