Department of Statistics
The Wharton School
University of Pennsylvania
Larry Brown is the Miers Busch Professor and Professor of Statistics. Professor Brown was President of the Institute of Mathematical Statistics in 1992–1993, Coeditor of The Annals of Statistics for 1995–1997 and gave the prestigious Wald Memorial Lectures in 1985. In 1990, Professor Brown was elected to the U.S. National Academy of Sciences. In 1993, Purdue University awarded him an honorary D.Sc. degree in recognition of his distinguished achievements, and in 2002 he was named winner of the Wilks Memorial Award of the American Statistical Association.
(November 13, 2007)
In-Season Prediction of Batting Averages: A Field-test of Simple Empirical Bayes and Bayes Methodologies
Batting average is one of the principal performance measures for an individual baseball player. It has a simple numerical structure: the number of successful attempts, “Hits”, as a proportion of the total number of qualifying attempts, “At-Bats”. This situation, with Hits as a number of successes within a qualifying number of attempts, makes it natural to statistically model each player’s batting average as a binomial variable outcome, with a given value of AB_i and a true (but unknown) value of p_i that represents the player’s latent ability. This is a common data structure in many statistical applications, so the methodological study here has implications for a wide range of such applications.
We will look at batting records for every Major League player over the course of a single season (2005). The primary focus is on using only the batting record from an earlier part of the season (e.g., the first 3 months) in order to predict the batter’s latent ability, p_i, and consequently to predict their batting-average performance for the remainder of the season. Since we are using a season that has already concluded, we can validate our predictive performance by comparing the predicted values to the actual values for the remainder of the season.
The methodological purpose of this study is to gain experience with a variety of predictive methods applicable to a much wider range of situations. Several of the methods to be investigated derive from empirical Bayes and hierarchical Bayes interpretations. Although the general ideas behind these techniques have been understood for many decades*, some of these methods have only been refined relatively recently in a manner that promises to more accurately fit data such as that at hand.
One feature of all of the statistical methodologies here is the preliminary use of a particular form of variance stabilizing transformation in order to transform the binomial data problem into a somewhat more familiar structure involving (approximately) Normal random variables with known variances. This transformation technique is also useful in validating the binomial model assumption that is the conceptual basis for all our analyses.
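As a rough illustration of the transformation step, the sketch below applies an arcsine square-root transform to binomial counts and then shrinks the transformed values toward their common mean. The abstract does not state which form of the transformation is used, so the arcsine form, the constant 1/4, the function names, and the player records are all assumptions made for illustration. The shrinkage rule is a plain method-of-moments empirical Bayes estimate, not necessarily one of the methods compared in the talk.

```python
import math


def vst(hits, at_bats, c=0.25):
    """Arcsine square-root transform of a binomial count.

    Assumption: c = 1/4 is one common choice of constant; the abstract
    does not say which form of the transformation is used.
    """
    return math.asin(math.sqrt((hits + c) / (at_bats + 2 * c)))


def vst_variance(at_bats):
    """Approximate variance of the transformed value, 1 / (4 * AB)."""
    return 1.0 / (4.0 * at_bats)


def inverse_vst(x):
    """Map a transformed value back to the probability scale."""
    return math.sin(x) ** 2


def shrink_to_mean(xs, variances):
    """Method-of-moments empirical Bayes shrinkage toward the grand mean.

    Treats each x as Normal(theta, v) with known v, and estimates the
    between-player variance tau2 from the spread of the xs.
    """
    n = len(xs)
    mean = sum(xs) / n
    total_var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    tau2 = max(0.0, total_var - sum(variances) / n)
    return [mean + tau2 / (tau2 + v) * (x - mean) for x, v in zip(xs, variances)]


# Hypothetical early-season records (hits, at-bats); not real data.
records = [(30, 100), (45, 150), (12, 60), (70, 220)]
xs = [vst(h, ab) for h, ab in records]
vs = [vst_variance(ab) for _, ab in records]
predictions = [inverse_vst(x) for x in shrink_to_mean(xs, vs)]
```

After the transform, each player contributes an approximately Normal observation whose variance depends only on the known number of at-bats, which is what allows standard Normal-means shrinkage rules to be applied.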
* A particularly relevant background reference is Efron, B. and Morris, C. (1977), “Stein’s paradox in statistics,” Scientific American 236, 119-127, and the earlier, more technical version (1975), “Data analysis using Stein’s estimator and its generalizations,” Jour. Amer. Stat. Assoc. 70, 311-319.
(November 14, 2007)
Nonparametric Density Estimation via the Root-Unroot Transform; with an Adaptive Wavelet Block Thresholding Implementation
Nonparametric density estimation has traditionally been treated separately from nonparametric regression. Here, we propose an approach that first transforms a density estimation problem into a nonparametric regression problem. The algorithm for this involves suitably binning the observations and then transforming the binned data counts via a carefully chosen square-root transformation. Then any suitable nonparametric regression procedure can be used.
Here, a wavelet block-thresholding rule is used for the transformed regression problem, and this produces an estimated nonparametric regression function. Finally an adjusted un-root transform is applied to yield the final nonparametric density estimator.
The procedure is easy to implement. It enjoys a high degree of asymptotic adaptivity and is shown in numerical examples to perform well for standard density estimation settings. As time permits, we will also discuss a corresponding procedure to produce confidence bands to accompany the nonparametric regression and density estimators.
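The binning, root, regression, and un-root steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the talk: the constant 1/4 inside the square root is an assumed choice, a simple moving average stands in for the wavelet block-thresholding rule, and squaring followed by renormalization stands in for the adjusted un-root transform. The function name and its parameters are hypothetical.

```python
import math
import random


def root_unroot_density(sample, lo, hi, n_bins=32, half_window=2):
    """Estimate a density on [lo, hi] by binning, rooting, smoothing, un-rooting."""
    width = (hi - lo) / n_bins

    # Step 1: bin the observations into equal-width cells.
    counts = [0] * n_bins
    for x in sample:
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1

    # Step 2: root transform of the counts (the 1/4 offset is an assumption).
    roots = [math.sqrt(c + 0.25) for c in counts]

    # Step 3: any nonparametric regression could go here. A moving average
    # is used as a stand-in for the wavelet block-thresholding rule.
    smoothed = []
    for i in range(n_bins):
        a = max(0, i - half_window)
        b = min(n_bins, i + half_window + 1)
        smoothed.append(sum(roots[a:b]) / (b - a))

    # Step 4: un-root by squaring, then rescale so the estimate integrates to 1.
    squared = [s * s for s in smoothed]
    total = sum(squared) * width
    density = [v / total for v in squared]

    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    return centers, density


# Usage on simulated standard Normal data.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
centers, density = root_unroot_density(sample, -4.0, 4.0)
```

The square root makes the noise level of the binned counts roughly constant across bins, which is why an off-the-shelf regression smoother can be applied in step 3.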