Department of Statistics and
Machine Learning Department
Carnegie Mellon University
Larry Wasserman is Professor in the Department of Statistics and the Machine Learning Department at Carnegie Mellon University. He received his PhD from the University of Toronto in 1988. He is a fellow of the American Association for the Advancement of Science (AAAS), the American Statistical Association (ASA), and the Institute of Mathematical Statistics (IMS). He received the COPSS Presidents’ Award in 1999 and the CRM-SSC (Centre de recherches mathématiques de Montréal – Statistical Society of Canada) Prize in Statistics in 2002. He is the founding editor of the Electronic Journal of Statistics. He won the DeGroot Prize in 2006 for his book “All of Statistics: A Concise Course in Statistical Inference”.
(November 29, 2012)
Discovering Regression Structure with a Bayesian Ensemble
A Bayesian ensemble can be used to discover and learn about the regression relationship between a variable of interest y and a vector of p potential predictor variables x. The basic idea is to model the conditional distribution of y given x by a sum of random basis elements plus a flexible noise distribution. In particular, I will focus on a Bayesian ensemble approach called BART (Bayesian Additive Regression Trees). Based on a basis of random regression trees, BART produces a predictive distribution for y at any x (in or out of sample) that automatically adjusts for the uncertainty at each such x. It can do this for nonlinear relationships, even those hidden within a large number of irrelevant predictors. Further, BART opens up a novel approach for model-free variable selection. Ultimately, the information provided by such a Bayesian ensemble may be seen as a valuable first step towards model building for high-dimensional data. (This is joint work with H. Chipman and R. McCulloch.)
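To make the "sum of random basis elements" idea concrete, here is a minimal Python sketch of how a sum-of-trees fit is evaluated at a point x. It is not the BART fitting procedure. The trees are restricted to depth-one stumps for brevity, and the names `Stump`, `ensemble_mean`, and `predictive_summary` are hypothetical, introduced only for illustration. The list of ensembles passed to `predictive_summary` stands in for posterior draws that a real sampler would produce.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stump:
    """A depth-one regression tree (an illustrative stand-in for a full tree)."""
    feature: int      # index of the predictor used for the split
    threshold: float  # go left when x[feature] <= threshold
    left: float       # leaf value on the left branch
    right: float      # leaf value on the right branch

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def ensemble_mean(trees: List[Stump], x: Sequence[float]) -> float:
    """Sum-of-trees fit: every tree contributes one piece of the mean of y at x."""
    return sum(tree.predict(x) for tree in trees)


def predictive_summary(
    draws: List[List[Stump]], x: Sequence[float]
) -> Tuple[float, float, float]:
    """Treat each ensemble in `draws` as one (assumed) posterior draw and
    return the average, minimum, and maximum of the fitted mean at x."""
    values = [ensemble_mean(trees, x) for trees in draws]
    return sum(values) / len(values), min(values), max(values)
```

Calling `predictive_summary(draws, x)` for several points x shows how the spread across draws, and hence the reported uncertainty, can differ from one x to another.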
(November 30, 2012)
EMVS: The EM Approach to Bayesian Variable Selection
Despite rapid developments in stochastic search algorithms, the practicality of Bayesian variable selection methods has continued to pose challenges. High-dimensional data are now routinely analyzed, typically with many more covariates than observations. To broaden the applicability of Bayesian variable selection for such high-dimensional linear regression contexts, we propose EMVS, a deterministic alternative to stochastic search based on an EM algorithm which exploits a conjugate mixture prior formulation to quickly find posterior modes. Combining a spike-and-slab regularization diagram for the discovery of active predictor sets with subsequent rigorous evaluation of posterior model probabilities, EMVS rapidly identifies promising sparse high posterior probability submodels. External structural information such as likely covariate groupings or network topologies is easily incorporated into the EMVS framework. Deterministic annealing variants are seen to improve the effectiveness of our algorithms by mitigating the posterior multi-modality associated with variable selection priors. The usefulness of the EMVS approach is demonstrated on real high-dimensional data, where computational complexity renders stochastic search less practical. (This is joint work with Veronika Rockova.)
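The flavor of an EM search for a spike-and-slab posterior mode can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the EMVS algorithm as presented in the talk. It assumes an identity design (each observation y_j has its own coefficient beta_j), a known noise variance `sigma2`, a normal spike with variance `sigma2 * v0`, a normal slab with variance `sigma2 * v1`, and a uniform prior on the inclusion weight `theta`. The function name and all parameter names are hypothetical.

```python
import math


def em_spike_slab_normal_means(y, v0=0.05, v1=100.0, sigma2=1.0, n_iter=50):
    """EM for the posterior mode of (beta, theta) when y_j = beta_j + noise.

    Returns the coefficient estimates, the inclusion probabilities computed
    in the final E-step, and the estimated inclusion weight theta.
    """
    beta = list(y)   # start from the unshrunk observations
    theta = 0.5      # initial prior inclusion weight
    p_incl = [0.5] * len(y)
    for _ in range(n_iter):
        # E-step: posterior probability that each beta_j comes from the slab,
        # computed on the log-odds scale for numerical stability.
        prior_log_odds = math.log(theta / (1.0 - theta)) + 0.5 * math.log(v0 / v1)
        p_incl = []
        for b in beta:
            log_odds = prior_log_odds + 0.5 * b * b / sigma2 * (1.0 / v0 - 1.0 / v1)
            p_incl.append(1.0 / (1.0 + math.exp(-log_odds)))
        # M-step: ridge-type shrinkage with a coefficient-specific penalty
        # d_j = (1 - p_j) / v0 + p_j / v1, then update theta.
        beta = [yj / (1.0 + (1.0 - p) / v0 + p / v1) for yj, p in zip(y, p_incl)]
        # Clamp theta away from 0 and 1 so the log-odds stay finite.
        theta = min(max(sum(p_incl) / len(p_incl), 1e-6), 1.0 - 1e-6)
    return beta, p_incl, theta
```

With observations such as `[0.1, -0.2, 5.0, 0.05, -4.0, 0.3]`, the two large entries receive inclusion probabilities near one and are left almost unshrunk, while the small entries are pulled strongly toward zero. Because the iteration is deterministic, the result depends on the starting values, which is the multi-modality issue that annealing variants are meant to mitigate.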