| Date | Speaker | Title |
|------|---------|-------|
| Sep 27 | Ming Yuan (Columbia University) | On the Sample Complexity for Approximating High Dimensional Functions of Few Variables |
| Oct 4 | Sebastien Haneuse (Harvard T.H. Chan School of Public Health) | On the analysis of two-phase designs in cluster-correlated data settings |
| Oct 11 | Veronika Rockova (University of Chicago Booth School of Business) | |
| Oct 18 | Hongcheng Liu (University of Florida) | |
| Oct 25 | Hongyuan Cao (Florida State University) | Regression analysis of longitudinal data with omitted asynchronous longitudinal covariate |
| Nov 1 | Pierre Jacob (Harvard University) | Unbiased Markov chain Monte Carlo with couplings |
| Nov 15 | Subharup Guha (University of Florida) | |
|On the Sample Complexity for Approximating High Dimensional Functions of Few Variables|
|Ming Yuan, Columbia University
We investigate the optimal sample complexity of recovering a general high-dimensional sparse function, and the means by which sample and computational complexities can be traded off. Exploiting the connection between approximation of a smooth function and exact recovery of a grid function, we identify the optimal sample complexity for recovering a high-dimensional sparse function based on point queries. Our result provides a precise characterization of the potential loss of information when restricting to point queries as opposed to the more general linear queries, as well as the effect of measurement error on recovery.
|On the analysis of two-phase designs in cluster-correlated data settings|
|Sebastien Haneuse, Harvard T.H. Chan School of Public Health
In public health research, the information that is readily available may be insufficient to address the primary question(s) of interest. One cost-efficient way forward, especially in resource-limited settings, is to conduct a two-phase study in which the population is initially stratified, at phase I, by the outcome and/or some categorical risk factor(s). At phase II, detailed covariate data are ascertained on a sub-sample within each phase I stratum. While analysis methods for two-phase designs are well established, they have focused exclusively on settings in which participants are assumed to be independent. As such, when participants are naturally clustered (e.g., patients within clinics), these methods may yield invalid inference. To address this, we develop a novel analysis approach based on inverse-probability weighting (IPW) that permits researchers to specify a working covariance structure, appropriately accounts for the sampling design, and ensures valid inference via a robust sandwich estimator. In addition, to enhance statistical efficiency, we propose a calibrated IPW estimator that makes use of information available at phase I but not used in the design. A comprehensive simulation study is conducted to evaluate small-sample operating characteristics, including the impact of using naive methods that ignore correlation due to clustering, as well as to investigate design considerations. Finally, the methods are illustrated using data from a one-time survey of the national anti-retroviral treatment program in Malawi.
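The design-weighting idea behind IPW can be sketched for the simplest possible target, a population mean. Everything below (the strata, sampling probabilities, and data) is invented for illustration; the talk's actual estimator solves weighted estimating equations with a robust sandwich variance, not just a weighted mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical phase I population: every subject's stratum is known at
# phase I, but the covariate x is "expensive" and will be observed only
# on the phase II subsample.
N = 10_000
stratum = rng.integers(0, 2, size=N)        # phase I stratification
x = rng.normal(loc=stratum, scale=1.0)      # expensive phase II covariate

# Stratified phase II sampling with known probabilities: oversample the
# more informative stratum (0.5 vs 0.1 are arbitrary choices here).
p = np.where(stratum == 1, 0.5, 0.1)
in_phase2 = rng.random(N) < p

# IPW (Hajek) estimate of the population mean of x from phase II data
# only: each sampled subject stands in for 1/p subjects in the population.
w = 1.0 / p[in_phase2]
ipw_mean = np.sum(w * x[in_phase2]) / np.sum(w)
print(ipw_mean, x.mean())   # the two should be close
```

An unweighted mean of the phase II sample would be badly biased here, since stratum 1 (with larger x) is sampled five times as often; the design weights undo exactly that over-representation.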
|Veronika Rockova, University of Chicago Booth School of Business
|Hongcheng Liu, University of Florida
|Regression analysis of longitudinal data with omitted asynchronous longitudinal covariate|
|Hongyuan Cao, Florida State University
Long-term follow-up with longitudinal data is common in many medical investigations. In such studies, a longitudinal covariate may be omitted for various reasons. A naive approach that simply ignores the omitted longitudinal covariate can lead to biased estimators. In this article, we propose new unbiased estimation methods that accommodate the omitted longitudinal covariate. In addition, if the omitted longitudinal covariate is asynchronous with the longitudinal response, a two-stage approach is proposed for valid statistical inference. Asymptotic properties of the proposed estimators are established. Extensive simulation studies provide numerical support for the theoretical findings. We illustrate the performance of our method on a dataset from an HIV study.
|Unbiased Markov chain Monte Carlo with couplings|
|Pierre Jacob, Harvard University
Markov chain Monte Carlo methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, these estimators are generally biased after any fixed number of iterations, which complicates parallel computation and other tasks. In this talk, I will explain how to remove this burn-in bias by using couplings of Markov chains and a telescoping-sum argument due to Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and various methodological developments follow. I will discuss the benefits and limitations of the proposed framework in various settings of Bayesian inference. This is joint work with John O'Leary and Yves F. Atchade, available at arxiv.org/abs/1708.03625.
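The coupling construction can be sketched on a toy discrete-state chain. The 3-state transition matrix and function h below are illustrative inventions, not from the talk; the point is the mechanics of the estimator h(X_0) + Σ_t [h(X_t) − h(Y_{t−1})], summed until the lagged chains meet under a maximal coupling.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative ergodic 3-state chain and test function h (neither is from
# the talk); we estimate the stationary expectation of h without bias.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
h = np.array([0.0, 1.0, 2.0])

def maximal_coupling(p, q):
    """Draw (a, b) with a ~ p and b ~ q, maximizing the chance a == b."""
    overlap = np.minimum(p, q)
    alpha = overlap.sum()
    if rng.random() < alpha:
        a = rng.choice(len(p), p=overlap / alpha)
        return a, a                   # chains meet on this step
    a = rng.choice(len(p), p=(p - overlap) / (1 - alpha))
    b = rng.choice(len(q), p=(q - overlap) / (1 - alpha))
    return a, b

def unbiased_estimate():
    """One copy of h(X_0) + sum_t [h(X_t) - h(Y_{t-1})], stopped at meeting."""
    x = rng.integers(3)               # X_0 from the initial distribution
    y = rng.integers(3)               # Y_0 from the same initial distribution
    est = h[x]                        # the h(X_0) term
    x = rng.choice(3, p=P[x])         # advance the leading chain to X_1
    while x != y:                     # stop once X_t == Y_{t-1}
        est += h[x] - h[y]            # telescoping bias-correction term
        x, y = maximal_coupling(P[x], P[y])
    return est

estimates = np.array([unbiased_estimate() for _ in range(10_000)])

# Compare the average of independent copies with the exact stationary mean.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
print(np.mean(estimates), pi @ h)
```

Each call to `unbiased_estimate` is independent of the others, so the copies can be farmed out across processors and averaged; no burn-in diagnostics are needed, which is the practical appeal described in the abstract.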
|Subharup Guha, University of Florida