Statistics Seminar

SPRING 2018

Seminars are held from 4:00 p.m. – 5:00 p.m. in Griffin-Floyd 100 unless otherwise noted.

Refreshments are available before the seminars from 3:30 p.m. – 4:00 p.m. in Griffin-Floyd Hall 103.

Date Speaker    Title (abstracts below)
Mar 1 Arup Bose (Indian Statistical Institute, Kolkata)

High dimensional time series and free probability

Mar 22 Shanshan Ding (University of Delaware)

Envelope Quantile Regression

Apr 2 Marc G. Genton (KAUST, Saudi Arabia)

A Stochastic Generator of Global Monthly Wind Energy with Tukey g-and-h Autoregressive Processes

Apr 2 Ying Sun (KAUST, Saudi Arabia)

Visualization and Assessment of Spatio-temporal Covariance Properties

Apr 5 Rhonda Bacher (University of Florida)

Statistical Methods for Single-Cell RNA-seq Data

Apr 18 Lingjiong Zhu (Florida State University)

Approximate Variational Estimation for a Model of Network Formation

Apr 19 Jayaram Sethuraman (Florida State University)

The origins of the stick breaking construction of Dirichlet priors

Abstracts

High dimensional time series and free probability
Arup Bose, Indian Statistical Institute, Kolkata

Consider a sample of size \(n=n(p)\) from a linear process of dimension \(p\) where \(n, p \to \infty\), \(p/n \to y\in [0, \ \infty)\). Let \(\hat{\Gamma}_{u}\) be the sample autocovariance of order \(u\).

Under quite weak conditions, we prove, in a unified way, that the limiting spectral distribution (LSD) of any symmetric polynomial in these matrices such as \(\hat{\Gamma}_{u} + \hat{\Gamma}_{u}^{*}\), \(\hat{\Gamma}_{u}\hat{\Gamma}_{u}^{*}\), \(\hat{\Gamma}_{u}\hat{\Gamma}_{u}^{*}+\hat{\Gamma}_{k}\hat{\Gamma}_{k}^{*}\), after suitable centering and scaling, exists and is non-degenerate.
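As a rough illustration of the objects in this abstract, the sketch below computes a sample autocovariance matrix and the symmetric combination \(\hat{\Gamma}_{u} + \hat{\Gamma}_{u}^{*}\) for real-valued data. The normalization \(\hat{\Gamma}_{u} = n^{-1}\sum_{t} X_{t}X_{t-u}^{*}\), the function names, and the i.i.d. Gaussian test data are assumptions made for illustration; the centering and scaling used in the talk are not reproduced here.

```python
import random


def sample_autocovariance(x, u):
    """Sample autocovariance of order u for real-valued data.

    x is a list of n observations, each a list of length p.
    Assumed convention: (1/n) * sum over t of x[t] x[t-u]^T, without centering.
    """
    n, p = len(x), len(x[0])
    gamma = [[0.0] * p for _ in range(p)]
    for t in range(u, n):
        for i in range(p):
            for j in range(p):
                gamma[i][j] += x[t][i] * x[t - u][j]
    return [[value / n for value in row] for row in gamma]


def symmetrize(gamma):
    """Return gamma + gamma^T, a symmetric polynomial in the autocovariance."""
    p = len(gamma)
    return [[gamma[i][j] + gamma[j][i] for j in range(p)] for i in range(p)]


def trace(matrix):
    """Sum of the diagonal entries."""
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical example: n = 200 observations of an i.i.d. Gaussian series, p = 20.
rng = random.Random(0)
n, p = 200, 20
x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
gamma_1 = sample_autocovariance(x, 1)
sym_1 = symmetrize(gamma_1)
```

The trace of `sym_1` is the kind of statistic whose asymptotic normality the abstract refers to; studying the limiting spectral distribution would additionally require the eigenvalues of such matrices.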

We use methods from free probability in conjunction with the method of moments to establish our results. In addition, we are able to provide a general description for the limits in terms of some freely independent variables. We also establish asymptotic normality results for the traces of these matrices.

We suggest statistical uses of these results in problems such as order determination of high-dimensional MA and AR processes and testing of hypotheses for coefficient matrices of such processes.

The problem of establishing the LSD for the non-symmetric cases is hard and is open.

Envelope Quantile Regression
Shanshan Ding, University of Delaware

Quantile regression offers a valuable complement to classical mean regression for robust and comprehensive data analysis in a variety of applications. We propose a novel envelope quantile regression method (EQR) that adapts a nascent technique called enveloping (Cook, Li, and Chiaromonte, 2010) to improve the efficiency of standard quantile regression. The new method aims to identify material and immaterial information in a quantile regression model and use only the material information for estimation. By excluding the immaterial part, the EQR method has the potential to substantially reduce the estimation variability. Unlike existing envelope model approaches, which mainly rely on the likelihood framework, our proposed estimator is defined through a set of nonsmooth estimating equations. We facilitate the estimation via the generalized method of moments (GMM) and derive the asymptotic normality of the proposed estimator by applying empirical process techniques. Furthermore, we establish that EQR is asymptotically more efficient than (or at least as asymptotically efficient as) the standard quantile regression estimators without imposing stringent conditions. Hence, our work advances the envelope model theory to general distribution-free settings. We demonstrate the effectiveness of the proposed method via Monte-Carlo simulations and real data examples. This talk is based on joint work with Dr. Zhihua Su, Dr. Guangyu Zhu and Dr. Lan Wang.
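For readers unfamiliar with the standard quantile regression that EQR builds on, the sketch below shows the usual check loss \(\rho_{\tau}(u) = u\,(\tau - 1\{u < 0\})\) and uses it to recover a sample quantile in the intercept-only case. This is only the textbook baseline, not the EQR estimator; the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss: u * (tau - 1{u < 0}). Nonsmooth at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile_by_check_loss(y, tau):
    """Intercept-only quantile regression.

    Searches the observed values for the one minimizing the total check loss;
    a minimizer of this objective always exists among the data points.
    """
    return min(sorted(y), key=lambda q: sum(check_loss(v - q, tau) for v in y))


# With tau = 0.5 the minimizer is the sample median.
median = sample_quantile_by_check_loss([5.0, 1.0, 4.0, 2.0, 3.0], 0.5)
```

The kink of the loss at zero is what makes the resulting estimating equations nonsmooth, which is why likelihood-based envelope arguments do not apply directly.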

A Stochastic Generator of Global Monthly Wind Energy with Tukey g-and-h Autoregressive Processes
Marc G. Genton, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Quantifying the uncertainty of wind energy potential from climate models is a very time-consuming task and requires a considerable amount of computational resources. A statistical model trained on a small set of runs can act as a stochastic approximation of the original climate model, and be used to assess the uncertainty considerably faster than by resorting to the original climate model for additional runs. While Gaussian models have been widely employed as means to approximate climate simulations, the Gaussianity assumption is not suitable for winds at policy-relevant time scales, i.e., sub-annual. We propose a trans-Gaussian model for monthly wind speed that relies on an autoregressive structure with Tukey g-and-h transformation, a flexible new class that can separately model skewness and tail behavior. This temporal structure is integrated into a multi-step spectral framework that is able to account for global nonstationarities across land/ocean boundaries, as well as across mountain ranges. Inference can be achieved by balancing memory storage and distributed computation for a data set of 220 million points. Once fitted with as few as five runs, the statistical model can generate surrogates fast and efficiently on a simple laptop, and provide uncertainty assessments very close to those obtained from all the available climate simulations (forty) on a monthly scale. The talk is based on joint work with Jaehong Jeong, Yuan Yan, and Stefano Castruccio.
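To make the transformation concrete, the following sketch simulates a univariate series by applying the Tukey g-and-h transformation \(\tau_{g,h}(z) = g^{-1}(e^{gz}-1)\,e^{hz^{2}/2}\) (with \(\tau_{0,h}(z) = z\,e^{hz^{2}/2}\)) to a latent Gaussian AR(1) process. The AR(1) order, the location and scale parameters, and all function names are assumptions for illustration; the spatial and spectral structure described in the abstract is not modeled.

```python
import math
import random


def tukey_gh(z, g, h):
    """Tukey g-and-h transformation: g controls skewness, h >= 0 tail heaviness."""
    core = z if g == 0.0 else math.expm1(g * z) / g
    return core * math.exp(h * z * z / 2.0)


def simulate_tukey_gh_ar1(n, phi, g, h, loc=0.0, scale=1.0, seed=0):
    """Simulate loc + scale * tukey_gh(Z_t), where Z_t is a Gaussian AR(1).

    The latent process is started from its stationary N(0, 1) law, and the
    innovation variance 1 - phi**2 keeps the marginal variance of Z_t at 1.
    Requires |phi| < 1.
    """
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n):
        series.append(loc + scale * tukey_gh(z, g, h))
        z = phi * z + innovation_sd * rng.gauss(0.0, 1.0)
    return series


# Hypothetical parameter values: mild right skew and moderately heavy tails.
wind = simulate_tukey_gh_ar1(n=120, phi=0.6, g=0.3, h=0.1, loc=8.0, scale=2.0)
```

Setting `g = 0` and `h = 0` recovers a plain Gaussian AR(1), which is one way to see how the two parameters separately add skewness and tail weight.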

Visualization and Assessment of Spatio-temporal Covariance Properties
Ying Sun, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Spatio-temporal covariances are important for describing the spatio-temporal variability of underlying random processes in geostatistical data. For second-order stationary processes, there exist subclasses of covariance functions that assume a simpler spatio-temporal dependence structure with separability and full symmetry. However, it is challenging to visualize and assess separability and full symmetry from spatio-temporal observations. In this work, we propose a functional data analysis approach that constructs test functions using the cross-covariances from time series observed at each pair of spatial locations. These test functions of temporal lags summarize the properties of separability or symmetry for the given spatial pairs. We use functional boxplots to visualize the functional median and the variability of the test functions, where the extent of departure from zero at all temporal lags indicates the degree of non-separability or asymmetry. We also develop a rank-based nonparametric testing procedure for assessing the significance of the non-separability or asymmetry. The performance of the proposed methods is examined by simulations with various commonly used spatio-temporal covariance models. To illustrate our methods in practical applications, we apply them to real datasets, including weather station data and climate model outputs.

Statistical Methods for Single-Cell RNA-seq Data
Rhonda Bacher, Department of Biostatistics, UF

The development of single-cell RNA sequencing (scRNA-seq) technologies promises to deliver new levels of understanding to fundamental areas of biology. While the data obtained are structurally similar to those of traditional (bulk) RNA-seq, the process of quantifying a small amount of starting material gives rise to distinct features in scRNA-seq data including an abundance of zeros, increased variability, and heterogeneous expression distributions, requiring novel statistical methods. In this talk, I will discuss statistical methods I have developed for scRNA-seq data, including a robust method for normalization and a simulation framework which characterizes technical variability and provides guidance on experimental design choices.

Approximate Variational Estimation for a Model of Network Formation
Lingjiong Zhu, Florida State University

We study an equilibrium model of sequential network formation with heterogeneous players. The payoffs depend on the number and composition of direct connections, but also the number of indirect links. We show that the network formation process is a potential game and in the long run the model converges to an exponential random graph model (ERGM). Since standard simulation-based inference methods for ERGMs could have exponentially slow convergence, we propose an alternative deterministic method, based on a variational approximation of the likelihood. We compute bounds for the approximation error for a given network size and we prove that our variational method is asymptotically exact, extending results from the large deviations and graph limits literature to allow for covariates in the ERGM. A simple Monte Carlo study shows that our deterministic method provides more robust estimates than standard simulation-based inference. This is based on joint work with Angelo Mele.

The origins of the stick breaking construction of Dirichlet priors
Jayaram Sethuraman, Florida State University

My 1994 paper gave a simple direct proof of the constructive definition of Dirichlet priors (Ferguson 1973) and did not dwell on how I got the idea for that construction. In this talk I will first describe the collection of all priors in the nonparametric problem and show how this description leads to the constructive definition, nowadays called the stick breaking construction. This also leads to the invariance under size-biased permutation (ISBP) property of the GEM distribution (the stick breaking part), which gives a simpler proof than in the 1994 paper for the posterior distribution of Dirichlet priors. All the ideas of this talk emanate from a deeper understanding of the 1973 paper of Blackwell and MacQueen.
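For orientation, the stick breaking construction can be sketched as follows: draw \(V_{i} \sim \mathrm{Beta}(1, \alpha)\) independently, set the weights \(p_{i} = V_{i}\prod_{j<i}(1 - V_{j})\), and attach them to atoms drawn i.i.d. from a base distribution. The truncation at a finite number of sticks, the standard normal base distribution, and the function names below are assumptions made so that the example runs; the construction itself uses infinitely many sticks.

```python
import random


def stick_breaking_weights(alpha, k, rng):
    """First k stick breaking (GEM) weights and the leftover stick length.

    Each V_i ~ Beta(1, alpha) breaks off a fraction of the remaining stick.
    """
    weights = []
    remaining = 1.0
    for _ in range(k):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


def truncated_dirichlet_process_draw(alpha, k, base_sampler, rng):
    """Truncated draw from a Dirichlet prior: k atoms with stick breaking weights."""
    weights, remaining = stick_breaking_weights(alpha, k, rng)
    atoms = [base_sampler(rng) for _ in range(k)]
    return atoms, weights, remaining


# Hypothetical example: concentration alpha = 2 and a standard normal base measure.
rng = random.Random(1)
atoms, weights, leftover = truncated_dirichlet_process_draw(
    alpha=2.0, k=50, base_sampler=lambda r: r.gauss(0.0, 1.0), rng=rng
)
```

The leftover stick length shrinks geometrically on average, with expected value \((\alpha/(1+\alpha))^{k}\) after \(k\) breaks, so a moderate truncation level already captures almost all of the mass.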