University of Florida Homepage

Statistics Seminar


**Seminars will be held in person unless otherwise noted**

Fall 2024

Seminar One

September 5, 2024
Turlington Hall, Room L011

Tucker McElroy
US Census Bureau

Title:

Non-nested model comparisons of differencing operators for non-stationary time series.

Abstract:

We study the problem of comparing two differencing operators for a non-stationary time series. The differencing operators may be non-nested, and either one (or both) may be incorrect; we assume that their least common multiple is sufficient to yield a stationary time series. The average of the squares of a time series sample converges to a constant if the time series is stationary but is explosive in the non-stationary case; we use this observation to compare two specifications via a difference of average squares, thereby testing the null hypothesis that both operators render the data stationary. We employ a studentization that removes nuisance scale parameters from the limiting distribution while maintaining power against the alternative hypothesis that one or both operators are inadequate.
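As a rough illustration of the average-squares observation in this abstract (not the speaker's test statistic or studentization), the sketch below simulates a Gaussian random walk and compares the average of squares before and after first differencing. The function names, the seed, and the choice of a random walk are assumptions made for this example only.

```python
import random


def random_walk(n, seed=0):
    """Simulate n steps of a Gaussian random walk (a non-stationary series)."""
    rng = random.Random(seed)
    level = 0.0
    walk = []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def difference(xs):
    """Apply the first-difference operator (1 - B) to a series."""
    return [b - a for a, b in zip(xs, xs[1:])]


def average_squares(xs):
    """Average of the squared observations in a sample."""
    return sum(x * x for x in xs) / len(xs)


if __name__ == "__main__":
    walk = random_walk(2000)
    # Undifferenced: the average of squares grows with the sample size.
    print("raw series:        ", average_squares(walk))
    # Differenced: the series is white noise, so the average settles near 1.
    print("differenced series:", average_squares(difference(walk)))
```

For the random walk, the differenced series is stationary white noise, so its average of squares stays near the innovation variance, while the undifferenced average keeps growing as more data arrive.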

Seminar Two

September 19, 2024
Turlington Hall, Room L011

Linjun Zhang
Rutgers University

Title:

Finite-Sample and Distribution-Free Fair Classification: Optimal Excess Risk-Fairness Trade-off and the Cost of Group-Blindness

Abstract:

Algorithmic fairness in machine learning has attracted significant attention recently, yet the impact of group fairness on excess risk remains unclear. Despite the widespread adoption of group-blindness to promote fairness, its effectiveness is uncertain. In this work, we explore the influence of fairness and group-blindness in the context of binary classification with group fairness constraints. Specifically, we propose a unified framework for fair classification with excess risk control and distribution-free and finite-sample fairness guarantees for various group fairness notions in both group-aware and group-blind scenarios. Moreover, for binary sensitive attributes, a minimax excess risk lower bound is provided, confirming the minimax optimality of the proposed algorithm. The minimax excess risk reveals the inherent trade-off between excess risk and fairness, and uncovers the inevitable cost of group-blindness, which may lead to constant excess risk in extreme cases. Through simulation studies and real data analysis, we illustrate the superior performance of our algorithm compared to existing methods and also provide empirical evidence for our theoretical findings.

Seminar Three

October 3, 2024
Turlington Hall, Room L011

Aaditya Ramdas
Carnegie Mellon University

Title:

A martingale theory of evidence

Abstract:

This talk will describe an approach towards testing hypotheses and estimating functionals that is based on games. In short, to test a (possibly composite, nonparametric) hypothesis, we set up a game in which no betting strategy can make money under the null (the wealth is an “e-process” under the null). But if the null is false, then smart betting strategies will have exponentially increasing wealth. Thus, hypotheses are rewritten as constraints in games, the statistician is a gambler, test statistics are betting strategies, and the wealth obtained is directly a measure of evidence which is valid at any data-dependent stopping time (an e-value). The optimal betting strategies are typically Bayesian, but the guarantees are frequentist. This “game perspective” provides new statistically and computationally efficient solutions to many modern problems, like nonparametric independence or two-sample testing by betting, estimating means of bounded random variables, testing exchangeability, and so forth.
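To make the betting picture concrete, here is a minimal sketch of one of the simplest cases mentioned in this abstract: testing whether observations bounded in [0, 1] have mean `m`. It uses a fixed bet size `lam`, which is an assumption for illustration and not one of the optimal strategies from the talk; the function names are likewise hypothetical.

```python
def betting_wealth(xs, m, lam):
    """Wealth path of a gambler betting against H0: E[X] = m, with X in [0, 1].

    Each round multiplies the wealth by 1 + lam * (x - m). Under H0 this
    factor has expectation 1, so the wealth is a nonnegative martingale
    that starts at 1.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    if not -1.0 / (1.0 - m) < lam < 1.0 / m:
        raise ValueError("lam must keep every wealth factor positive")
    wealth = 1.0
    path = []
    for x in xs:
        wealth *= 1.0 + lam * (x - m)
        path.append(wealth)
    return path


def first_rejection(path, alpha):
    """First time the wealth reaches 1/alpha, or None if it never does."""
    threshold = 1.0 / alpha
    for t, wealth in enumerate(path, start=1):
        if wealth >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Three observations equal to 1 against the null mean 0.5, betting lam = 1.
    path = betting_wealth([1.0, 1.0, 1.0], m=0.5, lam=1.0)
    print(path)                              # [1.5, 2.25, 3.375]
    print(first_rejection(path, alpha=0.5))  # 2
```

Because the wealth is a nonnegative martingale under the null, stopping the first time it reaches 1/alpha keeps the type I error at most alpha regardless of when the data collection ends.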

Seminar Four

October 17, 2024
Turlington Hall, Room L011

Ryan Martin
North Carolina State University

Title:

Regularized e-processes: Anytime-valid inference with knowledge-based information gain

Abstract:

There’s been a recent push to develop statistical methods that are anytime-valid, i.e., where the frequentist error rate control properties hold no matter how the investigator decides to stop the data collection process, even if the data itself is used to make that decision. As expected, the price that one pays for anytime-validity is that the solutions tend to be more conservative and, hence, less efficient than the textbook solutions that are tuned to fixed sample sizes. Perhaps prior knowledge about the quantity of interest can be used to improve efficiency, but how? Arbitrary adjustments to an e-process will surely jeopardize its anytime-valid properties. In this talk, I’ll present a principled approach to knowledge-based regularization, which allows for efficiency gains while maintaining anytime-validity. Specifically, my knowledge-based regularization does two things: first, it directly inflates the e-process at values incompatible with prior knowledge; second, it relaxes the notion of anytime-validity in a way that’s consistent with prior knowledge. The main result is a generalized version of Ville’s inequality, which is used to show how the regularized e-process offers anytime-valid and (more) efficient inference. I’ll also briefly explain how regularized e-processes lead to reliable (imprecise-probabilistic) uncertainty quantification.
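For reference, the classical form of Ville's inequality that this abstract builds on can be stated as follows. This is the standard statement, not the generalized version presented in the talk; the symbols $E_t$, $H_0$, and $\alpha$ are notation chosen here.

```latex
% Classical Ville's inequality for an e-process (E_t)_{t \ge 1},
% i.e. a nonnegative process dominated by a test martingale under H_0.
\[
  \sup_{P \in H_0} \;
  P\!\left( \exists\, t \ge 1 : E_t \ge \frac{1}{\alpha} \right) \le \alpha ,
  \qquad \alpha \in (0, 1).
\]
```

In words: rejecting the first time the e-process reaches $1/\alpha$ controls the type I error at level $\alpha$ under any stopping rule.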

Challis Lecture

October 29-30, 2024
Rietz Union

Francesca Dominici
Harvard University

2024 Challis Lecture Information 

Seminar Five

November 14, 2024
Turlington Hall, Room L011

Le Bao
Penn State

Title:

Mapping the Marginalized Groups with Presence-Only Data

Abstract:

Female sex workers (FSW) are affected by individual, network, and structural risks, making them vulnerable to poor health and well-being. Because of social stigma and discrimination towards FSW, it is difficult to measure or estimate the size and location of FSW at any spatial resolution, especially a fine-scale resolution. In this study, we develop zero-inflated models to estimate the female sex worker population at the grid-cell level across Eastern and Southern African countries, extending the presence-only data analysis literature. We also propose a subsampling procedure to decrease the computation time of zero-inflated models substantially.