Fall 2019 Colloquia
Friday, December 6th: Dr. Minge Xie, Rutgers University
214 Duxbury Hall, 10:00am
Title: Repro Sampling Method for Joint Inference of Model Selection and Regression Coefficients in High Dimensional Linear Models
Abstract: This paper proposes a new and effective simulation-based approach, called the Repro Sampling method, to conduct statistical inference in high dimensional linear models. The Repro method creates and studies the performance of artificial samples (referred to as Repro samples) that are generated by mimicking the sampling mechanism that generated the true observed sample. By doing so, this method provides a new way to quantify model and parameter uncertainty and provides confidence sets with guaranteed coverage rates on a wide range of problems. A general theoretical framework and an effective Monte-Carlo algorithm, with supporting theories, are developed for high dimensional linear models. This method is used to jointly create confidence sets of selected models and model coefficients, with both exact and asymptotic inference theories provided. It also provides a theoretical development to support the computational efficiency. Furthermore, this development allows us to handle inference problems involving covariates that are perfectly correlated. A new and intuitive graphical tool to present uncertainties in model selection and regression parameter estimation is also developed. We provide numerical studies to demonstrate the utility of the proposed method in a range of problems. Numerical comparisons suggest that the method is far better (in terms of improved coverage rates and significantly reduced sizes of confidence sets) than the approaches that are currently used in the literature. The development provides a simple and effective solution for the difficult post-selection inference problems.
(Joint work with Peng Wang)
Friday, November 22nd: Dr. Xiaofeng Shao, University of Illinois at Urbana-Champaign
214 Duxbury Hall, 10:00am
Title: Dependence Testing in High Dimension
Abstract: The talk consists of two parts: mean independence testing and mutual independence testing for high dimensional data. In the first part, we first introduce Martingale difference divergence (MDD), which is a metric that quantifies the mean dependence of a random vector Y given another random vector X and can be viewed as an extension of distance covariance. We propose a novel test to assess the mean dependence of a response variable on a large number of covariates. Our MDD-based procedure is able to detect certain types of departure from the null hypothesis of mean independence without making any specific model assumptions. We establish the asymptotic normality of the proposed test statistic under suitable assumptions that can be verified for covariates with banded dependence or Gaussian distribution. Power analysis and a wild bootstrap procedure will also be presented along with some simulation results.
In the second part, we propose an L2-type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed based on the pairwise distance covariance and it accounts for the non-linear and non-monotone dependences among the data, which cannot be fully captured by the existing tests based on either Pearson correlation or rank correlation. Both theoretical results and finite sample results will be presented.
Friday, November 15th: Dr. Ian McKeague, Columbia University
214 Duxbury Hall, 10:00am
Title: Multiplicative Cascades and Wigner's Semicircle Law
Abstract: Various aspects of standard model particle physics might be explained by a suitably rich algebra acting on itself, as suggested recently by Furey (2015). This talk discusses the statistical behavior of large causal tree diagrams that combine freely independent elements in such an algebra. It is shown that some of the familiar limiting distributions in random matrix theory (namely the Marchenko-Pastur law and Wigner's semicircle law) emerge in this setting as limits of normalized sums-over-paths of non-negative elements of the algebra assigned to the edges of the tree. These results are established in the setting of non-commutative probability. Trees with classically independent positive edge weights (random multiplicative cascades) were originally proposed by Mandelbrot as a model displaying the fractal features of turbulence. The novelty of the present approach is the use of non-commutative (free) probability to allow the edge weights to take values in an algebra. Potential applications in theoretical neuroscience related to Alan Turing's famous "Imitation Game" paper are also discussed.
Friday, November 8th: Dr. Christopher Wikle, University of Missouri
214 Duxbury Hall, 10:00am
Title: Hybrid Deep Neural/Statistical Models for Dynamic Spatio-Temporal Processes
Abstract: Spatio-temporal data are ubiquitous in the sciences and engineering, and their study is important for understanding and predicting a wide variety of processes. One of the difficulties with statistical modeling of spatial processes that change in time is the complexity of the dependence structures that must describe how such a process varies, and the presence of high-dimensional complex datasets and large prediction domains. It is particularly challenging to specify parameterizations for nonlinear dynamic spatio-temporal models (DSTMs) that are simultaneously useful scientifically, efficient computationally, and allow for proper uncertainty quantification. Here we show two approaches that have facilitated such an implementation by the use of deep neural models. In one case, we utilize reservoir computing, via deep echo-state network models, to generate multi-scale basis functions to perform prediction for nonlinear spatio-temporal processes. The requisite basis function expansion with random coefficients allows the conditional mean to be considered as a high-dimensional linear process for Gaussian or non-Gaussian data; the complex nonlinear spatio-temporal dynamics of the process can be captured in the construction of the deep-model generated basis functions. The second approach utilizes a deep convolutional neural network to learn the kernel mapping function in a state-dependent integro-difference equation (IDE) spatio-temporal dynamic model. We show that this model has the remarkable ability to predict a process (weather radar storm cell movement) completely different from the one on which it was trained (sea surface temperature data).
Friday, November 1st: Dr. Ariel Aloe, University of Iowa
214 Duxbury Hall, 10:00am
Title: Combining Evidence: Something New, Something Old, & Something Borrowed
Abstract: With recent emphasis on the lack of replicability of primary studies, more than ever it seems relevant to engage critically in methods to combine evidence. Although some methods are better known than others, several methods to combine empirical evidence have been available to researchers for many years. In this talk, I will attempt to summarize alternative approaches to the synthesis of quantitative evidence. I will briefly illustrate some of the current methodological issues related to a synthesis of ADHD interventions. Finally, I will discuss the biggest challenges, at least in my opinion, that we face when synthesizing data to inform policy and practice.
Friday, October 18th: Dr. Suyu Lin, MD Anderson
214 Duxbury Hall, 10:00am
Title: A Bayesian Phase I/II Trial Design for Immunotherapy
Abstract: Immunotherapy is an innovative treatment approach that stimulates a patient's immune system to fight cancer. It demonstrates characteristics distinct from conventional chemotherapy, and stands to revolutionize the treatment of cancer. We propose a Bayesian phase I/II dose-finding design that incorporates the unique features of immunotherapy by simultaneously considering three outcomes: immune response, toxicity and efficacy. The objective is to identify the biologically optimal dose, defined as the dose with the highest desirability in the risk-benefit tradeoff. An Emax model is utilized to describe the marginal distribution of the immune response. Conditional on the immune response, we jointly model toxicity and efficacy using a latent variable approach. Based on accumulating data, we adaptively randomize patients to experimental doses based on the continuously updated model estimates. A simulation study shows that our proposed design has good operating characteristics in terms of selecting the target dose and allocating patients to the target dose.
Friday, October 11th: Dr. Tianfu Wu, North Carolina State University
214 Duxbury Hall, 10:00am
Title: Grammar Guided Interpretable Representation Learning
Abstract: Grammar models are natural, interpretable and fundamental schema in both language and image representation learning. “Grammar in language is merely a recent extension of much older grammars that are built into the brains of all intelligent animals to analyze sensory input, to structure their actions and even formulate their thoughts.” as Prof. David Mumford believed. Can grammars help us design better deep machine learning models that are interpretable and parsimonious? In this talk, we shall explore two ways of harnessing the best of the two worlds, Grammars and DNNs: using a simple 1-D grammar to rethink and unify the design of neural architectures that achieve state-of-the-art results in computer vision, and using a simple 2-D grammar to rationalize state-of-the-art DNN-based object detection systems.
Friday, October 4th: Dr. Joseph Antonelli, University of Florida
214 Duxbury Hall, 10:00am
Title: Estimating the Health Effects of Environmental Mixtures Using Bayesian Semiparametric Regression and Sparsity Inducing Priors
Abstract: Humans are routinely exposed to mixtures of chemical and other environmental factors, making the quantification of health effects associated with environmental mixtures a critical goal for establishing environmental policy sufficiently protective of human health. The quantification of the effects of exposure to an environmental mixture poses several statistical challenges. It is often the case that exposures to multiple pollutants interact with each other to affect an outcome. Further, the exposure-response relationship between an outcome and some exposures, such as some metals, can exhibit complex, nonlinear forms, since some exposures can be beneficial and detrimental at different ranges of exposure. To estimate the health effects of complex mixtures we propose a flexible Bayesian approach that allows exposures to interact with each other and have nonlinear relationships with the outcome. We induce sparsity using multivariate spike and slab priors to determine which exposures are associated with the outcome, and which exposures interact with each other. The proposed approach is interpretable, as we can use the posterior probabilities of inclusion into the model to identify pollutants that interact with each other. We utilize our approach to study the impact of exposure to metals on child neurodevelopment in Bangladesh, and find a nonlinear, interactive relationship between Arsenic and Manganese.
Friday, September 29th: Dr. Lifeng Lin, Florida State University
214 Duxbury Hall, 10:00am
Title: Predictive Treatment Ranking in Bayesian Network Meta-analysis
Abstract: Network meta-analysis (NMA) is an important tool to provide high-quality evidence about available treatments' benefits and harms for comparative effectiveness research. Compared with conventional meta-analyses that synthesize related studies for pairs of treatments separately, an NMA uses both direct and indirect evidence to simultaneously compare all available treatments for a certain disease. It is of primary interest for clinicians to rank these treatments and select the optimal ones for patients. Various methods have been proposed to evaluate treatment ranking; among them, the mean rank and the so-called surface under the cumulative ranking curve (SUCRA) are widely used in current practice of NMAs. However, these measures only summarize treatment ranks among the studies collected in the NMA; due to heterogeneity between studies, they cannot predict treatment ranks in a future study and thus may not be directly applied to healthcare for new patients. We propose innovative measures to predict treatment ranks by accounting for the heterogeneity between the existing studies in an NMA and a new study. They are the counterparts of the mean rank and the SUCRA under the new study setting. We use two illustrative examples and a simulation study to evaluate the performance of the proposed measures.
Friday, September 13th: Dr. Gongjun Xu, University of Michigan
204 Duxbury Hall, 10:00am
Title: Identifiability of Restricted Latent Class Models
Abstract: Latent class models have wide applications in social and biological sciences. In many applications, pre-specified restrictions are imposed on the parameter space of latent class models, through a design matrix, to reflect practitioners' diagnostic assumptions about how the observed responses depend on the respondents' latent traits. Though widely used in various fields, such restricted latent class models suffer from nonidentifiability due to the models' discrete nature and complex restricted structure. This talk addresses the fundamental identifiability issue of restricted latent class models by developing a general framework for strict and partial identifiability of the model parameters. The developed identifiability conditions only depend on the design matrix and are easily checkable, which provides useful practical guidelines for designing statistically valid diagnostic tests. Furthermore, the new theoretical framework is applied to establish, for the first time, the identifiability of several designs from cognitive diagnosis applications.