Fall 2022 Colloquia


 

Friday, December 2:  Xiwei Tang (Dept. of Statistics, University of Virginia)

11:00 a.m. via Zoom

Title: High-dimensional Point Process Regression with Applications in Neural Activity Analysis

Abstract: Point process modeling is gaining increasing attention, as point process data are emerging in numerous scientific applications. Motivated by a neuronal spike train study, we propose a novel point process regression model in which both the response and the predictor can be high-dimensional point processes. We model the predictor effects through the conditional intensities using a set of basis transferring functions in a convolutional fashion. We organize the corresponding transferring coefficients in the form of a three-way tensor, and then impose low-rank, sparsity, and subgroup structures on this coefficient tensor. These structures help reduce the dimensionality, integrate information across the individual processes, and facilitate interpretation. A highly scalable optimization algorithm is developed for parameter estimation. We derive a large-sample error bound for the recovered transferring coefficient structure and establish subgroup identification consistency, while allowing the dimension of the multivariate point process to diverge. We demonstrate the efficacy of our method through both simulations and a cross-area neuronal spike train analysis in a sensory cortex study.
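
To make the convolutional intensity structure concrete, a model of the kind described above can be written as follows (the notation here is illustrative, not necessarily the speaker's):

\[
\lambda_j(t \mid \mathcal{H}_t) \;=\; \mu_j \;+\; \sum_{i} \int_0^{t} f_{ij}(t-s)\, dN_i(s),
\qquad
f_{ij}(t) \;=\; \sum_{k=1}^{K} \beta_{ijk}\, b_k(t),
\]

where the \(b_k\) are the basis transferring functions and the coefficients \(\beta_{ijk}\) form the three-way tensor on which the low-rank, sparsity, and subgroup structures are imposed.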

 

Friday, November 18: Efstathia Bura (Vienna University of Technology)

11:00 a.m. via Zoom

Title: Sufficient Reductions in Regression With Mixed Predictors

Abstract: Most data sets comprise measurements on both continuous and categorical variables. Yet modeling high-dimensional mixed predictors has received limited attention in the statistics literature on regression and classification. We study the general regression problem of inferring about a variable of interest based on high-dimensional mixed continuous and binary predictors. The aim is to find a lower-dimensional function of the mixed predictor vector that contains all the modeling information in the mixed predictors for the response, which can be either continuous or categorical. The approach we propose identifies sufficient reductions by reversing the regression and modeling the mixed predictors conditional on the response. We derive the maximum likelihood estimator of the sufficient reductions, asymptotic tests for dimension, and a regularized estimator, which simultaneously achieves variable (feature) selection and dimension reduction (feature extraction).

We study the performance of the proposed method and compare it with other approaches through simulations and real data examples.

 

Wednesday, November 9:  Trevor Hastie (John A. Overdeck Professor of Mathematical Sciences, Professor of Statistics, and Professor of Biomedical Data Science, Stanford University), Hollander Distinguished Lecturer 2022

11:00 a.m. in person & via Zoom

Title: Cross-validation in model selection and assessment

Abstract: Cross-validation is ubiquitous in data science and is used for both model selection and assessment. Yet in some regards it is poorly understood. In this talk we discuss three aspects of CV:
• What CV estimates.
• Confidence intervals for prediction error using nested CV (see the sketch below).
• Out-of-bag error for random forests and standard error estimates.
The research discussed is joint work with Stephen Bates, a post-doctoral researcher at the University of California, Berkeley; Samyak Rajanala, a doctoral student at Stanford University; and Rob Tibshirani, a statistics professor at Stanford.
This lecture is dedicated to the late Leo Breiman, a distinguished statistician at the University of California, Berkeley, and Colin Mallows, a renowned statistician who worked at Bell Labs and AT&T Labs for forty years.
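
The following is a minimal sketch of generic nested cross-validation, in which an inner loop tunes the model and an outer loop assesses prediction error on data never used for tuning. It is illustrative only (the estimator, grid, and scikit-learn usage are our choices), not the specific confidence-interval construction discussed in the talk.

```python
# Minimal nested cross-validation sketch (generic; not the talk's exact procedure).
# Inner CV selects the tuning parameter; outer CV estimates prediction error.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)   # model selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)   # error assessment

tuned = GridSearchCV(LogisticRegression(max_iter=1000),
                     {"C": [0.01, 0.1, 1.0, 10.0]}, cv=inner_cv)
scores = cross_val_score(tuned, X, y, cv=outer_cv)           # outer folds never tune

print(f"nested-CV error estimate: {1 - scores.mean():.3f} (sd {scores.std():.3f})")
```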

 

Friday, November 4:  Yoonkyung Lee (Dept. of Statistics, Ohio State University)

11:00 a.m. via Zoom

Title: Predictive Model Degrees of Freedom in Linear Regression

Abstract: Overparametrized interpolating models have drawn increasing attention in machine learning. Some recent studies suggest that regularized interpolating models can generalize well. This phenomenon seemingly contradicts the conventional wisdom that interpolation tends to overfit the data and may perform poorly on test data. Further, it appears to defy the bias-variance trade-off. As one of the shortcomings of the existing theory, the classical notion of model degrees of freedom fails to explain the intrinsic difference among interpolating models, since it focuses on estimation of in-sample prediction error. This motivates an alternative measure of model complexity that can differentiate those interpolating models and take different test points into account. In particular, we propose a measure with a proper adjustment based on the squared covariance between the predictions and observations. Our analysis of the least squares method reveals some interesting properties of the measure, which can reconcile the "double descent" phenomenon with the classical theory. This opens doors to an extended definition of model degrees of freedom in modern predictive settings.
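
For reference, the classical in-sample notion of model degrees of freedom mentioned above is the covariance form

\[
\mathrm{df}(\hat{f}) \;=\; \frac{1}{\sigma^2} \sum_{i=1}^{n} \operatorname{Cov}\bigl(\hat{y}_i,\, y_i\bigr),
\]

where \(\sigma^2\) is the noise variance. The measure proposed in the talk adjusts this covariance-based quantity, using the squared covariance between predictions and observations, so that it can distinguish interpolating models and account for the test-point distribution.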

 

Friday, October 28:  Boxiang Wang (Dept. of Statistics and Actuarial Science, The University of Iowa) 

11:00 a.m. via Zoom

Title: A Consolidated Cross-Validation Algorithm for Support Vector Machines via Data Reduction

Abstract: In statistics and machine learning, classification is ubiquitous, with wide applications in many domain fields. Practitioners have long sought better methods in terms of accuracy, and for several decades the support vector machine (SVM) has been viewed as one of the most successful classifiers. In the present climate, however, the SVM is gradually losing ground to other methods; one of the reasons is that the computation of the SVM is intensive, even prohibitively so for large-scale problems. In this work, we propose a consolidated cross-validation (CV) algorithm for the SVM on reproducing kernel Hilbert spaces. The consolidated CV algorithm utilizes an exact leave-one-out formula for the SVM and accelerates the computation through a data reduction strategy. Our proposed algorithm directly yields the tuned SVM classifier for practical use. In addition, it is common practice to use the SVM with an intercept term, which typically leads to better prediction accuracy but cannot be handled by the data reduction methods. To this end, we further propose a novel two-stage consolidated CV algorithm to handle the SVM with an intercept. With extensive simulations and benchmark data applications, we demonstrate that our algorithm is about an order of magnitude faster than the mainstream SVM solvers, kernlab and LIBSVM, with the same accuracy.
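
For context, the brute-force baseline that such an algorithm accelerates is ordinary grid-search k-fold CV for a kernel SVM, sketched below with scikit-learn (illustrative only; the consolidated CV method avoids refitting an SVM for every fold and parameter combination).

```python
# Baseline: brute-force k-fold CV tuning of an RBF-kernel SVM (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(SVC(kernel="rbf"), grid, cv=cv, n_jobs=-1)
search.fit(X, y)                      # fits one SVM per (parameter, fold) pair
print(search.best_params_, search.best_score_)
```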

 

Friday, October 21:  Jeffrey Morris (Dept. of Biostatistics, Epidemiology and Informatics, University of Pennsylvania)

11:00 a.m. via Zoom

Title: Connectivity Regression

Abstract: One key scientific problem in neuroscience involves assessing how functional connectivity networks in the brain vary across individuals and subject-specific covariates. We introduce a general framework for regressing subject-specific connectivity networks on covariates while accounting for inter-edge dependence within the network. The approach utilizes a matrix-logarithm function to transform the network object into an alternative space in which Gaussian assumptions are justified and positive semidefinite constraints are automatically satisfied. Multivariate regression models are fit in this space, with the covariance accounting for inter-edge network dependence, and multivariate penalization is used to induce sparsity in regression coefficients and covariance elements. We use permutation tests to perform multiplicity-adjusted inference to identify which covariates affect connectivity, and stability selection scores to indicate which network circuits vary by covariate. Simulation studies validate the inferential properties of the proposed method and demonstrate how estimating and accounting for inter-edge dependence when present leads to more efficient estimation, more powerful inference, and more accurate selection of which network circuits vary by covariates. We apply our method to data from the Human Connectome Project Young Adult study, revealing insights into how connectivity varies across language processing covariates and structural brain features.
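
The matrix-logarithm step can be sketched as follows; the code is a minimal illustration of transforming one subject's connectivity (correlation) matrix into an unconstrained edge-level vector, not the authors' software.

```python
# Illustrative matrix-log transform of a subject's connectivity matrix.
# Off-diagonal entries of log(R) live in an unconstrained space, so Gaussian
# regression models can be fit to them across subjects.
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 6))      # toy time series: 200 scans x 6 regions
R = np.corrcoef(ts, rowvar=False)       # positive definite connectivity matrix

L = logm(R).real                        # matrix logarithm (symmetric for SPD input)
iu = np.triu_indices_from(L, k=1)
edge_features = L[iu]                   # vectorized "edges" used as the response

print(edge_features.shape)              # (15,) = 6*5/2 edge-level features
```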

Joint work with Neel Desai, Veera Baladandayuthapani, and Taki Shinohara

 

Friday, October 14:  Robert T. Krafty (Dept. of Biostatistics & Bioinformatics, Emory University)

11:00 a.m. via Zoom

Title: Comparing Populations of High-Dimensional Time Series Spectra

Abstract: Technological advances have led to an increase in the breadth and number of studies that collect high-dimensional time series signals, such as EEG, from multiple groups and whose scientific goal is to understand differences in time series spectra between the groups. Although methods have been proposed for comparing populations of power spectra that are univariate functions of frequency, often referred to as analysis of power (ANOPOW), rigorous methods are scarce when time series are high-dimensional and spectra are complex Hermitian matrix-valued functions. In this talk, we discuss a non-parametric Bayesian approach for ANOPOW with high-dimensional time series. The method models the collection of time series through a novel functional mixed effects factor model that can capture spectral differences between groups while accounting for within-group spectral variability. The approach is motivated by and used to analyze resting-state high-dimensional EEG in patients hospitalized for a first psychotic episode to understand how their electrophysiology differs from that of healthy controls.
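
As a concrete picture of the objects being compared, the sketch below estimates a frequency-indexed Hermitian spectral matrix for one multichannel series using standard cross-spectral estimates; this is only the input-level estimate, not the Bayesian ANOPOW model discussed in the talk.

```python
# Illustrative estimate of a spectral density matrix S(omega) for one
# multichannel series; each S[k] is (approximately) Hermitian.
import numpy as np
from scipy.signal import csd

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2048))                 # toy EEG: 4 channels x 2048 samples
fs, nperseg = 256.0, 256

p = x.shape[0]
freqs, _ = csd(x[0], x[0], fs=fs, nperseg=nperseg)
S = np.zeros((len(freqs), p, p), dtype=complex)    # spectral matrix at each frequency
for i in range(p):
    for j in range(p):
        _, S[:, i, j] = csd(x[i], x[j], fs=fs, nperseg=nperseg)

print(S.shape)                                     # (n_freqs, 4, 4)
```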

 

Friday, October 7:  Michele Guindani (Dept. of Statistics, UC Irvine)

11:00 a.m. via Zoom

Title: Bayesian Approaches for Capturing the Heterogeneity of Neuroimaging Experiments

Abstract: In the neurosciences, it is now widely established that brain processes are characterized by heterogeneity at several levels. For example, neuronal processes differ by external stimuli, and patterns of brain activation vary across subjects. In this talk, we will discuss a few Bayesian strategies for characterizing heterogeneity in the neurosciences, where time-series data are assumed to be organized in different, but related, units (e.g., neurons and/or regions of interest) and some sharing of information is required to learn distinctive features of the units. First, we will discuss models for multi-subject analysis that identify population subgroups characterized by similar brain activity patterns, also by integrating available subject information. Then, we will look at how novel techniques for intracellular calcium signals may be used to analyze neuronal responses to external stimuli in awake animals. Finally, we will discuss a mixture framework for identifying differentially activated brain regions that can classify the brain regions into several tiers with varying degrees of relevance. The performance of the models will be demonstrated by applications to data from human fMRI and animal fluorescence microscopy experiments.

 

Friday, September 30:  Jian Kang (Dept. of Biostatistics, University of Michigan)

11:00 a.m. via Zoom

Title: Bayesian Spatial Blind Source Separation via the Thresholded Gaussian Process

Abstract: Blind source separation (BSS) aims to separate latent source signals from their mixtures. For spatially dependent signals in high-dimensional and large-scale data, such as neuroimaging data, most existing BSS methods do not take into account the spatial dependence and the sparsity of the latent source signals. To address these major limitations, we propose a Bayesian spatial blind source separation (BSP-BSS) approach for neuroimaging data analysis. We model the expectation of the observed images as a linear mixture of multiple sparse and piecewise-smooth latent source signals, for which we construct a new class of Bayesian nonparametric prior models by thresholding Gaussian processes. We assign von Mises-Fisher (vMF) priors to the mixing coefficients in the model. Under some regularity conditions, we show that the proposed method has several desirable theoretical properties, including large support for the priors, consistency of the joint posterior distribution of the latent source intensity functions and the mixing coefficients, and selection consistency for the number of latent sources. We use extensive simulation studies and an analysis of resting-state fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) study to demonstrate that BSP-BSS outperforms existing methods for separating latent brain networks and detecting activated brain regions in the latent sources.
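
A minimal one-dimensional illustration of a thresholded Gaussian process is sketched below: a smooth GP draw is set to zero wherever its magnitude falls below a threshold, producing a sparse, piecewise-smooth latent source. The kernel and threshold are illustrative choices, not the BSP-BSS specification.

```python
# Minimal 1-D illustration of a thresholded Gaussian process.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 200)

# Squared-exponential covariance on the grid (length-scale 0.1)
K = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.1) ** 2)
gp_sample = rng.multivariate_normal(np.zeros(len(grid)), K + 1e-8 * np.eye(len(grid)))

tau = 0.5                                        # threshold level (hyperparameter)
source = np.where(np.abs(gp_sample) > tau, gp_sample, 0.0)

print(f"sparsity: {np.mean(source == 0):.2f}")   # fraction of the domain set to zero
```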

 

Friday, September 23:  Todd Ogden (Dept. of Biostatistics, Columbia University)

11:00 a.m. via Zoom

Title: Nonparametric Functional Data Modeling of Pharmacokinetic Processes with Applications in Dynamic PET Imaging

Abstract: Modeling a pharmacokinetic process typically involves solving a system of linear differential equations and estimating the parameters upon which the functions depend. For this approach to be valid, a number of fairly strong assumptions must hold, involving various aspects of the kinetic behavior of the substance being studied. In many situations, such models are understood to be simplifications of the "true" kinetic process. While in some circumstances such a simplified model may be a useful (and close) approximation to the truth, in other cases important aspects of the kinetic behavior cannot be represented. We present a nonparametric approach, based on principles of functional data analysis, to modeling pharmacokinetic data. We illustrate its use through application to data from a dynamic PET imaging study of the human brain.
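
For orientation, the simplest parametric model of this kind in dynamic PET is the one-tissue compartment model (standard notation, not specific to the talk):

\[
\frac{dC_T(t)}{dt} \;=\; K_1\, C_p(t) - k_2\, C_T(t)
\quad\Longrightarrow\quad
C_T(t) \;=\; K_1 \int_0^{t} e^{-k_2 (t-s)}\, C_p(s)\, ds,
\]

where \(C_p\) is the arterial input function, \(C_T\) the tissue time-activity curve, and \((K_1, k_2)\) the kinetic parameters. The nonparametric functional-data approach avoids committing to such a fixed parametric form.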

 

Friday, September 16:  Debashis Ghosh (Dept. of Biostatistics and Informatics, University of Colorado)

11:00 a.m. via Zoom

Title: Navigating Through Spatially Resolved Cell Imaging Data: Marrying Deep Learning and Statistics

Abstract: Recently, there has been a growth in technologies for profiling tissues on slide platforms, including multiplex immunohistochemistry, multispectral imaging, sequential fluorescence in situ hybridization, multiplexed ion beam imaging, and related protocols. Access to these data will expand the scope of biologists and clinicians to study molecular heterogeneity in a wide swath of disease settings. In addition, profiling studies are integrating data from the above platforms with single-cell or spatial transcriptomics in order to better understand disease etiology. Doing so will require principled analysis of these data. We describe some recent methods developed in our group to model and better understand these data. This is joint work with the Multiplex Imaging Group at the University of Colorado.

 

Friday, September 9:  Lily Wang (Dept. of Statistics, George Mason University) 

11:00 a.m. via Zoom

Title: Statistical Inference for Mean Functions of 3D Functional Objects

Abstract: Functional data analysis has become a powerful tool for the statistical analysis of complex objects such as curves, images, shapes, and manifold-valued data. Among these data objects, 2D and 3D images obtained using medical imaging technologies have been attracting researchers' attention. In general, complex 3D objects are usually collected within irregular boundaries, whereas the majority of existing statistical methods focus on regular domains. To address this problem, we model the complex data objects as functional data and propose trivariate spline smoothing based on tetrahedralizations for estimating the mean functions of 3D functional objects. The asymptotic properties of the proposed estimator are systematically investigated, with consistency and asymptotic normality established. We also provide a computationally efficient estimation procedure for the covariance function and the corresponding eigenvalues and eigenfunctions, and derive uniform consistency. Motivated by the need for statistical inference for complex functional objects, we then present a novel approach for constructing simultaneous confidence corridors to quantify estimation uncertainty. Extension of the procedure to the two-sample case is discussed, together with numerical experiments and a real-data application using the Alzheimer's Disease Neuroimaging Initiative database.

 

Friday, September 2:  Peng Ding (Dept. of Statistics, UC Berkeley)

11:00 a.m. in person (214 Duxbury Hall) & via Zoom

Title: To adjust or not to adjust? Estimating the average treatment effect in randomized experiments with missing covariates

Abstract: Complete randomization allows for consistent estimation of the average treatment effect based on the difference in means of the outcomes, without strong modeling assumptions on the outcome-generating process. Appropriate use of pretreatment covariates can further improve estimation efficiency. However, missingness in covariates is common in experiments and raises an important question: should we adjust for covariates subject to missingness, and if so, how? The unadjusted difference in means is always unbiased. The complete-covariate analysis adjusts for all completely observed covariates and improves the efficiency of the difference in means if at least one completely observed covariate is predictive of the outcome. What, then, is the additional gain of adjusting for covariates subject to missingness? A key insight is that the missingness indicators act as fully observed pretreatment covariates as long as missingness is not affected by the treatment, and can thus be used in covariate adjustment to bring additional estimation efficiency. This motivates adding the missingness indicators to the regression adjustment, yielding the missingness-indicator method, a well-known but not especially popular strategy in the missing-data literature. We recommend it due to its many advantages. We also propose modifications to the missingness-indicator method based on asymptotic and finite-sample considerations. To reconcile conflicting recommendations in the missing-data literature, we analyze and compare various strategies for analyzing randomized experiments with missing covariates under the design-based framework, which treats randomization as the basis for inference and does not impose any modeling assumptions on the outcome-generating process or the missing-data mechanism.
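
The missingness-indicator idea can be sketched in a few lines: fill in the missing covariate values with a constant, add the missingness indicator as an extra covariate, and include both in the regression adjustment for the treatment effect. The code below is a minimal illustration on simulated data (variable names and the use of statsmodels are our choices), not the paper's implementation.

```python
# Minimal sketch of the missingness-indicator method in a randomized experiment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
z = rng.binomial(1, 0.5, n)                      # randomized treatment
x = rng.standard_normal(n)                       # pretreatment covariate
y = 1.0 * z + 0.8 * x + rng.standard_normal(n)   # outcome

miss = rng.binomial(1, 0.3, n).astype(bool)      # missingness NOT affected by treatment
x_obs = np.where(miss, 0.0, x)                   # constant imputation of missing values
m = miss.astype(float)                           # missingness indicator as a covariate

X = sm.add_constant(np.column_stack([z, x_obs, m]))
fit = sm.OLS(y, X).fit(cov_type="HC2")           # robust (design-based-friendly) SEs
print(fit.params[1], fit.bse[1])                 # adjusted ATE estimate and its SE
```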

 

Friday, August 26:  Yun Li (Dept. of Biostatistics, UNC Chapel Hill)

11:00 a.m. via Zoom

Title: Cell Composition Inference and Identification of Layer-specific Transcriptional Profiles with POLARIS

Abstract: Spatial transcriptomics (ST) technology, which provides spatially resolved transcriptional profiles, facilitates an advanced understanding of key biological processes related to health and disease. Sequencing-based ST technologies provide whole-transcriptome profiles but are limited by their lack of single-cell resolution. Not knowing the number of cells or the cell type composition at each spot can lead to invalid downstream analysis, a critical issue recognized in ST data analysis. The methods developed so far, however, tend to under-utilize histological images, which conceptually provide important and complementary information, including anatomical structure and the distribution of cells. To fill this gap, we present POLARIS, a versatile ST analysis method that can perform cell type deconvolution, identify anatomical or functional layer-wise differentially expressed (LDE) genes, and enable cell composition inference from histology images. Applied to four tissues, POLARIS demonstrates high deconvolution accuracy, accurately predicts cell composition solely from images, and identifies LDE genes that are biologically relevant and meaningful.

 

Previous Colloquia

Spring 2022 Colloquia

Fall 2021 Colloquia

Spring 2021 Colloquia

Fall 2020 Colloquia

Spring 2020 Colloquia

Fall 2019 Colloquia

Spring 2019 Colloquia

Fall 2018 Colloquia

Spring 2018 Colloquia

Fall 2017 Colloquia

Spring 2016 Colloquia Part II

Fall 2016 Colloquia

Spring 2016 Colloquia