Spring 2016 Colloquia Part II

Spring 2016 Colloquia

 

Friday, January 13: Dr. Meng Li, Duke University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: New developments in probabilistic image analysis: boundary detection and image reconstruction

Abstract: Images (2D, 3D, or even higher dimensional) are a fundamental data type. The area of image analysis is undergoing a dramatic transformation to utilize the power of statistical modeling, which provides a unique way to describe uncertainties and leads to model-based solutions. We exemplify this with two critical and challenging problems, boundary detection and image reconstruction, treated comprehensively from theory and methodology to application. We view the boundary as a closed smooth lower-dimensional manifold, and propose a nonparametric Bayesian approach based on priors indexed by the unit sphere. The proposed method achieves four goals: guaranteed geometric restriction, a (nearly) minimax optimal rate adapting to the smoothness level, convenience for joint inference, and computational efficiency. We introduce a probabilistic model-based technique using wavelets with adaptive random partitioning to reconstruct images. We represent multidimensional signals by a mixture of one-dimensional wavelet decompositions in the form of randomized recursive partitioning on the space of wavelet coefficient trees, where the decomposition adapts to the geometric features of the signal. State-of-the-art performance of the proposed methods is demonstrated using simulations and applications including neuroimaging in brain oncology. R/Matlab packages/toolboxes and interactive shiny applications are available for routine implementation.

 

Friday, January 20: Lifeng Lin, University of Minnesota (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: On evidence cycles in network meta-analysis

Abstract: As an extension of pairwise meta-analysis of two treatments, network meta-analysis has recently attracted many researchers in evidence-based medicine because it simultaneously synthesizes both direct and indirect evidence from multiple treatments and thus facilitates better decision making. The Lu–Ades Bayesian hierarchical model is a popular method to implement network meta-analysis, and it is generally considered more powerful than conventional pairwise meta-analysis, leading to more accurate effect estimates with narrower confidence intervals. However, the improvement of effect estimates produced by Lu–Ades network meta-analysis has never been studied theoretically. In this talk, we show that such improvement depends highly on evidence cycles in the treatment network. Specifically, Lu–Ades network meta-analysis produces posterior distributions identical to separate pairwise meta-analyses for all treatment comparisons when a treatment network does not contain cycles. Even in a general network with cycles, treatment comparisons that are not contained in any cycles do not benefit from Lu–Ades network meta-analysis. Simulations and a case study are used to illustrate the equivalence of Lu–Ades network meta-analysis and pairwise meta-analysis in certain networks.
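
The role of evidence cycles described in this abstract can be illustrated with a small graph computation. The sketch below is not the speaker's code: it assumes the treatment network is given as a list of direct comparisons, each listed once, and the function name `comparisons_on_cycles` is hypothetical. It relies on the standard graph fact that an edge lies on some cycle exactly when it is not a bridge.

```python
from collections import defaultdict


def comparisons_on_cycles(edges):
    """Split direct comparisons into those on a cycle and those on none.

    `edges` is a list of (treatment_a, treatment_b) pairs, each direct
    comparison listed once (assumption). An edge is on some cycle if and
    only if it is not a bridge, so bridges are found with a depth-first
    search using discovery times and low-link values.
    """
    graph = defaultdict(list)
    for idx, (a, b) in enumerate(edges):
        graph[a].append((b, idx))
        graph[b].append((a, idx))

    disc, low = {}, {}
    bridges = set()
    counter = 0

    def dfs(node, parent_edge):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        for nxt, idx in graph[node]:
            if idx == parent_edge:
                continue  # do not walk back along the edge we arrived on
            if nxt in disc:
                # Back edge: node can reach an earlier-discovered vertex.
                low[node] = min(low[node], disc[nxt])
            else:
                dfs(nxt, idx)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > disc[node]:
                    bridges.add(idx)  # no alternative route around this edge

    for node in list(graph):
        if node not in disc:
            dfs(node, None)

    on_cycle = [e for i, e in enumerate(edges) if i not in bridges]
    off_cycle = [e for i, e in enumerate(edges) if i in bridges]
    return on_cycle, off_cycle


# Toy network: A, B, C form a triangle; D is compared only with C.
network = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
on_cycle, off_cycle = comparisons_on_cycles(network)
print("on a cycle:", on_cycle)
print("on no cycle:", off_cycle)
```

In this toy network the comparison C versus D lies on no cycle, so by the result stated in the abstract it would not be expected to benefit from the network analysis; a star-shaped network with a single common comparator has no cycles at all.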

 

Friday, January 27: Dr. Hwanhee Hong, Johns Hopkins University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Integrating Data for Comparative Effectiveness Research

Abstract: Comparative effectiveness research helps answer “what works best” and provides evidence on the effectiveness, benefits, and harms of different treatments. When multiple sources of data exist on a particular question, the evidence should be obtained by integrating those sources in a principled way. Network meta-analysis (NMA) is an extension of a traditional pairwise meta-analysis to compare multiple treatments simultaneously and take advantage of multiple sources of data. In some situations there are some studies with only aggregated data (AD) and others with individual patient-level data (IPD) available; standard network meta-analysis methods have been extended to synthesize these types of data simultaneously. However, existing methods do not sufficiently consider the quality of evidence (i.e., the level of precision of effect estimates or compatibility of study designs) across different data types, and assume all studies contribute equally to the treatment effect estimation regardless of whether they provide AD or IPD. In this talk, I propose Bayesian hierarchical NMA models that borrow information adaptively across AD and IPD studies using power and commensurate priors. The power parameter in the power priors and the spike-and-slab hyperprior in the commensurate priors govern the level of information borrowing across study types. We incorporate covariate-by-treatment interactions to examine subgroup effects and the discrepancy between the subgroup effects estimated in AD and IPD (i.e., ecological bias). The methods are validated and compared via extensive simulation studies, and then applied to an example in diabetes treatment comparing 28 oral anti-diabetic drugs. We compare results across model and hyperprior specifications.
This methodological development enables us to integrate different types of data in network meta-analysis with flexible prior distributions and helps enhance comparative effectiveness research by providing a comprehensive understanding of treatment effects and effect modification (via the covariate-by-treatment interactions) from multiple sources of data.
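
How a power parameter governs borrowing can be seen in a toy conjugate setting. The sketch below is not the hierarchical NMA model from the talk: it assumes a single normal mean with known variance, a flat initial prior, and a fixed power parameter `a0` between 0 and 1 applied to the likelihood of a second data source. The function name and arguments are hypothetical.

```python
def power_prior_posterior(ybar, n, ybar0, n0, a0, sigma2=1.0):
    """Posterior mean and variance of a normal mean under a power prior.

    Assumptions (illustrative only): known variance `sigma2`, flat initial
    prior, and the secondary-source likelihood raised to the power `a0`.
    The secondary source then acts like a0 * n0 extra observations.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    effective_n = a0 * n0 + n
    post_mean = (a0 * n0 * ybar0 + n * ybar) / effective_n
    post_var = sigma2 / effective_n
    return post_mean, post_var


# a0 = 0 ignores the secondary source; a0 = 1 pools it fully.
for a0 in (0.0, 0.5, 1.0):
    mean, var = power_prior_posterior(ybar=1.0, n=10, ybar0=3.0, n0=10, a0=a0)
    print(f"a0={a0}: posterior mean={mean:.3f}, variance={var:.4f}")
```

Intermediate values of `a0` discount the secondary source, which is the sense in which the power parameter controls the level of borrowing.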

 

Tuesday, January 31: Dr. Matey Neykov, Princeton University (Faculty Candidate)

214 Duxbury Hall, 2:00pm

Title: High Dimensions, Inference and Combinatorics. A Journey Through the Data Jungle

Abstract: This talk takes us on a journey through modern high-dimensional statistics. We begin with a brief discussion on variable selection and estimation and the challenges they bring to high-dimensional inference, and we formulate a new family of inferential problems for graphical models. Our aim is to conduct hypothesis tests on graph properties such as connectivity, maximum degree and cycle presence. The testing algorithms we introduce are applicable to properties which are invariant under edge addition. In parallel, we also develop a minimax lower bound showing the optimality of our tests over a broad family of graph properties. We apply our methods to study neuroimaging data.

 

Friday, February 3: Dr. Rajarshi Mukherjee, Stanford University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Sparse Signal Detection with Binary Outcomes

Abstract: In this talk, I will discuss some examples of sparse signal detection problems in the context of binary outcomes. These will be motivated by examples from next generation sequencing association studies, understanding heterogeneities in large scale networks, and exploring opinion distributions over networks. Moreover, these examples will serve as templates to explore interesting phase transitions present in such studies. In particular, these phase transitions will be aimed at revealing a difference between studies with possibly dependent binary outcomes and Gaussian outcomes. The theoretical developments will be further complemented with numerical results.

 

Friday, February 10: Dr. Rohit Kumar Patra, University of Florida

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: We consider estimation and inference in a single index regression model with an unknown link function. In contrast to the standard approach of using kernel methods, we consider the estimation of the link function under two different kinds of constraints, namely smoothness constraints and convexity (shape) constraints. Under smoothness constraints, we use smoothing splines to estimate the link function. We develop a method to compute the penalized least squares estimators (PLSE) of the parametric and the nonparametric components given i.i.d. data. Under a convexity constraint on the link function, we develop least squares estimators (LSE) for the unknown quantities. We prove the consistency and find the rates of convergence of both the PLSE and the LSE. We establish a root-n rate of convergence and the asymptotic efficiency of the PLSE and the LSE of the parametric component under mild assumptions. We illustrate and validate the method through experiments on simulated and real data. This is joint work with Arun Kuchibhotla and Bodhisattva Sen.

 

Friday, February 17: Dr. Jun Liu, Harvard University

214 Duxbury Hall, 10:00am

Title: Robust Variable and Interaction Selection for Logistic Regression and Multiple Index Models

Abstract: Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds in important predictors with both main and interaction effects, whereas in the backward stage SODA removes unimportant terms so as to optimize the extended Bayesian Information Criterion (EBIC). Compared with existing methods on quadratic discriminant analysis variable selection, SODA can deal with high-dimensional data with the number of predictors much larger than the sample size and does not require the joint normality assumption on predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with the Sliced Inverse Regression (SIR) method (Li, 1991) and its existing variations, SODA requires neither the linearity nor the constant variance condition and is much more robust. Our theoretical analysis establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate superior performance of SODA in dealing with non-Gaussian design matrices in both logistic and general index models.
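
For readers unfamiliar with the criterion named in this abstract, the sketch below computes an extended BIC in the commonly cited form of Chen and Chen (2008): minus twice the log-likelihood, plus the usual BIC penalty, plus a term that grows with the number of candidate models of the same size. This is an assumption about the general form only; the exact variant optimized by SODA may differ, and the function name and arguments are hypothetical.

```python
import math


def ebic(log_likelihood, n_params, n_samples, n_candidates, gamma=0.5):
    """Extended BIC in the Chen and Chen (2008) form (assumed variant).

    EBIC = -2 * loglik + k * log(n) + 2 * gamma * log(C(p, k)),
    where k = n_params, n = n_samples, p = n_candidates.
    gamma = 0 recovers the ordinary BIC.
    """
    k, p = n_params, n_candidates
    if not 0 <= k <= p:
        raise ValueError("need 0 <= n_params <= n_candidates")
    # log of the binomial coefficient C(p, k), computed stably via lgamma
    log_model_count = (
        math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    )
    return (
        -2.0 * log_likelihood
        + k * math.log(n_samples)
        + 2.0 * gamma * log_model_count
    )


# Same fit and model size, but more candidate predictors: larger penalty.
print(ebic(-100.0, n_params=3, n_samples=200, n_candidates=50))
print(ebic(-100.0, n_params=3, n_samples=200, n_candidates=5000))
```

The extra term is what makes the criterion usable when the number of candidate predictors far exceeds the sample size, since ordinary BIC tends to admit too many terms in that regime.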

 

Friday, March 3: Dr. Xin Zhang, Florida State University Department of Statistics

214 Duxbury Hall, 10:00am

Title: The Maximum Separation Subspace (MASES) in Sufficient Dimension Reduction with Binary or Categorical Response

Abstract: Sufficient dimension reduction methods are useful tools for exploring and visualizing data and for prediction in regression, especially when the number of covariates is large. In this talk, we introduce the new notion of Maximum Separation Subspace (MASES) as a natural inferential and estimative object for sufficient dimension reduction with binary or categorical response. We will see connections with the inverse regression subspace, the central subspace and the central discriminant subspace; and we will also illustrate via examples such as Fisher's linear discriminant analysis, quadratic discriminant analysis, single and multiple index models, etc. We study properties of the MASES and develop a method to estimate it. Consistency and asymptotic normality of the MASES estimator are established. Simulations and a real data example show superb performance of the proposed MASES estimator, which substantially outperforms classical sufficient dimension reduction methods.

 

Friday, March 10: Dr. Peter Hoff, Duke University

214 Duxbury Hall, 10:00am

Title: Adaptive FAB confidence intervals with constant coverage

Abstract: Confidence intervals for the means of multiple normal populations are often based on a hierarchical normal model. While commonly used interval procedures based on such a model have the nominal coverage rate on average across a population of groups, their actual coverage rate for a given group will be above or below the nominal rate, depending on the value of the group mean.

In this talk I present confidence interval procedures that have constant frequentist coverage rates and that make use of information about across-group heterogeneity, resulting in constant-coverage intervals that are narrower than standard t-intervals on average across groups. These intervals are obtained by inverting Bayes-optimal frequentist tests, and so are "frequentist, assisted by Bayes" (FAB). I present some asymptotic optimality results and some extensions to other multiparameter models, such as linear regression.

 

Friday, March 24: Dr. Ying Guo, Emory University

214 Duxbury Hall, 10:00am

Title: New ICA methods for brain network analysis using neuroimaging data

Abstract: In recent years, Independent Component Analysis (ICA) has gained significant popularity in diverse fields such as medical imaging, signal processing, and machine learning. In particular, ICA has become an important tool for identifying and characterizing brain functional networks in neuroimaging studies. Although widely applied, current ICA methods have several major limitations that reduce their applicability in imaging studies. First, an important goal in imaging data analysis is to investigate how brain functional networks are affected by subjects’ clinical and demographic characteristics. Existing ICA methods, however, cannot directly incorporate covariate effects in ICA decomposition. Second, the collection of multimodal neuroimaging data (e.g. fMRI and DTI) has become common practice in the neuroscience community, but current ICA methods are not flexible enough to accommodate and integrate multimodal imaging data that have different scales and data representations (scalar/array/matrix). In this talk, I am going to present two new ICA models that we have developed to extend the ICA methodology and address these needs in neuroimaging applications. I will first introduce a hierarchical covariate-adjusted ICA (hc-ICA) model that provides a formal statistical framework for estimating covariate effects and testing differences between brain functional networks. Hc-ICA provides a more reliable and powerful statistical tool for evaluating group differences in brain functional networks while appropriately controlling for potential confounding factors. Computationally efficient estimation and inference procedures have been developed for the hc-ICA model. Next, I will present a novel Distributional Independent Component Analysis (D-ICA) framework for decomposing multimodal neuroimaging data such as fMRI and DTI.
Unlike traditional ICA which separates observed data as a mixture of independent components, the proposed D-ICA represents a fundamentally new approach that aims to perform ICA on the distribution level.  The D-ICA can potentially provide a unified framework to extract neural features across imaging modalities.  I will discuss the connection and distinction between standard ICA and D-ICA.  The proposed methods will be illustrated through simulation studies and real-world applications in neuroimaging studies.

 

Friday, March 31: Dr. Fei Zou, University of Florida

214 Duxbury Hall, 10:00am

Title: On Surrogate Variable Analysis for High Dimensional Genetics and Genomics Data

Abstract: Unwanted variation in hidden variables often negatively impacts analysis of high-dimensional data, leading to high false discovery rates, and/or low rates of true discoveries.  A number of procedures have been proposed to detect and estimate the hidden variables, including principal component analysis (PCA).  However, empirical data analysis suggests that PCA is not efficient in identifying the hidden variables that only affect a subset of features but with relatively large effects. Surrogate variable analysis (SVA)  has been proposed to overcome this limitation.  But SVA also suffers some efficiency loss for data with a complicated dependent structure among the hidden variables and the variables of primary interest.  In this talk, we will describe an improved PCA procedure for detecting and estimating the hidden variables.  Some new applications of the method will also be discussed.

 

Friday, April 14: Dr. Shuangge Ma, Yale

214 Duxbury Hall, 10:00am

Title: Robust Network-based Analysis of the Associations between (Epi)Genetic Measurements

Abstract: Multiple types of (epi)genetic measurements are involved in the development and progression of complex diseases. Different types of (epi)genetic measurements are interconnected, and modeling their associations leads to a better understanding of disease biology and facilitates building clinically useful models. Such analysis is challenging in multiple aspects. To fix notation, we use gene expression (GE) and copy number variation (CNV) as an example. Both GE and CNV measurements are high-dimensional. One GE is possibly regulated by multiple CNVs; however, the set of relevant CNVs is unknown. For a specific GE, the cis-acting CNV usually has the dominant effect and can behave differently from the trans-acting CNVs. In addition, GE measurements can have long tails and contamination. Lastly, some CNVs are more tightly connected to each other than the rest. In this study, a novel method is developed to more effectively model the associations between (epi)genetic measurements. For each GE, a partially linear model is assumed with a nonlinear effect for the cis-acting CNV. A robust loss function is adopted to accommodate long-tail distributions and data contamination. We adopt penalization to accommodate the high dimensionality and select relevant CNVs. A network structure is introduced to account for the interconnections among CNVs. We develop a computational algorithm and rigorously establish the consistency properties. Simulation shows the superiority of the proposed method over alternatives. The analysis of a TCGA (The Cancer Genome Atlas) dataset demonstrates the practical applicability of the proposed method.

 

Friday, April 21: Dr. Hongyu Zhao, Yale

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA