# Colloquia

## Spring 2017 Colloquia

Friday, January 13: Dr. Meng Li, Duke University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: New developments in probabilistic image analysis: boundary detection and image reconstruction

Abstract: Images (2D, 3D, or even higher dimensional) are a fundamental data type. The area of image analysis is undergoing a dramatic transformation to utilize the power of statistical modeling, which provides a unique way to describe uncertainties and leads to model-based solutions. We exemplify this by two critical and challenging problems, boundary detection and image reconstruction, in a comprehensive way from theory, methodology to application. We view the boundary as a closed smooth lower-dimensional manifold, and propose a nonparametric Bayesian approach based on priors indexed by the unit sphere. The proposed method achieves four goals of guaranteed geometric restriction, (nearly) minimax optimal rate adapting to the smoothness level, convenience for joint inference and computational efficiency. We introduce a probabilistic model-based technique using wavelets with adaptive random partitioning to reconstruct images. We represent multidimensional signals by a mixture of one-dimensional wavelet decompositions in the form of randomized recursive partitioning on the space of wavelet coefficient trees, where the decomposition adapts to the geometric features of the signal. State-of-the-art performances of proposed methods are demonstrated using simulations and applications including neuroimaging in brain oncology. R/Matlab packages/toolboxes and interactive shiny applications are available for routine implementation.

Friday, January 20: Lifeng Lin, University of Minnesota (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: On evidence cycles in network meta-analysis

Abstract: As an extension of pairwise meta-analysis of two treatments, network meta-analysis has recently attracted many researchers in evidence-based medicine because it simultaneously synthesizes both direct and indirect evidence from multiple treatments and thus facilitates better decision making. The Lu–Ades Bayesian hierarchical model is a popular method to implement network meta-analysis, and it is generally considered more powerful than conventional pairwise meta-analysis, leading to more accurate effect estimates with narrower confidence intervals. However, the improvement of effect estimates produced by Lu–Ades network meta-analysis has never been studied theoretically. In this talk, we show that such improvement depends highly on evidence cycles in the treatment network. Specifically, Lu–Ades network meta-analysis produces posterior distributions identical to separate pairwise meta-analyses for all treatment comparisons when a treatment network does not contain cycles. Even in a general network with cycles, treatment comparisons that are not contained in any cycles do not benefit from Lu–Ades network meta-analysis. Simulations and a case study are used to illustrate the equivalence of Lu–Ades network meta-analysis and pairwise meta-analysis in certain networks.
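The role of evidence cycles described in this abstract can be illustrated with a small sketch. It is not the speaker's code. It assumes a hypothetical input format (a list of treatment pairs, one per direct comparison between two distinct treatments) and a hypothetical helper name, `comparisons_on_cycles`. A comparison lies on a cycle exactly when its two treatments stay connected after that comparison is removed from the network.

```python
from collections import defaultdict, deque


def comparisons_on_cycles(comparisons):
    """Return the direct comparisons (edges) that lie on at least one cycle.

    `comparisons` is an iterable of (treatment_a, treatment_b) pairs with
    treatment_a != treatment_b (hypothetical input format).
    """
    edges = {frozenset(pair) for pair in comparisons}
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)

    def connected_without(edge, start, goal):
        # Breadth-first search from start to goal that never crosses `edge`.
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                return True
            for neighbour in adjacency[node]:
                if frozenset((node, neighbour)) == edge or neighbour in seen:
                    continue
                seen.add(neighbour)
                queue.append(neighbour)
        return False

    on_cycle = set()
    for edge in edges:
        a, b = tuple(edge)
        if connected_without(edge, a, b):
            on_cycle.add(edge)
    return on_cycle


# Hypothetical network: A, B and C form a triangle; D is compared only with C.
network = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
on_cycle = comparisons_on_cycles(network)
print(sorted(tuple(sorted(edge)) for edge in on_cycle))
# [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

In this toy network the C versus D comparison is on no cycle, so by the result stated in the abstract it would gain nothing from the network model over a pairwise meta-analysis.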

Friday, January 27: Dr. Hwanhee Hong, Johns Hopkins University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Integrating Data for Comparative Effectiveness Research

Abstract: Comparative effectiveness research helps answer “what works best” and provide evidence on the effectiveness, benefits, and harms of different treatments. When multiple sources of data exist on a particular question the evidence should be obtained by integrating those sources in a principled way. Network meta-analysis (NMA) is an extension of a traditional pairwise meta-analysis to compare multiple treatments simultaneously and take advantage of multiple sources of data. In some situations there are some studies with only aggregated data (AD) and others with individual patient-level data (IPD) available; standard network meta-analysis methods have been extended to synthesize these types of data simultaneously. However, existing methods do not sufficiently consider the quality of evidence (i.e., the level of precision of effect estimates or compatibility of study designs) across different data types, and assume all studies contribute equally to the treatment effect estimation regardless of whether it is AD or IPD. In this talk, I propose Bayesian hierarchical NMA models that borrow information adaptively across AD and IPD studies using power and commensurate priors. The power parameter in the power priors and spike-and-slab hyperprior in the commensurate priors govern the level of borrowing information across study types. We incorporate covariate-by-treatment interactions to examine subgroup effects and discrepancy of the subgroup effects estimated in AD and IPD (i.e., ecological bias). The methods are validated and compared via extensive simulation studies, and then applied to an example in diabetes treatment comparing 28 oral anti-diabetic drugs. We compare results across model and hyperprior specifications.  
This methodological development enables us to integrate different types of data in network meta-analysis with flexible prior distributions and helps enhance comparative effectiveness research by providing a comprehensive understanding of treatment effects and effect modification (via the covariate-by-treatment interactions) from multiple sources of data.

Tuesday, January 31: Dr. Matey Neykov, Princeton University (Faculty Candidate)

214 Duxbury Hall, 2:00pm

Title: High Dimensions, Inference and Combinatorics. A Journey Through the Data Jungle

Abstract: This talk takes us on a journey through modern high-dimensional statistics. We begin with a brief discussion on variable selection and estimation and the challenges they bring to high-dimensional inference, and we formulate a new family of inferential problems for graphical models. Our aim is to conduct hypothesis tests on graph properties such as connectivity, maximum degree and cycle presence. The testing algorithms we introduce are applicable to properties which are invariant under edge addition. In parallel, we also develop a minimax lower bound showing the optimality of our tests over a broad family of graph properties. We apply our methods to study neuroimaging data.

Friday, February 3: Dr. Rajarshi Mukherjee, Stanford University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Sparse Signal Detection with Binary Outcomes

Abstract: In this talk, I will discuss some examples of sparse signal detection problems in the context of binary outcomes. These will be motivated by examples from next generation sequencing association studies, understanding heterogeneities in large scale networks, and exploring opinion distributions over networks. Moreover, these examples will serve as templates to explore interesting phase transitions present in such studies. In particular, these phase transitions will be aimed at revealing a difference between studies with possibly dependent binary outcomes and Gaussian outcomes. The theoretical developments will be further complemented with numerical results.

Friday, February 10: Rohit Kumar Patra, University of Florida

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, February 17: Jun Liu, Harvard University

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, March 3: Dr. Xin Zhang, Florida State University Department of Statistics

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, March 10: Peter Hoff, Duke University

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, March 24: Ying Guo, Emory University

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, March 31: Fei Zou, University of Florida

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, April 7: Tentative Yongyuan and Anna Li Graduate Student Presentation Competition

214 Duxbury Hall, 10:00am

Friday, April 14: Shuangge Ma, Yale

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

Friday, April 21: Hongyu Zhao, Yale

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

## Fall 2016 Colloquia

Friday, September 9: Dr. Adrian Barbu, Florida State University Department of Statistics

214 Duxbury Hall, 10:00am

Title: A Novel Method for Obtaining Tree Ensembles by Loss Minimization

Abstract: Tree ensembles can capture the relevant variables and to some extent the relationships between them in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of Boosting or Random Forest. Previous work showed that Boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a less compact and less interpretable model. In this talk we present a novel method for obtaining a compact ensemble of trees that grows a pool of trees in parallel with many independent Boosting threads and then selects a small subset and updates their leaf weights by loss optimization. Experiments on real datasets show that the obtained model usually has a smaller loss than Boosting, which is also reflected in a lower misclassification error on the test set.

Friday, September 16: Dr. Antonio Linero, Florida State University Department of Statistics

214 Duxbury Hall, 10:00am

Title: Bayesian regression trees for high dimensional prediction and variable selection

Abstract: Decision tree ensembles are an extremely popular tool for obtaining high quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles which are motivated by a generative probabilistic model, the most influential method being the Bayesian additive regression trees framework. In this talk, we take a Bayesian point of view on this problem, and show how to construct priors on decision tree ensembles which are capable of adapting to sparsity by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree. We demonstrate the efficacy of this approach in simulation studies, and argue for the theoretical strengths of this approach by showing that, under certain conditions, the posterior concentrates around the true regression function at a rate which is independent of the number of predictors. Our approach has additional benefits over Bayesian methods for constructing tree ensembles, such as allowing for fully-Bayesian variable selection.

Friday, September 23: Dr. Xiao Wang, Purdue University

214 Duxbury Hall, 10:00am

Title: Quantile Image-on-Scalar Regression

Abstract: Quantile regression with functional response and scalar covariates has become an important statistical tool for many neuroimaging studies. In this paper, we study optimal estimation of varying coefficient functions in the framework of reproducing kernel Hilbert space. Minimax rates of convergence under both fixed and random designs are established. We have developed easily implementable estimators which are shown to be rate-optimal. Simulations and real data analysis are conducted to examine the finite-sample performance. This is joint work with Zhengwu Zhang, Linglong Kong, and Hongtu Zhu.

Friday, September 30: Dr. Chiwoo Park, Florida State University Industrial and Manufacturing Engineering

214 Duxbury Hall, 10:00am

Title: Patching Gaussian Processes for Large-scale Spatial Regression

Abstract: This talk presents a method for solving a Gaussian process (GP) regression with constraints on a regression domain boundary. The method can guide and improve the prediction around a domain boundary with the boundary constraints. More importantly, the method can be applied to improve a local GP regression as a solver of a large-scale regression analysis for remote sensing and other large datasets. In the conventional local GP regression, a regression domain is first partitioned into multiple local regions, and an independent GP model is fit for each local region using the training data belonging to the region. Two key issues with the local GP are (1) the prediction around the boundary of a local region is not as accurate as the prediction in the interior of the local region, and (2) two local GP models for two neighboring local regions produce different predictions at the boundary of the two regions, creating discontinuity in the output regression. These issues can be addressed by constraining local GP models on the boundary using our constrained GP regression approach. The performance of the proposed approach depends on the “quality” of the constraints posed on the local GP models. We present a method to estimate “good” constraints based on data. Some convergence results and numerical results of the proposed approach will be presented.

Friday, October 7: Dr. Dan Shen, University of South Florida

214 Duxbury Hall, 10:00am

Title: Dimension Reduction of Neuroimaging Data Analysis

Abstract: High dimensionality has become a common feature of “big data” encountered in many divergent fields, such as neuroimaging and genetic analysis, which provides modern challenges for statistical analysis. To cope with the high dimensionality, dimension reduction becomes necessary. Principal component analysis (PCA) is arguably the most popular classical dimension reduction technique, which uses a few principal components (PCs) to explain most of the data variation.
I first introduce Multiscale Weighted PCA (MWPCA), a new variation of PCA, for neuroimaging analysis. MWPCA introduces two sets of novel weights, including global and local spatial weights, to enable a selective treatment of individual features and incorporation of class label information as well as spatial pattern within neuroimaging data. Simulation studies and real data analysis show that MWPCA outperforms several competing PCA methods.
Second, we develop statistical methods for analyzing tree-structured data objects. This work is motivated by the statistical challenges of analyzing a set of blood artery trees, which come from a study of Magnetic Resonance Angiography (MRA) brain images of a set of 98 human subjects. The non-Euclidean property of tree space makes the application of conventional statistical analysis, including PCA, to tree data very challenging. We develop an entirely new approach that uses the Dyck path representation, which builds a bridge between the tree space (a non-Euclidean space) and curve space (standard Euclidean space). That bridge enables the exploitation of the power of functional data analysis to explore statistical properties of tree data sets.
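For readers unfamiliar with the Dyck path idea, here is a minimal sketch of the standard combinatorial encoding of an ordered rooted tree as a height sequence. It may differ in detail from the construction used in the talk (for example, it ignores branch lengths). The input format (a dictionary mapping each node to its ordered list of children) and the function name `dyck_path` are assumptions made for illustration.

```python
def dyck_path(children, root):
    """Heights visited by a depth-first walk around an ordered rooted tree.

    `children` maps each node to the ordered list of its children
    (hypothetical input format). The walk records depth + 1 when it
    descends an edge and the parent's depth when it returns, so a tree
    with m edges yields 2 * m + 1 heights, starting and ending at 0.
    """
    path = [0]

    def visit(node, depth):
        for child in children.get(node, []):
            path.append(depth + 1)  # step down to the child
            visit(child, depth + 1)
            path.append(depth)      # step back up to the parent

    visit(root, 0)
    return path


# Hypothetical toy tree: the root has children a and b, and a has child c.
tree = {"root": ["a", "b"], "a": ["c"]}
print(dyck_path(tree, "root"))
# [0, 1, 2, 1, 0, 1, 0]
```

Each tree becomes a curve of this kind, which is what allows curve-based tools to be applied to tree data. The recursive walk is adequate for small trees; very deep trees would need an iterative traversal.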

Friday, October 14: Dr. Jonathan Bradley, Florida State University Department of Statistics

214 Duxbury Hall, 10:00am

Title: Hierarchical Models for Spatial Data with Errors that are Correlated with the Latent Process

Abstract: Prediction of a spatial Gaussian process using a “big dataset” has become a topical area of research over the last decade. The available solutions often involve placing strong assumptions on the error process associated with the data. Specifically, it has typically been assumed that the data is equal to the spatial process of principal interest plus a mutually independent error process. Further, to obtain computationally efficient predictions, additional assumptions on the latent random processes and/or parameter models have become a practical necessity (e.g., low rank models, sparse precision matrices, etc.). In this article, we consider an alternative latent process modeling schematic where it is assumed that the error process is spatially correlated and correlated with the spatial random process of principal interest. We show the counterintuitive result that error process dependencies allow one to remove assumptions on the spatial process of principal interest, and obtain computationally efficient predictions. At the core of this proposed methodology is the definition of a corrupted version of the latent process of interest, which we call the data specific latent process (DSLP). Demonstrations of the DSLP paradigm are provided through simulated examples and through an application using a large dataset consisting of the US Census Bureau’s American Community Survey 5-year period estimates of median household income on census tracts.

Friday, October 21: Dr. Bing Li, Pennsylvania State University

214 Duxbury Hall, 10:00am

Title: A nonparametric graphical model for functional data with application to brain networks based on fMRI

Abstract: We introduce a nonparametric graphical model whose observations on vertices are functions. Many modern applications, such as electroencephalogram and functional magnetic resonance imaging (fMRI), produce data of this type. The model is based on Additive Conditional Independence (ACI), a statistical relation that captures the spirit of conditional independence without resorting to multi-dimensional kernels. The random functions are assumed to reside in a Hilbert space. No distributional assumption is imposed on the random functions: instead, their statistical relations are characterized nonparametrically by a second Hilbert space, which is a reproducing kernel Hilbert space whose kernel is determined by the inner product of the first Hilbert space. A precision operator is then constructed based on the second space, which characterizes ACI, and hence also the graph.
The resulting estimator is relatively easy to compute, requiring no iterative optimization or inversion of large matrices.
We establish the consistency and the convergence rate of the estimator. Through simulation studies we demonstrate that the estimator performs better than the functional Gaussian graphical model when the relations among vertices are nonlinear or heteroscedastic. The method is applied to an fMRI data set to construct brain networks for patients with attention-deficit/hyperactivity disorder.

Friday, October 28: Dr. Glen Laird, Sanofi

214 Duxbury Hall, 10:00am

Title: Statistical Considerations for Pharmaceutical Industry Clinical Trials

Abstract: Clinical trials are the key evidence drivers for the pharmaceutical industry. These trials use a set of statistical methods particular to the setting and regulatory environment. In the context of oncology clinical trials an overview of selected methodological topics will be presented including multiplicity, dose escalation methods, Simon designs, and interim analyses.

Friday, November 4: Dr. Andre Rogatko, Cedars-Sinai Medical Center

214 Duxbury Hall, 10:00am

Title: Dose Finding with Escalation with Overdose Control in Cancer Clinical Trials

Abstract: Escalation With Overdose Control (EWOC) is a Bayesian adaptive dose finding design that produces consistent sequences of doses while controlling the probability that patients are overdosed. EWOC was the first dose-finding procedure to directly incorporate the ethical constraint of minimizing the chance of treating patients at unacceptably high doses. Its defining property is that the expected proportion of patients treated at doses above the maximum tolerated dose (MTD) is equal to a specified value α, the feasibility bound. Topics to be discussed include the two-parameter logistic model, the use of covariates in prospective clinical trials, drug combinations, and Web-EWOC, a free interactive web tool for designing and conducting dose finding trials in cancer (https://biostatistics.csmc.edu/ewoc/ewocWeb.php).

Friday, November 18: Dr. Mike Daniels, University of Texas at Austin

214 Duxbury Hall, 10:00am

Title: To be announced

Abstract: To be announced

Friday, December 2: Dr. Martin Lindquist, Johns Hopkins University

214 Duxbury Hall, 10:00am

Title: High-dimensional Multivariate Mediation with Application to Neuroimaging Data

Abstract: Mediation analysis is an important tool in the behavioral sciences for investigating the role of intermediate variables that lie in the path between a randomized treatment/exposure and an outcome variable. The influence of the intermediate variable on the outcome is often explored using structural equation models (SEMs), with model coefficients interpreted as possible effects. While there has been significant research on the topic in recent years, little work has been done on mediation analysis when the intermediate variable (mediator) is a high-dimensional vector. In this work we introduce a novel method for mediation analysis in this setting called the directions of mediation (DMs). The DMs represent an orthogonal transformation of the space spanned by the set of mediators, chosen so that the transformed mediators are ranked based upon the proportion of the likelihood of the full SEM that they explain. We provide an estimation algorithm and establish the asymptotic properties of the obtained estimators. We demonstrate the method using a functional magnetic resonance imaging (fMRI) study of thermal pain where we are interested in determining which brain locations mediate the relationship between the application of a thermal stimulus and self-reported pain.

## Spring 2016 Colloquia

Friday, January 8: Dr. Dehan Kong, University of North Carolina (Faculty Candidate)

HCB 103, 10:00am

Title: High-dimensional Matrix Linear Regression Model

Abstract: We develop a high-dimensional matrix linear regression model (HMLRM) to correlate matrix responses with high-dimensional scalar covariates when coefficient matrices have low-rank structures. We propose a fast and efficient screening procedure based on the spectral norm to deal with the case that the dimension of scalar covariates is ultra-high. We develop an efficient estimation procedure based on the nuclear norm regularization, which explicitly borrows the matrix structure of coefficient matrices. We systematically investigate various theoretical properties of our estimators, including estimation consistency, rank consistency, and the sure independence screening property under HMLRM. We examine the finite-sample performance of our methods using simulations and a large-scale imaging genetic dataset collected by the Alzheimer's Disease Neuroimaging Initiative study.

Tuesday, January 12: Naveen Narisetty, University of Michigan (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Consistent and Scalable Bayesian Model Selection for High Dimensional Data

Abstract: The Bayesian paradigm offers a flexible modeling framework for statistical analysis, but relative to penalization-based methods, little is known about the consistency of Bayesian model selection methods in the high dimensional setting. I will present a new framework for understanding Bayesian model selection consistency, using sample size dependent spike and slab priors that help achieve appropriate shrinkage. More specifically, strong selection consistency is established in the sense that the posterior probability of the true model converges to one even when the number of covariates grows nearly exponentially with the sample size. Furthermore, the posterior on the model space is asymptotically similar to the L0 penalized likelihood. I will also introduce a new Gibbs sampling algorithm for posterior computation, which is much more scalable for high dimensional problems than the standard Gibbs sampler, and yet retains the strong selection consistency property. The new algorithm and the consistency theory work for a variety of problems including linear and logistic regressions, and a more challenging problem of censored quantile regression where a non-convex loss function is involved.

Friday, January 15: Jonathan Bradley, University of Missouri (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Computationally Efficient Distribution Theory for Bayesian Inference of High-Dimensional Dependent Count-Valued Data

Abstract: We introduce a Bayesian approach for multivariate spatio-temporal prediction for high-dimensional count-valued data. Our primary interest is when there are possibly millions of data points referenced over different variables, geographic regions, and times. This problem requires extensive methodological advancements, as jointly modeling correlated data of this size leads to the so-called "big n problem." The computational complexity of prediction in this setting is further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, we develop a new computationally efficient distribution theory for this setting. In particular, we introduce a multivariate log-gamma distribution and provide substantial theoretical development including: results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, and full-conditional distributions for a Gibbs sampler. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used. The results in this manuscript are extremely general, and can be used for data that exhibit fewer sources of dependency than what we consider (e.g., multivariate, spatial-only, or spatio-temporal-only data). Hence, the implications of our modeling framework may have a large impact on the general problem of jointly modeling correlated count-valued data. We show the effectiveness of our approach through a simulation study. Additionally, we demonstrate our proposed methodology with an important application analyzing data obtained from the Longitudinal Employer-Household Dynamics (LEHD) program, which is administered by the U.S. Census Bureau.

Friday, January 22: Abhra Sarkar, Duke University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Novel Statistical Frameworks for Analysis of Structured Sequential Data

Abstract: We are developing a broad array of novel statistical frameworks for analyzing complex sequential data sets. Our research is primarily motivated by a collaboration with neuroscientists trying to understand the neurological, genetic and evolutionary basis of human communication using bird and mouse models. The data sets comprise structured sequences of syllables or 'songs' produced by animals from different genotypes under different experimental conditions. The primary goal is then to elucidate the roles of different genotypes and experimental conditions on animal vocalization behaviors and capabilities. We have developed novel statistical methods based on first order Markovian dynamics that help answer these important scientific queries. First order dynamics is, however, insufficiently flexible to learn complex serial dependency structures and systematic patterns in the vocalizations, an important secondary goal in these studies. To this end, we have developed a sophisticated nonparametric Bayesian approach to higher order Markov chains building on probabilistic tensor factorization techniques. Our proposed method is of very broad utility, with applications not limited to analysis of animal vocalizations, and provides new insights into the serial dependency structures of many previously analyzed sequential data sets arising from diverse application areas. Our method has appealing theoretical properties and practical advantages, and achieves substantial gains in performance compared to previously existing methods. Our research also paves the way to advanced automated methods for more sophisticated dynamical systems, including higher order hidden Markov models that can accommodate more general data types.

Tuesday, January 26: Alexander Petersen, University of California, Davis (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Representation of Samples of Density Functions and Regression for Random Objects

Abstract: In the first part of this talk, we will discuss challenges associated with the analysis of samples of one-dimensional density functions. Due to their inherent constraints, densities do not live in a vector space and therefore commonly used Hilbert space based methods of functional data analysis are not appropriate. To address this problem, we introduce a transformation approach, mapping probability densities to a Hilbert space of functions through a continuous and invertible map. Basic methods of functional data analysis, such as the construction of functional modes of variation, functional regression or classification, are then implemented by using representations of the densities in this linear space. Transformations of interest include log quantile density and log hazard transformations, among others. Rates of convergence are derived, taking into account the necessary preprocessing step of density estimation. The proposed methods are illustrated through applications in brain imaging.

The second part of the talk will address the more general problem of analyzing complex data that are non-Euclidean and specifically do not lie in a vector space. To address the need for statistical methods for such data, we introduce the concept of Fréchet regression. This is a general approach to regression when responses are complex random objects in a metric space and predictors are in $\mathcal{R}^p$. We develop generalized versions of both global least squares regression and local weighted least squares smoothing. We derive asymptotic rates of convergence for the corresponding sample based fitted regressions to the population targets under suitable regularity conditions by applying empirical process methods. Illustrative examples include responses that consist of probability distributions and correlation matrices, and we demonstrate the proposed Fréchet regression for demographic and brain imaging data.

Friday, January 29: Dr. Yifei Sun, Johns Hopkins University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Recurrent Marker Processes in the Presence of Competing Terminal Events

Abstract: In follow-up studies, utility marker measurements are usually collected upon the occurrence of recurrent events until a terminal event such as death takes place. In this talk, we define the recurrent marker process to characterize utility accumulation over time. For example, with medical cost and repeated hospitalizations being treated as marker and recurrent events respectively, the recurrent marker process is the trajectory of total medical cost spent, which stops increasing after death. In many applications, competing risks arise as subjects are at risk of more than one mutually exclusive terminal event, such as death from different causes, and modeling the recurrent marker process for each failure type is often of interest. However, censoring creates challenges in the methodological development, because for censored subjects, both failure type and recurrent marker process after censoring are unobserved. To circumvent this problem, we propose a nonparametric framework for analyzing this type of data. In the presence of competing risks, we start with an estimator by using marker information from uncensored subjects. As a result, the estimator can be inefficient under heavy censoring. To improve efficiency, we propose a second estimator by combining the first estimator with auxiliary information from the estimate under non-competing risks model. The large sample properties and optimality of the second estimator are established. Simulation studies and an application to the SEER-Medicare linked data are presented to illustrate the proposed methods.

Tuesday, February 2: Guan Yu, University of North Carolina at Chapel Hill (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Supervised Learning Incorporating Graphical Structure among Predictors

Abstract: With the abundance of high dimensional data in various disciplines, regularization techniques are very popular these days. Despite the success of these techniques, some challenges remain. One challenge is the development of efficient methods incorporating structure information among predictors. Typically, the structure information among predictors can be modeled by the connectivity of an undirected graph using all predictors as nodes of the graph. In this talk, I will introduce an efficient regularization technique incorporating graphical structure information among predictors. Specifically, according to the undirected graph, we use a latent group lasso penalty to utilize the graph node-by-node. The predictors connected in the graph are encouraged to be selected jointly. This new regularization technique can be used for many supervised learning problems. For sparse regression, our new method using the proposed regularization technique includes adaptive Lasso, group Lasso, and ridge regression as special cases. Theoretical studies show that it enjoys model selection consistency and acquires tight finite sample bounds for estimation and prediction. For the multi-task learning problem, our proposed graph-guided multi-task method includes the popular $\ell_{2,1}$-norm regularized multi-task learning method as a special case. Numerical studies using simulated datasets and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset also demonstrate the effectiveness of the proposed methods.

Friday, February 5: Dr. Yun Yang, Duke University (Faculty Candidate)

214 Duxbury Hall, 10:00am

Title: Computationally efficient high-dimensional variable selection via Bayesian procedures

Abstract: Variable selection is fundamental in many high-dimensional statistical problems with sparsity structures. Much of the literature is based on optimization methods, where penalty terms are incorporated that yield both convex and non-convex optimization problems. In this talk, I will take a Bayesian point of view on high-dimensional regression, by placing a prior on the model space and performing the necessary integration so as to obtain a posterior distribution. In particular, I will show that a Bayesian approach can consistently select all relevant covariates under relatively mild conditions from a frequentist point of view.
Although Bayesian procedures for variable selection are provably effective and easy to implement, it has been suggested by many statisticians that Markov Chain Monte Carlo (MCMC) algorithms for sampling from the posterior distributions may need a long time to converge, as sampling from an exponentially large number of sub-models is an intrinsically hard problem. Surprisingly, our work shows that this plausible "exponentially many model" argument is misleading. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee the rapid mixing of a particular Metropolis-Hastings algorithm. The number of iterations for this Markov chain to reach stationarity is linear in the number of covariates up to a logarithmic factor.

Friday, February 19: Dr. Somnath Datta, University of Florida

214 Duxbury Hall, 10:00am

Title: Multi-Sample Adjusted U-Statistics that Account for Confounding Covariates

Abstract: Multi-sample U-statistics encompass a wide class of test statistics that allow the comparison of two or more distributions. U-statistics are especially powerful because they can be applied to both numeric and non-numeric (e.g., textual) data. However, when comparing the distribution of a variable across two or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (using the propensity score for prospective data or the stratification score for retrospective data) to construct adjusted U-statistics that can test the equality of distributions across two (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our procedures is demonstrated through simulation studies as well as an analysis of genetic data.

Friday, March 4: Dr. Guang Cheng, Purdue University

214 Duxbury Hall, 10:00am

Title: How Many Processors Do We Really Need in Parallel Computing?

Abstract: This talk explores the statistical versus computational trade-off to address a basic question in a typical divide-and-conquer setup: what is the minimal computational cost in obtaining statistical optimality? In smoothing spline models, we observe an intriguing phase transition phenomenon for the number of deployed machines that ends up being a simple proxy for computing cost. Specifically, a sharp upper bound for the number of machines is established: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible.

Friday, March 18: Dr. Tianfu Wu, University of California, Los Angeles

214 Duxbury Hall, 10:00am

Title: Towards a Visual Turing Test and Lifelong Learning: Learning Deep Hierarchical Models and Cost-sensitive Decision Policies for Understanding Visual Big Data

Abstract: Modern technological advances produce data at breathtaking scales and complexities such as the images and videos on the web. Such big data require highly expressive models for their representation, understanding and prediction. To fit such models to the big data, it is essential to develop practical learning methods and fast inferential algorithms. My research has been focused on learning expressive hierarchical models and fast inference algorithms with homogeneous representation and architecture to tackle the underlying complexities in such big data from statistical perspectives. In this talk, with emphasis on a visual restricted Turing test -- the grand challenge in computer vision, I will introduce my work on (i) Statistical Learning of Large Scale and Highly Expressive Hierarchical Models from Big Data, and (ii) Bottom-up/Top-down Inference with Hierarchical Models by Learning Near-Optimal Cost-Sensitive Decision Policies. Applications in object detection, online object tracking and robot autonomy will be discussed.

Friday, March 25: Dr. Faming Liang, University of Florida

214 Duxbury Hall, 10:00am

Title: A Blockwise Coordinate Consistent Method for High-Dimensional Parameter Estimation

Abstract: The dramatic improvement in data collection and acquisition technologies in the last decades has enabled scientists to collect a great amount of high dimensional data. Due to their intrinsic nature, many high dimensional data sets, such as omics data and genome-wide association study (GWAS) data, have a much smaller sample size than their dimension (a.k.a. small-n-large-P). How to estimate the parameters of high dimensional models with a small sample size is still a challenging problem, though substantial progress has been made in the last decades. The popular approach to this problem is regularization, which can perform badly when the sample size is small and the variables are highly correlated. To alleviate this difficulty, we propose a blockwise coordinate consistent (BCC) method, which works by maximizing a new objective function, the expectation of the log-likelihood function, using a cyclic algorithm: iteratively finding consistent estimates for each block of parameters conditional on the current estimates of the other parameters. The BCC method reduces the high dimensional parameter estimation problem to a series of low dimensional parameter estimation problems and is ready to be applied to parameter estimation for the complicated models used in big data analysis. Our numerical results indicate that BCC can provide a drastic improvement in both parameter estimation and variable selection over the regularization methods for high dimensional systems.

Friday, April 1: Dr. George Michailidis, University of Florida

214 Duxbury Hall, 10:00am

Title: Estimating high-dimensional multi-layered networks through penalized maximum likelihood

Abstract: Gaussian graphical models represent a good tool for capturing interactions between nodes that represent the underlying random variables. However, in many applications in biology one is interested in modeling associations both between and within molecular compartments (e.g., interactions between genes and proteins/metabolites). To this end, inferring multi-layered network structures from high-dimensional data provides insight into understanding the conditional relationships among nodes within layers, after adjusting for and quantifying the effects of nodes from other layers. We propose an integrated algorithmic approach for estimating multi-layered networks that incorporates a screening step for significant variables, an optimization algorithm for estimating the key model parameters and a stability selection step for selecting the most stable effects. The proposed methodology offers an efficient way of estimating the edges within and across layers iteratively, by solving an optimization problem constructed based on penalized maximum likelihood (under a Gaussianity assumption). The optimization is solved on a reduced parameter space that is identified through screening, which remedies the instability in high dimensions. Theoretical properties are considered to ensure identifiability and consistent estimation of the parameters and convergence of the optimization algorithm, despite the lack of global convexity. The performance of the methodology is illustrated on synthetic data sets and on an application to gene and metabolic expression data for patients with renal disease.

Friday, April 8: Dr. Qian Zhang, FSU College of Education

214 Duxbury Hall, 10:00am

Title: A Comparison of Methods for Estimating Moderation Effects with Missing Data in the Predictors

Abstract: The most widely used statistical model for conducting moderation analysis is the moderated multiple regression (MMR) model. While conducting moderation analysis using MMR models, missing data could pose a challenge, mainly because of the nonlinear interaction term. In the study, we consider a simple MMR model, where the effect of predictor X on the outcome Y is moderated by a moderator U. The primary interest is to find ways of estimating and testing the moderation effect with the existence of missing data in the predictor X. We mainly focus on cases when X is missing completely at random and missing at random. Theoretically, it is found in the study that the existing methods including normal-distribution-based maximum likelihood estimation (NML) and normal-distribution-based multiple imputation (NMI) yield inconsistent moderation effect estimates when data are missing at random. To cope with this issue, Bayesian estimation (BE) is proposed. To compare the existing methods and the proposed BE under finite sample sizes, a simulation study is also conducted. Results indicate that the methods in comparison have different relative performance depending on various factors. The factors are missing data mechanisms, roles of variables responsible for missingness, population moderation effect sizes, sample sizes, missing data proportions, and distributions of predictor X. Limitations of the study and future research directions are also discussed.

Friday, April 15: Wei Sun, Yahoo Research

214 Duxbury Hall, 10:00am

Title: Provable Sparse Tensor Decomposition and Its Application to Personalized Recommendation

Abstract: Tensors, as a multi-dimensional generalization of matrices, have received increasing attention in industry due to their success in personalized recommendation systems. Traditional recommendation systems are mainly based on the user-item matrix, whose entry denotes each user's preference for a particular item. To incorporate additional information into the analysis, such as the temporal behavior of users, we encounter a user-item-time tensor. Existing tensor decomposition methods for personalized recommendation are mostly established in the non-sparse regime where the decomposition components include all features. For high dimensional tensor-valued data, many features in the components essentially contain no information about the tensor structure, and thus there is a great need for a more appropriate method that can simultaneously perform tensor decomposition and select informative features.

In this talk, I will discuss a new sparse tensor decomposition method that incorporates the sparsity of each decomposition component into the CP tensor decomposition. Specifically, the sparsity is achieved via an efficient truncation procedure to directly solve an L0 sparsity constraint. In theory, in spite of the non-convexity of the optimization problem, it is proven that an alternating updating algorithm attains an estimator whose rate of convergence significantly improves on those shown in non-sparse decomposition methods. As a by-product, our method is also widely applicable to a broad family of high dimensional latent variable models, including high dimensional Gaussian mixtures and mixtures of sparse regression. I will show the advantages of our method in two real applications, click-through rate prediction for online advertising and high dimensional gene clustering.

Friday, April 22: Dr. Kun Chen, University of Connecticut

214 Duxbury Hall, 10:00am

Title: Sequential Estimation in Sparse Factor Regression

Abstract: Multivariate regression models of large scales are increasingly required and formulated in various fields. A sparse singular value decomposition of the regression component matrix is appealing for achieving dimension reduction and facilitating model interpretation. However, how to recover such a composition of sparse and low-rank structures remains a challenging problem. By exploring the connections between factor analysis and reduced-rank regression, we formulate the problem as a sparse factor regression and develop an efficient sequential estimation procedure. At each sequential step, a latent factor is constructed as a sparse linear combination of the observed predictors, for predicting the responses after accounting for the effects of the previously found latent factors. Compared to the complicated joint estimation approach, a prominent feature of our proposed sequential method is that each step reduces to a simple regularized unit-rank regression, in which the orthogonality requirement among the sparse factors becomes optional rather than necessary. The ideas of coordinate descent and Bregman iterative methods are utilized to ensure fast computation and algorithmic convergence, even in the presence of missing data and when exact orthogonality is desired. Theoretically, we show that the sequential estimators enjoy the oracle properties for recovering the underlying sparse factor structure. The efficacy of the proposed approach is demonstrated by simulation studies and two real applications in genetics.