## Fall 2017 Colloquia

**Upcoming Colloquia**

- November 17th - Jayaram Sethuraman (Florida State)
- December 1st - Gen Li (Columbia University)
- December 8th - Benjamin Alamar (ESPN)

**Friday, September 22: Dr. Jonathan Bradley, Florida State University**

214 Duxbury Hall, 10:00am

Title: Hierarchical Models with Conditionally Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family

Abstract: We introduce a Bayesian approach for analyzing (possibly) high-dimensional dependent data that are distributed according to a member of the natural exponential family of distributions. This problem requires extensive methodological advancements, as jointly modeling high-dimensional dependent data leads to the so-called "big n problem." The computational complexity of the "big n problem" is further exacerbated when allowing for non-Gaussian data models, as is the case here. Thus, we develop new computationally efficient distribution theory for this setting. In particular, we introduce what we call the "conjugate multivariate distribution," which is motivated by the univariate distribution introduced in Diaconis and Ylvisaker (1979). Furthermore, we provide substantial theoretical and methodological development, including results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, conjugate prior distributions, and full-conditional distributions for a Gibbs sampler. We demonstrate the proposed methodology through simulated examples and several real-data analyses.
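For readers less familiar with the univariate starting point, a minimal sketch of a Diaconis–Ylvisaker-style conjugate update may help: a gamma prior on a Poisson mean yields a gamma posterior in closed form, so no sampling is needed. This is only the textbook univariate case, not the speaker's conjugate multivariate distribution; the function name and the prior values are illustrative assumptions.

```python
def poisson_gamma_update(shape, rate, counts):
    """Posterior Gamma(shape, rate) for a Poisson mean after observing counts."""
    # Conjugacy: the shape absorbs the total count, the rate absorbs the sample size.
    return shape + sum(counts), rate + len(counts)


prior_shape, prior_rate = 2.0, 1.0  # hypothetical prior, Gamma(2, 1)
post_shape, post_rate = poisson_gamma_update(prior_shape, prior_rate, [3, 0, 4, 2])
posterior_mean = post_shape / post_rate  # (2 + 9) / (1 + 4) = 2.2
```

The closed-form update is what makes full-conditional Gibbs steps cheap in conjugate hierarchical models.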

**Friday, September 22: Dr. Annie (Peiyong) Qu, University of Illinois at Urbana-Champaign**

214 Duxbury Hall, 10:00am

Title: Individualized Multi-directional Variable Selection

Abstract: In this talk, we propose an individualized variable selection approach that selects different relevant variables for different individuals. In contrast to conventional model selection approaches, the key component of the new approach is a separation penalty with multi-directional shrinkage, including shrinkage to zero, which facilitates individualized modeling and distinguishes strong signals from noisy ones. As a byproduct, the proposed model identifies subgroups within which individuals share similar effects, and thus improves estimation efficiency and personalized prediction accuracy. Another advantage of the proposed model is that it can incorporate within-subject correlation for longitudinal data. We provide a general theoretical foundation under a double-divergence modeling framework, in which the number of subjects and the number of repeated measurements both go to infinity and which therefore involves high-dimensional individual parameters. In addition, we establish the oracle property of the proposed estimator, which ensures its optimal large-sample behavior. Simulation studies and an application to HIV longitudinal data are used to compare the new approach with existing penalization methods. This is joint work with Xiwei Tang.

**Friday, October 6th: Dr. Hira Koul, Michigan State University**

214 Duxbury Hall, 10:00am

Title: Minimum Distance Model Checking in Berkson Measurement Error Models with Validation Data

Abstract: We shall present some tests for fitting a parametric regression model in the presence of Berkson measurement error in the covariates, without specifying the measurement error density, when validation data are available. The availability of validation data makes it possible to estimate the calibrated regression function nonparametrically. The proposed class of tests is based on a class of minimized integrated squared distances between a nonparametric estimate of the calibrated regression function and the parametric null model being fitted. Asymptotic normality results for these tests under the null hypothesis, and for the corresponding minimum distance (m.d.) estimators of the null model parameters, will be presented. Surprisingly, the asymptotic null distributions of these test statistics are the same as in the case of a known measurement error density, while those of the m.d. estimators are affected by the estimation of the calibrated regression function. A simulation study shows the desirable performance of a member of the proposed class of estimators and tests.

**Friday, October 13th: Dr. Xiaohui Chen, University of Illinois at Urbana-Champaign**

214 Duxbury Hall, 10:00am

Title: Gaussian and bootstrap approximations of high-dimensional U-statistics and their applications

Abstract: We shall first discuss the Gaussian approximation of high-dimensional and non-degenerate U-statistics of order two under the supremum norm. A two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution is proposed. Subject to mild moment conditions on the kernel, we establish an explicit rate of convergence that decays polynomially in the sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective on the empirical, the randomly reweighted, and the multiplier bootstraps as randomly reweighted quadratic forms, all of which are asymptotically valid and inferentially first-order equivalent in high dimensions.
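As a concrete reference point, an order-two U-statistic averages a symmetric kernel over all unordered pairs of observations. The minimal sketch below uses a scalar kernel, h(x, y) = (x - y)^2 / 2, whose U-statistic is exactly the unbiased sample variance. The talk concerns vector-valued kernels in high dimensions and the maximum over coordinates, which this toy example does not attempt; the function name is hypothetical.

```python
from itertools import combinations


def u_statistic_order_two(xs, kernel):
    """Average of a symmetric kernel over all unordered pairs (i < j)."""
    pairs = list(combinations(xs, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


data = [2.0, 4.0, 4.0, 5.0, 7.0]
# With this kernel the U-statistic coincides with the unbiased sample variance.
u = u_statistic_order_two(data, lambda a, b: 0.5 * (a - b) ** 2)
```

For this data the sum of squared pairwise differences is 66, so the statistic is 33 / 10 = 3.3, matching the sample variance.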

The bootstrap methods are applied to statistical problems involving high-dimensional non-Gaussian data, including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; and (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that Gaussian-like convergence rates can be achieved for heavy-tailed data, and that these rates are less conservative than those obtained by the Bonferroni technique, which ignores the dependence in the underlying data distribution. In addition, we show that even for sub-Gaussian distributions, the error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.
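The thresholded covariance estimator mentioned above can be sketched as entrywise hard thresholding of the sample covariance: off-diagonal entries smaller in magnitude than a tuning parameter are set to zero. In this minimal sketch the threshold `lam` is fixed by hand, whereas the talk selects it with a bootstrap; the helper names and the toy data are assumptions for illustration.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of a list of equal-length observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1) for k in range(p)]
        for j in range(p)
    ]


def hard_threshold(cov, lam):
    """Zero out off-diagonal entries whose magnitude is below lam; keep the diagonal."""
    p = len(cov)
    return [
        [cov[j][k] if j == k or abs(cov[j][k]) >= lam else 0.0 for k in range(p)]
        for j in range(p)
    ]


# Hypothetical data: the first two columns co-vary strongly, the third barely.
rows = [[1.0, 2.0, 0.5], [2.0, 4.1, -0.3], [3.0, 5.9, 0.2], [4.0, 8.0, -0.1]]
sparse_cov = hard_threshold(sample_covariance(rows), lam=0.5)
```

Keeping the diagonal untouched preserves the marginal variances; only the weak cross-covariances are removed.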

**Friday, October 20th: Dr. Yun Yang, Florida State University**

214 Duxbury Hall, 10:00am

Title: Fast and Optimal Bayesian Inference via Variational Approximations

Abstract: We propose a variational approximation to Bayesian posterior distributions, called $\alpha$-VB, with provable statistical guarantees for models with and without latent variables. The standard variational approximation is a special case of $\alpha$-VB with $\alpha=1$. When $\alpha \in (0,1)$, a novel class of variational inequalities is developed that links the Bayes risk under the variational approximation to the objective function in the variational optimization problem, implying that maximizing the evidence lower bound in variational inference has the effect of minimizing the Bayes risk within the variational density family. In a frequentist setup, the variational inequalities imply that point estimates constructed from the $\alpha$-VB procedure converge at an optimal rate to the true parameter in a wide range of problems. We illustrate our general theory with a number of examples, including the mean-field variational approximation to (low)-high-dimensional Bayesian linear regression with spike and slab priors, Gaussian mixture models, latent Dirichlet allocation, and (mixture of) Gaussian variational approximation in regular parametric models.

**Friday, October 27th: Dr. Subhashis Ghosal, North Carolina State University**

214 Duxbury Hall, 10:00am

Title: Coverage of Credible Bands for Nonparametric Regression Function and Derivatives

Abstract: Estimating derivatives of a multivariate regression function is an interesting example of an inverse problem but has not received much attention in the Bayesian literature. In this talk, we study the coverage of Bayesian credible sets for this problem, with primary interest in uniform credible bands. A finite random series of B-splines prior is especially suitable for the purpose owing to the availability of explicit posterior expressions and the convenient structure of derivatives of B-splines. We develop useful bounds to show that slightly inflated credible sets for the regression function and its derivatives have high coverage in the frequentist sense, and hence a Bayesian’s quantification of uncertainty has frequentist justification. The results will also be used to construct credible sets for the regression mode with guaranteed frequentist coverage. The talk is based on joint work with William Weimin Yoo.

**Friday, November 3rd: Dr. Kshitij Khare, University of Florida**

214 Duxbury Hall, 10:00am

Title: Bayesian Inference for Gaussian Graphical Models Beyond Decomposable Graphs

Abstract: Bayesian inference for graphical models has received much attention in the literature in recent years. It is well known that when the graph G is decomposable, Bayesian inference is significantly more tractable than in the general non-decomposable setting. Penalized likelihood inference, on the other hand, has made tremendous gains in the past few years in terms of scalability and tractability. Bayesian inference, however, has not had the same level of success, though a scalable Bayesian approach has its own strengths, especially in terms of quantifying uncertainty. To address this gap, we propose a novel, scalable, and flexible Bayesian approach for estimation and model selection in Gaussian undirected graphical models. We first develop a class of generalized G-Wishart distributions with multiple shape parameters for an arbitrary underlying graph. This class contains the G-Wishart distribution as a special case. We then introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm to obtain posterior draws from generalized G-Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains the class of decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. We proceed to derive theoretical properties of the proposed Gibbs sampler. We then demonstrate that the proposed Gibbs sampler scales to significantly higher-dimensional problems than an accept-reject or a Metropolis-Hastings algorithm. Finally, we show the efficacy of the proposed approach on simulated and real data.

**Friday, November 17th: Dr. Jayaram Sethuraman, Florida State University**

214 Duxbury Hall, 10:00am

Title: The Origins of the Stick Breaking Construction of Dirichlet Priors

Abstract: “My 1994 paper gave a simple direct proof of the constructive definition of Dirichlet priors (Ferguson 1973) and did not dwell on how I got the idea for that construction. In this talk I will first describe the collection of all priors in the nonparametric problem and show how this description leads to the constructive definition, nowadays called the stick breaking construction. This also leads to the invariance under size biased property (ISBP) of the GEM distribution (the stick breaking part) which gives a simpler proof than in the 1994 paper for the posterior distribution of Dirichlet priors. All the ideas of this talk emanate from deeper understanding of the Blackwell and MacQueen paper in 1973.”
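The stick-breaking construction itself can be sketched in a few lines, assuming a finite truncation level: draw V_k ~ Beta(1, α) independently, set the k-th weight to V_k ∏_{j<k} (1 - V_j), and pair each weight with an independent draw from the base measure. The truncation (which leaves a small unassigned remainder of the stick) and the helper names are illustrative choices, not part of the talk.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """First num_atoms GEM(alpha) weights: w_k = V_k * prod_{j<k} (1 - V_j), V_k ~ Beta(1, alpha)."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def truncated_dirichlet_draw(alpha, base_sampler, num_atoms, rng):
    """Approximate draw from DP(alpha, H) as (atom, weight) pairs, truncated at num_atoms."""
    weights = stick_breaking_weights(alpha, num_atoms, rng)
    atoms = [base_sampler(rng) for _ in range(num_atoms)]
    return list(zip(atoms, weights))


rng = random.Random(2017)
# Base measure H assumed to be standard normal for illustration.
draw = truncated_dirichlet_draw(2.0, lambda r: r.gauss(0.0, 1.0), 100, rng)
total_mass = sum(w for _, w in draw)  # just below 1; the gap is the truncation error
```

Smaller α breaks off larger pieces early, so most of the mass concentrates on a few atoms.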

## Spring 2017 Colloquia

**Friday, January 13: Dr. Meng Li, Duke University (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: New developments in probabilistic image analysis: boundary detection and image reconstruction

Abstract: Images (2D, 3D, or even higher dimensional) are a fundamental data type. The area of image analysis is undergoing a dramatic transformation to utilize the power of statistical modeling, which provides a unique way to describe uncertainties and leads to model-based solutions. We exemplify this through two critical and challenging problems, boundary detection and image reconstruction, treated comprehensively from theory and methodology to application. We view the boundary as a closed, smooth, lower-dimensional manifold, and propose a nonparametric Bayesian approach based on priors indexed by the unit sphere. The proposed method achieves four goals: guaranteed geometric restriction, a (nearly) minimax optimal rate that adapts to the smoothness level, convenience for joint inference, and computational efficiency. We also introduce a probabilistic model-based technique that uses wavelets with adaptive random partitioning to reconstruct images. We represent multidimensional signals by a mixture of one-dimensional wavelet decompositions in the form of randomized recursive partitioning on the space of wavelet coefficient trees, where the decomposition adapts to the geometric features of the signal. The state-of-the-art performance of the proposed methods is demonstrated using simulations and applications including neuroimaging in brain oncology. R/Matlab packages/toolboxes and interactive Shiny applications are available for routine implementation.

**Friday, January 20: Lifeng Lin, University of Minnesota (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: On evidence cycles in network meta-analysis

Abstract: As an extension of pairwise meta-analysis of two treatments, network meta-analysis has recently attracted many researchers in evidence-based medicine because it simultaneously synthesizes both direct and indirect evidence from multiple treatments and thus facilitates better decision making. The Lu–Ades Bayesian hierarchical model is a popular method to implement network meta-analysis, and it is generally considered more powerful than conventional pairwise meta-analysis, leading to more accurate effect estimates with narrower confidence intervals. However, the improvement of effect estimates produced by Lu–Ades network meta-analysis has never been studied theoretically. In this talk, we show that such improvement depends highly on evidence cycles in the treatment network. Specifically, Lu–Ades network meta-analysis produces posterior distributions identical to separate pairwise meta-analyses for all treatment comparisons when a treatment network does not contain cycles. Even in a general network with cycles, treatment comparisons that are not contained in any cycles do not benefit from Lu–Ades network meta-analysis. Simulations and a case study are used to illustrate the equivalence of Lu–Ades network meta-analysis and pairwise meta-analysis in certain networks.
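To make the role of cycles concrete, here is a minimal fixed-effect sketch using inverse-variance pooling; it is a frequentist illustration, not the Lu–Ades Bayesian model. In a triangle A-B-C, the B-versus-C effect can be estimated both directly and indirectly through the common comparator A, and pooling the two reduces the variance. If the network had no B-C study, only one path would exist and nothing would be pooled. Function names and numbers are hypothetical.

```python
def indirect_estimate(d_ab, var_ab, d_ac, var_ac):
    """Indirect C-versus-B effect through common comparator A (d_xy = effect of y relative to x)."""
    # Independent arms, so variances add.
    return d_ac - d_ab, var_ab + var_ac


def inverse_variance_pool(estimates):
    """Fixed-effect pooling of independent (estimate, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


# Hypothetical summary data for a triangle A-B-C (one evidence cycle).
direct_bc = (0.20, 0.09)
indirect_bc = indirect_estimate(0.50, 0.04, 0.80, 0.05)
pooled_bc, pooled_var = inverse_variance_pool([direct_bc, indirect_bc])
```

Here the direct and indirect estimates have equal variance, so the pooled effect is their midpoint (0.25) with half the variance (0.045).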

**Friday, January 27: Dr. Hwanhee Hong, Johns Hopkins University (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Integrating Data for Comparative Effectiveness Research

Abstract: Comparative effectiveness research helps answer “what works best” and provides evidence on the effectiveness, benefits, and harms of different treatments. When multiple sources of data exist on a particular question, the evidence should be obtained by integrating those sources in a principled way. Network meta-analysis (NMA) extends traditional pairwise meta-analysis to compare multiple treatments simultaneously and take advantage of multiple sources of data. In some situations, some studies provide only aggregated data (AD) while others provide individual patient-level data (IPD); standard network meta-analysis methods have been extended to synthesize these types of data simultaneously. However, existing methods do not sufficiently consider the quality of evidence (i.e., the level of precision of effect estimates or the compatibility of study designs) across different data types, and assume that all studies contribute equally to the treatment effect estimation regardless of whether they provide AD or IPD. In this talk, I propose Bayesian hierarchical NMA models that borrow information adaptively across AD and IPD studies using power and commensurate priors. The power parameter in the power priors and the spike-and-slab hyperprior in the commensurate priors govern the level of information borrowing across study types. We incorporate covariate-by-treatment interactions to examine subgroup effects and the discrepancy between subgroup effects estimated in AD and IPD (i.e., ecological bias). The methods are validated and compared via extensive simulation studies, and then applied to an example in diabetes treatment comparing 28 oral anti-diabetic drugs. We compare results across model and hyperprior specifications.

These methodological developments enable us to integrate different types of data in network meta-analysis with flexible prior distributions, and help enhance comparative effectiveness research by providing a comprehensive understanding of treatment effects and effect modification (via the covariate-by-treatment interactions) from multiple sources of data.

**Tuesday, January 31: Dr. Matey Neykov, Princeton University (Faculty Candidate)**

214 Duxbury Hall, 2:00pm

Title: High Dimensions, Inference and Combinatorics. A Journey Through the Data Jungle

Abstract: This talk takes us on a journey through modern high-dimensional statistics. We begin with a brief discussion on variable selection and estimation and the challenges they bring to high-dimensional inference, and we formulate a new family of inferential problems for graphical models. Our aim is to conduct hypothesis tests on graph properties such as connectivity, maximum degree and cycle presence. The testing algorithms we introduce are applicable to properties which are invariant under edge addition. In parallel, we also develop a minimax lower bound showing the optimality of our tests over a broad family of graph properties. We apply our methods to study neuroimaging data.

**Friday, February 3: Dr. Rajarshi Mukherjee, Stanford University (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Sparse Signal Detection with Binary Outcomes

Abstract: In this talk, I will discuss some examples of sparse signal detection problems in the context of binary outcomes. These will be motivated by examples from next generation sequencing association studies, understanding heterogeneities in large scale networks, and exploring opinion distributions over networks. Moreover, these examples will serve as templates to explore interesting phase transitions present in such studies. In particular, these phase transitions will be aimed at revealing a difference between studies with possibly dependent binary outcomes and Gaussian outcomes. The theoretical developments will be further complemented with numerical results.

**Friday, February 10: Dr. Rohit Kumar Patra, University of Florida**

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: We consider estimation and inference in a single index regression model with an unknown link function. In contrast to the standard approach of using kernel methods, we consider estimating the link function under two different kinds of constraints, namely smoothness constraints and convexity (shape) constraints. Under smoothness constraints, we use smoothing splines to estimate the link function, and we develop a method to compute the penalized least squares estimators (PLSE) of the parametric and nonparametric components given i.i.d. data. Under a convexity constraint on the link function, we develop least squares estimators (LSE) for the unknown quantities. We prove the consistency and find the rates of convergence of both the PLSE and the LSE. We establish the root-n rate of convergence and the asymptotic efficiency of the PLSE and the LSE of the parametric component under mild assumptions. We illustrate and validate the method through experiments on simulated and real data. This is work with Arun Kuchibhotla and Bodhisattva Sen.

**Friday, February 17: Dr. Jun Liu, Harvard University**

214 Duxbury Hall, 10:00am

Title: Robust Variable and Interaction Selection for Logistic Regression and Multiple Index Models

Abstract: Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds important predictors with both main and interaction effects, whereas in the backward stage SODA removes unimportant terms so as to optimize the extended Bayesian Information Criterion (EBIC). Compared with existing methods for variable selection in quadratic discriminant analysis, SODA can handle high-dimensional data in which the number of predictors is much larger than the sample size, and it does not require the joint normality assumption on predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with the Sliced Inverse Regression (SIR) method (Li, 1991) and its existing variations, SODA requires neither the linearity nor the constant variance condition and is much more robust. Our theoretical analysis establishes the variable-selection consistency of SODA in high-dimensional settings, and our simulation studies and real-data applications demonstrate the superior performance of SODA in dealing with non-Gaussian design matrices in both logistic and general index models.
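The criterion SODA optimizes can be sketched with the commonly used form of the extended BIC: for a model with k selected terms out of p candidates, EBIC = -2 log L + k log n + 2γ log C(p, k), which reduces to the ordinary BIC when γ = 0. This minimal sketch assumes that form; the exact variant SODA uses for main and interaction terms may differ, and the log-likelihood values in the usage are hypothetical.

```python
import math


def log_binomial(p, k):
    """log C(p, k) via log-gamma, stable for large p."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ebic(log_likelihood, num_selected, n, p, gamma=1.0):
    """Extended BIC: -2 log L + k log n + 2 * gamma * log C(p, k). Lower is better."""
    return (-2.0 * log_likelihood
            + num_selected * math.log(n)
            + 2.0 * gamma * log_binomial(p, num_selected))


# Hypothetical comparison with n = 200 samples and p = 1000 candidate terms:
# the larger model fits better, but the model-space penalty outweighs the gain.
small_model = ebic(-120.0, 3, n=200, p=1000)
large_model = ebic(-115.0, 8, n=200, p=1000)
```

The log C(p, k) term is what lets the criterion stay selective when p is much larger than n, the regime the abstract targets.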

**Friday, March 3: Dr. Xin Zhang, Florida State University Department of Statistics**

214 Duxbury Hall, 10:00am

Title: The Maximum Separation Subspace (MASES) in Sufficient Dimension Reduction with Binary or Categorical Response

Abstract: Sufficient dimension reduction methods are useful tools for exploring and visualizing data and for prediction in regression, especially when the number of covariates is large. In this talk, we introduce the new notion of the Maximum Separation Subspace (MASES) as a natural inferential and estimative object for sufficient dimension reduction with a binary or categorical response. We will see connections with the inverse regression subspace, the central subspace, and the central discriminant subspace, and we will also illustrate the notion via examples such as Fisher's linear discriminant analysis, quadratic discriminant analysis, and single and multiple index models. We study properties of the MASES and develop a method to estimate it. Consistency and asymptotic normality of the MASES estimator are established. Simulations and a real data example show the superb performance of the proposed MASES estimator, which substantially outperforms classical sufficient dimension reduction methods.

**Friday, March 10: Dr. Peter Hoff, Duke University**

214 Duxbury Hall, 10:00am

Title: Adaptive FAB confidence intervals with constant coverage

Abstract: Confidence intervals for the means of multiple normal populations are often based on a hierarchical normal model. While commonly used interval procedures based on such a model have the nominal coverage rate on average across a population of groups, their actual coverage rate for a given group will be above or below the nominal rate, depending on the value of the group mean.

In this talk I present confidence interval procedures that have constant frequentist coverage rates and that make use of information about across-group heterogeneity, resulting in constant-coverage intervals that are narrower than standard t-intervals on average across groups. These intervals are obtained by inverting Bayes-optimal frequentist tests, and so are "frequentist, assisted by Bayes" (FAB). I present some asymptotic optimality results and some extensions to other multiparameter models, such as linear regression.

**Friday, March 24: Dr. Ying Guo, Emory University**

214 Duxbury Hall, 10:00am

Title: New ICA methods for brain network analysis using neuroimaging data

Abstract: In recent years, Independent Component Analysis (ICA) has gained significant popularity in diverse fields such as medical imaging, signal processing, and machine learning. In particular, ICA has become an important tool for identifying and characterizing brain functional networks in neuroimaging studies. Although widely applied, current ICA methods have several major limitations that reduce their applicability in imaging studies. First, an important goal in imaging data analysis is to investigate how brain functional networks are affected by subjects’ clinical and demographic characteristics. Existing ICA methods, however, cannot directly incorporate covariate effects in the ICA decomposition. Second, the collection of multimodal neuroimaging data (e.g., fMRI and DTI) has become common practice in the neuroscience community, but current ICA methods are not flexible enough to accommodate and integrate multimodal imaging data that have different scales and data representations (scalar/array/matrix). In this talk, I will present two new ICA models that we have developed to extend the ICA methodology to address these needs in neuroimaging applications. I will first introduce a hierarchical covariate-adjusted ICA (hc-ICA) model that provides a formal statistical framework for estimating covariate effects and testing differences between brain functional networks. Hc-ICA provides a more reliable and powerful statistical tool for evaluating group differences in brain functional networks while appropriately controlling for potential confounding factors. Computationally efficient estimation and inference procedures have been developed for the hc-ICA model. Next, I will present a novel Distributional Independent Component Analysis (D-ICA) framework for decomposing multimodal neuroimaging data such as fMRI and DTI.

Unlike traditional ICA, which models observed data as a mixture of independent components, the proposed D-ICA takes a fundamentally new approach that aims to perform ICA at the distribution level. D-ICA can potentially provide a unified framework for extracting neural features across imaging modalities. I will discuss the connections and distinctions between standard ICA and D-ICA. The proposed methods will be illustrated through simulation studies and real-world applications in neuroimaging studies.

**Friday, March 31: Dr. Fei Zou, University of Florida**

214 Duxbury Hall, 10:00am

Title: On Surrogate Variable Analysis for High Dimensional Genetics and Genomics Data

Abstract: Unwanted variation from hidden variables often negatively impacts the analysis of high-dimensional data, leading to high false discovery rates and/or low rates of true discoveries. A number of procedures have been proposed to detect and estimate the hidden variables, including principal component analysis (PCA). However, empirical data analysis suggests that PCA is not efficient in identifying hidden variables that affect only a subset of features but with relatively large effects. Surrogate variable analysis (SVA) has been proposed to overcome this limitation, but SVA also suffers some efficiency loss for data with a complicated dependence structure between the hidden variables and the variables of primary interest. In this talk, we will describe an improved PCA procedure for detecting and estimating the hidden variables. Some new applications of the method will also be discussed.

**Friday, April 14: Dr. Shuangge Ma, Yale**

214 Duxbury Hall, 10:00am

Title: Robust Network-based Analysis of the Associations between (Epi)Genetic Measurements

Abstract: Multiple types of (epi)genetic measurements are involved in the development and progression of complex diseases. Different types of (epi)genetic measurements are interconnected, and modeling their associations leads to a better understanding of disease biology and facilitates building clinically useful models. Such analysis is challenging in multiple aspects. To fix notation, we use gene expression (GE) and copy number variation (CNV) as an example. Both GE and CNV measurements are high-dimensional. One GE is possibly regulated by multiple CNVs; however, the set of relevant CNVs is unknown. For a specific GE, the *cis*-acting CNV usually has the dominant effect and can behave differently from the *trans*-acting CNVs. In addition, GE measurements can have long tails and contamination. Lastly, some CNVs are more tightly connected to each other than the rest. In this study, a novel method is developed to more effectively model the associations between (epi)genetic measurements. For each GE, a partially linear model is assumed with a nonlinear effect for the *cis*-acting CNV. A robust loss function is adopted to accommodate long-tailed distributions and data contamination. We adopt penalization to accommodate the high dimensionality and select relevant CNVs. A network structure is introduced to account for the interconnections among CNVs. We develop a computational algorithm and rigorously establish the consistency properties. Simulation shows the superiority of the proposed method over alternatives. The analysis of a TCGA (The Cancer Genome Atlas) dataset demonstrates the practical applicability of the proposed method.

**Friday, April 21: Dr. Hongyu Zhao, Yale**

214 Duxbury Hall, 10:00am

Title: TBA

Abstract: TBA

## Fall 2016 Colloquia

**Friday, September 9: Dr. Adrian Barbu, Florida State University Department of Statistics**

214 Duxbury Hall, 10:00am

Title: A Novel Method for Obtaining Tree Ensembles by Loss Minimization

Abstract: Tree ensembles can capture the relevant variables, and to some extent the relationships between them, in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of Boosting or Random Forest. Previous work showed that Boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a less compact and less interpretable model. In this talk we present a novel method for obtaining a compact ensemble of trees that grows a pool of trees in parallel with many independent Boosting threads, then selects a small subset and updates their leaf weights by loss optimization. Experiments on real datasets show that the obtained model usually has a smaller loss than Boosting, which is also reflected in a lower misclassification error on the test set.

**Friday, September 16: Dr. Antonio Linero, Florida State University Department of Statistics**

214 Duxbury Hall, 10:00am

Title: Bayesian regression trees for high dimensional prediction and variable selection

Abstract: Decision tree ensembles are an extremely popular tool for obtaining high-quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles that are motivated by a generative probabilistic model, the most influential method being the Bayesian additive regression trees framework. In this talk, we take a Bayesian point of view on this problem and show how to construct priors on decision tree ensembles that are capable of adapting to sparsity by placing a sparsity-inducing Dirichlet hyperprior on the splitting proportions of the regression tree. We demonstrate the efficacy of this approach in simulation studies, and argue for its theoretical strengths by showing that, under certain conditions, the posterior concentrates around the true regression function at a rate that is independent of the number of predictors. Our approach has additional benefits over existing Bayesian methods for constructing tree ensembles, such as allowing for fully Bayesian variable selection.

**Friday, September 23: Dr. Xiao Wang, Purdue University**

214 Duxbury Hall, 10:00am

Title: Quantile Image-on-Scalar Regression

Abstract: Quantile regression with functional response and scalar covariates has become an important statistical tool for many neuroimaging studies. In this paper, we study optimal estimation of varying coefficient functions in the framework of reproducing kernel Hilbert space. Minimax rates of convergence under both fixed and random designs are established. We have developed easily implementable estimators which are shown to be rate-optimal. Simulations and real data analysis are conducted to examine the finite-sample performance. This is a joint work with Zhengwu Zhang, Linglong Kong, and Hongtu Zhu.

**Friday, September 30: Dr. Chiwoo Park, Florida State University Industrial and Manufacturing Engineering**

214 Duxbury Hall, 10:00am

Title: Patching Gaussian Processes for Large-Scale Spatial Regression

Abstract: This talk presents a method for solving a Gaussian process (GP) regression with constraints on the boundary of the regression domain. The boundary constraints guide and improve prediction near the domain boundary. More importantly, the method can be applied to improve local GP regression as a solver for large-scale regression analysis of remote sensing and other large datasets. In conventional local GP regression, the regression domain is first partitioned into multiple local regions, and an independent GP model is fit to each local region using the training data belonging to that region. Two key issues with local GP regression are that (1) predictions near the boundary of a local region are less accurate than predictions in its interior, and (2) the GP models for two neighboring local regions produce different predictions on their shared boundary, creating discontinuities in the output regression. These issues can be addressed by constraining the local GP models on the boundary using our constrained GP regression approach. The performance of the proposed approach depends on the “quality” of the constraints imposed on the local GP models. We present a method to estimate “good” constraints based on data. Some convergence results and numerical results of the proposed approach will be presented.

**Friday, October 7: Dr. Dan Shen, University of South Florida**

214 Duxbury Hall, 10:00am

Title: Dimension Reduction of Neuroimaging Data Analysis

Abstract: High dimensionality has become a common feature of “big data” encountered in many diverse fields, such as neuroimaging and genetic analysis, and it poses modern challenges for statistical analysis. To cope with high dimensionality, dimension reduction becomes necessary. Principal component analysis (PCA) is arguably the most popular classical dimension reduction technique; it uses a few principal components (PCs) to explain most of the variation in the data.

I first introduce Multiscale Weighted PCR (MWPCR), a new variation of PCA, for neuroimaging analysis. MWPCA introduces two sets of novel weights, including global and local spatial weights, to enable a selective treatment of individual features and incorporation of class label information as well as spatial pattern within neuroimaging data. Simulation studies and real data analysis show that MWPCA outperforms several competing PCA methods.

Second, we develop statistical methods for analyzing tree-structured data objects. This work is motivated by the statistical challenges of analyzing a set of blood artery trees obtained from Magnetic Resonance Angiography (MRA) brain images of 98 human subjects. The non-Euclidean nature of tree space makes applying conventional statistical analysis, including PCA, to tree data very challenging. We develop an entirely new approach that uses the Dyck path representation, which builds a bridge between tree space (a non-Euclidean space) and curve space (a standard Euclidean space). That bridge enables us to exploit the power of functional data analysis to explore the statistical properties of tree data sets.
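For readers unfamiliar with Dyck paths: the Dyck path of a rooted, ordered tree is commonly obtained from a depth-first (contour) walk that steps up one level when it descends an edge and down one level when it returns, so the tree becomes a curve of heights. The sketch below assumes a hypothetical nested-list tree format (each node is the list of its children) and encodes only the branching topology; it is not the speaker's pipeline, which may also carry geometric attributes of the artery branches.

```python
from typing import List

# Hypothetical tree format: each node is the list of its children,
# so a leaf is [] and the whole tree is the root's child list.
Tree = List["Tree"]


def dyck_path(tree: Tree) -> List[int]:
    """Return the heights visited by a depth-first contour walk of `tree`.

    Each edge is traversed twice (down, then back up), so a tree with
    n edges yields 2n + 1 heights that start and end at 0.
    """
    heights = [0]

    def walk(node: Tree, depth: int) -> None:
        for child in node:
            heights.append(depth + 1)  # step down the edge to the child
            walk(child, depth + 1)
            heights.append(depth)      # step back up to the parent

    walk(tree, 0)
    return heights


if __name__ == "__main__":
    # Root with two children; the first child has one child of its own.
    example = [[[]], []]
    print(dyck_path(example))  # [0, 1, 2, 1, 0, 1, 0]
```

Because every tree maps to a height curve on a common grid (after rescaling the walk to [0, 1]), standard functional data tools can then be applied to the curves.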

**Friday, October 14: Dr. Jonathan Bradley, Florida State University Department of Statistics**

214 Duxbury Hall, 10:00am

Title: Hierarchical Models for Spatial Data with Errors that are Correlated with the Latent Process

Abstract: Prediction of a spatial Gaussian process using a “big dataset” has become a topical area of research over the last decade. The available solutions often involve placing strong assumptions on the error process associated with the data. Specifically, it has typically been assumed that the data are equal to the spatial process of principal interest plus a mutually independent error process. Further, to obtain computationally efficient predictions, additional assumptions on the latent random processes and/or parameter models have become a practical necessity (e.g., low-rank models, sparse precision matrices, etc.). In this work, we consider an alternative latent process modeling schematic in which the error process is assumed to be spatially correlated and correlated with the spatial random process of principal interest. We show the counterintuitive result that error process dependencies allow one to remove assumptions on the spatial process of principal interest and still obtain computationally efficient predictions. At the core of this proposed methodology is the definition of a corrupted version of the latent process of interest, which we call the data-specific latent process (DSLP). Demonstrations of the DSLP paradigm are provided through simulated examples and through an application using a large dataset consisting of the US Census Bureau’s American Community Survey 5-year period estimates of median household income on census tracts.

**Friday, October 21: Dr. Bing Li, Pennsylvania State University**

214 Duxbury Hall, 10:00am

Title: A nonparametric graphical model for functional data with application to brain networks based on fMRI

Abstract: We introduce a nonparametric graphical model whose observations on vertices are functions. Many modern applications, such as electroencephalography and functional magnetic resonance imaging (fMRI), produce data of this type. The model is based on Additive Conditional Independence (ACI), a statistical relation that captures the spirit of conditional independence without resorting to multi-dimensional kernels. The random functions are assumed to reside in a Hilbert space. No distributional assumption is imposed on the random functions; instead, their statistical relations are characterized nonparametrically by a second Hilbert space, which is a reproducing kernel Hilbert space whose kernel is determined by the inner product of the first Hilbert space. A precision operator is then constructed based on the second space; it characterizes ACI, and hence also the graph.

The resulting estimator is relatively easy to compute, requiring no iterative optimization or inversion of large matrices.

We establish the consistency and the convergence rate of the estimator. Through simulation studies, we demonstrate that the estimator performs better than the functional Gaussian graphical model when the relations among vertices are nonlinear or heteroscedastic. The method is applied to an fMRI data set to construct brain networks for patients with attention-deficit/hyperactivity disorder.

**Friday, October 28: Dr. Glen Laird, Sanofi**

214 Duxbury Hall, 10:00am

Title: Statistical Considerations for Pharmaceutical Industry Clinical Trials

Abstract: Clinical trials are the key evidence drivers for the pharmaceutical industry. These trials use a set of statistical methods particular to the setting and regulatory environment. In the context of oncology clinical trials, an overview of selected methodological topics will be presented, including multiplicity, dose escalation methods, Simon designs, and interim analyses.

**Friday, November 4: Dr. Andre Rogatko, Cedars-Sinai Medical Center**

214 Duxbury Hall, 10:00am

Title: Dose Finding with Escalation with Overdose Control in Cancer Clinical Trials

Abstract: Escalation With Overdose Control (EWOC) is a Bayesian adaptive dose-finding design that produces consistent sequences of doses while controlling the probability that patients are overdosed. EWOC was the first dose-finding procedure to directly incorporate the ethical constraint of minimizing the chance of treating patients at unacceptably high doses. Its defining property is that the expected proportion of patients treated at doses above the maximum tolerated dose (MTD) is equal to a specified value α, the feasibility bound. Topics to be discussed include the two-parameter logistic model, the use of covariates in prospective clinical trials, drug combinations, and Web-EWOC, a free interactive web tool for designing and conducting dose-finding trials in cancer (https://biostatistics.csmc.edu/ewoc/ewocWeb.php).
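The EWOC rule, as it is usually described, picks each new dose so that the posterior probability that the dose exceeds the MTD equals the feasibility bound α; equivalently, the dose is the α-quantile of the posterior distribution of the MTD. The sketch below assumes posterior draws of the MTD are already available (here they are simulated from a lognormal purely for illustration, not from a fitted two-parameter logistic model), and the rounding to discrete dose levels is an assumed convention, not necessarily the one used by Web-EWOC.

```python
import numpy as np


def ewoc_next_dose(mtd_samples, alpha=0.25, dose_levels=None):
    """Sketch of the EWOC dose-selection rule.

    The recommended dose is the alpha-quantile of the posterior distribution of
    the MTD, so the posterior probability of overdosing the next patient,
    P(MTD < dose | data), equals alpha. With discrete dose levels, this sketch
    returns the largest level not exceeding that quantile (an assumed rounding
    rule), or None if every level exceeds it.
    """
    target = float(np.quantile(mtd_samples, alpha))
    if dose_levels is None:
        return target
    allowed = [d for d in sorted(dose_levels) if d <= target]
    return allowed[-1] if allowed else None


if __name__ == "__main__":
    # Hypothetical posterior draws of the MTD; in a real trial these would come
    # from the posterior of the dose-toxicity model given the observed patients.
    rng = np.random.default_rng(0)
    samples = rng.lognormal(mean=np.log(100.0), sigma=0.3, size=10_000)
    print(ewoc_next_dose(samples, alpha=0.25))
    print(ewoc_next_dose(samples, alpha=0.25, dose_levels=[50, 75, 100, 125]))
```

A smaller α makes escalation more cautious, since the recommended dose moves toward the lower tail of the posterior MTD distribution.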

**Friday, November 18: Dr. Mike Daniels, University of Texas at Austin**

214 Duxbury Hall, 10:00am

Title: To be announced

Abstract: To be announced

**Friday, December 2: Dr. Martin Lindquist, Johns Hopkins University**

214 Duxbury Hall, 10:00am

Title: High-dimensional Multivariate Mediation with Application to Neuroimaging Data

Abstract: Mediation analysis is an important tool in the behavioral sciences for investigating the role of intermediate variables that lie in the path between a randomized treatment/exposure and an outcome variable. The influence of the intermediate variable on the outcome is often explored using structural equation models (SEMs), with model coefficients interpreted as possible effects. While there has been significant research on the topic in recent years, little work has been done on mediation analysis when the intermediate variable (mediator) is a high-dimensional vector. In this work, we introduce a novel method for mediation analysis in this setting called the directions of mediation (DMs). The DMs represent an orthogonal transformation of the space spanned by the set of mediators, chosen so that the transformed mediators are ranked based upon the proportion of the likelihood of the full SEM that they explain. We provide an estimation algorithm and establish the asymptotic properties of the obtained estimators. We demonstrate the method using a functional magnetic resonance imaging (fMRI) study of thermal pain in which we are interested in determining which brain locations mediate the relationship between the application of a thermal stimulus and self-reported pain.

## Spring 2016 Colloquia

**Friday, January 8: Dr. Dehan Kong, University of North Carolina (Faculty Candidate)**

HCB 103, 10:00am

Title: High-dimensional Matrix Linear Regression Model

Abstract: We develop a high-dimensional matrix linear regression model (HMLRM) to correlate matrix responses with high-dimensional scalar covariates when coefficient matrices have low-rank structures. We propose a fast and efficient screening procedure based on the spectral norm to deal with the case in which the dimension of the scalar covariates is ultra-high. We develop an efficient estimation procedure based on nuclear norm regularization, which explicitly borrows the matrix structure of coefficient matrices. We systematically investigate various theoretical properties of our estimators, including estimation consistency, rank consistency, and the sure independence screening property under HMLRM. We examine the finite-sample performance of our methods using simulations and a large-scale imaging genetic dataset collected by the Alzheimer's Disease Neuroimaging Initiative study.

**Tuesday, January 12: Naveen Narisetty, University of Michigan (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Consistent and Scalable Bayesian Model Selection for High Dimensional Data

Abstract: The Bayesian paradigm offers a flexible modeling framework for statistical analysis, but relative to penalization-based methods, little is known about the consistency of Bayesian model selection methods in the high-dimensional setting. I will present a new framework for understanding Bayesian model selection consistency, using sample-size-dependent spike-and-slab priors that help achieve appropriate shrinkage. More specifically, strong selection consistency is established in the sense that the posterior probability of the true model converges to one even when the number of covariates grows nearly exponentially with the sample size. Furthermore, the posterior on the model space is asymptotically similar to the L0 penalized likelihood. I will also introduce a new Gibbs sampling algorithm for posterior computation, which is much more scalable for high-dimensional problems than the standard Gibbs sampler, and yet retains the strong selection consistency property. The new algorithm and the consistency theory work for a variety of problems, including linear and logistic regressions, and a more challenging problem of censored quantile regression where a non-convex loss function is involved.

**Friday, January 15: Jonathan Bradley, University of Missouri (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Computationally Efficient Distribution Theory for Bayesian Inference of High-Dimensional Dependent Count-Valued Data

Abstract: We introduce a Bayesian approach for multivariate spatio-temporal prediction for high-dimensional count-valued data. Our primary interest is when there are possibly millions of data points referenced over different variables, geographic regions, and times. This problem requires extensive methodological advancements, as jointly modeling correlated data of this size leads to the so-called "big n problem." The computational complexity of prediction in this setting is further exacerbated by acknowledging that count-valued data are naturally non-Gaussian. Thus, we develop a new computationally efficient distribution theory for this setting. In particular, we introduce a multivariate log-gamma distribution and provide substantial theoretical development including: results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, and full-conditional distributions for a Gibbs sampler. To incorporate dependence between variables, regions, and time points, a multivariate spatio-temporal mixed effects model (MSTM) is used. The results in this manuscript are extremely general, and can be used for data that exhibit fewer sources of dependency than what we consider (e.g., multivariate, spatial-only, or spatio-temporal-only data). Hence, the implications of our modeling framework may have a large impact on the general problem of jointly modeling correlated count-valued data. We show the effectiveness of our approach through a simulation study. Additionally, we demonstrate our proposed methodology with an important application analyzing data obtained from the Longitudinal Employer-Household Dynamics (LEHD) program, which is administered by the U.S. Census Bureau.

**Friday, January 22: Abhra Sarkar, Duke University (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Novel Statistical Frameworks for Analysis of Structured Sequential Data

Abstract: We are developing a broad array of novel statistical frameworks for analyzing complex sequential data sets. Our research is primarily motivated by a collaboration with neuroscientists trying to understand the neurological, genetic and evolutionary basis of human communication using bird and mouse models. The data sets comprise structured sequences of syllables, or "songs," produced by animals from different genotypes under different experimental conditions. The primary goal is then to elucidate the roles of different genotypes and experimental conditions on animal vocalization behaviors and capabilities. We have developed novel statistical methods based on first-order Markovian dynamics that help answer these important scientific queries. First-order dynamics are, however, insufficiently flexible to learn complex serial dependency structures and systematic patterns in the vocalizations, an important secondary goal in these studies. To this end, we have developed a sophisticated nonparametric Bayesian approach to higher-order Markov chains building on probabilistic tensor factorization techniques. Our proposed method is of very broad utility, with applications not limited to the analysis of animal vocalizations, and provides new insights into the serial dependency structures of many previously analyzed sequential data sets arising from diverse application areas. Our method has appealing theoretical properties and practical advantages, and achieves substantial gains in performance compared to previously existing methods. Our research also paves the way to advanced automated methods for more sophisticated dynamical systems, including higher-order hidden Markov models that can accommodate more general data types.

**Tuesday, January 26: Alexander Petersen, University of California, Davis (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Representation of Samples of Density Functions and Regression for Random Objects

Abstract: In the first part of this talk, we will discuss challenges associated with the analysis of samples of one-dimensional density functions. Due to their inherent constraints, densities do not live in a vector space and therefore commonly used Hilbert space based methods of functional data analysis are not appropriate. To address this problem, we introduce a transformation approach, mapping probability densities to a Hilbert space of functions through a continuous and invertible map. Basic methods of functional data analysis, such as the construction of functional modes of variation, functional regression or classification, are then implemented by using representations of the densities in this linear space. Transformations of interest include log quantile density and log hazard transformations, among others. Rates of convergence are derived, taking into account the necessary preprocessing step of density estimation. The proposed methods are illustrated through applications in brain imaging.

The second part of the talk will address the more general problem of analyzing complex data that are non-Euclidean and specifically do not lie in a vector space. To address the need for statistical methods for such data, we introduce the concept of Fréchet regression. This is a general approach to regression when responses are complex random objects in a metric space and predictors are in $\mathcal{R}^p$. We develop generalized versions of both global least squares regression and local weighted least squares smoothing. We derive asymptotic rates of convergence for the corresponding sample-based fitted regressions to the population targets under suitable regularity conditions by applying empirical process methods. Illustrative examples include responses that consist of probability distributions and correlation matrices, and we demonstrate the proposed Fréchet regression for demographic and brain imaging data.

**Friday, January 29: Dr. Yifei Sun, Johns Hopkins University (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Recurrent Marker Processes in the Presence of Competing Terminal Events

Abstract: In follow-up studies, utility marker measurements are usually collected upon the occurrence of recurrent events until a terminal event such as death takes place. In this talk, we define the recurrent marker process to characterize utility accumulation over time. For example, with medical cost and repeated hospitalizations treated as the marker and recurrent events, respectively, the recurrent marker process is the trajectory of total medical cost spent, which stops increasing after death. In many applications, competing risks arise because subjects are at risk of more than one mutually exclusive terminal event, such as death from different causes, and modeling the recurrent marker process for each failure type is often of interest. However, censoring creates challenges for methodological development, because for censored subjects, both the failure type and the recurrent marker process after censoring are unobserved. To circumvent this problem, we propose a nonparametric framework for analyzing this type of data. In the presence of competing risks, we start with an estimator that uses marker information from uncensored subjects only. As a result, this estimator can be inefficient under heavy censoring. To improve efficiency, we propose a second estimator that combines the first estimator with auxiliary information from the estimate under a non-competing risks model. The large sample properties and optimality of the second estimator are established. Simulation studies and an application to the SEER-Medicare linked data are presented to illustrate the proposed methods.

**Tuesday, February 2: Guan Yu, University of North Carolina at Chapel Hill (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Supervised Learning Incorporating Graphical Structure among Predictors

Abstract: With the abundance of high dimensional data in various disciplines, regularization techniques are very popular these days. Despite the success of these techniques, some challenges remain. One challenge is the development of efficient methods incorporating structure information among predictors. Typically, the structure information among predictors can be modeled by the connectivity of an undirected graph using all predictors as nodes of the graph. In this talk, I will introduce an efficient regularization technique incorporating graphical structure information among predictors. Specifically, according to the undirected graph, we use a latent group lasso penalty to utilize the graph node-by-node. The predictors connected in the graph are encouraged to be selected jointly. This new regularization technique can be used for many supervised learning problems. For sparse regression, our new method using the proposed regularization technique includes adaptive Lasso, group Lasso, and ridge regression as special cases. Theoretical studies show that it enjoys model selection consistency and acquires tight finite sample bounds for estimation and prediction. For the multi-task learning problem, our proposed graph-guided multi-task method includes the popular $\ell_{2,1}$-norm regularized multi-task learning method as a special case. Numerical studies using simulated datasets and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset also demonstrate the effectiveness of the proposed methods.

**Friday, February 5: Dr. Yun Yang, Duke University (Faculty Candidate)**

214 Duxbury Hall, 10:00am

Title: Computationally efficient high-dimensional variable selection via Bayesian procedures

Abstract: Variable selection is fundamental in many high-dimensional statistical problems with sparsity structures. Much of the literature is based on optimization methods, where penalty terms are incorporated that yield both convex and non-convex optimization problems. In this talk, I will take a Bayesian point of view on high-dimensional regression, by placing a prior on the model space and performing the necessary integration so as to obtain a posterior distribution. In particular, I will show that a Bayesian approach can consistently select all relevant covariates under relatively mild conditions from a frequentist point of view.

Although Bayesian procedures for variable selection are provably effective and easy to implement, it has been suggested by many statisticians that Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distributions may need a long time to converge, as sampling from an exponentially large number of sub-models is an intrinsically hard problem. Surprisingly, our work shows that this plausible "exponentially many model" argument is misleading. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee the rapid mixing of a particular Metropolis-Hastings algorithm. The number of iterations for this Markov chain to reach stationarity is linear in the number of covariates up to a logarithmic factor.

**Friday, February 19: Dr. Somnath Datta, University of Florida**

214 Duxbury Hall, 10:00am

Title: Multi-Sample Adjusted U-Statistics that Account for Confounding Covariates

Abstract: Multi-sample U-statistics encompass a wide class of test statistics that allow the comparison of two or more distributions. U-statistics are especially powerful because they can be applied to both numeric and non-numeric (e.g., textual) data. However, when comparing the distribution of a variable across two or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (using the propensity score for prospective data or the stratification score for retrospective data) to construct adjusted U-statistics that can test the equality of distributions across two (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our procedures is demonstrated through simulation studies as well as an analysis of genetic data.

**Friday, March 4: Dr. Guang Cheng, Purdue University**

214 Duxbury Hall, 10:00am

Title: How Many Processors Do We Really Need in Parallel Computing?

Abstract: This talk explores the statistical versus computational trade-off to address a basic question in a typical divide-and-conquer setup: what is the minimal computational cost of obtaining statistical optimality? In smoothing spline models, we observe an intriguing phase transition phenomenon for the number of deployed machines, which serves as a simple proxy for computing cost. Specifically, we establish a sharp upper bound on the number of machines: when the number is below this bound, statistical optimality (in terms of nonparametric estimation or testing) is achievable; otherwise, statistical optimality becomes impossible.
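To make the divide-and-conquer setup concrete: the data are split across m machines, each machine fits the same nonparametric estimator to its own block, and the block fits are averaged. The sketch below swaps in Gaussian-kernel ridge regression for the smoothing spline (both are penalized estimators in a reproducing kernel Hilbert space), and the bandwidth, the penalty `lam`, and the helper names are illustrative assumptions. Holding `lam` fixed across blocks, rather than retuning it to each block's size, is also an assumption of this sketch.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.1):
    """Gaussian (RBF) kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)


def krr_fit_predict(x, y, x_new, lam=1e-3, bandwidth=0.1):
    """Kernel ridge regression on (x, y), evaluated at x_new.

    Minimizes (1/n) * sum (y_i - f(x_i))^2 + lam * ||f||_H^2, whose solution
    is f(.) = k(., x) @ coef with coef = (K + n * lam * I)^{-1} y.
    """
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    coef = np.linalg.solve(K + n * lam * np.eye(n), y)
    return gaussian_kernel(x_new, x, bandwidth) @ coef


def divide_and_conquer(x, y, x_new, num_machines, **kwargs):
    """Split the sample into num_machines blocks, fit each block separately
    (as if on its own processor), and average the block-level predictions."""
    blocks = np.array_split(np.arange(len(x)), num_machines)
    fits = [krr_fit_predict(x[idx], y[idx], x_new, **kwargs) for idx in blocks]
    return np.mean(fits, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, size=600)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(600)
    grid = np.linspace(0.0, 1.0, 50)
    truth = np.sin(2 * np.pi * grid)
    # Compare the averaged estimator's error as the number of machines grows.
    for m in (1, 4, 20, 100):
        err = np.mean((divide_and_conquer(x, y, grid, m) - truth) ** 2)
        print(f"machines={m:4d}  mean squared error={err:.4f}")
```

Running the loop for increasing m is a simple way to look for the kind of degradation the talk describes once too many machines are used, though this toy setup makes no claim about where the sharp bound lies.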

**Friday, March 18: Dr. Tianfu Wu, University of California, Los Angeles**

214 Duxbury Hall, 10:00am

Title: Towards a Visual Turing Test and Lifelong Learning: Learning Deep Hierarchical Models and Cost-sensitive Decision Policies for Understanding Visual Big Data

Abstract: Modern technological advances produce data at breathtaking scales and complexities, such as the images and videos on the web. Such big data require highly expressive models for their representation, understanding and prediction. To fit such models to big data, it is essential to develop practical learning methods and fast inferential algorithms. My research has focused on learning expressive hierarchical models and fast inference algorithms with homogeneous representation and architecture to tackle the underlying complexities in such big data from statistical perspectives. In this talk, with emphasis on a visual restricted Turing test, the grand challenge in computer vision, I will introduce my work on (i) Statistical Learning of Large Scale and Highly Expressive Hierarchical Models from Big Data, and (ii) Bottom-up/Top-down Inference with Hierarchical Models by Learning Near-Optimal Cost-Sensitive Decision Policies. Applications in object detection, online object tracking and robot autonomy will be discussed.

**Friday, March 25: Dr. Faming Liang, University of Florida**

214 Duxbury Hall, 10:00am

Title: A Blockwise Coordinate Consistent Method for High-Dimensional Parameter Estimation

Abstract: The dramatic improvement in data collection and acquisition technologies over the last decades has enabled scientists to collect a great amount of high-dimensional data. Due to their intrinsic nature, many high-dimensional data sets, such as omics data and genome-wide association study (GWAS) data, have a much smaller sample size than their dimension (a.k.a. small-n-large-P). How to estimate the parameters of high-dimensional models with a small sample size is still a challenging problem, though substantial progress has been made in recent decades. A popular approach to this problem is regularization, which, however, can perform poorly when the sample size is small and the variables are highly correlated. To alleviate this difficulty, we propose a blockwise coordinate consistent (BCC) method, which maximizes a new objective function, the expectation of the log-likelihood function, using a cyclic algorithm: it iteratively finds consistent estimates for each block of parameters conditional on the current estimates of the other parameters. The BCC method reduces the high-dimensional parameter estimation problem to a series of low-dimensional parameter estimation problems and is readily applicable to parameter estimation for the complicated models used in big data analysis. Our numerical results indicate that BCC can provide a drastic improvement in both parameter estimation and variable selection over regularization methods for high-dimensional systems.

**Friday, April 1: Dr. George Michailidis, University of Florida**

214 Duxbury Hall, 10:00am

Title: Estimating high-dimensional multi-layered networks through penalized maximum likelihood

Abstract: Gaussian graphical models are a useful tool for capturing interactions between nodes that represent the underlying random variables. However, in many biological applications one is interested in modeling associations both between and within molecular compartments (e.g., interactions between genes and proteins/metabolites). To this end, inferring multi-layered network structures from high-dimensional data provides insight into the conditional relationships among nodes within layers, after adjusting for and quantifying the effects of nodes from other layers. We propose an integrated algorithmic approach for estimating multi-layered networks that incorporates a screening step for significant variables, an optimization algorithm for estimating the key model parameters, and a stability selection step for selecting the most stable effects. The proposed methodology offers an efficient way of estimating the edges within and across layers iteratively, by solving an optimization problem constructed based on penalized maximum likelihood (under a Gaussianity assumption). The optimization is solved on a reduced parameter space that is identified through screening, which remedies the instability in high dimensions. Theoretical properties are considered to ensure identifiability and consistent estimation of the parameters and convergence of the optimization algorithm, despite the lack of global convexity. The performance of the methodology is illustrated on synthetic data sets and in an application to gene and metabolic expression data for patients with renal disease.

**Friday, April 8: Dr. Qian Zhang, FSU College of Education**

214 Duxbury Hall, 10:00am

Title: A Comparison of Methods for Estimating Moderation Effects with Missing Data in the Predictors

Abstract: The most widely used statistical model for conducting moderation analysis is the moderated multiple regression (MMR) model. While conducting moderation analysis using MMR models, missing data could pose a challenge, mainly because of the nonlinear interaction term. In this study, we consider a simple MMR model in which the effect of predictor *X* on the outcome *Y* is moderated by a moderator *U*. The primary interest is to find ways of estimating and testing the moderation effect in the presence of missing data in the predictor *X*. We mainly focus on cases in which *X* is missing completely at random or missing at random. Theoretically, we find that the existing methods, including normal-distribution-based maximum likelihood estimation (NML) and normal-distribution-based multiple imputation (NMI), yield inconsistent moderation effect estimates when data are missing at random. To cope with this issue, Bayesian estimation (BE) is proposed. To compare the existing methods and the proposed BE under finite sample sizes, a simulation study is also conducted. Results indicate that the relative performance of the methods depends on various factors: missing data mechanisms, the roles of variables responsible for missingness, population moderation effect sizes, sample sizes, missing data proportions, and the distribution of predictor *X*. Limitations of the study and future research directions are also discussed.
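For reference, the simple MMR model has the form $Y = b_0 + b_1 X + b_2 U + b_3 XU + \varepsilon$, and the moderation effect is the interaction coefficient $b_3$. The sketch below fits it by ordinary least squares after dropping rows with a missing *X* (complete-case analysis). Complete-case OLS is an illustrative baseline only; it is not one of the NML, NMI, or BE methods compared in the talk, and the function and coefficient names are hypothetical.

```python
import numpy as np


def fit_mmr(x, u, y):
    """OLS fit of the MMR model Y = b0 + b1*X + b2*U + b3*X*U + e.

    Rows whose X is missing (NaN) are dropped first, i.e. a complete-case
    analysis. Returns [b0, b1, b2, b3]; b3 is the moderation effect.
    """
    x, u, y = (np.asarray(v, dtype=float) for v in (x, u, y))
    keep = ~np.isnan(x)
    x, u, y = x[keep], u[keep], y[keep]
    # Design matrix: intercept, main effects, and the X*U interaction term.
    design = np.column_stack([np.ones_like(x), x, u, x * u])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 500
    x = rng.standard_normal(n)
    u = rng.standard_normal(n)
    y = 1.0 + 0.5 * x - 0.3 * u + 0.8 * x * u + rng.standard_normal(n)
    x[rng.random(n) < 0.2] = np.nan  # 20% of X missing completely at random
    print(fit_mmr(x, u, y))
```

Because the interaction term $XU$ is a nonlinear function of *X*, imputing *X* under a joint normal model does not automatically preserve its relationship with *Y*, which is one way to see why missing *X* is harder here than in an additive model.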

**Friday, April 15: Wei Sun, Yahoo Research**

214 Duxbury Hall, 10:00am

Title: Provable Sparse Tensor Decomposition and Its Application to Personalized Recommendation

Abstract: Tensors, as multi-dimensional generalizations of matrices, have received increasing attention in industry due to their success in personalized recommendation systems. Traditional recommendation systems are mainly based on the user-item matrix, whose entries denote each user's preference for a particular item. Incorporating additional information into the analysis, such as the temporal behavior of users, leads to a user-item-time tensor. Existing tensor decomposition methods for personalized recommendation are mostly established in the non-sparse regime, where the decomposition components include all features. For high-dimensional tensor-valued data, many features in the components contain essentially no information about the tensor structure, so there is a strong need for a method that can simultaneously perform tensor decomposition and select informative features.

In this talk, I will discuss a new sparse tensor decomposition method that incorporates the sparsity of each decomposition component into the CP tensor decomposition. Specifically, sparsity is achieved via an efficient truncation procedure that directly enforces an L0 sparsity constraint. In theory, despite the non-convexity of the optimization problem, an alternating updating algorithm is proven to attain an estimator whose rate of convergence significantly improves on those established for non-sparse decomposition methods. As a by-product, our method also applies to a broad family of high-dimensional latent variable models, including high-dimensional Gaussian mixtures and mixtures of sparse regressions. I will show the advantages of our method in two real applications: click-through rate prediction for online advertising and high-dimensional gene clustering.
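The truncation idea can be illustrated with a generic truncated power iteration for a single sparse CP component. This is a minimal pure-Python sketch under assumptions of our own (a 3-way tensor stored as nested lists, a rank-one fit, random Gaussian initialization, and the hypothetical function names `truncate` and `sparse_rank1_cp`); it is not the speaker's algorithm or code, and it carries none of the theoretical guarantees stated in the abstract.

```python
import math
import random


def normalize(vec):
    """Scale vec to unit Euclidean norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate(vec, s):
    """Hard truncation: keep the s largest-magnitude entries, zero the rest, renormalize.

    The result satisfies the L0 constraint ||v||_0 <= s.
    """
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:s])
    return normalize([vec[i] if i in keep else 0.0 for i in range(len(vec))])


def sparse_rank1_cp(T, s, n_iter=50, seed=0):
    """Fit one sparse CP component to a 3-way tensor T (nested lists, shape I x J x K).

    Hypothetical sketch: each factor is updated by contracting T with the other two
    factors, then truncated to s[m] nonzero entries. Returns (w, a, b, c) with
    T approximately equal to w * (a outer b outer c).
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    rng = random.Random(seed)  # assumed: random Gaussian start for b and c
    b = normalize([rng.gauss(0.0, 1.0) for _ in range(J)])
    c = normalize([rng.gauss(0.0, 1.0) for _ in range(K)])
    a = [0.0] * I
    for _ in range(n_iter):
        a = truncate([sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
                      for i in range(I)], s[0])
        b = truncate([sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
                      for j in range(J)], s[1])
        c = truncate([sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
                      for k in range(K)], s[2])
    # Component weight: T contracted with all three unit-norm factors.
    w = sum(T[i][j][k] * a[i] * b[j] * c[k]
            for i in range(I) for j in range(J) for k in range(K))
    return w, a, b, c
```

On an exactly rank-one tensor with sparse factors, the first update already lands on the true supports, and later sweeps only refine the values. Fitting several components would need deflation or joint updates, which this sketch omits.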

**Friday, April 22: Dr. Kun Chen, University of Connecticut**

214 Duxbury Hall, 10:00am

Title: Sequential Estimation in Sparse Factor Regression

Abstract: Large-scale multivariate regression models are increasingly required and formulated in various fields. A sparse singular value decomposition of the regression component matrix is appealing for achieving dimension reduction and facilitating model interpretation. However, recovering such a composition of sparse and low-rank structures remains a challenging problem. By exploring the connections between factor analysis and reduced-rank regression, we formulate the problem as a sparse factor regression and develop an efficient sequential estimation procedure. At each sequential step, a latent factor is constructed as a sparse linear combination of the observed predictors for predicting the responses, after accounting for the effects of the previously found latent factors. Compared with the complicated joint estimation approach, a prominent feature of our sequential method is that each step reduces to a simple regularized unit-rank regression, in which orthogonality among the sparse factors becomes optional rather than necessary. Coordinate descent and Bregman iterative methods are used to ensure fast computation and algorithmic convergence, even in the presence of missing data and when exact orthogonality is desired. Theoretically, we show that the sequential estimators enjoy the oracle properties for recovering the underlying sparse factor structure. The efficacy of the proposed approach is demonstrated by simulation studies and two real applications in genetics.
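A deliberately simplified sketch of the sequential idea follows. Each step fits one unit-rank term `X u v^T` to the current residual by alternating between a lasso update for `u` (with `v` held at unit norm) and a closed-form update for `v`, then subtracts the fitted term before the next step. The objective `(1/2)||Y - X u v^T||_F^2 + lam * ||u||_1`, the plain coordinate-descent lasso, the initialization heuristic, and all function names are assumptions made for illustration. The talk's procedure, including its Bregman iterations, missing-data handling, and optional orthogonality, is not reproduced here.

```python
import math


def soft(x, lam):
    """Soft-thresholding: sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def matvec(A, x):
    """A x for A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def tmatvec(A, x):
    """A^T x for A stored as a list of rows."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def lasso_cd(X, z, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/2)||z - X u||^2 + lam * ||u||_1."""
    n, p = len(X), len(X[0])
    u = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    r = list(z)  # running residual z - X u
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * u[j]
            new = soft(rho, lam) / col_sq[j]
            delta = new - u[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                u[j] = new
    return u


def init_v(X, Y):
    """Hypothetical start: the largest-norm row of X^T Y, scaled to unit length."""
    n, p = len(X), len(X[0])
    rows = [tmatvec(Y, [X[i][j] for i in range(n)]) for j in range(p)]
    best = max(rows, key=lambda row: sum(x * x for x in row))
    norm = math.sqrt(sum(x * x for x in best))
    return [x / norm for x in best] if norm > 0 else None


def unit_rank_fit(X, Y, lam, n_iter=100):
    """Alternate u = lasso(X, Y v, lam) and v = Y^T X u / ||Y^T X u|| (so ||v|| = 1)."""
    u = [0.0] * len(X[0])
    v = init_v(X, Y)
    if v is None:
        return u, [0.0] * len(Y[0])
    for _ in range(n_iter):
        u = lasso_cd(X, matvec(Y, v), lam)
        w = tmatvec(Y, matvec(X, u))  # Y^T X u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # Y^T X u vanished, typically because u shrank to zero
        v = [x / norm for x in w]
    return u, v


def sequential_factors(X, Y, lam, max_rank=5):
    """Fit unit-rank terms one at a time, deflating the response after each step."""
    R = [row[:] for row in Y]
    factors = []
    for _ in range(max_rank):
        u, v = unit_rank_fit(X, R, lam)
        if not any(u):
            break  # no remaining signal above the penalty level
        Xu = matvec(X, u)
        R = [[R[i][k] - Xu[i] * v[k] for k in range(len(v))] for i in range(len(R))]
        factors.append((u, v))
    return factors
```

With an identity design and two unit-rank terms on disjoint supports, the first step recovers the stronger term (its loadings shrunk by `lam`) and the second step recovers the weaker one from the deflated residual. No orthogonality constraint is imposed between steps, which loosely mirrors the abstract's point that orthogonality becomes optional.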