## Fall 2017 Colloquia

Friday, September 22nd: Dr. Jonathan Bradley, Florida State University

214 Duxbury Hall, 10:00am

Title: Hierarchical Models with Conditionally Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family

Abstract: We introduce a Bayesian approach for analyzing (possibly) high-dimensional dependent data that are distributed according to a member of the natural exponential family of distributions. This problem requires extensive methodological advancements, as jointly modeling high-dimensional dependent data leads to the so-called "big n problem." The computational complexity of the "big n problem" is further exacerbated when allowing for non-Gaussian data models, as is the case here. Thus, we develop new computationally efficient distribution theory for this setting. In particular, we introduce something we call the "conjugate multivariate distribution," which is motivated by the univariate distribution introduced in Diaconis and Ylvisaker (1979). Furthermore, we provide substantial theoretical and methodological development including: results regarding conditional distributions, an asymptotic relationship with the multivariate normal distribution, conjugate prior distributions, and full-conditional distributions for a Gibbs sampler. We demonstrate the proposed methodology through simulated examples and several real-data analyses.

Friday, September 29th: Dr. Annie (Peiyong) Qu, University of Illinois at Urbana-Champaign

214 Duxbury Hall, 10:00am

Title: Individualized Multi-directional Variable Selection

Abstract: In this talk, we propose an individualized variable selection approach to select different relevant variables for different individuals. In contrast to conventional model selection approaches, the key component of the new approach is to construct a separation penalty with multi-directional shrinkages including zero, which facilitates individualized modeling to distinguish strong signals from noisy ones. As a byproduct, the proposed model identifies subgroups among which individuals share similar effects, and thus improves estimation efficiency and personalized prediction accuracy. Another advantage of the proposed model is that it can incorporate within-subject correlation for longitudinal data. We provide a general theoretical foundation under a double-divergence modeling framework where the number of subjects and the number of repeated measurements both go to infinity, and therefore involves high dimensional individual parameters. In addition, we present the oracle property for the proposed estimator to ensure its optimal large sample property. Simulation studies and an application to HIV longitudinal data are illustrated to compare the new approach to existing penalization methods. This is joint work with Xiwei Tang.

Friday, October 6th: Dr. Hira Koul, Michigan State University

214 Duxbury Hall, 10:00am

Title: Minimum Distance Model Checking in Berkson Measurement Error Models with Validation Data

Abstract: We shall present some tests for fitting a parametric regression model in the presence of Berkson measurement error in the covariates without specifying the measurement error density but when validation data is available. The availability of validation data makes it possible to estimate the calibrated regression function nonparametrically. The proposed class of tests is based on a class of minimized integrated square distances between a nonparametric estimate of the calibrated regression function and the parametric null model being fitted. The asymptotic normality results of these tests under the null hypothesis and of the corresponding minimum distance (m.d.) estimators of the null model parameters will be presented. Surprisingly, asymptotic null distributions of these test statistics are the same as in the case of known measurement error density, while those of the m.d. estimators are affected by the estimation of the calibrated regression function. A simulation study shows desirable performance of a member of the proposed class of estimators and tests.

Friday, October 13th: Dr. Xiaohui Chen, University of Illinois at Urbana-Champaign

214 Duxbury Hall, 10:00am

Title: Gaussian and bootstrap approximations of high-dimensional U-statistics and their applications

Abstract: We shall first discuss the Gaussian approximation of high-dimensional and non-degenerate U-statistics of order two under the supremum norm. A two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution is proposed. Subject to mild moment conditions on the kernel, we establish the explicit rate of convergence that decays polynomially in sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective for the empirical, the randomly reweighted, and the multiplier bootstraps as randomly reweighted quadratic forms, all asymptotically valid and inferentially first-order equivalent in high-dimensions.

The bootstrap methods are applied to statistical problems for high-dimensional non-Gaussian data, including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that the Gaussian-like convergence rates can be achieved for heavy-tailed data, which are less conservative than those obtained by the Bonferroni technique that ignores the dependency in the underlying data distribution. In addition, we show that even for subgaussian distributions, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.

Friday, October 20th: Dr. Yun Yang, Florida State University

214 Duxbury Hall, 10:00am

Title: Fast and Optimal Bayesian Inference via Variational Approximations

Abstract: We propose a variational approximation to Bayesian posterior distributions, called $\alpha$-VB, with provable statistical guarantees for models with and without latent variables. The standard variational approximation is a special case of $\alpha$-VB with $\alpha=1$. When $\alpha \in(0,1)$, a novel class of variational inequalities is developed for linking the Bayes risk under the variational approximation to the objective function in the variational optimization problem, implying that maximizing the evidence lower bound in variational inference has the effect of minimizing the Bayes risk within the variational density family. Operating in a frequentist setup, the variational inequalities imply that point estimates constructed from the $\alpha$-VB procedure converge at an optimal rate to the true parameter in a wide range of problems. We illustrate our general theory with a number of examples, including the mean-field variational approximation to (low)-high-dimensional Bayesian linear regression with spike and slab priors, mixture of Gaussian models, latent Dirichlet allocation, and (mixture of) Gaussian variational approximation in regular parametric models.

Friday, October 27th: Dr. Subhashis Ghosal, North Carolina State University

214 Duxbury Hall, 10:00am

Title: Coverage of Credible Bands for Nonparametric Regression Function and Derivatives

Abstract: Estimating derivatives of a multivariate regression function is an interesting example of an inverse problem but has not received much attention in the Bayesian literature. In this talk, we study coverage of Bayesian credible sets for the problem, with primary interest in uniform credible bands. A finite random series of B-splines prior is especially suitable for the purpose due to the availability of explicit posterior expressions and nice structure in the derivatives of B-splines. We develop useful bounds to show that slightly inflated credible sets for the regression function and its derivatives have high coverage in the frequentist sense, and hence a Bayesian’s quantification of uncertainty has frequentist justification. The results will also be used to construct credible sets for the regression mode with guaranteed frequentist coverage. The talk is based on joint work with William Weimin Yoo.

Friday, November 3rd: Dr. Kshitij Khare, University of Florida

214 Duxbury Hall, 10:00am

Title: Bayesian Inference for Gaussian Graphical Models Beyond Decomposable Graphs

Abstract: Bayesian inference for graphical models has received much attention in the literature in recent years. It is well known that when the graph G is decomposable, Bayesian inference is significantly more tractable than in the general non-decomposable setting. Penalized likelihood inference, on the other hand, has made tremendous gains in the past few years in terms of scalability and tractability. Bayesian inference, however, has not had the same level of success, though a scalable Bayesian approach has its respective strengths, especially in terms of quantifying uncertainty. To address this gap, we propose a scalable and flexible novel Bayesian approach for estimation and model selection in Gaussian undirected graphical models. We first develop a class of generalized G-Wishart distributions with multiple shape parameters for an arbitrary underlying graph. This class contains the G-Wishart distribution as a special case. We then introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm to obtain posterior draws from generalized G-Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains the class of decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. We proceed to derive theoretical properties of the proposed Gibbs sampler. We then demonstrate that the proposed Gibbs sampler is scalable to significantly higher dimensional problems as compared to using an accept-reject or a Metropolis-Hastings algorithm. Finally, we show the efficacy of the proposed approach on simulated and real data.

Friday, November 17th: Dr. Jayaram Sethuraman, Florida State University

214 Duxbury Hall, 10:00am

Title: The Origins of the Stick Breaking Construction of Dirichlet Priors

Abstract: “My 1994 paper gave a simple direct proof of the constructive definition of Dirichlet priors (Ferguson 1973) and did not dwell on how I got the idea for that construction. In this talk I will first describe the collection of all priors in the nonparametric problem and show how this description leads to the constructive definition, nowadays called the stick breaking construction. This also leads to the invariance under size biased property (ISBP) of the GEM distribution (the stick breaking part) which gives a simpler proof than in the 1994 paper for the posterior distribution of Dirichlet priors. All the ideas of this talk emanate from deeper understanding of the Blackwell and MacQueen paper in 1973.”
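For readers unfamiliar with the stick-breaking construction named in this abstract, the following is a minimal sketch and is not taken from the talk. It uses the standard textbook form: break fractions $V_k \sim \mathrm{Beta}(1, \alpha)$, weights $p_k = V_k \prod_{j<k} (1 - V_j)$, and atoms drawn independently from a base distribution. The function names, the truncation at a fixed number of sticks, and the standard normal base distribution are all assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Return the first `num_sticks` stick-breaking weights and the leftover stick length."""
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights, remaining


def truncated_dirichlet_draw(alpha, base_sampler, num_sticks, rng):
    """Approximate one draw from a Dirichlet prior by truncating the construction.

    `base_sampler` is a hypothetical callable that draws one atom from the
    base distribution using the supplied random number generator.
    """
    weights, leftover = stick_breaking_weights(alpha, num_sticks, rng)
    atoms = [base_sampler(rng) for _ in range(num_sticks)]
    return atoms, weights, leftover


# Illustrative settings only: alpha = 2, standard normal base, 50 sticks.
rng = random.Random(0)
atoms, weights, leftover = truncated_dirichlet_draw(
    2.0, lambda r: r.gauss(0.0, 1.0), 50, rng
)
```

The weights and the leftover stick length sum to one up to floating-point error, so `leftover` indicates how much probability mass the truncation discards.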

Friday, December 1st: Dr. Gen Li, Columbia University

214 Duxbury Hall, 10:00am

Title: A General Framework for the Association Analysis of Heterogeneous Data

Abstract: Multivariate association analysis is of primary interest in many applications. Despite the prevalence of high-dimensional and non-Gaussian data (such as count-valued or binary), most existing methods only apply to low-dimensional datasets with continuous measurements. We develop a new framework for the association analysis of two sets of high-dimensional and heterogeneous (continuous/binary/count) data. We model heterogeneous random variables using exponential family distributions, and exploit a structured decomposition of the underlying natural parameter matrices to identify shared and individual patterns for two datasets. We also introduce a new measure of the strength of association, and a permutation-based procedure to test its significance. An alternating iteratively reweighted least squares algorithm is devised for model fitting, and several variants are developed to expedite computation and achieve variable selection. The application to the Computer Audition Lab 500-song (CAL500) music annotation study sheds light on the relationship between acoustic features and semantic annotations, and provides an effective means for automatic annotation and music retrieval.

Friday, December 8th: Dr. Benjamin Alamar, ESPN

214 Duxbury Hall, 10:00am

Title: Steve Nash is My Nemesis

Abstract: NBA teams have a vested interest in utilizing predictive models to better understand the potential of NBA prospects. For most players, the only data available to base those projections on are the player’s performance in college. While this data can help reduce the risk around players, certain players (e.g., Steve Nash) significantly outperform any projection model. Leveraging research on expert performance to create novel measurements of performance may provide the key to significantly improving the performance of these predictive models.