Fall 2021 Colloquia
All colloquia this semester will be held virtually via Zoom and are scheduled for 12:00pm on Fridays, unless otherwise noted. Faculty and student meetings with each speaker will take place from 1:30pm to 2:30pm.
- 10/01/21: Raquel Prado (University of California at Santa Cruz)
- 10/08/21: Daniel Sheldon (University of Massachusetts at Amherst)
- 10/15/21: Shyamal Peddada (National Institute of Child Health and Human Development)
- 10/22/21: Efstathia Bura (Vienna University of Technology)
- 10/29/21: Georgia Papadogeorgou (University of Florida)
- 11/05/21: Emma Jingfei Zhang (University of Miami)
- 11/12/21: Chenlei Leng (University of Warwick)
- 11/19/21: David Dahl (Brigham Young University)
- 12/03/21: David Rossell (Barcelona Graduate School of Economics)
Title: Non-Asymptotic Aspects of Sampling From Heavy-Tailed Distributions via Transformed Langevin Monte Carlo
Abstract: Langevin Monte Carlo (LMC) algorithms and their stochastic versions are widely used for sampling and large-scale Bayesian inference. Non-asymptotic properties of the LMC algorithm have been examined intensely over the last decade. However, existing analyses are restricted to the case of light-tailed (though possibly multi-modal) densities. In this talk, I will first present a variable-transformation-based approach for sampling from heavy-tailed densities using the LMC algorithm. This algorithm is motivated by a related approach for the Metropolis random-walk algorithm by Johnson and Geyer (2013). I will next present a non-asymptotic oracle complexity analysis of the proposed algorithm with illustrative examples. It will be shown that the proposed approach 'works' as long as the heavy-tailed target density satisfies certain tail conditions closely related to the so-called weak Poincaré inequality.
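The transformation idea can be sketched in a toy setting. The snippet below is a minimal illustration, not the speaker's algorithm: to sample a standard Cauchy target, which is too heavy-tailed for plain unadjusted Langevin, we substitute x = sinh(y), run the Langevin recursion on the transformed density (which has exponential tails), and map the samples back. The choice of transformation, step size, and chain length are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
eta, n_steps, burn = 0.1, 60_000, 10_000

# Target: standard Cauchy, pi(x) ∝ 1/(1 + x^2), whose tails are too heavy
# for plain unadjusted Langevin.  With the substitution x = sinh(y), the
# transformed density is pi_Y(y) ∝ cosh(y) / (1 + sinh(y)^2) = 1/cosh(y),
# which has exponential tails, and grad log pi_Y(y) = -tanh(y) is bounded.
y = 0.0
ys = np.empty(n_steps)
for k in range(n_steps):
    # Unadjusted Langevin step on the light-tailed transformed density.
    y += eta * (-np.tanh(y)) + np.sqrt(2 * eta) * rng.standard_normal()
    ys[k] = y

# Map back through the transformation: approximate Cauchy draws.
xs = np.sinh(ys[burn:])
```

Running the same recursion directly on the Cauchy density instead would use the gradient -2x/(1 + x^2), which vanishes in the tails and leaves the chain unable to return from large excursions; the transformation is what restores geometric-style drift.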
Title: Stick-Breaking Non-Parametric Priors via Dependent Length Variables
Abstract: In this talk, we present new classes of Bayesian nonparametric prior distributions. By allowing the length random variables in stick-breaking constructions to be exchangeable or Markovian, appealing models for discrete random probability measures arise. Tuning the stochastic dependence among these length variables allows one to recover extreme families of random probability measures, namely the Dirichlet and Geometric processes. As a byproduct, the ordering of the weights in the species sampling representation can be controlled and thus tuned for efficient MCMC implementations in density estimation or unsupervised classification problems. Various theoretical properties and illustrations will be presented.
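For readers unfamiliar with stick-breaking, the two extreme cases mentioned above can be sketched in a few lines: i.i.d. Beta(1, α) length variables yield Dirichlet-process weights, while a single fully dependent (constant) length variable yields geometric weights. The truncation level K and the parameter values are arbitrary choices for the illustration.

```python
import numpy as np

def stick_breaking(lengths):
    """Turn length variables v_k in (0,1) into stick-breaking weights
    w_k = v_k * prod_{j<k} (1 - v_j)."""
    v = np.asarray(lengths, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
K, alpha = 25, 2.0

# Extreme 1: i.i.d. Beta(1, alpha) lengths -> Dirichlet process weights
# (weights need not be ordered).
w_dp = stick_breaking(rng.beta(1.0, alpha, size=K))

# Extreme 2: one shared length v for every break -> geometric weights
# w_k = v * (1 - v)^(k-1), which are decreasing by construction.
v = 0.3
w_geo = stick_breaking(np.full(K, v))
```

The Geometric-process extreme makes the weight ordering deterministic, which is one concrete sense in which tuning the dependence of the length variables controls the ordering of the weights.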
Title: Orthogonal Subsampling for Big Data Linear Regression
Abstract: The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, which extracts useful information from datasets, is a crucial step in big data analysis. We propose an orthogonal subsampling (OSS) approach for big data with a focus on linear regression models. The approach is inspired by the fact that a two-level orthogonal array provides the best experimental design for linear regression models in the sense that it minimizes the average variance of the estimated parameters and provides the best predictions. The merits of OSS are three-fold: (i) it is easy to implement and fast; (ii) it is suitable for distributed parallel computing and ensures that subsamples selected in different batches share no common data points; and (iii) it outperforms existing methods in minimizing the mean squared errors of the estimated parameters and maximizing the efficiencies of the selected subsamples. Theoretical results and extensive numerical studies show that the OSS approach is superior to existing subsampling approaches. It is also more robust to the presence of interactions among covariates and, when interactions do exist, provides more precise estimates of the interaction effects than existing methods. The advantages of OSS are also illustrated through the analysis of real data.
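The abstract does not spell out the OSS algorithm itself, but the guiding intuition (preferring subsample points that resemble the corners of a two-level orthogonal array, i.e. extreme in every coordinate and balanced in sign) can be illustrated with a deliberately simplified selection rule. The scoring criterion below is a hypothetical stand-in for the published method, shown only to make the design-of-experiments connection concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 5000, 4, 64            # full data size, covariates, subsample size
X = rng.standard_normal((n, p))

# Hypothetical score: a point resembles a corner of a two-level design when
# it is far from the center in *every* coordinate.
score = np.min(np.abs(X), axis=1)

# Balance the sign patterns: group candidates by their sign vector and take
# the highest-scoring points round-robin across the 2^p groups.
patterns = {}
for i in np.argsort(-score):
    key = tuple(np.sign(X[i]) >= 0)
    patterns.setdefault(key, []).append(i)

selected = []
while len(selected) < k:
    for key in list(patterns):
        if patterns[key] and len(selected) < k:
            selected.append(patterns[key].pop(0))
selected = np.array(selected)

def avg_variance(idx):
    """Average-variance proxy for OLS on the subsample: trace((X'X)^{-1})."""
    Xs = X[idx]
    return np.trace(np.linalg.inv(Xs.T @ Xs))

uniform = rng.choice(n, size=k, replace=False)
print(avg_variance(selected), avg_variance(uniform))
```

On this synthetic data the corner-seeking, sign-balanced subsample yields a noticeably smaller trace of (X'X)^{-1} than a uniform subsample of the same size, mirroring the average-variance motivation quoted in the abstract.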