Fall 2015 Colloquia
Friday, August 28: Dr. Hani Doss, University of Florida
214 Duxbury Hall, 10:00am
Title: A Markov Chain Monte Carlo Approach to Empirical Bayes Inference and Bayesian Sensitivity Analysis
Abstract: We consider situations in Bayesian analysis where the prior on the parameter theta is indexed by a continuous hyperparameter h, and we deal with two related problems. The first problem is as follows. Let m(h) be the marginal likelihood of the data (the likelihood of the data with theta integrated out). The problem is to construct a confidence interval (or region, if h is multidimensional) for argmax_h m(h), the value of h for which the marginal likelihood of the data is largest; this value is, by definition, the empirical Bayes choice of h. If for each h we have an estimate \hat{m}(h) of m(h), then we may estimate argmax_h m(h) by argmax_h \hat{m}(h). The second problem is as follows. Suppose we fix a function f of theta, and let I(h) be the posterior expectation of f(theta) when the hyperparameter of the prior is h. The problem is to construct confidence bands for I(h). The first problem is in some sense a model selection problem, and the second is a form of Bayesian sensitivity analysis. The two problems are closely related, in that to solve either of them we need uniformity (in h) of the convergence of the estimates. We show how tools from various parts of probability theory can be used to deal with these two problems. The methodology we develop applies very generally, and we show how it applies in particular to a commonly used model for Bayesian variable selection in linear regression. The hyperparameters governing the prior in this model have a large effect on subsequent inference, including variable selection, and we show how our methodology can be used to make inference about these hyperparameters, with an illustration on a real data set. This is joint work with Yeonhee Park of the MD Anderson Cancer Center.
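To make the estimation step above concrete, here is a minimal Python sketch (names and grid are ours, not the speaker's code). It uses the standard identity that, for MCMC draws theta_1, ..., theta_N from the posterior at a fixed reference hyperparameter h1, the average of the prior ratios pi_h(theta_i)/pi_{h1}(theta_i) estimates m(h)/m(h1); a single chain thus yields the whole marginal-likelihood curve up to a constant, which is enough to locate argmax_h m(h):

    import numpy as np

    # theta_samples: MCMC draws from the posterior under the reference value h1;
    # prior_pdf(theta_samples, h) evaluates the prior density pi_h at each draw.
    def estimate_marginal_curve(theta_samples, prior_pdf, h1, h_grid):
        denom = prior_pdf(theta_samples, h1)
        # B(h) = m(h)/m(h1) = E[ pi_h(theta)/pi_h1(theta) | data, h1 ]
        return np.array([np.mean(prior_pdf(theta_samples, h) / denom)
                         for h in h_grid])

    # The empirical Bayes choice of h maximizes the estimated curve:
    # h_hat = h_grid[np.argmax(estimate_marginal_curve(draws, prior_pdf, h1, h_grid))]

Confidence regions for argmax_h m(h) then rest on the uniform-in-h convergence of these estimates, which is the subject of the talk.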
Friday, September 4: Dr. P. Richard Hahn, The University of Chicago Booth School of Business
214 Duxbury Hall, 10:00am
Title: Penalized Utility Bayes Estimators
Abstract: In this talk, I describe how utility functions with an explicit parsimony term can be used to extract accessible summaries from high-dimensional posterior distributions. I focus on the case of variable selection in linear and nonlinear regression models, and compare and contrast the utility approach with similar Bayesian (projection prediction) and non-Bayesian (preconditioning) approaches. As an empirical demonstration, I analyze historical returns data on exchange-traded funds (ETFs) with an eye towards selecting a small subset of inexpensive funds for implementing a passive investment strategy.
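One way to picture the parsimony-penalized summary idea is the following sketch, in the spirit of the talk but not its exact estimator (all names are ours): given a posterior mean coefficient vector beta_bar from any Bayesian regression fit, search for sparse coefficient vectors whose fitted values stay close to the full posterior-mean fit, trading predictive utility against the number of variables retained:

    import numpy as np
    from sklearn.linear_model import lasso_path

    def sparse_posterior_summary(X, beta_bar, n_lambdas=50):
        y_fit = X @ beta_bar                      # posterior-mean fitted values
        # the l1 path indexes summaries from dense (small penalty) to sparse
        alphas, coefs, _ = lasso_path(X, y_fit, n_alphas=n_lambdas)
        return alphas, coefs                      # one candidate summary per penalty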
Friday, September 18: Dr. Hongtu Zhu, University of North Carolina
214 Duxbury Hall, 10:00am
Title: Big Data Integration in Biomedical Studies
Abstract: With the rapid growth of modern technology, many large-scale biomedical studies have collected massive datasets with large volumes of complex information (e.g., imaging, genetic, or clinical) from increasingly large cohorts, while high-dimensional missing data are frequently encountered at various stages of the data collection process. Simultaneously extracting and integrating rich and diverse heterogeneous information from such big data in the presence of high-dimensional missing data is critical for making major advances in the diagnosis, prevention, and treatment of numerous complex disorders (e.g., Alzheimer's disease). However, such extraction and integration represent major computational and theoretical challenges for existing statistical methods. In this talk, we review three pressing challenges faced by researchers in the analysis of big data: (CH1) carrying out genome-wide single-nucleotide polymorphism (SNP)/marker set analysis for multivariate imaging phenotypes; (CH2) carrying out voxel-wise genome-wide SNP/marker set analysis for functional imaging phenotypes; and (CH3) integrating imaging, genetic, and clinical data, both at baseline and longitudinally, to predict time-to-event outcomes (e.g., time to disease onset).
Friday, September 25: Dr. Adrian Barbu, Florida State University
214 Duxbury Hall, 10:00am
Title: Feature Selection with Annealing for Computer Vision and Big Data Learning
Abstract: Many computer vision and medical imaging problems involve learning from large-scale datasets, with millions of observations and features. In this work we propose a novel, efficient learning scheme that tightens a sparsity constraint by gradually removing variables based on a criterion and a schedule. The attractive fact that the problem size keeps dropping throughout the iterations makes it particularly suitable for big data learning. Our approach applies generically to the optimization of any differentiable loss function, and finds applications in regression, classification, and ranking. The resulting algorithms build variable screening into estimation and are extremely simple to implement. In joint work with Prof. Yiyuan She, we provide theoretical guarantees of convergence and selection consistency. In addition, one-dimensional piecewise linear response functions are used to account for nonlinearity, and a second-order prior is imposed on these functions to avoid overfitting. Experiments on real and synthetic data show that the proposed method compares very well with other state-of-the-art methods in regression, classification, and ranking while being computationally very efficient and scalable.
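A minimal sketch of the remove-on-a-schedule idea for squared loss follows (the talk's criterion and annealing schedule may differ; names and schedule are ours): take gradient steps on the loss, and after each step keep only the largest-magnitude coefficients, with the number kept shrinking toward the target sparsity k:

    import numpy as np

    def fsa_sketch(X, y, k, n_iter=200, lr=1e-3):
        n, p = X.shape
        w = np.zeros(p)
        keep = np.arange(p)
        for e in range(n_iter):
            grad = X[:, keep].T @ (X[:, keep] @ w[keep] - y) / n
            w[keep] -= lr * grad
            # annealing schedule: number of retained features shrinks toward k
            m = int(k + (p - k) * max(0.0, 1.0 - e / (0.5 * n_iter)))
            top = keep[np.argsort(-np.abs(w[keep]))[:m]]
            w[np.setdiff1d(keep, top)] = 0.0
            keep = np.sort(top)
        return w  # sparse weight vector with at most k nonzeros

Because the working problem size drops with every iteration, later steps are much cheaper than the first ones, which is what makes the scheme attractive for big data.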
Friday, October 2: Dr. Brian Reich, North Carolina State University
214 Duxbury Hall, 10:00am
Title: Bayesian Spatial Variable Selection
Abstract: Multisite time series studies have reported evidence of an association between short-term exposure to particulate matter (PM) and adverse health effects, but the effect size varies across the United States. Variability in the effect may be due partly to differing community-level exposure and health characteristics, but also to the chemical composition of PM, which is known to vary greatly by location and time. The objective of this paper is to identify particularly harmful components of this chemical mixture. Because of the large number of highly correlated components, we must incorporate some regularization into the statistical model. We assume that, at each spatial location, the regression coefficients come from a mixture model with the flavor of stochastic search variable selection, and we use a copula to share information about variable inclusion and effect magnitude across locations. The model differs from current spatial variable selection techniques by accommodating both local and global variable selection. The model is used to study the association between fine PM (PM < 2.5 µm) components, measured at 115 counties nationally over the period 2000-2008, and cardiovascular emergency room admissions among Medicare patients.
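To fix ideas on the stochastic-search flavor at a single location (the copula coupling across locations is the paper's contribution and is omitted here; this hypothetical sketch is ours): under a spike-and-slab mixture prior, the conditional probability that a variable is "in" compares a narrow spike near zero against a wide slab:

    import numpy as np
    from scipy.stats import norm

    # Conditional inclusion probability for one coefficient: the spike
    # component corresponds to an effectively excluded variable, the slab
    # to an included one; pi0 is the prior inclusion probability.
    def inclusion_prob(beta_j, pi0=0.5, sd_slab=1.0, sd_spike=0.01):
        slab = pi0 * norm.pdf(beta_j, scale=sd_slab)
        spike = (1.0 - pi0) * norm.pdf(beta_j, scale=sd_spike)
        return slab / (slab + spike)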
Friday, October 9: Dr. Alec Kercheval, Florida State University Department of Mathematics
214 Duxbury Hall, 10:00am
Title: Multidimensional default risk for assets with jumps - a structural framework
Abstract: Asset default times have traditionally been modeled as the time when the asset value crosses below a threshold. Basket credit derivatives require understanding the joint distribution of crossing times for several assets, which becomes more difficult when the asset prices may have jumps. If, instead of a price threshold, we use a log-returns threshold, it turns out we can get explicit formulas for default probabilities even in the multidimensional case. A key idea is the description of jump dependence via the concept of Lévy copulas, due to Tankov.
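In symbols (notation ours; a sketch of one version of the log-returns threshold construction, not necessarily the talk's exact setup): if X^i is the log-price of asset i, modeled as a Lévy process, and a_i < 0 is a fixed log-return threshold, then

    \tau^i = \inf\{\, t > 0 : \Delta X^i_t \le a_i \,\}, \qquad
    P(\tau^i > t) = \exp\!\bigl( -t\, \nu_i\bigl( (-\infty, a_i] \bigr) \bigr),

where \Delta X^i_t = X^i_t - X^i_{t^-} is the jump at time t and \nu_i is the Lévy measure of X^i. Jumps of size at most a_i arrive as a Poisson process with rate \nu_i((-\infty, a_i]), which gives the explicit exponential survival formula; a Lévy copula then ties the measures \nu_i together, yielding joint default probabilities in the multidimensional case.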
Friday, October 16: Dr. Stephan Huckemann, Institute for Mathematical Stochastics, University of Göttingen
214 Duxbury Hall, 10:00am
Title: Statistical Analysis of Cytoskeleta: From Circular Scale Space Theory to Deconvolution with Sparse Asymptotics for In-Vivo Nano-Imaging of Biological Cells
Abstract: We are concerned with two structures of the cytoskeleton of a biological cell: actin-myosin filaments at a larger scale, and microtubules at a very small scale. While the first structure is deemed highly important in early stem-cell differentiation, the second plays an important role in cell division and cancer research. A prerequisite for successful statistical evaluation is imaging techniques that reliably reproduce ground-truth data. For the larger-scale structure, such a tool chain is presented, along with a statistical evaluation procedure based on a newly developed circular scale space theory and specific persistence diagrams. For the small-scale structure, enhanced nano-imaging techniques with visible light below Abbe's diffraction limit require sophisticated deconvolution methods with novel sparse asymptotics.
Friday, October 23: Dr. Weikuan Yu, Florida State University Department of Computer Science
214 Duxbury Hall, 10:00am
Title: Case Studies of Fast Analytics on Social Networks and Bioinformatics
Abstract: The explosive growth of big data has prompted many organizations to design and develop complex systems to meet their data processing needs. This talk will cover two recent case studies on scalable data analytics. I will first present the design and evaluation of a novel parallel community detection algorithm. Our algorithm adopts the greedy policy of the state-of-the-art Louvain modularity-maximization method. It combines a highly optimized graph mapping and data representation with an efficient communication runtime specifically designed to run large graph applications on scalable supercomputers. With the good convergence properties of the algorithm and the efficient implementation, we can analyze very large graphs in just a few seconds. Then I will briefly discuss our effort in developing BioPig, a scalable bioinformatics toolkit for large-scale sequencing data analysis. BioPig is built on Hadoop and Pig, which enable easy parallel programming and scaling to datasets of terabytes. We show that our work on optimizing BioPig has led to significant improvements in the execution time of bioinformatics tools such as k-mer counting.
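As a toy, serial illustration of the k-mer counting kernel mentioned above (this tiny version is ours; BioPig distributes the same emit-then-sum computation over Hadoop via Pig):

    from collections import Counter

    # Count every length-k substring across a collection of sequencing reads.
    def count_kmers(reads, k):
        counts = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                counts[read[i:i + k]] += 1
        return counts

    # count_kmers(["ACGTACGT"], 3) -> {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}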
Friday, October 30: Dr. Betsy Becker, Florida State University Department of Education
214 Duxbury Hall, 10:00am
Title: Dependent slopes in meta-analysis
Abstract: Early work on meta-analysis involved summaries of simple indices such as correlations and mean differences. The meta-analysis of complex studies has become increasingly important as the primary studies we wish to summarize have become more sophisticated and multivariate. One issue that arises in all areas of meta-analysis is dependence, because researchers often present results for several outcomes, multiple models, or several time points.
Syntheses of regression models have dealt with this dependence in a variety of ways. In this work we review those approaches and discuss why some common ones are problematic, examining the degree of dependence among slopes using large-sample theory and graphical techniques.
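One standard way to account for this dependence when pooling slopes is multivariate meta-analysis via generalized least squares, where the full covariance matrix of the stacked slope estimates, not just its diagonal, enters the weights. A minimal sketch (ours, assuming a common underlying slope and a known covariance matrix V):

    import numpy as np

    def gls_combine(b, V):
        # b: vector of slope estimates; V: their full covariance matrix,
        # whose off-diagonal entries carry the dependence among slopes.
        ones = np.ones_like(b)
        Vinv = np.linalg.inv(V)
        var = 1.0 / (ones @ Vinv @ ones)   # variance of the pooled slope
        est = var * (ones @ Vinv @ b)      # GLS-weighted pooled slope
        return est, var

Ignoring the off-diagonal entries of V (treating dependent slopes as independent) is one of the problematic shortcuts the talk examines.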
Friday, November 6: Dr. Jing Lei, Carnegie Mellon University
214 Duxbury Hall, 10:00am
Title: A Framework for Distribution-Free Regression
Abstract: Standard theory in regression analysis is plagued by the fragility of stringent model assumptions in the high-dimensional setting, and the resulting inference often fails to take the modeling error into account. I will introduce a new inference framework for regression analysis that combines nonparametric rank and order statistics with recent advances in online learning theory. The proposed method is a generic tool that converts any point estimator into an interval predictor, producing prediction bands with valid average coverage under essentially no assumptions, while retaining the optimality of the initial point estimator under standard assumptions. The generality and flexibility of this framework will be illustrated through several topics in regression analysis, including in-sample prediction, variable selection, and prediction bands with adaptive local width. This talk is based on joint work with Larry Wasserman, Ryan Tibshirani, Alessandro Rinaldo, and Max G'Sell.
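A minimal split-sample sketch of the "any point estimator to interval predictor" conversion (a sketch of the general idea, not the speaker's exact procedure; fit is any user-supplied training routine returning an object with a predict method): fit on one half of the data, then use the held-out absolute residuals to calibrate a band whose average coverage is at least 1 - alpha for exchangeable data:

    import numpy as np

    def split_conformal(fit, X, y, X_new, alpha=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n = len(y)
        idx = rng.permutation(n)
        train, cal = idx[: n // 2], idx[n // 2 :]
        model = fit(X[train], y[train])               # any point estimator
        resid = np.abs(y[cal] - model.predict(X[cal]))
        level = min(1.0, np.ceil((1 - alpha) * (len(cal) + 1)) / len(cal))
        q = np.quantile(resid, level)                 # calibrated half-width
        mu = model.predict(X_new)
        return mu - q, mu + q                         # constant-width band

This basic version has constant width; the adaptive local width mentioned in the abstract refines the calibration step.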
Friday, November 13: Dr. Yanyuan Ma, University of South Carolina
214 Duxbury Hall, 10:00am
Title: A Semiparametric Approach to Dimension Reduction
Abstract: We provide a novel approach to dimension reduction problems, completely different from those in the existing literature. We cast the dimension reduction problem in a semiparametric estimation framework and derive estimating equations. Viewing the problem from this new angle allows us to derive a rich class of estimators and to obtain the classical dimension reduction techniques as special cases of this class. The semiparametric approach also reveals that, in the inverse regression context, the common assumptions of linearity and/or constant variance on the covariates can be removed while keeping the estimation structure intact, at the cost of performing additional nonparametric regression.
Friday, November 20: Dr. Adrian Barbu, Florida State University Department of Statistics
214 Duxbury Hall, 10:00am
Title: Face Detection with a 3D Model
Abstract: This talk presents a part-based face detection approach in which the spatial relationship between the face parts is represented by a hidden 3D model with six parameters. The computational complexity of the search in the six-dimensional pose space is addressed by proposing meaningful 3D pose candidates via image-based regression from detected face keypoint locations. The 3D pose candidates are evaluated using a parameter-sensitive classifier based on difference features relative to the 3D pose. A compatible subset of candidates is then obtained by non-maximal suppression. Experiments on two standard face detection datasets show that the proposed 3D-model-based approach obtains results comparable to or better than the state of the art.
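For concreteness, here is a generic sketch of the final non-maximal suppression step (using the common bounding-box overlap criterion as a stand-in for the talk's compatibility test between pose candidates; names are ours):

    import numpy as np

    def iou(a, b):
        # boxes as (x1, y1, x2, y2)
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
        return inter / union if union > 0 else 0.0

    def nms(boxes, scores, iou_thresh=0.3):
        order = np.argsort(-np.asarray(scores))
        keep = []
        while len(order) > 0:
            i = order[0]
            keep.append(int(i))
            if len(order) == 1:
                break
            rest = order[1:]
            mask = np.array([iou(boxes[i], boxes[j]) < iou_thresh for j in rest])
            order = rest[mask]
        return keep  # indices of retained, mutually compatible candidates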