Department of Statistics
 Mu Sigma Rho National Statistics Honor Society - FSU Chapter

 Colloquium Series

 Colloquia

February 5, 2014, 11:14 am - Senthilbalaji Girimurugan
December 3, 2013, 2:00 pm - Derek Tucker, FSU Statistics Dept, Essay Defense
November 22, 2013, 10:00 am - Dr. Washington Mio, FSU Dept. of Mathematics
November 15, 2013, 9:00 am - International Year of Statistics Conference at FSU
November 12, 2013, 3:30 pm - Mingfei Qiu, FSU Statistics, Essay Defense
November 8, 2013, 10:00 am - Dr. Todd Ogden, Columbia University, Dept. of Biostatistics
November 5, 2013, 2:00 pm - Darshan Bryner, FSU Dept. of Statistics, Dissertation Defense
November 1, 2013, 4:00 pm - Jose Laborde, FSU Dept. of Statistics, Dissertation Defense
November 1, 2013, 10:00 am - Dr. Yiyuan She, FSU Dept. of Statistics
October 25, 2013, 10:00 am - Dr. Stephen Walker, University of Kent and University of Texas at Austin
October 18, 2013, 10:00 am - Dr. Steve Marron, UNC Chapel Hill
October 17, 2013, 2:30 pm - Qian Xie, FSU Statistics, Essay Defense
October 11, 2013, 10:00 am - Dr. Runze Li, Penn State University
October 4, 2013, 10:00 am - Dr. Robert Clickner, FSU Dept. of Statistics
September 27, 2013, 1:00 pm - Robert Holden
September 27, 2013, 10:00 am - Dr. Jim Hobert, University of Florida
September 20, 2013, 10:00 am - Dr. Betsy Hill, Medical University of South Carolina
September 13, 2013, 10:00 am - Dr. Debdeep Pati, FSU Dept. of Statistics
September 6, 2013, 10:00 am - Dr. Anuj Srivastava, FSU Dept. of Statistics
July 2, 2013, 2:00 pm - Oliver Galvis
July 1, 2013, 11:00 am - Seung-Yeon Ha
June 27, 2013, 2:00 pm - Felicia Williams
June 19, 2013, 3:00 pm - Yuanyuan Tang
June 18, 2013, 10:00 am - Ester Kim Nilles
May 15, 2013, 2:00 pm - Jingyong Su, FSU Dept. of Statistics, Dissertation Defense
May 13, 2013, 11:00 am - Yingfeng Tao
May 6, 2013, 10:00 am - Darshan Bryner, FSU Dept. of Statistics, Essay Defense
May 1, 2013, 1:00 pm - Katie Hillebrandt, FSU Dept. of Statistics, Essay Defense
April 29, 2013, 2:00 pm - Wade Henning, FSU Dept. of Statistics, Essay Defense
April 26, 2013, 10:00 am - Paul Beaumont, FSU Department of Economics
April 19, 2013, 10:00 am - Wei Wu, FSU Dept. of Statistics
April 15, 2013, 3:30 pm - Michael Rosenthal, Ph.D. Candidate, Essay Defense
April 12, 2013, 10:00 am - Karim Lounici, Georgia Tech
April 5, 2013, 10:00 am - Russell G. Almond, FSU
March 29, 2013, 10:00 am - Genevera Allen, Rice University
March 22, 2013, 10:00 am - Xiaoming Huo, Georgia Tech
March 20, 2013, 3:30 pm - Jose Laborde, Ph.D. Candidate, Essay Defense
March 19, 2013, 3:00 pm - Gretchen Rivera, FSU, Dissertation Defense
March 8, 2013, 10:00 am - Yongtao Guan, University of Miami
March 4, 2013, 11:00 am - Kelly McGinnity, FSU, Dissertation Defense
March 1, 2013, 10:00 am - Brian C. Monsell, US Census Bureau
February 27, 2013, 10:00 am - Rachel Becvarik, FSU, Dissertation Defense
February 20, 2013, 2:00 pm - Carl P. Schmertmann, Professor of Economics, FSU
February 15, 2013, 10:00 am - Fred Huffer, FSU Dept. of Statistics
January 25, 2013, 10:00 am - Yin Xia
January 23, 2013, 2:00 pm - Ying Sun
January 18, 2013, 10:00 am - Minjing Tao
January 14, 2013, 2:00 pm - Naomi Brownstein
January 11, 2013, 10:00 am - Qing Mai
December 13, 2012, 10:00 am - Rommel Bain, Department of Statistics, Florida State University, Dissertation Defense
December 3, 2012, 1:00 pm - Yuanyuan Tang, Department of Statistics, Florida State University, Essay Defense
November 30, 2012, 2:00 pm - Seungyeon Ha, Department of Statistics, Florida State University, Essay Defense
November 30, 2012, 10:00 am - Yiyuan She, Department of Statistics, Florida State University
November 16, 2012, 10:00 am - Jiashun Jin, Department of Statistics, Carnegie Mellon University
November 9, 2012, 10:00 am - Ming Yuan, School of Industrial & Systems Engineering, Georgia Tech
November 7, 2012, 3:35 pm - David Bristol, Statistical Consulting Services, Inc.
November 2, 2012, 10:00 am - Jinfeng Zhang, Department of Statistics, FSU
October 30, 2012, 10:00 am - Steve Chung, Ph.D. Candidate
October 29, 2012, 12:00 pm - Emilola Abayomi, Ph.D. Candidate, Dissertation
October 26, 2012, 10:00 am - Ciprian Crainiceanu, Department of Biostatistics, Johns Hopkins University
October 19, 2012, 10:00 am - Gareth James, Marshall School of Business, University of Southern California
October 12, 2012, 10:00 am - Michelle Arbeitman, College of Medicine, FSU
October 5, 2012, 10:00 am - Adrian Barbu, Dept. of Statistics, FSU
September 28, 2012, 10:00 am - Vladimir Koltchinskii, Dept. of Mathematics, Georgia Tech
September 21, 2012, 10:00 am - Xiaotong Shen, John Black Johnston Distinguished Professor, School of Statistics, University of Minnesota
September 14, 2012, 10:00 am - Xiuwen Liu, FSU Dept. of Computer Science
August 9, 2012, 11:00 am - Senthil Girimurugan
May 4, 2012, 10:00 am - Jingyong Su, FSU Dept. of Statistics
April 27, 2012, 3:30 pm - Ester Kim, FSU Dept. of Statistics
April 27, 2012, 10:00 am - Sebastian Kurtek, Ph.D. Candidate, Dissertation
April 20, 2012, 10:00 am - Sunil Rao, University of Miami
April 13, 2012, 10:00 am - Gretchen Rivera, FSU Dept. of Statistics
April 6, 2012, 10:00 am - Xu Han, University of Florida
March 30, 2012, 2:00 pm - Jordan Cuevas, Ph.D. Candidate, Dissertation
March 30, 2012, 10:00 am - Jinfeng Zhang, FSU Dept. of Statistics
March 29, 2012, 2:00 pm - Paul Hill
March 28, 2012, 9:00 am - Rachel Becvarik, FSU Dept. of Statistics
March 27, 2012, 3:30 pm - Jihyung Shin, FSU Dept. of Statistics
March 26, 2012, 1:00 pm - Jianchang Lin
March 23, 2012, 10:00 am - Bob Clickner, FSU Dept. of Statistics
March 16, 2012, 10:00 am - Wei Wu, FSU Dept. of Statistics
March 2, 2012, 10:00 am - Piyush Kumar, FSU Dept. of Computer Science
March 1, 2012, 11:00 am - Jun Li, Dept. of Statistics, Stanford University
February 29, 2012, 3:30 pm - Cun-Hui Zhang, Rutgers University Dept. of Statistics
February 29, 2012, 10:30 am - Daniel Osborne, Ph.D. candidate, FSU Dept. of Statistics
February 28, 2012, 3:30 pm - Eric Lock, Dept. of Statistics, University of North Carolina at Chapel Hill
February 27, 2012, 11:00 am - Kelly McGinnity, FSU Dept. of Statistics
February 16, 2012, 2:00 pm - Alec Kercheval, FSU Dept. of Mathematics
February 10, 2012, 3:30 pm - Jennifer Geis, Ph.D. candidate, FSU Dept. of Statistics
February 10, 2012, 10:00 am - Debdeep Pati
February 3, 2012, 10:00 am - Zhihua Sophia Su
January 27, 2012, 10:00 am - Harry Crane
January 20, 2012, 10:00 am - Anindra Bhadra
January 13, 2012, 10:00 am - Xinge Jessie Jeng
January 10, 2012, 3:30 pm - Ingram Olkin

February 5, 2014
Speaker: Senthilbalaji Girimurugan
Title: Nonlinear multivariate tests for high-dimensional data using wavelets with applications in genomics and engineering
When: February 5, 2014, 11:14 am
Where: 499 DSL
Abstract: Gaussian processes are not uncommon in fields of science such as engineering, genomics, quantitative finance, and astronomy, to name a few. In fact, such processes are special cases of a broader class of data known as functional data. When the underlying mean response of a process is a function, the resulting data from these processes are functional responses, and specialized statistical tools are required in their analysis. The methodology discussed in this work offers non-parametric tests that can detect differences in such data with greater power than existing methods while maintaining good control of Type-I error. The incorporation of wavelet transforms makes the test an efficient approach due to their de-correlation properties. These tests are designed primarily to handle functional responses from multiple treatments simultaneously and are generally extensible to high-dimensional data. The sparseness introduced by wavelet transforms is another advantage of this test when compared to traditional tests. In addition to offering a theoretical framework, several applications of such tests in the fields of engineering, genomics, and quantitative finance are also discussed.
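The de-correlation and sparsity ideas behind wavelet-based comparison of functional responses can be illustrated with a minimal numpy sketch (this is illustrative only, not the speaker's method: the Haar transform, the max statistic, and all data below are made-up choices):

```python
import numpy as np

def haar_dwt(x):
    """Full Haar wavelet decomposition of a length-2^k signal.
    Returns all detail coefficients plus the final approximation."""
    coeffs = []
    a = np.asarray(x, dtype=float)
    while len(a) > 1:
        pair = a.reshape(-1, 2)
        coeffs.append((pair[:, 0] - pair[:, 1]) / np.sqrt(2))  # details
        a = (pair[:, 0] + pair[:, 1]) / np.sqrt(2)             # approximation
    coeffs.append(a)
    return np.concatenate(coeffs)

def max_coeff_stat(group1, group2):
    """Max absolute standardized difference of mean wavelet coefficients
    between two groups of functional responses (rows = curves)."""
    w1 = np.array([haar_dwt(f) for f in group1])
    w2 = np.array([haar_dwt(f) for f in group2])
    diff = w1.mean(axis=0) - w2.mean(axis=0)
    se = np.sqrt(w1.var(axis=0) / len(w1) + w2.var(axis=0) / len(w2)) + 1e-12
    return np.abs(diff / se).max()

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64)
null_curves = rng.normal(size=(20, 64))                      # pure noise
shifted = np.sin(8 * np.pi * t) + rng.normal(size=(20, 64))  # oscillatory signal
stat_null = max_coeff_stat(null_curves, rng.normal(size=(20, 64)))
stat_alt = max_coeff_stat(null_curves, shifted)
```

Because the orthonormal transform concentrates a smooth signal in a few coefficients, the test statistic is much larger when a real group difference is present.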

December 3, 2013
Speaker: Derek Tucker, FSU Statistics Dept, Essay Defense
Title: Functional Component Analysis using Elastic Methods
When: December 3, 2013, 2:00 pm
Where: OSB 215
Abstract: Constructing generative models for functional observations is an important task in statistical functional analysis. In general, functional data contain both phase (or x, or horizontal) and amplitude (or y, or vertical) variability. Traditional methods often ignore the phase variability and focus solely on the amplitude variation, using cross-sectional techniques such as functional principal component analysis for dimension reduction and data modeling. Ignoring phase variability leads to a loss of structure in the data and inefficiency in data models. Moreover, most methods use a "pre-processing" alignment step to remove the phase variability without considering a more natural joint solution. This essay presents two approaches to this problem. The first relies on separating the phase (x-axis) and amplitude (y-axis) components, then modeling them using joint distributions. This separation, in turn, is performed using a technique called elastic shape analysis of curves, which involves a new mathematical representation of functional data. Then, using individual fPCAs, one each for the phase and amplitude components, it imposes joint probability models on the principal coefficients of these components while respecting the nonlinear geometry of the phase representation space. The second approach incorporates the phase variability into the objective function of two component analysis methods, functional principal component analysis and functional principal least squares. This creates a more complete solution, as the phase variability is removed while the components are simultaneously extracted. These ideas are demonstrated using random sampling from models estimated on simulated and real datasets, showing their superiority over models that ignore phase-amplitude separation. Furthermore, the models are applied to classification of functional data and achieve high performance in applications involving SONAR signals of underwater objects, handwritten signatures, and periodic body movements recorded by smart phones.
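The elastic representation underlying this phase/amplitude separation can be sketched in a few lines of numpy (a minimal illustration, omitting the warping optimization that the actual framework performs): the square-root velocity function (SRVF) q(t) = sign(f'(t)) sqrt(|f'(t)|) is the standard representation used in elastic shape analysis.

```python
import numpy as np

def srvf(f, t):
    """Square-root velocity function: q = sign(f') * sqrt(|f'|)."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def srvf_distance(f, g, t):
    """L2 distance between SRVFs; a proxy for amplitude distance.
    (The full method would also optimize over warpings here.)"""
    qf, qg = srvf(f, t), srvf(g, t)
    dt = t[1] - t[0]
    return np.sqrt(np.sum((qf - qg) ** 2) * dt)

t = np.linspace(0, 1, 500)
f = np.sin(2 * np.pi * t)
f_warped = np.sin(2 * np.pi * t ** 2)  # same curve traversed at a different rate
```

Without the alignment step the warped copy sits at a positive distance from the original; aligning over warpings is exactly what removes the phase component.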

November 22, 2013
Speaker: Dr. Washington Mio, FSU Dept. of Mathematics
Title: Taming Shapes and Understanding Their Variation
When: November 22, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: Quantification and interpretation of shape variation are problems that arise in multiple domains of biology and medicine. Problems such as understanding the evolution and inheritance of phenotypic traits, the genetic determinants of morphological traits, and normal and pathological changes in the anatomy of organs and tissues all involve shape analysis. Shape data can be quite irregular, as exemplified by images of gene expression domains and noisy 3D scans. Thus, a companion problem is that of regularizing shapes to make them amenable to analysis. In this talk, we will discuss developments in shape regularization and analysis that let us address some of these problems. We will illustrate the methods with applications to biomedical imaging.

November 15, 2013
Speaker: International Year of Statistics Conference at FSU
When: November 15, 2013, 9:00 am
Where: 214 Duxbury Hall (Nursing)
Details: http://stat.fsu.edu/iyos.php/

November 12, 2013
Speaker: Mingfei Qiu, FSU Statistics, Essay Defense
Title: Object Data Analysis on Hilbert Manifolds
When: November 12, 2013, 3:30 pm
Where: OSB 205
Abstract: The nonparametric methodology for projective shape analysis of a 3D configuration from its 2D regular camera images opens the door to one- and two-sample hypothesis tests applied in areas like quality control, face recognition, and scene detection. By the projective frame method, one can transform the sample points of the configuration to convenient positions. Using the nonparametric bootstrap methodology for extrinsic means, one may obtain confidence regions for the means. However, the studentization fails if the size of the configuration is infinite, leading to data analysis on Hilbert manifolds. Analysis on Hilbert manifolds via a neighborhood hypothesis test avoids this problem. In this essay, applications of projective manifold data analysis are given.

November 8, 2013
Speaker: Dr. Todd Ogden, Columbia University, Dept. of Biostatistics
Title: Images as predictors in regression models with scalar outcomes
When: November 8, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: One situation that arises in the field of functional data analysis is the use of imaging data or other very high dimensional data as predictors in regression models. A motivating example involves using baseline images of a patient's brain to predict the patient's clinical outcome. Interest lies both in making such patient-specific predictions and in understanding the relationship between the imaging data and the outcome. Obtaining meaningful fits in such problems requires some type of dimension reduction, but this must be done while taking into account the particular (spatial) structure of the data. This talk will describe some of the general tools that have proven effective in this context, including principal component analysis, penalized splines, and wavelet analysis.
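The dimension-reduction idea can be caricatured with principal component regression on simulated "images" (a generic sketch, not Dr. Ogden's code; all data and dimensions below are invented): compress each image to a few principal component scores, then regress the scalar outcome on the scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n, side = 80, 16  # 80 subjects, 16x16 images flattened to 256 predictors

# smooth simulated images: random mixtures of low-frequency patterns
xx, yy = np.meshgrid(np.arange(side), np.arange(side))
patterns = np.array([np.cos(np.pi * a * xx / side) * np.cos(np.pi * b * yy / side)
                     for a in range(3) for b in range(3)])
loadings = rng.normal(size=(n, len(patterns)))
images = loadings @ patterns.reshape(len(patterns), -1)
images += 0.05 * rng.normal(size=images.shape)      # measurement noise

# scalar outcome driven by one spatial patch of the image
beta = np.zeros((side, side))
beta[4:8, 4:8] = 1.0
y = images @ beta.ravel() + 0.1 * rng.normal(size=n)

# principal component regression: SVD of centered images, regress on scores
X = images - images.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 10
Z = np.column_stack([np.ones(n), U[:, :k] * s[:k]])  # intercept + k PC scores
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
r2 = 1 - np.sum((y - Z @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
```

Because the simulated images are spatially smooth, a handful of components captures nearly all predictor variation, so a 256-dimensional regression collapses to 10 coefficients; spatially structured penalties (splines, wavelets) refine this basic idea.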

November 5, 2013
Speaker: Darshan Bryner, FSU Dept. of Statistics, Dissertation Defense
Title: 2D Affine and Projective Shape Analysis, and Bayesian Elastic Active Contours
When: November 5, 2013, 2:00 pm
Where: OSB 215
Abstract: An object of interest in an image can be characterized to some extent by the shape of its external boundary. Current techniques for shape analysis consider the notion of shape to be invariant to the similarity transformations (rotation, translation, and scale), but often, in 2D images of 3D scenes, perspective effects can transform the shapes of objects in a more complicated manner than can be modeled by the similarity transformations alone. Therefore, we develop a general Riemannian framework for shape analysis in which metrics and related quantities are invariant to larger groups, the affine and projective groups, that approximate the transformations arising from perspective skews. Using this framework, we develop algorithms for computing geodesics and intrinsic sample statistics, leading up to Gaussian-type statistical models in the affine and projective shape spaces. We then present a variational framework for naturally incorporating these shape models as prior knowledge to guide active contours for boundary extraction in images. This so-called Bayesian active contour framework is especially suitable for images where boundary estimation is difficult due to low contrast, low resolution, and the presence of noise and clutter. In practice, the training shapes used for prior shape models may be collected from viewing angles different from those of the test images. By allowing the prior shape model to be invariant to affine transformations of elastic curves, we present an active contour algorithm whose resulting segmentation is robust to perspective skews.

November 1, 2013
Speaker: Jose Laborde, FSU Dept. of Statistics, Dissertation Defense
Title: Elastic Shape Analysis of RNAs and Proteins
When: November 1, 2013, 4:00 pm
Where: OSB 215
Abstract: Proteins and RNAs are molecular machines performing biological functions in the cells of all organisms. Automatic comparison and classification of these biomolecules are fundamental yet open problems in the field of Structural Bioinformatics. An outstanding unsolved issue is the definition and efficient computation of a formal distance between any two biomolecules. Current methods use alignment scores, which are not proper distances, to derive statistical tests for comparison and classification. This work applies Elastic Shape Analysis (ESA), a method recently developed in computer vision, to construct rigorous mathematical and statistical frameworks for the comparison, clustering, and classification of proteins and RNAs. ESA treats biomolecular structures as 3D parameterized curves, which are represented with a special map called the square root velocity function (SRVF). In the resulting shape space of elastic curves, one can perform statistical analysis of curves as if they were random variables. One can compare, match, and deform one curve into another, as well as compute averages and covariances of curve populations, and perform hypothesis testing and classification of curves according to their shapes. We have successfully applied ESA to the comparison and classification of protein and RNA structures. We further extend the ESA framework to incorporate additional non-geometric information that tags the shape of the molecules (namely, the sequence of nucleotide/amino-acid letters for RNAs/proteins and, in the latter case, also the labels for the so-called secondary structure). The biological representation is chosen such that the ESA framework continues to be mathematically formal. We have achieved superior classification of RNA functions compared to state-of-the-art methods on benchmark RNA datasets, which led to the publication of this work in the journal Nucleic Acids Research (NAR). Based on the ESA distances, we have also developed a fast method to classify protein domains by using a representative set of protein structures generated by a clustering-based technique we call Multiple Centroid Class Partitioning (MCCP). Comparison with other standard approaches showed that MCCP significantly improves accuracy while keeping the representative set smaller than the other methods do. The current schemes for the classification and organization of proteins (such as SCOP and CATH) assume a discrete space of structures, where a protein is classified into one and only one class in a hierarchical tree structure. Our recent study, and studies by other researchers, showed that the protein structure space is more continuous than discrete. To capture the complex but quantifiable continuous nature of protein structures, we propose to organize these molecules using a network model, where individual proteins are mapped to possibly multiple class nodes, each associated with a probability. Structural classes will then be connected to form a network based on overlaps of the corresponding probability distributions in the structural space.

November 1, 2013
Speaker: Dr. Yiyuan She, FSU Dept. of Statistics
Title: Network Analytics: A Statistical Perspective
When: November 1, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: Due to an explosion of network data from many fields of science, there is an increasing need for statistical techniques and tools to analyze network structure and identify system dynamics. Modern network data of interest may have ultra-high dimensionality and strong node association. System characteristics such as stationarity and asymptotic stability may strongly influence network parameter estimation and topology identification. Node transitions and correlations may share similar structures. This talk aims to provide some statistical insights into network learning to address these challenges. Motivated by real-world datasets, we discuss stationary sparse causality network learning, joint association graph screening and decomposition, and topology and dynamics estimation in large-scale recurrent networks. The examples and applications reveal the important role of statistical machine learning in network analytics.
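One of the building blocks mentioned, sparse causality network learning from stationary data, can be caricatured with a lasso-penalized VAR(1) fit by proximal gradient descent (a generic sketch, not Dr. She's estimator; the data, penalty level, and solver are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p, T = 10, 400
A_true = np.zeros((p, p))
A_true[np.arange(p), np.arange(p)] = 0.5      # sparse, stable dynamics
A_true[0, 1] = A_true[3, 7] = 0.3             # two cross-node edges
x = np.zeros((T, p))
for t in range(1, T):
    x[t] = x[t - 1] @ A_true.T + 0.5 * rng.normal(size=p)

X, Y = x[:-1], x[1:]

def ista_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient (ISTA) for the multivariate lasso:
    minimize 0.5*||Y - X B^T||_F^2 + lam*||B||_1 over the transition matrix B."""
    L = np.linalg.norm(X, 2) ** 2             # Lipschitz constant of the gradient
    B = np.zeros((Y.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = (X @ B.T - Y).T @ X
        B = B - grad / L
        B = np.sign(B) * np.maximum(np.abs(B) - lam / L, 0.0)  # soft threshold
    return B

A_hat = ista_lasso(X, Y, lam=15.0)
```

The soft-thresholding step zeroes out most spurious edges, so the estimated transition matrix recovers the sparse network topology (up to the usual lasso shrinkage bias on the surviving entries).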

October 25, 2013
Speaker: Dr. Stephen Walker, University of Kent and University of Texas at Austin
Title: On the Equivalence between Bayesian and Classical Hypothesis Testing
When: October 25, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: For hypotheses of the type H_0: θ = θ_0 vs. H_1: θ ≠ θ_0, we demonstrate the equivalence of a Bayesian hypothesis test using a Bayes factor and the corresponding classical test, for a large class of models, which are detailed in the talk. In particular, we show that the role of the prior and the critical region for the Bayes factor test is only to specify the type I error of the test. This is their only role since, as we show, the power function of the Bayes factor test coincides exactly with that of the classical test once the type I error has been fixed. This is joint work with Tom Shively of the McCombs School of Business, University of Texas at Austin.
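The flavor of this equivalence can be seen in the textbook normal-mean setting (a standard illustration, not the class of models treated in the talk): with known σ and a normal prior on θ under H_1, the Bayes factor in favor of H_0 is strictly decreasing in |z|, so a Bayes-factor cutoff and a |z| cutoff define the same rejection region, and fixing one fixes the type I error.

```python
import numpy as np

def bf01(z, n, sigma=1.0, tau=1.0):
    """Bayes factor in favor of H0: theta = theta_0, for xbar ~ N(theta, sigma^2/n),
    z = sqrt(n)*(xbar - theta_0)/sigma, and prior theta ~ N(theta_0, tau^2) under H1.
    BF01 = N(xbar; theta_0, v0) / N(xbar; theta_0, v0 + tau^2)."""
    v0 = sigma ** 2 / n          # sampling variance of xbar under H0
    v1 = v0 + tau ** 2           # marginal variance of xbar under H1
    return np.sqrt(v1 / v0) * np.exp(-0.5 * z ** 2 * tau ** 2 / v1)

# BF01 is strictly decreasing in |z|, so "reject H0 when BF01 < c" is the
# same rule as "reject when |z| > z_c"; choosing c amounts to choosing
# the type I error, after which the two power functions coincide.
z = np.linspace(0, 5, 200)
bf = bf01(z, n=50)
```

The monotonicity is the whole point: any threshold on the Bayes factor is re-expressible as a threshold on the classical test statistic.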

October 18, 2013
Speaker: Dr. Steve Marron, UNC Chapel Hill
Title: Object Oriented Data Analysis
When: October 18, 2013, 10:00 am
Where: 204 Duxbury Hall (Nursing) - note, not the 214 auditorium
Abstract: Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal component analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie groups and symmetric spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also impacts data analysis by providing a language for discussing the many choices needed in modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.

October 17, 2013
Speaker: Qian Xie, FSU Statistics, Essay Defense
Title: Parallel Transport of Deformations in Shape Space of Elastic Surfaces
When: October 17, 2013, 2:30 pm
Where: OSB 205
Abstract: Statistical shape analysis develops methods for the comparison, deformation, summarization, and modeling of shapes in given data sets. These tasks require a fundamental tool called parallel transport of tangent vectors along arbitrary paths. This tool is essential for: (1) computing geodesic paths using either the shooting or the path-straightening method, (2) transferring deformations across objects, and (3) modeling statistical variability in shapes. Using the square-root normal field (SRNF) representation of parameterized surfaces, we present a method for transporting deformations along paths in the shape space. This is difficult, despite the underlying space being a vector space, because the chosen (elastic) Riemannian metric is non-standard. Using a finite basis for representing the SRNFs of shapes, we derive expressions for the Christoffel symbols that enable parallel transport. We demonstrate this framework using examples from shape analysis of parameterized spherical surfaces, in the three contexts mentioned above.

October 11, 2013
Speaker: Dr. Runze Li, Penn State University
Title: Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates
When: October 11, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: This paper is concerned with feature screening and variable selection for varying coefficient models with ultrahigh dimensional covariates. We propose a new feature screening procedure for these models based on the conditional correlation coefficient. We systematically study the theoretical properties of the proposed procedure, and establish its sure screening property and ranking consistency. To enhance the finite sample performance of the proposed procedure, we further develop an iterative feature screening procedure. Monte Carlo simulation studies were conducted to examine the performance of the proposed procedures. In practice, we advocate a two-stage approach for varying coefficient models. The two-stage approach consists of (a) reducing the ultrahigh dimensionality by using the proposed procedure and (b) applying regularization methods to the dimension-reduced varying coefficient models to make statistical inferences on the coefficient functions. We illustrate the proposed two-stage approach with a real data example.
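The screen-then-refit recipe can be caricatured for an ordinary linear model with a marginal-correlation screener (a simplified stand-in for the paper's conditional correlation procedure for varying coefficient models; everything below is simulated):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 2000                     # ultrahigh dimension: p >> n
X = rng.normal(size=(n, p))
true_idx = [0, 1, 2]                 # only three predictors matter
y = 2 * X[:, 0] - 1.5 * X[:, 1] + 1.5 * X[:, 2] + 0.5 * rng.normal(size=n)

def screen(X, y, d):
    """Stage one: rank predictors by |corr(X_j, y)| and keep the top d
    (a marginal, sure-independence-screening-style rule)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:d]

kept = screen(X, y, d=30)
# Stage two: a regular fit on the screened predictors only.
coef, *_ = np.linalg.lstsq(X[:, kept], y, rcond=None)
```

Screening cuts 2000 candidates down to 30 while retaining the true signals, after which a conventional estimator (here plain least squares; regularized fits in the paper's setting) becomes feasible.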

October 4, 2013
Speaker: Dr. Robert Clickner, FSU Dept. of Statistics
Title: The Work Life of a Statistician in Academia, Government, and the Private Sector: A Comparative Review
When: October 4, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: Statisticians work in a variety of work environments. These can be broadly categorized as academia, government, and the private sector (or industry). While all have many things in common, there are significant differences among them in the nature of the work, the required skills, and the opportunities, benefits, employer expectations, demands, constraints, rewards, and compensation. No one of these work environments is best for everyone. This will be a nontechnical talk that describes and discusses these similarities and differences and hopefully gives you a sense of which you might prefer.

September 27, 2013
Speaker: Robert Holden
Title: Failure Time Regression Models for Thinned Point Processes
When: September 27, 2013, 1:00 pm
Where: 215 OSB
Abstract: In survival analysis, data on the time until a specific criterion event (or "endpoint") occurs are analyzed, often with regard to the effects of various predictors. In the classic applications, the criterion event is in some sense a terminal event, e.g., the death of a person or the failure of a machine or machine component. In these situations, the analysis requires assumptions only about the distribution of waiting times until the criterion event occurs and the nature of the effects of the predictors on that distribution. Suppose that the criterion event isn't a terminal event that can only occur once, but is a repeatable event. The sequence of events forms a stochastic point process. Further suppose that only some of the events are detected (observed); the detected events form a thinned point process. Any failure time model based on the data will be based not on the time until the first occurrence, but on the time until the first detected occurrence of the event. The implications of estimating survival regression models from such incomplete data will be analyzed.
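A key distributional fact in this setting is easy to check by simulation (a generic sketch with made-up parameters, not the speaker's analysis): independently thinning a Poisson process of rate λ with detection probability p yields a Poisson process of rate pλ, so the time to the first detected event is Exponential(pλ), not Exponential(λ).

```python
import numpy as np

rng = np.random.default_rng(4)
lam, p_detect = 2.0, 0.3     # event rate; probability an event is detected
n_subjects = 20000

# Direct simulation by thinning: walk through the full process and
# return the first event that happens to be observed.
def first_detected_by_thinning(rng, lam, p_detect):
    t = 0.0
    while True:
        t += rng.exponential(1 / lam)   # waiting time to the next event
        if rng.random() < p_detect:     # this event is detected
            return t

sims = np.array([first_detected_by_thinning(rng, lam, p_detect)
                 for _ in range(n_subjects)])

# Theoretical shortcut: the detected events form a Poisson process of
# rate lam * p_detect, so the same time is Exponential(lam * p_detect).
shortcut = rng.exponential(1 / (lam * p_detect), size=n_subjects)
```

The practical implication is the one the abstract points at: a failure time model fit to first *detected* occurrences estimates the detected-process rate pλ, which understates the underlying event rate λ whenever detection is incomplete.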

September 27, 2013
Speaker: Dr. Jim Hobert, University of Florida
Title: Convergence analysis of the Gibbs sampler for Bayesian general linear mixed models with improper priors
When: September 27, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: A popular default prior for the general linear mixed model is an improper prior that takes a product form, with a flat prior on the regression parameter and so-called power priors on each of the variance components. I will describe a convergence rate analysis of the Gibbs samplers associated with these Bayesian models. The main result is a simple, easily checked sufficient condition for geometric ergodicity of the Gibbs Markov chain. (This is joint work with Jorge Roman and Brett Presnell.)
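A minimal instance of the kind of sampler being analyzed, for the simplest mixed model y_ij = β + u_i + e_ij with a flat prior on β and power-type priors on the two variance components (an illustrative sketch on simulated data; the specific priors here are one common choice, not the conditions or models studied in the talk):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 30, 10                        # groups, observations per group
beta_true, su_true, se_true = 5.0, 1.0, 0.5
u_true = rng.normal(0, su_true, size=m)
y = beta_true + u_true[:, None] + rng.normal(0, se_true, size=(m, n))

def gibbs(y, n_iter=4000, burn=1000):
    """Gibbs sampler for y_ij = beta + u_i + e_ij with improper priors
    p(beta) ∝ 1, p(se2) ∝ se2^(-1), p(su2) ∝ su2^(-1/2)."""
    m, n = y.shape
    beta, su2, se2 = y.mean(), 1.0, 1.0
    draws = []
    for it in range(n_iter):
        # random effects: normal full conditional, one draw per group
        prec = n / se2 + 1 / su2
        mean = (n / se2) * (y.mean(axis=1) - beta) / prec
        u = rng.normal(mean, np.sqrt(1 / prec))
        # fixed effect: normal full conditional under the flat prior
        beta = rng.normal((y - u[:, None]).mean(), np.sqrt(se2 / (m * n)))
        # variance components: inverse-gamma full conditionals
        sse = np.sum((y - beta - u[:, None]) ** 2)
        se2 = 1 / rng.gamma(m * n / 2, 2 / sse)
        su2 = 1 / rng.gamma((m - 1) / 2, 2 / np.sum(u ** 2))
        if it >= burn:
            draws.append((beta, su2, se2))
    return np.array(draws)

draws = gibbs(y)
```

Whether chains like this one are geometrically ergodic, and hence usable with honest Monte Carlo error assessment, is exactly the question the talk's sufficient condition addresses; improper priors make the answer non-trivial because the posterior itself can fail to be proper.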

September 20, 2013
Speaker: Dr. Betsy Hill, Medical University of South Carolina
Title: Analysis of left-censored multiplex immunoassay data: A unified approach
When: September 20, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: Multiplex immunoassays (MIAs) are moderate- to high-throughput platforms for the simultaneous quantitation of a panel of analytes, and are increasingly popular as hypothesis-generating tools for targeted biomarker identification. As such, MIAs are not always rigorously validated, and often little is known about analytes' expected concentrations in samples derived from the target population. As a consequence, MIA data can be plagued by high proportions of concentrations flagged either as 'out-of-range' (samples for which the observed response falls below (above) the lower (upper) asymptote of a 5-parameter logistic calibration curve) or as extrapolated beyond the smallest or largest standard. We present a unified approach to the analysis of left-censored MIA data in the context of a Bayesian hierarchical model that incorporates background estimation, standard curve fitting, and modeling of observed fluorescence as a function of unobserved (latent) analyte concentration, with accommodation of left-censored concentrations via variance function specification. We present results from both a simulation study and a cytokine array analysis of serum specimens from head and neck cancer patients.
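The core censored-likelihood idea, shown here in its simplest frequentist form rather than the hierarchical Bayesian model of the talk: for values below a limit of detection (LOD), the likelihood contributes P(X < LOD) instead of a density. A toy sketch with a normal model and known σ = 1 (all data simulated):

```python
import numpy as np
from math import erf, log, pi

rng = np.random.default_rng(5)
mu_true, lod = 2.0, 1.5                 # true mean; limit of detection
x = rng.normal(mu_true, 1.0, size=2000)
cens = x < lod                          # below-LOD values are left-censored
obs = x[~cens]

def loglik(mu):
    """Censored-normal log likelihood: density term for observed values,
    log P(X < lod) for each censored one (sigma = 1)."""
    ll_obs = np.sum(-0.5 * log(2 * pi) - 0.5 * (obs - mu) ** 2)
    p_below = 0.5 * (1 + erf((lod - mu) / 2 ** 0.5))
    return ll_obs + cens.sum() * log(p_below)

grid = np.linspace(0.0, 4.0, 401)
mu_hat = grid[np.argmax([loglik(m) for m in grid])]

# two common ad hoc alternatives the unified modeling approach replaces
mu_drop = obs.mean()                         # drop censored values: biased upward
mu_sub = np.where(cens, lod / 2, x).mean()   # LOD/2 substitution: bias varies
```

Dropping the censored values systematically overestimates the mean, while the censored likelihood recovers it; the talk's hierarchical model builds this same accommodation into a full calibration-curve pipeline.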

September 13, 2013
Speaker: Dr. Debdeep Pati, FSU Dept. of Statistics
Title: Shrinkage priors in high dimensions
When: September 13, 2013, 10:00 am
Where: 214 Duxbury Hall (Nursing)
Abstract: Shrinkage priors are routinely used as alternatives to point-mass mixture priors for sparse modeling in high-dimensional applications. The question of statistical optimality in such settings is under-studied in a Bayesian framework. We provide theoretical understanding of such Bayesian procedures in terms of two key phenomena: prior concentration around sparse vectors and posterior compressibility. We demonstrate that a large class of commonly used shrinkage priors leads to sub-optimal procedures in high-dimensional settings. As a remedy, we propose a novel shrinkage prior that leads to optimal posterior concentration. A novel sampling algorithm for the proposed prior is devised, and illustrations are provided through simulation examples and an image-denoising application. An extension to massive covariance matrix estimation is discussed.

September 6, 2013
Speaker: Dr. Anuj Srivastava, FSU Dept. of Statistics
Title: Statistical Techniques on Nonlinear Manifolds -- Their Contributions in Advancing Image Understanding
When: September 6, 2013, 10:00 am
Where: 499 Dirac Science Library
Abstract: The primary goal in image understanding is to characterize objects contained in images in terms of their locations, motions, activities, and appearances. Due to the inherent variability associated with scenes, images, and objects, statistical approaches become natural. Any statistical approach requires mathematical representations of the objects of interest and probabilistic descriptions to capture their variability. The difficulty comes from the nonlinearity of object representation spaces -- these are not Euclidean, and one cannot perform classical multivariate statistics directly. Instead, one needs tools from differential geometry for handling the nonlinearity, and statistical techniques adapted to these manifolds for performing object characterization. In contrast to manifold-learning problems, where one estimates the underlying manifold, the representation spaces here are fully known, and one needs to exploit their geometries to develop efficient statistical tools. An example of this situation arises in shape analysis of objects in still images and videos. While shapes have been represented in many ways -- point sets, level sets, parametrized curves, boundary surfaces, etc. -- their representation spaces mostly form nonlinear manifolds. Here one would like to compare shapes, average them, develop statistical shape models, and provide tests for hypotheses involving different shape classes, and the nonlinearity of shape manifolds presents a challenge. I will describe recent advances in differential-geometric techniques that overcome this challenge and provide a rich set of techniques for "elastic shape analysis". The resulting tools include the computation of distances for joint shape registration and comparison, averaging of shapes, principal component analysis to discover modes of variation, "Gaussian"-type shape models, and much more. This framework captures shape variability in datasets very efficiently using low-dimensional manifolds and corresponding statistical models. This approach has been applied to shape analysis in face recognition, activity classification, object recognition, medical diagnosis, and bioinformatics.

 July 2, 2013 Speaker: Oliver Galvis Title: Hybrid Target-Category Forecasting Through Sparse Factor Auto-Regression When: July 2, 2013 2:00 pm Where: 215 OSB Abstract: Nowadays, time series forecasting in areas such as economics and finance has become a very challenging task, given the enormous amount of information available that may or may not improve the prediction of the series. A data structure with a very large number of predictors p and a limited number of observations T, usually with p >> T, is now typical. These data are usually grouped based on particularities of the series, providing a category data structure in which each series belongs to one group. When the objective is to forecast a univariate time series, or target series, the AR(4) model has become a standard. We believe the performance of the AR(4) can be improved by adding predictors from several sources. A comprehensive model that encompasses two sources of information, the past observations of the target series and a factor representation of the additional predictors, can then be used in forecasting. By taking advantage of the category data structure, we argue that any category, especially the one containing the target series, may be explained by its own lags and by the other categories and their lags. Consequently, a multivariate regression model naturally arises, bringing two challenges in modeling and forecasting. First, recognizing the relevant predictors, from a very large pool of candidates, that truly explain the target series, and eliminating data collinearity. Second, extracting the factors that best represent the significant information contained in the very large pool of predictors. To overcome both challenges we propose the Sparse Factor Auto-Regression (SFAR) model, which imposes low rank and cardinality control on the matrix of coefficients. 
In computation we propose a new version of the SEL-RRR estimator that simultaneously attains cardinality control on the number of nonzero rows of the coefficient matrix, to tackle the high-dimensionality and collinearity problems, and achieves low rank on the same matrix to extract highly informative factors. The cardinality control is achieved via a multivariate quantile thresholding rule, while the rank reduction is obtained via a RRR decomposition. With both challenges tackled, model interpretability and forecasting accuracy are attained. The proposed methodology is applied to synthetic and real-world data sets. Results of the experiments show that our model improves on the AR(4) in both applications. Back To Top
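The two ingredients of the estimator described above, cardinality control on the rows of the coefficient matrix and rank reduction, can each be illustrated with a few lines of NumPy. This is only a sketch of the two constraints applied once to a given matrix, not the SEL-RRR estimator itself, and the function names are illustrative.

```python
import numpy as np

def quantile_row_threshold(B, k):
    """Keep the k rows of B with largest Euclidean norm; zero the rest."""
    norms = np.linalg.norm(B, axis=1)
    keep = np.argsort(norms)[-k:]
    out = np.zeros_like(B)
    out[keep] = B[keep]
    return out

def reduce_rank(B, r):
    """Project B onto the nearest matrix of rank at most r (truncated SVD)."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]

def sparse_low_rank(B, k, r):
    """One thresholding-then-projection pass combining both constraints.

    Rows zeroed by the thresholding stay (numerically) zero after the
    SVD projection, so the result has at most k nonzero rows and rank
    at most r.
    """
    return reduce_rank(quantile_row_threshold(B, k), r)
```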

 July 1, 2013 Speaker: Seung-Yeon Ha Title: Theories on Group Variable Selection in Multivariate Regression Models When: July 1, 2013 11:00 am Where: 215 OSB Abstract: We study group variable selection in multivariate regression models. Group variable selection means selecting the nonzero rows of the coefficient matrix: since there are multiple response variables, if one predictor is irrelevant to estimation then the corresponding row must be zero. In the high-dimensional setup, shrinkage estimation methods are applicable and, by the James-Stein phenomenon (1961), guarantee smaller MSE than OLS. As one class of shrinkage methods, we study penalized least squares estimation for group variable selection. In particular, we study L0 regularization and L0 + L2 regularization, with the purpose of obtaining accurate prediction and consistent feature selection, and use the corresponding computational procedures, Hard TISP and Hard-Ridge TISP (She, 2009), to overcome the numerical difficulties. These regularization methods perform better, in both prediction and selection, than the Lasso (L1 regularization), one of the most popular penalized least squares methods. L0 achieves the same optimal rate of prediction loss and estimation loss as the Lasso, but it requires no restriction on the design matrix or sparsity for controlling the prediction error, and a more relaxed condition than the Lasso for controlling the estimation error. Also, for selection consistency, it requires a much more relaxed incoherence condition (on the correlation between the relevant and irrelevant subsets of predictors). Therefore L0 can work better than the Lasso in both prediction and sparsity recovery in practical cases where correlation is high or sparsity is not low. We study another method, L0 + L2 regularization, which uses the combined penalty of L0 and L2. 
For the corresponding procedure, Hard-Ridge TISP, two parameters work independently for selection and shrinkage (to enhance prediction) respectively, and therefore it performs better in some cases (such as low signal strength) than L0 regularization. For L0 regularization, the threshold parameter governs selection but is tuned in terms of prediction accuracy. L0 + L2 regularization gives the optimal rate of prediction and estimation errors without any restriction, when the coefficient of the L2 penalty is appropriately assigned. Furthermore, it can achieve a better rate of estimation error with an ideal choice of block-wise weights on the L2 penalty. Back To Top
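The TISP iteration referenced above (She, 2009) is simple enough to sketch. The following is a minimal illustration for a univariate response with the hard-ridge thresholding rule; setting eta = 0 recovers hard (L0) thresholding, and eta > 0 adds the ridge shrinkage. The parameter choices and step-size rule are illustrative assumptions, not the exact tuning from the talk.

```python
import numpy as np

def hard_ridge(t, lam, eta):
    """Hard-ridge thresholding: zero entries below lam, shrink the rest.

    With eta = 0 this is plain hard thresholding (the L0 case).
    """
    return np.where(np.abs(t) > lam, t / (1.0 + eta), 0.0)

def tisp(X, y, lam, eta, n_iter=500):
    """Thresholding-based iterative selection procedure (sketch).

    Iterates  theta <- Theta(theta + X^T (y - X theta) / k0),
    where k0 bounds the squared spectral norm of X so the
    fixed-point iteration is stable.
    """
    k0 = np.linalg.norm(X, 2) ** 2
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta = hard_ridge(theta + X.T @ (y - X @ theta) / k0, lam, eta)
    return theta
```

On an orthogonal design the iteration recovers the sparse coefficient vector exactly, which is the easiest way to sanity-check the implementation.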

 June 27, 2013 Speaker: Felicia Williams Title: The Relationship of Diabetes to Coronary Heart Disease Mortality: A Meta-Analysis Based on Person-level Data When: June 27, 2013 2:00 pm Where: 215 OSB Abstract: Studies have suggested that diabetes is a stronger risk factor for coronary heart disease (CHD) in women than in men. We present a meta-analysis of person-level data from 42 cohort studies in which diabetes, CHD mortality and potential confounders were available and a minimum of 75 CHD deaths occurred. These studies followed up 77,863 men and 84,671 women aged 42 to 73 years on average from the US, Denmark, Iceland, Norway and the UK. Individual study prevalence rates of self-reported diabetes mellitus at baseline ranged between less than 1% in the youngest cohort and 15.7% (males) and 11.1% (females) in the NHLBI CHS study of the elderly. CHD death rates varied between 2% and 20%. A meta-analysis was performed to calculate overall hazard ratios (HR) of CHD mortality among diabetics compared to non-diabetics, using Cox proportional hazards models. The random-effects HR associated with baseline diabetes and adjusted for age was significantly higher for females, 2.65 (95% CI: 2.34, 2.96), than for males, 2.33 (95% CI: 2.07, 2.58) (p=0.004). These estimates were similar to the random-effects estimates adjusted additionally for serum cholesterol, systolic blood pressure, and current smoking status: females 2.69 (95% CI: 2.35, 3.03) and males 2.32 (95% CI: 2.05, 2.59). They also agree closely with estimates (odds ratios of 2.9 for females and 2.3 for males) obtained in a recent meta-analysis of 50 studies of both fatal and nonfatal CHD but not based on person-level data. This evidence suggests that diabetes diminishes the female advantage. An additional analysis was performed on race; only 14 cohorts were included in this meta-analysis. 
This analysis showed no significant difference between the black and white cohorts before (p=0.68) or after adjustment for the major CHD risk factors (p=0.88). The limited number of studies used may lack the power to detect any differences. Back To Top

 June 19, 2013 Speaker: Yuanyuan Tang Title: Bayesian Methods for Skewed Response including Longitudinal and Heteroscedastic Data When: June 19, 2013 3:00 pm Where: 215 OSB Abstract: Skewed response data are common in practice, especially in the biomedical area. We begin with skewed longitudinal responses, presenting a partial linear model for the median regression function of a skewed longitudinal response. We provide justifications for our methods, including a theoretical investigation of the support of the prior, asymptotic properties of the posterior, and simulation studies of finite-sample properties. Ease of implementation and the advantages of our model and method over existing methods are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers. We then study skewed and heteroscedastic univariate responses, presenting a novel extension of the transform-both-sides model to Bayesian variable selection that simultaneously performs variable selection and parameter estimation. Finally, we propose a novel Latent Variable Residual Density (LV-RD) model to handle skewed univariate responses with flexible heteroscedasticity. The advantages of our semiparametric Bayes methods include ease of prior elicitation, easily implementable posterior computation, theoretically sound properties of the chosen priors, and accommodation of possible outliers. Back To Top

 June 18, 2013 Speaker: Ester Kim Nilles Title: An Ensemble Approach to Predicting Health Outcomes When: June 18, 2013 10:00 am Where: 215 OSB Abstract: Heart disease and premature birth continue to be the leading causes of mortality and of neonatal mortality, respectively, in large parts of the world. They are also estimated to have the highest medical expenditures in the United States. Early detection of heart disease incidence plays a critical role in preserving heart health, and identifying pregnancies at high risk of premature birth is highly valuable information for early interventions. For the past few decades, identification of patients at high health risk has been based on logistic regression or Cox proportional hazards models. In more recent years, machine learning models have grown in popularity within the medical field for their superior predictive and classification performance over classical statistical models. However, their performances in heart disease and premature birth predictions have been comparable and inconclusive, leaving the question of which model most accurately reflects the data difficult to resolve. Our aim is to incorporate information learned by different models into one final model that will generate superior predictive performance. We first compare the widely used machine learning models - the multilayer perceptron network, k-nearest neighbor and support vector machine - to the statistical models logistic regression and Cox proportional hazards. Then the individual models are combined into one in an ensemble approach, also referred to as ensemble modeling. The proposed approaches include SSE-weighted, AUC-weighted, logistic and flexible naive Bayes. The individual models are unique and capture different aspects of the data, but as expected, no individual model outperforms the others. 
The ensemble approach is an easily computed method that eliminates the need to select a single model, integrates the strengths of different models, and generates optimal performance. Particularly in cases where the risk factors associated with an outcome are elusive, such as in premature birth, the ensemble models significantly improve prediction. Back To Top
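As one concrete instance of the ensemble weighting described above, an AUC-weighted combination can be sketched as follows. The weighting rule used here (validation AUC above 0.5, renormalized) is an illustrative assumption, not necessarily the exact scheme from the dissertation, and ties in the scores are not handled.

```python
import numpy as np

def auc(y_true, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic.

    Assumes no ties among the scores; y_true is 0/1.
    """
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(np.sum(y_true))
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auc_weighted_ensemble(y_valid, probs_valid, probs_test):
    """Weight each model's test probabilities by its validation AUC above 0.5.

    probs_valid, probs_test: (M, n) arrays, one row of predicted
    probabilities per model.
    """
    w = np.array([max(auc(y_valid, p) - 0.5, 0.0) for p in probs_valid])
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, probs_test))
```

A model no better than chance on the validation set gets weight zero, so it cannot dilute the ensemble.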

 May 15, 2013 Speaker: Jingyong Su, FSU Dept. of Statistics, Dissertation Defense Title: Statistical Analysis of Trajectories on Riemannian Manifolds When: May 15, 2013 2:00 pm Where: OSB 215 Abstract: This thesis focuses on statistical analysis of trajectories that take values on nonlinear manifolds. There are many difficulties in analyzing temporal trajectories on nonlinear manifolds. First, the observed data are noisy and observed at discrete, unsynchronized times. Second, trajectories are observed under arbitrary temporal evolutions. In this work, we first address the problem of estimating full smooth trajectories on nonlinear manifolds using only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. Furthermore, we study statistical analysis of trajectories that take values on nonlinear Riemannian manifolds and are observed under arbitrary temporal evolutions. The problem of analyzing such temporal trajectories, including registration, comparison, modeling and evaluation, arises in many applications. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance, in turn, is used to define statistical summaries, such as the sample means and covariances, of given trajectories and Gaussian-type models to capture their variability. Both theoretical proofs and experimental results are provided to validate our work. Back To Top

 May 13, 2013 Speaker: Yingfeng Tao Title: The Frequentist Performance of Some Bayesian Confidence Intervals for the Survival Function When: May 13, 2013 11:00 am Where: 215 OSB Abstract: Estimation of a survival function is a very important topic in survival analysis with contributions from many authors. This dissertation considers estimation of confidence intervals for the survival function based on right censored or interval-censored survival data. Most of the methods for estimating pointwise confidence intervals and simultaneous confidence bands of the survival function are reviewed in this dissertation. In the right-censored case, almost all confidence intervals are based in some way on the Kaplan-Meier estimator first proposed by Kaplan and Meier (1958) and widely used as the nonparametric estimator in the presence of right-censored data. For interval-censored data, the Turnbull estimator (Turnbull, 1974) plays a similar role. For a class of Bayesian models involving Dirichlet priors, Doss and Huffer (2003) suggested several simulation techniques to approximate the posterior distribution of the survival function by using Markov chain Monte Carlo or sequential importance sampling. These techniques lead to probability intervals for the survival function (at arbitrary time points) and its quantiles for both the right-censored and interval-censored cases. This dissertation will examine the frequentist properties and general performance of these probability intervals when the prior is non-informative. Simulation studies will be used to compare these probability intervals with other published approaches. Extensions of the Doss-Huffer approach are given for constructing simultaneous confidence bands for the survival function and for computing approximate confidence intervals for the survival function based on Edgeworth expansions using posterior moments. The performance of these extensions is studied by simulation. Back To Top
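For reference, the Kaplan-Meier estimator that underlies most of the right-censored intervals discussed above can be computed directly. A minimal sketch (the function name and interface are my own):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate from right-censored data.

    times:  observed times (event or censoring time for each subject).
    events: 1 if the event was observed, 0 if right-censored.
    Returns (distinct event times, S(t) just after each event time),
    using the product-limit formula S(t) = prod (1 - d_i / n_i).
    """
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    t_event = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in t_event:
        at_risk = np.sum(times >= t)                    # n_i
        deaths = np.sum((times == t) & (events == 1))   # d_i
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return t_event, np.array(surv)
```

With no censoring the estimate reduces to one minus the empirical CDF, which gives a quick correctness check.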

 May 6, 2013 Speaker: Darshan Bryner, FSU Dept. of Statistics, Essay Defense Title: Bayesian Active Contours with Affine-Invariant, Elastic Shape Priors When: May 6, 2013 10:00 am Where: OSB 215 Abstract: Active contour, especially in conjunction with prior-shape models, has become an important tool in image segmentation. However, most contour methods use shape priors based on similarity-shape analysis, i.e. analysis that is invariant to rotation, translation, and scale. In practice, the training shapes used for prior-shape models may be collected from viewing angles different from those for the test images and require invariance to a larger class of transformations. Using an elastic, affine-invariant shape modeling of planar curves, we propose an active contour algorithm in which the training and test shapes can be at arbitrary affine transformations, and the resulting segmentation is robust to perspective skews. We construct a shape space of affine-standardized curves and derive a statistical model for capturing class-specific shape variability. The active contour is then driven by the gradient of a total energy composed of a data term, a smoothing term, and an affine-invariant shape-prior term. This framework is demonstrated using a number of examples involving real images and the segmentation of shadows in sonar images of underwater objects. Back To Top

 May 1, 2013 Speaker: Katie Hillebrandt, FSU Dept. of Statistics, Essay Defense Title: Characterizations of Complex Signals Using Functional ANOVA When: May 1, 2013 1:00 pm Where: OSB 215 Abstract: Many methods exist for detecting differences in functional data. However, most of these methods make assumptions about the noise on the signals, and the usual assumption is that the noise is normally distributed. I would like to be able to discern differences in time-dependent signals that represent the rate of flow of water exiting the spring of a karstic springshed under different treatments. These signals are simulated using a complex system called KFM developed by Chicken et al. [2007]. Because of the complex nature of KFM, even if the distribution of the noise on the inputs is known, the simulated signal has unknown noise components which are autocorrelated. This is further complicated by the fact that input noise has constraints that do not allow it to be normally distributed. The resulting noisy signal therefore has extremely non-normal noise which makes the use of established methods for detecting differences in signals inappropriate. The treatments under which the signals will be simulated are characteristics of the underground path where the water flows. Characteristics of interest include the length of the underground channels, called conduits or active connections, and the total number of conduits. It is beneficial to be able to determine differences in flow rate signals for different types of paths because typically the entirety of the underground path is difficult to map and therefore characteristics about the path are usually unknown. If differences in discharge signals are detected for different treatments, the treatment level for the underground waterway can be inferred based on the measured flow of discharge at the spring. 
It is also useful to be able to determine the flow rate for different types of springsheds under different weather scenarios and in the case of environmental disasters, such as the spill of a contaminant in the springshed. Being able to classify the output of a spring based on its treatment level can help in predicting the discharge rate after a weather event or the flow of the contaminant through the system. Details of KFM are presented, along with methods of simulating rain and discharge signals under different treatments. Several established methods for tests on functional data are discussed, and areas of interest are identified for research into a method modified for signals whose noise components have an unknown distribution. Specific areas of interest include power calculations, treatment selection methods, and follow-up tests for identifying different signals after overall differences are detected. Back To Top

 April 29, 2013 Speaker: Wade Henning, FSU Dept. of Statistics, Essay Defense Title: Characterizing the Shape Populations of Particle Sets When: April 29, 2013 2:00 pm Where: OSB 215 Abstract: Creating statistical shape models for particle sets is an important goal in particle science as researchers seek to classify them, predict their behaviors, or optimize their process parameters. Statistical shape analysis is becoming the standard approach for comparing and classifying the shapes of curves: pairwise distances are measured between boundary functions and used to construct means and covariances. However, no method currently exists for comparing shape populations. This essay uses the shape distributions of particle sets to make inferences about their implicit shape populations. A method is introduced for estimating the Fisher-Rao distance between shape populations using elastic shape analysis, kernel density estimation and Monte Carlo methods. The Fisher-Rao distance is calculated between probability density functions on R and shape populations on shape manifolds. The results provide strong empirical evidence that the Fisher-Rao distance between shape populations is a discriminating measurement for comparing particle sets. Statistical modeling based on shape populations promises to revolutionize the analysis of particle sets and their processes. Back To Top

 April 26, 2013 Speaker: Paul Beaumont, FSU Department of Economics Title: Generalized Impulse Response Functions and the Spillover Index When: April 26, 2013 10:00 am Where: 108 OSB Abstract: Impulse response functions (IRF) and forecast error variance decompositions (FEVD) from vector autoregression (VAR) systems depend upon the order of the variables in the VAR. We show how to compute the order independent generalized IRF and generalized FEVD and compare them to the results of all possible VAR orderings. We then show that the FEVD related spillover index does not translate well to the generalized case and produces index values well outside the range produced by all permutations of orderings. We illustrate the methods with an application to spillover effects of economic growth rates across countries. Back To Top

 April 19, 2013 Speaker: Wei Wu, FSU Dept. of Statistics Title: Time Warping Method and Its Applications When: April 19, 2013 10:00 am Where: 108 OSB Abstract: In this talk, I will summarize my research on time warping over the past 2-3 years.  Focusing on statistical analysis on functional data, we have recently developed a novel geometric framework to compare, align, average, and model a collection of random functional observations, where the key step is to find an optimal time warping between two functions for a feature-to-feature alignment.  This framework can be easily extended to analyzing multi-dimensional curves and point process observations. The theoretical underpinning of this framework is established by proving the consistency under a semi-parametric model. Mathematical modeling between two time warpings also leads to a parametric representation for spherical regression.  Finally, I will demonstrate this new framework using experimental data in various application domains such as SONAR signals, ECG bio-signals, and spike recordings in geniculate ganglion. Back To Top
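The simplest concrete instance of optimal time warping is classical dynamic time warping between two discretely sampled functions. The talk's framework instead uses the Fisher-Rao metric and warping by diffeomorphisms, so this sketch only illustrates the underlying dynamic-programming idea of aligning features across time:

```python
import numpy as np

def dtw(a, b):
    """Classical dynamic time warping cost between two 1-D sequences.

    Fills the cumulative-cost table
        D[i, j] = |a[i] - b[j]| + min(D[i-1, j], D[i, j-1], D[i-1, j-1]),
    so D[n, m] is the cost of the best monotone alignment.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Two sequences that differ only in the timing of their features can be aligned at zero cost, which is exactly the feature-to-feature matching the talk describes.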

 April 15, 2013 Speaker: Michael Rosenthal, Ph.D. Candidate, Essay Defense Title: Advances in Spherical Regression When: April 15, 2013 3:30 pm Where: OSB 215 Abstract: The ability to define correspondences between paired observations on a spherical manifold has applications in earth science, medicine, and image analysis. Spherical data come in many forms, including geographical coordinates from plate tectonics, clouds, and GPS devices. They can also be directional in nature, such as from vector cardiograms, winds, currents, and tides. Spherical data are unit vectors of arbitrary dimension and can be viewed as points on the hyper-sphere manifold. Examples of such data often include sounds, signals, shapes, and images. The Riemannian geometry of these hyper-spheres is well known and can be utilized for unit vector data of arbitrary dimension. Past works in spherical regression involve either flattening the spherical manifold to a linear space, or imposing rigid restrictions on the nature of the correspondence between predictor and response variables. While these methods have their advantages in certain settings, there are some severe limitations that make them inappropriate in a variety of other settings. We propose a method to extend the framework to allow for a very flexible nonparametric form of correspondences for data on the two-dimensional sphere. Back To Top

 April 12, 2013 Speaker: Karim Lounici, Georgia Tech Title: Variable Selection with Exponential Weights When: April 12, 2013 10:00 am Where: 108 OSB Abstract: In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for prediction. We show that such methods also succeed at variable selection and estimation under the necessary identifiability condition on the design matrix, instead of the much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variable selection under similar conditions. Joint work with Ery Arias-Castro. Back To Top
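The basic exponential-weights aggregate is easy to state: candidate predictors are weighted in proportion to exp(-RSS / temperature), so better-fitting candidates dominate. A minimal sketch with the temperature left as a free parameter (the interface is illustrative, not the estimator from the talk):

```python
import numpy as np

def exponential_weights(y, predictions, temperature):
    """Aggregate candidate predictors with weights exp(-RSS / temperature).

    predictions: (M, n) array, row m holding the m-th candidate's
    fitted values for the n observations.
    Returns the normalized weights and the aggregated prediction.
    """
    rss = np.sum((predictions - y) ** 2, axis=1)
    # Subtract the minimum RSS before exponentiating for numerical stability;
    # this leaves the normalized weights unchanged.
    w = np.exp(-(rss - rss.min()) / temperature)
    w = w / w.sum()
    return w, w @ predictions
```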

 April 5, 2013 Speaker: Russell G. Almond, FSU Title: A Particle Filter EM Algorithm for Estimating Parameters of a Partially Observed Markov Decision Process (POMDP) When: April 5, 2013 10:00 am Where: 108 OSB Abstract: Periodic assessments involve a series of assessments intended to measure a complex of related competencies given to the same collection of individuals at several time points. One challenge with these models is that the student competencies will grow over time in response to the instructional activities that occur between assessments. Partially observed Markov decision process (POMDP) models, a general case of the hidden Markov model (HMM) or state space model, can capture this dynamic. The model relates a series of observable variables to a series of latent variables, which are assumed to be changing over time. The relationship between the observed and latent variables at each time point is governed by a matrix that reflects the design of the assessment. It is assumed that latent variables change according to a Markov model that is governed by a series of instructional activities. This talk provides an example of a POMDP model and describes a method for combining the particle filter and stochastic EM algorithms for estimating the parameters of POMDPs from panel data coming from the administration of periodic assessment models. Back To Top
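A bootstrap particle filter step of the kind combined with stochastic EM in the talk follows a predict-weight-resample cycle. A generic sketch, with the transition and likelihood supplied by the caller; the Gaussian random-walk example in the test below is my own assumption, not the assessment model from the talk:

```python
import numpy as np

def particle_filter_step(particles, transition, likelihood, y, rng):
    """One predict-weight-resample step of a bootstrap particle filter.

    particles:  (N,) array of current latent-state particles.
    transition: function propagating the particles one time step.
    likelihood: function giving p(y | state) for each particle.
    y:          the new observation.
    """
    particles = transition(particles, rng)           # predict
    w = likelihood(y, particles)                     # weight by the data
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]                            # multinomial resample
```

After the step, the particle cloud approximates the filtering distribution, so its mean should shift from the prior mean toward the observation.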

 March 29, 2013 Speaker: Genevera Allen, Rice University Title: High-Dimensional Poisson Graphical Models When: March 29, 2013 10:00 am Where: OSB 108 Abstract: Markov Networks, especially Gaussian graphical models and Ising models, have become a popular tool to study relationships in high-dimensional data.  Variables in many data sets, however, are comprised of count data that may not be well modeled by Gaussian or multinomial distributions.  Examples include high-throughput genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits.  Existing methods for Poisson graphical models include the Poisson Markov Random Field (MRF) of Besag (1974) that places severe restrictions on the types of dependencies, only permitting negative correlations between variables. By restricting the domain of the variables in this joint density, we introduce a Winsorized Poisson MRF which permits a rich dependence structure and whose pair-wise conditional densities closely approximate the Poisson distribution.  An important consequence of our model is that it gives an analytical form for a multivariate Poisson density with rich dependencies; previous multivariate densities permitted only positive or only negative dependencies. We develop neighborhood selection algorithms to estimate network structure from high-dimensional count data by fitting graphical models based on Besag's MRF, our Winsorized Poisson MRF, and a local approximation to the Winsorized Poisson MRF. We also provide theoretical results illustrating the conditions under which these algorithms recover the network structure with high probability.  Through simulations and an application to breast cancer microRNAs measured by next generation sequencing, we demonstrate the advantages of our methods for network recovery from count data.  This is joint work with Zhandong Liu, Pradeep Ravikumar and Eunho Yang. Back To Top

 March 22, 2013 Speaker: Xiaoming Huo, Georgia Tech Title: Detectability and Related Theorems When: March 22, 2013 10:00 am Where: OSB 110 Abstract: The detectability problem asks when certain types of underlying structure are detectable from noisy images. The methodology is based on analyzing the pattern of a collection of local tests. The aggregation of these test results needs to ensure both statistical efficiency and low computational complexity. In particular, certain testing methods depend on the distribution of the length of the longest chain that connects locally significant hypothesis tests. The asymptotic distribution of these largest lengths reveals properties of the test. I will describe optimality guarantees for the proposed detection methods. The focus will be on the statistical aspects of the problem; the audience only needs knowledge of hypothesis testing and asymptotic theory. The strategy of testing locally and deciding globally may have applications in other statistical problems in which the alternative hypothesis is composite, complicated or overwhelming. The relation between detectability and percolation theory will also be discussed. Back To Top

 March 20, 2013 Speaker: Jose Laborde, Ph.D Candidate, Essay Defense Title: Elastic Shape Analysis of Amino Acid and Nucleotide Biomolecules When: March 20, 2013 3:30 pm Where: OSB 215 Abstract: This work aims to develop methods for shape analysis of biomolecules represented as parameterized 3D open curves, for which added sequence/secondary-structure information can be jointly compared. This requires adjusting Elastic Shape Analysis (ESA) methods, since we can use neither equally spaced 3D points nor the same number of points in a pair of structures to be compared. It also needs a biologically relevant way to incorporate such additional information through a correct choice of auxiliary function. ESA has mostly been applied to equally re-sampled versions of the original curves, so this work also aims to eliminate the re-sampling step; this will enable us to incorporate sequence/secondary-structure information more naturally into auxiliary coordinates beyond the 3D ones. The ESA framework requires a Riemannian metric that allows: (1) re-parameterizations of curves by isometries, and (2) efficient computations of geodesic paths between curves. These tools allow for computing Karcher means and covariances (using tangent PCA) for shape classes, and a probabilistic classification of curves. To solve these problems we first introduce a mathematical representation of curves, called q-functions, and use the L^2 metric on the space of q-functions to induce a Riemannian metric on the space of parameterized curves. This process requires optimal registration of curves and achieves a superior alignment. Mean shapes and their covariance structures can be used to specify a normal probability model on shape classes, which can then be used for classifying test shapes. We have also achieved superior classification rates compared to state-of-the-art methods on their RNA sets, which has led to the acceptance of our work in Nucleic Acids Research. Back To Top

 March 19, 2013 Speaker: Gretchen Rivera, FSU, Dissertation Defense Title: Meta Analysis and Meta Regression of a Measure of Discrimination used in Prognostic Modeling When: March 19, 2013 3:00 pm Where: OSB 215 Abstract: In this paper we are interested in predicting death with the underlying cause of coronary heart disease (CHD). There are two prognostic modeling methods used to predict CHD: the logistic model and the proportional hazards model. For this paper we consider the logistic model. The dataset used is the Diverse Populations Collaboration (DPC) dataset, which includes 28 studies. The DPC dataset has epidemiological results from investigations conducted in different populations around the world. For our analysis we include those individuals who are 17 years old or older. The predictors are: age, diabetes, total serum cholesterol (mg/dl), high density lipoprotein (mg/dl), systolic blood pressure (mmHg) and whether the participant is a current cigarette smoker. There is a natural grouping within the studies, such as gender, rural or urban area, and race. Based on these strata we have 84 cohort groups. Our main interest is to evaluate how well the prognostic model discriminates. For this, we use the area under the Receiver Operating Characteristic (ROC) curve. The main idea of the ROC curve is that a set of subjects is known to belong to one of two classes (signal or noise). An assignment procedure then assigns each subject to a class on the basis of observed information. The procedure is not perfect: sometimes a subject is misclassified. To evaluate the quality of this procedure we use the area under the ROC curve (AUROC). The AUROC varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy). For each logistic model we found the AUROC and its standard error (SE). We used meta-analysis to summarize the estimated AUROCs and to evaluate whether there is heterogeneity in our estimates. 
To evaluate the existence of significant heterogeneity we used the Q statistic. Since heterogeneity was found in our study we compare seven different methods for estimating tau^2 (between study variance). We conclude by examining whether differences in study characteristics explained the heterogeneity in the values of the AUROC. Back To Top
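The Q statistic and one common between-study variance estimator (DerSimonian-Laird, one plausible member of a set of seven such methods) can be sketched as follows. The AUROC values and squared standard errors below are invented for illustration, not taken from the DPC analysis:

```python
def dl_meta(effects, variances):
    """Fixed-effect pooling, Cochran's Q, and the DerSimonian-Laird tau^2.

    effects: per-study estimates (e.g. AUROCs); variances: their squared SEs.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    pooled_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - pooled_fe) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # truncated at zero
    # Random-effects pooling with tau^2-inflated variances.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return pooled_fe, q, tau2, pooled_re

# Hypothetical AUROC estimates and squared SEs from five cohorts.
aurocs = [0.74, 0.78, 0.70, 0.81, 0.66]
ses2 = [0.0004, 0.0009, 0.0006, 0.0016, 0.0005]
fe, q, tau2, re = dl_meta(aurocs, ses2)
```

Under homogeneity, Q is approximately chi-squared with k-1 degrees of freedom, so a large Q (here against 4 df) signals heterogeneity and a positive tau^2.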

 March 8, 2013 Speaker: Yongtao Guan, University of Miami Title: Optimal Estimation of the Intensity Function of a Spatial Point Process When: March 8, 2013 10:00 am Where: 108 OSB Abstract: Although optimal from a theoretical point of view, maximum likelihood estimation for Cox and cluster point processes can be cumbersome in practice due to the complicated nature of the likelihood function and the associated score function. It is therefore of interest to consider alternative, more easily computable estimating functions. We derive the optimal estimating function in a class of first-order estimating functions. The optimal estimating function depends on the solution of a certain Fredholm integral equation and reduces to the likelihood score in the case of a Poisson process. We discuss the numerical solution of the Fredholm integral equation and note that a special case of the approximated solution is equivalent to a quasi-likelihood for binary spatial data. The practical performance of the optimal estimating function is evaluated in a simulation study and a data example. Back To Top

 March 4, 2013 Speaker: Kelly McGinnity, FSU, Dissertation Defense Title: Nonparametric Wavelet Thresholding and Profile Monitoring for Non-Gaussian Errors When: March 4, 2013 11:00 am Where: 215 OSB Abstract: Recent advancements in data collection allow scientists and researchers to obtain massive amounts of information in short periods of time. Often this data is functional and quite complex. Wavelet transforms are popular, particularly in the engineering and manufacturing fields, for handling these types of complicated signals. A common application of wavelets is in statistical process control (SPC), in which one tries to determine as quickly as possible if and when a sequence of profiles has gone out of control. However, few wavelet methods have been proposed that do not rely in some capacity on the assumption that the observational errors are normally distributed. This dissertation aims to fill this void by proposing a simple, nonparametric, distribution-free method of monitoring profiles and estimating changepoints. Using only the magnitudes and location maps of thresholded wavelet coefficients, our method uses the spatial adaptivity property of wavelets to accurately detect profile changes when the signal is obscured by a variety of non-Gaussian errors. Wavelets are also widely used for the purpose of dimension reduction. Applying a thresholding rule to a set of wavelet coefficients results in a "denoised" version of the original function. Once again, existing thresholding procedures generally assume independent, identically distributed normal errors. Thus, the second main focus of this dissertation is a nonparametric method of thresholding that does not assume Gaussian errors, or even that the form of the error distribution is known. We improve upon an existing even-odd cross-validation method by employing block thresholding and level dependence, and show that the proposed method works well on both skewed and heavy-tailed distributions.
Such thresholding techniques are essential to the SPC procedure developed above. Back To Top
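To give a minimal flavor of wavelet thresholding, here is a single-level Haar transform with a robust (median-based) threshold scale instead of a Gaussian variance formula. This is a generic sketch, not the dissertation's block/level-dependent procedure:

```python
import numpy as np

def haar_level1(x):
    """One level of the orthonormal Haar transform: approximation and detail."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def inv_haar_level1(a, d):
    """Exact inverse of haar_level1."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def denoise(x):
    """Soft-threshold the detail coefficients using a robust scale estimate
    (median absolute deviation) rather than an assumed error variance."""
    a, d = haar_level1(x)
    sigma = np.median(np.abs(d)) / 0.6745           # MAD-based noise scale
    lam = sigma * np.sqrt(2.0 * np.log(max(d.size, 2)))
    d_thr = np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)
    return inv_haar_level1(a, d_thr)
```

Note the 0.6745 constant calibrates the MAD to a Gaussian scale; a fully distribution-free method, as in the dissertation, would replace this calibration as well.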

 March 1, 2013 Speaker: Brian C. Monsell, US Census Bureau Title: Research at the Census Bureau When: March 1, 2013 10:00 am Where: 108 OSB Abstract: The Census Bureau has taken steps to reinforce the role of research within the organization. This talk will give details on the role of statistical research at the U.S. Census Bureau. There are renewed opportunities for internships and collaboration with those in the academic community. Details on areas of research important to the Census Bureau will be shared, with particular attention paid to the status of current work in time series analysis and statistical software development. Back To Top

 February 27, 2013 Speaker: Rachel Becvarik, FSU, Dissertation Defense Title: Nonparametric Nonstationary Density Estimation Including Upper Control Limit Methods for Detecting Change Points When: February 27, 2013 10:00 am Where: 215 OSB Abstract: Nonstationary nonparametric densities occur naturally in applications such as monitoring the amount of toxins in the air and monitoring internet streaming data. Progress has been made in estimating these densities, but there is little current work on monitoring them for changes. A new statistic is proposed which effectively monitors these nonstationary nonparametric densities through the use of transformed wavelet coefficients of the quantiles. This method is completely nonparametric and designed around no particular distributional assumptions, making it effective in a variety of conditions. Similarly, several estimators have been shown to be successful at monitoring for changes in functional responses ("profiles") involving high-dimensional data. These methods focus on using a single-value upper control limit (UCL) based on a specified in-control average run length (ARL) to detect changes in these nonstationary statistics. However, such a UCL is not designed to take into consideration the false alarm rate, the power associated with the test, or the underlying distribution of the ARL. Additionally, if the monitoring statistic is known to be monotonic over time (which is typical in methods using maxima in their statistics, for example), the flat UCL does not adjust to this property. We propose several methods for creating UCLs that provide improved power and simultaneously adjust the false alarm rate to user-specified values. Our methods are constructive in nature, making no use of assumed distributional properties of the underlying monitoring statistic. We evaluate the different proposed UCLs through simulations to illustrate the improvements over current UCLs.
The proposed method is evaluated with respect to profile monitoring scenarios and the proposed density statistic. The method is applicable for monitoring any monotonically nondecreasing nonstationary statistic. Back To Top
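One constructive way to build a time-varying UCL by simulation, in the spirit described (no distributional assumptions on the monitoring statistic, false-alarm rate tuned toward a user-specified value): take a pointwise quantile envelope of simulated in-control paths and tune the quantile level until the empirical probability of ever crossing it reaches the target. The in-control model and tuning grid below are illustrative assumptions, not the dissertation's procedures:

```python
import numpy as np

rng = np.random.default_rng(0)

def calibrated_ucl(paths, alpha=0.05):
    """Time-varying UCL as a pointwise quantile envelope of in-control paths,
    with the quantile level tuned so the in-control probability of *ever*
    crossing the envelope is approximately alpha."""
    B, T = paths.shape
    ucl, far = None, 0.0
    # Sweep from a conservative (Bonferroni-like) level down to 1 - alpha.
    for gamma in np.linspace(1.0 - alpha / T, 1.0 - alpha, 200):
        ucl = np.quantile(paths, gamma, axis=0)
        far = np.mean((paths > ucl).any(axis=1))     # empirical false-alarm rate
        if far >= alpha:                             # stop at the target rate
            break
    return ucl, far

# In-control paths of a monotone statistic: running maxima of |N(0,1)| noise.
noise = np.abs(rng.standard_normal((2000, 50)))
paths = np.maximum.accumulate(noise, axis=1)
ucl, far = calibrated_ucl(paths)
```

Because the paths are monotone, the resulting UCL is itself nondecreasing, unlike the flat single-value UCL criticized in the abstract.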

 February 20, 2013 Speaker: Carl P. Schmertmann, Professor of Economics at FSU Title: Bayesian Forecasting of Cohort Fertility When: February 20, 2013 2:00 pm Where: OSB 108 Abstract: There are signs that fertility in rich countries may have stopped declining, but this depends critically on whether women currently in reproductive ages are postponing or reducing lifetime fertility. Analysis of average completed family sizes requires forecasts of remaining fertility for women born 1970-1995. We propose a Bayesian model for fertility that incorporates a priori information about patterns over age and time. We use a new dataset, the Human Fertility Database (HFD), to construct improper priors that give high weight to historically plausible rate surfaces. In the age dimension, cohort schedules should be well approximated by principal components of HFD schedules. In the time dimension, series should be smooth and approximately linear over short spans. We calibrate priors so that approximation residuals have theoretical distributions similar to historical HFD data. Our priors use quadratic penalties and imply a high-dimensional normal posterior distribution for each country’s fertility surface. Forecasts for HFD cohorts currently aged 15-44 show consistent patterns. In the US, Northern Europe, and Western Europe, slight rebounds in completed fertility are likely. In Central and Southern Europe there is little evidence for a rebound. Our methods could be applied to other forecasting and missing-data problems with only minor modifications. Back To Top

 February 15, 2013 Speaker: Fred Huffer, FSU Dept. of Statistics Title: Record Values, Poisson Mixtures, and the Joint Distribution of Counts of Strings in Bernoulli Sequences When: February 15, 2013 10:00 am Where: 108 OSB Abstract: Let U1, U2, U3, … be iid continuous random variables and Y1, Y2, Y3, … be Bernoulli rv's which indicate the positions of the record values in this sequence, that is, Yj = 1 if Ui < Uj for all i < j. Let Z1 be the number of occurrences of consecutive record values in the infinite sequence U1, U2, U3, … and, more generally, let Zk be the number of occurrences of two record values separated by exactly k - 1 non-record values. It is a well-known but still quite surprising fact that Z1, Z2, Z3, … are independent Poisson rv's with EZk = 1/k for all k. We show how this may be proved by embedding the record sequence in a marked Poisson process. If we have only a finite sequence of trials U1, U2, …, UN, then the record counts Z1, Z2, … will no longer be exactly Poisson or exactly independent. But if N is random with an appropriately chosen distribution, we can retain these properties exactly. This also can be proved by embedding in a marked Poisson process. This is joint work with Jayaram Sethuraman and Sunder Sethuraman. Back To Top
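The surprising fact EZk = 1/k is easy to check empirically. A small simulation sketch (not part of the talk) using only the standard library; the sequence length and replication count are arbitrary:

```python
import random

random.seed(42)

def record_string_counts(n, kmax=3):
    """Simulate U1..Un iid Uniform(0,1); return [Z_1, ..., Z_kmax], where
    Z_k counts pairs of records separated by exactly k-1 non-records."""
    u = [random.random() for _ in range(n)]
    best = float("-inf")
    record_pos = []
    for j, x in enumerate(u):
        if x > best:                       # x is a record value
            best = x
            record_pos.append(j)
    z = [0] * (kmax + 1)
    for a, b in zip(record_pos, record_pos[1:]):
        gap = b - a                        # gap k means k-1 non-records between
        if gap <= kmax:
            z[gap] += 1
    return z[1:]

# Empirical check that EZ_k is close to 1/k for long finite sequences.
reps, n = 5000, 300
totals = [0, 0, 0]
for _ in range(reps):
    zk = record_string_counts(n)
    for i in range(3):
        totals[i] += zk[i]
means = [t / reps for t in totals]
```

For a finite sequence of length n the exact mean of Z1 is 1 - 1/n (summing 1/(j(j+1)) over j), so the simulated averages should sit just below 1, 1/2, and 1/3.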

 January 25, 2013 Speaker: Yin Xia Title: Testing of Large Covariance Matrices When: January 25, 2013 10:00 am Where: 108 OSB Abstract: This talk considers two inter-related problems in the high-dimensional setting: (a) testing the equality of two covariance matrices; (b) recovering the support of the difference of two covariance matrices. We propose a new test for the equality of two covariance matrices and investigate its theoretical and numerical properties. The limiting null distribution of the test statistic is derived and the power of the test is studied. The test is shown to enjoy certain optimality and to be especially powerful against sparse alternatives. The simulation results show that the test significantly outperforms the existing methods in terms of both size and power. Analysis of a p53 dataset is carried out to demonstrate the application of the testing procedures. When the null hypothesis of equal covariance matrices is rejected, it is often of significant interest to further investigate how they differ from each other. Motivated by applications in genomics, we also consider recovering the support of the difference of two covariance matrices. New procedures are introduced and their properties are studied. Applications to gene selection are also discussed. Back To Top

 January 23, 2013 Speaker: Ying Sun Title: Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets When: January 23, 2013 2:00 pm Where: 108 OSB Abstract: For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this work, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean. This talk is based on joint work with Michael Stein from the University of Chicago. Back To Top

 January 18, 2013 Speaker: Minjing Tao Title: Large Volatility Matrix Estimation Based on High-Frequency Financial Data When: January 18, 2013 10:00 am Where: 108 OSB Abstract: Financial practice often requires estimating an integrated volatility matrix of a large number of assets using noisy high-frequency data. Many existing estimators of small-dimensional volatility matrices become inconsistent when the size of the matrix is close to or larger than the sample size. In this talk, we propose a new type of large volatility matrix estimator based on non-synchronized high-frequency financial data, allowing for the presence of market microstructure noise. In addition, we investigate the optimal convergence rate for this volatility estimation problem, building the asymptotic theory for the proposed estimator and deriving the minimax lower bound. Our proposed estimator has a risk matching this lower bound up to a constant factor, and thus achieves the optimal convergence rate. Furthermore, a simulation study is conducted to examine the finite sample performance of our proposed estimator and to support the established asymptotic theory. Back To Top

 January 14, 2013 Speaker: Naomi Brownstein Title: Analysis of Time-to-Event Data & Intermediate Phenotypes in the OPPERA Study When: January 14, 2013 2:00 pm Where: 108 OSB Abstract: In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorders (TMD) is a clinical examination by an expert dentist. In a large prospective cohort study, examining all subjects in this manner is infeasible. Instead, it is common to use a cheaper (and less reliable) examination to screen for possible incident cases and perform the "gold standard" examination only on those who screen positive on the simpler examination. Unfortunately, subjects may leave the study before receiving the "gold standard" examination. This results in a survival analysis problem with missing censoring indicators. Motivated by the Orofacial Pain: Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose methods for parameter estimation in survival models with missing censoring indicators. We estimate the probability of being a case for those with no "gold standard" examination through a logistic regression model. Predicted probabilities facilitate estimation of the hazard ratios associated with each putative risk factor. Multiple imputation produces variance estimates for this procedure. Simulations show that our methods perform better than naïve approaches. In addition, we apply the method to data in the OPPERA study and extend the methods to account for repeated measures and missing covariates. Another problem of recent interest is the analysis of secondary phenotypes in case-control studies. Standard methods may be biased and lack coverage and power. We propose a general method for analysis of arbitrary phenotypes, including ordinal and survival outcomes. 
We advocate the use of inverse probability weighted methods and estimate the standard error by bootstrapping. Back To Top

 January 11, 2013 Speaker: Qing Mai Title: Semiparametric Sparse Discriminant Analysis in High Dimensions When: January 11, 2013 10:00 am Where: 108 OSB Abstract: In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its incompetence for high-dimensional classification (Tibshirani et al. (2002), Fan & Fan (2008), Wu et al. (2009), Clemmensen et al. (2011), Cai & Liu (2011), Witten & Tibshirani (2011), Fan et al. (2012) and Mai et al. (2012)). These research efforts are rejuvenating discriminant analysis. However, the normality assumption, which rarely holds in real applications, is still required by all of these recent methods. We develop high-dimensional semiparametric sparse discriminant analysis (SeSDA) that generalizes normality-based discriminant analysis by relaxing the Gaussian assumption. If the underlying Bayes rule is sparse, SeSDA can estimate the Bayes rule and select the true features simultaneously with overwhelming probability, as long as the logarithm of the dimension grows more slowly than the cube root of the sample size. At the core of the theory is a new exponential concentration bound for semiparametric Gaussian copulas, which is of independent interest. Further, the analysis of a malaria dataset (Ockenhouse et al. (2006)) by SeSDA confirms the superior performance of SeSDA over normality-based methods in both classification and feature selection. Back To Top
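The Gaussian-copula idea underlying semiparametric discriminant analysis replaces each marginal by its normal scores before applying a discriminant rule. The marginal transform alone can be sketched as follows; this is a simplified illustration using the standard library's `statistics.NormalDist`, not the SeSDA estimator itself:

```python
import math
from statistics import NormalDist

def normal_scores(x):
    """Rank-based marginal transform: x_i -> Phi^{-1}(rank_i / (n + 1)).
    Makes each (continuous) margin approximately standard normal regardless
    of its original distribution -- the Gaussian-copula idea."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    inv = NormalDist().inv_cdf
    return [inv(r / (n + 1)) for r in ranks]

# A heavily skewed sample (exponentiated values) becomes nearly symmetric.
skewed = [math.exp(v / 10.0) for v in range(1, 101)]
scores = normal_scores(skewed)
```

After transforming each feature this way, a sparse linear discriminant rule can be applied to the transformed data; the actual SeSDA theory uses a refined version of this transformation with the concentration bound mentioned above.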

 December 13, 2012 Speaker: Rommel Bain Department of Statistics, Florida State University, Dissertation Defense Title: Monte Carlo Likelihood Estimation for Conditional Autoregressive Models with Application to Sparse Spatiotemporal Data When: December 13, 2012 10:00 am Where: OSB 215 Abstract: Spatiotemporal modeling is increasingly used in a diverse array of fields, such as ecology, epidemiology, health care research, transportation, economics, and other areas where data arise from a spatiotemporal process. Spatiotemporal models describe the relationship between observations collected from different spatiotemporal sites. The modeling of spatiotemporal interactions arising from spatiotemporal data is done by incorporating the space-time dependence into the covariance structure. A main goal of spatiotemporal modeling is the estimation and prediction of the underlying process that generates the observations under study and the parameters that govern the process. Furthermore, analysis of the spatiotemporal correlation of variables can be used for estimating values at sites where no measurements exist. In this work, we develop a framework for estimating quantities that are functions of complete spatiotemporal data when the spatiotemporal data is incomplete. We present two classes of conditional autoregressive (CAR) models (the homogeneous CAR (HCAR) model and the weighted CAR (WCAR) model) for the analysis of sparse spatiotemporal data (the log of monthly mean zooplankton biomass) collected on a spatiotemporal lattice by the California Cooperative Oceanic Fisheries Investigations (CalCOFI). These models allow for spatiotemporal dependencies between nearest neighbor sites on the spatiotemporal lattice. Typically, CAR model likelihood inference is quite complicated because of the intractability of the CAR model's normalizing constant. Sparse spatiotemporal data further complicates likelihood inference. 
We implement Monte Carlo likelihood (MCL) estimation methods for parameter estimation of our HCAR and WCAR models. Monte Carlo likelihood estimation provides an approximation for intractable likelihood functions. We demonstrate our framework by giving estimates for several different quantities that are functions of the complete CalCOFI time series data. Back To Top

 December 3, 2012 Speaker: Yuanyuan Tang, Department of Statistics, Florida State University, Essay Defense Title: Bayesian Partial Linear Model for Skewed Longitudinal Data When: December 3, 2012 1:00 pm Where: OSB 215 Abstract: For longitudinal studies with a heavily skewed continuous response, statistical models and methods focusing on the mean response are not appropriate. In this work, we present a partial linear model for the median regression function of a skewed longitudinal response. We develop a semi-parametric Bayesian estimation procedure using an appropriate Dirichlet process mixture prior for the skewed error distribution. We provide justifications for using our methods, including theoretical investigation of the support of the prior, asymptotic properties of the posterior, and simulation studies of finite sample properties. Ease of implementation and advantages of our model and method compared to existing methods are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers. Our other aim is to develop Bayesian simultaneous variable selection and estimation of median regression for a skewed response variable. Preliminary simulation studies have been conducted to compare the performance of the proposed model with an existing frequentist median lasso regression model. In terms of estimation bias and total squared error, our proposed model performs as well as, or better than, competing frequentist estimators. Back To Top

 November 30, 2012 Speaker: Seungyeon Ha, Department of Statistics Florida State University, Essay Defense Title: Essay Defense When: November 30, 2012 2:00 pm Where: 215 OSB Abstract: In this work, L0 regularization is proposed for estimating a sparse linear regression vector in a high-dimensional setup, for the purposes of both prediction and variable selection. The oracle upper bounds on both the prediction error and the selection error are of the same rate as those for the Lasso, even with no restriction on the design matrix. The estimation loss in the L_q norm, where q ∈ [1, ∞], is upper bounded at the optimal rate under a less restrictive condition, RIF, proposed by Zhang and Zhang (2011). Sparsity recovery, or variable selection, is our main concern, and we derive the conditions required for sign consistency, which control the incoherence of the design matrix and the signal-to-noise ratio (SNR). The L0 regularization achieves the optimal SNR rate while requiring less restriction than the Lasso does for achieving it. We then extend our theorems to the multivariate response model by considering grouping on the univariate model. On both models we use the hard-TISP algorithm proposed by She (2009), and we guarantee reaching the same stationary points by scaling the design matrix properly. Back To Top

 November 30, 2012 Speaker: Yiyuan She, Department of Statistics, Florida State University Title: On the Cross-Validation for Sparse Reduced Rank Models When: November 30, 2012 10:00 am Where: 108 OSB Abstract: Recently, the availability of high-dimensional data in statistical applications has created an urgent need for methodologies to pursue sparse and/or low-rank models. These approaches usually resort to a grid search with a model comparison criterion to locate the optimal value of the regularization parameter. Cross-validation is one of the most widely used tuning methods in statistics and computer science. We propose a new form of cross-validation, referred to as selective-projective cross-validation (SPCV), for multivariate models where relevant features may be few and/or lie in a low-dimensional subspace. In contrast to most available methods, SPCV cross-validates candidate projection-selection patterns instead of regularization parameters and is not limited to specific penalties. A further scale-free complexity correction is developed based on the nonasymptotic Predictive Information Criterion (PIC) to achieve the minimax optimal error rate in this setup. Back To Top

 November 16, 2012 Speaker: Jiashun Jin, Department of Statistics, Carnegie Mellon University Title: Fast Network Community Detection by SCORE When: November 16, 2012 10:00 am Where: 108 OSB Abstract: Consider a network where the nodes split into K different communities. The community labels for the nodes are unknown and it is of major interest to estimate them (i.e., community detection). The Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities under the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity. We propose Spectral Clustering On Ratios-of-Eigenvectors (SCORE) as a new approach to community detection. Compared to classical spectral methods, the main innovation is to use the entry-wise ratios between the first leading eigenvector and each of the other leading eigenvectors. Let X be the adjacency matrix of the network. We first obtain the K leading eigenvectors, say v_1, ..., v_K, and let R be the n x (K-1) matrix such that R(i,k) = v_{k+1}(i)/v_1(i), 1 <= i <= n, 1 <= k <= K-1. We then use R for clustering by applying the k-means method. The central surprise is that the effect of degree heterogeneity is largely ancillary, and can be effectively removed by taking entry-wise ratios between v_{k+1} and v_1, 1 <= k <= K-1. The method is successfully applied to the web blogs data and the karate club data, with error rates of 58/1222 and 1/34, respectively. These results are much more satisfactory than those of the classical spectral methods. Also, compared to modularity methods, SCORE is computationally much faster and has smaller error rates. We develop a theoretical framework where we show that under mild conditions SCORE stably yields successful community detection. At the core of the analysis are recent developments in Random Matrix Theory (RMT), where the matrix-form Bernstein inequality is especially helpful. Back To Top
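The SCORE recipe as the abstract describes it (leading eigenvectors, entry-wise ratios, then k-means) is short enough to sketch directly. The toy degree-heterogeneous two-community matrix below is an illustrative assumption (a noiseless expected adjacency rather than a sampled network), and the tiny built-in k-means is a stand-in for any standard implementation:

```python
import numpy as np

def score_communities(A, K, n_iter=50):
    """SCORE: take the K leading eigenvectors of the adjacency matrix, form
    the n x (K-1) matrix of ratios R(i,k) = v_{k+1}(i)/v_1(i), and run
    k-means on the rows of R."""
    vals, vecs = np.linalg.eigh(A)
    lead = vecs[:, np.argsort(-np.abs(vals))[:K]]      # K leading eigenvectors
    R = lead[:, 1:] / lead[:, [0]]                     # entry-wise ratios
    # Tiny Lloyd's k-means with farthest-point initialization.
    centers = [R[0]]
    for _ in range(1, K):
        dist = np.min([((R - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(R[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((R[:, None, :] - centers[None]) ** 2).sum(axis=2),
                           axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = R[labels == k].mean(axis=0)
    return labels

# Toy DCBM-style expected adjacency: two communities, smooth degree heterogeneity.
n = 20
theta = np.linspace(0.5, 1.5, n)                       # node-specific degrees
g = np.array([0] * 10 + [1] * 10)                      # true community labels
B = np.array([[1.0, 0.2], [0.2, 1.0]])
A = np.outer(theta, theta) * B[g][:, g]
labels = score_communities(A, K=2)
```

In this noiseless example the ratios are exactly constant within each community regardless of theta, which is precisely the "degree heterogeneity is ancillary" phenomenon the abstract highlights.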

 November 9, 2012 Speaker: Ming Yuan, School of Industrial & Systems Engineering, Georgia Tech Title: Adaptive Estimation of Large Covariance Matrices When: November 9, 2012 10:00 am Where: 108 OSB Abstract: Estimation of large covariance matrices has drawn considerable recent attention and the theoretical focus so far is mainly on developing a minimax theory over a fixed parameter space. In this talk, I shall discuss adaptive covariance matrix estimation where the goal is to construct a single procedure which is minimax rate optimal simultaneously over each parameter space in a large collection. The estimator is constructed by carefully dividing the sample covariance matrix into blocks and then simultaneously estimating the entries in a block by thresholding. I shall also illustrate the use of the technical tools developed in other matrix estimation problems. Back To Top
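For contrast with the block procedure described in the talk, the simpler universal entry-wise thresholding of a sample covariance matrix looks like this. This is a generic sketch; the constant c and the sqrt(log p / n) threshold scale are conventional choices from the sparse-covariance literature, not the speaker's adaptive procedure:

```python
import numpy as np

def threshold_cov(S, n, c=1.0):
    """Entry-wise soft-thresholding of a p x p sample covariance matrix S
    computed from n observations, keeping the diagonal intact.
    The threshold lam ~ c * sqrt(log p / n) shrinks small off-diagonal
    entries to exactly zero."""
    p = S.shape[0]
    lam = c * np.sqrt(np.log(p) / n)
    out = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)  # soft-threshold
    np.fill_diagonal(out, np.diag(S))                    # never shrink variances
    return out

# Toy usage: one moderate off-diagonal entry survives, tiny ones vanish.
S = np.eye(4)
S[0, 1] = S[1, 0] = 0.5
S_hat = threshold_cov(S, n=100)
```

The adaptive estimator of the talk goes further by grouping entries into blocks and thresholding blockwise, which is what yields simultaneous rate optimality over a large collection of parameter spaces.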

 November 7, 2012 Speaker: David Bristol, Statistical Consulting Services, Inc. Title: Two Adaptive Procedures for Comparing Two Doses to Placebo Using Conditional Power When: November 7, 2012 3:35 pm Where: 108 OSB Abstract: Adaptive designs have received much attention recently for various goals, including sample size re-estimation and dose selection. Here two adaptive procedures for comparing two doses of an active treatment to placebo with respect to a binomial response variable using a double-blind randomized clinical trial are presented. The goals of the interim analysis are to stop for futility or to continue with one dose or both doses, and placebo, with a possible increase in the sample size for any group that continues. Various properties of the two procedures, which are both based on the concept of conditional power, are presented. Back To Top
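Conditional power for a normal test statistic under a Brownian-motion approximation can be sketched as below. This is a textbook-style illustration of the concept, not the talk's two-dose binomial procedures; the critical value and information fraction in the usage line are arbitrary:

```python
from statistics import NormalDist

def conditional_power(z_interim, info_frac, z_alpha=1.96, drift=None):
    """Conditional power given an interim z-statistic, using the Brownian
    motion approximation: B(t) = Z(t) * sqrt(t), and the remaining increment
    B(1) - B(t) ~ N(drift * (1 - t), 1 - t).

    drift=None uses the current-trend estimate B(t) / t."""
    t = info_frac
    b = z_interim * t ** 0.5
    if drift is None:
        drift = b / t                      # extrapolate the observed trend
    nd = NormalDist()
    return 1.0 - nd.cdf((z_alpha - b - drift * (1.0 - t)) / (1.0 - t) ** 0.5)

# Halfway through the trial with an interim z of 1.98, under current trend.
cp = conditional_power(1.98, 0.5)
```

Rules like "stop for futility if conditional power under the current trend falls below some floor" are one common way such interim decisions are formalized.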

 November 2, 2012 Speaker: Jinfeng Zhang, Department of Statistics, FSU Title: Change-point detection for high-throughput genomic data When: November 2, 2012 10:00 am Where: 108 OSB Abstract: Analysis of high-throughput genomic data often requires detection of change-points along a genome. For example, when comparing the chromatin accessibility of two samples (e.g. normal and cancer cells), an essential task is to detect both the locations and the lengths of genomic regions that have statistically significant differences in chromatin accessibility between the two samples. Similar tasks are encountered when comparing DNA copy number variations, nucleosome occupancy, DNA methylation, and histone modifications of two or multiple samples. In these experiments, genetic or epigenetic features are measured along the genome for thousands or millions of genomic locations. Given two different conditions, many genomic regions can undergo significant changes. Accurate detection of the changes will help scientists understand the biological mechanisms responsible for the phenotype differences of the samples being compared. This problem falls into a more general type of statistical problem, called the change-point problem, which has been actively studied by scientists in a variety of disciplines over the past couple of decades. However, many of the existing methods are not suitable for analyzing high-throughput genomic data. In this talk, I present two related change-point problems and our solutions to them. We manually annotated a benchmark dataset and used it to rigorously compare our method to several popular methods in the literature. Our method was shown to perform better than the previous methods on the benchmark dataset. We further applied the method to study the effect of drug treatments on chromatin accessibility and nucleosome occupancy using HDAC inhibitors, a class of drugs for cancer treatment. Back To Top
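As a baseline for the genomic setting described, here is the classical single change-point detector based on the standardized CUSUM maximizer. This generic sketch is not the speaker's method, and the toy "accessibility" signal is invented:

```python
import numpy as np

def cusum_changepoint(x):
    """Locate a single mean shift as the maximizer of the standardized
    CUSUM statistic C_k = |S_k - (k/n) S_n| / sqrt(k (n - k) / n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.cumsum(x)
    k = np.arange(1, n)
    c = np.abs(s[:-1] - k / n * s[-1]) / np.sqrt(k * (n - k) / n)
    khat = int(np.argmax(c)) + 1           # change occurs after position khat
    return khat, float(c.max())

# Toy "genomic" signal: accessibility jumps from level 0 to 2 after site 60.
signal = np.concatenate([np.zeros(60), 2.0 * np.ones(40)])
rng = np.random.default_rng(1)
data = signal + 0.5 * rng.standard_normal(100)
khat, stat = cusum_changepoint(data)
```

Genomic applications need more than this baseline, e.g. multiple change-points, region lengths, and error control across millions of sites, which is what motivates the specialized methods of the talk.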

 October 30, 2012 Speaker: Steve Chung, Ph.D. Candidate Title: Essay Defense: A Class of Nonparametric Volatility Models: Applications to Financial Time Series When: October 30, 2012 10:00 am Where: 499 DSL Abstract: Over the past few decades, financial volatility modeling has been a very active and extensive research area for academics and practitioners, and it remains one of the main ongoing research areas in empirical finance and time series econometrics. We first examine several parametric and nonparametric volatility models in the literature. Popular parametric models include the generalized autoregressive conditional heteroscedastic (GARCH), exponential GARCH (EGARCH), and threshold GARCH (TGARCH) models. However, these models rely on explicit functional form assumptions, which can lead to model misspecification. Nonparametric models, on the other hand, are free from such functional form assumptions and offer model flexibility. In this talk, we show how to estimate financial volatility using multivariate adaptive regression splines (MARS) as a preliminary analysis to build a nonparametric volatility model. Despite its popularity, MARS has never been applied to model financial volatility. To implement the MARS methodology in a time series setting, we let the predictor variables be lagged values, which results in a model referred to as adaptive spline threshold autoregression (ASTAR). The estimation is illustrated through simulations and empirical examples using historical stock data and exchange rate data. We compare the performance of the MARS volatility model with existing models using several out-of-sample goodness-of-fit measures. Back To Top

 October 29, 2012 Speaker: Emilola Abayomi, Ph.D. Candidate, Dissertation Title: The Relationship between Body Mass and Blood Pressure in Diverse Populations When: October 29, 2012 12:00 pm Where: OSB 215 Abstract: High blood pressure is a major determinant of risk for Coronary Heart Disease (CHD) and stroke, leading causes of death in the industrialized world. A myriad of pharmacological treatments for elevated blood pressure, defined as a blood pressure greater than 140/90 mmHg, are available and have at least partially resulted in large reductions in the incidence of CHD and stroke in the U.S. over the last 50 years. The factors that may increase blood pressure levels are not well understood, but body mass is thought to be a major determinant of blood pressure level. Obesity is measured through various methods (skinfolds, waist-to-hip ratio, bioelectrical impedance analysis (BIA), etc.), but the most commonly used measure is body mass index, BMI = weight (kg) / height (m)^2. The relationship between the level of blood pressure and BMI has been perceived to be linear and strong. This thesis examined the relationship of blood pressure and BMI among diverse populations. The Diverse Populations Collaboration is a dataset comprised of almost 30 observational studies from around the world. We conducted a meta-analysis to explore heterogeneity that may be present in this relationship across diverse populations. Where heterogeneity was present, a meta-regression was conducted to determine whether characteristics such as race and gender explain the differences among studies. We also examined the functional form of the BMI-blood pressure relationship to determine whether a linear assumption is acceptable when modeling the relationship in all populations. Back To Top

 October 26, 2012 Speaker: Ciprian Crainiceanu, Department of Biostatistics, Johns Hopkins University Title: Longitudinal analysis of high resolution structural brain images When: October 26, 2012 10:00 am Where: 108 OSB Abstract: The talk will provide a gentle introduction to brain imaging and describe the problems associated with the longitudinal analysis of ultra-high dimensional 3D brain images. In particular, I will describe the work we have done to understand and characterize the microstructure of white matter brain tracts as well as lesion occurrence and development in a large cohort of subjects who suffer from multiple sclerosis. The statistical methods developed are in response to real scientific problems from our first-line collaborations with colleagues from the NIH and the Johns Hopkins School of Medicine. For more information about the speaker: www.biostat.jhsph.edu/~ccrainic. For more information about the research group: www.smart-stats.org. Back To Top

 October 12, 2012 Speaker: Michelle Arbeitman, College of Medicine, FSU Title: Genes to Behavior: Genomic analyses of sex-specific behaviors When: October 12, 2012 10:00 am Where: 108 OSB Abstract: My lab is interested in understanding the molecular-genetic basis of complex behaviors. We use the model system Drosophila melanogaster (fruit flies) to address our questions. Drosophila is an ideal model for studying behavior, as there are powerful tools for molecular-genetic studies, and male and female flies display complex reproductive behaviors that are genetically specified by one of the best-characterized genetic regulatory hierarchies. My talk will introduce next generation sequencing technologies and some of the computational and statistical challenges in analyzing these data sets. I will also present some of our experimental results on Drosophila sex-specific biology that were obtained utilizing next generation sequencing platforms. Back To Top

 October 5, 2012 Speaker: Adrian Barbu, Dept. of Statistics, FSU Title: Feature Selection by Scheduled Elimination When: October 5, 2012 10:00 am Where: 108 OSB Abstract: Many computer vision and medical imaging problems are faced with learning classifiers from large datasets, with millions of observations and features. In this work we propose a novel efficient algorithm for variable selection and learning on such datasets, optimizing a constrained penalized likelihood without any sparsity inducing priors. The iterative suboptimal algorithm alternates parameter updates with tightening the constraints by gradually removing variables based on a criterion and a schedule. We present a generic approach applicable to any differentiable loss function and illustrate it with an application to logistic regression. We use one-dimensional piecewise linear response functions for nonlinearity and introduce a second order prior on the response functions to avoid overfitting. Experiments on real and synthetic data show that the proposed method usually outperforms Logitboost and L1-penalized methods for both variable selection and prediction while being computationally faster. Back To Top
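The elimination-by-schedule idea can be sketched for plain logistic regression: alternate gradient updates with removing the weakest variable on a fixed schedule until a target count remains. This is an illustrative simplification under assumed details (plain gradient descent, smallest |weight| as the removal criterion), not the authors' exact algorithm:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scheduled_elimination(X, y, k_target, iters=300, lr=0.1, drop_every=50):
    """Gradient-descent logistic regression that removes the weakest
    feature (smallest |weight|) on a fixed schedule until k_target remain."""
    p = len(X[0])
    active = set(range(p))
    w = [0.0] * p
    for t in range(1, iters + 1):
        # one averaged gradient step over the active features
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(sum(w[j] * xi[j] for j in active)) - yi
            for j in active:
                grad[j] += err * xi[j]
        for j in active:
            w[j] -= lr * grad[j] / len(X)
        # tighten the constraint: drop the weakest variable on schedule
        if t % drop_every == 0 and len(active) > k_target:
            weakest = min(active, key=lambda j: abs(w[j]))
            active.discard(weakest)
            w[weakest] = 0.0
    return w, active

random.seed(0)
# toy data: only feature 0 is informative; feature 1 is pure noise
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if x[0] + random.gauss(0, 0.3) > 0 else 0 for x in X]
w, active = scheduled_elimination(X, y, k_target=1)
print(active)
```

On the toy data, the informative feature should survive the schedule while the noise feature is eliminated.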

 September 28, 2012 Speaker: Vladimir Koltchinskii, Dept. of Mathematics, Georgia Tech Title: Complexity Penalization in Low Rank Matrix Recovery When: September 28, 2012 10:00 am Where: 108 OSB Abstract: The problem of estimation of a large Hermitian matrix based on random linear measurements will be discussed. Such problems have been intensively studied in recent years in the cases when the target matrix has relatively small rank, or can be well approximated by small rank matrices. Important examples include matrix completion, where a random sample of entries of the target matrix is observed, and quantum state tomography, where the target matrix is a density matrix of a quantum system and has to be estimated based on the measurements of a finite number of randomly picked observables. We will consider several approaches to such problems based on a penalized least squares method (and its modifications) with complexity penalties defined in terms of the nuclear norm, von Neumann entropy and other functionals that “promote” small rank solutions, and discuss oracle inequalities for the resulting estimators with explicit dependence of the error terms on the rank and other parameters of the problem. We will also discuss a version of these methods when the target matrix is a “smooth” low rank kernel defined on a large graph and the goal is to design estimators that are adaptive simultaneously to the rank of the kernel and to its degree of smoothness. Back To Top

 September 21, 2012 Speaker: Xiaotong Shen, John Black Johnston Distinguished Professor, School of Statistics, University of Minnesota Title: On personalized information filtering When: September 21, 2012 10:00 am Where: 108 OSB Abstract: Personalized information filtering extracts the information specifically relevant to a user, based on the opinions of users who think alike or on the content of the items that a specific user prefers. In this presentation, we discuss latent models that utilize additional user-specific and content-specific predictors for personalized prediction. In particular, we factorize a user-over-item preference matrix into a product of two matrices, each having the same rank as the original matrix. On this basis, we seek a sparsest latent factorization from a class of overcomplete factorizations, possibly with a high percentage of missing values. A likelihood approach is discussed, with an emphasis on scalable computation. Examples will be given to contrast with popular methods for collaborative filtering and content-based filtering. This work is joint with Changqing Ye and Yunzhang Zhu. Back To Top
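The factorization step can be illustrated in its simplest form: a rank-1 alternating least squares fit of a user-by-item preference matrix with a missing entry. This is a generic sketch of matrix factorization for collaborative filtering, not the talk's latent models, which add user-specific and content-specific predictors and a sparsest-factorization criterion on top of this idea:

```python
def als_rank1(R, iters=50):
    """Alternating least squares for a rank-1 factorization R ~ u v^T
    of a matrix with missing entries marked as None."""
    n, m = len(R), len(R[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # fix v, solve each u_i by least squares over observed entries
        for i in range(n):
            num = sum(v[j] * R[i][j] for j in range(m) if R[i][j] is not None)
            den = sum(v[j] ** 2 for j in range(m) if R[i][j] is not None)
            u[i] = num / den
        # fix u, solve each v_j the same way
        for j in range(m):
            num = sum(u[i] * R[i][j] for i in range(n) if R[i][j] is not None)
            den = sum(u[i] ** 2 for i in range(n) if R[i][j] is not None)
            v[j] = num / den
    return u, v

# 3 users x 3 items; the observed ratings are exactly rank-1
R = [[2.0, 4.0, 6.0],
     [1.0, 2.0, None],   # missing rating to be predicted
     [3.0, 6.0, 9.0]]
u, v = als_rank1(R)
print(round(u[1] * v[2], 2))  # predicted missing rating
```

Because the observed entries here are consistent with a rank-1 matrix, the alternating updates converge to the exact completion, predicting the missing rating as 3.0.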

 September 14, 2012 Speaker: Xiuwen Liu, FSU Dept. of Computer Science Title: Quantitative Models for Nucleosome Occupancy Prediction When: September 14, 2012 10:00 am Where: 108 OSB Abstract: The nucleosome is the basic unit of DNA packaging in eukaryotic cells. As nucleosomes limit the accessibility of the wrapped DNA to transcription factors and other DNA-binding proteins, their positions play an essential role in the regulation of gene activity. Experiments have indicated that the DNA sequence itself strongly influences nucleosome positioning by enhancing or reducing its binding affinity to nucleosomes, thereby providing an intrinsic cell regulatory mechanism. In this talk I will present quantitative models for nucleosome occupancy prediction that I have developed with Prof. Jonathan Dennis and my students. In particular, I will focus on two models we have proposed recently. The first is a new dinucleotide matching model, in which we propose a new feature set for nucleosome occupancy prediction and learn the parameters via regression; evaluation on a genome-wide dataset shows that our model gives more accurate predictions than existing models. The second is a new algorithm that localizes nucleosomes at single base-pair resolution by posing the genome-wide localization problem as a classification task on chemical-mapping datasets. Short Bio: Xiuwen Liu received his PhD from the Ohio State University in 1999 in Computer and Information Science and joined the Department of Computer Science at Florida State University in 2000, where he is a full professor. His recent areas of research interest include computational models for biology, image analysis, machine learning, computer security, and manifold-based modeling for security in cyber-physical systems. Back To Top

 August 9, 2012 Speaker: Senthil Girimurugan Title: Detecting differences in Signals via reduced dimension Wavelets When: August 9, 2012 11:00 am Where: OSB 215 Abstract: All processes in engineering and other fields of science either have a signal as an output or contain an underlying signal that describes the process. A process can be understood in detail by analyzing the associated signal efficiently. In statistical quality control, such an analysis is carried out by monitoring profiles (signals) and detecting differences between an in-control (IC) and an out-of-control (OOC) signal. The dimensions of profiles have increased tremendously with recent advancements in technology, resulting in increased complexity of analysis. In this work, we explore several methods of detecting signal differences by reducing dimension using wavelets. The methodology involves the well-known Hotelling T2 statistic improved by wavelets. A statistical power analysis is conducted to determine the efficiency of this statistic in detecting local and global differences and to lay a foundation for a wavelet-based ANOVA setup involving the proposed statistic. As an application, the proposed methodology is also applied to detect differences in genetic data. Back To Top
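A minimal sketch of the reduced-dimension idea: compress each profile with a Haar wavelet transform, then apply a Hotelling-style T2 statistic to the approximation coefficients. The diagonal-covariance simplification and the toy profiles below are assumptions for illustration, not the statistic studied in the talk:

```python
import math
import random

def haar_level(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(x[2*i] + x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    detail = [(x[2*i] - x[2*i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return approx, detail

def t2_reduced(in_control, new_profile, levels=2):
    """T2-style statistic on Haar approximation coefficients, with a
    diagonal covariance estimated from in-control profiles (simplified)."""
    def reduce(x):
        for _ in range(levels):
            x, _ = haar_level(x)
        return x
    reduced = [reduce(p) for p in in_control]
    n, d = len(reduced), len(reduced[0])
    mean = [sum(r[j] for r in reduced) / n for j in range(d)]
    var = [sum((r[j] - mean[j]) ** 2 for r in reduced) / (n - 1)
           for j in range(d)]
    z = reduce(new_profile)
    return sum((z[j] - mean[j]) ** 2 / var[j] for j in range(d))

random.seed(1)
ic = [[random.gauss(0, 0.1) for _ in range(8)] for _ in range(30)]
ok = [random.gauss(0, 0.1) for _ in range(8)]            # in-control profile
shifted = [1.0 + random.gauss(0, 0.1) for _ in range(8)]  # mean-shifted (OOC)
print(t2_reduced(ic, shifted) > t2_reduced(ic, ok))
```

The mean-shifted profile yields a far larger statistic than the in-control one, which is the detection behavior the power analysis quantifies.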

 May 4, 2012 Speaker: Jingyong Su, FSU Dept. of Statistics Title: Estimation, Analysis and Modeling of Random Trajectories on Nonlinear Manifolds When: May 4, 2012 10:00 am Where: OSB 215 Abstract: A growing number of datasets now contain both a spatial and a temporal dimension. Trajectories are natural spatiotemporal data descriptors. Estimation, analysis and modeling of such trajectories are thus becoming increasingly important in many applications, ranging from computer vision to medical imaging. Many problems in these areas are naturally posed as problems on nonlinear manifolds, because intrinsic constraints on the pertinent features force the corresponding representations onto these manifolds. There are many difficulties in estimating and analyzing random trajectories on nonlinear manifolds. First, most standard techniques on Euclidean spaces cannot be directly extended to nonlinear manifolds. Furthermore, such trajectories are often noisy and arbitrarily parameterized. In this work, we begin by estimating full paths on common nonlinear manifolds using only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. Next, we address the problem of registration and comparison of such temporal trajectories. In future work, we will focus on modeling random trajectories on nonlinear manifolds. Back To Top

 April 27, 2012 Speaker: Ester Kim, FSU Dept of Statistics Title: An Ensemble Approach to Predict the Risk of Coronary and Cardiovascular Disease When: April 27, 2012 3:30 pm Where: OSB 215 Abstract: Coronary and cardiovascular diseases continue to be the leading cause of mortality in the United States and across the globe. They are also estimated to have the highest medical expenditures among chronic diseases in the United States. Early detection of a developing heart disease plays a critical role in preserving heart health, and accurate prediction provides highly valuable information for early treatment. For the past few decades, estimates of coronary or cardiovascular risk have been based on logistic regression or Cox proportional hazards models. In more recent years, machine learning models have grown in popularity within the medical field, but few have been applied to disease prediction, particularly for coronary or cardiovascular risks. We first evaluate the predictive performance of two machine learning models, the multilayer perceptron network and the k-nearest neighbor classifier, against the statistical models logistic regression and Cox proportional hazards. Our aim is to combine these predictive models into one ensemble model for superior classification performance. The ensemble approaches include bagging, a bootstrap aggregating model, and a multimodel ensemble, a combination of independently constructed models. The ensemble models are also evaluated for predictive performance relative to the single models. Various measures and methods are used to evaluate the models' performances based on the Framingham Heart Study data. Back To Top

 April 27, 2012 Speaker: Sebastian Kurtek, Ph.D. Candidate, Dissertation Title: Riemannian Shape Analysis of Curves and Surfaces When: April 27, 2012 10:00 am Where: Abstract: Shape analysis of curves and surfaces is a very important tool in many applications ranging from computer vision to bioinformatics and medical imaging. There are many difficulties when analyzing shapes of parameterized curves and surfaces. First, it is important to develop representations and metrics such that the analysis is invariant to parameterization in addition to the standard transformations (rigid motion and scaling). Furthermore, under the chosen representations and metrics, the analysis must be performed on infinite-dimensional and sometimes non-linear spaces, which poses an additional difficulty. In this work, we develop and apply methods that address these issues. We begin by defining a framework for shape analysis of parameterized open curves and extend these ideas to shape analysis of surfaces. We utilize the presented frameworks in various classification experiments spanning multiple application areas. In the case of curves, we consider the problem of clustering DT-MRI brain fibers, classification of protein backbones, modeling and segmentation of signatures, and statistical analysis of biosignals. In the case of surfaces, we perform disease classification using 3D anatomical structures in the brain, classification of handwritten digits by viewing images as quadrilateral surfaces, and finally classification of cropped facial surfaces. We provide two additional extensions of the general shape analysis frameworks that are the focus of this thesis. The first considers shape analysis of marked spherical surfaces, where in addition to the surface information we are given a set of manually or automatically generated landmarks. 
This requires additional constraints on the definition of the re-parameterization group and is applicable in many domains, especially medical imaging and graphics. Second, we consider reflection symmetry analysis of planar closed curves and spherical surfaces. Here, we also provide an example of disease detection based on brain asymmetry measures. We close with a brief summary and a discussion of open problems, which we plan on exploring in the future. Back To Top

 April 20, 2012 Speaker: Sunil Rao, University of Miami Title: Best Predictive Estimation for Linear Mixed Models with Applications to Small Area Estimation When: April 20, 2012 10:00 am Where: OSB 110 Abstract: We derive the best predictive estimator (BPE) of the fixed parameters for a linear mixed model. This leads to a new prediction procedure called observed best prediction (OBP), which is different from the empirical best linear unbiased prediction (EBLUP). We show that the BPE is more reasonable than the traditional estimators derived from estimation considerations, such as maximum likelihood (ML) and restricted maximum likelihood (REML), if the main interest is the prediction of the mixed effect. We show how the OBP can significantly outperform the EBLUP in terms of mean squared prediction error (MSPE) if the underlying model is misspecified. On the other hand, when the underlying model is correctly specified, the overall predictive performance of the OBP can be very similar to that of the EBLUP. The well-known Fay-Herriot small area model is used as an illustration of the methodology. In addition, simulations and analysis of a data set on graft failure rates from kidney transplant operations will be used to show empirical performance. This is joint work with Jiming Jiang of UC-Davis and Thuan Nguyen of Oregon Health and Science University. Back To Top

 April 13, 2012 Speaker: Gretchen Rivera, FSU Dept. of Statistics Title: Meta Analysis of Measures of Discrimination and Prognostic Modeling When: April 13, 2012 10:00 am Where: OSB 110 Abstract: In this paper we are interested in predicting death with the underlying cause of coronary heart disease (CHD). Two prognostic modeling methods are commonly used to predict CHD, the logistic model and the proportional hazards model; for this paper, the logistic model was used. The dataset used is the Diverse Populations Collaboration (DPC) dataset, which includes 28 studies with epidemiological results from investigations conducted in different populations around the world. For our analysis we include individuals who are 17 years old or older. Our predictors are age, diabetes, total serum cholesterol (mg/dl), systolic blood pressure (mmHg), and whether the participant is a current cigarette smoker. There is a natural grouping within the studies, such as gender, rural or urban area, and race; based on these strata we have 70 cohort groups. Our main interest is to evaluate how well the prognostic model discriminates. For this, we used the area under the Receiver Operating Characteristic (ROC) curve. The idea behind the ROC curve is that a set of subjects is known to belong to one of two classes (a signal or noise group), and an assignment procedure assigns each subject to a class on the basis of observed information. The procedure is not perfect: sometimes a subject is misclassified, and we want to evaluate the quality of its performance. For this we used the area under the ROC curve (AUC), which varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy). For each logistic model we found the AUC and its standard error (SE); given the association between the AUC and the Wilcoxon statistic, we used the Wilcoxon statistic to estimate the SE. 
We used meta-analysis to find the overall AUC and to evaluate whether there is heterogeneity in our estimates. To evaluate the extent of heterogeneity we used the Q statistic. Since heterogeneity was found in our study, we compared seven different methods for estimating the between-study variance. Back To Top
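The AUC-Wilcoxon link mentioned in the abstract can be made concrete: the AUC equals the Mann-Whitney/Wilcoxon probability that a randomly chosen case outranks a randomly chosen control, a common Wilcoxon-based standard error is the Hanley-McNeil (1982) formula, and Cochran's Q screens for heterogeneity across studies. The toy scores below are hypothetical, and whether the study used exactly this SE variant is an assumption:

```python
import math

def auc_mann_whitney(pos, neg):
    """AUC as the Mann-Whitney probability that a random case scores
    higher than a random control (ties count one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley & McNeil (1982) standard error of the AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

def cochran_q(estimates, ses):
    """Cochran's Q for heterogeneity across study-level estimates."""
    w = [1 / s ** 2 for s in ses]
    pooled = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    return sum(wi * (e - pooled) ** 2 for wi, e in zip(w, estimates))

pos = [0.9, 0.8, 0.7]   # hypothetical predicted risks, subjects with event
neg = [0.4, 0.75, 0.3]  # hypothetical predicted risks, subjects without
a = auc_mann_whitney(pos, neg)
q = cochran_q([0.82, 0.74, 0.90], [0.05, 0.06, 0.04])  # toy study AUCs
print(round(a, 3), round(hanley_mcneil_se(a, len(pos), len(neg)), 3))
```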

 April 6, 2012 Speaker: Xu Han, University of Florida Title: False Discovery Control Under Arbitrary Dependence When: April 6, 2012 10:00 am Where: OSB 110 Abstract: Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of hypotheses are tested simultaneously to find if any genes are associated with some traits; in finance, thousands of tests are performed to see which fund managers have winning ability. In practice, these tests are correlated. False discovery control under arbitrary covariance dependence is a very challenging and important open problem in modern research. We propose a new methodology based on principal factor approximation, which successfully extracts the common dependence and significantly weakens the correlation structure, to deal with an arbitrary dependence structure. We derive the theoretical distribution of the false discovery proportion (FDP) in large-scale multiple testing when a common threshold is used for rejection, and provide a consistent estimate of the FDP. Specifically, we decompose the test statistics into an approximate multifactor model with weakly dependent errors, derive the factor loadings, and estimate the unobserved but realized factors, which account for the dependence, by L1-regression. Asymptotic theory is derived to justify the consistency of our proposed method. This result has important applications in controlling FDR and FDP. The finite sample performance of our procedure is critically evaluated by various simulation studies. Our estimate of FDP compares favorably with Efron (2007)'s approach, as demonstrated in the simulated examples. Our approach is further illustrated by real data in genome-wide association studies. This is joint work with Professor Jianqing Fan and Mr. Weijie Gu at Princeton University. Back To Top
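For orientation, the quantity being estimated is easy to state: with a common rejection threshold t, the false discovery proportion is FDP = V/R, the fraction of rejections that come from true nulls. The sketch below simply evaluates this definition on independent toy p-values; the talk's contribution, estimating FDP when the tests are arbitrarily dependent via principal factor approximation, is not attempted here:

```python
import random

def false_discovery_proportion(pvals, is_null, t):
    """FDP = V/R: the fraction of rejections at common threshold t
    that come from true null hypotheses."""
    rejected = [i for i, p in enumerate(pvals) if p <= t]
    if not rejected:
        return 0.0
    v = sum(1 for i in rejected if is_null[i])
    return v / len(rejected)

random.seed(2)
# 900 true nulls (uniform p-values) plus 100 signals (tiny p-values)
pvals = ([random.random() for _ in range(900)]
         + [random.random() * 0.001 for _ in range(100)])
is_null = [True] * 900 + [False] * 100
fdp = false_discovery_proportion(pvals, is_null, t=0.001)
print(round(fdp, 3))
```

With independent tests the realized FDP is close to its expectation; the hard case motivating the talk is that under strong dependence the realized FDP fluctuates widely around that expectation.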

 March 30, 2012 Speaker: Jordan Cuevas, Ph.D. Candidate, Dissertation Title: Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage When: March 30, 2012 2:00 pm Where: OSB 108 Abstract: Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data are analyzed in order to establish an in-control process, and Phase II, in which new data are monitored for deviations from the in-control form. Traditionally, SPC has been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles often cannot be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation begins by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase I setting. The translation invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase II problem of monitoring sequences of profiles for deviations from in-control. Two profile monitoring schemes are proposed; the first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile. 
The second offers a semiparametric test to monitor for changes in both the functional form and noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination. Different forms of each of these test statistics are proposed and results are compared via Monte Carlo simulation. Back To Top
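The translation-invariant (TI) baseline that the proposed TTI estimator modifies can be sketched with cycle spinning: hard-threshold the wavelet coefficients of every circular shift of the signal, reconstruct, and average. A minimal pure-Python sketch, where the Haar wavelet and the threshold value are illustrative choices and TTI's trimming step is not implemented:

```python
import math
import random

def haar_fwd(x):
    """Full orthonormal Haar decomposition of a length-2^k signal."""
    coeffs, a = [], list(x)
    while len(a) > 1:
        d = [(a[2*i] - a[2*i + 1]) / math.sqrt(2) for i in range(len(a) // 2)]
        a = [(a[2*i] + a[2*i + 1]) / math.sqrt(2) for i in range(len(a) // 2)]
        coeffs.append(d)          # finest detail level first
    return a[0], coeffs

def haar_inv(s, coeffs):
    """Exact inverse of haar_fwd."""
    a = [s]
    for d in reversed(coeffs):    # coarsest detail level first
        nxt = []
        for ai, di in zip(a, d):
            nxt.append((ai + di) / math.sqrt(2))
            nxt.append((ai - di) / math.sqrt(2))
        a = nxt
    return a

def ti_denoise(y, thresh):
    """Cycle spinning: average hard-thresholded Haar reconstructions
    over all circular shifts of the signal."""
    n = len(y)
    out = [0.0] * n
    for shift in range(n):
        xs = y[shift:] + y[:shift]
        s, coeffs = haar_fwd(xs)
        coeffs = [[d if abs(d) > thresh else 0.0 for d in level]
                  for level in coeffs]
        rec = haar_inv(s, coeffs)
        for i in range(n):
            out[(i + shift) % n] += rec[i] / n
    return out

random.seed(3)
truth = [0.0] * 8 + [1.0] * 8                      # a step (jump) function
noisy = [t + random.gauss(0, 0.1) for t in truth]
den = ti_denoise(noisy, thresh=0.3)
mse_noisy = sum((a - t) ** 2 for a, t in zip(noisy, truth)) / len(truth)
mse_den = sum((a - t) ** 2 for a, t in zip(den, truth)) / len(truth)
```

As the dissertation notes, TI tends to oversmooth jumps: the shift-averaged reconstruction blurs the step slightly, which is the behavior TTI's trimming is designed to reduce.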

 March 30, 2012 Speaker: Jinfeng Zhang, FSU Dept. of Statistics Title: Statistical approaches for protein structure comparison and their applications in protein function prediction When: March 30, 2012 10:00 am Where: OSB 110 Abstract: Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that of the many different distances used to measure the similarity between protein structures, none is a proper distance when protein structures of different sequences are compared. Statistical approaches that treat those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, the geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariances can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. Our method performs comparably to commonly used methods in protein structure classification, but at much improved speed. Some recent results on the comparison of protein surfaces will also be presented. Back To Top

 March 29, 2012 Speaker: Paul Hill Title: Bootstrap Prediction Bands for Non-Parametric Function Signals in a Complex System When: March 29, 2012 2:00 pm Where: BEL 243 Abstract: Methods for constructing prediction bands for continuous curves require a different approach from those used for a single data point. In many cases the underlying function is unknown, so a distribution-free approach that preserves sufficient coverage for the signal in its entirety is necessary. Four methods for the formation of (1 − α)100% prediction and containment bands are presented, and their performances are compared through the coverage probabilities obtained. These techniques are applied successfully to constructing prediction bands for spring discharge, giving good coverage in each case. Spring discharge measured over time can be considered a continuous signal, and the ability to predict future discharge signals is useful for monitoring flow and other issues related to the spring, such as contaminant influence. The gamma distribution has commonly been used to simulate rainfall; we instead propose a bootstrapping method, which allows new samples to be created adequately over different periods of time as well as for specific rain events such as hurricanes or droughts. Both non-windowed and windowed approaches to bootstrapping the recharge are considered, as well as the resulting effects on the prediction band coverage for the spring discharge. This non-parametric approach to the input rainfall suits the non-parametric nature of the output signal. In addition, the question arises as to whether the discharge depends on the pathway navigated by the flow. 
These pathways are referred to as "trees" and are of great interest because identifying significant differences between trees leads to a classification for them, which could aid in building a model that fits any given input recharge data. A T2 test assumes multivariate normality; since we cannot make that assumption here, a non-parametric approach with less rigorous assumptions is desired. A classification test via k-means clustering is used to distinguish between the pathways taken by the flow of the discharge in the spring. Back To Top
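A distribution-free band is easy to illustrate: build a pointwise empirical quantile band from historical curves and check coverage on a new one. The sketch below is a simplified stand-in; the paper's four band-construction methods and its rainfall bootstrap are more elaborate, and the discharge curves here are synthetic:

```python
import random

def pointwise_band(curves, alpha=0.1):
    """Distribution-free pointwise band: empirical alpha/2 and
    1 - alpha/2 quantiles of the historical curves at each time point."""
    n, m = len(curves), len(curves[0])
    lo, hi = [], []
    for j in range(m):
        vals = sorted(c[j] for c in curves)
        lo.append(vals[int(n * alpha / 2)])
        hi.append(vals[min(n - 1, int(n * (1 - alpha / 2)))])
    return lo, hi

def coverage(curve, lo, hi):
    """Fraction of time points at which the curve stays inside the band."""
    inside = sum(l <= c <= h for c, l, h in zip(curve, lo, hi))
    return inside / len(curve)

random.seed(4)
# hypothetical discharge signals: a smooth trend plus noise
curves = [[10 + 0.1 * t + random.gauss(0, 1) for t in range(50)]
          for _ in range(200)]
lo, hi = pointwise_band(curves, alpha=0.1)
new = [10 + 0.1 * t + random.gauss(0, 1) for t in range(50)]
cov = coverage(new, lo, hi)
print(round(cov, 2))
```

A pointwise band controls coverage at each time separately; simultaneous containment of the whole signal, the harder goal the paper addresses, requires wider bands.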

 March 28, 2012 Speaker: Rachel Becvarik, FSU Dept. of Statistics Title: An Alternative Upper Control Limit to the Average Run Length to Balance Power and False Alarms When: March 28, 2012 9:00 am Where: OSB 215 Abstract: It has been shown that likelihood ratio tests successfully monitor for changes in profiles involving high dimensional nonlinear data. These methods use a traditional flat-line upper control limit (UCL) based on the average run length (ARL). Current methods take into consideration neither the error nor the power associated with the test, nor the underlying distribution of the ARL. Additionally, if the statistic is known to be increasing over time, a flat UCL does not adapt to the increase. This paper focuses on a method to find the most powerful UCL for an increasing statistic at a specified type I error. Back To Top
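The flat-UCL/ARL baseline the paper starts from can be simulated directly: for an in-control statistic checked each period, a flat UCL gives a geometric run length, so ARL = 1/p where p is the per-period exceedance probability. A quick sketch with an assumed N(0,1) in-control statistic:

```python
import random

def simulate_arl(ucl, n_runs=2000, max_len=5000, seed=5):
    """Average run length until an in-control N(0,1) statistic
    first exceeds a flat upper control limit."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        t = 0
        while t < max_len:
            t += 1
            if rng.gauss(0, 1) > ucl:
                break
        total += t
    return total / n_runs

# UCL at the 95th percentile of N(0,1) gives p = 0.05, so ARL should be
# about 1 / 0.05 = 20 in-control periods between false alarms
arl = simulate_arl(1.6449)
print(round(arl, 1))
```

The paper's point is that for a statistic that drifts upward over time this flat-limit calibration is inadequate, motivating a UCL chosen for power at a fixed type I error instead.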

 March 27, 2012 Speaker: Jihyung Shin, FSU Dept. of Statistics Title: Mixed-effects and mixed-distribution models for count data with applications to educational research data. When: March 27, 2012 3:30 pm Where: OSB 215 Abstract: This research is motivated by an analysis of reading research data. We are interested in modeling a test outcome, the ability to fluently recode letters into sounds, for kindergarten children aged between 5 and 7. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models for excess zeros, which are based on a mixture of two distributions: a degenerate distribution at zero and a standard probability distribution on non-negative values, typically a lognormal for semicontinuous data or a Poisson for count data. The previously proposed models, the mixed-effects and mixed-distribution models of Tooze et al. (2002) for semicontinuous data and the zero-inflated Poisson regression models of Lambert (1992) for count data, are reviewed. We then extend zero-inflated Poisson models to repeated measures of zero-inflated data by introducing a pair of possibly correlated random effects into the zero-inflated Poisson model to accommodate within-subject correlation and between-subject heterogeneity. The likelihood, approximated by adaptive Gaussian quadrature, is maximized using dual quasi-Newton optimization in a standard statistical software package. Simulation and application results are also presented. Back To Top
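The zero-inflated Poisson model at the core of the dissertation has a simple fixed-effects likelihood, shown below with a crude grid-search fit on hypothetical scores with excess zeros. The dissertation adds covariates, correlated random effects, and adaptive quadrature on top of this basic form:

```python
import math

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a zero-inflated Poisson: with probability pi the
    count is a structural zero, otherwise it is Poisson(lam)."""
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += (math.log(1 - pi) - lam + y * math.log(lam)
                   - math.lgamma(y + 1))
    return ll

# hypothetical test scores with excess zeros (40% zeros)
counts = [0] * 40 + [1] * 20 + [2] * 20 + [3] * 12 + [4] * 8
grid = ((p / 20, l / 10) for p in range(1, 19) for l in range(10, 50))
pi_hat, lam_hat = max(grid, key=lambda g: zip_loglik(counts, *g))
print(pi_hat, lam_hat)
```

Note that the fitted zero-inflation probability is well below the raw 40% zero rate, because a Poisson component with a moderate rate already produces some zeros on its own.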

 March 26, 2012 Speaker: Jianchang Lin Title: Semiparametric Bayesian survival analysis using models with log-linear median When: March 26, 2012 1:00 pm Where: 215 OSB Abstract: First, we present two novel semiparametric survival models with log-linear median regression functions for right censored survival data. These models are useful alternatives to the popular Cox (1972) model and linear transformation models (Cheng et al., 1995). Compared to existing semiparametric models, our models have many important practical advantages, including interpretation of the regression parameters via the median and the ability to address heteroscedasticity. We demonstrate that our modeling techniques facilitate the ease of prior elicitation and computation for both parametric and semiparametric Bayesian analysis of survival data. We illustrate the advantages of our modeling, as well as model diagnostics, via reanalysis of a small-cell lung cancer study. Results of our simulation study provide further guidance regarding appropriate modeling in practice. Our second goal is to develop methods of analysis and associated theoretical properties for interval censored and current status survival data. These new regression models use a log-linear regression function for the median. We present frequentist and Bayesian procedures for estimation of the regression parameters. Our model is a useful and practical alternative to the popular semiparametric models which focus on modeling the hazard function. We illustrate the advantages and properties of our proposed methods via reanalysis of a breast cancer study. Our other aim is to develop a model which is able to account for the heteroscedasticity of the response, together with robust parameter estimation and outlier detection using sparsity penalization. Preliminary simulation studies have been conducted to compare the performance of the proposed model with that of an existing median lasso regression model. 
Considering the estimation bias, mean squared error and other identification benchmark measures, our proposed model performs better than the competing frequentist estimator. Back To Top

 March 23, 2012 Speaker: Bob Clickner, FSU Dept. of Statistics Title: Statistical Investigation of the Relationship between Fish Consumption and Mercury in Blood When: March 23, 2012 10:00 am Where: OSB 110 Abstract: Fish and shellfish are an important and healthy source of many nutrients, including protein, vitamins, omega-3 fatty acids and others. However, humans are also exposed to methylmercury (MeHg) through the consumption of finfish and shellfish. Mercury released into the environment is converted to MeHg in soils and sediments and bioaccumulates through aquatic food webs. This bioaccumulation leads to increased levels of MeHg in large, predatory fish. MeHg exposure in utero is associated with adverse health effects, e.g., neuropsychological deficits such as IQ and motor function deficits, in children. Over a period of several years, we studied exposure to MeHg via fish and shellfish consumption through a series of statistical analyses of data on fish tissue mercury concentrations and 1999-2008 NHANES blood mercury concentrations and fish consumption data in women of reproductive age (16-49 years). The objective was to investigate the strength and level of the association and patterns in fish consumption and mercury exposure, including demographic, socio-economic, geographic, and temporal trends. Blood MeHg was calculated from the blood total and inorganic concentrations after imputing below-detection-limit concentrations. NHANES dietary datasets were combined to estimate 30-day finfish/shellfish consumption. Fish tissue mercury concentrations were combined with the NHANES data to estimate 30-day mercury intake per gram of body weight. Linear and logistic regression analyses were used to evaluate associations and trends, adjusting for demographic characteristics. Back To Top

 March 16, 2012 Speaker: Wei Wu, FSU Dept. of Statistics Title: Consistency Theory for Signal Estimation under Random Time-Warping When: March 16, 2012 10:00 am Where: OSB 110 Abstract: Function registration/alignment is one of the central problems in Functional Data Analysis and has been extensively investigated over the past two decades. Using a generative model, this problem can also be studied as a problem of estimating a signal observed under random time-warpings. An important requirement here is that the estimator be consistent, i.e., it converges to the underlying deterministic function as the observation size goes to infinity. This has not been accomplished in general terms by previous methods. We have recently introduced a novel framework for estimating the unknown signal under random warpings, and have shown its superiority over state-of-the-art performance in function registration/alignment. Here we demonstrate that the proposed algorithm leads to a consistent estimator of the underlying signal. This estimation is also illustrated with convincing examples. Furthermore, we extend our method to estimation for multi-dimensional signals, providing rigorous proofs and illustrative examples. This is joint work with Anuj Srivastava. Back To Top

 March 2, 2012 Speaker: Piyush Kumar, FSU Dept. of Computer Science Title: Instant approximate 1-center on roads When: March 2, 2012 10:00 am Where: OSB 110 Abstract: Computing the mean, center or median is one of the fundamental tasks in many applications. In this talk, I will present an algorithm to compute 1-center solutions on road networks, an important problem in GIS. Using Euclidean embeddings and a reduction to fast nearest neighbor search, we devise an approximation algorithm for this problem. Our initial experiments on real-world data sets indicate fast computation of constant-factor approximate solutions for query sets much larger than previously computable using exact techniques. Our techniques extend to k-clustering problems as well. I will end with some interesting open problems we are working on. This is joint work with my students: Samidh Chatterjee, James McClain and Bradley Neff. Back To Top
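The talk's road-network algorithm relies on Euclidean embeddings and fast nearest-neighbor search; as background only, here is a minimal brute-force sketch of the Euclidean 1-center objective and a standard constant-factor heuristic (restricting the center to the input points, which by the triangle inequality is a 2-approximation). The point set is simulated; nothing here is from the speaker's implementation.

```python
import numpy as np

def approx_1_center(points):
    """2-approximate Euclidean 1-center: return the input point whose
    farthest-neighbor distance (covering radius) is smallest, found by
    brute force for illustration."""
    pts = np.asarray(points, dtype=float)
    # all pairwise distances; radii[i] is the covering radius if pts[i]
    # were chosen as the center
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    radii = d.max(axis=1)
    best = int(radii.argmin())
    return pts[best], float(radii[best])

rng = np.random.default_rng(1)
points = rng.normal(size=(200, 2))
center, radius = approx_1_center(points)
```

A real implementation would avoid the quadratic distance matrix, which is exactly where the nearest-neighbor machinery from the talk comes in.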

 March 1, 2012 Speaker: Jun Li, Dept. of Statistics, Stanford University Title: "Differential Expression Identification and False Discovery Rate Estimation in RNA-Seq Data" When: March 1, 2012 11:00 am Where: OSB 215 Abstract: RNA-Sequencing (RNA-Seq) is taking the place of microarrays and becoming the primary tool for measuring genome-wide transcript expression. We discuss the identification of features (genes, isoforms, exons, etc.) that are associated with an outcome in RNA-Seq and other sequencing-based comparative genomic experiments. That is, we aim to find features that are differentially expressed in samples in different biological conditions or under different disease statuses. RNA-Seq data take the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or “sequencing depths”. Existing methods for this problem are based on Poisson or negative-binomial models: they are useful but can be heavily influenced by “outliers” in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class, or multiple-class outcomes. We compare our proposed method to Poisson and negative-binomial based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods. Back To Top
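As a hedged sketch of the resampling idea (not the authors' exact procedure), binomial thinning is one standard nonparametric device for bringing samples with unequal sequencing depths onto a common footing before a rank or permutation comparison. All counts, depths, and group labels below are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy counts: 100 genes x 6 samples, two conditions, unequal relative depths
depth = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 2.5])
mean_expr = rng.gamma(2.0, 50.0, size=100)
counts = rng.poisson(mean_expr[:, None] * depth[None, :])
group = np.array([0, 0, 0, 1, 1, 1])

def downsample(counts, depth, rng):
    """Binomial thinning to the smallest depth: each read is kept
    independently with probability depth_min / depth, making samples
    comparable without a parametric model."""
    p = depth.min() / depth
    return rng.binomial(counts, p[None, :])

thinned = downsample(counts, depth, rng)
# after thinning, a nonparametric statistic can compare the two groups
diff = thinned[:, group == 1].mean(axis=1) - thinned[:, group == 0].mean(axis=1)
```

The thinned counts retain the count-valued, non-normal character of the data, which is why rank-based follow-up statistics are natural here.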

 February 29, 2012 Speaker: Cun-Hui Zhang, Rutgers University Dept. of Statistics Title: Statistical Inference with High-Dimensional Data When: February 29, 2012 3:30 pm Where: OSB 108 Abstract: We propose a semi low-dimensional (LD) approach for statistical analysis of certain types of high-dimensional (HD) data. The proposed approach is best described with the following model statement: model = LD component + HD component. The main objective of this semi-LD approach is to develop statistical inference procedures for the LD component, including p-values and confidence regions. This semi-LD approach is very much inspired by the semiparametric approach in which a statistical model is decomposed as follows: model = parametric component + nonparametric component. Just as in the semiparametric approach, the worst LD submodel gives the minimum Fisher information for the LD component, along with an efficient score function. The efficient score function, or an estimate of it, can be used to derive an efficient estimator for the LD component. The efficient estimator is asymptotically normal with the inverse of the minimum Fisher information as its asymptotic covariance matrix. This asymptotic covariance matrix may be consistently estimated in a natural way. Consequently, approximate confidence intervals and p-values can be constructed. Back To Top

 February 29, 2012 Speaker: Daniel Osborne, Ph.D. candidate, FSU Dept. of Statistics Title: Nonparametric Data Analysis on Manifolds with Applications in Medical Imaging When: February 29, 2012 10:30 am Where: Montgomery Gym (Mon) Rm 102 Abstract: Over the past twenty years, there has been rapid development in Nonparametric Statistical Analysis on Manifolds applied to Medical Imaging problems. In this body of work, we focus on two different medical imaging problems. The first concerns CT scan data. In this context, we perform nonparametric analysis of the 3D data retrieved from CT scans of healthy young adults, on the Size-and-Reflection Shape Space SRΣ^k_{3,0} of k-ads in general position in 3D. This work is part of a larger project on planning reconstructive surgery in severe skull injuries, which includes preprocessing and post-processing steps for CT images. The second concerns MR diffusion tensor imaging data. Here, we develop a two-sample procedure for testing the equality of the generalized Frobenius means of two independent populations on the space of symmetric positive-definite matrices. These new methods naturally lead to an analysis based on Cholesky decompositions of covariance matrices, which decreases computational time and does not increase dimensionality. The resulting nonparametric matrix-valued statistics are used to test whether there is a difference on average between corresponding signals in Diffusion Tensor Images (DTI) of young children with dyslexia and those of their clinically normal peers. The data analyzed here were previously studied in the literature with parametric methods, which also showed a significant difference. Back To Top

 February 28, 2012 Speaker: Eric Lock, Dept. of Statistics, University of North Carolina at Chapel Hill Title: Joint and Individual Variation Explained (JIVE) for Integrated Analysis of Multiple Datatypes. When: February 28, 2012 3:30 pm Where: OSB 110 Abstract: Research in a number of fields now requires the analysis of datasets in which multiple high-dimensional types of data are available for a common set of objects. We introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such datasets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across datatypes, low-rank approximations for structured variation individual to each datatype, and residual noise. JIVE quantifies the amount of joint variation between datatypes, reduces the dimensionality of the data in an insightful way, and provides new directions for the visual exploration of joint and individual structure. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. We describe a JIVE analysis of gene expression and microRNA data for cancerous tumor samples, and discuss additional applications. This is joint work with Andrew Nobel, J.S. Marron and Katherine Hoadley. Back To Top
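A minimal sketch of the three-term decomposition on toy data, assuming rank-one joint and individual structure. The published JIVE algorithm estimates the ranks and iterates between the joint and individual fits; this one-pass version only illustrates the form model = joint + individual + noise.

```python
import numpy as np

def rank_r_approx(X, r):
    """Best rank-r approximation via the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

rng = np.random.default_rng(0)
n = 50
# two datatypes measured on the same n objects (columns), sharing one
# joint score vector, with different feature counts
joint_scores = rng.normal(size=(1, n))
X1 = rng.normal(size=(30, 1)) @ joint_scores + 0.1 * rng.normal(size=(30, n))
X2 = rng.normal(size=(20, 1)) @ joint_scores + 0.1 * rng.normal(size=(20, n))

# one pass of a JIVE-style fit: joint structure from the stacked data,
# individual structure from each datatype's residual
stacked = np.vstack([X1, X2])
J = rank_r_approx(stacked, 1)        # joint variation, shared column space
J1, J2 = J[:30], J[30:]
A1 = rank_r_approx(X1 - J1, 1)       # individual structure per datatype
A2 = rank_r_approx(X2 - J2, 1)
R1, R2 = X1 - J1 - A1, X2 - J2 - A2  # residual noise
```

By construction each datatype decomposes exactly as X = J + A + R, which is what makes the joint/individual variance accounting in the talk possible.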

 February 27, 2012 Speaker: Kelly McGinnity, FSU Dept. of Statistics Title: Nonparametric Cross-Validated Wavelet Thresholding for Non-Gaussian Errors When: February 27, 2012 11:00 am Where: OSB 215 Abstract: Wavelet thresholding generally assumes independent, identically distributed Gaussian errors when estimating functions in a nonparametric regression setting. VisuShrink and SureShrink are just two of the many common thresholding methods based on this assumption. When the errors are not normally distributed, however, few methods have been proposed. In this paper, a distribution-free method for thresholding wavelet coefficients in nonparametric regression is described. Unlike some other non-normal error thresholding methods, the proposed method does not assume the form of the nonnormal distribution is known. A simulation study shows the efficiency of the proposed method on a variety of non-Gaussian errors, including comparisons to existing wavelet threshold estimators. Back To Top
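For context, here is the Gaussian baseline the talk generalizes: a one-level Haar transform with soft thresholding at a VisuShrink-style universal threshold, in plain NumPy. The talk's point is precisely that this threshold choice leans on Gaussian errors; the demo below uses Gaussian noise and a piecewise-constant test signal only to show the baseline working, and is not the speaker's cross-validated method.

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar transform; len(x) must be even."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft(w, t):
    """Soft thresholding: shrink coefficients toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

rng = np.random.default_rng(0)
n = 512
t_grid = np.linspace(0, 1, n, endpoint=False)
signal = np.where(t_grid < 0.5, 1.0, -1.0)      # piecewise-constant truth
noisy = signal + 0.3 * rng.standard_normal(n)   # i.i.d. Gaussian noise

a, d = haar_dwt(noisy)
# universal threshold sigma * sqrt(2 log n), with sigma estimated by the
# median absolute deviation of the detail coefficients
sigma = np.median(np.abs(d)) / 0.6745
lam = sigma * np.sqrt(2 * np.log(n))
denoised = haar_idwt(a, soft(d, lam))
```

Both the MAD constant 0.6745 and the sqrt(2 log n) threshold are calibrated to the normal distribution, which is exactly the assumption a distribution-free cross-validated threshold avoids.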

 February 16, 2012 Speaker: Alec Kercheval, FSU Dept. of Mathematics Title: A generalized birth-death stochastic model for high-frequency order book dynamics in the electronic stock market When: February 16, 2012 2:00 pm Where: DSL 499 Abstract: The limit order book is an electronic clearing house for limit and market orders operated by the major stock exchanges. Computer driven traders interact with the exchange using this order book on the millisecond time scale. Traders and regulators are interested in understanding the dynamics of this object as it can affect the economy as a whole, now that more than 50% of all trading volume on the NYSE is from automated trades. In this talk we look at the structure of the limit order book and discuss ways to model the evolution of prices in order to compute probabilities of interest to traders. Back To Top
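As a simplified illustration (not the generalized model from the talk), a single queue in the order book can be sketched as a continuous-time birth-death chain: limit orders arrive at a constant rate, and each resting order is cancelled or executed at a constant per-order rate. All rates below are made-up values for the demo.

```python
import numpy as np

def simulate_bd(lam, mu, q0, t_end, rng):
    """Continuous-time birth-death chain for one order-book queue:
    births (new limit orders) at rate lam, deaths (cancellations or
    executions) at rate mu per resting order.  Returns the (time, size)
    path; stops if the queue is depleted, which in order-book models is
    when the price would move."""
    t, q, path = 0.0, q0, [(0.0, q0)]
    while t < t_end:
        rate = lam + mu * q
        t += rng.exponential(1.0 / rate)      # time to next event
        if rng.random() < lam / rate:
            q += 1                            # arrival
        else:
            q -= 1                            # cancellation/execution
        path.append((t, q))
        if q == 0:
            break
    return path

rng = np.random.default_rng(0)
path = simulate_bd(lam=5.0, mu=1.0, q0=10, t_end=100.0, rng=rng)
```

Quantities of interest to traders, such as the probability the bid queue empties before the ask queue, can then be estimated by Monte Carlo over many such paths.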

 February 10, 2012 Speaker: Jennifer Geis, Ph.D. candidate, FSU Dept. of Statistics Title: Adaptive Canonical Correlation Analysis through a Weighted Rank Selection Criterion: Inferential Methods for Multivariate Response Models with Applications to a HIV/Neurocognitive Study When: February 10, 2012 3:30 pm Where: OSB 108 Abstract: Multivariate response models are used increasingly in almost all fields, employing inferential methods such as Canonical Correlation Analysis (CCA). This requires estimating the number of canonical relationships, or, equivalently, determining the rank of the coefficient estimator, which may be done using the Rank Selection Criterion (RSC) of Bunea et al. under an i.i.d. assumption on the error terms. While necessary for their strong theoretical results, some flexibility is required in practical application. Developed here is large-sample theory that parallels their work, providing support for the addition of a "decorrelator" weight matrix. One such possibility in the large-sample setting is the sample residual covariance. However, a computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible by this weighted version of RSC, giving an Adaptive CCA (ACCA). However, particular considerations are required in the high-dimensional setting, as similar theory no longer holds. Offered instead are extensive simulations revealing that the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence good estimation of the number of canonical relationships and variates. It will be argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high-dimensional setting.
Another approach that avoids these issues is to apply some type of variable selection methodology before applying ACCA for inferential conclusions. Indeed, any group selection method may be applied prior to ACCA, since variable selection in the multivariate response model is the same as group selection in the univariate response model, which eliminates these other concerns entirely. To offer a practical application of these ideas, ACCA will be applied to a neuroimaging dataset. A high-dimensional dataset will be generated from this large sample set, to which the Group LASSO will be applied before ACCA. A unique perspective may then be offered on the relationships between cognitive deficiencies in HIV-positive patients and extensive, available neuroimaging measures. Back To Top

 February 10, 2012 Speaker: Debdeep Pati Title: Nonparametric Bayes learning of low dimensional structure in big objects When: February 10, 2012 10:00 am Where: OSB 110 Abstract: The first part of the talk will focus on Bayesian nonparametric models for learning low-dimensional structure underlying higher dimensional objects with special emphasis on models for 2D and 3D shapes where the data typically consists of points embedded in 2D pixelated images or a cloud of points in $\mathbb{R}^3$. Models for distributions of shapes can be widely used in biomedical applications ranging from tumor tracking for targeted radiation therapy to classifying cells in a blood sample. We propose tensor product-based Bayesian probability models for 2D closed curves and 3D closed surfaces. We initially consider models for a single surface using a cyclic basis and array shrinkage priors. The model avoids parameter constraints, leads to highly efficient posterior computation, and has strong theoretical properties including near minimax optimal rates. Focusing on the 2D case, we also develop a multiscale deformation model for joint alignment and analysis of related shapes motivated by data on images containing many related objects. Efficient and scalable algorithms are developed for posterior computation, and the models are applied to 3D surface estimation data from the literature and 2D imaging data on cell shapes. In developing general purpose models for potentially high-dimensional objects and surfaces, it is important to consider theoretical properties. In the final part of the talk, we give an overview of our recent theoretical results on large support, consistency and minimax optimal rates in Bayesian models for regression surfaces and density regression. Back To Top

 February 3, 2012 Speaker: Zhihua Sophia Su Title: Envelope Models and Methods When: February 3, 2012 10:00 am Where: OSB 110 Abstract: This talk presents a new statistical concept called an envelope. An envelope has the potential to achieve substantial efficiency gains in multivariate analysis by identifying and cleaning up immaterial information in the data. The efficiency gains will be demonstrated both by theory and example. Some recent developments in this area, including partial envelopes and inner envelopes, will also be discussed. They refine and extend the enveloping idea, adapting it to more data types and increasing the potential to achieve efficiency gains. Applications of envelopes and their connection to other fields will also be mentioned. Back To Top

 January 27, 2012 Speaker: Harry Crane Title: Partition-valued Processes and Applications to Phylogenetic Inference When: January 27, 2012 10:00 am Where: OSB 110 Abstract: In this talk, we present the cut-and-paste process, a novel infinitely exchangeable process on the state space of partitions of the natural numbers whose sample paths differ from previously studied exchangeable coalescent (Kingman 1982; Pitman 1999) and fragmentation (Bertoin 2001) processes. We discuss some mathematical properties of this process as well as a two-parameter subfamily which has a matrix as one of its parameters. This matrix can be interpreted as a similarity matrix for pairwise relationships and has a natural application to inference of the phylogenetic tree of a group of species for which we have mitochondrial DNA data. We compare the results of this inference to those of some other methods and discuss some computational issues which arise, as well as some natural extensions of this model to Bayesian inference, hidden Markov models and tree-valued Markov processes. We also discuss how this process and its extensions fit into the more general framework of statistical modeling of structure and dependence via combinatorial stochastic processes, e.g. random partitions, trees and networks, and the practical importance of infinite exchangeability in this context. Back To Top
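The cut-and-paste process itself is beyond a short sketch, but the Chinese restaurant process below illustrates the broader object the talk builds on: an infinitely exchangeable random partition of the natural numbers, simulated here for the first 100 integers. The concentration parameter value is arbitrary.

```python
import numpy as np

def crp_partition(n, alpha, rng):
    """Chinese restaurant process: item i joins an existing block with
    probability proportional to the block's size, or starts a new block
    with probability proportional to alpha.  The resulting random
    partition of {0, ..., n-1} is exchangeable."""
    blocks = []
    for i in range(n):
        sizes = np.array([len(b) for b in blocks], dtype=float)
        probs = np.append(sizes, alpha) / (i + alpha)
        k = rng.choice(len(blocks) + 1, p=probs)
        if k == len(blocks):
            blocks.append([i])      # open a new block
        else:
            blocks[k].append(i)     # join an existing block
    return blocks

rng = np.random.default_rng(0)
blocks = crp_partition(100, alpha=2.0, rng=rng)
```

Exchangeability means the distribution of the partition depends only on the block sizes, not on which items land where; this is the property that makes such models usable for phylogenetic inference, where species labels are arbitrary.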

 January 20, 2012 Speaker: Anindya Bhadra Title: Simulation-based maximum likelihood inference for partially observed Markov process models When: January 20, 2012 10:00 am Where: OSB 110 Abstract: Estimation of static (or time constant) parameters in a general class of nonlinear, non-Gaussian, partially observed Markov process models is an active area of research. In recent years, simulation-based techniques have made estimation and inference feasible for these models and have offered great flexibility to the modeler. An advantageous feature of many of these techniques is that there is no requirement to evaluate the state transition density of the model, which is often high-dimensional and unavailable in closed-form. Instead, inference can proceed as long as one is able to simulate from the state transition density - often a much simpler problem. In this talk, we introduce a simulation-based maximum likelihood inference technique known as iterated filtering that uses an underlying sequential Monte Carlo (SMC) filter. We discuss some key theoretical properties of iterated filtering. In particular, we prove the convergence of the method and establish connections between iterated filtering and well-known stochastic approximation methods. We then use the iterated filtering technique to estimate parameters in a nonlinear, non-Gaussian mechanistic model of malaria transmission and answer scientific questions regarding the effect of climate factors on malaria epidemics in Northwest India. Motivated by the challenges encountered in modeling the malaria data, we conclude by proposing an improvement technique for SMC filters used in an off-line, iterative setting. Back To Top
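Iterated filtering wraps an SMC filter inside a stochastic-approximation loop; the bootstrap particle filter at its core can be sketched in a few lines. The AR(1)-plus-noise model below is a toy assumption for illustration (far simpler than a malaria transmission model); the only property used is the ability to simulate from the state transition density.

```python
import numpy as np

def particle_loglik(y, theta, n_part, rng):
    """Bootstrap particle filter log-likelihood for a toy AR(1)-plus-noise
    state-space model:
        x_t = phi * x_{t-1} + sigma * eps_t,    y_t ~ N(x_t, tau^2).
    Note the transition density is never evaluated, only simulated from --
    the "plug-and-play" property iterated filtering exploits."""
    phi, sigma, tau = theta
    x = rng.normal(0.0, sigma, size=n_part)               # initial cloud
    ll = 0.0
    for yt in y:
        x = phi * x + sigma * rng.standard_normal(n_part)  # propagate
        w = np.exp(-0.5 * ((yt - x) / tau) ** 2) / (tau * np.sqrt(2 * np.pi))
        ll += np.log(w.mean())                             # likelihood increment
        x = rng.choice(x, size=n_part, p=w / w.sum())      # resample
    return ll

# simulate data from the model at a "true" parameter
rng = np.random.default_rng(0)
true_theta = (0.8, 1.0, 0.5)
xt, ys = 0.0, []
for _ in range(100):
    xt = true_theta[0] * xt + true_theta[1] * rng.standard_normal()
    ys.append(xt + true_theta[2] * rng.standard_normal())
ys = np.array(ys)

# the filter assigns higher likelihood to the truth than to a
# mis-specified autoregression coefficient
ll_true = particle_loglik(ys, true_theta, 500, np.random.default_rng(1))
ll_bad = particle_loglik(ys, (0.1, 1.0, 0.5), 500, np.random.default_rng(1))
```

Iterated filtering would now perturb the parameter with a shrinking random walk across repeated filter passes, climbing this simulated likelihood surface toward the MLE.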

 January 13, 2012 Speaker: Xinge Jessie Jeng Title: Optimal Sparse Signal Identification with Applications in Copy Number Variation Analysis When: January 13, 2012 10:00 am Where: 110 OSB Abstract: DNA copy number variation (CNV) plays an important role in population diversity and complex diseases. Motivated by CNV analysis based on high-density single nucleotide polymorphism (SNP) data, we consider two problems arising from the need to identify sparse and short CNV segments in long sequences of genome-wide data. The first problem is to identify CNVs using a single sample. An efficient likelihood ratio selection (LRS) procedure is developed, and its asymptotic optimality is established for identifying short and sparse CNVs. The second problem aims to identify recurrent CNVs based on a large number of samples from a population. We propose a proportion adaptive segment selection (PASS) procedure that automatically and optimally adjusts to the unknown proportions of CNV carriers. In these problems, we introduce an innovative statistical framework for developing optimal procedures for CNV analysis. We study fundamental properties of signal identification by characterizing the detectable and undetectable regions. Only in the detectable region is it possible to consistently separate the CNV signals from noise. Such demarcations provide deep insights into methods development and serve as benchmarks for evaluating methods. We prove that LRS and PASS are consistent in the interiors of their respective detectable regions, implying the asymptotic optimality of the proposed methods. The proposed methods are demonstrated with simulations and analyses of a family trio dataset and a neuroblastoma dataset.
The results show that the LRS procedure can yield greater power for detecting short CNVs than several popular CNV identification procedures, and that PASS significantly improves the power of CNV detection by pooling information from multiple samples, efficiently identifying both rare and common CNVs carried by neuroblastoma patients. Back To Top
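As a hedged single-sample sketch in the spirit of a likelihood-ratio scan (not the LRS procedure itself), the statistic for a mean-shift segment in standardized N(0,1) data reduces to the squared interval sum divided by the interval length; the segment location and shift size below are simulated.

```python
import numpy as np

def lr_scan(z, max_len):
    """Scan statistic for one elevated segment in N(0,1) noise: for each
    interval [i, j) of length at most max_len, the likelihood ratio is
    (sum of z over the interval)^2 / length.  Returns the best interval
    and its statistic, using cumulative sums for O(1) interval sums."""
    n = len(z)
    cs = np.concatenate([[0.0], np.cumsum(z)])
    best, best_stat = (0, 1), -np.inf
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            stat = (cs[j] - cs[i]) ** 2 / (j - i)
            if stat > best_stat:
                best, best_stat = (i, j), stat
    return best, best_stat

rng = np.random.default_rng(0)
z = rng.standard_normal(2000)
z[700:720] += 2.0          # a short, sparse "CNV-like" mean shift
(seg_start, seg_end), stat = lr_scan(z, max_len=50)
```

The detectability boundary studied in the talk says roughly when such a maximum over noise intervals (which grows only logarithmically) can be separated from the signal interval's statistic.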

 January 10, 2012 Speaker: Ingram Olkin Title: INEQUALITIES: THEORY OF MAJORIZATION AND ITS APPLICATIONS When: January 10, 2012 3:30 pm Where: 110 OSB Abstract: There are many theories of "equations": linear equations, differential equations, functional equations, and more. However, there is no central theory of "inequations". There are several general themes that lead to many inequalities. One such theme is convexity. Another theme is majorization, which is a particular partial order. What is important in this context is that the partial order have lots of examples, and that the order-preserving functions be a rich class. In this case majorization arises in many fields: in mathematics, in geometry, numerical analysis, and graph theory; elsewhere, in physics, chemistry, political science, and economics. In this talk we describe the origins of majorization and many examples of majorization and its consequences. Back To Top
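A minimal sketch of the partial order at the heart of the talk: a vector a majorizes b when both have the same total and every partial sum of the decreasing rearrangement of a dominates that of b. Order-preserving (Schur-convex) functions, such as the sum of squares, then respect the order. The example vectors are the classic fixed-total triple.

```python
import numpy as np

def majorizes(a, b, tol=1e-12):
    """True if a majorizes b: equal sums, and every partial sum of the
    decreasing rearrangement of a dominates that of b."""
    a, b = np.sort(a)[::-1], np.sort(b)[::-1]
    if abs(a.sum() - b.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(a) >= np.cumsum(b) - tol))

# classic chain with total 4: (4,0,0) majorizes (2,1,1) majorizes (4/3,4/3,4/3)
x = np.array([4.0, 0.0, 0.0])
y = np.array([2.0, 1.0, 1.0])
u = np.array([4/3, 4/3, 4/3])

# a Schur-convex function preserves the order: more "spread out" means larger
f = lambda v: np.sum(v ** 2)
```

Many classical inequalities (e.g., that the variance of outcomes is maximized by the most unequal allocation of a fixed total) are instances of this single order-preservation fact.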
 Email: webmaster@stat.fsu.edu Office: 214 OSB, 117 N. Woodward Ave., P.O. Box 3064330, Tallahassee, FL 32306-4330 Phone: (850) 644-3218 Fax: (850) 644-5271 Admissions Inquiries: info@stat.fsu.edu