Department of Statistics
 Colloquium Series


Colloquia
April 20, 2012, 10:00 am Sunil Rao, Case Western Reserve University
April 13, 2012, 10:00 am Gretchen Rivera, FSU Dept. of Statistics
April 6, 2012, 10:00 am Xu Han, University of Florida
March 30, 2012, 10:00 am Jinfeng Zhang, FSU Dept. of Statistics
March 28, 2012, 9:00 am Rachel Becvarik
March 23, 2012, 10:00 am Bob Clickner, FSU Dept. of Statistics
March 16, 2012, 10:00 am Wei Wu, FSU Dept. of Statistics
March 2, 2012, 10:00 am Piyush Kumar, FSU Dept. of Computer Science
February 29, 2012, 3:30 pm Cun-Hui Zhang, Rutgers University Dept. of Statistics
February 27, 2012, 11:00 am Kelly McGinnity, FSU Dept. of Statistics
February 16, 2012, 2:00 pm Alec Kercheval, FSU Dept. of Mathematics
February 10, 2012, 3:30 pm Jennifer Geis, Ph.D. candidate, FSU Dept. of Statistics
February 10, 2012, 10:00 am Debdeep Pati
February 3, 2012, 10:00 am Zhihua Sophia Su
January 27, 2012, 10:00 am Harry Crane
January 20, 2012, 10:00 am Anindra Bhadra
January 13, 2012, 10:00 am Xinge Jessie Jeng
January 10, 2012, 3:30 pm Ingram Olkin
December 2, 2011, 10:05 am Dr. Ji Zhu
November 18, 2011, 10:05 am Dr. Kshitij Khare
November 8, 2011, 11:00 am Dr. Bertrand Clark, Professor of Statistics, Dept. of Medicine and Dept. of Epidemiology and Public Health, Center for Computational Science, Miller School of Medicine, Univ. of Miami
November 4, 2011, 10:05 am Dr. Giray Okten
October 28, 2011, 10:05 am Dr. Howard Bondell
October 21, 2011, 10:05 am Dr. Jinfeng Zhang
October 14, 2011, 10:05 am Dr. Yiyuan She
October 7, 2011, 10:05 am Dr. Jonathan H. Dennis
September 30, 2011, 10:05 am Dr. Hui Zou
September 23, 2011, 10:10 am Dr. Robert Clickner
September 16, 2011, 10:10 am Dr. Adrian Barbu
September 2, 2011, 10:10 am Dr. Victor Patrangenaru
August 25, 2011, 2:00 pm Rommel Bain
August 23, 2011, 9:30 am Jihyung Shin
August 15, 2011, 9:30 am Wei Liu
August 3, 2011, 9:30 am Sentibaleng Ncube
July 29, 2011, 10:00 am Emilola Abayomi
July 28, 2011, 10:00 am Felicia Williams
June 9, 2011, 10:00 am Lindsey Bell
June 7, 2011, 10:00 am Greg Miller
June 2, 2011, 10:00 am Robert Holden
May 31, 2011, 1:00 pm Jennifer Geis
May 26, 2011, 10:00 am Leif Ellingson
May 19, 2011, 1:00 pm Jianchang Lin
May 19, 2011, 10:00 am Daniel Osborne
May 16, 2011, 9:30 am Tamika Royal-Thomas
May 2, 2011, 11:00 am Yinfeng Tao
April 29, 2011, 10:00 am Paul Hill
March 25, 2011, 2:00 pm Yu Gu
March 24, 2011, 3:35 pm Vernon Lawhern
March 16, 2011, 9:30 am Anqi Tang
February 25, 2011, 10:10 am Wei Wu
February 18, 2011, 10:10 am Adrian Barbu
February 11, 2011, 10:10 am Feng Zhao
January 28, 2011, 10:10 am Anuj Srivastava
January 21, 2011, 10:10 am Jim Berger of Duke University



April 20, 2012
Speaker:Sunil Rao, Case Western Reserve University
Title:
When:April 20, 2012 10:00 am
Where:OSB 110
Abstract:

April 13, 2012
Speaker:Gretchen Rivera, FSU Dept. of Statistics
Title:
When:April 13, 2012 10:00 am
Where:OSB 110
Abstract:

April 6, 2012
Speaker:Xu Han, University of Florida
Title:
When:April 6, 2012 10:00 am
Where:OSB 110
Abstract:

March 30, 2012
Speaker:Jinfeng Zhang, FSU Dept. of Statistics
Title:Statistical approaches for protein structure comparison and their applications in protein function prediction
When:March 30, 2012 10:00 am
Where:OSB 110
Abstract:

March 28, 2012
Speaker:Rachel Becvarik
Title:
When:March 28, 2012 9:00 am
Where:
Abstract:

March 23, 2012
Speaker:Bob Clickner, FSU Dept. of Statistics
Title:
When:March 23, 2012 10:00 am
Where:OSB 110
Abstract:

March 16, 2012
Speaker:Wei Wu, FSU Dept. of Statistics
Title:
When:March 16, 2012 10:00 am
Where:OSB 110
Abstract:

March 2, 2012
Speaker:Piyush Kumar, FSU Dept. of Computer Science
Title:
When:March 2, 2012 10:00 am
Where:OSB 110
Abstract:

February 29, 2012
Speaker:Cun-Hui Zhang, Rutgers University Dept. of Statistics
Title:
When:February 29, 2012 3:30 pm
Where:OSB 108
Abstract:

February 27, 2012
Speaker:Kelly McGinnity, FSU Dept. of Statistics
Title:Nonparametric Cross-Validated Wavelet Thresholding for Non-Gaussian Errors
When:February 27, 2012 11:00 am
Where:OSB 215
Abstract:
Wavelet thresholding generally assumes independent, identically distributed Gaussian errors when estimating functions in a nonparametric regression setting. VisuShrink and SureShrink are just two of the many common thresholding methods based on this assumption. When the errors are not normally distributed, however, few methods have been proposed. In this paper, a distribution-free method for thresholding wavelet coefficients in nonparametric regression is described. Unlike some other thresholding methods for non-normal errors, the proposed method does not assume that the form of the non-normal distribution is known. A simulation study shows the efficiency of the proposed method on a variety of non-Gaussian errors, including comparisons to existing wavelet threshold estimators.
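
For readers unfamiliar with the Gaussian baseline mentioned in this abstract, the following is a minimal sketch of soft thresholding with the universal threshold sigma * sqrt(2 log n) that underlies VisuShrink. It is not the distribution-free method proposed in the talk; the function names and the example coefficients are hypothetical and chosen only for illustration.

```python
import math

def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 * log(n)) for n coefficients.

    Assumes i.i.d. Gaussian noise with known standard deviation sigma.
    """
    return sigma * math.sqrt(2.0 * math.log(n))

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; set small ones to zero."""
    out = []
    for c in coeffs:
        if abs(c) <= t:
            out.append(0.0)
        else:
            out.append(math.copysign(abs(c) - t, c))
    return out

# Hypothetical detail coefficients, thresholded at t = 1.
shrunk = soft_threshold([3.0, -0.5, -2.0], 1.0)
```

Coefficients smaller in magnitude than the threshold are treated as noise and removed, while larger ones are kept but shrunk. When the errors are far from Gaussian, this threshold can be poorly calibrated, which is the situation the abstract addresses.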

February 16, 2012
Speaker:Alec Kercheval, FSU Dept. of Mathematics
Title:
When:February 16, 2012 2:00 pm
Where:DSL 499
Abstract:

February 10, 2012
Speaker:Jennifer Geis, Ph.D. candidate, FSU Dept. of Statistics
Title:Adaptive Canonical Correlation Analysis through a Weighted Rank Selection Criterion: Inferential Methods for Multivariate Response Models with Applications to an HIV/Neurocognitive Study
When:February 10, 2012 3:30 pm
Where:OSB 108
Abstract:
Multivariate response models are increasingly used in almost all fields, employing inferential methods such as Canonical Correlation Analysis (CCA). This requires estimating the number of canonical relationships or, equivalently, determining the rank of the coefficient estimator, which may be done using the Rank Selection Criterion (RSC) of Bunea et al. under an i.i.d. assumption on the error terms. While that assumption is necessary to show their strong theoretical results, some flexibility is required in practical application. What is developed here is theory for the large sample setting that parallels their work, providing support for the addition of a "decorrelator" weight matrix. One such possibility in the large sample setting is the sample residual covariance. However, a computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible by this weighted version of RSC, giving an Adaptive CCA (ACCA). However, particular considerations are required for the high dimensional setting, as similar theory no longer holds. What will be offered instead are extensive simulations revealing that using the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence also good estimation of the number of canonical relationships and variates. It will be argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high dimensional setting. Another approach to avoid these issues is to employ some type of variable selection methodology first, before applying ACCA for inferential conclusions. Indeed, any group selection method may be applied prior to ACCA, as variable selection in the multivariate response model is the same as group selection in the univariate response model and thus completely eliminates these other concerns.
To offer a practical application of these ideas, ACCA will be applied to a neuroimaging dataset. A high dimensional dataset will be generated from this large sample set, to which Group LASSO will first be applied before ACCA. A unique perspective may then be offered into the relationships between cognitive deficiencies in HIV-positive patients and extensive, available neuroimaging measures.

February 10, 2012
Speaker:Debdeep Pati
Title:Nonparametric Bayes learning of low dimensional structure in big objects
When:February 10, 2012 10:00 am
Where:OSB 110
Abstract:
The first part of the talk will focus on Bayesian nonparametric models for learning low-dimensional structure underlying higher dimensional objects with special emphasis on models for 2D and 3D shapes where the data typically consists of points embedded in 2D pixelated images or a cloud of points in $\mathbb{R}^3$. Models for distributions of shapes can be widely used in biomedical applications ranging from tumor tracking for targeted radiation therapy to classifying cells in a blood sample. We propose tensor product-based Bayesian probability models for 2D closed curves and 3D closed surfaces. We initially consider models for a single surface using a cyclic basis and array shrinkage priors. The model avoids parameter constraints, leads to highly efficient posterior computation, and has strong theoretical properties including near minimax optimal rates. Focusing on the 2D case, we also develop a multiscale deformation model for joint alignment and analysis of related shapes motivated by data on images containing many related objects. Efficient and scalable algorithms are developed for posterior computation, and the models are applied to 3D surface estimation data from the literature and 2D imaging data on cell shapes. In developing general purpose models for potentially high-dimensional objects and surfaces, it is important to consider theoretical properties. In the final part of the talk, we give an overview of our recent theoretical results on large support, consistency and minimax optimal rates in Bayesian models for regression surfaces and density regression.

February 3, 2012
Speaker:Zhihua Sophia Su
Title:Envelope Models and Methods
When:February 3, 2012 10:00 am
Where:OSB 110
Abstract:
This talk presents a new statistical concept called an envelope. An envelope has the potential to achieve substantial efficiency gains in multivariate analysis by identifying and cleaning up immaterial information in the data. The efficiency gains will be demonstrated both by theory and example. Some recent developments in this area, including partial envelopes and inner envelopes, will also be discussed. They refine and extend the enveloping idea, adapting it to more data types and increasing the potential to achieve efficiency gains. Applications of envelopes and their connection to other fields will also be mentioned.

January 27, 2012
Speaker:Harry Crane
Title:Partition-valued Processes and Applications to Phylogenetic Inference
When:January 27, 2012 10:00 am
Where:OSB 110
Abstract:
In this talk, we present the cut-and-paste process, a novel infinitely exchangeable process on the state space of partitions of the natural numbers whose sample paths differ from previously studied exchangeable coalescent (Kingman 1982; Pitman 1999) and fragmentation (Bertoin 2001) processes. We discuss some mathematical properties of this process as well as a two-parameter subfamily which has a matrix as one of its parameters. This matrix can be interpreted as a similarity matrix for pairwise relationships and has a natural application to inference of the phylogenetic tree of a group of species for which we have mitochondrial DNA data. We compare the results of this inference to those of some other methods and discuss some computational issues which arise as well as some natural extensions of this model to Bayesian inference, hidden Markov models and tree-valued Markov processes. We also discuss how this process and its extensions fit into the more general framework of statistical modeling of structure and dependence via combinatorial stochastic processes, e.g., random partitions, trees and networks, and the practical importance of infinite exchangeability in this context.

January 20, 2012
Speaker:Anindra Bhadra
Title:Simulation-based maximum likelihood inference for partially observed Markov process models
When:January 20, 2012 10:00 am
Where:OSB 110
Abstract:
Estimation of static (or time constant) parameters in a general class of nonlinear, non-Gaussian, partially observed Markov process models is an active area of research. In recent years, simulation-based techniques have made estimation and inference feasible for these models and have offered great flexibility to the modeler. An advantageous feature of many of these techniques is that there is no requirement to evaluate the state transition density of the model, which is often high-dimensional and unavailable in closed-form. Instead, inference can proceed as long as one is able to simulate from the state transition density - often a much simpler problem. In this talk, we introduce a simulation-based maximum likelihood inference technique known as iterated filtering that uses an underlying sequential Monte Carlo (SMC) filter. We discuss some key theoretical properties of iterated filtering. In particular, we prove the convergence of the method and establish connections between iterated filtering and well-known stochastic approximation methods. We then use the iterated filtering technique to estimate parameters in a nonlinear, non-Gaussian mechanistic model of malaria transmission and answer scientific questions regarding the effect of climate factors on malaria epidemics in Northwest India. Motivated by the challenges encountered in modeling the malaria data, we conclude by proposing an improvement technique for SMC filters used in an off-line, iterative setting.

January 13, 2012
Speaker:Xinge Jessie Jeng
Title:Optimal Sparse Signal Identification with Applications in Copy Number Variation Analysis
When:January 13, 2012 10:00 am
Where:OSB 110
Abstract:
DNA copy number variation (CNV) plays an important role in population diversity and complex diseases. Motivated by CNV analysis based on high-density single nucleotide polymorphism (SNP) data, we consider two problems arising from the need to identify sparse and short CNV segments in long sequences of genome-wide data. The first problem is to identify the CNVs utilizing a single sample. An efficient likelihood ratio selection (LRS) procedure is developed, and its asymptotic optimality is presented for identifying short and sparse CNVs. The second problem aims to identify recurrent CNVs based on a large number of samples from a population. We propose a proportion adaptive segment selection (PASS) procedure that automatically and optimally adjusts to the unknown proportions of CNV carriers. In these problems, we introduce an innovative statistical framework for developing optimal procedures for CNV analysis. We study fundamental properties for signal identification by characterizing the detectable and the undetectable regions. Only in the detectable region is it possible to consistently separate the CNV signals from noise. Such demarcations can provide deep insights for methods development and serve as benchmarks for evaluating methods. We prove that LRS and PASS are consistent in the interiors of their respective detectable regions, thus implying asymptotic optimality of the proposed methods. The proposed methods are demonstrated with simulations and analysis of a family trio dataset and a neuroblastoma dataset. The results show that the LRS procedure can yield greater gain in power for detecting short CNVs than some popular CNV identification procedures, and that PASS significantly improves the power for CNV detection by pooling information from multiple samples and efficiently identifying both rare and common CNVs carried by neuroblastoma patients.

January 10, 2012
Speaker:Ingram Olkin
Title:Inequalities: Theory of Majorization and Its Applications
When:January 10, 2012 3:30 pm
Where:OSB 110
Abstract:
There are many theories of "equations": linear equations, differential equations, functional equations, and more. However, there is no central theory of "inequations". There are several general themes that lead to many inequalities. One such theme is convexity. Another theme is majorization, which is a particular partial order. What is important in this context is that the partial order has lots of examples, and that the order-preserving functions form a rich class. In this case majorization arises in many fields: in mathematics (geometry, numerical analysis, graph theory) and in other fields (physics, chemistry, political science, economics). In this talk we describe the origins of majorization and many examples of majorization and its consequences.
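
As a small illustration of the partial order mentioned in this abstract, the sketch below checks the standard definition of majorization: x majorizes y when both vectors have the same total and every partial sum of the decreasingly sorted x is at least the corresponding partial sum of y. The function name and the example vectors are hypothetical and are not taken from the talk.

```python
from itertools import accumulate

def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y under the standard definition."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not x:
        return True
    # Partial sums of the entries sorted in decreasing order.
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    # Totals must agree; otherwise the vectors are not comparable.
    if abs(px[-1] - py[-1]) > tol:
        return False
    return all(a >= b - tol for a, b in zip(px, py))

def sum_of_squares(v):
    """A convex symmetric function, which preserves the majorization order."""
    return sum(t * t for t in v)

concentrated = [3, 0, 0]
even = [1, 1, 1]
```

Here `majorizes(concentrated, even)` holds and the sum of squares is correspondingly larger for the concentrated vector. The order is only partial: `[3, 3, 0]` and `[4, 1, 1]` have the same total, yet neither majorizes the other.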

December 2, 2011
Speaker:Dr. Ji Zhu
Title:Joint Estimation of Multiple Graphical Models
When:December 2, 2011 10:05 am
Where:OSB 110
Abstract:
Gaussian graphical models explore dependence relationships between random variables, through estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method which jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its superior performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is also included. This is joint work with Jian Guo, Elizaveta Levina, and George Michailidis.

November 18, 2011
Speaker:Dr. Kshitij Khare
Title:Cholesky-based estimation in graphical models
When:November 18, 2011 10:05 am
Where:OSB 110
Abstract:
We consider the problem of sparse covariance estimation in high dimensional settings using graphical models. These models can be represented in terms of a graph, where the nodes represent random variables and edges represent their interactions. When the random variables are jointly Gaussian distributed, the lack of edges in such graphs can be interpreted as conditional and/or marginal independencies between these variables. We present a computationally efficient approach for high dimensional sparse covariance estimation in graphical models based on the Cholesky decomposition of the covariance matrix or its inverse. The proposed method is illustrated on both simulated and real data.

November 8, 2011
Speaker:Dr. Bertrand Clark, Professor of Statistics, Dept. of Medicine and Dept. of Epidemiology and Public Health, Center for Computational Science, Miller School of Medicine, Univ. of Miami
Title:Clustering Stability: Impossibility and Possibility
When:November 8, 2011 11:00 am
Where:499 DSL (Dirac Science Library)
Abstract:
In the first part of this talk we present a theorem that gives conditions under which high dimensional clustering is unstable. Specifically, for any fixed sample size, clustering becomes impossible (in a squared error sense) as the dimension increases unless the separation among the clusters is large enough in the sense that coordinatewise differences do not decrease too quickly with $D$, the dimension of the data points. We also show that clustering impossibility occurs with a theoretical rate of ${\cal{O}}(\sqrt{D})$. In the second part of this talk we present a Bayesian method for assessing clustering stability. Roughly, the idea is to evaluate the probability that the distances between points and cluster centers can be re-ordered by random factors. The method seems to be consistent for choosing the number of clusters and we argue that it accurately reflects what we mean by the stability of a clustering. This work is ongoing research and hence comments and discussion are particularly welcome.

November 4, 2011
Speaker:Dr. Giray Okten
Title:Putting randomness back in quasi-Monte Carlo
When:November 4, 2011 10:05 am
Where:OSB 110
Abstract:
The quasi-Monte Carlo method is often described as the deterministic version of the Monte Carlo method. It was developed in the last few decades, and its main advantages over Monte Carlo are faster convergence, at least asymptotically, and deterministic error bounds. In quasi-Monte Carlo one uses the so-called low-discrepancy sequences to sample from a function, somewhat similar to the way pseudorandom numbers are used in Monte Carlo. Some mathematicians have even suggested avoiding the use of pseudorandom numbers altogether in favor of low-discrepancy sequences, since the former do not have a "rigorous" definition. Despite having certain advantages, the quasi-Monte Carlo method also has some drawbacks. In this talk I will give a survey of hybrid methods: these are methods that bring randomness back into quasi-Monte Carlo, in order to bring together the best features of Monte Carlo and quasi-Monte Carlo.
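
One well-known way of putting randomness back into quasi-Monte Carlo is a random shift modulo 1 applied to a low-discrepancy sequence. The sketch below uses the base-2 van der Corput sequence and is only an illustration of the general idea; it is not necessarily one of the hybrid methods surveyed in the talk, and the integrand, sample size and function names are hypothetical.

```python
import random

def van_der_corput(i, base=2):
    """i-th point of the van der Corput low-discrepancy sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def shifted_qmc_estimate(f, n, rng):
    """Estimate the integral of f over [0, 1) from n randomly shifted points.

    A single uniform shift is added to every point modulo 1, so each
    estimate is random while the point set keeps its even spacing.
    """
    shift = rng.random()
    total = sum(f((van_der_corput(i) + shift) % 1.0) for i in range(n))
    return total / n

rng = random.Random(0)
# Independent replicates of the shifted estimate of the integral of x^2 (exact value 1/3).
replicates = [shifted_qmc_estimate(lambda x: x * x, 1024, rng) for _ in range(10)]
```

Because each replicate uses an independent shift, the spread of `replicates` gives a practical error estimate, which plain quasi-Monte Carlo with a fixed point set does not provide.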

October 28, 2011
Speaker:Dr. Howard Bondell
Title:Efficient Robust Estimation via Two-Stage Generalized Empirical Likelihood
When:October 28, 2011 10:05 am
Where:OSB 110
Abstract:
The triumvirate of outlier resistance, distributional robustness, and efficiency in both small and large samples constitutes the Holy Grail of robust statistics. We show that a two-stage procedure based on an initial robust estimate of scale followed by an application of generalized empirical likelihood comes very close to attaining that goal. The resulting estimators are able to attain full asymptotic efficiency at the Normal distribution, while simulations point to the ability to maintain this efficiency down to small sample sizes. Additionally, the estimators are shown to have the maximum attainable finite-sample replacement breakdown point, and thus remain stable in the presence of heavy-tailed distributions and outliers. Although previous proposals with full asymptotic efficiency exist in the literature, their finite sample efficiency can often be low. The method is discussed in detail for linear regression, but can be naturally extended to other areas, such as multivariate estimation of location and covariance.

October 21, 2011
Speaker:Dr. Jinfeng Zhang
Title:Integrated Bio-entity Network: A System for Biological Knowledge Discovery
When:October 21, 2011 10:05 am
Where:OSB 110
Abstract:
A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Uncertainties associated with each edge in IBN are quantified by the probabilities inferred using statistical machine learning methods. Under this framework, probabilistic-based graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.

October 14, 2011
Speaker:Dr. Yiyuan She
Title:Predictive Learning Through Joint Variable Selection and Rank Reduction for High-dimensional Data
When:October 14, 2011 10:05 am
Where:OSB 110
Abstract:
The talk discusses joint variable and rank selection for supervised dimension reduction in predictive learning. When the number of responses and/or that of the predictors exceeds the sample size, one has to consider shrinkage methods for estimation and prediction. We propose to apply sparsity and reduced rank techniques jointly to attain simultaneous feature selection and feature extraction. A class of estimators is introduced, based on novel penalties that impose both row and rank restrictions on the coefficient matrix. We prove that these estimators adapt to the unknown matrix sparsity and have faster rates of convergence than LASSO and reduced rank regression. A computation algorithm is developed and applied to real world applications in machine learning, cognitive neuroscience and macroeconometric forecasting.

October 7, 2011
Speaker:Dr. Jonathan H. Dennis
Title:The Regulatory Organization of the Human Genome
When:October 7, 2011 10:05 am
Where:OSB 110
Abstract:
A hallmark of cancer is altered chromosome structure. Consequently, the development and progression of cancer are classified by taking into account chromosomal changes that cells undergo as they become more aggressive cancers. Although there have been numerous studies on chromosomal aberrations in cancer, molecular assessment of chromosomal structure information has been understudied, and its role in malignant transformation remains poorly characterized. The human genome is organized into chromatin. The most fundamental subunit of chromatin is the nucleosome: ~150 base pairs of DNA wrapped around a “spool” of histones. We have identified chromatin-based patterns across different lung adenocarcinoma cancer grades. To address the role of chromatin structure in the progression of cancer, we compared the chromatin structure of primary lung adenocarcinomas of grades one, two and three to that of their normal adjacent tissue, from several individuals, at multiple scales. We developed systematic, robust nucleosome distribution and chromatin accessibility microarray mapping platforms to analyze chromatin structure genome-wide across cancer grades between normal and tumor samples. We measured chromatin structure at three levels of resolution: nucleosome distribution, chromatin accessibility and three-dimensional molecular cytology. We show that grade one lung adenocarcinomas have greatly altered nucleosome distributions compared to the adjacent normal tissue, but nearly identical chromatin accessibility. Conversely, the grade three samples show extensive rearrangements in chromosomal accessibility, but only modest changes in nucleosome distribution when tumor and normal samples are compared. These data have allowed us to develop a model in which early grade lung adenocarcinomas are linked to changes in nucleosome distributions, while later grade cancers are linked to large-scale chromosomal changes.
These results indicate that we should be able to use these chromatin structural changes to identify grade sub-type specific cancer biomarkers.

September 30, 2011
Speaker:Dr. Hui Zou
Title:Some Results on Large Bandable Covariance Matrix Estimation
When:September 30, 2011 10:05 am
Where:OSB 110
Abstract:
The covariance matrix is fundamental to many multivariate analysis techniques. In the era of high-dimensional data, estimating large covariance matrices is practically important and theoretically interesting. The first part of my talk concerns a general minimax theorem on the optimal estimation of large bandable covariance matrices. This result is a generalization of the minimax theorem obtained in Cai et al. (2010). The general minimax theorem reveals some new interesting phenomena. For example, for certain parameter spaces there is a tapering estimator that simultaneously attains the minimax optimal rates of convergence under both Frobenius and spectral norms. For the same parameter spaces it is even possible to achieve adaptive minimax optimal estimation under the spectral norm with NP dimensions. In the second part of the talk, I will address the issue of selecting the right tapering parameter. We propose a SURE tuning method based on Stein's unbiased risk estimation (SURE) theory. An extensive empirical study shows that SURE tuning is often comparable to oracle tuning and outperforms cross-validation (CV).
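
To make the idea of a tapering estimator concrete, the following is a minimal sketch that multiplies each entry of a sample covariance matrix by a weight depending on its distance from the diagonal. It assumes the trapezoidal weights commonly associated with tapering estimators (weight 1 up to k/2, linear decay to 0 at k); the exact estimator and tuning used in the talk may differ, and the function names and the toy matrix are hypothetical.

```python
def taper_weight(d, k):
    """Trapezoidal tapering weight for an entry at distance d = |i - j|.

    Assumes an even tapering parameter k: full weight up to k / 2,
    linear decay between k / 2 and k, and zero weight from k onward.
    """
    half = k / 2
    if d <= half:
        return 1.0
    if d < k:
        return 2.0 - d / half
    return 0.0

def taper(cov, k):
    """Apply the tapering weights entrywise to a square matrix (list of lists)."""
    p = len(cov)
    return [
        [taper_weight(abs(i - j), k) * cov[i][j] for j in range(p)]
        for i in range(p)
    ]

# Toy 5 x 5 "sample covariance" of all ones, tapered with k = 2.
toy = [[1.0] * 5 for _ in range(5)]
tapered = taper(toy, 2)
```

The tapering parameter k controls how quickly off-diagonal entries are discarded, which is the quantity that the SURE tuning described in the abstract is meant to select.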

September 23, 2011
Speaker:Dr. Robert Clickner
Title:Applications of Statistics to Environmental, Health and Housing Research
When:September 23, 2011 10:10 am
Where:OSB 110
Abstract:
The United States government and other governments collect and analyze data to inform public policy and programs. Generally, studies to collect and analyze data consist of one or more of the following components: research design, statistical design, methods development, implementation or data collection, data analysis, and report writing. All of these need statistics to help ensure the validity of the findings and the public policies and programs that may result. This is a review of the applications and use of statistical methods in environmental, health and housing studies, drawing on my personal experiences and those of my colleagues. Topics include the design and implementation of population-based housing and environmental studies; modeling of industrial effluents; analyses of the contributions of environmental contaminants to human body burden and health effects in the face of measurement errors, confounding variables and other data issues; and the presentation of the findings.

September 16, 2011
Speaker:Dr. Adrian Barbu
Title:Hierarchical Object Parsing from Noisy Point Clouds
When:September 16, 2011 10:10 am
Where:OSB 110
Abstract:
Object parsing and segmentation from point clouds are challenging tasks because the relevant data is available only as thin structures along object boundaries or other object features and is corrupted by large amounts of noise. One way to handle this kind of data is by employing shape models that can accurately follow the object boundaries. Popular models such as Active Shape and Active Appearance models lack the necessary flexibility for this task. While more flexible models such as Recursive Compositional Models have been proposed, this paper builds on the Active Shape models and makes three contributions. First, it presents a flexible, mid-entropy, hierarchical generative model of object shape and appearance in images. The input data is explained by an object parsing layer, which is a deformation of a hidden PCA shape model with Gaussian prior. Second, it presents a novel efficient inference algorithm that uses a set of informed data-driven proposals to initialize local searches for the hidden variables. Third, it applies the proposed model and algorithm to object parsing from point clouds such as edge detection images, obtaining state-of-the-art parsing errors on two standard datasets without using any intensity information.

September 2, 2011
Speaker:Dr. Victor Patrangenaru
Title:Object Data Analysis
When:September 2, 2011 10:10 am
Where:110 OSB
Abstract:
Analysis of Object Data is the more traditional name for Data Analysis on Sample Spaces with a Manifold Stratification. It includes Multivariate Analysis, Directional Data Analysis, Projective Shape Analysis as well as classical Shape Analysis, Diffusion Tensor Imaging, Functional Data Analysis, and Analysis of Phylogenetic Trees Data; pretty much any non-categorical statistical problem can be formulated as an object data analysis problem. Much of the standard nonparametric methodology extends from the multivariate case, in the generic case when the Fréchet mean of a random object (r.o.) is at a regular point. In practice there are situations when a r.o. has a mean located on the singular part of the stratified sample space, and the manifold CLT based techniques break down. Our initial goal is to understand the asymptotic behavior of the estimators of the Fréchet mean of an arbitrary random object, and to develop nonparametric methodologies and fast inference techniques in applications.

August 25, 2011
Speaker:Rommel Bain
Title:Monte Carlo Likelihood Estimation for Conditional Autoregressive Models with Application to Sparse Spatiotemporal Data
When:August 25, 2011 2:00 pm
Where:108 OSB
Abstract:
Spatiotemporal modeling is increasingly used in a diverse array of fields, such as ecology, epidemiology, health care research, transportation, economics, and other areas where data arise from a spatiotemporal process. Often analyses of spatiotemporal models are fraught with problems such as missing data and computational complexity. In this paper, a Monte Carlo likelihood (MCL) method is introduced for the analysis of sparse spatiotemporal data (monthly mean zooplankton biomass) collected on a spatiotemporal lattice by the California Cooperative Oceanic Fisheries Investigations (CalCOFI) and assumed to follow a log-normal distribution. A conditional autoregressive (CAR) model is used to allow for spatiotemporal dependencies between nearest neighbor sites on the spatiotemporal lattice. Typically, CAR model likelihood inference is quite complicated because of the intractability of the CAR model's normalizing constant. Monte Carlo likelihood estimation provides an approximation for intractable likelihood functions. We illustrate MCL parameter estimation by computing log normalized monthly mean (small) zooplankton displacement volume in ml/1000 m³ to describe zooplankton seasonal variations for the CalCOFI time series.

August 23, 2011
Speaker:Jihyung Shin
Title:Mixed-effects and mixed-distribution models for count data with applications to educational research data
When:August 23, 2011 9:30 am
Where:108 OSB
Abstract:
This research is motivated by an analysis of reading and vocabulary data collected by the Florida Center for Reading Research. We are interested in modeling the reading ability of kindergarten children aged between 5 and 7. With the consent of both parents and teachers, data were collected from 461 students on the number of letters pronounced correctly in a sixty-second period. The test was administered three times over the academic year: Fall, Winter and Spring. The data showed excessive zero scores on the test. In this dissertation, we examine zero-inflated Poisson (ZIP) regression models and mixed-effects and mixed-distribution (MEMD) models proposed by Lambert (1992) and Tooze (2002), respectively. The MEMD model is extended to Poisson count data in a longitudinal setting. The maximum likelihood estimates are obtained using a standard statistical software package. The results of the application are also shown.
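As a rough illustration of the zero-inflated Poisson idea mentioned above, the following minimal Python sketch evaluates the standard ZIP probability mass function and log-likelihood without covariates. The function names, the parameter values and the `scores` list are made up for illustration; they are not the dissertation's data or its fitted model, which also includes regression terms and random effects.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean lam.
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


def zip_loglik(counts, lam, pi):
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


# Hypothetical letter counts with many zeros (illustrative only).
scores = [0, 0, 0, 0, 3, 5, 2, 0, 7, 4]

plain_poisson = zip_loglik(scores, lam=2.1, pi=0.0)  # pi = 0 is ordinary Poisson
zero_inflated = zip_loglik(scores, lam=4.2, pi=0.5)
print(plain_poisson, zero_inflated)
```

Setting `pi = 0` recovers the ordinary Poisson model, so comparing the two log-likelihoods shows how the extra zero component can accommodate an excess of zero scores.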

August 15, 2011
Speaker:Wei Liu
Title:A RIEMANNIAN FRAMEWORK FOR ANNOTATED CURVES ANALYSIS
When:August 15, 2011 9:30 am
Where:108 OSB
Abstract:
We propose a Riemannian framework for shape analysis of annotated curves, curves that have certain attributes defined along them, in addition to their geometries. These attributes may be in the form of vector-valued functions, discrete landmarks, or symbolic labels, and provide auxiliary information along the curves. The resulting shape analysis, that is comparing, matching, and deforming, is naturally influenced by the auxiliary functions. Our idea is to construct curves in higher dimensions using both geometric and auxiliary coordinates, and analyze shapes of these curves. The difficulty comes from the need for removing different groups from different components: the shape is invariant to rigid motion, global scale and re-parameterization while the auxiliary component is usually invariant only to the re-parameterization. Thus, the removal of some transformations (rigid motion and global scale) is restricted only to the geometric coordinates, while the re-parameterization group is removed for all coordinates. We demonstrate this framework using a number of experiments.

August 3, 2011
Speaker:Sentibaleng Ncube
Title:A Novel Riemannian Metric for Analyzing Spherical Functions with Applications to HARDI Data
When:August 3, 2011 9:30 am
Where:108 OSB
Abstract:
We propose a novel Riemannian framework for analyzing orientation distribution functions (ODFs), or their probability density functions (PDFs), in HARDI data sets for use in comparing, interpolating, averaging, and denoising PDFs. This is accomplished by separating shape and orientation features of PDFs, and then analyzing them separately under their own Riemannian metrics. We formulate the action of the rotation group on the space of PDFs, and define the shape space as the quotient space of PDFs modulo the rotations. In other words, any two PDFs are compared in: (1) shape by rotationally aligning one PDF to another, using the Fisher-Rao distance on the aligned PDFs, and (2) orientation by comparing their rotation matrices. This idea improves upon the results from using the Fisher-Rao metric in analyzing PDFs directly, a technique that is being used increasingly, and leads to geodesic interpolations that are biologically feasible. This framework leads to definitions and efficient computations for the Karcher mean that provide tools for improved interpolation and denoising. We demonstrate these ideas, using an experimental setup involving several PDFs.

July 29, 2011
Speaker:Emilola Abayomi
Title:The Relationship of Body Weight to Blood Pressure in Diverse Populations
When:July 29, 2011 10:00 am
Where:108 OSB
Abstract:
High blood pressure is a major determinant of risk for Coronary Heart Disease (CHD) and stroke, leading causes of death in the industrialized world. A myriad of pharmacological treatments for elevated blood pressure, defined as a blood pressure greater than 140/90 mmHg, are available and have at least partially resulted in large reductions in the incidence of CHD and stroke in the U.S. over the last 50 years. The factors that may increase blood pressure levels are not well understood, but body fat is thought to be a major determinant of blood pressure level. Obesity is measured through various methods (skinfolds, waist-to-hip ratio, bioelectrical impedance analysis (BIA), etc.), but the most commonly used measure is body mass index (BMI). Although the relationship between level of blood pressure and BMI has been extensively reported, several questions remain: Is there a significant relationship between blood pressure and body mass index in all populations? Is the relationship between body mass and blood pressure linear? How does the relationship vary in different populations? Do characteristics such as race and gender explain heterogeneity that may be present in the relationship in diverse populations? How does the relationship of other measures of body fat (skinfolds, waist-to-hip ratio, etc.) compare to the relationships found for BMI in diverse populations? To examine these questions we will conduct a meta-analysis based on person-level data from almost 30 observational studies from around the world.

July 28, 2011
Speaker:Felicia Williams
Title:The Relationship of Diabetes and Coronary Heart Disease Mortality: a meta-analysis based on person level data
When:July 28, 2011 10:00 am
Where:108 OSB
Abstract:
Studies have suggested that diabetes is a stronger risk factor for coronary heart disease (CHD) in women than in men. We present a meta-analysis of person-level data from diverse populations to examine this issue. Our data comes from 18 studies in which diabetes, CHD mortality and potential confounders were available and a minimum of 75 CHD deaths occurred. These studies followed up 69,308 men and 74,735 women aged 42 to 73 years on average from the US, Denmark, Iceland, Norway and the UK. Individual study prevalence rates of diabetes mellitus (mostly self-reported) at baseline ranged between less than 1% in the youngest cohort and 15.7% (males) and 11.1% (females) in the NHLBI Cardiovascular Health Study (CHS) of the elderly. CHD death rates varied between 2% and 20%. Hazard ratios (HR) associated with baseline diabetes, adjusted for age only, varied between 1.06 and 5.12 (males) and between 1.42 and 10.87 (females). Hazard ratios associated with baseline diabetes, adjusted for age, serum cholesterol level, systolic blood pressure, and cigarette smoking status varied between 1.10 and 6.69 (males) and between 1.25 and 9.01 (females). The male fixed-effect estimated HR of fatal CHD in diabetic versus non-diabetic participants after adjustment for the major risk factors was 2.40 (95% CI 2.11-2.73) whereas the corresponding HR for females was 2.89 (2.50-3.34). These estimates differed only slightly from unadjusted ones obtained from the same data [males: 2.43 (2.14, 2.75) p-value > 0.25 and females: 2.91 (2.52, 3.35) p-value > 0.25]. They agree closely with estimates (odds ratios of 2.3 for males and 2.9 for females) obtained in a recent meta-analysis of 8 studies of both fatal and nonfatal CHD but based on literature-based data. There are insufficient data to suggest that there is a difference in the models that adjust for additional major CHD risk factors and the models that are unadjusted.
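For readers unfamiliar with fixed-effect pooling, the sketch below shows one common way such an estimate can be formed: inverse-variance weighting of study-level log hazard ratios, with standard errors recovered from 95% confidence limits. This is an assumed, textbook-style procedure, not necessarily the method used in the work described above, and the study values in the example are hypothetical.

```python
import math


def pool_fixed_effect(hazard_ratios, ci_bounds, z=1.96):
    """Inverse-variance fixed-effect pooling on the log hazard ratio scale.

    hazard_ratios: per-study HR point estimates.
    ci_bounds: per-study (lower, upper) 95% confidence limits.
    Returns (pooled HR, lower limit, upper limit).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for hr, (lower, upper) in zip(hazard_ratios, ci_bounds):
        # Standard error of log(HR) implied by a symmetric CI on the log scale.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se**2
        total_weight += weight
        weighted_sum += weight * math.log(hr)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical study-level results (not taken from the 18 studies above).
hrs = [2.0, 2.8, 2.4]
cis = [(1.2, 3.3), (1.9, 4.1), (1.5, 3.8)]
pooled_hr, pooled_lo, pooled_hi = pool_fixed_effect(hrs, cis)
print(round(pooled_hr, 2), round(pooled_lo, 2), round(pooled_hi, 2))
```

Studies with narrower confidence intervals receive larger weights, so the pooled interval is tighter than that of any single contributing study.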

June 9, 2011
Speaker:Lindsey Bell
Title:A STATISTICAL APPROACH FOR INFORMATION EXTRACTION OF BIOLOGICAL RELATIONSHIPS
When:June 9, 2011 10:00 am
Where:HCB 207
Abstract:
Vast amounts of biomedical information are stored in scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but is time and resource consuming. As this content continues to rapidly grow, the popularity and importance of text mining for obtaining information from unstructured text becomes increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR), next important concepts and terms are flagged using entity recognition (ER), and then facts concerning these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross validation and cross-corpus validation to replicate an application scenario. The three classifiers are unique and we find that performance of individual classifiers varies depending on the corpus. Therefore, an ensemble of classifiers removes the need to choose one classifier and provides optimal performance.

June 7, 2011
Speaker:Greg Miller
Title:INVESTIGATING THE USE OF MORTALITY DATA AS A SURROGATE FOR MORBIDITY DATA
When:June 7, 2011 10:00 am
Where:Bel 001
Abstract:
We are interested in differences between risk models based on Coronary Heart Disease (CHD) incidence, or morbidity, compared to risk models based on CHD death. Risk models based on morbidity have been developed based on the Framingham Heart Study, while the European SCORE project developed a risk model for CHD death. Our goal is to determine whether those two developed models differ in treatment decisions concerning patient heart health. We begin by reviewing recent metrics in surrogate variables and prognostic model performance. We then conduct bootstrap hypothesis tests between two Cox proportional hazards models using Framingham data, one with incidence as a response, and one with death as a response, and find that the coefficients differ for the age covariate, but find no significant difference for other risk factors. To understand how surrogacy can be applied to our case, where the surrogate variable is nested within the true variable of interest, we examine models based on a composite event compared to models based on singleton events. We also conduct a simulation, simulating times to a CHD incidence and time from CHD incidence to CHD death, censoring at 25 years to emulate the end of a study. We compare a Cox model with death response with a Cox model based on incidence using bootstrapped confidence intervals, and find that age and systolic blood pressure have differences with their covariates. We continue the simulation by using Net Reclassification Index (NRI) to evaluate the treatment decision performance of the two models, and find that the two models do not perform significantly differently in correctly classifying events, if the decisions are based on the risk ranks of the individuals. As long as the relative order of patients' risks is preserved across different risk models, treatment decisions based on classifying an upper specified percent as high risk will not be significantly different. We conclude the dissertation with statements about future methods of approaching our question.

June 2, 2011
Speaker:Robert Holden
Title:Failure Time Regression Models for Thinned Point Processes
When:June 2, 2011 10:00 am
Where:BEL 006
Abstract:
In survival analysis, data on the time until a specific criterion event occurs are analyzed, often with regard to the effects of various predictors. In the classic applications, the criterion event is in some sense a terminal event, e.g., death of a person or failure of a machine or machine component. In these situations, the analysis requires assumptions only about the distribution of waiting times until the criterion event occurs and the nature of the effects of the predictors on that distribution. Suppose that the criterion event isn't a terminal event that can only occur once, but is a repeatable event. The sequence of events forms a stochastic point process. Further suppose that only some of the events are detected (observed); the detected events form a thinned point process. Any failure time model based on the data will be based not on the time until the first occurrence, but on the time until the first detected occurrence of the event. I will consider the implications of this for survival regression models. Such models will have little meaning unless the regression parameters are independent of the detection probability, or, more generally, the thinning mechanism. I will show that the effect of thinning on regression parameters depends on the combination of the type of regression model and the type of point process that generates the events. For some combinations, the effect of a predictor will be the same for time to the first event and the time to the first detected event. For other combinations, the regression effect will be changed as a result of the incomplete detection.

May 31, 2011
Speaker:Jennifer Geis
Title:A WEIGHTED APPROACH TO RANK SELECTION WITH A DATA-ADAPTIVE METHOD TO CANONICAL CORRELATION ANALYSIS
When:May 31, 2011 1:00 pm
Where:BEL 007
Abstract:

May 26, 2011
Speaker:Leif Ellingson
Title:STATISTICAL SHAPE ANALYSIS ON MANIFOLDS WITH APPLICATIONS TO PLANAR CONTOURS AND STRUCTURAL PROTEOMICS
When:May 26, 2011 10:00 am
Where:006 BEL
Abstract:
The technological advances in recent years have produced a wealth of intricate digital imaging data that is analyzed effectively using the principles of shape analysis. Such data often lies on either high-dimensional or infinite-dimensional manifolds. With computing power also now strong enough to handle this data, it is necessary to develop theoretically sound methodology to perform the analysis in a computationally efficient manner. In this dissertation, we propose approaches of doing so for planar contours and the three-dimensional atomic structures of protein binding sites. First, we adapt Kendall's definition of direct similarity shapes of finite planar configurations to shapes of planar contours under certain regularity conditions and utilize Ziezold's nonparametric view of Fréchet mean shapes. The space of direct similarity shapes of regular planar contours is embedded in a space of Hilbert-Schmidt operators in order to obtain the Veronese-Whitney extrinsic mean shape. For computations, it is necessary to use discrete approximations of both the contours and the embedding. For cases when landmarks are not provided, we propose an automated, randomized landmark selection procedure that is useful for contour matching within a population and is consistent with the underlying asymptotic theory. For inference on the extrinsic mean direct similarity shape, we consider a one-sample neighborhood hypothesis test and the use of nonparametric bootstrap to approximate confidence regions. Bandulasiri et al. (2008) suggested using extrinsic reflection size-and-shape analysis to study the relationship between the structure and function of protein binding sites. In order to obtain meaningful results for this approach, it is necessary to identify the atoms common to a group of binding sites with similar functions and obtain proper correspondences for these atoms. We explore this problem in depth and propose an algorithm for simultaneously finding the common atoms and their respective correspondences based upon the Iterative Closest Point algorithm. For a benchmark data set, our classification results compare favorably with those of leading established methods. Finally, we discuss current directions in the field of statistics on manifolds, including a computational comparison of intrinsic and extrinsic analysis for various applications and a brief introduction of sample spaces with manifold stratification.

May 19, 2011
Speaker:Jianchang Lin
Title:Semiparametric Bayesian survival analysis via transform-both-sides model
When:May 19, 2011 1:00 pm
Where:BEL 007
Abstract:
We propose a new semiparametric survival model with a log-linear median regression function as a useful alternative to the popular Cox's (1972) models and linear transformation models (Cheng et al., 1995). Compared to existing semiparametric models, our models have many practical advantages, including the interpretation of regression parameters via the median, the ability to incorporate heteroscedasticity, and the ease of prior elicitation and computation of Bayesian estimators. Our Bayesian estimation method is also extended to a multivariate survival model with a symmetric random effects distribution. Our multivariate survival model has the same covariate effects on marginal (population average) as well as conditional (given random effects) median survival time. Our other aim is to develop a Bayesian simultaneous variable selection and estimation of median regression for a skewed response variable. Our hierarchical Bayesian model can incorporate advantages of the Lasso penalty, albeit for a skewed and heteroscedastic response variable. Some preliminary simulation studies have been conducted to compare the performance of the proposed model and the existing frequentist median lasso. Considering the estimation bias and mean squared error, our proposed model performs as well as and, in some scenarios, better than competing frequentist estimators. We illustrate our approaches and model diagnostics via reanalysis of some real-life clinical studies including a small-cell lung cancer study and a retinopathy study.

May 19, 2011
Speaker:Daniel Osborne
Title:Nonparametric Data Analysis on Manifolds with an Application in Medical Imaging
When:May 19, 2011 10:00 am
Where:006 BEL
Abstract:
Over the past fifteen years, there has been a rapid development in Nonparametric Statistical Analysis on Shape Manifolds applied to Medical Imaging. For surgery planning, a more appropriate approach is to take into account the size as well when analyzing the CT scan data. In this context, one performs a nonparametric analysis on the 3D data retrieved from CT scans of healthy young adults, on the Size-and-Reflection Shape Space of k-ads in general position in 3D. This work, as part of a larger project on planning reconstructive surgery in severe skull injuries, includes preprocessing and post-processing steps of CT images. The preprocessing step consists of extracting the boundary of the bone structure from the CT slices, while the post-processing step consists of 3D reconstruction of the virtual skull from these bone extractions and smoothing. Next we present preliminary results for Schoenberg's sample mean Size-and-Reflection Shape of k-ads in general position in R^3 for the human skull based on these virtual reconstructions. The bootstrap distribution of the Schoenberg sample mean 3D Size-and-Reflection Shape for a selected group of anatomic landmarks and pseudo-landmarks is computed for 500 bootstrap resamples of the original 20 skulls represented by the 3 by k configurations, when k=9. Finally, we report a confidence region for the Schoenberg mean configuration.

May 16, 2011
Speaker:Tamika Royal-Thomas
Title:Interrelating of Longitudinal Processes: An Empirical Example
When:May 16, 2011 9:30 am
Where:207 HCB
Abstract:
The Barker Hypothesis states that maternal and `in utero' attributes during pregnancy affect a child's cardiovascular health throughout life. We present an analysis of a unique longitudinal dataset from Jamaica that consists of three longitudinal processes: (i) Maternal longitudinal process - Blood pressure and anthropometric measurements at seven time-points on the mother during pregnancy. (ii) In Utero measurements - Ultrasound measurements of the fetus taken at six time-points during pregnancy. (iii) Birth to present process - Children's anthropometric and blood pressure measurements at 24 time-points from birth to 14 years. A comprehensive analysis of the interrelationship of these three longitudinal processes is presented using joint modeling for multivariate longitudinal profiles. We propose a new methodology of examining a child's cardiovascular risk by extending a current view of likelihood estimation. Joint modeling of multivariate longitudinal profiles is done and the extension of the traditional likelihood method is utilized in this paper and compared to the maximum likelihood estimates. Our main goal is to examine whether the process in mothers predicts fetal development which in turn predicts the future cardiovascular health of the children. One of the difficulties with `in utero' and early childhood data is that certain variables are highly correlated, and so dimension reduction techniques are quite applicable in this scenario. Principal component analysis (PCA) is utilized in creating a smaller dimension of uncorrelated data which is then utilized in a longitudinal analysis setting. These principal components are then utilized in an optimal linear mixed model for longitudinal data which indicates that in utero and early childhood attributes predict the future cardiovascular health of the children. This thesis has added a body of knowledge to developmental origins of adult diseases and has supplied some significant results while utilizing a rich diversity of statistical methodologies.

May 2, 2011
Speaker:Yinfeng Tao
Title:The frequentist properties and general performance of Bayesian Confidence Intervals for the survival function
When:May 2, 2011 11:00 am
Where:108 OSB
Abstract:
Estimation of a survival function is a very important topic in survival analysis with contributions from many authors. We consider estimation of confidence intervals for the survival function based on right-censored or interval-censored survival data. In the right-censored case, almost all confidence intervals are based in some way on the Kaplan-Meier estimator first proposed by Kaplan and Meier (1958) and widely used as the nonparametric estimator in the presence of right-censored data. For interval-censored data, the Turnbull estimator (Turnbull, 1974) plays a similar role. For a class of Bayesian models involving mixtures of Dirichlet priors, Doss and Huffer (2003) suggested several simulation techniques to approximate the posterior distribution of the survival function by using Markov chain Monte Carlo or sequential importance sampling. These techniques can lead to point estimates and probability intervals for the survival function at arbitrary time points for both the right-censored and interval-censored cases. The main objective of this thesis is to examine the frequentist properties and general performance of the Bayesian probability intervals when the prior is non-informative. Simulation studies will be used to compare these Bayesian probability intervals based on Doss & Huffer's approach with other published methods for obtaining pointwise confidence intervals for the survival function. Similar comparisons will be carried out for confidence intervals for quantiles of the survival function. Also we describe an approach for constructing simultaneous confidence bands for the survival function which will be investigated in future work.
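Since the Kaplan-Meier estimator is the baseline for the right-censored case discussed above, a minimal Python sketch of the product-limit calculation may help. The function name and the toy data are hypothetical, and the sketch computes only the point estimate, not the confidence or probability intervals that are the subject of the thesis.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times: observed follow-up times.
    events: True if the event was observed at that time,
            False if the observation was right-censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy data: five subjects, two of them right-censored (illustrative only).
km_curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
print(km_curve)
```

Subjects censored at a given time are still counted as at risk at that time, which is the usual convention when an event and a censoring share the same time.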

April 29, 2011
Speaker:Paul Hill
Title:Bootstrap Prediction Bands for Non-Parametric Function Signals in a Complex System
When:April 29, 2011 10:00 am
Where:307 HCB
Abstract:
Methods employed in the construction of prediction bands for continuous curves require a different approach to those used for a data point. In many cases, the underlying function is unknown and thus a distribution-free approach which preserves sufficient coverage for the signal in its entirety is necessary in the signal analysis. Three methods for the formation of (1-α)100% prediction bands are presented and their performances are compared through the coverage probabilities obtained. These techniques are applied to constructing prediction bands for spring discharge in a successful manner, giving good coverage in each case. Spring discharge measured over time can be considered as a continuous signal and the ability to predict the future signals of spring discharge is useful for monitoring flow and other issues related to the spring. There has been common use of the gamma distribution in the simulation of rainfall. Bootstrapping the rainfall in the proposed manner allows for adequately creating new samples over different periods of time as well as specific rain events such as hurricanes or drought. This non-parametric approach to the input rainfall augurs well for the non-parametric nature of the output signal.

March 25, 2011
Speaker:Yu Gu
Title:New Semiparametric Methods for Recurrent Events Data
When:March 25, 2011 2:00 pm
Where:HCB 210
Abstract:
Recurrent events data are arising in all areas of biomedical research. We present a model for recurrent events data with the same link for the intensity and mean functions. Simple interpretations of the covariate effects on both the intensity and mean functions lead to a better understanding of the covariate effects on the recurrent events process. We use partial likelihood and empirical Bayes methods for inference and provide theoretical justifications as well as relationships between these methods. We also show the asymptotic properties of the empirical Bayes estimators. We illustrate the computational convenience and implementation of our methods with the analysis of a heart transplant study. We also propose an additive regression model and associated empirical Bayes method for the risk of a new event given the history of the recurrent events. Both the cumulative mean and rate functions have closed form expressions for our model. Our inference method for the semiparametric model is based on maximizing a finite dimensional integrated likelihood obtained by integrating over the nonparametric cumulative baseline hazard function. Our method can accommodate time-varying covariates and is easier to implement computationally than full Bayes methods based on iterative algorithms. The asymptotic properties of our estimates give the large-sample justifications from a frequentist standpoint. We apply our method on a study of heart transplant patients to illustrate the computational convenience and other advantages of our method.

March 24, 2011
Speaker:Vernon Lawhern
Title:Statistical Modeling and Applications of Neural Spike Trains
When:March 24, 2011 3:35 pm
Where:110 OSB
Abstract:
Understanding how spike trains encode information is a principal question in the study of neural activity. Recent advances in biotechnology have given researchers the ability to record neural activity on a wide scale, allowing researchers to perform detailed analyses that may have been impossible just a few years ago. Here we present several frameworks for the statistical modeling of neural spike trains. We first develop a Generalized Linear Model (GLM) framework that incorporates the effects of hidden states in the modeling of neural activity in the primate motor cortex. We then develop a state-space model that incorporates target information in the modeling framework. In both cases, significant improvements in model fitting and decoding accuracy were observed. Finally, in joint work with Dr. Contreras and Dr. Nikonov from the Psychology Department, we study taste coding and discrimination in the gustatory system by using information-theoretic tools such as Mutual Information, and by using a recently developed spike train metric to study the clustering performance from recordings of proximate neurons.

March 16, 2011
Speaker:Anqi Tang
Title:A CLASS OF MIXED-DISTRIBUTION MODELS WITH APPLICATIONS IN FINANCIAL DATA ANALYSIS
When:March 16, 2011 9:30 am
Where:499 DSL
Abstract:
Statisticians often encounter data in the form of a combination of discrete and continuous outcomes. A special case is zero-inflated longitudinal data where the response variable has a large portion of zeros. These data exhibit correlation because observations are obtained on the same subjects over time.  In this dissertation, we propose a two-part mixed distribution model to model zero-inflated longitudinal data. The first part of the model is a logistic regression model that models the probability of nonzero response; the other part is a linear model that models the mean response given that the outcomes are not zeros. Random effects with AR(1) covariance structure are introduced into the both parts of the model to allow serial correlation and subject specific effect. Estimating the two-part model is challenging because of high dimensional integration necessary to obtain the maximum likelihood estimates. We propose a Monte Carlo EM algorithm for estimating the maximum likelihood estimates of parameters. Through simulation study, we demonstrate the good performance of the MCEM method in parameter and standard error estimation. To illustrate, we apply the two-part model with correlated random effects and the model with autoregressive random effects to executive compensation data to investigate potential determinants of CEO stock option grants.

February 25, 2011
Speaker:Wei Wu
Title:
When:February 25, 2011 10:10 am
Where:
Abstract:

February 18, 2011
Speaker:Adrian Barbu
Title:Automatic Detection and Segmentation of Lymph Nodes from CT Data
When:February 18, 2011 10:10 am
Where:108 OSB
Abstract:
Lymph nodes are assessed routinely in clinical practice, and their size is followed throughout radiation or chemotherapy to monitor the effectiveness of cancer treatment. This work presents a robust learning-based method for automatic detection of solid lymph nodes from CT data, with the following contributions. First, it presents a learning-based approach to solid lymph node detection that relies on Marginal Space Learning to achieve great speedup with virtually no loss in accuracy. Second, it presents an efficient segmentation method for solid lymph nodes. Third, it introduces two new sets of features that are effective for lymph node (LN) detection: one that self-aligns to high gradients and another obtained from the segmentation result. The method is evaluated on large datasets, obtaining better than state-of-the-art results, with a running time of 5-40 seconds per volume. An added benefit of the method is the capability to detect and segment conglomerated lymph nodes.

February 11, 2011
Speaker:Feng Zhao
Title:Bayesian portfolio optimization with time-varying factor models
When:February 11, 2011 10:10 am
Where:108 OSB
Abstract:
We develop a modeling framework to simultaneously evaluate various types of predictability in stock returns, including stocks' sensitivity ("betas") to systematic risk factors, stocks' abnormal returns unexplained by risk factors ("alphas"), and returns of risk factors in excess of the risk-free rate ("risk premia"). Both

January 28, 2011
Speaker:Anuj Srivastava
Title:Statistical Modeling of Elastic Functions
When:January 28, 2011 10:10 am
Where:108 OSB
Abstract:
Motivated by the well-known problem of finding spurts in Berkeley growth data, we are interested in modeling functions that allow some warping in the time domain. Such warping is useful in improving the matching of peaks and valleys across functions and results in models that better preserve structures in the original data. The challenge is to develop a principled approach that can automatically warp a given set of functions to produce an optimal alignment. The aligned functions are said to represent the "y-variability" and the warping functions used to align them form the "x-variability" of the data. I will start by summarizing the main ideas used in the past literature (e.g. Kneip and Gasser, Annals, 1992; Ramsay and Li, JRSSB, 1998; Kneip and Ramsay, JASA, 2008; Liu and Mueller, JASA, 2004) and what I view as their limitations. Then, I will describe our approach for (1) aligning, comparing and modeling functions and (2) modeling the warping functions. This framework is based on the use of the Fisher-Rao Riemannian metric, which provides a proper distance for comparing time-warped functional data. These distances are then used to define Karcher means, and the individual functions are optimally warped to align them to the Karcher means to extract the y-variability. Principal component analysis and stochastic modeling of these constituents, x and y, in their respective spaces lead to the desired modeling of functional variation. These ideas are demonstrated using both simulated and real data from different application domains: the Berkeley growth study, handwritten signature curves, and neuroscience spike trains. (Collaborators: Wei Wu, Sebastian Kurtek, Eric Klassen, and J. Steve Marron (UNC, Chapel Hill))
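To give a feel for why the Fisher-Rao metric yields a proper distance under warping, here is a minimal numerical sketch. The abstract does not spell out the representation; the sketch assumes the square-root slope function q = sign(f') sqrt(|f'|) that is commonly paired with this metric, under which the Fisher-Rao distance becomes an ordinary L2 distance. The test functions f1 and f2, the warping gamma, and the grid size are made up for illustration.

```python
import math


def srsf(values, h):
    """Square-root slope function q = sign(f') * sqrt(|f'|) on a uniform grid.

    Slopes use central differences in the interior and one-sided
    differences at the two endpoints.
    """
    n = len(values)
    q = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slope = (values[hi] - values[lo]) / ((hi - lo) * h)
        q.append(math.copysign(math.sqrt(abs(slope)), slope))
    return q


def l2_distance(q1, q2, h):
    """L2 distance between two sampled functions (trapezoidal rule)."""
    sq = [(a - b) ** 2 for a, b in zip(q1, q2)]
    return math.sqrt(h * (sum(sq) - 0.5 * (sq[0] + sq[-1])))


n = 2001
h = 1.0 / (n - 1)
t = [i * h for i in range(n)]


def f1(s):
    return math.sin(2 * math.pi * s)


def f2(s):
    return math.exp(s)


def gamma(s):
    # A smooth warping of [0, 1]: gamma(0) = 0, gamma(1) = 1, gamma' > 0.
    return s + 0.3 * math.sin(2 * math.pi * s) / (2 * math.pi)


# Distance between the original functions.
d_orig = l2_distance(srsf([f1(s) for s in t], h),
                     srsf([f2(s) for s in t], h), h)
# Distance after warping BOTH functions by the same gamma.
d_warp = l2_distance(srsf([f1(gamma(s)) for s in t], h),
                     srsf([f2(gamma(s)) for s in t], h), h)
print(d_orig, d_warp)
```

Up to discretization error, the two printed distances agree: warping both functions by the same gamma leaves the distance unchanged. This invariance is what makes it meaningful to search over warpings for an optimal alignment.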

January 21, 2011
Speaker:Jim Berger of Duke University
Title:"I don't know where I'm gonna go when the volcano blows"
When:January 21, 2011 10:10 am
Where:108 OSB
Abstract:
wrote Jimmy Buffett. Great song line, but usually it's too late to go when the volcano blows; one has to know when to go before the volcano blows. The problem of risk assessment for rare natural hazards, such as volcanic pyroclastic flows, is addressed and illustrated with the Soufriere Hills Volcano on the island of Montserrat. Assessment is approached through a combination of mathematical computer modeling, statistical modeling of geophysical data, and extreme-event probability computation. A mathematical computer model of the natural hazard is used to provide the needed extrapolation to unseen parts of the hazard space. Statistical modeling of the available geophysical data is needed to determine the initializing distribution for exercising the computer model. In dealing with rare events, direct simulations involving the computer model are prohibitively expensive, so computation of the risk probabilities requires a combination of adaptive design of computer model approximations (emulators) and rare-event simulation.


Email: info@stat.fsu.edu

Office: 214 OSB
117 N. Woodward Ave.
P.O. Box 3064330
Tallahassee, FL
32306-4330

Phone: (850) 644-3218

Fax: (850) 644-5271

Admissions Inquiries: richburg@stat.fsu.edu