Colloquia
|
| November 22, 2013, 10:00 am |
Dr. Washington Mio, FSU Dept. of Mathematics |
| November 15, 2013, 10:00 am |
Dr. Nilanjan Chatterjee, National Cancer Institute |
| November 12, 2013, 3:30 pm |
Mingfei Qiu, Essay Defense |
| November 8, 2013, 10:00 am |
Dr. Todd Ogden, Columbia University, Dept. of Biostatistics |
| November 1, 2013, 4:00 pm |
Jose Laborde, FSU Dept. of Statistics, Dissertation Defense |
| November 1, 2013, 10:00 am |
Dr. Yiyuan She |
| October 25, 2013, 10:00 am |
Dr. Stephen Walker, University of Kent and University of Texas at Austin |
| October 18, 2013, 10:00 am |
Dr. Steve Marron, UNC Chapel Hill |
| October 17, 2013, 2:30 pm |
Qian Xie, FSU Statistics, Essay Defense |
| October 11, 2013, 10:00 am |
Dr. Runze Li, Penn State University |
| October 4, 2013, 10:00 am |
Dr. Robert Clickner, FSU Dept. of Statistics |
| September 27, 2013, 1:00 pm |
Robert Holden |
| September 27, 2013, 10:00 am |
Dr. Jim Hobert, University of Florida |
| September 20, 2013, 10:00 am |
Dr. Betsy Hill, Medical University of South Carolina |
| September 13, 2013, 10:00 am |
Dr. Debdeep Pati, FSU Dept. of Statistics |
| September 6, 2013, 10:00 am |
Dr. Anuj Srivastava, FSU Dept. of Statistics |
| July 2, 2013, 2:00 pm |
Oliver Galvis |
| July 1, 2013, 11:00 am |
Seung-Yeon Ha |
| June 27, 2013, 2:00 pm |
Felicia Williams |
| June 19, 2013, 3:00 pm |
Yuanyuan Tang |
| June 18, 2013, 10:00 am |
Ester Kim Nilles |
| May 15, 2013, 2:00 pm |
Jingyong Su, FSU Dept. of Statistics, Dissertation Defense |
| May 13, 2013, 11:00 am |
Yingfeng Tao |
| May 6, 2013, 10:00 am |
Darshan Bryner, FSU Dept. of Statistics, Essay Defense |
| May 1, 2013, 1:00 pm |
Katie Hillebrandt, FSU Dept. of Statistics, Essay Defense |
| April 29, 2013, 2:00 pm |
Wade Henning, FSU Dept. of Statistics, Essay Defense |
| April 26, 2013, 10:00 am |
Paul Beaumont, FSU Department of Economics |
| April 19, 2013, 10:00 am |
Wei Wu, FSU Dept. of Statistics |
| April 15, 2013, 3:30 pm |
Michael Rosenthal, Ph.D. Candidate, Essay Defense |
| April 12, 2013, 10:00 am |
Karim Lounici, Georgia Tech |
| April 5, 2013, 10:00 am |
Russell G. Almond, FSU |
| March 29, 2013, 10:00 am |
Genevera Allen, Rice University |
| March 22, 2013, 10:00 am |
Xiaoming Huo, Georgia Tech |
| March 20, 2013, 3:30 pm |
Jose Laborde, Ph.D. Candidate, Essay Defense |
| March 19, 2013, 3:00 pm |
Gretchen Rivera, FSU, Dissertation Defense |
| March 8, 2013, 10:00 am |
Yongtao Guan, University of Miami |
| March 4, 2013, 11:00 am |
Kelly McGinnity, FSU, Dissertation Defense |
| March 1, 2013, 10:00 am |
Brian C. Monsell, US Census Bureau |
| February 27, 2013, 10:00 am |
Rachel Becvarik, FSU, Dissertation Defense |
| February 20, 2013, 2:00 pm |
Carl P. Schmertmann, Professor of Economics, FSU |
| February 15, 2013, 10:00 am |
Fred Huffer, FSU Dept. of Statistics |
| January 25, 2013, 10:00 am |
Yin Xia |
| January 23, 2013, 2:00 pm |
Ying Sun |
| January 18, 2013, 10:00 am |
Minjing Tao |
| January 14, 2013, 2:00 pm |
Naomi Brownstein |
| January 11, 2013, 10:00 am |
Qing Mai |
| November 22, 2013 |
| Dr. Washington Mio, FSU Dept. of Mathematics |
| Taming Shapes and Understanding Their Variation |
| November 22, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
| Quantification and interpretation of shape variation are problems that arise in multiple domains of biology and medicine. Problems such as understanding evolution and inheritance of phenotypic traits, genetic determinants of morphological traits, normal and pathological changes in the anatomy of organs and tissues, all involve shape analysis. Shape data can be quite irregular, as exemplified by images of gene expression domains and noisy 3D scans. Thus, a companion problem is that of regularizing shapes to make them amenable to analyses. In this talk, we will discuss developments in shape regularization and analysis that let us address some of these problems. We will illustrate the methods with applications to biomedical imaging. |
| November 15, 2013 |
| Dr. Nilanjan Chatterjee, National Cancer Institute |
| |
| November 15, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
|
| November 12, 2013 |
| Mingfei Qiu, Essay Defense |
| |
| November 12, 2013 3:30 pm |
| OSB 205 |
|
|
| November 8, 2013 |
| Dr. Todd Ogden, Columbia University, Dept. of Biostatistics |
| Images as predictors in regression models with scalar outcomes |
| November 8, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
| One situation that arises in the field of functional data analysis is
the use of imaging data or other very high-dimensional data as
predictors in regression models. A motivating example involves using
baseline images of a patient's brain to predict the patient's clinical
outcome. Interest lies both in making such patient-specific
predictions and in understanding the relationship between the imaging
data and the outcome. Obtaining meaningful fits in such problems
requires some type of dimension reduction, but this must be done while
taking into account the particular (spatial) structure of the data.
This talk will describe some of the general tools that have proven
effective in this context, including principal component analysis,
penalized splines, and wavelet analysis.
|
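Of the tools listed, principal component analysis is the simplest to sketch. The example below is a minimal principal component regression, assuming each image has been flattened into one row of a matrix `X` and the scalar outcomes form a vector `y`. The function name `pc_regression` and the number of retained components are illustrative choices, not taken from the talk, and the spatially aware penalized-spline and wavelet approaches are not shown.

```python
import numpy as np

def pc_regression(X, y, n_components):
    """Principal component regression of a scalar outcome on image predictors.

    X: (n_subjects, n_voxels) array; each row is one flattened image (assumed layout).
    y: (n_subjects,) array of scalar outcomes.
    Returns (predict, beta_image): a prediction function for new images and the
    implied coefficient image in voxel space.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    # Rows of Vt are the principal directions ("eigen-images") of the centered images.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                     # (n_voxels, n_components)
    scores = Xc @ V                             # dimension-reduced predictors
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta_image = V @ coef[1:]                   # one coefficient per voxel

    def predict(X_new):
        s = (np.atleast_2d(np.asarray(X_new, float)) - x_mean) @ V
        return coef[0] + s @ coef[1:]

    return predict, beta_image
```

The returned `beta_image` maps the fitted coefficients back to voxel space, which is one simple way to look at the relationship between the imaging data and the outcome.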
| November 1, 2013 |
| Jose Laborde, FSU Dept. of Statistics, Dissertation Defense |
| Elastic Shape Analysis of RNAs and Proteins |
| November 1, 2013 4:00 pm |
| OSB 215 |
|
| Proteins and RNAs are molecular machines performing biological functions
in the cells of all organisms. Automatic comparison and classification of
these biomolecules are fundamental yet open problems in the field of
Structural Bioinformatics. An outstanding unsolved issue is the definition
and efficient computation of a formal distance between any two
biomolecules. Current methods use alignment scores, which are not proper
distances, to derive statistical tests for comparison and classification.
This work applies Elastic Shape Analysis (ESA), a method recently
developed in computer vision, to construct rigorous mathematical and
statistical frameworks for the comparison, clustering and classification
of proteins and RNAs. ESA treats biomolecular structures as 3D
parameterized curves, which are represented with a special map called the
square root velocity function (SRVF). In the resulting shape space of
elastic curves, one can perform statistical analysis of curves as if they
were random variables. One can compare, match and deform one curve into
another, as well as compute averages and covariances of curve
populations, and perform hypothesis testing and classification of curves
according to their shapes. We have successfully applied ESA to the
comparison and classification of protein and RNA structures.
We further extend the ESA framework to incorporate additional
non-geometric information that tags the shape of the molecules (namely,
the sequence of nucleotide/amino-acid letters for RNAs/proteins and, in
the latter case, also the labels for the so-called secondary structure).
The biological representation is chosen such that the ESA framework
continues to be mathematically formal. We have achieved superior
classification of RNA functions compared to state-of-the-art methods on
benchmark RNA datasets, which has led to the publication of this work in
the journal Nucleic Acids Research (NAR).
Based on the ESA distances, we have also developed a fast method to
classify protein domains by using a representative set of protein
structures generated by a clustering-based technique we call Multiple
Centroid Class Partitioning (MCCP). Comparison with other standard
approaches showed that MCCP significantly improves the accuracy while
keeping the representative set smaller than the other methods.
The current schemes for the classification and organization of proteins
(such as SCOP and CATH) assume a discrete space of their structures, where
a protein is classified into one and only one class in a hierarchical tree
structure. Our recent study, and studies by other researchers, showed that
the protein structure space is more continuous than discrete. To capture
the complex but quantifiable continuous nature of protein structures, we
propose to organize these molecules using a network model, where
individual proteins are mapped to possibly multiple nodes of classes, each
associated with a probability. Structural classes will then be connected
to form a network based on overlaps of corresponding probability
distributions in the structural space. |
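The SRVF mentioned above is q(t) = β'(t) / sqrt(|β'(t)|) for a parameterized curve β(t). The sketch below computes it by finite differences for a curve sampled at points (for example, backbone atom coordinates) and evaluates the plain L2 distance between two SRVFs on a common grid. It is only the first ingredient: the elastic distance used in ESA additionally optimizes over rotations and reparameterizations of one curve, which is omitted here, and the function names are hypothetical.

```python
import numpy as np

def srvf(beta, t):
    """Square root velocity function q(t) = beta'(t) / sqrt(|beta'(t)|).

    beta: (N, 3) array of points along a parameterized 3D curve.
    t:    (N,) increasing parameter values, e.g. np.linspace(0, 1, N).
    """
    velocity = np.gradient(np.asarray(beta, float), t, axis=0)   # finite-difference beta'(t)
    speed = np.linalg.norm(velocity, axis=1)
    safe = np.maximum(speed, 1e-12)
    # Where the curve is (numerically) stationary, set q = 0 instead of dividing by zero.
    scale = np.where(speed > 1e-12, 1.0 / np.sqrt(safe), 0.0)
    return velocity * scale[:, None]

def l2_distance(q1, q2, t):
    """L2 distance between two SRVFs sampled on the same grid (trapezoidal rule)."""
    sq = np.sum((q1 - q2) ** 2, axis=1)
    return float(np.sqrt(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(t))))
```

A useful sanity check is that the squared L2 norm of q equals the length of the curve, so a straight segment from (0, 0, 0) to (3, 4, 0) has an SRVF with squared norm 5.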
| November 1, 2013 |
| Dr. Yiyuan She |
| |
| November 1, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
|
| October 25, 2013 |
| Dr. Stephen Walker, University of Kent and University of Texas at Austin |
| On the Equivalence between Bayesian and Classical Hypothesis Testing |
| October 25, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
| For hypotheses of the type $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$,
we demonstrate the equivalence of a Bayesian hypothesis test using a Bayes
factor and the corresponding classical test, for a large class of models,
which are detailed in the talk. In particular, we show that the role of the
prior and critical region for the Bayes factor test is only to specify the
type I error of the test. This is their only role since, as we show, the
power function of the Bayes factor test coincides exactly with that of the
classical test, once the type I error has been fixed.
This is joint work with Tom Shively, at the McCombs Business School,
University of Texas at Austin.
|
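The simplest case of this equivalence can be checked directly. Assume, for illustration only, a normal mean with known variance, x̄ ~ N(θ, σ²/n), and under H_1 a conjugate prior θ ~ N(θ_0, τ²); the talk covers a larger class of models. Then the Bayes factor BF_01 is a strictly decreasing function of |z|, where z = (x̄ - θ_0)/(σ/√n). Rejecting H_0 when BF_01 < k is therefore the same as rejecting when |z| exceeds a critical value, so once k (together with the prior scale τ) is chosen to give type I error α, the Bayes factor test and the two-sided z-test have identical power functions. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def bayes_factor_01(z, n, sigma, tau):
    """BF_01 for H0: theta = theta0 vs H1: theta ~ N(theta0, tau^2), with xbar ~ N(theta, sigma^2 / n).

    z is the usual statistic (xbar - theta0) / (sigma / sqrt(n)).
    """
    ratio = n * tau**2 / sigma**2
    shrink = ratio / (1.0 + ratio)          # = n tau^2 / (sigma^2 + n tau^2)
    return math.sqrt(1.0 + ratio) * math.exp(-0.5 * z**2 * shrink)

def bf_threshold_for_size(alpha, n, sigma, tau):
    """Threshold k such that 'reject H0 when BF_01 < k' has type I error alpha.

    Because BF_01 decreases strictly in |z|, the region {BF_01 < k} equals
    {|z| > c}; matching the two-sided z-test critical value c fixes k.
    """
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return bayes_factor_01(c, n, sigma, tau)
```

Note that for a fixed k, changing τ changes the implied critical value c, which is the sense in which the prior and the threshold together only set the type I error.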
| October 18, 2013 |
| Dr. Steve Marron, UNC Chapel Hill |
| Object Oriented Data Analysis |
| October 18, 2013 10:00 am |
| 204 Duxbury Hall (Nursing); note: not the 214 auditorium |
|
| Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Challenges in modern medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. The notion of Object Oriented Data Analysis also impacts data analysis by providing a language for discussing the many choices needed in modern complex data analyses. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
|
| October 17, 2013 |
| Qian Xie, FSU Statistics, Essay Defense |
| Parallel Transport of Deformations in Shape Space of Elastic Surfaces |
| October 17, 2013 2:30 pm |
| OSB 205 |
|
| Statistical shape analysis develops methods for comparisons, deformations, summarizations, and modeling of shapes in given data sets. These tasks require a fundamental tool called parallel transport of tangent vectors along arbitrary paths. This tool is essential for: (1) computation of geodesic paths using either shooting or path-straightening methods, (2) transferring deformations across objects, and (3) modeling of statistical variability in shapes. Using the square-root normal field (SRNF) representation of parameterized surfaces, we present a method for transporting deformations along paths in the shape space. This is difficult, even though the underlying space is a vector space, because the chosen (elastic) Riemannian metric is non-standard. Using a finite basis for representing SRNFs of shapes, we derive expressions for Christoffel symbols that enable parallel transport. We demonstrate this framework using examples from shape analysis of parameterized spherical surfaces in the three contexts mentioned above. |
| October 11, 2013 |
| Dr. Runze Li, Penn State University |
| Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates |
| October 11, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
| This paper is concerned with feature screening and variable selection for varying
coefficient models with ultrahigh dimensional covariates. We propose a new feature
screening procedure for these models based on the conditional correlation coefficient. We
systematically study the theoretical properties of the proposed procedure, and establish
its sure screening property and ranking consistency. To enhance the finite sample
performance of the proposed procedure, we further develop an iterative feature screening
procedure. Monte Carlo simulation studies were conducted to examine the performance of
the proposed procedures. In practice, we advocate a two-stage approach for varying
coefficient models. The two-stage approach consists of (a) reducing the ultrahigh
dimensionality by using the proposed procedure and (b) applying regularization methods for
dimension-reduced varying coefficient models to make statistical inferences on the
coefficient functions. We illustrate the proposed two-stage approach by a real data
example.
|
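A minimal sketch of screening by conditional correlation follows. It assumes a Gaussian kernel in the index variable `u`, a user-chosen bandwidth, and a screening statistic equal to the squared kernel-weighted correlation between each covariate and `y`, averaged over the observed values of `u`. These estimation details and the function name are assumptions for illustration; the iterative screening step and the second-stage regularization are not shown.

```python
import numpy as np

def conditional_correlation_screen(X, y, u, bandwidth, top_d):
    """Rank covariates by the average squared kernel-weighted correlation with y given u."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    u = np.asarray(u, float)
    n, p = X.shape
    omega = np.zeros(p)
    for u0 in u:                                       # evaluate at each observed index value
        w = np.exp(-0.5 * ((u - u0) / bandwidth) ** 2) # Gaussian kernel weights around u0
        w /= w.sum()
        Xc = X - w @ X                                 # center with local weighted means
        yc = y - w @ y
        cov = w @ (Xc * yc[:, None])                   # local weighted covariances, shape (p,)
        var_x = w @ (Xc ** 2)
        var_y = w @ (yc ** 2)
        omega += (cov / np.sqrt(var_x * var_y)) ** 2   # squared local correlations
    omega /= n
    ranking = np.argsort(-omega)                       # largest statistic first
    return ranking[:top_d], omega
```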
| October 4, 2013 |
| Dr. Robert Clickner, FSU Dept. of Statistics |
| The Work Life of a Statistician in Academia, Government, and the Private Sector: A Comparative Review |
| October 4, 2013 10:00 am |
| 214 Duxbury Hall (Nursing)
|
|
| Statisticians work in a variety of work environments. These work environments can be broadly categorized as academia, government, and the private sector (or industry). While all have many things in common, there are significant differences among them in the nature of the work, the required skills, the opportunities, benefits, employer expectations, demands, constraints, rewards, and compensation. No one of these work environments is the best for everyone. This will be a nontechnical talk that will describe and discuss these similarities and differences and hopefully give you a sense of which you might prefer. |
| September 27, 2013 |
| Robert Holden |
| Failure Time Regression Models for Thinned Point Processes |
| September 27, 2013 1:00 pm |
| 215 OSB |
|
| In survival analysis, data on the time until a specific criterion event (or "endpoint") occurs are analyzed, often with regard to the effects of various predictors. In the classic applications, the criterion event is in some sense a terminal event, e.g., death of a person or failure of a machine or machine component.
In these situations, the analysis requires assumptions only about the distribution of waiting times until the criterion event occurs and the nature of the effects of the predictors on that distribution.
Suppose that the criterion event isn't a terminal event that can only occur once, but is a repeatable event. The sequence of events forms a stochastic point process.
Further suppose that only some of the events are detected (observed); the detected events form a thinned point process.
Any failure time model based on the data will be based not on the time until the first occurrence, but on the time until the first detected occurrence of the event.
The implications of estimating survival regression models from such incomplete data will be analyzed. |
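The effect of thinning on a naive failure time analysis can be seen in a small simulation. Assume, for illustration only, that events follow a homogeneous Poisson process with rate λ and that each event is detected independently with probability p. The detected events then form a Poisson process with rate λp, so the time to the first detected event is exponential with rate λp, and an exponential model fitted to first-detected times estimates λp rather than λ. Covariates and the regression structure discussed in the talk are omitted, and the function names are hypothetical.

```python
import random

def first_detected_time(rate, detect_prob, rng):
    """Time of the first detected event, assuming a Poisson process with the given rate
    and independent detection of each event with probability detect_prob."""
    if not 0.0 < detect_prob <= 1.0:
        raise ValueError("detect_prob must be in (0, 1]")
    t = 0.0
    while True:
        t += rng.expovariate(rate)          # waiting time to the next underlying event
        if rng.random() < detect_prob:      # was this event observed?
            return t

def naive_exponential_rate(times):
    """Exponential-model MLE (events / total time), treating detected times as true event times."""
    return len(times) / sum(times)

rng = random.Random(1)
times = [first_detected_time(2.0, 0.25, rng) for _ in range(20000)]
print(naive_exponential_rate(times))   # should be near 2.0 * 0.25 = 0.5, not the true rate 2.0
```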
| September 27, 2013 |
| Dr. Jim Hobert, University of Florida |
| Convergence analysis of the Gibbs sampler for Bayesian general linear mixed models with improper priors |
| September 27, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
| A popular default prior for the general linear mixed model is an
improper prior that takes a product form with a flat prior on the
regression parameter, and so-called power priors on each of the variance
components. I will describe a convergence rate analysis of the Gibbs
samplers associated with these Bayesian models. The main result is a
simple, easily-checked sufficient condition for geometric ergodicity of
the Gibbs Markov chain. (This is joint work with Jorge Roman and
Brett Presnell.)
|
| September 20, 2013 |
| Dr. Betsy Hill, Medical University of South Carolina |
| Analysis of left-censored multiplex immunoassay data: A unified approach |
| September 20, 2013 10:00 am |
| 214 Duxbury Hall (Nursing) |
|
| Multiplex immunoassays (MIAs) are moderate- to high-throughput platforms for simultaneous quantitation of a panel of analytes, and are increasingly popular as hypothesis-generating tools for targeted biomarker identification. As such, MIAs are not always rigorously validated, and often little is known about analytes’ expected concentrations in samples derived from the target population. As a consequence, MIA data can be plagued by high proportions of concentrations flagged either as ‘out-of-range’, meaning that the observed response falls below (above) the lower (upper) asymptote of a 5-parameter logistic calibration curve, or as extrapolated beyond the smallest or largest standard. We present a unified approach to the analysis of left-censored MIA data in the context of a Bayesian hierarchical model that incorporates background estimation, standard curve fitting, and modeling of observed fluorescence as a function of unobserved (latent) analyte concentration, with accommodation of left-censored concentrations via variance function specification. We present results from both a simulation study and a cytokine array analysis of serum specimens from head and neck cancer patients. |
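For reference, one common parameterization of the 5-parameter logistic (5PL) standard curve and its inverse is sketched below; the parameter names and the increasing-curve assumption (a < d, b > 0, g > 0) are ours. Responses at or below the lower asymptote cannot be back-calculated and are flagged instead, which is the situation that produces the left-censored concentrations addressed above; the Bayesian hierarchical model itself is not shown.

```python
def five_pl(x, a, b, c, d, g):
    """5-parameter logistic standard curve: expected response at concentration x.

    a: response at zero concentration (lower asymptote for an increasing curve)
    d: response at infinite concentration (upper asymptote)
    c: location (the midpoint when g = 1), b: slope, g: asymmetry
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g

def back_calculate(y, a, b, c, d, g):
    """Invert the 5PL curve; assumes an increasing curve (a < d, b > 0, g > 0)."""
    if y <= a:
        return None, "out-of-range: below lower asymptote"   # candidate left-censored value
    if y >= d:
        return None, "out-of-range: above upper asymptote"
    x = c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)
    return x, "ok"
```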
| September 13, 2013 |
| Dr. Debdeep Pati, FSU Dept. of Statistics |
| Shrinkage prior in high dimensions |
| September 13, 2013 10:00 am |
| 214 Duxbury Hall (Nursing)
|
|
| Shrinkage priors are routinely used as an alternative to point-mass mixture priors for sparse modeling in high-dimensional applications. The question of statistical optimality in such settings is under-studied in a Bayesian framework. We provide a theoretical understanding of such Bayesian procedures in terms of two key phenomena: prior concentration around sparse vectors and posterior compressibility. We demonstrate that a large class of commonly used shrinkage priors leads to sub-optimal procedures in high-dimensional settings. As a remedy, we propose a novel shrinkage prior that leads to optimal posterior concentration. A novel sampling algorithm for the proposed prior is devised, and illustrations are provided through simulation examples and an image-denoising application. An extension to massive covariance matrix estimation is discussed. |
| September 6, 2013 |
| Dr. Anuj Srivastava, FSU Dept. of Statistics |
| Statistical Techniques on Nonlinear Manifolds -- Their Contributions in Advancing Image Understanding |
| September 6, 2013 10:00 am |
| 499 Dirac Science Library |
|
| The primary goal in image understanding is to characterize objects contained in images, in terms of
their locations, motions, activities, and appearances. Due to inherent variability associated with scenes,
images, and objects, statistical approaches become natural. Any statistical approach requires mathematical
representations of objects of interest and probabilistic descriptions to capture their variabilities. The difficulty
comes from nonlinearity of object representation spaces -- these are not Euclidean and one cannot perform classical
multivariate statistics directly. Instead, one needs tools from differential geometry, for handling the nonlinearity,
and statistical techniques adapted to these manifolds for performing object characterization.
In contrast to manifold-learning problems where one estimates the underlying manifold, the representation spaces
here are fully known and one needs to exploit their geometries to develop efficient statistical tools.
An example of this situation arises in shape analysis of objects in still images and videos. While shapes
have been represented in many ways -- point sets, level-sets, parametrized curves, boundary
surfaces, etc -- their representation spaces mostly form nonlinear manifolds. Here one would like
to compare shapes, average them, develop statistical shape models, and provide tests for hypotheses involving
different shape classes, and the nonlinearity of shape manifolds presents a challenge. I will describe recent advances
in differential-geometric techniques that overcome this challenge and provide a rich set of techniques
for "elastic shape analysis". The resulting tools include computation of distances for joint shape registration
and comparisons, averaging of shapes, principal component analysis to discover modes of variation,
"Gaussian"-type shape models, and much more. This framework helps capture shape variability in datasets
very efficiently using low-dimensional manifolds and corresponding statistical models. This approach has been
applied to shape analysis in face recognition, activity classification, object recognition, medical diagnosis,
and bioinformatics.
|
| July 2, 2013 |
| Oliver Galvis |
| Hybrid Target-Category Forecasting Through Sparse Factor Auto-Regression |
| July 2, 2013 2:00 pm |
| 215 OSB |
|
| Nowadays, time series forecasting in areas such as economics and finance has become a very
challenging task given the enormous amount of information available that may, or may not, serve to
improve the prediction outcome of the series. A peculiar data structure embracing a very large number of
predictors p and a limited number of observations T, usually with p >> T, is now typical. These
data are usually grouped based on particularities of the series, providing a category data structure
where each series belongs to one group. When the objective is to forecast a univariate time series,
or target series, the AR(4) has become a standard. We believe the performance of the AR(4) can
be improved by adding additional predictors from several sources. Then, a comprehensive model
that encompasses two sources of information, one coming from the past observations of the target
series and the other denoted by the factor representation of the additional predictors, can be used
in forecasting. By taking advantage of the category data structure we advocate that any category,
especially the one containing the target series, may be explained by its own lags, and the other
categories and their lags. Consequently, a multivariate regression model naturally arises bringing
two challenges in modeling and forecasting. First, recognize the relevant predictors from a very
large pool of candidates that truly explain the target series, and eliminate data collinearity. Second,
extract the factors that better represent the significant information contained in the very large
pool of predictors. To overcome both challenges we propose the Sparse Factor Auto-Regression
(SFAR) model, which imposes low rank and cardinality control on the matrix of coefficients. In
computation we propose a new version of the SEL-RRR estimator that simultaneously attains
cardinality control on the number of nonzero rows in the matrix of coefficients to tackle the high
dimension and collinearity problems, and achieves low rank on the same matrix to extract very
informative factors. The cardinality control is achieved via the multivariate quantile thresholding
rule, while the rank reduction is obtained via an RRR decomposition. With both challenges tackled,
model interpretability and forecasting accuracy are achieved. Applications of the proposed
methodology are performed over synthetic and real world data sets. Results of the experiments
show an improvement from our model over the AR(4) in both applications.
|
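The AR(4) benchmark mentioned above can be fitted by ordinary least squares on lagged values of the target series. The sketch below, with hypothetical function names, covers only that baseline; the SFAR model and the SEL-RRR estimator are not reproduced.

```python
import numpy as np

def fit_ar(y, order=4):
    """Least-squares fit of y_t = c + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t.

    Returns the coefficient vector [c, phi_1, ..., phi_p].
    """
    y = np.asarray(y, float)
    T = len(y)
    # Column k holds lag k: row i pairs target y[order + i] with y[order + i - k].
    lags = np.column_stack([y[order - k: T - k] for k in range(1, order + 1)])
    design = np.column_stack([np.ones(T - order), lags])
    coef, *_ = np.linalg.lstsq(design, y[order:], rcond=None)
    return coef

def forecast_one_step(y, coef):
    """One-step-ahead forecast from the most recent observations."""
    order = len(coef) - 1
    recent = np.asarray(y, float)[-1:-order - 1:-1]   # y_T, y_{T-1}, ..., y_{T-p+1}
    return coef[0] + recent @ coef[1:]
```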
| July 1, 2013 |
| Seung-Yeon Ha |
| Theories on Group variable selection in multivariate regression models |
| July 1, 2013 11:00 am |
| 215 OSB |
|
| We study group variable selection in the multivariate regression model. Because there are multiple response variables, a predictor that is irrelevant to the responses corresponds to a zero row of the coefficient matrix, so group variable selection amounts to selecting the nonzero rows of that matrix. In high-dimensional setups, shrinkage estimation methods are applicable and, as the James-Stein phenomenon (1961) shows, shrinkage can yield a smaller MSE than OLS. Among shrinkage methods, we study penalized least squares estimation for group variable selection.
Among them, we study L0 regularization and L0 + L2 regularization with the goal of obtaining accurate prediction and consistent feature selection, and use the corresponding computational procedures Hard TISP and Hard-Ridge TISP (She, 2009) to address the numerical difficulties. These regularization methods perform better in both prediction and selection than the Lasso (L1 regularization), one of the most popular penalized least squares methods. L0 regularization achieves the same optimal rate of prediction loss and estimation loss as the Lasso, but it requires no restriction on the design matrix or sparsity for controlling the prediction error, and a more relaxed condition than the Lasso for controlling the estimation error. Also, for selection consistency, it requires a much more relaxed incoherence condition (a bound on the correlation between the relevant and irrelevant subsets of predictors). Therefore L0 regularization can outperform the Lasso in both prediction and sparsity recovery in practical cases where correlation is high or the true model is not very sparse.
We also study L0 + L2 regularization, which uses a combined L0 and L2 penalty. In the corresponding procedure, Hard-Ridge TISP, two parameters work independently for selection and for shrinkage (to enhance prediction), respectively, so it performs better than L0 regularization in some cases, such as low signal strength. For L0 regularization, the single regularization parameter controls selection but is tuned for prediction accuracy. L0 + L2 regularization attains the optimal rate of prediction and estimation errors without any restriction when the coefficient of the L2 penalty is chosen appropriately. Furthermore, it can achieve a better rate of estimation error with an ideal choice of block-wise weights for the L2 penalty.
|
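The row-wise selection idea can be sketched as an iterative thresholding loop in the spirit of TISP: take a gradient step on the least squares loss, zero every row of the coefficient matrix whose L2 norm is at most a threshold, and, for the hard-ridge variant, shrink the surviving rows by 1/(1 + η). This is a simplified illustration under stated assumptions (step scaling by a constant `k0` just above the spectral norm of `X`, and `lam` applied directly to row norms); see She (2009) for the exact procedures and for how the threshold relates to the penalty level. Function names are hypothetical.

```python
import numpy as np

def group_hard_ridge(R, lam, eta=0.0):
    """Row-wise hard-ridge thresholding: zero rows with L2 norm <= lam, shrink the rest by 1/(1+eta)."""
    keep = np.linalg.norm(R, axis=1) > lam
    return np.where(keep[:, None], R / (1.0 + eta), 0.0)

def group_tisp(X, Y, lam, eta=0.0, n_iter=500):
    """Iterative thresholding for a row-sparse B in Y = X B + E (eta = 0 gives pure hard thresholding)."""
    X = np.asarray(X, float)
    Y = np.asarray(Y, float)
    k0 = np.linalg.norm(X, 2) * 1.01          # assumed step scaling: slightly above the spectral norm of X
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        # Gradient step on 0.5 * ||Y - X B||^2 with step 1 / k0^2, then row-wise thresholding.
        B = group_hard_ridge(B + X.T @ (Y - X @ B) / k0**2, lam, eta)
    return B
```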
| June 27, 2013 |
| Felicia Williams |
| The Relationship of Diabetes to Coronary Heart Disease Mortality: A Meta-Analysis Based on Person-level Data |
| June 27, 2013 2:00 pm |
| 215 OSB |
|
| Studies have suggested that diabetes is a stronger risk factor for coronary heart disease (CHD) in women than in men. We present a meta-analysis of person-level data from 42 cohort studies in which diabetes, CHD mortality, and potential confounders were available and a minimum of 75 CHD deaths occurred. These studies followed up 77,863 men and 84,671 women aged 42 to 73 years on average from the US, Denmark, Iceland, Norway, and the UK. Individual study prevalence rates of self-reported diabetes mellitus at baseline ranged from less than 1% in the youngest cohort to 15.7% (males) and 11.1% (females) in the NHLBI CHS study of the elderly. CHD death rates varied between 2% and 20%. A meta-analysis was performed to calculate overall hazard ratios (HR) of CHD mortality among diabetics compared to non-diabetics using Cox proportional hazards models. The random-effects HR associated with baseline diabetes and adjusted for age was significantly higher for females, 2.65 (95% CI: 2.34, 2.96), than for males, 2.33 (95% CI: 2.07, 2.58) (p=0.004). These estimates were similar to the random-effects estimates adjusted additionally for serum cholesterol, systolic blood pressure, and current smoking status: females 2.69 (95% CI: 2.35, 3.03) and males 2.32 (95% CI: 2.05, 2.59). They also agree closely with estimates (odds ratios of 2.9 for females and 2.3 for males) obtained in a recent meta-analysis of 50 studies of both fatal and nonfatal CHD that was not based on person-level data. This evidence suggests that diabetes diminishes the female advantage. An additional analysis by race was performed on only 14 cohorts. It showed no significant difference between the black and white cohorts before (p=0.68) or after adjustment for the major CHD risk factors (p=0.88). The limited number of studies may lack the power to detect any differences. |
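The abstract does not say which random-effects estimator was used. A common choice is the DerSimonian-Laird method, sketched below on per-study log hazard ratios and their standard errors. The estimator choice and the function names are assumptions for illustration, not the study's code or data.

```python
import math

def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling of per-study log hazard ratios with the DerSimonian-Laird tau^2.

    Returns (pooled log HR, its standard error, estimated between-study variance tau^2).
    """
    w = [1.0 / s**2 for s in std_errors]                          # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))   # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (s**2 + tau2) for s in std_errors]              # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2

def hazard_ratio_ci(pooled_log_hr, se, z=1.96):
    """Pooled HR with a Wald 95% CI computed on the log scale."""
    return (math.exp(pooled_log_hr),
            math.exp(pooled_log_hr - z * se),
            math.exp(pooled_log_hr + z * se))
```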
| June 19, 2013 |
| Yuanyuan Tang |
| Bayesian Methods for Skewed Response including Longitudinal and Heteroscedastic Data |
| June 19, 2013 3:00 pm |
| 215 OSB |
|
| Skewed response data are very common in practice, especially in the biomedical area.
We begin with skewed longitudinal responses. We present a partial linear model for the median regression function of a skewed longitudinal response. We provide justifications for our methods, including a theoretical investigation of the support of the prior, asymptotic properties of the posterior, and simulation studies of finite-sample properties. Ease of implementation and advantages of our model and method compared to existing methods are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers.
We then study skewed and heteroscedastic univariate responses. We present a novel extension of the transform-both-sides model to Bayesian variable selection that simultaneously performs variable selection and parameter estimation.
Finally, we propose a novel Latent Variable Residual Density (LV-RD) model to handle skewed univariate responses with flexible heteroscedasticity. The advantages of the associated semiparametric Bayes method include ease of prior elicitation/determination, easily implementable posterior computation, theoretically sound properties of the prior selection, and accommodation of possible outliers.
|
| June 18, 2013 |
| Ester Kim Nilles |
| An Ensemble Approach to Predicting Health Outcomes |
| June 18, 2013 10:00 am |
| 215 OSB |
|
| Heart disease and premature birth continue to be the leading causes of mortality and neonatal mortality, respectively, in large parts of the world. They are also estimated to have the highest medical expenditures in the United States.
Early detection of heart disease incidence plays a critical role in preserving heart health, and identifying pregnancies at high risk of premature birth provides highly valuable information for early interventions. Over the past few decades, identification of patients at high health risk has been based on logistic regression or Cox proportional hazards models. In recent years, machine learning models have grown in popularity within the medical field for their superior predictive and classification performance over classical statistical models. However, their performance in heart disease and premature birth prediction has been comparable and inconclusive, leaving the question of which model most accurately reflects the data difficult to resolve.
Our aim is to incorporate information learned by different models into one final model that generates superior predictive performance. We first compare widely used machine learning models (the multilayer perceptron network, k-nearest neighbor, and support vector machine) to the statistical models logistic regression and Cox proportional hazards. Then the individual models are combined into one in an ensemble approach, also referred to as ensemble modeling. The proposed approaches include SSE-weighted, AUC-weighted, logistic, and flexible naive Bayes ensembles.
The individual models are unique and capture different aspects of the data, but, as expected, no individual model outperforms the others. The ensemble approach is easily computed, eliminates the need to select one model, integrates the strengths of different models, and generates optimal performance. Particularly in cases where the risk factors associated with an outcome are elusive, such as premature birth, the ensemble models significantly improve prediction.
|
| May 15, 2013 |
| Jingyong Su, FSU Dept. of Statistics, Dissertation Defense |
| Statistical Analysis of Trajectories on Riemannian Manifolds |
| May 15, 2013 2:00 pm |
| OSB 215 |
|
| This thesis mainly focuses on statistical analysis of trajectories that take values on nonlinear manifolds. There are many difficulties in analyzing temporal trajectories on nonlinear manifolds. First, the observed data are always noisy and discrete at unsynchronized times. Second, trajectories are observed under arbitrary temporal evolutions. In this work, we first address the problem of estimating full smooth trajectories on nonlinear manifolds using only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. Furthermore, we study statistical analysis of trajectories that take values on nonlinear Riemannian manifolds and are observed under arbitrary temporal evolutions. Problems in analyzing such temporal trajectories, including registration, comparison, modeling and evaluation, arise in many applications. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance, in turn, is used to define statistical summaries of given trajectories, such as sample means and covariances, and Gaussian-type models to capture their variability. Both theoretical proofs and experimental results are provided to validate our work. |
| May 13, 2013 |
| Yingfeng Tao |
| THE FREQUENTIST PERFORMANCE OF SOME BAYESIAN CONFIDENCE INTERVALS FOR THE SURVIVAL FUNCTION |
| May 13, 2013 11:00 am |
| 215 OSB |
|
| Estimation of a survival function is a very important topic in survival analysis, with contributions from many authors. This dissertation considers estimation of confidence intervals for the survival function based on right-censored or interval-censored survival data.
Most of the methods for estimating pointwise confidence intervals and simultaneous confidence bands of the survival function are reviewed in this dissertation. In the right-censored case, almost all confidence intervals are based in some way on the Kaplan-Meier estimator first proposed by Kaplan and Meier (1958) and widely used as the nonparametric estimator in the presence of right-censored data. For interval-censored data, the Turnbull estimator (Turnbull (1974)) plays a similar role.
For a class of Bayesian models involving Dirichlet priors, Doss and Huffer (2003) suggested several simulation techniques to approximate the posterior distribution of the survival function by using Markov chain Monte Carlo or sequential importance sampling. These techniques lead to probability intervals for the survival function (at arbitrary time points) and its quantiles for both the right-censored and interval-censored cases. This dissertation will examine the frequentist properties and general performance of these probability intervals when the prior is non-informative. Simulation studies will be used to compare these probability intervals with other published approaches. Extensions of the Doss-Huffer approach are given for constructing simultaneous confidence bands for the survival function and for computing approximate confidence intervals for the survival function based on Edgeworth expansions using posterior moments. The performance of these extensions is studied by simulation.
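The Kaplan-Meier product-limit estimator mentioned above works as follows. At each distinct event time t, it multiplies the running estimate by the conditional survival factor (1 - d/n). Here d is the number of failures at t and n is the number still at risk just before t. The sketch below is a minimal version under stated assumptions: the data are right-censored (time, event) pairs, with event = 1 for an observed failure and 0 for censoring. Subjects censored at t are counted as at risk at t, which is the usual convention. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed failure.

    times  : follow-up times
    events : 1 if the failure was observed at that time, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (failures and censorings) that share time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival probability at t.
            surv *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, surv))
        # Everyone observed at t, failed or censored, leaves the risk set afterwards.
        n_at_risk -= removed
    return curve
```

Take times [1, 2, 2, 3, 4] with events [1, 1, 0, 1, 0]. The estimate drops to 0.8, 0.6 and 0.3 at times 1, 2 and 3.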
|
| May 6, 2013 |
| Darshan Bryner, FSU Dept. of Statistics, Essay Defense |
| Bayesian Active Contours with Affine-Invariant, Elastic Shape Priors |
| May 6, 2013 10:00 am |
| OSB 215 |
|
| Active contours, especially in conjunction with prior-shape models, have become an important tool in image segmentation. However, most active contour methods use shape priors based on similarity-shape analysis, i.e., analysis that is invariant to rotation, translation, and scale. In practice, the training shapes used for prior-shape models may be collected from viewing angles different from those of the test images and therefore require invariance to a larger class of transformations. Using an elastic, affine-invariant shape modeling of planar curves, we propose an active contour algorithm in which the training and test shapes can be at arbitrary affine transformations, and the resulting segmentation is robust to perspective skews. We construct a shape space of affine-standardized curves and derive a statistical model for capturing class-specific shape variability. The active contour is then driven by the gradient of a total energy composed of a data term, a smoothing term, and an affine-invariant shape-prior term. This framework is demonstrated using a number of examples involving real images and the segmentation of shadows in sonar images of underwater objects. |
| May 1, 2013 |
| Katie Hillebrandt, FSU Dept. of Statistics, Essay Defense |
| Characterizations of Complex Signals Using Functional ANOVA |
| May 1, 2013 1:00 pm |
| OSB 215 |
|
| Many methods exist for detecting differences in functional data. However, most of these methods make assumptions about the noise on the signals, and the usual assumption is that the noise is normally distributed. I would like to be able to discern differences in time-dependent signals that represent the rate of flow of water exiting the spring of a karstic springshed under different treatments. These signals are simulated using a complex system called KFM developed by Chicken et al. [2007]. Because of the complex nature of KFM, even if the distribution of the noise on the inputs is known, the simulated signal has unknown noise components that are autocorrelated. This is further complicated by the fact that the input noise has constraints that do not allow it to be normally distributed. The resulting signal therefore has extremely non-normal noise, which makes established methods for detecting differences in signals inappropriate.
The treatments under which the signals will be simulated are characteristics of the underground path along which the water flows. Characteristics of interest include the length of the underground channels, called conduits or active connections, and the total number of conduits. It is beneficial to be able to detect differences in flow rate signals for different types of paths because the entire underground path is typically difficult to map, so its characteristics are usually unknown. If differences in discharge signals are detected for different treatments, the treatment level for the underground waterway can be inferred from the measured discharge at the spring. It is also useful to be able to determine the flow rate for different types of springsheds under different weather scenarios and in the case of environmental disasters, such as the spill of a contaminant in the springshed. Being able to classify the output of a spring based on its treatment level can help in predicting the discharge rate after a weather event or the flow of a contaminant through the system.
Details of KFM are presented, along with methods of simulating rain and discharge signals under different treatments. Several established tests for functional data are discussed, together with areas for research into a method modified for signals whose noise components have an unknown distribution. Specific areas of interest include power calculations, treatment selection methods, and follow-up tests for identifying which signals differ after overall differences are detected.
|
| April 29, 2013 |
| Wade Henning, FSU Dept. of Statistics, Essay Defense |
| Characterizing the Shape Populations of Particle Sets |
| April 29, 2013 2:00 pm |
| OSB 215 |
|
| Creating statistical shape models for particle sets is an important goal in particle science as researchers seek to classify them, predict their behaviors, or optimize their process parameters. Statistical shape analysis is becoming the standard approach for comparing and classifying the shapes of curves: pairwise distances are measured between boundary functions and used to construct means and covariances. However, no method currently exists for comparing shape populations. This essay uses the shape distributions of particle sets to make inferences about their implicit shape populations. A method is introduced for estimating the Fisher-Rao distance between shape populations using elastic shape analysis, kernel density estimation and Monte Carlo methods. The F-R distance is calculated between probability density functions on R and between shape populations on shape manifolds. The results provide strong empirical evidence that the Fisher-Rao distance between shape populations is a discriminating measurement for comparing particle sets. Statistical modeling based on shape populations promises to revolutionize the analysis of particle sets and their processes. |
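For densities on R, the Fisher-Rao distance has a closed form through the square-root representation. Mapping a density p to √p places it on the unit sphere in L². The geodesic distance is then arccos ∫√(p₁p₂) dx; some authors multiply this by a constant such as 2. The sketch below estimates the distance from two samples, using Gaussian kernel density estimates and midpoint-rule integration. It covers only the density-on-R step. The bandwidth, the grid and the function names are assumptions, not the method from the essay.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate of `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def fisher_rao_distance(sample1, sample2, bandwidth=0.5, grid_size=2000):
    """Estimate arccos of the integral of sqrt(p1 * p2), with p1, p2 kernel density estimates."""
    lo = min(min(sample1), min(sample2)) - 5.0 * bandwidth
    hi = max(max(sample1), max(sample2)) + 5.0 * bandwidth
    p1 = gaussian_kde(sample1, bandwidth)
    p2 = gaussian_kde(sample2, bandwidth)
    step = (hi - lo) / grid_size
    # Midpoint rule for the inner product <sqrt(p1), sqrt(p2)> (the Bhattacharyya coefficient).
    coeff = step * sum(
        math.sqrt(p1(x) * p2(x))
        for x in (lo + (i + 0.5) * step for i in range(grid_size))
    )
    # Clip to [0, 1] to guard against numerical error before taking arccos.
    return math.acos(max(0.0, min(1.0, coeff)))
```

Identical samples give a distance near 0, although some numerical error remains. Two samples whose kernel estimates do not overlap give a value near π/2, which is the maximum of this arccos form.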
| April 26, 2013 |
| Paul Beaumont, FSU Department of Economics |
| Generalized Impulse Response Functions and the Spillover Index |
| April 26, 2013 10:00 am |
| 108 OSB |
|
| Impulse response functions (IRF) and forecast error variance decompositions (FEVD) from vector autoregression (VAR) systems depend upon the order of the variables in the VAR. We show how to compute the order-independent generalized IRF and generalized FEVD and compare them to the results of all possible VAR orderings. We then show that the FEVD-related spillover index does not translate well to the generalized case and produces index values well outside the range produced by all permutations of orderings. We illustrate the methods with an application to spillover effects of economic growth rates across countries. |
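For reference, below is a commonly used form of the generalized FEVD (Pesaran and Shin) and of the Diebold-Yilmaz spillover index built from it. The talk may use a different normalization. The notation is as follows:

- $A_h$ are the moving-average coefficient matrices of the VAR.
- $\Sigma$ is the error covariance, with diagonal elements $\sigma_{jj}$.
- $e_i$ is the $i$-th unit vector.
- $N$ is the number of variables and $H$ is the forecast horizon.

```latex
\theta^{g}_{ij}(H) = \frac{\sigma_{jj}^{-1} \sum_{h=0}^{H-1} \left( e_i^\top A_h \Sigma e_j \right)^2}{\sum_{h=0}^{H-1} e_i^\top A_h \Sigma A_h^\top e_i},
\qquad
\tilde{\theta}^{g}_{ij}(H) = \frac{\theta^{g}_{ij}(H)}{\sum_{j=1}^{N} \theta^{g}_{ij}(H)},
\qquad
S^{g}(H) = 100 \cdot \frac{1}{N} \sum_{\substack{i,j=1 \\ i \neq j}}^{N} \tilde{\theta}^{g}_{ij}(H)
```

In the Cholesky-based FEVD, the shares in each row sum to one by construction. The rows of $\theta^{g}_{ij}(H)$ generally do not sum to one when the shocks are correlated. This is why a row normalization is applied before the index is formed.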
| April 19, 2013 |
| Wei Wu, FSU Dept. of Statistics |
| Time Warping Method and Its Applications |
| April 19, 2013 10:00 am |
| 108 OSB |
|
| In this talk, I will summarize my research on time warping over the past 2-3 years. Focusing on statistical analysis of functional data, we have recently developed a novel geometric framework to compare, align, average, and model a collection of random functional observations, where the key step is to find an optimal time warping between two functions for a feature-to-feature alignment. This framework can be easily extended to analyzing multi-dimensional curves and point process observations. The theoretical underpinning of this framework is established by proving consistency under a semi-parametric model. Mathematical modeling of the relation between two time warpings also leads to a parametric representation for spherical regression. Finally, I will demonstrate this new framework using experimental data in various application domains such as SONAR signals, ECG bio-signals, and spike recordings in the geniculate ganglion.
|
| April 15, 2013 |
| Michael Rosenthal, Ph.D. Candidate, Essay Defense |
| Advances in Spherical Regression |
| April 15, 2013 3:30 pm |
| OSB 215 |
|
|
The ability to define correspondences between paired observations on a spherical manifold has applications in earth science, medicine, and image analysis. Spherical data come in many forms, including geographical coordinates from plate tectonics, clouds, and GPS devices. They can also be directional in nature, such as data from vector cardiograms, winds, currents, and tides. Spherical data are unit vectors of arbitrary dimension and can be viewed as points on the hypersphere manifold. Examples of such data often include sounds, signals, shapes, and images. The Riemannian geometry of these hyperspheres is well known and can be utilized for unit vector data of arbitrary dimension. Past work in spherical regression involves either flattening the spherical manifold to a linear space or imposing rigid restrictions on the nature of the correspondence between predictor and response variables. While these methods have their advantages in certain settings, they have severe limitations that make them inappropriate in a variety of other settings. We propose a method that extends the framework to allow a very flexible nonparametric form of correspondence for data on the two-dimensional sphere. |
| April 12, 2013 |
| Karim Lounici, Georgia Tech |
| Variable Selection with Exponential Weights |
| April 12, 2013 10:00 am |
| 108 OSB |
|
| In the context of a linear model with a sparse coefficient vector, exponential weights methods have been shown to achieve oracle inequalities for prediction. We show that such methods also succeed at variable selection and estimation under the necessary identifiability condition on the design matrix, instead of the much stronger assumptions required by other methods such as the Lasso or the Dantzig Selector. The same analysis yields consistency results for Bayesian methods and BIC-type variable selection under similar conditions.
Joint work with Ery Arias-Castro.
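One generic form of exponential weighting is shown below to make the idea concrete. Several elements are assumptions that vary across papers: the temperature $\beta > 0$ and the prior weights $\pi_S$ over candidate supports $S \subseteq \{1, \dots, p\}$. Here $\hat\beta_S$ denotes the least-squares fit using only the columns of $X$ indexed by $S$. As one possible way, assumed here, a selection can be read off the aggregated weights through each variable's total weight $q_j$.

```latex
w_S = \frac{\pi_S \exp\left(-\lVert y - X\hat\beta_S \rVert_2^2 / \beta\right)}
           {\sum_{S'} \pi_{S'} \exp\left(-\lVert y - X\hat\beta_{S'} \rVert_2^2 / \beta\right)},
\qquad
\hat\beta^{\mathrm{EW}} = \sum_{S} w_S \, \hat\beta_S,
\qquad
q_j = \sum_{S \ni j} w_S
```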
|
| April 5, 2013 |
| Russell G. Almond, FSU |
| A Particle Filter EM Algorithm for Estimating Parameters of a Partially Observed Markov Decision Process (POMDP) |
| April 5, 2013 10:00 am |
| 108 OSB |
|
| Periodic assessment involves a series of assessments, given to the same collection of individuals at several time points, that are intended to measure a complex of related competencies. One challenge with these models is that student competencies will grow over time in response to the instructional activities that occur between assessments. Partially observed Markov decision process (POMDP) models, a generalization of the hidden Markov model (HMM) or state space model, can capture this dynamic. The model relates a series of observable variables to a series of latent variables, which are assumed to change over time. The relationship between the observed and latent variables at each time point is governed by a matrix that reflects the design of the assessment. The latent variables are assumed to change according to a Markov model that is governed by a series of instructional activities. This talk provides an example of a POMDP model and describes a method that combines the particle filter and stochastic EM algorithms to estimate the parameters of POMDPs from panel data coming from the administration of periodic assessments.
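The filtering half of the method can be illustrated with a bootstrap particle filter for a discrete-state model. In this model, each instructional activity (action) selects a transition matrix. This minimal sketch covers only filtering with known parameters. The stochastic EM step that re-estimates the matrices from panel data is not shown. The argument layout and function name are hypothetical.

```python
import random
from collections import Counter


def particle_filter(observations, actions, init, transition, emission,
                    n_particles=1000, seed=0):
    """Bootstrap particle filter for a discrete latent-state POMDP with known parameters.

    observations[t]      : observed category at time t
    actions[t]           : activity applied between times t-1 and t (actions[0] is ignored)
    init[s]              : P(S_0 = s)
    transition[a][s][s2] : P(S_t = s2 | S_{t-1} = s, action a)
    emission[s][o]       : P(O_t = o | S_t = s)
    Returns, for each time t, a dict mapping state -> estimated P(S_t = s | O_0..O_t).
    """
    rng = random.Random(seed)
    states = list(range(len(init)))
    particles = rng.choices(states, weights=init, k=n_particles)
    filtered = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Propagate each particle through the transition chosen by this activity.
            matrix = transition[actions[t]]
            particles = [rng.choices(states, weights=matrix[s])[0] for s in particles]
        # Weight each particle by the likelihood of the observation.
        weights = [emission[s][obs] for s in particles]
        total = sum(weights)
        if total == 0:
            raise ValueError(f"no particle is consistent with the observation at t={t}")
        probs = Counter()
        for s, w in zip(particles, weights):
            probs[s] += w / total
        filtered.append(dict(probs))
        # Multinomial resampling to equalize weights before the next step.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return filtered
```

As a hypothetical example, consider two latent states (0 = not yet mastered, 1 = mastered). A single instructional activity has the transition matrix [[0.7, 0.3], [0.0, 1.0]], and the observation matrix is error-free. With these settings, the filter puts all probability on the observed state at each time.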
|
| March 29, 2013 |
| Genevera Allen, Rice University |
| High-Dimensional Poisson Graphical Models |
| March 29, 2013 10:00 am |
| OSB 108 |
|
|
Markov networks, especially Gaussian graphical models and Ising models, have become a popular tool for studying relationships in high-dimensional data. Variables in many data sets, however, consist of count data that may not be well modeled by Gaussian or multinomial distributions. Examples include high-throughput genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing methods for Poisson graphical models include the Poisson Markov Random Field (MRF) of Besag (1974), which places severe restrictions on the types of dependencies, permitting only negative correlations between variables. By restricting the domain of the variables in this joint density, we introduce a Winsorized Poisson MRF, which permits a rich dependence structure and whose pairwise conditional densities closely approximate the Poisson distribution. An important consequence of our model is that it gives an analytical form for a multivariate Poisson density with rich dependencies; previous multivariate densities permitted only positive or only negative dependencies. We develop neighborhood selection algorithms to estimate network structure from high-dimensional count data by fitting graphical models based on Besag's MRF, our Winsorized Poisson MRF, and a local approximation to the Winsorized Poisson MRF. We also provide theoretical results illustrating the conditions under which these algorithms recover the network structure with high probability. Through simulations and an application to breast cancer microRNAs measured by next-generation sequencing, we demonstrate the advantages of our methods for network recovery from count data. This is joint work with Zhandong Liu, Pradeep Ravikumar and Eunho Yang. |
| March 22, 2013 |
| Xiaoming Huo, Georgia Tech |
| Detectability and Related Theorems |
| March 22, 2013 10:00 am |
| OSB 110 |
|
| The detectability problem asks when certain types of underlying structure are detectable from noisy images. The methodology is based on analyzing the pattern of a collection of local tests. The aggregation of these test results needs to ensure both statistical efficiency and low computational complexity. In particular, certain testing methods depend on the distribution of the length of the longest chains that connect locally significant hypothesis tests. The asymptotic distribution of these largest lengths reveals properties of the test. I will describe some optimality guarantees for the proposed detection methods. The talk focuses on the statistical aspects of the problem; the audience needs only knowledge of hypothesis testing and asymptotic theory. The strategy of testing locally and deciding globally may have applications in other statistical problems in which the alternative hypothesis is composite, complicated or overwhelming. The relation between detectability and percolation theory will be discussed. |
| March 20, 2013 |
| Jose Laborde, Ph.D. Candidate, Essay Defense |
| Elastic Shape Analysis of Amino Acid and Nucleotide Biomolecules |
| March 20, 2013 3:30 pm |
| OSB 215 |
|
| This work aims to develop methods for shape analysis of biomolecules represented as parameterized 3D open curves, for which added sequence/secondary structure information can be jointly compared. This requires adjusting Elastic Shape Analysis (ESA) methods, since we can assume neither equally spaced 3D points nor the same number of points in the two structures being compared. It also requires a biologically relevant way to incorporate such additional information through a correct choice of auxiliary function. ESA has mostly been applied to equally re-sampled versions of the original curves, so this work also aims to eliminate this re-sampling step; this will enable us to incorporate sequence/secondary structure information more naturally as auxiliary coordinates beyond the 3D coordinates. The ESA framework requires a Riemannian metric that allows (1) re-parameterizations of curves by isometries and (2) efficient computation of geodesic paths between curves. These tools allow for computing Karcher means and covariances (using tangent PCA) for shape classes, and a probabilistic classification of curves. To solve these problems we first introduced a mathematical representation of curves, called q-functions, and used the L^2 metric on the space of q-functions to induce a Riemannian metric on the space of parameterized curves. This process requires optimal registration of curves and achieves superior alignment of them. Mean shapes and their covariance structures can be used to specify a normal probability model on shape classes, which can then be used for classifying test shapes. We have also achieved superior classification rates compared to state-of-the-art methods on their RNA datasets, which has led to the acceptance of our work in the journal Nucleic Acids Research. |
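In the elastic shape analysis literature, the q-function commonly denotes the square-root velocity function $q(t) = \beta'(t)/\sqrt{|\beta'(t)|}$. Under this representation the elastic metric becomes the ordinary L² metric. The sketch below is a minimal version under stated assumptions. It computes q by finite differences on each segment of a sampled 3D curve. The time steps may be unequal, so no re-sampling is needed. It does not model the auxiliary sequence or secondary-structure coordinates, and the function names are hypothetical. A useful sanity check is that ∫|q(t)|² dt equals the curve's length.

```python
import math


def srvf(points, times):
    """Square-root velocity q = b'/sqrt(|b'|) on each segment of a sampled curve.

    points : coordinates b(t_0), ..., b(t_m), e.g. 3D atom positions
    times  : strictly increasing parameter values (need not be equally spaced)
    Returns one q vector per segment [t_i, t_{i+1}].
    """
    q = []
    for i in range(len(points) - 1):
        dt = times[i + 1] - times[i]
        velocity = [(b - a) / dt for a, b in zip(points[i], points[i + 1])]
        speed = math.sqrt(sum(c * c for c in velocity))
        if speed == 0.0:
            q.append([0.0] * len(velocity))  # stationary segment: q is zero
        else:
            q.append([c / math.sqrt(speed) for c in velocity])
    return q


def squared_l2_norm(q, times):
    """Piecewise-constant approximation of the integral of |q(t)|^2 dt."""
    return sum(
        sum(c * c for c in qi) * (times[i + 1] - times[i]) for i, qi in enumerate(q)
    )
```

For the polyline through (0, 0, 0), (3, 4, 0) and (3, 4, 12) at times 0, 0.5 and 2, `squared_l2_norm` returns the curve length 17. The result would be the same for any spacing of the times.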
| March 19, 2013 |
| Gretchen Rivera, FSU, Dissertation Defense |
| Meta Analysis and Meta Regression of a Measure of Discrimination used in Prognostic Modeling. |
| March 19, 2013 3:00 pm |
| OSB 215 |
|
| In this paper we are interested in predicting death with coronary heart disease (CHD) as the underlying cause. There are two prognostic modeling methods used to predict CHD: the logistic model and the proportional hazards model. For this paper we consider the logistic model.
The dataset used is the Diverse Populations Collaboration (DPC) dataset, which includes 28 studies. The DPC dataset contains epidemiological results from investigations conducted in different populations around the world. For our analysis we include individuals who are 17 years old or older. The predictors are age, diabetes, total serum cholesterol (mg/dl), high-density lipoprotein (mg/dl), systolic blood pressure (mmHg) and current cigarette smoking status. There is a natural grouping within the studies by characteristics such as gender, rural or urban area and race. Based on these strata we have 84 cohort groups.
Our main interest is to evaluate how well the prognostic model discriminates. For this, we used the area under the Receiver Operating Characteristic (ROC) curve. The main idea of the ROC curve is that each subject in a set is known to belong to one of two classes (signal or noise group). An assignment procedure then assigns each subject to a class on the basis of the information observed. The assignment procedure is not perfect: sometimes a subject is misclassified. To evaluate the performance of this procedure we used the area under the ROC curve (AUROC), which varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy). For each logistic model we found the AUROC and its standard error (SE). We used meta-analysis to summarize the estimated AUROCs and to evaluate whether there is heterogeneity in our estimates, using the Q statistic to test for significant heterogeneity. Since heterogeneity was found in our study, we compare seven different methods for estimating τ² (the between-study variance). We conclude by examining whether differences in study characteristics explain the heterogeneity in the AUROC values.
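The heterogeneity steps can be sketched with the Q statistic and the DerSimonian-Laird moment estimator of τ². DerSimonian-Laird is one common choice among the estimators such a comparison would include; which seven methods the study compares is not stated here. The inputs are the per-cohort AUROC estimates and their standard errors. The function name is hypothetical, and at least two studies are required.

```python
def dersimonian_laird(estimates, std_errors):
    """Cochran's Q, the DerSimonian-Laird tau^2, and the random-effects pooled estimate.

    estimates  : per-study effect estimates (e.g. AUROCs), at least two
    std_errors : their standard errors
    """
    w = [1.0 / se ** 2 for se in std_errors]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # method-of-moments estimate, truncated at zero
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return q, tau2, pooled
```

Consider two studies with estimates 0 and 2 and unit standard errors. They give Q = 2, τ² = 1 and a pooled estimate of 1. Q is usually referred to a chi-square distribution with k - 1 degrees of freedom to test for heterogeneity, where k is the number of studies.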
|
| March 8, 2013 |
| Yongtao Guan, University of Miami |
| Optimal Estimation of the Intensity Function of a Spatial Point Process |
| March 8, 2013 10:00 am |
| 108 OSB |
|
| Although optimal from a theoretical point of view, maximum likelihood estimation for Cox and cluster point processes can be cumbersome in practice due to the complicated nature of the likelihood function and the associated score function. It is therefore of interest to consider alternative, more easily computable estimating functions. We derive the optimal estimating function in a class of first-order estimating functions. The optimal estimating function depends on the solution of a certain Fredholm integral equation and reduces to the likelihood score in the case of a Poisson process. We discuss the numerical solution of the Fredholm integral equation and note that a special case of the approximated solution is equivalent to a quasi-likelihood for binary spatial data. The practical performance of the optimal estimating function is evaluated in a simulation study and a data example. |
| March 4, 2013 |
| Kelly McGinnity, FSU, Dissertation Defense |
| Nonparametric Wavelet Thresholding and Profile Monitoring for Non-Gaussian Errors |
| March 4, 2013 11:00 am |
| 215 OSB |
|
| Recent advancements in data collection allow scientists and researchers to obtain massive amounts of information in short periods of time. Often these data are functional and quite complex. Wavelet transforms are popular, particularly in the engineering and manufacturing fields, for handling this type of complicated signal.
A common application of wavelets is in statistical process control (SPC), in which one tries to determine as quickly as possible if and when a sequence of profiles has gone out of control. However, few wavelet methods have been proposed that do not rely in some capacity on the assumption that the observational errors are normally distributed. This dissertation aims to fill this void by proposing a simple, nonparametric, distribution-free method of monitoring profiles and estimating changepoints. Using only the magnitudes and location maps of thresholded wavelet coefficients, our method uses the spatial adaptivity property of wavelets to accurately detect profile changes when the signal is obscured by a variety of non-Gaussian errors.
Wavelets are also widely used for the purpose of dimension reduction. Applying a thresholding rule to a set of wavelet coefficients results in a "denoised" version of the original function. Once again, existing thresholding procedures generally assume independent, identically distributed normal errors. Thus, the second main focus of this dissertation is a nonparametric method of thresholding that does not assume Gaussian errors, or even that the form of the error distribution is known. We improve upon an existing even-odd cross-validation method by employing block thresholding and level dependence, and show that the proposed method works well on both skewed and heavy-tailed distributions. Such thresholding techniques are essential to the SPC procedure developed above. |
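The sketch below is a simplified illustration of thresholding with even-odd cross-validation. It uses a Haar transform with one global hard threshold. The threshold is chosen by predicting each half of the data (even or odd indices) from the denoised other half. It assumes a signal length that is a power of two. It does not include the block thresholding or level dependence that the dissertation adds. Published versions of this idea also adjust the selected threshold for the halved sample size; no such adjustment is made here. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Full Haar DWT of a signal whose length is a power of two.

    Returns (approximation, details), with detail levels ordered finest first.
    """
    a = [float(v) for v in x]
    details = []
    while len(a) > 1:
        half = len(a) // 2
        details.append([(a[2 * i] - a[2 * i + 1]) / SQRT2 for i in range(half)])
        a = [(a[2 * i] + a[2 * i + 1]) / SQRT2 for i in range(half)]
    return a, details


def haar_inverse(a, details):
    """Invert haar_forward."""
    a = list(a)
    for d in reversed(details):
        out = []
        for ai, di in zip(a, d):
            out.extend([(ai + di) / SQRT2, (ai - di) / SQRT2])
        a = out
    return a


def hard_threshold_denoise(x, t):
    """Zero every detail coefficient with magnitude <= t, keep the rest, and reconstruct."""
    a, details = haar_forward(x)
    kept = [[d if abs(d) > t else 0.0 for d in level] for level in details]
    return haar_inverse(a, kept)


def even_odd_cv_threshold(x, candidates):
    """Choose t by predicting each half of the data from the denoised other half."""
    even, odd = x[0::2], x[1::2]
    n = len(even)
    best_t, best_err = None, float("inf")
    for t in candidates:
        fe = hard_threshold_denoise(even, t)
        fo = hard_threshold_denoise(odd, t)
        err = 0.0
        for i in range(n):
            # x[2i+1] sits between x[2i] and x[2i+2]: average the denoised even samples.
            right = fe[i + 1] if i + 1 < n else fe[i]
            err += ((fe[i] + right) / 2.0 - odd[i]) ** 2
            # x[2i] sits between x[2i-1] and x[2i+1]: average the denoised odd samples.
            left = fo[i - 1] if i > 0 else fo[i]
            err += ((left + fo[i]) / 2.0 - even[i]) ** 2
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

The selected threshold can then be passed to `hard_threshold_denoise` for the full signal. Level-dependent or block rules would replace the single global `t`.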
| March 1, 2013 |
| Brian C. Monsell, US Census Bureau |
| Research at the Census Bureau |
| March 1, 2013 10:00 am |
| 108 OSB |
|
| The Census Bureau has taken steps to reinforce the role of research within the organization. This talk will give details on the role of statistical research at the U.S. Census Bureau. There are renewed opportunities for internships and collaboration with those in the academic community. Details on areas of research important to the Census Bureau will be shared, with particular attention paid to the status of current work in time series analysis and statistical software development. |
| February 27, 2013 |
| Rachel Becvarik, FSU, Dissertation Defense |
| Nonparametric Nonstationary Density Estimation Including Upper Control Limit Methods for Detecting Change Points |
| February 27, 2013 10:00 am |
| 215 OSB |
|
| Nonstationary nonparametric densities occur naturally in applications such as monitoring the amount of toxins in the air and monitoring internet streaming data. Progress has been made in estimating these densities, but there is little current work on monitoring them for changes. A new statistic is proposed that effectively monitors these nonstationary nonparametric densities through the use of transformed wavelet coefficients of the quantiles. This method is completely nonparametric and makes no particular distributional assumptions, which makes it effective in a variety of conditions.
Similarly, several estimators have been shown to be successful at monitoring for changes in functional responses ("profiles") involving high-dimensional data. These methods focus on using a single-value upper control limit (UCL) based on a specified in-control average run length (ARL) to detect changes in these nonstationary statistics. However, such a UCL is not designed to take into consideration the false alarm rate, the power associated with the test, or the underlying distribution of the ARL. Additionally, if the monitoring statistic is known to be monotonic over time (which is typical of methods using maxima in their statistics, for example), the flat UCL does not adjust to this property. We propose several methods for creating UCLs that provide improved power and simultaneously adjust the false alarm rate to user-specified values. Our methods are constructive in nature, making no use of assumed distributional properties of the underlying monitoring statistic. We evaluate the different proposed UCLs through simulations to illustrate the improvements over current UCLs. The proposed method is evaluated with respect to profile monitoring scenarios and the proposed density statistic. The method is applicable for monitoring any monotonically nondecreasing nonstationary statistic.
|
| February 20, 2013 |
| Carl P. Schmertmann, Professor of Economics at FSU |
| Bayesian Forecasting of Cohort Fertility |
| February 20, 2013 2:00 pm |
| OSB 108 |
|
| There are signs that fertility in rich countries may have stopped declining, but this depends critically on whether women currently of reproductive age are postponing or reducing lifetime fertility. Analysis of average completed family sizes requires forecasts of remaining fertility for women born 1970-1995. We propose a Bayesian model for fertility that incorporates a priori information about patterns over age and time. We use a new dataset, the Human Fertility Database (HFD), to construct improper priors that give high weight to historically plausible rate surfaces. In the age dimension, cohort schedules should be well approximated by principal components of HFD schedules. In the time dimension, series should be smooth and approximately linear over short spans. We calibrate priors so that approximation residuals have theoretical distributions similar to historical HFD data. Our priors use quadratic penalties and imply a high-dimensional normal posterior distribution for each country’s fertility surface. Forecasts for HFD cohorts currently aged 15-44 show consistent patterns. In the US, Northern Europe, and Western Europe, slight rebounds in completed fertility are likely. In Central and Southern Europe there is little evidence for a rebound. Our methods could be applied to other forecasting and missing-data problems with only minor modifications. |
| February 15, 2013 |
| Fred Huffer, FSU Dept. of Statistics |
| Record Values, Poisson Mixtures, and the Joint Distribution of Counts of Strings in Bernoulli Sequences |
| February 15, 2013 10:00 am |
| 108 OSB |
|
| Let $U_1, U_2, U_3, \dots$ be iid continuous random variables and $Y_1, Y_2, Y_3, \dots$ be Bernoulli rv's which indicate the position of the record values in this sequence, that is, $Y_j = 1$ if $U_i < U_j$ for all $i < j$. Let $Z_1$ be the number of occurrences of consecutive record values in the infinite sequence $U_1, U_2, U_3, \dots$ and, more generally, let $Z_k$ be the number of occurrences of two record values separated by exactly $k - 1$ non-record values. It is a well known but still quite surprising fact that $Z_1, Z_2, Z_3, \dots$ are independent Poisson rv's with $E Z_k = 1/k$ for all $k$. We show how this may be proved by embedding the record sequence in a marked Poisson process. If we have only a finite sequence of trials $U_1, U_2, \dots, U_N$, then the record counts $Z_1, Z_2, \dots$ will no longer be exactly Poisson or exactly independent. But if $N$ is random with an appropriately chosen distribution, we can retain these properties exactly. This also can be proved by embedding in a marked Poisson process. This is joint work with Jayaram Sethuraman and Sunder Sethuraman.
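The statement about $Z_k$ can be explored numerically. The sketch below is an illustration written for this page rather than the speakers' code, and all function names are hypothetical. It does three things:

- It computes the record indicators $Y_j$ of a finite sequence.
- It counts $Z_k$ as the number of successive record pairs whose positions differ by exactly $k$.
- It averages these counts over simulated sequences of a fixed length $n$.

```python
import random


def record_indicators(u):
    """Y_j = 1 if u_j exceeds every earlier value (the first observation is always a record)."""
    y, best = [], float("-inf")
    for x in u:
        y.append(1 if x > best else 0)
        best = max(best, x)
    return y


def string_counts(y, max_k):
    """[Z_1, ..., Z_max_k], where Z_k counts successive records whose positions differ by k,
    i.e. two records separated by exactly k - 1 non-records."""
    positions = [j for j, v in enumerate(y) if v == 1]
    z = [0] * max_k
    for a, b in zip(positions, positions[1:]):
        if b - a <= max_k:
            z[b - a - 1] += 1
    return z


def mean_counts(n, max_k, n_trials=2000, seed=0):
    """Monte Carlo average of (Z_1, ..., Z_max_k) over iid uniform sequences of length n."""
    rng = random.Random(seed)
    totals = [0] * max_k
    for _ in range(n_trials):
        z = string_counts(record_indicators([rng.random() for _ in range(n)]), max_k)
        totals = [t + zk for t, zk in zip(totals, z)]
    return [t / n_trials for t in totals]
```

The record indicators are independent with $P(Y_j = 1) = 1/j$. From this, the exact mean for a sequence of fixed length $n$ works out to $E Z_k = 1/k - 1/n$. So `mean_counts(200, 3)` should return values close to 0.995, 0.495 and 0.328, and these means approach $1/k$ as $n$ grows. For fixed $n$, however, the counts are not exactly Poisson or independent, as the abstract notes.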
|
| January 25, 2013 |
| Yin Xia |
| Testing of Large Covariance Matrices |
| January 25, 2013 10:00 am |
| 108 OSB |
|
| This talk considers two interrelated problems in the high-dimensional setting: (a) testing the equality of two covariance matrices; (b) recovering the support of the difference of two covariance matrices. We propose a new test for the equality of two covariance matrices and investigate its theoretical and numerical properties. The limiting null distribution of the test statistic is derived and the power of the test is studied. The test is shown to enjoy certain optimality properties and to be especially powerful against sparse alternatives. The simulation results show that the test significantly outperforms existing methods in terms of both size and power. Analysis of a p53 dataset is carried out to demonstrate the application of the testing procedures.
When the null hypothesis of equal covariance matrices is rejected, it is often of significant interest to further investigate how they differ from each other. Motivated by applications in genomics, we also consider recovering the support of the difference of two covariance matrices. New procedures are introduced and their properties are studied. Applications to gene selection are also discussed. |
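A test that is powerful against sparse alternatives can be built from the largest standardized squared difference between entries of the two sample covariance matrices. The sketch below computes such a max-type statistic. Each entry difference is scaled by an estimate of its variance, $\hat\theta_{ij}/n$, where $\hat\theta_{ij}$ is the sample variance of the centered products. Two caveats apply. It is an assumption that this matches the statistic used in the talk. The null calibration of the maximum (an extreme-value limit) is not reproduced here. The function names are hypothetical.

```python
def covariance_and_theta(x):
    """Sample covariance (divisor n) and variance estimates theta_ij for its entries.

    x : list of n observations, each a list of p variables
    """
    n, p = len(x), len(x[0])
    means = [sum(row[i] for row in x) / n for i in range(p)]
    centered = [[row[i] - means[i] for i in range(p)] for row in x]
    sigma = [[sum(r[i] * r[j] for r in centered) / n for j in range(p)] for i in range(p)]
    theta = [
        [sum((r[i] * r[j] - sigma[i][j]) ** 2 for r in centered) / n for j in range(p)]
        for i in range(p)
    ]
    return sigma, theta


def max_type_statistic(x1, x2):
    """Largest standardized squared difference between entries of two sample covariance matrices."""
    s1, t1 = covariance_and_theta(x1)
    s2, t2 = covariance_and_theta(x2)
    n1, n2 = len(x1), len(x2)
    p = len(s1)
    stat = 0.0
    for i in range(p):
        for j in range(i, p):  # symmetric matrices: upper triangle including the diagonal
            var = t1[i][j] / n1 + t2[i][j] / n2
            if var > 0:
                stat = max(stat, (s1[i][j] - s2[i][j]) ** 2 / var)
    return stat
```

Identical samples give a statistic of exactly 0. Larger values point to entries where the two covariance matrices differ, and those entries are natural candidates for the support of the difference.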
| January 23, 2013 |
| Ying Sun |
| Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets |
| January 23, 2013 2:00 pm |
| 108 OSB |
|
| For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculation of the likelihood for n observations requires O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address these computational difficulties. In this work, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean. This talk is based on joint work with Michael Stein from the University of Chicago. |
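The estimating-equation construction described above can be written out for a mean-zero Gaussian field z with covariance Σ(θ). This is a sketch under that assumption, in our own notation rather than the speaker's: Σ_k denotes ∂Σ/∂θ_k, and V is the sparse approximation to Σ^{-1}. How V is built by the sparse inverse Cholesky step is not shown.

```latex
% Exact Gaussian score for the k-th covariance parameter (mean-zero field z):
\frac{\partial \ell}{\partial \theta_k}
  = \tfrac12\, z^\top \Sigma^{-1} \Sigma_k \Sigma^{-1} z
  - \tfrac12\, \operatorname{tr}\!\left(\Sigma^{-1} \Sigma_k\right),
\qquad \Sigma_k = \frac{\partial \Sigma(\theta)}{\partial \theta_k}.

% Replace \Sigma^{-1} in the quadratic form by a sparse V \approx \Sigma^{-1},
% then subtract its expectation, using E[z^\top A z] = tr(A \Sigma) for z ~ N(0, \Sigma):
g_k(\theta) = z^\top V \Sigma_k V z
  - \operatorname{tr}\!\left(V \Sigma_k V \Sigma(\theta)\right) = 0,
\qquad \mathbb{E}_\theta\!\left[g_k(\theta)\right] = 0.
```

With V = Σ^{-1} the equation is exactly twice the score equation. With a sparse V it stays unbiased, because E[z^T A z] = tr(AΣ) holds for any non-random matrix A.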
| January 18, 2013 |
| Minjing Tao |
| Large Volatility Matrix Estimation Based on High-Frequency Financial Data |
| January 18, 2013 10:00 am |
| 108 OSB |
|
| Financial practice often requires estimating the integrated volatility matrix of a large number of assets from noisy high-frequency data. Many existing volatility matrix estimators designed for small dimensions become inconsistent when the size of the matrix is close to or larger than the sample size. In this talk, we propose a new type of large volatility matrix estimator based on non-synchronized high-frequency financial data that allows for the presence of market microstructure noise. In addition, we investigate the optimal convergence rate for this volatility estimation problem by developing asymptotic theory for the proposed estimator and deriving the minimax lower bound. The proposed estimator has a risk that matches this lower bound up to a constant factor and thus achieves the optimal convergence rate. Furthermore, a simulation study of the estimator's finite-sample performance supports the established asymptotic theory. |
| January 14, 2013 |
| Naomi Brownstein |
| Analysis of Time-to-Event Data & Intermediate Phenotypes in the OPPERA Study |
| January 14, 2013 2:00 pm |
| 108 OSB |
|
| In a prospective cohort study, examining all participants for incidence of the condition of interest may be prohibitively expensive. For example, the "gold standard" for diagnosing temporomandibular disorders (TMD) is a clinical examination by an expert dentist, and in a large prospective cohort study, examining all subjects in this manner is infeasible. Instead, it is common to use a cheaper (and less reliable) examination to screen for possible incident cases and to perform the "gold standard" examination only on those who screen positive. Unfortunately, subjects may leave the study before receiving the "gold standard" examination. This results in a survival analysis problem with missing censoring indicators. Motivated by the Orofacial Pain: Prospective Evaluation and Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose methods for parameter estimation in survival models with missing censoring indicators. We estimate the probability of being a case for those with no "gold standard" examination through a logistic regression model. The predicted probabilities facilitate estimation of the hazard ratio associated with each putative risk factor, and multiple imputation produces variance estimates for this procedure. Simulations show that our methods perform better than naïve approaches. In addition, we apply the methods to data from the OPPERA study and extend them to account for repeated measures and missing covariates.
Another problem of recent interest is the analysis of secondary phenotypes in case-control studies. Standard methods may be biased and may have poor coverage and low power. We propose a general method for the analysis of arbitrary phenotypes, including ordinal and survival outcomes. We advocate the use of inverse probability weighted methods and estimate the standard error by bootstrapping.
|
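The first abstract above relies on multiple imputation for variance estimation. A minimal sketch of the standard pooling step (Rubin's rules) follows. It assumes that each of m imputed datasets has already produced an estimate, such as a log hazard ratio, together with its variance. The imputation model (logistic regression for case status) and the survival-model fits are not shown, and the function name is hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance); requires m >= 2.
    """
    m = len(estimates)
    q_bar = mean(estimates)               # average of the m point estimates
    within = mean(variances)              # average within-imputation variance
    between = variance(estimates)         # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

For example, with three imputed log hazard ratios of 0.2, 0.4 and 0.3 (hypothetical numbers), each with variance 0.01, the pooled estimate is 0.3. The total variance adds (1 + 1/3) × 0.01 of between-imputation spread to the 0.01 within-imputation variance. The (1 + 1/m) factor accounts for using a finite number of imputations.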
| January 11, 2013 |
| Qing Mai |
| Semiparametric Sparse Discriminant Analysis in High Dimensions |
| January 11, 2013 10:00 am |
| 108 OSB |
|
| In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its shortcomings in high-dimensional classification (Tibshirani et al. (2002), Fan & Fan (2008), Wu et al. (2009), Clemmensen et al. (2011), Cai & Liu (2011), Witten & Tibshirani (2011), Fan et al. (2012) and Mai et al. (2012)). These research efforts are rejuvenating discriminant analysis. However, all of these recent methods still require the normality assumption, which rarely holds in real applications. We develop high-dimensional semiparametric sparse discriminant analysis (SeSDA), which generalizes normality-based discriminant analysis by relaxing the Gaussian assumption. If the underlying Bayes rule is sparse, SeSDA can estimate the Bayes rule and select the true features simultaneously with overwhelming probability, as long as the logarithm of the dimension grows more slowly than the cube root of the sample size. At the core of the theory is a new exponential concentration bound for semiparametric Gaussian copulas, which is of independent interest. Further, an analysis of a malaria dataset (Ockenhouse et al. (2006)) confirms that SeSDA outperforms normality-based methods in both classification and feature selection.
|
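SeSDA relaxes normality through a semiparametric Gaussian copula, in which the data are assumed Gaussian after unknown monotone transformations of the individual variables. A minimal sketch of one common way to estimate such transformations is rank-based normal scores, Φ^{-1}(rank/(n+1)). The sketch assumes this estimator and that ties are broken by position. It is not necessarily the estimator used in SeSDA, and the subsequent sparse discriminant step is not shown. The function names are hypothetical.

```python
from statistics import NormalDist

def normal_scores(column):
    """Map one variable to approximate N(0, 1) scores: Phi^{-1}(rank / (n + 1)).

    Ranks run from 1 to n; tied values are ranked by position (an assumption).
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return [inv_cdf(r / (n + 1)) for r in ranks]

def copula_transform(rows):
    """Apply normal_scores to each column of a list of rows; return rows."""
    columns = [list(c) for c in zip(*rows)]
    transformed = [normal_scores(c) for c in columns]
    return [list(r) for r in zip(*transformed)]
```

Because the scores depend on the data only through ranks, they are unchanged by any strictly increasing transformation of a variable. This invariance is what lets a copula-based method drop the normality assumption on the raw scale.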
|
|
|