Colloquia
| Date | Speaker |
| November 22, 2013, 10:00 am | Dr. Washington Mio, FSU Dept. of Mathematics |
| November 15, 2013, 10:00 am | Dr. Nilanjan Chatterjee, National Cancer Institute |
| November 8, 2013, 10:00 am | Dr. Todd Ogden, Columbia University, Dept. of Biostatistics |
| November 1, 2013, 10:00 am | Dr. Yiyuan She |
| October 25, 2013, 10:00 am | Dr. Stephen Walker, University of Kent and University of Texas at Austin |
| October 18, 2013, 10:00 am | Dr. Steve Marron, UNC Chapel Hill |
| October 11, 2013, 10:00 am | Dr. Runze Li, Penn State University |
| October 4, 2013, 10:00 am | Dr. Robert Clickner, FSU |
| September 27, 2013, 10:00 am | Dr. Jim Hobert, University of Florida |
| September 20, 2013, 9:00 am | TBA |
| September 13, 2013, 10:00 am | TBA |
| September 6, 2013, 10:00 am | TBA |
| July 2, 2013, 2:00 pm | Oliver Galvis |
| July 1, 2013, 11:00 am | Seung-Yeon Ha |
| June 27, 2013, 2:00 pm | Felicia Williams |
| June 19, 2013, 3:00 pm | Yuanyuan Tang |
| June 18, 2013, 10:00 am | Ester Kim Nilles |
| May 15, 2013, 2:00 pm | Jingyong Su, FSU Dept. of Statistics, Dissertation Defense |
| May 13, 2013, 11:00 am | Yingfeng Tao |
| May 6, 2013, 10:00 am | Darshan Bryner, FSU Dept. of Statistics, Essay Defense |
| May 1, 2013, 1:00 pm | Katie Hillebrandt, FSU Dept. of Statistics, Essay Defense |
| April 29, 2013, 2:00 pm | Wade Henning, FSU Dept. of Statistics, Essay Defense |
| April 26, 2013, 10:00 am | Paul Beaumont, FSU Department of Economics |
| April 19, 2013, 10:00 am | Wei Wu, FSU Dept. of Statistics |
| April 15, 2013, 3:30 pm | Michael Rosenthal, Ph.D. Candidate, Essay Defense |
| April 12, 2013, 10:00 am | Karim Lounici, Georgia Tech |
| April 5, 2013, 10:00 am | Russell G. Almond, FSU |
| March 29, 2013, 10:00 am | Genevera Allen, Rice University |
| March 22, 2013, 10:00 am | Xiaoming Huo, Georgia Tech |
| March 20, 2013, 3:30 pm | Jose Laborde, Ph.D. Candidate, Essay Defense |
| March 19, 2013, 3:00 pm | Gretchen Rivera, FSU, Dissertation Defense |
| March 8, 2013, 10:00 am | Yongtao Guan, University of Miami |
| March 4, 2013, 11:00 am | Kelly McGinnity, FSU, Dissertation Defense |
| March 1, 2013, 10:00 am | Brian C. Monsell, US Census Bureau |
| February 27, 2013, 10:00 am | Rachel Becvarik, FSU, Dissertation Defense |
| February 20, 2013, 2:00 pm | Carl P. Schmertmann, Professor of Economics, FSU |
| February 15, 2013, 10:00 am | Fred Huffer, FSU Dept. of Statistics |
| January 25, 2013, 10:00 am | Yin Xia |
| January 23, 2013, 2:00 pm | Ying Sun |
| January 18, 2013, 10:00 am | Minjing Tao |
| January 14, 2013, 2:00 pm | Naomi Brownstein |
| January 11, 2013, 10:00 am | Qing Mai |
| July 2, 2013 |
| Oliver Galvis |
| Hybrid Target-Category Forecasting Through Sparse Factor Auto-Regression |
| July 2, 2013 2:00 pm |
| 215 OSB |
|
| Time series forecasting in areas such as economics and finance has become a very challenging task given the enormous amount of information available that may, or may not, improve the prediction of the series. A data structure with a very large number of predictors p and a limited number of observations T, usually with p >> T, is now typical. The data are usually grouped according to particularities of the series, yielding a category structure in which each series belongs to one group. When the objective is to forecast a univariate target series, the AR(4) model has become a standard benchmark. We believe the performance of the AR(4) can be improved by adding predictors from several sources. A comprehensive model that combines two sources of information, one coming from past observations of the target series and the other from a factor representation of the additional predictors, can then be used in forecasting. Taking advantage of the category structure, we argue that any category, especially the one containing the target series, may be explained by its own lags and by the other categories and their lags. A multivariate regression model therefore arises naturally, bringing two challenges in modeling and forecasting: first, to identify, from a very large pool of candidates, the predictors that truly explain the target series and to eliminate collinearity; second, to extract the factors that best represent the information contained in that pool. To overcome both challenges we propose the Sparse Factor Auto-Regression (SFAR) model, which imposes both low rank and cardinality control on the matrix of coefficients. For computation we propose a new version of the SEL-RRR estimator that simultaneously attains cardinality control on the number of nonzero rows of the coefficient matrix, to tackle high dimensionality and collinearity, and low rank on the same matrix, to extract informative factors. The cardinality control is achieved via a multivariate quantile thresholding rule, while the rank reduction is obtained via an RRR decomposition. With both challenges addressed, model interpretability and forecasting accuracy are improved. The proposed methodology is applied to synthetic and real-world data sets, and in both applications the experiments show an improvement of our model over the AR(4).
|
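As a rough illustration of the two ingredients described above (cardinality control on the rows of the coefficient matrix and a low-rank structure), the sketch below alternates a gradient step, row-wise hard thresholding, and a truncated-SVD projection. It is only a toy stand-in under assumed names and a simple alternating scheme, not the SEL-RRR estimator or the multivariate quantile thresholding rule from the talk.

```python
import numpy as np

def sfar_sketch(X, Y, rank=2, n_rows=10, n_iter=100, step=None):
    """Toy alternating scheme: gradient step on the least-squares loss for
    Y ~ X @ B, then keep only the n_rows rows of B with the largest l2 norm
    (cardinality control), then project B onto the set of rank-`rank` matrices."""
    p, q = X.shape[1], Y.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    B = np.zeros((p, q))
    for _ in range(n_iter):
        B = B + step * X.T @ (Y - X @ B)            # gradient step
        norms = np.linalg.norm(B, axis=1)           # row norms
        keep = np.argsort(norms)[-n_rows:]          # rows with largest norms survive
        mask = np.zeros(p, dtype=bool)
        mask[keep] = True
        B[~mask] = 0.0                              # cardinality control on rows
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank (reduced-rank) projection
    return B

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
B_true = np.zeros((50, 5))
B_true[:5] = rng.standard_normal((5, 5))
Y = X @ B_true + 0.1 * rng.standard_normal((200, 5))
B_hat = sfar_sketch(X, Y, rank=2, n_rows=5)
```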
| July 1, 2013 |
| Seung-Yeon Ha |
| Theories on Group variable selection in multivariate regression models |
| July 1, 2013 11:00 am |
| 215 OSB |
|
| We study group variable selection in multivariate regression models. Group variable selection means selecting the nonzero rows of the coefficient matrix: since there are multiple response variables, if a predictor is irrelevant to the estimation then its entire row must be zero. In the high-dimensional setting, shrinkage estimation methods are applicable and guarantee smaller MSE than OLS, in line with the James-Stein phenomenon (1961). Among shrinkage methods, we study penalized least squares estimation for group variable selection.
We focus on L0 regularization and L0 + L2 regularization, with the goals of accurate prediction and consistent feature selection, and use the corresponding computational procedures Hard TISP and Hard-Ridge TISP (She, 2009) to overcome the numerical difficulties. These regularization methods outperform the Lasso (L1 regularization), one of the most popular penalized least squares methods, in both prediction and selection. L0 achieves the same optimal rates of prediction and estimation loss as the Lasso, but it requires no restriction on the design matrix or the sparsity level to control the prediction error, and only a condition weaker than the Lasso's to control the estimation error. For selection consistency it requires a much weaker incoherence condition, which bounds the correlation between the relevant and irrelevant subsets of predictors. Therefore L0 can outperform the Lasso in both prediction and sparsity recovery in practical cases where correlation is high or sparsity is not low.
We also study L0 + L2 regularization, which uses the combined penalty of L0 and L2. In the corresponding procedure, Hard-Ridge TISP, the two parameters work independently for selection and for shrinkage (to enhance prediction), and it therefore performs better than L0 regularization in some cases, such as low signal strength. For L0 regularization, the thresholding parameter controls selection but is tuned for prediction accuracy. L0 + L2 regularization attains the optimal rates of prediction and estimation errors without any restriction, when the coefficient of the L2 penalty is appropriately chosen; furthermore, it can achieve a better rate of estimation error with an ideal choice of block-wise weights for the L2 penalty.
|
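The following is a minimal sketch of a TISP-style iteration for group (row-wise) variable selection, assuming simple row-wise hard and hard-ridge thresholding rules; the exact thresholding functions and tuning used in Hard TISP and Hard-Ridge TISP (She, 2009) may differ.

```python
import numpy as np

def row_hard_threshold(B, lam):
    """Row-wise (group) hard thresholding: zero out rows of B whose l2 norm
    does not exceed lam; surviving rows are left unshrunk (L0-type penalty)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return np.where(norms > lam, B, 0.0)

def row_hard_ridge_threshold(B, lam, eta):
    """Hard-ridge variant: surviving rows are additionally shrunk by 1/(1 + eta),
    mimicking the combined L0 + L2 penalty."""
    return row_hard_threshold(B, lam) / (1.0 + eta)

def tisp(X, Y, lam, eta=0.0, n_iter=300):
    """Thresholding-based iterative selection procedure (sketch).
    X and Y are scaled so the spectral norm of X is at most 1, which keeps the
    fixed-point iteration B <- Theta(B + X'(Y - X B)) stable."""
    k0 = np.linalg.norm(X, 2)
    Xs, Ys = X / k0, Y / k0
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        Z = B + Xs.T @ (Ys - Xs @ B)
        B = row_hard_ridge_threshold(Z, lam, eta) if eta > 0 else row_hard_threshold(Z, lam)
    return B
```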
| June 18, 2013 |
| Ester Kim Nilles |
| An Ensemble Approach to Predicting Health Outcomes |
| June 18, 2013 10:00 am |
| 215 OSB |
|
| Heart disease and premature birth remain the leading causes of mortality and of neonatal mortality, respectively, in large parts of the world. They are also estimated to carry the highest medical expenditures in the United States.
Early detection of heart disease plays a critical role in preserving heart health, and identifying pregnancies at high risk of premature birth provides highly valuable information for early intervention. For the past few decades, identification of patients at high health risk has been based on logistic regression or Cox proportional hazards models. More recently, machine learning models have grown in popularity in the medical field for their superior predictive and classification performance over classical statistical models. However, their performance in heart disease and premature birth prediction has been comparable and inconclusive, leaving the question of which model most accurately reflects the data difficult to resolve.
Our aim is to incorporate information learned by different models into one final model that generates superior predictive performance. We first compare widely used machine learning models - the multilayer perceptron network, k-nearest neighbors, and the support vector machine - with the statistical models logistic regression and Cox proportional hazards. The individual models are then combined into one in an ensemble approach, also referred to as ensemble modeling. The proposed approaches include SSE-weighted, AUC-weighted, logistic, and flexible naive Bayes ensembles.
The individual models are unique and capture different aspects of the data, but, as expected, no individual model uniformly outperforms the others. The ensemble approach is an easily computed method that eliminates the need to select a single model, integrates the strengths of different models, and generates optimal performance. Particularly in cases where the risk factors associated with an outcome are elusive, as in premature birth, the ensemble models significantly improve prediction.
|
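A minimal sketch of one of the ensemble ideas mentioned above, AUC weighting, using scikit-learn models as stand-ins; the synthetic data, model settings, and weighting details are assumptions, not the exact models or health-outcome data from the talk.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# synthetic stand-in for a binary health outcome
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "svm": SVC(probability=True),
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
}

# AUC-weighted ensemble: each model's predicted probability is weighted by its
# validation AUC, and the weighted average is the final risk score.
probs, weights = [], []
for name, model in models.items():
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_va)[:, 1]
    probs.append(p)
    weights.append(roc_auc_score(y_va, p))

weights = np.array(weights) / np.sum(weights)
ensemble_score = np.average(np.column_stack(probs), axis=1, weights=weights)
print("ensemble AUC:", roc_auc_score(y_va, ensemble_score))
```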
| May 15, 2013 |
| Jingyong Su, FSU Dept. of Statistics, Dissertation Defense |
| Statistical Analysis of Trajectories on Riemannian manifolds |
| May 15, 2013 2:00 pm |
| OSB 215 |
|
| This thesis focuses on statistical analysis of trajectories that take values on nonlinear manifolds. There are several difficulties in analyzing temporal trajectories on nonlinear manifolds. First, the observed data are noisy and recorded at discrete, unsynchronized times. Second, the trajectories are observed under arbitrary temporal evolutions. In this work, we first address the problem of estimating full smooth trajectories on nonlinear manifolds from only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. We then study statistical analysis of trajectories that take values on nonlinear Riemannian manifolds and are observed under arbitrary temporal evolutions. The problem of analyzing such temporal trajectories, including registration, comparison, modeling, and evaluation, arises in many applications. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance, in turn, is used to define statistical summaries, such as sample means and covariances, of given trajectories, and Gaussian-type models to capture their variability. Both theoretical proofs and experimental results are provided to validate our work. |
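As a small illustration of the kind of statistical summary mentioned above (a sample mean on a nonlinear manifold), the sketch below computes a Karcher mean of points on the unit sphere via the exponential and log maps; the thesis's framework for full trajectories and temporal registration is substantially richer, so treat this as a generic manifold-statistics example.

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: move from p along tangent vector v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

def log_map(p, q):
    """Log map on the unit sphere: tangent vector at p pointing toward q,
    with length equal to the geodesic (arc) distance."""
    w = q - np.dot(p, q) * p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros_like(p) if nw < 1e-12 else theta * w / nw

def karcher_mean(points, n_iter=50, step=0.5):
    """Gradient-descent Karcher (intrinsic) mean of points on the sphere."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        grad = np.mean([log_map(mu, q) for q in points], axis=0)
        mu = exp_map(mu, step * grad)
    return mu
```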
| May 6, 2013 |
| Darshan Bryner, FSU Dept. of Statistics, Essay Defense |
| Bayesian Active Contours with Affine-Invariant, Elastic Shape Priors |
| May 6, 2013 10:00 am |
| OSB 215 |
|
| Active contours, especially in conjunction with prior-shape models, have become an important tool in image segmentation. However, most contour methods use shape priors based on similarity-shape analysis, i.e., analysis that is invariant to rotation, translation, and scale. In practice, the training shapes used to build prior-shape models may be collected from viewing angles different from those of the test images and thus require invariance to a larger class of transformations. Using an elastic, affine-invariant shape model for planar curves, we propose an active contour algorithm in which the training and test shapes can differ by arbitrary affine transformations, and the resulting segmentation is robust to perspective skews. We construct a shape space of affine-standardized curves and derive a statistical model for capturing class-specific shape variability. The active contour is then driven by the gradient of a total energy composed of a data term, a smoothing term, and an affine-invariant shape-prior term. This framework is demonstrated on a number of examples involving real images and on the segmentation of shadows in sonar images of underwater objects. |
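A minimal sketch of one simple notion of affine standardization for a discretized planar curve: centering followed by whitening of the coordinate covariance. The essay's elastic, affine-invariant shape analysis involves more than this, so the helper below is an assumed illustration, not the paper's construction.

```python
import numpy as np

def affine_standardize(curve):
    """Center a discretized planar curve (n x 2 array of points) and whiten its
    coordinates so the result has zero mean and identity covariance; any two
    curves related by a nonsingular affine map then agree up to rotation."""
    centered = curve - curve.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    L = np.linalg.cholesky(cov)             # cov = L @ L.T
    return centered @ np.linalg.inv(L).T    # whitened coordinates
```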
| May 1, 2013 |
| Katie Hillebrandt, FSU Dept. of Statistics, Essay Defense |
| Characterizations of Complex Signals Using Functional ANOVA |
| May 1, 2013 1:00 pm |
| OSB 215 |
|
| Many methods exist for detecting differences in functional data. However, most of these methods make assumptions about the noise on the signals, usually that the noise is normally distributed. I would like to be able to discern differences in time-dependent signals that represent the rate of flow of water exiting the spring of a karstic springshed under different treatments. These signals are simulated using a complex system called KFM, developed by Chicken et al. [2007]. Because of the complex nature of KFM, even if the distribution of the noise on the inputs is known, the simulated signal has unknown noise components that are autocorrelated. This is further complicated by the fact that the input noise has constraints that do not allow it to be normally distributed. The resulting noisy signal therefore has extremely non-normal noise, which makes established methods for detecting differences in signals inappropriate.
The treatments under which the signals are simulated are characteristics of the underground path along which the water flows. Characteristics of interest include the length of the underground channels, called conduits or active connections, and the total number of conduits. Being able to determine differences in flow-rate signals for different types of paths is beneficial because the underground path is typically difficult to map in its entirety, so characteristics of the path are usually unknown. If differences in discharge signals are detected for different treatments, the treatment level of the underground waterway can be inferred from the measured discharge at the spring. It is also useful to be able to determine the flow rate for different types of springsheds under different weather scenarios and in the case of environmental disasters, such as the spill of a contaminant into the springshed. Being able to classify the output of a spring based on its treatment level can help in predicting the discharge rate after a weather event or the flow of a contaminant through the system.
Details of KFM are presented, along with methods of simulating rain and discharge signals under different treatments. Several established methods for tests on functional data are discussed, along with directions for research into a method modified for signals whose noise components have an unknown distribution. Specific areas of interest include power calculations, treatment selection methods, and follow-up tests for identifying which signals differ after overall differences are detected.
|
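Since the noise here is far from normal, a distribution-free baseline such as a permutation test is one natural point of comparison. The sketch below tests for group differences between discharge curves with an integrated between-group statistic; it is a generic baseline under assumed inputs, not the modified method proposed in the essay.

```python
import numpy as np

def integrated_stat(groups):
    """Integrated between-group variability of the group-mean curves
    (a simple F-type statistic summed over time points)."""
    grand = np.mean(np.vstack(groups), axis=0)
    return sum(len(g) * np.sum((g.mean(axis=0) - grand) ** 2) for g in groups)

def permutation_test(curves, labels, n_perm=2000, rng=None):
    """Permutation test for group differences in functional data.
    `curves` is an (n_curves x n_timepoints) array; no normality assumption
    is made about the noise on the signals."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    obs = integrated_stat([curves[labels == g] for g in np.unique(labels)])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        stat = integrated_stat([curves[perm == g] for g in np.unique(labels)])
        count += stat >= obs
    return (count + 1) / (n_perm + 1)   # permutation p-value
```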
| April 15, 2013 |
| Michael Rosenthal, Ph.D. Candidate, Essay Defense |
| Advances in Spherical Regression |
| April 15, 2013 3:30 pm |
| OSB 215 |
|
|
The ability to define correspondences between paired observations on a spherical manifold has applications in earth science, medicine, and image analysis. Spherical data come in many forms, including geographical coordinates from plate tectonics, clouds, and GPS devices. They can also be directional in nature, as with vector cardiograms, winds, currents, and tides. Spherical data are unit vectors of arbitrary dimension and can be viewed as points on the hypersphere manifold. Examples of such data include sounds, signals, shapes, and images. The Riemannian geometry of these hyperspheres is well known and can be exploited for unit-vector data of arbitrary dimension. Past work in spherical regression either flattens the spherical manifold to a linear space or imposes rigid restrictions on the nature of the correspondence between predictor and response variables. While these methods have their advantages in certain settings, they have severe limitations that make them inappropriate in a variety of other settings. We propose a method that extends the framework to allow a very flexible, nonparametric form of correspondence for data on the two-dimensional sphere. |
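For context, the classical rigid form of spherical regression fits a single rotation between paired unit vectors, which is exactly the kind of restrictive correspondence the proposed method relaxes. A minimal sketch of that baseline (a Procrustes/Kabsch fit) follows; the function name and interface are assumptions.

```python
import numpy as np

def fit_rotation(U, V):
    """Least-squares rotation R mapping unit vectors U (n x d) to V (n x d):
    minimizes sum_i ||v_i - R u_i||^2, the classical rigid spherical regression."""
    M = V.T @ U
    W, _, Zt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(W @ Zt))      # force a proper rotation (det = +1)
    D = np.diag([1.0] * (M.shape[0] - 1) + [d])
    return W @ D @ Zt
```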
| March 29, 2013 |
| Genevera Allen, Rice University |
| High-Dimensional Poisson Graphical Models |
| March 29, 2013 10:00 am |
| OSB 108 |
|
|
Markov networks, especially Gaussian graphical models and Ising models, have become a popular tool for studying relationships in high-dimensional data. Variables in many data sets, however, consist of count data that may not be well modeled by Gaussian or multinomial distributions. Examples include high-throughput genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing methods for Poisson graphical models include the Poisson Markov Random Field (MRF) of Besag (1974), which places severe restrictions on the types of dependencies, permitting only negative correlations between variables. By restricting the domain of the variables in this joint density, we introduce a Winsorized Poisson MRF that permits a rich dependence structure and whose pair-wise conditional densities closely approximate the Poisson distribution. An important consequence of our model is that it gives an analytical form for a multivariate Poisson density with rich dependencies; previous multivariate densities permitted only positive or only negative dependencies. We develop neighborhood selection algorithms to estimate network structure from high-dimensional count data by fitting graphical models based on Besag's MRF, our Winsorized Poisson MRF, and a local approximation to the Winsorized Poisson MRF. We also provide theoretical results illustrating the conditions under which these algorithms recover the network structure with high probability. Through simulations and an application to breast cancer microRNAs measured by next-generation sequencing, we demonstrate the advantages of our methods for network recovery from count data. This is joint work with Zhandong Liu, Pradeep Ravikumar and Eunho Yang. |
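A minimal sketch of the general neighborhood-selection recipe for count data: Winsorize the counts at a cap R and regress each node on the others with an l1-penalized Poisson GLM, keeping predictors with nonzero coefficients as neighbors. It assumes statsmodels' elastic-net GLM interface and ad hoc choices of cap and penalty; it is not the authors' exact estimators or theory.

```python
import numpy as np
import statsmodels.api as sm

def neighborhood_selection(counts, R=10, alpha=0.1):
    """Estimate a sparse network from an (n x p) matrix of counts: Winsorize each
    variable at R, then fit node-wise l1-penalized Poisson regressions and collect
    edges for predictors with nonzero coefficients (OR rule over node pairs)."""
    X = np.minimum(counts, R)                      # Winsorized counts
    p = X.shape[1]
    edges = set()
    for j in range(p):
        others = np.delete(np.arange(p), j)
        design = sm.add_constant(X[:, others])
        fit = sm.GLM(X[:, j], design, family=sm.families.Poisson()).fit_regularized(
            alpha=alpha, L1_wt=1.0)                # pure lasso penalty
        coefs = np.asarray(fit.params)[1:]         # drop the intercept
        for k, c in zip(others, coefs):
            if abs(c) > 1e-8:
                edges.add(tuple(sorted((j, k))))
    return edges
```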
| March 4, 2013 |
| Kelly McGinnity, FSU, Dissertation Defense |
| Nonparametric Wavelet Thresholding and Profile Monitoring for Non-Gaussian Errors |
| March 4, 2013 11:00 am |
| 215 OSB |
|
| Recent advancements in data collection allow scientists and researchers to obtain massive amounts of information in short periods of time. Often this data is functional and quite complex. Wavelet transforms are popular, particularly in the engineering and manufacturing fields, for handling these types of complicated signals.
A common application of wavelets is in statistical process control (SPC), in which one tries to determine as quickly as possible if and when a sequence of profiles has gone out of control. However, few wavelet methods have been proposed that do not rely in some capacity on the assumption that the observational errors are normally distributed. This dissertation aims to fill that void by proposing a simple, nonparametric, distribution-free method of monitoring profiles and estimating changepoints. Using only the magnitudes and location maps of thresholded wavelet coefficients, our method exploits the spatial adaptivity of wavelets to accurately detect profile changes when the signal is obscured by a variety of non-Gaussian errors.
Wavelets are also widely used for dimension reduction. Applying a thresholding rule to a set of wavelet coefficients yields a "denoised" version of the original function. Once again, existing thresholding procedures generally assume independent, identically distributed normal errors. Thus, the second main focus of this dissertation is a nonparametric method of thresholding that does not assume Gaussian errors, or even that the form of the error distribution is known. We improve upon an existing even-odd cross-validation method by employing block thresholding and level dependence, and show that the proposed method works well on both skewed and heavy-tailed distributions. Such thresholding techniques are essential to the SPC procedure developed above. |
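For reference, the sketch below is the standard universal-threshold wavelet denoiser (using PyWavelets), which assumes roughly Gaussian errors; the dissertation's contribution is precisely a thresholding method that avoids that assumption.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Denoise a 1-D signal with soft thresholding at the universal threshold
    (VisuShrink). The noise scale is estimated from the finest-level detail
    coefficients; this baseline implicitly assumes near-Gaussian errors."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # robust noise estimate
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))        # universal threshold
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(signal)]
```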
| February 27, 2013 |
| Rachel Becvarik, FSU, Dissertation Defense |
| Nonparametric Nonstationary Density Estimation Including Upper Control Limit Methods for Detecting Change Points |
| February 27, 2013 10:00 am |
| 215 OSB |
|
| Nonstationary nonparametric densities occur naturally in applications such as monitoring the amount of toxins in the air and monitoring internet streaming data. Progress has been made in estimating these densities, but there is little current work on monitoring them for changes. A new statistic is proposed that effectively monitors these nonstationary nonparametric densities through transformed wavelet coefficients of the quantiles. The method is completely nonparametric and relies on no particular distributional assumptions, making it effective in a variety of conditions.
Similarly, several estimators have been shown to be successful at monitoring for changes in functional responses ("profiles") involving high-dimensional data. These methods focus on using a single-value upper control limit (UCL), based on a specified in-control average run length (ARL), to detect changes in these nonstationary statistics. However, such a UCL is not designed to take into consideration the false alarm rate, the power associated with the test, or the underlying distribution of the ARL. Additionally, if the monitoring statistic is known to be monotonic over time (which is typical in methods using maxima in their statistics, for example), a flat UCL does not adjust to this property. We propose several methods for creating UCLs that provide improved power and simultaneously adjust the false alarm rate to user-specified values. Our methods are constructive in nature, making no use of assumed distributional properties of the underlying monitoring statistic. We evaluate the proposed UCLs through simulations to illustrate the improvements over current UCLs. The proposed method is evaluated with respect to profile monitoring scenarios and the proposed density statistic, and is applicable for monitoring any monotonically nondecreasing nonstationary statistic.
|
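A minimal sketch of the constructive idea: simulate the monitoring statistic under in-control conditions and read off empirical quantiles of its running maximum as a time-varying UCL that approximately targets a user-specified false alarm level. The placeholder statistic and the quantile rule are assumptions, not the dissertation's specific constructions.

```python
import numpy as np

def empirical_ucl(simulate_statistic, run_length, alpha=0.05, n_runs=1000, rng=None):
    """Simulate the per-time monitoring statistic over in-control runs and return,
    for each time point, the (1 - alpha) empirical quantile of its running maximum.
    No distributional assumptions about the statistic are used, and the resulting
    UCL is nondecreasing in time, matching monotone monitoring statistics."""
    rng = np.random.default_rng(rng)
    runs = np.array([simulate_statistic(run_length, rng) for _ in range(n_runs)])
    running_max = np.maximum.accumulate(runs, axis=1)
    return np.quantile(running_max, 1 - alpha, axis=0)

# placeholder per-time in-control statistic: absolute Gaussian noise
ucl = empirical_ucl(lambda T, rng: np.abs(rng.standard_normal(T)), run_length=100)
```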
| February 20, 2013 |
| Carl P. Schmertmann, Professor of Economics, FSU |
| Bayesian Forecasting of Cohort Fertility |
| February 20, 2013 2:00 pm |
| OSB 108 |
|
| There are signs that fertility in rich countries may have stopped declining, but this depends critically on whether women currently in reproductive ages are postponing or reducing lifetime fertility. Analysis of average completed family sizes requires forecasts of remaining fertility for women born 1970-1995. We propose a Bayesian model for fertility that incorporates a priori information about patterns over age and time. We use a new dataset, the Human Fertility Database (HFD), to construct improper priors that give high weight to historically plausible rate surfaces. In the age dimension, cohort schedules should be well approximated by principal components of HFD schedules. In the time dimension, series should be smooth and approximately linear over short spans. We calibrate priors so that approximation residuals have theoretical distributions similar to historical HFD data. Our priors use quadratic penalties and imply a high-dimensional normal posterior distribution for each country’s fertility surface. Forecasts for HFD cohorts currently 15-44 show consistent patterns. In the US, Northern Europe, and Western Europe, slight rebounds in completed fertility are likely. In Central and Southern Europe there is little evidence for a rebound. Our methods could be applied to other forecasting and missing-data problems with only minor modifications. |
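As a small illustration of why quadratic penalties lead to a closed-form normal posterior, the sketch below combines a Gaussian likelihood with a quadratic (improper Gaussian) prior penalty; the design matrix, penalty matrix, and variance are placeholders, not the paper's HFD-based priors over age-time fertility surfaces.

```python
import numpy as np

def gaussian_posterior(X, y, penalty, sigma2=1.0):
    """Posterior for theta in y = X @ theta + Gaussian noise, with an improper
    Gaussian prior whose log-density is -0.5 * theta' @ penalty @ theta:
    posterior precision = X'X / sigma2 + penalty, and the posterior mean solves
    the corresponding normal equations."""
    precision = X.T @ X / sigma2 + penalty
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y / sigma2)
    return mean, cov
```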
| February 15, 2013 |
| Fred Huffer, FSU Dept. of Statistics |
| Record Values, Poisson Mixtures, and the Joint Distribution of Counts of Strings in Bernoulli Sequences |
| February 15, 2013 10:00 am |
| 108 OSB |
|
| Let U1, U2, U3, … be iid continuous random variables and let Y1, Y2, Y3, … be Bernoulli rv's that indicate the positions of the record values in this sequence, that is, Yj = 1 if Ui < Uj for all i < j. Let Z1 be the number of occurrences of consecutive record values in the infinite sequence U1, U2, U3, … and, more generally, let Zk be the number of occurrences of two record values separated by exactly k - 1 non-record values. It is a well-known but still quite surprising fact that Z1, Z2, Z3, … are independent Poisson rv's with EZk = 1/k for all k. We show how this may be proved by embedding the record sequence in a marked Poisson process. If we have only a finite sequence of trials U1, U2, …, UN, then the record counts Z1, Z2, … will no longer be exactly Poisson or exactly independent. But if N is random with an appropriately chosen distribution, we can retain these properties exactly. This, too, can be proved by embedding in a marked Poisson process. This is joint work with Jayaram Sethuraman and Sunder Sethuraman.
|
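A quick Monte Carlo check of the stated fact, using long finite sequences as an approximation to the infinite case: the counts Zk of record pairs separated by exactly k - 1 non-records should have sample means close to 1/k.

```python
import numpy as np
from collections import Counter

def record_gap_counts(n, kmax, rng):
    """Simulate U_1..U_n, locate the records, and count pairs of successive
    records separated by exactly k - 1 non-records, for k = 1..kmax."""
    u = rng.random(n)
    is_record = u == np.maximum.accumulate(u)   # U_i is a record iff it equals the running max
    records = np.flatnonzero(is_record)
    gaps = np.diff(records)                     # gap k <=> k - 1 non-records in between
    c = Counter(gaps.tolist())
    return np.array([c.get(k, 0) for k in range(1, kmax + 1)])

rng = np.random.default_rng(0)
counts = np.array([record_gap_counts(5000, 5, rng) for _ in range(2000)])
print("sample means of Z_1..Z_5:", counts.mean(axis=0))
print("theory 1/k:             ", [1 / k for k in range(1, 6)])
```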
| January 11, 2013 |
| Qing Mai |
| Semiparametric Sparse Discriminant Analysis in High Dimensions |
| January 11, 2013 10:00 am |
| 108 OSB |
|
| In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its inadequacy for high-dimensional classification (Tibshirani et al. (2002), Fan & Fan (2008), Wu et al. (2009), Clemmensen et al. (2011), Cai & Liu (2011), Witten & Tibshirani (2011), Fan et al. (2012) and Mai et al. (2012)). These research efforts are rejuvenating discriminant analysis. However, the normality assumption, which rarely holds in real applications, is still required by all of these recent methods. We develop high-dimensional semiparametric sparse discriminant analysis (SeSDA), which generalizes normality-based discriminant analysis by relaxing the Gaussian assumption. If the underlying Bayes rule is sparse, SeSDA can estimate the Bayes rule and select the true features simultaneously with overwhelming probability, as long as the logarithm of the dimension grows more slowly than the cube root of the sample size. At the core of the theory is a new exponential concentration bound for semiparametric Gaussian copulas, which is of independent interest. Further, an analysis of a malaria data set (Ockenhouse et al. (2006)) by SeSDA confirms the superior performance of SeSDA over normality-based methods in both classification and feature selection.
|
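A minimal sketch of the semiparametric idea: push each feature through a truncated normal-scores (Gaussian copula) transform and then apply a sparse linear classifier. The l1-penalized logistic regression here is only a stand-in for the SeSDA estimator, and the truncation constant is an assumption.

```python
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.linear_model import LogisticRegression

def normal_scores(X):
    """Column-wise Gaussian-copula (normal-scores) transform: a truncated
    empirical CDF followed by the standard normal quantile function."""
    n = X.shape[0]
    ranks = np.apply_along_axis(rankdata, 0, X)
    delta = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))  # one common truncation choice
    u = np.clip(ranks / n, delta, 1.0 - delta)
    return norm.ppf(u)

def sesda_like_fit(X, y, C=1.0):
    """Transform the features, then fit a sparse linear rule (l1 logistic
    regression as a stand-in for sparse discriminant analysis)."""
    Z = normal_scores(X)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    return clf.fit(Z, y)
```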