Colloquia
December 3, 2010, 10:10 am Dr. Andrea Troxel, University of Pennsylvania
December 2, 2010, 2:30 pm Wei Liu
December 2, 2010, 2:30 pm Xiaoyun (Nicole) Li
December 2, 2010, 9:30 am Matthew Dutton
November 23, 2010, 9:30 am Lindsey Bell
November 19, 2010, 10:10 am Jordan Cuevas
November 12, 2010, 10:10 am Dr. Scott Schmidler, Duke University
October 15, 2010, 10:10 am Malay Ghosh, University of Florida
October 14, 2010, 1:00 pm Kunle Olumide
October 1, 2010, 10:10 am Dr. Elizabeth Slate, Medical University of South Carolina
September 28, 2010, 9:30 am Haiyan Zhao
September 17, 2010, 10:00 am Feng Zhao
September 14, 2010, 2:00 pm Wenting Wang
April 2, 2010, 10:00 am Dr. Sam Kou, Department of Statistics, Harvard University
March 26, 2010, 10:00 am Quentin Rentmeesters, Catholic University of Louvain, Belgium
March 19, 2010, 10:00 am Vernon Lawhern
March 17, 2010, 3:35 pm Wenting Wang
February 19, 2010, 12:00 am ASA Meeting
January 29, 2010, 10:10 am Dr. Wei Wu



December 3, 2010
Speaker: Dr. Andrea Troxel, University of Pennsylvania
Title: A weighted combination of pseudo-likelihood estimators for longitudinal binary data subject to nonignorable non-monotone missingness
When: December 3, 2010, 10:10 am
Where: 108 OSB
Abstract:
For longitudinal binary data with non-monotone non-ignorably missing outcomes over time, a full likelihood approach is complicated algebraically, and with many follow-up times, maximum likelihood estimation can be computationally prohibitive. As alternatives, two pseudo-likelihood approaches have been proposed that use minimal parametric assumptions. One formulation requires specification of the marginal distributions of the outcome and missing data mechanism at each time point, but uses an “independence working assumption,” i.e., an assumption that observations are independent over time. Another method avoids having to estimate the missing data mechanism by formulating a “protective estimator.” In simulations, these two estimators can be very inefficient, both for estimating time trends in the first case and for estimating both time-varying and time-stationary effects in the second. In this paper, we propose use of the optimal weighted combination of these two estimators, and in simulations we show that the optimal weighted combination can be much more efficient than either estimator alone. Finally, the proposed method is used to analyze data from two longitudinal clinical trials of HIV-infected patients.
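For readers unfamiliar with optimally combined estimators, here is a minimal sketch of the standard construction for two estimators of a common scalar parameter (notation ours; the talk's version is presumably the multivariate analogue). The combination \(\hat{\theta}_w = w\hat{\theta}_1 + (1-w)\hat{\theta}_2\) has minimum variance at

\[ w^{*} = \frac{\sigma_2^2 - \sigma_{12}}{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}, \]

where \(\sigma_j^2 = \mathrm{Var}(\hat{\theta}_j)\) and \(\sigma_{12} = \mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2)\); in practice \(w^{*}\) is estimated from the data.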

December 2, 2010
Speaker: Wei Liu
Title: A RIEMANNIAN FRAMEWORK FOR ANNOTATED CURVES ANALYSIS
When: December 2, 2010, 2:30 pm
Where: 003 BEL
Abstract:
We propose a Riemannian framework for shape analysis of annotated curves -- curves that have certain attributes defined along them, in addition to their geometries. These attributes may be in the form of vector-valued functions, discrete landmarks, or symbolic labels, and provide auxiliary information along the curves. The resulting shape analysis, that is, comparing, matching, and deforming, is naturally influenced by the auxiliary functions. Our idea is to construct curves in higher dimensions using both geometric and auxiliary coordinates, and to analyze the shapes of these curves. The difficulty comes from the need to remove different transformation groups from different components: the shape is invariant to rigid motion, global scale, and re-parameterization, while the auxiliary component is usually invariant only to the latter. Thus, the removal of some transformations (rigid motion and global scale) is restricted to the geometric coordinates, while the re-parameterization group is removed from all coordinates. We demonstrate this framework using a number of examples from computer vision, DTI tractography, protein structural alignment, and landmark-based shape matching.
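As a rough sketch of the construction (our notation, not necessarily the speaker's): a curve \(\beta\colon [0,1] \to \mathbb{R}^n\) with auxiliary functions \(f\colon [0,1] \to \mathbb{R}^m\) is lifted to the augmented curve

\[ \tilde{\beta}(t) = \big(\beta(t), f(t)\big) \in \mathbb{R}^{n+m}, \]

on which a rotation \(O\) acts only on the geometric block, \((\beta, f) \mapsto (O\beta, f)\), while a reparameterization \(\gamma\) acts on all coordinates, \(\tilde{\beta} \mapsto \tilde{\beta} \circ \gamma\).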

December 2, 2010
Speaker: Xiaoyun (Nicole) Li
Title: ANALYSIS OF MULTIVARIATE DATA WITH RANDOM CLUSTER SIZE
When: December 2, 2010, 2:30 pm
Where: 104 MON
Abstract:
In this dissertation, we examine correlated binary data with present/absent components, or with missing data related to the binary responses of interest. Depending on the data structure, correlated binary data are referred to as clustered data when the sampling unit is a cluster of subjects, or as longitudinal data when they involve repeated measurements of the same subject over time. We propose novel models for both data structures and illustrate them with real data applications.

In biomedical studies involving clustered binary responses, the cluster size can vary because some components of the cluster can be absent. When both the presence of a cluster component and the binary disease status of a present component are treated as responses of interest, we propose a novel two-stage random effects logistic regression framework. For ease of interpretation of the regression effects, both the marginal probability of presence/absence of a component and the conditional probability of disease status of a present component preserve approximate logistic regression forms. We present a maximum likelihood method of estimation implementable using standard statistical software. We compare our models, and the physical interpretation of the regression effects, with competing methods from the literature. We also present a simulation study to assess the robustness of our procedure to misspecification of the random effects distribution and to compare the finite sample performance of our estimates with existing methods. The methodology is illustrated by analyzing a study of periodontal health status in a diabetic Gullah population.

We then modify and adapt the model to longitudinal studies with binary responses and informative missingness. In longitudinal studies, when each subject is treated as a cluster, the cluster size is the total number of observations for that subject. Under informative missingness, the cluster size varies and is related to the binary response of interest, and the missingness mechanism is itself of interest. This is a variant of the clustered binary data setting with present/absent components. We modify our proposed two-stage random effects logistic regression model so that both the marginal probabilities of the binary response and the missingness indicator, and the corresponding conditional probabilities, preserve logistic regression forms. We present a Bayesian framework for this model and illustrate it on an AIDS data set.
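A schematic of a two-stage random-effects logistic specification of the kind described (our notation; a hypothetical sketch, not the dissertation's exact model): with \(D_{ij}\) indicating presence of component \(j\) in cluster \(i\) and \(Y_{ij}\) its disease status,

\[ \mathrm{logit}\,P(D_{ij}=1 \mid b_i) = x_{ij}^\top \alpha + b_i, \qquad \mathrm{logit}\,P(Y_{ij}=1 \mid D_{ij}=1, b_i) = x_{ij}^\top \beta + \lambda b_i, \qquad b_i \sim N(0, \sigma^2), \]

where the shared random effect \(b_i\) links the presence and disease processes within a cluster.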

December 2, 2010
Speaker: Matthew Dutton
Title: INDIVIDUAL PATIENT-LEVEL DATA META-ANALYSIS: A COMPARISON OF METHODS FOR THE DIVERSE POPULATIONS COLLABORATION DATA SET
When: December 2, 2010, 9:30 am
Where: 104 MON
Abstract:
DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings." One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD data into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Second, a one-stage approach that models the data in a single model and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration (DPC) data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine whether the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios.
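For context, the classical second stage typically pools the study-specific estimates \(\hat{\theta}_i\) with DerSimonian-Laird random-effects weights:

\[ \hat{\theta} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{\hat{\sigma}_i^2 + \hat{\tau}^2}, \]

where \(\hat{\sigma}_i^2\) is the within-study variance and \(\hat{\tau}^2\) the estimated between-study variance; the one-stage approach instead fits a single (typically mixed-effects) model to the pooled patient-level data.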

November 23, 2010
Speaker: Lindsey Bell
Title: A statistical approach for information extraction of biological relationships
When: November 23, 2010, 9:30 am
Where: 104 MON
Abstract:
Vast amounts of biomedical information are stored in the scientific literature, easily accessed through publicly available databases. As this content continues to grow, the popularity and importance of text mining for obtaining information from unstructured text become increasingly evident. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. Under this framework we aim to combine the strengths of Bayesian Networks and Logistic Regression to fit multiple models for classifying triplets as true (a relationship exists between the entities) or false (no relationship is mentioned). A preliminary study shows promising results. Using a mixture of logistic models, 10x10-fold cross validation gives recall and precision of 73 ± 6% and 74 ± 6% respectively, with F-measure 73 ± 3%. The method is also shown to be fairly robust to different proportions of true and false cases in the training data. A simple extension of the method also models the relationship direction for appropriate interaction words, obtaining accuracy of 83 ± 2% in 10-fold cross validation. With different dictionaries, the method can be applied to extract any biological relationships mentioned in the literature. The ultimate goal is to create an ensemble approach for information extraction.
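For reference, the F-measure quoted is the harmonic mean of precision \(P\) and recall \(R\), consistent with the figures above:

\[ F = \frac{2PR}{P + R} \approx \frac{2(0.74)(0.73)}{0.74 + 0.73} \approx 0.73. \]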

November 19, 2010
Speaker: Jordan Cuevas
Title:
When: November 19, 2010, 10:10 am
Where: 108 OSB
Abstract:
A method is desired that determines when the variability of the noise and the functional structure in a sequence of functional profiles deviate from known, fixed values. The functional portion of the profiles should be allowed to come from a large class of functions, so nonparametric methods are preferred. A method is proposed that makes use of the orthogonal properties of wavelet projections to accurately and efficiently monitor the level of noise from one profile to the next. Several alternative implementations of the estimator are compared under a variety of conditions, including allowing the noise subspace to be substantially contaminated by functional structure. The functional portion of the profiles is often highly irregular, with jumps throughout. In order to effectively monitor the functional portion of the profile, it is also necessary to accurately reconstruct jumps from a noisy function. A popular wavelet method for estimating jumps in functions is the translation-invariant (TI) estimator. However, a drawback of TI is that it includes multiple shifted estimates of the data in its construction, even those that may reduce, rather than improve, the effectiveness of the method. Here, a method is proposed that modifies TI to improve jump reconstruction, in terms of both mean square error of the reconstructions and visual performance. Information from the set of shifted data sets is used to mimic the performance of an oracle that knows exactly which TI shifts are best to retain in the reconstruction.
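For background (a standard construction, not the proposed modification itself): the translation-invariant estimator of Coifman and Donoho averages wavelet-denoised cycle-shifted copies of the data,

\[ \hat{f}_{TI} = \frac{1}{n} \sum_{s=0}^{n-1} S_{-s}\big(D(S_s y)\big), \]

where \(S_s\) circularly shifts the \(n\) data points by \(s\) positions and \(D\) denotes wavelet shrinkage; the proposed method retains only a data-chosen subset of the \(n\) shifts.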

November 12, 2010
Speaker: Dr. Scott Schmidler, Duke University
Title: Bayesian Shape Matching for Protein Structure Alignment and Phylogeny
When: November 12, 2010, 10:10 am
Where: 108 OSB
Abstract:
Understanding protein structure and function remains one of the great post-genome challenges of biology and molecular medicine. The 3D structure of a protein provides fundamental insights into its biological function, mechanism, and interactions, and plays a key role in drug design. Large-scale experimental efforts are producing increasingly large numbers of high-resolution structures. We present a Bayesian approach to modeling protein structure families, using methods adapted from the statistical theory of shape. Our approach provides natural solutions to a variety of problems in the field, including pairwise and multiple alignment for the study of conservation and variability, algorithms for flexible matching, and the impact of alignment uncertainty on phylogenetic tree reconstruction.
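A minimal sketch of the classical shape-theoretic step underlying this work (our summary): pairwise alignment of two centered landmark configurations \(X, Y \in \mathbb{R}^{n \times 3}\) is the Procrustes problem

\[ \min_{R \in SO(3)} \|X - YR\|_F, \]

solved in closed form via the singular value decomposition of \(Y^\top X\); the Bayesian formulation treats the rotation and the residue correspondence as unknowns with posterior uncertainty rather than as fixed quantities.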

October 15, 2010
Speaker: Malay Ghosh, University of Florida
Title: OBJECTIVE PRIORS: AN INTRODUCTION FOR FREQUENTISTS
When: October 15, 2010, 10:10 am
Where: 108 OSB
Abstract:
Bayesian methods are increasingly applied these days in the theory and practice of statistics. Any Bayesian inference depends on a likelihood and a prior. Ideally, one would like to elicit a prior from related sources of information or past data. In their absence, however, Bayesian methods need to rely on some "objective" or "default" priors, and the resulting posterior inference can still be quite valuable. Not surprisingly, over the years the catalog of objective priors has become prohibitively large, and one has to set specific criteria for the selection of such priors. Our aim is to review some of these criteria, compare their performance, and illustrate them with simple examples. While for very large sample sizes it possibly does not matter which objective prior one uses, the selection of such a prior does influence inference for small or moderate samples. For regular models where asymptotic normality holds, Jeffreys' general rule prior, the positive square root of the determinant of the Fisher information matrix, enjoys many optimality properties in the absence of nuisance parameters. In the presence of nuisance parameters, however, many other priors emerge as optimal depending on the criterion selected. One new feature of this work is that a prior different from Jeffreys' is shown to be optimal under the chi-square divergence criterion even in the absence of nuisance parameters. The latter prior is also invariant under one-to-one reparameterization.
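For reference, Jeffreys' general rule prior mentioned above is

\[ \pi_J(\theta) \propto \sqrt{\det I(\theta)}, \]

where \(I(\theta)\) is the Fisher information matrix; under a one-to-one reparameterization \(\phi = g(\theta)\) the rule transforms consistently, \(\pi_J(\phi) = \pi_J(\theta)\,\lvert\det(\partial\theta/\partial\phi)\rvert\), which is the invariance property referred to at the end of the abstract.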

October 14, 2010
Speaker: Kunle Olumide
Title: A PROBABILISTIC AND GRAPHICAL ANALYSIS OF EVIDENCE IN O.J. SIMPSON'S MURDER CASE USING BAYESIAN NETWORKS
When: October 14, 2010, 1:00 pm
Where: Montgomery 104
Abstract:
This research work is an attempt to illustrate the versatility and wide applicability of statistical science. Specifically, it involves the application of statistics in the field of law. The application focuses on the sub-fields of evidence and criminal law, using one of the most celebrated cases in the history of American jurisprudence: the 1994 O.J. Simpson murder case in California. Our task here is a probabilistic and graphical analysis of the body of evidence in this case using Bayesian networks. We will begin the analysis by constructing our main hypothesis regarding the guilt or non-guilt of the accused; this main hypothesis will be supplemented by a series of ancillary hypotheses. Using graphs and probability concepts, we will evaluate the probative force, or strength, of the evidence and how well the body of evidence at hand proves our main hypothesis. We will employ Bayes' rule, likelihoods, and likelihood ratios to carry out this evaluation. Some sensitivity analysis will be carried out by varying our prior beliefs, or probabilities, and evaluating the effect of such variations on the likelihood ratios regarding our main hypothesis.
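The evidentiary calculus described rests on Bayes' rule in odds form: for a hypothesis \(H\) and evidence \(E\),

\[ \frac{P(H \mid E)}{P(\bar{H} \mid E)} = \frac{P(E \mid H)}{P(E \mid \bar{H})} \times \frac{P(H)}{P(\bar{H})}, \]

so the likelihood ratio \(P(E \mid H)/P(E \mid \bar{H})\) measures the probative force of \(E\): values above 1 favor \(H\), values below 1 favor \(\bar{H}\).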

October 1, 2010
Speaker: Dr. Elizabeth Slate, Medical University of South Carolina
Title: Logic Forest: An ensemble method for biomarker discovery
When: October 1, 2010, 10:10 am
Where: 108 OSB
Abstract:

September 28, 2010
Speaker: Haiyan Zhao
Title: Time-varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis
When: September 28, 2010, 9:30 am
Where: HCB 209
Abstract:
The motivation for my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948, and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants, and their status with respect to the occurrence of disease was recorded. In this dissertation, the event of interest is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP), and body mass index (BMI, weight in kilograms divided by height in meters squared). A review of the statistical literature indicates that the effects of these covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence.

Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and marginal likelihood methods are used to estimate the parameters of the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed. Simulation studies are conducted to evaluate the performance of the two estimation methods under the proposed models; the Kullback-Leibler (KL) divergence and the root mean square error are used to compare the results obtained from the different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive.

Following the simulation study, the proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of the covariates on CHD incidence. To specify the time-series structures of the effects of the risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, implying that the age effect on CHD is less pronounced for older patients than for younger ones. The effect of CSM stays mostly constant over the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time; that is, patients with higher SBP are, as expected, more likely to develop CHD. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
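As a schematic of the model class named in the title (our notation; a sketch rather than the dissertation's exact specification): the CHD indicator follows a logistic model whose coefficients evolve over examination cycles, e.g. an AR(1) process with GARCH(1,1) innovations:

\[ \mathrm{logit}\,P(Y_{it}=1) = \sum_j \beta_j(t)\,x_{itj}, \qquad \beta_j(t) = \phi_j\,\beta_j(t-1) + a_{jt}, \qquad a_{jt} \sim N(0, h_{jt}), \qquad h_{jt} = \omega_j + \alpha_j a_{j,t-1}^2 + \gamma_j h_{j,t-1}. \]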

September 17, 2010
Speaker: Feng Zhao
Title: Bayesian Portfolio Optimization with Time-varying Factor Models
When: September 17, 2010, 10:00 am
Where: 108 OSB
Abstract:
We develop a modeling framework to simultaneously evaluate various types of predictability in stock returns, including stocks' sensitivity ("betas") to systematic risk factors, stocks' abnormal returns unexplained by risk factors ("alphas"), and returns of risk factors in excess of the risk-free rate ("risk premia"). Both firm-level characteristics and macroeconomic variables are used to predict stocks' time-varying alphas and betas, and macroeconomic variables are used to predict the risk premia. All of the models are specified in a Bayesian framework to account for estimation risk, and informative prior distributions on both stock returns and model parameters are adopted to reduce estimation error. To gauge the economic significance of the predictability, we apply the models to the U.S. stock market and construct optimal portfolios based on model predictions. Out-of-sample performance of the portfolios is evaluated to compare the models.
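A schematic of the model class (our notation; a hedged sketch, not necessarily the paper's exact specification): the excess return of stock \(i\) at time \(t\) is

\[ r_{it} = \alpha_{it} + \beta_{it}^\top f_t + \epsilon_{it}, \qquad \alpha_{it} = a_0 + a_1^\top z_{i,t-1}, \qquad \beta_{it} = b_0 + B\,z_{i,t-1} + C\,m_{t-1}, \]

where \(z_{i,t-1}\) are lagged firm characteristics and \(m_{t-1}\) lagged macroeconomic variables, with the factor risk premia \(E(f_t)\) predicted from \(m_{t-1}\) as well.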

September 14, 2010
Speaker: Wenting Wang
Title: Some Methods for Design and Analysis of Survival Data
When: September 14, 2010, 2:00 pm
Where: Bellamy 243
Abstract:
For survival outcomes, statistical equivalence tests to show a new treatment therapeutically equivalent to a standard treatment are usually based on the Cox (1972) proportional hazards assumption. We present an alternative method based on the linear transformation model (LTM) for two treatment arms, and show the advantages of using this equivalence test instead of tests based on Cox's model. The LTM is a very general class of models that includes, among others, the proportional odds survival model (POSM). We present a sufficient condition for checking whether log-rank based tests have inflated type I error, and show that the POSM and some other commonly used survival models within the LTM class all satisfy this condition. Simulation studies show that repeated use of our test instead of log-rank based tests will be a safer statistical practice. Our second goal is to develop a practical Bayesian model for survival data with high-dimensional covariate vectors. We develop the Information Matrix (IM) and Information Matrix Ridge (IMR) priors for commonly used survival models, including Cox's model and the cure rate model proposed by Chen et al. (1999), and examine many desirable theoretical properties, including sufficient conditions for the existence of the moment generating functions of these priors and the corresponding posterior distributions. The performance of these priors in practice is compared with some competing priors via a Bayesian analysis of a study investigating the relationship between lung cancer survival time and a large number of genetic markers.
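For reference, the linear transformation model expresses the survival time \(T\) of a subject with covariates \(Z\) as

\[ h(T) = -Z^\top \beta + \epsilon, \]

where \(h\) is an unspecified monotone increasing function and \(\epsilon\) has a known distribution: an extreme-value \(\epsilon\) recovers Cox's proportional hazards model, while a standard logistic \(\epsilon\) yields the proportional odds survival model.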

April 2, 2010
Speaker: Dr. Sam Kou, Department of Statistics, Harvard University
Title: Multi-resolution inference of stochastic models from partially observed data
When: April 2, 2010, 10:00 am
Where: 108 OSB
Abstract:
Stochastic models, diffusion models in particular, are widely used in science, engineering, and economics. Inferring parameter values from data is often complicated by the fact that the underlying stochastic processes are only partially observed. Examples include inference for discretely observed diffusion processes, stochastic volatility models, and doubly stochastic Poisson (Cox) processes. Likelihood-based inference faces the difficulty that the likelihood is usually unavailable, even numerically. The conventional approach discretizes the stochastic model to approximate the likelihood, but to achieve desirable accuracy one has to use a highly dense discretization, which usually imposes an unbearable computational burden. In this talk we will introduce the framework of Bayesian multi-resolution inference to address this difficulty. By working on different resolution (discretization) levels simultaneously, and by letting the resolutions talk to each other, we substantially improve not only the computational efficiency but also the estimation accuracy. We will illustrate the strength of the multi-resolution approach with examples.
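The conventional discretization referred to is the Euler-Maruyama scheme: for a diffusion \(dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t\),

\[ X_{t+\Delta} \approx X_t + \mu(X_t)\,\Delta + \sigma(X_t)\sqrt{\Delta}\,Z, \qquad Z \sim N(0,1), \]

whose Gaussian transition density makes the approximate likelihood tractable; accuracy requires small \(\Delta\), which is what drives the computational burden the talk addresses.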

March 26, 2010
Speaker: Quentin Rentmeesters, Catholic University of Louvain, Belgium
Title: Filtering on Manifolds
When: March 26, 2010, 10:00 am
Where: 108 OSB
Abstract:
In many applications related to signal and image processing, a filtering technique is required to reduce the influence of perturbations on the measurements. In this talk, we will survey different approaches to implementing such filtering when the measurements belong to a nonlinear space. We will illustrate these techniques on the sphere and present some concrete applications involving the Grassmann manifold, i.e., subspace tracking problems.
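As one standard ingredient of such techniques (a sketch, not necessarily the speaker's approach): on the unit sphere the exponential map at \(x\), for a tangent vector \(v \perp x\), is explicit,

\[ \exp_x(v) = \cos(\lVert v \rVert)\,x + \sin(\lVert v \rVert)\,\frac{v}{\lVert v \rVert}, \]

and a simple manifold-valued smoother replaces the Euclidean average of noisy measurements by their Karcher mean, the minimizer of the sum of squared geodesic distances.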

March 19, 2010
Speaker: Vernon Lawhern
Title: Statistical Modeling and Applications of Neural Spike Trains
When: March 19, 2010, 10:00 am
Where: 108 OSB
Abstract:
Understanding how spike trains encode information is a principal question in the study of neural activity. Recent advances in biotechnology have given researchers the ability to record neural activity on a wide scale, allowing detailed analyses that may have been impossible just a few years ago. Here we present several frameworks for the statistical modeling of neural spike trains. We first develop a Generalized Linear Model (GLM) framework that incorporates the effects of hidden states in the modeling of neural activity in the primate motor cortex. We then develop a state-space model that incorporates target information into the modeling framework. In both cases, significant improvements in model fit and decoding accuracy were observed. Finally, in joint work with Dr. Contreras and Dr. Nikonov of the Psychology Department, we study taste coding and discrimination in the gustatory system using information-theoretic tools such as mutual information, and a recently developed spike train metric to study clustering performance on recordings from proximate neurons.
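For orientation, in the point-process GLM framework the neuron's conditional intensity is typically modeled log-linearly in external covariates and its own spiking history; a generic form (the talk's models additionally incorporate hidden states and target information):

\[ \lambda(t \mid H_t) = \exp\Big(\beta_0 + \sum_j \beta_j\,x_j(t) + \sum_k \gamma_k\,\Delta N_{t-k}\Big), \]

where the \(x_j(t)\) are covariates such as movement kinematics and \(\Delta N_{t-k}\) counts the neuron's recent spikes.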

March 17, 2010
Speaker: Wenting Wang
Title: Practical Uses and Methods for Proportional Odds Survival Model
When: March 17, 2010, 3:35 pm
Where: OSB 108
Abstract:
For survival outcomes, statistical equivalence tests to show a new treatment therapeutically equivalent to a standard treatment are usually based on Cox's (1972) proportional hazards assumption. We present an alternative method based on the proportional odds survival model (POSM) for two treatment arms, and show the advantages of using this equivalence test instead of tests based on Cox's model. We first show that an alternative statistical hypothesis for equivalence of treatment arms under the POSM can be formulated whether we are interested in the maximum difference between the survival functions or in the difference between the hazard functions of the two treatment arms. We develop different statistical tests for the cases where the survival function of the standard treatment is known and where it is unknown. Our simulation studies show that repeated use of our test instead of log-rank based tests will be a safer statistical practice, because fewer of the treatments statistically accepted as equivalent (via our tests) turn out to be clinically non-equivalent. In addition, we propose an empirical Bayes approach for estimation and testing under the POSM. We show that the integrated likelihood can be maximized using standard statistical software, that the estimates and standard errors are easily computed, and that the estimators have good asymptotic properties.
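For reference, the proportional odds survival model posits that the odds of failure by time \(t\) satisfy

\[ \frac{1 - S(t \mid Z)}{S(t \mid Z)} = e^{\beta^\top Z}\,\frac{1 - S_0(t)}{S_0(t)}, \]

under which, unlike under proportional hazards, the hazard ratio between arms converges to 1 as \(t \to \infty\).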

February 19, 2010
Speaker: ASA Meeting
Title:
When: February 19, 2010, 12:00 am
Where:
Abstract:

January 29, 2010
Speaker: Dr. Wei Wu
Title: Information-Geometric Metrics for a Statistical Analysis of Spike Trains
When: January 29, 2010, 10:10 am
Where: 108 OSB
Abstract:
Understanding the information represented in spike trains has been a fundamental problem in neural coding. Therefore, a metric for comparing spike trains is centrally important in characterizing the variability of neural firing activity. Various mathematical frameworks, such as the commonly used Victor-Purpura metric and van Rossum metric, have been developed to quantify differences between spike trains. Motivated by the Fisher-Rao metric used in information geometry, we introduce a parametrized family of metrics that takes into account different time warpings of spike trains. The parameter is similar to p in the standard Lp norm used in functional analysis. The metrics are based on optimal matching of spikes (and inter-spike intervals) across spike trains under a penalty term that restrains drastic temporal mappings. In particular, when p equals 1 or 2, this metric generalizes the Victor-Purpura metric and the van Rossum metric, respectively. This framework further allows notions of basic descriptive statistics, such as means and medians for spike trains, and shortest paths between two spike trains. Under some restrictive conditions, we derive analytical expressions for these quantities. Finally, we test the new method by measuring distances between spike trains in an experimental recording from primate motor cortex. The method is found to achieve desirable classification performance. This is joint work with Prof. Anuj Srivastava.
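For background (our summary of the two classical metrics): the Victor-Purpura distance is the minimal cost of editing one spike train into the other, with cost \(q\lvert t - t' \rvert\) for moving a spike and cost 1 for inserting or deleting one; the van Rossum distance convolves each train with a smoothing kernel and compares the results in \(L^2\),

\[ D_{vR}^2(x_1, x_2) = \int \big(y_1(t) - y_2(t)\big)^2\,dt, \qquad y_k = x_k * \text{kernel}. \]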