THE MAXMIN EWMA CONTROL CHART WITH VARIABLE
SAMPLING INTERVALS

Raid Amin, University of West Florida

ABSTRACT

Amin et al. (1999) introduced a control chart based on the smallest and largest
observations in each sample. They showed that the MaxMin EWMA chart was useful
for jointly monitoring the process mean and process variability, and that it was
meaningful to place specification limits on the chart. It is a control procedure that
offers useful graphical guidance for monitoring processes and for troubleshooting in
applications. An adaptive MaxMin EWMA chart with variable sampling intervals (VSI) is
proposed. The sampling interval between samples is allowed to vary depending on
what is being observed in the current sample. The variable interval EWMA uses short
sampling intervals if there is an indication that the process mean or variance has
changed, and long sampling intervals if there is no indication of a change in the
process. If the EWMA chart statistics actually enter the signal region, then the VSI
EWMA chart signals in the same manner as the standard EWMA chart. A two-
dimensional Markov chain is used to approximate the Average Run Length (ARL)
for the proposed control chart, in addition to extensive simulations. A design
procedure for the proposed VSI MaxMin EWMA chart is given.
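
The interval-switching logic can be sketched as follows; the chart constants, the
centering of the sample extremes, and the shift scenario are illustrative assumptions
rather than the authors' recommended design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative chart constants, not the authors' recommended design values
lam = 0.2                     # EWMA smoothing constant
h = 0.67                      # signal limit for the centered EWMA statistics
w = 0.22                      # warning limit separating the two interval regions
d_long, d_short = 2.0, 0.25   # long and short sampling intervals (time units)
n = 5                         # sample size

# In-control means of the sample extremes, estimated once by simulation
sims = rng.normal(size=(100_000, n))
e_max, e_min = sims.max(axis=1).mean(), sims.min(axis=1).mean()

z_max = z_min = t = 0.0
for i in range(10_000):
    mu = 0.0 if i < 50 else 1.0                 # mean shift after sample 50
    x = rng.normal(mu, 1.0, size=n)
    z_max = lam * (x.max() - e_max) + (1 - lam) * z_max
    z_min = lam * (x.min() - e_min) + (1 - lam) * z_min
    dev = max(abs(z_max), abs(z_min))
    if dev > h:                                 # signal region: alarm as usual
        print(f"signal at sample {i}, elapsed time {t:.2f}")
        break
    t += d_short if dev > w else d_long         # VSI rule for the next sample
```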


Bayesian Neural Networks for Bivariate Binary Data:
An Application to Prostate Cancer Study

Sounak Chakraborty, Malay Ghosh and Tapabrata Maiti

ABSTRACT

Prostate cancer is one of the most common cancers in American men. The cancer
can either be locally confined or spread outside the organ. When locally confined,
there are several options for treating and curing this disease. Otherwise, surgery is
the only option, and in extreme cases of outside spread, the cancer can easily recur
within a short time even after surgery and subsequent radiation therapy. Hence, it is
important to know, based on pre-surgery biopsy results, how likely the cancer is to
be organ-confined.

The paper considers a hierarchical Bayesian neural network approach for posterior
prediction probabilities of certain features indicative of non-organ confined prostate
cancer. In particular, we find such probabilities for margin positivity and seminal vesicle (SV) positivity.
The available training set consists of bivariate binary outcomes indicating the presence
or absence of the two. In addition, we have certain covariates such as prostate-
specific antigen (PSA), Gleason score, and an indicator of whether the cancer is
unilateral or bilateral (i.e., present on one or both sides). We take a hierarchical
Bayesian neural
network approach to find the posterior prediction probabilities for a test set, and
compare these with the actual outcomes. The Bayesian procedure is implemented by
an application of the Markov chain Monte Carlo numerical integration technique. For
the problem at hand, the bivariate neural network procedure is shown to be superior
to univariate hierarchical Bayesian neural networks applied separately to predict
margin and SV positivity.
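
A minimal sketch of this kind of model, assuming a single shared hidden layer (which
induces dependence between the two outcomes), independent normal priors on the
weights, and a plain random-walk Metropolis sampler; the authors' hierarchical
specification and MCMC scheme are richer. The data below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: covariates (e.g., PSA, Gleason score, bilateral indicator)
# and bivariate binary outcomes (margin positivity, SV positivity)
n, p, H = 100, 3, 4                        # cases, covariates, hidden units
X = rng.normal(size=(n, p))
Y = rng.integers(0, 2, size=(n, 2))

def unpack(theta):
    W1 = theta[:p * H].reshape(p, H)
    b1 = theta[p * H:p * H + H]
    W2 = theta[p * H + H:p * H + H + 2 * H].reshape(H, 2)
    return W1, b1, W2, theta[-2:]

def log_post(theta, tau2=100.0):
    """Bernoulli log-likelihood for both outcomes plus N(0, tau2) weight priors;
    the shared hidden layer ties the two outcome probabilities together."""
    W1, b1, W2, b2 = unpack(theta)
    probs = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))
    probs = np.clip(probs, 1e-10, 1 - 1e-10)
    loglik = np.sum(Y * np.log(probs) + (1 - Y) * np.log(1 - probs))
    return loglik - 0.5 * np.sum(theta ** 2) / tau2

# Random-walk Metropolis over all network weights
dim = p * H + H + 2 * H + 2
theta, lp, draws = rng.normal(scale=0.1, size=dim), -np.inf, []
for it in range(20_000):
    prop = theta + rng.normal(scale=0.05, size=dim)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if it >= 10_000 and it % 10 == 0:          # thin after burn-in
        draws.append(theta.copy())

# Posterior predictive probabilities for a test case: average over draws
x_new = rng.normal(size=p)
pred = np.zeros(2)
for th in draws:
    W1, b1, W2, b2 = unpack(th)
    pred += 1.0 / (1.0 + np.exp(-(np.tanh(x_new @ W1 + b1) @ W2 + b2)))
print("P(margin positive), P(SV positive):", pred / len(draws))
```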


USING MULTIVARIATE ANALYSIS TO IDENTIFY SENSITIVE
BIOGEOCHEMICAL INDICATORS IN THE NORTHERN EVERGLADES

Ron Corstanje, University of Florida

ABSTRACT

The extent of vegetation displacement in the Northern Everglades, and the resulting
changes in environmental conditions, has created a need for a sensitive set of
indicators that presage negative environmental changes, specifically changes that
result from nutrient enrichment. In this talk we report on a two-step data analysis
leading to the identification of such indicators. In the first step a cluster analysis was
used on sediment chemistry variables to identify eutrophic gradients. In the second
step, microbial soil characteristics were used to discriminate these clusters. The
biological response variables are successful in predicting group membership and point
to specific microbial parameters that should serve as sensitive precursors of negative
ecosystem change.
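
The two-step analysis might look as follows in outline, with placeholder data and
with k-means and linear discriminant analysis standing in for whatever clustering and
discrimination methods were actually used.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder data: rows are sampling sites; the study used measured sediment
# chemistry and microbial soil characteristics
n_sites = 120
chem = rng.normal(size=(n_sites, 5))       # sediment chemistry variables
micro = rng.normal(size=(n_sites, 8))      # microbial response variables

# Step 1: cluster sites along the eutrophic gradient using chemistry alone
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(chem))

# Step 2: ask how well the microbial variables discriminate those clusters
micro_z = StandardScaler().fit_transform(micro)
lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, micro_z, groups, cv=5).mean()
print(f"cross-validated group-prediction accuracy: {acc:.2f}")

# Loadings on the first discriminant axis point to the microbial parameters
# that act as the most sensitive indicators of enrichment
print(lda.fit(micro_z, groups).scalings_[:, 0])
```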


Automated Classification of Cardiac ECG Images Using
Generalized Laplacian Model

Mahtab Munshi, FSU

ABSTRACT


Our goal is to provide an efficient algorithm for classifying given echocardiographic
(ECG) images of hearts into several categories: two-chambered or four-chambered,
normal or abnormal. We decompose images into their spectral components using a
family of bandpass filters. We have used Gabor filters and various combinations of
Gabor filters to filter these images. A statistical model on the filtered images is
imposed by modeling the univariate distribution of the filtered pixels as a Generalized
Laplacian (GL) density. To compare two images, we compare the GL densities of
their components under the same filters. We have used the Kullback-Leibler distance
to compare two GL densities, and have obtained a closed form expression for this
divergence. We illustrate the resulting pseudometric on the image space using two
kinds of experiments. One is to cluster a set of images and the other is to perform a
classification of test images using models learnt from the training images. In the latter
case, we divide the available dataset into training and test sets in a random fashion,
and classify the test images using the nearest image in the training data. We will
present some experimental results for classifying cardiac ECG images into the
two-chambered and four-chambered classes.
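
A sketch of the density-comparison step, assuming the GL density takes the
generalized Gaussian form f(x) = b/(2a Γ(1/b)) exp(-(|x|/a)^b), for which a
closed-form KL divergence is known (Do and Vetterli, 2002); the paper's exact
parameterization and expression may differ.

```python
import numpy as np
from scipy.special import gamma, gammaln
from scipy.optimize import brentq

def fit_ggd(x):
    """Moment-matching fit of scale a and shape b for the density
    f(x) = b / (2 a Gamma(1/b)) * exp(-(|x|/a)**b)."""
    m1, m2 = np.mean(np.abs(x)), np.mean(x ** 2)
    r = m1 ** 2 / m2
    ratio = lambda b: gamma(2.0 / b) ** 2 / (gamma(1.0 / b) * gamma(3.0 / b)) - r
    b = brentq(ratio, 0.1, 10.0)               # solve for the shape parameter
    return m1 * gamma(1.0 / b) / gamma(2.0 / b), b

def kl_ggd(a1, b1, a2, b2):
    """Closed-form KL divergence between two such densities."""
    return (np.log((b1 * a2) / (b2 * a1)) + gammaln(1.0 / b2) - gammaln(1.0 / b1)
            + (a1 / a2) ** b2 * gamma((b2 + 1.0) / b1) / gamma(1.0 / b1)
            - 1.0 / b1)

def dist(coeffs1, coeffs2):
    """Symmetrized divergence: a pseudometric comparing two images'
    filtered pixel values under the same filter."""
    a1, b1 = fit_ggd(coeffs1)
    a2, b2 = fit_ggd(coeffs2)
    return kl_ggd(a1, b1, a2, b2) + kl_ggd(a2, b2, a1, b1)

# Stand-ins for Gabor-filtered pixel values from two images
rng = np.random.default_rng(0)
print(dist(rng.laplace(size=5000), rng.normal(size=5000)))
```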


BUILDING TRACKING PORTFOLIOS BASED ON A GENERALIZED
INFORMATION CRITERION

Xufeng Niu, Florida State University

ABSTRACT

One important topic in financial studies is to build a tracking portfolio of stocks whose
return mimics that of a chosen investment target. Statistically, this task can be
accomplished by selecting an optimal model from constrained linear models. In this
article, we extend the Generalized Information Criterion (GIC) to constrained linear
models either with independently and identically distributed random errors or with
dependent errors that follow a stationary Gaussian process. The extended GIC
procedure is proved to be asymptotically loss efficient and consistent under mild
conditions. Simulation results show that the relative frequency of selecting the optimal
constrained linear model by the GIC is close to one for finite samples. We apply the
GIC-based procedure for building an optimal tracking portfolio to the problem of
measuring the long-term impact of a corporate event on stock returns and
demonstrate empirically that it outperforms two other competing methods.
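
The selection step can be sketched as follows: constrained least squares (portfolio
weights summing to one) over candidate subsets of stocks, each scored by a GIC of
the form n log(RSS/n) + kappa_n x (model dimension). The choice kappa_n = log n,
the exhaustive search, and the simulated returns are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Placeholder returns: a target whose return is a mix of the first three stocks
n, p = 250, 8
R = rng.normal(0.0, 0.02, size=(n, p))          # candidate stock returns
y = R[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.004, n)

def constrained_rss(Rsub, y):
    """Least squares with weights summing to one: substitute the last weight
    as one minus the rest and solve the reduced unconstrained problem."""
    k = Rsub.shape[1]
    if k == 1:
        w = np.array([1.0])
    else:
        Z = Rsub[:, :-1] - Rsub[:, [-1]]        # reduced design
        u = y - Rsub[:, -1]
        w_red, *_ = np.linalg.lstsq(Z, u, rcond=None)
        w = np.append(w_red, 1.0 - w_red.sum())
    resid = y - Rsub @ w
    return resid @ resid, w

kappa = np.log(n)                               # a BIC-like member of the GIC family
best = None
for k in range(1, 5):
    for idx in combinations(range(p), k):
        rss, w = constrained_rss(R[:, idx], y)
        gic = n * np.log(rss / n) + kappa * k
        if best is None or gic < best[0]:
            best = (gic, idx, w)
print("selected stocks:", best[1], "weights:", np.round(best[2], 3))
```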


THE IN-AND-OUT-OF-SAMPLE (IOS) LIKELIHOOD RATIO TEST FOR
MODEL MISSPECIFICATION

Brett Presnell and Dennis D. Boos

ABSTRACT

A new test of model misspecification is proposed, based on the ratio of in-sample
and out-of-sample likelihoods. The test is broadly applicable, and in simple problems
approximates well-known, intuitive methods. Using jackknife influence curve
approximations, it is shown that the test statistic can be viewed asymptotically as a
multiplicative contrast between two estimates of the information matrix that are both
consistent under correct model specification. This approximation is used to show that
the statistic is asymptotically normally distributed, though it is suggested that p-values
be computed using the parametric bootstrap. The resulting methodology is
demonstrated with a variety of examples and simulations involving both discrete and
continuous data.
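
A minimal sketch for a normal working model: the IOS statistic is taken here as the
in-sample log-likelihood minus the sum of leave-one-out predictive log-likelihoods,
with a parametric-bootstrap p-value; the working model and bootstrap size are
assumptions for illustration.

```python
import numpy as np
from scipy import stats

def ios_stat(x):
    """In-sample minus out-of-sample log-likelihood under a normal model:
    sum_i [log f(x_i; theta_hat) - log f(x_i; theta_hat_without_i)]."""
    in_sample = stats.norm.logpdf(x, x.mean(), x.std()).sum()
    out = sum(stats.norm.logpdf(x[i], np.delete(x, i).mean(),
                                np.delete(x, i).std()) for i in range(len(x)))
    return in_sample - out

def ios_pvalue(x, B=500, seed=0):
    """Parametric-bootstrap p-value: simulate from the fitted null model and
    compare the observed statistic with the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    t_obs = ios_stat(x)
    boots = [ios_stat(rng.normal(x.mean(), x.std(), len(x))) for _ in range(B)]
    return np.mean(np.array(boots) >= t_obs)

# Heavy-tailed t(3) data should tend to flag the normal model as misspecified
rng = np.random.default_rng(1)
print(ios_pvalue(rng.standard_t(3, size=100)))
```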


INFLUENCE DIAGNOSTICS IN LINEAR MIXED MODELS

Oliver Schabenberger, SAS Institute

ABSTRACT

Measures to gauge the influence of one or more observations on the analysis are well
established in the general linear model for uncorrelated data. Computationally these
measures present no difficulty because closed-form update expressions allow their
evaluation without refitting the model. When applying notions of statistical influence to
mixed models, things are not so straightforward. Data points that exhibit influence are
likely to impact fixed effects and covariance parameter estimates. Update formulas
that compute leave-one-out estimates exist only for narrow classes of mixed models
or impose untenable assumptions. In repeated measures or longitudinal studies one is
often interested in multivariate influence, rather than the impact of isolated points. This
talk will examine some influence measures that can be applied in mixed models and
describe their utility in discerning influential cases and sets of observations. Several
applications are presented.
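
In the absence of general update formulas, subject-level influence can be probed by
brute-force refitting, as in the sketch below (simulated longitudinal data; the
Cook's-distance-style contrast of fixed effects is one of several possible measures,
not necessarily those of the talk).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated longitudinal data: 20 subjects, 5 visits each, random intercepts
subj = np.repeat(np.arange(20), 5)
t = np.tile(np.arange(5), 20)
u = rng.normal(0, 1, 20)[subj]
y = 1.0 + 0.5 * t + u + rng.normal(0, 0.5, 100)
df = pd.DataFrame({"y": y, "t": t, "subj": subj})

full = smf.mixedlm("y ~ t", df, groups=df["subj"]).fit()
beta = full.fe_params.values
cov_inv = np.linalg.inv(full.cov_params().values[:2, :2])  # fixed-effect block

# Multivariate (subject-level) influence by refitting without each subject:
# a Cook's-distance-style contrast of the fixed-effect estimates
for i in range(20):
    sub = df[df.subj != i]
    fit_i = smf.mixedlm("y ~ t", sub, groups=sub["subj"]).fit()
    d = beta - fit_i.fe_params.values
    print(i, float(d @ cov_inv @ d) / 2)
```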


MOMENT MATRICES OF THE NORMAL DISTRIBUTION

James R. Schott, University of Central Florida

ABSTRACT

If $x\sim N_m(0,\Omega)$, then the second-order and fourth-order moment
matrices of $x$ are given by $\Psi_2=E(xx')=\Omega$ and
$\Psi_4=E(xx'\otimes xx')=(I_{m^2}+K_{mm})(\Omega\otimes\Omega)+{\rm vec}(\Omega){\rm vec}(\Omega)'$,
where $K_{mm}$ is a commutation matrix. Formulas have been given for $\Psi_6$
and $\Psi_8$, but these are rather complicated looking, making subsequent
computations, such as the calculation of a generalized inverse, rather difficult.
By introducing a special class of symmetric
idempotent matrices, we obtain fairly simple expressions for these moment matrices.
An explicit expression is given for $\Psi_k$ for any even positive integer $k$. An
application utilizing $\Psi_8$ is discussed.
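
The $\Psi_4$ identity above is straightforward to verify numerically; the sketch
below builds $K_{mm}$ from its defining property $K_{mm}{\rm vec}(A)={\rm vec}(A')$
and compares the formula against a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3
A = rng.normal(size=(m, m))
Omega = A @ A.T                                # a positive-definite covariance

# Commutation matrix K_mm: K vec(A) = vec(A') for any m x m matrix A
K = np.zeros((m * m, m * m))
for i in range(m):
    for j in range(m):
        K[i * m + j, j * m + i] = 1.0

vecO = Omega.reshape(-1, 1, order="F")         # column-stacking vec
Psi4 = (np.eye(m * m) + K) @ np.kron(Omega, Omega) + vecO @ vecO.T

# Monte Carlo estimate of E[(xx') kron (xx')]
N = 100_000
X = rng.multivariate_normal(np.zeros(m), Omega, size=N)
est = np.zeros((m * m, m * m))
for x in X:
    xx = np.outer(x, x)
    est += np.kron(xx, xx)
est /= N
print(np.abs(est - Psi4).max())                # small; Monte Carlo error only
```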


DATA INTEGRATION TECHNIQUES: A CRITICAL REVIEW

Bikas K. Sinha, Indian Statistical Institute

ABSTRACT

Data Integration refers to combining evidence from a number of independent sources
to arrive at an overall decision. First, we describe in detail a well-known technique
for data integration, viz., "TOPSIS", with a numerical example. Next, we examine the
technique from a critical point of view and point out some of its drawbacks. We then
describe some ways to rectify these drawbacks and illustrate them with the same
numerical example. Finally, we briefly mention another technique, "ELECTRE".
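
For reference, the standard TOPSIS recipe (vector normalization, weighting,
distances to the ideal and anti-ideal alternatives, relative closeness) runs as
follows; the decision matrix and weights are hypothetical.

```python
import numpy as np

def topsis(X, w, benefit):
    """TOPSIS: rank alternatives (rows of X) on criteria (columns).
    w: criterion weights; benefit: True where larger values are better."""
    Z = X / np.sqrt((X ** 2).sum(axis=0))      # vector-normalize each criterion
    V = Z * w                                  # weighted normalized matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - anti) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)             # relative closeness: larger is better

# Hypothetical example: 4 alternatives scored on 3 benefit criteria
X = np.array([[7.0, 9.0, 9.0],
              [8.0, 7.0, 8.0],
              [9.0, 6.0, 8.0],
              [6.0, 7.0, 6.0]])
w = np.array([0.4, 0.3, 0.3])
scores = topsis(X, w, benefit=np.array([True, True, True]))
print(np.argsort(-scores))                     # best alternative first
```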


Measurement Error Models for Small Area Estimation

Malay Ghosh and Karabi Sinha

ABSTRACT

It is well-known that direct survey estimators for small areas are usually not very
reliable, being accompanied by large standard errors and coefficients of variation. To
meet the need for indirect small area estimators, a rich collection of models, either
explicit or implicit, has been proposed and studied over the past few years. However,
it appears that measurement error models, in spite of their many other applications,
have hardly been used in this context. Such models nevertheless seem appropriate in
several small area contexts. For example, the USDA uses satellite data as auxiliary
variables in the analysis of many of its crop surveys. Clearly, these measurements are
subject to error, and measurement error models seem appropriate
in these situations for small area estimation. This article develops a general normal
hierarchical Bayesian methodology for small area estimation in such situations. We
consider both unit-level and area-specific models and illustrate them in real-life
examples as well as with simulated data.
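
A stylized area-level version with known variance components can be sketched as a
three-step Gibbs sampler; the paper's hierarchical model also places priors on the
variances and covers unit-level data, so this is only a simplified illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized area-level model with a mismeasured covariate (satellite reading):
#   theta_i = b0 + b1*x_i + u_i,  u_i ~ N(0, s2u)   (linking model)
#   y_i     = theta_i + e_i,      e_i ~ N(0, D)     (direct survey estimate)
#   Xobs_i  = x_i + eta_i,        eta_i ~ N(0, G)   (measurement error)
# The variances s2u, D, G are treated as known here for brevity.
m, D, G, s2u = 40, 1.0, 0.5, 0.25
x = rng.normal(0, 1, m)
theta = 1.0 + 2.0 * x + rng.normal(0, np.sqrt(s2u), m)
y = theta + rng.normal(0, np.sqrt(D), m)
Xobs = x + rng.normal(0, np.sqrt(G), m)

b, theta_sum = np.zeros(2), np.zeros(m)
n_iter, burn = 4000, 2000
for it in range(n_iter):
    # [x | rest]: combine Xobs_i ~ N(x_i, G) with y_i ~ N(b0 + b1*x_i, s2u + D)
    prec = 1.0 / G + b[1] ** 2 / (s2u + D)
    mean = (Xobs / G + b[1] * (y - b[0]) / (s2u + D)) / prec
    xs = mean + rng.normal(0, 1, m) / np.sqrt(prec)
    # [b | rest]: conjugate normal regression of y on (1, x), flat prior
    W = np.column_stack([np.ones(m), xs])
    V = np.linalg.inv(W.T @ W / (s2u + D))
    b = rng.multivariate_normal(V @ (W.T @ y) / (s2u + D), V)
    # [theta | rest]: shrink the direct estimate toward the synthetic one
    lam = s2u / (s2u + D)
    tdraw = lam * y + (1 - lam) * (W @ b) + rng.normal(0, np.sqrt(lam * D), m)
    if it >= burn:
        theta_sum += tdraw
print("posterior-mean estimates:", np.round(theta_sum / (n_iter - burn), 2)[:5])
```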


SEMIPARAMETRIC BAYESIAN ANALYSIS OF MATCHED
CASE-CONTROL STUDIES

Samiran Sinha, University of Florida

ABSTRACT

This talk considers Bayesian analysis of case-control studies in which the exposure
variable has a conditional distribution belonging to the exponential family and may
contain missing observations. A completely observed set of covariates is also
measured. We consider a matched design in which each stratum contains one case
and M controls. The standard analysis approach is to assume that the distribution
of the exposure variable does not involve any isolated stratum effects except through
the covariates. In a new model, we allow for the presence of stratum effects while
modeling the distribution of the exposure variable. Consequently, for the retrospective
conditional likelihood of the exposure variable, the stratum effects remain as nuisance
parameters, which grow in direct proportion to the sample size. We assume a
Dirichlet process prior with a mixing normal distribution for the stratum effects and
estimate all the parameters in a Bayesian framework. The Bayesian procedure is
implemented via Markov chain Monte Carlo numerical integration techniques. Two
matched case-control examples and a simulation study are considered to illustrate our
methods and the computing scheme.
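
One ingredient of such a scheme, the Polya-urn update of the stratum effects under
a Dirichlet process prior with a normal base measure, can be sketched in isolation;
the normal exposure model and known variances below are simplifying assumptions,
and the full MCMC also updates the disease-model and covariate parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stratum effects delta_i in a normal exposure model x_ij ~ N(delta_i, s2),
# with delta_i ~ DP(alpha, N(0, tau2)); only the urn step is sketched.
I, M1 = 50, 4                      # strata; subjects per stratum (1 case + M controls)
s2, tau2, alpha = 1.0, 4.0, 1.0
true = rng.choice([-2.0, 0.0, 2.0], size=I)      # clustered true effects
x = true[:, None] + rng.normal(0, 1, (I, M1))
xbar, v = x.mean(axis=1), s2 / M1                # stratum means, their variance

delta = np.zeros(I)
for sweep in range(1000):
    for i in range(I):
        others = np.delete(delta, i)
        # Polya urn: weights on existing values and on a fresh draw from G0
        w_old = stats.norm.pdf(xbar[i], others, np.sqrt(v))
        w_new = alpha * stats.norm.pdf(xbar[i], 0.0, np.sqrt(tau2 + v))
        w = np.append(w_old, w_new)
        j = rng.choice(I, p=w / w.sum())
        if j < I - 1:
            delta[i] = others[j]                 # join an existing cluster
        else:                                    # open a new cluster: draw from
            pv = tau2 * v / (tau2 + v)           # the posterior under G0
            delta[i] = rng.normal(xbar[i] * tau2 / (tau2 + v), np.sqrt(pv))
print("distinct stratum-effect values:", np.unique(np.round(delta, 2)).size)
```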


BRINGING SURVEY DESIGN TO COLLEGE IMPACT STUDIES: APPLICATION
TO A STUDY OF BETHUNE-COOKMAN COLLEGE

Mark D. Soskin, University of Central Florida

ABSTRACT

Commissioning an impact study has become routine before a college or university
launches a new capital campaign or community outreach effort. Economic impact
studies can furnish essential information to banks, alumni, state and local government,
the media, and the campus community. Unfortunately, these studies are generally long
on public relations but short on primary data and statistical methodology. Second-
hand expenditure "per caps" and unsubstantiated multipliers are enlisted to reduce
costs and bring in the desired "gee-whiz" impact estimates. Yet, although the typical
study trumpets the enormous impact of the college, two of the most important
questions remain unanswered:

* Why is the college so under-appreciated and misunderstood?
* How can the college improve its image and reach out more effectively to the
surrounding community?

In this talk, a primary survey- and data-based design is presented for such impact
studies. Advantages, difficulties and cost barriers encountered in implementing this
design are discussed using our experiences in application of this methodology to a
just-completed impact study of Bethune-Cookman College, a large historically black
college (HBC) in Daytona Beach, Florida.


SIMULTANEOUS TESTS FOR CATEGORICAL AND COUNT
DATA WITH AN APPLICATION TO CONTROLLING
MULTISTREAM ATTRIBUTE PROCESSES

Peter Wludyka, University of North Florida

ABSTRACT

A single Chi-Squared test statistic, W, can be used to simultaneously test I
independent hypotheses. With this test a simultaneous Type I error rate can be
established. In practice, one would estimate the quantiles of W using Monte Carlo
methods. The test is intuitive and simple to perform. Comparisons are made to
alternative approaches (including using tabulated Chi-Squared quantiles).
Circumstances in which this test might be useful are identified. Attribute multistream
processes (ones in which attribute process observations can be modeled as
arising from I independent streams) can be controlled using the W statistic. Both
homogeneous (all I streams governed by the same probability model) and non-
homogeneous processes can be controlled using W. Post out-of-control signal methods useful for searching for assignable causes will be offered. Average run length comparisons with other control schemes will be made.
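
One plausible reading of the procedure, sketched under assumed specifics: each of
the I streams contributes a Pearson chi-squared statistic for a binomial count, W is
their sum, and its in-control quantile is estimated by Monte Carlo because the counts
are discrete.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
I, n, p0 = 10, 50, 0.05        # streams, sample size per stream, in-control rate

def W(counts):
    """Sum over streams of the Pearson chi-squared statistic for a
    binomial count against its in-control expectation."""
    e1, e0 = n * p0, n * (1 - p0)
    return np.sum((counts - e1) ** 2 / e1 + (n - counts - e0) ** 2 / e0)

# Monte Carlo estimate of the in-control quantile (the counts are discrete,
# so tabulated chi-squared quantiles are only an approximation)
sims = np.array([W(rng.binomial(n, p0, size=I)) for _ in range(100_000)])
limit = np.quantile(sims, 0.995)
print(f"MC limit {limit:.2f} vs tabulated chi2({I}) {stats.chi2.ppf(0.995, I):.2f}")

# Monitoring step: signal when a new vector of stream counts exceeds the limit
if W(rng.binomial(n, p0, size=I)) > limit:     # replace with observed counts
    print("out-of-control signal: search streams for an assignable cause")
```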