THE MAXIMIN EWMA CONTROL CHART WITH VARIABLE
SAMPLING INTERVALS
Raid Amin, University of West Florida
ABSTRACT
Amin et al. (1999) introduced a control chart based on the smallest and largest
observations in each sample. They showed that the MaxMin EWMA chart was useful
for jointly monitoring the process mean and process variability, and that it
was
meaningful to place specification limits on the chart. It is a control procedure
that
offers useful graphical guidance for monitoring processes and for troubleshooting
in
applications. An adaptive MaxMin EWMA chart with variable sampling intervals (VSI) is
proposed. The sampling interval between samples is allowed to vary depending
on
what is being observed in the current sample. The variable interval EWMA uses
short
sampling intervals if there is an indication that the process mean or variance
has
changed, and long sampling intervals if there is no indication of a change in
the
process. If the EWMA chart statistics actually enter the signal region, then
the VSI
EWMA chart signals in the same manner as the standard EWMA chart. A two-dimensional Markov chain is used to approximate the Average Run Length (ARL)
for the proposed control chart, in addition to extensive simulations. A design
procedure for the proposed VSI MaxMin EWMA chart is given.
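A minimal simulation sketch of the idea (not the authors' design; the smoothing constant, warning and signal limits, interval lengths, and subgroup size below are illustrative assumptions): EWMAs of the subgroup minimum and maximum of standardized observations are updated at each sample, the chart signals when either EWMA moves too far from its in-control centre, and the time to the next sample is short or long according to whether either EWMA falls in a warning region.

import numpy as np

rng = np.random.default_rng(1)

lam = 0.2                        # EWMA smoothing constant (illustrative)
L_sig, L_warn = 3.0, 1.5         # signal and warning multiples (illustrative)
d_long, d_short = 2.0, 0.25      # long and short sampling intervals
n = 5                            # subgroup size

# in-control behaviour of the subgroup extremes of standardized data, by simulation
sim = rng.standard_normal((100_000, n))
mu_max, sd_max = sim.max(axis=1).mean(), sim.max(axis=1).std()
mu_min, sd_min = sim.min(axis=1).mean(), sim.min(axis=1).std()
se = np.sqrt(lam / (2.0 - lam))  # asymptotic EWMA standard-error factor

ewma_max, ewma_min = mu_max, mu_min   # start each EWMA at its in-control centre
interval, clock = d_long, 0.0

for t in range(2000):
    clock += interval
    x = rng.standard_normal(n)        # in-control standardized data; add a shift to see signals
    ewma_max = lam * x.max() + (1 - lam) * ewma_max
    ewma_min = lam * x.min() + (1 - lam) * ewma_min

    z_hi = (ewma_max - mu_max) / (sd_max * se)   # distance of each EWMA from its centre,
    z_lo = (ewma_min - mu_min) / (sd_min * se)   # in units of its EWMA standard error
    if max(abs(z_hi), abs(z_lo)) > L_sig:
        print(f"signal at sample {t}, elapsed time {clock:.2f}")
        break
    # VSI rule: sample again soon if either EWMA is in the warning region
    interval = d_short if max(abs(z_hi), abs(z_lo)) > L_warn else d_long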
Bayesian Neural Networks for Bivariate Binary Data:
An Application to Prostate Cancer Study
Sounak Chakraborty, Malay Ghosh and Tapabrata Maiti
ABSTRACT
Prostate cancer is one of the most common cancers in American men. The cancer
could either be locally confined, or it could spread outside the organ. When
locally
confined, there are several options for treating and curing this disease. Otherwise,
surgery is the only option, and in extreme cases of outside spread, it could
very easily
recur within a short time even after surgery and subsequent radiation therapy.
Hence,
it is important to know, based on pre-surgery biopsy results, how likely it is that the cancer is organ-confined.
The paper considers a hierarchical Bayesian neural network approach for posterior
prediction probabilities of certain features indicative of non-organ confined
prostate
cancer. In particular, we find such probabilities for margin positivity and seminal vesicle (SV) positivity.
The available training set consists of bivariate binary outcomes indicating
the presence
or absence of the two. In addition, we have certain covariates such as prostate
specific antigen (PSA), Gleason Score and the indicator for the cancer to be
unilateral
or bilateral (i.e. spread on one or both sides). We take a hierarchical Bayesian
neural
network approach to find the posterior prediction probabilities for a test set,
and
compare these with the actual outcomes. The Bayesian procedure is implemented
by
an application of the Markov chain Monte Carlo numerical integration technique.
For
the problem at hand, the bivariate neural network procedure is shown to be superior
to the univariate hierarchical Bayesian neural network applied separately to
predict
margin and SV positivity.
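As a rough illustration of the computational idea (not the authors' model or data), the sketch below fits a one-hidden-layer network with two logistic output units to simulated bivariate binary responses by random-walk Metropolis, placing independent normal priors on all weights. The two outcomes are treated as conditionally independent given the network, whereas the bivariate model of the talk also captures their dependence; the covariate names (PSA, Gleason score, laterality), network size, and sampler settings are assumptions.

import numpy as np

rng = np.random.default_rng(0)

# --- simulated data standing in for the prostate cancer training set ---
n, p, H = 200, 3, 4                          # subjects, covariates, hidden units
X = rng.normal(size=(n, p))                  # standardized PSA, Gleason score, laterality (simulated)
probs = 1.0 / (1.0 + np.exp(-X @ rng.normal(size=(p, 2))))
Y = rng.binomial(1, probs)                   # bivariate binary outcomes (margin, SV)

def unpack(theta):
    W1 = theta[:p * H].reshape(p, H)                  # input -> hidden weights
    W2 = theta[p * H:p * H + H * 2].reshape(H, 2)     # hidden -> two output logits
    b = theta[-2:]                                    # output biases
    return W1, W2, b

def log_post(theta, tau=1.0):
    W1, W2, b = unpack(theta)
    eta = np.tanh(X @ W1) @ W2 + b                    # two logits per subject
    loglik = np.sum(Y * eta - np.logaddexp(0.0, eta)) # independent Bernoulli (logit) links
    logprior = -0.5 * np.sum(theta ** 2) / tau ** 2   # N(0, tau^2) priors on all weights
    return loglik + logprior

dim = p * H + H * 2 + 2
theta, lp = rng.normal(scale=0.1, size=dim), -np.inf
draws = []
for it in range(20000):                               # random-walk Metropolis
    prop = theta + rng.normal(scale=0.05, size=dim)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if it >= 10000 and it % 20 == 0:                  # keep thinned post-burn-in draws
        draws.append(theta.copy())

def predict(theta, Xnew):
    W1, W2, b = unpack(theta)
    return 1.0 / (1.0 + np.exp(-(np.tanh(Xnew @ W1) @ W2 + b)))

x_new = rng.normal(size=(1, p))                       # a hypothetical test subject
pred = np.mean([predict(t, x_new) for t in draws], axis=0)
print("posterior predictive P(margin+), P(SV+):", np.round(pred, 3))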
USING MULTIVARIATE ANALYSIS TO IDENTIFY SENSITIVE
BIOGEOCHEMICAL INDICATORS IN THE NORTHERN EVERGLADES
Ron Corstanje, University of Florida
ABSTRACT
The extent of vegetation displacement in the Northern Everglades, and the accompanying changes in environmental conditions, has created a need for a sensitive set of indicators that presage negative environmental changes, specifically changes that result from nutrient enrichment. In this talk we report on a two-step data analysis
leading to the identification of such indicators. In the first step a cluster
analysis was
used on sediment chemistry variables to identify eutrophic gradients. In the
second
step, microbial soil characteristics were used to discriminate these clusters.
The
biological response variables are successful in predicting group membership
and point
to specific microbial parameters that should serve as sensitive precursors of
negative
ecosystem change.
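A minimal sketch of the two-step analysis (simulated data and made-up variable names, not the Everglades dataset): sites are first clustered on standardized sediment chemistry, and the microbial variables are then used in a discriminant analysis to see how well they recover the chemistry-based clusters.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_sites = 120
gradient = rng.uniform(0, 1, n_sites)     # an underlying nutrient-enrichment gradient
# hypothetical sediment chemistry variables (e.g. total P, total N, Ca)
chem = np.column_stack([gradient + rng.normal(0, 0.15, n_sites) for _ in range(3)])
# hypothetical microbial responses (e.g. enzyme activities, microbial respiration)
micro = np.column_stack([0.8 * gradient + rng.normal(0, 0.3, n_sites) for _ in range(4)])

# step 1: cluster analysis on standardized sediment chemistry -> eutrophic gradient classes
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(chem))

# step 2: discriminant analysis of the clusters using only the microbial variables
acc = cross_val_score(LinearDiscriminantAnalysis(),
                      StandardScaler().fit_transform(micro), clusters, cv=5)
print("cross-validated accuracy of the microbial predictors:", round(acc.mean(), 2))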
Automated Classification of Cardiac ECG Images Using
Generalized Laplacian Model
Mahtab Munshi, Florida State University
ABSTRACT
Our goal is to provide an efficient algorithm for classifying echocardiographic (ECG) images of hearts into several categories: two-chambered or four-chambered,
normal or abnormal. We decompose images into their spectral components using
a
family of bandpass filters. We have used Gabor filters and various combinations
of
Gabor filters to filter these images. A statistical model on the filtered images
is
imposed by modeling the univariate distribution of the filtered pixels as a Generalized
Laplacian (GL) density. To compare two images, we compare the GL densities of
their components under the same filters. We have used the Kullback-Leibler distance
to compare two GL densities, and have obtained a closed form expression for
this
divergence. We illustrate the resulting pseudometric on the image space using
two
kinds of experiments. One is to cluster a set of images and the other is to
perform a
classification of test images using models learnt from the training images.
In the latter
case, we divide the available dataset into training and test sets in a random
fashion,
and classify the test images using the nearest image in the training data. We
will
present some experimental results for classifying cardiac ECG images into two- or four-chambered classes.
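A small sketch of the comparison step, using the common generalized Gaussian / generalized Laplacian parametrization p(x) proportional to exp(-(|x|/a)^b); the closed form below is the standard one for this parametrization and the scale and shape values are illustrative, so neither is necessarily the expression or the parametrization used in the talk.

import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

def gl_pdf(x, a, b):
    """Generalized Laplacian density with scale a and shape b."""
    return b / (2 * a * np.exp(gammaln(1 / b))) * np.exp(-(np.abs(x) / a) ** b)

def kl_gl(a1, b1, a2, b2):
    """Closed-form KL(p1 || p2) for this parametrization."""
    log_ratio = np.log(b1 / b2) + np.log(a2 / a1) + gammaln(1 / b2) - gammaln(1 / b1)
    return (log_ratio
            + (a1 / a2) ** b2 * np.exp(gammaln((b2 + 1) / b1) - gammaln(1 / b1))
            - 1 / b1)

# illustrative parameters, as if fitted to the same filter channel of two images
a1, b1, a2, b2 = 1.0, 0.7, 1.5, 1.2

# numerical check of the closed form (the integrand is symmetric about zero)
numeric = 2 * quad(lambda x: gl_pdf(x, a1, b1)
                   * (np.log(gl_pdf(x, a1, b1)) - np.log(gl_pdf(x, a2, b2))),
                   0, np.inf)[0]
print("closed form:", round(kl_gl(a1, b1, a2, b2), 5), " numerical:", round(numeric, 5))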
BUILDING TRACKING PORTFOLIOS BASED ON A GENERALIZED
INFORMATION CRITERION
Xufeng Niu, Florida State University
ABSTRACT
One important topic in financial studies is to build a tracking portfolio of
stocks whose
return mimics that of a chosen investment target. Statistically, this task can
be
accomplished by selecting an optimal model from constrained linear models. In
this
article, we extend the Generalized Information Criterion (GIC) to constrained
linear
models either with independently and identically distributed random errors or
with
dependent errors that follow a stationary Gaussian process. The extended GIC
procedure is proved to be asymptotically loss efficient and consistent under
mild
conditions. Simulation results show that the relative frequency of selecting
the optimal
constrained linear model by the GIC is close to one for finite samples. We apply
the
GIC-based procedure for building an optimal tracking portfolio to the problem
of
measuring the long-term impact of a corporate event on stock returns and
demonstrate empirically that it outperforms two other competing methods.
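A toy sketch of the selection step (simulated returns; the constrained fit and the GIC-type penalty n*log(RSS/n) + kappa_n*k with kappa_n = log n are illustrative assumptions, not the authors' criterion or proof conditions): every subset of candidate stocks is fitted as a tracking portfolio with nonnegative weights summing to one, and the subset with the smallest criterion value is selected.

from itertools import combinations
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 250, 6                                  # trading days, candidate stocks
R = rng.normal(0.0005, 0.01, size=(n, p))      # simulated daily stock returns
# a target whose return is driven by the first three stocks plus noise
target = R[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.002, n)

def fit_tracker(cols):
    """Constrained least squares: weights nonnegative and summing to one."""
    Xs = R[:, cols]
    k = Xs.shape[1]
    res = minimize(lambda w: np.sum((target - Xs @ w) ** 2),
                   np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return res.fun, res.x

kappa = np.log(n)                              # illustrative BIC-like penalty rate
best = None
for k in range(1, p + 1):
    for cols in combinations(range(p), k):
        rss, w = fit_tracker(list(cols))
        gic = n * np.log(rss / n) + kappa * k  # GIC-type score for this subset
        if best is None or gic < best[0]:
            best = (gic, cols, w)

print("selected stocks:", best[1], "with weights", np.round(best[2], 3))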
THE IN-AND-OUT-OF-SAMPLE (IOS) LIKELIHOOD RATIO TEST FOR
MODEL MISSPECIFICATION
Brett Presnell and Dennis D. Boos
ABSTRACT
A new test of model misspecification is proposed, based on the ratio of in-sample
and out-of-sample likelihoods. The test is broadly applicable, and in simple
problems
approximates well-known, intuitive methods. Using jackknife influence curve
approximations, it is shown that the test statistic can be viewed asymptotically
as a
multiplicative contrast between two estimates of the information matrix that
are both
consistent under correct model specification. This approximation is used to
show that
the statistic is asymptotically normally distributed, though it is suggested
that p-values
be computed using the parametric bootstrap. The resulting methodology is
demonstrated with a variety of examples and simulations involving both discrete
and
continuous data.
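A small sketch of the statistic for a normal working model (scaling conventions and the particular data-generating choice are illustrative, not the paper's definitions): the in-sample log-likelihood is evaluated at the full-data MLE, the out-of-sample log-likelihood scores each observation at the leave-one-out MLE, and the p-value for their difference is obtained by a parametric bootstrap under the fitted model.

import numpy as np
from scipy.stats import norm

def ios_stat(x):
    """Log of the ratio of in-sample to out-of-sample (leave-one-out) likelihoods."""
    mu, sd = x.mean(), x.std()                           # full-data MLEs
    in_ll = norm.logpdf(x, mu, sd).sum()
    out_ll = 0.0
    for i in range(len(x)):                              # score x[i] at the leave-one-out MLEs
        xi = np.delete(x, i)
        out_ll += norm.logpdf(x[i], xi.mean(), xi.std())
    return in_ll - out_ll

rng = np.random.default_rng(0)
x = rng.exponential(size=100)                            # skewed data, misspecified for a normal model
obs = ios_stat(x)

# parametric bootstrap of the null distribution under the fitted normal model
mu, sd = x.mean(), x.std()
boot = np.array([ios_stat(rng.normal(mu, sd, size=x.size)) for _ in range(500)])
print("IOS statistic:", round(obs, 3), " bootstrap p-value:", np.mean(boot >= obs))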
INFLUENCE DIAGNOSTICS IN LINEAR MIXED MODELS
Oliver Schabenberger, SAS Institute
ABSTRACT
Measures to gauge the influence of one or more observations on the analysis
are well
established in the general linear model for uncorrelated data. Computationally
these
measures present no difficulty because closed-form update expressions allow
their
evaluation without refitting the model. When applying notions of statistical
influence to
mixed models, things are not so straightforward. Data points that exhibit influence
are
likely to impact fixed effects and covariance parameter estimates. Update formulas
that compute leave-one-out estimates exist only for narrow classes of mixed
models
or impose untenable assumptions. In repeated measures or longitudinal studies
one is
often interested in multivariate influence, rather than the impact of isolated
points. This
talk will examine some influence measures that can be applied in mixed models
and
describe their utility in discerning influential cases and sets of observations. Several applications are presented.
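A brute-force sketch of subject-level (multivariate) deletion diagnostics for a simple random-intercept model, refitting the model after removing one subject at a time rather than using update formulas; the data, formula, and influence summary are illustrative assumptions, not the talk's measures.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_per = 20, 6
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_per),
    "time": np.tile(np.arange(n_per), n_subj),
})
u = rng.normal(0.0, 1.0, n_subj)                         # random intercepts
df["y"] = 2.0 + 0.5 * df["time"] + u[df["subject"]] + rng.normal(0, 0.7, len(df))
df.loc[df["subject"] == 0, "y"] += 4.0                   # make one subject clearly influential

full = smf.mixedlm("y ~ time", df, groups=df["subject"]).fit()
beta_full = full.fe_params.values

influence = {}
for s in df["subject"].unique():                         # leave-one-subject-out refits
    sub = df[df["subject"] != s]
    fit = smf.mixedlm("y ~ time", sub, groups=sub["subject"]).fit()
    d = fit.fe_params.values - beta_full                 # shift in the fixed-effect estimates
    influence[s] = float(np.sqrt((d ** 2).sum()))

top = sorted(influence, key=influence.get, reverse=True)[:3]
print("subjects whose deletion moves the fixed effects most:", top)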
MOMENT MATRICES OF THE NORMAL DISTRIBUTION
James R. Schott, University of Central Florida
ABSTRACT
If $x\sim N_m(0,\Omega)$, then the second-order and fourth-order moment
matrices of $x$ are given by $\Psi_2=E(xx')=\Omega$ and
$\Psi_4=E(xx'\otimes xx')=(I_{m^2}+K_{mm})(\Omega\otimes\Omega)+{\rm
vec}(\Omega){\rm vec}(\Omega)'$, where $K_{mm}$ is a commutation matrix.
Formulas have been given for $\Psi_6$ and $\Psi_8$, but these are rather
complicated looking, making subsequent computations, such as the calculation of a generalized inverse, rather difficult. By introducing a special class of symmetric
idempotent matrices, we obtain fairly simple expressions for these moment matrices.
An explicit expression is given for $\Psi_k$ for any even positive integer $k$.
An
application utilizing $\Psi_8$ is discussed.
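A quick Monte Carlo check of the stated formula for $\Psi_4$; the dimension, covariance matrix, and sample size below are arbitrary illustrative choices, and the deviation printed at the end is only simulation error.

import numpy as np

rng = np.random.default_rng(0)
m = 3
A = rng.normal(size=(m, m))
Omega = A @ A.T                                   # an arbitrary covariance matrix

# commutation matrix K_mm, defined by K vec(A) = vec(A') (column-stacking vec)
K = np.zeros((m * m, m * m))
for i in range(m):
    for j in range(m):
        K[i * m + j, j * m + i] = 1.0

vecO = Omega.reshape(-1, 1, order="F")            # vec(Omega) as a column vector
Psi4 = (np.eye(m * m) + K) @ np.kron(Omega, Omega) + vecO @ vecO.T

# Monte Carlo estimate of E(xx' kron xx')
N = 200_000
X = rng.multivariate_normal(np.zeros(m), Omega, size=N)
T = np.einsum("ni,nj,nk,nl->ikjl", X, X, X, X) / N       # T[i,k,j,l] = mean of x_i x_j x_k x_l
Psi4_mc = T.reshape(m * m, m * m)                        # matches the Kronecker index layout

print("largest absolute deviation from the formula:", np.abs(Psi4 - Psi4_mc).max())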
DATA INTEGRATION TECHNIQUES: A CRITICAL REVIEW
Bikas K Sinha, Indian Statistical Institute
ABSTRACT
Data Integration refers to combining evidence from a number of independent sources to arrive at an overall decision. First we describe in detail a well-known technique for data integration, viz. "TOPSIS", with a numerical example. Next we examine the technique from a mature point of view and point out some of its drawbacks. Then we describe some ways to rectify the drawbacks and illustrate them with the same numerical example. Finally, we briefly mention another technique: "ELECTRE".
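For readers unfamiliar with the technique, a minimal sketch of the standard TOPSIS steps is given below; the decision matrix, weights, and criterion directions are made-up illustrations, not the numerical example of the talk.

import numpy as np

X = np.array([[250, 16, 12, 5],         # alternatives (rows) scored on criteria (columns)
              [200, 16, 8, 3],
              [300, 32, 16, 4],
              [275, 32, 8, 4]], dtype=float)
w = np.array([0.25, 0.25, 0.25, 0.25])  # criterion weights (sum to one)
benefit = np.array([False, True, True, True])   # first criterion is a cost (e.g. price)

R = X / np.sqrt((X ** 2).sum(axis=0))   # vector-normalize each criterion
V = R * w                               # weighted normalized decision matrix

ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))       # ideal solution
anti = np.where(benefit, V.min(axis=0), V.max(axis=0))        # anti-ideal solution
d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))              # distance to the ideal
d_minus = np.sqrt(((V - anti) ** 2).sum(axis=1))              # distance to the anti-ideal
closeness = d_minus / (d_plus + d_minus)                      # relative closeness coefficient

print("ranking of alternatives (best first):", np.argsort(-closeness))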
Measurement Error Models for Small Area Estimation
Malay Ghosh and Karabi Sinha
ABSTRACT
It is well-known that direct survey estimators of small areas are usually
not very
reliable, being accompanied by large standard errors and coefficients of
variation.
To meet the need for finding indirect small area estimators, a rich collection
of
models, either explicit or implicit, has been proposed and studied over the
past few
years. However, it appears that measurement error models, in spite of many
other
applications, have hardly been used in this context. Such models, however,
seem
appropriate in several small area contexts. For example, the USDA uses satellite
data
as auxiliary variables in the analysis of many of its crop surveys. Clearly,
these
measurements are subject to error, and measurement error models seem appropriate
in these situations for small area estimation. This article develops a general
normal
hierarchical Bayesian methodology for small area estimation in such situations.
We
consider both unit-level and area-specific models and illustrate them in real-life
examples as well as with simulated data.
SEMIPARAMETRIC BAYESIAN ANALYSIS OF MATCHED
CASE-CONTROL STUDIES
Samiran Sinha, University of Florida
ABSTRACT
This talk considers Bayesian analysis of case-control studies in which the exposure variable has a conditional distribution belonging to the exponential family and may contain missing observations. A completely observed set of covariates is also measured. We consider a matched data design where each stratum contains one case and M controls. The standard analysis approach is to assume that the distribution
of the exposure variable does not involve any isolated stratum effects except
through
the covariates. In a new model, we allow for the presence of stratum effects
while
modeling the distribution of the exposure variable. Consequently, for the retrospective
conditional likelihood of the exposure variable, the stratum effects remain as nuisance parameters, whose number grows in direct proportion to the sample size. We assume a
Dirichlet process prior with a mixing normal distribution for the stratum effects
and
estimate all the parameters in a Bayesian framework. The Bayesian procedure
is
implemented via Markov chain Monte Carlo numerical integration techniques. Two
matched case-control examples and a simulation study are considered to illustrate
our
methods and the computing scheme.
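A small sketch of the prior placed on the stratum effects, a Dirichlet process with a normal base (mixing) distribution, drawn here by truncated stick-breaking; the concentration parameter, base-measure scale, number of strata, and truncation level are illustrative assumptions, and the full posterior computation of the paper is not reproduced.

import numpy as np

rng = np.random.default_rng(0)
alpha, tau = 1.0, 1.0              # DP concentration and normal base-measure scale
n_strata, trunc = 50, 200          # number of strata; stick-breaking truncation level

v = rng.beta(1.0, alpha, trunc)                            # stick-breaking fractions
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # atom weights
atoms = rng.normal(0.0, tau, trunc)                        # atoms drawn from the N(0, tau^2) base

# each stratum effect is one of the atoms, so effects are shared (clustered) across strata
idx = rng.choice(trunc, size=n_strata, p=w / w.sum())
stratum_effects = atoms[idx]
print("distinct effect values among", n_strata, "strata:", len(np.unique(idx)))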
BRINGING SURVEY DESIGN TO COLLEGE IMPACT STUDIES: APPLICATION
TO A STUDY OF BETHUNE-COOKMAN COLLEGE
Mark D. Soskin, University of Central Florida
ABSTRACT
Commissioning an impact study has become routine before a college or university
launches a new capital campaign or community outreach effort. Economic impact
studies can furnish essential information to banks, alumni, state and local
government,
the media, and the campus community. Unfortunately, these studies are generally
long
on public relations but short on primary data and statistical methodology. Second-hand expenditure "per caps" and unsubstantiated multipliers are enlisted to reduce costs and bring in the desired "gee-whiz" impact estimates. Yet, although the
typical
study trumpets the enormous impact of the college, two of the most important
questions remain unanswered:
* Why is the college so under-appreciated and misunderstood?
* How can the college improve its image and reach out more effectively to the
surrounding community?
In this talk, a primary survey- and data-based design is presented for such
impact
studies. Advantages, difficulties and cost barriers encountered in implementing
this
design are discussed using our experiences in application of this methodology
to a
just-completed impact study of Bethune-Cookman College, a large historically black
college (HBC) in Daytona Beach, Florida.
SIMULTANEOUS TESTS FOR CATEGORICAL AND COUNT
DATA WITH AN APPLICATION TO CONTROLLING
MULTISTREAM ATTRIBUTE PROCESSES
Peter Wludyka, University of North Florida
ABSTRACT
A single Chi-Squared test statistic, W, can be used to simultaneously test
I
independent hypotheses. With this test a simultaneous Type I error rate can
be
established. In practice, one would estimate the quantiles of W using Monte
Carlo
methods. The test is intuitive and simple to perform. Comparisons are made to
alternative approaches (including using tabulated Chi-Squared quantiles).
Circumstances in which this test might be useful are identified. Attribute multistream
processes (ones in which attribute process observations can be modeled as
arising from I independent streams) can be controlled using the W statistic.
Both
homogeneous (all I streams governed by the same probability model) and non-
homogeneous processes can be controlled using W. Methods useful for searching for assignable causes after an out-of-control signal will be offered. Average run length comparisons with other control schemes will be made.
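The abstract does not give the form of W; purely as an assumption for illustration, the sketch below takes W to be the sum of the per-stream Pearson chi-squared statistics and estimates its in-control quantile by Monte Carlo, rather than from tabulated chi-squared quantiles, with category probabilities, stream count, and sample sizes chosen arbitrarily.

import numpy as np

rng = np.random.default_rng(0)
I, n = 8, 50                        # number of streams, items inspected per stream each period
p0 = np.array([0.90, 0.08, 0.02])   # in-control category probabilities, common to all streams

def pearson(counts):
    expected = n * p0
    return float((((counts - expected) ** 2) / expected).sum())

def simulate_W(probs):
    counts = rng.multinomial(n, probs, size=I)       # one multinomial sample per stream
    return sum(pearson(c) for c in counts)

# estimate the in-control quantile of W by Monte Carlo rather than from tables
null_draws = np.array([simulate_W(p0) for _ in range(20000)])
limit = np.quantile(null_draws, 0.9973)              # simultaneous in-control limit
print("Monte Carlo control limit for W:", round(limit, 2))

# monitoring: signal when a period's W exceeds the Monte Carlo limit
W_now = simulate_W(np.array([0.80, 0.15, 0.05]))     # a shifted (out-of-control) process
print("current W =", round(W_now, 2), " signal:", W_now > limit)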