Sieve Strategy of Nonparametric Statistical Data Modelling

EMANUEL PARZEN
Texas A&M University

Modern attempts to define the foundations of statistical science emphasize that it is a way to think about data modeling. The influential 1970 book of Box and Jenkins taught us that statistical science is NOT just about estimation, testing, or even data analysis, but is about data modeling done in a systematic way by an iterated series of steps which I formulate as SIEVE: sample statistics, identify parametric model, estimate parameters and orders, validate goodness of fit, estimation non-parametric. Another summary of SIEVE: observation, expectation, estimation, comparison, decision. 

The SIEVE data modeling strategy can be illustrated by data (X,Y1,Y2) where X=1 or 2 according as the patient observed is healthy or diseased. One also observes medical variables Y1 and Y2 (in one data set, Y1 is total heart weight and Y2 is body weight). From the small set of n1 observations of (1,Y1,Y2) and n2 observations of (2,Y1,Y2) one obtains an extensive output to describe the relations between the variables and probabilities such as P[X=1|Y1,Y2].

Univariate analysis provides data models of the probability distributions of Y1|X=1, Y1|X=2, Y1, Y2|X=1, Y2|X=2, Y2. The initial no-model step estimates mean, standard deviations, median, quartiles, tails, skewness, histogram, boxplot, quantile/quartile plots. 
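
As a rough illustration of the no-model step, the following Python sketch computes a few of the listed summaries for a single sample such as Y1|X=1. The function name `summarize` and the data values are hypothetical and do not come from the paper. The skewness shown is the simple moment coefficient, which is only one of several common definitions.

```python
import statistics

def summarize(sample):
    """Return basic no-model descriptive statistics for one sample.

    Assumes at least two observations that are not all equal.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    # Quartiles; the "inclusive" method treats the data as spanning min to max.
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    # Moment coefficient of skewness, using the n-divisor standard deviation.
    pop_sd = statistics.pstdev(sample)
    skewness = sum((y - mean) ** 3 for y in sample) / (n * pop_sd ** 3)
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "q1": q1,
        "median": median,
        "q3": q3,
        "skewness": skewness,
    }

# Hypothetical values, for illustration only (not data from the paper).
y1_healthy = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = summarize(y1_healthy)
```

The same function would be applied in turn to each of the six samples named above (Y1|X=1, Y1|X=2, Y1, and likewise for Y2).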

Two-sample univariate analysis tests homogeneity (equality of the distributions of Y1 in the two samples X=1 and X=2). It compares Y1|X=1 and Y1|X=2 by comparing Y1|X=1 and Y1. It compares Y2|X=1 and Y2|X=2 by comparing Y2|X=1 and Y2.
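
One way to make the comparison of Y1|X=1 with the pooled Y1 concrete is to measure the largest gap between their empirical distribution functions. The abstract does not name a statistic, so the Kolmogorov-Smirnov-style maximum deviation below is an assumption chosen for illustration. The function names and the data are hypothetical.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)
    # Fraction of observations less than or equal to y.
    return lambda y: bisect_right(ordered, y) / n

def max_deviation_from_pooled(sample_1, sample_2):
    """Largest gap between the ECDF of sample 1 and the ECDF of the pooled sample."""
    pooled = list(sample_1) + list(sample_2)
    f1 = ecdf(sample_1)
    h = ecdf(pooled)
    # Both functions are step functions that jump only at pooled data points,
    # so it is enough to evaluate the gap at those points.
    return max(abs(f1(y) - h(y)) for y in pooled)

# Hypothetical values, for illustration only (not data from the paper).
y1_healthy = [1.0, 2.0, 3.0]
y1_diseased = [4.0, 5.0, 6.0]
gap = max_deviation_from_pooled(y1_healthy, y1_diseased)
```

Under homogeneity the gap should be small. Judging how small requires a reference distribution, for example from permuting the group labels, which this sketch does not attempt.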

Bivariate analysis tests independence of Y1 and Y2 (conditional on X=1) by comparing Y1|Y2 and Y1. One computes the scattergram-boxplot, correlation coefficients, the conditional quantile function Q_{Y2|Y1=y}(u), and the non-parametric regression E[Y2|Y1=y].
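
Two of these quantities can be sketched briefly in Python: the Pearson correlation coefficient and a non-parametric estimate of E[Y2|Y1=y]. The abstract does not specify a regression estimator, so the Nadaraya-Watson kernel smoother with a Gaussian kernel used here is an assumption. The bandwidth, function names, and data are likewise hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def kernel_regression(xs, ys, y0, bandwidth):
    """Nadaraya-Watson estimate of E[Y2 | Y1 = y0] with a Gaussian kernel."""
    # Observations with Y1 close to y0 receive the largest weights.
    weights = [math.exp(-0.5 * ((x - y0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical paired values, for illustration only (not data from the paper).
y1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson(y1, y2)
fit_at_3 = kernel_regression(y1, y2, y0=3.0, bandwidth=1.0)
```

The bandwidth controls the smoothness of the fitted curve; in practice it would be chosen from the data rather than fixed in advance.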

Two-sample bivariate analysis tests homogeneity of the distribution of (Y1,Y2) in the two groups (X=1, X=2) by comparing (Y1,Y2)|X=1 with (Y1,Y2). This is a generic problem of machine learning.
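
To suggest how this comparison connects to probabilities such as P[X=1|Y1,Y2], the sketch below estimates that probability at a query point as the fraction of its k nearest pooled observations that come from group X=1. The k-nearest-neighbour rule is an assumption chosen for simplicity, not a method stated in the abstract. The function name, the value of k, and the data are hypothetical.

```python
import math

def knn_prob_group_1(points, labels, query, k):
    """Estimate P[X=1 | (Y1, Y2) = query] from the k nearest pooled observations."""
    # Indices of the pooled observations, ordered by Euclidean distance to the query.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    nearest = order[:k]
    return sum(1 for i in nearest if labels[i] == 1) / k

# Hypothetical (Y1, Y2) pairs with group labels, for illustration only.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = [1, 1, 1, 2, 2, 2]
p_near_group_1 = knn_prob_group_1(points, labels, query=(1.5, 1.5), k=3)
p_near_group_2 = knn_prob_group_1(points, labels, query=(8.5, 8.5), k=3)
```

When Y1 and Y2 are measured on different scales, as heart weight and body weight would be, the variables would normally be standardized before distances are computed.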


 
 
 

