STA 3024 Data Analysis 3 Due: March 27 D. Meeter, Spring, '01

Purpose: to demonstrate that you can find a measurement scale in which statistical assumptions seem reasonable and describe the results of a linear regression, including predictions, in both technical and non-technical terms.

The data: http://stat.fsu.edu/~meeter → 3024 → Data Sets → Data for Assignment 3

Follow the same steps as in Assignment 2. This time, the data is in bush.xls.

In your job for a public interest group, you would like to analyze the Palm Beach County "butterfly ballot" problem. You want to see a) whether it is reasonable that Buchanan got votes intended for Gore, and b) if so, how many. Perhaps it is reasonable to predict the Buchanan vote using the Bush vote, since both are conservative. (Their county totals might be correlated.)

Plot the data (y vs. x). Perform a simple linear regression, getting deleted residuals (ask me why.)

Within Stat > Regression > Regression, use Graphs > Residuals for Plots: … Deleted
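For anyone checking the Minitab output by hand, the following is a minimal Python sketch of a simple linear regression with deleted residuals. It assumes that the "Deleted" option reports studentized deleted (deleted t) residuals and uses the usual textbook formula for them; the function names and the six data points are made up for illustration and are not the values in bush.xls.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def deleted_t_residuals(x, y):
    """Studentized deleted residuals for a simple linear regression."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this point
        # t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_i) - e_i^2)), with p = 2
        out.append(e * math.sqrt((n - 3) / (sse * (1.0 - h) - e * e)))
    return out

# Made-up placeholder numbers, NOT the county totals in bush.xls.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0]  # the last point is a planted outlier
t = deleted_t_residuals(x, y)
for xi, ti in zip(x, t):
    print(xi, round(ti, 2))
```

Each deleted residual measures how far a point lies from the line fitted without that point, so a single unusual observation cannot hide itself by pulling the fit toward it.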

Examine appropriate plots for normality, and for nonconstant variance or curvature. Conclusions?

Take logs. Why is log y vs. log x more sensible than log y vs. x ?

Compare the adequacy of your new model with that of the original model (same plots and r².)

How have your ideas about the adequacy of the fit changed as a result of the log transformation?

From this point on, for y, use Buc(del) which has Palm Beach and Dade as missing data.

You will need to make a new column containing the logs of Buc(del).

Compare the predicted relationship between the Buchanan vote and the Bush vote in the original scale (Model A) with the relationship you get if you take logs, do a regression, and transform back to the original scale (Model B). Compare the two models' predictions as the Bush vote tends to zero. Which model, A or B, is more reasonable?
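The back-transformation for Model B can be sketched as follows, assuming base-10 logs: if log10 y = b0 + b1 log10 x, then y = 10^b0 · x^b1 in the original scale. The helper names and the five data points below are hypothetical placeholders, not the values in bush.xls; substitute the coefficients from your own Minitab output.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Made-up placeholder numbers, NOT the county totals in bush.xls.
bush = [1000.0, 5000.0, 20000.0, 80000.0, 150000.0]
buchanan = [20.0, 60.0, 150.0, 400.0, 600.0]

# Model A: straight line in the original scale.
a0, a1 = fit_line(bush, buchanan)

# Model B: straight line in the log10-log10 scale.
b0, b1 = fit_line([math.log10(v) for v in bush],
                  [math.log10(v) for v in buchanan])

def predict_a(x):
    return a0 + a1 * x

def predict_b(x):
    # Back-transform: log10(y) = b0 + b1*log10(x)  =>  y = 10**b0 * x**b1
    return 10 ** b0 * x ** b1

# Evaluate both models at progressively smaller Bush votes.
for x in (10000.0, 100.0, 1.0):
    print(x, predict_a(x), predict_b(x))
```

Running the loop with your own coefficients shows how each model behaves as x shrinks, which is the comparison the question asks for.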

Use a prediction interval (pp. 43-44, notes) to predict the Buchanan vote in Palm Beach and Dade counties using Models A and B. Compare these to the actual votes for Buchanan. How many excess votes did Buchanan get in Palm Beach County? Give an interval, derived from the prediction interval, that estimates the range in which the Buchanan vote should lie. Use: Options > Prediction intervals for new observations: (Enter the x values for which you want to predict y.)
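As a cross-check on the Minitab intervals, here is a minimal Python sketch of the usual textbook prediction interval for a new observation, fit ± t · s · sqrt(1 + 1/n + (x0 − x̄)²/Sxx), which may differ in notation from the notes. The function name, the toy data, and the supplied critical value are assumptions for illustration; the t value must be looked up for n − 2 degrees of freedom in your own data.

```python
import math

def prediction_interval(x, y, x0, t_crit):
    """Return (lower, fit, upper) for a new observation at x0.

    t_crit is the two-sided t critical value on n - 2 degrees of freedom,
    supplied by the caller (for example from a t table).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    fit = b0 + b1 * x0
    half = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)
    return fit - half, fit, fit + half

# Made-up placeholder numbers, NOT the county totals in bush.xls.
bush = [1000.0, 5000.0, 20000.0, 80000.0, 150000.0]
buchanan = [20.0, 60.0, 150.0, 400.0, 600.0]
T_CRIT = 3.182    # t(0.975) on n - 2 = 3 df, valid for this toy data only
X_NEW = 152954.0  # Palm Beach Bush vote, as given in the assignment

# Model A: interval directly in votes.
lo_a, fit_a, hi_a = prediction_interval(bush, buchanan, X_NEW, T_CRIT)

# Model B: interval on the log10 scale, then each endpoint back-transformed.
log_x = [math.log10(v) for v in bush]
log_y = [math.log10(v) for v in buchanan]
lo_b, fit_b, hi_b = (10 ** v for v in
                     prediction_interval(log_x, log_y, math.log10(X_NEW), T_CRIT))

print("Model A:", lo_a, fit_a, hi_a)
print("Model B:", lo_b, fit_b, hi_b)
```

Subtracting the interval endpoints from the actual Buchanan vote gives a corresponding range for the excess votes.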

Note: If you want to predict Buchanan's vote for Palm Beach, x = 152,954 (see data file). If you use log x as your predictor, then you want to predict at log x = ? (The value depends on whether you used log₁₀ or logₑ.)
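If you want to check the value you enter, both logs of the Palm Beach x can be computed with a short Python sketch like the one below; the variable names are illustrative only. Whichever base you used to build the log column is the one to use here.

```python
import math

x_palm_beach = 152954  # Bush vote in Palm Beach County, from the assignment

log10_x = math.log10(x_palm_beach)  # use this if the column was built with log10
ln_x = math.log(x_palm_beach)       # use this if the column was built with loge

print(log10_x, ln_x)
```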

Extra

In Model B, interpret what the model would say about the relationship between Buchanan's vote and Bush's if the exponent of x were 1. (This is analogous to allometric growth.) Using the standard error (StDev) of the slope, get a 95% confidence interval for the exponent. Interpret the exponent being less than one. (As x increases, does y increase proportionately?)
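The confidence interval for the exponent is the slope of the log-log regression plus or minus a t critical value times its standard error. A minimal Python sketch, assuming the standard formula SE(b1) = s / sqrt(Sxx), is shown below; the function name, the toy data, and the critical value are placeholders, and with Minitab output you can simply plug in the reported slope and StDev.

```python
import math

def slope_confidence_interval(x, y, t_crit):
    """Return (lower, b1, upper) for the slope of y = b0 + b1*x.

    t_crit is the two-sided t critical value on n - 2 degrees of freedom,
    supplied by the caller.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)  # standard error of the slope
    return b1 - t_crit * se_b1, b1, b1 + t_crit * se_b1

# Made-up placeholder numbers, NOT the county totals in bush.xls.
bush = [1000.0, 5000.0, 20000.0, 80000.0, 150000.0]
buchanan = [20.0, 60.0, 150.0, 400.0, 600.0]
log_x = [math.log10(v) for v in bush]
log_y = [math.log10(v) for v in buchanan]

# 3.182 is t(0.975) on 3 df, valid for this five-point toy data only.
lower, exponent, upper = slope_confidence_interval(log_x, log_y, 3.182)
print(lower, exponent, upper)
```

Checking whether the interval contains 1 is one way to frame the proportionality question.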

Do a few checks to see whether the Gore or Nader votes predict the Buchanan vote.