Test II STA 3024 D. Meeter Sp '00 Name__ANSWERS_____________

 

  1. The sample correlation r between y and x

  a) will not be statistically significant if H0: ρ = 0 is true. False; it can be significant due to sampling error (a Type I error).
  b) if statistically non-significant, demonstrates that x and y can't be causally related. False; there could be a Type II error, or there could be a nonlinear relationship (e.g. energy use vs. temperature).
  c) is the same as the correlation between 3x and 23y. True; correlation is scale-free.
  d) can be zero even if x causes changes in y. True; the relationship could be nonlinear, or the zero could be due to sampling variability.
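
The scale-free and nonlinear-relationship answers above can be checked numerically. The following is a minimal Python sketch, not part of the original key; the helper `pearson_r` and both small data sets are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_xy = pearson_r(x, y)
r_scaled = pearson_r([3 * a for a in x], [23 * b for b in y])
print(math.isclose(r_xy, r_scaled))  # rescaling x and y leaves r unchanged

# Here v is determined exactly by u, yet the linear correlation is zero.
u = [-2, -1, 0, 1, 2]
v = [a ** 2 for a in u]
r_quad = pearson_r(u, v)
print(r_quad)
```

The second data set is symmetric about u = 0, which is why the positive and negative cross-products cancel.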

"Statistically adequate" means "appears to satisfy Assumptions A1-A4, according to our checks."

2. A regression model

  a) may have statistically significant coefficients even though the model is statistically inadequate. True, e.g. the Buchanan/Bush first model (and many other examples).
  b) may be statistically adequate even though every regression coefficient is non-significant. True.
  c) may have a very low R^2 even though the model is statistically inadequate. True.
  d) will never have a higher R^2 if we drop a term from the model. True. Since SSE can't go up (so R^2 can't go down) when we add a term to the model, just reverse the direction: dropping a term can't raise R^2.

3. When adding x2 to a linear regression of y on x1,

  a) β1 will change if x1 and x2 are correlated. True.
  b) the Total SS will never change. True.
  c) SSreg will never decrease. True.
  d) β1 will always become more significant since SSE decreases. False. Look at a). Look at fireflies.
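
The first three answers can be seen in a small numerical experiment. This is only a sketch under stated assumptions: the data are invented, and `ols` is a hypothetical helper that solves the normal equations directly, which is fine for a toy example but not how a statistics package would do it.

```python
def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns (coefficients, SSE); coefficients[0] is the intercept.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    b = [A[r][p] / A[r][r] for r in range(p)]
    sse = sum((y[i] - sum(bj * xj for bj, xj in zip(b, X[i]))) ** 2
              for i in range(n))
    return b, sse

# Invented data: x2 is strongly correlated with x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]

ybar = sum(y) / len(y)
total_ss = sum((v - ybar) ** 2 for v in y)  # depends on y only, not on the model

b_one, sse_one = ols([x1], y)
b_two, sse_two = ols([x1, x2], y)
ssreg_one = total_ss - sse_one
ssreg_two = total_ss - sse_two

print("coefficient of x1, x1 alone: ", round(b_one[1], 3))
print("coefficient of x1, x1 and x2:", round(b_two[1], 3))
print("SSreg:", round(ssreg_one, 3), "->", round(ssreg_two, 3))
```

Because the Total SS is computed from y alone, it cannot depend on which predictors are in the model, while the estimated coefficient of x1 shifts noticeably once the correlated x2 enters.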

4. From a multiple regression:

There are 30 observations. The regression of y on x2 alone yielded SSE = 4,000. The total SS, with 29 d.f., is 9,200.

Source             SS     df
Due to x1          1200   1
Extra due to x2    2400   1
Extra due to x3    3000   1

a) Calculate SSE for the above table. 9200 - (1200 + 2400 + 3000) = 2600

b) Estimate σ^2 in the full (three-variable) regression. 2600/26 = 100

c) Test H0: β3 = 0 using a partial F test. F = (3000/1)/(2600/26) = 30, d.f. = 1, 26

d) Calculate SSE in the regression of y on x2 alone. What? It is given above!
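
The arithmetic in a) through c) can be reproduced in a few lines. This is a sketch only; the variable names are chosen here for illustration, and the numbers are those given in the problem.

```python
n = 30                      # observations
total_ss = 9200             # total SS, with n - 1 = 29 d.f.
seq_ss = {                  # sequential (extra) sums of squares, 1 d.f. each
    "x1": 1200,
    "x2 after x1": 2400,
    "x3 after x1, x2": 3000,
}

sse = total_ss - sum(seq_ss.values())            # a) error SS in the full model
df_error = (n - 1) - len(seq_ss)                 # 29 - 3 = 26
mse = sse / df_error                             # b) estimate of sigma^2
f_stat = (seq_ss["x3 after x1, x2"] / 1) / mse   # c) partial F on (1, 26) d.f.

print(sse, df_error, mse, f_stat)  # 2600 26 100.0 30.0
```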

5. In a regression of y = time to exhaustion (sec) on x = temperature (°C), species B and C were studied. The dummy variable D was coded 1 for C, 0 for B. The fitted equation was

    y-hat = 22 + 3x + 5.4D + 1.6xD.

    a) Assuming that all coefficients are significant, what is the prediction equation for Species C?

    y-hat = 27.4 + 4.6x

    b) How much is the slope for Species C different from that of Species B? It is 1.6 higher.

    c) Suppose that 1.6 was not significantly different from zero. Interpret the fitted equation, using words from this particular problem, e.g. time, species, temperature.

    At 0 °C, time to exhaustion is 22 sec for Species B and 27.4 sec for Species C. For both species, time to exhaustion increases by 3 sec for each 1 °C increase in temperature.

    d) What does c) have to do with ANCOVA? The model in c) is appropriate for ANCOVA, since the regression of y on x has the same slope in the two groups.
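
    The way the dummy variable produces one line per species can be sketched in Python. The four coefficients are not quoted from the exam; they are inferred from the answers (intercepts 22 and 27.4, slopes 3 and 4.6), and the function name `predict` is hypothetical.

```python
# Coefficients inferred from the answers (an assumption, not quoted from the exam):
# Species B line is 22 + 3x; Species C adds 5.4 to the intercept and 1.6 to the slope.
B0, B_TEMP, B_SPECIES, B_INTERACT = 22.0, 3.0, 5.4, 1.6

def predict(temp_c, d):
    """Predicted time to exhaustion (sec); d = 1 for Species C, 0 for Species B."""
    return B0 + B_TEMP * temp_c + B_SPECIES * d + B_INTERACT * temp_c * d

for d, name in ((0, "B"), (1, "C")):
    intercept = predict(0, d)
    slope = predict(1, d) - predict(0, d)
    print(f"Species {name}: y-hat = {intercept:.1f} + {slope:.1f}x")

# Dropping the interaction term gives the parallel-lines (ANCOVA) form of part c).
def predict_parallel(temp_c, d):
    return B0 + B_TEMP * temp_c + B_SPECIES * d
```

    With the interaction term removed, the two species differ only by a constant 5.4 sec at every temperature, which is the equal-slopes structure ANCOVA assumes.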

6. The fitted equation was

    log y-hat = 1.2 + 0.9 log x.

    What is the fitted equation in terms of y? You can assume either natural or common logs.

    y-hat = 3.32 x^0.9 (natural logs)

    If the std. dev. of 1.2 is 0.2 and the std. dev. of 0.9 is 0.1, is it reasonable to state that y is proportional to x? Justify your answer. Since the exponent of x (in the y scale) is only 1 std. error away from an exponent of 1, while the constant term is significantly different from zero (1.2/0.2 = 6 std. errors), it is reasonable to assume an exponent of 1 for x and write y-hat = 3.32x, so y is proportional to x.
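
    The back-transformation and the two standard-error comparisons can be checked as follows. This is a sketch: the intercept 1.2 and slope 0.9 on the log scale are taken from the quoted standard deviations, and the variable names are illustrative.

```python
import math

a, se_a = 1.2, 0.2   # constant term on the log scale and its std. error
b, se_b = 0.9, 0.1   # exponent of x and its std. error

mult_natural = math.exp(a)   # multiplier if natural logs were used
mult_common = 10 ** a        # multiplier if common (base-10) logs were used

t_exponent_vs_1 = (b - 1) / se_b   # how far the exponent is from 1, in std. errors
t_constant_vs_0 = a / se_a         # how far the constant is from 0, in std. errors

print(round(mult_natural, 2), round(mult_common, 2))
print(round(t_exponent_vs_1, 2), round(t_constant_vs_0, 2))
```

    With common logs the multiplier would be about 15.85 instead of 3.32; the argument about the exponent is the same either way.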

7. The dependent variable was log # species; the independent variable was log area searched. The data were taken on islands A and B. Draw a scatterplot (label axes) which illustrates

  a) Island A averages fewer species, but after adjustment for area, Island A has more species.

  b) Island B has about the same # species as A; after adjusting for area, Island B has more species.

  c) There is a correlation between log # species and log area when the island data are pooled, but within each island, there is no correlation between the two variables.
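
The last situation (pooled correlation, none within either island) is easy to construct numerically. The points below are invented purely for illustration and are not data from the course; `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented points: x = log area searched, y = log # species.
# Each island is a square cluster, so there is no trend within an island.
area_a, species_a = [1, 1, 3, 3], [1, 3, 1, 3]
area_b, species_b = [6, 6, 8, 8], [6, 8, 6, 8]

r_a = pearson_r(area_a, species_a)
r_b = pearson_r(area_b, species_b)
r_pooled = pearson_r(area_a + area_b, species_a + species_b)

print(r_a, r_b, round(r_pooled, 3))
```

On a scatterplot these would appear as two separate clouds, with Island B up and to the right of Island A; the pooled correlation comes entirely from the gap between the clouds.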

  8. Give an example from your field (state field) in which a multiple regression model would be appropriate. Identify all variables; state how the sample is selected.
  9. Give an example from your field in which ANCOVA would be appropriate. Identify all variables; state how the sample is selected. Do not use the same example as in the preceding problem, although one could be a special case of the other.
  10. Give an example from your field of data that are usually transformed before being statistically analyzed, and state why the transformation is used.
  11. In an article which said 53% of Americans were overweight, a formula for BMI (Body Mass Index) was given. In the metric system, the formula is BMI = weight/(height)^2. The website for the article said that "BMI correlates with body fat." Using this information, speculate how multiple regression was used on a sample of people to come up with an equation. (Body fat can be measured; BMI is just a variable constructed from weight and height.)

log BMI = log weight - 2 log height. Probably researchers measured body fat on a number of subjects, along with their weight and height. On performing a regression of log body fat on log weight and log height, they found that the coefficients of log weight and log height were proportional to 1 and -2, respectively, so that the fitted equation depends on weight and height only through log BMI. This would produce the stated result.
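
The log identity behind this answer, and the reason coefficients proportional to 1 and -2 collapse into a function of BMI, can be verified with a short sketch. The weight, height, and the common factor `c` are hypothetical values chosen only to exercise the algebra.

```python
import math

weight_kg, height_m = 70.0, 1.75   # hypothetical subject
bmi = weight_kg / height_m ** 2

# log BMI = log weight - 2 log height
lhs = math.log(bmi)
rhs = math.log(weight_kg) - 2 * math.log(height_m)
print(math.isclose(lhs, rhs))

# If the fitted coefficients are c and -2c (c is a made-up value here),
# the two-variable predictor equals c * log BMI.
c = 1.5
two_variable = c * math.log(weight_kg) + (-2 * c) * math.log(height_m)
one_variable = c * math.log(bmi)
print(math.isclose(two_variable, one_variable))
```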