FINAL TEST STA4222/5225 SPRING, 2000 D.A. MEETER
Due: April 25, 9 A.M. I have not given or received aid from other humans on this exam.
Questions in bold are extra credit for STA 4222 Name:_____________________________
The state of Florida wants to survey the energy use of all dwelling units (du). There are 5,000,000 du's served by 50 utilities. The per-utility numbers of du's are (in thousands) 10, 12, 12.5, 13, ... , 62, 67, 78, 80, 400, 450, 900, 1,300. A sample of n = 1000 is desired. Consider plans A-F:
A. Take every kth entry, starting with a random number r. Systematic selection
from the list; each page has 80 entries. Select four entries at random from each page. Two-stage sampling, equal clusters
F. Select 10 utilities at random and select a constant fraction (f) from each utility. Two-stage sampling, unequal clusters
I. TYPES OF PLANS
For each plan above, give the number of the best description.
1. srs
2. systematic selection
3. quota
4. stratified
5. proportionate stratified
6. post-stratified
7. optimally stratified
8. cluster
9. two-stage sampling, equal clusters
10. two-stage sampling, unequal clusters
11. stratified clustered
12. stratified clustered with subsampling
II. PLAN DETAILS
1. Which plan would be expected to have a larger variance, B or C? Why? C; there is more loss of information due to association among elements on the same page (usually from the same utility).
could be useful in this case. The sample size will vary widely, depending on which utilities you select. 1. Combine and/or split utilities. 2. Use pps sampling of clusters.
3. In Plan C, describe population, sampling unit, element, frame.
Pop: all d.u.s served by any of the electric utilities in FL. Sampling unit: A page of the combined list. Element: A single d.u. on that list. Frame: The combined list of all d.u.s from all utilities.
4. In Plan A, calculate k and state the possible values of r.
k = 5,000,000/1,000 = 5,000. r = 1,2, …, 5,000
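The calculation above can be checked with a short sketch. The function name systematic_sample and the fixed seed are illustrative assumptions, not part of the exam; the code assumes N is an exact multiple of n, as it is here.

```python
import random

def systematic_sample(N, n, rng=None):
    """Return (k, r, positions) for a 1-in-k systematic sample from N list entries.

    Hypothetical helper for illustration; assumes N is a multiple of n.
    """
    rng = rng or random.Random()
    k = N // n                      # sampling interval
    r = rng.randint(1, k)           # random start, 1..k inclusive
    positions = range(r, N + 1, k)  # r, r + k, r + 2k, ...
    return k, r, positions

# Plan A: 5,000,000 dwelling units, sample of 1,000 (seed chosen arbitrarily)
k, r, positions = systematic_sample(5_000_000, 1_000, random.Random(0))
print(k, r, len(positions))
```

Whatever start r is drawn, the selected positions stay within the list and number exactly 1,000.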
5. The standard deviation of the estimator in Plan F is 2.5 times larger than that of the srs. How many observations need to be taken to get the same precision as an srs of 1000? Variance is proportional to 1/n. The variance is 2.5² = 6.25 times larger, so you need 6.25 × 1,000 = 6,250 observations (ignoring the fpc).
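A minimal sketch of the same arithmetic, assuming (as the answer does) that variance scales as 1/n and that the fpc is ignored. The function name required_n is a hypothetical label for this illustration.

```python
def required_n(n_srs, sd_ratio):
    """Sample size needed to match the precision of an srs of size n_srs.

    sd_ratio is the ratio of the plan's standard deviation to that of the srs
    at the same n. Assumes variance is proportional to 1/n and ignores the fpc.
    """
    variance_ratio = sd_ratio ** 2   # variance ratio is the square of the sd ratio
    return variance_ratio, variance_ratio * n_srs

variance_ratio, n_needed = required_n(1000, 2.5)
print(variance_ratio, n_needed)
```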
III. SYSTEMATIC SELECTION
In each case, state what problems (if any) might be caused by using systematic selection on the frame. DO NOT question the FRAME.
1. Republican Women's Club. The membership roster is ordered by year of joining the club. Questions involve opinions on issues. Since people's opinions may trend as they age, and since the members who joined early will, on average, be older, there may be a trend in the data.
3. Boy Scouts of Leon County. The list is sorted by troop; each troop is about the same size, and within each troop, the members are grouped by rank. Generally, older scouts have higher rank. Questions involve satisfaction with scouting. Older, higher-ranked scouts are more likely to be satisfied (they stuck with it); nearly equal troop sizes could mean that there is a "sawtooth" cycle in the data, which is a problem if the sampling interval is at or near a multiple of the troop size.
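The sawtooth concern can be illustrated with a toy list. Everything here is hypothetical: 21 troops of exactly 20 members, with a satisfaction score that simply equals a member's position within the troop. It is a sketch of the periodicity effect, not a model of real data.

```python
def systematic_means(values, k):
    """Mean of each of the k possible 1-in-k systematic samples (starts 0..k-1)."""
    return [sum(values[r::k]) / len(values[r::k]) for r in range(k)]

TROOP_SIZE = 20   # hypothetical, identical for every troop
N_TROOPS = 21     # hypothetical

# Score rises with position (rank) inside each troop, then resets: a sawtooth.
scores = [pos for _ in range(N_TROOPS) for pos in range(TROOP_SIZE)]

means_aligned = systematic_means(scores, 20)  # interval equals the troop size
means_offset = systematic_means(scores, 7)    # interval shares no factor with 20

spread_aligned = max(means_aligned) - min(means_aligned)
spread_offset = max(means_offset) - min(means_offset)
print(spread_aligned, spread_offset)
```

With the interval equal to the troop size, every sampled scout sits at the same rank position, so the sample mean depends entirely on the random start. With an interval of 7, each sample covers all positions equally in this toy list and the sample means do not vary at all.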
IV. CLUSTER SAMPLING
In each case, what concerns (if any) could be caused by cluster sampling? DO NOT question the FRAME.
1. Boy Scouts again. Each cluster is a troop. The questions involve satisfaction with scouting. We might expect that satisfaction would be similar among members of the same troop; loss of info.
3. Department of Insurance rate increase filings. A cluster is all filings received in a week. Y = time to process a filing. There are probably filing deadlines, which will cause many filings to arrive in the same week; the Y values for those filings will then tend to have a high mean.
In each case, what problems (if any) could be caused by these forms of estimation?
2. Y = votes cast in a precinct, X = number of voters registered there. Regression. The two variables should have a good correlation, so that ratio or regression estimation should be useful.
3. Y = 1999 incidents of school violence/state, X = 1998 data. What form of estimation? Why?
Difference estimation is reasonable if we expect the 1999 data to approximate the 1998 data, plus a constant. Ratio estimation might be better if the large size differences across states (AK vs. CA, e.g.) produced greater differences in the states with large 1998 figures (quite likely to happen). (This would not be so likely if rates rather than totals were used.)
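The contrast between the two estimators can be sketched on a tiny made-up population. The x values, the two y patterns, and the choice of sampled units are all hypothetical and chosen only to show when each estimator of the population total of Y fits; they are not school violence data.

```python
def ratio_total(x_s, y_s, x_total):
    """Ratio estimator of the Y total: known X total times (sample y sum / sample x sum)."""
    return x_total * sum(y_s) / sum(x_s)

def difference_total(x_s, y_s, x_total, N):
    """Difference estimator of the Y total: known X total plus N times (ybar - xbar)."""
    n = len(x_s)
    return x_total + N * (sum(y_s) / n - sum(x_s) / n)

# Hypothetical population of 5 units with very unequal sizes
x = [10, 20, 50, 200, 1000]
N, x_total = len(x), sum(x)
sample = [0, 3]                        # hypothetical sampled units
x_s = [x[i] for i in sample]

# Case 1: change proportional to size (y = 1.5 * x)
y_prop = [1.5 * v for v in x]
ratio_prop = ratio_total(x_s, [y_prop[i] for i in sample], x_total)
diff_prop = difference_total(x_s, [y_prop[i] for i in sample], x_total, N)

# Case 2: change is a constant shift (y = x + 5)
y_add = [v + 5 for v in x]
ratio_add = ratio_total(x_s, [y_add[i] for i in sample], x_total)
diff_add = difference_total(x_s, [y_add[i] for i in sample], x_total, N)

print(sum(y_prop), ratio_prop, diff_prop)
print(sum(y_add), ratio_add, diff_add)
```

In this toy setup the ratio estimator reproduces the true total when the change is proportional to size, and the difference estimator reproduces it when the change is a constant shift; each misses in the other case.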