FINAL TEST STA4222/5225 SPRING, 2000 D.A. MEETER

Due: April 25, 9 A.M. I have not given or received aid from other humans on this exam.

Questions in bold are extra credit for STA 4222 Name:_____________________________

The state of Florida wants to survey the energy use of all dwelling units (du). There are 5,000,000 du's served by 50 utilities. The per-utility numbers of du's are (in thousands) 10, 12, 12.5, 13, ... , 62, 67, 78, 80, 400, 450, 900, 1,300. A sample of n = 1000 is desired. Consider plans A-F:

A. Take every kth entry, starting with a random number r. Systematic selection

B. Obtain a list of all du's from each electric utility; combine these lists. Select an srs of pages from the list; each page has 80 entries. Select four entries at random from each page. Two-stage sampling, equal clusters

C. Obtain a list of all du's from each electric utility; combine these lists. Select an srs of pages from the list; take all entries on the page; each page has 80 entries. Cluster

D. Take a separate srs in each utility; choose different n's in each utility based on cost and variance within the utility. "Optimally" stratified

E. Take a separate srs in every utility, not all with the same f. Stratified

F. Select 10 utilities at random and select a constant fraction (f) from each utility. Two-stage sampling, unequal clusters
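Plan D's "optimal" allocation can be sketched with the standard cost-adjusted optimum allocation, where n_h is proportional to N_h * S_h / sqrt(c_h) and the n_h are scaled to sum to n = 1000. This is a minimal sketch only: the three strata (the two largest utilities and all remaining du's combined) are one hypothetical grouping, and the standard deviations S_h and per-unit costs c_h are made-up values for illustration.

```python
import math

def optimum_allocation(n, sizes, sds, costs):
    """Split a total sample of n across strata with n_h proportional to
    N_h * S_h / sqrt(c_h) (cost-adjusted optimum allocation)."""
    weights = [N * S / math.sqrt(c) for N, S, c in zip(sizes, sds, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: the two largest utilities and all remaining du's.
sizes = [1_300_000, 900_000, 2_800_000]  # N_h, du's per stratum (sums to 5,000,000)
sds = [1.0, 1.0, 1.5]                    # S_h, hypothetical within-stratum SDs
costs = [1.0, 1.0, 4.0]                  # c_h, hypothetical cost per sampled du

alloc = optimum_allocation(1000, sizes, sds, costs)
print([round(a, 1) for a in alloc])
```

More variable strata receive more of the sample and costlier strata receive less; rounding the n_h to whole numbers is left out of the sketch.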

I. TYPES OF PLANS

For each plan above, give the number of the best description.

1. srs

2. systematic selection

3. quota

4. stratified

5. proportionate stratified

6. post-stratified

7. optimally stratified

8. cluster

9. two-stage sampling, equal clusters

10. two-stage sampling, unequal clusters

11. stratified clustered

12. stratified clustered with subsampling

II. PLAN DETAILS

1. Which plan would be expected to have a larger variance, B or C? Why? C; there is more loss of information due to association among elements on the same page (usually from the same utility), and C takes all 80 of them instead of four.
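The comparison can be made concrete with Kish's approximate design effect for equal-sized clusters, deff ≈ 1 + (m - 1) * rho, where m is the number of elements taken per page and rho is the intraclass correlation among entries on the same page. A minimal sketch, assuming a hypothetical rho = 0.05 and ignoring the fpc:

```python
def design_effect(m, rho):
    """Approximate variance inflation over an srs of the same size
    when m elements are taken per cluster: 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho

rho = 0.05  # hypothetical intraclass correlation among entries on a page
for plan, m in [("B: 4 entries per page", 4), ("C: all 80 entries per page", 80)]:
    print(f"{plan}: deff = {design_effect(m, rho):.2f}")
```

Even a small rho inflates the variance of plan C far more than that of plan B, because the (m - 1) multiplier is 79 rather than 3.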

2. What is unsatisfactory about plan F? State two methods of dealing with this problem which could be useful in this case. The sample size will vary widely, depending on which utilities you select (they range from 10,000 to 1,300,000 du's). 1. Combine and/or split utilities. 2. Use pps sampling of clusters.
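The swing in sample size can be illustrated with the utility sizes the problem lists explicitly (the values elided by "..." are not filled in). Choosing f so that the expected sample is 1,000 gives f = 0.001, since 10 of 50 utilities cover on average (10/50) * 5,000,000 = 1,000,000 du's. The sketch compares the contributions of the four smallest and four largest listed utilities, and shows a pps draw of clusters with replacement via Python's `random.choices`. Taking the same number of du's from each pps-selected utility then fixes the total sample size. The seed is arbitrary.

```python
import random

# Utility sizes (thousands of du's) that the problem lists explicitly;
# the values elided by "..." in the problem are not filled in here.
KNOWN_SIZES = [10, 12, 12.5, 13, 62, 67, 78, 80, 400, 450, 900, 1300]

F = 0.001  # constant fraction: (10/50) * 5,000,000 * F = 1,000 expected du's

def plan_f_sample_size(selected_sizes, f=F):
    """Number of du's sampled when fraction f is taken from each selected utility."""
    return round(sum(size * 1000 * f for size in selected_sizes), 1)

def pps_draw(sizes, k, rng):
    """Draw k utility indices with probability proportional to size, with replacement."""
    return rng.choices(range(len(sizes)), weights=sizes, k=k)

small = plan_f_sample_size(KNOWN_SIZES[:4])   # four smallest listed utilities
large = plan_f_sample_size(KNOWN_SIZES[-4:])  # four largest listed utilities
print(small, large)
print(pps_draw(KNOWN_SIZES, k=4, rng=random.Random(2000)))
```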

3. In Plan C, describe population, sampling unit, element, frame.

Pop: all d.u.s served by any of the electric utilities in FL. Sampling unit: A page of the combined list. Element: A single d.u. on that list. Frame: The combined list of all d.u.s from all utilities.

4. In Plan A, calculate k and state the possible values of r.

k = 5,000,000/1,000 = 5,000. r = 1,2, …, 5,000
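A minimal sketch of Plan A, assuming the combined list is numbered 1 through 5,000,000. Because N/n is an exact integer here, every random start r gives exactly 1,000 entries. The seed is arbitrary.

```python
import random

def systematic_sample(N, n, rng):
    """Return (k, r, positions) for a 1-in-k systematic sample from a list of N."""
    k = N // n                 # sampling interval
    r = rng.randint(1, k)      # random start in 1..k (randint is inclusive)
    positions = [r + j * k for j in range(n)]
    return k, r, positions

k, r, positions = systematic_sample(5_000_000, 1000, random.Random(4222))
print(k, r, positions[:3], positions[-1])
```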

5. The standard deviation of the estimator in Plan F is 2.5 times larger than that of the srs. How many observations need to be taken to get the same precision as an srs of 1000? Variance is proportional to 1/n, and the variance is 2.5² = 6.25 times larger, so you need 6.25 × 1,000 = 6,250 observations (ignoring the fpc).
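The same arithmetic as a small helper, under the stated assumption that variance is proportional to 1/n with no fpc:

```python
def required_n(n_srs, sd_ratio):
    """Sample size giving the same variance as an srs of n_srs when the design's
    standard deviation is sd_ratio times the srs value (variance ~ 1/n, no fpc)."""
    deff = sd_ratio ** 2  # ratio of variances
    return n_srs * deff

print(required_n(1000, 2.5))  # 6250.0
```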

III. SYSTEMATIC SELECTION

In each case, state what problems (if any) might be caused by using systematic selection on the frame. DO NOT question the FRAME.

1. Republican Women's Club. The membership roster is ordered by year of joining the club. Questions involve opinions on issues. People's opinions may trend as they age, and members who joined early will, on average, be older, so there may be a trend in the data along the list.

2. Big Bend Sierra Club. The list is alphabetical by last name. Questions involve priorities on ecological issues. The initial letter of the family name should not be related to priorities, so no problem is expected.

3. Boy Scouts of Leon County. The list is sorted by troop; each troop is about the same size, and within each troop, the members are grouped by rank. Generally, older scouts have higher rank. Questions involve satisfaction with scouting. Older, higher-ranked scouts are more likely to be satisfied (they stuck with it); nearly equal troop sizes could mean that there is a "sawtooth" cycle in the data. If k is close to the troop size (or a multiple of it), the sample could contain scouts of nearly the same rank from every troop.
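The sawtooth problem can be sketched with a hypothetical roster of 30 troops of exactly 20 scouts, each troop listed lowest rank first (rank position 0 = lowest). When the interval k equals the troop size, every selected scout has the same rank position; an interval unrelated to the troop size spreads the sample across ranks.

```python
def build_roster(n_troops, troop_size):
    """Hypothetical frame: troops in order, each listed by rank (position 0 = lowest)."""
    return [(troop, rank) for troop in range(n_troops) for rank in range(troop_size)]

def systematic(frame, k, r):
    """Every kth entry starting at position r (1-based)."""
    return frame[r - 1::k]

roster = build_roster(n_troops=30, troop_size=20)
in_step = systematic(roster, k=20, r=5)   # k equals the troop size
off_step = systematic(roster, k=23, r=5)  # k unrelated to the troop size
print({rank for _, rank in in_step})      # {4}: the same rank in every troop
print(sorted({rank for _, rank in off_step}))
```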

IV. CLUSTER SAMPLING

In each case, what concerns (if any) could be caused by cluster sampling. DO NOT question the FRAME.

1. Boy Scouts again. Each cluster is a troop. The questions involve satisfaction with scouting. We might expect that satisfaction would be similar among members of the same troop; loss of info.

2. Businesswomen of Florida. The list is alphabetical. A cluster is all the names on one page. Questions on tax policy. Women on the same page have family names that begin with the same letter, but that should not make their opinions correlated, so little information is lost by clustering.

3. Department of Insurance rate increase filings. A cluster is all filings received in a week. Y = time to process a filing. There are probably deadlines for the filings which will cause many of them to be filed in the same week; this will cause the Y values for filings in those weeks to have a high mean, so weeks will differ sharply and an estimate based on a few weeks may be quite variable.

V. RATIO, REGRESSION, AND DIFFERENCE ESTIMATION

In each case, what problems (if any) could be caused by these forms of estimation.

1. FSU undergraduates. Y = GPA, X = SAT score. Ratio estimation. Seems reasonable; there is a correlation between the two variables (though it is not strong past the freshman year).

2. Y = votes cast in a precinct, X = number of voters registered there. Regression. The two variables should have a good correlation, so that ratio or regression estimation should be useful.

3. Y = 1999 incidents of school violence/state, X = 1998 data. What form of estimation? Why?

Difference estimation is reasonable if we expect the 1999 data to approximate the 1998 data plus a constant. Ratio estimation might be better if the large size differences across states (e.g., AK vs. CA) produced greater differences in the states with large 1998 figures (quite likely to happen). (This would be less likely if rates rather than totals were used.)
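The three estimators differ only in the slope applied to the gap between the known population mean of X and the sample mean of X: the ratio estimator uses ybar/xbar (a line through the origin), the regression estimator uses the least-squares slope b, and the difference estimator fixes the slope at 1. A minimal sketch; the four-state sample and the population mean of 28 are hypothetical numbers, not data from the problem.

```python
from statistics import mean

def ratio_estimate(y, x, x_bar_pop):
    """Ratio estimator of the population mean of y: (ybar / xbar) * Xbar."""
    return mean(y) / mean(x) * x_bar_pop

def regression_estimate(y, x, x_bar_pop):
    """Linear regression estimator: ybar + b * (Xbar - xbar), b = least-squares slope."""
    xb, yb = mean(x), mean(y)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sxx = sum((xi - xb) ** 2 for xi in x)
    return yb + (sxy / sxx) * (x_bar_pop - xb)

def difference_estimate(y, x, x_bar_pop):
    """Difference estimator: ybar + (Xbar - xbar), i.e. slope fixed at 1."""
    return mean(y) + (x_bar_pop - mean(x))

# Hypothetical sample of four states: x = 1998 incidents, y = 1999 incidents.
x = [10, 20, 30, 40]
y = [12, 19, 33, 41]
X_BAR = 28  # hypothetical known 1998 mean over all states

for name, est in [("ratio", ratio_estimate), ("regression", regression_estimate),
                  ("difference", difference_estimate)]:
    print(name, round(est(y, x, X_BAR), 2))
```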

VI. Questionnaires involving opinions about musical events are to be given to attendees at events with ticket charges in the FSU School of Music; n = 1000. The events, with maximum nightly capacity in parentheses, are eight opera performances (1,200), four orchestra concerts (1,200), six chamber concerts (500), and 10 recitals (200). Describe the sampling plan, population, frame, element, and sampling unit. Justify your plan in terms of cost/variance, ease of execution by music school employees, and good response rate.

Stratify according to the four types of events, since opinions may vary across types. Use the hall capacity (plus music school experience) to guess population sizes. Cluster within types, which means sample n1 operas, n2 orchestra concerts, etc. Within a performance, use a simple form of systematic sampling that a person standing behind the ticket takers should be able to implement. Pretest the questionnaire; have pencils available, and put clearly labeled boxes at each exit to receive the questionnaires. If possible, have an announcement made, or have something put in the program about the questionnaire.
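One way to turn the capacity-based population guesses into stratum sample sizes is proportional allocation, sketched below. The assumptions are loud: every performance is taken to sell out (so the stratum sizes are upper bounds), and proportional allocation with largest-remainder rounding is a choice made here for illustration, not part of the answer above.

```python
# Event type: (number of events, maximum nightly capacity), from the problem.
EVENTS = {
    "opera": (8, 1200),
    "orchestra": (4, 1200),
    "chamber": (6, 500),
    "recital": (10, 200),
}

def proportional_allocation(n, sizes):
    """Allocate n in proportion to stratum sizes, rounding by largest remainder."""
    total = sum(sizes.values())
    exact = {h: n * N / total for h, N in sizes.items()}
    alloc = {h: int(v) for h, v in exact.items()}  # round down first
    shortfall = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest fractional parts.
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc

# Assumes full houses: attendees per stratum = events * capacity.
sizes = {h: events * cap for h, (events, cap) in EVENTS.items()}
result = proportional_allocation(1000, sizes)
print(result)
```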
VII. Suppose you wanted to estimate the number of alligators at Wakulla Springs, using an economical technique. Would you use: a) a mailed questionnaire, b) a telephone questionnaire, c) direct sampling, d) indirect sampling, or e) sampling of areas, and why?

a) and b): obviously not. c) would be good if you could arrange for a random sample after you tag, and if you know enough to keep s (the number of tagged alligators found in the later sample) from getting close to zero. d) is similar to c) in the first part, but here you need to be prepared for variability in the time you spend finding enough of the tagged alligators. e) might be the best method if you could easily create a grid of the swampy areas of the park (would the Global Positioning System [GPS] help?). Since alligators tend to stick to certain areas (I think), c) and d) also require some way of randomly searching the area: e) requires random sampling of areas, whereas c) and d) require random sampling of gators.
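For direct sampling (option c), the usual Lincoln-Petersen estimate of the population size is N-hat = t * n / s, where t (a symbol introduced here) is the number tagged first, n is the size of the later sample, and s is the number of tagged animals in it. The estimate is undefined at s = 0 and unstable when s is small, which is why s must not get close to zero. The sketch uses hypothetical counts and also shows Chapman's modified estimator, a common small-s adjustment that is not part of the answer above.

```python
def lincoln_petersen(tagged, second_sample, recaptured):
    """N-hat = t * n / s: t tagged first, n in the later sample, s of them tagged."""
    if recaptured == 0:
        raise ValueError("no tagged animals recaptured; estimate is undefined")
    return tagged * second_sample / recaptured

def chapman(tagged, second_sample, recaptured):
    """Chapman's modification, (t + 1)(n + 1)/(s + 1) - 1, defined even when s = 0."""
    return (tagged + 1) * (second_sample + 1) / (recaptured + 1) - 1

# Hypothetical counts: 50 alligators tagged, 40 seen later, 10 of them tagged.
print(lincoln_petersen(50, 40, 10))  # 200.0
print(round(chapman(50, 40, 10), 1))
```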