United States General Accounting Office
GAO
Report to the Ranking Minority Member, Committee on Governmental Affairs, U.S. Senate
July 1997
2000 CENSUS
Progress Made on Design, but Risks Remain
GAO/GGD-97-142

United States General Accounting Office
Washington, D.C. 20548
General Government Division

B-276531
July 14, 1997

The Honorable John Glenn
Ranking Minority Member
Committee on Governmental Affairs
United States Senate

Dear Senator Glenn:

This letter responds to your request that we update the information provided in our October 25, 1995, testimony on the Census Bureau’s plans for the 2000 Decennial Census.1 In that testimony, we summarized the work we did in reviewing the results of the 1990 Census, which was the most costly in history and which produced data that were less accurate than those from the 1980 Census, leaving millions of Americans—especially members of minority groups—uncounted. A key reason for the increased cost and decreased accuracy was a sharp decline in the proportion of households that returned questionnaires by mail, causing the Bureau to spend hundreds of millions of dollars to send Bureau employees, known as enumerators, to try to collect census information directly from individual citizens. We and others concluded that the established approach used for taking the census in 1990 had exhausted its potential for counting the population cost-effectively and that fundamental design changes were needed to reduce census costs and improve the quality of the data collected.

In that testimony, we also detailed how several of the initiatives that the Census Bureau was planning for the 2000 Decennial Census were consistent with suggestions that we had made since the 1990 Census.
These initiatives included (1) simplified and streamlined census questionnaires, (2) multiple mail contacts to prompt a response, (3) increased use of the Postal Service to improve the master address file for the census and identify vacant and nonexistent housing units, and (4) the use of statistical sampling and estimation procedures aimed at reducing the cost and increasing the accuracy of the census. However, in that testimony, we also raised concerns that the further the Census Bureau proceeded with design plans for conducting the 2000 Census without input from Congress, the less Congress would be able to affect the census design without significant risk of wasted expenditures and unacceptable results. In the intervening months, the administration has been unable to come to agreement with Congress on critical design and funding decisions. In February 1997, we designated the 2000 Decennial Census a new high-risk area because of the possibility that delays could jeopardize an effective census and increase the likelihood that billions of dollars could be spent and the nation still be left with demonstrably inaccurate census results.2 The objectives of this report are to (1) provide information on the progress that the Bureau has made on its plans concerning the initiatives discussed in our October 1995 testimony and on other initiatives that the Bureau has promoted since the testimony and (2) assess whether the Bureau has demonstrated the feasibility of its plans for carrying out the 2000 Decennial Census.

1 Decennial Census: Fundamental Design Decisions Merit Congressional Attention (GAO/T-GGD-96-37, Oct. 25, 1995).

Results in Brief

Since our October 1995 testimony, the Census Bureau has continued with the planning of the new design initiatives that we and others have suggested, which are aimed at increasing the mail response rate.
This is important because a higher mail response rate will reduce the need for follow-up visits by census enumerators, the most costly, difficult to manage, and error-prone operation in the census. These initiatives include such changes as redesigning the questionnaires used to collect information from the public to make them shorter, simpler, and more user friendly, as well as contacting people by mail more than one time to encourage responses. The Bureau believes that its initiatives in this area will produce a mail response rate of about 67 percent from the nation’s housing units, which would be about 2 percentage points higher than the response rate achieved in 1990 and about 12 percentage points higher than the 55-percent rate the Bureau would expect to achieve without the initiatives.

2 The High-Risk Series (GAO/HR-97-2, Feb. 1997) is a special effort to review and report on the federal program areas we have identified as high risk because of their vulnerability to waste, fraud, abuse, or mismanagement.

The Bureau also has continued to develop its plan for dealing with those who do not respond by mail and for checking the quality of the results it gets from mail responses and from visits by enumerators. Its current plan, which was put forth in March 1997, is to statistically sample those who do not respond to its mail survey. It plans to do this by directly sampling nonrespondents in every census tract in the country (a census tract is a small geographic area with an average population of about 4,000 people) until it has information on 90 percent of the housing units in each tract. It will then use statistical methods to estimate the data for the remaining 10 percent of the housing units by projecting the information it obtains from its nonresponse follow-up sample.
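As a rough illustration of what the 90-percent target implies, the following Python sketch computes the share of a tract’s nonresponding housing units that would have to be visited for mail returns plus follow-up to reach 90 percent of all units. The function name and the example response rates are hypothetical, and the sketch assumes that every sampled nonrespondent is successfully enumerated; it shows only the arithmetic implied by the target, not the Bureau’s actual sampling procedure.

```python
def followup_sampling_fraction(mail_response_rate, target_coverage=0.90):
    """Return the fraction of nonresponding units to sample in a tract.

    Illustrative arithmetic only (an assumption, not the Bureau's method):
    mail returns plus sampled follow-up must together cover
    `target_coverage` of all housing units in the tract.
    """
    if not 0.0 <= mail_response_rate < 1.0:
        raise ValueError("mail_response_rate must be in [0, 1)")
    if mail_response_rate >= target_coverage:
        return 0.0  # mail returns alone already meet the target
    shortfall = target_coverage - mail_response_rate
    nonrespondents = 1.0 - mail_response_rate
    return shortfall / nonrespondents


if __name__ == "__main__":
    # Hypothetical tract-level mail response rates.
    for rate in (0.55, 0.67, 0.80):
        fraction = followup_sampling_fraction(rate)
        print(f"mail response {rate:.0%}: sample {fraction:.1%} of nonrespondents")
```

Under these assumptions, a tract with a 67-percent mail response rate would need follow-up on roughly 70 percent of its nonrespondents, and tracts with lower mail response would need a larger share.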
The Bureau’s current plan then calls for staff to be sent to gather another sample of 750,000 housing units and for these independently collected data to be compared with the information obtained from the preceding data collection efforts, such as mail-backs and nonresponse follow-up. Again using statistical methods, the Bureau plans to use the results of its 750,000-household quality check to complete its final population totals. The Bureau believes that this approach offers the best combination of reduced costs, improved accuracy expected at various geographic levels, and operational feasibility. According to the Bureau’s cost estimates, using this plan would save between $700 and $800 million off the cost of using a conventional census plan that incorporated all of the new initiatives proposed for the 2000 Census except those involving sampling and statistical estimation. The Bureau developed accuracy estimates by simulating what the results of the census would likely be under various design options. We reviewed the Bureau’s simulations and identified several shortcomings in the methods and assumptions it used for developing and presenting the accuracy estimates. After the Bureau made several modifications, we concluded that, if the Bureau’s methods and assumptions were properly applied, the final data produced in June 1997 should be generally reasonable for use in projecting the likely effects of the Bureau’s proposed sampling and statistical estimation initiatives. We recognize that because the actual census environment and methods in 2000 may vary from those that were simulated and tested, the Bureau may not actually achieve these results for the 2000 Census. However, because these types of data are the best available to reflect the probable effects of an actual census, we believe it is reasonable to use them to attempt to project the possible effects of various Bureau design options on the 2000 Census. 
The Bureau’s final data showed that, under the Bureau’s March 1997 plan for the 2000 Census, the relative error in census data (i.e., the measured error in terms of a percentage of the area’s population) would likely be about 0.1 percent for the national total and an average of 0.5 percent for states, 0.6 percent for congressional districts, and 1.1 percent for census tracts. The Bureau’s simulations projected that a design that did not use sampling for nonresponse follow-up and a quality check would likely result in relative error rates of about 1.9 percent for the national total and average error rates of 1.9 percent for states, congressional districts, and census tracts.

Because of concerns about the potential effects of sampling at the local level, we requested that the Bureau provide more detailed data on error rates at the census tract level. The Bureau’s simulations showed that the March sampling design plan would likely produce more accurate population estimates than the conventional design in two-thirds of the census tracts. However, in the remaining one-third of the tracts, it would likely produce less accurate estimates than the conventional design.

As of July 1, 1997, the Bureau had not shared the detailed results of its analysis with Congress, nor had it yet fully demonstrated the operational feasibility of its current plan to Congress. Although the Bureau has revised its plan for the 2000 Census several times since the October 1995 testimony, it provided only limited details to demonstrate and allow those outside the Bureau to check and scrutinize the relative merits of the various alternatives. Citing, in part, the lack of sufficient data on the effects of the Bureau’s proposed use of sampling and statistical estimation methods, some Members of Congress have expressed concern about the Bureau’s plan.
They also questioned the use of sampling and statistical estimation on constitutional and statutory grounds. While a draft of this report was with the Bureau for comment, Congress enacted legislation (Public Law 105-18) requiring the Department of Commerce to provide detailed data about the Bureau’s plan by July 12, 1997. It is unlikely that Congress and the administration will come to an agreement on the design and its associated funding level for the 2000 Census without continuous, full, and open disclosure of the effects of the Bureau’s plan at all levels of geography. Thus, the Bureau needs to continue to keep Congress informed of any changes in its approach to the census or refinements to its data that it makes after July 12, 1997.

The Bureau is planning a dress rehearsal for the 2000 Census in 1998 to demonstrate and test its design features. It is important for Congress and the administration to reach agreement on the design as soon as possible before the dress rehearsal so that (1) the Bureau can test what it plans to implement in 2000, (2) Congress and the Bureau can discuss the operational feasibility of the plan in terms of the dress rehearsal results, and (3) Congress and the Bureau can determine whether the dress rehearsal outcomes are sufficiently similar to the results of the Bureau’s research and simulations to proceed with that design for the census. Time and resources could be wasted if the Bureau tests a plan in 1998 that Congress later finds it cannot accept.

Background

The decennial census is the nation’s most comprehensive and expensive statistical data-gathering program. The Constitution requires a decennial census of the population in order to reapportion seats in the House of Representatives. Public and private decisionmakers also use census data on population counts and social and economic characteristics for a variety of purposes.
State and local redistricting; allocations of government funding; and many planning and evaluation activities, such as site selection for new schools, market research, and evaluations of local labor markets, rely on decennial census data. In addition, the census is the only national source of detailed population statistics for small geographic areas, such as towns or school districts, and for population groups, such as Native Americans. The Bureau has used short- and long-form questionnaires to carry out the decennial census. Most households are sent a short form to complete; however, some are asked to complete the long-form questionnaire. In 1990, for example, about one in six households was required to complete a long-form questionnaire. Many federal agencies use information collected through the decennial census long-form questionnaire as a source of data for their own statistical and programmatic activities. Since 1970, the Bureau has used essentially the same methodology to count the vast majority of the population during the decennial census. It develops an address list of the nation’s housing units and mails census forms to those housing units that ask the occupants to mail back the completed forms. The Bureau then hires temporary census-takers, known as enumerators, by the hundreds of thousands to gather the requested information for each nonresponding housing unit. A critical factor affecting the cost of a census is the necessity for the Bureau to follow up on nonresponding housing units. A declining response rate to the census questionnaires has increased the Bureau’s costly nonresponse workload. In the 1980 Census, the mail response rate was 75 percent, 3 percentage points lower than it was in the 1970 Census. In the 1990 Census, the response rate dropped to 65 percent, 10 percentage points lower than it was in 1980. 
According to Bureau officials, if the downward trend in public cooperation continues without changes to the Bureau’s methods for soliciting responses, the mail response rate could be as low as 55 percent in 2000 and generate a potential nonresponse workload of about 53 million cases, a substantial increase over the 1990 nonresponse workload of about 34 million cases.

Since 1970, census costs have been increasing faster than inflation, even after allowing for population growth. In 1990 constant dollars, total census cycle costs were $0.7 billion in 1970, $1.8 billion in 1980, and $2.6 billion in 1990.3 Furthermore, the cost per housing unit jumped from $11 in 1970, to $20 in 1980, and to $25 in 1990. The Bureau estimated that, if census-taking methods were not changed, the 2000 Census could cost almost $5 billion (in 2000 dollars).

3 Constant-dollar value is measured in terms of prices for a base period, to remove the influence of inflation. The resulting constant-dollar value is the value that would exist if prices had remained the same as in the base period.

Unfortunately, the nation’s growing investment in the census has not resulted in uniformly more accurate results. Since the Bureau began evaluating census coverage in 1940, it has documented a net undercount, which is the difference between the estimated population and the census count. Figure 1 shows the net undercount for each census since 1940; note that the undercount decreased for each subsequent census until it increased for the 1990 Census.

Figure 1: The Net Undercount Since 1940
[Bar chart. Vertical axis: net millions of persons missed (0 to 10). Horizontal axis: decennial census, 1940 through 1990. Bar labels shown in the source: 7.5, 6.5, 5.7, 5.7, 4.7, and 2.8.]
Source: Bureau of the Census estimates of net undercounts based on demographic analysis and derived largely from administrative data, such as birth and death records, as of June 1991.
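The cost and workload figures quoted before Figure 1 can be put on a common footing with a short calculation. The Python sketch below is an illustrative check that uses only the numbers given in the text; the variable and function names are hypothetical, and the 2000 workload is the Bureau’s projection rather than an observed value.

```python
def percent_increase(old, new):
    """Percentage increase from `old` to `new`."""
    return (new - old) / old * 100.0


# Figures quoted in the report text (costs are in 1990 constant dollars).
total_cost_billions = {1970: 0.7, 1980: 1.8, 1990: 2.6}
cost_per_housing_unit = {1970: 11, 1980: 20, 1990: 25}
# The 2000 value is the Bureau's projection at a 55-percent response rate.
nonresponse_workload_millions = {1990: 34, 2000: 53}

cost_growth = percent_increase(total_cost_billions[1970], total_cost_billions[1990])
unit_cost_growth = percent_increase(cost_per_housing_unit[1970], cost_per_housing_unit[1990])
workload_growth = percent_increase(
    nonresponse_workload_millions[1990], nonresponse_workload_millions[2000]
)

print(f"Total cost, 1970 to 1990: +{cost_growth:.0f}%")
print(f"Cost per housing unit, 1970 to 1990: +{unit_cost_growth:.0f}%")
print(f"Nonresponse workload, 1990 to 2000 (projected): +{workload_growth:.0f}%")
```

On these figures, total constant-dollar cost grew considerably faster than cost per housing unit between 1970 and 1990, and the projected nonresponse workload for 2000 would be more than half again as large as the 1990 workload.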
The net undercount masks an even larger gross error in the census. The 1990 Post Enumeration Survey (PES) provided a greater level of detail on such errors than is possible using demographic analysis.4 While the net undercount, as measured by the PES, was about 1.6 percent of the population (about 4 million persons) in 1990, this does not mean that over 98 percent of the population was accurately counted, as is often reported. In fact, the number of persons missed in the 1990 Census was partially offset by millions of persons who were improperly included. The Bureau estimated that about 6 million persons were counted twice in the 1990 Census, while 10 million were missed. The sum of these numbers, 16 million, represents a minimum tally of gross errors because it does not include other errors, such as persons assigned to the wrong locations.

4 The 1990 PES was designed to estimate the net undercount in the census. It was a matching study in which the Bureau interviewed a sample of households several months after the census. The results of these interviews were compared with census questionnaires to determine whether each person was correctly counted in the census, missed, or included in error.

Even more troubling, the Census Bureau’s evaluations showed a persistent differential undercount of minority groups. The decennial census has not counted all population groups and areas in the United States equally well. The 4.4 percentage point difference in the 1990 net undercount between blacks (5.7 percent) and nonblacks (1.3 percent) was the highest since the Bureau began estimating coverage in the 1940 Census.
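The distinction between net and gross error, and the differential undercount, come down to simple arithmetic on the figures cited above. The following Python sketch is illustrative only; the variable names are hypothetical, and the inputs are the rounded Bureau estimates quoted in the text.

```python
# Rounded Bureau estimates for the 1990 Census, in millions of persons.
missed_millions = 10         # persons missed
counted_twice_millions = 6   # persons counted twice

# In the net figure the two kinds of error offset each other ...
net_undercount_millions = missed_millions - counted_twice_millions
# ... while in the gross figure they accumulate.
gross_error_millions = missed_millions + counted_twice_millions

# Net undercount rates (percent) from the 1990 coverage estimates.
black_rate = 5.7
nonblack_rate = 1.3
differential_points = round(black_rate - nonblack_rate, 1)

print(f"Net undercount: {net_undercount_millions} million")
print(f"Minimum gross error: {gross_error_millions} million")
print(f"Differential undercount: {differential_points} percentage points")
```

The same 10 million missed persons thus appear as a net undercount of 4 million but a gross error of at least 16 million, which is why the net figure alone overstates how accurately the population was counted.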
After the 1990 Census, we, Congress, the Department of Commerce and its Office of the Inspector General, the Bureau itself, and other stakeholders, such as advisory committees to the Bureau, all recognized the need to reassess the conventional census-taking approach to ensure the achievement of a more accurate and cost-effective census in 2000. The Bureau evaluated the results of the 1990 Census to develop new, cost-saving approaches that could improve accuracy for the next census. For example, the single most expensive component of the census was the operation to gather data on nonrespondents, which took 14 weeks, rather than the 6 weeks originally scheduled. The final 6 weeks were devoted just to resolving the last 10 percent of the nonresponse cases. Furthermore, evaluations demonstrated that the amount of error in the census increased precipitously as time and effort were extended to count the last few percentages of the population, with the Bureau ultimately accepting whatever information could be obtained using “last resort” or “closeout” procedures, such as interviews with neighbors, mail carriers, or other persons who were not residents of the nonresponse households. The National Academy of Sciences also reexamined the conventional census design and proposed alternatives for 2000. The general conclusion from these extensive efforts was that fundamental changes were needed in the census design to accurately and economically account for the U.S. population in the next census. 
Scope and Methodology

To provide information on the progress that the Bureau has made on its plans concerning the initiatives discussed in our October 1995 testimony and on other initiatives that the Bureau has proposed since the testimony and to assess the feasibility of the Bureau’s plans for carrying out the 2000 Decennial Census, we (1) reviewed Bureau research, evaluation, and planning documents and data produced since our October 1995 testimony; (2) interviewed Bureau, Department of Commerce, and Office of Management and Budget (OMB) officials; and (3) reviewed a September 1996 report of the House Committee on Government Reform and Oversight entitled Sampling and Statistical Adjustment in the Decennial Census: Fundamental Flaws, our work on prior decennial census activities, and various reports prepared by the National Academy of Sciences related to planning for the 2000 Census.

This report provides a summary of our work and its results. It includes a technical appendix (app. I) that explains in more detail the Bureau’s sampling and statistical estimation initiatives and our analysis of their effects. It also includes an appendix (app. II) presenting the Bureau’s most recent summary information and explanation of projected differences in the costs and accuracy of selected census designs.

In assessing the Bureau’s plans for its statistical sampling and estimation initiatives, we began with the data and research that the Bureau provided, mostly from simulations that the Bureau performed using 1990 Census data and results from the 1995 Census Test. Because it was impractical for us to verify the underlying census data the Bureau used, we accepted the census data in the Bureau’s analysis without further verification. However, we challenged the Bureau’s analysis in several respects, including the methods and assumptions for producing the data and the way they were presented.
In response to our questions, the Bureau made several modifications, most of which were to clarify the data and assumptions. Furthermore, to produce the final data comparing the design alternatives included in appendix II, the Bureau redid the simulations after detecting shortcomings in the draft data. To ensure the accuracy of the final data, the Bureau had two groups working independently on the simulations and reconciled differences prior to providing the final data to us.

Because the actual census environment and methods in 2000 may vary from those that were simulated and tested, the Bureau may not actually achieve these results for the 2000 Census. However, because these types of data are the best available to reflect the probable effects of an actual census, we believe it is reasonable to use them to attempt to project the possible effects of various Bureau design options on the 2000 Census.

It is important to note that the Bureau, which will continue its research efforts until the 2000 Census begins, is still refining, modifying, and testing elements of its proposed census design. For example, the Bureau was in the process of revising and verifying its cost estimates when we completed our work. Therefore, we relied on the most recent available cost data, those from June 1997, in the final version of this report, but the Bureau’s estimates are still subject to revision. Finally, because all initiatives of the Bureau’s current plan for the 2000 Census are closely related, none of the initiatives should be considered in isolation from the rest of the census design.

We did our work in Washington, D.C., and at the Bureau’s headquarters in Suitland, MD, between October 1995 and June 1997 in accordance with generally accepted government auditing standards.

We requested comments on a draft of this report from the Secretary of Commerce. We received comments from the Director of the Bureau of the Census (see app.
III), which we address at the end of this letter.

Obtaining Responses to Census Questionnaires Presents a Formidable Challenge

Declining rates of public response to census questionnaires have generated a costly, time-consuming workload for the Bureau. The key to a successful census as measured in terms of cost and data quality is obtaining mail responses to census questionnaires from residents of housing units. The 65-percent mail response rate for the 1990 Census was troublesome to the Bureau because of the extensive follow-up effort required to obtain information from nonresponding housing units.

In our October 1995 testimony, we reported the status of new Bureau initiatives that were aimed at obtaining responses to census questionnaires. These initiatives included the creation of simplified and streamlined census questionnaires and the use of multiple mail contacts to prompt a response. Subsequently, the Bureau announced the development of a new outreach and promotion program to encourage public cooperation. The Bureau expects that, when combined, these initiatives should produce a mail response rate of 66.9 percent. This is lower than the 70-percent response rate that the Bureau expected to receive in the 1990 Census and only slightly higher than the 65-percent response rate achieved in 1990. However, it is 12 percentage points higher than the rate the Bureau expects to achieve without the initiatives. Bureau officials did not quantify the individual effect of these initiatives, stating that their effects were too interrelated to be measured separately.

Simplified and Streamlined Questionnaire Has Potential to Improve Mail Response Rates

Since the 1990 Census, the Bureau has been working to simplify and streamline the short-form questionnaire. As of February 1997, the draft short-form questionnaire that the Bureau plans to use for the 2000 Decennial Census contained eight questions, which is six questions fewer than were on the form used in 1990.
The new short-form questionnaire now asks only for the name, age, gender, race, ethnicity, relationship of each household member, and housing tenure (owned or rented). Over the years, we have strongly suggested such an abbreviated form.

The Bureau also has been simplifying the long-form questionnaire that asks for more detailed sociodemographic, economic, and housing information. As in the 1990 Census, the Bureau plans to ask one in six housing units to complete a long form. Although the final design is still being evaluated, the Bureau expects the 2000 Census long-form questionnaire to have fewer questions than the 1990 Census long-form questionnaire had.

The 1995 Census Test evaluated one short-form and several long-form questionnaires of different lengths. The long-form questionnaires ranged in length from 16 to 53 questions, with even the longest version including 11 fewer questions than did the 1990 long-form questionnaire. The 1995 Census Test showed that the shorter the questionnaire, the more likely housing units were to respond. During the test, the response rate for short-form questionnaires was 55 percent, whereas the response rate for the three versions of the long form ranged from 38.1 percent to 46.8 percent, with the longest questionnaire having the lowest response rate.

During 1996, the Bureau conducted the 2000 Census Test to help it determine which specific question wording, formatting, and sequencing would elicit the most accurate responses. It also tested alternative-form designs and assessed the differences in coverage, completeness, and cooperation. As part of this test, short-form questionnaires were sent to 42,000 housing units, and various versions of the long form were sent to 52,500 housing units.
The Bureau released preliminary test results in December 1996 that indicated no statistical difference in the housing units’ responsiveness to the various lengths of either the short-form or long-form questionnaires. However, the long-form questionnaire had a lower average response rate (65 percent) than did the short-form questionnaire (72 percent).

The Bureau is continuing to work on the design of the census questionnaires with the assistance of marketing and survey design consultants, giving consideration to the printing, mailing, and processing of a large volume of questionnaires. Bureau officials told us that several visual design issues, such as illustrations on the census form that are aimed at promoting response, are still to be resolved.

As reported in our 1995 testimony, although the Bureau continues to progress in simplifying the census questionnaires, it has not gained consensus among policymakers and other stakeholders on the content of the questionnaires, their ultimate length, or the use of a long form. Although some Members of Congress have raised questions about the length of the short-form questionnaire and the need for the long-form questionnaire, the Bureau plans to use both the short- and long-form questionnaires unless formally directed not to do so by Congress. In the meantime, demands on the Bureau for data collection are increasing. For example, during 1996, the Welfare Reform Act was enacted, which, among other things, mandates that the Census Bureau collect data on grandparents as primary caregivers for their grandchildren. The Bureau is currently proposing to add two questions to the long-form questionnaire to comply with the act.

Multiple Mail Contacts Should Promote Increased Response to Questionnaires

We first suggested the use of a multiple mail contact strategy after we analyzed the results of the 1980 Census.
The Bureau’s initiative for multiple mail contacts consisted of four household contacts: a pre-notice letter, an initial questionnaire, a thank you/reminder card, and a replacement questionnaire. Bureau evaluations during 1995 showed that multiple mail contacts should increase mail response rates. Precise percentages could not be determined, however, because Bureau officials could not determine which part of the multiple mail contact initiative prompted the response. For example, some housing units may have returned the original census questionnaire because they received a thank you/reminder card, while others may have returned the questionnaire without receiving the cards. The Bureau’s test of its initiative was not designed to permit an analysis of this nature. Nevertheless, during testing in 1995, about 7 percent of housing units responded to the questionnaires using the replacement questionnaires, indicating that multiple mail contacts prompted responses.

Considering that follow-up on each 1 percent of nonresponding housing units is expected to cost about $25 million, increasing the response rate through multiple mail contact could produce significant savings. Furthermore, with census questionnaires projected to be mailed to, and returned by mail from, approximately 97.3 million housing units in 2000, the cost of multiple mail contacts could be significant but worthwhile, since it may free the Bureau from having to make follow-up visits to approximately 6.8 million housing units.

According to Bureau officials, only a limited number of printing/mailing vendors who are technically qualified have shown interest in bidding on the Bureau’s 2000 Decennial Census questionnaire printing contract.
Because of the large volume of printing/mailing involved and the remailing time constraints, most of these interested printing vendors did not believe that a replacement questionnaire could be implemented under the Bureau’s requirements. Therefore, the Bureau changed its initiative to include sending all housing units a replacement questionnaire, although it plans to continue discussing less costly and less duplicative methods with vendors for possible use in the 2000 Census. Bureau officials, with assistance from the Government Printing Office, plan to resolve all questionnaire printing and operational issues and award a printing contract to a single contractor or a consortium of printing contractors by December 1998.

Increased Use of the Postal Service Provides Opportunity for Savings

As we testified in October 1995, the Bureau is working with the Postal Service and local communities to maintain and update its address list. Furthermore, the Bureau is planning to use the Postal Service to identify vacant and nonexistent housing units early in the census-taking process to improve data quality and reduce costly nonresponse follow-up. Through greater reliance on the Postal Service and local communities in updating its address list, the Bureau estimated that it could save as much as $188 million in the 2000 Census. The Bureau also estimated that the use of the Postal Service to identify vacant and nonexistent housing units could reduce nonresponse workload by about 6 percent in the 2000 Census and thereby save an additional $135 million.

The Census Bureau has tested and evaluated the use of a combined Census Bureau and Postal Service master address file and is now in the process of updating the file for 2000.
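The savings estimates cited for multiple mail contacts and for the Postal Service initiatives rest on simple arithmetic, which the Python sketch below reproduces. It is illustrative only and makes two assumptions that are not in the report: that the 7-percent replacement-questionnaire response seen in the 1995 test would carry over to 2000, and that each of those housing units would otherwise have required follow-up. The amounts are gross figures and do not subtract the cost of the additional mailings. All names are hypothetical.

```python
# Figures quoted in the report text.
FOLLOWUP_COST_PER_POINT_MILLIONS = 25  # follow-up cost per 1 percent of units
MAILOUT_UNIVERSE_MILLIONS = 97.3       # housing units in the 2000 mailout/mailback universe
REPLACEMENT_RESPONSE_POINTS = 7        # share responding via replacement form in 1995 testing

# Assumption: the 1995 test result applies unchanged to the 2000 universe.
units_spared_millions = MAILOUT_UNIVERSE_MILLIONS * REPLACEMENT_RESPONSE_POINTS / 100
followup_cost_avoided_millions = (
    REPLACEMENT_RESPONSE_POINTS * FOLLOWUP_COST_PER_POINT_MILLIONS
)

# Postal Service initiatives: the two Bureau estimates are stated as additive.
address_list_savings_millions = 188
vacant_unit_savings_millions = 135
postal_savings_millions = address_list_savings_millions + vacant_unit_savings_millions

print(f"Housing units spared follow-up: about {units_spared_millions:.1f} million")
print(f"Follow-up cost avoided: about ${followup_cost_avoided_millions} million")
print(f"Combined Postal Service savings: up to ${postal_savings_millions} million")
```

The first result matches the figure of approximately 6.8 million housing units given in the report; the follow-up cost avoided is an upper-bound illustration under the stated assumptions rather than a Bureau estimate.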
As we stated in 1995, a geographically structured address list is critical for planning the 2000 Decennial Census because such a list will enable enumerators to physically locate the addresses of housing units and determine where they may be missing housing units. As of May 1997, the Bureau had updated 85 percent, or 84 million, of the currently known universe of about 99 million city-style addresses on its mapping system.5 Rural addresses, which the Bureau projected will number about 21.3 million in 2000, are to be identified as they were in past censuses—by having Bureau employees canvass rural areas to determine addresses for and geographically locate housing units. Postal Service information is not designed to geographically locate these housing units.

5 Not all of these city-style addresses will be delivered census questionnaires by the Postal Service; some questionnaires will be hand-delivered by enumerators and returned through the mail by respondents.

The Bureau continues to keep pace with its planned schedule for updating addresses and has determined that March 1999 will be critical for completing the address list. At that time, local governments are expected to review the list and provide updated information as the Bureau prepares for mailing census questionnaires in spring 2000. According to Bureau officials, a special effort will be needed because the last 10 to 20 percent of the addresses will be the most difficult for the Bureau to identify because of poor quality reference sources. For example, an address zoned for single-family residents may have been converted to contain two or more households.

Outreach and Promotion Program Could Encourage Public Response to Census, but to What Extent Is Unclear

In February 1996, the Bureau unveiled its plans for a new outreach and promotion initiative, which is expected to cost about $230 million, for encouraging public response to the census.
While not discussed in our October 1995 testimony, this initiative goes hand in hand with other initiatives discussed in this report because of its potential impact on the response rate to, and the cost of, the census.

One key feature of the outreach and promotion initiative is cooperative ventures with local governments aimed at involving elected local officials, business leaders, minority groups, religious organizations, and others in developing outreach activities within local communities. The Bureau’s 1995 Census Test indicated that cooperative ventures with local governments provided a way to promote public participation in the census. However, even though local communities were enthusiastic about participating in the Bureau’s outreach efforts, funding was an issue: local governments in urban areas, where response rates are lowest, reported that they lacked the funding to promote the Bureau’s initiative. As was the case in the 1990 Census, the Bureau’s plans do not include funding for these cooperative ventures, and therefore the level of local government involvement in the 2000 Census is unclear.

Another feature of the Bureau’s outreach and promotion initiative is the targeting of certain populations and geographic areas that historically have been undercounted, such as inner-city populations. These targeted methods include the use of community-based outreach organizations and the use of unaddressed questionnaires that will not be sent to housing units but rather will be made available at locations throughout a community, such as in community centers and convenience stores. The 1995 Census Test evaluations of targeted methods showed that the use of community-based organizations provides valuable assistance to outreach and promotion efforts in hard-to-enumerate areas.
As a result of the test evaluation, the Bureau concluded that the use of unaddressed questionnaires increased response rates by a small percentage, especially from those who may not otherwise have completed a questionnaire.

The third feature of the outreach and promotion initiative is the Bureau’s plan to contract with a private-sector advertising firm to promote the 2000 Census. The Bureau estimates that this paid advertising contract will cost about $100 million of the $230 million budgeted for the outreach and promotion program. This paid advertising program will end a 50-year partnership between the Bureau and the Advertising Council, which provided pro bono promotional services valued at $65 million for the 1990 Census.6 Bureau officials said that, by using paid advertising, they expect to have more control over the placement of advertising to reach targeted populations. For example, under pro bono promotional services, the Bureau had no control over the time of day that advertising was aired. In contrast, under a paid advertising program, the Bureau can decide when advertising “spots” should be aired, including during “prime time” and popular television programs.

The Bureau is still in the research and development phase of its paid advertising campaign and plans to spend $450,000 in fiscal year 1997 for focus groups to develop a census message and image acceptable to the majority of the population, as well as to specific targeted population groups. The Bureau plans to incorporate the results of these focus groups into the final contract for the advertising firm that is selected to carry out the overall 2000 Census promotional campaign. The Bureau plans to award the advertising contract in September 1997. If funding permits, Bureau officials said they plan to evaluate their full-treatment advertising campaign in the 1998 dress rehearsal.
Although Bureau officials acknowledged that a direct link between investment in advertising and a corresponding increase in mail response rates cannot be proven, these officials said that they intuitively believe that advertising aids response by making people aware of the census. Nevertheless, the Bureau’s own research found that, although about 93 percent of the public was aware of the 1990 Census, the mail response rate was only 65 percent. In a related example of civic response, although most U.S. citizens were aware of the 1996 presidential election, and despite massive advertising by candidates and political parties, as well as through public service announcements, only 49.7 percent of the voting age population exercised their right to vote.

Although the Bureau expects the use of outreach and promotion to encourage participation, especially in the hard-to-enumerate areas, we are concerned that the Bureau’s funding plans may not bring the high response rates hoped for because of other, larger demographic, economic, and attitudinal variables in our society that cannot be easily overcome. For example, in the 1990 Census, the Bureau planned for a 70-percent mail response rate but achieved only a 65-percent rate.

6 The Advertising Council, Inc., is a nonprofit organization responsible for administering public service advertising campaigns for television, radio, and print media.

The Potential Effects of Statistical Sampling and Estimation on Census Accuracy and Cost

The Bureau intends to expand the use of statistical sampling and estimation procedures in the 2000 Decennial Census to reduce the time and cost required to follow up on housing units that do not respond to census questionnaires and to improve the accuracy of the population count through the use of integrated coverage measurement (ICM) procedures. ICM is a statistical procedure that is designed to improve the accuracy of the census count by reconciling the original census counts with data obtained from an independent sample of housing units and using the results to adjust the census. While the Bureau used sampling and statistical estimation procedures in past censuses, its current plan for the 2000 Census would greatly expand reliance on such procedures in producing the final census totals.

In 1992, after comprehensively studying the 1990 Census, we recommended that the Bureau consider using statistical sampling to develop information on nonrespondents in an effort to achieve significant cost savings.7 In our October 1995 testimony, we again noted that sampling could both improve the accuracy of census data on nonrespondents and save money.8 However, we also testified that the Bureau must be prepared to provide policymakers with data on the trade-offs between the accuracy and potential cost savings of sampling. In addition, we noted that, if the Bureau were to use sampling and statistical estimation procedures in the form of ICM to adjust for undercounting, such procedures must be reliable. We were concerned that errors introduced by sampling nonrespondents and using ICM would overshadow the benefits that these procedures could provide when applied to smaller geographic levels, such as census tracts.9

In December 1996, the Bureau began providing us with draft results of its research using 1990 Census data that were applied to different design options, and these results were finalized by June 1997. The Bureau did not release all of these results publicly. The results showed that the Bureau’s plan for statistical sampling and estimation, if effectively implemented, has the potential for producing a more accurate and less costly census than if only conventional census procedures were used.

7 Decennial Census: 1990 Results Show Need for Fundamental Reform (GAO/GGD-92-94, June 9, 1992).

8 GAO/T-GGD-96-37, October 25, 1995.

9 A census tract is a small, relatively permanent statistical subdivision of a county. Census tracts usually have between 2,500 and 8,000 persons (averaging about 4,000) and, when first delineated, are designed to be homogeneous with respect to population characteristics, economic status, and living conditions. Census tracts do not cross county boundaries.

The following sections summarize the Bureau’s general development of sampling for nonresponse follow-up and ICM since our 1995 testimony and provide a view of the projected combined effects of these initiatives to produce the census count. A fuller and more technical description of the purpose, strategies, and research on the Bureau’s alternative designs for sampling for nonresponse follow-up and ICM is included in appendix I.

Sampling for Nonresponse Follow-Up Could Reduce Cost and Save Time

Declining rates of public response to the census questionnaires have generated a costly, time-consuming nonresponse follow-up workload for the Bureau. The Bureau had to follow up on about 34 million nonresponding housing units in 1990. Although the Bureau plans many initiatives for the 2000 Census to encourage higher mail response rates and reduce its reliance on expensive enumerator visits to every nonresponding housing unit, the nonresponse workload is still expected to be greater than it was in 1990. Therefore, after trying to directly contact every housing unit by providing a census form and requesting a response, the Bureau plans to sample a portion of the nonresponding housing units, rather than continuing with 100-percent follow-up as it did in 1990. The primary purposes of using sampling for nonresponse follow-up are to reduce the cost and time required to finish the census operation.
In June 1997, the Bureau estimated that its current plan for sampling nonresponding housing units in 2000 could save about $400 million off the cost of using a census design that incorporated all other improvements on the 1990 model except for sampling for nonresponse. Research results also indicated that sampling can help the Bureau to complete nonresponse follow-up operations in a much more timely manner than was the case in past censuses.

Since our October 1995 testimony, the Bureau has twice revised its plan for sampling a portion of the housing units that do not mail back a completed census questionnaire. Under its original plan, which was officially presented to the public in February 1996, the Bureau intended to continue conventional mail response data collection and follow-up interviews until information was obtained for 90 percent of the housing units in each county. The Bureau then planned to truncate, or stop, conventional follow-up, select a sample of 1 in 10 of the remaining housing units to interview, and rely on information obtained from interviewing the sample housing units to produce census data that would be substituted for all of the remaining nonrespondents.

After this plan was announced, some Members of Congress and other stakeholders and observers raised concerns about the effect of county-level truncation on the accuracy and fairness (equity) of census data. One of the concerns regarding county-level truncation was that the 90-percent threshold would be achieved primarily by more complete enumeration of the areas and population groups within a county that were easiest to count, leaving hard-to-count areas and subgroups of the population, especially minorities, to be disproportionately covered by sampling.
In response, the Bureau decided, in September 1996, that its preferred design would be to base sampling for nonresponse on mail response rates at the census tract level while maintaining the goal of obtaining responses from at least 90 percent of all housing units before relying on sample data to account for the remaining units. This change should improve overall accuracy and fairness because tracts are generally smaller and more homogeneous than counties. However, tract-level truncation will pose some additional challenges in the management and implementation of census field operations, not the least of which is the difficulty of tracking and controlling operations for over 60,000 separate tracts instead of for about 3,000 counties. The Bureau studied the following three options to determine how best to implement its revised plan: (1) truncation at 90 percent, (2) time truncation, and (3) direct sampling. Truncation at 90 percent featured the same basic design as the original February 1996 plan, except that completion rates would be tracked for each census tract rather than for each county. Under this option, the Bureau would continue conventional follow-up interviews until it had achieved the 90-percent completion threshold for each tract and would then begin sampling to account for the remaining nonrespondents. Under the time truncation option, conventional follow-up interviews would continue for a predetermined length of time, such as 3 weeks. After this initial follow-up period, the Bureau would select a sample of the remaining housing units in each tract that included enough housing units to raise the completion rate to at least 90 percent. For example, in a tract for which the Bureau achieved a 70-percent completion rate after this initial follow-up period, two of every three of the remaining housing units would be selected for the follow-up sample. Under the direct sampling option, there would be no conventional follow-up phase. 
Instead, at the end of the mail response phase, the Bureau would select a sample of the remaining nonresponding housing units in each tract that would be sufficient to reach at least a 90-percent completion rate and would then project the remaining 10 percent. For example, in a tract with a mail response rate of 30 percent, six of every seven of the remaining housing units would be sampled. Under each of the three options, the Bureau would take a 1-in-10 sample of the remaining housing units for any tract with an initial response rate of more than 90 percent.

The Bureau’s research indicated that all three options had the potential to (1) produce results with similar accuracy; (2) reduce the cost of follow-up operations at least somewhat when compared to the cost of 100-percent follow-up; and, perhaps most important, (3) allow the Bureau to complete nonresponse operations in time to also implement and complete ICM.

Of the three nonresponse sampling options the Bureau considered, direct sampling appeared to produce the greatest benefits in terms of cost, accuracy, and operational feasibility. Therefore, in March 1997, the Bureau selected the direct sampling design option. The Bureau estimated the cost of implementing direct sampling to be about $200 million less than truncating at 90 percent and $600 million less than using the time truncation option. In simulations of the accuracy of different options, direct sampling produced slightly better results, particularly for small geographic areas, such as census tracts. This option was also favored by Bureau field staff because it is simpler to implement. However, direct sampling differs the most from the plan the Bureau originally proposed and would mean that nonresponding housing units that were not selected as part of the sample would not have another chance to be interviewed by census enumerators.
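The two worked examples in the report (two of every three remaining units at a 70-percent completion rate, and six of every seven at a 30-percent mail response rate) follow from the same arithmetic: the sample must cover enough of the remaining units to bring the tract to 90 percent. The sketch below is only an illustration of that arithmetic, not the Bureau's procedure; the function name and constants are hypothetical, and applying the 1-in-10 sample to a tract sitting exactly at 90 percent is an assumption, since the report specifies it only for tracts above 90 percent.

```python
from fractions import Fraction

# Hypothetical names for illustration; these are not taken from Bureau documents.
TARGET_COMPLETION = Fraction(9, 10)     # 90-percent completion goal for each tract
HIGH_RESPONSE_SAMPLE = Fraction(1, 10)  # 1-in-10 sample for tracts already above 90 percent


def followup_sampling_fraction(completion_rate: Fraction) -> Fraction:
    """Minimum share of remaining nonresponding units to sample in one tract.

    completion_rate is the share of housing units already accounted for
    (mail response alone under direct sampling, or mail response plus the
    initial follow-up period under time truncation).
    """
    if not 0 <= completion_rate < 1:
        raise ValueError("completion_rate must be in [0, 1)")
    if completion_rate >= TARGET_COMPLETION:
        # Assumption: the 1-in-10 rule is also applied at exactly 90 percent.
        return HIGH_RESPONSE_SAMPLE
    # Units still needed to reach 90 percent, as a share of the units remaining.
    return (TARGET_COMPLETION - completion_rate) / (1 - completion_rate)


if __name__ == "__main__":
    for rate in (Fraction(7, 10), Fraction(3, 10), Fraction(95, 100)):
        fraction = followup_sampling_fraction(rate)
        print(f"{float(rate):.0%} complete -> sample {fraction} of remaining units")
```

Exact fractions are used so that the results can be compared directly with the "two of every three" and "six of every seven" figures quoted in the text.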
Direct sampling, therefore, may be somewhat more difficult for the public to understand and accept.

ICM Could Improve the Accuracy of the Population Totals

The purpose of ICM is to reduce coverage error in the census, particularly the differential undercount of minorities and other hard-to-enumerate populations and areas. Even if all other redesign initiatives produce the anticipated improvements in the conventional census counting operations, the evidence from past census evaluations indicates that coverage errors and differential net undercounts in the census data will still occur. Evaluations have demonstrated that the census misses entire housing units (and any occupants), misses people within housing units that it does count, and includes other persons in the census counts in error (e.g., counting them more than once or in the wrong place). Young adult males, members of ethnic and racial minorities, renters, and people living in rural areas, among others, are more likely than other categories of residents to be undercounted by the census. Evaluations of the 1990 Census also showed that error rates were significantly higher for persons living in housing units enumerated using “last resort” or “closeout” procedures at the end of nonresponse follow-up operations, when the Bureau accepted information from persons who were not residents of the households.

ICM is designed to use the results of a coverage measurement survey to do a quality check in which enumerators visit an independent sample of households to check the accuracy of the original census data. ICM would be conducted after basic data collection, including nonresponse follow-up, had ended and would estimate the extent to which people were correctly counted, missed, or included in error by the census. These estimates would then be used to correct coverage errors in the results of previous data collection efforts.
This would be the last phase in completing the census and producing the final census results.

Although the Bureau has used coverage measurement surveys since 1950 to help it determine the magnitude and characteristics of census errors and undercounts, it has not used the findings of these evaluations to correct for coverage errors in the decennial census data tabulations.10 A key feature of the Bureau’s current plan for 2000 is to produce a one-number census by integrating an adjustment into the basic process for developing decennial census data tabulations.

The Bureau designed the proposed ICM for 2000 to address several major weaknesses of the coverage measurement survey that was used in the 1990 Census (the PES), especially those regarding timeliness and the accuracy of estimates for subnational geographic areas. In the 1995 Census Test, new procedures and reliance on improved survey technology demonstrated the potential to improve the timeliness of a post-census quality check. The Bureau was able to produce the final test census results before the end of 1995, a significant improvement over the performance of the 1990 PES, which did not generate adjusted census estimates until the spring of 1991, well after the December 31, 1990, deadline for apportionment totals. The Bureau also plans to use a sample in 2000 that is approximately 5 times larger than the 1990 PES sample (about 750,000 housing units versus about 150,000). This larger sample is designed to allow the Bureau to produce direct estimates for each state and improve the quality of estimates for smaller geographic areas.11

10 The Bureau has used the 1990 Census adjustment factors for purposes of adjusting its survey controls but not the decennial census tabulations.

While results from research and testing to date have been promising in general, it is also clear that the Bureau’s plans for further research and testing are important to the development of ICM for the 2000 Census. The Bureau experienced some operational and technical problems with ICM in the 1995 Census Test. For example, enumerators tested the use of laptop computers to quickly reconcile the information obtained during the ICM visit with original census data but found that the data were not always contained in the computer, making on-the-spot reconciliations impossible. The Bureau is currently testing and evaluating redesigned procedures and its survey instrument. The ICM survey portion of this test began in January 1997, and the results of the test should help determine whether ICM operational and technical problems have been addressed sufficiently. However, evaluations from this test are not expected to be completed until fall 1997, and Bureau officials have indicated that evaluation of ICM may continue through the Census 2000 Dress Rehearsal.

Combined Effects of Planned Sampling for Nonresponse Follow-Up and ICM on Population Totals

The effects of the proposed sampling for nonresponse follow-up and of ICM procedures on the success of census operations and the quality of the resulting data need to be viewed in combination to be meaningful. Sampling a portion of the nonresponse workload can save time and money when compared with the option of 100-percent follow-up of nonrespondents; however, it is unlikely to significantly change (either improve or decrease) census accuracy. Conversely, ICM, which would increase costs, is designed to address problems with census accuracy.
But ICM is unlikely to be successful unless preceding data collection efforts, in particular nonresponse follow-up, are completed on schedule. ICM is designed to reduce the systematic bias observed in past censuses (i.e., differential undercounts), but it also introduces sampling error.

Similarly, a census design that would not include sampling for nonresponse follow-up or ICM would also involve trade-offs. For example, such designs would be easier to explain to the public, would more closely resemble past censuses, and would not introduce the level of uncertainty in the results that accompanies sampling. However, these designs are also likely to be more expensive and have shown no likelihood of reversing or significantly reducing past accuracy problems in census data.

11 A direct estimate is based entirely on data from the area for which the estimate is calculated. For instance, a direct population estimate for Missouri would be calculated using only data collected from Missouri. Indirect estimates, such as the 1990 PES state population estimates, draw on data from outside the area being estimated.

The Bureau shared with us data and analysis it had developed from its research and simulations done for the different design alternatives it considered for the 2000 Census. We reviewed the Bureau’s methods and assumptions and, after the Bureau made a number of revisions in response to our questions, found that the revised data produced were generally reasonable to use to project the possible effects of the Bureau’s proposed sampling and statistical estimation initiatives. The results of the simulations show that the Bureau’s plan, if effectively implemented, has the potential for producing a more accurate and less costly census than if only conventional census procedures were used.
According to the Bureau’s cost estimates, which are shown in table 1, using the Bureau’s refined plan for the 2000 Census would save between $700 million and $800 million off the cost of using a plan that incorporated all of the new initiatives proposed for the 2000 Census except those involving sampling and statistical estimation.

Table 1: Comparison of Estimated 2000 Census Costs for Selected Design Alternatives
(Dollars in billions; cost in 2000 dollars a)

Design alternative: Bureau’s refined plan for 2000 Census (as of March 1997)
Cost: $4.0
Description:
- Include all planned improvements of 1990 procedures;
- sample nonrespondents directly after mail return phase to achieve a 90-percent response for each census tract, then use sample data for remaining nonrespondents;
- use ICM to complete the census.

Design alternative: 90-percent truncation for nonresponse follow-up
Cost: $4.2
Description:
- Include all planned improvements of 1990 procedures;
- use conventional nonresponse follow-up until achieving 90-percent response for each census tract, then sample remaining nonrespondents;
- use ICM to complete the census.

Design alternative: Time truncation for nonresponse follow-up
Cost: $4.6
Description:
- Include all planned improvements of 1990 procedures;
- use conventional nonresponse follow-up for a set period of time, then sample remaining nonrespondents to achieve a 90-percent response rate for each census tract;
- use ICM to complete the census.

Design alternative: Conduct 2000 Census without sampling for nonresponse follow-up
Cost: $4.4
Description:
- Do not use sampling for nonresponse follow-up;
- include all other planned improvements of 1990 Census procedures, including ICM to complete the census.

Design alternative: Conduct 2000 Census without sampling for nonresponse follow-up or ICM
Cost: $4.7-4.8
Description:
- Do not use sampling for nonresponse follow-up or ICM;
- include all other planned improvements of 1990 procedures;
- use increased activities (e.g., publicity, follow-up of vacant housing units) to attempt to achieve census coverage consistent with 1990 levels;
- use a PES only to evaluate the quality of the census.

Note 1: Planned improvements of 1990 procedures include initiatives such as the multiple mail strategy, questionnaire redesign, enhanced outreach and promotion, sampling for nonresponse follow-up, and ICM.
Note 2: All design alternatives assume an overall mail response rate of about 67 percent.
a Bureau cost estimates are as of June 1997 and are subject to revision.
Source: Census Bureau data.

The Bureau provided us with results from its computer simulations of what could be produced in the 2000 Census using its refined design for 2000 (i.e., using direct sampling controlled at the census tract level for nonresponse follow-up together with ICM). Those research results indicated that, relative to the size of the population being estimated, the new methods proposed by the Bureau would likely result in less relative error in census data for the nation, states, congressional districts, and most census tracts than using a conventional census design (see fig. 2).12 Because of the limitations of the research we reviewed, these numbers only serve as an illustration of likely results in 2000. Despite these caveats, results near these levels would represent a reduction in the relative error rates experienced in the 1990 Census.

12 In this report, we primarily use relative error to enable us to compare results of different design alternatives and geographic areas with differing population sizes. The Bureau’s simulations measured two different types of error: sampling error for the Bureau’s alternatives that incorporate sampling for nonresponse follow-up and ICM and net undercount or overcount for conventional census design alternatives.

Figure 2: Simulations Indicate Bureau’s Refined Design Could Produce Lower Relative Error Rates Than a No Sampling Design in 2000
[Bar chart. Vertical axis: percentage of relative error. Horizontal axis: geographic level (national, states, congressional districts, census tracts). Series: no sampling design; direct sampling design.]

Note 1: The average relative error is shown for each geographic level.
Note 2: Relative error for the no sampling design (i.e., implementing all planned improvements of 1990 procedures except sampling for nonresponse follow-up and ICM) is the projected net undercount rate in the 2000 Census.
Note 3: Relative error for the direct sampling design combines the sampling error from sampling for nonresponse follow-up and ICM.
Note 4: Data for states exclude Washington, D.C.
Note 5: Data do not include possible errors from other sources, such as any bias in the statistical models used to produce the population estimates.
Source: Census Bureau data.

The Bureau’s simulation results for design alternatives also illustrated one of the major trade-offs in accuracy between designs that use sampling and statistical estimation and those that do not. The simulation results suggest that the new statistical methods the Bureau proposes to use in the 2000 Census would likely produce results that appear more accurate or more equitable according to at least three broad criteria: (1) better average levels of error, (2) error distributions compressed closer to the average levels, and (3) an apparently better cumulative error distribution.
For example, state population totals, which are the basis for apportioning seats in the House of Representatives, not only show lower error rates on average but also show much less variation in the error rates among states when compared to the undercount rates in the 1990 Census. The results also showed, however, that some areas that may have very low net error rates using a conventional census design could have higher error rates if sampling for nonresponse and ICM are used, particularly as smaller geographic levels are considered. The smallest geographic areas for which detailed data from the Bureau simulations are available are census tracts. The Bureau projected that its refined plan for the 2000 Census, using the direct sampling option for nonresponse follow-up, would have an average error rate of 1.1 percent for census tracts. The average error rates were 1.5 percent for the nonresponse follow-up option, using 90-percent truncation, and 1.3 percent, using the time-truncation option. The estimated average net undercount for tracts using conventional procedures was 1.9 percent. The Bureau calculated that its refined plan for 2000 would produce less error for 64 percent of census tracts. For the other two options the Bureau considered implementing, the simulations indicated that the time-truncation option would produce less error for 54 percent of the tracts and that the 90-percent truncation would produce less error for 51 percent of the tracts. The converse of these data is that the conventional procedures were projected to perform better for around 40 to 50 percent of the tracts, depending on the option used for comparison. (For more detailed information on projected error levels, using these alternative census designs, see apps. I and II.) The Bureau intends to contract for a study of the potential block-level effects of using sampling. 
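The tract-level projections above, together with the cost estimates in table 1, can be lined up side by side to make the trade-offs easier to check. The sketch below is an illustrative tabulation only: the class and function names are hypothetical, the figures are the ones quoted in this report, and treating every tract not improved by a sampling option as one where conventional procedures do better is a simplification that ignores any ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingOption:
    """One nonresponse follow-up option, using figures quoted in this report."""
    name: str
    cost_millions: int            # table 1 estimate, converted from billions of 2000 dollars
    avg_tract_error_pct: float    # projected average error rate for census tracts
    tracts_with_less_error: int   # percent of tracts with less error than conventional procedures


OPTIONS = [
    SamplingOption("direct sampling (refined plan)", 4000, 1.1, 64),
    SamplingOption("time truncation", 4600, 1.3, 54),
    SamplingOption("90-percent truncation", 4200, 1.5, 51),
]

# Projected average net undercount for tracts under conventional procedures.
CONVENTIONAL_TRACT_ERROR_PCT = 1.9


def conventional_better_pct(option: SamplingOption) -> int:
    """Percent of tracts where conventional procedures are projected to do better.

    Simplification: every tract not improved by the option is counted here.
    """
    return 100 - option.tracts_with_less_error


def extra_cost_millions(option: SamplingOption, baseline: SamplingOption) -> int:
    """Estimated cost of an option above the baseline, in millions of dollars."""
    return option.cost_millions - baseline.cost_millions


if __name__ == "__main__":
    baseline = OPTIONS[0]
    for option in OPTIONS:
        print(
            f"{option.name}: tract error {option.avg_tract_error_pct}% "
            f"(conventional {CONVENTIONAL_TRACT_ERROR_PCT}%), "
            f"conventional better in {conventional_better_pct(option)}% of tracts, "
            f"${extra_cost_millions(option, baseline)} million above the refined plan"
        )
```

The complements of 64, 54, and 51 percent are the basis for the "around 40 to 50 percent" range cited in the text, and the cost differences reproduce the $200 million and $600 million gaps the Bureau estimated between direct sampling and the two truncation options.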
Technically, the most accurate design alternative, according to the results of the Bureau’s research, would be to attempt 100-percent follow-up of nonrespondents and use ICM to address accuracy problems. That design could produce slightly improved accuracy in census data, particularly for smaller geographic areas, but would come at a greater cost (approximately $400 million more than that for the Bureau’s refined plan). Furthermore, such an option may not be feasible given projected staffing difficulties and, especially, the risk that the Bureau could not complete 100-percent follow-up and ICM by the December 31 deadline for reporting census results for congressional reapportionment. The results of the Bureau’s research on alternative sampling approaches are included in appendixes I and II.

Constitutional and Legal Issues

The use of sampling in connection with the decennial census count has been questioned on constitutional and statutory grounds. Article I, section 2 of the Constitution requires an “actual Enumeration” of the population every 10 years and vests Congress with the authority to conduct that census “in such a Manner as they shall by Law direct.” Congress has, in turn, delegated this authority to the Department of Commerce through the Census Act.13

One of the key issues in the constitutional debate on sampling is whether sampling would be considered an “actual Enumeration” as required by article I, section 2. The Supreme Court has never considered the specific issue of whether the use of sampling violates the Constitution. The Court has, however, considered various constitutional challenges to the conduct of the census in other contexts.
Most recently, the Court determined that the Secretary of Commerce’s decision not to use a post-enumeration statistical adjustment in the Department’s final census count in 1990 was within the constitutional bounds of discretion that the Secretary has over the conduct of the census.14 The Court specifically stated in the decision that it was “not decid[ing] whether the Constitution might prohibit Congress from conducting the type of statistical adjustment considered here.”15 In that case, the Court, citing its previous decisions concerning census issues, concluded that so long as the Secretary’s conduct of the census is “consistent with the constitutional language and the constitutional goal of equal representation,” it is within the limits of the Constitution.16 The Court based its deference to the Secretary’s action on “the wide discretion bestowed by the Constitution upon Congress, and by Congress upon the Secretary.”17

13 13 U.S.C. 141(a).

14 Wisconsin v. City of New York, 116 S. Ct. 1091 (1996).

15 Id., at 1101.

16 Id.

17 Id., at 1103.

While, as noted earlier, this decision did not address whether statistical adjustments were constitutionally permissible, proponents of sampling have argued that the Court’s recognition of the considerable discretion granted Congress and, by delegation, the Secretary of Commerce, to conduct the census would support the use of sampling if the Secretary determined sampling was necessary to produce a more accurate count of the population than would result from a bare headcount. Alternatively, opponents of sampling have argued for a more literal interpretation of the phrase “actual Enumeration” in the Constitution and point to dissatisfaction with the initial congressional apportionment (which has been described as a “conjectural ratio”) and to the consistent practice of using unadjusted headcounts from the first census in 1790 until recent decades.
There is also a controversy concerning the application of specific statutory provisions to sampling. The Census Act states that the Secretary of Commerce is to undertake a decennial census “in such form and content as he may determine, including the use of sampling procedures.” However, another section of the act18 appears to restrict the use of sampling as follows: “[e]xcept for the determination of population for purposes of apportionment of Representatives in Congress among the several States, the Secretary shall, if he considers it feasible, authorize the use of the statistical method known as “sampling” in carrying out the provisions of this title.” (emphasis added) The language and legislative history of these statutory provisions have been the subject of debate as to what, if any, limits on the use of sampling were envisioned by Congress. Judicial interpretation of the statutory provisions has been confined to lower courts and has generally supported the conclusion that section 195 permits adjustment for apportionment purposes.19 The question of whether sampling is statutorily and constitutionally permissible in determining the decennial census count can only be definitively resolved by the Supreme Court.

18 13 U.S.C. 195.
19 See, e.g., City of Philadelphia v. Klutznick, 503 F. Supp. 663 (E.D. Pa. 1980); Young v. Klutznick, 497 F. Supp. 1318 (E.D. Mich. 1980).

Risk of a Failed Census in 2000 Has Increased

The 1990 Census was more costly yet less accurate than the 1980 Census. By 1994, the fundamental design of the 1990 Census had been found to be flawed and in need of change. This conclusion was reached independently by the Department of Commerce task force for designing the 2000 Decennial Census; two expert panels of the National Academy of Sciences, one of which was commissioned by Congress to study the 1990 Census; the Bureau; and us.
As a result of this conclusion, the Bureau was faced with the question of how best to change the conventional census-taking methods in a manner that would make them less costly and more accurate than the 1990 Census and would meet the approval of stakeholders, including Congress, federal agencies, state and local governments, the public, demographers, and others who rely on census information. Planning a decennial census that is acceptable to all of these stakeholders includes analyzing the lessons learned from past practices, identifying those initiatives that show promise for producing a better census at lower cost, testing those initiatives to determine their effectiveness and feasibility, and convincing stakeholders of the value of the proposed changes. Although the Bureau has generally been responsive to concerns, suggestions, and recommendations made by us and others, it has not been able to convince all of its key stakeholders, particularly Congress, of the value and acceptability of its plans and proposals for improving the design of the 2000 Census. Since our October 1995 testimony, significant congressional opposition to the Bureau’s census design has surfaced, and Members of Congress have raised questions about the level of funding being requested for the census. However, the Bureau has said that the alternative of returning to the conventional census design (i.e., without methods improvements, sampling for nonresponding housing units, or ICM) that failed to include more than 4 million people in 1990 and would miss at least 5 million people in 2000 is not an alternative. Thus, in February 1997, this uncertainty over design and funding levels, at this late stage of census preparation, led us to designate the 2000 Census as being at high risk for wasted expenditures and unsatisfactory results.20 At least two factors have contributed to the Bureau’s inability to reach agreement with Congress. 
First, the Bureau has not always provided sufficient information to Congress or others to support its initiatives. Specifically, the Bureau has not provided (1) enough detailed data to show the range and distribution of effects that its initiatives are designed to achieve and (2) the results of its research to address concerns about the soundness or subjectivity associated with its proposed statistical methods.

Second, the Bureau has neither successfully tested the operational feasibility of some key initiatives it would implement in 2000 nor yet determined how well all these initiatives work together. This situation contributes to uncertainty over whether the Bureau’s plans can be successfully carried out.

20 GAO/HR-97-2, February 1997, pp. 141-146.

Need Exists for More Detailed Information and Data to Support Key Bureau Proposals

Over the last few years, the Bureau has provided general data on the anticipated mail response rates to questionnaires, the accuracy of the census data, and the estimated cost of and dollar savings from its initiatives. However, it has not always provided sufficiently detailed data on the expected effects of its initiatives on such key variables as the accuracy or equity of census data. This lack of sufficient data on expected effects has made it difficult for Congress and other stakeholders to support Bureau initiatives. The Bureau’s proposal to expand the use of statistical sampling and estimation procedures to handle the workload resulting from nonresponding housing units is an example of a proposal for which the Bureau did not provide sufficiently detailed data. When we testified in October 1995, the Bureau was studying alternative sampling designs. Although we supported the concept of using statistical sampling and estimation methods, we noted that further study of the alternatives was necessary.
In the same congressional hearing, the Director of the Census Bureau testified that the Bureau would continue its research and provide details of the Bureau’s plans, including details on alternatives for sampling nonresponding housing units and for statistical estimation procedures. In February 1996, the Director of the Bureau, the Secretary of Commerce, and the Director of OMB, among others, presented the Bureau’s overall plans for the 2000 Decennial Census. The sampling initiative the Bureau selected involved a 90-percent truncation design that would be controlled at the county level. The Bureau continued to say that a design employing the truncation option for nonresponse follow-up, together with ICM, was the superior alternative. However, the Bureau provided no additional details about the effects of the selected option or of the other alternatives on census accuracy or equity. The Bureau’s plan for the 2000 Census was not well received by the House Committee on Government Reform and Oversight and certain other stakeholders who were concerned about the effects of this plan on the accuracy and equity of the census. At hearings held the day after the Bureau presented its plans, several witnesses testified that the plan was unacceptable. One concern was that sampling at the county level would cause a deterioration in the accuracy of the counts of minorities. On September 16, 1996, the Bureau announced a revision to its plan to control sampling of nonresponding housing units at the county level, stating that it would sample at the tract level. However, the Bureau provided only limited additional data that would enable Congress and other stakeholders to gauge the impact of the revised proposal on accuracy and equity. In general, the data were summary statistics showing the average error rates at various geographic levels.
When the Bureau provided additional information on ranges of error rates at different geographic levels, it did not provide the supporting details. In particular, the Bureau did not provide the details on the distribution of errors across geographic areas, such as states or tracts. Such details would help show whether the potential error rates for most areas were close to reported averages or distributed more widely across the range of errors. They would also help identify the number of geographic areas with potentially high error rates and whether they were scattered across the country or clustered in certain areas. On September 24, 1996, the House Committee on Government Reform and Oversight issued a report that was critical of the Bureau’s initiatives for sampling and statistical estimation. Among other things, the Committee found that the Bureau had not clarified issues of accuracy, particularly for small geographic areas, raised by the sampling initiative. The Committee also raised concerns about the operational feasibility of, and possible subjectivity associated with, the Bureau’s proposed sampling and estimation procedures. It also found that views differed on the constitutionality and legality of using the proposed sampling and estimation procedures, particularly with regard to apportionment. The Committee recommended that the Bureau not use sampling and estimation procedures to complete or adjust the census. In December 1996, we met with Bureau officials to discuss the need for and importance of having sufficient information on the potential effects that its proposed sampling and estimation initiatives would have on accuracy and equity. The Bureau officials said that they had been reluctant to release detailed draft data while research was under way because of concerns about criticisms the Bureau may face if the numbers changed on the basis of subsequent research results. 
However, the officials agreed that they needed to provide these data and expedited their efforts to do so. In December 1996 and January 1997, the Bureau provided us with detailed data on the results of its simulations. Although these were draft data, they compared the potential effects on accuracy at various geographic levels that could be produced using various design alternatives proposed for the 2000 Census. However, in a February 1997 response to the concerns of the House Committee, the Bureau provided neither these nor other detailed data to address the Committee’s concerns about the effects of the planned sampling and estimation procedures. On April 2, 1997, we provided the Bureau with a draft of this report for comment. In attempting to verify the data and other information that they had provided us for the report, Bureau officials discovered some discrepancies and other errors. For example, there were some inconsistencies in the data displayed in the chart the Bureau had been using to summarize the potential costs and accuracy of alternative census designs. (The revised chart appears as fig. II.1.) Most columns in the chart presented projected results for the 2000 Census, but the tract-level data represented a simulation of results for the 1990 Census. Bureau officials also discovered a problem in the data files used to simulate the results for those census designs that did not involve sampling or statistical estimation; this problem increased the reported error rates at the census tract level for those designs. The cost estimate for the design alternative using direct sampling for nonresponse (the Bureau’s refined plan) was also understated because the Bureau had not revised its estimate to account for the cost of changing from a nonresponse follow-up based on county-level response rates to one based on tract-level rates.
The Bureau notified us about these data problems and mistakes later in April 1997 and began to rerun the data we had requested. However, the Bureau was not able to provide us with a final version of the revised information and data sets until June 1997. The revised data addressed the problems identified in April by consistently presenting projected results for the 2000 Census, including revised cost estimates. While we waited for the revised data, which should be shared with all stakeholders, Public Law 105-18 was enacted, requiring Commerce to provide Congress with a comprehensive and detailed plan outlining its proposed methodologies for conducting the 2000 Decennial Census. Another example of the Bureau’s not providing detailed data on the expected effects of a Bureau initiative involves congressional concerns about the subjectivity associated with the Bureau’s sampling plans. In its September 1996 report, the House Committee on Government Reform and Oversight raised concerns about the subjectivity associated with such decisions as the selection of the samples the Bureau intends to use for nonresponse follow-up and for ICM. The final details of the Bureau’s methods and procedures could affect the census results, and this is especially important since the formula used to apportion seats in the House of Representatives is mathematically very sensitive to the number of people in states that may receive the last few seats through the apportionment process. However, in its February 1997 response to the House Committee on Government Reform and Oversight, the Bureau did not provide specific information on how it would make many decisions, such as sample selections. Without information on the census design options considered, their likely implications, the choices the Bureau made, and the bases for such choices, Congress and other stakeholders may continue to have concerns over subjectivity in the 2000 Census.
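The sensitivity of the apportionment formula noted above can be illustrated with a short sketch. It assumes the formula in question is the method of equal proportions, the method used for House apportionment since the 1940s; the report itself does not name the formula. The three states, their populations, and the 10-seat chamber are hypothetical values chosen only to make the effect visible, and the function name `apportion` is our own.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Allocate seats by the method of equal proportions (assumed method).

    Every state first receives one seat. Each remaining seat goes to the
    state with the highest priority value pop / sqrt(n * (n + 1)), where
    n is the number of seats that state currently holds.
    """
    alloc = {state: 1 for state in populations}
    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-pop / sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return alloc


# Hypothetical states and populations (not census data).
before = apportion({"X": 5_000_000, "Y": 3_162_277, "Z": 1_000_000}, 10)
after = apportion({"X": 5_000_000, "Y": 3_162_278, "Z": 1_000_000}, 10)
print(before)  # X holds 6 seats, Y holds 3, Z holds 1
print(after)   # one more person counted in Y moves the last seat from X to Y
```

In this contrived case a one-person difference in state Y's count changes which state receives the tenth seat. Real apportionment margins are far larger, but the example shows why small differences produced by sample selection or estimation choices can matter for the states competing for the last few seats.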
The Bureau also has not provided detailed justification for its proposed initiative for a $100 million advertising campaign, concerning its effects on cost and response rates. Previously, the Advertising Council had conducted a public service advertising campaign for the census at no cost to the Bureau. Although we share the Bureau’s hope that its planned advertising campaign will increase mail response rates, the Bureau has provided no detailed data linking expenditures on advertising with a corresponding increase in public response. Although it may be impossible for the Bureau to predict precisely the increase in the response rate that paid advertising may produce, as of June 20, 1997, the Bureau had not provided data supporting the budget of its proposed $100 million advertising campaign. In the recent 1996 election, hundreds of millions of dollars were spent not only to promote the candidates, but also in a public service advertising effort to promote voting to specific groups, such as 18-to 25-year-olds, who were targeted by rock stars and other celebrities on radio, television, and cable television channels using the slogan “Rock the Vote.” Nevertheless, under 50 percent of the voting age population turned out to vote. Although differences exist between filling out a census questionnaire that is sent to one’s home and either getting to a polling place or arranging to vote at home, both require responses motivated by civic involvement. Although about 93 percent of those surveyed on the 1990 Census were aware of it, only 65 percent of those households that received questionnaires by mail responded. Thus, a question is raised about the effectiveness of paid advertising in stimulating action as opposed to simply raising awareness. 
The Bureau is planning to evaluate its proposed advertising campaign in the Census 2000 Dress Rehearsal, but it is not yet clear how much money Congress will provide for this evaluation. Finally, in March 1997, Bureau officials said that, as part of its multiple mail contact initiative, the Bureau plans to send replacement questionnaires to all housing units, as opposed to (as planned) just those that did not return the original questionnaire. However, the Bureau did not release data on the costs or benefits of this change, which would result in the Bureau’s mailing two questionnaires to about 97.3 million households in 2000, about 59.5 million of which may already have returned a questionnaire.21

Questions Remain About the Operational Feasibility of Some Aspects of the Bureau’s Refined Plan

Members of Congress have questioned whether the Bureau can successfully implement some aspects of its current plan for the 2000 Census. They have also raised concerns about the Bureau’s proposal to incorporate an adjustment into its basic counting process, citing issues of subjectivity and potential error associated with such an adjustment. Because the Bureau has not tested some aspects of its currently proposed plan and has not tested whether all aspects of this plan will work in concert with each other, there is uncertainty as to whether or how well the Bureau’s plan can be carried out. Field testing enables the Bureau, as well as Congress and other stakeholders, to assess the operational feasibility of key initiatives of its plans. The Bureau has been evaluating features of its proposed census design in a variety of tests over the past few years and plans to do more testing before 2000. For example, the Bureau did a test census in 1995 and began another test, primarily of its proposed coverage measurement survey and estimation methods (ICM), in October 1996.
However, some operational aspects of the current census design have not yet been fully tested successfully or have not yet been tested in a manner similar to the implementation being proposed for 2000. For example, the Bureau’s current plan depends on completing sampling for nonresponse follow-up and ICM in time to produce the population count by December 31, 2000. However, the 1995 Census Test did not test a sampling operation designed to help determine whether nonresponse follow-up of the magnitude projected by the Bureau’s current plan could be completed in time for ICM to be done on schedule. While the Bureau did use a version of direct sampling for nonresponse follow-up in the 1995 Census Test, it was not designed to achieve at least a 90-percent completion rate before the Bureau relied on sample data to account for the remaining nonrespondents. In order to do ICM more rapidly, the Bureau plans to use laptop computers in the field. However, during the 1995 Census Test, the Bureau had difficulty loading data from nonresponse follow-up activities into the laptop computers in time for use by enumerators doing ICM interviews. As a result, enumerators were unable to match interview data with original census questionnaire data, which prevented them from resolving discrepancies as originally planned.

21 The Bureau anticipates mailing questionnaires to about 82 percent of housing units and plans to use other procedures for obtaining responses from the remaining housing units. If the Bureau finds concentrations of new city-style addresses during its address list development process, it may add them to the percentage of housing units that are to receive a questionnaire by mail in 2000.
The Bureau revised the procedure and its software to correct earlier problems, began to retest this operation in early 1997, and is continuing to work on the procedures and software to be used for computerized matching of housing units and individuals listed by ICM and other census operations. Another aspect of the current plan that has yet to be tested is the operation of the scanning equipment for the 2000 Census that will be used to capture data on census forms. Although the Bureau used some scanning in the 1990 Census (i.e., multistage photoimaging equipment that turned photographs into microfiche and, in turn, into tapes), it is planning to use more sophisticated scanning equipment that can “read” handwritten material and convert it directly into a machine-readable format, as well as a more extensive use of scanning, in 2000. The Bureau used a prototype scanning system and optical character reader in parallel with the 1995 Census Test. The successful operation of this equipment is critical to the Bureau’s plan because its ICM procedure depends on the availability of accurate information. The Bureau has been developing the equipment with a contractor and plans to have a prototype ready for testing in the Census 2000 Dress Rehearsal in 1998. At this time, however, it is not clear that this test will be sufficient for determining whether the equipment can successfully handle the volume of forms that will be processed in 2000. If the equipment cannot or does not work out as expected, the Bureau proposes using a keypunching operation, which would be considerably slower than the scanning alternative and could cost more to complete the task. According to the Bureau, the Census 2000 Dress Rehearsal is to provide a census-like environment to demonstrate simultaneously those procedures that the Bureau plans to use in the 2000 Census. A meaningful dress
First, the Bureau plans to implement several complex new design features in 2000, including the use of technologically sophisticated equipment that has not been used in previous censuses. Second, Congress and the Bureau have yet to reach agreement on the 2000 Census design, and uncertainty over the operational feasibility of the Bureau’s design was one of Congress’ major concerns. Third, a key feature of the Bureau’s current plan is to produce a one-number census by using sampling and statistical estimation. Thus far, the Bureau has not fully tested its proposed design, and the Census 2000 Dress Rehearsal is the last opportunity for such full-scale testing. While the Bureau provided us with projected results from simulations of its design, the Dress Rehearsal would provide an opportunity to determine whether the Bureau’s refined plan would actually produce similar results when implemented in the field. Implementing the 2000 Census without adequate testing creates the risk of a census with unsatisfactory results. Similarly, if the Bureau implements its current plan in 2000 and Congress were to decide after the Census 2000 Dress Rehearsal that it did not want the Bureau to make an adjustment to its initial count, substantial funds could be wasted, and the census results could be questionable. The Bureau has time before the 2000 Census is to begin to resolve open issues and test much of what it actually plans to implement in 2000. However, the available time is diminishing, and the Bureau has not yet completed detailed planning for all of its design features or its dress rehearsal. The Census 2000 Dress Rehearsal appears to be the last real chance the Bureau will have for a large-scale operational test of its overall design. Thus, a well planned and executed dress rehearsal should provide the Bureau with a good opportunity to demonstrate to Congress and others whether or not it can successfully implement its current plan and produce acceptable results. 
The need for this demonstration is particularly important considering the problems the Bureau has experienced to date and the controversy that surrounds its design.

Conclusions

The Bureau has made considerable progress in preparing for the 2000 Census since our October 1995 testimony. The Bureau has, however, revised some of its initial plans and encountered problems involving some aspects of its proposed design. For example, it has changed its plan for sending replacement questionnaires to just those housing units that did not return the original questionnaire. Instead, it now plans to send replacement questionnaires to all housing units. Most importantly, it has run into major opposition from Congress on its plans to use sampling for nonresponse follow-up and ICM, and consequently it is still not certain how much funding will be made available. This situation creates a high risk to the nation of a census involving wasted expenditures and unsatisfactory results. Two of the major reasons for congressional concern are the lack of sufficiently detailed data, particularly on the effects on accuracy and equity of the Bureau’s proposals for sampling for nonresponse and ICM, and the uncertainty surrounding the operational feasibility of key aspects of the Bureau’s current census design. We recognize that the Bureau has faced difficulties as it has tried to address the concerns of all its stakeholders—who at times have had conflicting views—and as some aspects of its plan have not worked out as well as expected during testing. Although the Bureau faces a risk if it provides draft data while research is under way, we believe that Congress and other stakeholders would benefit from having the best data available at the time Bureau proposals are made, along with the appropriate qualifications.
Full and open disclosure is the only possible antidote to suspicions that the Bureau is failing to fully inform its legitimate stakeholders. Since one of the purposes of testing is to determine the operational feasibility of plans, it should not be surprising that problems arise. No design is without flaws and trade-offs. The Bureau has said that the alternative to its proposed design is to return to past census methods and incorporate changes, such as initiatives to improve the response rate, but not sampling for nonresponse follow-up or ICM. In this regard, data on the costs and effects of that alternative would be helpful in considering whether the Bureau’s proposal should be approved. However, the Bureau has not provided Congress with these data. By not providing sufficient data on the likely effects of the Bureau’s initiatives for addressing the key goals for the census—reduced costs and improved accuracy and equity—the Bureau may fail to convincingly demonstrate the value of its plans and, in turn, may contribute to congressional skepticism about census design and the necessary funding level. Through the enactment of Public Law 105-18, Commerce is now required to supply data by July 12th on the likely effects of the census design, which should contribute to a more informed debate. However, as new data become available after July 12th, they also should be shared with Congress and other stakeholders.

The Census 2000 Dress Rehearsal offers a final opportunity for the Bureau to demonstrate the operational feasibility of its current plan, which proposes many new design initiatives for the 2000 Census. If in the dress rehearsal the Bureau does not demonstrate that all of the key initiatives of the design that are to be used in 2000 can successfully be implemented to produce acceptable results, it risks a census with higher rates of error than in 1990 as well as higher costs.
With less than 3 years remaining until the census is to take place, the Bureau and Congress are not yet in agreement on some basic census design issues and the overall funding level. Although we believe there is still sufficient time for agreement to be reached and for the Bureau to prepare for a successful census, little margin for missteps, indecision, or miscommunication remains.

Recommendations to the Director, Bureau of the Census

We recommend that the Director of the Bureau of the Census

• provide Congress and other stakeholders with detailed data, which are updated as necessary to meet the objective of full and open disclosure, on the expected effects of the Bureau’s census design proposals on costs and on accuracy and equity at various geographic levels, particularly as they relate to sampling for nonresponse and ICM as well as on a design that would not involve sampling nonrespondents and ICM;
• work with Department of Commerce and OMB officials in reaching agreement with Congress on the design and funding level as quickly as possible, so that the Census 2000 Dress Rehearsal can be used to demonstrate all key design features planned for the 2000 Census; and
• conduct the Census 2000 Dress Rehearsal to mirror as closely as possible the design features planned for the 2000 Census, including paid advertising, to test the operational feasibility of the design and to determine whether the outcomes achieved in the dress rehearsal are similar to those of the Bureau’s research and simulations, and provide these results to Congress in sufficient time to enable it to affect, if it so chooses, the final design for the 2000 Census.

Agency Comments and Our Evaluation

On April 2, 1997, we requested comments on a draft of this report from the Secretary of Commerce. On April 23, 1997, the Director of the Bureau of the Census responded that she agreed with our recommendations.
She said that the Bureau had begun an intensive effort to improve communications with Congress and demonstrate responsiveness by providing the information Congress needs to assess the value of the Bureau’s plans for Census 2000. She also said that the Bureau expects these improved communications to lead to agreement on the plan for conducting the Census 2000 Dress Rehearsal as a means to demonstrate the robustness of the methods proposed for use in Census 2000. The Director also stated that, in reviewing our draft report, the Bureau was alerted to several inconsistencies in the data it had provided us and on which we relied in making our analysis. She said the Bureau would regenerate the data. These data were provided to us on June 16, 1997, and are included as appendix II. Our report has been modified to reflect the new data where appropriate. The revised data did not cause us to change our basic analysis and conclusions, but they did reinforce the need, as expressed in our recommendations, for the Bureau to expose the data relating to the effects of its plan to broad scrutiny by Congress and other stakeholders. In view of the reporting requirements of Public Law 105-18, which was enacted after the Director commented on our draft report, we modified our recommendations slightly. While we continue to believe the Bureau should provide the details of its plans to Congress, we also believe updated data should be provided as they become available, beyond the reporting date established in Public Law 105-18. We are sending copies of this report to the Chairman, Senate Committee on Governmental Affairs; Chairman and Ranking Minority Member, House Committee on Government Reform and Oversight; Director, OMB; Secretary of Commerce; Director, Bureau of the Census; and other interested parties. Copies will be made available to others on request.
Page 39 GAO/GGD-97-142 2000 Census Design B-276531 Please contact me on (202) 512-8676 or James H. Burow, Assistant Director, on (202) 512-3941 if you or your staff have any questions. Major contributors to this report are listed in appendix IV. Sincerely yours, L. Nye Stevens Director Federal Management and Workforce Issues Page 40 GAO/GGD-97-142 2000 Census Design Page 41 GAO/GGD-97-142 2000 Census Design Contents Letter 1 Appendix I 44 Sampling for Nonresponse Follow-Up 44 The 2000 Decennial Integrated Coverage Measurement 54 Census With Combined Effects of Sampling for Nonresponse and ICM 65 Statistical Sampling and Estimation: An Overview of Operational and Technical Issues Appendix II 77 Bureau of the Census Summary Information on Design Alternatives for the 2000 Census Appendix III 87 Comments From the Bureau of the Census Appendix IV 88 Major Contributors to This Report Related GAO Products 91 Tables Table 1: Comparison of Estimated 2000 Census Costs for Selected 23 Design Alternatives Table I.1: Distribution of Census Tracts by Relative Error Level 69 Using Alternative Census Designs Page 42 GAO/GGD-97-142 2000 Census Design Contents Table II.1: Bureau of the Census Summary Data on Projected 78 Costs and Accuracy of Selected Census 2000 Alternative Methodologies Figures Figure 1: The Net Undercount Since 1940 7 Figure 2: Simulations Indicate Bureau’s Refined Design Could 25 Produce Lower Relative Error Rates Than a No Sampling Design in 2000 Figure I.1: How Different Sampling Options Would Work in a 50 Hypothetical Tract Figure I.2: Distribution of Census Tracts by Error Level Shows 71 Trade-Off Between Direct Sampling and No Sampling Designs Abbreviations CAPI Computer-assisted personal interviewing CV Coefficient of variation DSE Dual system estimation ICM Integrated coverage measurement OMB Office of Management and Budget PES Post enumeration survey SE Standard error Page 43 GAO/GGD-97-142 2000 Census Design Appendix I The 2000 Decennial Census With 
Statistical Sampling and Estimation: An Overview of Operational and Technical Issues

As part of our ongoing oversight of decennial census activities, and in order to assist Congress in assessing the proposed design for the 2000 Census, we have been reviewing the Bureau’s plans and efforts to incorporate sampling for nonresponse follow-up and integrated coverage measurement (ICM) procedures in the next census. In this appendix, we provide additional background and technical information on the results of the Bureau’s research on sampling and estimation methods, such as the expected advantages, disadvantages, costs, and benefits of different design options the Bureau has considered for the 2000 Census.

The appendix is organized in three main sections. The first section presents information on sampling for nonresponse follow-up, the second on ICM, and the third on the combined effects of these proposed census procedures. Appendix II presents the Bureau’s summary data on the projected costs and accuracy of its current plan and alternative designs for the 2000 Census.

Sampling for Nonresponse Follow-Up

Declining rates of public response to the census have generated a costly, time-consuming nonresponse follow-up workload for the Bureau. The Bureau is planning for many efforts in the 2000 Census to encourage higher response rates and reduce its reliance on expensive enumerator visits to every housing unit for which a census questionnaire is not returned by mail. However, the nonresponse workload is still likely to be substantial. Therefore, after making what it believes are reasonable efforts to directly contact every housing unit, the Bureau plans to sample a portion of the remaining nonresponse units rather than continuing its efforts to complete follow-up interviews for all of the nonresponse units as it has in the past.

The primary purposes of using sampling for nonresponse follow-up are to save money and time.
The Bureau estimates that its current plan for sampling during nonresponse follow-up operations could save about $400 million in the 2000 Census off the cost of a census design incorporating all other proposed improvements on the 1990 Census model except sampling for nonresponse. The use of sampling also should enable the Bureau to complete follow-up operations more quickly than in past censuses, which is essential if it hopes to complete ICM operations and produce the census data by its legal deadlines.

The Bureau focused its research in this area on identifying and refining the most promising options for sampling a portion of nonrespondents after the initial mail return phase of the census is completed. On the basis of its initial work on this subject, the Bureau decided in February 1996 that its plan for nonresponse follow-up in the 2000 Census should ensure that it completes census questionnaires for at least 90 percent of all housing units in each county before using the information from sample units to account for the remaining nonrespondents. Public feedback and concerns about the potential fairness and accuracy of this initial proposal persuaded the Bureau to revise its plans in September 1996 to track response rates and control nonresponse sampling at the level of census tracts, which are generally smaller and more homogeneous than counties.
Once the Bureau decided to control nonresponse sampling at the census tract level, it began to study three options for how to implement this basic design: (1) truncating conventional follow-up interviews after reaching the 90-percent completion threshold, then sampling the remainder; (2) truncating conventional follow-up after a specific period of time, then sampling the remainder; and (3) sampling nonrespondents directly after the end of the mail return phase of the census. All three options are designed to collect information directly for at least 90 percent of the housing units in each tract, through a combination of mail returns, follow-up interviews, and other means, before relying on sample data to estimate the population counts and characteristics of the remaining nonrespondents. The Bureau’s research indicated that implementing sampling directly after the mail return phase would have more advantages than the other options with regard to cost, accuracy, and operational feasibility. Therefore, in March 1997, the Bureau selected the direct sampling option for nonresponse follow-up in the 2000 Census.

Census Nonresponse Generates Substantial Problems for Bureau

For the 2000 Census, the Bureau intends to rely on mail returns to collect census data from most housing units in the country, as it has in every census since 1970. Unfortunately, census mail response rates have been falling since the Bureau first implemented this mail-back approach. If past trends continue, the Bureau believes the mail response rate could decline from the 65-percent rate achieved in the 1990 Census to 55 percent in the 2000 Census, leaving over 50 million nonresponding units to account for.

The declining rate of public cooperation with the decennial census generated a substantial problem for the Bureau in 1990.
When questionnaires were not returned for all housing units provided a census form, the Bureau sent temporary enumerators out into the field in an attempt to get data on each nonresponding unit. This was an extremely costly, time-consuming, and sometimes error-prone operation. In the 1990 Census, nonresponse follow-up operations required a minimum of $560 million to carry out and continued for 14 weeks instead of the planned 6 weeks. The delay in completing nonresponse operations can be attributed in large part to the effort needed to contact the relatively small portion of units that proved most difficult to resolve. This problem was widespread. For half of the census tracts in the country, once 90 percent of returns were in, the Bureau still needed 65 days or more to finish collecting data on the last 10 percent of housing units. The Bureau’s evaluations of the 1990 Census demonstrated that the quality of these enumerations declined substantially as data collection efforts continued over time (e.g., with regard to persons counted more than once, counted in the wrong place, or missed entirely).

The Bureau plans a number of efforts to encourage people to voluntarily respond to the 2000 Census. These efforts include enhancements such as more user-friendly census forms, an improved marketing plan, providing multiple ways for people to respond to the census, and implementing a strategy of multiple mail contacts with each address. The Bureau estimates the combined effects of these efforts will result in an increase of about 12 percentage points in baseline response rates, projecting to an overall mail response rate of 66.9 percent in 2000.
However, even if these new efforts work as well as planned, the Bureau would still face a substantial nonresponse workload in the next census of nearly 40 million housing units. This would exceed the entire nonresponse workload of the 1990 Census by approximately 5 million housing units.

Handling a nonresponse follow-up workload of this size is likely to pose a serious challenge to the Bureau in the 2000 Census. Mail response rates can vary dramatically among different areas of the country, generating very large nonresponse workloads in some places. In addition, there is ample evidence that nonresponse follow-up is becoming more expensive, not only because the number of nonrespondents is growing, but also because (1) residents of nonresponding housing units are becoming more difficult to find and interview and (2) the Bureau is finding it harder to recruit and afford enough qualified temporary workers to complete the task. Those problems, in turn, can contribute to escalating labor costs and declining productivity during the census.

Bureau officials believe that these problems, especially the workforce difficulties they anticipate in the 2000 Census, suggest that they could not use the 1990 Census design again even if they wanted to. They intend to implement a number of other efforts to reduce the reliance on enumerator visits to each nonresponding housing unit (e.g., attempting to cover more of these units through telephone interviews). Altogether, the variety of activities planned for the next census should result in multiple efforts to directly contact every housing unit, as well as multiple opportunities for people to respond to the census.
However, Bureau officials still believe that sampling a portion of nonrespondents will be necessary to control the cost of the census and enable them to complete follow-up operations in a timely manner.

Bureau Continues to Refine Alternatives for Sampling Nonrespondents

The primary purposes of using sampling for nonresponse follow-up are to save money and time. After preliminary research efforts to identify promising designs for sampling nonrespondents, Bureau management announced in February 1996 selection of a 90-percent truncation design for the Census 2000 plan. Under this design, the Bureau would implement a conventional nonresponse follow-up operation at the end of the mail return phase of the census that would continue until enumerators were able to obtain information for at least 90 percent of the housing units in each county or county-equivalent area (such as parishes in Louisiana). The Bureau would then truncate (curtail) conventional follow-up operations and select a 1-in-10 sample of the remaining nonresponding units. It would use the information obtained from interviewing these sample units to provide census data for the sample units themselves and also to estimate population counts and characteristics for the remaining nonresponse units.

In September 1996, the Bureau revised its planned approach. The new design would control nonresponse sampling at the level of each census tract (a small, relatively permanent statistical subdivision of a county) rather than each county.1 This plan maintained the overall goal of collecting data on at least 90 percent of the housing units in each area before relying on sampling to estimate data for the remaining nonrespondents. The Bureau made the change to tract-level truncation in response to public feedback and reservations about the potential effects of county-level truncation.
Stakeholders from minority communities and others, including a National Academy of Sciences panel, pointed out that areas within counties may often differ in terms of how easy they are to enumerate, which could affect the implementation of the Bureau’s plan. If the Bureau controlled truncation at the county level, they noted the likelihood that the threshold would be achieved disproportionately by mail responses and direct follow-up interviews in the areas where enumeration was easiest. Sampling might then be relied upon for much more than just the last 10 percent of households within the harder-to-enumerate areas.

1 Census tracts usually have between 2,500 and 8,000 persons (averaging about 4,000) and, when first delineated, are designed to be homogeneous with respect to population characteristics, economic status, and living conditions. Census tracts do not cross county boundaries. The 1990 Census included over 60,000 tracts and tract-equivalent areas.

The Bureau’s shift to basing its plan on tract-level response rates rather than county-level rates appears to improve the equity of the sampling plan but also presents new challenges. On balance, the change to tract-level truncation may provide more equitable and consistent implementation of sampling for nonresponse follow-up than the original design. This is primarily because the socioeconomic characteristics of households at the tract level are more homogeneous than is the case at the county level. Therefore, sampling controlled at the tract level is less likely to present a risk of uneven implementation (i.e., ignoring hard-to-enumerate areas until the very end of the census).
However, shifting to tract-level truncation also complicates the task of managing and monitoring the nonresponse follow-up phase, especially given that progress will need to be tracked and controlled for around 60,000 census tracts nationwide, compared to about 3,000 counties under the Bureau’s original proposal. It may also increase the difficulty of reaching the goal of a 90-percent completion rate in every tract, at any rate, within a reasonable time frame. When the Bureau decided that its basic design for nonresponse follow-up operations in the 2000 Census should ensure that it obtains responses from at least 90 percent of all housing units in each census tract before relying on the information from sample units to account for the remaining nonrespondents, it began considering alternative designs to achieve this goal. The Bureau’s research focused on three design options: (1) truncating conventional follow-up at 90-percent completion, then sampling the remaining nonrespondents; (2) truncating conventional follow-up after a specific period of time, then sampling the remaining nonresponse units; and (3) implementing sampling of nonresponding units directly after the mail return phase ends. The first option was the Bureau’s original design from February 1996, except that it would track completion rates for each tract rather than each county. Under the time-truncation option, conventional follow-up interviews would continue for a predetermined length of time, such as 3 weeks. After this initial follow-up period, the Bureau would select a sample of the remaining nonrespondents in each tract that included enough units to raise the completion rate to at least 90 percent. Under the direct-sampling option, there would be no conventional follow-up phase. 
Instead, at the end of the mail return phase the Bureau would select a sample of the nonresponding housing units in each tract that would be sufficient to achieve at least a 90-percent completion rate. For example, in a tract with a mail response rate of 70 percent, the Bureau would select two out of every three of the remaining units for the follow-up sample. Under each of the three options, for any tract with an initial response rate above 90 percent the Bureau would follow up on a 1-in-10 sample of the remaining addresses.

Figure I.1 illustrates how each of these options might work in a hypothetical census tract where the Bureau is able to obtain a mail response rate of 60 percent. In this simplified example, we show how the Bureau would determine the census data for housing units in the tract as it works toward resolving 100 percent of the units.

Figure I.1: How Different Sampling Options Would Work in a Hypothetical Tract

[Figure: stacked horizontal bars, one per design option (90% truncation, time truncation, direct sampling). Horizontal axis: percentage of housing units, 0 to 100. Bar segments: counted by mail return; conventional nonresponse follow-up interviews; sample interviews; estimated using sample data.]

Note 1: For all options, we assume a mail response rate for housing units in the tract of 60 percent. The base for this calculation would be all housing units provided a census form and asked to return it by mail.

Note 2: For the time-truncation option, we assume that the Bureau is able to obtain responses from half of the nonrespondents during a limited period of conventional follow-up interviews.
Note 3: Some of the units that make up the nonresponse workload will be vacant or nonresidential units. The Bureau expects that the Postal Service will identify a portion of the vacant units in the 2000 Census (although the Bureau will recheck a sample of these). To simplify our presentation, we have not included any estimates for this component.

Source: Example created by GAO for illustration purposes only.

Under the 90-percent truncation option, and assuming that the tract had a mail response rate of 60 percent, the Bureau would implement a conventional nonresponse follow-up operation until enumerators were able to get responses for another 30 percent of the housing units in the tract (bringing the total completion rate for the tract up to 90 percent). At that point, the Bureau would select a random sample at the rate of 1 in 10 of the nonresponding units that still remained unaccounted for. The information gathered from that 1 percent of the tract’s housing units would provide the census data for the sample units and the remaining 9 percent of the tract’s units.

Using the time-truncation option, there would be a limited period of time, such as 3 weeks or until a prespecified date, during which census enumerators would attempt follow-up visits to all nonresponding units. For our hypothetical example, we assume that enumerators are able to complete interviews for half of the nonresponding units during this initial follow-up phase, taking the overall completion rate to 80 percent. The Bureau would then need to select and interview a sample of one of every two of the remaining nonrespondents to achieve a completion rate of 90 percent.
The information gathered from that 10 percent of the tract’s housing units would provide the census data for the sample units and for the 10 percent of nonresponding units that remain.

With the direct sampling option in this scenario, the Bureau would select a random sample at the rate of three of every four of the nonresponding units in the tract at the end of the mail return phase. A sample of this size would be enough to get the overall completion rate to 90 percent. The information gathered from the 30 percent of the tract’s housing units in the sample would provide the census data for the sample units themselves and for the remaining 10 percent of units not selected for the sample.

Research Identified Potential Advantages and Disadvantages of Alternative Designs

The Bureau engaged in a variety of research efforts to study the operations and potential outcomes of its planned approach for conducting nonresponse follow-up in 2000, as well as other alternatives. The Bureau conducted field tests and evaluations of various elements of nonresponse operations during the 1995 Census Test. The Bureau also carried out computer simulations, using 1990 data, to examine the results produced by various alternative methods for nonresponse sampling. The Bureau used these simulations to help identify the sampling options that were most promising.

In general, the Bureau’s research confirmed the potential for sampling to produce less costly, more timely results from nonresponse follow-up.
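The sampling rates quoted for the hypothetical tract in figure I.1 follow from simple arithmetic, which the short Python sketch below reproduces. It is an illustration only, not the Bureau's procedure: the names `TARGET`, `FLOOR_RATE`, `sampling_rate`, and `breakdown` are hypothetical, and the treatment of a tract sitting exactly at 90 percent is an assumption, since the text specifies the 1-in-10 rate only for tracts above the threshold and for the 90-percent truncation option.

```python
from fractions import Fraction

TARGET = Fraction(9, 10)      # 90-percent completion goal described in the text
FLOOR_RATE = Fraction(1, 10)  # 1-in-10 sample once a tract has reached the goal


def sampling_rate(completed):
    """Fraction of still-unresolved units to sample so the tract reaches TARGET."""
    if completed >= TARGET:
        # Assumption: a tract exactly at 90 percent is treated like one above it.
        return FLOOR_RATE
    return (TARGET - completed) / (1 - completed)


def breakdown(mail, followup):
    """Split a tract into the four shares shown in figure I.1."""
    completed = mail + followup
    rate = sampling_rate(completed)
    sampled = rate * (1 - completed)
    return {
        "mail": mail,
        "followup": followup,
        "sample_rate": rate,
        "sampled": sampled,
        "estimated": 1 - completed - sampled,
    }


mail = Fraction(60, 100)  # mail response rate assumed in the figure's notes
options = {
    "90% truncation": breakdown(mail, TARGET - mail),     # follow up until 90 percent
    "time truncation": breakdown(mail, (1 - mail) / 2),   # half of nonrespondents reached
    "direct sampling": breakdown(mail, Fraction(0)),      # no conventional follow-up
}
for name, shares in options.items():
    print(name, {key: str(value) for key, value in shares.items()})
```

With a 60-percent mail response, the sketch gives sampling rates of 1 in 10, 1 in 2, and 3 in 4 for the three options, and `sampling_rate(Fraction(7, 10))` gives the two-out-of-three rate cited for a tract with a 70-percent mail response.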
The Bureau estimated that its current plan for the 2000 Census, using the direct sampling option for handling nonresponse follow-up, could save approximately $400 million, when compared to the cost of a census design incorporating all other planned improvements of the 1990 Census design (including ICM) except sampling for nonresponse follow-up. Among the major factors driving the cost differences are the number of nonresponding housing units that the Bureau would need to visit under each design and the peak staffing levels that would be required in the local census offices to carry out the follow-up interviews.

Completing nonresponse follow-up operations in a timely manner is important if the Bureau is to limit the deterioration in the quality of the data collected that occurs as nonresponse operations drag out over time. It is also crucial to the success of any coverage measurement survey, such as the ICM proposed for the 2000 Census, because the Bureau must provide the final census data for reapportionment and redistricting purposes by legislatively mandated deadlines. Title 13 of the U.S. Code mandates that the state population totals required for reapportionment of the House of Representatives be provided within 9 months after Census Day (April 1) and that local area data needed for redistricting be provided within one year after the decennial census date. The legal deadlines the Bureau must meet are therefore December 31 of the census year for reapportionment data and March 31 of the following year for redistricting data.

The 1995 Census Test demonstrated the potential to complete nonresponse follow-up operations in a more timely manner by using sampling.
In that test, the Bureau implemented a version of direct sampling for nonresponse follow-up immediately after the completion of mail return data collection. The overall sampling rates used by the Bureau in this test were two-sevenths of the housing units that did not respond by mail in the Oakland, California, test site and one-sixth of the nonresponding housing units for the test sites in Paterson, New Jersey, and six parishes in Northwest Louisiana. According to Bureau officials, this approach enabled them to complete nonresponse follow-up operations on time and within budget for the first time in any census or test. However, this direct sampling approach did not require the Bureau to achieve at least a 90-percent completion rate before relying on sample data to account for the remaining nonrespondents. Therefore, while the 1995 Census Test results were encouraging, they did not resolve the question of whether, using its selected option, the Bureau can complete nonresponse follow-up in every tract on or close to schedule in the 2000 Census.

The Bureau’s initial efforts to study alternative designs for sampling nonrespondents helped it reach a decision on the extent to which it would use sampling in the 2000 Census. On the basis of that preliminary work, the Bureau proposed using a 90-percent truncation design for nonresponse follow-up in the plan for the 2000 Census. Bureau management selected truncation at 90 percent as the preferred design primarily because of concerns about whether the public would understand and accept using sampling to a greater extent. By truncating conventional follow-up only after completing responses for 90 percent of housing units, then sampling the remaining units, the Bureau would obtain direct responses for at least 91 percent of all units.
Under other alternatives for nonresponse sampling that the Bureau initially considered, such as truncating at 70-percent completion then sampling, it would have obtained direct responses for less than 80 percent of all housing units. After the Bureau revised its plan in September 1996 so that it would control nonresponse follow-up at the tract level, subsequent research focused on identifying the most promising design to achieve the 90-percent completion goal in each census tract. The Bureau identified advantages and disadvantages to each of the three design options it considered—truncation at 90 percent, time truncation, and direct sampling. On balance, the Bureau’s research indicated that implementing sampling directly after the mail return phase would have more advantages than the other options. Therefore, in March 1997, the Bureau selected the direct sampling option for nonresponse follow-up in the 2000 Census. The Bureau’s research suggested that a design using the direct sampling option for nonresponse follow-up would be better than other designs in terms of cost, accuracy, and operational feasibility. Bureau officials estimated the cost of implementing direct sampling to be between $200 million and $600 million less than the cost of the other two options they considered. In simulations of the accuracy of different options, direct sampling produced slightly better results, particularly for small geographic areas, such as census tracts. (We discuss the expected accuracy of alternative census designs in more detail in the last section of this appendix, and additional summary information from the Bureau appears in app. II.) The Bureau’s regional directors and field staff preferred this option because it is simpler to implement, entails only one operation with a single workload that is established at the start of the follow-up phase, and provides more time to complete interviews for the designated sample units. 
The direct sampling design, however, differs most from the design the Bureau originally announced as its plan for the next census, as well as from the procedures used in past censuses. In addition, once the sample is selected, nonrespondents in housing units not selected as part of the sample would not have another chance to be interviewed by census enumerators. Therefore, Bureau officials believe this option could have some public-perception problems.

According to the Bureau, the time-truncation option may have an advantage with regard to public perception because, under that design, enumerators would make follow-up visits to all nonresponding housing units in an intensive effort before sampling begins. A time truncation design should also take less time to finish than the 90-percent truncation design because conventional follow-up interviews would automatically end on schedule, rather than continue until the Bureau was able to resolve 90 percent of each tract’s housing units. However, because it involves more than one follow-up operation and workload, the time-truncation design is likely to take more time than direct sampling. The intensive initial follow-up effort also makes time truncation more expensive than the other two options for sampling nonrespondents. The Bureau would need a large number of temporary enumerators to attempt to contact every nonresponding unit within the short period of conventional follow-up interviews.

Public perception was also the primary advantage identified by the Bureau for the 90-percent truncation design. Not only would it most closely resemble past census procedures, but it is also the design option that the Bureau has been discussing publicly since it announced its original plan for the 2000 Census in February 1996.
However, the Bureau’s cost-model projections indicated that truncating at 90-percent completion would be more costly than the direct sampling option, and that this option also had the highest average error rate for small neighborhoods, like tracts, in Bureau simulations. Also, the Bureau’s regional directors believed that accomplishing this design in the allotted time for nonresponse follow-up operations would be very difficult. Truncating at 90-percent completion, like the time-truncation option, separates nonresponse follow-up into two operations. The Bureau is concerned that the break between conventional interviews and sample interviews makes this design more complex and could result in higher staff turnover.

Integrated Coverage Measurement

The purpose of ICM is to improve the accuracy of census data, in particular by reducing the differential undercounts of minorities and other hard-to-enumerate population groups and areas that have been documented for previous censuses. Evaluations of past censuses have shown a persistent net undercount in the final census population total and, for subnational data, net undercounts that differed across population groups and geographic areas. Evaluations also indicated that simply adding more conventional counting operations was not effective in eliminating or reducing these undercounts in the 1990 Census. Therefore, the Bureau concluded that the 2000 Census design should incorporate the results of a coverage measurement survey conducted immediately following basic data collection as an integral and necessary step toward completing the census.

ICM would be the last phase in producing the final census numbers, following mail returns and other basic data collection efforts, nonresponse follow-up data from enumerator interviews, and nonresponse sampling results.
During this last phase, the Bureau would conduct a large sample survey to check the accuracy of all earlier Census 2000 data collection efforts. Bureau enumerators would compare and reconcile the results from the ICM survey and the earlier census efforts for each housing unit in the ICM sample to determine the extent to which people and housing units were correctly counted, missed, or included in error in the previous phases. Using statistical estimation methods, the Bureau would then use this information to estimate and correct for errors in the census data for the entire country.

The Bureau designed the proposed 2000 ICM to address several major weaknesses of the 1990 Post Enumeration Survey (PES), especially regarding timeliness and the accuracy of population estimates for subnational areas.2 Because of these design changes, the 2000 Census ICM would entail a substantial investment in resources, compared with the 1990 PES. It would also represent a dramatic shift in the integral census operations because, rather than producing an alternative adjusted set of census data, the results of ICM would automatically be incorporated into one official set of census data. While results from research and testing to date indicate that ICM has the potential to improve the accuracy of census data, they also show that further operational and methodological testing and development are needed before 2000.

2 The 1990 PES was designed to estimate the net undercount in the census. It was a matching study in which the Bureau interviewed a sample of households several months after the census. The results of these interviews were compared with census questionnaires to determine whether each person was correctly counted in the census, missed, or included in error. The results could have been used to adjust the 1990 Census to correct for coverage errors, if the Secretary of Commerce had so decided.
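To make the idea of using a matched coverage survey to estimate and correct for census errors more concrete, the sketch below shows the textbook dual-system (capture-recapture) estimator in Python. This is a generic simplification offered for illustration; the report does not specify the Bureau's estimation formulas, and the function name, argument names, and all counts are hypothetical.

```python
def dual_system_estimate(census_correct, survey_total, matched):
    """Textbook dual-system estimate of the true population of an area.

    census_correct: persons correctly counted by the census (errors removed)
    survey_total:   persons found by the independent coverage survey
    matched:        persons found by both the census and the survey
    """
    if matched <= 0:
        raise ValueError("at least one matched person is needed")
    # The share of survey persons also found in the census estimates census
    # coverage; dividing the census count by that share scales it up.
    return census_correct * survey_total / matched


# Hypothetical counts for one small area (not taken from the report).
census_correct = 950
survey_total = 900
matched = 855

estimate = dual_system_estimate(census_correct, survey_total, matched)
net_undercount_rate = 1 - census_correct / estimate
print(estimate, round(net_undercount_rate, 3))
```

In this made-up area the survey finds that 95 percent of its respondents were also in the census, so the census count of 950 is scaled up to an estimated 1,000 persons, implying a net undercount of 5 percent.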
Persistent Coverage Errors Affect Census Accuracy and Equity

One of the most fundamental criticisms of the census, and much of the impetus behind efforts to redesign the census-taking approach, is that it fails to count every area and population group equally well. Undercounts that are not equally distributed among geographical areas and population groups can create inequities in political representation and the distribution of public funds. To be equitable, it is not enough for a census to be generally accurate in a strict numeric sense (i.e., that the total count be close to the total U.S. population). For many uses of census data, including reapportionment of congressional seats, legislative redistricting, and some funds distribution, proportions matter more than the raw totals.3

Evaluations of past censuses have revealed persistent coverage errors, such as net undercounts in the total population figures and differential net undercounts in census data by race and geographic area. For the 1990 Census, the reported net undercount was about 1.8 percent of the population (4.7 million persons), according to independent demographic analysis.4 However, that does not mean that over 98 percent of U.S. residents were actually counted, as is often reported, since the number of persons missed by the census was partially offset in the net count by millions of persons who were double counted or improperly included. The PES indicated that about 6 million persons were counted twice in the 1990 Census, while 10 million were missed.5 The 4.4 percent difference between the net undercount for blacks (5.7 percent) and nonblacks (1.3 percent) in 1990 was the highest differential undercount measured by independent demographic analysis since 1940.
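The distinction drawn here between net and gross coverage error, and the differential undercount, can be checked with a few lines of Python. The sketch uses only the 1990 figures quoted in the text; the function names are hypothetical. Note that the PES-based net figure it produces need not equal the 4.7 million reported from demographic analysis, because the two come from separate estimation methods.

```python
from decimal import Decimal


def net_undercount(missed, counted_in_error):
    """Omissions left over after erroneous inclusions offset them."""
    return missed - counted_in_error


def gross_error(missed, counted_in_error):
    """Total miscounted persons, with no offsetting."""
    return missed + counted_in_error


def differential_undercount(rate_group, rate_other):
    """Gap, in percentage points, between two groups' net undercount rates."""
    return rate_group - rate_other


# 1990 PES figures cited in the text, in millions of persons.
missed = 10
double_counted = 6

print(net_undercount(missed, double_counted))   # net, in millions
print(gross_error(missed, double_counted))      # gross, in millions

# Net undercount rates from demographic analysis, in percent.
print(differential_undercount(Decimal("5.7"), Decimal("1.3")))
```

The net figure (4 million) is far smaller than the gross figure (16 million), which is why a small net undercount does not show that nearly everyone was counted correctly.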
3 Distributional accuracy, however, is not the only criterion for the quality of census data. For example, some applications use specific population thresholds, such as those defining eligibility for becoming a metropolitan area.

4 Demographic analysis provides an estimate of the population derived largely from administrative data such as birth and death records. It is important because it provides an independent estimate and a consistent historical series of data from 1940 to the present. However, its ability to produce reliable estimates below the national level, or for components of the population other than black and nonblack, is very limited.

5 In addition, evaluations revealed that about 2.4 percent of the enumerated count in the 1990 Census represented persons who should not have been counted at all (such as those who died before or were born after census day, April 1) and those who should have been counted at another address. While some of these errors would not affect the numeric total (e.g., as in the case of a person who was missed at the address where he or she should have been counted, but was included in error at another location), these types of errors could affect the distribution and accuracy of population counts reported for small areas, such as blocks.

The Bureau’s evaluations demonstrated that the 1990 Census missed entire housing units (and their occupants), missed people within housing units that it did count, and included other persons in the population counts in error (e.g., by counting them more than once or in the wrong place). The detailed results from evaluations showed that young adult males, members of ethnic and racial minorities, and renters, among others, were more likely to be undercounted by the census than other residents.
The difficulty of obtaining a complete, accurate enumeration of the residents in major urban areas is well known, but evaluations have consistently shown high error rates for data on persons living in rural areas as well. Error rates were also demonstrably higher for persons counted on forms completed through enumerator follow-up rather than mailed in by the household.

The Bureau’s research revealed many potential sources of coverage errors. For example, at the very start of the 1990 Census, the address lists used to guide data collection were incomplete and had other problems that made it difficult for the Bureau to deliver questionnaires or attempt interviews with every housing unit. In addition, people moved during the census operations, which made it more likely that some persons would be missed, counted more than once, or counted in the wrong place. Also, respondents and enumerators sometimes had difficulty interpreting the residency rules that determine whether and where people should be counted for census purposes.

Just as sampling for nonresponse follow-up is only one component of the Bureau’s strategy to handle the nonresponse problem, ICM is only one element in the Bureau’s plans to improve census accuracy and reduce the differential undercount in 2000. For example, improvements in address list development and expanded partnerships with other levels of government and community groups should identify some housing units that the Bureau might otherwise miss. To cite another example, the Bureau intends to use special targeted methods to improve the count of population groups and the count in geographic areas where the census has tended to miss a disproportionate share of the people. Altogether, the efforts planned for Census 2000 should result in the Bureau making multiple attempts to contact and count all residents and should provide multiple opportunities for people to respond to the census.
However, even if all other design components produce improvements in the conventional census counting operations, the evidence from past coverage evaluations indicates that errors in the census data will still occur. Also, the task of accurately counting all members of the population is becoming more difficult, in part due to a growing population, but also as a reflection of the difficulty of having census rules, definitions, and methods keep pace with changes in society. Bureau evaluations of the 1990 Census indicated that simply adding more conventional counting operations was not effective in eliminating or reducing differential undercounts of areas and population groups that have been hard to enumerate accurately. Therefore, the Bureau and expert panels of the National Academy of Sciences concluded that the 2000 Census design should incorporate the results of a coverage measurement survey conducted immediately following basic data collection as an integral and necessary step toward completing the next census.

Proposed ICM Operation Includes Major Changes From 1990 PES

The purpose of ICM is to improve the accuracy of census data (in particular, to reduce the differential undercount) rather than just evaluate the quality of the data. While the Bureau has evaluated the magnitude and characteristics of census errors and undercounts since 1950, the evaluation findings have not been used to correct for coverage errors in the decennial census tabulations. A statistical adjustment was considered after the 1990 Census, with the Bureau using the results of the 1990 PES to produce a second, adjusted set of census data.
However, upon review of the original census data and the adjusted census data, the Secretary of Commerce found the evidence in support of an adjustment to be inconclusive and unconvincing and decided that the 1990 Census counts should not be changed. The Bureau’s proposed design for the 2000 Census would therefore mark the first time that such a step would be an integral part of completing the census. The ICM results would be the last component in producing final census numbers, following mail returns and other basic data collection efforts, nonresponse follow-up data from enumerator interviews, and nonresponse sampling results. The proposed ICM operation is a coverage measurement survey that estimates the true population on Census Day based on interviewing housing units in a sample of blocks across the country. It would involve several main stages: (1) selecting a sample of blocks in advance of the 2000 Census that the Bureau would survey after all census data collection efforts have been completed in those blocks, (2) developing the best possible address list for ICM sample blocks, (3) completing interviews for every housing unit in the sample blocks to compile an independent list of Census Day residents and match this ICM list to the census list, and (4) using the results to estimate the true population on Census Day. In the first stage, the Bureau would select a sample of blocks from across the country. 
To ensure that this sample was sufficiently large and representative to accurately estimate coverage errors in the census for specific geographic areas and population groups, the Bureau would stratify blocks by characteristics such as the states they are located in, population size, racial and ethnic composition of the residents (e.g., blocks in which at least 50 percent of the residents in the 1990 Census were black), tenure (e.g., whether most housing units are owned or rented), whether or not the block is in an urbanized area, and other factors. Stratifying and weighting the sample blocks, rather than just drawing a simple random sample of blocks across the country, would enable the Bureau to include an adequate number of sample units to provide estimates of census coverage errors for specific population groups or areas, even if they constitute a relatively small part of the total population in the nation.

Once the ICM sample has been selected, the Bureau would compile an enhanced master address list for the sample blocks. This enhanced address list is intended to be the most complete listing possible of the housing units in those blocks, given that it would be produced by more intense and higher quality listing procedures than are possible for the census as a whole. The Bureau would create this list by combining, comparing, and reconciling the address list from the census with an independent list developed by Bureau enumerators before the ICM interviews begin.

In the third stage, Bureau enumerators would complete an interview at the housing units in the sample blocks after basic census data collection has been done.
Under its current plan, the Bureau intends to use computer-assisted personal interviewing (CAPI) technology: laptop computers preloaded with the ICM questionnaire and other data needed for the interview. The Bureau enumerators would attempt to complete an interview for each housing unit on the enhanced address list, thereby obtaining a roster of the people living at the unit on Census Day. Once this ICM roster was completed, the CAPI system would match all census and ICM responses possible and reveal the roster from any census questionnaire for that particular address. The Bureau enumerator would then attempt to reconcile any differences between the rosters and determine which persons should have been enumerated at the housing unit according to census residency rules.6

6 Because the ICM procedures estimate the quality of census data collected by comparing ICM and census results for the same housing units, the Bureau would use 100-percent follow-up of nonrespondents in all ICM sample blocks, instead of sampling a portion of the nonrespondents.

In the final stage, information from the ICM interviews would be used to estimate the extent to which housing units and people were correctly enumerated, missed, or counted in error for the entire census. The Bureau would estimate the correct population for entire geographic areas, as well as for specific poststrata as defined by characteristics such as age, sex, tenure, race and ethnic origin. In other words, while stratification of the original sample would be done on the basis of the characteristics of blocks, poststratification would be based on the characteristics of individuals.
For example, in the Oakland test site of the 1995 Census Test, one ICM poststratum for which the Bureau produced an estimate was Asian and Pacific Islander females between the ages of 18 and 29 who lived in nonowner (rental) housing units. The resulting estimates for all poststrata would be incorporated through statistical procedures into the final census data. To address concerns raised about the PES during the adjustment decision after the 1990 Census, the Bureau plans several major design changes for its coverage evaluation survey in the 2000 Census. Perhaps most significantly, the Bureau intends to make ICM an integral part of the census process in 2000, thus producing one official set of numbers that reflect the Bureau’s best estimate, rather than the original and adjusted census data sets that were produced after the 1990 Census. Thus, unlike the process followed in the 1990 Census adjustment decision, this “one-number census” design would not include a step where two sets of numbers are evaluated to determine which set is more accurate: the data incorporating the results of the ICM survey would automatically become the official census data. This one-number approach may help to mitigate concerns expressed by the Secretary of Commerce when he decided not to adjust the 1990 Census, such as the potential for confusion that more than one set of numbers could create and the potential for political considerations to play a part in choosing between sets of numbers when the outcome of the choices (such as differences in apportionment of seats in Congress) can be known in advance of a decision. The Bureau plans other changes that are intended to improve the timeliness and accuracy of the ICM results, compared with those produced by the 1990 PES. The ICM is designed to be completed in a much shorter time than the PES in order to meet the deadline for reporting census data for apportionment purposes. 
The CAPI technology is one of the key design changes that can improve timeliness since it reduces the need for additional follow-up interviews of sample units. The Bureau also proposes using a sample in the 2000 ICM that is approximately five times larger (about 750,000 housing units) than the 1990 PES (about 150,000 units). This larger sample should allow the Bureau to produce direct estimates for each state and improve the quality of estimates for substate geographic areas.7

As another enhancement of the 1990 PES procedures, the Bureau is exploring ways to produce household-level data. Persons would be added to or subtracted from individual housing units through the ICM estimation procedures, thus providing household characteristic data for researchers and other data users, in place of using the 1990 PES procedure that would have adjusted only the block data totals. As a consequence of these changes, especially the larger sample size, the planned 2000 ICM is also more expensive than the 1990 PES (in constant 1990 dollars, the cost is $230 million for the 2000 ICM compared to $55 million for the 1990 PES).8

Research and Test Results for ICM Methods Have Shown Mixed Results

The Bureau has used a combination of computer simulations and field testing in its research to design and refine the proposed ICM operation for the 2000 Census. Simulations of ICM results using 1990 Census data showed general improvement over the results produced by the 1990 counting operations.
Bureau simulations also indicated that an ICM survey of the size proposed for 2000 could support a design that would achieve for each state a coefficient of variation of approximately 0.5 percent or a standard error of 60,000 persons, whichever is smaller.9 The simulated ICM reduced the patterns of differential undercounts across population subgroups and areas observed in the 1990 Census data.

7 A direct estimate is based entirely on data from the area for which the estimate is calculated. For instance, a direct population estimate for Missouri would be calculated using only data collected from Missouri. Indirect estimates, such as the 1990 PES state population estimates, draw on data from outside the area being estimated.

8 Even if Congress should decide against using ICM to produce the final census numbers, some portion of the resources slated for ICM would still be needed for a smaller coverage survey operation that would be used for evaluation purposes only. Such an operation is essential for evaluating the quality of the census, especially at subnational levels, and also provides information used to plan and budget the next census.

9 The standard error (SE) and coefficient of variation (CV) are measures of the precision of an estimate. Whatever true value the estimate might have, the standard error tells how wide an interval to expect from all possible samples like the one under consideration. The CV is the standard error relative to the size of the estimate.

The Bureau had several key ICM-related objectives in the 1995 Census Test. Operational objectives focused on whether the new procedures, with heavy reliance on CAPI technology, would work and improve the completeness and timeliness of the coverage survey operation. Technical objectives focused on continuing the evaluation of two potential estimation methods for the ICM, called Dual System Estimation (DSE) and CensusPlus, as well as completing a number of studies to evaluate potential sources of bias in the population estimates produced by ICM.

Statistical estimation methods are needed to translate the information from the ICM interviews into estimates of the true population, which in turn generate the factors used to correct the raw census counts up through nonresponse follow-up. For example, if the estimation method indicates that the census undercounted people in a particular poststratum by 4 percent, the Bureau would multiply every person counted by the census in that poststratum by 1.04 to produce the final census counts.

DSE is a capture-recapture estimation method that assumes neither the census nor the ICM counts everyone. Instead, using probability theory, it uses a comparison of the results from the two lists (the census and the ICM survey) to estimate the total population.10 In other words, it assumes that using two independent estimates of the population can generate a third, better estimate of the “true” population. DSE was used in the 1990 PES and prior census coverage evaluation surveys, as well as several tests, so the Bureau has a body of research and experience to rely on in the design and execution of the method. The other major advantage of this method is that the resulting estimates include a component to estimate persons missed by both the census and the ICM survey. However, DSE is a complex estimation method. Therefore, it takes more time to complete population estimates using DSE, and the method may be harder to explain to the public.

In contrast, CensusPlus is a new method that the Bureau had never tested before 1995. CensusPlus assumes that a second, higher quality survey (the ICM) can find the “true” population by reconciling an ICM roster with the census roster for the same housing unit.
The reconciled count provides a ratio to extrapolate for non-ICM households. The major advantages of CensusPlus are that it is simpler than DSE, and may be faster to complete and easier to explain.

10 For a more detailed description of the DSE method and the mathematical procedures used, see 1990 Census Adjustment: Estimating Census Accuracy—A Complex Task (GAO/GGD-91-42, Mar. 11, 1991).

The results of the 1995 Census Test for ICM were mixed. Given that it was the first field test of the operation, this was not surprising. Among the successes that the Bureau identified in the test for ICM was the CAPI-based interviewing, which was well received by enumerators and those being interviewed, and proved feasible for use in a coverage survey. The test results showed that this technology offers considerable advantages in terms of timeliness, control, and quality of ICM interview data. Overall timeliness of the coverage survey operation was much improved compared with the 1990 PES experience. The Bureau completed field work for the 1995 ICM in about 2 months less time than it took to complete work for the 1990 PES, which enabled the Bureau to produce the final census data for both estimation methods before the end of the calendar year. Finally, evaluations of specific potential sources of bias in the ICM estimates and of other data collection errors indicated that the operations were generally unbiased and effective.

While some of the new techniques and processes worked well in the census test, the test results also clearly showed that further research is needed in a number of areas. The Bureau experienced some serious operational and technical problems with the ICM in the 1995 Census Test.
On the operational side, the most serious problems were a high rate of ICM nonresponse and the failure to load all census data into the laptop computers in time for the ICM interviews. The big technical problem was the poor showing of the CensusPlus estimation method. According to Bureau evaluations, the nonresponse rates for the ICM interviews were too high, generating more missing data than the Bureau’s estimation methods were designed to handle. Some of the nonresponses represented partial interviews that did not provide sufficient data for the Bureau to use in reconciling census and ICM rosters (i.e., there was not enough specific information to be able to tell whether a person listed on the census roster was the same person as one listed on the ICM roster). High rates of noninterviews and missing data can create significant problems for the Bureau when it attempts to estimate the true population. The population estimates can vary depending on different assumptions the Bureau makes about how to treat the missing information. For example, in the Bureau’s evaluation of coverage in the 1980 Census, different assumptions about the treatment of missing data, along with other limitations, generated 12 different sets of estimates of the true population. In the 1995 Census Test, the Bureau imputed responses for missing data based on the responses that it was able to obtain, but its subsequent evaluations of the test indicated that the ICM nonrespondents were dissimilar to the ICM respondents. Although the ICM began after nonresponse follow-up ended, the Bureau had not yet finished recording all the data from the last nonresponse interviews. Therefore, not all the census data needed for ICM interviews were loaded into the laptop computers before enumerators went out for those interviews. Bureau officials said that this problem was compounded because a relatively large portion of the census data from nonresponse follow-up interviews came in at the end of that operation. 
Census data were missing in 29 percent of the ICM sample cases from which interviewers called up information. For those cases, this made on-the-spot reconciliation of ICM and census rosters impossible and also eliminated the possibility of ICM interviewers probing to resolve discrepancies and identify persons not originally accounted for. Among other difficulties with the computer technology, the Bureau discovered problems with the flow and complexity of the survey instrument loaded into the machines.

Of the two estimation methods the Bureau tested in 1995, only DSE showed a consistent ability to reduce the differential undercount of traditionally undercounted population groups. In contrast, the CensusPlus method produced an unexpectedly poor showing according to Bureau officials. The estimates produced by this method were lower than the total count after nonresponse follow-up (i.e., from all phases before ICM results were incorporated) for some traditionally undercounted population groups. CensusPlus results did not reduce the differential undercount of blacks and only provided limited improvement in the counts for Hispanics. The CensusPlus results also showed patterns that did not appear reasonable when compared with independent estimates, such as those produced by demographic analysis. Bureau officials were not certain whether the poor performance of CensusPlus was due to problems with operations and the computer survey instrument or to flaws in the method itself. Because the Bureau has extensive experience with the DSE method, and because that method has been performing well in recent tests, the Bureau decided in March 1997 that it will use DSE as part of the ICM procedures planned for the 2000 Census.
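As an illustration of the capture-recapture logic behind DSE, the following minimal sketch applies the textbook Lincoln-Petersen form of the estimator to a single poststratum. It is a simplification under stated assumptions, not the Bureau's production procedure: it ignores erroneous enumerations, missing data, and sample weighting, and the function names and input counts are hypothetical.

```python
def dual_system_estimate(census_count, icm_count, matched):
    """Lincoln-Petersen estimate of the true population of one poststratum.

    census_count: persons counted by the census (hypothetical input)
    icm_count:    persons found by the independent ICM survey
    matched:      persons found on both lists
    Assumes the two lists are independent and matching is error-free.
    """
    if matched <= 0:
        raise ValueError("at least one matched person is required")
    return census_count * icm_count / matched


def correction_factor(estimate, census_count):
    """Factor by which census counts in the poststratum would be scaled."""
    return estimate / census_count


# Hypothetical poststratum; numbers chosen only for illustration.
census_count, icm_count, matched = 10_000, 5_200, 5_000
estimate = dual_system_estimate(census_count, icm_count, matched)
factor = correction_factor(estimate, census_count)
# Persons on neither list: the estimate minus everyone seen at least once.
missed_by_both = estimate - (census_count + icm_count - matched)

print(f"estimated true population: {estimate:.0f}")
print(f"correction factor: {factor:.2f}")
print(f"estimated persons missed by both lists: {missed_by_both:.0f}")
```

With these illustrative inputs the factor works out to 1.04, the same multiplier used in the report's example, and the estimate includes persons who appear on neither list, which is the advantage the report attributes to DSE.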
The Bureau began testing redesigned procedures and computerized survey instruments for ICM in its 1996 Community Census, for which the ICM survey portion started in January 1997. Among the planned changes were revisions to the computerized survey instrument to make it easier for respondents and interviewers to move through the questions. Also, the ICM schedule was extended by about 2 weeks to allow the Bureau to load all census data on the laptop computers before ICM interviews. The Bureau also intended to use the longer schedule to implement a special operation to try to convert ICM noninterviews into completed interviews, thus alleviating some of the missing data problems experienced in the 1995 Census Test. A similar nonresponse conversion operation for the 1990 PES reduced the noninterview rates to about 1.5 percent. The 1996 test should provide information to determine whether the operational problems have been addressed sufficiently, but the evaluations from the 1996 Community Census are not expected to be completed until fall 1997.

Combined Effects of Sampling for Nonresponse and ICM

The effects of the proposed new statistical methods on the success of census operations and the quality of the resulting data need to be viewed in combination. The Bureau’s research to date illustrates that there are many trade-offs and interrelationships between the components of alternative census designs. Sampling a portion of the nonresponse workload can save time and money compared to attempting to follow up 100 percent of the nonrespondents, but it is not likely to significantly improve census accuracy. The ICM is designed to address problems with census accuracy, but it is unlikely to be successful unless preceding data collection efforts, in particular nonresponse follow-up, are completed on schedule.
The results from Bureau research suggest that the statistical methods proposed by the Bureau for 2000 should reduce the bias observed in past census data (i.e., the differential undercounts), but these methods would also introduce additional random error in the data because of sampling. Similarly, designs that do not include sampling for nonresponse follow-up or ICM also involve trade-offs. For example, such designs would be easier to explain to the public and more closely resemble past censuses. They would not introduce the additional level of uncertainty in the results that accompanies sampling. However, the Bureau’s research results and projections also suggest that those designs would be more expensive and show no likelihood of reversing or substantially reducing accuracy problems in census data.

Projections of the expected accuracy, equity, costs, and operational feasibility of alternative census designs are likely to change as the Bureau’s research continues. The Bureau and other census observers have identified a number of areas in which additional research and decisions are needed with regard to the details of the proposed statistical methods. Such research is particularly critical because technical changes and refinements can affect census results.

Results From Computer Simulations Illustrate Trade-Offs in Accuracy

To determine the potential levels and sources of error for various levels of geography, the Bureau relied primarily on computer simulations. At our request, the Bureau provided us data from its most recent simulations of the results that would likely be produced in the 2000 Census using various alternative census designs. According to Bureau officials, one important advance in this research, compared to previous simulations, was that they were able to use detailed information from the operational data files of the 1990 Census about the processing of census forms collected from the nation’s housing units.
For simulating the results of sampling for nonresponse follow-up, the most important operational items were the response status (i.e., whether or not a given unit responded to the census by mail) and the check-in date of each census form (which identified when the Bureau was able to collect data for each unit). Using this information, the Bureau was able to identify nonresponding housing units and define the nonresponse follow-up sampling universes for different designs with a high degree of accuracy, yielding more realistic results with fewer assumptions than were needed for previous simulations. To account for expected growth in the population, the Bureau used population projections for the counts and characteristics of persons in 2000. In addition, the Bureau assumed in its simulations that the percentage undercounts (and overcounts) measured for population groups by the 1990 PES would also apply in the 2000 Census.

The Bureau’s summary chart showing projected results from its simulations of alternative designs for conducting the 2000 Census, as of June 1997, is presented in appendix II. Because of the limitations of this research, the Bureau’s estimates should serve only as a rough illustration of possible results in 2000. The most important limitation is that actual results can vary from the expected results produced in theory or in a computer simulation.
Among other limitations are that (1) the Bureau continues to refine the different methods it is studying; (2) as new information becomes available, such as data on expected staffing and pay levels in 2000, the results may vary; and (3) more research is needed to better understand and quantify some of the effects and sources of error that are not reflected in this current set of results.11

11 For example, the procedures used to model population estimates for all smaller geographic areas, such as tracts, using a limited amount of sample data also generate errors. The potential magnitude of those errors and the validity of the assumptions used in the models are the subjects of research both within and outside the Bureau.

Despite these caveats, the simulation results suggest that, relative to the size of the population being estimated, the new methods proposed by the Bureau would generally reduce the level of error in census data for the nation, states, congressional districts, and most census tracts. Results near the levels projected by the Bureau’s simulations for its refined plan would represent a reduction in the relative error levels experienced in the 1990 Census, as well as in the levels projected for a 2000 Census design that incorporates all proposed improvements in 1990 Census procedures except those involving sampling or statistical estimation.

The Bureau’s simulation results for alternative designs also illustrate one of the major trade-offs in accuracy between designs that use sampling and statistical estimation and those that do not. While the data showed that most places and geographic levels had lower rates of error using the methods the Bureau proposes to use in 2000, some places had lower rates of error using conventional census methods without sampling and statistical estimation.
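The definitions in footnote 9, and the state-level precision target described earlier (a coefficient of variation of about 0.5 percent or a standard error of 60,000 persons, whichever is smaller), can be illustrated with a short calculation. This is a sketch only; the function names and the example populations are hypothetical and are not taken from the Bureau's simulations.

```python
def coefficient_of_variation(standard_error, estimate):
    """CV: the standard error relative to the size of the estimate."""
    return standard_error / estimate


def target_standard_error(population, cv_target=0.005, se_cap=60_000):
    """Smaller of the CV-based standard error and the fixed cap."""
    return min(cv_target * population, se_cap)


# Hypothetical state populations, chosen only for illustration.
for population in (5_000_000, 12_000_000, 30_000_000):
    se = target_standard_error(population)
    cv = coefficient_of_variation(se, population)
    print(f"{population:>11,} persons: SE <= {se:>7,.0f}, CV = {cv:.2%}")
```

Under this reading of the target, the two criteria coincide at a population of 12 million: below that size the 0.5 percent criterion is the binding one, and above it the 60,000-person cap holds the relative error under 0.5 percent.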
The simulation data suggest that the Bureau’s current design for the 2000 Census would likely produce results that appear more accurate or more equitable by at least three broad criteria: (1) the average levels of error are better; (2) the shape of the error distributions is compressed closer to the average levels; and (3) the cumulative error distributions also appear to be better. For example, the difference between the average relative error rates for the Bureau’s current design for the 2000 Census and a conventional design without sampling ranges from 0.8 percentage points at the level of census tracts (1.1 percent compared to 1.9 percent) to 1.8 percentage points for the national total (0.1 percent compared to 1.9 percent). The distribution of tracts, using the Bureau’s current design, showed that over 90 percent of tracts had results between 0.5 percent and 2.0 percent relative error; the comparable range to account for 90 percent of tracts using conventional procedures would extend from under 0.5 percent to between 4.0 and 4.5 percent. In terms of the cumulative distributions, while the data again indicated that over 90 percent of tracts had error rates of less than 2.0 percent using the Bureau’s current design, only about 63 percent of tracts had rates of less than 2.0 percent using the conventional procedures.

However, the results also show that some places that should have very low error rates using a conventional census design could have higher error levels if sampling for nonresponse and ICM are used. For example, the Bureau’s current plan for the 2000 Census is designed to achieve a relative error rate of 0.5 percent for all but the largest states.12 But its simulations projected that two states could have error rates of less than 0.5 percent by using a conventional design without sampling in the 2000 Census. (In the 1990 Census, five states had estimated error rates of under 0.5 percent.) The trade-off is more noticeable for smaller geographic areas.
The smallest geographic areas for which we have detailed data from the Bureau simulation are census tracts.13 Using a trimmed data set, the Bureau calculated that its current plan for 2000 (using direct sampling for nonresponse follow-up, together with ICM) would produce less error for 64 percent of census tracts.14 The time-truncation option produced less error for 54 percent of tracts, and the 90-percent truncation option produced less error for only 51 percent of tracts. The converse of these figures is that the 1990 Census procedures performed better for approximately 40 to 50 percent of tracts, depending on the option used for comparison.

12 For the four states with a population of over 12 million persons (California, Florida, New York, and Texas), the Bureau’s plan was designed to ensure that the standard errors from sampling did not exceed 60,000 persons, which produced relative error rates of under 0.5 percent for those states.

13 The simulation did not produce block-level data on the potential effects of the Bureau’s current plan and options, but Bureau officials told us they intend to contract for a study of the effects on block data.

The question therefore becomes, how much better or worse are the error levels for individual tracts using different designs? The Bureau is still working with and studying the data set from its latest simulations and has not yet produced detailed data tables to address that question directly. But it did provide us with data from its simulation showing the distribution of census tracts by the level of relative error. Table I.1 provides summary information on the distribution of census tracts by the level of relative error. For this table, we chose to present data for all 60,128 tracts from the untrimmed data set.
We used the untrimmed data set because we believe that the ends of the distributions, which the Bureau trimmed from the other data set, matter unless the Bureau decides not to apply the same methods to these tracts.

14 The Bureau “trimmed” the number of census tracts represented in these calculations by removing tracts with unusual characteristics (e.g., those with especially small or large populations). Trimming reduced the number of tracts from 60,128 to 56,022. A more detailed explanation of trimming is included in the Bureau’s footnotes to its summary chart in appendix II.

Table I.1: Distribution of Census Tracts by Relative Error Level Using Alternative Census Designs

Relative error level in percentage
                                0.5 to  1.0 to  1.5 to  2.0 to  2.5 to  3.0 to  3.5 to  4.0 to  4.5 to
Design alternative       < 0.5     1.0     1.5     2.0     2.5     3.0     3.5     4.0     4.5     5.0   > 5.0

Percentage of tracts
Direct sampling            1.5    39.5    41.6    11.6     3.3     1.1     0.5     0.2     0.1     0.1     0.4
90-percent truncation      1.5     8.2    40.8    32.9    10.7     3.4     1.3     0.5     0.2     0.2     0.4
Time truncation            1.5    16.1    46.7    25.0     6.7     2.1     0.8     0.3     0.2     0.1     0.4
No sampling               22.2    12.1    15.0    13.1     8.8     6.6     5.1     4.0     3.5     2.7     6.8

Cumulative percentage of tracts
Direct sampling            1.5    41.0    82.6    94.2    97.5    98.7    99.2    99.4    99.5    99.6   100.0
90-percent truncation      1.5     9.7    50.5    83.4    94.0    97.4    98.7    99.2    99.4    99.6   100.0
Time truncation            1.5    17.6    64.4    89.3    96.1    98.2    98.9    99.3    99.5    99.6   100.0
No sampling               22.2    34.3    49.4    62.5    71.3    77.9    82.9    87.0    90.4    93.2   100.0

Note 1: Data presented reflect simulation results for 60,128 census tracts.
Note 2: Relative error for the direct sampling, 90-percent truncation, and time-truncation design alternatives is the combined sampling error from sampling for nonresponse follow-up and ICM in simulations of the 2000 Census. It does not include possible errors from other sources, such as any bias in the statistical models used to produce the tract-level estimates.
Note 3: Relative error for the no sampling design alternative (i.e., without using sampling for nonresponse follow-up or ICM) is the absolute value of the estimated undercount or overcount rate for tracts in simulations of the 2000 Census. It does not include possible errors from other sources. Also, although model error would not apply in the actual use of the no sampling design, it does represent an unmeasured error component in producing these tract-level estimates.
Note 4: Cumulative percentages may be affected by rounding.
Source: Bureau of the Census

There is a clear difference in the shape of the distributions for the design alternatives using sampling for nonresponse and ICM, and the distribution for a design that does not use those procedures. The distributions of tract data for designs using these new methods are compressed toward the average relative error level for tracts. The distribution of tracts for the conventional census design, while showing that about 22 percent of all tracts had minimal net undercounts (i.e., less than 0.5 percent), is more dispersed across the range of error levels. This suggests that the tracts that would have relatively more error using the new methods may have only slightly more error. Figure I.2 illustrates the different distributions of census tracts by the relative error level produced by the Bureau’s selected design for Census 2000 (the direct sampling option for nonresponse follow-up together with ICM) and a census design that does not include sampling for nonresponse follow-up and ICM. The patterns are similar when the results for the 90-percent truncation and time-truncation designs are graphed.
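The cumulative rows of table I.1 can be approximately rebuilt from the per-bin percentages. The sketch below uses the published per-bin figures for two of the four designs; the variable and function names are illustrative only. Because the inputs are already rounded to one decimal place, the running sums can differ from the published cumulative figures by about 0.1 (for example, 62.4 here versus 62.5 in the table for the no sampling design), which is consistent with note 4.

```python
from itertools import accumulate

# Upper bound of each relative-error bin in table I.1, in percent.
BIN_UPPER_BOUNDS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, float("inf")]

# Published per-bin percentages of tracts (untrimmed data, 60,128 tracts).
PERCENT_OF_TRACTS = {
    "Direct sampling": [1.5, 39.5, 41.6, 11.6, 3.3, 1.1, 0.5, 0.2, 0.1, 0.1, 0.4],
    "No sampling": [22.2, 12.1, 15.0, 13.1, 8.8, 6.6, 5.1, 4.0, 3.5, 2.7, 6.8],
}


def cumulative_percentages(per_bin):
    """Running total of the per-bin percentages, rounded to one decimal place."""
    return [round(total, 1) for total in accumulate(per_bin)]


def share_below(design, threshold):
    """Cumulative percentage of tracts with relative error below `threshold`."""
    index = BIN_UPPER_BOUNDS.index(threshold)
    return cumulative_percentages(PERCENT_OF_TRACTS[design])[index]


for design in PERCENT_OF_TRACTS:
    print(f"{design}: {share_below(design, 2.0)} percent of tracts below 2.0 percent error")
```

Comparing the two designs at the 2.0 percent threshold reproduces the contrast drawn in the text: more than 90 percent of tracts under direct sampling against roughly 63 percent under a design without sampling.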
Again, the distributions are compressed toward the average error level. However, since the averages for those designs are slightly higher than for direct sampling, the trade-off compared to a conventional design is more pronounced.

Figure I.2: Distribution of Census Tracts by Error Level Shows Trade-Off Between Direct Sampling and No Sampling Designs

[Figure: vertical axis, percentage of census tracts (0 to 100); horizontal axis, percentage of relative error level, in bins from < 0.5 to > 5.0; series shown are no sampling design, direct sampling design, no sampling (cumulative), and direct sampling (cumulative).]

Note 1: Data presented reflect simulation results for 60,128 census tracts.
Note 2: Relative error for the direct sampling design is the combined sampling error from sampling for nonresponse follow-up and ICM in simulations of the 2000 Census. It does not include possible errors from other sources, such as any bias in the statistical models used to produce the tract-level estimates.
Note 3: Relative error for the no sampling design (i.e., without using sampling for nonresponse follow-up or ICM) is the absolute value of the estimated undercount or overcount rate for tracts in simulations of the 2000 Census. It does not include other possible errors.
Source: Bureau of the Census data.

The Bureau’s research over the past several years also showed that, while both sampling for nonresponse follow-up and ICM procedures introduce sampling error into census data, they do so in different proportions at different geographic levels.
Bureau officials found that, on average, sampling for nonresponse follow-up would contribute most of the sampling error or variability for smaller geographic areas, such as tracts and blocks, while ICM would contribute almost all of the error at the level of congressional districts and larger geographic areas. In general, relative sampling error increases as the population of a geographic area decreases. Simulation results also showed that sampling errors could be significantly high for some very small areas or population groups, although the level of nonsampling errors for such areas and groups can also be high when conventional counting methods are used. This highlights the need for additional research to identify the block-level effects of different census designs. However, relative variability becomes smaller as block-level data are aggregated to larger geographic areas, and the simulations indicated that data for areas or groups of equivalent population size are likely to have similar levels of sampling variability. In large part, the accuracy trade-off in the Bureau’s proposed approach may entail accepting that sampling and statistical estimation would introduce random sampling error in the data, especially for small geographic areas (such as most blocks and census tracts with 1,000 or fewer persons), but would reduce systematic bias in the results for all larger geographic areas. Bias in this context occurs because certain individuals and households are more likely to be missed by the census. To the extent that these missed units have distinctive characteristics, the resulting census data will be biased since these units are not included in the census tabulations. Random sampling error refers to the fact that one random sample will differ somewhat from another even if the two samples are drawn from the population in the same random way. 
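The notion of random sampling error can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Bureau's methodology: the population of household sizes is invented, the sample sizes and function name are hypothetical, and the standard error uses the textbook formula for a simple random sample without a finite population correction.

```python
import random
import statistics

random.seed(2000)  # fixed seed so the illustration is repeatable

# Hypothetical population: persons per housing unit for 10,000 housing units.
# These values are invented for illustration; they are not census data.
population = [random.choice([1, 1, 2, 2, 2, 3, 3, 4, 5, 6]) for _ in range(10_000)]
true_mean = statistics.mean(population)


def estimate_from_sample(units, sample_size):
    """Draw a simple random sample; return (sample mean, estimated standard error)."""
    sample = random.sample(units, sample_size)
    mean = statistics.mean(sample)
    # Textbook standard error of the mean, estimated from the sample itself.
    standard_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, standard_error


# Two samples drawn the same way, plus one larger sample.
mean_a, se_a = estimate_from_sample(population, 500)
mean_b, se_b = estimate_from_sample(population, 500)
mean_c, se_c = estimate_from_sample(population, 2_000)

print(f"population mean:       {true_mean:.3f}")
print(f"sample A (n=500):      {mean_a:.3f} +/- {se_a:.3f}")
print(f"sample B (n=500):      {mean_b:.3f} +/- {se_b:.3f}")
print(f"sample C (n=2,000):    {mean_c:.3f} +/- {se_c:.3f}")
```

Samples A and B are drawn in the same random way yet will generally give somewhat different means, and the larger sample C has a smaller estimated standard error, which is the lever a sample design uses to bound this kind of error.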
The magnitude of random sampling errors can be estimated from an actual sample, and it is possible to limit the magnitude of random sampling errors as part of the sample design process.

Bureau’s Cost Projections Show Cost Savings Are Possible Using Its Current Design for Census 2000

The Bureau has been conducting research to estimate the overall effects of using alternative designs on the cost of the 2000 Census. To develop cost projections for the full census cycle, the Bureau used two cost models, one to generate estimates for Bureau headquarters activities and the second, called the Year 2000 Cost Model, to generate expected costs in all nonheadquarters components of the census cycle for fiscal years 1997-2001, using information from the 1990 Census and research for the 2000 Census. Among the underlying design principles of the Year 2000 Cost Model are that it is intended to replicate the whole census process and model interrelationships among operations and activities. For example, if the assumed response rate changes, a domino effect may result for many activities and costs, in areas as diverse as recruiting and training of census enumerators, printing, and postage. The Bureau contracted with an outside consultant to provide validation of the Year 2000 Cost Model’s logic. The firm, Booz-Allen & Hamilton, Inc., was responsible for writing an executive summary explaining how the model operates and detailing sources of model inputs. Assuming the Bureau can obtain an overall mail response rate of about 67 percent, it estimated in June 1997 that the cost for the entire 2000 Census cycle would range from about $4.0 billion to $4.8 billion (in 2000 dollars) for different designs.
Given the cost estimates provided to us by the Bureau, its selected design, which would use the direct sampling option for nonresponse follow-up, appears to offer the greatest potential to control the overall cost of the 2000 Census. According to the Bureau’s estimates, the difference between its current selected design and a plan that would not use sampling for nonresponse or ICM to produce the final counts would be about $700 million to $800 million. Cost projections for all designs may increase depending on changes in a number of cost-model assumptions, such as the pay and turnover rates that affect staffing costs and the size of the nonresponse workload. The Bureau was still examining and validating wage rate information from a study by Westat, Inc., when we completed our work. That study recommended higher wage rates to ensure an adequate labor supply in the 2000 Census, but the Bureau has not included those rates in its current cost estimates. Costs are also likely to increase if the Bureau cannot fully implement all of the proposed changes in operations for the 2000 Census as planned. For example, the Bureau may need to rely more heavily on labor-intensive clerical procedures than what is now reflected in the cost-model estimates if new technology does not work as well as planned. Costs would also escalate if the efforts planned to encourage mail responses to the census do not increase response rates as much as the Bureau anticipates, resulting in a larger nonresponse workload than the Bureau has been projecting for the 2000 Census.

Additional Research, Testing, and Clarification of Objectives Remain to Be Completed Before the 2000 Census

There is still uncertainty surrounding the potential results that would be produced by alternative census designs and methods and questions that need to be answered. In the relatively short time available before the next census, the Bureau has identified additional research and testing that it needs to complete in order to refine its plans, as well as to address broader concerns expressed by some Members of Congress and other observers about the Bureau’s proposal to expand the use of statistical methods. The remaining work to prepare for Census 2000 also includes reaching decisions on some technical details of the proposed methods that can affect the final results. Bureau officials have been developing a prioritized list of research topics on sampling and estimation methods in the census that need further work. For example, the design and selection of the ICM sample (i.e., identifying the number and types of blocks that would be surveyed in areas across the country) would need to be determined on the basis of the desired level of precision in estimates for various levels of geography and population groups. The Bureau also plans to continue research to refine the overall estimation methods and procedural details for producing a census data file. One of the Bureau’s goals for this research is to develop a way to take the results from the proposed estimation methods and place persons down to the level of individual housing units in each block. Another key research topic is determining when and how to produce direct and indirect population estimates. Evaluations of the 1995 Census Test also indicated the need for more development of the procedures and software used for computerized matching of addresses and individuals listed by the ICM and the other census operations. These are only some of the remaining topics for investigation that Bureau researchers identified.
Issues that have been raised in Congress and by observers of the census planning process, including members of the statistical community, concerning the expanded use of statistical methods in the 2000 Census also pose important questions for the Bureau’s research efforts. These issues generally can be grouped into two areas. The first area involves questions about the technical soundness of the proposed methods, such as whether the underlying assumptions are valid, the proposed methods are statistically robust (i.e., that the results produced are not overly sensitive to variations in reasonable assumptions and alternatives to the production procedures), and the effects and consequences of using and combining these methods are well understood. For example, one important question in weighing the effects of alternative designs is the extent to which the Bureau’s estimates for the precision of population estimates produced by various methods accurately account for all sources of error. The second set of issues involves questions about the operational feasibility of the proposed procedures. Because the sample surveys that the Bureau intends to employ in 2000 are large and complex, some observers are concerned about whether all components of the Bureau’s plan can be implemented effectively and with limited errors in a decennial census environment. One reason for such concerns is that, several months after the July 1991 census adjustment decision, the Bureau discovered a computer coding error in the 1990 PES estimation procedures. Correcting for that error, together with other subsequent modifications and edits, lowered the PES estimates of the 1990 net undercount by about half a percentage point, from around 2.1 percent to about 1.6 percent.
The Bureau should be better able to address both sets of issues as it completes more research in the priority areas it has identified and undertakes evaluations of how the proposed statistical methods performed in the 1996 Community Census and in the Census 2000 Dress Rehearsal in 1998. The work that remains to be done by the Bureau is important because decisions about the details of its methods and procedures can affect the census results. Some of these technical details involve policy decisions, such as defining what sample allocation is equitable or what level of precision the Bureau’s methods should attempt to achieve for different levels of census geography. One of the most difficult challenges regarding the design of census-taking procedures is that the formula for apportioning seats in the House of Representatives is mathematically very sensitive to small changes in the number of people in states that receive the last few seats. Variations and errors in conventional census data collection can also affect apportionment; this is not a problem unique to the use of statistical methods. With the current apportionment formula and size of the House of Representatives, a perfect count or apportionment is not likely using any census design, as demonstrated in some of the Bureau’s research. The question is which design will produce better estimates of the population. When Secretary of Commerce Robert Mosbacher made his decision not to adjust the 1990 Census using the results of the PES, he recognized this dilemma and noted that some sensitivity should be expected. He pointed out that no production of the complexity of the census could be completely prespecified and that technical decisions were made in the course of the estimation procedure following the 1990 Census. 
However, in commenting on the precedent for future censuses, the Secretary also noted that there are different implications to the many decisions that are made during the course of the census process (when “the decision maker does not know the import of his decision”) and the decisions made when the results of different choices can be known. The House Committee on Government Reform and Oversight raised similar concerns about the potential subjectivity of sampling and statistical estimation methods in its report released on September 24, 1996. To mitigate such concerns about the subjectivity of its proposed statistical methods in the 2000 Census, the Bureau expects to subject its plans and procedures for implementing the methods to the review and scrutiny of professional experts, advisory committees, and other stakeholders. According to the Bureau, once the detailed procedures for the statistical methods have been developed by the Bureau and accepted by the reviewers, these procedures will be “frozen” to ensure that there is no introduction of subjectivity into the results for Census 2000.

Appendix II
Bureau of the Census Summary Information on Design Alternatives for the 2000 Census

The Bureau’s summary data and information on alternative designs for conducting the 2000 Census, as of June 1997, are presented in this appendix. Figure II.1 shows the projected error rates from the Bureau’s simulations for selected design alternatives, together with cost estimates for those designs and other explanatory information in the accompanying footnotes.
Please note that, while we chose to present data in appendix I from an “untrimmed” data set of simulation results for all 60,128 census tracts, the Bureau used the “trimmed” set of results for 56,022 tracts in its chart to reduce the potential for unusual tracts to affect the results. The Bureau’s use of trimming, along with other details on the methods and assumptions used to produce the chart data, is discussed in the footnotes. In general, close attention to the Bureau’s footnotes is important for understanding the information presented in the chart and the limitations of the Bureau’s data. We made some minor formatting changes needed to reproduce the Bureau’s chart in our report and added a figure title and source. Otherwise, the following material in this appendix presents the Bureau’s original text, data, and notes.

Table II.1: Bureau of the Census Summary Data on Projected Costs and Accuracy of Selected Census 2000 Alternative Methodologies

THE CENSUS BUREAU’S PLAN AND ALTERNATIVE METHODOLOGIES FOR CONDUCTING CENSUS 2000

THE PLAN TO ASSURE MEETING IMPROVED ACCURACY AND REDUCED COST GOALS OF CENSUS 2000
(Improved forms/multiple mail contacts/paid advertising strategy yielding about 67 percent mail response rate)

Refined Plan:
- Direct Sampling for NRFU
- Census Tract Response Control
- Quality Check
Cost in 2000 dollars (billions)[a]: $4.0
Error, by source[b]         Misses/Double Counts[d]       Sampling[f]                   Combined Error[e]
National                    *                             0.1%                          0.1%
States (excluding DC)       *                             0.5%[g] (0.2% to 0.5%)[g]     0.5%[g] (0.2% to 0.5%)[g]
Congressional districts     *                             0.6%[h] (0.3% to 2.3%)[i]     0.6%[h] (0.3% to 2.3%)[i]
Census tracts (trimmed)[c]  *                             1.1%[h] (0.6% to 2.4%)[c,i]   1.1%[h] (0.6% to 2.4%)[c,i]
Comments:
- Low combined error for small geographic areas

ALTERNATIVES THAT DO NOT MEET IMPROVED ACCURACY AND/OR REDUCED COST GOALS OF CENSUS 2000
(Improved forms/multiple mail contacts/paid advertising strategy yielding about 67 percent mail response rate)

Conduct Census 2000 Using Improved Procedures[j] Except there is:
- Full NRFU (no sampling)
- No Quality Check[k]
- An Evaluation Study +
Cost in 2000 dollars (billions)[a]: $4.7 to $4.8[k,l]
Error, by source[b]         Misses/Double Counts[d]       Sampling[f]                   Combined Error[e]
National                    1.9%                          N/A                           1.9%[d]
States (excluding DC)       1.9% (0.4% to 3.2%)[i]        N/A                           1.9%[d] (0.4% to 3.2%)[i]
Congressional districts     1.9% (–1.2% to 7.0%)[i]       N/A                           1.9%[d] (–1.2% to 7.0%)[i]
Census tracts (trimmed)[c]  1.9% (–1.2% to 6.2%)[c,i]     N/A                           1.9%[d] (–1.2% to 6.2%)[c,i]
Comments:
- Much higher cost than the Refined Plan
- Higher combined error for all geographic areas
- A PES is used to evaluate census quality; results available after delivery of apportionment totals
- Increased activities (publicity, followup visits at all vacant housing units, etc.) to attempt to achieve census coverage consistent with 1990 levels

Conduct Census 2000 Using Improved Procedures Except there is:
- Full NRFU (no sampling)
(Still includes the Quality Check)
Cost in 2000 dollars (billions)[a]: $4.4[l]
Error, by source[b]         Misses/Double Counts[d]       Sampling[f]                   Combined Error[e]
National                    *                             0.1%                          0.1%
States (excluding DC)       *                             0.5%[g] (0.2% to 0.5%)[g]     0.5%[g] (0.2% to 0.5%)[g]
Congressional districts     *                             0.6%[h] (0.3% to 2.3%)[i]     0.6%[h] (0.3% to 2.3%)[i]
Census tracts (trimmed)[c]  *                             0.8%[h] (0.4% to 1.9%)[c,i]   0.8%[h] (0.4% to 1.9%)[c,i]
Comments:
- Higher cost than the Refined Plan
- Low combined error for small geographic areas
- Increased risk of management failure

SCENARIO SHOWN FOR REFERENCE AND COMPARISON ONLY; THIS IS NOT AN ALTERNATIVE FOR CENSUS 2000[m]
(1990 forms/single mail contact/pro bono advertising strategy yielding about 55 percent mail response rate)

Conduct Census 2000 Using 1990 Procedures[j]
- Full NRFU (no sampling)
- No Quality Check[k]
- PES
Cost in 2000 dollars (billions)[a]: $4.8[k,n]
Error, by source[b]         Misses/Double Counts[d]       Sampling[f]                   Combined Error[e]
National                    1.9%                          N/A                           1.9%[d]
States (excluding DC)       1.9% (0.4% to 3.2%)[i]        N/A                           1.9%[d] (0.4% to 3.2%)[i]
Congressional districts     1.9% (–1.2% to 7.0%)[i]       N/A                           1.9%[d] (–1.2% to 7.0%)[i]
Census tracts (trimmed)[c]  1.9%[d] (–1.2% to 6.2%)[c,i]  N/A                           1.9%[d] (–1.2% to 6.2%)[c,i]
Comments:
- Much higher cost than the Refined Plan
- Much higher combined error for all geographic areas
- A PES is used to evaluate census quality; results available after delivery of apportionment totals

DISCARDED ALTERNATIVES THAT MEET IMPROVED ACCURACY AND REDUCED COST GOALS OF CENSUS 2000
(Improved forms/multiple mail contacts/paid advertising strategy yielding about 67 percent mail response rate)

The Original Plan: Truncate at 90%
- Census Tract Response Control
- Sampling for NRFU
- Quality Check
Cost in 2000 dollars (billions)[a]: $4.2[o]
Error, by source[b]         Misses/Double Counts[d]       Sampling[f]                   Combined Error[e]
National                    *                             0.1%                          0.1%
States (excluding DC)       *                             0.5%[g] (0.2% to 0.5%)[g]     0.5%[g] (0.2% to 0.5%)[g]
Congressional districts     *                             0.6%[h] (0.3% to 2.3%)[i]     0.6%[h] (0.3% to 2.3%)[i]
Census tracts (trimmed)[c]  *                             1.5%[h] (0.8% to 2.9%)[c,i]   1.5%[h] (0.8% to 2.9%)[c,i]

Implementation Alternative 1: Time Truncation
- Census Tract Response Control
- Sampling for NRFU
- Quality Check
Cost in 2000 dollars (billions)[a]: $4.6[o]
Error, by source[b]         Misses/Double Counts[d]       Sampling[f]                   Combined Error[e]
National                    *                             0.1%                          0.1%
States (excluding DC)       *                             0.5%[g] (0.2% to 0.5%)[g]     0.5%[g] (0.2% to 0.5%)[g]
Congressional districts     *                             0.6%[h] (0.3% to 2.3%)[i]     0.6%[h] (0.3% to 2.3%)[i]
Census tracts (trimmed)[c]  *                             1.3%[h] (0.7% to 2.7%)[c,i]   1.3%[h] (0.7% to 2.7%)[c,i]

Legend:
* - Too small to measure with reasonable cost and operations
NRFU - Nonresponse Followup
Quality Check - Also known as the Integrated Coverage Measurement Survey or ICM
N/A - Not applicable; in the absence of any sampling process, there is no sampling error
+ - The evaluation study would yield coverage measures similar to the 1990 PES; results not available until after delivery of the apportionment totals
PES - Post Enumeration Survey; results not available until after delivery of the apportionment totals

a The dollar figures shown do not include the higher wage rates recommended by Westat, Inc. to ensure an adequate labor supply for Census 2000. The results of the Westat study are still being evaluated by the Census Bureau for their effect on the cost of the Refined Plan and each alternative to the plan for Census 2000. For the Refined Plan and Alternatives 5 and 6 that use sampling to reduce the workload for nonresponse followup, the effect is likely to be less than $100 million. For the alternatives that include full (rather than sample) nonresponse followup operations, the nearly 60 percent (11.9 million housing unit) increase in workload, multiplied by the recommended wage rates, could add several hundred million dollars to the estimated cost figure shown.

b All error figures were derived using simulations of 1990 census estimates of undercounts and overcounts for census tracts. To account for expected growth in the population of the United States through the year 2000, the 1990 census tract population totals by race and Hispanic origin were projected using the factors shown on Attachment 1. The projection factors were derived from a widely used process known as Demographic Analysis. The simulations assume that the percentage undercounts (and overcounts) measured for each group in the 1990 Post Enumeration Survey also would apply in Census 2000.
To determine the amount of undercount or overcount for each census tract, the projected population totals for each were computed with and without the results of the 1990 PES for each region and for the various segments of the population within it. The totals for the specific census tracts in each geographic entity were summed to derive the error rates for the more populous geographic levels shown on this chart, such as congressional districts, states, and the Nation.

c The average census tract has a resident population of about 4,000 people; however, some census tracts have very few people and some are far more populous because of specific local circumstances and changes in settlement pattern since the census tracts initially were established. To evaluate the effect of sampling on all except the most unusual census tracts, the error distributions shown were “trimmed” (a widely used practice in analyzing large data sets) by removing values for census tracts that contain only group quarters population, census tracts that contain 10 or fewer people, the least populous 3 percent of all remaining census tracts, and the most populous 3 percent of all remaining census tracts. Trimming was done separately for each alternative shown.

d The single figure shown in the “Misses/Double Counts” column is the estimated net undercount rate. In 1990, this rate was 1.6 percent, meaning that the 1990 census population total for the United States failed to include more than 4 million people. The estimated net undercount rate in 2000 is projected to be 1.9 percent if there is no Quality Check operation. This estimate is based on growth rates in the populations most difficult to count; this means Census 2000 likely will fail to include more than 5.2 million people.
The net undercount figure fails to convey the magnitude of enumeration errors actually made during a decennial census. Several studies looked at the component measures of “gross” undercount — total people missed and total erroneous enumerations. There is no generally accepted definition of gross undercount, and different studies had widely divergent totals for people missed and people included erroneously. Regardless of the components included in the different measures, the net undercount — people missed minus people included erroneously — was about 4 million people. e The “Combined Error” figures for congressional districts and census tracts do not include error due to modeling; model error would not apply to either of the nonsampling alternatives. f One might expect the “Sampling Error” figures for a congressional district always to be larger than the sampling error figures for a state. However, some congressional districts are more populous than the least populous states. g To ensure highly accurate population totals for each state (because these totals are used to apportion the 435 seats in the U.S. House of Representatives among the 50 states) the Census Bureau has designated the sampling processes to yield a “coefficient of variation” of 0.5 percent. To control the “standard error” (the expected amount of variation in the number of people that could be included in each state’s total) the designed coefficient of variation will actually be smaller than 0.5 percent in the four states that had a 1990 census population of 12 million or more (Texas, New York, Florida, and California); the figures in parenthesis show the range in variation. h The single figure shown is the “average error” (the numeric mean) of the estimated error for each individual entity (all congressional districts or all census tracts) in this geographic level. i The figures in parenthesis show the “range of error” (lowest and highest situations) for all entities in this geographic level. 
Negative figures identify overcount estimates; overcounts happen for several reasons, including erroneous inclusion of some people on completed census forms. All other figures identify undercount estimates; undercounts happen when residents of an area are not included on any census form.

j This scenario restores several coverage improvement activities as a substitute for the “Quality Check” planned for Census 2000. These activities have been viewed by the National Academy of Sciences, the Inspector General for the Department of Commerce, and many others as only marginally effective. However, they are the only alternatives expected to deal with some aspects of the total undercount and the differential participation rates among some segments of the population.

k An obvious question is, “Why should the Congress pay for the improvements (other than sampling) offered by the plan for Census 2000 when the cost is nearly as high as repeating 1990 census methods and the quality appears no better?” The answer starts with understanding that the census-taking environment in 2000 will be significantly more difficult than it was in 1990: Mistrust of, and cynicism about, the government and its programs have increased; people are increasingly resistant to intrusions on their time; there is increasing concern about privacy; the number of people working more than one job has increased, along with the number of multiple-worker families, so fewer people are home when an enumerator visits; and so forth. Many major improvements planned for Census 2000 (a better address list, improved through partnerships with the U.S. Postal Service and local and tribal governments; easy to read and complete census forms; fewer questions to answer; multiple opportunities to respond; improved publicity; and improved procedures for dealing with those who have no “usual” residence) are needed to keep initial response rates about even with those in 1990. Studies from past censuses have shown clearly that responses received by mail are of better quality than those gathered by temporary field staff. So high initial response improves census quality and reduces census cost, freeing resources to deal more effectively with the 33 percent of households likely not to respond initially. Those households that do not respond initially are likely to have attitudes that make finding and including them more difficult. Thus, in the absence of the “Quality Check” operation planned for Census 2000, the Census Bureau believes it would need to implement several additional procedures in an attempt to ensure a complete and accurate census. These activities would require the expenditure of additional funds, even though the National Academy of Sciences, the Inspector General for the Department of Commerce, and many others view these activities as only marginally effective. The Census Bureau would include the following activities because they are the only alternatives to the Quality Check expected to deal with some aspects of the total undercount and the differential participation rates among various segments of the population.

• Conduct Census Bureau followup visits at all vacant housing units (not just a sample of them), which increases requirements for temporary field staff. Added cost: $200 million
• Conduct field followup visits for all incomplete questionnaires (i.e., primarily data omissions), which increases requirements for temporary field staff. Added cost: $150 million
• Expand partnership activities with state, local, and tribal governments and with various community and business groups, which increases the requirements for temporary field staff. Added cost: $25 million to $50 million
• Further expand marketing and publicity activities through increased media placements and Census Bureau outreach activities. Added cost: $50 million to $100 million
• Deploy special activities and tactics (e.g., team enumeration and blanket census tract coverage) not otherwise included in the plan for Census 2000 to assure reaching those segments of the population traditionally difficult to reach and typically among the most undercounted. Added cost: $25 million to $50 million
• Further expand enumerator supervision to assure greater quality in enumeration for nonresponse. Added cost: $25 million to $50 million

Even with these added activities and their associated costs, there is no assurance that the Census Bureau would be able to hold the undercount rate to the levels achieved in 1990: We estimate the undercount would go up to approximately 1.9 percent of the total population. And, it would cost more than alternative approaches from which we would get better results!

l The Quality Check operation is estimated to cost $325 million.
An obvious question is, “Why, if the Quality Check costs $325 million and Alternative 2 does not include the Quality Check, does Alternative 2 cost $300-400 million more than Alternative 3?” The answer is found in understanding that having a response from every household (every address), plus responses from all people living in group quarters (institutional facilities) and all other locations where people without a usual address reside on Census Day (including migrant workers and the people often referred to as “the homeless”) does not guarantee a complete enumeration of the population. Even when the Census Bureau accounted for every known address in the 1990 census, the census failed to include more than 4 million people. Only the Quality Check will find all the people missed by normal decennial census procedures. Thus, in the absence of the “Quality Check” operation planned for Census 2000, the Census Bureau believes it would need to implement several additional procedures in an attempt to ensure a complete and accurate census. These activities would require the expenditure of an additional $500-600 million, the details of which are described in footnote i. The Census Bureau would include these additional activities because they are the only alternatives to the Quality Check expected to deal with some aspects of the total undercount and the differential participation rates among various segments of the population. Adding $500-600 million in new activities, while subtracting $325 million for the Quality Check, adds a net of about $175-275 million to the cost of Alternative 2. In addition, performing the Evaluation Study that will be needed to assess the completeness of Census 2000 in the absence of the Quality Check adds another $125 million to the cost of Alternative 2, for a net increase of $300-400 million when compared with Alternative 3.
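The cost arithmetic in footnotes k and l can be laid out step by step. The sketch below is illustrative: the dollar amounts are taken from the footnote text, while the dictionary keys and function names are hypothetical labels. One observation it surfaces is that the itemized low-end costs sum to $475 million, slightly below the $500 million lower bound the footnote states; the net calculation therefore uses the stated $500-600 million range, which may reflect rounding.

```python
# Illustrative reconstruction of the cost arithmetic in footnotes k and l.
# All amounts are in millions of dollars, taken from the footnote text.

ADDED_ACTIVITIES = {
    "followup at all vacant housing units": (200, 200),
    "followup for all incomplete questionnaires": (150, 150),
    "expanded partnership activities": (25, 50),
    "expanded marketing and publicity": (50, 100),
    "special enumeration activities and tactics": (25, 50),
    "expanded enumerator supervision": (25, 50),
}
STATED_ADDED_RANGE = (500, 600)  # range quoted in the footnote
QUALITY_CHECK = 325              # cost removed from Alternative 2
EVALUATION_STUDY = 125           # cost added to Alternative 2


def itemized_total(activities):
    """Sum the low and high ends of the itemized added costs."""
    low = sum(lo for lo, _ in activities.values())
    high = sum(hi for _, hi in activities.values())
    return low, high


def net_increase(added_range, quality_check, evaluation_study):
    """Net cost change: added activities, minus Quality Check, plus Evaluation Study."""
    return tuple(a - quality_check + evaluation_study for a in added_range)


itemized = itemized_total(ADDED_ACTIVITIES)
before_study = tuple(a - QUALITY_CHECK for a in STATED_ADDED_RANGE)
net = net_increase(STATED_ADDED_RANGE, QUALITY_CHECK, EVALUATION_STUDY)

print(f"Itemized added activities: ${itemized[0]}-{itemized[1]} million")
print(f"Net before Evaluation Study: ${before_study[0]}-{before_study[1]} million")
print(f"Net increase over Alternative 3: ${net[0]}-{net[1]} million")
```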
m This scenario provides a reference point for comparison with earlier decennial census methods; it is not an alternative the Census Bureau would ever propose to use again. It assumes:

• A 1990 census-taking environment in 2000;
• No partnerships with the U.S. Postal Service or local and tribal governments to improve the address list and the maps used to guide temporary Census Bureau employees assigned to visit nonresponding households;
• The continued use of 1990-style census forms designed to make processing easy for computers, not respondents;
• Only one delivery of census forms to each address with no early notices, no reminder post cards, and no replacement forms;
• No opportunity for responding via telephone;
• No opportunity to pick up blank census forms at convenient locations if no form was delivered to the address of the person wishing to respond;
• No use of U.S. Postal Service knowledge about vacant housing units;
• No automated access to data tabulations for customer-specified geographic areas and population groupings; and,
• No “Quality Check” to significantly reduce or eliminate the undercount in each state, each congressional district, and most local and tribal governments, or to significantly reduce or eliminate the differential undercounts among various segments of the population.

n The $4.8 billion figure was estimated in 1992 in response to a question from the General Accounting Office. Although the Census Bureau has learned many things in the intervening five years, this figure has been retained as a benchmark for purposes of comparison. It does not include improvements planned for Census 2000 to counter the ongoing decline in the mail response rate, estimated to reach 55 percent without such improvements.
If the Census Bureau found it necessary to make nonresponse followup visits at 45 percent of all households, instead of the 33 percent expected when using the improved procedures planned for Census 2000, the extra workload and cost would divert significant financial resources away from improved publicity and local/tribal partnership activities, as well as additional special procedures needed to deal with the most reluctant residents at nonresponding addresses. Repeating 1990 census procedures also would require more followup visits to apparently vacant housing units, and other procedures aimed at trying to include the traditionally hard-to-count.

o The cost figure shown is higher than originally estimated because of necessary refinements identified subsequent to excluding these options from consideration for Census 2000.

Source: Bureau of the Census summary chart, footnotes, and attachment (June 18, 1997).

Attachment 1

2000/1990 RATIOS

(Codes = state codes; 1990 Census = 1990 census total; Total = projected 2000 total; White, Black, Am. Ind. (American Indian), and API+Oth. (API + “Other”) are the Not Hispanic columns; Hispanic can be any race.)

Codes  State                 1990 Census     Total     White     Black  Am. Ind.  API+Oth.  Hispanic
       UNITED STATES         248,709,873   1.10423   1.04748   1.14896   1.14548   1.46646   1.40312
63 01  Alabama                 4,040,587   1.10147   1.09159   1.11352   1.06300   1.48703   1.47367
94 02  Alaska                    550,043   1.18771   1.13427   1.22785   1.07502   2.32220   1.67618
86 04  Arizona                 3,665,228   1.30893   1.23871   1.43201   1.22171   1.64231   1.55515
71 05  Arkansas                2,350,725   1.11939   1.11468   1.09301   1.22545   1.47510   1.78114
93 06  California             29,760,021   1.09278   0.91384   1.02155   0.92357   1.44806   1.38473
84 08  Colorado                3,294,394   1.26525   1.22898   1.39307   1.36705   1.60175   1.40029
16 09  Connecticut             3,287,116   0.99910   0.95210   1.12244   0.96790   1.43362   1.34820
51 10  Delaware                  666,168   1.15220   1.10324   1.28455   1.07276   1.59955   1.60449
53 11  District of Columbia      606,900   0.86230   0.91740   0.79814   0.67971   1.19415   1.24699
59 12  Florida                12,937,926   1.17741   1.09811   1.26960   1.18241   1.54617   1.51884
58 13  Georgia                 6,478,216   1.21558   1.15992   1.30216   1.21543   1.80126   1.74751
95 15  Hawaii                  1,108,229   1.13461   1.04357   1.02558   1.15571   1.16397   1.32293
82 16  Idaho                   1,006,749   1.33748   1.30453   1.90003   1.42914   1.55623   1.82060
33 17  Illinois               11,430,602   1.05426   1.00040   1.08302   0.99879   1.40220   1.40206
32 18  Indiana                 5,544,159   1.09025   1.07505   1.15474   1.11493   1.46407   1.42174
42 19  Iowa                    2,776,755   1.04432   1.02733   1.27244   1.19320   1.59977   1.62551
47 20  Kansas                  2,477,574   1.07697   1.04678   1.18499   1.11482   1.46760   1.47784
61 21  Kentucky                3,685,296   1.08392   1.07860   1.09028   1.14045   1.47735   1.48194
72 22  Louisiana               4,219,973   1.04849   1.00561   1.11374   1.02292   1.37983   1.27807
11 23  Maine                   1,227,928   1.02544   1.02216   0.91493   0.97796   1.36832   1.37853
52 24  Maryland                4,781,468   1.10314   1.01334   1.24195   1.10961   1.51274   1.72371
14 25  Massachusetts           6,016,425   1.03030   0.98139   1.20818   0.98018   1.45724   1.51768
34 26  Michigan                9,295,297   1.04127   1.01828   1.10506   1.04919   1.44585   1.28821
41 27  Minnesota               4,375,099   1.10393   1.06980   1.63299   1.25371   1.72112   1.75265
64 28  Mississippi             2,573,216   1.09425   1.08019   1.10752   1.04004   1.54860   1.42941
43 29  Missouri                5,117,073   1.08272   1.06675   1.13972   1.17665   1.43599   1.45827
81 30  Montana                   799,065   1.18846   1.17243   1.54728   1.25065   1.66946   1.68137
46 31  Nebraska                1,578,385   1.08051   1.05432   1.23405   1.21358   1.68420   1.62682
88 32  Nevada                  1,201,833   1.55704   1.44317   1.67091   1.42260   2.07420   2.21791
12 33  New Hampshire           1,109,252   1.10365   1.09617   1.06905   1.14838   1.47377   1.51390
22 34  New Jersey              7,730,188   1.05790   0.97179   1.12152   1.14476   1.66426   1.41281
85 35  New Mexico              1,515,069   1.22793   1.19314   1.23095   1.22370   1.33586   1.27163
21 36  New York               17,990,455   1.00866   0.93419   1.03862   1.03730   1.40773   1.26675
56 37  North Carolina          6,628,637   1.17328   1.15627   1.19086   1.16099   1.74674   1.56188
44 38  North Dakota              638,800   1.03583   1.01661   1.36511   1.25615   1.62593   1.61886
31 39  Ohio                   10,847,115   1.04348   1.02408   1.13818   1.07222   1.41716   1.31574
73 40  Oklahoma                3,145,585   1.07214   1.04141   1.19289   1.10532   1.39829   1.43386
92 41  Oregon                  2,842,321   1.19521   1.15897   1.32502   1.25145   1.58609   1.71517
23 42  Pennsylvania           11,881,643   1.02697   1.00368   1.10125   1.17127   1.48854   1.43937
15 44  Rhode Island            1,003,464   0.99416   0.94914   1.18525   1.16644   1.11675   1.65560
57 45  South Carolina          3,486,703   1.10650   1.09792   1.11148   1.03248   1.39562   1.41838
45 46  South Dakota              696,004   1.11648   1.09946   1.58438   1.20611   1.72261   1.68107
62 47  Tennessee               4,877,185   1.15992   1.14393   1.19369   1.29541   1.69500   1.76137
74 48  Texas                  16,986,510   1.18443   1.09527   1.21758   1.14037   1.55113   1.35377
87 49  Utah                    1,722,850   1.28102   1.24794   1.63259   1.46452   1.75503   1.61397
13 50  Vermont                   562,758   1.09604   1.08778   1.48662   1.08601   1.68444   1.60148
54 51  Virginia                6,187,358   1.13086   1.07650   1.20923   1.08587   1.63047   1.67326
91 53  Washington              4,866,692   1.20377   1.15636   1.22703   1.25074   1.64470   1.67648
55 54  West Virginia           1,793,477   1.02649   1.02270   1.02458   1.01100   1.40708   1.46413
35 55  Wisconsin               4,891,769   1.08883   1.05972   1.31732   1.19069   1.78329   1.44430
83 56  Wyoming                   453,588   1.15678   1.13747   1.28955   1.33115   1.65389   1.33373

Source: Bureau of the Census summary chart, footnotes, and attachment (June 18, 1997).

Appendix III
Comments From the Bureau of the Census

Appendix IV
Major Contributors to This Report

General Government Division
James H. Burow, Assistant Director
Victoria E. Miller, Evaluator-in-Charge
Timothy A. Bober, Senior Evaluator
Jacqueline E. Matthews, Senior Evaluator
Kiki Theodoropoulos, Senior Evaluator (Communications Analyst)
Thomas M. Beall, Technical Advisor
Thomas B. Jabine, Statistical Consultant

National Security and International Affairs Division
Arthur J. Kendall, Senior Mathematical Statistician

Office of General Counsel, Washington, D.C.
Alan N. Belkin, Assistant General Counsel
James M. Rebbe, Attorney-Advisor

Related GAO Products

Addressing the Deficit: Budgetary Implications of Selected GAO Work for Fiscal Year 1998 (GAO/OCG-97-2, Mar. 14, 1997).

High-Risk Series (GAO/HR-97-1 and 97-2, Feb. 1997).

Addressing the Deficit: Updating the Budgetary Implications of Selected GAO Work (GAO/OCG-96-5, June 28, 1996).

Decennial Census: Fundamental Design Decisions Merit Congressional Attention (GAO/T-GGD-96-37, Oct. 25, 1995).

Addressing the Deficit: Budgetary Implications of Selected GAO Work for Fiscal Year 1996 (GAO/OCG-95-2, Mar. 15, 1995).

Decennial Census: 1995 Test Census Presents Opportunities to Evaluate New Census-Taking Methods (GAO/T-GGD-94-136, Sept. 27, 1994).

Decennial Census: Promising Proposals, Some Progress, but Challenges Remain (GAO/T-GGD-94-80, Jan. 26, 1994).

Decennial Census: Test Design Proposals Are Promising, but Fundamental Reform Is Still at Risk (GAO/T-GGD-94-12, Oct. 7, 1993).

Decennial Census: Fundamental Reform Jeopardized by Lack of Progress (GAO/T-GGD-93-6, Mar. 2, 1993).

Transition Series: Commerce Issues (GAO/OCG-93-12TR, Dec. 1992).

Decennial Census: 1990 Results Show Need for Fundamental Reform (GAO/GGD-92-94, June 9, 1992).

1990 Census: Reported Net Undercount Obscured Magnitude of Error (GAO/GGD-91-113, Aug. 22, 1991).

1990 Census Adjustment: Estimating Census Accuracy—A Complex Task (GAO/GGD-91-42, Mar. 11, 1991).

Progress of the 1990 Decennial Census: Some Causes for Concern (GAO/T-GGD-90-44, May 21, 1990).

Critical Issues for Census Adjustment: Completing Post Enumeration Survey on Time While Protecting Data Quality (GAO/T-GGD-90-15, Jan. 30, 1990).

(410022)

Ordering Information

The first copy of each GAO report and testimony is free. Additional copies are $2 each. Orders should be sent to the following address, accompanied by a check or money order made out to the Superintendent of Documents, when necessary. VISA and MasterCard credit cards are accepted, also. Orders for 100 or more copies to be mailed to a single address are discounted 25 percent.

Orders by mail:

U.S. General Accounting Office
P.O. Box 6015
Gaithersburg, MD 20884-6015

or visit:

Room 1100
700 4th St. NW (corner of 4th and G Sts. NW)
U.S. General Accounting Office
Washington, DC

Orders may also be placed by calling (202) 512-6000 or by using fax number (301) 258-4066, or TDD (301) 413-0006.

Each day, GAO issues a list of newly available reports and testimony.
To receive facsimile copies of the daily list or any list from the past 30 days, please call (202) 512-6000 using a touchtone phone. A recorded menu will provide information on how to obtain these lists.

For information on how to access GAO reports on the INTERNET, send an e-mail message with "info" in the body to:

firstname.lastname@example.org

or visit GAO’s World Wide Web Home Page at:

http://www.gao.gov
2000 Census: Progress Made on Design, but Risks Remain
Published by the Government Accountability Office on 1997-07-14.