
2000 Census: Progress Made on Design, but Risks Remain

Published by the Government Accountability Office on 1997-07-14.


                 United States General Accounting Office

GAO              Report to the Ranking Minority Member,
                 Committee on Governmental Affairs,
                 U.S. Senate


July 1997
                 2000 CENSUS
                 Progress Made on
                 Design, but Risks
                 Remain




GAO/GGD-97-142
      United States
GAO   General Accounting Office
      Washington, D.C. 20548

      General Government Division

      B-276531

      July 14, 1997

      The Honorable John Glenn
      Ranking Minority Member
      Committee on Governmental Affairs
      United States Senate

      Dear Senator Glenn:

      This letter responds to your request that we update the information
      provided in our October 25, 1995, testimony on the Census Bureau’s plans
      for the 2000 Decennial Census.1 In that testimony, we summarized the
      work we did in reviewing the results of the 1990 Census, which was the
      most costly in history and which produced data that were less accurate
      than those from the 1980 Census, leaving millions of
      Americans—especially members of minority groups—uncounted. A key
      reason for the increased cost and decreased accuracy was a sharp decline
      in the proportion of households that returned questionnaires by mail,
      causing the Bureau to spend hundreds of millions of dollars to send
      Bureau employees, known as enumerators, to try to collect census
      information directly from individual citizens. We and others concluded
      that the established approach used for taking the census in 1990 had
      exhausted its potential for counting the population cost-effectively and
      that fundamental design changes were needed to reduce census costs and
      improve the quality of the data collected.

      In that testimony, we also detailed how several of the initiatives that the
      Census Bureau was planning for the 2000 Decennial Census were
      consistent with suggestions that we had made since the 1990 Census.
      These initiatives included (1) simplified and streamlined census
      questionnaires, (2) multiple mail contacts to prompt a response,
      (3) increased use of the Postal Service to improve the master address file
      for the census and identify vacant and nonexistent housing units, and
      (4) the use of statistical sampling and estimation procedures aimed at
      reducing the cost and increasing the accuracy of the census.

      However, in that testimony, we also raised concerns that the further the
      Census Bureau proceeded with design plans for conducting the 2000
      Census without input from Congress, the less Congress would be able to
      affect the census design without significant risk of wasted expenditures
      and unacceptable results. In the intervening months, the administration
      has been unable to come to agreement with Congress on critical design
      and funding decisions. In February 1997, we designated the 2000
      Decennial Census a new high-risk area because of the possibility that
      delays could jeopardize an effective census and increase the likelihood
      that billions of dollars could be spent and the nation still be left with
      demonstrably inaccurate census results.2

      1
      Decennial Census: Fundamental Design Decisions Merit Congressional Attention (GAO/T-GGD-96-37,
      Oct. 25, 1995).

                   The objectives of this report are to (1) provide information on the progress
                   that the Bureau has made on its plans concerning the initiatives discussed
                   in our October 1995 testimony and on other initiatives that the Bureau has
                   promoted since the testimony and (2) assess whether the Bureau has
                   demonstrated the feasibility of its plans for carrying out the 2000
                   Decennial Census.


Results in Brief

                   Since our October 1995 testimony, the Census Bureau has continued
                   planning the new design initiatives that we and others have
                   suggested, which are aimed at increasing the mail response rate. A
                   higher mail response rate is important because it will reduce the
                   need for follow-up visits by census enumerators, the most costly,
                   difficult to manage, and error-prone operation in the census. These
                   initiatives include such changes as
                   redesigning the questionnaires used to collect information from the public
                   to make them shorter, simpler, and more user friendly, as well as
                   contacting people by mail more than one time to encourage responses.
                   The Bureau believes that its initiatives in this area will produce a mail
                   response rate of about 67 percent from the nation’s housing units, which
                   would be about 2 percentage points higher than the response rate achieved
                   in 1990 and about 12 percentage points higher than the 55-percent rate the
                   Bureau would expect to achieve without the initiatives.
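
A difference between two response rates can be stated in percentage points or as a relative percent change, and the two are easy to confuse. The following is a minimal sketch in Python that uses only the three rates quoted in this paragraph; the function and variable names are illustrative and do not come from the Bureau.

```python
def point_difference(rate_a: float, rate_b: float) -> float:
    """Difference between two rates, in percentage points."""
    return rate_a - rate_b


def relative_change(rate_a: float, rate_b: float) -> float:
    """Change from rate_b to rate_a, expressed as a percent of rate_b."""
    return (rate_a - rate_b) / rate_b * 100.0


# Mail response rates quoted in the text, in percent.
projected_2000 = 67.0        # expected in 2000 with the initiatives
achieved_1990 = 65.0         # achieved in the 1990 Census
without_initiatives = 55.0   # expected in 2000 without the initiatives

gap_vs_1990 = point_difference(projected_2000, achieved_1990)
gap_vs_baseline = point_difference(projected_2000, without_initiatives)
rel_vs_baseline = relative_change(projected_2000, without_initiatives)

print(f"Versus 1990: {gap_vs_1990:.0f} percentage points")
print(f"Versus no initiatives: {gap_vs_baseline:.0f} percentage points "
      f"({rel_vs_baseline:.1f} percent relative increase)")
```

Read this way, the 12-point gap over the 55-percent baseline corresponds to a relative increase of roughly 22 percent, which is why the unit matters when the rates are compared.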

                   The Bureau also has continued to develop its plan for dealing with those
                   who do not respond by mail and for checking the quality of the results it
                   gets from mail responses and from visits by enumerators. Its current plan,
                   which was put forth in March 1997, is to statistically sample those who do
                   not respond to its mail survey. It plans to do this by directly sampling
                   nonrespondents in every census tract—which is a small geographic area
                   with an average population of about 4,000 people—in the country until it
                   has information on 90 percent of the housing units in each tract. It will
                   then use statistical methods to estimate the data for the remaining
                   10 percent of the housing units by projecting the information it obtains
                   from its nonresponse follow-up sample.

                   2
                   The High-Risk Series (GAO/HR-97-2, Feb. 1997) is a special effort to review and report on the federal
                   program areas we have identified as high risk because of their vulnerability to waste, fraud, abuse, or
                   mismanagement.

The Bureau’s current plan then calls for staff to be sent to gather another
sample of 750,000 housing units and for these independently collected data
to be compared with the information obtained from the preceding data
collection efforts, such as mail-backs and nonresponse follow-up. Again
using statistical methods, the Bureau plans to use the results of its
750,000-household quality check to complete its final population totals.
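
The nonresponse follow-up rule described earlier (visit sampled nonrespondents in each tract until information covers 90 percent of its housing units, then estimate the rest) reduces to simple arithmetic for a single tract. The sketch below is only an illustration under stated assumptions: the tract size of 1,600 housing units and the 1,072 mail returns are hypothetical, the function name is invented, and the handling of a tract whose mail returns already exceed 90 percent is an assumption, since the report does not describe that case here.

```python
def followup_plan(housing_units: int, mail_returns: int,
                  target_percent: int = 90) -> dict:
    """Split one tract's nonrespondents into units visited in the sample
    and units left to statistical estimation (illustrative only)."""
    if not 0 <= mail_returns <= housing_units:
        raise ValueError("mail_returns must be between 0 and housing_units")

    nonrespondents = housing_units - mail_returns
    # Smallest whole number of units that is at least target_percent of the
    # tract; integer ceiling division avoids floating-point rounding.
    target_units = -(-housing_units * target_percent // 100)
    # Assumption: if mail returns already meet the target, no visits are needed.
    sampled = max(0, target_units - mail_returns)
    estimated = nonrespondents - sampled
    return {
        "sampled": sampled,
        "estimated": estimated,
        "sampling_fraction": sampled / nonrespondents if nonrespondents else 0.0,
    }


# Hypothetical tract: 1,600 housing units, 67 percent of which mail back forms.
plan = followup_plan(housing_units=1600, mail_returns=1072)
print(plan)
```

In this hypothetical tract, enumerators would visit 368 of the 528 nonresponding units (about 70 percent of them), and data for the remaining 160 units would be estimated. The lower the mail response rate, the larger the share of nonrespondents that must be visited to reach the same 90 percent.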

The Bureau believes that this approach offers the best combination of
reduced costs, improved accuracy expected at various geographic levels,
and operational feasibility. According to the Bureau’s cost estimates, using
this plan would save between $700 million and $800 million compared with
the cost of a conventional census plan that incorporated all of the new
initiatives proposed for the 2000 Census except those involving sampling and
statistical estimation.

The Bureau developed accuracy estimates by simulating what the results
of the census would likely be under various design options. We reviewed
the Bureau’s simulations and identified several shortcomings in the
methods and assumptions it used for developing and presenting the
accuracy estimates. After the Bureau made several modifications, we
concluded that, if the Bureau’s methods and assumptions were properly
applied, the final data produced in June 1997 should be generally
reasonable for use in projecting the likely effects of the Bureau’s proposed
sampling and statistical estimation initiatives. We recognize that because
the actual census environment and methods in 2000 may vary from those
that were simulated and tested, the Bureau may not actually achieve these
results for the 2000 Census. However, because these types of data are the
best available to reflect the probable effects of an actual census, we
believe it is reasonable to use them to attempt to project the possible
effects of various Bureau design options on the 2000 Census.

The Bureau’s final data showed that, under the Bureau’s March 1997 plan
for the 2000 Census, the relative error in census data (i.e., the measured
error in terms of a percentage of the area’s population) would likely be
about 0.1 percent for the national total and an average of 0.5 percent for
states, 0.6 percent for congressional districts, and 1.1 percent for census
tracts. The Bureau’s simulations projected that a design that did not use
sampling for nonresponse follow-up and a quality check would likely
result in relative error rates of about 1.9 percent for the national total and
average error rates of 1.9 percent for states, congressional districts, and
census tracts.

Because of concerns about the potential effects of sampling at the local
level, we requested that the Bureau provide more detailed data on error
rates at the census tract level. The Bureau’s simulations showed that the
March sampling design plan would likely produce more accurate
population estimates than a conventional design in two-thirds of the
census tracts. However, the remaining one-third of the tracts would
likely have less accurate estimates than under the conventional design.

As of July 1, 1997, the Bureau had not shared the detailed results of its
analysis with Congress, nor had it yet fully demonstrated the operational
feasibility of its current plan to Congress. Although the Bureau has revised
its plan for the 2000 Census several times since the October 1995
testimony, it has provided only limited details that would demonstrate the
relative merits of the various alternatives and allow those outside the
Bureau to scrutinize them.

Citing, in part, the lack of sufficient data on the effects of the Bureau’s
proposed use of sampling and statistical estimation methods, some
Members of Congress have expressed concern about the Bureau’s plan.
They also questioned the use of sampling and statistical estimation on
constitutional and statutory grounds. While a draft of this report was with
the Bureau for comment, Congress enacted legislation (Public Law
105-18) requiring the Department of Commerce to provide detailed data
about the Bureau’s plan by July 12, 1997. It is unlikely that Congress and
the administration will come to an agreement on the design and its
associated funding level for the 2000 Census without continuous, full, and
open disclosure of the effects of the Bureau’s plan at all levels of geography. Thus,
the Bureau needs to continue to keep Congress informed of any changes
in its approach to the census or refinements to its data that it makes after
July 12, 1997.

The Bureau is planning a dress rehearsal for the 2000 Census in 1998 to
demonstrate and test its design features. It is important for Congress and
the administration to reach agreement on the design as soon as possible
before the dress rehearsal so that (1) the Bureau can test what it plans to
implement in 2000, (2) Congress and the Bureau can discuss the
operational feasibility of the plan in terms of the dress rehearsal results,
and (3) Congress and the Bureau can determine whether the dress
             rehearsal outcomes are sufficiently similar to the results of the Bureau’s
             research and simulations to proceed with that design for the census. Time
             and resources could be wasted if the Bureau tests a plan in 1998 that
             Congress later finds it cannot accept.


Background

             The decennial census is the nation’s most comprehensive and expensive
             statistical data-gathering program. The Constitution requires a decennial
             census of the population in order to reapportion seats in the House of
             Representatives. Public and private decisionmakers also use census data
             on population counts and social and economic characteristics for a variety
             of purposes. State and local redistricting; allocations of government
             funding; and many planning and evaluation activities, such as site
             selection for new schools, market research, and evaluations of local labor
             markets, rely on decennial census data. In addition, the census is the only
             national source of detailed population statistics for small geographic
             areas, such as towns or school districts, and for population groups, such
             as Native Americans.

             The Bureau has used short- and long-form questionnaires to carry out the
             decennial census. Most households are sent a short form to complete;
             however, some are asked to complete the long-form questionnaire. In
             1990, for example, about one in six households was required to complete a
             long-form questionnaire. Many federal agencies use information collected
             through the decennial census long-form questionnaire as a source of data
             for their own statistical and programmatic activities.

             Since 1970, the Bureau has used essentially the same methodology to
             count the vast majority of the population during the decennial census. It
             develops an address list of the nation’s housing units and mails census
             forms to those housing units, asking the occupants to mail back the
             completed forms. The Bureau then hires temporary census-takers, known
             as enumerators, by the hundreds of thousands to gather the requested
             information for each nonresponding housing unit.

             A critical factor affecting the cost of a census is the necessity for the
             Bureau to follow up on nonresponding housing units. A declining response
             rate to the census questionnaires has increased the Bureau’s costly
             nonresponse workload. In the 1980 Census, the mail response rate was 75
             percent, 3 percentage points lower than it was in the 1970 Census. In the
             1990 Census, the response rate dropped to 65 percent, 10 percentage
             points lower than it was in 1980. According to Bureau officials, if the
downward trend in public cooperation continues without changes to the
Bureau’s methods for soliciting responses, the mail response rate could be
as low as 55 percent in 2000 and generate a potential nonresponse
workload of about 53 million cases, a substantial increase over the 1990
nonresponse workload of about 34 million cases.
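
The trend in this paragraph can be tabulated directly from the figures quoted. The short Python sketch below uses only numbers from the text; the 1970 rate of 78 percent is derived from the statement that the 1980 rate of 75 percent was 3 percentage points lower than in 1970, and the 2000 values are the Bureau's projection if its methods do not change. Variable names are illustrative.

```python
# Mail response rates in percent. The 1970 value is derived (75 + 3);
# the 2000 value is the Bureau's projection without changes in methods.
response_rates = {1970: 78, 1980: 75, 1990: 65, 2000: 55}

years = sorted(response_rates)
# Decline in percentage points from each census to the next.
drops = {
    later: response_rates[earlier] - response_rates[later]
    for earlier, later in zip(years, years[1:])
}

# Nonresponse follow-up workload, in millions of cases.
workload_1990 = 34
workload_2000_projected = 53
added_cases = workload_2000_projected - workload_1990
percent_increase = added_cases / workload_1990 * 100

print(drops)
print(f"{added_cases} million more cases "
      f"({percent_increase:.0f} percent above the 1990 workload)")
```

On these figures, the projected workload would be 19 million cases larger than in 1990, an increase of more than half.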

Since 1970, census costs have been increasing faster than inflation, even
after allowing for population growth. In 1990 constant dollars, total census
cycle costs were $0.7 billion in 1970, $1.8 billion in 1980, and $2.6 billion in
1990.3 Furthermore, the cost per housing unit jumped from $11 in 1970, to
$20 in 1980, and to $25 in 1990. The Bureau estimated that, if census-taking
methods were not changed, the 2000 Census could cost almost $5 billion
(in 2000 dollars).

Unfortunately, the nation’s growing investment in the census has not
resulted in uniformly more accurate results. Since the Bureau began
evaluating census coverage in 1940, it has documented a net undercount,
which is the difference between the estimated population and the census
count. Figure 1 shows the net undercount for each census since 1940; note
that the undercount decreased for each subsequent census until it
increased for the 1990 Census.




3
 Constant-dollar value is measured in terms of prices for a base period, to remove the influence of
inflation. The resulting constant-dollar value is the value that would exist if prices had remained the
same as in the base period.







Figure 1: The Net Undercount Since 1940

Decennial census   Net millions of persons missed
1940               7.5
1950               6.5
1960               5.7
1970               5.7
1980               2.8
1990               4.7

Source: Bureau of the Census estimates of net undercounts based on demographic analysis and
derived largely from administrative data, such as birth and death records, as of June 1991.




                                     The net undercount masks an even larger gross error in the census. The
                                     1990 Post Enumeration Survey (PES) provided a greater level of detail on
                                     such errors than is possible using demographic analysis.4 While the net
                                     undercount, as measured by the PES, was about 1.6 percent of the
                                     population (about 4 million persons) in 1990, this does not mean that over
                                     98 percent of the population was accurately counted, as is often reported.
                                     In fact, the number of persons missed in the 1990 Census was partially
                                     offset by millions of persons who were improperly included. The Bureau
                                     estimated that about 6 million persons were counted twice in the 1990
                                     Census, while 10 million were missed. The sum of these
                                     numbers—16 million—represents a minimum tally of gross errors since
                                     they do not include other errors, such as persons assigned to the wrong
                                     locations.
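
The distinction between net undercount and gross error in the paragraph above comes down to whether the two kinds of error offset or add. The following is a minimal sketch in Python using the Bureau's 1990 estimates quoted in the text; the class and field names are illustrative, and the gross figure is a minimum because, as noted, it excludes other errors such as persons assigned to the wrong locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageErrors:
    """Coverage errors for one census, in millions of persons."""
    missed: int          # persons who should have been counted but were not
    counted_twice: int   # persons improperly included as duplicates

    @property
    def net_undercount(self) -> int:
        # Duplicates offset omissions in the net figure.
        return self.missed - self.counted_twice

    @property
    def gross_error(self) -> int:
        # Every omission and every duplicate is an error in the gross figure.
        return self.missed + self.counted_twice


census_1990 = CoverageErrors(missed=10, counted_twice=6)
print(f"Net undercount: {census_1990.net_undercount} million")
print(f"Gross error (minimum): {census_1990.gross_error} million")
```

The net figure of 4 million is a quarter of the 16 million gross errors, which is why a small net undercount does not imply that the rest of the population was counted accurately.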



                                     4
                                      The 1990 PES was designed to estimate the net undercount in the census. It was a matching study in
                                     which the Bureau interviewed a sample of households several months after the census. The results of
                                     these interviews were compared with census questionnaires to determine whether each person was
                                     correctly counted in the census, missed, or included in error.







              Even more troubling, the Census Bureau’s evaluations showed a persistent
              differential undercount of minority groups. The decennial census has not
              counted all population groups and areas in the United States equally well.
              The 4.4 percentage point difference in the 1990 net undercount between
              blacks (5.7 percent) and nonblacks (1.3 percent) was the highest since the
              Bureau began estimating coverage in the 1940 Census.

              After the 1990 Census, we, Congress, the Department of Commerce and its
              Office of the Inspector General, the Bureau itself, and other stakeholders,
              such as advisory committees to the Bureau, all recognized the need to
              reassess the conventional census-taking approach to ensure the
              achievement of a more accurate and cost-effective census in 2000. The
              Bureau evaluated the results of the 1990 Census to develop new,
              cost-saving approaches that could improve accuracy for the next census.
              For example, the single most expensive component of the census was the
              operation to gather data on nonrespondents, which took 14 weeks, rather
              than the 6 weeks originally scheduled. The final 6 weeks were devoted just
              to resolving the last 10 percent of the nonresponse cases. Furthermore,
              evaluations demonstrated that the amount of error in the census increased
              precipitously as time and effort were extended to count the last few
              percent of the population, with the Bureau ultimately accepting
              whatever information could be obtained using “last resort” or “closeout”
              procedures, such as interviews with neighbors, mail carriers, or other
              persons who were not residents of the nonresponse households. The
              National Academy of Sciences also reexamined the conventional census
              design and proposed alternatives for 2000. The general conclusion from
              these extensive efforts was that fundamental changes were needed in the
              census design to accurately and economically account for the U.S.
              population in the next census.


Scope and Methodology

              To provide information on the progress that the Bureau has made on its
              plans concerning the initiatives discussed in our October 1995 testimony
              and on other initiatives that the Bureau has proposed since the testimony
              and to assess the feasibility of the Bureau’s plans for carrying out the 2000
              Decennial Census, we (1) reviewed Bureau research, evaluation, and
              planning documents and data produced since our October 1995 testimony;
              (2) interviewed Bureau, Department of Commerce, and Office of
              Management and Budget (OMB) officials; and (3) reviewed a
              September 1996 report of the House Committee on Government Reform
              and Oversight entitled Sampling and Statistical Adjustment in the
              Decennial Census: Fundamental Flaws, our work on prior decennial
census activities, and various reports prepared by the National Academy
of Sciences related to planning for the 2000 Census. This report provides a
summary of our work and its results. It includes a technical appendix
(app. I) that explains in more detail the Bureau’s sampling and statistical
estimation initiatives and our analysis of their effects. It also includes an
appendix (app. II) presenting the Bureau’s most recent summary
information and explanation of projected differences in the costs and
accuracy of selected census designs.

In assessing the Bureau’s plans for its statistical sampling and estimation
initiatives, we began with the data and research that the Bureau provided,
mostly from simulations that the Bureau performed using 1990 Census
data and results from the 1995 Census Test. Because it was impractical for
us to verify the underlying census data the Bureau used, we accepted the
census data in the Bureau’s analysis without further verification. However,
we challenged the Bureau’s analysis in several respects, including the
methods and assumptions for producing the data and the way they were
presented. In response to our questions, the Bureau made several
modifications, most of which were to clarify the data and assumptions.
Furthermore, to produce the final data comparing the design alternatives
included in appendix II, the Bureau redid the simulations after detecting
shortcomings in the draft data. To ensure accuracy, the Bureau had two
groups work independently on the simulations and reconciled their
differences before providing the final data to us. Because the actual
census environment and methods in 2000 may vary from those that were
simulated and tested, the Bureau may not actually achieve these results for
the 2000 Census. However, because these types of data are the best
available to reflect the probable effects of an actual census, we believe it is
reasonable to use them to attempt to project the possible effects of various
Bureau design options on the 2000 Census.

It is important to note that the Bureau, which will continue its research
efforts until the 2000 Census begins, is still refining, modifying, and testing
elements of its proposed census design. For example, the Bureau was in
the process of revising and verifying its cost estimates when we completed
our work. Therefore, we relied on the most recent available cost data,
those from June 1997, in the final version of this report, but the Bureau’s
estimates are still subject to revision. Finally, because all initiatives of the
Bureau’s current plan for the 2000 Census are closely related, none of the
initiatives should be considered in isolation from the rest of the census
design.








                             We did our work in Washington, D.C., and at the Bureau’s headquarters in
                             Suitland, MD, between October 1995 and June 1997 in accordance with
                             generally accepted government auditing standards. We requested
                             comments on a draft of this report from the Secretary of Commerce. We
                             received comments from the Director of the Bureau of the Census (see
                             app. III), which we address at the end of this letter.


Obtaining Responses to Census Questionnaires Presents a Formidable Challenge

                             Declining rates of public response to census questionnaires have
                             generated a costly, time-consuming workload for the Bureau. The key to a
                             successful census as measured in terms of cost and data quality is
                             obtaining mail responses to census questionnaires from residents of
                             housing units. The 65-percent mail response rate for the 1990 Census was
                             troublesome to the Bureau because of the extensive follow-up effort
                             required to obtain information from nonresponding housing units. In our
                             October 1995 testimony, we reported the status of new Bureau initiatives
                             that were aimed at obtaining responses to census questionnaires. These
                             initiatives included the creation of simplified and streamlined census
                             questionnaires and the use of multiple mail contacts to prompt a response.
                             Subsequently, the Bureau announced the development of a new outreach
                             and promotion program to encourage public cooperation. The Bureau
                             expects that, when combined, these initiatives should produce a mail
                             response rate of 66.9 percent. This is lower than the 70-percent response
                             rate that the Bureau expected to receive in the 1990 Census and only
                             slightly higher than the 65-percent response rate achieved in 1990.
                             However, it is 12 percentage points higher than the rate the Bureau
                             expects to achieve without the initiatives. Bureau officials did not quantify the
                             individual effect of these initiatives, stating that their effects were too
                             interrelated to be measured separately.


Simplified and Streamlined Questionnaire Has Potential to Improve Mail Response Rates

                             Since the 1990 Census, the Bureau has been working to simplify and
                             streamline the short-form questionnaire. As of February 1997, the draft
                             short-form questionnaire that the Bureau plans to use for the 2000
                             Decennial Census contained eight questions, which is six questions fewer
                             than were on the form used in 1990. The new short-form questionnaire
                             now asks only for the name, age, gender, race, ethnicity, relationship of
                             each household member, and housing tenure (owned or rented). Over the
                             years, we have strongly suggested such an abbreviated form. The Bureau
                             also has been simplifying the long-form questionnaire that asks for more
                             detailed sociodemographic, economic, and housing information. As in the
                             1990 Census, the Bureau plans to ask one in six housing units to complete
a long form. Although the final design is still being evaluated, the Bureau
expects the 2000 Census long-form questionnaire to have fewer questions
than the 1990 Census long-form questionnaire had.

The 1995 Census Test evaluated one short-form and several long-form
questionnaires of different lengths. The long-form questionnaires ranged in
length from 16 to 53 questions, with even the longest version including 11
fewer questions than did the 1990 long-form questionnaire. The 1995
Census Test showed that, the shorter the questionnaire, the more likely
housing units were to respond. During the test, the response rate for
short-form questionnaires was 55 percent, whereas the response rate for
the three versions of the long form ranged from 38.1 percent to
46.8 percent, with the longest questionnaire having the lowest response
rate.

During 1996, the Bureau conducted the 2000 Census Test to help it
determine which specific question wording, formatting, and sequencing
would elicit the most accurate responses. It also tested alternative-form
designs and assessed the differences in coverage, completeness, and
cooperation. As part of this test, short-form questionnaires were sent to
42,000 housing units, and various versions of the long form were sent to
52,500 housing units. The Bureau released preliminary test results in
December 1996 that indicated no statistical difference in the housing units’
responsiveness to the various lengths of either the short-form or long-form
questionnaires. However, the long-form questionnaire had a lower average
response rate (65 percent) than did the short-form questionnaire
(72 percent).

The Bureau is continuing to work on the design of the census
questionnaires with the assistance of marketing and survey design
consultants, giving consideration to the printing, mailing, and processing
of a large volume of questionnaires. Bureau officials told us that several
visual design issues, such as illustrations on the census form that are
aimed at promoting response, are still to be resolved.

As reported in our 1995 testimony, although the Bureau continues to
progress in simplifying the census questionnaires, it has not gained
consensus among policymakers and other stakeholders on the content of
the questionnaires, their ultimate length, or the use of a long form.
Although some Members of Congress have raised questions about the
length of the short-form questionnaire and the need for the long-form
questionnaire, the Bureau plans to use both the short- and long-form
                           questionnaires unless formally directed not to do so by Congress. In the
                           meantime, demands on the Bureau for data collection are increasing. For
                           example, during 1996, the Welfare Reform Act was enacted, which, among
                           other things, mandates that the Census Bureau collect data on
                           grandparents as primary caregivers for their grandchildren. The Bureau is
                           currently proposing to add two questions to the long-form questionnaire to
                           comply with the act.


Multiple Mail Contacts Should Promote Increased Response to Questionnaires

                           We first suggested the use of a multiple mail contact strategy after we
                           analyzed the results of the 1980 Census. The Bureau’s initiative for
                           multiple mail contacts consisted of four household contacts: a pre-notice
                           letter, an initial questionnaire, a thank you/reminder card, and a
                           replacement questionnaire. Bureau evaluations during 1995 showed that
                           multiple mail contacts should increase mail response rates. Precise
                           percentages could not be determined, however, because Bureau officials
                           could not determine which part of the multiple mail contact initiative
                           prompted the response. For example, some housing units may have
                           returned the original census questionnaire because they received a thank
                           you/reminder card, while others may have returned the questionnaire
                           without receiving the cards. The Bureau’s test of its initiative was not
                           designed to permit an analysis of this nature.

                           Nevertheless, during testing in 1995, about 7 percent of housing units
                           responded to the questionnaires using the replacement questionnaires,
                           indicating that multiple mail contacts prompted responses. Considering
                           that follow-up on each 1 percent of nonresponding housing units is
                           expected to cost about $25 million, increasing the response rate through
                           multiple mail contact could produce significant savings. Furthermore,
                           because questionnaires are projected to be mailed to, and returned by
                           mail from, approximately 97.3 million housing units in 2000, the cost of
                           multiple mail contacts could be significant. It may still be a worthwhile
                           investment, since it may free the Bureau from having to do follow-up
                           visits to approximately 6.8 million housing units (about 7 percent of
                           those mailed).
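
The arithmetic behind these figures can be checked with a short calculation. The sketch below is illustrative only. It uses the report's figures (about 7 percent of housing units responding by replacement questionnaire, about $25 million of follow-up cost per 1 percent of housing units, and about 97.3 million mailed housing units) and assumes that follow-up cost scales linearly with the share of units needing follow-up. The function and constant names are hypothetical, and the resulting dollar figure is derived here, not stated in the report.

```python
# Figures quoted in the report. Linear cost scaling is an assumption
# made for illustration, not a Bureau cost model.
COST_PER_POINT_MILLIONS = 25.0     # follow-up cost per 1 percent of housing units
MAILED_UNITS_MILLIONS = 97.3       # housing units projected for mailout/mailback
REPLACEMENT_RESPONSE_POINTS = 7.0  # percent responding by replacement questionnaire


def followup_avoided(points,
                     units_millions=MAILED_UNITS_MILLIONS,
                     cost_per_point=COST_PER_POINT_MILLIONS):
    """Return (housing units in millions, cost in $ millions) of follow-up avoided."""
    units = units_millions * points / 100.0
    cost = cost_per_point * points
    return units, cost


units, cost = followup_avoided(REPLACEMENT_RESPONSE_POINTS)
print(f"{units:.1f} million housing units, about ${cost:.0f} million in follow-up cost")
```

Under these assumptions, the unit count reproduces the report's figure of approximately 6.8 million housing units. The cost of the additional mailings themselves is not modeled.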

                           According to Bureau officials, only a limited number of printing/mailing
                           vendors who are technically qualified have shown interest in bidding on
                           the Bureau’s 2000 Decennial Census questionnaire printing contract.
                           Because of the large volume of printing/mailing involved and the remailing
                           time constraints, most of these interested printing vendors did not believe
                           that a replacement questionnaire could be implemented under the
                           Bureau’s requirements. Therefore, the Bureau changed its initiative to
                              include sending all housing units a replacement questionnaire, although it
                              plans to continue discussing less costly and less duplicative methods with
                              vendors for possible use in the 2000 Census. Bureau officials, with
                              assistance from the Government Printing Office, plan to resolve all
                              questionnaire printing and operational issues and award a printing
                              contract to a single contractor or a consortium of printing contractors by
                              December 1998.


Increased Use of the Postal Service Provides Opportunity for Savings

                              As we testified in October 1995, the Bureau is working with the Postal
                              Service and local communities to maintain and update its address list.
                              Furthermore, the Bureau is planning to use the Postal Service to identify
                              vacant and nonexistent housing units early in the census-taking process to
                              improve data quality and reduce costly nonresponse follow-up. Through
                              greater reliance on the Postal Service and local communities in updating
                              its address list, the Bureau estimated that it could save as much as
                              $188 million in the 2000 Census. The Bureau also estimated that the use of
                              the Postal Service to identify vacant and nonexistent housing units could
                              reduce nonresponse workload by about 6 percent in the 2000 Census and
                              thereby save an additional $135 million. The Census Bureau has tested and
                              evaluated the use of a combined Census Bureau and Postal Service master
                              address file and is now in the process of updating the file for 2000.

                              As we stated in 1995, a geographically structured address list is critical for
                              planning the 2000 Decennial Census because such a list will enable
                              enumerators to physically locate the addresses of housing units and
                              determine where they may be missing housing units. As of May 1997, the
                              Bureau had updated 85 percent, or 84 million, of the currently known
                              universe of about 99 million city-style addresses on its mapping system.5
                              Rural addresses, which the Bureau projected will number about
                              21.3 million in 2000, are to be identified as they were in past censuses—by
                              having Bureau employees canvass rural areas to determine addresses for
                              and geographically locate housing units. Postal Service information is not
                              designed to geographically locate these housing units.

                              The Bureau continues to keep pace with its planned schedule for updating
                              addresses and has determined that March 1999 will be critical for
                              completing the address list. At that time, local governments are expected
                              to review the list and provide updated information as the Bureau prepares
                              for mailing census questionnaires in spring 2000. According to Bureau
                              officials, a special effort will be needed because the last 10 to 20 percent of
                              the addresses will be the most difficult for the Bureau to identify, owing
                              to poor-quality reference sources. For example, an address zoned for a
                              single-family residence may have been converted to contain two or more
                              households.

                              5 The Postal Service will not deliver census questionnaires to all of these city-style addresses;
                              some questionnaires will be hand-delivered by enumerators and returned through the mail by
                              respondents.


Outreach and Promotion Program Could Encourage Public Response to Census, but to What Extent Is Unclear

                             In February 1996, the Bureau unveiled its plans for a new outreach and
                             promotion initiative, which is expected to cost about $230 million, for
                             encouraging public response to the census. While not discussed in our
                             October 1995 testimony, this initiative goes hand in hand with other
                             initiatives discussed in this report because of its potential impact on the
                             response rate to, and the cost of, the census. One key feature of the
                             outreach and promotion initiative is cooperative ventures with local
                             governments aimed at involving elected local officials, business leaders,
                             minority groups, religious organizations, and others in developing
                             outreach activities within local communities. The Bureau’s 1995 Census
                             Test indicated that cooperative ventures with local governments provided
                             a way to promote public participation in the census. However, even
                             though local communities were enthusiastic about participating in the
                             Bureau’s outreach efforts, local governments in the urban areas where
                             response rates are lowest reported that they lack funding to promote the
                             Bureau’s initiative. As was the case in the 1990 Census, the Bureau’s plans
                             do not include funding for these cooperative ventures, and therefore the
                             level of local government involvement in the 2000 Census is unclear.

                             Another feature of the Bureau’s outreach and promotion initiative is the
                             targeting of certain populations and geographic areas that historically have
                             been undercounted, such as inner-city populations. The targeted
                             methods include the use of community-based outreach
                             organizations and the use of unaddressed questionnaires that will not be
                             sent to housing units but rather will be made available at locations
                             throughout a community, such as in community centers and convenience
                             stores. The 1995 Census Test evaluations of targeted methods showed that
                             the use of community-based organizations provides valuable assistance to
                             outreach and promotion efforts in hard-to-enumerate areas. As a result of
                             the test evaluation, the Bureau concluded that the use of unaddressed
                             questionnaires increased response rates by a small percentage, especially
                             from those who may not otherwise have completed a questionnaire.

The third feature of the outreach and promotion initiative is the Bureau’s
plan to contract with a private-sector advertising firm to promote the 2000
Census. The Bureau estimates that this paid advertising contract will cost
about $100 million of the $230 million budgeted for the outreach and
promotion program. This paid advertising program will end a 50-year
partnership between the Bureau and the Advertising Council, which
provided pro bono promotional services valued at $65 million for the 1990
Census.6

By using paid advertising, Bureau officials said that they expect to have
more control over the placement of advertising to reach targeted
populations. For example, under pro bono promotional services, the
Bureau had no control over the time of day that advertising was aired. In
contrast, under a paid advertising program, the Bureau can decide when
advertising “spots” should be aired, including during “prime time” and
popular television programs. The Bureau is still in the research and
development phase of its paid advertising campaign and plans to spend
$450,000 in fiscal year 1997 for focus groups to develop a census message
and image acceptable to the majority of the population, as well as specific
targeted population groups. The Bureau plans to incorporate the results of
these focus groups into the final contract for the advertising firm that is
selected to carry out the overall 2000 Census promotional campaign. The
Bureau plans to award the advertising contract in September 1997. If
funding permits, Bureau officials said they plan to evaluate their
full-treatment advertising campaign in the 1998 dress rehearsal.

Although Bureau officials acknowledged that a direct link between
investment in advertising and a corresponding increase in mail response
rates cannot be proven, these officials said that they intuitively believe that
advertising aids response by making people aware of the census.
Nevertheless, the Bureau’s own research found that, although about
93 percent of the public was aware of the 1990 Census, the mail response
rate was only 65 percent. In a related example of civic response, although
most U.S. citizens were aware of the 1996 presidential election, and
despite massive advertising by candidates and political parties, as well as
through public service announcements, only 49.7 percent of the voting age
population exercised their right to vote.

Although the Bureau expects the use of outreach and promotion to
encourage participation, especially in the hard-to-enumerate areas, we are
concerned that the Bureau’s funding plans may not bring the high
response rates hoped for because of other, larger demographic, economic,
and attitudinal variables in our society that cannot be easily overcome. For
example, in the 1990 Census, the Bureau planned for a 70-percent mail
response rate but achieved only a 65-percent rate.

6 The Advertising Council, Inc., is a nonprofit organization responsible for administering public service
advertising campaigns for television, radio, and print media.


The Potential Effects of Statistical Sampling and Estimation on Census Accuracy and Cost

                          The Bureau intends to expand the use of statistical sampling and
                          estimation procedures in the 2000 Decennial Census to reduce the time
                          and cost required to follow up on housing units that do not respond to
                          census questionnaires and to improve the accuracy of the population
                          count through the use of integrated coverage measurement (ICM)
                          procedures. ICM is a statistical procedure that is designed to improve the
                          accuracy of the census count by reconciling the original census counts
                          with data obtained from an independent sample of housing units and using
                          the results to adjust the census. While the Bureau used sampling and
                          statistical estimation procedures in past censuses, its current plan for the
                          2000 Census would greatly expand reliance on such procedures in
                          producing the final census totals.

                          In 1992, after comprehensively studying the 1990 Census, we
                          recommended that the Bureau consider using statistical sampling to
                          develop information on nonrespondents in an effort to achieve significant
                          cost-savings.7 In our October 1995 testimony, we again noted that sampling
                          could both improve the accuracy of census data on nonrespondents and
                          save money.8 However, we also testified that the Bureau must be prepared
                          to provide policymakers with data on the trade-offs between the accuracy
                          and potential cost-savings of sampling. In addition, we noted that, if the
                          Bureau were to use sampling and statistical estimation procedures in the
                          form of ICM to adjust for undercounting, such procedures must be reliable.
                          We were concerned that errors introduced by sampling nonrespondents
                          and using ICM would overshadow the benefits that these procedures could
                          provide when applied to smaller geographic levels, such as census tracts.9

                          7 Decennial Census: 1990 Results Show Need for Fundamental Reform (GAO/GGD-92-94, June 9, 1992).
                          8 GAO/GGD-T-96-37, October 25, 1995.
                          9 A census tract is a small, relatively permanent statistical subdivision of a county. Census tracts
                          usually have between 2,500 and 8,000 persons, averaging about 4,000, and, when first delineated, are
                          designed to be homogeneous with respect to population characteristics, economic status, and living
                          conditions. Census tracts do not cross county boundaries.

                          In December 1996, the Bureau began providing us with draft results of its
                          research using 1990 Census data that were applied to different design
                          options, and these results were finalized by June 1997. The Bureau did not
                           release all of these results publicly. The results showed that the Bureau’s
                           plan for statistical sampling and estimation, if effectively implemented, has
                           the potential for producing a more accurate and less costly census than if
                           only conventional census procedures were used. The following sections
                           summarize the Bureau’s general development of sampling for nonresponse
                           follow-up and ICM since our 1995 testimony and provide a view of the
                           projected combined effects of these initiatives to produce the census
                           count. A fuller and more technical description of the purpose, strategies,
                           and research on the Bureau’s alternative designs for sampling for
                           nonresponse follow-up and ICM is included in appendix I.


Sampling for Nonresponse Follow-Up Could Reduce Cost and Save Time

                           Declining rates of public response to the census questionnaires have
                           generated a costly, time-consuming nonresponse follow-up workload for
                           the Bureau. The Bureau had to follow up on about 34 million
                           nonresponding housing units in 1990. Although the Bureau plans many
                           initiatives for the 2000 Census to encourage higher mail response rates and
                           reduce its reliance on expensive enumerator visits to every nonresponding
                           housing unit, the nonresponse workload is still expected to be greater than
                           it was in 1990. Therefore, after trying to directly contact every housing unit
                           by providing a census form and requesting a response, the Bureau plans to
                           sample a portion of the nonresponding housing units, rather than
                           continuing with 100-percent follow-up as it did in 1990. The primary
                           purposes of using sampling for nonresponse follow-up are to reduce the
                           cost and time required to finish the census operation. In June 1997, the
                           Bureau estimated that its current plan for sampling nonresponding
                           housing units in 2000 could save about $400 million off the cost of using a
                           census design that incorporated all other improvements on the 1990 model
                           except for sampling for nonresponse. Research results also indicated that
                           sampling can help the Bureau to complete nonresponse follow-up
                           operations in a much more timely manner than was the case in past
                           censuses.

                           Since our October 1995 testimony, the Bureau has twice revised its plan
                           for sampling a portion of the housing units that do not mail back a
                           completed census questionnaire. Under its original plan, which was
                           officially presented to the public in February 1996, the Bureau intended to
                           continue conventional mail response data collection and follow-up
                           interviews until information was obtained for 90 percent of the housing
                           units in each county. The Bureau then planned to truncate, or stop,
                           conventional follow-up, select a sample of 1 in 10 of the remaining housing
units to interview, and rely on information obtained from interviewing the
sample housing units to produce census data that would be substituted for
all of the remaining nonrespondents.

After this plan was announced, some Members of Congress and other
stakeholders and observers raised concerns about the effect of
county-level truncation on the accuracy and fairness (equity) of census
data. One of the concerns regarding county-level truncation was that the
90-percent threshold would be achieved primarily by more complete
enumeration of the areas and population groups within a county that were
easiest to count, leaving hard-to-count areas and subgroups of the
population, especially minorities, to be disproportionately covered by
sampling. In response, the Bureau decided, in September 1996, that its
preferred design would be to base sampling for nonresponse on mail
response rates at the census tract level while maintaining the goal of
obtaining responses from at least 90 percent of all housing units before
relying on sample data to account for the remaining units. This change
should improve overall accuracy and fairness because tracts are generally
smaller and more homogeneous than counties. However, tract-level
truncation will pose some additional challenges in the management and
implementation of census field operations, not the least of which is the
difficulty of tracking and controlling operations for over 60,000 separate
tracts instead of for about 3,000 counties.

The Bureau studied the following three options to determine how best to
implement its revised plan: (1) truncation at 90 percent, (2) time
truncation, and (3) direct sampling. Truncation at 90 percent featured the
same basic design as the original February 1996 plan, except that
completion rates would be tracked for each census tract rather than for
each county. Under this option, the Bureau would continue conventional
follow-up interviews until it had achieved the 90-percent completion
threshold for each tract and would then begin sampling to account for the
remaining nonrespondents. Under the time truncation option,
conventional follow-up interviews would continue for a predetermined
length of time, such as 3 weeks. After this initial follow-up period, the
Bureau would select a sample of the remaining housing units in each tract
that included enough housing units to raise the completion rate to at least
90 percent. For example, in a tract for which the Bureau achieved a
70-percent completion rate after this initial follow-up period, two of every
three of the remaining housing units would be selected for the follow-up
sample. Under the direct sampling option, there would be no conventional
follow-up phase. Instead, at the end of the mail response phase, the
                             Bureau would select a sample of the remaining nonresponding housing
                             units in each tract that would be sufficient to reach at least a 90-percent
                             completion rate and would then project the remaining 10 percent. For
                             example, in a tract with a mail response rate of 30 percent, six of every
                             seven of the remaining housing units would be sampled. Under each of the
                             three options, the Bureau would take a 1-in-10 sample of the remaining
                             housing units for any tract with an initial response rate of more than
                             90 percent.
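
The two worked examples above (two of every three remaining units at a 70-percent completion rate, six of every seven at a 30-percent mail response rate) follow from one relationship: the sample must cover enough of the remaining units to lift a tract to 90 percent. The sketch below is a minimal illustration of that relationship, not the Bureau's sample-selection procedure. The function and constant names are hypothetical, and treating a tract at exactly 90 percent the same as one above 90 percent is an assumption.

```python
from fractions import Fraction

TARGET = Fraction(9, 10)               # 90-percent completion goal for each tract
HIGH_RESPONSE_RATE = Fraction(1, 10)   # 1-in-10 sample once a tract is past the goal


def sampling_fraction(completion_rate):
    """Share of a tract's remaining nonresponding units to select for follow-up.

    completion_rate is the share of housing units already accounted for,
    passed as a Fraction (or a string such as "0.7") to keep the result exact.
    """
    completion_rate = Fraction(completion_rate)
    if not 0 <= completion_rate < 1:
        raise ValueError("completion_rate must be at least 0 and less than 1")
    if completion_rate >= TARGET:
        # Assumption: a tract at exactly 90 percent is treated like one above it.
        return HIGH_RESPONSE_RATE
    # Units still needed to reach the goal, divided by units still outstanding.
    return (TARGET - completion_rate) / (1 - completion_rate)


print(sampling_fraction(Fraction(7, 10)))    # time truncation example from the text
print(sampling_fraction(Fraction(3, 10)))    # direct sampling example from the text
print(sampling_fraction(Fraction(95, 100)))  # tract already above 90 percent
```

Using exact fractions avoids floating-point rounding, so the results can be compared directly with the "two of every three" and "six of every seven" figures in the text.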

                             The Bureau’s research indicated that all three options had the potential to
                             (1) produce results with similar accuracy; (2) reduce the cost of follow-up
                             operations at least somewhat when compared to the cost of 100-percent
                             follow-up; and perhaps most important, (3) allow the Bureau to complete
                             nonresponse operations in time to also implement and complete ICM. Of
                             the three nonresponse sampling options the Bureau considered, direct
                             sampling appeared to produce the greatest benefits in terms of cost,
                             accuracy, and operational feasibility. Therefore, in March 1997, the Bureau
                             selected the direct sampling design option. The Bureau estimated the cost
                             of implementing direct sampling to be about $200 million less than
                             truncating at 90 percent and $600 million less than using the time
                             truncation option. In simulations of the accuracy of different options,
                             direct sampling produced slightly better results, particularly for small
                             geographic areas, such as census tracts. This option was also favored by
                             Bureau field staff because it is simpler to implement. However, direct
                             sampling differs the most from the plan the Bureau originally proposed
                             and would mean that nonresponding housing units that were not selected
                             as part of the sample would not have another chance to be interviewed by
                             census enumerators. This option, therefore, may be somewhat more
                             difficult for the public to understand and accept.


ICM Could Improve the Accuracy of the Population Totals

                             The purpose of ICM is to reduce coverage error in the census, particularly
                             the differential undercount of minorities and other hard-to-enumerate
                             populations and areas. Even if all other redesign initiatives produce the
                             anticipated improvements in the conventional census counting operations,
                             the evidence from past census evaluations indicates that coverage errors
                             and differential net undercounts in the census data will still occur.
                             Evaluations have demonstrated that the census misses entire housing
                             units (and any occupants), misses people within housing units that it does
                             count, and includes other persons in the census counts in error (e.g.,
                             counting them more than once or in the wrong place). Young adult males,
                             members of ethnic and racial minorities, renters, and people living in rural
areas, among others, are more likely than other categories of residents to
be undercounted by the census. Evaluations of the 1990 Census also
showed that error rates were significantly higher for persons living in
housing units enumerated using “last resort” or “closeout” procedures at
the end of nonresponse follow-up operations, when the Bureau accepted
information from persons who were not residents of the households.

ICM is designed to use the results of a coverage measurement survey to do
a quality check in which enumerators visit an independent sample of
households to check the accuracy of the original census data. ICM would
be conducted after basic data collection, including nonresponse follow-up,
had ended and would estimate the extent to which people were correctly
counted, missed, or included in error by the census. These estimates
would then be used to correct coverage errors in the results of previous
data collection efforts. This would be the last phase in completing the
census and producing the final census results. Although the Bureau has
used coverage measurement surveys since 1950 to help it determine the
magnitude and characteristics of census errors and undercounts, it has not
used the findings of these evaluations to correct for coverage errors in the
decennial census data tabulations.10 A key feature of the Bureau’s current
plan for 2000 is to produce a one-number census by integrating an
adjustment into the basic process for developing decennial census data
tabulations.
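
This section does not spell out how the coverage survey results would be turned into a correction. As a rough illustration only, the sketch below uses the textbook dual-system (capture-recapture) estimator. That estimator is a stand-in chosen here; it is not a description of the Bureau's ICM procedures. The function name, its parameters, and all of the numbers are hypothetical.

```python
def dual_system_estimate(census_count, erroneous, survey_count, matched):
    """Textbook dual-system (capture-recapture) population estimate.

    census_count: persons counted by the census in the sampled area
    erroneous:    census records judged to have been included in error
    survey_count: persons found by the independent coverage survey
    matched:      survey persons who were also found in the census
    """
    if matched <= 0:
        raise ValueError("at least one matched person is required")
    correct = census_count - erroneous   # drop records included in error
    # Scale the correct count by the inverse of the survey's match rate.
    return correct * survey_count / matched


# Hypothetical numbers for a single area, invented purely for illustration.
estimate = dual_system_estimate(census_count=1000, erroneous=40,
                                survey_count=500, matched=450)
coverage_factor = estimate / 1000
print(f"estimated population: {estimate:.0f}, coverage factor: {coverage_factor:.3f}")
```

A coverage factor above 1 would indicate a net undercount in the original census figures for that area, and a factor below 1 a net overcount.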

The Bureau designed the proposed ICM for 2000 to address several major
weaknesses of the coverage measurement survey that was used in the
1990 Census (the PES), especially those regarding timeliness and the
accuracy of estimates for subnational geographic areas. In the 1995
Census Test, new procedures and reliance on improved survey technology
demonstrated the potential to improve the timeliness of a post-census
quality check. The Bureau was able to produce the final test census results
before the end of 1995, a significant improvement over the performance of
the 1990 PES, which did not generate adjusted census estimates until the
spring of 1991, well after the December 31, 1990, deadline for
apportionment totals. The Bureau also plans to use a sample in 2000 that is
approximately 5 times larger than the 1990 PES sample (about 750,000
housing units versus about 150,000). This larger sample is designed to
allow the Bureau to produce direct estimates for each state and improve
the quality of estimates for smaller geographic areas.11

10
 The Bureau has used the 1990 Census adjustment factors for purposes of adjusting its survey
controls but not the decennial census tabulations.
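
To give a rough sense of what a fivefold larger sample can buy, the sketch below uses the textbook rule that, under simple random sampling, the standard error of an estimate shrinks in proportion to the inverse square root of the sample size. This is an illustrative assumption only: the report does not describe the ICM sample design in these terms, and a stratified or clustered design would behave differently. The function name is hypothetical.

```python
import math

# Approximate sample sizes quoted in the report (housing units).
PES_1990_SAMPLE = 150_000
ICM_2000_SAMPLE = 750_000


def standard_error_ratio(old_n: int, new_n: int) -> float:
    """Ratio SE(new) / SE(old), assuming simple random sampling (SE ~ 1/sqrt(n))."""
    if old_n <= 0 or new_n <= 0:
        raise ValueError("sample sizes must be positive")
    return math.sqrt(old_n / new_n)


ratio = standard_error_ratio(PES_1990_SAMPLE, ICM_2000_SAMPLE)
print(f"Sample size multiple: {ICM_2000_SAMPLE / PES_1990_SAMPLE:.1f}")
print(f"Standard error relative to 1990 (simple random sampling): {ratio:.3f}")
```

Under this simplified assumption, a sample 5 times larger cuts the standard error to a little under half of its former size, not to one-fifth.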

                        While results from research and testing to date have been promising in
                        general, it is also clear that the Bureau’s plans for further research and
                        testing are important to the development of ICM for the 2000 Census. The
                        Bureau experienced some operational and technical problems with ICM in
                        the 1995 Census Test. For example, enumerators tested the use of laptop
                        computers to quickly reconcile the information obtained during the ICM
                        visit with original census data but found that the data were not always
                        contained in the computer, making on-the-spot reconciliations impossible.

                        The Bureau is currently testing and evaluating redesigned procedures and
                        its survey instrument. The ICM survey portion of this test began in
                        January 1997, and the results of the test should help determine whether
                        ICM operational and technical problems have been addressed sufficiently.
                        However, evaluations from this test are not expected to be completed until
                        fall 1997, and Bureau officials have indicated that evaluation of ICM may
                        continue through the Census 2000 Dress Rehearsal.


Combined Effects of Planned Sampling for Nonresponse Follow-Up and ICM on
Population Totals

                        The effects of the proposed sampling for nonresponse follow-up and of ICM
                        procedures on the success of census operations and the quality of the
                        resulting data need to be viewed in combination to be meaningful.
                        Sampling a portion of the nonresponse workload can save time and money
                        when compared with the option of 100-percent follow-up of
                        nonrespondents; however, it is unlikely to significantly change (either
                        improve or decrease) census accuracy. Conversely, ICM, which would
                        increase costs, is designed to address problems with census accuracy. But
                        ICM is unlikely to be successful unless preceding data collection efforts, in
                        particular nonresponse follow-up, are completed on schedule. ICM is
                        designed to reduce the systematic bias observed in past censuses (i.e.,
                        differential undercounts), but it also introduces sampling error. Similarly,
                        a census design that would not include sampling for nonresponse
                        follow-up or ICM would also involve trade-offs. For example, such designs
                        would be easier to explain to the public, would more closely resemble past
                        censuses, and would not introduce the level of uncertainty in the results
                        that accompanies sampling. However, these designs are also likely to be
                        more expensive and have shown no likelihood of reversing or significantly
                        reducing past accuracy problems in census data.

                        11
                          A direct estimate is based entirely on data from the area for which the estimate is calculated. For
                        instance, a direct population estimate for Missouri would be calculated using only data collected from
                        Missouri. Indirect estimates, such as the 1990 PES state population estimates, draw on data from
                        outside the area being estimated.

The Bureau shared with us data and analysis it had developed from its
research and simulations done for the different design alternatives it
considered for the 2000 Census. We reviewed the Bureau’s methods and
assumptions and, after the Bureau made a number of revisions in response
to our questions, found that the revised data produced were generally
reasonable to use to project the possible effects of the Bureau’s proposed
sampling and statistical estimation initiatives. The results of the
simulations show that the Bureau’s plan, if effectively implemented, has
the potential for producing a more accurate and less costly census than if
only conventional census procedures were used. According to the
Bureau’s cost estimates, which are shown in table 1, using the Bureau’s
refined plan for the 2000 Census would save between $700 million and
$800 million off the cost of using a plan that incorporated all of the new
initiatives proposed for the 2000 Census except those involving sampling
and statistical estimation.
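
The savings range quoted above follows directly from the cost figures in table 1. The short sketch below reproduces that arithmetic; the dictionary keys and the helper name are hypothetical labels chosen for illustration, and the dollar amounts are the Bureau's June 1997 estimates as transcribed from the table, expressed in millions.

```python
# Estimated cost of each design alternative in millions of 2000 dollars,
# stored as (low, high) ranges transcribed from table 1.
COSTS_MILLIONS = {
    "refined_plan": (4000, 4000),
    "truncation_90_percent": (4200, 4200),
    "time_truncation": (4600, 4600),
    "no_nrfu_sampling_with_icm": (4400, 4400),
    "no_sampling_no_icm": (4700, 4800),
}


def extra_cost_vs_refined(alternative: str) -> tuple[int, int]:
    """Return the (low, high) added cost of an alternative over the refined plan."""
    low, high = COSTS_MILLIONS[alternative]
    base_low, base_high = COSTS_MILLIONS["refined_plan"]
    return (low - base_high, high - base_low)


for name in COSTS_MILLIONS:
    low, high = extra_cost_vs_refined(name)
    print(f"{name}: +${low} million to +${high} million")
```

For the alternative without sampling or ICM, the helper returns a range of 700 to 800, matching the savings cited in the text.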




Table 1: Comparison of Estimated 2000 Census Costs for Selected Design Alternatives

Dollars in billions

Design alternative                Description                                       Cost in 2000 dollars (a)
--------------------------------  ------------------------------------------------  ------------------------
Bureau’s refined plan for 2000    - Include all planned improvements of 1990        $4.0
Census (as of March 1997)           procedures;
                                  - sample nonrespondents directly after mail
                                    return phase to achieve a 90-percent
                                    response for each census tract, then use
                                    sample data for remaining nonrespondents;
                                  - use ICM to complete the census.

90-percent truncation for         - Include all planned improvements of 1990        4.2
nonresponse follow-up               procedures;
                                  - use conventional nonresponse follow-up until
                                    achieving 90-percent response for each
                                    census tract; then sample remaining
                                    nonrespondents;
                                  - use ICM to complete the census.

Time truncation for               - Include all planned improvements of 1990        4.6
nonresponse follow-up               procedures;
                                  - use conventional nonresponse follow-up for a
                                    set period of time, then sample remaining
                                    nonrespondents to achieve a 90-percent
                                    response rate for each census tract;
                                  - use ICM to complete the census.

Conduct 2000 Census without       - Do not use sampling for nonresponse             4.4
sampling for nonresponse            follow-up;
follow-up                         - include all other planned improvements of
                                    1990 Census procedures, including ICM to
                                    complete the census.

Conduct 2000 Census without       - Do not use sampling for nonresponse             4.7-4.8
sampling for nonresponse            follow-up or ICM;
follow-up or ICM                  - include all other planned improvements of
                                    1990 procedures;
                                  - use increased activities (e.g., publicity,
                                    follow-up of vacant housing units) to
                                    attempt to achieve census coverage
                                    consistent with 1990 levels;
                                  - use a PES only to evaluate the quality of
                                    the census.

Note 1: Planned improvements of 1990 procedures include initiatives such as the multiple mail
strategy, questionnaire redesign, enhanced outreach and promotion, sampling for nonresponse
follow-up, and ICM.

Note 2: All design alternatives assume an overall mail response rate of about 67 percent.

(a) Bureau cost estimates are as of June 1997 and are subject to revision.

Source: Census Bureau data.
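
As one reading of the direct sampling option described in table 1, the sketch below estimates how many nonresponding housing units in a tract would need to be followed up so that mail returns plus follow-up responses reach 90 percent of the tract's housing units. The function name, the integer-percent parameter, and the example tract size are assumptions made for illustration; the Bureau's actual sampling rules are not specified in this report.

```python
def followup_sample_size(housing_units: int, mail_responses: int,
                         target_pct: int = 90) -> int:
    """Nonresponding units to follow up so total responses reach target_pct.

    Hypothetical helper: uses integer arithmetic and rounds the target up,
    so the tract meets or exceeds the target percentage.
    """
    if not 0 <= mail_responses <= housing_units:
        raise ValueError("mail_responses must be between 0 and housing_units")
    needed_total = (target_pct * housing_units + 99) // 100  # ceiling division
    nonrespondents = housing_units - mail_responses
    return max(0, min(nonrespondents, needed_total - mail_responses))


# Hypothetical tract of 1,500 housing units with a 67-percent mail response,
# the overall rate assumed in note 2 of table 1.
units, mailed_back = 1500, 1005
sample = followup_sample_size(units, mailed_back)
print(f"Follow up {sample} of {units - mailed_back} nonresponding units")
```

In this example roughly 7 in 10 nonresponding units would be visited; a tract whose mail response already exceeds 90 percent would need no follow-up sample under this reading.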



The Bureau provided us with results from its computer simulations of
what could be produced in the 2000 Census using its refined design for
2000 (i.e., using direct sampling controlled at the census tract level for
nonresponse follow-up together with ICM). Those research results
indicated that, relative to the size of the population being estimated, the
new methods proposed by the Bureau would likely result in less relative
error in census data for the nation, states, congressional districts, and
most census tracts than using a conventional census design (see fig. 2).12
Because of the limitations of the research we reviewed, these numbers
only serve as an illustration of likely results in 2000. Despite these caveats,
results near these levels would represent a reduction in the relative error
rates experienced in the 1990 Census.
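
The report does not give a formula for relative error, so the sketch below uses a generic definition (absolute error as a percentage of the true value) to show why a relative measure allows areas of very different sizes to be compared. The function name and the example populations are hypothetical.

```python
def relative_error_pct(estimate: float, true_value: float) -> float:
    """Absolute error as a percentage of the true value (generic definition)."""
    if true_value == 0:
        raise ValueError("true_value must be nonzero")
    return abs(estimate - true_value) / abs(true_value) * 100


# Hypothetical areas: a state missing 10,000 people and a tract missing 76.
state_error = relative_error_pct(4_990_000, 5_000_000)
tract_error = relative_error_pct(3_924, 4_000)
print(f"State relative error: {state_error:.1f} percent")
print(f"Tract relative error: {tract_error:.1f} percent")
```

Although the state's absolute error is far larger, its relative error is much smaller than the tract's, which is the kind of comparison a relative measure makes possible.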




12
  In this report, we primarily use relative error to enable us to compare results of different design
alternatives and geographic areas with differing population sizes. The Bureau’s simulations measured
two different types of error: sampling error for the Bureau’s alternatives that incorporate sampling for
nonresponse follow-up and ICM and net undercount or overcount for conventional census design
alternatives.



Figure 2: Simulations Indicate Bureau’s Refined Design Could Produce Lower Relative Error Rates
Than a No Sampling Design in 2000

Percentage of relative error, by geographic level (values read from the bar chart)

Geographic level          No sampling design    Direct sampling design
------------------------  --------------------  ----------------------
National                  1.9                   0.1
States                    1.9                   0.5
Congressional districts   1.9                   0.6
Census tracts             1.9                   1.1

Note 1: The average relative error is shown for each geographic level.

Note 2: Relative error for the no sampling design (i.e., implementing all planned improvements of
1990 procedures except sampling for nonresponse follow-up and ICM) is the projected net
undercount rate in the 2000 Census.

Note 3: Relative error for the direct sampling design combines the sampling error from sampling
for nonresponse follow-up and ICM.

Note 4: Data for states exclude Washington, D.C.

Note 5: Data do not include possible errors from other sources, such as any bias in the statistical
models used to produce the population estimates.

Source: Census Bureau data.




The Bureau’s simulation results for design alternatives also illustrated one
of the major trade-offs in accuracy between designs that use sampling and
statistical estimation and those that do not. The simulation results suggest
that the new statistical methods the Bureau proposes to use in the 2000
Census would likely produce results that appear more accurate or more
equitable according to at least three broad criteria: (1) better average
levels of error, (2) error distributions compressed closer to the average
levels, and (3) an apparently better cumulative error distribution. For
example, state population totals, which are the basis for apportioning
seats in the House of Representatives, not only show lower error rates on
average but also show much less variation in the error rates among states
when compared to the undercount rates in the 1990 Census.
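
The three criteria named above can be made concrete with a small sketch that summarizes a set of area-level error rates by their average, their spread around the average, and the cumulative share of areas at or below a chosen error level. The error rates below are invented purely for illustration and are not Bureau data; the function name and threshold are also assumptions.

```python
from statistics import mean, pstdev


def summarize(error_rates: list[float], threshold: float) -> dict[str, float]:
    """Average error, spread (population std. dev.), and cumulative share <= threshold."""
    share = sum(1 for rate in error_rates if rate <= threshold) / len(error_rates)
    return {
        "mean": mean(error_rates),
        "spread": pstdev(error_rates),
        "share_at_or_below": share,
    }


# Hypothetical error rates (percent) for five areas under two designs.
compressed_design = [0.4, 0.5, 0.5, 0.6, 0.5]
dispersed_design = [0.2, 1.0, 2.5, 3.1, 2.7]

for label, rates in [("compressed", compressed_design), ("dispersed", dispersed_design)]:
    print(label, summarize(rates, threshold=1.0))
```

A design can look better on all three measures while still producing a higher error rate for particular areas, such as the first area in the dispersed example.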

The results also showed, however, that some areas that may have very low
net error rates using a conventional census design could have higher error
rates if sampling for nonresponse and ICM are used, particularly as smaller
geographic levels are considered. The smallest geographic areas for which
detailed data from the Bureau simulations are available are census tracts.
The Bureau projected that its refined plan for the 2000 Census, using the
direct sampling option for nonresponse follow-up, would have an average
error rate of 1.1 percent for census tracts. The average error rates were
1.5 percent under the 90-percent truncation option for nonresponse
follow-up and 1.3 percent under the time-truncation option. The
estimated average net undercount for tracts using conventional
procedures was 1.9 percent. The Bureau calculated that its refined plan for
2000 would produce less error for 64 percent of census tracts. For the
other two options the Bureau considered implementing, the simulations
indicated that the time-truncation option would produce less error for
54 percent of the tracts and that the 90-percent truncation would produce
less error for 51 percent of the tracts. The converse of these data is that
the conventional procedures were projected to perform better for around
40 to 50 percent of the tracts, depending on the option used for
comparison. (For more detailed information on projected error levels,
using these alternative census designs, see apps. I and II.) The Bureau
intends to contract for a study of the potential block-level effects of using
sampling.
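
The "around 40 to 50 percent" figure is the complement of the tract shares reported above. The sketch below makes that arithmetic explicit, under the assumption that every tract is classified either as having less error with the sampling design or not (so ties, if any, fall on the conventional side). The dictionary keys and function name are hypothetical labels.

```python
# Percentage of census tracts projected to have less error under each
# sampling option than under conventional procedures (from the text).
TRACTS_WITH_LESS_ERROR_PCT = {
    "direct_sampling": 64,
    "time_truncation": 54,
    "truncation_90_percent": 51,
}


def share_not_improved(pct_less_error: int) -> int:
    """Percentage of tracts for which the sampling option did not reduce error."""
    if not 0 <= pct_less_error <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return 100 - pct_less_error


for option, pct in TRACTS_WITH_LESS_ERROR_PCT.items():
    print(f"{option}: conventional procedures no worse for {share_not_improved(pct)} percent of tracts")
```

The complements are 36, 46, and 49 percent, which the report rounds to a range of about 40 to 50 percent.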

Technically, the most accurate design alternative, according to the results
of the Bureau’s research, would be to attempt 100-percent follow-up of
nonrespondents and use ICM to address accuracy problems. That design
could produce slightly improved accuracy in census data, particularly for
smaller geographic areas, but would come at a greater cost (approximately
                                  $400 million more than that for the Bureau’s refined plan). Furthermore,
                                  such an option may not be feasible given projected staffing difficulties
                                  and, especially, the risk that the Bureau could not complete 100-percent
                                  follow-up and ICM by the December 31 deadline for reporting census
                                  results for congressional reapportionment. The results of the Bureau’s
                                  research on alternative sampling approaches are included in appendixes I
                                  and II.

Constitutional and Legal Issues

                                  The use of sampling in connection with the decennial census count has
                                  been questioned on constitutional and statutory grounds. Article I, section
                                  2 of the Constitution requires an “actual Enumeration” of the population
                                  every 10 years and vests Congress with the authority to conduct that
                                  census “in such a Manner as they shall by Law direct.” Congress has, in
                                  turn, delegated this authority to the Department of Commerce through the
                                  Census Act.13 One of the key issues in the constitutional debate on
                                  sampling is whether sampling would be considered an “actual
                                  Enumeration” as required by article I, section 2. The Supreme Court has
                                  never considered the specific issue of whether the use of sampling violates
                                  the Constitution. The Court has, however, considered various
                                  constitutional challenges to the conduct of the census in other contexts.

                                  Most recently, the Court determined that the Secretary of Commerce’s
                                  decision not to use a post-enumeration statistical adjustment in the
                                  Department’s final census count in 1990 was within the constitutional
                                  bounds of discretion that the Secretary has over the conduct of the
                                  census.14 The Court specifically stated in the decision that it was “not
                                  decid[ing] whether the Constitution might prohibit Congress from
                                  conducting the type of statistical adjustment considered here.”15 In that
                                  case, the Court, citing its previous decisions concerning census issues,
                                  concluded that so long as the Secretary’s conduct of the census is
                                  “consistent with the constitutional language and the constitutional goal of
                                  equal representation,” it is within the limits of the Constitution.16 The
                                  Court based its deference to the Secretary’s action on “the wide discretion
                                  bestowed by the Constitution upon Congress, and by Congress upon the
                                  Secretary.”17


                                  13
                                    13 U.S.C. 141(a).
                                  14
                                    Wisconsin v. City of New York, 116 S. Ct. 1091 (1996).
                                  15
                                    [Id., at 1101.]
                                  16
                                    [Id.]
                                  17
                                    [Id., at 1103.]



While, as noted earlier, this decision did not address whether statistical
adjustments were constitutionally permissible, proponents of sampling
have argued that the Court’s recognition of the considerable discretion
granted Congress and, by delegation, the Secretary of Commerce, to
conduct the census would support the use of sampling if the Secretary
determined sampling was necessary to produce a more accurate count of
the population than would result from a bare headcount. Alternatively,
opponents of sampling have argued for a more literal interpretation of the
phrase “actual Enumeration” in the Constitution and point to
dissatisfaction with the initial congressional apportionment (which has
been described as a “conjectural ratio”) and to the consistent practice of
using unadjusted headcounts from the first census in 1790 until recent
decades.

There is also a controversy concerning the application of specific statutory
provisions to sampling. The Census Act states that the Secretary of
Commerce is to undertake a decennial census “in such form and content
as he may determine, including the use of sampling procedures.” However,
another section of the act18 appears to restrict the use of sampling as
follows:

“[e]xcept for the determination of population for purposes of apportionment of
Representatives in Congress among the several States, the Secretary shall, if he considers
it feasible, authorize the use of the statistical method known as “sampling” in carrying out
the provisions of this title.” (emphasis added)


The language and legislative history of these statutory provisions have been
the subject of debate over what limits, if any, Congress envisioned on the
use of sampling. Judicial interpretation of the
statutory provisions has been confined to lower courts and has generally
supported the conclusion that section 195 permits adjustment for
apportionment purposes.19 The question of whether sampling is statutorily
and constitutionally permissible in determining the decennial census
count can only be definitively resolved by the Supreme Court.




18
  13 U.S.C. 195.
19
 See, e.g., City of Philadelphia v. Klutznick, 503 F. Supp. 663 (E.D. Pa. 1980); Young v. Klutznick, 497 F.
Supp. 1318 (E.D. Mich. 1980).



Risk of a Failed Census in 2000 Has Increased

                     The 1990 Census was more costly yet less accurate than the 1980 Census.
                     By 1994, the fundamental design of the 1990 Census had been found to be
                     flawed and in need of change. This conclusion was reached independently
                     by the Department of Commerce task force for designing the 2000
                     Decennial Census; two expert panels of the National Academy of Sciences,
                     one of which was commissioned by Congress to study the 1990 Census;
                     the Bureau; and us. As a result of this conclusion, the Bureau was faced
                     with the question of how best to change the conventional census-taking
                     methods in a manner that would make them less costly and more accurate
                     than the 1990 Census and would meet the approval of stakeholders,
                     including Congress, federal agencies, state and local governments, the
                     public, demographers, and others who rely on census information.

                     Planning a decennial census that is acceptable to all of these stakeholders
                     includes analyzing the lessons learned from past practices, identifying
                     those initiatives that show promise for producing a better census at lower
                     cost, testing those initiatives to determine their effectiveness and
                     feasibility, and convincing stakeholders of the value of the proposed
                     changes. Although the Bureau has generally been responsive to concerns,
                     suggestions, and recommendations made by us and others, it has not been
                     able to convince all of its key stakeholders, particularly Congress, of the
                     value and acceptability of its plans and proposals for improving the design
                     of the 2000 Census. Since our October 1995 testimony, significant
                     congressional opposition to the Bureau’s census design has surfaced, and
                     Members of Congress have raised questions about the level of funding
                     being requested for the census. However, the Bureau has said that
                     returning to the conventional census design (i.e., without methods
                     improvements, sampling for nonresponding housing units, or ICM), which
                     failed to include more than 4 million people in 1990 and would miss at
                     least 5 million people in 2000, is not an option. Thus, in February 1997,
                     this uncertainty over design and funding levels, at this late stage of census
                     preparation, led us to designate the 2000 Census as being at high risk for
                     wasted expenditures and unsatisfactory results.20

                     At least two factors have contributed to the Bureau’s inability to reach
                     agreement with Congress. First, the Bureau has not always provided
                     sufficient information to Congress or others to support its initiatives.
                     Specifically, the Bureau has not provided (1) enough detailed data to show
                     the range and distribution of effects that its initiatives are designed to
                     achieve and (2) the results of its research to address concerns about the
                     soundness or subjectivity associated with its proposed statistical methods.

                     20
                       GAO/HR-97-2, February 1997, pp. 141-146.



                           Second, the Bureau has neither successfully tested the operational
                           feasibility of some key initiatives it would implement in 2000 nor yet
                           determined how well all these initiatives work together. This situation
                           contributes to uncertainty over whether the Bureau’s plans can be
                           successfully carried out.


Need Exists for More Detailed Information and Data to Support Key Bureau Proposals

                           Over the last few years, the Bureau has provided general data on the
                           anticipated mail response rates to questionnaires, the accuracy of the
                           census data, and the estimated cost of and dollar savings from its
                           initiatives. However, it has not always provided sufficiently detailed data
                           on the expected effects of its initiatives on such key variables as the
                           accuracy or equity of census data. This lack of sufficient data on expected
                           effects has made it difficult for Congress and other stakeholders to
                           support Bureau initiatives.

                           The Bureau’s proposal to expand the use of statistical sampling and
                           estimation procedures to handle the workload resulting from
                           nonresponding housing units is an example of a proposal for which the
                           Bureau did not provide sufficiently detailed data. When we testified in
                           October 1995, the Bureau was studying alternative sampling designs.
                           Although we supported the concept of using statistical sampling and
                           estimation methods, we noted that further study of the alternatives was
                           necessary. In the same congressional hearing, the Director of the Census
                           Bureau testified that the Bureau would continue its research and provide
                           details of the Bureau’s plans, including details on alternatives for sampling
                           nonresponding housing units and for statistical estimation procedures.

                           In February 1996, the Director of the Bureau, the Secretary of Commerce,
                           and the Director of OMB, among others, presented the Bureau’s overall
                           plans for the 2000 Decennial Census. The sampling initiative the Bureau
                           selected involved a 90-percent truncation design that would be controlled
                           at the county level. The Bureau continued to say that a design employing
                           the truncation option for nonresponse follow-up, together with ICM, was
                           the superior alternative. However, the Bureau provided no additional
                           details about the effects of the selected option or of the other alternatives
                           on census accuracy or equity.

                           The Bureau’s plan for the 2000 Census was not well received by the House
                           Committee on Government Reform and Oversight and certain other
                           stakeholders who were concerned about the effects of this plan on the
                           accuracy and equity of the census. At hearings held the day after the
Bureau presented its plans, several witnesses testified that the plan was
unacceptable. One concern was that sampling at the county level would
cause a deterioration in the accuracy of the counts of minorities.

On September 16, 1996, the Bureau announced a revision to its plan to
control sampling of nonresponding housing units at the county level,
stating that it would sample at the tract level. However, the Bureau
provided only limited additional data that would enable Congress and
other stakeholders to gauge the impact of the revised proposal on
accuracy and equity. In general, the data were summary statistics showing
the average error rates at various geographic levels. When the Bureau
provided additional information on ranges of error rates at different
geographic levels, it did not provide the supporting details. In particular,
the Bureau did not provide the details on the distribution of errors across
geographic areas, such as states or tracts. Such details would help show
whether the potential error rates for most areas were close to reported
averages or distributed more widely across the range of errors. They
would also help identify the number of geographic areas with potentially
high error rates and whether they were scattered across the country or
clustered in certain areas.
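
To illustrate why summary averages alone may not be enough, the following minimal Python sketch compares two sets of area-level error rates that share the same average. All figures are invented for illustration and are not Bureau data; the function name `summarize` and the 5-percent threshold are likewise assumptions made for this example.

```python
from statistics import mean

# Hypothetical relative error rates (percent) for ten areas.
# These numbers are invented for illustration; they are not Bureau data.
tight = [2.8, 2.9, 3.0, 3.0, 3.0, 3.0, 3.0, 3.1, 3.1, 3.1]
wide = [0.5, 0.5, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 8.0, 10.0]


def summarize(rates, threshold=5.0):
    """Return the mean, the range, and the number of areas above a threshold."""
    return {
        "mean": round(mean(rates), 2),
        "min": min(rates),
        "max": max(rates),
        "above_threshold": sum(1 for r in rates if r > threshold),
    }


if __name__ == "__main__":
    for label, rates in (("tight", tight), ("wide", wide)):
        print(label, summarize(rates))
```

Both hypothetical sets average 3 percent, yet only the second contains areas with errors above the threshold, which is the kind of detail that averages by themselves would not reveal.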

On September 24, 1996, the House Committee on Government Reform and
Oversight issued a report that was critical of the Bureau’s initiatives for
sampling and statistical estimation. Among other things, the Committee
found that the Bureau had not clarified issues of accuracy, particularly for
small geographic areas, raised by the sampling initiative. The Committee
also raised concerns about the operational feasibility of, and possible
subjectivity associated with, the Bureau’s proposed sampling and
estimation procedures. It also found that views differed on the
constitutionality and legality of using the proposed sampling and
estimation procedures, particularly with regard to apportionment. The
Committee recommended that the Bureau not use sampling and
estimation procedures to complete or adjust the census.

In December 1996, we met with Bureau officials to discuss the need for
and importance of having sufficient information on the potential effects
that its proposed sampling and estimation initiatives would have on
accuracy and equity. The Bureau officials said that they had been reluctant
to release detailed draft data while research was under way because of
concerns about criticisms the Bureau may face if the numbers changed on
the basis of subsequent research results. However, the officials agreed that
they needed to provide these data and expedited their efforts to do so. In
December 1996 and January 1997, the Bureau provided us with detailed
data on the results of its simulations. Although these were draft data, they
compared the potential effects on accuracy at various geographic levels
that could be produced using various design alternatives proposed for the
2000 Census. However, in a February 1997 response to the concerns of the
House Committee, the Bureau provided neither these nor other detailed
data to address the Committee’s concerns about the effects of the planned
sampling and estimation procedures.

On April 2, 1997, we provided the Bureau with a draft of this report for
comment. In attempting to verify the data and other information that they
had provided us for the report, Bureau officials discovered some
discrepancies and other errors. For example, there were some
inconsistencies in the data displayed in the chart the Bureau had been
using to summarize the potential costs and accuracy of alternative census
designs. (The revised chart appears as fig. II.1.) Most columns in the chart
presented projected results for the 2000 Census, but the tract-level data
represented a simulation of results for the 1990 Census. Bureau officials
also discovered a problem in the data files used to simulate the results for
those census designs that did not involve sampling or statistical
estimation; this problem increased the reported error rates at the census
tract level for those designs. The cost estimate for the design alternative
using direct sampling for nonresponse (the Bureau’s refined plan) was
also understated because the Bureau had not revised its estimate to
account for the cost of changing from a nonresponse follow-up based on
county-level response rates to one based on tract-level rates.

The Bureau notified us about these data problems and mistakes later in
April 1997 and began to rerun the data we had requested. However, the
Bureau was not able to provide us with a final version of the revised
information and data sets until June 1997. The revised data addressed the
problems identified in April by consistently presenting projected results
for the 2000 Census, including revised cost estimates. While we waited for
the revised data, which should be shared with all stakeholders, Public Law
105-18 was enacted, requiring Commerce to provide Congress with a
comprehensive and detailed plan outlining its proposed methodologies for
conducting the 2000 Decennial Census.

Another example of the Bureau’s not providing detailed data on the
expected effects of a Bureau initiative involves congressional concerns
about the subjectivity associated with the Bureau’s sampling plans. In its
September 1996 report, the House Committee on Government Reform and
Oversight raised concerns about the subjectivity associated with such
decisions as the selection of the samples the Bureau intends to use for
nonresponse follow-up and for ICM. The final details of the Bureau’s
methods and procedures could affect the census results, and this is
especially important since the formula used to apportion seats in the
House of Representatives is mathematically very sensitive to the number
of people in states that may receive the last few seats through the
apportionment process. However, in its February 1997 response to the
House Committee on Government Reform and Oversight, the Bureau did
not provide specific information on how it would make many decisions,
such as sample selections. Without information on the census design
options considered, their likely implications, the choices the Bureau made,
and the bases for such choices, Congress and other stakeholders may
continue to have concerns over subjectivity in the 2000 Census.
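
The sensitivity of apportionment to small population differences can be sketched in Python. The report does not name the formula; this sketch assumes it is the method of equal proportions, which has been used to apportion House seats since 1941. The two states, their populations, and the five-seat chamber are hypothetical, and the function name `apportion` is an assumption made for this example.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Method of equal proportions (assumed formula, see lead-in).

    Every state first receives one seat. Each remaining seat goes to the
    state with the highest priority value P / sqrt(n * (n + 1)), where P is
    the state's population and n is the number of seats it already holds.
    """
    alloc = {state: 1 for state in populations}
    # heapq is a min-heap, so priorities are stored as negative numbers.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return alloc


if __name__ == "__main__":
    # Hypothetical populations: a difference of 1,000 people in state A
    # (about 0.04 percent) changes which state receives the last seat.
    print(apportion({"A": 2_450_000, "B": 1_000_000}, 5))
    print(apportion({"A": 2_449_000, "B": 1_000_000}, 5))
```

In this hypothetical case the last seat moves from state A to state B when A's count falls by 1,000 people, which suggests why the details of sample selection and estimation could matter for states near the margin.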

The Bureau also has not provided a detailed justification, in terms of
effects on cost and response rates, for its proposed $100 million
advertising campaign. Previously, the Advertising Council had
conducted a public service advertising campaign for the census at no cost
to the Bureau. Although we share the Bureau’s hope that its planned
advertising campaign will increase mail response rates, the Bureau has
provided no detailed data linking expenditures on advertising with a
corresponding increase in public response. Although it may be impossible
for the Bureau to predict precisely the increase in the response rate that
paid advertising may produce, as of June 20, 1997, the Bureau had not
provided data supporting the budget of its proposed $100 million
advertising campaign. In the recent 1996 election, hundreds of millions of
dollars were spent not only to promote the candidates, but also in a public
service advertising effort to promote voting to specific groups, such as
18- to 25-year-olds, who were targeted by rock stars and other celebrities
on radio, television, and cable television channels using the slogan “Rock
the Vote.” Nevertheless, under 50 percent of the voting age population
turned out to vote. Although differences exist between filling out a census
questionnaire that is sent to one’s home and either getting to a polling
place or arranging to vote at home, both require responses motivated by
civic involvement. Although about 93 percent of those surveyed on the
1990 Census were aware of it, only 65 percent of those households that
received questionnaires by mail responded. This raises a question
about the effectiveness of paid advertising in stimulating action as
opposed to simply raising awareness. The Bureau is planning to evaluate
its proposed advertising campaign in the Census 2000 Dress Rehearsal, but
                              it is not yet clear how much money Congress will provide for this
                              evaluation.

                              Finally, in March 1997, Bureau officials said that, as part of its multiple
                              mail contact initiative, the Bureau plans to send replacement
                              questionnaires to all housing units, rather than (as previously planned) just to those
                              that did not return the original questionnaire. However, the Bureau did not
                              release data on the costs or benefits of this change, which would result in
                              the Bureau’s mailing two questionnaires to about 97.3 million households
                              in 2000, about 59.5 million of which may already have returned a
                              questionnaire.21
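
The scale of this change can be worked out from the two figures cited above. The following Python sketch is only an arithmetic illustration: the 97.3 million and 59.5 million figures come from the report, while the function name `replacement_mailings` and the simplifying assumption that a targeted mailing would go to exactly the households that had not yet responded are ours.

```python
HOUSEHOLDS_MAILED = 97.3  # millions of households, from the report
ALREADY_RETURNED = 59.5   # millions that may already have responded, from the report


def replacement_mailings(households, returned):
    """Compare questionnaire volumes (in millions) for the two approaches."""
    nonrespondents = households - returned
    total_blanket = households + households       # original plus replacement to all
    total_targeted = households + nonrespondents  # replacement to nonrespondents only
    return {
        "nonrespondents": round(nonrespondents, 1),
        "total_blanket": round(total_blanket, 1),
        "total_targeted": round(total_targeted, 1),
        "extra_pieces": round(total_blanket - total_targeted, 1),
        "implied_response_rate": returned / households,
    }


if __name__ == "__main__":
    print(replacement_mailings(HOUSEHOLDS_MAILED, ALREADY_RETURNED))
```

Under these assumptions, the blanket approach would send about 59.5 million more questionnaires than a targeted one, and the cited figures imply that roughly 61 percent of households would have responded before the replacement mailing.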


Questions Remain About the Operational Feasibility of Some Aspects of the
Bureau’s Refined Plan

                              Members of Congress have questioned whether the Bureau can
                              successfully implement some aspects of its current plan for the 2000
                              Census. They have also raised concerns about the Bureau’s proposal to
                              incorporate an adjustment into its basic counting process, citing issues of
                              subjectivity and potential error associated with such an adjustment.
                              Because the Bureau has not tested some aspects of its currently proposed
                              plan and has not tested whether all aspects of this plan will work in
                              concert with each other, there is uncertainty as to whether or how well the
                              Bureau’s plan can be carried out.

                              Field testing enables the Bureau, as well as Congress and other
                              stakeholders, to assess the operational feasibility of key initiatives of its
                              plans. The Bureau has been evaluating features of its proposed census
                              design in a variety of tests over the past few years and plans to do more
                              testing before 2000. For example, the Bureau did a test census in 1995 and
                              began another test, primarily of its proposed coverage measurement
                              survey and estimation methods (ICM), in October 1996. However, some
                              operational aspects of the current census design have not yet been fully
                              and successfully tested, or have not yet been tested in a manner similar
                              to the implementation being proposed for 2000.

For example, the Bureau’s current plan depends on completing sampling
for nonresponse follow-up and ICM in time to produce the population count
by December 31, 2000. However, the 1995 Census Test did not test a
sampling operation designed to help determine whether nonresponse
follow-up of the magnitude projected by the Bureau’s current plan could
be completed in time for ICM to be done on schedule. While the Bureau did
use a version of direct sampling for nonresponse follow-up in the 1995
Census Test, it was not designed to achieve at least a 90-percent
completion rate before the Bureau relied on sample data to account for
the remaining nonrespondents.

21 The Bureau anticipates mailing questionnaires to about 82 percent of housing units and plans to use
other procedures for obtaining responses from the remaining housing units. If the Bureau finds
concentrations of new city-style addresses during its address list development process, it may add
them to the percentage of housing units that are to receive a questionnaire by mail in 2000.

In order to do ICM more rapidly, the Bureau plans to use laptop computers
in the field. However, during the 1995 Census Test, the Bureau had
difficulty loading data from nonresponse follow-up activities into the
laptop computers in time for use by enumerators doing ICM interviews. As
a result, enumerators were unable to match interview data with original
census questionnaire data, which prevented them from resolving
discrepancies as originally planned. The Bureau revised the procedure and
its software to correct earlier problems, began to retest this operation in
early 1997, and is continuing to work on the procedures and software to be
used for computerized matching of housing units and individuals listed by
ICM and other census operations.


Another aspect of the current plan that has yet to be tested is the
operation of the scanning equipment for the 2000 Census that will be used
to capture data on census forms. Although the Bureau used some scanning
in the 1990 Census (i.e., multistage photoimaging equipment that turned
photographs into microfiche and, in turn, into tapes), it is planning to use
more sophisticated scanning equipment that can “read” handwritten
material and convert it directly into a machine-readable format, as well as
a more extensive use of scanning, in 2000. The Bureau used a prototype
scanning system and optical character reader in parallel with the 1995
Census Test. The successful operation of this equipment is critical to the
Bureau’s plan because its ICM procedure depends on the availability of
accurate information. The Bureau has been developing the equipment with
a contractor and plans to have a prototype ready for testing in the Census
2000 Dress Rehearsal in 1998. At this time, however, it is not clear that this
test will be sufficient for determining whether the equipment can
successfully handle the volume of forms that will be processed in 2000. If
the equipment does not work as expected, the Bureau
proposes using a keypunching operation, which would be considerably
slower than the scanning alternative and could cost more to complete the
task.

According to the Bureau, the Census 2000 Dress Rehearsal is to provide a
census-like environment to demonstrate simultaneously those procedures
that the Bureau plans to use in the 2000 Census. A meaningful dress
              rehearsal is important for at least three reasons. First, the Bureau plans to
              implement several complex new design features in 2000, including the use
              of technologically sophisticated equipment that has not been used in
              previous censuses. Second, Congress and the Bureau have yet to reach
              agreement on the 2000 Census design, and uncertainty over the
              operational feasibility of the Bureau’s design was one of Congress’ major
              concerns. Third, a key feature of the Bureau’s current plan is to produce a
              one-number census by using sampling and statistical estimation.

              Thus far, the Bureau has not fully tested its proposed design, and the
              Census 2000 Dress Rehearsal is the last opportunity for such full-scale
              testing. While the Bureau provided us with projected results from
              simulations of its design, the Dress Rehearsal would provide an
              opportunity to determine whether the Bureau’s refined plan would
              actually produce similar results when implemented in the field.
              Implementing the 2000 Census without adequate testing creates the risk of
              a census with unsatisfactory results. Similarly, if the Bureau implements
              its current plan in 2000 and Congress were to decide after the Census 2000
              Dress Rehearsal that it did not want the Bureau to make an adjustment to
              its initial count, substantial funds could be wasted, and the census results
              could be questionable.

              The Bureau has time before the 2000 Census is to begin to resolve open
              issues and test much of what it actually plans to implement in 2000.
              However, the available time is diminishing, and the Bureau has not yet
              completed detailed planning for all of its design features or its dress
              rehearsal. The Census 2000 Dress Rehearsal appears to be the last real
              chance the Bureau will have for a large-scale operational test of its overall
              design. Thus, a well planned and executed dress rehearsal should provide
              the Bureau with a good opportunity to demonstrate to Congress and
              others whether or not it can successfully implement its current plan and
              produce acceptable results. The need for this demonstration is particularly
              important considering the problems the Bureau has experienced to date
              and the controversy that surrounds its design.


Conclusions

              The Bureau has made considerable progress in preparing for the 2000
              Census since our October 1995 testimony. The Bureau has, however,
              revised some of its initial plans and encountered problems involving some
              aspects of its proposed design. For example, it has changed its plan for
              sending replacement questionnaires to just those housing units that did
              not return the original questionnaire. Instead, it now plans to send
replacement questionnaires to all housing units. Most importantly, it has
run into major opposition from Congress on its plans to use sampling for
nonresponse follow-up and ICM, and consequently it is still not certain how
much funding will be made available. This situation creates a high risk to
the nation of a census involving wasted expenditures and unsatisfactory
results.

Two of the major reasons for congressional concern are the lack of
sufficiently detailed data, particularly on the effects on accuracy and
equity of the Bureau’s proposals for sampling for nonresponse and ICM,
and the uncertainty surrounding the operational feasibility of key aspects
of the Bureau’s current census design. We recognize that the Bureau has
faced difficulties as it has tried to address the concerns of all its
stakeholders—who at times have had conflicting views—and as some
aspects of its plan have not worked out as well as expected during testing.
Although the Bureau faces a risk if it provides draft data while research is
under way, we believe that Congress and other stakeholders would benefit
from having the best data available at the time Bureau proposals are made,
along with the appropriate qualifications.

Full and open disclosure is the only possible antidote to suspicions that
the Bureau is failing to fully inform its legitimate stakeholders. Since one
of the purposes of testing is to determine the operational feasibility of
plans, it should not be surprising that problems arise. No design is without
flaws and trade-offs. The Bureau has said that the alternative to its
proposed design is to return to past census methods and incorporate
changes, such as initiatives to improve the response rate, but not sampling
for nonresponse follow-up or ICM. In this regard, data on the costs and
effects of that alternative would be helpful in considering whether the
Bureau’s proposal should be approved. However, the Bureau has not
provided Congress with these data.

By not providing sufficient data on the likely effects of the Bureau’s
initiatives for addressing the key goals for the census—reduced costs and
improved accuracy and equity—the Bureau may fail to convincingly
demonstrate the value of its plans and, in turn, may contribute to
congressional skepticism about census design and the necessary funding
level. Through the enactment of Public Law 105-18, Commerce is now
required to supply data by July 12th on the likely effects of the census
design, which should contribute to a more informed debate. However, as
new data become available after July 12th, they also should be shared with
Congress and other stakeholders.



                           The Census 2000 Dress Rehearsal offers a final opportunity for the Bureau
                           to demonstrate the operational feasibility of its current plan, which
                           proposes many new design initiatives for the 2000 Census. If in the dress
                           rehearsal the Bureau does not demonstrate that all of the key initiatives of
                           the design that are to be used in 2000 can successfully be implemented to
                           produce acceptable results, it risks a census with higher rates of error than
                           in 1990 as well as higher costs.

                           With less than 3 years remaining until the census is to take place, the
                           Bureau and Congress are not yet in agreement on some basic census
                           design issues and the overall funding level. Although we believe there is
                           still sufficient time for agreement to be reached and for the Bureau to
                           prepare for a successful census, little margin for missteps, indecision, or
                           miscommunication remains.


Recommendations to the Director, Bureau of the Census

                           We recommend that the Director of the Bureau of the Census

                       •   provide Congress and other stakeholders with detailed data, which are
                           updated as necessary to meet the objective of full and open disclosure, on
                           the expected effects of the Bureau’s census design proposals on costs and
                           on accuracy and equity at various geographic levels, particularly as they
                           relate to sampling for nonresponse and ICM as well as on a design that
                           would not involve sampling nonrespondents and ICM;
                       •   work with Department of Commerce and OMB officials in reaching
                           agreement with Congress on the design and funding level as quickly as
                           possible, so that the Census 2000 Dress Rehearsal can be used to
                           demonstrate all key design features planned for the 2000 Census; and
                       •   conduct the Census 2000 Dress Rehearsal to mirror as closely as possible
                           the design features planned for the 2000 Census, including paid
                           advertising, to test the operational feasibility of the design and to
                           determine whether the outcomes achieved in the dress rehearsal are
                           similar to those of the Bureau’s research and simulations, and provide
                           these results to Congress in sufficient time to enable it to affect, if it so
                           chooses, the final design for the 2000 Census.


Agency Comments and Our Evaluation

                           On April 2, 1997, we requested comments on a draft of this report from the
                           Secretary of Commerce. On April 23, 1997, the Director of the Bureau of
                           the Census responded that she agreed with our recommendations. She
                           said that the Bureau had begun an intensive effort to improve
                           communications with Congress and demonstrate responsiveness by
providing the information Congress needs to assess the value of the
Bureau’s plans for Census 2000. She also said that the Bureau expects
these improved communications to lead to agreement on the plan for
conducting the Census 2000 Dress Rehearsal as a means to demonstrate
the robustness of the methods proposed for use in Census 2000.

The Director also stated that, in reviewing our draft report, the Bureau was
alerted to several inconsistencies in the data it had provided us and on
which we relied in making our analysis. She said the Bureau would
regenerate the data. These data were provided to us on June 16, 1997, and
are included as appendix II. Our report has been modified to reflect the
new data where appropriate. The revised data did not cause us to change
our basic analysis and conclusions, but they did reinforce the need, as
expressed in our recommendations, for the Bureau to expose the data
relating to the effects of its plan to broad scrutiny by Congress and other
stakeholders.

In view of the reporting requirements of Public Law 105-18, which was
enacted after the Director commented on our draft report, we modified
our recommendations slightly. While we continue to believe the Bureau
should provide the details of its plans to Congress, we also believe
updated data should be provided as they become available, beyond the
reporting date established in Public Law 105-18.


We are sending copies of this report to the Chairman, Senate Committee
on Governmental Affairs; Chairman and Ranking Minority Member, House
Committee on Government Reform and Oversight; Director, OMB;
Secretary of Commerce; Director, Bureau of the Census; and other
interested parties. Copies will be made available to others on request.




Please contact me on (202) 512-8676 or James H. Burow, Assistant
Director, on (202) 512-3941 if you or your staff have any questions. Major
contributors to this report are listed in appendix IV.

Sincerely yours,




L. Nye Stevens
Director
Federal Management and Workforce Issues




Contents

Letter                                                                             1

Appendix I: The 2000 Decennial Census With Statistical Sampling and
Estimation: An Overview of Operational and Technical Issues                       44
    Sampling for Nonresponse Follow-Up                                            44
    Integrated Coverage Measurement                                               54
    Combined Effects of Sampling for Nonresponse and ICM                          65

Appendix II: Bureau of the Census Summary Information on Design
Alternatives for the 2000 Census                                                  77

Appendix III: Comments From the Bureau of the Census                              87

Appendix IV: Major Contributors to This Report                                    88

Related GAO Products                                                              91

Tables
    Table 1: Comparison of Estimated 2000 Census Costs for Selected
      Design Alternatives                                                         23
    Table I.1: Distribution of Census Tracts by Relative Error Level
      Using Alternative Census Designs                                            69
    Table II.1: Bureau of the Census Summary Data on Projected
      Costs and Accuracy of Selected Census 2000 Alternative
      Methodologies                                                               78

Figures
    Figure 1: The Net Undercount Since 1940                                        7
    Figure 2: Simulations Indicate Bureau’s Refined Design Could
      Produce Lower Relative Error Rates Than a No Sampling Design
      in 2000                                                                     25
    Figure I.1: How Different Sampling Options Would Work in a
      Hypothetical Tract                                                          50
    Figure I.2: Distribution of Census Tracts by Error Level Shows
      Trade-Off Between Direct Sampling and No Sampling Designs                   71




          Abbreviations

          CAPI       Computer-assisted personal interviewing
          CV         Coefficient of variation
          DSE        Dual system estimation
          ICM        Integrated coverage measurement
          OMB        Office of Management and Budget
          PES        Post enumeration survey
          SE         Standard error


Appendix I

The 2000 Decennial Census With Statistical
Sampling and Estimation: An Overview of
Operational and Technical Issues
               As part of our ongoing oversight of decennial census activities, and in
               order to assist Congress in assessing the proposed design for the 2000
               Census, we have been reviewing the Bureau’s plans and efforts to
               incorporate sampling for nonresponse follow-up and integrated coverage
               measurement (ICM) procedures in the next census. In this appendix, we
               provide additional background and technical information on the results of
               the Bureau’s research on sampling and estimation methods, such as the
               expected advantages, disadvantages, costs, and benefits of different design
               options the Bureau has considered for the 2000 Census. The appendix is
               organized in three main sections. The first section presents information on
               sampling for nonresponse follow-up, the second on ICM, and the third on
               the combined effects of these proposed census procedures. Appendix II
               presents the Bureau’s summary data on the projected costs and accuracy
               of its current plan and alternative designs for the 2000 Census.


Sampling for Nonresponse Follow-Up

               Declining rates of public response to the census have generated a costly,
               time-consuming nonresponse follow-up workload for the Bureau. The
               Bureau is planning for many efforts in the 2000 Census to encourage
               higher response rates and reduce its reliance on expensive enumerator
               visits to every housing unit for which a census questionnaire is not
               returned by mail. However, the nonresponse workload is still likely to be
               substantial. Therefore, after making what it believes are reasonable efforts
               to directly contact every housing unit, the Bureau plans to sample a
               portion of the remaining nonresponse units rather than continuing its
               efforts to complete follow-up interviews for all of the nonresponse units as
               it has in the past.

               The primary purposes of using sampling for nonresponse follow-up are to
               save money and time. The Bureau estimates that its current plan for
               sampling during nonresponse follow-up operations could save about
               $400 million in the 2000 Census off the cost of a census design
               incorporating all other proposed improvements on the 1990 Census model
               except sampling for nonresponse. The use of sampling also should enable
               the Bureau to complete follow-up operations more quickly than in past
               censuses, which is essential if it hopes to complete ICM operations and
               produce the census data by its legal deadlines.

               The Bureau focused its research in this area on identifying and refining the
               most promising options for sampling a portion of nonrespondents after the
               initial mail return phase of the census is completed. On the basis of its
               initial work on this subject, the Bureau decided in February 1996 that its
                        plan for nonresponse follow-up in the 2000 Census should ensure that it
                        completes census questionnaires for at least 90 percent of all housing units
                        in each county before using the information from sample units to account
                        for the remaining nonrespondents. Public feedback and concerns about
                        the potential fairness and accuracy of this initial proposal persuaded the
                        Bureau to revise its plans in September 1996 to track response rates and
                        control nonresponse sampling at the level of census tracts, which are
                        generally smaller and more homogeneous than counties.

                        Once the Bureau decided to control nonresponse sampling at the census
                        tract level, it began to study three options for how to implement this basic
                        design: (1) truncating conventional follow-up interviews after reaching the
                        90-percent completion threshold, then sampling the remainder;
                        (2) truncating conventional follow-up after a specific period of time, then
                        sampling the remainder; and (3) sampling nonrespondents directly after
                        the end of the mail return phase of the census. All three options are
                        designed to collect information directly for at least 90 percent of the
                        housing units in each tract, through a combination of mail returns,
                        follow-up interviews, and other means, before relying on sample data to
                        estimate the population counts and characteristics of the remaining
                        nonrespondents. The Bureau’s research indicated that implementing
                        sampling directly after the mail return phase would have more advantages
                        than the other options with regard to cost, accuracy, and operational
                        feasibility. Therefore, in March 1997, the Bureau selected the direct
                        sampling option for nonresponse follow-up in the 2000 Census.


Census Nonresponse Generates Substantial Problems for Bureau

                        For the 2000 Census, the Bureau intends to rely on mail returns to collect
                        census data from most housing units in the country, as it has in every
                        census since 1970. Unfortunately, census mail response rates have been
                        falling since the Bureau first implemented this mail-back approach. If past
                        trends continue, the Bureau believes the mail response rate could decline
                        from the 65-percent rate achieved in the 1990 Census to 55 percent in the
                        2000 Census, leaving over 50 million nonresponding units to account for.

                        The declining rate of public cooperation with the decennial census
                        generated a substantial problem for the Bureau in 1990. When
                        questionnaires were not returned for all housing units provided a census
                        form, the Bureau sent temporary enumerators out into the field in an
                        attempt to get data on each nonresponding unit. This was an extremely
                        costly, time-consuming, and sometimes error-prone operation. In the 1990
                        Census, nonresponse follow-up operations required a minimum of
$560 million to carry out and continued for 14 weeks instead of the
planned 6 weeks. The delay in completing nonresponse operations can be
attributed in large part to the effort needed to contact the relatively small
portion of units that proved most difficult to resolve. This problem was
widespread. For half of the census tracts in the country, once 90 percent
of returns were in, the Bureau still needed 65 days or more to finish
collecting data on the last 10 percent of housing units. The Bureau’s
evaluations of the 1990 Census demonstrated that the quality of these
enumerations declined substantially as data collection efforts continued
over time (e.g., with regard to persons counted more than once, counted
in the wrong place, or missed entirely).

The Bureau plans a number of efforts to encourage people to voluntarily
respond to the 2000 Census. These efforts include enhancements such as
more user-friendly census forms, an improved marketing plan, providing
multiple ways for people to respond to the census, and implementing a
strategy of multiple mail contacts with each address. The Bureau
estimates the combined effects of these efforts will result in an increase of
about 12 percentage points in baseline response rates, projecting to an overall
mail response rate of 66.9 percent in 2000. However, even if these new efforts
work as well as planned, the Bureau would still face a substantial
nonresponse workload in the next census of nearly 40 million housing
units. This would exceed the entire nonresponse workload of the 1990
Census by approximately 5 million housing units.
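
As a rough check on how these figures relate to one another, the workload arithmetic can be sketched in Python. This is an illustration only: the function names are hypothetical, and treating "nearly 40 million" as a rounded figure of 40 million is an assumption, so the implied total number of housing units is approximate and is not a Bureau estimate.

```python
def nonresponse_units(total_units: float, mail_response_rate: float) -> float:
    """Housing units left for follow-up at a given mail response rate."""
    return total_units * (1.0 - mail_response_rate)


def implied_total_units(nonresponse_workload: float, mail_response_rate: float) -> float:
    """Back out the number of mailed housing units from a workload and a rate."""
    return nonresponse_workload / (1.0 - mail_response_rate)


# Report figures: a 66.9 percent response rate leaves "nearly 40 million" units.
# Using 40 million here is an assumption that slightly overstates the total.
approx_total = implied_total_units(40e6, 0.669)

# Workload if the response rate instead fell to the 55 percent trend projection.
trend_workload = nonresponse_units(approx_total, 0.55)

print(f"implied housing units: about {approx_total / 1e6:.0f} million")
print(f"workload at 55 percent: about {trend_workload / 1e6:.0f} million")
```

Under that assumption the implied total is roughly 121 million housing units, and a 55-percent response rate would leave roughly 54 million units for follow-up, which is consistent with the "over 50 million" figure cited earlier in this appendix.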

Handling a nonresponse follow-up workload of this size is likely to pose a
serious challenge to the Bureau in the 2000 Census. Mail response rates
can vary dramatically among different areas of the country, generating
very large nonresponse workloads in some places. In addition, there is
ample evidence that nonresponse follow-up is becoming more expensive,
not only because the number of nonrespondents is growing, but also
because (1) residents of nonresponding housing units are becoming more
difficult to find and interview and (2) the Bureau is finding it harder to
recruit and afford enough qualified temporary workers to complete the
task. Those problems, in turn, can contribute to escalating labor costs and
declining productivity during the census.

Bureau officials believe that these problems, especially the workforce
difficulties they anticipate in the 2000 Census, suggest that they could not
use the 1990 Census design again even if they wanted to. They intend to
implement a number of other efforts to reduce the reliance on enumerator
visits to each nonresponding housing unit (e.g., attempting to cover more
                          of these units through telephone interviews). Altogether, the variety of
                          activities planned for the next census should result in multiple efforts to
                          directly contact every housing unit, as well as multiple opportunities for
                          people to respond to the census. However, Bureau officials still believe
                          that sampling a portion of nonrespondents will be necessary to control the
                          cost of the census and enable them to complete follow-up operations in a
                          timely manner.


Bureau Continues to Refine Alternatives for Sampling Nonrespondents

                          The primary purposes of using sampling for nonresponse follow-up are to
                          save money and time. After preliminary research efforts to identify
                          promising designs for sampling nonrespondents, Bureau management
                          announced in February 1996 selection of a 90-percent truncation design
                          for the Census 2000 plan. Under this design, the Bureau would implement
                          a conventional nonresponse follow-up operation at the end of the mail
                          return phase of the census that would continue until enumerators were
                          able to obtain information for at least 90 percent of the housing units in
                          each county or county-equivalent area (such as parishes in Louisiana). The
                          Bureau would then truncate (curtail) conventional follow-up operations
                          and select a 1-in-10 sample of the remaining nonresponding units. It would
                          use the information obtained from interviewing these sample units to
                          provide census data for the sample units themselves and also to estimate
                          population counts and characteristics for the remaining nonresponse
                          units.

                          In September 1996, the Bureau revised its planned approach. The new
                          design would control nonresponse sampling at the level of each census
                          tract—a small, relatively permanent statistical subdivision of a
                          county—rather than each county.1 This plan maintained the overall goal of
                          collecting data on at least 90 percent of the housing units in each area
                          before relying on sampling to estimate data for the remaining
                          nonrespondents. The Bureau made the change to tract-level truncation in
                          response to public feedback and reservations about the potential effects of
                          county-level truncation. Stakeholders from minority communities and
                          others, including a National Academy of Sciences panel, pointed out that
                          areas within counties may often differ in terms of how easy they are to
                          enumerate, which could affect the implementation of the Bureau’s plan. If
                          the Bureau controlled truncation at the county level, they noted the
                          likelihood that the threshold would be achieved disproportionately by mail

                          1 Census tracts usually have between 2,500 and 8,000 persons (averaging about 4,000) and, when first
                          delineated, are designed to be homogeneous with respect to population characteristics, economic
                          status, and living conditions. Census tracts do not cross county boundaries. The 1990 Census included
                          over 60,000 tracts and tract-equivalent areas.

responses and direct follow-up interviews in the areas where enumeration
was easiest. Sampling might then be relied upon for much more than just
the last 10 percent of households within the harder-to-enumerate areas.
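
The stakeholders' concern can be illustrated with a small Python sketch. The county, its two tracts, and all of the counts below are hypothetical numbers chosen only to show the effect; they are not drawn from Bureau data.

```python
def completion_rate(areas):
    """Share of housing units already resolved across a list of areas."""
    completed = sum(a["completed"] for a in areas)
    units = sum(a["units"] for a in areas)
    return completed / units


# Hypothetical county made up of two equally sized tracts (illustrative numbers).
easy_tract = {"name": "easy", "units": 2000, "completed": 1960}  # 98 percent
hard_tract = {"name": "hard", "units": 2000, "completed": 1640}  # 82 percent
county = [easy_tract, hard_tract]

county_rate = completion_rate(county)        # 3600 / 4000
hard_rate = completion_rate([hard_tract])    # 1640 / 2000
left_to_sampling = 1.0 - hard_rate           # share of the hard tract still open

print(f"county: {county_rate:.0%}, hard tract: {hard_rate:.0%}, "
      f"hard tract left to sampling: {left_to_sampling:.0%}")
```

In this example the county as a whole reaches the 90-percent threshold, so conventional follow-up would stop, yet 18 percent of the harder-to-enumerate tract would still depend on sampling. Tracking the threshold tract by tract would keep follow-up going in the second tract until it reached 90 percent on its own.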

The Bureau’s shift to basing its plan on tract-level response rates rather
than county-level rates appears to improve the equity of the sampling plan
but also presents new challenges. On balance, the change to tract-level
truncation may provide more equitable and consistent implementation of
sampling for nonresponse follow-up than the original design. This is
primarily because the socioeconomic characteristics of households at the
tract level are more homogeneous than is the case at the county level.
Therefore, sampling controlled at the tract level is less likely to present a
risk of uneven implementation (i.e., ignoring hard-to-enumerate areas until
the very end of the census). However, shifting to tract-level truncation also
complicates the task of managing and monitoring the nonresponse
follow-up phase, especially given that progress will need to be tracked and
controlled for around 60,000 census tracts nationwide, compared to about
3,000 counties under the Bureau’s original proposal. It may also increase
the difficulty of reaching the goal of a 90-percent completion rate in every
tract, at least within a reasonable time frame.

When the Bureau decided that its basic design for nonresponse follow-up
operations in the 2000 Census should ensure that it obtains responses
from at least 90 percent of all housing units in each census tract before
relying on the information from sample units to account for the remaining
nonrespondents, it began considering alternative designs to achieve this
goal. The Bureau’s research focused on three design options:
(1) truncating conventional follow-up at 90-percent completion, then
sampling the remaining nonrespondents; (2) truncating conventional
follow-up after a specific period of time, then sampling the remaining
nonresponse units; and (3) implementing sampling of nonresponding units
directly after the mail return phase ends.

The first option was the Bureau’s original design from February 1996,
except that it would track completion rates for each tract rather than each
county. Under the time-truncation option, conventional follow-up
interviews would continue for a predetermined length of time, such as 3
weeks. After this initial follow-up period, the Bureau would select a
sample of the remaining nonrespondents in each tract that included
enough units to raise the completion rate to at least 90 percent. Under the
direct-sampling option, there would be no conventional follow-up phase.
Instead, at the end of the mail return phase the Bureau would select a
sample of the nonresponding housing units in each tract that would be
sufficient to achieve at least a 90-percent completion rate. For example, in
a tract with a mail response rate of 70 percent, the Bureau would select
two out of every three of the remaining units for the follow-up sample.
Under each of the three options, for any tract with an initial response rate
above 90 percent the Bureau would follow up on a 1-in-10 sample of the
remaining addresses.
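
The sampling-rate rule described above for the direct-sampling option can be sketched as a short Python function. This is a minimal illustration, not the Bureau's procedure: the function name is hypothetical, rates are expressed in whole percentage points, and treating a tract at exactly 90 percent the same as one above 90 percent is an assumption, since the text addresses only tracts above the threshold.

```python
from fractions import Fraction


def direct_sampling_rate(mail_response_pct: int, target_pct: int = 90) -> Fraction:
    """Fraction of a tract's nonresponding units to select for follow-up."""
    remaining = 100 - mail_response_pct
    if remaining <= 0:
        return Fraction(0)          # every unit responded; nothing to sample
    if mail_response_pct >= target_pct:
        return Fraction(1, 10)      # 1-in-10 sample of the remaining addresses
    # Sample just enough nonrespondents to bring the tract up to the target.
    return Fraction(target_pct - mail_response_pct, remaining)


for pct in (60, 70, 95):
    print(pct, direct_sampling_rate(pct))
```

For a tract with a 70-percent mail response rate the function returns 2/3, matching the example in the text. For a 60-percent tract it returns 3/4, and for a 95-percent tract it returns 1/10.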

Figure I.1 illustrates how each of these options might work in a
hypothetical census tract where the Bureau is able to obtain a mail
response rate of 60 percent. In this simplified example, we show how the
Bureau would determine the census data for housing units in the tract as it
works toward resolving 100 percent of the units.

Figure I.1: How Different Sampling Options Would Work in a Hypothetical Tract

                                       [Figure: bar chart with one bar for each design option (90% truncation,
                                       time truncation, and direct sampling). Axis: percentage of housing
                                       units, 0 to 100. Legend: counted by mail return; conventional
                                       nonresponse follow-up interviews; sample interviews; estimated using
                                       sample data.]



                                       Note 1: For all options, we assume a mail response rate for housing units in the tract of
                                       60 percent. The base for this calculation would be all housing units provided a census form and
                                       asked to return it by mail.

                                       Note 2: For the time-truncation option, we assume that the Bureau is able to obtain responses
                                       from half of the nonrespondents during a limited period of conventional follow-up interviews.

                                       Note 3: Some of the units that make up the nonresponse workload will be vacant or
                                       nonresidential units. The Bureau expects that the Postal Service will identify a portion of the
                                       vacant units in the 2000 Census (although the Bureau will recheck a sample of these). To simplify
                                       our presentation, we have not included any estimates for this component.

                                       Source: Example created by GAO for illustration purposes only.




                                       Under the 90-percent truncation option, and assuming that the tract had a
                                       mail response rate of 60 percent, the Bureau would implement a
                           conventional nonresponse follow-up operation until enumerators were
                           able to get responses for another 30 percent of the housing units in the
                           tract (bringing the total completion rate for the tract up to 90 percent). At
                           that point, the Bureau would select a random sample at the rate of 1 in 10
                           of the nonresponding units that still remained unaccounted for. The
                           information gathered from that 1 percent of the tract’s housing units
                           would provide the census data for the sample units and the remaining
                           9 percent of the tract’s units.

                           Using the time-truncation option, there would be a limited period of time,
                           such as 3 weeks or until a prespecified date, during which census
                           enumerators would attempt follow-up visits to all nonresponding units.
                           For our hypothetical example, we assume that enumerators are able to
                           complete interviews for half of the nonresponding units during this initial
                           follow-up phase, taking the overall completion rate to 80 percent. The
                           Bureau would then need to select and interview a sample of one of every
                           two of the remaining nonrespondents to achieve a completion rate of
                           90 percent. The information gathered from that 10 percent of the tract’s
                           housing units would provide the census data for the sample units and for
                           the 10 percent of nonresponding units that remain.

                           With the direct sampling option in this scenario, the Bureau would select a
                           random sample at the rate of three of every four of the nonresponding
                           units in the tract at the end of the mail return phase. A sample of this size
                           would be enough to get the overall completion rate to 90 percent. The
                           information gathered from the 30 percent of the tract’s housing units in the
                           sample would provide the census data for the sample units themselves and
                           for the remaining 10 percent of units not selected for the sample.
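
The three walk-throughs above can be reproduced with a short Python sketch that splits a tract's housing units, in percent, into the four categories used in figure I.1. The function names are hypothetical. The handling of a tract that already exceeds 90 percent after conventional follow-up (a 1-in-10 sample of the remainder) extends the rule stated earlier for tracts with high initial response rates and is an assumption for the time-truncation case.

```python
from fractions import Fraction

TARGET = 90  # completion goal, in percent of the tract's housing units


def _finish(mail, conventional):
    """Split what is left after mail returns and conventional follow-up."""
    resolved = mail + conventional
    remaining = 100 - resolved
    if resolved >= TARGET:
        sample = remaining * Fraction(1, 10)   # 1-in-10 of what is left
    else:
        sample = TARGET - resolved             # just enough to reach 90 percent
    return {
        "mail": Fraction(mail),
        "conventional": Fraction(conventional),
        "sample": Fraction(sample),
        "estimated": Fraction(remaining - sample),
    }


def truncation_at_90(mail):
    """Conventional follow-up continues until 90 percent of units are resolved."""
    return _finish(mail, max(TARGET - mail, 0))


def time_truncation(mail, share_resolved):
    """Conventional follow-up resolves a given share of nonrespondents, then stops."""
    return _finish(mail, (100 - mail) * share_resolved)


def direct_sampling(mail):
    """No conventional follow-up; sampling starts right after the mail phase."""
    return _finish(mail, 0)


for name, result in [
    ("90% truncation", truncation_at_90(60)),
    ("time truncation", time_truncation(60, Fraction(1, 2))),
    ("direct sampling", direct_sampling(60)),
]:
    print(name, {k: str(v) for k, v in result.items()})
```

With a 60-percent mail response rate, the sketch gives 30 percent conventional interviews, 1 percent sample interviews, and 9 percent estimated for 90-percent truncation. For time truncation it gives 20, 10, and 10 percent, and for direct sampling 0, 30, and 10 percent, matching the hypothetical tract described in the text.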


Research Identified Potential Advantages and Disadvantages of Alternative Designs

                           The Bureau engaged in a variety of research efforts to study the operations
                           and potential outcomes of its planned approach for conducting
                           nonresponse follow-up in 2000, as well as other alternatives. The Bureau
                           conducted field tests and evaluations of various elements of nonresponse
                           operations during the 1995 Census Test. The Bureau also carried out
                           computer simulations, using 1990 data, to examine the results produced by
                           various alternative methods for nonresponse sampling. The Bureau used
                           these simulations to help identify the sampling options that were most
                           promising.

                           In general, the Bureau’s research confirmed the potential for sampling to
                           produce less costly, more timely results from nonresponse follow-up. The
Bureau estimated that its current plan for the 2000 Census—using the
direct sampling option for handling nonresponse follow-up—could save
approximately $400 million, when compared to the cost of a census design
incorporating all other planned improvements on the 1990 Census design
(including ICM) except sampling for nonresponse follow-up. Among the
major factors driving the cost differences are the number of
nonresponding housing units that the Bureau would need to visit under
each design and the peak staffing levels that would be required in the local
census offices to carry out the follow-up interviews.

Completing nonresponse follow-up operations in a timely manner is
important if the Bureau is to limit the deterioration in the quality of the
data collected that occurs as nonresponse operations drag out over time. It
is also crucial to the success of any coverage measurement survey, such as
the ICM proposed for the 2000 Census, because the Bureau must provide
the final census data for reapportionment and redistricting purposes by
legislatively mandated deadlines. Title 13 of the U.S. Code mandates that
the state population totals required for reapportionment of the House of
Representatives be provided within 9 months after Census Day (April 1)
and that local area data needed for redistricting be provided within one
year after the decennial census date. The legal deadlines the Bureau must
meet are therefore December 31 of the census year for reapportionment
data and March 31 of the following year for redistricting data.

The 1995 Census Test demonstrated the potential to complete
nonresponse follow-up operations in a more timely manner by using
sampling. In that test, the Bureau implemented a version of direct
sampling for nonresponse follow-up immediately after the completion of
mail return data collection. The overall sampling rates used by the Bureau
in this test were two-sevenths of the housing units that did not respond by
mail in the Oakland, California, test site and one-sixth of the
nonresponding housing units for the test sites in Paterson, New Jersey,
and six parishes in Northwest Louisiana. According to Bureau officials,
this approach enabled them to complete nonresponse follow-up
operations on time and within budget for the first time in any census or
test. However, this direct sampling approach did not require the Bureau to
achieve at least a 90-percent completion rate before relying on sample data
to account for the remaining nonrespondents. Therefore, while the 1995
Census Test results were encouraging, they did not resolve the question of
whether, using its selected option, the Bureau can complete nonresponse
follow-up in every tract on or close to schedule in the 2000 Census.

The Bureau’s initial efforts to study alternative designs for sampling
nonrespondents helped it reach a decision on the extent to which it would
use sampling in the 2000 Census. On the basis of that preliminary work,
the Bureau proposed using a 90-percent truncation design for nonresponse
follow-up in the plan for the 2000 Census. Bureau management selected
truncation at 90 percent as the preferred design primarily because of
concerns about whether the public would understand and accept using
sampling to a greater extent. By truncating conventional follow-up only
after completing responses for 90 percent of housing units, then sampling
the remaining units, the Bureau would obtain direct responses for at least
91 percent of all units. Under other alternatives for nonresponse sampling
that the Bureau initially considered, such as truncating at 70-percent
completion then sampling, it would have obtained direct responses for less
than 80 percent of all housing units.

After the Bureau revised its plan in September 1996 so that it would
control nonresponse follow-up at the tract level, subsequent research
focused on identifying the most promising design to achieve the
90-percent completion goal in each census tract. The Bureau identified
advantages and disadvantages to each of the three design options it
considered—truncation at 90 percent, time truncation, and direct
sampling. On balance, the Bureau’s research indicated that implementing
sampling directly after the mail return phase would have more advantages
than the other options. Therefore, in March 1997, the Bureau selected the
direct sampling option for nonresponse follow-up in the 2000 Census.

The Bureau’s research suggested that a design using the direct sampling
option for nonresponse follow-up would be better than other designs in
terms of cost, accuracy, and operational feasibility. Bureau officials
estimated the cost of implementing direct sampling to be between
$200 million and $600 million less than the cost of the other two options
they considered. In simulations of the accuracy of different options, direct
sampling produced slightly better results, particularly for small geographic
areas, such as census tracts. (We discuss the expected accuracy of
alternative census designs in more detail in the last section of this
appendix, and additional summary information from the Bureau appears
in app. II.) The Bureau’s regional directors and field staff preferred this
option because it is simpler to implement, entails only one operation with
a single workload that is established at the start of the follow-up phase,
and provides more time to complete interviews for the designated sample
units. The direct sampling design, however, differs most from the design
the Bureau originally announced as its plan for the next census, as well as
                      from the procedures used in past censuses. In addition, once the sample is
                      selected, nonrespondents in housing units not selected as part of the
                      sample would not have another chance to be interviewed by census
                      enumerators. Therefore, Bureau officials believe this option could have
                      some public-perception problems.

                      According to the Bureau, the time-truncation option may have an
                      advantage with regard to public perception because, under that design,
                      enumerators would make follow-up visits to all nonresponding housing
                      units in an intensive effort before sampling begins. A time truncation
                      design should also take less time to finish than the 90-percent truncation
                      design because conventional follow-up interviews would automatically
                      end on schedule, rather than continue until the Bureau was able to resolve
                      90 percent of each tract’s housing units. However, because it involves
                      more than one follow-up operation and workload, the time-truncation
                      design is likely to take more time than direct sampling. The intensive
                      initial follow-up effort also makes time truncation more expensive than
                      the other two options for sampling nonrespondents. The Bureau would
                      need a large number of temporary enumerators to attempt to contact
                      every nonresponding unit within the short period of conventional
                      follow-up interviews.

                      Public perception was also the primary advantage identified by the Bureau
                      for the 90-percent truncation design. Not only would it most closely
                      resemble past census procedures, but it is also the design option that the
                      Bureau has been discussing publicly since it announced its original plan
                      for the 2000 Census in February 1996. However, the Bureau’s cost-model
                      projections indicated that truncating at 90-percent completion would be
                      more costly than the direct sampling option, and that this option also had
                      the highest average error rate for small neighborhoods, like tracts, in
                      Bureau simulations. Also, the Bureau’s regional directors believed that
                      accomplishing this design in the allotted time for nonresponse follow-up
                      operations would be very difficult. Truncating at 90-percent completion,
                      like the time-truncation option, separates nonresponse follow-up into two
                      operations. The Bureau is concerned that the break between conventional
                      interviews and sample interviews makes this design more complex and
                      could result in higher staff turnover.


Integrated Coverage Measurement

                      The purpose of ICM is to improve the accuracy of census data, in particular
                      by reducing the differential undercounts of minorities and other
                      hard-to-enumerate population groups and areas that have been
documented for previous censuses. Evaluations of past censuses have
shown a persistent net undercount in the final census population total and,
for subnational data, net undercounts that differed across population
groups and geographic areas. Evaluations also indicated that simply
adding more conventional counting operations was not effective in
eliminating or reducing these undercounts in the 1990 Census. Therefore,
the Bureau concluded that the 2000 Census design should incorporate the
results of a coverage measurement survey conducted immediately
following basic data collection as an integral and necessary step toward
completing the census.

ICM would be the last phase in producing the final census numbers,
following mail returns and other basic data collection efforts, nonresponse
follow-up data from enumerator interviews, and nonresponse sampling
results. During this last phase, the Bureau would conduct a large sample
survey to check the accuracy of all earlier Census 2000 data collection
efforts. Bureau enumerators would compare and reconcile the results
from the ICM survey and the earlier census efforts for each housing unit in
the ICM sample to determine the extent to which people and housing units
were correctly counted, missed, or included in error in the previous
phases. Using statistical estimation methods, the Bureau would then use
this information to estimate and correct for errors in the census data for
the entire country.

The Bureau designed the proposed 2000 ICM to address several major
weaknesses of the 1990 Post Enumeration Survey (PES), especially
regarding timeliness and the accuracy of population estimates for
subnational areas.2 Because of these design changes, the 2000 Census ICM
would entail a substantial investment in resources, compared with the
1990 PES. It would also represent a dramatic shift in the integral census
operations because, rather than producing an alternative adjusted set of
census data, the results of ICM would automatically be incorporated into
one official set of census data. While results from research and testing to
date indicate that ICM has the potential to improve the accuracy of census
data, they also show that further operational and methodological testing
and development are needed before 2000.




2
 The 1990 PES was designed to estimate the net undercount in the census. It was a matching study in
which the Bureau interviewed a sample of households several months after the census. The results of
these interviews were compared with census questionnaires to determine whether each person was
correctly counted in the census, missed, or included in error. The results could have been used to
adjust the 1990 Census to correct for coverage errors, if the Secretary of Commerce had so decided.



Persistent Coverage Errors   One of the most fundamental criticisms of the census, and much of the
Affect Census Accuracy       impetus behind efforts to redesign the census-taking approach, is that it
and Equity                   fails to count every area and population group equally well. Undercounts
                             that are not equally distributed among geographical areas and population
                             groups can create inequities in political representation and the distribution
                             of public funds. To be equitable, it is not enough for a census to be
                             generally accurate in a strict numeric sense (i.e., that the total count be
                             close to the total U.S. population). For many uses of census data, including
                             reapportionment of congressional seats, legislative redistricting, and some
                             funds distribution, proportions matter more than the raw totals.3

                             Evaluations of past censuses have revealed persistent coverage errors,
                             such as net undercounts in the total population figures and differential net
                             undercounts in census data by race and geographic area. For the 1990
                             Census, the reported net undercount was about 1.8 percent of the
                             population (4.7 million persons), according to independent demographic
                             analysis.4 However, that does not mean that over 98 percent of U.S.
                             residents were actually counted, as is often reported, since the number of
                             persons missed by the census was partially offset in the net count by
                             millions of persons who were double counted or improperly included. The
                             PES indicated that about 6 million persons were counted twice in the 1990
                             Census, while 10 million were missed.5 The 4.4 percentage-point difference between
                             the net undercount for blacks (5.7 percent) and nonblacks (1.3 percent) in
                             1990 was the highest differential undercount measured by independent
                             demographic analysis since 1940.
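
The distinction between net and gross coverage error can be illustrated with the rounded figures cited above. The following is a minimal sketch for illustration only; the function and variable names are assumptions, not Bureau terminology. Note that the net figure derived here from the PES counts (about 4 million) is not the same as the 4.7 million net undercount from demographic analysis, because the two come from different methods.

```python
# Illustrative arithmetic only, using the rounded figures cited in the text.
# All names are hypothetical; this is not a Bureau procedure.

def net_undercount(missed: float, counted_in_error: float) -> float:
    """Net undercount = persons missed minus persons counted in error."""
    return missed - counted_in_error

# PES-based gross errors for the 1990 Census (millions of persons, rounded).
missed_millions = 10.0
double_counted_millions = 6.0

net_millions = net_undercount(missed_millions, double_counted_millions)
gross_millions = missed_millions + double_counted_millions

# Differential undercount from demographic analysis (percentage points).
black_rate = 5.7
nonblack_rate = 1.3
differential_points = round(black_rate - nonblack_rate, 1)

print(f"net undercount from PES figures: {net_millions:.1f} million")
print(f"gross errors from PES figures:   {gross_millions:.1f} million")
print(f"black/nonblack differential:     {differential_points} percentage points")
```

The gross total is what shows why a small net undercount does not imply that nearly everyone was counted correctly: omissions and duplicates offset each other in the net figure but both distort small-area counts.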

                             3
                              Distributional accuracy, however, is not the only criterion for the quality of census data. For example,
                             some applications use specific population thresholds, such as those defining eligibility for becoming a
                             metropolitan area.
                             4
                              Demographic analysis provides an estimate of the population derived largely from administrative data
                             such as birth and death records. It is important because it provides an independent estimate and a
                             consistent historical series of data from 1940 to the present. However, its ability to produce reliable
                             estimates below the national level, or for components of the population other than black and
                             nonblack, is very limited.
                             5
                              In addition, evaluations revealed that about 2.4 percent of the enumerated count in the 1990 Census
                             represented persons who should not have been counted at all (such as those who died before or were
                             born after census day, April 1) and those who should have been counted at another address. While
                             some of these errors would not affect the numeric total (e.g., as in the case of a person who was
                             missed at the address where he or she should have been counted, but was included in error at another
                             location), these types of errors could affect the distribution and accuracy of population counts
                             reported for small areas, such as blocks.

                             The Bureau’s evaluations demonstrated that the 1990 Census missed entire
                             housing units (and their occupants), missed people within housing units
                             that it did count, and included other persons in the population counts in
                             error (e.g., by counting them more than once or in the wrong place). The
detailed results from evaluations showed that young adult males, members
of ethnic and racial minorities, and renters, among others, were more
likely to be undercounted by the census than other residents. The
difficulty of obtaining a complete, accurate enumeration of the residents in
major urban areas is well known, but evaluations have consistently shown
high error rates for data on persons living in rural areas as well. Error
rates were also demonstrably higher for persons counted on forms
completed through enumerator follow-up rather than mailed in by the
household.

The Bureau’s research revealed many potential sources of coverage errors.
For example, at the very start of the 1990 Census, the address lists used to
guide data collection were incomplete and had other problems that made
it difficult for the Bureau to deliver questionnaires or attempt interviews
with every housing unit. In addition, people moved during the census
operations, which made it more likely that some persons would be missed,
counted more than once, or counted in the wrong place. Also, respondents
and enumerators sometimes had difficulty interpreting the residency rules
that determine whether and where people should be counted for census
purposes.

Just as sampling for nonresponse follow-up is only one component of the
Bureau’s strategy to handle the nonresponse problem, ICM is only one
element in the Bureau’s plans to improve census accuracy and reduce the
differential undercount in 2000. For example, improvements in address list
development and expanded partnerships with other levels of government
and community groups should identify some housing units that the Bureau
might otherwise miss. To cite another example, the Bureau intends to use
special targeted methods to improve the count of population groups and
the count in geographic areas where the census has tended to miss a
disproportionate share of the people. Altogether, the efforts planned for
Census 2000 should result in the Bureau making multiple attempts to
contact and count all residents and should provide multiple opportunities
for people to respond to the census.

However, even if all other design components produce improvements in
the conventional census counting operations, the evidence from past
coverage evaluations indicates that errors in the census data will still
occur. Also, the task of accurately counting all members of the population
is becoming more difficult, in part due to a growing population, but also as
a reflection of the difficulty of having census rules, definitions, and
methods keep pace with changes in society. Bureau evaluations of the
                         1990 Census indicated that simply adding more conventional counting
                         operations was not effective in eliminating or reducing differential
                         undercounts of areas and population groups that have been hard to
                         enumerate accurately. Therefore, the Bureau and expert panels of the
                         National Academy of Sciences concluded that the 2000 Census design
                         should incorporate the results of a coverage measurement survey
                         conducted immediately following basic data collection as an integral and
                         necessary step toward completing the next census.


Proposed ICM Operation   The purpose of ICM is to improve the accuracy of census data—in
Includes Major Changes   particular, to reduce the differential undercount—rather than just evaluate
From 1990 PES            the quality of the data. While the Bureau has evaluated the magnitude and
                         characteristics of census errors and undercounts since 1950, the
                         evaluation findings have not been used to correct for coverage errors in
                         the decennial census tabulations. A statistical adjustment was considered
                         after the 1990 Census, with the Bureau using the results of the 1990 PES to
                         produce a second, adjusted set of census data. However, upon review of
                         the original census data and the adjusted census data, the Secretary of
                         Commerce found the evidence in support of an adjustment to be
                         inconclusive and unconvincing and decided that the 1990 Census counts
                         should not be changed. The Bureau’s proposed design for the 2000 Census
                         would therefore mark the first time that such a step would be an integral
                         part of completing the census. The ICM results would be the last
                         component in producing final census numbers, following mail returns and
                         other basic data collection efforts, nonresponse follow-up data from
                         enumerator interviews, and nonresponse sampling results.

                         The proposed ICM operation is a coverage measurement survey that
                         estimates the true population on Census Day based on interviewing
                         housing units in a sample of blocks across the country. It would involve
                         several main stages: (1) selecting a sample of blocks in advance of the
                         2000 Census that the Bureau would survey after all census data collection
                         efforts have been completed in those blocks, (2) developing the best
                         possible address list for ICM sample blocks, (3) completing interviews for
                         every housing unit in the sample blocks to compile an independent list of
                         Census Day residents and match this ICM list to the census list, and
                         (4) using the results to estimate the true population on Census Day.

                         In the first stage, the Bureau would select a sample of blocks from across
                         the country. To ensure that this sample was sufficiently large and
                         representative to accurately estimate coverage errors in the census for
specific geographic areas and population groups, the Bureau would
stratify blocks by characteristics such as the states they are located in,
population size, racial and ethnic composition of the residents (e.g., blocks
in which at least 50 percent of the residents in the 1990 Census were
black), tenure (e.g., whether most housing units are owned or rented),
whether or not the block is in an urbanized area, and other factors.
Stratifying and weighting the sample blocks, rather than just drawing a
simple random sample of blocks across the country, would enable the
Bureau to include an adequate number of sample units to provide
estimates of census coverage errors for specific population groups or
areas, even if they constitute a relatively small part of the total population
in the nation.
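
The effect of stratifying and weighting can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the block records, the two stratum labels, and the equal per-stratum sample size are all hypothetical, and the Bureau's actual sample design is not described at this level of detail in this report.

```python
import random
from collections import defaultdict

def stratified_sample(blocks, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` blocks from each stratum.

    Returns (block, weight) pairs, where weight = stratum size / sample size,
    so each sampled block represents `weight` blocks in its stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for block in blocks:
        by_stratum[stratum_of(block)].append(block)

    sample = []
    for stratum, members in sorted(by_stratum.items()):
        n = min(per_stratum, len(members))
        weight = len(members) / n
        for block in rng.sample(members, n):
            sample.append((block, weight))
    return sample

# Hypothetical frame: 1,000 blocks, only 20 of which fall in a small stratum.
blocks = [{"id": i, "stratum": "small-group" if i < 20 else "other"}
          for i in range(1000)]
sample = stratified_sample(blocks, lambda b: b["stratum"], per_stratum=10)

counts = defaultdict(int)
for block, _ in sample:
    counts[block["stratum"]] += 1
print(dict(counts))
print("weighted total of blocks:", sum(weight for _, weight in sample))
```

In this toy frame a simple random sample of 20 blocks would be expected to include well under one block from the small stratum, whereas the stratified draw guarantees 10; the weights then restore each stratum to its true share when estimates are combined.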

Once the ICM sample has been selected, the Bureau would compile an
enhanced master address list for the sample blocks. This enhanced
address list is intended to be the most complete listing possible of the
housing units in those blocks, given that it would be produced by more
intense and higher quality listing procedures than are possible for the
census as a whole. The Bureau would create this list by combining,
comparing, and reconciling the address list from the census with an
independent list developed by Bureau enumerators before the ICM
interviews begin.

In the third stage, Bureau enumerators would complete an interview at the
housing units in the sample blocks after basic census data collection has
been done. Under its current plan, the Bureau intends to use
computer-assisted personal interviewing (CAPI) technology, in which laptop
computers are preloaded with the ICM questionnaire and other data needed
for the interview. The Bureau enumerators would attempt to
complete an interview for each housing unit on the enhanced address list,
thereby obtaining a roster of the people living at the unit on Census Day.
Once this ICM roster was completed, the CAPI system would match all
census and ICM responses possible and reveal the roster from any census
questionnaire for that particular address. The Bureau enumerator would
then attempt to reconcile any differences between the different rosters
and determine which persons should have been enumerated at the housing
unit according to census residency rules.6
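
The roster comparison described above can be pictured as a set comparison for a single housing unit. The sketch below is purely illustrative: the match key (normalized name plus birth year), the field names, and the sample people are assumptions, and it does not represent the CAPI system's actual matching logic. In practice the enumerator, not the software, decides who belongs at the address under the residency rules.

```python
def compare_rosters(census_roster, icm_roster):
    """Split two person rosters for one housing unit into three groups."""
    def key(person):
        # Hypothetical match key: normalized name plus birth year.
        return (person["name"].strip().lower(), person["birth_year"])

    census_keys = {key(p) for p in census_roster}
    icm_keys = {key(p) for p in icm_roster}
    return {
        "matched": sorted(census_keys & icm_keys),
        # On the census form only: candidates for erroneous inclusion.
        "census_only": sorted(census_keys - icm_keys),
        # Found by the ICM interview only: candidates for omission.
        "icm_only": sorted(icm_keys - census_keys),
    }

# Made-up rosters for one address.
census_roster = [
    {"name": "Ana Diaz", "birth_year": 1960},
    {"name": "Luis Diaz", "birth_year": 1985},
    {"name": "Sam Reed", "birth_year": 1970},
]
icm_roster = [
    {"name": "ana diaz ", "birth_year": 1960},
    {"name": "Luis Diaz", "birth_year": 1985},
    {"name": "Eva Diaz", "birth_year": 1989},
]

result = compare_rosters(census_roster, icm_roster)
for group, people in result.items():
    print(group, people)
```

The two unmatched groups are only candidates for error; each would still need to be resolved during the interview before being treated as an omission or an erroneous inclusion.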

6
 Because the ICM procedures estimate the quality of census data collected by comparing ICM and
census results for the same housing units, the Bureau would use 100-percent follow-up of
nonrespondents in all ICM sample blocks, instead of sampling a portion of the nonrespondents.

In the final stage, information from the ICM interviews would be used to
estimate the extent to which housing units and people were correctly
enumerated, missed, or counted in error for the entire census. The Bureau
would estimate the correct population for entire geographic areas, as well
as for specific poststrata as defined by characteristics such as age, sex,
tenure, race and ethnic origin. In other words, while stratification of the
original sample would be done on the basis of the characteristics of
blocks, poststratification would be based on the characteristics of
individuals. For example, in the Oakland test site of the 1995 Census Test,
one ICM poststratum for which the Bureau produced an estimate was Asian
and Pacific Islander females between the ages of 18 and 29 who lived in
nonowner (rental) housing units. The resulting estimates for all poststrata
would be incorporated through statistical procedures into the final census
data.
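
The difference between block-level strata and person-level poststrata can be shown with a short sketch that assigns each person a poststratum key from his or her own characteristics. The age bands, category codes, and field names below are assumptions made for illustration; the report mentions only the 18 to 29 age group from the Oakland example and does not list the Bureau's full set of poststrata.

```python
from collections import Counter

# Hypothetical age bands; the text mentions only the 18-29 group.
AGE_BANDS = [(0, 17), (18, 29), (30, 49), (50, 200)]

def poststratum(person):
    """Return a poststratum key built from a person's own characteristics."""
    low, high = next(band for band in AGE_BANDS
                     if band[0] <= person["age"] <= band[1])
    return (person["race"], person["sex"], f"{low}-{high}", person["tenure"])

# Made-up person records; "API" stands for Asian and Pacific Islander.
people = [
    {"race": "API", "sex": "F", "age": 24, "tenure": "renter"},
    {"race": "API", "sex": "F", "age": 19, "tenure": "renter"},
    {"race": "API", "sex": "F", "age": 41, "tenure": "owner"},
]

counts = Counter(poststratum(p) for p in people)
for key, n in sorted(counts.items()):
    print(key, n)
```

Two of the three records fall in the poststratum that corresponds to the Oakland example, regardless of which block they were sampled from.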

To address concerns raised about the PES during the adjustment decision
after the 1990 Census, the Bureau plans several major design changes for
its coverage evaluation survey in the 2000 Census. Perhaps most
significantly, the Bureau intends to make ICM an integral part of the census
process in 2000, thus producing one official set of numbers that reflect the
Bureau’s best estimate, rather than the original and adjusted census data
sets that were produced after the 1990 Census. Thus, unlike the process
followed in the 1990 Census adjustment decision, this “one-number
census” design would not include a step where two sets of numbers are
evaluated to determine which set is more accurate: the data incorporating
the results of the ICM survey would automatically become the official
census data. This one-number approach may help to mitigate concerns
expressed by the Secretary of Commerce when he decided not to adjust
the 1990 Census, such as the potential for confusion that more than one
set of numbers could create and the potential for political considerations
to play a part in choosing between sets of numbers when the outcome of
the choices (such as differences in apportionment of seats in Congress)
can be known in advance of a decision.

The Bureau plans other changes that are intended to improve the
timeliness and accuracy of the ICM results, compared with those produced
by the 1990 PES. The ICM is designed to be completed in a much shorter
time than the PES in order to meet the deadline for reporting census data
for apportionment purposes. The CAPI technology is one of the key design
changes that can improve timeliness since it reduces the need for
additional follow-up interviews of sample units. The Bureau also proposes
using a sample in the 2000 ICM that is approximately five times larger
(about 750,000 housing units) than the 1990 PES (about 150,000 units). This
larger sample should allow the Bureau to produce direct estimates for
                            each state and improve the quality of estimates for substate geographic
                            areas.7 As another enhancement of the 1990 PES procedures, the Bureau is
                            exploring ways to produce household-level data. Persons would be added
                            to or subtracted from individual housing units through the ICM estimation
                            procedures, thus providing household characteristic data for researchers
                            and other data users, whereas the 1990 PES procedure would
                            have adjusted only the block data totals. As a consequence of these
                            changes, especially the larger sample size, the planned 2000 ICM is also
                            more expensive than the 1990 PES (in constant 1990 dollars, the cost is
                            $230 million for the 2000 ICM compared to $55 million for the 1990 PES).8


Research and Test Results   The Bureau has used a combination of computer simulations and field
for ICM Methods Have        testing in its research to design and refine the proposed ICM operation for
Shown Mixed Results         the 2000 Census. Simulations of ICM results using 1990 Census data showed
                            general improvement over the results produced by the 1990 counting
                            operations. Bureau simulations also indicated that an ICM survey of the size
                            proposed for 2000 could support a design that would achieve for each
                            state a coefficient of variation of approximately 0.5 percent or a standard
                            error of 60,000 persons, whichever is smaller.9 The simulated ICM reduced
                            the patterns of differential undercounts across population subgroups and
                            areas observed in the 1990 Census data.
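
The precision target described above can be expressed as a small calculation. This is a sketch under one reading of the text, namely that each state's target standard error is the smaller of 0.5 percent of its population and 60,000 persons; the function names and the example populations are hypothetical.

```python
def coefficient_of_variation(standard_error, estimate):
    """CV is the standard error relative to the size of the estimate."""
    return standard_error / estimate

def target_standard_error(state_population, cv_target=0.005, se_cap=60_000):
    """Smaller of the SE implied by the CV target and the fixed SE cap.

    Assumption: this is one reading of "whichever is smaller" in the text.
    """
    return min(cv_target * state_population, se_cap)

# Hypothetical state populations, not actual census figures.
for population in (1_000_000, 12_000_000, 30_000_000):
    se = target_standard_error(population)
    cv = coefficient_of_variation(se, population)
    print(f"population {population:>11,}: SE target {se:>9,.0f}, CV {cv:.3%}")
```

Under this reading the two criteria coincide at a population of 12 million; below that the 0.5 percent criterion is the binding one, and above it the 60,000-person cap implies a CV tighter than 0.5 percent.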

                            The Bureau had several key ICM-related objectives in the 1995 Census Test.
                            Operational objectives focused on whether the new procedures, with
                            heavy reliance on CAPI technology, would work and improve the
                            completeness and timeliness of the coverage survey operation. Technical
                            objectives focused on continuing the evaluation of two potential
                            estimation methods for the ICM, called Dual System Estimation (DSE) and
                            CensusPlus, as well as completing a number of studies to evaluate
                            potential sources of bias in the population estimates produced by ICM.



                            7
                             A direct estimate is based entirely on data from the area for which the estimate is calculated. For
                            instance, a direct population estimate for Missouri would be calculated using only data collected from
                            Missouri. Indirect estimates, such as the 1990 PES state population estimates, draw on data from
                            outside the area being estimated.
                            8
                             Even if Congress should decide against using ICM to produce the final census numbers, some portion
                            of the resources slated for ICM would still be needed for a smaller coverage survey operation that
                            would be used for evaluation purposes only. Such an operation is essential for evaluating the quality of
                            the census, especially at subnational levels, and also provides information used to plan and budget the
                            next census.
                            9
                             The standard error (SE) and coefficient of variation (CV) are measures of the precision of an estimate.
                            Whatever true value the estimate might have, the standard error tells how wide an interval to expect
                            from all possible samples like the one under consideration. The CV is the standard error relative to the
                            size of the estimate.


Statistical estimation methods are needed to translate the information
from the ICM interviews into estimates of the true population, which in turn
generate the factors used to correct the raw census counts up through
nonresponse follow-up. For example, if the estimation method indicates
that the census undercounted people in a particular poststratum by
4 percent, the Bureau would multiply every person counted by the census
in that poststratum by 1.04 to produce the final census counts. DSE is a
capture-recapture estimation method that assumes neither the census nor
the ICM counts everyone. Instead, using probability theory, it uses a
comparison of the results from the two lists (the census and the ICM
survey) to estimate the total population.10 In other words, it assumes that
using two independent estimates of the population can generate a third,
better estimate of the “true” population. DSE was used in the 1990 PES and
prior census coverage evaluation surveys, as well as several tests, so the
Bureau has a body of research and experience to rely on in the design and
execution of the method. The other major advantage of this method is that
the resulting estimates include a component to estimate persons missed by
both the census and the ICM survey. However, DSE is a complex estimation
method. Therefore, it takes more time to complete population estimates
using DSE, and the method may be harder to explain to the public. In
contrast, CensusPlus is a new method that the Bureau had never tested
before 1995. CensusPlus assumes that a second, higher quality survey (the
ICM) can find the “true” population by reconciling an ICM roster with the
census roster for the same housing unit. The reconciled count provides a
ratio to extrapolate for non-ICM households. The major advantages of
CensusPlus are that it is simpler than DSE, and may be faster to complete
and easier to explain.
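
The capture-recapture idea behind DSE can be illustrated with the textbook two-list (Lincoln-Petersen) estimator. The sketch below is a simplified illustration, not the Bureau's operational procedure, which involves additional steps such as handling erroneous enumerations and missing data; the counts are invented, and the function names are assumptions. The numbers were chosen so that the resulting factor matches the 1.04 example in the text.

```python
def dual_system_estimate(census_count, icm_count, matched):
    """Textbook two-list capture-recapture estimate of the total population.

    census_count: persons on the census list
    icm_count:    persons on the independent ICM list
    matched:      persons found on both lists
    """
    if matched <= 0:
        raise ValueError("at least one matched person is required")
    return census_count * icm_count / matched

def correction_factor(estimated_total, census_count):
    """Factor applied to census counts in a poststratum."""
    return estimated_total / census_count

# Invented counts for one poststratum in the sample blocks.
census_count, icm_count, matched = 1000, 988, 950

estimate = dual_system_estimate(census_count, icm_count, matched)
on_either_list = census_count + icm_count - matched
missed_by_both = estimate - on_either_list
factor = correction_factor(estimate, census_count)

print(f"estimated total population: {estimate:.0f}")
print(f"persons on at least one list: {on_either_list}")
print(f"estimated missed by both lists: {missed_by_both:.0f}")
print(f"correction factor: {factor:.2f}")
```

The estimate exceeds the number of persons found on either list, which reflects the component for persons missed by both the census and the ICM survey. The estimator relies on the two lists being independent, which is why the ICM address list and interviews are developed separately from the census operations.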

The results of the 1995 Census Test for ICM were mixed. Given that it was
the first field test of the operation, this was not surprising. Among the
successes that the Bureau identified in the test for ICM was the CAPI-based
interviewing, which was well received by enumerators and those being
interviewed, and proved feasible for use in a coverage survey. The test
results showed that this technology offers considerable advantages in
terms of timeliness, control, and quality of ICM interview data. Overall
timeliness of the coverage survey operation was much improved
compared with the 1990 PES experience. The Bureau completed field work
for the 1995 ICM in about 2 months less time than it took to complete work
for the 1990 PES, which enabled the Bureau to produce the final census
data for both estimation methods before the end of the calendar year.

Finally, evaluations of specific potential sources of bias in the ICM
estimates and of other data collection errors indicated that the operations
were generally unbiased and effective.

10
 For a more detailed description of the DSE method and the mathematical procedures used, see 1990
Census Adjustment: Estimating Census Accuracy—A Complex Task (GAO/GGD-91-42, Mar. 11, 1991).

While some of the new techniques and processes worked well in the
census test, the test results also clearly showed that further research is
needed in a number of areas. The Bureau experienced some serious
operational and technical problems with the ICM in the 1995 Census Test.
On the operational side, the most serious problems were a high rate of ICM
nonresponse and the failure to load all census data into the laptop
computers in time for the ICM interviews. The main technical problem was
the poor showing of the CensusPlus estimation method.

According to Bureau evaluations, the nonresponse rates for the ICM
interviews were too high, generating more missing data than the Bureau’s
estimation methods were designed to handle. Some of the nonresponses
represented partial interviews that did not provide sufficient data for the
Bureau to use in reconciling census and ICM rosters (i.e., there was not
enough specific information to be able to tell whether a person listed on
the census roster was the same person as one listed on the ICM roster).
High rates of noninterviews and missing data can create significant
problems for the Bureau when it attempts to estimate the true population.
The population estimates can vary depending on different assumptions the
Bureau makes about how to treat the missing information. For example, in
the Bureau’s evaluation of coverage in the 1980 Census, different
assumptions about the treatment of missing data, along with other
limitations, generated 12 different sets of estimates of the true population.
In the 1995 Census Test, the Bureau imputed responses for missing data
based on the responses that it was able to obtain, but its subsequent
evaluations of the test indicated that the ICM nonrespondents were
dissimilar to the ICM respondents.

Although the ICM began after nonresponse follow-up ended, the Bureau
had not yet finished recording all the data from the last nonresponse
interviews. Therefore, not all the census data needed for ICM interviews
were loaded into the laptop computers before enumerators went out for
those interviews. Bureau officials said that this problem was compounded
because a relatively large portion of the census data from nonresponse
follow-up interviews came in at the end of that operation. Census data
were missing in 29 percent of the ICM sample cases from which
interviewers called up information. For those cases, this made on-the-spot
reconciliation of ICM and census rosters impossible and also eliminated the
possibility of ICM interviewers probing to resolve discrepancies and
identify persons not originally accounted for. Among other difficulties
with the computer technology, the Bureau discovered problems with the
flow and complexity of the survey instrument loaded into the machines.

Of the two estimation methods the Bureau tested in 1995, only DSE showed
a consistent ability to reduce the differential undercount of traditionally
undercounted population groups. In contrast, the CensusPlus method
produced an unexpectedly poor showing, according to Bureau officials.
The estimates produced by this method were lower than the total count
after nonresponse follow-up (i.e., from all phases before ICM results were
incorporated) for some traditionally undercounted population groups.
CensusPlus results did not reduce the differential undercount of blacks
and only provided limited improvement in the counts for Hispanics. The
CensusPlus results also showed patterns that did not appear reasonable
when compared with independent estimates, such as those produced by
demographic analysis. Bureau officials were not certain whether the poor
performance of CensusPlus was due to problems with operations and the
computer survey instrument or to flaws in the method itself. Because the
Bureau has extensive experience with the DSE method, and because that
method has been performing well in recent tests, the Bureau decided in
March 1997 that it will use DSE as part of the ICM procedures planned for
the 2000 Census.
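
The report does not spell out the DSE computation. As a rough illustration only, the sketch below uses the textbook dual-system (capture-recapture) form, in which the population of an area is estimated from the census count, the independent survey count, and the number of persons matched between the two rosters. The function name and the example counts are hypothetical and are not Bureau data, and the sketch assumes that the two rosters are independent and that matching is error-free. A production estimator would involve further steps, such as handling erroneous enumerations and estimating separately for population groups, that are not shown here.

```python
def dual_system_estimate(census_count: int, survey_count: int, matched: int) -> float:
    """Textbook dual-system estimate of the true population of an area.

    Assumes the census and survey rosters are independent and that
    persons are matched between them without error.
    """
    if matched <= 0:
        raise ValueError("at least one matched person is needed")
    return census_count * survey_count / matched


# Hypothetical counts for one area (illustrative only, not Bureau data).
census_count = 900   # persons on the census roster
survey_count = 800   # persons on the independent survey roster
matched = 760        # persons found on both rosters

estimate = dual_system_estimate(census_count, survey_count, matched)
# Share of the estimated population that the census roster missed.
implied_undercount = 1 - census_count / estimate
print(round(estimate, 1), round(implied_undercount, 3))
```

In this hypothetical case the census roster covers the same share of the estimated population as the share of survey persons who were matched, which is the core assumption behind the method.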

The Bureau began testing redesigned procedures and computerized survey
instruments for ICM in its 1996 Community Census, for which the ICM
survey portion started in January 1997. Among the planned changes were
revisions to the computerized survey instrument to make it easier for
respondents and interviewers to move through the questions. Also, the ICM
schedule was extended by about 2 weeks to allow the Bureau to load all
census data on the laptop computers before ICM interviews. The Bureau
also intended to use the longer schedule to implement a special operation
to try to convert ICM noninterviews into completed interviews, thus
alleviating some of the missing data problems experienced in the 1995
Census Test. A similar nonresponse conversion operation for the 1990 PES
reduced the noninterview rates to about 1.5 percent. The 1996 test should
provide information to determine whether the operational problems have
been addressed sufficiently, but the evaluations from the 1996 Community
Census are not expected to be completed until fall 1997.




Combined Effects of Sampling for Nonresponse and ICM

The effects of the proposed new statistical methods on the success of
census operations and the quality of the resulting data need to be viewed
in combination. The Bureau’s research to date illustrates that there are
many trade-offs and interrelationships between the components of
alternative census designs. Sampling a portion of the nonresponse
workload can save time and money compared to attempting to follow up
100 percent of the nonrespondents, but it is not likely to significantly
improve census accuracy. The ICM is designed to address problems with
census accuracy, but it is unlikely to be successful unless preceding data
collection efforts, in particular nonresponse follow-up, are completed on
schedule. The results from Bureau research suggest that the statistical
methods proposed by the Bureau for 2000 should reduce the bias observed
in past census data (i.e., the differential undercounts), but these methods
would also introduce additional random error in the data because of
sampling. Similarly, designs that do not include sampling for nonresponse
follow-up or ICM also involve trade-offs. For example, such designs would
be easier to explain to the public and more closely resemble past
censuses. They would not introduce the additional level of uncertainty in
the results that accompanies sampling. However, the Bureau’s research
results and projections also suggest that those designs would be more
expensive and show no likelihood of reversing or substantially reducing
accuracy problems in census data.

Projections of the expected accuracy, equity, costs, and operational
feasibility of alternative census designs are likely to change as the
Bureau’s research continues. The Bureau and other census observers have
identified a number of areas in which additional research and decisions
are needed with regard to the details of the proposed statistical methods.
Such research is particularly critical because technical changes and
refinements can affect census results.


Results From Computer Simulations Illustrate Trade-Offs in Accuracy

To determine the potential levels and sources of error for various levels of
geography, the Bureau relied primarily on computer simulations. At our
request, the Bureau provided us data from its most recent simulations of
the results that would likely be produced in the 2000 Census using various
alternative census designs. According to Bureau officials, one important
advance in this research, compared to previous simulations, was that they
were able to use detailed information from the operational data files of the
1990 Census about the processing of census forms collected from the
nation’s housing units. For simulating the results of sampling for
nonresponse follow-up, the most important operational items were the
response status (i.e., whether or not a given unit responded to the census
by mail) and the check-in date of each census form (which identified when
the Bureau was able to collect data for each unit). Using this information,
the Bureau was able to identify nonresponding housing units and define
the nonresponse follow-up sampling universes for different designs with a
high degree of accuracy, yielding more realistic results with fewer
assumptions than were needed for previous simulations. To account for
expected growth in the population, the Bureau used population
projections for the counts and characteristics of persons in 2000. In
addition, the Bureau assumed in its simulations that the percentage
undercounts (and overcounts) measured for population groups by the
1990 PES would also apply in the 2000 Census.

The Bureau’s summary chart showing projected results from its
simulations of alternative designs for conducting the 2000 Census, as of
June 1997, is presented in appendix II. Because of the limitations of this
research, the Bureau’s estimates should serve only as a rough illustration
of possible results in 2000. The most important limitation is that actual
results can vary from the expected results produced in theory or in a
computer simulation. Among other limitations are that (1) the Bureau
continues to refine the different methods it is studying; (2) as new
information becomes available, such as data on expected staffing and pay
levels in 2000, the results may vary; and (3) more research is needed to
better understand and quantify some of the effects and sources of error
that are not reflected in this current set of results.11

Despite these caveats, the simulation results suggest that, relative to the
size of the population being estimated, the new methods proposed by the
Bureau would generally reduce the level of error in census data for the
nation, states, congressional districts, and most census tracts. Results near
the levels projected by the Bureau’s simulations for its refined plan would
represent a reduction in the relative error levels experienced in the 1990
Census, as well as in the levels projected for a 2000 Census design that
incorporates all proposed improvements in 1990 Census procedures
except those involving sampling or statistical estimation.

11 For example, the procedures used to model population estimates for all smaller geographic areas,
such as tracts, using a limited amount of sample data also generate errors. The potential magnitude of
those errors and the validity of the assumptions used in the models are the subjects of research both
within and outside the Bureau.

The Bureau’s simulation results for alternative designs also illustrate one
of the major trade-offs in accuracy between designs that use sampling and
statistical estimation and those that do not. While the data showed that
most places and geographic levels had lower rates of error using the
methods the Bureau proposes to use in 2000, some places had lower rates
of error using conventional census methods without sampling and
statistical estimation. The simulation data suggest that the Bureau’s
current design for the 2000 Census would likely produce results that
appear more accurate or more equitable by at least three broad criteria:
(1) the average levels of error are better; (2) the shape of the error
distributions is compressed closer to the average levels; and (3) the
cumulative error distributions also appear to be better. For example, the
difference between the average relative error rates for the Bureau’s
current design for the 2000 Census and a conventional design without
sampling ranges from 0.8 percentage points at the level of census tracts
(1.1 percent compared to 1.9 percent) to 1.8 percentage points for the
national total (0.1 percent compared to 1.9 percent). The distribution of
tracts, using the Bureau’s
current design, showed that over 90 percent of tracts had results between
0.5 percent and 2.0 percent relative error; the comparable range to account
for 90 percent of tracts using conventional procedures would extend from
under 0.5 percent to between 4.0 and 4.5 percent. In terms of the
cumulative distributions, while the data again indicated that over
90 percent of tracts had error rates of less than 2.0 percent using the
Bureau’s current design, only about 63 percent of tracts had rates of less
than 2.0 percent using the conventional procedures.

However, the results also show that some places that should have very low
error rates using a conventional census design could have higher error
levels if sampling for nonresponse and ICM are used. For example, the
Bureau’s current plan for the 2000 Census is designed to achieve a relative
error rate of 0.5 percent for all but the largest states.12 But its simulations
projected that two states could have error rates of less than 0.5 percent by
using a conventional design without sampling in the 2000 Census. (In the
1990 Census, five states had estimated error rates of under 0.5 percent.)
The trade-off is more noticeable for smaller geographic areas. The smallest
geographic areas for which we have detailed data from the Bureau
simulation are census tracts.13 Using a trimmed data set, the Bureau
calculated that its current plan for 2000 (using direct sampling for
nonresponse follow-up, together with ICM) would produce less error for
64 percent of census tracts.14 The time-truncation option produced less
error for 54 percent of tracts, and the 90-percent truncation option
produced less error for only 51 percent of tracts. The converse of these
figures is that the 1990 Census procedures performed better for
approximately 40 to 50 percent of tracts, depending on the option used for
comparison.

12 For the four states with a population of over 12 million persons (California, Florida, New York, and
Texas), the Bureau’s plan was designed to ensure that the standard errors from sampling did not
exceed 60,000 persons, which produced relative error rates of under 0.5 percent for those states.

13 The simulation did not produce block-level data on the potential effects of the Bureau’s current plan
and options, but Bureau officials told us they intend to contract for a study of the effects on block
data.
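
The relative error figure in footnote 12 can be checked with simple arithmetic: a standard error capped at 60,000 persons, divided by a population of at least 12 million, gives a relative error of no more than 0.5 percent. The sketch below assumes that relative error is the standard error divided by the population being estimated. The function name is illustrative, and the 20 million population in the second call is a hypothetical value, not a figure from the report.

```python
def relative_error_percent(standard_error: float, population: float) -> float:
    """Relative error expressed as a percentage of the population estimated."""
    return 100.0 * standard_error / population


# Standard error cap from footnote 12, applied at the 12 million threshold.
at_threshold = relative_error_percent(60_000, 12_000_000)

# Hypothetical larger state (illustrative population only).
larger_state = relative_error_percent(60_000, 20_000_000)

print(at_threshold, larger_state)
```

Because the cap is a fixed number of persons, the relative error shrinks as the state's population grows, which is why the largest states fall under 0.5 percent.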

The question therefore becomes, how much better or worse are the error
levels for individual tracts using different designs? The Bureau is still
working with and studying the data set from its latest simulations and has
not yet produced detailed data tables to address that question directly. But
it did provide us with data from its simulation showing the distribution of
census tracts by the level of relative error. Table I.1 provides summary
information on the distribution of census tracts by the level of relative
error. For this table, we chose to present data for all 60,128 tracts from the
untrimmed data set. We used the untrimmed data set because we believe
that the ends of the distributions, which the Bureau trimmed from the
other data set, matter unless the Bureau decides not to apply the same
methods to these tracts.




14 The Bureau “trimmed” the number of census tracts represented in these calculations by removing
tracts with unusual characteristics (e.g., those with especially small or large populations). Trimming
reduced the number of tracts from 60,128 to 56,022. A more detailed explanation of trimming is
included in the Bureau’s footnotes to its summary chart in appendix II.

Table I.1: Distribution of Census Tracts by Relative Error Level Using Alternative Census Designs

Columns show the relative error level, in percent.

Design alternative          < 0.5  0.5-1.0  1.0-1.5  1.5-2.0  2.0-2.5  2.5-3.0  3.0-3.5  3.5-4.0  4.0-4.5  4.5-5.0    > 5.0

Percentage of tracts
Direct sampling               1.5     39.5     41.6     11.6      3.3      1.1      0.5      0.2      0.1      0.1      0.4
90-percent truncation         1.5      8.2     40.8     32.9     10.7      3.4      1.3      0.5      0.2      0.2      0.4
Time truncation               1.5     16.1     46.7     25.0      6.7      2.1      0.8      0.3      0.2      0.1      0.4
No sampling                  22.2     12.1     15.0     13.1      8.8      6.6      5.1      4.0      3.5      2.7      6.8

Cumulative percentage of tracts
Direct sampling               1.5     41.0     82.6     94.2     97.5     98.7     99.2     99.4     99.5     99.6    100.0
90-percent truncation         1.5      9.7     50.5     83.4     94.0     97.4     98.7     99.2     99.4     99.6    100.0
Time truncation               1.5     17.6     64.4     89.3     96.1     98.2     98.9     99.3     99.5     99.6    100.0
No sampling                  22.2     34.3     49.4     62.5     71.3     77.9     82.9     87.0     90.4     93.2    100.0

Note 1: Data presented reflect simulation results for 60,128 census tracts.

Note 2: Relative error for the direct sampling, 90-percent truncation, and time-truncation design
alternatives is the combined sampling error from sampling for nonresponse follow-up and ICM in
simulations of the 2000 Census. It does not include possible errors from other sources, such as
any bias in the statistical models used to produce the tract-level estimates.

Note 3: Relative error for the no sampling design alternative (i.e., without using sampling for
nonresponse follow-up or ICM) is the absolute value of the estimated undercount or overcount
rate for tracts in simulations of the 2000 Census. It does not include possible errors from other
sources. Also, although model error would not apply in the actual use of the no sampling design,
it does represent an unmeasured error component in producing these tract-level estimates.

Note 4: Cumulative percentages may be affected by rounding.

Source: Bureau of the Census



                                            There is a clear difference in the shape of the distributions for the design
                                            alternatives using sampling for nonresponse and ICM, and the distribution
                                            for a design that does not use those procedures. The distributions of tract
                                            data for designs using these new methods are compressed toward the
                                            average relative error level for tracts. The distribution of tracts for the
                                            conventional census design, while showing that about 22 percent of all
                                            tracts had minimal net undercounts (i.e., less than 0.5 percent), is more
                                            dispersed across the range of error levels. This suggests that the tracts that
                                            would have relatively more error using the new methods may have only
                                            slightly more error.
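
The cumulative rows of table I.1, and the shares of tracts under 2.0 percent relative error cited in the text, can be recomputed from the per-bin percentages. The sketch below is one way to do so; the variable and function names are illustrative. Because the published per-bin figures are themselves rounded, recomputed sums can differ from the published cumulative figures by about 0.1, consistent with note 4 of the table.

```python
from itertools import accumulate

# Relative error bins from table I.1, in percent.
BINS = ["<0.5", "0.5-1.0", "1.0-1.5", "1.5-2.0", "2.0-2.5", "2.5-3.0",
        "3.0-3.5", "3.5-4.0", "4.0-4.5", "4.5-5.0", ">5.0"]

# Percentage of tracts in each bin, copied from table I.1.
PERCENT_OF_TRACTS = {
    "Direct sampling": [1.5, 39.5, 41.6, 11.6, 3.3, 1.1, 0.5, 0.2, 0.1, 0.1, 0.4],
    "90-percent truncation": [1.5, 8.2, 40.8, 32.9, 10.7, 3.4, 1.3, 0.5, 0.2, 0.2, 0.4],
    "Time truncation": [1.5, 16.1, 46.7, 25.0, 6.7, 2.1, 0.8, 0.3, 0.2, 0.1, 0.4],
    "No sampling": [22.2, 12.1, 15.0, 13.1, 8.8, 6.6, 5.1, 4.0, 3.5, 2.7, 6.8],
}


def cumulative(shares):
    """Running total of the per-bin percentages, rounded to one decimal."""
    return [round(total, 1) for total in accumulate(shares)]


def share_below(design: str, last_bin: str) -> float:
    """Percentage of tracts at or below the given bin for one design."""
    index = BINS.index(last_bin)
    return cumulative(PERCENT_OF_TRACTS[design])[index]


direct_under_2 = share_below("Direct sampling", "1.5-2.0")
conventional_under_2 = share_below("No sampling", "1.5-2.0")
print(direct_under_2, conventional_under_2)
```

The recomputed shares line up with the text: over 90 percent of tracts fall under 2.0 percent relative error with direct sampling, compared with roughly 63 percent under the no sampling design.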

Figure I.2 illustrates the different distributions of census tracts by the
relative error level produced by the Bureau’s selected design for Census
2000 (the direct sampling option for nonresponse follow-up together with
ICM) and a census design that does not include sampling for nonresponse
follow-up and ICM. The patterns are similar when the results for the
90-percent truncation and time-truncation designs are graphed. Again, the
distributions are compressed toward the average error level. However,
since the averages for those designs are slightly higher than for direct
sampling, the trade-off compared to a conventional design is more
pronounced.




Figure I.2: Distribution of Census Tracts by Error Level Shows Trade-Off Between Direct Sampling and No Sampling
Designs

[Figure I.2 is a chart plotting the percentage of census tracts (vertical axis, 0 to 100) against the relative
error level in percent (horizontal axis, in bins from < 0.5 to > 5.0). It shows four series: the no sampling
design, the direct sampling design, no sampling (cumulative), and direct sampling (cumulative).]

Note 1: Data presented reflect simulation results for 60,128 census tracts.

Note 2: Relative error for the direct sampling design is the combined sampling error from
sampling for nonresponse follow-up and ICM in simulations of the 2000 Census. It does not
include possible errors from other sources, such as any bias in the statistical models used to
produce the tract-level estimates.

Note 3: Relative error for the no sampling design (i.e., without using sampling for nonresponse
follow-up or ICM) is the absolute value of the estimated undercount or overcount rate for tracts in
simulations of the 2000 Census. It does not include other possible errors.

Source: Bureau of the Census data.




                             The Bureau’s research over the past several years also showed that, while
                             both sampling for nonresponse follow-up and ICM procedures introduce
                             sampling error into census data, they do so in different proportions at
                             different geographic levels. Bureau officials found that, on average,
                             sampling for nonresponse follow-up would contribute most of the
                             sampling error or variability for smaller geographic areas, such as tracts
                             and blocks, while ICM would contribute almost all of the error at the level
                             of congressional districts and larger geographic areas. In general, relative
                             sampling error increases as the population of a geographic area decreases.
                             Simulation results also showed that sampling errors could be significantly
                             high for some very small areas or population groups, although the level of
                             nonsampling errors for such areas and groups can also be high when
                             conventional counting methods are used. This highlights the need for
                             additional research to identify the block-level effects of different census
                             designs. However, relative variability becomes smaller as block-level data
                             are aggregated to larger geographic areas, and the simulations indicated
                             that data for areas or groups of equivalent population size are likely to
                             have similar levels of sampling variability.
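
The statement that relative sampling error increases as the population of an area decreases can be illustrated with the standard formula for simple random sampling without replacement. This is a simplified sketch, not the Bureau's variance model: it assumes a single-stage simple random sample taken at a fixed rate, and the sampling rate and coefficient of variation used in the example are hypothetical.

```python
import math


def relative_standard_error(units: int, sampling_rate: float, cv: float) -> float:
    """Relative standard error of an estimated total under simple random
    sampling without replacement.

    units: number of units (e.g., housing units) in the area
    sampling_rate: fraction of units sampled, 0 < rate <= 1
    cv: coefficient of variation of the per-unit values (std dev / mean)
    """
    sample_size = units * sampling_rate
    # (1 - sampling_rate) is the finite population correction.
    return cv * math.sqrt((1.0 - sampling_rate) / sample_size)


# Hypothetical inputs: 1-in-3 sampling and a per-unit CV of 0.6.
for units in (300, 3_000, 30_000, 300_000):
    rse = relative_standard_error(units, 1 / 3, 0.6)
    print(f"{units:>7} units: relative standard error {100 * rse:.2f}%")
```

Under these assumptions the relative error falls with the square root of the number of units, so an area 100 times larger has about one-tenth the relative sampling error. This is also why areas or groups of equivalent size would show similar levels of sampling variability.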

                             In large part, the accuracy trade-off in the Bureau’s proposed approach
                             may entail accepting that sampling and statistical estimation would
                             introduce random sampling error in the data, especially for small
                             geographic areas (such as most blocks and census tracts with 1,000 or
                             fewer persons), but would reduce systematic bias in the results for all
                             larger geographic areas. Bias in this context occurs because certain
                             individuals and households are more likely to be missed by the census. To
                             the extent that these missed units have distinctive characteristics, the
                             resulting census data will be biased since these units are not included in
                             the census tabulations. Random sampling error refers to the fact that one
                             random sample will differ somewhat from another even if the two samples
                             are drawn from the population in the same random way. The magnitude of
                             random sampling errors can be estimated from an actual sample, and it is
                             possible to limit the magnitude of random sampling errors as part of the
                             sample design process.
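
The two properties of random sampling error described above, that two samples drawn the same way give somewhat different results and that the size of the error can be estimated from a single sample, can be seen in a small simulation. The sketch below is illustrative only: the population of housing unit sizes, the sample size, and the random seed are all hypothetical, and simple random sampling is assumed rather than the Bureau's actual sample design.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: persons per housing unit for 2,000 units.
population = [rng.choice([1, 1, 2, 2, 2, 3, 3, 4, 5, 6]) for _ in range(2000)]
true_total = sum(population)


def estimate_total(sample, n_units):
    """Estimate the population total from a simple random sample and
    return it with its estimated standard error."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    fpc = 1 - n / n_units  # finite population correction
    standard_error = n_units * sd * math.sqrt(fpc / n)
    return n_units * mean, standard_error


# Two samples drawn from the same population in the same random way.
sample_a = rng.sample(population, 400)
sample_b = rng.sample(population, 400)
est_a, se_a = estimate_total(sample_a, len(population))
est_b, se_b = estimate_total(sample_b, len(population))

print("true total:", true_total)
print("sample A estimate:", round(est_a), "+/-", round(se_a))
print("sample B estimate:", round(est_b), "+/-", round(se_b))
```

Raising the sample size in this sketch shrinks the estimated standard error, which is the sense in which the magnitude of random sampling error can be limited as part of the sample design.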


Bureau’s Cost Projections Show Cost Savings Are Possible Using Its Current Design for Census 2000

The Bureau has been conducting research to estimate the overall effects of
using alternative designs on the cost of the 2000 Census. To develop cost
projections for the full census cycle, the Bureau used two cost models,
one to generate estimates for Bureau headquarters activities and the
second, called the Year 2000 Cost Model, to generate expected costs in all
nonheadquarters components of the census cycle for fiscal years
1997-2001, using information from the 1990 Census and research for the
2000 Census. Among the underlying design principles of the Year 2000
Cost Model are that it is intended to replicate the whole census process
and model interrelationships among operations and activities. For
example, if the assumed response rate changes, a domino effect may result
for many activities and costs, in areas as diverse as recruiting and training
of census enumerators, printing, and postage. The Bureau contracted with
an outside consultant to provide validation of the Year 2000 Cost Model’s
logic. The firm, Booz-Allen & Hamilton, Inc., was responsible for writing
an executive summary explaining how the model operates and detailing
sources of model inputs.

Assuming the Bureau can obtain an overall mail response rate of about 67
percent, it estimated in June 1997 that the cost for the entire 2000 Census
cycle would range from about $4.0 billion to $4.8 billion (in 2000 dollars)
for different designs. Given the cost estimates provided to us by the
Bureau, its selected design, which would use the direct sampling option
for nonresponse follow-up, appears to offer the greatest potential to
control the overall cost of the 2000 Census. According to the Bureau’s
estimates, the difference between its current selected design and a plan
that would not use sampling for nonresponse or ICM to produce the final
counts would be about $700 million to $800 million.

Cost projections for all designs may increase depending on changes in a
number of cost-model assumptions, such as the pay and turnover rates
that affect staffing costs and the size of the nonresponse workload. The
Bureau was still examining and validating wage rate information from a
study by Westat, Inc., when we completed our work. That study
recommended higher wage rates to ensure an adequate labor supply in the
2000 Census, but the Bureau has not included those rates in its current
cost estimates. Costs are also likely to increase if the Bureau cannot fully
implement all of the proposed changes in operations for the 2000 Census
as planned. For example, the Bureau may need to rely more heavily on
labor-intensive clerical procedures than what is now reflected in the
cost-model estimates if new technology does not work as well as planned.
Costs would also escalate if the efforts planned to encourage mail
responses to the census do not increase response rates as much as the
Bureau anticipates, resulting in a larger nonresponse workload than the
Bureau has been projecting for the 2000 Census.
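
The link between the mail response rate and the nonresponse workload can be illustrated with a toy calculation. This is not the Year 2000 Cost Model, whose structure and inputs are not described in detail here. The function names, the 1 million housing units, the cost per case, and the lower response rate are hypothetical values chosen only to show the direction of the effect; only the 67 percent response rate comes from the text.

```python
def nonresponse_workload(housing_units: int, mail_response_rate: float) -> int:
    """Housing units that do not respond by mail and need follow-up."""
    return round(housing_units * (1 - mail_response_rate))


def follow_up_cost(workload: int, cost_per_case: float) -> float:
    """Cost of following up every case at a flat (hypothetical) cost per case."""
    return workload * cost_per_case


HOUSING_UNITS = 1_000_000   # hypothetical area
COST_PER_CASE = 25.0        # hypothetical dollars per follow-up case

planned = nonresponse_workload(HOUSING_UNITS, 0.67)    # rate assumed in the text
shortfall = nonresponse_workload(HOUSING_UNITS, 0.62)  # hypothetical lower rate

print(planned, follow_up_cost(planned, COST_PER_CASE))
print(shortfall, follow_up_cost(shortfall, COST_PER_CASE))
```

In this toy case a drop of 5 percentage points in the response rate enlarges the follow-up workload by about 15 percent. In a full cost model the larger workload would in turn affect recruiting, training, and other activities.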




Additional Research, Testing, and Clarification of Objectives Remain to Be Completed Before the 2000 Census

There is still uncertainty surrounding the potential results that would be
produced by alternative census designs and methods and questions that
need to be answered. In the relatively short time available before the next
census, the Bureau has identified additional research and testing that it
needs to complete in order to refine its plans, as well as to address
broader concerns expressed by some Members of Congress and other
observers about the Bureau’s proposal to expand the use of statistical
methods. The remaining work to prepare for Census 2000 also includes
reaching decisions on some technical details of the proposed methods that
can affect the final results.

                             Bureau officials have been developing a prioritized list of research topics
                             on sampling and estimation methods in the census that need further work.
                             For example, the design and selection of the ICM sample (i.e., identifying
                             the number and types of blocks that would be surveyed in areas across the
                             country) would need to be determined on the basis of the desired level of
                             precision in estimates for various levels of geography and population
                             groups. The Bureau also plans to continue research to refine the overall
                             estimation methods and procedural details for producing a census data
                             file. One of the Bureau’s goals for this research is to develop a way to take
                             the results from the proposed estimation methods and place persons down
                             to the level of individual housing units in each block. Another key research
                             topic is determining when and how to produce direct and indirect
                             population estimates. Evaluations of the 1995 Census Test also indicated
                             the need for more development of the procedures and software used for
                             computerized matching of addresses and individuals listed by the ICM and
                             the other census operations. These are only some of the remaining topics
                             for investigation that Bureau researchers identified.

                             Issues that have been raised in Congress and by observers of the census
                             planning process, including members of the statistical community,
                             concerning the expanded use of statistical methods in the 2000 Census
                             also pose important questions for the Bureau’s research efforts. These
                             issues generally can be grouped into two areas. The first area involves
                             questions about the technical soundness of the proposed methods, such as
                             whether the underlying assumptions are valid, the proposed methods are
                             statistically robust (i.e., that the results produced are not overly sensitive
                             to variations in reasonable assumptions and alternatives to the production
                             procedures), and the effects and consequences of using and combining
                             these methods are well understood. For example, one important question
                             in weighing the effects of alternative designs is the extent to which the
                             Bureau’s estimates for the precision of population estimates produced by



various methods accurately account for all sources of error. The second
set of issues involves questions about the operational feasibility of the
proposed procedures. Because the sample surveys that the Bureau intends
to employ in 2000 are large and complex, some observers are concerned
about whether all components of the Bureau’s plan can be implemented
effectively and with limited errors in a decennial census environment. One
reason for such concerns is that, several months after the July 1991 census
adjustment decision, the Bureau discovered a computer coding error in
the 1990 PES estimation procedures. Correcting for that error, together
with other subsequent modifications and edits, lowered the PES estimates
of the 1990 net undercount by about half a percentage point, from around
2.1 percent to about 1.6 percent. The Bureau should be better able to
address both sets of issues as it completes more research in the priority
areas it has identified and undertakes evaluations of how the proposed
statistical methods performed in the 1996 Community Census and in the
Census 2000 Dress Rehearsal in 1998.

The work that remains to be done by the Bureau is important because
decisions about the details of its methods and procedures can affect the
census results. Some of these technical details involve policy decisions,
such as defining what sample allocation is equitable or what level of
precision the Bureau’s methods should attempt to achieve for different
levels of census geography. One of the most difficult challenges regarding
the design of census-taking procedures is that the formula for
apportioning seats in the House of Representatives is mathematically very
sensitive to small changes in the number of people in states that receive
the last few seats. Variations and errors in conventional census data
collection can also affect apportionment; this is not a problem unique to
the use of statistical methods. With the current apportionment formula
and size of the House of Representatives, a perfect count or
apportionment is not likely using any census design, as demonstrated in
some of the Bureau’s research. The question is which design will produce
better estimates of the population.
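
The sensitivity described above can be illustrated with a minimal sketch of the method of equal proportions, the formula set by statute for apportioning House seats. The state names and populations are hypothetical and were chosen only to make the contest for the last seat close; nothing here is taken from Bureau data or from the Bureau's own research.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Apportion seats by the method of equal proportions.

    Every state first receives one seat. Each remaining seat goes to the
    state with the highest priority value pop / sqrt(n * (n + 1)), where n
    is the number of seats that state already holds.
    """
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    result = {state: 1 for state in populations}
    # heapq is a min-heap, so priority values are stored negated.
    heap = [(-pop / sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        result[state] += 1
        n = result[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return result


# Hypothetical populations, for illustration only.
base = {"A": 1_000_000, "B": 2_000_000, "C": 2_590_000}
shifted = dict(base, C=2_580_000)  # state C counted about 0.4 percent lower

print(apportion(base, 10))
print(apportion(shifted, 10))
```

In this made-up case, lowering one state's count by about 0.4 percent moves the tenth seat from state C to state B, even though no other state's count changed.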

When Secretary of Commerce Robert Mosbacher made his decision not to
adjust the 1990 Census using the results of the PES, he recognized this
dilemma and noted that some sensitivity should be expected. He pointed
out that no production of the complexity of the census could be
completely prespecified and that technical decisions were made in the
course of the estimation procedure following the 1990 Census. However,
in commenting on the precedent for future censuses, the Secretary also
noted that there are different implications to the many decisions that are



made during the course of the census process (when “the decision maker
does not know the import of his decision”) and the decisions made when
the results of different choices can be known. The House Committee on
Government Reform and Oversight raised similar concerns about the
potential subjectivity of sampling and statistical estimation methods in its
report released on September 24, 1996. To mitigate such concerns about
the subjectivity of its proposed statistical methods in the 2000 Census, the
Bureau expects to subject its plans and procedures for implementing the
methods to the review and scrutiny of professional experts, advisory
committees, and other stakeholders. According to the Bureau, once the
detailed procedures for the statistical methods have been developed by
the Bureau and accepted by the reviewers, these procedures will be
“frozen” to ensure that there is no introduction of subjectivity into the
results for Census 2000.




Appendix II

Bureau of the Census Summary Information
on Design Alternatives for the 2000 Census

               The Bureau’s summary data and information on alternative designs for
               conducting the 2000 Census, as of June 1997, are presented in this
               appendix. Figure II.1 shows the projected error rates from the Bureau’s
               simulations for selected design alternatives, together with cost estimates
               for those designs and other explanatory information in the accompanying
               footnotes.

               Please note that, while we chose to present data in appendix I from an
               “untrimmed” data set of simulation results for all 60,128 census tracts, the
               Bureau used the “trimmed” set of results for 56,022 tracts in its chart to
               reduce the potential for unusual tracts to affect the results. The Bureau’s
               use of trimming, along with other details on the methods and assumptions
               used to produce the chart data, is discussed in the footnotes. In general,
               close attention to the Bureau’s footnotes is important for understanding
               the information presented in the chart and the limitations of the Bureau’s
               data.

               We made some minor formatting changes needed to reproduce the
               Bureau’s chart in our report and added a figure title and source.
               Otherwise, the following material in this appendix presents the Bureau’s
               original text, data, and notes.




Table II.1: Bureau of the Census Summary Data on Projected Costs and Accuracy of Selected Census 2000 Alternative
Methodologies
                                                            NATIONAL                        STATES (excluding DC)

THE CENSUS BUREAU’S PLAN                  Cost in            Error, by sourceb                         Error, by sourceb
AND ALTERNATIVE                              2000      Misses/                                  Misses/
METHODOLOGIES FOR                         dollars       Double               Combined            Double                     Combined
CONDUCTING CENSUS 2000                 (billions)a     Countsd Sampling         Errore          Countsd Sampling               Errore
THE PLAN TO ASSURE MEETING IMPROVED ACCURACY AND REDUCED COST GOALS OF CENSUS 2000
(Improved forms/multiple mail contacts/paid advertising strategy yielding about 67 percent mail response rate)
Refined Plan:                                $4.0            *       0.1%               0.1%            *         0.5%g         0.5%g
- Direct Sampling for NRFU
- Census Tract Response Control                                                                               (0.2% to        (0.2% to
- Quality Check                                                                                                    0.5%)g       0.5%)g
ALTERNATIVES THAT DO NOT MEET IMPROVED ACCURACY AND/OR REDUCED COST GOALS OF CENSUS 2000
(Improved forms/multiple mail contacts/paid advertising strategy yielding about 67 percent mail response rate)
Conduct Census 2000 Using                    $4.7          1.9%     N/A                 1.9%d        1.9%         N/A           1.9%d
Improved Proceduresj                           to
Except there is:                             $4.8k,l                                            (0.4% to                      (0.4% to
- Full NRFU (no sampling)                                                                            3.2%)i                     3.2%)i
- No Quality Checkk
- An Evaluation Study +




Conduct Census 2000 Using                    $4.4l           *       0.1%               0.1%            *         0.5%g         0.5%g
Improved Procedures
Except there is:                                                                                              (0.2% to        (0.2% to
- Full NRFU (no sampling)                                                                                          0.5%)g       0.5%)g
(Still includes the Quality Check)

SCENARIO SHOWN FOR REFERENCE AND COMPARISON ONLY; THIS IS NOT AN ALTERNATIVE FOR CENSUS 2000m
(1990 forms/single mail contact/pro bono advertising strategy yielding about 55 percent mail response rate)

Conduct Census 2000 Using 1990               $4.8k,n       1.9%     N/A                 1.9%d        1.9%         N/A           1.9%d
Proceduresj
- Full NRFU (no sampling)                                                                       (0.4% to                      (0.4% to
- No Quality Checkk                                                                                  3.2%)i                     3.2%)i
- PES




     CONGRESSIONAL DISTRICTS                                 CENSUS TRACTS
             Error, by sourceb                       Error (trimmed), by sourceb,c
Misses/                                           Misses/
 Double                           Combined         Double                         Combined
Countsd        Samplingf             Errore       Countsd        Sampling            Errore         Comments



         *           0.6%h               0.6%h           *             1.1%h             1.1%h      - Low combined error for small
                                                                                                    geographic areas
                (0.3% to            (0.3% to                      (0.6% to          (0.6% to
                     2.3%)i              2.3%)i                        2.4%)c,i          2.4%)c,i



      1.9%          N/A                  1.9%d        1.9%            N/A                1.9%d      - Much higher cost than the
                                                                                                    Refined Plan
(–1.2% to                          (–1.2% to    (–1.2% to                          (–1.2% to        - Higher combined error for all
      7.0%)i                             7.0%)i       6.2%)c,i                           6.2%)c,i   geographic areas
                                                                                                    - A PES is used to evaluate
                                                                                                    census quality; results
                                                                                                    available after delivery of
                                                                                                    apportionment totals
                                                                                                    - Increased activities (publicity,
                                                                                                    followup visits at all
                                                                                                    vacant housing units, etc.) to
                                                                                                    attempt to achieve census
                                                                                                    coverage consistent with 1990
                                                                                                    levels
         *           0.6%h               0.6%h           *             0.8%h             0.8%h      - Higher cost than the Refined
                                                                                                    Plan
                (0.3% to            (0.3% to                      (0.4% to          (0.4% to        - Low combined error for small
                     2.3%)i              2.3%)i                        1.9%)c,i          1.9%)c,i   geographic areas
                                                                                                    - Increased risk of management
                                                                                                    failure




      1.9%          N/A                  1.9%d        1.9%d           N/A                1.9%d      - Much higher cost than the
                                                                                                    Refined Plan
(–1.2% to                          (–1.2% to    (–1.2% to                          (–1.2% to        - Much higher combined error for
      7.0%)i                             7.0%)i       6.2%)c,i                           6.2%)c,i   all geographic areas
                                                                                                    - A PES is used to evaluate
                                                                                                    census quality; results
                                                                                                    available after delivery of
                                                                                                    apportionment totals
                                                              NATIONAL                          STATES (excluding DC)

THE CENSUS BUREAU’S PLAN                Cost in           Error, by sourceb                        Error, by sourceb
AND ALTERNATIVE                            2000    Misses/                                   Misses/
METHODOLOGIES FOR                       dollars     Double                 Combined           Double                   Combined
CONDUCTING CENSUS 2000               (billions)a   Countsd Sampling           Errore         Countsd Sampling             Errore
DISCARDED ALTERNATIVES THAT MEET IMPROVED ACCURACY AND REDUCED COST GOALS OF CENSUS 2000
(Improved forms/multiple mail contacts/paid advertising strategy yielding about 67 percent mail response rate)
The Original Plan: Truncate at 90%         $4.2o          *        0.1%               0.1%         *       0.5%g           0.5%g
- Census Tract Response Control
- Sampling for NRFU                                                                                    (0.2% to          (0.2% to
- Quality Check                                                                                             0.5%)g         0.5%)g
Implementation Alternative 1: Time         $4.6o          *        0.1%               0.1%         *       0.5%g           0.5%g
Truncation
- Census Tract Response Control                                                                        (0.2% to          (0.2% to
- Sampling for NRFU                                                                                         0.5%)g         0.5%)g
- Quality Check




   CONGRESSIONAL DISTRICTS                                 CENSUS TRACTS
           Error, by sourceb                       Error (trimmed), by sourceb,c
Misses/                                        Misses/
 Double                        Combined         Double                            Combined
Countsd     Samplingf             Errore       Countsd       Sampling                Errore          Comments



       *           0.6%h              0.6%h            *             1.5%h                 1.5%h

              (0.3% to           (0.3% to                      (0.8% to              (0.8% to
                   2.3%)i             2.3%)i                        2.9%)c,i              2.9%)c,i
       *           0.6%h              0.6%h            *             1.3%h                 1.3%h

              (0.3% to           (0.3% to                      (0.7% to              (0.7% to
                   2.3%)i             2.3%)i                        2.7%)c,i              2.7%)c,i

                                       Legend:
                                       *               -Too small to measure with reasonable cost and operations
                                       NRFU            -Nonresponse Followup
                                       Quality Check   -Also known as the Integrated Coverage Measurement Survey or ICM
                                       N/A             -Not applicable; in the absence of any sampling process, there is no sampling error
                                       +               -The evaluation study would yield coverage measures similar to the 1990 PES;
                                                        results not available until after delivery of the apportionment totals
                                       PES             -Post Enumeration Survey; results not available until after delivery of the
                                                        apportionment totals
                                       a
                                        The dollar figures shown do not include the higher wage rates recommended by Westat, Inc. to
                                       ensure an adequate labor supply for Census 2000. The results of the Westat study are still being
                                       evaluated by the Census Bureau for their effect on the cost of the Refined Plan and each
                                       alternative to the plan for Census 2000. For the Refined Plan and Alternatives 5 and 6 that use
                                       sampling to reduce the workload for nonresponse followup, the effect is likely to be less than
                                       $100 million. For the alternatives that include full (rather than sample) nonresponse followup
                                       operations, the nearly 60 percent (11.9 million housing unit) increase in workload, multiplied by
                                       the recommended wage rates, could add several hundred million dollars to the estimated cost
                                       figure shown.
                                       b
                                        All error figures were derived using simulations of 1990 census estimates of undercounts and
                                       overcounts for census tracts. To account for expected growth in the population of the United
                                       States through the year 2000, the 1990 census tract population totals by race and Hispanic origin
                                       were projected using the factors shown on Attachment 1. The projection factors were derived
                                       from a widely used process known as Demographic Analysis. The simulations assume that the
                                       percentage undercounts (and overcounts) measured for each group in the 1990 Post
                                       Enumeration Survey also would apply in Census 2000. To determine the amount of undercount or
                                       overcount for each census tract, the projected population totals for each were computed with and
                                       without the results of the 1990 PES for each region and for the various segments of the population
                                       within it.

                                       The totals for the specific census tracts in each geographic entity were summed to derive the
                                       error rates for the more populous geographic levels shown on this chart, such as congressional
                                       districts, states, and the Nation.
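
The summing step described above might look like the following minimal sketch. The dictionary keys, the tract figures, and the choice of the adjusted total as the denominator are assumptions made for illustration; the Bureau's actual simulation is more detailed than this.

```python
from collections import defaultdict


def net_undercount_rates(tracts):
    """Sum tract totals by state and return each state's net undercount rate.

    Each tract is a dict with hypothetical keys:
      "state"    - geographic entity the tract belongs to
      "counted"  - projected population without the PES results
      "adjusted" - projected population with the PES results
    The rate is (adjusted - counted) / adjusted, so a negative value
    indicates an overcount.
    """
    counted = defaultdict(float)
    adjusted = defaultdict(float)
    for tract in tracts:
        counted[tract["state"]] += tract["counted"]
        adjusted[tract["state"]] += tract["adjusted"]
    return {
        state: (adjusted[state] - counted[state]) / adjusted[state]
        for state in adjusted
    }


# Made-up tract figures, for illustration only.
tracts = [
    {"state": "X", "counted": 3_900, "adjusted": 4_000},
    {"state": "X", "counted": 4_080, "adjusted": 4_000},
    {"state": "Y", "counted": 1_960, "adjusted": 2_000},
]
rates = net_undercount_rates(tracts)
print(rates)
```

In the made-up data, one tract in state X is undercounted and another is overcounted, so the errors partly offset when the tracts are summed.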




c
 The average census tract has a resident population of about 4,000 people; however, some
census tracts have very few people and some are far more populous because of specific local
circumstances and changes in settlement pattern since the census tracts initially were
established. To evaluate the effect of sampling on all except the most unusual census tracts, the
error distributions shown were “trimmed” (a widely used practice in analyzing large data sets) by
removing values for census tracts that contain only group quarters population, census tracts that
contain 10 or fewer people, the least populous 3 percent of all remaining census tracts, and the
most populous 3 percent of all remaining census tracts. Trimming was done separately for each
alternative shown.
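
The trimming rules in this footnote can be sketched as follows. The dictionary keys are hypothetical, and two details are assumptions because the footnote does not state them: both 3 percent cuts are taken from the same set of remaining tracts, and the cut size is rounded down.

```python
def trim_tracts(tracts, percent=3):
    """Apply the trimming rules described in the footnote.

    Each tract is a dict with hypothetical keys "population" and
    "group_quarters_population". Tracts are dropped if they contain only
    group quarters population or 10 or fewer people; then the least and
    most populous `percent` of the remaining tracts are removed.
    """
    kept = [
        t for t in tracts
        if t["population"] > 10
        and t["group_quarters_population"] < t["population"]
    ]
    kept.sort(key=lambda t: t["population"])
    # Assumption: the number of tracts cut from each end is rounded down.
    cut = len(kept) * percent // 100
    return kept[cut:len(kept) - cut]


# Made-up tracts: 100 ordinary tracts, one tiny tract, one group-quarters-only tract.
sample = [{"population": p, "group_quarters_population": 0} for p in range(11, 111)]
sample.append({"population": 5, "group_quarters_population": 0})
sample.append({"population": 500, "group_quarters_population": 500})
trimmed = trim_tracts(sample)
print(len(trimmed))
```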
d
 The single figure shown in the “Misses/Double Counts” column is the estimated net undercount
rate. In 1990, this rate was 1.6 percent, meaning that the 1990 census population total for the
United States failed to include more than 4 million people. The estimated net undercount rate in
2000 is projected to be 1.9 percent if there is no Quality Check operation. This estimate is based
on growth rates in the populations most difficult to count; this means Census 2000 likely will fail to
include more than 5.2 million people.

The net undercount figure fails to convey the magnitude of enumeration errors actually made
during a decennial census. Several studies looked at the component measures of “gross”
undercount — total people missed and total erroneous enumerations. There is no generally
accepted definition of gross undercount, and different studies had widely divergent totals for
people missed and people included erroneously. Regardless of the components included in the
different measures, the net undercount — people missed minus people included erroneously —
was about 4 million people.
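
The relationship between the gross components and the net rate can be shown with a short sketch. The gross figures for people missed and people included erroneously are hypothetical, chosen only so that the net is 4 million. The census count is the roughly 248.7 million residents reported for 1990, which this footnote does not state. Taking the census count plus the net undercount as the true population is also an assumption.

```python
def net_undercount_rate(missed, erroneous, census_count):
    """Return the net undercount as a share of the estimated true population.

    net = people missed minus people included erroneously. The true
    population is assumed to be the census count plus that net figure.
    """
    net = missed - erroneous
    return net / (census_count + net)


# Hypothetical gross components; only their difference (4 million) is from the text.
rate_1990 = net_undercount_rate(
    missed=10_000_000, erroneous=6_000_000, census_count=248_700_000
)
print(f"{rate_1990:.1%}")
```

Any pair of gross figures that differs by 4 million gives the same net rate, which is why the net figure alone does not convey how many enumeration errors were made.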
e
 The “Combined Error” figures for congressional districts and census tracts do not include error
due to modeling; model error would not apply to either of the nonsampling alternatives.
f
 One might expect the “Sampling Error” figures for a congressional district always to be larger
than the sampling error figures for a state. However, some congressional districts are more
populous than the least populous states.
g
 To ensure highly accurate population totals for each state (because these totals are used to
apportion the 435 seats in the U.S. House of Representatives among the 50 states), the Census
Bureau has designed the sampling processes to yield a “coefficient of variation” of 0.5
percent. To control the “standard error” (the expected amount of variation in the number of
people that could be included in each state’s total), the designed coefficient of variation will
actually be smaller than 0.5 percent in the four states that had a 1990 census population of
12 million or more (Texas, New York, Florida, and California); the figures in parentheses show the
range in variation.
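
A coefficient of variation is the standard error divided by the estimate, so a fixed 0.5 percent implies a standard error that grows with a state's population. The sketch below shows one way a cap on the standard error would lower the coefficient of variation for the largest states. The 60,000 cap (0.5 percent of 12 million) and the example populations are assumptions for illustration, not figures stated by the Bureau.

```python
def standard_error(population, cv):
    """Standard error implied by a coefficient of variation (cv = se / estimate)."""
    return population * cv


def cv_with_capped_error(population, target_cv=0.005, max_standard_error=60_000):
    """Return the design cv, lowered where needed to cap the standard error.

    The default cap of 60,000 people is a hypothetical value equal to
    0.5 percent of 12 million.
    """
    return min(target_cv, max_standard_error / population)


# Hypothetical state populations, for illustration only.
print(cv_with_capped_error(5_000_000))
print(cv_with_capped_error(30_000_000))
```

Under these assumptions, a state of 5 million keeps the 0.5 percent target, while a state of 30 million is held to 0.2 percent.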
h
 The single figure shown is the “average error” (the numeric mean) of the estimated error for
each individual entity (all congressional districts or all census tracts) in this geographic level.
i
 The figures in parentheses show the “range of error” (lowest and highest situations) for all entities
in this geographic level. Negative figures identify overcount estimates; overcounts happen for
several reasons, including erroneous inclusion of some people on completed census forms. All
other figures identify undercount estimates; undercounts happen when residents of an area are
not included on any census form.
j
This scenario restores several coverage improvement activities as a substitute for the “Quality
Check” planned for Census 2000. These activities have been viewed by the National Academy of
Sciences, the Inspector General for the Department of Commerce, and many others as only
marginally effective. However, they are the only alternatives expected to deal with some aspects
of the total undercount and the differential participation rates among some segments of the
population.




k
 An obvious question is, “Why should the Congress pay for the improvements (other than
sampling) offered by the plan for Census 2000 when the cost is nearly as high as repeating 1990
census methods and the quality appears no better?” The answer starts with understanding that
the census-taking environment in 2000 will be significantly more difficult than it was in 1990:
Mistrust of, and cynicism about, the government and its programs have increased; people are
increasingly resistant to intrusions on their time; there is increasing concern about privacy; the
number of people working more than one job has increased, along with the number of
multiple-worker families, so fewer people are home when an enumerator visits; and so forth.

Many major improvements planned for Census 2000 — a better address list, improved through
partnerships with the U.S. Postal Service and local and tribal governments, easy to read and
complete census forms, fewer questions to answer, multiple opportunities to respond, improved
publicity, and improved procedures for dealing with those who have no “usual” residence — are
needed to keep initial response rates about even with those in 1990. Studies from past censuses
have shown clearly that responses received by mail are of better quality than those gathered by
temporary field staff. So high initial response improves census quality and reduces census cost,
freeing resources to deal more effectively with the 33 percent of households likely not to respond
initially.

Those households that do not respond initially are likely to have attitudes that make finding and
including them more difficult. Thus, in the absence of the “Quality Check” operation planned for
Census 2000, the Census Bureau believes it would need to implement several additional
procedures in an attempt to ensure a complete and accurate census. These activities would
require the expenditure of additional funds, even though the National Academy of Sciences, the
Inspector General for the Department of Commerce, and many others view these activities as
only marginally effective. The Census Bureau would include the following activities because they
are the only alternatives to the Quality Check expected to deal with some aspects of the total
undercount and the differential participation rates among various segments of the population.

•Conduct Census Bureau followup visits at all vacant housing units — not just a sample of them
— which increases requirements for temporary field staff. Added cost: $200 million

•Conduct field followup visits for all incomplete questionnaires (i.e., primarily data omissions),
which increases requirements for temporary field staff. Added cost: $150 million

•Expand partnership activities with state, local, and tribal governments and with various
community and business groups, which increases the requirements for temporary field staff.
Added cost: $25 million to $50 million

•Further expand marketing and publicity activities through increased media placements and
Census Bureau outreach activities. Added cost: $50 million to $100 million

•Deploy special activities and tactics (e.g., team enumeration and blanket census tract
coverage) not otherwise included in the plan for Census 2000 to assure reaching those segments
of the population traditionally difficult to reach and typically among the most undercounted.
Added cost: $25 million to $50 million

•Further expand enumerator supervision to assure greater quality in enumeration for
nonresponse. Added cost: $25 million to $50 million

Even with these added activities and their associated costs, there is no assurance that the
Census Bureau would be able to hold the undercount rate to the levels achieved in 1990: We
estimate the undercount would go up to approximately 1.9 percent of the total population. And, it
would cost more than alternative approaches from which we would get better results!
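
The itemized added costs above can be totaled with a short sketch. The labels are paraphrases of the bullets. The dollar figures are the ones listed in this footnote, in millions of dollars.

```python
# (low, high) added cost in millions of dollars, as itemized in the footnote.
added_costs = {
    "followup visits at all vacant housing units": (200, 200),
    "followup visits for incomplete questionnaires": (150, 150),
    "expanded partnership activities": (25, 50),
    "expanded marketing and publicity": (50, 100),
    "special enumeration activities and tactics": (25, 50),
    "expanded enumerator supervision": (25, 50),
}

total_low = sum(low for low, _ in added_costs.values())
total_high = sum(high for _, high in added_costs.values())
print(f"${total_low} million to ${total_high} million")
```

Summed this way, the items come to $475 million to $600 million. The upper end matches the $500-600 million cited in the Bureau's next footnote, and the lower end is slightly below it, perhaps because of rounding.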




l
 The Quality Check operation is estimated to cost $325 million. An obvious question is, “Why, if
the Quality Check costs $325 million and Alternative 2 does not include the Quality Check, does
Alternative 2 cost $300-400 million more than Alternative 3?” The answer is found in
understanding that having a response from every household (every address), plus responses
from all people living in group quarters (institutional facilities) and all other locations where people
without a usual address reside on Census Day (including migrant workers and the people often
referred to as “the homeless”) does not guarantee a complete enumeration of the population.
Even when the Census Bureau accounted for every known address in the 1990 census, the
census failed to include more than 4 million people. Only the Quality Check will find all the people
missed by normal decennial census procedures.

Thus, in the absence of the “Quality Check” operation planned for Census 2000, the Census
Bureau believes it would need to implement several additional procedures in an attempt to ensure
a complete and accurate census. These activities would require the expenditure of an additional
$500-600 million, the details of which are described in footnote i. The Census Bureau would
include these additional activities because they are the only alternatives to the Quality Check
expected to deal with some aspects of the total undercount and the differential participation rates
among various segments of the population.

Adding $500-600 million in new activities, while subtracting $325 million for the Quality Check,
adds a net of about $175-275 million to the cost of Alternative 2. In addition, performing the
Evaluation Study that will be needed to assess the completeness of Census 2000 in the absence
of the Quality Check, adds an additional $125 million to the cost of Alternative 2, for a net
increase of $300-400 million when compared with Alternative 3.
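
The arithmetic in this footnote can be checked with a short calculation. The following is a minimal sketch in Python; the variable and function names are illustrative assumptions, and the dollar figures (in millions) are those stated in the footnote.

```python
# Dollar figures are in millions and come from the footnote text.
# All names below are hypothetical, chosen only for illustration.
ADDED_ACTIVITIES = (500, 600)   # low and high estimates for the added procedures
QUALITY_CHECK = 325             # cost removed when the Quality Check is dropped
EVALUATION_STUDY = 125          # cost of the Evaluation Study needed in its place


def net_increase(added, quality_check, evaluation_study):
    """Return (net of added activities, total net increase) as (low, high) pairs."""
    net_activities = tuple(cost - quality_check for cost in added)
    total = tuple(cost + evaluation_study for cost in net_activities)
    return net_activities, total


net_activities, total = net_increase(ADDED_ACTIVITIES, QUALITY_CHECK, EVALUATION_STUDY)
print("Net of added activities: $%d-%d million" % net_activities)
print("Net increase over Alternative 3: $%d-%d million" % total)
```

Run as written, the sketch reproduces the two ranges given in the footnote: $175-275 million for the added activities net of the Quality Check, and $300-400 million once the Evaluation Study is included.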
m. This scenario provides a reference point for comparison with earlier decennial census methods;
it is not an alternative the Census Bureau would ever propose to use again. It assumes:

•A 1990 census-taking environment in 2000;

•No partnerships with the U.S. Postal Service or local and tribal governments to improve the
address list and the maps used to guide temporary Census Bureau employees assigned to visit
nonresponding households;

•The continued use of 1990-style census forms designed to make processing easy for
computers, not respondents;

•Only one delivery of census forms to each address with no early notices, no reminder post
cards, and no replacement forms;

•No opportunity for responding via telephone;

•No opportunity to pick up blank census forms at convenient locations if no form was delivered to
the address of the person wishing to respond;

•No use of U.S. Postal Service knowledge about vacant housing units;

•No automated access to data tabulations for customer-specified geographic areas and
population groupings; and,

•No “Quality Check” to significantly reduce or eliminate the undercount in each state, each
congressional district, and most local and tribal governments, or to significantly reduce or
eliminate the differential undercounts among various segments of the population.




                                       n. The $4.8 billion figure was estimated in 1992 in response to a question from the General
                                       Accounting Office. Although the Census Bureau has learned many things in the intervening five
                                       years, this figure has been retained as a benchmark for purposes of comparison. It does not
                                       include improvements planned for Census 2000 to counter the ongoing decline in the mail
                                       response rate, estimated to reach 55 percent without such improvements.

                                       If the Census Bureau found it necessary to make nonresponse followup visits at 45 percent of all
                                       households, instead of the 33 percent expected when using the improved procedures planned
                                       for Census 2000, the extra workload and cost would divert significant financial resources away
                                       from improved publicity and local/tribal partnership activities, as well as additional special
                                       procedures needed to deal with the most reluctant residents at nonresponding addresses.
                                       Repeating 1990 census procedures also would require more followup visits to apparently vacant
                                       housing units, and other procedures aimed at trying to include the traditionally hard-to-count.
                                       o. The cost figure shown is higher than originally estimated because of necessary refinements
                                       identified subsequent to excluding these options from consideration for Census 2000.

                                       Source: Bureau of the Census summary chart, footnotes, and attachment (June 18, 1997).




Attachment 1
                                                                              2000/1990 RATIOS
                                             Projected                            Not Hispanic                                Hispanic
                             1990 Census       2000                                               Amer.            API +     (can be any
State Codes                      TOTAL              Total         White           Black           Indian         “Other”           race)
               UNITED STATES 248,709,873         1.10423        1.04748         1.14896         1.14548          1.46646          1.40312
63   01        Alabama         4,040,587         1.10147        1.09159         1.11352         1.06300          1.48703          1.47367
94   02        Alaska            550,043         1.18771        1.13427         1.22785         1.07502          2.32220          1.67618
86   04        Arizona         3,665,228         1.30893        1.23871         1.43201         1.22171          1.64231          1.55515
71   05        Arkansas        2,350,725         1.11939        1.11468         1.09301         1.22545          1.47510          1.78114
93   06        California     29,760,021         1.09278        0.91384         1.02155         0.92357          1.44806          1.38473
84   08        Colorado        3,294,394         1.26525        1.22898         1.39307         1.36705          1.60175          1.40029
16   09        Connecticut     3,287,116         0.99910        0.95210         1.12244         0.96790          1.43362          1.34820
51   10        Delaware          666,168         1.15220        1.10324         1.28455         1.07276          1.59955          1.60449
53   11        District of Columbia 606,900      0.86230        0.91740         0.79814         0.67971          1.19415          1.24699
59   12        Florida        12,937,926         1.17741        1.09811         1.26960         1.18241          1.54617          1.51884
58   13        Georgia         6,478,216         1.21558        1.15992         1.30216         1.21543          1.80126          1.74751
95   15        Hawaii          1,108,229         1.13461        1.04357         1.02558         1.15571          1.16397          1.32293
82   16        Idaho           1,006,749         1.33748        1.30453         1.90003         1.42914          1.55623          1.82060
33   17        Illinois       11,430,602         1.05426        1.00040         1.08302         0.99879          1.40220          1.40206
32   18        Indiana         5,544,159         1.09025        1.07505         1.15474         1.11493          1.46407          1.42174
42   19        Iowa            2,776,755         1.04432        1.02733         1.27244         1.19320          1.59977          1.62551
47   20        Kansas          2,477,574         1.07697        1.04678         1.18499         1.11482          1.46760          1.47784
61   21        Kentucky        3,685,296         1.08392        1.07860         1.09028         1.14045          1.47735          1.48194
72   22        Louisiana       4,219,973         1.04849        1.00561         1.11374         1.02292          1.37983          1.27807
11   23        Maine              1,227,928          1.02544         1.02216         0.91493         0.97796         1.36832           1.37853
52   24        Maryland           4,781,468          1.10314         1.01334         1.24195         1.10961         1.51274           1.72371
14   25        Massachusetts      6,016,425          1.03030         0.98139         1.20818         0.98018         1.45724           1.51768
34   26        Michigan           9,295,297          1.04127         1.01828         1.10506         1.04919         1.44585           1.28821
41   27        Minnesota          4,375,099          1.10393         1.06980         1.63299         1.25371         1.72112           1.75265
64   28        Mississippi        2,573,216          1.09425         1.08019         1.10752         1.04004         1.54860           1.42941
43   29        Missouri           5,117,073          1.08272         1.06675         1.13972         1.17665         1.43599           1.45827
81   30        Montana              799,065          1.18846         1.17243         1.54728         1.25065         1.66946           1.68137
46   31        Nebraska           1,578,385          1.08051         1.05432         1.23405         1.21358         1.68420           1.62682
88   32        Nevada             1,201,833          1.55704         1.44317         1.67091         1.42260         2.07420           2.21791
12   33        New Hampshire      1,109,252          1.10365         1.09617         1.06905         1.14838         1.47377           1.51390
22   34        New Jersey         7,730,188          1.05790         0.97179         1.12152         1.14476         1.66426           1.41281
85   35        New Mexico         1,515,069          1.22793         1.19314         1.23095         1.22370         1.33586           1.27163
21   36        New York          17,990,455          1.00866         0.93419         1.03862         1.03730         1.40773           1.26675
56   37        North Carolina     6,628,637          1.17328         1.15627         1.19086         1.16099         1.74674           1.56188
44   38        North Dakota         638,800          1.03583         1.01661         1.36511         1.25615         1.62593           1.61886
31   39        Ohio              10,847,115          1.04348         1.02408         1.13818         1.07222         1.41716           1.31574
73   40        Oklahoma           3,145,585          1.07214         1.04141         1.19289         1.10532         1.39829           1.43386
92   41        Oregon             2,842,321          1.19521         1.15897         1.32502         1.25145         1.58609           1.71517
23   42        Pennsylvania      11,881,643          1.02697         1.00368         1.10125         1.17127         1.48854           1.43937
15   44        Rhode Island       1,003,464          0.99416         0.94914         1.18525         1.16644         1.11675           1.65560
57   45        South Carolina     3,486,703          1.10650         1.09792         1.11148         1.03248         1.39562           1.41838
45   46        South Dakota         696,004          1.11648         1.09946         1.58438         1.20611         1.72261           1.68107
62   47        Tennessee          4,877,185          1.15992         1.14393         1.19369         1.29541         1.69500           1.76137
74   48        Texas             16,986,510          1.18443         1.09527         1.21758         1.14037         1.55113           1.35377
87   49        Utah               1,722,850          1.28102         1.24794         1.63259         1.46452         1.75503           1.61397
13   50        Vermont              562,758          1.09604         1.08778         1.48662         1.08601         1.68444           1.60148
54   51        Virginia           6,187,358          1.13086         1.07650         1.20923         1.08587         1.63047           1.67326
91   53        Washington         4,866,692          1.20377         1.15636         1.22703         1.25074         1.64470           1.67648
55   54        West Virginia      1,793,477          1.02649         1.02270         1.02458         1.01100         1.40708           1.46413
35   55        Wisconsin          4,891,769          1.08883         1.05972         1.31732         1.19069         1.78329           1.44430
83   56        Wyoming              453,588          1.15678         1.13747         1.28955         1.33115         1.65389           1.33373

                                              Source: Bureau of the Census summary chart, footnotes, and attachment (June 18, 1997).
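
The ratios in Attachment 1 can be applied to the 1990 counts to approximate projected 2000 totals. The sketch below assumes that the "Total" ratio is the projected 2000 population divided by the 1990 census total, which the column headings suggest but the attachment does not state outright. The function name and the selection of rows are illustrative only.

```python
# 1990 census totals and 2000/1990 "Total" ratios copied from Attachment 1.
# Assumption: ratio = projected 2000 population / 1990 census total.
ROWS = {
    "United States": (248_709_873, 1.10423),
    "District of Columbia": (606_900, 0.86230),
    "Nevada": (1_201_833, 1.55704),
}


def projected_2000(total_1990, ratio):
    """Approximate projected 2000 population, rounded to the nearest person."""
    return round(total_1990 * ratio)


projections = {area: projected_2000(*row) for area, row in ROWS.items()}
for area, people in projections.items():
    print(f"{area}: about {people:,} projected for 2000")
```

Under this reading, a ratio below 1 (as for the District of Columbia) implies a projected decline from the 1990 count, and a ratio above 1 implies projected growth.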




Appendix III

Comments From the Bureau of the Census




Appendix IV

Major Contributors to This Report


General Government Division
                        James H. Burow, Assistant Director
                        Victoria E. Miller, Evaluator-in-Charge
                        Timothy A. Bober, Senior Evaluator
                        Jacqueline E. Matthews, Senior Evaluator
                        Kiki Theodoropoulos, Senior Evaluator (Communications Analyst)
                        Thomas M. Beall, Technical Advisor
                        Thomas B. Jabine, Statistical Consultant

National Security and International Affairs Division
                        Arthur J. Kendall, Senior Mathematical Statistician

Office of General Counsel, Washington, D.C.
                        Alan N. Belkin, Assistant General Counsel
                        James M. Rebbe, Attorney-Advisor




Related GAO Products


              Addressing the Deficit: Budgetary Implications of Selected GAO Work for
              Fiscal Year 1998 (GAO/OCG-97-2, Mar. 14, 1997).

              High-Risk Series (GAO/HR-97-1 and 97-2, Feb. 1997).

              Addressing the Deficit: Updating the Budgetary Implications of Selected
              GAO Work (GAO/OCG-96-5, June 28, 1996).

              Decennial Census: Fundamental Design Decisions Merit Congressional
              Attention (GAO/T-GGD-96-37, Oct. 25, 1995).

              Addressing the Deficit: Budgetary Implications of Selected GAO Work for
              Fiscal Year 1996 (GAO/OCG-95-2, Mar. 15, 1995).

              Decennial Census: 1995 Test Census Presents Opportunities to Evaluate
              New Census-Taking Methods (GAO/T-GGD-94-136, Sept. 27, 1994).

              Decennial Census: Promising Proposals, Some Progress, but Challenges
              Remain (GAO/T-GGD-94-80, Jan. 26, 1994).

              Decennial Census: Test Design Proposals Are Promising, but Fundamental
              Reform Is Still at Risk (GAO/T-GGD-94-12, Oct. 7, 1993).

              Decennial Census: Fundamental Reform Jeopardized by Lack of Progress
              (GAO/T-GGD-93-6, Mar. 2, 1993).

              Transition Series: Commerce Issues (GAO/OCG-93-12TR, Dec. 1992).

              Decennial Census: 1990 Results Show Need for Fundamental Reform
              (GAO/GGD-92-94, June 9, 1992).

              1990 Census: Reported Net Undercount Obscured Magnitude of Error
              (GAO/GGD-91-113, Aug. 22, 1991).

              1990 Census Adjustment: Estimating Census Accuracy—A Complex Task
              (GAO/GGD-91-42, Mar. 11, 1991).

              Progress of the 1990 Decennial Census: Some Causes for Concern
              (GAO/T-GGD-90-44, May 21, 1990).




           Critical Issues for Census Adjustment: Completing Post Enumeration
           Survey on Time While Protecting Data Quality (GAO/T-GGD-90-15, Jan. 30,
           1990).




(410022)



