United States Government Accountability Office
GAO
Report to the Chairman, Subcommittee on Federal Financial Management, Government Information, Federal Services, and International Security, Committee on Homeland Security and Governmental Affairs, U.S. Senate

February 2012

FEDERAL STATISTICAL SYSTEM
Agencies Can Make Greater Use of Existing Data, but Continued Progress Is Needed on Access and Quality Issues

GAO-12-54

Highlights of GAO-12-54, a report to the Chairman, Subcommittee on Federal Financial Management, Government Information, Federal Services, and International Security, Committee on Homeland Security and Governmental Affairs, U.S. Senate

Why GAO Did This Study

As demand for more and better information increases, rising costs and other challenges require that the federal statistical system identify efficiencies. To explore opportunities to improve cost-effectiveness, GAO was asked to (1) review how the Office of Management and Budget (OMB) and agencies improve information collections, (2) evaluate opportunities and constraints for agencies to use administrative data (information collected as part of the administration of a program or held by private companies) with surveys, and (3) assess the benefits and constraints of surveys making greater use of the Census Bureau’s American Community Survey (ACS) data and resources. GAO focused on collections administered to households and individuals, analyzed statutory and agency documents, did five case studies of surveys, reviewed documentation of representative samples of active surveys, and interviewed agency officials and experts.

What GAO Found

The Office of Management and Budget (OMB), agencies, and interagency statistical committees have distinct roles in identifying opportunities to improve information collection efforts. OMB exercises several authorities that promote the system’s efficiency, including overseeing and approving agency information collections. The website Reginfo.gov provides the public with information, such as cost and burden, on collections that OMB reviews, though GAO’s review identified some discrepancies in selected items. OMB periodically issues guidance to agencies on complying with federal requirements for information collections, but this guidance generally does not prescribe specific actions to take. GAO’s analysis of agencies’ documentation of active surveys indicated that 77 percent included detailed descriptions of efforts to identify duplication, while those that did not tended to be for collections that are unlikely to duplicate existing information; and 75 percent reported actions beyond those required by statute to solicit external input. OMB, through enhanced guidance, could promote additional awareness of options agencies can take to identify duplication and solicit input. Interagency committees, which primarily draw members from the 13 agencies that have statistics as their primary focus, are particularly important in helping ensure collaboration. The committees have numerous projects underway aimed at addressing key challenges facing the statistical system. However, mechanisms for disseminating information about their work are not comprehensive or up-to-date. Though member agencies are the most-likely customers of the committees’ products, making information about committee work and priorities more accessible could benefit other agencies, academics, and the general public. It could also benefit committee members by providing a central repository for information.

Administrative data have greater potential to supplement rather than replace survey data. Agencies currently combine the two data sources in four key ways to cost-effectively increase efficiency and quality. Specifically, agencies use administrative data to: (1) link to survey data to create new data products; (2) supplement surveys’ sample frames; (3) compare to survey data to improve accuracy and design of surveys; and (4) combine with survey data to create, or model, estimates. However, expanding the use of administrative data faces key constraints related to the access and quality of the data. While agencies and committees are taking steps to address these constraints and facilitate the process through which agencies work together to share data, individual tools may not be sufficient. A more-comprehensive framework for use by all agencies involved in data-sharing decisions that includes key questions to consider when evaluating potential use of administrative data could make the decision process more consistent and transparent.

ACS, an ongoing monthly survey that provides information about the nation’s communities, offers agencies important opportunities to increase the efficiency and reduce the costs of their surveys, but its current design limits the extent to which agencies can utilize some of these opportunities. Uses that do not affect ACS design or the survey’s respondents, such as using ACS estimates to inform survey design or evaluate other surveys’ results, have widespread potential. However, more-intensive uses, such as adding content or supplemental surveys to the ACS, currently have limited potential.

What GAO Recommends

GAO recommends that OMB take several actions to improve the broader efficiency of the federal statistical system, including implementing additional quality-control procedures for selected website data, enhancing awareness of ways to meet information collection requirements, better disseminating information on interagency committees, and developing comprehensive guidance for agencies to use when considering data sharing. OMB generally agreed with all of GAO’s recommendations.

View GAO-12-54. For more information, contact Robert Goldenkoff at (202) 512-2757 or firstname.lastname@example.org or Ronald S. Fecso at (202) 512-7791 or email@example.com.

Contents

Letter
Background
OMB and Agencies Take a Number of Steps to Ensure Efficient Information Collections, Though Opportunities Exist for Refinements
Administrative Data Could Help Improve Federal Surveys, but Continued Progress Is Needed on Access and Quality Issues
Prospects for Enhanced Use of the ACS with Other Surveys Are Mixed
Conclusions
Recommendations for Executive Action
Agency Comments and Our Evaluation

Appendix I: Scope and Methodology
Appendix II: Description of Case-Study Surveys
Appendix III: Selected Statutes Related to Information Collection
Appendix IV: Printable Interactive Graphic
Appendix V: Comments from the Department of Commerce
Appendix VI: GAO Contacts and Staff Acknowledgments

Tables
Table 1: Overview of Interagency Statistical Committees
Table 2: Key Characteristics of the ACS
Table 3: Number of Collections, by Stratum
Table 4: Actions Taken to Address Constraints That Hamper Greater Use of Administrative Data

Page i GAO-12-54 Federal Statistical System

Figures
Figure 1: The Thirteen Principal Statistical Agencies and Their Parent Organizations
Figure 2: Most Information Collections from Households and Individuals Have Relatively Modest Costs
Figure 3: Actions Taken to Address Constraints That Hamper Greater Use of Administrative Data

Abbreviations
ACS: American Community Survey
BLS: Bureau of Labor Statistics
CE Surveys: Consumer Expenditure Surveys
CED: Consumer Expenditure Diary Survey
CEQ: Consumer Expenditure Quarterly Interview Survey
CIPSEA: Confidential Information Protection and Statistical Efficiency Act
ERS: Economic Research Service
FCSM: Federal Committee on Statistical Methodology
ICSP: Interagency Council on Statistical Policy
NCHS: National Center for Health Statistics
NCSES: National Center for Science and Engineering Statistics
NHANES: National Health and Nutrition Examination Survey
NHIS: National Health Interview Survey
NSCG: National Survey of College Graduates
OIRA: Office of Information and Regulatory Affairs
OMB: Office of Management and Budget
PRA: Paperwork Reduction Act
ROCIS: Regulatory Information Service Center and OIRA Consolidated Information System
SCOPE: Statistical Community of Practice and Engagement
SIPP: Survey of Income and Program Participation

This is a work of the U.S. government and is not subject to copyright protection in the United States. The published product may be reproduced and distributed in its entirety without further permission from GAO. However, because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately.

United States Government Accountability Office
Washington, DC 20548

February 24, 2012

The Honorable Thomas R. Carper
Chairman
Subcommittee on Federal Financial Management, Government Information, Federal Services, and International Security
Committee on Homeland Security and Governmental Affairs
United States Senate

Dear Mr. Chairman:

Information is a critical strategic asset, and all levels of government, as well as businesses and private citizens, depend on relevant, accurate, and timely social, demographic, financial, and other federally funded data-collection efforts to inform their planning and other decisions. Collectively, this information plays a vital role in measuring the health and well-being of the nation, informing private-sector investment, allocating federal funding, and measuring the outcomes of government programs. However, the federal statistical system, including (1) agencies that collect and analyze data, and (2) the Office of Management and Budget (OMB), which oversees the system, faces several challenges.
Key among them is that the demand for information is increasing, especially as organizations look for ways to operate more cost-effectively, while the cost of collecting data is growing and response rates to surveys (both government and private-sector) are declining, driven in part by concerns over privacy and confidentiality. In the face of these challenges, it will be important for federal statistical agencies to identify opportunities to increase their efficiency while maintaining or improving data quality, minimizing respondent burden, and respecting privacy and confidentiality concerns.

Greater use of administrative data, which include information collected as part of the execution of government programs as well as information held by private companies, has been proposed as one approach to enhance efficiency and quality.[1] Another potential approach is making greater use of the American Community Survey (ACS), a monthly survey that replaced the census long form and provides annual data on communities’ demographic, social, economic, and housing conditions.

[1] Examples of administrative data include Social Security Administration records, state unemployment records, medical records, and store loyalty-card data.

At your request, this report (1) reviews the ways in which OMB and agencies identify opportunities for improvement and increased efficiency; (2) evaluates opportunities and constraints for the statistical agencies to use administrative data in conjunction with selected surveys; and (3) assesses the benefits and constraints of selected surveys making greater use of ACS data and resources.

To achieve our objectives, we focused our review on statistical information collections administered to households and individuals, as opposed to businesses or other entities, and subject to the Paperwork Reduction Act (PRA), which requires OMB approval of certain federal data collections.[2] Specifically, we performed case studies of five federal surveys: the Consumer Expenditure Surveys, sponsored by the Bureau of Labor Statistics (BLS); the National Health and Nutrition Examination Survey and the National Health Interview Survey, both sponsored by the National Center for Health Statistics (NCHS); the National Survey of College Graduates, sponsored by the National Center for Science and Engineering Statistics (NCSES), part of the National Science Foundation; and the Survey of Income and Program Participation, sponsored by the Census Bureau. We selected these surveys based on several factors, such as their size and cost and whether they use or have the potential to use administrative data or ACS data. We focused our selection on large surveys, in terms of both cost and number of respondents, because potential cost savings and efficiency gains are likely greatest for them. Additionally, to address all three objectives, we examined related statutes and regulations, applicable OMB guidance, documentation of the ACS and our case-study surveys, papers and reports, and our own prior work.[3]

[2] The PRA is codified at 44 U.S.C. §§ 3501-3521.
[3] For examples of our prior work, see: GAO, Federal Information Collection: A Reexamination of the Portfolio of Major Federal Household Surveys Is Needed, GAO-07-62 (Washington, D.C.: Nov. 15, 2006); American Community Survey: Key Unresolved Issues, GAO-05-82 (Washington, D.C.: Oct. 8, 2004).

To gain an understanding of the information collections in our scope, we reviewed publicly available data from Reginfo.gov, a government website with information on agency requests for OMB approval of information collections.[4] We analyzed the subject matter of all of the collections in our scope and, for a representative sample of 106 surveys, analyzed agencies’ reported efforts to identify duplication and consult with persons outside of the agency. We interviewed experts on the federal statistical system and officials at OMB and the four agencies that administer the case-study surveys to learn about coordination among agencies, the efforts agencies take to identify improvements, and experts’ and officials’ perspectives on current and potential uses of administrative data and the ACS. We also discussed these topics with officials at the Department of Agriculture’s Economic Research Service (ERS), which is a member of several interagency statistical committees and the lead agency for the Statistical Community of Practice and Engagement.[5] In evaluating OMB, agency, and interagency actions to improve efficiency, we used as criteria the requirements of the PRA and practices identified in our prior work on agency collaboration.[6]

[4] The website is maintained by OMB and the General Services Administration.
[5] The Statistical Community of Practice and Engagement is an interagency committee that focuses on providing a collaborative community for agencies that focus on statistics.
[6] GAO, Results-Oriented Government: Practices That Can Help Enhance and Sustain Collaboration among Federal Agencies, GAO-06-15 (Washington, D.C.: Oct. 21, 2005).

For the purposes of this review, we assessed the reliability of the data from the Reginfo.gov website and determined that they were reliable for some of our purposes but not others. Specifically, we reviewed related documentation, conducted interviews with OMB officials, and compared selected data elements from the Reginfo.gov website to supporting documents. We determined that the data were sufficiently reliable for purposes of identifying the collections within our scope and obtaining information on the collections’ subject matter and actions taken by agencies to identify duplication and solicit input. However, as described later in this report, data provided on the website were not sufficiently reliable for the purpose of assessing collections’ annual cost to the federal government and annual respondent burden hours. Appendix I includes additional information on our scope and methodology. Appendix II contains more detailed descriptions of our case-study surveys.

We conducted this performance audit from December 2010 to February 2012 in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives.

Background

In contrast to many other countries, the United States does not have a primary statistical agency.[7] Instead, the statistical system is decentralized, with statistical agencies generally located in different government departments. This structure keeps statistical work within close proximity to the various cabinet-level departments that use the information. There are 13 federal agencies, referred to as the principal statistical agencies, that have statistical activities as their core mission. These agencies conduct much of the government’s statistical work, though there are more than 80 additional federal agencies that carry out some statistical work in conjunction with their primary missions. The 13 principal statistical agencies are all attached to a cabinet-level department or an independent agency that reports to the president. As shown in figure 1, they are located at different levels within their respective departments and agencies.

[7] Examples of countries that have centralized statistical agencies include Australia, Canada, and Sweden.
Figure 1: The Thirteen Principal Statistical Agencies and Their Parent Organizations
Note: The 13 principal statistical agencies’ names are displayed in boxes within the figure.

For fiscal year 2011, $6.83 billion was requested for statistical work, which includes the collections in our scope as well as work that focuses on entities other than households and individuals, such as businesses and farms.[8] This amount is about 0.2 percent of that year’s total federal budget request. Much of this work is concentrated in the 13 principal statistical agencies, which account for approximately 40 percent of requested funding. The budget request for the Census Bureau is among the highest of the principal statistical agencies. Excluding funding related to the decennial census, the fiscal year 2011 budget request for the Census Bureau was $558 million.[9] In addition to conducting its own statistical activities, the Census Bureau also performs statistical work for other agencies on a reimbursable basis.

[8] The amount requested for statistical work includes requested funding for work done by federal agencies that have annual budgets of $500,000 or more for statistical work. OMB presented this information in Statistical Programs of the United States Government, Fiscal Year 2011, an annual report that it prepares on statistical program funding. This was the most up-to-date budget information available at the time of our review.
[9] When decennial census costs are included, the fiscal year 2011 budget request for the Census Bureau was $1.3 billion.

Most of the collections in our scope have relatively modest annual costs. In a sample of 112 information collections that fell within our scope and were active as of September 22, 2011, the majority cost less than $500,000 annually, and fewer than one in five cost more than $1 million annually.[10] There are a few more expensive, large, broad-based collections, such as the Current Population Survey and the National Health Interview Survey, both of which cost tens of millions of dollars each year, and the ACS, which costs over $200 million annually (see fig. 2).

[10] The sample of 112 collections was designed to be representative of the population of 555 collections in our scope that were active as of September 22, 2011.

Figure 2: Most Information Collections from Households and Individuals Have Relatively Modest Costs

Various statutes and guidance from OMB and other entities establish standards for quality and privacy that apply to the federal statistical system. One of the most significant statutes is the PRA, which designates OMB as the coordinating body of the federal statistical system. The PRA establishes requirements that agencies must meet in order to administer information collections, and that OMB must meet in overseeing the system, including that it issue guidance to agencies. Other entities also provide guidance to agencies that conduct statistical work.[11]

[11] For example, the Committee on National Statistics publishes “Principles and Practices for a Federal Statistical Agency” every 4 years in order to provide a current edition to newly appointed cabinet secretaries at the beginning of each presidential administration. This report outlines basic principles that statistical agencies should adhere to in order to carry out their missions effectively, as well as practices designed to help implement them.

In addition, use of information must be balanced with protection of privacy and confidentiality. Statutes such as the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) apply to the federal statistical system and focus on ensuring the privacy and confidentiality of respondents’ information.[12] Agency-specific statutes also protect the privacy and confidentiality of data collected by those agencies. For example, Title 13 of the U.S. Code authorizes the Census Bureau to request and collect information from individuals but also guarantees the confidentiality of these data and establishes penalties for unlawfully disclosing this information. Additional statutes are described in more detail in appendix III.

[12] 44 U.S.C. § 3501 note.

OMB and Agencies Take a Number of Steps to Ensure Efficient Information Collections, Though Opportunities Exist for Refinements

OMB Uses Its Oversight Authority to Improve Efficiency

Under the PRA, OMB, through its Office of Information and Regulatory Affairs (OIRA), has responsibility and broad authority to improve the efficiency and effectiveness of federal information resources.[13] In this regard, OMB is charged with the oversight and coordination of federal agencies’ statistical activities. Specifically, these oversight functions are carried out by OIRA’s Statistical and Science Policy Branch, headed by the Chief Statistician, which includes five staff members who work closely on these oversight and coordination activities with approximately 25 other OIRA desk officers. OMB exercises four key authorities that contribute to the efficiency of the federal statistical system:

• Oversight and approval of information collections: OMB generally must approve information collections that are to be administered to 10 or more people.[14] OMB staff review agencies’ information collection requests to determine whether proposed collections meet PRA standards by assessing such factors as whether they are necessary for the mission of the agency and do not unnecessarily duplicate existing information. This review also enables OMB to identify opportunities for improvement. For example, according to OMB and agency officials, if OMB determines that it is necessary to ask similar questions in multiple collections, it works to ensure that agencies ask them in a consistent manner, when appropriate.

• Standard-setting and guidance to agencies: OMB is responsible for developing and implementing governmentwide policies, principles, standards, and guidelines related to statistical issues, such as procedures and methods for collecting data and disseminating information. Specifically, OMB issues directives, guidance, and memorandums, and provides additional information through information sessions and presentations, to guide federal data collection and promote the quality and efficiency of information collections. For example, OMB published “Questions and Answers When Designing Surveys for Information Collections,” a set of 81 questions and answers on the OMB review process for agency information collection requests required by the PRA.[15] OMB also issued Standards and Guidelines for Statistical Surveys, which outlines 20 standards and related guidelines for the design and methodology of statistical surveys. Finally, OMB issues memorandums on various topics; recent ones clarified the guidance for complying with the PRA and encouraged agencies to coordinate efforts to share data.

• Budget development and reporting: Although agency budgets are initiated within agencies, OMB is responsible, under the PRA, for ensuring that agency budget proposals are consistent with systemwide priorities for maintaining and improving the quality of federal statistics. In addition to the budgets themselves, OMB reports to the public and Congress on key priorities through several documents. OMB annually reports on the paperwork burden federal collections impose on the public in the Information Collection Budget of the United States Government. In addition, OMB annually describes statistical program funding and proposed program changes for statistical activities in the Statistical Programs of the United States Government.

• Other statistical-policy coordination activities: The Chief Statistician and staff in OMB’s Statistical and Science Policy Branch participate in both formal and informal coordination activities with agencies. OMB’s role and participation in the formal interagency committees are discussed later in this report. In general, OMB maintains regular contact with staff at principal statistical agencies. Additionally, OMB encourages agencies that are designing information collections to collaborate with principal statistical agencies because they can help improve survey design and methodology. For example, according to OMB and Census Bureau officials, OMB encouraged the Corporation for National and Community Service to work with the Census Bureau and BLS to sponsor a supplement to the Current Population Survey rather than a stand-alone survey. OMB indicated that sponsoring this supplement likely resulted in cost savings, improved data quality, and greater utility.[16]

[13] 44 U.S.C. § 3504.
[14] Under the PRA, the term “person” includes, among others, individuals, partnerships, associations, corporations, and state and local governments.
[15] OMB, “Questions and Answers When Designing Surveys for Information Collections,” (January 2006).
[16] Similar coordination could be fruitful for other surveys as well. For example, we have noted concerns about the surveys the Department of Labor uses for Davis-Bacon Act wage determination and have recommended that the department seek help from an independent statistical organization to ensure survey methods are sound and in accordance with best practices. See GAO, Davis-Bacon Act: Methodological Changes Needed to Improve Wage Survey, GAO-11-152 (Washington, D.C.: March 22, 2011).

The Reliability of Information in OMB’s Information Collections Database Needs to Be Improved

One tool that OMB uses to facilitate its oversight and coordination functions under the PRA is an internal system called the Regulatory Information Service Center and OIRA Consolidated Information System (ROCIS), which contains information on all active collections and those pending OMB approval. Agencies use the system to submit information collection requests. This system also facilitates OIRA’s review of the requests and underlies the information provided on the public website Reginfo.gov. Agency submissions to OMB typically include a copy of the data-collection instrument (e.g., a survey) and supporting documentation that, in a standardized form, provides information on the collection, such as the estimated annual burden hours and cost to the federal government. Further, under the PRA, agencies must certify that the collection satisfies the act’s standards, for example, that the collection avoids unnecessary duplication. Making this information transparent and easily accessible to other agencies facilitates coordination and can potentially help agencies avoid duplication and identify opportunities for improvement. Furthermore, OMB uses the information contained in its internal system to track reviews of information collections and to compile quantitative data for the Information Collection Budget of the United States Government.

Despite the benefits of this electronic system, our review identified some discrepancies between the Reginfo.gov website’s data and the underlying documentation for certain key variables. Specifically, we reviewed a systematic random sample of 56 of the 555 collections in our scope and checked the reported information for annual cost to the federal government and annual burden hours.
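The report does not describe how this systematic sample was drawn or how the website and document values were compared. As a rough illustration of the general technique, the Python sketch below draws a systematic random sample (a fixed sampling interval with a random start) and then applies a simple edit check that flags records whose posted cost or burden differs from the supporting documentation. It is a sketch under stated assumptions, not GAO’s or OMB’s actual procedure: the `CollectionRecord` fields, the `HYPO-` identifiers, the tolerance parameters, and the synthetic data are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class CollectionRecord:
    """Hypothetical record pairing website values with supporting-document values."""
    control_number: str    # hypothetical identifier for the collection
    website_cost: float    # annual federal cost as posted on the website
    document_cost: float   # annual federal cost in the supporting statement
    website_hours: float   # annual burden hours as posted on the website
    document_hours: float  # annual burden hours in the supporting statement


def systematic_sample(population, n, seed=None):
    """Draw a systematic random sample of n items from an ordered list.

    Uses a (possibly fractional) interval k = N / n and a random start in
    [0, k), then selects the item at floor(start + i * k) for i = 0..n-1.
    """
    size = len(population)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = size / n
    start = rng.random() * k
    # min() guards against floating-point rounding pushing the last index to N.
    return [population[min(int(start + i * k), size - 1)] for i in range(n)]


def find_mismatches(records, cost_tolerance=0.0, hours_tolerance=0.0):
    """Flag records whose website cost or burden differs from the documents.

    Returns (control_number, cost_difference, hours_difference) tuples.
    """
    flagged = []
    for r in records:
        cost_diff = abs(r.website_cost - r.document_cost)
        hours_diff = abs(r.website_hours - r.document_hours)
        if cost_diff > cost_tolerance or hours_diff > hours_tolerance:
            flagged.append((r.control_number, cost_diff, hours_diff))
    return flagged


if __name__ == "__main__":
    # Synthetic population of 555 records; every 50th has a cost discrepancy.
    population = [
        CollectionRecord(f"HYPO-{i:04d}", 250_000.0,
                         250_000.0 + (1_000.0 if i % 50 == 0 else 0.0),
                         1_200.0, 1_200.0)
        for i in range(555)
    ]
    sample = systematic_sample(population, 56, seed=2012)
    for number, cost_diff, hours_diff in find_mismatches(sample):
        print(number, cost_diff, hours_diff)
```

A fractional interval (555 / 56 is about 9.9) keeps the sample size at exactly 56 without discarding any part of the list. The zero default tolerances treat any difference as a discrepancy; an agency-level edit check might instead allow for rounding.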
For 11 of the 56 information collections, the information on cost or burden, or both, did not match between the two sources. Where annual cost did not match, the differences ranged from $1,000 to $19.3 million; where annual burden hours did not match, the differences ranged from 30 to almost 500,000 hours. OMB confirmed that the information in the external Reginfo.gov system is the same as in its internal ROCIS system. As a result, these discrepancies raise questions about the confidence that users can have in both the internal and external databases and may affect OMB’s ability to track information collection requests.

OMB officials told us that responsibility for ensuring data reliability is shared between OMB and agencies. The Regulatory Information Service Center has issued detailed guidance to agencies on how to upload information into ROCIS, and the system has a function that allows agencies to check the completeness of data for individual information collections to ensure that no required data are missing. Entering this information is not always straightforward, however, and some interpretation of the underlying documentation may be required. The discrepancies that we identified indicate that additional actions, such as edit checks, review by an informed staff member, or increased clarification in supporting documents, are necessary to ensure the reliability of Reginfo.gov and ROCIS data.

Agencies Identify Duplication and Solicit Input to Enhance Efficiency

Our analysis indicated that agencies addressed PRA standards related to duplication and public comment in their information collection requests to OMB, and in many cases went beyond the actions specifically described in the PRA and related OMB guidance.

The elements of the PRA most directly related to our review were identifying duplication and soliciting external input on proposed collections.
To analyze agencies’ actions in these two areas, we reviewed a generalizable sample of supporting statements from 106 active statistical information collections administered to households and individuals. [17] Each of the supporting statements we reviewed addressed those PRA standards, as required, and many included detailed descriptions, whose content we analyzed to identify the range of actions that agencies took.

Although agencies must address how their proposed collections meet PRA standards, the act and OMB guidance prescribe few specific actions that agencies must take in addressing these standards. Regarding duplication, PRA does not dictate how agencies should address the standard. Regarding external input, PRA does require that agencies, at a minimum, provide notice in the Federal Register to allow the public to comment on proposed collections, as well as consult with members of the public and affected agencies. OMB guidance expands somewhat on ways that agencies can address these standards, particularly for surveys using statistical methods. However, as with PRA, much is left to the discretion of agencies and little is specifically required. For example, OMB guidance states that agencies should review existing studies and consult with survey methodologists and data users. [18]

Identifying potential duplication: Our analysis showed that agencies took various steps to comply with the PRA requirement that information collections not unnecessarily duplicate an available information source. [19] Specifically, based on our analysis, we estimated the following for the universe of collections in our scope:

• 77 percent included detailed explanations of the actions taken to identify potential duplication. [20] Those supporting statements that did not include detailed explanations were generally for information collections that had unique scopes or other characteristics that made them unlikely to duplicate existing information.
• 57 percent reported reviewing other surveys when looking for duplication. [21] For example, the National Cancer Institute identified seven other surveys that collected information similar to that of a Current Population Survey supplement on tobacco use and explained why the data from these surveys could not replace those collected through the supplement.
• 46 percent indicated that the agency considered administrative data as a potential source of data. [22]
• About a quarter indicated that they consulted with other entities, such as agencies, and a similar number reported that they conducted literature searches. [23]
• In addition, for six of the information collections in our sample, agencies sponsored a collection in the form of a supplement to the Current Population Survey rather than creating a stand-alone survey, thus piggybacking onto another survey vehicle to potentially avoid duplication.

[17] Information collections’ supporting statements contain a narrative section through which agencies describe their efforts to identify potential duplication. Our review focused on collections that were active as of May 17, 2011.
[18] These directions are provided in OMB’s “Questions and Answers When Designing Surveys for Information Collections” and OMB Circular No. A-130.
[19] OMB defines unnecessary duplication as information similar to or corresponding to information that could serve the agency’s purposes and need and is already accessible to the agency. (OMB, The Paperwork Reduction Act of 1995: Implementing Guidance for OMB Review of Agency Information Collection, draft [Aug. 16, 1999].)
[20] The 95 percent confidence interval for this estimate is (68, 85).
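Each estimate above is reported with a 95 percent confidence interval. As a rough illustration of where such intervals come from, the sketch below computes a Wilson score interval for a proportion under an assumption of simple random sampling with equal weights. That assumption is ours: GAO projects its estimates to the universe of collections from a weighted sample, so its published intervals, such as (68, 85) for the 77 percent estimate, reflect its own design and method and will not be reproduced exactly. The count of 82 out of 106 used below is hypothetical (about 77 percent of the 106 sampled statements).

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumes a simple random sample with equal weights; z = 1.96 gives an
    approximate 95 percent interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical count: 82 of 106 sampled statements (about 77 percent).
low, high = wilson_interval(82, 106)
print(f"estimate {82 / 106:.1%}, 95% interval ({low:.1%}, {high:.1%})")
```

With these inputs the interval runs from roughly 68.5 to 84.3 percent, close to but not identical with GAO’s design-based interval. The Wilson form is used here because, unlike the simple normal approximation, it stays within 0 and 1 for proportions near the boundaries.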
Despite these steps, the collection of similar data in different surveys is unavoidable for methodological reasons. In some cases, agencies need to ask the same or similar questions because different surveys target different populations. Both the National Survey of College Graduates and the Current Population Survey ask about respondents’ college degree and occupation, but the National Survey of College Graduates targets individuals in the United States who have bachelor’s degrees or higher in science or engineering, while the Current Population Survey targets a nationally representative sample of U.S. civilians aged 16 and older. Furthermore, according to agency officials, assessing relationships among survey variables may require asking the same or similar questions in different surveys. For example, it is common for surveys to ask for respondents’ ages in order to analyze how responses to other questions vary by age. In addition, asking the same question across surveys allows agencies to compare survey estimates and evaluate surveys’ data quality. To facilitate comparisons among surveys, OMB encourages asking consistent questions, when possible, about certain characteristics such as race and ethnicity.

[21] The 95 percent confidence interval for this estimate is (46, 68).
[22] The 95 percent confidence interval for this estimate is (32, 59).
[23] Twenty-four percent of collections indicated that agencies consulted with another entity, and 25.5 percent reported that they conducted literature searches. The 95 percent confidence intervals for these estimates are (15, 35) and (16, 37), respectively.

Further, when considered from an individual’s perspective, duplication of survey questions is relatively rare, because in a given year only a very small percentage of households are selected to participate in a given collection within our scope.
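The point about small selection rates can be made concrete with a short calculation. The sketch below computes the probability that a household is selected for at least k collections in a year, assuming, for illustration only, that each collection selects households independently at a fixed annual rate (in practice agencies may coordinate samples, so independence is an assumption). The ACS rate of about 2.5 percent of households is the figure given in this report; the two smaller rates are hypothetical.

```python
def prob_selected_at_least(rates: list[float], k: int) -> float:
    """Probability that a household is selected for at least k collections.

    Assumes each collection selects households independently with the given
    annual selection probability (a Poisson-binomial distribution).
    """
    # dist[j] holds the probability of being selected for exactly j of the
    # collections processed so far.
    dist = [1.0]
    for p in rates:
        if not 0.0 <= p <= 1.0:
            raise ValueError("rates must be probabilities")
        new = [0.0] * (len(dist) + 1)
        for j, prob in enumerate(dist):
            new[j] += prob * (1 - p)  # not selected for this collection
            new[j + 1] += prob * p    # selected for this collection
        dist = new
    return sum(dist[k:])


# ACS at about 2.5 percent, plus two hypothetical smaller collections.
rates = [0.025, 0.001, 0.0005]
print(f"at least one: {prob_selected_at_least(rates, 1):.4%}")
print(f"two or more:  {prob_selected_at_least(rates, 2):.6%}")
```

Under these assumptions, selection for two or more of the three collections is roughly 700 times less likely than selection for at least one.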
The likelihood that a household would be selected for more than one collection, and thus that its members would be asked the same question more than once, is considerably lower. [24]

Soliciting input and feedback on information collections: Most agencies in our scope took steps to seek outside input beyond those prescribed by OMB guidance and the PRA. On the basis of our analysis, we estimated the following for the universe of collections in our scope:

• 75 percent indicated that the agency reported obtaining external feedback in addition to publishing notices in the Federal Register. [25]
• 57 percent indicated that agencies consulted with experts. [26] For example, the sponsor of the National Survey of Women Veterans, a survey on the health-care needs, experiences, and preferences of women veterans, consulted with individuals representing a variety of research and clinical backgrounds, such as public health, social welfare, and psychology.
• Agencies less frequently reported consulting with other agencies, contractors or subcontractors, or interagency or advisory committees. [27]

In addition, agencies reported soliciting feedback directly from former and potential survey respondents and from data users and customers. [28] They also reported conducting literature searches and sponsoring or participating in workshops, panels, or other events. [29]

[24] For example, the ACS is the largest survey in our scope and is administered annually to 2.5 percent of households.
[25] The 95 percent confidence interval for this estimate is (65, 83).
[26] The 95 percent confidence interval for this estimate is (46, 67).
[27] Thirty-nine percent of collections indicated that agencies consulted with other agencies, 32 percent reported contacting contractors or subcontractors, and 15 percent described meeting with interagency or advisory committees. The 95 percent confidence intervals for these estimates are (30, 49), (22, 41), and (8, 25), respectively.
[28] For example, we estimate that 15 percent reported soliciting input from data users and customers. The 95 percent confidence interval for this estimate is (8, 25).

Agencies in our sample reported making changes in response to input, potentially resulting in improvements to their information collections. For example, in response to recommendations made by the Committee on National Statistics on a Current Population Survey supplement about food security, ERS reported entering into an agreement with Iowa State University to study food security measurement issues. This collaborative project is exploring alternatives to an aspect of the supplement’s current design, which could lead to alternative methods for estimating food security prevalence and potentially improve measurement precision and reliability. The U.S. Geological Survey also reported incorporating changes in response to feedback on its Landsat Survey. [30]

Agencies’ actions to find duplication and solicit input that we identified in our review, as well as others that OMB may identify, could be useful for OMB to share with other agencies that sponsor information collections. Offering more-detailed guidance in a single document that outlines different actions agencies can take to identify duplication and solicit input would help ensure that agencies are aware of the various options. It would also allow them to easily access and reference this information. OMB could include this information in one of its periodic memorandums related to compliance with the PRA. We previously reported on the importance of establishing ways to operate across agency boundaries, and promoting these actions is one way OMB can do this. [31] Also, just as OMB’s guidance to agencies on complying with the Information Quality Act gives agencies flexibility to determine the most appropriate actions, it is important that any new guidance continue to give agencies discretion in the number and types of actions they take to identify duplication and solicit input, because the most appropriate actions will vary based on the characteristics of the collection.

[29] Factors that facilitate interaction among agencies and between agencies and others in the statistical community include agency staff’s professional involvement in committee work, movement by some staff to other agencies during their careers, and training opportunities. For example, survey methodologists work together on various interagency subcommittees. In addition, their professional development includes attending local and other conferences at which papers are presented describing uses and activities related to surveys in other agencies. These opportunities for cross-agency professional knowledge transfer facilitate collaboration and the identification of opportunities for efficiency.
[30] The Landsat Survey collects information from professional users of satellite imagery to better understand the uses and applications of moderate-resolution satellite imagery, as well as information about the users.
[31] GAO-06-15.

Interagency Committees Facilitate Collaboration, but Better Communication Could Increase Effectiveness

Interagency statistical committees offer opportunities for broader collaboration to increase the efficiency of the federal statistical system. Three key committees are the Interagency Council on Statistical Policy (ICSP), the Federal Committee on Statistical Methodology (FCSM), and the Statistical Community of Practice and Engagement (SCOPE), all of which are either chaired or sponsored by OMB. [32] Importantly, the activities of the interagency committees are consistent with key collaborative practices we identified in our previous work. [33] For example, each of these committees has defined roles and responsibilities, and the committees serve as a vehicle for the agencies to operate across agency boundaries. Specifically, ICSP serves an advisory function to the Chief Statistician and focuses on broader issues related to the federal statistical system. In addition, ICSP provides overarching guidance to FCSM and SCOPE. FCSM investigates statistical practices and methodologies used in federal statistical programs, while SCOPE focuses on cross-agency activities of data management and dissemination. Table 1 provides an overview of these committees.

[32] Outside of these interagency committees, there are nonfederal organizations, such as the Committee on National Statistics and the Council of Professional Associations on Federal Statistics, which serve as resources to identify opportunities for improving federal statistics.
[33] GAO-06-15.

Table 1: Overview of Interagency Statistical Committees

Interagency Council on Statistical Policy (ICSP)
Date established: 1989 (a)
Membership: The heads of the principal statistical agencies, plus the statistical unit at the Environmental Protection Agency.
Mission:
• Coordinate statistical work, particularly when activities and issues cut across agencies.
• Exchange information about agency programs and activities.
• Provide advice and counsel to OMB on statistical matters.
Description of selected statistical projects:
• Identifying the highest-priority statistical program improvements.
• Developing views on improving implementation of the PRA.
• Providing direction to FCSM’s subcommittees on privacy and administrative data.

Federal Committee on Statistical Methodology (FCSM)
Date established: 1975
Membership: About 20 members appointed by the Chief Statistician based on technical expertise and history of innovative contributions to the federal statistical system.
Mission:
• Communicate and disseminate information on statistical practice among all federal statistical agencies.
• Recommend the introduction of new methodologies in federal statistical programs to improve data quality.
• Provide a mechanism for statisticians in different federal agencies to meet and exchange ideas.
Description of selected statistical projects:
• Discussing disclosure limitation methods.
• Investigating nonresponse issues related to selected surveys.
• Clarifying legal issues of confidentiality and informed consent.
• Examining issues related to the quality of administrative data.

Statistical Community of Practice and Engagement (SCOPE)
Date established: 2009
Membership: Appointed representatives from the principal statistical agencies, plus the statistical unit at the Environmental Protection Agency.
Mission:
• Provide a collaborative community for statistical agencies to produce relevant, accurate, timely, cost-effective data and insightful research disseminated through shared state-of-the-art best practices to support data-driven decisions.
Description of selected statistical projects:
• Surveying tools used by agencies to comply with standards for access to electronic and information technology procured by agencies, and recommending the best tools for use.
• Developing protocols for pilot testing of a secure cloud environment for storing data and recommended software.

Source: GAO analysis of OMB and agency data.
(a) ICSP was established in 1989 and codified in the 1995 reauthorization of the PRA.

The committees study statistical issues and methods through subcommittees and working groups, most of which rely on volunteers from member agencies who take on these responsibilities in addition to their current job duties. The work of the subcommittees and working groups has been useful to other agencies.
For example, an FCSM subcommittee produced a checklist that, according to OMB, is used around the world to determine whether a public-use data product sufficiently protects the confidentiality of individuals’ data.

The interagency committees use various methods to disseminate information on their activities and products, but they do not do so in a timely or comprehensive manner. The committees’ work is summarized in OMB’s annual report Statistical Programs of the United States Government, but the report does not always communicate key information about it. For example, the fiscal year 2011 report states that one of ICSP’s activities over the past year was identifying the highest-priority statistical-program improvements, but it does not provide information about all of these improvements. [34] In addition, interagency committees present information about their work at statistical seminars. For example, according to OMB officials, FCSM has presented work at the biennial FCSM Statistical Policy Seminars. Agency officials also noted that members of interagency statistical committees use a limited-access web-based system to facilitate information sharing. Information about FCSM’s work is also posted on the committee’s website or the FedStats website. [35] Neither ICSP nor SCOPE has a dedicated website, though OMB believes that one is neither necessary nor appropriate because the work of these groups is deliberative.

While the FCSM website offers the potential to effectively disseminate information, it is not comprehensive or timely. For example, it provides links to the sites of various interagency and advisory committees, including three FCSM permanent working groups, but does not have pages for any of the active FCSM subcommittees. [36] Moreover, the websites do not appear to be regularly updated with new products produced by the committees that could be useful for other agencies.
For example, the subcommittee on the statistical uses of administrative data published a paper in April 2009 highlighting examples of successful data-sharing projects using administrative data for statistical purposes, but this product is not yet available on the FCSM website.

[34] OMB staff noted that the specific program improvements are reflected in the President’s budget.
[35] FedStats is a website that provides access to statistical information produced by the federal government. It includes all federal agencies listed in Statistical Programs of the United States Government that report a certain level of expenditures in statistical activities.
[36] FCSM has active subcommittees looking at statistical uses of administrative data and privacy issues. In addition, FCSM has permanent working groups that discuss specific topic areas, such as nonresponse to household surveys.

Providing more-comprehensive and timely information on interagency activities could offer benefits. As identified in our previous work, developing mechanisms to monitor and report on results is a necessary element of a collaborative relationship. [37] In this case, better reporting of committee activities and products could benefit both committee members and those who are not involved in committee activities. Membership in the committees is made up almost exclusively of representatives from the 13 principal statistical agencies, so most agencies are not directly involved in committee activities. It makes sense that agencies that have statistics as their primary focus are the most heavily involved, but agencies for which statistics is a supporting function to their primary mission, and possibly academics and the broader public, could benefit from greater access to information and products related to the committees’ work and priorities.
More easily accessible information would also benefit member agencies, as it would offer a centralized place to maintain committee work and communicate priorities. Much work goes into developing the committees’ products, and making them easily accessible maximizes their value.

[37] GAO-06-15.

Administrative Data Could Help Improve Federal Surveys, but Continued Progress Is Needed on Access and Quality Issues

Administrative Data Have Greater Potential to Supplement, Rather than Replace, Federal Surveys

Administrative data, typically collected to administer a program or business, are a growing source of information on individuals and households. For example, the Social Security Administration collects data on the earnings of U.S. workers from employers and the Internal Revenue Service to calculate the amount of benefits for retired workers, spouses, children, and other beneficiaries, while businesses obtain data on, for example, the items and amounts purchased when customers use credit cards and store loyalty cards. According to the Census Bureau, the amount of administrative data held by private companies exceeds the amount held by the government. Researchers recently estimated that the amount of digital data in existence, which includes some types of administrative data such as retail customer databases, more than doubles every 2 years. [38] Administrative data have been identified as an important resource for the future of the statistical system, as some of these publicly and privately held data may be analyzed or reported with survey data to yield greater value. Furthermore, the increasing capacity to store and process administrative data has facilitated this potential use.
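Read as a lower bound, the estimate that digital data more than double every 2 years implies at least exponential growth. As a worked illustration, with $D_0$ the amount of data today, $t$ measured in years, and an arbitrarily chosen 10-year horizon:

```latex
D(t) \ge D_0 \cdot 2^{t/2}
\quad\Longrightarrow\quad
\frac{D(10)}{D_0} \ge 2^{10/2} = 2^{5} = 32
```

That is, a decade of such growth would multiply the stock of digital data at least 32-fold, provided the doubling rate held over the whole period.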
For decades, agencies have been working to expand the use of administrative data in conjunction with data collected from surveys, but certain characteristics of administrative data make it difficult to use them to replace surveys or sections of surveys administered to households and individuals. There is interest in exploring how administrative data may be used to improve data quality, hold down costs, and reduce respondent burden. For example, as part of the redesign of the Consumer Expenditure Surveys, BLS is investigating the potential for replacing some portions of the survey with external sources of expenditure data to reduce respondent burden and potentially improve data quality. However, agencies we contacted have not replaced surveys or sections of surveys administered to households and individuals with administrative data because these data (1) are often not representative of a survey’s population of interest; (2) may not correspond to information collected through survey questions; (3) are vulnerable to program cancellation or changes; and (4) may take a long time to obtain, which delays use and in some cases could cause agencies to miss required reporting dates.

Administrative data currently show greater promise for supplementing federal surveys. Indeed, the agencies we contacted identified four major opportunities to enhance surveys with administrative data in order to create efficiencies and enhance data quality. [39] Current uses of administrative data include the following:

• Creating new data products: Agencies link survey data and administrative data to create new, more robust, statistical data products, which increases efficiency in two key ways. First, according to OMB and agency officials, agencies can use these new data products to evaluate and potentially improve federal policies and programs, especially those related to the source of the administrative data, without adding to respondent burden. Second, combining administrative data with survey data can increase efficiency by enhancing previously collected survey data. For example, the National Center for Health Statistics’s (NCHS) record-linkage program links survey data from various health-related surveys to different administrative datasets to create new data products for studying factors that influence health-related outcomes, such as disability, health care, and mortality.
• Supplementing surveys’ sample frames: Using administrative data to supplement surveys’ sample frames (the sources from which a survey’s sample is drawn) can create efficiencies, reduce costs, and enhance the quality of surveys. For example, the National Household Food Acquisition and Purchase Survey uses administrative data from the Supplemental Nutrition Assistance Program to develop a sample frame of participating households to potentially include in the survey. ERS officials said that using these data to help develop the survey’s sample frame costs less than the alternative of screening a broader group of respondents to determine whether they are participating. In addition, agencies can use administrative data to augment sample frames in areas where the sample is not large enough to fully support a survey. For example, the Census Bureau’s pilot project studying the potential to use ACS data as a sample frame for the National Immunization Survey used commercial data to supplement ACS data in a county that had a limited ACS sample.
• Comparing data to improve survey accuracy and design: By comparing survey data to similar administrative datasets and identifying reasons for any discrepancies that may exist, agencies can improve the quality of survey data. For example, researchers identified opportunities for improving surveys’ designs and methodologies after agencies found that surveys of enrollment in health-insurance programs provided lower estimates than those compiled from administrative data. Agencies can also improve the efficiency of their surveys by using administrative data as part of nonresponse follow-up activities.
• Modeling estimates: Agencies combine administrative data and survey data to create, or model, estimates that are designed to be more accurate than estimates based on survey data alone. The main benefit of modeling is that it makes it possible to produce estimates for smaller geographic areas than a survey alone can support. For example, the Census Bureau conducts the Small Area Income and Poverty Estimates Program to provide updated data on poverty and income, which are used to administer federal programs and allocate federal funds to local areas. The Census Bureau combines survey data from the ACS with population estimates and administrative data and has found that this approach produces consistent and reliable data that are more reflective of current conditions than data produced only by existing surveys.

[38] John Gantz and David Reinsel, “Extracting Value from Chaos” (Framingham, Mass.: IDC Go-to-Market Services, June 2011).
[39] For the purposes of our report, we focused on the use of administrative data with surveys administered to households and individuals. However, agencies such as the Census Bureau and BLS also use administrative data with business surveys to produce business statistics. For example, by combining administrative and survey data, the Census Bureau produces an annual series on employment by county, and BLS produces its quarterly series of statistics on gross job gains and losses.
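The modeling approach described above can be illustrated with a simplified composite estimator of the kind used in small-area estimation. A direct survey estimate is blended with a model-based "synthetic" prediction, which might draw on administrative records and population estimates, with each weighted by its precision. This is a textbook Fay-Herriot-style shrinkage formula offered for illustration, not the Census Bureau’s actual Small Area Income and Poverty Estimates model. The class and function names, the variances, and the county values are hypothetical, and in practice the model variance must itself be estimated.

```python
from dataclasses import dataclass


@dataclass
class AreaInput:
    """Inputs for one small area (all values hypothetical)."""

    direct: float        # direct survey estimate, e.g., a county poverty rate
    sampling_var: float  # sampling variance of the direct estimate
    synthetic: float     # model-based prediction, e.g., from administrative data


def composite_estimate(area: AreaInput, model_var: float) -> float:
    """Precision-weighted blend of the direct and model-based estimates.

    The weight on the survey estimate, gamma = model_var / (model_var + sampling_var),
    approaches 1 when the survey is precise and 0 when it is noisy.
    """
    if area.sampling_var < 0 or model_var < 0:
        raise ValueError("variances must be non-negative")
    total = model_var + area.sampling_var
    if total == 0:
        return area.direct  # degenerate case: no variance information to weight by
    gamma = model_var / total
    return gamma * area.direct + (1 - gamma) * area.synthetic


small_sample = AreaInput(direct=0.20, sampling_var=0.004, synthetic=0.15)
large_sample = AreaInput(direct=0.20, sampling_var=0.0001, synthetic=0.15)
print(round(composite_estimate(small_sample, model_var=0.001), 4))
print(round(composite_estimate(large_sample, model_var=0.001), 4))
```

With these inputs, the county with a small survey sample is pulled most of the way toward the model prediction (to about 0.16), while the county with a large sample stays close to its survey value (about 0.195). This is the mechanism that lets modeling support estimates for areas a survey alone cannot.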
Agencies Are Addressing Issues That Hamper Use of Administrative Data, but Additional Actions Could Facilitate Progress

Despite the benefits of using administrative data to supplement federal surveys, agencies face five key constraints related to data access and quality:

• Statutory restrictions on data sharing: Federal and state statutes sometimes prohibit or limit sharing of data for statistical purposes. In cases where specified authorized uses do not include statistical use, nothing short of a statutory change can overcome the constraint. In other cases, statutes limit sharing to purposes related to program administration. For example, the 2008 Farm Bill restricts access to data on participants in certain nutrition-assistance programs to uses for the “administration or enforcement” of the programs. [40] Similarly, the Higher Education Act of 1965, as amended, restricts federal student aid data to purposes related to the “application, award, and administration of aid.” [41] However, agencies holding such restricted data can differ on whether statistical uses are related to program administration. The Census Bureau successfully negotiated access to the nutrition assistance data because it could demonstrate that the linked data would help the federal sponsor and state agencies develop better measures of outcomes, such as poverty, inequality, and the receipt of government transfers. Conversely, the Census Bureau was unable to gain access to the federal student aid data for statistical uses because the Department of Education did not consider any of the planned uses to be related to the program’s administration.

[40] 7 U.S.C. § 2020(e)(8)(A)(i). The Department of Agriculture administers the program at the federal level through the Food and Nutrition Service, while state agencies administer the program at the state and local levels, including determination of eligibility and allotments.
[41] 20 U.S.C. § 1090(a)(3)(E).
• Consent: Individuals’ consent to allow their administrative and survey data to be linked affects uses of administrative data for statistical purposes. Seeking consent derives from a core concept of personal privacy: the notion that each individual should have the ability to control personal information about himself or herself. [42] Moreover, there can be issues regarding the privacy and confidentiality of data collected for one purpose and used for another, and, according to OMB officials, agencies use different practices, wording, and levels of detail to meet consent requirements. At the time administrative data are collected, an agency can inform individuals that their data may be used for statistical purposes, but, according to ERS officials, agencies collecting administrative data often do not consider possible future statistical uses and therefore may not provide such notice. Obtaining consent after data have been collected can be time-consuming and costly. In addition, an agency can ask survey respondents for permission to link their survey data with certain administrative data. Some respondents may not consent, which can substantially limit the number of respondents eligible for linkage and, as a result, potentially affect the quality of the linked data. [43]
• Costs and infrastructure: Because the primary cost of collecting administrative data has already been incurred, using these data can, in some cases, be more efficient and less costly than new survey efforts. However, there are still costs to using administrative data for statistical purposes, including up-front and ongoing investments to purchase and maintain hardware and software to link data and protect their confidentiality. Agencies identified various factors that can affect costs.
These include, but are not limited to, negotiations with the agency holding the data, the quality of the administrative data, and the ease with which the data can be linked to other data. BLS officials said that in some cases the costs of using administrative data with survey data may outweigh any savings and that evaluation of administrative data options always requires careful consideration of a wide range of quality and cost issues, including the costs of specialized personnel and infrastructure. According to an FCSM study that profiled examples of successful statistical uses of administrative data, agencies wanting to share data also may not have the necessary staff, policies, or procedures. For example, negotiating data-sharing agreements may require significant time. Moreover, many key administrative datasets are held by states, further complicating the data-sharing process because agencies have to negotiate under different policies and procedures as well as work with numerous staff across states.

[42] GAO, Record Linkage and Privacy: Issues in Creating New Federal Research and Statistical Information, GAO-01-126SP (Washington, D.C.: April 2001).
[43] Agency officials noted that, if the survey respondents who consent differ from those who do not consent, analysis of the linked files may lead to misleading or biased results. Also, the reduced sample size from an analysis using data for those who consent may widen confidence intervals for calculated estimates.

• Documentation of datasets: OMB and agency officials said that agencies holding administrative data do not uniformly document information about their datasets in a way that is useful or efficient for use outside the agency. This lack of documentation makes evaluating the datasets’ potential for statistical uses challenging.
For example, definitions of key variables of research interest or information about how frequently the agency updates the data may not be available. ERS officials also noted that private companies typically do not disclose detailed information about the sources of their data, making it difficult to assess their quality. As a result, agencies interested in using these data for statistical purposes may have to spend additional time and resources to understand the content and structure of the datasets.
• Quality of data: Agency officials and experts identified reasons why the quality of administrative data can vary, which can affect their potential use with survey data. Specifically, different agencies may use different systems, definitions, and time frames when collecting administrative data. For example, states may collect and evaluate the quality of data in different ways, making it complicated to aggregate the data across states as well as to compare state-level data. In addition, several factors can influence the accuracy of administrative data. For example, agencies that collect data for the purpose of program administration may be concerned with the accuracy of only the variables used for that purpose. Moreover, reporting incentives may influence data quality. For example, individuals may underreport income on tax forms, and program agencies may pay less attention to the accuracy of information collected from applicants when it does not affect their participation in a program.

Agencies and interagency committees have been taking numerous actions to address these constraints. For example, ERS, in collaboration with the Census Bureau, NCHS, and OMB, is undertaking a pilot project to address data quality concerns with state-level administrative data, and FCSM is working on a project to clarify legal requirements for informed consent (see fig. 3).
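The consent constraint described above has two mechanical effects that a short sketch can show: only consenting respondents are eligible for linkage, and the smaller linked sample widens confidence intervals, as footnote 43 notes. The sketch below performs a simple exact-match linkage on a shared identifier and computes how much an interval’s half-width grows when the analysis sample shrinks, assuming simple random sampling and an unchanged per-unit variance. The field names person_id and consent and the records are hypothetical. Real record linkage often relies on probabilistic matching across several fields, and this calculation does not address the bias that can arise if consenters differ from nonconsenters.

```python
import math


def link_consenting(survey: list[dict], admin: dict[str, dict]) -> list[dict]:
    """Exact-match linkage of survey records to administrative records.

    Only respondents who consented are eligible. 'person_id' and 'consent'
    are hypothetical field names. Where both records share a field name,
    the administrative value overwrites the survey value.
    """
    linked = []
    for record in survey:
        if not record.get("consent", False):
            continue  # no consent: not eligible for linkage
        admin_record = admin.get(record["person_id"])
        if admin_record is not None:
            linked.append({**record, **admin_record})
    return linked


def half_width_inflation(n_full: int, n_linked: int) -> float:
    """Ratio of confidence-interval half-widths after the sample shrinks.

    Under simple random sampling with an unchanged per-unit variance, the
    half-width scales with 1 / sqrt(n), so the ratio is sqrt(n_full / n_linked).
    """
    if n_full <= 0 or n_linked <= 0:
        raise ValueError("sample sizes must be positive")
    return math.sqrt(n_full / n_linked)


survey = [
    {"person_id": "A1", "consent": True, "reported_income": 41000},
    {"person_id": "B2", "consent": False, "reported_income": 52000},
    {"person_id": "C3", "consent": True, "reported_income": 38000},
]
admin = {"A1": {"admin_income": 43500}, "B2": {"admin_income": 50000}}

linked = link_consenting(survey, admin)
print(len(linked), "of", len(survey), "respondents linked")
# If only a quarter of a 1,000-person sample can be linked, intervals double in width.
print(half_width_inflation(1000, 250))
```

Here respondent B2 has an administrative record but is excluded for lack of consent, and C3 consented but has no matching record, so only one of three respondents is linked.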
Figure 3: Actions Taken to Address Constraints That Hamper Greater Use of Administrative Data
[Interactive graphic showing five types of constraints to statistical uses of administrative data: statutory restrictions on data sharing, data descriptions and documentation, consent, quality of data, and costs and infrastructure. Sample view for the access constraint “statutory restrictions on data sharing”: statutes may not authorize statistical uses for data collected by a program. Examples of actions taken: OMB has issued guidance encouraging greater sharing of data while protecting privacy, and FCSM produced a document highlighting lessons learned when negotiating data-sharing agreements. A text version of this graphic is available in appendix IV.]
Source: GAO analysis of OMB and principal statistical agency data.

One theme that cuts across many of these efforts, and where additional short-term actions could accelerate progress, is identifying ways to facilitate the process of deciding whether to share data among agencies. FCSM published a paper describing successful data-sharing arrangements between various federal and state agencies. One of the four core elements of success that FCSM identified in these arrangements was mutual interest, in that each participant, in particular the agency providing the data, evaluates a proposed data-sharing agreement from its own perspective. [44] On the one hand, agencies may share data because the linked data can benefit program administration, as noted earlier.
On the other hand, OMB and agency officials noted that agencies may decide against sharing because perceived disadvantages, such as policy concerns and the potential identification of weaknesses in program administration, outweigh the possible benefits. In such a case, an individual agency’s interests may be at odds with the broader efficiency of the whole federal statistical system. As illustrated in figure 3, FCSM and agencies are developing tools to approach these decisions in a more standardized way, such as checklists for evaluating the quality of administrative data and a template for executing data-sharing agreements. However, these individual tools focus on particular aspects of data sharing; for example, the checklist focuses on data quality. Taken separately, they may not be sufficient for agencies to efficiently identify the datasets with the greatest potential for mutual benefit and to address all factors involved in the decision-making process. The benefits of more-comprehensive centralized guidance could include greater consistency, clarity, and efficiency. A more-comprehensive standardized framework that ties together existing tools with additional resources to cover the major aspects of the data-sharing process could bring consistency to the decision-making process.
Similar to the checklist that FCSM is developing for agencies to use in evaluating data quality, the framework could include a template outlining key questions for all agencies involved in the proposed data sharing, including federal and state agencies that hold data, to address issues such as (1) the steps to take to ensure data reliability; (2) any statutory limitations on planned uses of the data, including confidentiality protections; (3) whether consent has already been obtained for additional use of the data, or how it will be obtained; and (4) methods to fully account for the costs associated with obtaining and using the data. To be comprehensive, such guidance would not need to be voluminous, but it should identify each of these major aspects of data sharing, provide advice to agencies, and reference any tools available to assist agencies during the process. It should also be kept up-to-date, reflecting changes in legislation or other factors that affect data sharing, as well as any new tools that are developed. Although such a framework may not lead to sharing in all cases, it could better ensure that agencies weigh the related benefits and costs in a more balanced, consistent, and transparent fashion. Such guidance could also clarify ways that agencies could resolve disagreements over data sharing. It could also improve efficiency by helping agencies evaluate available data and determine which datasets have the greatest potential for mutual benefit, given that agency officials we spoke with cited examples in which it took multiple years to reach a resolution on data sharing.

44 The three other core elements of success in these arrangements were (1) vision and support by agency leadership, (2) narrow but flexible goals, and (3) infrastructure.
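As a purely illustrative sketch, the template's four key questions could be tracked as a structured checklist that records the answers for one proposed arrangement and flags any question still left open. The names below (`KEY_QUESTIONS`, `DataSharingReview`, and the agency labels) are hypothetical; they do not correspond to any existing OMB or FCSM tool.

```python
from dataclasses import dataclass, field

# Hypothetical template keys, paraphrasing the four issues the framework would cover.
KEY_QUESTIONS = {
    "reliability": "What steps will be taken to ensure data reliability?",
    "statutory_limits": "Do statutes limit the planned uses, including confidentiality protections?",
    "consent": "Has consent already been obtained for the additional use, or how will it be obtained?",
    "costs": "How will the full costs of obtaining and using the data be accounted for?",
}


@dataclass
class DataSharingReview:
    """Answers collected for one proposed data-sharing arrangement."""
    providing_agency: str
    receiving_agency: str
    answers: dict[str, str] = field(default_factory=dict)

    def record(self, key: str, answer: str) -> None:
        # Reject keys that are not part of the template.
        if key not in KEY_QUESTIONS:
            raise KeyError(f"unknown question: {key}")
        self.answers[key] = answer.strip()

    def open_questions(self) -> list[str]:
        # A question stays open until it has a non-empty answer.
        return [key for key in KEY_QUESTIONS if not self.answers.get(key)]

    def ready_for_decision(self) -> bool:
        return not self.open_questions()


review = DataSharingReview("Agency A (holds program data)", "Agency B (statistical agency)")
review.record("reliability", "Apply the data-quality checklist to a test extract.")
print(review.open_questions())
```

Keeping the questions in one shared structure, rather than in separate tools, mirrors the report's point that a single framework would lead every participating agency through the same set of issues.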
While agencies can take steps to address some constraints on sharing data, in other cases only policy actions on the part of the executive branch or Congress can lift barriers. One of the primary examples of such action is Congress’s enactment of CIPSEA in 2002, which authorized the Census Bureau to share selected business data with BLS and the Bureau of Economic Analysis for statistical purposes. However, CIPSEA’s reach is limited: because the Census Bureau’s business data are based in large part on tax data, the tax code would need to be amended for the Census Bureau to share these data with other statistical agencies as well. There have been proposals to amend the tax code to further expand the scope and coverage of CIPSEA, but Congress has not yet acted on them. 45

45 As discussed in our recent report, Taxpayer Privacy: A Guide for Screening and Assessing Proposals to Disclose Confidential Tax Information to Specific Parties for Specific Purposes (GAO-12-231SP), Internal Revenue Code Section 6103 provides that federal tax information is to be kept confidential and used to administer federal tax laws except as otherwise specifically authorized by law.

Prospects for Enhanced Use of the ACS with Other Surveys Are Mixed

The ACS Provides Unique Coverage of the Nation’s Population

The Census Bureau’s full implementation of the ACS in 2005 was a major change to the statistical system. The survey is unique among surveys of households and individuals because of its size: the monthly samples combine into an annual sample of 3.54 million addresses. The ACS provides annual estimates of social and economic characteristics for all areas of the country and is a primary source of information on small areas, such as towns and tribal lands, down to the neighborhood level. The ACS covers a broad range of topics, such as housing, education, and employment.
The information provided by the ACS was previously available only once a decade, from the decennial census long form, which the ACS replaced. Users of ACS information include all levels of government, the private and nonprofit sectors, and researchers. According to the Census Bureau, ACS estimates are currently used to help allocate more than $400 billion in federal funding annually. Table 2 lists some of the key characteristics of the ACS.

Table 2: Key Characteristics of the ACS
Response requirements: Responses are required by law
Frequency of administration: Administered on a monthly basis
Frequency of data products: Annual
Reference point for data products: Period estimates; the period over which data are cumulated is determined by the population of the geographic area to which the estimate applies. Estimates for places with populations of more than 65,000 represent a 1-year period; places with populations of 20,000 to 65,000 represent 3-year periods; and places with populations smaller than 20,000 represent 5-year periods.
Number of questions: 48 potential questions per person, plus 21 per housing unit (a)
Respondent burden: 38 minutes per respondent (b)
Sample size: 3.54 million addresses per year
Key uses: Directing government funding, informing government and private-sector decision making, and research
Source: Census Bureau.
(a) Although there are 48 individual questions on the ACS, several questions apply only to respondents with certain characteristics, so respondents likely do not answer every question. For example, only ACS respondents who are female and age 15 to 50 are asked whether they have given birth in the past year.
(b) The Census Bureau estimates that the respondent burden is 38 minutes for the questionnaire it administers to households. Its estimates for other interviews, such as group quarters, are different.
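The population thresholds in Table 2 amount to a simple lookup. The sketch below encodes them exactly as the table states them (more than 65,000; 20,000 to 65,000; smaller than 20,000). The function name is hypothetical, and this is an illustration of the table rather than a Census Bureau tool.

```python
def acs_period_years(population: int) -> int:
    """Years of data cumulated for an ACS period estimate, per Table 2's thresholds."""
    if population < 0:
        raise ValueError("population cannot be negative")
    if population > 65_000:      # more than 65,000: 1-year period
        return 1
    if population >= 20_000:     # 20,000 to 65,000: 3-year period
        return 3
    return 5                     # smaller than 20,000: 5-year period


for place_population in (250_000, 65_000, 20_000, 8_500):
    print(place_population, acs_period_years(place_population))
```

Note that a place of exactly 65,000 falls in the 3-year group, because the table reserves 1-year periods for populations of more than 65,000.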
Agencies Use the ACS to Inform the Design of Other Surveys and Analyze Their Results

Several characteristics of the ACS make it appealing for use by other surveys, including that it produces annual estimates on a broad range of topics at finer geographic levels than other surveys. Agencies and others identified five areas of opportunity in which surveys can make use of ACS data and resources. Two of these areas, which generally rely on publicly available ACS estimates and do not require changes to the survey’s design or methodology, have the greatest potential for widespread use. The Census Bureau has provided users with various resources to guide their use of ACS estimates, including a guide to comparing estimates, handbooks directed to specific types of users, training presentations, and a tutorial. The two areas with the most potential for use are as follows:
• Evaluating and supplementing other surveys’ results: Survey administrators and data users can use ACS estimates to evaluate information collected by other surveys. For example, survey administrators can use ACS information to evaluate the quality of responses to other surveys that include some questions that are the same as or similar to ACS questions. Additionally, data users and survey administrators can use ACS data to supplement information collected by other surveys. For example, a recent report based on analysis of ACS data describes how median earnings vary by the field in which people obtain their bachelor’s degrees. 46 Such information can complement results from other surveys. In this case, NCSES also produces information on earnings by degree type, based on information in its Scientists and Engineers Statistical Data System database, which contains data on people with a science or engineering degree and those who work in related fields.
NCSES collects its information less frequently than the ACS produces estimates, and for a more narrowly defined population, but its data allow more detailed analysis of issues such as how people use their college degrees at work. Together, these two sources offer more-timely and more-detailed information than either source alone.
• Designing other surveys: There is also widespread potential to use ACS data to design other surveys more efficiently. Because many of the topics included in the ACS are covered in more detail by other surveys or relate to other surveys’ target populations, survey administrators can use ACS estimates at different demographic or geographic levels to stay up-to-date on changes that may affect their surveys. These estimates can also be used when designing and selecting a survey’s sample. Census Bureau officials told us that, when designing a survey, survey administrators can use the data to guide the selection of the survey’s sample so that it better represents individuals or households with certain characteristics. For example, the Survey of Income and Program Participation can use ACS estimates at different demographic or geographic levels to identify and more efficiently sample geographic areas with disproportionately large numbers of low-income households, a population of interest for the survey. Because these data are available for small geographic areas, agencies can use them when samples are drawn at more-local geographic levels.

46 Anthony P. Carnevale, Jeff Strohl, and Michelle Melton, What’s It Worth? The Economic Value of College Majors, Georgetown University Center on Education and the Workforce (Washington, D.C.: May 24, 2011).
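One way a survey could act on small-area ACS estimates when designing its sample is to allocate interviews across areas in proportion to each area’s estimated number of target households rather than its total number of households, so that areas with disproportionately many low-income households receive a larger share of the sample. The sketch below assumes that allocation rule and uses made-up area figures; it is not a description of how the Survey of Income and Program Participation actually designs its sample.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    households: int           # total households (hypothetical figure)
    low_income_share: float   # hypothetical ACS-style estimate, between 0 and 1


def allocate_sample(areas: list[Area], total_sample: int) -> dict[str, int]:
    """Split total_sample across areas in proportion to each area's estimated
    number of low-income households, using largest-remainder rounding so the
    integer allocations sum exactly to total_sample."""
    targets = [a.households * a.low_income_share for a in areas]
    total_target = sum(targets)
    if total_target <= 0:
        raise ValueError("no estimated target households in any area")
    raw = [total_sample * t / total_target for t in targets]
    alloc = [int(r) for r in raw]
    # Give the leftover interviews to the areas with the largest fractional parts.
    leftover = total_sample - sum(alloc)
    by_remainder = sorted(range(len(areas)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return {a.name: n for a, n in zip(areas, alloc)}


areas = [
    Area("Area 1", households=10_000, low_income_share=0.30),
    Area("Area 2", households=10_000, low_income_share=0.10),
    Area("Area 3", households=5_000, low_income_share=0.20),
]
print(allocate_sample(areas, 100))
```

With these made-up figures, Area 1 holds 40 percent of all households but 60 percent of the estimated low-income households, so it receives 60 of the 100 interviews instead of the 40 a household-proportional design would give it.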
Uses of ACS That Require Design and Methodology Changes Have Limited Potential

Agencies and others identified three uses of ACS data and resources that, while offering potential benefits to other surveys, face such constraints that more widespread use is likely not possible under the current ACS design. These uses are more intensive than the ones described above, in that they affect the survey’s design and methodology, its respondent burden, or both. Because the ACS has a large sample size and a complex methodology, changing its design and methodology involves logistical challenges. Additionally, any changes that affect the survey’s respondent burden also have limited potential, as there are already concerns about the burden that the ACS places on respondents. Uses with more-limited potential are as follows:
• Adding or modifying ACS content: Adding a question to the ACS or modifying existing questions can improve the efficiency of other surveys, though doing so involves trade-offs with factors such as respondent burden. This use of the ACS could provide information that would inform the design of other surveys or facilitate the use of ACS data for another survey’s sample frame. For example, NCSES worked with the Census Bureau to add a question to the ACS about the field in which respondents earned their bachelor’s degrees in order to identify respondents who are in the target population for the National Survey of College Graduates. Despite the potential benefits of adding or modifying ACS content, adding a question to the ACS would increase respondent burden and have operational impacts, as it requires the Census Bureau to change the questionnaire design and its processing and editing systems. If these actions result in additional pages for the questionnaire, they could affect costs and the response rate.
Modifying questions poses an additional challenge because ACS estimates reflect multiple years of data, and a change in a question may affect the Census Bureau’s ability to cumulate data.
• Adding supplements to the ACS: Another possible use of the ACS by other surveys is adding supplements to it, though this use faces several obstacles. While the ACS currently does not include supplements, adding them could enable other surveys to leverage the resources of the ACS. Other surveys, such as the Current Population Survey, allow other agencies or entities to sponsor supplemental surveys that are added on to the survey’s core set of questions. According to officials at BLS, which sponsors the Current Population Survey, in their experience it costs less to add a supplement to an existing survey than to conduct a separate stand-alone survey. Additionally, the agencies sponsoring the supplements gain the benefit of the experience of BLS or Census staff, or both, in designing and implementing surveys. Although the Current Population Survey successfully incorporates supplements, the ACS is different in several key ways, and adding supplements to the ACS would involve significant challenges. For example, the ACS is mandatory, meaning that responses are required by law. Census Bureau officials told us that, assuming a supplement to the ACS would be voluntary, they would have to determine how to distinguish between the mandatory and voluntary sections, which would create complexity. Additionally, the Census Bureau processes ACS data on a yearly basis and does not have a process in place for producing estimates from a single month’s data, which would be a challenge if the supplement were administered along with only 1 month’s ACS mailout. Finally, including supplements raises concerns about respondent burden and respondent fatigue. 47 BLS officials noted the potential of matrix sampling, in which a set of additional questions, as in a supplement, is added to one month of collection (or to all months) but, unlike a supplement, is administered only to a subset of the survey sample in a given collection period. This option could reduce burden and increase efficiency; however, it involves logistical considerations in administering the survey and processing the data, and adds complexity for analysts using the data for research.
• Creating sample frames: Using ACS data to develop sample frames for follow-on surveys has been identified as a potential use of ACS data, but several factors limit this use. 48 This use involves drawing on ACS data to identify ACS respondents with certain characteristics for potential inclusion in a follow-on survey, and it requires the approval of the Census Bureau and OMB. 49 At present, only NCSES uses ACS data for this purpose. NCSES officials told us that using ACS data to create a sample frame, as opposed to the census long-form data they used previously, has improved the agency’s coverage of its target population and has reduced costs and respondent burden. 50 Another benefit of this use is expanded analysis: agencies, under appropriate Title 13 restrictions, can analyze respondents’ answers to the ACS along with their responses to the follow-on surveys, and can use ACS data to compare the characteristics of those who do and do not respond to a follow-on survey to determine whether nonrespondents differ in ways that might bias the survey’s results. Surveys such as the National Survey of College Graduates that focus on populations that are costly to identify are likely to realize larger efficiency gains from using ACS data for this purpose. Despite these benefits, opportunities for other surveys to use ACS data for this purpose are limited. The ACS sample, although large compared to most surveys, can be too small for another survey to use as a sampling frame. This is especially an issue if a survey targets a rare population or targets members similar to those of surveys already drawing their frames from the ACS, because the chance of drawing the same individuals into both follow-on surveys would be too high, which current policy does not allow. Census Bureau policy states that, when agencies conduct follow-on surveys, they may not contact any member of a household that has already responded to the ACS and also had a member selected for a follow-on survey. With certain ACS households excluded from potential selection, it becomes more difficult for other surveys to draw samples, because the remaining data no longer reflect respondents with certain characteristics.

47 Respondent fatigue occurs when respondents become tired of being surveyed and become more prone to refusal, or the quality of their responses deteriorates.

48 A follow-on survey is one that is sent to ACS respondents after they have completed the ACS. Census Bureau policy prohibits sending an ACS respondent a follow-on survey within 6 months of his or her ACS interview.

49 Using the ACS for sample frame development is more intensive than using ACS estimates to inform the design of a survey’s sample frame, which does not involve contacting ACS respondents again. In determining whether a survey can use ACS data for its sample frame, the Census Bureau’s and OMB’s policy is to give priority to surveys that meet certain criteria, including those that could substantially reduce costs by doing so and those that produce estimates for populations that would otherwise have prohibitively expensive screening costs.
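The two Census Bureau rules described above, the 6-month waiting period after an ACS interview and the exclusion of any household that already had a member selected for a follow-on survey, can be expressed as a filter on a candidate pool. The sketch below is hypothetical: the record layout, the field names, random selection, and the treatment of the 6-month boundary (eligible on or after the same calendar day 6 months later) are all assumptions, not Census Bureau procedures. It illustrates why each survey that draws from the ACS shrinks the pool available to the next one.

```python
import calendar
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class AcsHousehold:
    household_id: str
    interview_date: date
    in_target_population: bool            # hypothetical flag derived from ACS answers
    selected_for_follow_on: bool = False  # a member was already picked for a follow-on survey


def add_months(d: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the last day of that month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def eligible(households: list[AcsHousehold], as_of: date, wait_months: int = 6) -> list[AcsHousehold]:
    """Households in the target population that have never been selected for a
    follow-on survey and whose ACS interview is at least `wait_months` old."""
    return [
        h for h in households
        if h.in_target_population
        and not h.selected_for_follow_on
        and as_of >= add_months(h.interview_date, wait_months)
    ]


def draw_follow_on_sample(households: list[AcsHousehold], as_of: date, n: int, seed: int = 0) -> list[AcsHousehold]:
    """Randomly draw up to n eligible households and mark them as selected."""
    pool = eligible(households, as_of)
    chosen = random.Random(seed).sample(pool, min(n, len(pool)))
    for h in chosen:
        h.selected_for_follow_on = True  # excluded from every later follow-on draw
    return chosen


pool = [
    AcsHousehold("H1", date(2011, 1, 15), in_target_population=True),
    AcsHousehold("H2", date(2011, 3, 1), in_target_population=True),
    AcsHousehold("H3", date(2011, 1, 10), in_target_population=True, selected_for_follow_on=True),
    AcsHousehold("H4", date(2011, 1, 5), in_target_population=False),
]
as_of = date(2011, 7, 15)
first = draw_follow_on_sample(pool, as_of, n=5)
second = draw_follow_on_sample(pool, as_of, n=5)
print([h.household_id for h in first], [h.household_id for h in second])
```

In this toy pool, only H1 survives the filter for the first survey (H2 is still inside its waiting period, H3 was already selected, and H4 is outside the target population), and once H1 is drawn, a second survey drawing on the same date finds no one left.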
In the long run, more-intensive uses of ACS data and resources may require difficult decisions and entail trade-offs with factors such as cost and respondent burden. Further, they risk affecting ACS response rates and overall data quality. However, a redesign of the scope and methodology of the ACS might overcome some of these constraints. After the release of the survey’s first 5-year data products in 2010, the Census Bureau and others began evaluating the survey and exploring options for increased uses. In addition to the Census Bureau’s own evaluation of the ACS, the National Academy of Sciences, at the Census Bureau’s request, is organizing workshops with data users to assess the survey. Also, OMB, in cooperation with the Census Bureau, created an ACS subcommittee of the ICSP with the goal of investigating the trade-offs of options such as adding questions to the ACS and rotating questions in and out of the survey. If the Census Bureau changes the survey’s design or methodology, these more-intensive uses may become more feasible.

50 The census long form did not include a question that asked for the field in which respondents received their bachelor’s degree. NCSES used long-form data to identify respondents who had characteristics that made them likely to be in the survey’s target population, but it had to screen a larger sample in order to identify those who in fact belonged to the target population.

Conclusions

To ensure the provision of high-quality, timely statistical data for public- and private-sector users, OMB and the agencies that make up the federal statistical system must continue to identify opportunities for efficiency in federal surveys of households and individuals.
Most of the surveys and other information collections in our scope have relatively modest costs, but challenges such as declining survey response rates will strain available resources unless agencies find more-effective and less-costly ways to collect and analyze the needed information while maintaining critical protections of respondents’ privacy and confidentiality. In the long term, addressing the key challenges and constraints that agencies have identified will necessitate broader public debates and policy decisions about balancing trade-offs among competing values, such as quality, cost, timeliness, privacy, and confidentiality. In the short term, our review indicated that two promising avenues to sustain the progress that OMB and agencies are making are (1) facilitating collaboration and coordination among agencies and (2) combining existing data from both survey and administrative sources. The federal statistical system already exhibits many collaborative traits and practices, in particular through projects sponsored by OMB and interagency committees that facilitate coordination and the development of new policies and tools. However, additional steps could make these efforts more effective. Going forward, it will be important for OMB to supplement existing guidance to clarify the range of options available to address PRA standards. Supplementing the guidance could increase agencies’ awareness of these options, in particular those that were cited less frequently. At the same time, interagency committees could do more to improve the accessibility and timeliness of their work products. Doing so could maximize the usefulness of committees’ work. Additionally, OMB’s ability to oversee and coordinate information collections across the government would benefit from additional steps to ensure the reliability of data on collections’ costs and burdens.
Doing so would also benefit users of the information, whether they access it through the website or through OMB reports.

Agencies identified multiple ways that combining survey and administrative data can improve the efficiency and quality of their work, and they are already pursuing such opportunities. Importantly, they have demonstrated that using existing datasets to supplement each other can add value for all agencies involved in data sharing. But agencies also face serious constraints to expanded uses of existing data. One of the more-significant barriers is the complexity of the process through which they make decisions about sharing data. Though agencies and interagency committees are working to create tools to facilitate parts of this process, more-comprehensive and centralized guidance for agencies to follow when negotiating and making decisions regarding data-sharing opportunities could help facilitate the process. A standard protocol or framework could accelerate progress in this area by helping agencies to (1) evaluate the growing array of administrative data to identify the datasets that have the greatest potential for mutual benefit of the participating agencies, and (2) consider a common set of criteria and key questions when weighing the pros and cons of sharing data. A key benefit would be to encourage agencies to consider, in a uniform manner, all relevant aspects of these decisions, such as whether proposed uses would be consistent with applicable law, maintain confidentiality protections, be cost-effective, and serve to increase the broader efficiency of the federal statistical system.
Recommendations for Executive Action

In order to maintain progress in maximizing the efficiency of existing data sources, we recommend that the Director of OMB, in consultation with the Chief Statistician, work with the ICSP to take the following four actions:

To improve the broader efficiency of the federal statistical system and improve communication among agencies and others,
• when OMB next updates guidance on agency survey and statistical information collection and dissemination methods, include additional details on actions agencies can take to meet requirements to identify duplication, to consult with persons outside of the agency, and to address other requirements as appropriate; and
• create new methods or enhance existing methods to improve the dissemination of information and resources produced by interagency statistical committees. For example, such enhancements could include increasing the timeliness and availability of information on websites to better capture the full range of products and identify committee priorities.

To increase the reliability of the information presented on the Reginfo.gov website and in OMB’s internal system,
• implement quality-control procedures designed to identify and remedy any differences between cost and burden information provided on the website and in the related supporting statement documentation that underlies this information.

To accelerate progress in sharing administrative data for statistical purposes, where appropriate,
• develop comprehensive guidance for both statistical agencies and agencies that hold administrative data to use when evaluating and negotiating data sharing; such guidance should include key questions focused on issues such as statutory authority, confidentiality, cost, and usefulness in order to ensure that agencies consider all relevant factors and the broader interest of the federal government.
Agency Comments and Our Evaluation

We provided a draft of this report to the Secretaries of Commerce and Health and Human Services, the Director of OMB, the Commissioner of BLS, the Administrator of ERS, and the Director of the National Science Foundation for their review and comment. We received written comments on the draft report from the Secretary of Commerce that are reprinted in appendix V. We also received comments from OMB staff that are summarized below. The Department of Health and Human Services, BLS, the National Science Foundation, OMB, and agencies on the ICSP also provided technical comments and suggestions that we incorporated as appropriate.

Commerce stated that our observations illuminate future opportunities for using administrative records within the federal statistical system to increase efficiency and better meet informational needs, and that our suggested actions would enhance the ability of statistical agencies to realize these opportunities. Regarding our recommendation on standard protocols and procedures to facilitate data sharing, the department noted that policies and other initiatives can also play a role in achieving cooperation. Finally, the department noted that our report’s acknowledgment of related concerns about the quality of administrative data, and of the level of support and resources necessary to maintain a statistical and administrative data infrastructure, underscores the importance of our recommendations.

OMB generally agreed with our recommendations and said that it hopes to pursue them in the future. More specifically,
• OMB agreed that it is worth considering good practices for reducing duplication. As we suggested, OMB indicated that when its survey guidance is next updated, it will include additional details and examples of actions agencies can take to identify duplication and consult with persons outside the agency.
• OMB said that it shared our concerns about timely and easily accessible dissemination of information resources produced by interagency statistical committees, and that our recommendation underscores the need for addressing this issue. • With respect to our recommendation that OMB implement quality- control procedures designed to identify and remedy any differences between cost and burden information provided on Reginfo.gov and in the related supporting statement documentation that underlies this information, OMB noted that PRA requires OMB to weigh the burdens imposed on the public by information collections against the legitimate needs of the federal agencies. OMB said that this requires a careful assessment of the estimates of paperwork burden that agencies provide to OMB as part of their information collection requests and, further, that these estimates are subject to public scrutiny and comment in Federal Register notices, in the PRA statements provided on information collections, and on Reginfo.gov. OMB pointed out that, because the burden estimates provided on Reginfo.gov and in the underlying supporting statements are all made public, discrepancies such as those found by us are public as well. OMB said that it will investigate and address any such discrepancies that are brought to its attention by GAO or any member of the public. • Finally, OMB concurred that administrative records can be a valuable supplement to, though usually not a replacement for, household surveys. OMB believes that our recommendation to develop comprehensive guidance for statistical and administrative agencies to use when evaluating and negotiating data-sharing agreements would be constructive, but cautioned that this involves a very complex set of issues and said it will take some time to develop such guidance. As agreed with your office, unless you publicly announce the contents of this report earlier, we plan no further distribution until 30 days from the report date. 
At that time, we will send copies to the Commissioner of the Bureau of Labor Statistics (BLS), the Director of the U.S. Census Bureau, the Administrator of the Economic Research Service (ERS), the Secretary of Health and Human Services, the Director of the National Science Foundation, the Director of OMB, the Secretary of Commerce, and the Under Secretary of Economic Affairs. In addition, the report will be available at no charge on the GAO website at http://www.gao.gov.

If you or your staff have any questions concerning this report, please contact Robert Goldenkoff at (202) 512-2757 or firstname.lastname@example.org, or Ronald S. Fecso at (202) 512-7791 or email@example.com. Contact points for our Offices of Congressional Relations and Public Affairs may be found on the last page of this report. Key contributors are listed in appendix VI.

Sincerely yours,

Robert Goldenkoff
Director, Strategic Issues

Ronald S. Fecso
Chief Statistician

Appendix I: Scope and Methodology

The objectives of this report were to (1) review the ways in which the Office of Management and Budget (OMB) and agencies identify opportunities for improvement and increased efficiency of selected information collections; (2) evaluate opportunities and constraints for the statistical agencies to use administrative data in conjunction with selected surveys; and (3) evaluate ways in which American Community Survey (ACS) data and resources can be used in selected surveys, and the associated benefits and constraints. To achieve our objectives, we focused on statistical information collections administered to households and individuals and subject to the Paperwork Reduction Act (PRA), which requires OMB approval of certain federal data collections.
Although in many cases the information and views provided by agencies during our review, and our general findings, may also apply to statistical information collections outside of our scope, such as those administered to businesses, all of the specific collections and surveys we reviewed were administered to households and individuals. The majority of the collections within our scope include a survey, though some also include other methods of information collection, such as focus groups.

To examine the issues related to our objectives, we performed case studies of five federal surveys: the Consumer Expenditure Surveys, sponsored by the Bureau of Labor Statistics; the National Health and Nutrition Examination Survey and the National Health Interview Survey, both sponsored by the National Center for Health Statistics, part of the Centers for Disease Control and Prevention; the National Survey of College Graduates, sponsored by the National Center for Science and Engineering Statistics, part of the National Science Foundation; and the Survey of Income and Program Participation, sponsored by the Census Bureau. We selected these surveys based on several factors, such as their size and cost and whether they use or have the potential to use administrative data or ACS data.

For the first objective, to review the ways in which OMB and agencies identify opportunities for improvement and increased efficiency of selected statistical information collections, we examined the PRA, OMB guidance to agencies, and prior GAO work on the federal statistical system. 1 We interviewed officials at OMB and the four agencies that administer the case-study surveys to learn about coordination among agencies, the efforts agencies make to identify opportunities for improvement, and OMB’s role. We also interviewed officials at the Department of Agriculture’s Economic Research Service, which is a member of several interagency statistical committees and the lead agency for the Statistical Community of Practice and Engagement. In addition, we interviewed experts on the federal statistical system to learn about their perspectives on the efficiency of the federal statistical system and on agency and OMB coordination. In evaluating OMB, agency, and interagency actions, we used as criteria the requirements of the PRA and practices identified in prior GAO work on agency collaboration. 2

1 GAO, Federal Information Collection: A Reexamination of the Portfolio of Major Federal Household Surveys Is Needed, GAO-07-62 (Washington, D.C.: Nov. 15, 2006).

To address the second objective, to evaluate opportunities and constraints for agencies to use administrative data in conjunction with selected surveys, we reviewed statutes that govern the sharing and use of administrative data, documentation from case-study surveys, and various papers and reports. We interviewed officials at OMB and experts in the field of federal statistics to learn about their perspectives on the current and potential uses of administrative data. We also interviewed officials at the Economic Research Service and the agencies that sponsor the case-study surveys to learn about ways in which their surveys use or could potentially use administrative data. For this objective and the third, we used OMB guidance, relevant statutes, and prior GAO work as criteria in our evaluation.

For the third objective, to evaluate the ways in which ACS data and resources can be used in selected surveys, we reviewed Census Bureau documentation, National Science Foundation reports, prior GAO work, and reports issued by the Committee on National Statistics.
We interviewed officials at the Census Bureau, which sponsors the ACS, and at OMB to learn about their perspectives on potential uses of the survey and its data. We also interviewed officials at the Economic Research Service and the agencies that administer the case-study surveys to learn about ways in which their surveys use or could potentially use ACS data and resources, and experts in the field of federal statistics to learn about their assessment of the uses and potential uses of the ACS. 2 GAO, Results-Oriented Government: Practices That Can Help Enhance and Sustain Collaboration among Federal Agencies, GAO-06-15 (Washington, D.C.: Oct. 21, 2005). Page 40 GAO-12-54 Federal Statistical System Appendix I: Scope and Methodology To gain a broader perspective on the information collections in our scope and to inform our work across all three objectives, we obtained and analyzed publicly-available data from Reginfo.gov, a government website that provides access to information on agency requests for OMB approval of information collections. We used the website’s search feature to download all of the collections that were classified as (1) active, meaning that they are currently approved by OMB for use by agencies; (2) employing statistical methods; and (3) directed to households and individuals. We downloaded data on all information collections that met these criteria from Reginfo.gov on two dates, May 17, 2011, and September 22, 2011. We performed more in-depth analyses of the 507 information collections in our May 17, 2011, download. First, we reviewed the supporting statements for each of these collections, and on the basis of information in these documents classified them according to the subject matter on which they focus. 3 Next, we grouped the collections into categories, based on information on the sponsoring agency in Reginfo.gov and the supporting statements. 
3 Agencies include supporting statements with each request for approval of an information collection. These statements must follow a prescribed format and include specified information, such as the circumstances that make the collection necessary and how, by whom, and for what purpose the information will be used.

Depending on the sponsoring agency, we put the collections into one of four categories: (1) those that are sponsored by one of the 13 principal statistical agencies; (2) those that are sponsored by another agency that shares a parent agency with one of the 13 principal statistical agencies (for example, agencies in the Department of Health and Human Services would fall into this category because it is the parent agency of the National Center for Health Statistics); (3) those whose sponsoring agency is not a principal statistical agency and does not share a parent agency with one; and (4) unknown, for those whose sponsoring agency we could not determine based on the available information.

We also used the information in the supporting statements to determine whether the collections included a survey component and found that 481 of the 507 did. We divided the 481 collections that included a survey component into three strata that reflect the type of sponsoring agency. Of the 481, we were not able to determine the agency type for 7 collections, so we dropped these records, leaving a population of 474 statistical information collections. The number of collections by stratum is shown in table 3. In order to estimate the prevalence of certain characteristics in this population (for example, the percentage of information collections for which the sponsoring agency reported steps taken to identify potential duplication), we drew a stratified sample of 106 collections. Within each stratum, we estimated the sample size required to yield a 95 percent confidence interval of plus or minus 14 percentage points around such an estimate.
For the overall population of 474, the approximate precision for an estimated percentage of 50 percent is plus or minus 8.4 percentage points, at the 95 percent level of confidence.

Table 3: Number of Collections, by Stratum

Stratum | Stratum population | Sample size
Principal statistical agency | 60 | 27
Nonstatistical agency | 139 | 37
Nonstatistical agency that shares a parent agency with a statistical agency | 275 | 42
Total | 474 | 106

Source: GAO analysis of OMB data.

We reviewed the supporting statements of each of the information collections in our sample of 106, focusing on agencies’ reported efforts to identify duplication and to consult with persons outside the agency to obtain their views. Because agencies follow a standard format in preparing supporting statements, we focused our analysis on the sections of the supporting statements in which OMB instructs agencies to include this information (sections 4 and 8, respectively, of section A of the supporting statement). To review agencies’ reported actions, we used a data-collection instrument that contained a series of “yes-no” questions about the types of efforts reported. For example, we reviewed whether agencies had reported considering administrative data as a potential source of duplication, and whether, when describing consultations outside of the agency, agencies reported that they had consulted with other agencies. We did not evaluate whether agencies actually took the actions they reported taking.

Estimates produced from the sample of the collections are subject to sampling error. We express our confidence in the precision of our results as a 95 percent confidence interval. This is the interval that would contain the actual population value for 95 percent of the samples we could have drawn. As a result, we are 95 percent confident that each of the confidence intervals in this report will include the true values in the study population.
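The precision figures given above for the strata and the overall population can be checked with a short calculation. The report does not state which variance formula it used, so the Python sketch below rests on stated assumptions. It uses the normal approximation for a proportion at the conservative value p = 0.5 and applies a finite population correction. For the overall figure, it treats the 106 sampled collections as if they were a simple random sample from all 474. Under those assumptions it gives roughly plus or minus 14 percentage points within each stratum and plus or minus 8.4 points overall. For comparison, it also computes a design-weighted stratified half-width. That value comes out somewhat wider (about 9.2 points) because the sample is not allocated in proportion to stratum size. The function names are illustrative only.

```python
import math

Z_95 = 1.96  # two-sided 95 percent critical value of the standard normal


def srs_margin(n: int, N: int, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width of a normal-approximation confidence interval for a
    proportion estimated from a simple random sample of n units drawn
    without replacement from N units (finite population correction applied)."""
    fpc = 1 - n / N
    return z * math.sqrt(fpc * p * (1 - p) / n)


def stratified_margin(strata, p: float = 0.5, z: float = Z_95) -> float:
    """Half-width for a design-weighted stratified estimate, assuming the same
    proportion p in every stratum. `strata` is a list of (N_h, n_h) pairs."""
    N = sum(N_h for N_h, _ in strata)
    variance = sum(
        (N_h / N) ** 2 * (1 - n_h / N_h) * p * (1 - p) / n_h
        for N_h, n_h in strata
    )
    return z * math.sqrt(variance)


# (stratum population, sample size) pairs taken from table 3
TABLE_3 = {
    "Principal statistical agency": (60, 27),
    "Nonstatistical agency": (139, 37),
    "Nonstatistical agency that shares a parent agency": (275, 42),
}

if __name__ == "__main__":
    for name, (N_h, n_h) in TABLE_3.items():
        print(f"{name}: +/- {100 * srs_margin(n_h, N_h):.1f} points")
    N = sum(N_h for N_h, _ in TABLE_3.values())
    n = sum(n_h for _, n_h in TABLE_3.values())
    print(f"Overall, simple-random-sample approximation: +/- {100 * srs_margin(n, N):.1f} points")
    print(f"Overall, stratified weights: +/- {100 * stratified_margin(list(TABLE_3.values())):.1f} points")
```

With these inputs the per-stratum half-widths print as 14.0, 13.8, and 13.9 points. The two overall half-widths print as 8.4 and 9.2 points.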
We took several steps to evaluate the reliability of the data we accessed through the Reginfo.gov website. We interviewed OMB officials and reviewed documentation of the Reginfo.gov website and of the Regulatory Information Service Center and OIRA Consolidated Information System,4 which is the system that agencies use to track information collection requests and that underlies the information provided on Reginfo.gov. As part of our review of the subject matter of the collections in the May 17, 2011, download, we confirmed that the collections were within our scope. We also used the information in the September 22, 2011, download to evaluate the reliability of data on collections’ cost and annual burden. To do this, we drew a systematic random sample of 56 (approximately 10 percent) of the 555 collections in the download. We found a number of inconsistencies between the cost and burden information available on the website and that provided in supporting statement documentation. According to an OMB official, the two sources should match, but the supporting statement documentation is more accurate than the website. On the basis of our assessment, we determined that the information from the website was not sufficiently reliable for the purpose of describing the annual cost or annual burden to respondents of the collections in our scope. However, through this review and the other steps we took, we found that the other information provided on the Reginfo.gov site was sufficiently reliable for our other intended purposes. These were identifying the collections within our scope and obtaining information on their subject matter and on the actions agencies reported taking to identify unnecessary duplication and solicit input from outside persons and entities.
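The report does not describe how its systematic random samples were drawn. One common approach is sketched below in Python, as an assumption rather than as GAO’s actual procedure. It sets an interval of k = N / n (about 9.9 for 56 of 555 collections), picks a random start within the first interval, and then takes every k-th record. The function name systematic_sample and the hypothetical collection identifiers are illustrative. The follow-up draw also rests on an assumption: that the second sample of 56 was drawn from the collections not already selected.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(population: Sequence[T], n: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Draw n items by systematic sampling with a fractional interval.

    The interval is k = N / n; a random start s is drawn from [0, k) and the
    items at positions floor(s + i * k), i = 0..n-1, are selected. Because
    k >= 1, consecutive positions differ by at least 1, so they are distinct
    and increasing.
    """
    if rng is None:
        rng = random.Random()
    N = len(population)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and len(population)")
    k = N / n
    start = rng.random() * k
    # min() guards against floating-point rounding pushing an index to N
    return [population[min(int(start + i * k), N - 1)] for i in range(n)]


if __name__ == "__main__":
    collections = [f"collection-{i:03d}" for i in range(555)]  # hypothetical IDs
    rng = random.Random(2011)
    first = systematic_sample(collections, 56, rng)
    # Assumption: the follow-up sample excludes collections already chosen.
    chosen = set(first)
    remaining = [c for c in collections if c not in chosen]
    second = systematic_sample(remaining, 56, rng)
    total = len(first) + len(second)
    print(total, f"{100 * total / len(collections):.1f}%")
```

Run as a script, the usage example prints 112 and 20.2%. That matches the report’s description of 112 collections, or approximately 20 percent of the 555.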
Because the cost information on the Reginfo.gov website was not sufficiently reliable, we used cost information from the supporting statements of the collections in our sample to provide background information on the costs of the collections in our scope. In addition to using information from the supporting statements in our initial sample of 56 collections, we drew another systematic random sample of 56 additional collections from the September download. In total, we obtained cost information from the supporting statements of 112 (approximately 20 percent) of the 555 collections in our scope that were active as of our September 22, 2011, download.

4 OIRA is the Office of Information and Regulatory Affairs within OMB.

We conducted this performance audit from December 2010 until February 2012 in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objectives. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objectives.

Appendix II: Description of Case-Study Surveys

Consumer Expenditure (CE) Surveys: Quarterly Interview Survey (CEQ) and Diary Survey (CED)

Purpose: To collect information on households’ expenditures and characteristics
Sponsoring agency: Bureau of Labor Statistics (BLS)
Annual sample size (estimated): CEQ: 14,725 households;1 CED: 12,075 households2
Annual cost to the federal government (estimated): $41.8 million3
Annual burden hours (estimated): CEQ: 36,033 hours;4 CED: 33,721 hours5
Target population: Nationally-representative sample of the U.S. population6
Uses of data: According to BLS documentation, the most important use of the CE Surveys is to provide expenditure data for updating the Consumer Price Index, the most widely used measure of inflation.7 In addition, government agencies, private companies, policymakers, and researchers use data from the CE Surveys in a variety of ways. For example, the Department of Defense uses data from the CE Surveys to update cost-of-living adjustments for military families. Congressional committees also use the data to inform decision making, for example, on the potential effect of increases in the minimum wage.

1 BLS estimates that 8,825 of the 14,725 households surveyed per quarter will complete the interviews. As a result, over the course of a year, BLS estimates that there will be 35,300 completed interviews.
2 BLS estimates that 7,050 of the 12,075 households that receive the CED will complete the interview and diaries. Because each household completes two weekly diaries, BLS estimates that households will complete 14,100 diaries per year.
3 This amount reflects the approximate fiscal year 2010 cost of collecting, processing, reviewing, and publishing data collected through the CE Surveys. Survey costs vary somewhat from year to year.
4 BLS estimates that respondents will take an average of 60 minutes to complete one interview survey of the CEQ. Since the CEQ is administered to the same sample of households four times in a year, the annual burden for respondents who complete all four surveys is roughly 4 hours. In addition, a certain number of respondents who complete the interview surveys are reinterviewed, a process that adds 10 minutes to these selected respondents’ burden times.
5 BLS estimates that respondents will take approximately 105 minutes to complete one diary survey of the CED. In addition to the diary survey, respondents complete three interviews, each of which takes 25 minutes. Lastly, a certain number of respondents are reinterviewed, a process that adds 10 minutes to these selected respondents’ burden times.
6 The CE Surveys are limited to the U.S. civilian, noninstitutionalized population, and as a result exclude certain segments of the population, such as active-duty military members living on bases and prisoners.
7 The Consumer Price Index produces monthly data on changes in the prices paid by urban consumers for a representative basket of goods and services.

National Health and Nutrition Examination Survey (NHANES)

Purpose: To assess the health and nutritional status of adults and children in the United States
Sponsoring agency: National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention
Annual sample size (estimated): 5,180 individuals8
Annual cost to the federal government (estimated): $37.8 million9
Annual burden hours (estimated): 49,626 hours10
Target population: Nationally-representative sample of individuals of all ages11
Uses of data: According to NCHS documentation, a variety of users, including federal agencies, research organizations, universities, health-care providers, and educators, use NHANES data. For example, the Food and Drug Administration uses NHANES data to determine whether changes are needed to federal regulations. In addition, NHANES data inform key decision making. For example, according to NCHS documentation, NHANES data on lead levels in blood were instrumental in developing the policy to eliminate lead from gasoline and from food and soft drink cans. As part of its broader data-linkage program, NCHS links NHANES survey data to multiple administrative datasets, such as the National Death Index (a centralized index of state death record information) and Medicare and Medicaid claims from the Centers for Medicare and Medicaid Services. The National Death Index linkages give researchers an opportunity to analyze mortality differences among subgroups defined using the survey information. Similarly, the Medicare and Medicaid claim linkages provide an opportunity to examine health conditions, utilization, and costs among subgroups defined using the survey information. Additionally, according to NCHS officials, NCHS is currently conducting a pilot study to link NHANES data on participants from Texas to administrative data on food assistance.

8 Although a larger pool of respondents participates in a screener survey, NCHS estimates that 5,180 respondents participate in the screener, household interview, and physical examination.
9 NCHS estimates that the annual cost to the federal government of NHANES for fiscal year 2010 was $37.8 million, including both direct and reimbursable funding provided by other agencies for NCHS statistical services. Survey costs vary somewhat from year to year.
10 NCHS estimates that the total annual burden for the NHANES is 37,626 hours, including screening, household interviews, physical examinations, and any follow-up interviews. Tests of procedures and special studies account for another 12,000 hours, for a total annual burden of 49,626 hours. NCHS estimates that respondents who participate in all aspects of the NHANES, including the screener survey, household interview, and physical examination, can expect a burden of 6.7 hours. In addition to those who complete all aspects of the NHANES, some respondents may only participate in the screener survey and be screened out of the sample, while other respondents may participate in the screener survey and the household interview but not the physical examination. NCHS includes all respondents at these varying levels of participation in its calculation of the annual burden hours.
11 The NHANES is limited to the U.S. civilian, noninstitutionalized population, and as a result excludes certain segments of the population, such as active-duty military members living on bases and prisoners.

National Health Interview Survey (NHIS)

Purpose: To monitor the health of the U.S. population
Sponsoring agency: National Center for Health Statistics (NCHS), Centers for Disease Control and Prevention
Annual sample size (estimated): 35,000 households12
Annual cost to the federal government (estimated): $32.2 million13
Annual burden hours (estimated): 34,977 hours14
Target population: Nationally-representative sample of households, collecting data on all members of each household15
Uses of data: According to NCHS documentation, government agencies, policymakers, researchers, and academics use NHIS data for a variety of purposes, such as identifying health problems and evaluating health programs. For example, policymakers used NHIS data to shape the Centers for Disease Control and Prevention’s cervical-cancer screening policy. In addition, other agencies can use the NHIS as a sample frame for their surveys. Lastly, as part of its broader data-linkage program, NCHS links NHIS survey data to multiple administrative datasets, including those it uses for linkages with NHANES data, such as the National Death Index and Medicaid and Medicare claims.

12 NCHS estimates that the annual sample size in 2011 is 35,000 households, and that 87,500 individuals will participate in the survey. NCHS plans to increase the sample size in the future.
13 NCHS estimates that the annual cost to the federal government of NHIS for fiscal year 2010 was $32.2 million, including both direct and reimbursable funding provided by other agencies for NCHS statistical services. Survey costs vary somewhat from year to year.
14 NCHS estimates that the total annual burden of the NHIS was 34,977 hours for 2010 and 2011. NCHS estimates that a single respondent who completes all portions of the NHIS for a household can expect a time burden of one hour. Some respondents who complete all portions of the NHIS are asked to take a short reinterview survey, a process that adds 5 minutes to these selected respondents’ burden times. In addition to those who complete all portions of the NHIS, some respondents may only participate in a screener survey and be screened out of the sample, which NCHS estimates takes 5 minutes per respondent. NCHS includes all respondents at these varying levels of participation in its calculation of the annual burden hours.
15 The NHIS is limited to the U.S. civilian, noninstitutionalized population, and as a result excludes certain segments of the population, such as active-duty military members living on bases and prisoners.

National Survey of College Graduates (NSCG)

Purpose: To provide information on the U.S. stock of scientists and engineers
Sponsoring agency: National Center for Science and Engineering Statistics, National Science Foundation
Sample size per survey administration (estimated): 100,000 individuals
Cost to the federal government (estimated): $13.3 million16
Burden hours per administration (estimated): 34,792 hours17
Target population: Individuals in the United States who have a bachelor’s degree in science, engineering, or health, and those who have a degree in another field but work in a science, engineering, or health occupation
Uses of data: According to National Science Foundation documentation, information from the NSCG is used by researchers and policymakers. Government agencies use the data to assess available scientific and engineering resources and inform the development of related policies. Additionally, educational institutions use NSCG data to inform the establishment and modification of curricula, and businesses use the data to develop recruitment and compensation policies.

16 Survey costs vary somewhat from year to year. $13.3 million is the expected cost for the survey in 2012.
17 This estimate is based on the assumption that 83,500 individuals will respond to the survey and each respondent will take 25 minutes to complete it.

Survey of Income and Program Participation (SIPP)

Purpose: To provide information about the status and principal determinants of individuals’ and households’ income and participation in government programs such as Social Security and Medicaid
Sponsoring agency: Census Bureau
Annual sample size (estimated): 45,000 households18
Annual cost to the federal government (estimated): $50 million19
Annual burden hours (estimated): 143,303 hours20
Target population: Nationally-representative sample of households.21 All household members 15 years old or over are interviewed for the survey.
Uses of data: According to the Census Bureau, SIPP data are used by agencies such as the Department of Health and Human Services and the Department of Agriculture, as well as economic policymakers, Congress, and state and local governments, to plan and evaluate government social-welfare and transfer-payment programs.

18 The Census Bureau estimates that of the 65,300 households in its sample, approximately 52,900 are occupied at the time of interview and approximately 45,000 households are interviewed. It estimates that each household interview yields 2.1 individual interviews, for a total of 94,500 individual interviews per survey administration. The Census Bureau administered the SIPP three times to the same households in fiscal year 2011, and estimates that each administration generated 94,500 interviews, for a total of 283,500 in the fiscal year.
19 The Census Bureau estimates that the production cost for all parts of the SIPP in fiscal year 2011 is $50.1 million. Survey costs vary somewhat from year to year.
20 The bureau estimates the total burden to respondents in fiscal year 2011 as 143,303 hours, which includes the time it takes respondents to fill out the core and topical module sections, as well as the reinterview of selected respondents. This estimate is based on the assumption that most respondents take 30 minutes to complete one administration of the survey. Since the SIPP was administered to the same sample of households three times in fiscal year 2011, the annual burden for most respondents was 90 minutes.
21 The SIPP is limited to the U.S. civilian, noninstitutionalized population, and as a result excludes certain segments of the population, such as active-duty military members living on bases and prisoners.

Appendix III: Selected Statutes Related to Information Collection

Selected statutes that regulate the collection and dissemination of information include the following.

Governmentwide Statutes

• The Information Quality Act of 2000 requires, among other things, that the Office of Management and Budget (OMB) develop and issue guidelines that provide policy and procedural guidance for federal agencies for ensuring and maximizing the quality of the information they disseminate.
These guidelines include steps designed to assure objectivity and utility of disseminated information. See 44 U.S.C. § 3504(d)(1); OMB guidelines are at http://www.whitehouse.gov/omb/info_quality_iqg_oct2002/.
• The Privacy Act of 1974, as amended, and the privacy provisions of the E-Government Act of 2002 specify requirements for the protection of personal privacy by federal agencies. The Privacy Act places limitations on agencies’ collection, disclosure, and use of personal information maintained in systems of records. See 5 U.S.C. §§ 552a and 552a note. The E-Government Act requires agencies to conduct privacy impact assessments that analyze how personal information is collected, stored, shared, and managed in a federal system. See 44 U.S.C. § 3501 note.
• The Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002 focuses on confidentiality protection and data sharing. It requires that information acquired by an agency under a pledge of confidentiality and for exclusively statistical purposes be used by the agency only for such purposes and not be disclosed in identifiable form for any other use, except with the informed consent of the respondent. It also authorizes identifiable business records to be shared for statistical purposes among the Bureau of Economic Analysis, Bureau of Labor Statistics, and the Census Bureau. See 44 U.S.C. § 3501 note.

Agency-Specific Statutes

Agency-specific statutes also guide federal data collection and use.
• For example, the Census Bureau conducts the census and census-related surveys such as the American Community Survey under Title 13 of the U.S. Code, which gives the Census Bureau the authority to request and collect information from individuals but also guarantees the confidentiality of these data and establishes penalties for unlawfully disclosing this information.
Unless specifically authorized, these provisions preclude the Census Bureau from sharing identifiable census information with other agencies. See 13 U.S.C. § 9. Title 15 of the U.S. Code permits the Secretary of Commerce to conduct studies on behalf of other agencies and organizations. Identifiable data from surveys conducted under Title 15 authority are subject to the sponsoring agency’s legislation and confidentiality requirements. See 15 U.S.C. § 176a.

Statutes and regulations specific to other agencies also affect the collection and sharing of data.
• Section 6103 of the Internal Revenue Code provides that federal tax information is confidential and may not be disclosed except as specifically authorized by law.
• Section 308(d) of the Public Health Service Act requires that identifiable information obtained by the National Center for Health Statistics be used only for the purpose for which it was collected unless consent is obtained for another purpose, and it prohibits the release of identifiable information without consent.
• Other legislation, such as the Family Educational Rights and Privacy Act, which protects the privacy of student education records, can affect federal data-collection efforts. See 20 U.S.C. § 1232g.

Appendix IV: Printable Interactive Graphic

This table reproduces the information in the interactive figure 3 earlier in this report.

Table 4: Actions Taken to Address Constraints That Hamper Greater Use of Administrative Data

Constraint type: Access
Constraint: Statutory restrictions on data sharing
Description: Statutes may not authorize statistical uses for data collected by a program.
Examples of actions taken:
• OMB [the Office of Management and Budget] has issued guidance encouraging greater sharing of data while protecting privacy; and
• FCSM [the Federal Committee on Statistical Methodology] produced a document highlighting lessons learned when negotiating data-sharing agreements.

Constraint type: Access
Constraint: Consent
Description: Agencies may not have consent from respondents to use their administrative data for statistical purposes, or agencies may interpret the legal requirements for consent differently.
Examples of actions taken:
• When requesting consent to link respondents’ survey data with administrative data, NCHS [the National Center for Health Statistics] has moved from asking for respondents’ full Social Security numbers to only asking for part of their Social Security numbers (used to assure link quality), which has improved consent rates; and
• FCSM is working on a project to clarify legal requirements for informed consent and to examine the current practices agencies typically use to obtain consent.

Constraint type: Access
Constraint: Costs and infrastructure
Description: Agencies may not have the staff, policies, procedures, and systems in place to share or use administrative data for statistical purposes.
Examples of actions taken:
• FCSM is completing a template for agencies to use when negotiating data-sharing agreements;
• FCSM produced a document highlighting lessons learned when negotiating data-sharing agreements; and
• The Census Bureau is using government and commercial administrative data to simulate the 2010 Census results, as well as comparing the quality of the Census Bureau’s process for linking data from NCHS surveys with administrative data to NCHS’s current record-linkage process.

Constraint type: Quality
Constraint: Data documentation
Description: Agencies may not uniformly document information about administrative datasets.
Examples of actions taken:
• FCSM is investigating the potential for using a checklist to evaluate the quality of datasets, part of which focuses on documentation in order to assess potential for statistical uses.

Constraint type: Quality
Constraint: Quality of data
Description: The quality of administrative data varies.
Examples of actions taken:
• The Census Bureau is investigating the quality of administrative data held by private companies;
• FCSM and the Census Bureau are investigating the quality of administrative data and the potential for using a checklist to assess the quality of datasets; and
• ERS [the Economic Research Service] (in collaboration with the Census Bureau and NCHS) is undertaking a pilot project to address data-quality concerns with state-level administrative data.

Source: GAO analysis of OMB and principal statistical agency data.
Note: Data are from related documentation and interviews with officials at OMB and selected principal statistical agencies.

Appendix V: Comments from the Department of Commerce

Appendix VI: GAO Contacts and Staff Acknowledgments

GAO Contacts
Ronald S. Fecso, (202) 512-7791 or firstname.lastname@example.org
Robert Goldenkoff, (202) 512-2757 or email@example.com

Staff Acknowledgments
In addition to the individuals named above, Tim Bober (Assistant Director), Carl Barden, Russell Burnett, Robert Gebhart, Jill Lacey, Andrea Levine, Jessica Nierenberg, Susan Offutt, Kathleen Padulchick, Tind Shepper Ryen, and Jared Sippel made key contributions to this report.

(450880)

GAO’s Mission
The Government Accountability Office, the audit, evaluation, and investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions.
GAO’s commitment to good government is reflected in its core values of accountability, integrity, and reliability.

Obtaining Copies of GAO Reports and Testimony
The fastest and easiest way to obtain copies of GAO documents at no cost is through GAO’s website (www.gao.gov). Each weekday afternoon, GAO posts newly released reports, testimony, and correspondence on its website. To have GAO e-mail you a list of newly posted products, go to www.gao.gov and select “E-mail Updates.”

Order by Phone
The price of each GAO publication reflects GAO’s actual cost of production and distribution. It depends on the number of pages in the publication and whether the publication is printed in color or black and white. Pricing and ordering information is posted on GAO’s website, http://www.gao.gov/ordering.htm.
Place orders by calling (202) 512-6000, toll free (866) 801-7077, or TDD (202) 512-2537.
Orders may be paid for using American Express, Discover Card, MasterCard, Visa, check, or money order. Call for additional information.

Connect with GAO
Connect with GAO on Facebook, Flickr, Twitter, and YouTube. Subscribe to our RSS Feeds or E-mail Updates. Listen to our Podcasts. Visit GAO on the web at www.gao.gov.

To Report Fraud, Waste, and Abuse in Federal Programs
Contact:
Website: www.gao.gov/fraudnet/fraudnet.htm
E-mail: firstname.lastname@example.org
Automated answering system: (800) 424-5454 or (202) 512-7470

Congressional Relations
Katherine Siggerud, Managing Director, email@example.com, (202) 512-4400, U.S. Government Accountability Office, 441 G Street NW, Room 7125, Washington, DC 20548

Public Affairs
Chuck Young, Managing Director, firstname.lastname@example.org, (202) 512-4800, U.S. Government Accountability Office, 441 G Street NW, Room 7149, Washington, DC 20548

Please Print on Recycled Paper.
Federal Statistical System: Agencies Can Make Greater Use of Existing Data, but Continued Progress Is Needed on Access and Quality Issues
Published by the Government Accountability Office on 2012-02-24.