NATIONAL ARCHIVES
OFFICE of INSPECTOR GENERAL

Date: January 5, 2011
Reply to Attn of: Office of Inspector General (OIG)
Subject: Management Letter No. 11-08, Electronic Records Archive Lacks Ability to Search Records' Contents
To: David S. Ferriero, Archivist of the United States (N)

The Electronic Records Archives (ERA) program is critical to the future of the National Archives and Records Administration (NARA) and the nation. From inception, ERA was envisioned as the primary way our nation's electronic records will be preserved and accessed. Thus, ERA is situated to become NARA's flagship system in a world dominated by born digital records, and there is no alternative venue to which the public can turn for comprehensive access to these records of our democracy.

However, other than for select records, system limitations will not allow users to conduct a content search of the comprehensive inventory of electronic records ERA has and will continue to ingest. Instead, ERA will only allow users to locate records by searching through metadata generated about the records, and not the text of the records themselves. We believe this constraint has not been adequately communicated to NARA stakeholders.

The inability of all American citizens to fully search the content of records ingested into ERA will have a profound and adverse impact upon this nation. We believe this looming deficit in capacity for American citizens and others to access the records of our democracy has not been effectively communicated outside of this agency. Further, the need for resources which could secure the equipment and staff to process and house the tsunami of records heading to NARA in a manner which might facilitate full-text search capability has not been communicated to OMB, or our Congressional oversight committees. This is akin to not calling for reinforcements when our positions are surely about to be overrun.
Figuratively speaking, there is no cavalry on the horizon, and NARA opted not to send out an SOS.

In 2005 NARA awarded Lockheed Martin a design contract to build the foundation of ERA, and it has been announced the contract will end on September 30, 2011. As of January 5, 2011 the actual costs of ERA totaled approximately $430 million.[1] This office has taken an active role in providing audit coverage to ERA. In reports and testimony we have identified a troubled program. ERA has experienced failed deliverables, numerous changes in requirements, cost overruns, material key staff turnover, uncertain funding, technological challenges, miscommunication between vital stakeholders etc. Throughout the saga this office has asked one question repeatedly: At the end of the contract, when the contractor turns in their keys and badges, what exact functionality will ERA provide to the most important stakeholder, the American citizen?

NATIONAL ARCHIVES and RECORDS ADMINISTRATION, 8601 ADELPHI ROAD, ROOM 1300, COLLEGE PARK, MD 20740-6001, www.archives.gov

From program inception, research papers commissioned by NARA and published by the National Academy of Sciences extolled the benefits of full-text content searching. Indeed, NARA contractual documents defined that ERA shall be able to search assets based on their contents and be able to perform keyword, exact phrase, proximity and other types of searches. Simply speaking, this means that a document such as this very letter would be ingested into ERA, and anyone now or in the future would be able to locate it by searching the body of this text. This is consistent with the manner in which users navigate through existing publicly available search engines such as Google, BING, Yahoo etc. In fact, the commercial search engine applied to ERA not only has full-text content search capability, but we are advised it is the actual default option.
However, once publicly available, the base ERA system, other than for a select population of records, will not support full-text search of the contents of individual records.[2] Instead, users will generally only be able to search through metadata generated about the records, with limited records being made full-text searchable only after being identified as high-request records. This shortcoming will become ever more significant as billions of records begin to flow into NARA in the coming years. Indeed, a current key ERA staff member defined "we expect that the percentage of electronic records that are full text searchable will decrease over time, depending on 1) volume of records coming in and 2) resources available."

Compounding this situation is the fact NARA has deferred the capability for automated generation of records descriptions beyond the end of the ERA contract. In a computer system which does not search the content of records, the record descriptions take on additional importance as the only searchable narrative of the record's contents (presuming the descriptions are made part of the searchable metadata). However, as ERA has now been set up, such descriptions will not be automatically generated by the system, but instead must be manually generated. Considering the massive amount of data expected to be put into the system, such a manual process will invariably create substantial, perhaps insurmountable, bottlenecks. Without full content searching, this potential delay in generating records' descriptions will degrade ERA's usefulness in providing timely and full access to electronic records now and into the future.

It is unclear when, and by whom, the decision not to pursue full text searching of record contents was made. ERA's previous director advised the OIG in April of 2010 that this was a policy decision that would be made sometime around the end of 2010 or the beginning of 2011. Other senior NARA staff stated there was never a requirement to search the full text of record contents, and thus this decision was made at the very beginning of the contract.

When questioned about the reason for ERA's general lack of search capacity, senior NARA officials have had a variety of responses. Some adamantly stressed they believed the ERA requirements never included the capacity to search the full text of record contents. Others blamed the technical requirements, combined with reduced funding for the program. Still others claimed they believed researchers would not want full text search capability, or that being able to completely search record contents is not desirable for an electronic archive. Many of the "architects" of ERA have since left the program and Federal service.

We look forward to your prompt response to these concerns. Should you have any questions, please contact me at (301) 837-1532.

Paul Brachfeld
Inspector General

[1] According to the Office of Management and Budget's Federal IT Dashboard at http://it.usaspending.gov/?q=content/cost-summary&buscid=799.

[2] We realize not all records will have text to search (i.e., photos), and this Management Letter is limited to the significant portion of anticipated ERA-housed records which will contain text (i.e., e-mails, reports, databases, etc.).
Electronic Records Archive Lacks Ability to Search Records' Contents
Published by the National Archives and Records Administration, Office of Inspector General on 2011-01-05.