United States General Accounting Office
Accounting and Information Management Division
October 1999

Y2K Computing Challenge: Day One Planning and Operations Guide

GAO/AIMD-10.1.22

Preface

Federal agencies, racing against time, are making significant progress in renovating, validating, and implementing their mission-critical information systems. Nevertheless, organizations remain vulnerable to Y2K disruptions. Because most federal agencies are highly dependent on information technology to carry out their missions, Year 2000-induced failures of one or more critical systems may have an adverse impact on an organization’s ability to deliver critical services.

The risk of failure is not limited to an organization’s internal information systems. Many federal agencies depend on information and data provided by their business partners, including other federal agencies, hundreds of state and local agencies, private-sector entities, and international organizations. Finally, every organization depends on services provided by the public infrastructure, including power, water, transportation, and voice and data telecommunications.

In August 1998, we published a guide to help agencies address business continuity and contingency planning issues.[1] The guide provides a conceptual framework for managing and mitigating the risks of potential Year 2000-induced disruptions to agency operations. It also calls for agencies to develop a “day zero” strategy, also known as a “Day One” strategy, to manage the critical century rollover period.[2]

A Day One strategy comprises a comprehensive set of actions to be executed by a federal agency during the last days of 1999 and the first days of 2000. It must be integrated with agency business continuity and contingency plans, and should describe the key activities and responsibilities of agency component organizations and staff.
[Figure: Typical Day One Operations. The pre-rollover period runs from Tuesday, December 28, through Friday, December 31, 1999; the rollover occurs at the start of Saturday, January 1, 2000; the post-rollover period runs through Wednesday, January 5, 2000.]

The objectives of a Day One strategy are to (1) position an organization to readily identify Year 2000-induced problems, take needed corrective actions, and minimize adverse impact on agency operations and key business processes, and (2) provide information about an organization’s Year 2000 condition to executive management, business partners, and the public.

The guide provides a conceptual framework for helping agencies develop a Day One strategy and reduce the risk of adverse Y2K impact on agency operations. It builds upon our previously issued Year 2000 business continuity and contingency planning guide, and draws on other sources, including the Social Security Administration, International Business Machines Corporation, and the Legislative Branch Y2K Group.

Because each agency is different, there is no single, cookie-cutter approach to Day One planning. Some agencies are highly centralized, while others operate in a highly decentralized environment. This guide addresses issues that will be common to most agencies; however, each agency must tailor its Day One plan in response to its unique needs.

[1] Year 2000 Computing Crisis: Business Continuity and Contingency Planning (GAO/AIMD-10.1.19, issued as an exposure draft in May 1998; issued final in August 1998). The guide is available at <http://www.gao.gov/special.pubs/bcpguide.pdf>.
[2] Agencies should also be prepared to deal with latent post-rollover problems that may emerge days or months later, and with other key date-sensitive events such as the leap year.

GAO/AIMD-10.1.22 Year 2000 Day One Planning and Operations Guide
The guide addresses four phases supported by executive oversight:

• Initiation
• Rollover risk assessment, planning, and preparation
• Rehearsal
• Execution, monitoring, responding, and reporting

In addition to executive oversight, the four phases are united by the common thread of accountability at all levels.

An electronic version of this guide is available on GAO’s World Wide Web site at <www.gao.gov/special.pubs/dayone.pdf>. If you have any comments or questions about the guide, please contact us, or Mirko J. Dolak, Technical Assistant Director, at (202) 512-6362; or E. Randolph Tekeley, Technical Assistant Director, at (202) 512-4070. We can also be reached by e-mail at firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, and email@example.com.

Joel C. Willemssen
Director
Civil Agencies Information Systems

Keith A. Rhodes
Director
Office of Computer and Information Technology Assessment

Contents

Day One and the Year 2000 Problem
Initiation
Rollover Risk Assessment, Planning, and Preparation
Rehearsal
Execution, Monitoring, Responding, and Reporting
Glossary

Day One and the Year 2000

Although agencies are making significant progress in renovating and testing their mission-critical systems, crossing the century boundary will nevertheless present many challenges.
To address these challenges, each agency should develop a Day One strategy for reducing risk to its facilities, systems, programs, and services during the weekend of the critical century rollover. Such a strategy should focus on actions to be taken shortly before, during, and after the rollover.

This guide presents a structured approach to aid federal agencies in Day One planning and management. The guide draws on (1) the Day One plan of the Social Security Administration, (2) the rollover guidance[3] developed by the International Business Machines Corporation, and (3) the Day One Guide drafted by the Legislative Branch Y2K Group. It describes four phases, supported by agency executive management, with each phase representing a major Day One planning project activity or segment.

[Figure: Year 2000 Day One Planning and Operations Structure. Four phases, all supported by Executive Oversight:]

Initiation
Establish a Day One project work group and develop an overall Day One planning strategy. Develop master schedule and milestones and obtain executive support.

Rollover Risk Assessment, Planning, Preparation
Assess the risk of internal and external Y2K failures. Develop Day One plans and establish command center(s). Identify activities and processes for the pre-rollover and post-event periods. Develop communications plan.

Rehearsal
Define and document rehearsal plans and rehearse selected Day One operations and teams. Update and revise Day One plan as needed.

Execution, Monitoring, Responding, Reporting
Execute rollover procedures and tests. Identify and resolve problems, report status.

[3] Planning for the 1999 to 2000 Rollover of IT Systems, Year 2000 Global Initiatives, International Business Machines Corporation, June 1999. The guide is available at <www.ibm.com/ibm/year2000/docs/rollover/rollover_english.pdf>.
1.0 Initiation

Executive management needs to be fully aware that the century rollover period can pose risks to an agency’s ability to deliver services. Executives responsible for the agencies’ core business processes should work with the Chief Information Officer, the Chief Financial Officer, and the Year 2000 program manager to develop a Day One strategy. Agency executives must dedicate sufficient resources and staff for this task, and ensure that senior managers support this effort.

Key Tasks

1.1 Establish a Day One project work group, define roles, and assign responsibility
1.2 Develop and document a high-level Day One planning strategy, and establish schedule and milestones
1.3 Establish Day One leave and compensation policy
1.4 Establish or activate mechanisms for planned reviews of Day One activities

1.1 Establish a Day One project work group, define roles, and assign responsibility

Establish a Day One work group. The group should report to executive management and work closely or be aligned with the business continuity and contingency planning group. This group should also include representatives from the IRM office, major business units, and field organizations.

Manage the Day One planning tasks and activities as a sub-project. Define roles and assign responsibilities for leading the planning effort and for performing analyses and developing Day One procedures and plans.

Determine what major business units and field organizations need to develop Day One plans. The principal candidates are organizational components responsible for key business processes, usually components that were required to develop individual business continuity and contingency plans, and large field organizations. Appoint individuals to lead the development of Day One plans for each of the designated organizational components.
1.2 Develop and document a high-level Day One planning strategy, and establish schedule and milestones

A high-level Day One planning strategy provides the agency’s executive management with an overview of key activities. The strategy should address the project management structure, the number and location of information coordination and/or command centers, the project’s relationship with the business continuity and contingency planning effort, metrics and reporting requirements, and initial cost and schedule estimates.

Develop a schedule and milestones for the planning effort. Ensure time for rehearsing the key teams needed to carry out each phase and function of the strategy.

Coordinate Day One planning with the agency’s business continuity and contingency planning effort, and ensure that the Day One plan incorporates appropriate procedures and processes identified in the agency’s continuity of operations plan.

1.3 Establish Day One leave and compensation policy

Address rollover leave and compensation issues. Work with human resources departments to determine special compensation requirements to deal with unused annual leave, overtime, holiday work, and standby. Consult the Office of Personnel Management for further guidance at <www.opm.gov/y2k/index.htm>.

1.4 Establish or activate mechanisms for planned reviews of Day One activities

Establish or activate mechanisms to review the Day One planning process and strategy for omissions, feasibility, and key assumptions. Use vehicles such as quality assurance office staff to ensure that the business resumption teams established during the business continuity and contingency planning process are ready, and that the individual business continuity plans were adequately tested.
2.0 Rollover Risk Assessment, Planning, and Preparation

Day One planning integrates and acts on the results of the rollover risk assessment. The output of this process is a Day One plan for headquarters and, if required, for each of the designated organizational units responsible for key business processes. The plan should focus on the key Day One functions (monitoring, responding, and reporting) implemented within the framework of the pre-rollover and post-rollover phases. Each plan should provide a description of the activities, resources, staff roles, and timetables needed for its execution.

Key Tasks

2.1 Assess failure scenarios documented in business continuity and contingency plans and identify key risk areas, including risks faced by business partners
2.2 Establish information coordination and/or command center(s)
2.3 Define the pre-rollover and post-rollover responsibilities for the command center(s), major business units, facilities, and critical infrastructure components
2.4 Develop agencywide schedule of key events for the pre- and post-rollover periods
2.5 Develop or acquire and document facilities infrastructure checklists and templates
2.6 Develop post-rollover test plans and procedures for testing of key business systems and processes
2.7 Establish rapid response procedures; develop rules for the deployment of business resumption teams and for the execution of contingency and disaster recovery plans
2.8 Develop rollover staffing plan
2.9 Review vendor service agreements and availability
2.10 Establish rollover security procedures for coordination/command center(s) and designated Day One sites
2.11 Develop and document internal communications procedures; establish process for monitoring external events
2.12 Develop external communications strategy and procedures

2.1 Assess failure scenarios documented in business continuity and contingency plans and identify key risk areas, including risks faced by business partners

Identify key risk areas, including facilities, critical infrastructure components, and systems supporting core business processes, that should be addressed by the Day One plan. Rely on the results of Year 2000 risk assessment and testing activities to identify the most likely failure modes and scenarios. Consider the potential impact of external events, including the loss of electric power, telephone communications, water, and natural gas. Consider risks faced by business partners.

2.2 Establish information coordination and/or command center(s)

Establish information coordination and/or command center(s) to coordinate and report on agencywide rollover activities.

2.3 Define the pre-rollover and post-rollover responsibilities for the coordination/command center(s), major business units, facilities, and critical infrastructure components

Define, for each organizational component responsible for a key business process, its pre- and post-rollover responsibilities, staffing, and problem resolution and escalation procedures. Specify events and conditions defined in the business continuity and contingency plan that will trigger contingency plans and activate business resumption teams.

Review procedures and triggers for activating contingency plans to deliver services to constituencies served by business partners experiencing Year 2000-induced system and process failures.

Specify data backup requirements, and identify systems and infrastructure components that are to remain operational, remain idle, or be shut down. Develop start-up plan for idled or shutdown systems and infrastructure components.

2.4 Develop agencywide schedule of key events for the pre- and post-rollover periods

Develop agencywide schedule of key events and activities, including a schedule for closing business areas, shutting down systems, and facilities inspections.
2.5 Develop or acquire and distribute facilities infrastructure checklists and templates

The century rollover may affect the availability of public utilities and the operations of building services, including elevators, air conditioning, and building security systems. Agencies should develop or acquire infrastructure checklists, data capture and reporting templates, and reporting procedures to assist building managers in assessing the rollover impact on agency facilities.

2.6 Develop post-rollover test plans and procedures for testing of key business systems and processes

The post-rollover test and assessment process must be well defined and scripted to ensure adequate coverage and proper sequencing. For example, evaluation procedures may call for sequential evaluation beginning with hardware (mainframes, servers, PBX systems, routers, and switches), followed by connectivity testing (LANs and WANs), and ending with end-to-end evaluation and testing of mission-critical systems with live data.

Where possible, use existing troubleshooting and problem reporting processes and procedures during the rollover period, including those specified in agency Continuity of Operations plans.

2.7 Establish rapid response procedures; develop rules for the deployment of business resumption teams and for the execution of contingency and disaster recovery plans

Define risk-escalation procedures, activation thresholds, and action plans for predefined failure scenarios, including the activation of business resumption teams. Consider activating and terminating a response based on risk-escalation thresholds for the degradation or loss of business systems and applications.

2.8 Develop rollover staffing plan

Develop a rollover staffing plan to support Day One activities.
Ensure that key technical staff are available to handle infrastructure problems, and that business resumption teams will be available to respond to problems affecting agency core business processes. Assess the availability of key staff to support Day One operations. Develop procedures for reaching all employees, including on-duty and on-call staff.

2.9 Review vendor service agreements and availability

Review contract support arrangements to determine availability and coverage. Confirm the availability and contact information for key vendors. Avoid entering into an exclusive service arrangement with vendors serving other clients.

2.10 Establish rollover security procedures for coordination/command center(s) and designated Day One sites

Review security plans to ensure their support for rollover activities. Ensure access to facilities in the event of security system failures, and provide additional physical security for critical facilities.

Assess system and network security during the rollover period. Consider risks posed by remote access services and public web servers. Modify or reduce remote access to network resources during the rollover period. Develop plans to monitor, resolve, and report network security incidents.

2.11 Develop and document internal communications procedures; establish process for monitoring external events

Develop and document internal communications strategy and process, both for intra- and inter-component communications, including communications between the Day One sites and the coordination/command center. Define status indicators and identify reportable events. Define modes of communication and reporting channels. Ensure that key Day One teams and staff have access to emergency communications equipment in case of the failure of public telecommunications networks.
Establish process for the monitoring of external events, including communications with the President's Council on Year 2000 Conversion’s Information Coordination Center, to obtain information about major infrastructure problems. Ensure that the coordination/command center(s) have access to information on the status of business partners, key suppliers, and service providers.

2.12 Develop external communications strategy and procedures

Develop and document external communications strategy and process. Designate agency Day One spokesperson, and define the process for issuing public announcements and communicating with the media. Establish communication links with the Information Coordination Center.

3.0 Rehearsal

The Day One plan describes a wide range of complex, interrelated activities and geographically distributed processes that must be executed within a very short time frame. To ensure that the Day One strategy is executable, the Day One plans and their key processes and timetables should be reviewed, and, if feasible, rehearsed. Similarly, agencies may wish to rehearse the operations and integration of the Year 2000 coordination/command center(s) and teams responsible for the Day One activities in major business units, facilities, field organizations, and infrastructure components.

Key Tasks

3.1 Develop and document Day One rehearsal plan
3.2 Rehearse selected Day One operations and teams
3.3 Update and revise Day One plans and procedures based upon lessons learned

3.1 Develop and document Day One rehearsal plan

Define and document the Day One rehearsal plan. Ensure that management approves the plan. Disseminate applicable guidance and establish a help desk, preferably in the coordination/command center(s).
Rehearsal plans should address objectives, scope, required equipment and personnel resources, likely scenarios, schedules and locations, and procedures.

3.2 Rehearse selected Day One operations and teams

Rehearse the operations of the coordination/command center(s) and of selected Day One teams to ensure that the coordination/command center staff and team members are familiar with Day One procedures and their roles. Test emergency communications and security procedures designed to deal with the loss of electric power and public communications networks.

3.3 Update and revise Day One plans and procedures based upon lessons learned

Identify and resolve shortcomings and problems noted during the rehearsals and testing, and revise the Day One plan and procedures accordingly. Retest, if feasible, critical processes, and update plans.

4.0 Execution, Monitoring, Responding, and Reporting

Unlike schedules for most other plans, the Day One schedule is immovable, with the rollover to occur at 12:00 A.M. on Saturday, January 1, 2000. The status, and success, of an agency’s Year 2000 program will first be evident immediately after the rollover, with more detailed information on the status of mission-critical systems and on the viability of agency core business processes emerging during the post-rollover phase. The status information gathered during the post-rollover phase needs to be communicated to executive management as well as to external parties.
Key Tasks

Pre-rollover
4.1 Activate information coordination/command center(s) and designated Day One sites
4.2 Implement pre-rollover procedures specific to each site
4.3 Implement planned risk prevention and risk reduction measures

Post-rollover
4.4 Conduct facilities inspection using infrastructure checklist
4.5 Perform post-rollover tests, evaluations, and assessments
4.6 Identify and report incidents, including problems, failures, and outages
4.7 Perform problem and crisis management
4.8 Implement recovery procedures and change control
4.9 Activate, if necessary, business continuity and contingency plans
4.10 Implement internal and external reporting procedures

4.1 Activate information coordination/command center(s) and designated Day One sites

Activate coordination/command center(s) and designated sites, perform connectivity checks, and test reporting procedures and tools. Ensure that each site has accessible emergency contact information for local utilities, public safety organizations, key vendors, technical consultants, and contractors servicing critical equipment.

Ensure that each site and business area addressed by a rollover strategy has a designated crisis intervention contact and that communication is established to the coordination/command center(s). Ensure that response teams are in communication, ready and awaiting dispatch.

4.2 Implement pre-rollover procedures specific to each site

Implement pre-rollover procedures at all designated sites. Conduct operational inspections and readiness checks for normal and emergency backup services, including electric power, lighting, water, communications, and transportation, as appropriate.

4.3 Implement planned risk-prevention and risk-reduction measures

Implement planned measures to prevent and reduce risks in business processes, systems, and applications.
For example, perform backups of all critical files and data, including automated workflow rules, system configurations, database journals, system data, and application data. Implement shutdown or partial shutdown of processes and applications that may not be able to process correctly across the rollover event.

4.4 Conduct facilities inspection using infrastructure checklist

Inspect facilities using infrastructure checklist. Report status and problems, and initiate corrective action.

4.5 Perform post-rollover tests, evaluations, and assessments

Initiate post-rollover test and evaluation procedures. Test key business processes and supporting systems. Coordinate tests and evaluations with designated Day One sites and business partners.

4.6 Identify and report incidents, including problems, failures, and outages

Report significant failures and outages. Provide status reports on key business processes and supporting systems. Identify and report on potential impact of failures and outages.

4.7 Perform problem and crisis management

Document all reported incidents, assign priority, and dispatch appropriate problem response team. Periodically review and escalate response as appropriate. When called for, invoke emergency procedures. Track incidents, problems, and crises to closure.

4.8 Implement recovery procedures

Follow recovery procedures for the rollover period. Pay special attention to systems and applications that were fully or partially shut down as a preventive measure. Test and implement corrections.

4.9 Activate, if necessary, business continuity and contingency plans

Use escalation thresholds to invoke contingency plans. If a problem remains unresolved for a specific length of time and the impact to operations exceeds a predefined threshold, activate business resumption teams.

4.10 Implement internal and external reporting procedures

Implement internal reporting procedures.
Ensure that the coordination/command center(s) are provided with timely and accurate information on the rollover status of each designated site. Use standard data collection and reporting templates and tools to gather and report status information.

Implement external communications strategy and process. If required, prepare and issue public announcements and communicate with the media. Ensure that the Information Coordination Center is provided with timely and accurate status information.

Glossary

The definitions in this glossary were developed by the project staff or were drawn from other sources, including the Computer Dictionary: The Comprehensive Standard For Business, School, Library, and Home, Microsoft Press, Washington, D.C., 1991; The Year 2000 Resource Book, Management Support Technology Corp., Framingham, Massachusetts, 1996; The Year 2000 and 2-Digit Dates: A Guide for Planning and Implementation, International Business Machines Corporation, 1997; and Denis Howe’s “Free On-line Dictionary of Computing” at <foldoc.doc.is.ac.uk/>.

Business area
A grouping of business functions and processes focused on the production of specific outputs.

Business function
A group of logically related tasks that are performed together to accomplish a mission-oriented objective.

Business plan
An action plan that the enterprise will follow on a short-term and/or long-term basis. It specifies the strategic and tactical objectives of the enterprise over a period of time. The plan, therefore, will change over time. Although a business plan is usually written in a style unique to a specific enterprise, it should concisely describe "what" is planned, "why" it is planned, "when" it will be implemented, by "whom" it will be implemented, and "how" it will be assessed. The architects of the plan are typically the principals of the enterprise.
Business resumption teams
Teams responsible for managing the implementation of contingency and business resumption plans.

Contingency plan
In the context of the Year 2000 program, a plan for responding to the loss or degradation of essential services due to a Year 2000-related problem in an automated system. In general, a contingency plan describes the steps the enterprise would take, including the activation of manual or contract processes, to ensure the continuity of its core business processes in the event of a Year 2000-induced system failure.

Infrastructure
In the context of the Day One plan, the hardware, software, facilities, and public utilities supporting the enterprise’s information management functions.

Metrics
Measures by which processes, resources, and products can be assessed.

Mission-critical system
A system supporting a core business activity or process.

Quality assurance
Planned and systematic actions necessary to provide adequate confidence that a product or service will satisfy given requirements for quality.

Risk assessment
An activity performed to identify risks and estimate their probability and the impact of their occurrence; used during system development to provide an estimate of potential damage, loss, or harm that could result from the failure to successfully develop individual system components.

Risk management
A management approach designed to prevent and reduce risks, including system-development risks, and lessen the impact of their occurrence.

Test
The process of exercising a product to identify differences between expected and actual behavior.

Year 2000 problem
The potential problems that might be encountered by computer hardware, software, or firmware in processing year-date data for years beyond 1999.
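
The glossary's definition of the Year 2000 problem, together with the preface's note on other date-sensitive events such as the leap year, can be illustrated with a minimal Python sketch. It is not drawn from the guide: the function names and the pivot value of 50 in the windowing rule are hypothetical choices made only for illustration, and the two-digit subtraction shows one common failure pattern rather than the behavior of any particular agency system.

```python
def elapsed_years_two_digit(start_yy: int, end_yy: int) -> int:
    """Elapsed years computed directly on two-digit years (the failure-prone pattern)."""
    return end_yy - start_yy


def expand_two_digit_year(yy: int, pivot: int = 50) -> int:
    """Map a two-digit year to four digits using a fixed window.

    Assumption for illustration only: values below `pivot` are read as 20xx
    and the rest as 19xx. The pivot of 50 is hypothetical, not from the guide.
    """
    if not 0 <= yy <= 99:
        raise ValueError("two-digit year must be between 0 and 99")
    return 2000 + yy if yy < pivot else 1900 + yy


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    # A record dated 1999 ("99") compared with one dated 2000 ("00").
    print(elapsed_years_two_digit(99, 0))                        # -99
    print(expand_two_digit_year(0) - expand_two_digit_year(99))  # 1
    # 2000 is a leap year; 1900 was not.
    print(is_leap_year(1900), is_leap_year(2000))                # False True
```

Checks of this kind, run against known dates such as January 1 and February 29, 2000, are one possible form for the post-rollover tests described in section 2.6; the actual test scripts would depend on each agency's systems.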
Y2K Computing Challenge: Day One Planning and Operations Guide
Published by the Government Accountability Office on 1999-10-01.