DOCUMENT RESUME

03404 - [k253678]

Problems and Needed Improvements in Evaluating Office of Education Programs. HRD-76-165; B-164031(1). September 8, 1977. 76 pp. 4 appendices (53 pp.).

Report to the Congress; by Elmer B. Staats, Comptroller General.

Contact: Human Resources Div.
Budget Function: Education, Manpower, and Social Services: Elementary, Secondary, and Vocational Education (501).
Organizations Concerned: Office of Education; Department of Health, Education, and Welfare.
Congressional Relevance: House Committee on Education and Labor; Senate Committee on Human Resources; Congress.
Authority: Elementary and Secondary Education Act of 1965 (20 U.S.C. 241a). Education Amendments of 1974 (P.L. 93-380). General Education Provisions Act (20 U.S.C. 1226c). Education Amendments of 1972 (P.L. 92-318). 20 U.S.C. 1231a.

A review was conducted to determine the effectiveness of federally supported education evaluations, primarily those concerning elementary and secondary education programs, in providing objective data for allocating resources and for deciding whether or not programs should be continued or modified. Questionnaires were sent to education agencies in all States and the District of Columbia and to a statistical sample of local school districts to obtain State and local agencies' views on Federal education program evaluations. Federal, State, and local education agencies frequently use standardized norm-referenced achievement tests to measure the effect of Federal education programs.

Findings/Conclusions: The Office of Education's (OE) evaluation studies can better serve the Congress by being timed to coincide with the legislative cycle and through more frequent briefings of congressional committee staffs. OE needs to make a better effort to set forth specific qualitative and quantitative program objectives in order to provide a clear basis for program evaluation.
The usefulness of the State and local evaluation reports needs improvement in the areas of: relevance of reports to policy issues, data completeness and comparability, and report timeliness. If the reporting systems based on aggregated local agency data are to be effective, standardization of data collection efforts is needed. Educators and test experts disagree on the use of standardized norm-referenced tests versus criterion-referenced tests. More research may be needed on criterion-referenced tests and on how to reduce racial, sexual, and cultural biases in standardized tests.

Recommendations: The Secretary of HEW should direct OE to: (1) emphasize congressional information needs when planning, implementing, and reporting on evaluation studies; (2) seek agreement with the Congress on the specific program objectives to be used for evaluations, as well as acceptable evaluation data and measures for each program to be evaluated; and (3) improve the implementation of evaluation results by giving greater attention and priority to procedures, such as the issuance of policy implication memoranda, designed to assure implementation of those results. OE should review the types of State and/or local program evaluation information collected on programs authorized by titles I and VII of the Elementary and Secondary Education Act to determine if it is realistic to serve Federal, State, and local levels with aggregated data based on local agency evaluation reports.

REPORT TO THE CONGRESS BY THE COMPTROLLER GENERAL OF THE UNITED STATES

Problems And Needed Improvements In Evaluating Office Of Education Programs

Office of Education
Department of Health, Education, and Welfare

The Office of Education should more strongly emphasize serving congressional needs in planning and carrying out evaluation studies, should define its program objectives more clearly, and should improve the implementation of evaluation results.
The Office of Education should assess whether State and/or local evaluation reports, under titles I and VII of the Elementary and Secondary Education Act, can realistically be improved to supply Federal, State, and local officials with the reliable program information they need for decisionmaking.

Serious questions have been raised about standardized "norm-referenced" tests, which are frequently used to measure Federal education programs' effectiveness. This report discusses some of these questions and various suggestions for alleviating problems in conducting large-scale evaluations of compensatory education and desegregation programs.

HRD-76-165
SEPTEMBER 8, 1977

COMPTROLLER GENERAL OF THE UNITED STATES
WASHINGTON, D.C. 20548

B-164031(1)

To the President of the Senate and the Speaker of the House of Representatives

This report points out that the Office of Education needs to more strongly emphasize the purpose of providing information to the Congress when planning and implementing evaluation studies, more clearly define its program objectives, and improve implementation of evaluation results. The report contains recommendations to improve the usefulness of educational evaluations. Also discussed in the report are some criticisms and defenses of standardized tests and suggestions for alleviating problems in evaluating large-scale compensatory education and desegregation programs.

Our review was made because of the increasing concern shown by the Congress and various Federal agencies with evaluating the effectiveness of major Federal education programs. This concern stems from the need for objective data to be used in allocating resources and in deciding whether or not programs should be continued, modified, or discontinued.

We made our review pursuant to the Budget and Accounting Act, 1921 (31 U.S.C. 53), and the Accounting and Auditing Act of 1950 (31 U.S.C. 67).
We are sending copies of this report to the Director, Office of Management and Budget, and the Secretary of Health, Education, and Welfare.

Comptroller General of the United States

COMPTROLLER GENERAL'S REPORT TO THE CONGRESS

PROBLEMS AND NEEDED IMPROVEMENTS IN EVALUATING OFFICE OF EDUCATION PROGRAMS
Office of Education
Department of Health, Education, and Welfare

DIGEST

In recent years the Congress and various Federal agencies have become increasingly interested in evaluating the effectiveness of major Federal education programs. Such evaluations should help them decide whether programs should be continued, modified, or discontinued. (See p. 1.)

To obtain State and local agencies' views on Federal education program evaluations and related matters, GAO sent questionnaires to education agencies in all States and the District of Columbia and to a statistical sample of local school districts throughout the Nation. (See p. 11.)

This report discusses the questionnaire results, the pros and cons of standardized tests, and suggestions from testing and evaluation experts and others for alleviating problems in conducting large-scale evaluations of compensatory education and desegregation programs. (See pp. 53 and 63.)

OFFICE OF EDUCATION EVALUATION STUDIES

Office of Education evaluations are to help those making decisions about education, including the Congress. Recently, the main emphasis has been on evaluating the effectiveness of major Office of Education programs throughout the Nation.

The Office of Education's evaluation studies can better serve the Congress. Timing the studies to coincide with the legislative cycle and more frequently briefing congressional committee staff would help.

Evaluations can reach conclusive and useful findings about a program's effectiveness only if the program's objectives are defined and measurable.
For example, a national evaluation of the migrant program did not adequately assess the program's effectiveness because no one had developed acceptable criteria and objectives for measuring the program's success. Program objectives to be evaluated need to be better defined, and the ways evaluation results are used and implemented need more attention.

The Congress should recognize that the Department of Health, Education, and Welfare (HEW) is not complying, and does not intend to comply, with the legislative requirement to set forth goals and specific objectives for individual programs and projects included in its annual evaluation report to the House Committee on Education and Labor and the Senate Committee on Human Resources. HEW feels that its authority and ability to comply with this legislative requirement are limited.

PROBLEMS WITH STATE AND LOCAL EVALUATION REPORTS

The money spent at the State and local education agency levels to evaluate selected Federal elementary and secondary education programs is large--over $42 million in fiscal year 1974.

If the reporting systems based on aggregated local agency data are to be effective, various aspects of the evaluation reports need to be improved. These include the credibility of findings and the qualification and quantification of measurement data. The usefulness of the State and local evaluation reports also needs to be improved with respect to

-- relevance of the reports to policy issues,

-- completeness and comparability of the data reported, and

-- report timeliness.

Valid, complete, and comparable evaluation data is important if evaluation results are to contribute meaningfully to decisions at all levels. If local and State evaluation data continues to be aggregated for use at higher levels, data collection efforts and techniques need to be standardized to provide comparable results.
Because of constraints on the Federal role in education, the Government probably will not try to provide needed valid and comparable data by dictating uniform evaluation methods and procedures to State and local education agency grantees.

In addition, questions exist about whether the models for evaluating title I (aid for disadvantaged children) will be able to provide valid and acceptable data to meet program information needs. If these issues are not resolved, GAO questions whether improvements can be made that will enable the reporting systems (based on aggregated local data) to meet program information needs at Federal, State, and/or local levels.

USES OF STANDARDIZED TESTS AND PROGRAM EVALUATION

Federal, State, and local education agencies frequently use standardized norm-referenced achievement tests to measure the effectiveness of Federal education programs. These tests measure an individual's performance against a "norm" group. However, testing experts and educators disagree about the adequacy of these tests for their intended uses. Serious questions have been raised about the tests, and some organizations have called for a moratorium on their use and a higher priority on development and use of alternatives.

Although the tests' critics and defenders agree that certain problems exist, views differ greatly about the importance or severity of the problems and their remedies. Defenders recognize that improvements are needed in such areas as test and test question bias; appropriate test norms; test selection, interpretation, and administration, including test uses for program evaluation; and other issues.

Some questions about the tests have great importance in determining the appropriateness, validity, and proper conduct of educational program evaluations, including those that are federally funded. Those responsible for making educational decisions should be aware of these issues when using such information.
Additional research may be needed on:

-- Criterion-referenced and other tests for uses which include program evaluation, as alternatives to standardized norm-referenced achievement tests.

-- How to reduce racial, sexual, and cultural biases in standardized tests.

States and especially local education agencies need to be more aware of available information intended to help them select the most appropriate tests for evaluation.

EVIDENCE OF PROGRAM EFFECTIVENESS

Local and State evaluation reports on Federal elementary and secondary education program effectiveness are intended to provide information that local, State, and Federal officials can use to make policy and program decisions. However, State and local officials see important differences in the types of evidence of program effectiveness that they and officials at other levels prefer.

Better communication is needed among the three levels on the information they need to facilitate policy and program decisions. Questionnaire results raise this question: Should all three levels be served by a reporting system based on the same reports?

Although State officials view Office of Education program officials as being most impressed by standardized norm-referenced test results, and local officials view State and Office of Education officials in the same manner, State and local officials say that they are not most impressed by such results. Local officials prefer broader, more diverse types of information on program results than just these test scores, and they are most impressed by improvements in curriculum and instructional methods and gains in the affective domain (likes, dislikes, interests, attitudes, motives, etc.). State officials are most impressed by results from criterion-referenced tests.
The widespread use of standardized norm-referenced tests to evaluate State and local programs indicates that State and local officials have more frequently based their evaluations on the kinds of results they believe would be likely to most impress higher level officials than on their own preferences.

Although HEW's comments were not responsive to certain aspects of GAO's recommendations, it agreed, at least in general, with all but one recommendation. (See pp. 21, 37, 52, and 76.)

CONTENTS

                                                              Page

DIGEST                                                           i

CHAPTER

 1   INTRODUCTION                                                1
       Origin and development of Federal evaluation efforts      1
       Administration of Federal evaluation activities           2
       Requirements for State and local evaluations of
         OE programs                                             5
       Limitations of educational evaluations                    6

 2   SCOPE AND METHOD OF REVIEW                                 11

 3   OPPORTUNITIES TO IMPROVE OE'S EVALUATION STUDIES           14
       Congressional needs should be taken more into account    14
       Program objectives need to be defined                    15
       Use of evaluation results needs more attention           17
       Conclusions                                              20
       Recommendations to the Secretary of HEW                  20
       Agency comments and our evaluation                       21
       Recommendation to the Congress                           23

 4   PROBLEMS WITH STATE AND LOCAL EVALUATION REPORTS           24
       State and local evaluation expenditures                  24
       State and local evaluation reports need improvement      26
       Other problems                                           30
       Conclusions                                              36
       Recommendation to the Secretary of HEW                   37
       Agency comments and our evaluation                       37

 5   USING STANDARDIZED TESTS                                   42
       Uses and implications of standardized
         norm-referenced tests                                  42
       Widespread State and local use of standardized
         norm-referenced tests to evaluate Federal programs     43
       Efforts to evaluate standardized tests                   45
       Criterion-referenced tests                               48
       Conclusions                                              51
       Recommendations to the Secretary of HEW                  51
       Agency comments and our evaluation                       52

 6   STANDARDIZED TESTS AND PROGRAM EVALUATION                  53
       Criticism of standardized norm-referenced tests          53
       Defense of standardized tests                            61
       Conference on using tests to evaluate programs           63
       Conclusions                                              67

 7   PREFERENCES FOR EVIDENCE OF PROGRAM
       EFFECTIVENESS DIFFER                                     69
       State and local education officials believe
         standardized test results most impress higher
         level officials                                        70
       Conclusions                                              75
       Recommendation to the Secretary of HEW                   76

APPENDIX

 I    Results of GAO's State education agency
        questionnaire                                           77
 II   Results of GAO's local education agency
        questionnaire                                           93
 III  Letter dated June 15, 1977, from the Inspector
        General, HEW                                           116
 IV   Additional suggestions by OE conferees for
        improving testing and evaluation                       125
 V    Principal officials of the Department of Health,
        Education, and Welfare responsible for
        activities discussed in this report                    128

ABBREVIATIONS

GAO  General Accounting Office
HEW  Department of Health, Education, and Welfare
NIE  National Institute of Education
OE   Office of Education

CHAPTER 1

INTRODUCTION

ORIGIN AND DEVELOPMENT OF FEDERAL EVALUATION EFFORTS

In recent years the Congress and various Federal agencies have become increasingly concerned with evaluating the effectiveness of major Federal education programs. Because of tight budget constraints, limited financial and human resources, and the need for objective data on the effectiveness of education programs, the Congress, the Office of Management and Budget, officials of the Department of Health, Education, and Welfare (HEW), and others are requesting such data to help them allocate resources and decide whether or not programs should be continued, modified, or discontinued.

Legislative background

Legislative mandates for evaluation of education programs can be traced back at least to title I of the Elementary and Secondary Education Act of 1965 (20 U.S.C. 241a), which required States to assure the adoption of "* * * effective procedures, including provision for appropriate objective measurements of educational achievement * * * for evaluating at least annually the effectiveness of the programs in meeting the special educational needs of educationally deprived children."
Congressional interest in evaluation was further reflected in the Education Amendments of 1974 (Public Law 93-380). The Office of Education states that 22 new studies and reports are required to be submitted to the Congress by the Commissioner of Education, the Assistant Secretary for Education, or the Secretary of HEW. Of the 22, 7 are evaluation studies to be conducted by the Office of Education's Office of Planning, Budgeting, and Evaluation; another is a thorough evaluation and study of Federal title I and State compensatory education programs by the National Institute of Education (NIE).

The General Education Provisions Act (20 U.S.C. 1226c) requires that HEW provide the House Committee on Education and Labor and the Senate Committee on Human Resources (previously the Senate Committee on Labor and Public Welfare) an annual evaluation report "which evaluates the effectiveness of applicable programs in achieving their legislated purposes" and recommends improvements. Any evaluation report evaluating specific programs and projects is required to

-- set forth goals and specific objectives in qualitative and quantitative terms for all programs and projects and relate those goals and objectives to program purposes;

-- report on the progress made during the previous fiscal year in achieving such goals and objectives;

-- describe the cost and benefits of each program evaluated; and

-- contain plans for implementing corrective action and legislative recommendations, where warranted.

In addition to an increasing number of congressionally mandated national studies, legislation and HEW regulations often require local and/or State evaluations of program effectiveness at least once a year.

The Education Amendments of 1972 (Public Law 92-318) established NIE as part of HEW's Education Division. The Director of NIE reports to the Secretary of HEW through the Assistant Secretary for Education, as does the Office of Education (OE).
NIE is charged with improving education by, among other things, building an effective educational research and development system. Since educational research includes not only basic and applied research and surveys, but also evaluation, the law provides a new mechanism for evaluating educational programs. However, principal responsibility for evaluating OE programs remains with OE.

ADMINISTRATION OF FEDERAL EVALUATION ACTIVITIES

According to OE, the main goal of Federal educational evaluation studies is to provide information on which policy decisions about Federal education programs and OE resource allocations may be based. To achieve this goal, OE

-- conducts national evaluations of the effectiveness of Federal education programs;

-- analyzes major educational problems or issues;

-- reports annually to the Congress on the effectiveness of OE programs in meeting their legislative intent;

-- attempts to identify the program approaches that work and do not work, and determine why; and

-- attempts to identify and validate for dissemination locally initiated innovative practices and exemplary programs.

HEW's education evaluations are carried out by several entities: the Office of the Assistant Secretary for Planning and Evaluation; OE's Office of Planning, Budgeting, and Evaluation; a limited number of OE program bureaus; the National Advisory Councils; and NIE. Although the Office of the Assistant Secretary for Planning and Evaluation primarily reviews OE's evaluation activities, it also receives a small portion of education evaluation funds to conduct studies for the Secretary.

OE is among the few Federal agencies that attempt to integrate the program evaluation process with their budget and legislative cycle. OE's evaluation activities are largely centralized in the Office of Planning, Budgeting, and Evaluation. This was done to try to emphasize evaluation more strongly.

The table below lists the funds available to OE for planning and evaluation.
According to OE, these sums, although substantial, represent less than three-tenths of 1 percent of OE's total annual program appropriations and must cover approximately 85 legislative programs. OE's Assistant Commissioner for Planning, Budgeting, and Evaluation estimated that from about 1971 on, approximately two-thirds of the OE planning and evaluation appropriation funds have been used for OE evaluation activities. (Chapter 4 provides funding information on State- and local-level evaluations of elementary and secondary education programs.)

                  OE Planning and Evaluation Funds

                   OE planning          OE program
                  and evaluation       funds used
  Fiscal year     appropriations     for evaluation       Total
                                    (notes a and b)

                         --------(000 omitted)--------

     1968             $ 1,250               -            $ 1,250
     1969               1,250               -              1,250
  c/ 1970               9,512            $ 4,155          13,667
c/,d/ 1971             12,475              8,724          21,199
d/,e/ 1972             11,225              3,950          13,175
  d/ 1973              10,205              9,880          20,085
  d/ 1974               5,200              5,268          10,468
  d/ 1975               6,858             11,043          17,901
     1976               6,383             10,512          16,895

a/ Includes funds authorized from Follow Through, Emergency School Assistance Act, title I of the Elementary and Secondary Education Act, Basic Opportunity Grants, Project Information Packages, and Career Education programs.

b/ Does not include program funds used by State and local education agencies for evaluations under Elementary and Secondary Education Act, titles I, III, VII, and VIII.

c/ Does not include $5 million appropriated for grants to States for planning and evaluation under Elementary and Secondary Education Act, title V, part C--Comprehensive Educational Planning and Evaluation.

d/ Includes support for the Educational Policy Research Centers (at Stanford Research Institute and Syracuse University Research Center) for the following fiscal years: $900,000 (1971); $900,000 (1972); $950,000 (1973); and $450,000 (1974). Monitorship of the centers was transferred to the Office of the Assistant Secretary for Education in fiscal year 1974.

e/ Excludes $1 million earmarked for NIE planning.
Systematic, comprehensive evaluation of Federal education programs at the Federal level dates back only to 1970. At that time the Congress increased OE evaluation funds in response to HEW's request. According to OE, such efforts were largely precluded before then by insufficient appropriated funds for evaluation and too few technically qualified evaluation staff members. Since fiscal year 1970, OE has attempted to expand and upgrade its evaluation activities and capabilities. The equivalent of about 23 professional full-time staff members are now assigned to evaluation. The Office of Planning, Budgeting, and Evaluation has designed and begun over 10 evaluation and planning studies; instituted an annual evaluation plan highlighting yearly priorities; and implemented a process for disseminating, chiefly at the Federal level, the major results of evaluation studies.

Almost all OE evaluation and planning studies are performed under contract. OE's evaluation office issues a request for proposals after determining the study's design and the techniques to be used--for example, sample size, analysis method, and data collection method. Contractors are selected competitively. After a contract is awarded, an OE project monitor from the evaluation office monitors the contractor's performance by exercising approval over the approach to be used, making site visits, and reviewing progress reports. The project monitor also reviews and approves the draft report's technical adequacy, completeness, and responsiveness before the report is finally accepted. OE develops and implements policy recommendations on the basis of the evaluation findings.

REQUIREMENTS FOR STATE AND LOCAL EVALUATIONS OF OE PROGRAMS

Legislation and HEW regulations often require annual evaluations of Federal programs by State and/or local education agencies. This is the case for programs and projects funded under titles I, III, and VII of the Elementary and Secondary Education Act.
A brief summary of the provisions in each title follows:

-- Title I provides funds through State education agencies to local education agencies serving areas with concentrations of children from low income families. The funds are intended to meet the special educational needs of educationally deprived children.

-- Title III has provided funds to local education agencies, principally through State education agencies, for (1) stimulating and assisting in the development and establishment of exemplary programs to serve as models for regular school programs and (2) assisting the States in establishing and maintaining guidance, counseling, and testing programs. (The Education Amendments of 1974 consolidated the title II and most title III activities into a new title IV. Fiscal year 1976 was both the first year of funding under the new title IV and the last year of funding under title III. However, final title III projects will not run out until the end of fiscal year 1977, and requirements for State and local evaluations are in effect until that time. Title IV requires evaluation by the advisory council in each State.)

-- Title VII provides OE discretionary funds for local education agencies and others to help carry out projects designed to meet the special educational needs of children who speak a language other than English and who come from low income families. Title VII is a demonstration program designed to build up the resources needed to start bilingual projects.

Local education agencies are required to evaluate their title I, III, and VII projects annually. State education agencies are also required to administer and annually evaluate their title I and III programs. However, title VII projects (bilingual education) operate under direct grants from OE; therefore, no State evaluations are required for such projects.
OE is nevertheless required to consult with State education agencies before approving local education agencies' title VII grant applications and to give States the opportunity to make recommendations on the applications.

LIMITATIONS OF EDUCATIONAL EVALUATIONS

The central issue in most educational evaluation studies is whether programs such as title I of the Elementary and Secondary Education Act affect student progress.

The President's Commission on School Finance was established to make recommendations to the President regarding the proper Federal role in financing elementary and secondary education. To be able to make its recommendations in the light of educational research results, the Commission asked a major research corporation to assess the available knowledge on what determines educational effectiveness. The resulting 1972 report 1/ states that research has found nothing that consistently and unambiguously makes a difference in student "outcomes." That is, research has not found any educational practice that offers a high probability of widespread success. OE's fiscal year 1975 annual evaluation report makes a similar statement about the attempt to identify the attributes of successful projects. A November 1976 OE-funded report, based on a study of educational innovations, found that the innovations make little difference in student achievement.

The 1972 report's findings were based on an examination of how valid the approach and results of numerous individual studies were. The report states that while some studies show a given educational practice to be effective, other similar studies find the same educational practice to be ineffective. It is unclear, the report adds, why this discrepancy exists.

1/ Harvey A. Averch, et al., "How Effective Is Schooling? A Critical Review And Synthesis Of Research Findings," R-956-PCSF/RC, The Rand Corporation, Mar. 1972, pp. iii-xii, 125, and 158.
According to the report, four substantive problems appear in virtually every area of educational research that limit evaluation studies:

-- Research data is, at best, a crude measure of what is happening. For example, student achievement is typically measured by scores on standardized achievement tests despite the many serious problems involved in interpreting such scores. (See chs. 5 and 6.)

-- Educational outcomes are almost exclusively measured according to cognitive achievement, often leading to sparse and inconclusive results that provide little guidance on what practices are effective.

-- There is almost no examination of the cost implications of research results, which makes it very difficult to translate research/evaluation results into policy-relevant statements.

-- Few studies adequately monitor the relationship between what actually goes on in the classroom and student achievement, so that data may be affected by circumstances unrecognized in analysis.

Because of the problems above, according to the report, researchers are confronted by the virtually impossible task of measuring those aspects of education they wish to study. That is, it is impossible for current research to reach definitive conclusions about educational outcomes because it cannot measure most of them well.

Other studies point out that numerous poorly designed and implemented evaluations result in very questionable or invalid data on which to base decisions about policy or programs.

A 1975 study, 1/ which reviewed the major title I evaluation efforts from 1965 through 1972, discusses some limitations in educational evaluations. The study traces the accepted belief in the necessity to evaluate education programs to title I of the 1965 act. This legislation established local reporting on projects in the hope that timely and objective information about the results of title I projects could reform the local administration of education and the methods of educating poor children.
It was also hoped that systematic evaluation could make Federal management of education programs more efficient. According to the study, those pursuing educational reform saw evaluation as central to achieving it and assumed that reporting requirements would generate valuable information which would be used rationally in contributing to policy and program decisions.

The study concluded that after 7 years, more than $52 million of expenditure on evaluations, and creating a number of alternative evaluation models, evaluation had failed to meet the expectations of those urging reform or even to serve the self-interest of Federal program managers. Regarding this conclusion, the study stated:

"There are numerous reasons why efforts to evaluate Title I failed * * * The central cause is that school districts had no incentive to collect or report output data, and federal officials lacked the political muscle to enforce evaluation guidelines or to require cooperation with other federal evaluation efforts * * *."

The study added:

-- Those interested in reform efforts failed to take into account the difficulty of evaluating the process of schooling in general and title I in particular.

-- Legislatively mandated evaluation, intended to make school administrators accountable, has led to local evaluation that is, in the view of many observers, little more than an annual ritualistic defense of program activities. In addition, the Federal evaluation efforts have not contributed to the formulation of short-run management strategies or long-range planning. Evaluations based on an "impact cost-benefit" model have been used selectively to lend an appearance of rationality to decisions that are essentially political.

1/ Milbrey Wallin McLaughlin, "Evaluation and Reform," a Rand Educational Policy Study, Ballinger Publishing Company, Cambridge, Mass., 1975, pp. vii-ix and 117-120.
-- Contrary to the expectations of those interested in reform efforts, neither Federal decisionmakers nor local school personnel showed much ability or interest in using evaluations to formulate title I policy or practice.

-- Local perceptions of Federal initiatives and commitments as inherently unstable, combined with a basic local defensiveness about achievement measures, will probably continue to frustrate Federal attempts to secure objective, reliable information on program results.

-- The highly political way that title I evaluation has been conducted, including use of its results, has weakened the credibility of evaluation as a policy instrument, in the opinion of many program personnel.

-- A realistic and useful evaluation policy should acknowledge the inherent constraints that the policy system and the behavior of bureaucracies place upon evaluation.

In addition, the study stated:

"The history of Title I evaluation also suggests a number of implications about the conduct and use of evaluation in a multi-level government structure * * * In a federal system of government, and especially in education, the balance of power resides at the bottom, with special interest groups * * * Thus a federal evaluation policy that conflicts in fundamental ways with local priorities is unlikely to succeed * * * Federal evaluators, then, are faced with a specifically political dilemma generated by their inability to insist upon accurate information on school effects and program impact. And the existence of powerful social sanctions against a strong federal data requirement means that these barriers to the implementation of federal evaluation policy will remain."

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation said that he agrees with this historical analysis.
He believes that the evaluation approach that OE's evaluation office follows takes these problems into account because it is based not on school district data but on contractor data collected nationally.

CHAPTER 2

SCOPE AND METHOD OF REVIEW

Our objectives were to review the usefulness and limitations of federally supported education evaluations--focusing mostly on elementary and secondary education programs--and to solicit suggestions on needed program evaluation improvements, including needed research and development. We reviewed OE's evaluation activities--principally carried out through the Office of Planning, Budgeting, and Evaluation--and related NIE activities. We also reviewed legislation, policies, procedures, and various Federal, State, and local education agency program evaluation reports relating to the Elementary and Secondary Education Act, titles I, III, and VII.

We interviewed officials from the Office of the Secretary of HEW, the Office of the Assistant Secretary for Education, the National Center for Education Statistics, OE, NIE, and the Office of Management and Budget. In addition, we interviewed the staff members of various congressional committees and officials from 11 education research organizations, including 4 publishers of commercial tests and 6 research/evaluation organizations. We also interviewed officials from two national interest groups concerned with education, and attended conferences of educators and measurement and evaluation experts which were held to improve student assessment or educational program evaluation.

To obtain State and local education agencies' views on Federal education program evaluations and related matters, we sent questionnaires to education agencies in all States and the District of Columbia, and to a statistical sample of local school districts throughout the Nation. The two sets of questionnaires were sent in April 1975 and were returned by June 1975.
The District of Columbia and 49 States responded to our State-level questionnaire. (To simplify questionnaire results in this report, we consider the District of Columbia to be a State.) Appendix I compiles the responses on the State education agency questionnaire. Respondents to section A of the questionnaire were almost always officials responsible for statewide assessment, accountability, and/or testing activities. Respondents to sections B and C were nearly always officials responsible for titles I and III programs, respectively.

Our questionnaire sample for local school districts was largely the same as a national sample used by the Office of Education in 1973. Neither sample included school districts having fewer than 300 pupils; both were stratified according to enrollment as follows: 125,000 pupils or more; 35,000 to 124,999 pupils; 9,000 to 34,999 pupils; 3,000 to 8,999 pupils; and 300 to 2,999 pupils. Nineteen school districts compose the first group--the largest school districts--and all were included in the sample. An independent random sample of 813 school districts was drawn from the remaining groups.

We received responses from 710 (85 percent) of the 832 school districts included in the sample. As a result of the high response rate, the attitudes and opinions expressed in response to our local school district questionnaire are representative of the entire universe of 11,666 such districts in the Nation having 300 or more pupils. However, we projected the responses to a total of 8,936 local education agencies because this method, based on the weighting and the response rates across the various strata in our sample, allows us to obtain the most accurate percentages on the answers given. Local education agency questionnaire results appear in appendix II. The numbers shown there represent the number of local school districts in the Nation to which our local questionnaire sample responses have been projected.
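The stratum-weighted projection described above can be sketched in a few lines. In the sketch below, only the 19-district largest stratum and the 11,666-district universe total come from the report; the other stratum sizes, respondent counts, and answer tallies are hypothetical assumptions for illustration, and the function name `project` is our own.

```python
# Project stratified sample responses to the national universe of districts.
# Each stratum's respondents are weighted by (districts in universe / respondents),
# so strata with different sampling and response rates are represented fairly.
# Stratum figures below are illustrative, except the 19-district largest stratum
# and the 11,666-district universe total, which the report states.

strata = {
    # stratum: (districts_in_universe, respondents, respondents_answering_yes)
    "125,000 pupils or more": (19,   17,   9),
    "35,000 to 124,999":      (150,  90,  40),
    "9,000 to 34,999":        (900, 160,  70),
    "3,000 to 8,999":        (2600, 210, 100),
    "300 to 2,999":          (7997, 233, 120),
}

def project(strata):
    """Return (projected universe total, projected number answering 'yes')."""
    total = yes = 0.0
    for universe, respondents, ans_yes in strata.values():
        weight = universe / respondents  # each respondent "stands for" this many districts
        total += respondents * weight    # equals the universe count, by construction
        yes += ans_yes * weight
    return total, yes

total, yes = project(strata)
print(f"Projected districts: {total:,.0f}; projected 'yes' answers: {yes:,.0f} "
      f"({100 * yes / total:.1f} percent)")
```

Because each stratum carries its own weight, a high-response stratum of small districts cannot swamp the 19 largest districts, which is why percentages computed from the weighted projection differ from raw response tallies.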
Most local education agency respondents to section A of the questionnaire were directors for testing. Most respondents for sections B, C, and D were directors for titles I, III, and VII projects, respectively. However, in some cases superintendents responded to the questionnaire.

Our questionnaires focused on program evaluations of titles I, III, and VII of the Elementary and Secondary Education Act for a number of reasons. The largest Federal emphasis in education has been placed on the attempt to deal with various inequalities in educational opportunity. Programs of this kind have attempted to equalize educational opportunity for groups and individuals who are at a disadvantage (titles I and VII) and to improve the quality and relevance of American education through research and demonstration and dissemination of results (title III). Appropriations for the programs under these three titles are substantial. For fiscal year 1975 they were $2.2 billion, which represented about one-half of all Federal elementary and secondary education program dollars. In addition, Federal legislation and HEW regulations require national, State, and local evaluations for titles I and III, and national and local evaluations for title VII.

To supplement information obtained from the questionnaires, we interviewed education agency officials from five States, the District of Columbia, and 10 local school districts.

CHAPTER 3

OPPORTUNITIES TO IMPROVE OE'S EVALUATION STUDIES

Evidence we gathered indicates that opportunities exist for OE's evaluation studies to better serve the Congress. Specifically, this includes timing the studies better and briefing congressional committee staff more frequently. OE also needs to better define the program objectives to be evaluated and improve the use of evaluation results.

CONGRESSIONAL NEEDS SHOULD BE TAKEN MORE INTO ACCOUNT

According to OE, its evaluations are intended primarily to assist those involved in making educational decisions.
This includes the Congress. In recent years the main emphasis has been on evaluating the national impact or effectiveness of major OE programs.

Decisions on education programs are made at various levels--in the Congress, OE, State education agencies, and local school districts. Decisionmakers at different levels sometimes need different information. For evaluation studies to be most useful to them, the views of those who are to use the results should be taken into account in evaluation planning and design. This increases the chance that their information needs will be adequately fulfilled and that resultant decisions will be well defined. It should also increase the chance that evaluation results will be effectively communicated to those intended to use them. One such user to which OE should give greater attention in providing program evaluation information is the Congress.

To obtain information on how useful OE evaluation studies are to the Congress and how much congressional views are taken into account in those studies, we contacted four key congressional committee staff members responsible for education matters, including the Majority and Minority Counsels for the Subcommittee on Elementary, Secondary, and Vocational Education, House Committee on Education and Labor; the Minority Counsel for the Subcommittee on Education, Senate Committee on Human Resources; and the Chief Counsel for the Senate Appropriations Committee. Two persons interviewed believed that OE does not obtain enough congressional input in designing its evaluation studies. One of the four staff members stated that OE evaluation studies are generally useful and reasonably good. The other three stated that the studies often have been completely ineffective or have had little impact on legislation.
Their most frequently cited reasons were that

-- OE studies are not timed to coincide with the legislative cycle;

-- OE efforts to interpret data and highlight important findings are insufficient; and

-- OE briefings for congressional committee staff are not frequent enough.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed with each of these statements and said that the evaluation office has not been sufficiently sensitive and responsive to congressional needs. He noted that the long leadtime necessary to plan and implement studies contributes to this problem.

OE's Assistant Commissioner for the Office of Legislation said that he considers the poor timing of evaluation studies to be a major factor inhibiting their impact on legislation. He stated that OE's evaluation office has not made the effort necessary to assure that evaluation results are arrived at and communicated soon enough to be considered in developing legislative proposals.

Commenting on our report, HEW stated that it does not concur with one of the three reasons cited above for the limited impact of its studies. HEW believes that its procedure for interpreting and summarizing evaluation study results is not deficient. Brief summaries of each evaluation study are sent to all members of the cognizant House and Senate authorizing and appropriation committees and their staffs, as well as to appropriate HEW Education Division staff and others.

PROGRAM OBJECTIVES NEED TO BE DEFINED

Evaluations can reach conclusive and useful findings about a program's effectiveness only if the program's objectives are defined and measurable. However, our review of selected evaluations and reports, as well as discussions with OE and non-OE experts, showed that one major problem in assessing education programs is the lack of sufficiently defined objectives.
For example, a national evaluation of the migrant program under title I of the Elementary and Secondary Education Act had not adequately assessed the program's effectiveness, an OE migrant program official said, because no one had developed acceptable criteria and objectives for measuring the program's success. In addition, officials from OE's evaluation office stated that the purpose and objectives of title I itself have not been specified clearly. They said that this has made it difficult to evaluate the effectiveness of the title I program.

Other educational researchers have pointed out that it is difficult to establish criteria for Federal programs designed to respond to multiple needs, such as titles I and III of the Elementary and Secondary Education Act. They have stated that when the purposes of a program such as title I or III are ambiguous, various criteria are applied in assessing the program, which leads to noncomparable evaluation results. Likewise, officials representing a major test publisher said that the objectives of compensatory education programs should be clarified.

According to some educational researchers and contractors who compete for OE evaluation contracts, OE studies do not always define clearly the criteria and major questions that should be addressed. For example, a researcher who frequently prepares OE policy studies said that OE evaluation studies do not help in formulating policy because they are not set up to answer the major short-term or long-term questions. Other educational experts stated that the lack of measurable and sufficiently defined objectives often led to evaluation studies that addressed unanswerable questions and produced inconclusive results. They added that the language used in legislation, regulations, policy manuals, plans, and budgets is generally ambiguous and fails to precisely define program objectives and make the evaluation useful.
As mentioned in chapter 1, legislation requires that HEW provide the congressional committees having responsibility for education with an annual report evaluating program effectiveness in achieving legislative purposes. The report is required to set forth goals and specific objectives in qualitative and quantitative terms for all programs and projects assisted which are evaluated and relate those goals and objectives to the program's purposes.

Our review of the annual evaluation report on OE programs for fiscal year 1975 showed that most of its statements of program goals and objectives merely restated the legislative purposes or general goals, and did not set forth specific objectives. Quantitative objectives, even in the broadest sense, were established for very few programs.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed with our observations and said that OE has seldom established specific objectives. He stated that this should be corrected, but that OE sometimes faces opposition from the Congress and others when it specifies objectives. We agree that the Congress has major responsibility for specifying program objectives. In our view, however, the legislative requirement and OE's limited responsiveness to it, as well as the need for providing a clear basis for program evaluation, dictate that OE make a better effort to set forth specific qualitative and quantitative program objectives for congressional consideration.

USE OF EVALUATION RESULTS NEEDS MORE ATTENTION

In 1972, OE's Office of Planning, Budgeting, and Evaluation instituted a procedure which entails drafting and implementing a "policy implication memorandum" to increase the use, in policy and program decisions, of the evaluation findings with which OE concurs. However, the procedure has not been used to its potential. The memorandum procedure was developed to translate the findings of evaluation studies into a list of "action items" for program management.
It represents an attempt to make sure that study results receive proper attention from OE and department decisionmakers. For instance, the Commissioner of Education may use the memorandum as a base for policy decisions. He may selectively direct actions to be taken, the office responsible for implementing them, and their due dates.

According to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, the procedure is one of the most important parts of the whole evaluation process, which encompasses evaluation planning through implementation of results. He believes it is superior to other implementation methods because the implications of evaluation results which OE accepts for implementation, including related followup requirements, are explicitly set forth in areas such as basic policy, budgeting, staffing, and program regulations. Also, the Assistant Commissioner believes, decisions are more likely to be made on action items under such a procedure.

An OE evaluation official said that policy memorandums were to be written after the completion of each evaluation study in which important findings were produced. From March 1972 to March 1974, OE's evaluation unit completed 32 studies, costing a total of about $8.2 million. The Secretary's evaluation office, using education evaluation funds of about $3.5 million, completed 18 studies during a similar period. Although OE considers the policy implication memorandum procedure a key to assuring the use of study results, it was followed on only two evaluation studies completed during this period. 1/ We reviewed its use in both instances--the studies cost $120,000 and $772,000--to ascertain how it affected policy and program changes. The first policy memorandum was dated December 26, 1972; the other August 19, 1974. Nine months elapsed between the first study's completion date and the preparation of the policy memorandum, and 10 months elapsed for the second study.
Several sources doubted the impact of the first study and memorandum. A program official affected by the study, relating to title I of the Elementary and Secondary Education Act, said that there was no evidence showing that memorandum recommendations were being taken seriously or were influential in causing program change. In addition, OE program officials said that they were pursuing some of the recommendations contained in the memorandum prior to availability of the results of the evaluation study. Also, OE's Assistant Commissioner for the Office of Legislation stated that neither the policy memorandum nor the study has had much influence on the process of developing new legislation for title I. He explained that one reason for their lack of impact was the Administration's emphasis on educational revenue-sharing at the time of the study's publication. This emphasis reduced interest in amending existing legislation that might have been substantially eliminated and replaced if the educational revenue-sharing legislation had been enacted.

1/In November 1976, OE's Assistant Commissioner for Planning, Budgeting, and Evaluation stated that a third memorandum had been written.

The evaluation office official who wrote the memorandum said in November 1976 that the evaluation office was not following up on all memorandum recommendations despite the fact that some were still open issues. He agreed that better followup of such open issues should be a part of the policy implication memorandum system. An OE official involved in the evaluation study said that although very few of its recommendations were acted upon, it compiled evidence to support certain conclusions for the first time. A program official stated that if program officials had been consulted about the subject matter, content, and design of the study, they would have been in a better position to use the study's results.
Evaluation office officials disagreed with this view and stated that they extensively involve program officials in designing evaluation studies.

The second policy memorandum contained only one major recommendation which required further action. The recommendation was implemented, and substantially changed program emphasis. The OE project monitor for this study said that a policy implication memorandum was written for it because the procedure was given high priority at the time.

Regarding other evaluation studies for which no policy memorandums were written, OE project monitors gave these explanations:

-- The priority placed on writing the memorandums was not high enough.

-- Evaluation studies had overlapping cycles; therefore, before one was completed another could start, detracting from full appreciation of the earlier one.

-- Delays in receiving study reports could have affected writing the memorandums.

An OE official commented on this situation, stating that because policy memorandums are not being written, meaningful study conclusions fail to reach policy planners and program administrators who have a voice in the legislative process. He felt that the procedure is needed to call attention to the significant recommendations in each study.

In our opinion, delays of nearly 1 year before the two policy memorandums were written and approved and the general lack of such memorandums clearly point to the need for more OE emphasis on assuring increased use of the evaluation findings with which OE concurs. This includes giving a higher priority to policy implication memorandums or some other procedure for achieving this purpose. OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed that OE has not successfully carried out the policy implication memorandum system. Although he believes the memorandums are of central importance, other priority matters have preempted staff time.
CONCLUSIONS

Opportunities exist for OE's evaluation studies to better serve the Congress. These include better timing of the studies and more frequent briefing of congressional committee staff. There is a need to better define the program objectives to be evaluated. The use and implementation of evaluation results also need more attention.

RECOMMENDATIONS TO THE SECRETARY OF HEW

We recommend that the Secretary of HEW direct OE to:

-- More strongly emphasize the purpose of providing information to the Congress when planning, implementing, and reporting on evaluation studies. In particular, more attention should be given to timing the studies so that they more clearly coincide with the legislative cycle and briefing congressional committee staff more frequently.

-- Take steps to comply with the General Education Provisions Act (20 U.S.C. 1226c) requirement that, in the annual evaluation report to the House Committee on Education and Labor and the Senate Committee on Human Resources, HEW set forth goals and specific objectives in qualitative and quantitative terms for all programs which are evaluated. OE should indicate in the evaluation report that it is setting forth specific objectives tentatively in response to the congressional requirement and as a basis for future discussion and agreement with the committees on program evaluation matters. These matters should include the acceptable evaluation data needed by congressional decisionmakers and the measures to be used. If HEW still does not intend to comply with this requirement, it should propose legislative changes to the Congress to avoid continued agency noncompliance. In the meantime OE should initiate dialogues with the appropriate House and Senate committees to seek understanding and agreement on the specific program objectives to be used for evaluations as well as acceptable evaluation data and measures for each program to be evaluated.
-- Improve the implementation of evaluation results by giving greater attention and priority to procedures such as the issuance of policy implication memorandums designed to assure implementation of those results.

AGENCY COMMENTS AND OUR EVALUATION

HEW commented on matters discussed in this report in a letter dated June 15, 1976. (See app. III.)

HEW agreed with our recommendation that OE should more strongly emphasize meeting congressional information needs by timing the studies to better coincide with the legislative cycle and briefing congressional staff more frequently. HEW stated that it has initiated a series of reviews of its studies, focusing on predicted production dates for findings and recommendations in relation to critical dates for legislative input. HEW also stated that the need for congressional committee staff briefings has not been given proper attention but that it has recently decided to institute such briefings on all major evaluation studies and will initiate this procedure in the coming weeks.

Regarding our recommendation that OE should improve the implementation of its evaluation results, HEW agreed and stated that the policy implications memorandum procedure, which is an invention of OE's evaluation office, has not been used nearly as extensively as it should have been. HEW said that efforts are currently underway to expand the use of these memorandums and that agency officials are now conducting periodic reviews of the schedule for producing the memorandums and emphasizing their high priority.

Our draft report proposed that OE better define the program objectives to be evaluated as required by the General Education Provisions Act (20 U.S.C. 1226c). This includes translating the legislative purposes of individual programs evaluated into specific qualitative and quantitative program objectives, and clearly stating these objectives, and the progress made toward achieving them, in the annual evaluation report.
HEW disagreed with this proposal because it believes there are limits on OE's "authority and ability" to increase the clarity and specificity of its program objectives. HEW's comments, however, ignored the fact that the General Education Provisions Act requires the agency to set forth such objectives for individual programs included in its annual evaluation report to the House Committee on Education and Labor and the Senate Committee on Human Resources.

HEW said that in most cases legislation fails to state a program's objectives with sufficient clarity for evaluation. It appears that the Congress has also recognized that program legislation does not provide sufficiently clear and specific objectives for evaluation; therefore, in the General Education Provisions Act it has required HEW to set these forth in a report to the appropriate congressional committees. HEW's comments did not respond to its noncompliance with this requirement, although OE's Assistant Commissioner for Planning, Budgeting, and Evaluation agreed with our finding on this matter. (See p. 17.)

HEW stated that it "proceeds at considerable peril in trying to further specify legislation" and that "* * * in many cases it has been the Congress' specific intention to avoid specification of program objectives and leave such judgments and decisions up to State and local officials." However, in conducting national program evaluations the Office of Education is often implicitly establishing and using specific program objectives; for example, standardized tests, frequently used in these evaluations, are based on specific instructional objectives. We believe there is an important distinction between specific program objectives explicitly set forth for Federal evaluations and those which would be established to dictate specifically to State and local education agencies how to design and run their programs which use Federal funds. If OE cannot explicitly set forth specific program
objectives it would use for Federal evaluation purposes, then we believe it is inconsistent to conduct national program evaluations which contain such objectives implicitly. We agree with HEW that there is political opposition, but we believe such opposition is really directed toward Federal infringement on State and local education agency prerogatives. Such opposition effectively restrains the Federal agency from trying to dictate State and local program goals, specific objectives, approaches, etcetera.

In our view, HEW should comply with the requirement, but in doing so it should clearly indicate in the evaluation report that the specific objectives are tentative, are being set forth in response to the legislative requirement, and are intended only for congressional scrutiny and as a basis for mutual discussion and agreement on program evaluation matters, including the acceptable evaluation data needed by congressional decisionmakers.

In its general comments on our report HEW stated its belief that there is growing professional opinion that OE's studies have, over the past 10 years, been responsible for many major changes in existing legislation. Also, in HEW's view the assumption that there are certain decisionmakers, and that effective evaluations provide timely data to them, is increasingly being questioned. Instead HEW believes that effective evaluations affect the broad political climate within which particular decisions are made.

Obviously, HEW believes that OE studies are affecting legislative decisions. However, as discussed in our report, three of the four key congressional committee staff members interviewed, who are responsible for education matters, said that the studies often have been completely ineffective or have had little impact on legislation. Therefore, we continue to believe that the primary purpose of these evaluations should be to provide useful information to decisionmakers.
RECOMMENDATION TO THE CONGRESS

The Congress should recognize that HEW is not in compliance and does not intend to comply with the General Education Provisions Act requirement (20 U.S.C. 1226c) as noted above. HEW feels that its authority and ability to comply with this legislative requirement are limited. The Chairmen of the House Committee on Education and Labor and the Senate Committee on Human Resources should discuss this matter further with agency officials to seek a common understanding with them on the process or approach to be used for (1) clarifying program objectives for evaluation and (2) reaching agreement on acceptable evaluation measures and data for each program to be evaluated.

CHAPTER 4

PROBLEMS WITH STATE AND LOCAL EVALUATION REPORTS

A large amount is spent to evaluate State education agency title I and III, and local education agency title I, III, and VII elementary and secondary education programs. Agency officials at these levels, responding to our questionnaires, indicated a need to improve program evaluation reports, including these important areas for determining program effectiveness: the credibility of findings and the qualification and quantification of measurement data. Other areas needing attention and improvement to make the State and local evaluation reports more useful include

-- the relevance of the reports to policy issues,

-- the completeness and comparability of the data reported, and

-- report timeliness.

The significance of these problems and other factors raise a question about whether the present reporting systems based on aggregated local data can be improved so that they meet program information needs at Federal, local, and/or State levels.

STATE AND LOCAL EVALUATION EXPENDITURES

In addition to the funds authorized for program evaluation at the Federal level (see ch.
1), the Elementary and Secondary Education Act requires annual State and local education agency evaluations of title I and III programs, and local evaluations for title VII. The following tables show--on the basis of State and local agency responses to our questionnaire--estimates of the evaluation funds expended by State and local education agencies for the programs during fiscal year 1974.

Estimated State-level Expenditures for Evaluating Selected Elementary and Secondary Education Programs, Fiscal Year 1974 (note a)

                                              Title I      Title III
State-level program expenditures
  reported for evaluation                  $ 2,066,020    $1,723,805
Average State program grant                 24,520,132     2,127,455
Average expenditures on evaluation
  per State                                     43,958        39,177
Average percent of grant spent for
  evaluation (note b)                             1.2%          5.4%

a/All amounts are based on unverified questionnaire responses from 47 States for title I and 44 States for title III.

b/The percentages shown are based on the averages of the percent of grant funds reportedly spent for evaluation by the States. However--overall, two-tenths of one percent of title I and 1.8 percent of title III funds were reportedly spent for evaluation. The differences between these percentages and those shown are because larger percentages of the smaller grants were generally used for evaluation.

Estimated Local-level Expenditures for Evaluating Selected Elementary and Secondary Education Programs, Fiscal Year 1974 (note a)

                                           Title I     Title III     Title VII
Local project expenditures
  reported for evaluation               $31,790,960   $5,089,344    $1,574,320
Average project grant per local
  district grantee                          161,417       50,240       164,705
Average evaluation expenditures               3,860        2,101         4,537
Average percent of grant spent
  for evaluation (note b)                      6.4%         5.0%          3.1%

a/All amounts are based on unverified questionnaire responses from local school district respondents representing 8,236 title I, 2,422 title III, and 347 title VII projects.
b/The percentages shown are based on the averages of the percent of grant funds reportedly spent for evaluation by the local projects. However--overall, 2.4, 4.2, and 2.8 percent of title I, III, and VII funds, respectively, were reportedly spent for evaluation. The differences between these percentages and those shown above are because larger percentages of the smaller grants were generally used for evaluation.

OE officials stated that OE does not collect State and local education agency data on evaluation expenditures.

STATE AND LOCAL EVALUATION REPORTS NEED IMPROVEMENT

Our questionnaires asked State and local officials connected with administering title I, III, and VII programs under the Elementary and Secondary Education Act to rate the adequacy of these aspects of the State and local evaluation reports: credibility of findings, presentation of required management information needs, qualification of findings, qualification and quantification of measurement data, and focus and scope. In our opinion, the adequacy of these aspects is likely to greatly influence how much the reports satisfy the policy, management, and program information needs of State and local officials.

26

In most cases, more respondents rated the various aspects of the evaluation reports in the "adequate or better" categories than in the "less than adequate" categories. However, in each aspect, many respondents to the questionnaires indicated that State and local evaluation reports were inadequate; most of these ratings were in the "marginal" category. In addition, substantial numbers of State and local officials rated State and local evaluation reports to be less than adequate in these important areas for determining program effectiveness: the credibility of findings and the qualification and quantification of measurement data. In our opinion, such large numbers of less-than-adequate ratings indicate a serious need for improvement in the reports.
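The distinction drawn in notes b to the expenditure tables above--an average of the per-grantee percentages versus the percentage of total funds--can be sketched with a few hypothetical grant figures (the dollar amounts below are illustrative only, not taken from the questionnaire data):

```python
# Hypothetical grants and evaluation spending for three grantees.
# The smallest grants spend the largest share on evaluation, as the
# report observes was generally the case.
grants = [1_000_000, 100_000, 50_000]
evals = [2_000, 3_000, 3_000]

# Average of the per-grantee percentages (what the table rows show):
avg_of_pcts = sum(e / g for e, g in zip(evals, grants)) / len(grants) * 100

# Overall percentage of total funds (what notes b report):
overall_pct = sum(evals) / sum(grants) * 100

# Because small grants carry equal weight in the simple average but
# little weight in the totals, avg_of_pcts exceeds overall_pct.
```

With these figures the simple average works out to about 3.1 percent while the overall figure is about 0.7 percent, reproducing the direction of the gap the notes describe.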
The following table summarizes the respondents' less-than-adequate ratings of State evaluation reports.

State Evaluation Reports: Summary of Respondents' Less-Than-Adequate Ratings (note a)

                                    State program     Local project
                                    officials of      officials of
                                    title:            title:
                                      I     III         I    III    VII
                                            (percent)
Credibility of findings              59      40        45     36     61
Presentation of required
  management information needs       59      40        48     38     59
Qualification of findings            47      47        50     39     69
Qualification and quantification
  of measurement data                57      55        49     42     51
Focus and scope                      42      43        38     33     60

a/Percentages for State program officials are in all cases based on questionnaire responses from 48 or 49 States for title I and between 45 and 47 States for title III. Percentages for local project officials are based on sample responses and in all cases represent more than 8,300 title I projects, more than 2,350 title III projects, and about 300 title VII projects. See app. I, questions 9-13 and 20-24, and app. II, questions 9-13, 21-25, and 33-37.

The following table summarizes the questionnaire respondents' less-than-adequate ratings of local evaluation reports:

27

Local Evaluation Reports: Summary of Respondents' Less-Than-Adequate Ratings (note a)

                                    State program     Local project
                                    officials of      officials of
                                    title:            title:
                                      I     III         I    III    VII
                                            (percent)
Credibility of findings              67      60        38     31     47
Presentation of required
  management information needs       71      56        39     31     42
Qualification of findings            69      60        41     34     39
Qualification and quantification
  of measurement data                57      58        44     40     41
Focus and scope                      50      40        30     25     43

a/Percentages for State program officials are in all cases based on questionnaire responses from 48 or 49 States for title I and 48 States for title III. Percentages for local project officials are based on questionnaire sample responses and in all cases represent more than 8,500 title I projects, more than 2,400 title III projects, and about 350 title VII projects. See app. I, questions 9-13 and 20-24, and app.
II, questions 9-13, 21-25, and 33-37.

Credibility of findings

The questionnaire defined this aspect as the degree of confidence expressed in the findings through statements about statistical certainty, soundness of method, evidence of replication, consensual agreements, similar experiences, supporting expert judgment and opinions, and reasonableness of assumptions.

As the table on page 27 shows, between 36 and 61 percent of State and local respondents from the various programs rated the credibility of findings in their program's State evaluation reports to be less than adequate. As the table above shows, the percentage of State and local respondents from the various programs that rated local evaluation reports to be less than adequate in this aspect ranged from 31 to 67 percent.

28

Presentation of required management information needs

The questionnaire defined this category as the extent to which the reports are informative to those who evaluate and update current policies and transfer policy decisions into plans, budgets, curriculum or program implementation, operational oversights, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments.

The percentage of State and local respondents finding this aspect of their program's State evaluation reports to be less than adequate ranged from 38 to 59 percent. Similarly, for local evaluation reports, the range was from 31 to 71 percent.

Qualification of findings

The questionnaire defined this aspect as the extent to which the reports properly qualify the findings and assumptions and identify those conditions and situations to which the findings are not applicable. The percentage of State and local respondents rating this aspect of their program's State evaluation reports to be less than adequate ranged from 39 to 69 percent; for local evaluation reports, the percentage ranged from 34 to 69 percent.
Qualification and quantification of measurement data

The questionnaire defined this category as the extent to which the evaluation assessments can be qualified and quantified into measurable attributes and parameters that address the problem in measurable, operational, or concrete terms. The percentage of State and local respondents rating this aspect of their program's State evaluation reports to be less than adequate ranged from 42 to 57 percent; for local evaluation reports, the range was from 40 to 58 percent.

Focus and scope

The questionnaire defined focus and scope as the adequacy with which the reports covered essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics and details and high- and lower-priority information.

29

The percentages of State and local respondents rating this aspect of their program's State evaluation reports to be less than adequate ranged from 33 to 60 percent; for local evaluation reports, the range was from 25 to 50 percent.

Other questionnaire results

Only 61 percent of State title I officials and 62 percent of State title III officials said that local evaluation reports "generally" or "very often" adequately show evidence of qualifiable or measurable student benefits; and only 64 percent of State title I officials and 50 percent of State title III officials said that State evaluation reports are generally or very often adequate in this respect.

Although over 75 percent of the State respondents said they use local and State evaluation results for policy, programmatic, or management decisions, the only data contained in State and local reports which was frequently found adequate by State officials was

-- the number of children in the program and

-- the per-pupil expenditures.
Most local school district respondents stated that their reports are generally or more often than generally adequate in providing essential information on the

-- number of children in the program,

-- per-pupil expenditures for each program,

-- evidence of quantifiable and measurable achievement, and

-- evidence of quantifiable or measurable pupil benefits.

OTHER PROBLEMS

To be useful in making decisions at the Federal level, State and local evaluation reports should be timely, complete, comparable, and relevant to policy issues. Among the Elementary and Secondary Education Act State and local evaluation reports submitted to OE that we reviewed, most did not meet any of these criteria.

30

Three factors need to be considered:

-- The significance of these and other problems discussed in this chapter.

-- Constraints on the Federal role in education as discussed in chapter 1. (See pp. 8 to 10.)

-- Questions about whether ongoing efforts to resolve these problems with evaluation models for title I will be effective.

If the questions about the models are not resolved, the feasibility of producing improvements that will enable the present reporting systems based on aggregated local data to provide the information needed is questionable.

Usefulness and relevance to policy issues

Evaluation reports have the best chance of affecting policy decisions if they are designed to directly address policy issues. The reports should, for example, indicate programs' or projects' successes and failures.

In relation to this, OE officials generally said that State and local evaluation reports were rarely used to support operational or policy changes. State compensatory education program officials made similar statements, saying that State and local evaluations are of limited usefulness to those making decisions. In addition, OE's Assistant Commissioner for Planning, Budgeting, and Evaluation stated that there is no question that the State and local evaluation reports are not useful.
Similarly, two OE-contracted research studies question the usefulness of State and local evaluation reports:

-- A study analyzed the policy-relevance rating of title I State evaluation reports for the 5 fiscal years 1969-73. Study results revealed serious problems concerning the validity of data reported by most States, precluding any meaningful interpretation of data. The study noted that the majority of the reports examined were seriously deficient in reporting policy-relevant information.

31

-- A 1974 study of a nationwide sample of title VII evaluations concluded that no strong relationship could be established between the content or quality of evaluation reports and funding levels awarded to projects. It also concluded that none of the evaluations presented data which would indicate project failure, noting that such information is essential if a report is to be useful to decisionmakers. The study also noted that over half the evaluations it reviewed were of limited or no use for making judgments about project effectiveness.

Our review of selected evaluation reports and discussions with OE program officials generally confirmed that the State and local education agency reports, because of problems cited in this chapter, have little effect on Federal-level decisions.

Complete and comparable data in reports

Evaluation of Federal programs at the local level should produce reports containing valid, complete, and comparable results if data from each report is to be aggregated to provide Federal policymakers and program administrators with a good perspective on how well the program as a whole has worked and which approaches have produced the best results.

Response to congressional requirement

The Congress has recognized the need for OE to make State and local education agency evaluation reports more usable.
The Education Amendments of 1974 (Public Law 93-380) directed the Commissioner of Education to carry out certain evaluation activities for programs authorized by title I of the Elementary and Secondary Education Act. Section 151 of the act directs the Commissioner to

-- provide for independent evaluations which describe and measure the impact of programs and projects,

-- develop and publish standards for evaluation of program or project effectiveness,

-- provide models for evaluation to State education agencies,

32

-- provide such technical and other assistance as may be necessary,

-- specify objective criteria which shall be utilized in the evaluation of all programs, and

-- outline techniques and methodology for producing data which is comparable on a statewide and nationwide basis.

OE has begun activities to address each of these requirements, and evaluation models are being developed and refined. OE plans to require State and local use of the models (or use of alternatives that the Commissioner certifies will generate compatible evaluation data). However, OE Planning and Evaluation Office officials have stated that they do not believe that evaluation needs at the local, State, and Federal levels can all be met by the same approach. In addition, these officials, including OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, told us that although the models will make aggregation of locally collected data theoretically possible, accomplishing successful implementation for 14,000 school districts is doubtful, or at least questionable. They noted that the problems that need to be overcome are methodological, fiscal, and political. For example, some State and local officials do not want comparisons of educational results.
An OE evaluation official said further that the models are based on assumptions about such things as the common metric (scale) used, the soundness of the tests employed, and whether those tests are similar enough to provide data that is truly comparable; these assumptions represent compromises, and whether they will satisfy everybody is unclear. He stated that because of its technical nature, the information based on the models may be provided to those who are responsible for decisions without an explanation of the assumptions involved. We believe that such information, when provided to the Congress and other users, should set forth these assumptions as clearly as possible.

OE's evaluation office is offering these types of technical assistance on title I: written handbooks on evaluation topics, such as the models; technical assistance workshops to train State administrators and evaluators in using the models and to prepare them to train local school district personnel; and consulting services to States to help them use the models.

33

Technical assistance centers have been established throughout the country under OE contracts to provide these services. According to OE, such services might include writing computer programs, conducting workshops to train local personnel, and helping with data analysis. To alleviate many of the States' staffing and technical problems in using the models, OE encourages States to call upon the technical assistance personnel to solve problems at the State level and, depending on the desires of each State, at the local school district level, too.

Need to improve data

As stated earlier, OE program officials and State education agency officials feel that State evaluation reports are generally not useful for making management decisions; both expressed the need for uniform evaluation methods which would lead to comparable data reporting.
When local title I officials were asked whether or not local districts could compare the data from their local reports with data on the same program contained in State and Federal reports, 49 percent said they could do this only occasionally or seldom with State reports, and 64 percent replied similarly regarding Federal reports. Corresponding results were obtained for titles III and VII.

The results of a 1972 OE-contracted study done by an educational research firm to identify successful State programs and local projects in compensatory education illustrate the lack of reliable, comparable data. According to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, the study was able to identify only "a discouragingly low number of successful projects," because many projects did not have an evaluation design good enough to produce reliable data on cognitive results and many projects were poorly designed, poorly managed, or badly implemented. Furthermore, the study concluded that the lack of representative data from each State that could be combined in a meaningful way made it extremely difficult to address the effectiveness of the Elementary and Secondary Education Act title I program.

In an attempt to provide a means for Federal, State, and local education agencies to develop comparable evaluative data, OE, through the National Center for Educational Statistics (which was part of OE at that time), funded the "Anchor Test Study." The study was intended to develop tables and procedures for equating test scores among the eight most widely used standardized reading tests for fourth, fifth, and sixth grade children.

34

OE developed the basic plan and the detailed specifications for the study, which was intended to become an integral part of Federal elementary and secondary education program evaluation.
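The kind of score "equating" the Anchor Test Study tables provide can be illustrated in a simplified linear form. The means and standard deviations below are hypothetical, and the study's actual equating procedures were considerably more elaborate; this sketch only shows the underlying idea of restating a score from one test's norms on another test's scale:

```python
def linear_equate(score, mean_from, sd_from, mean_to, sd_to):
    # Express the score as a position relative to its own test's
    # norm group, then restate that position on the second test's scale.
    z = (score - mean_from) / sd_from
    return mean_to + z * sd_to

# Hypothetical norms: test A (mean 50, sd 10), test B (mean 200, sd 25).
# A raw score of 60 on test A is one standard deviation above A's mean,
# so it maps to one standard deviation above B's mean.
equivalent = linear_equate(60, 50, 10, 200, 25)
```

Tables built this way would let a district report results on whichever of the equated tests it chose, while still producing scores that could be combined at the State and Federal levels.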
OE expected the study to be useful at all levels of educational administration; that is, evaluative results would become useful to State and Federal Governments because program evaluation test data could be combined in a meaningful way. In addition, local school systems could have flexibility in selecting achievement tests to be used in their evaluations.

The results of the Anchor Test Study became available in September 1974. According to our questionnaire, however, as of April-June 1975 all of the 282 projected local education agency respondents indicated they had little or no information about the study. Although the great majority of State respondents knew about the study, only 18 percent had used it.

Test publishers and educational evaluators informed us that the study was technically excellent and potentially useful, but OE had not adequately planned for its use and dissemination. They also noted that the study's usefulness is diminishing as new versions of those tests included in the study are developed and published. State and local education officials interviewed said that the study has had little impact on their evaluation efforts primarily because OE did not pursue its implementation; little effort has been made to direct, encourage, or promote the use of the study so that evaluation results could be made more comparable.

Timeliness

OE often does not receive annual State evaluation reports in a timely manner. For instance, 1 month before the end of fiscal year 1975, less than half of the fiscal year 1974 State title I elementary and secondary education program evaluation reports had been received. Notwithstanding OE's attempts to obtain delinquent reports, several States are up to 2 years behind in submitting theirs.

Delinquent reporting helps to prevent meaningful aggregation of State evaluation reports into a national picture of program effectiveness that can affect Federal decisions on funding and program operation.
According to an OE program official, some of the more significant reasons they have been given by States for late filing include delays due to

-- inclusion of summer program results,

-- uncooperative local education agencies,

35

-- differences between OE, State, and/or local agency personnel over what information the evaluation reports should contain,

-- low State priority for the programs or their evaluation, and

-- schedule slippages in printing, computer, and staff processing of the reports.

CONCLUSIONS

The amount of funds spent to evaluate State education agency titles I and III, and local education agency titles I, III, and VII elementary and secondary education programs is substantial. If the reporting systems based on aggregated local agency data are to be effective, there is a need to improve the adequacy of various aspects of the evaluation reports, including two areas important in determining program effectiveness:

-- The credibility of findings.

-- The qualification and quantification of measurement data.

The usefulness of the State and local evaluation reports also needs improvements in

-- report relevance to policy issues,

-- data completeness and comparability, and

-- report timeliness.

Valid, complete, and comparable evaluation data is important for results to meaningfully contribute to decisions at all levels. If local and State evaluation data continues to be aggregated for use at higher levels, standardization of data collection efforts and techniques is needed to provide comparable results.

Because of the constraints on the Federal role in education, as discussed in chapter 1 (see pp. 8 to 10), in our opinion it is unlikely that the Federal Government would seek to provide the valid and comparable data needed by dictating uniform evaluation methods and procedures to State and local education agency grantees.
36

In addition, questions exist about whether the title I evaluation models will be able to provide valid and acceptable data to meet program information needs. If these issues are not resolved, we question whether improvements can be made that will enable the reporting systems based on aggregated local data to meet program information needs at Federal, local, and/or State levels.

RECOMMENDATION TO THE SECRETARY OF HEW

We recommend that the Secretary direct OE to assess whether State and/or local evaluation reports for titles I and VII of the Elementary and Secondary Education Act can be improved so that they supply officials at Federal, local, and/or State levels with the reliable program information they need for decisionmaking. This includes assessing the adequacy of the title I evaluation models and related data.

-- If OE determines that it is unrealistic to expend resources on improving these programs' State and local evaluation reporting systems based on aggregated data, then it should take the needed steps to adopt more feasible and effective approaches. This should include eliminating unwarranted reporting requirements and, if necessary, proposing to the Congress any legislative changes needed to accomplish this.

-- If, however, these reporting systems are continued, OE should more strongly emphasize improving the adequacy of State and local evaluation reports for title I and local evaluation reports for title VII. This should be done by giving greater management attention and priority to making reports more complete and comparable, relevant to policy issues, timely, credible, and adequate in the qualification and quantification of their measurement data.

AGENCY COMMENTS AND OUR EVALUATION

HEW stated that it concurs with the general thrust of our draft report recommendations for chapters 4 and 7; however, it also stated that most of the actions we proposed are already underway, many by legislative mandate, and in some cases they are near completion.
Our report extensively discusses these actions underway. (See pp. 32 to 34.) HEW's statement implies that the actions underway are likely to be successful. Statements by OE evaluation office officials directly contradicted this implication and questioned the practical feasibility of aggregating local-level data. (See p. 33.)

37

We concluded in chapter 4 that there are issues which raise questions about whether the evaluation models will be able to provide valid and acceptable data to meet program information needs. Therefore, we believe that OE needs to assess the effect of the problems connected with its "actions underway" in relation to the feasibility and cost-effectiveness of improving the reporting systems to provide the reliable information needed for decisionmaking at Federal, State, and local levels.

HEW also said that our understanding was incomplete regarding the much needed and legislatively mandated actions to improve State and local evaluations and reporting. In clarifying the meaning of this statement in relation to title I, HEW stated that we should have provided more information relating to the workshops, technical assistance centers, and information dissemination activities relating to the evaluation models mandated by the Education Amendments of 1974. We disagree. Our report discusses each of these issues. (See pp. 32 to 34.) Appendix III provides HEW's additional information on these matters.

Specifically regarding title I, HEW stated that once the models and revised reporting system are in place across the Nation, their use should produce data which can be aggregated across school districts and States. At that time, HEW believes that OE will be in a better position to assess whether or not the data are sufficiently free of systematic errors to support satisfactory aggregations to the State and national levels.
If they are not, HEW stated it can then determine whether technical problems could be overcome or whether different kinds of studies are needed to satisfy Federal, State, and local reporting requirements.

We believe that to foster maximum efficiency and economy, OE should make the needed assessment as soon as it has a good enough understanding, if it does not already, of such factors as the likely validity, reliability, and comparability of the reporting system data as well as the other methodological, fiscal, and political problems involved.

HEW also stated that OE has interviewed personnel in Federal policymaking roles and received information from an advisory title I group (which included parent representatives) on the kinds of information that should be included in the annual State and local evaluation reports. In addition, evaluation models and their reporting forms were developed and reviewed by each State agency and three of its local agencies to identify possible problems.

38

As a result of all these efforts, a limited core of essential information was identified as "desirable at the Federal level" and this will become the Federal evaluation requirement when regulations for this portion of the legislation are published.

We commend efforts to identify the Federal information requirements. However, we remain concerned about the identification of State and local agency information requirements and whether these needs can all be met by the same reporting system. We are also concerned about the possible unnecessary duplication of using both this reporting system and OE national evaluations on these programs to meet these Federal information requirements.

With respect to title VII, HEW stated that local evaluation reports can be improved and cited the following steps intended to accomplish this:

-- HEW has recently published regulations strengthening requirements for bilingual project evaluation.
-- The National Institute of Education and OE have a joint project underway to upgrade the technical expertise of local evaluators.

Although HEW indicated that aggregating local title VII data is more difficult than for title I, aside from concurring with the general thrust of our recommendation, it did not respond to our proposal for OE to assess whether it is realistic to try to improve local title VII reports so that they supply Federal-level decisionmakers with the reliable information they need.

General comments

In addition, HEW provided several general comments. These observations and our responses follow.

HEW comment

HEW stated that the report needs to give more careful consideration to evaluation costs and that the quality of data the report "appears to expect" would require significant additional resources which would be high in relation to the possible payoffs through program improvements.

39

Our response

These comments ignore the cost-effectiveness considerations of our recommendations. In addition, HEW has misinterpreted what our report expects. Although HEW comments did not recognize or respond to this proposal, our draft report proposed that OE assess whether it is realistic to expend additional resources on improving the State and local reporting systems for titles I and VII. If OE determines that it is not realistic, then we proposed that it initiate action to eliminate unwarranted reporting requirements, including proposing to the Congress any needed legislative changes. We would certainly expect cost-effectiveness considerations to be a part of OE's assessment.

Our draft report also proposed (see p. 76) that in connection with this assessment, OE determine (1) whether it is realistic to try to serve Federal, State, and local levels with information based on local agency evaluations and (2) how the information needs at Federal, State, and local levels can best be met. HEW comments also did not recognize or respond to this proposal.
These proposals certainly do not require "significant additional resources" on reporting systems. In fact, they question the value of present and proposed expenditures and suggest that OE face this issue. We recommend this assessment because it is not clear and has not been demonstrated that the reporting systems based on aggregated local-level data are now providing or can be made to provide valid, useful, and cost-effective data to Federal, local, and/or State decisionmakers. In our view, significant additional resources should not be expended until there is some solid evidence that they would be cost-effective.

HEW comment

HEW stated that the report does not give adequate recognition to whether the tradeoffs in improved program quality are likely to justify additional spending.

Our response

Once again, HEW has misinterpreted our draft report proposals. As discussed above, we proposed that OE make the needed assessment to determine the realism of trying to improve these reporting systems. We believe that such an assessment, if properly conducted, would necessarily include consideration of the tradeoffs involved. In our view it is the agency's responsibility to make such assessments.

40

This is especially true in this situation where, as discussed in this chapter, not only have OE-funded studies shown significant problems but OE evaluation office officials have expressed serious questions about the feasibility of the approach currently being followed to solve these problems.

HEW comment

HEW took exception to our statement that the "amount of funds spent to evaluate State and/or local education agency titles I, III, and VII elementary and secondary education programs is substantial," saying that the percentages of funds involved at the State level are not substantial.

Our response

The amount that we intended to refer to is the total reportedly spent not only on State evaluations for titles I and III, but also on local evaluations for titles I, III, and VII.
This totals in excess of $42 million--this is a substantial amount.

41

CHAPTER 5

USING STANDARDIZED TESTS

State and local education agencies generally use standardized "norm-referenced" tests to measure the effectiveness of Federal elementary and secondary education programs. OE's national evaluations also frequently use these tests.

Most State and local respondents to our questionnaires believe there is a substantial or very great need for increased efforts to develop alternatives to standardized norm-referenced tests, such as "criterion-referenced" tests. Many of these officials also believe that increased efforts are needed to reduce racial, sexual, and cultural biases in tests. The questionnaire results indicated a lack of awareness among State and especially local agency officials of information available to help them select appropriate standardized tests.

USES AND IMPLICATIONS OF STANDARDIZED NORM-REFERENCED TESTS

Standardized norm-referenced tests were devised to measure the status of an individual in relation to other individuals--the norm group. The score an individual receives has meaning in relation to the performance of the norm group, not the educational objectives involved; therefore, such tests are described as norm-referenced.

A standardized norm-referenced test differs from other tests given by schools in that it (1) is almost always constructed by specialists in educational testing, (2) has explicit instructions for standard or uniform administration, and (3) has norms for interpreting test results. These norms have been derived from giving the test to a sample of persons intended to represent the whole group for whom the test is designed.

There are many kinds of standardized tests given in schools, business, and the military services: intelligence, academic aptitude, achievement, personality, attitude, interest inventory, and vocational aptitude tests.
In discussing tests, this report deals almost exclusively with achievement tests--those which measure current knowledge or competencies. Most of the standardized tests that children take in school are of this kind. Today the typical schoolchild takes one to three such tests every year.

Five or six companies account for about three-fourths of the total test sales in the country. These companies have all been in the testing business for a long time; most helped pioneer the testing field in the 1920's. The companies all sell a wide variety as well as a large volume of tests (most list more than 100 tests in their catalogues), and provide extensive services to test customers. These factors contribute to the widespread use and acceptance of the tests.

Standardized norm-referenced achievement tests are used to evaluate both individuals and programs. For student evaluations, the tests are used to rank or compare students for such purposes as counseling them, assigning them to a class within a grade or a group within a class, assigning students to a special program (for example, for the retarded or gifted), indicating the kind of courses a student may take in junior high school or high school, or gaining admittance to college or graduate school. In relation to program evaluations, standardized tests are used widely at the Federal, State, and local levels to determine the effectiveness of OE programs such as titles I, III, and VII of the Elementary and Secondary Education Act.

WIDESPREAD STATE AND LOCAL USE OF STANDARDIZED NORM-REFERENCED TESTS TO EVALUATE FEDERAL PROGRAMS

Most OE-funded evaluations of Federal education programs at the national, State, and local levels have at least one common purpose: they are to measure and report on the effectiveness of federally funded programs and projects.
Many State and local evaluations are congressionally mandated, and national evaluations, according to OE evaluation officials, are conducted either because of a congressional request or to provide responsible agency officials and congressional members with nationally comparative data about a particular program or approach to education. These national evaluations frequently use standardized tests.

The General Education Provisions Act (20 U.S.C. 1231a) requires OE to collect information intended to objectively measure the effectiveness of education programs and permits local education agencies to use systematic measurement approaches, approved by the Commissioner of Education, that will assure adequate evaluation of each program.

Our questionnaire asked State education agencies which of several techniques they employed for their 1973-74 evaluations of projects funded through titles I and III of the Elementary and Secondary Education Act. The following table shows the results for respondents in the 48 States that answered this question:

              State Education Agency Techniques for
                  Evaluating Titles I and III

                                        Title I          Title III
                                            Percent           Percent
Techniques                         Number   (note a)  Number  (note a)

Aggregation and analysis of
  data from local education
  agency reports                     45        94       41       85
Educational audits and their
  results                             9        19       27       56
Statewide testing of students        11        23        5       10
Other                                 5        10       12       25

a/Does not total 100 percent because some States used more than one technique.

The distribution of responses shows that some States used a combination of techniques, and most States aggregated local education agency data. Only a small number of States employed statewide testing to evaluate their programs. However, the great majority of those that did test statewide in their program evaluations used standardized norm-referenced tests.
Because most States perform their evaluations by aggregating and analyzing data from local education agency reports, it is very important that local education agencies use test measures which reflect meaningful and comparative data. Our questionnaire asked local education agencies to indicate which types of tests they used in their 1973-74 evaluation of Federal programs funded through the Elementary and Secondary Education Act, titles I, III, and VII. The following table shows their responses.

         Types of Tests Used by Local Education Agencies (note a)

                                                            Title VII
                          Title I           Title III        (note b)
Type of Test          Number  Percent   Number  Percent  Number  Percent

Standardized norm-
  referenced tests     8,103    94.4     1,380    79.1     357     90.7
Criterion-referenced
  tests (note c)       2,110    24.6       424    24.3     148     37.5
Other tests            1,168    13.6       515    29.5      91     23.0

a/Local education agency figures are projected on the basis of a statistical sample of local agencies. (See app. II.) The percent columns do not total 100 percent because some local agencies used more than one type of test.

b/Title VII provides for direct OE grants to local agencies without State-level administration.

c/Tests which are designed and scored in relation to specific learning objectives or behaviors and include an explicit statement of performance standards.

The local education agency responses show predominant use of standardized norm-referenced tests for program evaluation, but also relatively large use of criterion-referenced tests.

Frequent use of criterion-referenced tests was also reflected in State education agencies' responses to a question on the use of statewide assessment programs to achieve educational accountability--an effort separate and distinct from testing used to evaluate Federal programs. Of 47 States responding to this question, 36 said that criterion-referenced tests have been or will be used extensively in that context.
EFFORTS TO EVALUATE STANDARDIZED TESTS

Since its establishment, NIE's major research effort for improving educational measurement has been an ongoing evaluation of commonly used standardized tests. In 1972 NIE assumed from OE the sponsorship of the Center for the Study of Evaluation at the University of California at Los Angeles. The Center extensively assesses published standardized tests and, as a primary objective, issues reliable guides for use in selecting tests.

As of September 1975, the Center had published guides which evaluate preschool and kindergarten, elementary, and secondary school achievement tests. Each guide contains a compendium of tests, keyed to educational objectives and evaluated by measurement experts and educators for such characteristics as meaningfulness, examinee appropriateness, administrative usability, and quality of standardization.

A Center official said that he believes that commercially available standardized norm-referenced achievement tests are generally inappropriate for measuring program effectiveness. Various Center reports also state this opinion for reasons discussed in chapter 6, such as the tests' low degree of correspondence with actual instructional objectives, their failure to indicate the extent that the full range of instructional objectives has been mastered, problems in test administration, and the tests' lack of information on specific skill and knowledge development. The Center official noted that the tests are especially inappropriate for broad-based intervention programs, such as title I of the Elementary and Secondary Education Act.

Since the major purpose of the guides is to provide State and local educators and other test consumers with information that will assist them in selecting the most appropriate measurement devices for evaluations, our questionnaire included a question relating to the Center's work.
We asked State and local education agencies to indicate the degree of familiarity they had with the Center's research on evaluating the utility of many popular commercially available standardized norm-referenced tests. Their responses follow:

                                         State           Local
                                       education       education
                                        agencies        agencies
                                                      Number
Responses                           Number  Percent  (note a)  Percent

Have little or no information           6      12      5,059      85
Aware of the Center's work in
  test evaluation                      14      28        601      10
Read some of the Center's
  publications                         17      34        192       3
Used the Center's material in the
  selection of commercially avail-
  able standardized tests              13      26        135       2
                                       --     ---      -----     ---
    Total                              50     100      5,987     100

a/Local education agency figures are projected on the basis of a statistical sample of local agencies. (See app. II.)

There was general agreement among the educators and test publishers interviewed that the Center's work was a definite improvement over other existing evaluations of standardized tests. Some test publishers, however, criticized the Center's work because it was based on incomplete data, excluding, for example, the highly technical specifications from which test questions are developed. They also criticized it for establishing educational goals on which test ratings were based without consulting publishers, and for relying on the work of graduate students.

In response, Center officials indicated that they had sent letters to test publishers advising them of the study and its purpose, stating what information was desired, and asking for any other material the test publishers wished to send. They acknowledged that their curricular goal system is not perfect; however, they believe it is a tremendous step forward, providing a clear statement of expected student behaviors which test users can employ to match tests to curriculums. Officials also stated that the system is justified because test makers often measure skills that are different from what they claim to measure.
Center officials also noted that all participating graduate students demonstrate adequate competence in research and measurement before being hired; they followed set procedures, were routinely monitored, and discussed questionable points with supervisors. State education agency officials and measurement experts interviewed stated that the Center's work should be continually updated and should include an assessment of available criterion-referenced tests.

Commenting on our report, HEW stated that NIE is sponsoring an effort by the Center to prepare a new test evaluation book reviewing all commercially published criterion-referenced achievement tests for grades kindergarten through 12. The target date for publishing this book was June 1977; however, as of August 1, 1977, it had not been published.

CRITERION-REFERENCED TESTS

Our questionnaire asked State and local education agencies to indicate areas, if any, in which there is a need to increase the educational community's efforts devoted to measurement and assessment techniques. Over 75 percent of the State education agency respondents and about 51 percent of school district respondents indicated "development of alternatives to the classic standardized norm-referenced tests (e.g., criterion-referenced tests)" as an area needing substantial to very great increases. More than half of the State respondents and a third of the local respondents also believe there is a great need to reduce racial, sexual, and cultural biases in tests. The following table shows the percent and number of State and local agency respondents indicating a substantial or very great need to increase the educational community's efforts devoted to measurement and assessment techniques.
                                      State agency      Local agency
                                       responses         responses
                                     Percent           Percent  Number
                                     (note a)  Number  (note a) (note b)

Development of methods for test
  design and construction               52       25       34     2,507
Reduction of cultural, racial,
  and sexual biases in tests            51       25       35     2,581
Development of alternatives to
  the classic standardized norm-
  referenced tests (e.g.,
  criterion-referenced tests)           77       37       51     4,061
Development of more and improved
  standardized norm-referenced
  tests                                 19        9       27     2,177
Development and utilization of
  methods to better evaluate
  standardized norm-referenced
  tests in use                          40       20       43     3,409

a/Does not add up to 100 percent because more than one item could be checked. The percentages reflect only the number of State and local agencies that responded to each suggested item. (See apps. I and II, question 3.)

b/The number of local agencies is projected on the basis of a statistical sample. (See app. II.)

NIE is currently funding research intended to meet these needs in the areas of test biases and criterion-referenced measurement.

Criterion-referenced tests are designed to remedy some weaknesses in standardized norm-referenced tests (see ch. 6) by (1) being more accurately interpretable, (2) detecting the effects of good instruction, and (3) allowing more accurate diagnoses of the individual learner's capabilities.

A well-devised criterion-referenced test relates scores to specific learning objectives or behaviors and includes an explicit statement of performance standards. The objectives must be described without ambiguity to permit an accurate description of what an examinee does and does not know or can and cannot do. Criterion-referenced tests pinpoint the student's deficiencies, while a norm-referenced test identifies only general student weaknesses.
Because criterion-referenced tests are not required to yield large variances in examinees' scores, they can retain questions based on the primary curricular emphasis even if, after instruction, most learners answer them correctly. Consequently, criterion-referenced tests are considered more capable of discerning instructional effects than norm-referenced tests.

An increasing number of educators have begun to question the use of standardized norm-referenced tests and to propose criterion-referenced tests as an alternative. Some people assume that the latter are fully developed and ready to use. According to experts, however, the technical status of criterion-referenced measurement is far less advanced than many educators and others believe it to be. An expert in criterion-referenced testing from the University of Michigan has pointed out that producing test questions that can be defended as valid and fair for both the majority of examinees and various minority groups is a major problem that affects the development of both standardized norm-referenced and criterion-referenced tests.

Another expert in criterion-referenced testing from the California Test Bureau of McGraw-Hill has stated that carefully constructed criterion-referenced tests can provide both diagnostic and evaluative information that is appropriate not only for disadvantaged students but for all students, assuming some consensus can be obtained on instructional objectives to be tested. Therefore, such tests could discover real educational problems and indicate appropriate remedial help for students, rather than simply showing that the students are "below grade level" on a general test of reading or mathematics. He added, however, that constructing such criterion-referenced tests is not a simple matter. More knowledge is needed about the structure of subject matter than now exists.
Nevertheless, successful statewide assessments and evaluations have been carried out using only criterion-referenced tests.

This testing expert believes that in the future it will be possible to evaluate basic skills on a large scale using appropriate criterion-referenced tests when sufficient knowledge has been acquired to specify the important skills and subskills required to assure competence in these areas. For this to occur, an understanding of the particular problems facing disadvantaged and minority students as well as the basic logical and cognitive structure of disciplines will be needed. In his opinion, progress is being made on these problems, but as yet no widespread consensus of what is important has emerged. Until it does, no national criterion-referenced evaluations seem likely.

Commenting on our report, HEW stated that, based on the test evaluation being made by the Center for the Study of Evaluation at the University of California at Los Angeles, many commercially published criterion-referenced tests, like norm-referenced tests, were generally unsatisfactory for program evaluation.

CONCLUSIONS

Federal, State, and local education agencies frequently use standardized norm-referenced tests to measure the effectiveness of Federal education programs. The next chapter provides further discussion of some of the tests' problems, including test biases. Additional research may be needed on:

-- Criterion-referenced tests and other alternatives to standardized norm-referenced achievement tests, for uses which include program evaluation.

-- How to reduce racial, sexual, and cultural biases in standardized tests.

There is also a need to increase State and especially local education agency awareness of available NIE-funded information that is intended to help select the most appropriate tests for use in evaluation.
RECOMMENDATIONS TO THE SECRETARY OF HEW

We recommend that the Secretary direct NIE to:

-- Consider the need for funding additional research in the future on (1) criterion-referenced tests and other alternatives to standardized norm-referenced achievement tests for uses that include program evaluation and (2) the nature and extent of racial, sexual, and cultural biases in standardized tests and how such biases may be reduced.

-- Improve dissemination of available NIE-funded information, which is intended to help select the most appropriate standardized tests, thereby increasing State and local education agency officials' awareness and use of this information.

AGENCY COMMENTS AND OUR EVALUATION

HEW agreed with the above recommendations.

Regarding our recommendation that NIE consider the need for funding additional testing research, HEW described efforts currently underway and stated that more emphasis will be given to these programs in fiscal years 1977-79 if NIE's appropriation levels permit. However, HEW also stated that it is not clear whether we believe NIE deserves additional appropriations for such an effort since NIE cannot divert substantial funds from its present budget. It is not our intention to call for either additional appropriations or redirection of NIE's present budget. Our recommendation calls on NIE to consider funding additional research, as needed to address the problems discussed, that could begin when research efforts currently underway are completed.

Regarding our recommendation that NIE improve dissemination of its materials designed to help select standardized tests, HEW stated that NIE intends to make school personnel familiar with these materials through several dissemination approaches. These include:

-- Listing the test evaluation consumer guides in a catalog of NIE products that is sent to school superintendents and district curriculum directors and that will now also be sent to school district evaluation directors.
-- Using the new "Lab and Center R&D Exchange," which HEW believes will possibly reach 50 percent of the country's school systems.

-- Using the dissemination network formed by NIE's seven research and development utilization contractors.

CHAPTER 6

STANDARDIZED TESTS AND PROGRAM EVALUATION

Serious questions have been raised about standardized norm-referenced achievement tests in spite of their widespread use. Based on the views of test and evaluation experts and others, this chapter provides information concerning: (1) criticism and defense of these tests and (2) suggestions for alleviating problems in conducting large-scale evaluations of compensatory education and desegregation programs.

CRITICISM OF STANDARDIZED NORM-REFERENCED TESTS

The content and use of standardized norm-referenced achievement tests have been widely criticized by testing experts, educators, and others. The National Education Association, the National Association for the Advancement of Colored People, and others have called for moratoriums on using standardized tests. 1/

The National Association of Elementary School Principals convened the appointed representatives of 25 national educational associations, government agencies, and education groups in November 1975 to explore the implications of the widespread use of standardized achievement tests. They recommended that educators give higher priority to developing and using new assessment processes that are more fair and effective than those currently in use and that educators more adequately consider the diverse talents and cultural backgrounds of children.

Criticism of norm-referenced tests

Critics of standardized norm-referenced achievement tests believe test bias and score interpretation, as well as other problems, are among the tests' deficiencies.

1/Robert L.
Williams, et al., "Critical Issues in Achievement Testing of Children from Diverse Ethnic Backgrounds," prepared for the Office of Education's Conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, pp. 8-12.

Test bias

A frequent criticism is that the tests discriminate unfairly against racial and cultural minorities because (1) their norms continue to be based on populations not representative of a pluralistic, multicultural society and (2) test questions ask for knowledge most familiar to the white middle class or reflect the cultural biases of test developers and question writers, who mostly represent the white middle class. Therefore, the tests are considered biased against the poor, black, Hispanic, and other minority Americans. Although some test publishers have attempted to minimize cultural and racial biases, these problems have not been solved.

Score interpretation

Some critics cite certain weaknesses in the interpretation of test scores, as follows:

-- Raw scores (scores based on the total number of correct answers on a test) are typically interpreted in terms of national norms, which are estimates of nationwide performance. The national norms are derived from giving the test to what is intended to be a representative sample of students. But since the samples are different and taken at different times, the norms for different tests vary. As a result, the normed score for a student depends partly on which test the student takes. 1/

1/George Weber, "Uses and Abuses of Standardized Testing in the Schools," Council for Basic Education, May 1974, pp. 13-14.

-- The grade equivalent score represents the estimated average score that pupils in that month of that grade would achieve on the test nationwide. For example, a 3.8 in reading is the average score for a child in the eighth month of the third grade. A frequent misconception is that the score means the child has mastered the standard curriculum up to that point in schooling. Even if the 3.8 grade-equivalent were always an accurate estimate of the average achievement for the child in the eighth month of the third grade, that average child has not necessarily "mastered" the reading curriculum to that point, although parents and even teachers usually do not understand this. Another problem is that on some tests a few answers one way or the other can make as much as a whole year's difference in the grade-equivalent score, and the tests are simply not that accurate. Despite these and other shortcomings, grade-equivalent scores are usually used in interpreting the results of standardized achievement tests. 1/ 2/

-- Test scores are meaningful only in terms of national average achievement, and do not indicate whether this is good, bad, or indifferent in terms of "reasonable" standards defined independently of such average scores. For example, if a given third grade class does as well on a given reading test as the national third grade average, this does not reveal how well the children can read in absolute terms. According to this viewpoint, since reading achievement in the primary grades is generally below what could reasonably be accomplished, reading scores suggest a better achievement than is in fact the case. 3/

Moreover, critics say that norm-referenced tests result in half of the students being above the norm and half below, as a statistical fact of life. Then how, they ask, does one "raise scores to the general norm?" 4/

1/Ibid., pp. 14 and 16.

2/Ralph Tyler, "Discussion of Hoepfner's Paper on Achievement Test Selection for Program Evaluation," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, p. 7.

3/Weber, p. 20.
4/Miriam Clasby, et al., "Laws, Tests, and Schooling: Changing Contexts for Educational Decision-Making," RR-11, Educational Policy Research Center, Syracuse University Research Corporation, Syracuse, N.Y., Oct. 1973, p. 174.

Other problems affecting test use

Regarding the use of standardized norm-referenced tests in program evaluation, critics state that the tests are inadequate for evaluating program effectiveness and cite the following deficiencies.

Test results not specific--Normative test scores might be very useful for program evaluation if one knew what they meant. But the test results do not indicate specifically what students have learned. 1/ The test scores reflect the number of correct answers students give, but do not indicate how well students achieve intended educational objectives or whether they answer particular questions correctly. 2/ 3/ Therefore, the test results are too general to provide specific guidance for improving the quality of schooling. 4/

Tests do not coincide with instructional objectives--The tests' content often has a low degree of correspondence with actual instructional objectives at any given time or place. This is a serious deficiency because one cannot determine the effectiveness of an educational program unless the tests used actually measure the objectives that the teacher, teachers, or school district is attempting to accomplish over

1/Stephen Klein, "Evaluating Tests in Terms of the Information They Provide," Evaluation Comment, Vol. 2, June 1970, Center for the Study of Evaluation, University of California at Los Angeles, p. 2.

2/National Assessment of Educational Progress, "General Information Yearbook," Dec. 1974, Rept. No. 03/04-GIY, pp. 1, 3, and 4.

3/Carmen J. Finley, "Not Just Another Standardized Test," Compact, Vol. 6, No. 1, Feb. 1972, Education Commission of the States, pp. 10 and 11.

4/W.
James Popham, "Appropriate Assessment Devices for Educational Evaluation," presented at the National Forum on Educational Accountability in Denver sponsored by the Office of Education and the Cooperative Accountability Project, May 8-9, 1975, pp. 3 and 4.

a defined period of time. 1/ As a major test publisher 2/ and others have pointed out, it is only to the extent that a program's instructional objectives coincide with those of the test that the instrument is valid for measuring how well the learning program has succeeded. If the test fails to measure certain objectives included in the learning program and/or measures other objectives that are not part of that program, to that extent the test is not a valid measure of success in the program.

In addition, test publishers describe their standardized norm-referenced tests in very general terms, calling them, for example, tests of reading comprehension. The generality of these descriptions increases the possibility of unrecognized differences between what the schools teach and what the tests specifically measure. According to this view, such differences result in misleading data and false conclusions about program effectiveness. 3/

Tests are designed to differentiate students, not diagnose specific problems--Since the tests are intended to compare examinees, they must yield a reasonably large degree of "response variance"--different scores for different examinees. Test questions that are answered correctly by half the examinees maximize a test's response variance. If a test question is answered correctly by a large or increasing proportion of examinees, it tends to be removed from the test or modified. Thus, as norm-referenced tests are periodically revised, questions on which examinees perform well are systematically eliminated. Yet, test critics maintain that such questions often deal with the very concepts teachers thought important enough to emphasize in their instruction.
If a concept is taught well, questions measuring it will likely be removed in the next test revision.

1/Rodney Skager, "The System for Objectives-Based Evaluation-Reading," Evaluation Comment, Vol. 3, No. 1, Sept. 1971, Center for the Study of Evaluation, University of California at Los Angeles, p. 6.

2/J. Wayne Wrightstone, et al., "Accountability in Education and Associated Measurement Problems," Test Service Notebook 33, issued by the Test Department, Harcourt Brace Jovanovich, Inc., New York, p. 4.

3/Popham, p. 3.

The result is that (1) the tests are particularly insensitive to detecting the effects of instruction 1/ 2/ and (2) sometimes the tests do not contain test questions dealing with central concepts in the field. These are serious deficiencies for a test used for program evaluation. 1/ 3/

Teaching the test--According to some critics, the use of standardized achievement tests for program evaluation and "accountability" has led and will lead to corruption and dishonesty among educational professionals and to the further erosion of public trust in the schools and the people who run them. Faced with public pressure that is often in itself "irrational and destructive," it is all too easy for principals and teachers to respond to subtle pressure to "prepare students for the assessment" by teaching students responses to specific test questions rather than by developing the underlying skills which these questions reflect. Standardized tests are readily available at all levels of any school system, and are brief enough to be highly susceptible to coaching. Test security and control may be feasible for programs like the college boards or the American College Testing Program, in which representatives of the testing agency handle the assessment and the examinees come to a central location. Similar controls are not feasible in large-scale evaluations of school programs. 4/

Inappropriate norms--In the typical large-scale program evaluation, the focus is on performance of groups of students--categorized by classes, buildings, or school systems--not by individuals. The reference norms one needs for such purposes are distributions of averages for appropriate types of schools, not norms for individuals; and these reference norms are seldom available unless they are collected as part of the evaluation study itself. 1/

1/Ibid., pp. 4 and 5.

2/Richard M. Jaeger, "A Discussion of Classical Test Development Solutions," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, p. 6.

3/W. James Popham, Statement presented at U.S. House of Representatives hearings on the Elementary and Secondary Education Act, March 28, 1973, pp. 2323 and 2324.

4/See Skager, p. 7.

Measurement of growth

A major test publishing company has described 2/ various technical problems with the tests that are related to their use in measuring academic growth, as contrasted with their traditional use for measuring present status. This is significant because, according to the company, most educational program evaluations have involved using nationally standardized norm-referenced achievement tests--especially for measuring "growth." Besides the problems related to using tests to measure present status--such as selecting a test that measures what the user intends, assuring that teachers follow directions, and the like--the test publisher noted that when the tests are used to measure growth they are attended by special problems, such as the following:

-- Defining "normal growth." There are serious questions about the legitimacy of defining normal growth in terms of grade-equivalent scores. However, expected or normal gain is almost universally defined in terms of grade equivalents for standardized achievement tests used at the elementary level.
-- Interpolating or estimating norms so that they may be applied to tests taken at times during the year for which norms have not been empirically determined. These estimates are almost certainly in error by some small amount in most cases and by a substantial amount in some cases.

-- Converting scores from different levels and alternative forms of a standardized test series so that the scores are equivalent. If they are not equivalent, this can lead to invalid measurement of gains and possibly erroneous conclusions as to the merit of the program evaluated.

1/William E. Coffman, "Classical Test Development Solutions," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, p. 24.

2/Wrightstone, et al., pp. 5-12.

The above criticisms of standardized norm-referenced tests indicate that the tests may be inappropriate for wide-scale use in evaluating educational programs.

Why are standardized tests used?

Considering all these criticisms, why are the tests used for Federal program evaluations? A former OE evaluation official explained that standardized achievement tests are used in educational program evaluation "for many good and not-so-good reasons" such as the following:

-- Since many standardized achievement tests or subtests were developed primarily for basic skill performance measurement, they become prime candidates for evaluating programs that seek to improve basic skills.

-- Such tests are readily available in large quantities, at short notice, and at relatively low cost. If off-the-shelf tests were not available, the cost of developing and standardizing such measures for a specific evaluation might be prohibitive and therefore might cause abandoning evaluation plans.
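The norm-interpolation problem described under "Measurement of growth" earlier in this chapter is essentially numerical: a norm table gives empirical values only for the dates on which the test was normed, and scores from other administration dates must be estimated. A minimal sketch of the simplest such estimate, straight-line interpolation between two norming dates, follows. All figures are invented for illustration; real norm tables interpolate within whole score distributions, not just means.

```python
# Hypothetical illustration of estimating a mid-year norm by linear
# interpolation between two empirically determined norming points.
# The test, months, and scores below are invented, not from any real test.

def interpolate_norm(fall_norm, spring_norm, fall_month, spring_month, test_month):
    """Linearly interpolate a norm for a test given between two norming dates.

    Months are counted within the school year (September = 1).
    """
    fraction = (test_month - fall_month) / (spring_month - fall_month)
    return fall_norm + fraction * (spring_norm - fall_norm)

# Suppose a reading test was normed in October (month 2) and April (month 8),
# with mean raw scores of 31.0 and 40.0. Estimate a January (month 5) norm.
january_norm = interpolate_norm(31.0, 40.0, fall_month=2, spring_month=8, test_month=5)
print(january_norm)  # prints 35.5 -- an estimate, not an empirical norm
```

As the publisher's criticism notes, this estimate assumes growth is linear across the school year; to whatever extent real growth is uneven (for example, summer loss or uneven instruction), the interpolated norm is in error.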
Other, more technical reasons given by this former OE official for the widespread use of standardized achievement tests in education program evaluation include their general technical excellence, their standardized administration procedures, the representativeness of their questions to the possible universe of questions on basic skill performance, their normative reference, ease in scoring, alternative and equated test forms, high reliabilities, and apparent validity. Another factor, he stated, is that most achievement tests are part of a battery of tests designed so that student growth can be measured as the student progresses through school by administering different test levels and forms. 1/

1/Michael J. Wargo, "An Evaluator's Perspective," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, pp. 19-21.

DEFENSE OF STANDARDIZED TESTS

Defenders of the tests say that there is no convenient alternative for those outside the schools to evaluate students' collective achievements. Therefore, in their view, despite the tests' shortcomings and abuses, they provide the best information available. According to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation, it is true that standardized tests do not allow for program and project differences, but the Congress wants to know if the overall program is effective, and he believes the tests provide this information acceptably.

Defenders of standardized tests tend to emphasize test misuse as the major problem, and they express the need for training teachers and administrators in proper test selection, administration, and interpretation. Many testing experts admit that the tests have deficiencies--such as test and test question bias--but state that wholesale rejection of the tests and their norm-referenced interpretations is unwarranted.
Instead, they favor refining the tests and learning to avoid pitfalls, such as lack of congruence among (1) test content, (2) course content or curricular emphasis, and (3) the purpose and design of the evaluation. Some state that progress has been made in the last 10 to 15 years in such areas as

-- constructing efficient tests that reliably measure important educational skills,

-- developing nationally representative norms, and

-- providing test users with relevant information about the test areas. 1/ 2/

1/John C. Bianchini, "Achievement Tests and Differentiated Norms," prepared for the Office of Education's conference on "Achievement Testing of Disadvantaged and Minority Students For Educational Program Evaluation," May 27-30, 1976, pp. 36-37.

2/Ralph Hoepfner, "Achievement Test Selection For Program Evaluation," prepared for the same conference, p. 2.

One expert from Systems Development Corporation, for example, stated that the quality, forthrightness, and focus of standardized achievement tests have improved remarkably, particularly within the last decade, and that many well-aimed attacks on standardized achievement tests made about 10 years ago are no longer valid. 1/ This expert, who has considerable experience in evaluating test adequacy, stated that his review of available criterion-referenced tests--the most likely alternative--found them to be of uniformly bad quality.

An expert from RMC Research Corporation said that the standardized tests are not the problem. In his view, OE's large-scale national evaluations as well as most local evaluations have been poorly designed and poorly done, and the tests have often been misused by evaluators. He said that he recently spent about 2 weeks in every State working on title I, dividing his time equally between the State education agencies and some local education agencies in each State.
Based on this experience, he believes that the great majority of local title I projects is not providing students with educational treatments that differ in any significant way from regular classes. In such circumstances, the general lack of evidence of marked improvement in basic skills should not be too surprising.

Suggested improvements

Included among suggestions for improvement offered by test defenders are the following:

-- Test publishers, in developing standardized tests of basic skills, should break away--at least in the elementary grades--from the current practice of designing tests for measuring achievement at multiple grade levels. Test publishers should develop series of tests, each designed for a specific grade, with sufficient numbers of questions at various difficulty levels to yield reliable measurement for essentially all students at that grade.

-- Test publishers should provide more detailed information about the content of test questions, the instructional objectives on which questions are based, and the skill characteristics needed to answer them to provide test users with a general framework for assessing the logical congruence between the test content and the content of the curriculum. In one expert's view, such a detailed classification of individual questions is as important to the test user as is the extensive statistical data currently provided about the mental measurement characteristics of the questions.

1/Ibid., p. 2.

-- Test publishers should expand the services they provide their clients to include developing special norms when they would produce more appropriate use of test results. Test publishers need to be more active in assuring that their tests and subsequent test results are used fairly and effectively.

-- The state of the art should be extended to provide test users with practical procedures to assist them in selecting tests and relating test results to instructional programs and program evaluation.
-- Program evaluators should recognize that the process for selecting appropriate standardized tests for evaluation must go beyond a naive inspection of the test and normative data. The process ought to include a careful inquiry into such elements as relevant student and school characteristics, test content and its relationship to curriculum content, and the adequacy of normative data in relation to the evaluation design. 1/ Evaluators should select tests which maximize coverage of the objectives desired. 2/

1/Bianchini, pp. 36-38.

2/Hoepfner, p. 33.

CONFERENCE ON USING TESTS TO EVALUATE PROGRAMS

In May 1976 OE sponsored a special four-day conference on "Achievement Testing of Disadvantaged and Minority Students for Educational Program Evaluation." OE invited about 50 experts in testing, program evaluation, and related fields, including university and other researchers and representatives of leading test publishers. Federal and local education agencies, and education and other interest groups, were also represented. The conference focus was on large-scale program evaluations of elementary and secondary school compensatory education and desegregation programs--programs on which OE concentrates most of its effort. The purpose of the conference was to identify, define, and analyze the many problems associated with using standardized achievement tests in the context of large-scale evaluations of these programs and to develop interim and long-term solutions to those problems.

Conference participants mentioned many of the same problems and issues discussed previously in this chapter. Five small working groups were formed at the close of the conference to write recommended solutions. All five groups recommended (1) either limiting or ceasing large-scale Federal education program evaluations like those contracted for by OE's Office of Planning, Budgeting, and Evaluation and/or (2) placing greater emphasis on local evaluations.
The value of conducting large-scale evaluations was questioned because of problems such as the current state of the art in evaluation; inherent bias problems in data collection instruments, methods, and analysis procedures; and community differences in the education programs being evaluated. Suggestions for increased emphasis on local level evaluations included providing funding for adequate local evaluations, training local evaluators, and technical assistance in designing and implementing technically sound evaluations. Related conclusions and recommendations included the following:

1. The Federal Government's basic policy for evaluating its education programs for culturally different students should require that each local education agency carry out evaluation studies designed to assess how well its local project objectives have been achieved. Also, approved local agency budgets should include sufficient funds to provide for adequate evaluation study design, data collection, analysis, and reporting. The studies should involve at all stages the participation of members of the minority culture or cultures involved. Beyond this, Federal responsibility should be limited to (a) conducting and publicizing the results of audits that determine whether funds were used as intended and whether evaluation data relevant to program objectives were collected, analyzed, and reported; (b) providing general guidelines and training for evaluation, and encouraging the development of guidelines and consulting resources by State agencies and Federal regional offices; and (c) developing summary reports based on the aggregation of information from local evaluations.

2. To resolve Federal program evaluation problems:

-- The funds and activities devoted to large-scale evaluation should be immediately rechanneled into development of program evaluation methods and tools that are likely to be more productive, while affording safeguards for recipient populations.
-- Studies should be initiated to explore whether it is feasible to draw overall program impact conclusions based on aggregations. Valid, locally relevant project evaluations should also be started.

-- Immediate congressional needs for program impact information should be satisfied through careful and extensive analysis in phases of data already collected and data currently being gathered under contract.

-- Model local project evaluations should be funded as demonstrations, with support from experts funded by OE's Office of Planning, Budgeting, and Evaluation. This support should be provided to local school systems on a cooperative basis. Local evaluation personnel should be trained. The most effective local evaluation strategies demonstrated should be adopted in an ever-widening pattern, to build a basis for effective national program evaluation by aggregating valid local project evaluations.

-- Studies should be funded on tools and procedures needed to make local evaluations that will reflect valid conclusions on the worth of specific projects and will adequately identify the processes and inputs of those projects. This includes holding a workshop to identify needed tools, methods, and priorities.

-- Studies should be funded on OE and State education agency regulations, guidelines, and administrative decisions that affect the quality of local education agency evaluation activities and reports. These studies should produce model administrative and regulatory strategies, including incentives, for upgrading the quality and validity of local education agency evaluations of federally supported projects.

3. One reason for greater emphasis on local evaluations is that instructional treatments are not uniform under nationwide programs. The Emergency School Aid Act program, for example, does not provide uniform treatments for students. Neither do title I or title VII programs.
But there are various identifiable, describable instructional treatments funded by these programs, and a small number of them are effective. However, information about these effective treatments is lost in large-scale evaluations that cover a large number of ineffective treatments.

4. Guidelines for local evaluation studies should include the following:

-- Recommendations that each evaluation report include a description of what actually happened to pupils involved in the program. Without such information it is impossible to reach a meaningful interpretation of any measures of change.

-- Encouragement to local projects to collect evidence of progress toward improved skills in reading and mathematics and obtain data regarding other outcomes of the particular methods employed, particularly on developing self-concept and interpersonal relations.

-- Encouragement to local education agencies to include in their evaluation procedures systematic attention to the selection or development of techniques designed to minimize cultural and other bias in tests and data-collecting procedures used in evaluation.

5. Money saved by curtailing large-scale evaluations should support a national panel responsible for exploring and developing more responsive and effective alternative program evaluation models.

6. Research into alternative evaluation models should also be conducted to determine (a) the evaluation information the Congress, OE, other Federal agencies, and local agencies receiving Federal funds need and (b) whether adequate evaluation designs can be effectively implemented to meet these needs, and their cost.

7. Because of conflicts in the interpretation of off-the-shelf commercial standardized tests, only custom-designed tests should be used for federally sponsored large-scale program evaluation, when such large-scale evaluations are essential.
The associated cost and effort to define program objectives and define measures of their effect should be part of the Federal agency's responsibility along with survey and analysis costs.

Other small working group conclusions and recommendations are shown in appendix IV.

In response to the conferees' recommendations, OE's Assistant Commissioner for Planning, Budgeting, and Evaluation said he agreed that there is a need for increased evaluation activity and capability at the State and local levels. He disagreed, however, that OE's nationally planned evaluations should be deemphasized.

CONCLUSIONS

As noted in chapter 5, Federal, State, and local education agencies frequently use standardized norm-referenced achievement tests to measure the effect of Federal education programs. However, there is a great deal of disagreement among testing experts and educators on the adequacy of these tests for their intended uses. Serious questions have been raised about the tests, and some organizations have called for a moratorium on their use and a higher priority on development and use of alternatives.

Although the tests' critics and defenders agree that certain problems exist, views differ greatly about the importance or severity of the problems and their remedies. Those who defend the tests and their continued use recognize that improvements are needed on such issues as test and test question bias; appropriate test norms; test selection, interpretation, and administration, including uses for program evaluation; and other issues.

Some questions raised about the adequacy of the tests have great importance in determining the appropriateness, validity, and proper conduct of educational program evaluations, including those that are federally funded. Decisionmakers should be aware of these issues when using such information.
CHAPTER 7

PREFERENCES FOR EVIDENCE OF PROGRAM EFFECTIVENESS DIFFER

Responses to our questionnaire showed that State and local education agency officials responsible for administering Federal title I, III, and VII elementary and secondary education programs perceive important differences in the types of evidence of program effectiveness that Federal, State, and local officials prefer.

-- Local education agency officials believe that State and OE officials are predominantly interested in standardized norm-referenced test scores for demonstrating program results. Local officials themselves prefer broader, more diverse types of information than only these test scores.

-- State education agency officials prefer criterion-referenced tests. They favor less emphasis on standardized norm-referenced test results as evidence of program effectiveness than they believe OE officials want, but prefer them more than local agency officials do.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation believes that hard, objective data on students' cognitive improvements is the centrally important information needed. He noted that this means gain scores on standardized norm-referenced achievement tests, because these are most available.

As noted in chapter 5, State and local evaluations for titles I, III, and VII have been most often based on standardized norm-referenced tests. Therefore, evaluations usually have not reflected the kinds of results that State and local officials themselves prefer, but rather those that they believe would be likely to most impress higher level officials.

STATE AND LOCAL EDUCATION OFFICIALS BELIEVE STANDARDIZED TEST RESULTS MOST IMPRESS HIGHER LEVEL OFFICIALS

Legislative and other requirements for annual evaluations at the State and local levels of titles I, III, and VII of the Elementary and Secondary Education Act have several purposes, according to OE's Assistant Commissioner for Planning, Budgeting, and Evaluation.
These include

-- reporting on local project effectiveness or providing other data on which local, State, and Federal officials can base programmatic, financial, and policy decisions; and

-- providing data to State and/or Federal officials to select successful and exemplary projects for dissemination.

Educators must increasingly evaluate their instructional endeavors and present evidence which will permit others to judge their effectiveness. Through our questionnaire we have attempted to ascertain (1) what kinds of evidence of program effectiveness local officials administering federally supported elementary and secondary education programs think State and OE program officers expect, (2) what kinds of evidence of effectiveness State officials administering the programs think OE program officers expect, and (3) what types of evidence of effectiveness the State and local officials think is most useful to them. Our questionnaire asked

-- State program officials to rate, on a seven point scale, the types of information or findings which most impress them and, they believe, OE officials as demonstrating program results and

-- local project officials to rate, on the same seven point scale, the types of information or findings which most impress them and, they believe, OE or State officials as demonstrating program results.

As noted in chapter 5 (see pp. 43 to 44), only a few States employed statewide testing to evaluate their title I and III programs, and the great majority of these States used standardized norm-referenced tests. Most State evaluation reports for titles I and III were aggregations of local education agency evaluation data. As shown on page 45, based on our questionnaire sample responses, local title I, III, and VII projects indicated overwhelming use of standardized norm-referenced tests--far greater than their use of criterion-referenced tests and all other tests.

State officials' views

The demonstration of program results should be an important factor in deciding to continue OE program funding. As the table below shows, State title I and III officials agreed on the types of program results they think are likely to impress OE officials most in this regard. State officials overwhelmingly perceive OE officials to be most impressed by results obtained from standardized norm-referenced tests. Eighty-two percent of the State title I and 65 percent of the State title III officials responding ranked this category first. (See app. I for details.) State title I and III officials ranked results on improvements in educational management and accountability, as well as findings obtained from criterion-referenced tests, as either the second or third most impressive data for OE officials.

Titles I and III State Officials' Perceptions of What Impresses OE Officials and What Impresses State Officials (note a)

                                               Impresses OE        Impresses self
                                             Title I  Title III  Title I  Title III
1. Improvement in educational management
   or accountability                           3.1      3.9        4.0      4.5
2. Improvement in school services or
   facilities                                  6.0      6.1        6.1
3. Student improvements through gain scores
   or grades on teacher ratings                4.1      4.6        4.3      5.1
4. Student improvement through gain scores
   on standardized norm-referenced tests       1.3      1.6        2.5      3.3
5. Student improvement through gain scores
   on criterion-referenced tests               3.2      2.7        2.4      2.4
6. Student improvement through gains in the
   affective domain (e.g., likes, dislikes,
   interests, attitudes, motives, etc.)        5.3      4.4        3.7      3.1
7. Improvements in curriculum and
   instructional methods                       4.5      4.5        3.8      3.4

a/The numbers shown are the respondents' average rankings of the alternatives given. Each respondent was asked to give the most impressive type of result a ranking of "1" and the next most impressive "2," etc.

State officials themselves appear to be impressed by slightly different types of data and information on program results.
The table above shows that officials from titles I and III would be most impressed by gain scores on criterion-referenced tests. Although State title I officials ranked gain scores from norm-referenced tests almost equal to those from criterion-referenced tests, the title III respondents consider gains in the affective domain (for example, likes, dislikes, interests, attitudes, motives) as the second most important result, followed by gain scores from norm-referenced tests. State title I officials ranked gains in the affective domain third. Title I and III officials agreed on the remaining categories.

In general, State officials believe objective test results of program performance are most likely to impress both OE and State program officers. However, State respondents see themselves as more likely than OE program officials to be impressed by criterion-referenced test results and other factors cited above. State officials see themselves as more open than OE officials to various types of information demonstrating program results. At the same time, they agree with what they see as OE officials' view that impact evaluations based on test scores are the most impressive findings.

Emphasis on test scores for evidence of program effectiveness continues at the State and national levels. Reasons may include the following:

-- Legislators and Federal and State officials are demanding evidence that the infusion of Federal and State dollars for special programs works.

-- Test scores have traditionally been the only measure of effectiveness.

-- Few alternatives to test scores exist, and those that are available are unproven and not as widely accepted.

Local officials' views

The local title I and III respondents' perspectives are very nearly alike on what evidence of program results impresses State and OE officials.
Title VII respondents' perspective is somewhat different; however, this may be at least partly because local title VII projects are responsible directly to OE program officers and not to State program officers.

The table below shows that local title I, III, and VII project officials feel that OE or State program officers are most impressed by norm-referenced test results.

Local Title I, III, and VII Officials' Perceptions of What Impresses State or OE Officials and What Impresses Local Officials (note a)

                                          Impresses OE or State     Impresses self
                                          Title  Title  Title    Title  Title  Title
                                            I     III    VII       I     III    VII
1. Improvement in educational
   management or accountability            3.9    3.8    3.8      4.7    4.5    4.4
2. Improvement in school services
   or facilities                           5.2    5.1    4.8      4.9    4.9    4.8
3. Student improvements through gain
   scores or grades on teacher ratings     4.3    4.3    5.1      4.0    4.3    4.4
4. Student improvement through gain
   scores on standardized norm-
   referenced tests                        2.2    2.6    2.1      3.7    3.7    3.4
5. Student improvement through gain
   scores on criterion-referenced tests    3.1    3.5    3.9      3.4    3.7    4.0
6. Student improvement through gains
   in the affective domain (e.g., likes,
   dislikes, interests, attitudes,
   motives, etc.)                          4.6    4.2    4.4      3.3    3.3    3.4
7. Improvements in curriculum and
   instructional methods                   4.2    3.3    3.7      3.2    3.0    3.2

a/The numbers shown are the respondents' average rankings of the given alternatives. Each respondent was asked to give the most impressive type of result a ranking of "1" and the next most impressive "2," etc.

Local title I and III project officials ranked criterion-referenced test results as the second most impressive data for OE or State officials, followed closely by improvements in educational management and accountability, and improvements
Local title VII proj- ect officials, however, considered improvements in curriculum and instructional methods the second most impressive program result to OE, followed by improvements in educational manage- ment and accountability. Program results from criterion- referenced tests were ranked fourth. Program results that impress local project officials are different from what they believe impresses OE or State program officers. Officials from all three local project types ranked improvement in curriculum and instructional methods as the most impressive program result. Local title I, III, and VII project officials consider gains in the affective domain the second most impressive program result, but with title VII, gain scores on norm-referenced tests also received the same ranking. Concerning these results, (1) local title I project officials appear to consider results from criterion-referenced tests more impressive than norm-referenced test results, (2) local title III officials consider them equially impres- sive, and (3) local title VII officials clearly prefer norm- to criterion-referenced test results. In all cases, however, test results are not the most impressive program result to local project c ficials. Since local project officials perceive results from improvements in curriculum and instructional methods and improvements in the affective domain as more meaningful to them th3n to State or OE officials, the extent to which such results are excluded from evaluations will probably reduce the adequacy of the evaluations and perhaps make them less useful to local officials. Generally, the degree to which local perceptions of what will most impress State or OE of- ficials causes evaluations to emphasize test results, and educational management and accountability will also probably affect the adequacy and perhaps the usefulness of evaluations at the local level. 
As with the State officials' perception of OE, local project officials generally believe OE or State program officers are most impressed by student outcome measures of program effectiveness, such as norm- and criterion-referenced tests. Local project officials themselves are considerably less impressed by these measures. They are more impressed by a variety of measures, but do not believe OE or State program officers fully share this interest.

Correspondingly, 51 percent of the local agency respondents and 77 percent of the State agency respondents to our questionnaire stated that the educational community needs to greatly increase efforts to develop alternatives to the classic standardized norm-referenced tests.

Certain differences in questionnaire ratings among Federal, State, and local projects may be because of differing project objectives and priorities. However, local officials clearly do not regard test scores as the sole or most impressive criterion for determining program effectiveness. This may indicate a growing disenchantment with using standardized norm-referenced tests as program evaluation tools. Local officials are apparently interested in knowing how the projects as a whole are functioning and may see test measures as only one of several factors to be used in assessing individual and project performance.

CONCLUSIONS

Local and State evaluation reports on Federal elementary and secondary education program effectiveness are intended to provide information on which local, State, and Federal officials can base policy and program decisions. However, our questionnaire results show that State and local officials see important differences in the types of evidence of program effectiveness that they themselves and officials at other levels--Federal, State, and local--prefer. Therefore, better communication is needed among the three levels about the information they need to facilitate policy and program decisions at each level.
The questionnaire results also raise this question: Should all three levels be served by a reporting system based on the same reports?

Although State officials view OE program officials as being most impressed by standardized norm-referenced test results, and local officials view State and OE officials in the same manner, State and local officials say that they are not most impressed by such results. Local officials prefer broader, more diverse information on program results than just these test scores, and they are most impressed by improvements in curriculum and instructional methods and gains in the affective domain (likes, dislikes, interests, attitudes, motives, etc.). State officials are most impressed by results from criterion-referenced tests.

OE's Assistant Commissioner for Planning, Budgeting, and Evaluation believes that hard, objective data on students' cognitive improvements is the most important information needed. He noted that this means gain scores on standardized norm-referenced achievement tests because these are most available.

The widespread use of standardized norm-referenced tests to evaluate State and local title I, III, and VII programs indicates that State and local officials have more frequently based their evaluations on the kinds of results they believe would be likely to most impress higher level officials than on their own preferences.

RECOMMENDATION TO THE SECRETARY OF HEW

In connection with the assessment recommended in chapter 4, we recommend that the Secretary direct OE to review the types of State and/or local program evaluation information collected (or planned to be collected) on programs authorized by titles I and VII of the Elementary and Secondary Education Act. The review should include an assessment of the information's usefulness at each level and should determine:

-- Whether it is realistic to attempt to serve Federal, State, and local levels with aggregated data based on local agency evaluation reports.
--How the information needs at the local, State, and Federal levels can best be met.

--Whether unnecessary duplication exists or will exist in meeting Federal information requirements through the State and local reporting systems as well as through OE national evaluations on these programs, and if so, how it should be eliminated.

During this review process, to define the evaluation information needed at State and local levels, OE should seek the views and cooperation of the State and local officials who are intended to use the results. 1/

1/HEW combined its comments on our recommendations for this chapter and chapter 4. For discussion of these comments, see "Agency comments and our evaluation" section on p. 37.

APPENDIX I

RESULTS OF GAO'S STATE EDUCATION AGENCY QUESTIONNAIRE (note a)

                                                     Number
                                                     responding          Percent
                                                     from 50    Number   (note c)
                                                     (note b)

Section A: General

1. How familiar are you with the research being conducted by the Center for the Study of Evaluation at the University of California at Los Angeles to evaluate the utility of many popular commercially available standardized norm-referenced tests? (note d) (Check the one response which best expresses your familiarity with the Center's research.)    50

   Have little or no information                                 6        12.0
   Aware of the Center's work in test evaluation                14        28.0
   Read some of the Center's publications on the
     evaluation of norm-referenced tests                        17        34.0
   Used the Center's material to assist in the
     selection of commercially available
     standardized norm-referenced tests                         13        26.0
                                                                         100.0

2. How familiar are you with the Anchor Test Study conducted by the Educational Testing Service for the U.S. Office of Education (OE) to provide the ability to translate a child's score on any one of the eight most widely used standardized reading tests into a score on any of the other tests?
     50

   Have little or no information                                 3         6.0
   Aware of the Anchor Test Study                               13        26.0
   Read the Anchor Test Study                                   25        50.0
   Used the Anchor Test Study                                    9        18.0
                                                                         100.0

[Questions 3 through 5: tables illegible in source.]
                                                     Number
                                                     responding          Percent
                                                     from 50    Number   (note c)
                                                     (note b)

6. Which of the following techniques did your State employ for its 1973-74 evaluation of title I? (Check all that apply.)    48

   Aggregation and analysis of data from local
     education agency reports                                   45        93.8
   Educational audits and their results                          9        18.8
   Statewide testing of title I students                        11        22.9
   Other (please specify)                                        5        10.4

Note: If your State did not test title I students statewide, skip questions 7 and 8.

7. Which of the following types of tests did your State administer for its title I evaluation?    26

   Standardized norm-referenced tests (note d)                  21        80.8
   Criterion-referenced tests (note f)                           3        11.5
   Other tests (please specify)                                  4        15.4

If you do not use standardized norm-referenced tests, skip question 8. If you do, continue.

8. How did you report the results for the standardized norm-referenced testing?
     28

   Raw scores                                                    2         7.1
   Grade equivalents                                            24        85.7
   Percentiles                                                   6        21.4
   Quartiles                                                     1         3.6
   Stanines                                                      4        14.3
   Other (please specify)                                        2         7.1

NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title I, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

9. Rate the FOCUS AND SCOPE of the local and State evaluation reports (notes b and e) (Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                 Number
                 respond-  Very                                             More than
                 ing from  deficient    Deficient   Marginal    Adequate    adequate
                 50        Num.  Pct.   Num.  Pct.  Num.  Pct.  Num.  Pct.  Num.  Pct.
Local reports    48          0     0      5  10.4    19  39.6    23  47.9     1   2.1
State reports    48          0     0      4   8.3    16  33.3    25  52.1     3   6.3

10.
Rate the local and State evaluation reports on THE PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS (notes b and e) (Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update current policies by those who transfer policy decisions into plans, budgets, program implementation, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                 Number
                 respond-  Very                                             More than
                 ing from  deficient    Deficient   Marginal    Adequate    adequate
                 50        Num.  Pct.   Num.  Pct.  Num.  Pct.  Num.  Pct.  Num.  Pct.
Local reports    49          2   4.1      7  14.3    26  53.1    14  28.6     0     0
State reports    49          1   2.0      6  12.2    22  44.9    18  36.7     2   4.1

[Questions 11 through 16: tables illegible in source.]
                                                     Number
                                                     responding          Percent
                                                     from 50    Number   (note c)
                                                     (note b)

17. Which of the following techniques did your State employ for its 1973-74 evaluation of title III? (Check all that apply.)    48

   Aggregation and analysis of data from local
     education agency reports                                   41        85.4
   Educational audits and their results                         27        56.3
   Statewide testing of title III students                       5        10.4
   Other (please specify)                                       12        25.0

Note: If your State did not test title III students statewide, skip questions 18 and 19.

18. Which of the following types of tests did your State administer for its title III evaluation?
     14

   Standardized norm-referenced tests (note d)                   9        64.3
   Criterion-referenced tests (note f)                           8        57.1
   Other tests (please specify)                                  7        50.0

If you do not use standardized norm-referenced tests, skip question 19. If you do, continue.

19. How did you report results for the standardized norm-referenced tests?    10

   Raw scores                                                    2        20.0
   Grade equivalents                                             7        70.0
   Percentiles                                                   4        40.0
   Quartiles                                                     2        20.0
   Stanines                                                      5        50.0
   Other (please specify)                                        2        20.0

NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title III, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

20. Rate the FOCUS AND SCOPE of the local and State evaluation reports (notes b and e) (Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                 Number
                 respond-  Very                                             More than
                 ing from  deficient    Deficient   Marginal    Adequate    adequate
                 50        Num.  Pct.   Num.  Pct.  Num.  Pct.  Num.  Pct.  Num.  Pct.
Local reports    48          1   2.1      2   4.2    16  33.3    27  56.3     2   4.2
State reports    46          1   2.2      5  10.9    14  30.4    25  54.3     1   2.2

21.
Rate the local and State evaluation reports on THE PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS (notes b and e) (Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update current policies by those who transfer policy decisions into plans, budgets, program implementation, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                 Number
                 respond-  Very                                             More than
                 ing from  deficient    Deficient   Marginal    Adequate    adequate
                 50        Num.  Pct.   Num.  Pct.  Num.  Pct.  Num.  Pct.  Num.  Pct.
Local reports    48          2   4.2      4   8.3    21  43.8    19  39.6     2   4.2
State reports    45          3   6.7      1   2.2    14  31.1    26  57.8     1   2.2

[Material following question 21 is illegible in source.]
[Question stem illegible in source; the items below rate how often State reports provide the following information.]

                                   Number
                                   respond-  Very                 As often   Occa-
                                   ing from  often    Generally   as not     sionally   Seldom
State reports                      50        No. Pct. No.  Pct.   No.  Pct.  No.  Pct.  No.  Pct.
(1) Information on the manner
    in which the needs of
    children are assessed             40      4  10.0  18  45.0     7  17.5    6  15.0    5  12.5
(2) Information on the number
    of children in the program        40     21  52.5  13  32.5     4  10.0    1   2.5    1   2.5
(3) Per pupil expenditures of
    each program                      40     12  30.0  13  32.5     5  12.5    8  20.0    2   5.0
(4) Evidence of quantifiable or
    measurable achievements           39      8  20.5  15  38.5    11  28.2    2   5.1    3   7.7
(5) Evidence of quantifiable or
    measurable pupil benefits         40      6  15.0  14  35.0    10  25.0    5  12.5    5  12.5

a/We requested that the questionnaire be completed by State officials familiar with the State's evaluation efforts conducted for the Elementary and Secondary Education Act, titles I (regular, migrant, neglected, and delinquent, etc.) and III, and the State's own assessment efforts. The questionnaire was divided into three sections: section A, National Assessment of Educational Progress and statewide assessment; section B, Elementary and Secondary Education Act, title I evaluation efforts; and section C, title III evaluation efforts. We suggested that the questionnaire be taken apart and distributed to the persons most familiar with each part, such as the State director of research, planning, and evaluation for section A, the State director of title I for section B, and the State director of title III for section C.
Some of the questions from the questionnaire have been omitted from this report. Questions pertaining to the National Assessment of Educational Progress and statewide assessments are shown in GAO's report to the Congress (HRD-76-113), dated July 20, 1976.

b/In April 1975 we sent the questionnaire to the education agencies in all States and the District of Columbia. By June 1975 the District of Columbia and all but one State responded. For purposes of compiling responses to the questionnaire, the District of Columbia is considered to be a State.

c/This column shows the percentage of respondents to the question that chose each specific answer.

d/Standardized norm-referenced tests are tests which purport to assess the individual student's ability or achievement in broad subject areas as compared to the rest of the test population (e.g., the Metropolitan Achievement Test (MAT) or the Wechsler Intelligence Scale for Children (WISC)).

e/The percent columns show the percentage of respondents to each line item that chose each category. Where the percentages on each line do not add to 100, it is due to rounding.

f/Criterion-referenced tests are tests specifically constructed to measure students' attainment of specific educational objectives or proficiency with specified curriculum material. These tests, which may be standardized, usually provide a specific and operational description of the level and type of task performance or behavioral measures used as a criterion to indicate attainment of the educational objectives. For example, the student must be able to compute the correct product of all single-digit numerals greater than zero with no more than five errors.

APPENDIX II

RESULTS OF GAO'S LOCAL EDUCATION AGENCY QUESTIONNAIRE (note a)

                                             Number of projected
                                             responses from 8,936   Number     Percent
                                             (note b)               (note b)   (note c)

Section A: To be completed by questionnaire respondents from all local education agencies sampled

1.
How familiar are you with the research being conducted at the Center for the Study of Evaluation at the University of California at Los Angeles to evaluate the utility of many popular commercially available standardized norm-referenced tests? (note d) (Check the one response which best expresses your familiarity with the Center's research.)    5,987

   Have little or no information                             5,059       84.5
   Aware of the Center's work in test evaluation               601       10.0
   Read some of the Center's publications on the
     evaluation of norm-referenced tests                       192        3.2
   Used the Center's material to assist in the
     selection of commercially available
     standardized norm-referenced tests                        135        2.3
                                                                        100.0

2. How familiar are you with the Anchor Test Study conducted by the Educational Testing Service for the U.S. Office of Education (OE) to provide the ability to translate a child's score on any one of the eight most widely used standardized reading tests into a score on any of the other tests?    282

   Have little or no information                               282      100.0
   Aware of the Anchor Test Study                                -          -
   Read the Anchor Test Study                                    -          -
   Used the Anchor Test Study                                    -          -
                                                                        100.0

[Questions 3 through 5: tables illegible in source.]
                                             Number of projected
                                             responses from 8,936   Number     Percent
                                             (note b)               (note b)   (note c)

6. Are your local evaluations of federally funded programs usually performed by external or internal evaluators? (Check one.)    8,205

   Internal (e.g., local education agency staff)             5,468       66.6
   External (e.g., consultants)                                494        6.0
   Both                                                      2,243       27.3
                                                                        100.0 f/

7. Which of the following types of tests did your local education agency administer for its title I evaluation? If your local education agency did not test title I students, skip questions 7 and 8.    8,583

   Standardized norm-referenced tests (note d)               8,103       94.4
   Criterion-referenced tests (note g)                       2,110       24.6
   Other tests (please specify)                              1,168       13.6

If you do not employ standardized norm-referenced tests, skip question 8. If you do, continue.

8. How did you report the results for the standardized norm-referenced testing?    8,029

   Raw scores                                                1,792       22.3
   Grade equivalents                                         6,658       82.9
   Percentiles                                               2,978       37.1
   Quartiles                                                   528        6.6
   Stanines                                                    946       11.8
   Other (please specify)                                      163        2.0

NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title I, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

9. Rate the FOCUS AND SCOPE of the local and State evaluation reports.
(notes b and e) (Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

                 Projected
                 number
                 respond-  Very                                                  More than
                 ing from  deficient    Deficient    Marginal      Adequate      adequate
                 8,936     Num.  Pct.   Num.  Pct.   Num.   Pct.   Num.   Pct.   Num.   Pct.
Local reports    8,510       45   0.5    426   5.0   2,097  24.6   4,521  53.1   1,422  16.7
State reports    8,350      196   2.3    733   8.8   2,248  26.9   4,098  49.1   1,076  12.9

10. Rate the local and State evaluation reports on the PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS. (notes b and e) (Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update policies by those who transfer policy decisions into plans, budgets, program implementation, curriculum, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

                 Projected
                 number
                 respond-  Very                                                  More than
                 ing from  deficient    Deficient    Marginal      Adequate      adequate
                 8,936     Num.  Pct.   Num.  Pct.   Num.   Pct.   Num.   Pct.   Num.   Pct.
Local reports    8,635      119   1.4    484   5.6   2,782  32.2   4,037  46.7   1,212  14.0
State reports    8,486      312   3.7    863  10.2   2,879  33.9   3,313  39.0   1,119  13.2

[Questions 11 through 17: tables illegible in source.]
l. 0 M 'CI~~ ~Ir~~~~~~~~~~~~~~V ~ ~~~C 10. UI N N -4ct 14 4 4I 44 4 - 0 In l.CI VI. . MI, I WI 4 4 4 N c C 4l'. % '0 en 4 n 0 0 i2 .0 NI N 4 4 N 4 en : 4) f '0 - co (I - 'I - N O .. '110 la 24 El C4, CU '1 14O C4 cln Ut r.N N C4 Irl a~~~ r-4 .4 *' 00 0% LN 112.0C MI C ' 0 4= z A M C4 l C I en on 001 en co 4 NO a IC O a iW n0 0 c 0 g~~i deOnI~~~~~~~a. D o '4l .4 - - " 01 N N N N N a P Aia O P- o Nl r N a qu· *p4 c g L. ato39c ID O co 7,, rn Y Ai 01 4 Pu- CO·~ *0-4*4*a id. la -. Q4 so 4* 4 W a 1 c93 0 w 1q -- '40 ( 4. W I 0 4 Ai 34d0- 41W4*44~~~~~C4* 40V. Iv 4.4.- 04* 0 O 01U 0% 04w 04* 044* 04 04*' * C 5C au1 C%4* c 00a 4* 0i A 00~~~~~~~~~le C 00 41 IC A 6 C ' 1 - .0* C WC C'.- M A044,.1 C- 4 Ow OC 0 0 la ~~~~~~m ~ ~ ~ . 4* '4 '4 ''4 44 OX "- luo or~ ), 4 o~~~~~x44 aq la4C O -- 0 4144* 414N uE 41c 41**.. u 11. .. 414*OO140OW'E BlrY U4*.MMM 0 * 0 O 0) C .0C to D'4* V ~0u Pa . 'a- M-o CO .41', C.* MC CO 0 Ai C ·-0CU DI'h cV.-44 wu 4A-LI U'0 0ur wo 044 .,C-.4* 0 4* 0 4 EU WAi Ai W D 1.. C 00 . o 4* to4444 0 4*. it M- 111 -i * .- 4t 0 40 41 4 0 4 4 toV 4*4* AiCE AC121Imc4* 4 el 10* 0 doU4 6.000 CM C V j Iola a a 4*-4CM w 41 4* I1C 4-'4 0C4* c-t CO 0 4* go 10.4w0%-.4* W4* 4O 0 a 0 V us MEM 410 V-44*4*M-40 4 W 4* 'a 1 C 4 i 4Ai 4*u U f4* C 0 be 4**% 4Ui 00% 04* O 4*O*'4 O* : 43 4** a 04*00 ia 0 0 ?A OIL EO"a fa 4* * A.i -4 103 APPENDIX II APPENDIX II Number of projected responses Rt3ponses from 8,936 Number Percent (note b) (note b) (note c) 18. Are your local evaluations of federally funded programs usually performed by external or internal evaluators? (Check one.) 2,623 Internal (e.g., local education agency staff) 1,445 55.1 External (e.g., consultants) 392 14.9 _Both 786 30.0 100.0 19. which of the following types of tests did your local education agency administer for its title III evaluation? It your local education agency did not test title III students,skip questions 19 and 20. 
1,745
    Standardized norm-referenced tests (note d)   1,380   79.1
    Criterion-referenced tests (note g)             424   24.3
    Other tests (please specify)                    515   29.5

If you do not employ standardized norm-referenced tests, skip question 20. If you do, continue.

20. How did you report the results for the standardized norm-referenced testing?

1,319
    Raw scores                 400   30.3
    Grade equivalents          981   74.4
    Percentiles                765   58.0
    Quartiles                   73    5.5
    Stanines                   283   21.5
    Other (please specify)      64    4.8

104

APPENDIX II

NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title III, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

21. Rate the FOCUS AND SCOPE of the local and State evaluation reports. (notes b and e) (Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports   2,463    35  1.4    44  1.8   531 21.6   1,396 56.7   458 18.6
State reports   2,411    16  0.6   162  6.7   607 25.2   1,221 50.6   405 16.8

22. Rate the local and State evaluation reports on the PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS.
(notes b and e) (Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update policies by those who transfer policy decisions into plans, budgets, program implementation, curriculum, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports   2,471    58  2.3   147  5.9   558 22.6   1,333 53.9   376 15.2
State reports   2,420    45  1.9   204  8.4   672 27.8   1,132 46.8   367 15.2

105

APPENDIX II

[Questions 23 through 29 and the accompanying tables are not legible in this copy.]

110

APPENDIX II

Number of projected responses      Responses
from 8,936 (note b)                Number (note b)   Percent (note c)

30. Are your local evaluations of federally funded programs usually performed by external or internal evaluators? (Check one.)

426
    Internal (e.g., local education agency staff)   205   48.1
    External (e.g., consultants)                     91   21.3
    Both                                            131   30.7
                                                         100.0 f/

31. Which of the following types of tests did your local education agency administer for its title VII evaluation? If your local education agency did not test title VII students, skip questions 31 and 32.

394
    Standardized norm-referenced tests (note d)   357   90.7
    Criterion-referenced tests (note g)           148   37.5
    Other tests (please specify)                   91   23.0

32. How did you report the results for the standardized norm-referenced testing?
396
    Raw scores                 132   33.4
    Grade equivalents          288   72.7
    Percentiles                204   51.6
    Quartiles                   48   12.0
    Stanines                    35    8.8
    Other (please specify)      28    7.1

111

APPENDIX II

NOTE: In the following five questions you will be asked to rate several aspects of the Elementary and Secondary Education Act, title VII, local and State evaluation reports that affect the degree to which these documents satisfy your policy, management, and programmatic information needs. You are asked to provide overall judgments on the adequacy of the quality, informational content, and utility of the evaluation reports. Do this by considering each of these attributes: very deficient, deficient, marginal, adequate, and more than adequate. Check the box which most appropriately reflects how you feel about the respective local and State evaluations with regard to the particular aspects in question.

33. Rate the FOCUS AND SCOPE of the local and State evaluation reports. (notes b and e) (Focus and Scope: the adequacy with which the report covers the essential and related material and the appropriateness of the emphasis and treatment given to the relevant topics, details, and high and lower priority information). (Check one box in each row.)

Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports    363     0  0      11  3.0   146 40.1   175 48.1    32  8.7
State reports    314    24  7.7    23  7.2   143 45.6   120 38.3     4  1.3

34. Rate the local and State evaluation reports on the PRESENTATION OF REQUIRED MANAGEMENT INFORMATION NEEDS.
(notes b and e) (Presentation of Required Management Information Needs: the extent to which the report presents the information needed to evaluate and update policies by those who transfer policy decisions into plans, budgets, program implementation, curriculum, operational oversight, resource allocations, forecasts, status assessments and reports, educational accountability, costs, benefits, and efficiency assessments). (Check one box in each row.)

Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports    363     0  0      17  4.7   135 37.1   163 45.0    48 13.2
State reports    314    18  5.7    28  8.8   139 44.2   121 38.4     9  2.9

112

APPENDIX II

35. Rate the local and State evaluation reports on the adequacy with which they properly QUALIFY FINDINGS. (notes b and e) (Qualification of Findings: the extent to which the report properly qualifies the findings and assumptions and identifies those conditions and situations where the findings are not applicable). (Check one box in each row.)

Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports    363     0  0      17  4.6   127 34.9   183 50.5    37 10.1
State reports    314    23  7.3    29  9.3   165 52.6    95 30.1     2  0.6

36. Rate local and State evaluation reports on the CREDIBILITY OF FINDINGS. (notes b and e) (Credibility of Findings: the degree of confidence expressed in the findings through statements of statistical certainty, soundness of method, evidence of replication, consensual agreements, similar experiences, supporting expert judgment and opinions, and reasonableness of assumptions). (Check one box in each row.)
Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports    362     1  0.3    28  7.7   143 39.4   138 38.2    52 14.4
State reports    314    24  7.7    25  7.8   142 45.1   122 38.8     2  0.6

113

APPENDIX II

37. Rate the local and State evaluation reports on the adequacy of the QUALIFICATION AND QUANTIFICATION OF MEASUREMENT DATA. (notes b and e) (Qualification and Quantification of Measurement Data: the extent to which the evaluation assessments can be qualified and quantified into measurable attributes and parameters that address the problem in measurable, operational, or concrete terms). (Check one box in each row.)

Projected number responding from 8,936, followed by Number and Percent for each rating: Very deficient; Deficient; Marginal; Adequate; More than adequate.

Local reports    349     2  0.6     3  0.9   138 39.6   165 47.3    41 11.8
State reports    296    25  8.3    21  7.1   106 35.9   127 43.1    17  5.6

38. How often do your Elementary and Secondary Education Act, title VII, local program evaluations adequately report information on the needs assessment, number of children, per pupil expenditure, project achievement, and pupil benefit parameters? (Check one box in each row.)
(notes b and e)

Projected number responding from 8,936, followed by Number and Percent for each frequency: Very often; Generally; As often as not; Occasionally; Seldom.

(1) Information on the manner in which the needs of children are assessed   369   146 39.4   161 43.6     4  1.1    45 12.1    14  3.3
(2) Information on the number of children in the program                    368   198 53.7   106 28.8     2  0.5    50 13.5    13  3.5
(3) Per pupil expenditures of each program                                  367    89 24.3   103 27.9    69 19.9    79 21.4    28  7.6
(4) Evidence of qualifiable and measurable achievements                     369   135 36.6   152 41.2    28  7.6    43 10.8    14  3.8
(5) Evidence of qualifiable or measurable pupil benefits                    368   148 40.1   123 33.4    28  7.6    54 14.5    16  4.3

114

APPENDIX II

39. From your experience, how often can you draw comparisons between the results of your local Federal program and Federal programs in other localities from State and Federal evaluation reports? (Check one box for each type of report.) (notes b and e)

Projected number responding from 8,936, followed by Number and Percent for each frequency: Very often; Generally; As often as not; Occasionally; Seldom; No basis to judge.

State reports     361    40 11.1    72 19.9    12  3.3   104 28.8    37 10.2    97 26.9
Federal reports   374    22  5.9    39 10.4    34  9.1    95 25.4    88 23.5    97 25.9

a/ We requested that the questionnaire be completed by local education agency officials familiar with the local agency's Elementary and Secondary Education Act evaluation efforts conducted for titles I (regular, migrant, neglected and delinquent, etc.), III, and VII, and with the local agency's own assessment efforts. The questionnaire was divided into four sections: section A, National Assessment of Educational Progress and local education agency testing programs; section B, Elementary and Secondary Education Act, title I; section C, title III; and section D, title VII evaluation efforts. We suggested that the questionnaire be taken apart and distributed to persons most familiar with each part, such as the local education agency director of research, planning, and evaluation for section A, and the respective local education agency directors of title I for section B, title III for section C, and title VII for section D. Some of the questions from the questionnaire have been omitted from this report. Questions pertaining to the National Assessment of Educational Progress and local education agencies' own testing programs are shown in GAO's report to the Congress (HRD-76-113), dated July 20, 1976.

b/ In April 1975 we sent the questionnaire to a national statistical sample of 832 local school districts. By June 1975 we had received responses from 710 school districts, or 85 percent. The numbers shown above represent the number of local school districts in the Nation--out of the 11,666 in the defined universe with 300 or more pupils--to which our questionnaire sample responses have been projected. We projected a total of 8,936 local education agencies responding instead of 11,666 for technical reasons; based on the weighting and the response rates across the various strata in our sample, this method allows us to obtain the most accurate percentage breakdowns of the answers given. Where the numbers of responses to each line item do not total to the "Number of projected responses" column, it is due to rounding.

c/ This column shows for each question the percentage of projected respondents choosing each specific answer.

d/ Standardized norm-referenced tests are tests which purport to assess the individual student's ability or achievement in broad subject areas as compared to the rest of the test population (e.g., the Metropolitan Achievement Test (MAT) or the Wechsler Intelligence Scale for Children (WISC)).

e/ The percent columns show the percentage of projected respondents to each line item choosing each category. Where the percentages on each line do not add to 100.0, it is due to rounding.

f/ Total does not add to 100.0 due to rounding.

g/ Criterion-referenced tests are tests specifically constructed to measure students' attainment of specific educational objectives or proficiency with specified curriculum material. These tests, which may be standardized, usually provide a specific and operational description of the level and type of task performance or behavioral measures used as a criterion to indicate attainment of the educational objectives. For example, the student must be able to compute the correct product of all single digit numerals greater than zero with no more than five errors.

115

APPENDIX III

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
OFFICE OF THE SECRETARY
WASHINGTON, D.C. 20201

JUN 15 1977

Mr. Gregory J. Ahart
Director
Human Resources Division
United States General Accounting Office
Washington, D.C. 20548

Dear Mr. Ahart:

The Secretary asked that I respond to your request for our comments on your draft report entitled, "Problems and Needed Improvements In Evaluating Office of Education Programs." The enclosed comments represent the tentative position of the Department and are subject to reevaluation when the final version of this report is received.

We appreciate the opportunity to comment on this draft report before its publication.

Sincerely yours,

Thomas D.
Morris
Inspector General

Enclosure

116

APPENDIX III

Comments of the Department of Health, Education, and Welfare on the Comptroller General's Report to the Congress entitled "Problems and Needed Improvements in Evaluating Office of Education Programs," February 22, 1977, B-164031(1)

General Department Comments

The subject GAO report critically assesses the Office of Education's program evaluation mechanisms, concludes that opportunities exist for improvement, and recommends that the Secretary of HEW direct the Commissioner of Education and the Director of the National Institute of Education to take a number of corrective actions. Several changes would strengthen the report.

There needs to be more careful consideration of the costs of evaluation. While there is an assertion (p. 50) that "the amount of funds spent to evaluate State and/or local education agency Title I, III, and VII elementary and secondary education programs is substantial," on p. 33 the amounts are respectively 0.2% of Title I and 1.8% of Title III. Twenty cents per $100 is not "substantial." The quality of data the report appears to expect would require significant additional resources; this expenditure is not possible within existing budgets, and it would also be high in relation to the possible payoffs through improvements in the programs. Are the increased costs of providing better information to State and local decision-makers likely to result in sufficient improvements to justify the expenditure? Do State and local decision-makers have the incentives, and does the state-of-knowledge permit improvements or economies at the local level equal (at least) to the increased costs of evaluation improvements? If the costs of improved evaluation are not recoverable in improved productivity or efficiency, calling for more is a questionable recommendation.

As another example, the call for the NIE to support more research on criterion-referenced testing is acceptable, but this is,
as recognized in the report, a long-term effort, and it is not clear whether GAO believes NIE deserves additional appropriations for such an effort. NIE can hardly divert substantial resources from its present $70 million.

A final point deserves mention. In spite of the reports of three of the four key congressional committee staff members that OE's evaluations "often have been completely ineffective or have had little impact on legislation" (p. 21), there is a growing body of professional opinion that OE's studies have, over the past ten years, been responsible for many major changes in existing legislation. In retrospect, even though apparently many of the reports did not reach "Congressional decision-makers" in a timely manner, the reports were solid pieces of work, widely discussed and debated, and by the time the next piece of legislation came up, the climate of opinion about particular programs had changed.

117

APPENDIX III

As a result of such retrospective analysis, the approach to evaluation which assumes there are certain "decision-makers" and that effective evaluations provide timely data to them is increasingly being questioned. Rather, it appears that effective evaluations affect the broad political climate within which particular decisions are made. The large scale OE evaluations have stood the test of time in this view of evaluation far better than any of the alternatives proposed in the report appear likely to. (To underscore this point, witness the large number of citations of these large-scale OE studies in the report itself.)

In short, the report is well done within its limits. The recommendations do not give adequate recognition to the additional financial burden they entail, nor to whether the tradeoffs in improved program quality are likely to justify the additional expenditures. And there is no discussion of the limitations of the view that evaluation ought to serve decision-making in a direct and linear sense.
GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to more strongly emphasize the purpose of providing information to the Congress when planning, implementing, and reporting on evaluation studies. In particular, more attention should be given to timing the studies so that they closely coincide with the legislative cycle, (See GAO note p. 124) and briefing congressional committee staff more frequently.

Department Comments

We concur with the general conclusion that Congressional needs can be better served than we are now doing.

First, we concur that it is clearly desirable that evaluation studies should be timed to more closely coincide with the legislative cycle. It is not the case, however, as GAO has apparently concluded, that this obviously important problem has not received attention. Rather, the frequent failure of studies to coincide with the legislative cycle has been due to such factors as: inadequate funds to initiate evaluations at the right time; delays in the procurement cycle due to uncertainties in the appropriation process; increased difficulty in getting studies and data collection instruments cleared; and failure to anticipate difficulties and delays in the data collection process in the field. Nevertheless, we believe these problems can be more effectively addressed and, in order to do so, we have initiated a series of reviews of our studies which focus on predicted production dates for findings and recommendations vis-a-vis critical dates for input to legislation renewal.

Second, we concur that congressional committee staff should be briefed more directly and fully on the findings of evaluation studies. This need has not received the attention it deserves, but we have recently made the decision to institute such briefings on all major evaluation studies, and expect to get this new procedure underway in the coming weeks.
118

APPENDIX III

(See GAO note p. 124)

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to better define the program objectives to be evaluated as required by the General Education Provisions Act. This includes translating the legislative purposes of each program into specific qualitative and measurable program objectives, and clearly stating these objectives, and the progress made toward achieving them, in the annual evaluation report.

Department Comments

We do not concur. In most cases, legislation fails to state a program's objectives with sufficient clarity that they are readily susceptible to evaluation. But very often this failure of the legislation to be clear and precise on program objectives is the price paid through political compromise for getting the legislation passed at all. The Office of Education proceeds at considerable peril in trying to further specify legislation. It has in fact been criticized on several occasions for going further than the Congress intended, and of "trying to legislate by means of regulation." Furthermore, in many cases it has been the Congress' specific intention to avoid specification of program objectives and to leave such judgments and decisions up to State and local officials. For example, in turning down a proposed amendment to Title I of ESEA to concentrate seventy-five percent of compensatory funds on basic skills, the Senate Committee said:

"The Committee believed it inappropriate for the Federal government to substitute its judgments on appropriate Compensatory Education Programs for that of State and local officials." (Senate Report 93-76Z, 1974, p. 30)

On the same question the House observed:

"The Committee feels strongly that the Local School Agency is the appropriate level to determine the special needs of educationally deprived children and should be primarily responsible for determining approaches to meeting those needs." (H. Report 93-805, 1974, p.
20-21)

119

APPENDIX III

For all these reasons, we believe there are very definite limits on the Office of Education's authority and ability to increase the clarity and specificity of program objectives.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to improve the implementation of evaluation results by giving greater attention and priority to procedures such as the issuance of Policy Implication Memorandums designed to insure implementation of those results.

Department Comments

We concur. The need to improve the utilization and implementation of evaluation reports is a major problem which all Federal agencies and the Congress jointly face, and we further concur that increased efforts should be devoted to its solution. The Policy Implications Memorandum (PIM) is an invention of OE's evaluation office, and while its potential for improving the utilization of evaluation findings is considerable, GAO is correct in observing that it has not been used nearly as extensively in OE as it should have been. Efforts are currently underway to expand the production and use of the PIM. We are now conducting periodic reviews of the production schedule for PIMs and emphasizing their high priority.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to assess whether State and/or local evaluation reports for Titles I and VII of the Elementary and Secondary Education Act can be improved so that they supply officials at Federal, State, and/or local levels with the reliable program information they need for decision making.

GAO Recommendation

That the Secretary of HEW direct the Commissioner of Education to review the types of State and/or local program information collected on programs authorized by Titles I, III, and VII of the Elementary and Secondary Education Act.
Department Comments

We concur with the general thrust of GAO's recommendations in this area, but most of the actions that GAO recommends are already underway, many by legislative mandate.

With respect to Title I, new evaluation requirements (Section 151 created by P.L. 93-380) directed the Commissioner to develop evaluation models and standards and to provide technical assistance to the States and local districts in order to improve the quality of local Title I evaluations and to yield comparable data which could be aggregated to State and National levels. All of this is being carried out: OE has interviewed personnel in policy making roles at the Federal level (both Congressional

120

APPENDIX III
We have and training for State prepared training manuals and guidebooks for widespread dissemination to current and future users of the models and reporting system. Once the place across the nation, their use should models and system are in result in data which can be aggregated across States and across school districts. are in place, OE will be better able to assess And, once they ant data are sufficiently free of systematic whether or not the result- errors to support aggregations to the State and national levels in a satisfactory manner. If they are not, then a determination can be made as to whether technical problems existed that could be overcome or whether different kinds cf studies needed to be done to satisfy Federal, State, and local repo-ting requirements. With respect to the recommendation on ESEA Title VII local evaluation reports, we believe they can be improved and we steps to do o. Recently published Title VII have taken the following regulations strengthen the requirements for evaluation of LEA bilingual projects. In addition, NIE and OE have a joint project underway to upgrade the technical expertise of persons responsible for local evaluations. worthwhile to improve local evaluations, the While we believe it is extent to which such evalu- ations will be useful at the Federal level is not yet evident. Certainly the problems of aggregating across local bilingual evaluations are more severe than in Title I, if only because of the multiplicity of languages involved. (See GAO note p. 124) Thus, as regards the much needed and legislatively mandated actions to improve State and local evaluations and reporting and VII, we believe GAO's understanding under Titles I, IV, is incomplete (See GAO note p. 124) that the problems they refer to are well understood by both the Office of Education and Congress and that appropriate actions to deal with them are already well underway, and in some cases near completion. 
GAO Recommendation

That the Secretary direct the Director, National Institute of Education, to consider the need for funding additional research on (1) criterion-referenced tests and other alternatives to standardized norm-referenced achievement tests for uses which include program evaluation and (2) the nature and extent of racial, sexual, and cultural biases in standardized tests and how such biases may be reduced.

Department Comment

We concur. Several groups within the Institute presently have ongoing research programs in criterion-referenced testing and test bias. More emphasis will be given to these programs in FY 1977, FY 1978, and FY 1979 if appropriations for the Institute are increased significantly above present levels.

Under NIE leadership, the Center for the Study of Reading at the University of Illinois and the Center for the Study of Evaluation at UCLA are doing research that will lead to alternatives to standardized achievement tests in reading comprehension and writing. In addition, statistical research is in progress on how to assess the probability of making errors in classifying students on such tests, how to set the length of such tests, and how to determine passing scores.

The SOBER-Espanol project funded by NIE is developing criterion-referenced tests to assess competency in reading Spanish for grades K-6. The SOBER system allows teachers to create "tailor-made tests" by matching prepared test items to reading objectives. Currently, K-3 tests are being published and distributed through Science Research Associates. Grades 4-6 will be published later this year.

NIE is also supporting research and development on criterion-referenced testing and other alternatives to norm-referenced achievement tests for the purpose of educational exit testing and occupation entry selection.
In December 1975 NIE held a conference for test developers, test critics, and others to consider methods of identifying and eliminating bias in reading achievement tests. Since that time, NIE has funded two grants on test bias: one is a project to detect and eliminate the motivational causes of test bias, and the other is a project to debias the language found in a widely used, standardized achievement test. And finally, NIE has an in-house project which applies new methods of qualitative data analysis to the intractable problem of detecting bias in test items.

NIE is also supporting work on sex bias in the assessment of a person's occupational interests and biases in educational exit and occupational entry testing--the latter being of particular importance given the Griggs v. Duke Power decision of the U.S. Supreme Court. The products of these studies include the NIE Guidelines on Sex-Fair Vocational-Interest Measurement and the Abt kit on the interpretation and usage of sex-fair vocational-interest tests.

GAO Recommendation

That the Secretary direct the Director, National Institute of Education, to improve dissemination of available NIE-funded information, which is intended to help in selecting the most appropriate standardized tests, thereby increasing State and local education agency officials' awareness and use of this information.

Department Comment

We agree that there is a need to improve the dissemination of NIE-funded material intended to help educators select appropriate standardized tests. The Institute's policy is not to force its products on school personnel, but we do intend to make school personnel familiar with the products that are available.
In the case of our consumer's guides to standardized tests, that is, the test evaluation books produced for NIE by the Center for the Study of Evaluation at UCLA and mentioned in this GAO report, our Basic Skills Group and our Dissemination and Resources Group are collaborating in the dissemination of these products. Several approaches are or will be used. First, the test evaluation books are listed in the Institute's Catalog of NIE Education Products. This catalog has been offered free to each superintendent of schools and each district director of curriculum in the country. Five thousand copies have been distributed to superintendents. A similar distribution will now be made to the district directors of evaluation, who presumably will be particularly interested in the test evaluation books. In addition, we plan to use the newly formed Lab and Center R&D Exchange to give school personnel more information about the acquisition and use of test evaluation books (e.g., through brochures or workshops). This dissemination network has the potential of reaching 50 percent of the country's school systems. Still another dissemination network that will be used in a similar way is the one formed by our seven R&D utilization contractors (five State education agencies and two nongovernmental organizations). If the GAO report is correct in identifying a need for the test evaluation books--and we think that it is--these approaches should give the books much wider dissemination than they have had up to this point.

NIE is also sponsoring the dissemination of test information in more specialized areas. The Education and Work Group has supported consumer guides to standardized tests in career education and occupational preparation. Dissemination of products will also be built into the further work of this group to improve testing in career education.
The Educational Equity Group has funded American Institutes for Research to develop a catalog reviewing assessment instruments for children of limited English-speaking ability at the K-6 levels. Descriptive information (author, publisher, research data available, etc.) and analyses of the appropriateness of the tests for use with bilingual children will be provided. All information will be comprehensible to educational practitioners. This contract will also identify those areas and levels for which existing tests are inadequate or nonexistent.

(See GAO note below)

GAO note: Deleted comments pertain to material presented in the draft report which has been revised or not included in the final report.

APPENDIX IV

ADDITIONAL SUGGESTIONS BY OE CONFEREES FOR IMPROVING TESTING AND EVALUATION

As discussed in chapter 6, the Office of Education sponsored a special 4-day conference on "Achievement Testing of Disadvantaged and Minority Students for Educational Program Evaluation" in May 1976. The Office invited about 50 experts in testing, program evaluation, and related fields, including university and other researchers, and representatives of leading test publishers. Federal and local education agencies, as well as education and other interest groups, were also represented. The conference focus was on large-scale program evaluations of elementary and secondary school compensatory and desegregation programs--programs on which OE concentrates much of its effort. The purpose of the conference was to identify, define, and analyze the many problems associated with using standardized achievement tests in these programs and to develop interim and long-term solutions.
In addition to the suggestions discussed in chapter 6, conclusions and recommendations from the five small working groups formed at the conference's close included the following:

-- In the context of educational program evaluation: development, standards, administration, and use of standardized tests must be accounted for and monitored. Alternative approaches which should be explored include: (1) Federal legislation to establish a monitoring body with enforcement powers to oversee the testing practices of test developers, State and school district educational systems, researchers, and others, (2) an independent monitoring agency sponsored by major test developers composed of minority and organizational representatives, and (3) an enforceable testing code of ethics, including mandatory withdrawal of services by test producers in established cases of misuse of tests and test information.

-- There is need for federally funded studies to increase understanding of the nature and extent of biases in tests and how such biases might be reduced. Studies of this problem might include detailed studies of individual pupils in interviews or computer simulations of bias models.

-- There is no clear consensus on test bias definitions nor clear technical procedures for identifying a biased question or test. There is, nevertheless, enough documentation of public concern--including calls for cessation of testing--and empirical data to justify change and development of guidelines. It is imperative that the testing community join with other interested groups to agree on the steps to be taken to develop and use tests that are judged to be fair.

-- Specific guidelines are also needed for test administration and assessment of bilingual groups.

-- There clearly needs to be an extension of the present professional testing standards of the American Psychological Association to cover the use of achievement tests in program evaluation.
-- OE should support developing a procedures manual for determining the appropriateness of using achievement tests in program evaluation and for properly selecting, administering, scoring, and interpreting data from such tests. Such a manual should include the degree to which other information must be used together with achievement tests to adequately describe program outcomes. This should be the first (and more immediately accomplishable) stage of a longer term effort to produce procedures manuals that address means other than achievement tests for collecting program evaluation data. For the longer term effort, more experimentation (field-testing) is needed to investigate alternative data collection modes to firmly establish them.

-- Publishers of standardized tests should give explicit step-by-step instructions in their users' or technical manuals about how to use their tests correctly for various purposes and how to avoid misuse. For example, these purposes may include needs assessment, diagnosis and prescription, or project evaluation.

-- The use of grade-equivalent scores on standardized tests should be eliminated.

-- More tests ought to be developed for diagnosing educational problems and prescribing remedies.

-- The work statements for evaluations in Federal agency requests for proposals are not always adequate. Examples can be cited in which technical approaches have been overspecified by persons who perhaps have not fully understood either the technical or the practical problems involved. More time needs to be allocated for writing requests for proposals, and more professional review of them must be accomplished. Detailed technical and procedural specifications should never be included in work statements unless such specifications are the consensus of a panel of national experts in the field.
APPENDIX V

PRINCIPAL OFFICIALS OF THE DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE RESPONSIBLE FOR ACTIVITIES DISCUSSED IN THIS REPORT

                                             Tenure of office
                                             From        To

SECRETARY OF HEALTH, EDUCATION,
  AND WELFARE:
    Joseph Califano                          Jan. 1977   Present
    David Mathews                            Aug. 1975   Jan. 1977
    Caspar W. Weinberger                     Feb. 1973   Aug. 1975
    Frank C. Carlucci (acting)               Jan. 1973   Feb. 1973
    Elliot L. Richardson                     June 1970   Jan. 1973

ASSISTANT SECRETARY FOR EDUCATION:
    Mary Berry                               Apr. 1977   Present
    Philip Austin (acting)                   Jan. 1977   Apr. 1977
    Virginia Y. Trotter                      June 1974   Jan. 1977
    Charles B. Saunders, Jr. (acting)        Nov. 1973   June 1974
    Sidney P. Marland, Jr.                   Nov. 1972   Nov. 1973

COMMISSIONER OF EDUCATION:
    Ernest L. Boyer                          Apr. 1977   Present
    William F. Pierce (acting)               Jan. 1977   Apr. 1977
    Edward Aguirre                           Oct. 1976   Jan. 1977
    William F. Pierce (acting)               Aug. 1976   Oct. 1976
    Terrel H. Bell                           June 1974   Aug. 1976
    John R. Ottina                           Aug. 1973   June 1974
    John R. Ottina (acting)                  Nov. 1972   Aug. 1973
    Sidney P. Marland, Jr.                   Dec. 1970   Nov. 1972
    Terrel H. Bell (acting)                  June 1970   Dec. 1970
    James E. Allen, Jr.                      May 1969    June 1970

DIRECTOR, NATIONAL INSTITUTE
  OF EDUCATION:
    John Christensen (acting)                July 1977   Present
    Emerson J. Elliott (acting)              Jan. 1977   July 1977
    Harold L. Hodgkinson                     July 1975   Jan. 1977
    Emerson J. Elliott (acting)              Oct. 1974   July 1975
    Thomas Glennan                           Oct. 1972   Oct. 1974

(104003)
Problems and Needed Improvements in Evaluating Office of Education Programs
Published by the Government Accountability Office on 1977-09-08.