United States General Accounting Office

GAO

Report to the Honorable Lamar Smith, House of Representatives

September 2003

JUSTICE OUTCOME EVALUATIONS

Design and Implementation of Studies Require More NIJ Attention

GAO-03-1091

Highlights of GAO-03-1091, a report to the Honorable Lamar Smith, House of Representatives

September 2003

JUSTICE OUTCOME EVALUATIONS
Design and Implementation of Studies Require More NIJ Attention

Policy makers need valid, reliable, and timely information on the outcomes of criminal justice programs to help them decide how to set criminal justice funding priorities. In view of previously reported problems with selected outcome evaluations managed by the National Institute of Justice (NIJ), GAO assessed the methodological quality of a sample of completed and ongoing NIJ outcome evaluation grants.

From 1992 through 2002, NIJ managed 96 evaluation studies that sought to measure the outcomes of criminal justice programs. Spending on these evaluations totaled about $37 million. Our methodological review of 15 of the 96 studies, totaling about $15 million and covering a broad range of criminal justice issues, showed that sufficiently sound information about program effects could not be obtained from 10 of the 15. Five studies, totaling about $7.5 million (or 48 percent of the funds spent on the studies we reviewed), appeared to be methodologically rigorous in both design and implementation, enabling meaningful conclusions to be drawn about program effects. Six studies, totaling about $3.3 million (or 21 percent of the funds spent on the studies we reviewed), began with sound designs but encountered implementation problems that would render their results inconclusive. An additional 4 studies, totaling about $4.7 million (or 30 percent of the funds spent on the studies we reviewed), had serious methodological limitations that from the start limited their ability to produce reliable and valid results. Although results from 5 completed studies were inconclusive, DOJ program administrators said that they found some of the process and implementation findings from them to be useful.

We recognize that optimal conditions for the scientific study of complex social programs almost never exist, making it difficult to design and execute outcome evaluations that produce definitive results. However, the methodological adequacy of NIJ studies can be improved, and NIJ has taken several steps in this direction, including the formation of an evaluation division and the funding of feasibility studies. It is too soon to tell whether these changes will lead to evaluations that will better inform policy makers about the effectiveness of criminal justice programs.

Characteristics of NIJ Outcome Evaluations (1992-2002)

[Pie charts in the original. Of the 96 studies, totaling $36.6 million, GAO reviewed 15 (amount: $15.4 million, or 42 percent of funds) and did not review 81 (amount: $21.2 million, or 58 percent). Of the funds for the 15 reviewed studies(a): well designed, 5 studies, $7.5 million (48 percent); well designed but with implementation problems, 6 studies, $3.3 million (21 percent); design problems, 4 studies, $4.7 million (30 percent).]

Source: GAO analysis of NIJ data.

(a) Percentages may not add to 100 percent because of rounding.

GAO recommends that NIJ

•	review its ongoing outcome evaluation grants and develop appropriate strategies and corrective measures to ensure that methodological design and implementation problems are overcome so the evaluations can produce more conclusive results;

•	continue efforts to respond to GAO’s 2002 recommendation that NIJ assess its evaluation process with the purpose of developing approaches to ensure that future outcome evaluations are funded only when they are effectively designed and implemented.

In commenting on a draft of this report, DOJ agreed with GAO’s recommendations, and cited several current and planned activities intended to improve NIJ’s evaluation program. DOJ also made two substantive comments related to the presentation of information that GAO responded to in the report.

www.gao.gov/cgi-bin/getrpt?GAO-03-1091. To view the full product, including the scope and methodology, click on the link above. For more information, contact Laurie E. Ekstrand at (202) 512-8777 or ekstrandl@gao.gov.
Contents

Letter
     Results in Brief
     Background
     Overview of the Evaluations We Reviewed
     Most of the Reviewed NIJ Outcome Evaluations Could Not Produce Sufficiently Sound Information on Program Outcomes
     Completed Outcome Evaluations Produced Useful Information on Processes but Not on Outcomes for DOJ Program Administrators
     NIJ’s Current and Planned Activities to Improve Its Evaluation Program
     Conclusions
     Recommendations for Executive Action
     Agency Comments and Our Evaluation

Appendix I     Objectives, Scope, and Methodology

Appendix II    Summaries of the NIJ Outcome Evaluations Reviewed

Appendix III   Comments from the Department of Justice

Appendix IV    GAO Contacts and Staff Acknowledgments
     GAO Contacts
     Staff Acknowledgments

Tables
     Table 1: NIJ Outcome Evaluations Reviewed by GAO
     Table 2: Characteristics of 5 NIJ Outcome Evaluations with Sufficiently Sound Designs and Implementation Plans
     Table 3: Problems Encountered during Implementation of 6 Well-Designed NIJ Outcome Evaluation Studies
     Table 4: Design Limitations in 4 NIJ Outcome Evaluation Studies
     Table 5: Number and Size of Outcome Evaluation Awards Made by NIJ from 1992 through 2002, and Reviewed by GAO
     Table 6: Size and Completion Status of the 15 Evaluations Selected for Methodological Review
     Table 7: Programs Evaluated and Funding Sources for Completed NIJ Outcome Evaluations




Abbreviations

BTC               Breaking the Cycle 

COPS              Community Oriented Policing Services 

DOJ               Department of Justice 

GREAT             Gang Resistance Education and Training 

NIJ               National Institute of Justice 

OJP               Office of Justice Programs

OVW               Office on Violence Against Women 





This is a work of the U.S. government and is not subject to copyright protection in the
United States. It may be reproduced and distributed in its entirety without further
permission from GAO. However, because this work may contain copyrighted images or
other material, permission from the copyright holder may be necessary if you wish to
reproduce this material separately.




United States General Accounting Office
Washington, DC 20548




                                   September 24, 2003

                                   The Honorable Lamar Smith
                                   House of Representatives

                                   Dear Mr. Smith:

                                   The U.S. Department of Justice (DOJ) spent almost $4 billion in fiscal year
                                   2002 on assistance to states and local communities to combat crime. These
                                   funds were used to reduce drug abuse and trafficking, address the
                                   problems of gang violence and juvenile delinquency, expand community
                                   policing, and meet the needs of crime victims, among other things. In
                                   addition, state and local governments spend billions of their dollars
                                   annually on law enforcement and criminal justice programs. Given these
                                   expenditures, it is important to know which programs are effective in
                                   controlling and preventing crime so that limited federal, state, and local
                                   funds are not wasted on programs that are ineffective. As the principal
                                   research, development, and evaluation agency of DOJ, the National
                                   Institute of Justice (NIJ) is responsible for evaluating existing programs
                                   and policies that respond to crime. It spends millions of dollars annually to
                                   support studies intended to evaluate various DOJ-funded programs as well
                                   as selected local programs. To the extent that NIJ evaluations produce
                                   credible, valid, reliable, and timely information on the efficacy of these
                                   programs in combating crime, they can serve an important role in helping
                                   policymakers make decisions about how to set criminal justice funding
                                   priorities.

                                   Following our previous reports, in which we identified problems with
                                   selected NIJ-managed outcome evaluations,1 you, in your former position
                                   as Chairman of the Subcommittee on Crime, House Judiciary Committee,
                                   asked us to undertake a more extensive review of the outcome evaluation
                                   work performed under the direction of NIJ during the last 10 years.
                                   Outcome evaluations are defined as those efforts designed to determine
                                   whether a program, project, or intervention produced its intended effects.


                                   1 U.S. General Accounting Office, Justice Impact Evaluations: One Byrne Evaluation Was
                                   Rigorous; All Reviewed Violence Against Women Office Evaluations Were Problematic,
                                   GAO-02-309 (Washington, D.C.: Mar. 2002); and Drug Courts: Better DOJ Data Collection
                                   and Evaluation Efforts Needed to Measure Impact of Drug Court Program, GAO-02-434
                                   (Washington, D.C.: Apr. 2002).



As agreed with your office, we are reporting on the methodological quality
of a sample of completed and ongoing NIJ outcome evaluation grants, and
the usefulness of the evaluations in producing information on outcomes.
Because we learned of changes NIJ has underway to improve its
administration of outcome evaluation studies, we are also providing
information in this report about these changes.

To meet our objectives, we reviewed outcome evaluation grants managed
by NIJ from 1992 through 2002. During this time period NIJ managed
96 outcome evaluation grants. Of these 96 grants, we judgmentally
selected and reviewed 15 outcome evaluations that varied in grant
size, completion status, and program focus. The selected studies
accounted for about $15.4 million, or about 42 percent, of the
approximately $36.6 million spent on outcome evaluation studies during
the 10-year period. Although our sample is not representative of all NIJ
outcome evaluations conducted during the last 10 years, it includes those
that have received a large proportion of total funding for this type of
research, and tends to be drawn from the most recent work. Our review
assessed the methodological quality of these evaluations using generally
accepted social science standards,2 including such elements as whether
evaluation data were collected before and after program implementation;
how program effects were isolated (i.e., the use of nonprogram participant
comparison groups or statistical controls); and the appropriateness of
sampling, outcome measures, statistical analyses, and any reported
results. We grouped the studies into 3 categories based on our judgment of
their methodological soundness. Although we recognize that the stronger
studies may have had some weaknesses, and that the weaker studies may
have had some strengths, our categorization of the studies was a summary
judgment based on the totality of the information provided to us by NIJ.
We also interviewed NIJ officials regarding the selection and oversight of
these evaluation studies. To assess the usefulness of NIJ’s outcome
evaluations in producing information about program outcomes, we
reviewed the findings from all 5 of the completed NIJ outcome evaluations


2 Social science research standards are outlined in Donald T. Campbell and Julian Stanley,
Experimental and Quasi-Experimental Designs for Research (Chicago: Rand McNally,
1963); Thomas D. Cook and Donald T. Campbell, Quasi-Experimentation: Design and
Analysis Issues for Field Settings (Boston: Houghton Mifflin, 1990); Carol H. Weiss,
Evaluation Research: Methods for Assessing Program Effectiveness (Englewood Cliffs:
Prentice-Hall, Inc., 1972); Edward Suchman, Evaluation Research: Principles and Practice
in Public Service and Social Action Programs (New York: Russell Sage Foundation, 1967);
and U.S. General Accounting Office, Designing Evaluations, GAO/PEMD-10.1.4
(Washington, D.C.: May 1991).




                        in our sample that were funded in part by DOJ program offices, and
                        interviewed program officials at NIJ and program administrators at DOJ’s
                        Office on Violence Against Women and Office of Community Oriented
                        Policing Services. Further details on our methodology are provided in
                        appendix I.


Results in Brief

Our methodological review of 15 selected NIJ outcome evaluation studies undertaken since 1992 showed that although most studies began with sufficiently sound designs, most could not produce sufficiently sound information on program outcomes. Specifically, the studies could be characterized in the following ways:

                   •	   Studies that began with sufficiently sound evaluation designs: Eleven of
                        the 15 studies began with sufficiently sound designs. Some of these well-
                        designed studies were also implemented well, while others were not.
                        Specifically,

                        •	   Five of the 11 studies were sufficiently well designed and
                             implemented—including having appropriate comparison groups or
                             random assignment to treatment and control groups, baseline
                             measures, and follow-up data—so that meaningful conclusions could
                             be drawn about program effects. Funding for these methodologically
                             sound studies totaled about $7.5 million, or nearly 50 percent of the
                             approximately $15.4 million spent on the studies we reviewed.

                        •	   Six of the 11 studies began with sufficiently sound designs, but
                             encountered implementation problems that limited the extent to which
                             the study objectives could be achieved. For example, some evaluators
                             were unable to carry out a proposed evaluation plan because the
                             program to be evaluated was not implemented as planned, or they
                             could not obtain complete or reliable data on outcomes. In some cases,
                             implementation problems were beyond the evaluators’ control, and
                             resulted from decisions made by agencies providing program services
                             after the study was underway. These studies were limited in their
                             ability to conclude that it was the program or intervention that caused
                             the intended outcome results. Funding for these studies with
                             implementation problems totaled about $3.3 million, or about 21
                             percent of the approximately $15.4 million spent on the studies we
                             reviewed.

                   •	   Studies that did not begin with sufficiently sound designs: Four of the
                        15 studies had serious methodological problems from the beginning that
                        limited their ability to produce results that could be attributed to the


programs that were being evaluated. Methodological shortcomings in
these studies included the absence of comparison groups or appropriate
statistical controls, outcome measures with doubtful reliability and
validity, and lack of baseline data. Funding for these studies that began
with serious methodological problems totaled about $4.7 million, or about
30 percent of the approximately $15.4 million spent on the studies we
reviewed.
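
The percentage shares in these three categories can be checked directly from the dollar figures. A quick sketch in Python (not part of the report; the small differences from GAO’s 48, 21, and 30 percent reflect GAO’s rounding from the unrounded award amounts):

    # Funding shares for the three categories of reviewed studies,
    # using the rounded dollar figures cited in this report.
    reviewed_total = 15.4  # $ millions spent on the 15 reviewed studies

    shares = {
        "well designed and implemented (5 studies)": 7.5,
        "design sound, implementation problems (6 studies)": 3.3,
        "design problems from the start (4 studies)": 4.7,
    }

    for label, amount in shares.items():
        print(f"{label}: ${amount} million = {amount / reviewed_total:.0%}")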

Outcome evaluations are difficult to design and execute because optimal
conditions for the scientific study of complex social programs almost
never exist. Attributing results to a particular intervention can be difficult
when such programs are evaluated in real world settings that pose
numerous methodological challenges. All 5 of the completed NIJ outcome
evaluations that focused on issues of interest to DOJ program offices had
encountered some design and implementation problems. Nonetheless,
DOJ program administrators told us that these evaluations produced
information that prompted them to make a number of changes to DOJ-
funded programs. The majority of the changes enumerated by DOJ
program administrators occurred as a result of findings from the process
or implementation components3 of the completed outcome evaluations,
and not from findings regarding program results. For example, as a result
of NIJ’s evaluation of a DOJ program for domestic and child abuse victims
in rural areas, DOJ developed a training program to assist grantees in
creating collaborative programs based on the finding from the process
evaluation that such information was not readily available.

Although outcome evaluations are difficult to design and execute, steps
can be taken to improve their methodological adequacy and, in turn, the
likelihood that they will produce meaningful information on program
effects. NIJ officials told us that they have begun to take several steps to
try to increase the likelihood that outcome evaluations will produce more
definitive results, including the establishment of an Evaluation Division
responsible for ensuring the quality and utility of NIJ evaluations, the
funding of selected feasibility studies prior to soliciting outcome
evaluations, and greater emphasis on applicants’ prior performance in
awarding evaluation grants.




3 Outcome evaluations can be distinguished from process or implementation evaluations,
which are designed to assess the extent to which a program is operating as intended.




             We are making recommendations to the Attorney General to improve the
             quality of NIJ’s outcome evaluations. We recommend that NIJ review the
             methodological adequacy of its ongoing grants and take action to improve,
             refocus, or limit them, as appropriate; and that NIJ develop approaches to
             ensure that future outcome evaluations are effectively designed and
              implemented. In commenting on a draft of this report, the Assistant
              Attorney General for DOJ’s Office of Justice Programs (OJP) agreed
              with our recommendations. She also provided technical comments, which we
             evaluated and incorporated, as appropriate. The Assistant Attorney
             General made two substantive comments on our draft report—one relating
             to the fact that even rigorous study design and careful monitoring of
             program implementation do not ensure that evaluation results will be
             conclusive; the other relating to our purported focus on experimental and
             quasi-experimental methods to the exclusion of other high quality
             evaluation methods. We respond to these points in the Agency Comments
             and Evaluation section of the report.


Background

NIJ is the principal research, development, and evaluation agency within OJP. It was created under the 1968 Omnibus Crime Control and Safe Streets Act,4 and is authorized to enter into grants, cooperative agreements, or contracts with public or private agencies to carry out evaluations of the effectiveness of criminal justice programs and identify promising new programs. NIJ’s Office of Research and Evaluation oversees evaluations by outside researchers of a wide range of criminal justice programs, including ones addressing violence against women, drugs and crime, policing and law enforcement, sentencing, and corrections.

             According to NIJ officials, the agency initiates a specific criminal justice
             program evaluation in one of three ways. First, congressional legislation
             may mandate evaluation of specific programs. For example, the
             Departments of Commerce, Justice, and State, the Judiciary, and Related
             Agencies Appropriations Act, 2002,5 requires DOJ to conduct independent
             evaluations of selected programs funded by OJP’s Bureau of Justice
             Assistance and selected projects funded by OJP’s Office of Juvenile Justice
             and Delinquency Prevention. DOJ determined that NIJ would be


              4 42 U.S.C. 3721-3723. NIJ was formerly called the National Institute of Law
              Enforcement and Criminal Justice.

              5 P.L. 107-77. See H.R. Conf. Rep. No. 107-278, at 88, 108, and 112 (2001).




responsible for overseeing these evaluations. Second, NIJ may enter into
an evaluation partnership with another OJP or DOJ office, or another
federal agency, to evaluate specific programs or issues of interest to both
organizations. In these cases, NIJ, in partnership with the program offices,
develops a solicitation for proposals and oversees the resulting evaluation.
Third, NIJ periodically solicits proposals for evaluation of criminal justice
programs directly from the research community, through an open
competition for grants. These solicitations ask evaluators to propose
research of many kinds in any area of criminal justice, or in broad
conceptual areas such as violence against women, policing research and
evaluation, research and evaluation on corrections and sentencing, or
building safer public housing communities through research partnerships.

According to NIJ officials, once the decision has been made to evaluate a
particular program, or to conduct other research in a specific area of
criminal justice, the process of awarding an evaluation grant involves the
following steps. First, NIJ issues a solicitation and receives proposals from
potential evaluators. Next, proposals are reviewed by an external peer
review panel, as well as by NIJ professional staff. The external review
panels are composed of members of the research and practitioner
communities,6 and reviewers are asked to identify, among other things, the
strengths and weaknesses of the competing proposals. External peer
review panels are to consider the quality and technical merit of the
proposal; the likelihood that grant objectives will be met; the capabilities,
demonstrated productivity, and experience of the evaluators; and budget
constraints. Reviews are to include constructive comments about the
proposal, useful recommendations for change and improvement, and
recommendations as to whether the proposal merits further consideration
by NIJ. NIJ professional staff are to review all proposals and all written
external peer reviews, considering the same factors as the peer review
panels. NIJ professional staff are also to consider the performance of
potential grantees on any other previous research grants with NIJ. Next,
the results of the peer and NIJ staff reviews are discussed in a meeting of
NIJ managers, led by NIJ’s Director of the Office of Research and
Evaluation. Then, NIJ’s Office of Research and Evaluation staff meet with
the NIJ Director to present their recommendations. Finally, the NIJ
Director makes the funding decision based on peer reviews, staff
recommendations, other internal NIJ discussions that may have taken



6 In 2002, the NIJ Director specified that there be an equal number of researchers and
practitioners on the review panels.




place, and consideration of what proposals may have the greatest impact
and contribute the most knowledge.

NIJ generally funds outcome evaluations through grants, rather than with
contracts. NIJ officials told us that there are several reasons for awarding
grants as opposed to contracts. Contracts can give NIJ greater control over
the work of funded researchers, and hold them more accountable for
results. However, NIJ officials said that NIJ most often uses grants for
research and evaluation because they believe that grants better ensure the
independence of the evaluators and the integrity of the study results.
Under a grant, NIJ allows the principal investigator a great deal of freedom
to propose the most appropriate methodology and carry out the data
collection and analysis, without undue influence from NIJ or the agency
funding the program. Grants also require fewer bureaucratic steps than do
contracts, resulting in a process whereby a researcher can be selected in a
shorter amount of time.

NIJ officials told us that NIJ tends to make use of contracts for smaller and
more time-limited tasks—such as literature reviews or assessments of
whether specific programs have sufficient data to allow for more extensive
process or outcome evaluations—rather than for conducting outcome
evaluations. NIJ also occasionally makes use of cooperative agreements,
which entail a greater level of interaction between NIJ and the evaluators
during the course of the evaluation. According to NIJ officials, cooperative
agreements between NIJ and its evaluators tend to be slight variations of
grants, with the addition of a few more specific requirements for grantees.
NIJ officials told us that they might use a cooperative agreement when NIJ
wants to play a significant role in the selection of an advisory panel, in
setting specific milestones, or aiding in the design of specific data
collection instruments.

NIJ is to monitor outcome evaluation grantees in accordance with policies
and procedures outlined in the OJP Grant Management Policies and
Procedures Manual. In general, this includes monitoring grantee progress
through regular contact with grantees (site visits, cluster conferences,
other meetings); required interim reports (semiannual progress and
quarterly financial reports); and a review of final substantive evaluation
reports. In some cases, NIJ will require specific milestone reports,
especially on larger studies. Grant monitoring for all types of studies is
carried out by approximately 20 full-time NIJ grant managers, each
responsible for approximately 17 ongoing grants at any one time.




Overview of the Evaluations We Reviewed

From 1992 through 2002, NIJ awarded about $36.6 million for 96 evaluations that NIJ identified as focusing on measuring the outcomes of programs, policies, and interventions, among other things.7 The 15 outcome evaluations that we selected for review varied in terms of completion status (8 were completed, 7 were ongoing) and the size of the award (ranging between about $150,000 and about $2.8 million), and covered a wide range of criminal justice programs and issues (see table 1). All evaluations were funded by NIJ through grants or cooperative agreements.8 Seven of the 15 evaluations focused on programs designed to reduce domestic violence and child maltreatment, 4 focused on programs addressing the behavior of law enforcement officers (including community policing), 2 focused on programs addressing drug abuse, and 2 focused on programs to deal with juvenile justice issues.




7 A number of these grants included both process and outcome components.

8 Three of the 15 evaluations were funded as cooperative agreements.




Table 1: NIJ Outcome Evaluations Reviewed by GAO

Grant                                                                                  Award        Status
Domestic violence and child maltreatment
  National Evaluation of the Rural Domestic Violence and Child Victimization
    Enforcement Grant Program                                                          $719,949     Completed
  National Evaluation of the Domestic Violence Victims’ Civil Legal Assistance
    Program                                                                            $800,154     Ongoing
  Multi-Site Demonstration of Collaborations to Address Domestic Violence and
    Child Maltreatment                                                                 $2,498,638   Ongoing
  Evaluation of a Multi-Site Demonstration for Enhanced Judicial Oversight of
    Domestic Violence Cases                                                            $2,839,954   Ongoing
  An Evaluation of Victim Advocacy with a Team Approach                                $153,491     Completed
  Culturally Focused Batterer Counseling for African-American Men                      $356,321     Ongoing
  Testing the Impact of Court Monitoring and Batterer Intervention Programs at
    the Bronx Misdemeanor Domestic Violence Court                                      $294,129     Ongoing
Law enforcement
  An Evaluation of Chicago’s Citywide Community Policing Program                       $2,157,859   Completed
  Corrections and Law Enforcement Family Support: Law Enforcement Field Test           $649,990     Ongoing
  Reducing Non-Emergency Calls to 911: An Assessment of Four Approaches to
    Handling Citizen Calls for Service                                                 $399,919     Completed
  Responding to the Problem Police Officer: An Evaluation of Early Warning
    Systems                                                                            $174,643     Completed
Drug abuse
  Evaluation of Breaking the Cycle                                                     $2,419,344   Completed
  Evaluation of a Comprehensive Service-Based Intervention Strategy in Public
    Housing                                                                            $187,412     Completed
Juvenile justice issues
  National Evaluation of the Gang Resistance Education and Training Program            $1,568,323   Completed
  Evaluation of a Juvenile Justice Mental Health Initiative with Randomized
    Design                                                                             $200,000     Ongoing

Source: GAO analysis of NIJ data.



Most of the Reviewed NIJ Outcome Evaluations Could Not Produce Sufficiently Sound Information on Program Outcomes

Overall, we found that 10 of the 15 evaluations that we reviewed could not produce sufficiently sound information about program outcomes. Six evaluations began with sufficiently sound designs, but encountered implementation problems that would render their results inconclusive. An additional 4 studies had serious methodological problems that from the start limited their ability to produce reliable and valid results. Five studies appeared to be methodologically rigorous in both their design and implementation. (Appendix II provides additional information on the funding, objectives, and methodology of the 15 outcome evaluation studies.)



Most of the Reviewed Studies Were Well Designed, but Many Later Encountered Implementation Problems

Our review found that 5 evaluations had both sufficiently sound designs and implementation plans or procedures, thereby maximizing the likelihood that the study could meaningfully measure program effects. Funding for these methodologically sound studies totaled about $7.5 million, or nearly 50 percent of the approximately $15.4 million spent on the studies we reviewed. Six evaluations were well designed, but they encountered problems implementing the design as planned during the data collection phase of the study. Funding for these studies with implementation problems totaled about $3.3 million, or about 21 percent of the approximately $15.4 million spent on the studies we reviewed.

Five Evaluations Were Sufficiently Well Designed and Implemented

Five of the evaluations we reviewed were well designed and their implementation was sufficiently sound at the time of our review. Two of these evaluations had been completed and 3 were ongoing. All 5 evaluations met generally accepted social science standards for sound design, including measurement of key outcomes after a follow-up period to measure change over time; use of comparison groups or appropriate statistical controls to account for the influence of external factors on the results;9 random sampling of participants and/or sites, or other purposeful sampling methods, to ensure generalizable samples, along with procedures to ensure sufficient sample sizes; and appropriate data collection and analytic procedures to ensure the reliability and validity of measures (see table 2).

9 Statistically controlling for external factors that may be related to program outcomes and on which the treatment and comparison groups differ is usually not necessary when there is random assignment of participants to treatment and comparison conditions.

Studies Measured Change in Outcomes Over Time

All 5 evaluations measured, or included plans to measure, specified outcomes after a sufficient follow-up period. Some designs provided for collecting baseline data at or before program entry, and outcome data several months or years following completion of the program. Such designs allowed evaluators to compare outcome data against a baseline measurement to facilitate drawing conclusions about the program’s effects, and to gauge whether the effects persisted or were transitory. For example, the National Evaluation of the Gang Resistance Education and Training Program examined the effectiveness of a 9-week, school-based education program that sought to prevent youth crime and violence by reducing student involvement in gangs. Students were surveyed regarding attitudes toward gangs, crime, and police, self-reported gang activity, and risk-seeking behaviors 2 weeks before the program began, and then again at yearly intervals for 4 years following the program’s completion.
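
The pre/post logic described above can be made concrete with a small example. The following sketch (Python; the survey scores are invented for illustration and are not data from the evaluation described above) computes the change over time within each group and nets out the comparison group’s change, so that trends affecting both groups are not mistaken for a program effect:

    # Minimal sketch of a pre/post comparison with a comparison group.
    # All scores are fabricated; higher = more favorable outcome.

    def mean(xs):
        return sum(xs) / len(xs)

    program_baseline    = [3.1, 2.8, 3.4, 3.0, 2.9]
    program_followup    = [3.6, 3.2, 3.9, 3.5, 3.3]
    comparison_baseline = [3.0, 2.9, 3.3, 3.1, 2.8]
    comparison_followup = [3.1, 3.0, 3.4, 3.2, 2.9]

    # Change over time within each group.
    program_change    = mean(program_followup) - mean(program_baseline)
    comparison_change = mean(comparison_followup) - mean(comparison_baseline)

    # Netting out the comparison group's change removes trends that
    # affect both groups alike (the difference-in-differences idea).
    print(f"program change:    {program_change:+.2f}")
    print(f"comparison change: {comparison_change:+.2f}")
    print(f"estimated effect:  {program_change - comparison_change:+.2f}")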

Table 2: Characteristics of 5 NIJ Outcome Evaluations with Sufficiently Sound Designs and Implementation Plans

                                                 Sufficient   Comparison groups     Appropriate sampling      Appropriate data
                                                 follow-up    to control for        procedures and            collection and
Evaluation study                                              external factors      reasonable sample sizes   analysis procedures
National Evaluation of the Gang Resistance      X            X                     X                         X
Education and Training Program
Evaluation of Breaking the Cycle                X            X                     X                         X
Evaluation of a Multi-Site Demonstration for    Planned      X                     Planned                   Planned
Enhanced Judicial Oversight of Domestic
Violence Cases
Culturally Focused Batterer Counseling for      Planned      X                     Planned                   Planned
African-American Men
Testing the Impact of Court Monitoring and      Planned      X                     Planned                   Planned
Batterer Intervention Programs at the Bronx
Misdemeanor Domestic Violence Court(a)

Source: GAO analysis of NIJ data.

(a) Although we have categorized this evaluation as having a sufficiently sound design and implementation plan, the grantee’s proposal did not discuss how differential attrition from the four treatment groups would be handled if it occurred. Therefore, we do not know if the grantee has made sufficient plans to address this potential circumstance.


Measuring change in specific outcome variables at both baseline and after a follow-up period may not always be feasible. When the outcome of interest is “recidivism,” such as whether drug-involved criminal defendants continue to commit criminal offenses after participating in a drug treatment program, the outcome can only be measured after the program is delivered. In this case, it is important that the follow-up period be long enough to enable the program’s effects to be discerned. For example, the ongoing evaluation of the Culturally Focused Batterer Counseling for African-American Men seeks to test the relative effectiveness of counseling that recognizes and responds to cultural issues versus conventional batterer counseling in reducing batterer recidivism. All participants in the study had been referred by the court system to counseling after committing domestic violence violations. The evaluators planned to measure re-arrests and re-assaults 1 year after program intake, approximately 8 months after the end of counseling. The study cited prior research literature noting that two-thirds of first-time re-assaults were found to occur within 6 months of program intake, and over 80 percent of first-time re-assaults over a 2-1/2 year period occurred within 12 months of program intake.

Comparison Groups Were Used to Isolate Program Effects

All 5 evaluations used or planned to use comparison groups to isolate and minimize external factors that could influence the results of the study. Use of comparison groups is a practice employed by evaluators to help determine whether differences between baseline and follow-up results are due to the program under consideration or to other programs or external factors. In 3 of the 5 studies, research participants were randomly assigned to a group that received services from the program or to a comparison group that did not receive services. In constructing comparison groups, random assignment is an effective technique for minimizing differences between participants who receive the program and those who do not on variables that might affect the outcomes of the study. For example, in the previously mentioned ongoing evaluation of Culturally Focused Batterer Counseling for African-American Men, participants who were referred to counseling by a domestic violence court are randomly assigned to one of three groups: (1) a culturally focused group composed of only African-Americans, (2) a conventional counseling group composed of only African-Americans, or (3) a mixed race conventional counseling group. The randomized design allows the investigators to determine the effect of the culturally focused counseling over and above the effect of participating in a same race group situation.
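
The report does not reproduce the evaluators’ randomization procedure. As a generic illustration, the sketch below (Python; the participant IDs and fixed seed are hypothetical) shows one standard way to randomly assign court-referred participants to the three groups while keeping group sizes balanced:

    # Minimal sketch of balanced random assignment to three study groups.
    import random

    GROUPS = [
        "culturally focused counseling (African-American only)",
        "conventional counseling (African-American only)",
        "conventional counseling (mixed race)",
    ]

    def assign(participant_ids, seed=42):
        """Shuffle participants, then deal them round-robin into the
        three groups so that group sizes remain balanced."""
        rng = random.Random(seed)  # fixed seed keeps the assignment auditable
        ids = list(participant_ids)
        rng.shuffle(ids)
        return {pid: GROUPS[i % len(GROUPS)] for i, pid in enumerate(ids)}

    for pid, group in sorted(assign(range(9)).items()):
        print(pid, "->", group)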

                              In the remaining two evaluation studies, a randomized design was not
                              used and the comparison group was chosen to match the program group
                              as closely as possible on a number of characteristics, in an attempt to
                              ensure that the comparison and program groups would be similar in
                              virtually all respects aside from the intervention. For example, the ongoing
                              Evaluation of a Multi-Site Demonstration for Enhanced Judicial Oversight
                              of Domestic Violence Cases seeks to examine the effects of a coordinated
                              community response to domestic violence (including advocacy, provision
                              of victim services, and enhanced judicial oversight) on victim safety and
                              offender accountability. To ensure that the comparison and program
                              groups were similar, comparison sites were selected based on having
                              court caseload and population demographic characteristics similar to the
                              demonstration sites. Only the program group is to receive the intervention;
                              neither comparison site has a specialized court docket, enhanced judicial
                              oversight, or a county-wide, coordinated system for handling domestic
                              violence cases.




Sufficiently Sound Sampling Procedures and Adequate Response Rates Helped Ensure Representativeness

All 5 evaluations employed or planned to employ sufficiently sound sampling procedures for selecting program and comparison participants. This was intended to ensure that study participants were representative of the population being examined so that conclusions about program effects could be generalized to that population. For example, in the previously mentioned Judicial Oversight Demonstration evaluation, offenders in program and comparison sites are being chosen from court records. In each site, equal numbers of eligible participants are being chosen consecutively over a 12-month period until a monthly quota is reached. Although this technique falls short of random sampling, the optimal method for ensuring comparability across groups, use of the 12-month sampling period controls for possible seasonal variation in domestic violence cases.
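
As a rough illustration of this monthly-quota selection, the sketch below (Python; the case records and quota are invented, not taken from the evaluation) takes eligible cases consecutively, in court-record order, until each month’s quota is filled:

    # Minimal sketch of consecutive sampling with a monthly quota.
    from collections import defaultdict

    def quota_sample(case_records, monthly_quota):
        """case_records: (case_id, month, eligible) tuples in the order
        they appear in court records. Returns month -> selected case ids."""
        selected = defaultdict(list)
        for case_id, month, eligible in case_records:
            if eligible and len(selected[month]) < monthly_quota:
                selected[month].append(case_id)
        return selected

    records = [("C1", 1, True), ("C2", 1, True), ("C3", 1, True),
               ("C4", 2, False), ("C5", 2, True)]
    print(dict(quota_sample(records, monthly_quota=2)))
    # {1: ['C1', 'C2'], 2: ['C5']}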

                                The 5 evaluations also had adequate plans to achieve, or succeeded in
                                achieving, reasonable response rates from participants in their samples.
                                Failure to achieve adequate response rates threatens the validity of
                                conclusions about program effects, as it is possible that selected
                                individuals who do not respond or participate are substantially different
                                on the outcome variable of interest from those who do respond or
                                participate. The previously mentioned National Evaluation of the Gang
                                Resistance Education and Training Program sought to survey students
                                annually for up to 4 years after program participation ended. The grantee
                                made considerable efforts in years 2, 3, and 4 to follow up with students
                                who had moved from middle school to high school and were later enrolled
                                in a large number of different schools; in some cases, in different school
                                districts. The grantee achieved a completion rate on the student surveys of
                                76 percent after 2 years,10 69 percent after 3 years, and 67 percent after
                                4 years. The grantee also presented analyses that statistically controlled
                                for differential attrition among the treatment and comparison groups, and
                                across sites, and showed that the program effects that were found
                                persisted in these specialized analyses.
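
A basic check for differential attrition can be approximated with a simple comparison of completion rates. The sketch below (Python; the enrollment and completion counts are invented, and this is not the grantee’s actual analysis, which used more elaborate statistical controls) applies a two-proportion z-test to the difference between groups:

    # Minimal sketch of a differential-attrition check.
    import math

    def attrition_z(completed_t, n_t, completed_c, n_c):
        """Two-proportion z-test for a difference in survey completion
        rates between treatment (t) and comparison (c) groups."""
        p_t, p_c = completed_t / n_t, completed_c / n_c
        pooled = (completed_t + completed_c) / (n_t + n_c)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
        return (p_t - p_c) / se

    z = attrition_z(completed_t=790, n_t=1000, completed_c=730, n_c=1000)
    print(f"treatment completion:  {790 / 1000:.0%}")
    print(f"comparison completion: {730 / 1000:.0%}")
    print(f"z = {z:.2f} (|z| > 1.96 suggests differential attrition)")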

10 The grantee notes that a 1990 analysis of 85 longitudinal studies reported an average questionnaire completion rate of 72 percent for 19 studies that had a 24-month follow-up period. This is slightly lower than the 76 percent response rate achieved after 2 years in the Gang Resistance Education and Training evaluation.

Careful Data Collection and Analysis Procedures Were Used or Planned

All 5 well-designed evaluations employed or had adequate plans to employ careful data collection and analysis procedures. These included procedures to ensure that the comparison group does not receive services or treatment received by the program group, that response rates are documented, and that statistical analyses are used to adjust for the effects of selection bias or differential attrition on the measured results.11 For example, the Breaking the Cycle evaluation examined the effectiveness of a comprehensive effort to reduce substance abuse and criminal activity among arrestees with a history of drug involvement. The program group consisted of felons who tested positive for drug use, reported drug use in the past, or were charged specifically with drug-related felonies. The comparison group consisted of persons arrested a year before the implementation of the Breaking the Cycle intervention who tested positive for at least one drug. Both groups agreed to participate in the study. Although groups selected at different times and using different criteria may differ in systematic ways, the evaluators made efforts to control for differences in the samples at baseline. Where selection bias was found, a correction factor was used in the analyses, and corrected results were presented in the report.

11 Selection bias refers to biases introduced by selecting different types of people into the program and comparison groups; differences in measured outcomes for each group may be a function of preexisting differences between the groups, rather than the intervention. Differential attrition refers to unequal loss of participants from the program and comparison groups during the course of a study, resulting in groups that are no longer comparable. Both may be a threat to the validity of conclusions.
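
The report does not detail the correction factor the evaluators used. As a generic illustration of adjusting for measured baseline differences between nonequivalent groups, the sketch below (Python with NumPy; the data are simulated) compares a raw difference in means with a regression-adjusted estimate:

    # Minimal sketch of regression adjustment for baseline differences.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200

    # Simulated data: the program group starts out with higher baseline
    # severity, so a raw comparison of outcomes would be biased.
    program  = rng.integers(0, 2, n)                # 1 = program group
    baseline = rng.normal(0, 1, n) + 0.5 * program  # baseline severity
    outcome  = baseline - 0.8 * program + rng.normal(0, 1, n)

    # Unadjusted estimate: simple difference in mean outcomes.
    raw = outcome[program == 1].mean() - outcome[program == 0].mean()

    # Adjusted estimate: regress outcome on the group indicator plus
    # baseline severity; the group coefficient is the adjusted effect.
    X = np.column_stack([np.ones(n), program, baseline])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

    print(f"raw difference:    {raw:+.2f}")
    print(f"adjusted estimate: {beta[1]:+.2f} (true simulated effect: -0.80)")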

Six Studies Were Well-Designed but Encountered Problems During Implementation

Six of the 11 studies that were well designed encountered problems in implementation during the data collection phase, and thus were unable to, or are unlikely to, produce definitive results about the outcomes of the programs being evaluated. Such problems included the use of program and comparison groups that differed on outcome-related characteristics at the beginning of the program or became different due to differential attrition, failure of the program sponsors to implement the program as originally planned, and low response rates among program participants (see table 3). Five of the studies had been completed and 1 was ongoing.
Table 3: Problems Encountered during Implementation of 6 Well-Designed NIJ Outcome Evaluation Studies

                                                                          Program and             Program not
                                                                           comparison           implemented as        Response rates
 Evaluation study                                                        groups differed            planned              were low
 An Evaluation of Chicago’s Citywide Community Policing Program                  X                      X
 Evaluation of a Comprehensive Service-Based Intervention                        X                      X
 Strategy in Public Housing
 An Evaluation of Victim Advocacy with a Team Approach                                                  X                     X
 Reducing Non-Emergency Calls to 911: An Assessment of Four                                             X                     X
 Approaches to Handling Citizen Calls for Service
 Responding to the Problem Police Officer: An Evaluation of Early                X
 Warning Systems
 Evaluation of the Juvenile Justice Mental Health Initiative with                                       X
 Randomized Design
Source: GAO analysis of NIJ data.


Differences between Program                   Three of the 6 studies used a comparison group that differed from the
and Comparison Group                          program group in terms of characteristics likely to be related to program
Characteristics Make it                       outcomes—either due to preexisting differences or to differential
Difficult to Attribute Outcomes               attrition—even though the investigators may have made efforts to
to the Program                                minimize the occurrence of these problems.12 As a result, a finding that
                                              program and comparison group participants differed in outcomes could
                                              not be attributed solely to the program. For example, the Comprehensive
                                              Service-Based Intervention Strategy in Public Housing evaluation sought
                                              to reduce drug activity and promote family self-sufficiency among tenants
                                              of a public housing complex in one city through on-site comprehensive
                                              services and high profile police involvement. The intervention site was a
                                              housing project in one section of the city; the comparison site was another
                                              public housing complex on the opposite side of town, chosen for its
                                              similarities to the intervention site in terms of race, family composition,
                                              crime statistics, and the number of women who were welfare recipients.
                                              However, when baseline data from the two sites were examined,
                                              important preexisting differences between the two sites became apparent.
                                              These differences included a higher proportion of residents at the
                                              comparison site who were employed, which could have differentially


                                              12
                                                Preexisting differences between the program and comparison groups can be viewed as a
                                              design problem. We treat this as an implementation problem in this section because the
                                              proposed design for these particular studies appeared to us to be reasonable at the time the
                                              funding decision was made. Problems with the comparability of the groups became
                                              apparent only after the studies were well underway, and often it was too late to control for
                                              the effects of such differences on program outcomes with statistical adjustments.




                             affected intervention and comparison residents’ propensity to utilize and
                             benefit from available services. Additionally, since there was considerable
                             attrition at both the intervention and comparison sites, it is possible that
                             the intervention and comparison group respondents who remained
                             differed on some factors related to the program outcomes. Although it may
                             have been possible to statistically control for these differences when
                             analyzing program outcomes, the evaluator did not do so in the analyses
                             presented in the final report.
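
A routine safeguard implied by this example is a baseline-equivalence check: comparing intervention and comparison groups on outcome-related characteristics before the program begins and again after attrition. The sketch below uses standardized mean differences on hypothetical data; the 0.25 standard-deviation threshold is a common rule of thumb in the evaluation literature, not a GAO or NIJ standard.

```python
# Sketch of a baseline-equivalence check between intervention and
# comparison sites using standardized mean differences (SMD).
# The data and the 0.25 SD rule of thumb are illustrative assumptions.
import numpy as np

def smd(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((np.var(x, ddof=1) + np.var(y, ddof=1)) / 2)
    return (np.mean(x) - np.mean(y)) / pooled_sd

rng = np.random.default_rng(1)
intervention = {"employed": rng.binomial(1, 0.30, 250),
                "household_size": rng.poisson(3, 250)}
comparison = {"employed": rng.binomial(1, 0.45, 240),  # higher employment, as in the study
              "household_size": rng.poisson(3, 240)}

for var in intervention:
    d = smd(intervention[var], comparison[var])
    flag = "imbalanced; needs adjustment" if abs(d) > 0.25 else "roughly balanced"
    print(f"{var}: SMD = {d:+.2f} ({flag})")
```

Running the same check on the cases remaining at follow-up would reveal whether differential attrition had eroded whatever comparability the groups had at baseline.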

Program Results Not          In 5 of the 6 studies, evaluators ran into methodological problems because
Measurable Because Program   the program under evaluation was not implemented as planned, and the
Not Implemented as Planned   investigators could not test the hypotheses that they had outlined in their
                             grant proposals. For the most part, this particular implementation problem
                             was beyond the evaluators’ control. It resulted from decisions made by
                             agencies providing program services that had agreed to cooperate with the
                             evaluators but, for a number of reasons, made changes in the programs or
                             did not cooperate as fully as expected after the studies were underway.
                             This occurred in the evaluation of the Juvenile Justice Mental Health
                             Initiative with Randomized Design, a study that is ongoing and expected to
                             be completed in September 2003. The investigators had proposed to test
                             whether two interventions provided within an interagency collaborative
                             setting were effective in treating youths with serious emotional
                             disturbances referred to the juvenile justice system for delinquency.
                             Juveniles were to be randomly assigned to one of two treatment programs,
                             depending on age and offense history (one for youth under the age of
                             14 without serious, violent, or chronic offense history, and one for youth
                             ages 14 and older with serious, violent, or chronic delinquencies) or to a
                             comparison group that received preexisting court affiliated service
                             programs. The evaluators themselves had no power to develop or modify
                             programs. The funding agencies13 contracted with a local parent support
                             agency and with a nonprofit community-based agency to implement the
                             programs, but the program for youth under the age of 14 was never
                             implemented.14 In addition, partway through the study, the funding
                             agencies decided to terminate random assignment of juveniles, and shortly
                             thereafter ended the program. As a result, the evaluators had complete
                             data on 45 juveniles who had been in the treatment program, rather than


                             13
                              The treatment programs were to be developed under the funding and oversight of the St.
                             Louis Mental Health Board and the Missouri Department of Mental Health.
                             14
                              As a result, juveniles under 14 were randomly assigned to either the program for juveniles
                             14 and over, or to the comparison group.




                             on the 100 juveniles they had proposed to study. Although the study
                             continued to collect data on juveniles eligible for the study (who were
                             then assigned to the comparison group, since a treatment option was no
                             longer available), the evaluators proposed to analyze the data from the
                             random experiment separately, examining only those treatment and
                             comparison youths assigned when program slots were available. Because
                             of the smaller number of participants than anticipated, detailed analyses of
                             certain variables (such as the type or amount of service received, or the
                             effects of race and gender) are likely to be unreliable.
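
The report gives no power calculation, but the cost of falling from 100 planned participants to 45 can be approximated with a standard two-sample power computation. The medium effect size (d = 0.5) and the equal-group assumption below are illustrative, not parameters from the study; subgroup analyses (for example, by race or gender) split the 45 cases further and fare correspondingly worse.

```python
# Illustrative power calculation for a two-sample comparison.
# The effect size (d = 0.5) and equal group sizes are assumptions;
# the study's actual design parameters are not given in the report.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (100, 45, 20):  # planned sample, achieved sample, a subgroup
    power = analysis.power(effect_size=0.5, nobs1=n_per_group,
                           alpha=0.05, ratio=1.0)
    print(f"n = {n_per_group:3d} per group -> power ~ {power:.2f}")
```

Under these assumptions, power falls from roughly 0.9 at the planned sample size to well below the conventional 0.8 benchmark at the achieved sample size.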

Low Response Rates May       Low response rates were a problem in 2 of the 6 studies, potentially
Reduce the Reliability and   reducing the reliability and validity of the findings. In a third study,
Validity of Findings         response rates were not reported, making it impossible for us to determine
                             whether this was a problem or not.15 In one study where the response rate
                             was a problem, the evaluators attempted to survey victims of domestic
                             abuse, a population that NIJ officials acknowledged was difficult to reach.
                             In An Evaluation of Victim Advocacy With a Team Approach, the
                             evaluators attempted to contact by telephone women who were victims of
                             domestic violence, to inquire about victims’ experiences with subsequent
                             violence and their perceptions of safety. Response rates were only about
                             23 percent, and the victims who were interviewed differed from those who
                             were not interviewed in terms of the nature and seriousness of the abuse
                             to which they had been subjected. NIJ’s program manager told us that
                             when she became aware of low response rates on the telephone survey,
                             she and the principal investigator discussed a variety of strategies to
                             increase response rates. She said the grantee expended additional time
                             and effort to increase the response rate, but had limited success. In the
                             other study with low response rates—Reducing Non-Emergency Calls to
                             911: An Assessment of Four Approaches to Handling Citizen Calls for
                             Service—investigators attempted to survey police officers in one city
                             regarding their attitudes about the city’s new non-emergency phone
                             system. Only 20 percent of the police officers completed the survey.
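
The threat a response rate this low poses can be made concrete with worst-case bounds on a binary outcome, which assume nothing about the nonrespondents. The observed proportion below (40 percent) is invented; only the response rates come from the studies discussed above.

```python
# Worst-case (no-assumption) bounds on a binary outcome under nonresponse.
# The response rates match those in the text; the observed proportion
# among respondents is a hypothetical figure for illustration.
def no_assumption_bounds(observed_prop, response_rate):
    """Bounds on the true proportion if nonrespondents differ arbitrarily."""
    low = observed_prop * response_rate                         # no nonrespondent has the outcome
    high = observed_prop * response_rate + (1 - response_rate)  # every nonrespondent does
    return low, high

for rate in (0.23, 0.20):
    low, high = no_assumption_bounds(observed_prop=0.40, response_rate=rate)
    print(f"response rate {rate:.0%}: true proportion between "
          f"{low:.0%} and {high:.0%}")
```

At a 23 percent response rate the bounds span more than 75 percentage points, which is why findings from such surveys cannot be generalized without strong, and usually untestable, assumptions about the nonrespondents.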




                             15
                              The Evaluation of a Comprehensive Service-Based Intervention Strategy in Public
                             Housing reported response rates for both the intervention and comparison sites on a
                             survey at baseline, but did not report response rates for follow-up surveys conducted
                             12 and 18 months after the intervention began.




Some Evaluation Studies     Four of the evaluation studies began with serious design problems that
Had Serious Design          diminished their ability to produce reliable or valid findings about program
Limitations from the        outcomes. One of the studies was completed, and 3 were ongoing. The
                            studies’ design problems included the lack of comparison groups, failure
Beginning                   to measure the intended outcomes of the program, and failure to collect
                            preprogram data as a baseline for the outcomes of interest (see table 4).
                            Funding for these studies that began with serious methodological
                            problems totaled about $4.7 million, or about 30 percent of the
                            approximately $15.4 million spent on the studies we reviewed.

                            Table 4: Design Limitations in 4 NIJ Outcome Evaluation Studies

                                                                                No          Intended        Limited
                                                                            comparison    outcomes not    pre-program
                             Evaluation study                                  group        measured          data
                             National Evaluation of the Rural Domestic
                               Violence and Child Victimization
                               Enforcement Grant Program                          X             X              X
                             National Evaluation of the Domestic Violence
                               Victims’ Civil Legal Assistance Program            X                            X
                             Multi-Site Demonstration of Collaborations to
                               Address Domestic Violence and Child
                               Maltreatment                                       X             X
                             Corrections and Law Enforcement Family
                               Support: Law Enforcement Field Test                X
                            Source: GAO analysis of NIJ data.



Lack of Comparison Groups   None of the 4 outcome evaluation studies had a comparison group built
                            into the design—a factor that hindered the evaluators’ ability to isolate and
                            minimize external factors that could influence the results of the study. The
                            completed National Evaluation of the Rural Domestic Violence and Child
                            Victimization Enforcement Grant Program did not make use of
                            comparison groups to study the effectiveness of the federal grant program
                            that supports projects designed to prevent and respond to domestic
                            violence, dating violence, and child victimization in rural communities.
                            Instead, evaluators collected case study data from multiday site visits to 9
                            selected sites.

                            The other three funded grant proposals submitted to NIJ indicated that
                            they anticipated difficulty in locating and forming appropriate comparison
                            groups. However, they proposed to explore the feasibility of using
                            comparison groups in the design phase following funding of the grant. At



                               the time of our review, when each of these studies was well into
                               implementation, none was found to be using a comparison group. For
                               example, the Evaluation of a Multi-Site Demonstration of Collaborations
                               to Address Domestic Violence and Child Maltreatment proposed to
                               examine whether steps taken to improve collaboration between
                               dependency courts, child protective services, and domestic violence
                               service providers in addressing the problems faced by families with
                               co-occurring instances of domestic violence and child maltreatment resulted
                               in improvements in how service providers dealt with domestic violence
                               and child maltreatment cases. Although NIJ stated that the evaluators
                               planned to collect individual case record data from similar communities,
                               at the time of our review these sites had not yet been identified, nor had a
                               methodology for identifying the sites been proposed. Our review was
                               conducted during the evaluation’s third year of funding.

Intended Outcomes of Program   Although they were funded as outcome evaluations, 2 of the 4 studies were
Were Not Measured              not designed to provide information on intended outcomes for individuals
                               served by the programs. Both the Rural Domestic Violence and the Multi-
                               Site Demonstration of Collaborations programs had as their objectives the
                               enhanced safety of victims, among other goals. However, neither of the
                               evaluations of these programs collected data on individual women victims
                               and their families in order to examine whether the programs achieved this
                               objective. Most of the data collected in the Rural Domestic Violence
                               evaluation were indicators of intermediary results, such as increases in the
                               knowledge and training of various rural service providers. While such
                               intermediary results may be necessary precursors to achieving the
                               program’s objectives of victim safety, they are not themselves indicators of
                               victim safety. The Multi-Site Demonstration of Collaborations evaluation
                               originally proposed to collect data on the safety of women and children as
                               well as perpetrator recidivism, but in the second year of the evaluation
                               project, the evaluators filed a request to change the scope of the study.
                               Specifically, they noted that the original outcome indicators proposed for
                               victim safety were not appropriate given the time frame of the evaluation
                               compared to the progress of the demonstration project itself. The modified
                               scope, which was approved by NIJ, focused on system rather than
                               individual level outcomes. The new ‘effectiveness’ indicators included
                               such things as changes in policies and procedures of agencies
                               participating in the collaboration, and how agency personnel identify,
                               process, and manage families with co-occurring domestic violence and
                               child maltreatment. Such a design precludes conclusions about whether
                               the programs improved the lives of victims of domestic violence or their
                               children.



Lack of Pre-Program Data       As discussed in our March 2002 report, the Rural Domestic Violence
Hinders Ability to Show That   evaluation team did not collect baseline data prior to the start of the
Program Produced Change        program, making it difficult to identify change resulting from the program.
                               In addition, at the time of our review, in the third year of the multi-year
                               National Evaluation of the Domestic Violence Victims’ Civil Legal
                               Assistance Program, the evaluator did not know whether
                               baseline data would be available to examine changes resulting from the
                               program. This evaluation, of the federal Civil Legal Assistance program,16
                               proposed to measure whether there had been a decrease in pro se
                               representation (or self-representation) in domestic violence protective
                               order cases. A decrease in pro se representation would indicate successful
                               assistance to clients by Civil Legal Assistance grantees. In May 2003, NIJ
                               reported that the evaluator was still in the process of contacting the court
                               systems at the study sites to see which ones had available data on pro se
                               cases. The evaluator also proposed to ask a sample of domestic violence
                               victims whether they had access to civil legal assistance services prior to
                               the program, the outcomes of their cases, and satisfaction with services.
                               Respondents were to be selected from a list of domestic violence clients
                               served by Civil Legal Assistance grantees within a specified time period,
                               possibly 3 to 9 months prior to the start of the outcome portion of the
                               study. Such retrospective data on experiences that may have occurred
                               more than 9 months earlier must be interpreted with caution, given the
                               possibility of recall errors or respondents’ lack of knowledge about
                               services that were available in the past.


NIJ Has Funded Outcome         Outcome evaluations are inherently difficult to conduct because in real-
Evaluations Despite Major      world settings program results can be affected by factors other than the
Gaps in Knowledge about        intervention being studied. In addition, grantees’ ability to conduct such
                               evaluations can depend on the extent to which information is available up
the Availability of Data and   front about what data are available to answer the research questions,
Comparison Groups              where such data can be obtained, and how the data can be collected for
                               both the intervention and comparison groups. We found that in 3 of the
                               15 NIJ evaluations we reviewed, NIJ lacked sufficient information about
                               these issues to assure itself that the proposals it funded were feasible to
                               carry out. These 3 studies totaled about $3.7 million.




                               16
                                Civil Legal Assistance provides grants to nonprofit, nongovernmental organizations that
                               provide legal services to victims of domestic violence or that work with victims of
                               domestic violence who have civil legal needs.




For the Evaluation of Non-Emergency Calls to 911, NIJ and DOJ’s Office of
Community Oriented Policing Services jointly solicited grant proposals to
evaluate strategies taken by 4 cities to decrease non-emergency calls to
the emergency 911 system. NIJ officials told us that they had conducted
3-day site visits of the 4 sites, and that discussions with local officials
included questions about availability of data in each jurisdiction. The NIJ
solicitation for proposals contained descriptions of how non-emergency
calls were processed at all 4 sites, but no information on the availability of
outcome data to assess changes in the volume, type, and nature of
emergency and non-emergency calls before and after the advent of the
non-emergency systems. Evaluators were asked to conduct both a process
analysis and an assessment analysis. The assessment analysis was to
include “compiling and/or developing data” on a number of outcome
questions. Once the study was funded, however, the grantee learned that
only 1 of the 4 cities had both a system designed specifically to reduce
non-emergency calls to 911 and reliable data for evaluation purposes.

In the case of the Multi-Site Demonstration of Collaborations to Address
Domestic Violence and Child Maltreatment, NIJ funded the proposal
without knowing whether the grantee would be able to form comparison
groups. NIJ officials stated that one of the reasons for uncertainty about
the study design was that at the time the evaluator was selected, the
6 demonstration sites had not yet been selected. The proposal stated that
the grantee would explore the “potential for incorporating comparison
communities or comparison groups at the site level, and assess the
feasibility, costs, and contributions and limitations of a design that
incorporates comparison groups or communities.” NIJ continued to fund
the grantee for 3 additional years, although the second year proposal for
supplemental funding made no mention of comparison groups and the
third year proposal stated that the grantee would search for comparison
sites, but did not describe how such sites would be located. In response to
our questions about whether comparison groups would be used in the
study, NIJ officials said that the plan was for the grantee to compare a
random sample of case records from before program implementation to
those after implementation at each of the demonstration sites. Designs
utilizing pre-post comparisons within the same group are not
considered to be as rigorous as pre-post comparison group
designs because they do not allow evaluators to determine whether the
results are due to the program under consideration or to some other
programs or external factors.
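
The logic behind this criticism can be sketched as a difference-in-differences calculation: the comparison group's pre-post change estimates what would have happened without the program, and only the excess change at the demonstration sites is attributed to the program. All figures below are invented for illustration.

```python
# Difference-in-differences sketch: why a comparison group matters.
# All figures are invented; they stand in for, e.g., case-handling outcomes
# measured before and after program implementation.
pre_program, post_program = 0.30, 0.45        # outcome rate at demonstration sites
pre_comparison, post_comparison = 0.28, 0.38  # outcome rate at comparison sites

naive_change = post_program - pre_program         # pre-post only: 0.15
secular_trend = post_comparison - pre_comparison  # change absent the program: 0.10
did_estimate = naive_change - secular_trend       # program effect estimate: 0.05

print(f"pre-post change alone:     {naive_change:+.2f}")
print(f"difference-in-differences: {did_estimate:+.2f}")
```

Without the comparison sites, the full 0.15 change would be credited to the program, even though two-thirds of it occurred everywhere for reasons unrelated to the intervention.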




NIJ also approved the Multi-Site Demonstration of Collaborations
proposal without knowing whether data on individual victims of domestic
violence and child maltreatment would be available during the time frame
of the evaluation. The first year proposal stated that the grantee would
examine outcomes for individuals and families, although it also noted that
there are challenges to assessing such outcomes and that system
outcomes should be examined first. Our review found that in the third year
of the evaluation, data collection was focused solely on “system”
outcomes, such as changes in policies and procedures and how agency
personnel identify, process, and manage families with co-occurring
domestic violence and child maltreatment. Thus, although the original
design called for answering questions about the outcomes of the program
for individuals and families, NIJ could not expect answers to such
questions.17

In the case of the Civil Legal Assistance study, NIJ officials told us that
they have held discussions with the grantee about the feasibility of adding
comparison groups to the design. According to these officials, the grantee
said that a comparison group design would force it to reduce the process
sites to be studied from 20 to somewhere between 6 and 8. NIJ advised the
grantee that so large a reduction in sites would be too high a price to pay
to obtain comparison groups, and advised the grantee to stay with the
design as originally proposed. Consequently, NIJ cannot expect a rigorous
assessment of outcomes from this evaluation.




17
  NIJ officials told us in August 2003 that the evaluation had been funded for a fourth year,
and that the federal agencies funding this evaluation (DOJ and the Department of Health
and Human Services) were also considering a fifth year of funding. Four years of funding
allows the evaluation to collect data covering about the first 3 years of implementation in
the sites. However, data collected from stakeholders at the sites early in the evaluation
showed that the sites expected that it would take 3.5 to 4 years to achieve change in key
individual level outcomes. At the time of our review, there was no information on whether
individual level outcome data would be collected.




                        Of the 5 completed NIJ studies that focused on issues of interest to DOJ
Completed Outcome       program offices, findings related to program effectiveness were not
Evaluations Produced    sufficiently reliable or conclusive. However, DOJ program administrators
                        told us that they found some of the process and implementation findings
Useful Information on   from the completed studies to be useful.18
Processes but Not on
                        Program administrators from DOJ’s Office on Violence Against Women
Outcomes for DOJ        said that although they did not obtain useful outcome results from the
Program                 Rural Domestic Violence evaluation, they identified two “lessons learned”
Administrators          from the process and implementation components of the study. First, the
                        evaluation found that very little information was available to grantees
                        regarding how to create collaborative programs. Thus, DOJ engaged a
                        technical assistance organization to develop a training program on how to
                        create collaborative projects based on the experiences of some of the
                        grantees examined by the Rural evaluation. Second, program
                        administrators told us that the evaluation found that because Rural grants
                        were funded on an 18-month schedule, programs did not have adequate
                        time to structure program services and also collect useful program
                        information. As a result, Rural programs are now funded for at least
                        24 months.19

                        While shortcomings in NIJ’s outcome evaluations of law enforcement
                        programs leave questions about whether the programs are effective and
                        whether they should continue to be funded, program administrators in
                        DOJ’s Office of Community Oriented Policing Services said that the
                        studies helped identify implementation problems that assisted them in
                        developing and disseminating information in ways useful to the law
                        enforcement community. These included curriculum development,
                        leadership conferences, and fact sheets and other research publications.
                        For example, as a result of the NIJ-managed study, Responding to the



                        18
                          Because of our interest in the effectiveness of criminal justice programs, we limited our
                        review of the usefulness of NIJ outcome evaluations to evaluations of DOJ programs, or
                        evaluations funded by DOJ—a total of 5 evaluations. We did not examine 3 other
                        completed NIJ outcome evaluations focusing on programs funded by agencies other than
                        DOJ.
                        19
                          Officials with DOJ’s Office on Violence Against Women were not familiar with the
                        findings from the other completed NIJ study focusing on violence against women, the
                        Victim Advocacy with a Team Approach evaluation. This evaluation was funded by a
                        transfer of funds to NIJ for NIJ research and evaluations in the area of violence against
                        women. NIJ officials stated that Office on Violence Against Women officials were consulted
                        in the development of the solicitation.




                        Problem Police Officer: An Evaluation of Early Warning Systems,20 DOJ
                        officials developed a draft command level guidebook that focuses on the
                        factors to be considered in developing an early warning system, developed
                        an early warning intervention training curriculum that is being taught by
                        the 31 Regional Community Policing Institutes21 located across the
                        country, and convened a “state-of-the-art” conference for five top law
                        enforcement agencies that were developing early warning systems. DOJ
                        officials also said the studies showed that the various systems evaluated
                        had been well received by citizens and law enforcement officials. For
                        example, they said that citizens like the 311 non-emergency number that
                        was established in several cities to serve as an alternative to calling the
                        911 emergency number. The system allows law enforcement officers to
                        identify hot spots or trouble areas in the city by looking at various patterns
                        in the citizen call data. Officials may also be able to monitor the overall
                        state of affairs in the city, such as the presence of potholes.
                        Similarly, Chicago’s City-Wide Community Policing program resulted in
                        the development of a crime mapping system, enabling officers to track
                        crime in particular areas of the city. As with the non-emergency telephone
                        systems, DOJ officials believe that crime mapping helps inform citizens,
                        police, and policy makers about potential problem areas.


                        NIJ officials told us that they have begun to take several steps to try to
NIJ’s Current and       increase the likelihood that outcome evaluations will produce more
Planned Activities to   definitive results. We recommended in our March 2002 report on selected
                        NIJ-managed outcome evaluations22 that NIJ assess its evaluation process
Improve Its             to help ensure that future outcome evaluations produce definitive results.
Evaluation Program 	    In November 2002, Congress amended the relevant statute to include cost-
                        effectiveness evaluation where practical as part of NIJ’s charge to conduct
                        evaluations.23 Since that time NIJ has established an Evaluation Division


                        20
                          An early warning system is a data based police management tool designed to identify
                        officers whose behavior is problematic, as indicated by high rates of citizen complaints, use
                        of force incidents, or other evidence of behavior problems, and to provide some form of
                        intervention, such as counseling or training to correct that performance. The NIJ-managed
                        study consisted of a process and outcome evaluation of early warning systems in 3 large
                        urban police departments, as well as a national survey.
                        21
                         Through the Regional Community Policing Institute network, DOJ’s Office of Community
                        Oriented Policing Services assists local law enforcement agencies with meeting their
                        community policing training needs.
                        22
                         GAO-02-309.
                        23
                         Homeland Security Act of 2002, P.L. 107-296 sec. 237.




within NIJ’s Office of Research and Evaluation. NIJ officials told us that
they have also placed greater emphasis on funding cost-benefit studies,
funded feasibility studies prior to soliciting outcome evaluations, and
placed greater emphasis on applicants’ prior performance in awarding
grants.

In January 2003, NIJ established an Evaluation Division within NIJ’s Office
of Research and Evaluation, as part of a broader reorganization of NIJ
programs. According to NIJ, the Division will “oversee NIJ’s evaluations of
other agency’s [sic] programs and…develop policies and procedures that
establish standards for assuring quality and utility of evaluations.”24 NIJ
officials told us that among other things, the Division will be responsible
for recommending to the NIJ Director which evaluations should be
undertaken, assigning NIJ staff to evaluation grants and overseeing their
work, and maintaining oversight responsibility for ongoing evaluation
grants. In addition, NIJ officials told us that one of the NIJ Director’s
priorities is to put greater emphasis on evaluations that examine the costs
and benefits of programs or interventions. To support this priority, NIJ
officials told us that the Evaluation Division had recently developed
training for NIJ staff on cost-benefit and cost-effectiveness analysis.25
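
Footnote 25 below defines these two techniques; the arithmetic behind them is simple enough to sketch. All program names and dollar figures in this example are invented for illustration.

```python
# Sketch of the arithmetic behind cost-effectiveness and cost-benefit
# comparisons, as described in footnote 25. All figures are invented.
programs = {
    # name: (total cost, offenses averted, dollar benefit per offense averted)
    "Program A": (900_000, 120, 12_000),
    "Program B": (600_000, 70, 12_000),
}

for name, (cost, averted, benefit_per) in programs.items():
    cost_effectiveness = cost / averted          # dollars per offense averted
    net_benefit = averted * benefit_per - cost   # benefits minus costs, in dollars
    print(f"{name}: ${cost_effectiveness:,.0f} per offense averted, "
          f"net benefit ${net_benefit:,.0f}")
```

Cost-effectiveness compares programs on cost per unit of a single outcome; cost-benefit monetizes all outcomes, which requires an assumed dollar value per unit but allows programs with different goals to be compared.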

NIJ recently undertook 37 “evaluability assessments” to assess the
feasibility of conducting outcome evaluations of congressionally
earmarked programs prior to soliciting proposals for evaluation.26 In 2002
and 2003, these assessments were conducted to examine each project’s




24
 NIJ Web site (http://www.ojp.usdoj.gov/nij/about.htm).
25
  These analyses compare a program’s outputs or outcomes with the costs (resources
expended) to produce them. Cost-effectiveness analysis assesses the costs of meeting a
single goal or objective, and can be used to identify the least costly alternative to meet that
goal. Cost-benefit analysis aims to identify all the relevant costs and benefits, usually
expressed in dollar terms.
26
  Earmarked refers to dedicating an appropriation for a particular purpose. Legislative
language may designate any portion of a lump-sum amount for particular purposes. In
fiscal year 2002, congressional guidance for the use of these funds was provided in
conference report H.R. 107-278. The report specified that up to 10 percent of the funds for
the Bureau of Justice Assistance’s Edward Byrne Discretionary Grant Program be made
available for an independent evaluation of the program (at 88); and up to 10 percent of the
funds for the Office of Juvenile Justice and Delinquency Prevention’s Discretionary Grants
for National Programs and Special Emphasis Programs (at 108) and Safe Schools Initiative
be made available for an independent evaluation of the program (at 112).




                scope, activities, and potential for rigorous evaluation.27 The effort
                included telephone interviews and site visits to gather information
                regarding such things as what outcomes could be measured, what kinds of
                data were being collected by program staff, and the probability of using a
                comparison group or random assignment in the evaluation. Based on the
                review, NIJ solicited proposals from the research community to evaluate a
                subset of the earmarked programs that NIJ believed were ready for
                outcome evaluation.28

                 NIJ officials also stated that, in an effort to improve the performance of its
                 grantees, NIJ has begun to pay greater attention to the quality and timeliness
                of their performance on previous NIJ grants when reviewing funding
                proposals. As part of NIJ’s internal review of grant applications, NIJ staff
                check that applicants’ reports are complete and accurate and evaluate past
                 work conducted by the applicant using performance-related measures.
                Although this is not a new activity, NIJ officials told us that NIJ was now
                placing more emphasis on reviewing applicants’ prior performance than it
                had in the past.29 NIJ officials told us that NIJ staff may also contact staff in
                other OJP offices, where the applicant may have received grant funding, to
                assess applicant performance on those grants.


                Our in-depth review of 15 outcome evaluations managed by NIJ during the
Conclusions 	   past 10 years indicated that the majority were beset with methodological
                and/or implementation problems that limited the ability to draw
                meaningful conclusions about the programs’ effectiveness. Although our
                sample is not representative of all NIJ outcome evaluations conducted
                during the last 10 years, it includes those that have received a large
                 proportion of the total funding for this type of research and is drawn
                 largely from the most recent work.
                with similar findings we reported in other reviews of NIJ outcome



                27
                 Prior to conducting the evaluability assessments, NIJ conducted an initial review of the
                earmarked programs, and eliminated from consideration those programs that were
                appearing in legislation for the first time, in order to focus on those programs that were
                receiving continuation funding.
                28
                  The solicitation deadlines were April 11, 2003, for the Bureau of Justice Assistance
                programs and July 15, 2003, for the Office of Juvenile Justice and Delinquency Prevention
                programs.
                29
                  A new requirement of the solicitation for proposals is that applicants report what prior
                funding they have received from NIJ.




                        evaluations, raise concerns about the level of attention NIJ is focusing on
                        ensuring that funded outcome evaluations produce credible results.

                        We recognize that it is very difficult to design and execute outcome
                        evaluations that produce meaningful and definitive results. Real world
                        evaluations of complex social programs inevitably pose methodological
                        challenges that can be difficult to control and overcome. Nonetheless, we
                        believe it is possible to conduct outcome evaluations in real world settings
                        that produce meaningful results. Indeed, 5 of NIJ’s outcome evaluations
                        can be characterized in this way, and these 5 accounted for about 48
                        percent of the $15.4 million spent on the studies we reviewed. We also
                        believe that NIJ could do more to help ensure that the millions of dollars it
                        spends annually to evaluate criminal justice programs is money well spent.
                        Indeed, poor evaluations can have substantial costs if they result in
                        continued funding for ineffective programs or the curtailing of funding for
                        effective programs.

                        NIJ officials told us that they recognize the need to improve their
                        evaluation efforts and have begun to take several steps in an effort to
                        increase the likelihood that outcome evaluations will produce more
                        conclusive results. These steps include determining whether a program is
                        ready for evaluation and monitoring evaluators’ work more closely. We
                        support NIJ’s efforts to improve the rigor of its evaluations. However, it is
                        too soon to tell whether and to what extent these efforts will lead to NIJ
                        funding more rigorous effectiveness evaluations, and result in NIJ
                        obtaining evaluative information that can better assist policy makers in
                        making decisions about criminal justice funding priorities. In addition to
                        the steps that NIJ is taking, we believe that NIJ can benefit from reviewing
                        problematic studies it has already funded in order to determine the
                        underlying causes for the problems and determine ways to avoid them in
                        the future.


                        We recommend that the Attorney General instruct the Director of NIJ to:
Recommendations for
Executive Action 
 •	   Conduct a review of its ongoing outcome evaluation grants—including
                        those discussed in this report—and develop appropriate strategies and
                        corrective measures to ensure that methodological design and
                        implementation problems are overcome so the evaluations can produce
                        more conclusive results. Such a review should consider the design and
                        implementation issues we identified in our assessment in order to decide
                        whether and what type of intervention may be appropriate. If, based on
                        NIJ’s review, it appears that the methodological problems cannot be



                           overcome, NIJ should consider refocusing the studies’ objectives and/or
                           limiting funding.

                       •   Continue efforts to respond to our March 2002 recommendation that NIJ
                           assess its evaluation process with the purpose of developing approaches
                           to ensure that future outcome evaluation studies are funded only when
                           they are effectively designed and implemented. The assessment could
                           consider the feasibility of such steps as:

                           •   obtaining more information about the availability of outcome data prior
                               to developing a solicitation for research;
                           •   requiring that outcome evaluation proposals contain more detailed
                               design specifications before funding decisions are made regarding
                               these proposals; and
                           •   more carefully calibrating NIJ monitoring procedures to the cost of the
                               grant, the risks inherent in the proposed methodology, and the extent
                               of knowledge in the area under investigation.

                           We provided a copy of a draft of this report to the Attorney General for
Agency Comments 
          review and comment. In a September 4, 2003, letter, DOJ’s Assistant
and our Evaluation 
       Attorney General for the Office of Justice Programs commented on the
                           draft. Her comments are summarized below and presented in their entirety
                           in appendix III.

                           The Assistant Attorney General stated that NIJ agreed with our
                           recommendations. She also highlighted NIJ’s current and planned
                           activities to improve its evaluation program. For example, as we note in
                           the report, NIJ has established an Evaluation Division and initiated a new
                           strategy of evaluability assessments. Evaluability assessments are
                           intended to be quick, low cost initial assessments of criminal or juvenile
                           justice programs to help NIJ determine if the necessary conditions exist to
                           warrant sponsoring a full-scale outcome evaluation. To improve its
                           grantmaking process, the Assistant Attorney General stated that NIJ is
developing new grant “special conditions” that will require grantees to
                           document all changes in the scope and components of evaluation designs.
                           In response to our concerns, NIJ also plans, in fiscal year 2004, to review
                           its grant monitoring procedures for evaluation grants in order to more
                           intensively monitor the larger or more complex grants. NIJ also plans to
                           conduct periodic reviews of its evaluation research portfolio to assess the
                           progress of ongoing grants. This procedure is to include documenting any
                           changes in evaluation design that may have occurred and reassessing the
                           expected benefits of ongoing projects.




In her letter, the Assistant Attorney General made two substantive
comments—both concerning our underlying assumptions in conducting
the review—with which we disagree. In her first comment, the Assistant
Attorney General noted that our report implies that conclusive evaluation
results can always be achieved if studies are rigorously designed and
carefully monitored. We disagree with this characterization of the
implication of our report. While sound research design and careful
monitoring of program implementation are factors that can significantly
affect the extent to which outcome evaluation results are conclusive, they
are not the only factors. We believe that difficulties associated with
conducting outcome evaluations in real world settings can give rise to
situations in which programs are not implemented as planned or requisite
data turn out not to be available. In such instances, even a well-designed
and carefully monitored evaluation will not produce conclusive findings
about program effectiveness. Our view is that when such problems occur,
NIJ should respond and take appropriate action. NIJ could (1) take steps
to improve the methodological adequacy of the studies if it is feasible to
do so, (2) reconsider the purpose and scope of evaluation if there is
interest in aspects of the program other than its effectiveness, or (3)
decide to end the evaluation project if it is not likely to produce useful
information on program outcomes.

In her second comment, the Assistant Attorney General expressed the
view that our work excluded consideration of valid, high quality evaluation
methods other than experimental and quasi-experimental design. We
believe that our assessment of NIJ’s outcome evaluations was both
appropriate and comprehensive. We examined a variety of methodological
attributes of NIJ’s studies in trying to assess whether they would produce
sufficiently sound information on program outcomes. Among other things,
we systematically examined such factors as the type of evaluation design
used; how program effects were isolated (that is, whether comparison
groups or statistical controls were utilized); the size of study samples and
appropriateness of sampling procedures; the reliability, validity, and
appropriateness of outcome measures; the length of follow-up periods on
program participants; the extent to which program attrition or program
participant nonresponse may have been an issue; the appropriateness of
analytic techniques that were employed; and the reported results.
Therefore, we made determinations about the cause and effect linkages
between programs and outcomes using a wide range of methodological
information. In discussing the methodological strengths of experimental
and quasi-experimental designs, we did not intend to be dismissive of
other potential approaches to isolating the effects of program
interventions. For example, if statistical controls can be employed to


adequately compensate for a methodological weakness such as the
existence of a comparison group that is not comparable on characteristics
that could affect the study’s outcome, then we endorse the use of such a
technique. However, in those instances where our review found that NIJ’s
studies could not produce sufficiently sound information about program
outcomes, we saw no evidence that program effects had been isolated
using alternative, compensatory, or supplemental methods.

In addition to these comments, the Assistant Attorney General also
provided us with a number of technical comments, which we incorporated
in the report as appropriate.


As arranged with your office, unless you publicly announce its contents
earlier, we plan no further distribution of this report until 14 days from the
date of this report. At that time, we will send copies to the Attorney
General, appropriate congressional committees and other interested
parties. In addition, the report will be available at no charge on GAO’s Web
site at http://www.gao.gov.

Sincerely yours,




Laurie E. Ekstrand
Director, Homeland Security
 and Justice Issues




Appendix I: Objectives, Scope, and
Methodology

              In response to your request, we undertook a review of the outcome
              evaluation work performed under the direction of the National Institute of
              Justice (NIJ) during the last 10 years. We are reporting on (1) the
              methodological quality of a sample of completed and ongoing NIJ outcome
              evaluation grants and (2) the usefulness of the evaluations in producing
              information on program outcomes.

              Our review covered outcome evaluation grants managed by NIJ from 1992
              through 2002. Outcome evaluations are defined as those efforts designed
              to determine whether a program, project, or intervention produced its
              intended effects. These kinds of studies can be distinguished from process
              evaluations, which are designed to assess the extent to which a program is
              operating as intended.

              To determine the methodological quality of a sample of NIJ-managed
              outcome evaluations, we asked NIJ, in June 2002, to identify and give us a
              list of all outcome evaluations managed by NIJ that were initiated during
              the last 10 years, or initiated at an earlier date but completed during the
              last 5 years. NIJ identified 96 evaluation studies that contained outcome
              evaluation components that had been awarded during this period. A
              number of these studies included both process and outcome components.
              We did not independently verify the accuracy or completeness of the data
              NIJ provided.

              These 96 evaluations were funded for a total of about $36.6 million.
              Individual grant awards ranged in size from $22,374 to about $2.8 million.
              Twenty grants were awarded for $500,000 or more, for a total of about
              $22.8 million (accounting for about 62 percent of all funding for NIJ
              outcome evaluations during the 10-year review period); 51 grants for less
              than $500,000, but more than $100,000, for a total of about $11.7 million
              (accounting for about 32 percent of all NIJ outcome evaluation funding);
              and 25 grants for $100,000 or less, for a total of about $2.1 million
              (accounting for about 6 percent of all NIJ outcome evaluation funding).
              Fifty-one of the 96 evaluations had been completed at the time of our
              review; 45 were ongoing.
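
               As a quick arithmetic check, the funding shares reported above follow
               directly from the category totals (figures from table 5):

```python
# Quick arithmetic check of the funding shares reported above (table 5 figures)
totals = {"large": 22_801_186, "medium": 11_687_679, "small": 2_110_737}
overall = sum(totals.values())                  # $36,599,602
for size, amount in totals.items():
    print(f"{size}: {amount / overall:.0%}")    # about 62%, 32%, and 6%
```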

              From the list of 96 outcome evaluation grants, we selected a judgmental
              sample of 16 grants for an in-depth methodological review. Our
              selection criteria were designed to capture both large and
              medium-sized grants (in terms of award size) and both completed and
              ongoing studies. We selected 8 large evaluations—funded at $500,000 or
              above—and 8 medium-sized evaluations—funded at between $101,000 and
              $499,000. Within each group of 8 we selected the 4 most recently
                                           completed evaluations, and the 4 most recently initiated evaluations that
                                           were still ongoing, in an effort to ensure that the majority of the grants
                                           reviewed were subject to the most recent NIJ grant management policies
                                           and procedures. One of the medium-sized ongoing evaluations was
                                           dropped from our review when we determined that the evaluation was in
                                           the formative stage of development; that is, the grant had been
                                           awarded but the methodological design had not yet been fully developed.
                                           As a result, our in-depth methodological review covered 15 NIJ-managed
                                           outcome evaluations accounting for about 42 percent of the total spent on
                                           outcome evaluation grants between 1992 and 2002 (see tables 5 and 6).
                                           These studies are not necessarily representative of all outcome
                                           evaluations managed by NIJ during this period.

Table 5: Number and Size of Outcome Evaluation Awards Made by NIJ from 1992 through 2002, and Reviewed by GAO

                                  All NIJ outcome evaluations       NIJ outcome evaluations reviewed by GAO
 Size of grant                    Number of grants  Total funding   Grants reviewed (% of category)  Funding reviewed (% of category)
 Large ($500,000 or more)                      20     $22,801,186                          8 (40%)                 $13,654,211 (60%)
 Medium ($101,000-$499,000)                    51      11,687,679                          7 (14%)                   1,765,915 (15%)
 Small ($100,000 or less)                      25       2,110,737                              N/A                               N/A
 Total                                         96     $36,599,602                         15 (16%)                 $15,420,126 (42%)
Source: GAO analysis of NIJ data.




Table 6: Size and Completion Status of the 15 Evaluations Selected for Methodological Review

 Grant title                                                                          Award    Size      Status
 National Evaluation of Gang Resistance Education and Training Program           $1,568,323    Large     Completed
 Evaluation of Chicago’s Citywide Community Policing Program                     $2,157,859    Large     Completed
 National Evaluation of the Rural Domestic Violence and Child Victimization
  Enforcement Grant Program                                                        $719,949    Large     Completed
 Evaluation of Breaking the Cycle                                                $2,419,344    Large     Completed
 National Evaluation of the Domestic Violence Victims’ Civil Legal Assistance
  Program                                                                          $800,154    Large     Ongoing
 Evaluation of a Multi-Site Demonstration of Collaborations to Address
  Domestic Violence and Child Maltreatment                                       $2,498,638    Large     Ongoing
 Corrections and Law Enforcement Family Support: Law Enforcement Field Test        $649,990    Large     Ongoing
 Evaluation of a Multi-Site Demonstration for Enhanced Judicial Oversight of
  Domestic Violence Cases                                                        $2,839,954    Large     Ongoing
 Evaluation of a Comprehensive Service-Based Intervention Strategy in Public
  Housing                                                                          $187,412    Medium    Completed
 An Evaluation of Victim Advocacy with a Team Approach                             $153,491    Medium    Completed
 Reducing Non-Emergency Calls to 911: An Assessment of Four Approaches to
  Handling Citizen Calls for Service                                               $399,919    Medium    Completed
 Responding to the Problem Police Officer: An Evaluation of Early Warning
  Systems                                                                          $174,643    Medium    Completed
 Evaluation of a Juvenile Justice Mental Health Initiative with Randomized
  Design                                                                           $200,000    Medium    Ongoing
 Culturally Focused Batterer Counseling for African-American Men                   $356,321    Medium    Ongoing
 Testing the Impact of Court Monitoring and Batterer Intervention Programs at
  the Bronx Misdemeanor Domestic Violence Court                                    $294,129    Medium    Ongoing
Source: GAO analysis of NIJ data.

                                           The evaluations we selected represented a broad range of issues in
                                           the criminal justice field and of program delivery methods. In terms of
                                              criminal justice issues, 7 of the 15 evaluations focused on programs
                                              designed to reduce domestic violence, 4 focused on programs addressing
                                              the behavior of law enforcement officers, 2 focused on programs
                                              addressing drug abuse, and 2 focused on programs to deal with juvenile
                                              justice issues. In terms of program delivery methods, 3 evaluations
                                              examined national discretionary grant programs or nationwide
                                              cooperative agreements, 4 examined multisite demonstration programs,
                                              and 8 examined local programs or innovations.


For the 15 outcome evaluations we reviewed, we asked NIJ to provide any
documentation relevant to the design and implementation of the outcome
evaluation methodologies, such as the application solicitation, the
grantee’s initial and supplemental applications, progress notes, interim
reports, requested methodological changes, and any final reports that may
have become available. We used a data collection instrument to obtain
information systematically about each program being evaluated and about
the features of the evaluation methodology. We based our data collection
and assessments on generally accepted social science standards.1 We
examined such factors as whether evaluation data were collected before
and after program implementation; how program effects were isolated
(i.e., the use of nonprogram participant comparison groups or statistical
controls); and the appropriateness of sampling, outcome measures,
statistical analyses, and any reported results.2 A senior social scientist with
training and experience in evaluation research and methodology read and
coded the documentation for each evaluation. A second senior social
scientist reviewed each completed data collection instrument and the
relevant documentation for the outcome evaluation to verify the accuracy
of every coded item. We relied on documents NIJ provided to us between
October 2002 and May 2003 in assessing the evaluation methodologies and
reporting on each evaluation’s status. We grouped the studies into
3 categories based on our judgment of their methodological soundness.
Although we recognize that the stronger studies may have had some
weaknesses, and that the weaker studies may have had some strengths,
our categorization of the studies was a summary judgment based on the
totality of the information provided to us by NIJ. Following our review, we
interviewed NIJ officials regarding NIJ’s role in soliciting, selecting, and
monitoring these grants, and spoke to NIJ grant managers regarding issues
raised about each of the grants during the course of our methodological
review.



1
 These standards are well defined in scientific literature. See, for example, Donald T.
Campbell and Julian C. Stanley, Experimental and Quasi-Experimental Designs for
Research (Chicago: Rand McNally & Company, 1963); Carol H. Weiss, Evaluation
Research: Methods for Assessing Program Effectiveness (Englewood Cliffs: Prentice-Hall,
Inc., 1972); Edward A. Suchman, Evaluative Research: Principles and Practice in Public
Service & Social Action Programs (New York: Russell Sage Foundation, 1967); and
GAO/PEMD-10.1.4.
2
 The evaluations varied in the methodologies that were used to examine program effects.
Of the 15 evaluations, 14 did not explicitly discuss cost/benefit considerations. The
evaluation of Breaking the Cycle estimated cost/benefit ratios at each of the
3 demonstration sites examined.




In the course of our discussions with NIJ officials, we learned of changes
NIJ has underway to improve its administration of outcome evaluation
studies. To document these changes, we interviewed responsible NIJ
officials, and requested and reviewed relevant documents. We are
providing information in this report about these changes.

To identify the usefulness of the evaluations in producing information on
program outcomes, we reviewed reported findings from completed NIJ-
managed outcome evaluations that either evaluated programs
administered or funded by the Department of Justice (DOJ), or had been
conducted with funding contributed by DOJ program offices (see table 7).
Of the 8 completed evaluations that we reviewed for methodological
adequacy, 5 had been conducted with funding contributed in part by DOJ
program offices, including 2 evaluations funded in part by DOJ’s Office on
Violence Against Women (OVW) and 3 evaluations funded in part by DOJ’s
Office of Community Oriented Policing Services (COPS). Of the 2
evaluations funded by OVW, 1 was a review of a national program
administered by DOJ, and the other was a review of a locally administered
program funded partially by an OVW grant. Of the 3 evaluations funded by
COPS, 2 were evaluations of programs funded at least in part with COPS
funding, and the other was an evaluation of a program operating at several
local law enforcement agencies, supported with local funding. Because of
our interest in the effectiveness of criminal justice programs, we limited
our review of the usefulness of NIJ outcome evaluations to evaluations of
DOJ programs, or evaluations funded by DOJ program offices, and did not
examine the 3 other completed NIJ outcome evaluations that focused on
programs funded by agencies other than DOJ.




Table 7: Programs Evaluated and Funding Sources for Completed NIJ Outcome Evaluations

                                                                          DOJ-funded    Evaluation funded by
 Completed NIJ evaluations                                                program       DOJ program offices
 OVW evaluations
 National Evaluation of the Rural Domestic Violence and Child
  Victimization Enforcement Grant Program                                 Yes           Yes
 An Evaluation of Victim Advocacy with a Team Approach                    Yes           Yes
 COPS evaluations
 Evaluation of Chicago’s Citywide Community Policing Program              Yes           Yes
 Reducing Non-Emergency Calls to 911: An Assessment of Four
  Approaches to Handling Citizen Calls for Service                        Yes           Yes
 Responding to the Problem Police Officer: An Evaluation of Early
  Warning Systems                                                         No            Yes
 Other evaluations
 National Evaluation of Gang Resistance Education and Training Program    No            No
 Evaluation of Breaking the Cycle                                         No            No
 Evaluation of a Comprehensive Service-Based Intervention Strategy in
  Public Housing                                                          No            No
Source: GAO analysis of NIJ data.


                                           We interviewed NIJ officials and relevant DOJ program administrators
                                           regarding whether these findings were used to implement improvements
                                           in the evaluated programs. At OVW and COPS, we asked officials about the
                                           extent to which they (1) were involved in soliciting and developing the
                                           evaluation grant, and monitoring the evaluation; (2) were aware of the
                                           evaluation results; and (3) had made any changes to the programs they
                                           administered based on evaluation findings about the effectiveness of the
                                           evaluated programs.

                                           We conducted our work at NIJ headquarters in Washington, D.C., between
                                           May 2002 and August 2003 in accordance with generally accepted
                                           government auditing standards.




Appendix II: Summaries of the NIJ Outcome
Evaluations Reviewed

Evaluations with Sound Designs and Sound Implementation Plans

Evaluation                  The National Evaluation of the Gang Resistance Education and Training (GREAT) Program
Principal investigator      University of Nebraska at Omaha
Program evaluated 	         The GREAT program began in 1991 with the goal of using federal, state, and local law enforcement
                            agents to educate elementary school students in areas prone to gang activity about the destructive
                            consequences of gang membership. The program seeks to prevent youth crime and violence by reducing
                            involvement in gangs. According to the evaluator’s proposal, as of April 1994, 507 officers in 37 states
                            (150 sites) had completed GREAT training. GREAT targets middle school students (with an optional
                            curriculum for third and fourth graders) and consists of 8 lessons taught over a 9-week period.
Evaluation components 	     Process and outcome evaluations began in 1994 and were completed in 2001. Total evaluation funding
                            was $1,568,323. The outcome evaluation involved a cross-sectional and longitudinal design. For the
                            cross-sectional component, 5,935 eighth grade students in 11 different cities were surveyed to assess
                            the effectiveness of GREAT. Schools that had offered GREAT within the last 2 years were selected, and
                            questionnaires were administered to all eighth graders in attendance on a single day. This sample
                            constituted a 1-year follow-up of 2 ex-post facto groups: students who had been through GREAT and
                            those who had not. A 5-year longitudinal, quasi-experimental component was conducted in 6 different
                            cities. Schools in the 6 cities were selected purposively, to allow for random assignment where possible.
                            Classrooms in 15 of 22 schools were randomly assigned to receive GREAT or not, whereas assignment
                            in the remaining schools was purposive. A total of more than 3,500 students initially participated, and
                            active consent was obtained for 2,045 participants. Students were surveyed 2 weeks before the program,
                            2 weeks after completion, and at 1-, 2-, 3-, and 4-year intervals after completion. Significant follow-up
                            efforts were employed to maintain reasonable response rates. Concepts measured included attitudinal
                            measures regarding crime, gangs and police; delinquency; drug sales and use; and involvement in
                            gangs, gang activities, and risk-seeking behaviors. In addition, surveys were conducted with parents of
                            the students participating in the longitudinal component, administrative and teaching staff at the schools
                            in the longitudinal design, and officers who had completed GREAT training prior to July 1999.
Assessment of evaluation 	 Although conclusions from the cross-sectional component may be limited because of possible pre-
                           existing differences between students who had been exposed to GREAT and students who had not, and
                           because of a lack of detail about the statistical controls employed, the design and analyses for the
                           longitudinal component are generally sound: random assignment of classrooms to the intervention in
                           15 of the 22 schools, collection of baseline and extensive follow-up data, and statistical controls for
                           differential attrition rates of participant and comparison groups.
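
The classroom-level random assignment used in the longitudinal component can be sketched as follows. This is a minimal illustration with hypothetical school and classroom identifiers, not the evaluators' actual procedure.

```python
# Minimal sketch (hypothetical identifiers): random assignment of classrooms,
# rather than individual students, to receive the curriculum.
import random

random.seed(42)
schools = {f"school_{s}": [f"class_{s}_{c}" for c in range(4)] for s in range(15)}

assignment = {}
for school, classrooms in schools.items():
    shuffled = random.sample(classrooms, len(classrooms))
    half = len(shuffled) // 2
    for room in shuffled[:half]:
        assignment[room] = "GREAT"        # classroom receives the curriculum
    for room in shuffled[half:]:
        assignment[room] = "comparison"   # classroom does not

# Every student in a classroom shares its assignment, so outcome analyses
# must account for clustering at the classroom level.
```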




Evaluation                Evaluation of Breaking the Cycle
Principal investigator    Urban Institute
Program evaluated 	       A consortium of federal agencies, led by the Office of National Drug Control Policy and NIJ, developed
                          the Breaking the Cycle (BTC) demonstration program in 3 sites to test the effectiveness of a
                          comprehensive, coordinated endeavor to reduce substance abuse and criminal activity, and improve the
                          health and social functioning of drug-involved offenders. The first site, Birmingham, Ala., received funding
                          in 1997, and the next 2 sites, Tacoma, Wash., and Jacksonville, Fla., received funding in 1998.
                          Participants were adult arrestees (for any type of crime) who tested positive for drug use and had a
                          history of drug involvement. The program was based on the recognition that there was a link between
                          drug use and crime, and it had the support of many criminal justice system officials who were willing to
                          use the authority of the criminal justice system to reduce drug use among offenders. BTC intended to
                          expand the scope of earlier programs such as drug courts and Treatment Alternatives to Street Crime by
                          incorporating drug reduction activities as part of handling felony cases. BTC included early intervention; a
                          continuum of treatment options tailored to participants’ needs, including treatment readiness programs in
                          jails; regular judicial monitoring and graduated sanctions; and collaboration among justice and treatment
                          agencies.
Evaluation components 	   The evaluation began in 1997, and the final report was completed in 2003. The evaluation was funded
                          for $2,419,344 and included both outcome and process components. Comparison groups were selected in each of the
                          3 sites, and were composed of defendants similar to the BTC participants who were arrested in the year
                          before BTC was implemented. The evaluation examined program success in (1) reducing drug use and
                          criminal activity, as measured by self-reported drug use in the 6 months prior to follow-up interviews and
                          officially recorded arrests in the 12 months after baseline; (2) improving the physical and mental health
                          and family/social well-being of participants, as measured by self-reported interview data on problems
                          experienced in these 3 areas during the 30 days before follow-up; and (3) improving labor market
                          outcomes for participating offenders, as measured by self-reported interview data on employment and
                          social difficulties in the 30 days before follow-up. Survey data were collected at baseline and again at two
                          intervals between 9 and 15 months after baseline. At baseline the sample sizes for the treatment and
                          comparison groups were, respectively, 374 and 192 in Birmingham, 335 and 444 in Jacksonville, and
                          382 and 351 in Tacoma. Response rates for the follow-up interviews varied across the 3 sites from 65 to
                          75 percent for the treatment groups, and from 71 to 73 percent for the comparison groups. Method of
                          assessment varied across sites and across samples, with some participants in both the comparison and
                          treatment groups interviewed in person while others were interviewed by telephone. Multiple statistical
                          analyses, including logistic regression, with controls for differences in demographics, offense history,
                          substance abuse history, and work history between treatment and comparison groups were used. BTC’s
                          effect on the larger judicial environment was also assessed, using official records on the number of
                          hearings, case closure rates, and other factors.
                          Cost-benefit analyses of the BTC interventions were conducted at the three locations. The costs
                          attributable to the BTC program were derived from budgetary information provided by program staff. The
                          BTC program benefits were conceptualized as “costs avoided” arising from the social and economic
                          costs associated with crime. The estimates of cost avoided in the study were based on (1) the costs (to
                          society) associated with the commission of particular crimes and (2) the costs (to the criminal justice
                          system) associated with arrests. Estimates of these components from the economic and criminal justice
                          literature were applied to self-reported arrest data from the program and comparison group subjects. The
                          derived estimates of benefits were compared to program costs to form cost-benefit ratios for the
                          interventions. An earlier effort to incorporate estimates of savings in service utilization from BTC (as a
                          program benefit) was not included in the final report analysis due to inconclusive results.




Assessment of evaluation 	 The evaluation was well designed and implemented. The study used comparison groups to isolate and
                           minimize external factors that could have influenced the results. While the comparison groups were
                           selected and baseline data collected 1 year before the treatment groups were selected, the study
                           corrected for selection bias and attrition, using multivariate models that incorporated control variables to
                           measure observed sample differences. The study appears to have handled successfully other potential
                           threats to the reliability and validity of results, by using appropriate statistical analyses to make
                           adjustments. For example, the study relied on both self-reported measures of drug use and arrest
                           histories as well as official records of arrests, to assess the effects of the program. Self-report measures
                           are subject to errors in memory or self-presentational biases, while official records can be inaccurate
                           and/or incomplete. The evaluators made use of both the self-report and official measures to attempt to
                           control for these biases.
                             The methodological approach used in the cost-benefit analysis was generally sound. The report specified
                             the assumptions underlying the cost and benefit estimates, and appropriately discussed the limitations of
                             the analysis for policymaking.
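
The cost-benefit arithmetic described above reduces to comparing estimated "costs avoided" against program costs. A minimal sketch follows, with entirely hypothetical unit costs and counts; none of these figures come from the BTC report.

```python
# Minimal sketch (hypothetical numbers): "costs avoided" benefits compared
# against program costs to form a benefit-cost ratio.
COST_PER_CRIME = 15_000    # hypothetical social cost per offense avoided
COST_PER_ARREST = 3_000    # hypothetical justice system cost per arrest avoided

crimes_avoided = 120       # hypothetical program-vs.-comparison difference
arrests_avoided = 80       # (illustrative counts, not report results)
program_cost = 1_200_000   # hypothetical site budget

benefits = crimes_avoided * COST_PER_CRIME + arrests_avoided * COST_PER_ARREST
ratio = benefits / program_cost
print(f"benefit-cost ratio: {ratio:.2f}")  # > 1: estimated benefits exceed costs
```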




                            Evaluation of a Multi-Site Demonstration for Enhanced Judicial Oversight of Domestic Violence
Evaluation                  Cases
Principal investigator      The Urban Institute


Program evaluated 	         The Judicial Oversight Demonstration (JOD) initiative is a multiyear program being implemented at 3
                            sites (City of Boston/Dorchester District Court, Mass.; Washtenaw County, Ann Arbor, Mich.; and
                            Milwaukee County, Wis.) to address the problem of domestic violence. JOD tests the idea that a
                            coordinated community, focused judicial, and systemic criminal justice response can improve victim
                            safety and service provision, as well as offender accountability. JOD emphasizes uniform and consistent
                            responses to domestic violence offenses, including coordinated victim advocacy and services; strong
                            offender accountability and oversight; rigorous research and evaluation components; and centralized
                            technical assistance. Demonstration sites have developed partnerships with a variety of public and
                            private entities, including victim advocacy organizations, local law enforcement agencies, courts, and
                            other social service providers. The program began in fiscal year 2000, and demonstration sites are
                            expected to receive funding for 5 years.
Evaluation components 	     A process evaluation began in January 2000. The outcome component of the evaluation began in
                            October 2002 and is to be completed by October 2005. At the time of our review, the evaluation grant
                            amount was $2,839,954. Plans call for a full outcome assessment to be conducted in 2 sites and,
                            because no appropriate comparison site could be identified, a partial assessment in the third site. The 2
                            sites with a full assessment were matched with comparison sites having similar court caseloads and
                            population demographics; neither comparison site had a specialized court docket, enhanced judicial
                            oversight, or a countywide coordinated system for handling domestic violence cases. Over 12 months, all
                            domestic violence cases in each site, up to monthly size quotas, will be selected into the following
                            groups: cases where the offender was found guilty and sentenced to jail for 6 months or less and
                            probation or probation only, cases that were dismissed or diverted from prosecution, and cases where
                            the offender received more than 6 months incarceration. Victims and offenders in the first group will be
                            interviewed, and in the second group, victims only will be interviewed. Offender recidivism in both groups
                            will be tracked for 1 year following the intervention using police and court records. For the third group,
                            only offender recidivism will be tracked. In the partial assessment site, subject to data availability, the
                            plan is to compare a sample of domestic violence cases in which the offender was placed on probation in
                            the period before JOD implementation with a sample of cases in which the offender was placed on
                            probation and scheduled for judicial review in the period after JOD implementation. Data about incidents,
                            victims, and offenders are to be obtained from official records, and offender recidivism will be tracked
                            using police and court records. Overall, short-term outcomes for the study are planned to include various
                            measures of offender compliance and victim and offender perceptions of JOD, and long-term outcomes
                            are planned to include various measures of offender recidivism, victim well-being, and case processing
                            changes. In addition, to discern any system level changes due to JOD, aggregate, annual data on all
                            domestic violence cases for the 2 years prior to and 3 years after JOD implementation in all sites will be
                            collected and analyzed.
Assessment of evaluation 	 The evaluation plan appears to be ambitious and well designed. A quasi-experimental design is planned,
                           and data will be collected from multiple sources, including victims, offenders, and agencies. While lack of
                           sustained cooperation, uneven response rates, and missing data could become problems, detailed plans
                           seem to have been made to minimize these occurrences. The planned approach of selecting cases
                           (choosing equal numbers of cases consecutively until a monthly quota is reached, over a 12-month
                           period) may be nearly as good as random sampling and takes into consideration seasonal variation.
                           However, it could introduce bias if the point in each month at which case selection begins varies.
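
The consecutive-until-quota selection rule, and the way it can exclude cases arriving late in a month, can be sketched as follows. The quota and case stream here are hypothetical, not the evaluators' actual procedure.

```python
# Minimal sketch (hypothetical quota and cases): consecutive selection of
# cases each month until a fixed quota is reached.
from collections import defaultdict

MONTHLY_QUOTA = 25          # hypothetical; actual quotas are site-specific
selected = defaultdict(list)

def consider(case_id: str, month: str) -> bool:
    """Select cases in arrival order until the month's quota is filled."""
    if len(selected[month]) < MONTHLY_QUOTA:
        selected[month].append(case_id)
        return True
    return False            # quota reached; later cases that month are skipped

# Cases arriving after the quota fills are systematically excluded, which is
# the potential source of bias noted above if selection start times vary.
```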




Evaluation                  Culturally Focused Batterer Counseling for African-American Men
Principal investigator      Indiana University of Pennsylvania
Program evaluated 	         The purpose of this study is to test the relative effectiveness of culturally focused versus conventional
                            batterer counseling for African-American men. It is based on research indicating that conventional
                            counseling dropout and partner re-assault rates are higher for African-American men than they are for
                            white men, and clinical literature in related fields that recommends culturally focused counseling to
                            improve the effectiveness of counseling with African-American men. Culturally focused counseling refers
                            to the counselor recognizing and responding to cultural issues that emerge in group sessions (including
                            such topics as African-American men’s perceptions of the police, relationships with women, sense of
                            African-American manhood, past and recent experiences of violence, and reactions to discrimination and
                            prejudice), and a curriculum that includes the major cultural issues facing a particular group of
                            participants. The setting for the evaluation is a counseling center in Pittsburgh, Pennsylvania.
Evaluation components 	     The evaluation began in September 2001, and the expected completion date is February 2005. At the
                            time of our review, the grant amount was $356,321. A clinical trial will be conducted to test the effect of
                            culturally focused counseling on the extent to which African-American men drop out of counseling, are
                            accused of re-assaults, and are re-arrested for domestic violence. Plans are for 600 African-American
                            men referred by the Pittsburgh Domestic Violence Court over a 12-month period to batterer counseling at
                            the counseling center to be randomly assigned to 1 of 3 groups: (1) a culturally focused counseling group of only
                            African-Americans, (2) conventional batterer counseling in an African-American-only group, or (3)
                            conventional counseling in a racially mixed group. Before assignment, however, the counseling center
                            must recommend the men for participation in the study. Men included in the study will be administered a
                            background questionnaire and two tests of culturally specific attitudes (i.e., racial acculturation and
                            identity) at program intake. The men’s female partners will be interviewed by phone
                            3 months, 6 months, and 12 months after program intake. These structured interviews will collect
                            information on the woman’s relationship with the man, the man’s behavior, and the woman’s help-
                            seeking. Clinical records of program attendance and police records of re-arrests will be obtained for each
                            man. Planned analyses are to include (1) verification of equivalent culturally focused and conventional
                            counseling sub-samples at intake and during the follow-up; (2) comparison of the program dropouts, re-
                            assaults, and re-arrests for the three counseling options at each follow-up interval and cumulatively; and
                            (3) a predictive model of the re-assault outcome based on characteristics, cultural attitudes, and
                            situational factors. Additionally, interviews with a sub-sample of 100 men about their counseling
                            experience are to be conducted.
Assessment of evaluation 	 This is a well-designed experiment to test the effect of a new approach to provide counseling to
                           perpetrators of domestic violence. The researchers have plans to (1) adjust for any selection bias in
                           group assignment and participant attrition through statistical analysis; (2) prevent “contamination” from
                           counselors introducing intervention characteristics to control groups, or the reverse; and (3) monitor the
                           response rates on the interviews with female partners. The evaluation is ongoing. The most recent
                           progress report we reviewed indicated that the evaluation is proceeding as planned, with the recruitment
                           of batterers behind schedule by 1 month, the series of female partner interviews on schedule and very
                           close to expected response rates, and the interviews with the sub-sample of batterers about three-
                           quarters complete. One potential concern is that, because not all men referred by the domestic
                           violence court to the counseling center may be recommended to participate in the study, any bias in
                           recommending study participants will limit the population to which the study’s results can be
                           generalized.
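
The intake-equivalence verification planned in analysis step (1) amounts to comparing baseline measures across the three randomly assigned groups. A minimal sketch with hypothetical intake scores, illustrative only:

```python
# Minimal sketch (hypothetical intake data): checking that three randomly
# assigned counseling groups are equivalent on a baseline measure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical intake scores for the three counseling options
culturally_focused = rng.normal(50, 10, 200)
conventional_aa_only = rng.normal(50, 10, 200)
conventional_mixed = rng.normal(50, 10, 200)

f_stat, p_value = stats.f_oneway(
    culturally_focused, conventional_aa_only, conventional_mixed
)
# A large p-value is consistent with successful randomization on this measure;
# a small one would flag a baseline imbalance to adjust for in later analyses.
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```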




                             Testing the Impact of Court Monitoring and Batterer Intervention Programs at the Bronx
Evaluation                   Misdemeanor Domestic Violence Court
Principal investigator       Fund for the City of New York
Program evaluated 	          Operating since 1998, the Bronx Misdemeanor Domestic Violence Court handles spousal abuse
                             misdemeanor cases. The court has the power to prescribe various conditions of discharge for batterers,
                             including participation in group counseling and/or court monitoring. Given concerns about the
                             effectiveness of these options, it was decided to test the efficacy of batterer counseling programs and
                             court monitoring, alone and in combination with each other. Furthermore, court monitoring was tested
                             based on the frequency of its administration—either monthly or on a graduated basis (less monitoring for
                             fewer incidents of abuse). This was to ascertain whether graduated monitoring might give batterers
                             more incentive to change.
Evaluation components 	      The evaluation began in September 2001 and is expected to be completed in August 2003. At the time of
                             our review, this evaluation was funded for $294,129. The proposed study is an outcome evaluation of
                             4 different treatment alternatives for conditional discharge defendants in domestic violence cases. The
                             treatment options are (1) counseling program and monthly court monitoring, (2) counseling program and
                             graduated court monitoring, (3) monthly court monitoring program only, and (4) graduated court
                             monitoring only. Participants in the evaluation (800 total) are to be assigned randomly to 1 of the 4
                             treatments at the time of sentencing, and incidents of new crimes are to be measured 6 and 12 months
                             after sentencing. Official crime records at both intervals, and interviews with victims at the 12-month
                             interval are the sources of data. The planned analysis involves looking at the groups as a whole, and
                             subgroups related to age, criminal history, and current charge. Outcome measures are (1) completion of
                             the conditional discharge or imposition of the jail alternative, (2) new arrests for domestic violence, and
                             (3) new reports from victims of domestic violence incidents.
Assessment of evaluation 	 This is a well-designed approach to measure the comparative efficacy of combinations of program
                           counseling and variations in monitoring. However, at the time of our review, we had some concerns
                           about how well implementation will proceed. One concern is that if one or more of the treatments is less
                           effective, participants in those groups could spend more time in jail, mechanically reducing their
                           opportunity for further incidents. This difficulty can be addressed in the analysis, but neither the
                           proposal nor subsequent progress reports discuss this or other differential attrition issues. Also,
                           although the evaluators have a plan to try to ensure good response rates for the victims’ survey, it is
                           uncertain how effective that plan will be; other surveys of similar populations have been problematic.




Well-designed Evaluations That Encountered Implementation Problems

Evaluation                  An Evaluation of Chicago’s Citywide Community Policing Program
Principal investigator      Northwestern University
Program evaluated 	         Chicago’s community policing program, known as Chicago’s Alternative Policing Strategy (CAPS), began
                            in April 1993. The program reorganizes policing around small geographical areas where officers
                            assigned to beat teams meet with community residents to identify and address a broad range of
                            neighborhood problems.
Evaluation components 	     There were 2 evaluation efforts in this study, 1 examining the prototype project and the second
                            examining citywide program implementation. The combined evaluations were completed in August 2001,
                            at a total cost of $2,157,859.
                            The prototype evaluation, conducted between April 1993 and September 1994, compared five areas that
                            implemented CAPS with four areas that did not. Data from the 1990 Census were used to select four
                            sections of the city that closely matched the demographics of the five prototype areas. Residents of all
                            areas were first surveyed in the spring of 1993 regarding the quality of police service and its impact on
                            neighborhood problems. Follow-up interviews occurred in either June or September of 1994 (14 to 17
                            month time lags). Interviews were conducted by telephone in English and Spanish. The re-interview rate
                            was about 60 percent. A total of 1,506 people were interviewed both times, an average of 180 in each
                            prototype area and 150 in each comparison area.
                            The CAPS citywide evaluation began after the conclusion of the prototype evaluation in July 1994. The
                            purpose of this evaluation was to assess how changing from a traditional policing approach to a
                            community-centered approach would affect citizens’ perceptions of the police, neighborhood problems
                            and crime rates. The researchers administered annual citywide public opinion surveys between 1993 and
                            2001 (excluding 2000). The surveys covered topics such as police demeanor, responsiveness, and task
                            performance. Surveys were also administered to officers at CAPS orientation sessions to obtain, among
                            other things, aggregate indicators of changes in officers’ attitudes toward CAPS. Changes in levels of
                            recorded crimes were analyzed. Direct observations of police meetings, surveys of residents, and
                            interviews with community activists were used to measure community involvement in problem solving
                            and the capacity of neighborhoods to help themselves.
Assessment of evaluation 	 The 1992 crime rates were reported to be similar between prototype districts and their matched
                           comparison areas, and the baseline demographic measures used to match the two groups were broadly
                           similar. The initial and follow-up response rates of about 60 percent seem reasonable considering the
                           likelihood of community mobility in these areas; however, attrition rates differed for various demographic
                           characteristics, such as home ownership, race, age, and education, raising some concerns about
                           whether the results are generalizable to the intended population. The follow-up time (14-17 months) was
                           the maximum period allowed by the planned citywide implementation of CAPS. A single follow-up survey
                           and the citywide implementation precluded drawing firm conclusions about longer-term impacts of the
                           prototype program.
                            Because CAPS was implemented throughout the city of Chicago in 1995, the CAPS citywide evaluation
                            was not able to include appropriate comparison groups and could not obtain a measure of what would
                            have happened in the absence of the program.
                            the implementation and outcomes of the CAPS program, and stated that there was no elaborate
                            research design involved because their focus was on organizational change. However, because the
                            trends over time from resident surveys and crime data were presented without controls or comparison
                            groups and some declines in crime began before the program was implemented, changes cannot be
                            attributed solely to the program.
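
A differential attrition check of the kind this concern calls for can be sketched as follows, using hypothetical retention counts rather than the study's data:

```python
# Minimal sketch (hypothetical counts): testing whether follow-up retention
# differed across a demographic characteristic such as home ownership.
from statsmodels.stats.proportion import proportions_ztest

reinterviewed = [420, 310]   # hypothetical: retained homeowners vs. renters
baseline_n = [600, 620]      # hypothetical baseline interview counts

z, p = proportions_ztest(reinterviewed, baseline_n)
# A small p-value indicates retention differed by home ownership, so the
# follow-up sample may no longer represent the baseline population.
print(f"retention: {reinterviewed[0]/baseline_n[0]:.0%} vs "
      f"{reinterviewed[1]/baseline_n[1]:.0%}, p = {p:.4f}")
```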




Evaluation                  Evaluation of a Comprehensive Service-Based Intervention Strategy in Public Housing
Principal investigator      Yale University School of Medicine
Program evaluated 	         The program was an intervention strategy designed to reduce drug activity and foster family self-
                            sufficiency in families living in a public housing complex in the city of New Haven, Conn. The key
                            elements of the intervention were (1) an on-site comprehensive services model that included both clinical
                            (substance abuse treatment and family support services) and nonclinical components (e.g., extensive
                            outreach and community organizing as well as job training and placement and GED high school
                            equivalency certification) and (2) high profile police involvement. The goals of the program were
                            (1) increases in the proportion of residents entering and completing intervention services and (2) a
                            reduction in substance-related activities and crime.
Evaluation components 	     The evaluation began in 1998 and was completed in 2000. The total evaluation funding was $187,412.
                            The intervention site was a public housing complex composed primarily of female heads of household
                            tenants and additional family members; the control site was another public housing complex on the
                            opposite side of town, chosen for its similarities to the intervention site. The evaluation design was both
                            process and outcome oriented and involved the collection of both qualitative and quantitative data. At
                            baseline, a needs assessment survey was completed (n=175 at the intervention site and n=80 at the
                            control site), and follow-up surveys with residents took place at 12 and 18 months post-intervention (no
                            response rates reported). All heads of household at the sites were the target population for the surveys.
                            The follow-up surveys, while administered in the same two sites, did not track the same respondents that
                            were surveyed at baseline. Survey measures included access to social services; knowledge and reported
                            use of social services; and residents’ perceptions of the extent of drug and alcohol abuse, drug selling,
                            violence, safety, and unsupervised youth in the community. The study also examined crime statistics
                            obtained from the New Haven police department, at baseline and during the intervention.
Assessment of evaluation 	 The study had several limitations, the first of which was potential selection bias due to pre-existing
                           differences between the sites, as well as considerable (and possibly differential) attrition in both groups,
                           with no statistical control for such differences. Second, respondents may not have been representative of
                           the populations at the housing sites. No statistical comparisons of respondents to nonrespondents on
                           selected variables were presented. In addition, on the baseline survey, the response rates of the
                           intervention and control sites differed substantially (70 vs. 44 percent, respectively). Overall response
                           rates were not reported for the follow-up surveys. Furthermore, implementation did not work smoothly
                           (e.g., the control site received additional unanticipated attention from the police). Finally, the grantee
                           proposed to track data on individuals over time (e.g., completion of services), but this goal was not
                           achieved, in part because of the limited capability of project staff in the areas of case monitoring,
                           tracking, and data management. Thus, although the intervention may have produced changes in the
                           intervention site “environment” over time (aggregate-level changes), it is not clear that the intervention
                           had a measurable effect on the lives of individuals and families at the site.




Evaluation                  An Evaluation of Victim Advocacy with a Team Approach
Principal investigator      Wayne State University
Program evaluated 	         The program provides assistance to domestic violence victims in some police precincts in the city of
                            Detroit. The domestic violence teams studied included specially trained police officers, police department
                            advocates, legal advocates, and in one police precinct, an on-site prosecutor. The advocates assisted
                            victims by offering information about the legal system, referrals, and safety planning.
Evaluation components 	     The outcome evaluation began in January of 1998 and the final report was completed in January of
                            2001. The grant amount was $153,491. The objectives of the study were to address the relationships
                            between advocacy and victim safety and between advocacy and victims’ responses to the criminal justice
                            system, using a quasi-experimental design to compare domestic violence cases originating in police
                            precincts with and without special police domestic violence teams that included advocates. The study
                            focused on assistance provided in 3 police precincts. Precincts not served by in-precinct domestic
                            violence teams, but resembling the precincts with such teams in terms of ethnic representation and
                            median income, were selected as comparisons. Data were collected using police records, county
                            prosecutor’s office records, advocate contact forms, and telephone interviews with victims. Cases that
                            met Michigan’s legal definition of domestic violence, had adult female victims, and were received in the
                            selected precincts over a 4-month period in 1998 were eligible for the study. The cases were first
                            identified by the police department through police reports and then reviewed for qualification by a
                            member of the research team. A weekly quota of cases was selected from each precinct. If the number
                            of qualified cases for a precinct exceeded the quota, then cases were selected randomly using a random
                            numbers table. Outcomes included rates of completed prosecution of batterers, rate of guilty findings
                            against batterers, subsequent violence against victims, victims’ perceptions of safety, and victims’ views
                            of advocacy and the criminal justice process.
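For illustration, the weekly quota selection described above can be sketched as follows (Python). The precinct names, case pools, and quota are hypothetical, and random.sample stands in for the study's random numbers table.

    import random

    # Hypothetical weekly pools of qualified cases, keyed by precinct.
    weekly_cases = {
        "precinct_A": [f"case_{i}" for i in range(1, 13)],  # 12 qualified cases
        "precinct_B": [f"case_{i}" for i in range(1, 6)],   # 5 qualified cases
    }
    WEEKLY_QUOTA = 8  # illustrative figure, not the study's actual quota

    selected = {}
    for precinct, cases in weekly_cases.items():
        if len(cases) > WEEKLY_QUOTA:
            # More qualified cases than the quota: draw a simple random
            # sample, standing in for the study's random numbers table.
            selected[precinct] = random.sample(cases, WEEKLY_QUOTA)
        else:
            # At or below the quota: take every qualified case.
            selected[precinct] = list(cases)

    for precinct, cases in selected.items():
        print(precinct, len(cases))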
Assessment of evaluation 	 The study was severely affected by numerous problems, many of which the researchers acknowledged.
                           First, the sample selection was based on incomplete or unreliable data: police officers often did not fully
                           describe incidents in their written reports, and precinct staff inconsistently provided complete case
                           information about incidents to the researchers. Second, evaluators were not able to secure cooperation
                           from domestic violence advocates and their supervisors at all service levels in providing reliable reports
                           on service recipients and the type, number, and length of services. Additionally, most domestic violence
                           team members were moved out of the precincts and into a centralized location during the period victims
                           in the study were receiving services, thereby potentially affecting the service(s) provided to them.
                           Further, the researchers were uncertain as to whether women from the comparison precincts received
                           any advocacy services, thereby potentially contaminating the research results between the precincts with
                           the domestic violence teams and the comparison precincts. Finally, low response rates and response
                           bias for data collected from victims were problems. The overall response rate for the initial round of
                           telephone interviews was only about 23 percent and the response rates for follow-up interviews were
                           lower. Response rates were not provided separately for victims from the precincts with the domestic
                           violence teams and the comparison precincts. Evidencing response bias, the interviewed victims were
                           less likely to have experienced severe physical abuse, less likely to be living with the abuser, and more
                           likely to have a child in common with the abuser than the sampled victims who were not interviewed.




Evaluation                   Reducing Non-Emergency Calls to 911: An Assessment of Four Approaches to Handling Citizen Calls for Service
Principal investigator        University of Cincinnati
Program evaluated 	           DOJ’s COPS office has worked with police agencies, the Federal Communications Commission, and the
                              telecommunications industry to find ways to relieve the substantial demand on the current 911
                              emergency number. Many police chiefs and sheriffs have expressed concern that non-emergency calls
                              represent a large portion of the 911 overload problem. Four cities have implemented strategies to
                              decrease non-emergency 911 calls and have agreed to participate in the research. Those cities, each
                              implementing a different type of approach, were Baltimore, Md.; Dallas, Tex.; Buffalo, N.Y.; and Phoenix,
                              Ariz.
Evaluation components 	       A process and outcome evaluation was conducted between July of 1998 and June of 2000. The grant
                              amount was $399,919. For the outcome component, the grantee examined whether (1) the volume of
                              911 calls declined following the introduction of the non-emergency call system; (2) there was a
                              corresponding decline in radio dispatches, thus freeing officer time; and (3) this additional time was
                              directed to community-oriented policing strategies. The bulk of the design and analysis focused on
                              Baltimore, with a limited amount of analysis of outcomes in Dallas and no examination of outcomes in the
                              other two sites. The study compared rates of 911 calls before implementation of the new 311 system to
                              rates of 911 and 311 calls after implementation in both cities. In Baltimore, time series analysis was used to
                              analyze the call data; police officers and sergeants were surveyed; the flow of 311 and 911 calls to
                              Neighborhood Service Centers was examined; researchers accompanied police officers during randomly
                              selected shifts in 3 sectors of Baltimore for 2 weeks; and citizens who made 311 calls during a certain 1-
                              month time frame were surveyed.
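One common form of the pre-/post-311 time series analysis is a segmented regression with level- and slope-change terms at the launch of the 311 system. The sketch below (Python, using numpy and statsmodels) runs on synthetic monthly call counts; the study's actual model specification and Baltimore data are not reproduced here.

    import numpy as np
    import statsmodels.api as sm

    # Synthetic monthly 911 call counts: 24 months before and 12 months
    # after a hypothetical 311 launch.
    rng = np.random.default_rng(0)
    calls = np.concatenate([rng.normal(100_000, 3_000, 24),
                            rng.normal(88_000, 3_000, 12)])

    t = np.arange(len(calls))                 # underlying time trend
    step = (t >= 24).astype(float)            # level shift at the launch
    ramp = np.clip(t - 24, 0, None)           # slope change after the launch

    X = sm.add_constant(np.column_stack([t, step, ramp]))
    fit = sm.OLS(calls, X).fit()
    # Parameters: intercept, pre-existing trend, post-311 level shift,
    # and post-311 trend shift.
    print(fit.params)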
Assessment of evaluation 	 The crux of the outcome analysis relies on the study of pre- and post- 311 system comparisons, and the
                           time series analysis done in Baltimore is sound. The rigor of several other parts of this study is
                           questionable (e.g., poor response rates to surveys and short time frames for data from accompanying
                           police officers on randomly selected shifts). In addition, the choice of sites that NIJ required the grantee
                           to examine, other than Baltimore, did not allow for a test of the study’s objectives. Although NIJ
                           conducted pre-solicitation site visits to all 4 sites, at the time of the solicitation it did not yet know
                           whether outcome data would be available at all the sites. As it turned out, outcome data were not
                           available in Phoenix and Buffalo. Further, since the 311 system in Dallas was not implemented with the
                           goal of reducing or changing call volume, it does not appear to be a good case with which to test the
                           study’s objectives.




Evaluation                  Responding to the Problem Police Officer: An Evaluation of Early Warning Systems
Principal investigator      University of Nebraska – Omaha
Program evaluated 	         An Early Warning (EW) system is a data-based police management tool designed to identify officers
                            whose behavior is problematic, as indicated by high rates of citizen complaints, use of force incidents, or
                            other evidence of behavior problems, and to provide some form of intervention, such as counseling or
                            training to correct that performance. According to the current study’s national survey of local law
                            enforcement agencies (LEAs) serving populations of 50,000 or more, about one-quarter of LEAs surveyed
                            had an EW system, with another 12 percent indicating that one was planned. One-half of existing EW
                            systems have been created since 1994.
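For illustration only, a flagging rule of the kind an EW system might apply is sketched below (Python). The threshold and officer records are hypothetical; actual selection criteria varied across the departments studied.

    # Hypothetical 12-month performance records for three officers.
    officers = {
        "officer_101": {"complaints": 1, "force_reports": 0},
        "officer_102": {"complaints": 4, "force_reports": 2},
        "officer_103": {"complaints": 3, "force_reports": 0},
    }
    COMPLAINT_THRESHOLD = 3  # illustrative; departments set their own criteria

    # Flag officers whose complaint counts meet or exceed the threshold,
    # referring them for intervention such as counseling or training.
    flagged = [oid for oid, rec in officers.items()
               if rec["complaints"] >= COMPLAINT_THRESHOLD]
    print(flagged)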
Evaluation components 	     Begun in 1998, the study was completed in 1999 and included process and outcome components, as
                            well as a national survey. The total evaluation funding was $174,643. The outcome portion of the study
                            was composed of case studies of EW systems in 3 large urban police departments (Miami-Dade, Fla.;
                            Minneapolis, Minn.; and New Orleans, La.). Sites were selected judgmentally; each had functioning EW
                            systems in place for a period of 4 or more years and had agreed to participate in the study.
                            Both Miami-Dade and Minneapolis case studies examined official performance records (including citizen
                            complaints in both sites and use of force reports in Miami-Dade) for officers identified by the
                            department’s EW system, for 2 years prior to and after departmental intervention, compared to records
                            for officers not identified. The participant groups included officers hired between 1990 and 1992 and later
                            identified by the EW system (n=28 in Miami-Dade; n=29 in Minneapolis); the comparison groups included
                            officers hired during the same period and not identified (n=267 in Miami-Dade; n=78 in Minneapolis). In
                            New Orleans, official records were not organized in a way that permitted analysis of performance of
                            officers subject to EW and a comparison group. The New Orleans case study, therefore, examined
                            citizen complaint data for a group of officers identified by the EW system 2 years or more prior to the
                            study, and for whom full performance data were available for 2 years prior to and 2 years following
                            intervention (n=27).
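One way to make the pre-/post-intervention contrast against a comparison group explicit is a simple difference-in-differences calculation, sketched below (Python) with synthetic complaint means; the study's actual figures are not shown.

    # Synthetic mean citizen complaints per officer, 2 years pre and post.
    ew_pre, ew_post = 2.4, 1.1        # officers identified by the EW system
    cmp_pre, cmp_post = 0.8, 0.7      # officers not identified (comparison)

    ew_change = ew_post - ew_pre
    cmp_change = cmp_post - cmp_pre

    # Difference-in-differences: the EW group's change net of the change
    # the comparison group experienced over the same period.
    did = ew_change - cmp_change
    print(f"EW: {ew_change:+.2f}  comparison: {cmp_change:+.2f}  DiD: {did:+.2f}")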
Assessment of evaluation 	 The study had a number of limitations, many of them acknowledged by the grantee. First, it is not
                           possible to disentangle the effect of EW systems per se from the general climate of rising standards of
                           accountability in all 3 sites. Second, the use of nonequivalent comparison groups (officers identified for
                           intervention are likely to differ from those not identified), without statistical adjustment for differences
                           between groups, makes the outcome results difficult to interpret. Only in Minneapolis did the evaluators
                           explicitly compare changes in performance of the EW group with changes in performance of the
                           comparison group, again without presenting tests of statistical significance. Furthermore, the content of
                           the intervention was not specifically measured, raising questions about the nature of the intervention that
                           was actually delivered, and whether it was consistent over time in the 3 sites, or across officers subject to
                           the intervention. Moreover, it was not possible to determine which aspects of the intervention were most
                           effective overall (e.g., differences in EW selection criteria, intervention services for officers, and post-
                           intervention monitoring), since the intervention was reportedly effective in all 3 departments despite
                           differences in the nature of their EW systems. Also, no data were available to examine whether the EW
                           systems had a deterrent effect on desirable officer behavior (e.g., arrests or other officer-initiated
                           activity). Finally, generalizability of the findings in Miami-Dade and Minneapolis may also be limited, since
                           those case studies examined cohorts of officers recruited in the early 1990s, and it is not clear whether
                           officers with greater or fewer years of police experience in these departments would respond similarly to
                           EW intervention.




Evaluation                  Evaluation of the Juvenile Justice Mental Health Initiative with Randomized Design
Principal investigator      University of Missouri - St. Louis
Program evaluated 	         The Juvenile Justice Mental Health Initiative (JJMI) is a collaborative multi-agency demonstration project
                            funded under an Office of Juvenile Justice and Delinquency Prevention grant, and administered by the
                            St. Louis Mental Health Board, the St. Louis Family Court, and the Missouri Department of Health. The
                            initiative provides mental health services to families of youths referred to the juvenile justice system for
                            delinquency who have serious emotional disturbances (SED). The initiative involves parents and families
                            in juvenile justice interventions, providing coordinated services and sanctions for youths who otherwise
                            might shuttle between criminal justice and mental health agencies. Two new mental health programs
                            were established under JJMI. The first, the Child Conduct and Support Program, was designed for
                            families in which youths under the age of 14 do not have a history of serious, violent, or chronic
                            offending. The second, Multi-systemic Therapy (MST), was designed for families in which youths aged
                            14 and above have prior serious, violent, or chronic delinquency referrals.
Evaluation components 	     The evaluation began in October 2001 and is expected to be completed in September 2003. At the time
                            of our review, the evaluation was funded for $200,000. The study proposed to evaluate the two mental
                            health programs using a randomized experimental design. Youths referred to the Juvenile Court are first
                            screened for SED. Those who test positive or have prior diagnoses of SED (anxiety, depressed mood,
                            somatic complaints, suicidal ideation, thought disturbance, or traumatic experience) are eligible for the
                            JJMI programs. Eligible youth are randomly assigned to either one of the two treatment programs
                            (depending on age) or to a control group. The evaluation includes a comparison of police contact data,
                            court data, self-reported delinquency, and standardized measures of psychological and parental
                            functioning. Potentially important demographic and social context variables, including measures of school
                            involvement and performance, will be obtained from court records.
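The proposed assignment logic can be sketched as follows (Python). The youth records are hypothetical, and the coin-flip randomization stands in for whatever protocol the study actually used.

    import random

    # Hypothetical SED-eligible youths referred to the Juvenile Court.
    eligible = [{"id": 1, "age": 12}, {"id": 2, "age": 15},
                {"id": 3, "age": 13}, {"id": 4, "age": 16}]

    assignments = {}
    for youth in eligible:
        if random.random() < 0.5:
            # Treatment arm depends on age, per the two JJMI programs.
            assignments[youth["id"]] = ("Child Conduct and Support"
                                        if youth["age"] < 14
                                        else "Multi-systemic Therapy")
        else:
            assignments[youth["id"]] = "control"
    print(assignments)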
Assessment of evaluation 	 This is an ongoing, well designed study. However, as implementation has proceeded, several problems
                           that may affect the utility of the results have emerged. First, the researchers proposed to sample a total
                           of 200 youths, with random assignment expected to result in approximately 100 juveniles each in the
                           treatment and comparison groups. The treatment group turned out to be much smaller than anticipated, however,
                           because the randomization protocol and, subsequently, the MST program itself, were discontinued by the
                           St. Louis Mental Health Board. At the time of termination, only 45 youths had been randomly assigned to
                           the treatment group. The small number of subjects limits the extent of the analyses that can be
                           conducted on this population.
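To see how the truncated treatment group constrains the analysis, a quick power calculation (Python, statsmodels) shows the smallest standardized effect detectable at 80 percent power with n=45 versus the planned n=100 per group; the balanced-groups and alpha assumptions below are illustrative, not the study's.

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for n in (45, 100):
        # Solve for the minimum detectable effect size (Cohen's d),
        # assuming equal group sizes and a two-sided test at alpha = .05.
        d = analysis.solve_power(effect_size=None, nobs1=n,
                                 alpha=0.05, power=0.80, ratio=1.0)
        print(f"n per group = {n}: minimum detectable d = {d:.2f}")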
                            The Child Conduct and Support Program designed to address the mental health needs of youth under
                            the age of 14 without a history of serious offending was never implemented by the providers contracted
                            to develop the program. Eligible youth, of all ages, were instead assigned to the MST program. Thus, the
                            evaluation will not be able to compare the relative effectiveness of programs specifically designed for
                            younger and older juvenile offenders with SED.




Evaluations with Design Limitations

Evaluation                  National Evaluation of the Rural Domestic Violence and Child Victimization Enforcement Grant Program
Principal investigator      COSMOS Corporation
Program evaluated 	         The National Rural Domestic Violence and Child Victimization Enforcement Grant program, begun in
                            fiscal year 1996, has funded 92 grants through September 2001 to promote the early identification,
                            intervention, and prevention of woman battering and child victimization; increase victims’ safety and
                            access to services; enhance the investigation and prosecution of crimes of domestic violence and child
                            abuse; and develop innovative, comprehensive strategies for fostering community awareness and
                            prevention of domestic abuse. The program seeks to maximize rural resources and capacity by
                            encouraging greater collaboration between Indian tribal governments, rural local governments, and public
                            and private rural service organizations.
Evaluation components 	     The evaluation began in October 1998 and was completed in July 2002. This evaluation was funded at
                            $719,949, and included both process and outcome components. Initially 10 grantees (comprising 11
                            percent of the total number of program grantees) were selected to participate in the outcome evaluation;
                            1 was unable to obtain continuation funding and was dropped from the outcome portion of the study. Two
                            criteria were used in the selection of grant participants: the “feasibility” of grantees visited in the process
                            phase of the evaluation (n=16) to conduct an outcome evaluation; and recommendations from OVW,
                            which were based on knowledge of grantee program activities and an interest in representing the range
                            of organizational structures, activities, and targeted groups served by the grantees. Logic models were
                            developed, as part of the case study approach, to show the logical or plausible links between a grantee’s
                            activities and desired outcomes. The specified outcome data were collected from multiple sources, using
                            a variety of methodologies, during 2-3 day site visits (e.g., multi-year criminal justice, medical, and shelter
                            statistics were collected from archival records where available; community stakeholders were
                            interviewed; and grantee and victim service agency staff participated in focus groups).
Assessment of evaluation 	 This evaluation has several limitations. First, the choice of the 10 outcome sites was skewed toward
                           sites technically better prepared for evaluation and was not representative of all Rural Domestic Violence
                           program grantees, particular project types, or delivery styles. Second, the lack of comparison groups
                           makes it difficult to exclude the effect of external factors on perceived changes in outcomes such as
                           victim safety and improved access to services. Furthermore, several so-called short-term outcome variables were in fact
                           process variables (e.g., number of clients served, number of services provided, number of workshops
                           conducted, and service capacity of community agencies). Moreover, it is not clear how interview and
                           focus group participants were selected. Finally, pre- and post- survey data were not collected at multiple
                           points in time to assess change, except at 1 site, where pre- and post-tests were used to assess
                           increased knowledge of domestic violence among site staff as a result of receiving training.




Evaluation                   National Evaluation of the Domestic Violence Victims’ Civil Legal Assistance Program
Principal investigator       Institute for Law and Justice
Program evaluated 	          The Civil Legal Assistance (CLA) program is one of seven OJP grant programs (administered through OVW) dedicated to
                             enhancing victim safety and ensuring offender accountability. The CLA program awards grants to
                             nonprofit, nongovernmental organizations that provide legal services to victims of domestic violence or
                             that work with victims of domestic violence who have civil legal needs. The CLA grant program was
                             created by Congress in 1998. In fiscal year 1998, 54 programs were funded, with an additional 94 new
                             grantees in fiscal year 1999. Approximately 85-100 new and continuation grants were anticipated in fiscal
                             year 2000.
Evaluation components 	      The study began in November 2000 and was expected to be completed in October 2003. The proposed
                             evaluation consisted of process and outcome components and the total evaluation funding at the time of
                             our review was $800,154. The objective of the outcome evaluation was to determine the effectiveness of
                             the programs in meeting the needs of the women served. The researchers proposed to study 8 sites with
                             CLA programs. At each site at least 75 cases will be tracked to see if there is an increase in pro se (self)
                             representation in domestic violence protective order cases, and a total of 240 victims receiving services
                             will be surveyed (about 30 at each site). Focus groups of service providers will be used to identify
                             potential program impacts on the justice system and wider community. Outcomes to be assessed include
                             change in pro se representation in domestic violence protective order cases, satisfaction with services,
                             and legal outcomes resulting from civil assistance.
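A change in pro se representation of the kind the study will track could be tested, for example, with a two-proportion z-test. The sketch below (Python, statsmodels) uses invented counts; the 75 cases per period only echo the study's per-site minimum.

    from statsmodels.stats.proportion import proportions_ztest

    # Invented counts: pro se protective-order cases out of tracked cases,
    # before and after CLA services became available at one site.
    pro_se = [18, 34]
    tracked = [75, 75]

    stat, pvalue = proportions_ztest(count=pro_se, nobs=tracked)
    print(f"z = {stat:.2f}, p = {pvalue:.3f}")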
Assessment of evaluation 	 The evaluation has several limitations. First, NIJ and the grantee agreed in 2002 not to collect data
                           from a set of comparison sites, out of concern that investment in that approach would limit the amount
                           of information that could be derived from the process component of the evaluation and from within-site
                           and cross-site analyses of the selected outcome sites. Without comparison groups, however, the study
                           will have limited ability to isolate and minimize the potential effects of external factors that could
                           influence its results. At the time of our review, it was not yet clear whether sufficient
                           data will be available from the court systems at each outcome site in order to examine changes in pro se
                           representation. In addition, since victims would be selected for the surveys partially on the basis of
                           willingness to be interviewed, it is not clear how representative the survey respondents at each site will
                           be and how the researchers will handle response bias. It also appears that the victim interviews will rely
                           to a great extent on measures that will primarily consist of subjective, retrospective reports.




Evaluation                  Multi-Site Demonstration of Collaborations to Address Domestic Violence and Child Maltreatment
Principal investigator      Caliber Associates
Program evaluated 	         The Department of Health and Human Services and DOJ’s Office of Justice Programs are jointly funding
                            6 demonstration sites for up to 3 years to improve how 3 systems (dependency courts, child protective
                            services, and domestic violence service providers) work with their broader communities to address
                            families with co-occurring domestic violence (DV) and child maltreatment (CM). Funded sites must agree
                            to implement key recommendations of the National Council of Juvenile and Family Courts Judges’
                            publication, “Effective Interventions in Domestic Violence and Child Maltreatment: Guidelines for Policy
                            and Practice” (known as the “Greenbook”). At a minimum, the sites need to implement changes in policies and
                            procedures regarding screening and assessment; confidentiality and information sharing; safety; service
                            provision; advocacy; cross-training; and case collaboration. The goals of the demonstration are to
                            generate more coordinated, comprehensive, and consistent responses to families faced with DV and CM,
                            resulting in increased safety and well-being for women and their children.
Evaluation components 	     The evaluation began in September 2000, and is expected to be completed around September 2004. At
                            the time of our review, this evaluation was funded at $2,498,638, for both process and outcome
                            components. The original evaluation proposal focused on various process elements as well as the effects
                            of the intervention on perpetrator recidivism and the safety of women and children. In the second year,
                            the evaluator realized that no site considered itself to be in the implementation phase and many of the
                            original outcome indicators for children and families were not appropriate given the initiative time frame.
                            The revised design in the funded third year proposal is therefore a systems-level evaluation. The analytic
                            focus is now on how the 3 systems identify, process, and manage families with co-occurrence of DV and
                            CM.
                            A random sample of case records from before and after the introduction of the intervention will be used to
                            document trends in identification of co-occurring cases of DV and CM over the course of the intervention.
                            Stakeholder interviews conducted during site visits in fall 2001 and later during implementation, and
                            analysis of agency documents, will be used to measure changes in policies and procedures. “Network
                            analysis” of responses on the stakeholder interviews will be performed to measure changes in how key
                            stakeholders work with others within and across systems. Supervisors and workers will also be asked,
                            early in the implementation period and at the end of the initiative, to respond to vignettes describing
                            hypothetical situations involving co-occurrence of DV and CM to see how they might respond to clients.
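The “network analysis” of stakeholder responses might proceed along the lines sketched below (Python, networkx): build a graph of reported working relationships at each interview wave and compare simple structural measures such as density. The agencies and ties shown are hypothetical.

    import networkx as nx

    # Hypothetical collaboration ties reported at two interview waves.
    wave1 = nx.Graph([("court", "cps"), ("cps", "dv_provider")])
    wave2 = nx.Graph([("court", "cps"), ("cps", "dv_provider"),
                      ("court", "dv_provider"),
                      ("dv_provider", "community_org")])

    for label, g in (("baseline", wave1), ("follow-up", wave2)):
        # Density: the share of possible ties actually reported.
        print(label, "ties =", g.number_of_edges(),
              "density =", round(nx.density(g), 2))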
Assessment of evaluation 	 This evaluation has several limitations. First, the study objectives changed substantially from year 1 to
                           year 3. The study is no longer examining outcomes for individuals, precluding conclusions about whether
                           the implementation improved the lives of victims of domestic violence or their children. Second, it is not
                           clear whether the evaluator will locate appropriate comparison data at this late stage, and without a
                           comparison group, the study will not be able to determine (a) whether collaboration between systems
                           improved (or weakened) because of the intervention or some extraneous factors and (b) whether
                           collaboration resulted in increased capacity in the 3 systems to identify the co-occurrence of DV and CM,
                           or whether these kinds of cases increased for reasons other than collaboration (e.g., perhaps
                           identification of these cases is improving all over the country). Questions remain about the extent of data
                           available for examining co-occurrence of DV and CM at the 6 sites.




Evaluation                   Corrections and Law Enforcement Family Support (CLEFS) Law Enforcement Field Test
Program evaluated 	          Since 1996 NIJ has funded, as part of the CLEFS program, 32 grants totaling over $2.8 million to law
                             enforcement agencies, correctional agencies, and organizations representing officers (unions and
                             membership associations) to support the development of research, demonstration, and evaluation
                             projects on stress intervention methods. The stress intervention methods developed and studied have
                             included stress debriefing and management techniques, peer support services, referral networks, police
                             chaplaincy services, stress management training methods, spouse academies, and stress education
                             programs. While NIJ purports to have developed state-of-practice stress reduction methods through
                             these efforts, it acknowledges that very little outcome data have been generated.
Evaluation components 	      The evaluation began in June 2000 and is expected to be completed in June 2004. At the time of our
                             review, the grant amount was $649,990. The study proposes to develop and field test a model to allow
                             for the systematic evaluation of selected program components. The grantee worked with NIJ to identify
                             the test sites and services to be evaluated, based on grant application reviews, telephone interviews, and
                             site visits. Three police departments in Duluth, Minn.; North Miami Beach, Fla.; and Knoxville, Tenn. were
                             selected. Baseline stress correlate data were collected during visits to the 3 sites between January 2002
                             and March 2002, and baseline officer and spouse/partner surveys were conducted during the same
                             visits. Outcome data were to be collected at baseline (prior to actual program implementation), midway
                             through the implementation, and toward the end of the evaluation. While the original proposal did not
                             specify exactly what stress correlate or outcome data were to be collected, the grantee was considering
                             looking at rates of absenteeism and tardiness, citizen complaints, rule and regulation violations,
                             disciplinary actions, and premature retirements and disability pensions, as stress correlates. These were
                             to be obtained from official agency records. Surveys included questions about program impacts on
                             physical health, emotional health, job performance, job satisfaction, job-related stress, and family related
                             stress. The evaluation also included baseline health screenings. It appears the evaluation plan has been
                             modified to add supervisor surveys (there were none at baseline), and to incorporate group data
                             collection efforts with officers, spouses, supervisors, and administrators.
Assessment of evaluation 	 The study has several limitations. First, the 3 study sites were chosen, from 4 sites that submitted
                           applications, on the basis of the merits of their proposals to implement a stress reduction or wellness
                           program for officers. There was no attempt to make the chosen sites representative of other sites with stress
                           reduction programs and police departments more generally. Second, the study will not make use of
                           comparison groups consisting of similar agencies that did not implement stress reduction programs. It is
                           unclear how effects of the interventions in these 3 sites over time will be disentangled from the effects of
                           other factors that might occur concurrently. Third, the grantee will not collect individually identified data,
                           and thus will only be able to analyze and compare aggregated data across time, limiting the extent of
                           analysis of program effects that can be accomplished. Fourth, response rates to the first wave of officer
                           surveys were quite low in 2 of the 3 sites (16 percent and 27 percent).




Appendix III: Comments from the
Department of Justice

[The Department of Justice’s written comments are reproduced as scanned pages in the original report.]
Appendix IV: GAO Contacts and Staff
Acknowledgments

GAO Contacts             Laurie E. Ekstrand (202) 512-8777
                         Evi L. Rezmovic (202) 512-2580

Staff Acknowledgments    In addition to the above, Tom Jessor, Anthony Hill, Stacy Reinstein, David Alexander, Michele Fejfar,
                         Douglas Sloane, Shana Wallace, Judy Pagano, Kenneth Bombara, Scott Farrow, Ann H. Finley,
                         Katherine Davis, and Leo Barbour made key contributions to this report.




GAO’s Mission

The General Accounting Office, the audit, evaluation and investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO’s commitment to good government is reflected in its core values of accountability, integrity, and reliability.


Obtaining Copies of GAO Reports and Testimony

The fastest and easiest way to obtain copies of GAO documents at no cost is through the Internet. GAO’s Web site (www.gao.gov) contains abstracts and full-text files of current reports and testimony and an expanding archive of older products. The Web site features a search engine to help you locate documents using key words and phrases. You can print these documents in their entirety, including charts and other graphics.

Each day, GAO issues a list of newly released reports, testimony, and correspondence. GAO posts this list, known as “Today’s Reports,” on its Web site daily. The list contains links to the full-text document files. To have GAO e-mail this list to you every afternoon, go to www.gao.gov and select “Subscribe to e-mail alerts” under the “Order GAO Products” heading.


Order by Mail or Phone

The first copy of each printed report is free. Additional copies are $2 each. A check or money order should be made out to the Superintendent of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or more copies mailed to a single address are discounted 25 percent. Orders should be sent to:

U.S. General Accounting Office
441 G Street NW, Room LM
Washington, D.C. 20548

To order by phone:  Voice: (202) 512-6000;  TDD: (202) 512-2537;  Fax: (202) 512-6061


To Report Fraud, Waste, and Abuse in Federal Programs

Contact:
Web site: www.gao.gov/fraudnet/fraudnet.htm
E-mail: fraudnet@gao.gov
Automated answering system: (800) 424-5454 or (202) 512-7470


Public Affairs

Jeff Nelligan, Managing Director, NelliganJ@gao.gov, (202) 512-4800
U.S. General Accounting Office, 441 G Street NW, Room 7149
Washington, D.C. 20548