
Federal Judgeships: The General Accuracy of the Case-Related Workload Measures Used to Assess the Need for Additional District Court and Courts of Appeals Judgeships

Published by the Government Accountability Office on 2003-05-30.


United States General Accounting Office
Washington, DC 20548



          May 30, 2003


          The Honorable Lamar Smith
          Chairman
          Subcommittee on Courts, the Internet,
            and Intellectual Property
          Committee on the Judiciary
          House of Representatives

          Subject: Federal Judgeships: The General Accuracy of the Case-Related Workload
          Measures Used to Assess the Need for Additional District Court and Courts of
          Appeals Judgeships

          Dear Mr. Chairman:

          Biennially, the Judicial Conference, the federal judiciary’s principal policymaking
          body, assesses the judiciary’s needs for additional judgeships.1 If the Conference
          determines that additional judgeships are needed, it transmits a request to Congress
          identifying the number, type (courts of appeals, district, or bankruptcy), and location
          of the judgeships it is requesting. In 2003, the Judicial Conference sent to Congress
          requests for 93 new judgeships--11 for the courts of appeals, 46 for the district courts,
          and 36 for the bankruptcy courts.2

          In assessing the need for additional judgeships, the Judicial Conference considers a
          variety of information, including responses to its biennial survey of individual courts,
          temporary increases or decreases in case filings, and other factors specific to an
          individual court. However, the Judicial Conference’s analysis begins with the
          quantitative case-related workload measures it has adopted for the district courts and
          courts of appeals—weighted case filings and adjusted case filings, respectively.
          These two measures recognize, to different degrees, that the time demands on judges
          are largely a function of both the number and complexity of the cases on their
          dockets. Some types of cases may demand relatively little time and others may

          1 The Chief Justice of the United States presides over the Conference, which consists of the chief
          judges of the 13 courts of appeals, a district judge from each of the 12 geographic circuits, and the
          chief judge of the Court of International Trade. The Conference meets twice a year.
          2 This report covers the methodology used to develop the case-related workload measures for district
          court and courts of appeals judges. We recently testified on the methodology used to develop the
          case-related workload measure for bankruptcy judges. (See Federal Bankruptcy Judges: Weighted
          Case Filings as a Measure of Judges’ Case-Related Workload, GAO-03-789T (Washington, D.C.: May
          22, 2003)).


                                       GAO-03-788R Accuracy of Judges Case-Related Workload Measures
require many hours of work. Generally, each case filed in a district court is assigned a
weight representing the average amount of judge time the case is expected to require.
A case with a weight of 3.0, for example, would be expected to take twice as much
time as a case with a weight of 1.5. In the courts of appeals, pro se case filings—
those in which one or both parties are not represented by an attorney—are weighted
at 0.33 and all other case filings at 1.0.

Using these measures, individual courts whose past case-related workload meets the
threshold established by the Judicial Conference may be considered for additional
judgeships. These thresholds are 430 weighted case filings per authorized judgeship
for district courts and 500 adjusted case filings per three-judge panel of authorized
judgeships for the courts of appeals (courts of appeals judges generally hear cases in
rotating panels of three judges each). Authorized judgeships are the total number of
judgeships authorized by statute for each district court and court of appeals.

The Judicial Conference relies on these quantitative workload measures to be
reasonably accurate measures of judges’ case-related workload. Whether these
measures are reasonably accurate rests in turn on the soundness of the methodology
used to develop them. As agreed with your office, our objectives were to (1)
determine whether the methods the Judicial Conference uses to quantitatively
measure the case-related workload of district court and court of appeals judges
result in a reasonably accurate measure of judges’ case-related workload, (2) assess
the reasonableness of any proposed methodologies to update the workload measures,
and (3) obtain information from the Administrative Office of the U.S. Courts
(AOUSC) on the steps the Judiciary takes to ensure that the case filing data required
for these workload measures are accurate. The information for the last objective is
presented in enclosure I. The scope of our work specifically excluded any analysis of
how the Judicial Conference used the case-related workload measures to develop its
current judgeship request.

Results in Brief

The district court weighted case filings, as approved in 1993, appear to be reasonably
accurate and are based on a reasonable methodology. However, they are about 10
years old, and we have concerns about the research design approved to update them.

Overall, the weighted case filings, as approved in 1993, appear to be a reasonably
accurate measure of the average time demands that a specific number and mix of
cases filed in a district court could be expected to place on the district judges in that
court. The methodology used to develop the weights used a valid sampling
procedure, developed weights based on actual case-related time recorded by judges
from case filing to disposition, and included a measure (standard errors) of the
statistical confidence in the final weight for each weighted case type. Without such a
measure, it is not possible to assess the accuracy of the final case weights. However,
the case weights are about 10 years old, and the data on which the weights were
based are as much as 15 years old. Changes since 1993, such as changes in the
characteristics of cases filed in federal district courts and in case management practices, may
have affected whether the 1993 weights continue to be a reasonably accurate
measure of the average time burden on district court judges resulting from a specific
volume and mix of cases. Some of these changes may have increased time demands;
others may have reduced time demands. To the extent that the current case weights
understate or overstate the total case-related time demands on district judges, the
weights could potentially result in the Judicial Conference understating or
overstating the need for new district court judgeships.

The Judicial Conference’s Subcommittee on Judicial Statistics has approved a
research design for updating the current case weights, and we have some concerns
about that design. The design would include limited data on the time judges actually
spend on specific types of cases. Much of the time data used would be based on
consensus estimates from groups of experienced judges. Such data cannot be used
to develop an objective, statistical measure of the accuracy of the final case weights.
Without such a measure, it is not possible to determine whether the case weights are
in fact a reasonably accurate measure of case-related judge workload. In assessing
the need for judgeships in specific courts, the Judicial Conference relies on the case
weights to be a reasonably accurate measure of judges’ case-related workload.

Unlike the district court case weights, the adjusted filings workload measure for
appellate judges is not based on any empirical data regarding the time that different
types of cases required of courts of appeals judges. The adjusted filings workload
measure basically assumes that all cases have an equal effect on judges’ workload
with the exception of pro se cases—those in which one or both parties are not
represented by a lawyer—which are weighted at 0.33, or one-third as much as all
other cases. In the documentation we reviewed, we found no empirical data to
support that assumption. The current court of appeals case-related workload
measure, adopted in 1996, reflects an effort to improve the previous measure, which
may have tended to overstate judgeship needs. At the time the current measure was
developed and approved, using the new benchmark of 500 adjusted case filings
resulted in judgeship numbers that closely approximated the judgeship needs of the
majority of the courts of appeals, as the judges of each court of appeals perceived
them. However, on the basis of the documentation we reviewed, there is no
empirical basis on which to assess the accuracy of adjusted filings as a measure of
case-related workload for courts of appeals judges.

Weighted Case Filings: District Judge Case-Related Workload Measure Is
Reasonably Accurate, but 10 Years Old, and the Plan to Update It Raises
Some Concerns

The purpose of the district court case weights was to create a measure of the average
judge time that a specific number and mix of cases filed in a district court would
require. Importantly, the weights were designed to be descriptive not prescriptive—
that is, the weights were designed to develop a measure of the national average
amount of time that judges actually spent on specific types of cases, not to develop a
measure of how much time judges should spend on various types of cases. Finally,
the weights were designed to measure only case-related judge workload. Judges have
noncase-related duties and responsibilities, such as administrative tasks, that are not
reflected in the case weights.




Case Weights Measure Average Judicial Time Demands

With a few exceptions, such as cases that are remanded to a district court from the
courts of appeals, each civil and criminal case filed in a district court is assigned a
case weight that varies from 0.031 (for cases involving defaulted student loans or
veterans benefit overpayments) to 5.99 (for death penalty habeas corpus cases) based
on the subject matter of the case.3 The weight of the overall average case is 1.0. All
other case weights were established relative to this national average case.4 Thus, a
case with a weight of 0.5 would be expected to require on average about half as much
judicial time as the national average case. Conversely, a case with a weight of 2.0
would be expected to take twice as much time as the national average case. Case
weights for criminal felony cases are applied on a per defendant basis.5 For example,
the case weight for heroin/cocaine distribution is 2.27. A heroin/cocaine distribution
case with two defendants would be weighted at 4.54—two times the assigned weight
of 2.27. The actual amount of time a judge may spend on individual cases of any
specific type may be more or less than the national average for that type of case.

The total annual weighted filings for a district are determined by summing the case
weights associated with all the cases filed in the district during the year. Weighted
case filings per authorized judgeship is the total annual weighted filings divided by
the total number of authorized judgeships. For example, if a district had total
weighted filings of 4,600 and 10 authorized judgeships, its weighted filings per
authorized judgeship would be 460. The Judicial Conference uses weighted filings of
430 or more per authorized judgeship as an indication that a district may need
additional judgeships. Thus, a district with 460 weighted filings per authorized
judgeship could be considered for an additional judgeship.
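
The arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the judiciary's own software: the function names are hypothetical, and the only figures used are those given in the text (the 2.27 heroin/cocaine distribution weight, the district with 4,600 weighted filings and 10 authorized judgeships, and the threshold of 430).

```python
# Illustrative sketch only; function names and inputs are hypothetical.
THRESHOLD = 430  # weighted filings per authorized judgeship, from the text


def weighted_filings(filings):
    """Sum case weights over (weight, count) pairs.

    For felony cases, count is the number of defendants, because
    felony weights are applied per defendant.
    """
    return sum(weight * count for weight, count in filings)


def filings_per_judgeship(total_weighted, authorized_judgeships):
    """Divide by authorized judgeships, a number that includes vacancies."""
    return total_weighted / authorized_judgeships


def may_be_considered(total_weighted, authorized_judgeships):
    """True when the district meets the threshold of 430."""
    return filings_per_judgeship(total_weighted, authorized_judgeships) >= THRESHOLD


# Two-defendant heroin/cocaine distribution case at a weight of 2.27.
print(round(weighted_filings([(2.27, 2)]), 2))   # 4.54
# District with 4,600 weighted filings and 10 authorized judgeships.
print(filings_per_judgeship(4600, 10))           # 460.0
print(may_be_considered(4600, 10))               # True
```

Meeting the threshold only indicates that a district may be considered; as noted earlier, the Judicial Conference also weighs other information.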

In assessing judgeship needs, the weighted case filings are calculated using
authorized judgeships (a number which includes any vacancies). This is a measure of
the average workload per judge in a district court if all the court’s authorized
judgeships were filled. Calculating the weighted case filings per active judge—that
is, on the basis of the number of authorized judgeships filled—would show the
burden of existing vacancies on active judges, but not necessarily the need for more
judgeship positions.




3 Weights are assigned to each civil case counted as an original filing, removal from state courts, or
interdistrict transfer (transfers from one district to another). Weights are also assigned to each felony
defendant counted as an original filing, reopened filing, or interdistrict transfer. Generally, felonies are
those crimes that carry a term of imprisonment of more than 1 year. Weights are not assigned to civil
cases remanded to the district courts from the courts of appeals, reopened cases, or multidistrict
litigation transfers—cases transferred to a single district from a number of districts for disposition,
such as asbestos or breast implant litigation.
4 Some types of civil cases were weighted differently if they involved the United States as a party or
were removed from state court to federal court.
5 The weights do not include nonfelony criminal cases, which are generally the responsibility of
magistrate, not district, judges.


Case Weights Calculated in 1993 Using Time Data Recorded by Judges

The Judicial Conference approved the use of the current district court case weights in
1993. The weights are based on a “case-tracking time study,” conducted between
1987 and 1993, in which judges recorded the amount of time they spent on each of
their cases included in the time study.6 The study included about 8,100 civil cases and
about 4,200 criminal cases that were generally “tracked” from filing to disposition.7
All judges who worked on each case were supposed to record the time they worked
on the case.8

Data collection for the time study began in November 1987. Districts were brought
into the study over a 2-year period, with the last district entering the study in January
1990. When a district was brought into the study, a 2-week period was designated for
sampling, during which all cases filed were included in the time study sample.

At the conclusion of the study, sample cases were grouped into civil and criminal
cases, with individual subclassifications (case types) for each, such as "Contract:
Insurance" and "Bank Robbery." Each sample case had a value associated with it,
which was the total number of minutes reported by the district judge(s) who worked
on it. The number of sample cases in the subclassifications ranged from 18 to 1,563.
Within each subclassification, a simple average and a standard error were computed.
The averages and standard errors were converted into relative values as the final step
in creating the case weights—that is, all the weights were calculated relative to the
time required for the average case in the study.

Methodology Used to Develop Case Weights Was Reasonable

Overall, the weighted case filings, as approved in 1993, are a reasonably accurate
method of measuring the average judge time that a specific number and mix of cases
filed in a district court could require. The methodology used to develop the weights
is reasonable. It used a valid sampling procedure, developed weights based on actual
case-related time recorded by judges from case filing to disposition, and included a
measure (standard errors) of the statistical confidence in the final weight for each
weighted case type.

6 The time study for bankruptcy courts was a “diary study” in which judges recorded the time spent on
case-related and noncase-related work during a 10-week period. Although each method has different
strengths and limitations, each method can produce useful, reasonably accurate results. Enclosure II
includes a comparison of these two methodologies.
7 Not all cases were completed by the end of the study; some were still pending.
8 This included district judges, senior judges, magistrate judges, and visiting judges. District judges—
nonsenior and senior—exercise the full judicial authority vested in the district courts. Nonsenior
district judges are those who hold a designated judgeship position and generally carry a full caseload.
Senior district judges are judges who have retired from regular, full-time active service but remain on
the bench and perform such judicial duties as they are willing and able. Magistrate judges, appointed
for a fixed term of years, exercise the judicial duties permissible by statute and the Constitution that
the district courts delegate to them. Visiting judges are those visiting from their “home court” to assist
in addressing the workload of the court they are visiting. Visiting judges may or may not be senior
judges. Time reported by magistrate judges was not included in the final computations of the case
weights.


The sampling method was appropriately designed to ensure that all district judges
and all case types could potentially be included in the sample. The staggered entry
of districts into the study ensured that case samples were selected
throughout the year, reducing or eliminating bias due to seasonal variation in case
filings. Every district court judge could potentially have been a participant in the
study (depending on when the 2-week window was designated at a given district and
case assignments during that period).

The method of recording the time spent on each case was designed to capture all
judge time spent on a sample case. Although it was not possible to determine if all
reportable judge time was in fact recorded and reported, validity checks on the
reported time were made where possible. For example, judge-reported courtroom
time in each sample case was compared with the time reported for the same case in
the judiciary’s database on courtroom proceedings.

The empirical data on hours expended on each case in the sample were used to
develop the case weights. The case weights for specific types of cases were basically
determined by dividing the total amount of time judges reported for that type of case
by the number of such cases in the study. For example, if judges reported a total of
2,000 hours for 200 cases of a specific type in the study, this would translate into 10
hours per case. Sampling variability in the estimates based on the time study data
was quantified and provided with the weights. The standard error that is associated
with each weight provides an indicator of variability due to the weight being
produced via a sample, rather than data from the universe of cases during the study
period. The standard errors can be used to display the statistical reliability of the
weighted case filings estimate for each district. Without some measure of statistical
reliability, it is not possible to objectively assess how accurate the case weights are.

The case weights are relative weights. That is, each case weight was calculated
relative to the average case as determined in the study, which was assigned a value of
1.0. For example, a case type with a weight of 2.0 would be expected to require twice
as much judge time as the “average” case. Relative weights were determined by
dividing the absolute weight of each type of case by the weight or value of the
average case. The Federal Judicial Center (FJC) converted absolute weights to
relative weights by dividing the absolute weight values by 2.132. This value was
chosen after FJC conducted research to determine how to produce a new set of
relative weights that it considered to be comparable to the previous set of relative
weights. As described by FJC officials, this approach was reasonable.
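
The steps just described (a simple average per case type, a standard error, and conversion to relative values) can be sketched as follows. This is a minimal illustration under stated assumptions, not FJC's actual procedure: the function names and sample data are hypothetical, the standard error formula shown (sample standard deviation divided by the square root of the sample size) is an assumption about how the standard errors were computed, and the text does not state the unit of the 2.132 divisor, so the sample values carry no particular unit.

```python
import math
import statistics

# Divisor FJC used to convert absolute weights to relative weights (from the text).
FJC_DIVISOR = 2.132


def absolute_weight(times):
    """Simple average of judge-reported time for one case type."""
    return statistics.mean(times)


def standard_error(times):
    """Standard error of the mean: sample std dev / sqrt(n) (assumed formula)."""
    return statistics.stdev(times) / math.sqrt(len(times))


def to_relative(value, divisor=FJC_DIVISOR):
    """Express an absolute average or standard error relative to the average case."""
    return value / divisor


# Hypothetical sample: reported time for four cases of one type.
sample = [8, 12, 9, 11]
print(absolute_weight(sample))                         # 10
print(round(to_relative(absolute_weight(sample)), 2))  # relative weight
print(round(to_relative(standard_error(sample)), 2))   # relative standard error
```

Dividing both the average and its standard error by the same constant keeps the two on the same relative scale, which is what allows the standard error to indicate the reliability of each final weight.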

For the purposes of applying the national weights to individual districts, the
methodology assumed two things: (1) that the district’s judges were typical of district
judges as a whole and (2) that the district’s cases of any given type were typical of
that case type as a whole. This may or may not have been true, but these are
reasonable assumptions given the purpose of the study—to develop weights based on
national averages, not to develop weights for individual districts or judges.

Research Design for Updating the District Court Case Weights Raises Concerns

The case weights are almost 10 years old, and the time data on which they were
based are as much as 15 years old. Changes since the case weights were finalized in
1993, such as changes in the characteristics of cases filed in federal district courts
and in case management practices, may affect how accurately the weights continue
to reflect the time burden on district court judges today. For example, since 1993,
new civil causes of action (such as telemarketing issues) and criminal offenses (such as new
terrorism offenses) have needed to be accommodated within the existing case-weight
structure. According to FJC officials, where the new cause of action or criminal
offense is similar to an existing case-weight type, the weight for the closest case type
is assigned. Where the new cause of action or criminal offense is clearly different
from any existing case weight category, the weight assigned is that for either “all
other civil” for civil cases or “all other criminal” for criminal cases.

The Subcommittee on Judicial Statistics of the Judicial Conference’s Judicial
Resources Committee has approved the research design for revising the current case
weights, with a goal of having new weights submitted to the Resources Committee
for review in the summer of 2004. The research would be led by FJC, which developed
the research design. Although the methodology for updating the case weights
appears to offer the benefits of reduced judicial burden (no time study data
collection), potential cost savings, and reduced calendar time to develop the new
weights, we have some concerns about the basic research design.

We have two principal concerns: the challenge of obtaining reliable, comparable data
from two different automated data systems for the analysis and the limited collection
of actual data on the time judges spent on cases. Essentially, the design for the new
case weights relies on three sources of data for specific types of cases: (1) data from
automated databases identifying the docketed events associated with cases; (2) data
from automated sources on the time associated with courtroom events for cases; and
(3) consensus estimates from structured, FJC-guided discussions among experienced
judges on the judge-time required for noncourtroom events in the cases, such as
reading briefs or writing opinions. The design assumes that judicial time spent on a
given case can be accurately estimated by viewing the case as a set of individual tasks
or events in the case. Information about event frequencies and, where available, time
spent on the events would be extracted from administrative databases and reports,
and then used to develop estimates of the judge-time spent on different types of
cases. For event data, the research design proposes using new technology (the Case
Management/Electronic Case Filing system) that is currently being introduced into
the court system for recording case management information. However, not all
courts have implemented the new system, and data from the existing and new
systems will have to be integrated in the study. Successfully integrating the data from
these two databases will be a challenge. FJC recognizes this and has developed a
strategy for addressing the issues, which includes forming a technical advisory group
from FJC, AOUSC, and individual courts to develop a method of reliably extracting
and integrating data from the two case management systems for analysis.

Second, the design for developing the new weights does not require judges to record
time spent on individual cases. A significant limitation is that the time data
available from existing databases and reports cover only the time
associated with courtroom events and proceedings, while a majority of district
judges’ time is spent on case-related work outside the courtroom. The time required
for noncourtroom events, such as reviewing briefs, will be based on the consensus of
groups of experienced judges. Groups of 8 to 13 district judges in each of the 12
circuits (about 100 in all) will meet in a series of structured discussions to develop
estimates of the time required for different events in different types of cases within
each circuit, using FJC-developed “default values” as the reference point for
developing their estimates. These default values would be based in part on the
existing case weights and in part on other types of analyses. Following this series of
meetings, a national group of 24 judges (2 from each circuit), using structured
procedures, will consider the data from the 12 circuit groups and develop consensus
time estimates for use in developing the weights. These consensus time estimates are
likely to represent a majority of the judge time used to develop the new weights.
These consensus data are dependent upon the experience and knowledge of the
participating judges and the accuracy and reliability of the judges’ recall about the
average time required for different events in different types of cases—about 150 if all
case types in the current case weights were used. The greater the number of events
and types of cases for which judges are asked to make estimates, the greater the
demands on judges to recall accurately the judge time associated with specific events
and types of cases. These consensus data cannot be used to calculate statistical
measures of the accuracy of the resulting case weights. Thus, it will not be possible
to objectively, statistically assess how accurate the new case weights are—weights
on whose reasonable accuracy the Judicial Conference will rely in assessing
judgeship needs in the future.

A concurrent time study using "case tracking" or "diary" methods would be advisable
to identify potential shortcomings of the event-based procedure and to assess the
relative accuracy of the case weights that are produced using that procedure. In the
absence of a concurrent time study, there would be no objective, statistical way to
determine the accuracy of the case weights produced by the proposed event-based
methodology.

Adjusted Case Filings: Courts of Appeals Judge Workload Measure Lacks
Empirical Basis for Assessing Its Potential Accuracy

The principal quantitative workload measure that the Judicial Conference uses to
assess the need for additional courts of appeals judges is adjusted case filings. We
found the adjusted filings workload measure is based on available data from standard
statistical reports for the courts of appeals. The measure is not based on any
empirical data about the judge time required by different types of cases in the courts
of appeals.

The Judicial Conference’s policy is that courts of appeals with adjusted case filings of
500 or more per three-judge panel may be considered for additional judgeships.
Courts of appeals generally decide cases using constantly rotating three-judge panels.
Thus, if a court had 12 authorized judgeships, those judges could be assigned to four
panels of three judgeships each. The Conference may also consider factors other
than adjusted case filings, such as the geography of the circuit or the median time
from case filings to disposition. For 11 of the 12 courts of appeals, the Judicial
Conference counts all case filings equally, with two exceptions. (There is no specific
workload measure established for the D.C. circuit, as discussed later.) First, cases
refiled and approved for reinstatement are excluded from total case filings.9 Second,
two-thirds of pro se cases—defined by AOUSC as cases in which one or both of the
parties are not represented by legal counsel—are deducted from total case filings
(that is, they are effectively weighted at 0.33). For example, a court with 600 total pro
se case filings in fiscal year 2001 would be credited with 198 adjusted pro se case
filings (600 x 0.33). The remaining nonpro se cases would be weighted at 1.0 each.
Thus, a court of appeals with 1,600 case filings (excluding reinstatements)—600 pro
se cases and 1,000 nonpro se cases—would be credited with 1,198 “adjusted” case
filings (198 discounted pro se cases plus 1,000 nonpro se cases). If this court had 6
judges (allowing two panels of 3 judges each), it would have 599 adjusted case filings
per 3-judge panel, and thus, under the Judicial Conference’s policy, could be
considered for additional judgeships.
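
The worked example above can be reproduced with a short Python sketch. It is illustrative only: the function names are hypothetical, and the inputs are the figures from the example in the text (600 pro se filings, 1,000 other filings, 6 authorized judgeships).

```python
# Illustrative sketch only; function names are hypothetical.
PRO_SE_WEIGHT = 0.33  # weight applied to pro se filings, from the text
THRESHOLD = 500       # adjusted filings per three-judge panel, from the text


def adjusted_filings(pro_se, other):
    """Discount pro se filings to 0.33 each; count all other filings at 1.0.

    Both inputs are assumed to exclude reinstated cases already.
    """
    return pro_se * PRO_SE_WEIGHT + other


def filings_per_panel(adjusted, authorized_judgeships):
    """Divide adjusted filings by the number of three-judge panels."""
    panels = authorized_judgeships / 3
    return adjusted / panels


total = adjusted_filings(pro_se=600, other=1000)
print(round(total))                            # 1198
print(round(filings_per_panel(total, 6)))      # 599
print(filings_per_panel(total, 6) >= THRESHOLD)  # True
```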

The current case-related workload measure for courts of appeals judges, adopted in
1996, is similar in concept to the measure we reviewed in 1993.10 Table 1 illustrates
the similarities and differences in the two measures. Although the current workload
measure is expressed in terms of appellate case filings, both the 1986 and 1996 case-
related workload measures are based on assumptions about the judge workload
associated with merit dispositions. Merit dispositions are cases that are decided on
the legal rights of the parties to the case rather than on technical issues, such as lack
of federal jurisdiction.

The workload measure we reviewed in 1993 was based on 5-year averages of merit
dispositions in each circuit separately, and the result was not necessarily comparable
among circuits because of the different methods that each circuit used to decide its
cases. The current measure uses a single national standard for all circuits. Using
national data on merit dispositions as a percentage of case filings in 1994, the current
workload measure was based on the assumption that nationally about 55 percent of
all appellate case filings—except for pro se filings and reinstated filings—result in
merit dispositions. Thus, 500 adjusted case filings would represent 275 merit
dispositions—or 20 more than the 255 used in the 1986 measure. The increase from
255 to 275 was basically a matter of establishing equity between the district courts
and courts of appeals workload thresholds. To be considered for additional district
court judgeships, the Judicial Conference had raised the threshold from 400 to 430
weighted case filings per judgeship (a 7.5-percent increase). The new merit
dispositions standard raised the threshold for courts of appeals from 255 to 275 merit
dispositions (a 7.8-percent increase).
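
The figures in this paragraph can be checked with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and the 55 percent merit disposition rate and the old and new thresholds are taken from the text.

```python
# Illustrative sketch only; function names are hypothetical.
def merit_dispositions(adjusted_filings, merit_rate=0.55):
    """Merit dispositions implied by a given number of adjusted filings."""
    return adjusted_filings * merit_rate


def percent_increase(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


print(round(merit_dispositions(500)))         # 275
print(round(percent_increase(400, 430), 1))   # 7.5 (district court threshold)
print(round(percent_increase(255, 275), 1))   # 7.8 (courts of appeals threshold)
```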




9 Such cases were dismissed for procedural defaults when originally filed but “reinstated” to the court’s
calendar when the case was later refiled. The number of such cases, as a proportion of total cases, is
generally small.
10 U.S. General Accounting Office, Federal Judiciary: How the Judicial Conference Assesses the Need
for More Judges, GAO/GGD-93-31 (Washington, D.C.: Jan. 29, 1993).



Table 1: A Comparison of the 1986 and 1996 Methods of Measuring Case-Related
Workload for Courts of Appeals Judges
1986                                              1996
------------------------------------------------  ------------------------------------------------
The benchmark for considering additional          The benchmark for considering additional
judgeships in a court of appeals is 255 merit     judgeships in a court of appeals is 500 adjusted
dispositions per 3-judge panel.                   case filings per 3-judge panel.

Prisoner petition cases (a subset of pro se       Pro se cases (which include prisoner petitions)
cases) are counted as one-half of an appellate    are counted as one-third of an appellate case
case filing.                                      filing.

Uses a 5-year average merit termination rate      Uses a single standard of 500 adjusted filings
for each individual circuit.                      for all courts of appeals.

No other adjustments.                             Does not count appeals reinstated after
                                                  procedural default as part of adjusted filings,
                                                  to prevent double counting of appeals.

Calculations are based on the actual 5-year       Calculations apply to each circuit’s appellate
average merit termination rate for each court     case filings.
of appeals.

Source: FJC documentation and interviews.
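
To illustrate the 1996 method summarized in table 1, the following Python sketch computes adjusted case filings for a hypothetical circuit. The function names and all figures are invented for illustration. The sketch also assumes that pro se and reinstated filings do not overlap and that the number of 3-judge panels equals the number of judgeships divided by 3; neither assumption is stated in this report.

```python
from fractions import Fraction

PRO_SE_WEIGHT = Fraction(1, 3)  # pro se filings count as one-third of a filing
BENCHMARK_PER_PANEL = 500       # adjusted case filings per 3-judge panel


def adjusted_filings(total_filings, pro_se_filings, reinstated_filings):
    """Adjusted filings: reinstated appeals are excluded, pro se filings are
    weighted at one-third, and all other filings count in full.

    Assumes pro se and reinstated filings are disjoint subsets of the total.
    """
    other = total_filings - pro_se_filings - reinstated_filings
    return other + PRO_SE_WEIGHT * pro_se_filings


def adjusted_filings_per_panel(adjusted, judgeships):
    """Adjusted filings per 3-judge panel (assumes panels = judgeships / 3)."""
    return adjusted / Fraction(judgeships, 3)


# Hypothetical circuit: 6,000 filings, 2,400 pro se, 150 reinstated, 12 judgeships.
adjusted = adjusted_filings(6000, 2400, 150)
per_panel = adjusted_filings_per_panel(adjusted, 12)

print(float(adjusted))                  # 4250.0
print(float(per_panel))                 # 1062.5
print(per_panel > BENCHMARK_PER_PANEL)  # True
```

Exceeding the benchmark in such a calculation would only make a court eligible for consideration; as the report notes, the Judicial Conference weighs other factors as well.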


The current court of appeals case-related workload measure represents an effort to
improve the previous measure. As we noted in our 1993 report, under the previous
measure, the courts of appeals’ own restraint, not the workload standard, seemed to
have determined the actual number of appellate judgeships the Judicial Conference
requested. At the time the current measure was developed and approved, using the
new benchmark of 500 adjusted case filings resulted in judgeship numbers that
closely approximated the judgeship needs of the majority of the courts of appeals, as
the judges of each court perceived them. The current court of appeals case-related
workload measure principally reflects a policy decision using historical data on
filings and terminations. In 1995, the Subcommittee on Judicial Statistics of the
Judicial Conference’s Judicial Resources Committee sent a survey to the chief judge
of each circuit court of appeals. The responses showed no agreement that either
the 500 adjusted filings standard or a weight of 0.33 for pro se cases was the
appropriate standard. Unlike the district court case weights, the adjusted filings
workload measure is not based on empirical data regarding the judge time that
different types of cases may require. On the basis of the documentation we reviewed,
we determined there is no empirical basis for assessing the potential accuracy of
adjusted filings as a measure of case-related judge workload.

The D.C. Circuit—Adjusted Case Filings Not Applicable to Its Unusual Caseload
In a report to a Judicial Conference subcommittee,11 FJC discussed some of the
distinctive features of the Court of Appeals for the D.C. Circuit. The report noted that
approximately 30 percent of the circuit’s filings in fiscal years 1996-1997 were
administrative agency appeals that occur almost exclusively in the D.C. circuit and
were more burdensome than other cases in several respects. On average, these cases
• had more independently represented participants per case;
• were more likely to have participants with multiple objectives, involve complex or
    statutory law, and require the mastery of technical or scientific information;
• had more briefs filed per case;
• had a higher proportion of cases that were terminated; and
• had a higher rate of case consolidation (where two or more cases are combined
    for decision).

11 Federal Judicial Center, Assessment of Caseload Burden in the U.S. Court of Appeals for the D.C.
Circuit, Report to the Subcommittee on Judicial Statistics of the Committee on Judicial Resources of
the Judicial Conference of the United States (Washington, D.C.: 1999).

The report concluded that the need for additional judgeships in the D.C. circuit
should not be measured using the general workload threshold of 500 adjusted case
filings per 3-judge panel. However, because no information was available on judges’
actual time expenditures, there was no empirical basis for suggesting a specific
alternative formula for assessing the D.C. circuit’s judgeship needs. The report also
concluded that the D.C. circuit’s remaining caseload—that is, all cases other than
administrative agency appeals—was generally not distinguishable from the caseloads
of the other circuits. The report suggested several possible ways to integrate the
D.C. circuit into the existing adjusted weighted filings system, such as giving greater
weight to federal agency appeals or lowering the general threshold of 500 adjusted
filings per 3-judge panel for the D.C. circuit. The Judicial Conference has not yet
adopted any specific workload measure for the D.C. circuit. However, the Judicial
Conference requested no additional judgeships for the D.C. circuit in 2003.

No Judicial Conference Consensus on How to Revise Adjusted Filings Workload
Measure

In 1993, we recommended that the Judicial Conference improve its workload
measure for the courts of appeals.12 In the last decade, the Judicial Conference has
considered a number of proposals for developing a revised case-related workload
measure for courts of appeals judges, but the Conference has been unable to reach a
consensus on any approach. As part of its assistance to the Conference in this effort,
FJC in 2001 compiled a document that reviewed previous proposals to develop some
type of case weighting measure for the courts of appeals.13 Table 2 outlines some of
these proposals and their advantages and disadvantages, as identified by FJC.




12 U.S. General Accounting Office, Federal Judiciary: How the Judicial Conference Assesses the Need
for More Judges, GAO/GGD-93-31 (Washington, D.C.: Jan. 29, 1993).
13 Federal Judicial Center, Review of Previous Appellate Case Weighting Proposals (Washington, D.C.:
Aug. 22, 2001).


Table 2: Past Proposals to Revise the Case-Related Workload Measure for Courts of
Appeals Judges

Proposal 1: Estimation of case burden based on actual time required to process
the case.
   Advantages
   •   The quantitative approach would be very thorough.
   •   Empirically based data.
   Disadvantages
   •   Judges may not be amenable to the time-consuming task of recording the
       hours spent on individual cases.
   •   Time spent gathering data could be used elsewhere.

Proposal 2: Estimate of case burden based on the assessment of burden of only
“certain characteristics” from an already-existing database of “factors.”
   Advantages
   •   Would not be very time-consuming for judges.
   •   Would assess the frequencies of certain “factors.”
   •   Analysis of an existing database would save time.
   •   Can use a “wealth” of factors to get a big picture of the caseload burden.
   Disadvantages
   •   Difficult to agree on which factors to use.
   •   Difficult to decide if presence and absence of factors is enough
       information.
   •   Database and survey accuracy may be compromised.

Proposal 3: Normative assessment of cases to look qualitatively at the cases as
a whole.
   Advantages
   •   Convenient to extract information from surveys or group discussions.
   Disadvantages
   •   Difficult to decide which factors to use.
   •   Dependent upon accuracy of judges’ recall about the case.
   •   Lack of empirically based data.

Proposal 4: Using multiple regression to use information about the proportional
mix of cases with different defined characteristics in the different circuits
to account for the differences in case termination level.
   Advantages
   •   Quantitative approach to determine factors to use.
   Disadvantages
   •   Use of a potentially incomplete model.
   •   Inherent statistical limits.
   •   Cannot assess appellate burdens on a national level.

Proposal 5: Using district court weights for the appellate system.
   Advantages
   •   Already available data.
   •   Save time by using existing data.
   Disadvantages
   •   Little consistency between the two court systems.
   •   Sacrifice accuracy.

Proposal 6: Tallying court opinions (published and unpublished).
   Advantages
   •   Most appellate judge work leads to production of appellate opinions in
       chambers.
   Disadvantages
   •   Necessary information cannot be obtained consistently.

Proposal 7: Sampling cases for approximately 3 months for a case-based study
(Nov. 8, 1993).
   Advantages
   •   Can project the results of 3 months of cases to the rest of the year.
   Disadvantages
   •   There is no way to anticipate possible sample sizes, so cannot make a
       statistical prediction.

Source: FJC documentation.


In addition, other proposals are variations or combinations of those listed
above. Some of these possibilities have more potential than
others. Generally, methods that rely principally on empirical data on actual case
characteristics and judge behavior (e.g., time expended on cases) are more
appropriate than those that rely principally on qualitative data because statistical
methods can be used to estimate the accuracy of the resulting workload measure.

Conclusions

Overall, the methodology used to develop the district court case weights is
reasonable, and the resulting case weights are a reasonably accurate measure of
district court judge case-related workload. However, the weights are about 10 years
old, and the time data on which they are based are as much as 15 years old.
Consequently, it is uncertain whether the case weights continue to be a reasonably
accurate measure of the average district judge time burden resulting from a specific
volume and mix of cases. The Judicial Conference’s Subcommittee on Judicial
Statistics has approved a research design for updating the current case weights,
about which we have two concerns. First, the design would rely in large part on data from
two different case management data systems, and it will be a challenge to reliably and
usefully integrate the data from these two systems for analysis. FJC recognizes this
and is developing a strategy for addressing the issue. Second, the design includes
limited actual data on the time district judges spend on different types of cases. All
the data on noncourtroom time will be based on estimates developed by 13 groups of
experienced judges (about 124 in all) using structured, guided discussions. These
data cannot be used to calculate statistical measures of the accuracy of the resulting
case weights. Thus, it will not be possible to objectively, statistically assess how
accurate the new case weights are—weights on whose reasonable accuracy the
Judicial Conference will rely in assessing judgeship needs in the future.

The adjusted case filings workload measure used for the courts of appeals is not
based on actual data about the time that courts of appeals judges expend on different
types of cases. Rather, it represents a policy judgment of the appropriate workload
benchmark for considering new judgeships that is based on an analysis of past trends
in case filings and merit dispositions. Because of the lack of empirical data on the
time demands on courts of appeals judges, neither we nor the judiciary can assess
whether adjusted filings is a reasonably accurate measure of the workload of courts
of appeals judges. Any methodology to revise the current workload measure that
relies solely on qualitative data is unlikely to provide reasonably reliable and
verifiable estimates of judges’ workload. In 1993, we recommended that the Judicial
Conference develop a better measure of the workload of courts of appeals judges.
Although the Conference has studied many potential methods of improving its
workload measure, it has been unable to agree on any methodology for doing so.

We recognize that a methodology that provides greater empirical assurance of a
workload measure’s accuracy will require judges to document how they spend their
time on cases for at least some period of weeks. We believe that, given the
importance and cost of federal judgeships, this would be a good investment to ensure
that the workload measures that are used to support judgeship requests are
reasonably accurate and based on the best data available using sound research
methods.

Recommendations

We recommend that the Judicial Conference of the United States

   •   update the district court case weights using a methodology that supports an
       objective, statistically reliable means of calculating the accuracy of the
       resulting weights; and
   •   develop a methodology for measuring the case-related workload of courts of
       appeals judges that supports an objective, statistically reliable means of
       calculating the accuracy of the resulting workload measures and that
       addresses the special case characteristics of the Court of Appeals for the D.C.
       Circuit.




Agency Comments and Our Response

We provided the Director of the Administrative Office of the United States Courts and
the Director of the Federal Judicial Center with a draft of this report for comment.
Both provided technical comments, which were incorporated into the report as
appropriate. In a May 27, 2003, letter, the Chair of the Committee on Judicial
Resources of the Judicial Conference of the United States provided comments (see
enc. III) that offered four major observations: (1) the case-related workload in each
district court and court of appeals for which the Judicial Conference has
requested one or more judgeships considerably exceeds the minimum thresholds the
Conference has established for considering additional judgeships in district courts
and courts of appeals; (2) we did not provide the full context in which the Judicial
Conference uses the district court case weights in assessing district court judgeship
needs; (3) the workload of the courts of appeals entails important factors that have
defied measurement, including significant differences in case processing techniques;
and (4) we did not fully and accurately describe the full context of the new district
court case weighting study.

With regard to the first two observations, the scope of our work was limited to an
assessment of the relative accuracy of the weighted case filings and adjusted case
filings measures of district court judge and courts of appeals judge workload,
respectively. Our report clearly states that the workload measures we reviewed are
among the many factors the Judicial Conference considers in assessing judgeship needs,
although the assessment begins with these workload measures. With regard to the
courts of appeals, we recognize that there are significant methodological challenges
in developing a more precise workload measure for the courts of appeals. However,
using the data available, neither we nor the Judicial Conference can assess the
accuracy of adjusted case filings as a measure of the case-related workload of courts
of appeals judges. We believe it is premature to conclude that it is not possible to
develop a case-related workload measure for courts of appeals judges whose
accuracy can be reasonably determined.

The Deputy Director of FJC provided comments in a May 27, 2003 letter (see enc. IV).
Both the FJC Deputy Director and the Chair of the Judicial Conference’s Committee
on Judicial Resources said that we did not fully describe the proposed methodology
for updating the district court case weights and why this methodology could produce
case weights whose accuracy could be reasonably assessed. We have added language
to the report that provides more detail on the iterative Delphi technique that would
be used to develop the consensus estimates of the judge time required for
noncourtroom events in many different types of cases. FJC agrees that the Delphi
methodology would not support the calculation of standard errors for the new case
weights, but said that it would allow FJC to assess the integrity of the resulting case
weight system. We do not believe that the proposed methodology can be used to
assess the accuracy of weights based in large part on consensus data. The Delphi
technique of guided, structured discussions inherently relies for its accuracy and
reliability on the experience and knowledge of the participating judges and the
accuracy and reliability of judges’ recall about the average time required for different
events in many different types of cases—about 150 if all case types in the current
weights were used. The greater the number of events and types of cases for which
judges are asked to make estimates, the greater the demands on judges to recall
accurately the judge time required by those events and types of cases. Generally, the
Delphi technique is most appropriate when more precise analytical techniques are
not feasible and the issue could benefit from subjective judgments on a collective
basis. However, more precise analytical techniques are available and were used to
develop the current district court case weights. We believe that any methodology
used should support the calculation of standard errors. Such statistical measures are
essential for assessing the potential error of the weighted case filings for any specific
district that has requested additional judgeship(s).

We believe that the importance and cost of creating new federal judgeships require
the best possible case-related workload data to support the assessment of the need
for more judgeships. The methodology approved for the revision of the bankruptcy
case weights offers an approach that could be usefully adopted for the revision of the
district court case weights. The bankruptcy court methodology would use a two-
phased approach. First, new case weights would be developed based on the time
data recorded by bankruptcy judges for a period of weeks—a methodology very
similar to that used to develop the current bankruptcy case weights. The accuracy of
the new case weights could be assessed using standard errors. The second part
represents experimental research to determine if it is possible to make future
revisions to the weights without conducting a time study. The data from the time
study can be used to validate the feasibility of this approach. If the research
determines this is possible, the case weights could be updated more frequently with
less cost than required by a time study. We believe this methodology would provide
(1) more accurate weighted case filings than the design proposed for revising the
district court case weights and (2) a sounder method of developing and testing the
accuracy of case weights that were developed without a time study.

Objectives, Scope, and Methodology

As agreed with your office, our objectives were to (1) determine whether the
methods the Judicial Conference uses to quantitatively measure the case-related
workload of district court and court of appeals judges result in a reasonably
accurate measure of judges’ case-related workload, (2) assess the reasonableness of
any proposed methodologies to update the workload measures, and (3) obtain
information from the AOUSC on the steps the Judiciary takes to ensure that the case
filing data required for these workload measures are accurate. To do this, we
obtained and reviewed documentation on the methodology used to develop the
existing workload measures and proposals to revise those measures from AOUSC
and FJC and interviewed officials at both agencies. We based our assessments on our
experience with and knowledge of sound research design and generally accepted
statistical analysis methods. We also obtained information on the methods the
judiciary uses to ensure the accuracy of the case filings data on which the workload
measures rely. Although the Judicial Conference considers a number of factors in
assessing judgeship needs for the district courts and courts of appeals, our work
focused only on the relative accuracy of the weighted case filings and adjusted case
filings measures. We did our work in Washington, D.C., in April and May 2003.

                                          ----




We will send copies of this report to interested congressional committees, the
Director, Administrative Office of the U.S. Courts; Director, Federal Judicial Center;
and the Chair, Committee on Judicial Resources, Judicial Conference of the United
States. We will make copies available to others on request. In addition, this report
will be available at no charge on GAO’s Web site at http://www.gao.gov.

If you have any questions about this report, please contact me at (202) 512-8777. The
key contributors to this report were David Alexander, Kriti Bhandari, Rochelle Burns,
and Chris Moriarity.

Sincerely yours,




William O. Jenkins, Jr.
Director, Homeland Security and Justice Issues

Enclosures - 4




Enclosure I

                Quality Assurance Steps the Judiciary Takes to
          Ensure the Accuracy of Case Filing Data for Weighted Filings

Whether the district court case weights are a reasonably accurate measure of district
judge case-related workload is dependent upon two variables: (1) the accuracy of the
case weights themselves and (2) the accuracy of classifying cases filed in district
courts by the case type used for the case weights. If case filings are inaccurately
identified by case type, then the weights are inaccurately calculated. Because there
are fewer categories used in the courts of appeals workload measure, there is greater
margin for error. The database for the courts of appeals should accurately identify
(1) pro se cases, (2) reinstated cases, and (3) all cases not in the first two categories.

All current records related to civil and criminal filings that are reported to the
Administrative Office of the U.S. Courts (AOUSC) and used for the district court case
weights are generated by the automated case management systems in the district
courts. Filings records are generated monthly and transmitted to AOUSC for
inclusion in its national database. On a quarterly basis, AOUSC summarizes and
compiles the records into published tables, and for given periods these tables serve
as the basis for the weighted caseload determinations.

In responses to written questions, AOUSC described numerous steps taken to ensure
the accuracy and completeness of the filings data, including the following:

•   Built-in, automated quality control edits are done when data are entered
    electronically at the court level. The edits are intended to ensure that obvious
    errors are not entered into a local court’s database. Examples of the data
    elements screened for errors are the district office in which the case was filed,
    the U.S. Code title and section of the filing, and the judge code. Most district
    courts have staff responsible for data quality control.
•   A second set of automated quality control edits is used by AOUSC when
    transferring data from the court level to its national database. These edits screen
    for missing or invalid codes that are not screened for at the court level, such as
    dates of case events, the type of proceeding, and the type of case. Records that
    fail one or more checks are not added to the national database and are returned
    electronically to the originating court for correction and resubmission.
•   Monthly listings of all records added to the national database are sent
    electronically to the involved courts for verification.
•   Courts’ monthly and quarterly case filings are monitored regularly to identify and
    verify significant increases or decreases from the normal monthly or annual
    totals.
•   Tables on case filings are published on the Judiciary’s intranet for review by the
    courts.
•   Detailed and extensive statistical reporting guidance is provided to courts for
    reporting civil and criminal statistics. This guidance includes information on
    general reporting requirements, data entry procedures, and data processing and
    reporting programs.
•   Periodic training sessions are conducted for district court staff on measures and
    techniques associated with data quality control procedures.
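
The second set of automated edits described above (screening for missing or invalid codes and returning failed records to the originating court) can be sketched as follows. This is an illustration only: the field names, the list of valid case types, and the sample records are hypothetical and are not taken from AOUSC's systems.

```python
# Hypothetical field names and code list; the report does not give AOUSC's actual ones.
REQUIRED_FIELDS = ("filing_date", "proceeding_type", "case_type")
VALID_CASE_TYPES = {"civil", "criminal"}


def screen_record(record):
    """Return a list of problems found in one filing record (empty if it passes)."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    case_type = record.get("case_type")
    if case_type and case_type not in VALID_CASE_TYPES:
        problems.append(f"invalid case_type: {case_type}")
    return problems


def load_batch(records):
    """Split a batch into records accepted for the national database and records
    returned to the originating court, each paired with its list of problems."""
    accepted, returned = [], []
    for record in records:
        problems = screen_record(record)
        if problems:
            returned.append((record, problems))
        else:
            accepted.append(record)
    return accepted, returned


batch = [
    {"court": "X", "filing_date": "2003-01-15", "proceeding_type": "original", "case_type": "civil"},
    {"court": "Y", "filing_date": "", "proceeding_type": "original", "case_type": "criminal"},
    {"court": "Z", "filing_date": "2003-02-03", "proceeding_type": "original", "case_type": "unknown"},
]
accepted, returned = load_batch(batch)
for record, problems in returned:
    print(record["court"], problems)
```

Checks of this kind catch missing or malformed codes, but they cannot detect a case that carries a valid but incorrect case type, which is why comparison with source records would be needed to verify classification accuracy.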

AOUSC did not identify any audits to test the accuracy of district court case filings or
any other efforts to verify the accuracy of its electronic data by comparing the
electronic data to “hard copy” case records for district courts. Within the limited
time for our review, AOUSC was unable to obtain information from individual courts
to include in its responses. We have no information on how effective the procedures
AOUSC described may be in ensuring that the data in the automated databases are
an accurate and reliable means of assigning weights to district court case filings.




Enclosure II

    Measuring Judicial Workload Using the Collection of Time Study Data

The current bankruptcy court and district court workload measures were developed
using data collected from time studies. The district court time study took place
between 1987 and 1993, and the bankruptcy court time study took place between
1988 and 1989.

Different procedures were used in these two time studies. The bankruptcy court time
study protocol is an example of a "diary" study, where judges recorded time and
activity details for all of their official business over a 10-week period. The district
court time study protocol is an example of a "case-tracking" study, where a sample of
cases was selected, and all judges who worked on a given sample case recorded the
amount of time they spent on the case. Time studies, in general, have the substantial
benefit of providing quantitative information that can be used to create objective and
defensible measures of judicial workload, along with the capability to provide
estimates of the uncertainty in the measures.

Estimating Judge Time in Diary and Case-Tracking Studies

At the conclusion of a case-tracking study, total time spent on each sample case
closed during the study period is readily available by summing the recorded times
spent on the case by each judge who worked on the case. For a given case type, the
summed recorded times can be averaged to obtain an estimate of the average judicial
time per case for that case type.
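
The case-tracking calculation described above, together with a simple measure of its uncertainty, can be sketched in Python as follows. The function name, record layout, and sample data are hypothetical. The standard error shown assumes the sampled cases of each type are a simple random sample, and cases with no recorded time would have to be added separately; the report does not specify either point.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def case_type_estimates(time_entries, case_types):
    """Estimate average judge hours per case for each case type.

    time_entries: iterable of (case_id, judge_id, hours) records.
    case_types:   dict mapping case_id to its case type.
    Returns {case_type: (mean_hours, standard_error, n_cases)}; the standard
    error is None when only one case of a type was sampled.
    """
    # Sum the time recorded by every judge who worked on each case.
    hours_per_case = defaultdict(float)
    for case_id, _judge_id, hours in time_entries:
        hours_per_case[case_id] += hours

    # Group the per-case totals by case type.
    totals_by_type = defaultdict(list)
    for case_id, total in hours_per_case.items():
        totals_by_type[case_types[case_id]].append(total)

    estimates = {}
    for case_type, totals in totals_by_type.items():
        n = len(totals)
        standard_error = stdev(totals) / sqrt(n) if n > 1 else None
        estimates[case_type] = (mean(totals), standard_error, n)
    return estimates


# Hypothetical time records for four closed sample cases.
entries = [
    ("A", "judge1", 2.0), ("A", "judge2", 1.0),
    ("B", "judge1", 5.0),
    ("C", "judge3", 4.0),
    ("D", "judge2", 10.0),
]
types = {"A": "contract", "B": "contract", "C": "contract", "D": "patent"}
estimates = case_type_estimates(entries, types)
print(estimates["contract"])
print(estimates["patent"])
```

A standard error of this kind is what allows the potential error of a case weight to be stated; estimates reached by consensus do not yield a comparable figure.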

For a diary study, however, it is necessary to make estimates of judicial workload for
all cases that were not both opened and closed during the data collection period.
This estimation step requires information from the caseload database, and thus the
accuracy of estimates depends in part on the accuracy of the caseload data. Two
kinds of information are required from the caseload database: case type and length of
time the case has been open. Using these data and the time data judges have
recorded for specific cases, estimates can be made of the overall time required for
cases that were not opened and closed during the calendar period covered by the
diary study.

Comparing Case-Tracking Studies and Diary Studies

Each study type has advantages and disadvantages. The following outlines the
similarities and differences in terms of burden, timeliness of data collection, post-
data collection steps, accuracy, and comprehensiveness.

Burden on Participants

Each study type places burden on judicial personnel during data collection. It is not
clear that one study type is less burdensome than the other. The diary study
procedure requires more concentrated effort, but data are collected for a shorter
period of time.



Timeliness of Data Collection

Data collection for a diary study can be completed more quickly than for a case-
tracking study.

Post Data Collection Steps

More effort is needed to convert diary study data to judicial workload estimates than
case-tracking study data. Also, the accuracy of estimates from diary study data
depends in part on the accuracy and objectivity of the information in the caseload
database.

Data Accuracy

It is not clear that one study type collects more accurate data than the other study
type. Some of the bankruptcy court case-related time study data could not be linked
to a specific case type due to misreporting errors and/or errors in the caseload
database. Some error of this type likely is unavoidable because of the requirement to
record all time rather than record time for specific cases only. However, it is
plausible that a diary study collects higher quality data, on average, because all
official time is to be recorded during the study period; judicial personnel become
accustomed to recording their time. In contrast, the data quality for a case-tracking
study could decline over the study's length; for example, after a substantial
proportion of the sample cases are closed, judicial personnel could become less
accustomed to recording time on the remaining open cases.

Comprehensiveness and Efficiency

In theory, a case-tracking study collects more comprehensive information about
judicial effort on a given case than a diary study, because data for a sampled case
almost always are collected over the duration of the case. (Data collection may be
terminated for a few cases that remain open, or are reopened, many years after initial
filing.)

With the diary approach, the total judicial time that is required for lengthy case types
is estimated by combining “snapshots” of the time required by such cases of different
ages. Thus, in theory, producing accurate weights for lengthy case types is not
problematic. In practice, however, difficulties may be encountered. For example, in
the 1988-1989 bankruptcy time study, the asset and liability information for cases
older than 22 months was inadequate and appropriate adjustments had to be made.
In addition, difficulties may arise if only a small number of cases of the lengthy type
are in the system. This is an issue FJC said it is considering as it finalizes how to
assess the judicial work associated with mega cases in the upcoming bankruptcy
case-weighting study.
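
One simple way to picture how snapshots from a diary study might be combined is sketched below in Python. This is an assumption-laden illustration, not the estimator FJC used: it supposes that, for one case type, the diary data yield the average judge hours recorded on open cases in each age interval and that the caseload database yields the share of filed cases still open in each interval. All names and figures are hypothetical.

```python
def estimated_hours_per_case(mean_hours_by_age, share_open_by_age):
    """Estimate total judge hours per filed case for one case type.

    mean_hours_by_age: {age_interval: average hours recorded during the diary
                        period on cases that were open at that age}
    share_open_by_age: {age_interval: share of filed cases still open at that
                        age, taken from the caseload database}
    Each age interval contributes its average hours, discounted by the share
    of cases that survive to that age. Intervals missing from
    share_open_by_age are treated as contributing nothing.
    """
    total = 0.0
    for age, mean_hours in mean_hours_by_age.items():
        total += mean_hours * share_open_by_age.get(age, 0.0)
    return total


# Hypothetical case type observed in three successive age intervals.
mean_hours = {0: 2.0, 1: 1.0, 2: 0.5}
share_open = {0: 1.0, 1: 0.6, 2: 0.2}
print(round(estimated_hours_per_case(mean_hours, share_open), 2))  # 2.7
```

The sketch makes the dependence on the caseload database explicit: an error in recorded case ages or case types changes the shares and therefore the estimate, and an age interval with few open cases gives an unstable average.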




Enclosure III

      Comments from the Chair of the Judicial Resources Committee,
                Judicial Conference of the United States




Enclosure IV

               Comments from the Federal Judicial Center



