
Military Training: Its Effectiveness for Technical Specialties Is Unknown

Published by the Government Accountability Office on 1990-10-16.


GAO
United States General Accounting Office
Report to the Secretary of Defense

October 1990

MILITARY TRAINING
Its Effectiveness for Technical Specialties Is Unknown

GAO/PEMD-91-4

Program Evaluation and Methodology Division

B-239914

October 16, 1990

The Honorable Richard B. Cheney
The Secretary of Defense

Dear Mr. Secretary:

In this report, we review the information sources on which the services base their
evaluations of the effectiveness of their technical training programs, recruit selection, and
classification decisions. We undertook this review because the technical sophistication of
modern weaponry has intensified the need for well-qualified recruits and effective technical
training. This report identifies some critical gaps in the services’ ability to measure how
effectively they are selecting and preparing recruits to use and maintain today’s complex
weapons systems.

This report contains recommendations in Chapter 5. The head of a federal agency is required
by 31 U.S.C. 720 to submit a written statement on actions taken on these recommendations to
the Senate Committee on Governmental Affairs and the House Committee on Government
Operations not later than 60 days after the date of the report and to the House and Senate
Committees on Appropriations with the agency’s first request for appropriations made more
than 60 days after the date of the report.

We are sending copies of this report to appropriate House and Senate committees, members
of Congress from the states mentioned in the report, and the Director of the Office of
Management and Budget. We will also make copies available to interested organizations, as
appropriate, and to others upon request.

If you have any questions or would like additional information, please call me at
(202) 275-1854. Major contributors to the report are listed in appendix VI.

Sincerely yours,




Eleanor Chelimsky
Assistant Comptroller General
Executive Summary


Purpose

The ability of the armed forces to carry out their mission into the next century will depend
on both hardware and personnel considerations: the reliability and appropriateness of weapons
systems, the quality of military personnel, and the “fit” of human skills to the operating
demands of weapons systems. If the entry-level aptitude, knowledge, and skills of new recruits
should fall short of the human requirements needed to operate and maintain new technologically
sophisticated systems, greater demands would be placed on the armed services to compensate for
the shortfall through training. The purpose of this report was to examine the information
collected by the Department of Defense (DOD) on both the quality of its new recruits and the
effectiveness of its training in preparing recruits to operate in a technologically
sophisticated environment.


Background

A recruit is admitted to military service and assigned to an occupational specialty on the
basis of tests taken at recruitment. Upon completion of basic training, most recruits receive
additional classroom training in their specialty and then are assigned to perform the
specialty in the field. This typical sequence encompasses the three points in a recruit’s
service career where data critical to evaluating the success of training must be collected: at
entrance to military life, during and upon completion of formal training, and after assignment
to a military specialty in the field.

An adequate system of assessing training effectiveness must include reliable and valid
information at each of these points, and should examine the interrelationships among these
data points to test the congruence of initial selection and placement data, classroom
measures, and the ultimate criterion, field performance.
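
One way to picture the comparison described above is as a set of pairwise correlations among the three measurement points. The sketch below is illustrative only: the records are invented, the names `pearson`, `records`, and `correlations` are hypothetical, and the Pearson coefficient is simply one common measure of association, not necessarily the statistic behind any figure in this report.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical records, one per recruit:
# (entry test score, final classroom grade, field performance score).
records = [
    (52, 78, 70),
    (61, 82, 74),
    (70, 85, 73),
    (75, 91, 80),
    (88, 94, 86),
]

# Split the records into one list per measurement point.
entry, classroom, field = (list(column) for column in zip(*records))

# Compare every pair of measurement points.
correlations = {
    "entry vs classroom": pearson(entry, classroom),
    "entry vs field": pearson(entry, field),
    "classroom vs field": pearson(classroom, field),
}

for label, r in correlations.items():
    print(f"{label}: r = {r:.2f}")
```

Computing the same three coefficients separately for each demographic group would show whether an entry measure predicts later performance equally well for all recruits, which is the kind of question the report raises.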

During the mid-1980’s, the services reported dramatic improvements in the general
qualifications of new recruits. The improvements were attributed to better compensation and
educational benefits, increased recruiting efforts, and heightened public appreciation of the
military role. These reports did not, however, address the specific area of technical
qualifications among recruits. More recently, the services have reported difficulty in filling
their quotas with highly qualified recruits. This perceived decline in the ability levels of
recruits entering training raises questions about the reality of that decline, about its
magnitude, about the effectiveness of the process by which recruits are selected for training,
and about the actual on-the-job performance of those recruits.







Results in Brief

GAO found that the aptitude level of recruits did increase during the 1980’s but that most of
the improvement occurred during the first half of the decade. Since then, little change has
occurred in general aptitude for training, but the levels of some of the more technical skills
have declined among recruits, in one case below the 1981 level. Women and members of minority
groups consistently scored lower in tests used to assign recruits to more technical
occupational specialties such as radar specialist positions.

GAO concluded that, for most recruits, the services’ selection criteria are moderately
successful at predicting individual performance during classroom technical training. However,
they are notably less successful for women and minority recruits.

Each service has evaluation mechanisms in place, but only the Army systematically collects
data on the field performance of individual graduates in a way that would allow comparison of
a graduate’s on-the-job performance with his or her entry-level ability and classroom
performance. These data reveal an even weaker connection for women and minority group members
between criteria used to assign them to technical specialties and their later field
performance. The field evaluation practices of the Navy are particularly fragmented and have
deteriorated during the 1980’s. GAO found that the lack of reliable field performance data in
the Navy and the Air Force makes realistic assessment of training effectiveness impossible.

GAO concluded that the insensitivity of selection and placement measures as predictors of
future success for female and minority recruits is a matter of serious concern in view of the
military’s increasing reliance on these groups to perform technical roles.



Principal Findings

Recent Quality Trends

All services administer the Armed Services Vocational Aptitude Battery (ASVAB) to new
recruits. The primary measure of a recruit’s aptitude is the Armed Forces Qualification Test
(AFQT), which is made up of four ASVAB subtests. AFQT scores have tended to level off after
rising in the early 1980’s. Average scores on three of the four subtests used to select
candidates for technical training have declined since mid-decade, and scores on one, the
Electronics Information subtest, are lower than in 1981. A smaller percentage of recruits now
qualify for the most demanding technical specialties than at any time since 1981. Women and
minority group members are severely underrepresented among qualifiers because they score
lower, on average, than white males. (See pages 18-31.)


Classroom Evaluation Measures

Each service has established evaluation mechanisms to monitor instructional quality and
curriculum coverage in classroom training. Overall, the grading procedures in the courses GAO
reviewed appeared to discriminate acceptably well among levels of student performance (with
the exception of some Army courses where recorded grades were unreliable indicators of
classroom performance). (See pages 32-34, 36-38, and 40-41.)

Selection criteria from ASVAB are moderately successful in predicting the performance of most
students for training, but are significantly less reliable predictors for women and minority
students. While these groups appeared to overcome their lower scores on aptitude measures in
the Navy and Air Force courses reviewed, the differences in classroom performance for nonwhite
and female students persisted throughout the Army technical courses reviewed. (See pages
34-36, 38-39, and 40-41.)

GAO developed a statistically more sophisticated summary score from ASVAB using factor
analysis. This factor score generally performed better than AFQT and the Electronics Composite
score in predicting final grades for all demographic groupings. This finding suggests that
broader-based selection criteria than those currently in use could be more reliable predictors
of classroom performance, at least in the technical areas GAO reviewed. (See pages 36, 39, and
41.)


Field Measures of Training Effectiveness

The Army’s Skill Qualification Test provides the only objective, systematically collected
estimates of the field performance of individual graduates of training. The Air Force and the
Navy rely instead largely on feedback mechanisms through which field commanders and
supervisors may submit complaints to the training community if they believe their graduates
have been inadequately trained. In addition, Air Force evaluation units periodically survey a
sample of supervisors of course graduates for their perceptions of the quality and
appropriateness of training. A similar practice was followed in the Navy until the mid-1980’s.
Internal reports have been sharply critical of the quality of the Navy’s training assessment
procedures, but these deficiencies are only slowly being corrected. (See pages 45-50.)

Field performance measures have been developed by DOD under the Joint-Service Job Performance
Measurement project and may be applicable to training assessment purposes. (See page 51.)

ASVAB scores in our sample are weaker predictors of field performance as measured by the Army
than they are of classroom performance and only predict well for white male recruits. The
factor scores developed by GAO are better predictors than either AFQT or the Electronics
qualifying scores used by the Army. No ASVAB score was significantly correlated with field
performance for women or minority soldiers. (See pages 45-46.)


Recommendations

GAO believes that evaluating the effectiveness of the training provided by the services is
crucial if they are to meet the future challenges of changing demographics and increasingly
sophisticated weaponry. GAO therefore recommends that the Assistant Secretary of Defense for
Force Management and Personnel attempt to develop more sensitive indicators of classroom and
field performance in technical specialties for women and minority recruits from extant data.
GAO also recommends that the Assistant Secretary review alternative measures of field
performance already developed by the services under the Job Performance Measurement project
for their applicability to training and on-the-job performance evaluation. GAO further
recommends that the Secretary of the Army direct the Training and Doctrine Command to review
for accuracy, appropriateness, and reliability the classroom grading procedures identified
within the report as deficient. Finally, GAO recommends that the Secretary of the Navy
establish a firm deadline for developing a training evaluation program and that he direct that
current resources allocated to this effort be reexamined for their adequacy.


Agency Comments

[...] its recommendations and identified specific actions to be taken toward implementing
them. DOD also concurred or partially concurred with what it identified as the main findings
contained in the report. (See appendix V.) We have reviewed these comments and, where
appropriate, have made changes to the text.




Contents

Executive Summary                                               2

Chapter 1
Introduction
    Recruit Quality in the 1980’s
    Recruit Training
    Objectives, Scope, and Methodology
    Strengths and Limitations of Our Study

Chapter 2                                                       18
The Quality of Military Recruits: 1981-89
    Armed Services Vocational Aptitude Battery (ASVAB)          18
    Summary and Conclusions                                     30

Chapter 3                                                       32
Classroom Measures of Training Effectiveness
    Army                                                        33
    Navy                                                        36
    Air Force                                                   39
    Summary and Conclusions                                     42

Chapter 4                                                       45
Field Measures of Training Effectiveness
    Army                                                        45
    Navy                                                        48
    Air Force                                                   50
    Alternative Data Sources: The Job Performance               51
        Measurement Project
    Summary and Conclusions                                     52

Chapter 5                                                       53
Summary, Recommendations, and Agency Comments and Our Response
    Summary                                                     53
    Recommendations                                             54
    Agency Comments and Our Response                            55








Appendixes

Appendix I: AFQT Mean Score and Electronics Composite           60
    Summary Statistics: 1981-89
Appendix II: Predictor and Criterion Variable Mean              64
    Scores
Appendix III: Intercorrelation of Study Variables by            66
    Occupational Specialty
Appendix IV: Army SQT Mean Scores, by Occupational
    Specialty
Appendix V: Comments From the Department of Defense             78
Appendix VI: Major Contributors to This Report                  103

Tables

Table 1.1: How AFQT Test Results Are Categorized                15
Table 3.1: Army Occupational Specialties Reviewed               33
Table 3.2: Mean Scores on Predictor and Criterion               34
    Variables, Army
Table 3.3: Intercorrelation of Study Variables, Army            35
Table 3.4: Occupational Specialties Reviewed, Navy              37
Table 3.5: Mean Scores on Predictor and Criterion               37
    Variables, Navy
Table 3.6: Intercorrelation of Study Variables, Navy            39
Table 3.7: Occupational Specialties Reviewed, Air Force         40
Table 3.8: Mean Scores on Predictor and Criterion               40
    Variables, Air Force
Table 3.9: Intercorrelation of Study Variables, Air Force       42
Table 4.1: Correlation of SQT and Predictor Variables           46
Table I.1: AFQT Mean Scores, by Gender                          60
Table I.2: AFQT Mean Scores, by Service                         60
Table I.3: AFQT Mean Scores, by Race/Ethnicity                  61
Table I.4: AFQT Mean Score Overall Totals                       61
Table I.5: Electronics Composite Mean Scores, by Gender         62
Table I.6: Electronics Composite Mean Scores, by Service        62
Table I.7: Electronics Composite Mean Scores, by Race/          63
    Ethnicity
Table I.8: Electronics Composite Mean Score Overall             63
    Totals
Table II.1: Army Mean Scores                                    64
Table II.2: Navy Mean Scores                                    64
Table II.3: Air Force Mean Scores                               65
Table III.1: Intercorrelation of Study Variables: Army,         66
    245








Table III.2: Intercorrelation of Study Variables: Army,         67
    27N
Table III.3: Intercorrelation of Study Variables: Army,         68
    29V
Table III.4: Intercorrelation of Study Variables: Navy, AQ      69
Table III.5: Intercorrelation of Study Variables: Navy, AX      70
Table III.6: Intercorrelation of Study Variables: Navy,         71
    STG
Table III.7: Intercorrelation of Study Variables: Navy, STS     72
Table III.8: Intercorrelation of Study Variables: Air Force,    73
    45530A
Table III.9: Intercorrelation of Study Variables: Air Force,    74
    45530B
Table III.10: Intercorrelation of Study Variables: Air          75
    Force, 30332
Table III.11: Intercorrelation of Study Variables: Air          76
    Force, 30333

Figures

Figure 1.1: Recruit Training Process                            12
Figure 1.2: Data Sources and Comparisons                        14
Figure 2.1: Mean AFQT Scores, by Gender: 1981-89                19
Figure 2.2: Mean AFQT Scores, by Race/Ethnicity: 1981-89        20
Figure 2.3: Mean AFQT Scores, by Service: 1981-89               21
Figure 2.4: Mean AFQT Subtest Scores, 1981-89                   22
Figure 2.5: Mean Electronics Composite Scores, by               23
    Gender: 1981-89
Figure 2.6: Mean Electronics Composite Scores, by Race/         24
    Ethnicity: 1981-89
Figure 2.7: Mean Electronics Composite Scores, by               25
    Service: 1981-89
Figure 2.8: Mean Electronics Composite Subtest Scores,          26
    1981-89
Figure 2.9: Number of Recruits Qualifying for Training as       27
    Control and Warning Radar Specialists, 1981-89
Figure 2.10: Percent of Recruits Qualifying for Training        28
    as Control and Warning Radar Specialists, 1981-89
Figure 2.11: Number of Recruits Qualifying for Training         29
    as Systems Repair Technicians, 1981-89
Figure 2.12: Percent of Recruits Qualifying for Training        30
    as Systems Repair Technicians, 1981-89








Abbreviations

AFQT       Armed Forces Qualification Test
ASVAB      Armed Services Vocational Aptitude Battery
DOD        Department of Defense
FLETAP     Fleet Training Assessment Program
GAO        General Accounting Office
ISD        Instructional System Development
JPM        Job Performance Measurement
NTSC       Naval Training Systems Center
SQT        Skill Qualification Test
TAST       Training Assessment Survey Team


Chapter 1

Introduction


The ability of the armed forces to carry out their mission into the next century will depend
on both hardware and personnel considerations: the reliability and appropriateness of weapons
systems, the quality of military personnel, and the “fit” of human skills to the operating
demands of weapons systems. If the entry-level aptitude, knowledge, and skills of new recruits
should fall short of the human requirements needed to operate and maintain new technologically
sophisticated weapons systems, greater demands would be placed on the armed services to
compensate for the shortfall through training. In this report, we will examine the information
collected by DOD on both the quality of its new recruits and the effectiveness of its training
in preparing recruits to operate in a technologically sophisticated military environment.


Recruit Quality in the 1980’s

In hearings before the House Appropriations Committee on the fiscal year 1988 budget for DOD,
the Assistant Secretary for Force Management and Personnel characterized the changes since
1980 in the nation’s armed forces in these words: “Today we are recruiting the highest quality
personnel in history. [The services’ personnel possess] ... high intelligence, correct
experience mix, [and] high skill levels.” The reasons cited for this “most remarkable
turnaround in peacetime history” were many: higher pay and improved quality of life for
members of the armed forces; the recession and consequent unemployment of the early 1980’s,
which widened the pool of applicants; improved educational benefits for military service; more
intensive and effective recruiting; and recovery from the poor public perception of the
military following the war in Vietnam.

The statistics cited by DOD supported this favorable view. In 1980, 68 percent of recruits
were high school graduates (versus 75 percent for the youth population in general). By 1986,
92 percent of recruits had high school diplomas. Whereas 65 percent of recruits in 1980 scored
in the top three mental categories on the Armed Forces Qualification Test (versus 69 percent
for the norm group), in 1986, 96 percent achieved this level.

Yet the demographic and educational realities of the immediate future are likely to affect
this optimistic scenario. The number of young people available for the military recruit pool
will continue to diminish until the mid-1990’s.1 The composition of the recruit pool will also
shift. According to research sponsored by the Department of Labor, by the year 2000 five of
every six new labor force entrants will be female, minority group members, or immigrants.2
Meanwhile, the graduates of the American educational system are said to be falling further
behind the youth of competitor nations in technological literacy at the same time that U.S.
weapons systems are becoming increasingly sophisticated.3

DOD has also begun to voice concern. Hints of uneasiness emerged in the fiscal year 1988
appropriations hearings when the Air Force reported increased difficulty in securing quality
recruits. In the same hearings, the Navy expressed its concern over the steady erosion of its
Delayed Entry Pool, the program under which applicants agree to enter the service within a
year. In addition, for the first time in eight years, the Army failed to meet its quarterly
recruiting quota in the first quarter of fiscal year 1989.


Recruit Training

Figure 1.1 identifies the typical sequence that occurs during the early stages of a recruit’s
time in the military. As shown, after their basic training, the length and content of which
vary by service, most recruits attend additional training to equip them to function
effectively in some occupational specialty. The recruit’s area of specialization is determined
by service needs, qualifications as determined on tests administered during the recruiting
process, and individual interests.




1 U.S. Bureau of the Census, Projections of the Population of the United States, by Age, Sex,
and Race: 1988 to 2080, Current Population Reports, Series P-25, No. 1018 (Washington, D.C.:
U.S. Government Printing Office, 1989), p. 6.

2 William B. Johnston and Arnold H. Packer, Workforce 2000: Work and Workers for the 21st
Century (Indianapolis, Indiana: Hudson Institute, 1987), p. 95. See also U.S. Office of
Personnel Management, Civil Service 2000 (Washington, D.C.: U.S. Government Printing Office,
1988).

3 Martin Binkin, Military Technology and Defense Manpower (Washington, D.C.: The Brookings
Institution, 1986). See also Aerospace Education Foundation, America’s Next Crisis: The
Shortfall in Technical Manpower (Arlington, Va.: The Aerospace Education Foundation, 1989);
and National Research Council, A Challenge in Numbers: People in the Mathematical Sciences
(Washington, D.C.: National Academy of Sciences, 1990).







Figure 1.1: Recruit Training Process

[Flow diagram: Basic Training -> Occupational Specialty Training -> Assignment to Field in
Specialty]

                                        The training curriculum for each occupational specialty is designed
                                        through a structured set of procedures called Instructional System
                                        Development (ISD) that draws heavily on the work by Tyler and others
                                        on the behavioral objectives of instruction.4 The ISD model consists of the
                                        following five steps:

                                        1. Determine job requirements through detailed analysis of tasks per-
                                        formed in an occupational specialty.

                                        2. Determine type of instruction (formal classroom, on-the-job, or other)
                                        that best suits the student population and task requirements.




                                        4See, for example, R.W. Tyler, Basic Principles of Curriculum and Instruction (Chicago: University of
                                        Chicago Press, 1950); and R.W. Tyler, R.M. Gagne, and M. Scriven, Perspectives of Curriculum
                                        Evaluation (Chicago: Rand McNally, 1967).







                        3. Develop objectives that specify the desired behaviors, the conditions
                        under which they are to be demonstrated, and an acceptable standard of
                        performance.

                        4. Plan and develop instructional methods, media, and equipment.

                        5. Conduct and evaluate instruction.

                        A student’s progress through an ISD-developed curriculum is measured
                        by criterion-referenced tests at the end of each block of training. A stu-
                        dent passes the course after he or she has performed each task identi-
                        fied as a job requirement at the level of competency defined as
                        acceptable. Continuous monitoring of job requirements is needed to
                        assure that course objectives remain relevant.
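The pass rule described above can be sketched in a few lines of Python. This is only an illustration of criterion-referenced scoring in general: the task names, scores, and standards are hypothetical, and the `TaskResult` structure is an assumption made for the example, not something the services are reported to use.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task: str        # hypothetical task identified as a job requirement
    score: float     # student's measured performance on the criterion test
    standard: float  # level of competency defined as acceptable for the task


def passes_course(results):
    """A student passes only if every task meets its own standard."""
    return all(r.score >= r.standard for r in results)


def tasks_below_standard(results):
    """List the tasks on which the student has not yet met the standard."""
    return [r.task for r in results if r.score < r.standard]


# Hypothetical end-of-block results for one student.
results = [
    TaskResult("solder connector", 0.92, 0.90),
    TaskResult("read schematic", 0.85, 0.90),
]

print(passes_course(results))
print(tasks_below_standard(results))
```

Unlike a norm-referenced grade, the outcome here does not depend on how other students performed: a high score on one task cannot offset a score below the standard on another.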

                        Upon successful completion of classroom training in the occupational
                        specialty, the recruit is ready for assignment in the field to carry out the
                        duties requiring the skills acquired during training. Formal training is
                        now complemented by the necessary on-the-job training to permit the
                        recruit to function as part of a unit with a defined mission in a real-
                        world setting.


Objectives, Scope, and Methodology

                        The purpose of our study is twofold: to profile the aptitudes of the
                        recruits who entered the service from 1981 to 1989, and to evaluate the
                        military services’ ability to select successful trainees and to assess their
                        training and work performance. We will examine the three points in a
                        recruit’s service career where data critical to performing a thorough
                        evaluation of training must be collected: (1) at entrance to military life,
                        prior to assignment to an occupational specialty; (2) during training,
                        when the recruit’s mastery of the specialty’s basics is assessed; and (3)
                        after assignment to the field, where what was learned in the classroom
                        must be applied in the work environment. (See figure 1.2.)








Figure 1.2: Data Sources and Comparisons

                                           [Diagram of three data collection points and the comparisons among them.]
                                           (1) Prerecruitment Testing Data Used for Selection and Placement
                                           (2) [label not legible]
                                           (3) Field Evaluation Data on Job Performance
                                           Comparison labels: Comparisons Test the Effectiveness of Selection
                                           Procedures; Comparisons Test the Effectiveness of Classroom Training




                                           The evaluation model underlying our review assumes the need to inter-
                                           relate these three points. Comparing the information collected at points
                                           1 and 2 can provide some insight into the ability of the services to pre-
                                           dict how well recruits will perform in training on the basis of their
                                           scores in qualifying tests. The strength of the relationship between
                                           points 2 and 3 is a partial measure of the validity and effectiveness of
                                           training. Finally, the relationship between points 1 and 3 is an estimate
                                           of the effectiveness of the services’ selection and training procedures.

                                           The model is, of course, simplistic and in need of considerable expansion.
                                           A fully detailed model would have to consider other influences on
                                           performance, such as on-the-job experiences, and would need to be able
                                           to determine the location of a problem if relationships between the three
                                           points were weaker than anticipated. Yet the model, at whatever level
                                           of sophistication, would at a minimum require data at these three
                                           critical points in a recruit’s service career.
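One way to picture the three comparisons in the model is as pairwise correlations among the measures taken at the three points. The sketch below is illustrative only: it assumes a simple Pearson correlation as the measure of "strength of relationship," which the report does not specify, and the function names and sample numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def three_point_comparisons(entrance, training, field):
    """Relate the three data points in the way the evaluation model describes."""
    return {
        "points 1 and 2 (selection predicts training)": pearson(entrance, training),
        "points 2 and 3 (training predicts job performance)": pearson(training, field),
        "points 1 and 3 (selection and training overall)": pearson(entrance, field),
    }


# Hypothetical records for five recruits: qualifying score at entrance,
# end-of-course grade, and a supervisor rating in the field.
entrance = [52, 58, 61, 66, 70]
training = [71, 78, 80, 88, 90]
field = [3.0, 3.4, 3.1, 4.2, 4.0]

for label, r in three_point_comparisons(entrance, training, field).items():
    print(f"{label}: r = {r:.2f}")
```

The sketch also makes the data requirement concrete: if any one of the three lists is missing for a group of recruits, two of the three comparisons cannot be computed for that group.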

                                       We reviewed the information collection practices of each service at the
                                       three points identified in the model. For a selected number of occupational
                                       specialties (our focus is on training for the more technical occupational
                                       specialties), we reviewed the data that have been collected for
                                       insights they provide into the services’ selection and evaluation
                                       procedures, particularly as they affect women and minority groups.

                                       Our study is organized around three evaluation questions, each corre-
                                       sponding to one of the model data points. Each question is addressed in
                                       a separate chapter.

                                       1. How has the aptitude of recruits for technologically sophisticated
                                       specialties changed since 1980?

                                       DOD tracks recruit aptitude according to four broad mental categories
                                       based on the scores on the Armed Forces Qualification Test (AFQT). (See
                                       table 1.1.) AFQT is a composite of four of the ten tests from the Armed
                                       Services Vocational Aptitude Battery (ASVAB) administered to every
                                       potential recruit. We examined some other components of ASVAB in
                                       greater detail, particularly those subtests that are used to qualify candi-
                                       dates for high technology occupational specialties.

Table 1.1: How AFQT Test Results Are Categorized

                                       AFQT category     AFQT percentile score     Trainability
                                       I                 93-99                     Well above average
                                       II                65-92                     Above average
                                       IIIA              50-64                     Average
                                       IIIB              31-49                     Average
                                       IV                10-30                     Below average
                                       V(a)              1-9                       Well below average

                                       (a) Category V examinees are excluded by law from military service.
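The percentile bands in table 1.1 amount to a simple lookup. The following sketch encodes the table as printed; the function names are hypothetical and are used only for illustration.

```python
# Lower bound of each AFQT percentile band, highest band first (table 1.1).
AFQT_BANDS = [
    (93, "I"),
    (65, "II"),
    (50, "IIIA"),
    (31, "IIIB"),
    (10, "IV"),
    (1, "V"),
]


def afqt_category(percentile):
    """Map an AFQT percentile score (1-99) to its mental category."""
    if not 1 <= percentile <= 99:
        raise ValueError("AFQT percentile must be between 1 and 99")
    for lower_bound, category in AFQT_BANDS:
        if percentile >= lower_bound:
            return category


def excluded_by_law(percentile):
    """Category V examinees are excluded from military service (table note)."""
    return afqt_category(percentile) == "V"


print(afqt_category(72))
print(excluded_by_law(8))
```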

                                       2. How useful are the data collected by the services before and during
                                       classroom training for selecting individuals for high technology roles
                                       and for evaluating the effectiveness of this training?

                                       We examined the measures of recruit performance collected during
                                       training and assessed their utility for evaluating training effectiveness,
                                       as well as for providing information on the validity of procedures used
                                       to assign recruits to training.

3. How well do the services’ selection criteria and training evaluation
measures predict success in high technology roles?

We examined the procedures used by each of the services to assess the
impact of training on actual job performance. We also related these pro-
cedures to the ASVAB scores used to select trainees and to classroom mea-
sures of training success, in order to estimate the predictive validity of
these measures.

In view of the demographic shifts projected for the labor force over the
next decade, we provided separate answers to each of these questions,
wherever possible and appropriate, for women and minorities.

We defined high technology roles as those occupational specialties for
which the services require a qualifying score in electronics substantially
above the mean. For our review, we selected a sample of 13 such
courses (five from the Army and four each from the Navy and the Air
Force), from which we collected data on individual student performance.
Each of these courses is intended to provide a recruit the necessary
introductory training to qualify as an apprentice in his specialty.

In the course of our review, we interviewed officials responsible for
training evaluation in the Office of the Secretary of Defense and within
each of the three services. We visited four service training centers and
the facilities maintained by each of the services for research into
training and other personnel issues, as well as the Training Performance
Data Center in the Office of the Secretary of Defense. Our final data
base was compiled from information received from all of these sources,
but our primary source for ASVAB and demographic data was the Defense
Manpower Data Center. We also received information from the Center
for Naval Analyses on technical adjustments to ASVAB validity estimates,
and on the ASVAB norm group. This study was conducted in accordance
with generally accepted government auditing standards.








Strengths and Limitations of Our Study

                     Our review of the quality trends among the 2.3 million recruits who
                     entered military service from 1981 to 1989 is more finely grained than
                     the traditional counts of recruits in each of four mental categories
                     routinely reported to the Congress. We report the differences among racial
                     groupings and between male and female recruits, and we examine
                     differential trends among the various areas measured by ASVAB. We
                     assumed the reliability and validity of the widely researched ASVAB and
                     its subtests and made no independent review of these factors. However,
                     we did develop an independent scoring procedure for ASVAB that
                     suggests an alternative, and apparently more valid, approach to assigning
                     recruits to occupational specialties.

                     The intent of our review of classroom grades and other evaluation mea-
                     sures was to identify the major sources of training evaluation informa-
                     tion now in place in the services, and to make use of the objective data
                     we collected to address some concerns about recent trends in recruit
                     quality and the future composition of the recruit pool.

                     Two important considerations about our sample of students limit any
                     attempt to generalize our findings. First, we deliberately chose occupa-
                     tional specialties for which the services required above average mental
                     qualifications. While the types of classroom measures employed in these
                     courses would most likely be found in other courses with similar
                     requirements, we can say little about the evaluation procedures for less
                     demanding specialties. Second, in part because of the nature of the spe-
                     cialties we chose, our sample contained relatively few members of
                     minority groups and very few women. This fact limited the power of our
                     statistical analysis of these subgroups, and allowed only first-level com-
                     parisons (that is, white versus nonwhite; male versus female). Neverthe-
                     less, even at this level, we believe we have identified some important
                     differences and gaps in the available data for determining the success of
                     training outcomes. These differences and gaps, together with other find-
                     ings from our analyses, strongly suggest the need for further, more
                     targeted evaluation of its training efforts by the military.




Chapter 2

The Quality of Military Recruits: 1981-89


                      In 1980, there were 2.4 million more American youths aged 18-21 than
                      there are today. This age group, which now numbers 15 million, will
                      diminish to 13.5 million by the mid-1990’s. This 15-year 22-percent
                      decline in the population from which the all-volunteer force draws its
                      new personnel must be a matter of concern to military recruiters. The
                      concern is exacerbated when we consider the technological aptitude of
                      the potential recruit pool: it appears that the graduates of our public
                      schools are becoming less technologically literate when compared to
                      their peers in other developed nations, and this decline is occurring just
                      as our weapons systems are reaching new heights of technological
                      sophistication.
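The 22-percent figure follows from the three population numbers given in the paragraph above. The short calculation below simply reproduces that arithmetic; the variable names are illustrative.

```python
# Figures stated in the text, in millions of youths aged 18-21.
youth_now_millions = 15.0        # size of the age group at the time of the report
extra_in_1980_millions = 2.4     # how many more there were in 1980
youth_mid_1990s_millions = 13.5  # projected size by the mid-1990's

youth_1980_millions = youth_now_millions + extra_in_1980_millions
decline_millions = youth_1980_millions - youth_mid_1990s_millions
decline_percent = 100 * decline_millions / youth_1980_millions

print(f"1980 population: {youth_1980_millions:.1f} million")
print(f"Decline: {decline_millions:.1f} million ({decline_percent:.0f} percent)")
# prints:
# 1980 population: 17.4 million
# Decline: 3.9 million (22 percent)
```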

                      However, by the standards set by DOD, the quality of military recruits in
                      the first half of the 1980’s did not decline in proportion to the dwindling
                      numbers in the recruit pool. As we have noted in the previous chapter,
                      DOD reported “the most remarkable turnaround in peacetime history”
                      between 1980 and 1986, with dramatic increases in the proportion of
                      recruits who had graduated from high school and who scored in the top
                      three AFQT categories.

                      In this chapter, we will address our first evaluation question: How has
                      the aptitude of recruits for technologically sophisticated specialties
                      changed since 1980? Our purpose is threefold: (1) to determine whether
                      the quality gains as defined and reported by the services in the first half
                      of the 1980’s are being maintained; (2) to expand the definition of
                      quality to include other measures beyond those traditionally reported
                      (that is, high school graduation and service-defined mental category);
                      and (3) to examine in greater detail two occupational specialties that, by
                      service definition, require higher entry levels of technological sophisti-
                      cation. We will report the trends we found in the scores achieved by
                      recruits from fiscal year 1981 through fiscal year 1989 on some of the
                      various subtests and composites of the Armed Services Vocational Apti-
                      tude Battery (ASVAB), the instrument used by all services to both qualify
                      applicants for entry and classify recruits into occupational specialties.
                      We will examine in detail those scores that are used by the services to
                      qualify recruits for more technologically demanding specialties.



Armed Services Vocational Aptitude Battery (ASVAB)

                      tant for military service. Scores from ASVAB subtests are combined to
                      form composite scores thought to be related to general types of occupational
                      specialties within the armed forces. While different services use
                      different methods to combine subtest scores into composites, all services
                      use the same component subtests for two composite scores, the Armed
                      Forces Qualification Test (AFQT) and the Electronics Composite. We
                      examined these two in detail to determine how they have changed
                      during the 1980’s.


Armed Forces Qualification Test (AFQT)

                                   An AFQT score is currently derived from a recruit’s scores on four ASVAB
                                   subtests: Word Knowledge, Paragraph Comprehension, Arithmetic Reasoning,
                                   and Mathematics Knowledge.1 AFQT scores are the primary
                                   mental criterion for entry into the armed services. Figure 2.1 displays
                                   the mean composite AFQT scores for men and women from 1981 through
                                   1989. Actual mean scores for this period may be found in appendix I.


Figure 2.1: Mean AFQT Scores, by Gender: 1981-89

                                   [Line chart: mean AFQT score (vertical axis, approximately 185 to 215) by
                                   year, 1981 through 1989, with separate lines for male and female recruits.]

                                   Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
                                   Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
                                   as of January 1, 1989.

                                   Source: Data are from the Defense Manpower Data Center.
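The composite described in the note to figure 2.1 can be written out directly. The sketch below follows the formula as the note states it; the function name and the example standard scores are hypothetical, and the Verbal standard score is taken as a given input rather than derived from its component subtests.

```python
def afqt_composite(arithmetic_reasoning, mathematics_knowledge, verbal):
    """Sum of the AR and MK standard scores plus twice the Verbal standard score."""
    return arithmetic_reasoning + mathematics_knowledge + 2 * verbal


# Hypothetical recruit with a standard score of 50 on every component.
print(afqt_composite(50, 50, 50))  # prints 200
```

Because the Verbal score is doubled, a one-point change in Verbal moves the composite twice as far as a one-point change in either mathematics subtest.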


                                   1Before 1989, AFQT scores were computed differently. In order to maintain comparability, we
                                   computed AFQT scores of all recruits using the 1989 definition and the standard subtest scores provided
                                   by the Defense Manpower Data Center.







                                         Overall AFQT scores improved approximately eight points between 1981
                                         and 1989. This improvement occurred among both male and female
                                         recruits. However, despite fluctuations over the years, the scores of
                                         male recruits began and ended the decade slightly higher than female
                                         scores. Male scores continued to increase each year until 1988, although
                                         their rate of increase was greatest in the first four years. Female scores
                                         improved dramatically from 1981 to 1983 but then flattened out, so that
                                         by the end of the decade they were lower than in any year since 1985.

                                         AFQT scores differed more substantially across racial/ethnic groupings
                                         than between genders. (See figure 2.2.) White recruits began the decade
                                         with scores approximately 21 points higher than minority recruits. By
                                         1989, this difference had shrunk to 15 points. The bulk of the relative
                                         gain by minority recruits, however, had occurred by 1985, and any nar-
                                         rowing of this gap since then has been slight.


Figure 2.2: Mean AFQT Scores, by Race/Ethnicity: 1981-89

                                         [Line chart: mean AFQT score by year, 1981 through 1989, with separate
                                         lines for white, black, Hispanic, and other recruits.]

                                         Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
                                         Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
                                         as of January 1, 1989.

                                         Source: Data are from the Defense Manpower Data Center.







                                   Mean AFQT scores in all services were significantly higher in 1989 than
                                   in 1981. (See figure 2.3.) Army recruits showed the greatest gain.
                                   Average Army scores were substantially lower than those of other ser-
                                   vices at the beginning of the decade, but by 1986 they had increased to
                                   approximately the same level as scores achieved by Navy and Marine
                                   recruits. Navy scores peaked in 1983 and have declined somewhat
                                   slowly and erratically since then to a level less than 2 points higher than
                                   they were at the beginning of the decade. Air Force AFQT scores have
                                   consistently averaged higher than the other services’ and have not dis-
                                   played their tendency to plateau at mid-decade levels.


Figure 2.3: Mean AFQT Scores, by Service: 1981-89

                                   [Line chart: mean AFQT score (vertical axis, approximately 195 to 225) by
                                   year, 1981 through 1989, with separate lines for the Army, Navy, Air Force,
                                   and Marine Corps.]

                                   Note: AFQT scores were computed as the sum of standard scores on Arithmetic Reasoning and
                                   Mathematics Knowledge, plus the Verbal standard score times two. This is the formula used by DOD
                                   as of January 1, 1989.

                                   Source: Data are from the Defense Manpower Data Center.

                                    Figure 2.4 displays the service-wide mean scores on each of the four
                                    component subtests that make up AFQT. For two of the subtests, Word
                                    Knowledge and Paragraph Comprehension, the pattern is quite similar,
                                    with the sharpest gains occurring by 1985, and little change thereafter.






                                        Scores in Mathematics Knowledge and Arithmetic Reasoning increased
                                        substantially between 1981 and 1984. Arithmetic Reasoning scores
                                        declined after that point, but scores in Mathematics Knowledge have
                                        continued to rise and were the only subtest scores to increase from fiscal
                                        year 1988 to fiscal year 1989.


Figure 2.4: Mean AFQT Subtest Scores, 1981-89

                                        [Line chart: mean standard score (vertical axis, approximately 50 to 54) by
                                        year, 1981 through 1989, with separate lines for Arithmetic Reasoning, Word
                                        Knowledge, Paragraph Comprehension, and Mathematics Knowledge.]

                                        Source: Data are from the Defense Manpower Data Center.


Electronics Composite Scores

                                        The Electronics Composite score is defined by each service as the sum of
                                        four subtest scores: Arithmetic Reasoning, Mathematics Knowledge,
                                        Electronics Information, and General Science. Figure 2.5 displays the
                                        mean Electronics Composite score for men and women from 1981
                                        through 1989. Figure 2.6 presents the same information by racial/ethnic
                                        grouping.
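As a rough illustration of how this composite is built and how it overlaps with AFQT, the sketch below sums the four subtests named above and lists the subtests the two composites share. The dictionary layout, function name, and sample scores are hypothetical; only the subtest lists come from the report.

```python
ELECTRONICS_SUBTESTS = (
    "Arithmetic Reasoning",
    "Mathematics Knowledge",
    "Electronics Information",
    "General Science",
)

AFQT_SUBTESTS = (
    "Word Knowledge",
    "Paragraph Comprehension",
    "Arithmetic Reasoning",
    "Mathematics Knowledge",
)


def electronics_composite(standard_scores):
    """Sum the standard scores of the four Electronics Composite subtests."""
    return sum(standard_scores[name] for name in ELECTRONICS_SUBTESTS)


# Subtests that feed both composites.
shared = sorted(set(ELECTRONICS_SUBTESTS) & set(AFQT_SUBTESTS))

# Hypothetical standard scores for one recruit.
scores = {
    "Arithmetic Reasoning": 53,
    "Mathematics Knowledge": 55,
    "Electronics Information": 49,
    "General Science": 51,
}

print(electronics_composite(scores))  # prints 208
print(shared)  # prints ['Arithmetic Reasoning', 'Mathematics Knowledge']
```

Two of the four subtests are common to both composites, so group differences on one composite would be expected to show up, at least in part, on the other.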








Figure 2.5: Mean Electronics Composite Scores, by Gender: 1981-89

                                         [Line chart: mean Electronics Composite score (vertical axis, approximately
                                         190 to 215) by year, 1981 through 1989, with separate lines for male and
                                         female recruits.]

                                         Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
                                         Reasoning, Mathematics Knowledge, Electronics Information, and General Science.

                                         Source: Data are from the Defense Manpower Data Center.








Figure 2.6: Mean Electronics Composite
Scores, by Race/Ethnicity: 1981-89
                                         220


                                         215


                                         210


                                         20s


                                         200


                                         1%


                                         190


                                         1%

                                           1Wl          1982        1903     1004       1965         lnm      1987       1988         1989



                                                 -       WHITE
                                                 -1--    BLACK
                                                 B       HISPANIC
                                                 mmmu    OTHER

                                         Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
                                         Reasoning, Mathematics Knowledge, Electronics Information, and General Science.

                                         Source: Data are from the Defense Manpower Data Center.


                                         Electronics Composite mean scores rose approximately 3-1/2 points
                                         between 1981 and 1989. They peaked in 1984 and experienced a gradual
                                         decline thereafter. Female recruits scored approximately 11 points
                                         lower than male recruits during this period.

                                         Because of the overlap between the Electronics Composite and AFQT, the
                                         racial differences are similar. In 1981, white recruits scored approxi-
                                         mately 24 points higher than minorities on this composite. By 1989, the
                                         gap had narrowed to approximately 19 points, but most of these gains
                                         by minorities were attained in the earlier part of the decade. By 1989,
                                         the scores of all racial groups were declining.

                                         The interservice pattern of Electronics Composite scores is again similar
                                         to the AFQT patterns discussed previously. (See figure 2.7.) Army scores
                                         progressed from an average of ten points lower than the next closest
                                         service in 1981 to being essentially the same as Navy and Marine scores






                                         by 1986. Mean scores for these three services changed very little from
                                         1985 to 1988, but Army and Navy scores declined significantly in 1989.
                                         Air Force scores have remained higher than other services’ but have
                                         fluctuated irregularly since 1984.


Figure 2.7: Mean Electronics Composite
Scores, by Service: 1981-89
                                         [Chart not reproduced. Series: Army, Navy, Air Force, Marine Corps.]

                                         Note: Electronics Composite scores were computed as the sum of standard scores on Arithmetic
                                         Reasoning, Mathematics Knowledge, Electronics Information, and General Science.

                                         Source: Data are from the Defense Manpower   Data Center.

                                         The trends during this period were not the same for all the subtests that
                                         comprise the Electronics Composite score. (See figure 2.8.) Scores in
                                         General Science and Mathematics Knowledge increased steadily over
                                         these years. Scores in Arithmetic Reasoning increased from 1981 to
                                          1983 but by 1986 had declined again and have since remained relatively
                                         constant. In 1981, recruits scored higher in Electronics Information than
                                         in the other component subtests, but by 1988 the scores were lower than
                                         for other subtests and lower even than they had been at the beginning of
                                         the decade. In 1989, they declined further.
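
As the figure notes state, the Electronics Composite is the sum of standard scores on four subtests. A minimal sketch of that computation follows. The function name, the dictionary keys, and the sample scores are illustrative assumptions and are not taken from Defense Manpower Data Center records.

```python
# Subtests that make up the Electronics Composite, per the figure notes.
ELECTRONICS_SUBTESTS = (
    "arithmetic_reasoning",
    "mathematics_knowledge",
    "electronics_information",
    "general_science",
)


def electronics_composite(standard_scores):
    """Return the sum of the four subtest standard scores."""
    return sum(standard_scores[name] for name in ELECTRONICS_SUBTESTS)


# Hypothetical recruit; these scores are made up for illustration.
recruit = {
    "arithmetic_reasoning": 54,
    "mathematics_knowledge": 52,
    "electronics_information": 50,
    "general_science": 53,
}
print(electronics_composite(recruit))  # 209
```

Because the composite is a simple sum, a one-point change in the mean of any one subtest moves the composite mean by one point, so rising and falling subtest trends can partly offset one another in the composite.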








Figure 2.8: Mean Electronics Composite
Subtest Scores, 1981-89
                                         [Chart not reproduced. Y-axis: standard score. X-axis: 1981 through 1989. Series: Arithmetic
                                         Reasoning, General Science, Electronics Information, Mathematics Knowledge.]

                                         Source: Data are from the Defense Manpower    Data Center.


Number of Recruits
Qualified for High
Technology Specialties

                                         An alternative method for examining trends in recruit qualifications is
                                         to count the recruits whose ASVAB scores meet the minimum
                                         standards required for entry into certain occupational specialties.
                                         Each service defines “cutting scores” for classifying recruits; that is, a
                                         minimum score on one or more ASVAB composites is required for entry
                                         into training for each specialty.2 This score can be adjusted to control
                                         flow into specialties as needed. We chose two of the more demanding
                                         specialties, both of them in the Air Force, and computed the number of
                                         recruits into each service from 1981 to 1989 whose ASVAB scores would
                                         have qualified them for technical training in these specialties. We chose
                                         these specialties as examples of high technology military occupations
                                         because they share cutting scores with a number of other technologically
                                         oriented specialties. Our purpose was not to imply either a surplus
                                         or deficit of requisite manpower.
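
The cutting-score rule described above can be sketched as a simple threshold test. The example below is illustrative only: the function names, the dictionary layout, and the recruit scores are assumptions. The two sets of cutting scores are the ones cited in this chapter's footnotes. The sketch treats a score equal to the cutting score as qualifying and ignores nonscore qualifications such as security clearance.

```python
def qualifies(composites, cutting_scores):
    """True if every required composite meets or exceeds its cutting score."""
    return all(
        composites.get(name, 0) >= minimum
        for name, minimum in cutting_scores.items()
    )


def count_qualified(recruits, cutting_scores):
    """Count recruits whose composites satisfy all cutting scores."""
    return sum(1 for r in recruits if qualifies(r, cutting_scores))


# Cutting scores cited in this chapter's footnotes.
RADAR_SPECIALIST = {"electronics": 230}
SYSTEMS_REPAIR = {"electronics": 236, "mechanical": 247}

# Hypothetical recruits (made-up composite scores).
recruits = [
    {"electronics": 241, "mechanical": 250},
    {"electronics": 232, "mechanical": 255},
    {"electronics": 236, "mechanical": 240},
    {"electronics": 210, "mechanical": 260},
]
print(count_qualified(recruits, RADAR_SPECIALIST))  # 3
print(count_qualified(recruits, SYSTEMS_REPAIR))    # 1
```

Requiring two composites at once shrinks the qualified pool faster than either requirement alone, because a recruit must clear both thresholds.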



                                         2Other qualifications may also apply: for example, possession of a valid driver’s license, special
                                         physical qualifications, or the ability to obtain appropriate levels of security clearance.







                                            Figure 2.9 depicts the number of recruits during the period in question
                                            who would have qualified for training as control and warning radar spe-
                                            cialists in the Air Force on the basis of their ASVAB scores.3 In 1981,
                                            approximately 38,000 recruits qualified for this specialty. By 1986, the
                                            number of recruits qualifying had risen to more than 69,000, but since
                                            then the number has declined to just under 58,000. In 1981, 87 percent
                                            of the recruits qualifying for training as control and warning radar spe-
                                            cialists were white males, although only about two thirds of 1981
                                            recruits were white males. These proportions had not changed substan-
                                            tially by 1989, when white males comprised 84 percent of qualified
                                            recruits but only 61 percent of the general recruit population.


Figure 2.9: Number of Recruits Qualifying
for Training as Control and Warning
Radar Specialists, 1981-89
                                            [Chart not reproduced. X-axis: year, 1981 through 1989. Series: Other, White Male.]

                                            Source: Data are from the Defense Manpower Data Center.

                                            Because the total manpower quotas for the services have varied over
                                            this period, we also computed the percent of all recruits within the


                                            3We used the cutting score that was current for Air Force recruits in May 1989: an Electronics
                                            Composite score of 230.







                                          gender and racial/ethnic groups who qualified for this specialty. The
                                          results are displayed in figure 2.10.


Figure 2.10: Percent of Recruits
Qualifying for Training as Control and
Warning Radar Specialists, 1981-89

                                         [Chart not reproduced. Y-axis: percent of recruits qualifying. X-axis: 1981 through 1989.
                                         Series: White Male, Nonwhite Male, White Female, Nonwhite Female.]

                                         Source: Data   are from the Defense Manpower Data Center.

                                         While nearly a third of white males who entered the services during this
                                         period qualified on the basis of their Electronics Composite scores for
                                         this occupational specialty, fewer than 15 percent of white females
                                         qualified. Fewer than 10 percent of minority males and approximately 3
                                         percent of minority females qualified.
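
Two different proportions appear in this discussion: the share of the qualified pool that belongs to a group, and the percent of a group that qualifies. A minimal sketch of the distinction follows. The group labels, scores, and function name are hypothetical. Only the cutoff of 230 comes from the report.

```python
from collections import Counter


def qualification_summary(recruits, cutoff):
    """Return (rate within each group, each group's share of the qualified pool)."""
    totals = Counter(r["group"] for r in recruits)
    qualified = Counter(r["group"] for r in recruits if r["electronics"] >= cutoff)
    n_qualified = sum(qualified.values())
    rate = {g: qualified[g] / totals[g] for g in totals}
    share = {g: (qualified[g] / n_qualified if n_qualified else 0.0) for g in totals}
    return rate, share


# Hypothetical data: ten recruits in two groups, made-up scores.
recruits = (
    [{"group": "A", "electronics": s} for s in (240, 235, 228, 215, 231, 220)]
    + [{"group": "B", "electronics": s} for s in (233, 210, 205, 219)]
)
rate, share = qualification_summary(recruits, cutoff=230)
print(rate)   # {'A': 0.5, 'B': 0.25}
print(share)  # {'A': 0.75, 'B': 0.25}
```

The within-group rate does not depend on how large each group is, which is why it is the more useful measure when total manpower quotas and recruit demographics change from year to year.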

                                         The demographic differences are even more sharply defined when the
                                         occupational specialty of Systems Repair Technician is examined. (See
                                         figures 2.11 and 2.12.)








Figure 2.11: Number of Recruits
Qualifying for Training as Systems
Repair Technicians, 1981-89
                                     [Chart not reproduced. Y-axis: number of recruits, 14,000 to 30,000.]




                                     Source: Data are from the Defense Manpower Data Center.








Figure 2.12: Percent of Recruits
Qualifying for Training as Systems
Repair Technicians, 1981-89




                                     [Chart not reproduced. X-axis: 1981 through 1989. Series: White Male, Other.]

                                     Source: Data are from the Defense Manpower Data Center.

                                     In 1981, 16,563 recruits met the demanding qualifications for training in
                                     this field.4 The number of qualified recruits increased sharply by 1983,
                                     but by the end of the decade it had dropped to within 700 of its 1981
                                     level. The vast majority of these were white males, of whom approxi-
                                     mately 11 percent qualified. Fewer than 2 percent of our other demo-
                                     graphic groups met the qualifications.

Summary and
Conclusions

                                     As we approach the twenty-first century, the sophistication of our
                                     weapons systems can be expected to impose greater demands on the
                                     technological competence of the individual members of the armed
                                     forces. In addition, the youth pool from which the services will draw
                                     their recruits will become increasingly female and minority. And
                                     although we cannot foresee how reduced political tensions may ease the
                                     demands on this pool, our examination of recruit quality trends during
                                     the 1980’s is not reassuring concerning the military’s ability to meet
                                     these challenges.


                                     4This specialty requires an ASVAB Electronics Composite score of 236 and a mechanical score of 247,
                                     requirements that rank it among the most challenging fields in all of the services.







AFQT scores and, to a lesser extent, Electronics Composite scores are
higher now than they were in 1981, yet both have begun to decline. The
Electronics Information subtest scores are lower than they were in 1981,
and General Science scores have dropped to near their 1981 level. Thus,
fewer recruits are qualifying for the more demanding technical occupa-
tional specialties.

Women and minorities have traditionally scored lower in these areas.
While the gap between white males and other recruits narrowed some-
what in the early 1980’s, since mid-decade the race and gender differ-
ences have remained fairly constant. As we discussed in the previous
chapter, women and minorities will form the bulk of the new-entry labor
pool by the year 2000, and therefore providing well-trained personnel
for a technologically sophisticated military can be expected to become
increasingly difficult. The burden on training will increase, and with it
will come the need to monitor the effectiveness of this training as recruit
demographics shift.

In the following chapters, we will address the services’ current ability to
measure the effectiveness of their training in technologically demanding
areas. We will also examine the differences among gender and racial/
ethnic groupings, and the ability of the AFQT and Electronics Composite
scores to predict success in technical military specialties.




Chapter 3

Classroom Measures of Training Effectiveness


                In this chapter, we address our second evaluation question: How useful
                are the data collected by the services before and during classroom
                training for selecting individuals for high technology roles and for eval-
                uating the effectiveness of this training? Although we reviewed a broad
                spectrum of evaluation-related materials and activities performed by
                the services at the classroom level, we concentrated on the course
                grades assigned at the end of training and, in some cases, at interme-
                diate stages during the training process. Our intention was to define the
                extent to which appropriate data were available to the services and to
                external reviewers from which some judgments could be made about
                training effectiveness. We did not attempt to perform an evaluation of
                individual curricula, training sites, or instructors.

                Our primary criterion for selecting courses for review was that the qual-
                ifying score for course entry, as established by the service, was rela-
                tively high. In addition, we considered annual trainee throughput and
                the recent stability of the course curriculum. Nearly all the courses
                which met our criteria were in the electronics area, and most involved
                the use, maintenance, and repair of electronic equipment, particularly
                radar or sonar. We collected the course grades associated with advanced
                individual training for 13 occupational specialties, four each in the Navy
                and Air Force, and five in the Army. Some of the data were collected at
                the training site, and some from centrally computerized records.

                Because of large differences between the services in annual throughput
                of trainees in these courses, the size of our sample varied widely across
                services. This variation was increased by problems we encountered con-
                cerning the usefulness of certain data provided by the Army (see the
                following section), as well as by our decision to supplement our already
                sizable Navy data base with relevant data previously collected by the
                Navy for research purposes. Our final sample consisted of more than
                6,000 sailors, nearly 1,000 Air Force personnel, and fewer than 300
                soldiers. In this chapter, we present the results of our analysis sepa-
                rately for each service.

                We examined the course data for their apparent reliability (that is, for
                their apparent ability to discriminate meaningfully between performances
                of trainees) as well as for differences in training outcomes among
                the demographic groupings discussed in the previous chapter. We also
                examined the relationship between training outcomes and individual
                abilities, as measured by ASVAB, in order to estimate the power of the
                selection criteria to predict performance in training.







                                           The Army specialties for which we collected data are listed in table 3.1.

Table 3.1: Army Occupational Specialties
Reviewed
                                                                                                                          Electronics Composite
                                           Specialty  Title                                      Location                 qualifying score(a)
                                           24J        Hawk pulse radar repairer                  Redstone Arsenal, Ala.   217
                                           27N        Forward area alerting radar repairer       Redstone Arsenal, Ala.   217
                                           29V        Strategic microwave systems repairer       Fort Gordon, Ga.         217
                                           36L        Transportable automatic systems operator   Fort Gordon, Ga.         217
                                           39B        Automatic test equipment operator          Fort Gordon, Ga.         217
                                           (a) Sum of subtest standard scores.

                                           We found that the course grades for these five specialties were not
                                           equally reliable indicators of performance during training. Whereas for
                                           the two classes at Redstone Arsenal final grades were a simple arith-
                                           metic average of intermediate measures of performance, at Fort Gordon
                                           we were unable to find a consistent relationship between individual
                                           milestone measures and final grades, nor were we able to locate anyone
                                           at Fort Gordon who could suggest one. We concluded that the grades
                                           recorded for two of these courses (36L and 39B) could not be used to
                                           discriminate reliably between the performances of individual trainees.
                                           We found inconsistencies in scoring procedures between different
                                           classes and even within the same class. Finally, we discovered that the
                                           Fort Gordon grades (unlike those at Redstone) were based partially on
                                            measures of physical conditioning that appeared to be unrelated to job
                                            performance.

                                           For a third training course at Fort Gordon (29V), however, we were able
                                           to generate what we judged to be reasonable measures of performance
                                           for some classes. For these classes, we developed an algorithm to produce
                                           scores based only on those nonconstant measures that were related
                                           to general or applied electronics training.1
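
The report does not describe the scoring algorithm in detail. The sketch below shows one way such a rescoring could work, under loud assumptions: it drops any measure that is constant across trainees, keeps only measures flagged as electronics-related, and takes an unweighted average of what remains. The measure names, the data, and the use of a simple average are all hypothetical.

```python
def revised_grades(records, electronics_measures):
    """Average each trainee's scores over electronics-related measures that vary."""
    usable = [
        m for m in electronics_measures
        if len({rec[m] for rec in records.values()}) > 1  # drop constant measures
    ]
    if not usable:
        raise ValueError("no nonconstant electronics measures available")
    return {
        trainee: sum(rec[m] for m in usable) / len(usable)
        for trainee, rec in records.items()
    }


# Hypothetical milestone scores for three trainees.
records = {
    "t1": {"dc_circuits": 90, "ac_circuits": 80, "lab_checkoff": 100, "fitness": 70},
    "t2": {"dc_circuits": 70, "ac_circuits": 90, "lab_checkoff": 100, "fitness": 95},
    "t3": {"dc_circuits": 80, "ac_circuits": 70, "lab_checkoff": 100, "fitness": 60},
}
# Assumed list of electronics-related measures; "fitness" is excluded by design.
electronics = ["dc_circuits", "ac_circuits", "lab_checkoff"]
print(revised_grades(records, electronics))
# {'t1': 85.0, 't2': 80.0, 't3': 75.0}
```

In this made-up data, the constant measure carries no information about differences between trainees, and the fitness measure is unrelated to electronics, so neither contributes to the revised score.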



                                            1External corroboration of the preferability of this improvised scoring procedure was provided by
                                            our later analysis of the relationship between grades and ASVAB. The correlation between original
                                            29V grades and the Electronics Composite was negative and nonsignificant. The revised grades were
                                            positively (.50) and significantly correlated (p < .01) with this ASVAB score.







                                          Our final sample was therefore composed of U.S. Army trainees from
                                          those 24J and 27N classes conducted in fiscal years 1985 through 1988
                                          whose records were available at the time of our visit, and approximately
                                          one third of the 29V trainees from the same period. Table 3.2 presents
                                          the mean scores of this sample on AFQT, the Electronics Composite of
                                          ASVAB, and course grades.2

Table 3.2: Mean Scores on Predictor and
Criterion Variables, Army
                                                         AFQT                Electronics Composite   Grade
                                          Category       Number  Mean(a)     Number  Mean(a)         Number  Mean
                                          Male           280     232.15      280     238.46          232     89.23
                                          Female          23     232.87       23     230.13           23     86.08
                                          White          255     234.00      255     240.00          160     90.19
                                          Nonwhite        48     222.67       48     226.29           95     86.86
                                          Total          303     232.20      303     237.83          255     88.95
                                          (a) Sum of subtest standard scores.
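
The subgroup and total rows of table 3.2 are related by simple arithmetic: a total mean is the size-weighted average of the subgroup means. The following sketch works that arithmetic through for the male and female rows, using the numbers and means shown in the table. The function name is an illustrative assumption.

```python
def pooled_mean(groups):
    """Size-weighted mean of (number, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (number, mean) pairs from table 3.2: male, then female.
afqt = [(280, 232.15), (23, 232.87)]
electronics = [(280, 238.46), (23, 230.13)]
grade = [(232, 89.23), (23, 86.08)]

print(round(pooled_mean(afqt), 2))         # 232.2
print(round(pooled_mean(electronics), 2))  # 237.83
print(round(pooled_mean(grade), 2))        # 88.95
```

These pooled values agree with the Total row of the table. Because female trainees make up a small share of the sample, their lower Electronics Composite mean moves the total only slightly below the male mean.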

                                          Male trainees in these courses scored significantly higher than did
                                          females, and white trainees performed better than minority students.
                                          These performance differences correspond to group-level differences in
                                          both AFQT and Electronics Composite scores for racial/ethnic groupings.

                                          The group means presented in table 3.2 also suggest that AFQT and Elec-
                                          tronics Composite scores do not equally predict success in training, at
                                          least for females. While female trainees entered training with Elec-
                                          tronics Composite scores significantly lower than those of males, the
                                          AFQT scores of female and male trainees were equivalent. In other words,
                                          it would appear that Electronics Composite scores are a better indication
                                          of future performance in these occupational specialties than are AFQT
                                          scores. This is consistent with ASVAB’s role in the military accession
                                          process: potential recruits are admitted to service on the basis of AFQT
                                          scores, and then are assigned to occupational specialties for which they
                                          qualify on the basis of their scores on other ASVAB composites.

                                          We tested this hypothesis more directly by examining the correlations
                                          between course grades and three ASVAB scores: AFQT, Electronics Com-
                                          posite, and a “factor score.” This last measure is the weighted sum of all
                                          ten ASVAB subtests. We derived this last score by principal component
                                          analysis of ASVAB subtest scores. The results of our correlation analysis
                                          are displayed in table 3.3.
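
The analysis described above has two ingredients: a correlation coefficient between a predictor and course grades, and a factor score formed as a weighted sum of subtests with weights taken from a principal component analysis. The sketch below illustrates both in plain Python. It is not the procedure or software used for this report, which the text does not specify. It assumes Pearson correlations, standardized subtests, and weights from the first principal component of the correlation matrix, found here by power iteration. The data are made up and use three subtests rather than ten.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(column):
    """Convert a column to z-scores (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def first_component_weights(columns, iterations=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def factor_scores(columns):
    """Weighted sum of standardized subtests using first-component weights."""
    weights = first_component_weights(columns)
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [sum(weights[j] * z[j][i] for j in range(len(columns))) for i in range(n)]


# Hypothetical data: three subtests and course grades for six trainees.
subtests = [
    [50, 55, 60, 45, 65, 58],
    [48, 57, 62, 47, 63, 55],
    [52, 50, 61, 44, 66, 59],
]
grades = [82, 85, 91, 78, 95, 88]
scores = factor_scores(subtests)
print(round(pearson(scores, grades), 2))
```

When all subtests are positively intercorrelated, as in this made-up data, the first-component weights are all positive, so the factor score behaves like a general ability measure across the subtests.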

                                          2See appendix II for similar statistics on the course level.







Table 3.3: Intercorrelation of Study
Variables, Army(a)
                                                                                Electronics                 Grade(e)
                                         Category                   AFQT(b)     Composite(c)    Factor(d)   Raw        Adjusted(f)
                                         Total
                                           AFQT                     1.00        0.81(g)         0.84(g)     0.29(g)    0.41(g)
                                           Electronics Composite    303         1.00            0.89(g)     0.43(g)    0.59(g)
                                           Factor                   303         303             1.00        0.42(g)
                                           Grade                    189         189             189         1.00
                                         Male
                                           AFQT                     1.00        0.83(g)         0.85(g)     0.31(g)    0.43(g)
                                           Electronics Composite    280         1.00            0.89(g)     0.42(g)    0.58(g)
                                           Factor                   280         280             1.00        0.41(g)
                                           Grade                    171         171             171         1.00
                                         Female
                                           AFQT                     1.00        0.82(g)         0.87(g)     0.42       0.53(g)
                                           Electronics Composite    23          1.00            0.89        0.35       0.5[illegible]
                                           Factor                   23          23              1.00        0.35
                                           Grade                    18          18              18          1.00
                                         White
                                           AFQT                     1.00        0.80(g)         0.82(g)     0.24(g)    [illegible]
                                           Electronics Composite    255         1.00            0.87(g)     0.40(g)    0.60(g)
                                           Factor                   255         255             1.00        0.40(g)
                                           Grade                    154         154             154         1.00
                                         Nonwhite
                                           AFQT                     1.00        0.78(g)         0.89        0.19       0.22
                                           Electronics Composite    48          1.00            0.89(g)     0.30       0.40
                                           Factor                   48          48              1.00        0.26
                                           Grade                    35          35              35          1.00

                                         YIorrelatlon coefflclents are In upper diagonal and number In lower diagonal.
                                         bAFQT = sum of subtest standard scores
                                         CElectronlcs Composite = sum of subtest standard scores for Electronics Composite
                                         dFactor = score from first factor from principal component analysis
                                         eGrade = final course grade
                                         ‘Adjusted = correlation adjusted for restnction of range
                                         gp < .05

       For our whole Army sample, the variation within Electronics Composite
       scores explains approximately 18 percent of the variation within course
       grades, more than factor scores and substantially more than AFQT.[3] In
       most cases, Electronics Composite scores are somewhat better predictors
       of grades than are AFQT scores, whether a simple correlation coefficient
       or a coefficient adjusted for range restriction is used as a criterion.[4]
       This is not true, however, for female soldiers, for whom AFQT predicts
       classroom performance better than the Electronics Composite does. In most
       cases, ASVAB factor scores provide stronger predictions than either AFQT
       or the Electronics Composite. Our ability to predict course grades from
       any of the three ASVAB scores is weakest for minority soldiers as a group.
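
As an illustration of the arithmetic behind the percentages above (this sketch is not part of the original analysis), the share of variance two measures have in common is the square of their correlation coefficient. The function name variance_explained is hypothetical; the two coefficients are the Army Electronics Composite values cited in this chapter's footnote on common variance.

```python
def variance_explained(r: float) -> float:
    """Return the variance two measures share, as a percentage (100 * r squared)."""
    return 100.0 * r * r


# Army totals, Electronics Composite versus final course grade
raw_r = 0.43       # simple correlation
adjusted_r = 0.59  # correlation adjusted for restriction of range

print(f"raw:      {variance_explained(raw_r):.1f} percent")
print(f"adjusted: {variance_explained(adjusted_r):.1f} percent")
```

The raw value corresponds to the "approximately 18 percent" quoted in the text, and the adjusted value rounds to 35 percent.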

       Our analysis of nonwhite and female soldiers is unfortunately based on
       a relatively small sample. Nevertheless, it suggests that AFQT or some
       other general score from ASVABmay provide a better predictor of success
       for women recruits in electronics-related training than does the Elec-
       tronics Composite score. It also indicates that we need better predictors
       than we currently have for minority students.
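
The caution about small samples can be made concrete. The sketch below is an illustration only: the report does not state which significance test was used, so Fisher's z transformation with a two-sided normal approximation is an assumption, and the function name is hypothetical. It compares a raw correlation of 0.42 (the AFQT-grade value for the 18 Army women with course grades in table 3.3) with the same coefficient computed, hypothetically, on all 189 graded Army trainees.

```python
import math


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for the hypothesis of zero correlation.

    Uses Fisher's z transformation: atanh(r) * sqrt(n - 3) is treated as a
    standard normal deviate. This is an assumed test, not the report's.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


p_small = fisher_z_p_value(0.42, 18)   # Army women, AFQT versus grade
p_large = fisher_z_p_value(0.42, 189)  # hypothetical: same r, full Army sample size

print(f"n = 18:  significant at .05? {p_small < 0.05}")
print(f"n = 189: significant at .05? {p_large < 0.05}")
```

Under these assumptions the coefficient is not distinguishable from zero at n = 18 but would be at n = 189, which is why the subgroup results are best read as suggestive.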


Navy

       We examined four Navy training courses, two each from the Antisubmarine
       Warfare School in San Diego and the Naval Air Station in
       Millington, Tennessee. They are listed in table 3.4.




       [3] A correlation coefficient is the square root of common variance. In this case, the
       Electronics Composite score from ASVAB shares 18.5 percent (.43²) of variance with
       grades, or, after adjustment, 35 percent (.59²).
       [4] The adjustment for restriction in range is common among psychometricians and appears
       in all DOD reports that we reviewed. Since correlations are simply measures of the extent
       to which two measures vary in common, any restriction to the variation of one of the
       measures results in an underestimate of their common variation. This restriction occurs
       when the sample includes only one end of a spectrum of scores, as is the case for any
       measure used for selection purposes. Our sample includes only those whose AFQT scores
       were sufficiently high to permit acceptance into military service. The adjusted
       correlation coefficient represents the hypothetical relationship between the ASVAB
       measure and course grades if this range restriction did not exist for our sample.
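
The report does not say which correction formula was applied. One widely used adjustment for direct selection on the predictor is Thorndike's Case II formula, sketched below as an assumption rather than as the method actually used. The function name and the standard-deviation ratios are hypothetical and are not taken from the report.

```python
import math


def correct_for_range_restriction(r: float, sd_ratio: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    r        -- correlation observed in the restricted (selected) sample
    sd_ratio -- predictor SD in the unrestricted population divided by its
                SD in the restricted sample (1.0 means no restriction)
    """
    return (r * sd_ratio) / math.sqrt(1.0 - r * r + r * r * sd_ratio * sd_ratio)


r_raw = 0.43  # Army Electronics Composite versus grade, raw
for ratio in (1.0, 1.25, 1.5):  # hypothetical SD ratios
    adjusted = correct_for_range_restriction(r_raw, ratio)
    print(f"SD ratio {ratio:.2f}: adjusted r = {adjusted:.2f}")
```

With no restriction (a ratio of 1.0) the coefficient is unchanged; the more the selected sample's spread is narrowed relative to the applicant pool, the larger the upward adjustment.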



Table 3.4: Occupational Specialties Reviewed, Navy

                                                                                      Electronics
                                                                                      Composite
                                                                                      qualifying
Specialty   Title                                              Location               score[a]
STG         Sonar technician, antisubmarine warfare, surface   San Diego, Calif.      218
STS         Sonar technician, antisubmarine warfare,           San Diego, Calif.      218
              subsurface
AQ          Aviation fire control technician                   Millington, Tenn.      218
AX          Aviation antisubmarine warfare technician          Millington, Tenn.      218

[a] Sum of subtest standard scores

                                          We were able to achieve a much larger sample size (6,156) for these
                                          courses than was the case for our Army courses (303) because of their
                                          larger annual throughput, and because the Naval Personnel Research
                                          and Development Center provided us with relevant data that they had
                                          collected on STS and STG specialties for fiscal years 1986 and 1987.
                                          These data supplemented the fiscal year 1988 and fiscal year 1989 data
                                          that we collected at the San Diego base. Millington provided us with
                                          training data for 1987 and 1988. Table 3.5 presents the mean scores on
                                          the two ASVAB composites and course grades for the entire Navy sample.
                                          Statistics on individual courses are presented in appendix II.

Table 3.5: Mean Scores on Predictor and Criterion Variables, Navy

                                       Electronics
                 AFQT                  Composite             Grade
Category         Number   Mean[a]      Number   Mean[a]      Number   Mean
Male             6,080    229.60       6,080    235.33       5,882    89.11
Female              76    235.59          76    230.66          71    90.70
White            5,355    230.49       5,355    236.25       5,179    89.21
Nonwhite           801    224.18         801    228.75       1,159    89.58
Total            6,156    229.67       6,156    235.26       6,443    89.30

[a] Sum of subtest standard scores

                                          Male recruits entered training with significantly lower AFQT scores and
                                          significantly higher Electronics Composite scores than those for females.
                                          Final grades for males were slightly, but significantly, lower than those
                                          for their female classmates. These results suggest that, at least for
                                          females, a substantial advantage in AFQT can overcome a disadvantage
                                          in the Electronics Composite. In addition, minority students began
                                          training with substantially lower scores than nonminorities on both AFQT
                                          and the Electronics Composite. The final grades of the two groups were
                                          not significantly different.
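
As a reading aid for table 3.5 (an illustration, not part of the original analysis), the Total row can be cross-checked against the subgroup rows: the overall mean is the subgroup means weighted by subgroup size. The function name pooled_mean is hypothetical; the counts and means are the Navy AFQT figures by sex and by race from the table.

```python
def pooled_mean(groups):
    """Combine (count, mean) pairs into the mean of the pooled sample."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


navy_afqt_by_sex = [(6080, 229.60), (76, 235.59)]     # male, female
navy_afqt_by_race = [(5355, 230.49), (801, 224.18)]   # white, nonwhite

print(round(pooled_mean(navy_afqt_by_sex), 2))   # 229.67, the Total row
print(round(pooled_mean(navy_afqt_by_race), 2))  # 229.67, the Total row
```

Because women make up only 76 of the 6,156 trainees, the overall mean sits almost on top of the male mean even though the female mean is about six points higher.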

The results of our correlation analysis appear in table 3.6. They suggest
that AFQT may be more important for training success than the Electronics
Composite. For most Navy groupings, AFQT scores are better
predictors of classroom performance than are Electronics Composite
scores. When adjusted, AFQT scores explain from 12 to 38 percent of the
variation in course grades. Once again, the Electronics Composite is the
weakest of the three predictors for female sailors, and the more general
factor score is the strongest. The ability of any of the three ASVAB scores
to predict training success is weakest for minorities.




Table 3.6: Intercorrelation of Study Variables, Navy[a]

                                       Electronics                   Grade[e]
Category                  AFQT[b]      Composite[c]    Factor[d]     Raw           Adjusted[f]
Total
  AFQT                    1.00         0.79[g]         0.80[g]       0.30[g]       0.46[g]
  Electronics Composite   6,156        1.00            0.85[g]       0.27[g]       0.46[g]
  Factor                  6,156        6,156           1.00          [illegible]
  Grade                   5,939        5,939           5,939         1.00
Male
  AFQT                    1.00         0.79[g]         0.81[g]       0.30[g]       0.46[g]
  Electronics Composite   6,080        1.00            0.85[g]       0.27[g]       0.46[g]
  Factor                  6,080        6,080           1.00          0.27[g]
  Grade                   5,868        5,868           5,868         1.00
Female
  AFQT                    1.00         0.74[g]         0.81[g]       0.39[g]       0.62[g]
  Electronics Composite   76           1.00            0.82[g]       0.32[g]       0.55[g]
  Factor                  76           76              1.00          0.39[g]
  Grade                   71           71              71            1.00
White
  AFQT                    1.00         0.79[g]         0.81[g]       0.30[g]       0.47[g]
  Electronics Composite   5,355        1.00            0.89          0.29[g]       0.50[g]
  Factor                  5,355        5,355           1.00          [illegible]
  Grade                   5,165        5,165           5,165         1.00
Nonwhite
  AFQT                    1.00         0.74[g]         0.77[g]       0.22[g]       0.34[g]
  Electronics Composite   801          1.00            0.81[g]       0.14[g]       0.25[g]
  Factor                  801          801             1.00          0.11[g]
  Grade                   774          774             774           1.00

[a] Correlation coefficients are in upper diagonal and number in lower diagonal.
[b] AFQT = sum of subtest standard scores
[c] Electronics Composite = sum of subtest standard scores for Electronics Composite
[d] Factor = score from first factor from principal component analysis
[e] Grade = final course grade
[f] Adjusted = correlation adjusted for restriction of range
[g] p < .05




Air Force

                                          Our sample from the four Air Force courses listed in table 3.7
                                          totaled 922. Statistics for individual courses are provided in
                                          appendix II. (We received both training and demographic data on
                                          all of these courses from the Air Force Human Resources Laboratory.)

Table 3.7: Occupational Specialties Reviewed, Air Force

                                                                                      Electronics
                                                                                      Composite
                                                                                      qualifying
Specialty   Title                                              Location               score[a]
30332       Aircraft control and warning radar specialist      Keesler AFB, Miss.     230
30333       Automatic tracking radar specialist                Keesler AFB, Miss.     225
45530A      Photo-sensors maintenance specialist, tactical     Lowry AFB, Colo.       225
              reconnaissance sensors
45530B      Photo-sensors maintenance specialist,              Lowry AFB, Colo.       225
              reconnaissance electro-optical sensors

[a] Sum of subtest standard scores

                                          Trainees’ ASVAB scores and course grades are displayed in table 3.8. As
                                          would be expected, ASVAB scores for Air Force students are significantly
                                          higher than those for the other services we reviewed. In addition, we
                                          found a higher proportion of female trainees in the Air Force courses
                                          than in the Army and Navy courses we reviewed.

Table 3.8: Mean Scores on Predictor and Criterion Variables, Air Force

                                       Electronics
                 AFQT                  Composite             Grade
Category         Number   Mean[a]      Number   Mean[a]      Number   Mean
Male               824    235.45         824    241.94         854    91.31
Female              98    237.73          98    235.88         100    89.91
White              825    236.22         825    241.95         855    91.21
Nonwhite            97    231.19          97    235.73          99    90.76
Total              922    235.69         922    241.30         954    91.16

[a] Sum of subtest standard scores

                                          Male Air Force recruits entered training with substantially higher Elec-
                                          tronics Composite scores and slightly, but significantly, lower AFQT
                                          scores than did female recruits. Despite the slight female AFQT advan-
                                          tage, male recruits ended training with higher course grades than those
                                          earned by female recruits. In addition, although white students began
                                          training with substantially higher ASVAB scores, their final grades were
                                          not significantly different from those of their nonwhite classmates.


As table 3.9 demonstrates, the correlations between ASVAB and Air Force
training grades followed much the same pattern as did the Navy's. When
correlations are adjusted, the traditional ASVAB composite scores explain
from 6 to 36 percent of classroom performance. Factor scores are as
good as, or better than, composites as predictors. For female students,
AFQT scores outpredict Electronics Composite scores. Once again, it is
most difficult to predict course grades for minority students, although
factor scores explained 10 percent of their classroom performance.




Table 3.9: Intercorrelation of Study Variables, Air Force[a]

                                       Electronics                   Grade[e]
Category                  AFQT[b]      Composite[c]    Factor[d]     Raw           Adjusted[f]
Total
  AFQT                    1.00         0.71[g]         0.75[g]       [illegible]   0.44[g]
  Electronics Composite   922          1.00            0.84[g]       0.33[g]       0.54[g]
  Factor                  922          922             1.00          0.35[g]
  Grade                   922          922             922           1.00
Male
  AFQT                    1.00         0.74[g]         0.77[g]       0.30[g]       0.44[g]
  Electronics Composite   824          1.00            0.84[g]       0.33[g]       0.54[g]
  Factor                  824          824             1.00          0.34[g]
  Grade                   824          824             824           1.00
Female
  AFQT                    1.00         0.68[g]         0.77[g]       0.35[g]       0.54[g]
  Electronics Composite   98           1.00            0.77[g]       [illegible]   0.50[g]
  Factor                  98           98              1.00          0.28[g]
  Grade                   98           98              98            1.00
White
  AFQT                    1.00         0.72[g]         0.75[g]       0.31[g]       0.47[g]
  Electronics Composite   825          1.00            0.83[g]       0.35[g]       0.58[g]
  Factor                  825          825             1.00          0.35[g]
  Grade                   825          825             825           1.00
Nonwhite
  AFQT                    1.00         0.65[g]         0.68[g]       0.19          0.24[g]
  Electronics Composite   97           1.00            0.82[g]       0.23[g]       0.33[g]
  Factor                  97           97              1.00          0.31[g]
  Grade                   97           97              97            1.00

[a] Correlation coefficients are in upper diagonal and number in lower diagonal.
[b] AFQT = sum of subtest standard scores
[c] Electronics Composite = sum of subtest standard scores for Electronics Composite
[d] Factor = score from first factor from principal component analysis
[e] Grade = final course grade
[f] Adjusted = correlation adjusted for restriction of range
[g] p < .05



Summary and Conclusions

[Opening words missing in source] ... training courses, designed to prepare
recruits in three services to serve in certain “high technology” roles,
identified some problems with the utility of data maintained by
the Army on classroom performance in certain specialties. It would not
be appropriate to make interservice comparisons on the basis of this
finding, however, since much of the Navy training information and all of
the data we received from the Air Force were specially prepared for
research purposes. We cannot therefore make firm judgments about the
immediate availability of psychometrically suitable measures from these
two services.

The psychometric deficiencies we found at Fort Gordon appeared to
result from a number of different factors, including questionable data
entry procedures and software. They are also a function of the pass/fail
nature of the criteria used to evaluate student progress. We cannot
assess the extent to which performance on individual training tasks is
susceptible to more sophisticated measures than “go/no-go,” but we
would suggest that subject matter experts attempt to develop more
finely tuned, objective, and reliable measures of performance.

Our review also raised certain questions about differential success in
training for males and females, and for whites and minorities, and about
the differential predictive validity of ASVAB for these subgroups. Our
analysis of gender- and race-related differences in mean ASVAB scores
and course grades in the Army suggested that the Electronics Composite
was an efficient simple predictor of training success. Women and minor-
ities entered training with significantly lower Electronics Composite
scores and received significantly lower course grades.

Our findings from the Navy and Air Force samples, however, suggest
that a more complex relationship exists between ASVAB and course
grades. For these services, gender- and race-related differences in course
grades were small or nonexistent, despite significant differences in Elec-
tronics Composite scores. The Navy and Air Force samples also differed
from the Army sample in three other respects: (1) Electronics course
grade differences, though significant, were much smaller in the Navy
and Air Force than in the Army; (2) unlike women soldiers, Navy and
Air Force women had significantly higher AFQT scores than their male
classmates; and (3) the AFQT disadvantage for minorities in the Navy
and Air Force was only half of that in the Army. These findings suggest
that an advantage in the more general aptitude measured by AFQT (or by
an even more general measure such as a factor score) can compensate
for a deficit in the Electronics Composite when the deficit is not too
great. In other words, success in training may be related as much to gen-
eral ability as to performance on the Electronics Composite.




This interpretation is consistent with the results of our correlation
analyses, which tested the relationship between ASVAB scores and course
grades more directly. While ASVAB's Electronics Composite score
demonstrated a moderate ability to predict success in training for white male
students, it was less successful for female or minority students. The
factor score we derived from ASVAB was in most cases the best simple
predictor of training success because it utilized information from all ten
ASVAB subtests, and not simply from the subset used for AFQT or the
Electronics Composite. However, all three ASVAB measures (AFQT, Electronics,
and factor scores) in most cases proved to be relatively weak predictors
of performance in training for minority students.

Correlations do not imply causality, nor does the lack of a correlation
for a subsample indicate the location of a problem. From our analyses it
is impossible to conclude either that ASVAB is a weaker measure of ability
for some groups, or that some factor in classroom training contributes
differentially to the success of different groups. Yet, as the youth pool
shrinks and its demographic characteristics shift, the military will find
itself turning more toward minority and female recruits. These groups,
as we have seen, consistently score lower in the measures used to assign
recruits to technical training and in our largest service are less likely to
perform well. It will become increasingly incumbent on all services to
optimize selection criteria for technical advanced individual training for
women and minority groups, to provide compensatory training where
needed, and to assure that no extraneous factors within the training
environment interfere with the full development of a recruit’s potential.




Chapter 4

Field Measures of Training Effectiveness


                           Whatever criteria may exist to predict or to assess a recruit’s perform-
                           ance in training, the ultimate criterion of training effectiveness is the
                           recruit’s performance on the job. Our third evaluation question
                           addresses this issue: How well do the services’ selection criteria and
                           training evaluation measures predict success in high technology roles?

                           To answer this question, we attempted to locate individual field-per-
                           formance data routinely collected by the services that could be linked to
                           our ASVAB and classroom training data to serve as reliable and valid
                           indicators of training effectiveness. And, although we were made aware
                           of numerous post-training evaluation activities performed by the indi-
                           vidual services, only the Army could provide us with individual per-
                           formance measures. In this chapter, we will examine the quantitative
                           relationship between these Army data and the other information we
                           compiled. We will also discuss other evaluation mechanisms used by the
                           services and suggest a potential alternative source of post-training
                           evaluation measures.




Skill Qualification Test

                           By Army regulation, a soldier’s occupational specialty performance is
                           tested within six months of completion of training and every year there-
                           after. These written tests are prepared by the sponsoring training site.
                           They are administered under the direction of the Skill Qualification Test
                           (SQT) directorate at Fort Eustis, Virginia, where the resulting data are
                           stored.

                           Fort Eustis provided us with the SQT scores of all soldiers who took the
                           SQT from 1985 to 1988 in the occupational specialties we had chosen for
                           our sample. Summary statistics for these data are provided in appendix
                           IV. We matched these scores, where possible, with ASVAB scores and
                           classroom grades for each soldier included in our training site review.1
                           Table 4.1 presents the scores of these soldiers summarized by demographic
                           groups, together with the correlation coefficient estimating the
                           relationship between SQT and the measures we examined in the previous
                           chapter.



                           1For soldiers with multiple SQT scores during this period, we used only the first score.
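
The matching step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the record layout, the field names (soldier_id, test_date, sqt, afqt, grade), and all values are hypothetical. It keeps each soldier's earliest SQT score and retains only soldiers who also appear in the training-site data.

```python
from datetime import date

# Hypothetical SQT records; field names and values are illustrative only.
sqt_records = [
    {"soldier_id": "A1", "test_date": date(1986, 3, 1), "sqt": 78},
    {"soldier_id": "A1", "test_date": date(1987, 3, 1), "sqt": 85},
    {"soldier_id": "B2", "test_date": date(1985, 9, 15), "sqt": 91},
    {"soldier_id": "C3", "test_date": date(1988, 1, 10), "sqt": 66},
]
# Hypothetical training-site records keyed by soldier.
# C3 has no training-site record, so it drops out of the matched sample.
training_records = {
    "A1": {"afqt": 210, "grade": 88.0},
    "B2": {"afqt": 232, "grade": 93.5},
}

def first_sqt_by_soldier(records):
    """Keep only each soldier's earliest SQT record."""
    first = {}
    for rec in sorted(records, key=lambda r: r["test_date"]):
        first.setdefault(rec["soldier_id"], rec)
    return first

def match(sqt_first, training):
    """Inner join: keep soldiers present in both sources."""
    return {
        sid: {**training[sid], "sqt": rec["sqt"]}
        for sid, rec in sqt_first.items()
        if sid in training
    }

matched = match(first_sqt_by_soldier(sqt_records), training_records)
```

Soldiers without a record in both sources fall out of the sample, which is one reason the matched group in table 4.1 is small.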







Table 4.1: Correlation of SQT and Predictor Variables

                                                                       Correlation with SQT
                                                                        Electronics
                                    Category        Mean  Number  AFQT(a)  Composite(b)  Factor(c)  Grade(d)
                                    Male           82.12     209
                                      Raw                         0.21(f)    0.28(f)      0.26(f)   0.47(f)
                                      Adjusted(e)                 0.30(f)    0.41(f)
                                    Female         77.52      21
                                      Raw                        -0.07       0.12        -0.03     -0.52(f)
                                      Adjusted(e)                -0.10       0.19
                                    White          81.86     144
                                      Raw                         0.21(f)    0.25(f)      0.32(f)   0.44(f)
                                      Adjusted(e)                 0.33(f)    0.40(f)
                                    Nonwhite       81.45      86
                                      Raw                        -0.19       0.07         0.12      0.44(f)
                                      Adjusted(e)                -0.22       0.10
                                    Total          81.70     230
                                      Raw                         0.18(f)    0.28(f)      0.34(f)   0.43(f)
                                      Adjusted(e)                 0.26(f)    0.41(f)

                                    (a) AFQT = sum of subtest standard scores
                                    (b) Electronics Composite = sum of subtest standard scores for Electronics Composite
                                    (c) Factor = score from first factor from principal component analysis
                                    (d) Grade = final course grade
                                    (e) Adjusted = adjusted for restriction of range
                                    (f) p < .05
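
The adjusted rows in table 4.1 are corrected for restriction of range, because soldiers admitted to these courses had already been selected on their ASVAB scores. The report does not say which correction was applied. As a minimal sketch, the standard univariate correction (Thorndike's Case II) is shown below; treating it as the method used here is an assumption, and the standard deviation ratio of 1.5 is a hypothetical value, not one taken from the report.

```python
import math

def correct_for_range_restriction(r, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    r        : correlation observed in the selected (restricted) sample
    sd_ratio : predictor standard deviation in the full applicant population
               divided by its standard deviation in the selected sample
    """
    return (r * sd_ratio) / math.sqrt(1 - r ** 2 + (r ** 2) * sd_ratio ** 2)

# Hypothetical example: a raw correlation of 0.28 and an assumed SD ratio of 1.5.
adjusted_example = correct_for_range_restriction(0.28, 1.5)
```

When the selected sample is less variable than the applicant pool (a ratio above 1), the correction moves a correlation further from zero without changing its sign, which matches the direction of every raw-to-adjusted pair in the table.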

                                    For the total universe of soldiers, the best simple predictor of SQT scores
                                    is the final classroom grade, which explains 18.5 percent of the variation in
                                    SQT scores. The AFQT and Electronics scores from ASVAB were also
                                    significantly related to SQT scores for white males in our sample, but factor
                                    scores consistently outpredicted these composites. For females and for
                                    nonwhite soldiers, however, ASVAB scores were not significantly related to
                                    future performance as measured by SQT. Most surprisingly, the grades
                                    scored by female students at the training site were inversely correlated
                                    with their SQT scores; that is, women with higher grades tended to
                                    score lower on SQT's, and vice versa.
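
The share of variation explained by a single predictor is the square of its correlation coefficient. The short sketch below applies that arithmetic to the raw grade correlations from table 4.1; the variable and function names are illustrative.

```python
# Raw correlations between final course grade and SQT, from table 4.1.
raw_grade_r = {
    "Male": 0.47,
    "Female": -0.52,
    "White": 0.44,
    "Nonwhite": 0.44,
    "Total": 0.43,
}

def variance_explained(r):
    """Proportion of variance in one variable accounted for by the other."""
    return r * r

percent_explained = {
    group: round(100 * variance_explained(r), 1)
    for group, r in raw_grade_r.items()
}
```

For the total sample this gives the 18.5 percent cited above. Squaring discards the sign, so the figure for female soldiers has to be read together with the negative correlation it comes from.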

                                    The limited size of our sample, especially for female soldiers, makes it
                                    inappropriate to generalize without severe caveats. However, our
                                    analysis suggests that the traditional ASVAB scores may not be the best
                                    predictor of performance for the nontraditional (that is, the female or
                                    minority) soldier. This finding reinforces the concern we expressed in






                           the last chapter, that better predictors of success for these groups
                           should be found. Any interpretation of the inverse relationship between
                           grades and SQT'S for women would be purely speculative, but this
                           anomaly warrants further investigation.
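
One way to see why the small female sample calls for caution is to put an approximate confidence interval around a correlation. The sketch below uses the Fisher z transformation, a standard large-sample approximation; it is not a method described in the report. It also assumes that the group sizes in table 4.1 are the number of soldiers entering each correlation, which may overstate the matched sample.

```python
import math
from statistics import NormalDist

def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation."""
    z = math.atanh(r)                 # Fisher z transformation
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# Raw grade/SQT correlations and group sizes from table 4.1.
female_interval = fisher_interval(-0.52, 21)
male_interval = fisher_interval(0.47, 209)
```

Under these assumptions the interval for the 21 female soldiers runs from roughly -0.78 to -0.11, several times wider than the interval for the 209 male soldiers. The negative relationship is unlikely to be zero, but its size is poorly determined.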


Other Evaluation-Related Activities

                           Each Army training site includes an evaluation unit that performs regular
                           process evaluations. These include classroom observations of
                           instructors, annual meetings to review curricula, cyclical outreach
                           programs to contact graduates of the school in the field and their
                           supervisors, and occasional more intensive curriculum reviews called
                           training effectiveness analyses.

                           Classroom observations are conducted on a regular basis by both master
                           trainers and the training site internal evaluation unit. They are per-
                           formed more frequently when instructors are new or have received less-
                           than-satisfactory evaluations. Most of the observation reports that we
                           reviewed, particularly those performed by the internal evaluation unit,
                           were mainly concerned with administrative details. The most frequent
                           criticism we encountered was that copies of the lesson plan and curric-
                           ulum materials were not properly arranged and situated at an empty
                           desk in the rear of the classroom for the observer.

                           Schoolhouse external evaluation units also conduct outreach programs
                           during which members of the units travel to Army bases (where a large
                           concentration of the training-site graduates are stationed) to collect
                           information on the opinions of base staff about training quality. These
                           reviews occur approximately every two or three years for the courses
                           we reviewed, but they are not routinely scheduled. They are more fre-
                           quently occasioned by indications from the field of training problems,
                           and their frequency is also affected by travel-budget considerations.

                           More objective and formal training effectiveness analyses are performed
                           when a new training course is introduced or when weapons system mod-
                           ifications prompt major changes in the curriculum. These analyses
                           include written tests, hands-on tests, and interviews with soldiers and
                           their supervisors. The most recent training effectiveness analysis for the
                           courses we reviewed was conducted during the summer of 1987 and was
                           prompted by changes to the Hawk missile system.








Navy

Sources of Individual Field Performance Data

                             We considered two possible sources of field performance information
                             routinely collected by the Navy as measures of the effectiveness of the
                             training courses in our sample: Level II surveys and Advancement in
                             Rating Examinations. The Level II survey program was designed to collect
                             information on the job performance of recent training-school
                             graduates.2 For each course, questionnaires were sent to the supervisors of
                             graduates approximately six months after graduation, asking them to
                             rate individual tasks performed within the specialty (as to their impor-
                             tance) and the adequacy of the level of training demonstrated by the
                             course graduates. We found, however, that Level II surveys have been
                             effectively abandoned by the Navy, and that none has been performed
                             since at least 1986.

                             Advancement in Rating Examinations are multiple-choice tests adminis-
                             tered to candidates for promotion who have already been certified as
                             qualified by their commanding officers. Different tests are prepared for
                             each promotion cycle, and their results are used to rank candidates.
                             Because they are not standardized, and are not administered to all grad-
                             uates, these tests, in the judgment of test developers and administrators,
                             are “not a good source of training evaluation feedback.” We concurred
                             with this judgment.


Internal Review of Evaluation Practices

                             In 1986, the Chief of Naval Operations requested that the Naval
                             Training Systems Center (NTSC) determine the current status of Navy
                             training evaluation and provide recommendations for the future conduct
                             of such operations. NTSC submitted three reports to the Chief of Naval
                             Technical Training in 1988. They identified three central evaluation
                             functions: Level II surveys, the Fleet Training Assessment Program
                             (FLETAP), and the Training Assessment Survey Team (TAST). The TAST
                             concept had only recently been established at the time of the NTSC
                             report, and only two surveys had been completed under the program.
                             These surveys were limited to new weapons systems and involved fleet
                             visits to identify training deficiencies and requirements and any correc-
                             tive actions that needed to be taken.

                             2The term derives from a classification of evaluation intensiveness established in 1981 by the Naval
                             Education and Training Command. Level I refers to unsolicited feedback to training sites concerning
                             training adequacy, Level II to a questionnaire sent to the fleet, and Level III to an in-depth analysis of
                             problems identified in lower level reviews.







FLETAP is currently a reactive system that attempts to identify training
deficiencies through either direct input from the fleet or review of
reports and other fleet materials. FLETAP is also responsible for per-
forming Training Quality Reviews, which involve administering job per-
formance tests to fleet personnel to measure adequacy of training. No
such reviews have been completed. The FLETAP component responsible
for the Pacific Fleet consists of five full-time staff positions, four of
which were filled at the time of our visit there. Its Atlantic Fleet coun-
terpart has four authorized staff positions, three of which were filled.

The NTSC report also identified numerous other nonformal or noncentralized
evaluation and evaluation-related activities within the Navy’s
training community. However, NTSC found that the quality of current
Navy classroom training cannot be readily ascertained for the vast
majority of courses; that there is a general lack of technical
evaluation/assessment skills; and that current evaluation activities are
fractionated, not comprehensive, and operating in an environment of obsolete
instructions and unclear objectives. NTSC concluded that the fleet’s mandate to
provide useful data to the training community about the performance of
its graduates needed to be enforced and that fleet evaluation activities
should be upgraded and appropriately staffed. It also recommended that
internal training appraisal responsibility be decentralized to the training
site level and that independent external programs be reviewed for tech-
nical adequacy and integrated into an overall systematic approach.

In response to these reports, a three-person team has recently been
established at the headquarters of the Chief of Naval Education and
Training to review the NTSC proposals and recommend an integrated
training appraisal program. No firm timetable has yet been established
for the team’s report, but they anticipate providing a proposal in the
summer of 1990. We welcome this Navy effort, but we question whether
this response will prove adequate in view of the severity and extensiveness
of the problems NTSC has documented.








Air Force

Sources of Individual Field Performance Data

                             We considered sources of individual-level data for field performance of
                             Air Force personnel equivalent to those we considered for the Navy
                             (that is, promotion examinations and supervisory surveys). After
                             interviewing Air Force personnel, however, we concluded that neither was
                             appropriate for our purposes.

                             Unlike the Navy’s Level II surveys, the Air Force supervisory surveys
                             are still in use. They are conducted by the training sites’ evaluation
                             units for each training course at 2- to 3-year intervals. Questionnaires
                             are sent to the supervisors of recent training graduates to determine
                             how frequently they perform each of the major tasks for which they
                             were trained, and how well they perform them. A summary training
                             evaluation report is produced from these data identifying task-specific
                             training deficiencies and/or unnecessary training. We were informed
                             that the individual-level data collected by these surveys are not main-
                             tained by the training sites after their reports have been prepared.
                             Therefore, no individual data exist that would allow us to perform analyses
                             equivalent to those we performed using the Army SQT data.


Other Evaluation-Related Activities

                             Other training assessment procedures exist, including training quality
                             reports, utilization and training workshops, and occupational survey
                             reports. Training quality reports provide a means for supervisors of
                             recent training-site graduates to report apparent deficiencies in a
                             recruit’s training. Like the Navy’s FLETAP activities, these reports are
                             part of a reactive evaluation process. A succession of training quality
                             reports for a given course can lead to a complete course review. The
                             other activities are more concerned with front-end analysis. Occupa-
                             tional survey reports on occupational specialties are prepared approxi-
                             mately every three to four years. They are based on questionnaires
                             designed to define the major tasks performed by specialists and their
                             relative frequency. Utilization and training workshops are held when
                             the job requirements of an old occupational specialty change dramati-
                             cally or when a new specialty is defined. Major command functional
                             officers, training staff officers, and managers at the Air Force technical
                             schools participate by examining data from occupational survey reports
                             and identifying the specific training requirements of the specialty.








Alternative Data Sources: The Job Performance Measurement Project

                      A key impediment to establishing a field evaluation component of
                      training assessment is the expense of developing, testing, and
                      administering measures that validly and reliably measure actual performance.
                      Since the early 1980’s a major effort to address these measurement
                      issues has been under way under the direction of the Office of Accession
                      Policy of the Office of the Assistant Secretary of Defense for Force
                      Management and Personnel. Known as the Joint-Service Job Performance
                      Measurement (JPM) project, the effort was initiated at the request of the
                      Congress to validate ASVAB measures against actual performance in the
                      field, instead of against training grades, which had been the sole
                      criterion. The project was triggered by the discovery of the ASVAB
                      misnorming in the late 1970’s, which unintentionally allowed some 300,000
                      less qualified recruits into the services and resulted in field
                      commanders’ complaints of quality deterioration among their personnel. JPM,
                      in other words, was directed toward testing the connection between the
                      first and third points in our model: test data collected for selection and
                      classification purposes at recruitment, and field performance data. JPM
                      did not set out to establish a link between classroom performance and
                      field performance.

                      JPM concluded that suitable measures of field performance did not exist,
                      and undertook to develop them. Over several years, some highly reliable
                      hands-on performance tests were developed and administered for 25
                      occupational specialties across the four services. Surrogates for hands-
                      on testing were also developed, including more traditional job-knowl-
                      edge tests and performance ratings. JPM concluded that AFQT reliably
                      predicted differences in levels of actual field performance, and that
                      these differences tended to persist through a recruit’s enlistment. JPM,
                      however, has not reported any analyses of sex- or race-related differ-
                      ences. Because of its ASVAB orientation, the project also has not
                      addressed the issue of the classroom/field-performance connection.

                      JPM  performance measures were expensive to develop and frequently
                      costly to administer, and they therefore may not be suitable for more
                      routine use as measures of training effectiveness. However, the invest-
                      ment made to develop these measures and their surrogates could prove
                      more profitable if some of the measures developed and the lessons
                      learned in the JPM effort were more widely applied to the development
                      of realistic assessment procedures for training.








Summary and Conclusions

              Our third evaluation question asked to what extent the services’
              selection criteria and training evaluation measures predict success in high
              technology roles. While we identified a multitude of evaluation-related
              activities in the three services, we nevertheless concluded that insuffi-
              cient data existed for us to respond to this question. Army SQT data can
              be adapted for this purpose, but neither the Navy nor the Air Force rou-
              tinely collects and maintains field performance data to evaluate indi-
              vidual-level training effectiveness.

              Our analysis of Army SQT data was hindered by the limited size of the
              sample. We were able to derive some preliminary conclusions, however:
              namely, that classroom performance, as measured by final course grades, is a
              moderately strong indicator of future field performance (as measured by
              SQT) for males, but not for females, and that ASVAB can predict SQT scores
              moderately well for white male recruits, but is apparently unrelated to
              SQT scores achieved by women and minorities. These ASVAB/SQT findings are
              consistent with the pattern of ASVAB/course-grade relationships we
              discussed in the previous chapter.

              The lack of other objective, systematically collected field evaluation
              data renders meaningful evaluation of training effectiveness impossible.
              Decisionmakers, whether they are in the Congress, DOD, or the individual
              services, can only react to problems in the field after they have
              become apparent and have been identified as training-related. However,
              given the cost and complexity of today’s military equipment, it is imper-
              ative that the services possess adequate evaluative data to monitor how
              well personnel are being prepared to use and maintain these weapons.




Chapter 5

Summary, Recommendations, and Agency
Comments and Our Response

Summary

                 Our report has addressed three evaluation questions:

             • How has the aptitude of recruits for technologically sophisticated
               specialties changed since 1980?
             • How useful are the data collected by the services before and during
               classroom training for selecting individuals for high technology roles
               and for evaluating the effectiveness of this training?
             • How well do the services’ selection criteria and training evaluation
               measures predict success in high technology roles?

                 To respond to these questions, we examined the three essential types of
                 information that could be used to assess the effectiveness of military
                 training: (1) data collected at entry to the military for selection and
                 assignment to an occupational specialty, (2) data on classroom measures
                 of performance during formal training, and (3) data on individual field
                 performance. Our analysis has been set in the context of a recruit pool
                 shifting toward a much higher representation of women and minorities.

                 To answer the first question, we examined ASVAB scores during the
                 1980’s and found that (1) most gains in recruit quality occurred in the
                 first half of the decade, (2) technical abilities of recruits have begun to
                 decline, and (3) women and minorities continue to score lower on tech-
                 nical measures than white males. These findings suggest that an
                 increased burden will be placed on the services’ training establishments
                 to assure the technical competence of their future graduates. The ser-
                 vices’ response may also need to include more demographically sensitive
                 training and/or additional compensatory training to raise basic skill
                 levels.

                 Our response to the second question involved an analysis of classroom
                 grades from thirteen technical courses. Our findings indicated that (1)
                 some deficiencies exist in the Army’s computerized grading system; (2)
                 during training women and minorities overcome their initially lower
                 technical scores in the Navy and Air Force, but not in the Army; (3)
                 classroom success appears more related to a general ability level as
                 measured by ASVAB than to the Electronics Composite score currently in use,
                 particularly for women; and (4) ASVAB’s ability to predict classroom
                 success for minorities is weak.

                 The last three findings are interrelated. Unlike in the Army, women in the
                 Navy and Air Force entered training with significantly higher AFQT
                 scores than men. In addition, the gap in AFQT scores between whites and
                 nonwhites was twice as large for Army trainees as for their Navy and






                  Air Force counterparts. Based on these findings, we concluded that the
                  services should consider developing a more general ASVAB derivative,
                  such as our factor score, to assign women and minorities to technical
                  training.

                  We found that there was insufficient evidence to attribute the weak
                  relationship between ASVAB and course grades for women and minorities
                  either to problems with ASVAB or to factors in the training environment.
                  Yet, whatever its source, the relative inconsistency of the two measures
                  exists and should be addressed by both the recruiting and training
                  communities.

                  In response to the third question, we examined post-classroom measures
                  of training effectiveness. We concluded that (1) only the Army routinely
                  collects data on individual field performance useful for training evalua-
                  tion purposes; (2) on the basis of these Army data, ASVAB scores are even
                  weaker predictors of field performance for women and minorities than
                  of classroom success; and (3) the Navy’s training evaluation component
                  is in need of more intense review and reform than it is currently
                  receiving.

                  In summary, we found serious weaknesses or gaps at each of the data
                  points required by the evaluation model posited in chapter 1. Of these,
                  the most serious deficiency is the inability of the Air Force and Navy to
                  base their evaluation of their selection procedures and classroom
                  training on systematically collected, objective field performance data.
                  Without the ability to test the “fit” of these data points with one
                  another, the services are not able to maximize their training effective-
                  ness, or even to estimate realistically how successful their training
                  investment is in producing skilled operators and maintainers of
                  today’s (and tomorrow’s) sophisticated weaponry.


                  We believe that evaluating the effectiveness of the training provided by
Recommendations   the services is crucial if they are to meet the future challenges of
                  changing recruit demographics and increasingly sophisticated weap-
                  onry. Therefore, we make the following recommendations for action at
                  each of the three information collection points that we consider essential
                  to adequate training evaluation: (1) that the Office of Force Manage-
                  ment and Personnel direct the personnel research it coordinates among
                  the individual services to identify more sensitive predictors of classroom
                  performance for women and minority students from the ASVAB data it
                  already possesses; (2) that the Secretary of the Army direct the Training
                      and Doctrine Command to review the classroom grading procedures
                      identified within the report as deficient, for their accuracy,
                      appropriateness, and reliability; (3) that the Secretary of the Navy
                      establish a firm deadline for developing a training evaluation program
                      and that he direct that the adequacy of current resources allocated to
                      this effort be reexamined. Finally, we recommend that the Assistant
                      Secretary of Defense for Force Management and Personnel review
                      alternative measures of field performance already developed by the
                      services under the Job Performance Measurement project for their
                      potential applicability to training and on-the-job performance evaluation.

                      Our purpose in this study has been to review the ability of the services
                      to monitor, evaluate, and (where necessary) adjust training to changes
                      in the demographics and technical ability of the recruit pool and to the
                      technical sophistication of weapons systems. Whatever changes in our
                      military posture are occasioned by shifts in the nature of threats to our
                      national security, we believe that accurate information relating to the
                      recruit pool, to the effectiveness of military training, and to on-the-job
                      performance will continue to be essential to the mission of our armed
                      forces.


                      In its written response to a draft of this report, DOD concurred with all of
Agency Comments and   its recommendations and identified specific actions to be taken toward
Our Response          implementing them. DOD also concurred or partially concurred with what
                      it identified as the main findings contained in the report. DOD also raised
                      some technical methodological questions and offered some thoughtful
                      interpretations of our findings. (See appendix V.) We have reviewed
                      these comments and, where appropriate, have made changes to the text.

                      DOD generally agreed with our description of changes in recruits’ ASVAB
                      scores during the past decade. It commented, however, that it would be
                      inappropriate to define a recruit’s technological sophistication merely as
                      his or her Electronics Composite score. We agree that this would be a
                      very limited definition, and for this reason our report encouraged the
                      development of better predictors of success in more technologically
                      demanding occupational specialties. DOD’s speculation that the decline in
                      Electronics Information scores is attributable to a decline in technical
                      vocational education in high schools is persuasive. It could as well have
                      speculated that the lower Electronics Composite scores of women
                      recruits are attributable to their traditionally lower enrollment in such
                      courses.



DOD generally concurred with our analysis of classroom grades and their
relationship to ASVAB predictors. However, it questioned the appropriate-
ness of some of our procedures. DOD summarized its methodological con-
cerns as (1) inappropriate pooling of grades from courses with different
metrics, (2) implausibly high factor scores after correction for restric-
tion in range, (3) lack of detailed regression analyses for differences
between subgroups, and (4) small sample sizes for subgroups.

DOD incorrectly assumes that we simply pooled raw course grades from
different courses. Before performing correlation analyses, we standard-
ized course grades to a common metric to adjust for any differences
between courses in grading procedures. We have also added to the draft
we provided DOD parallel tables of results on the individual-course level.
(See appendixes II and III.)
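
The standardization step described above can be illustrated with a minimal
Python sketch. It assumes z-score standardization within each course, which is
one common way to place grades on a common metric; the report does not say
which transformation was actually used. The course labels, grades, and
aptitude scores below are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

def standardize_within_course(records):
    """Convert (course, grade) pairs to z-scores computed within each course.

    Assumes every course has at least some variation in grades
    (a zero standard deviation would cause a division by zero).
    """
    by_course = {}
    for course, grade in records:
        by_course.setdefault(course, []).append(grade)
    stats = {c: (mean(g), pstdev(g)) for c, g in by_course.items()}
    return [(grade - stats[course][0]) / stats[course][1]
            for course, grade in records]

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: course "A" grades harshly, course "B" grades leniently.
records = [("A", 70), ("A", 80), ("A", 90), ("B", 92), ("B", 95), ("B", 98)]
aptitude = [210, 225, 240, 212, 226, 241]

z_grades = standardize_within_course(records)
raw_r = pearson_r(aptitude, [grade for _, grade in records])
std_r = pearson_r(aptitude, z_grades)
print(f"pooled raw grades: r = {raw_r:.2f}")
print(f"standardized grades: r = {std_r:.2f}")
```

In this invented example, pooling raw grades understates the relationship
because the two courses grade on different scales; standardizing within course
removes that difference before the grades are pooled.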

We share DOD’s concern about the apparently inflated values of the
adjusted validity coefficients for factor scores, but we disagree with
its speculation that inappropriate statistical procedures are the source
of this inflation. We applied the same conventional adjustment procedures
to all three scores (AFQT, Electronics Composite, and factor scores) and,
as DOD comments, for the first two scores our results “are
consistent with other analyses.” As we stated in the draft report, the
factor scores were based on the ASVAB norm group correlation matrix
provided us by DOD. Having performed a principal-components analysis
of these data, we applied the resultant scoring coefficients to our sample
to obtain factor scores. This procedure ideally offers two advantages.
First, it bases the correlation analysis on a norm group presumably
closer to the universe of applicants to military service than our sample
of relatively high-scoring recruits. Second, it permits adjustment for
restriction of range.
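
The factor-scoring procedure described above can be sketched as follows. This
is a minimal illustration under stated assumptions, not the analysis we ran: it
extracts only the first principal component (by power iteration, to stay within
the Python standard library), the 3-by-3 correlation matrix and the subtest
z-scores are hypothetical, and the coefficient scaling (eigenvector divided by
the square root of the eigenvalue) is one common convention assumed here.

```python
def first_principal_component(corr, iterations=500):
    """Leading eigenvalue and eigenvector of a symmetric correlation matrix.

    Uses power iteration from a uniform start vector, which converges for
    matrices with all-positive correlations such as the example below.
    """
    n = len(corr)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return eigenvalue, v

def scoring_coefficients(corr):
    """Weights that give the component score unit variance in the norm group."""
    eigenvalue, v = first_principal_component(corr)
    return [x / eigenvalue ** 0.5 for x in v]

def factor_score(z_scores, coefficients):
    """Weighted sum of one person's standardized subtest scores."""
    return sum(z * c for z, c in zip(z_scores, coefficients))

# Hypothetical norm-group correlation matrix for three subtests.
norm_corr = [[1.0, 0.6, 0.6],
             [0.6, 1.0, 0.6],
             [0.6, 0.6, 1.0]]

coeffs = scoring_coefficients(norm_corr)
# Hypothetical recruit: standardized scores on the three subtests.
score = factor_score([1.0, 0.5, -0.5], coeffs)
print(coeffs, score)
```

The key design point is that the weights come from the norm-group matrix and
are then applied unchanged to each person in the study sample.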

After thorough reexamination of our procedures and the data to which
they were applied, we concluded that the results of factor analysis of
the DOD correlation matrix should not be applied to our sample because
of differences between the two samples in the magnitude of subtest
intercorrelations. DOD reported substantially higher intercorrelations
than were present in our sample. As a result, the variance of our
sample’s factor scores, when based on the DOD correlations, was inappro-
priately restricted, and the adjustment for range restriction was overes-
timated. (All other things being equal, the smaller the sample variance,
the greater the adjustment for restriction in range.)
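
The mechanism described in the paragraph above can be illustrated with a short
Python sketch. Two assumptions are made loudly: the report refers only to
"conventional adjustment procedures," so the widely used Thorndike Case 2
formula is assumed here, and the correlation matrices are hypothetical
(subtests are treated as standardized in both groups so that only the effect
of the intercorrelations is shown).

```python
import math

def corrected_r(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct restriction of range (assumed)."""
    k = sd_unrestricted / sd_restricted
    return r * k / math.sqrt(1 - r * r + r * r * k * k)

def score_variance(weights, corr):
    """Variance of a weighted sum of standardized subtests."""
    n = len(weights)
    return sum(weights[i] * corr[i][j] * weights[j]
               for i in range(n) for j in range(n))

def equicorrelated(n, rho):
    """Hypothetical correlation matrix with the same correlation everywhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

# Weights of the first principal component of the norm matrix (rho = 0.6),
# scaled so the score has unit variance in the norm group.
w = 1 / math.sqrt(3 * 2.2)
weights = [w, w, w]

var_norm = score_variance(weights, equicorrelated(3, 0.6))    # norm group
var_sample = score_variance(weights, equicorrelated(3, 0.3))  # weaker intercorrelations

r_observed = 0.30  # hypothetical observed validity coefficient
mild = corrected_r(r_observed, 1.0, 0.9)
severe = corrected_r(r_observed, 1.0, math.sqrt(var_sample))
print(var_norm, var_sample, mild, severe)
```

With weaker intercorrelations in the sample than in the norm group, the same
weights yield a smaller score variance, and the smaller variance in turn
produces a larger upward adjustment of the observed correlation.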




We therefore have recalculated our factor scores, deriving them from a
principal-component analysis of our sample’s ASVAB scores rather than
from an analysis of the norm-group correlation matrix provided by DOD.
Consequently, no adjustment for restriction of range would be appro-
priate for these scores. While the correlations of these factor scores with
our criterion measures vary somewhat from those originally reported
(being in some cases higher and in others lower), the slight differences in
no way affect the conclusion that we reached in the draft report and
with which DOD has agreed in both written and oral comments, namely,
that a broader-based measure than the simple composites currently in
use would provide a valuable predictor of classroom performance.

DOD cites the absence of certain regression-related statistics (intercepts,
regression coefficients, and standard errors of estimates) and the small
sample size in some subgroups as reasons for not “generalizing to other
samples” or “making policy decisions” on the basis of our report. First,
for simple bivariate relationships such as we analyzed (ASVAB versus
course grades or SQT), our detailed reporting of means, N’s, correlation
coefficients, and significance levels serves essentially the same function
as these equivalent regression statistics. We would, however, gladly pro-
vide our data base to DOD for alternative analysis. Second, we repeatedly
draw the reader’s attention to the problem of small sample size in some
subgroups. Most importantly, we strongly agree that, unless they are
replicated on larger samples, our analyses should not be the basis for
significant policy shifts in selection and classification of recruits.
Rather, we recommended (and DOD concurred) that the services attempt
to develop more sensitive predictors of training success for minorities
and women. (Indeed, one of the main strengths of our work here is that
it determined the insensitivity to these populations of current
predictors.) Should the results of these efforts prove successful, policy
changes would then be appropriate.
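
Two points in the paragraph above lend themselves to a short worked sketch in
Python. First, for a simple bivariate relationship the slope, intercept, and
standard error of estimate follow directly from summary statistics; note that
standard deviations are also required, which is an assumption beyond the
statistics listed above. Second, the Fisher z approximation shows how much
wider the uncertainty around a correlation is in a small subgroup. The means,
standard deviations, and r = 0.40 are hypothetical; the subgroup sizes 23 and
280 match the Army female and male AFQT counts in appendix II.

```python
import math

def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r, n):
    """Slope, intercept, and standard error of estimate for y regressed on x.

    Assumes sd_x and sd_y are sample standard deviations (n - 1 denominator).
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    see = sd_y * math.sqrt((1 - r * r) * (n - 1) / (n - 2))
    return slope, intercept, see

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95 percent interval for a correlation (Fisher z method)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical summary statistics for an aptitude score (x) and a grade (y).
slope, intercept, see = regression_from_summary(230.0, 15.0, 88.0, 6.0, 0.40, 255)

small = r_confidence_interval(0.40, 23)    # subgroup of 23
large = r_confidence_interval(0.40, 280)   # subgroup of 280
print(slope, intercept, see)
print(small, large)
```

In this sketch the interval for the subgroup of 23 includes zero, while the
interval for the subgroup of 280 does not, which is why results for small
subgroups need replication on larger samples before they inform policy.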

The Army found “neither surprising nor particularly disturbing” the
fact that we were not able to use many of the test scores they provided
for some courses because they do not discriminate among soldiers’ per-
formances. We would point out that (1) the same software and report
formats are used to assign scores to trainees in these courses as in other
similar courses where we found usable scores; (2) we were able for some
of these cases to reanalyze the individual measures and derive mean-
ingful scores; and (3) the Army assigns and maintains rank-in-class sta-
tistics for each graduate of these courses on the basis of this software,
thus itself implicitly measuring and recording the relative performance
of individuals. While our ability to perform correlational analyses may
not be a critical need, in our opinion the Army’s ability to perform objective
evaluations of the effectiveness of its courses is. We therefore welcome
the concurrence of the Army in our recommendation to review its
testing procedures for the courses we identified.

DOD commented on our review of field measures of training effectiveness
for each of the services, asserting that our negative view of ASVAB scores
as a predictor of performance for female and minority soldiers was contrary
to research on predicting training success. Not only does DOD provide
no specifics on this research but also, and more importantly, it is
not clear how predicting training outcomes is directly relevant to the
issue of field performance. Of more interest are the preliminary results
reported from ongoing research by the Army Research Institute. These
results suggest a fairly strong relationship for women and a somewhat
weaker, but still significant, relationship for blacks between ASVAB and
SQT in larger occupational specialties. The Army appears to concede that
these results may not be true for smaller, more technical specialties,
such as the ones we examined. What is most noteworthy about the
Army’s response, however, is its capability to perform these analyses of
field performance routinely, a capability that the Navy and Air Force do
not share.

The Navy supplied some information on recent steps being taken to
enhance training evaluation methods in addition to the ones we identified
in the report. The Air Force commented that it does not have SQTs
and does not plan to introduce them in the near future. It noted that
“testing, recoding, and documenting individual performance for statis-
tics is very time-consuming, requires additional manpower, and is cost-
prohibitive.” It would be difficult to agree with the Air Force that deter-
mining the effectiveness of individual performance is merely a statis-
tical endeavor, or even that it is an optional one. Rather, it lies at the
core of our ability to know how well we are prepared for meeting critical
defense challenges. Indeed, given the cost and complexity of today’s mil-
itary equipment, it is imperative that all the services possess adequate
evaluative data to monitor how well personnel are being trained to use
and maintain these weapons. Our report does not propose the introduction
of SQTs into other services, nor does it attempt to determine the
cost-effectiveness of SQTs. It does, however, assert the need for objective,
systematically collected information on individual field performance
in all services.




Finally, DOD noted that it had directly addressed the applicability of les-
sons learned from the Joint-Service Job Performance Measurement Pro-
gram in 1985, but had deferred implementing any training-related
application of these measures at that time. DOD states that it will explore
the feasibility of such an application once again.




Appendix I

AFQT Mean Score and Electronics Composite
Summary Statistics: 1981-89

Table I.1: AFQT Mean Scores, by Gender (a)

                    Male                  Female
Year        Number      Mean      Number      Mean
1981       163,571    203.95      22,886    202.95
1982       222,726    206.26      30,311    209.10
1983       227,161    209.51      32,546    211.57
1984       226,975    210.36      32,026    211.15
1985       222,772    211.55      35,368    211.43
1986       254,030    211.94      37,175    212.73
1987       239,122    212.17      35,385    212.42
1988       213,493    212.64      32,682    212.04
1989       217,783    211.83      35,984    211.78

(a) Sum of subtest standard scores.



Table I.2: AFQT Mean Scores, by Service (a)

        Army                Navy                Air Force           Marine Corps
Year     Number     Mean     Number     Mean     Number     Mean     Number     Mean
1981     76,284   195.52     47,715   208.61     37,389   213.12     25,069   206.16
1982    108,063   201.73     55,182   210.06     57,442   212.86     32,350   205.84
1983    121,112   206.07     55,256   212.52     51,771   216.72     31,568   207.78
1984    118,287   207.07     57,214   211.85     50,235   218.45     33,265   207.67
1985    111,625   209.30     59,604   211.92     57,617   217.08     29,294   208.34
1986    125,918   210.33     68,891   210.30     62,372   217.08     34,024   211.44
1987    120,538   210.73     66,078   210.75     54,371   218.10     33,520   210.90
1988    102,709   210.88     69,080   211.58     40,087   219.94     34,299   210.93
1989    106,126   209.42     73,272   210.40     42,247   220.59     32,122   211.45

(a) Sum of subtest standard scores.




Table I.3: AFQT Mean Scores, by Race/Ethnicity (a)

        White               Black               Hispanic            Other
Year     Number     Mean     Number     Mean     Number     Mean     Number     Mean
1981    138,431   209.27     35,666   186.56      6,904   191.00      5,456   194.95
1982    189,134   211.48     48,377   190.86      8,569   193.97      6,957   198.91
1983    196,585   214.19     47,540   194.54      8,616   198.71      6,966   202.54
1984    193,193   215.07     48,500   194.99      9,439   199.46      7,869   204.15
1985    190,243   215.79     49,663   197.97      9,504   202.32      8,730   205.88
1986    212,661   215.94     56,150   199.20     12,059   204.26     10,335   206.74
1987    198,130   216.62     54,166   198.67     13,708   205.00      8,503   207.42
1988    174,501   217.16     50,370   199.14     13,567   205.92      7,737   207.84
1989    177,111   216.40     53,409   199.07     15,499   205.92      7,748   206.97

(a) Sum of subtest standard scores.

Table I.4: AFQT Mean Score Overall Totals (a)

                Overall total
Year        Number    Mean (b)
1981       186,457    203.83
1982       253,037    206.60
1983       259,707    209.77
1984       259,001    210.41
1985       258,140    211.53
1986       291,205    211.90
1987       274,507    212.21
1988       246,175    212.56
1989       253,767    211.82

(a) Sum of subtest standard scores.
(b) Standard deviation = 20.66.




Table I.5: Electronics Composite Mean Scores, by Gender (a)

                    Male                  Female
Year        Number      Mean      Number      Mean
1981       163,571    207.89      22,886    194.41
1982       222,726    210.00      30,311    199.18
1983       227,161    212.91      32,546    201.52
1984       226,975    213.46      32,026    201.40
1985       222,772    212.70      35,368    199.57
1986       254,030    211.76      37,175    200.57
1987       239,122    212.17      35,385    200.57
1988       213,493    212.73      32,682    199.43
1989       217,783    211.50      35,984    199.97

(a) Sum of subtest standard scores.



Table I.6: Electronics Composite Mean Scores, by Service (a)

        Army                Navy                Air Force           Marine Corps
Year     Number     Mean     Number     Mean     Number     Mean     Number     Mean
1981     76,284   198.22     47,715   209.76     37,389   215.75     25,069   208.27
1982    108,063   204.03     55,182   210.33     57,442   215.24     32,350   207.90
1983    121,112   207.92     55,256   212.16     51,771   218.34     31,568   210.00
1984    118,287   208.56     57,214   211.69     50,235   219.87     33,265   209.70
1985    111,625   208.66     59,604   209.66     57,617   216.77     29,294   208.17
1986    125,918   208.73     68,891   207.32     62,372   215.48     34,024   209.80
1987    120,538   208.79     66,078   208.55     54,371   217.21     33,520   209.36
1988    102,709   209.11     69,080   208.71     40,087   219.01     34,299   209.53
1989    106,126   207.19     73,272   207.29     42,247   218.69     32,122   209.65

(a) Sum of subtest standard scores.




Table I.7: Electronics Composite Mean Scores, by Race/Ethnicity (a)

        White               Black               Hispanic            Other
Year     Number     Mean     Number     Mean     Number     Mean     Number     Mean
1981    138,431   212.47     35,666   186.45      6,904   193.40      5,456   197.91
1982    189,134   214.51     48,377   190.01      8,569   196.37      6,957   201.33
1983    196,585   216.81     47,540   193.24      8,616   200.93      6,966   204.31
1984    193,193   217.53     48,500   193.49      9,439   201.35      7,869   206.24
1985    190,243   216.28     49,663   193.94      9,504   202.50      8,730   205.87
1986    212,661   215.50     56,150   194.11     12,059   203.07     10,335   205.78
1987    198,130   216.19     54,166   193.50     13,708   203.76      8,503   207.23
1988    174,501   216.86     50,370   194.08     13,567   204.54      7,737   207.08
1989    177,111   215.64     53,409   193.46     15,499   203.66      7,748   206.57

(a) Sum of subtest standard scores.

Table I.8: Electronics Composite Mean Score Overall Totals (a)

                Overall total
Year        Number    Mean (b)
1981       186,457    206.04
1982       253,037    208.44
1983       259,707    211.15
1984       259,001    211.59
1985       258,140    210.65
1986       291,205    209.97
1987       274,507    210.47
1988       246,175    210.67
1989       253,767    209.45

(a) Sum of subtest standard scores.
(b) Standard deviation = 22.19.




Appendix II

Predictor and Criterion Variable Mean Scores



Table II.1: Army Mean Scores

                              Electronics
            AFQT (a)          Composite (a)     Course grade      SQT (b)
Category      Mean  Number      Mean  Number      Mean  Number      Mean  Number
24J         227.87      65    234.75      65     86.75      76     82.58      53
27N         226.73     100    232.85     100     88.78     138     83.95     110
29V         238.22     136    242.92     136     93.55      41     76.98      65
Male        232.14     280    238.46     280     89.23     232     82.12     209
Female      232.87      23    230.13      23     80.31      23     77.52      21
White       234.00     255    240.00     255     96.19     160     81.86     144
Nonwhite    222.67      48    226.29      48     86.86      95     81.45      86
All Army    232.20     303    237.83     303     88.94     255     81.70     230

(a) Sum of subtest standard scores.
(b) Score on Skills Qualification Test.

Table II.2: Navy Mean Scores

                              Electronics
            AFQT (a)          Composite (a)     Course grade
Category      Mean  Number      Mean  Number      Mean  Number
AQ          228.10     703    233.13     783     89.72     833
AX          231.64     392    236.16     392     90.64     469
STG         228.57   3,233    234.43   3,233     90.23   3,418
STS         231.87   1,698    237.47   1,696     86.89   1,723
Male        229.59   6,080    235.33   6,080     89.11   5,882
Female      235.59      76    230.65      76     90.70      71
White       230.49   5,355    236.25   5,355     89.20   5,179
Nonwhite    224.18     801    228.74     801     89.57   1,159
All Navy    229.67   6,156    235.27   6,156     89.30   6,443

(a) Sum of subtest standard scores.








Table II.3: Air Force Mean Scores
                                          Electronics
                    AFQTa                 Compositea            Course grade
Category            Mean      Number      Mean      Number      Mean      Number
45530A              235.53       119      240.72       119      90.17        119
45530B              235.93       231      240.55       231      90.82        231
30332               238.12       212      245.00       212      91.77        227
30333               234.15       360      239.77       360      91.31        377
Male                235.45       824      241.94       824      91.31        854
Female              237.73        98      235.88        98      89.91        100
White               236.22       825      241.95       825      91.21        855
Nonwhite            231.19        97      235.73        97      90.76         90
All Air Force       235.68       922      241.29       922      91.16        954
aSum of subtest standard scores




Appendix III

Intercorrelation of Study Variables by
Occupational Specialty

Table III.1: Intercorrelation of Study Variables: Army, 24Ja
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.79g          0.83g        0.31g      0.49g
  Electronics Composite   65          1.00           0.81g        0.32g      0.33g
  Factor                  65          65             1.00         0.40g
  Grade                   59          59             59           1.00
Male
  AFQT                    1.00        0.82g          0.85g        0.29g      [illegible]
  Electronics Composite   55          1.00           0.79g        0.28g      0.30g
  Factor                  55          55             1.00         0.38g
  Grade                   50          50             50           1.00
Female
  AFQT                    1.00        0.81g          0.89g        0.43       0.63
  Electronics Composite   10          1.00           0.88g        0.15       0.15
  Factor                  10          10             1.00         0.21
  Grade                   9           9              9            1.00
White
  AFQT                    1.00        0.82g          0.80g        0.24       0.39
  Electronics Composite   49          1.00           0.79g        0.27       0.29
  Factor                  49          49             1.00         0.42g
  Grade                   44          44             44           1.00
Nonwhite
  AFQT                    1.00        0.61g          0.80g        0.13       0.23
  Electronics Composite   16          1.00           0.84g        0.15       0.16
  Factor                  16          16             1.00         0.17
  Grade                   15          15             15           1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.
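
The "Adjusted" column in these tables reports correlations adjusted for restriction of range. This appendix does not say which adjustment formula was applied. As an illustration only, the sketch below assumes the common univariate correction (often called Thorndike Case II), in which the observed correlation is scaled by the ratio of the predictor's standard deviation in the full applicant population to its standard deviation in the selected group. The function name and the example standard deviations are hypothetical and are not taken from the report.

```python
import math


def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate range-restriction correction (assumed: Thorndike Case II).

    r_restricted    -- correlation observed in the selected (restricted) group
    sd_unrestricted -- predictor standard deviation in the full population
    sd_restricted   -- predictor standard deviation in the selected group
    """
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted  # ratio > 1 when selection narrowed the range
    r2 = r_restricted ** 2
    return (r_restricted * u) / math.sqrt(1.0 - r2 + r2 * u * u)


# Hypothetical inputs: an observed r of 0.30 and a population SD 1.5 times
# the SD of the selected group.
adjusted = correct_for_range_restriction(0.30, 15.0, 10.0)
print(round(adjusted, 2))
```

With no restriction (equal standard deviations) the function returns the observed correlation unchanged, which is a quick sanity check on any implementation of this kind.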








Table III.2: Intercorrelation of Study Variables: Army, 27Na
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.84g          0.85g        0.36g      0.55g
  Electronics Composite   100         1.00           0.92g        0.53g      0.57g
  Factor                  100         100            1.00         0.48g
  Grade                   95          95             95           1.00
Male
  AFQT                    1.00        0.86g          0.85g        0.39g      0.59g
  Electronics Composite   94          1.00           0.93g        0.52g      0.56g
  Factor                  94          94             1.00         0.48g
  Grade                   89          89             89           1.00
Female
  AFQT                    1.00        0.86g          0.82g        0.84g      0.94g
  Electronics Composite   6           1.00           0.96g        0.88g      0.93g
  Factor                  6           6              1.00         0.90g
  Grade                   6           6              6            1.00
White
  AFQT                    1.00        0.82g          0.82g        0.31g      0.49g
  Electronics Composite   85          1.00           0.90g        0.49g      0.52g
  Factor                  85          85             1.00         0.43g
  Grade                   81          81             81           1.00
Nonwhite
  AFQT                    1.00        0.80g          0.81g        0.31       0.49
  Electronics Composite   15          1.00           0.93g        0.65g      0.69g
  Factor                  15          15             1.00         0.62g
  Grade                   14          14             14           1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.
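
The "Factor" rows in these tables are scores on the first factor from a principal component analysis. This appendix does not list the variables that entered the analysis or the software used, so the following is only a minimal sketch under stated assumptions: the inputs are standardized, the components are taken from their correlation matrix, and the leading component is found by simple power iteration. The function names and the toy score columns are hypothetical.

```python
import math


def standardize(column):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_component_scores(columns, iterations=500):
    """Return (weights, scores) for the first principal component.

    columns -- list of equal-length lists, one per variable (e.g. subtests).
    Assumes the variables are positively intercorrelated, so that power
    iteration started from equal weights converges to the leading component.
    """
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardized variables.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each person's factor score is the weighted sum of their z-scores.
    scores = [sum(v[j] * z[j][row] for j in range(k)) for row in range(n)]
    return v, scores


# Toy data: three hypothetical subtests for five examinees.
subtests = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1, 3, 2, 5, 4]]
weights, scores = first_component_scores(subtests)
print([round(w, 3) for w in weights])
```

A single score of this kind summarizes whatever the subtests measure in common, which is consistent with the high correlations between the Factor rows and both AFQT and the Electronics Composite in these tables.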








Table III.3: Intercorrelation of Study Variables: Army, 29Va
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.74g          0.79g        0.20       0.33
  Electronics Composite   136         1.00           0.88g        0.50g      0.53g
  Factor                  136         136            1.00         0.38g
  Grade                   35          35             35           1.00
Male
  AFQT                    1.00        0.79           0.80g        0.25       0.41
  Electronics Composite   129         1.00           0.88g        0.47g      0.50g
  Factor                  129         129            1.00         0.36g
  Grade                   32          32             32           1.00
Female
  AFQT                    1.00        0.83g          0.80g        0.59       0.78
  Electronics Composite   7           1.00           0.90g        0.79       [illegible]
  Factor                  7           7              1.00         0.57
  Grade                   3           3              3            1.00
White
  AFQT                    1.00        0.74g          0.78g        0.20       0.33
  Electronics Composite   119         1.00           0.87g        0.53g      0.56g
  Factor                  119         119            1.00         0.40g
  Grade                   29          29             29           1.00
Nonwhite
  AFQT                    1.00        [illegible]    0.85g        0.18       0.31
  Electronics Composite   17          1.00           0.86g        0.34       0.36
  Factor                  17          17             1.00         0.23
  Grade                   6           6              6            1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.








Table III.4: Intercorrelation of Study Variables: Navy, AQa
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.83g          0.85g        0.25g      0.40g
  Electronics Composite   783         1.00           0.86g        0.27g      0.29g
  Factor                  783         783            1.00         0.25g
  Grade                   774         774            774          1.00
Maleh
  AFQT                    1.00        0.83g          0.85g        0.25g      0.40g
  Electronics Composite   783         1.00           0.86g        0.27g      0.29g
  Factor                  783         783            1.00         0.29
  Grade                   774         774            774          1.00
White
  AFQT                    1.00        0.83g          0.84g        0.25g      0.41g
  Electronics Composite   665         1.00           0.86g        0.28g      0.30g
  Factor                  665         665            1.00         0.27g
  Grade                   656         656            656          1.00
Nonwhite
  AFQT                    1.00        0.82g          0.86g        0.13       0.22
  Electronics Composite   118         1.00           0.83g        0.16       0.17
  Factor                  118         118            1.00         0.07
  Grade                   118         118            118          1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.
hWomen are prohibited from serving in the Navy's AQ occupational specialty.




Table III.5: Intercorrelation of Study Variables: Navy, AXa
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.81g          0.83g        0.41g      0.61g
  Electronics Composite   392         1.00           0.89g        0.40g      0.43g
  Factor                  392         392            1.00         0.39g
  Grade                   391         391            391          1.00
Male
  AFQT                    1.00        0.87g          0.88g        0.42g      0.62g
  Electronics Composite   321         1.00           0.90g        0.43g      0.46g
  Factor                  321         321            1.00         0.41g
  Grade                   320         320            320          1.00
Female
  AFQT                    1.00        0.75g          0.80g        0.39g      0.58g
  Electronics Composite   71          1.00           0.83g        0.32g      0.34g
  Factor                  71          71             1.00         0.39g
  Grade                   71          71             71           1.00
White
  AFQT                    1.00        0.80g          [illegible]  0.44g      [illegible]
  Electronics Composite   336         1.00           0.89g        0.46g      0.49g
  Factor                  336         336            1.00         0.44
  Grade                   335         335            335          1.00
Nonwhite
  AFQT                    1.00        0.78g          0.84g        0.18       0.29
  Electronics Composite   56          1.00           0.87g        0.02       0.02
  Factor                  56          56             1.00         0.07
  Grade                   56          56             56           1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.








Table III.6: Intercorrelation of Study Variables: Navy, STGa
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.78g          0.80g        0.30g      0.48g
  Electronics Composite   3233        1.00           0.84g        0.26g      0.28g
  Factor                  3233        3233           1.00         0.28g
  Grade                   3123        3123           3123         1.00
Maleh
  AFQT                    1.00        0.78g          0.80g        0.30g      0.48g
  Electronics Composite   3233        1.00           0.84g        0.26g      0.28g
  Factor                  3233        3233           1.00         0.28g
  Grade                   3123        3123           3123         1.00
White
  AFQT                    1.00        0.79g          0.80g        0.31g      0.49g
  Electronics Composite   2791        1.00           0.84g        0.28g      0.29g
  Factor                  2791        2791           1.00         0.30g
  Grade                   2697        2697           2697         1.00
Nonwhite
  AFQT                    1.00        0.71g          0.76g        0.22g      0.37g
  Electronics Composite   442         1.00           0.78g        0.16g      0.16g
  Factor                  442         442            1.00         0.12g
  Grade                   426         426            426          1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.
hWomen are prohibited from serving in the Navy's STG occupational specialty.








Table III.7: Intercorrelation of Study Variables: Navy, STSa
                                      Electronics                 Gradee
Category                  AFQTb       Compositec     Factord      Raw        Adjustedf
Total
  AFQT                    1.00        0.76g          0.78g        0.28g      0.45g
  Electronics Composite   1698        1.00           0.85g        0.26g      0.27g
  Factor                  1698        1698           1.00         0.26g
  Grade                   1651        1651           1651         1.00
Maleh
  AFQT                    1.00        0.76g          0.78g        0.28g      0.45g
  Electronics Composite   1698        1.00           0.85g        0.26g      0.27g
  Factor                  1698        1698           1.00         0.26g
  Grade                   1651        1651           1651         1.00
White
  AFQT                    1.00        0.77g          0.79         0.28g      0.46g
  Electronics Composite   1518        1.00           [illegible]  0.27g      0.29
  Factor                  1518        1518           1.00         0.28g
  Grade                   1477        1477           1477         1.00
Nonwhite
  AFQT                    1.00        0.70g          0.68g        0.27g      0.44g
  Electronics Composite   180         1.00           0.82g        0.11       0.12
  Factor                  180         180            1.00         0.12
  Grade                   174         174            174          1.00
aCorrelation coefficients are in upper diagonal and number in lower diagonal.
bAFQT = sum of subtest standard scores.
cElectronics Composite = sum of subtest standard scores for Electronics Composite.
dFactor = score from first factor from principal component analysis.
eGrade = final course grade.
fAdjusted = correlation adjusted for restriction of range.
gp < .05.
hWomen are prohibited from serving in the Navy's STS occupational specialty.








Table III.8: Intercorrelation of Study Variables: Air Force, 45530A (a)

                                         Electronics                   Grade (e)
Category                  AFQT (b)       Composite (c)  Factor (d)     Raw            Adjusted (f)
Total
  AFQT                    1.00           0.74           0 199          0.22g          0.36g
  Electronics Composite   119            1.00           0.87           0.27g          0.29g
  Factor                  119            119            1.00           0.30g
  Grade                   119            119            119            1.00
Male
  AFQT                    1.00           0.77g          0.77g          0.21g          0.35g
  Electronics Composite   99             1.00           0.86g          0.26g          0.28g
  Factor                  99             99             1.00           0.27g
  Grade                   99             99             99             1.00
Female
  AFQT                    1.00           0.69g          0.63g          0.31           0.49
  Electronics Composite   20             1.00           0.84g          0.15           0.15
  Factor                  20             20             1.00           0.25
  Grade                   20             20             20             1.00
White
  AFQT                    1.00           0.75g          0.73g          0.24g          0.39g
  Electronics Composite   102            1.00           0.87g          0.28g          0.29g
  Factor                  102            102            1.00           0.28g
  Grade                   102            102            102            1.00
Nonwhite
  AFQT                    1.00           0.58g          0.65g          0.08           0.13
  Electronics Composite   17             1.00           0.85g          0.22           0.23
  Factor                                                1.00           0.33
  Grade                   17             17             17             1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05
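
Whether a coefficient is flagged as significant at p < .05 depends on the number of cases as well as on the size of the coefficient. The report does not say which test was used, so the following is only a sketch under an assumption: a two-tailed test based on the Fisher z approximation. The function names are illustrative and do not come from the report. The two inputs are AFQT-grade pairs read from Table III.8: r = 0.22 with 119 cases (Total) and r = 0.31 with 20 cases (Female).

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Two-tailed p-value for a correlation via the Fisher z approximation.

    This is an assumed test, not one confirmed by the report.
    """
    if n <= 3:
        raise ValueError("need more than 3 cases")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Fisher transform: atanh(r) is roughly normal with SD 1/sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_significant(r: float, n: int, alpha: float = 0.05) -> bool:
    """True when the approximate p-value falls below alpha."""
    return correlation_p_value(r, n) < alpha


# Values read from Table III.8 (AFQT with raw grade).
for label, r, n in [("Total", 0.22, 119), ("Female", 0.31, 20)]:
    print(label, round(correlation_p_value(r, n), 3), is_significant(r, n))
```

Under these assumptions the smaller coefficient in the large group passes the .05 threshold while the larger coefficient in the small group does not, which is consistent with the markers shown in the table.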








Table III.9: Intercorrelation of Study Variables: Air Force, 45530B (a)

                                         Electronics                   Grade (e)
Category                  AFQT (b)       Composite (c)  Factor (d)     Raw            Adjusted (f)
Total
  AFQT                    1.00           0.70g          0.72g          0.22g          0.36g
  Electronics Composite   231            1.00           0.83g          0.27g          0.28g
  Factor                  231            231            1.00           0.29g
  Grade                   231            231            231            1.00
Male
  AFQT                    1.00           0.71g          0.72g          0.23g          0.37g
  Electronics Composite   215            1.00           0.84g          0.25g          0.27g
  Factor                  215            215            1.00           0.29g
  Grade                   215            215            215            1.00
Female
  AFQT                    1.00           0.81g          0.83g          0.15           0.26
  Electronics Composite   16             1.00           0.71g          0.25           0.26
  Factor                  16             16             1.00           0.10
  Grade                   16             16             16             1.00
White
  AFQT                    1.00           0.70g          0.72g          0.25g          0.40g
  Electronics Composite   206            1.00           0.81g          0.32g          0.34g
  Factor                  206            206            1.00           0.35g
  Grade                   206            206            206            1.00
Nonwhite
  AFQT                    1.00           0.66g          0.65g          0.11           0.19
  Electronics Composite   25             1.00           0.93           0.05           0.06
  Factor                  25             25             1.00           0.04
  Grade                   25             25             25             1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05








Table III.10: Intercorrelation of Study Variables: Air Force, 30332 (a)

                                         Electronics                   Grade (e)
Category                  AFQT (b)       Composite (c)  Factor (d)     Raw            Adjusted (f)
Total
  AFQT                    1.00           0.69g          0.79           0.39g          0.59g
  Electronics Composite   212            1.00           0.81g          0.41g          0.43g
  Factor                  212            212            1.00           0.43g
  Grade                   212            212            212            1.00
Male
  AFQT                    1.00           0.74g          0.78g          0.41g          0.61g
  Electronics Composite   186            1.00           0.82g          0.40g          0.42g
  Factor                  186            186            1.00           0.45g
  Grade                   186            186            186            1.00
Female
  AFQT                    1.00           0.62g          0.71g          0.34           0.53
  Electronics Composite   26             1.00           0.79g          0.48g          0.50g
  Factor                  26             26             1.00           0.31
  Grade                   26             26             26             1.00
White
  AFQT                    1.00           0.70g          0.77g          0.36g          0.55g
  Electronics Composite   190            1.00           0.81g          0.41g          0.43g
  Factor                  190            190            1.00           0.42g
  Grade                   190            190            190            1.00
Nonwhite
  AFQT                    1.00           0.56g          0.70g          0.62g          0.81g
  Electronics Composite   22             1.00           0.75g          0.43g          0.46g
  Factor                  22             22             1.00           0.61g
  Grade                   22             22             22             1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05








Table III.11: Intercorrelation of Study Variables: Air Force, 30333 (a)

                                         Electronics                   Grade (e)
Category                  AFQT (b)       Composite (c)  Factor (d)     Raw            Adjusted (f)
Total
  AFQT                    1.00           0.72g          0.77g          0.32g          0.50g
  Electronics Composite   360            1.00           0.83g          0.38g          0.40g
  Factor                  360            360            1.00           0.40g
  Grade                   360            360            360            1.00
Male
  AFQT                    1.00           0.75g          0.79g          0.31g          0.49g
  Electronics Composite   324            1.00           0.84g          0.39g          0.41g
  Factor                  324            324            1.00           0.34g
  Grade                   324            324            324            1.00
Female
  AFQT                    1.00           0.58g          0.78g          0.50g          0.70g
  Electronics Composite   36             1.00           0.74g          0.22           0.24
  Factor                  36             36             1.00           0.36g
  Grade                   36             36             36             1.00
White
  AFQT                    1.00           0.71g          0.77g          0.34g          0.53g
  Electronics Composite   327            1.00           0.84g          0.38g          0.40g
  Factor                  327            327            1.00           0.35g
  Grade                   327            327            327            1.00
Nonwhite
  AFQT                    1.00           0.66g          0.68g          0.10           0.17
  Electronics Composite   33             1.00           0.70g          0.43g          0.46g
  Factor                  33             33             1.00           0.43g
  Grade                   33             33             33             1.00

(a) Correlation coefficients are in upper diagonal and number in lower diagonal.
(b) AFQT = sum of subtest standard scores
(c) Electronics Composite = sum of subtest standard scores for Electronics Composite
(d) Factor = score from first factor from principal component analysis
(e) Grade = final course grade
(f) Adjusted = correlation adjusted for restriction of range
(g) p < .05
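
The "Adjusted" columns in these tables report correlations corrected for restriction of range, that is, for the fact that course grades exist only for recruits who were already selected on aptitude scores. The report does not state which correction was applied. The sketch below assumes the widely used Thorndike Case 2 formula, and both the function name and the standard-deviation ratios are hypothetical values chosen for illustration. The raw value 0.39 is the Total AFQT-grade correlation from Table III.10.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Thorndike Case 2 correction (an assumed formula, not confirmed by the report).

    r_restricted: correlation observed in the selected (restricted) group.
    sd_ratio: predictor SD in the unrestricted population divided by
              predictor SD in the restricted group (hypothetical here).
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


raw_r = 0.39  # raw AFQT-grade correlation, Table III.10, Total
for u in (1.0, 1.5, 2.0):  # hypothetical SD ratios
    print(u, round(correct_for_range_restriction(raw_r, u), 2))
```

With a ratio of 1.0 (no restriction) the correlation is unchanged, and larger ratios raise it, which is the direction of every raw-to-adjusted pair in the tables.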




Appendix IV

Army SQT Mean Scores, by
Occupational Specialty


              Specialty     Year      Number      Mean
              24J           1985         154     86.48
                            1986         152     87.11
                            1987         102     82.50
                            1988          92     83.05
                            Total        500     85.23
              27N           1985         196     85.53
                            1986         157     88.36
                            1987         145     86.66
                            1988         185     79.56
                            Total        683     84.81
              26V/29V       1985       1,308     82.28
                            1986       1,261     79.39
                            1987         944     80.19
                            1988         831     78.77
                            Total      4,344     80.40
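
One way to read the Total rows is as the sum of the yearly numbers and the number-weighted average of the yearly means. That reading is an assumption, since the report does not describe how the totals were computed. The sketch below applies it to the 24J rows as transcribed here; the helper name `weighted_mean` is illustrative.

```python
def weighted_mean(counts, means):
    """Average of group means, weighting each mean by its group size."""
    if len(counts) != len(means):
        raise ValueError("counts and means must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("total count must be positive")
    return sum(n * m for n, m in zip(counts, means)) / total


# Specialty 24J, years 1985 through 1988.
counts_24j = [154, 152, 102, 92]
means_24j = [86.48, 87.11, 82.50, 83.05]

print(sum(counts_24j))                                  # 500
print(round(weighted_mean(counts_24j, means_24j), 2))   # 85.23
```

For 24J this reproduces the printed total of 500 tested and a mean of 85.23.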




Appendix V

Comments From the Department of Defense



                                                      ASSISTANT SECRETARY OF DEFENSE

                                                        WASHINGTON, D.C. 20301-4000

             FORCE MANAGEMENT AND PERSONNEL                              10 AUG 1990


Ms. Eleanor Chelimsky
Assistant Comptroller General
Program Evaluation and Methodology Division
U.S. General Accounting Office
441 G Street, NW
Washington, DC 20548

Dear Ms. Chelimsky:

This is the Department of Defense (DOD) response to the General Accounting
Office (GAO) draft report, "MILITARY TRAINING: Effectiveness for Technical
Specialties Inadequately Measured," dated May 31, 1990 (GAO Code 973276,
OSD Case 8371).

The report provides a series of useful recommendations that are consistent
with ongoing DOD initiatives designed to develop more sensitive indicators
of trainee performance and to develop more cost-effective ways of measuring
performance both in the schoolhouse and on-the-job. Despite general
agreement with the report's final recommendations, the DOD does not fully
concur with many of the specific findings. In several cases, the findings
and conclusions appear to be based on incorrect assumptions or
inappropriate methodology. Specific issues and details are provided in the
enclosure.

In addition, it is important to note that the field of job performance
measurement is still a developing science and cost-effective measures for
use in evaluating training effectiveness are not yet available. As
discussed in the enclosure, the DOD has additional measurement programs in
place beyond those discussed in the report, and continues to support a
substantial number of research efforts to expand the boundaries of this
science. The GAO report substantiates the Department's conclusions about
the demands of selecting and training individuals to meet the requirements
of technical specialties in the coming years, and reinforces current DOD
efforts in this area.

The DOD appreciates the opportunity to comment on the draft report.

                                        Sincerely,




                            Enclosure:
                            As stated








                GAO DRAFT REPORT - DATED MAY 31, 1990
                  (GAO CODE 973276) OSD CASE 8371

        "MILITARY TRAINING: EFFECTIVENESS FOR TECHNICAL
               SPECIALTIES INADEQUATELY MEASURED"

                  DEPARTMENT OF DEFENSE COMMENTS

                           * * * * * *

                             FINDINGS

FINDING A: Background: Recruit Quality. The GAO reported that, if the entry
level aptitude, knowledge, and skills of new recruits should fall short of
human requirements needed to operate and maintain new technologically
sophisticated weapons systems, greater demands would be placed on the Armed
Services to compensate for the shortfall through training. The GAO observed
that the recruit quality had grown in the eighties, as evidenced by the
following statistics:

    -   in 1980, 68 percent of recruits were high school graduates; by 1986,
        92 percent had high school diplomas; and

    -   in 1980, 65 percent of the recruits were in the top three mental
        categories on the Armed Forces Qualification Test, compared with
        96 percent in 1986.

The GAO also reported that:

    -   the number of young people available for the military recruit pool
        will continue to diminish until the mid-1990s;

    -   by the year 2000, five of every six new labor force entrants will be
        female, minority group members, or immigrants; and

    -   the graduates of the American educational system are said to be
        falling behind the youth of competitor nations in technological
        literacy--while, at the same time, weapons systems become
        increasingly sophisticated.

The GAO also reported that the Air Force has expressed concern about the
quality of recruits, the Navy noted an erosion of its Delayed Entry Pool,
and for the first time in 8 years, the Army failed to meet its quarterly
recruiting quota in the first quarter of FY 1989. (pp. 1-1 to 1-5/GAO Draft
Report)







DOD Response: Concur. While the statements attributed to the Services are
essentially correct, they do not provide the "big picture." Since FY 1984,
quality in the Air Force has remained stable at 98 to 99 percent high school
diploma graduates and 98 to 100 percent individuals who score average or
above on the enlistment test. Simultaneously, Air Force recruiting
objectives have fallen from 60,000 in FY 1984 to 43,000 in FY 1989, making
it easier to meet its goals with high quality. Although the Navy Delayed
Entry Program pool eroded in FY 1989, it is back on target. And while the
Army did not achieve its first quarter FY 1989 recruiting objective
(enlisting all but 475 of the 24,143 people it sought), it finished FY 1989
exceeding the objective. In addition, the impact of the mid-1990s dip in the
size of the youth population will be moderated by reductions in accession
requirements that are likely to be part of the overall downsizing of the
military during this decade.

The GAO report also mentions that American youth are falling behind youth of
competitor nations in "technological literacy." While unaware of the
existence of international "technological literacy" data, it is the DOD
objective to enlist those youth who can acquire the skills to field
sophisticated weapon systems. To that end, the education of the nation's
youth is of paramount importance to the DOD. Given students' lackluster
performance on both national and international tests over the last decade,
the DOD has formed a collaborative working arrangement with the U.S.
Department of Education, whereby the Department is assisting them with
development and fielding of new international literacy tests. The DOD is
also experimenting with those same tests with hopes of improving the
Joint-Service enlistment test. The Department shares the GAO concern and
hopes to have much-improved, international comparative literacy data over
the next several years.
FINDING B: The Quality of Military Recruits--1981-1989 Test [illegible].
The GAO reported that the Armed Services Vocational Aptitude Battery is
comprised of ten subtests measuring abilities considered important for
Military Service. The GAO also reported that all the Services use the same
component subtests for two composite scores: the Electronics composite and
the Armed Forces Qualification Test, which is the primary mental criteria
for entry into the Armed Forces. The GAO found the following regarding
Armed Forces Qualification Test:

    -   overall scores improved about 4 percent between 1981 and 1989;

    -   male recruit scores began and ended the decade slightly higher than
        female scores;








    -   scores differed more substantially across racial groupings than
        between genders;

    -   white recruits' scores began the decade 10 percent higher than
        minority scores and ended 7 percent higher;

    -   mean scores for all Services were significantly higher in 1989 than
        1981;

    -   Army scores began the decade substantially below those of the other
        Services, but by 1986, had reached the same level as Navy and Marine
        Corps recruits; and

    -   average Air Force scores have consistently been higher than the
        other Services and have not displayed their tendency to plateau at
        mid-decade levels.

The GAO found the following regarding the Electronics Composite:

    -   mean scores rose 2 percent between 1981 and 1989;

    -   scores peaked in 1984 and have shown a gradual decline since then;

    -   female recruits scored approximately 5 percent lower than male
        recruits during the eighties;

    -   white recruits scored about 11 percent higher than minorities in
        1981 and 9 percent higher by 1989;

    -   the narrowing of the gap for minorities, however, was achieved in
        the first half of the decade--by 1989, scores for all racial groups
        were declining;

    -   the interservice pattern of scores mirrors that of the Armed Forces
        Qualification Test, with the Army making up a 10 point difference
        with the Navy and Marines by 1986, and the Air Force on top
        throughout; and

    -   mean scores for the three Services changed very little from 1985 to
        1988, but Army and Navy scores declined significantly in 1989.
        (pp. 2-1 to 2-7/GAO Draft Report)
DOD Response: Partially concur. Although the individual calculations
have not been corroborated by the DOD due to time constraints,
trends reported in the Armed Forces Qualification Test score data
presented for comparison of groups (i.e., gender, race/ethnicity,
and Service) look reasonable, as do the trends reported regarding
the Electronics Composite. Some technical questions suggest,
however, that clarification may be necessary in the GAO narrative.
    For example, the GAO report              states    that Armed Forces Qualifica-
    tion Test "scores           improved about 4 percent             between 1981 and
    1989."       In other statements,         various       percentage      changes are
    mentioned       for the Armed Forces Qualification                  Test and the Elec-
    tronics      Composite.        Computing percentage           gains or changes in
    subtest      standard      scores is not statistically              appropriate.       Scores
    on the Armed Services            Vocational      Aptitude      Battery,     of which the
    Armed Forces Qualification              Test and the Composite scores are a
    part,     do not have a meaningful            zero point       and, therefore,       per-
    centage changes cannot be interpreted.                      Computation       of percent-
    ages requires          a ratio   scale,   which is more powerful              than the
    score scale for all aptitude              tests,      including      the Armed Services
    Vocational        Aptitude     Battery.     The same limitation            applies   to
    interpreting        changes on the Electronics              Composite.
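
The scale-type objection above can be illustrated with a short sketch. The scores, the shift constant, and the standard deviation below are hypothetical values chosen for illustration; they are not GAO or DOD data. The sketch shows that a percentage change depends on where the scale's arbitrary zero point sits, while a difference expressed in standard deviation units does not.

```python
def percent_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Hypothetical standard scores for two years (illustrative only).
before, after = 50.0, 52.0

# The same two results reported on a scale whose origin is moved by an
# arbitrary constant. For an interval scale this relabeling is equally valid.
shift = 150.0

pct_original = percent_change(before, after)
pct_shifted = percent_change(before + shift, after + shift)

# A difference in standard deviation units is unaffected by the origin.
assumed_sd = 10.0  # hypothetical
diff_original = (after - before) / assumed_sd
diff_shifted = ((after + shift) - (before + shift)) / assumed_sd

print(f"percent change, original origin: {pct_original:.1f}%")
print(f"percent change, shifted origin:  {pct_shifted:.1f}%")
print(f"difference in SD units: {diff_original:.2f} vs {diff_shifted:.2f}")
```

The identical two-point gain reads as 4 percent on one scale and 1 percent on the other, which is why the percentage figure carries no fixed meaning when the zero point is arbitrary.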
    Some factors     related      to changes in how scores have been computed
    are relevant,     particularly       since the report         examines scores
    across several      years.      Between 1981 and 1989, there were several
    changes in the Armed Forces Qualification                  Test (e.g.,    the sub-
    tests used to compute the Armed Forces Qualification                     Test score
    were changed and the reference             population      for norming of the test
    was updated).       It is unclear       if the differences         in how scores
    were computed over the years were taken into account in the
    analyses   presented      in Appendix      1 and Figures       1, 2, and 3; clari-
    fication    as to these differences          appears appropriate,         otherwise
    comparisons    of means will        not be interpretable.            The same sort of
    changes occurred       over the years in the calculation               of the Elec-
    tronics   Composite and would affect            interpretation        of Figures    5,
    6, and 7.
    Finally,    with the large sample sizes achieved              in the data analy-
    ses, statistical     significance        can be observed for differences
    that have relatively        little    practical  significance.       For example,
    while the statement      that " . . . Navy scores declined            signifi-
    cantly   in 1989 (relative         to 1988)" is true,      the drop was from a
    score of 211.58 in 1988 to a score of 210.40 in 1989.                   That small
    a drop from one year to the next would be worth noting,                  yet not
    cause for alarm.
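
The distinction between statistical and practical significance can be sketched numerically. The two means below are the Navy figures quoted above; the standard deviation and the per-year sample size are assumptions made only for illustration, since neither is given in this response. Under those assumptions, a simple two-sample z statistic is large while the standardized effect size remains very small.

```python
import math


def two_sample_z(mean_1: float, mean_2: float, sd: float, n_per_group: int) -> float:
    """z statistic for a difference of two means, assuming a common SD and equal n."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return (mean_1 - mean_2) / standard_error


def cohens_d(mean_1: float, mean_2: float, sd: float) -> float:
    """Standardized mean difference (difference in SD units)."""
    return (mean_1 - mean_2) / sd


mean_1988, mean_1989 = 211.58, 210.40  # scores quoted in the DOD response
assumed_sd = 30.0      # ASSUMPTION: not stated in the report
assumed_n = 50_000     # ASSUMPTION: recruits per year, not stated in the report

z = two_sample_z(mean_1988, mean_1989, assumed_sd, assumed_n)
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
d = cohens_d(mean_1988, mean_1989, assumed_sd)

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2e}, effect size d = {d:.3f}")
```

With samples of this size the test rejects "no change" decisively, yet the drop amounts to only a few hundredths of a standard deviation under the assumed spread.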
    FINDING C: The Quality of Military Recruits--Number of Recruits
    Qualifying During the Period
    1981-1989. The GAO reported that, as another measure of recruit
    qualification trends, it enumerated the number of recruits whose
    Armed Services Vocational Aptitude Battery scores met minimum
    standards required for entry into two selected high technology
    military specialties: (1) air traffic controllers and (2) systems
    repair technicians. The GAO found the following for the air
    traffic controller specialty:
            -   in 1981, approximately 38,000 recruits qualified for
                the specialty and by 1986, more than 69,300 recruits
                qualified--but, since then, the number qualifying has
                declined to 58,000;
            -   in 1981, 87 percent of the qualifying recruits were
                white males, while two-thirds of all recruits were
                white males;
            -   by 1989, 84 percent of the qualifying recruits were
                white males, while only 61 percent of the recruits
                were white males; and
            -   while one third of the white males entering the
                Service qualified on the basis of their Electronics
                scores, fewer than 15 percent of the white females so
                qualified and fewer than 10 percent of the minority
                males and 3 percent of the minority females qualified
                on the basis of their Electronics scores.
The GAO found the following for the Systems Repair Technician:
            -   in 1981, the number of qualified recruits for the
                Systems Repair Technician specialty numbered 16,563
                and, by 1983, the number had increased sharply--but
                by 1989, it had fallen back to within 700 of the 1981
                level; and
            -   the vast majority of those qualified were white
                males, of whom 11 percent qualified compared with
                less than 2 percent for other demographic groups.
The GAO concluded that, based on its review, recruit quality trends
during the eighties are not reassuring. The GAO also observed that
fewer recruits are qualifying for the more demanding technical
occupational specialties. The GAO further concluded that, with women
and minorities forming the bulk of the new entry labor force by the
year 2000, providing well-trained personnel for a technologically
sophisticated military can be expected to become increasingly
difficult. The GAO also noted that, in turn, the burden on training
will increase, along with the need to monitor its effectiveness.
(pp. 2-7 to 2-11/GAO Draft Report)
DOD Response: Partially concur. Providing well-trained personnel
will become increasingly difficult should recruit quality diminish.
However, the DOD does not consider that recruit quality trends
during the eighties, particularly the mid-to-late 1980s, are
troublesome. During the last half of the decade, recruit quality
has never been better. Compared to the youth population from which
the DOD recruits, the quality level has consistently been well above
average. For example, in FY 1989, 92 percent of new recruits had a
high school diploma, in contrast to 74 percent in the youth
population. Also, in FY 1989, 94 percent of new recruits scored
average or above on the enlistment test, compared to 69 percent in
the youth population.
     Although     it is reasonable             that the GAO would want to assess how
     the aptitude       of recruits          for technologically           sophisticated       spe-
     cialties     has changed since 1980, the methodology                        selected    to do
     so is flawed.           Equating      a decline       on the Armed Services          Voca-
     tional    Aptitude        Battery's       electronics      composite      to a decline      in
     recruits'      "technological           sophistication"          is inappropriate.         The
     electronics       composite         is composed of four subtests               that measure
     mathematics       ability       (arithmetic        reasoning       and mathematics      knowl-
     edge), general          science,      and electronics          information.        As the
     report    Figure     8 indicates,           the decline      in performance        on the
     composite      is driven        primarily       by the decline        in performance       on
     one subtest--electronics                information.
     There is also a flaw in the example used by the GAO beginning                         on
     page 2-8, wherein the report            refers     to the Air Traffic      Control
     specialty     as having a minimum entry standard               as of May 1989 of
     230 on the Electronics          composite      (in standard     score form).      Air
     Traffic    Control,     Air Force Specialty          Code 272X0, is selected        on
     the General Composite and has never had an Electronics                     require-
     ment.     That renders      report   Figure 9 incorrect,          if based on the
     composite     described     in the text.         The GAO may have actually
     performed     its analyses      on the specialty        titled   Aircraft    Control
     and Warning Radar Specialist,            Air Force Specialty          Code 303X2; in
     report    Table 3.7, that specialty            is correctly     reflected    as having
     an Electronics       Composite qualifying          score of 230.
     The other specialty             used by the GAO in this          finding      is Systems
     Repair Technician,             an occupation     so specialized         that it is not
     assigned      an Air Force Specialty            Code, but is identified           by a
     Reporting      Identifier         (99104).    It would be appropriate            for the
     report      to mention that individuals            qualifying        for this specialty
     are not qualified            for a "typical"      high-technology          job, but are at
     the very highest           end of the technical         continuum.         A footnote
     identifying       the specialty         and its cutoff       score requirement        would
     be appropriate,           similar    to the footnote       given at the bottom of
     page 2-8 for the other specialty.
     It is speculated that the test score decline on the electronics
     information subtest is attributable to nationwide educational
     curriculum changes. Over the course of this decade, dramatic
     changes have occurred in public and private elementary and
     secondary education programs. These reforms have been well
     publicized and documented. As high school graduation standards
     have become more stringent, students have had fewer opportunities
     to take elective coursework. Consequently, enrollment in
     vocational education courses, like electronics/electricity, has
     declined dramatically. Throughout the 1980s, recruit quality, as
     measured on the Armed Services Vocational Aptitude Battery's Armed
     Forces Qualification Test composite, has improved. However, as the
     GAO pointed out, performance on the electronics subtest/composite
     has declined. Again, this is considered to be an artifact of the
     educational reform movement. Students simply are no longer
     enrolling in the technical and trade vocational classes where
     they can learn basic electronics/electrical constructs.
    The electronics          composite     is a valid      predictor         of success in
    training      and on the job for occupational                  specialties      requiring
    electronics/electrical             knowledge.       Given that it is also known
    that youth are taking            fewer formal courses in this area prior                      to
    entry into the military,             the DOD is interested               in improving     its
    ability     to select       and classify      recruits       into electronics-related
    occupations.           To that end, there is research                in progress      to
    improve the content           of the current        enlistment         test.    A number of
    large-scale        research    projects,      on both new paper-and-pencil                and
    computerized        tests,    are underway in hopes of finding                  better
    predictors       of performance        in military       training        and occupations.
    The Department          reiterates,      however, that it is inappropriate                 to
    equate performance            on the electronics          composite      with recruits'
    overall      "technological         sophistication"         and to conclude         that this
    sophistication          has declined       over the decade of the 1980s.                 Unfor-
    tunately,        there is no way to conduct a historical                     study on this
    subject.         The DOD concurs with GAO researchers                  that the youth and
    entry-level         labor force demographics             are changing        and that the
    Department         needs to study carefully            the effects       of its enlistment
    test and concomitant             composites       on the people       (e.g.,     women,
    minorities)         that will      be recruited      in the future.            To that end,
    the results         from enlistment        test research        described       above are
    expected       to be helpful        in making future         enlistment       test deci-
    sions.
    FINDING D: Schoolhouse Measures of Training Effectiveness--Army.
    The GAO reviewed course grades in Army advanced individual
    training courses for five occupational specialties to determine
    the extent to which appropriate data were available to the
    Military Services for use in judging training effectiveness. The
    GAO found that the course grades for the five specialties were not
    equally reliable indicators of performance during training. The
    GAO noted, for instance, that at Fort Gordon it was unable to find
    a consistent relationship between milestone measures and final
    grades, nor was it able to locate anyone who could suggest a
    relationship. The GAO concluded that the grades recorded for two
    of the courses (36L and 39B) could not be used to discriminate
    reliably among the performance of individual trainees. The GAO
    found inconsistencies in scoring between different classes and
    even within the same class. The GAO also found that Fort Gordon's
    grades (unlike Redstone's grades) were based partially on measures
    of physical conditioning that appeared to be unrelated to job
    performance. The GAO concluded that the psychometric differences
    it found at Fort Gordon appeared to be the result of a number of
    factors including (1) questionable data entry procedures and
    software and (2) the pass/fail nature of the criteria used to
    evaluate student progress. The GAO suggested that subject matter
    experts need to develop more finely tuned, objective, and reliable
    measures of performance than "go/no-go." The GAO noted that,
    because of the problems encountered at Fort Gordon, it excluded
    those courses from its sample of Army trainees, resulting in the
    inclusion of all recruits who completed 245 and 27N training
    between October 1987 and July 1989, and approximately one-third of
    those who completed 29V training during the same period.
The GAO found that,         on the Armed Forces Qualification               Test and
the Electronic       Composite,      male trainees       scored significantly
higher   than did females and white trainees                 performed     better  than
minority    students.       The GAO further        found that the training
performance     differences       correspond      with the test score differ-
ences on both tests         for the racial        groupings.       The GAO noted that
for gender,     training      performance     differences       between males and
females were larger         than test score differences.              The GAO also
found that the Electronics            Composite is a better          predictor    of
success than the Armed Forces Qualification                   Test.
The GAO further found that, for its entire sample, the score on the
Electronics Composite explains 18 percent of the variation in course
grades, more than the Armed Forces Qualification Test--and a
GAO-developed "factor score," which is the weighted sum of all Armed
Services Vocational Aptitude Battery subtests. The GAO concluded
that, for males, the Electronics Composite score appears to be a
better predictor of future performance than the Armed Forces
Qualification Test. The GAO found, however, that for females, the
Armed Services Vocational Aptitude Battery "factor scores" are
better predictors of schoolhouse performance than the Armed Forces
Qualification Test, which is a better predictor than the Electronics
Composite. The GAO noted that for minority soldiers, the ability to
predict training course grades based on test scores is the weakest
of all groups. The GAO concluded that the Armed Forces Qualification
Test, or some other general score from the Armed Services Vocational
Aptitude Battery, may provide a better predictor of success for
women recruits in electronics-related training than does the
Electronics score. The GAO further concluded that better predictors
of training performance are needed for minority students. (pp. 3-1
to 3-7/GAO Draft Report)
DOD Response: Partially concur. The Army's testing procedures for
soldiers undergoing Advanced Individual Training are designed to
ensure that soldiers achieve specified training objectives. To
accomplish this, criterion-referenced hands-on performance tests are
administered and scored on a "go/no-go" basis. Such tests are
routinely used in the military to evaluate training effectiveness
because they provide meaningful information to course managers on
student performance, as well as information on the degree to which
the course is meeting its stated objectives. Given that such tests
are not designed to measure the relative performance of individuals
(i.e., these measures are not norm-referenced), it is neither
surprising nor particularly disturbing that the GAO found such test
results unsuitable for correlational analysis. Criterion-referenced
measurement, such as the "go/no-go" measures used by the Army, is a
psychometrically sound method when mastery learning is the goal of
instruction, as is the case under discussion.
As with other findings in the report that describe trends in the
Armed Forces Qualification Test scores and examine differences for
groups (e.g., gender and race/ethnicity), the statements about
training performance differences appear reasonable. However, there
are problems with some of the specific analyses the GAO indicates
were performed to reach those conclusions. For example, in the Army
sample, students from three courses were pooled to increase the
sample size and the course grades for the various specialties were
assumed to be on the same score scale, or to have the same meaning.
In fact, course grades tend to be on course-unique metrics and there
is no way to evaluate whether a score of, say, 90 in one course
means the same in terms of competence as a score of 90 in another
course. Thus, the mean reported as an average of grades for the
three Army courses is not meaningful and the relationship to scores
from the Armed Services Vocational Aptitude Battery is tenuous. Note
that for large samples, such as white males, the differences in the
score scales tend to average out and the correlation coefficients
are reasonably interpretable. For small samples, however, the
different scales for course grades are likely to distort the
correlation coefficients and means. Since the same analyses of
schoolhouse measures of effectiveness were used for each Service
(Findings D, E, and F), additional comments applicable to all appear
in the DOD response to Finding G, the summary finding on schoolhouse
measures.
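
The pooling concern can be made concrete with a small sketch. The aptitude scores and grades below are invented for illustration and describe two hypothetical courses that grade on different scales; nothing here reproduces the GAO analysis. Standardizing grades within each course is one common remedy, shown as an assumption rather than as the method either agency used, and it still presumes the courses enroll comparable students.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def zscores(values):
    """Standardize values to mean 0 and SD 1 within one course."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: within each course, grades track aptitude perfectly,
# but course B awards grades on a higher and narrower scale than course A.
aptitude_a, grades_a = [200, 210, 220, 230], [70, 74, 78, 82]
aptitude_b, grades_b = [205, 215, 225, 235], [88, 90, 92, 94]

r_within_a = pearson(aptitude_a, grades_a)
r_within_b = pearson(aptitude_b, grades_b)

# Pooling raw grades mixes the two grading scales.
r_pooled_raw = pearson(aptitude_a + aptitude_b, grades_a + grades_b)

# Pooling after standardizing grades within each course.
r_pooled_std = pearson(aptitude_a + aptitude_b,
                       zscores(grades_a) + zscores(grades_b))

print(f"within A: {r_within_a:.2f}, within B: {r_within_b:.2f}")
print(f"pooled raw: {r_pooled_raw:.2f}, pooled standardized: {r_pooled_std:.2f}")
```

In this toy case the pooled raw correlation falls well below the within-course value even though the relationship inside each course is perfect, because the grading scale itself varies between courses.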




FINDING E: Schoolhouse Measures of Training Effectiveness--Navy.
The GAO reported that it examined scores on four training
courses--(1) Sonar Technician Anti-Sub Warfare Surface, (2) Sonar
Technician Anti-Sub Warfare Subsurface, (3) Aviation Fire Control
Technician, and (4) Aviation Anti-Sub Warfare Technician. The
GAO found the following:
            -    male recruits entered training with significantly
                 lower Armed Forces Qualification Test scores and
                 significantly higher electronics scores than females;
            -    final grades for males were slightly, but
                 significantly, lower than their female classmates,
                 suggesting that a substantial advantage in the Armed
                 Forces Qualification Test can overcome an advantage
                 in Electronics; and
            -    minority students began training with substantially
                 lower scores on both composites but their final grades
                 were not significantly different.
The GAO drew the following conclusions:
            -    that the Armed Forces Qualification Test may be more
                 important for training success than Electronics;
            -    that for most Navy groupings, the Armed Forces
                 Qualification Test scores are better predictors of
                 schoolhouse performance than Electronics scores;
            -    that for females, the Electronics composite is the
                 weakest predictor and the "factor score" is the
                 strongest; and
            -    that the ability of any of the three scores to predict
                 training success is weakest for minorities. (pp. 3-7
                 to 3-8/GAO Draft Report)
DOD Response: Partially concur. While the GAO concluded that the
Armed Forces Qualification Test may be more important for predicting
training success than the Electronics composite and that for most
Navy groupings, the Armed Forces Qualification Test scores are
better predictors of schoolhouse performance than Electronics
scores, a recent Navy Personnel Research and Development Center
validation report found the opposite result, with an average
validity coefficient of .59 for predicting "A" school success from
the Composite vs. an average coefficient of .46 for prediction from
the Armed Forces Qualification Test.
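
The GAO figures are stated as percent of variation explained, while the Navy report is quoted in validity coefficients. For a single predictor the two are related by squaring the correlation, so a rough conversion can be sketched as follows. The function names are illustrative, and the comparison is only indicative: this response does not say whether the Navy coefficients were adjusted for range restriction or computed on comparable samples.

```python
import math


def variance_explained(r: float) -> float:
    """Share of variance explained by one predictor with correlation r."""
    return r * r


def correlation_from_variance(share: float) -> float:
    """Magnitude of the correlation implied by a share of variance explained."""
    return math.sqrt(share)


# Validity coefficients quoted from the Navy validation report.
navy_composite_share = variance_explained(0.59)   # about 0.348
navy_afqt_share = variance_explained(0.46)        # about 0.212

# The GAO figure for the Electronics Composite across its Army sample.
army_composite_r = correlation_from_variance(0.18)  # about 0.42

print(f"Navy composite: {navy_composite_share:.1%} of variance")
print(f"Navy AFQT:      {navy_afqt_share:.1%} of variance")
print(f"Army composite: r of about {army_composite_r:.2f}")
```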




The report also states that the Electronics Composite is the
weakest predictor and the Factor score is the strongest for
females. However, statistical results from such a small sample
(76 females) would not be stable enough to warrant policy
changes. The results reported by the GAO, in all probability,
would not be replicated given a larger sample. Also, the
validity coefficients adjusted for range restriction in report
Table 3.6 show for the Female Factor Score composite an increase
of .42. That result is suspect, as such adjustments rarely
provide an increase of more than .20.
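
To show how the size of a range-restriction adjustment depends on the degree of restriction, the sketch below uses the common univariate correction (often called Thorndike Case II). This is an assumption: the response does not state which correction the GAO applied, and the observed correlation of .30 is a hypothetical starting value, not a figure from Table 3.6.

```python
import math


def correct_for_range_restriction(r_restricted: float, sd_ratio: float) -> float:
    """Univariate (Thorndike Case II) correction.

    sd_ratio is the predictor's SD in the unrestricted applicant population
    divided by its SD in the restricted (selected) sample.
    """
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))


def implied_sd_ratio(r_restricted: float, r_corrected: float) -> float:
    """SD ratio that would be needed to move r_restricted to r_corrected."""
    r2, rc2 = r_restricted ** 2, r_corrected ** 2
    return math.sqrt(rc2 * (1.0 - r2) / (r2 * (1.0 - rc2)))


observed_r = 0.30  # hypothetical observed validity in the selected sample

# A moderate degree of restriction produces a moderate adjustment.
moderate = correct_for_range_restriction(observed_r, sd_ratio=1.5)

# How severe would restriction have to be for an increase of .42?
ratio_needed = implied_sd_ratio(observed_r, observed_r + 0.42)

print(f"corrected r at SD ratio 1.5: {moderate:.3f}")
print(f"SD ratio needed for a +.42 increase: {ratio_needed:.2f}")
```

Under these assumptions an increase of .42 would require the applicant population to be more than three times as variable as the selected sample, which is consistent with the caution expressed above about so large an adjustment.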
It should also be noted that               only one of the four training
courses represented      is even         open to women (Aviation          Anti-
Submarine Warfare Technician),                which is not evident        without     close
study of report    Table 3.6.            The data for males in report             Table
3.6 is the result    of merging            four training    courses and produces
an unorthodox   analysis    that         requires    an explanation       of grading
differences   which may exist            for the different       schools.
As with the previous            finding,     trends  in the Armed Forces Qualifi-
cation     Test scores and the Electronics              Composite in Navy courses,
including      differences        for groups (e.g.,       gender and race/ethnic-
ity),     appear reasonable           with respect   to schoolhouse          measures of
training     effectiveness.            However, the problems with some of the
specific     analyses      the GAO indicates        were performed         to reach those
conclusions        remain a factor.          In the Navy sample, students             from
four courses were pooled to increase                 sample size and the assump-
tion that course grades for the various                    courses have the same
meaning is tenuous.             That limits      the confidence       in interpretation
of the relationship           to scores from the Armed Services                Vocational
Aptitude     Battery.        Note that for large samples, such as white
males, the differences              in the score scales tend to average out,
and the correlation           coefficients       are reasonably       interpretable.
For small samples, however, the different                     scales for course
grades are likely          to distort      the correlation        coefficients       and
means.      Additional       comments applicable         to all appear in the DOD
response to Finding           G, the summary finding           on schoolhouse        mea-
sures.
FINDING F: Schoolhouse Measures of Training Effectiveness--Air
Force. The GAO reported that it examined four Air Force courses--
(1) Aircraft Control and Warning Radar Specialist, (2) Automatic
Tracking Radar Specialist, (3) Photo-Sensors Maintenance
Specialist, Tactical Reconnaissance Sensors, and (4) Photo-Sensors
Maintenance Specialist, Reconnaissance Electro-Optical Sensors.
The GAO found that, like the Navy, (1) "factor scores" are as good
or better predictors than composites, (2) for the female students,
the Armed Forces Qualification Test scores and factor scores
outpredict Electronics scores, and (3) it is most difficult to
predict course grades for minority students,
although factor scores explained 10 percent (46 percent after
adjustment). The GAO concluded that because of problems with some
Army data, and the special preparation of data by the Navy and Air
Force, it would not be appropriate to make inter-Service
comparisons or make firm judgments about the immediate
availability of psychometrically suitable measures from the Navy
and the Air Force (pp. 3-8 to 3-10/GAO Draft Report).
DOD Response:          Partially     concur.      As with other findings         in the
report,     which describe        trends     in the Armed Forces Qualification
Test scores and examine differences                  for groups (i.e.,       gender and
race/ethnicity),           the statements       about training      performance       dif-
ferences      appear reasonable.           The problems with some of the analy-
ses the GAO indicates            were performed        to reach those conclusions
restrict      interpretability          of the findings,       as was stated      in the
DOD response to Findings             D and E. Additional           comments appear in
the DOD response to Finding               G, the summary finding          on schoolhouse
measures.        The DOD does concur,           however, with the final         statement
in Finding       F, which indicates          it would not be appropriate            to make
inter-Service         comparisons.        In addition,      research    performed       by
the Air Force Human Resources Laboratory                    confirms    many of the GAO
findings      about general       ability      (such as is measured in the Factor
Scores the GAO examined)             as a valuable       predictor     of schoolhouse
performance.
FINDING G: Schoolhouse Measures of Training Effectiveness--
Summary. The GAO raised questions about the differential success
in training for males and females and for whites and minorities,
and about the differential predictive validity of the Armed
Services Vocational Aptitude Battery for these groups. The GAO
concluded that its analysis of gender- and race-related
differences in mean Armed Services Vocational Aptitude Battery
scores and course grades in the Army suggests that the Electronics
composite is an efficient simple predictor of training success.
The GAO found, however, that in the Navy and Air Force, a more
complex relationship exists between the Armed Services Vocational
Aptitude Battery scores and course grades. The GAO noted that
gender- and race-related differences in course grades were quite
small compared with significant differences in Electronics scores.
The GAO concluded that an advantage in more general aptitude,
measured by the Armed Forces Qualification Test, can compensate
for a deficit in Electronics--when the deficit is not too great.
The GAO also noted that, while the Armed Services Vocational
Aptitude Battery's Electronics composite score demonstrated a
moderate ability to predict training success for white students
and males, it was less successful for female or minority
students. The GAO concluded the Factor Score that it derived was,
in most cases, the best predictor      of training    success because                           it
utilized information      from all ten Armed Services     Vocational
Aptitude Battery     subtests.
The GAO concluded that, based on its work, it was impossible to
determine whether the Armed Services Vocational Aptitude Battery
is a weaker measure of ability for some groups--or if some other
factor in schoolhouse training contributes differentially to the
success of the different groups. The GAO noted that the relative
inconsistency between school grades and test scores exists and
should be addressed by both the recruiting and training
communities. The GAO further concluded that it will become
increasingly incumbent on the Services (1) to optimize selection
criteria for advanced individual technical training for women and
minority groups, (2) to provide compensatory training where
needed, and (3) to assure that no extraneous factors within the
training environment interfere with the full achievement
potential. (pp. 3-10 to 3-13/GAO Draft Report)
DOD Response: Partially concur. With respect to GAO findings
describing     trends       in the Armed Forces Qualification               Test scores
and the Electronics            Composite and examining          differences      for
groups (i.e.,       gender and race/ethnicity),              the statements        about
training    performance          differences      appear reasonable.         The analyses
of the relationships             of scores from the Armed Services              Vocational
Aptitude     Battery      (Armed Forces Qualification             Test, Electronics
Composite,     and Factor Score) and school grades are flawed and,
consequently,        interpretation          of the results     of those analyses         is
doubtful.      Because the same analytic               procedures      were used for all
Services     and similar         conclusions      drawn, the following        comments
pertain    to Findings          D, E, F, and G alike.
Problems with the analyses arise from the following sources:
           -   pooling students from several courses, when the grades
               for different courses generally are not comparable;
           -   correction for restriction of range on the Factor
               Score, which resulted in correlation coefficients that
               are not plausible;
           -   lack of regression analyses; and
           -   small sample sizes for females.
In each Service, students for several courses were pooled to
increase sample size and the course grades for the various
courses within each Service were assumed to be on the same score
scale, or to have the same meaning. In fact, course grades are
not normally interpretable from course-to-course, because of
between-course        differences        in scales and the level of competency
inferred      by a particular         score.      There is no way to evaluate
whether a score of, say, 90 in one course means the same as a
score of 90 in another course.                   (For the Army, three courses were
combined,      four courses for the Navy, and four for the Air Force.)
Thus, the mean grades reported for courses in each Service are
somewhat arbitrary           numbers and their         relationship    to scores from
the Armed Services           Vocational      Aptitude     Battery   is tenuous.     Note
that for large samples, such as white males, the differences                          in
the score scales tend to average out, and the correlation                         coeffi-
cients     are reasonably        interpretable.          For small samples, however,
the different       scales for course grades are likely                to distort     the
correlation       coefficients        and means.
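The pooling problem described above can be illustrated with a small
simulation. This is only a sketch under stated assumptions: the two
courses, their grading scales, and the function names are
hypothetical and are not taken from the GAO or DOD analyses.
Standardizing within each course is shown as one common way to put
grades on a comparable footing; the response itself does not
prescribe it.

```python
import numpy as np

def pooled_correlation(aptitude_by_course, grades_by_course):
    """Correlation after simply pooling raw scores from all courses."""
    x = np.concatenate(aptitude_by_course)
    y = np.concatenate(grades_by_course)
    return float(np.corrcoef(x, y)[0, 1])

def standardized_correlation(aptitude_by_course, grades_by_course):
    """Correlation after z-scoring aptitude and grades within each course."""
    def zscore(values):
        values = np.asarray(values, dtype=float)
        return (values - values.mean()) / values.std()
    x = np.concatenate([zscore(a) for a in aptitude_by_course])
    y = np.concatenate([zscore(g) for g in grades_by_course])
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical data: two courses with the same aptitude-grade
# relationship, but grading scales centred 20 points apart.
rng = np.random.default_rng(0)
n = 2000
aptitude = [rng.normal(50, 10, n), rng.normal(50, 10, n)]
grades = [
    70 + 0.5 * (aptitude[0] - 50) + rng.normal(0, 5, n),
    90 + 0.5 * (aptitude[1] - 50) + rng.normal(0, 5, n),
]

pooled = pooled_correlation(aptitude, grades)
standardized = standardized_correlation(aptitude, grades)
print(f"pooled raw grades:          r = {pooled:.2f}")
print(f"standardized within course: r = {standardized:.2f}")
```

In this constructed case the within-course relationship is the same
in both courses, yet the pooled coefficient comes out clearly lower,
because the difference in grading scales adds variance that aptitude
cannot explain.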
The correlation coefficients for the Factor Scores are
suspiciously high, especially after correction for restriction of
range. The Factor Scores are based on the first principal
component of the Armed Services Vocational Aptitude Battery and
the weights tend to be uniform (from .10 to .14). The Factor Score
is the sum of the 10 subtest standard scores and the correlation
coefficient could be computed using the correlation of sums. An
important point is that the weights are not regression weights
computed to maximize the correlation between the aptitude test
scores and course grades; instead, the correlation coefficient
for the Factor Score is, in effect, the average for the 10
subtests.
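The "correlation of sums" mentioned above can be written in terms of
the subtest correlations alone. The sketch below assumes a
unit-weighted sum of standardized subtests; the function name and the
numbers in the example are illustrative and do not come from the
report.

```python
import numpy as np

def composite_validity(r_xy, r_xx):
    """Correlation between a unit-weighted sum of standardized subtests
    and a criterion such as course grade.

    r_xy: length-k vector of subtest-criterion correlations.
    r_xx: k-by-k subtest intercorrelation matrix (ones on the diagonal).
    """
    r_xy = np.asarray(r_xy, dtype=float)
    r_xx = np.asarray(r_xx, dtype=float)
    # Covariance of the sum with the criterion is the sum of the r_xy;
    # variance of the sum is the sum of all entries of r_xx.
    return float(r_xy.sum() / np.sqrt(r_xx.sum()))

# Hypothetical example: 10 subtests, each correlating .40 with grades
# and .50 with one another.
k = 10
r_xy = np.full(k, 0.40)
r_xx = np.full((k, k), 0.50)
np.fill_diagonal(r_xx, 1.0)
print(round(composite_validity(r_xy, r_xx), 3))
```

Because no weights are fitted to the criterion, the result depends
only on the average subtest validity and the average subtest
intercorrelation.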
In previous studies, the four subtests in the Electronics
Composite (Math Knowledge, Arithmetic Reasoning, General Science,
and Electronics Information) repeatedly tend to have the highest
correlation with course grades in these kinds of courses. As a
rule, therefore, the correlation with course grades should be
higher for the Electronics Composite than for the Factor Score.
Deviations from this expectation may be attributed to artifacts,
such as restriction of range.
The GAO report     recognizes      that correlation      coefficients         in sam-
ples cannot be compared directly           because of range restriction.
Adjustments    are made to compensate for differences                  in restriction
of range.     The adjusted     values for the Armed Forces Qualification
Test and Electronics       Composite are plausible           in that they are
consistent    with other analyses;       the adjusted        values for the
Factor Score, however, are unduly high and they lack plausibil-
ity.    The procedure     used to correct       for restriction          of range
should be based on the multivariate             model, which involves            complex
formulae    and computing     routines.      The simpler       univariate       model
may have been used, which could distort             the adjusted          values for
the Factor Score.
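For reference, the simpler univariate correction can be sketched as
follows. This is the standard textbook formula for direct selection
on a single predictor (often called Thorndike's Case 2); whether it
is the formula the GAO actually applied is not stated in the report,
and the simulated data and names below are hypothetical.

```python
import numpy as np

def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Univariate correction of a correlation for direct range restriction.

    r_restricted:    correlation observed in the selected sample.
    sd_unrestricted: predictor standard deviation in the applicant population.
    sd_restricted:   predictor standard deviation in the selected sample.
    """
    u = sd_unrestricted / sd_restricted
    return r_restricted * u / np.sqrt(1.0 + r_restricted**2 * (u**2 - 1.0))

# Hypothetical check: simulate a population correlation of .60, keep
# only the upper half on the predictor, then correct the attenuated value.
rng = np.random.default_rng(1)
true_r = 0.6
x = rng.normal(size=200_000)
y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=x.size)
selected = x > 0.0

r_selected = np.corrcoef(x[selected], y[selected])[0, 1]
r_corrected = correct_for_range_restriction(
    r_selected, x.std(), x[selected].std()
)
print(f"selected sample: r = {r_selected:.2f}")
print(f"corrected:       r = {r_corrected:.2f}")
```

The formula recovers the population value here because selection in
the simulation really was on this one variable. When selection
operates on several correlated variables at once, as with a composite
built from many subtests, the univariate formula no longer matches
the selection process, which is the concern raised above.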




Comparisons       are made by gender and minority                  status    based on mean
scores and correlation           coefficients.             Conclusions      about the
appropriateness         of the Armed Services              Vocational     Aptitude    Battery
for females and racial/ethnic               minorities         are then based on these
comparisons.         Such comparisons         are a good place to start,              but
analyses      of gender and race differences                 should include        a compari-
son of the respective           regression       lines      (slopes and intercepts),
errors      of estimate,      and cutoff      scores.        Analyses     of differences
in mean performance           on predictors,         final     school grades,       and
differences       in validity      coefficients         are not, by themselves,
sufficient.         With the more thorough             regression       analysis,    meaning-
ful conclusions         can be made about the appropriateness                     of aptitude
tests     for female and racial/ethnic               minorities       compared to white
males.
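A minimal sketch of the kind of comparison described above follows.
It fits a separate regression line for each group and reports slope,
intercept, and standard error of estimate, then compares predicted
grades at a cutoff score. The groups, scores, and cutoff are
hypothetical, and the helper names are illustrative only.

```python
import numpy as np

def regression_summary(aptitude, grade):
    """Simple regression of grade on aptitude for one group."""
    x = np.asarray(aptitude, dtype=float)
    y = np.asarray(grade, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    # Standard error of estimate: residual SD with n - 2 degrees of freedom.
    see = np.sqrt(np.sum(residuals**2) / (x.size - 2))
    r = np.corrcoef(x, y)[0, 1]
    return {"slope": float(slope), "intercept": float(intercept),
            "see": float(see), "r": float(r)}

def predicted_gap_at(score, summary_a, summary_b):
    """Predicted grade for group B minus group A at a given aptitude score."""
    pred_a = summary_a["intercept"] + summary_a["slope"] * score
    pred_b = summary_b["intercept"] + summary_b["slope"] * score
    return pred_b - pred_a

# Hypothetical groups: same slope and same validity coefficient, but
# group B earns grades 5 points higher at every aptitude level.
rng = np.random.default_rng(2)
n = 5000
x_a = rng.normal(50, 10, n)
x_b = rng.normal(50, 10, n)
y_a = 60 + 0.5 * x_a + rng.normal(0, 5, n)
y_b = 65 + 0.5 * x_b + rng.normal(0, 5, n)

group_a = regression_summary(x_a, y_a)
group_b = regression_summary(x_b, y_b)
gap_at_cutoff = predicted_gap_at(50.0, group_a, group_b)
print(group_a)
print(group_b)
print(f"predicted gap at cutoff 50: {gap_at_cutoff:.1f}")
```

In this constructed case the two validity coefficients are nearly
identical, yet a single common regression line would underpredict
group B. That difference is visible only in the intercepts, which is
why comparing correlation coefficients alone is not sufficient.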
Even if the DOD were to fully concur with the statistical
analyses performed, interpretation of the results for females
would remain problematic because of the small sample sizes. The
numbers of females with course grades in the samples are 18 for
the Army, 71 for the Navy, and 98 for the Air Force. With such
sample sizes, differences in scales for course grades may be
exacerbated; correction for range restriction could lead to
illogical correlation coefficients; and regression equations with
up to 10 predictor variables would result in unduly high
correlation. Issues of generalizing to other samples and of making
policy decisions about selecting females and assigning them to
technical specialties should always be considered extremely
carefully and be based on thorough analysis. Replication of
results is the sine qua non of analysis and an adequate sample
size is a good foundation for replication. The conclusion "that
the Services should consider developing a more general ASVAB (sic)
derivative such as our Factor Score to assign women and minorities
to technical training" (p. 5-2 and 3) is reasonable, and could be
pursued by the military manpower research community. The report
provides a stimulus to continue efforts to improve the
effectiveness of selecting and classifying recruits, especially
for minorities.
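The point about 10 predictors and a very small sample can be made
concrete. The sketch below uses the sample size of 18 and the 10
predictors mentioned above, but everything else is an assumption for
illustration: the predictors are pure noise, and the usual adjusted
R-squared is shown as one conventional shrinkage estimate, not as a
method used in the report.

```python
import numpy as np

def expected_null_r_squared(n, p):
    """Expected sample R-squared when p predictors are unrelated to the
    criterion (ordinary least squares with an intercept, n cases)."""
    return p / (n - 1)

def adjusted_r_squared(r_squared, n, p):
    """Conventional adjusted R-squared for n cases and p predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)

def sample_r_squared(X, y):
    """R-squared from an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Hypothetical check: 18 cases, 10 predictors, all unrelated to the criterion.
rng = np.random.default_rng(0)
n, p = 18, 10
null_r2 = float(np.mean([
    sample_r_squared(rng.normal(size=(n, p)), rng.normal(size=n))
    for _ in range(2000)
]))
print(f"mean R-squared from noise:      {null_r2:.2f}")
print(f"expected value p / (n - 1):     {expected_null_r_squared(n, p):.2f}")
print(f"adjusted R-squared at that R2:  "
      f"{adjusted_r_squared(expected_null_r_squared(n, p), n, p):.2f}")
```

With 18 cases and 10 predictors, chance alone is expected to produce
an R-squared of 10/17, so a high multiple correlation in such a
sample says little about the validity of the predictors.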
FINDING H: Field Measures of Training Effectiveness--Army. The
GAO reported that, although it was aware of numerous post-training
evaluation activities performed by the individual services, only
the Army could provide individual performance measures. The GAO
reported that, by Army regulation, a soldier's occupational
specialty performance is tested within 6 months of completion of
training and every year thereafter under the Skill Qualification
Test program. The GAO found the following regarding the Skill
Qualification Test scores:
           -   the best predictors of Skill Test scores are final
               schoolhouse grades;
           -   the Armed Forces Qualification Test and Electronics
               scores were also significantly related to the Skill
               Test scores for whites and males, but factor scores
               consistently outpredicted the composites;
           -   for females and non-white soldiers, the Armed Services
               Vocational Aptitude Battery scores were not positively
               related to future performance, as measured by Skill
               Qualification Test scores; and
           -   the grades scored by females at the schoolhouse were
               inversely correlated with the Skill Qualification Test
               scores.
The GAO concluded that the traditional Armed Services Vocational
Aptitude Battery scores may not be the best predictor of
performance for the non-traditional soldier--that is, the female
or minority soldier. The GAO observed that better predictors of
success for these groups should be found. (pp. 4-1 to 4-5/GAO
Draft Report)
DOD Response: Partially concur. The GAO appears to have
incorrectly assumed that Skill Qualification Tests have a common
metric across different specialties, skill levels, and years.
Due to the requirement to develop new tests each year, individual
tests are fielded with a minimum of pretesting. As a result,
means and standard deviations across specialties, and even across
years within the same specialty and skill level, may vary greatly.
For example, in the five specialties studied by the GAO, the
means on the individual skill level 1 tests during 1985-1989
ranged from 74.5 to 88.4, while standard deviations ranged from
3.5 to 14.7.
During the years 1985-1989, more than 3800 different tests were
administered in more than 200 specialties annually across skill
levels 1 to 4. The Army Research Institute is currently analyzing
these data (more than 1 million scores) and intends to report
Armed Services Vocational Aptitude Battery validities by both
race and gender as well as for sample size whenever sample size
is adequate for such analyses. Noting the GAO concern relating
to low validity for blacks and females in its study, the Army
has computed validities for these groups for the 1988 Skill
Qualification Tests. For 71 skill level 1 samples comprised of
at least 50 females, the median corrected validity is .58; for
samples of 50 or more blacks the median validity is .47; the
median validity for 205 total samples is .57. While the Army
understands the GAO focused only on highly technical specialties,
total accessions in the five GAO-selected specialties numbered
only 310 compared to more than 120,000 for all specialties during
1988.

It is suspected           that the finding           is affected     by the small samples
of females and minorities                in the GAO analyses.             The finding     that
Armed Services         Vocational        Aptitude      Battery    scores were not posi-
tively    related       to Skill      Qualification        Test scores for females and
non-white      soldiers        is contrary        to the body of research          evidence
for predicting          training      grades in the schoolhouse.              The consis-
tent finding        in all Services            is that aptitude        scores are about
equally     valid     for females,         racial/ethnic       minorities,      and white
males, although           there may be some over or underprediction                    for
females and minorities.                Research results          also show that aptitude
tests predict         supervisors'         ratings      of job performance        for blacks
about as well as for whites.                     The results     presented    by the GAO
should be evaluated              in larger       samples.
The same problems noted earlier  with analysis   of schoolhouse
training  grades apply to this analysis of Skill   Qualification
Test scores:
           -   pooling of specialties--Skill Qualification Test
               scores are not on a common metric across specialties,
               and the same numerical value in different tests does
               not, as a rule, mean the same level of competence;
           -   the correction for restriction of range on the Factor
               Score leads to distortion in the results;
           -   a regression analysis is appropriate and was not
               performed; and
           -   the sample size of females (18 or 21) is inadequate to
               draw meaningful conclusions.
Research in progress pertaining to enlistment test development,
including computerized tests, will examine implications for
gender and minority subgroups.
FINDING I: Field Measures of Training Effectiveness--Navy. The
GAO reported that it considered two possible sources of field
information routinely collected by the Navy as measures of the
effectiveness of the training courses--(1) Level II surveys and
(2) Advancement in Rating Examinations. The GAO found, however,
that the Level II surveys have been effectively abandoned by the
Navy, with none having been performed since at least 1986. The
GAO concurred with the judgement of the test developers and
administrators that, because the test is not standardized and is
not administered        to all graduates,         the Advancement in Rating
Examination    is     "not a good source         of training  evaluation feed-
back."
The GAO reported that, in 1986, the Chief of Naval Operations
requested that the Naval Training Systems Center determine the
current status of Navy training evaluation and provide
recommendations. The GAO further reported that, while numerous
non-formal or non-centralized activities were identified, the
Naval Training Systems Center found that:
           -   the quality of current Navy schoolhouse training
               could not be readily ascertained for the vast
               majority of the courses being offered;
           -   there is a lack of technical evaluation/assessment
               skills; and
           -   current evaluation activities are fractionated, not
               comprehensive, and operating in an environment of
               obsolete instructions and unclear objectives.
The GAO reported that the Navy made a number of recommendations
to upgrade and take a systematic approach to training evaluation.
According to the GAO, the Navy has assigned a three-person team
to review the proposals and recommend an integrated training
appraisal program. The GAO concluded that, while the Navy should
be commended for its willingness to acknowledge past evaluation
deficiencies, it seriously questioned whether this response is
appropriate to the severity and extensiveness of the problems
that the Naval Training Systems Center has documented. (pp. 4-5
to 4-8/GAO Draft Report)
DOD Response: Partially concur. Level II surveys were discontinued
by the Navy because they were paper-intensive and placed
an undue burden on the fleet.                  Moreover,     only limited      methods of
evaluating      the effectiveness           of schoolhouse       training     were in
effect     at the time the Navy requested                the Naval Training         Systems
Center to determine             the status     of evaluation       procedures     and make
appropriate       recommendations.            Since that time, however,           the Navy
has successfully         employed several          means of collecting         feedback on
training     effectiveness.            In addition     to the steps being taken by
the Navy to enhance training                evaluation      methods as reported         by
the GAO, several         other programs are underway.                  These include     the
 (1) Navy Training          Appraisal      Program,     (2) Navy Training       Require-
ments Review,         (3) Fleet Training          Appraisal     Program, and (4)
Maintenance       Training        Improvement Program.          These are discussed         in
more detail       in the following          paragraphs.




A Navy training       appraisal     program was implemented           in March 1989.
The process provides         the Chief of Naval        Operations       with an
assessment of the adequacy of Navy training                 to support       warfight-
ing capabilities       in each of the Navy's primary            mission      areas and
focuses attention        on specific     areas where training           may be defi-
cient.    The training       appraisal    program allows       scarce training
assessment resources         to be brought      to bear upon those training
programs that fleet        feedback reveals         are most in need of atten-
tion.    The Navy training        appraisal     process has thus far examined
acoustic    operator,     damage control/firefighting,            electronic       war-
fare operator/maintainer,           and "over-the-horizon"          targeting      sys-
tems training.
There is also an ongoing Navy Training            Requirements      Review, which
provides   direct     feedback between warfare      sponsors,     Systems Com-
mands, the fleet,        and the Naval Education     and Training       Command on
a scheduled    basis.      That program requires     fleet    experts    to talk
directly   to school personnel       and provides    valuable     information    on
training   effectiveness.
Additional      training  effectiveness       feedback systems in place
include     the Fleet Training      Appraisal     Program and the Maintenance
Training      Improvement Program which provide          fleet  performance
data.      The Training   Performance      Evaluation    Board Training     Evalua-
tion and Assessment Division           was staffed     in February    of 1990 and
has as part of its charter          the study of training       feedback
systems.
FINDING J: Field Measures of Training Effectiveness--Air Force.
The GAO reported that it considered sources of individual level
data for field performance of Air Force personnel equivalent to
those it used for the Navy, but concluded that neither the
promotion examinations nor the supervisory surveys were
appropriate. The GAO further concluded no individual data exist
that would allow an analysis equivalent to those performed by the
Army with the Skill Qualification Test data.
The GAO reported that other Air Force training assessment
procedures exist, including Training Quality Reports, Utilization
and Training Workshops, and Occupational Survey Reports. According
to the GAO, the Training Quality Reports are part of a reactive
evaluation process, while the other activities are more concerned
with front-end analysis. (pp. 4-8 to 4-10/GAO Draft Report)
DOD Response: Partially concur. The Air Force is aware of the
potential shortcomings of promotion examinations and supervisory
surveys for evaluating training effectiveness, and is currently
developing career field training management guidelines to track
and enhance the training from enlistment throughout an
individual's career. Emphasis will be placed on criterion-referenced
objectives    rather   than the present        code Levels for performance
standards.     These changes will        have a major impact on the present
promotion    system.     To expedite     feedback from supervisors        concern-
ing any problems with recent graduates,             a new policy     was recently
established    by the Air Training         Command to provide     telephonic
communication     on a 24-hour basis between the training             center
providing   the training     and the supervisor       of the graduate.        The
system allows more effective          and timely    communication     between the
supervisor    and the training       provider.
The Air Force does not have Skill                   Qualification       Tests for perfor-
mance and does not plan to have them in the near future.                               Many of
the tasks performed            in the field        are very complex.          Testing,
recording,        and documenting        individual       performance      for statistics
is very time consuming,              requires      additional      manpower, and is
cost-prohibitive.            Further,      many of the new Air Force systems are
single     channel systems,          which cannot be used for extensive                 train-
ing or evaluating          trainees.        All these factors          combine to make
the use of hands-on Skill               Qualification         Tests an inappropriate
solution      to the problem of training               effectiveness       evaluations.
The GAO finding that Occupational Survey Reports are concerned with
front-end analysis is true, but information about what first-termers
are doing on-the-job provides a good basis for what should be trained
and what is expected in the initial skills courses. As written in the
report, the paragraph gives a very limited view of what Occupational
Survey Reports provide the training community and their potential for
training assessment.

FINDING K: Alternative Data Sources: The Job Performance Measurement
Project. The GAO reported a key impediment to establishing a field
evaluation component of training assessment is the expense of
developing, testing, and administering measures that validly and
reliably measure actual performance. The GAO noted that, beginning in
the early eighties, a major effort, entitled "The Joint-Service Job
Performance Measurement Project," designed to address the measurement
issues, has been underway under the direction of the Office of
Accession Policy located in the Office of the Assistant Secretary of
Defense (Force Management and Personnel). The GAO reported that this
project was initiated after the Armed Services Vocational Aptitude
Battery unintentionally allowed some 300,000 less qualified recruits
into the Military Services and resulted in field commanders'
complaints of quality degradation among their personnel.

The GAO found that the Joint Performance Measurement project:

     -  did not set out to establish a link between schoolhouse
        performance and field performance;
     -  concluded suitable measures of field performance did not
        exist and undertook to develop them;
     -  has not reported any analyses of sex- and race-related
        differences, and has not addressed the schoolhouse/field
        connection; and
     -  concluded performance measures were expensive to develop and
        frequently costly to administer and, therefore, may not be
        suited to more routine use as measures of training
        effectiveness.

The GAO concluded that the investment made to develop the performance
measures and their surrogates could prove to be more profitable if
some of the measures developed and the lessons learned were more
widely applied to the development of realistic assessment procedures
for training. The GAO further concluded that the lack of other
objective, systematically collected field evaluation data renders
meaningful evaluation of training effectiveness impossible. The GAO
observed that decision makers in the Congress, the DOD, or the
Services can only react to problems in the field after they have
become apparent and have been identified as training-related. The GAO
concluded that, given the cost and complexity of today's military
equipment, it is difficult to understand the lack of evaluative data
to monitor how well Service personnel are being prepared to use and
maintain those weapons. Overall, the GAO concluded that, among the
most serious deficiencies it identified, was the inability of the Air
Force and the Navy to found their evaluation of their selection
procedures and schoolhouse training in systematically collected,
objective field performance data. The GAO further concluded that,
without good performance measurement data, the Services are not able
to maximize training effectiveness, or even estimate realistically
the success of their training investment in producing skilled
operators and maintainers of today's and tomorrow's sophisticated
weaponry. (pp. 4-10 to 5-4/GAO Draft Report)

DOD Response: Partially concur. The GAO analysis of the background,
purposes, and findings thus far from the Joint-Service Job
Performance Measurement Program are generally accurate. The GAO has
also correctly identified that hands-on performance measures are
resource-intensive in terms of labor, cost, time, and equipment,
which limits their value for routine use as field measures of
training effectiveness. The issue of applying job performance
measurement technology to training was investigated in May 1985, when
the Assistant Secretary of Defense (Manpower, Installations, &
Logistics) solicited Service responses to an inquiry from Congressman
Les Aspin, Chairman of the House Committee on Armed Services. One of
the Chairman's questions specifically asked about Service plans for
applying job performance data to training course design and
evaluation. The Service responses suggested how they anticipated
potential applications of job performance measurement data. Each of
the Services offered a plan for institutionalization of job
performance measures and they identified training evaluation as a
likely additional application of Job Performance Measurement
technology, to include introducing performance measurement into the
training feedback system. The resource factors identified by the GAO,
coupled with the need to wait until completion of the enlistment
standards setting portion of the Job Performance Measurement
research, resulted in the decision to defer full-scale implementation
of routine job performance data collection for all occupations.

It should be noted there is Service work ongoing that examines the
link between schoolhouse performance and field performance. For
example, the Army's Selection and Classification research program
(which incorporates the Army's contribution to the Joint-Service Job
Performance Measurement Project) is examining the link between
schoolhouse performance and job performance. Schoolhouse
(end-of-training) and job performance measures have been developed
and administered to a longitudinal sample in several military
occupational specialties. In addition, school grades and Skill
Qualification Test scores have been obtained for the sample and
analyses are underway. The Air Force, Navy, and Marine Corps have
been performing similar analyses and the results will be applicable
to understanding the link between schoolhouse performance and
on-the-job performance.

Work is also underway in all of the Services to determine the
efficacy of performance surrogates for specific purposes. There are
technical and policy differences related to measuring job performance
for validating a test and measuring job performance for evaluating a
training system. Nevertheless, if research efforts are successful, it
may be possible to use surrogates to develop cost-effective field
performance feedback procedures that could help guide curriculum
development.

                                          RECOMMENDATIONS

RECOMMENDATION 1: The GAO recommended that the Assistant Secretary of
Defense (Force Management and Personnel) direct the personnel
research it coordinates among the individual Services to investigate
more sensitive predictors of schoolhouse performance for women and
minority students from the Armed Services Vocational Aptitude Battery
data it already possesses. (p. 5-4/GAO Draft Report)




DOD Response: Concur. The Office of the Assistant Secretary of
Defense (Force Management and Personnel) will prepare a memorandum to
the Defense Manpower Data Center and the Services requesting that the
recommended analyses be performed. We will also ensure that research
in progress pertaining to computerized enlistment test development
will include analyses to determine the sensitivity of the tests as
predictors of schoolhouse performance for gender and minority
subgroups.

RECOMMENDATION 2: The GAO recommended that the Secretary of the Army
direct the Training and Doctrine Command to review the schoolhouse
grading procedures identified within the report as deficient for
their accuracy, appropriateness, and reliability. (p. 5-4/GAO Draft
Report)

DOD Response: Concur. The Secretary of the Army will direct the
Training and Doctrine Command to review the appropriateness of Fort
Gordon's testing procedures and their compliance with Army policy. A
plan of action to remedy any existing deficiencies will be prepared
by August 1990.

RECOMMENDATION 3: The GAO recommended that the Secretary of the Navy
establish a firm deadline for developing a training evaluation
program and that he direct that the adequacy of current resources
allocated to this effort be reexamined. (p. 5-4/GAO Draft Report)

DOD Response: Concur. The Navy has several training evaluation
programs already in place. As mentioned previously, these include the
Navy Training Appraisal, the Navy Training Requirements Review, the
Fleet Training Appraisal Program, the Maintenance Training
Improvement Program and the Training Performance Evaluation Board.
Additionally, the Chief of Naval Education and Training plans to
brief, by July 1990, an enhanced integrated training feedback system
to the Chief of Naval Personnel. A Plan of Action and Milestones will
be prepared by August of 1990 to implement that system.

RECOMMENDATION 4: The GAO recommended that the Assistant Secretary of
Defense (Force Management and Personnel) review alternative measures
of field performance already developed by the Services under the Job
Performance Measurement project for potential applicability to
training and on-the-job performance evaluation. (pp. 5-4 and 5-5/GAO
Draft Report)

DOD Response: Concur. During the mid-1980s, the DOD explored
applications of the measures developed in the Joint-Service Job
Performance Measurement Program to training. While the decision made
following that review was to defer full-scale implementation because
of cost factors and the fact that techniques for developing the
performance measures were still being refined, the Department will
again explore the feasibility of expanding their use through the
auspices of the Joint-Service Job Performance Measurement Working
Group. The review is expected to be completed following final
performance measurement development during Fiscal Year 1991.




Appendix VI

Major Contributors to This Report


Program Evaluation and Methodology Division

Michael J. Wargo, Issue Area Director
Richard T. Barnes, Assistant Director
Robert E. White, Project Manager
Kurt R. Kroemer, Project Staff




(973276)
Ordering Information

The first five copies of each GAO report are free. Additional copies
are $2 each. Orders should be sent to the following address,
accompanied by a check or money order made out to the Superintendent
of Documents, when necessary. Orders for 100 or more copies to be
mailed to a single address are discounted 25 percent.

U.S. General Accounting Office
P.O. Box 6015
Gaithersburg, MD 20877

Orders may also be placed by calling (202) 275-6241.
United States
General Accounting Office
Washington, D.C. 20548

Official Business
Penalty for Private Use $300