United States General Accounting Office
GAO
March 1989

Content Analysis: A Methodology for Structuring and Analyzing Written Material
Transfer Paper 10.1.3

Preface

In this paper, we define and describe the evaluation method called “content analysis.” It is a set of procedures for transforming nonstructured information into a format that allows analysis. From reading this paper, GAO analysts should gain an understanding of the basic concepts and procedures used in content analysis and also an ability to recognize the appropriate circumstances for using this evaluation method in their jobs.

Although we have focused on techniques that make quantitative analysis possible, this is not necessarily the objective of all content analyses. We have presented the techniques that are the most applicable to GAO's work. In chapter 1, we define content analysis and compare it to similar procedures already used in GAO. In chapter 2, we discuss the procedures for using content analysis. In chapter 3, we explain the advantages and disadvantages of content analysis and describe some of its potential applications in program evaluation.

The paper is designed to be self-instructional. References are provided throughout the text for readers who want more information on specific topics, and these references are keyed to the bibliography.

Research for this document began with a survey of the numerous books and articles on content analysis and its past applications. We also interviewed users of content analysis to gain information about its advantages and disadvantages, and we interviewed selected GAO staff who have participated in evaluations in which content analysis might have been appropriate. The foundation for this document is a paper written by William Carter while a student intern with GAO. The document was prepared by Teresa Spisak, formerly of the Institute for Program Evaluation (now PEMD), and was originally published in 1982 as Transfer Paper 3.
It is being reissued now with only minor changes, including some updating of bibliographic materials.

Content Analysis is one of a series of papers issued by PEMD. The purpose of the series is to provide GAO evaluators with a clear and comprehensive background of the basic concepts of audit and evaluation methodology. Additionally, transfer papers explain both general and specific applications and procedures for using the evaluation methodology. Other papers in this series include Causal Analysis, Designing Evaluations, Developing and Using Questionnaires, Using Statistical Sampling, and Case Study Evaluations.

Eleanor Chelimsky
Assistant Comptroller General for Program Evaluation and Methodology

Contents

Preface
Chapter 1  What Is Content Analysis?
Chapter 2  What Are the Procedures in Content Analysis?
    Deciding to Use Content Analysis
    Determining What Material Should Be Included
    Selecting Units of Analysis
    Developing Coding Categories
    Coding the Material
    Analyzing and Interpreting the Results
    Writing the Report
    Summary
Chapter 3  Why Should GAO Analysts Use Content Analysis?
    What Content Analysis Can Do
    Pitfalls in Using Content Analysis
    Potential Applications in Program Evaluation
    Conclusion
Bibliography

Figures
Figure 2.1: Steps in Content Analysis
Figure 2.2: Requirements for Content Categories
Figure 2.3: Matrix Category Format
Figure 2.4: Category Format Measuring Space
Figure 2.5: Two Category Formats Measuring Frequency of Statements
Figure 2.6: Measuring Frequency of and Position Taken on Specific Proposals
Figure 2.7: Category Format Measuring Attitude Intensity
Figure 2.8: Guidelines for Contents of Coding Instructions for Trained Coders
Figure 2.9: Issues Addressed by HUD's Evaluation Units
Figure 2.10: Minimum Documentation for a Content Analysis Study

Abbreviations
GAO    U.S. General Accounting Office
HUD    U.S. Department of Housing and Urban Development
PEMD   Program Evaluation and Methodology Division

Chapter 1
What Is Content Analysis?

GAO staff often collect large quantities of written material during their jobs. Workpapers, agency documents, transcripts of meetings, previous evaluations, and the like all contain useful information that is difficult to combine and analyze because it is diverse and unstructured. Content analysis is a set of procedures for collecting and organizing this information.

One way to begin structuring written material so that it can be analyzed is to summarize and list the major issues it contains. Then the frequency with which these issues occur can be counted. Both activities are usually performed at some point in GAO jobs, and both are part of content analysis.

For example, in assessing HUD's evaluation system to determine whether program offices were duplicating efforts, GAO analysts collected budget information, interviews, and evaluation reports. (GAO, 1978)¹ They began analyzing the information by identifying 31 major issues for housing and urban development.
Then they reviewed 38 HUD evaluation reports from two offices, categorizing the issues addressed in each report and looking for overlaps between the offices. Simplifying and categorizing written information are part of content analysis.

In addition to requiring summaries of written material and counts of how often statements or issues occur, GAO projects often require more complex analyses. Sometimes trends have to be examined over time, across different situations, or among different groups. The information needed for these types of analyses may not exist in computer files. With content analysis, information from written material can be structured so that these analyses can be made even without computer files.

Content analysis is a set of procedures for collecting and organizing information in a standardized format that allows analysts to make inferences about the characteristics and meaning of written and other recorded material. Simple formats can be developed for summarizing information or counting the frequency of statements. More complex formats can be created for analyzing trends or detecting subtle differences in the intensity of statements.

Among the procedures of content analysis that we discuss in the next chapter are defining and sampling the written or recorded material to be analyzed, developing standardized categories, coding the material with rigorous reliability checks, analyzing and interpreting the information, and validating and reporting the results. Although in this paper we have focused on procedures that make quantitative analysis possible, this is not necessarily the objective of all forms of content analysis.

¹Interlinear bibliographic references are cited in full in the bibliography.

Chapter 2
What Are the Procedures in Content Analysis?
The steps to be followed in content analysis are summarized in figure 2.1. Steps 1, 2, and 6 (deciding whether the methodology is appropriate, determining what material should be analyzed, and analyzing and interpreting the results) are integral aspects of all projects. However, steps 3, 4, and 5 (choosing the units of analysis, developing coding categories, and coding the material) are unique to content analysis, and therefore we explain these in greater detail.

Figure 2.1: Steps in Content Analysis
1. Decide to use content analysis.
2. Determine what material should be included in content analysis.
3. Select units of analysis.
4. Develop coding categories.
5. Code the material.
6. Analyze and interpret the results.

Deciding to Use Content Analysis

At step 1, analysts should consider a number of factors in deciding whether to use content analysis. These include a project's objectives, data availability, and the kinds of analyses required.

Objectives

Objectives are precisely worded questions that the project staff are trying to answer. (GAO, December 1988, p. 10-4) The questions should be based on a clear understanding of project needs and the available data. Precisely worded questions provide the focus for data collection, analysis, and reporting. In general, content analysis can be used to answer “What?” but not “Why?” That is, it helps analysts describe or summarize the content of written material, the attitudes or perceptions of its writer, or its effects on its audience.

The content of material can be summarized by listing or by counting the issues or statements within it, as we indicated in chapter 1. The author's attitudes and perceptions can also be described.
For example, if analysts wanted to assess the effects of various programs on the lives of older people, content analysis of open-ended interview responses could be used to identify the respondents' outlook on life and their attitudes about loneliness or security.

Content analysis can also be useful in describing the effects of messages on their recipients. For example, the effect of Voice of America broadcasts has been assessed by analyzing Soviet newspapers and transcripts of radio broadcasts. (Inkeles, 1952)

The Kinds of Material Available

Content analysis can be used to study any recorded material as long as the information is available to be reanalyzed for reliability checks. Although it is used most frequently to analyze written material, content analysis can be used to study any recorded communication, including television programs, movies, and photographs. It can be used to analyze congressional testimony, legislation, regulations, other public documents, workpapers, case studies, reports, answers to survey questions, news releases, newspapers, books, journal articles, and letters. A speech or a discussion, however, cannot be analyzed unless it has been transcribed or taped.

Before using content analysis, project staff should assess the quality of the written material. Does the available material accurately represent what was written or said? A garbled tape recording or written material with sections missing is not a sound basis for content analysis. Findings and conclusions from content analysis can never be more accurate than the material that has been analyzed.

The Kinds of Comparison Required

Content analysis can be used for making numerical comparisons among and within documents.
For example, staff who want to describe or summarize the content of written material can use content analysis to compare documents derived from a single source, such as one federal agency, by comparing issues or statements over time, in different situations, or across differing groups. The relationship of two or more statements or issues within a single document or set of documents can also be analyzed. Alternatively, statements or issues from two or more different sources can be compared.

Determining What Material Should Be Included

Sampling is necessary if the body of material, the “universe,” is too extensive to be analyzed in its entirety. Thus, at step 2, analysts who want to draw valid conclusions and generalizations about a universe should select from that universe a sample that is representative of it.¹

Selecting samples for content analysis usually involves sampling documents. For example, in a hypothetical project evaluating changes in the eligibility requirements in a food stamp program, more than 500 participants might be interviewed. By arranging the interview transcripts alphabetically and then selecting every tenth transcript for content analysis, the project staff might be able to draw a systematic sample. Other types of sampling design may also be used. (Babbie, 1973, pp. 91-102)

Selecting Units of Analysis

In content analysis, the researcher designates the units of analysis, called “recording units,” and the units of context. This is step 3. Context units set limits on the portion of written material that is to be examined for categories of words or statements. Context units can be the same as the units sampled, although they are not always the same. Since it is not always practical to use long documents as context units, chapters, sections, paragraphs, or even sentences may be better choices.
This is especially true when attempts are made to identify subtle differences in content. For example, a meeting transcript can be analyzed to determine the extent to which the meeting's participants supported or opposed various issues. In this case, the analysts would choose sentences as the context unit if entire statements were relatively long and tended, as sometimes happens, to contain conflicting information. A speaker may, for instance, oppose an issue at the beginning of a statement but shift to supporting it by the end. To identify such shifts in position, analysts need to examine a small context unit such as the sentence.

A recording unit is the specific segment of the context unit in the written material that is placed in a category. It may be a word, a group of words (such as those that identify a theme), a sentence, a paragraph, or an entire document. It can never be larger than the context unit. In the HUD study we cited earlier, analysts used the groups of words that embodied the discussion of the issues as recording units. Their context units were the evaluation studies.

¹Readers unfamiliar with basic sampling theory and methods should refer to GAO, December 1988, pp. 11-16 to 11-19 and 11-26 to 11-36.

Developing Coding Categories

Categories provide the structure for grouping recording units. Step 4, formulating categories, is the heart of content analysis. Berelson, an early user of content analysis, emphasized the importance of this step when he cautioned that

“Content analysis stands or falls by its categories. Particular studies have been productive to the extent that the categories were clearly formulated and well adapted to the problem and to the content.” (Berelson, 1962, p. 147)

Figure 2.2 lists standard requirements that categories should meet.
Adhering to these requirements helps keep an analysis systematic and objective, which produces results that are amenable to statistical calculation.

Figure 2.2: Requirements for Content Categories
1. Categories should be exhaustive, so that all relevant items in the material being studied can be placed within a category.
2. Categories should be mutually exclusive, so that no item can be coded in more than one category.
3. Categories should be independent, so that a recording unit's category assignment is not affected by the category assignment of other recording units.

Category Formats

Categories can be conceptualized in numerous ways. Some common category formats are groupings, scales, and matrices.² Structured category formats increase coding efficiency, especially when the number of categories is large.

In our HUD example, analysts chose groups of issues as categories. They grouped 31 issues into three general categories. For example, issues such as dispersion of housing, block grants, and public housing modernization were placed in the category “Housing Assistance Issues.”

Scales provide for the rank ordering of information. In the HUD example, had the analysts wanted to know the extent to which the reports they were examining supported the issues, they could have used a scale such as “supports, is uncommitted, opposes.”

Matrices are useful formats when analysts seek more information about issues than simply whether they are present or absent. The group and scale categories we discussed above could be combined into a matrix format such as that shown in figure 2.3.

²Krippendorff discusses these and more sophisticated formats such as trees, loops, chains, cubes, and partition lattices. (Krippendorff, 1980, pp. 91-98)
Figure 2.3: Matrix Category Format

                                          Degree of support for issue
Issue                                     Supports   Opposes   Uncommitted
I. Housing assistance
   A. Block grants
   B. Housing dispersion
   C. Public housing modernization

Quantification Levels

Categories can be used to measure three quantification levels: space, frequency, and intensity. To explain the differences between these quantification levels and how they relate to constructing categories, we use a hypothetical analysis of handgun control legislation for which the analyst's major sources of information are newspaper articles, public documents, and transcripts of interviews with public officials.

At the least rigorous level of quantification, the hypothetical analyst can measure the amount of space in the newspaper articles devoted to positions supporting or opposing the issue. The analyst then can use this measurement to compare the relative strength of the positions supporting and opposing handgun control. In selecting newspapers, the analyst also has to control for factors that may influence the articles' content or editorial viewpoint. The category format shown in figure 2.4 uses the newspapers' location (rural versus urban) for this purpose. For each issue of each newspaper in the sample, the analyst adds together the number of column inches from all news articles and editorials to find the total amount of space for each position. By also coding the name, location, and date of each newspaper, the analyst can examine trends across time and can compare rural and urban viewpoints.

Figure 2.4: Category Format Measuring Space

                                       Number of column inches
Newspaper    Date       Location   Supporting   Opposing   Uncommitted
“Times”      11/12/81   Urban      4            0          2
“Examiner”   11/18/81   Rural      0            5          2

Such measurement is rapid and relatively easy, but it provides only very general information.
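The column-inch tally behind figure 2.4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original method: the per-article measurements are hypothetical, invented only so that each newspaper issue sums to the totals in figure 2.4, and the name `total_space` is our own.

```python
from collections import defaultdict

# Hypothetical article-level measurements: (newspaper, date, location,
# position, column inches). Invented so that each issue sums to figure 2.4.
ARTICLES = [
    ("Times", "11/12/81", "Urban", "Supporting", 3),
    ("Times", "11/12/81", "Urban", "Supporting", 1),
    ("Times", "11/12/81", "Urban", "Uncommitted", 2),
    ("Examiner", "11/18/81", "Rural", "Opposing", 2),
    ("Examiner", "11/18/81", "Rural", "Opposing", 3),
    ("Examiner", "11/18/81", "Rural", "Uncommitted", 2),
]

POSITIONS = ("Supporting", "Opposing", "Uncommitted")


def total_space(articles):
    """Sum column inches by position for each (newspaper, date, location) key."""
    # Every new key starts with zero inches for all three positions.
    rows = defaultdict(lambda: dict.fromkeys(POSITIONS, 0))
    for paper, date, location, position, inches in articles:
        rows[(paper, date, location)][position] += inches
    return dict(rows)


for key, space in total_space(ARTICLES).items():
    print(*key, space)
```

Keeping the newspaper, date, and location in each key is what allows the later comparisons of rural and urban coverage or of trends over time.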
Analysts who use this level of quantification also have to assume that the differences they find in amounts of space are valid indicators of relative emphasis or importance.

At the next level of quantification, the analyst can code the frequency of recording units by tallying the number of times each issue or statement occurs in the text. Formats for measuring frequency can be very simple, as in figure 2.5, or more complex, as in figure 2.6, depending on the information needs of the project.

Figure 2.5: Two Category Formats Measuring Frequency of Statements

Format 1
                                       Number of articles
Newspaper    Date       Location   Supporting   Opposing   Uncommitted
“Times”      11/21/81   Urban      2            0          1
“Examiner”   11/18/81   Rural      0            4          0

Format 2
Newspaper    Date       Location   Statement attribution   Position
“Times”      11/12/81   Urban      State politician        Supports
“Times”      11/12/81   Urban      Editorial               Supports
“Times”      11/12/81   Urban      U.S. Senator            Uncommitted
“Examiner”   11/18/81   Rural      Citizens' group         Opposes
“Examiner”   11/18/81   Rural      State politician        Opposes

Figure 2.6: Measuring Frequency of and Position Taken on Specific Proposals

Category Format
Proposals for handgun control                              Supports (01)   Opposes (02)   Uncommitted/no position (03)
Banning handgun sales (01)
Banning importation of unassembled gun parts (02)
Handgun registration (03)
Stricter controls on handgun purchases (04)
Stronger penalties for using handguns to commit crimes (05)
More stringent enforcement of existing control (06)
Other (07)

Coding Format
Statement source              Date     Column   Row
Presidential advisory panel   8/6/81   01       02
Presidential advisory panel   8/6/81   01       07
Presidential advisory panel   8/6/81   01       04

Figure 2.5 presents two simple formats for measuring the number of statements supporting, opposing, and uncommitted to handgun control. Format 1 is similar to the format for measuring space but instead measures the number of articles that appear over a given period of time.
Format 2 identifies the speaker and allows the analyst to compare the positions of different individuals over time and across locations.

Figure 2.6 shows a more elaborate means of measuring frequency, with separate formats for category and for coding. This approach could be used to analyze information from all three data sources in the hypothetical example: newspapers, public documents, and interview transcripts. In the figure, the categories describe positions on specific proposals for handgun control. Each position is coded with four digits: the first two give the position taken (the column) and the last two give the proposal (the row).

To show how this works, we can examine the recommendations in the following statement from a New York Times article published on August 6, 1981, coded as shown in figure 2.6.

“The eight-member (Presidential advisory) panel . . . recommended legislation forbidding the importing of pistol parts, requiring citizens to report the theft or loss of a pistol, and establishing a waiting period before a pistol is purchased to permit the authorities to determine if the purchaser has a criminal record.”

The recommendation for legislation forbidding the importing of pistol parts is coded as column 01 (“supports”), row 02 (“banning importation of unassembled gun parts”). The second recommendation, “requiring citizens to report the theft or loss of a pistol,” is coded as “other” (0107) since it is not in the list of specific proposals. The third, establishing a waiting period before purchase, appears in the figure's coding format as column 01, row 04 (“stricter controls on handgun purchases”).

In general, analysts incorporate two assumptions in their research designs when they construct frequency measures. First, they assume that the frequency with which a statement occurs in the text is a valid indication of value or importance. Second, they assume that all content units can be given equal weight and therefore that each one can be compared directly with every other.
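The four-digit codes in figure 2.6 lend themselves to mechanical tallying. The Python sketch below decodes each code into its column (position) and row (proposal) labels and counts the resulting pairs. The label tables restate figure 2.6; the function names are our own, and the sketch assumes every code is exactly four digits.

```python
from collections import Counter

# Column (position) and row (proposal) labels from the figure 2.6 category format.
POSITIONS = {"01": "Supports", "02": "Opposes", "03": "Uncommitted/no position"}
PROPOSALS = {
    "01": "Banning handgun sales",
    "02": "Banning importation of unassembled gun parts",
    "03": "Handgun registration",
    "04": "Stricter controls on handgun purchases",
    "05": "Stronger penalties for using handguns to commit crimes",
    "06": "More stringent enforcement of existing control",
    "07": "Other",
}


def decode(code):
    """Split a four-digit code such as '0102' into (position, proposal) labels."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected four digits, got {code!r}")
    return POSITIONS[code[:2]], PROPOSALS[code[2:]]


def tally(codes):
    """Count how often each (position, proposal) pair was coded."""
    return Counter(decode(code) for code in codes)


# The three codes assigned to the presidential advisory panel's statement.
panel_codes = ["0102", "0107", "0104"]
counts = tally(panel_codes)
for (position, proposal), n in counts.items():
    print(position, "|", proposal, "|", n)
```

Appending codes from other statements, sources, and dates to the input list turns the same tally into the frequency measure described above. Each code counts once, which is exactly the equal-weight assumption just discussed.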
At the third level of quantification, analysts code for intensity. Frequencies are counted, but each coded statement or issue is also adjusted by a weight that measures relative intensity.³ This measurement level allows much more sensitive data analysis. One drawback of intensity coding, however, is that it requires coders to recognize more subtle differences in the material than they need to when coding for space or frequency. Furthermore, it is difficult to list all the criteria that coders have to consider in making their decisions. For example, coders may have to consider the relative intensity of the meaning of verbs (“disagree” versus “doubt”) or their tenses (past, present, future), of the meaning of adverbial modifiers (“often” versus “sometimes”), or of the meaning of statements that express what is probable (using “may”) versus what is imperative (using “must”).

Since it helps analysts compare subtle differences in words, this level of quantification is the most useful for analyzing direct quotations and the contents of official documents, such as public laws and regulations, in which words are understood to have been chosen carefully to convey a precise message. In the gun control example, therefore, only the interview transcripts would be analyzed at this level.

³Three methods of calculating and assigning weights are discussed in North et al., 1963, pp. M-103.

Figure 2.7 illustrates how attitude intensity can be coded. Using two hypothetical interview responses, it shows how replies can be fitted into the category form “subject, verb, common meaning term.” Each reply may contain more than one statement, or recording unit, to be coded. Values ranging from +3 to -3, depending on direction and intensity, are assigned to the verb and the common meaning term in each statement.
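The arithmetic behind figure 2.7 is simple: each statement's verb value is multiplied by the value of its common meaning term, and the products within a response are summed. The Python sketch below is a minimal illustration; its tuple layout and function names are our own assumptions, while the values are the ones the figure assigns.

```python
# Each statement is (subject, verb, verb value, common meaning term, term value),
# with values on the +3 (supports gun control) to -3 (opposes) scale.
RESPONSE_1 = [
    ("I", "am", +3, "for gun control", +3),
    ("I", "doubt", -2, "bill would meet with very much success", +3),
]
RESPONSE_2 = [
    ("I", "urge", +3, "government to tighten its controls", +3),
]


def statement_score(statement):
    """Multiply a statement's verb value by its common-meaning-term value."""
    _subject, _verb, verb_value, _term, term_value = statement
    return verb_value * term_value


def response_score(statements):
    """Sum the statement products to get one intensity score per response."""
    return sum(statement_score(s) for s in statements)


print(response_score(RESPONSE_1))  # 9 + (-6) = 3
print(response_score(RESPONSE_2))  # 9
```

Because each product combines direction and intensity, a doubtful verb applied to a supportive term (-2 × +3) pulls the total down, which is why response 1 scores lower than response 2.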
In figure 2.7, a plus is assigned to verbs and common meaning terms that appear to support gun control. Each statement's two values (the value of its verb and the value of its common meaning term) are multiplied, and the products for all the statements in a response are then summed, yielding a total score for each response.

Figure 2.7: Category Format Measuring Attitude Intensity

Response 1: “Personally, I'm for gun control, but I doubt that a general gun control bill would meet with very much success.”

Subject   Verb    Value   Common meaning term                       Value   Product
I         am      +3      for gun control                           +3      +9
I         doubt   -2      bill would meet with very much success    +3      -6
                                                                    Total   +3

Response 2: “I urge the government to tighten its controls on handguns sold to residents.”

Subject   Verb    Value   Common meaning term                       Value   Product
I         urge    +3      government to tighten its controls        +3      +9
                                                                    Total   +9

In the example in figure 2.7, response 1 contains two statements while response 2 contains only one. The qualifying statement in the first response lowers its intensity so that, overall, the second response is given a higher intensity rating (+9 versus +3).

Coding the Material

Material can be coded either manually or by computer, depending on the resources available and the format of the material. This is step 5 in content analysis. If the material is already computerized, the analyst should explore the possibility of obtaining a computer program to do the coding.

After deciding how the material will be coded, the analyst writes the necessary instructions. Figure 2.8 spells out the minimum requirements for instructions for trained coders.

Figure 2.8: Guidelines for Contents of Coding Instructions for Trained Coders
1. Definition of recording units, including procedures for identifying them.
2. Descriptions of the variables and categories.
3. Outline of the cognitive procedures used in placing data in categories.
4.
Instructions for using and administering data sheets.
Source: Adapted from K. Krippendorff, Content Analysis: An Introduction to Its Methodology (Beverly Hills, Calif.: Sage Publications, 1980), p. 174.

Pretesting

Pretesting is an important step before actual coding begins. It involves coding a small portion of the material to be analyzed or some other similar material. From the pretests, the analyst tests and revises the coding categories and instructions, in some cases several times. Pretesting is necessary whether computers are used for content analysis or the analysis is done by hand. Computer analysis requires test computer runs to ensure that the program is functioning as planned.

A pretest enables the analyst to determine whether (1) the categories are clearly specified and meet the requirements in figure 2.2, (2) the coding instructions are adequate, and (3) the coders are suitable for the job. These determinations are made by assessing reliability among coders and consistency in individual coding decisions (as we discuss below). Once the analyst has been assured that the material can be coded with high reliability, the pretests are over, and the coding can begin.

Data can, of course, be coded with the help of computer programs. (Weber, 1986) Computer coding solves the reliability problem but creates others. For one, all the material to be coded must be entered on a computer tape or disk, even though this may be impractical. For another, computer programs that perform content analysis require very specific categories. For example, using a computer usually confines analysts to words as recording units, and every word being coded then has to be listed in the computer's memory, as in a dictionary. Preparing a dictionary, however, may be far more difficult than formulating categories.
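The dictionary approach can be illustrated with a short Python sketch that treats single words as recording units. The dictionary below is hypothetical and deliberately tiny; real content analysis programs use much larger dictionaries and their own matching rules, which this sketch does not attempt to reproduce.

```python
import re
from collections import Counter

# Hypothetical dictionary: each listed word is assigned to exactly one category.
DICTIONARY = {
    "support": "Supports",
    "favor": "Supports",
    "urge": "Supports",
    "oppose": "Opposes",
    "against": "Opposes",
    "doubt": "Opposes",
}


def code_words(text, dictionary=DICTIONARY):
    """Count dictionary hits by category, using single words as recording units."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(dictionary[w] for w in words if w in dictionary)


RESPONSE = ("Personally, I'm for gun control, but I doubt that a general "
            "gun control bill would meet with very much success.")
print(code_words(RESPONSE))
```

Applied to the first response from figure 2.7, the sketch records a single “Opposes” hit for “doubt” and nothing for “I'm for gun control,” because that support is expressed by a phrase rather than by a listed word. A human coder, as figure 2.7 shows, scores the same response as mildly supportive (+3).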
A further problem is that a word takes on different meanings in different contexts, a subtlety that people can discern but computers cannot, so the results of computer coding may lack validity.

Computers should not be completely discounted, however, because they do have advantages in a number of situations. Computers can save time and permit analysis of large amounts of data when the word is the optimal unit of analysis. Because computers can “remember” many more definitions than people can, they are useful when categories are numerous. They are also valuable when data will be reused: the cost of preparing a data base for computer analysis in a series of studies may be offset by the benefit of having easily manageable data in the future. (Holsti, 1969, pp. 161-64)

Checking for Reliability

A check for reliability tells analysts the extent to which a measuring procedure can produce the same results on repeated trials. (Carmines and Zeller, 1979, p. 11) In content analysis, this means determining how consistently two or more people categorize the same material. Analysts have to assess reliability while pretesting the coding categories and instructions and also throughout the coding process.

To check for reliability, an analyst compares the way independent coders have coded the same material.⁴ For example, two coders might be given ten items to code individually. The analyst compares their coding decisions and determines the extent to which they agree.

⁴Many reliability formulas have been developed for computing the percentage agreement among coders. See Kaplan and Golden, 1949; Krippendorff, 1980; Robinson, 1967; and Spiegelman et al., 1967. Scott's formula is considered useful for two coders because it takes into account the extent of intercoder agreement that may result by chance. See Scott, 1966; see also Holsti, 1969, pp. 140-41.
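The two-coder check described above can be computed directly. The Python sketch below calculates simple percentage agreement and Scott's pi, the chance-corrected index mentioned above, using pi = (Po - Pe) / (1 - Pe), where Po is the observed agreement and Pe is the sum of the squared category proportions pooled over both coders. The ten codings are hypothetical, and the function names are our own.

```python
from collections import Counter

# Hypothetical codes assigned by two coders to the same ten items
# (S = supports, O = opposes, U = uncommitted).
CODER_A = ["S", "S", "O", "U", "S", "O", "O", "S", "U", "S"]
CODER_B = ["S", "S", "O", "S", "S", "O", "U", "S", "U", "S"]


def percent_agreement(a, b):
    """Share of items that both coders placed in the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must code the same, nonempty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def scotts_pi(a, b):
    """Scott's pi: observed agreement corrected for agreement expected by chance."""
    observed = percent_agreement(a, b)
    pooled = Counter(a) + Counter(b)  # category use across both coders
    total = len(a) + len(b)
    expected = sum((n / total) ** 2 for n in pooled.values())
    return (observed - expected) / (1 - expected)


print(f"agreement = {percent_agreement(CODER_A, CODER_B):.2f}")  # 0.80
print(f"Scott's pi = {scotts_pi(CODER_A, CODER_B):.3f}")         # 0.664
```

Here the coders agree on 8 of 10 items (80 percent), but after chance agreement is removed pi is only about 0.66, showing how much of the raw agreement could be expected by chance alone.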
What constitutes acceptable reliability is best decided case by case, although analysts generally do not consider anything below 80 to 90 percent agreement acceptable.

Low reliability estimates do not reveal whether the fault lies with the categories or with the coders. During the pretest, therefore, it is important for the analyst to identify major sources of discrepant coding and to learn the reasons for them. If the coders are assumed to be competent, low reliability estimates indicate that they are being asked to make finer discriminations than is possible with their training and understanding of the categories. One way to resolve this problem is to contrast data known to have been coded reliably with data that have not. This tells the analyst whether errors are concentrated in a few categories or cut across all categories. If the latter, the analyst should seriously reconsider the entire design, including the decision to use content analysis. If only a few categories are causing problems, then revising these categories (or the instructions) may solve the problem. (Fox, 1969, pp. 670-72)

Analyzing and Interpreting the Results

The main objective of content analysis is to analyze information whose format has been transformed into a useful one. This constitutes step 6 and involves
• summarizing the coded data,
• discovering patterns and relationships within the data,
• testing hypotheses about the patterns and relationships, and
• relating the results to data obtained from other methods or situations, or assessing the validity of the analysis.

Neither these tasks nor the analytical techniques for accomplishing them are unique to content analysis. Depending on the coding design, an analyst can use a variety of statistical methods.

Summarizing Data and Examining Their Patterns

The most common means of summarizing data is to examine their frequencies.
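The frequency summaries and cross-tabulations discussed in this section can be sketched in Python. The coded pairs below are hypothetical: they are not the HUD data, and the unit and issue labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical coded data: one (evaluation unit, issue category) pair per report.
CODED = [
    ("Unit A", "Housing assistance"),
    ("Unit A", "Housing assistance"),
    ("Unit A", "Community development"),
    ("Unit B", "Community development"),
    ("Unit B", "No major issue"),
    ("Unit B", "No major issue"),
    ("Unit B", "No major issue"),
]


def absolute_frequencies(pairs):
    """Number of reports coded in each issue category."""
    return Counter(issue for _unit, issue in pairs)


def relative_frequencies(pairs):
    """Each category's count as a percentage of the sample size."""
    counts = absolute_frequencies(pairs)
    return {issue: 100 * n / len(pairs) for issue, n in counts.items()}


def cross_tabulate(pairs):
    """Co-occurrence of evaluation unit and issue category."""
    table = defaultdict(Counter)
    for unit, issue in pairs:
        table[unit][issue] += 1
    return {unit: dict(counts) for unit, counts in table.items()}


counts = absolute_frequencies(CODED)
mean = sum(counts.values()) / len(counts)  # average frequency per category
for issue, n in counts.items():
    print(issue, n, "above average" if n > mean else "at or below average")
print(cross_tabulate(CODED))
```

Overlap between units shows up as an issue with nonzero counts in more than one unit's row; in this toy data only “Community development” is shared.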
An absolute frequency might be the number of times statements or issues are found in the sample; a relative frequency might be expressed as a percentage of the sample size. Analysts can compare one category's frequency to the average frequency for all categories, or they can note changes in frequencies over time.

Figure 2.9: Issues Addressed by HUD's Evaluation Units
Source: U.S. General Accounting Office, HUD's Evaluation System--An Assessment, PAD-78-44 (Washington, D.C.: 1978), p. 7.

In the assessment of HUD's evaluation system, for example, after the GAO analysts had categorized the issues addressed in 38 evaluation reports from two offices, they summarized the number of studies discussing each issue. They used absolute frequencies, and we show their grand total in figure 2.9. Within this summary, the analysts reported that 20 of the 38 documents they reviewed were not directed toward any major housing and urban development issue and that 16 issues were not addressed at all. (GAO, 1978, p. 22)

Another way of analyzing content analysis data is to examine relations among variables by cross-tabulating their co-occurrence. Figure 2.9, for example, shows the relationship between the issues addressed in various reports and the evaluation units that produced the reports. From this information, the GAO analysts found little duplication in the way the two offices addressed the issues.

Cross-tabulations need not be limited to two or three variables. Multivariate techniques can be used to analyze complex structures. (Reynolds, 1977) Other techniques for discovering patterns and relationships in data include contingency analysis, clustering, and factor analysis; Krippendorff discusses these and others. (Krippendorff, 1980, pp. 109-18)

Assessing Validity

Whatever the technique used, a final and important task is to assess the validity of the results by relating them to other data that are known to be reasonably valid. Validity is the extent to which an instrument measures what it is intended to measure. Reliability and adequate sampling are necessary but not sufficient conditions for validating inferences made through content analysis. In addition, analysts have to corroborate the results of content analysis with other data or by other procedures that are known to be valid indicators of the phenomena they are studying.

An example of validity assessment is provided in Ramallo's analysis of volunteers' written reports of their experiences in Crossroads Africa, a Peace Corps program. (Ramallo, 1966) He hypothesized that content analysis of the reports could distinguish successful volunteers from unsuccessful ones, on the assumption that unsuccessful volunteers would exhibit greater alienation from their experiences. Ramallo compared his results with supervisors' ratings of the same volunteers and found a high correlation between the two, concluding that his own analysis had produced a valid measure of success.

Other equally appropriate measures could have been used to validate Ramallo's findings. Surveying the Africans with whom the volunteers had worked is one. Measuring increases in food production or decreases in infant mortality in each volunteer's assigned village are others. Using several generally accepted corroborating measures reduces the risk of producing misleading evaluation findings.

Writing the Report

As in writing any GAO report, analysts should explain the scope and nature of their work to indicate to their readers what they covered and what the frame of reference is for their findings.
(GAO, July 1988, chapter 12.8) Readers should be given a clear idea of what was done, why it was done, and why the results provide a sound basis for conclusions and recommendations. Figure 2.10 outlines the record of information that analysts should maintain when they use content analysis.

Figure 2.10: Minimum Documentation for a Content Analysis Study

1. The study's objectives, which governed the choice of data, methods, and study design.
2. A justification of the choice of data, methods, and design.
3. A description of the procedures (so that the research can be replicated), including descriptions of the
   • sampling plans,
   • units of analysis,
   • coding instructions,
   • results of reliability tests,
   • procedures for data handling and analysis, and
   • efforts at validating parts of or the entire procedure.
4. The findings and their statistical significance.

Content analysis results should be firm enough to withstand critical scrutiny. The information represented in the items listed in figure 2.10 may be included in the main body of the report or in appendixes, or it may remain only in the workpapers. In any case, it should be documented well enough to enable critical readers to estimate how much they can rely on the reported results.

Summary

Content analysis is a set of procedures for transforming nonstructured, written material into a format for analysis. In this chapter, we have described those procedures. They are summarized as follows:
• deciding to use content analysis based on a project's objectives, the material that is available, and the kinds of comparison that are required;
• determining what material should be included in content analysis, which may involve sampling;
• selecting context units and recording units;
• developing coding categories, quantification levels, and coding instructions;
• pretesting the categories and then coding the material either manually or by computer;
• checking reliability during pretests and throughout the coding;
• analyzing and interpreting the coded data; and
• assessing the validity of the findings.

Chapter 3
Why Should GAO Analysts Use Content Analysis?

In this chapter, we conclude our discussion by presenting some reasons both for using and for not using content analysis. We discuss some advantages and disadvantages of content analysis and give brief hypothetical cases of potential applications in GAO's work.

What Content Analysis Can Do

All researchers who want to analyze written material systematically should consider content analysis. It is a means of extracting insights from already existing data sources. Therefore, it is potentially applicable to at least part of almost every project.

It Can Provide Unobtrusive Measures

Content analysis of existing written or otherwise recorded material yields unobtrusive and nonreactive measures. One problem with some experimental methods, as with surveys, is that interactions between analysts and their subjects can cause the subjects to react to the situation rather than behave in their more "natural" manner, and this may introduce bias into the results. Additionally, survey questions that are considered inappropriate because they invade a respondent's privacy may have to be eliminated from analysis. Content analysis of existing documents avoids both problems.
It Can Cope With Large Volumes of Written Material

Large volumes of written material can be analyzed with the help of content analysis because explicit coding instructions, precise categories, and extensive reliability checks make it possible to use any number of trained individuals to code the material. Furthermore, it allows two or more sets of coders to work on the same kind of data in different locations, such as at headquarters and in regional offices.

It Helps Analysts Learn About the Substantive Area

Content analysis can help analysts learn more about the programs they are investigating and their issues. This benefit results from two characteristics: content analysis is systematic in nature, and its task of devising reliable and useful categories is rigorous.

It Can Validate Other Methods

In chapter 2, we discussed how to validate content analysis findings by corroborating them with findings from other methods. Validation can also move in the opposite direction. That is, findings from content analysis can be used to test the validity of findings from other measures, such as survey data and econometric proxies. Webb and others have described how investigators can use "multiple operations" to increase confidence in their findings. (Webb et al., 1981)

Pitfalls in Using Content Analysis

We have explained some of the many reasons for using content analysis, but analysts planning to undertake content analysis should also be aware of some pitfalls that await them. The ready availability of relevant material may tempt analysts into aimless and expensive "fishing expeditions" motivated by the hope of turning up something interesting. Quantifying documentary information may produce important and interesting data, but failing to resist the temptation to count things for the sake of counting is likely to produce precise but meaningless or trivial findings.
It Can Be Costly

Content analysis is relatively costly and time consuming. Interviews with users of content analysis and a review of the literature on the method reveal three potential sources of prohibitive cost.

1. Formulating categories that can be reliably coded is problematic, repetitive, and time consuming. The time it takes to structure and pretest categories may range from a few days to two or three months.

2. Staff have to train coders if they intend to analyze more data than they can handle themselves. Preparing a coding manual and training and supervising the coders can add significantly to a project's length. Content analysis can be especially expensive in staff time if the categorization scheme requires subtle coding decisions.

3. Coding substantial amounts of written material takes a great deal of staff time if the recording unit is small (for example, when it is words or themes), and even more time when the context unit is large (for example, when it is lengthy reports). Since coding must be systematic, it may also be tedious and arduous. Using a computer trades the coding problem for that of computerizing the text or preparing a dictionary, which can also be time consuming and therefore expensive.

It Can Pose Reliability and Validity Problems

Reliability and validity are interdependent concepts. Generally, tradeoffs have to be made between them because precisely defined categories can produce results that are highly reliable and statistically significant but that lack practical significance. The need for objective and replicable results may force analysts to forgo coding what they are interested in and to code instead what can be done mechanically, thus threatening validity. Redefining categories to increase their reliability can lead to a loss of relevance (that is, a loss of validity) and, therefore, of usefulness.
Because of this dilemma, validity has to be assessed after categories have been developed.

Potential Applications in Program Evaluation

The decision to use content analysis should be made in terms of three factors: a project's objectives, the material to be analyzed, and the kinds of analysis required. We give brief hypothetical cases that focus on three program evaluation objectives, showing how content analysis could be used to address them.

Identifying Program Goals

One objective of a program evaluation might be to identify the program's goals. To do this, an analyst might gather written or tape-recorded information on the program's legislative history from its authorizing legislation and congressional committee reports, from program policy documents, and from transcripts of interviews with agency officials. With content analysis, the analyst's review of this material could be made objective and systematic. Besides providing analysts with a structured format for identifying the program's goals, this technique can help them determine whether those goals are congruent with legislative intent because it allows, for example, comparison of agency documents with congressional committee reports.

Describing Program Activities

A program evaluation might have as an objective a description of the program's activities. To achieve this objective, an analyst could develop case studies, attend agency meetings, or interview program managers. Information gathered in these ways would then be documented in staff workpapers. These, in turn, can be examined by means of content analysis. From such analysis, concise, objective summaries of the material can be produced, or more complex analyses can be designed. An example would be an analysis of trends in program activities across time. The targeting of program activities could also be examined with content analysis.
Recipients of program services could be interviewed and transcripts could be made of their responses, after which their eligibility for receiving services could be examined by comparing information obtained from the interviews with established eligibility criteria.

Determining Program Results

A program evaluation might have as an objective ascertaining the program's results. In this situation, analysts might gather information by studying earlier evaluation reports or by surveying program participants. In surveys, open-ended questions may be appropriate for gaining information about issues, perceptions, or attitudes that cannot otherwise be identified, particularly when analysts do not want to impose their own concepts on respondents and so cannot formulate appropriate closed questions. Using content analysis on open-ended survey data, such analysts can examine trends in program outcomes across time and compare them to changes in program activities. Alternatively, they could examine trends across groups of program participants distinguished by geographical location, age, income, and the like.

Conclusion

We hope we have given readers of this paper a realistic sense of both the advantages and disadvantages of content analysis. The method does have limitations. Without clear objectives, content analysis can produce very precise information that is nonetheless meaningless. The method can be costly in that formulating categories that can be reliably coded, preparing coding instructions, and training and supervising coders can all be time consuming. Additionally, complex coding schemes, which usually yield the most interesting findings, may produce the least reliable results because they entail a substantial element of coder judgment. Content analysis, therefore, requires rigorous reliability and validity checks if its results are to withstand critical scrutiny.
Moreover, the results also depend on the quality of information contained in the documents being analyzed. If these are not reliable or valid, even the most rigorous content analysis will have limited value.

Nonetheless, content analysis is potentially applicable to at least part of almost all projects. It can be used at any stage of a project, but it is particularly useful at the beginning to help analysts learn about the project's substantive area. It is an excellent method for gathering retrospective information about a program from existing data sources. It does not require the collection of new data, and this means that it saves time and money. The possibilities for application we have discussed in this chapter are not exhaustive; rather, we have intended to show the method's versatility. The number and kind of areas in which content analysis can be applied and the questions it can help answer are limited primarily by its users' ingenuity and skill in structuring reliable and valid category formats.

Bibliography

Babbie, E. R. Survey Research Methods. Belmont, Calif.: Wadsworth Publishing Co., 1973.

Berelson, B. Content Analysis in Communication Research. Glencoe, Ill.: Free Press, 1952.

Carmines, E. G., and R. A. Zeller. Reliability and Validity Assessment. Beverly Hills, Calif.: Sage Publications, 1979.

Fox, D. "Techniques for the Analysis of Quantitative Data," The Research Process in Education, ed. by D. Fox. New York: Holt, Rinehart & Winston, 1969.

Holsti, O. R. Content Analysis for the Social Sciences and Humanities. Reading, Mass.: Addison-Wesley, 1969.

Inkeles, A. "Soviet Reactions to the Voice of America." Public Opinion Quarterly, 16 (1952), 612-17.

Kaplan, A., and J. M. Goldsen. "The Reliability of Content Analysis Categories," The Language of Politics: Studies in Quantitative Semantics, ed. by H. D. Lasswell et al. New York: George Stewart, 1949.

Krippendorff, K. Content Analysis: An Introduction to Its Methodology. Beverly Hills, Calif.: Sage Publications, 1980.

North, R. C., et al. Content Analysis: A Handbook with Applications for the Study of International Crisis. Evanston, Ill.: Northwestern University Press, 1963.

Ramallo, L. I. "The Integration of Subject and Object in the Content of Action: A Study of Reports Written by Successful and Unsuccessful Volunteers for Field Work in Africa," The General Inquirer: A Computer Approach to Content Analysis in the Behavioral Sciences, ed. by P. J. Stone et al. Cambridge, Mass.: MIT Press, 1966.

Reynolds, H. T. Analysis of Nominal Data. Beverly Hills, Calif.: Sage Publications, 1977.

Robinson, W. S. "The Statistical Measure of Agreement." American Sociological Review, 22 (1957), 782-86.

Scott, W. A. "Reliability of Content Analysis: The Case of Nominal Scale Coding." Public Opinion Quarterly, 19 (1955), 321-25.

Spiegelman, M., et al. "The Reliability of Agreement in Content Analysis." Journal of Social Psychology, 37 (1953), 175-87.

United States General Accounting Office. Communications Manual. Washington, D.C.: July 1988.

---. HUD's Evaluation System--An Assessment, PAD-78-44. Washington, D.C.: July 20, 1978.

---. Project Manual. Washington, D.C.: December 1988.

---. Report Manual. Washington, D.C.: October 1988.

Webb, E. J., et al. Nonreactive Measures in the Social Sciences, 2nd ed. Boston, Mass.: Houghton Mifflin Co., 1981.

Weber, R. P. Basic Content Analysis. Beverly Hills, Calif.: Sage Publications, 1985.

(979165)
Content Analysis: A Methodology for Structuring and Analyzing Written Material--Transfer Paper 10.1.3
Published by the Government Accountability Office on 1989-03-01.