United States General Accounting Office
Washington, D.C. 20548

General Government Division

B-234202

January 4, 1990

The Honorable J.J. Pickle
Chairman, Subcommittee on Oversight,
Committee on Ways and Means,
House of Representatives

Dear Mr. Chairman:

This report responds to your request that we evaluate the Internal Revenue Service's (IRS) administration of its Integrated Test Call Survey System (ITCSS) during the 1989 tax filing season. ITCSS was designed to measure the quality of service IRS provides through its toll-free telephone system, a nationwide system in which IRS assistors answer taxpayers' telephone inquiries. To accomplish this purpose, IRS designed a survey sample to produce statistical estimates on the accuracy of its telephone assistors in answering a set of 62 tax law test questions. These test questions were developed for tax law areas in which IRS determined that individual taxpayers commonly make inquiries when preparing their tax returns. IRS administered the test by placing anonymous calls to its telephone assistors and scoring their responses to the test questions.

You requested that we report to you on IRS' administration of its 1989 test call survey and on the validity of the statistical estimates produced during the test. To respond to your request, we monitored and independently scored a statistically valid random sample of IRS' test calls. As you know, we worked with IRS to develop ITCSS and mutually agreed in advance on what constituted a correct answer for each question.

This report evaluates the validity of IRS' overall national accuracy rate. Appendix I provides selected ITCSS filing season results for IRS regions and call sites and by major tax law categories for individual taxpayers. This report updates and supplements the preliminary results of our work, which we reported in testimony before your Subcommittee on March 16, 1989.
We did our work from January 1989 to August 1989 using generally accepted government auditing standards.

IRS' overall ITCSS results for the 1989 tax filing season showed that IRS telephone assistors responded correctly 62.8 percent of the time to the survey's tax law test questions. On the basis of our monitoring of a statistically valid random sample of test calls, we agree with the overall telephone assistance accuracy rate IRS reported. Also, overall, IRS fairly administered its test call survey. With few exceptions, IRS test callers (Taxpayer Service employees responsible for making the test calls) asked tax law test questions in a fair manner and scored telephone assistors' responses objectively and accurately.

GAO/GGD-90-37 IRS' Test Call Survey

The test question scoring criteria for correct assistor responses we used in our assessment are those on which we and IRS mutually agreed. During the filing season, however, IRS reported a higher ITCSS accuracy rate that was based on more liberal scoring criteria with which we did not agree. IRS' deviation from the agreed-upon scoring criteria in reporting assistor accuracy was an issue discussed at your March 16, 1989, hearing. In July 1989, the Assistant Commissioner (Taxpayer Service) said that for the 1990 filing season, IRS would only report accuracy rates that were based on scoring criteria mutually agreed upon by IRS and us.

Background

Because our tax laws are complicated, taxpayers often need assistance in understanding the tax laws and in preparing their tax returns. The principal vehicle IRS' Taxpayer Service Division uses to assist taxpayers is a toll-free telephone program. IRS has assisted taxpayers through this program for over two decades. Historically, IRS has considered telephone assistance to be the most efficient method of helping taxpayers.
Accordingly, it has devoted substantial staff resources to telephone assistance and encourages taxpayers to use the telephone as a means of getting answers to their tax law questions.

During the 1989 tax filing season, IRS employed over 5,000 telephone assistors at 32 telephone sites. These assistors answered about 18.8 million taxpayer calls on individual and business tax law issues, procedural issues, and account-related matters. IRS' telephone assistors fall primarily into two groups: frontline and backup assistors. Frontline assistors initially take taxpayers' calls and, if they are unable to answer the taxpayers' questions, refer them to backup assistors, who usually have more experience and expertise.

Over the past 2 years, both Congress and the public have raised concerns over the quality of the responses assistors have provided to questions designed to test their tax law knowledge. For the 1988 filing season, we did our sixth survey of assistors' tax law knowledge and reported that they provided correct responses to our questions 64 percent of the time and incorrect responses 36 percent of the time.¹ Also during the 1988 filing season, IRS implemented ITCSS and found that assistors correctly responded to its test questions over 70 percent of the time. IRS expressed its belief that both our test results and its own indicated an unacceptably low rate of assistor accuracy.

The centerpiece of ITCSS is a 62-question test covering what IRS identified as the seven major individual tax law categories in which taxpayers ask questions. As shown in table 1, these 7 tax law categories contained 32 subcategories of tax law.
Table 1: Individual Tax Law Categories and Subcategories Tested by IRS - 1989 IRS Test Call Survey

Filing Information
  • Filing Requirement
  • Estimated Tax

Dependents/Exemptions/Filing Status
  • Dependents
  • Personal Exemptions
  • Filing Status - Head of Household
  • Filing Status - Other

Individual Income
  • Wages, Alimony, & Unemployment Compensation
  • Interest & Dividend Income, Sch. B
  • Taxable Refunds & Other Income
  • Non-Taxable Income

Capital Gains & Losses
  • Schedule D
  • Sale/Exchange of Residence
  • Other Gains/Losses

Pensions/Deferred Compensation
  • Pension & Annuity Income
  • All IRA Inquiries
  • Other Retirement Plans
  • Taxation of Social Security Benefits
  • Lump Sum Distribution

Adjustments/Deductions
  • Employee Business Expense
  • Other Adjustments to Income
  • Medical & Dental Deductions
  • Tax Deductions
  • Interest Deductions
  • Miscellaneous Deductions
  • Gifts to Charity

Tax Computation/Credits/Payments
  • Standard Deduction
  • Itemized vs Standard Deduction
  • Child & Dependent Care Credit
  • Self-Employment Tax
  • Earned Income Credit
  • Other Credits/Taxes/Payments
  • Supplemental Medicare Premium

¹Tax Administration: Accessibility, Timeliness, and Accuracy of IRS' Telephone Assistance Program (GAO/GGD-89-30, Feb. 2, 1989).

We reached agreement on the 62 test questions that comprised the test and on two specific categories of correct responses: (1) correct and (2) correct and complete. A correct answer was the minimal standard IRS expected its telephone assistors to meet and we, therefore, focused our monitoring to determine whether ITCSS accurately measured assistors' responses against that standard. Answers that exceeded this standard would be classified as correct and complete, but they would also be considered as correct for monitoring and scoring purposes.
It was agreed that all other answers would be scored as incorrect, meaning that the telephone assistor's answer could lead taxpayers to a wrong result on their tax return. Appendix III provides examples of selected ITCSS test questions and the responses required for both categories of correct responses.

IRS administered its test by having test callers (1) place anonymous calls to telephone assistors at 29 telephone sites within the continental United States and (2) score assistors' responses to the test questions. During an 11-week period beginning February 6, 1989, eight test callers completed and scored 14,876 ITCSS test calls. Figure 1 shows the geographic distribution of these 29 call sites. For various technical and administrative reasons, IRS did not include three telephone sites in its test: Alaska, Hawaii, and Puerto Rico.

Figure 1: Locations of Toll-Free Telephone Sites IRS Surveyed - 1989 IRS Test Call Survey
[Map of the continental United States showing the following 29 sites: Seattle, WA; Portland, OR; Oakland, CA; El Monte, CA; Phoenix, AZ; Denver, CO; Dallas, TX; Houston, TX; Omaha, NE; Des Moines, IA; St. Paul, MN; Milwaukee, WI; Chicago, IL; St. Louis, MO; Indianapolis, IN; Detroit, MI; Cleveland, OH; Cincinnati, OH; Nashville, TN; Atlanta, GA; Jacksonville, FL; Buffalo, NY; Brooklyn, NY; Boston, MA; Newark, NJ; Philadelphia, PA; Pittsburgh, PA; Baltimore, MD; Richmond, VA]

Objectives, Scope, and Methodology

To monitor the validity of IRS' overall test and to comment on its accuracy results, we listened in on and independently scored 577 randomly selected ITCSS test calls during an 8-week period of the tax filing season and compared our scores for those questions and answers to IRS' scores.
We based our scoring on the scoring criteria to which we had mutually agreed with IRS. Those criteria established specific acceptable combinations of required assistor probes for factual information and/or responses that needed to be present in the conversation for the response to be considered correct.

Our monitoring sample was randomly selected from IRS' test call survey plan. Overall, our sample called for us to monitor 830 test calls, covering all test callers, time periods, and test questions. We calculated that this sample size would allow us to report our accuracy results for the period at the 95-percent level of confidence with a sampling error of plus or minus 2.5 percent. However, we were unable to monitor and score 244 test calls primarily because of (1) deviations from IRS' calling schedule, (2) the inability of test callers to complete calls to the telephone sites, and (3) occasional problems with our monitoring equipment. In addition, we dropped nine calls from our sample because test callers deviated from the agreed-upon question scripts, thereby affecting the outcome of the call. Accordingly, the reduction in our sample size caused our sampling error to increase to plus or minus 4.4 percent at the 95-percent level of confidence.

We monitored how well IRS administered its test and discussed the development of and planning for the test with IRS officials in the Taxpayer Service and Statistics of Income Divisions and with the project manager of the contractor IRS selected to develop and implement ITCSS' computerized scoring response program. Our objectives, scope, and methodology are discussed in additional detail in appendix II.

ITCSS Produced a Valid Indicator of Overall Assistor Performance

Our monitoring results showed that IRS telephone assistors correctly answered 391 of the 577 tax law test questions. For the same 577 test calls, IRS scored 377 of them as correct. Using the same method as IRS to statistically weight our scoring, our results show a 67.2-percent IRS telephone assistance accuracy rate compared to IRS' accuracy rate for the monitored calls of 65.[illegible] percent. The difference in these rates is not statistically significant and, therefore, the overall 62.8-percent accuracy rate IRS reported for all ITCSS calls can be relied upon as a valid indicator of assistors' performance.

The variance between our and IRS' scoring of test calls was due primarily to differences in interpretation as to the adequacy of assistors' probes and responses. Probing is important because taxpayers who call with questions usually are not sufficiently familiar with the tax laws to know what information assistors need to answer their questions. Without knowing certain facts about a taxpayer's situation or status, assistors cannot be certain that the response they give would actually apply to the taxpayer. Assistors, therefore, must elicit that information from the taxpayer or provide a conditional response.

Generally, assistor probes and responses clearly met or failed to meet the agreed-upon scoring criteria for a correct response. However, instances occurred where assistors' probes and responses varied somewhat from predetermined acceptable probes and responses; therefore, judgments had to be made on whether the responses expressed were acceptable. On 60 monitored calls, or about 10 percent of our sample, we disagreed with IRS test callers as to whether a given probe or response fully met the scoring criteria. For 38 of the 60 calls, we scored the responses as correct and IRS scored them as incorrect. For the other 22 calls, we scored the assistors' responses as incorrect, but IRS scored them as correct.
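As a rough illustration of the tallies above, the following minimal sketch computes the unweighted shares implied by the counts in the text. It is not the weighting method IRS and we used (the 67.2-percent figure is a weighted estimate, and that method is not described here), so the unweighted values differ slightly from the weighted ones. The names `MONITORED_CALLS`, `GAO_CORRECT`, `IRS_CORRECT`, `DISAGREEMENTS`, and `percent` are assumptions introduced only for this example.

```python
# Unweighted tallies taken from the text. The report's published rates are
# statistically weighted; that weighting is NOT reproduced in this sketch.
MONITORED_CALLS = 577   # test calls monitored and scored
GAO_CORRECT = 391       # calls scored correct by the monitors
IRS_CORRECT = 377       # the same calls scored correct by IRS test callers
DISAGREEMENTS = 60      # calls on which the two scorings differed


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


gao_rate = percent(GAO_CORRECT, MONITORED_CALLS)
irs_rate = percent(IRS_CORRECT, MONITORED_CALLS)
disagreement_rate = percent(DISAGREEMENTS, MONITORED_CALLS)

print("Unweighted monitor-scored rate:", gao_rate)
print("Unweighted IRS-scored rate:", irs_rate)
print("Share of calls with scoring disagreement:", disagreement_rate)
```

The disagreement share works out to roughly one call in ten, consistent with the "about 10 percent" stated in the text.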
Figure 2 shows IRS' overall telephone assistor accuracy rate for ITCSS during the 1989 tax filing season compared to IRS' and our results for the sample of test calls we monitored.

Figure 2: IRS' Overall Test Results Compared to GAO- and IRS-Monitored Test Call Results for the 1989 Tax Filing Season
[Bar chart; vertical axis: percent correct]

Our scoring of assistors' responses to the 577 monitored test calls was based on scoring criteria that we and IRS mutually devised. As discussed in our March 1989 testimony before the Subcommittee,² IRS also reported a higher accuracy rate that was based on more liberal scoring criteria than those on which we had agreed. About 2 weeks after the start of the test, IRS determined that, for certain questions, assistors were providing answers that IRS believed were "not wrong" but failed to meet our agreed-upon standards for correct answers. IRS officials said that it would be unfair to imply to Congress or the public that assistors were providing wrong answers if that advice would not necessarily lead taxpayers to file inaccurate tax returns. Thus, IRS devised another category of response, "right" answers, that failed to meet minimal standards but which they proposed to add to the "correct and complete" and "correct" categories in reporting accuracy statistics.

We did not agree to IRS' revision of the scoring criteria. In our opinion, the responses IRS categorized as "right" were incomplete and potentially misleading and would increase the likelihood that taxpayers following such advice would make errors. For example, to defer the capital gains tax on the sale of a principal residence, a taxpayer must replace and occupy a new residence within a specified time period. IRS considered an answer as "right" if only the time period for replacement was provided.
We considered the answer as incomplete and potentially misleading because of the tax consequences that would result if the taxpayer did not meet the occupancy requirement. In July 1989, the Assistant Commissioner (Taxpayer Service) said that for the 1990 filing season, IRS would only report assistor accuracy rates that were based on mutually agreed-upon scoring criteria.

Conclusion

For the 1989 tax filing season, IRS fairly administered its test call survey, and we agree that its reported overall 62.8-percent assistor accuracy rate is reliable. For the sample of ITCSS test calls that we monitored, the difference in the accuracy rates for correct answers between our scoring and IRS' scoring of those calls was not statistically significant. Thus, we believe that with periodic oversight the test call system administered by IRS can be used as the principal monitor of its assistors' performance.

Agency Comments and Our Evaluation

In providing comments to this report, the Commissioner of the Internal Revenue Service said that he was not satisfied with the telephone assistance accuracy rate that IRS achieved in 1989 and that one of his major goals is to improve this accuracy rate in 1990. The Commissioner agreed with our findings but recommended the deletion of the table that presents accuracy rates for each IRS call site. He believes that because the sample size of the data pertaining to each call site is smaller than national or regional sample sizes, the confidence interval associated with any call site accuracy range is too wide to be meaningful (see app. IV).

As discussed in appendix I, we agree that call site accuracy rate ranges are wider than national or regional accuracy rate ranges. However, the potential variance in the accuracy of call site data varies from plus or minus 4.95 percent to plus or minus 6.4 percent, a range we believe useful for comparisons of call site performance.
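The plus-or-minus figures quoted above can be illustrated from the call site ranges in table I.4. The sketch below is only a rough check, assuming that the "plus or minus" value is half the width of each published accuracy rate range. It uses a small subset of the 29 sites, chosen here because it includes the narrowest and widest ranges in the table; the names `SITE_RANGES` and `half_width` are hypothetical and introduced only for this example.

```python
# Subset of call site accuracy rate ranges (percent) as read from table I.4.
# Assumption: "plus or minus" equals half the width of the published range.
SITE_RANGES = {
    "Indianapolis": (65.7, 75.6),
    "Boston": (62.7, 72.8),
    "Omaha": (67.3, 78.7),
    "Newark": (45.6, 58.4),
}


def half_width(low, high):
    """Half the width of a range, rounded to two decimals to absorb float noise."""
    return round((high - low) / 2.0, 2)


half_widths = {site: half_width(lo, hi) for site, (lo, hi) in SITE_RANGES.items()}
narrowest = min(half_widths.values())
widest = max(half_widths.values())

for site, hw in sorted(half_widths.items(), key=lambda item: item[1]):
    print(f"{site}: plus or minus {hw}")
print("Narrowest:", narrowest, "Widest:", widest)
```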
To mitigate the Commissioner's concerns and to permit reader perspective, we added to the call site data table the accuracy rate ranges for each call site.

As arranged with the Subcommittee, we are sending copies of this report to the Commissioner of Internal Revenue and other interested parties. We will make copies available to others upon request.

The major contributors to this report are listed in appendix V. Please contact me on 272-7904 if you or your staff have any questions concerning the report.

Sincerely yours,

Paul L. Posner
Associate Director, Tax Policy and Administration Issues

Contents

Letter

Appendix I: Integrated Test Call Survey System Results for the 1989 Tax Filing Season by Tax Law Category, Region, and Call Site
  ITCSS Accuracy Rate Reflects Overall Quality of IRS Telephone Service Provided to Taxpayers
  Analysis of Selected Data

Appendix II: Objectives, Scope, and Methodology

Appendix III: Selected ITCSS Test Questions and Required Responses
  Sample Question 1
  Sample Question 2

Appendix IV: Comments From the Internal Revenue Service

Appendix V: Major Contributors to This Report

Related GAO Products

Tables
  Table 1: Individual Tax Law Categories and Subcategories Tested by IRS - 1989 IRS Test Call Survey
  Table I.1: Estimated Regional Accuracy Rates and Accuracy Rate Ranges - 1989 IRS Test Call Survey Results
  Table I.2: Estimated National Accuracy Rates and Accuracy Rate Ranges by Tax Law Category - 1989 IRS Test Call Survey Results
  Table I.3: Estimated Regional Accuracy Rates and Accuracy Rate Ranges by Tax Law Category - 1989 IRS Test Call Survey Results
  Table I.4: Accuracy Rate Ranges for IRS Toll-Free Telephone Sites - 1989 IRS Test Call Survey Results

Figures
  Figure 1: Locations of Toll-Free Telephone Sites IRS Surveyed - 1989 IRS Test Call Survey
  Figure 2: IRS' Overall Test Results Compared to GAO- and IRS-Monitored Test Call Results for the 1989 Tax Filing Season
  Figure I.1: IRS' National Level Accuracy
  Figure I.2: Distribution of 62 Tax Law Questions by Tax Law Category
  Figure I.3: Estimated Regional Accuracy Rates
  Figure I.4: Estimated National Accuracy Rates by Tax Law Category
  Figure I.5: Estimated Regional Accuracy Rates by the Tax Law Categories of Filing Information, Exemptions, Individual Income, and Capital Gains
  Figure I.6: Estimated Regional Accuracy Rates by the Tax Law Categories of Pensions, Adjustments to Income, and Tax Computation

Abbreviations
  IRS    Internal Revenue Service
  ITCSS  Integrated Test Call Survey System

Appendix I
Integrated Test Call Survey System Results for the 1989 Tax Filing Season by Tax Law Category, Region, and Call Site

ITCSS Accuracy Rate Reflects Overall Quality of IRS Telephone Service Provided to Taxpayers

The Integrated Test Call Survey System was developed by IRS to more accurately measure the accuracy of IRS responses to taxpayer telephone inquiries. Accuracy measurement is important because IRS believes that the higher the telephone assistance accuracy rate, the better the quality of service it provides to the public.

The 1989 test call survey system was designed by IRS to place 1,488 test calls per week for 11 weeks to 29 of its 32 call sites throughout the United States. Each test call came from a group of 62 questions dealing with tax law issues pertaining to individuals. All test questions were derived from tax law categories in which IRS believes most taxpayers ask questions. To be credited with a correct response, ITCSS implementation guidance directed that each IRS telephone assistor in the 29 toll-free telephone assistance call sites nationwide (1) obtain relevant facts from the taxpayer as necessary before attempting to give an answer and (2) ensure that an answer was tailored to satisfy the taxpayer's needs.

Overall National Level ITCSS Accuracy Results

From February 6, 1989, to April 21, 1989, IRS National Office test callers completed and scored 14,876 test calls. Figure I.1 shows the national level accuracy results for these test calls.

Figure I.1: IRS' National Level Accuracy
[Pie chart: number of test calls scored as incorrect (5,534); number of test calls scored as correct]

Analysis of Selected Data

The data presented in this section represent selected results obtained by IRS during its 1989 tax filing season test call survey sample. We should point out that our monitoring sample was designed to evaluate the validity of IRS' overall national accuracy rate, not the accuracy of IRS' statistical results at the tax law category, region, or call site levels. It should be expected that ITCSS results by categories, regions, and call sites have larger sampling errors than the overall ITCSS results because it is a common statistical property that a subsection of a sample has more variability than the whole sample.
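The overall national accuracy rate follows directly from the two counts given for the national level results: the total number of completed test calls and the number scored as incorrect. A minimal sketch of that arithmetic is shown below; the variable names are assumptions made for illustration, and the calculation is unweighted.

```python
# Counts reported for the national level accuracy results.
TOTAL_TEST_CALLS = 14_876      # test calls completed and scored
SCORED_INCORRECT = 5_534       # test calls scored as incorrect

# Every scored call is either correct or incorrect, so the number correct
# is the remainder.
scored_correct = TOTAL_TEST_CALLS - SCORED_INCORRECT

# Overall accuracy rate in percent, rounded to one decimal place.
national_accuracy = round(100.0 * scored_correct / TOTAL_TEST_CALLS, 1)

print("Scored as correct:", scored_correct)
print("National accuracy rate (percent):", national_accuracy)
```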
Accuracy rates are estimated because they are drawn from a statistical sample of test calls. Each estimate has a range of precision, or confidence interval, associated with it. The size of this accuracy rate range varies by the size of the test call sample used to produce the confidence interval. Therefore, the variability of accuracy rate ranges and estimated accuracy rates relating to the tax law category, regional, and call site data tables and figures that follow is the result of differing sample sizes associated with each level of data. For example, tax law category and regional data were based on larger sample sizes than the call site sample sizes and, therefore, produced narrower confidence intervals. The narrower the interval, the more precise the estimate of the actual accuracy rate. All data ranges shown in this section have been calculated to express the results at the 95-percent level of confidence.

Percentage of ITCSS Questions by Tax Law Category

The 62 tax law questions comprising IRS' test call survey covered seven tax law categories in which IRS determined that taxpayers commonly made telephone inquiries. Figure I.2 shows the distribution of the test call questions across the seven tax law categories.

Figure I.2: Distribution of 62 Tax Law Questions by Tax Law Category
[Pie chart: Tax Computation (15 questions); Filing Information (4 questions); Exemptions (7 questions); Individual Income (8 questions); Capital Gains (6 questions); Pensions (10 questions); Adjustments to Income (12 questions)]

Estimated Accuracy Rates for IRS Regions

Figure I.3 shows the estimated accuracy rates achieved by each IRS region, and table I.1 shows the specific accuracy rate range associated with each region's estimate.
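The relationship between sample size and the width of an accuracy rate range can be illustrated with the textbook normal approximation for a proportion. This is only a sketch under a simple-random-sampling assumption; ITCSS used a weighted survey design, so this formula is not expected to reproduce the ranges published in this appendix. The function name `half_width_pct` and the sample sizes below are hypothetical.

```python
import math


def half_width_pct(p, n, z=1.96):
    """Approximate 95-percent half-width, in percentage points, for a
    proportion p estimated from a simple random sample of n calls.

    Assumption: normal approximation, no weighting, and no finite
    population correction.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes standing in for call site, regional, and
# national levels of data; p is set near the overall accuracy rate.
for label, n in [("smaller sample", 250), ("medium sample", 1000), ("larger sample", 4000)]:
    print(f"{label} (n={n}): plus or minus {half_width_pct(0.628, n):.2f} points")
```

Under this approximation, quadrupling the sample size halves the width of the range, which is the general pattern the paragraph describes.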
These data indicate that the Central Region accuracy rate clearly exceeded both the North Atlantic and Mid-Atlantic Regions' accuracy rates.

Figure I.3: Estimated Regional Accuracy Rates
[Bar chart; vertical axis: estimated percent correct; horizontal axis: IRS regions and national accuracy rates]

Table I.1: Estimated Regional Accuracy Rates and Accuracy Rate Ranges - 1989 IRS Test Call Survey Results

Figures in percent

IRS region        Estimated accuracy rate    Accuracy rate range
Central           67.7                       64.9 - 70.6
Mid-Atlantic      61.4                       58.6 - 64.2
Midwest           63.7                       61.3 - 66.1
North Atlantic    59.2                       55.7 - 62.6
Southeast         62.7                       59.7 - 65.8
Southwest         62.7                       59.0 - 65.7
Western           62.6                       59.8 - 65.5

Estimated National and Regional Accuracy Rates by Tax Law Category

Figure I.4 shows the estimated accuracy rates achieved by IRS telephone assistors within each tax law category, and table I.2 shows the accuracy rate range data associated with these estimates. These data illustrate that telephone assistors clearly had the most difficulty providing correct responses to questions dealing with capital gains.
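One way to read the regional comparison drawn from table I.1 is to ask which regions' accuracy rate ranges lie entirely below the Central Region's range. The sketch below applies that non-overlap rule; treating non-overlapping ranges as the meaning of "clearly exceeded" is an assumption made here for illustration, not a method stated in the report. The Central Region's lower bound is taken as 64.9, and the names `REGION_RANGES` and `regions_clearly_below` are hypothetical.

```python
# Accuracy rate ranges (percent) by IRS region, as read from table I.1.
# Assumption: the Central Region's lower bound is 64.9.
REGION_RANGES = {
    "Central": (64.9, 70.6),
    "Mid-Atlantic": (58.6, 64.2),
    "Midwest": (61.3, 66.1),
    "North Atlantic": (55.7, 62.6),
    "Southeast": (59.7, 65.8),
    "Southwest": (59.0, 65.7),
    "Western": (59.8, 65.5),
}


def regions_clearly_below(reference, ranges):
    """Return regions whose whole range lies below the reference region's range."""
    ref_low, _ = ranges[reference]
    return sorted(
        name
        for name, (_, high) in ranges.items()
        if name != reference and high < ref_low
    )


below_central = regions_clearly_below("Central", REGION_RANGES)
print("Ranges entirely below the Central Region's range:", below_central)
```

Under this rule, only the two regions named in the text fall entirely below the Central Region's range; the other four regions' ranges overlap it.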
Figure I.4: Estimated National Accuracy Rates by Tax Law Category
[Bar chart; vertical axis: estimated percent correct; horizontal axis: tax law categories]

Table I.2: Estimated National Accuracy Rates and Accuracy Rate Ranges by Tax Law Category - 1989 IRS Test Call Survey Results

Figures in percent

Tax law category         Estimated accuracy rate    Accuracy rate range
Filing Information       68.3                       65.3 - 71.3
Exemptions               66.7                       63.6 - 69.8
Individual Income        62.7                       59.6 - 65.8
Capital Gains            44.9                       41.3 - 48.5
Pensions                 65.3                       62.4 - 68.2
Adjustments to Income    59.8                       56.9 - 62.7
Tax Computation          67.7                       65.3 - 70.1

Source: Internal Revenue Service

Figures I.5 and I.6 show the estimated accuracy rates, and table I.3 shows the corresponding accuracy rate ranges for each IRS region in each tax law category. For purposes of comparison, we have included national accuracy rates and ranges for the same tax law categories.
Figure I.5: Estimated Regional Accuracy Rates by the Tax Law Categories of Filing Information, Exemptions, Individual Income, and Capital Gains
[Bar chart; vertical axis: estimated percent correct; series: Filing Information, Exemptions, Individual Income, Capital Gains]

Figure I.6: Estimated Regional Accuracy Rates by the Tax Law Categories of Pensions, Adjustments to Income, and Tax Computation
[Bar chart; vertical axis: estimated percent correct; horizontal axis: IRS regions and national accuracy rates; series: Pensions, Adjustments to Income, Tax Computation]

Table I.3: Estimated Regional Accuracy Rates and Accuracy Rate Ranges by Tax Law Category - 1989 IRS Test Call Survey Results

Figures in percent

Tax law category / IRS region    Estimated accuracy rate    Accuracy rate range

Filing Information
  Central                        74.1                       66.7 - 81.5
  Mid-Atlantic                   66.1                       58.6 - 73.6
  Midwest                        66.6                       60.0 - 73.2
  North Atlantic                 66.2                       57.1 - 75.3
  Southeast                      70.5                       62.6 - 78.4
  Southwest                      66.7                       58.7 - 74.7
  Western                        67.9                       60.3 - 75.5
  (National average)             (68.3)                     (65.3 - 71.3)

Exemptions
  Central                        72.7                       65.0 - 80.4
  Mid-Atlantic                   63.6                       55.8 - 71.4
  Midwest                        69.3                       62.7 - 75.9
  North Atlantic                 62.0                       52.4 - 71.6
  Southeast                      67.2                       58.9 - 75.5
  Southwest                      66.2                       58.0 - 74.4
  Western                        66.4                       58.6 - 74.2
  (National average)             (66.7)                     (63.6 - 69.8)

Individual Income
  Central                        67.5                       59.6 - 75.4
  Mid-Atlantic                   59.5                       51.7 - 67.3
  Midwest                        63.9                       57.2 - 70.6
  North Atlantic                 59.7                       50.3 - 69.1
  Southeast                      63.1                       54.8 - 71.4
  Southwest                      62.9                       54.7 - 71.1
  Western                        62.2                       54.4 - 70.0
  (National average)             (62.7)                     (59.6 - 65.8)

Capital Gains
  Central                        51.9                       42.5 - 61.3
  Mid-Atlantic                   44.9                       36.1 - 53.7
  Midwest                        43.7                       35.9 - 51.5
  North Atlantic                 40.8                       30.2 - 51.4
  Southeast                      41.5                       32.0 - 51.0
  Southwest                      48.0                       38.6 - 57.4
  Western                        44.3                       35.3 - 53.3
  (National average)             (44.9)                     (41.3 - 48.5)

Pensions
  Central                        69.1                       61.7 - 76.5
  Mid-Atlantic                   64.6                       57.4 - 71.8
  Midwest                        67.2                       60.9 - 73.5
  North Atlantic                 59.6                       50.6 - 68.6
  Southeast                      66.1                       58.3 - 73.9
  Southwest                      66.8                       59.2 - 74.4
  Western                        63.4                       56.0 - 70.8
  (National average)             (65.3)                     (62.4 - 68.2)

Adjustments to Income
  Central                        65.8                       58.5 - 73.1
  Mid-Atlantic                   59.6                       52.5 - 66.7
  Midwest                        59.5                       53.2 - 65.8
  North Atlantic                 57.1                       48.4 - 65.8
  Southeast                      57.9                       50.1 - 65.7
  Southwest                      57.6                       50.0 - 65.2
  Western                        62.2                       55.0 - 69.4
  (National average)             (59.8)                     (56.9 - 62.7)

Tax Computation
  Central                        69.9                       63.7 - 76.1
  Mid-Atlantic                   66.1                       60.1 - 72.1
  Midwest                        70.6                       65.5 - 75.7
  North Atlantic                 64.6                       57.3 - 71.9
  Southeast                      68.0                       61.6 - 74.4
  Southwest                      67.0                       60.7 - 73.3
  Western                        67.7                       61.7 - 73.7
  (National average)             (67.7)                     (65.3 - 70.1)

Source: Internal Revenue Service

Estimated Accuracy Rates for IRS Toll-Free Telephone Assistance Call Sites

Table I.4 below shows for the 1989 tax filing season the variations in accuracy rate ranges for the 29 telephone sites tested by IRS.

Table I.4: Accuracy Rate Ranges for IRS Toll-Free Telephone Sites - 1989 IRS Test Call Survey Results

Figures in percent

IRS telephone sites by region    Estimated accuracy rate    Accuracy rate range

Central Region
  Cincinnati                     65.6                       59.5 - 71.7
  Cleveland                      69.8                       63.8 - 75.7
  Detroit                        65.2                       59.1 - 71.3
  Indianapolis                   70.6                       65.7 - 75.6

Mid-Atlantic Region
  Baltimore                      64.1                       57.9 - 70.2
  Newark                         52.0                       45.6 - 58.4
  Philadelphia                   54.9                       48.5 - 61.3
  Pittsburgh                     72.6                       66.8 - 78.3
  Richmond                       61.1                       54.9 - 67.[illegible]

Midwest Region
  Chicago                        58.2                       52.8 - 63.5
  Des Moines                     69.7                       63.8 - 75.7
  Milwaukee                      65.0                       58.8 - 71.1
  Omaha                          73.0                       67.3 - 78.7
  St. Louis                      62.5                       56.2 - 68.7
  St. Paul                       69.1                       63.2 - 75.1

North Atlantic Region
  Boston                         67.7                       62.7 - 72.8
  Brooklyn                       52.1                       45.7 - 58.5
  Buffalo                        67.7                       61.7 - 73.7

Southeast Region
  Atlanta                        57.2                       51.9 - 62.6
  Jacksonville                   66.0                       60.9 - 71.2
  Nashville                      67.4                       62.3 - 72.5

Southwest Region
  Dallas                         59.6                       54.3 - 64.9
  Denver                         64.6                       58.4 - 70.7
  Houston                        65.5                       59.4 - 71.6
  Phoenix                        62.0                       55.8 - 68.2

Western Region
  El Monte                       56.0                       49.6 - 62.4
  Oakland                        65.1                       59.9 - 70.3
  Portland                       71.3                       65.5 - 77.1
  Seattle                        67.5                       62.5 - 72.6

Source: Internal Revenue Service

Appendix II
Objectives, Scope, and Methodology

Our objectives were to report on IRS' administration of its 1989 test call survey and on the validity of the statistical estimates produced during this test. To evaluate how well IRS administered ITCSS, we interviewed Taxpayer Service Division officials, reviewed IRS planning documents and managerial records, and monitored a randomly selected sample of test calls.
We also interviewed officials and reviewed documents from IRS' Statistics of Income Division and Mathematica Policy Research, Inc. (the contractor IRS selected to develop and implement the computerized response scoring program), for information pertaining to ITCSS' design and implementation. Finally, IRS' internal audit reviewed ITCSS procedures and results, and we interviewed the IRS auditors and reviewed their evaluation documentation. We did our work at the IRS National Office in Washington, D.C., from January 1989 to August 1989.

To evaluate the validity of the statistical estimates produced by ITCSS, we monitored and scored a statistically valid random sample of survey test calls and compared our scoring of those calls with documentation showing how IRS scored the same calls. IRS devised its test call survey sample plan to produce statistical estimates of the accuracy of its telephone assistors in answering scripted test questions involving tax law for individuals. ITCSS design methodology called for eight IRS test callers at the National Office to place a total of 16,368 randomly selected test calls over an 11-week period (Feb. 6, 1989, through Apr. 21, 1989) to 29 toll-free telephone assistance sites throughout the United States (see fig. 1). During the test period, IRS test callers actually completed and scored 14,876 test calls.

Each test caller was scheduled to make 186 test calls per week to various call sites and at various times specified in the test call sample. The test call sample assigned to each test caller was randomly selected from a pool of 62 tax law questions representing the seven major tax law categories in which IRS determined that individual taxpayers commonly ask questions. These seven tax law categories contained 32 subcategories of tax law, as shown in table 1. The ITCSS design methodology was developed to produce results that would have a sampling error of plus or minus 2 percent at the 95-percent level of confidence.
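The design figures in the paragraphs above are mutually consistent, as the following minimal sketch shows. It multiplies out the planned calls from the number of test callers, the weekly schedule per caller, and the length of the test period, and then compares the plan with the number of calls actually completed. The variable names are assumptions introduced for illustration.

```python
# Design parameters stated in the text.
TEST_CALLERS = 8
CALLS_PER_CALLER_PER_WEEK = 186
WEEKS = 11
COMPLETED_CALLS = 14_876

# Planned weekly and total call volumes implied by the design.
planned_per_week = TEST_CALLERS * CALLS_PER_CALLER_PER_WEEK
planned_total = planned_per_week * WEEKS

# Share of planned calls that were actually completed and scored (percent).
completion_pct = round(100.0 * COMPLETED_CALLS / planned_total, 1)

print("Planned calls per week:", planned_per_week)
print("Planned calls in total:", planned_total)
print("Percent of planned calls completed:", completion_pct)
```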
We began monitoring IRS’ test on Wednesday, February 22, 1989, about 2 weeks after IRS started its test call sample, and the first workday that telephone monitoring equipment supplied by IRS was operable. We continued our monitoring until Friday, April 14, 1989, a total of 38 test days. In order to comment on IRS’ accuracy results, we developed a sampling plan that called for us to listen to and score a randomly selected sample of 830 scheduled test calls that covered all test callers, daily time periods, and test questions. We calculated that this sample size would allow us to report our accuracy results for the period at the 95-percent level of confidence with a sampling error of plus or minus 2.5 percent.

To accomplish our monitoring, we developed monitoring records that incorporated the scripted test questions, probes, and responses used by IRS’ test callers. We used an individual monitoring record to document the scoring of each test call and to note any test caller deviations from the scripted test calls or assistor variations from acceptable probes. At the end of each day, we provided IRS with a listing of the calls we monitored, and IRS later provided us with documentation showing the ITCSS test callers’ scoring of the same test calls. We compared our scoring with IRS’ scoring for each monitored test call and documented the results.

We evaluated test caller deviations from the scripts to determine whether they could have had a material effect on the assistors’ responses. We determined that nine deviations were material (e.g., inappropriate information provided by the test caller either led to or preempted an assistor’s response), and we deleted those calls from our monitoring sample.
In addition to the nine calls deleted because of test caller script deviations, we were unable to monitor and score 244 test calls primarily due to (1) deviations from IRS’ calling schedule because of test caller absences and IRS staff meetings; (2) IRS’ inability to complete test calls as scheduled due to heavy call volumes at the sites called; and (3) occasional monitoring equipment problems, which impaired our ability to clearly hear the assistors’ responses. However, anticipating such problems, we purposely oversampled to accommodate lost calls. Although we oversampled, the number of lost calls exceeded our estimates and caused our sampling error to increase. Accordingly, the 577 test calls we monitored and scored are a statistically valid sample size that allows us to report our results with 95-percent confidence that our sampling error is no greater than plus or minus 4.4 percent.

Appendix III
Selected ITCSS Test Questions and Required Responses

This appendix presents two ITCSS test questions that were used in the 1989 test call survey. IRS and we agreed that these questions would not be used in the 1990 survey and, thus, we believe that they will provide readers of this report with concrete examples of the types of questions that comprise the test call survey.

To score ITCSS test questions, IRS and we agreed on the specific responses that would be categorized as (1) correct or (2) correct and complete. A correct answer was the minimal standard IRS expected its telephone assistors to meet. Answers that exceeded this standard would be classified as correct and complete. It was further agreed that all other answers would be scored as incorrect, meaning that the telephone assistor’s answer could lead taxpayers to a wrong result on their tax return.
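The three-way scoring scheme described above can be sketched in code. This is a minimal illustration under stated assumptions, not the contractor’s computerized scoring program: the function names, the use of sets of point labels, and the way the rules are encoded are all hypothetical. The two example rules mirror the scoring criteria printed for Sample Questions 1 and 2 in this appendix.

```python
def score_call(points_given, correct_rule, complete_rule):
    """Classify one test call under the agreed three-way scheme.

    points_given: set of probe/response labels the assistor covered,
                  e.g. {"P1", "R1"} (hypothetical representation).
    correct_rule, complete_rule: functions that take that set and
                  return True when the respective standard is met.
    """
    if complete_rule(points_given):
        return "correct and complete"
    if correct_rule(points_given):
        return "correct"
    return "incorrect"


# Sample Question 1: Correct = R1 and R2; complete adds R3.
def q1_correct(pts):
    return {"R1", "R2"} <= pts

def q1_complete(pts):
    return {"R1", "R2", "R3"} <= pts


# Sample Question 2: Correct = P1, P2, P3, R1 and (R2 or R3);
# complete requires all six points.
def q2_correct(pts):
    return {"P1", "P2", "P3", "R1"} <= pts and bool({"R2", "R3"} & pts)

def q2_complete(pts):
    return {"P1", "P2", "P3", "R1", "R2", "R3"} <= pts


# An assistor who gives R1 and R2 on Question 1 meets the minimal standard.
print(score_call({"R1", "R2"}, q1_correct, q1_complete))

# An assistor who skips probe P3 on Question 2 is scored incorrect,
# even though the response points given were right.
print(score_call({"P1", "P2", "R1", "R2"}, q2_correct, q2_complete))
```

Checking the stricter standard first keeps the categories mutually exclusive, so each call receives exactly one label.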
IRS’ reported 62.8-percent national accuracy rate for the 1989 tax filing season and our monitoring of how well IRS administered its test call survey were based on the agreed-upon scoring criteria for “correct” responses. Answers that met the correct and complete standard were also considered as correct for monitoring and scoring purposes.

For 48 of the 62 ITCSS test questions, scoring criteria required that assistors probe callers to obtain information that would be needed to answer their questions with a correct response. Of the 48 questions that required assistors to probe, 29 questions required 1 probe, 16 questions required 2 probes, and 3 questions required 3 probes. Probing is important because taxpayers who call with questions usually are not familiar with the tax laws and frequently do not know what information assistors need to answer their questions correctly. Without knowing certain facts about a taxpayer’s situation or status, assistors cannot be certain that the response they give would actually apply to the taxpayer. Assistors, therefore, must elicit that information from the taxpayer or provide a conditional response.

The two sample questions that follow illustrate test questions that require no probing and questions that require multiple probing. To assist the caller in judging whether assistors covered the required probes and gave the correct responses, the required probing and response points were enumerated individually.

Sample Question 1

Tax law category: Capital gains and losses.
Subcategory: Sale/Exchange of residence.

Question: My (husband/wife) and I have been working for a major corporation in Germany and have decided to sell our home in the United States. We were told that we only have 2 years in which to replace the property. Doing that will be a real burden on us since we’ll still be out of the United States.
Is there a way around that 2-year requirement for replacement? We are not eligible for the one-time exclusion for people 55 or older.

Background:
- Caller and spouse have been overseas for 6 months.
- Caller and spouse have not rented their U.S. home.
- Caller and spouse will be abroad about 3 years.
- Caller’s tax home is outside of United States.

Probing points: None

Response points:
R1: The replacement for your main home is extended to 4 years from the date of sale of your old home.
R2: You must occupy the new home within the 4-year period.
R3: Refer to Publication 523, Tax Information on Selling Your Home.

Scoring:
Correct: R1 and R2.
Correct and complete: R1 and R2 and R3.

Sample Question 2

Tax law category: Individual income.
Subcategory: Wages, alimony, and unemployment compensation.

Question: My father was unemployed part of last year. He only made $3,500 before he went on unemployment. Does he have to file a return?

Background:
- Father received $1,600 in unemployment compensation from the state and he made no contributions to the plan.
- Father received no interest or other income.
- Father is 61 and not blind.
- Father is single with no dependents.
- Father cannot be claimed as a dependent on caller’s (or anyone else’s) return.

Probing points:
P1: How much unemployment compensation did your father receive?
P2: How old is your father? Or, is your father 65 or older?
P3: What is your father’s filing status? Or, is your father married?

Response points:
R1: Yes, he must file a return.
R2: His unemployment benefits are taxable.
R3: His total income exceeds the threshold for filing; or his total income exceeds $4,950; or his total income exceeds his standard deduction and personal exemption.

Scoring:
Correct: P1 and P2 and P3 and R1 and (R2 or R3).
Correct and complete: P1 and P2 and P3 and R1 and R2 and R3.

Appendix IV
Comments From the Internal Revenue Service

DEPARTMENT OF THE TREASURY
INTERNAL REVENUE SERVICE
WASHINGTON, D.C. 20224

Mr. Richard L. Fogel
Assistant Comptroller General
United States General Accounting Office
Washington, DC 20548

Dear Mr. Fogel:

We have reviewed your recent draft report entitled “Tax Administration: Monitoring the Accuracy and Administration of IRS’ 1989 Test Call Survey,” which was produced at the request of the Chairman, Subcommittee on Oversight, House Committee on Ways and Means.

We generally agree with the report’s findings which validate the design and our administration of the Integrated Test Call Survey System (ITCSS). However, we recommend deletion of the tables in Appendix I that present accuracy rates for each telephone site. Because of the smaller sample pertaining to each site, it is not possible to achieve statistical validity without presenting data in ranges that are too wide to be meaningful. For this reason, it has been IRS policy not to release individual call site data. We have no objection to publishing call site data at the end of the next filing season if we can work with GAO to assure statistically valid data.

We would also like to work with GAO to provide any data that is necessary to release the report for the 1990 filing season. We believe that earlier release of the report would avoid public confusion and the consequent increase in the volume of calls that occur when the report is released at the beginning of the next filing season. It would also help us focus on remedial actions in planning for the next filing season.

The IRS is not satisfied with the accuracy rate that we achieved last year. One of my major goals is to improve the taxpayer service accuracy rate for 1990 and we are taking steps to achieve this improvement.
For example, a test site in Boston provides IRS telephone assistors with a computerized data system designed to ensure that taxpayers are asked all necessary questions and correct answers are provided by telephone assistors. Taxpayer Service staff throughout the country have been provided with written desk guides that use these techniques to teach assistors to fully and accurately respond to taxpayer inquiries. We have also used the test call data from the 1989 filing season to modify our training of telephone assistors to improve weak areas. These and other actions lead us to believe that the 1990 filing season will see substantial improvements in our telephone tax assistance.

Best regards.

Sincerely,

Appendix V
Major Contributors to This Report

General Government Division, Washington, D.C.
Larry H. Endy, Assistant Director, Tax Policy and Administration Issues
Robert P. Glick, Assignment Manager
Martin S. Morris, Tax Attorney
William F. Bley, Evaluator-in-Charge
Susan Ragland, Evaluator
Maria Z. Oliver, Evaluator

Program Evaluation and Methodology Division, Washington, D.C.
Harry M. Conley III, Statistician

Related GAO Products

Accessibility, Timeliness, and Accuracy of IRS’ Telephone Assistance Program (GAO/GGD-89-30, Feb. 2, 1989).
Tax Administration: Monitoring the Accuracy and Administration of IRS' 1989 Test Call Survey
Published by the Government Accountability Office on 1990-01-04.