ITBS Research Guide - Iowa Testing Programs
Contents

Part 1: Nature and Purposes of The Iowa Tests®
  The Iowa Tests
  Major Purposes of the ITBS Batteries
  Validity of the Tests
  Description of the ITBS Batteries: Names of the Tests; Description of the Test Batteries; Nature of the Batteries; Nature of the Levels; Grade Levels and Test Levels; Test Lengths and Times; Nature of the Questions; Mode of Responding; Directions
  Other Iowa Tests: Iowa Writing Assessment; Listening Assessment for ITBS; Constructed-Response Supplement to The Iowa Tests
  Other Manuals

Part 2: The National Standardization Program
  Planning the National Standardization Program
  Procedures for Selecting the Standardization Sample: Public School Sample; Catholic School Sample; Private Non-Catholic School Sample; Summary
  Design for Collecting the Standardization Data
  Weighting the Samples
  Racial-Ethnic Representation
  Participation of Students in Special Groups
  Empirical Norms Dates
  School Systems Included in the 2000 Standardization Samples: New England and Mideast; Southeast; Great Lakes and Plains; West and Far West

Part 3: Validity in the Development and Use of The Iowa Tests
  Validity in Test Use
  Criteria for Evaluating Achievement Tests
  Validity of the Tests
  Statistical Data to Be Considered
  Validity of the Tests in the Local School
  Domain Specifications
  Content Standards and Development Procedures: Curriculum Review; Preliminary Item Tryout; National Item Tryout; Fairness Review; Development of Individual Tests; Critical Thinking Skills
  Other Validity Considerations: Norms Versus Standards; Using Tests to Improve Instruction; Using Tests to Evaluate Instruction; Local Modification of Test Content; Predictive Validity; Readability

Part 4: Scaling, Norming, and Equating The Iowa Tests
  Frames of Reference for Reporting School Achievement
  Comparability of Developmental Scores Across Levels: The Growth Model
  The National Standard Score Scale
  Development and Monitoring of National Norms for the ITBS
  Trends in Achievement Test Performance
  Norms for Special School Populations
  Equivalence of Forms
  Relationships of Forms A and B to Previous Forms

Part 5: Reliability of The Iowa Tests
  Methods of Determining, Reporting, and Using Reliability Data
  Internal-Consistency Reliability Analysis
  Equivalent-Forms Reliability Analysis
  Sources of Error in Measurement
  Standard Errors of Measurement for Selected Score Levels
  Effects of Individualized Testing on Reliability
  Stability of Scores on the ITBS

Part 6: Item and Test Analysis
  Difficulty of the Tests
  Discrimination
  Ceiling and Floor Effects
  Completion Rates
  Other Test Characteristics

Part 7: Group Differences in Item and Test Performance
  Standard Errors of Measurement for Groups
  Gender Differences in Achievement
  Racial-Ethnic Differences in Achievement
  Differential Item Functioning

Part 8: Relationships in Test Performance
  Correlations Among Test Scores for Individuals
  Structural Relationships Among Content Domains: Levels 9 through 14; Levels 7 and 8; Levels 5 and 6; Interpretation of Factors
  Reliabilities of Differences in Test Performance
  Correlations Among Building Averages
  Relations Between Achievement and General Cognitive Ability
  Predicting Achievement from General Cognitive Ability: Individual Scores (Obtained Versus Expected Achievement)
  Predicting Achievement from General Cognitive Ability: Group Averages

Part 9: Technical Consideration for Other Iowa Tests
  Iowa Tests of Basic Skills Survey Battery: Description of the Tests; Other Scores; Test Development; Standardization; Test Score Characteristics
  Iowa Early Learning Inventory: Description of the Inventory; Test Development; Standardization
  Iowa Writing Assessment: Description of the Test; Test Development; Standardization; Test Score Characteristics
  Constructed-Response Supplement to The Iowa Tests: Description of the Tests; Test Development; Joint Scaling with the ITBS; Test Score Characteristics
  Listening Assessment for ITBS: Description of the Test; Test Development; Standardization; Test Score Characteristics; Predictive Validity
  Integrated Writing Skills Test: Description of the Tests; Test Development; Standardization; Test Score Characteristics
  Iowa Algebra Aptitude Test: Description of the Test; Test Development; Standardization; Test Score Characteristics

Works Cited
Index

Tables and Figures

Part 1: Nature and Purposes of The Iowa Tests®
  Table 1.1 Test and Grade Level Correspondence
  Table 1.2 Number of Items and Test Time Limits

Part 2: The National Standardization Program
  Table 2.1 Summary of Standardization Schedule
  Table 2.2 Sample Size and Percent of Students by Type of School
  Table 2.3 Percent of Public School Students by Geographic Region
  Table 2.4 Percent of Public School Students by SES Category
  Table 2.5 Percent of Public School Students by District Enrollment
  Table 2.6 Percent of Catholic Students by Diocese Size and Geographic Region
  Table 2.7 Percent of Private Non-Catholic Students by Geographic Region
  Table 2.8 Racial-Ethnic Representation
  Table 2.9 Test Accommodations—Special Education and 504 Students
  Table 2.10 Test Accommodations—English Language Learners

Part 3: Validity in the Development and Use of The Iowa Tests
  Figure 3.1 Steps in Development of the Iowa Tests of Basic Skills
  Table 3.1 Distribution of Skills Objectives for the Iowa Tests of Basic Skills, Forms A and B
  Table 3.2 Types of Reading Materials
  Table 3.3 Reading Content/Process Standards
  Table 3.4 Listening Content/Process Standards
  Table 3.5 Comparison of Language Tests by Battery
  Table 3.6 Computational Skill Level Required for Math Problem Solving and Data Interpretation
  Table 3.7 Summary Data from Predictive Validity Studies
  Table 3.8 Readability Indices for Selected Tests

Part 4: Scaling, Norming, and Equating The Iowa Tests
  Table 4.1 Comparison of Grade-to-Grade Overlap
  Table 4.2 Differences Between National Percentile Ranks
  Figure 4.1 Trends in National Performance
  Table 4.3 Summary of Median Differences
  Figure 4.2 Trends in Iowa Performance
  Table 4.4 Sample Sizes for Equating Forms A and B

Part 5: Reliability of The Iowa Tests
  Table 5.1 Test Summary Statistics
  Table 5.2 Equivalent-Forms Reliabilities, Levels 5–14
  Table 5.3 Estimates of Equivalent-Forms Reliability
  Table 5.4 Mean (Grades 3–8) Reliability Coefficients: Reliability Types Analysis by Tests
  Table 5.5 Test-Retest Reliabilities, Levels 5–8
  Table 5.6 Standard Errors of Measurement for Selected Standard Score Levels
  Table 5.7 Correlations Between Developmental Standard Scores, Forms A and B
  Table 5.8 Correlations Between Developmental Standard Scores, Forms K and L

Part 6: Item and Test Analysis
  Table 6.1 Word Analysis Content Classifications with Item Norms
  Table 6.2 Usage and Expression Content Classifications with Item Norms
  Table 6.3 Distribution of Item Difficulties
  Table 6.4 Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
  Table 6.5 Ceiling Effects, Floor Effects, and Completion Rates

Part 7: Group Differences in Item and Test Performance
  Table 7.1 Standard Errors of Measurement in the Standard Score Metric for ITBS, by Level and Gender and by Level and Group
  Table 7.2 Male-Female Effect Sizes for Average Achievement
  Table 7.3 Descriptive Statistics by Gender
  Table 7.4 Gender Differences in Achievement over Time
  Table 7.5 Race Differences in Achievement
  Table 7.6 Effect Sizes for Racial-Ethnic Differences in Average Achievement
  Table 7.7 Fairness Reviewers
  Table 7.8 Number of Items Identified in Category C in National DIF Study

Part 8: Relationships in Test Performance
  Table 8.1 Correlations Among Developmental Standard Scores
  Table 8.2 Reliabilities of Differences Among Scores for Major Test Areas: Developmental Standard Scores
  Table 8.3 Reliabilities of Differences Among Tests: Developmental Standard Scores
  Table 8.4 Correlations Among School Average Developmental Standard Scores
  Table 8.5 Correlations Between Standard Age Scores and Developmental Standard Scores
  Table 8.6 Reliabilities of Difference Scores and Standard Deviations of Difference Scores Due to Errors of Measurement
  Table 8.7 Correlations, Prediction Constants, and Standard Errors of Estimate for School Averages

Part 9: Technical Consideration for Other Iowa Tests
  Table 9.1 Test Summary Statistics (Iowa Tests of Basic Skills–Survey Battery, Form A)
  Table 9.2 Average Reliability Coefficients, Grades 3–8 (Iowa Writing Assessment)
  Table 9.3 Correlations and Reliability of Differences (Iowa Writing Assessment and Iowa Tests of Basic Skills Language Total)
  Table 9.4 Internal-Consistency Reliability (Constructed-Response Supplement)
  Table 9.5 Correlations and Reliabilities of Differences (Constructed-Response Supplement and Corresponding ITBS Subtests)
  Table 9.6 Test Summary Statistics (Listening Assessment for ITBS)
  Table 9.7 Correlations Between Listening and ITBS Achievement
  Table 9.8 Correlations Between Listening Grade 2 and ITBS Grade 3
  Table 9.9 Test Summary Statistics (Integrated Writing Skills Test, Form M)
  Table 9.10 Correlations Between IWST and ITBS Reading and Language Tests
  Table 9.11 Test Summary Statistics (Iowa Algebra Aptitude Test–Grade 8)
  Table 9.12 Correlations Between IAAT and Algebra Grades and Test Scores

PART 1: Nature and Purposes of The Iowa Tests®

The Iowa Tests

The Iowa Tests consist of a variety of educational achievement instruments developed by the faculty and professional staff at Iowa Testing Programs at The University of Iowa. The Iowa Tests of Basic Skills® (ITBS®) measure educational achievement in 15 subject areas for kindergarten through grade 8. The Iowa Tests of Educational Development® (ITED®) measure educational achievement in nine subject areas for grades 9 through 12. These test batteries share a history of development that has been an integral part of the research program in educational measurement at The University of Iowa for the past 70 years. In addition to these achievement batteries, The Iowa Tests include specialized instruments for specific achievement domains.

This Guide to Research and Development is devoted primarily to the ITBS and related assessments. The Guide to Research and Development for the ITED contains technical information about that test battery and related assessments.

Major Purposes of the ITBS Batteries

The purpose of measurement is to provide information that can be used to improve instruction and learning. Assessment of any kind has value to the extent that it results in better decisions for students. In general, these decisions apply to choosing goals for instruction and learning strategies to achieve those goals, designing effective classroom environments, and meeting the diverse needs and characteristics of students.

The Iowa Tests of Basic Skills measure growth in fundamental areas of school achievement: vocabulary, reading comprehension, language, mathematics, social studies, science, and sources of information. The achievement standards represented by the tests are crucial in educational development because they can determine the extent to which students will benefit from later instruction. Periodic assessment in these areas is essential to tailor instruction to individuals and groups, to provide educational guidance, and to evaluate the effectiveness of instruction.

Validity of the Tests

The most valid assessment of achievement for a particular school is one that most closely defines that school's education standards and goals for teaching and learning. Ideally, the skills and abilities required for success in assessment should be the same skills and abilities developed through local instruction. Whether this ideal has been attained in the Iowa Tests of Basic Skills is something that must be determined from an item-by-item examination of the test battery early in the decision-making process.

Common practices to validate test content have been used to prepare individual items for The Iowa Tests. The content standards were determined through consideration of typical course coverage, current teaching methods, and recommendations of national curriculum groups. Test content has been carefully selected to represent best curriculum practice, to reflect current performance standards, and to represent diverse populations. The arrangement of items into levels within tests follows a scope and sequence appropriate to a particular level of teaching and cognitive development. Items are selected for content relevance from a larger pool of items tried out with a range of students at each grade level.

Throughout the battery, efforts have been made to emphasize the functional value of what students learn in school. Students' abilities to use what they learn to interpret what they read, to analyze language, and to solve problems are tested in situations that approximate—to the extent possible with a paper and pencil test—actual situations in which students may use these skills.

Ultimately, the validity of information about achievement derived from The Iowa Tests depends on how the information is used to improve instruction and learning. Over the years, the audience for assessment information has grown. Today it represents varied constituencies concerned about educational progress at local, state, and national levels. To make assessment information useful, careful attention must be paid to reporting results to students, to parents and teachers, to school administrators and board members, and to the public. Descriptions of the types of score reports provided with The Iowa Tests are included in the Interpretive Guide for Teachers and Counselors and the Interpretive Guide for School Administrators. How to present test results to various audiences is discussed in these guides.

Description of the ITBS Batteries

Names of the Tests

Iowa Tests of Basic Skills® (ITBS®): Form A, Level 5; Form A, Level 6; Forms A and B, Levels 7 and 8; Forms A and B, Levels 9–14.

Description of the Test Batteries

The ITBS includes three batteries that allow for a variety of testing needs:
• The Complete Battery consists of five to fifteen subtests, depending on level, and is available at Levels 5 through 14.
• The Core Battery consists of a subset of tests in the Complete Battery, including all tests that assess reading, language, and math. It is available at Levels 7 through 14.
• The Survey Battery consists of 30-minute tests on reading, language, and math. Items in the Survey Battery come from tests in the Complete Battery. It is available at Levels 7 through 14.

Nature of the Batteries

Levels 5–8: Levels 5 and 6 of Form A are published as a Complete Battery; there is no separate Core Battery or Survey Battery for these levels. Levels 7 and 8 of Forms A and B are published as a Complete Battery (twelve tests), a Core Battery (nine tests), and a Survey Battery (three tests).

Levels 9–14: Levels 9 through 14 of Forms A and B are published in a Complete Battery (thirteen tests) and a Survey Battery (three tests). At Level 9, two additional tests are available, Word Analysis and Listening. For Level 9 only, a machine-scorable Complete Battery, a Core Battery (eleven tests), and a Survey Battery are available. Levels 10 through 14 have no separate Core Battery booklet; all Core tests are part of the Complete Battery booklet.

Nature of the Levels

Levels 5–6 (Grades K.1–1.9)

The achievement tests included in the Complete Battery are listed below. The Composite score for these levels, Core Total, includes only the tests preceded by a solid circle (•). Those included in the Reading Profile Total are followed by an asterisk (*). Abbreviations used in this Guide appear in parentheses.

• Vocabulary* (V)
Word Analysis* (WA)
Listening* (Li)
• Language (L)
• Mathematics (M)
Reading: Words* (Level 6 only) (RW)
Reading: Comprehension* (Level 6 only) (RC)

Levels 7–8 (Grades 1.7–3.2)

The achievement tests included in the Complete Battery and the Core Battery are listed below. Those in the Core Battery are preceded by a solid circle (•). Those included in the Reading Profile Total are followed by an asterisk (*). Test abbreviations are given in parentheses.

• Vocabulary* (V)
• Word Analysis* (WA)
• Reading* (RC)
• Listening* (Li)
• Spelling* (L1)
• Language (L)
• Math Concepts (M1)
• Math Problems (M2)
• Math Computation (M3)
Social Studies (SS)
Science (SC)
Sources of Information (SI)

Levels 9–14 (Grades 3.0–9.9)

The achievement tests in the Complete Battery are listed below. Those in the Core Battery are preceded by a solid circle (•). Those tests included in the Reading Profile Total for Level 9 are followed by an asterisk (*).

• Vocabulary* (V)
• Reading Comprehension* (RC)
• Word Analysis* (Level 9 only) (WA)
• Listening* (Level 9 only) (Li)
• Spelling* (L1)
• Capitalization (L2)
• Punctuation (L3)
• Usage and Expression (L4)
• Math Concepts and Estimation (M1)
• Math Problem Solving and Data Interpretation (M2)
• Math Computation (M3)
Social Studies (SS)
Science (SC)
Maps and Diagrams (S1)
Reference Materials (S2)

Tests in the Survey Battery—Reading, Language, and Mathematics—comprise items from the Complete Battery. Each test is divided into the parts indicated.

Reading (two parts): Vocabulary; Comprehension
Language
Mathematics (three parts): Concepts, Problem Solving and Data Interpretation; Estimation; Computation
Grade Levels and Test Levels

Levels 5 through 14 represent a comprehensive assessment program for kindergarten through grade 9. Each level is numbered to correspond roughly to the age of the student for whom it is best suited. A student should be given the level most compatible with his or her level of academic development. Typically, students in kindergarten and grades 1 and 2 would take only three of the Primary Battery's four levels before taking Level 9 in grade 3. Table 1.1 shows how test level corresponds to a student's level of academic development, expressed as a grade range. Decimals in the last column indicate month of the school year. For example, K.1–1.5 means the first month of kindergarten through the fifth month of grade 1.

Table 1.1 Test and Grade Level Correspondence
Iowa Tests of Basic Skills, Forms A and B

Test Level   Age   Grade Level
5            5     K.1–1.5
6            6     K.7–1.9
7            7     1.7–2.3
8            8     2.3–3.2
9            9     3.0–3.9
10           10    4.0–4.9
11           11    5.0–5.9
12           12    6.0–6.9
13           13    7.0–7.9
14           14    8.0–9.9

Test Lengths and Times

For Levels 5 through 8, the number of questions and approximate working time for each test are given in Table 1.2. Tests at these levels are untimed; the actual time required for a test varies somewhat with the skill level of the students. (The administration times in the table are based on average rates reported by teachers in tryout sessions.) The Level 6 Reading test is administered in two sessions. For Levels 9 through 14, all tests are timed; the administration times include time to read directions as well as to take the tests.

Nature of the Questions

For Levels 5 through 8, questions are read aloud, except for parts of the Reading test at Level 6 and, at Levels 7 and 8, the Reading test and parts of the Vocabulary and Math Computation tests. Questions are multiple choice with three or four response options. Responses are presented in pictures, letters, numerals, or words, depending on the test and level. All questions in Levels 9 through 14 are multiple choice, have four or five options, and are read by the student.

Mode of Responding

Students who take Levels 5 through 8 mark answers in machine-scorable booklets by filling in a circle. Those who take Levels 9 through 14 mark answers on a separate answer folder (Complete Battery) or answer sheet (Survey Battery). For the machine-scorable booklets at Level 9, students mark answers in the test booklets.

Directions

A separate Directions for Administration manual is provided for each Complete Battery (Levels 5 through 8) and Core Battery (Levels 7 and 8) level and form. The Survey Battery (Levels 7 and 8) has separate Directions for Administration manuals for each level and form. At Levels 9 through 14, there is one Directions for Administration manual for Forms A and B of the Complete Battery. At these levels, the Survey Battery has a single Directions for Administration manual. The machine-scorable booklets of Level 9 have separate Directions for Administration manuals.

Table 1.2 Number of Items and Test Time Limits
Iowa Tests of Basic Skills, Forms A and B

Level 5: Complete Battery
Test                     Approximate Working Time     Number of Items
• Vocabulary             20 min.                      29
Word Analysis            20 min.                      30
Listening                30 min.                      29
• Language               25 min.                      29
• Mathematics            25 min.                      29
• Core Tests             1 hr., 10 min.               87
Complete Battery         2 hrs.                       146

Level 6: Complete Battery
Test                     Approximate Working Time     Number of Items
• Vocabulary             20 min.                      31
Word Analysis            20 min.                      35
Listening                30 min.                      31
• Language               25 min.                      31
• Mathematics            25 min.                      35
Reading: Words           23 min.                      29
Reading: Comprehension   20 min.                      19
• Core Tests             1 hr., 10 min.               97
Complete Battery         2 hrs., 43 min.              211

Level 7: Complete and Core Battery
Test                     Approximate Working Time     Number of Items
• Vocabulary             15 min.                      30
• Word Analysis          15 min.                      35
• Reading                35 min.                      34
• Listening              25 min.                      31
• Spelling               15 min.                      23
• Language               15 min.                      23
• Math Concepts          20 min.                      29
• Math Problems          25 min.                      28
• Math Computation       20 min.                      27
Social Studies           25 min.                      31
Science                  25 min.                      31
Sources of Information   25 min.                      22
• Core Battery           3 hrs., 5 min.               260
Complete Battery         4 hrs., 20 min.              344

Level 8: Complete and Core Battery
Test                     Approximate Working Time     Number of Items
• Vocabulary             15 min.                      32
• Word Analysis          15 min.                      38
• Reading                35 min.                      38
• Listening              25 min.                      31
• Spelling               15 min.                      23
• Language               15 min.                      31
• Math Concepts          20 min.                      31
• Math Problems          25 min.                      30
• Math Computation       20 min.                      30
Social Studies           25 min.                      31
Science                  25 min.                      31
Sources of Information   30 min.                      28
• Core Battery           3 hrs., 5 min.               284
Complete Battery         4 hrs., 25 min.              374

Level 7: Survey Battery
Test                      Approximate Working Time    Number of Items
Reading                   30 min.                     40
Language                  25 min.                     34
Mathematics               22 min.                     27
Mathematics Computation   8 min.                      13
Survey Battery            1 hr., 25 min.              114

Level 8: Survey Battery
Test                      Approximate Working Time    Number of Items
Reading                   30 min.                     44
Language                  25 min.                     42
Mathematics               22 min.                     33
Mathematics Computation   8 min.                      17
Survey Battery            1 hr., 25 min.              136

Table 1.2 (continued) Number of Items and Test Time Limits
Iowa Tests of Basic Skills, Forms A and B

Levels 9–14: Complete and Core Battery (working time and number of items by level)
Test                                                    Time (min.)   9     10    11    12    13    14
• Vocabulary                                            15            29    34    37    39    41    42
• Reading Comprehension [1]                             25 + 30       37    41    43    45    48    52
• Spelling                                              12            28    32    36    38    40    42
• Capitalization                                        12            24    26    28    30    32    34
• Punctuation                                           12            24    26    28    30    32    34
• Usage and Expression                                  30            30    33    35    38    40    43
• Mathematics Concepts and Estimation [1]               25 + 5        31    36    40    43    46    49
• Mathematics Problem Solving and Data Interpretation   30            22    24    26    28    30    32
• Mathematics Computation                               15            25    27    29    30    31    32
Social Studies                                          30            30    34    37    39    41    43
Science                                                 30            30    34    37    39    41    43
Maps and Diagrams                                       30            24    25    26    28    30    31
Reference Materials                                     25            28    30    32    34    36    38
• Word Analysis [2]                                     20            35    –     –     –     –     –
• Listening [2]                                         25            31    –     –     –     –     –
• Core Battery [3]                                      211 (3 hrs., 31 min.)   250   279   302   321   340   360
Complete Battery [4]                                    326 (5 hrs., 26 min.)   362   402   434   461   488   515

[1] This test is administered in two parts.
[2] This test is untimed. The time given is approximate.
[3] With Word Analysis and Listening at Level 9, testing time is 256 min. (4 hrs., 16 min.) and the number of items is 316.
[4] With Word Analysis and Listening at Level 9, testing time is 371 min. (6 hrs., 11 min.) and the number of items is 428.

Levels 9–14: Survey Battery (working time and number of items by level)
Test                                     Time (min.)   9     10    11    12    13    14
Reading                                  30            27    30    32    34    36    37
  Part 1: Vocabulary                     5             10    11    12    13    14    14
  Part 2: Comprehension                  25            17    19    20    21    22    23
Language                                 30            43    47    51    54    57    59
Mathematics                              30            31    34    37    40    43    46
  Part 1: Concepts and Problems          22            19    21    23    25    27    29
  Part 2: Estimation                     3             4     4     5     5     6     6
  Part 3: Computation                    5             8     9     9     10    10    11
Survey Battery                           90 (1 hr., 30 min.)   101   111   120   128   136   142
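The Core and Complete figures in Table 1.2 are straightforward sums over the component tests: the Core row adds only the bulleted tests, and the Complete row adds every test in the battery. A minimal Python sketch of that arithmetic, using the Level 5 column reproduced above (the values come from Table 1.2; the code itself is illustrative and not part of the original Guide):

```python
# Level 5 Complete Battery figures from Table 1.2: minutes and items per test.
# core=True marks the bulleted Core tests (Vocabulary, Language, Mathematics).
LEVEL_5 = {
    "Vocabulary":    {"minutes": 20, "items": 29, "core": True},
    "Word Analysis": {"minutes": 20, "items": 30, "core": False},
    "Listening":     {"minutes": 30, "items": 29, "core": False},
    "Language":      {"minutes": 25, "items": 29, "core": True},
    "Mathematics":   {"minutes": 25, "items": 29, "core": True},
}

def battery_totals(tests):
    """Return (core_minutes, core_items, complete_minutes, complete_items)."""
    core = [t for t in tests.values() if t["core"]]
    return (sum(t["minutes"] for t in core),
            sum(t["items"] for t in core),
            sum(t["minutes"] for t in tests.values()),
            sum(t["items"] for t in tests.values()))

# Prints (70, 87, 120, 146): 1 hr., 10 min. and 87 items for the Core tests,
# 2 hrs. and 146 items for the Complete Battery, matching Table 1.2.
print(battery_totals(LEVEL_5))
```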
Other Iowa Tests

Iowa Writing Assessment

The Iowa Writing Assessment measures a student's ability to generate, organize, and express ideas in writing. This assessment includes four prompts that require students to compose an essay in either narrative, descriptive, persuasive, or expository modes. With norm-referenced evaluation of a student's writing about a specific topic, the Iowa Writing Assessment adds to the information obtained from other language tests and from the writing students do in the classroom.

Listening Assessment for ITBS

Content specifications for Levels 9 through 14 of the Listening tests are based on current literature in the teaching and assessment of listening comprehension. The main purposes of the Listening Assessment are: (a) to measure strengths and weaknesses in listening so effective instruction can be planned to meet individual and group needs; (b) to monitor listening instruction; and (c) to help make teachers and students aware of the importance of good listening strategies.

Constructed-Response Supplement to The Iowa Tests

These tests may be used with the Complete Battery and Survey Battery of the ITBS. The Constructed-Response Supplement measures achievement in reading, language, and math in an open-ended format. Students write answers in the test booklet, and teachers use the scoring guidelines to rate the responses. The results can be used to provide information about achievement to satisfy requirements for multiple measures.

Other Manuals

In addition to this Guide to Research and Development, several other manuals provide information for test users. Each Directions for Administration manual includes a section on preparing for test administration as well as the script needed to administer the tests. The Test Coordinator Guide offers suggestions about policies and procedures associated with testing, advice about planning for and administering the testing program, ideas about preparing students and parents, and details about how to prepare answer documents for the scoring service. The Interpretive Guide for Teachers and Counselors describes test content, score reports, use of test results for instructional purposes, and communication of results to students and parents. The Interpretive Guide for School Administrators offers additional information, including guidance on designing a districtwide assessment program and reporting test results. The Norms and Score Conversions booklets contain directions for hand scoring and norms tables for converting raw scores to derived scores such as standard scores and percentile ranks.

PART 2: The National Standardization Program

Normative data collected at the time of standardization is what distinguishes norm-referenced tests from other assessments. It is through the standardization process that scores, scales, and norms are developed. The procedures used in the standardization of The Iowa Tests are designed to make the norming sample reflect the national population as closely as possible, ensuring proportional representation of ethnic and socioeconomic groups.

The standardization of the Iowa Tests of Basic Skills (ITBS) Complete Battery and Survey Battery was a cooperative venture. It was planned by the ITBS authors, the publisher, and the authors of the Iowa Tests of Educational Development (ITED) and the Cognitive Abilities Test™ (CogAT®). Many public and non-public schools cooperated in national item tryouts and standardization activities, which included the 2000 spring and fall test administrations, scaling, and equating studies.

Planning the National Standardization Program

The standardization of the ITBS, ITED, and CogAT was carried out as a single enterprise. After reviewing previous national standardization programs, the basic principles and conditions of those programs were adapted to the following current needs:

• The sample should be selected to represent the national population with respect to ability and achievement.
It should be large enough to represent the diverse characteristics of the population, but a carefully selected sample of reasonable size would be preferred over a larger but less carefully selected sample. • Sampling units should be chosen primarily on the basis of school district size, region of country, and socioeconomic characteristics. A balance between public and non-public schools should be obtained. • The sample of attendance centers should be sufficiently large and selected to provide dependable norms for building averages. • Attendance centers in each part of the sample should represent the central tendency and variability of the population. • To ensure comparability of norms from grade to grade, all grades in a selected attendance center (or a designated fraction thereof) should be tested. • To ensure comparability of norms for ability and achievement tests, both the ITBS and the CogAT should be administered to the same students at the appropriate grade level. • To ensure comparability of norms for Complete and Survey Batteries, alternate forms of both batteries should be administered at the appropriate grade level to the same students or to equivalent samples of students. • To ensure applicability of norms to all students, testing accommodations for students who require them should be a regular part of the standardization design. Procedures for Selecting the Standardization Sample Public School Sample Three stratifying variables were used to classify public school districts across the nation: geographic region, district enrollment, and socioeconomic status (SES) of the school district. Within each geographic region (New England and Mideast, Southeast, Great Lakes and Plains, and West and Far West), school districts were stratified into nine enrollment categories. School district SES was determined with data from the National Education Database™ (Quality Education Data, 2002). The socioeconomic index is the percent of students in a district falling below the federal government poverty guideline, similar to the Orshansky index used in sampling for the National Assessment of Educational Progress (NAEP). This index was used in each of the four regions to break the nine district-size categories into five strata. 7 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 8 In each SES category, districts were selected at random and designated as first, second, or third choices. Administrators in the selected districts were contacted by the publisher and invited to participate. If a district declined, the next choice was contacted. Catholic School Sample The primary source for selecting and weighting the Catholic sample was NCEA/Ganley’s Catholic Schools in America (NCEA, 2000). Within each geographic region of the public sample, schools were stratified into five categories on the basis of diocesan enrollment. A two-stage random sampling procedure was used to select the sample. In the first stage, dioceses were randomly selected from each of five enrollment categories. Different sampling fractions were used, ranging from 1.0 for dioceses with total student enrollment above 100,000 (all four were selected) to .07 for dioceses with fewer than 10,000 students (seven of 102 were selected). In the second stage, schools were randomly chosen from each diocese selected in the first stage. In all but the smallest enrollment dioceses—where only one school was selected—two schools were randomly chosen. If the selected school declined to participate, the alternate school was contacted. 
If neither school agreed to participate, additional schools randomly selected from the diocese were contacted. Private Non-Catholic School Sample The sample of private non-Catholic schools was obtained from the QED data file. The schools in each geographic region of the public and Catholic samples were stratified into two types: churchrelated and nonsectarian. Schools were randomly sampled in eight categories (region by type of school) until the target number of students was reached. For each school selected, an alternate school was chosen to be contacted if the selected school declined to participate. Summary These sampling procedures produced (1) a national probability sample representative of students nationwide; (2) a nationwide sample of schools for school building norms; (3) data for Catholic/private and other special norms; and (4) empirical norms for the Complete Battery and the Survey Battery. 8 The authors and publisher of the ITBS are grateful to many people for assistance in preparing test materials and administering tests in item tryouts and special research projects. In particular, gratitude is acknowledged to administrators, teachers, and students in the schools that took part in the national standardization. These schools are listed at the end of this part of the Guide to Research and Development. Schools marked with an asterisk participated in both spring and fall standardizations. Design for Collecting the Standardization Data A timetable for administration of the ITBS and the CogAT is given in Table 2.1. This illustrates how the national standardization study was designed. During the spring standardization, students took the appropriate level of the Complete Battery of the ITBS, Form A. These same students took Form 6 of the CogAT. The design of the fall standardization was more complex. Every student in grades 2 through 8 participated in two units of testing. The order of the two testing units was counterbalanced. In the first testing unit, the student took the Complete Battery of either Form A or Form B of the ITBS. In grades 2 and 3, Forms A and B of the ITBS machine-scorable booklets were used in alternate classrooms. In approximately half of the grade 3 classrooms, alternate forms of the ITBS Level 8 were administered; in the remaining grade 3 classrooms, Forms A and B of Level 9 were administered to every other student. In grades 4 through 8, Forms A and B were administered to every other student in all classrooms. In the second testing unit of the fall standardization, students took Form A or Form B of the Survey Battery. (Students who had taken Form A of the Complete Battery took Form B of the Survey Battery and vice versa.) Weighting the Samples After materials from the spring standardization had been received by the Riverside Scoring Service®, the number and percents of students in each sample (public, Catholic, and private non-Catholic) and stratification category were determined. The percents were adjusted by weighting to compensate for missing categories and to adjust for schools that tested more or fewer students than required. 
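The selection logic described above combines stratified selection for the public sample with a two-stage design for the Catholic sample: dioceses are drawn first, using a sampling fraction tied to diocesan enrollment, and schools are then drawn within each selected diocese. Before turning to the schedule in Table 2.1 and the weighting details, the Python sketch below illustrates the two-stage step. The strata labels, the 1.0 and .07 fractions, and the two-schools-per-diocese rule follow the description above; the diocese and school names, the function itself, and its simplifications (a single schools-per-diocese setting, no alternate-school fallback) are hypothetical.

```python
import random

def two_stage_sample(dioceses, schools_by_diocese, fraction_by_stratum,
                     schools_per_diocese=2, seed=0):
    """Stage 1: draw dioceses within each enrollment stratum at that stratum's
    sampling fraction.  Stage 2: draw schools within each selected diocese."""
    rng = random.Random(seed)
    selected = []
    for stratum, fraction in fraction_by_stratum.items():
        pool = [d for d in dioceses if d["stratum"] == stratum]
        n_dioceses = max(1, round(fraction * len(pool))) if pool else 0
        for diocese in rng.sample(pool, min(n_dioceses, len(pool))):
            schools = schools_by_diocese[diocese["name"]]
            selected += rng.sample(schools, min(schools_per_diocese, len(schools)))
    return selected

# Hypothetical data patterned on the text: every diocese in the largest stratum
# is taken (fraction 1.0); only a small share of the smallest stratum (0.07).
fractions = {"100,000+": 1.0, "under 10,000": 0.07}
dioceses = [{"name": "Diocese A", "stratum": "100,000+"},
            {"name": "Diocese B", "stratum": "under 10,000"},
            {"name": "Diocese C", "stratum": "under 10,000"}]
schools = {"Diocese A": ["A1", "A2", "A3"], "Diocese B": ["B1", "B2"], "Diocese C": ["C1"]}
print(two_stage_sample(dioceses, schools, fractions))
```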
961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 9 Table 2.1 Summary of Standardization Schedule Time First Unit Spring 2000 ITBS, Form A Complete Battery (Levels 5 – 8, Grades K – 2) CogAT, Form 6 (Levels 1 – 2, Grades K – 3) ITBS, Form A Complete Battery (Levels 9 – 14, Grades 3 – 8) CogAT, Form 6 (Levels A – F, Grades 3 – 8) Fall 2000 Second Unit ITBS, Form A/B Complete Battery (Levels 7 – 8, Grades 2 – 3) ITBS, Form A/B Complete Battery (Levels 9 –14, Grades 3 – 8) ITBS, Form B/A Survey Battery (Levels 9 – 14, Grades 3 – 8) The number of students in the 2000 spring national standardization of the ITBS is given in Table 2.2 for the public, Catholic, and private non-Catholic samples. Table 2.2 also shows the unweighted and weighted sample percents and the population percents for each cohort. Tables 2.3 through 2.7 summarize the unweighted and weighted sample characteristics for the spring 2000 standardization of the ITBS based on the principal stratification variables of the public school sample and other key characteristics of the nonpublic sample. Optimal weights for these samples were determined by comparing the proportion of students nationally in each cohort to the corresponding sample proportion. Once the optimal weight for each sample was obtained, the stratification variables were simultaneously considered to assign final weights. These weights (integer values 0 through 9, with 3 denoting perfect proportional representation) were assigned to synthesize the characteristics of a missing unit or adjust the frequencies in other units. As a result, the weighted distributions in the three standardization samples closely approximate those of the total student population. In addition to the regular norms established in the 2000 national standardization, separate norms were established for special populations. These norms and the procedures used to derive them are discussed in Part 4. 
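The Guide states that the final weights are integers from 0 through 9, with 3 denoting perfect proportional representation, but it does not spell out the assignment rule. One plausible reading, shown as a hedged sketch below, is that a cell's weight scales the population-to-sample ratio by 3 and is clipped to the stated range; the SES percentages used for illustration are the sample and population figures reported in Table 2.4.

```python
def cell_weight(pop_pct, sample_pct, base=3, lo=0, hi=9):
    """Integer weight for a stratification cell.  base=3 marks perfect
    proportional representation, as stated in the Guide; the scaling rule
    (3 times the population-to-sample ratio, clipped to 0-9) is an assumption."""
    if sample_pct == 0:
        return hi  # synthesize a missing cell with the largest allowed weight (assumption)
    return min(hi, max(lo, round(base * pop_pct / sample_pct)))

# Sample % versus population % for the public school SES strata (Table 2.4).
ses_strata = {
    "High":         (12.2, 15.2),
    "High Average": (23.5, 19.1),
    "Average":      (36.8, 31.5),
    "Low Average":  (21.2, 19.1),
    "Low":          ( 6.3, 15.1),
}
for category, (sample_pct, pop_pct) in ses_strata.items():
    print(f"{category:12s} weight = {cell_weight(pop_pct, sample_pct)}")
```

Under this reading, the under-sampled Low SES stratum gets a large weight (7) and the over-sampled Average stratum stays near 3, which is consistent with the weighted percents in Tables 2.3 through 2.7 moving close to the population percents.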
9 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 10 Table 2.2 Sample Size and Percent of Students by Type of School Spring 2000 National Standardization Sample, ITBS, Grades K–8 Public School Sample Catholic School Sample Private Non-Catholic Sample Total Unweighted Sample Size 149,831 10,797 9,589 170,217 Unweighted Sample % 88.0 6.3 5.6 100.0 Weighted Sample % 90.1 4.9 5.0 100.0 Population % 90.1 4.9 5.0 100.0 Table 2.3 Percent of Public School Students by Geographic Region Spring 2000 National Standardization Sample, ITBS, Grades K–8 Geographic Region % of Students in Sample % of Students in Weighted Sample % of Students in Population New England and Mideast 14.7 22.2 21.7 Southeast 26.9 23.8 23.6 Great Lakes and Plains 25.0 22.3 21.9 West and Far West 33.4 31.7 32.7 Table 2.4 Percent of Public School Students by SES Category Spring 2000 National Standardization Sample, ITBS, Grades K–8 % of Students in Sample % of Students in Weighted Sample % of Students in Population High 12.2 15.3 15.2 High Average 23.5 19.2 19.1 Average 36.8 31.3 31.5 Low Average 21.2 19.1 19.1 6.3 15.2 15.1 SES Category Low 10 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 11 Table 2.5 Percent of Public School Students by District Enrollment Spring 2000 National Standardization Sample, ITBS, Grades K–8 District K–12 Enrollment % of Students in Sample % of Students in Weighted Sample % of Students in Population 100,000 + 3.9 9.7 15.6 50,000 – 99,999 5.6 9.6 8.5 25,000 – 49,999 11.4 17.9 11.4 10,000 – 24,999 23.3 20.1 17.7 5,000 – 9,999 15.8 10.2 14.7 2,500 – 4,999 17.6 13.6 14.7 1,200 – 2,499 12.5 9.2 10.2 600 – 1,199 7.4 6.9 4.5 Less than 600 2.5 2.8 2.7 Table 2.6 Percent of Catholic Students by Diocese Size and Geographic Region Spring 2000 National Standardization Sample, ITBS, Grades K–8 % of Students in Sample % of Students in Weighted Sample % of Students in Population 100,000 + 7.7 17.5 17.5 50,000 – 99,999 7.4 17.9 18.0 20,000 – 49,999 35.0 21.8 21.7 10,000 – 19,999 19.4 24.9 25.0 Less than 10,000 30.5 17.9 17.8 New England and Mideast 23.4 35.0 34.9 Southeast 17.5 13.7 13.6 Great Lakes and Plains 44.2 33.7 33.9 West and Far West 14.9 17.6 17.6 Diocese Size Geographic Region 11 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 12 Table 2.7 Percent of Private Non-Catholic Students by Geographic Region Spring 2000 National Standardization Sample, ITBS, Grades K–8 Geographic Region New England and Mideast % of Students in Sample % of Students in Weighted Sample % of Students in Population 9.7 24.0 23.8 Southeast 19.5 29.3 29.4 Great Lakes and Plains 34.0 19.8 19.7 West and Far West 36.8 26.9 27.1 Racial-Ethnic Representation Participation of Students in Special Groups Although not a direct part of a typical sampling plan, the racial-ethnic composition of a national standardization sample should represent that of the school population. The racial-ethnic composition of the 2000 ITBS spring standardization sample was estimated from responses to demographic questions on answer documents. In all grades, all the racialethnic group(s) to which a student belonged was requested. In kindergarten through grade 3, teachers furnished this information. In the remaining grades, students furnished it. The results reported in Table 2.8 include students in Catholic and other private schools. The table also shows estimates of population percents in public schools for each category, according to the National Center for Education Statistics. 
In the spring 2000 national standardization, schools were given detailed instructions for the testing of students with disabilities and English Language Learners. Schools were asked to decide whether students so identified should be tested, and, if so, what modifications in testing procedures were needed. The response rate for racial-ethnic information was high; 98 percent of the standardization participants indicated membership in one of the groups listed. Although the percents of students in each group fluctuate from grade to grade, differences between sample and population percents were generally within chance error. This was true for all groups except Hispanics or Latinos, who were slightly underrepresented. However, some of this underrepresentation can be attributed to school districts exempting from testing students whose first language is not English. These students are not as likely to be represented in the test-taking population as they are in the school population. Collectively, the results in Table 2.8 provide evidence of the overall quality of the national standardization sample and its representativeness of the racial and ethnic makeup of the U.S. student population. 12 Among students with disabilities, nearly all were identified as eligible for special education services and had an Individualized Education Program (IEP), an Individualized Accommodation Plan (IAP), or a Section 504 Plan. Schools were asked to examine the IEP or other plan for these students, decide whether the student should receive accommodations, and determine the nature of those accommodations. Schools were told an accommodation refers to a change in the procedures for administering the test and that an accommodation is intended to neutralize, as much as possible, the effect of the student’s disability on the assessment process. Accommodations should not change the kind of achievement being measured, but change how achievement is measured. When accommodations were used, the test administrator recorded the type of accommodation on each student’s answer document. The accommodations most frequently used by students with IEPs or Section 504 Plans were listed on the student answer document. Space for indicating other accommodations was also included. 
961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 13 Table 2.8 Racial-Ethnic Representation Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization White (62.1 %)* Grade Number Black or African American (17.2 %)* Percent Weighted Number Percent Grade Number Percent Weighted Number Percent K 11,137 67.8 33,201 57.0 K 2,735 16.7 11,343 19.5 1 12,648 70.2 29,729 58.7 1 2,629 14.6 7,773 15.3 2 13,529 71.9 30,560 59.1 2 2,323 12.3 6,780 13.1 3 13,308 72.3 33,713 65.6 3 2,229 12.1 7,336 14.3 4 13,437 71.6 33,866 64.6 4 2,024 10.8 7,028 13.4 5 14,516 72.3 35,010 67.0 5 2,146 10.7 7,231 13.8 6 14,776 73.1 35,509 70.0 6 2,223 11.0 7,404 14.6 7 14,346 75.0 36,519 73.1 7 1,731 9.1 6,160 12.3 8 12,146 71.8 37,154 71.4 8 1,877 11.1 8,503 16.3 Total 119,843 71.9 371,460 66.0 Total 19,917 11.9 62,371 14.6 Hispanic or Latino (15.6 %)* Grade Number Percent Weighted Number Asian/Pacific Islander (4.0 %)* Percent Grade Number Percent Weighted Number Percent K 1,839 11.2 6,399 11.0 K 429 2.6 1,493 2.6 1 1,975 11.0 5,732 11.3 1 460 2.6 1,192 2.4 2 2,084 11.1 5,785 11.2 2 537 2.9 1,291 2.5 3 1,941 10.5 6,894 13.4 3 497 2.7 1,406 2.7 4 2,080 11.1 7,127 13.6 4 553 2.9 1,468 2.8 5 2,031 10.1 7,499 14.3 5 591 2.9 1,487 2.8 6 1,745 8.6 4,643 9.1 6 614 3.0 1,602 3.2 7 1,647 8.6 4,466 8.9 7 477 2.5 1,328 2.7 8 1,490 8.8 4,379 8.4 8 485 2.9 1,697 3.3 Total 16,832 10.1 62,371 11.1 Total 4,643 2.9 15,584 2.8 American Indian/Alaskan Native (1.2 %)* Grade Number Percent Weighted Number Percent Native Hawaiian (NA) Grade Number Percent Weighted Number Percent K 207 1.3 878 1.5 K 70 0.4 194 0.3 1 225 1.2 885 1.7 1 69 0.5 157 0.3 2 250 1.3 946 1.8 2 95 0.5 148 0.3 3 364 2.0 1,279 2.5 3 80 0.4 142 0.3 4 500 2.7 1,806 3.4 4 181 1.0 498 1.0 5 673 3.4 2,128 4.1 5 111 0.6 247 0.5 6 749 3.7 2,244 4.4 6 109 0.5 251 0.5 7 789 4.1 2,476 5.0 7 136 0.7 379 0.8 8 656 3.9 2,148 4.1 8 274 1.6 969 1.9 4,413 2.6 17,782 3.2 Total 1,125 0.7 4,060 0.7 Total *Population percent (Source: Digest of Education Statistics 2000, 1999–2000 public school enrollment) 13 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 14 For students whose native language was not English and who had been in an English-only classroom for a limited time, two decisions had to be made prior to testing. First, was English language developed sufficiently to warrant testing, and, second, should an accommodation be used? In all instances, the district’s instructional guidelines were used in decisions about individual accommodations. Test administration for the 2000 spring standardization of the ITBS, Form A, took place between March 23 and May 29; it took place for the fall standardization between September 21 and November 11. The spring norming group was a national probability sample of approximately 170,000 students in kindergarten through grade 8; the fall sample was approximately 76,000 students. The test administrators were told that the use of testing accommodations with English Language Learners is intended to allow the measurement of skills and knowledge in the curriculum without significant interference from a limited opportunity to learn English. Those just beginning instruction in English were not likely to be able to answer many questions no matter what types of accommodations were used. For those in the second or third year of instruction in an English as a Second Language (ESL) program, accommodations might be warranted to reduce the effect of limited English proficiency on test performance. 
The types of accommodations sometimes used with such students were listed on the student answer document for coding. After answer documents were checked and scored and sampling weights had been assigned to schools, weighted opening and closing dates were determined. These are reference points for the empirical norms dates. The median empirical norms date for spring testing is April 30; for fall testing it is October 22. Table 2.9 summarizes the use of accommodations with students with disabilities during the standardization. While the percents vary somewhat across grades, an average of about 7 percent of the students were identified as special education students or as having a 504 Plan. Of these students, roughly 50 percent received at least one accommodation. The last column in the table shows that in the final distribution of scores from which the national norms were obtained, an average of 3 percent to 4 percent of the students received an accommodation. Table 2.10 reports similar information for English Language Learners. Empirical Norms Dates To provide more information for schools with alternative school calenders, data were collected from districts on their opening and closing dates. Procedures to analyze these data were altered from those used in the 1976–77 standardization—when the Title I program first required empirical norms dates—to determine weighted opening and closing dates. The procedures used and the advice given to school districts that do not have a standard 180-day, September-to-May school year are noted below. 14 Regular fall, midyear, and spring norms can be used by school districts that operate on a twelve-month schedule. To do so, testing should be scheduled so the number of instructional days prior to testing corresponds to the median number of instructional days for schools in the national standardization. For example, the fall norms for the 2000 national standardization were established with a median testing date of October 22, on average 40 instructional days from the median start date of schools in the national standardization. If a school year begins on July 15, testing should be scheduled between September 1 and September 21. Doing so places the median testing date at September 10, about 40 instructional days from the July 15 start date. By testing during this period, instructional opportunity is comparable to the norms group and the use of fall norms is therefore appropriate. Testing dates for twelve-month schools can be calculated in a similar way so midyear and spring norms can be used. 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 15 Table 2.9 Test Accommodations — Special Education and 504 Students Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization (Weighted Sample) Grade Standardization Sample Identified Students Accommodated Students N N % of Standardization Sample K 58,216 2,121 3.6 262 12.4 0.5 1 50,687 2,397 4.7 905 37.8 1.8 2 51,725 3,076 5.9 1,322 43.0 2.6 3 51,414 3,485 6.8 1,615 46.3 3.1 4 52,392 4,101 7.8 2,184 53.3 4.2 5 52,277 4,286 8.2 2,241 52.3 4.3 6 50,753 3,652 7.2 1,662 45.5 3.3 7 49,925 3,478 7.0 2,146 61.7 4.3 8 52,072 3,489 6.7 2,109 60.4 4.1 N % of Identified Students % of Standardization Sample Note: Accommodations included Braille, large print, tested off level, answers recorded, extended time, communication assistance, transferred answers, individual/small group administration, repeated directions, tests read aloud (except for Vocabulary and Reading Comprehension), plus selected others. 
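The twelve-month-calendar advice in the Empirical Norms Dates discussion above amounts to counting instructional days: find the date that falls about 40 instructional days after the school's start date (the fall norm group's median) and schedule testing in a window around it. A rough Python sketch of that calculation follows; it skips weekends only, so real calendars with holidays and local breaks will shift the result slightly, which is why the Guide's own example (a July 15 start, testing September 1–21) differs a little from what this simplified count returns. The function names and the ten-day half-width are illustrative assumptions.

```python
from datetime import date, timedelta

def nth_instructional_day(start, n):
    """Date of the n-th instructional day, counting the start date as day 1
    and skipping weekends only (holidays and breaks are ignored here)."""
    day, counted = start, 0
    while True:
        if day.weekday() < 5:        # Monday through Friday
            counted += 1
            if counted == n:
                return day
        day += timedelta(days=1)

def fall_testing_window(start, target_day=40, half_width_days=10):
    """Center a testing window on the date matching the fall 2000 norm group's
    median of about 40 instructional days; the ten-day half-width is illustrative."""
    center = nth_instructional_day(start, target_day)
    return (center - timedelta(days=half_width_days), center,
            center + timedelta(days=half_width_days))

# A year-round school opening in mid-July 2000:
print(fall_testing_window(date(2000, 7, 17)))
```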
Table 2.10
Test Accommodations — English Language Learners
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)

Grade | Standardization Sample (N) | Identified Students (N) | Identified (% of Sample) | Accommodated Students (N) | Accommodated (% of Identified) | Accommodated (% of Sample)
K | 58,216 | 3,780 | 6.5 | 1,122 | 29.7 | 1.9
1 | 50,687 | 2,853 | 5.6 | 382 | 13.4 | 0.8
2 | 51,725 | 3,352 | 6.5 | 244 | 7.3 | 0.5
3 | 51,414 | 2,460 | 4.8 | 358 | 14.6 | 0.7
4 | 52,392 | 3,604 | 6.9 | 565 | 15.7 | 1.1
5 | 52,277 | 3,060 | 5.9 | 315 | 10.3 | 0.6
6 | 50,753 | 973 | 1.9 | 163 | 16.8 | 0.3
7 | 49,925 | 739 | 1.5 | 216 | 29.2 | 0.4
8 | 52,072 | 662 | 1.3 | 156 | 23.6 | 0.3

Note: Accommodations included tested off level, extended time, individual/small group administration, repeated directions, provision of English/native language word-to-word dictionary, test administered by ESL teacher or individual providing language services.

School Systems Included in the 2000 Standardization Samples New England and Mideast Connecticut Orange: New Haven Hebrew Day School Thomaston, Thomaston School District: Thomaston Center Intermediate School Waterbury, Archdiocese of Hartford: St. Joseph School Delaware Newark, Christina School District: Summit School Wilmington, Brandywine School District: Talley Middle School District of Columbia Washington: Nannie H. Borroughs School Maine Bangor, Hermon School District*: Hermon Middle School Bowdoinham, School Admin. District 75: Bowdoinham Community Elementary School Calais, Union 106 Calais: Calais Elementary School Danforth, School Admin. District 14: East Grand School Hancock, Union 92 Hancock: Hancock Elementary School Jonesport, Union 103 Jonesport: Jonesport Elementary School Limestone, Caswell School District: Dawn F. Barnes Elementary School Monmouth, Monmouth Public School District: Henry Cottrell Elementary School North Berwick, School Admin. District 60: Berwick Elementary School, Hanson School, Noble Junior High School, North Berwick Elementary School, Vivian E. Hussey Primary School Portland, Diocese of Portland: Catherine McAuley High School, St. Joseph’s School Robbinston, Union 106 Robbinston*: Robbinston Grade School Turner*: Calvary Christian Academy Vanceboro, Union 108 Vanceboro*: Vanceboro Elementary School Maryland Baltimore, Baltimore City-Dir. Inst. Area 9*: Roland Park Elementary/Middle School 233 Baltimore, Baltimore City Public School District*: Edgecombe Circle Elementary School 62, Samuel F. B. Morse Elementary School 98 Hagerstown: Heritage Academy Hagerstown, Archdiocese of Baltimore: St. Maria Goretti High School, St. Mary School Stevensville, Queen Annes County School District: Kent Island Elementary School Massachusetts Adams, Adams Cheshire Regional School District*: C. T. Plunkett Elementary School Boston, Archdiocese of Boston: Holy Trinity School, Immaculate Conception School, St. Bridget School Bridgewater S., Bridgewater-Raynham Regional School District: Burnell Laboratory School Danvers, Danvers School District: Highlands Elementary School, Willis Thorpe Elementary School Fall River: Antioch School Fall River, Diocese of Fall River: Our Lady of Lourdes School, Our Lady of Mt. Carmel School, Taunton Catholic Middle School Fall River, Diocese of Fall River*: Espirito Santo School, St. Jean Baptiste School Fall River, Fall River School District*: Brayton Avenue Elementary School, Harriet T.
Healy Elementary School, Laurel Lake Elementary School, McCarrick Elementary School, Ralph Small Elementary School, Westall Elementary School Fitchburg, Fitchburg School District: Memorial Intermediate School Lowell: Lowell Public School District Peabody, Peabody Public School District: Kiley Brothers Memorial School Phillipston, Narragansett Regional School District: Phillipston Memorial Elementary School South Lancaster*: Browning SDA Elementary School Swansea: Swansea School District Walpole: Walpole Public School District Weymouth, Weymouth School District*: Academy Avenue Primary School, Lawrence Pingree Primary School, Murphy Primary School, Ralph Talbot Primary School, South Intermediate School, Union Street Primary School, William Seach Primary School Worcester, Worcester Public School District: University Park Campus School New Hampshire Bath, District 23 Bath: Bath Village Elementary School Litchfield, District 27 Litchfield: Griffin Memorial Elementary School Manchester, District 37 Manchester*: Bakersville Elementary School, Gossler Park Elementary School, Hallsville Elementary School, McDonough Elementary School, Southside Middle School, Weston Elementary School North Haverhill, District 23 Haverhill: Haverhill Cooperative Middle School, Woodsville Elementary School, Woodsville High School Rochester, District 54 Rochester: Maple Street Elementary School Salem: Granite Christian School Warren, District 23 Warren*: Warren Village School Note: Schools marked with an asterisk (*) participated in both spring and fall standardizations. 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 17 New Jersey Collingswood*: Collingswood Public School District Elizabeth: Bruriah High School For Girls Jersey City: Jersey City Public School District Salem, Mannington Township School District: Mannington Elementary School New York Beaver Falls, Beaver River Central School District: Beaver River Central School Briarcliff Manor, Briarcliff Manor Union Free School District: Briarcliff High School Bronx*: Regent School Dobbs Ferry, Archdiocese of New York*: Our Lady of Victory Academy Elmhurst, Diocese of Brooklyn: Cathedral Preparatory Seminary Lowville, Lowville Central School District: Lowville Academy and Central School New York, Archdiocese of New York: Corpus Christi School, Dominican Academy, St. Christopher Parochial School, St. John Villa Academy-Richmond, St. Joseph Hill Academy North Tonawanda: North Tonawanda School District Old Westbury: Whispering Pines SDA School Spring Valley, East Ramapo Central School District*: M. L. 
Colton Intermediate School Weedsport, Weedsport Central School District: Weedsport Elementary School, Weedsport Junior/Senior High School Pennsylvania Austin, Austin Area School District: Austin Area School Bloomsburg, Bloomsburg Area School District: Bloomsburg Memorial Elementary School Cheswick, Allegheny Valley School District: Acmetonia Primary School Dubois, Diocese of Erie*: Dubois Central Christian High School Ebensburg, Diocese of Altoona Johnstown: Bishop Carroll High School Erie, Millcreek Township School District*: Chestnut Hill Elementary School Farrell: Farrell Area School District Gettysburg, Gettysburg Area School District: Gettysburg Area High School Hadley, Commodore Perry School District: Commodore Perry School Lebanon, Diocese of Harrisburg: Lebanon Catholic Junior/Senior High School Manheim: Manheim Central School District McKeesport, Diocese of Pittsburgh: Serra Catholic High School McKeesport, South Allegheny School District: Glassport Central Elementary School, Manor Elementary School, Port Vue Elementary School, South Allegheny Middle High School Middleburg, Midd-West School District: Penns Creek Elementary School, Perry-West Perry Elementary School Philadelphia, Philadelphia School District-Bartram*: Bartram High School Philadelphia, Philadelphia School District-Franklin*: Stoddart-Fleisher Middle School Philadelphia, Philadelphia School District-Gratz*: Thomas M. Peirce Elementary School Philadelphia, Philadelphia School District-Kensington*: Alexander Adaire Elementary School Philadelphia, Philadelphia School District-Olney*: Jay Cooke Middle School Philadelphia, Philadelphia School District-Overbrook*: Lewis Cassidy Elementary School Philadelphia, Philadelphia School District-William Penn*: John F. Hartranft Elementary School Pittsburgh: St. Matthew Lutheran School Pittsburgh, Diocese of Pittsburgh*: St. Mary of the Mount School Rhode Island Johnston: Trinity Christian Academy Providence: Providence Hebrew Day School Providence, Diocese of Providence: All Saints Academy, St. 
Xavier Academy Vermont Chelsea, Chelsea School District: Chelsea School Williston: Brownell Mountain SDA School Southeast Alabama Abbeville, Henry County School District: Abbeville Elementary School, Abbeville High School Columbiana, Shelby County School District: Helena Elementary School, Oak Mountain Middle School Dothan, Dothan City School District*: Beverlye Middle School, East Highland Learning Center, Girard Middle School, Honeysuckle Middle School Eclectic, Elmore County School District: Eclectic Elementary School Elberta, Baldwin County School District: Elberta Middle School Fairhope, Baldwin County School District*: Fairhope Elementary School, Fairhope Intermediate School Jacksonville: Jacksonville Christian Academy Mobile, Mobile County School District: Cora Castlen Elementary School, Dauphin Island Elementary School Mobile, Mobile County School District*: Adelia Williams Elementary School, Florence Howard Elementary School, Mobile County Training School Monroeville, Monroe County School District: Monroe County High School Tuscaloosa, Tuscaloosa County School District: Hillcrest High School 17 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 18 Arkansas Kentucky Altus, Altus-Denning School District 31: Altus-Denning Elementary School, Altus-Denning High School Beebe, Beebe School District*: Beebe Elementary School, Beebe Intermediate School Bismarck, Bismarck School District*: Bismarck Elementary School Conway, Conway School District*: Ellen Smith Elementary School, Florence Mattison Elementary School, Ida Burns Elementary School, Marguerite Vann Elementary School Fouke, Fouke School District 15*: Fouke High School, Fouke Middle School Gentry: Ozark Adventist Academy Grady: Grady School District 5 Little Rock*: Heritage Christian School Mountain Home, Mountain Home School District 9: Pinkston Middle School Norman: Caddo Hills School District Springdale, Springdale School District 50*: Parson Hills Elementary School Strawberry: River Valley School District Benton*: Christian Fellowship School Bowling Green, Warren County School District*: Warren East High School Campbellsville: Campbellsville Independent School District Elizabethtown, Hardin County School District: East Hardin Middle School, Parkway Elementary School, Rineyville Elementary School, Sonora Elementary School, Upton Elementary School, Woodland Elementary School Elizabethtown, Hardin County School District*: Brown Street Alternative Center, G. C. Burkhead Elementary School, Lynnvale Elementary School, New Highland Elementary School Florence: Northern Kentucky Christian School Fordsville, Ohio County School District: Fordsville Elementary School Hardinsburg, Breckinridge County School District: Hardinsburg Primary School Hartford, Ohio County School District*: Ohio County High School, Ohio County Middle School, Wayland Alexander Elementary School Hazard, Perry County School District*: Perry County Central High School Louisville: Eliahu Academy Pineville: Pineville Independent School District Williamstown, Williamstown Independent School District: Williamstown Elementary School Florida Archer, Alachua County School District: Archer Community School Century, Escambia School District: George W. Carver Middle School Fort Lauderdale, Archdiocese of Miami: St. Helen School Gainesville, Alachua County School District*: Hidden Oak Elementary School, Kimball Wiles Elementary School Jacksonville Beach: Beaches Episcopal School Kissimmee, Osceola School District: Reedy Creek Elementary School Miami, Archdiocese of Miami*: St. 
Agatha School Ocala, Marion County School District*: Fort King Middle School Orlando, Orange County School District-East: Colonial 9th Grade Center, University High School Palm Bay, Diocese of Orlando: St. Joseph Catholic School Palm Coast, Flagler County School District: Buddy Taylor Middle School, Old Kings Elementary School Pensacola, Escambia School District*: Redirections Georgia Barnesville: Lamar County School District Crawfordville, Taliaferro County School District: Taliaferro County School Cumming, Forsyth County School District*: South Forsyth Middle School Dalton, Whitfield County School District: Cohutta Elementary School, Northwest High School, Valley Point Middle School, Westside Middle School Shellman*: Randolph Southern School 18 Louisiana Chalmette, St. Bernard Parish School District: Arabi Elementary School, Beauregard Middle School, Borgnemouth Elementary School, J. F. Gauthier Elementary School, Joseph J. Davies Elementary School, N. P. Trist Middle School, Sebastien Roy Elementary School, St. Bernard High School Chalmette, St. Bernard Parish School District*: Andrew Jackson Fundamental High School, C. F. Rowley Elementary School, Lacoste Elementary School Lafayette, Diocese of Lafayette*: Redemptorist Elementary School, St. Peter School Plain Dealing: Plain Dealing Academy Shreveport, Diocese of Shreveport*: Holy Rosary School, Jesus the Good Shepherd School, St. John Cathedral Grade School West Monroe, Diocese of Shreveport: St. Paschal School Mississippi Brandon*: University Christian School Gulfport, Gulfport School District: Bayou View Elementary School North Carolina Greensboro, Guilford County School District: Alamance Elementary School, Montlieu Avenue Elementary School, Shadybrook Elementary School Hillsborough: Abundant Life Christian School Manteo: Dare County School District New Bern, Diocese of Raleigh: St. Paul Education Center 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 19 South Carolina West Virginia Beaufort, Beaufort County School District: E. C. Montessori and Grade School Camden: Kershaw County School District North Augusta, Diocese of Charleston: Our Lady of Peace School Rock Hill: Westminster Catawba Christian School Salem, Oconee County School District*: Tamassee Salem Middle High School Westminster, Oconee County School District: West-Oak High School Arnoldsburg, Calhoun County School District*: Arnoldsburg School Elizabeth, Wirt County School District: Wirt County High School Grantsville, Calhoun County School District: Pleasant Hill Elementary School Omar: Beth Haven Christian School Wayne, Wayne County School District: East Lynn Elementary School, Lavalette Elementary School, Wayne Middle School Weirton, Diocese of Wheeling-Charleston: Madonna High School Tennessee Athens, McMinn County School District: Mountain View Elementary School, Niota Elementary School Athens, McMinn County School District*: E. K. Baker Elementary School, Rogers Creek Elementary School Byrdstown*: Pickett County School District Dyer, Gibson County School District: Medina Elementary School, Rutherford Elementary School Fairview, Williamson County School District: Fairview High School Harriman, Harriman City School District*: Raymond S. Bowers Elementary School, Walnut Hill Elementary School Harrogate: J. 
Frank White Academy Murfreesboro, Rutherford County School District: Central Middle School, Smyrna West Kindergarten Somerville, Fayette County School District: Jefferson Elementary School, Oakland Elementary School Yorkville, Gibson County School District*: Yorkville Elementary School Virginia Charlottesville, Albemarle County School District: Monticello High School Chesapeake: Tidewater Adventist Academy Forest, Bedford County School District: Forest Middle School Jonesville, Lee County School District*: Ewing Elementary School, Rose Hill Elementary School Madison Heights, Amherst County School District: Madison Heights Elementary School Marion, Smyth County School District: Atkins Elementary School, Chilhowie Elementary School, Chilhowie Middle School, Marion Intermediate School, Marion Middle School, Marion Primary School, Sugar Grove Combined School Saltville, Smyth County School District*: Northwood Middle School, Rich Valley Elementary School, Saltville Elementary School St. Charles, Lee County School District: St. Charles Elementary School Staunton: Stuart Hall School Suffolk, Suffolk Public School District*: Forest Glen Middle School Great Lakes and Plains Illinois Bartlett, Elgin School District U-46, Area B: Bartlett Elementary School, Bartlett High School Benton, Benton Community Consolidated School District 47: Benton Elementary School Berwyn, Berwyn South School District 100: Heritage Middle School Cambridge, Cambridge Community Unit School District 227*: Cambridge Community Elementary School, Cambridge Community Junior/Senior High School Chicago, Chicago Public School District-Region 1*: Stockton Elementary School Chicago, Chicago Public School District-Region 4*: Brighton Park Elementary School Duquoin, Duquoin Community Unit School District 300: Duquoin Middle School Elgin, Elgin School District U-46, Area A*: Century Oaks Elementary School, Garfield Elementary School, Washington Elementary School Elgin, Elgin School District U-46, Area B*: Elgin High School, Ellis Middle School Glendale Heights, Queen Bee School District 16: Pheasant Ridge Primary School Joliet: Ridgewood Baptist Academy Lake Villa, Lake Villa Community Consolidated School District 41: Joseph J. Pleviak Elementary School Lincoln, Lincoln Elementary School District 27*: Northwest Elementary School, Washington-Monroe Elementary School Mossville, Illinois Valley Central School District 321: Mossville Elementary School Quincy, Diocese of Springfield: St. Francis Solanus School Schaumburg, Schaumburg Community Consolidated School District 54: Douglas MacArthur Elementary School, Everett Dirksen Elementary School Streamwood, Elgin School District U-46, Area C*: Oakhill Elementary School, Ridge Circle Elementary School Villa Park: Islamic Foundation School Wayne City*: Wayne City Community Unit School District 100 Westmont: Westmont Community Unit School District 201 19 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 20 Indiana Hammond, Diocese of Gary*: Bishop Noll Institute, St. Catherine of Siena School, St. 
John Bosco School Indianapolis, Perry Township School District: Homecroft Elementary School, Mary Bryan Elementary School Logansport, Logansport Community School District*: Lincoln Middle School Spencer, Spencer-Owen Community School District: Gosport Elementary School, Patricksburg Elementary School, Spencer Elementary School Valparaiso, Valparaiso Community School District: Benjamin Franklin Middle School, Thomas Jefferson Middle School Vevay, Switzerland County School District: Switzerland County High School Warsaw: Redeemer Lutheran School Warsaw, Warsaw Community Schools: Eisenhower Elementary School Iowa Alton, Diocese of Sioux City: Spalding Catholic Elementary School Bellevue, Archdiocese of Dubuque: Marquette High School Davenport, Diocese of Davenport: Cardinal Stritch Junior/Senior High School, Mater Dei Junior/Senior High School, Notre Dame Elementary School, Trinity Elementary School Delhi: Maquoketa Valley Community School District Remsen, Diocese of Sioux City*: St. Mary’s High School Williamsburg: Lutheran Interparish School Kansas Anthony, Anthony-Harper Unified School District 361: Chaparral High School, Harper Elementary School Columbus, Columbus Unified School District 493: Central School Galena, Columbus Unified School District 493*: Spencer Elementary School Kansas City*: Mission Oaks Christian School Kansas City, Archdiocese of Kansas City: Assumption School, St. Agnes School Osawatomie*: Osawatomie Unified School District 367 Spring Hill, Spring Hill Unified School District 230: Spring Hill High School St. Paul, Erie-St. Paul Consolidated School District 101: St. Paul Elementary School, St. Paul High School Westwood*: Mission Oaks Christian School-Westwood Michigan Algonac, Algonac Community School District: Algonac Elementary School Auburn*: Zion Lutheran School Berkley, Berkley School District: Pattengill Elementary School Bloomingdale: Bloomingdale Public School District Buckley, Buckley Community School District: Buckley Community School Canton, Plymouth-Canton Community Schools*: Gallimore Elementary School, Hoben Elementary School 20 Carleton: Airport Community School District Dafter, Sault Ste. Marie Area School District*: Bruce Township Elementary School Gaylord, Gaylord Community School District*: Elmira Elementary School, Gaylord High School, Gaylord Intermediate School, Gaylord Middle School, North Ohio Elementary School, South Maple Elementary School Grand Blanc: Grand Blanc Community School District Macomb: St. Peter Lutheran School Plymouth, Plymouth-Canton Community Schools: Allen Elementary School, Bird Elementary School, Farrand Elementary School, Fiegel Elementary School, Smith Elementary School Redford, Archdiocese of Detroit*: St. Agatha High School Reese: Trinity Lutheran School Rockwood, Gibraltar School District: Chapman Elementary School Royal Oak, Royal Oak Public School District: Dondero High School, Franklin Elementary School St. Joseph, Diocese of Kalamazoo*: Lake Michigan Catholic Elementary School, Lake Michigan Catholic Junior/Senior High School Traverse City: Traverse City Area Public Schools Wayland, Wayland Union School District: Bessie B. 
Baker Elementary School, Dorr Elementary School Whittemore, Whittemore Prescott Area School District: Whittemore-Prescott Alternative Education Center Minnesota Barnum, Barnum Independent School District 91: Barnum Elementary School Baudette, Lake of the Woods Independent School District 390: Lake of the Woods School Farmington: Farmington Independent School District 192 Hanska, New Ulm Independent School District 88: Hanska Community School Hastings, Hastings Independent School District 200: Cooper Elementary School, John F. Kennedy Elementary School, Pinecrest Elementary School Isanti, Cambridge-Isanti School District 911: Isanti Middle School Lafayette, Minnesota Department of Education: Lafayette Charter School Menahga, Menahga Independent School District 821: Menahga School Mendota Heights: West St. Paul-Mendota-Eagan School District 197 Newfolden, Newfolden Independent School District 441*: Newfolden Elementary School Rochester: Schaeffer Academy St. Paul: Christ Household of Faith School Stillwater, Stillwater School District 834: Stonebridge Elementary School Watertown, Watertown-Mayer School District 111: Watertown-Mayer Elementary School, WatertownMayer High School, Watertown-Mayer Middle School Winsted, Diocese of New Ulm: Holy Trinity Elementary School, Holy Trinity High School 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 21 Missouri Cape Girardeau, Cape Girardeau School District 63: Barbara Blanchard Elementary School Cape Girardeau, Diocese of Springfield/Cape Girardeau: Notre Dame High School Lexington, Lexington School District R5: Lexington High School Liberal: Liberal School District R2 Rogersville: Greene County School District R8 Rueter, Mark Twain Elementary School District R8: Mark Twain R8 Elementary School Sparta: Sparta School District R3 Springfield: New Covenant Academy Nebraska Burwell, Burwell High School District 100: Burwell Junior/Senior High School Creighton, Archdiocese of Omaha*: St. Ludger Elementary School Lemoyne, Keith County Centers: Keith County District 51 School Omaha, Archdiocese of Omaha: All Saints Catholic School, Guadalupe-Inez School, Holy Name School, Pope John XXIII Central Catholic High School, Roncalli Catholic High School, Sacred Heart School, SS Peter and Paul School, St. James-Seton School, St. Thomas More School Randolph: Randolph School District 45 Seward: St. John’s Lutheran School Spalding, Diocese of Grand Island*: Spalding Academy North Dakota Belfield, Belfield Public School District 13*: Belfield School Bismarck: Dakota Adventist Academy Fargo: Grace Lutheran School Grand Forks: Grand Forks Christian School Halliday, Twin Buttes School District 37: Twin Buttes Elementary School Minot, Diocese of Bismarck: Bishop Ryan Junior/Senior High School New Town, New Town School District 1*: Edwin Loe Elementary School, New Town Middle School/High School Ohio Akron, Akron Public Schools*: Academy At Robinson, Erie Island Montessori School, Mason Elementary School Bowling Green: Wood Public Schools Chillicothe, Chillicothe City School District: Tiffin Elementary School Cincinnati, Archdiocese of Cincinnati: Catholic Central High School, St. 
Brigid School Cleveland: Lutheran High School East Dalton, Dalton Local School District: Dalton Intermediate School Danville, Danville Local School District: Danville High School, Danville Intermediate School, Danville Primary School East Cleveland, East Cleveland City School District: Caledonia Elementary School, Chambers Elementary School, Mayfair Elementary School, Rozelle Elementary School, Shaw High School, Superior Elementary School Lima, Lima City School District: Lowell Elementary School London, London City School District: London Middle School Ripley: Ripley-Union-Lewis-Huntington Elementary School Sidney, Sidney City School District: Bridgeview Middle School Steubenville, Diocese of Steubenville: Catholic Central High School Toledo, Washington Local School District: Jefferson Junior High School Upper Sandusky, Upper Sandusky Exempted Village School District*: East Elementary School South Dakota Huron: James Valley Christian School Sioux Falls*: Calvin Christian School Wisconsin Appleton: Fox Valley Lutheran School, Grace Christian Day School Edgerton: Oaklawn Academy Kenosha, Kenosha Unified School District 1: Bose Elementary School, Bullen Middle School, Grewenow Elementary School, Jeffery Elementary School, Lance Middle School, Lincoln Middle School, McKinley Middle School Milwaukee: Bessie M. Gray Prep. Academy*, Clara Muhammed School, Early View Academy of Excellence, Hickman’s Prep. School*, Milwaukee Multicultural Academy, Mount Olive Lutheran School, The Woodson Academy Milwaukee, Milwaukee Public Schools*: Khamit Institute Oshkosh, Diocese of Green Bay*: St. John Neumann School Plymouth, Plymouth Joint School District: Cascade Elementary School, Fairview Elementary School, Horizon Elementary School, Parkview Elementary School, Parnell Elementary School, Riverview Middle School Stoughton, Stoughton Area School District: Sandhill School, Yahara Elementary School Strum, Eleva-Strum School District: Eleva-Strum Primary School 21 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 22 West and Far West Alaska Anchorage: Heritage Christian School Juneau, Juneau School District: Auke Bay Elementary School, Dzantik’i Heeni Middle School, Floyd Dryden Middle School Nikiski, Kenai Peninsula Borough School District: Nikiski Middle High School Palmer: Valley Christian School Arizona Litchfield Park, Litchfield Elementary School District 79: Litchfield Elementary School Mesa, Diocese of Phoenix: Christ the King School Mesa, Mesa Unified School District 4: Franklin East Elementary School Phoenix, Creighton School District 14*: Loma Linda Elementary School Phoenix, Washington Elementary School District 6: Roadrunner Elementary School Pima, Pima Unified School District 6*: Pima Elementary School Teec Nos Pos: Immanuel-Carrizo Christian Academy Tempe: Grace Community Christian School Tucson: Tucson Hebrew Academy California Atascadero, Atascadero Unified School District: Carrisa Plains Elementary School, Creston Elementary School Bakersfield, Panama Buena Vista Union School District: Laurelglen Elementary School Cathedral City, Palm Springs Unified School District: Cathedral City Elementary School Cerritos, ABC Unified School District: Faye Ross Middle School, Joe A. 
Gonsalves Elementary School, Palms Elementary School Fontana, Fontana Unified School District*: North Tamarind Elementary School Fresno, Fresno Unified School District: Malloch Elementary School Lompoc, Lompoc Unified School District: LA Canada Elementary School, Leonora Fillmore Elementary School Los Angeles, Archdiocese of Los Angeles: Alverno High School, St. Lucy School Los Angeles, Los Angeles Unified School District, Local District C: Lanai Road Elementary School Los Angeles, Los Angeles Unified School District, Local District D: LA Center For Enriched Studies Los Angeles, Los Angeles Unified School District, Local District F: Pueblo De Los Angeles High School Los Angeles, Los Angeles Unified School District, Local District G: Fifty-Second Street Elementary School Los Angeles, Los Angeles Unified School District, Local District I: Alain LeRoy Locke Senior High School, David Starr Jordan Senior High School, Youth Opportunities Unlimited 22 Modesto, Modesto City School District: Everett Elementary School, Franklin Elementary School Norco, Corona-Norco Unified School District: Coronita Elementary School, Highland Elementary School Oakland, Oakland Unified School District: Lockwood Elementary School Oceanside, Oceanside Unified School District: San Luis Rey Elementary School Palm Desert, Desert Sands Unified School District: Palm Desert Middle School Ripon, Ripon Unified School District: Ripon Elementary School Salinas: Winham Street Christian Academy San Clemente, Diocese of Orange: Our Lady of Fatima School San Diego, San Diego Unified School District: Sojourner Truth Learning Academy San Diego, Streetwater Union High School District: Mar Vista Middle School San Francisco: Hebrew Academy-San Francisco Santa Ana, Diocese of Orange*: Our Lady of the Pillar School Santa Ana, Santa Ana Unified School District*: Dr. Martin L. King Elementary School, Madison Elementary School Walnut, Walnut Valley Unified School District*: Walnut Elementary School Colorado Aurora, Aurora School District 28-J: Aurora Central High School, East Middle School Colorado Springs, Academy School District 20: Explorer Elementary School, Foothills Elementary School, Pine Valley Elementary School Denver: Beth Eden Baptist School Denver, Archdiocese of Denver: Bishop Machebeuf High School Golden, Jefferson County School District R-1: Bear Creek Elementary School, Campbell Elementary School, D’Evelyn Junior/Senior High School, Devinny Elementary School, Jefferson Academy Elementary School, Jefferson Academy Junior High School, Jefferson Hills, Lakewood Senior High School, Lincoln Academy, Moore Middle School, Sierra Elementary School Northglenn, Adams 12 Five Star Schools: Northglenn Middle School Parker, Douglas County School District R-1: Colorado Visionary Academy Penrose, Fremont School District R-2*: Penrose Elementary/Middle School Thornton, Adams 12 Five Star Schools*: Cherry Drive Elementary School, Eagleview Elementary School Windsor, Windsor School District R-4: Mountain View Elementary School, Skyview Elementary School Hawaii Honolulu: Holy Nativity School, Hongwanji Mission School Lawai: Kahili Adventist School 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 23 Idaho Boise: Cole Christian School Burley, Cassia County Joint School District 151: Albion Elementary School, Cassia County Education Center, Declo Elementary School, Oakley Junior/Senior High School, Raft River Elementary School, Raft River High School, White Pine Intermediate School Kimberly*: Kimberly School District 414 Saint Maries: St. 
Maries Joint School District 41 Twin Falls: Immanuel Lutheran School Montana Belfry, Belfry School District 3*: Belfry School Billings*: Billings Christian School Kalispell: Flathead Christian School Miles City, Diocese of Great Falls-Billings: Sacred Heart Elementary School Willow Creek, Willow Creek School District 15-17J: Willow Creek School Nevada Amargosa Valley, Nye County School District: Amargosa Valley Elementary School Henderson*: Black Mountain Christian School Reno: Silver State Adventist School Sandy Valley, Clark County School District-Southwest*: Sandy Valley School Sparks*: Legacy Christian Elementary School New Mexico Alamogordo, Alamogordo School District 1*: Sacramento Elementary School Albuquerque: Evangelical Christian Academy Espanola, Espanola School District 55: Chimayo Elementary School, Hernandez Elementary School, Velarde Elementary School Oklahoma Arapaho, Arapaho School District I-5: Arapaho School Bray: Bray-Doyle School District 42 Chickasha: Chickasha School District 1 Duncan: Duncan School District 1 Elk City: Elk City School District 6 Guthrie: Guthrie School District 1 Laverne: Laverne School District Milburn, Milburn School District I-29: Milburn Elementary School, Milburn High School Mill Creek: Mill Creek School District 2 Oklahoma City: Oklahoma City School District I-89 Oklahoma City, Archdiocese of Oklahoma City: St. Charles Borromeo School Purcell: Purcell School District 15 Roland: Roland Independent School District 5 Shidler, Shidler School District 11: Shidler High School Tulsa: Tulsa Adventist Academy Tulsa, Diocese of Tulsa: Bishop Kelley High School, Holy Family Cathedral School Wellston, Wellston School District 4*: Wellston Public School Oregon Boring*: Hood View Junior Academy Corvallis, Corvallis School District 509J: Inavale Elementary School, Western View Middle School Eugene, Eugene School District 4J: Buena Vista Spn. Immersion School, Gilham Elementary School, Meadowlark Elementary School, Washington Elementary School Grants Pass: Brighton Academy Jefferson: Jefferson School District 14J Portland*: Portland Christian Schools Portland, Archdiocese of Portland: Fairview Christian School, O’Hara Catholic School Texas Amarillo, Diocese of Amarillo*: Alamo Catholic High School Baird*: Baird Independent School District Brownsboro, Brownsboro Independent School District*: Brownsboro Elementary School, Brownsboro High School, Brownsboro Intermediate School, Chandler Elementary School Dallas, Dallas Independent School District-Area 1: Gilbert Cuellar Senior Elementary School Deweyville, Deweyville Independent School District: Deweyville Elementary School Dilley, Dilley Independent School District: Dilley High School Driscoll: Driscoll Independent School District Franklin: Franklin Independent School District Fresno, Fort Bend Independent School District*: Walter Moses Burton Elementary School Gladewater: Gladewater Independent School District Houston, Spring Branch Independent School District*: Cornerstone Academy, Spring Shadows Elementary School Imperial: Buena Vista Independent School District Jacksonville, Jacksonville Independent School District: Jacksonville Middle School, Joe Wright Elementary School Laredo*: Laredo Christian Academy Lubbock, Lubbock Independent School District: S. 
Wilson Junior High School Nederland, Nederland Independent School District*: Wilson Middle School Odessa, Ector County Independent School District: Burleson Elementary School Perryton: Perryton Independent School District Stockdale, Stockdale Independent School District: Stockdale Elementary School, Stockdale High School Sugar Land, Fort Bend Independent School District: Sugar Mill Elementary School Whitesboro, Whitesboro Independent School District: Whitesboro High School 23 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 24 Utah American Fork, Alpine School District: Lehi High School, Lone Peak High School, Manila Elementary School, Meadow Elementary School Brigham City, Box Elder County School District*: Adele C. Young Intermediate School, Box Elder Middle School, Perry Elementary School, Willard Elementary School Cedar City, Iron County School District: Cedar Middle School Eskdale: Shiloah Valley Christian School Layton, Davis County School District: Crestview Elementary School Murray: Deseret Academy Murray, Murray City School District: Liberty Elementary School Ogden*: St. Paul Lutheran School Ogden, Ogden City School District*: Bonneville Elementary School, Carl H. Taylor Elementary School, Gramercy Elementary School, Ogden High School Ogden, Weber School District*: Green Acres Elementary School Orem, Alpine School District*: Canyon View Junior High School Price: Carbon County School District Tremonton, Box Elder County School District: North Park Elementary School 24 Washington Gig Harbor: Gig Harbor Academy Seattle: North Seattle Christian School Spokane: Spokane Lutheran School Vancouver: Evergreen School District 114 Wyoming Cheyenne: Trinity Lutheran School Torrington: Valley Christian School Yoder, Goshen County School District 1: South East School 961464_ITBS_GuidetoRD.qxp 10/29/10 PART 3 3:15 PM Page 25 Validity in the Development and Use of The Iowa Tests Validity in Test Use Validity is an attribute of information from tests that, according to the Standards for Educational and Psychological Testing, “refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (1999, p. 9). Assessment information is not considered valid or invalid in any absolute sense. Rather, the information is considered valid for a particular use or interpretation and invalid for another. The Standards further state that validation involves the accumulation of evidence to support the proposed score interpretations. This part of the Guide to Research and Development provides an overview of the data collected over the history of The Iowa Tests that pertains to validity. Data and research pertaining to The Iowa Tests consider the five major sources of validity evidence outlined in the Standards: (1) test content, (2) response processes, (3) internal structure, (4) relations to other variables, and (5) consequences of testing. The purposes of this part of the Guide are (1) to present the rationale for the professional judgments that lie behind the content standards and organization of the Iowa Tests of Basic Skills, (2) to describe the process used to translate those judgments into developmentally appropriate test materials, and (3) to characterize a range of appropriate uses of results and methods for reporting information on test performance to various audiences. Criteria for Evaluating Achievement Tests Evaluating an elementary school achievement test is much like evaluating other instructional materials. 
In the latter case, the recommendations of other educators as well as the authors and publishers would be considered. The decision to adopt materials locally, however, would require page-by-page scrutiny of the materials to understand their content and organization. Important factors in reviewing materials would be alignment with local educational standards and compatibility with instructional methods. The evaluation of an elementary achievement test is much the same. What the authors and publisher can say about how the test was developed, what statistical data indicate about the technical characteristics of the test, and what judgments of quality unbiased experts make in reviewing the test all contribute to the final evaluation. But the decision about the potential validity of the test rests primarily on local review and item-by-item inspection of the test itself. Local analysis of test content—including judgments of its appropriateness for students, teachers, other school personnel, and the community at large—is critical. Validity of the Tests Validity must be judged in relation to purpose. Different purposes may call for tests built to different specifications. For example, a test intended to determine whether students have reached a performance standard in a local district is unlikely to have much validity for measuring differences in progress toward individually determined goals. Similarly, a testing program designed primarily to answer “accountability” questions may not be the best program for stimulating differential instruction and creative teaching. Cronbach long ago made the point that validation is the task of the interpreter: “In the end, the responsibility for valid use of a test rests on the person who interprets it. The published research merely provides the interpreter with some facts and concepts. He has to combine these with his other knowledge about the person he tests. . . .” (1971, p. 445). Messick contended that published research should bolster facts and concepts with “some exposition of the critical value contexts in which the facts are embedded and with provisional accounting of the potential social consequences of alternative test uses” (1989, p. 88). Instructional decisions involve the combination of test validity evidence and prior information about the person or group tested. The information that test developers can reasonably be expected to provide about all potential uses of tests in decision-making is limited. Nevertheless, one should explain how tests are developed and provide recommendations for appropriate uses. In addition, guidelines should be established for reporting test results that lead to valid score interpretations so that the consequences of test use at the local level are clear. The procedures used to develop and revise test materials and interpretive information lay the foundation for test validity. Meaningful evidence related to inferences based on test scores, not to mention desirable consequences from those inferences, can only provide test scores with social utility if test development produces meaningful test materials. Content quality is thus the essence of arguments for test validity (Linn, Baker & Dunbar, 1991). The guiding principle for the development of The Iowa Tests is that materials presented to students be of sufficient quality to make the time spent testing instructionally useful.
Passages are selected for the Reading tests, for example, not only because they yield good comprehension questions, but because they are interesting to read. Items that measure discrete skills (e.g., capitalization and punctuation) contain factual content that promotes incidental learning during the test. Experimental contexts in science expose students to novel situations through which their understanding of scientific reasoning can be measured. These examples show ways in which developers of The Iowa Tests try to design tests so taking the test can itself be considered an instructional activity. Such efforts represent the cornerstone of test validity. Statistical Data to Be Considered The types of statistical data that might be considered as evidence of test validity include reliability coefficients, difficulty indices of individual test items, indices of the discriminating power of the items, indices of differential functioning of the items, and correlations with other measures such as course grades, scores on other tests of the same type, or experimental measures of the same content or skills. All of these types of evidence reflect on the validity of the test, but they do not guarantee its validity. They do not prove that the test measures what it purports to measure. They certainly cannot reveal whether the things being measured are those that ought to be measured. A high reliability coefficient, for example, shows that the test is measuring something consistently but does not indicate what that “something” is. Given two tests with the same title, the one with the higher reliability may actually be the less valid for a particular purpose (Feldt, 1997). For example, one can build a highly reliable mathematics test by including only simple computation items, but this would not be a valid test of problem-solving skills. Similarly, a poor test may show the same distribution of item difficulties as a good test, or it may show a higher average index of discrimination than a more valid test. Correlations of test scores with other measures are evidence of the validity of a test only if the other measures are better than the test that is being evaluated. Suppose, for example, that three language tests, A, B, and C, show high correlations among themselves. These correlations may be due simply to the three tests exhibiting the same defects—such as overemphasis on memorization of rules. If Test D, on the other hand, is a superior measure of the student’s ability to apply those rules, it is unlikely to correlate highly with the other three tests. In this case, its lack of correlation with Tests A, B, and C is evidence that Test D is the more valid test. This is not meant to imply that well-designed validation studies are of no value; published tests should be supported by a continuous program of research. Rational judgment also plays a key part in evaluating the validity of achievement tests against content and process standards and in interpreting statistical evidence from validity studies. (A brief computational illustration of a reliability coefficient appears below.) Validity of the Tests in the Local School Standardized tests such as the Iowa Tests of Basic Skills are constructed to correspond to widely accepted goals of instruction in schools across the nation. No standardized test, no matter how carefully planned and constructed, can ever be equally suited for use in all schools. Local differences in curricular standards, grade placement, and instructional emphasis, as well as differences in the nature and characteristics of the student population, should be taken into account in evaluating the validity of a test.
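To make the statistical indices discussed under “Statistical Data to Be Considered” concrete, the following minimal sketch computes one of them, the KR-20 internal-consistency reliability coefficient, from a small invented set of right/wrong item scores. It is an illustration only, not an ITBS procedure; the data are made up, and, as the text above stresses, a high value shows only consistency, not that the test measures what it ought to measure.

```python
from statistics import pvariance

def kuder_richardson_20(scores):
    """KR-20 reliability for dichotomous (0/1) item scores.

    `scores` is a list of examinee response patterns, one list of 0/1 values
    per examinee. KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var(total)),
    using the population variance of total scores.
    """
    k = len(scores[0])                              # number of items
    totals = [sum(pattern) for pattern in scores]   # total score per examinee
    item_p = [sum(pattern[j] for pattern in scores) / len(scores) for j in range(k)]
    pq_sum = sum(p * (1 - p) for p in item_p)
    return k / (k - 1) * (1 - pq_sum / pvariance(totals))

# Toy data: five examinees, four items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kuder_richardson_20(responses), 3))
```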
The two most important questions in the selection and evaluation of achievement tests at the local level should be:
1. Are the skills and abilities required for successful test performance those that are appropriate for the students in our school?
2. Are our standards of content and instructional practices represented in the test questions?
To answer these questions, those making the determination should take the test or at least answer a sample of representative questions. In taking the test, they should try to decide by which cognitive processes the student is likely to reach the correct answer. They should then ask:
• Are all the cognitive processes considered important in the school represented in the test?
• Are any desirable cognitive processes omitted?
• Are any specific skills or abilities required for successful test performance unrelated to the goals of instruction?
Evaluating an achievement test battery in this manner is time-consuming. It is, however, the only way to discern the most important differences among tests and their relationships to local curriculum standards. Considering the importance of the inferences that will later be drawn from test results and the influence the test may exert on instruction and guidance in the school, this type of careful review is important. Domain Specifications The content and process specifications for The Iowa Tests have undergone constant revision for more than 60 years. They have involved the experience, research, and expertise of professionals from a variety of educational specialties. In particular, research in curriculum practices, test design, technical measurement procedures, and test interpretation and utilization has been a continuing feature of test development. Criteria for the design of assessments, the selection and placement of items, and the distribution of emphasis in a test include:
1. Placement and emphasis in current instructional materials, including textbooks and other forms of published materials for teaching and learning.
2. Recommendations of the education community in the form of subject-matter standards developed by national organizations, state and national curriculum frameworks, and expert opinion in instructional methods and the psychology of learning.
3. Continuous interaction with users, including discussions of needs and priorities, reviews, and suggestions for changes. Feedback from students, teachers, and administrators has resulted in improvements of many kinds (e.g., Frisbie & Andrews, 1990).
4. Frequency of need or occurrence and social utility studies in various curriculum areas.
5. Studies of frequency of misunderstanding, particularly in reading, language, and mathematics, as determined from research studies and data from item tryout.
6. Importance or cruciality, a judgment criterion that may involve frequency, seriousness of error or seriousness of the social consequences of error, expert judgment, instructional trends, public opinion, etc.
7. Independent reviews by professionals from diverse cultural groups for fairness and appropriateness of content for students of different backgrounds based on geography, race/ethnicity, gender, urban/suburban/rural environment, etc.
8. Empirical studies of differential item functioning (e.g., Qualls, 1980; Becker & Forsyth, 1994; Lewis, 1994; Lee, 1995; Lu & Dunbar, 1996; Witt, Ankenmann & Dunbar, 1996; Huang, 1998; Ankenmann, Witt & Dunbar, 1999; Dunbar, Ordman & Mengeling, 2002; Snetzler & Qualls, 2002).
9. Technical characteristics of items relating to content validity; results of studies of characteristics of item formats; studies of commonality and uniqueness of tests, etc. (e.g., Schoen, Blume & Hoover, 1990; Gerig, Nibbelink & Hoover, 1992; Nibbelink & Hoover, 1992; Nibbelink, Gerig & Hoover, 1993; Witt, 1993; Bray & Dunbar, 1994; Lewis, 1994; Frisbie & Cantor, 1995; Perkhounkova, Hoover & Ankenmann, 1997; Bishop & Frisbie, 1999; Perkhounkova & Dunbar, 1999; Lee, Dunbar & Frisbie, 2001).
The importance of each of these criteria differs from grade to grade, from test to test, from level to level, and even from skill to skill within tests. For example, the correspondence between test content and textbook or instructional treatment varies considerably. In the upper grades, the Vocabulary, Reading Comprehension, and Math Problem Solving tests are relatively independent of the method used to teach these skills. On the other hand, there is a close correspondence between the first part of the Math Concepts and Estimation test and the vocabulary, scope, sequence, and methodology of leading textbooks, as well as between the second part of that test and The National Council of Teachers of Mathematics (NCTM) Standards for estimation skills. Content Standards and Development Procedures New forms of The Iowa Tests are the result of an extended, iterative process during which “experimental” test materials are developed and administered to national and state samples to evaluate their measurement quality and appropriateness. The flow chart in Figure 3.1 shows the steps involved in test development. Curriculum Review Review of local, state, and national guidelines for curriculum in the subjects included in The Iowa Tests is an ongoing activity of the faculty and staff of the Iowa Testing Programs. How well The Iowa Tests reflect current trends in school curricula is monitored through contact with school administrators, curriculum coordinators, and classroom teachers across the United States. New editions of the tests are developed to be consistent with lasting shifts in curriculum and instructional practice when such changes can be accommodated by changes in test content and item format. Supplementary measures of achievement, such as the Iowa Writing Assessment and the Constructed-Response Supplement, are developed when the need arises for a new approach to measurement. Preliminary Item Tryout Developing The Iowa Tests involves research in the areas of curriculum, instructional practice, materials design, and psychometric methods. This work contributes to the materials that undergo preliminary tryout as part of the Iowa Basic Skills Testing Program. During this phase of development, final content standards for new forms of the tests are determined. Preliminary tryouts involve multiple revisions to help ensure high quality in the materials that become part of the item bank used to develop final test forms. Materials that do not meet the necessary standards for content and technical quality are revised or discarded. The preliminary tryout of items is a regular part of the Iowa Basic Skills Testing Program.
This state testing program is a cooperative effort maintained and supported by the College of Education of the University of Iowa. Over 350 school systems, testing over 250,000 students annually, administer the ITBS under uniform conditions. To participate, each school agrees to schedule a twenty-minute testing period for tryout materials for new editions. In the preliminary tryouts, new items are organized into units (short test booklets) that are distributed to students in a spiraled sequence. For example, if 30 units are tried out in a given grade, each student in each consecutive set of 30 students receives a different unit. This process assures the sample of students to which each unit is administered represents all schools in the tryout. It also assures a high degree of comparability of results from unit to unit. For Levels 9 to 14 of the ITBS forms published since 1955 (18,772 total items), 1,681 units with 46,741 test items were tried out in this fashion. Each unit was administered to a sample of approximately 200 students per grade, usually in three consecutive grades. For Levels 9 to 14 of Forms A and B (3,158 total items), 361 units with 8,708 items were included in the preliminary item tryout. Because most tests in Levels 5 through 8 are read aloud by the teacher, tryout units for these levels are given to intact groups. The procedures used for these tryouts involve stratification of the available schools according to prior achievement test scores. Tryout units are systematically rotated to ensure comparable groups across units. For the nine forms of Levels 5 through 8 (7,340 total items), 336 units with approximately 13,010 items were tried out. For Forms A and B (1,837 total items), 76 tryout units with 2,388 items were assembled. Nearly 200,000 students in kindergarten through grade 8 participated in the preliminary item tryouts for Forms A and B. Standard procedures are used to analyze item data from the preliminary tryout. Difficulty and discrimination indices are computed for each item; because performance in Iowa schools differs significantly and systematically from national performance, difficulty indices are adjusted. Biserial correlations between items and total unit score measure discrimination. Items with increasing percent correct in successive grades provide evidence of developmental discrimination. (A small computational sketch of these indices appears below.) National Item Tryout After the results of the preliminary tryout are analyzed, items from all tests are administered to selected national samples. The national item tryout for Forms A and B of the ITBS was conducted in the fall of 1998 and the spring and fall of 1999. Approximately 100,000 students were tested (approximately 11,000 students per grade in kindergarten through grade 8). A total of 10,370 items were included in the national item tryouts for Forms A and B.
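The difficulty and discrimination indices described for the item tryouts can be illustrated with a brief sketch. The code below computes proportion-correct difficulty and an item-total point-biserial correlation for invented tryout data. It is illustrative only: the operational analyses use biserial correlations and adjust Iowa difficulty indices toward national performance, and the total score here includes the item itself, none of which this sketch attempts to reproduce.

```python
from statistics import mean, pstdev

def item_analysis(scores):
    """Proportion-correct difficulty and item-total point-biserial correlation
    for a matrix of 0/1 item scores (one row per student).

    Assumes no item has zero variance in the sample. The point-biserial is a
    simple stand-in for the biserial correlation used operationally.
    """
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    results = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        p = mean(item)  # difficulty: proportion answering correctly
        # Point-biserial: correlation between the 0/1 item score and the total.
        cov = mean(x * t for x, t in zip(item, totals)) - p * mean(totals)
        r_pb = cov / (pstdev(item) * pstdev(totals))
        results.append((p, r_pb))
    return results

# Toy tryout unit: six students, three items (1 = correct, 0 = incorrect).
unit_scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
for j, (p, r) in enumerate(item_analysis(unit_scores), start=1):
    print(f"item {j}: difficulty p = {p:.2f}, point-biserial = {r:.2f}")
```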
The major purpose of the national item tryouts is to obtain item difficulty and discrimination data on a national sample of diverse curricular and demographic characteristics and objective data on possible ethnic and gender differences for the analysis of differential item functioning (DIF). Difficulty and discrimination indices, DIF analyses, and other external criteria were used for final item selection. Results of DIF analyses by gender and race/ethnicity are described in Part 7. (A brief sketch of one common DIF screening statistic appears below, just before Table 3.1.)
[Figure 3.1. Steps in Development of the Iowa Tests of Basic Skills: a flow chart tracing content from the educational community (national curriculum organizations, state curriculum guides, content standards, textbooks and instructional materials) through Iowa Testing Programs test specifications, item writing, ITP editing and content review, the Iowa tryout and analysis of Iowa data, the Iowa item bank, RPC editing and content review, the national tryout with external fairness/content review and analysis of national tryout data and reviewer comments, the test item bank, preliminary forms of the tests, special content review, final forms of the tests, and standardization.]
Fairness Review Content analysis of test materials is a critical aspect of test development. To ensure that items represent ethnic groups accurately and show the range of interests of both genders, expert panels review all materials in the national item tryout. Such panels serve a variety of functions, but their role in test development is to ensure that test specifications cover what is intended given the definition of each content domain. They also ensure that item formats in each test make questions readily accessible to all students and that sources of construct-irrelevant variance are minimized. Members of these panels come from national education communities with diverse social, cultural, and geographic perspectives. A description of the fairness review procedures used for Forms A and B of the tests appears in Part 7. Development of Individual Tests The distribution of skills in Levels 5 through 14 of the Iowa Tests of Basic Skills appears in Table 3.1. The table indicates major categories in the content specifications for each test during item development. For some tests (e.g., Vocabulary), more categories are used during item development than may be used on score reports because their presence ensures variety in the materials developed and chosen for final forms. For most tests, however, the major content categories are broken down further in diagnostic score reports. The current edition of the ITBS reflects a tendency among educators to focus on core standards and goals, particularly in reading and language arts, so in some parts of the battery there are fewer specific skill categories than in earlier editions. The following descriptions of major content areas of the Complete Battery provide information about the conceptual definition of each domain. They address general issues related to measuring achievement in each domain and give the rationale for approaches to item development. A unique feature of tests such as the ITBS is that the continuum of achievement they measure spans ages 5 to 14, when cognitive and social development proceed at a rapid rate. The definition of each domain describes school achievement over a wide range. Thus, each test in the battery is actually conceived as a broad measure of educational development in school-related subjects for students ages 5 through 14.
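The Guide does not spell out in this part which DIF statistics are computed, so the following is a generic illustration of one widely used screening method, the Mantel-Haenszel procedure applied to counts of correct and incorrect responses within matched total-score strata. The function name, the counts, and the three-stratum setup are invented for the example and are not the ITBS analyses themselves.

```python
from math import log

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and MH D-DIF for one item.

    `strata` is a list of tuples (ref_correct, ref_incorrect, focal_correct,
    focal_incorrect), one tuple per matched total-score stratum.
    MH D-DIF = -2.35 * ln(odds ratio); values near 0 suggest little DIF,
    while negative values indicate the item was harder for the focal group
    than for matched reference-group students.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    odds_ratio = num / den
    return odds_ratio, -2.35 * log(odds_ratio)

# Invented counts for one item, matched on total score (three strata).
strata = [
    (40, 20, 30, 30),  # low-scoring stratum
    (60, 15, 45, 25),  # middle stratum
    (80, 5, 70, 10),   # high-scoring stratum
]
alpha, delta = mantel_haenszel_dif(strata)
print(f"MH odds ratio = {alpha:.2f}, MH D-DIF = {delta:.2f}")
```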
Fairness Review

Content analysis of test materials is a critical aspect of test development. To ensure that items represent ethnic groups accurately and show the range of interests of both genders, expert panels review all materials in the national item tryout. Such panels serve a variety of functions, but their role in test development is to ensure that test specifications cover what is intended given the definition of each content domain. They also ensure that item formats in each test make questions readily accessible to all students and that sources of construct-irrelevant variance are minimized. Members of these panels come from national education communities with diverse social, cultural, and geographic perspectives. A description of the fairness review procedures used for Forms A and B of the tests appears in Part 7.

Development of Individual Tests

The distribution of skills in Levels 5 through 14 of the Iowa Tests of Basic Skills appears in Table 3.1. The table indicates major categories in the content specifications for each test during item development. For some tests (e.g., Vocabulary), more categories are used during item development than may be used on score reports because their presence ensures variety in the materials developed and chosen for final forms. For most tests, however, the major content categories are broken down further in diagnostic score reports. The current edition of the ITBS reflects a tendency among educators to focus on core standards and goals, particularly in reading and language arts, so in some parts of the battery there are fewer specific skill categories than in earlier editions.

The following descriptions of major content areas of the Complete Battery provide information about the conceptual definition of each domain. They address general issues related to measuring achievement in each domain and give the rationale for approaches to item development. A unique feature of tests such as the ITBS is that the continuum of achievement they measure spans ages 5 to 14, when cognitive and social development proceed at a rapid rate. The definition of each domain describes school achievement over a wide range. Thus, each test in the battery is actually conceived as a broad measure of educational development in school-related subjects for students ages 5 through 14.

Table 3.1 Distribution of Skills Objectives for the Iowa Tests of Basic Skills, Forms A and B
(each entry gives number of major categories / number of skills objectives for Levels 5 and 6, Levels 7 and 8, and Levels 9-14, in that order; a dash indicates the test is not included at those levels)

Vocabulary: 1/3; 1/3; 1/3
Reading
  Reading Comprehension: 2/5; 2/4; 3/9
  Reading Total: 3/8; 3/7; 4/12
Language
  Spelling: -; 4/4; 3/5
  Capitalization: -; -; 7/19
  Punctuation: -; -; 4/21
  Usage and Expression: -; -; 5/22
  Language Total: 7/7; 4/13; 19/67
Mathematics
  Math Concepts and Estimation: -; 4/13; 6/20
  Math Problem Solving and Data Interpretation: -; 6/12; 6/13
  Math Computation: -; 2/6; 12/24
  Math Total: 4/11; 12/31; 24/57
Social Studies: -; 4/12; 4/14
Science: -; 4/10; 4/11
Sources of Information
  Maps and Diagrams: -; -; 3/10
  Reference Materials: -; -; 3/10
  Sources Total: -; 4/9; 6/20
Listening: 2/8; 2/8; 2*/8*
Word Analysis: 2/6; 2/8; 2*/8*
Total: 18/40; 35/98; 65/197

*Listening and Word Analysis are supplementary tests at Level 9.

Vocabulary

Understanding the meanings of words is essential to all communication and learning. Schools can contribute to vocabulary power through planned, systematic instruction; informal instruction whenever the opportunity arises; reading of a variety of materials; and activities and experiences, such as field trips and assemblies. One of a teacher's most important responsibilities is to provide students with an understanding of the specialized vocabulary and concepts of each subject area.

The Vocabulary test involves reading and word meaning as well as concept development. Linguistic/structural distinctions among words are monitored during test development so each form and level includes nouns, verbs, and modifiers. At each test level, vocabulary words come from a range of subjects and represent diverse experiences. Because the purpose of the Vocabulary test is to provide a global measure of word knowledge, specific skills are not reported.

Monitoring the parts of speech in the Vocabulary test is important because vocabulary is closely tied to concept development. The classification of words is based on functional distinctions; i.e., the part that nouns, verbs, modifiers, and connectives play in language. Representing the domain of word knowledge in this way is especially useful for language programs that emphasize writing. Although words are categorized by part of speech (nouns, verbs, and modifiers) for test development, skill scores are not reported by category.

In Levels 5 and 6, the Vocabulary test is read aloud by the teacher. The student is required to identify the picture that goes with a stimulus word. In Levels 7 and 8, the Vocabulary test is read silently by the student and requires decoding skills. In the first part, the student selects the word that describes a stimulus picture. In the second part, the student must understand the meaning of words in the context of a sentence. In Levels 9 through 14, items consist of a word in context followed by four possible definitions. Stimulus words were chosen from The Living Word Vocabulary (Dale & O'Rourke, 1981), as were words constituting the definitions.
Emphasis is on grade-level appropriate vocabulary that children are likely to encounter and use in daily activities, both in and out of school, rather than on specialized or esoteric words, jargon, or colloquialisms. Nouns, verbs, and modifiers are given approximately equal representation. Target words are presented in a short context to narrow the range of meaning and, in the upper grades, to allow testing for knowledge of uncommon meanings of common vocabulary words. Word selection is carefully monitored to prevent the use of extremely common words and cognates as distractors and to ensure that the same target words do not appear in parallel forms of current and previous editions of the tests.

Few words in the English language have exactly the same meaning. An effective writer or speaker is one who can select words that express ideas precisely. It is not the purpose of an item in the Vocabulary test to determine whether the student knows the meaning of a single word (the stimulus word). Nor is it necessary that the response words be easier or more frequently used than the stimulus word, although they tend to be. Rather, the immediate purpose of each item is to determine if the student is able to discriminate among the shades of meaning of all words used in the item. Thus, a forty-item Vocabulary test may sample as many as 200 words from a student's general vocabulary.

Word Analysis

The purpose of the Word Analysis test is to provide diagnostic information about a student's ability to identify and analyze distinct sounds and symbols of spoken and written language. At all levels the test emphasizes the student's ability to transfer phonological representations of language to their graphemic counterparts. Transfer is from sounds to symbols in all items, which is consistent with developmental patterns in language learning.

Word analysis skills are tested in Levels 5 through 9. Skills involving sound-letter association, phonemic awareness, and word structure are represented. Stimuli consist of pictures, spoken language, written language, and novel words. In Levels 5 and 6, the skills involve letter recognition, letter-sound correspondence, initial sounds, final sounds, and rhyming sounds. In Levels 7 through 9, more complex phonological structures are introduced: medial sounds; silent letters; initial, medial, and final substitutions; long and short vowel sounds; affixes and inflections; and compound words. Items in the Word Analysis test measure decoding skills that require knowledge of grapheme-phoneme relationships. Information from such items may be useful in diagnosing difficulties of students with low scores in reading comprehension.

Reading

The Reading tests in the Complete Battery of the ITBS have emphasized comprehension over the 15 editions. This emphasis continues in the test specifications for all levels. The Reading tests are concerned with a student's ability to derive meaning; skills related to the so-called building blocks of reading comprehension (grapheme-phoneme connections, word attack, sentence comprehension) are tested in other parts of the battery in the primary grades. When students reach an age when independent reading is a regular part of daily classroom activity, the emphasis shifts to questions that measure how students derive meaning from what they read.

The Reading test in Levels 6 through 8 of the ITBS accommodates a wide range of achievement in early reading. Level 6 consists of Reading Words and Reading Comprehension.
Reading Words includes three types of items to measure how well the student can identify and decode letters and words in context. Auditory and picture cues are used to measure word recognition. Complete sentences accompanied by picture cues are used to measure word attack. Reading Comprehension includes three types of items to assess how well the student can understand sentences, picture stories, and paragraphs. Separate scores are reported for Reading Words and Reading Comprehension so it is possible to obtain a Reading score for students who are not reading beyond the word level.

In Levels 7 and 8, the Reading test measures sentence and story comprehension. Sentence comprehension is a cloze task that requires the student to select the word that completes a sentence appropriately. Story comprehension is assessed with pictures or text as stimuli. Pictures that tell a story are followed by questions that require students to identify the story line and understand connections between characters and plot. Fiction and nonfiction topics are used to measure how well the student can read and comprehend paragraphs. Story comprehension skills require students to understand factual details as well as make inferences and draw generalizations from what they have read.

Levels 5 through 9 of the Complete Battery of Forms A and B combine the tests related to early reading in the Primary Reading Profile. The reading profile can be used to help explain the reasons for low scores on the Reading test at these levels. It is described in the Interpretive Guide for Teachers and Counselors for Levels 5-8.

New to Forms A and B is a two-part structure for the Reading Comprehension test at Levels 9-14. Prior to standardization, a preliminary version of the Form A Reading Comprehension test was administered in two separately timed sections. Item analysis statistics and completion rates indicated that the two-part structure was preferable to a single, timed administration.

In Levels 9 through 14, the Reading Comprehension test consists of passages that vary in length, generally becoming longer and more complex in the progression from Levels 9 to 14. The range of content in Form A of the Complete Battery is shown in Table 3.2. The passages represent various types of material students read in and out of school. They come from previously published material and information sources, and they include narrative, poetry, and topics in science and social studies. In addition, some passages explain how to make or do something, express an opinion or point of view, or describe a person or topic of general interest. When needed, an introductory note is included to give background information. Passages are chosen to satisfy high standards for writing quality and appeal. Good literature and high-quality nonfiction offer opportunities for questions that tap complex cognitive processes and that sustain student interest.

Table 3.2 Types of Reading Materials
Iowa Tests of Basic Skills — Complete Battery, Form A
(passage types by level; each passage is followed by roughly four to eight items)

Level 9: Literature (Fiction); Literature (Fable); Social Studies (Urban Wildlife); Literature (Fiction)
Level 10: Nonfiction (Biography); Social Studies (U.S. History); Literature (Fiction); Literature (Poetry); Science (Field Observation); Nonfiction (Newspaper Editorial); Social Studies (Anthropology); Science (Earth Science)
Level 11: Nonfiction (Transportation); Literature (Fiction); Nonfiction (Food and Culture); Science (Insects)
Level 12: Nonfiction (Biography); Science (Animal Behavior); Literature (Folktale); Literature (Poetry)
Level 13: Nonfiction (Social Roles); Science (Human Anatomy); Social Studies (Preservation); Social Studies (Government Agency)
Level 14: Nonfiction (Personal Essay); Literature (Poetry); Literature (Fiction); Social Studies (Culture and Traditions)

Table 3.3 presents the content specifications for the Reading Comprehension test. It includes a sample of items from Levels 10 and 14 that corresponds to each process skill in reading. More detailed information about other levels of the Reading Comprehension test is provided in the Interpretive Guide for Teachers and Counselors.
Table 3.3 Reading Content/Process Standards
Iowa Tests of Basic Skills — Complete Battery, Form A
(illustrative items at Levels 10 and 14)

Factual Understanding
• Understand stated information (Level 10: 6, 15, 28; Level 14: 26, 35, 42)
• Understand words in context (Level 10: 19, 25; Level 14: 5, 28, 38)
Inference and Interpretation
• Draw conclusions (Level 10: 4, 18, 29; Level 14: 19, 27, 41)
• Infer traits, feelings, or motives of characters (Level 10: 17, 27, 30; Level 14: 2, 24)
• Interpret information in new contexts (Level 10: 7, 26; Level 14: 44, 51)
• Interpret nonliteral language (Level 10: 12, 14; Level 14: 21, 37)
Analysis and Generalization
• Determine main ideas (Level 10: 11, 33; Level 14: 17, 25, 47)
• Identify author's purpose or viewpoint (Level 10: 37; Level 14: 39, 52)
• Analyze style or structure of a passage (Level 10: 13, 20; Level 14: 11, 22, 45)

Reading requires critical thinking on a number of levels. The ability to decode words and understand literal meaning is, of course, important. Yet active, strategic reading has many other components. An effective reader must draw on experience and background knowledge to make inferences and generalizations that go beyond the words on the page. Active readers continually evaluate what they read to comprehend it fully. They make judgments about the central ideas of the selection, the author's point of view or purpose, and the organizational scheme and stylistic qualities used. This is true at all developmental levels. Children do not suddenly learn to read with such comprehension at any particular age or grade. Thoughtful reading is the result of a period of growth in comprehension that begins in kindergarten or first grade; no amount of concentrated instruction in the upper elementary grades can make up for a lack of attention to reading for meaning in the middle or lower grades. Measurement of these aspects of the reading process is required if test results are used to support inferences about reading comprehension. The ITBS Reading tests are based on content standards that reflect reading as a dynamic cognitive process.

Listening

Listening comprehension is measured in Levels 5 through 9 of the ITBS Complete Battery and in Levels 9 through 14 of the Listening Assessment for ITBS. The Listening Assessment for ITBS is described in Part 9. Listening is often referred to as a "neglected area" of the school curriculum. Children become good listeners through a combination of direct instruction and incidental learning; however, children with good listening strategies use them throughout their school experiences. Good listening strategies developed in the early elementary years contribute to effective learning later.

Levels 5 through 9 of the Listening test measure general comprehension of spoken language. Table 3.4 shows the content/process standards for these tests. They emphasize understanding meaning at all levels. Many comprehension skills measured by the Listening tests in the early grades reflect aspects of cognition measured by the Reading Comprehension tests in the later grades, when a student's ability to construct meaning from written text has advanced. Such items would be much too difficult for the Reading tests in the Primary Battery because of the complex reading material needed to tap these skills. It is possible, however, to measure such skills through spoken language, so at the early levels the Listening tests are important indicators of the cognitive processes that influence reading.

Table 3.4 Listening Content/Process Standards
Iowa Tests of Basic Skills — Complete Battery (Levels 5-9; Levels 9-14*)
Process skills: Literal Meaning; Following Directions; Visual Relationships; Sustained Listening; Inferential Meaning; Concept Development; Predicting Outcomes; Sequential Relationships; Numerical/Spatial/Temporal Relationships; Speaker's Purpose, Point of View, or Style
*Listening is a supplementary test at these levels.

Language

Language arts programs comprise the four communication skills that prepare students for effective interaction with others: reading, writing, listening, and speaking. These aspects of language are assessed in several ways in The Iowa Tests. Reading, because of its importance in the elementary grades, is assessed by separate tests in Levels 6 through 14 of the ITBS. Writing, the process of generating, organizing, and expressing ideas in written form, is the focus of the Iowa Writing Assessment in Levels 9 through 14. Listening, the process of paying attention to and understanding spoken language, is measured in the ITBS Complete Battery at Levels 5 through 9 and in the Listening Assessment for ITBS in Levels 9 through 14. Selecting or developing revisions of written text is measured by the Constructed-Response Supplement to The Iowa Tests: Thinking About Language in Levels 9 through 14.

The domain specifications for the Complete Battery and Survey Battery of the ITBS identify aspects of spoken and written language important to clarity of thought and expression. The complexity of tasks presented and the transition from spoken to written language both progress from Level 5 through Level 14. The Language tests in Levels 5 and 6 measure the student's comprehension of linguistic relationships common to spoken and written language. The tests focus on ways language is used to express ideas and to understand relationships.
The student is asked to select a picture that best represents the idea expressed in what the teacher reads aloud. The subdomains of the Language tests in Levels 5 and 6 include:

Operational Language: understanding relationships among subject, verb, object, and modifier
Verb Tense: discriminating past, present, and future
Classification: recognizing common characteristics or functions
Prepositions: understanding relationships such as below, behind, between, etc.
Singular-Plural: differentiating singular and plural referents
Comparative-Superlative: understanding adjectives that denote comparison
Spatial-Directional: translating verbal descriptions into pictures

The Language tests in Levels 7 through 14 of the Complete Battery and Survey Battery were developed from domain specifications with primary emphasis on linguistic conventions common to standard written English.1 Although writing is taught in a variety of ways, the approaches share a common goal: a written product that expresses the writer's meaning as precisely as possible. An important quality of good writing is a command of the conventions of written English that allows the writer to communicate effectively with the intended audience. The proofreading, editing, and revising stages of the writing process involve these skills, and the proofreading format used for the Language tests is an efficient way to measure knowledge of these conventions.

Although linguistic conventions change constantly (O'Conner, 1996), the basic skills in written communication have changed little over the years. The importance of precision in written communication is greater than ever for an increasing proportion of the adult population, whether because of the Internet or because of greater demand for information. To develop tests of language skills, authors must strike a balance between precision on the one hand and fluctuating standards of appropriateness on the other. In general, skills for the Language tests sample aspects of spelling, capitalization, punctuation, and usage pertaining to standard written English, according to language specialists and writing guides (e.g., The Chicago Manual of Style, Webster's Guide to English Usage, The American Heritage Dictionary). Content standards for usage and written expression continue to evolve, reflecting the strong emphasis on effective writing in language arts programs.

Levels 7 and 8 of the ITBS provide a smooth transition from the emphasis on spoken language found in Levels 5 and 6 to the emphasis on written language found in Levels 9 through 14. The entire test at Levels 7 and 8 is read aloud by the teacher. In sections involving written language, students read along with the teacher as they take the test. Spelling is measured in a separate test; the teacher reads a sentence that contains three keywords, and the student identifies which word is spelled incorrectly. Capitalization, punctuation, and usage/expression are measured in context, using items similar in format to those in Levels 9 through 14.

1 The Language tests measure skills in the conventions of "standard" written English. Students with multicultural backgrounds or, particularly, second-language backgrounds may have special difficulty with certain types of situations presented in the Language tests. It is important to remember that the tests measure proficiency in standard written English, which may differ from the background language of the home. Such differences should be taken into consideration in interpreting the scores from the Language tests, and in any follow-up instruction that may be based, in part, on test results.

The content of Levels 9 through 14 includes skills in spelling, capitalization, punctuation, and usage/expression.
Separate tests in each area are used in the Complete Battery; a single test is used in the Survey Battery. Table 3.5 shows the distribution of skills for language tests in the Complete Battery and Survey Battery of Level 10, Form A. Writing effectively requires command of linguistic conventions in all of these areas at once, but greater diagnostic information about strengths and weaknesses in writing is obtained using separate tests in each area.

Table 3.5 Comparison of Language Tests by Battery
Iowa Tests of Basic Skills — Level 10, Form A
(number of items)

Spelling: Complete 32 (Root Words 22; Words with Affixes 6; Correct Spelling 4); Survey 11
Capitalization: Complete 26 (Names and Titles 3; Dates and Holidays 3; Place Names 6; Organizations and Groups 3; Writing Conventions 6; Overcapitalization 2; Correct Capitalization 3); Survey 10
Punctuation: Complete 26 (End Punctuation 12; Comma 7; Other Punctuation Marks 4; Correct Punctuation 3); Survey 10
Usage and Expression: Complete 33 (Nouns, Pronouns, and Modifiers 9; Verbs 8; Conciseness and Clarity 4; Organization of Ideas 5; Appropriate Use 7); Survey 16
Total: Complete 117; Survey 47

The development of separate tests of language skills offers several advantages. First, content that offers opportunities to measure one skill (e.g., salutations in business letters on a punctuation test) may not offer opportunities to measure another. It is extremely difficult to construct a single test that will provide as comprehensive a definition of each domain—and hence as valid a test—as a separate test in each domain. Second, a single language test covering all skills would need many items to yield a reliable, norm-referenced score in each area. (Note that national percentile ranks are not provided with the Survey Battery in spelling, capitalization, punctuation, and usage/expression.) Third, the directions to students can be complicated when items covering all four skills are included. Finally, it is easier to maintain uniformly high-quality test items if they focus on specific skills. In a unitary test, to retain good items it is sometimes necessary to include a less than ideal item associated with the same stimulus material.

A comprehensive school curriculum in the language arts is likely to be considerably broader in coverage than the content standards of the ITBS Language tests. For example, many language arts programs teach the general research skills students need for lifelong learning. Because these abilities are used in many school subjects, they are tested in the Reference Materials test rather than in the Language tests. This permits more thorough coverage of reference skills and underscores their importance to all teachers. Language arts programs also develop competence in writing by having students read each other's writing. Although such skills are measured in the second part of the Usage and Expression test, they are also covered in the Reading Comprehension test in standards related to inference and generalization.

Each Language test in the Complete Battery is developed through delineation of the relevant domain. The content specifications are adjusted for each edition to reflect changing patterns in the conventions of standard written English.
The trend toward so-called open punctuation, for example, has led to combining certain skills for that test in the current edition. Other details about content specifications appear in the Interpretive Guide for Teachers and Counselors. Domain descriptions for the language tests are as follows:

Spelling. The Spelling test for Levels 9 through 14 directly measures a student's ability to identify misspelled words. The items consist of four words, one of which may be misspelled. The student is asked to identify the incorrectly spelled word. A fifth response, "No mistakes," is the correct response for approximately one-eighth of the items. The main advantage of the item format for Spelling is that it tests four spelling words in each item. Another advantage is that the words tested better represent spelling words taught at target grade levels. With this item format, one can obtain suitable reliability and validity without using more advanced or less familiar spelling words. Careful selection of words is the crucial aspect of content validity in spelling tests, regardless of item format.

The spelling words chosen for each form of the test come from the speaking vocabulary of students at a given grade level. Errors are patterned after misspellings observed in student writing. In addition, misspellings are checked so that: (1) the keyword can be identified by the student despite its misspelling, (2) a misspelled English word is not a correctly spelled word in a common second language, and (3) target words do not appear in parallel forms of the current or previous edition of the battery. Spelling words are also selected to avoid overlap with target words and distractors in the Vocabulary test at the same level.

The type of spelling item used on The Iowa Tests has been demonstrated to be superior to the type that presents four possible spellings of the same word. The latter type has several weaknesses. Many frequently misspelled words have only one common misspelling. Other spelling variations included as response options are seldom selected, limiting what the item measures. In addition, the difficulty of many spelling items in this format doesn't reflect the frequency with which the word is misspelled. This inconsistency raises doubt about the validity of a test composed of such items. Educators often question the validity of multiple-choice spelling tests versus list-dictation tests. However, a strong relationship between dictation tests and certain multiple-choice tests has repeatedly been found (Frisbie & Cantor, 1995).

Capitalization and Punctuation. Capitalization and punctuation skills function in writing rather than in speaking or reading. Therefore, a valid test of these skills should include language that might have been drawn from the written language of a student. The phrasing of items should also be on a level of sophistication commensurate with the age and developmental level of the student. Efforts have been made to include materials that might have come from letters, reports, stories, and other writing from a student's classroom experience.

The item formats in the Capitalization and Punctuation tests are similar. They include one or two sentences spread over three lines of about equal length. The student identifies the line with an error or selects a fourth response to indicate no error. This item type, uncommon in language testing, was the subject of extensive empirical study before being used in the ITBS.
Large-scale tryout of experimental tests composed of such items indicated the reliability per unit of testing time was at least as high as that of more familiar item types. Items of this type also have certain logical advantages. An item in uninterrupted discourse is more likely to differentiate between students who routinely use correct language and those who lack procedural knowledge of the writing conventions measured. In the traditional multiple-choice item, the error situations are identified. For example, the student might only be required to decide whether "Canada" should be capitalized or whether a comma is required in a given place in a sentence. In the find-the-error item type used on the ITBS, however, error situations are not identified. The student must be sensitive to errors when they occur. Such items more realistically reflect the situations students encounter in peer editing or in revising their writing.

Another reason for using the find-the-error format concerns the frequency of various types of errors. Some errors occur infrequently but are serious nonetheless. With a find-the-error item, all of the student's language practices, good and bad, have an opportunity to surface during testing. Such items can be made easy or difficult as test specifications demand without resorting to artificial language or esoteric conventions.
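The tryout comparison above is stated in terms of reliability per unit of testing time, but the computation is not shown in this Guide. One conventional way to make such a comparison, sketched below with entirely hypothetical figures, is to project each format's observed reliability to a common testing time with the Spearman-Brown formula.

```python
def spearman_brown(reliability, k):
    """Projected reliability of a test lengthened (or shortened) by a factor k,
    assuming the added material is parallel to the original."""
    return k * reliability / (1 + (k - 1) * reliability)

# Hypothetical tryout results for two item formats.
# A 10-minute find-the-error section with reliability .70, projected to 20 minutes:
find_the_error_20min = spearman_brown(0.70, k=2)       # about .82
# A 20-minute section of traditional identified-error items with reliability .80
# needs no projection. On the same 20-minute budget the find-the-error format is
# at least as reliable, which is the sense of "reliability per unit of testing time."
traditional_20min = 0.80
```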
Usage and Expression. Knowledge and use of word forms and grammatical constructions are revealed in both spoken and written language. Spoken language varies with the social context. Effective language users change the way they express themselves depending on their audience and their intended meaning. Written language in school is a specific aspect of language use, and tests of written expression typically reflect writing conventions associated with "standard" English. Mastery of this aspect of written expression is a common goal of language arts programs in the elementary and middle grades; other important goals tend to vary from school to school, district to district, and state to state, and they tend to be elusive to measure. This is why broad-range achievement test batteries such as the ITBS focus exclusively on standard English.

The mode of expression, spoken or written, influences how a person chooses words to express meaning. A student's speech patterns may contain constructions that would be considered inappropriate in formal writing situations. Similarly, some forms of written expression would be awkward if spoken because complex constructions do not typically occur in conversation. The ITBS Usage and Expression test includes situations that could arise in both written and spoken language.

The first part of the Usage and Expression test uses the same item format found in Capitalization and Punctuation. Items contain one to three sentences, one of which may have a usage error. Students identify the line with the error or select "No mistakes" if they think there is no error. Some of the items in this part form a connected story. The validity, reliability, and functional characteristics of this item type are important considerations in its use. In a study of why students selected various distractors, students were found to make various usage errors—many more than could be sampled in a test of reasonable length. Satisfactory reliability could be achieved with fewer items if the items contained distractors in which other types of usage errors are commonly made. Comparisons of find-the-error items and other item formats indicated better item discrimination for the former. The difficulty of the find-the-error items also reflected more closely the frequency with which various errors occur in student speech and writing.

The second part of the Usage and Expression test assesses broader aspects of written language, such as concise expression, paragraph development, and appropriateness to audience and purpose. This part includes language situations more directly associated with editing and revising paragraphs and stories. Students are required to discriminate between more and less desirable ways to express the same idea based on content standards in the areas of:

Conciseness and Clarity: being clear and using as few words as possible
Appropriateness: recognizing the word, phrase, sentence, or paragraph that is most appropriate for a given purpose
Organization: understanding the structure of sentences and paragraphs

The item types in this part of the test were determined by the complexity and linguistic level of the skill to be assessed. In some cases, the student is asked to select the best word or phrase for a given situation; in others, the choice is between sentences or paragraphs. In all cases, the student must evaluate the effectiveness or appropriateness of alternate expressions of the same idea.

What constitutes "good" or "correct" usage varies with situations, audiences, and cultural influences. Effective language teaching includes appreciation of the influence of context and culture on usage, but no attempt is made to assess this type of linguistic awareness. A single test embracing these objectives would involve more than "basic" skills and would have to sample the language domain beyond what is common to most school curricula.

Mathematics

In general, changes occur slowly in the nature of basic skills and objectives of instruction. The field of mathematics, however, has been a noticeable exception. Elementary school mathematics has been in the process of continual change over the past 45 to 50 years. In grades 3 through 8, the math programs of recent years have modified method, placement, sequence, and emphasis, but quantitative reasoning and problem solving remain important. The National Council of Teachers of Mathematics (NCTM) Principles and Standards for School Mathematics (2000) describes the content of the math curriculum and the process by which it is assessed. Changes in content and emphasis of the ITBS Math tests reflect these new ideas about school math curricula.

Forms 1 and 2 of the ITBS were constructed when textbook series exhibited remarkable uniformity in the grade placement of math concepts. Forms 3 and 4 were developed during the transition from "traditional" to "modern" math. In the three years following publication of Forms 3 and 4, math programs changed so dramatically that a special interim edition, The Modern Mathematics Supplement, was published to update the tests. During the late 1960s and early 1970s, the math curriculum was relatively stable. Forms 5 and 6 (standardized in 1970-71) increasingly emphasized concept development, whereas Forms 7 and 8 (standardized in 1977-78) shifted the emphasis to computation processes and problem-solving strategies. Greater attention was paid to estimation skills and mental arithmetic in Forms G and H (standardized in 1984-85).
The 1989 NCTM Standards led to significant changes in two of the ITBS Math tests (Concepts and Estimation and Problem Solving and Data Interpretation) in Forms K, L, and M. The 2000 revision of the NCTM Standards is reflected in slight modifications of these tests in Forms A and B.

Concepts and Estimation. The curriculum and teaching methods in mathematics show great diversity. Newer programs reflect the NCTM Standards more directly, yet some programs have endorsed new standards for math education without significantly changing what is taught. In part, diverse approaches to math teaching belie the high degree of similarity in the purposes and objectives of math education. As with any content area, the method used to teach math is probably less important than a teacher's skill with the method. When new content standards and methods are introduced, teachers need time to apply them; teachers must learn what works, what does not work, and what to emphasize. During times of curriculum transition, an important part of a teacher's experience is adjusting to changes in assessment based on new standards.

The Iowa Tests have always emphasized understanding, discovery, and quantitative thinking. Math teachers know students need more time to understand a fact or a process when meaning is stressed than when math is taught simply by drill and practice. In the long run, children taught by methods that focus on understanding will develop greater competence, even though they may not master facts as quickly in the early stages of learning.

Even with a test aimed at a single grade level, compromises are necessary in placement of test content. A test with many items on concepts taught late in the school year may be inappropriate to use early in the school year. Conversely, if concepts taught late in the school year are not covered on the test, this diminishes the validity of the test if administered at the end of the year. In the Concepts and Estimation test, a student in a given grade should be familiar with 80 to 85 percent of the items for that grade at the beginning of the school year. By midyear, a student should be familiar with 90 to 95 percent of the items for that grade. The remaining items require understanding usually gained in the second half of the year. Assigning levels for the Concepts and Estimation test should be done carefully, because shifts in local curriculum can affect which test levels are appropriate for which grades.

Beginning with Forms K and L, Levels 9 through 14 (grades 3 through 8), the Math Concepts test became a two-part test that included estimation. The name of the test was changed to Concepts and Estimation because of the separately timed estimation section. Part 1 is similar to the Math Concepts test in previous editions of the ITBS. In Forms A and B, this part continues to focus on numeration, properties of number systems, and number sequences; fundamental algebraic concepts; and basic measurement and geometric concepts. More emphasis is placed on probability and statistics. As in past editions, computational requirements of Part 1 are minimal. Students may use a calculator on Part 1.

Part 2 of the Concepts and Estimation test is separately timed to measure computational estimation. Early editions included a few estimation items in the Concepts test (about 5 percent).
However, the changing role of computation in the math curriculum, brought about by the growing use of calculators, and the continued need for estimation skills in everyday life require a more prominent role for estimation. Both the 1989 and 2000 NCTM Standards documented the importance of estimation. Studies indicate that, with proper directions and time limits, students will use estimation strategies and will rarely resort to exact computation (Schoen, Blume & Hoover, 1990). Several aspects of estimation are represented in Part 2 of the Concepts and Estimation test, including:

(a) standard rounding—rounding to the closest power of 10 or, in the case of mixed numbers, to the closest whole numbers;
(b) order of magnitude involving powers of ten; and
(c) number sense, including compatible numbers and situations that require compensation.
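The strategies listed above can be illustrated with simple worked examples. The numbers below are illustrative only and are not drawn from the tests themselves.

```python
# Illustrative examples (not actual test items) of the three estimation strategies.

# (a) Standard rounding: round to the nearest power of ten, or round mixed
#     numbers to the nearest whole numbers.
#     612 + 387     -> 600 + 400 = about 1,000   (exact: 999)
#     5 7/8 x 3 1/6 -> 6 x 3     = about 18      (exact: about 18.6)

# (b) Order of magnitude: track only the powers of ten involved.
#     4,120 x 48    -> (4 x 10**3) x (5 x 10**1) = 2 x 10**5, about 200,000
#                      (exact: 197,760)

# (c) Number sense with compatible numbers and compensation:
#     298 / 31      -> 300 / 30 = about 10       (exact: about 9.6)
#     47 x 21       -> 50 x 20 = 1,000; rounding 47 up and 21 down roughly
#                      offset each other          (exact: 987)

rounded_sum = 600 + 400          # 1000
magnitude = 4e3 * 5e1            # 2.0e5
compatible_quotient = 300 // 30  # 10
```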
This reflects the large proportion of items at these levels that require multiple steps to solve. The content standards emphasize problem contexts with solution strategies beyond simple application of computational algorithms memorized through a drill and practice curriculum. Table 3.6 outlines the computational skills required for Form A of the Problem Solving and Data Interpretation test. In mathematics, an ideal problem is one that is novel for the individual solving it. Many problems in instructional materials, however, might be better described as “exercises.” Often they are identical or similar to others explained in textbooks or by the teacher. In such examples, the student is not called on to do anything new; modeling, imitation, and recall are the primary behaviors required. This is not to suggest that repetition—such as working exercises and practicing basic facts—is not useful; indeed, it can be important. However, opportunities should also be provided for students to solve realistic problems in novel situations they might experience in daily life. Problem Solving and Data Interpretation includes items that measure “problem-solving process” or “strategy.” These item types were adapted from Polya’s four-step problem-solving model (Polya, 1957). As part of the Iowa Problem Solving Project (Schoen, Blume & Hoover, 1990), items were 40 developed to measure the steps of (1) getting to know the problem, (2) choosing what to do, (3) doing it, and (4) looking back. Information gathered as part of this project was used to integrate items of this type into The Iowa Tests. Including questions that require the interpretation of data underscores a long-standing belief in the importance of graphical displays of quantitative information. The ITBS was the first achievement battery to assess a student’s ability to interpret data in graphs and tables. Such items have been included since the first edition in 1935 and appeared in a separate test until 1978. In more recent editions, these items were part of a test called Visual Materials. Formal instruction in the interpretation and analysis of graphs, tables, and charts has become part of the mathematics curriculum as a result of the 1989 and 2000 NCTM Standards. This trend reflects increased emphasis on statistics at the elementary level. The interpretation of numerical information presented in graphs and tables provides the foundation for basic descriptive statistics. The data interpretation skills assessed in this test are reading amounts, comparing quantities, and interpreting relationships and trends in graphs and tables. Stimulus materials include pictographs, circle graphs, bar graphs, and line graphs. Tables, charts, and other visual displays from magazines, newspapers, television, and computers are also presented. The information is authentic, and the graphical or tabular form used is logical for the data. Items in this part of the test essentially require no computation. Students may use a calculator on the Problem Solving and Data Interpretation test. Computation. The early editions of the ITBS measured computation skills with word problems in the Mathematics Problem Solving test just described. Computational load is now considered a confounded effect in problem solving items. As problem solving itself became the focus of that test, an independent measure of speed and accuracy in computation was needed. Beginning with Forms 7 and 8, a separate test of computational skill was added. 
Instruction in computation takes place with whole numbers, fractions, and decimals. Each of these areas includes addition, subtraction, multiplication, and division. Although much computation in "real world" settings involves currency, percents, and ratio or proportion, these applications are nothing more than special cases of basic operations with whole numbers, fractions, and decimals. "Real world" problems that require performing these specialized computation skills are still part of the Problem Solving and Data Interpretation test. The logic of the computation process and the student's understanding of algorithms are measured in the first part of the Concepts and Estimation test.

The grade placement of content in a computation test is more crucial than in other areas of mathematics. For example, whole number division may be introduced toward the end of third grade in many textbooks. Including items that measure this skill in the Level 9 test would be inappropriate if students have not yet been taught the process. In placing items that measure specific computational skills, only skills taught during the school year preceding the year when the test level is typically used are included.

The Computation test for Levels 9 through 14 of Forms A and B differs only in length from recent editions. At all levels, testing time and number of items were each reduced by about 25 percent. The small decrease in the proportion of fraction items and the small increase in the proportion of decimal items made with Forms K, L, and M was maintained in Forms A and B. These modifications reflect increased emphasis on calculators, computers, and metric applications in the math curriculum. This test remains a direct measure of computation that requires a single operation—addition, subtraction, multiplication, or division on whole numbers, fractions, or decimals at appropriate grade levels. Unlike some tests of computation, the ITBS Math Computation test does not confound concept and estimation skills with computational skill. Computation is included in the Math Total score unless a special composite, Math Total without Computation, is requested.

Table 3.6 Computational Skill Level Required for Math Problem Solving and Data Interpretation
Iowa Tests of Basic Skills — Complete Battery, Form A
(entries give the number of operations, with percents in parentheses, for Levels 7 through 14 in order)

Whole Numbers and Currency (totals): 20 (100); 24 (100); 15 (100); 19 (95); 18 (90); 15 (75); 18 (72); 23 (85)
• Basic facts (addition, subtraction, multiplication, and division): 19 (95); 19 (79); 8 (53); 7 (35); 5 (25); 7 (35); 6 (24); 4 (15)
• Other sums, differences, products, and quotients, no renaming (no remainder): 1 (5); 5 (21); 6 (40); 12 (60); 10 (50); 3 (15); 6 (24); 12 (44)
• Other sums, differences, products, and quotients, renaming (remainder): —; —; 1 (7); —; 3 (15); 5 (25); 6 (24); 7 (26)
Fractions and Decimals: —; —; —; 1 (5); 2 (10); 5 (25); 7 (28); 4 (15)
Total Number of Operations: 20; 24; 15; 20; 20; 20; 25; 27
Number of Items Requiring Computation: 17; 16; 12; 11; 13; 14; 15; 15
Number of Items Requiring No Computation: 11; 14; 10; 13; 13; 14; 15; 17

Social Studies

The content domain for the Social Studies test was designed to broaden the scope of the basic skills and to assess progress toward additional concepts and understandings in this area. Although the Social Studies test requires knowledge, it deals with more than memorization of facts or the outcomes of a particular textbook series or course of study. It is concerned with generalizations and applications of principles learned in the social studies curriculum. Many questions on the Social Studies test involve materials commonly found in social studies instruction—timelines, maps, graphs, and other visual stimuli.
The content areas in the Social Studies test are history, geography, economics, and government and society (including social structures, ethics, citizenship, and points of view). These areas are interrelated, and many tasks involve more than one content area. The history content requires students to view events from non-European as well as European perspectives. The test focuses on historical events and experiences in the lives of ordinary citizens. In geography, students apply principles rather than identify facts. Countries in the eastern and western hemispheres are represented. In economics, students are expected to recognize the impact of technology, to understand the interdependence of national economies, and to use basic terms and concepts. Questions in government and society measure a student's understanding of responsible citizenship and knowledge of democratic government. They also assess needs common to many cultures and the functions of social institutions. In addition, a student's knowledge of the unique character of cultural groups is measured.

The Social Studies test attempts to represent the curriculum standards for social studies of several national organizations. National panels have developed standards in history, geography, economics, and civics. The National Council for the Social Studies (NCSS) adopted Curriculum Standards for the Social Studies (NCSS, 1994), which specifies ten content strands for the social studies curriculum. The domain specifications for the Social Studies test parallel the areas in which national standards have been developed. The NCSS curriculum strands map onto many content standards of the ITBS. For example, Strand II of the NCSS Standards (Time, Continuity, and Change) is represented in history: change and chronology in the ITBS. Similarly, Strand III (People, Places, and Environments) matches two ITBS standards in geography (Earth's features and people and the environment). Similar connections exist for other ITBS content skills. Some process skills taught in social studies are assessed in other tests of the ITBS Complete Battery—in particular, the Maps and Diagrams test and the Problem Solving and Data Interpretation test.

Science

Content specifications for Forms A and B of the Science test were influenced by national standards of science education organizations. The National Science Education Standards (NRC, 1996), prepared under the direction of the National Research Council, were given significant attention.
In addition, earlier work on science standards was consulted. Science for All Americans (Rutherford and Ahlgren, 1990) and The Content Core: A Guide for Curriculum Designers (Pearsall, 1993) were resources for developing Science test specifications. The main impact of this work was to elevate the role of process skills and science inquiry in test development. Depending on test level, one-third to nearly one-half of the questions concern the general nature of scientific inquiry and the processes of science investigation.

For test development, questions were classified by content and process. The content classification outlined four major domains:

Scientific Inquiry: understanding methods of scientific inquiry and process skills used in scientific investigations
Life Science: characteristics of life processes in plants and animals; body processes, disease, and nutrition; continuity of life; and environmental interactions and adaptations
Earth and Space Science: the Earth's surfaces, forces of nature, conservation and renewable resources, atmosphere and weather, and the universe
Physical Science: basic understanding of mechanics, forces, and motion; forms of energy; electricity and magnetism; properties and changes of matter

These content standards were used with a process classification developed by AAAS: classifying, hypothesizing, inferring, measuring, and explaining. This classification helped ensure a range of thinking skills would be required to answer questions in each content area of Science.

As in social studies, skills associated with the science curriculum are measured in other tests in the Complete Battery of the ITBS. Some passages in the Reading test use science content to measure comprehension in reading. Skills measured in Problem Solving and Data Interpretation are essential to scientific work. Some skills in the Sources of Information tests are necessary for information gathering in science. Skill scores from these tests may be related to a student's progress in science or a school's science curriculum.

Sources of Information

The Iowa Tests recognize that basic skills extend beyond the school curriculum. Only a small part of the information needed by an educated adult is acquired in school. An educated student or adult knows how to locate and interpret information from available resources. In all curriculum areas, students must be able to locate, interpret, and use information. For this reason, the ITBS Complete Battery includes separate tests on using information sources.

Teaching and learning about information sources differ from other content areas in the ITBS because "sources of information" is not typically a subject taught in school. Skills in using information are developed through many activities in the elementary school curriculum. Further, the developmental sequence of these skills is not well established. As a result, before the specifications for these tests could be written, the treatment and placement of visual and reference materials in instruction were examined. Authorities in related disciplines were also consulted. The most widely used textbooks in five subject areas were reviewed to identify grade placement of information sources and visual displays. Also considered were the extent to which textbook series agreed on placement and emphasis, the contribution of subject areas to skills development, and whether information sources were accompanied by instruction on their use.
The original skills selected for these tests were classified as the knowledge and use of (1) map materials, (2) graphic and tabular materials, and (3) reference materials. Graphs and tables by and large have become part of the school math curriculum, so that category of information sources was shifted to the Math tests beginning with Form K. In its place, a category on presentation of information through diagrams and related visual materials was added. Such materials have become a regular part of various publications—from periodicals to textbooks and research documents—and represent an important source of information across the curriculum. General descriptions of the tests in Sources of Information follow.

Maps and Diagrams. The Maps and Diagrams test measures general cognitive functions for the processing of information as well as specific skills in reading maps and diagrams. Developing the domain specifications for this test involved formal and informal surveys of instructional materials that use visual stimuli. The specifications for map reading were based on a detailed classification by type, function, and complexity of maps appearing in textbooks. The geographical concepts used in map reading were organized by grade level. These concepts were classified as: the mechanics of map reading (e.g., determining distance and direction, locating and describing places), the interpretation of data (geographic, sociological, or economic conditions), and inferring behavior and living conditions (routes of travel, seasonal variations, and land patterns). The specifications for the diagrams part of the test came from analyzing materials in textbooks and other print material related to functional reading. Items require locating information, explaining relationships, inferring processes or products, and comparing and contrasting features depicted in diagrams.

Reference Materials. Although students in most schools have access to reference materials across the curriculum, few curriculum guides pay explicit attention to the foundation skills a student needs to take full advantage of available sources. The content standards for the Reference Materials test include aspects of general cognitive development as well as specific information-gathering skills. The focus is on skills needed to use a variety of references and search strategies in classroom research and writing activities. The items tap a variety of cognitive skills. In the section on using a dictionary, items relate to spelling, pronunciation, syllabification, plural forms, and parts of speech. In the section on general references, items concern the parts of a book (glossary, index, etc.), encyclopedias, periodicals, and other special references. General skills required to access information—such as alphabetizing and selecting guidewords or keywords—are also measured. To answer questions on the Reference Materials test, students must select the best source, judge the quality of sources, and understand search strategies. The upper levels of Forms A and B also include items that measure note-taking skills.

Critical Thinking Skills

The items in the ITBS batteries for Levels 9-14 were evaluated for the critical thinking demands they require of most students. Questions were classified by multiple reviewers, and a consensus approach was used for final decisions. The test specifications for the final test forms did not contain specific quotas for critical thinking skills.
Cognitive processing demands were considered in developing, revising, and selecting items, but the definition of critical thinking was not incorporated directly in any of those decisions. Classifying items as requiring critical thinking depends on judgments that draw upon (a) knowledge of the appropriate content, (b) an understanding of how students interact with the content of learning and the remembering of it, and (c) a consistent use of the meaning of the term "critical thinking" in the content area in question. The ITBS classifications represent the consensus of the authors about the critical thinking required of most students who correctly answer such items. Further information about item classifications for critical thinking is provided in the Interpretive Guide for Teachers and Counselors. Other Validity Considerations Norms Versus Standards In using test results to evaluate teaching and learning, it should be recognized that a norm is merely a description of average performance. Norm-referenced interpretations of test scores use the distribution of test scores in a norm group as a frame of reference to describe the performance of individuals or groups. Norms for achievement should not be confused with standards for performance (i.e., indicators of what constitutes "satisfactory," "proficient," or "exemplary" performance). The distributions of building averages on the ITBS show substantial variability in average achievement from one content area to another in the same school system. For example, schools in a given locale may spend substantially more time on mathematics than on writing, and schools with high averages on the ITBS Language tests may not emphasize writing. A school that scores below the norm in math and above the norm in language may nevertheless need to improve its writing instruction more than its math instruction. Such a judgment requires thorough understanding of patterns of achievement in the district, current teaching emphasis, test content, and expectations of educators and the community. All of these factors contribute to standards-based interpretations of test scores. Many factors should be considered when evaluating the performance of a school. These include the general cognitive ability of the students, learning opportunities outside the school, the emphasis placed on basic skills in the curriculum, and the grade placement and sequencing of the content taught. Large differences in achievement between schools in the same system can be explained by such factors. These factors also can influence how a school or district ranks compared to general norms. Quality of instruction is not the only determining factor. What constitutes satisfactory performance, or what is an acceptable standard, can only be determined by informed judgments about school and individual performance. It is likely to vary from one content area to another, as well as from one locale to another. Ideally, each school must determine what may be reasonably expected of its students. Below-average performance on a test does not necessarily indicate poor teaching or a weak curriculum. Examples of effective teaching are found in many such schools. Similarly, above-average performance does not necessarily mean there is no room for improvement. Interpreting test scores based on performance standards reflects a collective judgment about the quality of achievement.
The use of such judgments to improve instruction and learning is the ultimate obligation of a standards-based reporting system. Some schools may wish to formalize the judgments needed to set performance standards on the Iowa Tests of Basic Skills. National performance standards were developed in 1996 in a workshop organized by the publisher. Details about national performance standards and the method used to develop them are given in The Iowa Tests: Special Report on Riverside's National Performance Standards (Riverside Publishing, 1998). Using Tests to Improve Instruction Using tests to improve instruction and learning is the most important purpose of any form of assessment. It is the main reason for establishing national norms and developmental score scales. Valid national norms provide the frame of reference to determine individual strengths and weaknesses. Sound developmental scales create the frame of reference to interpret academic growth—whether instructional practices have had the desired effect on achievement. These two frames of reference constitute the essential contribution of a standardized achievement test in a local school. Most teachers provide for individual and group differences in one way or another. It would be virtually impossible to structure learning for all students in exactly the same way even if one wanted to. The characteristics, needs, and desires of students require a teacher to allocate attention and assistance differentially. Test results help a teacher to tailor instruction to meet individual needs. Test results are most useful when they reveal discrepancies in performance—between test areas, from year to year, between achievement and ability tests, and between expectations and performance. Many score reports contain unexpected findings. These findings represent information that should not be ignored but instead examined further. Suggestions about using test results to individualize instruction are given in the Interpretive Guide for Teachers and Counselors, along with a classification of content for each test. Any classification system is somewhat arbitrary. The content of the ITBS is represented in the skills a student is required to demonstrate. The skills taxonomy is re-evaluated periodically because of changes in curriculum and teaching methods. For some tests (e.g., Capitalization), the categories are highly specific; for others (e.g., Reading Comprehension), they are more general. The criteria for defining a test's content classification system are meaningfulness and usefulness to the teacher. The Interpretive Guide for Teachers and Counselors and the Interpretive Guide for School Administrators present the skills classification system for each test. A detailed list of skills measured by every item in each form is included. In addition, suggestions are made for improving achievement in each area. These are intended as follow-up activities. Comprehensive procedures to improve instruction may be found in the literature associated with each curriculum area in the elementary and middle grades. Using Tests to Evaluate Instruction To address the issue of evaluating curriculum and instruction is to confront one of the most difficult problems in assessment. School testing programs do not exist in a vacuum; there are many audiences for assessment data and many stakeholders in the results. Districts and states are likely to consider using standardized tests as part of their evaluation of instruction.
Like any assessment information, standardized test results provide only a partial view of the effectiveness of instruction. The word "partial" deserves emphasis because the validity of tests used to evaluate instruction hinges on it. First, standardized tests are concerned with basic skills and abilities. They are not intended to measure total achievement in a given subject or grade. Although these skills and abilities are essential to nearly all types of academic achievement, they do not include all desired outcomes of instruction. Therefore, results obtained from these tests do not by themselves constitute an adequate basis for, and should not be overemphasized in, the evaluation of instruction. It is possible, although unlikely, that some schools or classes may do well on these tests yet be relatively deficient in other areas of instruction—for example, music, literature, health, or career education. Other schools or classes with below-average test results may provide a healthy educational environment in other respects. Achievement tests are concerned with areas of instruction that can be measured under standard conditions. The content standards represented in the Iowa Tests of Basic Skills are important. Effective use of the tests requires recognition of their limits, however. Schools should treat as objectively as possible those aspects of instruction that can be measured in this way. Other less tangible, yet still important, outcomes of education should not be neglected. Second, local performance is influenced by many factors. The effectiveness of the teaching staff is only one factor. Among the others are the cognitive ability of students, the school environment, the students' educational history, the quality of the instructional materials, student motivation, and the physical equipment of the school. At all times, a test must be considered a means to an end and not an end in itself. The principal value of these tests is to focus the attention of the teaching staff and the students on specific aspects of educational development in need of individual attention. Test results should also facilitate individualized instruction, identify which aspects of the instructional program need greater emphasis or attention, and provide the basis for better educational guidance. Properly used results should motivate teachers and students to improve instruction and learning. Used with other information, test results can help evaluate the total program of instruction. Unless test results are used in this way, however, they may do serious injustice to teachers or to well-designed instructional programs. Local Modification of Test Content The Iowa Tests are based on a thorough analysis of curriculum materials from many sources. Every effort is made to ensure content standards reflect a national consensus about what is important to teach and assess. Thus, many questions about content representativeness of the tests are addressed during test development. Adapting the content of nationally standardized tests to match more closely local or state standards is a trend in large-scale assessment. Sometimes these efforts involve augmenting standardized tests and reporting criterion-referenced information along with national norms from the intact test. In other cases, local districts modify a test by selecting some items and omitting others based on the match with the local curriculum.
Studies of the latter type of modified test (e.g., Forsyth, Ansley & Twing, 1992), also known as customized tests, have shown that national norms after modification can differ markedly from norms on the test as originally standardized. Some of this distortion may result from selecting items so tailored to the curriculum that students perform better than they would on the original version. Other distortions are caused by context effects on items. When items are removed, those remaining may not have the same psychometric properties (Brennan, 1992), which affects national norms. When this occurs, other normative score interpretations are also affected. The evaluation of strengths and weaknesses, the assessment of growth, and the status relative to national standards can be distorted if local modifications do not retain the same balance of content as in the original test. Content standards at the local and state level can change dramatically over a short time. Performance standards can be as influenced by politics as by advances in understanding how students learn. For these reasons, The Iowa Tests should be administered under standard conditions to ensure the validity of norm-referenced interpretations. Predictive Validity The Iowa Tests of Basic Skills were not designed as tests of academic aptitude or as predictors of future academic success. However, the importance of basic skills to high school and college success has been demonstrated repeatedly. Evidence of the predictive "power" of tests is difficult to obtain because selection eliminates from research samples students whose performance fails to qualify them for later education. Many college students complete high school and enter college in part because of high proficiency in the basic skills. Students who lack proficiency in the basic skills are either not admitted to college or seek employment. Therefore, coefficients of predictive validity are obtained for a select population. Estimates of correlations for an unselected population can be obtained (e.g., Linn & Dunbar, 1982), but the assumptions underlying the computations are not always satisfied. Five studies of predictive validity are summarized in Table 3.7 with correlation coefficients between the ITBS Complete Composite and several criterion measures. In a study by Scannell (1958), ITBS scores at grades 4, 6, and 8 of students entering one of the Iowa state universities were correlated with three criterion measures. These criteria were (a) grade 12 Composite scores on the Iowa Tests of Educational Development, (b) high school grade-point average (GPA), and (c) first-year college GPA. Considerable restriction in the range of the ITBS scores was present. The observed correlations should be regarded as lower-bound estimates of the actual correlation in an unselected population. When adjustments for restriction in range were made, the estimated correlations with the ITED grade 12 Composite were .77, .82, and .81 for grades 4, 6, and 8, respectively. In Rosemier's (1962) study of freshmen entering The University of Iowa in the fall of 1962, test scores were obtained for the ITBS in grade 8 and for the ITED in grades 10–12. Scores on the American College Tests (ACT), high school GPA, and first-year college GPA were also obtained. The standard deviation of the ITBS Composite scores for the sample was 7.52, compared to 14.91 for the total grade 8 student distribution that year.
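The kind of adjustment referred to in these studies can be illustrated with the standard univariate correction for restriction in range. The sketch below is only an illustration: the standard deviations are the ones reported above for the Rosemier sample, but the observed correlation and the function name are hypothetical, not values or code from the Guide.

```python
# A minimal sketch of the standard univariate correction for restriction in
# range. The standard deviations come from the grade 8 ITBS Composite figures
# cited above; the observed correlation r_obs is a made-up illustration.

import math

def correct_for_range_restriction(r_restricted, sd_restricted, sd_unrestricted):
    """Estimate the correlation in the unrestricted population."""
    u = sd_unrestricted / sd_restricted                     # ratio of SDs
    return (r_restricted * u) / math.sqrt(1 + r_restricted**2 * (u**2 - 1))

# SD of 7.52 in the college-bound sample versus 14.91 in the full grade 8 group.
r_obs = 0.45                                                # hypothetical observed coefficient
r_adj = correct_for_range_restriction(r_obs, 7.52, 14.91)
print(f"observed r = {r_obs:.2f}, adjusted r = {r_adj:.2f}")
```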
Differences between the obtained and adjusted correlations show the effect of range restriction on estimated validity coefficients. Loyd, Forsyth, and Hoover (1980) conducted a study of the relation between ITBS, ITED, and ACT scores and high school and first-year college GPA of 1,997 graduates of Iowa high schools who entered The University of Iowa in the fall of 1977. As in the Rosemier study, variability in the college-bound population was much smaller than that in the general high school population. Ansley and Forsyth (1983) obtained final college GPAs of the students in the Loyd et al. study. They found the ITBS predicted final college GPA as well as it predicted first-year college GPA. Qualls and Ansley (1995) replicated the Loyd et al. study with data from freshmen entering The University of Iowa in the fall of 1991. They found ITBS and ITED scores still showed substantial predictive validity, but the correlations between test scores and grades were somewhat lower than those in the earlier study. To investigate these differences, a study is under way with data from the freshman classes of 1996 and 1997. Preliminary results indicate the correlations between test scores and GPA are smaller than those reported in the 1950s and 1960s, but the relationship is stronger than the Qualls and Ansley research suggests.
Table 3.7 Summary Data from Predictive Validity Studies reports correlations of the ITBS Complete Composite (grades 4, 6, and 8, as available) with the ITED Composite in grades 10, 11, and 12, high school GPA, the ACT Composite, and freshman and final college GPA for the Scannell (1958), Rosemier (1962), Loyd et al. (1980), Ansley and Forsyth (1983), and Qualls and Ansley (1995) studies; the coefficients range from .18 to .84.
Three predictive validity studies that examine the relation between achievement test scores in eighth grade and subsequent course grades in ninth grade are summarized below. Dunn (1990) found that correlation coefficients between the two measures of performance, test scores and course grades, were relatively consistent for composites. The average correlation between the ITBS Complete Composite and grades across 13 high school courses—including language arts, U.S. history, general math, algebra, etc.—was .62. The smallest correlations were observed in courses for which selection criteria narrowed the range of overall achievement considerably (e.g., algebra). As part of this investigation, a variety of regression analyses were performed to examine the joint role of test scores and course grades in predicting later performance in school. These analyses showed that achievement test scores added significantly to the prediction of course grades in high school after performance in middle school courses was taken into account. Course grades in the middle school years tended to be better predictors of high school performance than test scores, suggesting unique factors influence grades. Similar results were obtained in a study by Barron, Ansley, and Hoover (1991) that looked specifically at predicting achievement in Algebra I. As in the Dunn study, multiple regression analyses showed that ITBS scores added significantly to the prediction of ninth-grade algebra achievement even after previous grades were taken into account.
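The regression logic of the Dunn and Barron et al. studies, asking whether test scores improve prediction beyond prior course grades, can be sketched as follows. The data, variable names, and effect sizes here are simulated for illustration and are not taken from either study.

```python
# A sketch of hierarchical (incremental) regression: does adding an ITBS
# composite improve prediction of ninth-grade grades beyond middle school
# grades alone? All data below are simulated.

import numpy as np

rng = np.random.default_rng(0)
n = 500
middle_school_gpa = rng.normal(3.0, 0.5, n)
itbs_composite = 0.6 * middle_school_gpa + rng.normal(0, 0.4, n)   # correlated predictor
ninth_grade_gpa = (0.5 * middle_school_gpa + 0.3 * itbs_composite
                   + rng.normal(0, 0.4, n))

def r_squared(X, y):
    """Proportion of variance in y explained by an intercept plus columns of X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_grades = r_squared(middle_school_gpa[:, None], ninth_grade_gpa)
r2_both = r_squared(np.column_stack([middle_school_gpa, itbs_composite]), ninth_grade_gpa)
print(f"R^2 grades only: {r2_grades:.3f}; grades + test: {r2_both:.3f}; "
      f"increment: {r2_both - r2_grades:.3f}")
```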
A more recent analysis of the relation between ITBS Core Total scores in grade 8 and ACT composite scores was conducted by Iowa Testing Programs (1998). Predictive validity coefficients in this study were .76, .81, and .78 for fall, midyear, and spring administrations of the ITBS in grade 8. Predictive validity coefficients of this magnitude compare favorably with those of achievement tests given in the high school years. Tests such as the ITBS have been used in many ways to support judgments about how well students are prepared for future instruction, that is, as general measures of readiness. This aspect of test use has become somewhat controversial in recent years because of situations where tests are used to control access to educational opportunities. Readability The best way to determine the difficulty of a standardized test is to examine its norms tables and distribution of item difficulty. The difficulty data for items, skills, and tests in the ITBS are reported in Content Classifications with Item Norms. Of the various factors that influence difficulty, readability is the focus of much attention. The readability of written materials is measured in several ways. An expert may judge the grade level of a reading passage based on perception of its complexity. The most common method of quantifying these judgments is to use a readability formula. Readability formulas are often used by curriculum specialists, classroom teachers, and school librarians to match textbooks and trade books to the reading abilities of students. The virtue of readability formulas is objectivity. Typically, they use vocabulary difficulty (e.g., word familiarity or length) and syntactic complexity (e.g., sentence length) to predict the difficulty of a passage. The shortcoming of readability formulas is failure to account for qualitative factors that influence how easily a reader comprehends written material. Such factors include organization and cohesiveness of a selection, complexity of the concepts presented, amount of knowledge the reader is expected to bring to the passage, clarity of new information, and interest level of the material to its audience. Readability formulas were originally developed to assess the difficulty of written prose and sometimes have been used as a basis for modifying written material. Using a readability formula in this way does not automatically result in more readable text. Short, choppy sentences and familiar but imprecise words can actually increase the difficulty of a selection even though they lower its index of readability (Davison & Kantor, 1982). Readability formulas use word lists that become dated over time. For instance, the 1958 Dale-Chall (Dale & Chall, 1948; Powers, Sumner & Kearl, 1958) and the Bormuth (1969) formulas use the Dale List of 3,000 words, which reflects the reading vocabulary of students of the early 1940s. This list results in some curiosities: "Ma" and "Pa," which today would appear primarily in regional literature, are considered familiar; "Mom" is unfamiliar. Similarly, "bicycle" is an easy word, but "bike" is hard. "Radio" is familiar; "TV" is not. The 1995 Dale-Chall revision addresses some of these concerns, but what is truly familiar and what is not will always be time dependent. A similar problem exists in predicting the vocabulary difficulty of subject-area tests.
Words that are probably familiar to students in a particular curriculum (e.g., “cost,” “share,” and “subtract” in math problem solving; “area,” “map,” and “crop” in social studies; “body,” “bone,” and “heat” in science; and even the days of the week in a capitalization test) are treated as unfamiliar words by the Spache formula. Readability concerns are often raised on tests such as math problem solving. It is generally believed that a student’s performance in math should not be influenced by reading ability. Readability data are frequently requested for passages in reading tests. Here, readability indices document the range of difficulty included so the test can discriminate the reading achievement of all students. Since reading achievement in the middle grades can span seven or eight grade levels, the readability indices of passages should vary substantially. Three readability indices for Forms A and B are reported in the accompanying table for Reading Comprehension, Language Usage and Expression, Math Problem Solving, Social Studies, and Science. The Spache (1974) index, reported in grade-level values, measures two factors: mean sentence length and proportion of “unfamiliar” or “difficult” words. The Spache formula uses a list of 1,041 words and is appropriate with materials for students in grades 1 through 3. The Bormuth formula (Bormuth, 1969, 1971) reflects three factors: average sentence length, average word length, and percent of familiar words (based on the old Dale List). Bormuth’s formula predicts the cloze mean (CM), the average of percent correct for a set of cloze passages (the higher the value, the easier the passage). The value reported in the table is an inverted Bormuth index. The inverted index was multiplied by 100 to remove the decimal. 
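A minimal sketch of the two-factor structure shared by these formulas (average sentence length plus the proportion of words not on a familiar-word list) is shown below. The tiny word list and the coefficients are placeholders for illustration, not the published Spache, Dale-Chall, or Bormuth values.

```python
# A toy two-factor readability index: mean sentence length plus the proportion
# of words not on a "familiar word" list. The word list and coefficients are
# placeholders, not values from any published formula.

import re

FAMILIAR_WORDS = {"the", "a", "and", "dog", "ran", "to", "school", "was", "it", "fast"}

def two_factor_readability(text, a=0.1, b=3.0, c=0.5):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    mean_sentence_length = len(words) / len(sentences)
    pct_unfamiliar = sum(w not in FAMILIAR_WORDS for w in words) / len(words)
    # Higher index = harder text under this toy weighting.
    return a * mean_sentence_length + b * pct_unfamiliar + c

print(two_factor_readability("The dog ran to school. It was fast."))
```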
Table 3.8 Readability Indices for Selected Tests reports, by form (A and B) and level, the number of items and the Spache, 1995 Dale-Chall, and Inverted Bormuth indices for Reading Comprehension (with separate ranges and means for passages only, items only, and passages plus items), Usage and Expression, Mathematics Problem Solving and Data Interpretation, Social Studies, and Science. Spache values are reported only at the lowest levels, consistent with the formula's intended range.
Like the Bormuth formula, the 1995 Dale-Chall index estimates the cloze mean but from only two factors, percent of unfamiliar words and average sentence length. Easy passages tend to have relatively few unfamiliar words and shorter sentences, whereas difficult passages tend to have more unfamiliar words and longer sentences. Readability indices for Reading Comprehension are reported separately by form and level for passages, items, and passages plus items. Indices for passages in each level vary greatly, which helps to discriminate over the range of reading achievement. The average, however, is typically at or below grade level. For example, the 1995 Dale-Chall index for passages in Form A, Level 9, ranges from 42 to 53; the average is 48. The readability index for an item usually indicates the item is easier to read than the corresponding passage. In Form A, Levels 10–14, for example, the average 1995 Dale-Chall index is 43 for the passages, 48 for the items, and 45 for the passages plus items. The formulas tend to treat many words common to specific subjects as unfamiliar.
This could have an effect on the readability indices for the Social Studies, Science, and Math Problem Solving and Data Interpretation tests, especially in the lower grades. The values given in the accompanying table, however, are usually below the grade level in which tests are typically given. PART 4 Scaling, Norming, and Equating The Iowa Tests Frames of Reference for Reporting School Achievement Defining the frame of reference to describe and report educational development is the fundamental challenge in educational measurement. Some educators are interested in determining the developmental level of students, and in describing achievement as a point on a continuum that spans the years of schooling. Others are concerned with understanding student strengths and weaknesses across the curriculum, setting goals, and designing instructional programs. Still others want to know whether students satisfy standards of performance in various school subjects. Each of these educators may share a common purpose for assessment but would require different frames of reference for reports of results. This part of the Guide to Research and Development describes procedures used for scaling, norming, and equating The Iowa Tests. Scaling methods define longitudinal score scales for measuring growth in achievement. Norming methods estimate national performance and long-term trends in achievement and provide a basis for measuring strengths and weaknesses of individuals and groups. Equating methods establish comparability of scores on equivalent test forms. Together these techniques produce reliable scores that satisfy the demands of users and meet professional test standards. Comparability of Developmental Scores Across Levels: The Growth Model The foundation of any developmental scale of educational achievement is the definition of grade-to-grade overlap. Students vary considerably within any given grade in the kinds of cognitive tasks they can perform. For example, some students in third grade can solve problems in mathematics that are difficult for the average student in sixth grade. Conversely, some students in sixth grade read no better than the average student in third grade. There is even more overlap in the cognitive skills of students in adjacent grades—enough that some communities have devised multi-age or multi-grade classrooms to accommodate it. Grade-to-grade overlap in the distributions of cognitive skills is basic to any developmental scale that measures growth in achievement over time. Such overlap is sometimes described by the ratio of variability within grade to variability between grades. As this ratio increases, the amount of grade-to-grade overlap in achievement increases. The problems of longitudinal comparability of tests and vertical scaling and equating of test scores have existed since the first use of achievement test batteries to measure educational progress. The equivalence of scores from various levels is of special concern in using tests "out-of-level" or in individualized testing applications. For example, a standard score of 185 earned on Level 10 should be comparable to the 185 earned on any other level; a grade equivalent score of 4.8 earned on Level 10 should be comparable to a grade equivalent of 4.8 earned on another level. Each test in the ITBS battery from Levels 9 through 14 is a single continuous test representing a range of educational development from low grade 3 through superior grade 9.
Each test is organized as six overlapping levels. During the 1970s, the tests were extended downward to kindergarten by the addition of Levels 5 through 8 of the Primary Battery. Beginning in 1992, the Iowa Tests of Educational Development, Levels 15–17/18, were jointly standardized with the ITBS. A common developmental scale was needed to relate the scores from each level to the other levels. The scaling requirement consisted of establishing the overlap among the raw score scales for the levels and relating the raw score scales to a common developmental scale. The scaling test method used to build the developmental scale for the ITBS and ITED, Hieronymus scaling, is described in Petersen, Kolen & Hoover (1989). Scaling procedures that are specific to current forms of The Iowa Tests are discussed in this part of the Guide to Research and Development. The developmental scales for the previous editions of the ITBS steadily evolved over the years of their use. The growth models and procedures used to derive the developmental scales for the Multilevel Battery (Forms 1 through 6) using Hieronymus scaling are described on pages 75–78 of the 1974 Manual for Administrators, Supervisors, and Counselors. The downward extension of the growth model to include Levels 7 and 8 is outlined in the Manual for Administrators for the Primary Battery, 1975, pages 43–45. The further downward extension to Levels 5 and 6 in 1978 is described on page 118 of the 1982 Manual for School Administrators. Over the history of these editions of the tests, the scale was adjusted periodically. This was done to accommodate new levels of the battery or changes in the ratio of within- to between-grade variability observed in national standardization studies and large-scale testing programs that used The Iowa Tests. In the 1963 and 1970 national standardization programs, minor adjustments were made in the model at the upper and lower extremes of the grade distributions, mainly as a result of changes in extrapolation procedures. During the 1970s it became apparent that differential changes in achievement were taking place from grade to grade and from test to test. Achievement by students in the lower grades was at the same level or slightly higher during the seven-year period. In the upper grades, however, achievement levels declined markedly in language and mathematics over the same period. Differential changes in absolute level of performance increased the amount of grade-to-grade overlap in performance and necessitated major changes in the grade-equivalent to percentile-rank relationships. Scaling studies involving the vertical equating of levels were based on 1970–1977 achievement test scores. The procedures and the resulting changes in the growth models are described in the 1982 Manual for School Administrators, pages 117–118. Between 1977 and 1984, data from state testing programs and school systems across the country suggested that differential changes in achievement across grades had continued. Most of the available evidence, however, indicated that these changes differed from changes of the previous seven-year period. In all grades and test areas, achievement appeared to be increasing. Changes in median achievement by grade for 1977–1981 and 1981–84 are documented in the 1986 Manual for School Administrators (Hieronymus & Hoover, 1986).
Changes in median achievement after 1984 are described in the 1990 Manual for School Administrators, Supplement (Hoover & Hieronymus, 1990), and later in Part 4 of this Guide. Patterns of achievement on the tests during the 1970s and 1980s provided convincing evidence that another scaling study was needed to ascertain the grade-to-grade overlap for future editions of the tests. Not only had test performance changed significantly, so had school curriculum in the achievement areas measured by the tests. In addition, in 1992 the ITED was to be jointly standardized and scaled with the ITBS for the first time, so developmental links between the two batteries were needed. The National Standard Score Scale Students in the 1992 spring national standardization participated in special test administrations for scaling the ITBS and ITED. The scaling tests were wide-range achievement tests designed to represent each content domain in the Complete Battery of the ITBS or ITED. Scaling tests were developed for three groups: kindergarten through grade 3, grades 3 through 9, and grades 8 through 12. These tests were designed to establish links among the three sets of tests from the data collected. During the standardization, scaling tests in each content area were spiraled within classrooms to obtain nationally representative and comparable data for each subtest. The scaling tests provide essential information about achievement differences and similarities between groups of students in successive grades. For example, the scores show the variability among fourth graders in science achievement and the proportion of fourth graders who score higher in science than the typical fifth grader. The study of such relations is essential to building developmental score scales. These score scales monitor year-to-year growth and estimate students' developmental levels in areas such as reading, language, and math. To describe the developmental continuum in one subject area, students in several different grades must answer the same questions. Because of the range of item difficulty in the scaling tests, special Directions for Administration were prepared. The score distributions on the scaling tests defined the grade-to-grade overlap needed to establish the common developmental achievement scale in each test area. An estimated distribution of true scores was obtained for every content area using the appropriate adjustment for unreliability (Feldt & Brennan, 1989). The percentage of students in a given grade who scored higher than the median of other grades on that scaling test was determined from the estimated distribution of true scores. This procedure provided estimates of the ratios of within- to between-grade variability free of chance errors of measurement and defined the amount of grade-to-grade overlap in each achievement domain. Table 4.1 summarizes the relations among grade medians for Language Usage and Expression for Forms G and H in 1984 and for Forms K and L in 1992. Each row of Table 4.1 reports the percent of students in that grade who exceeded the median of the grade in each column. The entries for 1992 also describe the scale used for Forms A and B after the 2000 national standardization. The relation of standard scores to percentile ranks for each grade was obtained from the results of the scaling test.
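The overlap computation described above can be sketched as follows, assuming simulated normal score distributions in place of the 1992 scaling-test data and omitting the adjustment for unreliability.

```python
# A sketch of grade-to-grade overlap: for each grade, the percent of students
# who score above the median of every other grade on a common scaling test.
# Scores are simulated, not standardization data, and no true-score
# (unreliability) adjustment is applied.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical scaling-test score distributions for grades 3-8.
grades = {g: rng.normal(loc=170 + 14 * (g - 3), scale=22, size=2000) for g in range(3, 9)}
medians = {g: np.median(scores) for g, scores in grades.items()}

for g, scores in grades.items():
    pcts = [round(100 * np.mean(scores > medians[other])) for other in sorted(grades)]
    print(f"grade {g}: % above grade 3-8 medians -> {pcts}")
```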
Given the percentages of students in the national standardization in one grade above or below the medians of other grades, within-grade percentiles on the developmental scale were determined. These percentiles were plotted and smoothed. This produced a cumulative distribution of standard scores for each test and grade, which represents the growth model for that test. The relations between raw scores and standard scores were obtained from the percentile ranks on each scale. Two factors created the differences between the 1984 and 1992 distributions. First, the ratio of within- to between-grade variability in student performance increased. Second, before 1992, the parts of the growth model below grade 3 and above grade 8 were extrapolated from the available data on grades 3–8. In the 1992 standardization, scaling test data were collected in the primary and high school grades, which allowed the growth model to be empirically determined below grade 3 and above grade 8. Table 4.1 illustrates the changes in grade-to-grade overlap that led to the decision to rescale the tests.
Table 4.1 Comparison of Grade-to-Grade Overlap
Iowa Tests of Basic Skills, Language Usage and Expression—Forms K/L vs. Forms G/H
National Standardization Data, 1992 and 1984
Percent of GEs in each grade exceeding the grade median (fall), determined from the 1992 and 1984 scaling studies. Fall grade medians: K = 123 (K.2), grade 1 = 140 (1.2), grade 2 = 157 (2.2), grade 3 = 174 (3.2), grade 4 = 190 (4.2), grade 5 = 205 (5.2), grade 6 = 219 (6.2), grade 7 = 230 (7.2), grade 8 = 241 (8.2).
Grade 8 (vs. medians of grades 1–8), 1992: 99 97 91 84 76 66 58 50; 1984: 99 99 99 96 88 76 64 50
Grade 7 (vs. grades 1–8), 1992: 99 95 90 81 70 59 50 42; 1984: 99 99 98 91 79 65 50 35
Grade 6 (vs. grades 1–8), 1992: 99 95 88 77 63 50 41 33; 1984: 99 99 93 82 67 50 33 17
Grade 5 (vs. grades 1–8), 1992: 99 93 83 67 50 35 27 20; 1984: 99 97 85 68 50 31 16 6
Grade 4 (vs. grades K–8), 1992: 99 98 89 74 50 32 21 14 9; 1984: 99 99 88 70 50 30 15 5 1
Grade 3 (vs. grades K–8), 1992: 99 95 79 50 26 13 6 3 1; 1984: 99 93 73 50 28 13 3 1 1
Grade 2 (vs. grades K–7), 1992: 99 87 50 19 7 2 1 1; 1984 (grades K–6): 99 78 50 27 11 2 1
Grade 1 (vs. grades K–4), 1992: 94 50 10 1 1; 1984: 86 50 24 9 1
Grade K (vs. grades K–3), 1992: 50 2 1 1; 1984: 50 22 7 1
Table 4.1 indicates that the amount of grade-to-grade overlap in the 1992 and 2000 developmental standard score scale tends to increase steadily from kindergarten to eighth grade. This pattern is consistent with a model for growth in achievement in which median growth decreases across grades at the same time as variability in performance increases within grades. The type of data illustrated in Table 4.1 provides empirical evidence of grade-to-grade overlap that must be incorporated into the definition of growth reflected in the final developmental scale. But such data do not resolve the scaling problem. Units for the description of growth from grade to grade must be defined so that comparability can be achieved between descriptions of growth in different content areas. To define these units, achievement data were examined from several sources in which the focus of measurement was on growth in key curriculum areas at a national level. The data included results of scaling studies using not only the Hieronymus method, but also Thurstone and item-response theory methods (Mittman, 1958; Loyd & Hoover, 1980; Harris & Hoover, 1987; Becker & Forsyth, 1992; Andrews, 1995). Although the properties of developmental scales vary with the methods used to create them, all data sources showed that growth in achievement is rapid in the early stages of development and more gradual in the later stages. Theories of cognitive development also support these general findings (Snow & Lohman, 1989). The growth model for the current edition of The Iowa Tests was determined so that it was consistent with the patterns of growth over the history of The Iowa Tests and with the experience of educators in measuring student growth and development. The developmental scale used for reporting ITBS results was established by assigning a score of 200 to the median performance of students in the spring of grade 4 and 250 to the median performance of students in the spring of grade 8. The table below shows the developmental standard scores that correspond to typical performance of grade groups on each ITBS test in the spring of the year. The scale illustrates that average annual growth decreases as students move through the grades. For example, the growth from grade 1 to grade 2 averages 18 standard-score points, but from grade 7 to grade 8 it averages only 11 points.
Grade: K, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
SS: 130, 150, 168, 185, 200, 214, 227, 239, 250, 260, 268, 275, 280
GE: K.8, 1.8, 2.8, 3.8, 4.8, 5.8, 6.8, 7.8, 8.8, 9.8, 10.8, 11.8, 12.8
The grade-equivalent (GE) scale for The Iowa Tests is a monotonic transformation of the standard score scale. As with previous test forms, the GE scale measures growth based on the typical change observed during the school year. As such, it represents a different growth model than does the standard score scale (Hoover, 1984). With GEs, the average student "grows" one unit on the scale each year, by definition. As noted by Hoover, GEs are a readily interpretable scale for many elementary school teachers because they describe growth in terms familiar to them. GEs become less useful during high school, when school curriculum becomes more varied and the scale tends to exaggerate growth.
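As a rough illustration of how the two scales relate, the spring medians in the table above can be interpolated directly (with K.8 written as 0.8). The published conversions are derived from smoothed empirical distributions, so this is only an approximation of the idea, not the operational conversion table.

```python
# A sketch of reading a developmental standard score against the GE scale by
# linear interpolation between the spring medians listed above. This is an
# illustration only; operational conversions use smoothed empirical data.

import numpy as np

spring_ss = [130, 150, 168, 185, 200, 214, 227, 239, 250, 260, 268, 275, 280]
spring_ge = [0.8, 1.8, 2.8, 3.8, 4.8, 5.8, 6.8, 7.8, 8.8, 9.8, 10.8, 11.8, 12.8]

def ss_to_ge(ss):
    """Approximate grade equivalent for a developmental standard score."""
    return float(np.interp(ss, spring_ss, spring_ge))

print(ss_to_ge(207))   # a standard score between the grade 4 and grade 5 spring medians
```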
Before 1992, the principal developmental score scale of the ITBS was defined with grade equivalents (GEs) using the Hieronymus method. Other scales for reporting results on those editions of the tests, notably developmental standard scores, were obtained independently using the Thurstone method. For reasons related to the non-normality of achievement test score distributions (Flanagan, 1951), the Thurstone method was not used for the current editions of The Iowa Tests. Beginning with Forms K/L/M, both developmental standard scores and grade equivalents were derived with the Hieronymus method. As a result of the development of new scales, neither GEs nor standard scores for Forms A and B are directly comparable to those reported before Forms K/L/M. The purpose of a developmental scale in achievement testing is to permit score comparisons between different levels of a test. Such comparisons are dependable under standard conditions of test administration. In some situations, however, developmental scores (developmental standard scores and grade equivalents) obtained across levels may not seem comparable. Equivalence of scores across levels in the scaling study was obtained under optimal conditions of motivation. Differences in attitude and motivation, however, may affect comparisons of results from "on-level" and "out-of-level" testing of students who differ markedly in developmental level. If students take their tests seriously, scores from different levels will be similar (except for errors of measurement). If students are frustrated or unmotivated because a test is too difficult, they will probably obtain scores in the "chance" range. But if students are challenged and motivated taking a lower level, their achievement will be measured more accurately.
Greater measurement error is expected if students are assigned an inappropriate level of the test (too easy or too difficult). This results in a higher standard score or grade equivalent on higher levels of the test than lower levels, because standard scores and grade equivalents that correspond to ‘‘chance’’ increase from level to level. The same is true for perfect or near-perfect scores. These considerations show the importance of motivation, attitude, and assignment of test level in accurately measuring a student’s developmental level. For more discussion of issues concerning developmental score scales, see ‘‘Scaling, Norming, and Equating’’ in the third edition of Educational Measurement (Petersen, Kolen & Hoover, 1989). Characteristics of developmental score scales, particularly as they relate to statistical procedures and assumptions used in scaling and equating, have been addressed by the continuous research program at The University of Iowa (Mittman, 1958; Beggs & Hieronymus, 1968; Plake, 1979; Loyd, 1980; Loyd & Hoover, 1980; Kolen, 1981; Hoover, 1984; Harris & Hoover, 1987; Becker & Forsyth, 1992; Andrews, 1995). Development and Monitoring of National Norms for the ITBS The procedures used to develop norms for the ITBS were described in Part 2. Similar procedures have been used to develop national norms since the first forms of the ITBS were published in 1956. These procedures form the basis for normative information available in score reports for The Iowa Tests: student norms, building norms, skill norms, item norms, and norms for special school populations. Over the years, changes in performance have been monitored to inform users of each new edition about the normative differences they might expect with new test forms. The 2000 national standardization of The Iowa Tests formed the basis for the norms of Forms A and B of the Complete Battery and Survey Battery. Data from the standardization established benchmark performance for nationally representative samples of students in the fall and spring of the school year and were used to estimate midyear performance through interpolation. The differences between 1992 and 2000 performance, expressed in percentile ranks for the main test scores, are shown in Table 4.2. The achievement levels in the first column are expressed in terms of 1992 national percentile ranks. The entries in the table show the corresponding 2000 percentile ranks. For example, a score on the Reading test that would have a percentile rank of 50 in grade 5 according to 1992 norms would convert to a percentile rank of 58 on the 2000 norms. Trends in Achievement Test Performance In general, true changes in educational achievement take place slowly. Despite public debate about school reform and education standards, the underlying educational goals of schools are relatively stable. Lasting changes in educational methods and materials tend to be evolutionary rather than revolutionary, and student motivation and public support of education change slowly. Data from the national standardizations provide important information about trends in achievement over time. Nationally standardized tests of ability and achievement are typically restandardized every seven to ten years when new test forms are published. The advantage of using the same norms over a period of time is that scores from year to year can be based on the same metric. Gains or losses are “real”; i.e., no part of the measured gains or losses can be attributed to changes in the norms. 
The disadvantage, of course, is the norms become dated. How serious this is depends on how much performance has changed over the period. Differences in performance between editions, which are represented by changes in norms, were relatively minor for early editions of the ITBS. This situation changed dramatically in the late 1960s and the 1970s. Shortly after 1965, achievement declined—first in mathematics, then in language skills, and later in other curriculum areas. This downward trend in achievement in the late 1960s and early 1970s was reflected in the test norms during that period, which were "softer" than norms before and after that time. Beginning in the mid-1970s, achievement improved slowly but consistently in all curriculum areas until the early 1990s, when it reached an all-time high.
Table 4.2 Differences Between National Percentile Ranks (Iowa Tests of Basic Skills — Forms K/L vs. A/B, National Standardization Data, 1992 and 2000) lists, for Reading, Language, Mathematics, Social Studies, Science, and Sources of Information in grades 1 through 8, the 2000 national percentile ranks that correspond to 1992 achievement levels of 99, 96, 90, 80, 70, 60, 50, 40, 30, 20, 10, 4, and 1.
Figure 4.1 Trends in National Performance (Iowa Tests of Basic Skills — Complete Composite, Grades 3–8, National Standardization Data, 1955–2000) plots grade-equivalent scores for grades 3 through 8 by year of national standardization: 1955, 1963, 1970, 1977, 1984, 1992, and 2000.
Since the early 1990s, no dominant trend in achievement test scores has appeared. Scores have increased slightly in some areas and grades and decreased slightly in others. In the context of achievement trends since the mid-1950s, achievement in the 1990s has been extremely stable. National trends in achievement measured by the ITBS Complete Composite and monitored across standardizations are shown in Figure 4.1. Differences in median performance for each standardization period from 1955 to 2000 are summarized in Table 4.3 using 1955 grade equivalents as the base unit. Between 1955 and 1963, achievement improved consistently for most test areas, grade levels, and achievement levels. The average change for the composite over all grades represented an improvement of 2.3 months. From 1963 to 1970, differences in median composite scores were negligible, averaging a loss of two-tenths of a month. Small but consistent qualitative differences occurred in some achievement areas and grades, however. In general, these changes were positive in the lower grades and tended to balance out in the upper grades. Gains were fairly consistent in Vocabulary and Sources of Information, but losses occurred in Reading and some Language skills. Math achievement improved in the lower grades, but sizable losses in concepts and problem solving occurred in the upper grades. Between 1970 and 1977, test performance generally declined, especially in the upper grades. The average median loss over grades 3 through 8 for the Composite score was 2.2 months. Differences varied from grade to grade and test to test. Differences in grade 3 were generally positive, especially at the median and above. From grades 4 to 8, performance declined more markedly.
In general, the greatest declines appeared in Capitalization, Usage and Expression, and Math Concepts. These trends are consistent with other national data on student performance for the same period (Koretz, 1986).
Table 4.3 Summary of Median Differences (Iowa Tests of Basic Skills, National Standardization Data, 1955–2000; 1955 national grade equivalents in "months") reports, for grades 1 through 8 and for the average of grades 3 through 8, the median differences between successive standardizations (1963–1955, 1970–1963, 1977–1970, 1984–1977, 1992–1984, and 2000–1992) for Vocabulary; Reading Comprehension; Spelling, Capitalization, Punctuation, Usage and Expression, and Language Total; Concepts and Estimation, Problems and Data Interpretation, Computation, and Math Total; Social Studies; Science; Maps and Diagrams, Reference Materials, and Sources of Information Total; the Composite; Word Analysis; and Listening.
Between 1977 and 1984, the improvement in ITBS test performance more than made up for previous losses in most test areas. Achievement in 1984 was at an all-time high in nearly all test areas. This upward trend continued through the 1980s and is reflected in the norms developed for Forms K and L in 1992. During February 2000, Forms K and A were jointly administered to a national sample of students in kindergarten through grade 8. This sample was selected to represent the norming population in terms of variability in achievement, the main requirement of an equating sample (Kolen & Brennan, 1995). A single-group, counterbalanced design was used. In each grade, students took Form K and Form A of the ITBS Complete Battery. For each subtest and level, matched student records of Form K and Form A were created. Frequency distributions were obtained, and raw scores were linked by the equipercentile method. The resulting equating functions were then smoothed with cubic splines. This procedure defined the raw-score to raw-score relationship between Form A and Form K for each test. Standard scores on Form A could then be determined for norming dates before 2000 by linear interpolation. In this way, trend lines could be updated, and expected relations between new and old test norms could be determined. Trends in the national standardization data for the ITBS are reflected in greater detail in the trends for the state of Iowa. Virtually all schools in Iowa participate on a voluntary basis in the Iowa Basic Skills Testing Program. Trend lines for student performance have been monitored as part of the Iowa state testing program since the 1950s.
Figure 4.2 Trends in Iowa Performance (Iowa Tests of Basic Skills — Complete Composite, Grades 3–8, Iowa State Testing Program Data, 1955–2001, in 1965 Iowa grade equivalents) plots grade-equivalent scores for grades 3 through 8 by year.
Trend lines for the state of Iowa (Figure 4.2) show a pattern in achievement test scores that is similar to that in national standardization samples. For any given grade, the peaks and valleys of overall achievement measured by the ITBS occur at about the same time. Further, both Iowa and national trends indicate that test scores in the lower elementary grades, grades 3 and 4, have generally held steady or risen since the first administration of the ITBS Multilevel Battery. An exception to this observation appears in the Iowa data in the years since the 1992 standardization, when declining Composite scores were observed in grades 3 and 4 for the first time. This decline was also evident in the 2000 national standardization and extended to grade 5.
Norms for Special School Populations

As described in Part 2, the 2000 standardization sample included three independent samples: a public school sample, a Catholic school sample, and a private non-Catholic school sample. Schools in the standardization were further stratified by socioeconomic status. Data from these sources were used to develop special norms for The Iowa Tests for students enrolled in Catholic/private schools, as well as norms for other groups. The method used to develop norms was the same for each special school population. Frequency distributions from each grade in the standardization sample were cumulated for the relevant group of students. The cumulative distributions were then plotted and smoothed.

Equivalence of Forms

The equivalence of alternate forms of The Iowa Tests is established through careful test development and standard methods of test equating. The tests are assembled to match tables of specifications that are sufficiently detailed to allow test developers to create equivalent forms in terms of test content. The tables of skill classifications, included in the Interpretive Guide for Teachers and Counselors, show the parallelism achieved in content for each test and level. Alternate forms of tests should be similar in difficulty as well. Concurrent assembly of test forms provides some control over difficulty, but small differences between forms are typically observed during standardization. Equating methods are used to adjust scores for differences in difficulty not controlled during assembly of the forms.

Forms A and B were assembled concurrently to the same content and difficulty specifications from the pool of items included in preliminary and national item tryouts. In the tests consisting of discrete questions (Vocabulary, Spelling, Capitalization, Punctuation, Concepts and Estimation, Computation, and parts of Social Studies, Science, and Reference Materials), items in the same or similar content, skills, and difficulty categories were first assigned, more or less at random, to Form A or B. Then, adjustments were made to avoid too much similarity from item to item and to achieve comparable difficulty distributions across forms.

Concurrent assembly of multiple test forms is the best way to ensure comparability of scores and reasonable results from equating. Linking methods rely on comparable content to justify the term "equating" (Linn, 1993; Kolen & Brennan, 1995). The Iowa Tests are designed so truly equated scores on parallel forms can be obtained.

The Iowa Tests of Basic Skills have been restandardized approximately every seven years. Each time new forms are published, they are carefully equated to previous forms. Procedures for equating previous forms to each other have been described in the Manual for School Administrators for those forms. The procedures used in equating Forms A and B of the current edition are described in this part of the Guide to Research and Development.

Forms A and B of the Complete Battery were equated with a comparable-groups design (Petersen, Kolen & Hoover, 1989). In Levels 7 and 8 of the Complete Battery, which are read aloud by classroom teachers, test forms were administered by classroom to create comparable samples. Student records were matched to Form A records from the spring standardization. Frequency distributions in the fall sample were weighted so that the Form A and B cohorts had the same distribution on each subtest in the spring sample. The weighted frequency distributions were used to obtain the equipercentile relationship between Form A and B of each subtest. This relation was smoothed with cubic splines (Kolen, 1984) and standard scores were attached to Form B raw scores by interpolation.

At Levels 9 through 14, Forms A and B were spiraled within classroom to obtain comparable samples. Frequency distributions for the two forms were linked by the equipercentile method and smoothed with cubic splines. Standard scores were attached to each raw score distribution using the equating results. The raw-score to standard-score conversions were then smoothed. Table 4.4 reports the sample sizes used in the equating of Levels 7–14 of Forms A and B.

Table 4.4, Sample Sizes for Equating Forms A and B, lists the number of students at each level (7 through 14) in the Form A and Form B Complete Battery samples (comparable-groups design) and in the Survey Battery samples (single-group design); the per-level samples ranged from roughly 1,400 to 6,500 students.

The raw-score to standard-score conversions for the ITBS Survey Battery of Forms A and B also were developed with data from the 2000 fall standardization sample. At Levels 9 through 14 in the fall standardization, students took one form of the Complete Battery and the alternate form of the Survey Battery in a counterbalanced design. This joint administration defined the equipercentile relation between each Survey Battery subtest and the corresponding test of the Complete Battery via intact administrations of each version. The equating function was smoothed via cubic splines, and the resulting raw-score to raw-score conversion tables were used to attach standard scores to the Survey Battery raw scores.

Forms A and B contain a variety of testing configurations of The Iowa Tests. For normative scores, methods for equating parallel forms used empirical data designed specifically to accomplish the desired linking. These methods do not rely on mathematical models, such as item response theory or strong true-score theory, which entail assumptions about the relationship between individual items and the domain from which they are drawn or about the shape of the distribution of unobservable true scores. Instead, these methods establish direct links between the empirical distributions of raw scores as they were observed in comparable samples of examinees. The equating results accommodate the influence of context or administrative sequence that could affect scores.

Relationships of Forms A and B to Previous Forms

Forms 1 through 6 of the Iowa Tests of Basic Skills Multilevel Battery were equivalent forms in many ways. Pairs of forms—1 and 2, 3 and 4, 5 and 6—were assembled as equivalent forms in the manner described for Forms A and B. Because the objectives, placement, and methodology in basic skills instruction changed slowly when these forms were used, the content specifications of the three pairs of forms did not differ greatly. One exception, Math Concepts, was described previously. The organization of the tests in the battery, the number of items per level, the time limits, and even the number of items per page were identical for the first six forms.

The first significant change in organization of the battery occurred with Forms 7 and 8, published in 1978. Separate tests in Map Reading and Reading Graphs and Tables were replaced by a single Visual Materials test.
In mathematics, separate tests in Problem Solving and Computation replaced the test consisting of problems with embedded computation. Other major changes included a reduction in the average number of items per test, shorter testing time, a revision in grade-to-grade item overlap, and major revisions in the taxonomy of skills objectives.

With Forms G and H, published in 1985, the format changed considerably. Sixteen pages were added to the multilevel test booklet. Additional modifications were made in grade-to-grade overlap and in number of items per test. For most purposes, however, Forms G, H, and J were considered equivalent to Forms 7 and 8 in all test areas except Language Usage. As indicated in Part 3, the scope of the Usage test was expanded to include appropriateness and effectiveness of expression as well as correct usage.

Forms K, L, and M continued the gradual evolution of content specifications to adapt to changes in school curriculum. The most notable change was in the flexibility of configurations of The Iowa Tests to meet local assessment needs. The Survey Battery was introduced for schools that wanted general achievement information only in reading, language arts, and mathematics. The Survey Battery is described in Part 9.

Other changes in Forms K, L, and M occurred in how Composite scores were defined. Three core Composite scores were established for these forms: Reading Total, Language Total, and Mathematics Total. The Reading Total was defined as the average of the standard scores in Vocabulary and Reading Comprehension. The Language Total, identical to previous editions, was the average standard score of the four Language tests: Spelling, Capitalization, Punctuation, and Usage and Expression. The Math Total for Forms K, L, and M was defined in two ways: the average of the first two subtests (Concepts & Estimation and Problem Solving & Data Interpretation) or the average of all three subtests (including Computation). The Social Studies and Science tests, considered supplementary in previous editions, were moved to the Complete Battery and were added to the ITBS Complete Composite score beginning with Forms K, L, and M.

Forms K, L, and M also introduced changes to the makeup of the tests in math and work-study skills (renamed Sources of Information). In math, a separately timed estimation component was added to the Concepts test, resulting in a new test called Concepts and Estimation. The Problem Solving test was modified by adding items on the interpretation of data using graphs and tables, which had been in the Visual Materials test in Forms G, H, and J. This math test was called Problem Solving and Data Interpretation in Forms K, L, and M. A concomitant change in what had been the Visual Materials test involved adding questions on schematic diagrams and other visual stimuli for a new test: Maps and Diagrams.

An additional change in the overall design specifications for the tests in Forms K, L, and M concerned grade-to-grade overlap. Previous test forms had overlapping items that spanned three levels. Overlapping items in the Complete Battery of Forms K, L, and M, Levels 9–14, spanned two levels. The Survey Battery contained no overlapping items.

Forms A and B of the ITBS are equivalent to Forms K, L, and M in most test areas.
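As a concrete illustration of the Total scores defined above, the sketch below averages developmental standard scores in the way the text describes. The dictionary of subtest scores is hypothetical, and details such as rounding rules and the Core and Complete Composites are omitted because they are not spelled out here.

def average(values):
    return sum(values) / len(values)

def totals(ss):
    # ss: developmental standard scores keyed by subtest code
    # (RV, RC, L1-L4, M1, M2, M3), as used in the tables of this guide.
    return {
        "Reading Total": average([ss["RV"], ss["RC"]]),
        "Language Total": average([ss["L1"], ss["L2"], ss["L3"], ss["L4"]]),
        "Math Total (without Computation)": average([ss["M1"], ss["M2"]]),
        "Math Total (with Computation)": average([ss["M1"], ss["M2"], ss["M3"]]),
    }

example = {"RV": 205, "RC": 211, "L1": 198, "L2": 204, "L3": 210,
           "L4": 207, "M1": 200, "M2": 196, "M3": 192}
print(totals(example))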
Minor changes were introduced in time limits, number of items, and content emphasis in Vocabulary, Reading Comprehension, Usage and Expression, Concepts and Estimation, Problem Solving and Data Interpretation, Computation, Science, and Reference Materials. These changes were described in Part 3.

The other fundamental change in the ITBS occurred in the 1970s with the introduction of the Primary Battery (Levels 7 and 8) with Forms 5 and 6 in 1971 and the Early Primary Battery (Levels 5 and 6) in 1977. These levels were developed to assess basic skills in kindergarten through grade 3. Machine-scorable test booklets contain responses with pictures, words, phrases, and sentences designed for the age and developmental level of students in the early grades. In Levels 5 and 6 of the Early Primary Battery, questions in Listening, Word Analysis, Vocabulary, Language, and Mathematics are read aloud by the teacher. Students look at the responses in the test booklet as they listen. Only the Reading test in Level 6 requires students to read words, phrases, and sentences to answer the questions. In the original design of Levels 7 and 8, questions on tests in Listening, Word Analysis, Spelling, Usage and Expression, Mathematics (Concepts, Problems, and Computation), Visual Materials, Reference Materials, Social Studies, and Science were read aloud by the teacher. In Vocabulary, Reading, Capitalization, and Punctuation, students read the questions on their own.

Because of changes in instructional emphasis, Levels 5 through 8 of the ITBS have been revised more extensively than other levels. Beginning with Forms K, L, and M, the order of the subtests was changed. The four Language tests were combined into a single test with all questions read aloud by the teacher. At the same time, graphs and tables were moved from Visual Materials to Math Problem Solving, and a single test on Sources of Information was created. Forms A and B are equivalent to Forms K, L, and M in most respects. The number of spelling items in the Language test was increased so that a separate Spelling score could be reported. Slight changes were also made in the number of items in several other subtests and in the number of response options.

PART 5
Reliability of The Iowa Tests

Methods of Determining, Reporting, and Using Reliability Data

A soundly planned, carefully constructed, and comprehensively standardized achievement test battery represents the most accurate and dependable measure of student achievement available to parents, teachers, and school officials. Many subtle, extraneous factors that contribute to unreliability and bias in human judgment have little or no effect on standardized test scores. In addition, other factors that contribute to the apparent inconsistency in student performance can be effectively minimized in the testing situation: temporary changes in student motivation, health, and attentiveness; minor distractions inside and outside the classroom; limitations in number, scope, and comparability of the available samples of student work; and misunderstanding by students of what the teacher expects of them. The greater effectiveness of a well-constructed achievement test in controlling these factors—compared to a teacher's informal evaluation of the same achievement—is evidenced by the higher reliability of the test.

Test reliability may be quantified by a variety of statistical data, but such data reduce to two basic types of indices.
The first of these indices is the reliability coefficient. In numerical value, the reliability coefficient is between .00 and .99, and generally for standardized tests between .60 and .95. The closer the coefficient approaches the upper limit, the greater the freedom of the test scores from the influence of factors that temporarily affect student performance and obscure real differences in achievement. This ready frame of reference for reliability coefficients is deceptive in its simplicity, however. It is impossible to conclude whether a value such as .75 represents a "high" or "low," "satisfactory" or "unsatisfactory" reliability. Only after a coefficient has been compared to those of equally valid and equally practical alternative tests can such a judgment be made. In practice, there is always a degree of uncertainty regarding the terms "equally valid" and "equally practical," so the reliability coefficient is rarely free of ambiguity. Nonetheless, comparisons of reliability coefficients for alternative approaches to assessment can be useful in determining the relative stability of the resulting scores.

The second of the statistical indices used to describe test reliability is the standard error of measurement. This index represents a measure of the net effect of all factors leading to inconsistency in student performance and to inconsistency in the interpretation of that performance. The standard error of measurement can be understood by a hypothetical example. Suppose students with the same reading ability were to take the same reading test. Despite their equal ability, they would not all get the same score. Instead, their scores would range across an interval. A few would get much higher scores than they deserve, a few much lower; the majority would get scores fairly close to their actual ability. Such variation in scores would be attributable to differences in motivation, attentiveness, and other factors suggested above. The standard error of measurement is an index of the variability of the scores of students having the same actual ability. It tells the degree of precision in placing a student at a point on the achievement continuum.

There is, of course, no way to know just how much a given student's achievement may have been under- or overestimated from a single administration of a test. We may, however, make reasonable estimates of the amount by which the abilities of students in a particular reference group have been mismeasured. For about two-thirds of the examinees, the test scores obtained are "correct" within one standard error of measurement; for 95 percent, the scores are incorrect by less than two standard errors; for more than 99 percent, the scores are incorrect by less than three standard error values.

Two methods of estimating reliability were used to obtain the summary statistics provided in the following two sections. The first method employed internal-consistency estimates using Kuder-Richardson Formula 20 (K-R20). Reliability coefficients derived by this technique were based on data from the entire national standardization sample. The coefficients for Form A of the Complete Battery are reported here. Coefficients for Form B of the Complete Battery and Forms A and B of the Survey Battery are available in Norms and Score Conversions for each form and battery.
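As a minimal sketch of the two indices just described, the code below computes a K-R20 coefficient from a matrix of right/wrong item scores and the corresponding standard error of measurement (the score standard deviation times the square root of one minus the reliability). The simulated item responses are invented for the example and do not represent ITBS data.

import numpy as np

def kr20(items):
    # items: examinees x items matrix of 0/1 scores.
    k = items.shape[1]
    p = items.mean(axis=0)                      # proportion correct per item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of raw total scores
    return (k / (k - 1)) * (1.0 - np.sum(p * (1.0 - p)) / total_var)

def sem(sd, reliability):
    # Standard error of measurement implied by an SD and a reliability coefficient.
    return sd * np.sqrt(1.0 - reliability)

rng = np.random.default_rng(1)
ability = rng.normal(size=(500, 1))      # invented examinee abilities
difficulty = rng.normal(size=(1, 30))    # invented difficulties for 30 items
prob_correct = 1.0 / (1.0 + np.exp(difficulty - ability))
responses = (rng.random((500, 30)) < prob_correct).astype(int)
r = kr20(responses)
print(round(r, 3), round(sem(responses.sum(axis=1).std(ddof=1), r), 2))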
The second method provided estimates of equivalent-forms reliability for Forms K and A from the spring 2000 equating of those forms, and for Forms A and B from the fall 2000 standardization sample. Prior to the spring standardization, a national sample of students took Forms K and A of the ITBS Complete Battery. Correlations between tests on alternate forms served as one estimate of reliability. During the fall standardization, students were administered the Complete Battery of one form and the Survey Battery of the other form. The observed relationships between scores on the Complete Battery and Survey Battery were used to estimate equivalent-forms reliability of the tests common to the two batteries. These estimates were computed from unweighted distributions of developmental standard scores.

Internal-Consistency Reliability Analysis

The reliability data presented in Table 5.1 are based on Kuder-Richardson Formula 20 (K-R20) procedures. The means, standard deviations, and item proportions used in computing reliability coefficients for Form A are based on the entire spring national standardization sample. Means, standard deviations, and standard errors of measurement are shown for raw scores and developmental standard scores. Some tests in the current edition of the ITBS have fewer items and shorter time limits than previous forms. The reliability coefficients compare favorably with those of previous editions.

Table 5.1, Test Summary Statistics (Iowa Tests of Basic Skills, Complete Battery, Form A, 2000 National Standardization), spans Levels 5 through 14. For each level and each test, Total, Core Total, Composite, and (at the primary levels) the Reading Profile Total, it reports the number of items and, separately for the fall and spring standardization administrations, the raw-score and developmental standard-score means, standard deviations, and standard errors of measurement, together with the K-R20 reliability coefficient. Math Totals, Core Totals, and Composites are reported both without Computation (marked -) and with Computation (marked +).

Equivalent-Forms Reliability Analysis

Reliability coefficients obtained by correlating the scores from equivalent forms are considered superior to those derived through internal-consistency procedures because all four major sources of error are taken into account: variations arising within the measurement procedure, changes in the specific sample of tasks, changes in the individual from day to day, and changes in the individual's speed of work. Internal-consistency procedures take into account only the first two sources of error. For this reason, K-R20 reliability estimates tend to be higher than those obtained through the administration of equivalent forms.

The principal reason that equivalent-forms reliability data are not usually provided with all editions, forms, and levels of achievement batteries is that it is extremely difficult to obtain the cooperation of a truly representative sample of schools for such a demanding project.

The reliability coefficients in Table 5.2 are based on data from the equating of Form A to Form K. Prior to the 2000 spring standardization, a national sample of students in kindergarten through grade 8 took both Form K and Form A. Between-test correlations from this administration are direct estimates of the alternate-forms reliability of the ITBS. In general, alternate-forms coefficients tend to be smaller than their internal-consistency counterparts because they are sensitive to more sources of measurement error. The coefficients in Table 5.2 also reflect changes across editions of The Iowa Tests.

Table 5.2, Equivalent-Forms Reliabilities, Levels 5–14 (Iowa Tests of Basic Skills, Complete Battery, Forms A and K, Spring 2000 Equating Sample), reports the Form A-Form K correlations by level for each test in the battery, along with the number of students at each level (from about 400 at Level 5 to more than 1,100 at some levels). The reported coefficients range from the low .60s to the low .90s.

Another source of alternate-forms reliability came from the 2000 fall standardization sample. During the fall administration, students took one form of the Complete Battery and a different form of the Survey Battery. The correlations between standard scores on subtests in both Complete and Survey batteries represent indirect estimates of equivalent-forms reliability. To render these correlations consistent with the length and variability of Complete and Survey subtests, the estimates reported in Table 5.3 were adjusted for differences in length of the two batteries as well as for differences in variability typically observed between fall and spring test administrations. These reliability coefficients isolate the presence of form-to-form differences in the sample of tasks included on the tests at each level. Equivalent-forms reliability estimates for Total scores and Composites show a tendency to be lower than the corresponding K-R20 coefficients in Table 5.1; however, their magnitudes are comparable to those of internal-consistency reliabilities reported for the subtests of major achievement batteries.

Table 5.3
Estimates of Equivalent-Forms Reliability
Iowa Tests of Basic Skills — Complete Battery, Forms A and B
2000 National Standardization

Time of Year  Level   RT    LT    MT-   MT+   CT-   CT+
Fall            9    .854  .863  .817  .839  .920  .923
               10    .852  .882  .811  .836  .915  .919
               11    .855  .888  .866  .879  .925  .927
               12    .870  .890  .842  .865  .926  .929
               13    .866  .911  .849  .854  .927  .928
               14    .858  .893  .859  .874  .927  .929
Spring          9    .877  .902  .856  .870  .939  .942
               10    .872  .911  .848  .870  .933  .936
               11    .876  .907  .889  .903  .935  .939
               12    .883  .902  .861  .885  .934  .936
               13    .881  .920  .869  .877  .936  .937
               14    .869  .901  .872  .886  .931  .933

RT = Reading Total; LT = Language Total; MT = Math Total; CT = Core Total
Note: - Does not include Computation  + Includes Computation
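A simple way to see what an alternate-forms coefficient like those in Tables 5.2 and 5.3 measures is sketched below: the correlation between scores on two forms taken by the same students, followed by a Spearman-Brown projection to a longer test. The data, the error standard deviations, and the length factor of 1.5 are all invented, and the projection is only a generic stand-in for the length and variability adjustments actually applied to the Complete-Survey correlations.

import numpy as np

def alternate_forms_reliability(form_x, form_y):
    # Correlation between standard scores on two forms taken by the same students.
    return np.corrcoef(form_x, form_y)[0, 1]

def spearman_brown(r, length_factor):
    # Projected reliability when test length is multiplied by length_factor.
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)

rng = np.random.default_rng(2)
true_ss = rng.normal(250.0, 30.0, size=1500)            # invented "true" standard scores
complete = true_ss + rng.normal(0.0, 10.0, size=1500)   # Complete Battery form
survey = true_ss + rng.normal(0.0, 14.0, size=1500)     # shorter Survey Battery form
r_observed = alternate_forms_reliability(complete, survey)
r_projected = spearman_brown(r_observed, 1.5)           # illustrative length adjustment
print(round(r_observed, 3), round(r_projected, 3))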
Sources of Error in Measurement

Further investigation of sources of error in measurement for the ITBS was provided in two studies of equivalent-forms reliability. The first (Table 5.4) used data from the spring 2000 equating of Forms K and A. The second (Table 5.5) used data from the fall 1995 equating of Forms K and M of the Primary Battery.

As previously described, Forms K and A of the ITBS were given to a large national sample of schools selected to be representative with respect to variability in achievement. Order of administration of the two forms was counterbalanced by school, and there was a seven- to ten-day lag between test administrations. The design of this study made possible an analysis of relative contributions of various sources of measurement error across tests, grades, and schools. In addition to equivalent-forms reliability coefficients, three other "within-forms" reliability coefficients were computed for each school, for each test, for each form, and in each sequence:

• K-R20 reliability coefficients were calculated from the item-response records.

• Split-halves, odds-evens (SHOE) reliability coefficients were computed by correlating the raw scores from odd-numbered versus even-numbered items. Full-test reliabilities were estimated using the Spearman-Brown formula.

• Split-half, equivalent-halves (SHEH) reliability coefficients were obtained by dividing items within each form into equivalent half-tests in terms of content, difficulty, and test length. For tests composed of discrete items such as Spelling, equivalent halves were assembled by matching pairs of items testing the same skill and having approximately the same difficulty index. One item of the pair then was randomly assigned to the X half of the test and the other was assigned to the Y half. For tests composed of testlets (sets of items associated with a common stimulus) such as Reading, the stimulus and the items dependent on it were treated as a testing unit and assigned intact to the X half or Y half. Small adjustments in the composition of equivalent halves were made for testlet-based tests to balance content, difficulty, and number of items. After equivalent halves were assembled, a correlation coefficient between the X half and the Y half was computed. The full-test equivalent-halves reliabilities were estimated using the Spearman-Brown formula.

Differences between equivalent-halves estimates obtained in the same testing session and equivalent-forms estimates obtained a week or two apart constitute the best evidence on the effects of changes in pupil motivation and behavior across several days. The means of the grade reliabilities by test are reported in Table 5.4. Overall, the estimates of the three same-day reliability coefficients are similar. The same-day reliabilities varied considerably among the individual tests, however. For the Reading and the Maps and Diagrams tests, the equivalent-halves reliabilities are nearer to the equivalent-forms reliabilities. These lower reliability estimates are due to the manner in which the equivalent halves were established for these two tests.

Table 5.4
Mean (Grades 3–8) Reliability Coefficients: Reliability Types Analysis by Tests
Iowa Tests of Basic Skills — Complete Battery, Forms K and A

                                                   Form K                Form A
Test                                          K-R20  SHOE  SHEH    K-R20  SHOE  SHEH    EFKA
Vocabulary                                    .871   .876  .877    .873   .880  .875    .816
Reading Comprehension                         .890   .894  .865    .904   .907  .886    .809
Spelling                                      .889   .888  .893    .894   .894  .895    .846
Capitalization                                .839   .842  .842    .828   .823  .831    .773
Punctuation                                   .837   .838  .844    .850   .840  .859    .780
Usage and Expression                          .869   .876  .867    .887   .889  .886    .804
Math Concepts and Estimation                  .869   .877  .871    .866   .875  .866    .814
Math Problem Solving and Data Interpretation  .840   .849  .840    .845   .858  .837    .774
Mathematics Computation                       .896   .903  .910    .854   .872  .873    .771
Social Studies                                .836   .838  .836    .851   .854  .846    .764
Science                                       .851   .859  .848    .873   .881  .869    .771
Maps and Diagrams                             .832   .837  .809    .816   .828  .781    .733
Reference Materials                           .879   .893  .866    .856   .865  .844    .777
Mean                                          .861   .867  .859    .861   .867  .858    .787

Table 5.5 contains correlations between scores from the two administrations of Levels 5 through 8 during the 1995 equating study. These values represent direct evidence of the contribution of between-days sources of error to unreliability. These sources of error are thought to be especially important to the interpretation of scores on achievement tests for students in the primary grades. Although some variation exists, the estimates of test-retest reliability are generally consistent with the internal-consistency estimates for these levels reported in Table 5.1. The correlations in Table 5.5 suggest a substantial degree of stability in the performance of students in the early elementary grades over a short time interval (Mengeling & Dunbar, 1999).

Table 5.5
Test-Retest Reliabilities, Levels 5–8
Iowa Tests of Basic Skills — Complete Battery, Form K
Fall 1995 Equating Sample

Level         RV   RC   RT   Li   L    M1   M2   M3   MT-  SS   SC   SI   WA
5 (N > 826)  .74   —    —   .71  .80   —    —    —   .81   —    —    —   .80
6 (N > 767)  .88  .83  .90  .78  .82   —   .81   —   .85   —    —    —   .86
7 (N > 445)  .90  .93   —   .76  .90  .83  .83  .77   —   .82  .80  .84  .87
8 (N > 207)  .91  .93   —   .83  .88  .84  .85  .72   —   .82  .83  .85  .91

RV = Vocabulary; RC = Reading Comprehension; RT = Reading Total; Li = Listening; L = Language; M1 = Concepts; M2 = Problems & Data Interpretation; M3 = Computation; MT = Math Total; SS = Social Studies; SC = Science; SI = Sources of Information; WA = Word Analysis
Note: - Does not include Computation

Another study of sources of error in measurement was completed during the 1995 equating of Form M to Form K. With the introduction of Form M of the ITBS, Levels 9 through 14, a newly formatted edition of Form K, Levels 5 through 8, was developed and designated Form K/M. This edition of the Primary Battery contained exactly the same items as its predecessor, Form K, but was designed to allow more space on each page for item locator art and other decorative features. To ensure that the formatting changes had no effect on student performance, a subsample of the 1995 Form M equating sample was administered with both Forms K and K/M of the Primary Battery in counterbalanced order.

The most important result of these analyses is the quantification of between-days sources of measurement error and their contribution to unreliability. Reliability coefficients based on internal-consistency analyses are not sensitive to this source of error.

Standard Errors of Measurement for Selected Score Levels

A study of examinee-level standard errors of measurement based on a single test administration was conducted by Qualls-Payne (1992). The single administration procedures investigated were those originated by Mollenkopf (1949), Thorndike (1951), Keats (1957), Feldt (1984), and Jarjoura (1986), and a modified three-parameter latent trait model. The accuracy and reliability of estimates varied across tests, grades, and criteria. The procedure recommended for its agreement with equivalent-forms estimates was Feldt's modification of Lord's binomial error model, with partitioning based on a content classification system.
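The binomial error model named above can be sketched in a few lines. The first function is Lord's estimate of the raw-score standard error for a single examinee; the second pools binomial error variances over content categories, a simplified stand-in for the Feldt modification (the operational details, and the smoothing to the standard-score scale reported in Table 5.6, are not reproduced here). The scores and part lengths in the example are invented.

import math

def binomial_sem(raw_score, n_items):
    # Lord's binomial-error estimate of the raw-score SEM for one examinee.
    return math.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))

def partitioned_binomial_sem(part_scores, part_lengths):
    # Pool binomial error variances over content categories (simplified sketch).
    variance = sum(x * (k - x) / (k - 1) for x, k in zip(part_scores, part_lengths))
    return math.sqrt(variance)

print(round(binomial_sem(30, 40), 2))                                 # 30 of 40 items correct
print(round(partitioned_binomial_sem([12, 10, 8], [15, 14, 11]), 2))  # three content parts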
Application of this procedure provides more accurate estimates of individual standard errors of measurement than have previously been available from a single test administration. For early editions of the ITBS, score-level standard errors of measurement were estimated using data from special studies in which students were administered two parallel forms of the tests. Since that time, additional research has produced methods for estimating the standard error of measurement at specific score levels that do not require multiple test administrations. These conditional SEMs were estimated from the 2000 spring national standardization of Form A of the ITBS. Additional tables with conditional SEMs for Form B are available from the Iowa Testing Programs. The form-to-form differences in these values are minor. The results in Table 5.6 were obtained using a method developed by Brennan and Lee (1997) for smoothing a plot of conditional standard errors for scaled scores based on the binomial error model. In addition to this method, an approach developed by Feldt and Qualls (1998) and another based on bootstrap techniques were used at selected test levels. Because the results of all three methods agreed closely and generally matched the patterns of varying SEMs by score level found with previous editions of the tests, only the results of the Brennan and Lee method are provided. 77 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 78 Table 5.6 Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization Test Level 5 6 Score Level Word Analysis Listening Language Mathematics Reading Words Reading Comprehension Reading Total V WA Li L M RW RC RT 3.56 5.74 4.45 90–99 2.32 2.91 3.42 3.85 2.94 100–109 4.05 4.66 4.86 4.99 4.89 110–119 7.45 5.76 4.84 4.29 5.59 120–129 8.64 6.18 4.66 4.17 4.80 130–139 9.92 7.24 5.87 4.97 5.22 140–149 11.33 8.14 6.58 5.74 5.83 150–159 11.90 8.59 5.55 4.73 5.36 160–169 11.08 7.43 170–179 8.67 2.37 3.09 2.39 3.66 5.35 90–99 2.39 100–109 4.10 2.41 110–119 5.37 6.50 3.65 4.79 5.94 5.10 6.47 6.29 120–129 7.41 7.17 3.45 6.22 5.57 4.20 5.28 5.21 130–139 9.50 5.31 2.59 7.60 6.16 2.86 4.90 5.61 140–149 10.32 3.70 2.71 8.51 7.07 4.59 6.28 5.94 150–159 10.85 5.30 4.45 9.41 8.02 7.43 8.24 6.86 160–169 11.08 6.66 4.80 10.34 8.81 9.09 7.96 170–179 10.50 10.49 8.24 8.47 7.81 180–189 9.08 9.57 6.71 6.31 6.13 190–199 7.14 7.69 200–209 78 Vocabulary 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 79 Table 5.6 (continued) Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization Reading Mathematics Word Analysis Test Level Score Level 100–109 7 Vocabulary Comprehension V RC 4.06 Spelling Language Concepts WA Li L1 2.91 3.44 5.73 L 4.69 Problems & Data Computation Interpretation M1 M2 2.99 3.73 M3 Social Studies Science Sources of Information SS SC SI 2.98 2.19 4.67 110–119 6.02 3.83 4.46 5.65 7.12 5.82 5.37 6.94 6.62 4.96 4.38 6.61 120–129 7.29 4.88 6.59 5.77 7.51 4.96 6.88 8.63 6.90 6.09 5.95 7.48 130–139 7.28 5.25 7.14 6.97 6.71 4.54 7.05 8.67 4.35 6.58 8.63 6.61 140–149 5.81 3.92 6.45 7.93 4.03 4.38 6.99 7.91 3.24 7.78 10.71 4.75 150–159 5.61 3.76 7.13 8.57 4.01 5.39 7.47 7.48 4.51 9.22 11.92 4.69 160–169 7.15 7.44 9.17 9.70 5.31 5.13 170–179 9.64 10.12 10.08 9.68 180–189 10.65 9.56 8.80 7.89 7.37 190–199 8 Listening 6.65 8.70 7.79 10.11 12.86 8.36 6.68 9.04 8.40 10.30 13.10 8.00 8.09 7.11 9.58 11.98 7.90 10.76 100–109 
4.00 2.43 3.00 2.94 2.95 2.94 2.40 5.30 110–119 6.12 4.26 4.70 5.27 5.70 4.66 5.47 5.67 6.08 5.47 3.99 7.38 120–129 8.05 5.52 6.80 6.83 7.24 5.05 7.02 8.26 7.48 6.92 6.38 7.69 130–139 8.89 5.14 8.35 7.10 7.49 5.55 7.72 9.73 7.32 7.33 7.43 7.62 140–149 8.34 4.43 9.08 7.98 7.82 5.35 8.17 10.05 5.63 8.69 8.66 6.82 150–159 6.80 4.39 9.19 8.71 7.11 5.39 8.68 9.49 5.32 10.41 10.44 5.74 160–169 6.09 6.93 9.36 8.89 6.20 5.62 8.76 8.61 6.19 11.61 11.94 6.51 170–179 8.28 9.88 10.05 9.95 6.87 6.94 8.64 8.44 6.94 12.40 13.12 10.41 180–189 11.26 11.80 10.71 10.93 7.35 9.03 8.75 9.15 8.36 12.07 14.31 10.55 190–199 12.19 12.74 10.91 11.12 9.91 8.90 9.41 9.19 11.09 14.74 8.75 200–209 11.61 12.44 10.53 10.49 8.98 7.47 8.90 8.20 9.59 14.42 210–219 8.67 10.37 9.61 8.88 7.28 12.66 220–229 7.38 7.48 9.14 79 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 80 Table 5.6 (continued) Standard Errors of Measurement for Selected Standard Score Levels Iowa Tests of Basic Skills—Complete Battery, Form A 2000 National Standardization Reading Test Level Score Level Vocabulary Comprehension RV RC Language Sources of Information Mathematics Spelling Capitalization Punctuation Usage & Expression Concepts & Estimation L1 L2 L3 L4 M1 Problems Computa& Data tion Interpretation M2 M3 Social Studies Science SS SC 110–119 9 Maps Reference and Materials Diagrams S1 S2 5.38 120–129 4.63 5.08 5.92 4.24 6.56 5.08 4.13 5.62 130–139 8.08 7.22 7.76 7.04 8.34 7.86 7.26 8.60 140–149 10.24 8.68 8.61 10.06 9.64 9.76 8.45 10.16 7.97 150–159 10.43 8.99 7.91 10.53 10.07 9.92 8.20 10.40 7.31 160–169 8.93 8.11 6.18 9.63 9.73 8.76 8.20 9.97 7.27 170–179 6.85 6.61 5.63 9.28 9.11 7.16 8.42 9.06 6.51 180–189 6.61 6.52 6.50 11.29 9.99 8.49 8.34 9.08 6.02 190–199 7.60 8.21 8.75 14.56 13.62 12.41 8.96 200–209 9.74 9.63 12.04 17.24 16.73 14.96 10.81 210–219 11.55 11.09 13.88 18.58 18.53 16.61 12.46 12.86 8.25 220–229 10.80 12.00 19.00 19.48 16.60 12.27 10.45 230–239 11.98 17.81 18.92 15.81 9.99 240–249 10.64 15.92 16.34 15.02 250–259 14.03 11.47 10 4.82 4.00 130–139 7.35 6.36 7.07 140–149 10.14 7.82 150–159 11.86 160–169 170–179 7.75 7.60 8.03 11.59 9.03 9.98 9.76 7.54 8.68 12.12 8.81 10.45 9.84 7.77 8.98 11.54 7.69 10.62 10.78 8.11 8.98 10.35 6.44 11.20 10.93 9.11 9.31 10.70 7.20 11.43 9.97 10.94 7.52 10.23 11.12 13.76 9.51 11.99 10.72 12.49 9.16 11.05 13.18 16.72 11.70 13.19 12.71 7.49 13.65 13.67 13.86 12.24 8.84 13.94 16.42 12.52 11.98 9.50 11.59 13.87 10.05 9.00 8.15 11.32 8.00 6.64 6.18 9.95 6.91 8.73 10.12 10.53 10.28 8.71 9.99 9.47 7.24 8.32 10.95 9.53 9.16 8.54 11.98 12.17 11.80 9.76 11.41 9.23 7.16 9.95 12.06 9.82 7.94 12.54 12.82 11.76 9.99 12.10 8.74 9.01 11.12 12.62 9.20 10.89 9.44 7.42 12.65 12.48 10.21 9.75 12.24 9.07 9.71 10.99 12.21 7.77 180–189 8.00 8.55 7.07 13.05 11.69 8.12 8.91 11.85 8.16 9.26 9.74 11.23 7.10 190–199 6.31 7.96 7.68 14.69 11.86 8.63 7.37 11.48 6.99 9.37 8.27 11.16 8.30 200–209 6.97 9.15 9.62 18.20 14.16 12.24 7.66 12.46 7.50 11.08 9.75 13.47 11.54 210–219 8.25 11.46 12.27 21.39 16.31 15.38 9.69 14.17 8.62 12.81 12.45 15.68 14.61 8.07 11.95 10.02 220–229 9.71 12.76 14.33 23.02 18.12 17.65 11.59 15.11 9.74 14.14 14.76 17.44 16.59 230–239 11.09 13.00 15.43 23.32 18.99 19.10 12.80 15.04 9.09 14.43 15.79 17.92 17.22 240–249 11.27 12.58 15.67 23.62 18.61 18.99 12.84 14.16 13.62 15.43 17.43 16.94 11.05 14.40 22.75 17.41 17.75 10.04 12.26 11.87 13.82 16.94 15.54 260–269 9.16 10.89 20.30 15.43 16.47 8.99 8.95 12.14 14.79 13.03 270–279 7.14 16.60 12.81 15.18 290–299 11.86 6.73 9.04 11.39 14.73 18.31 12.92 11.68 9.39 10.99 7.06 
Table 5.6 (continued)
Standard Errors of Measurement for Selected Standard Score Levels
Iowa Tests of Basic Skills—Complete Battery, Form A
2000 National Standardization

Test Levels 11 through 14. Rows: ten-point developmental standard score intervals (120–129 through 360–369, varying by level). Columns: Vocabulary (RV), Reading Comprehension (RC), Spelling (L1), Capitalization (L2), Punctuation (L3), Usage and Expression (L4), Concepts and Estimation (M1), Problems and Data Interpretation (M2), Computation (M3), Social Studies (SS), Science (SC), Maps and Diagrams (S1), and Reference Materials (S2).
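The entries in Table 5.6 are conditional standard errors that vary with score level. They are related to, but more detailed than, the single marginal standard error of measurement of classical test theory, SEM = SD * sqrt(1 - reliability). The following sketch illustrates that marginal relationship; the simulated standard scores and the reliability value of .90 are hypothetical and are not taken from ITBS data.

import numpy as np

def marginal_sem(scores, reliability):
    # Classical marginal standard error of measurement: SD * sqrt(1 - r_xx).
    return np.std(scores, ddof=1) * np.sqrt(1.0 - reliability)

# Hypothetical example: simulated developmental standard scores with SD near 30
# and an assumed equivalent-forms reliability of .90 give an SEM of roughly 9.5.
rng = np.random.default_rng(0)
scores = rng.normal(loc=200, scale=30, size=5000)
print(round(marginal_sem(scores, reliability=0.90), 2))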
Effects of Individualized Testing on Reliability

The extensive reliability data reported in the preceding sections are based on group rather than individualized testing and on results from schools representing a wide variety of educational practices. The reliability coefficients obtained in individual schools may be improved considerably by optimizing the conditions of test administration. This can be done by attending to conditions of student motivation and attitude toward the tests and by assigning appropriate test levels to individual students. One of the important potential values of individualized testing is improvement of the accuracy of measurement. The degree to which this is realized, of course, depends on how carefully test levels are assigned and how motivated students are in the testing situation.

The reliability coefficients reported above apply to the consistency of placing students in grade groups. The reliability with which tests place students along the total developmental continuum has been investigated by Loyd (1980). She examined the effects of individualized or functional-level testing on reliability in a sample of fifth- and sixth-grade students. Two ITBS tests, Language Usage and Math Concepts, were selected for study because they represent different degrees of curriculum dependence. In Loyd's study, each student was administered an in-level Language Usage test and one of four levels of the parallel form, ranging from two levels below to one level above grade level. Similarly, each student was administered an in-level Math Concepts test and one of four levels of the parallel form over the same range of levels. To determine which test level provided the more reliable assessment of developmental placement, an independent measure of developmental placement was obtained by administering an independently scaled, broad-range "scaling test" to comparable samples in each of grades 3 through 8.

Estimates of reliability and expected squared error in grade-equivalent scores obtained at each level were analyzed for each reading achievement grade level. For students at or below grade level, administering an easier test produced less error. This suggests that testing such students with a lower test level may result in more reliable measurement. For both Language Usage and Math Concepts, testing above-average students with lower test levels introduced significantly more error into derived scores.

The results of this study provide support for the validity of individualized or out-of-level testing. Individualized testing is most often used to test students who are atypical when compared with their peer group. For students lagging in development, the results suggest that a lower-level test produces comparable derived scores, and these derived scores may be more reliable.
For students advanced in development, the findings indicate that testing with a higher test level results in less error and therefore more reliable derived scores.

Stability of Scores on the ITBS

The evidence of stability of scores over a long period of time and across test levels has a special meaning for achievement tests. Achievement may change markedly during the course of the school year, or from the spring of one school year to the fall of the next. In fact, one goal of good teaching is to alter patterns of growth that do not satisfy the standards of progress expected for individual students and for groups. If the correlations between achievement test scores for successive years in school are exceedingly high, this could mean that little was done to adapt instruction to individual differences that were revealed by the tests.

In addition to changes in achievement, there are also changes in test content across levels because of the way curriculum in any achievement domain changes across grades. Differences in test content, while subtle, tend to lower the correlations between scores on adjacent test levels. Despite these influences on scores over time, when equivalent forms are used in two test administrations, the correlations may be regarded as lower-bound estimates of equivalent-forms reliability. In reporting stability coefficients for such purposes, it is important to remember that they are attenuated, not only by errors of measurement, but also by differences associated with changes in true status and in test content.

The stability coefficients reported in Table 5.7 are based on data from the 2000 national standardization. In the fall, subsamples of students who had taken Form A the previous spring were administered the next level of either Form A or Form B. The correlations in Table 5.7 are based on the developmental standard scores from the spring and fall administrations.
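Because observed spring-fall correlations are attenuated by measurement error as well as by real changes in achievement and test content, a useful companion to Table 5.7 is the classical correction for attenuation, r_true = r_observed / sqrt(r_xx * r_yy). The sketch below applies that standard formula to simulated spring and fall standard scores; the data and the reliability values of .88 are hypothetical and do not come from the ITBS standardization.

import numpy as np

def stability_coefficients(spring, fall, rel_spring, rel_fall):
    # Observed spring-fall correlation and the classical correction for attenuation.
    r_obs = np.corrcoef(spring, fall)[0, 1]
    r_corrected = r_obs / np.sqrt(rel_spring * rel_fall)
    return round(r_obs, 3), round(r_corrected, 3)

# Hypothetical example: simulated standard scores with modest spring-to-fall growth.
rng = np.random.default_rng(1)
true_status = rng.normal(200, 25, size=800)
spring = true_status + rng.normal(0, 10, size=800)
fall = true_status + 5 + rng.normal(0, 10, size=800)
print(stability_coefficients(spring, fall, rel_spring=0.88, rel_fall=0.88))

Even after such a correction, changes in true status and in test content keep the coefficients below 1.00, which is why the observed values are treated as lower-bound estimates of equivalent-forms reliability.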
Table 5.7
Correlations Between Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Forms A and B
Spring and Fall 2000 National Standardization

Each row gives, for a subsample of the standardization (N ranging from 155 to 1,071), the spring test level and form, the fall test level and form, and the spring-fall correlation for each test, total score, and composite, from Vocabulary through the Reading Profile Total. The Math Total, Core Total, and Composite are reported both with and without Computation.

The top row in the table shows correlations for 503 students who took Form A, Level 5 in the spring of kindergarten and Form A, Level 6 in the fall of grade 1. Row 2 shows within-level correlations for 155 students who took Form A, Level 6 in both the spring of kindergarten and the fall of grade 1. Beginning in row 4 and continuing on alternate rows are correlations between scores on alternate forms.

Additional evidence of the stability of ITBS scores is based on longitudinal data from the Iowa Basic Skills Testing Program. Mengeling (2002) identified school districts that had participated in the program and had tested fourth-grade students in school years 1993–1994, 1994–1995, and 1995–1996. Each district had also tested at the same time of year and in grades 5, 6, 7, and 8 in successive years.
Matched records were created for 40,499 students who had been tested at least once during the years and grades included in the study. Approximately 50 percent of the records had data for every grade, although all available data were used as appropriate. The correlations in Table 5.8 provide evidence regarding the stability of ITBS scores over the upper-elementary and middle-school years.

The relatively high stability coefficients reported in Tables 5.7 and 5.8 support the reliability of the tests. Many of the year-to-year correlations are nearly as high as the equivalent-forms reliability estimates reported earlier. The high correlations also indicate that achievement in the basic skills measured by the tests was very consistent across years, as it was in research conducted a decade or more earlier (Martin, 1985). As discussed previously, these results might suggest that schools are not making the most effective use of test results. On the other hand, stability rates are associated with level of performance. That is, there is a tendency for above-average students to obtain above-average gains in performance and for below-average students to achieve more modest gains.

Table 5.8
Correlations Between Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Forms K and L
Iowa Basic Skills Testing Program

                   4th to 5th      4th to 6th      4th to 7th      4th to 8th
Test               Fall  Spring    Fall  Spring    Fall  Spring    Fall  Spring
Reading Total      .86   .85       .85   .84       .83   .84       .81   .83
Language Total     .87   .85       .84   .84       .82   .81       .80   .79
Math Total         .81   .80       .81   .79       .79   .79       .78   .77
Core Total         .92   .90       .90   .89       .88   .88       .87   .86

PART 6
Item and Test Analysis

Difficulty of the Tests

Elementary school teachers, particularly those in the primary grades, often criticize standardized tests for being too difficult. This probably stems from the fact that no single test can be perfectly suited in difficulty for all students in a heterogeneous grade group. The use of individualized testing should help to avoid the frustrations that result when students take tests that are inappropriate in difficulty. It also is important for teachers to understand the nature of a reliable measuring instrument; they should especially realize that little diagnostic information is gained from a test on which all students correctly answer almost all of the items.

Characteristics of the "ideal" difficulty distribution of items in a test have been the subject of considerable controversy. Difficulty specifications differ for types of tests: survey versus diagnostic, norm-referenced versus criterion-referenced, mastery tests versus tests intended to maximize individual differences, minimum-competency tests versus tests designed to measure high standards of excellence, and so forth. Developments in the area of individualized testing and adaptive testing also have shed new light on test difficulty. As noted in the discussion of reliability, the problem of placing students along a developmental continuum may differ from that of determining their ranks in grade.

To maximize the reliability of a ranking within a group, an achievement test must utilize nearly the entire range of possible scores; the raw scores on the test should range from near zero to the highest possible score. The best way to ensure such a range of scores is to conduct one or more preliminary tryouts of items that will determine objectively the difficulty and discriminating power of the items.
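The difficulty and discrimination statistics produced in such tryouts can be computed directly from a matrix of scored responses. The sketch below computes item p-values and corrected item-total (point-biserial) correlations; the simulated response data are hypothetical and stand in for actual tryout records.

import numpy as np

def item_analysis(responses):
    # Item p-values and corrected item-total (point-biserial) discriminations
    # from a 0/1 response matrix (rows = students, columns = items).
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    p_values = responses.mean(axis=0)
    discriminations = []
    for j in range(responses.shape[1]):
        rest_of_test = total - responses[:, j]
        discriminations.append(np.corrcoef(responses[:, j], rest_of_test)[0, 1])
    return p_values, np.array(discriminations)

# Hypothetical tryout data: 500 students and 30 items of varying difficulty.
rng = np.random.default_rng(2)
ability = rng.normal(size=(500, 1))
difficulty = np.linspace(-2.0, 2.0, 30)
probability = 1.0 / (1.0 + np.exp(-(ability - difficulty)))
data = (rng.random((500, 30)) < probability).astype(int)
p_values, discriminations = item_analysis(data)
print(p_values.round(2))
print(discriminations.round(2))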
A few items included in the final test should be so easy that at least 80 percent of students answer them correctly. These should identify the least able students. Similarly, a few very difficult items should be included to challenge the most able students. Most items, however, should be of medium difficulty and should discriminate well at all levels of ability. In other words, the typical student will succeed on only a little more than half of the test items, while the least able students may succeed on only a few. A test constructed in this manner results in the widest possible range of scores and yields the highest reliability per unit of testing time. The ten levels of the Iowa Tests of Basic Skills were constructed to discriminate in this manner among students in kindergarten through grade 8.

Item difficulty indices for three times of the year (October 15, January 15, and April 15) are reported in the Content Classifications with Item Norms booklets. In Tables 6.1 and 6.2, examples of item norms are shown for Word Analysis on Level 6 and for Language Usage and Expression on Level 12 of Form A. Content classifications and item descriptors are shown in the first column. The content descriptors are cross-referenced to the Interpretive Guide for Teachers and Counselors and to various criterion-referenced reports. The entries in the tables are percent correct for the total test, each major skill grouping, and each item, respectively.

For the Level 6 Word Analysis test, there are 35 items. The mean item percents correct are 53% for kindergarten, spring; 61% for grade 1, fall; 68% for grade 1, midyear; and 73% for grade 1, spring. The items measuring letter recognition (Printed letters) are very easy; all percent correct values are 90 or above. Items measuring other skills are quite variable in difficulty.

In Levels 9 through 14 of the battery, most items appear in two consecutive grades. In Language Usage and Expression, item 23, for example, appears in Levels 11 and 12, and item norms are provided for grades 5 and 6. In grade 5, the percents answering this item correctly are 51, 56, and 60 for fall, midyear, and spring, respectively. In grade 6 the percents are 63, 65, and 67 (Table 6.2). The consistent increases in percent correct, from 51% to 67%, show that this item measures skill development across the two grades.
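For items that appear in two consecutive levels, this kind of developmental discrimination can be checked simply by verifying that percent-correct values rise across successive administrations. A minimal sketch, using the item 23 values quoted above:

def shows_developmental_growth(percents_correct):
    # True if percent correct never decreases across successive administrations.
    return all(later >= earlier
               for earlier, later in zip(percents_correct, percents_correct[1:]))

# Usage and Expression item 23: grade 5 fall, midyear, spring; grade 6 fall, midyear, spring.
item_23 = [51, 56, 60, 63, 65, 67]
print(shows_developmental_growth(item_23))   # True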
Table 6.1
Word Analysis Content Classifications with Item Norms
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 6

Rows: Word Analysis skill classifications (Phonological Awareness and Decoding: initial sounds in pictures and words, letter-sound correspondences, rhyming sounds; Identifying and Analyzing Word Parts: printed letters, letter substitutions, word building), with item numbers for the 35 items. Columns: average percent correct for kindergarten spring and for grade 1 fall, midyear, and spring.

Table 6.2
Usage and Expression Content Classifications with Item Norms
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 12

Rows: Usage and Expression skill classifications (Nouns, Pronouns, and Modifiers; Verbs; Conciseness and Clarity; Organization of Ideas; Appropriate Use), with item numbers for the 38 items. Columns: average percent correct for grade 6 fall, midyear, and spring.
The distributions of item norms (proportion correct) are shown in Table 6.3 for all tests and levels of the ITBS. The results are based on an analysis of the weighted sample from the 2000 spring national standardization of Form A. As can be seen from the various sections of Table 6.3, careful test construction led to an average national item difficulty of about .60 for spring testing. At the lower grade levels, tests tend to be slightly easier than this; at the upper grade levels, they tend to be slightly harder. In general, tests with average item difficulty of about .60 will have nearly optimal internal-consistency reliability coefficients.

These distributions also illustrate the variability in item difficulty needed to discriminate throughout the entire ability range. It is extremely important in test development to include both relatively easy and relatively difficult items at each level. Not only are such items needed for motivational reasons, but they are critical for a test to have enough ceiling for the most capable students and enough floor for the least capable ones. Nearly all tests and all levels have some items with difficulties above .8 as well as some items below .3.

Table 6.3
Distribution of Item Difficulties
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)

Test Levels 5 through 14 (kindergarten through grade 8). For each test at each level, the table gives the number of items falling in each proportion-correct band (>= .90, .80–.89, and so on down to < .10) and the average proportion correct for the test.

A summary of the Form A difficulty indices for all tests and grades is presented in Table 6.4. The difficulty indices reported for each grade are item proportions (p-values) rather than percents correct. These data are from the 2000 spring and fall standardizations; the mean item proportions (in italics), the medians, and the 10th and 90th percentiles in the distributions of item difficulty are given. Comparable data for Form B are included in Norms and Score Conversions, Form B. Norms and score conversions for Forms A and B of the Survey Battery are published in separate booklets.

Appropriateness of test difficulty can best be ascertained by examining relationships between raw scores, standard scores, and percentile ranks in the tables in Norms and Score Conversions. For example, the norms tables indicate that 39 of 43 items on Level 12 of the Concepts and Estimation test must be answered correctly to score at the 99th percentile in the fall of grade 6, and that 41 items must be answered correctly to score at the 99th percentile in the spring. Similarly, the numbers of items needed to score at the median at the three times of the year are 23, 25, and 27 (out of 43), respectively. This test thus appears to be close to ideal in item difficulty for the grade in which it is typically used.

It should be noted that these difficulty characteristics are for a cross section of the attendance centers in the nation. The distributions of item difficulty vary markedly among attendance centers, both within and between school systems. In some schools, when the same levels are administered to all students in a given grade, the tests are too difficult; in others they may be too easy. When tests are too difficult, a given student's scores may be determined largely by "chance." When tests are too easy and scores approach the maximum possible, a student's true performance level may be seriously underestimated.

Individualized testing is necessary to adapt difficulty levels to the needs and characteristics of individual students. The Interpretive Guide for School Administrators discusses issues related to the selection of appropriate test levels. Both content and difficulty should be considered when assigning levels of the tests to individual students. The tasks reflected by the test questions should be relevant to the student's needs and level of development. At the same time, the level of difficulty of the items should be such that the test is challenging, but success is attainable.
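Summaries of the kind shown in Tables 6.3 and 6.4 (band counts, means, medians, and the 10th and 90th percentiles of item p-values) are straightforward to reproduce for local item data. The sketch below uses simulated p-values centered near .60; the values are illustrative only.

import numpy as np

def difficulty_summary(p_values):
    # Summary statistics and decile-band counts for a set of item p-values,
    # in the spirit of Tables 6.3 and 6.4.
    p = np.asarray(p_values, dtype=float)
    band_edges = np.arange(0.0, 1.01, 0.1)
    counts, _ = np.histogram(p, bins=band_edges)
    return {
        "mean": round(float(p.mean()), 2),
        "median": round(float(np.median(p)), 2),
        "p10": round(float(np.percentile(p, 10)), 2),
        "p90": round(float(np.percentile(p, 90)), 2),
        "band_counts": counts.tolist(),   # counts from the lowest band (< .10) up to the highest (>= .90)
    }

# Hypothetical spring p-values for one test, centered near .60.
rng = np.random.default_rng(3)
p_values = np.clip(rng.normal(0.60, 0.15, size=40), 0.05, 0.98)
print(difficulty_summary(p_values))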
Discrimination

As discussed in Part 3, item discrimination indices (item-test correlations) are routinely determined in tryout and are one of several criteria for item selection. Developmental discrimination is inferred from tryout and standardization data showing that items administered at adjacent grade levels have increasing p-values from grade to grade.

Discrimination indices (biserial correlations) were computed for items in all tests and grades for Form A in the 2000 spring standardization program. The means (in italics), medians, and the 90th and 10th percentiles in the distributions of biserial correlations are shown in Table 6.4. As would be expected, discrimination indices vary considerably from grade to grade, from test to test, and even from skill to skill. In general, discrimination indices tend to be higher for tests that are relatively homogeneous in content and lower for tests that include complex stimuli or for skills within tests that require complex reasoning processes.

Table 6.4
Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization

Test Levels 5 through 14. For each test at each level, the table gives the number of items and, for fall and spring difficulty (proportion correct) and for spring discrimination (biserial correlation), the mean, the 90th percentile, the median, and the 10th percentile of the distribution across items.
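The biserial correlations summarized in Table 6.4 can be approximated from the more familiar point-biserial item-total correlation under the usual assumption of a normal latent variable: r_biserial = r_point-biserial * sqrt(p(1 - p)) / y, where y is the normal ordinate at the p/q split. The sketch below applies this standard conversion to a hypothetical item; it is not presented as the operational ITBS computation.

from math import sqrt
from statistics import NormalDist

def biserial_from_point_biserial(r_pb, p):
    # Standard conversion assuming an underlying normal latent variable.
    ordinate = NormalDist().pdf(NormalDist().inv_cdf(p))
    return r_pb * sqrt(p * (1.0 - p)) / ordinate

# Hypothetical item with p = .60 and a point-biserial of .45 (biserial is about .57).
print(round(biserial_from_point_biserial(0.45, 0.60), 2))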
Ceiling and Floor Effects

The ITBS battery is designed for flexibility in assigning test levels to students. In schools where all students in a given grade are tested with the same level, it is important that each level of the test accurately measures students of all ability levels. For exceptionally able students or students who are challenged in skills development, individualized testing with appropriate levels can be used to match test content and item difficulty to student ability levels.

Students at the extremes of the score distributions are of special concern. To measure high-ability students accurately, the test must have enough ceiling to allow such students to demonstrate their skills. If the test is too easy, a considerable proportion of these students will obtain perfect or near-perfect scores. If the test is too difficult for low-ability students, many will obtain chance scores, and such scores may have inflated percentile ranks.

A summary of ceiling and floor effects for all tests and grades for spring testing is shown in Table 6.5. The top line of the table for each grade is the number of items in each test (k). Under "Ceiling," the percentile rank of a perfect score is listed for each test, as well as the percentile rank of a score one less than perfect (k-1). A "chance" score is frequently defined as the number of items in the test divided by the average number of responses per item. The percentile ranks of these "chance" estimates are listed under "Floor." Of course, not all students who score at this level do so by chance. However, a substantial proportion of students scoring at this level is an indication that the test may be too difficult and that individualized testing should be considered.

Completion Rates

There is by no means universal agreement on the issue of speed in achievement testing. Many believe that the fact that good students tend to be quicker than poor students is not in itself a sufficient reason to penalize the occasional good student who works slowly.
On the other hand, if time limits are generous enough to free all examinees from time constraints, a considerable portion of the examination period is wasted for the majority of students who work at a reasonable pace. Also, when items are arranged essentially in order of increasing difficulty, as they are in most tests, it may be unreasonable to expect all students to complete the difficult items at the end of the test.

The speed issue also differs with types of tests, levels of development, and the nature of the objectives. Speed can be important in repetitive, mechanical operations such as letter recognition and math computation. It is difficult to conceive of proficiency in computation without taking rate of performance into account. If time limits are too generous on a test of estimation, students are likely to perform calculations instead of using estimation skills. On the other hand, in tests that require creative, critical thinking, it is quality of thought rather than speed of response that is important.

Two indices of completion rates for all tests and levels from the spring standardization are shown in Table 6.5. The first is the percent of students completing the test. Because something less than 100 percent completion is generally considered ideal for most achievement tests, another widely used index is given: the percent of students who completed at least 75 percent of the items in the test.

The data in Table 6.5 indicate that most of the ITBS tests at most levels are essentially power tests. The only obvious exception is Math Computation, in which time limits are intentionally imposed to help teachers identify students who are not proficient in computation because they work slowly. It should be noted that two completion rates are reported for the Reading Comprehension test beginning at Level 9. As discussed in Part 3, this test was divided into two separately timed sections in Forms A and B. This change resulted in higher completion rates than were observed in previous editions of the ITBS.

Other Test Characteristics

In addition to the statistical considerations presented in this part of the Guide to Research and Development, other factors of interest are routinely examined as a part of the test development and evaluation process. For example, readability indices are computed for appropriate test materials that involve significant amounts of reading. These are reported in Part 3. Measures of differential item functioning (DIF) are computed for both tryout materials and final forms of the tests. Part 7 discusses these procedures and results. Finally, complete test and item analysis information is published in separate norms books for the Form B Complete Battery and the Form A and Form B Survey Batteries.
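The ceiling, floor, and completion indices reported in Table 6.5 can be computed from raw scores and counts of items attempted. The sketch below does so for simulated data on a 40-item, four-choice test; the data, the midpoint percentile-rank convention, and the score distribution are all hypothetical.

import numpy as np

def ceiling_floor_completion(raw_scores, answered_counts, k, n_choices):
    # Indices in the spirit of Table 6.5 for a single test with k items.
    scores = np.asarray(raw_scores)
    answered = np.asarray(answered_counts)

    def percentile_rank(x):
        # Midpoint percentile rank of score x in this score distribution.
        return 100.0 * np.mean(scores < x) + 50.0 * np.mean(scores == x)

    chance = k / n_choices
    return {
        "pr_perfect": round(percentile_rank(k), 1),
        "pr_k_minus_1": round(percentile_rank(k - 1), 1),
        "pct_below_chance": round(100.0 * np.mean(scores < chance), 1),
        "pct_completing": round(100.0 * np.mean(answered == k), 1),
        "pct_completing_75": round(100.0 * np.mean(answered >= 0.75 * k), 1),
    }

# Hypothetical data: 1,000 students on a 40-item, four-choice test.
rng = np.random.default_rng(4)
raw = rng.binomial(40, 0.6, size=1000)
answered = np.minimum(40, rng.binomial(40, 0.97, size=1000) + 2)
print(ceiling_floor_completion(raw, answered, k=40, n_choices=4))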
Table 6.5
Ceiling Effects, Floor Effects, and Completion Rates
Iowa Tests of Basic Skills — Complete Battery, Form A (Unweighted Sample)
Spring 2000 National Standardization

Test Levels 5 through 14. For each test at each level, the table gives the number of items (k); under "Ceiling," the percentile ranks of a perfect score (k) and of a score one less than perfect (k-1); under "Floor," the percent of students scoring below k/n, where n is the number of answer choices; and, under "Completion Rates," the percent of students completing the test and the percent completing at least 75 percent of the items. Beginning at Level 9, completion rates are reported separately for Parts 1 and 2 of the Reading Comprehension test.
PART 7

Among the most important results from the periodic use of achievement tests administered under standard conditions are findings that can be used to understand the process of social change through education. The data on national trends in achievement reported in Part 4 represent one example of how aggregate data from achievement tests reflect the social dynamics of education. In addition, national data on student achievement have shown the value of disaggregated results. During the 1980s, for example, the National Assessment of Educational Progress (NAEP) often reported fairly stable levels of achievement. However, dramatic gains were demonstrated by the national samples of Black and Hispanic students (Linn & Dunbar, 1990). Although the social reasons for changes in group differences in achievement are not always clear, carefully developed tests can provide a broad view of the influence of school on such differences.
Various approaches to understanding group differences in test scores are a regular part of research and test development efforts for the Iowa Tests of Basic Skills. To ensure that assessment materials are appropriate and fair for different groups, careful test development procedures are followed. Sensitivity review by content and fairness committees and extensive statistical analysis of the items and tests are conducted. The precision of measurement for important groups in the national standardization is evaluated when examining the measurement characteristics of the tests. Differences between groups in average performance and in the variability of performance are also of interest, and these are examined for changes over time. In addition to descriptions of group differences in test performance, analyses of differential item functioning are undertaken with results from the national item tryout as well as with results from the national standardization. Group Differences in Item and Test Performance Standard Errors of Measurement for Groups The precision of test scores for members of various demographic groups is a great concern, especially when test scores are used for purposes of selection or placement, such as college admissions tests and other kinds of subject-matter tests. Although standardized achievement tests such as the ITBS were not designed to be used in this way, there is still an interest in the precision with which the tests place an individual on the developmental continuum in each content domain. Standard errors of measurement were presented for this purpose in Part 5. Table 7.1 reports standard errors of measurement estimated separately for boys, girls, Whites, Blacks, and Hispanics based on data from the 2000 national standardization. Gender Differences in Achievement Differences between achievement test scores of girls and boys have been an ongoing concern in education. Patterns of test performance have been used as arguments in favor of a variety of school reform initiatives aimed at narrowing achievement gaps (e.g., same-gender classrooms, professional development to promote gender equity in instruction, programs that encourage girls to take advanced math and science classes). These initiatives are testimony to the importance of gender differences in achievement test results. It is well-established that the achievement of girls in most elementary school subjects, especially those that emphasize language skills, is higher than that of boys. Results from the most recent national standardization of the ITBS continue to document this finding. Reasons most frequently offered in the past to explain this situation are that girls receive more language stimulation in the home; that the general culture, and especially school culture, sets higher expectations for girls; and that the social climate and predominant teaching styles in elementary school are better suited to the interests and values of girls than boys (Hoover, 2003). 
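For readers who want to connect the group results in Table 7.1 to test statistics, a group-specific standard error of measurement can be obtained from the classical relation SEM = SD × sqrt(1 − reliability), applied within each group. The sketch below is illustrative only: the standard deviations and reliabilities are hypothetical stand-ins, not the ITBS values (the operational reliabilities appear in Part 5 and the tabled SEMs in Table 7.1).

```python
import math

def sem(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical standard-score SDs and KR-20 reliabilities (not ITBS values):
# a within-group SD near 30 with reliability .90 gives an SEM of about 9.5;
# an SD of 32 gives about 10.1.
for group, sd, rel in [("girls", 30.0, 0.90), ("boys", 32.0, 0.90)]:
    print(f"{group}: SEM = {sem(sd, rel):.1f} standard-score points")
```

When the group SDs and reliabilities are similar, as they generally are for the groups reported here, the group SEMs are correspondingly close, which is the pattern Table 7.1 is meant to document.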
107 6.1 6.1 7.4 7.4 6.8 6.9 7.1 7.2 8.2 8.3 9.1 9.0 9.8 9.9 10.1 10.3 7 Girls 7 Boys 8 Girls 8 Boys 9 Girls 9 Boys 10 Girls 10 Boys 11 Girls 11 Boys 12 Girls 12 Boys 13 Girls 13 Boys 14 Girls 14 Boys 11.2 11.4 10.7 10.9 10.7 10.7 10.0 9.9 9.2 9.0 7.4 7.3 6.5 6.1 4.4 4.3 4.4 4.4 7.6 7.7 7.2 7.3 7.0 7.0 6.5 6.4 5.8 5.8 5.0 5.0 4.9 4.8 3.8 3.7 2.6 2.6 RT Reading Total 11.2 11.1 10.4 10.3 10.3 10.0 9.4 9.1 8.7 8.3 6.7 6.5 6.2 6.3 3.7 3.8 L1 Spelling 20.3 20.0 19.8 19.3 19.1 18.6 16.2 15.8 14.9 14.4 10.9 10.7 L2 Capitalization 18.5 18.5 17.3 17.2 16.0 15.5 14.6 14.4 13.5 13.3 10.9 10.5 L3 Punctuation 17.1 17.0 15.3 15.0 14.3 13.8 13.5 13.1 11.1 10.8 9.6 9.3 L4 8.6 8.5 8.0 7.9 7.6 7.4 6.8 6.7 6.1 6.0 4.8 4.7 6.1 6.0 4.9 4.9 6.0 5.9 4.4 4.3 LT Usage & Language Expression Total 10.3 10.3 9.7 9.8 9.0 9.2 9.1 9.1 8.1 8.1 8.0 8.2 7.6 7.6 6.4 6.5 M1 14.8 14.8 14.4 14.2 13.7 13.7 12.0 12.1 11.1 11.4 9.2 9.2 7.9 8.1 6.9 7.1 M2 Concepts Problems & Data & InterpreEstimation tation 9.0 9.0 8.7 8.6 8.2 8.3 7.5 7.6 6.9 7.0 6.1 6.1 5.5 5.5 4.7 4.8 5.6 5.6 13.3 13.9 12.9 13.4 11.1 11.2 8.1 8.3 7.1 7.2 6.0 6.2 5.5 5.6 5.0 5.2 M3 MT - 4.5 4.5 Computation Math Total Mathematics 7.5 7.6 7.2 7.3 6.6 6.7 5.7 5.8 5.1 5.2 4.5 4.6 4.1 4.1 3.5 3.6 MT + Math Total 4.8 4.9 4.6 4.6 4.4 4.4 4.0 4.0 3.6 3.6 3.1 3.1 3.2 3.2 2.6 2.6 4.1 4.0 3.4 3.4 CT - Core Total 4.5 4.6 4.3 4.3 4.1 4.1 3.7 3.6 3.3 3.3 2.8 2.8 2.9 2.9 2.4 2.4 CT + Core Total 14.3 14.3 13.8 13.7 13.2 13.1 11.5 11.5 10.1 10.0 8.5 8.5 10.6 10.6 7.8 7.8 SS Social Studies 13.5 13.6 12.7 12.7 11.9 11.8 11.1 11.1 10.2 10.2 9.7 9.7 10.8 11.2 9.1 9.3 SC Science 18.2 18.3 18.3 18.5 16.2 16.2 14.3 14.4 12.4 12.5 10.9 11.2 S1 14.8 14.3 13.0 12.9 12.4 12.2 10.7 10.2 9.4 9.0 7.4 7.3 S2 11.7 11.6 11.2 11.3 10.2 10.2 8.9 8.8 7.8 7.7 6.6 6.7 6.4 6.7 5.2 5.3 ST Maps Reference Sources and Materials Total Diagrams Sources of Information 4.5 4.5 4.3 4.3 4.1 4.0 3.7 3.6 3.3 3.3 2.9 2.9 2.9 2.9 2.3 2.3 CC - 4.4 4.4 4.2 4.2 4.0 4.0 3.6 3.5 3.2 3.2 2.8 2.8 2.8 2.9 2.3 2.3 CC + Compo- Composite site 9.3 9.3 7.2 7.1 7.7 7.6 6.5 6.3 WA Word Analysis 8.7 8.6 7.5 7.5 6.5 6.4 5.0 4.9 Li Listening 1.8 1.8 1.4 1.4 2.9 2.9 3.8 3.7 RPT Reading Profile Total 3:15 PM Note: -Does not include Computation +Includes Computation 9.0 9.0 6 Girls 6 Boys RC RV 8.0 7.9 Comprehension Language 10/29/10 5 Girls 5 Boys Level Gender 108 Vocabulary Reading Table 7.1 Standard Errors of Measurement in the Standard Score Metric for ITBS by Level and Gender Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 108 6.3 5.9 6.0 7.4 7.0 7.0 6.9 6.8 7.1 7.2 7.3 7.3 8.2 8.5 8.6 9.1 9.3 9.4 9.7 10.5 10.5 10.1 11.0 11.0 7 White 7 Black 7 Hispanic 8 White 8 Black 8 Hispanic 9 White 9 Black 9 Hispanic 10 White 10 Black 10 Hispanic 11 White 11 Black 11 Hispanic 12 White 12 Black 12 Hispanic 13 White 13 Black 13 Hispanic 14 White 14 Black 14 Hispanic 11.3 11.9 11.7 10.8 11.1 11.0 10.7 10.6 10.7 10.0 9.8 9.8 9.1 8.9 9.0 7.3 6.9 7.1 6.3 5.5 5.6 4.4 4.2 4.0 4.4 4.3 4.4 RC 7.6 8.1 8.0 7.3 7.6 7.6 7.0 7.1 7.1 6.5 6.5 6.5 5.8 5.8 5.8 5.0 4.9 5.0 4.9 4.5 4.5 3.8 3.6 3.6 2.6 2.6 2.7 RT Reading Total 11.1 10.9 11.0 10.2 10.2 10.1 10.2 9.8 9.9 9.3 8.9 8.8 8.4 8.3 8.1 6.4 6.3 6.4 6.2 6.1 6.4 3.7 3.9 4.0 L1 Spelling 20.1 19.2 19.8 19.5 18.8 18.9 18.9 18.1 18.1 16.0 15.4 15.3 14.5 13.6 14.1 10.7 10.3 10.1 L2 Capitalization 18.5 18.3 18.5 17.2 16.9 17.2 15.6 15.1 15.4 14.7 14.3 14.3 13.3 12.9 13.1 10.7 9.8 10.2 L3 Punctuation 16.9 16.9 16.9 15.2 14.7 14.7 14.1 13.3 13.7 13.2 
12.5 12.9 10.9 10.2 10.6 9.4 8.5 8.8 L4 8.5 8.3 8.4 8.0 7.7 7.8 7.5 7.2 7.3 6.8 6.5 6.5 6.0 5.7 5.9 4.7 4.4 4.5 5.9 5.8 5.8 5.0 4.6 4.7 5.9 5.4 5.5 4.4 4.2 4.1 LT 10.2 10.8 10.7 9.7 9.9 9.9 9.1 9.0 9.1 9.0 9.1 9.1 8.1 8.0 8.0 8.1 7.8 7.8 7.6 7.6 7.7 6.6 6.3 6.3 M1 14.8 15.1 15.1 14.3 14.3 14.5 13.7 13.2 13.4 12.1 11.9 12.0 11.2 11.0 11.1 9.2 8.9 9.1 7.9 8.1 8.1 7.0 7.1 7.0 M2 9.0 9.3 9.2 8.7 8.7 8.8 8.2 8.0 8.1 7.5 7.5 7.5 6.9 6.8 6.8 6.1 5.9 6.0 5.5 5.5 5.6 4.8 4.7 4.7 5.6 5.5 5.4 4.5 4.4 4.4 MT - Math Total 13.6 14.3 14.2 13.5 13.6 13.6 11.1 10.9 11.1 8.1 8.1 8.1 7.2 7.1 7.1 6.1 6.0 6.1 5.5 5.4 5.5 5.0 4.5 4.8 M3 Computation 7.5 7.8 7.8 7.3 7.4 7.4 6.6 6.4 6.5 5.7 5.7 5.7 5.2 5.1 5.1 4.6 4.4 4.5 4.1 4.1 4.2 3.6 3.5 3.5 MT + Math Total 4.8 5.0 5.0 4.6 4.6 4.7 4.4 4.3 4.3 4.0 3.9 4.0 3.6 3.5 3.6 3.1 3.0 3.0 3.1 3.1 3.1 2.6 2.5 2.5 4.0 3.9 3.8 3.4 3.3 3.1 CT - Core Total 4.6 4.7 4.7 4.3 4.4 4.4 4.1 4.0 4.0 3.7 3.6 3.6 3.3 3.2 3.2 2.8 2.6 2.7 2.9 2.8 2.8 2.4 2.3 2.3 CT + Core Total 14.3 14.5 14.5 13.7 13.8 13.8 13.2 12.8 12.9 11.6 11.3 11.4 10.1 9.6 9.8 8.6 8.2 8.2 10.7 10.2 10.4 8.0 7.2 7.1 SS Social Studies 13.5 13.6 13.6 12.7 12.5 12.6 11.9 11.4 11.6 11.1 10.5 10.9 10.2 9.6 9.9 9.7 9.0 9.3 11.0 9.9 10.2 9.8 8.9 8.2 SC Science 18.3 18.2 18.6 18.5 18.3 18.5 16.3 15.9 16.1 14.4 14.2 14.3 12.5 12.0 12.2 11.0 10.1 10.6 S1 14.3 13.9 13.9 13.0 12.5 12.6 12.3 11.6 11.8 10.3 9.7 10.2 9.2 8.6 8.9 7.2 6.9 7.0 S2 11.6 11.5 11.6 11.3 11.0 11.2 10.2 9.8 10.0 8.8 8.6 8.8 7.8 7.4 7.6 6.6 6.1 6.3 6.6 6.3 6.3 5.3 5.2 5.2 ST Maps Reference Sources and Materials Total Diagrams Sources of Information 4.5 4.6 4.6 4.3 4.3 4.3 4.1 3.9 4.0 3.7 3.5 3.6 3.3 3.1 3.2 2.9 2.7 2.8 2.9 2.7 2.8 2.4 2.2 2.2 CC - 4.4 4.5 4.5 4.2 4.2 4.2 4.0 3.9 3.9 3.6 3.4 3.5 3.2 3.0 3.1 2.8 2.7 2.7 2.8 2.7 2.7 2.3 2.2 2.1 CC + Compo- Composite site 9.2 9.1 9.3 7.3 6.8 7.1 7.5 7.3 7.2 6.5 6.5 6.1 WA Word Analysis 8.7 8.3 8.4 7.6 7.1 7.2 6.4 6.2 6.2 5.0 5.0 4.8 Li Listening 1.8 1.7 1.7 1.4 1.3 1.4 2.9 2.8 2.8 3.9 3.7 3.5 RPT Reading Profile Total 3:15 PM Note: -Does not include Computation +Includes Computation 9.0 8.6 8.4 6 White 6 Black 6 Hispanic RV 8.2 7.7 7.0 Group Comprehension Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language 10/29/10 5 White 5 Black 5 Hispanic Level Vocabulary Reading Table 7.1 (continued) Standard Errors of Measurement in the Standard Score Metric for ITBS by Level and Group Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 109 109 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 110 At the same time, the relatively higher achievement of boys in advanced math and science curricula in the upper grades is a common observation. Group differences in test scores across the content domains and grade levels of the ITBS have been monitored for many years and continue to shed light on the developmental nature of performance differences. Table 7.2 presents mean differences between the achievement of boys and girls in kindergarten through grade 8. The differences are based on results from all of the students participating in the 2000 national standardization whose gender was coded on the answer document (99% of the sample). To examine gender differences in test scores, the frequency distributions of scores for boys and girls were obtained and differences at various points on the distributions were examined with an effect-size statistic. 
The effect size was defined as the difference between the group statistics (girls minus boys) in total-sample standard deviation units. For example, an effect size for the group means would equal the girls' mean minus the boys' mean, divided by the total-sample standard deviation (SD). As the table shows, on the average girls were markedly higher in achievement in reading comprehension, language, computation, and reference skills. In addition, the difference between scores earned by boys and girls tended to increase with grade level.

The complex nature of gender differences does not necessarily appear when means are compared (Hoover, 2003). Many of the conflicting results previously reported concerning gender differences are explained by consideration of the entire distribution of achievement. The results of an analysis of differences between the score distributions of boys and girls in grades 4 and 8 are presented in Table 7.3. The upper part of the table contains means and SDs of boys and girls for each test and composite score in the battery as well as mean effect sizes and ratios of SDs for the groups. The lower part of the table contains effect sizes computed at three points in the distributions: the 10th, 50th, and 90th percentiles. Complete data for all grades are available from the Iowa Testing Programs.

Overall, the performance of boys tends to be more variable than that of girls. This shows up in differences between the standard deviations for the two groups as well as differences in the scores at selected percentiles. For example, when examining the means, boys and girls do not appear to differ greatly in science at grade 4. However, because the performance of boys shows more variability (SD for boys = 30.2, SD for girls = 28.1), the highest scoring boys do better than the highest scoring girls (effect size at the 90th percentile = -.10) and the lowest scoring boys do worse than the lowest scoring girls (effect size at the 10th percentile = +.10). Differences in the other areas were not as large but still favored girls, especially at the middle and lower achievement levels. At the upper achievement levels, the performance of boys tended to equal or surpass that of girls in all achievement areas except Language, Reference Materials, and Math Computation. In Vocabulary, Math Problem Solving, Social Studies, Science, and Maps and Diagrams, the "crossover" point where equality of achievement by gender occurs is slightly above the median in most grades.

Similar data on gender differences were obtained for kindergarten through grade 8 in the 1977, 1984, and 1992 national standardizations, and for grades 3 through 8 in 1963 and 1970. Gender differences for all national standardizations since 1963 are summarized by total score in Table 7.4. Note that the differences in Table 7.4 are expressed in months on the grade-equivalent scale. The direction and magnitude of the differences have remained stable; however, several trends are noteworthy. In Reading since 1992, differences at the 90th percentile have been greater than they were earlier. In Language, the tendency for differences favoring girls to increase in magnitude with grade level, especially at the 10th and 50th percentiles, continued in the 2000 results. In Math, gender differences at the median remained small in 2000; at the 10th percentile, they were near the 1977 peak or exceeded it at some grade levels. Gender differences in composite performance in 2000 were smaller at the 90th and the 50th percentile than at the 10th percentile.
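To make the computation concrete, the sketch below computes effect sizes of this kind at the mean and at selected percentiles. The scores are synthetic and purely illustrative, not ITBS data; note also that the d statistic in the Table 7.3 footnote divides by the average of the two group SDs rather than the total-sample SD, a minor variant of the same idea.

```python
import numpy as np

def effect_size_at(girls, boys, q=None):
    """Effect size (girls minus boys) in total-sample SD units.

    q = None compares the group means; q = 0.10, 0.50, or 0.90 compares the
    scores at that quantile of each group's distribution, as in Table 7.3.
    """
    girls, boys = np.asarray(girls, float), np.asarray(boys, float)
    total_sd = np.concatenate([girls, boys]).std(ddof=1)
    if q is None:
        diff = girls.mean() - boys.mean()
    else:
        diff = np.quantile(girls, q) - np.quantile(boys, q)
    return diff / total_sd

# Illustrative synthetic scores (not ITBS data): boys slightly more variable,
# so the sign of the effect can differ at the tails even with similar means.
rng = np.random.default_rng(0)
girls = rng.normal(206, 28, 5000)
boys = rng.normal(206, 30, 5000)
for q in (None, 0.10, 0.50, 0.90):
    label = "mean" if q is None else f"{int(q * 100)}th percentile"
    print(label, round(effect_size_at(girls, boys, q), 2))
```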
The educational implications of these differences across test areas, achievement levels, and time are not immediately apparent. The importance of quantitative skills for success in high school and college and in employment has been recognized in educational policy. Attention has been focused on the gap between boys and girls in quantitative skills, especially for select groups of students; the evidence across several national standardizations of the ITBS suggests the magnitude of gender differences in math concepts and problem-solving skills has been reduced. Similar attention has not been targeted at differences in language skills. In view of the importance of language skills in high school achievement and in predicting college and vocational success, greater emphasis should be placed on language arts programs that engage the attention of boys. The fact that gender differences in achievement are relatively small or nonexistent in kindergarten and first grade raises questions about the influence of school on such differences. -.03 .12 .03 .05 .01 .01 -.04 .01 .08 6 7 8 9 10 11 12 13 14 .16 .18 .17 .16 .14 .15 .19 .18 .09 RC .13 .11 .08 .10 .09 .11 .11 .16 .10 RT Reading Total .27 .32 .26 .31 .28 .25 .27 .22 L1 Spelling .37 .31 .28 .30 .28 .23 L2 Capitalization .47 .45 .41 .41 .28 .23 L3 Punctuation .40 .32 .29 .27 .18 .19 L4 .43 .40 .36 .37 .29 .26 .27 .24 .08 .14 LT -.06 -.01 -.03 -.01 -.06 -.09 -.16 -.05 M1 .05 .05 .01 .06 .01 .04 -.11 .01 M2 .00 .02 -.01 .03 -.02 -.02 -.15 -.02 .00 .08 MT - Math Total .25 .24 .25 .25 .17 .07 -.02 -.10 M3 Computation .09 .10 .08 .10 .04 .00 -.12 -.04 MT + Math Total .22 .21 .17 .19 .14 .13 .09 .14 .01 .10 CT - Core Total .25 .24 .20 .22 .16 .14 .11 .14 CT + Core Total .07 .10 .03 -.04 -.02 .00 -.17 -.05 SS Social Studies .13 .11 .08 .01 .00 -.01 -.25 -.03 SC Science .01 .01 -.10 -.03 -.01 -.03 S1 .32 .27 .22 .27 .21 .14 S2 .16 .14 .05 .11 .09 .05 .00 .08 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .18 .17 .11 .11 .08 .07 .00 .07 CC - .16 .15 .10 .09 .07 .06 .01 .07 CC + Compo- Composite site .11 .09 .09 .15 WA Word Analysis .10 .03 .05 .09 Li Listening .16 .15 .07 .12 RPT Reading Profile Total 3:15 PM Note: Positive differences favor girls. 
-Does not include Computation +Includes Computation .07 5 RV Comprehension Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language 10/29/10 Level Vocabulary Reading Table 7.2 Male-Female Effect Sizes for Average Achievement Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 111 111 22.3 .01 1.03 -.03 .01 .04 SD d** R*** Difference** 90 50 10 247.8 31.7 250.6 29.2 .09 1.08 -.01 .06 .27 Male Mean* SD Female Mean SD d** R*** Difference** 90 50 10 .31 .17 -.01 1.09 .16 39.2 253.3 42.8 246.5 .16 .14 .15 1.01 .14 28.4 207.8 28.8 203.7 .25 .15 .00 1.09 .14 32.5 252.0 35.3 247.3 .11 .09 .07 1.02 .09 23.9 205.2 24.4 203.0 RT Reading Total .41 .29 .17 1.07 .28 33.7 255.8 36.2 245.9 .30 .28 .35 1.00 .28 25.0 207.2 24.9 200.1 L1 Spelling .35 .44 -.33 1.03 .37 49.0 262.8 50.4 244.5 .32 .24 .29 .99 .28 35.7 211.0 35.4 200.9 L2 Capitalization .42 .58 .28 1.04 .47 48.7 265.2 50.5 242.0 .24 .28 .30 .98 .28 34.1 211.6 33.5 202.0 L3 .41 .48 .30 1.03 .41 50.2 262.5 51.4 241.9 .22 .19 .14 1.01 .18 33.8 211.0 34.3 204.9 L4 .42 .50 .32 1.04 .44 40.4 261.8 42.1 243.8 .30 .30 .28 1.00 .29 27.9 210.3 27.9 202.3 LT Punctu- Usage & Language ation Expression Total .08 -.02 -.16 1.10 -.04 31.6 251.3 34.6 252.5 .08 -.05 -.15 1.08 -.05 21.7 201.6 23.3 202.7 M1 .18 .08 -.11 1.12 .06 39.6 253.0 44.3 250.5 .15 .04 -.15 1.10 .02 26.7 205.4 29.4 204.8 M2 Concepts Problems & Data & InterpreEstimation tation 25.0 .14 .03 -.15 1.12 .02 33.7 252.3 37.6 251.6 .12 -.01 -.16 1.09 -.01 22.9 203.5 .28 .32 .17 1.04 .27 35.9 255.6 37.2 245.9 .25 .17 .15 1.03 .19 20.1 203.7 20.6 199.9 M3 MT - 203.8 Computation Math Total Mathematics .21 .15 -.10 1.11 .11 31.0 253.5 34.3 249.9 .18 .05 -.08 1.08 .05 20.2 203.7 21.8 202.7 MT + Math Total .28 .27 .08 1.07 .22 32.9 255.7 35.3 248.2 .19 .15 .10 1.04 .14 23.0 206.6 23.8 203.3 CT - Core Total .31 .31 .13 1.07 .26 31.8 256.2 34.1 247.7 .21 .18 .13 1.03 .16 22.1 206.7 22.8 203.0 CT + Core Total .25 .16 -.16 1.15 .09 38.7 251.4 44.4 247.8 .14 -.05 -.11 1.08 -.02 24.9 204.3 26.8 204.7 SS Social Studies .23 .20 -.10 1.12 .13 39.7 255.2 44.3 249.8 .10 .02 -.10 1.07 .01 28.1 206.6 30.2 206.3 SC Science .15 .02 -.16 1.13 .02 43.3 252.9 48.7 252.2 .11 -.01 -.13 1.08 -.02 29.5 206.3 31.9 206.8 S1 .37 .38 .18 1.06 .33 38.7 258.7 41.0 245.7 .20 .23 .20 1.01 .22 25.0 207.6 25.2 202.2 S2 .26 .20 -.02 1.10 .17 38.2 255.9 42.0 249.0 .18 .09 .00 1.05 .10 25.3 207.0 26.7 204.5 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .25 .23 .00 1.10 .18 33.5 255.5 36.9 249.3 .16 .09 .00 1.06 .08 22.9 206.5 24.2 204.6 CC - .25 .22 -.04 1.11 .16 33.2 255.2 36.8 249.6 .16 .07 -.02 1.06 .07 22.8 206.5 24.2 204.9 CC + Compo- Composite site 3:15 PM Grade 8 22.9 202.4 SD Female Mean 202.2 RC RV Male Mean* Grade 4 Comprehension Language 10/29/10 * Means and SDs are in standard score units. 
** d = (Female mean-Male mean)/average SD of Male and Female *** R = Male SD/Female SD -Does not include Computation +Includes Computation 112 Vocabulary Reading Table 7.3 Descriptive Statistics by Gender Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 112 Note: 10 50 2 1 1 2 6 5 4 3 0 1 0 1 1 0 -1 2 2 1 1 0 0 0 -1 -1 0 0 -1 0 0 0 0 -3 -3 -2 -1 -2 0 1 1 0 1 1 1 1 4 5 0 1 0 1 0 0 -1 0 1 0 2 0 0 0 0 -1 -1 1 2 3 3 1 2 2 2 2 3 2 1 2 0 1 1 1 -1 0 1 2 2 2 5 4 3 3 3 4 4 4 3 3 2 1 0 2 1 1 1 2 3 3 5 6 2 2 2 2 6 4 5 6 2 1 1 0 0 1 1 1 1 2 3 4 4 4 6 8 1 2 2 2 3 3 4 6 2 2 2 0 0 1 1 1 1 1 2 5 6 6 6 3 3 1 2 2 4 4 3 4 2 2 2 2 3 3 3 0 1 3 2 2 3 4 6 6 2 2 2 2 3 4 4 4 2 2 1 2 3 2 0 In grade-equivalent units — “months,” positive differences favor girls. 0 0 0 1 1 1 1 0 K 1 2 2 2 1 1 2 3 3 1 -1 0 0 1 1 1 0 -1 -1 -2 -1 -1 0 1 -1 -1 -2 -3 -1 1 1 3 2 1 4 2 6 2 0 7 5 3 2 8 3 0 K 5 1 1 3 1 1 1 1 1 2 2 2 2 2 0 -1 1 0 1 0 7 0 8 2 -1 -1 -1 0 3 1 1 K 0 4 -2 0 0 5 0 -3 -4 1 0 6 0 0 0 0 ’63 ’70 ’77 ’84 ’92 ’00 Reading 7 4 5 5 6 6 4 0 2 4 4 5 5 5 5 2 2 5 6 5 6 5 6 7 1 3 3 4 6 6 4 6 7 8 10 1 1 2 3 4 5 6 7 8 1 3 5 5 7 8 9 10 11 10 9 11 11 5 5 4 6 9 6 5 1 2 3 4 4 5 7 0 0 1 2 3 4 6 7 1 1 2 3 4 7 7 8 9 10 1 0 3 4 4 5 6 7 2 2 3 3 4 6 5 6 2 2 3 3 4 5 6 7 9 1 2 3 4 4 6 7 9 9 10 2 3 2 2 2 3 4 6 8 10 9 1 0 3 3 3 4 5 5 5 ’63 ’70 ’77 ’84 ’92 ’00 Total Language 1 1 1 1 0 0 1 2 1 2 1 0 1 1 1 1 2 -2 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -2 -2 0 0 1 1 2 2 2 3 4 0 0 1 1 1 2 3 4 5 -1 -1 0 0 0 0 0 0 0 0 0 0 1 1 2 2 3 3 1 0 0 0 1 1 2 2 1 0 0 0 1 1 1 3 5 5 4 3 0 0 0 0 0 1 2 2 0 0 0 -1 -2 0 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -3 1 1 0 1 2 2 3 4 3 0 0 -1 0 0 2 1 2 2 0 -1 -1 -1 -1 0 -2 -1 -2 ’63 ’70 ’77 ’84 ’92 ’00 Total Mathematics 1 1 1 3 0 1 2 1 1 3 1 2 1 -1 -1 1 1 -1 1 1 1 3 3 3 2 2 2 2 3 4 0 0 0 0 0 -1 2 2 3 3 4 4 5 2 2 2 3 3 4 6 0 0 -1 -1 0 0 0 0 5 1 2 3 3 3 3 6 1 1 2 2 3 3 1 1 2 3 4 3 4 4 1 0 1 1 1 3 4 4 0 1 -2 0 -1 -1 -2 -1 -1 0 -1 -1 -1 -2 1 1 3 2 3 3 4 5 1 0 1 1 2 1 4 4 1 -2 -1 0 0 -2 -1 0 ’63 ’70 ’77 ’84 ’92 ’00 Total Work Study 1 1 1 2 1 2 3 3 3 4 3 2 1 1 1 1 3 0 1 1 1 2 2 2 3 3 3 4 5 4 2 2 2 1 1 1 0 1 1 2 2 3 3 4 5 0 1 2 3 3 3 4 5 5 0 1 1 0 1 1 1 1 1 0 0 1 2 3 2 3 4 5 1 0 1 2 2 2 2 3 4 0 1 2 1 0 0 1 0 0 1 1 1 2 2 3 4 4 3 1 1 1 1 1 2 2 3 3 1 1 0 0 0 0 1 1 0 1 1 0 2 2 2 3 4 4 1 1 0 1 1 1 2 3 4 0 1 0 0 0 0 0 0 -1 ’63 ’70 ’77 ’84 ’92 ’00 Composite 3:15 PM 2 0 7 ’63 ’70 ’77 ’84 ’92 ’00 8 Grade Vocabulary 10/29/10 90 Percentile Rank Table 7.4 Gender Differences in Achievement over Time Iowa Tests of Basic Skills 1963–2000 National Standardization Data 961464_ITBS_GuidetoRD.qxp Page 113 113 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 114 Racial-Ethnic Differences in Achievement Differences between the average test scores of different racial-ethnic groups, specifically the gaps between average scores for Blacks and Whites, and Hispanics and Whites, have been an ongoing concern in achievement testing. Results from national assessments such as NAEP have consistently shown such gaps in performance. Monitoring changes in the achievement gap over time is an important function of achievement tests. Historical data from The Iowa Tests allow only limited comparisons across time. Table 7.5 shows the differences in performance of fifth-grade students (Whites and Blacks) in national standardization studies since 1977. Mean differences are reported as the effect-size statistic (White mean minus Black mean divided by the pooled group standard deviation). 
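As a minimal illustration of that statistic, the sketch below computes a gap effect size using the pooled within-group SD, as described in the text; the Table 7.5 footnote divides by the total SD instead, a closely related variant. The values here are hypothetical and are not taken from Table 7.5.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group standard deviation."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def gap_effect_size(mean_w, sd_w, n_w, mean_b, sd_b, n_b):
    """Effect size: (White mean - Black mean) / pooled group SD."""
    return (mean_w - mean_b) / pooled_sd(sd_w, n_w, sd_b, n_b)

# Hypothetical values only: a 16-point mean difference with group SDs near 30
# corresponds to an effect size of about 0.55.
print(round(gap_effect_size(212.0, 29.0, 9000, 196.0, 31.0, 1800), 2))
```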
These results support the conclusion that differences between Whites and Blacks have narrowed somewhat, particularly in reading and language skills. The gap continues, however, and the problem of group differences in achievement remains formidable. Group differences in the 2000 national standardization of The Iowa Tests are consistent with the differences observed elsewhere. Specifically, Whites outperform Blacks and Hispanics in terms of average performance in all subjects tested. Although the size of the gap varies by subject area, the averages can differ by more than one-half of a standard deviation. Table 7.6 reports effect sizes for these average differences. 114 Table 7.5 Race Differences in Achievement Iowa Tests of Basic Skills, Grade 5 Balanced National Standardization Samples, 1977–2000 Form Year Vocabulary Reading Language Spelling Capitalization Punctuation Usage & Expression Mathematics Concepts & Estimation Problem Solving & Data Interpretation Computation Sources of Information Maps & Diagrams Reference Materials Note: 7 1977 G 1984 K 1992 A 2000 .84 .80 .68 .58 .72 .63 .54 .54 .47 .62 .66 .87 .22 .42 .50 .59 .19 .35 .44 .63 .17 .27 .35 .54 .67 .56 .61 .58 .63 .30 .78 .25 .64 .29 .58 .20 .80 .65 .62 .52 .62 .48 .55 .41 Effect size = (White mean-Black mean) / total SD. It is important not to confuse these effect sizes with the term “bias” as defined in the 1999 Standards for Educational and Psychological Testing. Statistical bias is addressed by research related to differential item functioning or prediction differences due to group membership. Note that the effect sizes reported here are consistent with NAEP findings of smaller group differences during the early 1990s, but that finding is not uniform across subtests. -.55 -.59 -.78 -.65 13 14 5 6 -.63 -.51 -.54 11 12 13 14 -.48 -.42 -.51 -.50 -.36 -.41 -.45 -.41 -.21 -.63 -.58 -.64 -.54 -.49 -.54 -.49 -.60 -.58 -.47 -.56 -.59 -.56 -.24 -.65 -.60 -.70 -.58 -.58 -.56 -.48 -.53 -.19 RT Reading Total -.35 -.26 -.34 -.31 -.17 -.22 -.30 -.31 -.23 -.12 -.19 -.17 -.11 -.05 -.11 -.27 L1 Spelling Note: Positive differences favor Whites. 
-Does not include Computation +Includes Computation -.55 -.59 10 -.66 -.67 12 9 -.53 11 -.64 -.62 -.62 -.59 9 10 8 -.47 -.48 8 7 -.41 -.57 7 -.41 -.41 -.17 -.46 RC 6 RV Comprehension 5 Level Vocabulary Reading -.40 -.36 -.43 -.30 -.22 -.27 -.45 -.43 -.39 -.26 -.35 -.23 L2 Capitalization -.43 -.32 -.41 -.32 -.24 -.27 -.42 -.43 -.46 -.35 -.39 -.39 L3 Punctuation -.49 -.47 -.48 -.50 -.36 -.38 -.56 -.57 -.60 -.54 -.54 -.54 L4 -.48 -.40 -.48 -.41 -.29 -.34 -.43 -.44 -.54 -.68 -.48 -.47 -.49 -.39 -.42 -.37 -.45 -.44 -.47 -.57 LT Usage & Language Expression Total Language -.46 -.45 -.47 -.47 -.39 -.45 -.44 -.61 -.64 -.63 -.59 -.58 -.60 -.57 -.56 -.65 M1 -.48 -.45 -.48 -.47 -.35 -.38 -.37 -.47 -.71 -.67 -.66 -.58 -.54 -.54 -.54 -.60 M2 Concepts Problems & Data & InterpreEstimation tation -.50 -.48 -.50 -.50 -.39 -.44 -.43 -.58 -.43 -.67 -.72 -.69 -.67 -.61 -.60 -.59 -.59 -.67 -.42 -.51 MT - Math Total -.20 -.18 -.26 -.19 -.16 -.15 -.21 -.30 -.26 -.21 -.27 -.20 -.29 -.22 -.33 -.47 M3 Computation Mathematics -.44 -.41 -.46 -.44 -.35 -.39 -.39 -.55 -.62 -.58 -.58 -.53 -.55 -.52 -.55 -.67 MT + Math Total -.55 -.49 -.57 -.53 -.41 -.48 -.55 -.59 -.63 -.83 -.66 -.63 -.66 -.56 -.57 -.55 -.56 -.61 -.49 -.58 CT - Core Total -.53 -.47 -.56 -.51 -.40 -.47 -.54 -.56 -.63 -.59 -.63 -.54 -.56 -.52 -.55 -.61 CT + Core Total -.43 -.41 -.52 -.39 -.36 -.46 -.39 -.56 -.57 -.53 -.61 -.50 -.57 -.46 -.46 -.60 SS Social Studies -.43 -.48 -.55 -.47 -.40 -.40 -.50 -.61 -.58 -.58 -.66 -.59 -.60 -.56 -.69 -.55 SC Science -.42 -.40 -.46 -.46 -.33 -.37 -.61 -.59 -.59 -.54 -.57 -.50 S1 -.48 -.35 -.42 -.35 -.24 -.32 -.48 -.50 -.52 -.40 -.40 -.41 S2 -.48 -.41 -.48 -.44 -.31 -.37 -.43 -.45 -.59 -.59 -.60 -.51 -.53 -.50 -.48 -.47 ST Maps Reference Sources and Materials Total Diagrams Sources of Information Table 7.6 Effect Sizes for Racial-Ethnic Differences in Average Achievement Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization -.53 -.49 -.58 -.51 -.41 -.48 -.58 -.65 -.66 -.64 -.69 -.59 -.62 -.57 -.65 -.69 CC - -.52 -.49 -.58 -.51 -.41 -.47 -.58 -.65 -.66 -.63 -.68 -.59 -.62 -.56 -.65 -.69 CC + Compo- Composite site -.40 -.41 -.36 -.48 -.42 -.47 -.29 -.26 WA Word Analysis -.56 -.55 -.52 -.61 -.65 -.69 -.48 -.55 Li Listening -.57 -.55 -.48 -.74 -.50 -.59 -.37 -.48 RPT Reading Profile Total 3:15 PM Black – 10/29/10 Hispanic – White 961464_ITBS_GuidetoRD.qxp Page 115 115 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 116 Differential Item Functioning In developing materials for all forms of The Iowa Tests, attention is paid to writing questions in contexts accessible to students with a variety of backgrounds and interests. Obviously, it is impossible for all stimulus materials to be equally interesting to all students. Nevertheless, a goal of all test development in the Iowa Testing Programs is to assemble test materials that reflect the diversity of the test-taking population in the United States. In pursuing this goal, all proposed stimulus materials and test items are reviewed for appropriateness and evaluated statistically for differential item functioning (DIF). Numerous research studies and editorial efforts examined the presence of differential item functioning in tryout materials and final forms of the Iowa Tests of Basic Skills (e.g., Dunbar, Ordman, & Mengeling, 2002). 
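The statistical screening referred to here, described in more detail later in this part, rests on the Mantel-Haenszel procedure: examinees in the reference and focal groups are stratified on a matching criterion such as total score, a common odds ratio is estimated across the strata, and the result is often re-expressed on the ETS delta scale as MH D-DIF = -2.35 ln(alpha), with values near zero indicating little or no DIF. The sketch below is an illustrative implementation with synthetic data; it is not the operational ITBS analysis, which used variants of the procedure and building-matched samples.

```python
import math
import random
from collections import defaultdict

def mantel_haenszel_dif(correct, group, match_score):
    """Mantel-Haenszel common odds ratio and ETS-style delta for one item.

    correct:     1/0 indicator of a right answer, per examinee
    group:       'R' (reference) or 'F' (focal), per examinee
    match_score: matching criterion (e.g., number-right total), per examinee
    Returns (alpha_MH, MH_D_DIF), where MH_D_DIF = -2.35 * ln(alpha_MH).
    """
    # One 2x2 table (group by right/wrong) per matching-score stratum.
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for r, g, s in zip(correct, group, match_score):
        row = 0 if g == "R" else 1
        col = 0 if r else 1
        tables[s][row][col] += 1

    num = den = 0.0
    for (a, b), (c, d) in tables.values():  # a,b: reference right/wrong; c,d: focal
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# Synthetic data with no DIF built in, so alpha_MH should be close to 1.0.
random.seed(1)
correct, group, score = [], [], []
for _ in range(2000):
    g = random.choice("RF")
    s = random.randint(0, 4)          # crude matching strata
    p = 0.3 + 0.1 * s                 # same item difficulty for both groups
    correct.append(1 if random.random() < p else 0)
    group.append(g)
    score.append(s)

alpha, mh_d = mantel_haenszel_dif(correct, group, score)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {mh_d:.2f}")
```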
During item development, original test materials and questions were written by individuals from diverse backgrounds and subjected to editorial scrutiny by the staff of the Iowa Testing Programs and by representatives of the publisher. During this phase of development, items that portray situations markedly unfamiliar to most students because of socio-cultural factors were revised or removed from the pool of items considered for tryout units. Educators also evaluated test items for perceived fairness and cultural sensitivity as well as for balance in regional, urban-rural, and male-female representativeness. The educators were selected to represent Blacks, Whites, Hispanics, American Indians, and Asians. Members of the review panels for Forms A and B are listed in Table 7.7.

Reviewers were given information about the philosophical foundations of the tests, the skill areas or classifications that were to be reviewed, and other general information about the instruments. Along with the sets of items, reviewers were asked to look for possible racial-ethnic, regional, cultural, or gender biases in the way the item was written or in the information required to answer the question. The reviewers were asked to rate items as "probably fair," "possibly unfair," or "probably unfair," to comment on the balance of the items, and to make recommendations for change. Based on these reviews and the statistical analysis of DIF, items identified by the reviewers as problematic were either revised to eliminate objectionable features or eliminated from consideration for the final forms.

The statistical analyses of items for DIF were based on variants of the Mantel-Haenszel procedure (Dorans & Holland, 1993). The analysis of items in the final editions of Forms A and B was conducted with data from the 2000 national standardization sample. Specific item-level comparisons of performance were made for groups of males and females, Blacks and Whites, and Hispanics and Whites. The sampling approach for DIF analysis, which was developed by Coffman and Hoover, is described in Witt, Ankenmann, and Dunbar (1996). For each subtest area and level, samples of students from comparison groups were matched by school building. Specifically, the building-matched sample for each grade level was formed by including, for each school, all students in whichever group constituted the minority for that school and an equal number of randomly selected majority students from the same school. This method of sampling attempts to control for response differences between focal and reference groups related to the influence of school curriculum and environment.

Table 7.7 Fairness Reviewers Iowa Tests of Basic Skills, Forms A and B Name Position Address Gender Ethnicity A. Yvette Alvarez-Rooney Bilingual Diagnostician Shaw Elementary School Phoenix, AZ Female Hispanic Richard Botteicher Assistant Superintendent Spring Cove School District Roaring Spring, PA Male White Monte E. Dawson Director of Monitoring & Evaluation Alexandria City Public Schools Alexandria, VA Male Black Hoi Doan Middle School Teacher The International Center Chamblee, GA Male Asian Dr. Todd Fletcher Assistant Professor Department of Special Education University of Arizona Tucson, AZ Male White Jaime Garcia Principal Sunnydale Elementary School Streamwood, IL Male Hispanic Alfredo A.
Gavito Director of Research Houston Independent School District Houston, TX Male Hispanic Paul Guevara Director of Teacher Support, Program Compliance and Community Outreach Merced City School District Merced, CA Male Hispanic José Jimenez Director, Bilingual & World Languages Camden City School District Camden, NJ Male Hispanic LaUanah King-Cassell Principal St. James & St. John School Baltimore, MD Female Black Viola LaFontaine, Ph.D. Superintendent Belcourt School District #7 Belcourt, ND Female American Indian Theresa C. Liu, Ph.D. School Psychology Supervisor Milwaukee Public Schools Milwaukee, WI Female Asian Thelma J. Longboat Program Coordinator Black Rock Academy Public School #51 Buffalo, NY Female American Indian Patti Luke Title I Coordinating Teacher Arbor Heights Elementary School Seattle, WA Female Asian Carlos G. Manrique Assistant Superintendent El Monte City School District El Monte, CA Male Hispanic Koko Mikel Grade 4 Teacher Sand Lake Elementary School Anchorage, AK Female Asian Joseph Montecalvo Planning/Test Specialist Fairfax County Public Schools Falls Church, VA Male White Francesca Nguyen Grade 7 Social Studies Teacher Nichols Junior High School Biloxi, MS Female Asian 117 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 118 Table 7.7 (continued) Fairness Reviewers Iowa Tests of Basic Skills, Forms A and B Name Position Address Gender Ethnicity Michael A. O’Hara Assistant Superintendent St. Joseph Public Schools St. Joseph, MI Male White Margherita Patrick Grade 1 Teacher Spring River Elementary School Gautier, MS Female Hispanic Joseph Prewitt-Diaz, Ph.D. School Psychologist Chester Upland School District Chester, PA Male Hispanic Evelyn Reed Director, Test & Research Department Dallas Independent School District Dallas, TX Female Black Rosa Sailes Medill Training Center Chicago Public Schools Chicago, IL Female Hispanic Susan J. Sharp Educational Consultant Midlothian, TX Female White Samuel E. Spaght Associate Superintendent Curriculum Delivery Services Wichita Public Schools Wichita, KS Male Black Harry D. Stratigos, Ed.D. Math Education Advisor Pennsylvania Department of Education Harrisburg, PA Male White John C. Swann, Jr. Supervisor of Testing Dayton Public Schools Dayton, Ohio Male Black Shelby Tallchief Administrator Title IX Federal Indian Education Program Albuquerque Public Schools Albuquerque, NM Male American Indian Lawrence Thompson Guidance Counselor Big Beaver Falls Area Middle School Beaver Falls, PA Male Black Rosanna Tubby-Nickey Choctaw Language Specialist Choctaw Tribal Schools Philadelphia, MS Female American Indian Doris Tyler, Ed.D. Evaluation Specialist Wake County Public Schools Raleigh, NC Female Black Marguerite L. Vellos Dean of Instruction Farragut Career Academy Chicago, IL Female Hispanic Robert C. West, Ph.D. Testing and Evaluation Consultant Macomb Intermediate School District Clinton Township, MI Male White Margaret Winstead Reading Coordinator Moore Public Schools Moore, OK Female American Indian Debra F. Wynn Coordinator of Assessment (retired) Harrison School District Colorado Springs, CO Female Black Youssef Yomtoob, Ph.D. Superintendent Hawthorn School #73 Vernon Hills, IL Male White Liru Zhang, Ph.D. Education Associate Assessments & Accountability Delaware Department of Education Dover, DE Female Asian 118 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 119 Table 7.8 shows a summary of the results of DIF analyses conducted for the items included in the final edition of Form A. 
The main columns of the table indicate the number of items identified as favoring a given group according to the classification scheme used by the Educational Testing Service for NAEP. In this classification scheme, items are first flagged for statistical significance, and then for DIF effect sizes large enough that their impact on total scores should be considered. In the study of items from Form A, statistical significance levels for each subtest were adjusted for the total number of items on the test. DIF magnitudes that were flagged correspond roughly to a conditional group difference of .15 or greater on the proportion correct scale (Category C in the NAEP scheme). A total of 3,759 test items were included in the DIF study that investigated male/female, Black/White, and Hispanic/White comparisons. As can be seen from the last row of the table, the overall percentages of items flagged for DIF in Form A were small and generally balanced across comparison groups. This is the goal of careful attention to content relevance and sensitivity during test development.

Table 7.8 Number of Items Identified in Category C in National DIF Study
Iowa Tests of Basic Skills, Form A, 2000 National Standardization Study

Test | Number of Items | Favor Females / Favor Males | Favor Blacks / Favor Whites | Favor Hispanics / Favor Whites
Vocabulary | 344 | 7 / 15 | 7 / 14 | 14 / 12
Reading | 386 | 12 / 12 | 2 / 4 | 2 / 7
Word Analysis | 138 | 1 / 1 | 0 / 2 | 0 / 3
Listening | 122 | 2 / 0 | 1 / 1 | 5 / 4
Spelling | 262 | 5 / 5 | 8 / 4 | 3 / 2
Capitalization | 174 | 6 / 4 | 1 / 1 | 0 / 1
Punctuation | 174 | 2 / 1 | 2 / 0 | 0 / 1
Usage and Expression | 355 | 3 / 8 | 10 / 9 | 3 / 10
Concepts & Estimation | 305 | 3 / 1 | 3 / 7 | 2 / 6
Problem Solving & Data Interpretation | 284 | 5 / 3 | 1 / 4 | 0 / 5
Computation | 231 | 1 / 1 | 0 / 3 | 0 / 0
Social Studies | 286 | 6 / 10 | 6 / 5 | 4 / 4
Science | 286 | 9 / 11 | 2 / 5 | 2 / 6
Sources of Information | 412 | 6 / 8 | 5 / 8 | 0 / 1
Total | 3759 | 68 / 80 | 48 / 67 | 35 / 62
Percent |  | 1.8 / 2.1 | 1.3 / 1.8 | 0.9 / 1.6

PART 8 Relationships in Test Performance

Correlations Among Test Scores for Individuals
Correlation coefficients among scores on achievement test batteries indicate whether the obtained scores measure something in common. High correlations suggest common sources of variation in the test scores, but they do not reveal the cause. Conversely, lack of correlation may be viewed as evidence the tests are measuring something unique. It should be noted that correlations between obtained scores are attenuated by measurement error in the scores. Correlations among developmental standard scores of the Complete Battery Form A are shown in Table 8.1. These were based on matched samples of students who took Form A of the Iowa Tests of Basic Skills (ITBS) and the Cognitive Abilities Test (CogAT) in the spring 2000 standardization. Correlations are reported for Levels 5 through 14 in kindergarten through grade 8.

Moderate to high correlations among achievement measures are expected in representative samples. They show that students who do well in one content area are more likely to do well in other areas, and students who do poorly in one area are likely to do poorly in other areas. One part of the relation is due to “extraneous” factors, such as vocabulary or reading ability. A greater part, however, is probably due to the emphasis on a curriculum that promotes growth in all achievement areas. If a student’s development lags in one area, instruction is designed and implemented to strengthen that area. Variability in the quality of schooling may be another factor in the consistency of a student’s performance. A student who is helped by the quality of instruction in one subject is likely to be helped in other subjects.
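The attenuation noted above can be quantified with the classical Spearman correction, which estimates the correlation between true scores from the observed correlation and the two test reliabilities. The values in the sketch below are hypothetical, not entries from Table 8.1.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman correction for attenuation: r_true = r_xy / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical values: an observed correlation of .75 between two tests with
# reliabilities of .88 and .90 implies a correlation of about .84 between the
# underlying true scores.
print(round(disattenuate(0.75, 0.88, 0.90), 2))
```

Because observed correlations understate true-score correlations in this way, the tabled values should be read as lower bounds on how strongly the underlying proficiencies are related.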
Structural Relationships Among Content Domains Insight into the nature of relations among tests can sometimes be obtained from factor analysis. The factor structure of Form A of the ITBS was analyzed in two stages. Two random samples of 2,000 students each were drawn from the 2000 spring standardization. The first sample was used to define a common factor model for the battery using Moderate to high correlations among achievement measures are expected in representative samples. They show that students who do well in one content Table 8.1 Correlations Among Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization Level 6 Level 5 Vocabulary Word Analysis Listening Language Mathematics Core Total Reading Words Reading Comprehension Reading Total Reading Profile Total Vocabulary Word Analysis Listening Language Mathematics Core Total Reading Words Reading Comprehension Reading Total Reading Profile Total V WA Li L M CT RW RC RT RPT .56 .65 .60 .65 .67 .71 .60 .71 .67 .74 .88 .73 .77 .89 .87 .49 .76 .53 .59 .68 .66 .46 .71 .50 .56 .65 .62 .81 .49 .78 .54 .60 .70 .67 .95 .95 .78 .89 .78 .77 .80 .89 .86 .83 .89 .51 .61 .63 .58 .89 .85 .56 .59 .62 .65 .83 .72 .69 .76 .83 .70 .87 .76 .84 .74 .91 - - - 121 122 -Does not include Computation +Includes Computation .93 .73 .48 .70 .73 .58 .58 .63 .51 .64 .85 .86 .49 .46 .66 .79 .79 .89 .76 RC Comprehension .78 .54 .71 .77 .63 .62 .68 .53 .68 .91 .92 .55 .52 .69 .85 .85 .95 .95 .93 RT Reading Total .52 .69 .74 .63 .61 .67 .55 .68 .81 .81 .51 .50 .64 .83 .83 .89 .74 .71 .77 WA Word Analysis .39 .57 .62 .60 .65 .42 .64 .65 .63 .63 .59 .56 .77 .76 .69 .58 .54 .60 .52 Li Listening .75 .53 .50 .56 .50 .58 .74 .76 .39 .38 .59 .68 .68 .80 .63 .64 .68 .68 .37 L1 Spelling .67 .66 .72 .57 .73 .91 .92 .56 .54 .69 .85 .85 .83 .71 .71 .75 .73 .56 .74 L Language .72 .91 .61 .90 .81 .79 .61 .60 .65 .81 .80 .71 .61 .60 .64 .62 .58 .52 .65 M1 Concepts .94 .62 .92 .82 .79 .59 .56 .65 .80 .79 .69 .62 .61 .65 .62 .57 .51 .65 .75 M2 Problems & Data Interpretation .67 .98 .88 .85 .65 .62 .70 .86 .85 .75 .80 .65 .68 .44 .41 .55 .62 .64 .59 .47 .49 .51 .51 .40 .48 .55 .62 .64 .67 M3 MT - .66 .64 .69 .66 .62 .55 .70 .92 .95 Computation Math Total Mathematics .88 .87 .64 .61 .71 .86 .86 .76 .65 .64 .69 .67 .59 .57 .70 .90 .92 .97 .82 MT + Math Total .99 .65 .62 .77 .95 .94 .94 .86 .85 .92 .80 .66 .73 .90 .82 .83 .88 .64 .87 CT - Core Total .64 .60 .77 .94 .94 .95 .87 .86 .92 .81 .65 .74 .91 .80 .81 .86 .68 .87 .99 CT + Core Total .63 .58 .78 .78 .61 .58 .52 .59 .49 .58 .35 .51 .55 .55 .59 .38 .57 .62 .62 SS Social Studies .56 .76 .76 .59 .57 .49 .57 .48 .60 .31 .49 .56 .55 .60 .36 .57 .61 .60 .61 SC Science .82 .82 .74 .69 .68 .73 .66 .58 .57 .70 .67 .69 .73 .54 .72 .80 .79 .58 .58 SI .99 .93 .84 .81 .88 .83 .76 .65 .83 .79 .80 .85 .60 .83 .95 .94 .75 .76 .85 CC - Sources Compoof site Information .93 .84 .81 .88 .83 .76 .66 .83 .79 .79 .84 .61 .83 .95 .94 .75 .75 .85 .99 CC + Composite .90 .88 .95 .89 .71 .78 .82 .70 .70 .75 .56 .75 .94 .94 .60 .59 .76 .94 .94 RPT Reading Profile Total 3:15 PM Note: .78 .96 .75 .54 .65 .72 .61 .60 .65 .51 .66 .87 .88 .55 .52 .65 .82 .82 .90 RV Vocabulary Reading 10/29/10 Vocabulary Comprehension Reading Total Word Analysis Listening Spelling Language Concepts Problem Solving & Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Sources of Information Composite without 
Computation Composite with Computation Reading Profile Total Level 7 Level 8 Table 8.1 (continued) Correlations Among Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 122 -Does not include Computation +Includes Computation .62 .62 .61 .79 .77 .72 .75 .78 .49 .76 .92 .92 .80 .79 .72 .77 .80 .92 .91 .73 .59 .93 .93 .96 .78 .95 .58 .60 .60 .76 .74 .69 .73 .76 .48 .73 .89 .88 .76 .77 .69 .75 .78 .88 .87 .69 .55 .88 RT Reading Total RC Comprehension .63 .60 .62 .81 .56 .57 .60 .50 .62 .73 .74 .56 .53 .52 .60 .60 .68 .68 .60 .35 .74 .60 .61 .65 L1 Spelling .71 .65 .88 .60 .61 .64 .56 .67 .78 .79 .58 .58 .57 .62 .64 .73 .73 .56 .42 .66 .56 .62 .63 .64 L2 Capitalization .65 .87 .61 .62 .65 .54 .67 .77 .78 .57 .59 .58 .62 .65 .73 .73 .59 .44 .68 .59 .65 .66 .67 .74 L3 .86 .68 .72 .74 .51 .73 .87 .86 .72 .72 .67 .73 .75 .85 .84 .69 .52 .81 .72 .77 .79 .65 .66 .70 L4 .72 .73 .77 .61 .79 .92 .93 .71 .71 .69 .75 .77 .88 .87 .72 .51 .85 .71 .76 .78 .82 .89 .90 .87 LT Punctu- Usage & Language ation Expression Total Language .77 .93 .61 .91 .85 .84 .69 .70 .69 .69 .75 .83 .82 .62 .52 .73 .67 .71 .73 .59 .63 .66 .70 .74 M1 .95 .60 .92 .88 .86 .72 .74 .72 .73 .78 .86 .85 .64 .53 .77 .67 .74 .75 .58 .62 .65 .71 .74 .80 M2 Concepts Problems & Data & InterpreEstimation tation .64 .97 .92 .90 .75 .76 .75 .75 .81 .90 .89 .67 .56 .80 .71 .76 .78 .62 .66 .69 .74 .78 .94 .96 MT - Math Total .81 .63 .68 .48 .50 .53 .54 .57 .60 .63 .46 .36 .54 .44 .50 .50 .51 .56 .57 .52 .62 .62 .60 .64 M3 Computation Mathematics .90 .91 .73 .74 .74 .75 .80 .88 .88 .60 .48 .70 .68 .74 .76 .63 .68 .71 .73 .79 .91 .92 .97 .81 MT + Math Total .99 .81 .82 .78 .82 .86 .97 .97 .77 .60 .93 .84 .89 .92 .76 .79 .82 .87 .93 .87 .88 .92 .64 .91 CT - Core Total .81 .81 .77 .82 .86 .97 .97 .77 .60 .93 .84 .89 .92 .77 .80 .83 .87 .94 .85 .86 .90 .69 .91 .99 CT + Core Total .77 .70 .74 .77 .89 .88 .66 .54 .79 .75 .76 .80 .56 .57 .59 .71 .70 .70 .71 .75 .47 .72 .80 .80 SS Social Studies .72 .75 .79 .90 .90 .66 .54 .79 .73 .78 .80 .55 .59 .62 .73 .71 .71 .74 .76 .50 .74 .82 .81 .78 SC Science .71 .94 .83 .87 .60 .51 .72 .66 .72 .74 .54 .59 .62 .70 .71 .71 .73 .76 .52 .75 .79 .79 .72 .74 S1 .91 .86 .84 .65 .50 .78 .68 .74 .75 .62 .63 .66 .72 .75 .69 .71 .74 .54 .74 .81 .81 .71 .73 .73 S2 .91 .92 .68 .54 .81 .72 .79 .80 .62 .65 .69 .76 .78 .76 .78 .81 .57 .80 .86 .86 .77 .79 .94 .91 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .99 .77 .62 .92 .84 .89 .92 .71 .74 .77 .85 .88 .84 .86 .90 .61 .88 .97 .97 .88 .90 .85 .85 .92 CC - .76 .62 .91 .83 .88 .91 .70 .74 .77 .85 .88 .84 .85 .89 .63 .88 .97 .96 .88 .90 .88 .83 .92 .99 CC + Compo- Composite site .59 .89 — — — — — — — — — — — — — — — — — — — — — — WA Word Analysis .73 — — — — — — — — — — — — — — — — — — — — — — — Li Listening — — — — — — — — — — — — — — — — — — — — — — — — RPT Reading Profile Total 3:15 PM Note: .78 .93 .59 .56 .55 .72 .71 .66 .69 .72 .45 .69 .85 .85 .74 .73 .65 .70 .73 .84 .84 .69 .55 .87 RV Vocabulary Reading 10/29/10 Vocabulary Comprehension Reading Total Spelling Capitalization Punctuation Usage and Expression Language Total Concepts and Estimation Problems & Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Maps and Diagrams Reference Materials Sources Total Composite without Computation Composite with Computation Word Analysis 
Listening Reading Profile Total Level 9 Level 10 Table 8.1 (continued) Correlations Among Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 123 123 124 -Does not include Computation +Includes Computation .66 .66 .66 .78 .78 .73 .73 .77 .49 .74 .91 .91 .81 .82 .72 .76 .79 .91 .91 .93 .96 .79 .96 .63 .64 .65 .77 .76 .71 .72 .75 .49 .73 .89 .88 .78 .80 .71 .75 .78 .89 .88 RT Reading Total RC Comprehension .68 .70 .67 .85 .62 .61 .64 .53 .66 .78 .79 .57 .58 .57 .65 .65 .73 .72 .59 .62 .64 L1 Spelling .76 .69 .89 .66 .65 .69 .57 .70 .82 .83 .60 .63 .62 .66 .69 .77 .77 .59 .66 .67 .67 L2 Capitalization .71 .90 .67 .66 .70 .59 .72 .83 .84 .60 .63 .63 .68 .70 .77 .77 .58 .66 .66 .69 .75 L3 .88 .70 .72 .75 .53 .74 .87 .87 .70 .74 .69 .73 .76 .85 .84 .71 .78 .79 .66 .68 .72 L4 .76 .75 .79 .63 .80 .94 .95 .70 .74 .71 .77 .80 .89 .88 .70 .78 .79 .84 .89 .91 .88 LT .82 .94 .64 .92 .87 .86 .70 .74 .75 .71 .78 .85 .85 .65 .72 .73 .59 .64 .66 .71 .74 M1 .96 .61 .93 .88 .86 .71 .76 .76 .73 .80 .87 .86 .65 .73 .73 .56 .62 .64 .71 .73 .81 M2 .65 .97 .92 .90 .74 .78 .79 .76 .83 .90 .90 .68 .76 .77 .60 .66 .68 .75 .77 .94 .96 MT - Math Total .82 .64 .69 .47 .51 .53 .55 .58 .61 .64 .43 .53 .51 .52 .57 .59 .55 .64 .67 .62 .67 M3 Computation Mathematics Concepts Problems Punctu- Usage & Language & Data & ation Expression Total InterpreEstimation tation Language .90 .91 .71 .76 .77 .75 .82 .88 .89 .65 .74 .74 .62 .68 .70 .74 .78 .92 .92 .97 .84 MT + Math Total .99 .81 .84 .80 .83 .87 .97 .97 .83 .90 .92 .76 .81 .82 .88 .93 .87 .87 .91 .66 .90 CT - Core Total .80 .83 .79 .83 .87 .97 .97 .82 .89 .91 .77 .82 .83 .88 .94 .86 .85 .90 .71 .91 .99 CT + Core Total .80 .71 .73 .78 .89 .89 .75 .79 .81 .56 .62 .60 .73 .72 .70 .71 .74 .50 .72 .82 .81 SS Social Studies .74 .76 .81 .91 .91 .72 .80 .81 .56 .62 .63 .76 .74 .73 .75 .78 .54 .76 .84 .83 .80 SC Science .73 .94 .85 .88 .63 .70 .71 .53 .60 .61 .68 .69 .75 .75 .78 .55 .77 .79 .78 .71 .73 S1 .92 .87 .84 .68 .76 .77 .62 .66 .67 .75 .77 .72 .72 .76 .56 .75 .83 .83 .76 .78 .73 S2 .92 .93 .70 .79 .79 .62 .68 .69 .76 .78 .79 .79 .83 .60 .82 .87 .86 .79 .81 .94 .92 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .99 .82 .90 .91 .70 .76 .77 .86 .88 .85 .86 .90 .64 .88 .97 .97 .90 .91 .84 .88 .92 CC - .81 .89 .90 .70 .76 .77 .85 .88 .85 .85 .90 .66 .89 .97 .97 .89 .91 .87 .85 .93 .99 CC + Compo- Composite site 3:15 PM Note: .78 .93 .61 .59 .59 .70 .71 .67 .66 .70 .42 .66 .83 .83 .76 .74 .64 .68 .71 .83 .83 RV Vocabulary Reading 10/29/10 Vocabulary Comprehension Reading Total Spelling Capitalization Punctuation Usage and Expression Language Total Concepts and Estimation Problem Solving and Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Maps and Diagrams Reference Materials Sources Total Composite without Computation Composite with Computation Level 11 Level 12 Table 8.1 (continued) Correlations Among Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 124 -Does not include Computation +Includes Computation .66 .70 .69 .81 .80 .74 .75 .78 .49 .75 .92 .92 .82 .80 .72 .76 .80 .91 .91 .93 .96 .80 .96 .64 .70 .70 .80 .80 .73 .76 .78 .51 .75 .91 .90 .81 .80 .72 .76 .80 .90 .90 RT Reading Total RC Comprehension .69 .70 .69 .84 .59 .56 .60 .49 .62 
.77 .78 .58 .56 .53 .62 .62 .71 .70 .62 .64 .67 L1 Spelling .78 .73 .90 .67 .65 .69 .54 .70 .83 .84 .67 .64 .63 .68 .70 .79 .79 .63 .70 .71 .70 L2 Capitalization .76 .92 .67 .66 .70 .55 .71 .84 .85 .65 .64 .63 .69 .70 .79 .79 .61 .69 .69 .71 .77 L3 .90 .72 .73 .76 .51 .74 .89 .89 .75 .74 .69 .75 .77 .87 .86 .70 .77 .78 .70 .74 .78 L4 .75 .74 .78 .59 .78 .94 .94 .75 .73 .70 .77 .79 .89 .89 .72 .79 .80 .85 .90 .92 .91 LT Punctu- Usage & Language ation Expression Total Language .82 .94 .63 .92 .87 .86 .71 .71 .73 .71 .77 .85 .85 .66 .72 .73 .61 .67 .67 .70 .74 M1 .96 .59 .92 .88 .86 .75 .75 .74 .73 .79 .87 .86 .64 .74 .73 .58 .64 .66 .72 .73 .81 M2 Concepts Problems & Data & InterpreEstimation tation .64 .96 .92 .90 .77 .77 .77 .75 .82 .90 .90 .68 .77 .77 .62 .69 .70 .75 .77 .94 .96 MT - Math Total .83 .62 .68 .50 .50 .50 .53 .55 .60 .63 .41 .47 .47 .47 .51 .53 .51 .57 .61 .55 .61 M3 Computation Mathematics .89 .90 .74 .74 .74 .74 .80 .87 .88 .65 .73 .74 .63 .69 .70 .73 .77 .91 .91 .95 .82 MT + Math Total .99 .83 .83 .79 .82 .86 .97 .97 .83 .90 .92 .78 .84 .84 .88 .94 .87 .87 .91 .60 .89 CT - Core Total .83 .82 .78 .82 .86 .97 .97 .83 .89 .91 .79 .84 .85 .89 .94 .86 .85 .90 .66 .90 .99 CT + Core Total .81 .73 .76 .80 .91 .90 .72 .79 .80 .57 .64 .63 .72 .72 .69 .73 .75 .46 .72 .81 .81 SS Social Studies .74 .76 .81 .91 .90 .71 .79 .79 .57 .65 .65 .74 .73 .71 .75 .77 .47 .74 .83 .82 .80 SC Science .73 .94 .85 .87 .62 .70 .70 .52 .61 .61 .67 .68 .72 .74 .77 .46 .73 .77 .76 .72 .76 S1 .92 .87 .84 .67 .75 .75 .63 .68 .70 .76 .78 .70 .73 .75 .51 .74 .82 .82 .75 .77 .74 S2 .92 .92 .69 .78 .78 .61 .69 .70 .76 .78 .76 .79 .82 .52 .79 .85 .85 .79 .82 .94 .92 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .99 .81 .89 .91 .71 .79 .79 .86 .89 .84 .86 .90 .57 .87 .97 .96 .90 .91 .84 .87 .92 CC - .81 .89 .90 .71 .78 .79 .85 .88 .84 .86 .89 .60 .87 .96 .96 .89 .91 .87 .85 .92 .99 CC + Compo- Composite site 3:15 PM Note: .79 .93 .62 .61 .60 .72 .71 .66 .65 .69 .41 .65 .83 .82 .73 .71 .64 .67 .70 .82 .81 RV Vocabulary Reading 10/29/10 Vocabulary Comprehension Reading Total Spelling Capitalization Punctuation Usage and Expression Language Total Concepts and Estimation Problem Solving and Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Maps and Diagrams Reference Materials Sources Total Composite without Computation Composite with Computation Level 13 Level 14 Table 8.1 (continued) Correlations Among Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 125 125 961464_ITBS_GuidetoRD.qxp 10/29/10 3:15 PM Page 126 exploratory factor analysis techniques. Correlations among developmental standard scores were used with least-squares estimates of communality. In grades 3 through 8, the factor solutions were based on correlations among the thirteen tests in Levels 9 through 14. In grades 1 and 2, solutions were based on the eleven tests in Levels 7 and 8. In kindergarten and grade 1, solutions were based on the five tests in Level 5 and the six tests in Level 6. After the least-squares factor solutions were obtained, both orthogonal and oblique simple structure transformations were performed. Three factors were retained for Levels 7 through 14; two were retained for Levels 5 and 6. Information. 
In the second sample for each test level, a restricted factor analysis model based on the final solution in the first sample was used for cross-validation. Because the cross-validation results were similar to the initial results, only the former are described in this section.

Levels 9 through 14
At these levels, tests that define major subject areas in the elementary school curriculum—Vocabulary, Reading Comprehension, Language, and Mathematics—determined the three factors. Tests in Social Studies, Science, and Sources of Information were less consistent in their factor composition. The three factors at Levels 9 through 14 were characterized as follows.

Factor I, “Verbal Reasoning or Comprehension”
The Vocabulary and Reading tests had the highest loadings on this factor in all six grades and provided the clearest interpretation of the factor. Several other tests had substantial loadings on this factor: Usage and Expression, Maps and Diagrams, Social Studies, and Science. The influence of the first two of these tests on Factor I decreased from Level 9 to Level 14. It was still large enough to suggest the importance of verbal skills for these tests but with less weight in the upper grades. The loadings of the Social Studies and Science tests on this factor probably reflect the multidisciplinary nature of those subjects and the influence of verbal comprehension in the elementary school curriculum.

Factor II, “Mathematical Reasoning”
The three Math tests formed this factor. Math Concepts and Estimation had the highest loadings at all levels. Problem Solving and Data Interpretation and Computation contributed to this factor, but at slightly varying degrees depending on level. Small but appreciable influences on this factor were noted in Social Studies, Science, and Sources of Information. These were again attributed to the multidisciplinary nature of those content domains in the school curriculum.

Factor III, “Aspects of Written Language”
In all grades, tests loading on this factor were Spelling, Capitalization, and Punctuation. Loadings of the Spelling test generally increased across the grades. Loadings of the Usage and Expression test also increased from lower to upper elementary grades. By sixth grade, Usage and Expression was clearly associated with the language factor, although its loadings in the upper grades were smaller than those of other language tests.

Levels 7 and 8
These levels have a subtest structure similar to that of Levels 9 through 14 except in language arts. The three factors defined at these levels reveal contrasts between the tests in Levels 7 and 8 and those in Levels 9 through 14. The first two factors were similar to the ones described above. The Language and Word Analysis tests helped define the first factor; the three Math tests defined the second. The third factor related to the tests that require interpreting pictures while listening to a teacher (Listening, Social Studies, and Science).

Levels 5 and 6
Composition of tests and the integrated curriculum in the early elementary grades influence correlations among tests at these levels. Factor analysis reflected these conditions. In Levels 5 and 6, where a smaller battery is given, two factors were defined: a verbal comprehension factor and a factor related to tests of skills developed through direct instruction in kindergarten and grade 1. The Vocabulary, Listening, Language, and Math tests defined the first factor in both levels.
The second factor was influenced by the Word Analysis test in Level 5 and by the Word Analysis, Reading, and Math tests in Level 6.

Interpretation of Factors
Whether the factors defined above result from general cognitive abilities, specific skills required in different tests, item types, qualitative differences among tests, or school curriculum is unknown. The correlations among factors were substantial. In the eight grades that take Levels 7 through 14, the median correlation between factors was .65; 18 of the 24 values were between .60 and .69. This indicates that a general factor accounts for most of the variability. These results do not imply that score differences between tests are unreliable, however.

A study of grade 5 national standardization results for Forms G and H (Martin & Dunbar, 1985) clarified the internal structure of equivalent forms of the ITBS. One purpose of the study was to investigate the presence of group factors after controlling for a general factor. The analysis was based on 48 composite variables derived from homogeneous sets of items from each test. The group factors were identified as (1) verbal comprehension, (2) language mechanics, (3) solving problems that use quantitative concepts and visual materials, and (4) computation. Application of extension analysis to subtests of the CogAT confirmed interpretations of the ITBS group factors. The CogAT Verbal subtests loaded on the verbal comprehension factor. The Quantitative subtests had the highest loadings on the visual materials and math computation factors. The CogAT Nonverbal tests, which include geometric figures and patterns, had the highest loadings on the third factor. This supports the visual materials interpretation.

A study of relations between the ITBS and the Iowa Tests of Educational Development (ITED) was part of the initial investigation of the joint scaling of these batteries (Becker & Dunbar, 1990). An interbattery factor analysis was conducted to examine the relations of tests in the ITBS and the ITED. Results suggested that factors related to verbal comprehension, quantitative concepts and visual materials, and language skills were stable across batteries. This study also replicated research on the multidisciplinary nature of content in Social Studies, Science, and Sources of Information. Support for the joint scaling of the ITBS and ITED was established.

Reliabilities of Differences in Test Performance
For any test battery, the reliability of differences in an individual's performance across test areas is of interest. The meaningfulness of strengths and weaknesses in a profile of scores depends on the reliability of these differences. The interpretation of score differences across tests is discussed in the ITBS Interpretive Guide for Teachers and Counselors. Reliabilities of differences among major test areas appear in Table 8.2. Computational procedures for these coefficients appear in Thorndike (1963, pp. 117–120). The correlations among tests were reported earlier in this section. The K-R20 reliability coefficients in Table 5.1 were used to compute the reliabilities of differences. Reliabilities of differences between scores on individual tests are reported in Table 8.3. These were also based on the reliability coefficients and correlations reported earlier. Despite the relatively high correlations among tests, reliabilities of differences between scores are substantial (nearly 90 percent above .50).
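For two scores reported on the same scale with comparable SDs, the Thorndike-style computation referenced above reduces to a simple expression in the two reliabilities and their intercorrelation. The sketch below uses hypothetical values, not coefficients from Table 5.1 or Table 8.1.

```python
def difference_reliability(rel_x, rel_y, r_xy):
    """Reliability of a difference between two equally scaled scores:
    ((rel_x + rel_y) / 2 - r_xy) / (1 - r_xy)."""
    return ((rel_x + rel_y) / 2.0 - r_xy) / (1.0 - r_xy)

# Hypothetical values: two tests with reliabilities of .90 and .88 that
# correlate .75 yield a difference reliability of about .56.
print(round(difference_reliability(0.90, 0.88, 0.75), 2))
```

With reliabilities near .90 and intertest correlations in the .70s, difference reliabilities in the .5 to .6 range are typical, which is in line with the proportion of values above .50 reported for the battery.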
These results support the use of ITBS subtests to identify strengths and weaknesses. Correlations Among Building Averages Correlations among building averages for kindergarten through grade 8 are shown in Table 8.4. These correlations are higher than those for student scores, reflecting the consistent performance of groups of students across tests. The exception is for Math Computation, particularly in the lower grades where correlations with other Math tests are relatively low. Relations Between Achievement and General Cognitive Ability The ITBS and the CogAT were standardized on the same population and were given under the same conditions at about the same time. This enables comparisons of achievement and ability under nearly ideal conditions— a nationwide sample selected to be representative on factors related to ability and achievement. Only students who took all tests in the ITBS Complete Battery and all three CogAT test batteries were included. The sample sizes in each grade were: Grade K 1 1 2 3 4 5 6 7 8 Level 5 6 7 8 9 10 11 12 13 14 N 6,111 7,128 6,800 14,870 14,978 15,935 16,517 14,972 11,896 10,294 127 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 128 Table 8.2 Reliabilities of Differences Among Scores for Major Test Areas: Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization Level 8 Grade 2 Level 12 Grade 6 Reading Total Language Total Math Total Sources Total RT LT MT + ST .62 .78 .61 Reading Total .66 .55 Language Total .78 .59 Mathematics Total .79 .77 Sources Total .66 .70 .63 Reading Total Language Total Math Total Sources Total RT LT MT + ST .78 .81 .69 .80 .72 Level 7 Grade 1 Reading Total Level 11 Grade 5 Reading Total Language Total Math Total Sources Total RT LT MT + ST .77 .78 .66 .78 .70 Language Total .56 Mathematics Total .74 .60 Sources Total .62 .53 .57 Reading Total Language Total Math Total Sources Total RT LT MT + ST .77 .76 .64 Reading Total .75 .70 Language Total .77 .63 Mathematics Total .79 .79 Sources Total .65 .70 Level 10 Grade 4 Level 9 Grade 3 Reading Total Language Total .79 Mathematics Total .75 .75 Sources Total .62 .69 Note: Level 14 Grade 8 Level 13 Grade 7 .58 .61 .68 .64 +Includes Computation Table 8.3 Reliabilities of Differences Among Tests: Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization Level 5 Grade K Level 6 Grade 1 Vocabulary Word Analysis Listening Language Mathematics Reading Words Reading Comprehension Reading Total V WA Li L M RW RC RT .46 .26 .31 .43 .62 .65 .67 .44 .38 .34 .34 .45 .41 .21 .36 .62 .65 .67 .26 .60 .63 .65 .54 .59 .60 Vocabulary 128 Word Analysis .50 Listening .35 .54 Language .31 .51 .24 Mathematics .40 .47 .34 .30 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 129 Table 8.3 (continued) Reliabilities of Differences Among Tests: Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization Level 8 Grade 2 Level 7 Grade 1 Reading Mathematics Word Analysis Listening Spelling Language Problems & Data Computation Interpretation Social Studies Science Sources of Information Vocabulary Comprehension RV RC WA Li L1 L M1 M2 M3 SS SC SI .52 .46 .52 .59 .57 .56 .62 .73 .44 .54 .56 .56 .59 .61 .61 .61 .66 .74 .53 .63 .60 .55 .48 .48 .52 .58 .68 .51 .59 .55 .64 .54 .42 .48 .64 .26 .32 .49 .42 .59 .65 .68 .60 .67 .62 .51 .58 .68 .52 .61 .55 .23 .50 .38 .44 .45 .55 .43 .51 .50 .59 .66 .66 .21 .41 Vocabulary Concepts Comprehension .53 Word Analysis .49 
.56 Listening .55 .63 .53 Spelling .66 .65 .58 .65 Language .56 .59 .47 .50 .50 Concepts .56 .63 .50 .31 .63 .47 Problems & Data Interpretation .62 .66 .56 .39 .69 .52 .25 Computation .75 .77 .69 .62 .74 .69 .54 .57 Social Studies .60 .66 .59 .26 .70 .57 .39 .46 .66 Science .59 .66 .58 .30 .68 .56 .38 .47 .65 .29 Sources of Information .61 .64 .58 .48 .66 .53 .46 .49 .68 .51 Level 10 Grade 4 Level 9 Grade 3 Reading Vocabulary Comprehension RV Vocabulary Language .50 .51 Sources of Information Mathematics Social Studies Science Word Analysis Listening S2 WA Li .63 .65 — — .50 .53 .56 — — .72 .75 .71 .68 — — .68 .64 .66 .62 .61 — — .57 .69 .64 .65 .60 .59 — — .56 .77 .59 .60 .58 .58 — — .30 .67 .55 .57 .50 .58 — — .66 .48 .48 .41 .50 — — .75 .76 .71 .73 — — .41 .47 .53 — — .47 .53 — — .47 — — — — Concepts Problems Computa& Data & tion InterpreEstimation tation Maps Reference and Materials Diagrams Spelling Capitalization Punctuation Usage & Expression RC L1 L2 L3 L4 M1 M2 M3 SS SC S1 .56 .74 .71 .70 .66 .66 .63 .81 .53 .61 .73 .66 .65 .57 .60 .52 .78 .49 .63 .61 .71 .71 .69 .77 .40 .62 .61 .59 .59 .60 .62 Comprehension .57 Spelling .75 .77 Capitalization .73 .72 .68 Punctuation .72 .70 .68 .51 Usage and Expression .62 .59 .72 .65 .63 Concepts and Estimation .59 .58 .69 .61 .59 .56 Problems & Data Interpretation .60 .57 .71 .64 .62 .56 .30 Computation .78 .78 .76 .69 .69 .75 .60 .65 Social Studies .50 .50 .71 .66 .64 .54 .48 .48 .72 Science .55 .50 .73 .66 .64 .54 .47 .45 .72 .35 Maps and Diagrams .57 .54 .69 .61 .59 .54 .40 .40 .65 .42 .39 Reference Materials .61 .56 .70 .64 .63 .56 .51 .49 .71 .46 .44 .43 Word Analysis .59 .62 .68 .67 .64 .58 .57 .59 .73 .55 .56 .56 .59 Listening .59 .61 .72 .66 .63 .61 .55 .57 .69 .54 .55 .54 .61 — .50 129 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 130 Table 8.3 (continued) Reliabilities of Differences Among Tests: Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization Level 12 Grade 6 Level 11 Grade 5 Reading Language Social Studies Science M3 SS SC Concepts Problems Computa& Data & tion InterpreEstimation tation Vocabulary Comprehension Spelling Capitalization Punctuation Usage & Expression RV RC L1 L2 L3 L4 M1 .52 .75 .65 .73 .65 .70 .65 .78 .52 .75 .61 .69 .56 .65 .57 .75 .58 .65 .71 .76 .73 .42 .57 .62 .62 Vocabulary Sources of Information Mathematics M2 S1 S2 .62 .63 .63 .46 .51 .56 .54 .75 .73 .77 .71 .70 .58 .63 .59 .63 .56 .56 .68 .64 .69 .68 .70 .63 .63 .66 .59 .73 .58 .59 .59 .56 .36 .63 .60 .62 .47 .59 .63 .52 .52 .38 .52 .72 .74 .65 .69 .41 .47 .45 .50 .48 Reading Comprehension .53 Spelling .74 .74 Capitalization .69 .66 .61 Punctuation .71 .68 .62 .43 Usage and Expression .64 .56 .69 .59 .59 Concepts and Estimation .65 .62 .70 .59 .61 .61 Problems & Data Interpretation .64 .57 .70 .59 .60 .56 .28 Computation .81 .80 .78 .70 .71 .77 .67 .68 Social Studies .50 .48 .73 .65 .67 .60 .57 .52 .77 Science .58 .49 .75 .65 .68 .58 .56 .49 .78 .40 Maps and Diagrams .62 .55 .70 .58 .60 .56 .43 .36 .70 .47 .47 Reference Materials .65 .57 .70 .61 .61 .58 .59 .52 .75 .53 .52 Level 14 Grade 8 Level 13 Grade 7 Reading Language .47 Sources of Information Mathematics Concepts Problems Computa& Data & tion InterpreEstimation tation .45 Social Studies Science Maps Reference and Materials Diagrams Vocabulary Comprehension Spelling Capitalization Punctuation Usage & Expression RV RC L1 L2 L3 L4 M1 M2 M3 SS SC S1 S2 .55 .73 .64 .69 .64 .70 .68 .79 .60 .64 .65 .64 .76 .62 .67 .61 .69 .63 .80 .55 .59 .60 
.60 .58 .61 .66 .75 .74 .78 .75 .77 .73 .70 .38 .50 .62 .61 .70 .62 .63 .59 .55 .48 .66 .64 .72 .67 .67 .63 .57 .66 .60 .76 .60 .61 .59 .53 .44 .70 .66 .65 .54 .62 .71 .57 .55 .46 .54 .77 .77 .72 .73 .45 .49 .51 .46 .50 Vocabulary Reading Comprehension .54 Spelling .73 .77 Capitalization .64 .60 .59 Punctuation .71 .67 .65 .35 Usage and Expression .63 .57 .70 .53 .55 Concepts and Estimation .69 .67 .77 .60 .66 .66 Problems & Data Interpretation .65 .57 .75 .58 .63 .59 .36 Computation .77 .77 .76 .66 .70 .75 .66 .66 Social Studies .56 .48 .75 .57 .66 .56 .62 .50 .72 Science .63 .56 .79 .64 .69 .62 .65 .54 .75 .42 Maps and Diagrams .59 .53 .71 .53 .59 .56 .48 .39 .66 .43 .45 Reference Materials .65 .59 .73 .56 .62 .58 .64 .55 .72 .50 .55 130 Maps Reference and Materials Diagrams .45 .45 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 131 Correlations between CogAT and ITBS scores appear in Table 8.5. Patterns of correlations tend to agree with common sense. In Levels 9 through 14, average correlations between the ITBS Complete Composite and CogAT are .86, .78, and .73 for the Verbal, Quantitative, and Nonverbal batteries, respectively. Correlations between CogAT Quantitative and ITBS Math scores are substantially higher than others in the table. Comparisons of ability and achievement tests are meaningful only if the tests measure unique characteristics. To some extent, this can be determined subjectively by examining test content. Revisions of the Verbal and Quantitative Batteries of the CogAT have reduced overlap with ITBS subtests. ITBS tests in Vocabulary, Reading, and Language overlap somewhat with the Verbal Battery of the CogAT. Each battery requires skill in vocabulary, reading, and verbal reasoning. Similar overlap exists between the ITBS Math tests and the CogAT Quantitative Battery. In contrast, the CogAT Nonverbal Battery measures cognitive skills distinct from any ITBS test. The unique skills measured by the Nonverbal Battery (particularly in abstract reasoning) provide the rationale for using Nonverbal scores to set expectations for ITBS performance. Predicting Achievement from General Cognitive Ability: Individual Scores The combined scoring system for the ITBS and the CogAT is described in the Interpretive Guide for School Administrators. The prediction equations in this system are based on matched samples of students who took the Complete Battery of the ITBS and the three test batteries of the CogAT in the spring 2000 standardization. Predicted ITBS scores are based on nonlinear regression equations that use one of the four CogAT scores (Verbal, Quantitative, Nonverbal, or Composite). 
For example, the equation for predicting Table 8.4 Correlations Among School Average Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization Grade 1 (Level 6/7) Kindergarten (Level 5/6) Reading Word Analysis Listening Language Math Total Core Total Reading Profile Total Li L MT - CT - RPT Vocabulary Reading Words Comprehension Reading Total RV RW RC RT WA .51 .71 .81 .74 .73 .78 .74 .91 .88 .84 .93 .79 .65 .65 .65 .64 .85 .95 .83 .65 .77 .75 .81 .90 .85 .68 .77 .78 .84 .94 .74 .80 .78 .82 .93 .80 .82 .82 .83 .85 .94 .88 .93 .86 Vocabulary Reading Words .48 Reading Comprehension .34 .70 Reading Total .45 .91 .93 Word Analysis .64 .70 .58 .71 Listening .84 .46 .36 .47 .60 Language .84 .48 .35 .48 .61 .93 Math Total .81 .59 .43 .56 .73 .85 .87 Core Total .95 .55 .39 .52 .69 .92 .95 .93 Reading Profile Total .90 .80 .71 .82 .86 .87 .84 .87 Note: -Does not include Computation .94 .92 131 .85 .68 .82 .77 .83 .78 .78 .71 .84 .84 .95 .78 .73 .80 .91 .90 .94 .97 .96 RT Reading Total .74 .79 .80 .80 .77 .78 .70 .80 .82 .89 .76 .71 .79 .91 .92 .93 .86 .85 .88 WA Word Analysis .59 .80 .81 .78 .82 .53 .78 .82 .76 .83 .82 .77 .88 .88 .83 .79 .76 .80 .75 Li Listening .83 .72 .68 .72 .66 .74 .84 .84 .67 .60 .75 .79 .79 .87 .78 .81 .82 .81 .61 L1 Spelling .84 .82 .85 .72 .86 .94 .95 .81 .73 .87 .91 .92 .88 .83 .86 .87 .83 .77 .82 L Language .88 .96 .73 .95 .92 .90 .79 .78 .80 .92 .91 .87 .79 .78 .81 .80 .74 .71 .80 M1 Concepts .97 .76 .96 .90 .88 .80 .79 .85 .90 .89 .83 .77 .78 .80 .79 .79 .70 .82 .89 M2 Problems & Data Interpretation .77 .99 .93 .92 .82 .81 .85 .94 .93 .86 .81 .81 .83 .82 .81 .73 .85 .97 .97 MT - Math Total .86 .77 .80 .60 .55 .72 .72 .73 .73 .54 .61 .59 .64 .47 .58 .60 .70 .67 .70 M3 Computation Mathematics .94 .93 .83 .81 .88 .93 .92 .88 .78 .81 .82 .83 .75 .74 .83 .95 .94 .97 .84 MT + Math Total .99 .86 .80 .89 .97 .96 .94 .92 .93 .95 .89 .84 .83 .95 .91 .91 .94 .66 .92 CT - Core Total .84 .78 .88 .96 .96 .97 .92 .93 .96 .89 .82 .84 .95 .90 .90 .93 .70 .92 .99 CT + Core Total .87 .84 .92 .92 .85 .76 .73 .77 .72 .75 .58 .71 .74 .69 .74 .46 .70 .78 .77 SS Social Studies .82 .89 .89 .79 .78 .70 .77 .69 .83 .52 .70 .75 .74 .77 .41 .71 .79 .77 .79 SC Science .92 .92 .87 .82 .83 .85 .80 .79 .70 .83 .83 .83 .86 .55 .82 .89 .88 .78 .80 SI .99 .96 .92 .90 .94 .91 .89 .77 .90 .90 .89 .92 .62 .89 .97 .96 .86 .87 .92 CC - Sources Compoof site Information .96 .92 .90 .94 .91 .89 .78 .90 .90 .89 .92 .63 .89 .97 .97 .86 .87 .92 .99 CC + Composite .95 .94 .98 .94 .86 .86 .90 .84 .84 .88 .62 .86 .97 .97 .78 .78 .86 .97 .97 RPT Reading Profile Total 3:16 PM -Does not include Computation +Includes Computation .95 .83 .65 .85 .77 .80 .75 .75 .72 .81 .81 .93 .74 .66 .80 .87 .87 .90 .88 RC Comprehension Reading 10/29/10 Note: .71 .81 .74 .73 .75 .78 .79 .76 .74 .67 .80 .91 .91 .77 .73 .75 .88 .87 .88 Vocabulary Comprehension Reading Total Word Analysis Listening Spelling Language Concepts Problem Solving & Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Sources of Information Composite without Computation Composite with Computation Reading Profile Total Vocabulary RV 132 Level 7 — Grade 1 Level 8 — Grade 2 Table 8.4 (continued) Correlations Among School Average Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 132 .98 .77 .74 .78 .91 
.86 .86 .90 .90 .58 .86 .95 .94 .87 .92 .85 .89 .90 .95 .95 .62 .26 .79 .88 .78 .75 .78 .92 .87 .87 .91 .91 .60 .87 .96 .96 .89 .93 .87 .90 .92 .96 .96 .71 .40 .88 .97 .97 RT Reading Total .81 .79 .78 .88 .76 .78 .79 .67 .79 .85 .85 .73 .72 .70 .79 .76 .81 .81 .44 .38 .68 .75 .75 .77 L1 Spelling .90 .80 .95 .76 .77 .78 .68 .79 .86 .87 .75 .74 .73 .79 .78 .82 .82 .65 .38 .73 .72 .78 .77 .80 L2 Capitalization .82 .95 .78 .81 .81 .68 .81 .88 .89 .77 .77 .75 .81 .80 .85 .85 .59 .34 .67 .74 .79 .79 .79 .90 L3 .92 .84 .90 .89 .63 .86 .95 .94 .86 .90 .84 .89 .89 .94 .93 .70 .41 .84 .86 .90 .91 .77 .84 .83 L4 .84 .88 .88 .70 .87 .95 .96 .84 .85 .81 .88 .87 .92 .92 .70 .43 .84 .82 .87 .87 .88 .95 .95 .93 LT .92 .97 .73 .96 .93 .92 .82 .85 .83 .85 .87 .91 .91 .62 .34 .72 .75 .83 .82 .70 .74 .79 .82 .83 M1 .98 .69 .95 .96 .95 .87 .90 .88 .90 .91 .95 .95 .76 .46 .85 .81 .89 .88 .76 .78 .80 .88 .87 .92 M2 .72 .97 .96 .96 .87 .89 .87 .89 .91 .95 .95 .72 .41 .81 .79 .88 .86 .74 .77 .81 .87 .86 .98 .98 MT - Math Total .85 .70 .74 .58 .60 .62 .63 .64 .66 .68 .49 .37 .55 .53 .57 .56 .62 .63 .66 .59 .67 .70 .68 .70 M3 Computation Mathematics Concepts Problems Punctu- Usage & Language & Data & ation Expression Total InterpreEstimation tation Language .94 .95 .83 .86 .85 .86 .88 .92 .92 .69 .42 .78 .76 .84 .82 .75 .78 .82 .84 .86 .95 .95 .97 .84 MT + Math Total .99 .90 .92 .88 .93 .93 .98 .98 .74 .44 .89 .90 .94 .95 .84 .88 .90 .95 .96 .91 .95 .95 .67 .92 CT - Core Total .89 .92 .88 .92 .93 .98 .98 .74 .44 .89 .89 .94 .94 .85 .89 .90 .94 .96 .91 .94 .94 .72 .93 .99 CT + Core Total .90 .87 .88 .90 .94 .94 .68 .38 .80 .85 .88 .89 .75 .78 .76 .88 .85 .81 .86 .85 .55 .81 .90 .89 SS Social Studies .89 .91 .93 .96 .96 .76 .54 .86 .87 .90 .91 .70 .76 .75 .90 .84 .83 .88 .87 .58 .84 .91 .91 .91 SC Science .87 .97 .92 .93 .73 .56 .83 .84 .90 .89 .74 .79 .81 .89 .87 .84 .89 .88 .59 .85 .92 .92 .88 .89 S1 .96 .95 .94 .64 .51 .83 .81 .87 .86 .78 .81 .80 .89 .88 .81 .87 .86 .60 .83 .91 .90 .84 .87 .86 S2 .96 .97 .72 .56 .87 .86 .91 .91 .79 .83 .84 .92 .91 .86 .91 .90 .62 .87 .95 .95 .90 .91 .97 .96 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .99 .75 .48 .90 .90 .95 .95 .81 .85 .86 .95 .94 .89 .94 .93 .64 .90 .99 .98 .94 .95 .94 .92 .97 CC - .76 .49 .90 .90 .94 .95 .81 .85 .86 .94 .94 .89 .94 .93 .66 .91 .98 .98 .94 .95 .95 .91 .97 .99 CC + Compo- Composite site .65 .88 — — — — — — — — — — — — — — — — — — — — — — WA Word Analysis .75 — — — — — — — — — — — — — — — — — — — — — — — Li Listening — — — — — — — — — — — — — — — — — — — — — — — — RPT Reading Profile Total 3:16 PM -Does not include Computation +Includes Computation .91 .97 .77 .73 .73 .89 .83 .85 .87 .88 .59 .84 .93 .92 .87 .90 .85 .87 .89 .93 .93 .78 .54 .92 Vocabulary Comprehension Reading Total Spelling Capitalization Punctuation Usage and Expression Language Total Concepts and Estimation Problems & Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Maps and Diagrams Reference Materials Sources Total Composite without Computation Composite with Computation Word Analysis Listening Reading Profile Total RC Comprehension Reading 10/29/10 Note: RV Vocabulary Level 9 — Grade 3 Level 10 — Grade 4 Table 8.4 (continued) Correlations Among School Average Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 133 133 134 -Does not 
include Computation +Includes Computation .80 .80 .82 .91 .89 .85 .88 .88 .56 .84 .95 .95 .91 .91 .84 .87 .88 .95 .94 .97 .98 .91 .98 .77 .79 .80 .91 .88 .86 .89 .89 .56 .85 .94 .94 .90 .90 .85 .86 .88 .94 .93 RT Reading Total RC Comprehension .81 .82 .79 .90 .73 .72 .73 .59 .73 .84 .84 .74 .70 .71 .79 .76 .80 .79 .76 .75 .77 L1 Spelling .89 .84 .95 .79 .80 .81 .65 .82 .89 .90 .77 .76 .80 .76 .80 .86 .86 .74 .75 .76 .80 L2 Capitalization .84 .95 .81 .82 .82 .65 .83 .90 .91 .76 .79 .80 .80 .82 .86 .87 .73 .77 .76 .77 .87 L3 .93 .85 .88 .88 .61 .85 .94 .93 .86 .90 .86 .85 .88 .93 .93 .86 .90 .90 .74 .79 .80 L4 .86 .87 .87 .68 .87 .96 .96 .84 .85 .85 .86 .88 .93 .93 .84 .86 .87 .88 .94 .94 .91 LT Punctu- Usage & Language ation Expression Total Language .92 .97 .72 .96 .92 .92 .83 .85 .85 .82 .86 .91 .91 .79 .87 .85 .70 .73 .76 .84 .83 M1 .98 .64 .95 .94 .93 .86 .91 .90 .86 .91 .94 .94 .79 .87 .85 .65 .72 .74 .87 .82 .92 M2 Concepts Problems & Data & InterpreEstimation tation .68 .97 .96 .95 .86 .89 .89 .86 .91 .95 .95 .81 .88 .87 .69 .73 .76 .87 .84 .97 .98 MT - Math Total .83 .66 .70 .52 .53 .59 .54 .57 .60 .62 .55 .56 .57 .58 .59 .60 .55 .63 .71 .61 .67 M3 Computation Mathematics .93 .94 .82 .84 .85 .82 .87 .91 .92 .78 .85 .84 .71 .74 .77 .83 .83 .96 .93 .96 .84 MT + Math Total .99 .90 .91 .89 .90 .93 .99 .98 .91 .95 .95 .82 .86 .87 .94 .95 .93 .92 .94 .65 .92 CT - Core Total .89 .90 .88 .89 .92 .98 .98 .91 .94 .95 .83 .87 .88 .93 .96 .92 .91 .93 .71 .93 .99 CT + Core Total .90 .87 .85 .89 .94 .94 .83 .87 .87 .70 .77 .72 .87 .84 .81 .85 .85 .54 .81 .89 .88 SS Social Studies .91 .87 .91 .95 .95 .85 .90 .90 .66 .71 .70 .89 .81 .84 .87 .87 .54 .83 .90 .89 .89 SC Science .86 .96 .92 .93 .79 .87 .85 .67 .78 .75 .87 .84 .86 .88 .89 .59 .86 .90 .90 .89 .89 S1 .96 .92 .91 .82 .87 .87 .73 .78 .75 .88 .86 .83 .86 .86 .58 .84 .90 .90 .91 .90 .90 S2 .96 .96 .82 .89 .88 .71 .80 .76 .90 .87 .87 .89 .90 .60 .87 .92 .92 .92 .92 .98 .97 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .99 .90 .94 .94 .77 .83 .82 .94 .92 .91 .93 .94 .63 .90 .98 .97 .94 .95 .94 .95 .96 CC - .90 .94 .94 .78 .84 .82 .94 .92 .91 .92 .94 .65 .91 .98 .98 .94 .95 .95 .94 .97 .99 CC + Compo- Composite site 3:16 PM Note: .90 .97 .81 .77 .80 .87 .87 .81 .82 .82 .52 .79 .92 .91 .87 .86 .79 .83 .83 .91 .90 RV Vocabulary Reading 10/29/10 Vocabulary Comprehension Reading Total Spelling Capitalization Punctuation Usage and Expression Language Total Concepts and Estimation Problems & Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Maps and Diagrams Reference Materials Sources Total Composite without Computation Composite with Computation Level 11 — Grade 5 Level 12 — Grade 6 Table 8.4 (continued) Correlations Among School Average Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 134 .98 .74 .82 .86 .91 .89 .83 .90 .89 .59 .86 .95 .95 .89 .91 .87 .89 .90 .95 .95 .75 .82 .86 .91 .89 .83 .89 .89 .60 .86 .95 .95 .89 .90 .85 .88 .89 .95 .95 .96 .98 .89 .82 .82 .79 .89 .75 .70 .74 .52 .72 .83 .83 .72 .71 .68 .77 .75 .79 .79 .79 .77 .80 L1 Spelling .91 .86 .96 .80 .79 .82 .54 .79 .91 .91 .80 .79 .78 .82 .82 .88 .88 .78 .83 .83 .83 L2 Capitalization .91 .97 .81 .82 .84 .55 .81 .93 .93 .82 .82 .81 .85 .86 .90 .90 .80 .82 .83 .82 .91 L3 .95 .84 .88 .89 .55 .84 .95 .94 .88 .88 .85 .89 .89 .94 .94 .87 .90 
.91 .82 .88 .90 L4 .85 .85 .88 .57 .84 .96 .96 .86 .85 .84 .89 .88 .94 .93 .85 .88 .89 .90 .96 .96 .96 LT Punctu- Usage & Language ation Expression Total Language .88 .96 .66 .93 .92 .92 .83 .85 .82 .85 .86 .90 .90 .77 .87 .85 .71 .77 .77 .84 .82 M1 .98 .60 .92 .93 .92 .88 .90 .90 .90 .92 .94 .94 .81 .90 .88 .72 .78 .80 .88 .85 .89 M2 Concepts Problems & Data & InterpreEstimation tation .65 .96 .96 .95 .88 .91 .89 .90 .92 .95 .95 .81 .91 .89 .73 .79 .81 .88 .86 .97 .98 MT - Math Total .84 .62 .69 .53 .57 .55 .58 .58 .61 .63 .53 .59 .58 .56 .54 .56 .56 .58 .63 .58 .62 M3 Computation Mathematics .92 .94 .83 .86 .84 .86 .88 .91 .92 .78 .88 .86 .74 .78 .79 .84 .84 .94 .92 .95 .83 MT + Math Total .99 .91 .92 .89 .93 .93 .99 .98 .91 .96 .96 .85 .90 .91 .96 .96 .92 .94 .95 .62 .92 CT - Core Total .90 .92 .88 .92 .93 .98 .98 .91 .95 .96 .86 .90 .91 .95 .96 .91 .92 .94 .69 .94 .99 CT + Core Total .91 .87 .89 .91 .95 .95 .84 .91 .90 .72 .78 .80 .88 .85 .84 .89 .89 .56 .85 .92 .91 SS Social Studies .90 .90 .93 .96 .96 .83 .90 .90 .74 .80 .81 .87 .86 .85 .88 .89 .58 .86 .92 .91 .90 SC Science .89 .97 .93 .94 .78 .88 .86 .69 .75 .73 .86 .81 .87 .90 .90 .52 .85 .89 .88 .87 .90 S1 .97 .95 .94 .81 .88 .87 .78 .83 .84 .90 .89 .82 .87 .87 .53 .82 .91 .91 .89 .90 .88 S2 .96 .96 .82 .90 .89 .75 .81 .80 .90 .87 .87 .91 .91 .54 .86 .93 .92 .90 .93 .97 .97 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .99 .88 .95 .95 .80 .86 .86 .94 .92 .90 .93 .94 .59 .90 .98 .97 .95 .96 .93 .95 .97 CC - .88 .95 .95 .80 .85 .86 .94 .92 .90 .93 .94 .63 .92 .98 .97 .94 .96 .94 .93 .96 .99 CC + Compo- Composite site 3:16 PM -Does not include Computation +Includes Computation .90 .97 .72 .77 .80 .86 .84 .78 .83 .83 .56 .81 .91 .91 .84 .85 .78 .83 .82 .89 .89 Vocabulary Comprehension Reading Total Spelling Capitalization Punctuation Usage and Expression Language Total Concepts and Estimation Problems & Data Interpretation Math Total without Computation Computation Math Total with Computation Core Total without Computation Core Total with Computation Social Studies Science Maps and Diagrams Reference Materials Sources Total Composite without Computation Composite with Computation RT Reading Total RC Comprehension Reading 10/29/10 Note: RV Vocabulary Level 13 — Grade 7 Level 14 — Grade Table 8.4 (continued) Correlations Among School Average Developmental Standard Scores Iowa Tests of Basic Skills — Complete Battery, Form A 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 135 135 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 136 an ITBS score from the CogAT Nonverbal score (NV) has the following form: 2 3 Predicted ITBS SS = b1NV + b2NV + b3NV + c where NV stands for a student’s obtained score on the CogAT Nonverbal Battery, the b’s stand for the slopes (b1, b2, b3 ), and c stands for the intercept in the prediction equation. The prediction equations are used to compare expected and obtained achievement in combined ITBS and CogAT score reports. School districts decide which CogAT score to use in predicting ITBS standard scores. Separate equations are used for each ITBS test or Total score for fall, midyear, and spring testing. Choosing the CogAT test for predicting achievement depends on the purpose of combined ability-achievement reporting. If the objective is reliability of the difference between predicted and obtained achievement, which is important when predicting for individuals, the Nonverbal Battery is recommended. This is because its correlations with ITBS tests are relatively low. 
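Whichever CogAT score a district chooses, the prediction described above takes the same third-degree polynomial form, Predicted ITBS SS = b1(NV) + b2(NV)^2 + b3(NV)^3 + c, with NV replaced by the chosen score. A minimal sketch follows; the coefficients in it are hypothetical placeholders, not the operational constants used by the scoring service, which differ by test, grade, and time of year.

    def predicted_itbs_ss(sas, b1, b2, b3, c):
        # Third-degree polynomial prediction of an ITBS Developmental Standard
        # Score from a CogAT Standard Age Score (Verbal, Quantitative,
        # Nonverbal, or Composite).
        return b1 * sas + b2 * sas**2 + b3 * sas**3 + c

    # Hypothetical coefficients for illustration only.
    b1, b2, b3, c = 2.5, -0.012, 0.00005, 40.0
    print(round(predicted_itbs_ss(100, b1, b2, b3, c), 1))  # about 220

The polynomial terms accommodate the curvilinear relation between ability and achievement observed for individual students; predictions for school averages, discussed later in this section, use a simpler linear form.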
If the objective is accuracy in estimating ITBS scores, which is important in setting expectations for classes, buildings, or districts, the CogAT Composite should be used. This is because its correlations with ITBS tests are high.

The prediction of achievement with information about cognitive ability is discussed in Thorndike's (1963) monograph, The Concepts of Over- and Under-Achievement. Two problems deserve special mention: the first is excessive overlap between predictors and criterion; the second is unreliability. Reliabilities of difference scores and standard deviations of difference scores due to measurement error are presented in Table 8.6.

Although scores on general measures of cognitive ability and educational achievement correlate highly, scores for individuals can show discrepancies. When expected achievement is calculated from ability, some discrepancy exists. Whether the discrepancy means anything is a subjective question. Some discrepancies occur simply because of measurement error; others are due to true differences between cognitive ability and school achievement. Although the precise influence of ability on achievement is difficult to determine, one can estimate the size of discrepancies caused by measurement error. The reliabilities of differences (rD) and the standard deviations of difference scores due to measurement error (SDE) reported in Table 8.6 help to interpret ability-achievement differences. These values were computed from the correlations between scores on the ITBS and the CogAT, the K-R20 reliability coefficients from the ITBS and CogAT standardization, and the standard deviations of the weighted national standardization sample.

The first statistic in Table 8.6, rD, estimates the reliability of the predicted difference between actual and expected ITBS scores for students with the same Standard Age Scores. These coefficients are lower than the reliabilities of ITBS and CogAT scores because reliabilities of differences are affected by measurement error in both tests and by the correlation between tests. The next statistic, SDE, is the standard deviation of differences between expected and actual ITBS scores due to measurement error for students with the same CogAT Standard Age Score. Even if ability-achievement discrepancies were produced by measurement error alone, they could be sizable, as the SDE values indicate. SDE values are helpful in understanding differences between expected and obtained ITBS scores, as the following example demonstrates.

Obtained Versus Expected Achievement

According to the system of prediction equations described previously, the expected ITBS Standard Score in the spring of grade 5 on Language Total for a student with a Standard Age Score of 97 on the CogAT Composite is 214. From Table 8.6, the standard deviation of differences due to errors in measurement is 8.8. This means that differences of 8.8 or more standard score units between expected performance and actual performance occur about 32 percent of the time because of measurement error (i.e., 16 percent in each direction). However, the standard error of estimate computed from the correlations and standard deviations in Tables 8.5 and 5.1 is 20.4. The difference between a student's actual and predicted standard score has to be at least this large before it will be identified as discrepant by the combined achievement/ability reporting system.
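The percentages in this example follow from treating the part of the expected-minus-obtained difference that is due to measurement error as approximately normal with standard deviation SDE. A minimal sketch of that arithmetic, using only the two values quoted above (SDE = 8.8 and a standard error of estimate of 20.4), is given below.

    from statistics import NormalDist

    sde = 8.8    # SD of expected-minus-obtained differences due to measurement error
    see = 20.4   # standard error of estimate used as the flagging threshold

    error_model = NormalDist(mu=0.0, sigma=sde)

    # Chance that measurement error alone produces a difference of 8.8 or more
    # in either direction: about 32 percent (16 percent in each tail).
    p_one_sde = 2 * (1 - error_model.cdf(sde))

    # Chance that measurement error alone produces a difference as large as the
    # 20.4-point flagging threshold: only about 2 percent.
    p_threshold = 2 * (1 - error_model.cdf(see))

    print(round(p_one_sde, 2), round(p_threshold, 2))  # 0.32 0.02

This is why a difference large enough to be flagged by the reporting system is unlikely to be the product of measurement error alone.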
Because the standard error of estimate (20.4) is more than twice as large as the standard deviation of differences due to measurement error (8.8), the probability of a student’s discrepancy score being labeled as extreme because of measurement error is very unlikely. K K 1 1 2 A B C D E F K K 1 1 2 A B C D E F K K 1 1 2 3 4 5 6 7 8 Verbal Quantitative .27 .56 .57 .61 .63 .65 .65 .66 .69 .67 .24 .50 .53 .61 .76 .77 .78 .79 .79 .80 -Does not include Computation +Includes Computation .56 .52 .54 .61 .63 .61 .60 .60 .59 .61 .60 .66 .60 .66 .61 .69 .75 .75 .76 .78 .80 .80 RC RV .43 .59 .63 .66 .66 .66 .66 .66 .69 .67 .38 .52 .60 .69 .80 .80 .81 .83 .84 .84 RT Reading Total .50 .50 .56 .58 .61 .58 .58 .59 .44 .44 .63 .63 .65 .63 .63 .65 L1 Spelling .60 .62 .64 .62 .64 .63 .62 .61 .65 .65 .67 .69 L2 Capitalization .61 .64 .65 .64 .66 .65 .62 .64 .66 .65 .66 .68 L3 Punctuation .64 .65 .66 .66 .68 .67 .74 .73 .74 .75 .76 .74 L4 .70 .66 .70 .66 .66 .71 .72 .73 .71 .72 .71 .74 .66 .71 .62 .62 .76 .75 .77 .76 .76 .77 LT .72 .72 .74 .77 .79 .79 .82 .82 .66 .63 .70 .71 .72 .72 .73 .75 M1 .73 .73 .72 .74 .75 .75 .77 .76 .65 .63 .73 .71 .72 .71 .72 .72 M2 .72 .68 .74 .78 .78 .78 .80 .81 .81 .83 .83 .57 .55 .66 .66 .65 .66 .62 .60 .46 .42 .51 .50 .51 .50 .46 .46 M3 MT - .68 .60 .67 .71 .68 .76 .75 .76 .75 .76 .77 Computation Math Total Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language .77 .76 .74 .76 .77 .77 .78 .79 .69 .65 .66 .66 .70 .68 .68 .70 MT + Math Total .74 .71 .76 .75 .77 .77 .78 .79 .79 .80 .80 .79 .73 .79 .71 .74 .84 .83 .84 .84 .85 .85 CT - Core Total .74 .76 .78 .79 .79 .79 .80 .79 .69 .73 .84 .83 .84 .84 .84 .85 CT + Core Total .62 .56 .62 .63 .62 .63 .67 .64 .69 .64 .74 .73 .74 .75 .75 .73 SS Social Studies .61 .60 .64 .66 .68 .67 .67 .67 .67 .67 .73 .73 .75 .75 .74 .75 SC Science .65 .67 .71 .70 .70 .68 .69 .71 .71 .69 .71 .69 S1 .66 .66 .68 .67 .67 .67 .73 .72 .73 .74 .71 .73 S2 .64 .68 .70 .72 .75 .74 .74 .72 .61 .65 .77 .77 .77 .77 .76 .76 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .79 .79 .76 .77 .78 .77 .79 .78 .78 .80 .84 .84 .85 .85 .85 .85 CC - .78 .79 .76 .77 .79 .78 .79 .78 .78 .80 .84 .83 .85 .84 .84 .84 CC + Compo- Composite site .62 .55 .63 .62 .62 - .58 .50 .58 .57 .59 - WA Word Analysis .68 .61 .62 .64 .63 - .73 .67 .68 .68 .68 - Li Listening .73 .67 .73 .70 .72 - .77 .69 .74 .67 .72 - RPT Reading Profile Total 3:16 PM Note: 5 6 6 7 8 9 10 11 12 13 14 Grade CogAT Level K K 1 1 2 3 4 5 6 7 8 Comprehension Vocabulary Reading 10/29/10 5 6 6 7 8 9 10 11 12 13 14 ITBS Level Table 8.5 Correlations Between Standard Age Scores, Cognitive Abilities Test, Form 6 and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 137 137 K K 1 1 2 A B C D E F K K 1 1 2 3 4 5 6 7 8 Nonverbal Composite .29 .60 .60 .66 .74 .75 .75 .76 .78 .77 .27 .51 .48 .52 .61 .62 .61 .62 .63 .63 -Does not include Computation +Includes Computation .61 .57 .59 .65 .69 .71 .70 .70 .71 .72 .72 .40 .38 .37 .48 .51 .56 .55 .55 .54 .55 .56 RC RV .46 .63 .66 .72 .77 .77 .77 .78 .79 .79 .40 .54 .51 .55 .62 .62 .62 .62 .63 .63 RT Reading Total .53 .51 .61 .63 .65 .62 .62 .64 .46 .43 .46 .49 .51 .47 .47 .49 L1 Spelling .65 .66 .69 .67 .70 .70 .54 .57 .59 .55 .58 .59 L2 Capitalization .66 .70 .70 .69 .71 .71 .57 .60 .60 .59 .61 .60 L3 Punctuation .73 .73 .74 .75 .76 .75 .60 .60 .61 .61 .62 .63 L4 .75 .70 .73 .70 .69 .78 .78 .79 .78 
.78 .79 .56 .50 .53 .55 .56 .64 .65 .66 .64 .65 .65 LT Usage & Language Expression Total Language .74 .73 .78 .80 .82 .82 .83 .84 .58 .59 .67 .69 .71 .70 .71 .72 M1 .74 .73 .78 .79 .80 .80 .81 .80 .56 .58 .66 .68 .69 .68 .70 .70 M2 Concepts Problems & Data & InterpreEstimation tation .75 .70 .75 .80 .78 .83 .84 .84 .85 .86 .86 .58 .55 .62 .62 .63 .63 .59 .57 .49 .48 .54 .54 .54 .53 .51 .49 M3 MT - .59 .54 .58 .61 .63 .71 .73 .73 .73 .74 .75 Computation Math Total Mathematics .79 .77 .77 .78 .80 .79 .79 .81 .62 .63 .63 .67 .69 .67 .68 .70 MT + Math Total .79 .76 .80 .79 .81 .85 .86 .87 .87 .87 .88 .58 .55 .56 .61 .64 .71 .72 .73 .71 .72 .73 CT - Core Total .78 .81 .86 .86 .87 .86 .87 .87 .60 .64 .71 .72 .72 .71 .72 .72 CT + Core Total .68 .61 .72 .71 .72 .73 .75 .73 .47 .42 .59 .59 .59 .59 .62 .61 SS Social Studies .66 .65 .74 .74 .76 .76 .76 .76 .47 .47 .62 .63 .64 .64 .65 .65 SC Science .73 .75 .77 .77 .77 .76 .64 .67 .69 .68 .69 .70 S1 .74 .74 .75 .75 .74 .75 .61 .63 .64 .64 .63 .65 S2 .69 .73 .79 .80 .82 .82 .81 .81 .56 .59 .68 .70 .72 .71 .71 .72 ST Maps Reference Sources and Materials Total Diagrams Sources of Information .84 .84 .85 .86 .87 .87 .87 .87 .63 .65 .71 .73 .73 .72 .73 .74 CC - .83 .84 .85 .86 .87 .87 .87 .87 .63 .64 .71 .73 .73 .73 .74 .74 CC + Compo- Composite site .65 .56 .66 .65 .65 - .53 .44 .53 .53 .52 - WA Word Analysis .74 .67 .66 .68 .67 - .56 .48 .46 .48 .48 - Li Listening .79 .73 .78 .74 .77 - .58 .55 .60 .57 .59 - RPT Reading Profile Total 3:16 PM Note: K K 1 1 2 A B C D E F K K 1 1 2 3 4 5 6 7 8 Comprehension Vocabulary Reading 10/29/10 5 6 6 7 8 9 10 11 12 13 14 5 6 6 7 8 9 10 11 12 13 14 Grade CogAT Level 138 ITBS Level Table 8.5 (continued) Correlations Between Standard Age Scores, Cognitive Abilities Test, Form 6 and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 138 Note: SDE 9.03 8.06 10.32 7.21 8.69 7.79 8.14 9.39 10.34 11.19 11.78 .79 .87 .81 .76 .77 .76 .74 .70 .69 .70 RT Reading Total 4.48 2.70 5.41 4.20 5.08 4.70 7.68 6.81 8.31 6.29 10.25 7.19 11.42 8.16 12.56 9.07 12.90 9.63 13.78 10.19 .61 .81 .83 .75 .73 .69 .68 .66 .70 .69 -Does not include Computation +Includes Computation K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 .36 .55 .41 .74 .63 .70 .72 .68 .64 .59 .59 RC Comprehension 4.25 7.17 7.10 9.02 10.06 11.21 11.35 12.35 .82 .74 .80 .79 .80 .80 .82 .79 L1 Spelling 11.49 15.21 16.89 19.84 20.83 21.63 .75 .72 .71 .66 .66 .65 L2 Capitalization 11.22 14.05 15.57 17.08 18.70 20.22 .73 .72 .73 .76 .74 .71 L3 Punctuation .36 .46 .44 .71 .71 .83 .84 .83 .83 .83 .82 LT .51 .56 .62 .69 .69 .73 .73 .71 M1 5.09 5.21 7.04 5.66 7.12 7.25 8.55 10.32 6.02 8.51 12.15 7.56 8.78 14.72 8.76 9.90 16.02 9.93 10.25 17.35 10.59 11.08 19.31 11.53 11.97 .70 .74 .70 .71 .70 .70 L4 .49 .51 .57 .62 .68 .73 .75 .76 .77 .77 .77 .79 .77 .80 .82 .83 .79 .79 .81 M3 MT - 5.12 5.49 6.58 7.90 5.84 3.75 9.30 6.99 5.65 9.89 6.94 6.42 11.96 7.93 7.52 13.06 8.88 8.66 14.87 9.77 11.58 15.71 10.50 13.53 16.39 11.13 14.17 .57 .63 .64 .64 .66 .66 .67 .69 M2 Computation Math Total 4.34 5.43 5.20 5.99 6.88 7.95 8.73 9.30 .71 .76 .85 .86 .85 .85 .85 .85 MT + Math Total 4.51 4.29 5.52 4.11 5.40 4.76 5.62 6.42 7.28 7.99 8.48 .43 .59 .50 .79 .74 .81 .82 .81 .80 .79 .78 CT - Core Total 3.79 5.07 4.45 5.27 6.09 7.00 7.54 8.12 .82 .77 .82 .83 .82 .80 .81 .79 CT + Core Total 8.46 11.28 9.14 10.85 12.56 14.54 15.27 16.06 .41 .32 .61 .64 .64 .62 .65 .69 SS Social Studies 9.69 12.29 10.39 11.11 
12.31 13.48 14.34 15.48 .40 .39 .64 .69 .69 .69 .71 .69 SC Science 11.65 13.23 15.40 17.35 19.65 19.85 .59 .63 .62 .63 .58 .64 S1 7.84 9.96 11.29 13.57 14.29 15.99 .66 .68 .70 .65 .71 .66 S2 5.91 7.78 7.47 8.83 10.25 11.87 13.00 13.68 .67 .64 .69 .72 .72 .70 .72 .72 ST Maps Reference Sources and Materials Total Diagrams Sources of Information 4.09 5.36 4.60 5.45 6.35 7.27 7.88 8.45 .72 .65 .82 .82 .81 .80 .80 .80 CC - .63 .70 .63 .72 .69 - WA Word Analysis 7.05 6.80 9.07 3.96 7.99 5.29 10.62 4.54 5.34 6.29 7.18 7.73 8.30 - .71 .65 .82 .84 .81 .81 .82 .81 CC + Compo- Composite site 5.73 6.11 7.38 8.07 9.54 - .39 .41 .44 .33 .37 - Li Listening 5.00 4.69 4.56 4.00 5.50 - .50 .66 .71 .82 .76 - RPT Reading Profile Total 3:16 PM rD K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 RV Vocabulary Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language 10/29/10 Level Grade Verbal Reading Table 8.6 Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE ) Standard Age Scores, Cognitive Abilities Test, Form 6 and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 139 139 SDE .61 .80 .82 .78 .82 .79 .79 .79 .79 .81 4.49 5.36 4.96 7.26 8.14 10.09 11.27 12.30 12.77 13.51 .51 .63 .57 .76 .73 .80 .82 .80 .80 .78 .79 8.70 7.77 9.86 6.91 8.02 7.63 7.93 9.19 9.97 10.81 11.34 -Does not include Computation +Includes Computation K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 rD RC RV 2.70 4.15 4.52 6.11 6.09 6.96 7.94 8.64 9.26 9.72 .78 .86 .82 .82 .86 .86 .85 .85 .84 .85 RT Reading Total 4.23 7.11 7.08 9.03 10.18 11.24 11.39 12.39 .81 .73 .82 .81 .81 .82 .83 .81 L1 Spelling 11.58 15.40 17.15 19.96 20.99 21.70 .75 .71 .70 .67 .68 .69 L2 Capitalization 11.32 14.22 15.84 17.33 19.04 20.46 .73 .71 .73 .75 .73 .73 L3 Punctuation .47 .48 .49 .70 .71 .86 .85 .84 .85 .85 .85 LT .44 .46 .55 .60 .56 .62 .57 .56 M1 .48 .54 .64 .59 .60 .59 .58 .62 M2 .45 .41 .49 .55 .58 .69 .67 .67 .67 .64 .65 .76 .73 .72 .74 .76 .70 .71 .75 M3 MT - .64 .69 .79 .78 .78 .76 .76 .75 MT + Math Total 4.89 5.05 5.11 5.50 6.80 6.49 5.52 7.03 7.82 5.72 3.78 4.26 7.00 8.46 9.20 6.85 5.71 5.35 10.23 6.09 8.69 10.02 7.17 6.68 5.51 12.10 7.72 9.07 12.21 8.29 7.88 6.44 14.75 9.04 10.34 13.45 9.43 9.13 7.45 15.91 10.07 10.76 15.31 10.37 12.14 8.62 17.30 10.82 11.75 16.25 11.26 14.10 9.53 19.36 11.69 12.68 17.00 11.98 14.79 10.22 .78 .79 .76 .78 .77 .75 L4 Computation Math Total 4.22 4.09 5.13 3.86 4.91 4.78 5.73 6.69 7.46 8.20 8.78 .59 .65 .61 .79 .76 .86 .85 .84 .83 .83 .82 CT - Core Total 3.58 4.60 4.51 5.42 6.35 7.18 7.81 8.35 .81 .79 .86 .85 .84 .84 .83 .84 CT + Core Total 8.11 10.89 9.04 10.77 12.46 14.35 15.24 16.00 .54 .45 .72 .72 .74 .73 .72 .75 SS Social Studies 9.35 11.76 10.34 11.09 12.38 13.44 14.35 15.48 .51 .52 .71 .74 .74 .76 .76 .76 SC Science 11.70 13.30 15.68 17.64 19.88 20.15 .62 .67 .60 .60 .58 .64 S1 7.83 9.97 11.41 13.57 14.41 16.07 .72 .72 .73 .71 .74 .71 S2 5.77 7.49 7.47 8.89 10.54 12.07 13.27 13.91 .67 .64 .75 .76 .73 .72 .72 .75 ST Maps Reference Sources and Materials Total Diagrams Sources of Information 3.70 4.67 4.59 5.47 6.53 7.27 8.03 8.60 .76 .75 .87 .87 .86 .86 .85 .85 CC - .61 .68 .61 .70 .69 - WA Word Analysis 6.99 6.79 8.95 3.56 7.86 4.62 10.30 4.53 5.41 6.52 7.31 7.94 8.52 - .77 .75 .88 .87 .85 .85 .85 .85 CC + Compo- Composite site 5.49 5.90 7.09 7.83 9.16 - .52 .52 .55 .43 .48 - Li Listening 4.70 
4.46 4.19 3.75 4.93 - .62 .71 .76 .83 .81 - RPT Reading Profile Total 3:16 PM Note: K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 Comprehension Vocabulary Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language 10/29/10 Level Grade Quantitative 140 Reading Table 8.6 (continued) Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE ) Standard Age Scores, Cognitive Abilities Test, Form 6 and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 140 SDE 8.45 7.54 9.57 6.47 7.60 7.38 7.66 8.86 9.62 10.44 11.02 4.48 5.15 4.66 6.89 7.88 9.76 10.81 11.77 12.07 12.91 .61 .83 .86 .83 .83 .81 .82 .82 .84 .84 -Does not include Computation +Includes Computation K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 rD .62 .70 .67 .83 .80 .83 .85 .83 .83 .81 .81 RC RV .83 .76 .86 .84 .85 .86 .87 .85 L1 Spelling .79 .74 .75 .73 .73 .73 L2 Capitalization .76 .75 .77 .79 .78 .77 L3 Punctuation 2.67 3.88 4.13 4.11 5.57 6.96 5.75 6.81 11.27 11.07 6.58 8.70 15.04 13.87 7.45 9.67 16.64 15.31 8.10 10.68 19.41 16.70 8.60 10.81 20.41 18.38 9.15 11.82 21.20 19.84 .79 .88 .87 .88 .89 .88 .88 .88 .88 .88 RT Reading Total .64 .64 .68 .78 .78 .90 .89 .89 .90 .90 .89 LT .63 .63 .65 .72 .70 .75 .75 .74 M1 .69 .71 .71 .67 .69 .70 .70 .71 M2 .62 .58 .70 .76 .77 .78 .78 .79 .79 .80 .79 .80 .77 .79 .81 .83 .78 .78 .80 M3 MT - 4.69 4.84 4.90 5.31 6.45 6.09 5.20 6.70 7.37 5.21 3.62 6.65 8.11 8.68 6.29 5.56 9.90 5.57 8.42 9.69 6.76 6.42 11.64 7.08 8.67 11.81 7.78 7.53 14.16 8.14 9.81 12.90 8.69 8.67 15.20 9.10 10.07 14.64 9.53 11.58 16.51 9.80 10.93 15.53 10.29 13.60 18.70 10.75 11.86 16.30 11.03 14.24 .81 .82 .80 .82 .81 .79 L4 Computation Math Total 3.82 4.86 5.08 5.93 6.76 7.78 8.63 9.30 .81 .82 .86 .86 .86 .86 .85 .85 MT + Math Total 3.91 3.78 4.60 3.28 4.27 4.23 5.05 5.78 6.36 7.04 7.70 .76 .79 .81 .89 .88 .91 .91 .90 .91 .90 .89 CT - Core Total 3.01 4.00 3.92 4.70 5.41 6.10 6.67 7.29 .90 .88 .91 .91 .91 .91 .91 .90 CT + Core Total 7.82 10.66 8.85 10.51 12.12 13.92 14.71 15.56 .66 .56 .74 .76 .77 .76 .77 .78 SS Social Studies 9.06 11.45 10.11 10.77 11.89 12.92 13.87 15.02 .63 .63 .74 .77 .78 .79 .79 .78 SC Science 11.51 13.05 15.27 17.19 19.49 19.89 .64 .68 .64 .64 .61 .63 S1 .74 .73 .78 .79 .78 .77 .77 .76 ST 5.53 7.18 7.60 7.22 9.70 8.54 10.98 9.98 13.13 11.47 13.91 12.68 15.67 13.49 .76 .75 .77 .75 .77 .74 S2 Maps Reference Sources and Materials Total Diagrams Sources of Information 3.04 4.00 4.05 4.84 5.62 6.29 6.94 7.68 .90 .88 .92 .91 .91 .91 .91 .90 CC - 2.95 3.92 3.98 4.77 5.56 6.31 6.92 7.61 .90 .88 .92 .91 .91 .91 .90 .90 CC + Compo- Composite site 6.79 6.58 8.62 7.51 9.89 - .69 .74 .69 .77 .76 - WA Word Analysis 5.28 5.72 6.82 7.58 8.88 - .65 .63 .67 .59 .62 - Li Listening 4.37 4.20 3.70 3.24 4.34 - .77 .79 .86 .90 .89 - RPT Reading Profile Total 3:16 PM Note: K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 Comprehension Vocabulary Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language 10/29/10 Level Grade Nonverbal Reading Table 8.6 (continued) Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE ) Standard Age Scores, Cognitive Abilities Test, Form 6 and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization 
961464_ITBS_GuidetoRD.qxp Page 141 141 SDE 8.48 7.57 9.65 6.50 7.61 7.28 7.54 8.76 9.53 10.31 10.82 4.46 5.08 4.64 6.83 7.70 9.53 10.60 11.50 11.70 12.41 .61 .80 .84 .78 .78 .75 .75 .75 .77 .78 -Does not include Computation +Includes Computation K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 rD .49 .62 .56 .77 .72 .77 .80 .77 .76 .74 .74 RC Comprehension .82 .74 .82 .81 .82 .82 .84 .82 L1 Spelling .75 .70 .70 .66 .66 .67 L2 Capitalization .72 .69 .72 .75 .73 .72 L3 Punctuation 2.62 3.79 4.13 4.06 5.54 6.90 5.57 6.77 11.13 10.90 6.36 8.61 14.81 13.63 7.23 9.56 16.38 15.02 7.87 10.61 19.20 16.35 8.26 10.69 20.10 17.92 8.72 11.65 20.76 19.33 .79 .87 .84 .83 .84 .83 .83 .82 .82 .83 RT Reading Total .45 .47 .50 .70 .72 .86 .86 .86 .86 .86 .85 LT .46 .49 .53 .61 .57 .64 .63 .61 M1 .52 .59 .60 .55 .57 .57 .58 .62 M2 .46 .43 .54 .59 .66 .67 .67 .70 .68 .69 .69 .78 .75 .77 .79 .80 .75 .75 .79 M3 MT - 4.65 4.76 4.89 5.26 6.48 6.09 5.19 6.69 7.38 5.22 3.59 6.57 8.04 8.61 6.19 5.48 9.72 5.34 8.27 9.51 6.53 6.32 11.39 6.73 8.44 11.56 7.45 7.39 13.89 7.73 9.55 12.60 8.30 8.51 14.88 8.68 9.74 14.31 9.09 11.42 16.07 9.20 10.50 15.05 9.70 13.39 18.08 10.05 11.30 15.66 10.23 13.99 .75 .77 .73 .75 .74 .73 L4 Computation Math Total 3.81 4.75 4.93 5.66 6.42 7.42 8.13 8.62 .69 .74 .81 .82 .81 .81 .82 .80 MT + Math Total 3.87 3.75 4.68 3.27 4.17 3.90 4.61 5.29 5.89 6.36 6.84 .58 .65 .62 .82 .80 .86 .85 .85 .84 .85 .83 CT - Core Total 3.01 3.90 3.61 4.26 4.96 5.61 5.99 6.46 .84 .81 .86 .87 .85 .86 .85 .85 CT + Core Total 7.88 10.70 8.73 10.34 11.95 13.73 14.38 15.13 .50 .42 .66 .69 .70 .68 .69 .72 SS Social Studies 9.10 11.47 9.95 10.53 11.62 12.59 13.41 14.46 .48 .49 .66 .71 .71 .72 .73 .72 SC Science 11.33 12.75 14.92 16.81 19.00 19.20 .56 .61 .56 .55 .52 .58 S1 .66 .63 .70 .72 .70 .68 .70 .70 ST 5.50 7.09 7.48 7.00 9.50 8.21 10.73 9.58 12.83 11.03 13.54 12.10 15.17 12.73 .68 .69 .71 .68 .72 .68 S2 Maps Reference Sources and Materials Total Diagrams Sources of Information 3.07 3.93 3.70 4.33 5.09 5.73 6.14 6.64 .79 .77 .88 .87 .86 .86 .86 .86 CC - 2.97 3.88 3.63 4.26 5.04 5.70 6.07 6.58 .80 .77 .88 .88 .86 .86 .86 .86 CC + Compo- Composite site 6.66 6.51 8.59 7.46 9.81 - .62 .70 .61 .72 .70 - WA Word Analysis 5.23 5.71 6.86 7.61 8.90 - .48 .49 .54 .40 .46 - Li Listening 4.32 4.13 3.71 3.24 4.30 - .60 .70 .78 .85 .83 - RPT Reading Profile Total 3:16 PM Note: K K 1 1 2 3 4 5 6 7 8 5 6 6 7 8 9 10 11 12 13 14 RV Vocabulary Mathematics Concepts Problems Usage & Language & Data & Expression Total InterpreEstimation tation Language 10/29/10 Level Grade Composite 142 Reading Table 8.6 (continued) Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE ) Standard Age Scores, Cognitive Abilities Test, Form 6 and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A Spring 2000 National Standardization 961464_ITBS_GuidetoRD.qxp Page 142 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 143 Predicting Achievement from General Cognitive Ability: Group Averages Norms for school averages show the rank of a school in the distribution of all schools that participated in the national standardization. Norms for school averages are included in the Norms and Score Conversions manuals for the ITBS. They also appear on reports available from the Riverside Scoring Service. When interpreting averages for a group, how the group compares with groups similar in average ability is also of interest. 
Few groups of students are exactly at the national average in cognitive ability; most are above or below. Grade groups can also differ markedly in average ability from year to year, even within the same building. Such factors should be considered when interpreting achievement test results.

Comparisons of average achievement and ability are based on data from the spring 2000 standardization of the ITBS and the CogAT. Average scores were obtained for each building; Developmental Standard Scores (SS) for the ITBS and Standard Age Scores (SAS) for the CogAT were used. Records of the two sets of scores were matched by school building. Correlations between CogAT and ITBS averages appear in Table 8.7. As expected, correlations of the ITBS tend to be higher with the CogAT Verbal Battery and lower with the CogAT Nonverbal Battery. Correlations between averages from the ITBS Math tests and the CogAT Quantitative Battery are also high. Correlations between CogAT and ITBS averages are generally lower for Spelling and Math Computation.

The combined ITBS/CogAT scoring service described in the Interpretive Guide for School Administrators is used with individual student scores. Class summaries are furnished at the end of each report. A classroom teacher can compare the average obtained ITBS scores to the average ITBS scores expected from the prediction equations used for combined reporting. Such summaries are also described for building and system averages. When combined ITBS/CogAT score reports are unavailable, Table 8.7 can be used to compare ITBS and CogAT averages.

Table 8.7 gives the values needed for predicting average ITBS scores from average CogAT scores. The values in the table include the slope (b) and the intercept (c) of the equations for predicting ITBS building averages from CogAT building averages. The equations are of the form: Predicted ITBS SS = b (SAS) + c, where SAS is the average CogAT Standard Age Score and b and c are the prediction constants. The following example illustrates how to construct a prediction equation for averages. Suppose the average Nonverbal SAS for a school in the spring of grade 6 was 92.6 (i.e., SAS = 92.6). The predicted average ITBS Composite (CC+, in standard score units) for such a school would be 1.897 (92.6) + 42.3 = 218. The values of b = 1.897 and c = 42.3 come from Table 8.7. The predicted average Composite score of 218 is an estimate of the average achievement for schools with a Nonverbal SAS of 92.6. The ITBS norms for building averages indicate that 26 percent of the school buildings in the national standardization had an average standard score Composite lower than 218. The prediction equation for averages simply indicates that for a school with an SAS score below the national average, the ITBS building average is also expected to be below the national average.

Comparing the predicted average achievement for a school with the actual ITBS school average allows an administrator to determine whether average achievement is above or below expectation. It does not indicate whether the magnitude of a difference is important. To evaluate the importance of the discrepancy, the standard errors of estimate (Sy.x) in Table 8.7 are helpful. These values estimate the standard deviation of ITBS school averages for a group of schools with the same average CogAT SAS. They measure the variability expected in average achievement of schools similar in average cognitive ability.
To illustrate the use of the standard errors of estimate, consider the previous example. In Table 8.7, Sy.x for CC+ in grade 6 when predicted by CogAT Nonverbal is 9.2 ITBS standard score units. This value, 9.2, represents the amount by which an average standard score must differ from the predicted average standard score to be in the upper or lower 16 percent of school averages. This means that for schools with an average SAS of 92.6, 16 percent would be expected to have average ITBS Composites above 227.2 (218 + 9.2) and 16 percent would be expected to have average Composites below 208.8 (218 – 9.2).

Note: Third-degree polynomial regression equations were used for the combined ITBS/CogAT individual scoring because the relationships between ability and achievement results for individual students tend to be curvilinear; the relationships between ITBS and CogAT school averages, however, appear to be linear.

Note: The prediction constants and standard errors of estimate in Table 8.7 were obtained from schools that administered the ITBS in the spring, and they are applicable only at that time of year. Estimates of expected average achievement at fall and midyear can be obtained by adjusting the predicted ITBS SS by grade-to-grade differences in average standard scores; information about these adjustments can be obtained from the publisher. Adding standard errors of estimate to, and subtracting them from, predicted mean standard scores is not strictly appropriate in evaluating a discrepancy between actual and predicted achievement, because the standard error of estimate does not account for errors associated with the establishment of the prediction equation itself. However, accounting for such errors would, in most applications, increase the reported standard errors by less than 2 percent. (The top and bottom 10 percent of averages may be obtained by adding and subtracting 1.28 standard errors of estimate to the predicted average grade equivalent; for the top and bottom 5 percent, use 1.64 standard errors.)
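The building-average computations above can be collected into one short sketch. The slope, intercept, and Sy.x values used here are the ones quoted in the example (grade 6 Composite predicted from the average CogAT Nonverbal SAS); values for other tests, grades, and batteries would be taken from Table 8.7.

    # Linear prediction of a school's average ITBS Developmental Standard Score
    # from its average CogAT Standard Age Score: Predicted SS = b * SAS + c.
    b, c, s_yx = 1.897, 42.3, 9.2   # grade 6 Composite from Nonverbal (example above)
    sas = 92.6                      # school's average Nonverbal SAS

    predicted = b * sas + c         # about 218

    # Bands expected to contain the middle 68, 80, and 90 percent of school
    # averages for schools with this same average SAS.
    band_68 = (predicted - s_yx, predicted + s_yx)
    band_80 = (predicted - 1.28 * s_yx, predicted + 1.28 * s_yx)
    band_90 = (predicted - 1.64 * s_yx, predicted + 1.64 * s_yx)

    print(round(predicted), [round(x, 1) for x in band_68])  # 218 [208.8, 227.2]

A school average falling outside the wider bands is unusual among schools of similar average ability, which is the comparison the preceding example describes.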
Table 8.7 (Levels 5-14, Grades K-8)
Correlations (r), Prediction Constants (b and c), and Standard Errors of Estimate (Sy.x) for School Averages: Standard Age Scores, Cognitive Abilities Test, Form 6, and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A, Spring 2000 National Standardization. Separate panels report r, b, c, and Sy.x for the Verbal, Quantitative, Nonverbal, and Composite standard age scores, by grade (K-8) and test level (5-14), for each ITBS test and total score (Vocabulary through the Composite, plus Word Analysis, Listening, and the Reading Profile Total). Totals marked "-" do not include Computation; totals marked "+" include Computation. [The tabled values are not reproduced in this transcription.]
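Read as a conventional least-squares prediction equation, each row of Table 8.7 supplies what is needed to predict a school's average ITBS developmental standard score (DSS) from its average CogAT standard age score (SAS) and to attach an uncertainty band to that prediction. The Python sketch below assumes the usual interpretation of b as the slope, c as the intercept, and Sy.x as the standard error of estimate; the numbers used are illustrative placeholders, not values taken from the table.

# Minimal sketch: predicting a school-average ITBS developmental standard
# score (DSS) from a school-average CogAT standard age score (SAS), using
# constants of the kind reported in Table 8.7. All values are illustrative.

def predict_dss(sas_mean, b, c):
    """Assumed linear prediction form: predicted DSS = b * SAS + c."""
    return b * sas_mean + c

def prediction_band(predicted, s_yx, k=1.0):
    """Return a +/- k standard-error-of-estimate band around the prediction."""
    return predicted - k * s_yx, predicted + k * s_yx

# Hypothetical constants for one test and level (not from the table):
b, c, s_yx = 1.25, 60.0, 6.5
sas = 105.0                       # a school's average standard age score
dss_hat = predict_dss(sas, b, c)
low, high = prediction_band(dss_hat, s_yx)
print(f"predicted school-average DSS = {dss_hat:.1f} ({low:.1f} to {high:.1f})")

Under the same assumptions, the standard error of estimate is tied to the correlation by Sy.x = SDy * sqrt(1 - r^2), which is why larger correlations go hand in hand with smaller standard errors of estimate.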
PART 9
Technical Considerations for Other Iowa Tests

Other assessments have been developed in conjunction with The Iowa Tests to support local testing programs. These materials include the:
• Iowa Tests of Basic Skills® Survey Battery (Levels 7–14)
• Iowa Early Learning Inventory™
• Iowa Writing Assessment (Levels 9–14)
• Constructed-Response Supplement to The Iowa Tests (Levels 9–14)
• Listening Assessment for ITBS® (Levels 9–14)
• Iowa Algebra Aptitude Test™
Information about selected aspects of these tests follows.

Iowa Tests of Basic Skills Survey Battery
The Iowa Tests of Basic Skills Survey Battery consists of achievement tests in the areas of Reading, Language, and Mathematics. The Interpretive Guide for Teachers and Counselors and the Interpretive Guide for School Administrators contain the content specifications and skill descriptions for each level of the ITBS Survey Battery.

Description of the Tests
The ITBS Survey Battery contains tests in Reading, Language, and Math with questions from the corresponding tests of the Complete Battery. Each test takes 30 minutes to administer.

Test Development
The ITBS Survey Battery reflects the same content emphasis as the Complete Battery. It was developed from items in the Complete Battery, partitioned into non-overlapping levels so that average item difficulty, level by level, was approximately the same as in the Complete Battery. Where clusters of items shared a common stimulus, care was taken to maintain the balance of stimulus types.

Reading
At Levels 7 and 8, the first part of the test measures reading vocabulary. The second part has questions measuring literal comprehension and questions measuring the ability to make inferences. At Levels 9 through 14, the first part of the Reading test consists of a representative set of items from the Complete Battery Vocabulary test. The second part, Reading Comprehension, was developed with passages and questions from the Reading test in the Complete Battery. Passages were chosen to reflect genres included in the Complete Battery.

Language
At Levels 7 and 8, the Language tests in the Survey Battery assess four content areas: Spelling, Capitalization, Punctuation, and Usage and Expression. The numbers of questions per skill are the same as those for the Complete Battery. At Levels 9 through 14, the Survey Battery includes five skill categories corresponding to the major language skills in the Complete Battery: Spelling, Capitalization, Punctuation, Usage, and Expression. The questions were selected to be representative in content and difficulty of those in the Complete Battery.

Mathematics
In the Survey Battery for Levels 7 through 14, a single test covers Math Concepts, Problem Solving, and Data Interpretation. Separately timed sections measure Computation and Estimation. The relative emphasis of the major skill areas in the Survey Battery is approximately the same as in the Complete Battery. Score reports provide information about major skills in mathematics, but not subskills.

Other Scores
The Survey Total is the average of the standard scores from the Reading, Language, and Math tests.

Standardization
As part of the 2000 fall national standardization, Forms A and B of the ITBS Survey Battery were administered so that each student took one form of the Complete Battery and the alternate form of the Survey Battery. Raw score to standard score conversions for each form of the ITBS Survey Battery were established by first determining comparable raw scores on Survey and Complete Battery subtests via smoothed equipercentile methods and then attaching standard scores to the Survey raw scores. Because of the joint administration of each ITBS Survey Battery with the alternate form of the Complete Battery, comparable scores for Form A of the ITBS Survey were obtained through Form B of the Complete Battery, and comparable scores for Form B of the ITBS Survey were obtained through Form A of the Complete Battery. Comparable scores on the two batteries were based on approximately 3,000 students per level per test form.

For more information about the procedures used to establish norms and score conversions for the ITBS Survey Battery, see Parts 2 and 4.

Test Score Characteristics
Raw score means and standard deviations and internal-consistency reliability estimates for Form A of the Survey Battery for spring testing appear in Table 9.1. Complete technical information about Forms A and B of the Survey Battery is provided in the Norms and Score Conversions manual for each form.

Table 9.1
Test Summary Statistics
Iowa Tests of Basic Skills — Survey Battery, Form A
2000 National Standardization Data

Level | Reading: Items, Mean, SD, K-R 20 | Language: Items, Mean, SD, K-R 20 | Math with Computation: Items, Mean, SD, K-R 20 | Math without Computation: Items, Mean, SD, K-R 20
7   | 40, 26.55, 8.94, .92 | 34, 23.15, 6.68, .87  | 40, 27.11, 6.37, .84 | 27, 17.92, 4.36, .77
8   | 44, 28.95, 8.70, .90 | 42, 30.10, 7.40, .88  | 50, 33.64, 8.19, .88 | 33, 21.41, 5.88, .84
9   | 27, 16.94, 6.48, .89 | 43, 26.69, 8.78, .90  | 31, 20.85, 6.10, .86 | 23, 15.57, 4.67, .82
10  | 30, 17.50, 6.74, .88 | 47, 27.21, 9.37, .90  | 34, 20.12, 6.38, .85 | 25, 14.36, 4.83, .80
11  | 32, 20.39, 6.67, .87 | 51, 30.92, 10.58, .92 | 37, 22.80, 7.52, .88 | 28, 16.68, 5.91, .85
12  | 34, 21.67, 7.24, .88 | 54, 33.40, 11.30, .92 | 40, 25.82, 7.87, .89 | 30, 19.64, 6.23, .87
13  | 36, 21.23, 7.76, .89 | 57, 32.43, 11.42, .92 | 43, 25.63, 8.39, .89 | 33, 19.93, 6.92, .88
14  | 37, 21.41, 7.97, .89 | 59, 34.39, 11.46, .91 | 46, 25.32, 8.74, .89 | 35, 19.82, 7.08, .87
Average K-R 20: Reading .89, Language .90, Math with Computation .87, Math without Computation .84

Iowa Early Learning Inventory
The Iowa Early Learning Inventory (IELI) is a questionnaire for teachers to rate student behavior in six areas related to school learning. The Teacher's Directions and Interpretive Guide for the IELI contains descriptions of the six scales and specific information for interpreting the scales. The IELI is intended for use with kindergarten and first-grade students.

Description of the Inventory
The IELI includes the following scales: General Knowledge, Oral Communication, Written Language, Math Concepts, Work Habits, and Attentive Behavior. It takes an experienced teacher about 10 minutes per student to complete the ratings.
General Knowledge: This scale measures the acquisition of general information and facts expected of five- and six-year-old children.
Oral Communication: How well a student is able to communicate ideas, describe what is seen or heard, or ask questions is the focus of this scale.
Written Language: A student's ability to recognize and write letters or simple words is assessed by this scale.
Math Concepts: This scale evaluates how well a student understands and is able to use beginning mathematical ideas and processes.
Work Habits: Behaviors indicative of success in the classroom—persistence, resourcefulness, and independence—are assessed with this scale.
Attentive Behavior: The questions on this scale relate to a student's ability to focus on classroom activities.

Test Development
Extensive research has been done by early childhood educators on the characteristics that contribute to successful classroom learning. From this research, the IELI's six scales were established. The focus was limited to behaviors that classroom teachers would be likely to observe in day-to-day activities. Care was taken to include behaviors that relate to learning, rather than socialization.

Standardization
The national norms for the IELI were obtained from a subsample of kindergarten students included in the spring 2000 national standardization of the Iowa Tests of Basic Skills. These norms are the basis for reporting a student's score as "Developed," "Developing," or "Delayed" (about 60 percent, 30 percent, and 10 percent of the norming sample, respectively). Reliability coefficients of IELI scales range from .81 to .93. The percent of kindergarten students rated at each of these categories for the six scales in the 2000 spring national standardization and correlations between IELI ratings and ITBS scores are reported in the Teacher's Directions and Interpretive Guide for the IELI (Hoover, Dunbar, Frisbie & Qualls, 2003).
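Because the "Developed," "Developing," and "Delayed" categories are defined by the norming distribution (roughly 60, 30, and 10 percent of the kindergarten norming sample), a reporting rule of this kind amounts to deriving two cutoffs from the norming data and comparing a new rating against them. The Python sketch below is a minimal illustration of that idea, assuming higher ratings indicate more developed behavior; it is not the operational IELI scoring procedure, and the scale scores used are hypothetical.

# Illustrative only: derive approximate category cutoffs from a norming
# distribution so about 10% fall in "Delayed", the next 30% in "Developing",
# and the top 60% in "Developed", then classify a new rating.

def approx_percentile(sorted_scores, p):
    """Score near the p-th percentile (simple rank-based approximation)."""
    idx = max(0, min(len(sorted_scores) - 1, int(p / 100 * len(sorted_scores))))
    return sorted_scores[idx]

def derive_cutoffs(norming_scores):
    s = sorted(norming_scores)
    delayed_max = approx_percentile(s, 10)     # bottom ~10 percent
    developing_max = approx_percentile(s, 40)  # next ~30 percent
    return delayed_max, developing_max

def classify(score, delayed_max, developing_max):
    if score <= delayed_max:
        return "Delayed"
    if score <= developing_max:
        return "Developing"
    return "Developed"

# Hypothetical norming ratings on one IELI scale:
norming = [3, 5, 6, 7, 7, 8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14]
d_max, dev_max = derive_cutoffs(norming)
print(classify(9, d_max, dev_max))  # "Developing" for this hypothetical sample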
Iowa Writing Assessment
The Iowa Writing Assessment measures a student's ability to generate, organize, and express ideas in written form. As a performance assessment, it adds another dimension of information to the evaluation of language arts achievement. Although multiple-choice and short-answer tests typically provide highly reliable measurement of language skills, such tests tap a student's editing skills rather than composition skills. Norm-referenced evaluation of student writing supplements information obtained from the Language tests in the Iowa Tests of Basic Skills. The norms give a national perspective to writing that students do on a regular basis in school. More information about this test can be obtained from the Iowa Writing Assessment Manual for Scoring and Interpretation.

Description of the Test
The Iowa Writing Assessment measures a student's ability to write four types of essays: narrative, descriptive, persuasive, and expository.
Narrative: A narrative tells a story. It has characters, setting, and action. The characters, the setting, and the problem are usually introduced in the beginning. The problem reaches a high point in the middle. The ending may resolve the problem. A narrative may be a fictional story, a factual account of a real-life experience, or some combination of both.
Descriptive: A description creates a vivid image of a person, place, or thing. It enables the reader to share the writer's experience by appealing to the senses.
Persuasive: A persuasive essay states and supports an opinion by drawing on the writer's experience or the experience of others, or by citing authority. Good persuasive writing considers the audience and presents an argument likely to be effective.
Expository: Expository writing conveys information to help the reader understand a process, procedure, or concept. Telling how something is made or done, reporting on an experience, or exploring an idea are examples of expository writing.

Directions for local scoring of the Iowa Writing Assessment are provided to schools. These include how to select and train raters, and how to organize scoring sessions. In addition, materials are provided to help raters make valid, consistent judgments about student writing. These materials include a scoring protocol and anchor papers for each prompt, and a booklet of training papers. The anchor papers are actual examples of student writing. They provide concrete illustrations of the criteria for each score point.

Student essays can be scored in two ways. The focused-holistic score gives an overall rating of the quality of an essay. The analytic scores provide ratings on four separate scales (Ideas/Content, Organization, Voice, and Conventions). A four-point scale is used for all ratings. Each essay is scored by two independent raters. When the two scores differ by more than one point, the essay is rescored by a supervisor. The final score for each essay is the average of the two closest ratings. The supervisor's rating becomes the final score if it is midway between the other two ratings. Papers that are blank, illegible, not written in English, or not on the assigned topic are deemed unscorable.

For both scoring methods, the protocols describe each point on the score scale in detail. The protocols reflect criteria unique to each type of essay, along with fluency of ideas and development of a personal voice. In general, the emphasis is on thoughtful content, logical organization, and original style. In focused-holistic scoring, language mechanics affect scores only if they inhibit understanding. In analytic scoring, language mechanics are scored on the Conventions scale.
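The rating-resolution rules described above (two independent raters, a supervisor rescore when the ratings differ by more than one point, and a final score equal to the average of the two closest ratings, or the supervisor's rating when it falls midway between the others) can be expressed compactly. The Python sketch below is a minimal illustration of that logic as stated in the text; the function and variable names are illustrative, not part of any published scoring materials.

# Resolve an essay's final score from two independent ratings on the
# four-point scale, following the resolution rules described above.

def final_score(r1, r2, supervisor=None):
    """r1, r2: independent ratings; supervisor: rescore used only when needed."""
    if abs(r1 - r2) <= 1:
        # Ratings agree within one point: the final score is their average.
        return (r1 + r2) / 2
    # Ratings differ by more than one point: a supervisor rescores the essay.
    if supervisor is None:
        raise ValueError("supervisor rescore required when ratings differ by > 1")
    if abs(supervisor - r1) == abs(supervisor - r2):
        # Supervisor's rating is midway between the two: it becomes the final score.
        return float(supervisor)
    # Otherwise the final score is the average of the two closest ratings,
    # i.e., the supervisor's rating and the original rating nearer to it.
    closer = r1 if abs(supervisor - r1) < abs(supervisor - r2) else r2
    return (supervisor + closer) / 2

print(final_score(3, 4))      # 3.5 -- ratings within one point
print(final_score(1, 3, 2))   # 2.0 -- supervisor midway between the raters
print(final_score(1, 4, 3))   # 3.5 -- average of the two closest ratings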
For both methods, the protocols emphasize that effective discourse involves the ability to reason, to develop ideas, to write for an audience, and to proceed from one idea to the next.

Test Development
Prompts for each type of writing were developed for students in grades 3 and 4 (Levels 9–10), grades 5 and 6 (Levels 11–12), and grades 7 and 8 (Levels 13–14). Prompts were assigned to levels based on their difficulty. Factors considered in designing each prompt were whether it represents the most salient features of that type of writing; whether it clearly defines the specific demands of that type of writing; and whether it is likely to elicit good writing from students regardless of gender, race or ethnicity, or geographic region. For each level, two prompts were developed and standardized for each type of essay.

Standardization
During the 1992 national standardization, prompts were administered in a partially balanced, incomplete block design. Pairs of prompts were spiraled by classroom. Each student also took a form of the ITBS. The essays in each group of paired prompts were weighted so the distribution of ITBS Vocabulary scores matched the distribution from the 1992 spring national standardization of Form K. Approximately 20,000 essays, written by students representing 323 buildings in 32 states, were scored in the standardization study. The partially balanced, incomplete block design allowed norms to be computed for each prompt based on randomly equivalent groups of students. It also allowed estimation of correlations between prompts. After the 2000 standardization, the norms for the Iowa Writing Assessment were adjusted. The original standardization sample was reweighted so that the Vocabulary scores matched the Vocabulary distribution in the 2000 spring standardization.

Test Score Characteristics
The procedures for training scorers and conducting scoring sessions helped ensure that scores from the Iowa Writing Assessment would be reliable. Witt (1993) studied the consistency of scores between readers and between prompts. Intraclass correlations were calculated for the analytic scores, for the analytic total, and for the focused-holistic score. Table 9.2 reports average correlations across grades.

Reader reliability is estimated to answer these questions: How accurate is the rating of this student's essay? What is the relation between the obtained rating and the rating the student would have received from a different rater? The first two columns of Table 9.2 display estimates of reader reliability from the national standardization. These values are similar to reader reliabilities found in large-scale writing assessments.

Table 9.2
Average Reliability Coefficients, Grades 3–8
Iowa Writing Assessment
1992 National Standardization

Mode of Discourse   Rater Reliability (1)     Score Reliability (2)     Generalizability (3)
                    Holistic   Analytic       Holistic   Analytic       Holistic   Analytic
Narrative           .79        .75            .52        .60            .35        .52
Descriptive         .76        .71            .42        .53            .39        .54
Persuasive          .82        .76            .37        .53            .34        .49
Expository          .77        .78            .46        .61            .36        .52
Average             .78        .75            .44        .57            .36        .52

(1) Average of two raters
(2) Correlations between two essays in the same mode of discourse, essay score average of two ratings
(3) Average correlations between essays in different modes of discourse

Estimates of score reliability answer the following questions: How accurate is a student's score on this essay as an indicator of ability in this type of writing? What is the expected relation between this score and a score from a different prompt? The second two columns of Table 9.2 display estimates of score reliability from the national standardization. Score reliabilities were highest for narrative writing and lowest for descriptive and persuasive writing. Research confirms that score reliability for a single essay is lower than the internal-consistency or parallel-forms reliability of a standardized test that has many discrete items. Scoring that emphasizes content, organization, and voice (features that vary from topic to topic) is less reliable than scoring that emphasizes mechanics, which are relatively constant from topic to topic. Score reliabilities from the standardization should be considered lower limits. Participants were assigned prompts on the basis of demographic, not curricular, characteristics. When schools select prompts, higher score reliabilities are expected because instruction tends to create consistent performance in that type of writing.

The correlations between scores from different essay types are reported in the final two columns of Table 9.2. These measure the generalizability of performance on topics requiring different types of writing. Score generalizabilities are directly comparable to score reliabilities—both are correlations between different essays written by the same student. The score reliabilities reflect consistency in one type of writing; generalizabilities reflect consistency across types, which is usually lower than within a type.

Correlations between the prompts on the Iowa Writing Assessment and the ITBS Language Total indicate the overlap in skills assessed on the two tests. The correlations included in Table 9.3 are adjusted for unreliability. They estimate the true score relationship between the tests. These correlations show that the Iowa Writing Assessment and the ITBS Language Total tap different aspects of achievement in language arts.

Table 9.3
Correlations and Reliability of Differences
Iowa Writing Assessment and Iowa Tests of Basic Skills Language Total
1992 National Standardization

Mode of Discourse   Reliability of Differences      Correlations
                    Holistic   Analytic             Holistic   Analytic
Narrative           .44        .58                  .53        .47
Descriptive         .44        .55                  .44        .42
Persuasive          .42        .59                  .42        .37
Expository          .44        .60                  .48        .45
Average             .43        .58                  .47        .43

Constructed-Response Supplement to The Iowa Tests
The Constructed-Response Supplement (CRS) is used together with the Complete Battery or the Survey Battery of the ITBS. The CRS measures learning outcomes in reading, language, and math, using an open format.
The tests assess content objectives measured in the multiple-choice format of the ITBS, along with others where an open format is particularly effective. More information about the CRS is found in the Constructed-Response Supplement to The Iowa Tests Manual for Scoring and Interpretation. There is a separate manual for each test level. Description of the Tests Students write their answers to open-ended questions in the test booklets. Teachers use the guidelines provided to score the tests; they record the results on forms that can be processed by the Riverside Scoring Service. Most questions have more than one correct response and can be answered by various strategies. Students answer with phrases, sentences, paragraphs, drawings, diagrams, or mathematical expressions. The administration time for each test is 30 minutes. 153 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 154 Thinking about Reading Students show their ability to understand and interpret what they read. At each level, the Reading Supplement consists of one passage and several questions. Some questions require a one- or twosentence response. Other questions can be answered in a few words. Responses are scored on a 0-1 or a 0-1-2 scale. The maximum number of points awarded for a question depends on its complexity. Some questions have several parts, and points are given for each part. Thinking about Language The Language Supplement assesses students’ ability to develop and organize ideas and to express them in standard written English. At each grade level, the Language tests include three parts: editing, revising, and generating. In the editing part, students identify which parts of three short stories, reports, or letters need editing. They can make changes in spelling, capitalization, punctuation, and the use of words or phrases. In the revising part, students revise a story by completing sentences, changing them to express an idea more clearly, correcting grammatical mistakes, and completing the story. In the generating part, students are given a specific topic and told to define the subject of the story, write at least three ideas for the story, and write a complete sentence to be included in the story. The number of items and the total score points vary by level, from 23 items (38 score points) to 33 items (60 score points). Responses are scored on scales of 0-1, 0-1-2, or 0-1-2-3, depending on item complexity. Scoring guidelines and keys are used to assign points. For example, items in the editing part are worth two points: one point for identifying the error and one point for correcting it. In the revising and generating parts, guidelines instruct scorers to accept reasonable answers and to ignore errors in mechanics unrelated to the skill measured by the item. Thinking about Mathematics The Mathematics Supplement assesses problem solving, data interpretation, conceptual understanding, and estimation skills. Open-ended questions are presented alone or in groups related to a common data source. Items require students to analyze and solve problems and to describe their thinking using words, diagrams, graphs, symbols, calculations, and equations or inequalities. Students may use a variety of solution strategies to solve problems; they must also make connections among mathematical concepts and procedures. Questions require students to explain their reasoning, show their work, and justify their conclusions. 
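Because CRS items carry different maximum point values (0-1, 0-1-2, or 0-1-2-3 scales, with editing items in the Language Supplement worth one point for identifying an error and one for correcting it), a student's raw score is the sum of the points awarded across items. The short Python sketch below illustrates that bookkeeping under the rules stated above; the item identifiers and responses are hypothetical, and the helper names are not part of any published scoring key.

# Illustrative tally of Constructed-Response Supplement raw score points.

def score_editing_item(identified, corrected):
    """Editing items: 1 point for identifying the error, 1 for correcting it."""
    return int(identified) + int(corrected)

def raw_score(awarded_points, max_points):
    """Sum awarded points, checking each against the item's maximum value."""
    total = 0
    for item, points in awarded_points.items():
        if not 0 <= points <= max_points[item]:
            raise ValueError(f"{item}: {points} exceeds maximum {max_points[item]}")
        total += points
    return total

# Hypothetical items: two editing items (max 2 points) and one open-ended item (max 3).
max_points = {"edit_1": 2, "edit_2": 2, "open_1": 3}
awarded = {
    "edit_1": score_editing_item(identified=True, corrected=True),   # 2 points
    "edit_2": score_editing_item(identified=True, corrected=False),  # 1 point
    "open_1": 2,                                                     # partial credit
}
print(raw_score(awarded, max_points))  # 5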
154 The number of items and the total score points vary by level, from 13 items (19 score points) to 17 items (24 score points). Responses are scored on scales of 0-1 or 0-1-2. The scoring guidelines describe characteristics of a response at each score level. A 2-point response demonstrates complete understanding of the math concepts and processes involved in that item, a 1-point response demonstrates partial understanding, and a 0-point response demonstrates no understanding. For each question, the scoring key describes the kinds of answers that would earn full or partial credit. Test Development The goal of test development was to create situations that would elicit different patterns of thinking in different students yet elicit equally correct responses to questions (Perkhounkova, Hoover & Ankenmann, 1997). The content specifications for test development were similar to those used to develop the ITBS. However, the open format created opportunities to tap a wider range of process skills, especially in language and mathematics. Test materials were reviewed for balance in terms of gender, race and ethnicity, and geography as described in Parts 3 and 7 and were field tested in a national item tryout. Joint Scaling with the ITBS CRS results are reported as raw scores. If combined with a multiple-choice ITBS subtest, they can be reported as developmental standard scores, national percentile ranks, and other derived scores. The combined CRS/multiple-choice standard score conversions were developed with data from the fall 1997 scaling study in which national samples of students in grades 3 through 8 took the CRS and one or more subtests of the ITBS. Test Score Characteristics Internal-consistency reliability coefficients for the CRS appear in Table 9.4. The obtained reliabilities are comparable to reliability estimates for multiplechoice tests in Language and Math. In Reading, the reliability estimates reflect the smaller number of items in that test. The correlations between CRS scores and multiple-choice tests of the ITBS appear in Table 9.5 along with the reliabilities of the differences. 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 155 Table 9.4 Internal-Consistency Reliability Constructed-Response Supplement 1997 Fall Scaling Study Level Reading Language Math 9 10 11 12 13 14 .67 .68 .61 .65 .70 .65 .80 .81 .79 .81 .86 .85 .75 .78 .75 .82 .80 .82 Average .66 .82 .79 Listening Assessment for ITBS Directions for Administration and Score Interpretation. Description of the Test The main purposes of the test are: (a) to assess strengths and weaknesses in the development of listening skills so effective instruction can be planned to meet individual and group needs; (b) to monitor instruction so effective teaching methods can be identified; and (c) to help teachers and students understand the importance of good listening strategies. Test Development Table 9.5 Correlations and Reliabilities of Differences Constructed-Response Supplement and Corresponding ITBS Subtests 1997 Fall Scaling Study Correlations Level Reading Language Math 9 10 11 12 13 14 .60 .68 .57 .59 .63 .66 .67 .70 .71 .74 .72 .73 .69 .76 .70 .79 .77 .80 Average .62 .71 .75 Content specifications are based on research in the teaching and assessment of listening comprehension. 
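Table 9.5 above pairs each CRS/ITBS correlation with the reliability of the difference between the two scores. One standard way to estimate the reliability of a difference score, assuming the two measures have been placed on scales with approximately equal variances, combines the two reliabilities with the correlation between the measures; the Python sketch below shows that textbook formula. It is offered as a reminder of how such coefficients are defined in general, not as a reconstruction of the exact computation behind Table 9.5.

# Textbook estimate of the reliability of a difference score D = X - Y when
# X and Y have approximately equal variances:
#   rel_D = (0.5 * (rel_X + rel_Y) - r_XY) / (1 - r_XY)
# A small rel_D means the two tests rank examinees similarly, so observed
# differences between them are composed largely of measurement error.

def reliability_of_difference(rel_x, rel_y, r_xy):
    if r_xy >= 1.0:
        raise ValueError("correlation must be below 1.0")
    return (0.5 * (rel_x + rel_y) - r_xy) / (1.0 - r_xy)

# Hypothetical values: two tests with reliabilities .80 and .75 correlating .60.
print(round(reliability_of_difference(0.80, 0.75, 0.60), 2))  # 0.44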
Items in the Listening Assessment measure six major skills: Literal Meaning: details about persons, places, objects, and ideas Inferential Meaning: importance of details; cause and effect; drawing conclusions Following Directions: decoding; verbal, numerical, and spatial relationships; sequence Visual Relationships: verbal-to-visual transformations; word meaning in context Numerical/Spatial/Temporal Relationships: analyzing and visualizing concepts of number, space, and time Speaker’s Purpose, Point of View, or Style: main idea, organization, purpose, tone, and context Reliability of Differences Level Reading Language Math 9 10 11 12 13 14 .20 .16 .18 .25 .28 .17 .24 .40 .35 .38 .45 .42 .22 .19 .27 .26 .21 .18 Average .21 .37 .22 Listening Assessment for ITBS The Listening Assessment is a special supplement to the Iowa Tests of Basic Skills for grades 3 through 9. It is an upward extension of the Primary Battery Listening tests for kindergarten through grade 3. More information about the test can be found in the The test includes 95 items in six overlapping levels, 9 through 14. Standardization The national standardization sample was selected as a subsample of schools participating in the fall 1992 national standardization of the Iowa Tests of Basic Skills. Selection characteristics were region, district enrollment, type of school, and socioeconomic status. Because the sample was not representative of the national population, distributions of the ITBS Vocabulary score were obtained for the total ITBS representative sample and the Listening Assessment sample. These distributions were used to weight the sample that took the Listening Assessment so that it represented the ability of the national sample. 155 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 156 Table 9.6 Test Summary Statistics Listening Assessment for ITBS 1992 Spring National Standardization Grade Number of Items Raw Scores Mean SD SEM Standard Scores Mean SD SEM Reliability (K-R 20) 3 4 5 6 7 8 31 33 35 36 38 40 17.7 4.8 2.5 20.7 4.7 2.6 23.3 4.5 2.6 24.3 5.0 2.6 25.4 5.5 2.6 26.1 6.0 2.8 184.4 19.0 10.3 199.2 20.8 11.8 213.6 23.1 13.7 226.5 24.8 13.1 238.3 26.6 12.9 248.6 28.5 13.4 .72 .69 .67 .73 .78 .79 Table 9.7 Correlations Between Listening and ITBS Achievement Iowa Tests of Basic Skills – Listening Assessment and Complete Battery, Form K 1992 National Standardization Grade Test 3 4 5 6 7 8 Vocabulary .46 (.59) .52 (.67) .63 (.84) .63 (.79) .62 (.76) .65 (.79) Reading Comprehension .43 (.54) .49 (.64) .60 (.78) .63 (.78) .63 (.77) .60 (.72) .47 (.58) .54 (.68) .65 (.83) .67 (.82) .67 (.80) .66 (.78) Spelling .12 (.20) .35 (.45) .39 (.51) .39 (.49) .37 (.46) .40 (.49) Capitalization .24 (.31) .46 (.61) .49 (.67) .47 (.62) .48 (.61) .48 (.60) Punctuation .22 (.28) .41 (.56) .48 (.66) .51 (.67) .46 (.59) .46 (.59) Usage and Expression .43 (.55) .51 (.67) .54 (.73) .59 (.75) .55 (.69) .58 (.71) .45 (.55) .51 (.64) .55 (.70) .56 (.68) .54 (.65) .55 (.65) Concepts and Estimation .42 (.54) .55 (.72) .59 (.79) .63 (.80) .61 (.75) .56 (.69) Problem Solving & Data Interpretation .48 (.62) .56 (.75) .58 (.78) .65 (.84) .63 (.79) .62 (.78) Computation Reading Total Language Total .30 (.38) .38 (.49) .41 (.54) .50 (.63) .43 (.54) .41 (.50) Math Total .48 (.60) .58 (.78) .62 (.80) .68 (.84) .65 (.79) .64 (.77) Core Total .51 (.61) .59 (.73) .66 (.83) .69 (.83) .67 (.79) .67 (.78) Social Studies .45 (.59) .54 (.72) .58 (.79) .67 (.82) .59 (.74) .59 (.73) Science .41 (.53) .53 (.70) .59 (.80) .60 (.77) .59 (.74) .62 (.77) Maps and 
Diagrams .33 (.43) .53 (.71) .57 (.78) .60 (.79) .55 (.71) .56 (.70) Reference Materials .27 (.34) .50 (.64) .55 (.73) .59 (.75) .54 (.67) .58 (.70) Sources of Information Total .47 (.58) .56 (.71) .60 (.78) .63 (.79) .59 (.72) .61 (.73) Composite .52 (.62) .60 (.74) .67 (.84) .71 (.85) .68 (.80) .68 (.79) Note: Correlations in parentheses are adjusted for unreliability. 156 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 157 Test Score Characteristics Internal-consistency reliability coefficients (KuderRichardson Formula 20) were established with data from the standardization sample. Means, standard deviations, reliability coefficients, and standard errors of measurement for raw scores and developmental standard scores are shown in Table 9.6. Reliability coefficients of listening tests are almost always lower than those of similar-length tests of other basic skills test areas. This is probably because listening comprehension, like reading comprehension, represents cognitive processes ranging from simple recall and attention span to thinking that involves application and generalization. Homogeneous tests, such as vocabulary and spelling, yield relatively high reliability per unit of testing time. Multi-dimensional tests that sample cognitive processes from different domains tend to be less reliable. The reliability coefficients from the national standardization should be viewed as lower limits. They were obtained under standardization conditions with no special preparation or motivation. In most classrooms, reliability can probably be improved by (a) discussing the importance of being a good listener and the rationale of the Listening Assessment, (b) rehearsing the presentation of test items to ensure effective delivery, and (c) pacing test administration so the students pay attention. The correlations between standard scores on the Listening Assessment and the ITBS appear in Table 9.7. These are based on all students from the 1992 fall standardization who took the Listening Assessment and the Complete Battery of the ITBS. Correlations adjusted for unreliability based on the K-R 20 reliability coefficients for the two sets of variables are also presented. These estimate correlations between scores from perfectly reliable tests. They indicate whether the two variables measure the same ability or unique abilities. Most of the adjusted coefficients in Table 9.8 are considerably below 1.00, which suggests the Listening Assessment measures something different from the other ITBS tests. Of particular interest are the adjusted correlations with reading. The cognitive processes involved in reading and listening appear similar, even though reading and listening involve different senses. The ITBS Listening and Reading tests do have discriminant validity, however, as evidenced by the adjusted correlations ranging from .54 to .78. This range of correlations (between “perfectly reliable” tests) is observed for tests as different as Listening and Math (adjusted correlations ranging from .60 to .84), or Listening and Science (.53 to .80). Although reading and listening have strong similarities in development and assessment, their measurement in The Iowa Tests allows for reliable assessment of their differences. Predictive Validity Data collected in the standardization illustrate how well the Listening test of the ITBS predicts later achievement. 
Data from students who took Level 8 in the spring of grade 2 and Level 9 in the fall of grade 3 were used to compare the predictive ability of the Listening test to the predictive ability of other subtests. Table 9.8 presents the correlations between the Listening test taken in the spring and subsequent achievement as measured by the ITBS in the fall.

Table 9.8
Correlations Between Listening Grade 2 and ITBS Grade 3

ITBS Test                        Listening
Vocabulary                       .49
Reading Total                    .52
Language Total                   .42
Math Total                       .51
Social Studies                   .50
Science                          .48
Sources of Information Total     .49
Composite with Computation       .55

Integrated Writing Skills Test
The Integrated Writing Skills Test (IWST) measures editing skills by having students evaluate passages that resemble school-based writing. More information about the IWST can be obtained from the Integrated Writing Skills Test Score Conversions and Technical Summary.

Description of the Tests
The content and style of the passages in the Integrated Writing Skills Test are patterned after the writing of students in grades 3 through 8. Items ask students to judge the appropriateness of selected parts of the passage and to indicate where changes are needed. A total score is reported, as well as scores in spelling, capitalization, punctuation, usage, expression, and multiple skills. Raw scores can be converted to scaled scores, grade equivalents, and percentile ranks to interpret performance as it relates to the regular ITBS battery.

Test Development
The Integrated Writing Skills Test was developed from the same content specifications as the four-part Language test in the regular ITBS battery. Items are presented in an age-appropriate story, report, or essay written in the voice of a student at a particular grade level. Items measure a student's ability to distinguish between correctly and incorrectly spelled words; to apply generally accepted conventions for capitalization, punctuation, and English usage; and to select effective language for writing. These are essential skills in development of the ability to write effectively. An important feature of the IWST is its increased emphasis on usage and written expression from grade 3 to grade 8. The emphasis on items that tap written expression increases across levels, while the emphasis on language mechanics (spelling, capitalization, and punctuation) decreases. The number of items that require students to integrate language mechanics and usage in deciding the best answer also increases with test level. Such items, classified as multiple-skills questions, might involve identifying the improper use of a comma to separate independent clauses, or recognizing that a homophone of a word was used and should be changed. Including a higher proportion of items that tap written expression and multiple language skills makes the IWST especially useful as a measure of the editorial skills used in the revision stage of the writing process.

Standardization
Forms K and M of the IWST were equated to the four-part Language tests in two special studies. In each study, a single-group equating design was used in which students took the IWST and the four-part Language test from the Complete Battery of the ITBS. Raw-score equipercentile equivalents were determined using analytic smoothing. From these relationships, raw score to standard score conversions were developed for the IWST.
Approximately 1,000 students per grade participated in the Form K study; approximately 800 per grade participated in the Form M study. Test Score Characteristics Raw-score means and standard deviations as well as internal-consistency reliability estimates for the IWST are given in Table 9.9. The relationship between an integrated approach to language skills assessment and the traditional ITBS approach was studied by Bray and Dunbar (1994). They found that the two formats were similar in average difficulty and discrimination, and that the distributions of item p-values were similar. Table 9.9 Test Summary Statistics Integrated Writing Skills Test, Form M Grade Number of Items 158 3 4 5 6 7 8 38 44 49 52 55 57 Fall Mean SD SEM K-R 20 18.65 7.92 2.7 .88 23.36 8.67 3.0 .88 27.73 9.66 3.1 .90 31.19 10.58 3.1 .91 33.28 11.17 3.2 .92 35.90 9.91 3.3 .89 Spring Mean SD SEM K-R 20 23.34 8.05 2.6 .89 27.08 8.90 2.9 .90 30.70 9.83 3.0 .91 33.39 10.57 3.0 .91 35.31 10.98 3.1 .92 37.36 9.80 3.2 .89 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 159 Table 9.10 Correlations Between IWST and ITBS Reading and Language Tests Content Area Grade 4 Reading .74 IWST Language Test .64 ITBS Language Total Grade 8 .74 .61 .62 .62 .81 .86 .71 .80 .76 .83 .63 .94 .95 .97 .96 .64 .72 .74 .77 .72 .53 .60 .86 .86 .76 .54 .61 .86 .68 .76 .55 .65 .88 .68 .72 .61 .64 .89 .63 .69 .68 .77 Spelling .54 .61 .86 Capitalization .53 .67 .85 .64 Punctuation .53 .67 .86 .65 .71 Usage & Expression .71 .71 .87 .63 .65 .64 .70 .68 .59 .62 .63 .67 .78 .68 .71 .76 .72 .94 .97 .99 .97 .80 .80 .71 .87 .80 .87 .74 Note: Correlations above the diagonal in italics were adjusted for unreliability in both measures. The two formats yielded different reliabilities per unit of testing time, however. Integrated items required more testing time to achieve the same level of reliability. Although the integrated format might require more reading than the traditional format (and show a stronger relation with ITBS Reading scores), this was not necessarily the case. Table 9.10 presents the correlations between Language tests in the two formats as well as the correlations between language and reading. Iowa Algebra Aptitude Test The Iowa Algebra Aptitude Test (IAAT) was developed to help teachers and counselors make informed decisions about initial placement of students in the secondary mathematics curriculum. In making such decisions, recommendations of previous teachers should be given considerable weight. These cannot usually be the only determining factor, however, since a group of junior high or middle school students will typically have had different teachers, and it is unlikely the teachers share a common standard for judging students’ math abilities. Given the desire for objective evidence to supplement teacher recommendations, IAAT was developed to provide a standardized measure of math aptitude. More information about the IAAT can be obtained from the Iowa Algebra Aptitude Test Manual for Test Use, Interpretation, and Technical Support (Fourth Edition). Description of the Test The IAAT provides a four-part profile of students that identifies specific areas of strength and weakness: Interpreting Mathematical Information (18 items, 10 minutes); Translating to Symbols (15 items, 8 minutes); Finding Relationships (15 items, 8 minutes); and Using Symbols (15 items, 10 minutes). Test Development The item development plan included a close examination of current algebra and pre-algebra textbooks. 
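The single-group equipercentile approach described above links a raw score on one test to the raw score on the other test that has the same percentile rank in the same group of examinees; the operational work also applied analytic smoothing, which is omitted here. A minimal, unsmoothed Python sketch of the idea follows, using hypothetical score distributions and one simple definition of percentile rank among several in use.

# Minimal unsmoothed equipercentile linking for a single-group design:
# a raw score on test X is mapped to the raw score on test Y that has the
# closest percentile rank in the same group of examinees.
from bisect import bisect_right

def percentile_rank(scores, x):
    """Percent of scores at or below x (one simple definition among several)."""
    s = sorted(scores)
    return 100.0 * bisect_right(s, x) / len(s)

def equipercentile_equivalent(x, scores_x, scores_y):
    """Y raw score whose percentile rank is closest to that of x on X."""
    target = percentile_rank(scores_x, x)
    candidates = sorted(set(scores_y))
    return min(candidates, key=lambda y: abs(percentile_rank(scores_y, y) - target))

# Hypothetical raw scores for the same students on the two tests:
scores_x = [12, 15, 18, 20, 21, 23, 25, 27, 28, 30]
scores_y = [20, 24, 27, 30, 31, 33, 35, 37, 39, 42]
print(equipercentile_equivalent(21, scores_x, scores_y))  # equivalent raw score on Y

In operational equating, the resulting raw-score equivalents would then be smoothed and carried through the existing raw-score-to-standard-score conversions, as described in the text.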
Research literature in math education was studied to determine current thinking on the beginning of the secondary math curriculum as well as possible promising future directions. The NCTM Standards also were a guiding force in the development of the IAAT. Item development and tryout for the Fourth Edition of the IAAT began in 1988 and continued through 1990. Items in the final forms were selected for content coverage, difficulty level, and discriminating power. In selecting items, the first priority was to match the table of specifications as closely as possible. Given this restriction, an effort was made to select the most discriminating items with difficulty indices between .20 and .80. Standardization The Fourth Edition of the Iowa Algebra Aptitude Test, Forms 1 and 2, was standardized in 1991. Over 8,000 students from 98 public and private school systems across the United States participated. Three stratifying variables were used to select participants in the IAAT standardization: geographic region, district enrollment, and socioeconomic status (SES) of the community. Within each cell of the Region by Size by SES matrix, school districts were randomly selected to participate in the standardization study. In addition, within each geographic region at least one Catholic school was selected to participate. 159 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 160 Table 9.11 Test Summary Statistics Iowa Algebra Aptitude Test – Grade 8 1991 Fall National Standardization Number of Items Interpreting Mathematical Information Translating to Symbols Finding Relationships Using Symbols Total 18 15 15 15 63 Form 1 Mean SD SEM K-R 20 10.03 3.35 1.77 .72 9.55 2.89 1.61 .69 9.14 3.59 1.56 .81 8.87 3.19 1.66 .73 37.59 10.57 3.34 .90 Form 2 Mean SD SEM K-R 20 10.63 3.78 1.77 .78 8.58 2.91 1.67 .67 8.67 3.84 1.63 .82 9.19 3.17 1.65 .73 37.07 11.05 3.32 .91 Test Score Characteristics Table 9.11 provides the descriptive statistics for both forms of the IAAT. The two forms have been equated so that scores on both fall on a common standard score scale. Normative data are currently provided for eighth-grade students only. Also in the table are internal-consistency reliability estimates for the Composite score and each subtest of the IAAT. The IAAT scores have reasonably large reliability coefficients given the length of the subtests. It is important that a test used for selection purposes have evidence of criterion-related validity. Validity evidence for the Fourth Edition of the IAAT was collected in a study in which students took the IAAT in the fall, and first- and second-semester test scores and grades in Algebra 1 were collected. The correlations of the IAAT Composite scores, the two semester exam scores, and the semester grades appear in Table 9.12. In addition, multiple regression analyses were carried out using the IAAT Composite scores and the ITBS Math Total Composite scores as predictor variables and the Algebra 1 grades and test scores as the criterion measures. Of interest was whether the IAAT scores add significantly to predictions of the four criterion variables, given that the ITBS Math Total scores were already available. The rationale for these analyses was that most schools probably have available scores from standardized tests. If the IAAT scores cannot contribute to the prediction of success in Algebra 1 beyond information 160 provided by data on hand, the usefulness of the IAAT would be in doubt. 
The regression analyses showed the IAAT Composite scores did significantly add to the prediction of success in Algebra 1. A common concern in using tests for selection is predictive bias. In addition to content reviews for all types of bias, Barron, Ansley, and Hoover (1991) conducted a study of potential gender bias in predicting success in algebra with the IAAT. They found that IAAT scores did not yield biased predictions of success for females or males. Table 9.12 Correlations Between IAAT and Algebra Grades and Test Scores Iowa Algebra Aptitude Test – Grade 8 1991 Fall National Standardization Algebra 1 IAAT Composite First Semester Exam .69 (.84) Grades .49 (.75) Second Semester Exam .65 (.82) Grades .45 (.74) Note: Correlations in parentheses have been corrected for range restriction. 961464_ITBS_GuidetoRD.qxp 10/29/10 3:16 PM Page 161 Works Cited American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. Becker, D. F. and Dunbar, S. B. (1990, April). Common dimensions in elementary and secondary school survey achievement batteries. Paper presented at the annual meeting of the National Council of Measurement in Education, Boston. The American heritage dictionary of the English language (4th ed.). (2000). New York: Houghton-Mifflin. Becker, D. F. and Forsyth, R. A. (1992). An empirical investigation of Thurstone and IRT methods of scaling achievement tests. Journal of Educational Measurement, 29: 341–354. Andrews, K. M. (1995). The effects of scaling design and scaling method on the primary score scale associated with a multi-level achievement test. Unpublished doctoral dissertation, The University of Iowa, Iowa City. Becker, D. F. and Forsyth, R. A. (1994). Gender differences in mathematics problem-solving and science: A longitudinal analysis. International Journal of Educational Research, 21: 407–416. Ankenmann, R. D., Witt, E. A., and Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36: 277–300. Beggs, D. L. and Hieronymus, A. N. (1968). Uniformity of growth in the basic skills throughout the school year and during the summer. Journal of Educational Measurement, 5: 91–97. Ansley, T. N. and Forsyth, R. A. (1982, March). Use of standardized achievement test results at the secondary level. Paper presented at the annual convention of the American Educational Research Association, New York. Bishop, N. S. and Frisbie, D. A. (1999). The effects of test item familiarization on achievement test scores. Applied Measurement in Education, 12: 327–341. Ansley, T. N. and Forsyth, R.A. (1983). Relationship of elementary and secondary school achievement test scores to college performance. Educational and Psychological Measurement, 43: 1103–1112. Barron, S. I., Ansley, T. N., and Hoover, H. D. (1991, March). Gender differences in predicting success in high school algebra. Paper presented at the annual convention of the American Educational Research Association, Chicago. Barron, S. I., Ansley, T. N., and Hoover, H. D. (1991, April). Gender differences in predicting success in high school algebra. Paper presented at the annual convention of the American Educational Research Association, Chicago. Bormuth, J. R. (1969, March). Development of readability analyses. 
Works Cited

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

The American heritage dictionary of the English language (4th ed.). (2000). New York: Houghton-Mifflin.

Andrews, K. M. (1995). The effects of scaling design and scaling method on the primary score scale associated with a multi-level achievement test. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Ankenmann, R. D., Witt, E. A., and Dunbar, S. B. (1999). An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning. Journal of Educational Measurement, 36: 277–300.

Ansley, T. N. and Forsyth, R. A. (1982, March). Use of standardized achievement test results at the secondary level. Paper presented at the annual convention of the American Educational Research Association, New York.

Ansley, T. N. and Forsyth, R. A. (1983). Relationship of elementary and secondary school achievement test scores to college performance. Educational and Psychological Measurement, 43: 1103–1112.

Barron, S. I., Ansley, T. N., and Hoover, H. D. (1991, April). Gender differences in predicting success in high school algebra. Paper presented at the annual convention of the American Educational Research Association, Chicago.

Becker, D. F. and Dunbar, S. B. (1990, April). Common dimensions in elementary and secondary school survey achievement batteries. Paper presented at the annual meeting of the National Council on Measurement in Education, Boston.

Becker, D. F. and Forsyth, R. A. (1992). An empirical investigation of Thurstone and IRT methods of scaling achievement tests. Journal of Educational Measurement, 29: 341–354.

Becker, D. F. and Forsyth, R. A. (1994). Gender differences in mathematics problem-solving and science: A longitudinal analysis. International Journal of Educational Research, 21: 407–416.

Beggs, D. L. and Hieronymus, A. N. (1968). Uniformity of growth in the basic skills throughout the school year and during the summer. Journal of Educational Measurement, 5: 91–97.

Bishop, N. S. and Frisbie, D. A. (1999). The effects of test item familiarization on achievement test scores. Applied Measurement in Education, 12: 327–341.

Bormuth, J. R. (1969, March). Development of readability analyses. Washington, DC: Office of Education. (ERIC Document Reproduction Service No. ED 029 166)

Bormuth, J. R. (1971). Development of standards of readability: Toward a rational criterion of passage performance. Final report. (ERIC Document Reproduction Service No. ED 054 233)

Bray, G. B. and Dunbar, S. B. (1994, April). Influence of item format on the internal characteristics of alternate forms of tests of language skills. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.

Brennan, R. L. (1992). Elements of generalizability theory. Iowa City, IA: ACT Publications.

Brennan, R. L. and Lee, W. (1997). Conditional standard errors of measurement for scale scores using binomial and compound binomial assumptions. ITP Occasional Paper No. 41, Iowa Testing Programs, The University of Iowa, Iowa City.

Chall, J. S. and Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline Books.

The Chicago manual of style (14th ed.). (1993). Chicago: University of Chicago Press.

Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement. Washington, DC: American Council on Education.

Dale, E. and Chall, J. S. (1948, January/February). A formula for predicting readability. Educational Research Bulletin, 27: 11–20, 37–54.

Dale, E. and O'Rourke, J. (1981). The living word: A national vocabulary inventory. Chicago: World Book-Childcraft International.

Davison, A. and Kantor, R. N. (1982). On the failure of readability formulas to define readable texts: A case study from adaptations. Reading Research Quarterly, 17: 187–209.

Dorans, N. J. and Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland and H. Wainer (Eds.), Differential Item Functioning, 35–66. Hillsdale, NJ: Erlbaum.

Dunbar, S. B., Ordman, V. L., and Mengeling, M. A. (2002). A comparative analysis of achievement by Native Americans in Montana. Iowa Testing Programs, The University of Iowa, Iowa City.

Dunn, G. E. (1990). Relationship of eighth-grade achievement test scores to ninth-grade teachers' marks. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Feldt, L. S. (1984). Some relationships between the binomial error model and classic test theory. Educational and Psychological Measurement, 40 (4): 883–891.

Feldt, L. S. (1984). Testing the significance of differences among reliability coefficients. Proceedings of the Fourth Measurement and Evaluation Symposium of the American Alliance for Health, Physical Education, Recreation and Dance (invited research paper), University of Northern Iowa, Cedar Falls.

Feldt, L. S. (1997). Can validity rise when reliability declines? Applied Measurement in Education, 10: 377–387.

Feldt, L. S. and Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational Measurement (3rd ed.) (pp. 105–146). New York: American Council on Education and Macmillan Publishing.

Feldt, L. S. and Qualls, A. L. (1998). Approximating scale score standard error of measurement from the raw score standard error. Applied Measurement in Education, 11 (2): 159–177.

Flanagan, J. C. (1951). Units, scores, and norms. In E. F. Lindquist (Ed.), Educational Measurement (pp. 695–763). Washington, DC: American Council on Education.

Forsyth, R. A., Ansley, T. N., and Twing, J. S. (1992). The validity of normative data provided for customized tests: Two perspectives. Applied Measurement in Education, 5: 49–62.

Frisbie, D. A. and Andrews, K. M. (1990). Kindergarten pupil and teacher behavior during standardized achievement testing. Elementary School Journal, 90: 435–438.

Frisbie, D. A. and Cantor, N. K. (1995). The validity of scores from alternative methods of assessing spelling achievement. Journal of Educational Measurement, 32 (1): 55–78.

Gerig, J. A., Nibbelink, W. H., and Hoover, H. D. (1992). The effect of print size on reading comprehension. Iowa Reading Journal, 5: 26–28.

Harris, D. J. and Hoover, H. D. (1987). An application of the three-parameter IRT model to vertical equating. Applied Psychological Measurement, 2: 151–159.

Hieronymus, A. N. and Hoover, H. D. (1986). Manual for school administrators, Levels 5–14, Iowa Tests of Basic Skills Forms G/H. Chicago: Riverside Publishing.

Hieronymus, A. N. and Hoover, H. D. (1990). Manual for school administrators supplement, Levels 5–14, Iowa Tests of Basic Skills Form J. Chicago: Riverside Publishing.

Hoover, H. D. (1984). The most appropriate scores for measuring educational development in the elementary schools: GEs. Educational Measurement: Issues and Practice, 3: 8–14.

Hoover, H. D. (2003). Some common misconceptions about tests and testing. Educational Measurement: Issues and Practice, 22 (1): 5–14.

Hoover, H. D., Dunbar, S. B., Frisbie, D. A., and Qualls, A. L. (2003). Teacher's Directions and Interpretive Guide, Iowa Early Learning Inventory. Itasca, IL: Riverside Publishing.

Huang, C. (1998). Factors influencing the reliability of DIF detection methods. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Jarjoura, D. (1986). An estimator of examinee-level measurement error variance that considers test form difficulty adjustments. Applied Psychological Measurement, 10: 175–186.

Keats, J. A. (1957). Estimation of error variances of test scores. Psychometrika, 22: 29–41.

Kolen, M. J. (1981). Comparison of traditional and item response theory methods for equating tests. Journal of Educational Measurement, 18: 1–11.

Kolen, M. J. (1984). Effectiveness of analytic smoothing in equipercentile equating. Journal of Educational Statistics, 9: 25–44.

Kolen, M. J. and Brennan, R. L. (1995). Test equating methods and practices. New York: Springer-Verlag.

Koretz, D. M. (1986). Trends in educational achievement. Washington, DC: Congress of the U.S., Congressional Budget Office.

Lee, G., Dunbar, S. B., and Frisbie, D. A. (2001). The relative appropriateness of eight measurement models for analyzing scores from tests composed of testlets. Educational and Psychological Measurement, 61: 958–975.

Lee, S. J. (1995). Gender differences in the Iowa Writing Assessment. Unpublished master's thesis, The University of Iowa, Iowa City.

Lewis, J. C. (1994). The effect of content and gender on assessment of estimation. Paper presented at the annual convention of the National Council on Measurement in Education, New Orleans, LA.

Linn, R. L. and Dunbar, S. B. (1982). Predictive validity of admissions measures: Corrections for selection on several variables. Journal of College Student Personnel, 23: 222–226.

Linn, R. L. and Dunbar, S. B. (1990). The nation's report card goes home: Good news and bad about trends in achievement. Phi Delta Kappan, 72: 127–133.

Linn, R. L., Baker, E. L., and Dunbar, S. B. (1991). Complex performance-based assessments: Expectations and validation criteria. Educational Researcher, 20: 15–21.

Linn, R. L. (Ed.) (1993). Educational measurement. National Council on Measurement in Education and American Council on Education. Phoenix, AZ: Oryx Press.

Loyd, B. H. (1980, April). An investigation of differential item performance by Anglo and Hispanic pupils. Paper presented at the annual convention of the American Educational Research Association, Boston.

Loyd, B. H. (1980). Functional level testing and reliability: An empirical study. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Loyd, B. H. (1980). The effect of item ordering and speed on Rasch Model item parameter estimates. Paper presented at the Iowa Educational Research and Evaluation Association, Iowa City, IA.

Loyd, B. H. and Hoover, H. D. (1980). Vertical equating using the Rasch Model. Journal of Educational Measurement, 17: 179–193.

Loyd, B. H., Forsyth, R. A., and Hoover, H. D. (1980). Relationship of elementary and secondary school achievement test scores to later academic success. Educational and Psychological Measurement, 40: 1117–1124.

Lu, S. and Dunbar, S. B. (1996, April). Assessing the accuracy of the Mantel-Haenszel DIF statistic using the bootstrap method. Paper presented at the annual meeting of the American Educational Research Association, New York City.

Lu, S. and Dunbar, S. B. (1996, April). The influence of conditioning variables on assessing DIF in a purposefully multidimensional test. Paper presented at the annual meeting of the National Council on Measurement in Education, New York City.

Martin, D. J. (1985). The measurement of growth in educational achievement. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Martin, D. J. and Dunbar, S. B. (1985). Hierarchical factoring in a standardized achievement battery. Educational and Psychological Measurement, 45: 343–351.

Mengeling, M. A. (2002). An analysis of district and school variance using hierarchical linear modeling and longitudinal standardized achievement data. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Mengeling, M. and Dunbar, S. B. (1999, April). Temporal stability of standardized test scores in the early elementary grades. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.

Merriam-Webster's dictionary of English usage. (1994). Springfield, MA: Merriam-Webster.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., 13–103). New York: American Council on Education/Macmillan Series on Higher Education.

Mittman, A. (1958). An empirical study of methods of scaling achievement tests at the elementary grade level. Ph.D. thesis, The University of Iowa, Iowa City.

Mollenkopf, W. G. (1949). Variation of the standard error of measurement. Psychometrika, 14: 189–229.

National Catholic Educational Association. (2000). NCEA/Ganley's Catholic schools of America (28th ed.). Silverthorne, CO: Fisher Publishing Company.

National Center for Education Statistics. (2000). Digest of Education Statistics, 2000. Washington, DC: U.S. Department of Education.

National Council for the Social Studies. (1994). Curriculum standards for social studies: Expectations of excellence. Washington, DC: Author.

National Council of Teachers of Mathematics. (1989). Curriculum and Evaluation Standards for School Mathematics. Reston, VA: Author.

National Council of Teachers of Mathematics. (2000). Principles and Standards for School Mathematics. Reston, VA: Author.

National Research Council. (1996). National Science Education Standards: observe, interact, change, learn. Washington, DC: National Academy Press.

Nibbelink, W. H. and Hoover, H. D. (1992). The student teacher effect on elementary school class achievement. Journal of Research for School Executives, 2: 61–65.

Nibbelink, W. H., Gerig, J. A., and Hoover, H. D. (1993). The effect of print size on achievement on mathematics problem solving. School Science and Mathematics, 93: 20–23.

O'Conner, P. T. (1996). Woe is I: The grammarphobe's guide to better English in plain English. New York: Putnam.

Pearsall, M. K. (1993). The content core: A guide for curriculum designers (Rev. ed.). Washington, DC: National Science Teachers Association.

Perkhounkova, Y. and Dunbar, S. B. (1999, April). Influences of item content and format on the dimensionality of tests combining multiple-choice and open-response items: An application of the Poly-DIMTEST procedure. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.

Perkhounkova, Y., Hoover, H. D., and Ankenmann, R. D. (1997, March). An examination of construct validity of multiple-choice versus constructed-response tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.

Petersen, N. S., Kolen, M. J., and Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational Measurement (3rd ed.). Washington, DC: American Council on Education.

Plake, B. S. (1979). The interpretation of norm-based scores from individualized testing using the Iowa Tests of Basic Skills. Psychology in the Schools, 16: 8–13.

Polya, G. (1957). How to solve it: A new aspect of mathematical method (2nd ed.). Garden City, NY: Doubleday.

Powers, R. D., Sumner, W. A., and Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49: 99–105.

Quality Education Data. (2002). QED national education database [Data file]. Available from Quality Education Data Web site, http://www.qeddata.com/databaselic.htm

Qualls, A. L. (1980). Black and white teacher ratings of elementary achievement test items for potential race favoritism. Unpublished master's thesis, The University of Iowa, Iowa City.

Qualls, A. L. and Ansley, T. N. (1995). The predictive relationship of achievement test scores to academic success. Educational and Psychological Measurement, 55: 485–498.

Qualls-Payne, A. L. (1992). A comparison of score level estimates of the standard error of measurement. Journal of Educational Measurement, 29: 213–225.

Riverside Publishing. (1998). The Iowa Tests: Special report on Riverside's national performance standards. Itasca, IL: Author.

Rosemier, R. A. (1962). An investigation of discrepancies in percentile ranks between a grade eight administration of ITBS and a grade nine administration of ITED. Unpublished study, Iowa Testing Programs, Iowa City, IA.

Rutherford, F. J. and Ahlgren, A. (1990). Science for all Americans: Project 2061, American Association for the Advancement of Science (Rev. ed.). New York: Oxford University Press.

Scannell, D. P. (1958). Differential prediction of academic success from achievement test scores. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Schoen, H. L., Blume, G., and Hoover, H. D. (1990). Outcomes and processes on estimation test items in different formats. Journal for Research in Mathematics Education, 21: 61–73.
Snetzler, S. and Qualls, A. L. (2002). Examination of differential item functioning on a standardized achievement battery with limited English proficient students. Educational and Psychological Measurement, 60: 564–577.

Snow, R. E. and Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. L. Linn (Ed.), Educational Measurement (3rd ed.). Washington, DC: American Council on Education.

Spache, G. D. (1974). Good reading for poor readers. Champaign, IL: Garrard Publishing.

Thorndike, R. L. (1951). Reliability. In E. F. Lindquist (Ed.), Educational Measurement. Washington, DC: American Council on Education.

Thorndike, R. L. (1963). The concepts of over- and under-achievement. (ERIC Document Reproduction Service No. ED 016 250)

Witt, E. A. (1993). The construction of an analytic score scale for the direct assessment of writing and an investigation of its reliability and validity. Unpublished doctoral dissertation, The University of Iowa, Iowa City.

Witt, E. A., Ankenmann, R. D., and Dunbar, S. B. (1996, April). The sensitivity of the Mantel-Haenszel statistic to variations in sampling procedure in DIF analysis. Paper presented at the annual meeting of the National Council on Measurement in Education, New York City.

Index

Ability testing, See Cognitive Abilities Test
Accommodations: Catholic sample, 11; English language learners, 14–15; how defined, 12; students with disabilities, 14–15
Achievement and general cognitive ability: correlations between CogAT and ITBS, 131, 137–138; matched sample design, 8–9; measuring unique characteristics, 131; predicting achievement from ability, 131, 136, 143–148; reliability of difference scores, 136, 139–142
Achievement testing, 26–27
Bias: predictive, 160; See also differential item functioning; fairness review; group differences; validity
Catholic Schools in America: Year 2000, 8
Ceiling and floor effects: assigning test levels, 100; chance level, 55, 100; summary data, 101–105; See difficulty
Cognitive Abilities Test, 7, 127, 131–148
Comparability of test forms: concurrent assembly, 60; equivalent-forms reliability, 74–76; importance in equating, 60; relationship of Forms A and B to previous forms, 61–62
Completion rates: indices to describe, 100; issues related to time limits on tests, 32, 39, 100; summary data, 101–105
Constructed-Response Supplement: description, 6, 153; reliability and validity, 154–155; standardization, 154; test development, 28, 35, 154; Thinking about Language, 35, 154; Thinking about Mathematics, 154; Thinking about Reading, 154
Content and process specifications, 27, 34–36, 38, 41–42: Content Classifications with Item Norms, 48, 87 (sample pages, 88–89); criteria for developing, 27; curriculum review, 28; distribution of skill objectives, 31; for subtests in ITBS, 30–43; NCTM Standards, 27; role in test design, 27
Content Classifications with Item Norms, 48, 87–89
Content standards, See content and process specifications
Content validity, 26; See also validity
Correlations among test scores: building averages, 127, 131–135; Cognitive Abilities Test, 137–138; Constructed-Response Supplement, 154–155; developmental standard scores, 121, 122–126; extraneous factors, 121; Integrated Writing Skills Test, 158–159; Iowa Algebra Aptitude Test, 160; Iowa Writing Assessment, 152–153; Listening Assessment for ITBS, 156–157
Criterion-referenced score interpretation, 87
Critical thinking skills, 43: classifying items, 44; how defined, 43
Cut score, 44
Description of the ITBS batteries and levels, 2–5
Developmental score scales, 51: grade-equivalent (GE) scores, 54; in previous editions, 51–52; national standard score scale, 52–54; purposes of, 54–55
Diagnostic uses of tests, 87
Differential item functioning (DIF), 27, 30, 116–119
Difficulty: Content Classifications with Item Norms, 48, 87 (sample pages, 88–89); effects on test ceiling/floor, 100, 101–105; for different types of tests, 87; in norms booklets, 94, 100; in test development, 87, 94; individualized testing, 94; item difficulty (definition, 87; distributions of, 90–93; summary statistics, 95–99); item norms, 87–89; relation to test reliability, 87
Directions for Administration, 3, 6, 52
Discrimination: item discrimination (definition, 94; summary statistics, 95–99); relation to test reliability, 87
Effect size, 110–111, 115
English language learners, 14–15, 25
Equating Forms A and B, 60–61, 149–150: Complete Battery, 60; design and methods, 60; equivalence of forms, 60–62; Survey Battery, 149–150
Fairness review, 30, 116–118
Field testing, See item tryout
Floor effects, See ceiling and floor effects
Gain scores, See growth
Gender differences: across content domains, 110; effect size vs. bias, 114; in achievement, 107; in composite scores, 110; in language, 107, 110; in math, 110; in variability of test scores, 110; Iowa Algebra Aptitude Test, 160; summary statistics, 111–113; trends over time, 110, 113
Grade-equivalent (GE) scale, 54
Group differences: by gender, 107, 111–113; by race/ethnicity, 107, 114–115; in standard errors of measurement, 107–109; in test scores, 107
Growth, 1, 33, 39, 44, 51–55, 87–89, 121
Individualized testing, 45, 83, 100
Integrated Writing Skills Test, 157–159: correlations with ITBS, 158–159; description and development, 157–158
Iowa Algebra Aptitude Test: description, 159; predictive validity, 160; standardization, 159; test development, 159; test score characteristics, 160
Iowa Basic Skills Testing Program, 1, 28, 59–60
Iowa Early Learning Inventory, 150–151
Iowa Tests of Basic Skills: Complete Battery, 2; Core Battery, 2; criteria for instrument design, 27; description of levels, 2–5; Survey Battery, 2, 149–150
Iowa Tests of Educational Development, 127
Iowa Writing Assessment, 6, 34: description, 151; reliability and validity, 152–153; standardization, 152; test development, 28, 152
Item difficulty, See difficulty
Item discrimination, See discrimination
Item response theory, 61
Item tryout, 28, 30: how designed, 28–30; national data, 28, 30; numbers of items included, 28; preliminary studies, 28; statistical analyses, 28
Joint Scaling: Constructed-Response Supplement, 154; of ITBS and ITED, 127
Language, 34–38: Capitalization and Punctuation tests, 37; Complete Battery vs. Survey Battery, 36; content standards, 35–36, 38; effect of linguistic change, 35; item format, 35–38; Levels 5–6 of ITBS, 35; Levels 7–14 of ITBS, 35–38; relation to Reference Materials test, 36; reliability and validity, 37–38; Spelling test, 35–37 (efficiency, 37; selection of words, 37); standard written English, 35; Usage and Expression test, 37–38
Linking, See equating Forms A and B
Listening, 33–34: cognitive aspects of, 34; content/process standards, 34; Levels 5–9 of ITBS, 33–34; See also Listening Assessment for ITBS
Listening Assessment for ITBS, 6, 33–35, 155–157: description, 155; reliability and validity, 156–157; standardization, 155; test development, 155
Mathematics, 38–41: Computation test, 40–41 (grade placement of content, 41; modifications in Forms A and B, 41); Concepts and Estimation test, 38–39 (context and symbolic form, 39; estimation strategies, 39; grade placement of test content, 39); evolution of test design, 38; Modern Mathematics Supplement, 38; NCTM Standards, 27, 38, 39, 40; Problem Solving and Data Interpretation test, 39–40 (computational skills required, 41; data interpretation skills, 40; multiple-step problems, 40; Polya's model, 40; NCTM Standards, 27, 38, 39, 40)
National Education Database™, 7
National standardization: accommodations, 12, 14–15; Catholic sample, 11; design, 8–9; evidence of quality, 12; gender differences in, 110; Individualized Accommodation Plan (IAP), 12; Individualized Education Program (IEP), 12; number of students, 9–10; participating schools, 16–24; public sample, 11; racial-ethnic differences in, 114; racial-ethnic representation, 12–13; Section 504 Plan, 12; weighting, 5–9
Norm-referenced interpretation, 51, 87
Norms, 14, 55: calculating dates for twelve-month schools, 14; changes over time, 55–60; comparisons across time, 55–57; defined, 2, 51; fall and spring testing dates, 14; sampling, 7–9; special school populations, 8, 12, 60; vs. standards, 44
Orshansky index, 7
Parallel forms, See equating Forms A and B
Percentile rank, 53–54
Performance standards, 44
Precision of measurement, See reliability
Predicting achievement from CogAT scores: building averages, 143–148; estimating measurement error, 136; obtained vs. expected achievement, 136; prediction equations, 136; student scores, 131, 136–138
Predictive validity, 46–47: effects of selection, 46; empirical studies, 46–47
Primary Reading Profile, 32
Process specifications, See content and process specifications
Program evaluation, 26–27, 45
Racial-ethnic considerations: differences in achievement, 114–115; national standardization, 12–13; standard errors of measurement, 107, 109
Readability, 48–50: definitions, 48; formulas, 48, 50 (applied to Forms A and B, 48–50; interpreting readability indices, 48, 50; judging grade level, 48, 50)
Reading, 32–33: comprehension vs. decoding, 32; content/process standards, 34; critical thinking, 33; Levels 6–8 of ITBS, 32; Levels 9–14 of ITBS, 32–33; two-part structure, 32; types of reading materials, 33
Reliabilities of differences in test performance, 127–130, 139–142
Reliability, 63–85: Complete Battery, Form A, 64–73; individualized testing, effect on, 83; Kuder-Richardson Formula 20 (K-R 20), 63–64, 75–76; reliability coefficient, defined, 63; standard error of measurement, defined, 63; Survey Battery, Form A, 150; types of estimates, 63–64 (equivalent forms, 74; equivalent halves, 75–76; internal-consistency, 64; split halves, 75–76; test-retest, 77)
Reporting achievement test results: developmental levels, 51–52, 54; norms vs. standards, 44; strengths and weaknesses, 1, 51; validity considerations, 26–27
Role of standards, 1, 44–46
Scaling, 51–55: changes over time, 51–52; comparability of developmental scores, 51–52; defined, 51; grade-to-grade overlap, 54; growth model for the ITBS, 51, 54; Hieronymus scaling, 51, 52–54; national scaling study, 52–53
Science, 42: content classifications, 42; National Science Teachers Association, 42
Score Interpretation: evaluating instruction, 44–45; general concerns, 6; improving instruction, 44–45; modification of test content, 45–46; norm-referenced, 7; norms vs. standards, 44; predicting future performance, 46–47; See also validity
Scoring rubric, See Constructed-Response Supplement; Iowa Writing Assessment
Social Studies, 41–42: content standards, 42; NCSS Standards, 42; relation to other tests, 42
Sources of Information, 43: Maps and Diagrams test, 43; Reference Materials test, 43; skill development, 43
Stability of scores, 83–85: grade-to-grade correlations, 85; spring-to-fall correlations, 84
Standard errors of measurement, 63, 77: Complete Battery, Form A, 65–73; conditional, 77–82; for groups, 107
Standardization, See national standardization
Standards: role of, 1, 44–46; See also content and process specifications
Standards for Educational and Psychological Testing, 25
Structural relations among subtests, 121: factor analysis results (Complete Battery, 126; Early Primary Battery, 126); factor structure of ITBS Form A, 121, 126; interpreting factors, 126–127
Test administration: manuals, 3, 6; preparing for testing, 6
Test development, 28–30, 107: differential item functioning (DIF), 116; fairness review, 30, 107, 114, 116, 119; item review, 116, 119; item tryout, 28, 30; Mantel-Haenszel statistics, 116; of individual subtests, 30–43; of Iowa Early Learning Inventory, 151; of Iowa Writing Assessment, 152; of Survey Battery, 149; steps in process, 28–29
Test levels, 2–5: relationship to grade level, 3; subtests included, 2–3
Test modifications, 45–46; See also accommodations
Test score summary statistics: Complete Battery, Form A, 65–73; Integrated Writing Skills Test, 158–159; Iowa Algebra Aptitude Test, 160; Listening Assessment for ITBS, 156; Survey Battery, Form A, 150
Test specifications, See content and process specifications
Thinking about Language, 35, 154
Thinking about Mathematics, 154
Thinking about Reading, 154
Time limits, 3–5
Trends in achievement, 55–60: in national performance, 57; Iowa Basic Skills Testing Program, 59; summary of median differences, 57–59
Validity, 1, 25–50: construct-irrelevant variance, 27; content quality, 26; definitions, 25; English language learners, 25; evaluating in achievement tests, 25 (alignment, 25; local review, 25); factor analysis results, 121, 126–127; fairness review, 30, 116–118; in relation to purpose, 25; local school concerns, 26–27; NCTM Standards, 27; predictive validity, 46–47; responsibility for, 25–26; Standards for Educational and Psychological Testing, 25; statistical evidence, 26
Variability, 52–54, 87, 90, 100, 110
Vocabulary, 30–32: Levels 5–8 of ITBS, 30; Levels 9–14 of ITBS, 30–32; The Living Word Vocabulary, 30; word selection, 30–31
Word Analysis, 32: sample content classification, 88