ITBS Research Guide - Iowa Testing Programs

Transcription

ITBS Research Guide - Iowa Testing Programs
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page i
Contents
Part 1 Nature and Purposes of The Iowa
Tests® . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
The Iowa Tests. . . . . . . . . . . . . . . . . . . . . . . 1
Major Purposes of the ITBS Batteries . . . . . 1
Validity of the Tests . . . . . . . . . . . . . . . . . . . 1
Description of the ITBS Batteries . . . . . . . . 2
Names of the Tests . . . . . . . . . . . . . . . . . 2
Description of the Test Batteries . . . . . . . 2
Nature of the Batteries . . . . . . . . . . . . . . 2
Nature of the Levels . . . . . . . . . . . . . . . . 2
Grade Levels and Test Levels. . . . . . . . . 3
Test Lengths and Times . . . . . . . . . . . . . 3
Nature of the Questions . . . . . . . . . . . . . 3
Mode of Responding . . . . . . . . . . . . . . . . 3
Directions. . . . . . . . . . . . . . . . . . . . . . . . . 3
Other Iowa Tests . . . . . . . . . . . . . . . . . . . . . 6
Iowa Writing Assessment . . . . . . . . . . . . 6
Listening Assessment for ITBS . . . . . . . . 6
Constructed-Response Supplement
to The Iowa Tests . . . . . . . . . . . . . . . . . 6
Other Manuals . . . . . . . . . . . . . . . . . . . . . . . 6
Part 2 The National Standardization Program . . 7
Planning the National Standardization
Program. . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Procedures for Selecting the
Standardization Sample . . . . . . . . . . . . . . 7
Public School Sample . . . . . . . . . . . . . . . 7
Catholic School Sample . . . . . . . . . . . . . 8
Private Non-Catholic School Sample . . . 8
Summary . . . . . . . . . . . . . . . . . . . . . . . . . 8
Design for Collecting the
Standardization Data . . . . . . . . . . . . . . . . . 8
Weighting the Samples . . . . . . . . . . . . . . . . 8
Racial-Ethnic Representation . . . . . . . . . . 12
Participation of Students in
Special Groups . . . . . . . . . . . . . . . . . . . . 12
Empirical Norms Dates . . . . . . . . . . . . . . . 14
School Systems Included in the 2000
Standardization Samples. . . . . . . . . . . . . 16
New England and Mideast . . . . . . . . . . 16
Southeast . . . . . . . . . . . . . . . . . . . . . . . 17
Great Lakes and Plains. . . . . . . . . . . . . 19
West and Far West . . . . . . . . . . . . . . . . 22
Part 3 Validity in the Development and
Use of The Iowa Tests . . . . . . . . . . . . . . . 25
Validity in Test Use . . . . . . . . . . . . . . . . . . 25
Criteria for Evaluating Achievement
Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Validity of the Tests . . . . . . . . . . . . . . . . . . 25
Statistical Data to Be Considered . . . . . . . 26
Validity of the Tests in the Local School . . 26
Domain Specifications . . . . . . . . . . . . . . . . 27
Content Standards and Development
Procedures . . . . . . . . . . . . . . . . . . . . . . . 28
Curriculum Review . . . . . . . . . . . . . . . . 28
Preliminary Item Tryout . . . . . . . . . . . . . 28
National Item Tryout . . . . . . . . . . . . . . . 28
Fairness Review . . . . . . . . . . . . . . . . . . 30
Development of Individual Tests . . . . . . 30
Critical Thinking Skills . . . . . . . . . . . . . . 43
Other Validity Considerations . . . . . . . . . . 44
Norms Versus Standards . . . . . . . . . . . 44
Using Tests to Improve Instruction . . . . 44
Using Tests to Evaluate Instruction . . . . 45
Local Modification of Test Content . . . . 45
Predictive Validity . . . . . . . . . . . . . . . . . 46
Readability. . . . . . . . . . . . . . . . . . . . . . . 48
Part 4 Scaling, Norming, and Equating
The Iowa Tests . . . . . . . . . . . . . . . . . . . . . 51
Frames of Reference for Reporting
School Achievement . . . . . . . . . . . . . . . . 51
Comparability of Developmental Scores
Across Levels: The Growth Model. . . . . . 51
The National Standard Score Scale . . . . . 52
Development and Monitoring of National
Norms for the ITBS . . . . . . . . . . . . . . . . . 55
Trends in Achievement Test
Performance . . . . . . . . . . . . . . . . . . . . . . 55
Norms for Special School Populations . . . 60
Equivalence of Forms . . . . . . . . . . . . . . . . 60
Relationships of Forms A and B to
Previous Forms . . . . . . . . . . . . . . . . . . . . 61
i
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page ii
Part 5 Reliability of The Iowa Tests . . . . . . . . . . 63
Methods of Determining, Reporting,
and Using Reliability Data . . . . . . . . . . . . 63
Internal-Consistency Reliability Analysis . . 64
Equivalent-Forms Reliability Analysis . . . . 74
Sources of Error in Measurement . . . . . . . 75
Standard Errors of Measurement for
Selected Score Levels. . . . . . . . . . . . . . . 77
Effects of Individualized Testing on
Reliability . . . . . . . . . . . . . . . . . . . . . . . . . 83
Stability of Scores on the ITBS . . . . . . . . . 83
Part 6 Item and Test Analysis. . . . . . . . . . . . . . . 87
Difficulty of the Tests . . . . . . . . . . . . . . . . . 87
Discrimination . . . . . . . . . . . . . . . . . . . . . . 94
Ceiling and Floor Effects . . . . . . . . . . . . . 100
Completion Rates . . . . . . . . . . . . . . . . . . 100
Other Test Characteristics . . . . . . . . . . . . 100
Part 7 Group Differences in Item and Test
Performance . . . . . . . . . . . . . . . . . . . . . . 107
Standard Errors of Measurement for
Groups. . . . . . . . . . . . . . . . . . . . . . . . . . 107
Gender Differences in Achievement . . . . 107
Racial-Ethnic Differences in
Achievement . . . . . . . . . . . . . . . . . . . . . 114
Differential Item Functioning . . . . . . . . . . 116
Part 8 Relationships in Test Performance . . . 121
Correlations Among Test Scores for
Individuals . . . . . . . . . . . . . . . . . . . . . . . 121
Structural Relationships Among Content
Domains . . . . . . . . . . . . . . . . . . . . . . . . 121
Levels 9 through 14. . . . . . . . . . . . . . . 126
Levels 7 and 8. . . . . . . . . . . . . . . . . . . 126
Levels 5 and 6. . . . . . . . . . . . . . . . . . . 126
Interpretation of Factors . . . . . . . . . . . 126
Reliabilities of Differences in Test
Performance . . . . . . . . . . . . . . . . . . . . . 127
Correlations Among Building Averages . . 127
Relations Between Achievement and
General Cognitive Ability . . . . . . . . . . . . 127
Predicting Achievement from General
Cognitive Ability: Individual Scores . . . . 131
Obtained Versus Expected
Achievement. . . . . . . . . . . . . . . . . . . 136
Predicting Achievement from General
Cognitive Ability: Group Averages . . . . . 143
ii
Part 9 Technical Consideration for
Other Iowa Tests. . . . . . . . . . . . . . . . . . . 149
Iowa Tests of Basic Skills
Survey Battery . . . . . . . . . . . . . . . . . . . . 149
Description of the Tests . . . . . . . . . . . . 149
Other Scores . . . . . . . . . . . . . . . . . . . . 149
Test Development . . . . . . . . . . . . . . . . 149
Standardization . . . . . . . . . . . . . . . . . . 149
Test Score Characteristics. . . . . . . . . . 150
Iowa Early Learning Inventory . . . . . . . . . 150
Description of the Inventory . . . . . . . . 150
Test Development . . . . . . . . . . . . . . . . 151
Standardization . . . . . . . . . . . . . . . . . . 151
Iowa Writing Assessment . . . . . . . . . . . . 151
Description of the Test. . . . . . . . . . . . . 151
Test Development . . . . . . . . . . . . . . . . 152
Standardization . . . . . . . . . . . . . . . . . . 152
Test Score Characteristics. . . . . . . . . . 152
Constructed-Response Supplement
to The Iowa Tests . . . . . . . . . . . . . . . . . 153
Description of the Tests . . . . . . . . . . . . 153
Test Development . . . . . . . . . . . . . . . . 154
Joint Scaling with the ITBS . . . . . . . . . 154
Test Score Characteristics. . . . . . . . . . 154
Listening Assessment for ITBS . . . . . . . . 155
Description of the Test. . . . . . . . . . . . . 155
Test Development . . . . . . . . . . . . . . . . 155
Standardization . . . . . . . . . . . . . . . . . . 155
Test Score Characteristics. . . . . . . . . . 157
Predictive Validity . . . . . . . . . . . . . . . . 157
Integrated Writing Skills Test . . . . . . . . . . 157
Description of the Tests . . . . . . . . . . . . 157
Test Development . . . . . . . . . . . . . . . . 158
Standardization . . . . . . . . . . . . . . . . . . 158
Test Score Characteristics. . . . . . . . . . 158
Iowa Algebra Aptitude Test . . . . . . . . . . . 159
Description of the Test. . . . . . . . . . . . . 159
Test Development . . . . . . . . . . . . . . . . 159
Standardization . . . . . . . . . . . . . . . . . . 159
Test Score Characteristics. . . . . . . . . . 160
Works Cited. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page iii
Tables and Figures
Part 1: Nature and Purposes of The Iowa Tests® . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Table 1.1 Test and Grade Level Correspondence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Table 1.2 Number of Items and Test Time Limits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Part 2: The National Standardization Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Table 2.1 Summary of Standardization Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Table 2.2 Sample Size and Percent of Students by Type of School . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Table 2.3 Percent of Public School Students by Geographic Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Table 2.4 Percent of Public School Students by SES Category . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Table 2.5 Percent of Public School Students by District Enrollment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Table 2.6 Percent of Catholic Students by Diocese Size and Geographic Region . . . . . . . . . . . . . . . . . . . 11
Table 2.7 Percent of Private Non-Catholic Students by Geographic Region . . . . . . . . . . . . . . . . . . . . . . . . 12
Table 2.8 Racial-Ethnic Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Table 2.9 Test Accommodations—Special Education and 504 Students. . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Table 2.10 Test Accommodations—English Language Learners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Part 3: Validity in the Development and Use of The Iowa Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Figure 3.1 Steps in Development of the Iowa Tests of Basic Skills. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Table 3.1 Distribution of Skills Objectives for the Iowa Tests of Basic Skills, Forms A and B . . . . . . . . . . . 31
Table 3.2 Types of Reading Materials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Table 3.3 Reading Content/Process Standards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 3.4 Listening Content/Process Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 3.5 Comparison of Language Tests by Battery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 3.6 Computational Skill Level Required for Math Problem Solving and Data Interpretation . . . . . . . 41
Table 3.7 Summary Data from Predictive Validity Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Table 3.8 Readability Indices for Selected Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Part 4: Scaling, Norming, and Equating The Iowa Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Table 4.1 Comparison of Grade-to-Grade Overlap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Table 4.2 Differences Between National Percentile Ranks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Figure 4.1 Trends in National Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Table 4.3 Summary of Median Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 4.2 Trends in Iowa Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Table 4.4 Sample Sizes for Equating Forms A and B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Part 5: Reliability of The Iowa Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Table 5.1 Test Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Table 5.2 Equivalent-Forms Reliabilities, Levels 5–14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Table 5.3 Estimates of Equivalent-Forms Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Table 5.4 Mean (Grades 3–8) Reliability Coefficients: Reliability Types Analysis by Tests . . . . . . . . . . . . . 76
Table 5.5 Test-Retest Reliabilities, Levels 5–8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Table 5.6 Standard Errors of Measurement for Selected Standard Score Levels . . . . . . . . . . . . . . . . . . . . 78
Table 5.7 Correlations Between Developmental Standard Scores, Forms A and B . . . . . . . . . . . . . . . . . . 84
Table 5.8 Correlations Between Developmental Standard Scores, Forms K and L . . . . . . . . . . . . . . . . . . 85
iii
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page iv
Part 6: Item and Test Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Table 6.1 Word Analysis Content Classifications with Item Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Table 6.2 Usage and Expression Content Classifications with Item Norms . . . . . . . . . . . . . . . . . . . . . . . . 89
Table 6.3 Distribution of Item Difficulties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Table 6.4 Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices . . . . . . . . . . . . 95
Table 6.5 Ceiling Effects, Floor Effects, and Completion Rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Part 7: Group Differences in Item and Test Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Table 7.1 Standard Errors of Measurement in the Standard Score Metric for ITBS
by Level and Gender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
by Level and Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Table 7.2 Male-Female Effect Sizes for Average Achievement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Table 7.3 Descriptive Statistics by Gender . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Table 7.4 Gender Differences in Achievement over Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Table 7.5 Race Differences in Achievement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Table 7.6 Effect Sizes for Racial-Ethnic Differences in Average Achievement . . . . . . . . . . . . . . . . . . . . . 115
Table 7.7 Fairness Reviewers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Table 7.8 Number of Items Identified in Category C in National DIF Study . . . . . . . . . . . . . . . . . . . . . . . . 119
Part 8: Relationships in Test Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Table 8.1 Correlations Among Developmental Standard Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Table 8.2 Reliabilities of Differences Among Scores for Major Test Areas:
Developmental Standard Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Table 8.3 Reliabilities of Differences Among Tests: Developmental Standard Scores . . . . . . . . . . . . . . . 128
Table 8.4 Correlations Among School Average Developmental Standard Scores. . . . . . . . . . . . . . . . . . . 131
Table 8.5 Correlations Between Standard Age Scores and Developmental Standard Scores . . . . . . . . . 137
Table 8.6 Reliabilities of Difference Scores and Standard Deviations of Difference Scores
Due to Errors of Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Table 8.7 Correlations, Prediction Constants, and Standard Errors of Estimate for School Averages . . . 145
Part 9: Technical Consideration for Other Iowa Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Table 9.1 Test Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Iowa Tests of Basic Skills–Survey Battery, Form A
Table 9.2 Average Reliability Coefficients, Grades 3–8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Iowa Writing Assessment
Table 9.3 Correlations and Reliability of Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Iowa Writing Assessment and Iowa Tests of Basic Skills Language Total
Table 9.4 Internal-Consistency Reliability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Constructed-Response Supplement
Table 9.5 Correlations and Reliabilities of Differences. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Constructed-Response Supplement and Corresponding ITBS Subtests
Table 9.6 Test Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Listening Assessment for ITBS
Table 9.7 Correlations Between Listening and ITBS Achievement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Table 9.8 Correlations Between Listening Grade 2 and ITBS Grade 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Table 9.9 Test Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Integrated Writing Skills Test, Form M
Table 9.10 Correlations Between IWST and ITBS Reading and Language Tests . . . . . . . . . . . . . . . . . . . . 159
Table 9.11 Test Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Iowa Algebra Aptitude Test–Grade 8
Table 9.12 Correlations Between IAAT and Algebra Grades and Test Scores . . . . . . . . . . . . . . . . . . . . . . 160
iv
961464_ITBS_GuidetoRD.qxp
10/29/10
PART 1
3:15 PM
Page 1
Nature and Purposes of The Iowa Tests ®
The Iowa Tests
Validity of the Tests
The Iowa Tests consist of a variety of educational
achievement instruments developed by the faculty
and professional staff at Iowa Testing Programs at
The University of Iowa. The Iowa Tests of Basic
Skills® (ITBS®) measure educational achievement
in 15 subject areas for kindergarten through grade 8.
The Iowa Tests of Educational Development ®
(ITED®) measure educational achievement in nine
subject areas for grades 9 through 12. These test
batteries share a history of development that has
been an integral part of the research program in
educational measurement at The University of Iowa
for the past 70 years. In addition to these achievement
batteries, The Iowa Tests include specialized
instruments for specific achievement domains.
The most valid assessment of achievement for a
particular school is one that most closely defines
that school’s education standards and goals for
teaching and learning. Ideally, the skills and
abilities required for success in assessment should
be the same skills and abilities developed through
local instruction. Whether this ideal has been
attained in the Iowa Tests of Basic Skills is
something that must be determined from an itemby-item examination of the test battery early in the
decision-making process.
This Guide to Research and Development is devoted
primarily to the ITBS and related assessments. The
Guide to Research and Development for the ITED
contains technical information about that test
battery and related assessments.
Major Purposes of the ITBS Batteries
The purpose of measurement is to provide
information that can be used to improve instruction
and learning. Assessment of any kind has value to
the extent that it results in better decisions for
students. In general, these decisions apply to
choosing goals for instruction and learning
strategies to achieve those goals, designing effective
classroom environments, and meeting the diverse
needs and characteristics of students.
The Iowa Tests of Basic Skills measure growth
in fundamental areas of school achievement:
vocabulary, reading comprehension, language,
mathematics, social studies, science, and sources of
information. The achievement standards represented
by the tests are crucial in educational development
because they can determine the extent to which
students will benefit from later instruction. Periodic
assessment in these areas is essential to tailor
instruction to individuals and groups, to provide
educational guidance, and to evaluate the
effectiveness of instruction.
Common practices to validate test content have
been used to prepare individual items for The Iowa
Tests. The content standards were determined
through consideration of typical course coverage,
current teaching methods, and recommendations of
national curriculum groups. Test content has been
carefully selected to represent best curriculum
practice, to reflect current performance standards,
and to represent diverse populations. The
arrangement of items into levels within tests follows
a scope and sequence appropriate to a particular
level of teaching and cognitive development. Items
are selected for content relevance from a larger pool
of items tried out with a range of students at each
grade level.
Throughout the battery, efforts have been made to
emphasize the functional value of what students
learn in school. Students’ abilities to use what they
learn to interpret what they read, to analyze
language, and to solve problems are tested in
situations that approximate—to the extent possible
with a paper and pencil test—actual situations in
which students may use these skills.
Ultimately, the validity of information about
achievement derived from The Iowa Tests depends
on how the information is used to improve
instruction and learning. Over the years, the
audience for assessment information has grown.
Today it represents varied constituencies concerned
about educational progress at local, state, and
national levels. To make assessment information
useful, careful attention must be paid to reporting
results to students, to parents and teachers, to
1
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 2
school administrators and board members, and to
the public. Descriptions of the types of score reports
provided with The Iowa Tests are included in the
Interpretive Guide for Teachers and Counselors and
the Interpretive Guide for School Administrators.
How to present test results to various audiences is
discussed in these guides.
Description of the ITBS Batteries
Nature of the Levels
Levels 5–6 (Grades K.1–1.9)
The achievement tests included in the Complete
Battery are listed below. The Composite score for
these levels, Core Total, includes only the tests
preceded by a solid circle (•). Those included in
the Reading Profile Total are followed by an
asterisk (*). Abbreviations used in this Guide appear
in parentheses.
• Vocabulary* (V)
Names of the Tests
Iowa Tests of Basic Skills® (ITBS®) Form A, Level 5;
Form A, Level 6; Forms A and B, Levels 7 and 8;
Forms A and B, Levels 9–14.
Description of the Test Batteries
The ITBS includes three batteries that allow for a
variety of testing needs:
• The Complete Battery consists of five to fifteen
subtests, depending on level, and is available at
Levels 5 through 14.
• The Core Battery consists of a subset of tests in
the Complete Battery, including all tests that
assess reading, language, and math. It is
available at Levels 7 through 14.
• The Survey Battery consists of 30-minute tests on
reading, language, and math. Items in the Survey
Battery come from tests in the Complete Battery.
It is available at Levels 7 through 14.
Nature of the Batteries
Word Analysis* (WA)
Listening* (Li)
• Language (L)
• Mathematics (M)
Reading: Words* (Level 6 only) (RW)
Reading: Comprehension* (Level 6 only) (RC)
Levels 7–8 (Grades 1.7–3.2)
The achievement tests included in the Complete
Battery and the Core Battery are listed below. Those
in the Core Battery are preceded by a solid circle (•).
Those included in the Reading Profile Total are
followed by an asterisk (*). Test abbreviations are
given in parentheses.
•
•
•
•
•
•
•
•
•
Levels 5–8
Levels 5 and 6 of Form A are published as a
Complete Battery; there is no separate Core Battery
or Survey Battery for these levels. Levels 7 and 8 of
Forms A and B are published as a Complete Battery
(twelve tests), a Core Battery (nine tests), and a
Survey Battery (three tests).
Levels 9–14
Levels 9 through 14 of Forms A and B are published
in a Complete Battery (thirteen tests) and a Survey
Battery (three tests). At Level 9, two additional tests
are available, Word Analysis and Listening. For
Level 9 only, a machine-scorable Complete Battery, a
Core Battery (eleven tests), and a Survey Battery
are available. Levels 10 through 14 have no
separate Core Battery booklet; all Core tests are
part of the Complete Battery booklet.
2
Vocabulary* (V)
Word Analysis* (WA)
Reading* (RC)
Listening* (Li)
Spelling* (L1)
Language (L)
Mathematics Concepts (M1)
Mathematics Problems (M2)
Mathematics Computation (M3)
Social Studies (SS)
Science (SC)
Sources of Information (SI)
Levels 9–14 (Grades 3.0–9.9)
The achievement tests in the Complete Battery are
listed below. Those in the Core Battery are preceded
by a solid circle (•). Those tests included in the
Reading Profile Total for Level 9 are followed by an
asterisk (*).
•
•
•
•
•
•
•
•
Vocabulary* (V)
Reading Comprehension* (RC)
Word Analysis* (Level 9 only) (WA)
Listening* (Level 9 only) (Li)
Spelling* (L1)
Capitalization (L2)
Punctuation (L3)
Usage and Expression (L4)
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 3
• Math Concepts and Estimation (M1)
• Math Problem Solving and Data
Interpretation (M2)
• Math Computation (M3)
Social Studies (SS)
Science (SC)
Maps and Diagrams (S1)
Reference Materials (S2)
Tests in the Survey Battery—Reading, Language, and
Mathematics—comprise items from the Complete
Battery. Each test is divided into the parts indicated.
Reading (two parts)
Vocabulary
Comprehension
Language
Mathematics (three parts)
Concepts, Problem Solving and Data
Interpretation
Estimation
Computation
Grade Levels and Test Levels
Levels 5 through 14 represent a comprehensive
assessment program for kindergarten through
grade 9. Each level is numbered to correspond
roughly to the age of the student for whom it is best
suited. A student should be given the level most
compatible with his or her level of academic
development. Typically, students in kindergarten
and grades 1 and 2 would take only three of the
Primary Battery’s four levels before taking Level 9
in grade 3.
Table 1.1 shows how test level corresponds to a
student’s level of academic development, expressed
as a grade range. Decimals in the last column
indicate month of the school year. For example,
K.1–1.5 means the first month of kindergarten
through the fifth month of grade 1.
Test Lengths and Times
For Levels 5 through 8, the number of questions and
approximate working time for each test are given in
Table 1.2. Tests at these levels are untimed; the
actual time required for a test varies somewhat with
the skill level of the students. (The administration
times in the table are based on average rates
reported by teachers in tryout sessions.) The Level 6
Reading test is administered in two sessions.
For Levels 9 through 14, all tests are timed; the
administration times include time to read directions
as well as to take the tests.
Table 1.1
Test and Grade Level Correspondence
Iowa Tests of Basic Skills, Forms A and B
Test Level
Age
Grade Level
5
6
7
8
9
10
11
12
13
14
5
6
7
8
9
10
11
12
13
14
K.1 – 1.5
K.7 – 1.9
1.7 – 2.3
2.3 – 3.2
3.0 – 3.9
4.0 – 4.9
5.0 – 5.9
6.0 – 6.9
7.0 – 7.9
8.0 – 9.9
Nature of the Questions
For Levels 5 through 8, questions are read aloud
except at Level 6 for parts of the Reading test, and
at Levels 7 and 8 except for the Reading test and
parts of the Vocabulary and Math Computation
tests. Questions are multiple choice with three or
four response options. Responses are presented in
pictures, letters, numerals, or words, depending on
the test and level. All questions in Levels 9 through
14 are multiple choice, have four or five options, and
are read by the student.
Mode of Responding
Students who take Levels 5 through 8 mark
answers in machine-scorable booklets by filling in a
circle. Those who take Levels 9 through 14 mark
answers on a separate answer folder (Complete
Battery) or answer sheet (Survey Battery). For the
machine-scorable booklets at Level 9, students
mark answers in the test booklets.
Directions
A separate Directions for Administration manual is
provided for each Complete Battery (Levels 5
through 8) and Core Battery (Levels 7 and 8) level
and form. The Survey Battery (Levels 7 and 8) has
separate Directions for Administration manuals for
each level and form. At Levels 9 through 14, there is
one Directions for Administration manual for Forms
A and B of the Complete Battery. At these levels, the
Survey Battery has a single Directions for
Administration manual. The machine-scorable
booklets of Level 9 have separate Directions for
Administration manuals.
3
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 4
Table 1.2
Number of Items and Test Time Limits
Iowa Tests of Basic Skills, Forms A and B
Level 5: Complete Battery
Approximate
Working Time
(Minutes)
• Vocabulary
Word Analysis
Listening
• Language
• Mathematics
• Core Tests
Complete Battery
20
20
30
25
25
1 hr., 10 min.
2 hrs.
Level 6: Complete Battery
Number
of
Items
29
30
29
29
29
87
146
Level 7: Complete and Core Battery
Approximate
Working Time
(Minutes)
• Vocabulary
15
• Word Analysis
15
• Reading
35
• Listening
25
• Spelling
15
• Language
15
• Math Concepts
20
• Math Problems
25
• Math Computation
20
Social Studies
25
Science
25
Sources of Information
25
• Core Battery
3 hrs., 5 min.
Complete Battery 4 hrs., 20 min.
Reading
30
Language
25
Mathematics
22
Mathematics Computation
8
Survey Battery
1 hr., 25 min.
4
• Vocabulary
20
Word Analysis
20
Listening
30
• Language
25
• Mathematics
25
Reading: Words
23
Reading: Comprehension
20
• Core Tests
1 hr., 10 min.
Complete Battery 2 hrs., 43 min.
Number
of
Items
31
35
31
31
35
29
19
97
211
Level 8: Complete and Core Battery
Number
of
Items
30
35
34
31
23
23
29
28
27
31
31
22
260
344
Level 7: Survey Battery
Approximate
Working Time
(Minutes)
Approximate
Working Time
(Minutes)
Approximate
Working Time
(Minutes)
• Vocabulary
15
• Word Analysis
15
• Reading
35
• Listening
25
• Spelling
15
• Language
15
• Math Concepts
20
• Math Problems
25
• Math Computation
20
Social Studies
25
Science
25
Sources of Information
30
• Core Battery
3 hrs., 5 min.
Complete Battery 4 hrs., 25 min.
Number
of
Items
32
38
38
31
23
31
31
30
30
31
31
28
284
374
Level 8: Survey Battery
Number
of
Items
40
34
27
13
114
Approximate
Working Time
(Minutes)
Reading
30
Language
25
Mathematics
22
Mathematics Computation
8
Survey Battery
1 hr., 25 min.
Number
of
Items
44
42
33
17
136
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 5
Table 1.2 (continued)
Number of Items and Test Time Limits
Iowa Tests of Basic Skills, Forms A and B
Levels 9 –14: Number of Items, Complete and Core Battery
Level
Working Time
(Minutes)
• Vocabulary
15
• Reading Comprehension1
25 + 30
• Spelling
12
• Capitalization
12
• Punctuation
12
• Usage and Expression
30
• Mathematics Concepts and Estimation1 25 + 5
• Mathematics Problem Solving
and Data Interpretation
30
• Mathematics Computation
15
Social Studies
30
Science
30
Maps and Diagrams
30
Reference Materials
25
• Word Analysis2
20
• Listening2
25
• Core Battery (3 hrs., 31 min.)3
Complete Battery (5 hrs., 26 min.)4
211
326
9
10
11
12
13
14
29
37
28
24
24
30
31
34
41
32
26
26
33
36
37
43
36
28
28
35
40
39
45
38
30
30
38
43
41
48
40
32
32
40
46
42
52
42
34
34
43
49
22
25
30
30
24
28
35
31
24
27
34
34
25
30
–
–
26
29
37
37
26
32
–
–
28
30
39
39
28
34
–
–
30
31
41
41
30
36
–
–
32
32
43
43
31
38
–
–
2503
3624
279
402
302
434
321
461
340
488
360
515
1 This
test is administered in two parts.
test is untimed. The time given is approximate.
3 With Word Analysis and Listening at Level 9, testing time is 256 min. (4 h., 16 m.) and the number of items is 316.
4 With Word Analysis and Listening at Level 9, testing time is 371 min. (6 h., 11 m.) and the number of items is 428.
2 This
Levels 9 –14: Number of Items, Survey Battery
Level
Working Time
(Minutes)
9
10
11
12
13
14
Reading
Part 1: Vocabulary
Part 2: Comprehension
Language
Mathematics
Part 1: Concepts and Problems
Part 2: Estimation
Part 3: Computation
30
5
25
30
30
22
3
5
27
10
17
43
31
19
4
8
30
11
19
47
34
21
4
9
32
12
20
51
37
23
5
9
34
13
21
54
40
25
5
10
36
14
22
57
43
27
6
10
37
14
23
59
46
29
6
11
Survey Battery (1 hr., 30 min.)
90
101
111
120
128
136
142
5
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 6
Other Iowa Tests
Other Manuals
Iowa Writing Assessment
In addition to this Guide to Research and
Development, several other manuals provide
information for test users. Each Directions for
Administration manual includes a section on
preparing for test administration as well as the script
needed to administer the tests. The Test Coordinator
Guide offers suggestions about policies and
procedures associated with testing, advice about
planning for and administering the testing program,
ideas about preparing students and parents, and
details about how to prepare answer documents for
the scoring service. The Interpretive Guide for
Teachers and Counselors describes test content,
score reports, use of test results for instructional
purposes, and communication of results to students
and parents. The Interpretive Guide for School
Administrators offers additional information,
including guidance on designing a districtwide
assessment program and reporting test results. The
Norms and Score Conversions booklets contain
directions for hand scoring and norms tables for
converting raw scores to derived scores such as
standard scores and percentile ranks.
The Iowa Writing Assessment measures a student’s
ability to generate, organize, and express ideas in
writing. This assessment includes four prompts that
require students to compose an essay in either
narrative, descriptive, persuasive, or expository
modes. With norm-referenced evaluation of a
student’s writing about a specific topic, the Iowa
Writing Assessment adds to the information
obtained from other language tests and from the
writing students do in the classroom.
Listening Assessment for ITBS
Content specifications for Levels 9 through 14 of
the Listening tests are based on current literature
in the teaching and assessment of listening
comprehension. The main purposes of the Listening
Assessment are: (a) to measure strengths and
weaknesses in listening so effective instruction can
be planned to meet individual and group needs;
(b) to monitor listening instruction; and (c) to help
make teachers and students aware of the
importance of good listening strategies.
Constructed-Response Supplement to
The Iowa Tests
These tests may be used with the Complete Battery
and Survey Battery of the ITBS. The ConstructedResponse Supplement measures achievement in
reading, language, and math in an open-ended
format. Students write answers in the test booklet,
and teachers use the scoring guidelines to rate the
responses. The results can be used to provide
information about achievement to satisfy
requirements for multiple measures.
6
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 7
PART 2
The National Standardization
Program
Normative data collected at the time of
standardization is what distinguishes normreferenced tests from other assessments. It is
through the standardization process that scores,
scales, and norms are developed. The procedures
used in the standardization of The Iowa Tests are
designed to make the norming sample reflect the
national population as closely as possible, ensuring
proportional representation of ethnic and
socioeconomic groups.
The standardization of the Iowa Tests of Basic Skills
(ITBS) Complete Battery and Survey Battery was a
cooperative venture. It was planned by the ITBS
authors, the publisher, and the authors of the Iowa
Tests of Educational Development (ITED) and the
Cognitive Abilities Test™ (CogAT ®). Many public and
non-public schools cooperated in national item
tryouts and standardization activities, which
included the 2000 spring and fall test
administrations, scaling, and equating studies.
Planning the National Standardization
Program
The standardization of the ITBS, ITED, and CogAT
was carried out as a single enterprise. After
reviewing previous national standardization
programs, the basic principles and conditions of
those programs were adapted to the following
current needs:
• The sample should be selected to represent the
national population with respect to ability and
achievement. It should be large enough to
represent the diverse characteristics of the
population, but a carefully selected sample of
reasonable size would be preferred over a larger
but less carefully selected sample.
• Sampling units should be chosen primarily on the
basis of school district size, region of country, and
socioeconomic characteristics. A balance between
public and non-public schools should be obtained.
• The sample of attendance centers should be
sufficiently large and selected to provide
dependable norms for building averages.
• Attendance centers in each part of the sample
should represent the central tendency and
variability of the population.
• To ensure comparability of norms from grade to
grade, all grades in a selected attendance center
(or a designated fraction thereof) should be
tested.
• To ensure comparability of norms for ability and
achievement tests, both the ITBS and the CogAT
should be administered to the same students at
the appropriate grade level.
• To ensure comparability of norms for Complete
and Survey Batteries, alternate forms of both
batteries should be administered at the
appropriate grade level to the same students or to
equivalent samples of students.
• To ensure applicability of norms to all students,
testing accommodations for students who require
them should be a regular part of the
standardization design.
Procedures for Selecting the
Standardization Sample
Public School Sample
Three stratifying variables were used to classify public
school districts across the nation: geographic region,
district enrollment, and socioeconomic status (SES) of
the school district. Within each geographic region
(New England and Mideast, Southeast, Great Lakes
and Plains, and West and Far West), school districts
were stratified into nine enrollment categories.
School district SES was determined with data from
the National Education Database™ (Quality
Education Data, 2002). The socioeconomic index is
the percent of students in a district falling below the
federal government poverty guideline, similar to the
Orshansky index used in sampling for the National
Assessment of Educational Progress (NAEP). This
index was used in each of the four regions to break
the nine district-size categories into five strata.
7
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 8
In each SES category, districts were selected at
random and designated as first, second, or third
choices. Administrators in the selected districts
were contacted by the publisher and invited to
participate. If a district declined, the next choice was
contacted.
Catholic School Sample
The primary source for selecting and weighting the
Catholic sample was NCEA/Ganley’s Catholic
Schools in America (NCEA, 2000). Within each
geographic region of the public sample, schools were
stratified into five categories on the basis of
diocesan enrollment. A two-stage random sampling
procedure was used to select the sample.
In the first stage, dioceses were randomly selected
from each of five enrollment categories. Different
sampling fractions were used, ranging from 1.0 for
dioceses with total student enrollment above
100,000 (all four were selected) to .07 for dioceses
with fewer than 10,000 students (seven of 102 were
selected). In the second stage, schools were
randomly chosen from each diocese selected in the
first stage. In all but the smallest enrollment
dioceses—where only one school was selected—two
schools were randomly chosen. If the selected school
declined to participate, the alternate school was
contacted. If neither school agreed to participate,
additional schools randomly selected from the
diocese were contacted.
Private Non-Catholic School Sample
The sample of private non-Catholic schools was
obtained from the QED data file. The schools in each
geographic region of the public and Catholic
samples were stratified into two types: churchrelated and nonsectarian.
Schools were randomly sampled in eight categories
(region by type of school) until the target number of
students was reached. For each school selected, an
alternate school was chosen to be contacted if the
selected school declined to participate.
Summary
These sampling procedures produced (1) a national
probability sample representative of students
nationwide; (2) a nationwide sample of schools for
school building norms; (3) data for Catholic/private
and other special norms; and (4) empirical norms for
the Complete Battery and the Survey Battery.
8
The authors and publisher of the ITBS are grateful
to many people for assistance in preparing test
materials and administering tests in item tryouts
and special research projects. In particular,
gratitude is acknowledged to administrators,
teachers, and students in the schools that took part
in the national standardization. These schools are
listed at the end of this part of the Guide to Research
and Development. Schools marked with an asterisk
participated in both spring and fall standardizations.
Design for Collecting the
Standardization Data
A timetable for administration of the ITBS and the
CogAT is given in Table 2.1. This illustrates how the
national standardization study was designed.
During the spring standardization, students took
the appropriate level of the Complete Battery of the
ITBS, Form A. These same students took Form 6 of
the CogAT.
The design of the fall standardization was more
complex. Every student in grades 2 through 8
participated in two units of testing. The order of the
two testing units was counterbalanced. In the first
testing unit, the student took the Complete Battery
of either Form A or Form B of the ITBS. In grades 2
and 3, Forms A and B of the ITBS machine-scorable
booklets were used in alternate classrooms. In
approximately half of the grade 3 classrooms,
alternate forms of the ITBS Level 8 were
administered; in the remaining grade 3 classrooms,
Forms A and B of Level 9 were administered to
every other student. In grades 4 through 8, Forms A
and B were administered to every other student in
all classrooms.
In the second testing unit of the fall
standardization, students took Form A or Form B of
the Survey Battery. (Students who had taken Form A
of the Complete Battery took Form B of the Survey
Battery and vice versa.)
Weighting the Samples
After materials from the spring standardization had
been received by the Riverside Scoring Service®, the
number and percents of students in each sample
(public, Catholic, and private non-Catholic) and
stratification category were determined. The
percents were adjusted by weighting to compensate
for missing categories and to adjust for schools that
tested more or fewer students than required.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 9
Table 2.1
Summary of Standardization Schedule
Time
First Unit
Spring 2000
ITBS, Form A
Complete Battery
(Levels 5 – 8,
Grades K – 2)
CogAT, Form 6
(Levels 1 – 2,
Grades K – 3)
ITBS, Form A
Complete Battery
(Levels 9 – 14,
Grades 3 – 8)
CogAT, Form 6
(Levels A – F,
Grades 3 – 8)
Fall 2000
Second Unit
ITBS, Form A/B
Complete Battery
(Levels 7 – 8,
Grades 2 – 3)
ITBS, Form A/B
Complete Battery
(Levels 9 –14,
Grades 3 – 8)
ITBS, Form B/A
Survey Battery
(Levels 9 – 14,
Grades 3 – 8)
The number of students in the 2000 spring national
standardization of the ITBS is given in Table 2.2 for
the public, Catholic, and private non-Catholic
samples. Table 2.2 also shows the unweighted and
weighted sample percents and the population
percents for each cohort.
Tables 2.3 through 2.7 summarize the unweighted
and weighted sample characteristics for the spring
2000 standardization of the ITBS based on the
principal stratification variables of the public school
sample and other key characteristics of the nonpublic sample.
Optimal weights for these samples were determined
by comparing the proportion of students nationally
in each cohort to the corresponding sample
proportion. Once the optimal weight for each sample
was obtained, the stratification variables were
simultaneously considered to assign final weights.
These weights (integer values 0 through 9, with 3
denoting perfect proportional representation) were
assigned to synthesize the characteristics of a
missing unit or adjust the frequencies in other units.
As a result, the weighted distributions in the three
standardization samples closely approximate those
of the total student population.
In addition to the regular norms established in the
2000 national standardization, separate norms were
established for special populations. These norms
and the procedures used to derive them are
discussed in Part 4.
9
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 10
Table 2.2
Sample Size and Percent of Students by Type of School
Spring 2000 National Standardization Sample, ITBS, Grades K–8
Public School
Sample
Catholic School
Sample
Private
Non-Catholic
Sample
Total
Unweighted
Sample Size
149,831
10,797
9,589
170,217
Unweighted
Sample %
88.0
6.3
5.6
100.0
Weighted
Sample %
90.1
4.9
5.0
100.0
Population %
90.1
4.9
5.0
100.0
Table 2.3
Percent of Public School Students by Geographic Region
Spring 2000 National Standardization Sample, ITBS, Grades K–8
Geographic
Region
% of Students
in Sample
% of Students
in Weighted Sample
% of Students
in Population
New England and
Mideast
14.7
22.2
21.7
Southeast
26.9
23.8
23.6
Great Lakes and
Plains
25.0
22.3
21.9
West and Far West
33.4
31.7
32.7
Table 2.4
Percent of Public School Students by SES Category
Spring 2000 National Standardization Sample, ITBS, Grades K–8
% of Students
in Sample
% of Students
in Weighted Sample
% of Students
in Population
High
12.2
15.3
15.2
High Average
23.5
19.2
19.1
Average
36.8
31.3
31.5
Low Average
21.2
19.1
19.1
6.3
15.2
15.1
SES
Category
Low
10
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 11
Table 2.5
Percent of Public School Students by District Enrollment
Spring 2000 National Standardization Sample, ITBS, Grades K–8
District K–12
Enrollment
% of Students
in Sample
% of Students
in Weighted Sample
% of Students
in Population
100,000 +
3.9
9.7
15.6
50,000 – 99,999
5.6
9.6
8.5
25,000 – 49,999
11.4
17.9
11.4
10,000 – 24,999
23.3
20.1
17.7
5,000 – 9,999
15.8
10.2
14.7
2,500 – 4,999
17.6
13.6
14.7
1,200 – 2,499
12.5
9.2
10.2
600 – 1,199
7.4
6.9
4.5
Less than 600
2.5
2.8
2.7
Table 2.6
Percent of Catholic Students by Diocese Size and Geographic Region
Spring 2000 National Standardization Sample, ITBS, Grades K–8
% of Students
in Sample
% of Students
in Weighted Sample
% of Students
in Population
100,000 +
7.7
17.5
17.5
50,000 – 99,999
7.4
17.9
18.0
20,000 – 49,999
35.0
21.8
21.7
10,000 – 19,999
19.4
24.9
25.0
Less than 10,000
30.5
17.9
17.8
New England and
Mideast
23.4
35.0
34.9
Southeast
17.5
13.7
13.6
Great Lakes and
Plains
44.2
33.7
33.9
West and Far West
14.9
17.6
17.6
Diocese Size
Geographic Region
11
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 12
Table 2.7
Percent of Private Non-Catholic Students by Geographic Region
Spring 2000 National Standardization Sample, ITBS, Grades K–8
Geographic Region
New England and
Mideast
% of Students
in Sample
% of Students
in Weighted Sample
% of Students
in Population
9.7
24.0
23.8
Southeast
19.5
29.3
29.4
Great Lakes and
Plains
34.0
19.8
19.7
West and Far West
36.8
26.9
27.1
Racial-Ethnic Representation
Participation of Students in Special Groups
Although not a direct part of a typical sampling
plan, the racial-ethnic composition of a national
standardization sample should represent that of the
school population. The racial-ethnic composition of
the 2000 ITBS spring standardization sample was
estimated from responses to demographic questions
on answer documents. In all grades, all the racialethnic group(s) to which a student belonged was
requested. In kindergarten through grade 3,
teachers furnished this information. In the
remaining grades, students furnished it. The results
reported in Table 2.8 include students in Catholic
and other private schools. The table also shows
estimates of population percents in public schools
for each category, according to the National Center
for Education Statistics.
In the spring 2000 national standardization, schools
were given detailed instructions for the testing of
students with disabilities and English Language
Learners. Schools were asked to decide whether
students so identified should be tested, and, if so,
what modifications in testing procedures were
needed.
The response rate for racial-ethnic information was
high; 98 percent of the standardization participants
indicated membership in one of the groups listed.
Although the percents of students in each group
fluctuate from grade to grade, differences between
sample and population percents were generally
within chance error. This was true for all groups
except Hispanics or Latinos, who were slightly
underrepresented. However, some of this
underrepresentation can be attributed to school
districts exempting from testing students whose
first language is not English. These students are not
as likely to be represented in the test-taking
population as they are in the school population.
Collectively, the results in Table 2.8 provide
evidence of the overall quality of the national
standardization sample and its representativeness
of the racial and ethnic makeup of the U.S. student
population.
12
Among students with disabilities, nearly all were
identified as eligible for special education services
and had an Individualized Education Program
(IEP), an Individualized Accommodation Plan (IAP),
or a Section 504 Plan. Schools were asked to
examine the IEP or other plan for these students,
decide whether the student should receive
accommodations, and determine the nature of those
accommodations.
Schools were told an accommodation refers to a
change in the procedures for administering the test
and that an accommodation is intended to
neutralize, as much as possible, the effect of the
student’s disability on the assessment process.
Accommodations should not change the kind of
achievement being measured, but change how
achievement is measured.
When accommodations were used, the test
administrator recorded the type of accommodation
on each student’s answer document. The
accommodations most frequently used by students
with IEPs or Section 504 Plans were listed on the
student answer document. Space for indicating
other accommodations was also included.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 13
Table 2.8
Racial-Ethnic Representation
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
White (62.1 %)*
Grade
Number
Black or African American (17.2 %)*
Percent
Weighted
Number
Percent
Grade
Number
Percent
Weighted
Number
Percent
K
11,137
67.8
33,201
57.0
K
2,735
16.7
11,343
19.5
1
12,648
70.2
29,729
58.7
1
2,629
14.6
7,773
15.3
2
13,529
71.9
30,560
59.1
2
2,323
12.3
6,780
13.1
3
13,308
72.3
33,713
65.6
3
2,229
12.1
7,336
14.3
4
13,437
71.6
33,866
64.6
4
2,024
10.8
7,028
13.4
5
14,516
72.3
35,010
67.0
5
2,146
10.7
7,231
13.8
6
14,776
73.1
35,509
70.0
6
2,223
11.0
7,404
14.6
7
14,346
75.0
36,519
73.1
7
1,731
9.1
6,160
12.3
8
12,146
71.8
37,154
71.4
8
1,877
11.1
8,503
16.3
Total
119,843
71.9
371,460
66.0
Total
19,917
11.9
62,371
14.6
Hispanic or Latino (15.6 %)*
Grade
Number
Percent
Weighted
Number
Asian/Pacific Islander (4.0 %)*
Percent
Grade
Number
Percent
Weighted
Number
Percent
K
1,839
11.2
6,399
11.0
K
429
2.6
1,493
2.6
1
1,975
11.0
5,732
11.3
1
460
2.6
1,192
2.4
2
2,084
11.1
5,785
11.2
2
537
2.9
1,291
2.5
3
1,941
10.5
6,894
13.4
3
497
2.7
1,406
2.7
4
2,080
11.1
7,127
13.6
4
553
2.9
1,468
2.8
5
2,031
10.1
7,499
14.3
5
591
2.9
1,487
2.8
6
1,745
8.6
4,643
9.1
6
614
3.0
1,602
3.2
7
1,647
8.6
4,466
8.9
7
477
2.5
1,328
2.7
8
1,490
8.8
4,379
8.4
8
485
2.9
1,697
3.3
Total
16,832
10.1
62,371
11.1
Total
4,643
2.9
15,584
2.8
American Indian/Alaskan Native (1.2 %)*
Grade
Number
Percent
Weighted
Number
Percent
Native Hawaiian (NA)
Grade
Number
Percent
Weighted
Number
Percent
K
207
1.3
878
1.5
K
70
0.4
194
0.3
1
225
1.2
885
1.7
1
69
0.5
157
0.3
2
250
1.3
946
1.8
2
95
0.5
148
0.3
3
364
2.0
1,279
2.5
3
80
0.4
142
0.3
4
500
2.7
1,806
3.4
4
181
1.0
498
1.0
5
673
3.4
2,128
4.1
5
111
0.6
247
0.5
6
749
3.7
2,244
4.4
6
109
0.5
251
0.5
7
789
4.1
2,476
5.0
7
136
0.7
379
0.8
8
656
3.9
2,148
4.1
8
274
1.6
969
1.9
4,413
2.6
17,782
3.2
Total
1,125
0.7
4,060
0.7
Total
*Population percent (Source: Digest of Education Statistics 2000, 1999–2000 public school enrollment)
13
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 14
For students whose native language was not English
and who had been in an English-only classroom for
a limited time, two decisions had to be made prior to
testing. First, was English language developed
sufficiently to warrant testing, and, second, should
an accommodation be used? In all instances, the
district’s instructional guidelines were used in
decisions about individual accommodations.
Test administration for the 2000 spring
standardization of the ITBS, Form A, took place
between March 23 and May 29; it took place for the
fall standardization between September 21 and
November 11. The spring norming group was a
national probability sample of approximately
170,000 students in kindergarten through grade 8;
the fall sample was approximately 76,000 students.
The test administrators were told that the use of
testing accommodations with English Language
Learners is intended to allow the measurement of
skills and knowledge in the curriculum without
significant interference from a limited opportunity
to learn English. Those just beginning instruction in
English were not likely to be able to answer many
questions no matter what types of accommodations
were used. For those in the second or third year of
instruction in an English as a Second Language
(ESL) program, accommodations might be
warranted to reduce the effect of limited English
proficiency on test performance. The types of
accommodations sometimes used with such
students were listed on the student answer
document for coding.
After answer documents were checked and scored
and sampling weights had been assigned to schools,
weighted opening and closing dates were
determined. These are reference points for the
empirical norms dates. The median empirical norms
date for spring testing is April 30; for fall testing it
is October 22.
Table 2.9 summarizes the use of accommodations
with students with disabilities during the
standardization. While the percents vary somewhat
across grades, an average of about 7 percent of the
students were identified as special education
students or as having a 504 Plan. Of these students,
roughly 50 percent received at least one
accommodation. The last column in the table shows
that in the final distribution of scores from which
the national norms were obtained, an average of
3 percent to 4 percent of the students received an
accommodation. Table 2.10 reports similar
information for English Language Learners.
Empirical Norms Dates
To provide more information for schools with
alternative school calenders, data were collected
from districts on their opening and closing dates.
Procedures to analyze these data were altered from
those used in the 1976–77 standardization—when
the Title I program first required empirical norms
dates—to determine weighted opening and closing
dates. The procedures used and the advice given to
school districts that do not have a standard 180-day,
September-to-May school year are noted below.
14
Regular fall, midyear, and spring norms can be used
by school districts that operate on a twelve-month
schedule. To do so, testing should be scheduled so
the number of instructional days prior to testing
corresponds to the median number of instructional
days for schools in the national standardization.
For example, the fall norms for the 2000 national
standardization were established with a median
testing date of October 22, on average 40
instructional days from the median start date of
schools in the national standardization. If a school
year begins on July 15, testing should be scheduled
between September 1 and September 21. Doing so
places the median testing date at September 10,
about 40 instructional days from the July 15 start
date. By testing during this period, instructional
opportunity is comparable to the norms group and
the use of fall norms is therefore appropriate.
Testing dates for twelve-month schools can be
calculated in a similar way so midyear and spring
norms can be used.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 15
Table 2.9
Test Accommodations — Special Education and 504 Students
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)
Grade
Standardization
Sample
Identified Students
Accommodated Students
N
N
% of
Standardization
Sample
K
58,216
2,121
3.6
262
12.4
0.5
1
50,687
2,397
4.7
905
37.8
1.8
2
51,725
3,076
5.9
1,322
43.0
2.6
3
51,414
3,485
6.8
1,615
46.3
3.1
4
52,392
4,101
7.8
2,184
53.3
4.2
5
52,277
4,286
8.2
2,241
52.3
4.3
6
50,753
3,652
7.2
1,662
45.5
3.3
7
49,925
3,478
7.0
2,146
61.7
4.3
8
52,072
3,489
6.7
2,109
60.4
4.1
N
% of
Identified
Students
% of
Standardization
Sample
Note: Accommodations included Braille, large print, tested off level, answers recorded, extended time, communication assistance,
transferred answers, individual/small group administration, repeated directions, tests read aloud (except for Vocabulary and Reading
Comprehension), plus selected others.
Table 2.10
Test Accommodations — English Language Learners
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)
Grade
Standardization
Sample
Identified Students
Accommodated Students
N
N
% of
Standardization
Sample
N
K
58,216
3,780
6.5
1,122
29.7
1.9
1
50,687
2,853
5.6
382
13.4
0.8
2
51,725
3,352
6.5
244
7.3
0.5
3
51,414
2,460
4.8
358
14.6
0.7
4
52,392
3,604
6.9
565
15.7
1.1
5
52,277
3,060
5.9
315
10.3
0.6
6
50,753
973
1.9
163
16.8
0.3
7
49,925
739
1.5
216
29.2
0.4
8
52,072
662
1.3
156
23.6
0.3
% of
Identified
Students
% of
Standardization
Sample
Note: Accommodations included tested off level, extended time, individual/small group administration, repeated directions, provision of
English/native language word-to-word dictionary, test administered by ESL teacher or individual providing language services.
15
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 16
School Systems Included in the 2000
Standardization Samples
New England and Mideast
Connecticut
Orange: New Haven Hebrew Day School
Thomaston, Thomaston School District: Thomaston
Center Intermediate School
Waterbury, Archdiocese of Hartford: St. Joseph School
Delaware
Newark, Christina School District: Summit School
Wilmington, Brandywine School District: Talley Middle
School
District of Columbia
Washington: Nannie H. Borroughs School
Maine
Bangor, Hermon School District*: Hermon Middle School
Bowdoinham, School Admin. District 75: Bowdoinham
Community Elementary School
Calais, Union 106 Calais: Calais Elementary School
Danforth, School Admin. District 14: East Grand School
Hancock, Union 92 Hancock: Hancock Elementary School
Jonesport, Union 103 Jonesport: Jonesport Elementary
School
Limestone, Caswell School District: Dawn F. Barnes
Elementary School
Monmouth, Monmouth Public School District: Henry
Cottrell Elementary School
North Berwick, School Admin. District 60: Berwick
Elementary School, Hanson School, Noble Junior
High School, North Berwick Elementary School,
Vivian E. Hussey Primary School
Portland, Diocese of Portland: Catherine McAuley High
School, St. Joseph’s School
Robbinston, Union 106 Robbinston*: Robbinston Grade
School
Turner*: Calvary Christian Academy
Vanceboro, Union 108 Vanceboro*: Vanceboro Elementary
School
Maryland
Baltimore, Baltimore City-Dir. Inst. Area 9*: Roland Park
Elementary/Middle School 233
Baltimore, Baltimore City Public School District*:
Edgecombe Circle Elementary School 62, Samuel F.
B. Morse Elementary School 98
Hagerstown: Heritage Academy
Hagerstown, Archdiocese of Baltimore: St. Maria Goretti
High School, St. Mary School
Stevensville, Queen Annes County School District: Kent
Island Elementary School
16
Massachusetts
Adams, Adams Cheshire Regional School District*:
C. T. Plunkett Elementary School
Boston, Archdiocese of Boston: Holy Trinity School,
Immaculate Conception School, St. Bridget School
Bridgewater S., Bridgewater-Raynham Regional School
District: Burnell Laboratory School
Danvers, Danvers School District: Highlands Elementary
School, Willis Thorpe Elementary School
Fall River: Antioch School
Fall River, Diocese of Fall River: Our Lady of Lourdes
School, Our Lady of Mt. Carmel School, Taunton
Catholic Middle School
Fall River, Diocese of Fall River*: Espirito Santo School,
St. Jean Baptiste School
Fall River, Fall River School District*: Brayton Avenue
Elementary School, Harriet T. Healy Elementary
School, Laurel Lake Elementary School, McCarrick
Elementary School, Ralph Small Elementary School,
Westall Elementary School
Fitchburg, Fitchburg School District: Memorial
Intermediate School
Lowell: Lowell Public School District
Peabody, Peabody Public School District: Kiley Brothers
Memorial School
Phillipston, Narragansett Regional School District:
Phillipston Memorial Elementary School
South Lancaster*: Browning SDA Elementary School
Swansea: Swansea School District
Walpole: Walpole Public School District
Weymouth, Weymouth School District*: Academy Avenue
Primary School, Lawrence Pingree Primary School,
Murphy Primary School, Ralph Talbot Primary
School, South Intermediate School, Union Street
Primary School, William Seach Primary School
Worcester, Worcester Public School District: University
Park Campus School
New Hampshire
Bath, District 23 Bath: Bath Village Elementary School
Litchfield, District 27 Litchfield: Griffin Memorial
Elementary School
Manchester, District 37 Manchester*: Bakersville
Elementary School, Gossler Park Elementary School,
Hallsville Elementary School, McDonough
Elementary School, Southside Middle School, Weston
Elementary School
North Haverhill, District 23 Haverhill: Haverhill
Cooperative Middle School, Woodsville Elementary
School, Woodsville High School
Rochester, District 54 Rochester: Maple Street
Elementary School
Salem: Granite Christian School
Warren, District 23 Warren*: Warren Village School
Note: Schools marked with an asterisk (*) participated in
both spring and fall standardizations.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 17
New Jersey
Collingswood*: Collingswood Public School District
Elizabeth: Bruriah High School For Girls
Jersey City: Jersey City Public School District
Salem, Mannington Township School District:
Mannington Elementary School
New York
Beaver Falls, Beaver River Central School District:
Beaver River Central School
Briarcliff Manor, Briarcliff Manor Union Free School
District: Briarcliff High School
Bronx*: Regent School
Dobbs Ferry, Archdiocese of New York*: Our Lady of
Victory Academy
Elmhurst, Diocese of Brooklyn: Cathedral Preparatory
Seminary
Lowville, Lowville Central School District: Lowville
Academy and Central School
New York, Archdiocese of New York: Corpus Christi
School, Dominican Academy, St. Christopher
Parochial School, St. John Villa Academy-Richmond,
St. Joseph Hill Academy
North Tonawanda: North Tonawanda School District
Old Westbury: Whispering Pines SDA School
Spring Valley, East Ramapo Central School District*:
M. L. Colton Intermediate School
Weedsport, Weedsport Central School District: Weedsport
Elementary School, Weedsport Junior/Senior High
School
Pennsylvania
Austin, Austin Area School District: Austin Area School
Bloomsburg, Bloomsburg Area School District:
Bloomsburg Memorial Elementary School
Cheswick, Allegheny Valley School District: Acmetonia
Primary School
Dubois, Diocese of Erie*: Dubois Central Christian High
School
Ebensburg, Diocese of Altoona Johnstown: Bishop Carroll
High School
Erie, Millcreek Township School District*: Chestnut Hill
Elementary School
Farrell: Farrell Area School District
Gettysburg, Gettysburg Area School District: Gettysburg
Area High School
Hadley, Commodore Perry School District: Commodore
Perry School
Lebanon, Diocese of Harrisburg: Lebanon Catholic
Junior/Senior High School
Manheim: Manheim Central School District
McKeesport, Diocese of Pittsburgh: Serra Catholic High
School
McKeesport, South Allegheny School District: Glassport
Central Elementary School, Manor Elementary
School, Port Vue Elementary School, South
Allegheny Middle High School
Middleburg, Midd-West School District: Penns Creek
Elementary School, Perry-West Perry Elementary
School
Philadelphia, Philadelphia School District-Bartram*:
Bartram High School
Philadelphia, Philadelphia School District-Franklin*:
Stoddart-Fleisher Middle School
Philadelphia, Philadelphia School District-Gratz*:
Thomas M. Peirce Elementary School
Philadelphia, Philadelphia School District-Kensington*:
Alexander Adaire Elementary School
Philadelphia, Philadelphia School District-Olney*: Jay
Cooke Middle School
Philadelphia, Philadelphia School District-Overbrook*:
Lewis Cassidy Elementary School
Philadelphia, Philadelphia School District-William
Penn*: John F. Hartranft Elementary School
Pittsburgh: St. Matthew Lutheran School
Pittsburgh, Diocese of Pittsburgh*: St. Mary of the Mount
School
Rhode Island
Johnston: Trinity Christian Academy
Providence: Providence Hebrew Day School
Providence, Diocese of Providence: All Saints Academy,
St. Xavier Academy
Vermont
Chelsea, Chelsea School District: Chelsea School
Williston: Brownell Mountain SDA School
Southeast
Alabama
Abbeville, Henry County School District: Abbeville
Elementary School, Abbeville High School
Columbiana, Shelby County School District: Helena
Elementary School, Oak Mountain Middle School
Dothan, Dothan City School District*: Beverlye Middle
School, East Highland Learning Center, Girard
Middle School, Honeysuckle Middle School
Eclectic, Elmore County School District: Eclectic
Elementary School
Elberta, Baldwin County School District: Elberta Middle
School
Fairhope, Baldwin County School District*: Fairhope
Elementary School, Fairhope Intermediate School
Jacksonville: Jacksonville Christian Academy
Mobile, Mobile County School District: Cora Castlen
Elementary School, Dauphin Island Elementary
School
Mobile, Mobile County School District*: Adelia Williams
Elementary School, Florence Howard Elementary
School, Mobile County Training School
Monroeville, Monroe County School District: Monroe
County High School
Tuscaloosa, Tuscaloosa County School District: Hillcrest
High School
17
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 18
Arkansas
Kentucky
Altus, Altus-Denning School District 31: Altus-Denning
Elementary School, Altus-Denning High School
Beebe, Beebe School District*: Beebe Elementary School,
Beebe Intermediate School
Bismarck, Bismarck School District*: Bismarck
Elementary School
Conway, Conway School District*: Ellen Smith
Elementary School, Florence Mattison Elementary
School, Ida Burns Elementary School, Marguerite
Vann Elementary School
Fouke, Fouke School District 15*: Fouke High School,
Fouke Middle School
Gentry: Ozark Adventist Academy
Grady: Grady School District 5
Little Rock*: Heritage Christian School
Mountain Home, Mountain Home School District 9:
Pinkston Middle School
Norman: Caddo Hills School District
Springdale, Springdale School District 50*: Parson Hills
Elementary School
Strawberry: River Valley School District
Benton*: Christian Fellowship School
Bowling Green, Warren County School District*: Warren
East High School
Campbellsville: Campbellsville Independent School
District
Elizabethtown, Hardin County School District: East
Hardin Middle School, Parkway Elementary School,
Rineyville Elementary School, Sonora Elementary
School, Upton Elementary School, Woodland
Elementary School
Elizabethtown, Hardin County School District*: Brown
Street Alternative Center, G. C. Burkhead
Elementary School, Lynnvale Elementary School,
New Highland Elementary School
Florence: Northern Kentucky Christian School
Fordsville, Ohio County School District: Fordsville
Elementary School
Hardinsburg, Breckinridge County School District:
Hardinsburg Primary School
Hartford, Ohio County School District*: Ohio County
High School, Ohio County Middle School, Wayland
Alexander Elementary School
Hazard, Perry County School District*: Perry County
Central High School
Louisville: Eliahu Academy
Pineville: Pineville Independent School District
Williamstown, Williamstown Independent School District:
Williamstown Elementary School
Florida
Archer, Alachua County School District: Archer
Community School
Century, Escambia School District: George W. Carver
Middle School
Fort Lauderdale, Archdiocese of Miami: St. Helen School
Gainesville, Alachua County School District*: Hidden
Oak Elementary School, Kimball Wiles Elementary
School
Jacksonville Beach: Beaches Episcopal School
Kissimmee, Osceola School District: Reedy Creek
Elementary School
Miami, Archdiocese of Miami*: St. Agatha School
Ocala, Marion County School District*: Fort King Middle
School
Orlando, Orange County School District-East: Colonial
9th Grade Center, University High School
Palm Bay, Diocese of Orlando: St. Joseph Catholic School
Palm Coast, Flagler County School District: Buddy
Taylor Middle School, Old Kings Elementary School
Pensacola, Escambia School District*: Redirections
Georgia
Barnesville: Lamar County School District
Crawfordville, Taliaferro County School District:
Taliaferro County School
Cumming, Forsyth County School District*: South
Forsyth Middle School
Dalton, Whitfield County School District: Cohutta
Elementary School, Northwest High School, Valley
Point Middle School, Westside Middle School
Shellman*: Randolph Southern School
18
Louisiana
Chalmette, St. Bernard Parish School District: Arabi
Elementary School, Beauregard Middle School,
Borgnemouth Elementary School, J. F. Gauthier
Elementary School, Joseph J. Davies Elementary
School, N. P. Trist Middle School, Sebastien Roy
Elementary School, St. Bernard High School
Chalmette, St. Bernard Parish School District*: Andrew
Jackson Fundamental High School, C. F. Rowley
Elementary School, Lacoste Elementary School
Lafayette, Diocese of Lafayette*: Redemptorist
Elementary School, St. Peter School
Plain Dealing: Plain Dealing Academy
Shreveport, Diocese of Shreveport*: Holy Rosary School,
Jesus the Good Shepherd School, St. John Cathedral
Grade School
West Monroe, Diocese of Shreveport: St. Paschal School
Mississippi
Brandon*: University Christian School
Gulfport, Gulfport School District: Bayou View
Elementary School
North Carolina
Greensboro, Guilford County School District: Alamance
Elementary School, Montlieu Avenue Elementary
School, Shadybrook Elementary School
Hillsborough: Abundant Life Christian School
Manteo: Dare County School District
New Bern, Diocese of Raleigh: St. Paul Education Center
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 19
South Carolina
West Virginia
Beaufort, Beaufort County School District: E. C.
Montessori and Grade School
Camden: Kershaw County School District
North Augusta, Diocese of Charleston: Our Lady of Peace
School
Rock Hill: Westminster Catawba Christian School
Salem, Oconee County School District*: Tamassee Salem
Middle High School
Westminster, Oconee County School District: West-Oak
High School
Arnoldsburg, Calhoun County School District*:
Arnoldsburg School
Elizabeth, Wirt County School District: Wirt County High
School
Grantsville, Calhoun County School District: Pleasant
Hill Elementary School
Omar: Beth Haven Christian School
Wayne, Wayne County School District: East Lynn
Elementary School, Lavalette Elementary School,
Wayne Middle School
Weirton, Diocese of Wheeling-Charleston: Madonna High
School
Tennessee
Athens, McMinn County School District: Mountain View
Elementary School, Niota Elementary School
Athens, McMinn County School District*: E. K. Baker
Elementary School, Rogers Creek Elementary School
Byrdstown*: Pickett County School District
Dyer, Gibson County School District: Medina Elementary
School, Rutherford Elementary School
Fairview, Williamson County School District: Fairview
High School
Harriman, Harriman City School District*: Raymond S.
Bowers Elementary School, Walnut Hill Elementary
School
Harrogate: J. Frank White Academy
Murfreesboro, Rutherford County School District: Central
Middle School, Smyrna West Kindergarten
Somerville, Fayette County School District: Jefferson
Elementary School, Oakland Elementary School
Yorkville, Gibson County School District*: Yorkville
Elementary School
Virginia
Charlottesville, Albemarle County School District:
Monticello High School
Chesapeake: Tidewater Adventist Academy
Forest, Bedford County School District: Forest Middle
School
Jonesville, Lee County School District*: Ewing
Elementary School, Rose Hill Elementary School
Madison Heights, Amherst County School District:
Madison Heights Elementary School
Marion, Smyth County School District: Atkins
Elementary School, Chilhowie Elementary School,
Chilhowie Middle School, Marion Intermediate
School, Marion Middle School, Marion Primary
School, Sugar Grove Combined School
Saltville, Smyth County School District*: Northwood
Middle School, Rich Valley Elementary School,
Saltville Elementary School
St. Charles, Lee County School District: St. Charles
Elementary School
Staunton: Stuart Hall School
Suffolk, Suffolk Public School District*: Forest Glen
Middle School
Great Lakes and Plains
Illinois
Bartlett, Elgin School District U-46, Area B: Bartlett
Elementary School, Bartlett High School
Benton, Benton Community Consolidated School District
47: Benton Elementary School
Berwyn, Berwyn South School District 100: Heritage
Middle School
Cambridge, Cambridge Community Unit School District
227*: Cambridge Community Elementary School,
Cambridge Community Junior/Senior High School
Chicago, Chicago Public School District-Region 1*:
Stockton Elementary School
Chicago, Chicago Public School District-Region 4*:
Brighton Park Elementary School
Duquoin, Duquoin Community Unit School District 300:
Duquoin Middle School
Elgin, Elgin School District U-46, Area A*: Century Oaks
Elementary School, Garfield Elementary School,
Washington Elementary School
Elgin, Elgin School District U-46, Area B*: Elgin High
School, Ellis Middle School
Glendale Heights, Queen Bee School District 16:
Pheasant Ridge Primary School
Joliet: Ridgewood Baptist Academy
Lake Villa, Lake Villa Community Consolidated School
District 41: Joseph J. Pleviak Elementary School
Lincoln, Lincoln Elementary School District 27*:
Northwest Elementary School, Washington-Monroe
Elementary School
Mossville, Illinois Valley Central School District 321:
Mossville Elementary School
Quincy, Diocese of Springfield: St. Francis Solanus
School
Schaumburg, Schaumburg Community Consolidated
School District 54: Douglas MacArthur Elementary
School, Everett Dirksen Elementary School
Streamwood, Elgin School District U-46, Area C*:
Oakhill Elementary School, Ridge Circle Elementary
School
Villa Park: Islamic Foundation School
Wayne City*: Wayne City Community Unit School
District 100
Westmont: Westmont Community Unit School District 201
19
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 20
Indiana
Hammond, Diocese of Gary*: Bishop Noll Institute, St.
Catherine of Siena School, St. John Bosco School
Indianapolis, Perry Township School District: Homecroft
Elementary School, Mary Bryan Elementary School
Logansport, Logansport Community School District*:
Lincoln Middle School
Spencer, Spencer-Owen Community School District:
Gosport Elementary School, Patricksburg
Elementary School, Spencer Elementary School
Valparaiso, Valparaiso Community School District:
Benjamin Franklin Middle School, Thomas Jefferson
Middle School
Vevay, Switzerland County School District: Switzerland
County High School
Warsaw: Redeemer Lutheran School
Warsaw, Warsaw Community Schools: Eisenhower
Elementary School
Iowa
Alton, Diocese of Sioux City: Spalding Catholic
Elementary School
Bellevue, Archdiocese of Dubuque: Marquette High School
Davenport, Diocese of Davenport: Cardinal Stritch
Junior/Senior High School, Mater Dei Junior/Senior
High School, Notre Dame Elementary School, Trinity
Elementary School
Delhi: Maquoketa Valley Community School District
Remsen, Diocese of Sioux City*: St. Mary’s High School
Williamsburg: Lutheran Interparish School
Kansas
Anthony, Anthony-Harper Unified School District 361:
Chaparral High School, Harper Elementary School
Columbus, Columbus Unified School District 493:
Central School
Galena, Columbus Unified School District 493*: Spencer
Elementary School
Kansas City*: Mission Oaks Christian School
Kansas City, Archdiocese of Kansas City: Assumption
School, St. Agnes School
Osawatomie*: Osawatomie Unified School District 367
Spring Hill, Spring Hill Unified School District 230:
Spring Hill High School
St. Paul, Erie-St. Paul Consolidated School District 101:
St. Paul Elementary School, St. Paul High School
Westwood*: Mission Oaks Christian School-Westwood
Michigan
Algonac, Algonac Community School District: Algonac
Elementary School
Auburn*: Zion Lutheran School
Berkley, Berkley School District: Pattengill Elementary
School
Bloomingdale: Bloomingdale Public School District
Buckley, Buckley Community School District: Buckley
Community School
Canton, Plymouth-Canton Community Schools*: Gallimore
Elementary School, Hoben Elementary School
20
Carleton: Airport Community School District
Dafter, Sault Ste. Marie Area School District*: Bruce
Township Elementary School
Gaylord, Gaylord Community School District*: Elmira
Elementary School, Gaylord High School, Gaylord
Intermediate School, Gaylord Middle School, North
Ohio Elementary School, South Maple Elementary
School
Grand Blanc: Grand Blanc Community School District
Macomb: St. Peter Lutheran School
Plymouth, Plymouth-Canton Community Schools: Allen
Elementary School, Bird Elementary School,
Farrand Elementary School, Fiegel Elementary
School, Smith Elementary School
Redford, Archdiocese of Detroit*: St. Agatha High School
Reese: Trinity Lutheran School
Rockwood, Gibraltar School District: Chapman
Elementary School
Royal Oak, Royal Oak Public School District: Dondero
High School, Franklin Elementary School
St. Joseph, Diocese of Kalamazoo*: Lake Michigan
Catholic Elementary School, Lake Michigan Catholic
Junior/Senior High School
Traverse City: Traverse City Area Public Schools
Wayland, Wayland Union School District: Bessie B.
Baker Elementary School, Dorr Elementary School
Whittemore, Whittemore Prescott Area School District:
Whittemore-Prescott Alternative Education Center
Minnesota
Barnum, Barnum Independent School District 91:
Barnum Elementary School
Baudette, Lake of the Woods Independent School District
390: Lake of the Woods School
Farmington: Farmington Independent School District 192
Hanska, New Ulm Independent School District 88:
Hanska Community School
Hastings, Hastings Independent School District 200:
Cooper Elementary School, John F. Kennedy
Elementary School, Pinecrest Elementary School
Isanti, Cambridge-Isanti School District 911: Isanti
Middle School
Lafayette, Minnesota Department of Education: Lafayette
Charter School
Menahga, Menahga Independent School District 821:
Menahga School
Mendota Heights: West St. Paul-Mendota-Eagan School
District 197
Newfolden, Newfolden Independent School District 441*:
Newfolden Elementary School
Rochester: Schaeffer Academy
St. Paul: Christ Household of Faith School
Stillwater, Stillwater School District 834: Stonebridge
Elementary School
Watertown, Watertown-Mayer School District 111:
Watertown-Mayer Elementary School, WatertownMayer High School, Watertown-Mayer Middle School
Winsted, Diocese of New Ulm: Holy Trinity Elementary
School, Holy Trinity High School
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 21
Missouri
Cape Girardeau, Cape Girardeau School District 63:
Barbara Blanchard Elementary School
Cape Girardeau, Diocese of Springfield/Cape Girardeau:
Notre Dame High School
Lexington, Lexington School District R5: Lexington High
School
Liberal: Liberal School District R2
Rogersville: Greene County School District R8
Rueter, Mark Twain Elementary School District R8: Mark
Twain R8 Elementary School
Sparta: Sparta School District R3
Springfield: New Covenant Academy
Nebraska
Burwell, Burwell High School District 100: Burwell
Junior/Senior High School
Creighton, Archdiocese of Omaha*: St. Ludger
Elementary School
Lemoyne, Keith County Centers: Keith County District 51
School
Omaha, Archdiocese of Omaha: All Saints Catholic
School, Guadalupe-Inez School, Holy Name School,
Pope John XXIII Central Catholic High School,
Roncalli Catholic High School, Sacred Heart School,
SS Peter and Paul School, St. James-Seton School,
St. Thomas More School
Randolph: Randolph School District 45
Seward: St. John’s Lutheran School
Spalding, Diocese of Grand Island*: Spalding Academy
North Dakota
Belfield, Belfield Public School District 13*: Belfield
School
Bismarck: Dakota Adventist Academy
Fargo: Grace Lutheran School
Grand Forks: Grand Forks Christian School
Halliday, Twin Buttes School District 37: Twin Buttes
Elementary School
Minot, Diocese of Bismarck: Bishop Ryan Junior/Senior
High School
New Town, New Town School District 1*: Edwin Loe
Elementary School, New Town Middle School/High
School
Ohio
Akron, Akron Public Schools*: Academy At Robinson,
Erie Island Montessori School, Mason Elementary
School
Bowling Green: Wood Public Schools
Chillicothe, Chillicothe City School District: Tiffin
Elementary School
Cincinnati, Archdiocese of Cincinnati: Catholic Central
High School, St. Brigid School
Cleveland: Lutheran High School East
Dalton, Dalton Local School District: Dalton
Intermediate School
Danville, Danville Local School District: Danville High
School, Danville Intermediate School, Danville
Primary School
East Cleveland, East Cleveland City School District:
Caledonia Elementary School, Chambers
Elementary School, Mayfair Elementary School,
Rozelle Elementary School, Shaw High School,
Superior Elementary School
Lima, Lima City School District: Lowell Elementary
School
London, London City School District: London Middle
School
Ripley: Ripley-Union-Lewis-Huntington Elementary
School
Sidney, Sidney City School District: Bridgeview Middle
School
Steubenville, Diocese of Steubenville: Catholic Central
High School
Toledo, Washington Local School District: Jefferson
Junior High School
Upper Sandusky, Upper Sandusky Exempted Village
School District*: East Elementary School
South Dakota
Huron: James Valley Christian School
Sioux Falls*: Calvin Christian School
Wisconsin
Appleton: Fox Valley Lutheran School, Grace Christian
Day School
Edgerton: Oaklawn Academy
Kenosha, Kenosha Unified School District 1: Bose
Elementary School, Bullen Middle School, Grewenow
Elementary School, Jeffery Elementary School,
Lance Middle School, Lincoln Middle School,
McKinley Middle School
Milwaukee: Bessie M. Gray Prep. Academy*, Clara
Muhammed School, Early View Academy of
Excellence, Hickman’s Prep. School*, Milwaukee
Multicultural Academy, Mount Olive Lutheran
School, The Woodson Academy
Milwaukee, Milwaukee Public Schools*: Khamit Institute
Oshkosh, Diocese of Green Bay*: St. John Neumann
School
Plymouth, Plymouth Joint School District: Cascade
Elementary School, Fairview Elementary School,
Horizon Elementary School, Parkview Elementary
School, Parnell Elementary School, Riverview Middle
School
Stoughton, Stoughton Area School District: Sandhill
School, Yahara Elementary School
Strum, Eleva-Strum School District: Eleva-Strum
Primary School
21
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 22
West and Far West
Alaska
Anchorage: Heritage Christian School
Juneau, Juneau School District: Auke Bay Elementary
School, Dzantik’i Heeni Middle School, Floyd Dryden
Middle School
Nikiski, Kenai Peninsula Borough School District:
Nikiski Middle High School
Palmer: Valley Christian School
Arizona
Litchfield Park, Litchfield Elementary School District 79:
Litchfield Elementary School
Mesa, Diocese of Phoenix: Christ the King School
Mesa, Mesa Unified School District 4: Franklin East
Elementary School
Phoenix, Creighton School District 14*: Loma Linda
Elementary School
Phoenix, Washington Elementary School District 6:
Roadrunner Elementary School
Pima, Pima Unified School District 6*: Pima Elementary
School
Teec Nos Pos: Immanuel-Carrizo Christian Academy
Tempe: Grace Community Christian School
Tucson: Tucson Hebrew Academy
California
Atascadero, Atascadero Unified School District: Carrisa
Plains Elementary School, Creston Elementary
School
Bakersfield, Panama Buena Vista Union School District:
Laurelglen Elementary School
Cathedral City, Palm Springs Unified School District:
Cathedral City Elementary School
Cerritos, ABC Unified School District: Faye Ross Middle
School, Joe A. Gonsalves Elementary School, Palms
Elementary School
Fontana, Fontana Unified School District*: North
Tamarind Elementary School
Fresno, Fresno Unified School District: Malloch
Elementary School
Lompoc, Lompoc Unified School District: LA Canada
Elementary School, Leonora Fillmore Elementary
School
Los Angeles, Archdiocese of Los Angeles: Alverno High
School, St. Lucy School
Los Angeles, Los Angeles Unified School District, Local
District C: Lanai Road Elementary School
Los Angeles, Los Angeles Unified School District, Local
District D: LA Center For Enriched Studies
Los Angeles, Los Angeles Unified School District, Local
District F: Pueblo De Los Angeles High School
Los Angeles, Los Angeles Unified School District, Local
District G: Fifty-Second Street Elementary School
Los Angeles, Los Angeles Unified School District, Local
District I: Alain LeRoy Locke Senior High School,
David Starr Jordan Senior High School, Youth
Opportunities Unlimited
22
Modesto, Modesto City School District: Everett
Elementary School, Franklin Elementary School
Norco, Corona-Norco Unified School District: Coronita
Elementary School, Highland Elementary School
Oakland, Oakland Unified School District: Lockwood
Elementary School
Oceanside, Oceanside Unified School District: San Luis
Rey Elementary School
Palm Desert, Desert Sands Unified School District: Palm
Desert Middle School
Ripon, Ripon Unified School District: Ripon Elementary
School
Salinas: Winham Street Christian Academy
San Clemente, Diocese of Orange: Our Lady of Fatima
School
San Diego, San Diego Unified School District: Sojourner
Truth Learning Academy
San Diego, Streetwater Union High School District: Mar
Vista Middle School
San Francisco: Hebrew Academy-San Francisco
Santa Ana, Diocese of Orange*: Our Lady of the Pillar
School
Santa Ana, Santa Ana Unified School District*: Dr.
Martin L. King Elementary School, Madison
Elementary School
Walnut, Walnut Valley Unified School District*: Walnut
Elementary School
Colorado
Aurora, Aurora School District 28-J: Aurora Central
High School, East Middle School
Colorado Springs, Academy School District 20: Explorer
Elementary School, Foothills Elementary School,
Pine Valley Elementary School
Denver: Beth Eden Baptist School
Denver, Archdiocese of Denver: Bishop Machebeuf High
School
Golden, Jefferson County School District R-1: Bear Creek
Elementary School, Campbell Elementary School,
D’Evelyn Junior/Senior High School, Devinny
Elementary School, Jefferson Academy Elementary
School, Jefferson Academy Junior High School,
Jefferson Hills, Lakewood Senior High School,
Lincoln Academy, Moore Middle School, Sierra
Elementary School
Northglenn, Adams 12 Five Star Schools: Northglenn
Middle School
Parker, Douglas County School District R-1: Colorado
Visionary Academy
Penrose, Fremont School District R-2*: Penrose
Elementary/Middle School
Thornton, Adams 12 Five Star Schools*: Cherry Drive
Elementary School, Eagleview Elementary School
Windsor, Windsor School District R-4: Mountain View
Elementary School, Skyview Elementary School
Hawaii
Honolulu: Holy Nativity School, Hongwanji Mission
School
Lawai: Kahili Adventist School
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 23
Idaho
Boise: Cole Christian School
Burley, Cassia County Joint School District 151: Albion
Elementary School, Cassia County Education
Center, Declo Elementary School, Oakley
Junior/Senior High School, Raft River Elementary
School, Raft River High School, White Pine
Intermediate School
Kimberly*: Kimberly School District 414
Saint Maries: St. Maries Joint School District 41
Twin Falls: Immanuel Lutheran School
Montana
Belfry, Belfry School District 3*: Belfry School
Billings*: Billings Christian School
Kalispell: Flathead Christian School
Miles City, Diocese of Great Falls-Billings: Sacred Heart
Elementary School
Willow Creek, Willow Creek School District 15-17J:
Willow Creek School
Nevada
Amargosa Valley, Nye County School District: Amargosa
Valley Elementary School
Henderson*: Black Mountain Christian School
Reno: Silver State Adventist School
Sandy Valley, Clark County School District-Southwest*:
Sandy Valley School
Sparks*: Legacy Christian Elementary School
New Mexico
Alamogordo, Alamogordo School District 1*: Sacramento
Elementary School
Albuquerque: Evangelical Christian Academy
Espanola, Espanola School District 55: Chimayo
Elementary School, Hernandez Elementary School,
Velarde Elementary School
Oklahoma
Arapaho, Arapaho School District I-5: Arapaho School
Bray: Bray-Doyle School District 42
Chickasha: Chickasha School District 1
Duncan: Duncan School District 1
Elk City: Elk City School District 6
Guthrie: Guthrie School District 1
Laverne: Laverne School District
Milburn, Milburn School District I-29: Milburn
Elementary School, Milburn High School
Mill Creek: Mill Creek School District 2
Oklahoma City: Oklahoma City School District I-89
Oklahoma City, Archdiocese of Oklahoma City: St.
Charles Borromeo School
Purcell: Purcell School District 15
Roland: Roland Independent School District 5
Shidler, Shidler School District 11: Shidler High School
Tulsa: Tulsa Adventist Academy
Tulsa, Diocese of Tulsa: Bishop Kelley High School, Holy
Family Cathedral School
Wellston, Wellston School District 4*: Wellston Public
School
Oregon
Boring*: Hood View Junior Academy
Corvallis, Corvallis School District 509J: Inavale
Elementary School, Western View Middle School
Eugene, Eugene School District 4J: Buena Vista Spn.
Immersion School, Gilham Elementary School,
Meadowlark Elementary School, Washington
Elementary School
Grants Pass: Brighton Academy
Jefferson: Jefferson School District 14J
Portland*: Portland Christian Schools
Portland, Archdiocese of Portland: Fairview Christian
School, O’Hara Catholic School
Texas
Amarillo, Diocese of Amarillo*: Alamo Catholic High
School
Baird*: Baird Independent School District
Brownsboro, Brownsboro Independent School District*:
Brownsboro Elementary School, Brownsboro High
School, Brownsboro Intermediate School, Chandler
Elementary School
Dallas, Dallas Independent School District-Area 1:
Gilbert Cuellar Senior Elementary School
Deweyville, Deweyville Independent School District:
Deweyville Elementary School
Dilley, Dilley Independent School District: Dilley High
School
Driscoll: Driscoll Independent School District
Franklin: Franklin Independent School District
Fresno, Fort Bend Independent School District*: Walter
Moses Burton Elementary School
Gladewater: Gladewater Independent School District
Houston, Spring Branch Independent School District*:
Cornerstone Academy, Spring Shadows Elementary
School
Imperial: Buena Vista Independent School District
Jacksonville, Jacksonville Independent School District:
Jacksonville Middle School, Joe Wright Elementary
School
Laredo*: Laredo Christian Academy
Lubbock, Lubbock Independent School District: S. Wilson
Junior High School
Nederland, Nederland Independent School District*:
Wilson Middle School
Odessa, Ector County Independent School District:
Burleson Elementary School
Perryton: Perryton Independent School District
Stockdale, Stockdale Independent School District:
Stockdale Elementary School, Stockdale High School
Sugar Land, Fort Bend Independent School District:
Sugar Mill Elementary School
Whitesboro, Whitesboro Independent School District:
Whitesboro High School
23
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 24
Utah
American Fork, Alpine School District: Lehi High School,
Lone Peak High School, Manila Elementary School,
Meadow Elementary School
Brigham City, Box Elder County School District*: Adele
C. Young Intermediate School, Box Elder Middle
School, Perry Elementary School, Willard
Elementary School
Cedar City, Iron County School District: Cedar Middle
School
Eskdale: Shiloah Valley Christian School
Layton, Davis County School District: Crestview
Elementary School
Murray: Deseret Academy
Murray, Murray City School District: Liberty Elementary
School
Ogden*: St. Paul Lutheran School
Ogden, Ogden City School District*: Bonneville
Elementary School, Carl H. Taylor Elementary
School, Gramercy Elementary School, Ogden High
School
Ogden, Weber School District*: Green Acres Elementary
School
Orem, Alpine School District*: Canyon View Junior High
School
Price: Carbon County School District
Tremonton, Box Elder County School District: North Park
Elementary School
24
Washington
Gig Harbor: Gig Harbor Academy
Seattle: North Seattle Christian School
Spokane: Spokane Lutheran School
Vancouver: Evergreen School District 114
Wyoming
Cheyenne: Trinity Lutheran School
Torrington: Valley Christian School
Yoder, Goshen County School District 1: South East
School
961464_ITBS_GuidetoRD.qxp
10/29/10
PART 3
3:15 PM
Page 25
Validity in the Development and
Use of The Iowa Tests
Validity in Test Use
Validity is an attribute of information from tests
that, according to the Standards for Educational
and Psychological Testing, “refers to the degree to
which evidence and theory support the
interpretations of test scores entailed by proposed
uses of tests” (1999, p. 9).
Assessment information is not considered valid or
invalid in any absolute sense. Rather, the
information is considered valid for a particular use
or interpretation and invalid for another. The
Standards further state that validation involves the
accumulation of evidence to support the proposed
score interpretations.
This part of the Guide to Research and Development
provides an overview of the data collected over the
history of The Iowa Tests that pertains to validity.
Data and research pertaining to The Iowa Tests
consider the five major sources of validity evidence
outlined in the Standards: (1) test content,
(2) response processes, (3) internal structure,
(4) relations to other variables, and (5) consequences
of testing.
The purposes of this part of the Guide are (1) to
present the rationale for the professional judgments
that lie behind the content standards and
organization of the Iowa Tests of Basic Skills, (2) to
describe the process used to translate those
judgments into developmentally appropriate test
materials, and (3) to characterize a range of
appropriate uses of results and methods for
reporting information on test performance to
various audiences.
Criteria for Evaluating Achievement Tests
Evaluating an elementary school achievement test
is much like evaluating other instructional
materials. In the latter case, the recommendations
of other educators as well as the authors and
publishers would be considered. The decision to
adopt materials locally, however, would require
page-by-page scrutiny of the materials to
understand their content and organization.
Important factors in reviewing materials would be
alignment with local educational standards and
compatability with instructional methods.
The evaluation of an elementary achievement test is
much the same. What the authors and publisher can
say about how the test was developed, what
statistical data indicate about the technical
characteristics of the test, and what judgments of
quality unbiased experts make in reviewing the test
all contribute to the final evaluation. But the
decision about the potential validity of the test rests
primarily on local review and item-by-item
inspection of the test itself. Local analysis of test
content—including judgments of its appropriateness
for students, teachers, other school personnel, and
the community at large—is critical.
Validity of the Tests
Validity must be judged in relation to purpose.
Different purposes may call for tests built to
different specifications. For example, a test intended
to determine whether students have reached a
performance standard in a local district is unlikely
to have much validity for measuring differences in
progress toward individually determined goals.
Similarly, a testing program designed primarily to
answer “accountability” questions may not be the
best program for stimulating differential instruction
and creative teaching.
Cronbach long ago made the point that validation is
the task of the interpreter: “In the end, the
responsibility for valid use of a test rests on the
person who interprets it. The published research
merely provides the interpreter with some facts and
concepts. He has to combine these with his other
knowledge about the person he tests. . . .” (1971,
p. 445). Messick contended that published research
should bolster facts and concepts with “some
exposition of the critical value contents in which the
facts are embedded and with provisional accounting
of the potential social consequences of alternative
test uses” (1989, p. 88).
Instructional decisions involve the combination of
test validity evidence and prior information about
25
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 26
the person or group tested. The information that test
developers can reasonably be expected to provide
about all potential uses of tests in decision-making
is limited. Nevertheless, one should explain how
tests are developed and provide recommendations
for appropriate uses. In addition, guidelines should
be established for reporting test results that lead to
valid score interpretations so that the consequences
of test use at the local level are clear.
The procedures used to develop and revise test
materials and interpretive information lay the
foundation for test validity. Meaningful evidence
related to inferences based on test scores, not to
mention desirable consequences from those
inferences, can only provide test scores with social
utility if test development produces meaningful test
materials. Content quality is thus the essence of
arguments for test validity (Linn, Baker & Dunbar,
1991). The guiding principle for the development of
The Iowa Tests is that materials presented to
students be of sufficient quality to make the time
spent testing instructionally useful. Passages are
selected for the Reading tests, for example, not only
because they yield good comprehension questions,
but because they are interesting to read. Items that
measure discrete skills (e.g., capitalization and
punctuation) contain factual content that promotes
incidental learning during the test. Experimental
contexts in science expose students to novel
situations through which their understanding of
scientific reasoning can be measured. These
examples show ways in which developers of The
Iowa Tests try to design tests so taking the test can
itself be considered an instructional activity. Such
efforts represent the cornerstone of test validity.
Statistical Data to Be Considered
The types of statistical data that might be
considered as evidence of test validity include
reliability coefficients, difficulty indices of individual
test items, indices of the discriminating power of the
items, indices of differential functioning of the items,
and correlations with other measures such as course
grades, scores on other tests of the same type, or
experimental measures of the same content or skills.
All of these types of evidence reflect on the validity
of the test, but they do not guarantee its validity.
They do not prove that the test measures what it
purports to measure. They certainly cannot reveal
whether the things being measured are those that
ought to be measured. A high reliability coefficient,
for example, shows that the test is measuring
something consistently but does not indicate what
26
that “something” is. Given two tests with the same
title, the one with the higher reliability may actually
be the less valid for a particular purpose (Feldt,
1997). For example, one can build a highly reliable
mathematics test by including only simple
computation items, but this would not be a valid test
of problem-solving skills. Similarly, a poor test may
show the same distribution of item difficulties as a
good test, or it may show a higher average index of
discrimination than a more valid test.
Correlations of test scores with other measures are
evidence of the validity of a test only if the other
measures are better than the test that is being
evaluated. Suppose, for example, that three language
tests, A, B, and C, show high correlations among
themselves. These correlations may be due simply to
the three tests exhibiting the same defects—such as
overemphasis on memorization of rules. If Test D, on
the other hand, is a superior measure of the
student’s ability to apply those rules, it is unlikely to
correlate highly with the other three tests. In this
case, its lack of correlation with Tests A, B, and C is
evidence that Test D is the more valid test.
This is not meant to imply that well-designed
validation studies are of no value; published tests
should be supported by a continuous program of
research. Rational judgment also plays a key part in
evaluating the validity of achievement tests against
content and process standards and in interpreting
statistical evidence from validity studies.
Validity of the Tests in the Local School
Standardized tests such as the Iowa Tests of Basic
Skills are constructed to correspond to widely
accepted goals of instruction in schools across
the nation. No standardized test, no matter how
carefully planned and constructed, can ever be
equally suited for use in all schools. Local differences
in curricular standards, grade placement, and
instructional emphasis, as well as differences in the
nature and characteristics of the student population,
should be taken into account in evaluating the
validity of a test.
The two most important questions in the selection
and evaluation of achievement tests at the local
level should be:
1. Are the skills and abilities required for successful
test performance those that are appropriate for
the students in our school?
2. Are our standards of content and instructional
practices represented in the test questions?
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 27
To answer these questions, those making the
determination should take the test or at least
answer a sample of representative questions. In
taking the test, they should try to decide by which
cognitive processes the student is likely to reach the
correct answer. They should then ask:
• Are all the cognitive processes considered
important in the school represented in the test?
• Are any desirable cognitive processes omitted?
• Are any specific skills or abilities required for
successful test performance unrelated to the
goals of instruction?
Evaluating an achievement test battery in this
manner is time-consuming. It is, however, the only
way to discern the most important differences
among tests and their relationships to local
curriculum standards. Considering the importance
of the inferences that will later be drawn from test
results and the influence the test may exert on
instruction and guidance in the school, this type of
careful review is important.
Domain Specifications
The content and process specifications for The Iowa
Tests have undergone constant revision for more
than 60 years. They have involved the experience,
research, and expertise of professionals from a
variety of educational specialties. In particular,
research in curriculum practices, test design,
technical measurement procedures, and test
interpretation and utilization has been a continuing
feature of test development.
Criteria for the design of assessments, the selection
and placement of items, and the distribution of
emphasis in a test include:
1. Placement and emphasis in current instructional
materials, including textbooks and other forms of
published materials for teaching and learning.
2. Recommendations of the education community in
the form of subject-matter standards developed by
national organizations, state and national curriculum
frameworks, and expert opinion in instructional
methods and the psychology of learning.
3. Continuous interaction with users, including
discussions of needs and priorities, reviews, and
suggestions for changes. Feedback from students,
teachers, and administrators has resulted in
improvements of many kinds (e.g., Frisbie &
Andrews, 1990).
4. Frequency of need or occurrence and social utility
studies in various curriculum areas.
5. Studies of frequency of misunderstanding,
particularly in reading, language, and
mathematics, as determined from research
studies and data from item tryout.
6. Importance or cruciality, a judgment criterion
that may involve frequency, seriousness of error
or seriousness of the social consequences of error,
expert judgment, instructional trends, public
opinion, etc.
7. Independent reviews by professionals from
diverse cultural groups for fairness and
appropriateness of content for students of
different backgrounds based on geography,
race/ethnicity, gender, urban/suburban/rural
environment, etc.
8. Empirical studies of differential item functioning
(e.g., Qualls, 1980; Becker & Forsyth, 1994; Lewis,
1994; Lee, 1995; Lu & Dunbar, 1996; Witt,
Ankenmann & Dunbar, 1996; Huang, 1998;
Ankenmann, Witt & Dunbar, 1999; Dunbar,
Ordman & Mengeling, 2002; Snetzler & Qualls,
2002).
9. Technical characteristics of items relating to
content validity; results of studies of
characteristics of item formats; studies of
commonality and uniqueness of tests, etc. (e.g.,
Schoen, Blume & Hoover, 1990; Gerig, Nibbelink
& Hoover, 1992; Nibbelink & Hoover, 1992;
Nibbelink, Gerig & Hoover, 1993; Witt, 1993; Bray
& Dunbar, 1994; Lewis, 1994; Frisbie & Cantor,
1995; Perkhounkova, Hoover & Ankenmann,
1997; Bishop & Frisbie, 1999; Perkhounkova &
Dunbar, 1999; Lee, Dunbar & Frisbie, 2001).
The importance of each of these criteria differs from
grade to grade, from test to test, from level to level,
and even from skill to skill within tests. For
example, the correspondence between test content
and textbook or instructional treatment varies
considerably. In the upper grades, the Vocabulary,
Reading Comprehension, and Math Problem Solving
tests are relatively independent of the method used
to teach these skills. On the other hand, there is a
close correspondence between the first part of the
Math Concepts and Estimation test and the
vocabulary, scope, sequence, and methodology of
leading textbooks, as well as between the second
part of that test and The National Council of
Teachers of Mathematics (NCTM) Standards for
estimation skills.
27
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 28
Content Standards and Development
Procedures
New forms of The Iowa Tests are the result of an
extended, iterative process during which
“experimental” test materials are developed and
administered to national and state samples to
evaluate their measurement quality and
appropriateness. The flow chart in Figure 3.1 shows
the steps involved in test development.
Curriculum Review
Review of local, state, and national guidelines for
curriculum in the subjects included in The Iowa
Tests is an ongoing activity of the faculty and staff
of the Iowa Testing Programs. How well The Iowa
Tests reflect current trends in school curricula
is monitored through contact with school
administrators, curriculum coordinators, and
classroom teachers across the United States. New
editions of the tests are developed to be consistent
with lasting shifts in curriculum and instructional
practice when such changes can be accommodated
by changes in test content and item format.
Supplementary measures of achievement, such as
the Iowa Writing Assessment and the ConstructedResponse Supplement, are developed when the need
arises for a new approach to measurement.
Preliminary Item Tryout
Developing The Iowa Tests involves research in the
areas of curriculum, instructional practice, materials
design, and psychometric methods. This work
contributes to the materials that undergo preliminary
tryout as part of the Iowa Basic Skills Testing Program.
During this phase of development, final content
standards for new forms of the tests are determined.
Preliminary tryouts involve multiple revisions to help
ensure high quality in the materials that become part
of the item bank used to develop final test forms.
Materials that do not meet the necessary standards for
content and technical quality are revised or discarded.
The preliminary tryout of items is a regular part of
the Iowa Basic Skills Testing Program. This state
testing program is a cooperative effort maintained
and supported by the College of Education of the
University of Iowa. Over 350 school systems, testing
over 250,000 students annually, administer the
ITBS under uniform conditions. To participate, each
school agrees to schedule a twenty-minute testing
period for tryout materials for new editions.
In the preliminary tryouts, new items are organized
into units (short test booklets) that are distributed to
28
students in a spiraled sequence. For example, if 30
units are tried out in a given grade, each student in
each consecutive set of 30 students receives a different
unit. This process assures the sample of students to
which each unit is administered represents all schools
in the tryout. It also assures a high degree of
comparability of results from unit to unit.
For Levels 9 to 14, the ITBS forms published since
1955 (18,772 total items), 1,681 units with 46,741
test items were tried out in this fashion. Each unit
was administered to a sample of approximately 200
students per grade, usually in three consecutive
grades. For Levels 9 to 14 of Forms A and B (3,158
total items), 361 units with 8,708 items were
included in the preliminary item tryout.
Because most tests in Levels 5 through 8 are read
aloud by the teacher, tryout units for these levels are
given to intact groups. The procedures used for these
tryouts involve stratification of the available schools
according to prior achievement test scores. Tryout
units are systematically rotated to ensure comparable
groups across units. For the nine forms of Levels 5
through 8 (7,340 total items), 336 units with
approximately 13,010 items were tried out. For Forms
A and B (1,837 total items), 76 tryout units with 2,388
items were assembled. Nearly 200,000 students in
kindergarten through grade 8 participated in the
preliminary item tryouts for Forms A and B.
Standard procedures are used to analyze item data
from the preliminary tryout. Difficulty and
discrimination indices are computed for each item;
because performance in Iowa schools differs
significantly and systematically from national
performance, difficulty indices are adjusted. Biserial
correlations between items and total unit score
measure discrimination. Items with increasing
percent correct in successive grades provide
evidence of developmental discrimination.
National Item Tryout
After the results of the preliminary tryout are
analyzed, items from all tests are administered to
selected national samples. The national item tryout
for Forms A and B of the ITBS was conducted in the
fall of 1998 and the spring and fall of 1999.
Approximately 100,000 students were tested
(approximately 11,000 students per grade in
kindergarten through grade 8). A total of 10,370
items were included in the national item tryouts for
Forms A and B.
The major purpose of the national item tryouts is
to obtain item difficulty and discrimination data on
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 29
Figure 3.1
Steps in Development of the Iowa Tests of Basic Skills
Educational Community
• National Curriculum Organizations
• State Curriculum Guides
• Content Standards
• Textbooks and Instructional Materials
Iowa Testing Programs (ITP)
Test Specifications
Item Writers
ITP Editing and
Content Review
NO
Iowa Tryout
Analysis of Iowa
Data
NO
ITP Revisions
Iowa Tryout
YES
Iowa Item Bank
RPC Editing and
Content Review
National
Tryout
NO
YES
External
Fairness/Content
Review
Analysis of
National Tryout Data and
Reviewer Comments
NO
YES
Test Item Bank
Preliminary Forms
of the Tests
Special
Content Review
Final Forms
of the Tests
Standardization
29
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 30
a national sample of diverse curricular and
demographic characteristics and objective data on
possible ethnic and gender differences for the
analysis of differential item functioning (DIF).
Difficulty and discrimination indices, DIF analyses,
and other external criteria were used for final item
selection. Results of DIF analyses by gender and
race/ethnicity are described in Part 7.
Fairness Review
Content analysis of test materials is a critical aspect
of test development. To ensure that items represent
ethnic groups accurately and show the range of
interests of both genders, expert panels review all
materials in the national item tryout.
Such panels serve a variety of functions, but their
role in test development is to ensure that test
specifications cover what is intended given the
definition of each content domain. They also ensure
that item formats in each test make questions
readily accessible to all students and that sources
of construct-irrelevant variance are minimized.
Members of these panels come from national
education communities with diverse social, cultural,
and geographic perspectives. A description of the
fairness review procedures used for Forms A and B
of the tests appears in Part 7.
Development of Individual Tests
The distribution of skills in Levels 5 through 14 of
the Iowa Tests of Basic Skills appears in Table 3.1.
The table indicates major categories in the content
specifications for each test during item
development. For some tests (e.g., Vocabulary), more
categories are used during item development than
may be used on score reports because their presence
ensures variety in the materials developed and
chosen for final forms. For most tests, however, the
major content categories are broken down further in
diagnostic score reports. The current edition of the
ITBS reflects a tendency among educators to focus
on core standards and goals, particularly in reading
and language arts, so in some parts of the battery
there are fewer specific skill categories than in
earlier editions.
The following descriptions of major content areas of
the Complete Battery provide information about the
conceptual definition of each domain. They address
general issues related to measuring achievement in
each domain and give the rationale for approaches
to item development. A unique feature of tests such
as the ITBS is that the continuum of achievement
they measure spans ages 5 to 14, when cognitive and
30
social development proceed at a rapid rate. The
definition of each domain describes school
achievement over a wide range. Thus, each test in
the battery is actually conceived as a broad measure
of educational development in school-related
subjects for students ages 5 through 14.
Vocabulary
Understanding the meanings of words is essential to
all communication and learning. Schools can
contribute to vocabulary power through planned,
systematic instruction; informal instruction
whenever the opportunity arises; reading of a
variety of materials; and activities and experiences,
such as field trips and assemblies. One of a teacher’s
most important responsibilities is to provide
students with an understanding of the specialized
vocabulary and concepts of each subject area.
The Vocabulary test involves reading and word
meaning as well as concept development.
Linguistic/structural distinctions among words are
monitored during test development so each form
and level includes nouns, verbs, and modifiers. At
each test level, vocabulary words come from a range
of subjects and represent diverse experiences.
Because the purpose of the Vocabulary test is to
provide a global measure of word knowledge, specific
skills are not reported.
Monitoring the parts of speech in the Vocabulary
test is important because vocabulary is closely tied
to concept development. The classification of words
is based on functional distinctions; i.e., the part that
nouns, verbs, modifiers, and connectives play in
language. Representing the domain of word
knowledge in this way is especially useful for
language programs that emphasize writing.
Although words are categorized by part of speech
(nouns, verbs, and modifiers) for test development,
skill scores are not reported by category.
In Levels 5 and 6, the Vocabulary test is read aloud
by the teacher. The student is required to identify
the picture that goes with a stimulus word. In
Levels 7 and 8, the Vocabulary test is read silently
by the student and requires decoding skills. In the
first part, the student selects the word that
describes a stimulus picture. In the second part, the
student must understand the meaning of words in
the context of a sentence.
In Levels 9 through 14, items consist of a word in
context followed by four possible definitions.
Stimulus words were chosen from The Living Word
Vocabulary (Dale & O’Rourke, 1981), as were words
constituting the definitions. Emphasis is on grade-
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 31
Table 3.1
Distribution of Skills Objectives for the Iowa Tests of Basic Skills, Forms A and B
Levels 5 and 6
Test
Levels 7 and 8
Levels 9 –14
Number of
Major
Categories
Number of
Skills
Objectives
Number of
Major
Categories
Vocabulary
1
3
1
3
1
3
Reading Comprehension
2
5
2
4
3
9
3
8
3
7
4
12
Spelling
–
–
4
4
3
5
Capitalization
–
–
–
–
7
19
Punctuation
–
–
–
–
4
21
Usage and Expression
–
–
–
–
5
22
7
7
4
13
19
67
Math Concepts
and Estimation
–
–
4
13
6
20
Math Problem Solving
and Data Interpretation
–
–
6
12
6
13
Number of
Skills
Objectives
Number of
Major
Categories
Number of
Skills
Objectives
Reading
Reading Total
Language
Language Total
Mathematics
Math Computation
–
–
2
6
12
24
4
11
12
31
24
57
Social Studies
–
–
4
12
4
14
Science
–
–
4
10
4
11
Maps and Diagrams
–
–
–
–
3
10
Reference Materials
–
–
–
–
3
10
–
–
4
9
6
20
Listening
2
8
2
8
2*
8*
Word Analysis
2
6
2
8
2*
8*
18
40
35
98
Math Total
Sources of Information
Sources Total
Total
65
197
*Listening and Word Analysis are supplementary tests at Level 9.
level appropriate vocabulary that children are likely
to encounter and use in daily activities, both in and
out of school, rather than on specialized or esoteric
words, jargon, or colloquialisms. Nouns, verbs, and
modifiers are given approximately equal
representation. Target words are presented in a
short context to narrow the range of meaning and, in
the upper grades, to allow testing for knowledge of
uncommon meanings of common vocabulary words.
Word selection is carefully monitored to prevent the
use of extremely common words and cognates as
distractors and to ensure that the same target
words do not appear in parallel forms of current and
previous editions of the tests.
Few words in the English language have exactly the
same meaning. An effective writer or speaker is one
who can select words that express ideas precisely. It
is not the purpose of an item in the Vocabulary test
to determine whether the student knows the
meaning of a single word (the stimulus word). Nor is
it necessary that the response words be easier or
more frequently used than the stimulus word,
although they tend to be. Rather, the immediate
31
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 32
purpose of each item is to determine if the student is
able to discriminate among the shades of meaning of
all words used in the item. Thus, a forty-item
Vocabulary test may sample as many as 200 words
from a student’s general vocabulary.
Word Analysis
The purpose of the Word Analysis test is to provide
diagnostic information about a student’s ability to
identify and analyze distinct sounds and symbols of
spoken and written language. At all levels the test
emphasizes the student’s ability to transfer
phonological representations of language to their
graphemic counterparts. Transfer is from sounds to
symbols in all items, which is consistent with
developmental patterns in language learning.
Word analysis skills are tested in Levels 5 through
9. Skills involving sound-letter association,
phonemic awareness, and word structure are
represented. Stimuli consist of pictures, spoken
language, written language, and novel words.
In Levels 5 and 6, the skills involve letter
recognition, letter-sound correspondence, initial
sounds, final sounds, and rhyming sounds. In Levels
7 through 9, more complex phonological structures
are introduced: medial sounds; silent letters; initial,
medial, and final substitutions; long and short vowel
sounds; affixes and inflections; and compound words.
Items in the Word Analysis test measure decoding
skills that require knowledge of grapheme-phoneme
relationships. Information from such items may be
useful in diagnosing difficulties of students with low
scores in reading comprehension.
Reading
The Reading tests in the Complete Battery of the
ITBS have emphasized comprehension over the 15
editions. This emphasis continues in the test
specifications for all levels. The Reading tests are
concerned with a student’s ability to derive
meaning; skills related to the so-called building
blocks of reading comprehension (graphemephoneme connections, word attack, sentence
comprehension) are tested in other parts of the
battery in the primary grades. When students reach
an age when independent reading is a regular part
of daily classroom activity, the emphasis shifts to
questions that measure how students derive
meaning from what they read.
The Reading test in Levels 6 through 8 of the ITBS
accommodates a wide range of achievement in early
reading. Level 6 consists of Reading Words and
Reading Comprehension. Reading Words includes
32
three types of items to measure how well the
student can identify and decode letters and words in
context. Auditory and picture cues are used to
measure word recognition. Complete sentences
accompanied by picture cues are used to measure
word attack. Reading Comprehension includes three
types of items to assess how well the student can
understand sentences, picture stories, and
paragraphs. Separate scores are reported for
Reading Words and Reading Comprehension so it is
possible to obtain a Reading score for students who
are not reading beyond the word level.
In Levels 7 and 8, the Reading test measures
sentence and story comprehension. Sentence
comprehension is a cloze task that requires the
student to select the word that completes a sentence
appropriately. Story comprehension is assessed with
pictures or text as stimuli. Pictures that tell a story
are followed by questions that require students to
identify the story line and understand connections
between characters and plot. Fiction and nonfiction
topics are used to measure how well the student
can read and comprehend paragraphs. Story
comprehension skills require students to understand
factual details as well as make inferences and draw
generalizations from what they have read.
Levels 5 through 9 of the Complete Battery of Forms
A and B combine the tests related to early reading
in the Primary Reading Profile. The reading profile
can be used to help explain the reasons for low
scores on the Reading test at these levels. It is
described in the Interpretive Guide for Teachers and
Counselors for Levels 5–8.
New to Forms A and B is a two-part structure for
the Reading Comprehension test at Levels 9–14.
Prior to standardization, a preliminary version of
the Form A Reading Comprehension test was
administered in two separately timed sections. Item
analysis statistics and completion rates indicated
that the two-part structure was preferable to a
single, timed administration.
In Levels 9 through 14, the Reading Comprehension
test consists of passages that vary in length,
generally becoming longer and more complex in the
progression from Levels 9 to 14. The range of
content in Form A of the Complete Battery is shown
in Table 3.2. The passages represent various types of
material students read in and out of school. They
come from previously published material and
information sources, and they include narrative,
poetry, and topics in science and social studies. In
addition, some passages explain how to make or do
something, express an opinion or point of view, or
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 33
Table 3.2
Types of Reading Materials
Iowa Tests of Basic Skills — Complete Battery,
Form A
Items
Nonfiction
(Biography)
Social Studies
(U.S. History)
Literature
(Fiction)
Literature
(Poetry)
Science
(Field Observation)
Nonfiction
(Newspaper Editorial)
Social Studies
(Anthropology)
Science
(Earth Science)
6
4
4
7
6
4
6
5
7
4
Table 3.3 presents the content specifications for the
Reading Comprehension test. It includes a sample of
items from Levels 10 and 14 that corresponds to
each process skill in reading. More detailed
information about other levels of the Reading
Comprehension test is provided in the Interpretive
Guide for Teachers and Counselors.
7
6
6
4
8
Level 13
Nonfiction (Social Roles)
Science (Human Anatomy)
Social Studies
(Preservation)
Social Studies
(Government Agency)
5
Listening
5
7
5
7
8
7
5
Level 14
Nonfiction
(Personal Essay)
Literature
(Poetry)
Literature
(Fiction)
Social Studies
(Culture and Traditions)
5
Level 11
Nonfiction
(Transportation)
Literature
(Fiction)
Nonfiction
(Food and Culture)
Science
(Insects)
4
5
Reading requires critical thinking on a number of
levels. The ability to decode words and understand
literal meaning is, of course, important. Yet active,
strategic reading has many other components. An
effective reader must draw on experience and
background knowledge to make inferences and
generalizations that go beyond the words on the
page. Active readers continually evaluate what they
read to comprehend it fully. They make judgments
about the central ideas of the selection, the author’s
point of view or purpose, and the organizational
scheme and stylistic qualities used. This is true at all
developmental levels. Children do not suddenly
learn to read with such comprehension at any
particular age or grade. Thoughtful reading is the
result of a period of growth in comprehension that
begins in kindergarten or first grade; no amount of
concentrated instruction in the upper elementary
grades can make up for a lack of attention to reading
for meaning in the middle or lower grades.
Measurement of these aspects of the reading process
is required if test results are used to support
inferences about reading comprehension. The ITBS
Reading tests are based on content standards that
reflect reading as a dynamic cognitive process.
Level 12
Nonfiction
(Biography)
Science
(Animal Behavior)
Literature
(Folktale)
Literature
(Poetry)
4
4
Level 9
Literature (Fiction)
Literature (Fable)
Social Studies
(Urban Wildlife)
Literature (Fiction)
Level 10
Passages
describe a person or topic of general interest. When
needed, an introductory note is included to give
background information. Passages are chosen to
satisfy high standards for writing quality and appeal.
Good literature and high-quality nonfiction offer
opportunities for questions that tap complex
cognitive processes and that sustain student interest.
Listening comprehension is measured in Levels 5
through 9 of the ITBS Complete Battery and in
Levels 9 through 14 of the Listening Assessment
for ITBS. The Listening Assessment for ITBS is
described in Part 9.
Listening is often referred to as a “neglected area” of
the school curriculum. Children become good
listeners through a combination of direct instruction
and incidental learning; however, children with good
listening strategies use them throughout their
school experiences. Good listening strategies
developed in the early elementary years contribute
to effective learning later.
33
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 34
Table 3.3
Reading Content/Process Standards
Iowa Tests of Basic Skills — Complete Battery, Form A
Illustrative Items
Content/Process Standards
Level 10
Level 14
Factual Understanding
• Understand stated information
• Understand words in context
6, 15, 28
19, 25
26, 35, 42
5, 28, 38
Inference and Interpretation
• Draw conclusions
• Infer traits, feelings, or motives of characters
• Interpret information in new contexts
• Interpret nonliteral language
4, 18, 29
17, 27, 30
7, 26
12, 14
19, 27, 41
2, 24
44, 51
21, 37
Analysis and Generalization
• Determine main ideas
• Identify author’s purpose or viewpoint
• Analyze style or structure of a passage
11, 33
37
13, 20
17, 25, 47
39, 52
11, 22, 45
Levels 5 through 9 of the Listening test measure
general comprehension of spoken language. Table 3.4
shows the content/process standards for these tests.
They emphasize understanding meaning at all
levels. Many comprehension skills measured by the
Listening tests in the early grades reflect aspects of
cognition measured by the Reading Comprehension
tests in the later grades, when a student’s ability to
construct meaning from written text has advanced.
Such items would be much too difficult for the
Reading tests in the Primary Battery because of the
complex reading material needed to tap these skills.
It is possible, however, to measure such skills
through spoken language, so at the early levels the
Listening tests are important indicators of the
cognitive processes that influence reading.
Language
Language arts programs comprise the four
communication skills that prepare students for
effective interaction with others: reading, writing,
listening, and speaking. These aspects of language
are assessed in several ways in The Iowa Tests.
Reading, because of its importance in the
elementary grades, is assessed by separate tests in
Levels 6 through 14 of the ITBS. Writing, the
process of generating, organizing, and expressing
ideas in written form, is the focus of the Iowa
Writing Assessment in Levels 9 through 14.
34
Table 3.4
Listening Content/Process Standards
Iowa Tests of Basic Skills — Complete Battery
Process Skill
Levels Levels
5–9
9–14*
Literal Meaning
Following Directions
Visual Relationships
Sustained Listening
Inferential Meaning
Concept Development
Predicting Outcomes
Sequential Relationships
Numerical / Spatial / Temporal
Relationships
Speaker’s Purpose, Point of View,
or Style
*Listening is a supplementary test at these levels.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 35
Listening, the process of paying attention to and
understanding spoken language, is measured in the
ITBS Complete Battery at Levels 5 through 9 and in
the Listening Assessment for ITBS in Levels 9
through 14. Selecting or developing revisions of
written text is measured by the ConstructedResponse Supplement to The Iowa Tests: Thinking
About Language in Levels 9 through 14.
The domain specifications for the Complete Battery
and Survey Battery of the ITBS identify aspects of
spoken and written language important to clarity
of thought and expression. The complexity of tasks
presented and the transition from spoken to written
language both progress from Level 5 through
Level 14.
The Language tests in Levels 5 and 6 measure the
student’s comprehension of linguistic relationships
common to spoken and written language. The tests
focus on ways language is used to express ideas and
to understand relationships. The student is asked to
select a picture that best represents the idea
expressed in what the teacher reads aloud.
The subdomains of the Language tests in Levels 5
and 6 include:
Operational Language: understanding
relationships among subject, verb, object,
and modifier
Verb Tense: discriminating past, present, and
future
Classification: recognizing common
characteristics or functions
Prepositions: understanding relationships such
as below, behind, between, etc.
Singular-Plural: differentiating singular and
plural referents
Comparative-Superlative: understanding
adjectives that denote comparison
Spatial-Directional: translating verbal
descriptions into pictures
The Language tests in Levels 7 through 14 of the
Complete Battery and Survey Battery were
developed from domain specifications with primary
emphasis on linguistic conventions common to
standard written English.1 Although writing is
taught in a variety of ways, the approaches share a
common goal: a written product that expresses the
writer’s meaning as precisely as possible. An
important quality of good writing is a command of
the conventions of written English that allows the
writer to communicate effectively with the intended
audience. The proofreading, editing, and revising
stages of the writing process involve these skills,
and the proofreading format used for the Language
tests is an efficient way to measure knowledge of
these conventions.
Although linguistic conventions change constantly
(O’Conner, 1996), the basic skills in written
communication have changed little over the
years. The importance of precision in written
communication is greater than ever for an
increasing proportion of the adult population,
whether because of the Internet or because of
greater demand for information. To develop tests of
language skills, authors must strike a balance
between precision on the one hand and fluctuating
standards of appropriateness on the other. In
general, skills for the Language tests sample
aspects of spelling, capitalization, punctuation, and
usage pertaining to standard written English,
according to language specialists and writing
guides (e.g., The Chicago Manual of Style, Webster’s
Guide to English Usage, The American Heritage
Dictionary). Content standards for usage and
written expression continue to evolve, reflecting the
strong emphasis on effective writing in language
arts programs.
Levels 7 and 8 of the ITBS provide a smooth transition
from the emphasis on spoken language found in Levels
5 and 6 to the emphasis on written language found in
Levels 9 through 14. The entire test at Levels 7 and 8
is read aloud by the teacher. In sections involving
written language, students read along with the teacher
as they take the test. Spelling is measured in a
separate test; the teacher reads a sentence that
contains three keywords, and the student identifies
which word is spelled incorrectly. Capitalization,
punctuation, and usage/expression are measured in
context, using items similar in format to those in Levels
9 through 14.
The content of Levels 9 through 14 includes skills
in spelling, capitalization, punctuation, and
1
The Language tests measure skills in the conventions of “standard” written English. Students with multicultural backgrounds
or, particularly, second-language backgrounds may have special difficulty with certain types of situations presented in the
Language tests. It is important to remember that the tests measure proficiency in standard written English, which may differ from
the background language of the home. Such differences should be taken into consideration in interpreting the scores from the
Language tests, and in any follow-up instruction that may be based, in part, on test results.
35
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 36
usage/expression. Separate tests in each area are
used in the Complete Battery; a single test is used in
the Survey Battery. Table 3.5 shows the distribution
of skills for language tests in the Complete Battery
and Survey Battery of Level 10, Form A. Writing
effectively requires command of linguistic
conventions in all of these areas at once, but greater
diagnostic information about strengths and
weaknesses in writing is obtained using separate
tests in each area.
Table 3.5
Comparison of Language Tests by Battery
Iowa Tests of Basic Skills — Level 10, Form A
Battery
Content Area
Complete
Survey
Spelling
Root Words
Words with Affixes
Correct Spelling
32
22
6
4
11
Capitalization
Names and Titles
Dates and Holidays
Place Names
Organizations and Groups
Writing Conventions
Overcapitalization
Correct Capitalization
26
3
3
6
3
6
2
3
10
Punctuation
End Punctuation
Comma
Other Punctuation Marks
Correct Punctuation
26
12
7
4
3
10
Usage and Expression
Nouns, Pronouns, and
Modifiers
Verbs
Conciseness and Clarity
Organization of Ideas
Appropriate Use
33
9
16
Total
117
8
4
5
7
47
The development of separate tests of language skills
offers several advantages. First, content that offers
opportunities to measure one skill (e.g., salutations
in business letters on a punctuation test) may not
offer opportunities to measure another. It is
extremely difficult to construct a single test that will
provide as comprehensive a definition of each
domain—and hence as valid a test—as a separate
test in each domain. Second, a single language test
36
covering all skills would need many items to yield a
reliable, norm-referenced score in each area. (Note
that national percentile ranks are not provided with
the Survey Battery in spelling, capitalization,
punctuation, and usage/expression.) Third, the
directions to students can be complicated when
items covering all four skills are included. Finally, it
is easier to maintain uniformly high-quality test
items if they focus on specific skills. In a unitary
test, to retain good items it is sometimes necessary
to include a less than ideal item associated with the
same stimulus material.
A comprehensive school curriculum in the language
arts is likely to be considerably broader in coverage
than the content standards of the ITBS Language
tests. For example, many language arts programs
teach the general research skills students need
for lifelong learning. Because these abilities are
used in many school subjects, they are tested in
the Reference Materials test rather than in the
Language tests. This permits more thorough
coverage of reference skills and underscores their
importance to all teachers. Language arts programs
also develop competence in writing by having
students read each other’s writing. Although such
skills are measured in the second part of the Usage
and Expression test, they are also covered in the
Reading Comprehension test in standards related to
inference and generalization.
Each Language test in the Complete Battery is
developed through delineation of the relevant
domain. The content specifications are adjusted for
each edition to reflect changing patterns in the
conventions of standard written English. The trend
toward so-called open punctuation, for example, has
led to combining certain skills for that test in the
current edition. Other details about content
specifications appear in the Interpretive Guide for
Teachers and Counselors. Domain descriptions for
the language tests are as follows:
Spelling. The Spelling test for Levels 9 through 14
directly measures a student’s ability to identify
misspelled words. The items consist of four words,
one of which may be misspelled. The student is
asked to identify the incorrectly spelled word. A fifth
response, “No mistakes,” is the correct response for
approximately one-eighth of the items.
The main advantage of the item format for Spelling
is that it tests four spelling words in each item.
Another advantage is the words tested better
represent spelling words taught at target grade
levels. With this item format, one can obtain suitable
reliability and validity without using more
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 37
advanced or less familiar spelling words. Careful
selection of words is the crucial aspect of content
validity in spelling tests, regardless of item format.
The spelling words chosen for each form of the test
come from the speaking vocabulary of students at a
given grade level. Errors are patterned after
misspellings observed in student writing. In
addition, misspellings are checked so that: (1) the
keyword can be identified by the student despite its
misspelling, (2) a misspelled English word is not a
correctly spelled word in a common second
language, and (3) target words do not appear in
parallel forms of the current or previous edition of
the battery. Spelling words are also selected to avoid
overlap with target words and distractors in the
Vocabulary test at the same level.
The type of spelling item used on The Iowa Tests has
been demonstrated to be superior to the type that
presents four possible spellings of the same word.
The latter type has several weaknesses. Many
frequently misspelled words have only one common
misspelling. Other spelling variations included as
response options are seldom selected, limiting what
the item measures. In addition, the difficulty of
many spelling items in this form doesn’t reflect the
frequency with which the word is misspelled. This
inconsistency raises doubt about the validity of a
test composed of such items.
Educators often question the validity of multiplechoice spelling tests versus list-dictation tests.
However, a strong relationship between dictation
tests and certain multiple-choice tests has
repeatedly been found (Frisbie & Cantor, 1995).
Capitalization and Punctuation. Capitalization
and punctuation skills function in writing rather
than in speaking or reading. Therefore, a valid test
of these skills should include language that might
have been drawn from the written language of a
student. The phrasing of items should also be on a
level of sophistication commensurate with the age
and developmental level of the student. Efforts have
been made to include materials that might have
come from letters, reports, stories, and other writing
from a student’s classroom experience.
The item formats in the Capitalization and
Punctuation tests are similar. They include one or
two sentences over three lines about equal in length.
The student identifies the line with an error or
selects a fourth response to indicate no error.
This item type, uncommon in language testing, was
the subject of extensive empirical study before being
used in the ITBS. Large-scale tryout of experimental
tests composed of such items indicated the
reliability per unit of testing time was at least as
high as that of more familiar item types.
Items of this type also have certain logical
advantages. An item in uninterrupted discourse is
more likely to differentiate between students who
routinely use correct language and those who lack
procedural knowledge of the writing conventions
measured. In the traditional multiple-choice item,
the error situations are identified. For example, the
student might only be required to decide whether
“Canada” should be capitalized or whether a comma
is required in a given place in a sentence. In the
find-the-error item type used on the ITBS, however,
error situations are not identified. The student must
be sensitive to errors when they occur. Such items
more realistically reflect the situations students
encounter in peer editing or in revising their writing.
Another reason for using the find-the-error format
concerns the frequency of various types of errors.
Some errors occur infrequently but are serious
nonetheless. With a find-the-error item, all of the
student’s language practices, good and bad, have an
opportunity to surface during testing. Such items
can be made easy or difficult as test specifications
demand without resorting to artificial language or
esoteric conventions.
Usage and Expression. Knowledge and use of
word forms and grammatical constructions are
revealed in both spoken and written language.
Spoken language varies with the social context.
Effective language users change the way they
express themselves depending on their audience
and their intended meaning. Written language in
school is a specific aspect of language use, and tests
of written expression typically reflect writing
conventions associated with “standard” English.
Mastery of this aspect of written expression is a
common goal of language arts programs in the
elementary and middle grades; other important
goals tend to vary from school to school, district to
district, and state to state, and they tend to be
elusive to measure. This is why broad-range
achievement test batteries such as the ITBS focus
exclusively on standard English.
The mode of expression, spoken or written,
influences how a person chooses words to express
meaning. A student’s speech patterns may contain
constructions that would be considered inappropriate
in formal writing situations. Similarly, some forms
37
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 38
of written expression would be awkward if spoken
because complex constructions do not typically occur
in conversation. The ITBS Usage and Expression
test includes situations that could arise in both
written and spoken language.
The first part of the Usage and Expression test uses
the same item format found in Capitalization and
Punctuation. Items contain one to three sentences,
one of which may have a usage error. Students
identify the line with the error or select “No
mistakes” if they think there is no error. Some of the
items in this part form a connected story.
The validity, reliability, and functional characteristics
of this item type are important considerations in its
use. In a study of why students selected various
distractors, students were found to make various
usage errors—many more than could be sampled in
a test of reasonable length. Satisfactory reliability
could be achieved with fewer items if the items
contained distractors in which other types of usage
errors are commonly made. Comparisons of find-theerror items and other item formats indicated better
item discrimination for the former. The difficulty of
the find-the-error items also reflected more closely
the frequency with which various errors occur in
student speech and writing.
The second part of the Usage and Expression test
assesses broader aspects of written language, such
as concise expression, paragraph development, and
appropriateness to audience and purpose. This part
includes language situations more directly
associated with editing and revising paragraphs and
stories. Students are required to discriminate
between more and less desirable ways to express the
same idea based on content standards in the areas of:
Conciseness and Clarity: being clear and using as
few words as possible
Appropriateness: recognizing the word, phrase,
sentence, or paragraph that is most appropriate
for a given purpose
Organization: understanding the structure of
sentences and paragraphs
The item types in this part of the test were
determined by the complexity and linguistic level of
the skill to be assessed. In some cases, the student is
asked to select the best word or phrase for a given
situation; in others, the choice is between sentences
or paragraphs. In all cases, the student must
evaluate the effectiveness or appropriateness of
alternate expressions of the same idea.
38
What constitutes “good” or “correct” usage varies
with situations, audiences, and cultural influences.
Effective language teaching includes appreciation of
the influence of context and culture on usage, but no
attempt is made to assess this type of linguistic
awareness. A single test embracing these objectives
would involve more than “basic” skills and would
have to sample the language domain beyond what is
common to most school curricula.
Mathematics
In general, changes occur slowly in the nature of
basic skills and objectives of instruction. The field of
mathematics, however, has been a noticeable
exception. Elementary school mathematics has been
in the process of continual change over the past 45
to 50 years. In grades 3 through 8, the math
programs of recent years have modified method,
placement,
sequence,
and
emphasis,
but
quantitative reasoning and problem solving remain
important. The National Council of Teachers of
Mathematics (NCTM) Principles and Standards for
School Mathematics (2000) describes the content of
the math curriculum and the process by which it is
assessed. Changes in content and emphasis of the
ITBS Math tests reflect these new ideas about
school math curricula.
Forms 1 and 2 of the ITBS were constructed when
textbook series exhibited remarkable uniformity in
the grade placement of math concepts. Forms 3 and
4 were developed during the transition from
“traditional” to “modern” math. In the three years
following publication of Forms 3 and 4, math
programs changed so dramatically that a special
interim edition, The Modern Mathematics
Supplement, was published to update the tests.
During the late 1960s and early 1970s, the math
curriculum was relatively stable. Forms 5 and 6
(standardized in 1970–71) increasingly emphasized
concept development, whereas Forms 7 and 8
(standardized in 1977–78) shifted the emphasis to
computation processes and problem-solving
strategies. Greater attention was paid to estimation
skills and mental arithmetic in Forms G and H
(standardized in 1984–85). The 1989 NCTM
Standards led to significant changes in two of
the ITBS Math tests (Concepts and Estimation
and Problem Solving and Data Interpretation)
in Forms K, L, and M. The 2000 revision of the
NCTM Standards is reflected in slight modifications
of these tests in Forms A and B.
Concepts and Estimation. The curriculum and
teaching methods in mathematics show great
diversity. Newer programs reflect the NCTM
Standards more directly, yet some programs have
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 39
endorsed new standards for math education without
significantly changing what is taught. In part,
diverse approaches to math teaching belie the high
degree of similarity in the purposes and objectives of
math education.
As with any content area, the method used to teach
math is probably less important than a teacher’s
skill with the method. When new content standards
and methods are introduced, teachers need time to
apply them; teachers must learn what works, what
does not work, and what to emphasize. During times
of curriculum transition, an important part of a
teacher’s experience is adjusting to changes in
assessment based on new standards.
The Iowa Tests have always emphasized
understanding, discovery, and quantitative thinking.
Math teachers know students need more time to
understand a fact or a process when meaning is
stressed than when math is taught simply by drill
and practice. In the long run, children taught by
methods that focus on understanding will develop
greater competence, even though they may not
master facts as quickly in the early stages of learning.
Even with a test aimed at a single grade level,
compromises are necessary in placement of test
content. A test with many items on concepts taught
late in the school year may be inappropriate to use
early in the school year. Conversely, if concepts
taught late in the school year are not covered on the
test, this diminishes the validity of the test if
administered at the end of the year. In the Concepts
and Estimation test, a student in a given grade
should be familiar with 80 to 85 percent of the items
for that grade at the beginning of the school year. By
midyear, a student should be familiar with 90 to 95
percent of the items for that grade. The remaining
items require understanding usually gained in the
second half of the year. Assigning levels for the
Concepts and Estimation test should be done
carefully, because shifts in local curriculum can affect
which test levels are appropriate for which grades.
Beginning with Forms K and L, Levels 9 through 14
(grades 3 through 8), the Math Concepts test
became a two-part test that included estimation.
The name of the test was changed to Concepts and
Estimation because of the separately timed
estimation section. Part 1 is similar to the Math
Concepts test in previous editions of the ITBS. In
Forms A and B, this part continues to focus on
numeration, properties of number systems, and
number sequences; fundamental algebraic concepts;
and basic measurement and geometric concepts.
More emphasis is placed on probability and
statistics. As in past editions, computational
requirements of Part 1 are minimal. Students may
use a calculator on Part 1.
Part 2 of the Concepts and Estimation test is
separately timed to measure computational
estimation. Early editions included a few estimation
items in the Concepts test (about 5 percent). The
changing role of computation in the math curriculum,
however, created by the growing use of calculators
and the continued need for estimation skills in
everyday life, requires a more prominent role for
estimation. Both the 1989 and 2000 NCTM
Standards documented the importance of estimation.
Studies indicate that, with proper directions and
time limits, students will use estimation strategies
and will rarely resort to exact computation (Schoen,
Blume & Hoover, 1990). Several aspects of
estimation are represented in Part 2 of the Concepts
and Estimation test, including: (a) standard
rounding—rounding to the closest power of 10 or, in
the case of mixed numbers, to the closest whole
numbers; (b) order of magnitude involving powers of
ten; and (c) number sense, including compatible
numbers and situations that require compensation.
Besides varying the estimation strategy, some items
place estimation tasks in context and others use
symbolic form. Student performance on estimation
items can differ dramatically depending on whether
a context is provided. Because estimation strategies
in connection with the use of a calculator or
computer rarely involve context, some items are
presented without a context. At lower levels of the
test, about two-thirds of the items are in context. This
proportion decreases to roughly one-half at the
upper levels.
In Forms A and B, the estimation section was
shortened by about 50 percent from what it had been
in Forms K, L, and M. Experiences of users indicated
that sufficient content coverage and reliability would
be obtained with fewer items. Using a calculator is
not permitted on this part of the test.
Problem Solving and Data Interpretation.
Another change in the ITBS Math tests occurred in
the Problem Solving and Data Interpretation test.
The 2000 NCTM Standards called for continued
attention to problem solving in the math curriculum
with added emphasis on interpretation and analysis
of data. Forms K, L, and M, which were designed
with this emphasis in mind, contained two parts:
one focusing on problem-solving skills and the other
on interpreting data in graphs and tables. Part 1 of
Problem Solving and Data Interpretation was
39
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 40
similar in format to earlier editions of the ITBS
Problem Solving test. Part 2 included materials that
had been in one of the tests on work-study skills. In
Forms A and B, problem solving and data
interpretation are integrated. The problem
situations, graphs, and tables for this test are based
on real data and emphasize connections to other
areas of the curriculum.
The content of the Math Problem Solving tests in the
ITBS has been strongly influenced by historical
changes in the design of the battery. The addition of
a test of computational speed and accuracy
beginning with Forms 7 and 8 in the late 1970s
marked a fundamental change in the definition of
problem solving in the ITBS. What had been a
domain that included computational skills in an
applied setting became one of problems that require
fundamental (often multiple) operations and
concepts in a meaningful context. Problem Solving
and Data Interpretation still requires computation.
But the operations and concepts, in most cases, have
been introduced at least a year before the grade level
for which the test is intended. Most of the operations
are basic facts or facts that do not require renaming.
The total number of operations at the upper levels is
substantially greater than the number of items. This
reflects the large proportion of items at these levels
that require multiple steps to solve. The content
standards emphasize problem contexts with solution
strategies
beyond
simple
application
of
computational algorithms memorized through a drill
and practice curriculum. Table 3.6 outlines the
computational skills required for Form A of the
Problem Solving and Data Interpretation test.
In mathematics, an ideal problem is one that is
novel for the individual solving it. Many problems in
instructional materials, however, might be better
described as “exercises.” Often they are identical or
similar to others explained in textbooks or by the
teacher. In such examples, the student is not called
on to do anything new; modeling, imitation, and
recall are the primary behaviors required. This is
not to suggest that repetition—such as working
exercises and practicing basic facts—is not useful;
indeed, it can be important. However, opportunities
should also be provided for students to solve
realistic problems in novel situations they might
experience in daily life.
Problem Solving and Data Interpretation includes
items that measure “problem-solving process” or
“strategy.” These item types were adapted from
Polya’s four-step problem-solving model (Polya,
1957). As part of the Iowa Problem Solving Project
(Schoen, Blume & Hoover, 1990), items were
40
developed to measure the steps of (1) getting to
know the problem, (2) choosing what to do, (3) doing
it, and (4) looking back. Information gathered as
part of this project was used to integrate items of
this type into The Iowa Tests.
Including questions that require the interpretation of
data underscores a long-standing belief in the
importance of graphical displays of quantitative
information. The ITBS was the first achievement
battery to assess a student’s ability to interpret data
in graphs and tables. Such items have been included
since the first edition in 1935 and appeared in a
separate test until 1978. In more recent editions,
these items were part of a test called Visual Materials.
Formal instruction in the interpretation and
analysis of graphs, tables, and charts has become
part of the mathematics curriculum as a result of
the 1989 and 2000 NCTM Standards. This trend
reflects increased emphasis on statistics at the
elementary level. The interpretation of numerical
information presented in graphs and tables provides
the foundation for basic descriptive statistics.
The data interpretation skills assessed in this test
are reading amounts, comparing quantities, and
interpreting relationships and trends in graphs and
tables. Stimulus materials include pictographs,
circle graphs, bar graphs, and line graphs. Tables,
charts, and other visual displays from magazines,
newspapers, television, and computers are also
presented. The information is authentic, and the
graphical or tabular form used is logical for the data.
Items in this part of the test essentially require no
computation. Students may use a calculator on the
Problem Solving and Data Interpretation test.
Computation. The early editions of the ITBS
measured computation skills with word problems in
the Mathematics Problem Solving test just described.
Computational load is now considered a confounded
effect in problem solving items. As problem solving
itself became the focus of that test, an independent
measure of speed and accuracy in computation was
needed. Beginning with Forms 7 and 8, a separate
test of computational skill was added.
Instruction in computation takes place with whole
numbers, fractions, and decimals. Each of these
areas includes addition, subtraction, multiplication,
and division. Although much computation in “real
world” settings involves currency, percents, and
ratio or proportion, these applications are nothing
more than special cases of basic operations with
whole numbers, fractions, and decimals. “Real
world” problems that require performing these
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 41
specialized computation skills are still part of the
Problem Solving and Data Interpretation test. The
logic of the computation process and the student’s
understanding of algorithms are measured in the
first part of the Concepts and Estimation test.
The grade placement of content in a computation
test is more crucial than in other areas of
mathematics. For example, whole number division
may be introduced toward the end of third grade in
many textbooks. Including items that measure this
skill in the Level 9 test would be inappropriate if
students have not yet been taught the process. In
placing items that measure specific computational
skills, only skills taught during the school year
preceding the year when the test level is typically
used are included.
The Computation test for Levels 9 through 14 of
Forms A and B differs only in length from recent
editions. At all levels, testing time and number of
items were each reduced by about 25 percent. The
small decrease in the proportion of fraction items
and the small increase in the proportion of decimal
items made with Forms K, L, and M was maintained
in Forms A and B. These modifications reflect
increased emphasis on calculators, computers, and
metric applications in the math curriculum. This
test remains a direct measure of computation that
requires a single operation—addition, subtraction,
multiplication, or division on whole numbers,
fractions, or decimals at appropriate grade levels.
Unlike some tests of computation, the ITBS Math
Computation test does not confound concept and
estimation skills with computational skill.
Computation is included in the Math Total score
unless a special composite, Math Total without
Computation, is requested.
Social Studies
The content domain for the Social Studies test was
designed to broaden the scope of the basic skills and
to assess progress toward additional concepts and
understandings in this area.
Although the Social Studies test requires knowledge,
it deals with more than memorization of facts or the
outcomes of a particular textbook series or course of
study. It is concerned with generalizations and
applications of principles learned in the social
studies curriculum. Many questions on the Social
Studies test involve materials commonly found in
social studies instruction—timelines, maps, graphs,
and other visual stimuli.
Table 3.6
Computational Skill Level Required for Math Problem Solving and Data Interpretation
Iowa Tests of Basic Skills — Complete Battery, Form A
(Number of operations and percent of items per level)
Level
Operation
Whole Numbers and Currency (Totals)
• Basic facts (addition, subtraction,
multiplication, and division)
• Other sums, differences, products, and
quotients: No renaming (no remainder)
• Other sums, differences, products,
and quotients: Renaming (remainder)
Fractions and Decimals
Total Number of Operations
Number of Items Requiring Computation
Number of Items Requiring No Computation
7
8
9
10
11
12
13
14
20
(100)
24
(100)
15
(100)
19
(95)
18
(90)
15
(75)
18
(72)
23
(85)
19
(95)
19
(79)
8
(53)
7
(35)
5
(25)
7
(35)
6
(24)
4
(15)
1
(5)
5
(21)
6
(40)
12
(60)
10
(50)
3
(15)
6
(24)
12
(44)
—
—
1
(7)
—
3
(15)
5
(25)
6
(24)
7
(26)
—
—
—
1
(5)
2
(10)
5
(25)
7
(28)
4
(15)
20
24
15
20
20
20
25
27
17
16
12
11
13
14
15
15
11
14
10
13
13
14
15
17
41
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 42
The content areas in the Social Studies test are
history, geography, economics, and government and
society (including social structures, ethics,
citizenship, and points of view). These areas are
interrelated, and many tasks involve more than one
content area.
The history content requires students to view
events from non-European as well as European
perspectives. The test focuses on historical events
and experiences in the lives of ordinary citizens. In
geography, students apply principles rather than
identify facts. Countries in the eastern and western
hemispheres are represented. In economics,
students are expected to recognize the impact of
technology, to understand the interdependence of
national economies, and to use basic terms and
concepts. Questions in government and society
measure a student’s understanding of responsible
citizenship and knowledge of democratic
government. They also assess needs common to
many cultures and the functions of social
institutions. In addition, a student’s knowledge of
the unique character of cultural groups is measured.
The Social Studies test attempts to represent the
curriculum standards for social studies of several
national organizations. National panels have
developed standards in history, geography,
economics, and civics. The National Council for the
Social Studies (NCSS) adopted Curriculum
Standards for the Social Studies (NCSS, 1994),
which specifies ten content strands for the social
studies curriculum. The domain specifications for
the Social Studies test parallel the areas in which
national standards have been developed. The NCSS
curriculum strands map onto many content
standards of the ITBS. For example, Strand II of the
NCSS Standards (Time, Continuity, and Change) is
represented in history: change and chronology in the
ITBS. Similarly, Strand III (People, Places, and
Environments) matches two ITBS standards in
geography (Earth’s features and people and the
environment). Similar connections exist for other
ITBS content skills.
Some process skills taught in social studies are
assessed in other tests of the ITBS Complete
Battery—in particular, the Maps and Diagrams test
and the Problem Solving and Data Interpretation test.
Science
Content specifications for Forms A and B of the
Science test were influenced by national standards
of science education organizations. The National
42
Science Education Standards (NRC, 1996), prepared
under the direction of the National Research
Council, were given significant attention. In
addition, earlier work on science standards was
consulted. Science for All Americans (Rutherford
and Ahlgren, 1990) and The Content Core: A Guide
for Curriculum Designers (Pearsall, 1993) were
resources for developing Science test specifications.
The main impact of this work was to elevate the role
of process skills and science inquiry in test
development. Depending on test level, one-third to
nearly one-half of the questions concern the general
nature of scientific inquiry and the processes of
science investigation.
For test development, questions were classified by
content and process. The content classification
outlined four major domains:
Scientific Inquiry: understanding methods of
scientific inquiry and process skills used in
scientific investigations
Life Science: characteristics of life processes in
plants and animals; body processes, disease,
and nutrition; continuity of life; and
environmental interactions and adaptations
Earth and Space Science: the Earth’s surfaces,
forces of nature, conservation and renewable
resources, atmosphere and weather, and the
universe
Physical Science: basic understanding of mechanics,
forces, and motion; forms of energy; electricity
and magnetism; properties and changes of
matter
These content standards were used with a process
classification developed by AAAS: classifying,
hypothesizing, inferring, measuring, and explaining.
This classification helped ensure a range of thinking
skills would be required to answer questions in each
content area of Science.
As in social studies, skills associated with the
science curriculum are measured in other tests in
the Complete Battery of the ITBS. Some passages in
the Reading test use science content to measure
comprehension in reading. Skills measured in
Problem Solving and Data Interpretation are
essential to scientific work. Some skills in the
Sources of Information tests are necessary for
information gathering in science. Skill scores from
these tests may be related to a student’s progress in
science or a school’s science curriculum.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 43
Sources of Information
The Iowa Tests recognize that basic skills extend
beyond the school curriculum. Only a small part of
the information needed by an educated adult is
acquired in school. An educated student or adult
knows how to locate and interpret information from
available resources. In all curriculum areas,
students must be able to locate, interpret, and use
information. For this reason, the ITBS Complete
Battery includes separate tests on using information
sources.
Teaching and learning about information sources
differ from other content areas in the ITBS because
“sources of information” is not typically a subject
taught in school. Skills in using information are
developed through many activities in the
elementary school curriculum. Further, the
developmental sequence of these skills is not wellestablished. As a result, before the specifications for
these tests could be written, the treatment and
placement of visual and reference materials in
instruction were examined. Authorities in related
disciplines were also consulted. The most widely
used textbooks in five subject areas were reviewed
to identify grade placement of information sources
and visual displays. Also considered were the extent
to which textbook series agreed on placement and
emphasis, the contribution of subject areas to skills
development, and whether information sources were
accompanied by instruction on their use.
The original skills selected for these tests were
classified as the knowledge and use of (1) map
materials, (2) graphic and tabular materials, and (3)
reference materials. Graphs and tables by and large
have become part of the school math curriculum, so
that category of information sources was shifted to
the Math tests beginning with Form K. In its place,
a category on presentation of information through
diagrams and related visual materials was added.
Such materials have become a regular part of
various publications—from periodicals to textbooks
and research documents—and represent an
important source of information across the
curriculum. General descriptions of the tests in
Sources of Information follow.
Maps and Diagrams. The Maps and Diagrams test
measures general cognitive functions for the
processing of information as well as specific skills in
reading maps and diagrams. Developing the domain
specifications for this test involved formal and
informal surveys of instructional materials that use
visual stimuli. The specifications for map reading
were based on a detailed classification by type,
function, and complexity of maps appearing in
textbooks. The geographical concepts used in map
reading were organized by grade level. These
concepts were classified as: the mechanics of map
reading (e.g., determining distance and direction,
locating and describing places), the interpretation of
data (geographic, sociological, or economic
conditions), and inferring behavior and living
conditions (routes of travel, seasonal variations, and
land patterns). The specifications for the diagrams
part of the test came from analyzing materials in
textbooks and other print material related to
functional reading. Items require locating
information, explaining relationships, inferring
processes or products, and comparing and
contrasting features depicted in diagrams.
Reference Materials. Although students in most
schools have access to reference materials across the
curriculum, few curriculum guides pay explicit
attention to the foundation skills a student needs
to take full advantage of available sources. The
content standards for the Reference Materials test
include aspects of general cognitive development as
well as specific information-gathering skills. The
focus is on skills needed to use a variety of references
and search strategies in classroom research and
writing activities.
The items tap a variety of cognitive skills. In the
section on using a dictionary, items relate to
spelling, pronunciation, syllabification, plural forms,
and parts of speech. In the section on general
references, items concern the parts of a book
(glossary, index, etc.), encyclopedias, periodicals, and
other special references. General skills required to
access information—such as alphabetizing and
selecting guidewords or keywords—are also
measured. To answer questions on the Reference
Materials test, students must select the best source,
judge the quality of sources, and understand search
strategies. The upper levels of Forms A and B also
include items that measure note-taking skills.
Critical Thinking Skills
The items in the ITBS batteries for Levels 9–14
were evaluated for the critical thinking demands
they require of most students. Questions were
classified by multiple reviewers, and a consensus
approach was used for final decisions. The test
specifications for the final test forms did not contain
specific quotas for critical thinking skills. Cognitive
processing demands were considered in developing,
revising, and selecting items, but the definition of
critical thinking was not incorporated directly in
any of those decisions.
43
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 44
Classifying items as requiring critical thinking
depends on judgments that draw upon
(a) knowledge of the appropriate content, (b) an
understanding of how students interact with the
content of learning and the remembering of it, and
(c) a consistent use of the meaning of the term
“critical thinking” in the content area in question.
The ITBS classifications represent the consensus of
the authors about the critical thinking required of
most students who correctly answer such items.
Further information about item classifications for
critical thinking is provided in the Interpretive
Guide for Teachers and Counselors.
Other Validity Considerations
Norms Versus Standards
In using test results to evaluate teaching and
learning, it should be recognized that a norm is
merely a description of average performance. Normreferenced interpretations of test scores use the
distribution of test scores in a norm group as a
frame of reference to describe the performance of
individuals or groups. Norms for achievement
should not be confused with standards for
performance (i.e., indicators of what constitutes
“satisfactory,”
“proficient,”
or
“exemplary”
performance). The distributions of building averages
on the ITBS show substantial variability in average
achievement from one content area to another in the
same school system. For example, schools in a given
locale may spend substantially more time on
mathematics than on writing, and schools with high
averages on the ITBS Language tests may not
emphasize writing. A school that scores below the
norm in math and above the norm in language may
nevertheless need to improve its writing instruction
more than its math instruction. Such a judgment
requires thorough understanding of patterns of
achievement in the district, current teaching
emphasis, test content, and expectations of educators
and the community. All of these factors contribute to
standards-based interpretations of test scores.
Many factors should be considered when evaluating
the performance of a school. These include the
general cognitive ability of the students, learning
opportunities outside the school, the emphasis
placed on basic skills in the curriculum, and the
grade placement and sequencing of the content
taught. Large differences in achievement between
schools in the same system can be explained by such
factors. These factors also can influence how a school
or district ranks compared to general norms. Quality
of instruction is not the only determining factor.
44
What constitutes satisfactory performance, or what
is an acceptable standard, can only be determined
by informed judgments about school and individual
performance. It is likely to vary from one content
area to another, as well as from one locale to
another. Ideally, each school must determine what
may be reasonably expected of its students. Belowaverage performance on a test does not necessarily
indicate poor teaching or a weak curriculum.
Examples of effective teaching are found in many
such schools. Similarly, above-average performance
does not necessarily mean there is no room for
improvement. Interpreting test scores based on
performance standards reflects a collective
judgment about the quality of achievement. The use
of such judgments to improve instruction and
learning is the ultimate obligation of a standardsbased reporting system.
Some schools may wish to formalize the judgments
needed to set performance standards on the Iowa
Tests of Basic Skills. National performance
standards were developed in 1996 in a workshop
organized by the publisher. Details about national
performance standards and the method used to
develop them are given in The Iowa Tests: Special
Report on Riverside’s National Performance
Standards (Riverside Publishing, 1998).
Using Tests to Improve Instruction
Using tests to improve instruction and learning is
the most important purpose of any form of
assessment. It is the main reason for establishing
national norms and developmental score scales.
Valid national norms provide the frame of reference
to determine individual strengths and weaknesses.
Sound developmental scales create the frame of
reference to interpret academic growth—whether
instructional practices have had the desired effect
on achievement. These two frames of reference
constitute the essential contribution of a
standardized achievement test in a local school.
Most teachers provide for individual and group
differences in one way or another. It would be
virtually impossible to structure learning for all
students in exactly the same way even if one wanted
to. The characteristics, needs, and desires of
students require a teacher to allocate attention and
assistance differentially. Test results help a teacher
to tailor instruction to meet individual needs.
Test results are most useful when they reveal
discrepancies in performance—between test areas,
from year to year, between achievement and ability
tests, and between expectations and performance.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 45
Many score reports contain unexpected findings.
These findings represent information that should
not be ignored but instead examined further.
Suggestions about using test results to individualize
instruction are given in the Interpretive Guide for
Teachers and Counselors, along with a classification
of content for each test. Any classification system is
somewhat arbitrary. The content of the ITBS
is represented in the skills a student is required
to demonstrate. The skills taxonomy is re-evaluated
periodically because of changes in curriculum
and teaching methods. For some tests (e.g.,
Capitalization), the categories are highly specific;
for others (e.g., Reading Comprehension), they are
more general. The criteria for defining a test’s
content classification system are meaningfulness
and usefulness to the teacher.
The Interpretive Guide for Teachers and Counselors
and the Interpretive Guide for School Administrators
present the skills classification system for each test.
A detailed list of skills measured by every item in
each form is included. In addition, suggestions are
made for improving achievement in each area. These
are intended as follow-up activities. Comprehensive
procedures to improve instruction may be found in
the literature associated with each curriculum area
in the elementary and middle grades.
Using Tests to Evaluate Instruction
To address the issue of evaluating curriculum and
instruction is to confront one of the most difficult
problems in assessment. School testing programs do
not exist in a vacuum; there are many audiences for
assessment data and many stakeholders in the
results. Districts and states are likely to consider
using standardized tests as part of their evaluation
instruction. Like any assessment information,
standardized test results provide only a partial view
of the effectiveness of instruction. The word “partial”
deserves emphasis because the validity of tests used
to evaluate instruction hinges on it.
First, standardized tests are concerned with basic
skills and abilities. They are not intended to measure
total achievement in a given subject or grade.
Although these skills and abilities are essential to
nearly all types of academic achievement, they do
not include all desired outcomes of instruction.
Therefore, results obtained from these tests do not
by themselves constitute an adequate basis for, and
should not be overemphasized in, the evaluation of
instruction. It is possible, although unlikely, that
some schools or classes may do well on these tests
yet be relatively deficient in other areas of
instruction—for example, music, literature, health,
or career education. Other schools or classes with
below-average test results may provide a healthy
educational environment in other respects.
Achievement tests are concerned with areas of
instruction that can be measured under standard
conditions. The content standards represented in
the Iowa Tests of Basic Skills are important.
Effective use of the tests requires recognition of
their limits, however. Schools should treat as
objectively as possible those aspects of instruction
that can be measured in this way. Other less
tangible, yet still important, outcomes of education
should not be neglected.
Second, local performance is influenced by many
factors. The effectiveness of the teaching staff is only
one factor. Among the others are the cognitive ability
of students, the school environment, the students’
educational history, the quality of the instructional
materials, student motivation, and the physical
equipment of the school.
At all times, a test must be considered a means to an
end and not an end in itself. The principal value of
these tests is to focus the attention of the teaching
staff and the students on specific aspects of
educational development in need of individual
attention. Test results should also facilitate
individualized instruction, identify which aspects of
the instructional program need greater emphasis or
attention, and provide the basis for better
educational guidance. Properly used results should
motivate teachers and students to improve
instruction and learning.
Used with other information, test results can help
evaluate the total program of instruction. Unless
test results are used in this way, however, they may
do serious injustice to teachers or to well-designed
instructional programs.
Local Modification of Test Content
The Iowa Tests are based on a thorough analysis of
curriculum materials from many sources. Every
effort is made to ensure content standards reflect a
national consensus about what is important to teach
and assess. Thus, many questions about content
representativeness of the tests are addressed during
test development.
Adapting the content of nationally standardized
tests to match more closely local or state standards
is a trend in large-scale assessment. Sometimes
these efforts involve augmenting standardized tests
and reporting criterion-referenced information along
45
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 46
with national norms from the intact test. In other
cases, local districts modify a test by selecting some
items and omitting others based on the match with
the local curriculum.
Studies of the latter type of modified test (e.g.,
Forsyth, Ansley & Twing, 1992), also known as
customized tests, have shown that national norms
after modification can differ markedly from norms
on the test as originally standardized. Some of this
distortion may result from selecting items so
tailored to the curriculum that students perform
better than they would on the original version.
Other distortions are caused by context effects on
items. When items are removed, those remaining
may not have the same psychometric properties
(Brennan, 1992), which affects national norms.
When this occurs, other normative score
interpretations are also affected. The evaluation of
strengths and weaknesses, the assessment of
growth, and the status relative to national
standards can be distorted if local modifications do
not retain the same balance of content as in the
original test.
Content standards at the local and state level can
change dramatically over a short time. Performance
standards can be as influenced by politics as by
advances in understanding how students learn. For
these reasons, The Iowa Tests should be
administered under standard conditions to ensure
the validity of norm-referenced interpretations.
Predictive Validity
The Iowa Tests of Basic Skills were not designed as
tests of academic aptitude or as predictors of future
academic success. However, the importance of basic
skills to high school and college success has been
demonstrated repeatedly.
Evidence of the predictive “power” of tests is difficult
to obtain because selection eliminates from research
samples students whose performance fails to qualify
them for later education. Many college students
complete high school and enter college in part because
of high proficiency in the basic skills. Students who
lack proficiency in the basic skills are either not
admitted to college or seek employment. Therefore,
coefficients of predictive validity are obtained for a
select population. Estimates of correlations for an
unselected population can be obtained (e.g., Linn &
Dunbar, 1982), but the assumptions underlying the
computations are not always satisfied.
46
Five studies of predictive validity are summarized
in Table 3.7 with correlation coefficients between
the ITBS Complete Composite and several criterion
measures. In a study by Scannell (1958), ITBS scores
at grades 4, 6, and 8 of students entering one of the
Iowa state universities were correlated with three
criterion measures. These criteria were (a) grade 12
Composite scores on the Iowa Tests of Educational
Development, (b) high school grade-point average
(GPA), and (c) first-year college GPA. Considerable
restriction in the range of the ITBS scores was
present. The observed correlations should be
regarded as lower-bound estimates of the actual
correlation in an unselected population. When
adjustments for restriction in range were made, the
estimated correlations with the ITED grade 12
Composite were .77, .82, and .81 for grades 4, 6, and
8, respectively.
In Rosemier’s (1962) study of freshmen entering The
University of Iowa in the fall of 1962, test scores
were obtained for the ITBS in grade 8 and for the
ITED in grades 10–12. Scores on the American
College Tests (ACT), high school GPA, and first-year
college GPA were also obtained. The standard
deviation of the ITBS Composite scores for the
sample was 7.52, compared to 14.91 for the total
grade 8 student distribution that year. Differences
between the obtained and adjusted correlations
show the effect of range restriction on estimated
validity coefficients.
Loyd, Forsyth, and Hoover (1980) conducted a study
of the relation between ITBS, ITED, and ACT scores
and high school and first-year college GPA of 1,997
graduates of Iowa high schools who entered The
University of Iowa in the fall of 1977. As in the
Rosemier study, variability in the college-bound
population was much smaller than that in the
general high school population.
Ansley and Forsyth (1983) obtained final college
GPAs of the students in the Loyd et al. study. They
found the ITBS predicted final college GPA as well
as it predicted first-year college GPA.
Qualls and Ansley (1995) replicated the Loyd et al.
study with data from freshmen entering The
University of Iowa in the fall of 1991. They found
ITBS and ITED scores still showed substantial
predictive validity, but the correlations between
tests scores and grades were somewhat lower than
those in the earlier study. To investigate these
differences, a study is under way with data from the
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 47
Table 3.7
Summary Data from Predictive Validity Studies
Correlations with ITBS Complete Composite
ITED Composite
Source
Scannell, 1958 (#)
Grade 8
Grade 6
Grade 4
Rosemier, 1962(#)
Grade 8
Loyd et al., 1980
Grade 8
Grade 6
Ansley & Forsyth, 1983 (#)
Grade 8
Grade 6
Qualls & Ansley, 1995 (#)
Grade 8
Grade 6
Grade 4
Grade 10
Grade 11
High School
Grade 12
HS GPA
0.73
0.76
0.68
0.61
0.59
0.53
ACT Composite
College GPA
Freshman
0.48
0.49
0.42
0.78
0.77
0.77
0.59
0.73
0.41
0.84
0.78
0.84
0.80
0.79
0.74
0.49
0.49
0.78
0.73
0.44
0.45
freshman classes of 1996 and 1997. Preliminary
results indicate the correlations between test scores
and GPA are smaller than those reported in the
1950s and 1960s, but the relationship is stronger
than the Qualls and Ansley research suggests.
Three predictive validity studies that examine the
relation between achievement test scores in eighth
grade and subsequent course grades in ninth grade
are summarized below. Dunn (1990) found that
correlation coefficients between the two measures of
performance, test scores and course grades, were
relatively consistent for composites.
The average correlation between the ITBS Complete
Composite and grades across 13 high school
courses—including language arts, U.S. history,
general math, algebra, etc.—was .62. The smallest
correlations were observed in courses for which
selection criteria narrowed the range of overall
achievement considerably (e.g., algebra).
As part of this investigation, a variety of regression
analyses were performed to examine the joint role of
test scores and course grades in predicting later
performance in school. These analyses showed that
Final
0.43
0.44
0.38
0.36
0.32
0.26
0.21
0.18
achievement test scores added significantly to the
prediction of course grades in high school after
performance in middle school courses was taken into
account. Course grades in the middle school years
tended to be better predictors of high school
performance than test scores, suggesting unique
factors influence grades.
Similar results were obtained in a study by Barron,
Ansley, and Hoover (1991) that looked specifically at
predicting achievement in Algebra I. As in the Dunn
study, multiple regression analyses showed that
ITBS scores added significantly to the prediction of
ninth-grade algebra achievement even after
previous grades were taken into account.
A more recent analysis of the relation between ITBS
Core Total scores in grade 8 and ACT composite
scores was conducted by Iowa Testing Programs
(1998). Predictive validity coefficients in this study
were .76, .81, and .78 for fall, midyear, and spring
administrations of the ITBS in grade 8. Predictive
validity coefficients of this magnitude compare
favorably with those of achievement tests given in
the high school years.
47
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 48
Tests such as the ITBS have been used in many
ways to support judgments about how well students
are prepared for future instruction, that is, as
general measures of readiness. This aspect of test
use has become somewhat controversial in recent
years because of situations where tests are used to
control access to educational opportunities.
Readability
The best way to determine the difficulty of a
standardized test is to examine its norms tables and
distribution of item difficulty. The difficulty data for
items, skills, and tests in the ITBS are reported in
Content Classifications with Item Norms. Of the
various factors that influence difficulty, readability
is the focus of much attention.
The readability of written materials is measured in
several ways. An expert may judge the grade level of
a reading passage based on perception of its
complexity. The most common method of quantifying
these judgments is to use a readability formula.
Readability formulas are often used by curriculum
specialists, classroom teachers, and school librarians
to match textbooks and trade books to the reading
abilities of students.
The virtue of readability formulas is objectivity.
Typically, they use vocabulary difficulty (e.g., word
familiarity or length) and syntactic complexity (e.g.,
sentence length) to predict the difficulty of a passage.
The shortcoming of readability formulas is failure to
account for qualitative factors that influence how
easily a reader comprehends written material. Such
factors include organization and cohesiveness of a
selection, complexity of the concepts presented,
amount of knowledge the reader is expected to bring
to the passage, clarity of new information, and
interest level of the material to its audience.
Readability formulas were originally developed to
assess the difficulty of written prose and sometimes
have been used as a basis for modifying written
material. Using a readability formula in this way
does not automatically result in more readable text.
Short, choppy sentences and familiar but imprecise
words can actually increase the difficulty of a
selection even though they lower its index of
readability (Davison & Kantor, 1982).
Readability formulas use word lists that become
dated over time. For instance, the 1958 Dale-Chall
(Dale & Chall, 1948; Powers, Sumner & Kearl, 1958)
and the Bormuth (1969) formulas use the Dale List
of 3,000 words, which reflects the reading
vocabulary of students of the early 1940s. This list
48
results in some curiosities: “Ma” and “Pa,” which
today would appear primarily in regional literature,
are considered familiar; “Mom” is unfamiliar.
Similarly, “bicycle” is an easy word, but “bike” is
hard. “Radio” is familiar; “TV” is not. The 1995 DaleChall revision addresses some of these concerns, but
what is truly familiar and what is not will always be
time dependent.
A similar problem exists in predicting the
vocabulary difficulty of subject-area tests. Words
that are probably familiar to students in a
particular curriculum (e.g., “cost,” “share,” and
“subtract” in math problem solving; “area,” “map,”
and “crop” in social studies; “body,” “bone,” and “heat”
in science; and even the days of the week in a
capitalization test) are treated as unfamiliar words
by the Spache formula.
Readability concerns are often raised on tests such
as math problem solving. It is generally believed
that a student’s performance in math should not be
influenced by reading ability. Readability data are
frequently requested for passages in reading tests.
Here, readability indices document the range of
difficulty included so the test can discriminate the
reading achievement of all students. Since reading
achievement in the middle grades can span seven or
eight grade levels, the readability indices of
passages should vary substantially.
Three readability indices for Forms A and B are
reported in the accompanying table for Reading
Comprehension, Language Usage and Expression,
Math Problem Solving, Social Studies, and Science.
The Spache (1974) index, reported in grade-level
values, measures two factors: mean sentence length
and proportion of “unfamiliar” or “difficult” words.
The Spache formula uses a list of 1,041 words and is
appropriate with materials for students in grades
1 through 3.
The Bormuth formula (Bormuth, 1969, 1971) reflects
three factors: average sentence length, average word
length, and percent of familiar words (based on the
old Dale List). Bormuth’s formula predicts the cloze
mean (CM), the average of percent correct for a set
of cloze passages (the higher the value, the easier
the passage). The value reported in the table is an
inverted Bormuth index. The inverted index was
multiplied by 100 to remove the decimal.
9
10
11
12
13
14
Form B
Level
9
10
11
12
13
14
46-58
38-58
38-49
35-49
27-49
27-44
2.2-4.2
2.2-3.8
—
—
—
—
53
51
50
41
39
38
30
33
35
38
40
43
53
51
50
47
43
41
Number of
1995
Items
Dale-Chall
30
33
35
38
40
43
Number of
1995
Items
Dale-Chall
45
49
50
52
57
59
Inverted
Bormuth
44
47
48
57
61
59
Inverted
Bormuth
1995
Dale-Chall
Spache
3.0
3.1
—
—
—
—
Spache
3.5
3.8
—
—
—
—
Spache
52
49
44
44
40
36
1995
Dale-Chall
48
44
44
42
39
38
1995
Dale-Chall
49
51
54
55
58
60
Inverted
Bormuth
53
56
57
58
59
59
Inverted
Bormuth
9
10
11
12
13
14
Form B
Level
9
10
11
12
13
14
Form A
Level
46
48
47
46
48
47
22
24
26
28
30
32
48
49
50
48
45
42
Number of
1995
Items
Dale-Chall
22
24
26
28
30
32
Number of
1995
Items
Dale-Chall
47
48
52
52
52
53
Inverted
Bormuth
49
49
51
51
51
52
Inverted
Bormuth
Mathematics Problem Solving and
Data Interpretation
42-56
44-59
50-59
50-62
54-67
54-67
Inverted
Bormuth
49-59
51-65
49-65
49-69
54-69
54-67
Inverted
Bormuth
—
30
34
37
39
41
43
51
49
46
43
39
39
9
10
11
12
13
14
30
34
37
39
41
43
49
48
46
40
38
42
Form B Number of
1995
Level
Items
Dale-Chall
9
10
11
12
13
14
50
51
51
56
58
56
Inverted
Bormuth
48
51
53
54
57
57
44
46
48
49
52
55
2.5
2.8
—
—
—
—
53
50
47
46
43
39
1995
Dale-Chall
51
47
45
44
41
40
1995
Dale-Chall
Science
Spache
2.8
3.1
—
—
—
—
Spache
Means
30
34
37
39
41
43
49
47
49
44
42
42
9
10
11
12
13
14
30
34
37
39
41
43
52
51
49
48
45
42
Form B Number of
1995
Level
Items
Dale-Chall
9
10
11
12
13
14
Form A Number of
1995
Level
Items
Dale-Chall
Inverted
Bormuth
44
47
49
51
55
56
Inverted
Bormuth
Inverted
Bormuth
54
52
50
49
45
44
1995
Dale-Chall
54
51
49
47
44
42
1995
Dale-Chall
Social Studies
2.3
2.6
—
—
—
—
Spache
2.3
2.6
—
—
—
—
Spache
Form A Number of
1995
Level
Items
Dale-Chall
37
41
43
45
48
52
Number of
Items
37
41
43
45
48
52
Number of
Items
Means
49
50
49
51
55
58
Inverted
Bormuth
49
50
51
56
59
58
Inverted
Bormuth
47
50
52
53
56
59
Inverted
Bormuth
50
53
54
56
58
58
Inverted
Bormuth
Passages Plus Items
3:15 PM
Form A
Level
Number of
Passages
4
4
8
8
8
8
8
8
Form B
Level
7
8
9
10
11
12
13
14
42-53
34-49
34-53
28-53
28-45
25-45
3.0-5.1
3.0-5.1
—
—
—
—
—
1995
Dale-Chall
—
Spache
Means
Items Only
10/29/10
Usage and Expression
Number of
Passages
1
4
4
8
8
8
8
8
8
Form A
Level
6
7
8
9
10
11
12
13
14
Range
Passages Only
Reading Comprehension
Table 3.8
Readability Indices for Selected Tests
961464_ITBS_GuidetoRD.qxp
Page 49
49
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 50
Like the Bormuth formula, the 1995 Dale-Chall
index estimates the cloze mean but from only two
factors, percent of unfamiliar words and average
sentence length. Easy passages tend to have
relatively few unfamiliar words and shorter
sentences, whereas difficult passages tend to have
more unfamiliar words and longer sentences.
Readability indices for Reading Comprehension are
reported separately by form and level for passages,
items, and passages plus items. Indices for passages
in each level vary greatly, which helps to
discriminate over the range of reading achievement.
The average, however, is typically at or below grade
level. For example, the 1995 Dale-Chall index for
passages in Form A, Level 9, ranges from 42 to 53;
the average is 48. The readability index for an item
50
usually indicates the item is easier to read than the
corresponding passage. In Form A, Levels 10–14, for
example, the average 1995 Dale-Chall index is 43 for
the passages, 48 for the items, and 45 for the
passages plus items.
The formulas tend to treat many words common to
specific subjects as unfamiliar. This could have an
effect on the readability indices for the Social
Studies, Science, and Math Problem Solving and
Data Interpretation tests, especially in the lower
grades. The values given in the accompanying table,
however, are usually below the grade level in which
tests are typically given.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 51
PART 4
Scaling, Norming, and Equating
The Iowa Tests
Frames of Reference for Reporting
School Achievement
Defining the frame of reference to describe and
report educational development is the fundamental
challenge in educational measurement. Some
educators are interested in determining the
developmental level of students, and in describing
achievement as a point on a continuum that spans
the years of schooling. Others are concerned with
understanding student strengths and weaknesses
across the curriculum, setting goals, and designing
instructional programs. Still others want to know
whether students satisfy standards of performance
in various school subjects. Each of these educators
may share a common purpose for assessment but
would require different frames of reference for
reports of results.
This part of the Guide to Research and Development
describes procedures used for scaling, norming, and
equating The Iowa Tests. Scaling methods define
longitudinal score scales for measuring growth in
achievement. Norming methods estimate national
performance and long-term trends in achievement
and provide a basis for measuring strengths and
weaknesses of individuals and groups. Equating
methods establish comparability of scores on
equivalent test forms. Together these techniques
produce reliable scores that satisfy the demands of
users and meet professional test standards.
Comparability of Developmental Scores
Across Levels: The Growth Model
The foundation of any developmental scale of
educational achievement is the definition of gradeto-grade overlap. Students vary considerably within
any given grade in the kinds of cognitive tasks they
can perform. For example, some students in third
grade can solve problems in mathematics that are
difficult for the average student in sixth grade.
Conversely, some students in sixth grade read no
better than the average student in third grade.
There is even more overlap in the cognitive skills of
students in adjacent grades—enough that some
communities have devised multi-age or multi-grade
classrooms to accommodate it. Grade-to-grade
overlap in the distributions of cognitive skills is
basic to any developmental scale that measures
growth in achievement over time. Such overlap is
sometimes described by the ratio of variability
within grade to variability between grades. As this
ratio increases, the amount of grade-to-grade
overlap in achievement increases.
The problems of longitudinal comparability of tests
and vertical scaling and equating of test scores have
existed since the first use of achievement test
batteries to measure educational progress. The
equivalence of scores from various levels is of special
concern in using tests “out-of-level” or in
individualized testing applications. For example, a
standard score of 185 earned on Level 10 should be
comparable to the 185 earned on any other level; a
grade equivalent score of 4.8 earned on Level 10
should be comparable to a grade equivalent of 4.8
earned on another level.
Each test in the ITBS battery from Levels 9 through
14 is a single continuous test representing a range
of educational development from low grade 3
through superior grade 9. Each test is organized as
six overlapping levels. During the 1970s, the tests
were extended downward to kindergarten by the
addition of Levels 5 through 8 of the Primary
Battery. Beginning in 1992, the Iowa Tests of
Educational Development, Levels 15–17/18 were
jointly standardized with the ITBS. A common
developmental scale was needed to relate the scores
from each level to the other levels. The scaling
requirement consisted of establishing the overlap
among the raw score scales for the levels and
relating the raw score scales to a common
developmental scale. The scaling test method used
to build the developmental scale for the ITBS and
ITED, Hieronymus scaling, is described in Petersen,
Kolen & Hoover (1989). Scaling procedures that are
specific to current forms of The Iowa Tests are
discussed in this part of the Guide to Research and
Development.
The developmental scales for the previous editions
of the ITBS steadily evolved over the years of their
use. The growth models and procedures used to
51
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 52
derive the developmental scales for the Multilevel
Battery (Forms 1 through 6) using Hieronymus
scaling are described on pages 75–78 of the 1974
Manual for Administrators, Supervisors, and
Counselors. The downward extension of the growth
model to include Levels 7 and 8 is outlined in the
Manual for Administrators for the Primary Battery,
1975, pages 43–45. The further downward extension
to Levels 5 and 6 in 1978 is described on page 118 of
the 1982 Manual for School Administrators. Over
the history of these editions of the tests, the
scale was adjusted periodically. This was done
to accommodate new levels of the battery or
changes in the ratio of within- to between-grade
variability observed in national standardization
studies and large-scale testing programs that used
The Iowa Tests.
In the 1963 and 1970 national standardization
programs, minor adjustments were made in the
model at the upper and lower extremes of the grade
distributions, mainly as a result of changes in
extrapolation procedures. During the 1970s it
became apparent that differential changes in
achievement were taking place from grade to grade
and from test to test. Achievement by students in
the lower grades was at the same level or slightly
higher during the seven-year period. In the upper
grades, however, achievement levels declined
markedly in language and mathematics over the
same period. Differential changes in absolute level
of performance increased the amount of grade-tograde overlap in performance and necessitated
major changes in the grade-equivalent to percentilerank relationships. Scaling studies involving the
vertical equating of levels were based on 1970–1977
achievement test scores. The procedures and the
resulting changes in the growth models are
described in the 1982 Manual for School
Administrators, pages 117–118.
Between 1977 and 1984, data from state testing
programs and school systems across the country
suggested that differential changes in achievement
across grades had continued. Most of the available
evidence, however, indicated that these changes
differed from changes of the previous seven-year
period. In all grades and test areas, achievement
appeared to be increasing. Changes in median
achievement by grade for 1977–1981 and 1981–84
are documented in the 1986 Manual for School
Administrators (Hieronymus & Hoover, 1986).
Changes in median achievement after 1984 are
described in the 1990 Manual for School
Administrators, Supplement (Hoover & Hieronymus,
1990), and later in Part 4 of this Guide.
52
Patterns of achievement on the tests during the
1970s and 1980s provided convincing evidence that
another scaling study was needed to ascertain the
grade-to-grade overlap for future editions of the
tests. Not only had test performance changed
significantly, so had school curriculum in the
achievement areas measured by the tests. In
addition, in 1992 the ITED was to be jointly
standardized and scaled with the ITBS for the first
time, so developmental links between the two
batteries were needed.
The National Standard Score Scale
Students in the 1992 spring national standardization
participated in special test administrations for
scaling the ITBS and ITED. The scaling tests were
wide-range achievement tests designed to represent
each content domain in the Complete Battery of the
ITBS or ITED. Scaling tests were developed for
three groups: kindergarten through grade 3, grades
3 through 9, and grades 8 through 12. These
tests were designed to establish links among the
three sets of tests from the data collected. During
the standardization, scaling tests in each content
area were spiraled within classrooms to obtain
nationally representative and comparable data for
each subtest.
The scaling tests provide essential information
about achievement differences and similarities
between groups of students in successive grades. For
example, the scores show the variability among
fourth graders in science achievement and the
proportion of fourth graders that score higher in
science than the typical fifth grader. The study of
such relations is essential to building developmental
score scales. These score scales monitor year-to-year
growth and estimate students’ developmental levels
in areas such as reading, language, and math. To
describe the developmental continuum in one
subject area, students in several different grades
must answer the same questions. Because of the
range of item difficulty in the scaling tests, special
Directions for Administration were prepared.
The score distributions on the scaling tests defined
the grade-to-grade overlap needed to establish the
common developmental achievement scale in each
test area. An estimated distribution of true scores
was obtained for every content area using the
appropriate adjustment for unreliability (Feldt &
Brennan, 1989). The percentage of students in a
given grade who scored higher than the median of
other grades on that scaling test was determined
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 53
from the estimated distribution of true scores. This
procedure provided estimates of the ratios of withinto between-grade variability free of chance errors of
measurement and defined the amount of grade-tograde overlap in each achievement domain.
The table summarizes the relations among grade
medians for Language Usage and Expression for
Forms G and H in 1984 and for Forms K and L in
1992. Each row of Table 4.1 reports the percent of
students in that grade who exceeded the median of
the grade in each column. The entries for 1992 also
describe the scale used for Forms A and B after the
2000 national standardization.
The relation of standard scores to percentile ranks
for each grade was obtained from the results of the
scaling test. Given the percentages of students in
the national standardization in one grade above or
below the medians of other grades, within-grade
percentiles on the developmental scale were
determined. These percentiles were plotted and
smoothed. This produced a cumulative distribution of
standard scores for each test and grade, which
represents the growth model for that test. The
relations between raw scores and standard scores
were obtained from the percentile ranks on each scale.
Two factors created the differences between the
1984 and 1992 distributions. First, the ratio of
within- to between-grade variability in student
performance increased. Second, before 1992, the
parts of the growth model below grade 3 and above
grade 8 were extrapolated from the available data
on grades 3–8. In the 1992 standardization, scaling
test data were collected in the primary and high
school grades, which allowed the growth model to be
empirically determined below grade 3 and above
grade 8.
Table 4.1 illustrates the changes in grade-to-grade
overlap that led to the decision to rescale the tests.
Table 4.1
Comparison of Grade-to-Grade Overlap
Iowa Tests of Basic Skills, Language Usage and Expression—Forms K/L vs. Forms G/H
National Standardization Data, 1992 and 1984
Percent of GEs in Each Grade Exceeding Grade Median (Fall)
Determined from the 1992 and 1984 Scaling Studies
Grade
Year
Grade Medians
K
123
K.2
1
140
1.2
2
157
2.2
3
174
3.2
4
190
4.2
5
205
5.2
6
219
6.2
7
230
7.2
8
241
8.2
8
1992
1984
99
99
97
99
91
99
84
96
76
88
66
76
58
64
50
50
7
1992
1984
99
99
95
99
90
98
81
91
70
79
59
65
50
50
42
35
6
1992
1984
99
99
95
99
88
93
77
82
63
67
50
50
41
33
33
17
5
1992
1984
99
99
93
97
83
85
67
68
50
50
35
31
27
16
20
6
4
1992
1984
99
99
98
99
89
88
74
70
50
50
32
30
21
15
14
5
9
1
3
1992
1984
99
99
95
93
79
73
50
50
26
28
13
13
6
3
3
1
1
1
2
1992
1984
99
99
87
78
50
50
19
27
7
11
2
2
1
1
1
1992
1984
94
86
50
50
10
24
1
9
1
1
K
1992
1984
50
50
2
22
1
7
1
1
53
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 54
Table 4.1 indicates that the amount of grade-tograde overlap in the 1992 and 2000 developmental
standard score scale tends to increase steadily from
kindergarten to eighth grade. This pattern is
consistent with a model for growth in achievement
in which median growth decreases across grades at
the same time as variability in performance
increases within grades.
The type of data illustrated in Table 4.1 provides
empirical evidence of grade-to-grade overlap that
must be incorporated into the definition of growth
reflected in the final developmental scale. But such
data do not resolve the scaling problem. Units for
the description of growth from grade to grade must
be defined so that comparability can be achieved
between descriptions of growth in different content
areas. To define these units, achievement data were
examined from several sources in which the focus of
Grade:
standard-score points, but from grade 7 to grade 8 it
averages only 11 points.
The grade-equivalent (GE) scale for The Iowa Tests
is a monotonic transformation of the standard score
scale. As with previous test forms, the GE scale
measures growth based on the typical change
observed during the school year. As such, it
represents a different growth model than does the
standard score scale (Hoover, 1984). With GEs, the
average student ‘‘grows’’ one unit on the scale
each year, by definition. As noted by Hoover,
GEs are a readily interpretable scale for many
elementary school teachers because they describe
growth in terms familiar to them. GEs become less
useful during high school, when school curriculum
becomes more varied and the scale tends to
exaggerate growth.
K
1
2
3
4
5
6
7
8
9
10
11
12
SS:
130
150
168
185
200
214
227
239
250
260
268
275
280
GE:
K.8
1.8
2.8
3.8
4.8
5.8
6.8
7.8
8.8
9.8
10.8
11.8
12.8
measurement was on growth in key curriculum
areas at a national level. The data included results
of scaling studies using not only the Hieronymus
method, but also Thurstone and item-response
theory methods (Mittman, 1958; Loyd & Hoover,
1980; Harris & Hoover, 1987; Becker & Forsyth,
1992; Andrews, 1995). Although the properties of
developmental scales vary with the methods used to
create them, all data sources showed that growth in
achievement is rapid in the early stages of
development and more gradual in the later stages.
Theories of cognitive development also support
these general findings (Snow & Lohman, 1989). The
growth model for the current edition of The Iowa
Tests was determined so that it was consistent with
the patterns of growth over the history of The Iowa
Tests and with the experience of educators in
measuring student growth and development.
The developmental scale used for reporting ITBS
results was established by assigning a score of 200
to the median performance of students in the spring
of grade 4 and 250 to the median performance of
students in the spring of grade 8. The table above
shows the developmental standard scores that
correspond to typical performance of grade groups
on each ITBS test in the spring of the year. The scale
illustrates that average annual growth decreases as
students move through the grades. For example, the
growth from grade 1 to grade 2 averages 18
54
Before 1992, the principal developmental score scale
of the ITBS was defined with grade equivalents
(GEs) using the Hieronymus method. Other scales
for reporting results on those editions of the tests,
notably developmental standard scores, were
obtained independently using the Thurstone
method. For reasons related to the non-normality of
achievement test score distributions (Flanagan,
1951), the Thurstone method was not used for the
current editions of The Iowa Tests. Beginning with
Forms K/L/M, both developmental standard scores
and grade equivalents were derived with the
Hieronymus method. As a result of the development
of new scales, neither GEs nor standard scores for
Forms A and B are directly comparable to those
reported before Forms K/L/M.
The purpose of a developmental scale in
achievement testing is to permit score comparisons
between different levels of a test. Such comparisons
are dependable under standard conditions of test
administration. In some situations, however,
developmental scores (developmental standard
scores and grade equivalents) obtained across levels
may not seem comparable. Equivalence of scores
across levels in the scaling study was obtained
under optimal conditions of motivation. Differences
in attitude and motivation, however, may affect
comparisons of results from ‘‘on-level’’ and ‘‘out-oflevel’’ testing of students who differ markedly in
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 55
developmental level. If students take their tests
seriously, scores from different levels will be similar
(except for errors of measurement). If students are
frustrated or unmotivated because a test is too
difficult, they will probably obtain scores in the
‘‘chance’’ range. But if students are challenged and
motivated taking a lower level, their achievement
will be measured more accurately.
Greater measurement error is expected if students
are assigned an inappropriate level of the test (too
easy or too difficult). This results in a higher
standard score or grade equivalent on higher levels
of the test than lower levels, because standard
scores and grade equivalents that correspond to
‘‘chance’’ increase from level to level. The same is
true for perfect or near-perfect scores. These
considerations show the importance of motivation,
attitude, and assignment of test level in accurately
measuring a student’s developmental level.
For more discussion of issues concerning
developmental score scales, see ‘‘Scaling, Norming,
and Equating’’ in the third edition of Educational
Measurement (Petersen, Kolen & Hoover, 1989).
Characteristics of developmental score scales,
particularly as they relate to statistical procedures
and assumptions used in scaling and equating, have
been addressed by the continuous research program
at The University of Iowa (Mittman, 1958; Beggs &
Hieronymus, 1968; Plake, 1979; Loyd, 1980; Loyd &
Hoover, 1980; Kolen, 1981; Hoover, 1984; Harris &
Hoover, 1987; Becker & Forsyth, 1992; Andrews,
1995).
Development and Monitoring
of National Norms for the ITBS
The procedures used to develop norms for the ITBS
were described in Part 2. Similar procedures have
been used to develop national norms since the first
forms of the ITBS were published in 1956. These
procedures form the basis for normative information
available in score reports for The Iowa Tests:
student norms, building norms, skill norms, item
norms, and norms for special school populations.
Over the years, changes in performance have been
monitored to inform users of each new edition about
the normative differences they might expect with
new test forms.
The 2000 national standardization of The Iowa Tests
formed the basis for the norms of Forms A and B of
the Complete Battery and Survey Battery. Data
from the standardization established benchmark
performance for nationally representative samples
of students in the fall and spring of the school year
and were used to estimate midyear performance
through interpolation.
The differences between 1992 and 2000 performance,
expressed in percentile ranks for the main test
scores, are shown in Table 4.2. The achievement
levels in the first column are expressed in terms of
1992 national percentile ranks. The entries in the
table show the corresponding 2000 percentile ranks.
For example, a score on the Reading test that would
have a percentile rank of 50 in grade 5 according to
1992 norms would convert to a percentile rank of 58
on the 2000 norms.
Trends in Achievement Test Performance
In general, true changes in educational achievement
take place slowly. Despite public debate about school
reform and education standards, the underlying
educational goals of schools are relatively stable.
Lasting changes in educational methods and
materials tend to be evolutionary rather than
revolutionary, and student motivation and public
support of education change slowly. Data from the
national standardizations provide important
information about trends in achievement over time.
Nationally standardized tests of ability and
achievement are typically restandardized every
seven to ten years when new test forms are
published. The advantage of using the same norms
over a period of time is that scores from year to year
can be based on the same metric. Gains or losses are
“real”; i.e., no part of the measured gains or losses
can be attributed to changes in the norms. The
disadvantage, of course, is the norms become dated.
How serious this is depends on how much
performance has changed over the period.
Differences in performance between editions, which
are represented by changes in norms, were
relatively minor for early editions of the ITBS.
This situation changed dramatically in the late
1960s and the 1970s. Shortly after 1965,
achievement declined—first in mathematics, then in
language skills, and later in other curriculum areas.
This downward trend in achievement in the late
1960s and early 1970s was reflected in the test
norms during that period, which were “softer” than
norms before and after that time.
Beginning in the mid-1970s, achievement improved
slowly but consistently in all curriculum areas until
the early 1990s, when it reached an all-time high.
55
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 56
Table 4.2
Differences Between National Percentile Ranks
Iowa Tests of Basic Skills — Forms K/L vs. A/B
National Standardization Data, 1992 and 2000
Reading
Achievement
Level 1992
99
96
90
80
70
60
50
40
30
20
10
4
1
Language
Corresponding PRs: 2000 National Norms
1
2
3
98
94
88
78
68
58
48
39
31
22
12
5
1
99
97
93
84
73
63
52
41
31
21
12
5
1
99
98
93
86
77
68
58
48
37
26
15
7
3
Grade
5
4
6
7
8
99
98
92
83
74
66
57
48
38
28
16
8
3
99
97
93
83
71
60
50
39
28
18
9
3
1
99
97
91
80
70
60
52
43
34
24
13
6
1
99
97
91
82
73
62
52
41
30
20
11
5
2
99
99
96
88
78
67
58
46
37
26
15
7
2
Achievement
Level 1992
99
96
90
80
70
60
50
40
30
20
10
4
1
Corresponding PRs: 2000 National Norms
1
2
3
Grade
5
4
6
7
8
98
96
89
79
71
61
51
42
34
24
12
5
2
99
97
91
81
72
62
54
44
34
24
12
5
2
99
97
92
84
75
66
56
47
35
25
15
6
2
99
97
93
85
77
68
60
51
41
30
17
9
3
99
97
91
80
70
59
48
39
30
21
11
5
1
99
96
91
81
71
61
50
42
33
23
14
7
3
98
95
88
79
70
61
51
42
33
25
15
8
3
Mathematics
Achievement
Level 1992
99
96
90
80
70
60
50
40
30
20
10
4
1
Social Studies
Corresponding PRs: 2000 National Norms
1
2
3
Grade
5
4
6
7
8
98
95
89
78
70
61
51
41
32
23
13
6
2
98
95
88
78
69
58
48
38
29
19
10
4
1
98
95
90
79
69
60
50
40
31
21
11
4
1
99
96
90
81
70
60
51
42
32
22
12
5
1
99
97
90
78
67
56
46
37
28
20
10
4
1
99
96
90
79
68
57
47
38
30
20
11
5
1
98
96
90
80
69
59
49
39
31
21
11
5
2
99
96
91
80
71
62
52
44
34
25
14
6
2
Achievement
Level 1992
99
96
90
80
70
60
50
40
30
20
10
4
1
Science
Achievement
Level 1992
99
96
90
80
70
60
50
40
30
20
10
4
1
56
99
97
93
85
77
68
60
51
41
30
19
10
5
Corresponding PRs: 2000 National Norms
1
2
3
Grade
5
4
6
7
8
98
95
91
83
75
67
60
51
41
29
15
6
2
98
96
92
84
75
66
56
48
37
28
15
7
2
99
99
95
87
80
72
61
50
39
27
14
5
2
99
98
94
86
78
68
58
50
41
29
15
6
2
99
97
91
81
71
61
51
42
32
21
11
4
1
98
96
91
82
70
59
47
38
28
19
8
3
1
99
97
92
81
71
62
52
44
32
22
11
4
1
99
98
93
86
79
72
64
53
43
31
19
9
4
Sources of Information
Corresponding PRs: 2000 National Norms
1
2
3
Grade
5
4
99
96
90
80
70
61
53
43
33
22
13
5
1
98
96
90
81
72
64
53
44
35
25
14
6
2
97
93
87
78
69
61
53
43
32
22
13
7
2
98
95
88
79
70
62
53
46
38
27
15
7
2
97
94
87
77
69
62
54
46
37
28
17
9
3
6
7
8
98
94
88
77
68
58
47
38
29
19
8
3
1
99
97
91
82
72
62
51
41
31
21
9
4
1
99
97
91
81
73
63
54
47
38
29
17
7
2
Achievement
Level 1992
99
96
90
80
70
60
50
40
30
20
10
4
1
Corresponding PRs: 2000 National Norms
1
2
3
Grade
5
4
6
7
8
97
95
90
80
70
60
52
42
32
21
10
3
1
98
96
89
80
70
60
52
41
31
23
13
6
3
99
97
93
85
77
68
58
49
38
26
13
5
1
99
97
91
83
74
65
57
47
38
27
14
7
2
99
98
94
85
74
63
52
42
32
23
14
7
2
99
98
94
85
74
63
52
42
33
25
14
6
2
99
96
92
82
71
60
50
41
32
22
14
7
2
99
96
92
83
73
65
56
48
39
28
17
8
3
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 57
Figure 4.1
Trends in National Performance
Iowa Tests of Basic Skills — Complete Composite, Grades 3–8
National Standardization Data, 1955–2000
9.0
Grade 8
Grade-Equivalent Score
8.0
7
Grade
7.0
Grade 6
6.0
5
Grade
5.0
4
Grade
4.0
3
Grade
3.0
1955
1963
1970
1977
1984
1992
2000
Year of National Standardization
Since the early 1990s, no dominant trend in
achievement test scores has appeared. Scores have
increased slightly in some areas and grades and
decreased slightly in others. In the context of
achievement trends since the mid-1950s, achievement
in the 1990s has been extremely stable.
National trends in achievement measured by the
ITBS Complete Composite and monitored across
standardizations are shown in Figure 4.1.
Differences in median performance for each
standardization period from 1955 to 2000 are
summarized in Table 4.3 using 1955 grade
equivalents as the base unit.
Between 1955 and 1963, achievement improved
consistently for most test areas, grade levels, and
achievement levels. The average change for the
composite over all grades represented an
improvement of 2.3 months.
From 1963 to 1970, differences in median composite
scores were negligible, averaging a loss of twotenths of a month. Small but consistent qualitative
differences occurred in some achievement areas and
grades, however. In general, these changes were
positive in the lower grades and tended to balance
out in the upper grades. Gains were fairly consistent
in Vocabulary and Sources of Information, but losses
occurred in Reading and some Language skills.
Math achievement improved in the lower grades,
but sizable losses in concepts and problem solving
occurred in the upper grades.
Between 1970 and 1977, test performance generally
declined, especially in the upper grades. The average
median loss over grades 3 through 8 for the
Composite score was 2.2 months. Differences varied
from grade to grade and test to test. Differences in
grade 3 were generally positive, especially at the
median and above. From grades 4 to 8, performance
declined more markedly. In general, the greatest
57
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 58
Table 4.3
Summary of Median Differences
Iowa Tests of Basic Skills
National Standardization Data, 1955–2000
(1995 National Grade Equivalents in “Months”)
Reading
Vocabulary
Grade
Period
Spelling
Capitalization
Punctuation
L2
Mathematics
Concepts Problems
Usage & Language
Computa& Data
&
Expression
Total
tion
InterpreEstimation
tation
Sources of Information
Math
Total
Social
Studies
Science
SS
SC
Maps
Reference Sources
and
Materials
Total
Diagrams
Composite
Word
Analysis
Listening
WA
Li
RV
RC
L1
L3
L4
LT
M1
M2
M3
MT
S1
S2
ST
CC
8
00-92
92-84
84-77
77-70
70-63
63-55
-1.0
1.1
5.2
-4.9
.4
-.8
-1.0
2.3
6.0
-4.2
-2.1
2.5
-2.2
-.6
5.7
-.3
1.0
2.1
-2.2
3.4
7.8
-10.1
.6
3.3
.1
11.4
5.9
-5.6
-1.6
-2.0
-.7
.4
7.8
-9.7
-3.1
1.6
-1.0
3.5
7.0
-7.2
.0
1.3
1.6
1.9
4.0
-7.9
-1.7
3.4
-1.3
3.1
5.0
-5.1
-3.1
1.6
-1.9
.3
5.6
.0
1.9
4.9
-6.4
-2.4
2.5
-.7
-.6
5.5
-5.2
.8
2.6
-1.5
-2.9
4.4
-2.6
1.4
1.8
-1.0
-1.8
5.0
-3.9
1.0
2.3
-.8
1.2
5.5
-5.0
-1.1
1.6
7
00-92
92-84
84-77
77-70
70-63
63-55
-2.3
1.1
3.1
-2.9
1.0
1.8
-.4
2.2
5.6
-2.6
-1.4
1.5
-2.0
.0
5.5
-1.9
1.1
4.8
-1.6
2.2
6.6
-6.4
-.9
6.6
-1.0
7.7
5.7
-5.4
.0
-2.4
-.6
2.8
5.0
-7.2
-1.6
1.4
-1.2
3.0
5.8
-5.2
.0
2.6
.3
.9
3.7
-6.1
-2.2
3.5
.7
.7
4.2
-5.2
-4.1
4.6
-.7
-.3
4.1
.5
.6
4.5
-5.6
-2.8
4.1
-.8
-.3
3.8
-3.3
1.7
3.8
-4.4
-1.5
2.1
-2.2
2.0
2.8
-2.4
-.9
3.0
-2.7
1.8
3.4
-1.2
1.1
4.3
-3.4
-.4
2.7
6
00-92
92-84
84-77
77-70
70-63
63-55
-.3
2.8
3.3
-3.4
.9
.8
.2
.6
5.5
-3.0
-2.6
2.2
-.8
.2
5.1
-1.0
-1.4
1.5
-1.0
3.0
4.4
-8.3
-2.8
4.5
.5
4.5
5.6
-6.4
-1.4
-.2
.9
3.3
4.9
-7.6
-1.6
2.4
.0
2.4
5.2
-5.8
-1.4
2.1
1.6
3.1
3.5
-6.4
-2.7
4.4
.0
1.2
1.9
-4.3
-3.0
2.5
-1.0
.9
2.9
.7
1.7
2.9
-5.3
-3.0
3.4
.1
-.7
4.0
-3.5
1.4
1.6
-4.1
-1.4
3.2
-3.4
.3
3.8
-1.9
-1.0
3.6
-3.4
1.1
2.3
-.3
1.3
4.1
-3.8
-1.1
2.2
5
00-92
92-84
84-77
77-70
70-63
63-55
-3.3
2.7
3.6
-2.2
1.4
.8
-3.7
3.1
5.2
-1.6
-.7
2.6
-5.3
2.8
5.5
.0
.4
4.0
-4.7
4.1
4.2
-3.6
-2.3
4.4
-3.2
5.5
4.5
-6.0
.0
1.8
-4.0
3.5
4.5
-5.3
-.8
2.0
-4.2
4.0
4.8
-3.7
-.5
3.1
-.6
2.6
3.0
-3.3
.8
3.6
-1.3
2.4
2.1
-3.8
.6
2.6
-1.0
.7
2.4
-.9
1.9
2.5
-3.5
.6
3.1
-1.1
-.6
2.7
-1.5
2.4
1.5
-4.1
.5
3.4
-1.5
.7
1.8
-2.5
.2
3.0
-1.5
1.5
1.6
-2.9
2.5
3.8
-2.3
.6
2.2
4
00-92
92-84
84-77
77-70
70-63
63-55
-3.6
1.3
2.0
-2.1
.6
3.6
-2.2
1.3
4.6
-.9
-.6
2.5
-4.3
1.7
4.8
.3
-1.0
3.4
-4.4
.7
3.0
-3.5
-1.6
6.6
-2.2
4.4
4.0
-3.4
-.6
3.4
-3.3
2.0
4.3
-2.2
-2.2
3.2
-3.5
2.0
4.2
-2.2
-1.4
4.2
-.8
2.0
2.7
-1.9
.6
1.9
-.4
1.5
1.6
-2.1
.3
.6
-1.0
.9
1.9
-.4
1.9
2.1
-2.0
.7
1.3
-2.0
1.2
1.8
-.5
2.3
1.0
-2.3
1.0
2.3
-1.3
1.0
2.8
-2.1
1.3
2.0
-.9
1.9
1.6
-2.4
1.6
2.9
-1.4
.3
2.6
3
00-92
92-84
84-77
77-70
70-63
63-55
-2.9
.7
1.4
3.9
1.3
2.4
-2.4
-1.1
3.2
3.7
.5
4.3
-1.5
2.4
3.2
2.1
.0
3.0
-2.9
3.7
1.9
-.3
-.6
4.6
-1.3
3.7
3.5
-1.4
1.0
1.8
-2.2
4.6
2.9
2.2
.0
.5
-1.9
3.5
2.8
.7
-.6
2.5
.2
1.7
2.1
1.0
.7
2.8
-.1
.4
1.3
2.0
1.9
.6
-1.0
.9
.8
.0
1.1
1.5
1.5
1.6
1.7
-2.4
2.5
1.4
1.9
.6
2.0
-1.9
.4
1.2
2.5
1.0
2.2
-2.1
1.9
1.3
2.2
.9
2.1
-1.9
1.2
2.1
2.6
.4
2.6
2
00-92
92-84
84-77
77-70
-.7
.3
1.9
4.2
-.8
-.1
3.2
1.8
-.6
1.3
1.8
-.4
.3
1.9
.6
-1.2
.6
2.6
1.9
-1.6
-.9
.7
.7
.4
1.7
1.1
-1.4
-.4
.9
1.3
1.2
-.4
2.1
2.0
1.0
.4
3.7
2.0
.9
-1.0
4.9
2.7
.7
1
00-92
92-84
84-77
77-70
-.8
-.9
1.8
.1
1.2
1.8
.9
.4
.1
2.7
2.6
-2.2
-.3
2.6
.8
-.8
-.1
.8
-.9
-.2
1.8
1.6
-.5
-.8
2.4
.9
.6
-1.0
1.5
2.7
-1.0
00-92
92-84
84-77
77-70
70-63
63-55
-2.2
1.6
3.1
-1.9
.9
1.4
-1.6
1.4
5.0
-1.4
-1.2
2.6
-2.0
-.1
3.0
-1.7
1.4
2.2
-1.6
1.5
3.8
-2.2
-.2
2.3
Av.
Grades
3-8
58
Comprehension
Language
-2.7
1.1
5.0
-.6
.0
3.1
-2.8
2.9
4.7
-5.4
-1.3
5.0
-1.2
6.2
4.9
-4.7
-.4
.4
-1.7
2.8
4.9
-5.0
-1.6
1.8
-2.0
3.1
5.0
-3.9
-.6
2.6
.4
2.0
3.2
-4.1
-.8
3.3
-.4
1.6
2.7
-3.1
-1.2
2.1
-1.1
.6
3.0
.0
1.5
3.0
-2.1
.9
2.7
-1.2
.3
3.2
-2.0
1.5
2.1
-3.1
-.7
2.8
-1.4
1.1
2.5
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 59
Figure 4.2
Trends in Iowa Performance
Iowa Tests of Basic Skills — Complete Composite, Grades 3–8
Iowa State Testing Program Data, 1955–2001
(1965 Iowa Grade Equivalents)
9.0
de
Gra
Grade-Equivalent Score
8.0
Gr
7.0
ad
8
e7
de 6
Gra
6.0
de 5
Gra
5.0
e4
Grad
4.0
e3
Grad
3.0
1955
1960
1965
1970
1975
1980
1985
1990
1995
2000
Year
declines appeared in Capitalization, Usage and
Expression, and Math Concepts. These trends are
consistent with other national data on student
performance for the same period (Koretz, 1986).
Between 1977 and 1984, the improvement in ITBS
test performance more than made up for previous
losses in most test areas. Achievement in 1984 was
at an all-time high in nearly all test areas. This
upward trend continued through the 1980s and is
reflected in the norms developed for Forms K and L
in 1992.
During February 2000, Forms K and A were jointly
administered to a national sample of students in
kindergarten through grade 8. This sample was
selected to represent the norming population in
terms of variability in achievement, the main
requirement of an equating sample (Kolen &
Brennan, 1995). A single-group, counterbalanced
design was used. In each grade, students took Form
K and Form A of the ITBS Complete Battery.
For each subtest and level, matched student records
of Form K and Form A were created. Frequency
distributions were obtained, and raw scores were
linked by the equipercentile method. The resulting
equating functions were then smoothed with cubic
splines. This procedure defined the raw-score to
raw-score relationship between Form A and Form K
for each test. Standard scores on Form A could then
be determined for norming dates before 2000 by
linear interpolation. In this way, trend lines could be
updated, and expected relations between new and
old test norms could be determined.
Trends in the national standardization data for the
ITBS are reflected in greater detail in the trends for
the state of Iowa. Virtually all schools in Iowa
participate on a voluntary basis in the Iowa Basic
Skills Testing Program. Trend lines for student
performance have been monitored as part of the
Iowa state testing program since the 1950s.
59
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 60
Trend lines for the state of Iowa (Figure 4.2) show a
pattern in achievement test scores that is similar to
that in national standardization samples. For any
given grade, the peaks and valleys of overall
achievement measured by the ITBS occur at about
the same time. Further, both Iowa and national trends
indicate that test scores in the lower elementary
grades, grades 3 and 4, have generally held steady
or risen since the first administration of the ITBS
Multilevel Battery. An exception to this observation
appears in the Iowa data in the years since the 1992
standardization, when declining Composite scores
were observed in grades 3 and 4 for the first time.
This decline was also evident in the 2000 national
standardization and extended to grade 5.
Norms for Special School Populations
As described in Part 2, the 2000 standardization
sample included three independent samples: a
public school sample, a Catholic school sample, and
a private non-Catholic school sample. Schools in
the standardization were further stratified by
socioeconomic status. Data from these sources were
used to develop special norms for The Iowa Tests for
students enrolled in Catholic/private schools, as well
as norms for other groups.
The method used to develop norms was the same
for each special school population. Frequency
distributions from each grade in the standardization
sample were cumulated for the relevant group of
students. The cumulative distributions were then
plotted and smoothed.
Interpretive Guide for Teachers and Counselors,
show the parallelism achieved in content for each
test and level. Alternate forms of tests should be
similar in difficulty as well. Concurrent assembly of
test forms provides some control over difficulty, but
small differences between forms are typically
observed during standardization. Equating methods
are used to adjust scores for differences in difficulty
not controlled during assembly of the forms.
Forms A and B were assembled concurrently to the
same content and difficulty specifications from the
pool of items included in preliminary and national
item tryouts. In the tests consisting of discrete
questions (Vocabulary, Spelling, Capitalization,
Punctuation, Concepts and Estimation, Computation,
and parts of Social Studies, Science, and Reference
Materials), items in the same or similar content,
skills, and difficulty categories were first assigned,
more or less at random, to Form A or B. Then,
adjustments were made to avoid too much similarity
from item to item and to achieve comparable
difficulty distributions across forms.
Concurrent assembly of multiple test forms is the
best way to ensure comparability of scores and
reasonable results from equating. Linking methods
rely on comparable content to justify the term
“equating” (Linn, 1993; Kolen & Brennan, 1995).
The Iowa Tests are designed so truly equated scores
on parallel forms can be obtained.
The Iowa Tests of Basic Skills have been
restandardized approximately every seven years.
Each time new forms are published, they are
carefully equated to previous forms. Procedures for
equating previous forms to each other have been
described in the Manual for School Administrators
for those forms. The procedures used in equating
Forms A and B of the current edition are described
in this part of the Guide to Research and Development.
Forms A and B of the Complete Battery were
equated with a comparable-groups design (Petersen,
Kolen & Hoover, 1989). In Levels 7 and 8 of the
Complete Battery, which are read aloud by
classroom teachers, test forms were administered by
classroom to create comparable samples. Student
records were matched to Form A records from the
spring standardization. Frequency distributions in
the fall sample were weighted so that the Form A
and B cohorts had the same distribution on each
subtest in the spring sample. The weighted
frequency distributions were used to obtain the
equipercentile relationship between Form A and B
of each subtest. This relation was smoothed with
cubic splines (Kolen, 1984) and standard scores were
attached to Form B raw scores by interpolation.
The equivalence of alternate forms of The Iowa Tests
is established through careful test development and
standard methods of test equating. The tests are
assembled to match tables of specifications that are
sufficiently detailed to allow test developers to
create equivalent forms in terms of test content. The
tables of skill classifications, included in the
At Levels 9 through 14, Forms A and B were
spiraled within classroom to obtain comparable
samples. Frequency distributions for the two forms
were linked by the equipercentile method and
smoothed with cubic splines. Standard scores were
attached to each raw score distribution using the
equating results. The raw-score to standard-score
Equivalence of Forms
60
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 61
conversions were then smoothed. Table 4.4 reports
the sample sizes used in the equating of Levels 7–14
of Forms A and B.
Table 4.4
Sample Sizes for Equating Forms A and B
Form B
Form A
Level
7
8
9
10
11
12
13
14
1
2
Survey
1
3030
1697
3098
5966
5548
6189
5445
5834
Complete 1
Survey 2
5797
3703
3045
6038
5620
6533
5681
6091
2767
2006
1437
2918
2696
2857
2463
2633
Comparable-groups design
Single-group design
The raw-score to standard-score conversions for
the ITBS Survey Battery of Forms A and B also
were developed with data from the 2000 fall
standardization sample. At Levels 9 through 14 in
the fall standardization, students took one form of
the Complete Battery and the alternate form of the
Survey Battery in a counterbalanced design. This
joint administration defined the equipercentile
relation between each Survey Battery subtest and
the corresponding test of the Complete Battery via
intact administrations of each version. The equating
function was smoothed via cubic splines, and the
resulting raw-score to raw-score conversion tables
were used to attach standard scores to the Survey
Battery raw scores.
Forms A and B contain a variety of testing
configurations of The Iowa Tests. For normative
scores, methods for equating parallel forms used
empirical data designed specifically to accomplish
the desired linking. These methods do not rely
on mathematical models, such as item response
theory or strong true-score theory, which entail
assumptions about the relationship between
individual items and the domain from which they
are drawn or about the shape of the distribution of
unobservable true scores. Instead, these methods
establish direct links between the empirical
distributions of raw scores as they were observed in
comparable samples of examinees. The equating
results accommodate the influence of context or
administrative sequence that could affect scores.
Relationships of Forms A and B
to Previous Forms
Forms 1 through 6 of the Iowa Tests of Basic Skills
Multilevel Battery were equivalent forms in many
ways. Pairs of forms—1 and 2, 3 and 4, 5 and 6—
were assembled as equivalent forms in the manner
described for Forms A and B. Because the objectives,
placement, and methodology in basic skills
instruction changed slowly when these forms were
used, the content specifications of the three pairs of
forms did not differ greatly. One exception, Math
Concepts, was described previously. The organization
of the tests in the battery, the number of items per
level, the time limits, and even the number of items
per page were identical for the first six forms.
The first significant change in organization of the
battery occurred with Forms 7 and 8, published in
1978. Separate tests in Map Reading and Reading
Graphs and Tables were replaced by a single Visual
Materials test. In mathematics, separate tests in
Problem Solving and Computation replaced the test
consisting of problems with embedded computation.
Other major changes included a reduction in the
average number of items per test, shorter testing
time, a revision in grade-to-grade item overlap, and
major revisions in the taxonomy of skills objectives.
With Forms G and H, published in 1985, the format
changed considerably. Sixteen pages were added to
the multilevel test booklet. Additional modifications
were made in grade-to-grade overlap and in number
of items per test. For most purposes, however, Forms
G, H, and J were considered equivalent to Forms 7
and 8 in all test areas except Language Usage.
As indicated in Part 3, the scope of the Usage test
was expanded to include appropriateness and
effectiveness of expression as well as correct usage.
Forms K, L, and M continued the gradual evolution
of content specifications to adapt to changes in
school curriculum. The most notable change was in
the flexibility of configurations of The Iowa Tests to
meet local assessment needs. The Survey Battery
was introduced for schools that wanted general
achievement information only in reading, language
arts, and mathematics. The Survey Battery is
described in Part 9.
Other changes in Forms K, L, and M occurred in how
Composite scores were defined. Three core Composite
scores were established for these forms: Reading
Total, Language Total, and Mathematics Total. The
Reading Total was defined as the average of the
61
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 62
standard scores in Vocabulary and Reading
Comprehension. The Language Total, identical to
previous editions, was the average standard score of
the four Language tests: Spelling, Capitalization,
Punctuation, and Usage and Expression. The Math
Total for Forms K, L, and M was defined in two
ways: the average of the first two subtests (Concepts
& Estimation and Problem Solving & Data
Interpretation) or the average of all three subtests
(including Computation). The Social Studies and
Science tests, considered supplementary in previous
editions, were moved to the Complete Battery and
were added to the ITBS Complete Composite score
beginning with Forms K, L, and M.
Forms K, L, and M also introduced changes to the
makeup of the tests in math and work-study skills
(renamed Sources of Information). In math, a
separately timed estimation component was added
to the Concepts test, resulting in a new test called
Concepts and Estimation. The Problem Solving test
was modified by adding items on the interpretation
of data using graphs and tables, which had been in
the Visual Materials test in Forms G, H, and J. This
math test was called Problem Solving and Data
Interpretation in Forms K, L, and M. A concomitant
change in what had been the Visual Materials test
involved adding questions on schematic diagrams
and other visual stimuli for a new test: Maps
and Diagrams.
An additional change in the overall design
specifications for the tests in Forms K, L, and M
concerned grade-to-grade overlap. Previous test
forms had overlapping items that spanned three
levels. Overlapping items in the Complete Battery of
Forms K, L, and M, Levels 9–14, spanned two levels.
The Survey Battery contained no overlapping items.
Forms A and B of the ITBS are equivalent to
Forms K, L, and M in most test areas. Minor
changes were introduced in time limits, number of
items, and content emphasis in Vocabulary, Reading
Comprehension, Usage and Expression, Concepts
and Estimation, Problem Solving and Data
Interpretation, Computation, Science, and Reference
Materials. These changes were described in Part 3.
62
The other fundamental change in the ITBS occurred
in the 1970s with the introduction of the Primary
Battery (Levels 7 and 8) with Forms 5 and 6 in 1971
and the Early Primary Battery (Levels 5 and 6) in
1977. These levels were developed to assess basic
skills in kindergarten through grade 3. Machinescorable test booklets contain responses with
pictures, words, phrases, and sentences designed for
the age and developmental level of students in the
early grades.
In Levels 5 and 6 of the Early Primary Battery,
questions in Listening, Word Analysis, Vocabulary,
Language, and Mathematics are read aloud by the
teacher. Students look at the responses in the test
booklet as they listen. Only the Reading test in
Level 6 requires students to read words, phrases,
and sentences to answer the questions.
In the original design of Levels 7 and 8, questions on
tests in Listening, Word Analysis, Spelling, Usage
and Expression, Mathematics (Concepts, Problems,
and Computation), Visual Materials, Reference
Materials, Social Studies, and Science were read
aloud by the teacher. In Vocabulary, Reading,
Capitalization, and Punctuation, students read the
questions on their own.
Because of changes in instructional emphasis,
Levels 5 through 8 of the ITBS have been revised
more extensively than other levels. Beginning with
Forms K, L, and M, the order of the subtests was
changed. The four Language tests were combined
into a single test with all questions read aloud by
the teacher. At the same time, graphs and tables
were moved from Visual Materials to Math Problem
Solving, and a single test on Sources of Information
was created.
Forms A and B are equivalent to Forms K, L, and M
in most respects. The number of spelling items in
the Language test was increased so that a separate
Spelling score could be reported. Slight changes
were also made in the number of items in several
other subtests and in the number of response options.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 63
PART 5
Reliability of The Iowa Tests
Methods of Determining, Reporting, and
Using Reliability Data
A soundly planned, carefully constructed, and
comprehensively standardized achievement test
battery represents the most accurate and
dependable measure of student achievement
available to parents, teachers, and school officials.
Many subtle, extraneous factors that contribute to
unreliability and bias in human judgment have
little or no effect on standardized test scores. In
addition, other factors that contribute to the
apparent inconsistency in student performance can
be effectively minimized in the testing situation:
temporary changes in student motivation, health,
and attentiveness; minor distractions inside and
outside the classroom; limitations in number, scope,
and comparability of the available samples of
student work; and misunderstanding by students of
what the teacher expects of them. The greater
effectiveness of a well-constructed achievement test
in controlling these factors—compared to a teacher’s
informal evaluation of the same achievement—is
evidenced by the higher reliability of the test.
Test reliability may be quantified by a variety of
statistical data, but such data reduce to two basic
types of indices. The first of these indices is the
reliability coefficient. In numerical value, the
reliability coefficient is between .00 and .99, and
generally for standardized tests between .60 and
.95. The closer the coefficient approaches the upper
limit, the greater the freedom of the test scores from
the influence of factors that temporarily affect
student performance and obscure real differences in
achievement. This ready frame of reference for
reliability coefficients is deceptive in its simplicity,
however. It is impossible to conclude whether a
value such as .75 represents a “high” or “low,”
“satisfactory” or “unsatisfactory” reliability. Only
after a coefficient has been compared to those of
equally valid and equally practical alternative tests
can such a judgment be made. In practice, there is
always a degree of uncertainty regarding the terms
“equally valid” and “equally practical,” so the
reliability coefficient is rarely free of ambiguity.
Nonetheless, comparisons of reliability coefficients
for alternative approaches to assessment can be
useful in determining the relative stability of the
resulting scores.
The second of the statistical indices used to describe
test reliability is the standard error of
measurement. This index represents a measure of
the net effect of all factors leading to inconsistency
in student performance and to inconsistency in the
interpretation of that performance. The standard
error of measurement can be understood by a
hypothetical example. Suppose students with the
same reading ability were to take the same reading
test. Despite their equal ability, they would not all
get the same score. Instead, their scores would range
across an interval. A few would get much higher
scores than they deserve, a few much lower; the
majority would get scores fairly close to their actual
ability. Such variation in scores would be
attributable to differences in motivation,
attentiveness, and other factors suggested above.
The standard error of measurement is an index of
the variability of the scores of students having the
same actual ability. It tells the degree of precision in
placing a student at a point on the achievement
continuum.
There is, of course, no way to know just how much a
given student’s achievement may have been underor overestimated from a single administration of a
test. We may, however, make reasonable estimates of
the amount by which the abilities of students in a
particular reference group have been mismeasured.
For about two-thirds of the examinees, the test
scores obtained are “correct” within one standard
error of measurement; for 95 percent, the scores are
incorrect by less than two standard errors; for more
than 99 percent, the scores are incorrect by less than
three standard error values.
Two methods of estimating reliability were used
to obtain the summary statistics provided in
the following two sections. The first method
employed internal-consistency estimates using
Kuder-Richardson Formula 20 (K-R20). Reliability
coefficients derived by this technique were based on
data from the entire national standardization
sample. The coefficients for Form A of the Complete
63
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 64
Battery are reported here. Coefficients for Form B of
the Complete Battery and Forms A and B of the
Survey Battery are available in Norms and Score
Conversions for each form and battery.
The second method provided estimates of
equivalent-forms reliability for Forms K and A
from the spring 2000 equating of those forms,
and for Forms A and B from the fall 2000
standardization sample.
Prior to the spring standardization, a national
sample of students took Forms K and A of the ITBS
Complete Battery. Correlations between tests on
alternate forms served as one estimate of reliability.
During the fall standardization, students were
administered the Complete Battery of one form and
the Survey Battery of the other form. The observed
relationships between scores on the Complete
Battery and Survey Battery were used to estimate
equivalent-forms reliability of the tests common to
the two batteries. These estimates were computed
from unweighted distributions of developmental
standard scores.
64
Internal-Consistency Reliability Analysis
The reliability data presented in Table 5.1 are
based on Kuder-Richardson Formula 20 (K-R20)
procedures. The means, standard deviations, and
item proportions used in computing reliability
coefficients for Form A are based on the entire
spring national standardization sample. Means,
standard deviations, and standard errors of
measurement are shown for raw scores and
developmental standard scores.
Some tests in the current edition of the ITBS have
fewer items and shorter time limits than previous
forms. The reliability coefficients compare favorably
with those of previous editions.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 65
Table 5.1
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 5
Kindergarten
Vocabulary
Word
Analysis
Listening
Language
Mathematics
Core
Total
Reading
Profile
Total
V
WA
Li
L
M
CT
RPT
Number of items
29
30
29
29
29
Fall
RSs
Mean
SD
SEM
17.3
4.7
2.3
15.8
5.7
2.4
16.3
5.1
2.5
15.5
5.3
2.4
16.2
4.7
2.3
SSs
Mean
SD
SEM
121.7
13.1
6.4
121.4
12.9
5.5
122.7
9.3
4.5
123.3
8.6
3.9
121.4
8.9
4.4
121.8
9.0
2.9
122.0
8.9
3.2
Reliability K-R20
.763
.820
.770
.797
.748
.896
.873
Spring
RSs
Mean
SD
SEM
20.2
4.0
2.2
19.8
5.2
2.3
20.2
5.0
2.3
19.9
4.9
2.3
20.9
4.6
2.1
SSs
Mean
SD
SEM
131.1
15.0
8.2
131.5
14.3
6.3
130.8
10.8
4.9
131.1
9.4
4.4
130.7
9.8
4.5
130.8
9.8
3.4
130.9
11.1
3.8
Reliability K-R20
.699
.806
.793
.788
.793
.877
.882
Level 6
Grade 1
Number of items
Vocabulary
Word
Analysis
Listening
Language
Mathematics
Core
Total
Reading
Words
Reading
Comprehension
Reading
Total
Reading
Profile
Total
V
WA
Li
L
M
CT
RW
RC
RT
RPT
31
35
31
31
35
29
19
48
18.4
6.3
2.3
8.5
4.4
1.9
27.5
10.0
3.0
Fall
RSs
Mean
SD
SEM
18.8
4.9
2.4
21.5
5.8
2.5
18.2
5.1
2.5
16.1
5.5
2.5
19.9
5.8
2.6
SSs
Mean
SD
SEM
138.1
16.0
7.9
138.9
15.9
6.9
138.1
11.9
5.8
138.3
11.0
5.0
138.3
11.3
5.1
138.2
10.9
3.6
139.1
10.2
3.7
139.1
10.2
4.5
139.0
9.1
2.7
138.6
12.3
2.7
.754
.811
.764
.790
.793
.893
.871
.805
.909
.953
Reliability K-R20
Spring
RSs
Mean
SD
SEM
22.2
4.3
2.3
25.7
5.2
2.3
22.7
4.5
2.2
21.6
4.9
2.3
25.1
5.6
2.4
SSs
Mean
SD
SEM
150.9
18.0
9.4
152.2
18.4
8.2
150.4
13.5
6.6
151.5
13.4
6.2
150.2
13.6
5.7
.725
.800
.758
.786
.821
Reliability K-R20
24.4
5.1
1.7
13.7
4.8
1.6
38.1
9.6
2.4
151.4
12.7
4.2
152.2
14.3
4.8
152.2
14.3
4.8
151.5
13.5
3.4
151.3
12.6
3.1
.890
.886
.889
.937
.938
65
.916
.896
.910
.886
Note: -Does not include Computation
+Includes Computation
K-R20
152.2
14.3
4.3
150.9
18.0
6.1
Mean
SD
SEM
SSs
Reliability
22.4
7.9
2.4
18.6
6.8
2.3
Mean
SD
SEM
RSs
Spring — Grade 1
K-R20
158.9
16.3
4.7
157.5
19.0
6.1
Mean
SD
SEM
SSs
Reliability
25.6
7.4
2.1
20.9
6.6
2.1
34
30
Mean
SD
SEM
RC
RV
RSs
Comprehension
Vocabulary
.925
151.3
13.5
3.7
.940
158.4
15.8
3.9
RT
Reading
Total
.853
152.2
18.4
7.1
23.7
6.5
2.5
.868
159.2
20.4
7.4
25.8
6.3
2.3
35
WA
Word
Analysis
.699
150.4
13.5
7.4
20.7
4.4
2.4
.716
156.9
14.8
7.9
22.6
4.2
2.2
31
Li
Listening
.880
150.4
11.2
3.9
16.3
5.3
1.8
.907
157.0
12.9
3.9
18.7
5.0
1.5
23
L1
Spelling
.869
151.5
13.4
4.8
23.2
6.7
2.4
.874
158.1
15.1
5.4
26.0
6.1
2.2
34
L
Language
.776
150.0
13.5
6.4
20.4
4.5
2.1
.806
156.5
14.5
6.4
22.4
4.3
1.9
29
M1
Concepts
.807
150.7
15.9
7.0
17.0
5.1
2.2
.845
157.5
17.3
6.8
19.0
5.2
2.0
28
M2
Problems
& Data
Interpretation
.878
150.2
13.6
4.7
.900
157.1
14.8
4.7
M3
MT -
.865
150.1
9.3
3.4
18.6
5.6
2.0
.872
154.2
9.9
3.5
20.6
5.2
1.9
27
Computation
Math
Total
Mathematics
.910
150.2
11.2
3.4
.932
156.4
12.8
3.3
MT +
Math
Total
.959
151.4
12.7
2.6
.962
157.8
13.8
2.7
CT -
Core
Total
.964
151.7
12.2
2.3
.966
157.4
13.4
2.5
CT +
Core
Total
.750
151.1
15.2
7.6
21.8
4.7
2.3
.755
157.8
16.3
8.1
23.6
4.4
2.2
31
SS
Social
Studies
.726
149.8
16.8
8.8
23.8
4.0
2.1
.702
157.4
18.3
10.0
25.3
3.5
1.9
31
SC
Science
.843
150.5
13.1
5.2
14.3
4.9
1.9
.862
157.6
14.8
5.5
16.6
4.6
1.7
22
SI
.966
151.2
12.3
2.3
.966
157.5
13.2
2.4
CC -
Sources
Compoof
site
Information
.965
151.4
11.8
2.2
.966
157.8
13.0
2.4
CC +
Composite
.956
151.3
12.6
2.6
.961
158.5
14.0
2.8
RPT
Reading
Profile
Total
3:15 PM
Fall — Grade 2
66
10/29/10
Number of items
Level 7
Reading
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 66
.900
.875
.897
.873
Note: -Does not include Computation
+Includes Computation
K-R20
170.7
19.6
6.3
168.6
19.8
7.1
Mean
SD
SEM
SSs
Reliability
27.9
7.2
2.3
20.0
6.8
2.4
Mean
SD
SEM
RSs
Spring — Grade 2
K-R20
177.5
21.4
6.8
175.4
20.6
7.3
Mean
SD
SEM
SSs
Reliability
29.9
6.8
2.1
22.3
6.5
2.3
Mean
SD
SEM
RSs
Fall — Grade 3
38
RC
RV
.939
170.0
19.1
4.7
.939
176.3
20.1
5.0
RT
Reading
Total
.847
171.0
23.7
9.3
24.3
6.8
2.7
.862
177.6
25.4
9.4
26.1
6.8
2.5
38
WA
.723
168.2
16.3
8.6
21.5
4.5
2.4
.740
174.6
17.3
8.8
23.1
4.4
2.2
31
Li
Listening
.821
168.5
15.8
6.7
16.9
4.4
1.9
.853
175.4
17.9
6.8
18.5
4.3
1.6
23
L1
Spelling
.875
169.8
17.2
6.1
30.1
7.4
2.6
.891
177.0
19.5
6.4
32.7
7.1
2.4
42
L
Language
.787
168.1
16.5
7.6
21.4
4.8
2.2
.815
174.5
17.7
7.6
23.2
4.7
2.0
31
M1
Concepts
.834
169.1
19.8
8.1
19.5
5.5
2.2
.859
176.1
21.3
8.0
21.2
5.6
2.1
30
M2
Problems
& Data
Interpretation
.892
168.6
16.9
5.6
.910
175.6
18.4
5.5
M3
MT -
.839
168.3
13.1
5.3
20.0
5.2
2.1
.854
172.3
13.9
5.3
21.4
5.2
2.0
30
Computation
Math
Total
Mathematics
.922
168.9
14.7
4.1
.935
174.5
16.0
4.1
MT +
Math
Total
.960
169.6
15.9
3.2
.965
175.9
17.4
3.3
CT -
Core
Total
.964
169.9
15.3
2.9
.967
175.7
16.7
3.0
CT +
Core
Total
.655
169.5
17.8
10.5
20.8
4.1
2.4
.714
176.6
19.4
10.4
22.3
4.2
2.2
31
SS
Social
Studies
.726
169.7
21.2
11.1
21.1
4.5
2.4
.736
176.9
22.5
11.6
22.6
4.3
2.2
31
SC
Science
.851
169.1
17.1
6.6
18.5
5.7
2.2
.869
176.5
18.8
6.8
20.8
5.5
2.0
28
SI
.964
169.6
15.2
2.9
.971
176.4
17.1
2.9
CC -
Sources
Compoof
site
Information
.964
169.5
15.0
2.8
.971
175.8
17.1
2.9
CC +
Composite
.955
169.7
16.1
3.4
.957
176.3
17.1
3.5
RPT
Reading
Profile
Total
3:15 PM
32
Comprehension
Vocabulary
Word
Analysis
10/29/10
Number of items
Level 8
Reading
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 67
67
15.4
6.6
2.3
12.5
5.2
2.1
11.8
5.1
2.1
24
L3
15.7
6.9
2.4
30
L4
LT
Usage & Language
Expression
Total
16.4
5.7
2.5
31
M1
11.7
4.9
2.0
22
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
M3
MT -
11.5
5.1
2.1
25
Computation
Math
Total
MT +
Math
Total
CT -
CT +
Core
Total
16.1
6.0
2.4
30
SS
Social
Studies
13.9
6.1
2.4
30
SC
Science
13.1
4.4
2.1
24
S1
13.0
6.0
2.3
28
S2
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
CC -
CC +
Compo- Composite
site
21.8
6.2
2.6
35
WA
Word
Analysis
19.3
4.4
2.5
31
Li
Listening
RPT
Reading
Profile
Total
18.8
6.5
2.1
.882
14.7
5.5
2.0
.838
14.0
5.3
2.1
.833
18.6
7.0
2.3
.878
.953
19.7
5.7
2.4
.813
13.8
5.1
1.9
.833
.900
16.0
5.6
2.0
.823
.927
.972
.976
18.9
6.0
2.3
.840
16.5
6.3
2.4
.845
15.0
4.6
2.0
.763
16.0
6.3
2.3
.844
.892
.976
.977
23.9
6.3
2.5
.822
21.6
4.5
2.3
.679
.953
.896
.912
.946
Note: -Does not include Computation
+Includes Computation
Reliability K-R20
.897
.863
.850
.892
.957
.827
.855
.912
.865
.934
.976
.979
.848
.855
.805
.868
.901
.980
.981
.849
.736
.960
Mean 185.0 187.8 186.2 185.8 187.2 188.3 188.7 188.1 185.2 186.6 186.2 185.4 185.8 186.8 186.9 186.8 187.4 187.2 186.3 187.0 187.4 187.3 187.2 184.2 186.2
SD 21.6 24.5 21.7 20.4 29.2 27.4 28.2 22.7 19.2 24.0 20.5 16.7 17.7 19.9 19.1 21.7 25.2 25.0 19.8 21.0 20.0 19.9 28.6 19.2 19.0
7.0
7.3
5.0
6.5 10.8 10.6
9.3
4.7
8.0
9.1
6.1
6.1
4.5
3.1
2.8
8.4
9.6 11.0
7.2
6.6
2.8
2.8 11.1
9.8
3.8
SEM
.940
SSs
21.5
8.7
2.6
.896
Mean
SD
SEM
18.1
6.8
2.2
.885
RSs
Spring
Reliability K-R20
Mean 175.4 177.5 176.3 175.4 175.1 177.5 176.8 177.0 174.5 176.1 175.6 172.3 174.5 175.9 175.7 176.6 176.9 177.6 176.5 176.5 176.4 175.8 177.6 174.6 176.3
SD 20.6 21.4 20.1 17.9 23.2 23.6 23.9 19.5 17.7 21.3 18.4 13.9 16.0 17.4 16.7 19.4 22.5 21.5 16.8 18.8 17.1 17.1 25.4 17.3 17.1
7.0
6.9
4.9
6.1
9.3
9.6
8.4
4.2
7.6
8.7
5.8
5.8
4.3
2.9
2.6
7.8
8.9 10.4
6.6
6.2
2.6
2.6 10.7
9.8
3.7
SEM
17.9
8.2
2.7
24
L2
Punctuation
Core
Total
SSs
15.0
6.8
2.3
28
L1
Spelling
Capitalization
Mathematics
Mean
SD
SEM
RT
Reading
Total
Language
RSs
Fall
37
RC
RV
29
Comprehension
Vocabulary
Reading
3:15 PM
Number of items
68
10/29/10
Grade 3
Level 9
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 68
Mean
SD
SEM
SSs
Mean
SD
SEM
SSs
.944
.906
.892
202.5
25.3
8.3
201.2
24.4
5.8
199.9 202.6
23.4 28.7
7.2
9.0
.901
20.3
7.0
2.3
24.4
8.8
2.8
20.3
8.0
2.4
.882
.938
.895
.887
192.2
22.2
7.6
191.8
22.8
5.7
191.1 193.8
22.5 25.9
7.3
8.7
32
L1
17.4
7.1
2.4
RT
Spelling
21.8
8.4
2.8
17.3
7.8
2.5
41
Note: -Does not include Computation
+Includes Computation
Reliability K-R20
Mean
SD
SEM
RSs
Spring
Reliability K-R20
Mean
SD
SEM
RSs
Fall
34
RC
RV
Reading
Total
.841
204.0
36.2
14.4
15.8
5.3
2.1
.820
194.0
31.4
13.3
14.4
5.2
2.2
26
L2
Capitalization
.853
204.9
34.4
13.2
14.3
5.8
2.2
.823
195.4
30.0
12.6
12.7
5.4
2.3
26
L3
.902
204.9
34.6
10.8
20.4
7.7
2.4
.893
195.1
30.5
10.0
18.4
7.6
2.5
33
L4
19.2
6.9
2.7
36
M1
22.4
7.2
2.6
.848
.956
.872
203.9 200.4
28.4
22.5
6.0
8.0
.950
.845
202.9
28.3
11.1
14.4
5.1
2.0
.817
193.2
25.4
10.9
12.8
4.9
2.1
24
M2
.918
201.6
24.0
6.9
.905
191.7
21.8
6.7
M3
MT -
.878
200.7
20.5
7.2
17.0
6.1
2.1
.839
188.8
17.4
7.0
13.5
5.6
2.2
27
Computation
Math
Total
Mathematics
Concepts Problems
& Data
&
InterpreEstimation
tation
194.5 190.4
24.9
20.2
5.6
7.9
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.940
201.5
21.1
5.2
.929
190.5
18.9
5.0
MT +
Math
Total
CT +
Core
Total
.976
.977
.980
202.5 202.5
23.7
22.9
3.6
3.3
.973
192.6 192.5
21.2
20.4
3.5
3.1
CT -
Core
Total
16.7
7.3
2.6
34
SC
Science
19.0
7.5
2.6
.872
.856
.881
202.6 203.5
26.4
29.2
10.0
10.1
20.8
6.7
2.5
.842
192.6 193.8
23.3
26.6
9.2
9.5
18.2
6.5
2.6
34
SS
Social
Studies
.842
204.1
31.0
12.3
15.1
5.4
2.1
.818
193.2
27.3
11.6
13.2
5.2
2.2
25
S1
.869
203.2
25.2
9.1
18.2
6.4
2.3
.866
192.7
21.7
7.9
15.6
6.5
2.4
30
S2
.913
203.3
26.0
7.7
.905
192.7
22.8
7.0
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.982
203.1
23.9
3.2
.980
192.9
21.3
3.0
CC -
.983
203.2
23.9
3.1
.981
192.7
21.4
3.0
CC +
Compo- Composite
site
3:15 PM
Number of items
Comprehension
Vocabulary
Reading
10/29/10
Grade 4
Level 10
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 69
69
Mean
SD
SEM
SSs
Mean
SD
SEM
SSs
.943
.893
.903
216.9
29.3
9.1
214.6
27.3
6.5
214.0 215.5
25.5 32.2
8.3 10.0
.904
21.9
7.9
2.5
26.2
9.1
2.8
21.7
8.0
2.6
.934
.873
.895
207.7
26.8
8.7
205.8
25.4
6.5
205.1 207.0
24.0 29.9
8.5
9.8
.892
19.4
7.9
2.5
36
L1
Spelling
23.8
8.9
2.9
19.0
7.6
2.7
RT
Reading
Total
Note: -Does not include Computation
+Includes Computation
Reliability K-R20
Mean
SD
SEM
RSs
Spring
Reliability K-R20
Mean
SD
SEM
RSs
Fall
43
RC
RV
37
Comprehension
Vocabulary
Reading
.851
218.5
41.0
15.8
16.8
5.9
2.3
.840
209.0
37.9
15.2
15.5
5.7
2.3
28
L2
Capitalization
.870
219.3
40.0
14.4
16.0
6.4
2.3
.852
210.3
36.6
14.1
14.7
6.1
2.3
28
L3
.892
218.9
40.1
13.2
21.5
7.7
2.5
.881
209.8
36.4
12.6
19.8
7.5
2.6
35
L4
21.0
7.4
2.8
40
M1
23.9
7.7
2.7
.858
.960
.874
218.3 214.6
33.3
25.4
6.7
9.0
.955
.861
216.9
32.2
12.0
15.1
5.8
2.2
.841
208.1
29.7
11.9
13.5
5.6
2.2
26
M2
.927
215.7
27.9
7.5
.915
206.9
25.6
7.4
M3
MT -
.890
215.3
24.7
8.2
18.0
6.6
2.2
.858
204.2
21.4
8.0
15.2
6.2
2.3
29
Computation
Math
Total
Mathematics
Concepts Problems
& Data
&
InterpreEstimation
tation
209.2 205.3
30.5
23.7
6.4
8.9
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.947
215.7
24.8
5.7
.936
205.7
22.3
5.6
MT +
Math
Total
CT +
Core
Total
.978
.979
.981
216.5 216.3
27.2
26.3
4.0
3.6
.976
207.6 207.3
25.4
24.2
3.9
3.6
CT -
Core
Total
18.7
7.9
2.7
37
SC
Science
21.0
8.1
2.7
.881
.865
.891
217.0 217.9
31.2
33.3
11.5
11.0
21.2
7.3
2.7
.847
207.2 208.5
28.2
30.6
11.0
10.6
18.8
7.0
2.7
37
SS
Social
Studies
.835
218.1
35.4
14.4
14.6
5.5
2.2
.808
209.1
32.4
14.2
13.4
5.2
2.3
26
S1
.884
217.9
30.0
10.2
19.8
7.1
2.4
.876
208.1
26.8
9.4
17.6
7.1
2.5
32
S2
.917
217.8
30.6
8.8
.906
208.7
27.8
8.5
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.983
217.2
27.7
3.6
.980
208.1
25.1
3.5
CC -
.983
217.1
27.5
3.5
.982
207.9
25.2
3.4
CC +
Compo- Composite
site
3:15 PM
Number of items
70
10/29/10
Grade 5
Level 11
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 70
Mean
SD
SEM
SSs
Mean
SD
SEM
SSs
.944
.892
.901
229.5
32.2
10.1
227.0
29.6
7.0
226.7 227.3
27.5 35.3
9.0 10.7
.908
23.5
8.2
2.6
27.7
9.5
2.9
24.0
8.1
2.7
.896
.938
.878
.899
221.5
30.3
9.8
219.5
28.2
7.0
219.2 220.0
26.3 33.4
9.2 10.6
38
L1
21.4
8.2
2.6
RT
Spelling
25.7
9.3
2.9
21.7
7.9
2.8
45
Note: -Does not include Computation
+Includes Computation
Reliability K-R20
Mean
SD
SEM
RSs
Spring
Reliability K-R20
Mean
SD
SEM
RSs
Fall
39
RC
RV
Reading
Total
.825
231.1
44.6
18.6
17.8
5.7
2.4
.813
223.1
42.1
18.2
16.8
5.6
2.4
30
L2
Capitalization
.882
232.4
45.4
15.6
18.4
6.7
2.3
.862
224.0
41.5
15.4
17.3
6.5
2.4
30
L3
.904
230.8
45.0
13.9
23.8
8.4
2.6
.895
223.3
41.7
13.5
22.6
8.2
2.7
38
L4
23.6
8.6
2.9
43
M1
26.1
8.8
2.8
.886
.959
.899
230.8 227.6
36.8
28.5
7.4
9.0
.956
.860
229.9
36.3
13.6
17.4
5.9
2.2
.841
221.8
33.7
13.4
16.1
5.8
2.3
28
M2
.929
228.9
30.6
8.2
.921
220.6
28.9
8.1
M3
MT -
.858
228.4
29.3
11.1
18.3
6.1
2.3
.814
219.3
25.7
11.1
16.5
5.6
2.4
30
Computation
Math
Total
Mathematics
Concepts Problems
& Data
&
InterpreEstimation
tation
222.5 219.3
34.8
26.7
7.3
9.0
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.944
228.8
27.9
6.6
.936
219.8
25.8
6.5
MT +
Math
Total
CT +
Core
Total
.979
.979
.980
229.3 229.3
29.8
28.9
4.4
4.1
.976
220.6 220.9
28.2
27.5
4.3
4.0
CT -
Core
Total
20.5
8.3
2.8
39
SC
Science
22.7
8.5
2.7
.888
.863
.897
229.6 230.7
35.5
36.8
13.1
11.8
21.5
7.6
2.8
.842
221.5 221.8
32.5
34.6
12.9
11.6
20.0
7.2
2.9
39
SS
Social
Studies
.830
230.3
39.2
16.2
16.2
5.6
2.3
.815
223.0
37.0
15.9
15.1
5.4
2.3
28
S1
.873
230.8
34.2
12.2
19.9
7.0
2.5
.861
222.6
31.6
11.8
18.2
6.8
2.5
34
S2
.912
230.5
34.2
10.1
.904
222.4
32.0
9.9
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.983
230.1
30.5
4.0
.981
221.7
28.5
4.0
CC -
.983
229.8
30.4
4.0
.981
221.4
28.5
3.9
CC +
Compo- Composite
site
3:15 PM
Number of items
Comprehension
Vocabulary
Reading
10/29/10
Grade 6
Level 12
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 71
71
Number of items
Mean
SD
SEM
SSs
Mean
SD
SEM
SSs
.948
.886
.910
240.9
34.0
10.2
238.3
32.1
7.3
238.1 238.4
29.0 38.6
9.8 10.8
.922
24.0
8.9
2.7
28.4
10.7
3.0
24.0
8.3
2.8
.941
.872
.906
233.5
32.8
10.1
231.1
30.4
7.4
231.2 231.4
28.2 36.3
10.1 10.8
.912
22.3
8.9
2.7
40
L1
Spelling
26.5
10.3
3.1
22.0
8.0
2.9
RT
Reading
Total
Note: -Does not include Computation
+Includes Computation
Reliability K-R20
Mean
SD
SEM
RSs
Spring
Reliability K-R20
Mean
SD
SEM
RSs
48
RC
RV
41
Comprehension
Vocabulary
Reading
.837
242.5
48.1
19.4
19.2
6.1
2.4
.821
235.4
45.9
19.4
18.2
5.9
2.5
32
L2
Capitalization
.879
243.6
49.2
17.1
18.6
7.0
2.4
.867
236.8
47.0
17.1
17.6
6.8
2.5
32
L3
.905
241.6
48.8
15.1
24.2
8.7
2.7
.898
234.7
46.3
14.8
22.9
8.5
2.7
40
L4
25.6
9.0
3.0
46
M1
28.0
9.3
2.9
.890
.961
.902
241.6 239.5
40.0
31.1
7.9
9.7
.957
.870
241.3
39.5
14.2
18.0
6.5
2.3
.853
234.0
37.3
14.3
17.0
6.2
2.4
30
M2
.935
240.5
33.9
8.6
.925
233.1
31.7
8.7
M3
MT -
.848
240.6
33.5
13.1
16.7
6.2
2.4
.796
231.8
30.1
13.6
15.1
5.5
2.5
31
Computation
Math
Total
Mathematics
Concepts Problems
& Data
&
InterpreEstimation
tation
234.7 231.7
38.2
29.3
7.9
9.7
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.945
240.4
30.9
7.2
.934
232.4
28.5
7.3
MT +
Math
Total
CT +
Core
Total
.979
.980
.982
240.8 240.9
32.9
31.8
4.6
4.3
.978
233.2 233.0
30.8
29.9
4.6
4.3
CT -
Core
Total
21.4
8.6
2.8
41
SC
Science
23.2
8.8
2.8
.890
.877
.900
240.7 241.7
39.0
39.9
13.7
12.6
21.4
8.3
2.9
.857
233.2 233.9
36.4
37.8
13.8
12.5
19.8
7.8
2.9
41
SS
Social
Studies
.817
242.1
42.9
18.3
15.8
5.6
2.4
.791
234.5
40.4
18.5
15.0
5.3
2.4
30
S1
.884
242.2
37.8
12.9
20.5
7.7
2.6
.876
235.0
35.6
12.5
19.2
7.6
2.7
36
S2
.911
242.3
37.5
11.2
.901
234.9
35.4
11.2
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.984
241.1
33.6
4.3
.982
233.7
31.8
4.3
CC -
.984
241.1
33.2
4.2
.982
233.8
31.4
4.2
CC +
Compo- Composite
site
3:15 PM
Fall
72
10/29/10
Grade 7
Level 13
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 72
Mean
SD
SEM
SSs
Mean
SD
SEM
SSs
.950
.890
.903
251.2
35.6
11.1
248.8
34.1
7.6
248.7 248.9
30.9 41.4
10.2
11.3
.925
24.5
8.8
2.7
30.9
11.2
3.1
24.8
8.5
2.8
.898
.944
.874
.917
244.4
34.5
11.0
242.3
32.8
7.8
241.9 242.4
29.7 39.5
10.5
11.4
42
L1
22.8
8.8
2.8
RT
Spelling
29.1
10.9
3.1
22.9
8.2
2.9
52
Note: -Does not include Computation
+Includes Computation
Reliability K-R20
Mean
SD
SEM
RSs
Spring
Reliability K-R20
Mean
SD
SEM
RSs
Fall
42
RC
RV
Reading
Total
.843
251.7
50.5
20.0
20.3
6.3
2.5
.836
246.0
49.0
19.9
19.6
6.2
2.5
34
L2
Capitalization
.872
252.4
51.6
18.4
19.0
7.2
2.6
.862
247.2
50.0
18.6
18.3
7.0
2.6
34
L3
.896
251.5
52.6
17.0
24.6
8.8
2.8
.885
245.2
50.2
17.0
23.7
8.5
2.9
43
L4
24.6
9.4
3.1
49
M1
26.7
9.9
3.1
.890
.960
.904
251.6 250.4
42.6
33.5
8.5
10.4
.957
.879
250.9
42.3
14.7
18.7
6.9
2.4
.862
244.9
40.4
15.0
17.8
6.7
2.5
32
M2
.938
250.8
36.1
9.0
.929
244.4
34.5
9.2
M3
MT -
.864
251.3
36.8
13.6
15.9
6.7
2.5
.819
243.9
34.1
14.5
14.6
6.0
2.5
32
Computation
Math
Total
Mathematics
Concepts Problems
& Data
&
InterpreEstimation
tation
245.4 243.5
41.0
32.0
8.5
10.6
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.949
251.0
33.2
7.5
.939
244.1
31.6
7.8
MT +
Math
Total
CT +
Core
Total
.980
.980
.982
250.9 251.0
34.5
33.6
4.8
4.6
.979
244.3 244.1
33.5
32.6
4.9
4.6
CT -
Core
Total
22.3
8.8
2.9
43
SC
Science
23.6
9.0
2.9
.889
.884
.898
250.6 251.5
42.1
42.4
14.3
13.5
23.2
8.7
2.9
.869
244.2 245.0
39.8
40.6
14.4
13.5
22.1
8.3
3.0
43
SS
Social
Studies
.837
251.7
45.6
18.4
16.6
6.0
2.4
.823
245.8
44.1
18.5
15.9
5.9
2.5
31
S1
.872
251.9
40.2
14.4
21.8
7.5
2.7
.863
246.0
38.6
14.3
20.8
7.4
2.8
38
S2
.914
251.7
39.8
11.7
.908
245.8
38.5
11.7
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.984
251.9
35.6
4.5
.983
244.8
34.4
4.5
CC -
.984
251.7
35.2
4.4
.983
244.9
33.9
4.5
CC +
Compo- Composite
site
3:15 PM
Number of items
Comprehension
Vocabulary
Reading
10/29/10
Grade 8
Level 14
Table 5.1 (continued)
Test Summary Statistics
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 73
73
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 74
ITBS. In general, alternate-forms coefficients tend
to be smaller than their internal-consistency
counterparts because they are sensitive to more
sources of measurement error. The coefficients in
Table 5.2 also reflect changes across editions of The
Iowa Tests.
Equivalent-Forms Reliability Analysis
Reliability coefficients obtained by correlating the
scores from equivalent forms are considered
superior to those derived through internalconsistency procedures because all four major
sources of error are taken into account: variations
arising within the measurement procedure, changes
in the specific sample of tasks, changes in the
individual from day to day, and changes in the
individual’s speed of work. Internal-consistency
procedures take into account only the first two
sources of error. For this reason, K-R20 reliability
estimates tend to be higher than those obtained
through the administration of equivalent forms.
Another source of alternate-forms reliability came
from the 2000 fall standardization sample. During
the fall administration, students took one form of
the Complete Battery and a different form of the
Survey Battery. The correlations between standard
scores on subtests in both Complete and Survey
batteries represent indirect estimates of equivalentforms reliability. To render these correlations
consistent with the length and variability of
Complete and Survey subtests, the estimates
reported in Table 5.3 were adjusted for differences in
length of the two batteries as well as for differences
in variability typically observed between fall and
spring test administrations. These reliability
coefficients isolate the presence of form-to-form
differences in the sample of tasks included on the
tests at each level. Equivalent-forms reliability
estimates for Total scores and Composites show a
tendency to be lower than the corresponding K-R20
coefficients in Table 5.1; however, their magnitudes
are comparable to those of internal-consistency
reliabilities reported for the subtests of major
achievement batteries.
The principal reason that equivalent-forms
reliability data are not usually provided with all
editions, forms, and levels of achievement batteries
is that it is extremely difficult to obtain the
cooperation of a truly representative sample of
schools for such a demanding project.
The reliability coefficients in Table 5.2 are based on
data from the equating of Form A to Form K. Prior
to the 2000 spring standardization, a national
sample of students in kindergarten through grade 8
took both Form K and Form A. Between-test
correlations from this administration are direct
estimates of the alternate-forms reliability of the
Table 5.2
Equivalent-Forms Reliabilities, Levels 5–14
Iowa Tests of Basic Skills — Complete Battery, Forms A and K
Spring 2000 Equating Sample
Reading
Level (N)
Language
Vocabulary
Comprehension
Reading
Total
Spelling
Capitalization
RV
RC
RT
L1
L2
L3
Sources of Information
Mathematics
Concepts Problems
Punctu- Usage & Language
Computa& Data
&
ation
Expression
Total
tion
InterpreEstimation
tation
L4
LT
M1
M2
M3
Math
Total
MT
Social
Studies
Science
SS
SC
Maps
Reference Sources
and
Materials
Total
Diagrams
S1
S2
5 (418)
.63
6 (1121)
.72
7 (879)
.81
.86
.81
.77
.77
.78
.69
.64
8 (1111)
.81
.83
.83
.79
.78
.71
.64
.64
9 (684)
.78
.82
.80
.76
.73
.79
.76
.76
.73
.73
.72
.73
.77
10 (596)
.80
.79
.84
.74
.77
.79
.81
.77
.78
.74
.77
.75
.81
11 (919)
.82
.83
.85
.78
.81
.80
.82
.80
.79
.79
.83
.76
.80
12 (824)
.86
.78
.88
.78
.81
.81
.80
.77
.78
.76
.78
.74
.78
13 (939)
.84
.84
.86
.79
.80
.82
.83
.76
.76
.77
.76
.74
.74
14 (857)
.80
.79
.85
.77
.77
.80
.86
.79
.78
.80
.77
.68
.77
74
.93
ST
Word
Analysis
Listening
WA
Li
.74
.74
.78
.75
.76
.82
.78
.70
.74
.80
.67
.76
.82
.67
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 75
Table 5.3
Estimates of Equivalent-Forms Reliability
Iowa Tests of Basic Skills — Complete Battery, Forms A and B
2000 National Standardization
Time
of
Year
Reading
Total
Language
Total
Math
Total
Math
Total
Core
Total
Core
Total
Level
RT
LT
MT -
MT +
CT -
CT +
Fall
9
10
11
12
13
14
.854
.852
.855
.870
.866
.858
.863
.882
.888
.890
.911
.893
.817
.811
.866
.842
.849
.859
.839
.836
.879
.865
.854
.874
.920
.915
.925
.926
.927
.927
.923
.919
.927
.929
.928
.929
Spring
9
10
11
12
13
14
.877
.872
.876
.883
.881
.869
.902
.911
.907
.902
.920
.901
.856
.848
.889
.861
.869
.872
.870
.870
.903
.885
.877
.886
.939
.933
.935
.934
.936
.931
.942
.936
.939
.936
.937
.933
Note:
-Does not include Computation
+Includes Computation
Sources of Error in Measurement
Further investigation of sources of error in
measurement for the ITBS was provided in two
studies of equivalent-forms reliability. The first
(Table 5.4) used data from the spring 2000 equating
of Forms K and A. The second (Table 5.5) used data
from the fall 1995 equating of Forms K and M of the
Primary Battery.
As previously described, Forms K and A of the ITBS
were given to a large national sample of schools
selected to be representative with respect to
variability in achievement. Order of administration
of the two forms was counterbalanced by school, and
there was a seven- to ten-day lag between test
administrations. The design of this study made
possible an analysis of relative contributions of
various sources of measurement error across tests,
grades, and schools.
In addition to equivalent-forms reliability coefficients,
three other “within-forms” reliability coefficients
were computed for each school, for each test, for each
form, and in each sequence:
• K-R20 reliability coefficients were calculated from
the item-response records.
• Split-halves, odds-evens (SHOE) reliability
coefficients were computed by correlating the raw
scores from odd-numbered versus even-numbered
items. Full-test reliabilities were estimated using
the Spearman-Brown formula.
• Split-half, equivalent-halves (SHEH) reliability
coefficients were obtained by dividing items
within each form into equivalent half-tests in
terms of content, difficulty, and test length. For
tests composed of discrete items such as Spelling,
equivalent halves were assembled by matching
pairs of items testing the same skill and having
approximately the same difficulty index. One
item of the pair then was randomly assigned to
the X half of the test and the other was assigned
to the Y half. For tests composed of testlets (sets
of items associated with a common stimulus) such
as Reading, the stimulus and the items
dependent on it were treated as a testing unit and
assigned intact to the X half or Y half. Small
adjustments in the composition of equivalent
halves were made for testlet-based tests to
balance content, difficulty, and number of items.
After equivalent halves were assembled, a
correlation coefficient between the X half and the
Y half was computed. The full-test equivalent75
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 76
as its predecessor, Form K, but was designed to
allow more space on each page for item locator art
and other decorative features. To ensure that the
formatting changes had no effect on student
performance, a subsample of the 1995 Form M
equating sample was administered with both
Forms K and K/M of the Primary Battery in
counterbalanced order.
halves reliabilities were estimated using the
Spearman-Brown formula.
Differences between equivalent-halves estimates
obtained in the same testing session and equivalentforms estimates obtained a week or two apart
constitute the best evidence on the effects of changes
in pupil motivation and behavior across several days.
The means of the grade reliabilities by test are
reported in Table 5.4. Overall, the estimates of the
three same-day reliability coefficients are similar.
The same-day reliabilities varied considerably
among the individual tests, however. For the
Reading and the Maps and Diagrams tests, the
equivalent-halves reliabilities are nearer to the
equivalent-forms reliabilities. These lower reliability
estimates are due to the manner in which the
equivalent halves were established for these two tests.
Table 5.5 contains correlations between scores from
the two administrations of Levels 5 through 8
during the 1995 equating study. These values
represent direct evidence of the contribution of
between-days sources of error to unreliability. These
sources of error are thought to be especially
important to the interpretation of scores on
achievement tests for students in the primary
grades. Although some variation exists, the
estimates of test-retest reliability are generally
consistent with the internal-consistency estimates
for these levels reported in Table 5.1. The
correlations in Table 5.5 suggest a substantial
degree of stability in the performance of students in
the early elementary grades over a short time
interval (Mengeling & Dunbar, 1999).
Another study of sources of error in measurement
was completed during the 1995 equating of Form M
to Form K. With the introduction of Form M of the
ITBS, Levels 9 through 14, a newly formatted
edition of Form K, Levels 5 through 8, was developed
and designated Form K/M. This edition of the
Primary Battery contained exactly the same items
Table 5.4
Mean (Grades 3–8) Reliability Coefficients: Reliability Types Analysis by Tests
Iowa Tests of Basic Skills — Complete Battery, Forms K and A
76
Form K
Form A
Test
K-R20 SHOE SHEH
K-R20 SHOE SHEH
EFKA
Vocabulary
Reading Comprehension
Spelling
Capitalization
Punctuation
Usage and Expression
Math Concepts and Estimation
Math Problem Solving and Data Interpretation
Mathematics Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
.871
.890
.889
.839
.837
.869
.869
.840
.896
.836
.851
.832
.879
.876
.894
.888
.842
.838
.876
.877
.849
.903
.838
.859
.837
.893
.877
.865
.893
.842
.844
.867
.871
.840
.910
.836
.848
.809
.866
.873
.904
.894
.828
.850
.887
.866
.845
.854
.851
.873
.816
.856
.880
.907
.894
.823
.840
.889
.875
.858
.872
.854
.881
.828
.865
.875
.886
.895
.831
.859
.886
.866
.837
.873
.846
.869
.781
.844
.816
.809
.846
.773
.780
.804
.814
.774
.771
.764
.771
.733
.777
Mean
.861
.867
.859
.861
.867
.858
.787
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 77
Table 5.5
Test-Retest Reliabilities, Levels 5–8
Iowa Tests of Basic Skills — Complete Battery, Form K
Fall 1995 Equating Sample
Reading
Language
Mathematics
Problems
Listening Language Concepts
Computa& Data
tion
Interpretation
Science
Sources
Word
of
Analysis
Information
Vocabulary
Comprehension
Reading
Total
RV
RC
RT
Li
L
M1
M2
M3
MT -
SS
SC
SI
WA
5 (N > 826)
.74
—
—
.71
.80
—
—
—
.81
—
—
—
.80
6 (N > 767)
.88
.83
.90
.78
.82
—
.81
—
.85
—
—
—
.86
7 (N > 445)
.90
.93
—
.76
.90
.83
.83
.77
—
.82
.80
.84
.87
8 (N > 207)
.91
.93
—
.83
.88
.84
.85
.72
—
.82
.83
.85
.91
Level
Math
Total
Social
Studies
Note: ⫺Does not include Computation
The most important result of these analyses is
the quantification of between-days sources of
measurement error and their contribution to
unreliability. Reliability coefficients based on
internal-consistency analyses are not sensitive to
this source of error.
Standard Errors of Measurement for
Selected Score Levels
A study of examinee-level standard errors of
measurement based on a single test administration
was conducted by Qualls-Payne (1992). The single
administration procedures investigated were those
originated by Mollenkopf (1949), Thorndike (1951),
Keats (1957), Feldt (1984), and Jarjoura (1986), and
a modified three-parameter latent trait model. The
accuracy and reliability of estimates varied across
tests, grades, and criteria.
The procedure recommended for its agreement with
equivalent-forms estimates was Feldt’s modification
of Lord’s binomial error model, with partitioning
based on a content classification system. Application
of this procedure provides more accurate estimates
of individual standard errors of measurement than
have previously been available from a single test
administration. For early editions of the ITBS,
score-level standard errors of measurement were
estimated using data from special studies in which
students were administered two parallel forms of
the tests. Since that time, additional research has
produced methods for estimating the standard error
of measurement at specific score levels that do not
require multiple test administrations. These
conditional SEMs were estimated from the 2000
spring national standardization of Form A of the
ITBS. Additional tables with conditional SEMs for
Form B are available from the Iowa Testing
Programs. The form-to-form differences in these
values are minor.
The results in Table 5.6 were obtained using a
method developed by Brennan and Lee (1997) for
smoothing a plot of conditional standard errors for
scaled scores based on the binomial error model. In
addition to this method, an approach developed by
Feldt and Qualls (1998) and another based on
bootstrap techniques were used at selected test
levels. Because the results of all three methods
agreed closely and generally matched the patterns
of varying SEMs by score level found with previous
editions of the tests, only the results of the Brennan
and Lee method are provided.
77
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 78
Table 5.6
Standard Errors of Measurement for Selected Standard Score Levels
Iowa Tests of Basic Skills—Complete Battery, Form A
2000 National Standardization
Test
Level
5
6
Score
Level
Word Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
3.56
5.74
4.45
90–99
2.32
2.91
3.42
3.85
2.94
100–109
4.05
4.66
4.86
4.99
4.89
110–119
7.45
5.76
4.84
4.29
5.59
120–129
8.64
6.18
4.66
4.17
4.80
130–139
9.92
7.24
5.87
4.97
5.22
140–149
11.33
8.14
6.58
5.74
5.83
150–159
11.90
8.59
5.55
4.73
5.36
160–169
11.08
7.43
170–179
8.67
2.37
3.09
2.39
3.66
5.35
90–99
2.39
100–109
4.10
2.41
110–119
5.37
6.50
3.65
4.79
5.94
5.10
6.47
6.29
120–129
7.41
7.17
3.45
6.22
5.57
4.20
5.28
5.21
130–139
9.50
5.31
2.59
7.60
6.16
2.86
4.90
5.61
140–149
10.32
3.70
2.71
8.51
7.07
4.59
6.28
5.94
150–159
10.85
5.30
4.45
9.41
8.02
7.43
8.24
6.86
160–169
11.08
6.66
4.80
10.34
8.81
9.09
7.96
170–179
10.50
10.49
8.24
8.47
7.81
180–189
9.08
9.57
6.71
6.31
6.13
190–199
7.14
7.69
200–209
78
Vocabulary
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 79
Table 5.6 (continued)
Standard Errors of Measurement for Selected Standard Score Levels
Iowa Tests of Basic Skills—Complete Battery, Form A
2000 National Standardization
Reading
Mathematics
Word
Analysis
Test
Level
Score
Level
100–109
7
Vocabulary
Comprehension
V
RC
4.06
Spelling
Language
Concepts
WA
Li
L1
2.91
3.44
5.73
L
4.69
Problems
& Data Computation
Interpretation
M1
M2
2.99
3.73
M3
Social
Studies
Science
Sources
of
Information
SS
SC
SI
2.98
2.19
4.67
110–119
6.02
3.83
4.46
5.65
7.12
5.82
5.37
6.94
6.62
4.96
4.38
6.61
120–129
7.29
4.88
6.59
5.77
7.51
4.96
6.88
8.63
6.90
6.09
5.95
7.48
130–139
7.28
5.25
7.14
6.97
6.71
4.54
7.05
8.67
4.35
6.58
8.63
6.61
140–149
5.81
3.92
6.45
7.93
4.03
4.38
6.99
7.91
3.24
7.78
10.71
4.75
150–159
5.61
3.76
7.13
8.57
4.01
5.39
7.47
7.48
4.51
9.22
11.92
4.69
160–169
7.15
7.44
9.17
9.70
5.31
5.13
170–179
9.64
10.12
10.08
9.68
180–189
10.65
9.56
8.80
7.89
7.37
190–199
8
Listening
6.65
8.70
7.79
10.11
12.86
8.36
6.68
9.04
8.40
10.30
13.10
8.00
8.09
7.11
9.58
11.98
7.90
10.76
100–109
4.00
2.43
3.00
2.94
2.95
2.94
2.40
5.30
110–119
6.12
4.26
4.70
5.27
5.70
4.66
5.47
5.67
6.08
5.47
3.99
7.38
120–129
8.05
5.52
6.80
6.83
7.24
5.05
7.02
8.26
7.48
6.92
6.38
7.69
130–139
8.89
5.14
8.35
7.10
7.49
5.55
7.72
9.73
7.32
7.33
7.43
7.62
140–149
8.34
4.43
9.08
7.98
7.82
5.35
8.17
10.05
5.63
8.69
8.66
6.82
150–159
6.80
4.39
9.19
8.71
7.11
5.39
8.68
9.49
5.32
10.41
10.44
5.74
160–169
6.09
6.93
9.36
8.89
6.20
5.62
8.76
8.61
6.19
11.61
11.94
6.51
170–179
8.28
9.88
10.05
9.95
6.87
6.94
8.64
8.44
6.94
12.40
13.12
10.41
180–189
11.26
11.80
10.71
10.93
7.35
9.03
8.75
9.15
8.36
12.07
14.31
10.55
190–199
12.19
12.74
10.91
11.12
9.91
8.90
9.41
9.19
11.09
14.74
8.75
200–209
11.61
12.44
10.53
10.49
8.98
7.47
8.90
8.20
9.59
14.42
210–219
8.67
10.37
9.61
8.88
7.28
12.66
220–229
7.38
7.48
9.14
79
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 80
Table 5.6 (continued)
Standard Errors of Measurement for Selected Standard Score Levels
Iowa Tests of Basic Skills—Complete Battery, Form A
2000 National Standardization
Reading
Test
Level
Score
Level
Vocabulary
Comprehension
RV
RC
Language
Sources of
Information
Mathematics
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
L1
L2
L3
L4
M1
Problems
Computa& Data
tion
Interpretation
M2
M3
Social
Studies
Science
SS
SC
110–119
9
Maps
Reference
and
Materials
Diagrams
S1
S2
5.38
120–129
4.63
5.08
5.92
4.24
6.56
5.08
4.13
5.62
130–139
8.08
7.22
7.76
7.04
8.34
7.86
7.26
8.60
140–149
10.24
8.68
8.61
10.06
9.64
9.76
8.45
10.16
7.97
150–159
10.43
8.99
7.91
10.53 10.07
9.92
8.20
10.40
7.31
160–169
8.93
8.11
6.18
9.63
9.73
8.76
8.20
9.97
7.27
170–179
6.85
6.61
5.63
9.28
9.11
7.16
8.42
9.06
6.51
180–189
6.61
6.52
6.50
11.29
9.99
8.49
8.34
9.08
6.02
190–199
7.60
8.21
8.75
14.56 13.62 12.41
8.96
200–209
9.74
9.63 12.04
17.24 16.73 14.96
10.81
210–219
11.55
11.09 13.88
18.58 18.53 16.61
12.46
12.86
8.25
220–229
10.80 12.00
19.00 19.48 16.60
12.27
10.45
230–239
11.98
17.81 18.92 15.81
9.99
240–249
10.64
15.92 16.34 15.02
250–259
14.03 11.47
10
4.82
4.00
130–139
7.35
6.36
7.07
140–149
10.14
7.82
150–159
11.86
160–169
170–179
7.75
7.60
8.03
11.59
9.03
9.98
9.76
7.54
8.68 12.12
8.81
10.45
9.84
7.77
8.98
11.54
7.69
10.62
10.78
8.11
8.98 10.35
6.44
11.20
10.93
9.11
9.31 10.70
7.20
11.43
9.97
10.94
7.52 10.23
11.12 13.76
9.51
11.99
10.72
12.49
9.16
11.05 13.18 16.72
11.70
13.19
12.71
7.49
13.65
13.67
13.86
12.24
8.84 13.94 16.42 12.52
11.98
9.50
11.59 13.87 10.05
9.00
8.15
11.32
8.00
6.64
6.18
9.95
6.91
8.73
10.12 10.53 10.28
8.71
9.99
9.47
7.24
8.32 10.95
9.53
9.16
8.54
11.98 12.17
11.80
9.76
11.41
9.23
7.16
9.95
12.06
9.82
7.94
12.54 12.82
11.76
9.99
12.10
8.74
9.01
11.12 12.62
9.20
10.89
9.44
7.42
12.65 12.48 10.21
9.75
12.24
9.07
9.71 10.99 12.21
7.77
180–189
8.00
8.55
7.07
13.05 11.69
8.12
8.91
11.85
8.16
9.26
9.74
11.23
7.10
190–199
6.31
7.96
7.68
14.69 11.86
8.63
7.37
11.48
6.99
9.37
8.27
11.16
8.30
200–209
6.97
9.15
9.62
18.20 14.16 12.24
7.66
12.46
7.50
11.08
9.75 13.47
11.54
210–219
8.25
11.46 12.27
21.39 16.31 15.38
9.69
14.17
8.62 12.81 12.45 15.68 14.61
8.07
11.95 10.02
220–229
9.71 12.76 14.33
23.02 18.12 17.65
11.59
15.11
9.74 14.14 14.76 17.44 16.59
230–239
11.09 13.00 15.43
23.32 18.99 19.10
12.80
15.04
9.09 14.43 15.79 17.92 17.22
240–249
11.27 12.58 15.67
23.62 18.61 18.99
12.84
14.16
13.62 15.43 17.43 16.94
11.05 14.40
22.75 17.41 17.75
10.04
12.26
11.87 13.82 16.94 15.54
260–269
9.16 10.89
20.30 15.43 16.47
8.99
8.95 12.14 14.79 13.03
270–279
7.14
16.60 12.81 15.18
290–299
11.86
6.73
9.04
11.39 14.73 18.31 12.92
11.68
9.39 10.99
7.06
8.58
10.98 14.77 18.55 13.21
7.36
14.23
4.06
9.97
4.00
280–289
3.87
5.32
6.65
9.71
Li
7.03
5.11
7.62
WA
4.39
7.76
250–259
80
3.95
6.67
Listening
6.56
7.27
110–119
120–129
Word
Analysis
8.98
11.19
9.45
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 81
Table 5.6 (continued)
Standard Errors of Measurement for Selected Standard Score Levels
Iowa Tests of Basic Skills—Complete Battery, Form A
2000 National Standardization
Reading
Test
Level
Score
Level
Vocabulary
Comprehension
RV
RC
Language
Spelling
Capitalization
Punctuation
L1
L2
L3
120–129
130–139
11
Usage &
Expression
Concepts
&
Estimation
L4
M1
Problems
Computa& Data
tion
Interpretation
M2
M3
Social
Studies
Science
SS
SC
4.00
6.07
5.20
8.00
5.46
6.60
6.75
Maps
Reference
and
Materials
Diagrams
S1
S2
7.93
5.74
7.55
6.34
6.14
10.64
140–149
8.56
6.84
9.57
8.16
9.65
9.82
7.82
9.35
8.72
7.91
8.15
12.22
8.69
150–159
11.08
9.00
9.48
11.10
12.69
12.42
9.08
11.20
10.60
8.86
9.34
13.69
10.33
160–169
12.63
10.84
8.49
13.18
14.30
13.79
10.00
13.00
11.06
10.18
10.23
14.80
9.98
170–179
12.39
11.04
8.44
14.17
14.83
13.27
10.52
13.56
11.25
11.61
11.07
15.38
8.94
180–189
10.81
10.42
8.47
14.59
14.60
11.30
10.62
13.36
10.77
12.20
11.10
15.37
8.42
190–199
9.36
9.89
8.16
14.81
13.90
9.96
10.14
12.66
9.10
11.65
10.13
14.80
8.05
200–209
8.50
9.67
8.54
15.53
13.26
11.16
9.43
12.14
7.76
10.79
9.53
14.13
8.56
210–219
8.14
9.97
9.77
17.60
14.57
14.06
9.08
12.55
8.10
11.09
10.37
14.48
11.22
220–229
8.36
10.64
10.99
19.86
17.06
17.20
9.09
14.06
10.23
12.72
12.17
15.85
14.26
230–239
8.70
11.18
12.40
21.38
18.82
19.49
9.93
15.01
11.80
14.00
14.06
17.03
16.07
240–249
9.65
11.96
13.48
22.54
19.43
20.66
11.39
15.29
12.05
14.76
15.50
17.63
16.95
250–259
10.52
13.07
13.81
23.00
19.40
21.32
11.97
14.68
10.78
15.12
15.98
17.78
17.29
260–269
10.25
13.27
13.42
22.39
18.69
21.28
11.28
12.84
8.31
15.04
15.66
16.68
16.25
270–279
8.42
12.65
12.32
20.53
17.46
20.34
9.97
10.88
14.01
13.97
14.86
13.59
280–289
11.12
9.88
17.46
15.91
18.47
7.76
8.21
11.90
11.40
12.18
10.54
290–299
8.36
13.56
14.09
14.24
8.40
8.57
8.64
9.16
11.30
9.23
4.74
5.37
5.83
300–309
120–129
12
Sources of
Information
Mathematics
3.98
130–139
5.32
4.79
5.75
6.72
6.30
6.13
8.30
140–149
7.33
7.28
8.22
6.65
7.62
8.63
8.12
8.37
7.48
8.35
7.97
11.23
7.85
150–159
9.86
8.44
10.31
9.65
10.53
12.96
9.67
10.03
9.29
9.08
9.01
12.26
9.53
160–169
11.90
9.68
9.31
13.13
12.91
14.47
10.22
12.07
10.38
9.91
10.11
12.88
9.81
170–179
12.47
10.83
7.88
14.68
13.99
14.65
10.40
13.32
11.22
11.57
11.85
14.11
9.98
180–189
12.19
11.32
8.94
15.71
14.34
13.54
10.43
13.52
11.79
12.86
12.84
14.94
10.46
190–199
11.37
11.43
9.63
17.00
14.03
11.63
10.06
13.25
12.04
13.49
12.57
15.34
10.71
200–209
10.41
11.39
9.80
18.31
13.96
10.86
9.51
12.78
12.03
13.41
11.44
15.73
10.54
210–219
9.62
11.29
9.97
20.19
14.73
12.01
9.22
13.33
11.95
12.88
10.37
16.71
11.16
220–229
9.24
11.10
10.26
22.41
16.66
14.02
9.06
15.03
12.13
12.97
10.66
17.95
13.64
230–239
9.14
11.04
10.99
23.91
18.66
16.41
9.19
16.67
12.73
14.06
12.35
19.37
15.72
240–249
8.72
11.41
12.08
25.16
20.57
19.37
9.71
17.95
13.24
15.20
14.37
20.56
16.97
250–259
8.81
12.03
13.37
25.92
21.77
21.54
11.16
18.39
13.30
16.05
15.97
20.97
17.68
260–269
10.59
12.79
14.56
25.93
21.79
22.55
12.86
17.89
12.75
16.38
16.76
20.91
17.40
270–279
12.08
13.05
14.66
25.06
21.11
22.88
13.23
16.72
11.64
16.15
16.82
20.26
16.26
280–289
11.63
12.62
13.89
23.26
20.40
22.25
12.64
14.88
10.14
15.62
15.94
18.97
15.06
290–299
11.32
12.25
20.69
19.69
20.55
9.74
12.30
7.54
14.21
14.11
17.04
12.95
300–309
9.29
9.31
17.59
17.54
17.86
11.44
11.53
14.59
10.54
310–319
14.30
14.65
14.48
8.63
8.64
8.14
10.11
8.14
320–329
11.04
10.83
10.41
6.06
330–339
7.65
81
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 82
Table 5.6 (continued)
Standard Errors of Measurement for Selected Standard Score Levels
Iowa Tests of Basic Skills—Complete Battery, Form A
2000 National Standardization
Reading
Vocabulary
Test
Level
82
Score
Level
13
130–139
140–149
150–159
160–169
170–179
180–189
190–199
200–209
210–219
220–229
230–239
240–249
250–259
260–269
270–279
280–289
290–299
300–309
310–319
320–329
330–339
340–349
350–359
14
130–139
140–149
150–159
160–169
170–179
180–189
190–199
200–209
210–219
220–229
230–239
240–249
250–259
260–269
270–279
280–289
290–299
300–309
310–319
320–329
330–339
340–349
350–359
360–369
Language
Comprehension
Spelling
RV
RC
L1
5.69
7.19
8.87
11.76
14.04
15.03
15.11
14.28
12.65
11.22
10.44
9.56
8.65
8.71
10.17
11.56
11.81
10.16
4.77
6.94
8.50
11.09
13.29
14.12
14.07
13.22
11.92
10.89
10.32
10.17
10.38
11.08
12.26
13.16
13.36
12.70
11.13
8.89
6.38
8.48
10.85
13.39
15.26
15.93
15.84
14.97
13.36
11.69
10.67
9.98
9.09
8.59
9.92
11.60
12.31
10.99
3.93
5.87
7.68
9.84
12.18
14.28
15.21
15.07
14.13
13.00
12.02
11.36
10.92
11.11
11.61
12.20
12.69
12.76
12.08
11.27
9.68
8.45
10.16
10.65
10.37
10.19
10.06
9.81
9.85
10.12
10.29
10.72
11.86
13.05
13.72
13.85
13.55
11.93
8.36
9.72
11.10
11.00
10.65
10.61
10.97
11.08
11.01
11.39
12.02
12.75
13.48
13.88
13.92
13.52
12.61
11.54
9.97
7.54
Capitalization
L2
Punctuation
Sources of
Information
Mathematics
Usage &
Expression
Concepts
&
Estimation
Problems
Computa& Data
tion
Interpretation
M3
Science
SS
SC
S1
8.30
10.74
12.47
13.76
15.38
17.17
18.66
19.69
20.06
20.42
20.85
21.54
22.26
22.53
22.46
21.91
20.79
19.11
15.79
11.14
8.17
5.95
L3
L4
M1
4.39
6.28
8.71
11.59
14.00
15.90
17.08
18.34
19.80
21.57
23.50
24.76
25.79
26.40
26.37
25.56
23.87
21.39
18.34
13.69
9.84
7.33
4.77
7.19
10.73
14.04
15.70
16.45
16.63
16.46
16.38
17.23
19.23
21.01
22.01
22.65
22.83
22.47
21.58
20.17
17.31
14.09
11.78
8.75
5.26
8.44
12.25
15.03
16.13
16.08
14.97
13.28
12.47
13.16
15.31
17.93
20.64
22.26
22.89
23.04
22.66
21.71
20.19
18.10
15.48
12.36
8.65
5.75
7.52
8.85
9.80
10.53
11.14
11.37
11.20
10.81
10.34
9.87
9.85
10.33
10.86
11.26
12.02
12.88
12.34
9.98
7.46
10.81
12.65
13.94
14.67
14.79
14.68
14.76
15.24
16.17
16.97
17.38
17.16
16.16
14.73
13.64
11.80
8.91
7.48
9.02
10.59
13.30
15.26
16.51
17.14
17.14
16.59
15.96
15.25
14.31
13.32
12.08
10.19
8.92
7.88
7.74
9.91
10.64
12.04
13.86
14.95
14.94
14.37
13.91
14.10
14.61
15.11
15.30
15.03
14.45
13.87
13.22
12.01
10.44
7.61
6.93
8.93
10.09
11.95
13.69
14.17
13.89
12.88
11.85
11.67
13.10
14.86
16.06
16.64
16.68
16.20
15.06
13.39
10.92
7.38
5.44
8.31
10.88
13.20
14.89
16.23
17.76
20.27
22.52
24.13
25.65
26.92
27.77
28.02
27.59
26.46
24.70
22.44
19.87
15.91
12.20
8.42
6.05
10.71
14.57
16.76
17.66
18.22
18.23
18.05
18.26
18.98
20.23
21.66
22.83
23.38
23.16
22.44
21.28
19.77
18.04
15.35
12.81
11.20
8.91
7.07
9.22
10.59
11.51
12.23
13.00
13.20
12.83
12.24
11.67
11.20
11.09
11.08
10.81
10.27
9.88
9.74
9.49
9.30
8.19
6.78
9.52
12.24
13.77
14.65
15.11
15.41
15.92
16.66
17.45
17.89
17.95
17.61
16.76
15.58
14.22
12.74
11.95
11.16
9.29
8.74
11.56
13.28
14.58
16.26
17.73
18.42
18.86
18.71
18.05
17.19
16.25
14.96
13.63
12.37
10.72
9.08
8.37
7.15
7.06
8.83
9.09
10.16
12.81
14.76
15.75
15.91
15.61
15.19
14.90
14.93
15.08
15.34
15.72
15.93
15.64
14.65
12.60
10.29
8.22
5.98
6.13
7.99
9.64
12.27
14.55
15.58
15.80
15.32
14.17
13.08
12.65
13.44
15.00
16.38
17.35
17.55
17.02
15.88
14.08
11.92
9.46
6.38
4.70
7.02
10.83
14.92
16.52
17.02
16.66
15.98
15.77
16.25
17.27
18.61
20.01
20.96
21.68
22.11
22.11
21.82
20.65
18.68
16.90
13.90
11.07
8.50
M2
Social
Studies
Maps
Reference
and
Materials
Diagrams
8.67
11.16
12.39
13.76
15.35
17.04
18.45
19.12
19.74
20.10
20.35
20.68
20.98
21.50
21.90
21.95
21.69
21.04
19.92
17.37
14.18
10.64
6.63
S2
8.43
10.35
10.78
10.55
11.01
11.87
12.15
11.65
11.57
12.96
14.87
16.29
17.04
17.24
16.89
16.27
14.79
12.84
11.17
8.70
9.00
10.49
10.29
10.20
11.03
11.70
12.38
13.50
14.85
16.48
18.15
19.13
19.53
19.33
18.80
17.58
16.05
14.26
12.22
10.55
8.14
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 83
Effects of Individualized Testing
on Reliability
The extensive reliability data reported in the
preceding sections are based on group rather than
individualized testing and on results from schools
representing a wide variety of educational practices.
The reliability coefficients obtained in individual
schools may be improved considerably by optimizing
the conditions of test administration. This can be
done by attending to conditions of student
motivation and attitude toward the tests and by
assigning appropriate test levels to individual
students.
One of the important potential values of
individualized testing is improvement of the
accuracy of measurement. The degree to which this
is realized, of course, depends on how carefully test
levels are assigned and how motivated students are
in the testing situation. The reliability coefficients
reported apply to the consistency of placing students
in grade groups. The reliability with which tests
place students along the total developmental
continuum has been investigated by Loyd (1980).
She examined the effects of individualized or
functional-level testing on reliability in a sample of
fifth- and sixth-grade students. Two ITBS tests,
Language Usage and Math Concepts, were selected
for study because they represent different degrees of
curriculum dependence.
In Loyd’s study, each student was administered an
in-level Language Usage test and one of four levels
of the parallel form, ranging from two levels below to
one level above grade level. Similarly, each student
was administered an in-level Math Concepts test
and one of four levels of the parallel form, ranging
from two levels below to one level above grade level.
To address the issue of which test level provided
more reliable assessment of developmental
placement, an independent assessment of
developmental placement was provided. This was
done by administering an independently scaled,
broad range test, or “scaling test” test, to comparable
samples in each of grades 3 through 8.
Estimates of reliability and expected squared error
in grade-equivalent scores obtained at each level
were analyzed for each reading achievement grade
level. For students at or below grade level,
administering an easier test produced less error.
This suggests that testing such students with a
lower test level may result in more reliable
measurement. For both Language Usage and Math
Concepts, testing above-average students with lower
test levels introduced significantly more error into
derived scores.
The results of this study provide support for the
validity of individualized or out-of-level testing.
Individualized testing is most often used to test
students who are atypical when compared with their
peer group. For students lagging in development,
the results suggest that a lower level test produces
comparable derived scores, and these derived scores
may be more reliable. For students advanced in
development, the findings indicate that testing with
a higher test level results in less error and therefore
more reliable derived scores.
Stability of Scores on the ITBS
The evidence of stability of scores over a long period
of time and across test levels has a special meaning
for achievement tests. Achievement may change
markedly during the course of the school year, or
from the spring of one school year to the fall of the
next. In fact, one goal of good teaching is to alter
patterns of growth that do not satisfy the standards
of progress expected for individual students and for
groups. If the correlations between achievement test
scores for successive years in school are exceedingly
high, this could mean that little was done to adapt
instruction to individual differences that were
revealed by the tests. In addition to changes in
achievement, there are also changes in test content
across levels because of the way curriculum in any
achievement domain changes across grades.
Differences in test content, while subtle, tend to
lower the correlations between scores on adjacent
test levels.
Despite these influences on scores over time,
when equivalent forms are used in two test
administrations, the correlations may be regarded
as lower-bound estimates of equivalent-forms
reliability. In reporting stability coefficients for such
purposes, it is important to remember that they are
attenuated, not only by errors of measurement, but
also by differences associated with changes in true
status and in test content.
The stability coefficients reported in Table 5.7
are based on data from the 2000 national
standardization. In the fall, subsamples of students
who had taken Form A the previous spring were
administered the next level of either Form A or
Form B. The correlations in Table 5.7 are based on
the developmental standard scores from the spring
and fall administrations.
83
84
467
1071
1028
911
890
834
779
942
907
902
840
6(1)A
7(2)A
7(2)B
7(2)A
7(2)B
8(3)A
8(3)B
9(3)A
9(3)B
6(K)A
6(1)A
6(1)A
7(1)A
7(1)A
8(2)A
8(2)A
8(2)A
8(2)A
9(3)A 10(4)A
9(3)A 10(4)B
10(4)A 11(5)A
10(4)A 11(5)B
11(5)A 12(6)A
11(5)A 12(6)B
12(6)A 13(7)A
12(6)A 13(7)B
13(7)A 14(8)A
13(7)A 14(8)B
.77
.77
.79
.82
.78
.78
.76
.81
.74
.77
.74
.78
.77
.81
.77
.77
.34
.52
.67
.43
.79
.74
.76
.82
.76
.79
.79
.79
.75
.78
.70
.72
.70
.81
.73
.73
.71
.74
RC
Note: -Does not include Computation
+Includes Computation
541
627
602
306
335
458
554
155
RV
Comprehension
.84
.81
.83
.88
.84
.85
.85
.86
.82
.84
.80
.84
.83
.87
.83
.81
.77
.79
RT
Reading
Total
.83
.81
.84
.84
.80
.83
.81
.80
.75
.73
.70
.71
.71
.72
.72
.72
L1
Spelling
.72
.72
.74
.73
.71
.71
.66
.70
.63
.62
L2
Capitalization
.76
.73
.77
.77
.74
.77
.71
.74
.68
.68
L3
.75
.72
.75
.77
.72
.73
.73
.76
.71
.71
L4
.87
.83
.88
.88
.86
.86
.85
.87
.82
.81
.74
.74
.74
.77
.75
.72
.49
.57
.74
.56
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.79
.77
.77
.78
.76
.75
.76
.77
.69
.69
.65
.64
.68
.71
.55
.71
M1
.73
.68
.72
.72
.73
.72
.71
.72
.70
.68
.62
.61
.75
.75
.64
.64
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
.82
.78
.81
.81
.82
.80
.82
.81
.79
.76
.74
.71
.81
.81
.70
.76
.71
.70
.75
.66
MT -
Math
Total
.68
.66
.57
.64
.64
.62
.58
.62
.58
.52
.57
.54
.61
.60
.53
.58
M3
Computation
Mathematics
.84
.79
.80
.82
.82
.81
.82
.83
.80
.77
.76
.75
.81
.81
.73
.77
MT +
Math
Total
.90
.84
.90
.90
.89
.89
.90
.91
.88
.86
.86
.86
.88
.90
.85
.86
.64
.72
.85
.67
CT -
Core
Total
.90
.84
.90
.90
.89
.89
.91
.91
.87
.86
.87
.86
.88
.90
.86
.85
CT +
Core
Total
.72
.70
.73
.78
.72
.73
.72
.76
.68
.70
.55
.54
.57
.71
.57
.63
SS
Social
Studies
.69
.69
.71
.76
.70
.73
.74
.75
.69
.67
.50
.49
.60
.76
.50
.63
SC
Science
.67
.67
.64
.69
.64
.67
.67
.70
.61
.61
S1
.69
.66
.73
.73
.69
.66
.67
.66
.69
.63
S2
.76
.73
.77
.79
.75
.74
.76
.77
.74
.70
.65
.65
.63
.71
.64
.68
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.89
.85
.89
.91
.88
.89
.90
.91
.87
.86
.84
.85
.88
.91
.87
.87
CC -
.88
.84
.88
.90
.87
.88
.90
.90
.86
.85
.84
.84
.87
.91
.87
.87
CC +
Compo- Composite
site
.73
.81
.66
.74
.59
.67
.61
.57
WA
Word
Analysis
.57
.66
.53
.62
.47
.62
.67
.60
Li
Listening
.87
.91
.86
.85
.77
.82
RPT
Reading
Profile
Total
3:15 PM
503
6(1)A
5(K)A
N
Fall
Vocabulary
Reading
10/29/10
Spring
Level (Grade) Form
Table 5.7
Correlations Between Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Forms A and B
Spring and Fall 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 84
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 85
The top row in the table shows correlations for 503
students who took Form A, Level 5 in the spring of
kindergarten and Form A, Level 6 in the fall of grade
1. Row 2 shows within-level correlations for 155
students who took Form A, Level 6 in both the
spring of kindergarten and the fall of grade 1.
Beginning in row 4 and continuing on alternate
rows are correlations between scores on alternate
forms.
Additional evidence of the stability of ITBS scores is
based on longitudinal data from the Iowa Basic
Skills Testing Program. Mengeling (2002) identified
school districts that had participated in the program
and had tested fourth-grade students in school years
1993–1994, 1994–1995, and 1995–1996. Each
district had also tested at the same time of year and
in grades 5, 6, 7, and 8 in successive years. Matched
records were created for 40,499 students who had
been tested at least once during the years and
grades included in the study. Approximately 50
percent of the records had data for every grade,
although all available data were used as
appropriate. The correlations in Table 5.8 provide
evidence regarding the stability of ITBS scores over
the upper-elementary and middle-school years.
The relatively high stability coefficients reported in
Tables 5.7 and 5.8 support the reliability of the tests.
Many of the year-to-year correlations are nearly as
high as the equivalent-forms reliability estimates
reported earlier. The high correlations also indicate
that achievement in the basic skills measured by the
tests was very consistent across the same years
analyzed in research a decade or more earlier
(Martin, 1985). As discussed previously, these
results might suggest that schools are not making
the most effective use of test results. On the other
hand, stability rates are associated with level of
performance. That is, there is a tendency for aboveaverage students to obtain above-average gains in
performance and for below-average students to
achieve more modest gains.
Table 5.8
Correlations Between Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Forms K and L
Iowa Basic Skills Testing Program
4th to 5th
4th to 6th
4th to 7th
4th to 8th
Test
Fall
Spring
Fall
Spring
Fall
Spring
Fall
Spring
Reading Total
.86
.85
.85
.84
.83
.84
.81
.83
Language Total
.87
.85
.84
.84
.82
.81
.80
.79
Math Total
.81
.80
.81
.79
.79
.79
.78
.77
Core Total
.92
.90
.90
.89
.88
.88
.87
.86
85
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 86
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 87
PART 6
Difficulty of the Tests
Elementary school teachers, particularly those in
the primary grades, often criticize standardized
tests for being too difficult. This probably stems
from the fact that no single test can be perfectly
suited in difficulty for all students in a
heterogeneous grade group. The use of
individualized testing should help to avoid the
frustrations that result when students take tests
that are inappropriate in difficulty. It also is
important for teachers to understand the nature of a
reliable measuring instrument; they should
especially realize that little diagnostic information
is gained from a test on which all students correctly
answer almost all of the items.
Characteristics of the “ideal” difficulty distribution
of items in a test have been the subject of
considerable controversy. Difficulty specifications
differ for types of tests: survey versus diagnostic,
norm-referenced
versus
criterion-referenced,
mastery tests versus tests intended to maximize
individual differences, minimum-competency tests
versus tests designed to measure high standards of
excellence, and so forth. Developments in the area of
individualized testing and adaptive testing also
have shed new light on test difficulty. As noted in the
discussion of reliability, the problem of placing
students along a developmental continuum may
differ from that of determining their ranks in grade.
To maximize the reliability of a ranking within a
group, an achievement test must utilize nearly the
entire range of possible scores; the raw scores on the
test should range from near zero to the highest
possible score. The best way to ensure such a
continuum is to conduct one or more preliminary
tryouts of items that will determine objectively the
difficulty and discriminating power of the items. A
few items included in the final test should be so easy
that at least 80 percent of students answer them
correctly. These should identify the least able
students. Similarly, a few very difficult items should
be included to challenge the most able students.
Most items, however, should be of medium difficulty
Item and Test Analysis
and should discriminate well at all levels of ability.
In other words, the typical student will succeed on
only a little more than half of the test items, while
the least able students may succeed on only a few. A
test constructed in this manner results in the widest
possible range of scores and yields the highest
reliability per unit of testing time.
The ten levels of the Iowa Tests of Basic Skills were
constructed to discriminate in this manner among
students in kindergarten through grade 8. Item
difficulty indices for three times of the year (October
15, January 15, and April 15) are reported in the
Content Classifications with Item Norms booklets.
In Tables 6.1 and 6.2, examples of item norms are
shown for Word Analysis on Level 6 and for
Language Usage and Expression on Level 12 of
Form A. Content classifications and item descriptors
are shown in the first column. The content
descriptors are cross-referenced to the Interpretive
Guide for Teachers and Counselors and to various
criterion-referenced reports. The entries in the
tables are percent correct for total test, major skill
grouping, and item, respectively.
For the Level 6 Word Analysis test, there are 35
items. The mean item percents correct are 53% for
kindergarten, spring; 61% for grade 1, fall; 68% for
grade 1, midyear; and 73% for grade 1, spring. The
items measuring letter recognition (Printed letters)
are very easy—all percent correct values are 90 or
above. Items measuring other skills are quite
variable in difficulty.
In Levels 9 through 14 of the battery, most items
appear in two consecutive grades. In Language
Usage and Expression, item 23, for example,
appears in Levels 11 and 12, and item norms are
provided for grades 5 and 6. In grade 5, the percents
answering this item correctly are 51, 56, and 60 for
fall, midyear, and spring, respectively. In grade 6 the
percents are 63, 65, and 67 (Table 6.2). The
consistent increases in percent correct—from 51% to
67%—show that this item measures skill
development across the two grades.
87
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 88
Table 6.1
Word Analysis Content Classifications with Item Norms
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 6
WORD ANALYSIS
Average Percent Correct
Kindergarten
Grade 1
Item
Number
Spring
Fall
Midyear
(35)
53
61
68
73
(20)
56
64
70
76
Initial sounds: pictures
Initial sounds: pictures
Initial sounds: pictures
Initial sounds: pictures
8
9
10
11
64
52
79
69
68
57
87
76
71
61
90
80
74
65
93
84
Initial sounds: words
Initial sounds: words
Initial sounds: words
Initial sounds: words
Initial sounds: words
Initial sounds: words
Initial sounds: words
Initial sounds: words
12
13
14
15
16
17
18
19
42
24
30
65
49
37
52
42
44
30
36
67
61
50
59
49
53
40
46
68
71
61
67
59
61
49
55
68
81
71
74
68
4
5
6
7
78
70
59
52
90
84
69
65
93
90
78
75
95
95
87
84
20
21
22
23
66
60
62
65
73
67
68
75
78
70
72
79
82
72
75
82
(15)
50
58
64
70
1
2
3
91
90
91
98
96
97
99
97
98
99
98
98
Letter substitutions
Letter substitutions
Letter substitutions
Letter substitutions
Letter substitutions
Letter substitutions
24
25
26
27
28
29
46
46
33
41
39
23
57
53
39
48
49
30
62
57
49
55
57
39
67
61
58
61
65
47
Word building
Word building
Word building
Word building
Word building
Word building
30
31
32
33
34
35
48
58
31
45
32
35
63
67
37
63
33
42
73
78
49
72
35
47
82
88
61
81
37
51
WORD ANALYSIS
Phonological Awareness and Decoding
Letter-sound correspondences
Letter-sound correspondences
Letter-sound correspondences
Letter-sound correspondences
Rhyming sounds
Rhyming sounds
Rhyming sounds
Rhyming sounds
Identifying and Analyzing Word Parts
Printed letters
Printed letters
Printed letters
88
(Number
of
Items)
Spring
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 89
Table 6.2
Usage and Expression Content Classifications with Item Norms
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 12
USAGE AND EXPRESSION
(Number
of
Items)
Average Percent Correct
Grade 6
Item
Number
Fall
Midyear
USAGE AND EXPRESSION
(38)
59
61
63
Nouns, Pronouns, and Modifiers
(10)
58
60
62
Irregular plurals
Spring
6
58
59
59
Homonyms
12
26
27
30
Redundancies
15
63
65
67
Pronoun case
5
70
72
73
Nonstandard pronouns
7
49
50
50
Comparative adjectives
4
55
57
58
1
9
11
14
73
62
65
63
74
65
68
65
75
68
70
67
Misuse of adjective for adverb
Misuse of adjective
Misuse of adverb
Misuse of adjective for adverb
Verbs
(6)
58
61
62
Subject-verb agreement
19
60
62
63
Tense
Tense
3
17
58
57
60
59
62
60
Participles
Verb forms
Verb forms
2
10
34
59
69
47
63
72
48
66
74
49
Conciseness and Clarity
(5)
53
55
57
Lack of conciseness
35
39
40
40
Combining sentences
24
36
38
41
Misplaced modifiers
27
70
73
75
Ambiguous references
Ambiguous references
23
32
63
57
65
59
67
61
Organization of Ideas
(6)
59
61
62
Appropriate sentence order
Appropriate sentence order
21
26
57
52
59
54
61
58
Sentences appropriate to function
Sentences appropriate to function
Sentences appropriate to function
25
33
37
74
65
43
76
67
44
77
68
44
Sentences suitable to purpose
20
62
63
64
Appropriate Use
(11)
64
66
67
Use of complete sentences
Use of complete sentences
16
28
60
63
62
65
64
66
Appropriate word order in sentences
Appropriate word order in sentences
Appropriate word order in sentences
Appropriate word order in sentences
13
22
30
36
62
69
64
59
65
70
66
61
67
71
67
63
Parallel construction
38
47
48
49
Conjunctions
Conjunctions
29
31
67
60
69
62
71
63
Correct written language
Correct written language
8
18
78
74
79
75
79
76
89
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 90
The distributions of item norms (proportion correct)
are shown in Table 6.3 for all tests and levels of the
ITBS. The results are based on an analysis of the
weighted sample from the 2000 spring national
standardization of Form A. As can be seen from the
various sections of Table 6.3, careful test
construction led to an average national item
difficulty of about .60 for spring testing. At the lower
grade levels, tests tend to be slightly easier than
this; at the upper grade levels, they tend to be
slightly harder. In general, tests with average item
difficulty of about .60 will have nearly optimal
internal consistency-reliability coefficients.
These distributions also illustrate the variability in
item difficulty needed to discriminate throughout
the entire ability range. It is extremely important in
test development to include both relatively easy and
relatively difficult items at each level. Not only are
such items needed for motivational reasons, but
they are critical for a test to have enough ceiling for
the most capable students and enough floor for the
least capable ones. Nearly all tests and all levels
have some items with difficulties above .8 as well as
some items below .3.
Table 6.3
Distribution of Item Difficulties
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)
Level 5
Grade K
Vocabulary
Word Analysis
Listening
Language
Mathematics
V
WA
Li
L
M
Proportion Correct
>=.90
3
4
2
5
.80–.89
11
1
6
5
7
.70–.79
3
8
11
7
7
.60–.69
3
8
7
8
4
.50–.59
2
3
3
4
2
.40–.49
5
4
2
3
3
.30–.39
2
2
0
.20–.29
1
.10–.19
<.10
Average
Level 6
Grade K
.70
.66
.70
.69
.72
Vocabulary
Word Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
Proportion Correct
>=.90
3
.80–.89
2
0
.70–.79
4
3
2
.60–.69
5
7
6
1
3
2
.50–.59
8
5
8
8
7
6
.40–.49
5
8
6
6
7
8
1
9
.30–.39
4
7
5
11
8
9
9
18
.20–.29
3
2
4
2
6
4
8
12
1
1
.29
.37
.10–.19
4
3
2
6
<.10
Average
90
.54
.53
.49
.40
.46
.43
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 91
Table 6.3 (continued)
Distribution of Item Difficulties
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)
Level 6
Grade 1
Vocabulary
Word Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
6
6
4
4
7
Proportion Correct
>=.90
7
.80–.89
6
9
9
9
7
14
6
20
.70–.79
5
5
6
10
8
7
5
12
.60–.69
5
9
6
5
9
1
.50–.59
5
3
3
2
5
.40–.49
4
2
3
3
1
1
1
.30–.39
1
.20–.29
5
6
3
3
.72
.79
1
.10–.19
<.10
Average
.72
.73
.73
.70
.72
Mathematics
Reading
Level 7
Grade 1
.84
Word
Analysis
Listening
Vocabulary
Comprehension
RV
RC
WA
Li
1
1
2
Spelling
Language
Concepts
L1
L
Problems
& Data Computation
Interpretation
Social
Studies
Science
Sources
of
Information
SI
M1
M2
M3
SS
SC
3
1
1
1
6
Proportion Correct
>=.90
.80–.89
5
4
2
4
7
7
10
6
6
5
12
3
.70–.79
3
4
13
7
6
9
4
3
8
12
5
4
.60–.69
8
13
11
7
5
10
4
6
5
9
2
9
.50–.59
10
11
6
7
3
7
3
3
4
2
4
4
.40–.49
3
1
1
3
2
1
2
3
2
1
1
2
.30–.39
1
0
0
3
5
1
0
1
1
1
.68
.67
.71
.68
Word
Analysis
Listening
Spelling
Language
.20–.29
1
1
.10–.19
<.10
Average
.62
.66
.70
.69
.70
.77
.65
Social
Studies
Science
Sources
of
Information
SS
SC
SI
Mathematics
Reading
Level 8
Grade 2
.61
Problems
& Data Computation
Interpretation
Vocabulary
Comprehension
RV
RC
4
2
3
2
1
3
4
5
2
2
3
11
3
4
5
10
10
6
5
6
7
.70–.79
4
12
6
6
6
15
3
3
6
8
7
8
.60–.69
13
3
13
12
8
8
5
4
4
4
6
7
.50–.59
8
6
9
6
2
6
4
6
4
6
6
6
.40–.49
3
1
3
2
3
6
1
4
2
3
.30–.39
0
1
2
3
1
3
1
0
.20–.29
1
Concepts
WA
Li
L1
L
M1
M2
M3
Proportion Correct
>=.90
.80–.89
1
.10–.19
4
1
1
<.10
Average
.63
.73
.64
.69
.74
.72
.69
.65
.67
.67
.68
.66
91
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 92
Table 6.3 (continued)
Distribution of Item Difficulties
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)
Reading
Level 9
Grade 3
Language
Sources of
Information
Mathematics
Social
Studies
Science
SS
SC
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
M2
1
1
7
3
3
1
6
2
2
2
Problems
Computa& Data
tion
Interpretation
M3
Maps
Reference
and
Materials
Diagrams
S1
S2
Word
Analysis
Listening
WA
Li
Proportion Correct
>=.90
.80–.89
4
2
1
4
2
8
3
.70–.79
4
8
6
5
4
8
6
4
7
12
4
6
6
4
9
.60–.69
9
8
5
6
3
10
6
5
4
5
9
3
7
15
6
.50–.59
7
10
8
5
6
8
7
6
7
5
7
4
7
4
6
.40–.49
4
9
2
2
6
2
5
4
3
4
6
4
4
2
2
.30–.39
0
2
3
2
1
1
0
2
2
3
4
1
1
.20–.29
1
.62
.57
.68
.70
1
2
.10–.19
<.10
Average
.62
.58
.67
Reading
Level 10
Grade 4
.61
.58
.62
.63
Language
.63
.64
.63
.55
Sources of
Information
Mathematics
Social
Studies
Science
SS
SC
S1
3
2
2
3
5
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
.80–.89
1
2
4
2
1
2
4
5
3
.70–.79
6
8
9
7
3
6
8
3
7
9
.60–.69
13
11
9
2
7
16
10
5
5
6
9
8
10
.50–.59
7
13
3
4
7
5
8
7
3
6
16
5
5
.40–.49
3
5
5
8
5
3
7
1
4
10
5
5
5
.30–.39
4
1
1
1
4
2
2
3
2
1
3
1
1
1
.60
.63
.61
.56
.60
.61
Problems
Computa& Data
tion
Interpretation
M2
M3
S2
Proportion Correct
>=.90
1
.20–.29
1
2
.10–.19
<.10
Average
.60
Reading
Level 11
Grade 5
.55
.62
.62
Language
.60
.63
.61
Sources of
Information
Mathematics
Social
Studies
Science
M3
SS
SC
5
1
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
.80–.89
1
2
3
.70–.79
8
5
10
4
5
12
8
3
6
5
4
5
8
.60–.69
9
18
5
5
6
10
9
8
7
8
10
6
11
.50–.59
9
13
9
7
11
7
11
6
6
11
11
6
9
.40–.49
7
3
3
6
5
4
8
4
1
9
12
6
4
.30–.39
3
2
5
2
0
1
2
2
3
2
1
1
1
1
.57
.61
.58
.62
Problems
Computa& Data
tion
Interpretation
M2
S1
S2
Proportion Correct
>=.90
1
4
.20–.29
1
1
1
1
1
3
.10–.19
<.10
Average
92
.59
.61
.61
.60
.60
.57
.57
.56
.62
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 93
Table 6.3 (continued)
Distribution of Item Difficulties
Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization (Weighted Sample)
Reading
Level 12
Grade 6
Language
Sources of
Information
Mathematics
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
Problems
Computa& Data
tion
Interpretation
M2
M3
Social
Studies
Science
SS
SC
S1
Maps
Reference
and
Materials
Diagrams
S2
Proportion Correct
>=.90
1
1
.80–.89
3
2
6
4
1
.70–.79
7
7
6
3
9
.60–.69
9
16
7
7
.50–.59
13
12
9
7
.40–.49
5
6
9
.30–.39
1
1
1
.20–.29
1
1
2
4
6
2
3
3
10
7
4
6
4
6
4
8
6
18
12
5
4
6
14
7
6
7
4
13
7
5
15
10
5
9
6
6
5
6
7
4
11
6
5
2
1
1
1
2
4
1
3
3
5
1
1
.58
.59
2
1
.10–.19
<.10
Average
.62
.62
.62
Reading
Level 13
Grade 7
.59
.61
.63
.61
Language
.62
.61
.55
.58
Social
Studies
Science
SS
SC
Sources of
Information
Mathematics
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
2
1
2
4
2
Problems
Computa& Data
tion
Interpretation
M2
M3
Maps
Reference
and
Materials
Diagrams
S1
S2
Proportion Correct
>=.90
.80–.89
1
1
4
2
1
2
1
.70–.79
6
7
7
4
8
11
7
5
6
.60–.69
10
14
8
8
4
15
11
7
3
12
7
3
5
13
8
12
.50–.59
14
20
18
8
13
6
14
7
4
.40–.49
7
6
2
5
0
4
7
7
10
12
7
4
10
13
11
5
.30–.39
1
3
3
4
2
2
1
5
7
4
3
4
.20–.29
1
1
2
2
.58
.60
1
4
.10–.19
<.10
Average
.59
.59
.60
Reading
Level 14
Grade 8
.60
.61
Language
.60
.54
.52
.57
.53
.57
Social
Studies
Science
SS
SC
S1
1
1
3
9
6
8
Sources of
Information
Mathematics
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
1
1
.80–.89
1
3
2
4
2
1
.70–.79
7
9
7
5
2
8
.60–.69
12
12
12
8
11
12
9
7
4
10
6
5
7
.50–.59
14
16
8
8
8
10
14
6
4
17
10
5
11
.40–.49
6
10
7
3
7
7
7
9
13
10
14
6
7
.30–.39
1
2
3
3
4
3
6
2
5
2
4
6
4
.20–.29
1
2
2
2
4
2
1
.58
.60
.57
.55
.50
.54
Problems
Computa& Data
tion
Interpretation
M2
M3
S2
Proportion Correct
>=.90
1
9
1
2
6
2
2
.10–.19
<.10
Average
.59
.60
.56
.58
.55
.54
.57
93
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 94
A summary of the Form A difficulty indices for all
tests and grades is presented in Table 6.4. The
difficulty indices reported for each grade are item
proportions (p-values) rather than percents correct.
These data are from the 2000 spring and fall
standardizations; the mean item proportions (in
italics), the medians, and the 10th and 90th
percentiles in the distributions of item difficulty are
given. Comparable data for Form B are included in
Norms and Score Conversions, Form B. Norms and
score conversions for Forms A and B of the Survey
Battery are published in separate booklets.
Appropriateness of test difficulty can best be
ascertained by examining relationships between
raw scores, standard scores, and percentile ranks in
the tables in Norms and Score Conversions. For
example, the norms tables indicate 39 of 43 items on
Level 12 of the Concepts and Estimation test must
be answered correctly to score at the 99th percentile
in the fall of grade 6, and that 41 items must be
answered correctly to score at the 99th percentile in
the spring. Similarly, the number of items needed to
score at the median for the three times of the year
are 23, 25, and 27 (out of 43), respectively. This test
thus appears to be close to ideal in item difficulty for
the grade in which it is typically used.
It should be noted that these difficulty
characteristics are for a cross section of the
attendance centers in the nation. The distributions
of item difficulty vary markedly among attendance
centers, both within and between school systems. In
some schools, when the same levels are
administered to all students in a given grade, the
tests are too difficult; in others they may be too easy.
When tests are too difficult, a given student’s scores
may be determined largely by “chance.” When tests
are too easy and scores approach the maximum
possible, a student’s true performance level may be
seriously underestimated.
94
Individualized testing is necessary to adapt
difficulty levels to the needs and characteristics of
individual students. The Interpretive Guide for
School Administrators discusses issues related to
the selection of appropriate test levels. Both content
and difficulty should be considered when assigning
levels of the tests to individual students. The tasks
reflected by the test questions should be relevant to
the student’s needs and level of development. At the
same time, the level of difficulty of the items should
be such that the test is challenging, but success is
attainable.
Discrimination
As discussed in Part 3, item discrimination indices
(item-test correlations) are routinely determined in
tryout and are one of several criteria for item
selection. Developmental discrimination is inferred
from tryout and standardization data showing that
items administered at adjacent grade levels have
increasing p-values from grade to grade.
Discrimination indices (biserial correlations) were
computed for items in all tests and grades for Form
A in the 2000 spring standardization program.
The means (in italics), medians, and the 90th and
10th percentiles in the distributions of biserial
correlations are shown in Table 6.4. As would be
expected, discrimination indices vary considerably
from grade to grade, from test to test, and even from
skill to skill. In general, discrimination indices
tend to be higher for tests that are relatively
homogeneous in content and lower for tests that
include complex stimuli or for skills within tests
that require complex reasoning processes.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 95
Table 6.4
Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Vocabulary
Word Analysis
Listening
Language
Mathematics
V
WA
Li
L
M
29
30
29
29
29
Mean
.60
.53
.56
.53
.56
P90
.80
.79
.68
.75
.81
Median
.66
.53
.55
.52
.60
P10
.31
.29
.37
.29
.26
Mean
.70
.66
.70
.69
.72
P90
.89
.91
.81
.87
.91
Median
.75
.67
.71
.67
.75
P10
.40
.40
.51
.46
.43
Mean
.48
.55
.53
.53
.55
P90
.66
.69
.68
.66
.67
Median
.50
.54
.54
.54
.55
P10
.30
.42
.40
.39
.40
Level 5
Grade K
Number of Items
Difficulty
Fall
Spring
Discrimination
Spring
Level 6
Grade 1
Vocabulary
Word Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
Number of Items
31
35
31
31
35
29
19
48
Difficulty
Fall
Mean
.61
.61
.59
.52
.57
.63
.45
.56
P90
.85
.89
.80
.71
.82
.78
.53
.76
Median
.58
.62
.57
.53
.54
.63
.45
.53
P10
.38
.35
.36
.22
.38
.47
.29
.38
Spring
Mean
.72
.73
.73
.70
.72
.84
.72
.79
P90
.93
.95
.92
.87
.91
.92
.81
.92
Median
.72
.73
.73
.72
.71
.83
.75
.81
P10
.48
.50
.50
.43
.52
.75
.56
.66
Discrimination
Spring
Mean
.48
.54
.49
.51
.53
.73
.76
.74
P90
.64
.72
.61
.69
.66
.84
.91
.87
Median
.48
.54
.48
.50
.55
.72
.75
.73
P10
.30
.37
.35
.38
.38
.62
.63
.62
95
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 96
Table 6.4 (continued)
Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Reading
Level 7
Grade 2, Fall
Grade 1, Spring
Number of Items
Difficulty
Fall
Mathematics
Word
Analysis
Listening
Spelling
Language
Problems
& Data Computation
Interpretation
Social
Studies
Science
Sources
of
Information
Vocabulary
Comprehension
RV
RC
WA
Li
L1
L
M1
M2
M3
SS
SC
SI
30
34
35
31
23
34
29
28
27
31
31
22
Concepts
(Grade 2)
Mean
.70
.75
.74
.73
.81
.76
.77
.68
.76
.76
.82
.76
P90
.89
.90
.85
.94
.93
.92
.95
.91
.91
.89
.96
.91
Median
.67
.74
.75
.73
.83
.77
.83
.70
.82
.79
.87
.76
P10
.53
.63
.60
.57
.63
.60
.41
.36
.51
.62
.59
.56
Spring (Grade 1)
Mean
.62
.66
.68
.67
.71
.68
.71
.61
.69
.70
.77
.65
P90
.80
.82
.79
.88
.84
.84
.89
.84
.87
.84
.93
.82
Median
.60
.65
.68
.69
.71
.66
.74
.60
.73
.72
.83
.63
P10
.47
.54
.54
.49
.52
.53
.37
.32
.44
.57
.51
.45
Discrimination
Spring
Mean
.63
.68
.55
.43
.74
.59
.53
.54
.64
.48
.52
.63
P90
.76
.79
.68
.52
.87
.76
.71
.64
.71
.59
.73
.72
Median
.64
.70
.56
.44
.76
.60
.53
.56
.67
.51
.52
.67
P10
.49
.53
.40
.33
.61
.46
.34
.41
.52
.33
.31
.48
Social
Studies
Science
Sources
of
Information
Reading
Level 8
Grade 3, Fall
Grade 2, Spring
Number of Items
Difficulty
Fall
Mathematics
Word
Analysis
Listening
Spelling
Language
Problems
& Data Computation
Interpretation
Vocabulary
Comprehension
RV
RC
WA
Li
L1
L
M1
M2
M3
SS
SC
SI
32
38
38
31
23
42
31
30
30
31
31
28
Concepts
(Grade 3)
Mean
.70
.79
.69
.75
.81
.78
.75
.71
.71
.72
.73
.74
P90
.86
.93
.84
.89
.96
.92
.95
.94
.95
.90
.89
.89
Median
.70
.82
.68
.74
.81
.81
.80
.66
.76
.74
.75
.76
P10
.57
.56
.52
.58
.66
.57
.42
.48
.34
.47
.52
.56
Spring (Grade 2)
Mean
.63
.73
.64
.69
.74
.72
.69
.65
.67
.67
.68
.66
P90
.79
.89
.82
.84
.88
.86
.89
.90
.91
.85
.84
.82
Median
.62
.76
.62
.68
.73
.74
.71
.60
.70
.70
.70
.67
P10
.49
.50
.47
.53
.59
.53
.35
.42
.32
.45
.45
.48
Discrimination
Spring
96
Mean
.61
.69
.52
.45
.65
.56
.53
.57
.59
.41
.44
.60
P90
.71
.95
.61
.54
.80
.71
.67
.70
.68
.55
.61
.70
Median
.63
.69
.54
.45
.63
.57
.51
.56
.60
.40
.46
.63
P10
.46
.37
.37
.34
.55
.42
.39
.45
.51
.32
.21
.44
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 97
Table 6.4 (continued)
Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Reading
Level 9
Grade 3
Number of Items
Language
Sources of
Information
Mathematics
Social
Studies
Science
Spelling
Punctuation
Usage &
Expression
Concepts
&
Estimation
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
37
28
24
24
30
31
22
25
30
30
24
Comprehension
RV
29
Problems
Computa& Data
tion
Interpretation
Word
Analysis
Listening
S2
WA
Li
28
35
31
Maps
Reference
and
Materials
Diagrams
Capitalization
Vocabulary
Difficulty
Fall
Mean
.52
.49
.62
.62
.55
.52
.49
.52
.53
.53
.46
.54
.46
.62
.62
P90
.70
.65
.84
.84
.72
.71
.72
.69
.75
.76
.70
.69
.65
.85
.84
Median
.50
.49
.59
.61
.54
.54
.46
.53
.51
.47
.45
.57
.44
.59
.62
P10
.32
.31
.47
.43
.36
.26
.26
.35
.30
.36
.21
.30
.29
.47
.43
Spring
Mean
.62
.58
.67
.61
.58
.62
.63
.63
.64
.63
.55
.62
.57
.68
.70
P90
.80
.77
.84
.80
.80
.73
.84
.82
.81
.75
.74
.87
.71
.88
.90
Median
.62
.58
.66
.63
.55
.64
.65
.60
.61
.66
.54
.60
.58
.67
.70
P10
.45
.43
.50
.36
.37
.41
.40
.46
.46
.41
.36
.37
.37
.51
.47
Discrimination
Spring
Mean
.67
.64
.69
.66
.62
.64
.55
.66
.64
.57
.56
.60
.61
.54
.45
P90
.80
.76
.83
.83
.78
.76
.67
.78
.74
.71
.70
.75
.81
.68
.53
Median
.68
.63
.69
.69
.65
.64
.56
.69
.67
.58
.57
.63
.63
.54
.44
P10
.56
.53
.58
.44
.44
.51
.42
.50
.56
.40
.40
.42
.43
.42
.37
Reading
Level 10
Grade 4
Language
Sources of
Information
Mathematics
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
34
41
32
26
26
33
36
24
27
34
34
25
30
Mean
.51
.53
.55
.56
.49
.56
.53
.53
.50
.54
.49
.53
.52
P90
.65
.67
.73
.76
.62
.72
.73
.79
.71
.71
.62
.76
.68
Median
.52
.53
.57
.53
.49
.56
.51
.52
.52
.50
.48
.53
.53
P10
.29
.37
.35
.31
.32
.35
.34
.24
.24
.36
.34
.36
.30
Mean
.60
.60
.63
.61
.55
.62
.62
.60
.63
.61
.56
.61
.61
P90
.75
.73
.80
.81
.69
.75
.78
.84
.80
.76
.66
.80
.75
Median
.61
.60
.67
.59
.55
.62
.61
.59
.69
.60
.56
.61
.63
P10
.37
.44
.43
.37
.37
.45
.46
.30
.38
.44
.42
.43
.38
Mean
.63
.59
.63
.61
.59
.64
.57
.63
.65
.55
.57
.60
.61
P90
.76
.72
.74
.76
.74
.78
.70
.76
.74
.70
.74
.71
.77
Median
.66
.58
.63
.61
.58
.64
.58
.64
.65
.58
.59
.60
.64
P10
.46
.45
.51
.41
.47
.47
.46
.48
.55
.38
.40
.44
.36
Number of Items
Vocabulary
Comprehension
RV
Problems
Computa& Data
tion
Interpretation
Difficulty
Fall
Spring
Discrimination
Spring
97
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 98
Table 6.4 (continued)
Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Reading
Level 11
Grade 5
Language
Sources of
Information
Mathematics
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
37
43
36
28
28
35
40
26
29
37
37
26
32
Mean
.51
.55
.54
.56
.53
.57
.52
.52
.52
.51
.51
.51
.55
P90
.66
.70
.73
.78
.69
.71
.73
.64
.73
.68
.62
.66
.71
Median
.51
.55
.52
.54
.50
.57
.50
.50
.54
.50
.51
.50
.57
P10
.33
.40
.31
.34
.38
.43
.33
.28
.28
.34
.37
.33
.39
Mean
.59
.61
.61
.60
.57
.61
.60
.58
.62
.57
.57
.56
.62
P90
.75
.75
.79
.80
.72
.74
.79
.71
.80
.74
.69
.73
.78
Median
.57
.62
.60
.58
.56
.61
.58
.56
.65
.55
.57
.55
.63
P10
.42
.45
.38
.40
.42
.47
.41
.34
.32
.41
.44
.38
.46
Mean
.59
.58
.63
.59
.61
.61
.54
.62
.64
.54
.58
.57
.62
P90
.74
.72
.80
.71
.73
.76
.70
.78
.80
.69
.76
.70
.79
Median
.61
.58
.63
.61
.61
.63
.54
.63
.66
.57
.59
.60
.64
P10
.43
.43
.49
.45
.41
.47
.39
.44
.46
.35
.42
.40
.46
Number of Items
Problems
Computa& Data
tion
Interpretation
Difficulty
Fall
Spring
Discrimination
Spring
Reading
Level 12
Grade 6
Language
Sources of
Information
Mathematics
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
39
45
38
30
30
38
43
28
30
39
39
28
34
Mean
.56
.57
.56
.56
.58
.59
.55
.57
.55
.51
.53
.54
.54
P90
.72
.71
.78
.79
.72
.71
.69
.74
.74
.66
.63
.70
.72
Median
.53
.58
.53
.53
.56
.60
.53
.55
.56
.48
.53
.52
.53
P10
.40
.43
.39
.34
.44
.42
.37
.39
.30
.38
.39
.35
.32
.62
.62
.62
.59
.61
.63
.61
.62
.61
.55
.58
.58
.59
Number of Items
Problems
Computa& Data
tion
Interpretation
Difficulty
Fall
Spring
Mean
P90
.76
.75
.84
.82
.78
.75
.75
.81
.81
.72
.71
.75
.78
Median
.59
.63
.59
.57
.62
.64
.60
.59
.60
.53
.60
.54
.59
P10
.48
.47
.44
.35
.47
.43
.44
.42
.35
.42
.42
.39
.34
Mean
.57
.58
.62
.54
.62
.61
.58
.61
.58
.51
.58
.56
.58
P90
.68
.73
.74
.73
.74
.76
.73
.80
.70
.67
.72
.74
.71
Median
.58
.59
.64
.55
.63
.64
.60
.62
.61
.53
.58
.59
.61
P10
.44
.44
.46
.34
.50
.45
.43
.37
.48
.33
.40
.37
.37
Discrimination
Spring
98
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 99
Table 6.4 (continued)
Summary of Difficulty (Proportion Correct) and Discrimination (Biserial) Indices
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Reading
Level 13
Grade 7
Language
Sources of
Information
Mathematics
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
41
48
40
32
32
40
46
30
31
41
41
30
36
Mean
.54
.55
.56
.57
.55
.57
.56
.57
.49
.48
.52
.50
.53
P90
.70
.67
.74
.77
.72
.71
.73
.72
.69
.59
.65
.70
.67
Median
.53
.54
.53
.57
.53
.61
.53
.54
.43
.47
.51
.49
.55
P10
.36
.43
.42
.36
.33
.30
.39
.39
.28
.34
.35
.24
.37
Mean
.59
.59
.60
.60
.58
.60
.61
.60
.54
.52
.57
.53
.57
P90
.76
.72
.79
.80
.75
.75
.78
.75
.72
.63
.72
.74
.70
Median
.57
.58
.58
.59
.58
.63
.59
.58
.49
.51
.55
.52
.59
P10
.42
.48
.48
.39
.35
.34
.43
.41
.32
.38
.40
.25
.41
Mean
.54
.60
.61
.54
.59
.59
.56
.60
.54
.51
.57
.52
.56
P90
.64
.73
.72
.75
.75
.78
.70
.76
.64
.62
.72
.68
.69
Median
.56
.61
.62
.53
.63
.64
.58
.59
.54
.52
.58
.58
.57
P10
.45
.43
.46
.37
.39
.36
.40
.42
.42
.37
.41
.25
.40
Number of Items
Problems
Computa& Data
tion
Interpretation
Difficulty
Fall
Spring
Discrimination
Spring
Reading
Level 14
Grade 8
Language
Sources of
Information
Mathematics
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
Concepts
&
Estimation
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
42
52
42
34
34
43
49
32
32
43
43
31
38
Mean
.55
.56
.54
.58
.53
.55
.50
.56
.46
.51
.52
.51
.55
P90
.67
.73
.72
.80
.67
.72
.69
.75
.63
.62
.69
.73
.69
Median
.55
.54
.56
.57
.53
.55
.52
.54
.41
.52
.49
.51
.55
P10
.40
.41
.31
.29
.36
.33
.26
.41
.30
.39
.36
.30
.35
.59
.60
.58
.60
.56
.57
.55
.58
.50
.54
.55
.54
.57
Number of Items
Problems
Computa& Data
tion
Interpretation
Difficulty
Fall
Spring
Mean
P90
.74
.76
.78
.83
.71
.74
.71
.78
.71
.66
.73
.76
.74
Median
.59
.58
.61
.61
.56
.57
.56
.56
.47
.55
.54
.53
.57
P10
.45
.44
.32
.30
.38
.34
.30
.42
.31
.40
.37
.32
.36
Mean
.54
.59
.59
.55
.56
.56
.54
.60
.56
.52
.56
.54
.55
P90
.68
.72
.71
.79
.72
.79
.71
.75
.64
.67
.75
.71
.71
Median
.54
.63
.60
.58
.57
.58
.55
.60
.57
.53
.58
.58
.55
P10
.41
.42
.43
.32
.35
.31
.35
.46
.43
.32
.32
.32
.35
Discrimination
Spring
99
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 100
Ceiling and Floor Effects
The ITBS battery is designed for flexibility in
assigning test levels to students. In schools where
all students in a given grade are tested with the
same level, it is important that each level of the test
accurately measures students of all ability levels.
For exceptionally able students or students who are
challenged in skills development, individualized
testing with appropriate levels can be used to match
test content and item difficulty to student ability
levels.
Students at the extremes of the score distributions
are of special concern. To measure high-ability
students accurately, the test must have enough
ceiling to allow such students to demonstrate their
skills. If the test is too easy, a considerable
proportion of these students will obtain perfect or
near-perfect scores. If the test is too difficult for lowability students, many will obtain chance scores and
such scores may have inflated percentile ranks.
A summary of ceiling and floor effects for all tests
and grades for spring testing is shown in Table 6.5.
The top line of the table for each grade is the
number of items in each test (k). Under “Ceiling,”
the percentile rank of a perfect score is listed for
each test as well as the percentile rank of a score one
less than perfect (k-1).
A “chance” score is frequently defined as the number
of items in the test divided by the average number
of responses per item. The percentile ranks of these
“chance” estimates are listed under “Floor.” Of
course, not all students who score at this level do so
by chance. However, a substantial proportion of
students scoring at this level is an indication the
test may be too difficult and that individualized
testing should be considered.
Completion Rates
There is by no means universal agreement on the
issue of speed in achievement testing. Many believe
that the fact good students tend to be quicker than
poor students is not in itself a sufficient reason to
penalize the occasional good student who works
slowly. On the other hand, if time limits are too
generous in order to free all examinees from time
constraints, a considerable portion of the examination
period is wasted for the majority of students who
work at a reasonable pace. Also, when items are
arranged essentially in order of increasing difficulty,
as they are in most tests, it may be unreasonable to
100
expect all students to complete the difficult items at
the end of the test.
The speed issue also differs with types of tests,
levels of development, and nature of the objectives.
Speed can be important in repetitive, mechanical
operations such as letter recognition and math
computation. It is difficult to conceive of proficiency
in computation without taking rate of performance
into account. If time limits are too generous on a test
of estimation, students are likely to perform
calculations instead of using estimation skills. On
the other hand, in tests that require creative, critical
thinking it is quality of thought rather than speed of
response that is important.
Two indices of completion rates for all tests and
levels from the spring standardization are shown in
Table 6.5. The first is percent of students completing
the test. Because something less than 100 percent
completion is generally considered ideal for most
achievement tests, another widely used index is
given: the percent of students who completed at
least 75 percent of the items in the test.
The data in Table 6.5 indicate that most of the ITBS
tests at most levels are essentially power tests. The
only obvious exception is Math Computation, in
which time limits are intentionally imposed to help
teachers identify students who are not proficient in
computation because they work slowly. It should be
noted that two completion rates are reported for the
Reading Comprehension test beginning at Level 9.
As discussed in Part 3, this test was divided into two
separately timed sections in Forms A and B. This
change resulted in higher completion rates than
were observed in previous editions of the ITBS.
Other Test Characteristics
In addition to the statistical considerations
presented in this part of the Guide to Research and
Development, other factors of interest are routinely
examined as a part of the test development and
evaluation process. For example, readability indices
are computed for appropriate test materials that
involve significant amounts of reading. These are
reported in Part 3. Measures of differential item
functioning (DIF) are computed for both tryout
materials and final forms of the tests. Part 7
discusses these procedures and results. Finally,
complete test and item analysis information is
published in separate norms books for Form B
Complete Battery and Form A and Form B Survey
Battery.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 101
Table 6.5
Ceiling Effects, Floor Effects, and Completion Rates
Iowa Tests of Basic Skills — Complete Battery, Form A (Unweighted Sample)
Spring 2000 National Standardization
Vocabulary
Word Analysis
Listening
Language
Mathematics
V
WA
Li
L
M
29
30
29
29
29
CEILING
Percentile Rank of Top Scores
k (perfect score)
k-1
99.9
98.8
99.7
98.4
99.9
98.2
99.9
97.9
99.9
97.2
FLOOR
Percent of Students
Scoring Below k/n*
0.6
2.1
2.8
2.6
1.3
COMPLETION RATES
Percent of Students
Completing Test
98
98
98
98
98
99
98
99
99
99
Level 5
Grade K
Number of Items (k)
Percent of Students
Completing 75% of Test
Vocabulary
Word
Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
31
35
31
31
35
29
19
48
CEILING
Percentile Rank of Top Scores
k (perfect score)
k-1
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
98.8
99.9
98.9
99.9
99.9
FLOOR
Percent of Students
Scoring Below k/n*
11.6
10.1
21.2
42.7
22.0
20.0
81.0
26.5
COMPLETION RATES
Percent of Students
Completing Test
98
93
95
98
95
97
80
91
99
95
98
98
97
97
85
93
Vocabulary
Word
Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
Level 6
Grade K
Number of Items (k)
Percent of Students
Completing 75% of Test
Level 6
Grade 1
Number of Items (k)
31
35
31
31
35
29
19
48
CEILING
Percentile Rank of Top Scores
k (perfect score)
k-1
98.8
97.5
99.4
97.3
99.9
98.4
99.9
98.6
99.9
98.9
90.1
71.0
90.1
73.2
91.6
83.8
FLOOR
Percent of Students
Scoring Below k/n*
1.0
0.3
0.8
2.5
1.8
0.7
10.9
1.8
COMPLETION RATES
Percent of Students
Completing Test
99
98
98
99
98
99
96
97
99
99
99
99
98
99
97
98
Percent of Students
Completing 75% of Test
* n = number of answer choices
101
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 102
Table 6.5 (continued)
Ceiling Effects, Floor Effects, and Completion Rates
Iowa Tests of Basic Skills — Complete Battery, Form A (Unweighted Sample)
Spring 2000 National Standardization
Reading
Level 7
Grade 1
Mathematics
Word
Analysis
Listening
Spelling
Language
Problems
& Data Computation
Interpretation
Social
Studies
Science
Sources
of
Information
SI
Vocabulary
Comprehension
RV
RC
WA
30
34
35
k (perfect score)
98.9
98.0
99.9
99.9
95.0
99.0
99.9
99.9
98.8
99.9
99.9
98.7
k-1
95.8
92.8
97.9
99.9
85.9
96.1
98.5
99.0
95.4
99.9
98.4
93.9
FLOOR
Percent of Students
Scoring Below k/n*
5.1
9.3
2.9
1.9
8.0
5.5
1.3
2.7
3.4
1.6
0.7
10.6
COMPLETION RATES
Percent of Students
Completing Test
96
96
98
97
99
98
98
97
79
99
99
98
98
98
98
99
99
99
96
99
93
99
99
98
Number of Items (k)
Concepts
Li
L1
L
M1
M2
M3
SS
SC
31
23
34
29
28
27
31
31
22
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
Reading
Level 8
Grade 2
Mathematics
Word
Analysis
Listening
Spelling
Language
Problems
& Data Computation
Interpretation
Social
Studies
Science
Sources
of
Information
SI
Vocabulary
Comprehension
RV
RC
WA
32
38
38
k (perfect score)
99.9
99.9
99.9
99.9
96.4
99.7
99.9
99.7
99.9
99.9
99.9
99.3
k-1
98.7
96.0
99.3
99.7
89.0
97.7
98.5
97.2
99.3
99.9
99.9
95.8
FLOOR
Percent of Students
Scoring Below k/n*
3.8
2.9
4.2
1.6
3.6
2.4
0.1
2.0
1.8
0.9
1.7
7.1
COMPLETION RATES
Percent of Students
Completing Test
98
98
96
99
99
99
99
98
69
99
99
99
99
99
99
99
99
99
99
99
90
99
99
99
Number of Items (k)
Concepts
Li
L1
L
M1
M2
M3
SS
SC
31
23
42
31
30
30
31
31
28
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
* n = number of answer choices
102
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 103
Table 6.5 (continued)
Ceiling Effects, Floor Effects, and Completion Rates
Iowa Tests of Basic Skills — Complete Battery, Form A (Unweighted Sample)
Spring 2000 National Standardization
Reading
Level 9
Grade 3
Language
Comprehension
RV
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Social
Studies
Science
Word
Analysis
Listening
S2
WA
Li
28
35
31
Maps
Reference
and
Materials
Diagrams
Spelling
Capitalization
Punctuation
Usage &
Expression
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
29
37
28
24
24
30
31
22
25
30
30
24
k (perfect score)
99.7
99.9
98.8
99.9
99.9
99.9
99.9
98.7
99.0
99.9
99.9
99.9
99.9
99.9
99.9
k-1
97.0
99.3
93.8
96.9
98.6
97.8
99.9
94.8
96.0
99.6
99.9
98.8
99.9
97.6
99.9
8.4
10.6
4.0
7.2
7.6
8.1
2.5
6.5
5.8
4.2
9.3
1.9
7.9
2.7
1.0
90
961
962
93
92
92
96
932
96
79
97
95
96
95
99
99
95
981
972
97
96
96
99
952
97
89
98
97
97
97
99
99
Number of Items (k)
Vocabulary
Sources of
Information
Mathematics
CEILING
Percentile Rank of Top Scores
FLOOR
Percent of Students
Scoring Below k/n*
COMPLETION RATES
Percent of Students
Completing Test
Percent of Students
Completing 75% of Test
951
981
Reading
Level 10
Grade 4
Language
Sources of
Information
Mathematics
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
34
41
32
26
26
33
36
24
27
34
34
25
30
k (perfect score)
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.0
99.9
99.9
99.9
99.9
k-1
98.5
99.9
97.8
99.9
99.2
98.8
99.9
97.8
95.9
99.9
99.9
98.1
99.9
FLOOR
Percent of Students
Scoring Below k/n*
8.7
6.4
4.0
6.5
11.1
8.9
2.2
4.3
5.0
4.0
9.6
7.4
7.3
COMPLETION RATES
Percent of Students
Completing Test
91
971
962
94
94
92
97
951
942
96
75
97
95
97
97
96
981
982
97
97
97
99
981
972
97
89
99
98
98
98
Number of Items (k)
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
1
Part 1
Part 2
* n = number of answer choices
2
103
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 104
Table 6.5 (continued)
Ceiling Effects, Floor Effects, and Completion Rates
Iowa Tests of Basic Skills — Complete Battery, Form A (Unweighted Sample)
Spring 2000 National Standardization
Reading
Level 11
Grade 5
Language
Sources of
Information
Mathematics
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
37
43
36
28
28
35
40
26
29
37
37
26
32
k (perfect score)
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
k-1
98.7
99.7
97.8
99.5
98.7
99.9
99.9
98.3
97.5
99.9
99.9
99.0
99.9
FLOOR
Percent of Students
Scoring Below k/n*
7.5
4.6
4.5
5.4
7.0
6.7
2.7
8.0
7.1
6.1
9.4
7.8
5.8
COMPLETION RATES
Percent of Students
Completing Test
93
971
982
95
95
94
97
941
942
97
77
98
96
97
98
97
971
992
97
98
97
99
971
972
98
88
99
98
99
99
Number of Items (k)
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
Reading
Level 12
Grade 6
Language
Sources of
Information
Mathematics
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
39
45
38
30
30
38
43
28
30
39
39
28
34
k (perfect score)
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
k-1
98.1
99.5
98.3
99.9
98.1
99.9
99.9
98.2
98.7
99.9
99.9
99.9
99.9
FLOOR
Percent of Students
Scoring Below k/n*
3.9
5.3
3.4
4.9
7.1
6.1
4.0
4.2
4.8
5.9
6.9
4.8
6.7
COMPLETION RATES
Percent of Students
Completing Test
95
981
972
96
96
96
97
941
912
97
80
97
96
97
97
98
991
982
98
98
98
99
971
952
99
91
98
98
99
99
Number of Items (k)
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
1
Part 1
Part 2
* n = number of answer choices
2
104
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 105
Table 6.5 (continued)
Ceiling Effects, Floor Effects, and Completion Rates
Iowa Tests of Basic Skills — Complete Battery, Form A (Unweighted Sample)
Spring 2000 National Standardization
Reading
Level 13
Grade 7
Language
Sources of
Information
Mathematics
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
41
48
40
32
32
40
46
30
31
41
41
30
36
k (perfect score)
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
k-1
99.1
99.9
98.8
99.9
99.1
99.9
99.9
98.1
98.6
99.9
99.9
99.9
99.9
FLOOR
Percent of Students
Scoring Below k/n*
5.1
6.0
4.0
4.6
7.1
6.1
3.8
5.8
5.3
10.6
9.1
7.4
7.3
COMPLETION RATES
Percent of Students
Completing Test
96
971
982
96
97
96
98
961
942
97
78
96
96
98
97
98
981
992
98
99
98
99
981
962
99
88
98
98
99
98
Social
Studies
Science
Number of Items (k)
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
Reading
Level 14
Grade 8
Language
Sources of
Information
Mathematics
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
42
52
42
34
34
43
49
32
32
43
43
31
38
k (perfect score)
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
99.9
k-1
98.7
99.9
99.9
99.9
99.9
99.9
99.9
98.7
98.9
99.9
99.9
99.9
99.9
FLOOR
Percent of Students
Scoring Below k/n*
4.4
5.3
5.0
5.7
9.0
6.7
8.1
5.7
8.9
7.7
8.5
7.7
7.4
COMPLETION RATES
Percent of Students
Completing Test
97
971
992
97
97
96
97
971
942
96
78
97
97
98
98
98
991
992
98
99
98
99
991
982
98
81
98
98
99
99
Number of Items (k)
CEILING
Percentile Rank of Top Scores
Percent of Students
Completing 75% of Test
1
Part 1
Part 2
* n = number of answer choices
2
105
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 106
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 107
PART 7
Among the most important results from the periodic
use of achievement tests administered under
standard conditions are findings that can be used to
understand the process of social change through
education. The data on national trends in
achievement reported in Part 4 represent one
example of how aggregate data from achievement
tests reflect the social dynamics of education. In
addition, national data on student achievement
have shown the value of disaggregated results.
During the 1980s, for example, the National
Assessment of Educational Progress (NAEP) often
reported fairly stable levels of achievement.
However, dramatic gains were demonstrated by the
national samples of Black and Hispanic students
(Linn & Dunbar, 1990). Although the social reasons
for changes in group differences in achievement are
not always clear, carefully developed tests can
provide a broad view of the influence of school on
such differences.
Various approaches to understanding group
differences in test scores are a regular part of
research and test development efforts for the Iowa
Tests of Basic Skills. To ensure that assessment
materials are appropriate and fair for different
groups, careful test development procedures are
followed. Sensitivity review by content and fairness
committees and extensive statistical analysis of the
items and tests are conducted. The precision of
measurement for important groups in the national
standardization is evaluated when examining the
measurement characteristics of the tests. Differences
between groups in average performance and in the
variability of performance are also of interest, and
these are examined for changes over time. In
addition to descriptions of group differences in test
performance, analyses of differential item functioning
are undertaken with results from the national item
tryout as well as with results from the national
standardization.
Group Differences in Item
and Test Performance
Standard Errors of Measurement
for Groups
The precision of test scores for members of various
demographic groups is a great concern, especially
when test scores are used for purposes of selection or
placement, such as college admissions tests and
other kinds of subject-matter tests. Although
standardized achievement tests such as the ITBS
were not designed to be used in this way, there
is still an interest in the precision with which the
tests place an individual on the developmental
continuum in each content domain. Standard errors
of measurement were presented for this purpose in
Part 5. Table 7.1 reports standard errors of
measurement estimated separately for boys, girls,
Whites, Blacks, and Hispanics based on data from
the 2000 national standardization.
Gender Differences in Achievement
Differences between achievement test scores of girls
and boys have been an ongoing concern in
education. Patterns of test performance have been
used as arguments in favor of a variety of school
reform initiatives aimed at narrowing achievement
gaps (e.g., same-gender classrooms, professional
development to promote gender equity in instruction,
programs that encourage girls to take advanced
math and science classes). These initiatives are
testimony to the importance of gender differences in
achievement test results.
It is well-established that the achievement of girls
in most elementary school subjects, especially those
that emphasize language skills, is higher than that
of boys. Results from the most recent national
standardization of the ITBS continue to document
this finding. Reasons most frequently offered in the
past to explain this situation are that girls receive
more language stimulation in the home; that the
general culture, and especially school culture, sets
higher expectations for girls; and that the social
climate and predominant teaching styles in
elementary school are better suited to the interests
and values of girls than boys (Hoover, 2003).
107
6.1
6.1
7.4
7.4
6.8
6.9
7.1
7.2
8.2
8.3
9.1
9.0
9.8
9.9
10.1
10.3
7 Girls
7 Boys
8 Girls
8 Boys
9 Girls
9 Boys
10 Girls
10 Boys
11 Girls
11 Boys
12 Girls
12 Boys
13 Girls
13 Boys
14 Girls
14 Boys
11.2
11.4
10.7
10.9
10.7
10.7
10.0
9.9
9.2
9.0
7.4
7.3
6.5
6.1
4.4
4.3
4.4
4.4
7.6
7.7
7.2
7.3
7.0
7.0
6.5
6.4
5.8
5.8
5.0
5.0
4.9
4.8
3.8
3.7
2.6
2.6
RT
Reading
Total
11.2
11.1
10.4
10.3
10.3
10.0
9.4
9.1
8.7
8.3
6.7
6.5
6.2
6.3
3.7
3.8
L1
Spelling
20.3
20.0
19.8
19.3
19.1
18.6
16.2
15.8
14.9
14.4
10.9
10.7
L2
Capitalization
18.5
18.5
17.3
17.2
16.0
15.5
14.6
14.4
13.5
13.3
10.9
10.5
L3
Punctuation
17.1
17.0
15.3
15.0
14.3
13.8
13.5
13.1
11.1
10.8
9.6
9.3
L4
8.6
8.5
8.0
7.9
7.6
7.4
6.8
6.7
6.1
6.0
4.8
4.7
6.1
6.0
4.9
4.9
6.0
5.9
4.4
4.3
LT
Usage & Language
Expression
Total
10.3
10.3
9.7
9.8
9.0
9.2
9.1
9.1
8.1
8.1
8.0
8.2
7.6
7.6
6.4
6.5
M1
14.8
14.8
14.4
14.2
13.7
13.7
12.0
12.1
11.1
11.4
9.2
9.2
7.9
8.1
6.9
7.1
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
9.0
9.0
8.7
8.6
8.2
8.3
7.5
7.6
6.9
7.0
6.1
6.1
5.5
5.5
4.7
4.8
5.6
5.6
13.3
13.9
12.9
13.4
11.1
11.2
8.1
8.3
7.1
7.2
6.0
6.2
5.5
5.6
5.0
5.2
M3
MT -
4.5
4.5
Computation
Math
Total
Mathematics
7.5
7.6
7.2
7.3
6.6
6.7
5.7
5.8
5.1
5.2
4.5
4.6
4.1
4.1
3.5
3.6
MT +
Math
Total
4.8
4.9
4.6
4.6
4.4
4.4
4.0
4.0
3.6
3.6
3.1
3.1
3.2
3.2
2.6
2.6
4.1
4.0
3.4
3.4
CT -
Core
Total
4.5
4.6
4.3
4.3
4.1
4.1
3.7
3.6
3.3
3.3
2.8
2.8
2.9
2.9
2.4
2.4
CT +
Core
Total
14.3
14.3
13.8
13.7
13.2
13.1
11.5
11.5
10.1
10.0
8.5
8.5
10.6
10.6
7.8
7.8
SS
Social
Studies
13.5
13.6
12.7
12.7
11.9
11.8
11.1
11.1
10.2
10.2
9.7
9.7
10.8
11.2
9.1
9.3
SC
Science
18.2
18.3
18.3
18.5
16.2
16.2
14.3
14.4
12.4
12.5
10.9
11.2
S1
14.8
14.3
13.0
12.9
12.4
12.2
10.7
10.2
9.4
9.0
7.4
7.3
S2
11.7
11.6
11.2
11.3
10.2
10.2
8.9
8.8
7.8
7.7
6.6
6.7
6.4
6.7
5.2
5.3
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
4.5
4.5
4.3
4.3
4.1
4.0
3.7
3.6
3.3
3.3
2.9
2.9
2.9
2.9
2.3
2.3
CC -
4.4
4.4
4.2
4.2
4.0
4.0
3.6
3.5
3.2
3.2
2.8
2.8
2.8
2.9
2.3
2.3
CC +
Compo- Composite
site
9.3
9.3
7.2
7.1
7.7
7.6
6.5
6.3
WA
Word
Analysis
8.7
8.6
7.5
7.5
6.5
6.4
5.0
4.9
Li
Listening
1.8
1.8
1.4
1.4
2.9
2.9
3.8
3.7
RPT
Reading
Profile
Total
3:15 PM
Note: -Does not include Computation
+Includes Computation
9.0
9.0
6 Girls
6 Boys
RC
RV
8.0
7.9
Comprehension
Language
10/29/10
5 Girls
5 Boys
Level
Gender
108
Vocabulary
Reading
Table 7.1
Standard Errors of Measurement in the Standard Score Metric for ITBS by Level and Gender
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 108
6.3
5.9
6.0
7.4
7.0
7.0
6.9
6.8
7.1
7.2
7.3
7.3
8.2
8.5
8.6
9.1
9.3
9.4
9.7
10.5
10.5
10.1
11.0
11.0
7 White
7 Black
7 Hispanic
8 White
8 Black
8 Hispanic
9 White
9 Black
9 Hispanic
10 White
10 Black
10 Hispanic
11 White
11 Black
11 Hispanic
12 White
12 Black
12 Hispanic
13 White
13 Black
13 Hispanic
14 White
14 Black
14 Hispanic
11.3
11.9
11.7
10.8
11.1
11.0
10.7
10.6
10.7
10.0
9.8
9.8
9.1
8.9
9.0
7.3
6.9
7.1
6.3
5.5
5.6
4.4
4.2
4.0
4.4
4.3
4.4
RC
7.6
8.1
8.0
7.3
7.6
7.6
7.0
7.1
7.1
6.5
6.5
6.5
5.8
5.8
5.8
5.0
4.9
5.0
4.9
4.5
4.5
3.8
3.6
3.6
2.6
2.6
2.7
RT
Reading
Total
11.1
10.9
11.0
10.2
10.2
10.1
10.2
9.8
9.9
9.3
8.9
8.8
8.4
8.3
8.1
6.4
6.3
6.4
6.2
6.1
6.4
3.7
3.9
4.0
L1
Spelling
20.1
19.2
19.8
19.5
18.8
18.9
18.9
18.1
18.1
16.0
15.4
15.3
14.5
13.6
14.1
10.7
10.3
10.1
L2
Capitalization
18.5
18.3
18.5
17.2
16.9
17.2
15.6
15.1
15.4
14.7
14.3
14.3
13.3
12.9
13.1
10.7
9.8
10.2
L3
Punctuation
16.9
16.9
16.9
15.2
14.7
14.7
14.1
13.3
13.7
13.2
12.5
12.9
10.9
10.2
10.6
9.4
8.5
8.8
L4
8.5
8.3
8.4
8.0
7.7
7.8
7.5
7.2
7.3
6.8
6.5
6.5
6.0
5.7
5.9
4.7
4.4
4.5
5.9
5.8
5.8
5.0
4.6
4.7
5.9
5.4
5.5
4.4
4.2
4.1
LT
10.2
10.8
10.7
9.7
9.9
9.9
9.1
9.0
9.1
9.0
9.1
9.1
8.1
8.0
8.0
8.1
7.8
7.8
7.6
7.6
7.7
6.6
6.3
6.3
M1
14.8
15.1
15.1
14.3
14.3
14.5
13.7
13.2
13.4
12.1
11.9
12.0
11.2
11.0
11.1
9.2
8.9
9.1
7.9
8.1
8.1
7.0
7.1
7.0
M2
9.0
9.3
9.2
8.7
8.7
8.8
8.2
8.0
8.1
7.5
7.5
7.5
6.9
6.8
6.8
6.1
5.9
6.0
5.5
5.5
5.6
4.8
4.7
4.7
5.6
5.5
5.4
4.5
4.4
4.4
MT -
Math
Total
13.6
14.3
14.2
13.5
13.6
13.6
11.1
10.9
11.1
8.1
8.1
8.1
7.2
7.1
7.1
6.1
6.0
6.1
5.5
5.4
5.5
5.0
4.5
4.8
M3
Computation
7.5
7.8
7.8
7.3
7.4
7.4
6.6
6.4
6.5
5.7
5.7
5.7
5.2
5.1
5.1
4.6
4.4
4.5
4.1
4.1
4.2
3.6
3.5
3.5
MT +
Math
Total
4.8
5.0
5.0
4.6
4.6
4.7
4.4
4.3
4.3
4.0
3.9
4.0
3.6
3.5
3.6
3.1
3.0
3.0
3.1
3.1
3.1
2.6
2.5
2.5
4.0
3.9
3.8
3.4
3.3
3.1
CT -
Core
Total
4.6
4.7
4.7
4.3
4.4
4.4
4.1
4.0
4.0
3.7
3.6
3.6
3.3
3.2
3.2
2.8
2.6
2.7
2.9
2.8
2.8
2.4
2.3
2.3
CT +
Core
Total
14.3
14.5
14.5
13.7
13.8
13.8
13.2
12.8
12.9
11.6
11.3
11.4
10.1
9.6
9.8
8.6
8.2
8.2
10.7
10.2
10.4
8.0
7.2
7.1
SS
Social
Studies
13.5
13.6
13.6
12.7
12.5
12.6
11.9
11.4
11.6
11.1
10.5
10.9
10.2
9.6
9.9
9.7
9.0
9.3
11.0
9.9
10.2
9.8
8.9
8.2
SC
Science
18.3
18.2
18.6
18.5
18.3
18.5
16.3
15.9
16.1
14.4
14.2
14.3
12.5
12.0
12.2
11.0
10.1
10.6
S1
14.3
13.9
13.9
13.0
12.5
12.6
12.3
11.6
11.8
10.3
9.7
10.2
9.2
8.6
8.9
7.2
6.9
7.0
S2
11.6
11.5
11.6
11.3
11.0
11.2
10.2
9.8
10.0
8.8
8.6
8.8
7.8
7.4
7.6
6.6
6.1
6.3
6.6
6.3
6.3
5.3
5.2
5.2
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
4.5
4.6
4.6
4.3
4.3
4.3
4.1
3.9
4.0
3.7
3.5
3.6
3.3
3.1
3.2
2.9
2.7
2.8
2.9
2.7
2.8
2.4
2.2
2.2
CC -
4.4
4.5
4.5
4.2
4.2
4.2
4.0
3.9
3.9
3.6
3.4
3.5
3.2
3.0
3.1
2.8
2.7
2.7
2.8
2.7
2.7
2.3
2.2
2.1
CC +
Compo- Composite
site
9.2
9.1
9.3
7.3
6.8
7.1
7.5
7.3
7.2
6.5
6.5
6.1
WA
Word
Analysis
8.7
8.3
8.4
7.6
7.1
7.2
6.4
6.2
6.2
5.0
5.0
4.8
Li
Listening
1.8
1.7
1.7
1.4
1.3
1.4
2.9
2.8
2.8
3.9
3.7
3.5
RPT
Reading
Profile
Total
3:15 PM
Note: -Does not include Computation
+Includes Computation
9.0
8.6
8.4
6 White
6 Black
6 Hispanic
RV
8.2
7.7
7.0
Group
Comprehension
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
10/29/10
5 White
5 Black
5 Hispanic
Level
Vocabulary
Reading
Table 7.1 (continued)
Standard Errors of Measurement in the Standard Score Metric for ITBS by Level and Group
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 109
109
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 110
At the same time, the relatively higher achievement
of boys in advanced math and science curricula in
the upper grades is a common observation. Group
differences in test scores across the content domains
and grade levels of the ITBS have been monitored
for many years and continue to shed light on the
developmental nature of performance differences.
Table 7.2 presents mean differences between the
achievement of boys and girls in kindergarten through
grade 8. The differences are based on results from all
of the students participating in the 2000 national
standardization whose gender was coded on the
answer document (99% of the sample). To examine
gender differences in test scores, the frequency
distributions of scores for boys and girls were obtained
and differences at various points on the distributions
were examined with an effect-size statistic. The effect
size was defined as the difference between the group
statistics (girls minus boys) in total-sample standard
deviation units. For example, an effect size for the
group means would equal the difference between the
boys’ mean and the girls’ mean divided by the totalsample standard deviation (SD). As the table shows, on
the average girls were markedly higher in
achievement in reading comprehension, language,
computation, and reference skills. In addition, the
difference between scores earned by boys and girls
tended to increase with grade level.
The complex nature of gender differences doesn’t
necessarily appear when means are compared
(Hoover, 2003). Many of the conflicting results
previously reported concerning gender differences are
explained by consideration of the entire distribution of
achievement. The results of an analysis of differences
between the score distributions of boys and girls in
grades 4 and 8 is presented in Table 7.3. The upper
part of the table contains means and SDs of boys and
girls for each test and composite score in the battery as
well as mean effect sizes and ratios of SDs for the
groups. The lower part of the table contains effect sizes
computed at three points in the distributions, the 10th,
50th, and 90th percentiles. Complete data for all
grades are available from the Iowa Testing Programs.
Overall, the performance of boys tends to be more
variable than that of girls. This shows up in
differences between the standard deviations for
the two groups as well as differences in the scores
at selected percentiles. For example, when
examining the means, boys and girls do not appear
to differ greatly in science at grade 4. Because the
performance of boys shows more variability (SDB =
30.2, SDG = 28.1) however, the highest scoring boys
do better than the highest scoring girls (Effect90%ile=
⫺.10) and the lowest scoring boys do worse than the
110
lowest scoring girls (Effect10%ile= +.10). Differences
in the other areas were not as large but still favored
girls, especially at the middle and lower
achievement levels. At the upper achievement
levels, the performance of boys tended to equal or
surpass that of girls in all achievement areas except
Language, Reference Materials, and Math
Computation. In Vocabulary, Math Problem Solving,
Social Studies, Science, and Maps and Diagrams, the
“crossover” point where equality of achievement by
gender occurs is slightly above the median in most
grades.
Similar data on gender differences were obtained for
kindergarten through grade 8 in the 1977, 1984, and
1992 national standardizations, and for grades
3 through 8 in 1963 and 1970. Gender differences for
all national standardizations since 1963 are
summarized by total score in Table 7.4. Note that
the differences in Table 7.4 are expressed in months
on the grade-equivalent scale. The direction and
magnitude of the differences have remained stable;
however, several trends are noteworthy. In Reading
since 1992, differences at the 90th percentile have
been greater than they were earlier. In Language,
the tendency for differences favoring girls to
increase in magnitude with grade level, especially at
the 10th and 50th percentiles, continued in the 2000
results. In Math, gender differences at the median
remained small in 2000; at the 10th percentile, they
were near the 1977 peak or exceeded it at some
grade levels. Gender differences in composite
performance in 2000 were smaller at the 90th and
the 50th percentile than at the 10th percentile.
The educational implications of these differences
across test areas, achievement levels, and time are not
immediately apparent. The importance of
quantitative skills for success in high school and
college and in employment has been recognized in
educational policy. Attention has been focused on the
gap between boys and girls in quantitative skills,
especially for select groups of students; the evidence
across several national standardizations of the ITBS
suggests the magnitude of gender differences in math
concepts and problem-solving skills has been reduced.
Similar attention has not been targeted at
differences in language skills. In view of the
importance of language skills in high school
achievement and in predicting college and
vocational success, greater emphasis should be
placed on language arts programs that engage the
attention of boys. The fact that gender differences in
achievement are relatively small or nonexistent in
kindergarten and first grade raises questions about
the influence of school on such differences.
-.03
.12
.03
.05
.01
.01
-.04
.01
.08
6
7
8
9
10
11
12
13
14
.16
.18
.17
.16
.14
.15
.19
.18
.09
RC
.13
.11
.08
.10
.09
.11
.11
.16
.10
RT
Reading
Total
.27
.32
.26
.31
.28
.25
.27
.22
L1
Spelling
.37
.31
.28
.30
.28
.23
L2
Capitalization
.47
.45
.41
.41
.28
.23
L3
Punctuation
.40
.32
.29
.27
.18
.19
L4
.43
.40
.36
.37
.29
.26
.27
.24
.08
.14
LT
-.06
-.01
-.03
-.01
-.06
-.09
-.16
-.05
M1
.05
.05
.01
.06
.01
.04
-.11
.01
M2
.00
.02
-.01
.03
-.02
-.02
-.15
-.02
.00
.08
MT -
Math
Total
.25
.24
.25
.25
.17
.07
-.02
-.10
M3
Computation
.09
.10
.08
.10
.04
.00
-.12
-.04
MT +
Math
Total
.22
.21
.17
.19
.14
.13
.09
.14
.01
.10
CT -
Core
Total
.25
.24
.20
.22
.16
.14
.11
.14
CT +
Core
Total
.07
.10
.03
-.04
-.02
.00
-.17
-.05
SS
Social
Studies
.13
.11
.08
.01
.00
-.01
-.25
-.03
SC
Science
.01
.01
-.10
-.03
-.01
-.03
S1
.32
.27
.22
.27
.21
.14
S2
.16
.14
.05
.11
.09
.05
.00
.08
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.18
.17
.11
.11
.08
.07
.00
.07
CC -
.16
.15
.10
.09
.07
.06
.01
.07
CC +
Compo- Composite
site
.11
.09
.09
.15
WA
Word
Analysis
.10
.03
.05
.09
Li
Listening
.16
.15
.07
.12
RPT
Reading
Profile
Total
3:15 PM
Note: Positive differences favor girls.
-Does not include Computation
+Includes Computation
.07
5
RV
Comprehension
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
10/29/10
Level
Vocabulary
Reading
Table 7.2
Male-Female Effect Sizes for Average Achievement
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 111
111
22.3
.01
1.03
-.03
.01
.04
SD
d**
R***
Difference** 90
50
10
247.8
31.7
250.6
29.2
.09
1.08
-.01
.06
.27
Male Mean*
SD
Female Mean
SD
d**
R***
Difference** 90
50
10
.31
.17
-.01
1.09
.16
39.2
253.3
42.8
246.5
.16
.14
.15
1.01
.14
28.4
207.8
28.8
203.7
.25
.15
.00
1.09
.14
32.5
252.0
35.3
247.3
.11
.09
.07
1.02
.09
23.9
205.2
24.4
203.0
RT
Reading
Total
.41
.29
.17
1.07
.28
33.7
255.8
36.2
245.9
.30
.28
.35
1.00
.28
25.0
207.2
24.9
200.1
L1
Spelling
.35
.44
-.33
1.03
.37
49.0
262.8
50.4
244.5
.32
.24
.29
.99
.28
35.7
211.0
35.4
200.9
L2
Capitalization
.42
.58
.28
1.04
.47
48.7
265.2
50.5
242.0
.24
.28
.30
.98
.28
34.1
211.6
33.5
202.0
L3
.41
.48
.30
1.03
.41
50.2
262.5
51.4
241.9
.22
.19
.14
1.01
.18
33.8
211.0
34.3
204.9
L4
.42
.50
.32
1.04
.44
40.4
261.8
42.1
243.8
.30
.30
.28
1.00
.29
27.9
210.3
27.9
202.3
LT
Punctu- Usage & Language
ation
Expression
Total
.08
-.02
-.16
1.10
-.04
31.6
251.3
34.6
252.5
.08
-.05
-.15
1.08
-.05
21.7
201.6
23.3
202.7
M1
.18
.08
-.11
1.12
.06
39.6
253.0
44.3
250.5
.15
.04
-.15
1.10
.02
26.7
205.4
29.4
204.8
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
25.0
.14
.03
-.15
1.12
.02
33.7
252.3
37.6
251.6
.12
-.01
-.16
1.09
-.01
22.9
203.5
.28
.32
.17
1.04
.27
35.9
255.6
37.2
245.9
.25
.17
.15
1.03
.19
20.1
203.7
20.6
199.9
M3
MT -
203.8
Computation
Math
Total
Mathematics
.21
.15
-.10
1.11
.11
31.0
253.5
34.3
249.9
.18
.05
-.08
1.08
.05
20.2
203.7
21.8
202.7
MT +
Math
Total
.28
.27
.08
1.07
.22
32.9
255.7
35.3
248.2
.19
.15
.10
1.04
.14
23.0
206.6
23.8
203.3
CT -
Core
Total
.31
.31
.13
1.07
.26
31.8
256.2
34.1
247.7
.21
.18
.13
1.03
.16
22.1
206.7
22.8
203.0
CT +
Core
Total
.25
.16
-.16
1.15
.09
38.7
251.4
44.4
247.8
.14
-.05
-.11
1.08
-.02
24.9
204.3
26.8
204.7
SS
Social
Studies
.23
.20
-.10
1.12
.13
39.7
255.2
44.3
249.8
.10
.02
-.10
1.07
.01
28.1
206.6
30.2
206.3
SC
Science
.15
.02
-.16
1.13
.02
43.3
252.9
48.7
252.2
.11
-.01
-.13
1.08
-.02
29.5
206.3
31.9
206.8
S1
.37
.38
.18
1.06
.33
38.7
258.7
41.0
245.7
.20
.23
.20
1.01
.22
25.0
207.6
25.2
202.2
S2
.26
.20
-.02
1.10
.17
38.2
255.9
42.0
249.0
.18
.09
.00
1.05
.10
25.3
207.0
26.7
204.5
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.25
.23
.00
1.10
.18
33.5
255.5
36.9
249.3
.16
.09
.00
1.06
.08
22.9
206.5
24.2
204.6
CC -
.25
.22
-.04
1.11
.16
33.2
255.2
36.8
249.6
.16
.07
-.02
1.06
.07
22.8
206.5
24.2
204.9
CC +
Compo- Composite
site
3:15 PM
Grade 8
22.9
202.4
SD
Female Mean
202.2
RC
RV
Male Mean*
Grade 4
Comprehension
Language
10/29/10
* Means and SDs are in standard score units.
** d = (Female mean-Male mean)/average SD of Male and Female
*** R = Male SD/Female SD
-Does not include Computation
+Includes Computation
112
Vocabulary
Reading
Table 7.3
Descriptive Statistics by Gender
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 112
Note:
10
50
2
1
1
2
6
5
4
3
0
1
0
1
1
0
-1
2
2
1
1
0
0
0
-1
-1
0
0
-1
0
0
0
0
-3
-3
-2
-1
-2
0
1
1
0
1
1
1
1
4
5
0
1
0
1
0
0
-1
0
1
0
2
0
0
0
0
-1
-1
1
2
3
3
1
2
2
2
2
3
2
1
2
0
1
1
1
-1
0
1
2
2
2
5
4
3
3
3
4
4
4
3
3
2
1
0
2
1
1
1
2
3
3
5
6
2
2
2
2
6
4
5
6
2
1
1
0
0
1
1
1
1
2
3
4
4
4
6
8
1
2
2
2
3
3
4
6
2
2
2
0
0
1
1
1
1
1
2
5
6
6
6
3
3
1
2
2
4
4
3
4
2
2
2
2
3
3
3
0
1
3
2
2
3
4
6
6
2
2
2
2
3
4
4
4
2
2
1
2
3
2
0
In grade-equivalent units — “months,” positive differences favor girls.
0
0
0
1
1
1
1
0
K
1
2
2
2
1
1
2
3
3
1
-1
0
0
1
1
1
0
-1
-1
-2
-1
-1
0
1
-1
-1
-2
-3
-1
1
1
3
2
1
4
2
6
2
0
7
5
3
2
8
3
0
K
5
1
1
3
1
1
1
1
1
2
2
2
2
2
0
-1
1
0
1
0
7
0
8
2
-1
-1
-1
0
3
1
1
K
0
4
-2
0
0
5
0
-3
-4
1
0
6
0
0
0
0
’63 ’70 ’77 ’84 ’92 ’00
Reading
7
4
5
5
6
6
4
0
2
4
4
5
5
5
5
2
2
5
6
5
6
5
6
7
1
3
3
4
6
6
4
6
7
8 10
1
1
2
3
4
5
6
7
8
1
3
5
5
7
8
9
10 11 10
9 11 11
5
5
4
6
9
6
5
1
2
3
4
4
5
7
0
0
1
2
3
4
6
7
1
1
2
3
4
7
7
8
9 10
1
0
3
4
4
5
6
7
2
2
3
3
4
6
5
6
2
2
3
3
4
5
6
7
9
1
2
3
4
4
6
7
9
9 10
2
3
2
2
2
3
4
6
8 10
9
1
0
3
3
3
4
5
5
5
’63 ’70 ’77 ’84 ’92 ’00
Total
Language
1
1
1
1
0
0
1
2
1
2
1
0
1
1
1
1
2
-2
1
1
1
1
1
1
1
1
1
1
1
1
1
-1
-1
-1
-2
-2
0
0
1
1
2
2
2
3
4
0
0
1
1
1
2
3
4
5
-1
-1
0
0
0
0
0
0
0
0
0
0
1
1
2
2
3
3
1
0
0
0
1
1
2
2
1
0
0
0
1
1
1
3
5
5
4
3
0
0
0
0
0
1
2
2
0
0
0
-1 -2
0 -1
-1 -2
-1 -1
-1 -1
-1 -1
-1 -3
1
1
0
1
2
2
3
4
3
0
0
-1
0
0
2
1
2
2
0
-1
-1
-1
-1
0
-2
-1
-2
’63 ’70 ’77 ’84 ’92 ’00
Total
Mathematics
1
1
1
3
0
1
2
1
1
3
1
2
1
-1
-1
1
1
-1
1
1
1
3
3
3
2
2
2
2
3
4
0
0
0
0
0
-1
2
2
3
3
4
4
5
2
2
2
3
3
4
6
0
0
-1
-1
0
0
0
0
5
1
2
3
3
3
3
6
1
1
2
2
3
3
1
1
2
3
4
3
4
4
1
0
1
1
1
3
4
4
0
1 -2
0 -1
-1 -2
-1 -1
0 -1
-1
-1 -2
1
1
3
2
3
3
4
5
1
0
1
1
2
1
4
4
1
-2
-1
0
0
-2
-1
0
’63 ’70 ’77 ’84 ’92 ’00
Total
Work Study
1
1
1
2
1
2
3
3
3
4
3
2
1
1
1
1
3
0
1
1
1
2
2
2
3
3
3
4
5
4
2
2
2
1
1
1
0
1
1
2
2
3
3
4
5
0
1
2
3
3
3
4
5
5
0
1
1
0
1
1
1
1
1
0
0
1
2
3
2
3
4
5
1
0
1
2
2
2
2
3
4
0
1
2
1
0
0
1
0
0
1
1
1
2
2
3
4
4
3
1
1
1
1
1
2
2
3
3
1
1
0
0
0
0
1
1
0
1
1
0
2
2
2
3
4
4
1
1
0
1
1
1
2
3
4
0
1
0
0
0
0
0
0
-1
’63 ’70 ’77 ’84 ’92 ’00
Composite
3:15 PM
2
0
7
’63 ’70 ’77 ’84 ’92 ’00
8
Grade
Vocabulary
10/29/10
90
Percentile
Rank
Table 7.4
Gender Differences in Achievement over Time
Iowa Tests of Basic Skills
1963–2000 National Standardization Data
961464_ITBS_GuidetoRD.qxp
Page 113
113
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 114
Racial-Ethnic Differences in Achievement
Differences between the average test scores of
different racial-ethnic groups, specifically the gaps
between average scores for Blacks and Whites, and
Hispanics and Whites, have been an ongoing
concern in achievement testing. Results from
national assessments such as NAEP have
consistently shown such gaps in performance.
Monitoring changes in the achievement gap over
time is an important function of achievement tests.
Historical data from The Iowa Tests allow only
limited comparisons across time. Table 7.5 shows
the differences in performance of fifth-grade
students (Whites and Blacks) in national
standardization studies since 1977. Mean
differences are reported as the effect-size statistic
(White mean minus Black mean divided by the
pooled group standard deviation). These results
support the conclusion that differences between
Whites and Blacks have narrowed somewhat,
particularly in reading and language skills. The gap
continues, however, and the problem of group
differences in achievement remains formidable.
Group differences in the 2000 national
standardization of The Iowa Tests are consistent
with the differences observed elsewhere.
Specifically, Whites outperform Blacks and
Hispanics in terms of average performance in all
subjects tested. Although the size of the gap varies
by subject area, the averages can differ by more
than one-half of a standard deviation. Table 7.6
reports effect sizes for these average differences.
114
Table 7.5
Race Differences in Achievement
Iowa Tests of Basic Skills, Grade 5
Balanced National Standardization Samples,
1977–2000
Form
Year
Vocabulary
Reading
Language
Spelling
Capitalization
Punctuation
Usage & Expression
Mathematics
Concepts & Estimation
Problem Solving
& Data Interpretation
Computation
Sources of Information
Maps & Diagrams
Reference Materials
Note:
7
1977
G
1984
K
1992
A
2000
.84
.80
.68
.58
.72
.63
.54
.54
.47
.62
.66
.87
.22
.42
.50
.59
.19
.35
.44
.63
.17
.27
.35
.54
.67
.56
.61
.58
.63
.30
.78
.25
.64
.29
.58
.20
.80
.65
.62
.52
.62
.48
.55
.41
Effect size = (White mean-Black mean) / total SD.
It is important not to confuse these effect sizes with
the term “bias” as defined in the 1999 Standards for
Educational and Psychological Testing. Statistical
bias is addressed by research related to differential
item functioning or prediction differences due to
group membership. Note that the effect sizes
reported here are consistent with NAEP findings of
smaller group differences during the early 1990s,
but that finding is not uniform across subtests.
-.55
-.59
-.78
-.65
13
14
5
6
-.63
-.51
-.54
11
12
13
14
-.48
-.42
-.51
-.50
-.36
-.41
-.45
-.41
-.21
-.63
-.58
-.64
-.54
-.49
-.54
-.49
-.60
-.58
-.47
-.56
-.59
-.56
-.24
-.65
-.60
-.70
-.58
-.58
-.56
-.48
-.53
-.19
RT
Reading
Total
-.35
-.26
-.34
-.31
-.17
-.22
-.30
-.31
-.23
-.12
-.19
-.17
-.11
-.05
-.11
-.27
L1
Spelling
Note: Positive differences favor Whites.
-Does not include Computation
+Includes Computation
-.55
-.59
10
-.66
-.67
12
9
-.53
11
-.64
-.62
-.62
-.59
9
10
8
-.47
-.48
8
7
-.41
-.57
7
-.41
-.41
-.17
-.46
RC
6
RV
Comprehension
5
Level
Vocabulary
Reading
-.40
-.36
-.43
-.30
-.22
-.27
-.45
-.43
-.39
-.26
-.35
-.23
L2
Capitalization
-.43
-.32
-.41
-.32
-.24
-.27
-.42
-.43
-.46
-.35
-.39
-.39
L3
Punctuation
-.49
-.47
-.48
-.50
-.36
-.38
-.56
-.57
-.60
-.54
-.54
-.54
L4
-.48
-.40
-.48
-.41
-.29
-.34
-.43
-.44
-.54
-.68
-.48
-.47
-.49
-.39
-.42
-.37
-.45
-.44
-.47
-.57
LT
Usage & Language
Expression
Total
Language
-.46
-.45
-.47
-.47
-.39
-.45
-.44
-.61
-.64
-.63
-.59
-.58
-.60
-.57
-.56
-.65
M1
-.48
-.45
-.48
-.47
-.35
-.38
-.37
-.47
-.71
-.67
-.66
-.58
-.54
-.54
-.54
-.60
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
-.50
-.48
-.50
-.50
-.39
-.44
-.43
-.58
-.43
-.67
-.72
-.69
-.67
-.61
-.60
-.59
-.59
-.67
-.42
-.51
MT -
Math
Total
-.20
-.18
-.26
-.19
-.16
-.15
-.21
-.30
-.26
-.21
-.27
-.20
-.29
-.22
-.33
-.47
M3
Computation
Mathematics
-.44
-.41
-.46
-.44
-.35
-.39
-.39
-.55
-.62
-.58
-.58
-.53
-.55
-.52
-.55
-.67
MT +
Math
Total
-.55
-.49
-.57
-.53
-.41
-.48
-.55
-.59
-.63
-.83
-.66
-.63
-.66
-.56
-.57
-.55
-.56
-.61
-.49
-.58
CT -
Core
Total
-.53
-.47
-.56
-.51
-.40
-.47
-.54
-.56
-.63
-.59
-.63
-.54
-.56
-.52
-.55
-.61
CT +
Core
Total
-.43
-.41
-.52
-.39
-.36
-.46
-.39
-.56
-.57
-.53
-.61
-.50
-.57
-.46
-.46
-.60
SS
Social
Studies
-.43
-.48
-.55
-.47
-.40
-.40
-.50
-.61
-.58
-.58
-.66
-.59
-.60
-.56
-.69
-.55
SC
Science
-.42
-.40
-.46
-.46
-.33
-.37
-.61
-.59
-.59
-.54
-.57
-.50
S1
-.48
-.35
-.42
-.35
-.24
-.32
-.48
-.50
-.52
-.40
-.40
-.41
S2
-.48
-.41
-.48
-.44
-.31
-.37
-.43
-.45
-.59
-.59
-.60
-.51
-.53
-.50
-.48
-.47
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
Table 7.6
Effect Sizes for Racial-Ethnic Differences in Average Achievement
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
-.53
-.49
-.58
-.51
-.41
-.48
-.58
-.65
-.66
-.64
-.69
-.59
-.62
-.57
-.65
-.69
CC -
-.52
-.49
-.58
-.51
-.41
-.47
-.58
-.65
-.66
-.63
-.68
-.59
-.62
-.56
-.65
-.69
CC +
Compo- Composite
site
-.40
-.41
-.36
-.48
-.42
-.47
-.29
-.26
WA
Word
Analysis
-.56
-.55
-.52
-.61
-.65
-.69
-.48
-.55
Li
Listening
-.57
-.55
-.48
-.74
-.50
-.59
-.37
-.48
RPT
Reading
Profile
Total
3:15 PM
Black –
10/29/10
Hispanic – White
961464_ITBS_GuidetoRD.qxp
Page 115
115
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 116
Differential Item Functioning
In developing materials for all forms of The Iowa
Tests, attention is paid to writing questions in
contexts accessible to students with a variety of
backgrounds and interests. Obviously, it is
impossible for all stimulus materials to be equally
interesting to all students. Nevertheless, a goal of all
test development in the Iowa Testing Programs is to
assemble test materials that reflect the diversity of
the test-taking population in the United States. In
pursuing this goal, all proposed stimulus materials
and test items are reviewed for appropriateness and
evaluated statistically for differential item
functioning (DIF).
Numerous research studies and editorial efforts
examined the presence of differential item
functioning in tryout materials and final forms of
the Iowa Tests of Basic Skills (e.g., Dunbar, Ordman,
& Mengeling, 2002). During item development,
original test materials and questions were written
by individuals from diverse backgrounds and
subjected to editorial scrutiny by the staff of the
Iowa Testing Programs and by representatives of
the publisher. During this phase of development,
items that portray situations markedly unfamiliar
to most students because of socio-cultural factors
were revised or removed from the pool of items
considered for tryout units. Educators also
evaluated test items for perceived fairness and
cultural sensitivity as well as for balance in
regional,
urban-rural,
and
male-female
representativeness. The educators were selected to
represent Blacks, Whites, Hispanics, American
Indians, and Asians. Members of the review panels
for Forms A and B are listed in Table 7.7.
Reviewers were given information about the
philosophical foundations of the tests, the skill areas
or classifications that were to be reviewed, and other
116
general information about the instruments. Along
with the sets of items, reviewers were asked to look
for possible racial-ethnic, regional, cultural, or
gender biases in the way the item was written or in
the information required to answer the question.
The reviewers rated items as “probably fair,’’
“possibly unfair,” or “probably unfair” and to
comment on the balance of the items and make
recommendations for change. Based on these
reviews and the statistical analysis of DIF, items
identified by the reviewers as problematic were
either revised to eliminate objectionable features or
eliminated from consideration for the final forms.
The statistical analysis of items for DIF were based
on variants of the Mantel-Haenszel procedure
(Dorans & Holland, 1993). The analysis of items in
the final editions of Forms A and B was conducted
with data from the 2000 national standardization
sample. Specific item-level comparisons of
performance were made for groups of males and
females, Blacks and Whites, and Hispanics and
Whites.
The sampling approach for DIF analysis, which was
developed by Coffman and Hoover, is described in
Witt, Ankenmann, and Dunbar (1996). For each
subtest area and level, samples of students from
comparison groups were matched by school building.
Specifically, the building-matched sample for each
grade level was formed by including, for each school,
all students in whichever group constituted the
minority for that school and an equal number of
randomly selected majority students from the same
school. This method of sampling attempts to control
for response differences between focal and reference
groups related to the influence of school curriculum
and environment.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 117
Table 7.7
Fairness Reviewers
Iowa Tests of Basic Skills, Forms A and B
Name
Position
Address
Gender
Ethnicity
A. Yvette
Alvarez-Rooney
Bilingual Diagnostician
Shaw Elementary School
Phoenix, AZ
Female
Hispanic
Richard Botteicher
Assistant Superintendent
Spring Cove School District
Roaring Spring, PA
Male
White
Monte E. Dawson
Director of Monitoring & Evaluation
Alexandria City Public Schools
Alexandria, VA
Male
Black
Hoi Doan
Middle School Teacher
The International Center
Chamblee, GA
Male
Asian
Dr. Todd Fletcher
Assistant Professor
Department of Special Education
University of Arizona
Tucson, AZ
Male
White
Jaime Garcia
Principal
Sunnydale Elementary School
Streamwood, IL
Male
Hispanic
Alfredo A. Gavito
Director of Research
Houston Independent School District
Houston, TX
Male
Hispanic
Paul Guevara
Director of Teacher Support,
Program Compliance and
Community Outreach
Merced City School District
Merced, CA
Male
Hispanic
José Jimenez
Director, Bilingual & World Languages
Camden City School District
Camden, NJ
Male
Hispanic
LaUanah King-Cassell
Principal
St. James & St. John School
Baltimore, MD
Female
Black
Viola LaFontaine, Ph.D.
Superintendent
Belcourt School District #7
Belcourt, ND
Female
American
Indian
Theresa C. Liu, Ph.D.
School Psychology Supervisor
Milwaukee Public Schools
Milwaukee, WI
Female
Asian
Thelma J. Longboat
Program Coordinator
Black Rock Academy Public School #51
Buffalo, NY
Female
American
Indian
Patti Luke
Title I Coordinating Teacher
Arbor Heights Elementary School
Seattle, WA
Female
Asian
Carlos G. Manrique
Assistant Superintendent
El Monte City School District
El Monte, CA
Male
Hispanic
Koko Mikel
Grade 4 Teacher
Sand Lake Elementary School
Anchorage, AK
Female
Asian
Joseph Montecalvo
Planning/Test Specialist
Fairfax County Public Schools
Falls Church, VA
Male
White
Francesca Nguyen
Grade 7 Social Studies Teacher
Nichols Junior High School
Biloxi, MS
Female
Asian
117
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 118
Table 7.7 (continued)
Fairness Reviewers
Iowa Tests of Basic Skills, Forms A and B
Name
Position
Address
Gender
Ethnicity
Michael A. O’Hara
Assistant Superintendent
St. Joseph Public Schools
St. Joseph, MI
Male
White
Margherita Patrick
Grade 1 Teacher
Spring River Elementary School
Gautier, MS
Female
Hispanic
Joseph Prewitt-Diaz, Ph.D. School Psychologist
Chester Upland School District
Chester, PA
Male
Hispanic
Evelyn Reed
Director, Test & Research Department
Dallas Independent School District
Dallas, TX
Female
Black
Rosa Sailes
Medill Training Center
Chicago Public Schools
Chicago, IL
Female
Hispanic
Susan J. Sharp
Educational Consultant
Midlothian, TX
Female
White
Samuel E. Spaght
Associate Superintendent
Curriculum Delivery Services
Wichita Public Schools
Wichita, KS
Male
Black
Harry D. Stratigos, Ed.D.
Math Education Advisor
Pennsylvania Department of Education
Harrisburg, PA
Male
White
John C. Swann, Jr.
Supervisor of Testing
Dayton Public Schools
Dayton, Ohio
Male
Black
Shelby Tallchief
Administrator
Title IX Federal Indian Education Program
Albuquerque Public Schools
Albuquerque, NM
Male
American
Indian
Lawrence Thompson
Guidance Counselor
Big Beaver Falls Area Middle School
Beaver Falls, PA
Male
Black
Rosanna Tubby-Nickey
Choctaw Language Specialist
Choctaw Tribal Schools
Philadelphia, MS
Female
American
Indian
Doris Tyler, Ed.D.
Evaluation Specialist
Wake County Public Schools
Raleigh, NC
Female
Black
Marguerite L. Vellos
Dean of Instruction
Farragut Career Academy
Chicago, IL
Female
Hispanic
Robert C. West, Ph.D.
Testing and Evaluation Consultant
Macomb Intermediate School District
Clinton Township, MI
Male
White
Margaret Winstead
Reading Coordinator
Moore Public Schools
Moore, OK
Female
American
Indian
Debra F. Wynn
Coordinator of Assessment (retired)
Harrison School District
Colorado Springs, CO
Female
Black
Youssef Yomtoob, Ph.D.
Superintendent
Hawthorn School #73
Vernon Hills, IL
Male
White
Liru Zhang, Ph.D.
Education Associate
Assessments & Accountability
Delaware Department of Education
Dover, DE
Female
Asian
118
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 119
Table 7.8 shows a summary of the results of DIF
analyses conducted for the items included in the
final edition of Form A. The main columns of the
table indicate the number of items identified as
favoring a given group according to the classification
scheme used by the Educational Testing Service for
NAEP. In this classification scheme, items are first
flagged for statistical significance, and then for DIF
effect sizes large enough that their impact on total
scores should be considered. In the study of items
from Form A, statistical significance levels for each
subtest were adjusted for the total number of items
on the test. DIF magnitudes that were flagged
correspond roughly to a conditional group difference
of .15 or greater on the proportion correct scale
(Category C in the NAEP scheme).
A total of 3,759 test items were included in the DIF
study that investigated male/female, Black/White,
and Hispanic/White comparisons. As can be seen
from the last row of the table, the overall
percentages of items flagged for DIF in Form A were
small and generally balanced across comparison
groups. This is the goal of careful attention to
content relevance and sensitivity during test
development.
Table 7.8
Number of Items Identified in Category C in National DIF Study
Iowa Tests of Basic Skills, Form A
2000 National Standardization Study
Gender
Group
Number
of
Items
Favor
Females
Favor
Males
Favor
Blacks
Favor
Whites
Favor
Hispanics
Favor
Whites
344
7
15
7
14
14
12
Reading
386
12
12
2
4
2
7
Word Analysis
138
1
1
0
2
0
3
Listening
122
2
0
1
1
5
4
Test
Vocabulary
Spelling
262
5
5
8
4
3
2
Capitalization
174
6
4
1
1
0
1
Punctuation
174
2
1
2
0
0
1
Usage and Expression
355
3
8
10
9
3
10
Concepts & Estimation
305
3
1
3
7
2
6
Problem Solving & Data Interpretation
284
5
3
1
4
0
5
Computation
231
1
1
0
3
0
0
Social Studies
286
6
10
6
5
4
4
Science
286
9
11
2
5
2
6
Sources of Information
412
6
8
5
8
0
1
3759
68
80
48
67
35
62
1.8
2.1
1.3
1.8
0.9
1.6
Total
Percent
119
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 120
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 121
Relationships in Test
Performance
PART 8
area are more likely to do well in other areas, and
students who do poorly in one area are likely to do
poorly in other areas. One part of the relation is due
to “extraneous” factors, such as vocabulary or
reading ability. A greater part, however, is probably
due to the emphasis on a curriculum that promotes
growth in all achievement areas. If a student’s
development lags in one area, instruction is
designed and implemented to strengthen that area.
Variability in the quality of schooling may be
another factor in the consistency of a student’s
performance. A student who is helped by the quality
of instruction in one subject is likely to be helped in
other subjects.
Correlations Among Test Scores
for Individuals
Correlation coefficients among scores on
achievement test batteries indicate whether the
obtained scores measure something in common.
High correlations suggest common sources of
variation in the test scores, but they do not reveal
the cause. Conversely, lack of correlation may be
viewed as evidence the tests are measuring
something unique. It should be noted that
correlations between obtained scores are
attenuated by measurement error in the scores.
Correlations among developmental standard scores of
the Complete Battery Form A are shown in Table 8.1.
These were based on matched samples of students
who took Form A of the Iowa Tests of Basic Skills
(ITBS) and the Cognitive Abilities Test (CogAT) in the
spring 2000 standardization. Correlations are
reported for Levels 5 through 14 in kindergarten
through grade 8.
Structural Relationships Among
Content Domains
Insight into the nature of relations among tests can
sometimes be obtained from factor analysis. The
factor structure of Form A of the ITBS was analyzed
in two stages. Two random samples of 2,000
students each were drawn from the 2000 spring
standardization. The first sample was used to define
a common factor model for the battery using
Moderate to high correlations among achievement
measures are expected in representative samples.
They show that students who do well in one content
Table 8.1
Correlations Among Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 6
Level 5
Vocabulary
Word Analysis
Listening
Language
Mathematics
Core Total
Reading Words
Reading Comprehension
Reading Total
Reading Profile Total
Vocabulary
Word
Analysis
Listening
Language
Mathematics
Core
Total
Reading
Words
Reading
Comprehension
Reading
Total
Reading
Profile
Total
V
WA
Li
L
M
CT
RW
RC
RT
RPT
.56
.65
.60
.65
.67
.71
.60
.71
.67
.74
.88
.73
.77
.89
.87
.49
.76
.53
.59
.68
.66
.46
.71
.50
.56
.65
.62
.81
.49
.78
.54
.60
.70
.67
.95
.95
.78
.89
.78
.77
.80
.89
.86
.83
.89
.51
.61
.63
.58
.89
.85
.56
.59
.62
.65
.83
.72
.69
.76
.83
.70
.87
.76
.84
.74
.91
-
-
-
121
122
-Does not include Computation
+Includes Computation
.93
.73
.48
.70
.73
.58
.58
.63
.51
.64
.85
.86
.49
.46
.66
.79
.79
.89
.76
RC
Comprehension
.78
.54
.71
.77
.63
.62
.68
.53
.68
.91
.92
.55
.52
.69
.85
.85
.95
.95
.93
RT
Reading
Total
.52
.69
.74
.63
.61
.67
.55
.68
.81
.81
.51
.50
.64
.83
.83
.89
.74
.71
.77
WA
Word
Analysis
.39
.57
.62
.60
.65
.42
.64
.65
.63
.63
.59
.56
.77
.76
.69
.58
.54
.60
.52
Li
Listening
.75
.53
.50
.56
.50
.58
.74
.76
.39
.38
.59
.68
.68
.80
.63
.64
.68
.68
.37
L1
Spelling
.67
.66
.72
.57
.73
.91
.92
.56
.54
.69
.85
.85
.83
.71
.71
.75
.73
.56
.74
L
Language
.72
.91
.61
.90
.81
.79
.61
.60
.65
.81
.80
.71
.61
.60
.64
.62
.58
.52
.65
M1
Concepts
.94
.62
.92
.82
.79
.59
.56
.65
.80
.79
.69
.62
.61
.65
.62
.57
.51
.65
.75
M2
Problems
& Data
Interpretation
.67
.98
.88
.85
.65
.62
.70
.86
.85
.75
.80
.65
.68
.44
.41
.55
.62
.64
.59
.47
.49
.51
.51
.40
.48
.55
.62
.64
.67
M3
MT -
.66
.64
.69
.66
.62
.55
.70
.92
.95
Computation
Math
Total
Mathematics
.88
.87
.64
.61
.71
.86
.86
.76
.65
.64
.69
.67
.59
.57
.70
.90
.92
.97
.82
MT +
Math
Total
.99
.65
.62
.77
.95
.94
.94
.86
.85
.92
.80
.66
.73
.90
.82
.83
.88
.64
.87
CT -
Core
Total
.64
.60
.77
.94
.94
.95
.87
.86
.92
.81
.65
.74
.91
.80
.81
.86
.68
.87
.99
CT +
Core
Total
.63
.58
.78
.78
.61
.58
.52
.59
.49
.58
.35
.51
.55
.55
.59
.38
.57
.62
.62
SS
Social
Studies
.56
.76
.76
.59
.57
.49
.57
.48
.60
.31
.49
.56
.55
.60
.36
.57
.61
.60
.61
SC
Science
.82
.82
.74
.69
.68
.73
.66
.58
.57
.70
.67
.69
.73
.54
.72
.80
.79
.58
.58
SI
.99
.93
.84
.81
.88
.83
.76
.65
.83
.79
.80
.85
.60
.83
.95
.94
.75
.76
.85
CC -
Sources
Compoof
site
Information
.93
.84
.81
.88
.83
.76
.66
.83
.79
.79
.84
.61
.83
.95
.94
.75
.75
.85
.99
CC +
Composite
.90
.88
.95
.89
.71
.78
.82
.70
.70
.75
.56
.75
.94
.94
.60
.59
.76
.94
.94
RPT
Reading
Profile
Total
3:15 PM
Note:
.78
.96
.75
.54
.65
.72
.61
.60
.65
.51
.66
.87
.88
.55
.52
.65
.82
.82
.90
RV
Vocabulary
Reading
10/29/10
Vocabulary
Comprehension
Reading Total
Word Analysis
Listening
Spelling
Language
Concepts
Problem Solving & Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Sources of Information
Composite without Computation
Composite with Computation
Reading Profile Total
Level 7
Level 8
Table 8.1 (continued)
Correlations Among Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 122
-Does not include Computation
+Includes Computation
.62
.62
.61
.79
.77
.72
.75
.78
.49
.76
.92
.92
.80
.79
.72
.77
.80
.92
.91
.73
.59
.93
.93
.96
.78
.95
.58
.60
.60
.76
.74
.69
.73
.76
.48
.73
.89
.88
.76
.77
.69
.75
.78
.88
.87
.69
.55
.88
RT
Reading
Total
RC
Comprehension
.63
.60
.62
.81
.56
.57
.60
.50
.62
.73
.74
.56
.53
.52
.60
.60
.68
.68
.60
.35
.74
.60
.61
.65
L1
Spelling
.71
.65
.88
.60
.61
.64
.56
.67
.78
.79
.58
.58
.57
.62
.64
.73
.73
.56
.42
.66
.56
.62
.63
.64
L2
Capitalization
.65
.87
.61
.62
.65
.54
.67
.77
.78
.57
.59
.58
.62
.65
.73
.73
.59
.44
.68
.59
.65
.66
.67
.74
L3
.86
.68
.72
.74
.51
.73
.87
.86
.72
.72
.67
.73
.75
.85
.84
.69
.52
.81
.72
.77
.79
.65
.66
.70
L4
.72
.73
.77
.61
.79
.92
.93
.71
.71
.69
.75
.77
.88
.87
.72
.51
.85
.71
.76
.78
.82
.89
.90
.87
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.77
.93
.61
.91
.85
.84
.69
.70
.69
.69
.75
.83
.82
.62
.52
.73
.67
.71
.73
.59
.63
.66
.70
.74
M1
.95
.60
.92
.88
.86
.72
.74
.72
.73
.78
.86
.85
.64
.53
.77
.67
.74
.75
.58
.62
.65
.71
.74
.80
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
.64
.97
.92
.90
.75
.76
.75
.75
.81
.90
.89
.67
.56
.80
.71
.76
.78
.62
.66
.69
.74
.78
.94
.96
MT -
Math
Total
.81
.63
.68
.48
.50
.53
.54
.57
.60
.63
.46
.36
.54
.44
.50
.50
.51
.56
.57
.52
.62
.62
.60
.64
M3
Computation
Mathematics
.90
.91
.73
.74
.74
.75
.80
.88
.88
.60
.48
.70
.68
.74
.76
.63
.68
.71
.73
.79
.91
.92
.97
.81
MT +
Math
Total
.99
.81
.82
.78
.82
.86
.97
.97
.77
.60
.93
.84
.89
.92
.76
.79
.82
.87
.93
.87
.88
.92
.64
.91
CT -
Core
Total
.81
.81
.77
.82
.86
.97
.97
.77
.60
.93
.84
.89
.92
.77
.80
.83
.87
.94
.85
.86
.90
.69
.91
.99
CT +
Core
Total
.77
.70
.74
.77
.89
.88
.66
.54
.79
.75
.76
.80
.56
.57
.59
.71
.70
.70
.71
.75
.47
.72
.80
.80
SS
Social
Studies
.72
.75
.79
.90
.90
.66
.54
.79
.73
.78
.80
.55
.59
.62
.73
.71
.71
.74
.76
.50
.74
.82
.81
.78
SC
Science
.71
.94
.83
.87
.60
.51
.72
.66
.72
.74
.54
.59
.62
.70
.71
.71
.73
.76
.52
.75
.79
.79
.72
.74
S1
.91
.86
.84
.65
.50
.78
.68
.74
.75
.62
.63
.66
.72
.75
.69
.71
.74
.54
.74
.81
.81
.71
.73
.73
S2
.91
.92
.68
.54
.81
.72
.79
.80
.62
.65
.69
.76
.78
.76
.78
.81
.57
.80
.86
.86
.77
.79
.94
.91
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.99
.77
.62
.92
.84
.89
.92
.71
.74
.77
.85
.88
.84
.86
.90
.61
.88
.97
.97
.88
.90
.85
.85
.92
CC -
.76
.62
.91
.83
.88
.91
.70
.74
.77
.85
.88
.84
.85
.89
.63
.88
.97
.96
.88
.90
.88
.83
.92
.99
CC +
Compo- Composite
site
.59
.89
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
WA
Word
Analysis
.73
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
Li
Listening
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
RPT
Reading
Profile
Total
3:15 PM
Note:
.78
.93
.59
.56
.55
.72
.71
.66
.69
.72
.45
.69
.85
.85
.74
.73
.65
.70
.73
.84
.84
.69
.55
.87
RV
Vocabulary
Reading
10/29/10
Vocabulary
Comprehension
Reading Total
Spelling
Capitalization
Punctuation
Usage and Expression
Language Total
Concepts and Estimation
Problems & Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
Sources Total
Composite without Computation
Composite with Computation
Word Analysis
Listening
Reading Profile Total
Level 9
Level 10
Table 8.1 (continued)
Correlations Among Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 123
123
124
-Does not include Computation
+Includes Computation
.66
.66
.66
.78
.78
.73
.73
.77
.49
.74
.91
.91
.81
.82
.72
.76
.79
.91
.91
.93
.96
.79
.96
.63
.64
.65
.77
.76
.71
.72
.75
.49
.73
.89
.88
.78
.80
.71
.75
.78
.89
.88
RT
Reading
Total
RC
Comprehension
.68
.70
.67
.85
.62
.61
.64
.53
.66
.78
.79
.57
.58
.57
.65
.65
.73
.72
.59
.62
.64
L1
Spelling
.76
.69
.89
.66
.65
.69
.57
.70
.82
.83
.60
.63
.62
.66
.69
.77
.77
.59
.66
.67
.67
L2
Capitalization
.71
.90
.67
.66
.70
.59
.72
.83
.84
.60
.63
.63
.68
.70
.77
.77
.58
.66
.66
.69
.75
L3
.88
.70
.72
.75
.53
.74
.87
.87
.70
.74
.69
.73
.76
.85
.84
.71
.78
.79
.66
.68
.72
L4
.76
.75
.79
.63
.80
.94
.95
.70
.74
.71
.77
.80
.89
.88
.70
.78
.79
.84
.89
.91
.88
LT
.82
.94
.64
.92
.87
.86
.70
.74
.75
.71
.78
.85
.85
.65
.72
.73
.59
.64
.66
.71
.74
M1
.96
.61
.93
.88
.86
.71
.76
.76
.73
.80
.87
.86
.65
.73
.73
.56
.62
.64
.71
.73
.81
M2
.65
.97
.92
.90
.74
.78
.79
.76
.83
.90
.90
.68
.76
.77
.60
.66
.68
.75
.77
.94
.96
MT -
Math
Total
.82
.64
.69
.47
.51
.53
.55
.58
.61
.64
.43
.53
.51
.52
.57
.59
.55
.64
.67
.62
.67
M3
Computation
Mathematics
Concepts Problems
Punctu- Usage & Language
& Data
&
ation
Expression
Total
InterpreEstimation
tation
Language
.90
.91
.71
.76
.77
.75
.82
.88
.89
.65
.74
.74
.62
.68
.70
.74
.78
.92
.92
.97
.84
MT +
Math
Total
.99
.81
.84
.80
.83
.87
.97
.97
.83
.90
.92
.76
.81
.82
.88
.93
.87
.87
.91
.66
.90
CT -
Core
Total
.80
.83
.79
.83
.87
.97
.97
.82
.89
.91
.77
.82
.83
.88
.94
.86
.85
.90
.71
.91
.99
CT +
Core
Total
.80
.71
.73
.78
.89
.89
.75
.79
.81
.56
.62
.60
.73
.72
.70
.71
.74
.50
.72
.82
.81
SS
Social
Studies
.74
.76
.81
.91
.91
.72
.80
.81
.56
.62
.63
.76
.74
.73
.75
.78
.54
.76
.84
.83
.80
SC
Science
.73
.94
.85
.88
.63
.70
.71
.53
.60
.61
.68
.69
.75
.75
.78
.55
.77
.79
.78
.71
.73
S1
.92
.87
.84
.68
.76
.77
.62
.66
.67
.75
.77
.72
.72
.76
.56
.75
.83
.83
.76
.78
.73
S2
.92
.93
.70
.79
.79
.62
.68
.69
.76
.78
.79
.79
.83
.60
.82
.87
.86
.79
.81
.94
.92
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.99
.82
.90
.91
.70
.76
.77
.86
.88
.85
.86
.90
.64
.88
.97
.97
.90
.91
.84
.88
.92
CC -
.81
.89
.90
.70
.76
.77
.85
.88
.85
.85
.90
.66
.89
.97
.97
.89
.91
.87
.85
.93
.99
CC +
Compo- Composite
site
3:15 PM
Note:
.78
.93
.61
.59
.59
.70
.71
.67
.66
.70
.42
.66
.83
.83
.76
.74
.64
.68
.71
.83
.83
RV
Vocabulary
Reading
10/29/10
Vocabulary
Comprehension
Reading Total
Spelling
Capitalization
Punctuation
Usage and Expression
Language Total
Concepts and Estimation
Problem Solving and Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
Sources Total
Composite without Computation
Composite with Computation
Level 11
Level 12
Table 8.1 (continued)
Correlations Among Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 124
-Does not include Computation
+Includes Computation
.66
.70
.69
.81
.80
.74
.75
.78
.49
.75
.92
.92
.82
.80
.72
.76
.80
.91
.91
.93
.96
.80
.96
.64
.70
.70
.80
.80
.73
.76
.78
.51
.75
.91
.90
.81
.80
.72
.76
.80
.90
.90
RT
Reading
Total
RC
Comprehension
.69
.70
.69
.84
.59
.56
.60
.49
.62
.77
.78
.58
.56
.53
.62
.62
.71
.70
.62
.64
.67
L1
Spelling
.78
.73
.90
.67
.65
.69
.54
.70
.83
.84
.67
.64
.63
.68
.70
.79
.79
.63
.70
.71
.70
L2
Capitalization
.76
.92
.67
.66
.70
.55
.71
.84
.85
.65
.64
.63
.69
.70
.79
.79
.61
.69
.69
.71
.77
L3
.90
.72
.73
.76
.51
.74
.89
.89
.75
.74
.69
.75
.77
.87
.86
.70
.77
.78
.70
.74
.78
L4
.75
.74
.78
.59
.78
.94
.94
.75
.73
.70
.77
.79
.89
.89
.72
.79
.80
.85
.90
.92
.91
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.82
.94
.63
.92
.87
.86
.71
.71
.73
.71
.77
.85
.85
.66
.72
.73
.61
.67
.67
.70
.74
M1
.96
.59
.92
.88
.86
.75
.75
.74
.73
.79
.87
.86
.64
.74
.73
.58
.64
.66
.72
.73
.81
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
.64
.96
.92
.90
.77
.77
.77
.75
.82
.90
.90
.68
.77
.77
.62
.69
.70
.75
.77
.94
.96
MT -
Math
Total
.83
.62
.68
.50
.50
.50
.53
.55
.60
.63
.41
.47
.47
.47
.51
.53
.51
.57
.61
.55
.61
M3
Computation
Mathematics
.89
.90
.74
.74
.74
.74
.80
.87
.88
.65
.73
.74
.63
.69
.70
.73
.77
.91
.91
.95
.82
MT +
Math
Total
.99
.83
.83
.79
.82
.86
.97
.97
.83
.90
.92
.78
.84
.84
.88
.94
.87
.87
.91
.60
.89
CT -
Core
Total
.83
.82
.78
.82
.86
.97
.97
.83
.89
.91
.79
.84
.85
.89
.94
.86
.85
.90
.66
.90
.99
CT +
Core
Total
.81
.73
.76
.80
.91
.90
.72
.79
.80
.57
.64
.63
.72
.72
.69
.73
.75
.46
.72
.81
.81
SS
Social
Studies
.74
.76
.81
.91
.90
.71
.79
.79
.57
.65
.65
.74
.73
.71
.75
.77
.47
.74
.83
.82
.80
SC
Science
.73
.94
.85
.87
.62
.70
.70
.52
.61
.61
.67
.68
.72
.74
.77
.46
.73
.77
.76
.72
.76
S1
.92
.87
.84
.67
.75
.75
.63
.68
.70
.76
.78
.70
.73
.75
.51
.74
.82
.82
.75
.77
.74
S2
.92
.92
.69
.78
.78
.61
.69
.70
.76
.78
.76
.79
.82
.52
.79
.85
.85
.79
.82
.94
.92
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.99
.81
.89
.91
.71
.79
.79
.86
.89
.84
.86
.90
.57
.87
.97
.96
.90
.91
.84
.87
.92
CC -
.81
.89
.90
.71
.78
.79
.85
.88
.84
.86
.89
.60
.87
.96
.96
.89
.91
.87
.85
.92
.99
CC +
Compo- Composite
site
3:15 PM
Note:
.79
.93
.62
.61
.60
.72
.71
.66
.65
.69
.41
.65
.83
.82
.73
.71
.64
.67
.70
.82
.81
RV
Vocabulary
Reading
10/29/10
Vocabulary
Comprehension
Reading Total
Spelling
Capitalization
Punctuation
Usage and Expression
Language Total
Concepts and Estimation
Problem Solving and Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
Sources Total
Composite without Computation
Composite with Computation
Level 13
Level 14
Table 8.1 (continued)
Correlations Among Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 125
125
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 126
exploratory factor analysis techniques. Correlations
among developmental standard scores were used
with least-squares estimates of communality. In
grades 3 through 8, the factor solutions were based
on correlations among the thirteen tests in Levels 9
through 14. In grades 1 and 2, solutions were based
on the eleven tests in Levels 7 and 8. In
kindergarten and grade 1, solutions were based on
the five tests in Level 5 and the six tests in Level 6.
After the least-squares factor solutions were
obtained, both orthogonal and oblique simple
structure transformations were performed. Three
factors were retained for Levels 7 through 14; two
were retained for Levels 5 and 6.
Information. These were again attributed to the
multidisciplinary nature of those content domains in
the school curriculum.
In the second sample for each test level, a restricted
factor analysis model based on the final solution in
the first sample was used for cross-validation.
Because the cross-validation results were similar to
the initial results, only the former are described in
this section.
Levels 7 and 8
Levels 9 through 14
At these levels, tests that define major subject areas
in the elementary school curriculum—Vocabulary,
Reading
Comprehension,
Language,
and
Mathematics—determined the three factors. Tests
in Social Studies, Science, and Sources of
Information were less consistent in their factor
composition. The three factors at Levels 9 through
14 were characterized as follows.
Factor I, “Verbal Reasoning or Comprehension”
The Vocabulary and Reading tests had the highest
loadings on this factor in all six grades and provided
the clearest interpretation of the factor. Several
other tests had substantial loadings on this factor:
Usage and Expression, Maps and Diagrams, Social
Studies, and Science. The influence of the first two of
these tests on Factor I decreased from Level 9 to
Level 14. It was still large enough to suggest the
importance of verbal skills for these tests but with
less weight in the upper grades. The loadings of the
Social Studies and Science tests on this factor
probably reflect the multidisciplinary nature of those
subjects and the influence of verbal comprehension
in the elementary school curriculum.
Factor II, “Mathematical Reasoning”
The three Math tests formed this factor. Math
Concepts and Estimation had the highest loadings
at all levels. Problem Solving and Data
Interpretation and Computation contributed to this
factor, but at slightly varying degrees depending on
level. Small but appreciable influences on this factor
were noted in Social Studies, Science, and Sources of
126
Factor III, “Aspects of Written Language”
In all grades, tests loading on this factor were
Spelling, Capitalization, and Punctuation. Loadings
of the Spelling test generally increased across the
grades. Loadings of the Usage and Expression test
also increased from lower to upper elementary
grades. By sixth grade, Usage and Expression was
clearly associated with the language factor, although
its loadings in the upper grades were smaller than
those of other language tests.
These levels have a subtest structure similar to that
of Levels 9 through 14 except in language arts. The
three factors defined at these levels reveal contrasts
between the tests in Levels 7 and 8 and those in
Levels 9 through 14. The first two factors were
similar to the ones described above. The Language
and Word Analysis tests helped define the first
factor; the three Math tests defined the second. The
third factor related to the tests that require
interpreting pictures while listening to a teacher
(Listening, Social Studies, and Science).
Levels 5 and 6
Composition of tests and the integrated curriculum
in the early elementary grades influence
correlations among tests at these levels. Factor
analysis reflected these conditions. In Levels 5 and
6, where a smaller battery is given, two factors were
defined: a verbal comprehension factor and a factor
related to tests of skills developed through direct
instruction in kindergarten and grade 1. The
Vocabulary, Listening, Language, and Math tests
defined the first factor in both levels. The second
factor was influenced by the Word Analysis test in
Level 5 and by the Word Analysis, Reading, and
Math tests in Level 6.
Interpretation of Factors
Whether the factors defined above result from
general cognitive abilities, specific skills required in
different tests, item types, qualitative differences
among tests, or school curriculum is unknown. The
correlations among factors were substantial. In the
eight grades that take Levels 7 through 14, the
median correlation between factors was .65; 18 of the
24 values were between .60 and .69. This indicates a
general factor accounts for most variability. These
961464_ITBS_GuidetoRD.qxp
10/29/10
3:15 PM
Page 127
results do not imply that score differences between
tests are unreliable, however.
A study of grade 5 national standardization results
for Forms G and H (Martin & Dunbar, 1985)
clarified the internal structure of equivalent forms
of the ITBS. One purpose of the study was to
investigate the presence of group factors after
controlling for a general factor. The analysis was
based on 48 composite variables derived from
homogeneous sets of items from each test.
The group factors were identified as (1) verbal
comprehension, (2) language mechanics, (3) solving
problems that use quantitative concepts and visual
materials, and (4) computation.
Application of extension analysis to subtests of the
CogAT confirmed interpretations of the ITBS group
factors. The CogAT Verbal subtests loaded on the
verbal comprehension factor. The Quantitative
subtests had the highest loadings on the visual
materials and math computation factors. The CogAT
Nonverbal tests, which include geometric figures
and patterns, had the highest loadings on the third
factor. This supports the visual materials
interpretation.
A study of relations between the ITBS and the Iowa
Tests of Educational Development (ITED) was part
of the initial investigation of the joint scaling of
these batteries (Becker & Dunbar, 1990). An
interbattery factor analysis was conducted to
examine the relations of tests in the ITBS and the
ITED. Results suggested that factors related to
verbal comprehension, quantitative concepts and
visual materials, and language skills were stable
across batteries. This study also replicated research
on the multidisciplinary nature of content in Social
Studies, Science, and Sources of Information.
Support for the joint scaling of the ITBS and ITED
was established.
Reliabilities of Differences
in Test Performance
For any test battery, the reliability of differences in
an individual’s performance across test areas is of
interest. The meaningfulness of strengths and
weaknesses in a profile of scores depends on the
reliability of these differences. The interpretation of
score differences across tests is discussed in the
ITBS Interpretive Guide for Teachers and
Counselors.
Reliabilities of differences among major test areas
appear in Table 8.2. Computational procedures for
these coefficients appear in Thorndike (1963, pp.
117–120). The correlations among tests were
reported earlier in this section. The K-R20 reliability
coefficients in Table 5.1 were used to compute the
reliabilities of differences.
Reliabilities of differences between scores on
individual tests are reported in Table 8.3. These
were also based on the reliability coefficients and
correlations reported earlier. Despite the relatively
high correlations among tests, reliabilities of
differences between scores are substantial (nearly
90 percent above .50). These results support the use
of ITBS subtests to identify strengths and
weaknesses.
Correlations Among Building Averages
Correlations among building averages for
kindergarten through grade 8 are shown in Table
8.4. These correlations are higher than those for
student
scores,
reflecting
the
consistent
performance of groups of students across tests. The
exception is for Math Computation, particularly in
the lower grades where correlations with other
Math tests are relatively low.
Relations Between Achievement and
General Cognitive Ability
The ITBS and the CogAT were standardized on the
same population and were given under the same
conditions at about the same time. This enables
comparisons of achievement and ability under
nearly ideal conditions— a nationwide sample
selected to be representative on factors related to
ability and achievement. Only students who took all
tests in the ITBS Complete Battery and all three
CogAT test batteries were included. The sample
sizes in each grade were:
Grade
K
1
1
2
3
4
5
6
7
8
Level
5
6
7
8
9
10
11
12
13
14
N
6,111
7,128
6,800
14,870
14,978
15,935
16,517
14,972
11,896
10,294
127
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 128
Table 8.2
Reliabilities of Differences Among Scores for Major Test Areas: Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 8
Grade 2
Level 12
Grade 6
Reading
Total
Language
Total
Math
Total
Sources
Total
RT
LT
MT +
ST
.62
.78
.61
Reading Total
.66
.55
Language Total
.78
.59
Mathematics Total
.79
.77
Sources Total
.66
.70
.63
Reading
Total
Language
Total
Math
Total
Sources
Total
RT
LT
MT +
ST
.78
.81
.69
.80
.72
Level 7
Grade 1
Reading Total
Level 11
Grade 5
Reading
Total
Language
Total
Math
Total
Sources
Total
RT
LT
MT +
ST
.77
.78
.66
.78
.70
Language Total
.56
Mathematics Total
.74
.60
Sources Total
.62
.53
.57
Reading
Total
Language
Total
Math
Total
Sources
Total
RT
LT
MT +
ST
.77
.76
.64
Reading Total
.75
.70
Language Total
.77
.63
Mathematics Total
.79
.79
Sources Total
.65
.70
Level 10
Grade 4
Level 9
Grade 3
Reading Total
Language Total
.79
Mathematics Total
.75
.75
Sources Total
.62
.69
Note:
Level 14
Grade 8
Level 13
Grade 7
.58
.61
.68
.64
+Includes Computation
Table 8.3
Reliabilities of Differences Among Tests: Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 5
Grade K
Level 6
Grade 1
Vocabulary
Word
Analysis
Listening
Language
Mathematics
Reading
Words
Reading
Comprehension
Reading
Total
V
WA
Li
L
M
RW
RC
RT
.46
.26
.31
.43
.62
.65
.67
.44
.38
.34
.34
.45
.41
.21
.36
.62
.65
.67
.26
.60
.63
.65
.54
.59
.60
Vocabulary
128
Word Analysis
.50
Listening
.35
.54
Language
.31
.51
.24
Mathematics
.40
.47
.34
.30
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 129
Table 8.3 (continued)
Reliabilities of Differences Among Tests: Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 8
Grade 2
Level 7
Grade 1
Reading
Mathematics
Word
Analysis
Listening
Spelling
Language
Problems
& Data Computation
Interpretation
Social
Studies
Science
Sources
of
Information
Vocabulary
Comprehension
RV
RC
WA
Li
L1
L
M1
M2
M3
SS
SC
SI
.52
.46
.52
.59
.57
.56
.62
.73
.44
.54
.56
.56
.59
.61
.61
.61
.66
.74
.53
.63
.60
.55
.48
.48
.52
.58
.68
.51
.59
.55
.64
.54
.42
.48
.64
.26
.32
.49
.42
.59
.65
.68
.60
.67
.62
.51
.58
.68
.52
.61
.55
.23
.50
.38
.44
.45
.55
.43
.51
.50
.59
.66
.66
.21
.41
Vocabulary
Concepts
Comprehension
.53
Word Analysis
.49
.56
Listening
.55
.63
.53
Spelling
.66
.65
.58
.65
Language
.56
.59
.47
.50
.50
Concepts
.56
.63
.50
.31
.63
.47
Problems & Data Interpretation
.62
.66
.56
.39
.69
.52
.25
Computation
.75
.77
.69
.62
.74
.69
.54
.57
Social Studies
.60
.66
.59
.26
.70
.57
.39
.46
.66
Science
.59
.66
.58
.30
.68
.56
.38
.47
.65
.29
Sources of Information
.61
.64
.58
.48
.66
.53
.46
.49
.68
.51
Level 10
Grade 4
Level 9
Grade 3
Reading
Vocabulary
Comprehension
RV
Vocabulary
Language
.50
.51
Sources of
Information
Mathematics
Social
Studies
Science
Word
Analysis
Listening
S2
WA
Li
.63
.65
—
—
.50
.53
.56
—
—
.72
.75
.71
.68
—
—
.68
.64
.66
.62
.61
—
—
.57
.69
.64
.65
.60
.59
—
—
.56
.77
.59
.60
.58
.58
—
—
.30
.67
.55
.57
.50
.58
—
—
.66
.48
.48
.41
.50
—
—
.75
.76
.71
.73
—
—
.41
.47
.53
—
—
.47
.53
—
—
.47
—
—
—
—
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Maps
Reference
and
Materials
Diagrams
Spelling
Capitalization
Punctuation
Usage &
Expression
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
.56
.74
.71
.70
.66
.66
.63
.81
.53
.61
.73
.66
.65
.57
.60
.52
.78
.49
.63
.61
.71
.71
.69
.77
.40
.62
.61
.59
.59
.60
.62
Comprehension
.57
Spelling
.75
.77
Capitalization
.73
.72
.68
Punctuation
.72
.70
.68
.51
Usage and Expression
.62
.59
.72
.65
.63
Concepts and Estimation
.59
.58
.69
.61
.59
.56
Problems & Data Interpretation
.60
.57
.71
.64
.62
.56
.30
Computation
.78
.78
.76
.69
.69
.75
.60
.65
Social Studies
.50
.50
.71
.66
.64
.54
.48
.48
.72
Science
.55
.50
.73
.66
.64
.54
.47
.45
.72
.35
Maps and Diagrams
.57
.54
.69
.61
.59
.54
.40
.40
.65
.42
.39
Reference Materials
.61
.56
.70
.64
.63
.56
.51
.49
.71
.46
.44
.43
Word Analysis
.59
.62
.68
.67
.64
.58
.57
.59
.73
.55
.56
.56
.59
Listening
.59
.61
.72
.66
.63
.61
.55
.57
.69
.54
.55
.54
.61
—
.50
129
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 130
Table 8.3 (continued)
Reliabilities of Differences Among Tests: Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Level 12
Grade 6
Level 11
Grade 5
Reading
Language
Social
Studies
Science
M3
SS
SC
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
.52
.75
.65
.73
.65
.70
.65
.78
.52
.75
.61
.69
.56
.65
.57
.75
.58
.65
.71
.76
.73
.42
.57
.62
.62
Vocabulary
Sources of
Information
Mathematics
M2
S1
S2
.62
.63
.63
.46
.51
.56
.54
.75
.73
.77
.71
.70
.58
.63
.59
.63
.56
.56
.68
.64
.69
.68
.70
.63
.63
.66
.59
.73
.58
.59
.59
.56
.36
.63
.60
.62
.47
.59
.63
.52
.52
.38
.52
.72
.74
.65
.69
.41
.47
.45
.50
.48
Reading Comprehension
.53
Spelling
.74
.74
Capitalization
.69
.66
.61
Punctuation
.71
.68
.62
.43
Usage and Expression
.64
.56
.69
.59
.59
Concepts and Estimation
.65
.62
.70
.59
.61
.61
Problems & Data Interpretation
.64
.57
.70
.59
.60
.56
.28
Computation
.81
.80
.78
.70
.71
.77
.67
.68
Social Studies
.50
.48
.73
.65
.67
.60
.57
.52
.77
Science
.58
.49
.75
.65
.68
.58
.56
.49
.78
.40
Maps and Diagrams
.62
.55
.70
.58
.60
.56
.43
.36
.70
.47
.47
Reference Materials
.65
.57
.70
.61
.61
.58
.59
.52
.75
.53
.52
Level 14
Grade 8
Level 13
Grade 7
Reading
Language
.47
Sources of
Information
Mathematics
Concepts Problems
Computa& Data
&
tion
InterpreEstimation
tation
.45
Social
Studies
Science
Maps
Reference
and
Materials
Diagrams
Vocabulary
Comprehension
Spelling
Capitalization
Punctuation
Usage &
Expression
RV
RC
L1
L2
L3
L4
M1
M2
M3
SS
SC
S1
S2
.55
.73
.64
.69
.64
.70
.68
.79
.60
.64
.65
.64
.76
.62
.67
.61
.69
.63
.80
.55
.59
.60
.60
.58
.61
.66
.75
.74
.78
.75
.77
.73
.70
.38
.50
.62
.61
.70
.62
.63
.59
.55
.48
.66
.64
.72
.67
.67
.63
.57
.66
.60
.76
.60
.61
.59
.53
.44
.70
.66
.65
.54
.62
.71
.57
.55
.46
.54
.77
.77
.72
.73
.45
.49
.51
.46
.50
Vocabulary
Reading Comprehension
.54
Spelling
.73
.77
Capitalization
.64
.60
.59
Punctuation
.71
.67
.65
.35
Usage and Expression
.63
.57
.70
.53
.55
Concepts and Estimation
.69
.67
.77
.60
.66
.66
Problems & Data Interpretation
.65
.57
.75
.58
.63
.59
.36
Computation
.77
.77
.76
.66
.70
.75
.66
.66
Social Studies
.56
.48
.75
.57
.66
.56
.62
.50
.72
Science
.63
.56
.79
.64
.69
.62
.65
.54
.75
.42
Maps and Diagrams
.59
.53
.71
.53
.59
.56
.48
.39
.66
.43
.45
Reference Materials
.65
.59
.73
.56
.62
.58
.64
.55
.72
.50
.55
130
Maps
Reference
and
Materials
Diagrams
.45
.45
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 131
Correlations between CogAT and ITBS scores
appear in Table 8.5. Patterns of correlations tend to
agree with common sense. In Levels 9 through 14,
average correlations between the ITBS Complete
Composite and CogAT are .86, .78, and .73 for the
Verbal, Quantitative, and Nonverbal batteries,
respectively.
Correlations
between
CogAT
Quantitative and ITBS Math scores are
substantially higher than others in the table.
Comparisons of ability and achievement tests are
meaningful only if the tests measure unique
characteristics. To some extent, this can be
determined subjectively by examining test content.
Revisions of the Verbal and Quantitative Batteries of
the CogAT have reduced overlap with ITBS subtests.
ITBS tests in Vocabulary, Reading, and Language
overlap somewhat with the Verbal Battery of the
CogAT. Each battery requires skill in vocabulary,
reading, and verbal reasoning. Similar overlap exists
between the ITBS Math tests and the CogAT
Quantitative Battery. In contrast, the CogAT
Nonverbal Battery measures cognitive skills distinct
from any ITBS test. The unique skills measured by
the Nonverbal Battery (particularly in abstract
reasoning) provide the rationale for using Nonverbal
scores to set expectations for ITBS performance.
Predicting Achievement from General
Cognitive Ability: Individual Scores
The combined scoring system for the ITBS and the
CogAT is described in the Interpretive Guide for
School Administrators. The prediction equations in
this system are based on matched samples of
students who took the Complete Battery of the ITBS
and the three test batteries of the CogAT in the
spring 2000 standardization.
Predicted ITBS scores are based on nonlinear
regression equations that use one of the four CogAT
scores (Verbal, Quantitative, Nonverbal, or
Composite). For example, the equation for predicting
Table 8.4
Correlations Among School Average Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
Grade 1 (Level 6/7)
Kindergarten
(Level 5/6)
Reading
Word
Analysis
Listening
Language
Math
Total
Core
Total
Reading
Profile
Total
Li
L
MT -
CT -
RPT
Vocabulary
Reading
Words
Comprehension
Reading
Total
RV
RW
RC
RT
WA
.51
.71
.81
.74
.73
.78
.74
.91
.88
.84
.93
.79
.65
.65
.65
.64
.85
.95
.83
.65
.77
.75
.81
.90
.85
.68
.77
.78
.84
.94
.74
.80
.78
.82
.93
.80
.82
.82
.83
.85
.94
.88
.93
.86
Vocabulary
Reading Words
.48
Reading Comprehension
.34
.70
Reading Total
.45
.91
.93
Word Analysis
.64
.70
.58
.71
Listening
.84
.46
.36
.47
.60
Language
.84
.48
.35
.48
.61
.93
Math Total
.81
.59
.43
.56
.73
.85
.87
Core Total
.95
.55
.39
.52
.69
.92
.95
.93
Reading Profile Total
.90
.80
.71
.82
.86
.87
.84
.87
Note:
-Does not include Computation
.94
.92
131
.85
.68
.82
.77
.83
.78
.78
.71
.84
.84
.95
.78
.73
.80
.91
.90
.94
.97
.96
RT
Reading
Total
.74
.79
.80
.80
.77
.78
.70
.80
.82
.89
.76
.71
.79
.91
.92
.93
.86
.85
.88
WA
Word
Analysis
.59
.80
.81
.78
.82
.53
.78
.82
.76
.83
.82
.77
.88
.88
.83
.79
.76
.80
.75
Li
Listening
.83
.72
.68
.72
.66
.74
.84
.84
.67
.60
.75
.79
.79
.87
.78
.81
.82
.81
.61
L1
Spelling
.84
.82
.85
.72
.86
.94
.95
.81
.73
.87
.91
.92
.88
.83
.86
.87
.83
.77
.82
L
Language
.88
.96
.73
.95
.92
.90
.79
.78
.80
.92
.91
.87
.79
.78
.81
.80
.74
.71
.80
M1
Concepts
.97
.76
.96
.90
.88
.80
.79
.85
.90
.89
.83
.77
.78
.80
.79
.79
.70
.82
.89
M2
Problems
& Data
Interpretation
.77
.99
.93
.92
.82
.81
.85
.94
.93
.86
.81
.81
.83
.82
.81
.73
.85
.97
.97
MT -
Math
Total
.86
.77
.80
.60
.55
.72
.72
.73
.73
.54
.61
.59
.64
.47
.58
.60
.70
.67
.70
M3
Computation
Mathematics
.94
.93
.83
.81
.88
.93
.92
.88
.78
.81
.82
.83
.75
.74
.83
.95
.94
.97
.84
MT +
Math
Total
.99
.86
.80
.89
.97
.96
.94
.92
.93
.95
.89
.84
.83
.95
.91
.91
.94
.66
.92
CT -
Core
Total
.84
.78
.88
.96
.96
.97
.92
.93
.96
.89
.82
.84
.95
.90
.90
.93
.70
.92
.99
CT +
Core
Total
.87
.84
.92
.92
.85
.76
.73
.77
.72
.75
.58
.71
.74
.69
.74
.46
.70
.78
.77
SS
Social
Studies
.82
.89
.89
.79
.78
.70
.77
.69
.83
.52
.70
.75
.74
.77
.41
.71
.79
.77
.79
SC
Science
.92
.92
.87
.82
.83
.85
.80
.79
.70
.83
.83
.83
.86
.55
.82
.89
.88
.78
.80
SI
.99
.96
.92
.90
.94
.91
.89
.77
.90
.90
.89
.92
.62
.89
.97
.96
.86
.87
.92
CC -
Sources
Compoof
site
Information
.96
.92
.90
.94
.91
.89
.78
.90
.90
.89
.92
.63
.89
.97
.97
.86
.87
.92
.99
CC +
Composite
.95
.94
.98
.94
.86
.86
.90
.84
.84
.88
.62
.86
.97
.97
.78
.78
.86
.97
.97
RPT
Reading
Profile
Total
3:16 PM
-Does not include Computation
+Includes Computation
.95
.83
.65
.85
.77
.80
.75
.75
.72
.81
.81
.93
.74
.66
.80
.87
.87
.90
.88
RC
Comprehension
Reading
10/29/10
Note:
.71
.81
.74
.73
.75
.78
.79
.76
.74
.67
.80
.91
.91
.77
.73
.75
.88
.87
.88
Vocabulary
Comprehension
Reading Total
Word Analysis
Listening
Spelling
Language
Concepts
Problem Solving & Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Sources of Information
Composite without Computation
Composite with Computation
Reading Profile Total
Vocabulary
RV
132
Level 7 — Grade 1
Level 8 — Grade 2
Table 8.4 (continued)
Correlations Among School Average Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 132
.98
.77
.74
.78
.91
.86
.86
.90
.90
.58
.86
.95
.94
.87
.92
.85
.89
.90
.95
.95
.62
.26
.79
.88
.78
.75
.78
.92
.87
.87
.91
.91
.60
.87
.96
.96
.89
.93
.87
.90
.92
.96
.96
.71
.40
.88
.97
.97
RT
Reading
Total
.81
.79
.78
.88
.76
.78
.79
.67
.79
.85
.85
.73
.72
.70
.79
.76
.81
.81
.44
.38
.68
.75
.75
.77
L1
Spelling
.90
.80
.95
.76
.77
.78
.68
.79
.86
.87
.75
.74
.73
.79
.78
.82
.82
.65
.38
.73
.72
.78
.77
.80
L2
Capitalization
.82
.95
.78
.81
.81
.68
.81
.88
.89
.77
.77
.75
.81
.80
.85
.85
.59
.34
.67
.74
.79
.79
.79
.90
L3
.92
.84
.90
.89
.63
.86
.95
.94
.86
.90
.84
.89
.89
.94
.93
.70
.41
.84
.86
.90
.91
.77
.84
.83
L4
.84
.88
.88
.70
.87
.95
.96
.84
.85
.81
.88
.87
.92
.92
.70
.43
.84
.82
.87
.87
.88
.95
.95
.93
LT
.92
.97
.73
.96
.93
.92
.82
.85
.83
.85
.87
.91
.91
.62
.34
.72
.75
.83
.82
.70
.74
.79
.82
.83
M1
.98
.69
.95
.96
.95
.87
.90
.88
.90
.91
.95
.95
.76
.46
.85
.81
.89
.88
.76
.78
.80
.88
.87
.92
M2
.72
.97
.96
.96
.87
.89
.87
.89
.91
.95
.95
.72
.41
.81
.79
.88
.86
.74
.77
.81
.87
.86
.98
.98
MT -
Math
Total
.85
.70
.74
.58
.60
.62
.63
.64
.66
.68
.49
.37
.55
.53
.57
.56
.62
.63
.66
.59
.67
.70
.68
.70
M3
Computation
Mathematics
Concepts Problems
Punctu- Usage & Language
& Data
&
ation
Expression
Total
InterpreEstimation
tation
Language
.94
.95
.83
.86
.85
.86
.88
.92
.92
.69
.42
.78
.76
.84
.82
.75
.78
.82
.84
.86
.95
.95
.97
.84
MT +
Math
Total
.99
.90
.92
.88
.93
.93
.98
.98
.74
.44
.89
.90
.94
.95
.84
.88
.90
.95
.96
.91
.95
.95
.67
.92
CT -
Core
Total
.89
.92
.88
.92
.93
.98
.98
.74
.44
.89
.89
.94
.94
.85
.89
.90
.94
.96
.91
.94
.94
.72
.93
.99
CT +
Core
Total
.90
.87
.88
.90
.94
.94
.68
.38
.80
.85
.88
.89
.75
.78
.76
.88
.85
.81
.86
.85
.55
.81
.90
.89
SS
Social
Studies
.89
.91
.93
.96
.96
.76
.54
.86
.87
.90
.91
.70
.76
.75
.90
.84
.83
.88
.87
.58
.84
.91
.91
.91
SC
Science
.87
.97
.92
.93
.73
.56
.83
.84
.90
.89
.74
.79
.81
.89
.87
.84
.89
.88
.59
.85
.92
.92
.88
.89
S1
.96
.95
.94
.64
.51
.83
.81
.87
.86
.78
.81
.80
.89
.88
.81
.87
.86
.60
.83
.91
.90
.84
.87
.86
S2
.96
.97
.72
.56
.87
.86
.91
.91
.79
.83
.84
.92
.91
.86
.91
.90
.62
.87
.95
.95
.90
.91
.97
.96
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.99
.75
.48
.90
.90
.95
.95
.81
.85
.86
.95
.94
.89
.94
.93
.64
.90
.99
.98
.94
.95
.94
.92
.97
CC -
.76
.49
.90
.90
.94
.95
.81
.85
.86
.94
.94
.89
.94
.93
.66
.91
.98
.98
.94
.95
.95
.91
.97
.99
CC +
Compo- Composite
site
.65
.88
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
WA
Word
Analysis
.75
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
Li
Listening
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
—
RPT
Reading
Profile
Total
3:16 PM
-Does not include Computation
+Includes Computation
.91
.97
.77
.73
.73
.89
.83
.85
.87
.88
.59
.84
.93
.92
.87
.90
.85
.87
.89
.93
.93
.78
.54
.92
Vocabulary
Comprehension
Reading Total
Spelling
Capitalization
Punctuation
Usage and Expression
Language Total
Concepts and Estimation
Problems & Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
Sources Total
Composite without Computation
Composite with Computation
Word Analysis
Listening
Reading Profile Total
RC
Comprehension
Reading
10/29/10
Note:
RV
Vocabulary
Level 9 — Grade 3
Level 10 — Grade 4
Table 8.4 (continued)
Correlations Among School Average Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 133
133
134
-Does not include Computation
+Includes Computation
.80
.80
.82
.91
.89
.85
.88
.88
.56
.84
.95
.95
.91
.91
.84
.87
.88
.95
.94
.97
.98
.91
.98
.77
.79
.80
.91
.88
.86
.89
.89
.56
.85
.94
.94
.90
.90
.85
.86
.88
.94
.93
RT
Reading
Total
RC
Comprehension
.81
.82
.79
.90
.73
.72
.73
.59
.73
.84
.84
.74
.70
.71
.79
.76
.80
.79
.76
.75
.77
L1
Spelling
.89
.84
.95
.79
.80
.81
.65
.82
.89
.90
.77
.76
.80
.76
.80
.86
.86
.74
.75
.76
.80
L2
Capitalization
.84
.95
.81
.82
.82
.65
.83
.90
.91
.76
.79
.80
.80
.82
.86
.87
.73
.77
.76
.77
.87
L3
.93
.85
.88
.88
.61
.85
.94
.93
.86
.90
.86
.85
.88
.93
.93
.86
.90
.90
.74
.79
.80
L4
.86
.87
.87
.68
.87
.96
.96
.84
.85
.85
.86
.88
.93
.93
.84
.86
.87
.88
.94
.94
.91
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.92
.97
.72
.96
.92
.92
.83
.85
.85
.82
.86
.91
.91
.79
.87
.85
.70
.73
.76
.84
.83
M1
.98
.64
.95
.94
.93
.86
.91
.90
.86
.91
.94
.94
.79
.87
.85
.65
.72
.74
.87
.82
.92
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
.68
.97
.96
.95
.86
.89
.89
.86
.91
.95
.95
.81
.88
.87
.69
.73
.76
.87
.84
.97
.98
MT -
Math
Total
.83
.66
.70
.52
.53
.59
.54
.57
.60
.62
.55
.56
.57
.58
.59
.60
.55
.63
.71
.61
.67
M3
Computation
Mathematics
.93
.94
.82
.84
.85
.82
.87
.91
.92
.78
.85
.84
.71
.74
.77
.83
.83
.96
.93
.96
.84
MT +
Math
Total
.99
.90
.91
.89
.90
.93
.99
.98
.91
.95
.95
.82
.86
.87
.94
.95
.93
.92
.94
.65
.92
CT -
Core
Total
.89
.90
.88
.89
.92
.98
.98
.91
.94
.95
.83
.87
.88
.93
.96
.92
.91
.93
.71
.93
.99
CT +
Core
Total
.90
.87
.85
.89
.94
.94
.83
.87
.87
.70
.77
.72
.87
.84
.81
.85
.85
.54
.81
.89
.88
SS
Social
Studies
.91
.87
.91
.95
.95
.85
.90
.90
.66
.71
.70
.89
.81
.84
.87
.87
.54
.83
.90
.89
.89
SC
Science
.86
.96
.92
.93
.79
.87
.85
.67
.78
.75
.87
.84
.86
.88
.89
.59
.86
.90
.90
.89
.89
S1
.96
.92
.91
.82
.87
.87
.73
.78
.75
.88
.86
.83
.86
.86
.58
.84
.90
.90
.91
.90
.90
S2
.96
.96
.82
.89
.88
.71
.80
.76
.90
.87
.87
.89
.90
.60
.87
.92
.92
.92
.92
.98
.97
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.99
.90
.94
.94
.77
.83
.82
.94
.92
.91
.93
.94
.63
.90
.98
.97
.94
.95
.94
.95
.96
CC -
.90
.94
.94
.78
.84
.82
.94
.92
.91
.92
.94
.65
.91
.98
.98
.94
.95
.95
.94
.97
.99
CC +
Compo- Composite
site
3:16 PM
Note:
.90
.97
.81
.77
.80
.87
.87
.81
.82
.82
.52
.79
.92
.91
.87
.86
.79
.83
.83
.91
.90
RV
Vocabulary
Reading
10/29/10
Vocabulary
Comprehension
Reading Total
Spelling
Capitalization
Punctuation
Usage and Expression
Language Total
Concepts and Estimation
Problems & Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
Sources Total
Composite without Computation
Composite with Computation
Level 11 — Grade 5
Level 12 — Grade 6
Table 8.4 (continued)
Correlations Among School Average Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 134
.98
.74
.82
.86
.91
.89
.83
.90
.89
.59
.86
.95
.95
.89
.91
.87
.89
.90
.95
.95
.75
.82
.86
.91
.89
.83
.89
.89
.60
.86
.95
.95
.89
.90
.85
.88
.89
.95
.95
.96
.98
.89
.82
.82
.79
.89
.75
.70
.74
.52
.72
.83
.83
.72
.71
.68
.77
.75
.79
.79
.79
.77
.80
L1
Spelling
.91
.86
.96
.80
.79
.82
.54
.79
.91
.91
.80
.79
.78
.82
.82
.88
.88
.78
.83
.83
.83
L2
Capitalization
.91
.97
.81
.82
.84
.55
.81
.93
.93
.82
.82
.81
.85
.86
.90
.90
.80
.82
.83
.82
.91
L3
.95
.84
.88
.89
.55
.84
.95
.94
.88
.88
.85
.89
.89
.94
.94
.87
.90
.91
.82
.88
.90
L4
.85
.85
.88
.57
.84
.96
.96
.86
.85
.84
.89
.88
.94
.93
.85
.88
.89
.90
.96
.96
.96
LT
Punctu- Usage & Language
ation
Expression
Total
Language
.88
.96
.66
.93
.92
.92
.83
.85
.82
.85
.86
.90
.90
.77
.87
.85
.71
.77
.77
.84
.82
M1
.98
.60
.92
.93
.92
.88
.90
.90
.90
.92
.94
.94
.81
.90
.88
.72
.78
.80
.88
.85
.89
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
.65
.96
.96
.95
.88
.91
.89
.90
.92
.95
.95
.81
.91
.89
.73
.79
.81
.88
.86
.97
.98
MT -
Math
Total
.84
.62
.69
.53
.57
.55
.58
.58
.61
.63
.53
.59
.58
.56
.54
.56
.56
.58
.63
.58
.62
M3
Computation
Mathematics
.92
.94
.83
.86
.84
.86
.88
.91
.92
.78
.88
.86
.74
.78
.79
.84
.84
.94
.92
.95
.83
MT +
Math
Total
.99
.91
.92
.89
.93
.93
.99
.98
.91
.96
.96
.85
.90
.91
.96
.96
.92
.94
.95
.62
.92
CT -
Core
Total
.90
.92
.88
.92
.93
.98
.98
.91
.95
.96
.86
.90
.91
.95
.96
.91
.92
.94
.69
.94
.99
CT +
Core
Total
.91
.87
.89
.91
.95
.95
.84
.91
.90
.72
.78
.80
.88
.85
.84
.89
.89
.56
.85
.92
.91
SS
Social
Studies
.90
.90
.93
.96
.96
.83
.90
.90
.74
.80
.81
.87
.86
.85
.88
.89
.58
.86
.92
.91
.90
SC
Science
.89
.97
.93
.94
.78
.88
.86
.69
.75
.73
.86
.81
.87
.90
.90
.52
.85
.89
.88
.87
.90
S1
.97
.95
.94
.81
.88
.87
.78
.83
.84
.90
.89
.82
.87
.87
.53
.82
.91
.91
.89
.90
.88
S2
.96
.96
.82
.90
.89
.75
.81
.80
.90
.87
.87
.91
.91
.54
.86
.93
.92
.90
.93
.97
.97
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.99
.88
.95
.95
.80
.86
.86
.94
.92
.90
.93
.94
.59
.90
.98
.97
.95
.96
.93
.95
.97
CC -
.88
.95
.95
.80
.85
.86
.94
.92
.90
.93
.94
.63
.92
.98
.97
.94
.96
.94
.93
.96
.99
CC +
Compo- Composite
site
3:16 PM
-Does not include Computation
+Includes Computation
.90
.97
.72
.77
.80
.86
.84
.78
.83
.83
.56
.81
.91
.91
.84
.85
.78
.83
.82
.89
.89
Vocabulary
Comprehension
Reading Total
Spelling
Capitalization
Punctuation
Usage and Expression
Language Total
Concepts and Estimation
Problems & Data Interpretation
Math Total without Computation
Computation
Math Total with Computation
Core Total without Computation
Core Total with Computation
Social Studies
Science
Maps and Diagrams
Reference Materials
Sources Total
Composite without Computation
Composite with Computation
RT
Reading
Total
RC
Comprehension
Reading
10/29/10
Note:
RV
Vocabulary
Level 13 — Grade 7
Level 14 — Grade
Table 8.4 (continued)
Correlations Among School Average Developmental Standard Scores
Iowa Tests of Basic Skills — Complete Battery, Form A
2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 135
135
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 136
an ITBS score from the CogAT Nonverbal score
(NV) has the following form:
2
3
Predicted ITBS SS = b1NV + b2NV + b3NV + c
where NV stands for a student’s obtained score on
the CogAT Nonverbal Battery, the b’s stand for the
slopes (b1, b2, b3 ), and c stands for the intercept in
the prediction equation.
The prediction equations are used to compare
expected and obtained achievement in combined
ITBS and CogAT score reports. School districts
decide which CogAT score to use in predicting ITBS
standard scores. Separate equations are used for
each ITBS test or Total score for fall, midyear, and
spring testing. Choosing the CogAT test for
predicting achievement depends on the purpose of
combined ability-achievement reporting. If the
objective is reliability of the difference between
predicted and obtained achievement, which is
important when predicting for individuals, the
Nonverbal Battery is recommended. This is because
its correlations with ITBS tests are relatively low. If
the objective is accuracy in estimating ITBS scores,
which is important in setting expectations for
classes, buildings, or districts, the CogAT Composite
should be used. This is because its correlations with
ITBS tests are high.
The prediction of achievement with information
about cognitive ability is discussed in Thorndike’s
(1963) monograph, The Concepts of Over- and
Under-Achievement. Two problems deserve special
mention. The first is excessive overlap of predictors
and criterion and the second is unreliability.
Reliabilities of difference scores and standard
deviations of difference scores due to measurement
error are presented in Table 8.6. Although scores on
general measures of cognitive ability and
educational achievement correlate highly, scores for
individuals can show discrepancies. When expected
achievement is calculated from ability, some
discrepancy exists. Whether the discrepancy means
anything is a subjective question. Some occur simply
because of measurement errors; others are due to
true differences between cognitive ability and school
achievement.
Although the precise influence of ability on
achievement is difficult to determine, one can
estimate the size of discrepancies caused by
measurement error. The reliabilities of differences
(rD) and the standard deviations of difference scores
due to measurement error (SDE) reported in Table
136
8.6 help to interpret ability-achievement
differences. These values were computed from the
correlations between scores on the ITBS and the
CogAT, the K-R20 reliability coefficients from the
ITBS and CogAT standardization, and the standard
deviations of the weighted national standardization
sample.
The first statistic in Table 8.6, rD, estimates the
reliability of the predicted difference between actual
and expected ITBS scores for students with the
same Standard Age Scores. These coefficients are
lower than the reliabilities of ITBS and CogAT
scores because reliabilities of differences are affected
by measurement error in both tests and by the
correlation between tests.
The next statistic, SDE, is the standard deviation of
differences between expected and actual ITBS
scores due to measurement error for students with
the same CogAT Standard Age Score. If abilityachievement discrepancies were produced by
measurement error alone, SDE would be sizable. SDE
values are helpful in understanding differences
between expected and obtained ITBS scores, as the
following example demonstrates.
Obtained Versus Expected Achievement
According to the system of prediction equations
described previously, the expected ITBS Standard
Score in the spring of grade 5 on Language Total for
a student with a Standard Age Score of 97 on the
CogAT Composite is 214. From Table 8.6, the
standard deviation of differences due to errors in
measurement is 8.8. This means that differences of
8.8 or more standard score units between expected
performance and actual performance occur about 32
percent of the time because of measurement error
(i.e., 16 percent in each direction). However, the
standard error of estimate computed from the
correlations and standard deviations in Tables 8.5
and 5.1 is 20.4. The difference between a student’s
actual and predicted standard score has to be at
least this large before it will be identified as
discrepant by the combined achievement/ability
reporting system. Because the standard error of
estimate (20.4) is more than twice as large as the
standard deviation of differences due to
measurement error (8.8), the probability of a
student’s discrepancy score being labeled as extreme
because of measurement error is very unlikely.
K
K
1
1
2
A
B
C
D
E
F
K
K
1
1
2
A
B
C
D
E
F
K
K
1
1
2
3
4
5
6
7
8
Verbal
Quantitative
.27
.56
.57
.61
.63
.65
.65
.66
.69
.67
.24
.50
.53
.61
.76
.77
.78
.79
.79
.80
-Does not include Computation
+Includes Computation
.56
.52
.54
.61
.63
.61
.60
.60
.59
.61
.60
.66
.60
.66
.61
.69
.75
.75
.76
.78
.80
.80
RC
RV
.43
.59
.63
.66
.66
.66
.66
.66
.69
.67
.38
.52
.60
.69
.80
.80
.81
.83
.84
.84
RT
Reading
Total
.50
.50
.56
.58
.61
.58
.58
.59
.44
.44
.63
.63
.65
.63
.63
.65
L1
Spelling
.60
.62
.64
.62
.64
.63
.62
.61
.65
.65
.67
.69
L2
Capitalization
.61
.64
.65
.64
.66
.65
.62
.64
.66
.65
.66
.68
L3
Punctuation
.64
.65
.66
.66
.68
.67
.74
.73
.74
.75
.76
.74
L4
.70
.66
.70
.66
.66
.71
.72
.73
.71
.72
.71
.74
.66
.71
.62
.62
.76
.75
.77
.76
.76
.77
LT
.72
.72
.74
.77
.79
.79
.82
.82
.66
.63
.70
.71
.72
.72
.73
.75
M1
.73
.73
.72
.74
.75
.75
.77
.76
.65
.63
.73
.71
.72
.71
.72
.72
M2
.72
.68
.74
.78
.78
.78
.80
.81
.81
.83
.83
.57
.55
.66
.66
.65
.66
.62
.60
.46
.42
.51
.50
.51
.50
.46
.46
M3
MT -
.68
.60
.67
.71
.68
.76
.75
.76
.75
.76
.77
Computation
Math
Total
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
.77
.76
.74
.76
.77
.77
.78
.79
.69
.65
.66
.66
.70
.68
.68
.70
MT +
Math
Total
.74
.71
.76
.75
.77
.77
.78
.79
.79
.80
.80
.79
.73
.79
.71
.74
.84
.83
.84
.84
.85
.85
CT -
Core
Total
.74
.76
.78
.79
.79
.79
.80
.79
.69
.73
.84
.83
.84
.84
.84
.85
CT +
Core
Total
.62
.56
.62
.63
.62
.63
.67
.64
.69
.64
.74
.73
.74
.75
.75
.73
SS
Social
Studies
.61
.60
.64
.66
.68
.67
.67
.67
.67
.67
.73
.73
.75
.75
.74
.75
SC
Science
.65
.67
.71
.70
.70
.68
.69
.71
.71
.69
.71
.69
S1
.66
.66
.68
.67
.67
.67
.73
.72
.73
.74
.71
.73
S2
.64
.68
.70
.72
.75
.74
.74
.72
.61
.65
.77
.77
.77
.77
.76
.76
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.79
.79
.76
.77
.78
.77
.79
.78
.78
.80
.84
.84
.85
.85
.85
.85
CC -
.78
.79
.76
.77
.79
.78
.79
.78
.78
.80
.84
.83
.85
.84
.84
.84
CC +
Compo- Composite
site
.62
.55
.63
.62
.62
-
.58
.50
.58
.57
.59
-
WA
Word
Analysis
.68
.61
.62
.64
.63
-
.73
.67
.68
.68
.68
-
Li
Listening
.73
.67
.73
.70
.72
-
.77
.69
.74
.67
.72
-
RPT
Reading
Profile
Total
3:16 PM
Note:
5
6
6
7
8
9
10
11
12
13
14
Grade
CogAT Level
K
K
1
1
2
3
4
5
6
7
8
Comprehension
Vocabulary
Reading
10/29/10
5
6
6
7
8
9
10
11
12
13
14
ITBS Level
Table 8.5
Correlations Between Standard Age Scores, Cognitive Abilities Test, Form 6
and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 137
137
K
K
1
1
2
A
B
C
D
E
F
K
K
1
1
2
3
4
5
6
7
8
Nonverbal
Composite
.29
.60
.60
.66
.74
.75
.75
.76
.78
.77
.27
.51
.48
.52
.61
.62
.61
.62
.63
.63
-Does not include Computation
+Includes Computation
.61
.57
.59
.65
.69
.71
.70
.70
.71
.72
.72
.40
.38
.37
.48
.51
.56
.55
.55
.54
.55
.56
RC
RV
.46
.63
.66
.72
.77
.77
.77
.78
.79
.79
.40
.54
.51
.55
.62
.62
.62
.62
.63
.63
RT
Reading
Total
.53
.51
.61
.63
.65
.62
.62
.64
.46
.43
.46
.49
.51
.47
.47
.49
L1
Spelling
.65
.66
.69
.67
.70
.70
.54
.57
.59
.55
.58
.59
L2
Capitalization
.66
.70
.70
.69
.71
.71
.57
.60
.60
.59
.61
.60
L3
Punctuation
.73
.73
.74
.75
.76
.75
.60
.60
.61
.61
.62
.63
L4
.75
.70
.73
.70
.69
.78
.78
.79
.78
.78
.79
.56
.50
.53
.55
.56
.64
.65
.66
.64
.65
.65
LT
Usage & Language
Expression
Total
Language
.74
.73
.78
.80
.82
.82
.83
.84
.58
.59
.67
.69
.71
.70
.71
.72
M1
.74
.73
.78
.79
.80
.80
.81
.80
.56
.58
.66
.68
.69
.68
.70
.70
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
.75
.70
.75
.80
.78
.83
.84
.84
.85
.86
.86
.58
.55
.62
.62
.63
.63
.59
.57
.49
.48
.54
.54
.54
.53
.51
.49
M3
MT -
.59
.54
.58
.61
.63
.71
.73
.73
.73
.74
.75
Computation
Math
Total
Mathematics
.79
.77
.77
.78
.80
.79
.79
.81
.62
.63
.63
.67
.69
.67
.68
.70
MT +
Math
Total
.79
.76
.80
.79
.81
.85
.86
.87
.87
.87
.88
.58
.55
.56
.61
.64
.71
.72
.73
.71
.72
.73
CT -
Core
Total
.78
.81
.86
.86
.87
.86
.87
.87
.60
.64
.71
.72
.72
.71
.72
.72
CT +
Core
Total
.68
.61
.72
.71
.72
.73
.75
.73
.47
.42
.59
.59
.59
.59
.62
.61
SS
Social
Studies
.66
.65
.74
.74
.76
.76
.76
.76
.47
.47
.62
.63
.64
.64
.65
.65
SC
Science
.73
.75
.77
.77
.77
.76
.64
.67
.69
.68
.69
.70
S1
.74
.74
.75
.75
.74
.75
.61
.63
.64
.64
.63
.65
S2
.69
.73
.79
.80
.82
.82
.81
.81
.56
.59
.68
.70
.72
.71
.71
.72
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
.84
.84
.85
.86
.87
.87
.87
.87
.63
.65
.71
.73
.73
.72
.73
.74
CC -
.83
.84
.85
.86
.87
.87
.87
.87
.63
.64
.71
.73
.73
.73
.74
.74
CC +
Compo- Composite
site
.65
.56
.66
.65
.65
-
.53
.44
.53
.53
.52
-
WA
Word
Analysis
.74
.67
.66
.68
.67
-
.56
.48
.46
.48
.48
-
Li
Listening
.79
.73
.78
.74
.77
-
.58
.55
.60
.57
.59
-
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
A
B
C
D
E
F
K
K
1
1
2
3
4
5
6
7
8
Comprehension
Vocabulary
Reading
10/29/10
5
6
6
7
8
9
10
11
12
13
14
5
6
6
7
8
9
10
11
12
13
14
Grade
CogAT Level
138
ITBS Level
Table 8.5 (continued)
Correlations Between Standard Age Scores, Cognitive Abilities Test, Form 6
and Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 138
Note:
SDE
9.03
8.06
10.32
7.21
8.69
7.79
8.14
9.39
10.34
11.19
11.78
.79
.87
.81
.76
.77
.76
.74
.70
.69
.70
RT
Reading
Total
4.48 2.70
5.41 4.20
5.08 4.70
7.68 6.81
8.31 6.29
10.25 7.19
11.42 8.16
12.56 9.07
12.90 9.63
13.78 10.19
.61
.81
.83
.75
.73
.69
.68
.66
.70
.69
-Does not include Computation
+Includes Computation
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
.36
.55
.41
.74
.63
.70
.72
.68
.64
.59
.59
RC
Comprehension
4.25
7.17
7.10
9.02
10.06
11.21
11.35
12.35
.82
.74
.80
.79
.80
.80
.82
.79
L1
Spelling
11.49
15.21
16.89
19.84
20.83
21.63
.75
.72
.71
.66
.66
.65
L2
Capitalization
11.22
14.05
15.57
17.08
18.70
20.22
.73
.72
.73
.76
.74
.71
L3
Punctuation
.36
.46
.44
.71
.71
.83
.84
.83
.83
.83
.82
LT
.51
.56
.62
.69
.69
.73
.73
.71
M1
5.09
5.21
7.04
5.66 7.12
7.25 8.55
10.32 6.02 8.51
12.15 7.56 8.78
14.72 8.76 9.90
16.02 9.93 10.25
17.35 10.59 11.08
19.31 11.53 11.97
.70
.74
.70
.71
.70
.70
L4
.49
.51
.57
.62
.68
.73
.75
.76
.77
.77
.77
.79
.77
.80
.82
.83
.79
.79
.81
M3
MT -
5.12
5.49
6.58
7.90 5.84 3.75
9.30 6.99 5.65
9.89 6.94 6.42
11.96 7.93 7.52
13.06 8.88 8.66
14.87 9.77 11.58
15.71 10.50 13.53
16.39 11.13 14.17
.57
.63
.64
.64
.66
.66
.67
.69
M2
Computation
Math
Total
4.34
5.43
5.20
5.99
6.88
7.95
8.73
9.30
.71
.76
.85
.86
.85
.85
.85
.85
MT +
Math
Total
4.51
4.29
5.52
4.11
5.40
4.76
5.62
6.42
7.28
7.99
8.48
.43
.59
.50
.79
.74
.81
.82
.81
.80
.79
.78
CT -
Core
Total
3.79
5.07
4.45
5.27
6.09
7.00
7.54
8.12
.82
.77
.82
.83
.82
.80
.81
.79
CT +
Core
Total
8.46
11.28
9.14
10.85
12.56
14.54
15.27
16.06
.41
.32
.61
.64
.64
.62
.65
.69
SS
Social
Studies
9.69
12.29
10.39
11.11
12.31
13.48
14.34
15.48
.40
.39
.64
.69
.69
.69
.71
.69
SC
Science
11.65
13.23
15.40
17.35
19.65
19.85
.59
.63
.62
.63
.58
.64
S1
7.84
9.96
11.29
13.57
14.29
15.99
.66
.68
.70
.65
.71
.66
S2
5.91
7.78
7.47
8.83
10.25
11.87
13.00
13.68
.67
.64
.69
.72
.72
.70
.72
.72
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
4.09
5.36
4.60
5.45
6.35
7.27
7.88
8.45
.72
.65
.82
.82
.81
.80
.80
.80
CC -
.63
.70
.63
.72
.69
-
WA
Word
Analysis
7.05
6.80
9.07
3.96 7.99
5.29 10.62
4.54
5.34
6.29
7.18
7.73
8.30
-
.71
.65
.82
.84
.81
.81
.82
.81
CC +
Compo- Composite
site
5.73
6.11
7.38
8.07
9.54
-
.39
.41
.44
.33
.37
-
Li
Listening
5.00
4.69
4.56
4.00
5.50
-
.50
.66
.71
.82
.76
-
RPT
Reading
Profile
Total
3:16 PM
rD
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
RV
Vocabulary
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
10/29/10
Level Grade
Verbal
Reading
Table 8.6
Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE )
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 139
139
SDE
.61
.80
.82
.78
.82
.79
.79
.79
.79
.81
4.49
5.36
4.96
7.26
8.14
10.09
11.27
12.30
12.77
13.51
.51
.63
.57
.76
.73
.80
.82
.80
.80
.78
.79
8.70
7.77
9.86
6.91
8.02
7.63
7.93
9.19
9.97
10.81
11.34
-Does not include Computation
+Includes Computation
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
rD
RC
RV
2.70
4.15
4.52
6.11
6.09
6.96
7.94
8.64
9.26
9.72
.78
.86
.82
.82
.86
.86
.85
.85
.84
.85
RT
Reading
Total
4.23
7.11
7.08
9.03
10.18
11.24
11.39
12.39
.81
.73
.82
.81
.81
.82
.83
.81
L1
Spelling
11.58
15.40
17.15
19.96
20.99
21.70
.75
.71
.70
.67
.68
.69
L2
Capitalization
11.32
14.22
15.84
17.33
19.04
20.46
.73
.71
.73
.75
.73
.73
L3
Punctuation
.47
.48
.49
.70
.71
.86
.85
.84
.85
.85
.85
LT
.44
.46
.55
.60
.56
.62
.57
.56
M1
.48
.54
.64
.59
.60
.59
.58
.62
M2
.45
.41
.49
.55
.58
.69
.67
.67
.67
.64
.65
.76
.73
.72
.74
.76
.70
.71
.75
M3
MT -
.64
.69
.79
.78
.78
.76
.76
.75
MT +
Math
Total
4.89
5.05
5.11
5.50
6.80
6.49
5.52 7.03 7.82 5.72 3.78 4.26
7.00 8.46 9.20 6.85 5.71 5.35
10.23 6.09 8.69 10.02 7.17 6.68 5.51
12.10 7.72 9.07 12.21 8.29 7.88 6.44
14.75 9.04 10.34 13.45 9.43 9.13 7.45
15.91 10.07 10.76 15.31 10.37 12.14 8.62
17.30 10.82 11.75 16.25 11.26 14.10 9.53
19.36 11.69 12.68 17.00 11.98 14.79 10.22
.78
.79
.76
.78
.77
.75
L4
Computation
Math
Total
4.22
4.09
5.13
3.86
4.91
4.78
5.73
6.69
7.46
8.20
8.78
.59
.65
.61
.79
.76
.86
.85
.84
.83
.83
.82
CT -
Core
Total
3.58
4.60
4.51
5.42
6.35
7.18
7.81
8.35
.81
.79
.86
.85
.84
.84
.83
.84
CT +
Core
Total
8.11
10.89
9.04
10.77
12.46
14.35
15.24
16.00
.54
.45
.72
.72
.74
.73
.72
.75
SS
Social
Studies
9.35
11.76
10.34
11.09
12.38
13.44
14.35
15.48
.51
.52
.71
.74
.74
.76
.76
.76
SC
Science
11.70
13.30
15.68
17.64
19.88
20.15
.62
.67
.60
.60
.58
.64
S1
7.83
9.97
11.41
13.57
14.41
16.07
.72
.72
.73
.71
.74
.71
S2
5.77
7.49
7.47
8.89
10.54
12.07
13.27
13.91
.67
.64
.75
.76
.73
.72
.72
.75
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
3.70
4.67
4.59
5.47
6.53
7.27
8.03
8.60
.76
.75
.87
.87
.86
.86
.85
.85
CC -
.61
.68
.61
.70
.69
-
WA
Word
Analysis
6.99
6.79
8.95
3.56 7.86
4.62 10.30
4.53
5.41
6.52
7.31
7.94
8.52
-
.77
.75
.88
.87
.85
.85
.85
.85
CC +
Compo- Composite
site
5.49
5.90
7.09
7.83
9.16
-
.52
.52
.55
.43
.48
-
Li
Listening
4.70
4.46
4.19
3.75
4.93
-
.62
.71
.76
.83
.81
-
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
Comprehension
Vocabulary
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
10/29/10
Level Grade
Quantitative
140
Reading
Table 8.6 (continued)
Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE )
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 140
SDE
8.45
7.54
9.57
6.47
7.60
7.38
7.66
8.86
9.62
10.44
11.02
4.48
5.15
4.66
6.89
7.88
9.76
10.81
11.77
12.07
12.91
.61
.83
.86
.83
.83
.81
.82
.82
.84
.84
-Does not include Computation
+Includes Computation
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
rD
.62
.70
.67
.83
.80
.83
.85
.83
.83
.81
.81
RC
RV
.83
.76
.86
.84
.85
.86
.87
.85
L1
Spelling
.79
.74
.75
.73
.73
.73
L2
Capitalization
.76
.75
.77
.79
.78
.77
L3
Punctuation
2.67
3.88
4.13 4.11
5.57 6.96
5.75 6.81 11.27 11.07
6.58 8.70 15.04 13.87
7.45 9.67 16.64 15.31
8.10 10.68 19.41 16.70
8.60 10.81 20.41 18.38
9.15 11.82 21.20 19.84
.79
.88
.87
.88
.89
.88
.88
.88
.88
.88
RT
Reading
Total
.64
.64
.68
.78
.78
.90
.89
.89
.90
.90
.89
LT
.63
.63
.65
.72
.70
.75
.75
.74
M1
.69
.71
.71
.67
.69
.70
.70
.71
M2
.62
.58
.70
.76
.77
.78
.78
.79
.79
.80
.79
.80
.77
.79
.81
.83
.78
.78
.80
M3
MT -
4.69
4.84
4.90
5.31
6.45
6.09
5.20 6.70 7.37 5.21 3.62
6.65 8.11 8.68 6.29 5.56
9.90 5.57 8.42 9.69 6.76 6.42
11.64 7.08 8.67 11.81 7.78 7.53
14.16 8.14 9.81 12.90 8.69 8.67
15.20 9.10 10.07 14.64 9.53 11.58
16.51 9.80 10.93 15.53 10.29 13.60
18.70 10.75 11.86 16.30 11.03 14.24
.81
.82
.80
.82
.81
.79
L4
Computation
Math
Total
3.82
4.86
5.08
5.93
6.76
7.78
8.63
9.30
.81
.82
.86
.86
.86
.86
.85
.85
MT +
Math
Total
3.91
3.78
4.60
3.28
4.27
4.23
5.05
5.78
6.36
7.04
7.70
.76
.79
.81
.89
.88
.91
.91
.90
.91
.90
.89
CT -
Core
Total
3.01
4.00
3.92
4.70
5.41
6.10
6.67
7.29
.90
.88
.91
.91
.91
.91
.91
.90
CT +
Core
Total
7.82
10.66
8.85
10.51
12.12
13.92
14.71
15.56
.66
.56
.74
.76
.77
.76
.77
.78
SS
Social
Studies
9.06
11.45
10.11
10.77
11.89
12.92
13.87
15.02
.63
.63
.74
.77
.78
.79
.79
.78
SC
Science
11.51
13.05
15.27
17.19
19.49
19.89
.64
.68
.64
.64
.61
.63
S1
.74
.73
.78
.79
.78
.77
.77
.76
ST
5.53
7.18
7.60 7.22
9.70 8.54
10.98 9.98
13.13 11.47
13.91 12.68
15.67 13.49
.76
.75
.77
.75
.77
.74
S2
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
3.04
4.00
4.05
4.84
5.62
6.29
6.94
7.68
.90
.88
.92
.91
.91
.91
.91
.90
CC -
2.95
3.92
3.98
4.77
5.56
6.31
6.92
7.61
.90
.88
.92
.91
.91
.91
.90
.90
CC +
Compo- Composite
site
6.79
6.58
8.62
7.51
9.89
-
.69
.74
.69
.77
.76
-
WA
Word
Analysis
5.28
5.72
6.82
7.58
8.88
-
.65
.63
.67
.59
.62
-
Li
Listening
4.37
4.20
3.70
3.24
4.34
-
.77
.79
.86
.90
.89
-
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
Comprehension
Vocabulary
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
10/29/10
Level Grade
Nonverbal
Reading
Table 8.6 (continued)
Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE )
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 141
141
SDE
8.48
7.57
9.65
6.50
7.61
7.28
7.54
8.76
9.53
10.31
10.82
4.46
5.08
4.64
6.83
7.70
9.53
10.60
11.50
11.70
12.41
.61
.80
.84
.78
.78
.75
.75
.75
.77
.78
-Does not include Computation
+Includes Computation
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
rD
.49
.62
.56
.77
.72
.77
.80
.77
.76
.74
.74
RC
Comprehension
.82
.74
.82
.81
.82
.82
.84
.82
L1
Spelling
.75
.70
.70
.66
.66
.67
L2
Capitalization
.72
.69
.72
.75
.73
.72
L3
Punctuation
2.62
3.79
4.13 4.06
5.54 6.90
5.57 6.77 11.13 10.90
6.36 8.61 14.81 13.63
7.23 9.56 16.38 15.02
7.87 10.61 19.20 16.35
8.26 10.69 20.10 17.92
8.72 11.65 20.76 19.33
.79
.87
.84
.83
.84
.83
.83
.82
.82
.83
RT
Reading
Total
.45
.47
.50
.70
.72
.86
.86
.86
.86
.86
.85
LT
.46
.49
.53
.61
.57
.64
.63
.61
M1
.52
.59
.60
.55
.57
.57
.58
.62
M2
.46
.43
.54
.59
.66
.67
.67
.70
.68
.69
.69
.78
.75
.77
.79
.80
.75
.75
.79
M3
MT -
4.65
4.76
4.89
5.26
6.48
6.09
5.19 6.69 7.38 5.22 3.59
6.57 8.04 8.61 6.19 5.48
9.72 5.34 8.27 9.51 6.53 6.32
11.39 6.73 8.44 11.56 7.45 7.39
13.89 7.73 9.55 12.60 8.30 8.51
14.88 8.68 9.74 14.31 9.09 11.42
16.07 9.20 10.50 15.05 9.70 13.39
18.08 10.05 11.30 15.66 10.23 13.99
.75
.77
.73
.75
.74
.73
L4
Computation
Math
Total
3.81
4.75
4.93
5.66
6.42
7.42
8.13
8.62
.69
.74
.81
.82
.81
.81
.82
.80
MT +
Math
Total
3.87
3.75
4.68
3.27
4.17
3.90
4.61
5.29
5.89
6.36
6.84
.58
.65
.62
.82
.80
.86
.85
.85
.84
.85
.83
CT -
Core
Total
3.01
3.90
3.61
4.26
4.96
5.61
5.99
6.46
.84
.81
.86
.87
.85
.86
.85
.85
CT +
Core
Total
7.88
10.70
8.73
10.34
11.95
13.73
14.38
15.13
.50
.42
.66
.69
.70
.68
.69
.72
SS
Social
Studies
9.10
11.47
9.95
10.53
11.62
12.59
13.41
14.46
.48
.49
.66
.71
.71
.72
.73
.72
SC
Science
11.33
12.75
14.92
16.81
19.00
19.20
.56
.61
.56
.55
.52
.58
S1
.66
.63
.70
.72
.70
.68
.70
.70
ST
5.50
7.09
7.48 7.00
9.50 8.21
10.73 9.58
12.83 11.03
13.54 12.10
15.17 12.73
.68
.69
.71
.68
.72
.68
S2
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
3.07
3.93
3.70
4.33
5.09
5.73
6.14
6.64
.79
.77
.88
.87
.86
.86
.86
.86
CC -
2.97
3.88
3.63
4.26
5.04
5.70
6.07
6.58
.80
.77
.88
.88
.86
.86
.86
.86
CC +
Compo- Composite
site
6.66
6.51
8.59
7.46
9.81
-
.62
.70
.61
.72
.70
-
WA
Word
Analysis
5.23
5.71
6.86
7.61
8.90
-
.48
.49
.54
.40
.46
-
Li
Listening
4.32
4.13
3.71
3.24
4.30
-
.60
.70
.78
.85
.83
-
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
RV
Vocabulary
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
10/29/10
Level Grade
Composite
142
Reading
Table 8.6 (continued)
Reliabilities of Difference Scores (rD ) and Standard Deviations of Difference Scores Due to Errors of Measurement (SDE )
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 142
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 143
Predicting Achievement from General
Cognitive Ability: Group Averages
Norms for school averages show the rank of a school
in the distribution of all schools that participated in
the national standardization. Norms for school
averages are included in the Norms and Score
Conversions manuals for the ITBS. They also appear
on reports available from the Riverside Scoring
Service.
When interpreting averages for a group, how the
group compares with groups similar in average
ability is also of interest. Few groups of students are
exactly at the national average in cognitive ability;
most are above or below. Grade groups can also
differ markedly in average ability from year to year,
even within the same building. Such factors should
be considered when interpreting achievement test
results.
Comparisons of average achievement and ability are
based on data from the spring 2000 standardization
of the ITBS and the CogAT. Average scores were
obtained for each building; Developmental Standard
Scores (SS) for the ITBS and Standard Age Scores
(SAS) for the CogAT were used. Records of the two
sets of scores were matched by school building.
Correlations between CogAT and ITBS averages
appear in Table 8.7. As expected, correlations of the
ITBS tend to be higher with the CogAT Verbal
Battery and lower with the CogAT Nonverbal
Battery. Correlations between averages from the
ITBS Math tests and CogAT Quantitative Battery
are also high. Correlations between CogAT and
ITBS averages are generally lower for Spelling and
Math Computation.
The combined ITBS/CogAT scoring service
described in the Interpretive Guide for School
Administrators is used with individual student
scores. Class summaries are furnished at the end of
each report. A classroom teacher can compare the
average obtained ITBS scores to the average ITBS
scores expected from prediction equations used for
combined reporting. Such summaries are also
described for building and system averages. When
combined ITBS/CogAT score reports are
unavailable, Table 8.7 can be used to compare ITBS
and CogAT averages.
1
Table 8.7 gives values needed for predicting average
ITBS scores from average CogAT scores. Values in
the table include the slope (b) and the intercept (c)
of the equations for predicting ITBS building
averages from CogAT building averages. The
equations are in the form:
Predicted ITBS SS = b (SAS) + c
where SAS is the average CogAT Standard Age
Score and b and c are the prediction constants.1
The following example illustrates how to construct a
prediction equation for averages. Suppose the
average Nonverbal SAS for a school in the spring of
grade 6 was 92.6 (i.e., SAS = 92.6). The predicted
average ITBS Composite (CC + in standard score
units) for such a school would be 1.897 (92.6) + 42.3 =
218. The values of b = 1.897 and c = 42.3 come from
Table 8.7. The predicted average Composite score of
218 is an estimate of the average achievement for
schools with a Nonverbal SAS of 92.6. The ITBS
norms for building averages indicate that 26 percent
of the school buildings in the national
standardization had an average standard score
Composite lower than 218. The prediction equation
for averages simply indicates that for a school with
an SAS score below the national average, the ITBS
building average is also expected to be below the
national average.
Comparing the predicted average achievement for a
school with the actual ITBS school average allows
an administrator to determine if average
achievement is above or below expectation. It does
not indicate whether the magnitude of a difference
is important. To evaluate the importance of the
discrepancy, the standard errors of estimate (S y.x) in
Table 8.7 are helpful. These values estimate the
standard deviation of ITBS school averages for a
group of schools with the same average CogAT SAS.
They measure the variability expected in average
achievement of schools similar in average cognitive
ability.
Third-degree polynomial regression equations were used for the combined ITBS/CogAT individual scoring because the
relationships between ability and achievement results for individual students tend to be curvilinear; however, the relationships
between ITBS and CogAT school averages appear to be linear.
143
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 144
To illustrate the use of standard errors of estimate,
consider the previous example. In Table 8.7, Sy.x for
CC+ in grade 6 when predicted by CogAT Nonverbal
is 9.2 ITBS standard score units. This value, 9.2,
represents the amount by which an average
standard score must differ from the predicted
average standard score to be in the upper or lower
16 percent of school averages. This means that for
schools with an average SAS of 92.6, 16 percent
would be expected to have average ITBS Composites
above 227.2 (218 + 9.2) and 16 percent would be
expected to have average Composites below 208.8
(218 – 9.2).1
1
The prediction constants and standard errors of
estimate in Table 8.7 were obtained from schools
that administered the ITBS in the spring. They are
applicable only at this time of year. Estimates of
expected average achievement at fall and midyear
can be obtained by adjusting the predicted ITBS SS
by grade-to-grade differences in average standard
scores. Information about these adjustments can be
obtained from the publisher.
Adding and subtracting standard errors of estimate to predicted mean standard scores is not strictly appropriate in evaluating
a discrepancy between actual and predicted achievement. The standard error of estimate does not account for errors
associated with the establishment of the prediction equation. However, accounting for such errors would, in most applications,
increase the reported standard errors by less than 2 percent. (Note: The top and bottom 10 percent of averages may be
obtained by adding and subtracting 1.28 standard errors of estimate to the predicted average grade equivalent; for the top and
bottom 5 percent, use 1.64 standard errors.)
144
K
K
1
1
2
3
4
5
6
7
8
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
5
6
6
7
8
Sy.x 9
10
11
12
13
14
c
b
4.6
4.6
5.1
6.5
6.8
6.2
7.3
7.0
9.2
9.5
123.5
118.5
94.3
86.0
66.0
52.0
42.7
24.0
20.1
12.2
.117
.332
.588
.852
1.241
1.527
1.752
2.023
2.183
2.381
3.7
4.0
5.6
5.7
6.1
5.0
5.8
5.9
7.1
7.5
121.1
117.0
83.9
68.1
74.4
65.1
54.7
44.5
43.1
39.9
.142
.348
.690
1.028
1.146
1.384
1.623
1.816
1.954
2.105
.30
.59
.75
.83
.85
.90
.91
.92
.91
.90
RT
Reading
Total
4.4
5.1
5.5
6.7
7.6
10.3
9.9
9.1
113.0
117.5
108.7
99.7
79.3
75.4
95.3
98.2
.379
.509
.779
1.041
1.392
1.531
1.439
1.545
.62
.63
.77
.77
.82
.74
.76
.79
L1
Spelling
-Does not include Computation
+Includes Computation
4.3
7.3
4.9
6.8
6.0
6.8
5.2
6.0
6.2
6.9
7.2
52.5
76.6
73.9
72.3
49.6
79.3
76.8
62.1
64.8
66.1
68.7
.800
.631
.762
.801
1.207
1.084
1.255
1.537
1.611
1.724
1.819
.20
.52
.73
.73
.84
.88
.88
.91
.89
.88
RC
Comprehension
9.9
10.9
11.7
13.9
14.8
14.5
66.5
30.0
15.0
.6
-3.6
-28.2
1.216
1.751
2.043
2.289
2.435
2.826
.72
.78
.81
.77
.80
.82
L2
Capitalization
8.7
10.4
11.1
13.3
14.0
13.9
73.2
36.3
8.8
-6.0
-2.4
-29.3
1.151
1.694
2.110
2.360
2.439
2.833
.75
.78
.83
.80
.81
.84
L3
Punctuation
8.1
9.2
9.6
11.2
11.8
13.4
53.8
27.2
8.1
-35.7
-22.0
-65.4
1.376
1.797
2.138
2.648
2.620
3.176
.82
.83
.87
.87
.87
.87
L4
2.5
4.1
3.7
4.2
5.3
6.6
7.5
7.9
9.2
10.8
10.4
81.7
94.5
91.8
90.9
89.8
77.3
48.8
28.5
9.0
17.4
-5.3
.512
.478
.595
.610
.797
1.115
1.567
1.916
2.204
2.228
2.588
.87
.69
.80
.80
.78
.82
.85
.89
.87
.86
.88
LT
Usage & Language
Expression
Total
Language
4.1
5.5
5.2
6.0
6.3
8.2
8.9
10.4
91.2
88.2
99.3
88.9
84.2
63.8
79.2
46.5
.598
.798
.870
1.118
1.306
1.623
1.584
2.023
.80
.77
.82
.82
.85
.83
.82
.82
M1
4.9
5.9
6.6
6.7
7.7
9.2
9.9
10.7
73.9
80.6
71.5
61.6
44.1
34.1
29.6
8.8
.775
.887
1.166
1.421
1.734
1.940
2.105
2.415
.82
.78
.83
.85
.87
.84
.86
.86
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
2.6
4.3
3.8
4.1
5.1
5.6
6.0
6.4
8.1
8.3
9.4
87.3
92.9
98.0
82.6
85.5
85.8
75.3
64.2
49.6
54.5
27.7
.450
.444
.519
.686
.834
1.015
1.269
1.521
1.776
1.843
2.218
3.6
6.3
7.0
8.5
9.9
13.2
16.0
17.8
119.8
128.0
125.3
119.3
129.1
125.0
107.8
100.2
.307
.414
.602
.818
.852
1.034
1.321
1.526
.61
.48
.59
.59
.56
.50
.55
.54
M3
MT -
.83
.64
.76
.84
.81
.84
.85
.88
.85
.87
.87
Computation
Math
Total
Mathematics
3.6
5.0
5.3
5.9
6.4
8.3
9.2
10.2
95.1
100.1
99.0
89.9
86.0
75.2
72.0
52.3
.559
.690
.878
1.120
1.297
1.525
1.672
1.984
.82
.75
.82
.82
.85
.81
.83
.82
MT +
Math
Total
2.4
4.3
3.4
4.1
4.6
5.2
5.1
5.4
5.9
7.4
7.1
74.0
88.9
87.7
86.7
81.8
79.5
64.4
48.7
34.8
39.9
21.9
.586
.510
.627
.654
.881
1.091
1.395
1.693
1.929
1.995
2.294
.90
.69
.84
.83
.85
.87
.90
.93
.93
.91
.92
CT -
Core
Total
3.9
4.6
5.1
5.0
5.3
6.1
7.4
7.3
90.7
86.8
83.3
69.2
56.0
43.6
45.6
30.3
.614
.832
1.051
1.346
1.618
1.842
1.939
2.213
.82
.83
.87
.90
.92
.91
.90
.92
CT +
Core
Total
4.3
5.1
6.6
6.1
7.7
8.7
9.8
13.0
74.7
89.4
79.6
58.9
47.9
20.9
16.7
-7.3
.773
.807
1.096
1.447
1.710
2.065
2.232
2.564
.86
.79
.82
.88
.87
.87
.88
.83
SS
Social
Studies
4.8
5.5
6.0
7.0
8.5
9.9
10.5
11.7
68.6
61.4
58.7
50.6
29.2
8.5
25.4
-4.5
.821
1.083
1.311
1.552
1.912
2.198
2.159
2.568
.85
.85
.88
.86
.87
.85
.86
.85
SC
Science
7.6
7.5
9.0
9.8
12.0
13.3
65.1
48.4
36.8
12.2
20.4
-4.7
1.249
1.573
1.828
2.162
2.210
2.558
.81
.85
.85
.85
.83
.82
S1
4.9
6.2
8.1
8.8
10.3
11.6
85.5
76.7
58.2
34.4
32.3
-8.1
1.025
1.278
1.615
1.951
2.091
2.600
.87
.84
.85
.85
.85
.86
S2
4.3
5.6
5.6
6.0
7.5
8.6
10.1
11.2
91.2
84.1
75.9
62.3
47.6
23.7
26.8
-6.2
.597
.851
1.132
1.427
1.721
2.053
2.147
2.577
.79
.78
.87
.88
.88
.87
.86
.86
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
3.6
4.0
5.0
5.0
5.6
6.2
7.7
9.1
82.5
77.8
75.5
61.3
44.7
26.3
32.9
6.5
.696
.923
1.136
1.432
1.744
2.018
2.073
2.446
.87
.88
.89
.91
.93
.92
.91
.90
CC -
3.5
4.0
5.1
5.1
5.7
6.4
7.8
9.2
84.2
79.7
76.1
61.3
46.7
29.0
35.0
11.0
.679
.904
1.132
1.433
1.722
1.992
2.053
2.402
.87
.88
.88
.91
.92
.92
.90
.89
CC +
Compo- Composite
site
5.9
7.0
6.5
7.0
7.7
77.2
96.9
93.5
73.6
56.4
.565
.371
.591
.805
1.150
.64
.39
.61
.73
.78
WA
Word
Analysis
2.4
4.1
3.6
3.9
4.8
74.8
81.3
90.3
91.8
84.6
.581
.602
.603
.596
.841
.90
.76
.82
.81
.82
Li
Listening
3.4
3.5
3.4
4.5
4.9
68.9
98.1
98.9
90.1
79.4
.642
.392
.526
.625
.908
.86
.67
.79
.79
.84
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
.85
.57
.80
.74
.86
.80
.88
.90
.89
.90
.88
RV
Vocabulary
Reading
10/29/10
r
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
Level Grade
Verbal
Table 8.7
Correlations (r), Prediction Constants (b and c), and Standard Errors of Estimate (Sy .x) for School Averages
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 145
145
146
K
K
1
1
2
3
4
5
6
7
8
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
b
c
5
6
6
7
8
Sy.x 9
10
11
12
13
14
5.1
7.4
5.4
6.2
6.8
8.0
7.0
8.4
8.7
10.2
10.3
53.6
77.2
69.4
51.8
46.2
82.8
88.5
73.8
83.3
88.1
103.5
.798
.635
.804
.996
1.242
1.056
1.143
1.425
1.427
1.535
1.482
4.5
4.2
4.6
6.5
8.4
7.7
9.2
9.5
11.8
11.7
118.4
106.1
77.7
77.5
71.0
59.9
47.5
37.0
32.1
37.1
.170
.451
.746
.937
1.200
1.453
1.708
1.891
2.103
2.145
.28
.63
.79
.74
.74
.81
.81
.82
.80
.81
-Does not include Computation
+Includes Computation
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
r
.78
.55
.75
.79
.81
.71
.77
.78
.76
.75
.74
RC
Comprehension
3.5
3.7
5.0
6.0
7.6
6.8
7.9
8.6
10.4
10.4
114.7
106.4
65.5
62.0
78.3
75.2
60.7
60.3
60.2
69.1
.207
.450
.865
1.089
1.116
1.289
1.567
1.657
1.819
1.825
.42
.68
.81
.81
.75
.81
.83
.81
.80
.80
RT
Reading
Total
4.1
5.2
6.0
7.1
8.6
11.2
10.2
10.7
101.1
114.7
107.4
101.9
81.6
81.8
92.2
122.3
.492
.538
.797
1.022
1.372
1.465
1.496
1.312
.69
.62
.72
.73
.76
.69
.74
.68
L1
Spelling
9.8
11.1
12.6
15.2
15.4
17.2
54.2
26.3
10.8
8.1
-8.1
8.9
1.347
1.793
2.088
2.210
2.523
2.470
.73
.77
.78
.72
.78
.74
L2
Capitalization
8.9
10.3
11.8
14.7
14.1
17.1
65.1
30.2
1.7
1.3
-11.4
11.3
1.240
1.759
2.184
2.283
2.572
2.442
.74
.78
.81
.75
.81
.74
L3
Punctuation
9.3
10.6
11.4
14.1
14.0
16.9
53.7
35.4
11.1
-17.1
-15.7
-25.1
1.386
1.721
2.112
2.461
2.603
2.791
.76
.77
.81
.78
.82
.78
L4
2.7
3.8
3.5
3.9
5.1
7.0
8.1
9.2
11.2
11.6
13.6
78.8
90.1
81.8
77.4
80.2
71.6
49.0
27.0
19.1
14.8
30.0
.546
.530
.691
.737
.893
1.179
1.570
1.934
2.100
2.294
2.249
.85
.73
.83
.83
.80
.79
.82
.84
.80
.83
.78
LT
3.6
5.4
5.0
5.3
5.6
6.3
7.0
7.9
76.4
79.4
89.2
78.7
70.5
42.3
57.4
33.3
.738
.886
.978
1.223
1.444
1.829
1.828
2.161
.85
.78
.84
.86
.89
.90
.89
.90
M1
4.3
5.3
7.3
6.1
7.2
8.9
10.2
11.1
55.5
66.5
67.0
51.5
30.4
24.2
22.8
19.1
.950
1.026
1.219
1.525
1.872
2.033
2.209
2.323
.87
.83
.79
.88
.89
.86
.85
.85
M2
2.8
3.7
3.4
3.5
4.5
5.8
5.3
5.8
6.9
7.4
8.3
85.7
83.7
86.9
65.9
72.8
78.5
65.3
50.7
33.8
40.2
26.4
.471
.543
.626
.844
.959
1.095
1.372
1.657
1.926
2.018
2.239
3.3
6.0
5.8
6.9
8.8
11.2
14.9
16.7
109.3
118.2
102.1
93.0
104.3
82.6
82.4
84.1
.407
.511
.839
1.081
1.098
1.446
1.597
1.691
.70
.54
.75
.76
.68
.68
.63
.61
M3
MT -
.80
.75
.81
.88
.85
.83
.89
.90
.90
.90
.90
Computation
Math
Total
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
3.1
4.6
4.7
4.4
5.2
6.1
7.9
8.6
80.7
88.7
86.2
74.3
68.7
50.7
54.0
45.6
.695
.803
1.012
1.278
1.470
1.761
1.881
2.058
.88
.80
.86
.91
.90
.90
.88
.88
MT +
Math
Total
2.8
4.1
3.3
3.5
4.4
6.1
5.8
6.6
7.4
8.5
9.4
72.4
84.8
79.3
70.7
72.4
76.7
64.4
45.7
38.0
39.2
43.2
.608
.560
.708
.806
.975
1.126
1.399
1.725
1.893
2.037
2.092
.86
.73
.85
.88
.86
.82
.87
.89
.88
.88
.86
CT -
Core
Total
3.4
4.4
5.7
5.4
6.2
7.1
8.4
9.2
75.5
77.5
78.7
67.3
51.8
43.9
43.6
49.8
.758
.925
1.104
1.369
1.662
1.835
1.993
2.030
.87
.85
.83
.88
.89
.88
.87
.86
CT +
Core
Total
4.4
5.7
8.0
7.6
9.5
11.3
11.6
15.1
62.2
88.1
85.5
67.0
54.1
39.8
21.3
21.4
.891
.821
1.046
1.371
1.652
1.876
2.226
2.290
.85
.74
.71
.80
.79
.77
.82
.76
SS
Social
Studies
5.0
6.3
7.6
8.2
9.7
11.8
11.7
13.1
56.0
59.3
59.9
56.4
27.6
19.8
26.2
15.1
.938
1.105
1.308
1.500
1.932
2.083
2.189
2.384
.83
.80
.80
.81
.83
.78
.82
.81
SC
Science
8.4
8.6
9.3
11.0
12.2
12.8
61.3
53.3
28.4
15.4
12.8
-1.6
1.295
1.529
1.915
2.126
2.324
2.537
.77
.80
.84
.81
.82
.84
S1
5.8
7.0
9.2
10.5
10.7
13.7
83.4
79.7
59.2
44.1
26.7
18.2
1.053
1.252
1.608
1.851
2.184
2.351
.82
.80
.79
.79
.84
.80
S2
3.9
5.2
6.5
7.0
8.4
10.1
10.4
12.1
77.3
72.5
73.1
66.4
44.1
30.0
20.1
8.5
.728
.966
1.168
1.392
1.759
1.987
2.251
2.442
.83
.82
.81
.83
.84
.82
.85
.84
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
3.2
4.1
6.3
6.1
7.1
8.3
8.9
11.1
67.7
70.2
74.9
64.4
43.4
33.9
32.3
28.7
.836
.998
1.150
1.406
1.759
1.939
2.117
2.235
.90
.88
.82
.86
.88
.86
.87
.84
CC -
3.2
4.1
6.2
6.1
6.9
8.1
8.9
10.8
69.9
72.2
74.3
63.6
44.1
34.6
33.6
30.1
.814
.979
1.157
1.415
1.752
1.932
2.104
2.222
.90
.88
.82
.87
.88
.86
.87
.84
CC +
Compo- Composite
site
5.1
6.2
5.7
6.2
7.7
63.3
78.8
73.6
50.6
45.8
.711
.557
.783
1.024
1.256
.74
.57
.72
.80
.78
WA
Word
Analysis
2.8
4.4
3.6
4.2
4.8
72.8
82.9
82.4
84.4
77.1
.606
.597
.678
.664
.916
.87
.73
.82
.78
.82
Li
Listening
3.3
3.3
3.0
4.0
5.1
63.9
93.2
88.3
74.6
73.3
.699
.447
.627
.772
.970
.86
.73
.84
.84
.82
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
RV
Vocabulary
Reading
10/29/10
5
6
6
7
8
9
10
11
12
13
14
Level Grade
Quantitative
Table 8.7 (continued)
Correlations (r), Prediction Constants (b and c), and Standard Errors of Estimate (Sy .x) for School Averages
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 146
K
K
1
1
2
3
4
5
6
7
8
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
5
6
6
7
8
Sy.x 9
10
11
12
13
14
c
b
4.6
4.1
5.5
6.9
7.1
7.8
9.5
10.3
12.6
13.0
119.2
100.6
80.9
72.1
50.4
68.0
45.2
43.9
43.1
57.4
.165
.508
.723
.994
1.382
1.372
1.732
1.863
1.987
1.946
3.6
3.6
6.2
6.9
6.5
7.2
8.3
9.5
11.3
11.3
113.9
100.5
71.4
57.8
60.7
85.6
59.0
68.0
72.6
84.4
.220
.511
.817
1.135
1.270
1.185
1.585
1.616
1.690
1.675
.35
.68
.68
.74
.83
.79
.81
.76
.75
.76
RT
Reading
Total
4.4
5.3
6.6
7.8
8.8
12.6
11.1
10.4
100.8
109.5
112.2
116.0
79.3
105.3
106.1
122.9
.502
.591
.737
.882
1.395
1.262
1.353
1.307
.62
.59
.64
.66
.75
.57
.68
.71
L1
Spelling
-Does not include Computation
+Includes Computation
6.6
7.8
7.2
7.7
7.9
7.3
7.6
8.5
9.5
11.2
10.6
61.2
73.8
94.5
60.7
43.2
68.4
101.5
70.0
92.2
102.2
112.2
.728
.685
.565
.920
1.277
1.181
1.014
1.463
1.368
1.391
1.397
.22
.63
.68
.69
.83
.81
.80
.78
.77
.76
RC
Comprehension
10.5
11.5
13.0
16.1
16.6
18.1
56.7
39.9
9.5
18.6
8.8
28.2
1.300
1.657
2.101
2.154
2.347
2.282
.68
.75
.76
.68
.74
.71
L2
Capitalization
9.0
11.1
12.2
16.3
15.3
17.8
60.5
47.7
-.7
20.9
3.3
28.2
1.265
1.585
2.209
2.137
2.417
2.277
.72
.74
.79
.67
.77
.71
L3
Punctuation
8.6
11.0
11.8
14.4
14.7
17.8
38.5
49.3
8.1
-16.6
-6.0
-5.8
1.512
1.581
2.143
2.509
2.498
2.602
.80
.75
.79
.77
.79
.76
L4
3.6
4.4
5.1
5.0
5.8
7.3
8.8
9.7
12.6
12.8
14.2
78.6
88.3
96.8
81.0
76.6
68.6
63.7
24.7
32.6
28.8
44.2
.553
.561
.550
.710
.933
1.190
1.422
1.957
2.010
2.148
2.110
.72
.61
.59
.71
.73
.77
.78
.82
.74
.79
.76
LT
Usage & Language
Expression
Total
Language
5.0
6.3
4.9
6.3
6.9
9.3
9.5
10.3
82.9
80.0
83.3
93.6
76.1
68.0
80.2
58.8
.682
.883
1.019
1.075
1.390
1.612
1.596
1.910
.70
.69
.84
.80
.83
.77
.79
.83
M1
6.0
6.8
6.8
7.1
8.2
10.3
10.9
12.5
61.9
72.3
55.9
67.2
32.1
36.1
30.5
39.7
.897
.974
1.308
1.368
1.857
1.957
2.125
2.121
.73
.69
.82
.83
.85
.80
.83
.80
M2
Concepts Problems
& Data
&
InterpreEstimation
tation
3.5
4.5
4.6
5.2
5.9
5.5
6.3
7.0
9.3
9.2
10.4
86.7
84.2
97.8
72.6
76.3
69.8
80.5
54.2
52.7
55.5
49.6
.465
.550
.524
.788
.929
1.162
1.221
1.623
1.779
1.860
2.012
3.6
6.2
7.1
8.5
9.8
13.5
17.0
17.9
109.0
114.1
117.7
120.4
118.9
126.8
124.9
112.3
.415
.553
.671
.809
.954
1.037
1.171
1.414
.63
.51
.58
.60
.57
.47
.47
.53
M3
MT -
.66
.60
.60
.73
.73
.85
.83
.86
.80
.84
.84
Computation
Math
Total
Mathematics
4.3
5.4
5.2
6.1
6.7
9.2
10.4
10.8
85.3
89.4
85.8
93.6
75.9
77.8
78.4
70.7
.658
.800
.999
1.086
1.400
1.528
1.633
1.811
.73
.70
.82
.81
.83
.75
.77
.80
MT +
Math
Total
4.0
4.8
5.1
5.0
5.5
5.7
6.5
7.4
9.1
10.1
10.6
74.6
83.2
96.2
76.2
70.8
67.1
77.4
46.4
51.6
54.1
60.9
.591
.589
.548
.761
.995
1.202
1.269
1.720
1.798
1.883
1.918
.70
.61
.58
.73
.77
.85
.83
.86
.81
.82
.82
CT -
Core
Total
4.7
5.3
5.6
6.4
7.2
9.1
10.3
10.5
80.5
75.2
71.5
81.7
53.8
60.1
61.6
67.9
.718
.951
1.156
1.225
1.644
1.713
1.808
1.852
.73
.77
.84
.83
.85
.79
.80
.81
CT +
Core
Total
6.4
6.7
7.2
7.9
9.8
11.4
11.9
16.0
75.7
94.7
69.9
77.0
51.3
37.9
26.4
40.0
.769
.760
1.181
1.272
1.680
1.935
2.168
2.108
.65
.60
.78
.78
.78
.76
.81
.72
SS
Social
Studies
6.8
7.8
7.0
8.5
9.9
12.0
12.2
13.1
68.4
64.9
47.1
66.8
23.4
18.7
33.5
23.2
.828
1.056
1.413
1.395
1.974
2.138
2.109
2.306
.65
.67
.83
.79
.82
.78
.80
.81
SC
Science
8.1
8.6
8.6
11.2
12.3
13.6
51.7
60.8
16.2
15.2
16.8
14.5
1.369
1.453
2.036
2.173
2.277
2.379
.78
.80
.86
.80
.82
.81
S1
5.9
7.2
8.5
10.1
10.9
14.6
78.9
87.7
46.2
37.4
30.8
36.3
1.079
1.172
1.737
1.958
2.136
2.173
.81
.79
.83
.80
.83
.76
S2
4.8
6.5
6.4
7.1
7.6
10.1
10.6
13.1
78.5
76.3
65.8
74.3
31.5
26.9
24.3
25.8
.725
.933
1.220
1.312
1.884
2.060
2.202
2.273
.73
.69
.82
.82
.87
.82
.85
.81
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
5.0
5.7
5.7
6.6
7.5
9.2
10.0
12.2
75.0
73.3
64.1
75.4
40.6
39.6
42.8
46.0
.774
.973
1.237
1.295
1.788
1.924
2.005
2.067
.74
.75
.85
.84
.86
.82
.84
.80
CC -
4.9
5.6
5.7
6.6
7.4
9.2
10.1
11.9
76.7
74.9
64.3
75.2
41.9
42.3
45.5
47.5
.757
.957
1.236
1.298
1.774
1.897
1.978
2.053
.74
.75
.85
.84
.87
.82
.83
.81
CC +
Compo- Composite
site
5.4
7.0
7.0
7.2
9.2
53.2
88.9
87.3
51.0
51.6
.819
.467
.655
1.032
1.206
.71
.38
.53
.71
.66
WA
Word
Analysis
3.9
5.0
5.2
5.5
6.4
73.8
80.9
99.2
96.0
86.8
.601
.631
.519
.558
.825
.72
.61
.56
.58
.65
Li
Listening
4.5
3.9
4.3
5.2
6.1
63.1
94.5
97.0
79.2
73.2
.713
.444
.548
.735
.975
.73
.57
.65
.71
.73
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
.59
.47
.47
.64
.73
.77
.72
.78
.71
.69
.72
RV
Vocabulary
Reading
10/29/10
r
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
Level Grade
Nonverbal
Table 8.7 (continued)
Correlations (r), Prediction Constants (b and c), and Standard Errors of Estimate (Sy .x) for School Averages
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 147
147
148
K
K
1
1
2
3
4
5
6
7
8
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
b
c
5
6
6
7
8
Sy.x 9
10
11
12
13
14
4.8
7.1
5.5
6.3
5.9
5.9
6.2
6.9
7.4
8.8
8.7
53.2
72.1
72.7
59.9
39.9
62.1
91.4
70.4
75.8
83.4
92.8
.803
.689
.772
.921
1.300
1.251
1.115
1.459
1.509
1.565
1.578
4.5
4.1
4.6
6.1
5.7
6.6
7.9
7.5
10.0
10.2
120.2
105.6
83.4
74.5
47.3
63.3
48.7
29.7
30.1
32.6
.152
.456
.693
.963
1.422
1.420
1.697
1.975
2.101
2.174
.25
.66
.79
.78
.89
.87
.86
.89
.86
.86
-Does not include Computation
+Includes Computation
K
K
1
1
2
3
4
5
6
7
8
5
6
6
7
8
9
10
11
12
13
14
r
.81
.59
.74
.78
.86
.86
.82
.86
.84
.82
.82
RC
Comprehension
3.6
3.5
5.0
5.3
4.9
5.8
6.6
6.9
8.7
8.7
116.3
105.1
72.2
57.4
56.4
78.1
61.5
52.8
56.7
62.0
.192
.463
.802
1.130
1.321
1.261
1.559
1.741
1.833
1.883
.38
.71
.80
.86
.90
.87
.88
.88
.86
.87
RT
Reading
Total
4.1
5.0
5.7
6.9
7.8
10.9
9.7
9.5
104.5
111.8
104.3
108.4
83.0
83.9
95.9
112.4
.461
.565
.820
.959
1.359
1.453
1.443
1.401
.69
.66
.75
.74
.81
.71
.77
.76
L1
Spelling
9.5
10.6
11.6
14.0
14.3
15.5
51.3
36.5
16.4
4.0
-3.0
-.6
1.362
1.693
2.033
2.264
2.446
2.547
.75
.79
.81
.77
.81
.80
L2
Capitalization
8.1
10.0
10.8
13.7
13.0
15.1
56.9
42.2
8.3
.4
-6.0
-.2
1.309
1.641
2.120
2.306
2.492
2.540
.79
.80
.84
.78
.84
.80
L3
Punctuation
7.0
9.8
10.0
11.6
11.9
14.5
32.9
42.2
14.0
-29.3
-16.7
-36.4
1.577
1.654
2.084
2.596
2.585
2.884
.87
.81
.86
.86
.87
.85
L4
2.5
3.7
3.7
3.8
4.8
6.0
7.6
7.9
9.7
10.3
11.5
78.7
88.2
85.5
81.7
78.4
63.2
57.8
31.0
15.2
18.2
19.6
.548
.551
.655
.698
.909
1.251
1.483
1.895
2.151
2.237
2.336
.88
.75
.81
.84
.83
.85
.84
.89
.86
.87
.85
LT
3.7
5.2
4.0
5.4
5.4
6.7
7.4
8.1
82.2
78.8
82.4
90.1
79.3
53.4
70.8
42.7
.684
.889
1.035
1.110
1.357
1.731
1.677
2.054
.84
.80
.90
.86
.89
.89
.88
.90
M1
4.3
5.3
5.5
5.9
6.6
8.0
8.8
9.9
61.8
68.4
52.9
63.5
39.2
26.4
26.1
19.3
.891
1.005
1.346
1.407
1.786
2.024
2.153
2.306
.87
.83
.89
.89
.91
.88
.89
.88
M2
2.7
3.8
3.5
3.6
4.4
4.3
5.3
5.3
6.6
6.8
7.7
85.6
84.4
89.9
72.1
74.0
68.0
76.9
59.1
40.5
48.5
31.3
.473
.538
.596
.787
.944
1.188
1.258
1.574
1.873
1.915
2.177
3.3
6.0
6.6
7.8
9.3
12.4
15.7
17.1
112.5
118.7
112.6
113.1
120.0
108.9
104.3
97.2
.378
.505
.727
.882
.944
1.197
1.365
1.551
.69
.54
.66
.67
.63
.58
.58
.59
M3
MT -
.82
.73
.79
.88
.86
.91
.89
.92
.91
.92
.91
Computation
Math
Total
Mathematics
Concepts Problems
Usage & Language
& Data
&
Expression
Total
InterpreEstimation
tation
Language
3.1
4.5
4.1
5.0
5.3
6.9
8.2
8.7
85.8
89.6
82.9
88.8
79.4
63.8
66.9
53.4
.648
.792
1.034
1.135
1.365
1.643
1.734
1.967
.87
.81
.89
.88
.90
.87
.87
.88
MT +
Math
Total
2.6
4.0
3.5
3.5
3.9
4.1
5.1
5.2
5.8
7.1
7.5
72.2
82.6
82.5
76.4
70.7
63.2
72.0
50.0
36.5
42.4
39.0
.611
.584
.676
.754
.988
1.248
1.325
1.684
1.919
1.984
2.120
.89
.74
.83
.88
.89
.92
.90
.93
.93
.91
.91
CT -
Core
Total
3.4
4.0
4.0
4.9
5.1
6.0
7.3
7.5
80.8
75.9
67.4
75.9
56.9
44.5
48.4
46.5
.709
.938
1.205
1.285
1.613
1.840
1.925
2.049
.87
.88
.92
.90
.93
.92
.91
.91
CT +
Core
Total
4.4
5.4
6.3
6.8
8.2
9.3
9.7
13.6
68.5
86.7
67.7
71.5
54.2
27.8
19.9
14.7
.833
.833
1.211
1.328
1.652
2.004
2.216
2.340
.85
.77
.83
.85
.85
.85
.88
.81
SS
Social
Studies
4.9
5.8
5.8
7.3
8.4
9.8
10.1
11.1
61.8
55.8
45.7
61.7
31.0
9.8
27.0
7.8
.885
1.136
1.436
1.448
1.899
2.194
2.158
2.441
.84
.84
.89
.85
.87
.86
.87
.87
SC
Science
7.2
7.7
8.0
9.1
10.8
11.6
50.3
58.0
32.0
8.5
15.4
-1.5
1.391
1.483
1.880
2.206
2.274
2.520
.84
.84
.88
.88
.86
.87
S1
4.8
6.4
7.8
8.4
9.1
11.9
76.2
84.6
57.8
33.3
28.3
11.7
1.113
1.204
1.622
1.969
2.145
2.399
.88
.84
.86
.87
.89
.85
S2
3.8
5.1
5.2
6.1
6.8
8.0
8.8
10.5
81.3
72.8
63.9
71.2
44.8
21.3
22.3
5.3
.691
.960
1.247
1.345
1.753
2.084
2.206
2.457
.84
.83
.88
.87
.90
.89
.90
.88
ST
Maps
Reference Sources
and
Materials
Total
Diagrams
Sources of Information
3.2
3.7
4.3
5.2
5.5
6.1
7.2
9.2
73.3
69.0
61.3
70.4
46.1
28.0
34.2
23.5
.785
1.007
1.273
1.348
1.734
2.010
2.075
2.272
.90
.90
.92
.90
.93
.93
.92
.89
CC -
3.1
3.6
4.3
5.3
5.4
6.2
7.3
9.1
75.2
70.9
61.6
70.1
47.5
30.1
36.3
26.1
.765
.988
1.272
1.351
1.719
1.988
2.055
2.247
.90
.90
.92
.90
.93
.92
.92
.89
CC +
Compo- Composite
site
5.1
6.6
6.1
6.1
7.4
65.6
86.0
80.8
57.5
45.3
.689
.487
.713
.960
1.258
.74
.49
.67
.80
.80
WA
Word
Analysis
2.4
4.1
3.8
4.2
4.8
72.2
78.4
86.3
88.5
77.9
.613
.644
.640
.626
.906
.90
.77
.79
.79
.83
Li
Listening
3.1
3.3
3.1
4.0
4.6
64.3
93.1
90.6
80.0
70.7
.696
.450
.605
.721
.992
.88
.72
.83
.84
.86
RPT
Reading
Profile
Total
3:16 PM
Note:
K
K
1
1
2
3
4
5
6
7
8
RV
Vocabulary
Reading
10/29/10
5
6
6
7
8
9
10
11
12
13
14
Level Grade
Composite
Table 8.7 (continued)
Correlations (r), Prediction Constants (b and c), and Standard Errors of Estimate (Sy .x) for School Averages
Standard Age Scores, Cognitive Abilities Test, Form 6 and
Developmental Standard Scores, Iowa Tests of Basic Skills — Complete Battery, Form A
Spring 2000 National Standardization
961464_ITBS_GuidetoRD.qxp
Page 148
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 149
Technical Consideration for
Other Iowa Tests
PART 9
Other assessments have been developed in
conjunction with The Iowa Tests to support local
testing programs. These materials include the:
• Iowa Tests of Basic Skills® Survey Battery
(Levels 7–14)
• Iowa Early Learning Inventory™
• Iowa Writing Assessment (Levels 9–14)
• Constructed-Response Supplement to The
Iowa Tests (Levels 9–14)
• Listening Assessment for ITBS® (Levels 9–14)
• Iowa Algebra Aptitude Test™
Language
At Levels 7 and 8, the Language tests in the Survey
Battery assess four content areas: Spelling,
Capitalization, Punctuation, and Usage and
Expression. The numbers of questions per skill are
the same as those for the Complete Battery.
At Levels 9 through 14, the Survey Battery includes
five skill categories corresponding to the major
language skills in the Complete Battery: Spelling,
Capitalization, Punctuation, Usage, and Expression.
The questions were selected to be representative in
content and difficulty of those in the Complete
Battery.
Mathematics
Information about selected aspects of these tests
follows.
Iowa Tests of Basic Skills Survey Battery
The Iowa Tests of Basic Skills Survey Battery
consists of achievement tests in the areas of Reading,
Language, and Mathematics. The Interpretive Guide
for Teachers and Counselors and the Interpretive
Guide for School Administrators contain the content
specifications and skill descriptions for each level of
the ITBS Survey Battery.
Other Scores
The Survey Total is the average of the standard
scores from the Reading, Language, and Math tests.
Description of the Tests
Test Development
The ITBS Survey Battery contains tests in Reading,
Language, and Math with questions from the
corresponding tests of the Complete Battery. Each
test takes 30 minutes to administer.
The ITBS Survey Battery reflects the same content
emphasis as the Complete Battery. It was developed
from items in the Complete Battery, partitioned into
non-overlapping levels so that average item
difficulty, level by level, was approximately the same
as in the Complete Battery. Where clusters of items
shared a common stimulus, care was taken to
maintain the balance of stimulus types.
Reading
At Levels 7 and 8, the first part of the test measures
reading vocabulary. The second part has questions
measuring literal comprehension and questions
measuring the ability to make inferences.
p
In the Survey Battery for Levels 7 through 14, a
single test covers Math Concepts, Problem Solving,
and Data Interpretation. Separately timed sections
measure Computation and Estimation. The relative
emphasis of the major skill areas in the Survey
Battery is approximately the same as in the
Complete Battery. Score reports provide information
about major skills in mathematics, but not subskills.
At Levels 9 through 14, the first part of the Reading
test consists of a representative set of items from the
Complete Battery Vocabulary test. The second part,
Reading Comprehension, was developed with
passages and questions from the Reading test in the
Complete Battery. Passages were chosen to reflect
genres included in the Complete Battery.
Standardization
As part of the 2000 fall national standardization,
Forms A and B of the ITBS Survey Battery were
administered so that each student took one form of
the Complete Battery and the alternate form of the
Survey Battery. Raw score to standard score
conversions for each form of the ITBS Survey
149
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 150
Battery were established by first determining
comparable raw scores on Survey and Complete
Battery subtests via smoothed equipercentile
methods and then attaching standard scores to the
Survey raw scores. Because of the joint
administration of each ITBS Survey Battery with
the alternate form of the Complete Battery,
comparable scores for Form A of the ITBS Survey
were obtained through Form B of the Complete
Battery, and comparable scores for Form B of the
ITBS Survey were obtained through Form A of the
Complete Battery. Comparable scores on the two
batteries were based on approximately 3,000
students per level per test form.
information for interpreting the scales. The IELI is
intended for use with kindergarten and first-grade
students.
Description of the Inventory
The IELI includes the following scales: General
Knowledge,
Oral
Communication,
Written
Language, Math Concepts, Work Habits, and
Attentive Behavior. It takes an experienced teacher
about 10 minutes per student to complete the
ratings.
General Knowledge: This scale measures the
acquisition of general information and facts
expected of five- and six-year-old children.
For more information about the procedures used to
establish norms and score conversions for the ITBS
Survey Battery, see Parts 2 and 4.
Oral Communication: How well a student is
able to communicate ideas, describe what is
seen or heard, or ask questions is the focus
of this scale.
Test Score Characteristics
Written Language: A student’s ability to
recognize and write letters or simple words
is assessed by this scale.
Raw score means and standard deviations and
internal-consistency reliability estimates for Form A of
the Survey Battery for spring testing appear in Table
9.1. Complete technical information about Forms A
and B of the Survey Battery is provided in the Norms
and Score Conversions manual for each form.
Math Concepts: This scale evaluates how well a
student understands and is able to use
beginning mathematical ideas and processes.
Work Habits: Behaviors indicative of success in
the classroom—persistence, resourcefulness,
and independence—are assessed with this
scale.
Iowa Early Learning Inventory
The Iowa Early Learning Inventory (IELI) is a
questionnaire for teachers to rate student behavior
in six areas related to school learning. The Teacher’s
Directions and Interpretive Guide for the IELI
contains descriptions of the six scales and specific
Attentive Behavior: The questions on this scale
relate to a student’s ability to focus on
classroom activities.
Table 9.1
Test Summary Statistics
Iowa Tests of Basic Skills — Survey Battery, Form A
2000 National Standardization Data
Reading
Level
7
Number
of
Items
Mean
SD
40
26.55
8.94
Language
K-R 20
.92
Number
of
Items
34
Mean
SD
23.15
6.68
Math with Computation
K-R 20
.87
Number
of
Items
40
Mean
SD
27.11
6.37
K-R 20
.84
Math without Computation
Number
of
Items
27
Mean
SD
17.92
4.36
K-R 20
.77
8
44
28.95
8.70
.90
42
30.10
7.40
.88
50
33.64
8.19
.88
33
21.41
5.88
.84
9
27
16.94
6.48
.89
43
26.69
8.78
.90
31
20.85
6.10
.86
23
15.57
4.67
.82
10
30
17.50
6.74
.88
47
27.21
9.37
.90
34
20.12
6.38
.85
25
14.36
4.83
.80
11
32
20.39
6.67
.87
51
30.92 10.58
.92
37
22.80
7.52
.88
28
16.68
5.91
.85
12
34
21.67
7.24
.88
54
33.40 11.30
.92
40
25.82
7.87
.89
30
19.64
6.23
.87
13
36
21.23
7.76
.89
57
32.43 11.42
.92
43
25.63
8.39
.89
33
19.93
6.92
.88
14
37
21.41
7.97
.89
59
34.39 11.46
.91
46
25.32
8.74
.89
35
19.82
7.08
.87
Average
150
.89
.90
.87
.84
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 151
Test Development
Extensive research has been done by early childhood
educators on the characteristics that contribute to
successful classroom learning. From this research,
the IELI’s six scales were established. The focus was
limited to behaviors that classroom teachers would
be likely to observe in day-to-day activities. Care
was taken to include behaviors that relate to
learning, rather than socialization.
Standardization
The national norms for the IELI were obtained from
a subsample of kindergarten students included in
the spring 2000 national standardization of the
Iowa Tests of Basic Skills. These norms are the basis
for reporting a student’s score as “Developed,”
“Developing,” or “Delayed” (about 60 percent, 30
percent, and 10 percent of the norming sample,
respectively). Reliability coefficients of IELI scales
range from .81 to .93. The percent of kindergarten
students rated at each of these categories for the six
scales in the 2000 spring national standardization
and correlations between IELI ratings and ITBS
scores are reported in the Teacher’s Directions and
Interpretive Guide for the IELI (Hoover, Dunbar,
Frisbie & Qualls, 2003).
Iowa Writing Assessment
The Iowa Writing Assessment measures a student’s
ability to generate, organize, and express ideas in
written form. As a performance assessment, it adds
another dimension of information to the evaluation
of language arts achievement. Although multiplechoice and short-answer tests typically provide
highly reliable measurement of language skills, such
tests tap a student’s editing skills rather than
composition skills. Norm-referenced evaluation of
student writing supplements information obtained
from the Language tests in the Iowa Tests of Basic
Skills. The norms give a national perspective to
writing that students do on a regular basis in school.
More information about this test can be obtained
from the Iowa Writing Assessment Manual for
Scoring and Interpretation.
Description of the Test
The Iowa Writing Assessment measures a student’s
ability to write four types of essays: narrative,
descriptive, persuasive, and expository.
Narrative: A narrative tells a story. It has
characters, setting, and action. The
characters, the setting, and the problem are
usually introduced in the beginning. The
problem reaches a high point in the middle.
The ending may resolve the problem. A
narrative may be a fictional story, a factual
account of a real-life experience, or some
combination of both.
Descriptive: A description creates a vivid image
of a person, place, or thing. It enables the
reader to share the writer’s experience by
appealing to the senses.
Persuasive: A persuasive essay states and
supports an opinion by drawing on the
writer’s experience or the experience of
others, or by citing authority. Good
persuasive writing considers the audience
and presents an argument likely to be
effective.
Expository: Expository writing conveys
information to help the reader understand a
process, procedure, or concept. Telling how
something is made or done, reporting on an
experience, or exploring an idea are
examples of expository writing.
Directions for local scoring of the Iowa Writing
Assessment are provided to schools. These include
how to select and train raters, and how to organize
scoring sessions. In addition, materials are provided
to help raters make valid, consistent judgments
about student writing. These materials include a
scoring protocol and anchor papers for each prompt,
and a booklet of training papers. The anchor papers
are actual examples of student writing. They
provide concrete illustrations of the criteria for each
score point.
Student essays can be scored in two ways. The
focused-holistic score gives an overall rating of the
quality of an essay. The analytic scores provide
ratings on four separate scales (Ideas/Content,
Organization, Voice, and Conventions). A four-point
scale is used for all ratings.
Each essay is scored by two independent raters.
When two scores differ by more than one point, the
essay is rescored by a supervisor. The final score for
each essay is the average of the two closest ratings.
The supervisor’s rating becomes the final score if it
is midway between the other two ratings. Papers
that are blank, illegible, not written in English, or
not on the assigned topic are deemed unscorable.
151
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 152
For both scoring methods, the protocols describe
each point on the score scale in detail. The protocols
reflect criteria unique to each type of essay, along
with fluency of ideas and development of a personal
voice. In general, the emphasis is on thoughtful
content, logical organization, and original style. In
focused-holistic scoring, language mechanics affect
scores only if they inhibit understanding. In analytic
scoring, language mechanics are scored on the
Conventions scale. For both methods, the protocols
emphasize that effective discourse involves the
ability to reason, to develop ideas, to write for an
audience, and to proceed from one idea to the next.
prompts were weighted so the distribution of ITBS
Vocabulary scores matched the distribution from the
1992 spring national standardization of Form K.
Approximately 20,000 essays, written by students
representing 323 buildings in 32 states, were scored
in the standardization study.
The partially balanced, incomplete block design
allowed norms to be computed for each prompt
based on randomly equivalent groups of students. It
also allowed estimation of correlations between
prompts.
After the 2000 standardization, the norms for the
Iowa Writing Assessment were adjusted. The
original standardization sample was reweighted so
that the Vocabulary scores matched the Vocabulary
distribution in the 2000 spring standardization.
Test Development
Prompts for each type of writing were developed for
students in grades 3 and 4 (Levels 9–10), grades
5 and 6 (Levels 11–12), and grades 7 and 8
(Levels 13–14). Prompts were assigned to levels
based on their difficulty. Factors considered in
designing each prompt were whether it represents
the most salient features of that type of writing;
whether it clearly defines the specific demands of
that type of writing; and whether it is likely to elicit
good writing from students regardless of gender,
race or ethnicity, or geographic region. For each
level, two prompts were developed and standardized
for each type of essay.
Test Score Characteristics
The procedures for training scorers and conducting
scoring sessions helped ensure that scores from the
Iowa Writing Assessment would be reliable. Witt
(1993) studied the consistency of scores between
readers and between prompts. Intraclass correlations
were calculated for the analytic scores, for the
analytic total, and for the focused-holistic score.
Table 9.2 reports average correlations across grades.
Reader reliability is estimated to answer these
questions: How accurate is the rating of this
student’s essay? What is the relation between the
obtained rating and the rating the student would
have received from a different rater? The first two
columns of Table 9.2 display estimates of reader
reliability from the national standardization. These
Standardization
During the 1992 national standardization, prompts
were administered in a partially balanced,
incomplete block design. Pairs of prompts were
spiraled by classroom. Each student also took a form
of the ITBS. The essays in each group of paired
Table 9.2
Average Reliability Coefficients, Grades 3–8
Iowa Writing Assessment
1992 National Standardization
Mode of Discourse
Rater Reliability 1
Score Reliability 2
Generalizability 3
Holistic
Analytic
Holistic
Analytic
Holistic
Analytic
Narrative
.79
.75
.52
.60
.35
.52
Descriptive
.76
.71
.42
.53
.39
.54
Persuasive
.82
.76
.37
.53
.34
.49
Expository
.77
.78
.46
.61
.36
.52
Average
.78
.75
.44
.57
.36
.52
1 Average of two raters
2 Correlations between two essays in the same mode of discourse, essay score average of two ratings
3 Average correlations between essays in different modes of discourse
152
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 153
Table 9.3
Correlations and Reliability of Differences
Iowa Writing Assessment and Iowa Tests of Basic Skills Language Total
1992 National Standardization
Mode of Discourse
Reliability of Differences
Correlations
Holistic
Analytic
Holistic
Analytic
Narrative
.44
.58
.53
.47
Descriptive
.44
.55
.44
.42
Persuasive
.42
.59
.42
.37
Expository
.44
.60
.48
.45
Average
.43
.58
.47
.43
values are similar to reader reliabilities found in
large-scale writing assessments.
Estimates of score reliability answer the following
questions: How accurate is a student’s score on this
essay as an indicator of ability in this type of
writing? What is the expected relation between this
score and a score from a different prompt? The
second two columns of Table 9.2 display estimates of
score reliability from the national standardization.
Score reliabilities were highest for narrative writing
and lowest for descriptive and persuasive writing.
Research confirms that score reliability for a single
essay is lower than the internal-consistency or
parallel-forms reliability of a standardized test that
has many discrete items. Scoring that emphasizes
content, organization, and voice (features that vary
from topic to topic) is less reliable than scoring that
emphasizes mechanics, which are relatively
constant from topic to topic.
Score reliabilities from the standardization should
be considered lower limits. Participants were
assigned prompts on the basis of demographic, not
curricular, characteristics. When schools select
prompts, higher score reliabilities are expected
because instruction tends to create consistent
performance in that type of writing.
The correlations between scores from different essay
types are reported in the final two columns of Table
9.2. These measure the generalizability of
performance on topics requiring different types of
writing. Score generalizabilities are directly
comparable to score reliabilities—both are
correlations between different essays written by the
same student. The score reliabilities reflect
consistency in one type of writing; generalizabilities
reflect consistency across types, which is usually
lower than within a type.
Correlations between the prompts on the Iowa
Writing Assessment and the ITBS Language Total
indicate the overlap in skills assessed on the two
tests. The correlations included in Table 9.3 are
adjusted for unreliability. They estimate the true
score relationship between the tests. These
correlations show that the Iowa Writing Assessment
and the ITBS Language Total tap different aspects
of achievement in language arts.
Constructed-Response Supplement
to The Iowa Tests
The Constructed-Response Supplement (CRS) is
used together with the Complete Battery or the
Survey Battery of the ITBS. The CRS measures
learning outcomes in reading, language, and math,
using an open format. The tests assess content
objectives measured in the multiple-choice format of
the ITBS, along with others where an open format is
particularly effective.
More information about the CRS is found in the
Constructed-Response Supplement to The Iowa Tests
Manual for Scoring and Interpretation. There is a
separate manual for each test level.
Description of the Tests
Students write their answers to open-ended
questions in the test booklets. Teachers use the
guidelines provided to score the tests; they record
the results on forms that can be processed by the
Riverside Scoring Service. Most questions have
more than one correct response and can be
answered by various strategies. Students answer
with phrases, sentences, paragraphs, drawings,
diagrams, or mathematical expressions. The
administration time for each test is 30 minutes.
153
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 154
Thinking about Reading
Students show their ability to understand and
interpret what they read. At each level, the Reading
Supplement consists of one passage and several
questions. Some questions require a one- or twosentence response. Other questions can be answered
in a few words. Responses are scored on a 0-1 or a
0-1-2 scale. The maximum number of points
awarded for a question depends on its complexity.
Some questions have several parts, and points are
given for each part.
Thinking about Language
The Language Supplement assesses students’
ability to develop and organize ideas and to express
them in standard written English. At each grade
level, the Language tests include three parts:
editing, revising, and generating. In the editing part,
students identify which parts of three short stories,
reports, or letters need editing. They can make
changes in spelling, capitalization, punctuation, and
the use of words or phrases. In the revising part,
students revise a story by completing sentences,
changing them to express an idea more clearly,
correcting grammatical mistakes, and completing
the story. In the generating part, students are given
a specific topic and told to define the subject of the
story, write at least three ideas for the story, and
write a complete sentence to be included in the story.
The number of items and the total score points vary
by level, from 23 items (38 score points) to 33 items
(60 score points). Responses are scored on scales of
0-1, 0-1-2, or 0-1-2-3, depending on item complexity.
Scoring guidelines and keys are used to assign
points. For example, items in the editing part are
worth two points: one point for identifying the error
and one point for correcting it. In the revising and
generating parts, guidelines instruct scorers to accept
reasonable answers and to ignore errors in mechanics
unrelated to the skill measured by the item.
Thinking about Mathematics
The Mathematics Supplement assesses problem
solving,
data
interpretation,
conceptual
understanding, and estimation skills. Open-ended
questions are presented alone or in groups related to
a common data source. Items require students to
analyze and solve problems and to describe their
thinking using words, diagrams, graphs, symbols,
calculations, and equations or inequalities. Students
may use a variety of solution strategies to solve
problems; they must also make connections among
mathematical concepts and procedures. Questions
require students to explain their reasoning, show
their work, and justify their conclusions.
154
The number of items and the total score points vary
by level, from 13 items (19 score points) to 17 items
(24 score points). Responses are scored on scales of
0-1 or 0-1-2. The scoring guidelines describe
characteristics of a response at each score level. A
2-point
response
demonstrates
complete
understanding of the math concepts and processes
involved in that item, a 1-point response
demonstrates partial understanding, and a 0-point
response demonstrates no understanding. For each
question, the scoring key describes the kinds of
answers that would earn full or partial credit.
Test Development
The goal of test development was to create
situations that would elicit different patterns of
thinking in different students yet elicit equally
correct responses to questions (Perkhounkova,
Hoover & Ankenmann, 1997). The content
specifications for test development were similar to
those used to develop the ITBS. However, the open
format created opportunities to tap a wider range of
process skills, especially in language and
mathematics. Test materials were reviewed for
balance in terms of gender, race and ethnicity, and
geography as described in Parts 3 and 7 and were
field tested in a national item tryout.
Joint Scaling with the ITBS
CRS results are reported as raw scores. If combined
with a multiple-choice ITBS subtest, they can be
reported as developmental standard scores, national
percentile ranks, and other derived scores. The
combined CRS/multiple-choice standard score
conversions were developed with data from the fall
1997 scaling study in which national samples of
students in grades 3 through 8 took the CRS and
one or more subtests of the ITBS.
Test Score Characteristics
Internal-consistency reliability coefficients for the
CRS appear in Table 9.4. The obtained reliabilities
are comparable to reliability estimates for multiplechoice tests in Language and Math. In Reading, the
reliability estimates reflect the smaller number of
items in that test. The correlations between CRS
scores and multiple-choice tests of the ITBS appear
in Table 9.5 along with the reliabilities of the
differences.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 155
Table 9.4
Internal-Consistency Reliability
Constructed-Response Supplement
1997 Fall Scaling Study
Level
Reading
Language
Math
9
10
11
12
13
14
.67
.68
.61
.65
.70
.65
.80
.81
.79
.81
.86
.85
.75
.78
.75
.82
.80
.82
Average
.66
.82
.79
Listening Assessment for ITBS Directions for
Administration and Score Interpretation.
Description of the Test
The main purposes of the test are: (a) to assess
strengths and weaknesses in the development of
listening skills so effective instruction can be
planned to meet individual and group needs; (b) to
monitor instruction so effective teaching methods
can be identified; and (c) to help teachers and
students understand the importance of good
listening strategies.
Test Development
Table 9.5
Correlations and Reliabilities of Differences
Constructed-Response Supplement and
Corresponding ITBS Subtests
1997 Fall Scaling Study
Correlations
Level
Reading
Language
Math
9
10
11
12
13
14
.60
.68
.57
.59
.63
.66
.67
.70
.71
.74
.72
.73
.69
.76
.70
.79
.77
.80
Average
.62
.71
.75
Content specifications are based on research in
the teaching and assessment of listening
comprehension. Items in the Listening Assessment
measure six major skills:
Literal Meaning: details about persons, places,
objects, and ideas
Inferential Meaning: importance of details;
cause and effect; drawing conclusions
Following Directions: decoding; verbal,
numerical, and spatial relationships;
sequence
Visual Relationships: verbal-to-visual
transformations; word meaning in context
Numerical/Spatial/Temporal Relationships:
analyzing and visualizing concepts of
number, space, and time
Speaker’s Purpose, Point of View, or Style: main
idea, organization, purpose, tone, and context
Reliability of Differences
Level
Reading
Language
Math
9
10
11
12
13
14
.20
.16
.18
.25
.28
.17
.24
.40
.35
.38
.45
.42
.22
.19
.27
.26
.21
.18
Average
.21
.37
.22
Listening Assessment for ITBS
The Listening Assessment is a special supplement to
the Iowa Tests of Basic Skills for grades 3 through 9.
It is an upward extension of the Primary Battery
Listening tests for kindergarten through grade 3.
More information about the test can be found in the
The test includes 95 items in six overlapping levels,
9 through 14.
Standardization
The national standardization sample was selected
as a subsample of schools participating in the fall
1992 national standardization of the Iowa Tests of
Basic Skills. Selection characteristics were region,
district enrollment, type of school, and
socioeconomic status. Because the sample was not
representative of the national population,
distributions of the ITBS Vocabulary score were
obtained for the total ITBS representative sample
and the Listening Assessment sample. These
distributions were used to weight the sample that
took the Listening Assessment so that it represented
the ability of the national sample.
155
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 156
Table 9.6
Test Summary Statistics
Listening Assessment for ITBS
1992 Spring National Standardization
Grade
Number of Items
Raw Scores
Mean
SD
SEM
Standard Scores
Mean
SD
SEM
Reliability (K-R 20)
3
4
5
6
7
8
31
33
35
36
38
40
17.7
4.8
2.5
20.7
4.7
2.6
23.3
4.5
2.6
24.3
5.0
2.6
25.4
5.5
2.6
26.1
6.0
2.8
184.4
19.0
10.3
199.2
20.8
11.8
213.6
23.1
13.7
226.5
24.8
13.1
238.3
26.6
12.9
248.6
28.5
13.4
.72
.69
.67
.73
.78
.79
Table 9.7
Correlations Between Listening and ITBS Achievement
Iowa Tests of Basic Skills – Listening Assessment and Complete Battery, Form K
1992 National Standardization
Grade
Test
3
4
5
6
7
8
Vocabulary
.46 (.59)
.52 (.67)
.63 (.84)
.63 (.79)
.62 (.76)
.65 (.79)
Reading Comprehension
.43 (.54)
.49 (.64)
.60 (.78)
.63 (.78)
.63 (.77)
.60 (.72)
.47 (.58)
.54 (.68)
.65 (.83)
.67 (.82)
.67 (.80)
.66 (.78)
Spelling
.12 (.20)
.35 (.45)
.39 (.51)
.39 (.49)
.37 (.46)
.40 (.49)
Capitalization
.24 (.31)
.46 (.61)
.49 (.67)
.47 (.62)
.48 (.61)
.48 (.60)
Punctuation
.22 (.28)
.41 (.56)
.48 (.66)
.51 (.67)
.46 (.59)
.46 (.59)
Usage and Expression
.43 (.55)
.51 (.67)
.54 (.73)
.59 (.75)
.55 (.69)
.58 (.71)
.45 (.55)
.51 (.64)
.55 (.70)
.56 (.68)
.54 (.65)
.55 (.65)
Concepts and Estimation
.42 (.54)
.55 (.72)
.59 (.79)
.63 (.80)
.61 (.75)
.56 (.69)
Problem Solving &
Data Interpretation
.48 (.62)
.56 (.75)
.58 (.78)
.65 (.84)
.63 (.79)
.62 (.78)
Computation
Reading Total
Language Total
.30 (.38)
.38 (.49)
.41 (.54)
.50 (.63)
.43 (.54)
.41 (.50)
Math Total
.48 (.60)
.58 (.78)
.62 (.80)
.68 (.84)
.65 (.79)
.64 (.77)
Core Total
.51 (.61)
.59 (.73)
.66 (.83)
.69 (.83)
.67 (.79)
.67 (.78)
Social Studies
.45 (.59)
.54 (.72)
.58 (.79)
.67 (.82)
.59 (.74)
.59 (.73)
Science
.41 (.53)
.53 (.70)
.59 (.80)
.60 (.77)
.59 (.74)
.62 (.77)
Maps and Diagrams
.33 (.43)
.53 (.71)
.57 (.78)
.60 (.79)
.55 (.71)
.56 (.70)
Reference Materials
.27 (.34)
.50 (.64)
.55 (.73)
.59 (.75)
.54 (.67)
.58 (.70)
Sources of Information Total
.47 (.58)
.56 (.71)
.60 (.78)
.63 (.79)
.59 (.72)
.61 (.73)
Composite
.52 (.62)
.60 (.74)
.67 (.84)
.71 (.85)
.68 (.80)
.68 (.79)
Note: Correlations in parentheses are adjusted for unreliability.
156
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 157
Test Score Characteristics
Internal-consistency reliability coefficients (KuderRichardson Formula 20) were established with data
from the standardization sample. Means, standard
deviations, reliability coefficients, and standard
errors of measurement for raw scores and
developmental standard scores are shown in
Table 9.6.
Reliability coefficients of listening tests are almost
always lower than those of similar-length tests of
other basic skills test areas. This is probably
because listening comprehension, like reading
comprehension, represents cognitive processes
ranging from simple recall and attention span to
thinking that involves application and generalization.
Homogeneous tests, such as vocabulary and
spelling, yield relatively high reliability per unit of
testing time. Multi-dimensional tests that sample
cognitive processes from different domains tend to
be less reliable.
The reliability coefficients from the national
standardization should be viewed as lower limits.
They were obtained under standardization conditions
with no special preparation or motivation. In most
classrooms, reliability can probably be improved by
(a) discussing the importance of being a good
listener and the rationale of the Listening
Assessment, (b) rehearsing the presentation of test
items to ensure effective delivery, and (c) pacing test
administration so the students pay attention.
The correlations between standard scores on the
Listening Assessment and the ITBS appear in
Table 9.7. These are based on all students from the
1992 fall standardization who took the Listening
Assessment and the Complete Battery of the ITBS.
Correlations adjusted for unreliability based on the
K-R 20 reliability coefficients for the two sets of
variables are also presented. These estimate
correlations between scores from perfectly reliable
tests. They indicate whether the two variables
measure the same ability or unique abilities.
Most of the adjusted coefficients in Table 9.8 are
considerably below 1.00, which suggests the
Listening Assessment measures something different
from the other ITBS tests. Of particular interest are
the adjusted correlations with reading. The
cognitive processes involved in reading and
listening appear similar, even though reading and
listening involve different senses. The ITBS
Listening and Reading tests do have discriminant
validity, however, as evidenced by the adjusted
correlations ranging from .54 to .78. This range of
correlations (between “perfectly reliable” tests) is
observed for tests as different as Listening and
Math (adjusted correlations ranging from .60 to .84),
or Listening and Science (.53 to .80). Although
reading and listening have strong similarities in
development and assessment, their measurement in
The Iowa Tests allows for reliable assessment of
their differences.
Predictive Validity
Data collected in the standardization illustrate how
well the Listening test of the ITBS predicts later
achievement. Data from students who took Level 8
in the spring of grade 2 and Level 9 in the fall of
grade 3 were used to compare the predictive ability
of the Listening test to the predictive ability of other
subtests. Table 9.8 presents the correlations
between the Listening test taken in the spring and
subsequent achievement as measured by the ITBS
in the fall.
Table 9.8
Correlations Between Listening Grade 2
and ITBS Grade 3
ITBS Test
Listening
Vocabulary
.49
Reading Total
.52
Language Total
.42
Math Total
.51
Social Studies
.50
Science
.48
Sources of Information Total
.49
Composite with Computation
.55
Integrated Writing Skills Test
The Integrated Writing Skills Test (IWST) measures
editing skills by having students evaluate passages
that resemble school-based writing. More information
about the IWST can be obtained from the Integrated
Writing Skills Test Score Conversions and Technical
Summary.
Description of the Tests
The content and style of the passages in the
Integrated Writing Skills Test are patterned after
the writing of students in grades 3 through 8. Items
157
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 158
ask students to judge the appropriateness of
selected parts of the passage and to indicate where
changes are needed. A total score is reported, as well
as scores in spelling, capitalization, punctuation,
usage, expression, and multiple skills. Raw scores
can be converted to scaled scores, grade equivalents,
and percentile ranks to interpret performance as it
relates to the regular ITBS battery.
independent clauses, or recognizing that a
homophone of a word was used and should be
changed. Including a higher proportion of items that
tap written expression and multiple language skills
makes the IWST especially useful as a measure of
the editorial skills used in the revision stage of the
writing process.
Standardization
Test Development
The Integrated Writing Skills Test was developed
from the same content specifications as the fourpart Language test in the regular ITBS battery.
Items are presented in an age-appropriate story,
report, or essay written in the voice of a student at a
particular grade level. Items measure a student’s
ability to distinguish between correctly and
incorrectly spelled words; to apply generally
accepted conventions for capitalization, punctuation,
and English usage; and to select effective language
for writing. These are essential skills in
development of the ability to write effectively.
An important feature of the IWST is its increased
emphasis on usage and written expression from
grade 3 to grade 8. The emphasis on items that tap
written expression increases across levels, while the
emphasis on language mechanics (spelling,
capitalization, and punctuation) decreases. The
number of items that require students to integrate
language mechanics and usage in deciding the best
answer also increases with test level. Such items,
classified as multiple-skills questions, might involve
identifying the improper use of a comma to separate
Forms K and M of the IWST were equated to the
four-part Language tests in two special studies. In
each study, a single-group equating design was used
in which students took the IWST and the four-part
Language test from the Complete Battery of the
ITBS. Raw-score equipercentile equivalents were
determined using analytic smoothing. From these
relationships, raw score to standard score conversions
were developed for the IWST. Approximately 1,000
students per grade participated in the Form K
study; approximately 800 per grade participated in
the Form M study.
Test Score Characteristics
Raw-score means and standard deviations as well as
internal-consistency reliability estimates for the
IWST are given in Table 9.9.
The relationship between an integrated approach to
language skills assessment and the traditional
ITBS approach was studied by Bray and Dunbar
(1994). They found that the two formats were
similar in average difficulty and discrimination, and
that the distributions of item p-values were similar.
Table 9.9
Test Summary Statistics
Integrated Writing Skills Test, Form M
Grade
Number of Items
158
3
4
5
6
7
8
38
44
49
52
55
57
Fall
Mean
SD
SEM
K-R 20
18.65
7.92
2.7
.88
23.36
8.67
3.0
.88
27.73
9.66
3.1
.90
31.19
10.58
3.1
.91
33.28
11.17
3.2
.92
35.90
9.91
3.3
.89
Spring
Mean
SD
SEM
K-R 20
23.34
8.05
2.6
.89
27.08
8.90
2.9
.90
30.70
9.83
3.0
.91
33.39
10.57
3.0
.91
35.31
10.98
3.1
.92
37.36
9.80
3.2
.89
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 159
Table 9.10
Correlations Between IWST and ITBS Reading and Language Tests
Content Area
Grade 4
Reading
.74
IWST Language Test
.64
ITBS Language Total
Grade 8
.74
.61
.62
.62
.81
.86
.71
.80
.76
.83
.63
.94
.95
.97
.96
.64
.72
.74
.77
.72
.53
.60
.86
.86
.76
.54
.61
.86
.68
.76
.55
.65
.88
.68
.72
.61
.64
.89
.63
.69
.68
.77
Spelling
.54
.61
.86
Capitalization
.53
.67
.85
.64
Punctuation
.53
.67
.86
.65
.71
Usage & Expression
.71
.71
.87
.63
.65
.64
.70
.68
.59
.62
.63
.67
.78
.68
.71
.76
.72
.94
.97
.99
.97
.80
.80
.71
.87
.80
.87
.74
Note: Correlations above the diagonal in italics were adjusted for unreliability in both measures.
The two formats yielded different reliabilities per
unit of testing time, however. Integrated items
required more testing time to achieve the same level
of reliability. Although the integrated format might
require more reading than the traditional format
(and show a stronger relation with ITBS Reading
scores), this was not necessarily the case. Table 9.10
presents the correlations between Language tests in
the two formats as well as the correlations between
language and reading.
Iowa Algebra Aptitude Test
The Iowa Algebra Aptitude Test (IAAT) was
developed to help teachers and counselors make
informed decisions about initial placement of
students in the secondary mathematics curriculum.
In making such decisions, recommendations of
previous teachers should be given considerable
weight. These cannot usually be the only
determining factor, however, since a group of junior
high or middle school students will typically have
had different teachers, and it is unlikely the
teachers share a common standard for judging
students’ math abilities. Given the desire for
objective evidence to supplement teacher
recommendations, IAAT was developed to provide a
standardized measure of math aptitude.
More information about the IAAT can be obtained
from the Iowa Algebra Aptitude Test Manual for Test
Use, Interpretation, and Technical Support (Fourth
Edition).
Description of the Test
The IAAT provides a four-part profile of students
that identifies specific areas of strength and
weakness: Interpreting Mathematical Information
(18 items, 10 minutes); Translating to Symbols (15
items, 8 minutes); Finding Relationships (15 items,
8 minutes); and Using Symbols (15 items, 10
minutes).
Test Development
The item development plan included a close
examination of current algebra and pre-algebra
textbooks. Research literature in math education
was studied to determine current thinking on the
beginning of the secondary math curriculum as well
as possible promising future directions. The NCTM
Standards also were a guiding force in the
development of the IAAT.
Item development and tryout for the Fourth Edition
of the IAAT began in 1988 and continued through
1990. Items in the final forms were selected for
content coverage, difficulty level, and discriminating
power. In selecting items, the first priority was to
match the table of specifications as closely as
possible. Given this restriction, an effort was made
to select the most discriminating items with
difficulty indices between .20 and .80.
Standardization
The Fourth Edition of the Iowa Algebra Aptitude
Test, Forms 1 and 2, was standardized in 1991. Over
8,000 students from 98 public and private school
systems across the United States participated.
Three stratifying variables were used to select
participants in the IAAT standardization:
geographic region, district enrollment, and
socioeconomic status (SES) of the community.
Within each cell of the Region by Size by SES
matrix, school districts were randomly selected to
participate in the standardization study. In addition,
within each geographic region at least one Catholic
school was selected to participate.
159
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 160
Table 9.11
Test Summary Statistics
Iowa Algebra Aptitude Test – Grade 8
1991 Fall National Standardization
Number of Items
Interpreting
Mathematical
Information
Translating
to
Symbols
Finding
Relationships
Using
Symbols
Total
18
15
15
15
63
Form 1
Mean
SD
SEM
K-R 20
10.03
3.35
1.77
.72
9.55
2.89
1.61
.69
9.14
3.59
1.56
.81
8.87
3.19
1.66
.73
37.59
10.57
3.34
.90
Form 2
Mean
SD
SEM
K-R 20
10.63
3.78
1.77
.78
8.58
2.91
1.67
.67
8.67
3.84
1.63
.82
9.19
3.17
1.65
.73
37.07
11.05
3.32
.91
Test Score Characteristics
Table 9.11 provides the descriptive statistics for
both forms of the IAAT. The two forms have been
equated so that scores on both fall on a common
standard score scale. Normative data are currently
provided for eighth-grade students only. Also in the
table are internal-consistency reliability estimates
for the Composite score and each subtest of the IAAT.
The IAAT scores have reasonably large reliability
coefficients given the length of the subtests.
It is important that a test used for selection
purposes have evidence of criterion-related validity.
Validity evidence for the Fourth Edition of the IAAT
was collected in a study in which students took the
IAAT in the fall, and first- and second-semester test
scores and grades in Algebra 1 were collected. The
correlations of the IAAT Composite scores, the two
semester exam scores, and the semester grades
appear in Table 9.12.
In addition, multiple regression analyses were
carried out using the IAAT Composite scores and
the ITBS Math Total Composite scores as predictor
variables and the Algebra 1 grades and test scores
as the criterion measures. Of interest was whether
the IAAT scores add significantly to predictions of
the four criterion variables, given that the ITBS
Math Total scores were already available. The
rationale for these analyses was that most schools
probably have available scores from standardized
tests. If the IAAT scores cannot contribute to the
prediction of success in Algebra 1 beyond information
160
provided by data on hand, the usefulness of the
IAAT would be in doubt. The regression analyses
showed the IAAT Composite scores did significantly
add to the prediction of success in Algebra 1.
A common concern in using tests for selection is
predictive bias. In addition to content reviews for all
types of bias, Barron, Ansley, and Hoover (1991)
conducted a study of potential gender bias in
predicting success in algebra with the IAAT. They
found that IAAT scores did not yield biased
predictions of success for females or males.
Table 9.12
Correlations Between IAAT and
Algebra Grades and Test Scores
Iowa Algebra Aptitude Test – Grade 8
1991 Fall National Standardization
Algebra 1
IAAT Composite
First Semester
Exam
.69 (.84)
Grades
.49 (.75)
Second Semester
Exam
.65 (.82)
Grades
.45 (.74)
Note: Correlations in parentheses have been corrected for range
restriction.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 161
Works Cited
American Educational Research Association,
American Psychological Association, and
National Council on Measurement in Education.
(1999). Standards for educational and
psychological testing. Washington, DC:
American Educational Research Association.
Becker, D. F. and Dunbar, S. B. (1990, April).
Common dimensions in elementary and
secondary school survey achievement batteries.
Paper presented at the annual meeting of the
National Council of Measurement in Education,
Boston.
The American heritage dictionary of the English
language (4th ed.). (2000). New York:
Houghton-Mifflin.
Becker, D. F. and Forsyth, R. A. (1992). An empirical
investigation of Thurstone and IRT methods of
scaling achievement tests. Journal of
Educational Measurement, 29: 341–354.
Andrews, K. M. (1995). The effects of scaling design
and scaling method on the primary score scale
associated with a multi-level achievement test.
Unpublished doctoral dissertation, The
University of Iowa, Iowa City.
Becker, D. F. and Forsyth, R. A. (1994). Gender
differences in mathematics problem-solving and
science: A longitudinal analysis. International
Journal of Educational Research, 21: 407–416.
Ankenmann, R. D., Witt, E. A., and Dunbar, S. B.
(1999). An investigation of the power of the
likelihood ratio goodness-of-fit statistic in
detecting differential item functioning. Journal of
Educational Measurement, 36: 277–300.
Beggs, D. L. and Hieronymus, A. N. (1968).
Uniformity of growth in the basic skills
throughout the school year and during the
summer. Journal of Educational Measurement,
5: 91–97.
Ansley, T. N. and Forsyth, R. A. (1982, March). Use of
standardized achievement test results at the
secondary level. Paper presented at the annual
convention of the American Educational
Research Association, New York.
Bishop, N. S. and Frisbie, D. A. (1999). The effects of
test item familiarization on achievement test
scores. Applied Measurement in Education, 12:
327–341.
Ansley, T. N. and Forsyth, R.A. (1983). Relationship
of elementary and secondary school
achievement test scores to college performance.
Educational and Psychological Measurement,
43: 1103–1112.
Barron, S. I., Ansley, T. N., and Hoover, H. D. (1991,
March). Gender differences in predicting
success in high school algebra. Paper presented
at the annual convention of the American
Educational Research Association, Chicago.
Barron, S. I., Ansley, T. N., and Hoover, H. D. (1991,
April). Gender differences in predicting success
in high school algebra. Paper presented at the
annual convention of the American Educational
Research Association, Chicago.
Bormuth, J. R. (1969, March). Development of
readability analyses. Washington, DC: Office of
Education. (ERIC Document Reproduction
Service No. ED 029 166).
Bormuth, J. R. (1971). Development of standards of
readability: Toward a rational criterion of
passage performance. Final report. (ERIC
Document Reproduction Service No. ED 054
233).
Bray, G. B. and Dunbar, S. B. (1994, April). Influence
of item format on the internal characteristics of
alternate forms of tests of language skills. Paper
presented at the annual meeting of the National
Council on Measurement in Education, New
Orleans, LA.
161
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 162
Brennan, R. L. (1992). Elements of generalizability
theory. ACT Publications, Iowa City, IA.
Brennan, R. L. and Lee, W. (1997). Conditional
standard errors of measurement for scale
scores using binomial and compound binomial
assumptions. ITP Occasional Paper No. 41,
Iowa Testing Programs, The University of Iowa,
Iowa City.
Chall, J. S. and Dale, E. (1995). Readability revisited:
The new Dale-Chall readability formula.
Cambridge, MA: Brookline Books.
The Chicago manual of style (14th ed.). (1993).
Chicago: University of Chicago Press.
Cronbach, L. J. (1971). Test validation. In R.L.
Thorndike (Ed.), Educational Measurement.
Washington, DC: American Council on
Education.
Dale, E. and Chall, J. S. (1948, January/February). A
formula for predicting readability. Educational
Research Bulletin, 27: 11–20, 37–54.
Dale, E. and O’Rourke, J. (1981). The living word: A
national vocabulary inventory. Chicago: World
Book-Childcraft International.
Davison, A. and Kantor, R. N. (1982). On the failure
of readability formulas to define readable texts:
A case study from adaptations. Reading
Research Quarterly, 17: 187–209.
Dorans, N. J. and Holland, P. W. (1993). DIF
detection and description: Mantel-Haenszel and
standardization. In P. W. Holland and H. Wainer
(Eds.), Differential Item Functioning, 35–66.
Hillsdale, NJ: Erlbaum.
Dunbar, S. B., Ordman, V. L., and Mengeling, M. A.
(2002). A comparative analysis of achievement
by Native Americans in Montana. Iowa Testing
Programs, The University of Iowa, Iowa City.
Dunn, G. E. (1990). Relationship of eighth-grade
achievement test scores to ninth-grade
teachers’ marks. Unpublished doctoral
dissertation, The University of Iowa, Iowa City.
162
Feldt, L. S. (1984). Some relationships between the
binomial error model and classic test theory.
Educational and Psychological Measurement,
40 (4): 883–891.
Feldt, L. S. (1984). Testing the significance of
differences among reliability coefficients.
Proceedings of the Fourth Measurement and
Evaluation Symposium of the American Alliance
for Health, Physical Education, Recreation and
Dance (invited research paper), University of
Northern Iowa, Cedar Falls.
Feldt, L. S. (1997). Can validity rise when reliability
declines? Applied Measurement in Education,
10: 377–387.
Feldt, L. S. and Brennan, R. L. (1989). Reliability. In
R. L. Linn (Ed.), Educational Measurement (3rd
ed.) (pp. 105–146). New York: American Council
on Education and Macmillan Publishing.
Feldt, L. S. and Qualls, A. L. (1998). Approximating
scale score standard error of measurement from
the raw score standard error. Applied
Measurement in Education, 11 (2): 159–177.
Flanagan, J. C. (1951). Units, scores, and norms. In
E. F. Lindquist (Ed.), Educational Measurement
(pp. 695–763). Washington, DC: American
Council on Education.
Forsyth, R. A., Ansley, T. N., and Twing, J. S. (1992).
The validity of normative data provided for
customized tests: Two perspectives. Applied
Measurement in Education, 5: 49–62.
Frisbie, D. A. and Andrews, K. M. (1990).
Kindergarten pupil and teacher behavior during
standardized achievement testing. Elementary
School Journal, 90: 435–438.
Frisbie, D. A. and Cantor, N. K. (1995). The validity of
scores from alternative methods of assessing
spelling achievement. Journal of Educational
Measurement, 32 (1): 55–78.
Gerig, J. A., Nibbelink, W. H., and Hoover, H. D.
(1992). The effect of print size on reading
comprehension. Iowa Reading Journal, 5:
26–28.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 163
Harris, D. J. and Hoover, H. D. (1987). An application
of the three-parameter IRT model to vertical
equating. Applied Psychological Measurement,
2: 151–159.
Hieronymus, A. N. and Hoover, H. D. (1986). Manual
for school administrators, Levels 5–14, Iowa
Tests of Basic Skills Forms G/H. Chicago:
Riverside Publishing.
Hieronymus, A. N. and Hoover, H. D. (1990). Manual
for school administrators supplement, Levels
5–14, Iowa Tests of Basic Skills Form J.
Chicago: Riverside Publishing.
Hoover, H. D. (1984). The most appropriate scores
for measuring educational development in the
elementary schools: GEs. Educational
Measurement: Issues and Practice, 3: 8–14.
Kolen, M. J. and Brennan, R. L. (1995). Test equating
methods and practices. New York:
Springer-Verlag.
Koretz, D. M. (1986). Trends in educational
achievement. Congress of the U.S.,
Congressional Budget Office, Washington, DC.
Lee, S. J. (1995). Gender differences in the Iowa
Writing Assessment. Unpublished master’s
thesis, The University of Iowa, Iowa City.
Lee, G., Dunbar, S. B., and Frisbie, D. A. (2001). The
relative appropriateness of eight measurement
models for analyzing scores from tests
composed of testlets. Educational and
Psychological Measurement, 61: 958–975.
Hoover, H. D. (2003). Some common misconceptions
about tests and testing. Educational
Measurement: Issues and Practice, 22 (1): 5–14.
Lewis, J. C. (1994). The effect of content and gender
on assessment of estimation. Paper presented
at the annual convention of the National Council
on Measurement in Education, New Orleans,
LA.
Hoover, H. D., Dunbar, S. B., Frisbie, D. A., and
Qualls, A. L. (2003). Teacher’s Directions and
Interpretive Guide, Iowa Early Learning
Inventory. Itasca, IL: Riverside Publishing.
Linn, R. L. and Dunbar, S. B. (1982). Predictive
validity of admissions measures: Corrections for
selection on several variables. Journal of
College Student Personnel, 23: 222–226.
Huang, C. (1998). Factors influencing the reliability of
DIF detection methods. Unpublished doctoral
dissertation, The University of Iowa, Iowa City.
Linn, R. L. and Dunbar, S. B. (1990). The nation’s
report card goes home: Good news and bad
about trends in achievement. Phi Delta Kappan,
72: 127–133.
Jarjoura, D. (1986). An estimator of examinee-level
measurement error variance that considers test
form difficulty adjustments. Applied
Psychological Measurement, 10: 175–186.
Keats, J. A. (1957). Estimation of error variances of
test scores. Psychometrika, 22: 29–41.
Kolen, M. J. (1981). Comparison of traditional and
item response theory methods for equating
tests. Journal of Educational Measurement, 18:
1–11.
Kolen, M. J. (1984). Effectiveness of analytic
smoothing in equipercentile equating. Journal of
Educational Statistics, 9: 25–44.
Linn, R. L., Baker, E. L., and Dunbar, S. B. (1991).
Complex performance-based assessments:
Expectations and validation criteria. Educational
Researcher, 20: 15–21.
Linn, R. L. (Ed.) (1993). Educational measurement.
National Council on Measurement in Education
and American Council on Education. Phoenix,
AZ: Oryx Press.
Loyd, B. H. (1980, April). An investigation of
differential item performance by Anglo and
Hispanic pupils. Paper presented at the annual
convention of the American Educational
Research Association, Boston.
163
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 164
Loyd, B. H. (1980). Functional level testing and
reliability: An empirical study. Unpublished
doctoral dissertation, The University of Iowa,
Iowa City.
Loyd, B. H. (1980). The effect of item ordering and
speed on Rasch Model item parameter
estimates. Paper presented at the Iowa
Educational Research and Evaluation
Association, Iowa City, IA.
Loyd, B. H. and Hoover, H. D. (1980). Vertical
equating using the Rasch Model. Journal of
Educational Measurement, 17: 179–193.
Loyd, B. H., Forsyth, R. A., and Hoover, H. D. (1980).
Relationship of elementary and secondary
school achievement test scores to later
academic success. Educational and
Psychological Measurement, 40: 1117–1124.
Lu, S. and Dunbar, S. B. (1996, April). Assessing the
accuracy of the Mantel-Haenszel DIF statistic
using the bootstrap method. Paper presented at
the annual meeting of the American Educational
Research Association, New York City.
Lu, S. and Dunbar, S. B. (1996, April). The influence
of conditioning variables on assessing DIF in a
purposefully multidimensional test. Paper
presented at the annual meeting of the National
Council on Measurement in Education, New
York City.
Martin, D. J. (1985). The measurement of growth in
educational achievement. Unpublished doctoral
dissertation, The University of Iowa, Iowa City.
Martin, D. J. and Dunbar, S. B. (1985). Hierarchical
factoring in a standardized achievement battery.
Educational and Psychological Measurement,
45: 343–351.
Mengeling, M. A. (2002). An analysis of district and
school variance using hierarchical linear
modeling and longitudinal standardized
achievement data. Unpublished doctoral
dissertation, The University of Iowa, Iowa City.
164
Mengeling, M. and Dunbar, S. B. (1999, April).
Temporal stability of standardized test scores in
the early elementary grades. Paper presented at
the annual meeting of the American Educational
Research Association, Montreal, Canada.
Merriam-Webster’s dictionary of English usage.
(1994). Springfield, MA: Merriam-Webster.
Messick, S. (1989). Validity. In R. L. Linn (Ed.),
Educational Measurement (3rd ed., 13–103).
New York: American Council on
Education/Macmillan Series on Higher
Education.
Mittman, A. (1958). An empirical study of methods of
scaling achievement tests at the elementary
grade level. Ph.D. thesis, The University of
Iowa, Iowa City.
Mollenkopf, W. G. (1949). Variation of the standard
error of measurement. Psychometrika, 14:
189–229.
National Catholic Educational Association. (2000).
NCEA/Ganley’s Catholic schools of America
(28th ed.). Silverthorne, CO: Fisher Publishing
Company.
National Center for Education Statistics. (2000).
Digest of Education Statistics, 2000.
Washington, DC: U.S. Department of Education.
National Council of Teachers of Mathematics. (1989).
Curriculum and Evaluation Standards for School
Mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000).
Principles and Standards for School
Mathematics. Reston, VA: Author.
National Council for the Social Studies. (1994).
Curriculum standards for social studies:
Expectations of excellence. Washington, DC:
Author.
National Research Council. (1996). National Science
Education Standards: observe, interact, change,
learn. Washington, DC: National Academy
Press.
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 165
Nibbelink, W. H. and Hoover, H. D. (1992). The
student teacher effect on elementary school
class achievement. Journal of Research for
School Executives, 2: 61–65.
Powers, R. D., Sumner, W. A., and Kearl, B. E.
(1958). A recalculation of four adult readability
formulas. Journal of Educational Psychology,
49: 99–105.
Nibbelink, W. H., Gerig, J. A., and Hoover, H. D.
(1993). The effect of print size on achievement
on mathematics problem solving. School
Science and Mathematics, 93: 20–23.
Quality Education Data. (2002). QED national
education database, [Data file]. Available from
Quality Education Data Web site,
http://www.qeddata.com/databaselic.htm
O’Conner, P. T. (1996). Woe is I: the grammarphobe’s
guide to better English in plain English. New
York: Putnam.
Qualls, A. L. (1980). Black and white teacher ratings
of elementary achievement test items for
potential race favoritism. Unpublished master’s
thesis, The University of Iowa, Iowa City.
Pearsall, M. K. (1993). The content core: A guide for
curriculum designers. (Rev. ed.). Washington,
DC: National Science Teachers Association.
Perkhounkova, Y., Hoover, H. D., and Ankenmann, R.
D. (1997, March). An examination of construct
validity of multiple-choice versus constructedresponse tests. Paper presented at the annual
meeting of the National Council in Measurement
in Education, Chicago.
Perkhounkova, Y. and Dunbar, S. B. (1999, April).
Influences of item content and format on the
dimensionality of tests combining multiplechoice and open-response items: An application
of the Poly-DIMTEST Procedure. Paper
presented at the annual meeting of the
American Educational Research Association,
Montreal, Canada.
Petersen, N. S., Kolen, M. J., and Hoover, H. D.
(1989). Scaling, norming, and equating. In R. L.
Linn (Ed.), Educational Measurement (3rd ed.).
Washington, DC: American Council on
Education.
Plake, B. S. (1979). The interpretation of norm-based
scores from individualized testing using the Iowa
Tests of Basic Skills. Psychology in the Schools,
16: 8–13.
Polya, G. (1957). How to solve it; a new aspect of
mathematical method (2nd ed.). Garden City,
NY: Doubleday.
Qualls-Payne, A. L. (1992). A comparison of score
level estimates of the standard error of
measurement. Journal of Educational
Measurement, 29: 213–225.
Qualls, A. L. and Ansley, T. N. (1995). The predictive
relationship of achievement test scores to
academic success. Educational and
Psychological Measurement, 55: 485–498.
Riverside Publishing. (1998). The Iowa Tests: Special
report on Riverside’s national performance
standards. Itasca, IL: Author.
Rosemier, R. A. (1962). An investigation of
discrepancies in percentile ranks between a
grade eight administration of ITBS and a grade
nine administration of ITED. Unpublished study,
Iowa Testing Programs, Iowa City, IA.
Rutherford, F. J. and Ahlgren, A. (1990). Science for
all Americans: Project 2061 American
Association for the Advancement of Science.
(Rev. ed.). New York: Oxford University Press.
Scannell, D. P. (1958). Differential prediction of
academic success from achievement test
scores. Unpublished doctoral dissertation, The
University of Iowa, Iowa City.
Schoen, H. L., Blume, G., and Hoover, H. D. (1990).
Outcomes and processes on estimation test
items in different formats. Journal for Research
in Mathematics Education, 21: 61–73.
165
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 166
Snetzler, S. and Qualls, A. L. (2002). Examination of
differential item functioning on a standardized
achievement battery with limited English
proficient students. Educational and
Psychological Measurement, 60: 564–577.
Snow, R. E. and Lohman, D. F. (1989). Implications of
cognitive psychology for educational
measurement. In R. L. Linn (Ed.), Educational
Measurement (3rd ed.). American Council on
Education, Washington, DC.
Spache, G. D. (1974). Good reading for poor readers.
Champaign, IL: Garrard Publishing.
Thorndike, R. L. (1951). Reliability. In E.F. Lindquist
(Ed.), Educational Measurement. American
Council on Education, Washington, DC.
Thorndike, R. L. (1963). The concepts of over- and
under-achievement (ERIC Document
Reproduction Service No. ED 016 250).
Witt, E. A. (1993). The construction of an analytic
score scale for the direct assessment of writing
and an investigation of its reliability and validity.
Unpublished doctoral dissertation, The
University of Iowa, Iowa City.
Witt, E. A., Ankenmann, R. D., and Dunbar, S. B.
(1996, April). The sensitivity of the MantelHaenszel statistic to variations in sampling
procedure in DIF analysis. Paper presented at
the annual meeting of the National Council on
Measurement in Education, New York City.
166
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 167
Index
Ability testing, See Cognitive Abilities Test
Accommodations
Catholic sample, 11
English language learners, 14–15
how defined, 12
students with disabilities, 14–15
Achievement and general cognitive ability
correlations between CogAT and ITBS, 131,
137–138
matched sample design, 8–9
measuring unique characteristics, 131
predicting achievement from ability, 131, 136,
143–148
reliability of difference scores, 136, 139–142
Achievement testing, 26–27
Constructed-Response Supplement
description, 6, 153
reliability and validity, 154–155
standardization, 154
test development, 28, 35, 154
Thinking about Language, 35, 154
Thinking about Mathematics, 154
Thinking about Reading, 154
Content and process specifications, 27, 34–36, 38,
41–42
Content Classifications with Item Norms, 48, 87
sample pages, 88–89
criteria for developing, 27
curriculum review, 28
distribution of skill objectives, 31
for subtests in ITBS, 30–43
NCTM Standards, 27
role in test design, 27
Bias
predictive, 160
See also differential item functioning; fairness
review; group differences; validity
Content Classifications with Item Norms, 48,
87–89
Catholic Schools in America: Year 2000, 8
Content validity, 26
See also validity
Ceiling and floor effects
assigning test levels, 100
chance level, 55, 100
summary data, 101–105
See difficulty
Cognitive Abilities Test, 7, 127, 131–148
Comparability of test forms
concurrent assembly, 60
equivalent-forms reliability, 74–76
importance in equating, 60
relationship of Forms A and B to previous forms,
61–62
Completion rates
indices to describe, 100
issues related to time limits on tests, 32, 39, 100
summary data, 101–105
Content standards, See content and process
specifications
Correlations among test scores
building averages, 127, 131–135
Cognitive Abilities Test, 137–138
Constructed-Response Supplement, 154–155
developmental standard scores, 121, 122–126
extraneous factors, 121
Integrated Writing Skills Test, 158–159
Iowa Algebra Aptitude Test, 160
Iowa Writing Assessment, 152–153
Listening Assessment for ITBS, 156–157
Criterion-referenced score interpretation, 87
Critical thinking skills, 43
classifying items, 44
how defined, 43
Cut score, 44
167
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 168
Description of the ITBS batteries and levels, 2–5
Developmental score scales, 51
grade-equivalent (GE) scores, 54
in previous editions, 51–52
national standard score scale, 52–54
purposes of, 54–55
Diagnostic uses of tests, 87
Differential item functioning (DIF), 27, 30, 116–119
Difficulty
Content Classifications with Item Norms, 48, 87
sample pages, 88–89
effects on test ceiling/floor, 100, 101–105
for different types of tests, 87
in norms booklets, 94, 100
in test development, 87, 94
individualized testing, 94
item difficulty
definition, 87
distributions of, 90–93
summary statistics, 95–99
item norms, 87–89
relation to test reliability, 87
Directions for Administration, 3, 6, 52
Gender differences
across content domains, 110
effect size vs. bias, 114
in achievement, 107
in composite scores, 110
in language, 107, 110
in math, 110
in variability of test scores, 110
Iowa Algebra Aptitude Test, 160
summary statistics, 111–113
trends over time, 110, 113
Grade-equivalent (GE) scale, 54
Group differences
by gender, 107, 111–113
by race/ethnicity, 107, 114–115
in standard errors of measurement, 107–109
in test scores, 107
Growth, 1, 33, 39, 44, 51–55, 87–89, 121
Individualized testing, 45, 83, 100
Integrated Writing Skills Test, 157–159
correlations with ITBS, 158–159
description and development, 157–158
Discrimination
item discrimination
definition, 94
summary statistics, 95–99
relation to test reliability, 87
Iowa Algebra Aptitude Test
description, 159
predictive validity, 160
standardization, 159
test development, 159
test score characteristics, 160
Effect size, 110–111, 115
Iowa Basic Skills Testing Program, 1, 28, 59–60
English language learners, 14–15, 25
Iowa Early Learning Inventory, 150–151
Equating Forms A and B, 60–61, 149–150
Complete Battery, 60
design and methods, 60
equivalence of forms, 60–62
Survey Battery, 149–150
Iowa Tests of Basic Skills
Complete Battery, 2
Core Battery, 2
criteria for instrument design, 27
description of levels, 2–5
Survey Battery, 2, 149–150
Fairness review, 30, 116–118
Iowa Tests of Educational Development, 127
Field testing, See item tryout
Iowa Writing Assessment, 6, 34
description, 151
reliability and validity, 152–153
standardization, 152
test development, 28, 152
Floor effects, See ceiling and floor effects
Gain scores, See growth
168
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 169
Item difficulty, See difficulty
Item discrimination, See discrimination
Item response theory, 61
Item tryout, 28, 30
how designed, 28–30
national data, 28, 30
numbers of items included, 28
preliminary studies, 28
statistical analyses, 28
Joint Scaling
Constructed-Response Supplement, 154
of ITBS and ITED, 127
Concepts and Estimation test, 38–39
context and symbolic form, 39
estimation strategies, 39
grade placement of test content, 39
evolution of test design, 38
Modern Mathematics Supplement, 38
NCTM Standards, 27, 38, 39, 40
Problem Solving and Data Interpretation test,
39–40
computational skills required, 41
data interpretation skills, 40
multiple-step problems, 40
Polya’s model, 40
NCTM Standards, 27, 38, 39, 40
National Education Database™, 7
Language, 34–38
Capitalization and Punctuation tests, 37
Complete Battery vs. Survey Battery, 36
content standards, 35–36, 38
effect of linguistic change, 35
item format, 35–38
Levels 5–6 of ITBS, 35
Levels 7–14 of ITBS, 35–38
relation to Reference Materials test, 36
reliability and validity, 37–38
Spelling test, 35–37
efficiency, 37
selection of words, 37
standard written English, 35
Usage and Expression test, 37–38
National standardization
accommodations, 12, 14–15
Catholic sample, 11
design, 8–9
evidence of quality, 12
gender differences in, 110
Individualized Accommodation Plan (IAP), 12
Individualized Education Program (IEP), 12
number of students, 9–10
participating schools, 16–24
public sample, 11
racial-ethnic differences in, 114
racial-ethnic representation, 12–13
Section 504 Plan, 12
weighting, 5–9
Linking, See equating Forms A and B
Norm-referenced interpretation, 51, 87
Listening, 33–34
cognitive aspects of, 34
content/process standards, 34
Levels 5–9 of ITBS, 33–34
See also Listening Assessment for ITBS
Norms, 14, 55
calculating dates for twelve-month schools, 14
changes over time, 55–60
comparisons across time, 55–57
defined, 2, 51
fall and spring testing dates, 14
sampling, 7–9
special school populations, 8, 12, 60
vs. standards, 44
Listening Assessment for ITBS, 6, 33–35, 155–157
description, 155
reliability and validity, 156–157
standardization, 155
test development, 155
Orshansky index, 7
Mathematics, 38–41
Computation test, 40–41
grade placement of content, 41
modifications in Forms A and B, 41
Parallel forms, See equating Forms A and B
Percentile rank, 53–54
169
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 170
Performance standards, 44
Precision of measurement, See reliability
Predicting achievement from CogAT scores
building averages, 143–148
estimating measurement error, 136
obtained vs. expected achievement, 136
prediction equations, 136
student scores, 131, 136–138
Predictive validity, 46–47
effects of selection, 46
empirical studies, 46–47
types of estimates, 63–64
equivalent forms, 74
equivalent halves, 75–76
internal-consistency, 64
split halves, 75–76
test-retest, 77
Reporting achievement test results
developmental levels, 51–52, 54
norms vs. standards, 44
strengths and weaknesses, 1, 51
validity considerations, 26–27
Role of standards, 1, 44–46
Primary Reading Profile, 32
Process specifications, See content and process
specifications
Program evaluation, 26–27, 45
Racial-ethnic considerations
differences in achievement, 114–115
national standardization, 12–13
standard errors of measurement, 107, 109
Readability, 48–50
definitions, 48
formulas, 48, 50
applied to Forms A and B, 48–50
interpreting readability indices, 48, 50
judging grade level, 48, 50
Reading, 32–33
comprehension vs. decoding, 32
content/process standards, 34
critical thinking, 33
Levels 6–8 of ITBS, 32
Levels 9–14 of ITBS, 32–33
two-part structure, 32
types of reading materials, 33
Reliabilities of differences in test performance,
127–130, 139–142
Reliability, 63–85
Complete Battery, Form A, 64–73
individualized testing, effect on, 83
Kuder-Richardson Formula 20 (K-R20), 63–64,
75–76
reliability coefficient, defined, 63
standard error of measurement, defined, 63
Survey Battery, Form A, 150
170
Scaling, 51–55
changes over time, 51–52
comparability of developmental scores, 51–52
defined, 51
grade-to-grade overlap, 54
growth model for the ITBS, 51, 54
Hieronymus scaling, 51, 52–54
national scaling study, 52–53
Science, 42
content classifications, 42
National Science Teachers Association, 42
Score Interpretation
evaluating instruction, 44–45
general concerns, 6
improving instruction, 44–45
modification of test content, 45–46
norm-referenced, 7
norms vs. standards, 44
predicting future performance, 46–47
See also validity
Scoring rubric, See
Constructed-Response Supplement
Iowa Writing Assessment
Social Studies, 41–42
content standards, 42
NCSS Standards, 42
relation to other tests, 42
Sources of Information, 43
Maps and Diagrams test, 43
Reference Materials test, 43
skill development, 43
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 171
Stability of scores, 83–85
grade-to-grade correlations, 85
spring-to-fall correlations, 84
Test specifications, See content and process
specifications
Thinking about Language, 35, 154
Standard errors of measurement, 63, 77
Complete Battery, Form A, 65–73
conditional, 77–82
for groups, 107
Thinking about Mathematics, 154
Thinking about Reading, 154
Standardization, See national standardization
Time limits, 3–5
Standards
role of, 1, 44–46
See also content and process specifications
Trends in achievement, 55–60
in national performance, 57
Iowa Basic Skills Testing Program, 59
summary of median differences, 57–59
Standards for Educational and Psychological
Testing, 25
Structural relations among subtests, 121
factor analysis results
Complete Battery, 126
Early Primary Battery, 126
factor structure of ITBS Form A, 121, 126
interpreting factors, 126–127
Test administration
manuals, 3, 6
preparing for testing, 6
Test development, 28–30, 107
differential item functioning (DIF), 116
fairness review, 30, 107, 114, 116, 119
item review, 116, 119
item tryout, 28, 30
Mantel-Haenszel statistics, 116
of individual subtests, 30–43
of Iowa Early Learning Inventory, 151
of Iowa Writing Assessment, 152
of Survey Battery, 149
steps in process, 28–29
Test levels, 2–5
relationship to grade level, 3
subtests included, 2–3
Test modifications, 45–46
See also accommodations
Validity, 1, 25–50
construct-irrelevant variance, 27
content quality, 26
definitions, 25
English language learners, 25
evaluating in achievement tests, 25
alignment, 25
local review, 25
factor analysis results, 121, 126–127
fairness review, 30, 116–118
in relation to purpose, 25
local school concerns, 26–27
NCTM Standards, 27
predictive validity, 46–47
responsibility for, 25–26
Standards for Educational and Psychological
Testing, 25
statistical evidence, 26
Variability, 52–54, 87, 90, 100, 110
Vocabulary, 30–32
Levels 5–8 of ITBS, 30
Levels 9–14 of ITBS, 30–32
The Living Word Vocabulary, 30
word selection, 30–31
Word Analysis, 32
sample content classification, 88
Test score summary statistics
Complete Battery, Form A, 65–73
Integrated Writing Skills Test, 158–159
Iowa Algebra Aptitude Test, 160
Listening Assessment for ITBS, 156
Survey Battery, Form A, 150
171
961464_ITBS_GuidetoRD.qxp
10/29/10
3:16 PM
Page 172