2016 PROGRAM
National Council on Measurement in Education
Foundations and Frontiers: Advancing Educational Measurement for Research, Policy, and Practice
2016 Training Sessions: April 7-8
2016 Annual Meeting: April 9-11
Renaissance Washington, DC Downtown Hotel, Washington, DC
#NCME16

Table of Contents
NCME Board of Directors
Proposal Reviewers
Future Meetings
Renaissance Washington, DC Downtown Hotel Meeting Room Floor Plans
Training Sessions: Thursday, April 7; Friday, April 8
Program: Saturday, April 9; Sunday, April 10; Monday, April 11
Index
Contact Information
Schedule-at-a-Glance

NCME Officers
President: Richard J. Patz, ACT, Iowa City, IA
Vice President: Mark Wilson, UC Berkeley, Berkeley, CA
Past President: Lauress Wise, HUMRRO, Seaside, CA

NCME Directors
Amy Hendrickson, The College Board, Newtown, PA
Kristen Huff, ACT, Iowa City, IA
Luz Bay, The College Board, Dover, NH
Won-Chan Lee, University of Iowa, Iowa City, IA
Cindy Walker, University of Wisconsin-Milwaukee, Milwaukee, WI
C. Dale Whittington, Shaker Heights (OH) Public Schools, Shaker Heights, OH

Editors
Journal of Educational Measurement: Jimmy de la Torre, Rutgers, The State University of NJ, New Brunswick, NJ
Educational Measurement: Issues and Practice: Howard Everson, SRI International, Menlo Park, CA
NCME Newsletter: Heather M. Buzick, Educational Testing Service, Princeton, NJ
Website Content Editor: Brett Foley, Alpine Testing Solutions, Denton, NE

2016 Annual Meeting Chairs
Annual Meeting Program Chairs: Andrew Ho, Harvard Graduate School of Education, Cambridge, MA; Matthew Johnson, Columbia University, New York, NY
Graduate Student Issues Committee Chair: Brian Leventhal, University of Pittsburgh, Pittsburgh, PA
Training and Development Committee Chair: Xin Li, ACT, Iowa City, IA
Fitness Run/Walk Directors: Katherine Furgol Castellano, ETS, San Francisco, CA;
Jill R. van den Heuvel, Alpine Testing Solutions, Hatfield, PA

NCME Information Desk
The NCME Information Desk is located on the Meeting Room Level in the Renaissance Washington, DC Downtown Hotel. Stop by to pick up a ribbon and to obtain your bib number and tee-shirt for the fun run and walk. It will be open at the following times:
Thursday, April 7: 7:30 AM-4:30 PM
Friday, April 8: 8:00 AM-4:30 PM
Saturday, April 9: 10:00 AM-4:30 PM
Sunday, April 10: 8:00 AM-1:00 PM
Monday, April 11: 8:00 AM-1:00 PM

Proposal Reviewers
Terry Ackerman, Benjamin Andrews, Robert Ankenmann, Karen Barton, Kirk Becker, Anton Beguin, Dmitry Belov, Tasha Beretvas, Jonas Bertling, Damian Betebenner, Dan Bolt, Laine Bradshaw, Henry Braun, Robert Brennan, Brent Bridgeman, Derek Briggs*, Chad Buckendahl, Li Cai*, Wayne Camara, Katherine Furgol Castellano, Ying Cheng, Chia-Yi Chiu, Steve Culpepper, Mark Davison, Jimmy de la Torre, John Donoghue, Jeff Douglas, Michael Edwards, Karla Egan*, Kadriye Ercikan, Steve Ferrara, Holmes Finch*, Mark Gierl, Brian Habing, Chris Han, Mark Hansen, Deborah Harris, Kristen Huff*, Minjeong Jeon, Hong Jiao, Matt Johnson, Daniel Jurich, Seock-Ho Kim, Jennifer Kobrin, Suzanne Lane, Won-Chan Lee*, Dongmei Li, Jinghua Liu, Skip Livingston, JR Lockwood*, Susan Loomis, Krista Mattern, Andy Maul*, Dan McCaffrey, Katie McClarty, Catherine McClellan, Patrick Meyer, Paul Nichols, Maria Oliveri, Andreas Oranje, Thanos Patelis, Susan Philips, Mary Pitoniak, John Poggio, Sophia Rabe-Hesketh, Mark Reckase, Frank Rijmen, Michael Rodriguez, Sandip Sinharay, Steve Sireci, Dubravka Svetina, Ye Tong*, Anna Topczewski, Peter van Rijn, Jay Verkuilen, Alina von Davier, Matthias von Davier, Michael Walker, Chun Wang, Jonathan Weeks, Cathy Wendler, Andrew Wiley, Steve Wise, Duanli Yan, John Young, April Zenisky*
* Indicates Expert Panel Chairperson

Graduate Student Abstract Reviewers
Lokman Akbay, Beyza Aksu, Abeer Alamri, Bruce Austin, Elizabeth Barker, Diego Luna Bazaldua, Masha Bertling, Lisa Beymer, Mark Bond, Nuliyana Bukhari, Jie Chen, Michelle Chen, Yi-Chen Chiang, Shenghai Dai, Tianna Floyd, Oscar Gonzalez, Emily Ho, Landon Hurley, Charlie Iaconangelo, Andrew Iverson, Kyle Jennings, HeaWon Jun, Susan Kahn, Jaclyn Kelly, Brian Leventhal, Isaac Li, Dandan Liao, Fu Liu, David Martinez-Alpizar, Namita Mehta, Rich Nieto, Mary Norris, Nese Ozturk, Robyn Pitts, Ray Reichenberg, Sumeyra Sahbaaz, Tyler Sandersfeld, Can Shao, Benjamin Shear, Jordan Sparks, Rose Stafford, Latisha Sternod, Myrah Stockdale, Meghan Sullivan, Ragip Terzi, Stephanie Underhill, Keyin Wang, Min Wang, Ting Wang, Xiaolin Wang, Diah Wihardini, Elizabeth Williams, Immanuel Williams, Dawn Woods, Kuan Xing, Jing-Ru Xu, Menglin Xu, Sujin Yang, Ai Ye, Nedim Yel, Hulya Yurekli

Future Annual Meetings
2017 Annual Meeting: April 26-30, San Antonio, TX
2018 Annual Meeting: April 12-16, New York, NY, USA
2019 Annual Meeting: April 4-8, Toronto, Ontario, Canada

Hotel Floor Plans – Renaissance Washington, DC Downtown

A Message from Your Program Chairs
2016 NCME Program Highlights: Foundations and Frontiers: Advancing Educational Measurement for Research, Policy, and Practice
We are pleased to highlight a few of the many excellent sessions that our
members have contributed, as well as to congratulate our partners at AERA on their centennial celebration. From the very first conference session, at 8:15 AM on Saturday, April 9, we're kicking it off with big-picture topics (Henry Braun leading an invited session for the recent NCME volume, Challenges to Measurement in an Era of Accountability) alongside technical advances (Derek Briggs leading off a session on Learning Progressions for Measuring Growth). The momentum continues through our last session, at 4:05 PM on Monday, April 11, where we tackle buzz phrases (Thanos Patelis convening a session on Fairness Issues and Validation of Noncognitive Skills) and settle scores (The Great Subscore Debate, with Emily Bo, Howard Wainer, Sandip Sinharay, and many others facing off to surely resolve the issue once and for all). We are taking full advantage of our location in Washington, DC, with an invited session on the recently passed Every Student Succeeds Act over lunchtime on Monday. Peter Oppenheim and Sarah Bolton, Education Policy Directors (majority and minority, respectively) for the US Senate HELP Committee, will discuss key provisions and spark a discussion among researchers about ESSA's Implications and Opportunities for Measurement Research and Practice. Earlier that Monday morning, Kristen Huff will convene reporters and scholars in a session with the lively title Hold the Presses! How Measurement Professionals Can Speak More Effectively with the Press and the Public. Consistent with our theme, our many sessions highlight both foundations (Isaac Bejar coordinates a session on Item Difficulty Modeling: From Theory to Practice, while Karla Egan convenes a session on Standard Setting: Beyond Process) and frontiers (Tracy Sweet will lead a session on Recent Advances in Social Network Analysis, and Will Lorie takes on Big Data in Education: From Items to Policies). Stay up to date with the Twitter hashtag #NCME16 and our new NCME Facebook group. We are confident that you will enjoy the program that you have helped to create here at the 2016 NCME Annual Meeting.

Andrew Ho and Matt Johnson
2016 NCME Annual Meeting Co-Chairs

Pre-Conference Training Sessions
The 2016 NCME Pre-Conference Training Sessions will be held at the Renaissance Washington, DC Downtown Hotel on Thursday, April 7 and Friday, April 8. All full-day sessions will be held from 8:00 AM to 5:00 PM. All half-day morning sessions will be held from 8:00 AM to 12:00 noon. All half-day afternoon sessions will run from 1:00 PM to 5:00 PM. On-site registration for the Pre-Conference Training Sessions will be available at the NCME Information Desk at the Renaissance Washington, DC Downtown Hotel for those workshops that still have availability. Please note that internet connectivity will not be available for most training sessions; where applicable, participants should download the required software prior to the training sessions. Internet connectivity will be available for a few selected training sessions that have pre-paid an additional fee.
Pre-Conference Training Sessions - Thursday, April 7, 2016

Thursday, April 7, 2016
8:00 AM - 12:00 PM, Meeting Room 6, Meeting Room Level, Training Session, AA
Quality Control Tools in Support of Reporting Accurate and Valid Test Scores
Aster Tessema, American Institute of Certified Public Accountants; Oliver Zhang, The College Board; Alina von Davier, Educational Testing Service

All testing companies focus on ensuring that test scores are valid, reliable, and fair. Significant resources are allocated to meeting the guidelines of well-known organizations such as AERA/NCME and the International Test Commission (Allalouf, 2007; ITC, 2011). In this workshop we will discuss traditional QC methods, the operational testing process, and new QC tools for monitoring the stability of scores over time. We will provide participants with a practical understanding of:
1. The importance of flow charts and documentation of procedures
2. The use of software tools to monitor tasks
3. How to minimize the number of hand-offs
4. How to automate activities
5. The importance of trend analysis to detect anomalies
6. The importance of applying detective and preventive controls
7. Having a contingency plan
We will also show how to apply QC techniques from manufacturing to monitor scores. We will discuss applying traditional QC charts (Shewhart and CUSUM charts), time series models, and change point models to the means of scale scores to detect abrupt changes (Lee & von Davier, 2013). We will also discuss QC methods for the process of automated and human scoring of essays (Wang & von Davier, 2014).

Thursday, April 7, 2016
8:00 AM - 12:00 PM, Meeting Room 7, Meeting Room Level, Training Session, BB
IRT Parameter Linking
Wim van der Linden and Michelle Barrett, Pacific Metrics

The problem of IRT parameter linking arises when the values of the parameters for the same items or examinees in different calibrations need to be compared. So far, the problem has mainly been conceptualized as an instance of the problem of invariance of the measurement scale for the ability parameters, in the tradition of S. S. Stevens' interval scales. In this half-day training session, we show that the linking problem has little to do with arbitrary units and zeros of measurement scales but is instead the result of a more fundamental problem inherent in all IRT models: a general lack of identifiability of their parameters. The redefinition of the linking problem allows us to formally derive the linking functions required to adjust for the differences in parameter values between separate calibrations. It also leads to new, efficient statistical estimators of their parameters, the derivation of their standard errors, and the use of current optimal test-design methods to design linking studies with minimal error. All of these results have been established for the current dichotomous and polytomous IRT models. The results will be presented during four one-hour lectures appropriate for psychometricians with interest and/or practical experience in IRT parameter linking problems.
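For orientation, the sketch below shows the familiar mean/sigma transformation that places 2PL item parameters from a new calibration onto a base scale. It is a minimal illustration under simplifying assumptions, not the identifiability-based derivation the presenters describe, and the anchor-item parameter values are hypothetical.

import numpy as np

# Hypothetical 2PL estimates for the same anchor items from two separate calibrations.
a_base = np.array([1.10, 0.85, 1.30, 0.95])   # discriminations on the base scale
b_base = np.array([-0.50, 0.20, 0.75, 1.10])  # difficulties on the base scale
a_new = np.array([0.95, 0.74, 1.12, 0.83])    # discriminations from the new calibration
b_new = np.array([-0.35, 0.46, 1.09, 1.49])   # difficulties from the new calibration

# Mean/sigma linking: choose slope A and intercept B so that the new-form anchor
# difficulties match the base-form anchors in mean and standard deviation.
A = b_base.std(ddof=1) / b_new.std(ddof=1)
B = b_base.mean() - A * b_new.mean()

# Apply the linking function: b* = A*b + B and a* = a/A.
b_linked = A * b_new + B
a_linked = a_new / A

print(f"A = {A:.3f}, B = {B:.3f}")
print("linked difficulties:", np.round(b_linked, 3))
print("linked discriminations:", np.round(a_linked, 3))

Characteristic-curve methods (e.g., Stocking-Lord) refine this baseline by matching test characteristic curves rather than moments, and the session's identifiability-based treatment goes further still.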
Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 5, Meeting Room Level, Training Session, CC
21st Century Skills Assessment: Design, Development, Scoring, and Reporting of Character Skills
Patrick Kyllonen and Jonas Bertling, Educational Testing Service

This workshop will provide training, discussion, and hands-on experience in developing methods for assessing, scoring, and reporting on students' social-emotional and self-management or character skills. The workshop will focus on (a) reviewing the kinds of character skills most important to assess based on current research; (b) standard and innovative methods for assessing character skills, including self-, peer-, teacher-, and parent-rating-scale reports, forced-choice (rankings), anchoring vignettes, and situational judgment methods; (c) cognitive lab approaches for item tryout; (d) classical and item response theory (IRT) scoring procedures (e.g., 2PL, partial credit, nominal response model); (e) validation strategies, including the development of rubrics and behaviorally anchored rating scales, and correlations with external variables; (f) the use of anchors in longitudinal growth studies; (g) reliability from classical test theory (alpha, test-retest), item response theory, and generalizability theory; and (h) reporting issues. These topics will be covered in the workshop where appropriate, but the sessions within the workshop will tend to be organized around item types (e.g., forced-choice, anchoring vignettes). Examples will be drawn from various assessments, including PISA, NAEP, SuccessNavigator, FACETS, and others. The workshop is designed for a broad audience of assessment developers, analysts, and psychometricians working in either applied or research settings.

Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 2, Meeting Room Level, Training Session, DD
Introduction to Standard Setting
Chad Buckendahl, Alpine Testing Solutions; Jennifer Dunn, Measured Progress; Karla Egan, National Center for the Improvement of Educational Assessment; Lisa Keller, University of Massachusetts Amherst; Lee LaFond, Measured Progress

As states adopt new standards and assessments, the expectations placed on psychometricians from a political perspective have been increasing. The purpose of this training session is to provide a practical introduction to the standard setting process while addressing common policy concerns and expectations. This training will follow the Evidence-Based Standard Setting (EBSS) framework. The first third of the session will touch upon some of the primary pre-meeting developmental and logistical activities as well as the EBSS steps of defining outcomes and developing relevant research as guiding validity evidence. The middle third of the session will focus on the events of the standard setting meeting itself. The session facilitators will walk participants through the phases of a typical standard setting, and participants will experience a training session on the Bookmark, Angoff, and Body of Work methods followed by practice rating rounds with discussion. The final third of the training session will give an overview of what happens following a standard setting meeting. This will be carried out through a panel discussion with an emphasis on policy expectations and the importance of continuing to gather evidence in support of the standard.
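For readers new to the Bookmark method mentioned above, the sketch below shows the usual arithmetic that maps a bookmark placement in an ordered item booklet to a cut score under a 2PL model with a response probability criterion of 0.67 (RP67). The item parameters, the D = 1.7 scaling constant, and the bookmark placement are illustrative assumptions, not session materials.

import numpy as np

def rp_theta(a, b, rp=0.67, D=1.7):
    """Theta at which a 2PL item is answered correctly with probability rp.
    Solves rp = 1 / (1 + exp(-D * a * (theta - b))) for theta."""
    return b + np.log(rp / (1.0 - rp)) / (D * a)

# Hypothetical (discrimination, difficulty) pairs for a small item booklet.
items = [(0.9, -1.2), (1.1, -0.4), (1.0, 0.3), (1.3, 0.9), (0.8, 1.6)]

# Order items by their RP67 locations, as in an ordered item booklet.
locations = sorted(rp_theta(a, b) for a, b in items)

# A panelist places the bookmark after the third item; the cut score is the
# RP67 location of that bookmarked item (panelists' cuts are then averaged).
bookmark_page = 3
cut = locations[bookmark_page - 1]
print("RP67 locations:", np.round(locations, 2))
print(f"cut score (theta metric): {cut:.2f}")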
Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 16, Meeting Room Level, Training Session, EE
Analyzing NAEP Data Using Plausible Values and Marginal Estimation with AM
Emmanuel Sikali, National Center for Education Statistics; Young Yee Kim, American Institutes for Research

Since results from the National Assessment of Educational Progress (NAEP) serve as a common metric for all states and select urban districts, many researchers are interested in conducting studies using NAEP data. However, NAEP data pose many challenges for researchers because of their special design features. This class intends to provide analytic strategies and hands-on practice to researchers who are interested in NAEP data analysis. The class consists of two parts: (1) instruction on the psychometric and sampling designs of NAEP and the data analysis strategies required by these design features, and (2) demonstration of NAEP data analysis procedures and hands-on practice. The first part covers the marginal maximum likelihood estimation approach to obtaining scale scores and appropriate variance estimation procedures; the second part covers two approaches to NAEP data analysis, i.e., the plausible values approach and the marginal estimation approach with item response data. The demonstration and hands-on practice will be conducted with a free software program, AM, using a mini-sample public-use NAEP data file released in 2011. Intended participants are researchers, including graduate students, education practitioners, and policy analysts, who are interested in NAEP data analysis.

Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 4, Meeting Room Level, Training Session, FF
Multidimensional Item Response Theory: Theory, Applications, and Software
Lihua Yao, Defense Manpower Data Center; Mark Reckase, Michigan State University; Rich Schwarz, ETS

Theories and applications of multidimensional item response theory (MIRT) models, multidimensional computerized adaptive testing (MCAT), and MIRT linking are discussed. Software demonstrations and hands-on exercises cover multidimensional multi-group calibration, multidimensional linking, and MCAT simulation. The session is intended for researchers who are interested in MIRT and MCAT.

Thursday, April 7, 2016
1:00 PM - 5:00 PM, Meeting Room 3, Meeting Room Level, Training Session, GG
New Weighting Methods for Causal Mediation Analysis
Guanglei Hong, University of Chicago

Many important research questions in education relate to how interventions work. A mediator characterizes the hypothesized intermediate process. Conventional methods for mediation analysis generate biased results when the mediator-outcome relationship depends on the treatment condition. These methods also tend to have a limited capacity for removing confounding associated with a large number of covariates. This workshop teaches the ratio-of-mediator-probability weighting (RMPW) method for decomposing total treatment effects into direct and indirect effects in the presence of treatment-by-mediator interactions. RMPW is easy to implement and requires relatively few assumptions about the distribution of the outcome, the distribution of the mediator, and the functional form of the outcome model. We will introduce the concepts of causal mediation, explain the intuitive rationale of the RMPW strategy, and delineate the parametric and nonparametric analytic procedures.
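For orientation only, the sketch below illustrates the weighting idea with a randomized binary treatment T, a binary mediator M, pretreatment covariates X, and an outcome Y. The variable names and the logistic mediator models are illustrative assumptions; this is not the workshop's standalone RMPW program.

import numpy as np
from sklearn.linear_model import LogisticRegression

def rmpw_decomposition(X, T, M, Y):
    """Crude RMPW-style effect decomposition for binary T and M.

    Treated units are weighted by P(M = m | X, T = 0) / P(M = m | X, T = 1)
    to approximate the counterfactual mean E[Y(1, M(0))], which is then
    contrasted with the observed arm means to split the total effect into
    indirect (mediated) and direct components. X must be a 2-D array."""
    treated, control = (T == 1), (T == 0)

    # Mediator models fitted separately within each treatment arm.
    m1 = LogisticRegression().fit(X[treated], M[treated])
    m0 = LogisticRegression().fit(X[control], M[control])

    p1 = m1.predict_proba(X[treated])[:, 1]  # P(M = 1 | X, T = 1) for treated units
    p0 = m0.predict_proba(X[treated])[:, 1]  # P(M = 1 | X, T = 0) for treated units
    w = np.where(M[treated] == 1, p0 / p1, (1 - p0) / (1 - p1))

    y1_m1 = Y[treated].mean()                  # E[Y(1, M(1))]
    y0_m0 = Y[control].mean()                  # E[Y(0, M(0))]
    y1_m0 = np.average(Y[treated], weights=w)  # approximately E[Y(1, M(0))]

    return {"total": y1_m1 - y0_m0,
            "indirect": y1_m1 - y1_m0,
            "direct": y1_m0 - y0_m0}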
Participants will gain hands-on experience with a free standalone RMPW software program. We will also provide SAS, Stata, and R code and will distribute related readings. The target audience includes graduate students, early career scholars, and advanced researchers who are familiar with multiple regression and have had prior exposure to binary and multinomial logistic regression. Each participant will need to bring a laptop for hands-on exercises.

Thursday, April 7, 2016
1:00 PM - 5:00 PM, Meeting Room 6, Meeting Room Level, Training Session, II
Computerized Multistage Adaptive Testing: Theory and Applications (Book by Chapman and Hall)
Duanli Yan, Educational Testing Service; Alina von Davier, ETS; Kyung Chris Han

This workshop provides a general overview of computerized multistage test (MST) design and its important concepts and processes. The focus of the workshop will be on MST theory and applications, including alternative scoring and estimation methods, classification tests, routing and scoring, linking, and test security, as well as a live demonstration of the MST software MSTGen (Han, 2013). The workshop is based on the edited volume by Yan, von Davier, and Lewis (2014). The volume is structured to take the reader through all the operational aspects of the test, from design to post-administration analyses. The training course consists of a series of lectures and hands-on examples in the following four sessions:
1. MST Overview, Design, and Assembly
2. MST Routing, Scoring, and Estimation
3. MST Applications
4. MST Simulation Software
The course describes the MST design, why it is needed, and how it differs from other test designs, such as linear tests and computerized adaptive test (CAT) designs. The course is intended for people who have some basic understanding of item response theory and CAT.

Pre-Conference Training Sessions - Friday, April 8, 2016

Friday, April 8, 2016
8:00 AM - 12:00 PM, Renaissance West B, Ballroom Level, Training Session, JJ
Landing Your Dream Job for Graduate Students
Deborah Harris and Xin Li, ACT, Inc.

This training session will address practical topics that graduate students in measurement are interested in regarding finding a job and starting a career. It will concentrate on what to do now, while still in school, to best prepare for a job (including finding a dissertation topic, selecting a committee, maximizing experiences while still a student through networking, internships, and volunteering, and suggestions about what types of coursework an employer looks for and what makes a good job talk); how to locate, interview for, and obtain a job (including how to find where jobs are and how to apply for them, targeting cover letters, references, and resumes); what to expect in the interview process (including job talks, questions to ask, and negotiating an offer); and what comes next after starting the first post-PhD job (including adjusting to the environment, establishing a career path, publishing, finding mentors, balancing work and life, and becoming active in the profession). The session is interactive and geared toward addressing participants' questions during the session. Resource materials are provided on all relevant topics.
Friday, April 8, 2016
8:00 AM - 12:00 PM, Meeting Room 4, Meeting Room Level, Training Session, KK
Bayesian Analysis of IRT Models Using SAS PROC MCMC
Clement Stone, University of Pittsburgh

There is a growing interest in Bayesian estimation of IRT models, in part due to the appeal of the Bayesian paradigm, as well as the advantages of these methods with small sample sizes, more complex models (e.g., multidimensional models), and simultaneous estimation of item and person parameters. Software such as SAS and WinBUGS has become available that makes Bayesian analysis of IRT models more accessible to psychometricians, researchers, and scale developers. SAS PROC MCMC offers several advantages over other software, and the purpose of this training session is to illustrate how SAS can be used to implement a Bayesian analysis of IRT models. After a brief review of Bayesian methods and IRT models, PROC MCMC is introduced. This introduction includes discussion of a template for estimating IRT models as well as convergence diagnostics and the specification of prior distributions. Also discussed are extensions for more complex models (e.g., multidimensional, mixture) and methods for comparing models and evaluating model fit. The instructional approach will involve lecture and demonstration. Considerable code and output will be discussed and shared. An overall objective is that attendees can extend the examples to their own testing applications. Some understanding of SAS programs and SAS procedures is helpful.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Meeting Room 2, Meeting Room Level, Training Session, LL
flexMIRT®: Flexible Multilevel Multidimensional Item Analysis and Test Scoring
Li Cai, University of California - Los Angeles; Carrie R. Houts, Vector Psychometric Group, LLC

There has been a tremendous amount of progress in item response theory (IRT) in the past two decades. flexMIRT® is IRT software that offers multilevel, multidimensional, and multiple-group item response models. flexMIRT® also offers users the ability to obtain recently developed model fit indices, fit diagnostic classification models, and fit models with non-normal latent densities, among other advanced features. This training session will introduce users to the flexMIRT® system and provide valuable hands-on experience with the software.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Meeting Room 5, Meeting Room Level, Training Session, MM
Aligning ALDs and Item Response Demands to Support Teacher Evaluation Systems
Steve Ferrara, Pearson School; Christina Schneider, The National Center for the Improvement of Educational Assessment

A primary goal of achievement tests is to classify students into achievement levels that enable inferences about student knowledge and skill. Explicating, at the beginning of test design, how knowledge and skills differ in complexity and empirical item difficulty is critical to those inferences. In this session we demonstrate, for experts in assessment design, standard setting, formative assessment, or teacher evaluation, how emerging practices in statewide testing for developing achievement level descriptors (ALDs), training item writers to align items to ALDs, and identifying item response demands can be used to support teachers in developing student learning objectives (SLOs) in non-tested grades and subjects.
Participants will analyze ALDs, practice writing items aligned to those ALD response demands, and analyze classroom work products from teachers who used some of these processes to create SLOs. We will apply a framework for connecting ALDs (Egan et al., 2012), the ID Matching standard setting method (Ferrara & Lewis, 2012), and item difficulty modeling techniques (Ferrara et al., 2011; Schneider et al., 2013) to a process that generalizes from statewide tests to SLOs, thereby supporting construct validity arguments for student achievement indicators used for teacher evaluation.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Renaissance East, Ballroom Level, Training Session, NN
Best Practices for Lifecycles of Automated Scoring Systems for Learning and Assessment
Peter Foltz, Pearson; Claudia Leacock, CTB/McGraw Hill; André Rupp and Mo Zhang, Educational Testing Service

Automated scoring systems are designed to evaluate performance data in order to assign scores, provide feedback, and/or facilitate teaching-learning interactions. Such systems are used in K-12 and higher education in areas such as ELA, science, and mathematics, as well as in professional domains such as medicine and accounting, across various use contexts. Over the past 20 years, there has been rapid growth in research on the underlying theories and methods of automated scoring, the development of new technologies, and ways to implement automated scoring systems effectively. Automated scoring systems are developed by a diverse community of scholars and practitioners encompassing such fields as natural language processing, linguistics, speech science, statistics, psychometrics, educational assessment, and the learning and cognitive sciences. As the application of automated scoring continues to grow, it is important for the NCME community to have an overarching understanding of best practices for designing, evaluating, deploying, and monitoring such systems. In this training session, we provide participants with such an understanding via a mixture of presentations, individual and group-level discussions, and structured and free-play demonstration activities. We utilize systems that are both proprietary and freely available, and we provide participants with resources that empower them in their own future work.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Meeting Room 3, Meeting Room Level, Training Session, OO
Test Equating Methods and Practices
Michael Kolen and Robert Brennan, University of Iowa

The need for equating arises whenever a testing program uses multiple forms of a test that are built to the same specifications. Equating is used to adjust scores on test forms so that scores can be used interchangeably. The goals of the session are for attendees to be able to understand the principles of equating, to conduct equating, and to interpret the results of equating in reasonable ways. The session focuses on conceptual issues; practical issues are also considered.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Renaissance West A, Ballroom Level, Training Session, PP
Diagnostic Measurement: Theory, Methods, Applications, and Software
Jonathan Templin and Meghan Sullivan, University of Kansas

Diagnostic measurement is a field of psychometrics that focuses on providing actionable feedback from multidimensional tests.
This workshop provides a hands-on introduction to the terms, techniques, and methods used for diagnosing what students know, thereby giving researchers access to information that can be used to guide decisions regarding students' instructional needs. Upon completion of the workshop, participants will be able to understand the rationale and motivation for using diagnostic measurement methods. Furthermore, participants will be able to understand the types of data typically used in diagnostic measurement along with the information that can be obtained from implementing diagnostic models. Participants will become well-versed in the state-of-the-art techniques currently used in practice and will be able to use and estimate diagnostic measurement models using new software developed by the instructor.

Friday, April 8, 2016
1:00 PM - 5:00 PM, Renaissance West B, Ballroom Level, Training Session, QQ
Effective Item Writing for Valid Measurement
Anthony Albano, University of Nebraska-Lincoln; Michael Rodriguez, University of Minnesota-Twin Cities

In this training session, participants will learn to write and critique high-quality test items by implementing item-writing guidelines and validity frameworks for item development. Educators, researchers, test developers, and other test users are encouraged to participate. Following the session, participants should be able to: implement empirically based guidelines in the item writing process; describe procedures for analyzing and validating items; apply item-writing guidelines in the development of their own items; and review items from peers and provide constructive feedback based on adherence to the guidelines. The session will consist of short presentations with small-group and large-group activities. Materials will be contextualized within common testing applications (e.g., classroom assessment, response to intervention, progress monitoring, summative assessment, entrance examination, licensure/certification). Participants are encouraged to bring a laptop computer, as they will be given access to a web application that facilitates collaboration in the item-writing process; those participating in the session in person and remotely will use the application to create and comment on each other's items online. This practice in item writing will allow participants to demonstrate understanding of what they have learned and to receive feedback on their items from peers and the presenters.

Friday, April 8, 2016
3:00 PM - 7:00 PM, Meeting Room 11, Meeting Room Level
NCME Board of Directors Meeting
Members of NCME are invited to attend as observers.

Friday, April 8, 2016
4:30 PM - 6:30 PM, Fado's Irish Pub (Graduate Students only)
Graduate Student Social
Come enjoy FREE appetizers at a local venue within walking distance of the conference hotels. The first 50 graduate student attendees receive one free drink ticket. Exchange research interests and discuss your work with fellow graduate students from NCME & AERA Division D. Fado's Irish Pub is located at 808 7th Street NW, Washington, DC 20001.

Friday, April 8, 2016
6:30 PM - 10:00 PM, Ballroom C, Level Three, Convention Center
AERA Centennial Symposium & Centennial Reception
The Centennial Annual Meeting's Opening Session and Reception will celebrate AERA's 100-year milestone in grand style.
Together, the elements of this energizing and dynamic opening session will commemorate the association's history, highlight the breadth and unity of the field of education research as it has evolved around the world, and begin to explore second-century pathways for advancing AERA's mission. The centerpiece of the opening plenary session will be a "Meet the Press"-style Power Panel and Town Hall discussion that takes a critical look at the current "State of the Field" for education research – taking stock of its complex history and imagining its future. The Post Reception will be an elegant and festive party for members and friends of AERA.

Annual Meeting Program - Saturday, April 9, 2016

Saturday, April 9, 2016
6:30 AM - 7:30 AM, Meeting Room 7, Meeting Room Level
Sunrise Yoga
Please join us for the second NCME Sunrise Yoga. We will start promptly at 6:30 a.m. for one hour at the Renaissance. Advance registration required ($10) to reserve your mat. NO EXPERIENCE NECESSARY. Just bring your body and your mind, and our friends from Flow Yoga Center (http://www.flowyogacenter.com/) will do the rest. Namaste.

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Renaissance East, Ballroom Level, Invited Session, A1
NCME Book Series Symposium: The Challenges to Measurement in an Era of Accountability
Session Chair: Henry Braun, Boston College
Session Discussants: Suzanne Lane, University of Pittsburgh; Scott Marion, National Center for the Improvement of Educational Assessment

This symposium draws on The Challenges to Measurement in an Era of Accountability, a recently published volume in the new NCME Book Series. The volume addresses a striking imbalance: Although it is not possible to calculate test-based indicators (e.g. value-added scores or mean growth percentiles) for more than 70 percent of teachers, assessment and accountability issues in those other subject/grade combinations have received comparatively little attention in the research literature. The book brought together experts in educational measurement, as well as those steeped in the various disciplines, to provide a comprehensive and accessible guide to the measurement of achievement in a broad range of subjects, with a primary focus on high school grades. The five focal presentations will offer discipline-specific perspectives from: social sciences, world languages, performing arts, life sciences and physical sciences. Each presentation will include a brief review of assessment (both formative and summative) in the discipline, with particular attention to the unique circumstances faced by teachers and measurement specialists responsible for assessment design and development, followed by a survey of current assessment initiatives and responses to accountability pressures. The symposium offers the measurement community a unique opportunity to learn about assessment practices and challenges across the disciplines.
Use of Evidence Centered Design in Assessment of History Learning
Kadriye Ercikan, University of British Columbia; Pamela Kaliski, College Board

Assessment Issues in World Languages
Meg Malone, Center for Applied Linguistics; Paul Sandrock, American Council on the Teaching of Foreign Languages

Arts Assessment in an Age of Accountability: Challenges and Opportunities in Implementation, Design, and Measurement
Scott Shuler, Connecticut Department of Education, Ret.; Tim Brophy, University of Florida; Robert Sabol, Purdue University

Assessing the Life Sciences: Using Evidence-Centered Design for Accountability Purposes
Daisy Rutstein and Britte Cheng, SRI International

Assessing Physical and Earth and Space Science in the Context of the NRC Framework for K-12 Science Education and the Next Generation Science Standards
Nathaniel Brown, Boston College

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Renaissance West A, Ballroom Level, Coordinated Session, A2
Collaborative Problem Solving Assessment: Challenges and Opportunities
Session Chairs: Yigal Rosen, Pearson; Lei Liu, ETS
Session Discussant: Samuel Greiff, University of Luxembourg

Collaborative problem solving (CPS) is a critical competency for college and career readiness. Students emerging from schools into the workforce and public life will be expected to have CPS skills as well as the ability to perform that collaboration in various group compositions and environments (Griffin, Care, & McGaw, 2012; OECD, 2013). Recent curriculum and instruction reforms have focused to a greater extent on teaching and learning CPS (National Research Council, 2012; OECD, 2012). However, structuring standardized computer-based assessment of CPS skills, specifically for large-scale assessment programs, is challenging. In this symposium a spectrum of approaches to collaborative problem solving assessment will be introduced, and four papers will be presented and discussed.

PISA 2015 Collaborative Problem Solving Assessment Framework
Art Graesser, University of Memphis

Human-To-Agent Approach in Collaborative Problem Solving Assessment
Yigal Rosen, Pearson

Collaborative Problem Solving Assessment: Bring Social Aspect into Science Assessment
Lei Liu, Jiangang Hao, Alina von Davier and Patrick Kyllonen, ETS

Assessing Collaborative Problem Solving: Students' Perspective
Haggai Kupermintz, University of Haifa

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Renaissance West B, Ballroom Level, Coordinated Session, A3
Harnessing Technological Innovation in Assessing English Learners: Enhancing Rather Than Hindering
Session Chair: Dorry Kenyon, Center for Applied Linguistics
Session Discussant: Mark Reckase, Michigan State University

How do English Learners (ELs) interact with technology in large-scale testing? In this coordinated session, an interdisciplinary team from the Center for Applied Linguistics presents findings from four years of research and development for the WIDA Consortium. For nine years, WIDA has offered an annual paper-and-pencil assessment of developing academic English language proficiency (ELP), known as ACCESS for ELLs, used to assess over 1 million ELs in 36 states. With federal funding, WIDA and its partners have transitioned this assessment to a web-based assessment, ACCESS 2.0, now in its first operational year (2015-2016).
ACCESS 2.0 is used to assess ELs at all levels of English language development, from grades 1 to 12, and to assess all four language domains (listening, speaking, reading, and writing). Thus, the research and development activities covered multiple critical issues pertaining to ELs and technology in large-scale assessments. In this session, we share research findings from several inter-related perspectives, including improving accuracy of measurement, developing complex web-based performance-assessment tasks, and familiarity with technology in the EL population, including keyboarding and interfacing with technology-enhanced task types. These findings provide insight into the valid assessment of ELs using technology for a wide variety of uses.

Keyboarding and the Writing Construct for ELs
Jennifer Renn and Jenny Dodson, Center for Applied Linguistics

Supporting Extended Discourse Through a Computer-Delivered Assessment of Speaking
Megan Montee and Samantha Musser, Center for Applied Linguistics

Using Multistage Testing to Enhance Measurement
David MacGregor and Xin Yu, Center for Applied Linguistics

Enhanced Item Types—Engagement or Unnecessary Confusion for ELs?
Jennifer Norton and Justin Kelly, Center for Applied Linguistics

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 3, Meeting Room Level, Paper Session, A4
How Can Assessment Inform Classroom Practice?
Session Discussant: Priya Kannan, ETS

What Score Report Features Promote Accurate Remediation? Insights from Cognitive Interviews
Francis Rick, University of Massachusetts, Amherst; Amanda Clauser, National Board of Medical Examiners
Cognitive interviews were conducted with medical students interacting with score reports to investigate what content and design features promote adequate interpretations and remediation decisions. Transcribed "speech bursts" were coded based on pre-established categories, which were then used to evaluate the effectiveness of each report format.

Evaluating the Degree of Coherence Between Instructional Targets and Measurement Models
Lauren Deters, Lori Nebelsick-Gullet, Charlene Turner, Bill Herrera and Elizabeth Towles, edCount, LLC
To solidify the links between the instructional and measurement contexts for its overall assessment system, the National Center and State Collaborative investigated the degree of coherence among the system's measurement targets, learning expectations, and targeted long-range outcomes. This study provides evidence for the system's coherence across instruction and assessment contexts.

Modeling the Instructional Sensitivity of Polytomous Items
Alexander Naumann and Johannes Hartig, German Institute for International Educational Research (DIPF); Jan Hochweber, University of Teacher Education St. Gallen (PHSG)
We propose a longitudinal multilevel IRT model for the instructional sensitivity of polytomous items. The model permits evaluation of global and differential sensitivity based on average change and variation of change in classroom-specific item locations and thresholds. Results suggest that the model performs well in its application to empirical data.

Growth Sensitivity and Standardized Assessments: New Evidence on the Relationship
Shalini Kapoor, ACT; Catherine Welch and Steve Dunbar, Iowa Testing Programs/University of Iowa
Academic growth measurement requires structured feedback that informs not only what students know but also what they need to know to learn and grow.
This research proposes a method that can support the generation of content-related growth feedback, which can help tailor classroom instruction to student-specific needs.

Using Regression-Based Growth Models to Inform Learning with Multiple Assessments
Ping Yin and Dan Mix, Curriculum Associates
This study evaluates the feasibility of two types of regression-based growth models to inform student learning using a computer adaptive assessment administered multiple times throughout a school year. With the increased interest in using assessments to inform instruction and learning, it is important to evaluate whether current growth models can support such goals.

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 4, Meeting Room Level, Coordinated Session, A5
Enacting a Learning Progression Design to Measure Growth
Session Chair: Damian Betebenner, National Center for the Improvement of Educational Assessment

The concept of growth is at the foundation of the policy and practice around systems of educational accountability. Yet there is a disconnect between the criterion-referenced intuitions that parents and teachers have for what it means for students to demonstrate growth and the primarily norm-referenced metrics that are used to infer growth. One way to address this disconnect would be to develop vertically linked score scales that could be used to support both criterion-referenced and norm-referenced interpretations, but this hinges upon having a coherent conceptualization of what it is that is growing from grade to grade. The purpose of this session is to facilitate debate about the design of large-scale assessments for the intended purpose of drawing inferences about student growth, a topic that was the recent subject of a 2015 focus article and commentaries in the journal Measurement. A learning-progression approach to the conceptualization of growth and the subsequent design of a vertical score scale will be described in the context of student understanding of proportional reasoning, a big-picture idea from the Common Core State Standards for Mathematics. Subsequent presentations and discussion will focus on the pros and cons of the proposed approach and of other possible alternatives.

Using Learning Progressions to Design Vertical Scales
Derek Briggs and Fred Peck, University of Colorado

Challenges in Modeling and Measuring Learning Progressions
Jere Confrey, Ryan Seth Jones, and Garron Gianopulos, North Carolina State University

The Importance of Content-Referenced Score Interpretations
Scott Marion, National Center for the Improvement of Educational Assessment

Challenges on the Path to Implementation
Joseph Martineau and Adam Wyse, National Center for the Improvement of Educational Assessment

Growth Through Levels
David Thissen, University of North Carolina

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 5, Meeting Room Level, Paper Session, A6
Testlets and Multidimensionality in Adaptive Testing
Session Discussant: Chun Wang, University of Minnesota

Measuring Language Ability of Students with Compensatory MCAT: A Post-Hoc Simulation Study
Burhanettin Özdemir and Selahattin Gelbal, Hacettepe University
The purpose of this study is to determine the most suitable multidimensional CAT design for measuring students' language ability and to compare paper-and-pencil test outcomes to those of the new MCAT designs. A real data set from an English proficiency test was used to create an item pool consisting of 565 items.
Multidimensional CAT Classification Method for Composite Scores
Lihua Yao and Dan Segall, Defense Manpower Data Center
The current research proposes an item selection method that uses cut points on the composite score for classification purposes in the multidimensional CAT framework. The classification accuracy for the composite score under the proposed method is compared with that of other existing MCAT methods.

Two Bayesian Online Calibration Methods in Multidimensional Computerized Adaptive Testing
Ping Chen, Beijing Normal University
To solve the non-convergence issue in M-MEM (Chen & Xin, 2013) and improve calibration precision, this study combined Bayes Modal Estimation (BME) (Mislevy, 1986) with M-OEM and M-MEM to make full use of prior information, and proposed two Bayesian online calibration methods for MCAT (M-OEM-BME and M-MEM-BME).

Item Selection in Testlet-Based CAT
Mark Reckase and Xin Luo, Michigan State University
Research on item selection in testlet-based CAT is rare. This study compared three item selection approaches (one based on a polytomous model and two on dichotomous models) and investigated some factors that might influence the effectiveness of CAT. The three approaches obtained similar measurement accuracy but different exposure rates.

Effects of Testlet Characteristics on Estimating Abilities in Testlet-Based CAT
Seohong Pak, University of Iowa; Hong Qian and Xiao Luo, NCSBN
Testlet selection methods, testlet sizes, degrees of variation in item difficulties within each testlet, and degrees of testlet random effect were investigated under testlet-based CAT. The 48 conditions were each run 50 times using R, and results were compared based on measurement accuracy and decision accuracy.

Computerized Mastery Testing (CMT) Without the Use of Item Response Theory
Sunhee Kim and Adena Lebeau, Prometric; Tammy Trierweiler, Law School Admission Council (LSAC); F. Jay Breyer and Charles Lewis, Educational Testing Service; Robert Smith, Smith Consulting
This study demonstrates that CMT can be successfully implemented in a real-world application when testlets are constructed using classical item statistics. As CMT is easier to implement and more cost efficient than CAT test designs, credentialing programs that have small samples and item pools may benefit from this approach.

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 12, Meeting Room Level, Paper Session, A7
Methods for Examining Local Item Dependence and Multidimensionality
Session Discussant: Ki Matlock, Oklahoma State University

Examining Unidimensionality with Parallel Analysis on Residuals
Tianshu Pan, Pearson
The current study compares the performance of several parallel analysis (Horn, 1965) procedures for checking the unidimensionality of simulated unidimensional and multidimensional data: the procedures of Reckase (2009) and Drasgow and Lissak (1983), regular parallel analysis, and a new parallel analysis procedure proposed by this study.

A Conditional IRT Model for Directional Local Item Dependency in Multipart Items
Dandan Liao, Hong Jiao and Robert Lissitz, University of Maryland, College Park
A multipart item consists of two related questions, which potentially introduces conditional local item dependence (LID) between the two parts of the item. This paper proposes a conditional IRT model for directional LID in multipart items and compares different approaches to modeling LID in terms of parameter estimation through a simulation study.
Fit Index Criteria in Confirmatory Factor Analysis Models Used by Measurement Practitioners
Anne Corinne Huggins-Manley and HyunSuk Han, University of Florida
Measurement practitioners often use CFA models to assess unidimensionality and local independence in test data. Current guidelines for assessing the fit of CFA models are possibly inappropriate because they were not developed under measurement-oriented conditions. This study provides CFA fit index cutoff recommendations for evaluating IRT model assumptions.

Multilevel Bi-Factor IRT Models for Wording Effects
Xiaorui Huang, East China Normal University
A multilevel bi-factor IRT model was developed to account for wording effects in mixed-format scales and multilevel data structures. Simulation studies demonstrated good parameter recovery for the new model and underestimation of standard errors when multilevel data structures were ignored. An empirical example is provided.

A Generalized Multinomial Error Model for Tests That Violate Conditional Independence Assumptions
Benjamin Andrews, ACT
A generalized multinomial error model is presented that allows for dependency among vectors of item responses. This model can be used in instances where polytomous items are related to the same passage or where responses are rated on several different traits. Examples and comparisons to G theory methods are discussed.

Both Local Item Dependencies and Cut-Point Location Impact Examinee Classifications
Jonathan Rubright, American Institute of Certified Public Accountants
This simulation study demonstrates that the strength of local item dependencies and the location of an examination's cut-point both influence the sensitivity and specificity of examinee classifications under unidimensional IRT. Practical implications are discussed in terms of false positives and false negatives for test takers.

Saturday, April 9, 2016
10:35 AM - 12:05 PM, Renaissance East, Ballroom Level, Coordinated Session, B1
The End of Testing as We Know It?
Session Chair: Randy Bennett, ETS
Session Presenters: Randy Bennett, ETS; Joan Herman, UCLA-CRESST; Neal Kingston, University of Kansas

The rapid evolution of technology is affecting all aspects of our lives: commerce, communication, leisure, and education. Activities like travel planning, news consumption, and music purchasing have been so dramatically affected as to have caused significant shifts in how services and products are packaged, marketed, distributed, priced, and sold. Those shifts have been dramatic enough to have substantially reduced the influence of once-staple products like newspapers and the companies that provide them. Technology has come to education and educational testing too, though more slowly than to other areas. Still, there is growing evidence that the future for these fields will be considerably different and that those differences will emerge quickly. Billions of dollars are being invested in new technology-based products and services for K-12 as well as higher education, huge amounts of student data are being collected through these offerings, tests are moving to digital delivery and being substantially changed in the process, and the upheaval that has occurred in other industries may come to education too. What will and won't change for educational testing? This panel presentation will include three speakers, each offering a different scenario for the future of K-12 assessment.
Saturday, April 9, 2016
10:35 AM - 12:05 PM, Renaissance West A, Ballroom Level, Coordinated Session, B2
Fairness and Machine Learning for Educational Practice
Session Chair: Alina von Davier
Session Moderator: Jill Burstein
Session Panelists: Nitin Madnani and Aoife Cahill, Educational Testing Service; Solon Barocas, Princeton University; Brendan O'Connor, University of Massachusetts Amherst; James Willis, Indiana University

This panel will address issues around fairness and transparency in the application of machine learning (ML) to education, in particular to learning and assessment. Panelists include experts in NLP, Computational Psychometrics (CP), and education technology policy and ethics. Panelists will respond to questions such as the following:
1. Are data-driven methods used alone ever OK?
2. Are there use cases that are more acceptable than others from a fairness perspective?
3. Are there examples from other domains that we may apply to educational assessment?
4. In the case of scoring written essays: What is the difference between human raters and ML methods? For human raters, at least in writing, we know what they are supposed to consider but not what they actually choose or how they weight it. For ML methods, we actually "know" what features go in, but weightings and predictive modeling can be black-box-like. Is this any less true for human raters?
5. Under what conditions is interpretability important? For instance, how do we isolate diagnostic information if we use ML for predicting learning outcomes?
6. Can we detect underlying bias in large data sets from education? If we identify bias, is it acceptable to adjust the ML algorithms to eliminate it? Can these adjustments be misused?
7. What types of evaluation methods should one employ to ensure that the results are fair to all groups?
The moderator will lead the panel by presenting questions to the panelists and managing the discussion. The panel discussion will be 60 minutes, and there will be an additional 30 minutes intended for questions and discussion with the audience.

Saturday, April 9, 2016
10:35 AM - 12:05 PM, Renaissance West B, Ballroom Level, Coordinated Session, B3
Item Difficulty Modeling: From Theory to Practice
Session Chair: Isaac I. Bejar
Session Discussant: Steve Ferrara

Item difficulty modeling (IDM) is concerned both with understanding the variability in estimated item difficulty and with explanatory item response modeling that incorporates difficulty covariates. The symposium starts with an overview of the multiple applications of difficulty modeling, ranging from purely theoretical to practical. The following presentations then focus on empirical research on the modeling of mathematics items used in K-12 and graduate admissions assessments.
Specifically, the following research will be presented:
• The use of a validated IDM for generating items by means of family and structural variants
• The multidisciplinary development of an IDM for practical day-to-day application
• An evaluation of the feasibility of automating the propositional analysis of existing items to study the role of linguistic variables in item difficulty
• The fitting of an explanatory IRT model that extends the LLTM by fixing residuals to fully account for difficulty

An Overview of the Purposes of Item Difficulty Modeling (IDM)
Isaac Bejar, ETS

Implications of Item Difficulty Modeling for Item Design and Item Generation
Susan Embretson, Georgia Institute of Technology

Developing an Item Difficulty Model for Quantitative Reasoning: A Knowledge Elicitation Approach
Edith Aurora Graf, ETS

Exploring an Automated Approach to Examining Linguistic Context in Mathematics Items
Kristin Morrison, Georgia Institute of Technology

An Explanatory Model for Item Difficulties with Fixed Residuals
Paul De Boeck, Ohio State University

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 3, Meeting Room Level, Paper Session, B4
Growth and Vertical Scales
Session Discussant: Anna Topczewski, Pearson

Estimating Vertical Scale Drift Due to Repetitious Horizontal Equating
Emily Ho, Michael Chajewski and Judit Antal, College Board
The stability of a vertical scale as a function of repeated administrations is rarely studied. Our empirical simulation uses 2PL math items, generating forms for test-takers from three grades. We examine the effect of ability, test difficulty, and equating designs on vertical scale stability when applying repetitious horizontal equating.

An EIRM Approach for Studying Latent Growth in Alphabet Knowledge Among Kindergarteners
Xiaoxin Wei, American Institutes for Research; Patrick Meyer and Marcia Invernizzi, University of Virginia
We applied a series of latent growth explanatory item response models to study growth in alphabet knowledge over three time points. Models allowed for time-varying item parameters and evaluated the impact of person properties on growth. Results show that growth differs by examinee group in expected and unexpected ways.

Vertical Scaling and Item Location: Generalizing from Horizontal Linking Designs
Stephen Murphy, Rong Jin, Bill Insko and Sid Sharairi, Houghton Mifflin Harcourt
Establishing a vertical scale for an assessment is beset with practical decisions. Outcomes of these decisions are essential to valid interpretations of student growth and teacher effectiveness (Briggs, Weeks, & Wiley, 2008). This study adds to the existing literature by examining the impact of item location on the vertical scale.

Predictive Accuracy of Model Inferences for Longitudinal Data with Self-Selection
Tyler Matta, Yeow Meng Thum and Quinn Lathrop, Northwest Evaluation Association
Conventional approaches to characterizing classification accuracy are not valid when data are subject to self-selection. We introduce predictive accuracy, a framework that appropriately accounts for the impact of nonignorable missing data. We provide an illustration using longitudinal assessment data to predict college readiness when college test takers are self-selected.
Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 4, Meeting Room Level, Paper Session, B5
Perspectives on Validation
Session Discussant: Mark Shermis, University of Houston-Clear Lake

Using a Theory of Action to Ensure High Quality Tests
Cathy Wendler, Educational Testing Service
A theory of action helps testing programs ensure high quality tests by documenting claims, determining the evidence needed to support those claims, and creating solutions to address unintended consequences. This presentation describes the components of a theory of action and how it is being used to evaluate and improve programs.

Teacher Evaluation Systems: Mapping a Validity Argument
Tia Sukin and W. Nicewander, Pacific Metrics; Phoebe Winter, Consultant, Assessment Research and Development
Providing validation evidence for teacher evaluation systems is a complex and historically neglected task. This paper provides a framework for building an argument for the use of comprehensive teacher evaluation systems, which will allow for the identification of possible weaknesses in the system that need to be addressed.

Validity Evidence to Support Alternate Assessment Score Uses: Fidelity and Response Processes
Meagan Karvonen, Russell Swinburne Romine and Amy Clark, University of Kansas
Validity of score interpretations and uses for new online alternate assessments for students with significant cognitive disabilities (AA-AAS) requires new sources of evidence about student and teacher actions during the test administration process. We present findings from student cognitive labs, teacher cognitive labs, and test administration observations for an AA-AAS.

Communicating Psychometric Research to Policymakers
Andrea Lash and Mary Peterson, WestEd; Benjamin Hayes, Washoe County School District
Policymakers' implicit assumptions about assessment data inform their designs of educator evaluation systems. How can psychometricians help policymakers evaluate the validity of their assumptions? We examine a two-year effort in one state using a model of science communication for political contexts and an argument-based validation framework.

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, B6
Model Fit
Session Discussant: Matthew Johnson, Teachers College

Evaluation of Item Response Theory Item-Fit Indices
Adrienne Sgammato and John Donoghue, Educational Testing Service
The performance of Pearson chi-square and likelihood ratio item-level model fit indices based on observed data was evaluated in the presence of complex sampling of items (i.e., BIB sampling). Distributional properties, Type I error, and power of these measures were evaluated.

Rethinking Complexity in Item Response Theory Models
Wes Bonifay, University of Missouri
The notion of complexity commonly refers to the number of freely estimated parameters in a model. An investigation of five popular measurement models suggests that complexity in IRT should be defined not by the number of parameters, but instead by the functional form of the model.

Measures for Identifying Non-Monotonically Increasing Item Response Functions
Nazia Rahman and Peter Pashley, Law School Admission Council; Charles Lewis, Educational Testing Service
This study explored statistical measures as bases for defining robust criteria for checking for non-monotonicity in multiple-choice tests; these measures may be considered analogous to effect size measures.
The three methods adapted to identify non-monotonicity in items were Mokken's scalability coefficient, isotonic regression analysis, and a nonparametric smooth regression method.

Evaluation of Limited Information IRT Model-Fit Indices Applied to Complex Item Samples
John Donoghue and Adrienne Sgammato, Educational Testing Service
Recently, "limited information" measures of model fit (computed from low-order margins of the item response data) have been suggested. We examined the performance of these indices in the presence of complex sampling of items (i.e., BIB sampling). Distributional properties, Type I error, and power of these measures were evaluated.

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 12, Meeting Room Level, Paper Session, B7
Simulation- and Game-Based Assessments
Session Discussant: José Pablo González-Brenes, Pearson

Aligning Process, Product and Survey Data: Bayes Nets for a Simulation-Based Assessment
Tiago Caliço, University of Maryland; Vandhana Mehta and Martin Benson, Cisco Networking Academy; André Rupp, Educational Testing Service
Simulation-based assessments yield product and process data that can potentially allow for more comprehensive measurement of competencies and of factors that affect these competencies. We discuss the iterative construction of student characterizations (personae) and elucidate the methodological implications for successfully putting the evidence-centered design process into practice.

Practical Consequences of Static, Dynamic, or Hierarchical Bayesian Networks in Game-Based Assessments
Maria Bertling, Harvard University; Katherine Furgol Castellano, Educational Testing Service
There is growing interest in using Bayesian approaches for analyzing data from game-based assessments (GBAs). This paper describes the process of developing a measurement model for an argumentation game and demonstrates the analytical and practical consequences of using different types of Bayesian networks as the scaling method for GBAs.

Impact of Feedback Within Technology Enhanced Items on Perseverance and Performance
Stacy Hayes, Chris Meador and Karen Barton, Discovery Education
This research explores the impact of formative feedback within technology enhanced items (TEIs) embedded in a digital mathematics techbook where students are permitted multiple attempts. Exploratory analyses will investigate patterns of student performance by time on task, type of feedback, item type, misconception, construct complexity, and persistence.

Framework for Feedback and Remediation with Electronic Objective Structured Clinical Examinations
Hollis Lai, Vijay Daniels, Mark Gierl, Tracey Hillier and Amy Tan, University of Alberta
The Objective Structured Clinical Examination (OSCE) is popular in health professions education but cannot provide student feedback and guidance. As OSCEs migrate into an electronic format, the purpose of our paper is to demonstrate a framework that integrates the myriad data sources captured in an OSCE to provide student feedback.

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 10, Meeting Room Level, Paper Session, B8
Test Security and Cheating
Session Discussant: Dmitry Belov, Law School Admission Council

Applying Three Methods for Detecting Aberrant Tests to Detect Compromised Items
Yu Zhang, Jiyoon Park and Lorin Mueller, Federation of State Boards of Physical Therapy
Three different approaches toward detecting item preknowledge were applied to detect compromised items.
These three methods were originally developed for detecting aberrant responses and showed high performance in detecting examinees with item preknowledge. We employed these methods to detect potentially compromised items.

Detecting Two Patterns of Cheating with a Profile of Statistical Indices
Amin Saiar, Gregory Hurtz and John Weiner, PSI Services LLC
Several indices used to detect aberrances in item scores are compared, assessing similarities in raw responses. Results show that the different indices are differentially sensitive to two patterns of cheating, and profiles across the indices may be most useful for detecting and diagnosing test cheating.

Integrating Digital Assessment Meta-Data for Psychometric and Validity Analysis
Elizabeth Stone, Educational Testing Service
This paper discusses meta-data (or process data) captured during assessments that can be used to enhance psychometric and validity analyses. We examine sources and types of meta-data, as well as uses including subgroup refinement, identification of effort, and test security. We also describe challenges and caveats to this usage.

How Accurately Can We Detect Erasures?
Han Yi Kim and Louis Roussos, Measured Progress
Erasure analyses require accurate detection of erasures, as distinct from blank and filled-in marks. This study evaluates erasure detection using data for which the true nature of the marks is known. Optimal rules are formulated. Type I error and power are calculated and evaluated under various scenarios.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Renaissance East, Ballroom Level, Coordinated Session, C1
Opting Out of Testing: Parent Rights Versus Valid Accountability Scores
Session Discussant: S.E. Phillips, Assessment Law Consultant
Although permitted by legislation in some states, too many parents opting their children out of statewide testing may threaten the validity of school accountability scores. This session will explore the effects of opt outs from the perspectives of enabling state legislation, state assessment staff, measurement specialists, and testing vendors.

Survey and Analysis of State Opt Outs and Required Test Participation Legislation
Michelle Croft and Richard Lee, ACT, Inc.

Test Administration, Scoring, and Reporting When Students Opt Out
Tim Vansickle, Questar Assessment Inc.

Responding to Parents and Schools About Student Testing Opt Outs
Derek Brown, Oregon Department of Education

Opt-Outs: The Validity of School Accountability and Teacher Evaluation Test Score Interpretations
Greg Cizek, University of North Carolina at Chapel Hill

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Renaissance West A, Ballroom Level, Coordinated Session, C2
Building Toward a Validation Argument with Innovative Field Test Design and Analysis
Session Chair: Catherine Welch, University of Iowa
Session Discussants: Michael Rodriguez, University of Minnesota; Wayne Camara, ACT, Inc.
For a variety of reasons, large-scale assessment programs have come to rely heavily on data collected during field testing to evaluate items, assemble forms, and link those forms to already established standard score scales and interpretive frameworks, such as proficiency benchmarks and other standards such as college readiness. When derived scores are based on pre-calibrated item pools, as in adaptive testing, or on pre-equated or otherwise linked fixed test forms, the administrative conditions (cf. Wise, 2015) and sampling designs (e.g.,
Meyers, Miller & Way, 2009) for field testing are critical to the validity of the scores. This session addresses key aspects of field testing that can be used as a basis for the validation work of an operational assessment program.

Implications of New Construct Definitions and Shifting Emphases in Curriculum and Instruction
Catherine Welch, University of Iowa

Implications of Composition and Behavior of the Sample When Studying Item Responses
Tim Davey, Educational Testing Service

Assessing Validity of the Item Response Theory Model When Calibrating Field Test Items
Brandon LeBeau, University of Iowa

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Renaissance West B, Ballroom Level, Coordinated Session, C3
Towards Establishing Standards for Spiraling of Contextual Questionnaires in Large-Scale Assessments
Session Chair: Jonas Bertling, Educational Testing Service
Session Discussant: Lauren Harrell, National Center for Education Statistics
Constraints on overall testing time and the large sample sizes in large-scale assessments (LSAs) make spiraling approaches, where different respondents receive different sets of items, a viable option to reduce respondent burden while maintaining or increasing content coverage across relevant areas. Yet LSAs have taken different directions in their use of spiraling in operational questionnaires, and there is currently no consensus on the benefits and drawbacks of spiraling. This symposium brings together diverse perspectives on spiraling approaches in conjunction with mass imputation for contextual questionnaires in LSAs and will help establish standards for how future operational questionnaire designs can be improved to reduce risks for plausible value estimation and secondary analyses.

Context and Position Effects on Survey Questions and Implications for Matrix Sampling
Paul Jewsbury and Jonas Bertling, Educational Testing Service

Matrix Sampling and Imputation of Context Questionnaires: Implications for Generating Plausible Values
David Kaplan and Dan Su, University of Wisconsin – Madison

Imputing Missing Background Data, How to ... And When to ...
Matthias von Davier, Educational Testing Service

Design Considerations for Planned Missing Auxiliary Data in a Latent Regression Context
Leslie Rutkowski, University of Oslo

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, C4
Estimation Precision of Variance Components: Revisiting Generalizability Theory
Session Discussant: Xiaohong Gao, ACT, Inc.
In this coordinated session of three presentations, the overarching theme is the estimation precision of variance components (VCs) in generalizability theory (G theory). Estimation precision is of significant importance in that VCs are the building blocks of reliability, on which valid interpretations of measurement are contingent. In the first presentation, the authors discuss the adverse effects of non-additivity on the estimation precision of VCs. Specifically, in a one-facet design the VC for subjects is underestimated and, consequently, generalizability coefficients are also underestimated. An example of non-additivity is the presence of a subject-by-facet interaction in a one-facet design. The authors demonstrate that a nonadditive model should be used in such a case to obtain unbiased estimators for VCs. As a follow-up study, the second presentation focuses on the identification of non-additivity by use of Tukey's single-degree-of-freedom test.
The authors evaluate Tukey's test for non-additivity in terms of Type I and Type II error rates. Finally, the third presentation extends the theme to a multivariate context and touches on the estimation precision of construct-irrelevant VCs in subscore profile analysis. The authors compare the extent to which Component Universe Score Profiles and factor analytic profiles accurately represent subscore profiles.

Bias in Estimating the Subject Variance Component When Interaction Exists in a One-Facet Design
Jinming Zhang, University of Illinois at Urbana-Champaign

Component Universe Score Profiles: Advantages Over Factor Analytic Profile Analysis
Joe Grochowalski, The College Board; Se-Kang Kim, Fordham University

Evaluating Tukey's Test for Detecting Nonadditivity in G-Theory Applications
Chih-Kai Lin, Center for Applied Linguistics (CAL)

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 4, Meeting Room Level, Paper Session, C5
Sensitivity of Value-Added Models
Session Discussant: Katherine Furgol Castellano, ETS

Cohort and Content Variability in Value-Added Model School Effects
Daniel Anderson and Joseph Stevens, University of Oregon
The purpose of this paper was to explore the extent to which school effects estimated from a random-effects value-added model (VAM) vary as a function of year-to-year fluctuations in the student sample (i.e., cohort) and the tested subject (reading or math). Preliminary results suggest high volatility in school effect estimates.

Value-Added Modelling Considerations for School Evaluation Purposes
Lucy Lu, NSW Department of Education, Australia
This paper discusses findings from the development of value-added models for a large Australian education system. Issues covered include the impact of modelling choices on the representation of schools of different sizes in the distribution of school effects, and the sensitivity of VA estimates to test properties and to missing test data.

Implications of Differential Item Quality for Test Scores and Value-Added Estimates
Robert Meyer, Nandita Gawade and Caroline Wang, Education Analytics, Inc.
We explore whether differential item quality compromises the use of locally developed tests in student performance and educator evaluation. Using simulated and empirical data, we find that item corruption affects test scores and, to a lesser extent, value-added estimates. Adjusting test score scales and limiting to well-functioning items mitigate these effects.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 5, Meeting Room Level, Paper Session, C6
Item and Scale Drift
Session Discussant: Jonathan Weeks, ETS

The Impact of Item Parameter Drift in Computer Adaptive Testing (CAT)
Nicole Risk, American Medical Technologists
The impact of IPD on measurement in CAT was examined. The amount and magnitude of IPD, as well as the size of the item pool, were varied in a series of simulations. A number of criteria were used to evaluate the effects on measurement precision, classification, and test efficiency.

Practice Differences and Item Parameter Drift in Computer Adaptive Testing
Beyza Aksu Dunya, University of Illinois at Chicago
The purpose of this simulation study was to evaluate the impact of IPD that occurs due to teaching and practice differences on person parameter estimation and classification accuracy in CAT when factors such as the percentage of drifting items and the percentage of examinees receiving differential teaching and practice vary.
Investigating Linear and Nonlinear Item Parameter Drift with Explanatory IRT Models
Luke Stanke, Minneapolis Public Schools; Okan Bulut, University of Alberta; Michael Rodriguez and Jose Palma, University of Minnesota
This study investigates the impact of model misspecification in detecting linear and nonlinear item parameter drift (IPD). Monte Carlo simulations were conducted to examine drift with linear, quadratic, and factor IPD models under various testing conditions.

Quality Control Models for Tests with a Continuous Administration Mode
Yuyu Fan, Fordham University; Alina von Davier and Yi-Hsuan Lee, ETS
This paper systematically compared the performance of Change Point Models (CPM) and Hidden Markov Models (HMM) for score stability monitoring and scale drift assessment in educational test administrations using simulated data. The study will contribute to the ongoing monitoring of scale scores for the purpose of quality control in equating.

Ensuring Test Fairness Through Monitoring the Anchor Test and Covariates
Marie Wiberg, Umeå University; Alina von Davier, Educational Testing Service
A quality control procedure for a testing program with multiple consecutive administrations with an anchor test is proposed. Descriptive statistics, ANOVA, IRT, and linear mixed effect models were used to examine the impact of covariates on the anchor test. The results imply that the covariates play a significant part.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 12, Meeting Room Level, Paper Session, C7
Cognitive Diagnostic Model Extensions
Session Discussant: Larry DeCarlo, Teachers College, Columbia University

A Polytomously-Scored DINA Model for Graded Response Data
Dongbo Tu, Chanjin Zheng and Yan Cai, Jiangxi Normal University; Hua-Hua Chang, University of Illinois at Urbana-Champaign
This paper proposed a polytomous extension of the DINA model for a test with polytomously-scored items. A simulation study was conducted to investigate the performance of the proposed model. In addition, a real-data example was used to illustrate the application of this new model with polytomously-scored items.

Information Matrix Estimation Procedures for Cognitive Diagnostic Models
Tao Xin, Yanlou Liu and Wei Tian, Beijing Normal University
The performance of the sandwich-type covariance matrix in CDMs is consistent and robust to model misspecification. The Type I error rates of the Wald statistic, constructed using the observed information matrix, for one-, two-, and three-attribute items all matched the nominal levels well when the sample size was relatively large.

Higher-Order Cognitive Diagnostic Models for Polytomous Latent Attributes
Peida Zhan and Yufang Bian, Beijing Normal University; Wen-Chung Wang and Xiaomin Li, The Hong Kong Institute of Education
Latent attributes in cognitive diagnostic models (CDMs) are dichotomous, but in practice polytomous attributes are possible. We developed a set of new CDMs in which the polytomous attributes are assumed to measure the same continuous latent trait. Simulation studies demonstrated good parameter recovery using WinBUGS. An empirical example was given.
Incorporating Latent and Observed Predictors in Cognitive Diagnostic Models
Yoon Soo Park and Kuan Xing, University of Illinois at Chicago; Young-Sun Lee, Teachers College, Columbia University; MiYoun Lim, Ewha Womans University
A general approach to specifying observed and latent factors (estimated using item response theory) as predictors in an explanatory framework for cognitive diagnostic models is proposed. Simulations were conducted to examine the stability of estimates; real-world data analyses were conducted to demonstrate the framework and its application using TIMSS data.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, C8

Electronic Board #1
Examination of Over-Extraction of Latent Classes in the Mixed Rasch Model
Sedat Sen, Harran University
Correct identification of the number of latent classes in MRMs is very important. This study investigated the over-extraction problem in MRMs by focusing on non-normal ability distributions and fit index selection. Three ML-based estimation techniques were used, and the over-extraction problem was observed under some conditions.

Electronic Board #2
Identifying a Credible Reference Variable for Measurement Invariance Testing
Cheng-Hsien Li and KwangHee Jung, Department of Pediatrics, University of Texas Medical School at Houston
Two limitations to model identification in multiple-group CFA, unfortunately, have received little attention: (1) the standardization in loading invariance and (2) the lack of a statistical test for intercept invariance. The proposed strategy extends a MIMIC model with moderated effects to identify a credible reference variable for measurement invariance testing.

Electronic Board #3
Using Partial Classification of Respondents to Reduce Classification Error in Mixture IRT
Youngmi Cho, Pearson; Tongyun Li, ETS; Jeffrey Harring and George Macready, University of Maryland
This study investigates an alternative classification method in mixture IRT models. This method incorporates an additional classification criterion: namely, that the largest posterior probability for each response pattern must equal or exceed a specified lower bound. This results in a reduction of expected classification error.

Electronic Board #4
Parameter Recovery in Multidimensional Item Response Theory Models Under Complexity and Nonnormality
Stephanie Underhill, Dubravka Svetina, Shenghai Dai and Xiaolin Wang, Indiana University - Bloomington
We investigate item and person parameter recovery in multidimensional item response theory models for understudied conditions. Specifically, we ask how well IRTpro and the mirt package in R can recover the parameters when the person distribution is nonnormal, items exhibit varying degrees of complexity, and different item parameters comprise an assessment.

Electronic Board #5
Psychometric Properties of Technology-Enhanced Item Formats
Ashleigh Crabtree and Catherine Welch, University of Iowa
The objectives of this research are to provide information about the properties of technology-enhanced item formats. Specifically, the research will focus on the construct representation and technical properties of test forms that use these item types.

Electronic Board #6
Using Technology-Enhanced Items to Measure Fifth Grade Geometry
Jessica Masters, Lisa Famularo and Kristin King, Measured Progress
Technology-enhanced items have the potential to provide improved measurement of high-level constructs.
But research is needed to evaluate whether these items lead to valid inferences about knowledge and provide improved measurement over traditional items. This paper explores these questions in the context of fifth grade geometry using qualitative cognitive lab data.

Electronic Board #7
A Multilevel MT-MM Approach for Estimating Trait Variance Across Informant Types
Tim Konold and Kathan Shukla, University of Virginia
An approach for extracting common trait variance from structurally different informant ratings is presented, with an extension for measuring the resulting factors' associations with an external outcome. Results are based on structurally different and interchangeable students (N = 45,641) and teachers (N = 12,808) from 302 schools.

Electronic Board #8
A Validation Study of the Learning Errors and Formative Feedback (LEAFF) Model
Wei Tang, Jacqueline Leighton and Qi Guo, University of Alberta
The objectives of the present study involve (1) validating the selected measures of the latent variables in the Learning Errors and Formative Feedback (LEAFF) model, and (2) applying a structural equation model to evaluate the core of the LEAFF model. In addition, culturally invariant models are analyzed and presented.

Electronic Board #9
Automatic Flagging of Items for Key Validation
Füsun Şahin, University at Albany, State University of New York; Jerome Clauser, American Board of Internal Medicine
Key validation procedures typically rely on professional judgement to identify potentially problematic items. Unfortunately, the lack of standardized flagging criteria can introduce bias in examinee scores. This study demonstrates the use of logistic regression to mimic expert judgment and automatically flag problematic items. The final model properly identified 96% of items.

Electronic Board #10
Evaluating the Robustness of Multidimensional IRT (MIRT) Based Growth Modeling
Hanwook Yoo, Seunghee Chung, Peter van Rijn and Hyeon-Joo Oh, Educational Testing Service
This study evaluates the robustness of MIRT-based growth modeling when tests are not strictly unidimensional. The primary independent variables manipulated are (a) magnitude of student growth and (b) magnitude of test multidimensionality. The findings show how growth is effectively measured by the proposed model under different test conditions.

Electronic Board #11
Standard Errors of Measurement for Group-Level SGP with Bootstrap Procedures
Jinah Choi, Won-Chan Lee, Robert Brennan and Robert Ankenmann, The University of Iowa
This study provides procedures for estimating standard errors of measurement and confidence intervals for group-level SGPs by using bootstrap sampling plans in generalizability theory. It is informative to gauge the reliability of reported SGPs when reporting the mean or median of individual SGPs within a group of interest.

Electronic Board #12
Vertical Scaling of Tests with Mixed Item Formats Including Technology Enhanced Items
Dong-In Kim, Ping Wan and Joanna Tomkowicz, Data Recognition Corporation; Furong Gao, Pacific Metrics; Jungnam Kim, NBCE
This study is intended to enhance the knowledge base of IRT vertical scaling when tests consist of mixed item types, including technology-enhanced items. Using large-scale state assessments, the study compares results from different configurations of item type compositions of the anchor set, anchor sources, IRT models, and vertical scaling methods.
Electronic Board #13
Full-Information Bifactor Growth Models and Derivatives for Longitudinal Data
Ying Li, American Nurses Credentialing Center
The bifactor growth model with correlated general factors has shown promise in recovering longitudinal data; however, it is not known whether simplified models perform well with comparable estimation accuracy. This study investigated two simplified versions of the model in data recovery under various conditions, aiming to provide guidance on model selection.

Electronic Board #14
The Pseudo-Equivalent Groups Approach as an Alternative to Common-Item Equating
Sooyeon Kim and Ru Lu, Educational Testing Service
This study evaluates the effectiveness of equating test scores by using demographic data to form "pseudo-equivalent groups" of test takers. The study uses data from a single test form to create two half-length forms for which the equating relationship is known.

Electronic Board #15
Equating with a Heterogeneous Target Population in the Common-Item Design
Ru Lu and Sooyeon Kim, Educational Testing Service
This study evaluates the effectiveness of weighting for each subgroup in the nonequivalent groups with common-item design. This study uses data from a single test form to create two research forms for which the equating relationship is known. Two weighting schemes are compared in terms of equating accuracy.

Electronic Board #16
Examining the Reliability of Rubric Scores to Assess Score Report Quality
Mary Roduta Roberts, University of Alberta; Chad Gotch, Washington State University
The purpose of this study is to assess the reliability of scores obtained from a recently developed ratings-based measure of score report quality. Findings will be used to refine the assessment of score report quality and advance the study and practice of score reporting.

Electronic Board #17
Accuracy of Angoff Method Item Difficulty Estimation at Specific Cut Score Levels
Tanya Longabach, Excelsior College
This study examines the accuracy of item difficulty estimates in Angoff standard setting with no normative item data available. The correlation between observed and estimated item difficulty is moderate to high. The judges consistently overestimate student ability at higher cut levels and underestimate the ability of students at the D cut level.

Electronic Board #18
A Passage-Based Approach to Setting Cut Scores on ELA Assessments
Marianne Perie and Jessica Loughran, Center for Educational Testing and Evaluation
New assessments in ELA contain a strong focus on reading comprehension with multiple passages of varying complexity. Using a variant on the Bookmark method, this study provides results from two standard setting workshops with two approaches to setting passage-based cut scores and two approaches to recovering the intended cut score.

Electronic Board #19
Psychometric Characteristics of Technology Enhanced Items from a Computer-Based Interim Assessment Program
Nurliyana Bukhari, University of North Carolina at Greensboro; Keith Boughton and Dong-In Kim, Data Recognition Corporation
This study compared the IRT information of technology enhanced (TE) item formats from an interim assessment program. Findings indicate that the evidence-based selected response items within English Language Arts, and the select-and-order, equation-and-expression entry, and matching items within Mathematics, provided more information when compared to traditional selected response items.
Electronic Board #20
Exposure Control for Response Time-Informed Item Selection and Estimation in CAT
Justin Kern, Edison Choe and Hua-Hua Chang, University of Illinois at Urbana-Champaign
This study will investigate item exposure control while using response times (RTs) with item responses in CAT to minimize overall test-taking time. Items are selected by maximum information per time unit, as in Fan et al. (2012). Calculations use estimates of ability and speededness obtained via a joint-estimation MAP routine.

Electronic Board #21
Monitoring Item Drift Using Stochastic Process Control Charts
Hongwen Guo and Frederic Robin, ETS
In on-demand testing, test items have to be reused; however, their true characteristics may drift over time. This study links item drift to DIF analysis; SPC methods applied across a sequence of test administrations are used to detect item drift as early as possible.

Electronic Board #22
Reporting Subscores Using Different Multidimensional IRT Models in Sequencing Adaptive Testing
Jing-Ru Xu, Pearson VUE; Frank Rijmen, Association of American Medical Colleges
This research investigates the efficiency of reporting subscores in sequencing adaptive testing. It compares this new implementation with a general multidimensional CAT program. Different multidimensional models were fitted in different CAT simulation studies using PISA 2012 Math with four subdomains. It provides insights into score reporting in multidimensional CAT.

Electronic Board #23
Multidimensional IRT Model Estimation with Multivariate Non-Normal Latent Distributions
Tongyun Li and Liyang Mao, Educational Testing Service
The purpose of the present study is to investigate the robustness of multidimensional IRT model parameter estimation when the latent distribution is multivariate non-normal. A simulation study is proposed to evaluate the accuracy of item and person parameter estimates under different magnitudes of violation of the multivariate normality assumption.

Electronic Board #24
Stochastic Ordering of the Latent Trait Using the Composite Score
Feifei Li and Timothy Davey, Educational Testing Service
The purposes of this study are to investigate whether combining scores from monotonic items causes violations of SOL (stochastic ordering of the latent trait) in the empirical composite score function and to identify the factors that introduce violations of SOL when combining monotonic polytomous items.

Electronic Board #25
Establishing Critical Values for PARSCALE G2 Item Fit Statistics
Lixiong Gu and Ying Lu, Educational Testing Service
Research shows that the Type I error rate of the PARSCALE G2 statistic is inflated as test length decreases and sample size increases. This study develops a table of empirical critical values for a Type I error rate of 0.05 at different sample sizes that may help psychometricians flag misfitting items.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Renaissance East, Ballroom Level, Invited Session, D1
Assessing the Assessments: Measuring the Quality of New College- and Career-Ready Assessments
Morgan Polikoff, USC
Tony Alpert, Smarter Balanced
Bonnie Hain, PARCC
Brian Gill, Mathematica
Carrie Conaway, Massachusetts Department of Education
Donna Matovinovic, ACT
This panel presents results from two recent studies of the quality of new college- and career-ready assessments.
The first study uses a new methodology to evaluate the quality of PARCC, Smarter Balanced, ACT Aspire, and Massachusetts MCAS against the CCSSO Criteria for High Quality Assessment. After the presentation of the study and its findings, respondents from PARCC and Smarter Balanced will discuss the methodology and their thoughts on the most important dimensions against which new assessments should be evaluated. The second study investigates the predictive validity of PARCC and MCAS for predicting success in college. After the presentation of the study and its findings, respondents from the Massachusetts Department of Education will discuss the study and the state's needs regarding evidence to select and improve next-generation assessments. The overarching goal of the panel is to provoke discussion and debate about the best ways to evaluate the quality of new assessments in the college- and career-ready standards era.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Renaissance West A, Ballroom Level, Coordinated Session, D2
Some Psychometric Models for Learning Progressions
Session Chair: Mark Wilson, University of California, Berkeley
Session Discussant: Matthias von Davier, ETS
Learning progressions represent theories about the conceptual pathways that students follow when learning in a domain (NRC, 2006). One common type of representation is a multidimensional structure, with links between certain pairs of levels of the different dimensions (as predicted by, say, substantive theory and/or empirical findings). One illustration of such a complex hypothesis derives from an assessment development project in the area of statistical modeling for middle school students, the Assessing Data Modeling and Statistical Reasoning (ADM; Lehrer, Kim, Ayers & Wilson, 2014) project. In the ADM representation, vertical columns of boxes (such as CoS1, CoS2, ..., CoS4) represent the levels of each of the six dimensions of the learning progression. In addition to these "vertical" links between different levels of each construct, there are other links between levels of different constructs (such as the one from ToM6 to CoS3) that indicate an expectation (from theory and/or earlier empirical findings) that a student needs to succeed on the 6th level of the ToM dimension before they can be expected to succeed on the 3rd level of the CoS dimension. Putting it a bit more formally, we use a genre of representation that is structured as a multidimensional set of constructs: each construct has (1) several levels representing successive levels of sophistication in student understanding and (2) directional relations between individual levels of different constructs. We call the models used to analyze such a structure structured constructs models (SCMs; Wilson, 2009).
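To make the structure concrete, here is a minimal, illustrative sketch (not the ADM project's actual implementation, nor the SCM measurement model itself) of how such a learning-progression graph can be encoded: each construct carries a set of ordered levels, and directional cross-construct links encode expectations such as the ToM6-to-CoS3 link described above. The construct labels come from the session description; the data structure, level counts, and consistency check are assumptions made only for illustration.

```python
# Minimal sketch of a learning-progression structure (illustrative only):
# each construct has ordered levels, plus directional links saying
# "reaching level m of construct A is expected before level n of construct B."

from dataclasses import dataclass, field

@dataclass
class LearningProgression:
    # construct name -> number of ordered levels (1 = least sophisticated)
    constructs: dict[str, int]
    # ((source construct, source level), (target construct, target level))
    cross_links: list[tuple[tuple[str, int], tuple[str, int]]] = field(default_factory=list)

    def consistent(self, profile: dict[str, int]) -> bool:
        """Check whether a student's highest attained level on each construct
        respects the level ranges and every cross-construct expectation."""
        for name, level in profile.items():
            if not 0 <= level <= self.constructs.get(name, 0):
                return False
        for (src, src_level), (tgt, tgt_level) in self.cross_links:
            # If the target level has been reached, the source level is expected too.
            if profile.get(tgt, 0) >= tgt_level and profile.get(src, 0) < src_level:
                return False
        return True

# Toy version of the structure described above (level counts are assumed).
adm = LearningProgression(
    constructs={"CoS": 4, "ToM": 6},
    cross_links=[(("ToM", 6), ("CoS", 3))],  # succeed on ToM6 before CoS3
)

print(adm.consistent({"ToM": 6, "CoS": 3}))  # True: expectation satisfied
print(adm.consistent({"ToM": 4, "CoS": 3}))  # False: CoS3 reached without ToM6
```

An SCM then attaches an item response measurement model to a structure of this kind; the sketch only encodes the hypothesized links themselves, which is the part the session description spells out.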
Introduction to the Concept of a Structured Constructs Model (SCM)
Mark Wilson, University of California, Berkeley

Modeling Structured Constructs as Non-Symmetric Relations Between Ordinal Latent Variables
David Torres Irribarra, Pontificia Universidad Católica de Chile; Ronli Diakow (Brenda Loyd Dissertation Award Winner, 2015), New York City Department of Education

A Structured Constructs Model for Continuous Latent Traits with Discontinuity Parameters
In-Hee Choi, University of California, Berkeley

A Structured Constructs Model Based on Change-Point Analysis
Hyo Jeong Shin, ETS

Discussion of the Different Approaches to Using Item Response Models for SCMs
Mark Wilson, University of California, Berkeley

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Renaissance West B, Ballroom Level, Coordinated Session, D3
Multiple Perspectives on Promoting Assessment Literacy for Parents
Session Chair: Lauress Wise, Human Resources Research Organization (HumRRO)
The national dialogue on American education has become increasingly focused on assessment. There is a clear need for greater understanding about fundamental aspects of educational testing. Several organizations and individuals have undertaken concerted efforts to increase the assessment literacy of various audiences, including educators, policymakers, parents, and the general public. This coordinated session will focus on the efforts taken by three initiatives that include parents among their target audiences. NCME Past President Laurie Wise will introduce the session by discussing the need for initiatives that increase the assessment literacy of parents. NCME Board member Cindy Walker will discuss the ongoing efforts on behalf of NCME to develop and promote assessment literacy materials. Beth Rorick of the National Parent Teacher Association will discuss a national assessment literacy effort to educate parents on college- and career-ready standards and state assessments. Maria Donata Vasquez-Colina and John Morris of Florida Atlantic University will discuss outcomes and follow-up activities from focus groups with parents on assessment literacy. Presentations will be followed by group discussion (among both panelists and audience members) on ideas for coordinating multiple efforts to increase parents' assessment literacy.

NCME Assessment Literacy Initiative
Cindy Walker, University of Wisconsin - Milwaukee

NAEP Assessment Literacy Initiative
Beth Rorick, National Parent-Teacher Association

Lessons Learned from Parents on Assessment Literacy
Maria Donata Vasquez-Colina and John Morris, Florida Atlantic University

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 3, Meeting Room Level, Paper Session, D4
Equating Mixed-Format Tests
Session Discussant: Won-Chan Lee, University of Iowa

Classification Error Under Random Groups Equating Using Small Samples with Mixed-Format Tests
Ja Young Kim, ACT, Inc.
Few studies have investigated equating with small samples using mixed-format tests. The purpose of this study is to examine the impact of small samples and equating method on the misclassification of examinees based on where the passing scores are located, taking into account factors related to using mixed-format tests.
Sample Size Requirement for Trend Scoring in Mixed-Format Test Equating
Qing Yi and Yong He, ACT, Inc.; Hua Wei, Pearson
The purpose of this study is to investigate how many rescored responses are sufficient to adjust for differences in rater severity across test administrations in mixed-format test equating. Simulated data are used to study the sample size requirement for the trend scoring method with IRT equating.

Comparing IRT-Based and CTT-Based Pre-Equating in Mixed-Format Testing
Meichu Fan, Xin Li and YoungWoo Cho, ACT, Inc.
Pre-equating has tremendous appeal to test practitioners given the demand for immediate score reporting. IRT pre-equating research is readily applicable, but research on pre-equating using classical test theory (CTT), where only classical item statistics are available, is limited. This study compares various pre- and post-equating methods in mixed-format testing.

Equating Mixed-Format Tests Using Automated Essay Scoring (AES) System Scores
Süleyman Olgar, Florida Department of Education; Russell Almond, Florida State University
This study investigated the impact of using generic e-rater scores to equate mixed-format tests with MC items and an essay. The kappa and observed agreements were large and similar across six equating methods. The MC+e-rater equating outcomes are strong and, for some conditions, even better than the MC-only equating results.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 4, Meeting Room Level, Paper Session, D5
Standard Setting
Session Discussant: Susan Davis-Becker, Alpine Testing Solutions

Exploring the Influence of Judge Proficiency on Standard-Setting Judgments for Medical Examinations
Michael Peabody, American Board of Family Medicine; Stefanie Wind, University of Alabama
The purpose of this study is to explore the use of the Many-Facet Rasch model (Linacre, 1989) as a method for adjusting modified-Angoff standard setting ratings (Angoff, 1971) based on judges' subject area knowledge. Findings suggest differences in the severity and quality of standard-setting judgments across levels of judge proficiency.

Setting Cut Scores on the AP Seminar Course and Exam Components
Deanna Morgan and Priyank Patel, The College Board; Yang Zhao, University of Kansas
This paper documents a standard-setting study using the Performance Profile Method to determine recommended cut scores for examinees to be placed in each of the AP grade categories (1-5). The Subject Matter Experts used an ordered profile packet of students' performance and converged on recommended scores.

Interval Validation Method for Setting Achievement Level Standards for Computerized Adaptive Tests
William Insko and Stephen Murphy, Houghton Mifflin Harcourt
The Interval Validation Method for setting achievement level standards is specifically designed for assessments with large item pools, such as computerized adaptive tests. The method focuses judgments on intervals of similarly performing items presumed to contain a single cut score location. Validation of the interval sets the cut score.

The Use of Web 2.0 Tools in a Bookmark Standard Setting
Jennifer Lord-Bessen, McGraw Hill Education CTB; Ricardo Mercado, DRC; Adele Brandstrom, CTB
This study examines the use of interactive, collaborative Web tools in an onsite, online Bookmark Standard Setting workshop for a state assessment.
It explores the feasibility of this concept (addressing issues of security, user satisfaction, and cost) in a fully online standard setting with remote participants.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 5, Meeting Room Level, Paper Session, D6
Diagnostic Classification Models: Applications
Session Discussant: Jonathan Templin, University of Kansas

Assessing Students' Competencies Through Cognitive Diagnosis Models: Validity and Reliability Evidence
Miguel Sorrel, Julio Olea and Francisco Abad, Universidad Autónoma de Madrid; Jimmy de la Torre, Rutgers, The State University of New Jersey; Juan Barrada, Universidad de Zaragoza; David Aguado, Instituto de Ingeniería del Conocimiento; Filip Lievens, Ghent University
Cognitive diagnosis models can be applied to situational judgement tests to provide information about noncognitive factors, which currently are not included in selection procedures for admission to university. Reliable measures of study orientation (habits and attitudes), helping others, and generalized compliance were significantly related to grade point average.

Examining Effects of Pictorial Fraction Models on Student Test Responses
Angela Broaddus, Center for Educational Testing and Evaluation, University of Kansas; Meghan Sullivan, University of Kansas
The present study investigates the effects of aspects of visual fraction models on student test responses. Responses to 50 items assessing partitioning and identifying unit fractions were analyzed using diagnostic classification methods to provide insight into effective representations of early fraction knowledge.

Evaluation of Learning Map Structure Using Diagnostic Cognitive Modeling and Bayesian Networks
Feng Chen, Jonathan Templin and William Skorupski, The University of Kansas
The learning map underlying an assessment system should accurately specify the connections among nodes, as well as specify nodes at the appropriate level of granularity. This paper seeks to validate a learning map by combining real-data analyses and a simulation study to provide inferences for test development.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 12, Meeting Room Level, Paper Session, D7
Advances in IRT Modelling and Estimation
Session Discussant: Mark Hansen, UCLA

Estimation of Mixture IRT Models from Nonnormally Distributed Data
Tugba Karadavut and Allan S. Cohen, University of Georgia
Mixture IRT models generally assume standard normal ability distributions, but nonnormality is likely to occur in many achievement tests. Nonnormality has been shown to cause extraction of spurious latent classes. A skew t distribution, which has corrected the extraction of spurious latent classes in growth models, will be studied in this research.

Two-Tier Item Factor Models with Empirical Histograms as Nonnormal Latent Densities
Hyesuk Jang, American Institutes for Research; Ji Seung Yang, University of Maryland; Scott Monroe, University of Massachusetts
The purpose of this study is to investigate the effects of nonnormal latent densities in two-tier item factor models on parameter estimates and to propose an extended empirical histogram approach that allows an appropriate characterization of the nonnormal densities for two correlated general factors and unbiased parameter estimates.
Examining Performance of the MH-RM Algorithm with the 3PL Multilevel MIRT Model
Bozhidar Bashkov, American Board of Internal Medicine; Christine DeMars, James Madison University
This study examined the performance of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm (Cai, 2010b) in estimating 3PL multilevel multidimensional IRT (ML-MIRT) models. Item and person parameter recovery, as well as variances and covariances at different levels, were investigated under different combinations of number of dimensions, intraclass correlation levels, and sample sizes.

Expectation-Expectation-Maximization: A Feasible Mixture-Model-Based MLE Algorithm for the Three-Parameter Logistic Model
Chanjin Zheng, Jiangxi Normal University; Xiangbing Meng, Northeast Normal University
Stable MLE of item parameters under the 3PLM with a modest sample size remains a challenge. The current study presents a mixture-model approach to the 3PLM, based on which a feasible Expectation-Expectation-Maximization MLE algorithm is proposed. The simulation study indicates that EEM is comparable to Bayesian EM.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session: GSIC Graduate Student Poster Session, D8
Graduate Student Issues Committee
Brian Leventhal, Chair
Masha Bertling, Laine Bradshaw, Lisa Beymer, Evelyn Johnson, Ricardo Neito, Ray Reichenberg, Latisha Sternod, Dubravka Svetina

Electronic Board #1
Testing Two Alternatives to a Value-Added Model for Teacher Capability
Nicole Jess, Michigan State University
This study tests two alternatives to Value-Added Models (VAMs) for teacher capability: the Student Response Model (SRM) and the Multilevel Mixture Item Response Model (MMixIRM). We will compare the accuracy of estimation of teacher capability using these models under various conditions of class size, location of cut-score, and student assignment to teacher.

Electronic Board #2
Using Response Time in Cognitive Diagnosis Models
Nathan Minchen, Rutgers, The State University of New Jersey
No abstract submitted at time of printing

Electronic Board #3
An Exhaustive Search for Identifying Hierarchical Attribute Structure
Lokman Akbay, Rutgers, The State University of New Jersey
Specification of an incorrect hierarchical relationship between any two attributes can substantially degrade classification accuracy. As such, the importance of correctly identifying the hierarchical structure among attributes cannot be overemphasized. The primary objective of this study is to propose a procedure for identifying the most appropriate hierarchical structure for attributes.

Electronic Board #4
Performance of DIMTEST and Generalized Dimensionality Discrepancy Statistics for Assessing Unidimensionality
Ray Reichenberg, Arizona State University
The standardized generalized dimensionality discrepancy measure (SGDDM; Levy, Xu, Yel, & Svetina, 2015) was compared to DIMTEST in terms of their absolute and relative efficacy in assessing the unidimensionality assumption common in IRT under a variety of testing conditions (e.g., sample size/test length). Results and future research opportunities are discussed.

Electronic Board #5
Self-Directed Learning Oriented Assessments Without High Technologies
Jiahui Zhang, Michigan State University
Self-directed learning oriented assessments capitalize on the construction of assessment activities for optimal learning and for the cultivation of self-directed learning capacities.
This study aims to develop such an assessment combining the strengths of paper-pencil tests, CDM, and standard setting, which can be used by learners without high technologies.

Electronic Board #6
Vertical Scaling Under the Rasch Testlet Model
Mingcai Zhang, Michigan State University
Using the Rasch testlet model, scaling constants are estimated between three pairs of adjacent grades that are linked through anchor testlets. The simulated factors that impact the precision of scaling constant estimation include group mean difference, anchor testlet positions, and the magnitude of the testlet effect.

Electronic Board #7
The Effect of DIF on Group Invariance of IRT True Score Equating
Dasom Hwang, Yonsei University
Traditional methods for detecting DIF have been used for single-level data analysis. However, most data in education have a multilevel structure. This study investigates a more effective method under various conditions by comparing the statistical power and Type I error rates of adjusted methods based on the Mantel-Haenszel method and SIBTEST for multilevel data.

Electronic Board #8
Detecting Non-Fitting Items for the Testlet Response Model
Ryan Lynch, University of Wisconsin - Milwaukee
A Monte Carlo simulation will be conducted to evaluate the S-X2 item fit statistic. Findings indicate that the S-X2 may be a viable tool for evaluating item fit when the testlet effect is large, but results are mixed when the testlet effect is small.

Electronic Board #9
An Iterative Technique to Improve Test Cheating Detection Using the Omega Statistic
Hotaka Maeda, University of Wisconsin-Milwaukee
We propose an iterative technique to improve ability estimation for accused answer copiers. A Monte Carlo simulation showed that, by using the new ability estimate, the omega statistic had better controlled Type I error and increased power in all studied conditions, particularly when the source ability was high.

Electronic Board #10
Parameter Recovery in the Multidimensional Graded Response Item Response Theory Model
Shengyu Jiang, University of Minnesota
The multidimensional graded response model can be a useful tool in modeling ordered categorical test data for multiple latent traits. A simulation study is conducted to investigate the variables that might affect parameter recovery and to provide guidance for test construction and data collection in practical settings where the MGRM is applied.

Electronic Board #11
The Impact of Ignoring a Multilevel Structure in Mixture Item Response Models
Woo-yeol Lee, Vanderbilt University
Multilevel mixture item response models are widely discussed but infrequently used in education research. Because little research exists assessing when it is necessary to use such models, the current study investigated the consequences of ignoring a multilevel structure in mixture item response models via a simulation study.

Electronic Board #12
Determining the Diagnostic Properties of the Force Concept Inventory
Mary Norris, Virginia Tech
The Force Concept Inventory (FCI) is widely used to measure learning in introductory physics. Typically, instructors use the total score. Investigation suggests that the test is multidimensional. This study fits FCI data with cognitive diagnostic and bifactor models in order to provide a more detailed assessment of student skills.
Electronic Board #13 Understanding School Truancy: Risk-Need Latent Profiles of Adolescents Andrew Iverson, Washington State University Latent Profile Analysis was used to examine risk and needs profiles of adolescents in Washington State based on the WARNS assessment. Profiles were developed to aid understanding of behaviors associated with school truancy. Profiles were examined across student demographic variables (e.g., suspensions, arrests) to provide validity evidence for the profiles. Electronic Board #14 Utilizing Nonignorable Missing Data Information in Item Response Theory Daniel Lee, University of Maryland The purposes of this simulation study are to examine the effects of ignoring nonignorable missing data in item response models and evaluate the performance of model-based and imputation-based approaches (e.g., stochastic regression and Markov Chain Monte Carlo imputation) in parameter estimation to provide practical guidance to applied researchers. Electronic Board #15 Investigating IPD Amplification and Cancellation at the Testlet-Level on Model Parameter Estimation Rosalyn Bryant, University of Maryland College Park This study investigates the effect of item parameter drift (IPD) amplification or cancellation on model parameter estimation in a testlet-based linear test. Estimates will be compared between a 2-Parameter item response theory (IRT) model and a 2-Parameter testlet model varying magnitudes and patterns of IPD at item and testlet levels. Electronic Board #16 Measuring Reading Comprehension Through Automated Analysis of Students’ Small-Group Discussions Audra Kosh, University of North Carolina, Chapel Hill We present the development and initial validation of a computer-automated tool that measures elementary school students’ reading comprehension by analyzing transcripts of small-group discussions about texts. Students’ scores derived from the automated tool were a statistically significant predictor of scores on traditional multiple-choice and constructed-response reading comprehension tests. Electronic Board #17 Differential Item Functioning Among Students with Disabilities and English Language Learners Kevin Krost, University of Pittsburgh The presence of differential item functioning (DIF) was investigated on a statewide eighth grade mathematics assessment. Both students with disabilities and English language learners were focal groups, and several IRT and CTT methods were used and compared. Implications of results were discussed. 76 Washington, DC, USA Electronic Board #18 Extreme Response Style: Which Model is Best? Brian Leventhal, University of Pittsburgh More robust and rigorous psychometric models, such as IRT models, have been advocated for survey applications. However, item responses may be influenced by construct-irrelevant variance factors such as preferences for extreme response options. Through simulation methods, this study helps determine which model accounting for extreme response tendency is more appropriate. Electronic Board #19 Evaluating DIF Detection Procedure in the Context of the Mirid Isaac Li, University of South Florida The model with internal restriction on item difficulties (MIRID) is a componential Rasch model with unique betweenitem relationships, which pose challenges for psychometric studies like differential item functioning in its context. This empirical study compares and evaluates the suitability of four different DIF detection procedures for the MIRID. 
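Several posters in this session (e.g., Electronic Boards #17 and #19) evaluate differential item functioning procedures. For orientation only, the sketch below computes the classic Mantel-Haenszel common odds ratio and its ETS delta transformation from made-up stratified counts; the studies above compare more specialized procedures, and nothing here reproduces their methods.

```python
# Minimal Mantel-Haenszel DIF sketch; the 2x2 tables per ability stratum are hypothetical.
import numpy as np

# For each total-score stratum k: [right_ref, wrong_ref, right_focal, wrong_focal]
strata = np.array([
    [30, 20, 22, 28],
    [45, 15, 35, 25],
    [60,  8, 50, 18],
], dtype=float)

A, B, C, D = strata[:, 0], strata[:, 1], strata[:, 2], strata[:, 3]
N = strata.sum(axis=1)

# MH common odds ratio: sum(A*D/N) / sum(B*C/N); values far from 1 suggest DIF.
alpha_mh = np.sum(A * D / N) / np.sum(B * C / N)

# ETS delta scale: negative values indicate an item favoring the reference group.
delta_mh = -2.35 * np.log(alpha_mh)
print(round(alpha_mh, 3), round(delta_mh, 3))
```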
Electronic Board #20 Item Difficulty Modeling of Computer-Adaptive Reading Comprehension Items Using Explanatory IRT Models Yukie Toyama, UC Berkeley, Graduate School of Education This study investigated the effects of passage complexity and item type on difficulty of reading comprehension items for grades 2-12 students, using the Rasch latent regression linear logistic test model. Results indicated that it is text complexity, rather than item type, that explained the majority of variance in item difficulty. Electronic Board #21 Recovering the Item Model Structure from Automatically Generated Items Using Graph Theory Xinxin Zhang, University of Alberta We describe a methodology to recover the item models from generated items and present the results using a novel graph theory approach. We also demonstrate the methodology using generated items from the medical science domain. Our proposed methodology was found to be robust and generalizable. Electronic Board #22 The Impact of Item Difficulty on Diagnostic Classification Models Ren Liu, University of Florida Diagnostic classification models have been applied to non-diagnostic tests to partly meet the accountability demands for student improvement. The purpose of the study is to investigate the impact of item parameters (i.e. discrimination, difficulty, and guessing) on attribute classification when diagnostic classification models are applied to existing non-diagnostic tests. Electronic Board #23 Sensitivity to Multidimensionality of Mixture IRT Models Yoonsun Jang, University of Georgia Overextraction of latent classes is a concern when mixture IRT models are used in an exploratory approach. This study investigates whether some kinds of multidimensionality might result in overextraction of latent classes. A simulation study and an empirical example are presented to explain this effect. 77 2016 Annual Meeting & Training Sessions Electronic Board #24 Monte Carlo Methods for Approximating Optimal Item Selection in CAT Tianyu Wang, University of Illinois Monte Carlo techniques for item selection in an adaptive sequence are explored as a method for determining how to minimize mean squared error of ability estimation in CAT. Algorithms are developed to trim away candidate items as the test length increases, and connections to the Maximum Information criterion are studied. Electronic Board #25 The Relationship Between Q-Matrix Loading, Item Usage, and Estimation Precision in Cd-Cat Susu Zhang, University of Illinois at Urbana-Champaign The current project explores the relationship between items’ Q-matrix loadings and their exposure rate in cognitive diagnostic computerized adaptive tests, under various information-based item selection algorithms. In addition, the consequences of selecting certain high-information items loading on a large number of attributes on estimation accuracy will be examined. 78 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Renaissance East, Ballroom Level, Coordinated Session, E1 Do Large Scale Performance Assessments Influence Classroom Instruction? Evidence from the Consortia Session Discussant: Suzanne Lane, University of Pittsburgh Each of the six major statewide assessment consortia created logic models to explicate their theories of action for including performance assessment components in their summative and formative assessment designs. 
In this session, we will focus on the theory of action hypothesis that including performance assessment components will lead to desired changes in classroom teaching activities and student learning. This hypothesis echoes similar theories in the statewide performance assessment movements of the 1990s (e.g., Davey, Ferrara, Shavelson, Holland, Webb, & Wise, 2015, p. 5; Lane & Stone, 2006, p. 389). Lane and colleagues conducted consequential validity studies to examine this hypothesis and found modest positive results (e.g., Darling-Hammond & Adamson, 2010; Lane, Parke, & Stone, 2002; Parke, Lane, & Stone, 2006; Stone & Lane, 2003). Other researchers reported worrisome unintended consequences (e.g., Koretz, Mitchell, Barron, & Keith, 1996). In a 2015 NCME coordinated session, several consortia reported on validity arguments and supporting evidence for the performance assessment components of their programs. The session discussant made the observation that “The idea that PA will drive improvements in teaching is [the] most suspect part of [the theory of action]; more research needed.” That was a call for studies of impacts on teaching activities and student learning in the classroom. This session is a response to that call. This session is a continuation of ongoing examinations of performance assessment in statewide assessment programs that follows from well attended sessions in the 2013, 2014, and 2015 NCME meetings. The session is somewhat innovative in that we have included five of the six major statewide assessment consortia, with the goal of creating a comprehensive summary on this topic. A discussant will synthesize the evidence provided by the presenters and evaluate the consortia’s hypothesis about performance assessment and widely held beliefs about how performance assessment can influence curriculum development, teaching, and learning. Performance assessment has re-emerged as a widely used assessment tool in large scale assessment programs and in classroom formative assessment practices. Developments in validation theory (e.g., Kane, 2013) have placed claims and evidence in the center of test score interpretation and use arguments—in this case, claims about test use arguments. The convergence of these two forces requires us to (a) explicate our rationales for using specific assessment tools for specific purposes and about intended claims and inferences, and (b) investigate the plausibility of these rationales and claims. The papers in this session will explicate the consortium rationales for including performance assessment in their designs and provide new evidence of the supportability of their rationales. Smarter Balanced Assessment Consortium Marty McCall, Smarter Balanced Assessment Consortium Dynamic Learning Maps Marianne Perie and Meagan Karvonen, CETE University of Kansas NCSC Assessment Consortium Ellen Forte, edCount Elpa21 Assessment Consortium Kenji Hakuta, Stanford University; Phoebe Winter, Independent Consultant WIDA Consortium Dorry Kenyon and Meg Montee, Center for Applied Linguistics 79 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 6:05 PM, Renaissance West A, Ballroom Level, Contributed Session, E2 Applications of Latent Regression to Modeling Student Achievement, Growth, and Educator Effectiveness Session Chair: J.R. 
Lockwood, Educational Testing Service Session Discussant: Matthew Johnson, Columbia University There are both research and policy demands for making increasingly ambitious inferences about student achievement, achievement growth and educator effectiveness using longitudinal educational data. For example, test score data are now used routinely to make inferences about achievement growth through Student Growth Percentiles (SGP), as well as inferences about the effectiveness of schools and teachers. A common concern in these applications is that inferences may have both random and systematic errors resulting from limitations of the achievement measures, limitations of the available data, and/or failure of statistical modeling assumptions. This session will present four diverse applications in which the accuracy of standard approaches to the estimation problems can be improved, or their validity tested, through latent regression modeling. “Latent regression” refers to statistical models involving the regression of unobserved variables on observed covariates (von Davier & Sinharay, 2010). For example, the National Assessment of Educational Progress uses regression of latent achievement constructs on student background and grouping variables to improve the value of the reported results for secondary analysis (Mislevy, Johnson, & Muraki, 1992). The increasing availability of methods and software for fitting latent regression models provides unprecedented opportunities for using them to improve inferences about quantities now being demanded from educational data. Using the Fay-Herriot Model to Improve Inferences from Coarsened Proficiency Data Benjamin Shear, Stanford University; Katherine Furgol Castellano and J.R. Lockwood, Educational Testing Service Estimating True SGP Distributions Using Multidimensional Item Response Models and Latent Regression Katherine Furgol Castellano and J.R. Lockwood, Educational Testing Service Testing Student-Teacher Selection Mechanisms Using Item Response Data J.R. Lockwood, Daniel McCaffrey, Elizabeth Stone and Katherine Furgol Castellano, Educational Testing Service; Charles Iaconangelo, Rutgers University Adjusting for Covariate Measurement Error When Estimating Weights to Balance Nonequivalent Groups Daniel McCaffrey, J.R. Lockwood, Shelby Haberman and Lili Yao, Educational Testing Service 80 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Renaissance West B, Ballroom Level, Coordinated Session, E3 Jail Terms for Falsifying Test Scores: Yes, No or Uncertain? Session Moderator: Wayne Camara, ACT Session Debaters: Mike Bunch, Measurement Incorporated; S E Phillips, Assessment Law Consultant; Mike Beck, Testing Consultant; Rachel Schoenig, ACT Far too many testing programs have recently faced public embarrassment and loss of credibility due to wellorganized schemes by educators to fraudulently inflate test scores over extended periods of time. Even testing programs with good prevention, detection and investigation strategies are frustrated because consequences such as score invalidation or loss of a license or credential seem not to be sufficient consequences to deter organized efforts to falsify test scores. The pecuniary gains, job security and recognition from falsified scores have appeared to outweigh the deterrence effect of existing penalties. This situation led a prosecutor in Atlanta, Georgia to employ a novel strategy to impose serious consequences on educators who conspired to fraudulently inflate student test scores. 
An extensive, external investigation triggered by excessive erasures and phenomenal test score improvements over ten years had implicated a total of 178 educators, 82 of whom had confessed and resigned, were fired or lost their teaching licenses at administrative hearings. In 2013, a grand jury indicted 35 of the remaining educators, including the alleged leader of the conspiracy, Superintendent Beverly Hall, for violation of a state Racketeer Influenced and Corrupt Organizations (RICO) statute. The RICO statute was originally designed to punish mafia organized crime, but the prosecutor argued that the cover-ups, intimidation and collusion involved in the organized activity of changing students’ answers on annual tests constituted a criminal enterprise. He further argued that this criminal enterprise obscured the academic deficiencies and shortchanged the education of poor minority students. Superintendent Hall, who denied the charges but faced a possible sentence of up to 45 years in jail, died of breast cancer shortly before the trial began. Twelve of the indicted educators who refused a plea bargain went to trial and 11 were convicted. The lone defendant who was acquitted was a special education teacher who had administered tests to students with disabilities. In April 2015, amid pleas for leniency and with an acknowledgement that the students whose achievements were misrepresented were the real victims, the trial judge handed down unexpected and stiff punishments that included jail terms for 8 of the convicted Atlanta educators. After refusing an opportunity to avoid jail time by admitting their crimes in open court and foregoing their rights to appeal, they were sentenced to jail terms of 1 to 7 years. Two of the remaining convicted educators, a testing coordinator and a teacher, accepted sentencing deals in which they received 6 months of weekends in jail and one year of home confinement, respectively. After having been held in the county jail for two weeks following their convictions, the judge released the sentenced educators on bond pending appeal. About two weeks later and consistent with the prosecutor’s original recommendations, the same judge reduced the jail time from 7 years to 3 years for the three administrators who had received the longest sentences. Despite these reductions, the sentencing of educators to multiyear jail terms for conspiring to falsify test scores remained unprecedented and controversial. Although measurement specialists may focus mainly on threats to test score validity and view invalidation of scores as the most appropriate consequence for violations of test security rules, the exposure of educator conspiracies in Atlanta and a number of other districts nationally suggests that more severe penalties may be needed to deter such violations and ensure test score validity. Measurement specialists are likely to be part of the conversations with state testing programs considering alternative consequences and will be better able to participate responsibly if they are fully informed about the competing arguments for and against penalizing egregious test security violations with jail time. 
81 2016 Annual Meeting & Training Sessions Thus, the dual purposes of this symposium are to (1) conduct a debate to illuminate the arguments and evidence in favor of and against jail time for educators who conspire to falsify student test scores, and (2) to provide audience members with an opportunity to discuss and vote on a model statute specifying penalties for conspiracy to falsify student test scores. The model statute also includes an alternative for avoiding jail time similar to that offered to the convicted Atlanta educators by the trial judge prior to sentencing. A debate format was chosen for this symposium to present a fair and balanced discussion so audience members can draw their own conclusions. The opportunity to hear arguments on both sides and to consider the issues from multiple perspectives should provide audience members with insights and evidence that can be shared with states considering alternative consequences for violations of test security rules. 82 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 3, Meeting Room Level, Paper Session, E4 Test Design and Construction Session Discussant: Chad Buckendahl Potential Impact of Section Order on an Internet Based Admissions Test Scoring Naomi Gafni and Michal Baumer, National Institute for Testing & Evaluation Meimad is an internet based admissions test consisting of eight multiple choice sections. One out of every seven test forms is randomly selected and the eight test sections in it are presented to examinees in a random order. The study examines the effect of section position on performance level. Automated Test-Form Generation with Constraint Programming (cp) Jie Li and Wim van der Linden, McGraw-Hill Education Constraint programming (CP) is used to optimally solve automated test-form generation problems. The modeling and solution process is demonstrated for two empirical examples: (i) generation of a fixed test form with optimal item ordering; and (ii) real-time ordering of items in the shadow tests in CAT. An Item-Matching Heuristic Method for a Complex Multiple Forms Test Assembly Problem Pei-Hua Chen and Cheng-Yi Huang, National Chiao Tung University An item matching approach for a complex test specification problem was proposed and compared with the integer linear programming method. The purpose of this study is to extend the item matching method to test with complex non-psychometric constraints such as set-based items, variable set length, and nested content constraints. The Effect of Foil-Reordering and Minor Editorial Revisions on Item Performance Tingting Chen, Yu-Lan Su and Jui-Sheng Wang, ACT, Inc. This study investigates how foil-reordering, and minor reformatting and rewording affect item difficulty, discrimination and other statistics for multiple-choice items using empirical data. Comparative and correlational analyses were conducted across administrations. The results indicated a significant impact on item difficulty and key selection distributions for foil-reordering and rewording. Is Pre-Calibration Possible? a Conceptual Aig Framework, Model, and Empirical Investigation Shauna Sweet, University of Maryland, College Park; Mark Gierl, University of Alberta While automatic item generation is technologically feasible, a conceptual architecture supporting the evaluation of these generative processes is needed. 
This study details such a framework and empirically examines the performance of a new multi-level model intended for pre-calibration of automatically generated items and evaluation of the generation process. Award Session: NCME Annual Award: Mark Gierl & Hollis Lai 83 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, E5 Tablet Use in Assessment Session Discussant: Walter Way, Pearson Use of tablet devices in the classroom continues to increase as Bring Your Own Device (BYOD), 1:1 technology programs, and flipped learning change the way students consume academic content, interact with their teachers and peers, and demonstrate their mastery of academic knowledge and skills. In addition, many K-12 assessment programs (e.g. NAEP, PARCC, SBAC, etc.) now or will soon allow administration of assessments using tablets. To assure the validity and reliability of test scores it is incumbent upon test developers to evaluate the potential impact of digital devices prior to their use within assessment. This session will explore various facets of the use of tablets within educational assessment and will include presentation of a set of five papers on this topic. The papers will utilize both qualitative and quantitative methods for evaluating tablet use and will evaluate impacts for different student sub-groups and special populations as well as tablet applications for both testing and scoring. Improving Measurement Through Usability Nicholas Cottrell, Fulcrum Using Tablet Technology to Develop Learning-Oriented English Language Assessment for English Learners Alexis Lopez, Jonathan Schmigdall, Ian Blood and Jennifer Wain, ETS Device Comparability: Score Range & Subgroup Analyses Laurie Davis, Yuanyuan McBride and Xiaojing Kong, Pearson; Kristin Morrison, Georgia Institute of Technology Response Time Differences Between Computers and Tablets Xiaojing Kong, Laurie Davis and Yuanyuan McBride, Pearson Scoring Essays on an iPad Guangming Ling, Jean Williams, Sue O’Brien and Carlos Cavalie, ETS 84 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, E6 Topics in Multistage and Adaptive Testing Session Discussant: Jonathan Rubright, AICPA A Top-Down Approach to Designing a Computerized Multistage Test Xiao Luo, Doyoung Kim and Ada Woo, National Council of State Boards of Nursing The success of a computerized multistage test (MST) relies on a meticulous test design. This study introduces a new route-based top-down approach to designing MST, which imposes constraints and objectives upon routes and algorithmically searches for an optimal assembly of modules. This method simplifies and expedites the design of MST. Comparison of Non-Parametric Routing Methods with IRT in Multistage Testing Design Evgeniya Reshetnyak, Fordham University; Alina von Davier, Charles Lewis and Duanli Yan, ETS The goal of proposed study is to compare performance of non-parametric methods and machine learning techniques with traditional IRT methods for routing test takers in an adaptive multistage test design using operational and simulated data. A Modified Procedure in Applying Cats to Allow Unrestricted Answer Changing Zhongmin Cui, Chunyan Liu, Yong He and Hanwei Chen, ACT, Inc. Computerized adaptive testing with salt (CATS) has been shown to be robust to test-taking strategies (e.g., Wainer, 1993) in a reviewable CAT. 
The robustness, however, is gained at the expense of test efficiency loss. We propose an innovative modification such that the modified CATS is both robust and efficient. The Expected Likelihood Ratio in Computerized Classification Testing Steven Nydick, Pearson VUE This simulation study compares the classification accuracy and expected test length of the expected likelihood ratio (ELR; Nydick, 2014) item selection algorithm to alternative algorithms in SPRT-based computerized classification testing (CCT). Results will help practitioners determine the most efficient method of item selection given a particular CCT stopping rule. A Comparison of the Pretest Item Calibration Procedures in CAT Xia Mao, Pearson This study compares four procedures for calibrating pretest items in CAT using both real data and simulated data by manipulating the pretest item cluster length, calibration sample features and calibration sample sizes. The results will provide guidance for pretest item calibration in large-scale CAT in K–12 contexts. Pretest Item Selection and Calibration Under Computerized Adaptive Testing Shichao Wang, The University of Iowa; Chunyan Liu, ACT, Inc. Pretest item calibration plays an important role in maintaining item pools under computerized adaptive testing. This study aims to compare and evaluate five pretest item selection methods in item parameter estimation using various calibration procedures. The practical significance of these methods is also discussed. Using Off-Grade Items in Adaptive Testing —A Differential Item Functioning Approach Shuqin Tao and Daniel Mix, Curriculum Associates This study is intended to assess the appropriateness of using off-grade items in adaptive testing from a differential item functioning (DIF) approach. Data came from an adaptive assessment administered to school districts nationwide. Insights gained will help develop item selection strategies in adaptive algorithm to select appropriate off-grade items. 85 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 12, Meeting Room Level, Paper Session, E7 Cognitive Diagnosis Models: Exploration and Evaluation Session Discussant: Laine Bradshaw, University of Georgia Bayesian Inferences of Q-Matrix with Presence of Anchor Items Xiang Liu, Young-Sun Lee and Yihan Zhao, Teachers College, Columbia University Anchor items are usually included in multiple administrations of same assessment. Attribute specifications and item parameters can be obtained for these items from previous analyses. We propose a Bayesian method for estimating Q-matrix with presence of partial knowledge. Simulation demonstrates its effectiveness. TIMSS 2003 and 2007 data are then analyzed. An Exploratory Approach to the Q-Matrix Via Bayesian Estimation Lawrence DeCarlo, Teachers College, Columbia University An exploratory approach to determining the Q-matrix in cognitive diagnostic models is presented. All elements are specified as being uncertain, with respect to inclusion, and posteriors from a Bayesian analysis are used for selection. Simulations show that the approach gives high rates of correct element recovery, typically over 90%. Parametric or Nonparametric—Evaluating Q-Matrix Refinement Methods for Dina and Dino Models Yi-Fang Wu, University of Iowa; Hueying Tzou, National University of Tainan Two model-based and one model-free statistical Q-matrix refinement methods are evaluated and compared against one another. 
Large-scope simulations are used to study their q-vector recovery rates and the correct rates of examinee classification. The three most recent methods are also applied to real data for identifying and correcting misspecified q-entries. Comparing Attribute-Level Reliability Estimation Methods in Diagnostic Assessments Chunmei Zheng and Yuehmei Chien, Pearson; Ning Yan, Independent Consultant Diagnostic classification models have drawn much attention from practitioners due to their promising use in aligning teaching, learning, and assessment. However, attribute classification reliability has received little investigation. The purpose of this study, therefore, is to conduct a comparison of existing reliability estimation methods. Estimation of Diagnostic Classification Models Without Constraints: Issues with Class Label Switching Hongling Lao and Jonathan Templin, University of Kansas Diagnostic classification models (DCMs) may suffer from the latent class label switching issue, providing misleading results. A simulation study is proposed to investigate (1) the prevalence of the label switching issue in different DCMs, and (2) the effectiveness of constraints at preventing label switching from happening. Conditions Impacting Parameter and Profile Recovery Under the NIDA Model Yanyan Fu, Jonathan Rollins and Robert Henson, UNCG The NIDA model was studied under various conditions. Results indicated that sample size did not affect attribute parameter recovery and marginal CCRs (mCCRs). However, the number of attributes and items influenced the mCCRs. Data generated from the RUM and NIDA models yielded similar mCCRs when estimated using the NIDA model. 86 Washington, DC, USA Sequential Detection of Learning Multiple Skills in Cognitive Diagnosis Sangbeak Ye, University of Illinois at Urbana-Champaign Cognitive diagnosis models aim to identify examinees' mastery or non-mastery of a vector of skills. In an e-learning environment where a set of skills is trained until mastery, a proper detection method to determine the presence of the skills is vital. We introduce techniques to detect change-points of multiple skills. 87 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 5:35 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, E8 Electronic Board #1 Response Styles Adjustments in Cross-Cultural Data Using the Mixture PCM IRT Model Bruce Austin, Brian French and Olusola Adesope, Washington State University Response styles can contribute irrelevant variance to rating-scale items and compromise cross-cultural comparisons. Rasch IRT models were used to identify response styles and adjust data after identifying latent classes based on response styles. Predictive models were improved with adjusted data. We conclude with recommendations for identifying response style classes. Electronic Board #2 Using Differential Item Functioning to Test for Inter-Rater Reliability in Educational Testing Sakine Gocer Sahin, Hacettepe University; Cindy M. Walker, University of Wisconsin-Milwaukee Although multiple-choice items can be more reliable, the information obtained from open-ended items is sometimes greater, and better aligned, than that from multiple-choice items. This is only true if the raters are unbiased. The purpose of this research was to investigate an alternative measure of inter-rater reliability based on IRT.
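The last poster above proposes an IRT/DIF-based alternative to conventional inter-rater reliability indices. As a point of reference only, the sketch below computes a conventional agreement index (Cohen's kappa) for two raters on hypothetical ratings; it is not the authors' proposed measure.

```python
# Conventional inter-rater agreement (Cohen's kappa) for two raters; a baseline
# against which an IRT/DIF-based index might be compared. Ratings are hypothetical.
import numpy as np

rater1 = np.array([0, 1, 2, 2, 1, 0, 2, 1])
rater2 = np.array([0, 1, 2, 1, 1, 0, 2, 2])

categories = np.unique(np.concatenate([rater1, rater2]))

p_observed = np.mean(rater1 == rater2)
# Expected agreement under independence of the two raters' marginal distributions.
p_expected = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 3))
```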
Electronic Board #3 Incorporating Expert Priors in Estimation of Bayesian Networks for Computer Interactive Tasks Johnny Lin, University of California, Los Angeles; Hongwen Guo, Helena Jia, Jung Aa Moon and Janet Koster van Groos, Educational Testing Service Due to the cost of item development in computer interactive tasks, the amount of evidence available for estimation is reduced. In order to minimize instability, we show how expert priors can be incorporated into Bayesian Networks by performing a smoothing transformation to obtain posterior estimates. Electronic Board #4 A Multidimensional Rater Effects Model Richard Schwarz, ETS; Lihua Yao, DMDC One approach for evaluating rater effects is to add an explicit rater parameter to a polytomous IRT model, yielding what is called a rater effects model. A multidimensional rater effects model is proposed. Using MCMC techniques and simulation, specifications for priors, the posterior distributions, and estimation of the model will be described. Electronic Board #5 Exploring Clinical Diagnosis Process Data with Cluster Analysis and Sequence Mining Feiming Li and Frank Papa, University of North Texas Health Science Center This study collected clinical diagnosis process data from a diagnosis task performed by medical students in a computer-based environment. The study aimed to identify attributes of data-gathering behaviors that predict diagnostic accuracy and to conduct cluster analysis and sequence mining to explore meaningful attribute or sequential patterns explaining the success or failure of diagnosis. 88 Washington, DC, USA Electronic Board #6 Validity Evidence for a Writing Assessment for Students with Significant Cognitive Disabilities Russell Swinburne Romine, Meagan Karvonen and Michelle Shipman, University of Kansas Sources of evidence for a validity argument are presented for the writing assessment in the Dynamic Learning Maps Alternate Assessment System. Methods included teacher surveys, test administration observations, and a new cognitive lab protocol in which test administrators participated in a think-aloud during administration of a practice assessment. Electronic Board #7 The Implications of Reduced Testing for Teacher Accountability Jessica Alzen, School of Education, University of Colorado Boulder; Erin Fahle and Benjamin Domingue, Graduate School of Education, Stanford University The present student testing burden is substantial, and interest in alternative scenarios with reduced testing but persistent accountability measures has grown. This study focuses on VA estimates in the presence of structural missingness of test data consistent with alternative scenarios designed to reduce the student testing burden. Electronic Board #8 Examination of the Constructs Assessed by Published Tests of Critical Thinking Jennifer Kobrin, Edynn Sato, Emily Lai and Johanna Weegar, Pearson We used a principled approach to define the construct of critical thinking and examined the degree to which existing tests are aligned to the construct. Our findings suggest that existing tests tend to focus on a narrow set of skills and identify gaps that offer opportunities for future assessment development. Electronic Board #9 The False Discovery Rate Applied to Large-Scale Testing Security Screenings Tanesia Beverly, University of Connecticut; Peter Pashley, Law School Admission Council When statistical tests are conducted repeatedly to detect test fraud (e.g., copying), the overall false-positive rate should be controlled.
Three approaches to adjusting significance levels were investigated with simulated and real data. A procedure for controlling the false discovery rate by Benjamini and Hochberg (1995) yielded the best results. Electronic Board #10 The Impact of Ignoring Multiple-Group Structure in Testlet-Based Tests on Ability Estimation Ming Li, Hong Jiao and Robert Lissitz, University of Maryland The study investigates the impact of ignoring the multi-group structure on ability estimation in testlet-based tests. In a simulation, model parameter estimates from three IRT models: a standard 2PL model, and a multiple-group 2PL model with or without testlet effects are compared and evaluated in terms of estimation errors. Electronic Board #11 Reconceptualising Validity Incorporating Evidence of User Interpretation Timothy O’Leary, University of Melbourne; John Hattie and Patrick Griffin, Melbourne University Validity is a fundamental consideration in test development. A recent conception introduced user validity focused upon the accuracy and effectiveness of interpretations resulting from test score reports. This paper proposes a reconceptualization of validity incorporating evidence of user interpretations and a method for the collection of such evidence. 89 2016 Annual Meeting & Training Sessions Electronic Board #12 Single and Double Linking Designs Accessed by Population Invariance Yan Huo and Sooyeon Kim, Educational Testing Service The purpose of this study is to determine whether double linking is more effective than single linking in terms of achieving subpopulation invariance on scoring. When double-linking was applied, the conversions derived from two subgroups different in geographic regions were more comparable to the conversion derived from the total group. Electronic Board #13 Equating Mixed-Format Tests Using a Simple-Structure MIRT Model Under a Cineg Design Jiwon Choi, ACT/University of Iowa; Won-Chan Lee, University of Iowa This study applies the SS-MIRT observed score equating procedure for mixed-format tests under the CINEG design. Also, the study compares various scale linking methods for SS-MIRT equating. The results show that the SS-MIRT approach provides more accurate equating results than the UIRT and traditional equipercentile methods. Electronic Board #14 Pre-Equating or Post-Equating? Impact of Item Parameter Drift Wenchao Ma, Rutgers, The State University of New Jersey; Hao Song, National Board of Osteopathic Medical Examiners This study, using a real-data-based simulation, examines whether item parameter drift (IPD) influences pre-equating and post-equating. Accuracy of ability estimates and classifications are evaluated under varied conditions of IPD direction, magnitude, and proportion of items with IPD. Recommendation is made on which equating method is preferred under different IPD conditions. Electronic Board #15 A Comparative Study on Fixed Item Parameter Calibration Methods Keyu Chen and Catherine Welch, University of Iowa This study provides a description of implementing fixed item parameter method in BILOG-MG as well as a comparison of three fixed item parameter calibration methods when calibrating field test items on the scale of operational items. A simulation study will be conducted to compare results of the three methods. 
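The false discovery rate poster above (Electronic Board #9) refers to the Benjamini and Hochberg (1995) procedure. A minimal sketch of that step-up procedure, applied to hypothetical p-values, is shown below; the poster's actual comparisons involve additional adjustment methods not reproduced here.

```python
# Benjamini-Hochberg step-up procedure; p-values and the FDR level q are hypothetical.
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array flagging tests rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])     # largest rank i with p_(i) <= i*q/m
        rejected[order[: k + 1]] = True      # reject all p-values up to that rank
    return rejected

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.6]))  # flags the smallest p-values
```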
Electronic Board #16 Examining Various Weighting Schemes' Effect on Examinee Classification Using a Test Battery Qing Xie, ACT/The University of Iowa; Yi-Fang Wu, Rongchun Zhu and Xiaohong Gao, ACT, Inc. The purpose of this study is to examine the effect of various weighting schemes on classifying examinees into multiple categories. The results will provide practical guidelines for using either profile scores or a composite score for examinee classification in a test battery. Electronic Board #17 Module Assembly for Logistic Positive Exponent Model-Based Multistage Adaptive Testing Thales Ricarte and Mariana Cúri, Institute of Mathematical and Computer Sciences (ICMC-USP); Alina von Davier, Educational Testing Service (ETS) In multistage adaptive testing (MST) based on item response theory models, modules are assembled by optimizing an objective function via linear programming. In this project, we analyzed MST based on the Logistic Positive Exponent model for testlet performance, using Fisher information, Kullback-Leibler information, and the Continuous Entropy Method as objective functions. Electronic Board #18 Online Calibration Pretest Item Selection Design Rui Guo and Hua-hua Chang, University of Illinois at Urbana-Champaign Pretest item calibration is crucial in multidimensional computerized adaptive testing. This study proposed an online calibration pretest item selection design named the four-quadrant D-optimal design with a proportional density index algorithm. Simulation results showed that the proposed method provides good item calibration efficiency. Electronic Board #19 Online Multistage Intelligent Selection Method for CD-CAT Fen Luo, Shuliang Ding, Xiaoqing Wang and Jianhua Xiong, Jiangxi Normal University A new item selection method, the online multistage intelligent selection method (OMISM), is proposed. Simulation results show that for OMISM, the pattern match ratio of the knowledge state is higher than that for the posterior-weighted Kullback-Leibler information selection method in CD-CAT when examinees have mastered multiple attributes. Electronic Board #20 Data-Driven Simulations of False Positive Rates for Compound DIF Inference Rules Quinn Lathrop, Northwest Evaluation Association Understanding how inference rules function under the null hypothesis is critical. This proposal presents a data-driven simulation method to determine the false positive rate of tests for DIF. The method does not assume a functional form of the item characteristic curves and also replicates impact from empirical data. Electronic Board #21 Simultaneous Evaluation of DIF and Its Sources Using Hierarchical Explanatory Models William Skorupski, Jennifer Brussow and Jessica Loughran, University of Kansas This study uses item-level features as explanatory variables for understanding DIF. Two approaches for DIF identification/explanation are compared: 1) two-stage DIF + regression, and 2) a simultaneous, hierarchical approach. Realistic data were simulated by varying the strength of the relationship between DIF and explanatory variables and reference/focal group sample sizes. Electronic Board #22 Comparing Imputation Methods for Trait Estimation Using the Rating Scale Model Christopher Runyon, Rose Stafford, Jodi Casabianaca and Barbara Dodd, The University of Texas at Austin This research investigates trait level estimation under the rating scale model using three imputation methods of handling missing data: (a) multiple imputation, (b) nearest-neighbor hot deck imputation, and (c) multiple hot deck imputation.
We compare the performance of these methods for three levels of missingness crossed with three scale lengths. Electronic Board #23 The Nonparametric Method to Analyze Multiple-Choice Items: Using the Hamming Distance Method Shibei Xiang, Wei Tian and Tao Xin, National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality Many data in education are in the form of multiple-choice (MC) items that are scored as dichotomous data. In order to obtain information from incorrect answers, we expand the Q-matrix to item options and use a nonparametric Hamming distance method to classify examinees; the approach can be used even with small sample sizes. 91 2016 Annual Meeting & Training Sessions Electronic Board #24 Automatic Scoring System for a Short Answer in Korean Large Scale Assessment EunYoung Lim, Eunhee Noh and Kyunghee Sung, Korean Institute for Curriculum and Evaluation The purpose of this study is to evaluate a prototype of the Korean automatic scoring system (KASS) for short answers and to explore the related features of KASS to improve the accuracy of automatic scoring. 92 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 7:00 PM, Convention Center, Level Two, Room 202 A The Life and Contributions of Robert L. Linn, Followed by a Reception Note: NCME is partnering with AERA to record this session. We will make this recording available to all NCME members, including those who have to miss this tribute for presentations and attendance at NCME sessions. 93 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 6:30 PM - 8:00 PM, Grand Ballroom South, Ballroom Level NCME and AERA Division D Joint Reception National Council on Measurement in Education and AERA Division D Welcome Reception for Current and New Members 94 Washington, DC, USA Annual Meeting Program - Sunday, April 10, 2016 95 2016 Annual Meeting & Training Sessions 96 Washington, DC, USA Sunday, April 10, 2016 8:00 AM - 9:00 AM, Marquis Salon 6, Marriott Marquis Hotel 2016 NCME Breakfast and Business Meeting (Ticketed Event) Join your friends and colleagues at the NCME Breakfast and Business Meeting at the Marriott Marquis Hotel. Theater-style seating will be available for those who did not purchase a breakfast ticket but wish to attend the Business Meeting. 97 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 9:00 AM - 9:40 AM, Marquis Salon 6, Marriott Marquis Hotel Presidential Address: Education and the Measurement of Behavioral Change Rich Patz, ACT, Iowa City, IA 98 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Renaissance East, Ballroom Level, Invited Session, F1 Award Session Career Award: Do Educational Assessments Yield Achievement Measurements? Winner: Mark Reckase Session Moderator: Kadriye Ercikan, University of British Columbia Because my original training in measurement/psychometrics was in psychology rather than education, I have noted the difference in approaches taken for the development of tests in those two disciplines. One begins with the concepts of a hypothetical construct and locating persons along a continuum, and the other begins with the definition of a domain of content and works to estimate the amount of the domain that a person has acquired. This presentation will address whether these two conceptions of test development are consistent with each other and with the assumptions of the IRT models that are often used to analyze the test results.
It will also address how test results are interpreted and whether those interpretations are consistent with the measurement model and the test design. Finally, there is a discussion of how users of test results would like to interpret results, and whether measurement experts can produce tests and analysis procedures that will support the desired interpretations. 99 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 10:35 AM - 12:05 PM, Renaissance West A, Ballroom Level, Invited Session, F2 Debate: Should the NAEP Mathematics Framework Be Revised to Align with the Common Core State Standards? Session Presenters: Michael Cohen, Achieve; Chester Finn, Fordham Institute Session Moderators: Bill Bushaw, National Assessment Governing Board; Terry Mazany, Chicago Community Trust The 2015 National Assessment of Educational Progress (NAEP) results showed declines in mathematics scores at grades 4 and 8 for the nation and several states and districts. The release of the 2015 NAEP results prompted discussion about the extent to which the results may have been affected by differences between the content of the NAEP mathematics assessments and the Common Core State Standards in mathematics. The National Assessment Governing Board wants to know what you think. The presenters will frame the issue and then audience members will engage in a thorough discussion providing important insights to Governing Board members. 100 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Renaissance West B, Ballroom Level, Coordinated Session, F3 Beyond Process: Theory, Policy, and Practice in Standard Setting Session Chair: Karla Egan, NCIEA Session Discussant: Chad Buckendahl, Alpine Testing Standard setting has become a routine and (largely) accepted part of the test development cycle for K-12 summative assessments. Conventional implementation of almost any K-12 standard setting method convenes teachers who study achievement level descriptors (ALDs) to make decisions about the knowledge, skills, and abilities (KSAs) expected of students. Traditionally, these cut scores have gone to state boards of education or education commissioners who are sometimes reluctant to adjust cut scores established by educators. While these conventional practices have served the field well, there are particular areas that deserve further scrutiny. The first area needing further scrutiny is the validity of the ALDs, which provide a common framework for panelists to use when recommending cut scores. These ALDs are often written months or years prior to the test, sometimes even providing guidance for item writers and test developers regarding the KSAs expected of students on the test (Egan, Schneider, & Ferrara, 2012). What happens when carefully developed ALDs are not well aligned to actual student performance? This occurs in practice, yet only a handful of studies have examined the issue (e.g., Schneider, Egan, Kim, & Brandstrom, 2012). The first paper seeks to validate the ALDs used in the development of a national alternate assessment against student performance on that assessment. The next area that needs a closer look is the use of educators as panelists in standard setting workshops. Educators may have a conflict of interest in recommending the cut scores. Educators are asked to recommend cut scores that have a direct consequence on accountability measures, such as teacher evaluation. There are other means of setting cut scores that do not involve teachers.
For example, when setting college-readiness cut scores, it may not even be necessary to bring in panelists if the state links performance on its high school test to a test like the ACT or SAT. The second paper investigates the positives and negatives of quantitative methods for setting cut scores. Another take on the same issue is to involve panelists who are able to reflect globally on how cut scores will impact school-, district-, and statewide systems. To this end, methods have been used that show different types of data to inform panelist decisions (Beimers, Way, McClarty, & Miles, 2012). Others have brought in district-level staff following the content-based standard setting to adjust cut scores from a system perspective. The third paper approaches this as a validity issue, and it examines the different types of evidence (beyond process) that should be used to support standard setting. The final issue that deserves further scrutiny is the use of panelists as evaluators of the standard setting. Panelists often serve as the only evaluators of the implementation and outcome of the method itself. Panelists fill out evaluations at the end of the standard setting, and these are often used as validity evidence supporting the cut scores. While this group represents an important perspective on the standard setting process, it is important to recognize that panelists are often heavily invested in the process by the time they participate in an evaluation of the standard setting workshop. The last paper considers the role that an external evaluator could play in standard setting. The Alignment of Achievement Level Descriptors to Student Performance Lori Nebelsick-Gullet, edCount Data-Based Standard Setting: Moving Away from Panelists? Joseph Martineau, NCIEA 101 2016 Annual Meeting & Training Sessions Examining Validity Evidence of Policy Reviews Juan d'Brot, DRC The Role of the External Evaluator in Standard Setting Karla Egan, NCIEA 102 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, F4 Exploring Timing and Process Data in Large-Scale Assessments Session Chairs: Matthias von Davier and Qiwei He, Educational Testing Service Session Discussant: Ryan Baker, Teachers College, Columbia University Computer-based assessments (CBAs) provide new insights into behavioral processes related to task completion that cannot be easily observed using paper-based instruments. In CBAs, a variety of timing and process data accompanies test performance data. This means that much more information is available than correctness or incorrectness alone. The analyses of these types of data are necessarily much more involved than those typically performed on traditional tests. This symposium provides examples of how sequences of actions and timing data are related to task performance and how to use process data to interpret students' computer and information literacy achievements in large-scale international education and skills surveys such as the Programme for International Student Assessment (PISA), the Programme for the International Assessment of Adult Competencies (PIAAC), and the International Computer and Information Literacy Study (ICILS). The methods applied in these talks draw on cognitive theories for guidance of what "good" problem solving is, as well as on modern data-analytic techniques that can be utilized to explore log file data.
These studies highlight the potential of analyzing students’ behavior stored in log files in computer-based large-scale assessments and show the promise of tracking students’ problemsolving strategies by using process data analysis. An Overview: Process Data – Why Do We Care? Matthias von Davier, Educational Testing Service Log File Analyses of Students’ Problem-Solving Behavior in PISA 2012 Assessment Samuel Greiff and Sascha Wüstenberg, University of Luxembourg Identifying Feature Sequences from Process Data in PIAAC Problem-Solving Items with N-Grams Qiwei He and Matthias von Davier, Educational Testing Service Predictive Feature Generation and Selection from Process Data in PISA Simulation-Based Environment Zhuangzhuang Han, Qiwei He and Matthias von Davier, Educational Testing Service 103 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, F5 Psychometric Challenges with the Machine Scoring of Short-Form Constructed Responses Session Chair: Mark Shermis, University of Houston—Clear Lake Session Discussant: Claudia Leacock, CTB/McGraw-Hill This session examines four methodological problems associated with machine scoring of short-form constructed responses. The first study looks at the detection of speededness with short-answer questions on a testlet-based science test. Because items in a testlet are scored together, speededness can have a negative and even irrecoverable impact on an examinee’s score. The second study attempted to detect speededness/differential speededness on Task Based Simulations (a type of short-form constructed response) that were part of a licensing exam. Since the TBSs are embedded in the same section of the exam as multiple-choice questions, the goal was to ensure that examinees will have enough time to complete the test. The third study used a new twist on adjudicating short-answer machine scores. Instead of using a second human rater to adjudicate discrepant scores between one human and one machine rater, the study employed two different machine scoring systems and used a human rater to resolve differences in scores. The last study attempted to explain DIF using linguistic feature sets of machine scored short-answer questions taken from middle- and high-school exam questions. The study suggests that focal and reference groups have different “linguistic profiles” that may explain differences in test performance on particular items. Speededness Effects in a Constructed Response Science Test Meereem Kim, Allan Cohen, Zhenqui Lu, Seohyun Kim, Cory Buxton and Martha Allexsaht-Snider, University of Georgia Speededness for Task Based Simulations Items in a Multi-Stage Licensure Examination Xinhui Xiong, American Institute for Certified Public Accountants Short-Form Constructed Response Machine Scoring Adjudication Methods Susan Lottridge, Pacific Metrics, Inc. Use of Automated Scoring to Generate Hypotheses Regarding Language Based DIF Mark Shermis, University of Houston--Clear Lake; Liyang Mao, IXL Learning; Matthew Mulholland, Educational Testing Service; Vincent Kieftenbeld, PacificMetrics, Inc. 
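The third study in the short-form scoring session above uses a human rater to resolve differences between two automated scoring engines rather than between a human and a machine. A minimal sketch of that routing idea, with hypothetical scores and a hypothetical discrepancy threshold, is given below; it is an illustration, not the session's actual adjudication rule.

```python
# Sketch of adjudication routing: two automated engines score each response, and
# only discrepant cases are sent to a human rater. Scores and threshold are hypothetical.
def route_for_adjudication(engine_a_scores, engine_b_scores, max_discrepancy=1):
    """Return indices of responses whose engine scores disagree by more than the threshold."""
    return [
        i for i, (a, b) in enumerate(zip(engine_a_scores, engine_b_scores))
        if abs(a - b) > max_discrepancy
    ]

engine_a = [2, 3, 0, 1, 4]
engine_b = [2, 1, 0, 1, 2]
print(route_for_adjudication(engine_a, engine_b))   # -> [1, 4]
```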
104 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, F6 Advances in Equating Session Discussant: Benjamin Andrews, ACT Bifactor MIRT Observed Score Equating of Testlet-Based Tests with Nonequivalent Groups Mengyao Zhang, National Conference of Bar Examiners; Won-Chan Lee, The University of Iowa; Min Wang, ACT This study extends a bifactor MIRT observed-score equating framework for testlet-based tests (Zhang et al., 2015) to accommodate nonequivalent groups. Binary data are simulated to represent varying degrees of testlet effect and group equivalence. Different procedures are evaluated regarding the estimated equating relationships for numbercorrect scores. Hierarchical Generalized Linear Models (hglms) for Testlet-Based Test Equating Ting Xu and Feifei Ye, University of Pittsburgh This simulation study was to investigate the effectiveness of Hierarchical Generalized Linear Models (HGLMs) as concurrent calibration models on testlet-based test equating under the anchor-test design. Three approaches were compared, including two under the HGLM framework and one using Rasch concurrent calibration. Degrees of testlet variance were manipulated. The Local Tucker Method and Its Standard Errors Sonya Powers, Pearson; Lisa Larsson, ERC Credit Modelling A new linear equating method is proposed that addresses limitations of the local and Tucker equating methods. This method uses a bivariate normal distribution to model common and non-common item scores. Simulation results indicate that this new method has comparable standard errors to the original Tucker method and less bias. Using Criticality Analysis to Select Loglinear Smoothing Models Arnond Sakworawich, National Institute of Development Administration; Han-Hui Por and Alina von Davier, Educational Testing Service; David Budescu, Fordham University This paper proposes “Criticality analysis” as a loglinear smoothing model selection procedure. We show that this method outperforms traditional methods that rely on global measures of fit of the original data set by providing a clearer and sharper differentiation between the competing models. 105 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 15, Meeting Room Level, Paper Session, F7 Novel Approaches for the Analysis of Performance Data Session Discussant: William Skorupski, University of Kansas Combining a Mixture IRT Model with a Nominal Random Item Mixture Model Hye-Jeong Choi and Allan Cohen, University of Georgia; Brian Bottge, University of Kentucky This study describes a psychometric model in which a mixture item response theory model (MixIRTM) is combined to a random item mixture nominal response model (RMixNRM). Inclusion of error and accuracy in one model has the potential to provide a more direct explanation about differences in response patterns. Bayesian Estimation of Null Categories in Constructed-Response Items Yong He, Ruitao Liu and Zhongmin Cui, ACT, Inc. Estimating item parameters in the presence of a null category in a constructed-response item is challenging. The problem has not been investigated in the generalized partial credit model (GPCM). A Bayesian estimation of null categories based on the GPCM framework is proposed in this study. 
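One paper in the session above proposes a Bayesian treatment of null categories within the generalized partial credit model (GPCM). For readers unfamiliar with the model, the sketch below computes GPCM category probabilities for a single item using hypothetical parameter values; it does not implement that paper's Bayesian estimation.

```python
# Category response probabilities under the generalized partial credit model (GPCM);
# parameter values are hypothetical.
import numpy as np

def gpcm_probs(theta, a, b):
    """GPCM category probabilities for one item.

    theta : examinee ability
    a     : item discrimination
    b     : step parameters b_1..b_K (category 0 has no step)
    """
    # Cumulative sums of a*(theta - b_j) define the numerators; category 0 gets 0.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    numer = np.exp(steps)
    return numer / numer.sum()

print(np.round(gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]), 3))
```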
The Fast Model: Integrating Learning Science and Measurement José González-Brenes, Pearson; Yun Huang and Peter Brusilovsky, University of Pittsburgh The assessment and learning science communities rely on different paradigms to model student performance. Assessment uses models that capture different student abilities and problem difficulties, while learning science uses models that capture skill acquisition. We present our recent work on FAST (Feature Aware Student knowledge Tracing) to bridge both communities. Award Session: Brenda Loyd Dissertation Award 2016: Yuanchoa Emily Bo 106 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, F8 Electronic Board #1 Multilevel IRT: When is Local Independence Violated? Christine DeMars and Jessica Jacovidis, James Madison University Calibration data are often collected within schools. This illustration shows that random school effects for ability do not bias IRT parameter estimates or their standard errors. However, random school effects for item difficulty lead to bias in item discrimination estimates and inflated standard errors for difficulty and ability. Electronic Board #2 The Higher-Order IRT Model for Global and Local Person Dependence Kuan-Yu Jin and Wen-Chung Wang, The Hong Kong Institute of Education Persons from the same clusters may behave more similarly than those from different clusters. In this study, we proposed a higher-order partial credit model for person clustering to quantify global and local person dependence for clustered samples in multiple tests. Simulation studies supported good parameter recovery of the new model. Electronic Board #3 A Multidimensional Item Response Model for Local Dependence and Content Domain Structure Yue Liu, Sichuan Institute of Education Sciences; Lihua Yao, Defense Manpower Data Center; Hongyun Liu, Beijing Normal University, Department of Psychology This study proposed a multidimensional item response model for testlets to simultaneously account for local dependence due to item clustering and multidimensional structure. Within-testlet and between-testlet models are applied to real data from collaborative problem-solving assessments. The precision of the domain scores and overall scores for the proposed models is compared. Electronic Board #4 Distinguishing Struggling Learners from Unmotivated Students in an Intelligent Tutoring System Kimberly Colvin, University at Albany, SUNY To help teachers distinguish struggling learners from unmotivated students, a measure of examinee motivation designed for large-scale computer-based tests was modified and applied to an intelligent tutoring system. Proposed modifications addressed issues related to small sample sizes. The relationship of hint use and student motivation was also investigated. Electronic Board #5 Using Bayesian Networks for Prediction in a Comprehensive Assessment System Nathan Dadey and Brian Gong, The National Center for the Improvement of Educational Assessment This work shows how a Bayesian network can be used to predict student summative achievement classifications using assessment data collected throughout the school year. The structure of the network is based on a curriculum map. The ultimate aim is to examine the usefulness of the network information to teachers. 107 2016 Annual Meeting & Training Sessions Electronic Board #6 Comparability Within Computer-Based Assessment: Does Screen Size Matter?
Jie Chen and Marianne Perie, Center for Educational Testing and Evaluation Comparability studies are moving beyond paper-and-pencil versus computer-based assessments to analyze variation among computer devices. Using data from a large district giving tests on either Macs, with large, high-definition screens, or Chromebooks, with standard 14” screens, this study compares assessment results between devices by grade, subject, and item type. Electronic Board #7 Modeling Acquiescence and Extreme Response Styles and Wording Effects in Mixed-Format Items Hui-Fang Chen, City University of Hong Kong; Kuan-Yu Jin and Wen-Chung Wang, Hong Kong Institute of Education Acquiescence and extreme response styles and wording effects are commonly observed in rating scale or Likert items. In this study, a multidimensional IRT model was proposed to account for these two response styles and wording effects simultaneously. The effectiveness and feasibility of the new model were examined in simulation studies. Electronic Board #8 Accessibility: Consideration of the Learner, the Teacher, and Item Performance Bill Herrera, Charlene Turner and Lori Nebelsick-Gullett, edCount, LLC; Lietta Scott, Arizona Department of Education, Assessment Section To better understand the impact of federal legislation that required schools to provide access to academic curricula to students with intellectual disabilities, the National Center and State Collaborative examined differential performance of items with respect to students’ communication and opportunity to learn using data from three assessment administrations. Electronic Board #9 Examining the Growth and Achievement of STEM Majors Using Latent Growth Models Heather Rickels, Catherine Welch and Stephen Dunbar, University of Iowa, Iowa Testing Programs This study examined the use of latent growth models (LGM) when investigating the growth and college readiness of STEM majors versus non-STEM majors. Specifically, LGMs were used to compare growth on a state achievement test from Grades 6-11 of STEM majors and non-STEM majors at a public university. Electronic Board #10 Modeling NCTM and CCSS 5th Grade Math Growth Estimates and Interactions Dan Farley and Meg Guerreiro, University of Oregon This study compares NCTM and CCSS growth estimates. Multilevel models were used to compare the standards. The CCSS measures appear to be more sensitive to growth, but exhibit potential biases toward female students and English learners. Electronic Board #11 Norming and Psychometric Analysis for a Large-Scale Computerized Adaptive Early Literacy Assessment James Olsen, Renaissance Learning Inc. This paper presents psychometric analysis and norming information for a large-scale adaptive K-3 early-literacy assessment. It addresses validity, reliability, and later grade 3 reading proficiency. The norming involved sampling 586,380 fall/spring assessments, post-stratification weighting to a representative national sample, descriptive score statistics, and developing scale percentiles and grade equivalents. 108 Washington, DC, USA Electronic Board #12 The Impact of Ignoring the Multiple-Group Structure of Item Response Data Yoon Jeong Kang, American Institutes for Research; Hong Jiao and Robert Lissitz, University of Maryland This study examines model parameter estimation accuracy and proficiency level classification accuracy when the multiple-group structure of item response data is ignored.
The results show that the heterogeneity of the population distribution was the most influential factor affecting the accuracy of model parameter estimation and proficiency level classification. Electronic Board #13 Influential Factors on College Retention Based on Tree Models and Random Forests Chansoon Lee, Sonya Sedivy and James Wollack, University of Wisconsin-Madison The purpose of this study is to examine factors influencing college retention. Tree models and random forests will be applied to determine important factors in student retention and to improve the prediction of college retention. Electronic Board #14 Detecting Non-Effortful Responses to Short-Answer Items Ruth Childs, Gulam Khan and Amanda Brijmohan, Ontario Institute for Studies in Education, University of Toronto; Emily Brown, Sheridan College; Graham Orpwood, York University This study investigates the feasibility and effects of using the content of short-answer responses, in addition to response times, to improve the filtering of non-effortful responses from field test data and so improve item calibration. Electronic Board #15 Item Difficulty Modeling for an ELL Reading Comprehension Test Using LLTM Lingyun Gao, ACT, Inc.; Changjiang Wang, Pearson This study models cognitive complexity of the items included in a large-scale high-stakes reading comprehension test for English language learners (ELL), using the linear logistic test model (LLTM; Fischer, 2005). The findings will have implications for targeted test design and efficient item development. Electronic Board #16 The Effect of Unmotivated Test-Takers on Field Test Item Calibrations H. Jane Rogers and Hariharan Swaminathan, University of Connecticut A simulation study was conducted to investigate the effect of low motivation of test-takers on field-test item calibrations. Even small percentages of unmotivated test-takers resulted in substantial underestimation of discrimination parameters and overestimation of difficulty parameters. These calibration errors resulted in inaccurate estimation of trait parameters in a CAT administration. Electronic Board #17 Cognitive Analysis of Responses Scored Using a Learning Progression for Proportional Reasoning Edith Aurora Graf, ETS; Peter van Rijn, ETS Global Learning progressions are complex structures based on a synthesis of standards documents and research studies, and therefore require empirical verification. We describe a validity exercise in which we compare IRT-based classifications of students into the levels of a learning progression to classifications provided by a human rater. 109 2016 Annual Meeting & Training Sessions Electronic Board #18 Nonparametric Diagnostic Classification Analysis for Testlet-Based Tests Shuying Sha and Robert Henson, University of North Carolina at Greensboro This study investigates the impact of the testlet effect on the performance of parametric and nonparametric (Hamming distance method) diagnostic classification analysis. Results showed that the performance of both approaches deteriorated as the testlet effect size increased. Potential solutions to nonparametric classification for testlet-based tests are proposed. Electronic Board #19 An Application of Second-Order Growth Mixture Model for Educational Longitudinal Research Xin Li and Changhua Rich, ACT, Inc.; Hongyun Liu, Beijing Normal University Investigating change in individual achievement over time is of central importance in educational research.
The current study describes and illustrates the use of the second-order latent growth model and its extension to the growth mixture model, applied to real data, to help model growth while accounting for population heterogeneity. Electronic Board #20 Confirmatory Factor Analysis of TIMSS’ Mathematics Attitude Items with Recommendations for Change Thomas Hogan, University of Scranton This study reports results of confirmatory factor analysis for Trends in International Mathematics and Science Study (TIMSS) math attitude scales for national samples of students in the United States at grades 4 and 8. Recommendations are made for improvement of the scales, particularly for the Self-confidence latent variable. Electronic Board #21 Controlling for Multiplicity in Structural Equation Models Michael Zweifel and Weldon Smith, University of Nebraska-Lincoln When evaluating a structural equation model, several hypotheses are evaluated simultaneously, which increases the probability that a Type I error is committed. This proposal examined how several common multiple comparison procedures performed when the number of item response categories and the item variances were varied. Electronic Board #22 Alternative Approaches for Comparing Test Score Achievement Gap Trends Benjamin Shear, Stanford University; Yeow Meng Thum, Northwest Evaluation Association This paper compares trajectories of cross-sectional achievement gaps between subgroups to subgroup differences in longitudinal growth trajectories. The impact of vertical scaling assumptions is assessed with parallel analyses in an ordinal metric. We suggest ways to test inferences about closing gaps (“equalization”) across grades and cohorts, possibly for value-added analyses. 110 Washington, DC, USA Sunday, April 10, 2016 12:25 PM - 2:25 PM, Ballroom ABC, Level Three, Convention Center AERA Awards Luncheon AERA’s Awards Program is one of the most prominent ways for education researchers to recognize and honor the outstanding scholarship and service of their peers. Recipients of AERA awards are announced and recognized during the Annual Awards Luncheon. 111 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Renaissance East, Ballroom Level, Coordinated Session, G1 Challenges and Opportunities in the Interpretation of the Testing Standards Session Chair: Andrew Wiley, Alpine Testing Solutions, Inc. Session Discussant: Barbara Plake, University of Nebraska-Lincoln Across divisions of the professional assessment community, the Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014) and its requirements serve as the guiding principles for testing programs when determining procedures and policies. However, while the Standards do serve as the primary source for the assessment community, the interpretation of the Standards continues to be a somewhat subjective affair. Because validity is dependent on the context of each program, testing professionals are required to interpret and align the guidelines to prioritize and evaluate relevant evidence. For example, in some scenarios a term such as “representative” can be difficult to define, and reasonable people could interpret evidence with notably different expectations.
In practical terms, this can become problematic for the profession because if the Standards are not sufficiently clear for the purposes of interpretability and accountability within the profession, it creates more confusion when trying to communicate these expectations to policymakers and lay audiences. The purpose of this session is to focus on how assessment professionals use and interpret the Standards and the procedures that individuals and organizations use when applying them. Each of the four presenters will discuss the methods and procedures that their respective organizations have developed or how they have advised organizations they work with about interpreting and using the Standards to design or improve their programs. In addition, they will discuss the sections of the Standards that they have found to be particularly difficult to interpret, with recommendations about how additional interpretative guidance would make the Standards more effective to implement. The session will conclude with Dr. Barbara Plake serving as discussant. Dr. Plake is one of the leading voices on the value and importance of the Standards and will review each paper and share some of her experience in the use and interpretation of the Standards. Using the Testing Standards as the Basis for Developing a Validation Argument Wayne Camara, ACT Using the Standards to Support Assessment Quality Evaluation Erika Hall and Thanos Patelis, Center for Assessment Blurring the Lines Between Credentialing and Employment Testing Chad Buckendahl, Alpine Testing Solutions, Inc. Content Based Evidence and Test Score Validation Ellen Forte, edCount, LLC 112 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Renaissance West A, Ballroom Level, Coordinated Session, G2 Applications of Combinatorial Optimization in Educational Measurement Session Chairs: Wim van der Linden and Michelle Barrett, Pacific Metrics; Bernard Veldkamp, University of Twente; Dmitry Belov, Law School Admission Council Combinatorial optimization (CO) is concerned with searching for an element from a finite set (called a feasible set) that would optimize (minimize or maximize) a given objective function. Numerous practical problems can be formulated as CO problems, where a feasible set is not given explicitly but is represented implicitly by a list of inequalities and inclusions. Two unique features of CO problems should be mentioned: 1. In practice, a feasible set is so large that a straightforward approach to solving a corresponding CO problem by checking every element of the feasible set would take an astronomical amount of time. For example, in a traveling salesman problem (TSP) (given a list of n cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?), the corresponding feasible set contains (n–1)!/2 elements (routes). Thus, in the case of 25 cities there are 310,224,200,866,620,000,000,000 possible routes. Assuming that a computer can check each route in 1 microsecond (1/1,000,000 of a second), an optimal solution of the TSP with 25 cities would be found in about 9,837,144,878 years (a brief numerical check of this arithmetic appears in the sketch later in this session description). With respect to the size of a given CO problem (e.g., the number of cities, n, in the TSP), the time it takes to solve the problem therefore grows as an exponential function (e.g., c2^n or ce^n, where c is a constant), in contrast to a polynomial function (e.g., cn log n or cn^2). 2.
Often, a given CO problem can be reduced to another CO problem in polynomial time. Thus, if one CO problem can be solved efficiently (e.g., in polynomial time) then the whole class of CO problems can be solved efficiently as well. Fortunately, the modern CO literature provides methods that, during the search, allow us to identify and remove large portions of the feasible set that do not contain an optimal element. As a result, many real instances of CO problems can be solved in a reasonable amount of time. The most popular method is branch-and-bound (Papadimitriou & Steiglitz, 1982), which solves an instance of the TSP with 25 cities in less than one minute on a regular PC. The history of CO applications in educational measurement began in the early 1980s, when psychometricians started to use CO methods for automated test assembly (ATA). Theunissen (1985) reduced a special case of an ATA problem to a knapsack problem (Papadimitriou & Steiglitz, 1982). van der Linden and Boekkooi-Timminga (1989) formulated an ATA problem as a maximin problem. Later, Boekkooi-Timminga (1990) extended this approach to the assembly of multiple test forms with no common items. Soon after that, the ATA problem attracted many more researchers, whose major results are reviewed in van der Linden (2005). The first part of this coordinated session will introduce CO and then review its existing and potential future applications to educational measurement. More specifically, it will introduce mixed integer programming (MIP) modeling as a tool for finding solutions to CO problems, emphasizing such key notions as constraints, objective function, feasible and optimal feasible solutions, linear and nonlinear models, and heuristic and solver-based solutions. It will then review areas of educational measurement where CO has already provided or has the potential to provide optimal solutions to main problems, including areas such as optimal test assembly, automated test-form generation, item-pool design, adaptive testing, calibration sample design, controlling test speededness, parameter linking design, and test-based instructional assignment. The second part of this coordinated session will discuss three recent applications of CO in educational measurement. The first application relates to linking. For the common dichotomous and polytomous response models, linking response model parameters across test administrations that use separate item calibrations requires the use of common items and/or common examinees. Error in the estimated linking function parameters occurs as a result of propagation of estimation error in the response model parameters (van der Linden & Barrett, in press). When using a precision-weighted average approach to estimation of linking parameters, linking error appears to be additive in the contribution of each linking item. Therefore, minimizing linking error when selecting common items from the larger set of available items from the first test administration may be facilitated using CO. Three new MIP models used to optimize the selection of a set of linking items, subject to blueprint and practical test requirements, will be presented. Empirical results will demonstrate the use of the models. The second application is for ATA under uncertainty in item parameters. Commonly, in an ATA problem one assumes that item parameters are known precisely. However, they are always estimated from some dataset, which adds uncertainty into the corresponding CO problem.
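To make the notions of a feasible set, an objective function, and constraints concrete, the short Python sketch below is offered purely as an illustration (it is not part of the session materials): the first part re-checks the traveling-salesman arithmetic quoted in feature 1 above, and the second part solves a made-up, toy test-assembly problem by brute-force enumeration of the feasible set, the approach that stops scaling and motivates MIP solvers and branch-and-bound.

```python
import math
from itertools import combinations

# Part 1: check the traveling-salesman arithmetic from feature 1 above.
n_cities = 25
routes = math.factorial(n_cities - 1) // 2      # (n-1)!/2 distinct round trips
years = routes / 1_000_000 / (365 * 24 * 3600)  # 1 microsecond per route
print(f"{routes:,} routes; about {years:,.0f} years of brute-force checking")

# Part 2: a toy automated test assembly (ATA) problem solved by enumeration.
# The item pool, information values, and blueprint below are invented for illustration.
pool = [  # (item id, content area, item information at the cut score)
    (1, "algebra", 0.42), (2, "algebra", 0.35), (3, "algebra", 0.18),
    (4, "geometry", 0.50), (5, "geometry", 0.22), (6, "geometry", 0.31),
    (7, "data", 0.27), (8, "data", 0.44), (9, "data", 0.12), (10, "data", 0.38),
]
test_length = 5
areas = ("algebra", "geometry", "data")

best_form, best_info = None, -1.0
for form in combinations(pool, test_length):      # enumerate the feasible set
    if any(sum(a == area for _, a, _ in form) < 1 for area in areas):
        continue                                  # blueprint constraint: >= 1 item per area
    info = sum(v for _, _, v in form)             # objective function: total information
    if info > best_info:
        best_form, best_info = form, info
print("best form:", [i for i, _, _ in best_form], "information:", round(best_info, 2))
# Enumeration is fine for C(10, 5) = 252 candidate forms; operational pools with
# hundreds of items and many constraints are why MIP solvers and branch-and-bound
# are used instead.
```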
Several optimization strategies dealing with uncertainty in the objective function and/or constraints of a CO problem have been developed in the literature. This presentation will focus on robust and stochastic optimization strategies, which will be applied to both linear and adaptive test assembly. The impact of the uncertainty on the ATA process will be studied, and practical recommendations to minimize the impact will be provided. The third application relates to two important topics in test security: detection of item preknowledge and detection of aberrant answer changes (ACs). Item preknowledge describes a situation in which a group of examinees (called aberrant examinees) have had access to some items (called compromised items) from an administered test prior to the exam. Item preknowledge negatively affects both the corresponding testing program and its users (e.g., universities, companies, government organizations) because scores for aberrant examinees are invalid. In general, item preknowledge is difficult to detect due to three unknowns: (i) unknown subgroups of examinees at (ii) unknown test centers who (iii) had access to unknown subsets of compromised items prior to taking the test. To resolve the issue of multiple unknowns, two CO methods are applied. First, a random search detects suspicious test centers and suspicious subgroups of examinees. Second, given suspicious subgroups of examinees, simulated annealing identifies compromised items. Advantages and limitations of the methods will be demonstrated using both simulated and real data. The statistical analysis of ACs has uncovered multiple testing irregularities on large-scale assessments. However, existing statistics capitalize on the uncertainty in AC data, which may result in a large Type I error. Without loss of generality, for each examinee, two disjoint subsets of administered items are introduced: the first subset has items with ACs; the second subset has items without ACs, assembled by CO methods to minimize the distance between its characteristic curve and the characteristic curve of the first subset. A new statistic measures the difference in performance between these two subsets, where, to avoid the uncertainty, only final responses are used. In computer simulations, the new statistic demonstrated strong robustness to the uncertainty and higher detection rates in contrast to two popular statistics based on wrong-to-right ACs. 114 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Renaissance West B, Ballroom Level, Paper Session, G3 Psychometrics of Teacher Ratings Session Discussant: Tia Sukin, Pacific Metrics Psychometric Characteristics and Item Category Maps for a Student Evaluation of Teaching Patrick Meyer, Justin Doromal, Xiaoxin Wei and Shi Zhu, University of Virginia We describe psychometric characteristics of a student evaluation of teaching with four dimensions: Organization, Assessment, Interactions, and Rigor. Using data from 430 students and 65 university classrooms, we implemented an IRT-based approach to maximum information item category mapping to facilitate score interpretation and multilevel models to evaluate threats to validity. Psychometric Stability of Tripod Student Perception Surveys with Reduced Data Catherine McClellan, Clowder Consulting; John Donoghue, Educational Testing Service Student perception surveys such as Tripod™ are becoming more commonly used as part of PK-12 classroom teacher evaluations.
The loss of classroom time to survey administration remains a concern for teachers. This study examines the impact of various data reduction approaches on survey results. Does the ‘type’ of Rater Matter When Evaluating Special Education Teachers? Janelle Lawson, San Francisco State University; Carrie Semmelroth, Boise State University This study examined how school administrators without any formal experience in special education performed using the Recognizing Effective Special Education Teachers (RESET) Observation Tool compared with previous reliability studies that used experienced special education teachers as raters. Preliminary findings indicate that ‘type’ of rater matters when evaluating special education teachers. Measuring Score Consistency Between Teacher and Reader Scored Grades Yang Zhao, University of Kansas; Jonathan Rollins, University of North Carolina; Deanna Morgan and Priyank Patel, The College Board The purpose of this paper is to evaluate score consistency between teachers and readers. Measures such as the Pearson correlation, Root Mean Square Error, Mean Absolute Error, Root Mean Square Error in agreement, and the Concordance Correlation Coefficient in agreement are calculated. 115 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 3, Meeting Room Level, Paper Session, G4 Multidimensionality Session Discussant: Mark Reckase, Michigan State University An Index for Characterizing Construct Shift in Vertical Scales Jonathan Weeks, ETS The purpose of this study is to define an index that characterizes the amount of construct shift associated with a “unidimensional” vertical scale when the underlying data are multidimensional. The method is applied to large-scale math and reading assessments. Multidimensional Test Assembly of Parallel Test Forms Using a Kullback-Leibler Information Index Dries Debeer, University of Leuven; Usama Ali, Educational Testing Company; Peter van Rijn, ETS Global The statistical targets commonly used for the assembly of parallel test forms in unidimensional IRT are not directly transferable to multidimensional IRT. To fill this gap, a Kullback-Leibler-based information index (KLI) is proposed. The KLI is discussed and evaluated in the uni- and the multidimensional case. Evaluating the Use of Unidimensional IRT Procedures for Multidimensional Data Wei Wang, Chi-Wen Liao and Peng Lin, Educational Testing Service This study intends to investigate the feasibility of applying unidimensional IRT procedures (including item calibration and equating) for multidimensional data. Both simulated data and operational data will be used. The results will provide suggestions about under which conditions it is appropriate to use unidimensional IRT procedures to analyze multidimensional data. Classification Consistency and Accuracy Indices for Multidimensional Item Response Theory Wenyi Wang, Lihong Song and Shuliang Ding, Jiangxi Normal University; Hua-Hua Chang, University of Illinois at Urbana-Champaign For criterion-referenced tests, classification consistency and accuracy are important indicators to evaluate the reliability and validity of classification results. The purpose of this study is to explore these indices for complex decision rules under multidimensional item response theory. The results would be valuable for score interpretation and computerized classification testing.
116 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 4, Meeting Room Level, Paper Session, G5 Validating “Noncognitive”/Nontraditional Constructs I Session Discussant: William Lorié, Center for NextGen Learning & Assessment, Pearson Improving the NAEP SES Measure: Can NAEP Learn from Other Survey Programs? Young Yee Kim and Jonathan Phelan, American Institutes for Research; Jing Chen, National Center for Education Statistics; Grace Ji, Avar Consulting, Inc. This study is designed as part of NCES’s efforts to improve the NAEP SES measure. Based on findings from an extensive review of survey programs within and outside NCES and a review of the literature, suggestions are made to help NCES report a new SES measure in 2017. Investigating SES Using the NAEP-HSLS Overlap Sample Burhan Ogut, George Bohrnstedt and Markus Broer, American Institutes for Research This study examines the relationships among the three main SES components (parental education, occupational status and income) based on parent-reports on the one hand, and student-reports of SES proxy variables (parents’ education, household possessions, and NSLP eligibility) on the other hand, using multiple-indicators and multiple-causes models and seemingly unrelated regressions. Rethinking the Measurement of Noncognitive Attributes Andrew Maul, University of California, Santa Barbara The quality of “noncognitive” measurement lags behind the quality of measurement in traditional academic realms. This project identifies a potentially serious gap in the validity argument for a prominent measure of growth mindsets. New approaches to the measurement of growth mindsets are piloted and exemplified. Validating Relationships Among Mathematics-Related Self Efficacy, Self Concept, Anxiety and Achievement Measures Madhabi Chatterji and Meiko Lin, Teachers College, Columbia University In this construct validation study, we use structural equation modeling to validate theoretically specified pathways and correlations of mathematics-related self-efficacy, self-concept, and anxiety with math achievement scores. Results are consistent with past research with older students, and carry implications for research, policy and classroom practice. 117 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 5, Meeting Room Level, Paper Session, G6 Invariance Session Discussant: Ha Phan, Pearson The Impact of Measurement Noninvariance in Longitudinal Item Response Modeling In-Hee Choi, University of California, Berkeley This study investigates the impact of measurement noninvariance across time and group in longitudinal item response modeling, when researchers examine group differences in growth. First, measurement noninvariance is estimated from a large-scale longitudinal survey. These results are then used for a simulation study with different sample sizes. Measurement Invariance in International Large-Scale Assessments: Ordered-Categorical Outcomes in a Multidimensional Context Dubravka Svetina, Indiana University; Leslie Rutkowski, University of Oslo A critical precursor to comparing means on latent variables across cultures is that the measures are invariant across groups. A lack of consensus on cut-off values for evaluating model fit in the literature motivates this study, in which we consider the performance of fit measures when data are modeled as multidimensional, ordered-categorical.
Assessing Uniform Measurement Invariance Using Multilevel Latent Modeling Carrie Morris, University of Iowa College of Education; Xin Li, ACT This simulation study investigated the use of multilevel MIMIC and mixture models for assessing uniform measurement invariance. A multilevel model was generated with measurement error, and measurement and factorial noninvariances were imposed. Model fit, parameter and standard error bias, and power to detect noninvariance were assessed for all estimated models. Population Invariance of Equating Functions Across Subpopulations for a Large Scale Assessment Lucy Amati and Alina von Davier, Educational Testing Service In this study, we examine the population invariance assumption for a large-scale assessment. Results of the analysis demonstrated that the equating functions for subpopulations are very close to that of the total population. Results supported the invariance assumption of the equating function, helping to demonstrate the fairness of the test. 118 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 15, Meeting Room Level, Paper Session, G7 Detecting Aberrant Response Behaviors Session Discussant: John Donoghue, ETS Methods That Incorporate Response Times and Responses for Excluding Data Irregularities Heru Widiatmo, ACT, Inc. Two methods, which use both responses and response times for excluding data irregularities, are combined and compared to find an optimal method. The methods are Response Time Effort (RTE) and Effective Response Time (ERT). The 3-PL IRT model is used to calibrate data and to evaluate the results. Online Detection of Compromised Items with Response Times in CAT Hyeon-Ah Kang, University of Illinois at Urbana-Champaign An online-calibration-based CUSUM procedure is proposed to detect compromised items in CAT. The procedure utilizes both observed item responses and response times for evaluating changes in item parameter estimates that are obtained on-the-fly during the CAT administrations. Detecting Examinee Preknowledge of Items: A Comparison of Methods Xi Wang, University of Massachusetts Amherst; Frederic Robin, Hongwen Guo and Neil Dorans, Educational Testing Service; Yang Liu, University of California, Merced In a continuous testing program, examinees are likely to have preknowledge of some items due to the repeated use of items over time. In this study, two methods are proposed to detect item preknowledge at the person level, and their effectiveness is compared in a multistage adaptive testing context. Development of an R Package for Statistical Analysis in Test Security Jiyoon Park, Yu Zhang and Lorin Mueller, Federation of State Boards of Physical Therapy Statistical analysis of test results is the approach most widely employed by test sponsors. Different statistical methods can be used to capture the signs of security breaches and to evaluate the validity of test scores. We propose an R package that provides systematic and comprehensive analyses for test security.
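As a point of reference for the terminology in the first paper of this session, response time effort (RTE) is conventionally computed as the proportion of an examinee's responses whose response times exceed item-specific rapid-guessing thresholds, with faster responses treated as non-effortful. The Python sketch below only illustrates that general idea with invented thresholds and times; it is not code from any of these studies.

```python
def response_time_effort(response_times, thresholds):
    """Proportion of responses whose time exceeds the item's rapid-guessing
    threshold; values near 1.0 suggest effortful (solution) behavior."""
    flags = [rt >= th for rt, th in zip(response_times, thresholds)]
    return sum(flags) / len(flags)

# Invented example: item thresholds (in seconds) and one examinee's response times.
thresholds = [5, 8, 6, 10, 7]
times = [12, 2, 9, 15, 3]          # items 2 and 5 look like rapid guesses
print(response_time_effort(times, thresholds))  # 0.6
```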
119 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session: GSIC Graduate Student Poster Session, G8 Graduate Student Issues Committee Brian Leventhal, Chair; Masha Bertling, Laine Bradshaw, Lisa Beymer, Evelyn Johnson, Ricardo Neito, Ray Reichenberg, Latisha Sternod, Dubravka Svetina Electronic Board #1 Examining Test Irregularities Using a Multidimensional Scaling Approach Qing Xie, ACT/The University of Iowa The purpose of this simulation study is to explore the possibility of using multidimensional scaling in detecting test irregularities via the concept of consistency of a battery or test structure. The results will provide insights on how well this method can be applied in different test irregularity situations. Electronic Board #2 The Influence of Measurement Invariance in the Two-Wave, Longitudinal Mediation Model Oscar Gonzalez, Arizona State University Statistical mediation describes how two variables are related by examining intermediate mechanisms. The mediation model assumes an underlying longitudinal design and that the same constructs are measured over time. This study examines what happens to the mediated effect when longitudinal measurement invariance is violated in a two-wave mediation model. Electronic Board #3 Parallel Analysis of Unidimensionality with PCA and PAF in Dichotomously Scored Data Ismail Cukadar, Florida State University This Monte Carlo study investigates the impact of using two different factor extraction methods (principal component analysis and principal axis factoring) in the Kaiser rule and parallel analysis on decisions about unidimensionality in binary data with examinee guessing. Electronic Board #4 Reducing Data Demands of Using a Multidimensional Unfolding IRT Model Elizabeth Williams, Georgia Institute of Technology A simulation study will be performed to investigate using a multidimensional scaling (MDS) solution in conjunction with the Multidimensional Generalized Graded Unfolding Model (MGGUM) to reduce data demands. The expected results are that the data demands will be reduced without sacrificing the quality of true parameter recovery. Electronic Board #5 Challenging Conditions for MML and MH-RM Estimation of Multidimensional IRT Models Derek Sauder, James Madison University The MH-RM estimator is faster than the MML estimator, and generally gives comparable parameter estimates. In one real dataset, the two procedures estimated similar item parameter values but different correlations between the subscales. A simulation will be conducted to examine which factors might lead to discrepancies between the estimators. 120 Washington, DC, USA Electronic Board #6 The Effects of Dimensionality and Dimensional Structure on Composite Scores and Subscores Unhee Ju, Michigan State University Both composite scores and subscores can provide diagnostic information about students’ specific progress. A simulation study was conducted to examine the performance of composite scores and subscores under different conditions of the number of dimensions, dimensional structure, and correlation between dimensions. Their implications will be discussed in the presentation. Electronic Board #7 Simple Structure MIRT True Score Equating for Mixed-Format Tests Stella Kim, The University of Iowa This study proposes an SS-MIRT true-score equating procedure for mixed-format tests and investigates its performance based on the results from real data analyses and a simulation study.
Electronic Board #8 Conditions of Evaluating Models with Approximate Measurement Invariance Using Bayesian Estimation Ya Zhang, University of Pittsburgh A simulation study is performed to investigate approximate measurement invariance (MI) through Bayesian estimation. The size of differences in item intercepts, the proportion of items with differences, and the level of prior variability are manipulated. The study findings provide a general guideline for the use of approximate MI. Electronic Board #9 Detecting Nonlinear Item Position Effects with a Multilevel Model Logan Rome, University of Wisconsin-Milwaukee When tests utilize a design in which items appear in different orders in various booklets, the item position can impact item responses. This simulation study will examine the performance of a multilevel model in detecting several functions and sizes of non-linear item-specific position effects. Electronic Board #10 Comparison of Scoring Methods for Different Item Types Hongyu Diao, University of Massachusetts-Amherst This study will use a Monte Carlo simulation method to investigate the impact of concurrent calibration and separate calibration for mixed-format tests. The response data of Multiple Choice and Technology-Enhanced Items are simulated to represent two different dimensions. Electronic Board #11 IRT Approach to Estimate Reliability of Testlet with Balanced and Unbalanced Data Nana Kim, Yonsei University This study aims to investigate the effects of balanced and unbalanced data structures on the reliability estimates of testlet-based tests when applying item response theory (IRT) approaches using simulated data sets. We focus on the relationship between patterns of reliability estimates and the degree of imbalance in data structure. Electronic Board #12 Hierarchical Bayesian Modeling for Peer Assessment in a Massive Open Online Course Yao Xiong, The Pennsylvania State University Peer assessment has been widely used in most of the massive open online courses (MOOCs) to provide feedback for constructed-response questions. However, peer rater accuracy and reliability are a major concern. The current study proposes a hierarchical Bayesian approach to account for rater accuracy and reliability. Electronic Board #13 The Impact of Model Misspecification in the DCM-CAT Yu Bao, The University of Georgia Item parameters are usually assumed to be known in DCM-CAT simulations. When the assumption is violated, model misspecification may lead to different item information and posterior distribution, which are essential for item selection. The study shows how misfitting DCMs and overfitting DCMs will influence item bank usage and classification accuracy. Electronic Board #14 Interval Estimation of IRT Proficiency in Mixed-Format Tests Shichao Wang, The University of Iowa Interval estimation of proficiency can help to clearly present information to test users on how to interpret the uncertainty in their scores. This study intends to compare the performance of analytical and empirical approaches in constructing an interval for IRT-based proficiency for mixed-format tests using simulation techniques. Electronic Board #15 Analysis of Item Difficulty Predictors for Item Pool Development Feng Chen, The University of Kansas A systematic item difficulty prediction approach is introduced that accounts for all possible item features. The effect of these features on resulting item parameters is demonstrated using simulated and real data.
Results will provide statistical and evidentiary implications for item pool development and test construction. Electronic Board #16 Regressing Multiple Predictors into a Cognitive Diagnostic Model Kuan Xing, University of Illinois at Chicago This study investigates the stability of parameter estimates and classification when multiple covariates of different types are analyzed in the RDINA and HO-DINA models. Real-world (TIMSS) data analyses and a simulation study were conducted. The educational significance of examining the relationship between covariates and the CDM is discussed. Electronic Board #17 Non-Instructional Factors That Affect Student Mathematics Performance Michelle Boyer, University of Massachusetts, Amherst The effects of non-instructional factors on educational success are increasingly important for educational authorities to understand as they seek to improve student outcomes. This study evaluates a large number of such factors and their effects on mathematics performance for a large US nationally representative sample of students. Electronic Board #18 A Procedure to Improve Item Parameter Estimation in the Presence of Test Speededness Can Shao, University of Notre Dame In this study, we propose to use a data cleansing procedure based on change-point analysis to improve item parameter estimation in the presence of test speededness. Simulation results show that this procedure can dramatically reduce the bias and root mean square error of the item parameter estimates. Electronic Board #19 Simulation Study of Estimation Methods in Multidimensional Student Response Data Philip Grosse, University of Pittsburgh The purpose of this simulation study is to provide a comparison of WLSMV and BAYES estimators in a bifactor model based on simulated multidimensional student responses. The estimation methods are compared in terms of their item parameter recovery and ability estimation. Electronic Board #20 Detecting Testlet Effect Using Graph Theory Xin Luo, Michigan State University Testlet effects have a significant influence on measurement accuracy and test validity. This study proposed a new approach based on graph theory to detect testlet effects. Results of a simulation study supported the quality of this method. Electronic Board #21 Assessing Item Response Theory Dimensionality Assumptions Using DIMTEST and NOHARM-Based Methods Kirsten Hochstedt, Penn State University This study examined how select IRT dimensionality assessment methods performed for two- and three-parameter logistic models with combinations of short test lengths, small sample sizes, and ability distribution shapes (skewness, kurtosis). The capability of DIMTEST and three NOHARM-based methods to detect dimensionality assumption violations in simulated data was compared. Electronic Board #22 Evaluating the Invariance Property in IRT: A Case of Multi-State Assessment Seunghee Chung, Rutgers University This simulation study investigates how well the invariance property of IRT item parameters holds in a multi-state assessment situation, especially when the characteristics of member states are dissimilar to one another. Practical implications for multi-state assessment development are discussed to avoid potential measurement bias caused by a lack of invariance.
Electronic Board #23 Evaluating Predictive Accuracy of Alternative IRT Models and Scoring Methods Charles Iaconangelo, Rutgers University, The State University of New Jersey This paper uses longitudinal data from a large urban school system to evaluate different item response theory models and scoring methods for their value in predicting future test scores. It finds that both richer IRT models, and scoring methods based on response patterns rather than number correct, improve predictive accuracy. 123 2016 Annual Meeting & Training Sessions Electronic Board #24 A Comparison of Estimation Methods for the Multi-Unidimensional Three-Parameter IRT Model Tzu Chun Kuo, Southern Illinois University Carbondale Two marginal maximum likelihood (MML) approaches, three fully Bayesian algorithms, and a Metropolis-Hastings Robbins-Monro (MHRM) algorithm were compared for estimating multi-unidimensional three-parameter models using simulations. Preliminary results suggested that the two MML approaches, together with blocked Metropolis and MHRM, had overall better parameter recovery than the other estimation methods. Electronic Board #25 A Methodology for Item Condensation Rule Identification in Cognitive Diagnostic Models Diego Luna Bazaldua, Teachers College, Columbia University A methodology within a Bayesian framework is employed to identify the item condensation rules for cognitive diagnostic models (CDMs). Simulated and empirical data are used to analyze the ability of the methodology to detect the correct condensation rules for different CDMs. 124 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 5:50 PM, Ballroom C, Level Three, Convention Center AERA Presidential Address Public Scholarship to Educate Diverse Democracies Jeannie Oakes, AERA President; University of California - Los Angeles 125 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 4:35 PM - 6:05 PM, Renaissance East, Ballroom Level, Coordinated Session, H1 Advances in Balanced Assessment Systems: Conceptual Framework, Informational Analysis, Application to Accountability Session Chair: Scott Marion, National Center for the Improvement of Educational Assessment Session Discussant: Lorrie Shepard, University of Colorado, Boulder For more than a decade, there have been calls for multiple assessments to be designed and used in more integrated ways—for “balanced” or “comprehensive” assessment systems. However, there has been little focused work on clearly defining what is meant by a balanced assessment system as well as the characteristics that contribute to the quality of such assessment systems. Importantly, there have been scant analyses of such systems and in particular how instructional and accountability demands might both be addressed. This coordinated session presents advances in conceptualizing and analyzing balanced assessment systems. The session begins with an overview of the need for considering the quality of balanced assessment systems, with an emphasis on validity and usefulness. The second presentation focuses on conceptualizing the systems aspects of a balanced assessment system—what characterizes a system that goes beyond good individual assessments? The third presentation brings together two approaches, content-based alignment judgments and scale-based interpretations, to obtain content-referenced information from assessments to support instruction and learning. These approaches are based on the actual information available and the interpretations supported.
The fourth presentation presents a technical analysis of comparability in a balanced assessment system in the context of school accountability. Balanced Assessment Systems: Overview and Context Brian Gong and Scott Marion, National Center for the Improvement of Educational Assessment Systemic Aspects of Balanced Assessment Systems Rajendra Chattergoon, University of Colorado, Boulder Validity and Utility in a Balanced Assessment System: Use, Information, and Timing Phonraphee Thummaphan, University of Washington, Seattle; Nathan Dadey, Center for Assessment Comparability in Balanced Assessment Systems for State Accountability Carla Evans, University of New Hampshire; Susan Lyons, Center for Assessment 126 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 6:05 PM, Renaissance West A, Ballroom Level, Coordinated Session, H2 Minimizing Uncertainty: Effectively Communicating Results from CDM-Based Assessments Session Discussant: Jacqueline Leighton, University of Alberta Fueled by needs for educational tests that provide diagnostic feedback, researchers have made recent progress in designing statistical models that are well-suited to categorize examinees according to mastery levels for a set of latent skills or abilities. Cognitive diagnosis models (CDMs) yield probabilistic classifications of students according to multiple facets, termed attributes, of knowledge or reasoning. These results have the potential to inform instructional decision-making and learning, but in order to do so the results must be comprehensible to a variety of education stakeholders. This session will include four papers on CDMs and communicating CDM-based results. Laine Bradshaw and Roy Levy outline the challenges of reporting results from CDMs and provide context for subsequent papers. Tasmin Dhaliwal, Tracey Hembry and Laine Bradshaw provide empirical evidence of teacher interpretation and preference for viewing mastery probabilities and classification results, in an online reporting environment. Kristen DiCerbo and Jennifer Kobrin share findings on how to present learning progression-based assessment results to teachers to support their instructional decision-making. Valerie Shute and Diego Zapata-Rivera model (using Bayes nets) and visualize students’ beliefs in flexible belief networks. Interpreting Examinee Results from Classification-Based Models Laine Bradshaw (2015 Jason Millman Promising Measurement Scholar Award Winner), University of Georgia; Roy Levy, Arizona State University Achieving the Promise of CDMs: Communicating CDM-Based Assessment Results Tasmin Dhaliwal, Pearson; Tracey Hembry, Alpine Testing Solutions; Laine Bradshaw, University of Georgia Communicating Assessment Results Based on Learning Progressions Kristen DiCerbo and Jennifer Kobrin, Pearson Representing and Visualizing Beliefs Valerie Shute, Florida State University; Diego Zapata-Rivera, Educational Testing Service 127 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 16, Meeting Room Level, Coordinated Session, H3 Overhauling the SAT: Using and Interpreting Redesigned SAT Scores Session Chair: Maureen Ewing, College Board Session Discussant: Suzanne Lane, University of Pittsburgh In February of 2013, the College Board announced it would undertake a major redesign of the SAT® with the intent of making the test more transparent and useful. The redesigned test will assess skills, knowledge, and understandings that matter most for college and career readiness. 
Only relevant vocabulary (as opposed to the sometimes criticized obscure vocabulary measured today) will be assessed. The Math section will be focused on a smaller number of content areas. The essay will be optional. There will be a switch to rights-only scoring. The total score scale will revert to the original 400 to 1600, and there will be several cross-test scores and subscores. At the same time, scores on the redesigned assessment are expected to continue to meaningfully predict success in college and serve as a reliable indicator of college and career readiness. Throughout the redesign effort, many important research questions emerged, such as: (1) How can we be sure the content on the redesigned test measures what is most important for college and career readiness? (2) How can we develop concordance tables to relate scores on the redesigned assessment to current scores? (3) How do we define and measure college and career readiness? (4) How well can we expect scores on the redesigned assessment to predict first-year college grades? The purpose of this session is to describe the research the College Board has done to support the launch of the redesigned SAT. The session will begin with a brief overview of the changes to the SAT with a focus on how these changes are intended to make the test more transparent and useful. Four papers will follow that describe more specifically the test design and content validity argument for the new test, the development and practical implications of producing and delivering concordance tables, the methodology used to develop and validate college and career readiness benchmarks for the new test and, lastly, early results about the relationship between scores on the redesigned SAT and college grades gathered from a special, non-operational study. The discussant, Suzanne Lane, who is a nationally renowned expert on assessment design and validity research, will offer constructive comments on the fundamental ideas, approaches, and designs undergirding the research presentations. An Overview of the Redesigned SAT Jack Buckley, College Board The Redesigned SAT: Content Validity and Assessment Design Sherral Miller and Jay Happel, College Board Producing Concordance Tables for the Transition to the Redesigned SAT Pamela Kaliski, Rosemary Reshetar, Tim Moses, Hui Deng and Anita Rawls, College Board College and Career Readiness and the Redesigned SAT Benchmarks Jeff Wyatt and Kara Smith, College Board A First Look at the Predictive Validity of the Redesigned SAT Emily Shaw, Jessica Marini, Jonathan Beard and Doron Shmueli, College Board 128 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, H4 Quality Assurance Methods for Operational Automated Scoring of Essays and Speech Session Discussant: Vincent Kieftenbeld, Pacific Metrics The quality of current automated scoring systems is increasingly comparable with, or even surpasses, that of trained human raters. Ensuring score validity in automated scoring, however, requires sophisticated quality assurance methods both during the design and training of automated scoring models and during operational automated scoring. The four studies in this coordinated session present novel quality assurance methods for use in operational automated scoring of essay and speech responses. A common theme unifying these studies is the development of techniques to screen responses during operational scoring.
A wide variety of methods is used, ranging from ensemble learning and outlier detection to information retrieval and natural language processing and identification. This session complements the session Challenges and solutions in the operational use of automated scoring systems, which focuses on quality assurance during the design and training phases of automated scoring. Statistical High-Dimensional Outlier Detection Methods to Identify Abnormal Responses in Automated Scoring Raghuveer Kanneganti, Data Recognition Corporation CTB; Luyao Peng, University of California, Riverside Does Automated Speaking Response Scoring Favor Speakers of Certain First Language? Guangming Ling and Su-Youn Yoon, Educational Testing Service Feature Development for Scoring Source-Based Essays Claudia Leacock, McGraw-Hill Education CTB; Raghuveer Kanneganti, Data Recognition Corporation CTB Non-Scorable Spoken Response Detection Using NLP and Speech Processing Techniques Su-Youn Yoon, Educational Testing Service 129 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 4, Meeting Room Level, Paper Session, H5 Student Growth Percentiles Session Discussant: Damian Betebenner, Center for Assessment The Accuracy and Fairness of Aggregate Student Growth Percentiles as Indicators of Educator Performance Jason Millman Promising Measurement Scholar Award Winner 2016: Katherine Furgol Castellano, Daniel McCaffrey and J.R. Lockwood, ETS Aggregated SGP (AGP), the mean/median SGP for students linked to the same teacher/school, are a popular alternative to VAM-based measures of educator performance. However, we demonstrate that test score measurement error affects the accuracy and precision of typically used AGP. We also contrast standard AGP against several alternative AGP estimators. Cluster Growth Percentiles: An Alternative to Aggregated Student Growth Percentiles Scott Monroe, UMass Amherst; Li Cai, CRESST/UCLA Aggregates of Student Growth Percentiles (Betebenner, 2009) are used by numerous states for purposes of teacher evaluation. In this research, we propose an alternative statistic, a Cluster Growth Percentile, defined directly at the group or cluster level. The two approaches are compared, and simulated and empirical examples are provided. Evaluating Student Growth Percentiles: Perspective of Test-Retest Reliability Johnny Denbleyker, Houghton Mifflin Harcourt; Ye Lin, University of Iowa This study examines SGP calculations and corresponding NCEs where multiple test opportunities existed within the accountability testing window for an NCLB mathematics assessment. This allowed aspects of reliability to be assessed in a practical test-retest manner while accounting for measurement error associated with both sampling of items and occasions. 130 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, H6 Equating: From Theory to Practice Session Discussant: Ye Tong, Pearson Similarities Between Equating Equivalents Using Presmoothing and Postsmoothing Hyung Jin Kim and Robert Brennan, The University of Iowa Presmoothing and postsmoothing improve equating by reducing sampling error. However, little research has been conducted about similarities in equated equivalents between presmoothing and postsmoothing. This study examines how equated equivalents differ between presmoothing and postsmoothing for different smoothing degrees, and investigates the presmoothing degrees that give results similar to those from a specific postsmoothing degree.
Stability of IRT Calibration Methods for the Common-Item Nonequivalent Groups Equating Design
Yujin Kang and Won-Chan Lee, University of Iowa
The purpose of this study is to investigate accumulated equating error of item response theory (IRT) calibration methods in the common-item nonequivalent groups (CINEG) design. The factors of investigation are calibration methods, equating methods, types of change in the ability distribution, common-item compositions, and computer software for calibration.

Subscore Equating and Reporting
Euijin Lim and Won-Chan Lee, The University of Iowa
The purpose of this study is to address the necessity of subscore equating in terms of score profiles using real data sets and to discuss practical issues related thereto. Also, the performance of several equating methods for subscores is compared under various conditions using simulation techniques.

On the Effect of Varying Difficulty of Anchor Tests on Equating Accuracy
Irina Grabovsky and Daniel Julrich, NBME
This study investigates the question of the optimal location of an anchor test for equating minimum competency examinations. For examinations where the means of the distributions of examinee abilities and item difficulties are a distance apart, placement of an anchor test based on proximity to the examinee ability mean results in a more accurate equating procedure.

Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 15, Meeting Room Level, Paper Session, H7
Issues in Ability Estimation and Scoring
Session Discussant: Peter van Rijn

Practical and Policy Impacts of Ignoring Nested Data Structures on Ability Estimation
Kevin Shropshire, Virginia Tech; Yasuo Miyazaki, Virginia Tech
Consistent with the literature, the standard errors corresponding to item difficulty parameters are underestimated when clustering is part of the design but ignored in the estimation process. This research extends the focus to the impact of design clustering on ability estimation in IRT models for psychometricians and policy makers.

MIRT Ability Estimation: Effects of Ignoring the Partially Compensatory Nature
Janine Buchholz and Johannes Hartig, German Institute for International Educational Research (DIPF); Joseph Rios, Educational Testing Service (ETS)
The MIRT model most commonly employed to estimate within-item multidimensionality is compensatory. However, numerous examples in educational testing suggest partially compensatory relations among dimensions. We therefore investigated conditional bias in theta estimates when incorrectly applying the compensatory model. Findings demonstrate systematic underestimation for examinees highly proficient in one dimension.

Interval Estimation of Scale Scores in Item Response Theory
Yang Liu, University of California, Merced; Ji Seung Yang, University of Maryland, College Park
In finite samples, the uncertainty arising from item parameter estimation is often non-negligible and must be accounted for when calculating latent variable scores. Various Bayesian, fiducial, and frequentist interval estimators are harmonized under the framework of consistent predictive inference, and their performances are evaluated via Monte Carlo simulations.
Applying the Hajek Approach in the Delta Method of Variance Estimation
Jiahe Qian, Educational Testing Service
The variance formula derived by the delta method for a two-stage sampling design employs the joint inclusion probabilities in the first-stage selection of schools. The inquiry aims to apply the Hajek approximation to estimate the joint probabilities, which are often unavailable in analysis. The application is illustrated with real and simulation data.

2016 Bradley Hanson Award for Contributions to Educational Measurement: Sun-Joo Cho

Sunday, April 10, 2016 4:35 PM - 6:05 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, H8

Electronic Board #1
Asymmetric ICCs as an Alternative Approach to Accommodate Guessing Effects
Sora Lee and Daniel Bolt, University of Wisconsin, Madison
Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects are well documented (Han, 2012). We consider the use of a residual heteroscedasticity model (Molenaar, 2014) as an alternative, and compare its performance to the 3PL with real test datasets and through simulation analyses.

Electronic Board #2
Software Note for PARSCALE
Ying Lu, John Donoghue and Hanwook Yoo, Educational Testing Service
PARSCALE is one of the most popular commercial software packages for IRT calibration. PARSCALE users, however, should be aware of the issues associated with the software to ensure the quality of IRT calibration results. The purpose of this paper is to summarize these issues and to suggest solutions.

Electronic Board #3
Stochastic Approximation EM for Exploratory Item Factor Analysis
Eugene Geis and Greg Camilli, Rutgers Graduate School of Education
We present an item parameter estimation method combining stochastic approximation and Gibbs sampling for exploratory multivariate IRT analyses. It is characterized by drawing a missing random variable, updating post-burn-in sufficient statistics of missing data using the Robbins-Monro procedure, estimating factor loadings using a novel approach, and drawing samples of latent ability.

Electronic Board #4
Reporting Student Growth Percentiles: A Novel Tool for Displaying Growth
David Swift and Sid Sharairi, Houghton Mifflin Harcourt
The increased use of growth models has created a need for tools that help policy makers with growth decisions and inform stakeholders. The data tool presented meets this need through a feature-rich, user-friendly application that puts the policy maker in control.

Electronic Board #5
The Impact of Plausible Values When Used Incorrectly
Kyung Sun Chung, Pennsylvania State University
This study examined the effect of plausible values when used incorrectly, such as using only one of the five values provided or using the average of the five plausible values. Two previously published studies are replicated for practical relevance. The results show that appropriate use of plausible values is recommended for unbiased estimates.

Electronic Board #6
Missing Data – on How to Avoid Omitted and Not-Reached Items
Miriam Hacker, Frank Goldhammer and Ulf Kröhne, German Institute for International Educational Research (DIPF)
The problem of missing data is common in almost all measurements. In this study, the occurrence of missing data is examined, along with how to avoid it by presenting more time information at the item level. Results indicate that time information can reduce missing responses without affecting performance.
Electronic Board #7
Challenging Measurement in the Field of Multicultural Education: Validating a New Scale
Jessie Montana Cain, University of North Carolina at Chapel Hill
Measurement in the field of multicultural education has been scarce. In this study the psychometric properties of the newly developed Multicultural Teacher Capacity Scale were examined. The MTCS is a reliable and valid measure of multicultural teacher capacity for samples that mirror the development sample.

Electronic Board #8
Automated Test Assembly Methods Using Monte-Carlo-Based Linear-On-The-Fly (LOFT) Techniques
John Weiner and Gregory Hurtz, PSI Services LLC
Monte-Carlo-based linear-on-the-fly techniques of automated test assembly offer a number of advantages toward the goals of exam security, exam form equivalence, and efficiency in examination development activities. Classical-test-theory and Rasch/IRT approaches are compared, and issues of statistical sampling and analyses are discussed.

Electronic Board #9
DIF Related to Test Takers' Culture Background and Language Proficiency
Jinghua Liu, Secondary School Admission Test Board; Tim Moses, College Board
This study examines DIF from the perspective of test takers' cultural background by using operational data from a standardized admission test. We recommend that testing programs containing a large portion of test takers from different regions and cultural backgrounds ought to add region/culture DIF to routine DIF screening.

Electronic Board #10
Can a Two-Item Essay Test Be Reliable and Valid?
Brent Bridgeman and Donald Powers, Educational Testing Service
Psychometricians have long complained that a two-item essay test cannot be reliable and valid for predicting academic outcomes compared to a multiple-choice test (e.g., Wainer & Thissen, 1993). Recent evidence from predictive validity studies of Verbal Reasoning and Analytical Writing GRE scores challenges this point of view.

Electronic Board #11
Selecting Automatic Scoring Features Using Criticality Analysis
Han-Hui Por and Anastassia Loukina, Educational Testing Service
We apply the criticality analysis approach to select features in the automatic scoring of spoken responses in a language assessment. We show that this approach addresses issues of sample dependence and bias, and identifies salient features that are critical in improving model validity.

Electronic Board #12
A Meta-Analysis of the Predictive Validity of the Graduate Management Admission Test
Haixia Qian, Kim Trang and Neal Martin Kingston, University of Kansas
The purpose of the meta-analysis was to assess the Graduate Management Admission Test (GMAT) and undergraduate GPA (UGPA) as predictors of business school performance. Results showed both the GMAT and UGPA were significant predictors, with the GMAT as a stronger predictor compared to UGPA.

Electronic Board #13
A Fully Bayesian Approach to Smoothing the Linking Function in Equipercentile Equating
Zhehan Jiang and William Skorupski, University of Kansas
A fully Bayesian parametric method for robustly estimating the linking function in equipercentile equating is introduced, explicated, and evaluated via a Monte Carlo simulation study.

Electronic Board #14
Conducting a Post-Equating Check to Detect Unstable Items on Pre-Equated Tests
Keyin Wang, Michigan State University; Wonsuk Kim and Louis Roussos, Measured Progress
Pre-equated tests are increasingly common. Every item is assumed to behave in a stable manner. Thus, "post-equated" checks need to be conducted to detect and correct problematic items. Little research has been conducted directly on this topic. This study proposes possible procedures and begins to evaluate them.

Electronic Board #15
An Evaluation of Methods for Establishing Crosswalks Between Instruments
Mark Hansen, University of California, Los Angeles
In this study, we evaluate several approaches for obtaining projections (or crosswalks) between instruments measuring related, but somewhat distinct, constructs. Methods utilizing unidimensional and multidimensional item response theory models are compared. We examine the impact of test length, correlation between constructs, and sample characteristics on the quality of the projection.

Electronic Board #16
Exploration of Factors Affecting the Necessity of Reporting Test Subscores
Xiaolin Wang, Dubravka Svetina and Shenghai Dai, Indiana University, Bloomington
Interest in test subscore reporting has been growing rapidly for diagnostic purposes. This simulation study examined factors (correlation between subscales, number of items per subscale, complexity of test, and item parameter distribution) that affected the necessity of reporting subscores within the classical test theory framework.

Electronic Board #17
Evaluation of Psychometric Stability of Generated Items
Yu-Lan Su, Tingting Chen and Jui-Sheng Wang, ACT, Inc.
The study investigated the psychometric stability of generated items using operational data. The generated items were compared to their parents on classical item statistics, DIF, raw response distributions to the key, and IRT parameters. The empirical evidence will serve as groundwork for the growing applications of item generation.

Electronic Board #18
Creating Parallel Forms with Small Samples of Examinees
Lisa Keller, University of Massachusetts Amherst; Rob Keller, Measured Progress; Andrea Hebert, Bottom Line Technologies
This study investigates using item-specific priors in item calibration to assist in the creation of parallel forms in the presence of small samples of examinees. Results indicate that while the item parameters may still contain error, classification of examinees into performance categories might be improved using the method.

Electronic Board #19
Higher-Order G-DINA Model for Polytomous Attributes
Qin Yi and Tao Yang, Faculty of Education, Beijing Normal University; Tao Xin and Lou Liu, School of Psychology, Beijing Normal University
The G-DINA model for polytomous attributes (Jinsong Chen, 2013), which accounts for attribute levels, can provide additional diagnostic information. When a higher-order structure is involved, it can provide more fine-grained attribute information and a macro-level ability expression linked to IRT, which also increases the sensitivity of classification.

Electronic Board #20
New Search Algorithm for Q-matrix Validation
Ragip Terzi, Rutgers, The State University of New Jersey; Jimmy de la Torre, Rutgers University
The validity of the Q-matrix in cognitive diagnosis modeling has drawn significant attention due to the possibility of attribute misspecifications, which can result in model-data misfit and, ultimately, attribute misclassifications. The current study proposes a new method for Q-matrix validation. The results are also compared to other parametric and non-parametric methods.
Electronic Board #21
Generalized DCMs for Option-Based Scoring
Oksana Naumenko, Yanyan Fu and Robert Henson, The University of North Carolina at Greensboro; Bill Stout, University of Illinois at Urbana-Champaign; Lou DiBello, University of Illinois at Chicago
A recently proposed family of models, the Generalized Diagnostic Classification Models for Multiple Choice Option-Based Scoring (GDCM-MC), extracts information about examinee cognitive processing from all MC item options. This paper describes a set of simulation studies with factors such as test length and number of options that examine model performance.

Electronic Board #22
Evaluating Sampling Variability and Measurement Precision of Aggregated Scores in Large-Scale Assessment
Xiaohong Gao and Rongchun Zhu, ACT, Inc.
The study demonstrates how to conceptualize sources of measurement error and estimate sampling variability and reliability in large-scale assessment of educational quality. One international and one domestic assessment data set are used to shed light on potential sources of measurement uncertainty and improvement of measurement precision for aggregated scores.

Electronic Board #23
The Model for Dichotomously-Scored Multiple-Attempt Multiple-Choice Items
Igor Himelfarb and Katherine Furgol Castellano, Educational Testing Service (ETS); Guoliang Fang, Penn State University
This paper proposes a model for dichotomously-scored, multiple-attempt, multiple-choice item responses that may occur in scaffolded assessments. Assuming a 3PL IRT model, simulations were conducted using MCMC Metropolis-Hastings to recover the generated parameters. Results indicate that the best recovery was for item parameters of low and moderate difficulty and discrimination.

Electronic Board #24
Classical Test Theory Embraces Cognitive Load Theory: Measurement Challenges Keeping It Simple
Charles Secolsky, Mississippi Department of Education; Eric Magaram, Rockland Community College
The measurement community is challenged by advances in educational technology and psychology. On a basic level, classical test theory is used as a measurement model for understanding cognitive load theory and the influence that cognitive load theory has on test validity. The greater the germane cognitive load, the greater the true score.

Sunday, April 10, 2016 6:30 PM - 8:00 PM, Renaissance West B, Ballroom Level
President's Reception
By Invitation Only

Annual Meeting Program - Monday, April 11, 2016

Monday, April 11, 2016 5:45 AM - 7:00 AM
NCME Fitness Run/Walk
Session Organizers: Katherine Furgol Castellano, ETS; Jill R. van den Heuvel, Alpine Testing Solutions
Start your morning with NCME's annual 5k Walk/Run in Potomac Park. Meet in the lobby of the Renaissance Washington, DC Downtown Hotel at 5:45 AM. Pre-registration is required. Pick up your bib number and t-shirt at the NCME Information Desk in the hotel anytime prior to race day. Transportation will be provided. (Additional registration fee required)
The event is made possible through the sponsorship of:
National Center for the Improvement of Educational Assessment, Inc.
Measurement, Inc.
College Board
ACT
American Institutes for Research
Graduate Management Admission Council
Educational Testing Service
Pearson Educational Measurement
Houghton Mifflin Harcourt
Law School Admission Council
Applied Measurement Professionals, Inc.
WestEd
HumRRO

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 13/14, Meeting Room Level, Invited Session, I1
NCME Book Series Symposium: Technology and Testing
Session Editor: Fritz Drasgow, University of Illinois at Urbana-Champaign
Session Chair: Randy Bennett, ETS
This symposium draws on Technology and Testing: Improving Educational and Psychological Measurement, a recently published volume in the new NCME Book Series. The volume probes the remarkable opportunities for innovation and progress that have resulted from the convergence of advances in technology, measurement, and the cognitive and learning sciences. The book documents many of these new directions and provides suggestions for numerous further advances. It seems safe to predict that testing will be dramatically transformed over the next few decades – paper test booklets with opscan answer sheets will soon be as outdated as computer punch cards. The book is divided into four sections, each with several chapters and a section commentator. For purposes of this symposium, one chapter author per section will present his or her chapter in some depth, followed by the section commentator, who will briefly review each of the other chapters in the section. The symposium offers the measurement community a unique opportunity to learn about how technology will help to transform assessment practices and the challenges that transformation is already posing and will continue to present.

Issues in Simulation-Based Assessment
Brian Clauser and Melissa Margolis, National Board of Medical Examiners; Jerome Clauser, American Board of Internal Medicine; Michael Kolen, University of Iowa
Commentator: Stephen Sireci, University of Massachusetts, Amherst

Using Technology-Enhanced Processes to Generate Test Items in Multiple Languages
Mark Gierl, Hollis Lai, Karen Fung and Bin Zheng, University of Alberta
Commentator: Mark Reckase, Michigan State University

Increasing the Accessibility of Assessments through Technology
Elizabeth Stone, Cara Laitusis and Linda Cook, ETS
Commentator: Kurt Geisinger, University of Nebraska, Lincoln

From Standardization to Personalization: The Comparability of Scores Based on Different Testing Conditions, Modes, and Devices
Walter Way, Laurie Davis, Leslie Keng and Ellen Strain-Seymour, Pearson
Commentator: Edward Haertel, Stanford University

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, I2
Exploring Various Psychometric Approaches to Report Meaningful Subscores
Session Discussant: Li Cai, University of California, Los Angeles
The impetus of this session came directly from needs and concerns expressed by score users of K-12 large-scale Common Core State Standards (CCSS) aligned assessments. Subscores, also called domain scores (such as Reading, Listening, and Writing in an English language arts test), and subdomain scores that are based on detailed content standards nested within a domain are reported in assessments. As the CCSS have been adopted by many states, educators and parents need both domain and subdomain information from the state accountability tests to (1) explain the student's performance in certain content areas, (2) evaluate the effects of teaching and learning practices in the classroom, and (3) investigate the impact of implementation of the CCSS. However, the use of subscores has been criticized for low reliability (Thissen & Wainer, 2001) and little added value when correlations among subscores are high (Sinharay, 2010). In online adaptive testing, traditional observed subscores are usually not meaningful, because students respond to different items at different difficulty levels, which renders the subscores not comparable among students. Furthermore, in an online adaptive testing format, each student usually receives only a few items from the core content-related subdomain units. In that case, student-level subdomain scores are unlikely to be reliable. However, when school-level factors are collected from many students, the aggregated information may be meaningful. The issues of reporting subscores in K-12 CCSS-aligned assessments are addressed through four different approaches from both theoretical and empirical perspectives. Our studies show that reliabilities can be improved and additional information can be provided to test users even in an online adaptive testing setting. The first study presents results from a residual analysis of subscores, which has been widely applied in statewide assessments; the advantages, limitations, and possible solutions for improvement are also discussed. The second study uses a mixture of Item Response Theory (IRT) and higher-order cognitive diagnostic (HO-DINA) models to produce attribute classification profiles as an alternative to traditional subscores, along with general ability scores. The third study proposes a Multilevel Testlet (MLT) item factor model to produce school-level instructionally meaningful subscores. The fourth study incorporates collateral information by implementing a fully Bayesian approach to report more reliable subscores. This panel of studies provides insight into subscores from various approaches and from both within- and across-methodology perspectives. We hope this session can enrich the literature and methodology in subscore reporting and also support producing meaningful diagnostic information for teaching and learning.

Using Residual Analysis to Report Subscores in Statewide Assessments
Jon Cohen, American Institutes for Research

Applying a Mixture of IRT and HO-DINA Models in Subscore Reporting
Likun Hou, Educational Testing Service; Yan Huo, Educational Testing Service; Jimmy de la Torre, Rutgers University

Multilevel Testlet Item Factor Model for School-Level Instructionally-Meaningful Subscores
Megan Kuhfeld, University of California, Los Angeles

Incorporating Collateral Information and Fully Bayesian Approach for Subscores Reporting
Yi Du, Educational Testing Service; Shuqin Tao, Curriculum Associates; Feifei Li, Educational Testing Service

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 3, Meeting Room Level, Coordinated Session, I3
From Items to Policies: Big Data in Education
Session Discussant: Zachary Pardos, School of Information and Graduate School of Education, UC Berkeley
Data are woven into every sector of the global economy (McGuire et al., 2012), including education. As technology and analytics improve, the use of big data to derive insights that lead to system improvements is growing rapidly. The purpose of this panel is to share a collection of promising approaches for analyzing and leveraging big data in a wide range of education contexts. Each contribution is an application of machine learning, computer science, and/or statistical techniques to an education issue or question in which expert judgment would be costly, impractical, or otherwise hampered by the magnitude of the problem. We focus on the novel application of big data to address questions of construct validity for assessments; inferences about student abilities and learning needs when data are sparse or unstructured; decisions about course structure; and public sentiment about specific education policies. The ultimate goal for the use of big data and the application of these methods is to improve outcomes for learners. We conclude the session with lessons learned from the application of these methods to research questions across a broad spectrum of education issues, noting strengths and limitations.

What and When Students Learn: Q-Matrices and Student Models from Longitudinal Data
José González-Brenes, Center for Digital Data, Analytics & Adaptive Learning, Pearson

Misconceptions Revealed Through Error Responses
Thomas McTavish, Center for Digital Data, Analytics and Adaptive Learning, Pearson

Beyond Subscores: Mining Student Responses for Diagnostic Information
William Lorié, Center for NextGen Learning & Assessment, Pearson

Mining the Web to Leverage Collective Intelligence and Learn Student Preferences
Kathy McKnight, Center for Educator Learning & Effectiveness, Pearson; Antonio Moretti and Ansaf Salleb-Aouissi, Center for Computational Learning Systems, Columbia University; José González-Brenes, Center for Digital Data, Analytics & Adaptive Learning, Pearson

The Application of Sentiment and Topic Analysis to Teacher Evaluation Policy
Antonio Moretti and Ansaf Salleb-Aouissi, Center for Computational Learning Systems, Columbia University; Kathy McKnight, Center for Educator Learning & Effectiveness, Pearson

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 4, Meeting Room Level, Coordinated Session, I4
Methods and Approaches for Validating Claims of College and Career Readiness
Session Chair: Thanos Patelis, Center for Assessment
Session Discussant: Michael Kane, Educational Testing Service
The focus on college and career readiness has penetrated all aspects and segments of education, as well as economic and political rhetoric. Testing organizations, educational organizations, states, and institutions of higher education have made claims of college and career readiness. New large-scale assessments have been launched, and historic assessments used for college admissions and placements are being revised to represent current claims of college and career readiness. Validation evidence to substantiate these claims is important and expected (AERA, APA, & NCME, 2014). This session will involve four presentations by active participants and contributors in the conceptualization, design, and implementation of validation studies. Each presentation will present a validation framework and specific suggestions, recommendations, and examples of methodologies in undertaking the validation of these claims of college and career readiness. Concrete suggestions will be provided. A fifth presenter will offer comments about the presentations and also provide additional recommendations and insights.

Are We Ready for College and Career Readiness?
Stephen Sireci, University of Massachusetts-Amherst

Validating Claims for College and Career Readiness with Assessments Used for Accountability
Wayne Camara, ACT

Moving Beyond the Rhetoric: Urgent Call for Empirically Validating Claims of College-And-Career-Readiness
Catherine Welch and Stephen Dunbar, University of Iowa

Some Concrete Suggestions and Cautions in Evaluating/Validating Claims of College Readiness
Thanos Patelis, Center for Assessment

Monday, April 11, 2016 8:15 AM - 10:15 AM, Renaissance West A, Ballroom Level, Invited Session, I5
Recent Advances in Quantitative Social Network Analysis in Education
Presenters: Tracy Sweet, University of Maryland; Qiwen Zheng, University of Maryland; Mengxiao Zhu, ETS; Sam Adhikari, Carnegie Mellon University; Beau Dabbs, Carnegie Mellon University; I-Chien Chen, Michigan State University
Social network data are becoming increasingly common in education research, and the purpose of this symposium is both to summarize current research on social network methodology and to showcase how these methods can address substantive research questions in education and promote ongoing education research. Each presentation introduces exciting cutting-edge methodological research focusing on different aspects of social network analysis that will be of interest to both methodologists and education researchers. The session will begin with an introduction by Tracy Sweet, followed by several methodological talks showcasing exciting new research. Mengxiao Zhu will describe new ways to analyze network data from students' learning and problem-solving processes. Qiwen Zheng will discuss a model for multiple networks that focuses on subgroup integration. Sam Adhikari will discuss a longitudinal model that illustrates how network structure changes over time, and I-Chien Chen will also introduce new methods for multiple time points but will focus on how changes over time are related to changes in other outcomes. Finally, Beau Dabbs will discuss model selection methods.

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 15, Meeting Room Level, Paper Session, I6
Issues in Automated Scoring
Session Discussant: Shayne Miel, Turnitin

Modeling the Global Text Features for Enhancing the Automated Scoring System
Syed Muhammad Fahad Latifi and Mark Gierl, University of Alberta
We will introduce and demonstrate the innovative modeling of global text features for enhancing the performance of an automated essay scoring (AES) system. Representative datasets from PARCC and Smarter Balanced states were used. The results suggest that global text modeling consistently outperformed two state-of-the-art commercial AES systems.

Discretization of Scores from an Automated Scoring Engine Using Gradient Boosted Machines
Scott Wood, Pacific Metrics Corporation
In automated scoring engines using linear regression models, it is common to convert the continuous predicted scores into discrete scores for reporting. A recent study shows that special care must be taken when converting continuous predicted scores from gradient boosted machine modeling into discrete scores.

Automated Scoring of Constructed Response Items Measuring Computational Thinking
Daisy Rutstein, John Niekrasz and Eric Snow, SRI International
Increasingly, assessments contain constructed response items to measure hard-to-assess inquiry- and design-based concepts. These types of item responses are challenging to score reliably and efficiently. This paper discusses the adaptation of an automated scoring engine for scoring responses on constructed response items measuring computational thinking.

Automated Scoring of Complex Technology-Enhanced Tasks in a Middle School Science Unit
Samuel Crane, Aaron Harnly, Malorie Hughes and John Stewart, Amplify
We show how complex user-interaction data from a Natural Selection app can be auto-scored using several methods. We estimate validity using a comparative analysis of content-expert ratings, evidence rule scoring, and a machine learning approach. The machine learning approaches are shown to agree with expert human scoring.

Comparison of Human Rater and Automatic Scoring on Students' Ability Estimation
Zhen Wang, Educational Testing Service (ETS); Lihua Yao, DoD Data Center; Yu Sun
The purpose is to compare human rater scoring with automatic scoring in terms of examinees' ability estimation with an IRT-based rater model. Each speaking item is analyzed with IRT models both without and with rater effects. The effects of different rating designs may substantially increase the bias in examinees' ability estimation.

Issues to Consider When Examining Differential Item Functioning in Essays
Matthew Schultz, Jonathan Rubright and Aster Tessema, American Institute of Certified Public Accountants
The development of Automated Essay Scoring has propelled the increasing use of writing in high-stakes assessments. To date, DIF is rarely considered in such contexts. Here, methods to assess DIF in essays and considerations for practitioners are reviewed, and results of an application from an operational testing program are discussed.

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 16, Meeting Room Level, Paper Session, I7
Multidimensional and Multivariate Methods
Session Discussant: Irina Grabovsky, NBME

Information Functions of Multidimensional Forced-Choice IRT Models
Seang-hwane Joo, Philseok Lee and Stephen Stark, University of South Florida
This paper aimed to develop the concept of information functions for multidimensional forced-choice IRT models and demonstrate how statement parameters and test formats (pair, triplet, and tetrad) influence the item and test information. The implications for constructing fake-resistant noncognitive measures are further discussed using information functions.

Investigating Reverse-Worded Matched Item Pairs Using the GPCM and NRM
Ki Matlock, Oklahoma State University; Ronna Turner and Dent Gitchel, University of Arkansas
The GPCM is often used for polytomous data; however, the NRM allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. In this study, responses to reverse-worded items are analyzed using the two models, and the estimated parameters are compared.

Item Response Theory Models for Ipsative Tests with Polytomous Multidimensional Forced-Choice Items
Xue-Lan Qiu and Wen-Chung Wang, The Hong Kong Institute of Education
Developments of IRT models for ipsative tests with dichotomous multidimensional forced-choice items have been witnessed in recent years. In this study, we develop a new class of IRT models for polytomous MFC items. We conducted simulation studies in a variety of conditions to evaluate parameter recovery and provided an empirical example.
Multivariate Generalizability Theory and Conventional Approaches for Obtaining More Accurate Disattenuated Correlations
Walter Vispoel, Carrie Morris and Murat Kilinc, University of Iowa
The standard approach for obtaining disattenuated correlations rests on assumptions easily violated in practice. We explore multiple methods for obtaining disattenuated correlations designed to limit the introduction of bias due to assumption violations, including methods based on applications of multivariate generalizability theory and a conventional alternative to such methods.

Comparing a Modified Alpha Coefficient to Split-Half Approaches in the LOFT Framework
Tammy Trierweiler, Law School Admission Council (LSAC); Charles Lewis, Educational Testing Service
In this study, the performance of a Modified Alpha coefficient was compared to split-half methods for estimating generic reliability in a LOFT framework. Simulations across different ability distributions, sample sizes, and ranges of item pool difficulties were considered, and results were compared to the corresponding theoretical population reliability.

Estimating Correlations Among School Relevant Categories in a Multidimensional Space
Se-Kang Kim, Fordham University; Joseph Grochowalski, College Board
The current study estimates correlations between row and column categories in a multidimensional space. The contingency table being analyzed consists of New York school districts as row categories and school-relevant categories (e.g., attendance, safety, etc.) as column categories. To calculate correlations, the biplot paradigm (Greenacre, 2010) is utilized.

Monday, April 11, 2016 10:35 AM - 12:05 PM, Renaissance West A, Ballroom Level, Invited Session, J1
Hold the Presses! How Measurement Professionals Can Speak More Effectively with the Press and the Public (Education Writers Association Session)
Session Chairs: Kristen Huff, ACT; Laurie Wise, HumRRO, Emeritus; Lori Crouch, EWA
Session Panelists: Caroline Hendrie, EWA; David Hoff, Hager Sharp; Andrew Ho, Harvard Graduate School of Education; Anya Kamenetz, NPR; Sarah Sparks, Education Week
How can members of the press help advance the assessment literacy of the general public? Could we have communicated better about the Common Core State Standards? Please join NCME for a panel session sponsored jointly with the Education Writers Association (EWA), the professional organization of journalists that covers education. In this panel discussion, EWA Executive Director Caroline Hendrie will lead a conversation with journalists and academics about the role of measurement experts and the press in the modern media era, with all its political polarization, sound bites, Twitter hashtags, and quotes on deadline. Approximately half the session will be reserved for audience questions and answers, so please take advantage of this unique opportunity to discuss how we can improve our communication about educational measurement.

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, J2
Challenges and Solutions in the Operational Use of Automated Scoring Systems
Session Chair: Su-Youn Yoon
Session Discussant: Klaus Zechner, ETS
An automated scoring system can assess constructed responses faster than human raters and at a lower cost. These advantages have prompted a strong demand for high-performing automated scoring systems for various applications. However, even state-of-the-art automated scoring systems face numerous challenges to their use in operational testing programs. This session will discuss four important issues that may arise when automated scoring systems are used in operational tests: features vulnerable to subgroup bias, accommodations for special test taker groups with disabilities, the development of new tests using a novel input type, and the addition of automated scoring to ongoing operational testing programs based only on human scoring. These issues may be associated with problems that cause aberrant performance of automated scoring systems and weaken the validity of automated scores. Also, the addition of machine scoring to prior all-human scoring may change the score distribution and make it difficult to interpret and maintain the reported scale. We will analyze problems associated with these issues and provide solutions. This session will demonstrate the importance of considering validity issues at the initial stage of automated scoring system design in order to overcome these challenges.

Fairness in Automated Scoring: Screening Features for Subgroup Differences
Ji An, University of Maryland; Vincent Kieftenbeld and Raghuveer Kanneganti, McGraw-Hill Education CTB

Use of Automated Scoring in Language Assessments for Candidates with Speech Impairments
Heather Buzick, Educational Testing Service; Anastassia Loukina, ETS

A Novel Automatic Handwriting Assessment System Built on Touch-Based Tablet
Xin Chen, Ran Xu and Richard Wang, Pearson; Tuo Zhao, University of Missouri

Ensuring Scale Continuity in Automated Scoring Deployment in Operational Programs
Jay Breyer, Shelby Haberman and Chen Li, ETS

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, J3
Novel Models to Address Measurement Errors in Educational Assessment and Evaluation Studies
Session Chair: Kilchan Choi, CRESST/UCLA
Session Discussant: Elizabeth Stuart, Johns Hopkins
Measurement error issues adversely affect results obtained from typical modeling approaches used to analyze data from assessment and evaluation studies. In particular, measurement error can weaken the validity of inferences from student assessment data, reduce the statistical power of impact studies, and diminish the ability of researchers to identify the causal mechanisms that lead to an intervention improving the desired outcome. This symposium proposes novel statistical models to account for the impact of measurement error. The first paper proposes a multilevel two-tier item factor model with latent change score parameterization in order to address conditional exchangeability of participants that routinely accompanies analysis of multisite randomized experiments with pre- and posttests. The second paper examines the consequences of correcting measurement errors in value-added models, addressing the question of which teachers benefit more than others when measurement errors are corrected. The third paper proposes a multilevel latent variable plausible values approach for more appropriately handling measurement error in predictors in multilevel modeling settings in which latent predictors are measured by observed categorical variables. The last paper proposes a three-level latent variable hierarchical model with a cluster-level measurement model using a one-stage full-information estimation approach.
On the Role of Multilevel Item Response Models in Multisite Evaluation Studies
Li Cai and Kilchan Choi, UCLA/CRESST; Megan Kuhfeld, UCLA

Consequence of Correcting Measurement Errors in Value-Added Models
Kilchan Choi, CRESST/UCLA; Yongnam Kim, University of Wisconsin

Handling Error in Predictors Using Multiple-Imputation/MCMC-Based Approaches: Sensitivity of Results to Priors
Michael Seltzer, UCLA; Jiseung Yang, University of Maryland

Three-Level Latent Variable Hierarchical Model with Level-2 Measurement Model
Kilchan Choi and Li Cai, UCLA/CRESST; Michael Seltzer, UCLA

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, J4
Mode Comparability Investigation of a CCSS-Based K-12 Assessment
Session Chair: David Chayer, Data Recognition Corporation
Session Discussant: Debora Harris, ACT
The recent introduction of the Common Core State Standards and accountability legislation has brought extensive attention to the online administration of K-12 large-scale assessments. In this coordinated session, a series of mode comparability investigations on a K-12 assessment that uses various item types, such as multiple-choice, technology-enhanced, and open-ended items, is presented in order to test three major comparability hypotheses (same test factor structure, same measurement precision, and same score properties) by applying various methods. A presentation of the most recent trends in mode comparability studies on K-12 assessments will be followed by presentations of findings from the mode comparability hypothesis investigations mentioned above. Finally, results from various equating methods are compared when a difference in difficulty exists between the two modes. This coordinated session will contribute to the measurement field by providing a summary of the most recent mode comparability studies, theoretical guidelines for mode comparability, and practical considerations for educators and practitioners.

Recent Trends of Mode Comparability Studies
Jong Kim, ACT

Comparison of OLT and PPT Structure
Karen Barton, Learning Analytics; Jungnam Kim, NBCE

Applying an IRT Method to Mode Comparability
Dong-In Kim, Keith Boughton and Joanna Tomkowicz, Data Recognition Corporation; Frank Rijiman, AAMC

Equating When Mode Effect Exists
Marc Julian, Dong-in Kim, Ping Wan and Litong Zhang, Data Recognition Corporation

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 16, Meeting Room Level, Paper Session, J5
Validating "Noncognitive"/Nontraditional Constructs II
Session Discussant: Andrew Maul, University of California, Santa Barbara

Using Response Times to Enhance Scores on Measures of Executive Functioning
Brooke Magnus, University of North Carolina at Chapel Hill; Michael Willoughby, RTI International; Yang Liu, University of California, Merced
We propose a novel response time model for the assessment of executive functioning in children transitioning from early to middle childhood. Using a model comparison approach, we examine the degree to which response times may be analyzed jointly with response accuracy to improve the precision and range of ability scores.

A Structural Equation Model Replication Study of Influences on Attitudes Towards Science
Rajendra Chattergoon, University of Colorado, Boulder
This paper replicates and extends a structural equation model using data from the Trends in International Mathematics and Science Study (TIMSS). A similar latent factor structure was obtained using TIMSS 1995 and 2011 data, but some items loaded on multiple factors. Three models fit the data equally well, suggesting multiple interpretations.

Experimental Validation Strategies Using the Example of a Performance-Based ICT-Skills Test
Lena Engelhardt and Frank Goldhammer, German Institute for International Educational Research; Johannes Naumann, Goethe University Frankfurt; Andreas Frey, Friedrich Schiller University Jena
Two experimental validation approaches are presented to investigate the construct interpretation of ability scores using the example of a performance-based ICT (information and communication technology) skills test. Construct-relevant task characteristics were manipulated experimentally, first to change only the difficulty of items, and second to also change the tapped construct.

Measuring Being Bullied in the Context of Racial and Religious DIF
Michael Rodriguez, Kory Vue and Jose Palma, University of Minnesota
To address the measurement and relevance of novel constructs in education, a measure of being bullied is anticipated to exhibit DIF on items about the role of race and religion. The scale is recalibrated to account for DIF and compared vis-à-vis correlations, mean differences, and criterion-referenced levels of being bullied.

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 15, Meeting Room Level, Paper Session, J6
Differential Functioning - Theory and Applications
Session Discussant: Catherine McClellan, Clowder Consulting

Using the Partial Credit Model to Investigate the Comparability of Examination Standards
Qingping He and Michelle Meadows, Office of Qualifications and Examinations Regulation
This study explores the use of the Partial Credit Model (PCM) and differential step functioning (DSF) to investigate the comparability of standards in examinations that test the same subjects but are provided by different assessment providers. These examinations are used in the General Certificate of Secondary Education qualifications in England.

Handling Missing Data on DIF Detection Under the MIMIC Model
Daniella Reboucas and Ying Cheng, University of Notre Dame
In detecting differential item functioning (DIF), mistreatment of missing data can inflate Type I error and lower power. This study examines DIF detection with the MIMIC model under three missing data mechanisms. Results suggest that the full information maximum likelihood method works better than multiple imputation in this case.

Properties of Matching Criterion and Its Effect on Mantel-Haenszel DIF Procedure
Usama Ali, Educational Testing Service
This paper investigates the matching criterion used for the Mantel-Haenszel DIF procedure. The goal of this paper is to evaluate the robustness of DIF results under less-than-optimal conditions, as reflected in the number of items contributing to the criterion score, the number of score levels, and its reliability.

Impact of Differential Bundle Functioning on Test Performance of Focal Examinees
Kathleen Banks, LEAD Public Schools; Cindy Walker, University of Wisconsin-Milwaukee
The purpose of this study was to apply the Walker, Zhang, Banks, and Cappaert (2012) effect size criteria to bundles that showed statistically significant differential bundle functioning (DBF) against focal groups in past DBF studies. The question was whether the bundles biased the mean total scores for focal groups.
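As a quick reference for the Mantel-Haenszel procedure discussed in the session above, the following is a minimal sketch of the common odds-ratio estimator; it is standard background material and is not drawn from any of the papers. At each level k of the matching criterion (typically the total test score), let A_k and B_k be the numbers of reference-group examinees answering the studied item correctly and incorrectly, C_k and D_k the corresponding focal-group counts, and N_k the total number of examinees at that level:

\[
\hat{\alpha}_{MH} \;=\; \frac{\sum_{k} A_k D_k / N_k}{\sum_{k} B_k C_k / N_k},
\qquad
\text{MH D-DIF} \;=\; -2.35\,\ln \hat{\alpha}_{MH}.
\]

Values of \(\hat{\alpha}_{MH}\) near 1 (MH D-DIF near 0) indicate little uniform DIF; the abstract by Ali above examines how properties of the matching criterion, such as its length and reliability, affect the behavior of this procedure.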
Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, J7
Latent Regression and Related Topics
Session Discussant: Matthias von Davier, ETS

Multidimensional IRT Calibration with Simultaneous Latent Regression in Large-Scale Survey Assessments
Lauren Harrell and Li Cai, University of California, Los Angeles
Multidimensional item response theory models, estimated simultaneously with latent regression models using an adaptation of the Metropolis-Hastings Robbins-Monro algorithm, are applied to data from the National Assessment of Educational Progress (NAEP) Science and Mathematics assessments. The impact of dimensionality on parameter estimation and plausible values is investigated.

Single-Stage Vs. Two-Stage Estimation of Latent Regression IRT Models
Peter van Rijn, ETS Global; Yasmine El Masri, Oxford University Centre for Educational Assessment
Item and population parameters of PISA 2012 data are compared between a single-stage and a two-stage approach. While item and population parameters remained similar, standard errors of population parameters were greater in the single-stage approach. Similar results were observed when fitting univariate and multivariate models. Practical implications are discussed.

Improving Score Precision in Large-Scale Assessments with the Multivariate Bayesian Lasso
Steven Culpepper, Trevor Park and James Balamuta, University of Illinois at Urbana-Champaign
The multivariate Bayesian Lasso (MBL) was developed for high-dimensional regression models, such as the conditioning model in large-scale assessments (e.g., NAEP). Monte Carlo results document the gains in score precision achieved when employing the MBL model versus Bayesian models that assume a multivariate normal prior for regression coefficients.

Performance of Missing Data Approaches in Retrieving Group-Level Parameters
Steffi Pohl, Freie Universität Berlin; Carmen Köhler and Claus Carstensen, Otto-Friedrich-Universität Bamberg
We investigate the performance of different missing data approaches in retrieving group-level parameters (e.g., regression coefficients) that are usually of interest in large-scale assessments. Results show that ignoring missing values performed almost equally well as model-based approaches for nonignorable missing data; both approaches outperformed treating missing values as incorrect responses.

Monday, April 11, 2016 11:00 AM - 2:00 PM, Meeting Room 12, Meeting Room Level
Past Presidents Luncheon
By invitation only

Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 8/9, Meeting Room Level, Invited Session, K1
The Every Student Succeeds Act (ESSA): Implications for Measurement Research and Practice
Session Moderator: Martin West, Harvard Graduate School of Education
Session Presenters: Peter Oppenheim, Education Policy Director and Counsel, U.S. Senate Committee on Health, Education, Labor, and Pensions (Majority); Sarah Bolton, Education Policy Director, U.S. Senate Committee on Health, Education, Labor, and Pensions (Minority)
Session Respondents: Sherman Dorn, Arizona State University; Marianne Perie, University of Kansas; John Easton, Spencer Foundation
The 2015 enactment of the Every Student Succeeds Act marked a major shift in federal education policy, allowing states greater flexibility with respect to the design of school accountability systems while at the same time directing them to incorporate additional performance metrics not based on test scores. In this session, key Congressional staff involved in crafting the new law will describe its rationale and how they hope states will respond. A panel of researchers will in turn consider the opportunities the law creates for innovation in and research on educational measurement and the design of school accountability systems.

Monday, April 11, 2016 12:25 PM - 1:55 PM, Renaissance West A, Ballroom Level, Coordinated Session, K2
Career Paths in Educational Measurement: Lessons Learned by Accomplished Professionals
Session Moderator: S E Phillips, Assessment Law Consultant
Session Panelists: Kathy McKnight, Pearson School Research; Joe Martineau, National Center for the Improvement of Educational Assessment; Barbara Plake, University of Nebraska Lincoln, Emeritus
Deciding what you want to do when you become a measurement professional can be a daunting task for a master's or doctoral student about to graduate. It can also be challenging for a graduate of a measurement program about to begin a first job. Sometimes, graduate students see the work of accomplished measurement professionals and wonder how they got there. Other times, graduate students know what they are interested in and the type of measurement activity they would like to engage in, but are uncertain which settings or career paths will provide the best fit. Careers in educational measurement are many and varied. As graduate students consider their career options, they must weigh their skills, abilities, interests, and preferences against the opportunities, expectations, demands, and advancement potential of various jobs and career paths. This session is designed to provide some food for thought for these difficult decisions. It is targeted particularly at graduate students in measurement programs, graduates in their first jobs, and career changers within measurement.

Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, K3
Recent Investigations and Extensions of the Hierarchical Rater Model
Session Chair: Jodi Casabianca, The University of Texas at Austin
Session Discussant: Brian Patterson, Questar Assessment
Rater effects in education testing and research have the potential to impact the quality of scores in constructed response and performance assessments. The hierarchical rater model (HRM) is a multilevel item response theory model for multiple ratings of behavior and performance that yields estimates of latent traits corrected for individual rater bias and variability (Casabianca, Junker, & Patz, 2015; Patz, Junker, Johnson, & Mariano, 2002). This session reports on some extensions and investigations of the basic HRM. The first paper serves as a primer to the session, providing the basic HRM formulae and notation, as well as comparisons to competing models. The second paper focuses on a parameterization of the longitudinal HRM that uses an autoregressive and/or moving average process in the estimation of latent traits over time. The third paper discusses a multidimensional extension to the HRM to be used with rubrics assessing more than one trait. The fourth paper evaluates HRM parameter estimates when the examinee population is nonnormal, and demonstrates the use of flexible options for the Bayesian prior on the latent trait.

The HRM and Other Modern Models for Multiple Ratings of Rich Responses
Brian Junker, Carnegie Mellon University

The Longitudinal Hierarchical Rater Model with Autoregressive and Moving Average Processes
Mark Bond and Jodi Casabianca, The University of Texas at Austin; Brian Junker, Carnegie Mellon University

The Hierarchical Rater Model for Multidimensional Rubrics
Ricardo Nieto, Jodi Casabianca and Brian Junker, The University of Texas at Austin

Parameter Recovery of the Hierarchical Rater Model with Nonnormal Examinee Populations
Peter Conforti and Jodi Casabianca, The University of Texas at Austin

Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, K4
The Validity of Scenario-Based Assessment: Empirical Results
Session Chair: Randy Bennett, ETS
Session Discussant: Brian Stecher, RAND
Scenario-based assessments are distinct from traditional tests in that the former present a unifying context with which all subsequent questions are associated. Among other things, that context, or scenario, is intended to provide a reasonably realistic setting and purpose for responding. The presence of the scenario should, at best, facilitate valid, fair, and reliable measurement but in no event should it impede such measurement. The facilitation of valid, fair, and reliable measurement may occur because the scenario increases motivation and engagement, provides background information to activate prior knowledge and make it more equal across students, or steps students through warm-up problems that prepare them better for undertaking a culminating performance task. Among the issues that have emerged with respect to scenario-based assessment are generalizability (e.g., students less knowledgeable or interested in the particular scenario may be disadvantaged); local dependency (i.e., items may be conditionally dependent, artificially inflating measurement precision); and scaffolding effects (e.g., the lead-in tasks may help students perform better than they otherwise would). This symposium will include three papers describing scenario-based assessments for K-12 reading, writing, and science, as well as empirical results related to their validity, fairness, and reliability. Brian Stecher, of RAND, will be the discussant.
Building and Scaling Theory-Based and Developmentally-Sensitive Scenario-Based Reading Assessments John Sabatini, Tenaha O’Reilly, Jonathan Weeks and Jonathan Steinberg, ETS Scenario-Based Assessments in Writing: An Experimental Study Randy Bennett and Mo Zhang, ETS SimScientists Assessments: Science System Framework Scenarios Edys Quellmalz, Matt Silberglitt, Barbara Buckley, Mark Loveland, Daniel Brenner and Kevin (Chun-Wei) Huang, WestEd 160 Washington, DC, USA Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 5, Meeting Room Level, Paper Session, K5 Item Design and Development Session Discussant: Ruth Childs, University of Toronto A Mixed Methods Examination of Reverse-Scored Items in Adolescent Populations Carol Barry and Haifa Matos-Elefonte, The College Board; Whitney Smiley, SAS This study is a mixed methods exploration of reverse-scored items administered to 8th graders. The quantitative portion examines the psychometric properties of a measure of academic perseverance. The qualitative portion uses think-aloud interviews to explore potential reasons for poor functioning of reverse-scored items on the instrument. Effects of Writing Skill on Scores on Justification/Evaluation Mathematics Items Tim Hazen and Catherine Welch, Iowa Testing Programs Justification/Explanation (J/E) items in Mathematics require students to justify or explain their answers, often through writing. This empirical study matches scores on J/E items with scores on Mathematics and Writing achievement tests to examine 1) unidimensionality assumptions and 2) potentially unwanted effects on scores on tests with J/E items. Economy of Multiple-Choice (MC) Versus Constructed-Response (CR) Items: Does CR Always Lose? Xuan-Adele Tan and Longjuan Liang, Educational Testing Service This study compares Multiple-Choice (MC) and Constructed-Response (CR) items across content areas and item types in terms of the cost and time required to reach a certain level of reliability. Results showed that CRs can have higher or comparable reliabilities for certain content areas. Results will help direct future test design efforts. Applying the Q-Diffusion IRT Model to Assess the Impact of Multi-Media Items Nick Redell, Qiongqiong Liu and Hao Song, National Board of Osteopathic Medical Examiners (NBOME) An application of the Q-diffusion IRT response process model to data from a timed, high-stakes licensure examination suggested that multi-media items convey additional information to examinees above and beyond the time needed to process and encode the item, and that multi-media alters response processes for select examinees. 161 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 15, Meeting Room Level, Paper Session, K6 English Learners Session Discussant: Michael Rodriguez, University of Minnesota Using Translanguaging to Assess Math Knowledge of Emergent Bilinguals: An Exploratory Study Alejandra Garcia and Fernanda Gandara, University of Massachusetts; Alexis Lopez, Educational Testing Service There are persisting gaps in mathematics scores between ELs (English learners) and non-ELs even with existing test accommodations. Translanguaging considers that bilinguals have one linguistic repertoire from which they select features strategically to communicate effectively. This study analyzed the performance of ELs on a math assessment that included items with translanguaging features.
Estimating Effects of Reclassification of English Learners Using a Propensity Score Approach Jinok Kim, Li Cai and Kilchan Choi, UCLA/CRESST Reclassification of English Learners (ELs) should be based on their readiness for mainstream classrooms. Drawing on propensity score methods, this paper estimates the effects of ELs’ reclassification on their subsequent academic outcomes in one state. Findings suggest small but positive effects for students reclassified in grades 4, 5, and 6. Comparability Study of Computer-Based and Paper-Based Tests for English Language Learners Nami Shin, Mark Hansen and Li Cai, University of California, Los Angeles/National Center for Research on Evaluation, Standards, and Student Testing (CRESST) The purpose of this study is to examine the extent to which English Language Learner (ELL) status interacts with mode of test administration on large-scale, end-of-year content assessments. Specifically, we examine whether differences in item performance or functioning across computer-based and paper-based administrations are similar for ELL and non-ELL students. Applying Hierarchical Latent Regression Models in Cross-Lingual Assessment Haiyan Lin and Xiaohong Gao, ACT, Inc. This study models the variation of examinees’ performance across groups and the interaction effect between group and person variables by applying 2- and 3-level hierarchical latent regression models in cross-lingual assessments. The simulation uses empirical estimates from two real datasets and explores different sample sizes, test lengths, and theta distributions. 162 Washington, DC, USA Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 16, Meeting Room Level, Paper Session, K7 Differential Item and Test Functioning Session Discussant: Dubravka Svetina, Indiana University Examining Sources of Gender DIF Using Cross-Classified Multilevel IRT Models Liuhan Cai and Anthony Albano, University of Nebraska–Lincoln An understanding of the sources of DIF can lead to more effective test development. This study examined gender DIF and its relationship with item format and opportunity to learn using cross-classified multilevel IRT models fit to math achievement data from an international dataset. Implications for test development are discussed. Comparing Differential Test Functioning (DTF) for DFIT Mantel-Haenszel/Liu-Agresti Variance C. Hunter and T. Oshima, Georgia State University Using simulated data, DTF was calculated using DFIT and the Mantel-Haenszel/Liu-Agresti variance method. DFIT results show an unacceptable Type I error rate for DIF conditions with unequal sample sizes, but no susceptibility to distributional differences. The variance method showed expected high rates of DTF, being especially sensitive to distributional differences. When Can MIRT Models Be a Solution for DIF? Yuan-Ling Liaw and Elizabeth Sanders, University of Washington The present study was designed to examine whether multidimensional item response theory (MIRT) models might be useful in controlling for differential item functioning (DIF) when estimating primary ability, or whether traditional (and simpler) unidimensional item response theory (UIRT) models with DIF items removed are sufficient for accurately estimating primary ability. Power Formulas for Uniform and Non-Uniform Logistic Regression DIF Tests Zhushan Li, Boston College Power formulas for the popular logistic regression tests for uniform and non-uniform DIF are derived.
The formulas provide a means for sample size calculations in planning DIF studies with logistic regression DIF tests. Factors influencing the power are discussed. The correctness of the power formulas is confirmed by simulation studies. Detecting Group Differences in Item Response Processes: An Explanatory Speed-Accuracy Mixture Model Heather Hayes, AMTIS Inc.; Stephen Gunter and Sarah Morrisey, Camber Corporation; Michael Finger, Pamela Ing and Anne Thissen-Roe, Comira For the purpose of assessing construct validity, we extend previous conjoint speed-accuracy models to simultaneously examine a) the impact of cognitive components on performance for verbal reasoning items and b) how these effects (i.e., response processes) differ among groups who vary in educational breadth and depth. 163 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 12:25 PM - 1:55 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, K8 Electronic Board #1 Extension of the Lz* Statistic to Mixed-Format Tests Sandip Sinharay, Pacific Metrics Corp Snijders (2001) suggested the lz* statistic, a popular IRT-based person-fit statistic (PFS). However, lz* can be computed only for tests consisting of dichotomous items and has not been extended to mixed-format tests. This paper extends lz* to mixed-format tests. Electronic Board #2 Examining Two New Fit Statistics for Dichotomous IRT Models Leanne Freeman and Bo Zhang, University of Wisconsin, Milwaukee This study introduces the Clarke and Vuong statistics for assessing model-data fit for dichotomous IRT models. Monte Carlo simulations will be conducted to examine the Type I error and power of the two statistics. Their performance will be compared to the likelihood ratio test, which most researchers currently use. Electronic Board #3 Automated Marking of Written Response Items in a National Medical Licensing Examination Maxim Morin, André-Philippe Boulais and André De Champlain, Medical Council of Canada Automated essay scoring (AES) offers a promising alternative to human scoring for the marking of constructed-response type items. Based on real data, the present study compared several AES conditions for scoring short-answer CR items and evaluated the impact of using AES on the overall statistics of a sample examination form. Electronic Board #4 Evaluating Automated Rater Performance: Is the State of the Art Improving? Michelle Boyer, University of Massachusetts, Amherst; Vincent Kieftenbeld, Pacific Metrics This study evaluates multiple automated raters across four different automated scoring studies to assess whether the state of the art in automated scoring is advancing. Beyond an item-by-item evaluation, the method used here investigates automated rater performance across many items. Electronic Board #5 Test-Taking Strategies and Ability Estimates in a Speeded Computerized Adaptive Test Hua Wei and Xin Li, Pearson This study compares ability estimates of examinees using different test-taking strategies towards the end of a computerized adaptive test (CAT) when they are unable to finish the test within the allotted time. Item responses will be simulated for fixed-length CAT administrations with different test lengths and different degrees of speededness. Electronic Board #6 Detecting Cheating When Examinees and Accomplices Are Not Physically Co-Located Chi-Yu Huang, Yang Lu and Nooree Huh, ACT, Inc.
A simulation study will be conducted to examine the efficiency of different statistics in detecting cheating among examinees who are physically in different locations but share highly similar item responses. Different statistics that will be investigated include a modified ω index, l_z index, H^T index, score estimation, and score prediction. 164 Washington, DC, USA Electronic Board #7 Detecting Differential Item Functioning (DIF) Using Boosting Regression Tree Xin Luo and Mark Reckase, Michigan State University; John Lockwood, ETS A classification method in data mining known as boosting regression tree (BRT) was applied to identify the items with DIF in a variety of test situations, and the effectiveness of this new method was compared with other DIF detection procedures. The results supported the quality of the BRT method. Electronic Board #8 Using Growth Mixture Modeling to Explore Test Takers’ Score Change Patterns Youhua Wei, Educational Testing Service For a large-scale and high-stakes testing program, some examinees take the test more than once and their score change patterns vary across individuals. This study uses latent class and growth mixture modeling to identify unobserved sub-populations and explore different latent score change patterns among repeaters in a testing program. Electronic Board #9 Studies of Growth in Reading in a Vertically Equated National Reading Test David Andrich and Ida Marais, University of Western Australia Australia’s yearly reading assessments for all Year 3, 5, 7 and 9 students are equated vertically. The rate of increase of the worst performing state is greater than that of the best performing one. The former’s efforts to improve reading may be missed if mean achievements alone are compared. Electronic Board #10 Examining the Impact of Longitudinal Measurement Invariance Violations on Growth Models Kelli Samonte, American Board of Internal Medicine; John Willse, University of North Carolina Greensboro Longitudinal analyses rely on the assumption that scales function invariantly across measurement occasions. Minimal research has been conducted to evaluate the impact longitudinal measurement invariance violations have on latent growth models (LGM). The current study aims to examine the impact varying degrees of longitudinal invariance violations have on LGM parameters. Electronic Board #11 Defining On-Track Towards College Readiness Using Advanced Latent Growth Modeling Techniques Anthony Fina, Iowa Testing Programs, University of Iowa The primary purpose of this exploratory study was to investigate growth at the individual level and examine how individual variability in growth is related to college readiness. Growth mixture models and a latent class growth analysis were used to define developmental trajectories from middle school through high school. Electronic Board #12 Impact of Sample Size and the Number of Common Items on Equating Hongyu Diao, Duy Pham and Lisa Keller, University of Massachusetts-Amherst Three methods of small sample equating in the non-equivalent groups anchor test design are investigated in this simulation study: circle-arc, nominal weights mean equating, and Rasch equating. Results indicate that in the presence of small samples, increasing the number of equating items might help mitigate the error.
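Of the three small-sample methods named in the abstract above, circle-arc equating may be the least familiar. The following Python sketch illustrates the symmetric variant of the method (after Livingston & Kim, 2009), assuming the two fixed end points and a mean-equating-based middle point have already been chosen; the function name and example values are hypothetical, and the code is illustrative rather than the authors' implementation.

import numpy as np

def circle_arc_equate(x, low, mid, high):
    # Circle-arc equating sketch: fit a circle through three
    # (raw score on X, equated score on Y) points and read equated values
    # off the arc that contains the middle point.
    (x1, y1), (x2, y2), (x3, y3) = low, mid, high
    # The circle center (a, b) is equidistant from all three points,
    # which yields two linear equations in a and b.
    A = np.array([[x2 - x1, y2 - y1],
                  [x3 - x1, y3 - y1]], dtype=float)
    c = 0.5 * np.array([x2**2 - x1**2 + y2**2 - y1**2,
                        x3**2 - x1**2 + y3**2 - y1**2], dtype=float)
    a, b = np.linalg.solve(A, c)   # fails if the three points are collinear (equating is then linear)
    r = np.hypot(x1 - a, y1 - b)   # circle radius
    x = np.asarray(x, dtype=float)
    # Use the upper or lower semicircle, whichever contains the middle point.
    sign = 1.0 if y2 >= b else -1.0
    return b + sign * np.sqrt(np.maximum(r**2 - (x - a)**2, 0.0))

# Hypothetical example: a 40-item form, end points fixed at (0, 0) and (40, 40),
# middle point taken from mean equating.
# equated = circle_arc_equate(np.arange(41), low=(0, 0), mid=(22.3, 24.1), high=(40, 40))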
165 2016 Annual Meeting & Training Sessions Electronic Board #13 Effect of Test Speededness on Item Parameter Estimation and Equating Can Shao, University of Notre Dame; Rongchun Zhu and Xiaohong Gao, ACT Test speededness often leads to biased parameter estimates and produces inaccurate equated scores, thus threatening test validity. In this study, we compare three different methods of dealing with test speededness and investigate their impact on item parameter estimation and equating. Electronic Board #14 Computation of Conditional Standard Error of Measurement with Compound Multinomial Models Hongling Wang, ACT, Inc. Compound multinomial models have been used to compute conditional standard error of measurement (CSEM) for tests containing polytomous scores. One problem hindering applications of these models is the great amount of computation for tests with complex item scoring. This study investigates strategies to simplify CSEM computation with compound multinomial models. Electronic Board #15 Exploring the Within-Item Speed-Accuracy Relationship with the Profile Method for Computer-Based Tests Shu-chuan Kao, Pearson The purpose of this study is to describe the effect of time on the item-person interaction for computer-based tests. The profile method shows the subgroup item difficulty conditioned on item latency. The profile trend can help testing practitioners easily inspect the effect of response time in empirical data. Electronic Board #16 Impact of Items with Minor Drift on Examinee Classification Aijun Wang, Yu Zhang and Lorin Mueller, Federation of State Boards of Physical Therapy This study examined the impact of items with minor drift on examinees’ classification accuracy at different ability levels. Results show that the pass/fail status of examinees at medium ability levels is more affected than at high or low ability levels. Electronic Board #17 Detecting DIF on Polytomous Items of Tests with Special Education Populations Kwang-lee Chu and Marc Johnson, Pearson; Pei-ying Lin, University of Saskatchewan Disability affects performance and interacts with gender/ethnicity; its impacts are more a matter of ability differences and should be isolated from DIF analysis. The effects of disability on polytomous item DIF analysis are examined. This study uses empirical data and simulations to investigate the accuracy of DIF models. Electronic Board #18 Online Calibration of Polytomous Items Using the Generalized Partial Credit Model Yi Zheng, Arizona State University Online calibration is a technology-enhanced calibration strategy that dynamically embeds pretest items in operational computerized adaptive tests and utilizes known operational item parameters to calibrate the pretest items. This study extends existing online calibration methods for dichotomous IRT models to the GPCM to model polytomous items such as performance-based items. 166 Washington, DC, USA Electronic Board #19 Identifying Intra-Individual Significant Growth in K-12 Reading and Mathematics with Adaptive Testing Chaitali Phadke, David Weiss and Theodore Christ, University of Minnesota Psychometrically significant intra-individual change in K-12 Math and Reading achievement was measured using the fixed-length (30-item) Adaptive Measurement of Change (AMC) method. Analyses indicated that the majority of change was nonlinear. Results supported the use of the AMC procedure for the detection of psychometrically significant change.
Electronic Board #20 A Comparison of Estimation Techniques for IRT Models with Small Samples Holmes Finch, Ball State University; Brian French, Washington State University Estimation accuracy of item response theory (IRT) model parameters is a concern with small samples. This can preclude the use of IRT and its associated advantages with low-incidence populations. This simulation study compares marginal maximum likelihood (ML) and pairwise estimation procedures. Results support the accuracy of pairwise estimation over ML. Electronic Board #21 Comparing Three Procedures for Preknowledge Detection in Computerized Adaptive Testing Jin Zhang and Ann Wang, ACT, Inc. One classical and two Bayesian procedures of item preknowledge detection based on the hierarchical lognormal response time model are compared for computerized adaptive testing. A simulation study is conducted to investigate the effectiveness of the methods in conditions with various proportions of items and examinees affected by item preknowledge. Electronic Board #22 Small Sample Equating for Different Uses of Test Scores in Higher Education HyeSun Lee, University of Nebraska-Lincoln; Katrina Roohr and Ou Lydia Liu, Educational Testing Service The current simulation examined four equating methods for small samples depending on the use of test scores in higher education. Mean equating performed better for the estimation of institution-level reliability, whereas identity equating performed slightly better for the estimation of value-added scores. The paper addresses practical implications of the findings. Electronic Board #23 Diagnostic Classification Modeling in Student Learning Progression Assessment Ruhan Circi, University of Colorado Boulder; Nathan Dadey, The National Center for the Improvement of Educational Assessment, Inc. A diagnostic classification model is used in this study to model a learning progression assessment. Results provided evidence of moderate item quality and support the use of the learning progression in the classroom to help students gain mastery of at least one learning outcome. 167 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Renaissance West A, Ballroom Level, Invited Session, L1 Learning from History: How K-12 Assessment Will Impact Student Learning Over the Next Decade (National Association of Assessment Directors) Session Organizer: Mary E Yakimowski, Sacred Heart University Session Panelists: Kenneth J Daly III Dale Whittington, Shaker Heights Schools Lou Fabrizio, North Carolina Department of Public Instruction Carlos Martínez, Jr, U.S. Department of Education James H McMillan, Virginia Commonwealth University Eva Baker, University of California, Los Angeles We have seen a remarkable evolution in the field of K-12 student assessment over the past 50 years. This increased attention has increased student learning, or has it? Through this invited session, you will hear panelists share insights from our history of K-12 assessment and offer lessons on how to best design and utilize assessment results so that they truly deepen student learning over the next decade. More specifically, this invited session brings together panelists representing practitioners (Mr. Kenneth J. Daly III, Dr. Dale Whittington), state and federal government agencies (Dr. Louis M. Fabrizio, Dr. Carlos Martinez) and higher education institutions (Dr. James H. McMillan, Dr. Eva Baker) with a combined experience in assessment of over 150 years.
For the introductory portion of this session, panelists have been charged with sharing reflections on significant developments in K-12 student assessment from the last half century. They will do this by reconstructing their collective memory of this assessment history. The major portion of the session will be allotted to the second charge given to the panelists; specifically, to present and discuss some lessons gained from this history to better construct and use assessments that are geared to deepen student learning during the next decade. The last part of this session will allow for interactions among the panelists and the audience on improving learning through assessments. 168 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, L2 Psychometric Issues on the Operational New-Generation Consortia Assessments Session Discussant: Timothy Davey, Educational Testing Service The theoretical foundation of online (adaptive and non-adaptive) testing has been well established. Basic components of computerized adaptive test (CAT) procedures and their implementations have also been sufficiently investigated from various perspectives (Weiss and Gage, 1984; Way, 2005; Davey, 2011). However, new and practical psychometric issues arose as online assessments moved to large-scale operational testing. In particular, newly developed new-generation Common Core State Standards (CCSS) aligned assessments became operational in a number of states. Psychometric designs affected by these changes, including scoring strategies, IRT model selection, and vertical scales, may have an impact on the validity of test scores. Furthermore, a complex test design with both a CAT component and a performance task was used for these CCSS-aligned assessments. The assessments also include innovative items in addition to traditional dichotomous and polytomous items. Therefore, findings and solutions from previous research may not be directly applicable to some of the issues that arise in operational online assessments. Innovative psychometric analyses and solutions are required. This session discusses the following practical psychometric issues addressed in the first-year operational practice of these new-generation CCSS-aligned assessments: (1) how to score an incomplete computerized adaptive test (CAT); (2) how to achieve an optimal balance between content/administration constraints and CAT efficiency in assessment designs that yield accurate ability estimates; and (3) which type of IRT model (unidimensional or multidimensional) produces more robust vertical scales for measuring student ability and growth. Three studies explore these questions using different psychometric and statistical methods based on operational data from multiple states or on simulations. Analyses and findings are not only useful in validating the characteristics of the assessments for future improvement, but will also inspire further investigation in these areas, which have not yet been fully explored.
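As background to question (1) above, one of several possible ways to score an incomplete CAT is simply to base the ability estimate on the items the examinee actually answered. The Python sketch below illustrates that option with an EAP estimate under an assumed 2PL model; it is illustrative only and does not describe any consortium's operational scoring rule (the function name and example values are hypothetical).

import numpy as np

def eap_theta(responses, a, b, nodes=np.linspace(-4, 4, 81)):
    # EAP ability estimate under a 2PL model, using only the items that were
    # administered and answered (one illustrative option for an incomplete
    # adaptive test, not an operational rule).
    responses = np.asarray(responses, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # 2PL probability of a correct response at each quadrature node
    p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))   # shape: (nodes, items)
    likelihood = np.prod(p**responses * (1.0 - p)**(1.0 - responses), axis=1)
    prior = np.exp(-0.5 * nodes**2)                        # standard normal prior (unnormalized)
    posterior = likelihood * prior
    return float(np.sum(nodes * posterior) / np.sum(posterior))

# Hypothetical example: five answered items from an unfinished adaptive test
# theta_hat = eap_theta([1, 1, 0, 1, 0],
#                       a=[1.2, 0.9, 1.5, 1.1, 0.8],
#                       b=[-0.5, 0.0, 0.3, 0.8, 1.2])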
Psychometric Issues and Approaches in Scoring Incomplete Online-Adaptive Tests Yi Du, Yanming Jiang, Terran Brown and Timothy Davey, Educational Testing Service Effects of CAT Designs on Content Balance and the Efficiency of Test Shudong Wang, Northwest Evaluation Association; Hong Jiao, University of Maryland Multidimensional Vertical Scaling for Tests with Complex Structures and Various Growth Patterns Yanming Jiang, Educational Testing Service 169 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, L3 Issues and Practices in Multilevel Item Response Models Session Chair: Ji Seung Yang, University of Maryland Session Discussant: Li Ci, University of California Educational assessment data are often collected under complex sampling designs that result in unavoidable dependency among examinees within clusters such as classrooms or schools. Multilevel item response theory (MLIRT) models have been developed (e.g., Adams, Wilson, and Wu, 1997; Fox, 2005; Kamata, 2001) to address the nested structure of item response data more properly and to draw sounder statistical inferences for both within- and between-cluster-level estimates (e.g., intraclass correlations or cluster-level latent scores). Combined with multidimensionality or local dependency among item responses (e.g., testlets), the complexity of multilevel item response models has increased and has drawn many methodologists’ attention with respect to issues and practices that cover not only modeling but also scoring and model choice. The purpose of this coordinated session is to introduce recent advanced topics in MLIRT and provide practical guidance to practitioners for implementing some of the extended MLIRT models. The session is composed of five papers. The first two papers are concerned with MLIRT models that properly reflect complex sampling designs, and the next two papers focus on the distribution of the latent density and on scoring at the between-cluster level. Finally, the last paper is on model selection methods in MLIRT. Multilevel Cross-Classified Dichotomous Item Response Theory Models for Complex Person Clustering Structures Chen Li and Hong Jiao, University of Maryland Multilevel Item Response Models with Sampling Weights Xiaying Zheng and Ji Seung Yang, University of Maryland School-Level Subscores Using Multilevel Item Factor Analysis Megan Kuhfeld and Li Cai, University of California Multilevel Item Bifactor Models with Nonnormal Latent Densities Ji Seung Yang, Ji An and Xiaying Zheng, University of Maryland Model Selection Methods for MLIRT Models: Gaining Information from Different Focused Parameters Xue Zhang and Jian Tao, Northeast Normal University; Chun Wang, University of Minnesota 170 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, L4 Psychometric Issues in Alternate Assessments Session Chair: Okan Bulut, University of Alberta Session Discussant: Michael Rodriguez, University of Minnesota Alternate assessments are designed for students with significant cognitive disabilities. They are characterized by semi-adaptive test designs, testlet-based forms, small sample sizes, and negatively skewed ability distributions.
This symposium aims to reflect on the common psychometric challenges in the context of alternate assessments, such as local item dependence (LID), differential item functioning (DIF), testlet and position effects, and the impact of cumulative item parameter drift (IPD). The alternate assessments used in this proposal are mixed-format tests that consist of both dichotomous and polytomous items. The first study explores the advantages of a four-level measurement model (1–item effect, 2–testlet effect, 3–person effect, and 4–disability type effect) for investigating local item dependence caused by item clustering and local person dependence caused by person clustering, relative to models that cannot handle both simultaneously. The second study employs the Linear Logistic Test Model (LLTM) to examine the consequences of item position and testlet position effects in alternate assessments. The use of the LLTM for investigating position effects in a semi-adaptive test form is demonstrated. The third study quantifies the advantages of three bi-factor models that take the testlet-based item structure into account and compares them with the 2PL IRT model. In addition, a DIF analysis based on each model included in the study is conducted, which helps clarify how the models differ in the context of DIF. The last study examines the cumulative impact of item parameter drift on item parameter and student ability estimates. It includes a Monte Carlo simulation for each operational administration in five states across three to nine years. Results from simulations and operational testing are compared. Effects of different equating methods are also compared. Multilevel Modeling of Item and Person Clustering Simultaneously in Alternate Assessments Chao Xie and Hyesuk Jang, American Institutes for Research Examining Item and Testlet Position Effects in Computer-Based Alternate Assessments Okan Bulut, University of Alberta; Xiaodong Hou and Ming Lei, American Institutes for Research An Application of Bi-Factor Model for Examining DIF in Alternate Assessments Hyesuk Jang and Chao Xie, American Institutes for Research Impact of Cumulative Drift on Parameter and Ability Estimates in Alternate Assessments Ming Lei, American Institutes for Research; Okan Bulut, University of Alberta 171 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 5, Meeting Room Level, Coordinated Session, L5 Recommendations for Addressing the Unintended Consequences of Increasing Examination Rigor Session Discussant: Betsy Becker, Florida State University The purpose of this symposium is to present findings from all development activities since the RTTT and address the unintended consequences of increasing examination rigor. The findings from the past 5 years of FTCE/FELE development, scoring, reporting, and standard setting procedures and outcomes will be presented. First, the FTCE/FELE program initiatives, as well as policy changes and outcomes that have occurred as a result of the increase in examination rigor, will be presented. Second, the session will provide an overview of the 1.5- to 2-year development cycle for the FTCE/FELE program and an in-depth explanation of the facilitation of each step in the test development process, based on the Standards for Educational and Psychological Testing. Third, the current psychometric, scoring and reporting, standard setting, and passing score adoption processes for the FTCE/FELE program will be discussed.
Lastly, an overview of educator candidates’ performance in response to the examinations’ increased rigor will be discussed, and analyses of student-level and test-level data will be presented to answer: What is the impact of increased rigor on the average difficulty of tests? Does increased rigor have a significant impact on test takers’ performances? Does increased rigor have a significant impact on passing rates? The Effect of Increased Rigor on Education Policy Phil Canto, Florida Department of Education Developing Assessments in an Ongoing Testing Environment Lauren White, Florida Department of Education FTCE/FELE Standard Setting and New Passing Scores: The Methodology Süleyman Olgar, Florida Department of Education Increased Rigor and Its Impact on Certification Examination Outcomes Onder Koklu, Florida Department of Education 172 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 15, Meeting Room Level, Paper Session, L6 Innovations in Assessment Session Discussant: TBA Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling Stephen Holmes, Michelle Meadows, Ian Stockford and Qingping He, Office of Qualifications and Examinations Regulation This research explores a new approach, combining comparative judgement and Rasch modelling, to investigate the comparability of difficulty of examinations. Findings from this study suggest that this approach could potentially be used as a proxy for pretesting assessments when security or other issues are a major concern. Improvements in Automated Capturing of Psycho-Linguistic Features in Reading Assessment Text Makoto Sano, Prometric This study explores psycho-linguistic features associated with reading passage MC item types that can be used to predict item difficulty levels of these item types. The effectiveness of new functions in the NLP tool PLIMAC (Sano, 2015) is evaluated using items from the NAEP Grade 8 Reading assessment. Generating Rubric Scores from Pairwise Comparisons Shayne Miel, Elijah Mayfield and David Adamson, Turnitin; Holly Garner, EverEd Technology Using pairwise comparisons to score essays on a holistic rubric is potentially a more reliable scoring method than traditional handscoring. We establish a metric for measuring the reliability of a scoring process and explore methods for assigning discrete rubric scores to the ranked list induced by the pairwise comparisons. Investigating Sequential Item Effects in a Testlet Model William Muntean and Joe Betts, Pearson Scenario-based assessments are well-suited for measuring professional decision-making skills such as clinical judgment. However, these types of items present a unique challenge to a testlet-based model because of potential sequential item effects. This research investigates the impact of sequential item effects within a testlet model. 173 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 12, Meeting Room Level, Paper Session, L7 Technology-Based Assessments Session Discussant: Mengxiao Zhu, ETS Theoretical Framework for Log-Data in Technology-Based Assessments with Empirical Applications from PISA Ulf Kroehne, Heiko Rölke, Susanne Kuger, Frank Goldhammer and Eckhard Klieme, German Institute for International Educational Research (DIPF) Indicators derived from log-data are often based on the ad hoc use of available events because a definition of log-data completeness has been missing.
This gap is filled with a theoretical framework that formalizes technology-based assessments with finite-state machines and provides completeness conditions, illustrated with empirical examples from PISA assessments. Investigating the Relations of Writing Process Features and the Final Product Chen Li, Mo Zhang and Paul Deane, Educational Testing Service Features extracted from the writing process, such as latency between keypresses, have the potential to provide evidence of one’s writing skills that is not available from the final product. This study investigates and compares the relations of process features with text quality as measured by two rubrics on writing fundamentals and higher-level skills. Interpretation of a Complex Assessment Focusing on Validity and Appropriate Reliability Assessment Steffen Brandt, Art of Reduction; Kristina Kögler, Goethe-Universität Frankfurt; Andreas Rausch, Universität Bamberg An analysis approach combining qualitative analyses of answer patterns and quantitative, IRT-based analyses is demonstrated on data from a test composed of three computer-based problem solving tasks (each 30-45 minutes). The strong qualitative component increases validity and additionally yields appropriate reliability estimates by avoiding local item dependence. Award Session: Brenda Loyd Dissertation Award 2016: Youn-Jeng Choi 174 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 13/14, Meeting Room Level, Invited Session, L8 NCME Diversity and Testing Committee Sponsored Symposium: Implications of Computer-Based Testing for Assessing Diverse Learners: Lessons Learned from the Consortia Session Moderator: Priya Kannan, Educational Testing Service Session Discussant: Bob Dolan, Diverse Learners Consulting Six consortia developed and operationally delivered next-generation, large-scale assessments in 2015. These efforts provided opportunities to re-think the ways that assessment systems, and in particular computer-based tests, are designed to support valid assessment for all learners. In this session, representatives from each consortium will describe their lessons learned in the administration of computer-based tests to diverse learners.
Topics will include design features of the assessment systems that are intended to promote effective and inclusive assessment, research and evaluation on the 2014-15 assessment administration, and future challenges and opportunities. Smarter Balanced Assessment Consortium (SBAC) Tony Alpert, Smarter Balanced Assessment Consortium Partnership for Assessment of Readiness for College and Careers (PARCC) Trinell Bowman, Prince George’s County Public Schools in Maryland National Center and State Collaborative (NCSC) Rachel Quenemoen, National Center on Educational Outcomes Dynamic Learning Maps Alternate Assessment System (DLM) Russell Swinburne Romine, University of Kansas English Language Proficiency Assessment for the 21st Century (ELPA21) Martha Thurlow, National Center on Educational Outcomes WIDA Carsten Wilmes, University of Wisconsin 175 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 3:00 PM - 7:00 PM, Meeting Room 10/11, Meeting Room Level NCME Board of Directors Meeting Members of NCME are invited to attend as observers 176 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, M1 Fairness Issues and Validation of Non-Cognitive Skills Session Chair: Haifa Matos-Elefonte, The College Board Session Discussant: Patrick Kyllonen, Educational Testing Service More research and attention are needed to ensure that assessments of noncognitive skills provide fair and valid inferences for all examinees. Four presenters will offer perspectives on non-cognitive skills and the fairness issues involved in assessing them in four contexts. The first presenter will discuss non-cognitive factors within the context of an international assessment, offering a framework for handling the interplay of cultural and linguistic diversity in developing the assessment to ensure fairness and valid interpretations for all test takers. The second presenter will provide an overview of non-cognitive skills in K-12 settings with thoughts on the issues surrounding the various threats to fair and valid interpretations. The third presenter will extend the evidence-centered-design approach to capture the needs of culturally and linguistically diverse populations in the design and development of a noncognitive assessment used in higher education, so as to ensure the fairness and validity of inferences for all examinees. The fourth presentation will provide an overview of the fairness issues involving non-cognitive measures in personnel selection and discuss specific aspects that permit these assessments to be used in fair and valid ways. Finally, a discussant will provide some comments on each of the presentations and offer additional insights.
Non-Cognitive Factors, Culture, and Fair and Valid Assessment of Culturally and Linguistically Diverse Learners Edynn Sato, Pearson Some Thoughts on Fairness Issues in Assessing Non-Cognitive Skills in K-12 Thanos Patelis, Center for Assessment An Application of Evidence-Centered-Design to Assess Collaborative Problem Solving in Higher Education Maria Elena Oliveri, Robert Mislevy and Rene Lawless, Educational Testing Service The Changing Use of Non-Cognitive Measures in Personnel Selection Kurt Geisinger, Buros Center for Testing, University of Nebraska-Lincoln 177 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, M2 Thinking About Your Audience in Designing and Evaluating Score Reports Session Chair: Priya Kannan, Educational Testing Service Session Discussant: April Zenisky, University of Massachusetts, Amherst The information presented in score reports is often the single most important point of interaction between a score user and the outcomes of an assessment. Score reports are consumed by a variety of score users (e.g., test takers, parents, teachers, administrators, policy makers), and each of these users has a different level of understanding of the assessment and its intended outcomes. The degree to which these diverse users understand the information presented in score reports impacts their ability to draw reasonable conclusions. Recent score reporting frameworks have highlighted the importance of taking into account the needs, pre-existing knowledge, and attitudes of specific stakeholder groups (Zapata-Rivera & Katz, 2014) as well as the importance of iterative design in the development of score reports (Hambleton & Zenisky, 2013). The papers in this session employ a variety of methods to identify and understand the needs of diverse stakeholder groups, and the studies highlight the importance of sequential and iterative approaches (i.e., assessing needs, prototyping, and evaluating usability and accuracy of understanding) to the design and development of audience-focused score reports. This collection of studies demonstrates how a focus on stakeholder needs can bring substantive gains for the validity of interpretations and decisions made from assessment results. Designing and Evaluating Score Reports for a Medical Licensing Examination Amanda Clauser, National Board of Medical Examiners; Francis Rick, University of Massachusetts, Amherst Evaluating Validity of Score Reports with Diverse Subgroups of Parents Priya Kannan, Diego Zapata-Rivera and Emily Leibowitz, Educational Testing Service Designing Alternate Assessment Score Reports: Implications for Instructional Planning Amy Clark, Meagan Karvonen and Neal Kingston, University of Kansas Interactive Score Reports: a Strategic and Systematic Approach to Development Richard Tannenbaum, Priya Kannan, Emily Leibowitz, Ikkyu Choi and Spyridon Papageorgiou, Educational Testing Service Data Systems and Reports as Active Participants in Data Analyses Jenny Rankin, Illuminate Education 178 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, M3 Use of Automated Tools in Listening and Reading Item Generation Session Chair: Su-Youn Yoon, ETS Session Discussant: Christy Schneider, Center for Assessment Creating a large pool of valid items with appropriate difficulty has been a continuing challenge for testing programs.
In order to address this need, several studies have focused on developing automated tools to predict the complexity of passages for reading or listening items. In addition to predicting text complexity, automated technologies can be used in a variety of ways in the context of item generation, which may contribute to increased efficiency, validity, and reliability in item development. This coordinated session will investigate the use of automated technology to support a wide range of processes for generating items that assess listening and reading skills. Aligning the TextEvaluator Reporting Scale with the Common Core Text Complexity Scale Kathleen Sheehan, ETS Prediction of Passage Acceptance/Rejection Using Linguistic Information Swapna Somasundaran, Yoko Futagi, Nitin Madnani, Nancy Glazer, Matt Chametsky and Cathy Wendler, ETS Measuring Text Complexity of Items for Adult English Language Learners Peter Foltz, Pearson and University of Colorado Boulder; Mark Rosenstein, Pearson Automatic Prediction of Difficulty of Listening Items Su-Youn Yoon, Anastassia Loukina, Youhua Wei and Jennifer Sakano, ETS Item Generation Using Natural Language Processing Based Tools and Resources Chong Min Lee, Melissa Lopez, Su-Youn Yoon, Jenifer Sakano, Anastassia Loukina, Bob Krovetz and Chi Lu, ETS 179 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, M4 Practical Issues in Equating Session Discussant: Dongmei Li, ACT Empirical Item Characteristic Curve Pre-Equating with the Presence of Test Speededness Yuxi Qiu and Anne Huggins-Manley, University of Florida This simulation study is proposed to evaluate the accuracy of the empirical item characteristic curve (EICC) pre-equating method under combinations of varied levels of test speededness, sample size, and test length. Findings of this research provide guidelines for practitioners and promote better score equating practice. Investigating the Effect of Missing and Speeded Responses in Equating Hongwook Suh, JP Kim and Tony Thompson, ACT, Inc. This study investigates the effect on equating results of different ways of dealing with examinees who showed omitted and speeded responses, by applying the lognormal response time model (van der Linden, 2006). Empirical data are manipulated to design practical situations considered in the equating procedures. The Effects of Non-Representative Common Items on Linear Equating Relationships Lu Wang, ACT, Inc./The University of Iowa; Won-Chan Lee, University of Iowa This study investigates the effects of both content and statistical representation of common items on the accuracy of four linear equating relationships. The results of this study will assist practitioners in choosing the most accurate linear equating method(s) when the representativeness of common items is a concern. Pseudo-Equating Without Common Items or Common Persons Nooree Huh, Deborah Harris and Yu Fang, ACT, Inc. In some high-stakes testing programs, it is not possible to conduct standard equating such as common-item or random-groups equating because once an item is exposed, it is no longer secure. However, the need to compare scores across administrations may still exist. This paper demonstrates some alternative approaches.
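Several of the abstracts in this session refer to linear equating relationships; for reference, a linear equating function sets standardized deviation scores equal across forms,

\[
l_Y(x) \;=\; \mu_Y + \frac{\sigma_Y}{\sigma_X}\,\bigl(x - \mu_X\bigr),
\]

where, in a common-item (nonequivalent groups) design, the means and standard deviations refer to a synthetic population and are typically estimated by methods such as Tucker, Levine, or chained linear equating.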
Equating Item Difficulty Under Sub-Optimal Conditions Michael Walker, The College Board; Usama Ali, Educational Testing Service This paper evaluates two methods for equating item difficulty statistics: one using linear equating and the other using post-stratification. The paper evaluates these methods in terms of bias and error across a range of sample sizes and population ability differences, and across chains of equating of different lengths. Impact of Drifted Common Items on Proficiency Estimates Under the CIECP Design Juan Chen, Andrew Mroch, Mengyao Zhang, Joanne Kane, Mark Connally and Mark Albanese, National Conference of Bar Examiners The authors explore the detection and impact of drifted common items on examinee proficiency estimates and examinee classification. Two different detection methods, two approaches to setting item parameter estimates, and two different linking methods are examined. Both practical and theoretical implications of the findings are discussed. 180 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 16, Meeting Room Level, Paper Session, M5 The Great Subscore Debate Session Discussant: Sandip Sinharay, Pacific Metrics How Worthless Subscores Are Causing Excessively Long Tests Howard Wainer and Richard Feinberg, National Board of Medical Examiners Previous research overwhelmingly confirms the paucity of subscores worth reporting for either individuals or institutions. Given the excessive length of most standardized tests, particularly licensure/credentialing examinations, and the lack of evidence to support reporting more than a single score, we illustrate an approach for reducing test length while minimizing additional pass/fail misclassification. An Alternative Perspective on Subscores and Their Value Yuanchao Emily Bo, Mark Hansen and Li Cai, University of California, Los Angeles; Charles Lewis, Educational Testing Service, Fordham University Recent work has shown that observed subscores are often worse predictors of true subscores than the total score. However, we propose here that it is the specific component of the subscore that should be used to judge its value. From this perspective, we reach a quite different conclusion. Masking Distinct and Reliable Subscores: A Call to Assess Added Value Invariance Joseph Rios, Educational Testing Service Subscore added value is commonly assessed for the total sample; however, this study found that up to 30% of examinees with added value can be masked when treating subscores as invariant across groups. Therefore, we should consider that subscores may be valid and reliable for some examinees and not all. Why Do Value Added Ratios Differ Under Different Scoring Approaches? Brian Leventhal, University of Pittsburgh; Jonathan Rubright, American Institute of Certified Public Accountants Using classical test theory, Haberman (2008) developed an approach to calculate whether a subscore has value in being reported. This paper shows how value added ratios differ under item response theory, and provides an empirical example showing how various scoring options under IRT impact this ratio. Accuracy of the Person-Level Index for Conditional Subscore Reporting Richard Feinberg and Mark Raymond, National Board of Medical Examiners Recent research has proposed a conditional index to detect subscore value for certain test takers when more conventional methods suggest not reporting at all.
The current study furthers this research by investigating conditions under which conditional indices detect potentially meaningful score profiles that may be worthy of reporting. The Validity of Augmented Subscores When Used for Different Purposes Marc Gessaroli, National Board of Medical Examiners The validity of augmented subscores has been debated in the literature. This paper studies the validity of augmented subscores when they are used for different purposes. The findings suggest that the usefulness of augmented subscores varies depending upon the intended use of the scores. 181 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 12, Meeting Room Level, Paper Session, M6 Scores and Scoring Rules Session Discussant: Steven Culpepper, University of Illinois The Relationship Between Pass Rate and Multiple Attempts Ying Cheng and Cheng Liu, University of Notre Dame We analytically derive the relationship between the expected conditional and marginal pass rates and the number of allowed attempts at a test under two definitions of pass rate. It is shown that, depending on the definition, the pass rate can go up or down with the number of attempts. Classification Consistency and Accuracy with Atypical Score Distributions Stella Kim and Won-Chan Lee, The University of Iowa The primary purpose of this study is to evaluate the relative performance of various procedures for estimating classification consistency and accuracy indices with atypical score distributions. Three simulation studies are conducted, each of which is associated with an atypical observed score distribution. A Psychometric Evaluation of Item-Level Scoring Rules for Educational Tests Frederik Coomans and Han van der Maas, University of Amsterdam; Peter van Rijn, ETS Global, Amsterdam; Marjan Bakker, Tilburg University; Gunter Maris, Cito Institute for Educational Measurement and University of Amsterdam We develop a modeling framework in which psychometric models can be constructed directly from a scoring rule for dichotomous and polytomous items. By assessing the fit of such a model, we can infer the extent to which the population of test takers responds in accordance with the scoring rule. For Want of Subscores in Large-Scale Educational Survey Assessment: A Simulation Study Nuo Xi, Yue Jia, Xueli Xu and Longjuan Liang, Educational Testing Service The objective of the simulation study is to investigate the impact of varying the length of content area subscales (overall and per examinee) on their prospective use in large-scale educational survey assessments. Sample size and estimation method are also controlled to evaluate the overall effect on the estimation of group statistics. Comparability of Essay Scores Across Response Modes: A Complementary View Using Multiple Approaches Nina Deng and Jennifer Dunn, Measured Progress This study evaluates the comparability of essay scores between computer-typed and handwritten responses. Multiple approaches were integrated to provide a complementary view for assessing both the statistical and practical significance of essay score differences at the factorial, scoring-dimension, and item levels. 182 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 13/14, Meeting Room Level, Invited Session, M7 On the Use and Misuse of Latent Variable Scores Session Presenter: Anders Skrondal, Norwegian Institute of Public Health One major purpose of latent variable modeling is the scoring of latent variables, such as ability estimation.
Another purpose is the investigation of relationships among latent (and possibly observed) variables. In this case, the state-of-the-art approach is simultaneous estimation of a measurement model (for the relationships between latent variables and the items measuring them) and a structural model (for the relationships between different latent variables and between latent and observed variables). An alternative approach, which is considered naive, is to use latent variable scores as proxies for latent variables. Here, estimation is simplified by first estimating the measurement model and obtaining latent variable scores, and subsequently treating the latent variable scores as observed variables in standard regression analyses. This approach will generally produce invalid estimates for the target parameters in the structural model, but we will demonstrate that valid estimates can be obtained if the scoring methods are judiciously chosen. Furthermore, the proxy approach can be superior to the state-of-the-art approach because it protects against certain misspecifications and allows doubly-robust causal inference in a class of latent variable models. 183 2016 Annual Meeting & Training Sessions 184 Washington, DC, USA Participant Index A Bertling, Masha . . . . . . . . . . . . 74, 120 Betebenner, Damian . . . . . . . . . . . . 44, 130 Betts, Joe . . . . . . . . . . . . 173 Beverly, Tanesia . . . . . . . . . . . . 89 Beymer, Lisa . . . . . . . . . . . . 74, 120 Bian, Yufang . . . . . . . . . . . . 61 Blood, Ian . . . . . . . . . . . . 84 Bo, Yuanchao Emily . . . . . . . . . . . . 181 Boeck, Paul De . . . . . . . . . . . . 49 Bohrnstedt, George . . . . . . . . . . . . 117 Bolt, Daniel . . . . . . . . . . . . 133 Bolton, Sarah . . . . . . . . . . . . 157 Bond, Mark . . . . . . . . . . . . 159 Bonifay, Wes . . . . . . . . . . . . 52 Bottge, Brian . . . . . . . . . . . . 106 Boughton, Keith . . . . . . . . . . . . 65, 152 Boulais, André-Philippe . . . . . . . . . . . . 164 Bowman, Trinell . . . . . . . . . . . . 175 Boyer, Michelle . . . . . . . . . . . . 122, 164 Bradshaw, Laine . . . . . . . . . . . . 74, 86, 120, 127, 127 Brandstrom, Adele . . . . . . . . . . . . 71 Brandt, Steffen . . . . . . . . . . . . 174 Braun, Henry . . . . . . . . . . . . 40 Brennan, Robert . . . . . . . . . . . . 30, 63, 131 Brenner, Daniel . . . . . . . . . . . . 160 Breyer, F. Jay . . . . . . . . . . . . 45 Breyer, Jay . . . . . . . . . . . . 150 Bridgeman, Brent . . . . . . . . . . . . 134 Briggs, Derek . . . . . . . . . . . . 44 Brijmohan, Amanda . . . . . . . . . . . . 109 Broaddus, Angela . . . . . . . . . . . . 72 Broer, Markus . . . . . . . . . . . . 117 Brophy, Tim . . . . . . . . . . . . 40 Brown, Derek . . . . . . . . . . . . 55 Brown, Emily . . . . . . .
. . . . . . . . . . . . . . 109 Brown, Nathaniel . . . . . . . . . . . . . . . . . . . . . . . 40 Brown, Terran . . . . . . . . . . . . . . . . . . . . . . . . . 169 Brusilovsky, Peter . . . . . . . . . . . . . . . . . . . . . . 106 Brussow, Jennifer . . . . . . . . . . . . . . . . . . . . . . . 91 Bryant, Rosalyn . . . . . . . . . . . . . . . . . . . . . . . . 76 Buchholz, Janine . . . . . . . . . . . . . . . . . . . . . . 132 Buckendahl, Chad . . . . . . . . . . . . . . 18, 83, 101, 112 Buckley, Barbara . . . . . . . . . . . . . . . . . . . . . . . 160 Buckley, Jack . . . . . . . . . . . . . . . . . . . . . . . . . 128 Budescu, David . . . . . . . . . . . . . . . . . . . . . . . 105 Bukhari, Nurliyana . . . . . . . . . . . . . . . . . . . . . . 65 Bulut, Okan . . . . . . . . . . . . . . . . . .60, 171, 171, 171 Burstein, Jill . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Bushaw, Bill . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Abad, Francisco . . . . . . . . . . . . . . . . . . . . . . . . 72 Adamson, David . . . . . . . . . . . . . . . . . . . . . . . 173 Adesope, Olusola . . . . . . . . . . . . . . . . . . . . . . . 88 Adhikari, Sam . . . . . . . . . . . . . . . . . . . . . . . . 146 Aguado, David . . . . . . . . . . . . . . . . . . . . . . . . . 72 Akbay, Lokman . . . . . . . . . . . . . . . . . . . . . . . . 74 Albanese, Mark . . . . . . . . . . . . . . . . . . . . . . . 180 Albano, Anthony . . . . . . . . . . . . . . . . . . . . 32, 163 Ali, Usama . . . . . . . . . . . . . . . . . . . . . 116, 154, 180 Allexsaht-Snider, Martha . . . . . . . . . . . . . . . . . 104 Almond, Russell . . . . . . . . . . . . . . . . . . . . . . . . 70 Alpert, Tony . . . . . . . . . . . . . . . . . . . . . . . . 67, 175 Alzen, Jessica . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Amati, Lucy . . . . . . . . . . . . . . . . . . . . . . . . . . 118 An, Ji . . . . . . . . . . . . . . . . . . . . . . . . . . . 150, 170 Anderson, Daniel . . . . . . . . . . . . . . . . . . . . . . . 59 Andrews, Benjamin . . . . . . . . . . . . . . . . . . . 46, 105 Andrich, David . . . . . . . . . . . . . . . . . . . . . . . . 165 Ankenmann, Robert . . . . . . . . . . . . . . . . . . . . . 63 Antal, Judit . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Austin, Bruce . . . . . . . . . . . . . . . . . . . . . . . . . . 88 B Baker, Eva . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 Baker, Ryan . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Bakker, Marjan . . . . . . . . . . . . . . . . . . . . . . . . 182 Balamuta, James . . . . . . . . . . . . . . . . . . . . . . . 155 Banks, Kathleen . . . . . . . . . . . . . . . . . . . . . . . 154 Bao, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Barocas, Solon . . . . . . . . . . . . . . . . . . . . . . . . . 48 Barrada, Juan . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Barrett, Michelle . . . . . . . . . . . . . . . . . . . . . 16, 113 Barry, Carol . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Barton, Karen . . . . . . . . . . . . . . . . . . . . . . . 53, 152 Bashkov, Bozhidar . . . . . . . . . . . . . . . . . . . . . . . 73 Baumer, Michal . . . . . . . . . . . . . . . . . . . . . . . . 83 Bazaldua, Diego Luna . . . . . . . . . . . . . . . . . . . 124 Beard, Jonathan . . . . . . . . . . . . . . . . . . . . . . . 128 Becker, Betsy . . . . . . . . . . . . . . . . . . . . . . . . . 172 Bejar, Isaac . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Bejar, Isaac I. . . . . . . . . . . . . 
. . . . . . . . . . . . . . 49 Belov, Dmitry . . . . . . . . . . . . . . . . . . . . . . . 54, 113 Bennett, Randy . . . . . . . . . . . . . 47, 47, 142, 160, 160 Benson, Martin . . . . . . . . . . . . . . . . . . . . . . . . . 53 Bertling, Jonas . . . . . . . . . . . . . . . . . . . . 17, 57, 57 Bertling, Maria . . . . . . . . . . . . . . . . . . . . . . . . . 53 185 2016 Annual Meeting & Training Sessions Participant Index Buxton, Cory . . . . . . . . . . . . . . . . . . . . . . . . . 104 Buzick, Heather . . . . . . . . . . . . . . . . . . . . . . . 150 Choe, Edison . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Choi, Hye-Jeong . . . . . . . . . . . . . . . . . . . . . . . 106 Choi, Ikkyu . . . . . . . . . . . . . . . . . . . . . . . . . . 178 Choi, In-Hee . . . . . . . . . . . . . . . . . . . . . . . 68, 118 Choi, Jinah . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Choi, Jiwon . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Choi, Kilchan . . . . . . . . . . . . .151, 151, 151, 151, 162 Christ, Theodore . . . . . . . . . . . . . . . . . . . . . . . 167 Chu, Kwang-lee . . . . . . . . . . . . . . . . . . . . . . . 166 Chung, Kyung Sun . . . . . . . . . . . . . . . . . . . . . 133 Chung, Seunghee . . . . . . . . . . . . . . . . . . . . 63, 123 Ci, Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Circi, Ruhan . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Cizek, Greg . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Clark, Amy . . . . . . . . . . . . . . . . . . . . . . . . 51, 178 Clauser, Amanda . . . . . . . . . . . . . . . . . . . . 43, 178 Clauser, Brian . . . . . . . . . . . . . . . . . . . . . . . . . 142 Clauser, Jerome . . . . . . . . . . . . . . . . . . . . . 63, 142 Cohen, Allan . . . . . . . . . . . . . . . . . . . . . . 104, 106 Cohen, Allan S. . . . . . . . . . . . . . . . . . . . . . . . . . 73 Cohen, Jon . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Cohen, Michael . . . . . . . . . . . . . . . . . . . . . . . 100 Colvin, Kimberly . . . . . . . . . . . . . . . . . . . . . . . 107 Conaway, Carrie . . . . . . . . . . . . . . . . . . . . . . . . 67 Conforti, Peter . . . . . . . . . . . . . . . . . . . . . . . . 159 Confrey, Jere . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Connally, Mark . . . . . . . . . . . . . . . . . . . . . . . . 180 Cook, Linda . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Coomans, Frederik . . . . . . . . . . . . . . . . . . . . . 182 Cottrell, Nicholas . . . . . . . . . . . . . . . . . . . . . . . 84 Crabtree, Ashleigh . . . . . . . . . . . . . . . . . . . . . . 62 Crane, Samuel . . . . . . . . . . . . . . . . . . . . . . . . 147 Croft, Michelle . . . . . . . . . . . . . . . . . . . . . . . . . 55 Crouch, Lori . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Cui, Zhongmin . . . . . . . . . . . . . . . . . . . . . . 85, 106 Cukadar, Ismail . . . . . . . . . . . . . . . . . . . . . . . . 120 Culpepper, Steven . . . . . . . . . . . . . . . . . . 155, 182 Cúri, Mariana . . . . . . . . . . . . . . . . . . . . . . . . . . 90 C Cahill, Aoife . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Cai, Li . . . 27, 130, 143, 151, 151, 155, 162, 162, 170, 181 Cai, Liuhan . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Cai, Yan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Cain, Jessie Montana . . . . . . . . . . . . . . . . . . . . 134 Caliço, Tiago . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Camara, Wayne . . . . . . . . . . . . . . 
. . 56, 81, 112, 145 Camilli, Greg . . . . . . . . . . . . . . . . . . . . . . . . . 133 Canto, Phil . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 Carstensen, Claus . . . . . . . . . . . . . . . . . . . . . . 155 Casabianaca, Jodi . . . . . . . . . . . . . . . . . . . . . . . 91 Casabianca, Jodi . . . . . . . . . . . . . . 159, 159, 159, 159 Castellano, Katherine Furgol . . . . . . 53, 59, 80, 80, 80, 130, 136, 141 Cavalie, Carlos . . . . . . . . . . . . . . . . . . . . . . . . . 84 Chajewski, Michael . . . . . . . . . . . . . . . . . . . . . . 50 Chametsky, Matt . . . . . . . . . . . . . . . . . . . . . . 179 Champlain, André De . . . . . . . . . . . . . . . . . . . 164 Chang, Hua-Hua . . . . . . . . . . . . . . . . . . . . . .61, 65 Chang, Hua-hua . . . . . . . . . . . . . . . . . . . . . . . . 91 Chang, Hua-Hua . . . . . . . . . . . . . . . . . . . . . . . 116 Chattergoon, Rajendra . . . . . . . . . . . . . . . . 126, 153 Chatterji, Madhabi . . . . . . . . . . . . . . . . . . . . . 117 Chayer, David . . . . . . . . . . . . . . . . . . . . . . . . . 152 Chen, Feng . . . . . . . . . . . . . . . . . . . . . . . . 72, 122 Chen, Hanwei . . . . . . . . . . . . . . . . . . . . . . . . . 85 Chen, Hui-Fang . . . . . . . . . . . . . . . . . . . . . . . 108 Chen, I-Chien . . . . . . . . . . . . . . . . . . . . . . . . . 146 Chen, Jie . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Chen, Jing . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Chen, Juan . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Chen, Keyu . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Chen, Pei-Hua . . . . . . . . . . . . . . . . . . . . . . . . . 83 Chen, Ping . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Chen, Tingting . . . . . . . . . . . . . . . . . . . . . . . . . 83 CHEN, TINGTING . . . . . . . . . . . . . . . . . . . . . . . 135 Chen, Xin . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Cheng, Britte . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Cheng, Ying . . . . . . . . . . . . . . . . . . . . . . . 154, 182 Chien, Yuehmei . . . . . . . . . . . . . . . . . . . . . . . . 86 Childs, Ruth . . . . . . . . . . . . . . . . . . . . . . . 109, 161 Cho, Youngmi . . . . . . . . . . . . . . . . . . . . . . . . . 62 Cho, YoungWoo . . . . . . . . . . . . . . . . . . . . . . . . 70 D d’Brot, Juan . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Dabbs, Beau . . . . . . . . . . . . . . . . . . . . . . . . . 146 Dadey, Nathan . . . . . . . . . . . . . . . . . . 107, 126, 167 Dai, Shenghai . . . . . . . . . . . . . . . . . . . . . . 62, 135 Daniels, Vijay . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Davey, Tim . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 Davey, Timothy . . . . . . . . . . . . . . . . . . 66, 169, 169 186 Washington, DC, USA Participant Participant Index Index F Davier, Alina von . . . .22, 41, 48, 60, 60, 85, 90, 105, 118 Davier, Matthias von . . . . . . . . . . . . . . . . . . . . . 57 Davier, Matthias Von . . . . . . . . . . . . . . . . . . . . . 68 Davier, Matthias von . . . . . . . .103, 103, 103, 103, 155 Davis, Laurie . . . . . . . . . . . . . . . . . . . . . 84, 84, 142 Davis-Becker, Susan . . . . . . . . . . . . . . . . . . . . . . 71 Deane, Paul . . . . . . . . . . . . . . . . . . . . . . . . . . 174 Debeer, Dries . . . . . . . . . . . . . . . . . . . . . . . . . 116 DeCarlo, Larry . . . . . . . . . . . . . . . . . . . . . . . . . 61 DeCarlo, Lawrence . . . . . . . . . . . . . . . . . . . . . . 
86 DeMars, Christine . . . . . . . . . . . . . . . . . . . . 73, 107 Denbleyker, Johnny . . . . . . . . . . . . . . . . . . . . . 130 Deng, Hui . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Deng, Nina . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Deters, Lauren . . . . . . . . . . . . . . . . . . . . . . . . . 43 Dhaliwal, Tasmin . . . . . . . . . . . . . . . . . . . . . . . 127 Diakow, Ronli . . . . . . . . . . . . . . . . . . . . . . . . . . 68 Diao, Hongyu . . . . . . . . . . . . . . . . . . . . . . 121, 165 DiBello, Lou . . . . . . . . . . . . . . . . . . . . . . . . . . 136 DiCerbo, Kristen . . . . . . . . . . . . . . . . . . . . . . . 127 Ding, Shuliang . . . . . . . . . . . . . . . . . . . . . . 91, 116 Dodd, Barbara . . . . . . . . . . . . . . . . . . . . . . . . . 91 Dodson, Jenny . . . . . . . . . . . . . . . . . . . . . . . . . 42 Dolan, Bob . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Domingue, Benjamin . . . . . . . . . . . . . . . . . . . . . 89 Donoghue, John . . . . . . . . . . . . 52, 52, 115, 119, 133 Dorans, Neil . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Dorn, Sherman . . . . . . . . . . . . . . . . . . . . . . . . 157 Doromal, Justin . . . . . . . . . . . . . . . . . . . . . . . 115 Drasgow, Fritz . . . . . . . . . . . . . . . . . . . . . . . . 142 Du, Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . 143, 169 Dunbar, Stephen . . . . . . . . . . . . . . . . . . . 108, 145 Dunbar, Steve . . . . . . . . . . . . . . . . . . . . . . . . . 43 Dunn, Jennifer . . . . . . . . . . . . . . . . . . . . . . 18, 182 Dunya, Beyza Aksu . . . . . . . . . . . . . . . . . . . . . . 60 Fabrizio, Lou . . . . . . . . . . . . . . . . . . . . . . . . . 168 Fahle, Erin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Famularo, Lisa . . . . . . . . . . . . . . . . . . . . . . . . . 63 Fan, Meichu . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Fan, Yuyu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Fang, Guoliang . . . . . . . . . . . . . . . . . . . . . . . . 136 Fang, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Farley, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Feinberg, Richard . . . . . . . . . . . . . . . . . . . 181, 181 Ferrara, Steve . . . . . . . . . . . . . . . . . . . . . . . .28, 49 Fina, Anthony . . . . . . . . . . . . . . . . . . . . . . . . 165 Finch, Holmes . . . . . . . . . . . . . . . . . . . . . . . . 167 Finger, Michael . . . . . . . . . . . . . . . . . . . . . . . . 163 Finn, Chester . . . . . . . . . . . . . . . . . . . . . . . . . 100 Foltz, Peter . . . . . . . . . . . . . . . . . . . . . . . . 29, 179 Forte, Ellen . . . . . . . . . . . . . . . . . . . . . . . . 79, 112 Freeman, Leanne . . . . . . . . . . . . . . . . . . . . . . 164 French, Brian . . . . . . . . . . . . . . . . . . . . . . . 88, 167 Frey, Andreas . . . . . . . . . . . . . . . . . . . . . . . . . 153 Fu, Yanyan . . . . . . . . . . . . . . . . . . . . . . . . . 86, 136 Fung, Karen . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Futagi, Yoko . . . . . . . . . . . . . . . . . . . . . . . . . . 179 G Gafni, Naomi . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Gandara, Fernanda . . . . . . . . . . . . . . . . . . . . . 162 Gao, Furong . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Gao, Lingyun . . . . . . . . . . . . . . . . . . . . . . . . . 109 Gao, Xiaohong . . . . . . . . . . . . . 58, 90, 136, 162, 166 Garcia, Alejandra . . . . . . . . . . . . . . . . . . . . . . 
162 Garner, Holly . . . . . . . . . . . . . . . . . . . . . . . . . 173 Gawade, Nandita . . . . . . . . . . . . . . . . . . . . . . . 59 Geis, Eugene . . . . . . . . . . . . . . . . . . . . . . . . . 133 Geisinger, Kurt . . . . . . . . . . . . . . . . . . . . . 142, 177 Gelbal, Selahattin . . . . . . . . . . . . . . . . . . . . . . . 45 Gessaroli, Marc . . . . . . . . . . . . . . . . . . . . . . . . 181 Gianopulos, Garron . . . . . . . . . . . . . . . . . . . . . . 44 Gierl, Mark . . . . . . . . . . . . . . . . . . . 53, 83, 142, 147 Gill, Brian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Gitchel, Dent . . . . . . . . . . . . . . . . . . . . . . . . . 148 Glazer, Nancy . . . . . . . . . . . . . . . . . . . . . . . . . 179 Goldhammer, Frank . . . . . . . . . . . . . . . 133, 153, 174 Gong, Brian . . . . . . . . . . . . . . . . . . . . . . . 107, 126 Gonzalez, Oscar . . . . . . . . . . . . . . . . . . . . . . . 120 González-Brenes, José . . . . . . . . . . . . . 106, 144, 144 González-Brenes, José Pablo . . . . . . . . . . . . . . . . 53 E Easton, John . . . . . . . . . . . . . . . . . . . . . . . . . 157 Egan, Karla . . . . . . . . . . . . . . . . . . . . . 18, 101, 102 Embretson, Susan . . . . . . . . . . . . . . . . . . . . . . . 49 Engelhardt, Lena . . . . . . . . . . . . . . . . . . . . . . 153 Ercikan, Kadriye . . . . . . . . . . . . . . . . . . . . . . . . 40 Erickan, Kadriye . . . . . . . . . . . . . . . . . . . . . . . . 99 Evans, Carla . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Ewing, Maureen . . . . . . . . . . . . . . . . . . . . . . . 128 187 2016 Annual Meeting & Training Sessions Participant Index Gotch, Chad . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Grabovsky, Irina . . . . . . . . . . . . . . . . . . . . 131, 148 Graesser, Art . . . . . . . . . . . . . . . . . . . . . . . . . . 41 Graf, Edith Aurora . . . . . . . . . . . . . . . . . . . . 49, 109 Greiff, Samuel . . . . . . . . . . . . . . . . . . . . . . 41, 103 Griffin, Patrick . . . . . . . . . . . . . . . . . . . . . . . . . 89 Grochowalski, Joe . . . . . . . . . . . . . . . . . . . . . . . 58 Grochowalski, Joseph . . . . . . . . . . . . . . . . . . . 148 Groos, Janet Koster van . . . . . . . . . . . . . . . . . . . 88 Grosse, Philip . . . . . . . . . . . . . . . . . . . . . . . . . 123 Gu, Lixiong . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Guerreiro, Meg . . . . . . . . . . . . . . . . . . . . . . . . 108 Gunter, Stephen . . . . . . . . . . . . . . . . . . . . . . . 163 Guo, Hongwen . . . . . . . . . . . . . . . . . . . .65, 88, 119 Guo, Qi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Guo, Rui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Henson, Robert . . . . . . . . . . . . . . . . . . 86, 110, 136 Herman, Joan . . . . . . . . . . . . . . . . . . . . . . . . . 47 Herrera, Bill . . . . . . . . . . . . . . . . . . . . . . . . 43, 108 Heuvel, Jill R. van den . . . . . . . . . . . . . . . . . . . 141 Hillier, Tracey . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Himelfarb, Igor . . . . . . . . . . . . . . . . . . . . . . . . 136 Ho, Andrew . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Ho, Emily . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Hochstedt, Kirsten . . . . . . . . . . . . . . . . . . . . . 123 Hochweber, Jan . . . . . . . . . . . . . . . . . . . . . . . . 43 Hoff, David . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Hogan, Thomas . . . . . . . . . . . . . . . . . . . . . . . 
110 Holmes, Stephen . . . . . . . . . . . . . . . . . . . . . . 173 Hong, Guanglei . . . . . . . . . . . . . . . . . . . . . . . . 21 Hou, Likun . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Hou, Xiaodong . . . . . . . . . . . . . . . . . . . . . . . . 171 Houts, Carrie R. . . . . . . . . . . . . . . . . . . . . . . . . . 27 Huang, Cheng-Yi . . . . . . . . . . . . . . . . . . . . . . . 83 Huang, Chi-Yu . . . . . . . . . . . . . . . . . . . . . . . . 164 Huang, Kevin (Chun-Wei) . . . . . . . . . . . . . . . . . 160 Huang, Xiaorui . . . . . . . . . . . . . . . . . . . . . . . . . 46 Huang, Yun . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Huff, Kristen . . . . . . . . . . . . . . . . . . . . . . . . . 149 Huggins-Manley, Anne . . . . . . . . . . . . . . . . . . . 180 Huggins-Manley, Anne Corinne . . . . . . . . . . . . . . 46 Hughes, Malorie . . . . . . . . . . . . . . . . . . . . . . . 147 Huh, Nooree . . . . . . . . . . . . . . . . . . . . . . 164, 180 Hunter, C. . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Huo, Yan . . . . . . . . . . . . . . . . . . . . . . . . . . 90, 143 Hurtz, Gregory . . . . . . . . . . . . . . . . . . . . . . 54, 134 Hwang, Dasom . . . . . . . . . . . . . . . . . . . . . . . . .75 H Haberman, Shelby . . . . . . . . . . . . . . . . . . . 80, 150 Hacker, Miriam . . . . . . . . . . . . . . . . . . . . . . . . 133 Haertel, Edward . . . . . . . . . . . . . . . . . . . . . . . 142 Hain, Bonnie . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Hakuta, Kenji . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Hall, Erika . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 Han, HyunSuk . . . . . . . . . . . . . . . . . . . . . . . . . 46 Han, Kyung Chris . . . . . . . . . . . . . . . . . . . . . . . 22 Han, Zhuangzhuang . . . . . . . . . . . . . . . . . . . . 103 Hansen, Mark . . . . . . . . . . . . . . . . .73, 135, 162, 181 Hao, Jiangang . . . . . . . . . . . . . . . . . . . . . . . . . 41 Happel, Jay . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Harnly, Aaron . . . . . . . . . . . . . . . . . . . . . . . . . 147 Harrell, Lauren . . . . . . . . . . . . . . . . . . . . . . 57, 155 Harring, Jeffrey . . . . . . . . . . . . . . . . . . . . . . . . . 62 Harris, Debora . . . . . . . . . . . . . . . . . . . . . . . . 152 Harris, Deborah . . . . . . . . . . . . . . . . . . . . . 25, 180 Hartig, Johannes . . . . . . . . . . . . . . . . . . . . 43, 132 Hattie, John . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Hayes, Benjamin . . . . . . . . . . . . . . . . . . . . . . . . 51 Hayes, Heather . . . . . . . . . . . . . . . . . . . . . . . . 163 Hayes, Stacy . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Hazen, Tim . . . . . . . . . . . . . . . . . . . . . . . . . . 161 He, Qingping . . . . . . . . . . . . . . . . . . . . . . 154, 173 He, Qiwei . . . . . . . . . . . . . . . . . . . . . 103, 103, 103 He, Yong . . . . . . . . . . . . . . . . . . . . . . . .70, 85, 106 Hebert, Andrea . . . . . . . . . . . . . . . . . . . . . . . 135 Hembry, Tracey . . . . . . . . . . . . . . . . . . . . . . . 127 Hendrie, Caroline . . . . . . . . . . . . . . . . . . . . . . 149 I Iaconangelo, Charles . . . . . . . . . . . . . . . . . . 80, 123 III, Kenneth J Daly . . . . . . . . . . . . . . . . . . . . . . 168 Ing, Pamela . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Insko, Bill . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Insko, William . . . . . . . . . . . . . . . . . . . . . . . . . . 
71 Invernizzi, Marcia . . . . . . . . . . . . . . . . . . . . . . . 50 Irribarra, David Torres . . . . . . . . . . . . . . . . . . . . 68 Iverson, Andrew . . . . . . . . . . . . . . . . . . . . . . . . 76 J Jacovidis, Jessica . . . . . . . . . . . . . . . . . . . . . . 107 Jang, Hyesuk . . . . . . . . . . . . . . . . . . . . 73, 171, 171 Jang, Yoonsun . . . . . . . . . . . . . . . . . . . . . . . . . 77 188 Washington, DC, USA Participant Index Jess, Nicole . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Jewsbury, Paul . . . . . . . . . . . . . . . . . . . . . . . . . 57 Ji, Grace . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Jia, Helena . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Jia, Yue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Jiang, Shengyu . . . . . . . . . . . . . . . . . . . . . . . . . 75 Jiang, Yanming . . . . . . . . . . . . . . . . . . . . . 169, 169 Jiang, Zhehan . . . . . . . . . . . . . . . . . . . . . . . . 134 Jiao, Hong . . . . . . . . . . . . . . . . 46, 89, 109, 169, 170 Jin, Kuan-Yu . . . . . . . . . . . . . . . . . . . . . . . 107, 108 Jin, Rong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Johnson, Evelyn . . . . . . . . . . . . . . . . . . . . . 74, 120 Johnson, Marc . . . . . . . . . . . . . . . . . . . . . . . . 166 Johnson, Matthew . . . . . . . . . . . . . . . . . . . . 52, 80 Jones,, Ryan Seth . . . . . . . . . . . . . . . . . . . . . . . 44 Joo, Seang-hwane . . . . . . . . . . . . . . . . . . . . . . 148 Ju, Unhee . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Julian, Marc . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Julrich, Daniel . . . . . . . . . . . . . . . . . . . . . . . . 131 Jung, KwangHee . . . . . . . . . . . . . . . . . . . . . . . .62 Junker, Brian . . . . . . . . . . . . . . . . . . . 159, 159, 159 Kim, Dong-in . . . . . . . . . . . . . . . . . . . . . . . . . 152 Kim, Doyoung . . . . . . . . . . . . . . . . . . . . . . . . . 85 Kim, Han Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 Kim, Hyung Jin . . . . . . . . . . . . . . . . . . . . . . . . 131 Kim, Ja Young . . . . . . . . . . . . . . . . . . . . . . . . . 70 Kim, Jinok . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Kim, Jong . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Kim, JP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Kim, Jungnam . . . . . . . . . . . . . . . . . . . . . . 64, 152 Kim, Meereem . . . . . . . . . . . . . . . . . . . . . . . . 104 Kim, Nana . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Kim, Se-Kang . . . . . . . . . . . . . . . . . . . . . . . 58, 148 Kim, Seohyun . . . . . . . . . . . . . . . . . . . . . . . . 104 Kim, Sooyeon . . . . . . . . . . . . . . . . . . . . . 64, 64, 90 Kim, Stella . . . . . . . . . . . . . . . . . . . . . . . . 121, 182 Kim, Sunhee . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Kim, Wonsuk . . . . . . . . . . . . . . . . . . . . . . . . . 135 Kim, Yongnam . . . . . . . . . . . . . . . . . . . . . . . . 151 Kim, Young Yee . . . . . . . . . . . . . . . . . . . . . . 19, 117 King, Kristin . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Kingston, Neal . . . . . . . . . . . . . . . . . . . . . . 47, 178 Kingston, Neal Martin . . . . . . . . . . . . . . . . . . . 134 Klieme, Eckhard . . . . . . . . . . . . . . . . . . . . . . . 174 Kobrin, Jennifer . . . . . . . . . . . . . . . . . . . . . 89, 127 Kögler, Kristina . . . . . . . . . . . . . . . . . . . . . . . . 
174 Köhler, Carmen . . . . . . . . . . . . . . . . . . . . . . . 155 Koklu, Onder . . . . . . . . . . . . . . . . . . . . . . . . . 172 Kolen, Michael . . . . . . . . . . . . . . . . . . . . . . 30, 142 Kong, Xiaojing . . . . . . . . . . . . . . . . . . . . . . .84, 84 Konold, Tim . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Kosh, Audra . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Kroehne, Ulf . . . . . . . . . . . . . . . . . . . . . . . . . 174 Kröhne, Ulf . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Krost, Kevin . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Krovetz, Bob . . . . . . . . . . . . . . . . . . . . . . . . . 179 Kuger, Susanne . . . . . . . . . . . . . . . . . . . . . . . 174 Kuhfeld, Megan . . . . . . . . . . . . . . . . . 143, 151, 170 Kuo, Tzu Chun . . . . . . . . . . . . . . . . . . . . . . . . 124 Kupermintz, Haggai . . . . . . . . . . . . . . . . . . . . . 41 Kyllonen, Patrick . . . . . . . . . . . . . . . . . . .17, 41, 177 K Kaliski, Pamela . . . . . . . . . . . . . . . . . . . . . . 40, 128 Kamenetz, Anya . . . . . . . . . . . . . . . . . . . . . . . 149 Kane, Joanne . . . . . . . . . . . . . . . . . . . . . . . . . 180 Kane, Michael . . . . . . . . . . . . . . . . . . . . . . . . 145 Kang, Hyeon-Ah . . . . . . . . . . . . . . . . . . . . . . . 119 Kang, Yoon Jeong . . . . . . . . . . . . . . . . . . . . . . 109 Kang, Yujin . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Kannan, Priya . . . . . . . . . . . . . 43, 175, 178, 178, 178 Kanneganti, Raghuveer . . . . . . . . . . . . 129, 129, 150 Kao, Shu-chuan . . . . . . . . . . . . . . . . . . . . . . . 166 Kaplan, David . . . . . . . . . . . . . . . . . . . . . . . . . 57 Kapoor, Shalini . . . . . . . . . . . . . . . . . . . . . . . . . 43 Karadavut, Tugba . . . . . . . . . . . . . . . . . . . . . . . 73 Karvonen, Meagan . . . . . . . . . . . . . . 51, 79, 89, 178 Keller, Lisa . . . . . . . . . . . . . . . . . . . . . . 18, 135, 165 Keller, Rob . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Kelly, Justin . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Keng, Leslie . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Kenyon, Dorry . . . . . . . . . . . . . . . . . . . . . . .42, 79 Kern, Justin . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Khan, Gulam . . . . . . . . . . . . . . . . . . . . . . . . . 109 Kieftenbeld, Vincent . . . . . . . . . . . 104, 129, 150, 164 Kilinc, Murat . . . . . . . . . . . . . . . . . . . . . . . . . 148 Kim, Dong-In . . . . . . . . . . . . . . . . . . . . .64, 65, 152 L LaFond, Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Lai, Emily . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Lai, Hollis . . . . . . . . . . . . . . . . . . . . . . . . . 53, 142 Laitusis, Cara . . . . . . . . . . . . . . . . . . . . . . . . . 142 Lane, Suzanne . . . . . . . . . . . . . . . . . . . .40, 79, 128 189 2016 Annual Meeting & Training Sessions Participant Index Lao, Hongling . . . . . . . . . . . . . . . . . . . . . . . . . 86 Larsson, Lisa . . . . . . . . . . . . . . . . . . . . . . . . . 105 Lash, Andrea . . . . . . . . . . . . . . . . . . . . . . . . . . 51 Lathrop, Quinn . . . . . . . . . . . . . . . . . . . . . . .50, 91 Latifi, Syed Muhammad Fahad . . . . . . . . . . . . . . 147 Lawless, Rene . . . . . . . . . . . . . . . . . . . . . . . . 177 Lawson, Janelle . . . . . . . . . . . . . . . . . . . . . . . 115 Leacock, Claudia . . . . . . . . . . . . . . . . . . 29, 104, 129 Lebeau, Adena . . . 
. . . . . . . . . . . . . . . . . . . . . . 45 LeBeau, Brandon . . . . . . . . . . . . . . . . . . . . . . . 56 Lee, Chansoon . . . . . . . . . . . . . . . . . . . . . . . . 109 Lee, Chong Min . . . . . . . . . . . . . . . . . . . . . . . 179 Lee, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Lee, HyeSun . . . . . . . . . . . . . . . . . . . . . . . . . 167 Lee, Philseok . . . . . . . . . . . . . . . . . . . . . . . . . 148 Lee, Richard . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Lee, Sora . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Lee, Won-Chan . . . . . 63, 70, 90, 105, 131, 131, 180, 182 Lee, Woo-yeol . . . . . . . . . . . . . . . . . . . . . . . . . 75 Lee, Yi-Hsuan . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Lee, Young-Sun . . . . . . . . . . . . . . . . . . . . . . 61, 86 Lei, Ming . . . . . . . . . . . . . . . . . . . . . . . . . 171, 171 Leibowitz, Emily . . . . . . . . . . . . . . . . . . . . 178, 178 Leighton, Jacqueline . . . . . . . . . . . . . . . . . . 63, 127 Leventhal, Brian . . . . . . . . . . . . . . . 74, 77, 120, 181 Levy, Roy . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Lewis, Charles . . . . . . . . . . . . . . 45, 52, 85, 148, 181 Li, Chen . . . . . . . . . . . . . . . . . . . . . . 150, 170, 174 Li, Cheng-Hsien . . . . . . . . . . . . . . . . . . . . . . . . 62 Li, Dongmei . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Li, Feifei . . . . . . . . . . . . . . . . . . . . . . . . . . 66, 143 Li, Feiming . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Li, Isaac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Li, Jie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Li, Ming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Li, Tongyun . . . . . . . . . . . . . . . . . . . . . . . . .62, 65 Li, Xiaomin . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Li, Xin . . . . . . . . . . . . . . . . . . . 25, 70, 110, 118, 164 Li, Ying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Li, Zhushan . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Liang, Longjuan . . . . . . . . . . . . . . . . . . . . 161, 182 Liao, Chi-Wen . . . . . . . . . . . . . . . . . . . . . . . . . 116 Liao, Dandan . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Liaw, Yuan-Ling . . . . . . . . . . . . . . . . . . . . . . . 163 Lievens, Filip . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Lim, Euijin . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Lim, EunYoung . . . . . . . . . . . . . . . . . . . . . . . . . 92 Lim, MiYoun . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Lin, Chih-Kai . . . . . . . . . . . . . . . . . . . . . . . . . . 58 Lin, Haiyan . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Lin, Johnny . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Lin, Meiko . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Lin, Pei-ying . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Lin, Peng . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Lin, Ye . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Linden, Wim van der . . . . . . . . . . . . . . . . . . 83, 113 Ling, Guangming . . . . . . . . . . . . . . . . . . . . 84, 129 Lissitz, Robert . . . . . . . . . . . . . . . . . . . . 46, 89, 109 Liu, Cheng . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Liu, Chunyan . . . . . . . . . . . . . . . . . . . . . . . .85, 85 Liu, Hongyun . . . . . . . . . . . . . . . . . . . . 
. . 107, 110 Liu, Jinghua . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Liu, Lei . . . . . . . . . . . . . . . . . . . . . . . . . . . .41, 41 Liu, lou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Liu, Ou Lydia . . . . . . . . . . . . . . . . . . . . . . . . . 167 Liu, Qiongqiong . . . . . . . . . . . . . . . . . . . . . . . 161 Liu, Ren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Liu, Ruitao . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Liu, Xiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Liu, Yang . . . . . . . . . . . . . . . . . . . . . . 119, 132, 153 Liu, Yanlou . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Liu, Yue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Lockwood, J.R. . . . . . . . . . . . . .80, 80, 80, 80, 80, 130 Lockwood, John . . . . . . . . . . . . . . . . . . . . . . . 165 Longabach, Tanya . . . . . . . . . . . . . . . . . . . . . . . 64 Lopez, Alexis . . . . . . . . . . . . . . . . . . . . . . . 84, 162 Lopez, Melissa . . . . . . . . . . . . . . . . . . . . . . . . 179 Lord-Bessen, Jennifer . . . . . . . . . . . . . . . . . . . . . 71 Lorié, William . . . . . . . . . . . . . . . . . . . . . . 117, 144 Lottridge, Susan . . . . . . . . . . . . . . . . . . . . . . . 104 Loughran, Jessica . . . . . . . . . . . . . . . . . . . . .65, 91 Loukina, Anastassia . . . . . . . . . . . . 134, 150, 179, 179 Loveland, Mark . . . . . . . . . . . . . . . . . . . . . . . . 160 Lu, Chi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Lu, Lucy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Lu, Ru . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64, 64 Lu, Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Lu, Ying . . . . . . . . . . . . . . . . . . . . . . . . . . 66, 133 Lu, Zhenqui . . . . . . . . . . . . . . . . . . . . . . . . . . 104 LUO, Fen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Luo, Xiao . . . . . . . . . . . . . . . . . . . . . . . . . . 45, 85 Luo, Xin . . . . . . . . . . . . . . . . . . . . . . . 45, 123, 165 Lynch, Ryan . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Lyons, Susan . . . . . . . . . . . . . . . . . . . . . . . . . 126 190 Washington, DC, USA Participant Index M Mix, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Miyazaki, Yasuo . . . . . . . . . . . . . . . . . . . . . . . 132 Monroe, Scott . . . . . . . . . . . . . . . . . . . . . . 73, 130 Montee, Meg . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Montee, Megan . . . . . . . . . . . . . . . . . . . . . . . . 42 Moon, Jung Aa . . . . . . . . . . . . . . . . . . . . . . . . . 88 Moretti, Antonio . . . . . . . . . . . . . . . . . . . . 144, 144 Morgan, Deanna . . . . . . . . . . . . . . . . . . . . . 71, 115 Morin, Maxim . . . . . . . . . . . . . . . . . . . . . . . . 164 Morris, Carrie . . . . . . . . . . . . . . . . . . . . . . 118, 148 Morris, John . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Morrisey, Sarah . . . . . . . . . . . . . . . . . . . . . . . 163 Morrison, Kristin . . . . . . . . . . . . . . . . . . . . . .49, 84 Moses, Tim . . . . . . . . . . . . . . . . . . . . . . . 128, 134 Mroch, Andrew . . . . . . . . . . . . . . . . . . . . . . . 180 Mueller, Lorin . . . . . . . . . . . . . . . . . . . . 54, 119, 166 Mulholland, Matthew . . . . . . . . . . . . . . . . . . . 104 Muntean, William . . . . . . . . . . . . . . . . . . . . . . 173 Murphy, Stephen . . . . . . . . . . . . . . . . . . . . 
. 50, 71 Musser, Samantha . . . . . . . . . . . . . . . . . . . . . . . 42 Ma, Wenchao . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Maas, Han van der . . . . . . . . . . . . . . . . . . . . . 182 MacGregor, David . . . . . . . . . . . . . . . . . . . . . . . 42 Macready, George . . . . . . . . . . . . . . . . . . . . . . . 62 Madnani, Nitin . . . . . . . . . . . . . . . . . . . . . . 48, 179 Maeda, Hotaka . . . . . . . . . . . . . . . . . . . . . . . . . 75 Magaram, Eric . . . . . . . . . . . . . . . . . . . . . . . . 136 Magnus, Brooke . . . . . . . . . . . . . . . . . . . . . . . 153 Malone, Meg . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Mao, Liyang . . . . . . . . . . . . . . . . . . . . . . . . 65, 104 Mao, Xia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Marais, Ida . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Margolis, Melissa . . . . . . . . . . . . . . . . . . . . . . 142 Marini, Jessica . . . . . . . . . . . . . . . . . . . . . . . . 128 Marion, Scott . . . . . . . . . . . . . . . . . 40, 44, 126, 126 Maris, Gunter . . . . . . . . . . . . . . . . . . . . . . . . . 182 Martineau, Joe . . . . . . . . . . . . . . . . . . . . . . . . 158 Martineau, Joseph . . . . . . . . . . . . . . . . . . . 44, 101 Martínez, Jr, Carlos . . . . . . . . . . . . . . . . . . . . . 168 Masri, Yasmine El . . . . . . . . . . . . . . . . . . . . . . 155 Masters, Jessica . . . . . . . . . . . . . . . . . . . . . . . . 63 Matlock, Ki . . . . . . . . . . . . . . . . . . . . . . . . 46, 148 Matos-Elefonte, Haifa . . . . . . . . . . . . . . . . 161, 177 Matovinovic, Donna . . . . . . . . . . . . . . . . . . . . . 67 Matta, Tyler . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Maul, Andrew . . . . . . . . . . . . . . . . . . . . . 117, 153 Mayfield, Elijah . . . . . . . . . . . . . . . . . . . . . . . . 173 Mazany, Terry . . . . . . . . . . . . . . . . . . . . . . . . . 100 McBride, Yuanyuan . . . . . . . . . . . . . . . . . . . .84, 84 McCaffrey, Daniel . . . . . . . . . . . . . . . . . .80, 80, 130 McCall, Marty . . . . . . . . . . . . . . . . . . . . . . . . . . 79 McClellan, Catherine . . . . . . . . . . . . . . . . . 115, 154 McKnight, Kathy . . . . . . . . . . . . . . . . . 144, 144, 158 McMillan, James H . . . . . . . . . . . . . . . . . . . . . 168 McTavish, Thomas . . . . . . . . . . . . . . . . . . . . . . 144 Meador, Chris . . . . . . . . . . . . . . . . . . . . . . . . . 53 Meadows, Michelle . . . . . . . . . . . . . . . . . . 154, 173 Mehta, Vandhana . . . . . . . . . . . . . . . . . . . . . . . 53 Meng, Xiangbing . . . . . . . . . . . . . . . . . . . . . . . 73 Mercado, Ricardo . . . . . . . . . . . . . . . . . . . . . . . 71 Meyer, Patrick . . . . . . . . . . . . . . . . . . . . . . 50, 115 Meyer, Robert . . . . . . . . . . . . . . . . . . . . . . . . . 59 Miel, Shayne . . . . . . . . . . . . . . . . . . . . . . 147, 173 Miller, Sherral . . . . . . . . . . . . . . . . . . . . . . . . . 128 Minchen, Nathan . . . . . . . . . . . . . . . . . . . . . . . 74 Mislevy, Robert . . . . . . . . . . . . . . . . . . . . . . . 177 Mix, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 N Naumann, Alexander . . . . . . . . . . . . . . . . . . . . . 43 Naumann, Johannes . . . . . . . . . . . . . . . . . . . . 153 Naumenko, Oksana . . . . . . . . . . . . . . . . . . . . . 136 Nebelsick-Gullet, Lori . . . . . . . . . . . . . . . . . . 43, 101 Nebelsick-Gullett, Lori . . . . . . . . . . . . . . . . . . . 108 Neito, Ricardo . . . . . . . . . . 
. . . . . . . . . . . . 74, 120 Nicewander, W. . . . . . . . . . . . . . . . . . . . . . . . . 51 Niekrasz, John . . . . . . . . . . . . . . . . . . . . . . . . 147 Nieto, Ricardo . . . . . . . . . . . . . . . . . . . . . . . . 159 Noh, Eunhee . . . . . . . . . . . . . . . . . . . . . . . . . . 92 Norris, Mary . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Norton, Jennifer . . . . . . . . . . . . . . . . . . . . . . . . 42 Nydick, Steven . . . . . . . . . . . . . . . . . . . . . . . . . 85 O O’Brien, Sue . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 O’Connor, Brendan . . . . . . . . . . . . . . . . . . . . . . 48 O’Leary, Timothy . . . . . . . . . . . . . . . . . . . . . . . .89 O’Reilly, Tenaha . . . . . . . . . . . . . . . . . . . . . . . 160 Oakes, Jeannie . . . . . . . . . . . . . . . . . . . . . . . . 125 Ogut, Burhan . . . . . . . . . . . . . . . . . . . . . . . . . 117 Oh, Hyeon-Joo . . . . . . . . . . . . . . . . . . . . . . . . . 63 Olea, Julio . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Olgar, Süleyman . . . . . . . . . . . . . . . . . . . . . 70, 172 191 2016 Annual Meeting & Training Sessions Participant Index Oliveri, Maria Elena . . . . . . . . . . . . . . . . . . . . . 177 Olsen, James . . . . . . . . . . . . . . . . . . . . . . . . . 108 Oppenheim, Peter . . . . . . . . . . . . . . . . . . . . . 157 Orpwood, Graham . . . . . . . . . . . . . . . . . . . . . 109 Oshima, T. . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Özdemir, Burhanettin . . . . . . . . . . . . . . . . . . . . 45 Quellmalz, Edys . . . . . . . . . . . . . . . . . . . . . . . 160 Quenemoen, Rachel . . . . . . . . . . . . . . . . . . . . 175 R Rahman, Nazia . . . . . . . . . . . . . . . . . . . . . . . . . 52 Rankin, Jenny . . . . . . . . . . . . . . . . . . . . . . . . 178 Rausch, Andreas . . . . . . . . . . . . . . . . . . . . . . . 174 Rawls, Anita . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Raymond, Mark . . . . . . . . . . . . . . . . . . . . . . . 181 Reboucas, Daniella . . . . . . . . . . . . . . . . . . . . . 154 Reckase, Mark . . . . . . . . . 20, 42, 45, 99, 116, 142, 165 Redell, Nick . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Reichenberg, Ray . . . . . . . . . . . . . . . . . .74, 74, 120 Renn, Jennifer . . . . . . . . . . . . . . . . . . . . . . . . . 42 Reshetar, Rosemary . . . . . . . . . . . . . . . . . . . . . 128 Reshetnyak, Evgeniya . . . . . . . . . . . . . . . . . . . . 85 Ricarte, Thales . . . . . . . . . . . . . . . . . . . . . . . . . 90 Rich, Changhua . . . . . . . . . . . . . . . . . . . . . . . 110 Rick, Francis . . . . . . . . . . . . . . . . . . . . . . . . 43, 178 Rickels, Heather . . . . . . . . . . . . . . . . . . . . . . . 108 Rijiman, Frank . . . . . . . . . . . . . . . . . . . . . . . . 152 Rijmen, Frank . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Rijn, Peter van . . . . . . . . . . 63, 109, 116, 132, 155, 182 Rios, Joseph . . . . . . . . . . . . . . . . . . . . . . 132, 181 Risk, Nicole . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Roberts, Mary Roduta . . . . . . . . . . . . . . . . . . . . 64 Robin, Frederic . . . . . . . . . . . . . . . . . . . . . . 65, 119 Rodriguez, Michael . . . . . . . . 32, 56, 60, 153, 162, 171 Rogers, H. Jane . . . . . . . . . . . . . . . . . . . . . . . . 109 Rölke, Heiko . . . . . . . . . . . . . . . . . . . . . . . . . 174 Rollins, Jonathan . . . . . . . . . . . . . . . . . . . . 86, 115 Rome, Logan . . . . . . . . . . . . . . . . . . . . . . . . . 
121 Romine, Russell Swinburne . . . . . . . . . . . .51, 89, 175 Roohr, Katrina . . . . . . . . . . . . . . . . . . . . . . . . 167 Rorick, Beth . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Rosen, Yigal . . . . . . . . . . . . . . . . . . . . . . . . .41, 41 Rosenstein, Mark . . . . . . . . . . . . . . . . . . . . . . 179 Roussos, Louis . . . . . . . . . . . . . . . . . . . . . . 54, 135 Rubright, Jonathan . . . . . . . . . . . . . 46, 85, 147, 181 Runyon, Christopher . . . . . . . . . . . . . . . . . . . . . 91 Rupp, André . . . . . . . . . . . . . . . . . . . . . . . . 29, 53 Rutkowski, Leslie . . . . . . . . . . . . . . . . . . . . 57, 118 Rutstein, Daisy . . . . . . . . . . . . . . . . . . . . . . 40, 147 P Pak, Seohong . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Palma, Jose . . . . . . . . . . . . . . . . . . . . . . . . 60, 153 Pan, Tianshu . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Papa, Frank . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Papageorgiou, Spyridon . . . . . . . . . . . . . . . . . . 178 Pardos, Zachary . . . . . . . . . . . . . . . . . . . . . . . 144 Park, Jiyoon . . . . . . . . . . . . . . . . . . . . . . . . 54, 119 Park, Trevor . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Park, Yoon Soo . . . . . . . . . . . . . . . . . . . . . . . . . 61 Pashley, Peter . . . . . . . . . . . . . . . . . . . . . . . 52, 89 Patel, Priyank . . . . . . . . . . . . . . . . . . . . . . . 71, 115 Patelis, Thanos . . . . . . . . . . . . . . . 112, 145, 145, 177 Patterson, Brian . . . . . . . . . . . . . . . . . . . . . . . 159 Patz, Rich . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 Peabody, Michael . . . . . . . . . . . . . . . . . . . . . . . 71 Peck, Fred . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Peng, Luyao . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Perie, Marianne . . . . . . . . . . . . . . . . 65, 79, 108, 157 Peterson, Mary . . . . . . . . . . . . . . . . . . . . . . . . . 51 Phadke, Chaitali . . . . . . . . . . . . . . . . . . . . . . . 167 Pham, Duy . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Phan, Ha . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Phelan, Jonathan . . . . . . . . . . . . . . . . . . . . . . 117 Phillips, S E . . . . . . . . . . . . . . . . . . . . . . . . . . 158 Phillips, S.E. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Plake, Barbara . . . . . . . . . . . . . . . . . . . . . 112, 158 Pohl, Steffi . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Polikoff, Morgan . . . . . . . . . . . . . . . . . . . . . . . . 67 Por, Han-Hui . . . . . . . . . . . . . . . . . . . . . . 105, 134 Powers, Donald . . . . . . . . . . . . . . . . . . . . . . . 134 Powers, Sonya . . . . . . . . . . . . . . . . . . . . . . . . 105 Q QIAN, HAIXIA . . . . . . . . . . . . . . . . . . . . . . . . . 134 Qian, Hong . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Qian, Jiahe . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Qiu, Xue-Lan . . . . . . . . . . . . . . . . . . . . . . . . . 148 Qiu, Yuxi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 S Sabatini, John . . . . . . . . . . . . . . . . . . . . . . . . 160 192 Washington, DC, USA Participant Index Sabol, Robert . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Şahin, Füsun . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Sahin, Sakine Gocer . . . . . . . . . . . . . . . . . . . . . . 88 Saiar, Amin . . . . . . . . . . . . . . . . . . . . . . . . . . . 
54 Sakano, Jenifer . . . . . . . . . . . . . . . . . . . . . . . . 179 Sakano, Jennifer . . . . . . . . . . . . . . . . . . . . . . . 179 Sakworawich, Arnond . . . . . . . . . . . . . . . . . . . 105 Salleb-Aouissi, Ansaf . . . . . . . . . . . . . . . . . 144, 144 Samonte, Kelli . . . . . . . . . . . . . . . . . . . . . . . . 165 Sanders, Elizabeth . . . . . . . . . . . . . . . . . . . . . . 163 Sandrock, Paul . . . . . . . . . . . . . . . . . . . . . . . . . 40 Sano, Makoto . . . . . . . . . . . . . . . . . . . . . . . . . 173 Sato, Edynn . . . . . . . . . . . . . . . . . . . . . . . . 89, 177 Sauder, Derek . . . . . . . . . . . . . . . . . . . . . . . . 120 Schmigdall, Jonathan . . . . . . . . . . . . . . . . . . . . 84 Schneider, Christina . . . . . . . . . . . . . . . . . . . . . 28 Schneider, Christy . . . . . . . . . . . . . . . . . . . . . . 179 Schultz, Matthew . . . . . . . . . . . . . . . . . . . . . . 147 Schwarz, Rich . . . . . . . . . . . . . . . . . . . . . . . . . . 20 Schwarz, Richard . . . . . . . . . . . . . . . . . . . . . . . 88 Scott, Lietta . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Secolsky, Charles . . . . . . . . . . . . . . . . . . . . . . 136 Sedivy, Sonya . . . . . . . . . . . . . . . . . . . . . . . . . 109 Segall, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Seltzer, Michael . . . . . . . . . . . . . . . . . . . . 151, 151 Semmelroth, Carrie . . . . . . . . . . . . . . . . . . . . . 115 Sen, Sedat . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Sgammato, Adrienne . . . . . . . . . . . . . . . . . . .52, 52 Sha, Shuying . . . . . . . . . . . . . . . . . . . . . . . . . 110 Shao, Can . . . . . . . . . . . . . . . . . . . . . . . . 123, 166 Sharairi, Sid . . . . . . . . . . . . . . . . . . . . . . . . 50, 133 Shaw, Emily . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Shear, Benjamin . . . . . . . . . . . . . . . . . . . . . 80, 110 Sheehan, Kathleen . . . . . . . . . . . . . . . . . . . . . 179 Shepard, Lorrie . . . . . . . . . . . . . . . . . . . . . . . . 126 Shermis, Mark . . . . . . . . . . . . . . . . . . . 51, 104, 104 Shin, Hyo Jeong . . . . . . . . . . . . . . . . . . . . . . . . 68 Shin, Nami . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Shipman, Michelle . . . . . . . . . . . . . . . . . . . . . . 89 Shmueli, Doron . . . . . . . . . . . . . . . . . . . . . . . 128 Shropshire, Kevin . . . . . . . . . . . . . . . . . . . . . . 132 Shukla, Kathan . . . . . . . . . . . . . . . . . . . . . . . . . 63 Shuler, Scott . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Shute, Valerie . . . . . . . . . . . . . . . . . . . . . . . . . 127 Sikali, Emmanuel . . . . . . . . . . . . . . . . . . . . . . . 19 Silberglitt, Matt . . . . . . . . . . . . . . . . . . . . . . . 160 Sinharay, Sandip . . . . . . . . . . . . . . . . . . . . 164, 181 Sireci, Stephen . . . . . . . . . . . . . . . . . . . . . 142, 145 Skorupski, William . . . . . . . . . . . . . . 72, 91, 106, 134 Skrondal, Anders . . . . . . . . . . . . . . . . . . . . . . 182 Smiley, Whitney . . . . . . . . . . . . . . . . . . . . . . . 161 Smith, Kara . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Smith, Robert . . . . . . . . . . . . . . . . . . . . . . . . . 45 Smith, Weldon . . . . . . . . . . . . . . . . . . . . . . . . 110 Snow, Eric . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Somasundaran, Swapna . . . . . . . . . . . . . . . . . . 179 Song, Hao . . . . . . . . . . . . . . . . . . . . . . . . . 90, 161 Song, Lihong . . . . 
. . . . . . . . . . . . . . . . . . . . . 116 Sorrel, Miguel . . . . . . . . . . . . . . . . . . . . . . . . . 72 Sparks, Sarah . . . . . . . . . . . . . . . . . . . . . . . . . 149 Stafford, Rose . . . . . . . . . . . . . . . . . . . . . . . . . 91 Stanke, Luke . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Stark, Stephen . . . . . . . . . . . . . . . . . . . . . . . . 148 Stecher, Brian . . . . . . . . . . . . . . . . . . . . . . . . . 160 Steinberg, Jonathan . . . . . . . . . . . . . . . . . . . . 160 Sternod, Latisha . . . . . . . . . . . . . . . . . . . . . 74, 120 Stevens, Joseph . . . . . . . . . . . . . . . . . . . . . . . . 59 Stewart, John . . . . . . . . . . . . . . . . . . . . . . . . . 147 Stockford, Ian . . . . . . . . . . . . . . . . . . . . . . . . . 173 Stone, Clement . . . . . . . . . . . . . . . . . . . . . . . . 26 Stone, Elizabeth . . . . . . . . . . . . . . . . . . .54, 80, 142 Stout, Bill . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Strain-Seymour, Ellen . . . . . . . . . . . . . . . . . . . . 142 Stuart, Elizabeth . . . . . . . . . . . . . . . . . . . . . . . 151 Su, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .57 Su, Yu-Lan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 SU, YU-LAN . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Suh, Hongwook . . . . . . . . . . . . . . . . . . . . . . . 180 Sukin, Tia . . . . . . . . . . . . . . . . . . . . . . . . . 51, 115 Sullivan, Meghan . . . . . . . . . . . . . . . . . . . . . 31, 72 Sun, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Sung, Kyunghee . . . . . . . . . . . . . . . . . . . . . . . . 92 Svetina, Dubravka . . . . . . . . 62, 74, 118, 120, 135, 163 Swaminathan, Hariharan . . . . . . . . . . . . . . . . . 109 Sweet, Shauna . . . . . . . . . . . . . . . . . . . . . . . . . 83 Sweet, Tracy . . . . . . . . . . . . . . . . . . . . . . . . . 146 Swift, David . . . . . . . . . . . . . . . . . . . . . . . . . . 133 T Tan, Amy . . . . . . . . . . . . . . . . . . . . . . . . . . . . .53 Tan, Xuan-Adele . . . . . . . . . . . . . . . . . . . . . . . 161 Tang, Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Tannenbaum, Richard . . . . . . . . . . . . . . . . . . . 178 Tao, Jian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Tao, Shuqin . . . . . . . . . . . . . . . . . . . . . . . . 85, 143 Templin, Jonathan . . . . . . . . . . . . . . . 31, 72, 72, 86 193 2016 Annual Meeting & Training Sessions Participant Index Terzi, Ragip . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Tessema, Aster . . . . . . . . . . . . . . . . . . . . . . 15, 147 Thissen, David . . . . . . . . . . . . . . . . . . . . . . . . . 44 Thissen-Roe, Anne . . . . . . . . . . . . . . . . . . . . . 163 Thompson, Tony . . . . . . . . . . . . . . . . . . . . . . . 180 Thum, Yeow Meng . . . . . . . . . . . . . . . . . . . 50, 110 Thummaphan, Phonraphee . . . . . . . . . . . . . . . 126 Thurlow, Martha . . . . . . . . . . . . . . . . . . . . . . . 175 Tian, Wei . . . . . . . . . . . . . . . . . . . . . . . . . . .61, 91 Tomkowicz, Joanna . . . . . . . . . . . . . . . . . . . 64, 152 Tong, Ye . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Topczewski, Anna . . . . . . . . . . . . . . . . . . . . . . . 50 Torre, Jimmy de la . . . . . . . . . . . . . . . . . . . . 72, 136 Torre, Jummy de la . . . . . . . . . . . . . . . . . . . . . 143 Towles, Elizabeth . . . . . . . . . . . . . . . . . . . . . . . 43 Toyama, Yukie . . . . . . . . . . . . . . . . 
. . . . . . . . . 77 Trang, Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Trierweiler, Tammy . . . . . . . . . . . . . . . . . . . 45, 148 Tu, Dongbo . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Turner, Charlene . . . . . . . . . . . . . . . . . . . . . 43, 108 Turner, Ronna . . . . . . . . . . . . . . . . . . . . . . . . 148 Tzou, Hueying . . . . . . . . . . . . . . . . . . . . . . . . . 86 Wang, Caroline . . . . . . . . . . . . . . . . . . . . . . . . . 59 Wang, Changjiang . . . . . . . . . . . . . . . . . . . . . 109 Wang, Chun . . . . . . . . . . . . . . . . . . . . . . . 45, 170 Wang, Hongling . . . . . . . . . . . . . . . . . . . . . . . 166 Wang, Jui-Sheng . . . . . . . . . . . . . . . . . . . . . . . . 83 WANG, JUI-SHENG . . . . . . . . . . . . . . . . . . . . . 135 Wang, Keyin . . . . . . . . . . . . . . . . . . . . . . . . . 135 Wang, Lu . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Wang, Min . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Wang, Richard . . . . . . . . . . . . . . . . . . . . . . . . 150 Wang, Shichao . . . . . . . . . . . . . . . . . . . . . . 85, 122 Wang, Shudong . . . . . . . . . . . . . . . . . . . . . . . 169 Wang, Tianyu . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Wang, Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Wang, Wen-Chung . . . . . . . . . . . . . 61, 107, 108, 148 Wang, Wenyi . . . . . . . . . . . . . . . . . . . . . . . . . 116 Wang, Xi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Wang, Xiaolin . . . . . . . . . . . . . . . . . . . . . . . 62, 135 Wang, Xiaoqing . . . . . . . . . . . . . . . . . . . . . . . . 91 Wang, Zhen . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Way, Walter . . . . . . . . . . . . . . . . . . . . . . . . 84, 142 Weegar, Johanna . . . . . . . . . . . . . . . . . . . . . . . 89 Weeks, Jonathan . . . . . . . . . . . . . . . . . 60, 116, 160 Wei, Hua . . . . . . . . . . . . . . . . . . . . . . . . . . 70, 164 Wei, Xiaoxin . . . . . . . . . . . . . . . . . . . . . . . . 50, 115 Wei, Youhua . . . . . . . . . . . . . . . . . . . . . . 165, 179 Weiner, John . . . . . . . . . . . . . . . . . . . . . . . 54, 134 Weiss, David . . . . . . . . . . . . . . . . . . . . . . . . . 167 Welch, Catherine . . . . . 43, 56, 56, 62, 90, 108, 145, 161 Wendler, Cathy . . . . . . . . . . . . . . . . . . . . . . 51, 179 West, Martin . . . . . . . . . . . . . . . . . . . . . . . . . 157 White, Lauren . . . . . . . . . . . . . . . . . . . . . . . . 172 Whittington, Dale . . . . . . . . . . . . . . . . . . . . . . 168 Wiberg, Marie . . . . . . . . . . . . . . . . . . . . . . . . . 60 Widiatmo, Heru . . . . . . . . . . . . . . . . . . . . . . . 119 Wiley, Andrew . . . . . . . . . . . . . . . . . . . . . . . . 112 Williams, Elizabeth . . . . . . . . . . . . . . . . . . . . . 120 Williams, Jean . . . . . . . . . . . . . . . . . . . . . . . . . 84 Willis, James . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Willoughby, Michael . . . . . . . . . . . . . . . . . . . . 153 Willse, John . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Wilmes, Carsten . . . . . . . . . . . . . . . . . . . . . . . 175 Wilson, Mark . . . . . . . . . . . . . . . . . . . . . 68, 68, 68 Wind, Stefanie . . . . . . . . . . . . . . . . . . . . . . . . . 71 Winter, Phoebe . . . . . . . . . . . . . . . . . . . . . . 51, 79 Wise, Lauress . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Wise, Laurie . . . . . . . . . . . . . . . . . . . . . . . . . . 
149 Wollack, James . . . . . . . . . . . . . . . . . . . . . . . . 109 Woo, Ada . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 U Underhill, Stephanie . . . . . . . . . . . . . . . . . . . . . 62 University, Sacred Heart . . . . . . . . . . . . . . . . . . 168 V van der Linden, Wim . . . . . . . . . . . . . . . . . . . . . 16 Vansickle, Tim . . . . . . . . . . . . . . . . . . . . . . . . . 55 Vasquez-Colina, Maria Donata . . . . . . . . . . . . . . . 69 Veldkamp, Bernard . . . . . . . . . . . . . . . . . . . . . 113 Vispoel, Walter . . . . . . . . . . . . . . . . . . . . . . . . 148 VonDavier, Alina . . . . . . . . . . . . . . . . . . . . . . . . 15 Vue, Kory . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 W Wain, Jennifer . . . . . . . . . . . . . . . . . . . . . . . . . 84 Wainer, Howard . . . . . . . . . . . . . . . . . . . . . . . 181 Walker, Cindy . . . . . . . . . . . . . . . . . . . . . . . 69, 154 Walker, Cindy M. . . . . . . . . . . . . . . . . . . . . . . . . 88 Walker, Michael . . . . . . . . . . . . . . . . . . . . . . . 180 Wan, Ping . . . . . . . . . . . . . . . . . . . . . . . . . 64, 152 wang, aijun . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Wang, Ann . . . . . . . . . . . . . . . . . . . . . . . . . . 167 194 Washington, DC, USA Participant Index Z Wood, Scott . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Wu, Yi-Fang . . . . . . . . . . . . . . . . . . . . . . . . .86, 90 Wüstenberg, Sascha . . . . . . . . . . . . . . . . . . . . 103 Wyatt, Jeff . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Wyse, Adam . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Zapata-Rivera, Diego . . . . . . . . . . . . . . . . . 127, 178 Zechner, Klaus . . . . . . . . . . . . . . . . . . . . . . . . 150 Zenisky, April . . . . . . . . . . . . . . . . . . . . . . . . . 178 Zhan, Peida . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Zhang, Bo . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Zhang, Jiahui . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Zhang, Jin . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Zhang, Jinming . . . . . . . . . . . . . . . . . . . . . . . . 58 Zhang, Litong . . . . . . . . . . . . . . . . . . . . . . . . 152 Zhang, Mengyao . . . . . . . . . . . . . . . . . . . 105, 180 Zhang, Mingcai . . . . . . . . . . . . . . . . . . . . . . . . 75 Zhang, Mo . . . . . . . . . . . . . . . . . . . . . 29, 160, 174 Zhang, Oliver . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Zhang, Susu . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Zhang, Xinxin . . . . . . . . . . . . . . . . . . . . . . . . . 77 Zhang, Xue . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Zhang, Ya . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Zhang, Yu . . . . . . . . . . . . . . . . . . . . . . . . . 54, 119 zhang, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Zhao, Tuo . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Zhao, Yang . . . . . . . . . . . . . . . . . . . . . . . . 71, 115 Zhao, Yihan . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Zheng, Bin . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Zheng, Chanjin . . . . . . . . . . . . . . . . . . . . . . 61, 73 Zheng, Chunmei . . . . . . . . . . . . . . . . . . . . . . . . 86 Zheng, Qiwen . . . . . . . . . . . . . . . . . . . . . . . . 146 Zheng, Xiaying . . . . . . . . . . . . . . . . . . . . . 170, 170 Zheng, Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Zhu, Mengxiao . . . . . 
. . . . . . . . . . . . . . . . 146, 174 Zhu, Rongchun . . . . . . . . . . . . . . . . . . 90, 136, 166 Zhu, Shi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Zweifel, Michael . . . . . . . . . . . . . . . . . . . . . . . 110 X Xi, Nuo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Xiang, Shibei . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Xie, Chao . . . . . . . . . . . . . . . . . . . . . . . . 171, 171 Xie, Qing . . . . . . . . . . . . . . . . . . . . . . . . . . 90, 120 Xin, Tao . . . . . . . . . . . . . . . . . . . . . . . . 61, 91, 135 Xing, Kuan . . . . . . . . . . . . . . . . . . . . . . . . 61, 122 Xiong, Jianhua . . . . . . . . . . . . . . . . . . . . . . . . . 91 Xiong, Xinhui . . . . . . . . . . . . . . . . . . . . . . . . . 104 Xiong, Yao . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Xu, Jing-Ru . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Xu, Ran . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Xu, Ting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Xu, Xueli . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Y Yakimowski, Mary E . . . . . . . . . . . . . . . . . . . . . 168 Yan, Duanli . . . . . . . . . . . . . . . . . . . . . . . . .22, 85 Yan, Ning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Yang, Ji Seung . . . . . . . . . . . . 73, 132, 170, 170, 170 Yang, Jiseung . . . . . . . . . . . . . . . . . . . . . . . . . 151 Yang, Tao . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Yao, Lihua . . . . . . . . . . . . . . . . . 20, 45, 88, 107, 147 Yao, Lili . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 Ye, Feifei . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Ye, Sangbeak . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Yi, qin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Yi, Qing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Yin, Ping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Yoo, Hanwook . . . . . . . . . . . . . . . . . . . . . . 63, 133 Yoon, Su-Youn . . . . . . . . .129, 129, 150, 179, 179, 179 Yu, Xin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 195 2016 Annual Meeting & Training Sessions Participant Index 196 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Aksu Dunya, Beyza Bennett, Randy E ETS [email protected] University of Illinois at Chicago [email protected] Ali, Usama S. Educational Testing Service [email protected] Bertling, Maria Harvard University [email protected] Alzen, Jessica School of Education University of Colorado Boulder [email protected] Beverly, Tanesia University of Connecticut [email protected] Amati, Lucy Educational Testing Service [email protected] Bo, Yuanchao Emily University of California, Los Angeles [email protected] An, Ji University of Maryland [email protected] Bond, Mark The University of Texas at Austin [email protected] Anderson, Daniel University of Oregon [email protected] Bonifay, Wes E University of Missouri [email protected] Andrews, Benjamin ACT [email protected] Boyer, Michelle University of Massachusetts, Amherst [email protected] Andrich, David University of Western Australia [email protected] Bradshaw, Laine University of Georgia [email protected] Austin, Bruce W Washington State University [email protected] Brandt, Steffen Art of Reduction [email protected] Banks, Kathleen LEAD Public Schools [email protected] Breyer, Jay F. 
ETS [email protected] Barry, Carol L The College Board [email protected] Bridgeman, Brent Educational Testing Service [email protected] Barton, Karen Learning Analytics [email protected] Briggs, Derek C University of Colorado [email protected] Bashkov, Bozhidar M American Board of Internal Medicine [email protected] Broaddus, Angela Center for Educational Testing and Evaluation University of Kansas [email protected] Bejar, Isaac I. ETS [email protected] Brown, Derek Oregon Department of Education [email protected] 197 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Buchholz, Janine German Institute for International Educational Research (DIPF) [email protected] Carstens, Ralph International Association for the Evaluation of Educational Achievement (IEA) Data Processing and Research Center [email protected] Buckendahl, Chad W. Alpine Testing Solutions, Inc. [email protected] Castellano, Katherine Furgol Educational Testing Service (ETS) [email protected] Buckley, Jack College Board [email protected] Chattergoon, Rajendra University of Colorado, Boulder [email protected] Bukhari, Nurliyana University of North Carolina at Greensboro [email protected] Chattergoon, Rajendra University of Colorado, Boulder [email protected] Bulut, Okan University of Alberta [email protected] Chatterji, Madhabi Teachers College, Columbia University [email protected] Buzick, Heather Educational Testing Service [email protected] Chen, Feng The University of Kansas [email protected] Cai, Li UCLA/CRESST [email protected] Chen, Hui-Fang City University of Hong Kong [email protected] Cai, Liuhan University of Nebraska-Lincoln [email protected] Chen, Jie Center for Educational Testing and Evaluation [email protected] Cain, Jessie Montana University of North Carolina at Chapel Hill [email protected] Chen, Juan National Conference of Bar Examiners [email protected] Caliço, Tiago A University of Maryland [email protected] Chen, Keyu University of Iowa [email protected] Camara, Wayne ACT [email protected] Chen, Pei-Hua National Chiao Tung University [email protected] Camara, Wayne J. ACT [email protected] Chen, Ping Beijing Normal University [email protected] Canto, Phil Florida Department of Education [email protected] Chen, Tingting ACT, Inc. [email protected] Carroll, Patricia E University of California - Los Angeles [email protected] 198 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Chen, Xin Pearson [email protected] Cizek, Greg University of North Carolina at Chapel Hill [email protected] Cheng, Ying Alison University of Notre Dame [email protected] Clark, Amy K. University of Kansas [email protected] Childs, Ruth A Ontario Institute for Studies in Education, University of Toronto [email protected] Clauser, Amanda L. 
National Board of Medical Examiners [email protected] Cohen, Allan University of Georgia [email protected] Cho, Youngmi Pearson [email protected] Cohen, Jon American Institutes for Research [email protected] Choi, Hye-Jeong University of Georgia [email protected] Colvin, Kimberly F University at Albany, SUNY [email protected] Choi, In-Hee University of California, Berkeley [email protected] Conforti, Peter The University of Texas at Austin [email protected] Choi, In-Hee University of California, Berkeley [email protected] Confrey, Jere North Carolina State University [email protected] Choi, Jinah The University of Iowa [email protected] Choi, Jiwon ACT/University of Iowa [email protected] Choi, Kilchan CRESST/UCLA [email protected] Coomans, Frederik University of Amsterdam [email protected] Choi, Kilchan CRESST/UCLA [email protected] Cottrell, Nicholas D Fulcrum [email protected] Chu, Kwang-lee Pearson [email protected] Crabtree, Ashleigh R University of Iowa [email protected] Chung, Kyung Sun Pennsylvania State University [email protected] Crane, Samuel Amplify [email protected] Circi, Ruhan University of Colorado Boulder [email protected] Croft, Michelle ACT, Inc. [email protected] Cui, Zhongmin ACT, Inc. [email protected] Diao, Hongyu University of Massachusetts-Amherst [email protected] Culpepper, Steven Andrew University of Illinois at Urbana-Champaign [email protected] DiCerbo, Kristen Pearson [email protected] Dadey, Nathan The National Center for the Improvement of Educational Assessment [email protected] Vasquez-Colina, Maria Donata Florida Atlantic University [email protected] Donoghue, John R Educational Testing Service [email protected] Davey, Tim Educational Testing Service [email protected] Du, Yi Educational Testing Service [email protected] Davis, Laurie L Pearson [email protected] Du, Yi Educational Testing Services [email protected] d’Brot, Juan DRC JD’[email protected] Egan, Karla NCIEA [email protected] De Boeck, Paul Ohio State University [email protected] Embretson, Susan E Georgia Institute of Technology [email protected] Debeer, Dries University of Leuven [email protected] DeCarlo, Lawrence T. Teachers College, Columbia University [email protected] Engelhardt, Lena German Institute for International Educational Research [email protected] DeMars, Christine E. James Madison University [email protected] Evans, Carla M. University of New Hampshire [email protected] Denbleyker, Johnny Houghton Mifflin Harcourt [email protected] Fan, Meichu ACT, Inc [email protected] Deng, Nina Measured Progress [email protected] Fan, Yuyu Fordham University [email protected] Deters, Lauren edCount, LLC [email protected] Farley, Dan University of Oregon [email protected] Dhaliwal, Tasmin Pearson [email protected] Feinberg, Richard A National Board of Medical Examiners [email protected] Fina, Anthony D Iowa Testing Programs, University of Iowa [email protected] Gocer Sahin, Sakine Hacettepe University [email protected] Finch, Holmes Ball State University [email protected] Gong, Brian National Center for the Improvement of Educational Assessment [email protected] Foltz, Peter W.
Pearson and University of Colorado Boulder [email protected] González-Brenes, José Center for Digital Data, Analytics & Adaptive Learning, Pearson [email protected] Forte, Ellen edCount [email protected] González-Brenes, José Pablo Pearson [email protected] Forte, Ellen edCount, LLC [email protected] Grabovsky, Irina NBME [email protected] Freeman, Leanne University of Wisconsin, Milwaukee [email protected] Graesser, Art University of Memphis [email protected] Fu, Yanyan UNCG [email protected] Graf, Edith Aurora ETS [email protected] Gafni, Naomi National Institute for Testing & Evaluation [email protected] Greiff, Samuel University of Luxembourg [email protected] Gao, Lingyun ACT, Inc. [email protected] Grochowalski, Joe The College Board [email protected] Gao, Xiaohong ACT, Inc. [email protected] Gu, Lixiong Educational Testing Service [email protected] Garcia, Alejandra Amador University of Massachusetts [email protected] Guo, Hongwen ETS [email protected] Geis, Eugene J Rutgers Graduate School of Education [email protected] Guo, Rui University of Illinois at Urbana-Champaign [email protected] Geisinger, Kurt F. Buros Center for Testing, University of Nebraska-Lincoln [email protected] Hacker, Miriam The German Institute for International Educational Research (DIPF) Centre for International Student Assessment (ZIB) [email protected] Gessaroli, Marc E National Board of Medical Examiners [email protected] Hakuta, Kenji Stanford University [email protected] Hogan, Thomas P University of Scranton [email protected] Hall, Erika Center for Assessment [email protected] Holmes, Stephen Office of Qualifications and Examinations Regulation [email protected] Han, Zhuangzhuang Teachers College, Columbia University [email protected] Hou, Likun Educational Testing Services [email protected] Hansen, Mark University of California, Los Angeles [email protected] Huang, Chi-Yu ACT, Inc. [email protected] Harrell, Lauren University of California, Los Angeles [email protected] Huang, Xiaorui East China Normal University [email protected] Hayes, Heather AMTIS Inc. [email protected] Huggins-Manley, Anne Corinne University of Florida [email protected] Hayes, Stacy Discovery Education [email protected] Huh, Nooree ACT, Inc. [email protected] Hazen, Tim Iowa Testing Programs [email protected] Hunter, C. Vincent Georgia State University [email protected] He, Qingping Office of Qualifications and Examinations Regulation [email protected] Huo, Yan Educational Testing Service [email protected] He, Qiwei Educational Testing Service [email protected] Insko, William R Houghton Mifflin Harcourt [email protected] He, Yong ACT, Inc.
[email protected] Jang, Hyesuk American Institutes for Research [email protected] Herrera, Bill edCount, LLC [email protected] Jang, Hyesuk American Institutes for Research [email protected] Himelfarb, Igor Educational Testing Service (ETS) [email protected] Jewsbury, Paul Educational Testing Service [email protected] Ho, Emily H College Board [email protected] Jiang, Yanming Educational Testing Service [email protected] Jiang, Zhehan University of Kansas [email protected] Karadavut, Tugba University of Georgia [email protected] Jin, Kuan-Yu The Hong Kong Institute of Education [email protected] Karvonen, Meagan University of Kansas [email protected] Joo, Seang-hwane University of South Florida [email protected] Keller, Lisa A University of Massachusetts Amherst [email protected] Julian, Marc Data Recognition Corporation [email protected] Kenyon, Dorry Center for Applied Linguistics [email protected] Junker, Brian W Carnegie Mellon University [email protected] Kern, Justin L. University of Illinois at Urbana-Champaign [email protected] Kaliski, Pamela College Board [email protected] Kim, Dong-In Data Recognition Corporation [email protected] Kang, Hyeon-Ah University of Illinois at Urbana-Champaign [email protected] Kim, Dong-In Data Recognition Corporation [email protected] Kang, Yoon Jeong American Institutes for Research [email protected] Kim, Han Yi Measured Progress [email protected] Kang, Yujin University of Iowa [email protected] Kim, Hyung Jin The University of Iowa [email protected] Kannan, Priya Educational Testing Service [email protected] Kim, Ja Young ACT, Inc. [email protected] Kanneganti, Raghuveer Data Recognition Corporation CTB [email protected] Kim, Jinok UCLA/CRESST [email protected] Kao, Shu-chuan Pearson [email protected] Kim, Jong ACT [email protected] Kaplan, David University of Wisconsin – Madison [email protected] Kim, Se-Kang Fordham University [email protected] Kapoor, Shalini ACT [email protected] Kim, Sooyeon Educational Testing Service [email protected] Kim, Stella Y The University of Iowa [email protected] Latifi, Syed Muhammad Fahad University of Alberta [email protected] Kim, Sunhee Prometric [email protected] Lawson, Janelle San Francisco State University [email protected] Kim, Young Yee American Institutes for Research [email protected] Leacock, Claudia McGraw-Hill Education CTB [email protected] Kobrin, Jennifer L.
Pearson [email protected] LeBeau, Brandon University of Iowa [email protected] Koklu, Onder Florida Department of Education [email protected] Lee, Chansoon University of Wisconsin-Madison [email protected] Konold, Tim R University of Virginia [email protected] Lee, Chong Min ETS [email protected] Kroehne, Ulf German Institute for International Educational Research (DIPF) [email protected] Lee, HyeSun University of Nebraska-Lincoln [email protected] Lee, Sora University of Wisconsin, Madison [email protected] Kuhfeld, Megan University of California [email protected] Lei, Ming American Institutes for Research [email protected] Kuhfeld, Megan University of California, Los Angeles [email protected] Leventhal, Brian University of Pittsburgh [email protected] Kupermintz, Haggai University of Haifa [email protected] Li, Chen Educational Testing Service [email protected] Lai, Hollis University of Alberta [email protected] Li, Chen University of Maryland [email protected] Lao, Hongling University of Kansas [email protected] Lash, Andrea A. WestEd [email protected] Li, Cheng-Hsien Department of Pediatrics, University of Texas Medical School at Houston [email protected] Lathrop, Quinn N Northwest Evaluation Association [email protected] Li, Feifei Educational Testing Service [email protected] 204 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Li, Feiming University of North Texas Health Science Center [email protected] Ling, Guangming ETS [email protected] Li, Jie McGraw-Hill Education [email protected] Liu, Jinghua Secondary School Admission Test Board [email protected] Li, Ming University of Maryland [email protected] Liu, Lei ETS [email protected] Li, Tongyun Educational Testing Service [email protected] Liu, Xiang Teachers College, Columbia University [email protected] Li, Xin ACT, Inc. [email protected] Liu, Yang University of California, Merced [email protected] Li, Ying American Nurses Credentialing Center [email protected] Liu, Yue Sichuan Institute Of Education Sciences [email protected] Li, Zhushan Mandy Boston College [email protected] Lockwood, J.R. Educational Testing Service [email protected] Liao, Dandan University of Maryland, College Park [email protected] Longabach, Tanya Excelsior College [email protected] Liaw, Yuan-Ling University of Washington [email protected] Lopez, Alexis A ETS [email protected] Lim, Euijin The University of Iowa [email protected] Lord-Bessen, Jennifer McGraw Hill Education CTB [email protected] Lin, Chih-Kai Center for Applied Linguistics (CAL) [email protected] Lorié, William A Center for NextGen Learning & Assessment, Pearson [email protected] Lin, Haiyan ACT, Inc. [email protected] Lottridge, Susan Pacific Metrics, Inc. [email protected] Lin, Johnny University of California, Los Angeles [email protected] Lu, Lucy NSW Department of Education, Australia [email protected] Ling, Guangming Educational Testing Service [email protected] Lu, Ru Educational Testing Service [email protected] 205 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Lu, Ying Educational Testing Service [email protected] Matta, Tyler H. Northwest Evaluation Association [email protected] LUO, Fen Jiangxi Normal University [email protected] Maul, Andrew University of California, Santa Barbara [email protected] Luo, Xiao National Council of State Boards of Nursing [email protected] McCaffrey, Daniel F. 
Educational Testing Service [email protected] Luo, Xin Michigan State University [email protected] McCall, Marty Smarter Balanced Assessment Consortium [email protected] Ma, Wenchao Rutgers, The State University of New Jersey [email protected] McClellan, Catherine A Clowder Consulting [email protected] MacGregor, David Center for Applied Linguistics [email protected] McKnight, Kathy Center for Educator Learning & Effectiveness, Pearson [email protected] Magnus, Brooke E University of North Carolina at Chapel Hill [email protected] McTavish, Thomas S Center for Digital Data, Analytics and Adaptive Learning, Pearson [email protected] Mao, Xia Pearson [email protected] Meyer, Patrick University of Virginia [email protected] Marion, Scott National Center for the Improvement of Educational Assessment [email protected] Meyer, Robert H Education Analytics, Inc. [email protected] Martineau, Joseph National Center for the Improvement of Educational Assessment [email protected] Miel, Shayne Turnitin [email protected] Miller, Sherral College Board [email protected] Martineau, Joseph NCIEA [email protected] Monroe, Scott UMass Amherst [email protected] Masters, Jessica Measured Progress [email protected] Montee, Megan Center for Applied Linguistics [email protected] Matlock, Ki Lynn Oklahoma State University [email protected] 206 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Moretti, Antonio Center for Computational Learning Systems, Columbia University [email protected] Nydick, Steven W Pearson VUE [email protected] Ogut, Burhan American Institutes for Research [email protected] Morgan, Deanna L The College Board [email protected] O’Leary, Timothy Mark University of Melbourne [email protected] Morin, Maxim Medical Council of Canada [email protected] Olgar, Süleyman Florida Department of Education [email protected] Morris, Carrie A University of Iowa College of Education [email protected] Olgar, Süleyman Florida Department of Education [email protected] Morrison, Kristin M Georgia Institute of Technology [email protected] Oliveri, Maria Elena Educational Testing Service [email protected] Muntean, William Joseph Pearson [email protected] Olsen, James B. Renaissance Learning Inc. 
[email protected] Murphy, Stephen T Houghton Mifflin Harcourt [email protected] Özdemir, Burhanettin Hacettepe University [email protected] Naumann, Alexander German Institute for International Educational Research (DIPF) [email protected] Pak, Seohong University of Iowa [email protected] Naumenko, Oksana The University of North Carolina at Greensboro [email protected] Pan, Tianshu Pearson [email protected] Nebelsick-Gullet, Lori edCount [email protected] Park, Jiyoon Federation of State Boards of Physical Therapy [email protected] Nieto, Ricardo The University of Texas at Austin [email protected] Park, Yoon Soo University of Illinois at Chicago [email protected] Noh, Eunhee Korean Institute for Curriculum and Evaluation [email protected] Patelis, Thanos Center for Assessment [email protected] Norton, Jennifer Center for Applied Linguistics [email protected] Patelis, Thanos Center for Assessment [email protected] 207 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Peabody, Michael American Board of Family Medicine [email protected] Reboucas, Daniella University of Notre Dame [email protected] Perie, Marianne Center for Educational Testing and Evaluation [email protected] Reckase, Mark Michigan State University [email protected] Perie, Marianne CETE University of Kansas [email protected] Redell, Nick National Board of Osteopathic Medical Examiners (NBOME) [email protected] Phadke, Chaitali University of Minnesota [email protected] Renn, Jennifer Center for Applied Linguistics [email protected] Pohl, Steffi Freie Universität Berlin [email protected] Reshetnyak, Evgeniya Fordham University [email protected] Por, Han-Hui Educational Testing Service [email protected] Ricarte, Thales Akira Matsumoto Institute of Mathematical and Computer Sciences (ICMC-USP) [email protected] Powers, Sonya Pearson [email protected] Rick, Francis University of Massachusetts, Amherst [email protected] QIAN, HAIXIA University of Kansas [email protected] Rickels, Heather Anne University of Iowa, Iowa Testing Programs [email protected] Qian, Jiahe Educational Testing Service [email protected] Rios, Joseph A. Educational Testing Service [email protected] QIU, Xue-Lan The Hong Kong Institute of Education [email protected] Risk, Nicole M American Medical Technologists [email protected] Qiu, Yuxi University of Florida [email protected] Roduta Roberts, Mary University of Alberta [email protected] Quellmalz, Edys S WestEd [email protected] Rogers, H. Jane University of Connecticut [email protected] Rahman, Nazia Law School Admission Council [email protected] Rorick, Beth National Parent-Teacher Association Rankin, Jenny G. Illuminate Education [email protected] Rosen, Yigal Pearson [email protected] 208 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Rubright, Jonathan D American Institute of Certified Public Accountants [email protected] Seltzer, Michael UCLA [email protected] Runyon, Christopher R. The University of Texas at Austin [email protected] Sen, Sedat Harran University [email protected] Rutkowski, Leslie University of Oslo [email protected] Sgammato, Adrienne Educational Testing Service [email protected] Rutstein, Daisy W. 
SRI International [email protected] Sha, Shuying University of North Carolina at Greensboro [email protected] Sabatini, John ETS [email protected] Shao, Can University of Notre Dame [email protected] Şahin, Füsun University at Albany, State University of New York [email protected] Shaw, Emily College Board [email protected] Saiar, Amin PSI Services LLC [email protected] Shear, Benjamin Stanford University [email protected] Sakworawich, Arnond National Institute of Development Administration [email protected] Shear, Benjamin R. Stanford University [email protected] Samonte, Kelli M. American Board of Internal Medicine [email protected] Sheehan, Kathleen M. ETS [email protected] Sano, Makoto Prometric [email protected] Shermis, Mark D University of Houston--Clear Lake [email protected] Sato, Edynn Pearson [email protected] Shin, Hyo Jeong ETS [email protected] Schultz, Matthew T American Institute of Certified Public Accountants [email protected] Shin, Nami University of California, Los Angeles/ National Center for Research on Evaluation, Standards, and Student Testing (CRESST) [email protected] Schwarz, Richard D. ETS [email protected] Secolsky, Charles Mississippi Department of Education [email protected] 209 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Shropshire, Kevin O. Virginia Tech (note I graduated in May 2014). I currently work at the University of Georgia (OIR) and this research is not affiliated with that department / university. I am providing the school where my research was conducted. [email protected] Sweet, Shauna J University of Maryland, College Park [email protected] Swift, David Houghton Mifflin Harcourt [email protected] Swinburne Romine, Russell University of Kansas [email protected] Shute, Valerie Florida State University [email protected] Tan, Xuan-Adele Educational Testing Service [email protected] Sinharay, Sandip Pacific Metrics Corp [email protected] Tang, Wei University of Alberta [email protected] Sireci, Stephen G. University of Massachusetts-Amherst [email protected] Tannenbaum, Richard J. Educational Testing Service [email protected] Skorupski, William P University of Kansas [email protected] Tao, Shuqin Curriculum Associates [email protected] Somasundaran, Swapna ETS [email protected] Terzi, Ragip Rutgers, The State University of New Jersey [email protected] Sorrel, Miguel A. Universidad Autónoma de Madrid [email protected] Thissen, David University of North Carolina [email protected] Stanke, Luke Minneapolis Public Schools [email protected] Thomas, Larry University of California, Los Angeles [email protected] Stone, Elizabeth Educational Testing Service [email protected] Thummaphan, Phonraphee University of Washington, Seattle [email protected] SU, YU-LAN ACT.ING [email protected] Torres Irribarra, David Pontificia Universidad Católica de Chile [email protected] Suh, Hongwook ACT, inc. [email protected] Traynor, Anne Purdue University [email protected] Sukin, Tia M Pacific Metrics [email protected] Trierweiler, Tammy J. 
Law School Admission Council (LSAC) [email protected] Svetina, Dubravka Indiana University [email protected] 210 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors TU, DONGBO Jiangxi Normal University [email protected] Wang, Shichao The University of Iowa [email protected] Underhill, Stephanie Indiana University - Bloomington [email protected] Wang, Shudong Northwest Evaluation Association [email protected] van Rijn, Peter ETS Global [email protected] Wang, Wei Educational Testing Service [email protected] Vansickle, Tim Questar Assessment Inc., [email protected] Wang, Wenyi Jiangxi Normal University [email protected] Vispoel, Walter P University of Iowa [email protected] Wang, Xi University of Massachusetts Amherst [email protected] von Davier, Matthias Educational Testing Service [email protected] Wang, Xiaolin Indiana University, Bloomington [email protected] Vue, Kory University of Minnesota [email protected] Wang, Zhen Educational Testing Service (ETS) [email protected] Wainer, Howard National Board of Medical Examiners [email protected] Weeks, Jonathan P ETS [email protected] Walker, Cindy University of Wisconsin - Milwaukee [email protected] Wei, Hua Pearson [email protected] Walker, Michael E The College Board [email protected] Wei, Xiaoxin Elizabeth American Institutes for Research [email protected] wang, aijun federation of state boards of physical therapy [email protected] Wei, Youhua Educational Testing Service [email protected] Wang, Hongling ACT, Inc. [email protected] Weiner, John A. PSI Services LLC [email protected] Wang, Keyin Michigan State University [email protected] Welch, Catherine University of Iowa [email protected] Wang, Lu ACT, Inc./The University of Iowa [email protected] Welch, Catherine J University of Iowa [email protected] 211 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Wendler, Cathy Educational Testing Service [email protected] Xin, Tao Beijing Normal University [email protected] White, Lauren Florida Department of Education [email protected] Xiong, Xinhui American Institute for Certified Public Accountants [email protected] Wiberg, Marie Umeå University [email protected] Xu, Jing-Ru Pearson VUE [email protected] Widiatmo, Heru ACT, Inc. [email protected] Xu, Ting University of Pittsburgh [email protected] Wilson, Mark University of California, Berkeley [email protected] Yang, Ji Seung University of Maryland [email protected] Wilson, Mark University of California, Berkeley [email protected] Yao, Lihua Defense manpower data center [email protected] Wood, Scott W Pacific Metrics Corporation [email protected] Ye, Sangbeak University of Illinois - Urbana Champaign [email protected] Wu, Yi-Fang University of Iowa [email protected] Yi, qin Faculty of Education, Beijing Normal University [email protected] Wyatt, Jeff College Board [email protected] Yi, Qing ACT, Inc. 
[email protected] Xi, Nuo Educational Testing Service [email protected] Yin, Ping Curriculum Associates [email protected] Xiang, Shibei National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality [email protected] Yoo, Hanwook Henry Educational Testing Service [email protected] Yoon, Su-Youn Educational Testing Service [email protected] Xie, Chao American Institutes for Research [email protected] Yoon, Su-Youn ETS [email protected] Xie, Qing ACT/The University of Iowa [email protected] Zhan, Peida Beijing Normal University [email protected] Zhang, Jin ACT Inc. [email protected] Zhang, Jinming University of Illinois at Urbana-Champaign [email protected] Zhang, Mengyao National Conference of Bar Examiners [email protected] Zhang, Xue Northeast Normal University [email protected] Zhang, Yu Federation of State Boards of Physical Therapy [email protected] Zhao, Yang University of Kansas [email protected] Zheng, Chanjin Jiangxi Normal University [email protected] Zheng, Chunmei Pearson [email protected] Zheng, Xiaying University of Maryland [email protected] Zheng, Yi Arizona State University [email protected] Zweifel, Michael University of Nebraska-Lincoln [email protected]
NCME 2016 • Schedule-At-A-Glance
Time Room Type ID Title
CS = Coordinated Session • EB = Electronic Board Session • IS = Invited Session • PS = Paper Session • TS = Training Session
Thursday, April 7, 2016
8:00 AM–12:00 PM Meeting Room 6 TS AA Quality Control Tools in Support of Reporting Accurate and Valid Test Scores
8:00 AM–12:00 PM Meeting Room 7 TS BB IRT Parameter Linking
8:00 AM–5:00 PM Meeting Room 5 TS CC 21st Century Skills Assessment: Design, Development, Scoring, and Reporting of Character Skills
8:00 AM–5:00 PM Meeting Room 2 TS DD Introduction to Standard Setting
8:00 AM–5:00 PM Meeting Room 16 TS EE Analyzing NAEP Data Using Plausible Values and Marginal Estimation with AM
8:00 AM–5:00 PM Meeting Room 4 TS FF Multidimensional Item Response Theory: Theory and Applications and Software
1:00 PM–5:00 PM Meeting Room 3 TS GG New Weighting Methods for Causal Mediation Analysis
1:00 PM–5:00 PM Meeting Room 6 TS II Computerized Multistage Adaptive Testing: Theory and Applications (Book by Chapman and Hall)
Friday, April 8, 2016
8:00 AM–12:00 PM Renaissance West B TS JJ Landing Your Dream Job for Graduate Students
8:00 AM–12:00 PM Meeting Room 4 TS KK Bayesian Analysis of IRT Models using SAS PROC MCMC
8:00 AM–5:00 PM Meeting Room 2 TS LL flexMIRT®: Flexible multilevel multidimensional item analysis and test scoring
8:00 AM–5:00 PM Meeting Room 5 TS MM Aligning ALDs and Item Response Demands to Support Teacher Evaluation Systems
8:00 AM–5:00 PM Renaissance East TS NN Best Practices for Lifecycles of Automated Scoring Systems for Learning and Assessment
8:00 AM–5:00 PM Meeting Room 3 TS OO Test Equating Methods and Practices
8:00 AM–5:00 PM Renaissance West A TS PP Diagnostic Measurement: Theory, Methods, Applications, and Software
1:00 PM–5:00 PM Renaissance West B TS QQ Effective Item Writing for Valid Measurement
3:00 PM–8:00 PM Meeting Room 11 Board Meeting
4:30 PM–6:30 PM Fado’s Irish Pub, 808 7th Street NW, Washington, DC 20001 Graduate Student Social
6:30 PM–10:00 PM Convention Center, Level Three, Ballroom C AERA Centennial Symposium & Centennial Reception
Saturday, April 9, 2016
6:30 AM–7:30 AM Meeting Room 7 Sunrise Yoga
8:15 AM–10:15 AM Renaissance East IS A1 NCME Book Series Symposium: The Challenges to Measurement in an Era of Accountability
8:15 AM–10:15 AM Renaissance West A CS A2 Collaborative Problem Solving Assessment: Challenges and Opportunities
8:15 AM–10:15 AM Renaissance West B CS A3 Harnessing Technological Innovation in Assessing English Learners: Enhancing Rather Than Hindering
8:15 AM–10:15 AM Meeting Room 3 PS A4 How can assessment inform classroom practice?
8:15 AM–10:15 AM Meeting Room 4 CS A5 Enacting a Learning Progression Design to Measure Growth
8:15 AM–10:15 AM Meeting Room 5 PS A6 Testlets and Multidimensionality in Adaptive Testing
8:15 AM–10:15 AM Meeting Room 12 PS A7 Methods for Examining Local Item Dependence and Multidimensionality
10:35 AM–12:05 PM Renaissance East CS B1 The End of Testing as We Know it?
10:35 AM–12:05 PM Renaissance West A CS B2 Fairness and Machine Learning for Educational Practice
10:35 AM–12:05 PM Renaissance West B CS B3 Item Difficulty Modeling: From Theory to Practice
10:35 AM–12:05 PM Meeting Room 3 PS B4 Growth and Vertical Scales
10:35 AM–12:05 PM Meeting Room 4 PS B5 Perspectives on Validation
10:35 AM–12:05 PM Meeting Room 5 PS B6 Model Fit
10:35 AM–12:05 PM Meeting Room 12 PS B7 Simulation- and Game-based Assessments
10:35 AM–12:05 PM Meeting Room 10 PS B8 Test Security and Cheating
12:25 PM–1:55 PM Renaissance East CS C1 Opting out of testing: Parent rights versus valid accountability scores
12:25 PM–1:55 PM Renaissance West A CS C2 Building toward a validation argument with innovative field test design and analysis
12:25 PM–1:55 PM Renaissance West B CS C3 Towards establishing standards for spiraling of contextual questionnaires in large-scale assessments
12:25 PM–1:55 PM Meeting Room 3 CS C4 Estimation precision of variance components: Revisiting generalizability theory
12:25 PM–1:55 PM Meeting Room 4 PS C5 Sensitivity of Value-Added Models
12:25 PM–1:55 PM Meeting Room 5 PS C6 Item and Scale Drift
12:25 PM–1:55 PM Meeting Room 12 PS C7 Cognitive Diagnostic Model Extensions
12:25 PM–1:55 PM Mount Vernon Square EB C8
2:15 PM–3:45 PM Renaissance East IS D1 Assessing the assessments: Measuring the quality of new college- and career-ready assessments
2:15 PM–3:45 PM Renaissance West A CS D2 Some psychometric models for learning progressions
2:15 PM–3:45 PM Renaissance West B CS D3 Multiple Perspectives on Promoting Assessment Literacy for Parents
2:15 PM–3:45 PM Meeting Room 3 PS D4 Equating Mixed-Format Tests
2:15 PM–3:45 PM Meeting Room 4 PS D5 Standard Setting
2:15 PM–3:45 PM Meeting Room 5 PS D6 Diagnostic Classification Models: Applications
2:15 PM–3:45 PM Meeting Room 12 PS D7 Advances in IRT Modelling and Estimation
2:15 PM–3:45 PM Mount Vernon Square EB D8 GSIC Poster Session
4:05 PM–6:00 PM Renaissance East CS E1 Do Large Scale Performance Assessments Influence Classroom Instruction? Evidence from the Consortia
4:05 PM–6:05 PM Renaissance West A CS E2 Applications of Latent Regression to Modeling Student Achievement, Growth, and Educator Effectiveness
4:05 PM–6:05 PM Renaissance West B CS E3 Jail Terms for Falsifying Test Scores: Yes, No or Uncertain?
4:05 PM–6:05 PM Meeting Room 3 PS E4 Test Design and Construction
4:05 PM–6:05 PM Meeting Room 4 CS E5 Tablet Use in Assessment
4:05 PM–6:05 PM Meeting Room 5 PS E6 Topics in Multistage and Adaptive Testing
4:05 PM–6:05 PM Meeting Room 12 PS E7 Cognitive Diagnosis Models: Exploration and Evaluation
4:05 PM–5:35 PM Mount Vernon Square EB E8
6:30 PM–8:00 PM Grand Ballroom South NCME and Division D Reception
Sunday, April 10, 2016
8:00 AM–9:00 AM Marriott Marquis Hotel, Marquis Salon 6 Breakfast and Business Session
9:00 AM–9:40 AM Marriott Marquis Hotel, Marquis Salon 6 Presidential Address: Education and the Measurement of Behavioral Change
10:35 AM–12:05 PM Renaissance East IS F1 Career Award: Do Educational Assessments Yield Achievement Measurements
10:35 AM–12:05 PM Renaissance West A IS F2 Debate: Should the NAEP Mathematics Framework be revised to align with the Common Core State Standards?
10:35 AM–12:05 PM Renaissance West B CS F3 Beyond process: Theory, policy, and practice in standard setting
10:35 AM–12:05 PM Meeting Room 3 CS F4 Exploring Timing and Process Data in Large-Scale Assessments
10:35 AM–12:05 PM Meeting Room 4 CS F5 Psychometric Challenges with the Machine Scoring of Short-Form Constructed Responses
10:35 AM–12:05 PM Meeting Room 5 PS F6 Advances in Equating
10:35 AM–12:05 PM Meeting Room 15 PS F7 Novel Approaches for the Analysis of Performance Data
10:35 AM–12:05 PM Mount Vernon Square EB F8
12:25 PM–2:25 PM Convention Center, Level Three, Ballroom ABC AERA Awards Luncheon
2:45 PM–4:15 PM Renaissance East CS G1 Challenges and Opportunities in the Interpretation of the Testing Standards
2:45 PM–4:15 PM Renaissance West A CS G2 Applications of Combinatorial Optimization in Educational Measurement
2:45 PM–4:15 PM Renaissance West B PS G3 Psychometrics of Teacher Ratings
2:45 PM–4:15 PM Meeting Room 3 PS G4 Multidimensionality
2:45 PM–4:15 PM Meeting Room 4 PS G5 Validating “Noncognitive”/Nontraditional Constructs I
2:45 PM–4:15 PM Meeting Room 5 PS G6 Invariance
2:45 PM–4:15 PM Meeting Room 15 PS G7 Detecting Aberrant Response Behaviors
2:45 PM–4:15 PM Mount Vernon Square EB G8 GSIC Poster Session
4:35 PM–5:50 PM Convention Center, Level Three, Ballroom C AERA Presidential Address
4:35 PM–6:05 PM Renaissance East CS H1 Advances in Balanced Assessment Systems: Conceptual framework, informational analysis, application to accountability
4:35 PM–6:05 PM Renaissance West A CS H2 Minimizing Uncertainty: Effectively Communicating Results from CDM-based Assessments
4:35 PM–6:05 PM Meeting Room 16 CS H3 Overhauling the SAT: Using and Interpreting Redesigned SAT Scores
4:35 PM–6:05 PM Meeting Room 3 CS H4 Quality Assurance Methods for Operational Automated Scoring of Essays and Speech
4:35 PM–6:05 PM Meeting Room 4 PS H5 Student Growth Percentiles
4:35 PM–6:05 PM Meeting Room 5 PS H6 Equating: From Theory to Practice
4:35 PM–6:05 PM Meeting Room 15 PS H7 Issues in Ability Estimation and Scoring
4:35 PM–6:05 PM Mount Vernon Square EB H8
6:30 PM–8:00 PM Renaissance West B President’s Reception
Monday, April 11, 2016
5:45 AM–7:00 AM NCME Fitness Run/Walk
8:15 AM–10:15 AM Meeting Room 13/14 IS I1 NCME Book Series Symposium: Technology and Testing
8:15 AM–10:15 AM Meeting Room 8/9 CS I2 Exploring Various Psychometric Approaches to Report Meaningful Subscores
8:15 AM–10:15 AM Meeting Room 3 CS I3 From Items to Policies: Big Data in Education
8:15 AM–10:15 AM Meeting Room 4 CS I4 Methods and Approaches for Validating Claims of College and Career Readiness
8:15 AM–10:15 AM Renaissance West A IS I5 Recent Advances in Quantitative Social Network Analysis in Education
8:15 AM–10:15 AM Meeting Room 15 PS I6 Issues in Automated Scoring
8:15 AM–10:15 AM Meeting Room 16 PS I7 Multidimensional and Multivariate Methods
10:35 AM–12:05 PM Renaissance West A IS J1 Hold the Presses! How Measurement Professionals can Speak More Effectively with the Press and the Public (Education Writers Association Session)
10:35 AM–12:05 PM Meeting Room 8/9 CS J2 Challenges and solutions in the operational use of automated scoring systems
10:35 AM–12:05 PM Meeting Room 3 CS J3 Novel Models to Address Measurement Errors in Educational Assessment and Evaluation Studies
10:35 AM–12:05 PM Meeting Room 4 CS J4 Mode Comparability Investigation of a CCSS based K-12 Assessment
10:35 AM–12:05 PM Meeting Room 16 PS J5 Validating “Noncognitive”/Nontraditional Constructs II
10:35 AM–12:05 PM Meeting Room 15 PS J6 Differential Functioning - Theory and Applications
10:35 AM–12:05 PM Meeting Room 5 PS J7 Latent Regression and Related Topics
11:00 AM–2:00 PM Meeting Room 12 Past Presidents Luncheon
12:25 PM–1:55 PM Meeting Room 8/9 IS K1 The Every Student Succeeds Act (ESSA): Implications for measurement research and practice
12:25 PM–1:55 PM Renaissance West A CS K2 Career Paths in Educational Measurement: Lessons Learned by Accomplished Professionals
12:25 PM–1:55 PM Meeting Room 3 CS K3 Recent Investigations and Extensions of the Hierarchical Rater Model
12:25 PM–1:55 PM Meeting Room 4 CS K4 The Validity of Scenario-Based Assessment: Empirical Results
12:25 PM–1:55 PM Meeting Room 5 PS K5 Item Design and Development
12:25 PM–1:55 PM Meeting Room 15 PS K6 English Learners
12:25 PM–1:55 PM Meeting Room 16 PS K7 Differential Item and Test Functioning
12:25 PM–1:55 PM Mount Vernon Square EB K8
2:15 PM–3:45 PM Renaissance West A IS L1 Learning from History: How K-12 Assessment Will Impact Student Learning Over the Next Decade (National Association of Assessment Directors)
2:15 PM–3:45 PM Meeting Room 8/9 CS L2 Psychometric Issues on the Operational New-Generation Consortia Assessments
2:15 PM–3:45 PM Meeting Room 3 CS L3 Issues and Practices in Multilevel Item Response Models
2:15 PM–3:45 PM Meeting Room 4 CS L4 Psychometric Issues in Alternate Assessments
2:15 PM–3:45 PM Meeting Room 5 CS L5 Recommendations for Addressing the Unintended Consequences of Increasing Examination Rigor
2:15 PM–3:45 PM Meeting Room 15 PS L6 Innovations in Assessment
2:15 PM–3:45 PM Meeting Room 12 PS L7 Technology-based Assessments
2:15 PM–3:45 PM Meeting Room 13/14 IS L8 NCME Diversity and Testing Committee Sponsored Symposium: Implications of Computer-Based Testing for Assessing Diverse Learners: Lessons Learned from the Consortia
3:00 PM–7:00 PM Meeting Room 10/11 Board Meeting
4:05 PM–6:05 PM Meeting Room 8/9 CS M1 Fairness Issues and Validation of Non-Cognitive Skills
4:05 PM–6:05 PM Meeting Room 3 CS M2 Thinking about your Audience in Designing and Evaluating Score Reports
4:05 PM–6:05 PM Meeting Room 4 CS M3 Use of automated tools in listening and reading item generation
4:05 PM–6:05 PM Meeting Room 5 PS M4 Practical Issues in Equating
4:05 PM–6:05 PM Meeting Room 16 PS M5 The Great Subscore Debate
4:05 PM–6:05 PM Meeting Room 12 PS M6 Scores and Scoring Rules
4:05 PM–6:05 PM Meeting Room 13/14 IS M7 On the use and misuse of latent variable scores
National Council on Measurement in Education is very grateful to the following organizations for their generous financial support of our 2016 Annual Meeting
National Council on Measurement in Education
100 North 20th Street, Suite 400, Philadelphia, PA 19103
(215) 461-6263
http://www.ncme.org/