2016 PROGRAM
National Council on Measurement in Education
Foundations and Frontiers: Advancing Educational Measurement for Research, Policy, and Practice
2016 Training Sessions: April 7-8
2016 Annual Meeting: April 9-11
Renaissance Washington, DC Downtown Hotel, Washington, DC
#NCME16

Table of Contents
NCME Board of Directors
Proposal Reviewers
Future Meetings
Renaissance Washington, DC Downtown Hotel Meeting Room Floor Plans
Training Sessions: Thursday, April 7; Friday, April 8
Program: Saturday, April 9; Sunday, April 10; Monday, April 11
Index
Contact Information
Schedule-at-a-Glance

NCME Officers
President: Richard J. Patz, ACT, Iowa City, IA
Vice President: Mark Wilson, UC Berkeley, Berkeley, CA
Past President: Lauress Wise, HUMRRO, Seaside, CA

NCME Directors
Amy Hendrickson, The College Board, Newtown, PA
Kristen Huff, ACT, Iowa City, IA
Luz Bay, The College Board, Dover, NH
Won-Chan Lee, University of Iowa, Iowa City, IA
Cindy Walker, University of Wisconsin-Milwaukee, Milwaukee, WI
C. Dale Whittington, Shaker Heights (OH) Public Schools, Shaker Heights, OH

Editors
Journal of Educational Measurement: Jimmy de la Torre, Rutgers, The State University of NJ, New Brunswick, NJ
Educational Measurement: Issues and Practice: Howard Everson, SRI International, Menlo Park, CA
NCME Newsletter: Heather M. Buzick, Educational Testing Service, Princeton, NJ
Website Content Editor: Brett Foley, Alpine Testing Solutions, Denton, NE

2016 Annual Meeting Chairs
Annual Meeting Program Chairs: Andrew Ho, Harvard Graduate School of Education, Cambridge, MA; Matthew Johnson, Columbia University, New York, NY
Graduate Student Issues Committee Chair: Brian Leventhal, University of Pittsburgh, Pittsburgh, PA
Training and Development Committee Chair: Xin Li, ACT, Iowa City, IA
Fitness Run/Walk Directors: Katherine Furgol Castellano, ETS, San Francisco, CA;
Jill R. van den Heuvel, Alpine Testing Solutions, Hatfield, PA

NCME Information Desk
The NCME Information Desk is located on the Meeting Room Level in the Renaissance Washington, DC Downtown Hotel. Stop by to pick up a ribbon and to obtain your bib number and tee-shirt for the fun run and walk. It will be open at the following times:
Thursday, April 7: 7:30 AM-4:30 PM
Friday, April 8: 8:00 AM-4:30 PM
Saturday, April 9: 10:00 AM-4:30 PM
Sunday, April 10: 8:00 AM-1:00 PM
Monday, April 11: 8:00 AM-1:00 PM

Proposal Reviewers
Terry Ackerman, Benjamin Andrews, Robert Ankenmann, Karen Barton, Kirk Becker, Anton Beguin, Dmitry Belov, Tasha Beretvas, Jonas Bertling, Damian Betebenner, Dan Bolt, Laine Bradshaw, Henry Braun, Robert Brennan, Brent Bridgeman, Derek Briggs*, Chad Buckendahl, Li Cai*, Wayne Camara, Katherine Furgol Castellano, Ying Cheng, Chia-Yi Chiu, Steve Culpepper, Mark Davison, Jimmy de la Torre, John Donoghue, Jeff Douglas, Michael Edwards, Karla Egan*, Kadriye Ercikan, Steve Ferrara, Holmes Finch*, Mark Gierl, Brian Habing, Chris Han, Mark Hansen, Deborah Harris, Kristen Huff*, Minjeong Jeon, Hong Jiao, Matt Johnson, Daniel Jurich, Seock-Ho Kim, Jennifer Kobrin, Suzanne Lane, Won-Chan Lee*, Dongmei Li, Jinghua Liu, Skip Livingston, JR Lockwood*, Susan Loomis, Krista Mattern, Andy Maul*, Dan McCaffrey, Katie McClarty, Catherine McClellan, Patrick Meyer, Paul Nichols, Maria Oliveri, Andreas Oranje, Thanos Patelis, Susan Philips, Mary Pitoniak, John Poggio, Sophia Rabe-Hesketh, Mark Reckase, Frank Rijmen, Michael Rodriguez, Sandip Sinharay, Steve Sireci, Dubravka Svetina, Ye Tong*, Anna Topczewski, Peter van Rijn, Jay Verkuilen, Alina von Davier, Matthias von Davier, Michael Walker, Chun Wang, Jonathan Weeks, Cathy Wendler, Andrew Wiley, Steve Wise, Duanli Yan, John Young, April Zenisky*
* Indicates Expert Panel Chairperson

Graduate Student Abstract Reviewers
Lokman Akbay, Beyza Aksu, Abeer Alamri, Bruce Austin, Elizabeth Barker, Diego Luna Bazaldua, Masha Bertling, Lisa Beymer, Mark Bond, Nuliyana Bukhari, Jie Chen, Michelle Chen, Yi-Chen Chiang, Shenghai Dai, Tianna Floyd, Oscar Gonzalez, Emily Ho, Landon Hurley, Charlie Iaconangelo, Andrew Iverson, Kyle Jennings, HeaWon Jun, Susan Kahn, Jaclyn Kelly, Brian Leventhal, Isaac Li, Dandan Liao, Fu Liu, David Martinez-Alpizar, Namita Mehta, Rich Nieto, Mary Norris, Nese Ozturk, Robyn Pitts, Ray Reichenberg, Sumeyra Sahbaaz, Tyler Sandersfeld, Can Shao, Benjamin Shear, Jordan Sparks, Rose Stafford, Latisha Sternod, Myrah Stockdale, Meghan Sullivan, Ragip Terzi, Stephanie Underhill, Keyin Wang, Min Wang, Ting Wang, Xiaolin Wang, Diah Wihardini, Elizabeth Williams, Immanuel Williams, Dawn Woods, Kuan Xing, Jing-Ru Xu, Menglin Xu, Sujin Yang, Ai Ye, Nedim Yel, Hulya Yurekli

Future Annual Meetings
2017 Annual Meeting: April 26-30, San Antonio, TX
2018 Annual Meeting: April 12-16, New York, NY, USA
2019 Annual Meeting: April 4-8, Toronto, Ontario, Canada

Hotel Floor Plans – Renaissance Washington, DC Downtown

A Message from Your Program Chairs
2016 NCME Program Highlights: Foundations and Frontiers: Advancing Educational Measurement for Research, Policy, and Practice
We are pleased to highlight a few of the many excellent sessions that our
members have contributed, as well as to congratulate our partners at AERA on their centennial celebration. From the very first conference session, at 8:15 AM on Saturday, April 9, we're kicking it off with big-picture topics (Henry Braun leading an invited session for the recent NCME volume, Challenges to Measurement in an Era of Accountability) alongside technical advances (Derek Briggs leading off a session on Learning Progressions for Measuring Growth). The momentum continues through our last session, at 4:05 PM on Monday, April 11, where we tackle buzz phrases (Thanos Patelis convening a session on Fairness Issues and Validation of Noncognitive Skills) and settle scores (The Great Subscore Debate, with Emily Bo, Howard Wainer, Sandip Sinharay, and many others facing off to surely resolve the issue once and for all). We are taking full advantage of our location in Washington, DC, with an invited session on the recently passed Every Student Succeeds Act over lunchtime on Monday. Peter Oppenheim and Sarah Bolton, Education Policy Directors (majority and minority, respectively) for the US Senate HELP Committee, will discuss key provisions and spark a discussion among researchers about ESSA's Implications and Opportunities for Measurement Research and Practice. Earlier that Monday morning, Kristen Huff will convene reporters and scholars in a session with the lively title Hold the Presses! How Measurement Professionals Can Speak More Effectively with the Press and the Public. Consistent with our theme, our many sessions highlight both foundations (Isaac Bejar coordinates a session on Item Difficulty Modeling: From Theory to Practice, while Karla Egan convenes a session on Standard Setting: Beyond Process) and frontiers (Tracy Sweet will lead a session on Recent Advances in Social Network Analysis, and Will Lorie takes on Big Data in Education: From Items to Policies). Stay up to date with the Twitter hashtag #NCME16 and our new NCME Facebook group. We are confident that you will enjoy the program that you have helped to create here at the 2016 NCME Annual Meeting.

Andrew Ho and Matt Johnson
2016 NCME Annual Meeting Co-Chairs

Pre-Conference Training Sessions
The 2016 NCME Pre-Conference Training Sessions will be held at the Renaissance Washington, DC Downtown Hotel on Thursday, April 7 and Friday, April 8. All full-day sessions will be held from 8:00 AM to 5:00 PM. All half-day morning sessions will be held from 8:00 AM to 12:00 noon. All half-day afternoon sessions will run from 1:00 PM to 5:00 PM. On-site registration for the Pre-Conference Training Sessions will be available at the NCME Information Desk at the Renaissance Washington, DC Downtown Hotel for those workshops that still have availability. Please note that internet connectivity will not be available for most training sessions; where applicable, participants should download the required software prior to the training sessions. Internet connectivity will be available for a few selected training sessions that have pre-paid an additional fee.
Pre-Conference Training Sessions - Thursday, April 7, 2016

Thursday, April 7, 2016
8:00 AM - 12:00 PM, Meeting Room 6, Meeting Room Level, Training Session, AA
Quality Control Tools in Support of Reporting Accurate and Valid Test Scores
Aster Tessema, American Institute of Certified Public Accountants; Oliver Zhang, The College Board; Alina von Davier, Educational Testing Service

All testing companies focus on ensuring that test scores are valid, reliable, and fair. Significant resources are allocated to meeting the guidelines of well-known organizations such as AERA/NCME and the International Test Commission (Allalouf, 2007; ITC, 2011). In this workshop we will discuss traditional QC methods, the operational testing process, and new QC tools for monitoring the stability of scores over time. We will provide participants with a practical understanding of:
1. The importance of flow charts and documentation of procedures
2. The use of software tools to monitor tasks
3. How to minimize the number of hand-offs
4. How to automate activities
5. The importance of trend analysis to detect anomalies
6. The importance of applying detective and preventive controls
7. Having a contingency plan
We will also show how to apply QC techniques from manufacturing to monitor scores. We will discuss applying traditional QC charts (Shewhart and CUSUM charts), time series models, and change point models to the means of scale scores to detect abrupt changes (Lee & von Davier, 2013). We will also discuss QC methods for the process of automated and human scoring of essays (Wang & von Davier, 2014).

Thursday, April 7, 2016
8:00 AM - 12:00 PM, Meeting Room 7, Meeting Room Level, Training Session, BB
IRT Parameter Linking
Wim van der Linden and Michelle Barrett, Pacific Metrics

The problem of IRT parameter linking arises when the values of the parameters for the same items or examinees in different calibrations need to be compared. So far, the problem has mainly been conceptualized as an instance of the problem of invariance of the measurement scale for the ability parameters, in the tradition of S. S. Stevens' interval scales. In this half-day training session, we show that the linking problem has little to do with arbitrary units and zeros of measurement scales but is instead the result of a more fundamental problem inherent in all IRT models: a general lack of identifiability of their parameters. The redefinition of the linking problem allows us to formally derive the linking functions required to adjust for the differences in parameter values between separate calibrations. It also leads to new, efficient statistical estimators of their parameters, the derivation of their standard errors, and the use of current optimal test-design methods to design linking studies with minimal error. All of these results have been established for the current dichotomous and polytomous IRT models. The results will be presented during four one-hour lectures appropriate for psychometricians with interest and/or practical experience in IRT parameter linking problems.
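For orientation, the sketch below shows the familiar mean/sigma transformation that places 2PL item parameters from a new calibration onto a base scale. It is a minimal illustration under simplifying assumptions, not the identifiability-based derivation the presenters describe, and the anchor-item parameter values are hypothetical.

import numpy as np

# Hypothetical 2PL estimates for the same anchor items from two separate calibrations.
a_base = np.array([1.10, 0.85, 1.30, 0.95])   # discriminations on the base scale
b_base = np.array([-0.50, 0.20, 0.75, 1.10])  # difficulties on the base scale
a_new = np.array([0.95, 0.74, 1.12, 0.83])    # discriminations from the new calibration
b_new = np.array([-0.35, 0.46, 1.09, 1.49])   # difficulties from the new calibration

# Mean/sigma linking: choose slope A and intercept B so that the new-form anchor
# difficulties match the base-form anchors in mean and standard deviation.
A = b_base.std(ddof=1) / b_new.std(ddof=1)
B = b_base.mean() - A * b_new.mean()

# Apply the linking function: b* = A*b + B and a* = a/A.
b_linked = A * b_new + B
a_linked = a_new / A

print(f"A = {A:.3f}, B = {B:.3f}")
print("linked difficulties:", np.round(b_linked, 3))
print("linked discriminations:", np.round(a_linked, 3))

Characteristic-curve methods (e.g., Stocking-Lord) refine this baseline by matching test characteristic curves rather than moments, and the session's identifiability-based treatment goes further still.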
Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 5, Meeting Room Level, Training Session, CC
21st Century Skills Assessment: Design, Development, Scoring, and Reporting of Character Skills
Patrick Kyllonen and Jonas Bertling, Educational Testing Service

This workshop will provide training, discussion, and hands-on experience in developing methods for assessing, scoring, and reporting on students' social-emotional and self-management or character skills. The workshop will focus on (a) reviewing the kinds of character skills most important to assess based on current research; (b) standard and innovative methods for assessing character skills, including self-, peer-, teacher-, and parent-rating-scale reports, forced-choice (rankings), anchoring vignettes, and situational judgment methods; (c) cognitive lab approaches for item tryout; (d) classical and item response theory (IRT) scoring procedures (e.g., 2PL, partial credit, nominal response model); (e) validation strategies, including the development of rubrics and behaviorally anchored rating scales, and correlations with external variables; (f) the use of anchors in longitudinal growth studies; (g) reliability from classical test theory (alpha, test-retest), item response theory, and generalizability theory; and (h) reporting issues. These topics will be covered in the workshop where appropriate, but the sessions within the workshop will tend to be organized around item types (e.g., forced-choice, anchoring vignettes). Examples will be drawn from various assessments, including PISA, NAEP, SuccessNavigator, FACETS, and others. The workshop is designed for a broad audience of assessment developers, analysts, and psychometricians working in either applied or research settings.

Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 2, Meeting Room Level, Training Session, DD
Introduction to Standard Setting
Chad Buckendahl, Alpine Testing Solutions; Jennifer Dunn, Measured Progress; Karla Egan, National Center for the Improvement of Educational Assessment; Lisa Keller, University of Massachusetts Amherst; Lee LaFond, Measured Progress

As states adopt new standards and assessments, the expectations placed on psychometricians from a political perspective have been increasing. The purpose of this training session is to provide a practical introduction to the standard setting process while addressing common policy concerns and expectations. This training will follow the Evidence-Based Standard Setting (EBSS) framework. The first third of the session will touch upon some of the primary pre-meeting developmental and logistical activities as well as the EBSS steps of defining outcomes and developing relevant research as guiding validity evidence. The middle third of the session will focus on the events of the standard setting meeting itself. The session facilitators will walk participants through the phases of a typical standard setting, and participants will experience a training session on the Bookmark, Angoff, and Body of Work methods followed by practice rating rounds with discussion. The final third of the training session will give an overview of what happens following a standard setting meeting. This will be carried out through a panel discussion with an emphasis on policy expectations and the importance of continuing to gather evidence in support of the standard.
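For readers new to the Bookmark method mentioned above, the sketch below shows the usual arithmetic that maps a bookmark placement in an ordered item booklet to a cut score under a 2PL model with a response probability criterion of 0.67 (RP67). The item parameters, the D = 1.7 scaling constant, and the bookmark placement are illustrative assumptions, not session materials.

import numpy as np

def rp_theta(a, b, rp=0.67, D=1.7):
    """Theta at which a 2PL item is answered correctly with probability rp.
    Solves rp = 1 / (1 + exp(-D * a * (theta - b))) for theta."""
    return b + np.log(rp / (1.0 - rp)) / (D * a)

# Hypothetical (discrimination, difficulty) pairs for a small item booklet.
items = [(0.9, -1.2), (1.1, -0.4), (1.0, 0.3), (1.3, 0.9), (0.8, 1.6)]

# Order items by their RP67 locations, as in an ordered item booklet.
locations = sorted(rp_theta(a, b) for a, b in items)

# A panelist places the bookmark after the third item; the cut score is the
# RP67 location of that bookmarked item (panelists' cuts are then averaged).
bookmark_page = 3
cut = locations[bookmark_page - 1]
print("RP67 locations:", np.round(locations, 2))
print(f"cut score (theta metric): {cut:.2f}")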
Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 16, Meeting Room Level, Training Session, EE
Analyzing NAEP Data Using Plausible Values and Marginal Estimation with AM
Emmanuel Sikali, National Center for Education Statistics; Young Yee Kim, American Institutes for Research

Since results from the National Assessment of Educational Progress (NAEP) serve as a common metric for all states and select urban districts, many researchers are interested in conducting studies using NAEP data. However, NAEP data pose many challenges for researchers because of their special design features. This class intends to provide analytic strategies and hands-on practice to researchers who are interested in NAEP data analysis. The class consists of two parts: (1) instruction on the psychometric and sampling designs of NAEP and the data analysis strategies required by these design features, and (2) demonstration of NAEP data analysis procedures and hands-on practice. The first part covers the marginal maximum likelihood estimation approach to obtaining scale scores and appropriate variance estimation procedures; the second part covers two approaches to NAEP data analysis, i.e., the plausible values approach and the marginal estimation approach with item response data. The demonstration and hands-on practice will be conducted with a free software program, AM, using a mini-sample public-use NAEP data file released in 2011. Intended participants are researchers, including graduate students, education practitioners, and policy analysts, who are interested in NAEP data analysis.

Thursday, April 7, 2016
8:00 AM - 5:00 PM, Meeting Room 4, Meeting Room Level, Training Session, FF
Multidimensional Item Response Theory: Theory, Applications, and Software
Lihua Yao, Defense Manpower Data Center; Mark Reckase, Michigan State University; Rich Schwarz, ETS

Theories and applications of multidimensional item response theory (MIRT) models, multidimensional computerized adaptive testing (MCAT), and MIRT linking are discussed. Software demonstrations and hands-on exercises cover multidimensional multi-group calibration, multidimensional linking, and MCAT simulation. The session is intended for researchers who are interested in MIRT and MCAT.

Thursday, April 7, 2016
1:00 PM - 5:00 PM, Meeting Room 3, Meeting Room Level, Training Session, GG
New Weighting Methods for Causal Mediation Analysis
Guanglei Hong, University of Chicago

Many important research questions in education relate to how interventions work. A mediator characterizes the hypothesized intermediate process. Conventional methods for mediation analysis generate biased results when the mediator-outcome relationship depends on the treatment condition. These methods also tend to have a limited capacity for removing confounding associated with a large number of covariates. This workshop teaches the ratio-of-mediator-probability weighting (RMPW) method for decomposing total treatment effects into direct and indirect effects in the presence of treatment-by-mediator interactions. RMPW is easy to implement and requires relatively few assumptions about the distribution of the outcome, the distribution of the mediator, and the functional form of the outcome model. We will introduce the concepts of causal mediation, explain the intuitive rationale of the RMPW strategy, and delineate the parametric and nonparametric analytic procedures.
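For orientation only, the sketch below illustrates the weighting idea with a randomized binary treatment T, a binary mediator M, pretreatment covariates X, and an outcome Y. The variable names and the logistic mediator models are illustrative assumptions; this is not the workshop's standalone RMPW program.

import numpy as np
from sklearn.linear_model import LogisticRegression

def rmpw_decomposition(X, T, M, Y):
    """Crude RMPW-style effect decomposition for binary T and M.

    Treated units are weighted by P(M = m | X, T = 0) / P(M = m | X, T = 1)
    to approximate the counterfactual mean E[Y(1, M(0))], which is then
    contrasted with the observed arm means to split the total effect into
    indirect (mediated) and direct components. X must be a 2-D array."""
    treated, control = (T == 1), (T == 0)

    # Mediator models fitted separately within each treatment arm.
    m1 = LogisticRegression().fit(X[treated], M[treated])
    m0 = LogisticRegression().fit(X[control], M[control])

    p1 = m1.predict_proba(X[treated])[:, 1]  # P(M = 1 | X, T = 1) for treated units
    p0 = m0.predict_proba(X[treated])[:, 1]  # P(M = 1 | X, T = 0) for treated units
    w = np.where(M[treated] == 1, p0 / p1, (1 - p0) / (1 - p1))

    y1_m1 = Y[treated].mean()                  # E[Y(1, M(1))]
    y0_m0 = Y[control].mean()                  # E[Y(0, M(0))]
    y1_m0 = np.average(Y[treated], weights=w)  # approximately E[Y(1, M(0))]

    return {"total": y1_m1 - y0_m0,
            "indirect": y1_m1 - y1_m0,
            "direct": y1_m0 - y0_m0}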
Participants will gain hands-on experience with a free standalone RMPW software program. We will also provide SAS, Stata, and R code and will distribute related readings. The target audience includes graduate students, early career scholars, and advanced researchers who are familiar with multiple regression and have had prior exposure to binary and multinomial logistic regression. Each participant will need to bring a laptop for hands-on exercises.

Thursday, April 7, 2016
1:00 PM - 5:00 PM, Meeting Room 6, Meeting Room Level, Training Session, II
Computerized Multistage Adaptive Testing: Theory and Applications (Book by Chapman and Hall)
Duanli Yan, Educational Testing Service; Alina von Davier, ETS; Kyung Chris Han

This workshop provides a general overview of computerized multistage test (MST) design and its important concepts and processes. The focus of the workshop will be on MST theory and applications, including alternative scoring and estimation methods, classification tests, routing and scoring, linking, and test security, as well as a live demonstration of the MST software MSTGen (Han, 2013). The workshop is based on the edited volume by Yan, von Davier, and Lewis (2014). The volume is structured to take the reader through all the operational aspects of the test, from design to post-administration analyses. The training course consists of a series of lectures and hands-on examples in the following four sessions:
1. MST Overview, Design, and Assembly
2. MST Routing, Scoring, and Estimation
3. MST Applications
4. MST Simulation Software
The course describes the MST design, why it is needed, and how it differs from other test designs, such as linear tests and computerized adaptive test (CAT) designs. The course is intended for people who have some basic understanding of item response theory and CAT.

Pre-Conference Training Sessions - Friday, April 8, 2016

Friday, April 8, 2016
8:00 AM - 12:00 PM, Renaissance West B, Ballroom Level, Training Session, JJ
Landing Your Dream Job for Graduate Students
Deborah Harris and Xin Li, ACT, Inc.

This training session will address practical topics that graduate students in measurement are interested in regarding finding a job and starting a career. It will concentrate on what to do now, while still in school, to best prepare for a job (including finding a dissertation topic, selecting a committee, maximizing experiences while still a student through networking, internships, and volunteering, and suggestions about what types of coursework an employer looks for and what makes a good job talk); how to locate, interview for, and obtain a job (including how to find where jobs are and how to apply for them, targeting cover letters, references, and resumes); what to expect in the interview process (including job talks, questions to ask, and negotiating an offer); and what comes next after starting the first post-PhD job (including adjusting to the environment, establishing a career path, publishing, finding mentors, balancing work and life, and becoming active in the profession). The session is interactive and geared toward addressing participants' questions during the session. Resource materials are provided on all relevant topics.
Friday, April 8, 2016
8:00 AM - 12:00 PM, Meeting Room 4, Meeting Room Level, Training Session, KK
Bayesian Analysis of IRT Models Using SAS PROC MCMC
Clement Stone, University of Pittsburgh

There is a growing interest in Bayesian estimation of IRT models, in part due to the appeal of the Bayesian paradigm, as well as the advantages of these methods with small sample sizes, more complex models (e.g., multidimensional models), and simultaneous estimation of item and person parameters. Software such as SAS and WinBUGS has become available that makes Bayesian analysis of IRT models more accessible to psychometricians, researchers, and scale developers. SAS PROC MCMC offers several advantages over other software, and the purpose of this training session is to illustrate how SAS can be used to implement a Bayesian analysis of IRT models. After a brief review of Bayesian methods and IRT models, PROC MCMC is introduced. This introduction includes discussion of a template for estimating IRT models as well as convergence diagnostics and the specification of prior distributions. Also discussed are extensions for more complex models (e.g., multidimensional, mixture) and methods for comparing models and evaluating model fit. The instructional approach will involve lecture and demonstration. Considerable code and output will be discussed and shared. An overall objective is that attendees can extend the examples to their own testing applications. Some understanding of SAS programs and SAS procedures is helpful.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Meeting Room 2, Meeting Room Level, Training Session, LL
flexMIRT®: Flexible Multilevel Multidimensional Item Analysis and Test Scoring
Li Cai, University of California - Los Angeles; Carrie R. Houts, Vector Psychometric Group, LLC

There has been a tremendous amount of progress in item response theory (IRT) in the past two decades. flexMIRT® is IRT software that offers multilevel, multidimensional, and multiple-group item response models. flexMIRT® also offers users the ability to obtain recently developed model fit indices, fit diagnostic classification models, and fit models with non-normal latent densities, among other advanced features. This training session will introduce users to the flexMIRT® system and provide valuable hands-on experience with the software.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Meeting Room 5, Meeting Room Level, Training Session, MM
Aligning ALDs and Item Response Demands to Support Teacher Evaluation Systems
Steve Ferrara, Pearson School; Christina Schneider, The National Center for the Improvement of Educational Assessment

A primary goal of achievement tests is to classify students into achievement levels that enable inferences about student knowledge and skill. Explicating, at the beginning of test design, how knowledge and skills differ in complexity and empirical item difficulty is critical to those inferences. In this session we demonstrate, for experts in assessment design, standard setting, formative assessment, or teacher evaluation, how emerging practices in statewide testing for developing achievement level descriptors (ALDs), training item writers to align items to ALDs, and identifying item response demands can be used to support teachers in developing student learning objectives (SLOs) in non-tested grades and subjects.
Participants will analyze ALDs, practice writing items aligned to those ALD response demands, and analyze classroom work products from teachers who used some of these processes to create SLOs. We will apply a framework for connecting ALDs (Egan et al., 2012), the ID Matching standard setting method (Ferrara & Lewis, 2012), and item difficulty modeling techniques (Ferrara et al., 2011; Schneider et al., 2013) to a process that generalizes from statewide tests to SLOs, thereby supporting construct validity arguments for student achievement indicators used for teacher evaluation.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Renaissance East, Ballroom Level, Training Session, NN
Best Practices for Lifecycles of Automated Scoring Systems for Learning and Assessment
Peter Foltz, Pearson; Claudia Leacock, CTB/McGraw Hill; André Rupp and Mo Zhang, Educational Testing Service

Automated scoring systems are designed to evaluate performance data in order to assign scores, provide feedback, and/or facilitate teaching-learning interactions. Such systems are used in K-12 and higher education in areas such as ELA, science, and mathematics, as well as in professional domains such as medicine and accounting, across various use contexts. Over the past 20 years, there has been rapid growth in research on the underlying theories and methods of automated scoring, the development of new technologies, and ways to implement automated scoring systems effectively. Automated scoring systems are developed by a diverse community of scholars and practitioners encompassing such fields as natural language processing, linguistics, speech science, statistics, psychometrics, educational assessment, and the learning and cognitive sciences. As the application of automated scoring continues to grow, it is important for the NCME community to have an overarching understanding of best practices for designing, evaluating, deploying, and monitoring such systems. In this training session, we provide participants with such an understanding via a mixture of presentations, individual and group-level discussions, and structured and free-play demonstration activities. We utilize systems that are both proprietary and freely available, and we provide participants with resources that empower them in their own future work.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Meeting Room 3, Meeting Room Level, Training Session, OO
Test Equating Methods and Practices
Michael Kolen and Robert Brennan, University of Iowa

The need for equating arises whenever a testing program uses multiple forms of a test that are built to the same specifications. Equating is used to adjust scores on test forms so that scores can be used interchangeably. The goals of the session are for attendees to be able to understand the principles of equating, to conduct equating, and to interpret the results of equating in reasonable ways. The session focuses on conceptual issues; practical issues are also considered.

Friday, April 8, 2016
8:00 AM - 5:00 PM, Renaissance West A, Ballroom Level, Training Session, PP
Diagnostic Measurement: Theory, Methods, Applications, and Software
Jonathan Templin and Meghan Sullivan, University of Kansas

Diagnostic measurement is a field of psychometrics that focuses on providing actionable feedback from multidimensional tests.
This workshop provides a hands-on introduction to the terms, techniques, and methods used for diagnosing what students know, thereby giving researchers access to information that can be used to guide decisions regarding students' instructional needs. Upon completion of the workshop, participants will be able to understand the rationale and motivation for using diagnostic measurement methods. Furthermore, participants will be able to understand the types of data typically used in diagnostic measurement along with the information that can be obtained from implementing diagnostic models. Participants will become well-versed in the state-of-the-art techniques currently used in practice and will be able to use and estimate diagnostic measurement models using new software developed by the instructor.

Friday, April 8, 2016
1:00 PM - 5:00 PM, Renaissance West B, Ballroom Level, Training Session, QQ
Effective Item Writing for Valid Measurement
Anthony Albano, University of Nebraska-Lincoln; Michael Rodriguez, University of Minnesota-Twin Cities

In this training session, participants will learn to write and critique high-quality test items by implementing item-writing guidelines and validity frameworks for item development. Educators, researchers, test developers, and other test users are encouraged to participate. Following the session, participants should be able to: implement empirically based guidelines in the item writing process; describe procedures for analyzing and validating items; apply item-writing guidelines in the development of their own items; and review items from peers and provide constructive feedback based on adherence to the guidelines. The session will consist of short presentations with small-group and large-group activities. Materials will be contextualized within common testing applications (e.g., classroom assessment, response to intervention, progress monitoring, summative assessment, entrance examination, licensure/certification). Participants are encouraged to bring a laptop computer, as they will be given access to a web application that facilitates collaboration in the item-writing process; those participating in the session in person and remotely will use the application to create and comment on each other's items online. This practice in item writing will allow participants to demonstrate understanding of what they have learned and to receive feedback on their items from peers and the presenters.

Friday, April 8, 2016
3:00 PM - 7:00 PM, Meeting Room 11, Meeting Room Level
NCME Board of Directors Meeting
Members of NCME are invited to attend as observers.

Friday, April 8, 2016
4:30 PM - 6:30 PM, Fado's Irish Pub (Graduate Students only)
Graduate Student Social
Come enjoy FREE appetizers at a local venue within walking distance of the conference hotels. The first 50 graduate student attendees receive one free drink ticket. Exchange research interests and discuss your work with fellow graduate students from NCME & AERA Division D. Fado's Irish Pub is located at 808 7th Street NW, Washington, DC 20001.

Friday, April 8, 2016
6:30 PM - 10:00 PM, Ballroom C, Level Three, Convention Center
AERA Centennial Symposium & Centennial Reception
The Centennial Annual Meeting's Opening Session and Reception will celebrate AERA's 100-year milestone in grand style.
Together, the elements of this energizing and dynamic opening session will commemorate the association's history, highlight the breadth and unity of the field of education research as it has evolved around the world, and begin to explore second-century pathways for advancing AERA's mission. The centerpiece of the opening plenary session will be a "Meet the Press"-style Power Panel and Town Hall discussion that takes a critical look at the current "State of the Field" for education research – taking stock of its complex history and imagining its future. The Post Reception will be an elegant and festive party for members and friends of AERA.

Annual Meeting Program - Saturday, April 9, 2016

Saturday, April 9, 2016
6:30 AM - 7:30 AM, Meeting Room 7, Meeting Room Level
Sunrise Yoga
Please join us for the second NCME Sunrise Yoga. We will start promptly at 6:30 a.m. for one hour at the Renaissance. Advance registration required ($10) to reserve your mat. NO EXPERIENCE NECESSARY. Just bring your body and your mind, and our friends from Flow Yoga Center (http://www.flowyogacenter.com/) will do the rest. Namaste.

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Renaissance East, Ballroom Level, Invited Session, A1
NCME Book Series Symposium: The Challenges to Measurement in an Era of Accountability
Session Chair: Henry Braun, Boston College
Session Discussants: Suzanne Lane, University of Pittsburgh; Scott Marion, National Center for the Improvement of Educational Assessment

This symposium draws on The Challenges to Measurement in an Era of Accountability, a recently published volume in the new NCME Book Series. The volume addresses a striking imbalance: Although it is not possible to calculate test-based indicators (e.g. value-added scores or mean growth percentiles) for more than 70 percent of teachers, assessment and accountability issues in those other subject/grade combinations have received comparatively little attention in the research literature. The book brought together experts in educational measurement, as well as those steeped in the various disciplines, to provide a comprehensive and accessible guide to the measurement of achievement in a broad range of subjects, with a primary focus on high school grades. The five focal presentations will offer discipline-specific perspectives from: social sciences, world languages, performing arts, life sciences and physical sciences. Each presentation will include a brief review of assessment (both formative and summative) in the discipline, with particular attention to the unique circumstances faced by teachers and measurement specialists responsible for assessment design and development, followed by a survey of current assessment initiatives and responses to accountability pressures. The symposium offers the measurement community a unique opportunity to learn about assessment practices and challenges across the disciplines.
Use of Evidence Centered Design in Assessment of History Learning
Kadriye Ercikan, University of British Columbia; Pamela Kaliski, College Board

Assessment Issues in World Languages
Meg Malone, Center for Applied Linguistics; Paul Sandrock, American Council on the Teaching of Foreign Languages

Arts Assessment in an Age of Accountability: Challenges and Opportunities in Implementation, Design, and Measurement
Scott Shuler, Connecticut Department of Education, Ret.; Tim Brophy, University of Florida; Robert Sabol, Purdue University

Assessing the Life Sciences: Using Evidence-Centered Design for Accountability Purposes
Daisy Rutstein and Britte Cheng, SRI International

Assessing Physical and Earth and Space Science in the Context of the NRC Framework for K-12 Science Education and the Next Generation Science Standards
Nathaniel Brown, Boston College

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Renaissance West A, Ballroom Level, Coordinated Session, A2
Collaborative Problem Solving Assessment: Challenges and Opportunities
Session Chairs: Yigal Rosen, Pearson; Lei Liu, ETS
Session Discussant: Samuel Greiff, University of Luxembourg

Collaborative problem solving (CPS) is a critical competency for college and career readiness. Students emerging from schools into the workforce and public life will be expected to have CPS skills as well as the ability to perform that collaboration in various group compositions and environments (Griffin, Care, & McGaw, 2012; OECD, 2013). Recent curriculum and instruction reforms have focused to a greater extent on teaching and learning CPS (National Research Council, 2012; OECD, 2012). However, structuring standardized computer-based assessment of CPS skills, specifically for large-scale assessment programs, is challenging. In this symposium a spectrum of approaches to collaborative problem solving assessment will be introduced, and four papers will be presented and discussed.

PISA 2015 Collaborative Problem Solving Assessment Framework
Art Graesser, University of Memphis

Human-To-Agent Approach in Collaborative Problem Solving Assessment
Yigal Rosen, Pearson

Collaborative Problem Solving Assessment: Bring Social Aspect into Science Assessment
Lei Liu, Jiangang Hao, Alina von Davier and Patrick Kyllonen, ETS

Assessing Collaborative Problem Solving: Students' Perspective
Haggai Kupermintz, University of Haifa

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Renaissance West B, Ballroom Level, Coordinated Session, A3
Harnessing Technological Innovation in Assessing English Learners: Enhancing Rather Than Hindering
Session Chair: Dorry Kenyon, Center for Applied Linguistics
Session Discussant: Mark Reckase, Michigan State University

How do English Learners (ELs) interact with technology in large-scale testing? In this coordinated session, an interdisciplinary team from the Center for Applied Linguistics presents findings from four years of research and development for the WIDA Consortium. For nine years, WIDA has offered an annual paper-and-pencil assessment of developing academic English language proficiency (ELP), known as ACCESS for ELLs, used to assess over 1 million ELs in 36 states. With federal funding, WIDA and its partners have transitioned this assessment to a web-based assessment, ACCESS 2.0, now in its first operational year (2015-2016).
ACCESS 2.0 is used to assess ELs at all levels of English language development, from grades 1 to 12, and to assess all four language domains (listening, speaking, reading, and writing). Thus, the research and development activities covered multiple critical issues pertaining to ELs and technology in large-scale assessments. In this session, we share research findings from several inter-related perspectives, including improving accuracy of measurement, developing complex web-based performance-assessment tasks, and familiarity with technology in the EL population, including keyboarding and interfacing with technology-enhanced task types. These findings provide insight into the valid assessment of ELs using technology for a wide variety of uses.

Keyboarding and the Writing Construct for ELs
Jennifer Renn and Jenny Dodson, Center for Applied Linguistics

Supporting Extended Discourse Through a Computer-Delivered Assessment of Speaking
Megan Montee and Samantha Musser, Center for Applied Linguistics

Using Multistage Testing to Enhance Measurement
David MacGregor and Xin Yu, Center for Applied Linguistics

Enhanced Item Types—Engagement or Unnecessary Confusion for ELs?
Jennifer Norton and Justin Kelly, Center for Applied Linguistics

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 3, Meeting Room Level, Paper Session, A4
How Can Assessment Inform Classroom Practice?
Session Discussant: Priya Kannan, ETS

What Score Report Features Promote Accurate Remediation? Insights from Cognitive Interviews
Francis Rick, University of Massachusetts, Amherst; Amanda Clauser, National Board of Medical Examiners
Cognitive interviews were conducted with medical students interacting with score reports to investigate what content and design features promote adequate interpretations and remediation decisions. Transcribed "speech bursts" were coded based on pre-established categories, which were then used to evaluate the effectiveness of each report format.

Evaluating the Degree of Coherence Between Instructional Targets and Measurement Models
Lauren Deters, Lori Nebelsick-Gullet, Charlene Turner, Bill Herrera and Elizabeth Towles, edCount, LLC
To solidify the links between the instructional and measurement contexts for its overall assessment system, the National Center and State Collaborative investigated the degree of coherence among the system's measurement targets, learning expectations, and targeted long-range outcomes. This study provides evidence for the system's coherence across instruction and assessment contexts.

Modeling the Instructional Sensitivity of Polytomous Items
Alexander Naumann and Johannes Hartig, German Institute for International Educational Research (DIPF); Jan Hochweber, University of Teacher Education St. Gallen (PHSG)
We propose a longitudinal multilevel IRT model for the instructional sensitivity of polytomous items. The model permits evaluation of global and differential sensitivity based on average change and variation of change in classroom-specific item locations and thresholds. Results suggest that the model performs well in its application to empirical data.

Growth Sensitivity and Standardized Assessments: New Evidence on the Relationship
Shalini Kapoor, ACT; Catherine Welch and Steve Dunbar, Iowa Testing Programs/University of Iowa
Academic growth measurement requires structured feedback that informs not only what students know but also what they need to know to learn and grow.
This research proposes a method that can support the generation of content-related growth feedback, which can help tailor classroom instruction to student-specific needs.

Using Regression-Based Growth Models to Inform Learning with Multiple Assessments
Ping Yin and Dan Mix, Curriculum Associates
This study evaluates the feasibility of two types of regression-based growth models to inform student learning using a computer adaptive assessment administered multiple times throughout a school year. With the increased interest in using assessments to inform instruction and learning, it is important to evaluate whether current growth models can support such goals.

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 4, Meeting Room Level, Coordinated Session, A5
Enacting a Learning Progression Design to Measure Growth
Session Chair: Damian Betebenner, National Center for the Improvement of Educational Assessment

The concept of growth is at the foundation of the policy and practice around systems of educational accountability. Yet there is a disconnect between the criterion-referenced intuitions that parents and teachers have for what it means for students to demonstrate growth and the primarily norm-referenced metrics that are used to infer growth. One way to address this disconnect would be to develop vertically linked score scales that could be used to support both criterion-referenced and norm-referenced interpretations, but this hinges upon having a coherent conceptualization of what it is that is growing from grade to grade. The purpose of this session is to facilitate debate about the design of large-scale assessments for the intended purpose of drawing inferences about student growth, a topic that was the recent subject of a 2015 focus article and commentaries in the journal Measurement. A learning-progression approach to the conceptualization of growth and the subsequent design of a vertical score scale will be described in the context of student understanding of proportional reasoning, a big-picture idea from the Common Core State Standards for Mathematics. Subsequent presentations and discussion will focus on the pros and cons of the proposed approach and of other possible alternatives.

Using Learning Progressions to Design Vertical Scales
Derek Briggs and Fred Peck, University of Colorado

Challenges in Modeling and Measuring Learning Progressions
Jere Confrey, Ryan Seth Jones, and Garron Gianopulos, North Carolina State University

The Importance of Content-Referenced Score Interpretations
Scott Marion, National Center for the Improvement of Educational Assessment

Challenges on the Path to Implementation
Joseph Martineau and Adam Wyse, National Center for the Improvement of Educational Assessment

Growth Through Levels
David Thissen, University of North Carolina

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 5, Meeting Room Level, Paper Session, A6
Testlets and Multidimensionality in Adaptive Testing
Session Discussant: Chun Wang, University of Minnesota

Measuring Language Ability of Students with Compensatory MCAT: A Post-Hoc Simulation Study
Burhanettin Özdemir and Selahattin Gelbal, Hacettepe University
The purpose of this study is to determine the most suitable multidimensional CAT design for measuring students' language ability and to compare paper-and-pencil test outcomes to those of the new MCAT designs. A real data set from an English proficiency test was used to create an item pool consisting of 565 items.
Multidimensional CAT Classification Method for Composite Scores
Lihua Yao and Dan Segall, Defense Manpower Data Center
The current research proposes an item selection method that uses cut points on the composite score for classification purposes in the multidimensional CAT framework. The classification accuracy for the composite score under the proposed method is compared with that of other existing MCAT methods.

Two Bayesian Online Calibration Methods in Multidimensional Computerized Adaptive Testing
Ping Chen, Beijing Normal University
To solve the non-convergence issue in M-MEM (Chen & Xin, 2013) and improve calibration precision, this study combined Bayes Modal Estimation (BME) (Mislevy, 1986) with M-OEM and M-MEM to make full use of prior information, and proposed two Bayesian online calibration methods for MCAT (M-OEM-BME and M-MEM-BME).

Item Selection in Testlet-Based CAT
Mark Reckase and Xin Luo, Michigan State University
Research on item selection in testlet-based CAT is rare. This study compared three item selection approaches (one based on a polytomous model and two on dichotomous models) and investigated some factors that might influence the effectiveness of CAT. The three approaches obtained similar measurement accuracy but different exposure rates.

Effects of Testlet Characteristics on Estimating Abilities in Testlet-Based CAT
Seohong Pak, University of Iowa; Hong Qian and Xiao Luo, NCSBN
Testlet selection methods, testlet sizes, degrees of variation in item difficulties within each testlet, and degrees of testlet random effect were investigated under testlet-based CAT. The 48 conditions were each run 50 times using R, and results were compared based on measurement accuracy and decision accuracy.

Computerized Mastery Testing (CMT) Without the Use of Item Response Theory
Sunhee Kim and Adena Lebeau, Prometric; Tammy Trierweiler, Law School Admission Council (LSAC); F. Jay Breyer and Charles Lewis, Educational Testing Service; Robert Smith, Smith Consulting
This study demonstrates that CMT can be successfully implemented in a real-world application when testlets are constructed using classical item statistics. As CMT is easier to implement and more cost efficient than CAT test designs, credentialing programs that have small samples and item pools may benefit from this approach.

Saturday, April 9, 2016
8:15 AM - 10:15 AM, Meeting Room 12, Meeting Room Level, Paper Session, A7
Methods for Examining Local Item Dependence and Multidimensionality
Session Discussant: Ki Matlock, Oklahoma State University

Examining Unidimensionality with Parallel Analysis on Residuals
Tianshu Pan, Pearson
The current study compares the performance of several parallel analysis (Horn, 1965) procedures for checking the unidimensionality of simulated unidimensional and multidimensional data: the procedures of Reckase (2009) and Drasgow and Lissak (1983), regular parallel analysis, and a new parallel analysis procedure proposed by this study.

A Conditional IRT Model for Directional Local Item Dependency in Multipart Items
Dandan Liao, Hong Jiao and Robert Lissitz, University of Maryland, College Park
A multipart item consists of two related questions, which potentially introduces conditional local item dependence (LID) between the two parts of the item. This paper proposes a conditional IRT model for directional LID in multipart items and compares different approaches to modeling LID in terms of parameter estimation through a simulation study.
Fit Index Criteria in Confirmatory Factor Analysis Models Used by Measurement Practitioners
Anne Corinne Huggins-Manley and HyunSuk Han, University of Florida
Measurement practitioners often use CFA models to assess unidimensionality and local independence in test data. Current guidelines for assessing the fit of CFA models are possibly inappropriate because they were not developed under measurement-oriented conditions. This study provides CFA fit index cutoff recommendations for evaluating IRT model assumptions.

Multilevel Bi-Factor IRT Models for Wording Effects
Xiaorui Huang, East China Normal University
A multilevel bi-factor IRT model was developed to account for wording effects in mixed-format scales and multilevel data structures. Simulation studies demonstrated good parameter recovery for the new model and underestimation of standard errors when multilevel data structures were ignored. An empirical example is provided.

A Generalized Multinomial Error Model for Tests That Violate Conditional Independence Assumptions
Benjamin Andrews, ACT
A generalized multinomial error model is presented that allows for dependency among vectors of item responses. This model can be used in instances where polytomous items are related to the same passage or where responses are rated on several different traits. Examples and comparisons to G theory methods are discussed.

Both Local Item Dependencies and Cut-Point Location Impact Examinee Classifications
Jonathan Rubright, American Institute of Certified Public Accountants
This simulation study demonstrates that the strength of local item dependencies and the location of an examination's cut-point both influence the sensitivity and specificity of examinee classifications under unidimensional IRT. Practical implications are discussed in terms of false positives and false negatives for test takers.

Saturday, April 9, 2016
10:35 AM - 12:05 PM, Renaissance East, Ballroom Level, Coordinated Session, B1
The End of Testing as We Know It?
Session Chair: Randy Bennett, ETS
Session Presenters: Randy Bennett, ETS; Joan Herman, UCLA-CRESST; Neal Kingston, University of Kansas

The rapid evolution of technology is affecting all aspects of our lives: commerce, communication, leisure, and education. Activities like travel planning, news consumption, and music purchasing have been so dramatically affected as to have caused significant shifts in how services and products are packaged, marketed, distributed, priced, and sold. Those shifts have been dramatic enough to have substantially reduced the influence of once-staple products like newspapers and the companies that provide them. Technology has come to education and educational testing too, though more slowly than to other areas. Still, there is growing evidence that the future for these fields will be considerably different and that those differences will emerge quickly. Billions of dollars are being invested in new technology-based products and services for K-12 as well as higher education, huge amounts of student data are being collected through these offerings, tests are moving to digital delivery and being substantially changed in the process, and the upheaval that has occurred in other industries may come to education too. What will and won't change for educational testing? This panel presentation will include three speakers, each offering a different scenario for the future of K-12 assessment.
Saturday, April 9, 2016
10:35 AM - 12:05 PM, Renaissance West A, Ballroom Level, Coordinated Session, B2
Fairness and Machine Learning for Educational Practice
Session Chair: Alina von Davier
Session Moderator: Jill Burstein
Session Panelists: Nitin Madnani and Aoife Cahill, Educational Testing Service; Solon Barocas, Princeton University; Brendan O'Connor, University of Massachusetts Amherst; James Willis, Indiana University

This panel will address issues around fairness and transparency in the application of machine learning (ML) to education, in particular to learning and assessment. Panelists include experts in NLP, Computational Psychometrics (CP), and education technology policy and ethics. Panelists will respond to questions such as the following:
1. Are data-driven methods used alone ever OK?
2. Are there use cases that are more acceptable than others from a fairness perspective?
3. Are there examples from other domains that we may apply to educational assessment?
4. In the case of scoring written essays: What is the difference between human raters and ML methods? For human raters, at least in writing, we know what they are supposed to consider but not what they actually choose or how they weight it. For ML methods, we actually "know" what features go in, but weightings and predictive modeling can be black-box-like. Is this any less true for human raters?
5. Under what conditions is interpretability important? For instance, how do we isolate diagnostic information if we use ML for predicting learning outcomes?
6. Can we detect underlying bias in large data sets from education? If we identify bias, is it acceptable to adjust the ML algorithms to eliminate it? Can these adjustments be misused?
7. What types of evaluation methods should one employ to ensure that the results are fair to all groups?
The moderator will lead the panel by presenting questions to the panelists and managing the discussion. The panel discussion will be 60 minutes, and there will be an additional 30 minutes intended for questions and discussion with the audience.

Saturday, April 9, 2016
10:35 AM - 12:05 PM, Renaissance West B, Ballroom Level, Coordinated Session, B3
Item Difficulty Modeling: From Theory to Practice
Session Chair: Isaac I. Bejar
Session Discussant: Steve Ferrara

Item difficulty modeling (IDM) is concerned both with understanding the variability in estimated item difficulty and with explanatory item response modeling that incorporates difficulty covariates. The symposium starts with an overview of the multiple applications of difficulty modeling, ranging from purely theoretical to practical. The following presentations then focus on empirical research on the modeling of mathematics items used in K-12 and graduate admissions assessments.
Specifically, the following research will be presented:
• The use of a validated IDM for generating items by means of family and structural variants
• The multidisciplinary development of an IDM for practical day-to-day application
• An evaluation of the feasibility of automating the propositional analysis of existing items to study the role of linguistic variables in item difficulty
• The fitting of an explanatory IRT model that extends the LLTM by fixing residuals to fully account for difficulty

An Overview of the Purposes of Item Difficulty Modeling (IDM)
Isaac Bejar, ETS

Implications of Item Difficulty Modeling for Item Design and Item Generation
Susan Embretson, Georgia Institute of Technology

Developing an Item Difficulty Model for Quantitative Reasoning: A Knowledge Elicitation Approach
Edith Aurora Graf, ETS

Exploring an Automated Approach to Examining Linguistic Context in Mathematics Items
Kristin Morrison, Georgia Institute of Technology

An Explanatory Model for Item Difficulties with Fixed Residuals
Paul De Boeck, Ohio State University

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 3, Meeting Room Level, Paper Session, B4
Growth and Vertical Scales
Session Discussant: Anna Topczewski, Pearson

Estimating Vertical Scale Drift Due to Repetitious Horizontal Equating
Emily Ho, Michael Chajewski and Judit Antal, College Board
The stability of a vertical scale as a function of repeated administrations is rarely studied. Our empirical simulation uses 2PL math items, generating forms for test-takers from three grades. We examine the effect of ability, test difficulty, and equating designs on vertical scale stability when applying repetitious horizontal equating.

An EIRM Approach for Studying Latent Growth in Alphabet Knowledge Among Kindergarteners
Xiaoxin Wei, American Institutes for Research; Patrick Meyer and Marcia Invernizzi, University of Virginia
We applied a series of latent growth explanatory item response models to study growth in alphabet knowledge over three time points. Models allowed for time-varying item parameters and evaluated the impact of person properties on growth. Results show that growth differs by examinee group in expected and unexpected ways.

Vertical Scaling and Item Location: Generalizing from Horizontal Linking Designs
Stephen Murphy, Rong Jin, Bill Insko and Sid Sharairi, Houghton Mifflin Harcourt
Establishing a vertical scale for an assessment is beset with practical decisions. Outcomes of these decisions are essential to valid interpretations of student growth and teacher effectiveness (Briggs, Weeks, & Wiley, 2008). This study adds to the existing literature by examining the impact of item location on the vertical scale.

Predictive Accuracy of Model Inferences for Longitudinal Data with Self-Selection
Tyler Matta, Yeow Meng Thum and Quinn Lathrop, Northwest Evaluation Association
Conventional approaches to characterizing classification accuracy are not valid when data are subject to self-selection. We introduce predictive accuracy, a framework that appropriately accounts for the impact of nonignorable missing data. We provide an illustration using longitudinal assessment data to predict college readiness when college test takers are self-selected.
Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 4, Meeting Room Level, Paper Session, B5
Perspectives on Validation
Session Discussant: Mark Shermis, University of Houston-Clear Lake

Using a Theory of Action to Ensure High Quality Tests
Cathy Wendler, Educational Testing Service
A theory of action helps testing programs ensure high quality tests by documenting claims, determining the evidence needed to support those claims, and creating solutions to address unintended consequences. This presentation describes the components of a theory of action and how it is being used to evaluate and improve programs.

Teacher Evaluation Systems: Mapping a Validity Argument
Tia Sukin and W. Nicewander, Pacific Metrics; Phoebe Winter, Consultant, Assessment Research and Development
Providing validation evidence for teacher evaluation systems is a complex and historically neglected task. This paper provides a framework for building an argument for the use of comprehensive teacher evaluation systems, which will allow for the identification of possible weaknesses in the system that need to be addressed.

Validity Evidence to Support Alternate Assessment Score Uses: Fidelity and Response Processes
Meagan Karvonen, Russell Swinburne Romine and Amy Clark, University of Kansas
Validity of score interpretations and uses for new online alternate assessments for students with significant cognitive disabilities (AA-AAS) requires new sources of evidence about student and teacher actions during the test administration process. We present findings from student cognitive labs, teacher cognitive labs, and test administration observations for an AA-AAS.

Communicating Psychometric Research to Policymakers
Andrea Lash and Mary Peterson, WestEd; Benjamin Hayes, Washoe County School District
Policymakers' implicit assumptions about assessment data inform their designs of educator evaluation systems. How can psychometricians help policymakers evaluate the validity of their assumptions? We examine a two-year effort in one state using a model of science communication for political contexts and an argument-based validation framework.

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, B6
Model Fit
Session Discussant: Matthew Johnson, Teachers College

Evaluation of Item Response Theory Item-Fit Indices
Adrienne Sgammato and John Donoghue, Educational Testing Service
The performance of Pearson chi-square and likelihood ratio item-level model fit indices based on observed data was evaluated in the presence of complex sampling of items (i.e., BIB sampling). Distributional properties, Type I error, and power of these measures were evaluated.

Rethinking Complexity in Item Response Theory Models
Wes Bonifay, University of Missouri
The notion of complexity commonly refers to the number of freely estimated parameters in a model. An investigation of five popular measurement models suggests that complexity in IRT should be defined not by the number of parameters, but instead by the functional form of the model.

Measures for Identifying Non-Monotonically Increasing Item Response Functions
Nazia Rahman and Peter Pashley, Law School Admission Council; Charles Lewis, Educational Testing Service
This study explored statistical measures as bases for defining robust criteria for checking for non-monotonicity in multiple-choice tests; these measures may be considered analogous to effect size measures.
The three methods adapted to identify non-monotonicity in items were Mokken's scalability coefficient, isotonic regression analysis, and a nonparametric smooth regression method.

Evaluation of Limited Information IRT Model-Fit Indices Applied to Complex Item Samples
John Donoghue and Adrienne Sgammato, Educational Testing Service
Recently, "limited information" measures of model fit (computed from low-order margins of the item response data) have been suggested. We examined the performance of these indices in the presence of complex sampling of items (i.e., BIB sampling). Distributional properties, Type I error, and power of these measures were evaluated.

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 12, Meeting Room Level, Paper Session, B7
Simulation- and Game-Based Assessments
Session Discussant: José Pablo González-Brenes, Pearson

Aligning Process, Product and Survey Data: Bayes Nets for a Simulation-Based Assessment
Tiago Caliço, University of Maryland; Vandhana Mehta and Martin Benson, Cisco Networking Academy; André Rupp, Educational Testing Service
Simulation-based assessments yield product and process data that can potentially allow for more comprehensive measurement of competencies and of factors that affect these competencies. We discuss the iterative construction of student characterizations (personae) and elucidate the methodological implications for successfully putting the evidence-centered design process into practice.

Practical Consequences of Static, Dynamic, or Hierarchical Bayesian Networks in Game-Based Assessments
Maria Bertling, Harvard University; Katherine Furgol Castellano, Educational Testing Service
There is growing interest in using Bayesian approaches for analyzing data from game-based assessments (GBAs). This paper describes the process of developing a measurement model for an argumentation game and demonstrates the analytical and practical consequences of using different types of Bayesian networks as the scaling method for GBAs.

Impact of Feedback Within Technology Enhanced Items on Perseverance and Performance
Stacy Hayes, Chris Meador and Karen Barton, Discovery Education
This research explores the impact of formative feedback within technology enhanced items (TEIs) embedded in a digital mathematics techbook where students are permitted multiple attempts. Exploratory analyses will investigate patterns of student performance by time on task, type of feedback, item type, misconception, construct complexity, and persistence.

Framework for Feedback and Remediation with Electronic Objective Structured Clinical Examinations
Hollis Lai, Vijay Daniels, Mark Gierl, Tracey Hillier and Amy Tan, University of Alberta
The Objective Structured Clinical Examination (OSCE) is popular in health professions education but cannot provide student feedback and guidance. As OSCEs migrate into an electronic format, the purpose of our paper is to demonstrate a framework that integrates the myriad data sources captured in an OSCE to provide student feedback.

Saturday, April 9, 2016 10:35 AM - 12:05 PM, Meeting Room 10, Meeting Room Level, Paper Session, B8
Test Security and Cheating
Session Discussant: Dmitry Belov, Law School Admission Council

Applying Three Methods for Detecting Aberrant Tests to Detect Compromised Items
Yu Zhang, Jiyoon Park and Lorin Mueller, Federation of State Boards of Physical Therapy
Three different approaches toward detecting item preknowledge were applied to detect compromised items.
These three methods were originally developed for detecting aberrant responses and showed high performance in detecting examinees with item preknowledge. We employed these methods to detect potentially compromised items.

Detecting Two Patterns of Cheating with a Profile of Statistical Indices
Amin Saiar, Gregory Hurtz and John Weiner, PSI Services LLC
Several indices used to detect aberrances in item scores are compared, assessing similarities in raw responses. Results show that the different indices are differentially sensitive to two patterns of cheating, and profiles across the indices may be most useful for detecting and diagnosing test cheating.

Integrating Digital Assessment Meta-Data for Psychometric and Validity Analysis
Elizabeth Stone, Educational Testing Service
This paper discusses meta-data (or process data) captured during assessments that can be used to enhance psychometric and validity analyses. We examine sources and types of meta-data, as well as uses including subgroup refinement, identification of effort, and test security. We also describe challenges and caveats to this usage.

How Accurately Can We Detect Erasures?
Han Yi Kim and Louis Roussos, Measured Progress
Erasure analyses require accurate detection of erasures, as distinct from blank and filled-in marks. This study evaluates erasure detection using data for which the true nature of the marks is known. Optimal rules are formulated. Type I error and power are calculated and evaluated under various scenarios.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Renaissance East, Ballroom Level, Coordinated Session, C1
Opting Out of Testing: Parent Rights Versus Valid Accountability Scores
Session Discussant: S.E. Phillips, Assessment Law Consultant
Although permitted by legislation in some states, too many parents opting their children out of statewide testing may threaten the validity of school accountability scores. This session will explore the effects of opt outs from the perspectives of enabling state legislation, state assessment staff, measurement specialists, and testing vendors.

Survey and Analysis of State Opt Outs and Required Test Participation Legislation
Michelle Croft and Richard Lee, ACT, Inc.

Test Administration, Scoring, and Reporting When Students Opt Out
Tim Vansickle, Questar Assessment Inc.

Responding to Parents and Schools About Student Testing Opt Outs
Derek Brown, Oregon Department of Education

Opt-Outs: The Validity of School Accountability and Teacher Evaluation Test Score Interpretations
Greg Cizek, University of North Carolina at Chapel Hill

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Renaissance West A, Ballroom Level, Coordinated Session, C2
Building Toward a Validation Argument with Innovative Field Test Design and Analysis
Session Chair: Catherine Welch, University of Iowa
Session Discussants: Michael Rodriguez, University of Minnesota; Wayne Camara, ACT, Inc.
For a variety of reasons, large-scale assessment programs have come to rely heavily on data collected during field testing to evaluate items, assemble forms, and link those forms to already established standard score scales and interpretive frameworks, such as proficiency benchmarks and other standards such as college readiness. When derived scores are based on pre-calibrated item pools, as in adaptive testing, or on pre-equated or otherwise linked fixed test forms, the administrative conditions (cf. Wise, 2015) and sampling designs (e.g.,
Meyers, Miller & Way, 2009) for field testing are critical to the validity of the scores. This session addresses key aspects of field testing that can be used as a basis for the validation work of an operational assessment program.

Implications of New Construct Definitions and Shifting Emphases in Curriculum and Instruction
Catherine Welch, University of Iowa

Implications of Composition and Behavior of the Sample When Studying Item Responses
Tim Davey, Educational Testing Service

Assessing Validity of the Item Response Theory Model When Calibrating Field Test Items
Brandon LeBeau, University of Iowa

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Renaissance West B, Ballroom Level, Coordinated Session, C3
Towards Establishing Standards for Spiraling of Contextual Questionnaires in Large-Scale Assessments
Session Chair: Jonas Bertling, Educational Testing Service
Session Discussant: Lauren Harrell, National Center for Education Statistics
Constraints on overall testing time and the large sample sizes in large-scale assessments (LSAs) make spiraling approaches, where different respondents receive different sets of items, a viable option to reduce respondent burden while maintaining or increasing content coverage across relevant areas. Yet LSAs have taken different directions in their use of spiraling in operational questionnaires, and there is currently no consensus on the benefits and drawbacks of spiraling. This symposium brings together diverse perspectives on spiraling approaches in conjunction with mass imputation for contextual questionnaires in LSAs and will help establish standards for how future operational questionnaire designs can be improved to reduce risks for plausible value estimation and secondary analyses.

Context and Position Effects on Survey Questions and Implications for Matrix Sampling
Paul Jewsbury and Jonas Bertling, Educational Testing Service

Matrix Sampling and Imputation of Context Questionnaires: Implications for Generating Plausible Values
David Kaplan and Dan Su, University of Wisconsin – Madison

Imputing Missing Background Data, How to ... And When to ...
Matthias von Davier, Educational Testing Service

Design Considerations for Planned Missing Auxiliary Data in a Latent Regression Context
Leslie Rutkowski, University of Oslo

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, C4
Estimation Precision of Variance Components: Revisiting Generalizability Theory
Session Discussant: Xiaohong Gao, ACT, Inc.
In this coordinated session of three presentations, the overarching theme is the estimation precision of variance components (VCs) in generalizability theory (G theory). Estimation precision is of significant importance in that VCs are the building blocks of reliability, on which valid interpretations of measurement are contingent. In the first presentation, the authors discuss the adverse effects of non-additivity on the estimation precision of VCs. Specifically, in a one-facet design the VC for subjects is underestimated and, consequently, generalizability coefficients are also underestimated. An example of non-additivity is the presence of a subject-by-facet interaction in a one-facet design. The authors demonstrate that a nonadditive model should be used in such a case to obtain unbiased estimators for VCs. As a follow-up study, the second presentation focuses on the identification of non-additivity by use of Tukey's single-degree-of-freedom test.
The authors evaluate Tukey's test for non-additivity in terms of Type I and Type II error rates. Finally, the third presentation extends the theme to a multivariate context and touches on the estimation precision of construct-irrelevant VCs in subscore profile analysis. The authors compare the extent to which Component Universe Score Profiles and factor analytic profiles accurately represent subscore profiles.

Bias in Estimating the Subject Variance Component When Interaction Exists in a One-Facet Design
Jinming Zhang, University of Illinois at Urbana-Champaign

Component Universe Score Profiles: Advantages Over Factor Analytic Profile Analysis
Joe Grochowalski, The College Board; Se-Kang Kim, Fordham University

Evaluating Tukey's Test for Detecting Nonadditivity in G-Theory Applications
Chih-Kai Lin, Center for Applied Linguistics (CAL)

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 4, Meeting Room Level, Paper Session, C5
Sensitivity of Value-Added Models
Session Discussant: Katherine Furgol Castellano, ETS

Cohort and Content Variability in Value-Added Model School Effects
Daniel Anderson and Joseph Stevens, University of Oregon
The purpose of this paper was to explore the extent to which school effects estimated from a random-effects value-added model (VAM) vary as a function of year-to-year fluctuations in the student sample (i.e., cohort) and the tested subject (reading or math). Preliminary results suggest high volatility in school effect estimates.

Value-Added Modelling Considerations for School Evaluation Purposes
Lucy Lu, NSW Department of Education, Australia
This paper discusses findings from the development of value-added models for a large Australian education system. Issues covered include the impact of modelling choices on the representation of schools of different sizes in the distribution of school effects, and the sensitivity of VA estimates to test properties and to missing test data.

Implications of Differential Item Quality for Test Scores and Value-Added Estimates
Robert Meyer, Nandita Gawade and Caroline Wang, Education Analytics, Inc.
We explore whether differential item quality compromises the use of locally developed tests in student performance and educator evaluation. Using simulated and empirical data, we find that item corruption affects test scores and, to a lesser extent, value-added estimates. Adjusting test score scales and limiting to well-functioning items mitigate these effects.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 5, Meeting Room Level, Paper Session, C6
Item and Scale Drift
Session Discussant: Jonathan Weeks, ETS

The Impact of Item Parameter Drift in Computer Adaptive Testing (CAT)
Nicole Risk, American Medical Technologists
The impact of IPD on measurement in CAT was examined. The amount and magnitude of IPD, as well as the size of the item pool, were varied in a series of simulations. A number of criteria were used to evaluate the effects on measurement precision, classification, and test efficiency.

Practice Differences and Item Parameter Drift in Computer Adaptive Testing
Beyza Aksu Dunya, University of Illinois at Chicago
The purpose of this simulation study was to evaluate the impact of IPD that occurs due to teaching and practice differences on person parameter estimation and classification accuracy in CAT when factors such as the percentage of drifting items and the percentage of examinees receiving differential teaching and practice vary.
Investigating Linear and Nonlinear Item Parameter Drift with Explanatory IRT Models
Luke Stanke, Minneapolis Public Schools; Okan Bulut, University of Alberta; Michael Rodriguez and Jose Palma, University of Minnesota
This study investigates the impact of model misspecification in detecting linear and nonlinear item parameter drift (IPD). Monte Carlo simulations were conducted to examine drift with linear, quadratic, and factor IPD models under various testing conditions.

Quality Control Models for Tests with a Continuous Administration Mode
Yuyu Fan, Fordham University; Alina von Davier and Yi-Hsuan Lee, ETS
This paper systematically compared the performance of Change Point Models (CPM) and Hidden Markov Models (HMM) for score stability monitoring and scale drift assessment in educational test administrations using simulated data. The study will contribute to the ongoing monitoring of scale scores for the purpose of quality control in equating.

Ensuring Test Fairness Through Monitoring the Anchor Test and Covariates
Marie Wiberg, Umeå University; Alina von Davier, Educational Testing Service
A quality control procedure for a testing program with multiple consecutive administrations with an anchor test is proposed. Descriptive statistics, ANOVA, IRT, and linear mixed effect models were used to examine the impact of covariates on the anchor test. The results imply that the covariates play a significant part.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Meeting Room 12, Meeting Room Level, Paper Session, C7
Cognitive Diagnostic Model Extensions
Session Discussant: Larry DeCarlo, Teachers College, Columbia University

A Polytomously-Scored DINA Model for Graded Response Data
Dongbo Tu, Chanjin Zheng and Yan Cai, Jiangxi Normal University; Hua-Hua Chang, University of Illinois at Urbana-Champaign
This paper proposed a polytomous extension of the DINA model for a test with polytomously-scored items. A simulation study was conducted to investigate the performance of the proposed model. In addition, a real-data example was used to illustrate the application of this new model with polytomously-scored items.

Information Matrix Estimation Procedures for Cognitive Diagnostic Models
Tao Xin, Yanlou Liu and Wei Tian, Beijing Normal University
The performance of the sandwich-type covariance matrix in CDMs is consistent and robust to model misspecification. The Type I error rates of the Wald statistic, constructed using the observed information matrix, for one-, two-, and three-attribute items all matched the nominal levels well when the sample size was relatively large.

Higher-Order Cognitive Diagnostic Models for Polytomous Latent Attributes
Peida Zhan and Yufang Bian, Beijing Normal University; Wen-Chung Wang and Xiaomin Li, The Hong Kong Institute of Education
Latent attributes in cognitive diagnostic models (CDMs) are dichotomous, but in practice polytomous attributes are possible. We developed a set of new CDMs in which the polytomous attributes are assumed to measure the same continuous latent trait. Simulation studies demonstrated good parameter recovery using WinBUGS. An empirical example was given.
Incorporating Latent and Observed Predictors in Cognitive Diagnostic Models
Yoon Soo Park and Kuan Xing, University of Illinois at Chicago; Young-Sun Lee, Teachers College, Columbia University; MiYoun Lim, Ewha Womans University
A general approach to specifying observed and latent factors (estimated using item response theory) as predictors in an explanatory framework for cognitive diagnostic models is proposed. Simulations were conducted to examine the stability of estimates; real-world data analyses were conducted to demonstrate the framework and its application using TIMSS data.

Saturday, April 9, 2016 12:25 PM - 1:55 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, C8

Electronic Board #1
Examination of Over-Extraction of Latent Classes in the Mixed Rasch Model
Sedat Sen, Harran University
Correct identification of the number of latent classes in MRMs is very important. This study investigated the over-extraction problem in MRMs by focusing on non-normal ability distributions and fit index selection. Three ML-based estimation techniques were used, and the over-extraction problem was observed under some conditions.

Electronic Board #2
Identifying a Credible Reference Variable for Measurement Invariance Testing
Cheng-Hsien Li and KwangHee Jung, Department of Pediatrics, University of Texas Medical School at Houston
Two limitations to model identification in multiple-group CFA, unfortunately, have received little attention: (1) the standardization in loading invariance and (2) the lack of a statistical test for intercept invariance. The proposed strategy extends a MIMIC model with moderated effects to identify a credible reference variable for measurement invariance testing.

Electronic Board #3
Using Partial Classification of Respondents to Reduce Classification Error in Mixture IRT
Youngmi Cho, Pearson; Tongyun Li, ETS; Jeffrey Harring and George Macready, University of Maryland
This study investigates an alternative classification method in mixture IRT models. This method incorporates an additional classification criterion: namely, that the largest posterior probability for each response pattern must equal or exceed a specified lower bound. This results in a reduction of expected classification error.

Electronic Board #4
Parameter Recovery in Multidimensional Item Response Theory Models Under Complexity and Nonnormality
Stephanie Underhill, Dubravka Svetina, Shenghai Dai and Xiaolin Wang, Indiana University - Bloomington
We investigate item and person parameter recovery in multidimensional item response theory models for understudied conditions. Specifically, we ask how well IRTpro and the mirt package in R can recover the parameters when the person distribution is nonnormal, items exhibit varying degrees of complexity, and different item parameters comprise an assessment.

Electronic Board #5
Psychometric Properties of Technology-Enhanced Item Formats
Ashleigh Crabtree and Catherine Welch, University of Iowa
The objectives of this research are to provide information about the properties of technology-enhanced item formats. Specifically, the research will focus on the construct representation and technical properties of test forms that use these item types.

Electronic Board #6
Using Technology-Enhanced Items to Measure Fifth Grade Geometry
Jessica Masters, Lisa Famularo and Kristin King, Measured Progress
Technology-enhanced items have the potential to provide improved measurement of high-level constructs.
But research is needed to evaluate whether these items lead to valid inferences about knowledge and provide improved measurement over traditional items. This paper explores these questions in the context of fifth grade geometry using qualitative cognitive lab data.

Electronic Board #7
A Multilevel MT-MM Approach for Estimating Trait Variance Across Informant Types
Tim Konold and Kathan Shukla, University of Virginia
An approach for extracting common trait variance from structurally different informant ratings is presented, with an extension for measuring the resulting factors' associations with an external outcome. Results are based on structurally different and interchangeable students (N = 45,641) and teachers (N = 12,808) from 302 schools.

Electronic Board #8
A Validation Study of the Learning Errors and Formative Feedback (LEAFF) Model
Wei Tang, Jacqueline Leighton and Qi Guo, University of Alberta
The objectives of the present study involve (1) validating the selected measures of the latent variables in the Learning Errors and Formative Feedback (LEAFF) model, and (2) applying a structural equation model to evaluate the core of the LEAFF model. In addition, culturally invariant models are analyzed and presented.

Electronic Board #9
Automatic Flagging of Items for Key Validation
Füsun Şahin, University at Albany, State University of New York; Jerome Clauser, American Board of Internal Medicine
Key validation procedures typically rely on professional judgement to identify potentially problematic items. Unfortunately, the lack of standardized flagging criteria can introduce bias in examinee scores. This study demonstrates the use of logistic regression to mimic expert judgment and automatically flag problematic items. The final model properly identified 96% of items.

Electronic Board #10
Evaluating the Robustness of Multidimensional IRT (MIRT) Based Growth Modeling
Hanwook Yoo, Seunghee Chung, Peter van Rijn and Hyeon-Joo Oh, Educational Testing Service
This study evaluates the robustness of MIRT-based growth modeling when tests are not strictly unidimensional. The primary independent variables manipulated are (a) magnitude of student growth and (b) magnitude of test multidimensionality. The findings show how growth is effectively measured by the proposed model under different test conditions.

Electronic Board #11
Standard Errors of Measurement for Group-Level SGP with Bootstrap Procedures
Jinah Choi, Won-Chan Lee, Robert Brennan and Robert Ankenmann, The University of Iowa
This study provides procedures for estimating standard errors of measurement and confidence intervals for group-level SGPs by using bootstrap sampling plans in generalizability theory. It is informative to gauge the reliability of reported SGPs when reporting the mean or median of individual SGPs within a group of interest.

Electronic Board #12
Vertical Scaling of Tests with Mixed Item Formats Including Technology Enhanced Items
Dong-In Kim, Ping Wan and Joanna Tomkowicz, Data Recognition Corporation; Furong Gao, Pacific Metrics; Jungnam Kim, NBCE
This study is intended to enhance the knowledge base of IRT vertical scaling when tests consist of mixed item types, including technology-enhanced items. Using large-scale state assessments, the study compares results from different configurations of item type compositions of the anchor set, anchor sources, IRT models, and vertical scaling methods.
Electronic Board #13
Full-Information Bifactor Growth Models and Derivatives for Longitudinal Data
Ying Li, American Nurses Credentialing Center
The bifactor growth model with correlated general factors has shown promise in recovering longitudinal data; however, it is not known whether simplified models perform well with comparable estimation accuracy. This study investigated two simplified versions of the model in data recovery under various conditions, aiming to provide guidance on model selection.

Electronic Board #14
The Pseudo-Equivalent Groups Approach as an Alternative to Common-Item Equating
Sooyeon Kim and Ru Lu, Educational Testing Service
This study evaluates the effectiveness of equating test scores by using demographic data to form "pseudo-equivalent groups" of test takers. The study uses data from a single test form to create two half-length forms for which the equating relationship is known.

Electronic Board #15
Equating with a Heterogeneous Target Population in the Common-Item Design
Ru Lu and Sooyeon Kim, Educational Testing Service
This study evaluates the effectiveness of weighting for each subgroup in the nonequivalent groups with common-item design. This study uses data from a single test form to create two research forms for which the equating relationship is known. Two weighting schemes are compared in terms of equating accuracy.

Electronic Board #16
Examining the Reliability of Rubric Scores to Assess Score Report Quality
Mary Roduta Roberts, University of Alberta; Chad Gotch, Washington State University
The purpose of this study is to assess the reliability of scores obtained from a recently developed ratings-based measure of score report quality. Findings will be used to refine the assessment of score report quality and advance the study and practice of score reporting.

Electronic Board #17
Accuracy of Angoff Method Item Difficulty Estimation at Specific Cut Score Levels
Tanya Longabach, Excelsior College
This study examines the accuracy of item difficulty estimates in Angoff standard setting with no normative item data available. The correlation between observed and estimated item difficulty is moderate to high. The judges consistently overestimate student ability at higher cut levels and underestimate the ability of students at the D cut level.

Electronic Board #18
A Passage-Based Approach to Setting Cut Scores on ELA Assessments
Marianne Perie and Jessica Loughran, Center for Educational Testing and Evaluation
New assessments in ELA contain a strong focus on reading comprehension with multiple passages of varying complexity. Using a variant on the Bookmark method, this study provides results from two standard setting workshops with two approaches to setting passage-based cut scores and two approaches to recovering the intended cut score.

Electronic Board #19
Psychometric Characteristics of Technology Enhanced Items from a Computer-Based Interim Assessment Program
Nurliyana Bukhari, University of North Carolina at Greensboro; Keith Boughton and Dong-In Kim, Data Recognition Corporation
This study compared the IRT information of technology enhanced (TE) item formats from an interim assessment program. Findings indicate that the evidence-based selected response items within English Language Arts, and the select-and-order, equation-and-expression entry, and matching items within Mathematics, provided more information when compared to traditional selected response items.
Electronic Board #20
Exposure Control for Response Time-Informed Item Selection and Estimation in CAT
Justin Kern, Edison Choe and Hua-Hua Chang, University of Illinois at Urbana-Champaign
This study will investigate item exposure control while using response times (RTs) with item responses in CAT to minimize overall test-taking time. Items are selected by maximum information per time unit, as in Fan et al. (2012). Calculations use estimates of ability and speededness obtained via a joint-estimation MAP routine.

Electronic Board #21
Monitoring Item Drift Using Stochastic Process Control Charts
Hongwen Guo and Frederic Robin, ETS
In on-demand testing, test items have to be reused; however, their true characteristics may drift over time. This study links item drift to DIF analysis; SPC methods applied across a sequence of test administrations are used to detect item drift as early as possible.

Electronic Board #22
Reporting Subscores Using Different Multidimensional IRT Models in Sequencing Adaptive Testing
Jing-Ru Xu, Pearson VUE; Frank Rijmen, Association of American Medical Colleges
This research investigates the efficiency of reporting subscores in sequencing adaptive testing. It compares this new implementation with a general multidimensional CAT program. Different multidimensional models were fitted in different CAT simulation studies using PISA 2012 Math with four subdomains. It provides insights into score reporting in multidimensional CAT.

Electronic Board #23
Multidimensional IRT Model Estimation with Multivariate Non-Normal Latent Distributions
Tongyun Li and Liyang Mao, Educational Testing Service
The purpose of the present study is to investigate the robustness of multidimensional IRT model parameter estimation when the latent distribution is multivariate non-normal. A simulation study is proposed to evaluate the accuracy of item and person parameter estimates under different magnitudes of violation of the multivariate normality assumption.

Electronic Board #24
Stochastic Ordering of the Latent Trait Using the Composite Score
Feifei Li and Timothy Davey, Educational Testing Service
The purposes of this study are to investigate whether combining scores from monotonic items causes violations of SOL (stochastic ordering of the latent trait) in the empirical composite score function and to identify the factors that introduce violations of SOL when combining monotonic polytomous items.

Electronic Board #25
Establishing Critical Values for PARSCALE G2 Item Fit Statistics
Lixiong Gu and Ying Lu, Educational Testing Service
Research shows that the Type I error rate of the PARSCALE G2 statistic is inflated as test length decreases and sample size increases. This study develops a table of empirical critical values for a Type I error rate of 0.05 at different sample sizes that may help psychometricians flag misfitting items.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Renaissance East, Ballroom Level, Invited Session, D1
Assessing the Assessments: Measuring the Quality of New College- and Career-Ready Assessments
Morgan Polikoff, USC
Tony Alpert, Smarter Balanced
Bonnie Hain, PARCC
Brian Gill, Mathematica
Carrie Conaway, Massachusetts Department of Education
Donna Matovinovic, ACT
This panel presents results from two recent studies of the quality of new college- and career-ready assessments.
The first study uses a new methodology to evaluate the quality of PARCC, Smarter Balanced, ACT Aspire, and Massachusetts MCAS against the CCSSO Criteria for High Quality Assessment. After the presentation of the study and its findings, respondents from PARCC and Smarter Balanced will discuss the methodology and their thoughts on the most important dimensions against which new assessments should be evaluated. The second study investigates the predictive validity of PARCC and MCAS for predicting success in college. After the presentation of the study and its findings, respondents from the Massachusetts Department of Education will discuss the study and the state's needs regarding evidence to select and improve next-generation assessments. The overarching goal of the panel is to provoke discussion and debate about the best ways to evaluate the quality of new assessments in the college- and career-ready standards era.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Renaissance West A, Ballroom Level, Coordinated Session, D2
Some Psychometric Models for Learning Progressions
Session Chair: Mark Wilson, University of California, Berkeley
Session Discussant: Matthias von Davier, ETS
Learning progressions represent theories about the conceptual pathways that students follow when learning in a domain (NRC, 2006). One common type of representation is a multidimensional structure, with links between certain pairs of levels of the different dimensions (as predicted by, say, substantive theory and/or empirical findings). One illustration of such a complex hypothesis derives from an assessment development project in the area of statistical modeling for middle school students, the Assessing Data Modeling and Statistical Reasoning (ADM; Lehrer, Kim, Ayers & Wilson, 2014) project. In the ADM representation, vertical columns of boxes (such as CoS1, CoS2, ..., CoS4) represent the levels of each of the six dimensions of the learning progression. In addition to these "vertical" links between different levels of each construct, there are other links between levels of different constructs (such as the one from ToM6 to CoS3) that indicate an expectation (from theory and/or earlier empirical findings) that a student needs to succeed on the 6th level of the ToM dimension before they can be expected to succeed on the 3rd level of the CoS dimension. Putting it a bit more formally, we use a genre of representation that is structured as a multidimensional set of constructs: each construct has (1) several levels representing successive levels of sophistication in student understanding and (2) directional relations between individual levels of different constructs. We call the models used to analyze such a structure structured constructs models (SCMs; Wilson, 2009).
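To make the structure concrete, here is a minimal, illustrative sketch (not the ADM project's actual implementation, nor the SCM measurement model itself) of how such a learning-progression graph can be encoded: each construct carries a set of ordered levels, and directional cross-construct links encode expectations such as the ToM6-to-CoS3 link described above. The construct labels come from the session description; the data structure, level counts, and consistency check are assumptions made only for illustration.

```python
# Minimal sketch of a learning-progression structure (illustrative only):
# each construct has ordered levels, plus directional links saying
# "reaching level m of construct A is expected before level n of construct B."

from dataclasses import dataclass, field

@dataclass
class LearningProgression:
    # construct name -> number of ordered levels (1 = least sophisticated)
    constructs: dict[str, int]
    # ((source construct, source level), (target construct, target level))
    cross_links: list[tuple[tuple[str, int], tuple[str, int]]] = field(default_factory=list)

    def consistent(self, profile: dict[str, int]) -> bool:
        """Check whether a student's highest attained level on each construct
        respects the level ranges and every cross-construct expectation."""
        for name, level in profile.items():
            if not 0 <= level <= self.constructs.get(name, 0):
                return False
        for (src, src_level), (tgt, tgt_level) in self.cross_links:
            # If the target level has been reached, the source level is expected too.
            if profile.get(tgt, 0) >= tgt_level and profile.get(src, 0) < src_level:
                return False
        return True

# Toy version of the structure described above (level counts are assumed).
adm = LearningProgression(
    constructs={"CoS": 4, "ToM": 6},
    cross_links=[(("ToM", 6), ("CoS", 3))],  # succeed on ToM6 before CoS3
)

print(adm.consistent({"ToM": 6, "CoS": 3}))  # True: expectation satisfied
print(adm.consistent({"ToM": 4, "CoS": 3}))  # False: CoS3 reached without ToM6
```

An SCM then attaches an item response measurement model to a structure of this kind; the sketch only encodes the hypothesized links themselves, which is the part the session description spells out.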
Introduction to the Concept of a Structured Constructs Model (SCM)
Mark Wilson, University of California, Berkeley

Modeling Structured Constructs as Non-Symmetric Relations Between Ordinal Latent Variables
David Torres Irribarra, Pontificia Universidad Católica de Chile; Ronli Diakow (Brenda Loyd Dissertation Award Winner, 2015), New York City Department of Education

A Structured Constructs Model for Continuous Latent Traits with Discontinuity Parameters
In-Hee Choi, University of California, Berkeley

A Structured Constructs Model Based on Change-Point Analysis
Hyo Jeong Shin, ETS

Discussion of the Different Approaches to Using Item Response Models for SCMs
Mark Wilson, University of California, Berkeley

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Renaissance West B, Ballroom Level, Coordinated Session, D3
Multiple Perspectives on Promoting Assessment Literacy for Parents
Session Chair: Lauress Wise, Human Resources Research Organization (HumRRO)
The national dialogue on American education has become increasingly focused on assessment. There is a clear need for greater understanding about fundamental aspects of educational testing. Several organizations and individuals have undertaken concerted efforts to increase the assessment literacy of various audiences, including educators, policymakers, parents, and the general public. This coordinated session will focus on the efforts taken by three initiatives that include parents among their target audiences. NCME Past President Laurie Wise will introduce the session by discussing the need for initiatives that increase the assessment literacy of parents. NCME Board member Cindy Walker will discuss the ongoing efforts on behalf of NCME to develop and promote assessment literacy materials. Beth Rorick of the National Parent Teacher Association will discuss a national assessment literacy effort to educate parents on college- and career-ready standards and state assessments. Maria Donata Vasquez-Colina and John Morris of Florida Atlantic University will discuss outcomes and follow-up activities from focus groups with parents on assessment literacy. Presentations will be followed by group discussion (among both panelists and audience members) on ideas for coordinating multiple efforts to increase parents' assessment literacy.

NCME Assessment Literacy Initiative
Cindy Walker, University of Wisconsin - Milwaukee

NAEP Assessment Literacy Initiative
Beth Rorick, National Parent-Teacher Association

Lessons Learned from Parents on Assessment Literacy
Maria Donata Vasquez-Colina and John Morris, Florida Atlantic University

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 3, Meeting Room Level, Paper Session, D4
Equating Mixed-Format Tests
Session Discussant: Won-Chan Lee, University of Iowa

Classification Error Under Random Groups Equating Using Small Samples with Mixed-Format Tests
Ja Young Kim, ACT, Inc.
Few studies have investigated equating with small samples using mixed-format tests. The purpose of this study is to examine the impact of small samples and equating method on the misclassification of examinees based on where the passing scores are located, taking into account factors related to using mixed-format tests.
Sample Size Requirement for Trend Scoring in Mixed-Format Test Equating
Qing Yi and Yong He, ACT, Inc.; Hua Wei, Pearson
The purpose of this study is to investigate how many rescored responses are sufficient to adjust for differences in rater severity across test administrations in mixed-format test equating. Simulated data are used to study the sample size requirement for the trend scoring method with IRT equating.

Comparing IRT-Based and CTT-Based Pre-Equating in Mixed-Format Testing
Meichu Fan, Xin Li and YoungWoo Cho, ACT, Inc.
Pre-equating has tremendous appeal to test practitioners given the demand for immediate score reporting. IRT pre-equating research is readily applicable, but research on pre-equating using classical test theory (CTT), where only classical item statistics are available, is limited. This study compares various pre- and post-equating methods in mixed-format testing.

Equating Mixed-Format Tests Using Automated Essay Scoring (AES) System Scores
Süleyman Olgar, Florida Department of Education; Russell Almond, Florida State University
This study investigated the impact of using generic e-rater scores to equate mixed-format tests with MC items and an essay. The kappa and observed agreements were large and similar across six equating methods. The MC+e-rater equating outcomes are strong and, for some conditions, even better than the MC-only equating results.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 4, Meeting Room Level, Paper Session, D5
Standard Setting
Session Discussant: Susan Davis-Becker, Alpine Testing Solutions

Exploring the Influence of Judge Proficiency on Standard-Setting Judgments for Medical Examinations
Michael Peabody, American Board of Family Medicine; Stefanie Wind, University of Alabama
The purpose of this study is to explore the use of the Many-Facet Rasch model (Linacre, 1989) as a method for adjusting modified-Angoff standard setting ratings (Angoff, 1971) based on judges' subject area knowledge. Findings suggest differences in the severity and quality of standard-setting judgments across levels of judge proficiency.

Setting Cut Scores on the AP Seminar Course and Exam Components
Deanna Morgan and Priyank Patel, The College Board; Yang Zhao, University of Kansas
This paper documents a standard-setting study using the Performance Profile Method to determine recommended cut scores for examinees to be placed in each of the AP grade categories (1-5). The Subject Matter Experts used an ordered profile packet of students' performance and converged on recommended scores.

Interval Validation Method for Setting Achievement Level Standards for Computerized Adaptive Tests
William Insko and Stephen Murphy, Houghton Mifflin Harcourt
The Interval Validation Method for setting achievement level standards is specifically designed for assessments with large item pools, such as computerized adaptive tests. The method focuses judgments on intervals of similarly performing items presumed to contain a single cut score location. Validation of the interval sets the cut score.

The Use of Web 2.0 Tools in a Bookmark Standard Setting
Jennifer Lord-Bessen, McGraw Hill Education CTB; Ricardo Mercado, DRC; Adele Brandstrom, CTB
This study examines the use of interactive, collaborative Web tools in an onsite, online Bookmark Standard Setting workshop for a state assessment.
It explores the feasibility of this concept (addressing issues of security, user satisfaction, and cost) in a fully online standard setting with remote participants.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 5, Meeting Room Level, Paper Session, D6
Diagnostic Classification Models: Applications
Session Discussant: Jonathan Templin, University of Kansas

Assessing Students' Competencies Through Cognitive Diagnosis Models: Validity and Reliability Evidence
Miguel Sorrel, Julio Olea and Francisco Abad, Universidad Autónoma de Madrid; Jimmy de la Torre, Rutgers, The State University of New Jersey; Juan Barrada, Universidad de Zaragoza; David Aguado, Instituto de Ingeniería del Conocimiento; Filip Lievens, Ghent University
Cognitive diagnosis models can be applied to situational judgement tests to provide information about noncognitive factors, which currently are not included in selection procedures for admission to university. Reliable measures of study orientation (habits and attitudes), helping others, and generalized compliance were significantly related to grade point average.

Examining Effects of Pictorial Fraction Models on Student Test Responses
Angela Broaddus, Center for Educational Testing and Evaluation, University of Kansas; Meghan Sullivan, University of Kansas
The present study investigates the effects of aspects of visual fraction models on student test responses. Responses to 50 items assessing partitioning and identifying unit fractions were analyzed using diagnostic classification methods to provide insight into effective representations of early fraction knowledge.

Evaluation of Learning Map Structure Using Diagnostic Cognitive Modeling and Bayesian Networks
Feng Chen, Jonathan Templin and William Skorupski, The University of Kansas
The learning map underlying an assessment system should accurately specify the connections among nodes, as well as specify nodes at the appropriate level of granularity. This paper seeks to validate a learning map by combining real-data analyses and a simulation study to provide inferences for test development.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Meeting Room 12, Meeting Room Level, Paper Session, D7
Advances in IRT Modelling and Estimation
Session Discussant: Mark Hansen, UCLA

Estimation of Mixture IRT Models from Nonnormally Distributed Data
Tugba Karadavut and Allan S. Cohen, University of Georgia
Mixture IRT models generally assume standard normal ability distributions, but nonnormality is likely to occur in many achievement tests. Nonnormality has been shown to cause extraction of spurious latent classes. A skew t distribution, which has corrected the extraction of spurious latent classes in growth models, will be studied in this research.

Two-Tier Item Factor Models with Empirical Histograms as Nonnormal Latent Densities
Hyesuk Jang, American Institutes for Research; Ji Seung Yang, University of Maryland; Scott Monroe, University of Massachusetts
The purpose of this study is to investigate the effects of nonnormal latent densities in two-tier item factor models on parameter estimates and to propose an extended empirical histogram approach that allows an appropriate characterization of the nonnormal densities for two correlated general factors and unbiased parameter estimates.
Examining Performance of the MH-RM Algorithm with the 3PL Multilevel MIRT Model
Bozhidar Bashkov, American Board of Internal Medicine; Christine DeMars, James Madison University
This study examined the performance of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm (Cai, 2010b) in estimating 3PL multilevel multidimensional IRT (ML-MIRT) models. Item and person parameter recovery, as well as variances and covariances at different levels, were investigated under different combinations of number of dimensions, intraclass correlation levels, and sample sizes.

Expectation-Expectation-Maximization: A Feasible Mixture-Model-Based MLE Algorithm for the Three-Parameter Logistic Model
Chanjin Zheng, Jiangxi Normal University; Xiangbing Meng, Northeast Normal University
Stable MLE of item parameters under the 3PLM with a modest sample size remains a challenge. The current study presents a mixture-model approach to the 3PLM, based on which a feasible Expectation-Expectation-Maximization MLE algorithm is proposed. The simulation study indicates that EEM is comparable to Bayesian EM.

Saturday, April 9, 2016 2:15 PM - 3:45 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session: GSIC Graduate Student Poster Session, D8
Graduate Student Issues Committee
Brian Leventhal, Chair
Masha Bertling, Laine Bradshaw, Lisa Beymer, Evelyn Johnson, Ricardo Neito, Ray Reichenberg, Latisha Sternod, Dubravka Svetina

Electronic Board #1
Testing Two Alternatives to a Value-Added Model for Teacher Capability
Nicole Jess, Michigan State University
This study tests two alternatives to Value-Added Models (VAMs) for teacher capability: the Student Response Model (SRM) and the Multilevel Mixture Item Response Model (MMixIRM). We will compare the accuracy of estimation of teacher capability using these models under various conditions of class size, location of cut-score, and student assignment to teacher.

Electronic Board #2
Using Response Time in Cognitive Diagnosis Models
Nathan Minchen, Rutgers, The State University of New Jersey
No abstract submitted at time of printing

Electronic Board #3
An Exhaustive Search for Identifying Hierarchical Attribute Structure
Lokman Akbay, Rutgers, The State University of New Jersey
Specification of an incorrect hierarchical relationship between any two attributes can substantially degrade classification accuracy. As such, the importance of correctly identifying the hierarchical structure among attributes cannot be overemphasized. The primary objective of this study is to propose a procedure for identifying the most appropriate hierarchical structure for attributes.

Electronic Board #4
Performance of DIMTEST and Generalized Dimensionality Discrepancy Statistics for Assessing Unidimensionality
Ray Reichenberg, Arizona State University
The standardized generalized dimensionality discrepancy measure (SGDDM; Levy, Xu, Yel, & Svetina, 2015) was compared to DIMTEST in terms of their absolute and relative efficacy in assessing the unidimensionality assumption common in IRT under a variety of testing conditions (e.g., sample size/test length). Results and future research opportunities are discussed.

Electronic Board #5
Self-Directed Learning Oriented Assessments Without High Technologies
Jiahui Zhang, Michigan State University
Self-directed learning oriented assessments capitalize on the construction of assessment activities for optimal learning and for the cultivation of self-directed learning capacities.
This study aims to develop such an assessment combining the strengths of paper-pencil tests, CDM, and standard setting, which can be used by learners without high technologies.

Electronic Board #6
Vertical Scaling Under the Rasch Testlet Model
Mingcai Zhang, Michigan State University
Using the Rasch testlet model, scaling constants are estimated between three pairs of adjacent grades that are linked through anchor testlets. The simulated factors that impact the precision of scaling constant estimation include group mean difference, anchor testlet positions, and the magnitude of the testlet effect.

Electronic Board #7
The Effect of DIF on Group Invariance of IRT True Score Equating
Dasom Hwang, Yonsei University
Traditional methods for detecting DIF have been used for single-level data analysis. However, most data in education have a multilevel structure. This study investigates a more effective method under various conditions by comparing the statistical power and Type I error rates of adjusted methods based on the Mantel-Haenszel method and SIBTEST for multilevel data.

Electronic Board #8
Detecting Non-Fitting Items for the Testlet Response Model
Ryan Lynch, University of Wisconsin - Milwaukee
A Monte Carlo simulation will be conducted to evaluate the S-X2 item fit statistic. Findings indicate that the S-X2 may be a viable tool for evaluating item fit when the testlet effect is large, but results are mixed when the testlet effect is small.

Electronic Board #9
An Iterative Technique to Improve Test Cheating Detection Using the Omega Statistic
Hotaka Maeda, University of Wisconsin-Milwaukee
We propose an iterative technique to improve ability estimation for accused answer copiers. A Monte Carlo simulation showed that, by using the new ability estimate, the omega statistic had better controlled Type I error and increased power in all studied conditions, particularly when the source ability was high.

Electronic Board #10
Parameter Recovery in the Multidimensional Graded Response Item Response Theory Model
Shengyu Jiang, University of Minnesota
The multidimensional graded response model can be a useful tool in modeling ordered categorical test data for multiple latent traits. A simulation study is conducted to investigate the variables that might affect parameter recovery and to provide guidance for test construction and data collection in practical settings where the MGRM is applied.

Electronic Board #11
The Impact of Ignoring a Multilevel Structure in Mixture Item Response Models
Woo-yeol Lee, Vanderbilt University
Multilevel mixture item response models are widely discussed but infrequently used in education research. Because little research exists assessing when it is necessary to use such models, the current study investigated the consequences of ignoring a multilevel structure in mixture item response models via a simulation study.

Electronic Board #12
Determining the Diagnostic Properties of the Force Concept Inventory
Mary Norris, Virginia Tech
The Force Concept Inventory (FCI) is widely used to measure learning in introductory physics. Typically, instructors use the total score. Investigation suggests that the test is multidimensional. This study fits FCI data with cognitive diagnostic and bifactor models in order to provide a more detailed assessment of student skills.
Electronic Board #13 Understanding School Truancy: Risk-Need Latent Profiles of Adolescents Andrew Iverson, Washington State University Latent Profile Analysis was used to examine risk and needs profiles of adolescents in Washington State based on the WARNS assessment. Profiles were developed to aid understanding of behaviors associated with school truancy. Profiles were examined across student demographic variables (e.g., suspensions, arrests) to provide validity evidence for the profiles. Electronic Board #14 Utilizing Nonignorable Missing Data Information in Item Response Theory Daniel Lee, University of Maryland The purposes of this simulation study are to examine the effects of ignoring nonignorable missing data in item response models and evaluate the performance of model-based and imputation-based approaches (e.g., stochastic regression and Markov Chain Monte Carlo imputation) in parameter estimation to provide practical guidance to applied researchers. Electronic Board #15 Investigating IPD Amplification and Cancellation at the Testlet-Level on Model Parameter Estimation Rosalyn Bryant, University of Maryland College Park This study investigates the effect of item parameter drift (IPD) amplification or cancellation on model parameter estimation in a testlet-based linear test. Estimates will be compared between a 2-Parameter item response theory (IRT) model and a 2-Parameter testlet model varying magnitudes and patterns of IPD at item and testlet levels. Electronic Board #16 Measuring Reading Comprehension Through Automated Analysis of Students’ Small-Group Discussions Audra Kosh, University of North Carolina, Chapel Hill We present the development and initial validation of a computer-automated tool that measures elementary school students’ reading comprehension by analyzing transcripts of small-group discussions about texts. Students’ scores derived from the automated tool were a statistically significant predictor of scores on traditional multiple-choice and constructed-response reading comprehension tests. Electronic Board #17 Differential Item Functioning Among Students with Disabilities and English Language Learners Kevin Krost, University of Pittsburgh The presence of differential item functioning (DIF) was investigated on a statewide eighth grade mathematics assessment. Both students with disabilities and English language learners were focal groups, and several IRT and CTT methods were used and compared. Implications of results were discussed. 76 Washington, DC, USA Electronic Board #18 Extreme Response Style: Which Model is Best? Brian Leventhal, University of Pittsburgh More robust and rigorous psychometric models, such as IRT models, have been advocated for survey applications. However, item responses may be influenced by construct-irrelevant variance factors such as preferences for extreme response options. Through simulation methods, this study helps determine which model accounting for extreme response tendency is more appropriate. Electronic Board #19 Evaluating DIF Detection Procedure in the Context of the Mirid Isaac Li, University of South Florida The model with internal restriction on item difficulties (MIRID) is a componential Rasch model with unique betweenitem relationships, which pose challenges for psychometric studies like differential item functioning in its context. This empirical study compares and evaluates the suitability of four different DIF detection procedures for the MIRID. 
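Several posters in this session (e.g., Electronic Boards #17 and #19) evaluate differential item functioning procedures. For orientation only, the sketch below computes the classic Mantel-Haenszel common odds ratio and its ETS delta transformation from made-up stratified counts; the studies above compare more specialized procedures, and nothing here reproduces their methods.

```python
# Minimal Mantel-Haenszel DIF sketch; the 2x2 tables per ability stratum are hypothetical.
import numpy as np

# For each total-score stratum k: [right_ref, wrong_ref, right_focal, wrong_focal]
strata = np.array([
    [30, 20, 22, 28],
    [45, 15, 35, 25],
    [60,  8, 50, 18],
], dtype=float)

A, B, C, D = strata[:, 0], strata[:, 1], strata[:, 2], strata[:, 3]
N = strata.sum(axis=1)

# MH common odds ratio: sum(A*D/N) / sum(B*C/N); values far from 1 suggest DIF.
alpha_mh = np.sum(A * D / N) / np.sum(B * C / N)

# ETS delta scale: negative values indicate an item favoring the reference group.
delta_mh = -2.35 * np.log(alpha_mh)
print(round(alpha_mh, 3), round(delta_mh, 3))
```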
Electronic Board #20 Item Difficulty Modeling of Computer-Adaptive Reading Comprehension Items Using Explanatory IRT Models Yukie Toyama, UC Berkeley, Graduate School of Education This study investigated the effects of passage complexity and item type on difficulty of reading comprehension items for grades 2-12 students, using the Rasch latent regression linear logistic test model. Results indicated that it is text complexity, rather than item type, that explained the majority of variance in item difficulty. Electronic Board #21 Recovering the Item Model Structure from Automatically Generated Items Using Graph Theory Xinxin Zhang, University of Alberta We describe a methodology to recover the item models from generated items and present the results using a novel graph theory approach. We also demonstrate the methodology using generated items from the medical science domain. Our proposed methodology was found to be robust and generalizable. Electronic Board #22 The Impact of Item Difficulty on Diagnostic Classification Models Ren Liu, University of Florida Diagnostic classification models have been applied to non-diagnostic tests to partly meet the accountability demands for student improvement. The purpose of the study is to investigate the impact of item parameters (i.e. discrimination, difficulty, and guessing) on attribute classification when diagnostic classification models are applied to existing non-diagnostic tests. Electronic Board #23 Sensitivity to Multidimensionality of Mixture IRT Models Yoonsun Jang, University of Georgia Overextraction of latent classes is a concern when mixture IRT models are used in an exploratory approach. This study investigates whether some kinds of multidimensionality might result in overextraction of latent classes. A simulation study and an empirical example are presented to explain this effect. 77 2016 Annual Meeting & Training Sessions Electronic Board #24 Monte Carlo Methods for Approximating Optimal Item Selection in CAT Tianyu Wang, University of Illinois Monte Carlo techniques for item selection in an adaptive sequence are explored as a method for determining how to minimize mean squared error of ability estimation in CAT. Algorithms are developed to trim away candidate items as the test length increases, and connections to the Maximum Information criterion are studied. Electronic Board #25 The Relationship Between Q-Matrix Loading, Item Usage, and Estimation Precision in Cd-Cat Susu Zhang, University of Illinois at Urbana-Champaign The current project explores the relationship between items’ Q-matrix loadings and their exposure rate in cognitive diagnostic computerized adaptive tests, under various information-based item selection algorithms. In addition, the consequences of selecting certain high-information items loading on a large number of attributes on estimation accuracy will be examined. 78 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Renaissance East, Ballroom Level, Coordinated Session, E1 Do Large Scale Performance Assessments Influence Classroom Instruction? Evidence from the Consortia Session Discussant: Suzanne Lane, University of Pittsburgh Each of the six major statewide assessment consortia created logic models to explicate their theories of action for including performance assessment components in their summative and formative assessment designs. 
In this session, we will focus on the theory of action hypothesis that including performance assessment components will lead to desired changes in classroom teaching activities and student learning. This hypothesis echoes similar theories in the statewide performance assessment movements of the 1990s (e.g., Davey, Ferrara, Shavelson, Holland, Webb, & Wise, 2015, p. 5; Lane & Stone, 2006, p. 389). Lane and colleagues conducted consequential validity studies to examine this hypothesis and found modest positive results (e.g., Darling-Hammond & Adamson, 2010; Lane, Parke, & Stone, 2002; Parke, Lane, & Stone, 2006; Stone & Lane, 2003). Other researchers reported worrisome unintended consequences (e.g., Koretz, Mitchell, Barron, & Keith, 1996). In a 2015 NCME coordinated session, several consortia reported on validity arguments and supporting evidence for the performance assessment components of their programs. The session discussant made the observation that “The idea that PA will drive improvements in teaching is [the] most suspect part of [the theory of action]; more research needed.” That was a call for studies of impacts on teaching activities and student learning in the classroom. This session is a response to that call. This session is a continuation of ongoing examinations of performance assessment in statewide assessment programs that follows from well attended sessions in the 2013, 2014, and 2015 NCME meetings. The session is somewhat innovative in that we have included five of the six major statewide assessment consortia, with the goal of creating a comprehensive summary on this topic. A discussant will synthesize the evidence provided by the presenters and evaluate the consortia’s hypothesis about performance assessment and widely held beliefs about how performance assessment can influence curriculum development, teaching, and learning. Performance assessment has re-emerged as a widely used assessment tool in large scale assessment programs and in classroom formative assessment practices. Developments in validation theory (e.g., Kane, 2013) have placed claims and evidence in the center of test score interpretation and use arguments—in this case, claims about test use arguments. The convergence of these two forces requires us to (a) explicate our rationales for using specific assessment tools for specific purposes and about intended claims and inferences, and (b) investigate the plausibility of these rationales and claims. The papers in this session will explicate the consortium rationales for including performance assessment in their designs and provide new evidence of the supportability of their rationales. Smarter Balanced Assessment Consortium Marty McCall, Smarter Balanced Assessment Consortium Dynamic Learning Maps Marianne Perie and Meagan Karvonen, CETE University of Kansas NCSC Assessment Consortium Ellen Forte, edCount Elpa21 Assessment Consortium Kenji Hakuta, Stanford University; Phoebe Winter, Independent Consultant WIDA Consortium Dorry Kenyon and Meg Montee, Center for Applied Linguistics 79 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 6:05 PM, Renaissance West A, Ballroom Level, Contributed Session, E2 Applications of Latent Regression to Modeling Student Achievement, Growth, and Educator Effectiveness Session Chair: J.R. 
Lockwood, Educational Testing Service Session Discussant: Matthew Johnson, Columbia University There are both research and policy demands for making increasingly ambitious inferences about student achievement, achievement growth and educator effectiveness using longitudinal educational data. For example, test score data are now used routinely to make inferences about achievement growth through Student Growth Percentiles (SGP), as well as inferences about the effectiveness of schools and teachers. A common concern in these applications is that inferences may have both random and systematic errors resulting from limitations of the achievement measures, limitations of the available data, and/or failure of statistical modeling assumptions. This session will present four diverse applications in which the accuracy of standard approaches to the estimation problems can be improved, or their validity tested, through latent regression modeling. “Latent regression” refers to statistical models involving the regression of unobserved variables on observed covariates (von Davier & Sinharay, 2010). For example, the National Assessment of Educational Progress uses regression of latent achievement constructs on student background and grouping variables to improve the value of the reported results for secondary analysis (Mislevy, Johnson, & Muraki, 1992). The increasing availability of methods and software for fitting latent regression models provides unprecedented opportunities for using them to improve inferences about quantities now being demanded from educational data. Using the Fay-Herriot Model to Improve Inferences from Coarsened Proficiency Data Benjamin Shear, Stanford University; Katherine Furgol Castellano and J.R. Lockwood, Educational Testing Service Estimating True SGP Distributions Using Multidimensional Item Response Models and Latent Regression Katherine Furgol Castellano and J.R. Lockwood, Educational Testing Service Testing Student-Teacher Selection Mechanisms Using Item Response Data J.R. Lockwood, Daniel McCaffrey, Elizabeth Stone and Katherine Furgol Castellano, Educational Testing Service; Charles Iaconangelo, Rutgers University Adjusting for Covariate Measurement Error When Estimating Weights to Balance Nonequivalent Groups Daniel McCaffrey, J.R. Lockwood, Shelby Haberman and Lili Yao, Educational Testing Service 80 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Renaissance West B, Ballroom Level, Coordinated Session, E3 Jail Terms for Falsifying Test Scores: Yes, No or Uncertain? Session Moderator: Wayne Camara, ACT Session Debaters: Mike Bunch, Measurement Incorporated; S E Phillips, Assessment Law Consultant; Mike Beck, Testing Consultant; Rachel Schoenig, ACT Far too many testing programs have recently faced public embarrassment and loss of credibility due to wellorganized schemes by educators to fraudulently inflate test scores over extended periods of time. Even testing programs with good prevention, detection and investigation strategies are frustrated because consequences such as score invalidation or loss of a license or credential seem not to be sufficient consequences to deter organized efforts to falsify test scores. The pecuniary gains, job security and recognition from falsified scores have appeared to outweigh the deterrence effect of existing penalties. This situation led a prosecutor in Atlanta, Georgia to employ a novel strategy to impose serious consequences on educators who conspired to fraudulently inflate student test scores. 
An extensive, external investigation triggered by excessive erasures and phenomenal test score improvements over ten years had implicated a total of 178 educators, 82 of whom had confessed and resigned, were fired or lost their teaching licenses at administrative hearings. In 2013, a grand jury indicted 35 of the remaining educators, including the alleged leader of the conspiracy, Superintendent Beverly Hall, for violation of a state Racketeer Influenced and Corrupt Organizations (RICO) statute. The RICO statute was originally designed to punish mafia organized crime, but the prosecutor argued that the cover-ups, intimidation and collusion involved in the organized activity of changing students’ answers on annual tests constituted a criminal enterprise. He further argued that this criminal enterprise obscured the academic deficiencies and shortchanged the education of poor minority students. Superintendent Hall, who denied the charges but faced a possible sentence of up to 45 years in jail, died of breast cancer shortly before the trial began. Twelve of the indicted educators who refused a plea bargain went to trial and 11 were convicted. The lone defendant who was acquitted was a special education teacher who had administered tests to students with disabilities. In April 2015, amid pleas for leniency and with an acknowledgement that the students whose achievements were misrepresented were the real victims, the trial judge handed down unexpected and stiff punishments that included jail terms for 8 of the convicted Atlanta educators. After refusing an opportunity to avoid jail time by admitting their crimes in open court and foregoing their rights to appeal, they were sentenced to jail terms of 1 to 7 years. Two of the remaining convicted educators, a testing coordinator and a teacher, accepted sentencing deals in which they received 6 months of weekends in jail and one year of home confinement, respectively. After having been held in the county jail for two weeks following their convictions, the judge released the sentenced educators on bond pending appeal. About two weeks later and consistent with the prosecutor’s original recommendations, the same judge reduced the jail time from 7 years to 3 years for the three administrators who had received the longest sentences. Despite these reductions, the sentencing of educators to multiyear jail terms for conspiring to falsify test scores remained unprecedented and controversial. Although measurement specialists may focus mainly on threats to test score validity and view invalidation of scores as the most appropriate consequence for violations of test security rules, the exposure of educator conspiracies in Atlanta and a number of other districts nationally suggests that more severe penalties may be needed to deter such violations and ensure test score validity. Measurement specialists are likely to be part of the conversations with state testing programs considering alternative consequences and will be better able to participate responsibly if they are fully informed about the competing arguments for and against penalizing egregious test security violations with jail time. 
81 2016 Annual Meeting & Training Sessions Thus, the dual purposes of this symposium are to (1) conduct a debate to illuminate the arguments and evidence in favor of and against jail time for educators who conspire to falsify student test scores, and (2) to provide audience members with an opportunity to discuss and vote on a model statute specifying penalties for conspiracy to falsify student test scores. The model statute also includes an alternative for avoiding jail time similar to that offered to the convicted Atlanta educators by the trial judge prior to sentencing. A debate format was chosen for this symposium to present a fair and balanced discussion so audience members can draw their own conclusions. The opportunity to hear arguments on both sides and to consider the issues from multiple perspectives should provide audience members with insights and evidence that can be shared with states considering alternative consequences for violations of test security rules. 82 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 3, Meeting Room Level, Paper Session, E4 Test Design and Construction Session Discussant: Chad Buckendahl Potential Impact of Section Order on an Internet Based Admissions Test Scoring Naomi Gafni and Michal Baumer, National Institute for Testing & Evaluation Meimad is an internet based admissions test consisting of eight multiple choice sections. One out of every seven test forms is randomly selected and the eight test sections in it are presented to examinees in a random order. The study examines the effect of section position on performance level. Automated Test-Form Generation with Constraint Programming (cp) Jie Li and Wim van der Linden, McGraw-Hill Education Constraint programming (CP) is used to optimally solve automated test-form generation problems. The modeling and solution process is demonstrated for two empirical examples: (i) generation of a fixed test form with optimal item ordering; and (ii) real-time ordering of items in the shadow tests in CAT. An Item-Matching Heuristic Method for a Complex Multiple Forms Test Assembly Problem Pei-Hua Chen and Cheng-Yi Huang, National Chiao Tung University An item matching approach for a complex test specification problem was proposed and compared with the integer linear programming method. The purpose of this study is to extend the item matching method to test with complex non-psychometric constraints such as set-based items, variable set length, and nested content constraints. The Effect of Foil-Reordering and Minor Editorial Revisions on Item Performance Tingting Chen, Yu-Lan Su and Jui-Sheng Wang, ACT, Inc. This study investigates how foil-reordering, and minor reformatting and rewording affect item difficulty, discrimination and other statistics for multiple-choice items using empirical data. Comparative and correlational analyses were conducted across administrations. The results indicated a significant impact on item difficulty and key selection distributions for foil-reordering and rewording. Is Pre-Calibration Possible? a Conceptual Aig Framework, Model, and Empirical Investigation Shauna Sweet, University of Maryland, College Park; Mark Gierl, University of Alberta While automatic item generation is technologically feasible, a conceptual architecture supporting the evaluation of these generative processes is needed. 
This study details such a framework and empirically examines the performance of a new multi-level model intended for pre-calibration of automatically generated items and evaluation of the generation process. Award Session: NCME Annual Award: Mark Gierl & Hollis Lai 83 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, E5 Tablet Use in Assessment Session Discussant: Walter Way, Pearson Use of tablet devices in the classroom continues to increase as Bring Your Own Device (BYOD), 1:1 technology programs, and flipped learning change the way students consume academic content, interact with their teachers and peers, and demonstrate their mastery of academic knowledge and skills. In addition, many K-12 assessment programs (e.g. NAEP, PARCC, SBAC, etc.) now or will soon allow administration of assessments using tablets. To assure the validity and reliability of test scores it is incumbent upon test developers to evaluate the potential impact of digital devices prior to their use within assessment. This session will explore various facets of the use of tablets within educational assessment and will include presentation of a set of five papers on this topic. The papers will utilize both qualitative and quantitative methods for evaluating tablet use and will evaluate impacts for different student sub-groups and special populations as well as tablet applications for both testing and scoring. Improving Measurement Through Usability Nicholas Cottrell, Fulcrum Using Tablet Technology to Develop Learning-Oriented English Language Assessment for English Learners Alexis Lopez, Jonathan Schmigdall, Ian Blood and Jennifer Wain, ETS Device Comparability: Score Range & Subgroup Analyses Laurie Davis, Yuanyuan McBride and Xiaojing Kong, Pearson; Kristin Morrison, Georgia Institute of Technology Response Time Differences Between Computers and Tablets Xiaojing Kong, Laurie Davis and Yuanyuan McBride, Pearson Scoring Essays on an iPad Guangming Ling, Jean Williams, Sue O’Brien and Carlos Cavalie, ETS 84 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, E6 Topics in Multistage and Adaptive Testing Session Discussant: Jonathan Rubright, AICPA A Top-Down Approach to Designing a Computerized Multistage Test Xiao Luo, Doyoung Kim and Ada Woo, National Council of State Boards of Nursing The success of a computerized multistage test (MST) relies on a meticulous test design. This study introduces a new route-based top-down approach to designing MST, which imposes constraints and objectives upon routes and algorithmically searches for an optimal assembly of modules. This method simplifies and expedites the design of MST. Comparison of Non-Parametric Routing Methods with IRT in Multistage Testing Design Evgeniya Reshetnyak, Fordham University; Alina von Davier, Charles Lewis and Duanli Yan, ETS The goal of proposed study is to compare performance of non-parametric methods and machine learning techniques with traditional IRT methods for routing test takers in an adaptive multistage test design using operational and simulated data. A Modified Procedure in Applying Cats to Allow Unrestricted Answer Changing Zhongmin Cui, Chunyan Liu, Yong He and Hanwei Chen, ACT, Inc. Computerized adaptive testing with salt (CATS) has been shown to be robust to test-taking strategies (e.g., Wainer, 1993) in a reviewable CAT. 
The robustness, however, is gained at the expense of test efficiency loss. We propose an innovative modification such that the modified CATS is both robust and efficient. The Expected Likelihood Ratio in Computerized Classification Testing Steven Nydick, Pearson VUE This simulation study compares the classification accuracy and expected test length of the expected likelihood ratio (ELR; Nydick, 2014) item selection algorithm to alternative algorithms in SPRT-based computerized classification testing (CCT). Results will help practitioners determine the most efficient method of item selection given a particular CCT stopping rule. A Comparison of the Pretest Item Calibration Procedures in CAT Xia Mao, Pearson This study compares four procedures for calibrating pretest items in CAT using both real data and simulated data by manipulating the pretest item cluster length, calibration sample features and calibration sample sizes. The results will provide guidance for pretest item calibration in large-scale CAT in K–12 contexts. Pretest Item Selection and Calibration Under Computerized Adaptive Testing Shichao Wang, The University of Iowa; Chunyan Liu, ACT, Inc. Pretest item calibration plays an important role in maintaining item pools under computerized adaptive testing. This study aims to compare and evaluate five pretest item selection methods in item parameter estimation using various calibration procedures. The practical significance of these methods is also discussed. Using Off-Grade Items in Adaptive Testing —A Differential Item Functioning Approach Shuqin Tao and Daniel Mix, Curriculum Associates This study is intended to assess the appropriateness of using off-grade items in adaptive testing from a differential item functioning (DIF) approach. Data came from an adaptive assessment administered to school districts nationwide. Insights gained will help develop item selection strategies in adaptive algorithm to select appropriate off-grade items. 85 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 6:05 PM, Meeting Room 12, Meeting Room Level, Paper Session, E7 Cognitive Diagnosis Models: Exploration and Evaluation Session Discussant: Laine Bradshaw, University of Georgia Bayesian Inferences of Q-Matrix with Presence of Anchor Items Xiang Liu, Young-Sun Lee and Yihan Zhao, Teachers College, Columbia University Anchor items are usually included in multiple administrations of same assessment. Attribute specifications and item parameters can be obtained for these items from previous analyses. We propose a Bayesian method for estimating Q-matrix with presence of partial knowledge. Simulation demonstrates its effectiveness. TIMSS 2003 and 2007 data are then analyzed. An Exploratory Approach to the Q-Matrix Via Bayesian Estimation Lawrence DeCarlo, Teachers College, Columbia University An exploratory approach to determining the Q-matrix in cognitive diagnostic models is presented. All elements are specified as being uncertain, with respect to inclusion, and posteriors from a Bayesian analysis are used for selection. Simulations show that the approach gives high rates of correct element recovery, typically over 90%. Parametric or Nonparametric—Evaluating Q-Matrix Refinement Methods for Dina and Dino Models Yi-Fang Wu, University of Iowa; Hueying Tzou, National University of Tainan Two model-based and one model-free statistical Q-matrix refinement methods are evaluated and compared against one another. 
Large-scope simulations are used to study their q-vector recovery rates and the correct rates of examinee classification. The three most recent methods are also applied to real data for identifying and correcting misspecified q-entries. Comparing Attribute-Level Reliability Estimation Methods in Diagnostic Assessments Chunmei Zheng and Yuehmei Chien, Pearson; Ning Yan, Independent Consultant Diagnostic classification models have drawn much attention from practitioners due to their promising use in aligning teaching, learning, and assessment. However, attribute classification reliability has received little investigation. The purpose of this study, therefore, is to conduct a comparison of existing reliability estimation methods. Estimation of Diagnostic Classification Models Without Constraints: Issues with Class Label Switching Hongling Lao and Jonathan Templin, University of Kansas Diagnostic classification models (DCMs) may suffer from the latent class label switching issue, providing misleading results. A simulation study is proposed to investigate (1) the prevalence of the label switching issue in different DCMs, and (2) the effectiveness of constraints at preventing label switching from happening. Conditions Impacting Parameter and Profile Recovery Under the NIDA Model Yanyan Fu, Jonathan Rollins and Robert Henson, UNCG The NIDA model was studied under various conditions. Results indicated that sample size did not affect attribute parameter recovery and marginal CCRs (mCCRs). However, the number of attributes and items influenced the mCCRs. Data generated from the RUM and NIDA models yielded similar mCCRs when estimated using the NIDA model. 86 Washington, DC, USA Sequential Detection of Learning Multiple Skills in Cognitive Diagnosis Sangbeak Ye, University of Illinois at Urbana-Champaign Cognitive diagnosis models aim to identify examinees' mastery or non-mastery of a vector of skills. In an e-learning environment where a set of skills is trained until mastery, a proper detection method to determine the presence of the skills is vital. We introduce techniques to detect change-points of multiple skills. 87 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 4:05 PM - 5:35 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, E8 Electronic Board #1 Response Styles Adjustments in Cross-Cultural Data Using the Mixture PCM IRT Model Bruce Austin, Brian French and Olusola Adesope, Washington State University Response styles can contribute irrelevant variance to rating-scale items and compromise cross-cultural comparisons. Rasch IRT models were used to identify response styles and adjust data after identifying latent classes based on response styles. Predictive models were improved with adjusted data. We conclude with recommendations for identifying response style classes. Electronic Board #2 Using Differential Item Functioning to Test for Inter-Rater Reliability in Educational Testing Sakine Gocer Sahin, Hacettepe University; Cindy M. Walker, University of Wisconsin-Milwaukee Although multiple-choice items can be more reliable, the information obtained from open-ended items is sometimes greater, and better aligned, than that from multiple-choice items. This is only true if the raters are unbiased. The purpose of this research was to investigate an alternative measure of inter-rater reliability based on IRT.
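The last poster above proposes an IRT/DIF-based alternative to conventional inter-rater reliability indices. As a point of reference only, the sketch below computes a conventional agreement index (Cohen's kappa) for two raters on hypothetical ratings; it is not the authors' proposed measure.

```python
# Conventional inter-rater agreement (Cohen's kappa) for two raters; a baseline
# against which an IRT/DIF-based index might be compared. Ratings are hypothetical.
import numpy as np

rater1 = np.array([0, 1, 2, 2, 1, 0, 2, 1])
rater2 = np.array([0, 1, 2, 1, 1, 0, 2, 2])

categories = np.unique(np.concatenate([rater1, rater2]))

p_observed = np.mean(rater1 == rater2)
# Expected agreement under independence of the two raters' marginal distributions.
p_expected = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 3))
```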
Electronic Board #3 Incorporating Expert Priors in Estimation of Bayesian Networks for Computer Interactive Tasks Johnny Lin, University of California, Los Angeles; Hongwen Guo, Helena Jia, Jung Aa Moon and Janet Koster van Groos, Educational Testing Service Due to the cost of item development in computer interactive tasks, the amount of evidence available for estimation is reduced. In order to minimize instability, we show how expert priors can be incorporated into Bayesian Networks by performing a smoothing transformation to obtain posterior estimates. Electronic Board #4 A Multidimensional Rater Effects Model Richard Schwarz, ETS; Lihua Yao, DMDC One approach for evaluating rater effects is to add an explicit rater parameter to a polytomous IRT model, yielding what is called a rater effects model. A multidimensional rater effects model is proposed. Using MCMC techniques and simulation, specifications for priors, the posterior distributions, and estimation of the model will be described. Electronic Board #5 Exploring Clinical Diagnosis Process Data with Cluster Analysis and Sequence Mining Feiming Li and Frank Papa, University of North Texas Health Science Center This study collected clinical diagnosis process data from a diagnosis task performed by medical students in a computer-based environment. The study aimed to identify attributes of data-gathering behaviors that predict diagnostic accuracy and to conduct cluster analysis and sequence mining to explore meaningful attribute or sequential patterns explaining the success or failure of diagnosis. 88 Washington, DC, USA Electronic Board #6 Validity Evidence for a Writing Assessment for Students with Significant Cognitive Disabilities Russell Swinburne Romine, Meagan Karvonen and Michelle Shipman, University of Kansas Sources of evidence for a validity argument are presented for the writing assessment in the Dynamic Learning Maps Alternate Assessment System. Methods included teacher surveys, test administration observations, and a new cognitive lab protocol in which test administrators participated in a think-aloud during administration of a practice assessment. Electronic Board #7 The Implications of Reduced Testing for Teacher Accountability Jessica Alzen, School of Education, University of Colorado Boulder; Erin Fahle and Benjamin Domingue, Graduate School of Education, Stanford University The present student testing burden is substantial, and interest in alternative scenarios with reduced testing but persistent accountability measures has grown. This study focuses on VA estimates in the presence of structural missingness of test data consistent with alternative scenarios designed to reduce the student testing burden. Electronic Board #8 Examination of the Constructs Assessed by Published Tests of Critical Thinking Jennifer Kobrin, Edynn Sato, Emily Lai and Johanna Weegar, Pearson We used a principled approach to define the construct of critical thinking and examined the degree to which existing tests are aligned to the construct. Our findings suggest that existing tests tend to focus on a narrow set of skills and identify gaps that offer opportunities for future assessment development. Electronic Board #9 The False Discovery Rate Applied to Large-Scale Testing Security Screenings Tanesia Beverly, University of Connecticut; Peter Pashley, Law School Admission Council When statistical tests are conducted repeatedly to detect test fraud (e.g., copying), the overall false-positive rate should be controlled.
Three approaches to adjusting significance levels were investigated with simulated and real data. A procedure for controlling the false discovery rate by Benjamini and Hochberg (1995) yielded the best results. Electronic Board #10 The Impact of Ignoring Multiple-Group Structure in Testlet-Based Tests on Ability Estimation Ming Li, Hong Jiao and Robert Lissitz, University of Maryland The study investigates the impact of ignoring the multi-group structure on ability estimation in testlet-based tests. In a simulation, model parameter estimates from three IRT models: a standard 2PL model, and a multiple-group 2PL model with or without testlet effects are compared and evaluated in terms of estimation errors. Electronic Board #11 Reconceptualising Validity Incorporating Evidence of User Interpretation Timothy O’Leary, University of Melbourne; John Hattie and Patrick Griffin, Melbourne University Validity is a fundamental consideration in test development. A recent conception introduced user validity focused upon the accuracy and effectiveness of interpretations resulting from test score reports. This paper proposes a reconceptualization of validity incorporating evidence of user interpretations and a method for the collection of such evidence. 89 2016 Annual Meeting & Training Sessions Electronic Board #12 Single and Double Linking Designs Accessed by Population Invariance Yan Huo and Sooyeon Kim, Educational Testing Service The purpose of this study is to determine whether double linking is more effective than single linking in terms of achieving subpopulation invariance on scoring. When double-linking was applied, the conversions derived from two subgroups different in geographic regions were more comparable to the conversion derived from the total group. Electronic Board #13 Equating Mixed-Format Tests Using a Simple-Structure MIRT Model Under a Cineg Design Jiwon Choi, ACT/University of Iowa; Won-Chan Lee, University of Iowa This study applies the SS-MIRT observed score equating procedure for mixed-format tests under the CINEG design. Also, the study compares various scale linking methods for SS-MIRT equating. The results show that the SS-MIRT approach provides more accurate equating results than the UIRT and traditional equipercentile methods. Electronic Board #14 Pre-Equating or Post-Equating? Impact of Item Parameter Drift Wenchao Ma, Rutgers, The State University of New Jersey; Hao Song, National Board of Osteopathic Medical Examiners This study, using a real-data-based simulation, examines whether item parameter drift (IPD) influences pre-equating and post-equating. Accuracy of ability estimates and classifications are evaluated under varied conditions of IPD direction, magnitude, and proportion of items with IPD. Recommendation is made on which equating method is preferred under different IPD conditions. Electronic Board #15 A Comparative Study on Fixed Item Parameter Calibration Methods Keyu Chen and Catherine Welch, University of Iowa This study provides a description of implementing fixed item parameter method in BILOG-MG as well as a comparison of three fixed item parameter calibration methods when calibrating field test items on the scale of operational items. A simulation study will be conducted to compare results of the three methods. 
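The false discovery rate poster above (Electronic Board #9) refers to the Benjamini and Hochberg (1995) procedure. A minimal sketch of that step-up procedure, applied to hypothetical p-values, is shown below; the poster's actual comparisons involve additional adjustment methods not reproduced here.

```python
# Benjamini-Hochberg step-up procedure; p-values and the FDR level q are hypothetical.
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array flagging tests rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])     # largest rank i with p_(i) <= i*q/m
        rejected[order[: k + 1]] = True      # reject all p-values up to that rank
    return rejected

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.6]))  # flags the smallest p-values
```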
Electronic Board #16 Examining Various Weighting Schemes' Effect on Examinee Classification Using a Test Battery Qing Xie, ACT/The University of Iowa; Yi-Fang Wu, Rongchun Zhu and Xiaohong Gao, ACT, Inc. The purpose of this study is to examine the effect of various weighting schemes on classifying examinees into multiple categories. The results will provide practical guidelines for using either profile scores or a composite score for examinee classification in a test battery. Electronic Board #17 Module Assembly for Logistic Positive Exponent Model-Based Multistage Adaptive Testing Thales Ricarte and Mariana Cúri, Institute of Mathematical and Computer Sciences (ICMC-USP); Alina von Davier, Educational Testing Service (ETS) In multistage adaptive testing (MST) based on item response theory models, modules are assembled by optimizing an objective function via linear programming. In this project, we analyzed MST based on the Logistic Positive Exponent model for testlet performance, using Fisher information, Kullback-Leibler information, and the Continuous Entropy Method as objective functions. Electronic Board #18 Online Calibration Pretest Item Selection Design Rui Guo and Hua-hua Chang, University of Illinois at Urbana-Champaign Pretest item calibration is crucial in multidimensional computerized adaptive testing. This study proposed an online calibration pretest item selection design named the four-quadrant D-optimal design with a proportional density index algorithm. Simulation results showed that the proposed method provides good item calibration efficiency. Electronic Board #19 Online Multistage Intelligent Selection Method for CD-CAT Fen Luo, Shuliang Ding, Xiaoqing Wang and Jianhua Xiong, Jiangxi Normal University A new item selection method, the online multistage intelligent selection method (OMISM), is proposed. Simulation results show that for OMISM, the pattern match ratio of the knowledge state is higher than that for the posterior-weighted Kullback-Leibler information selection method in CD-CAT when examinees have mastered multiple attributes. Electronic Board #20 Data-Driven Simulations of False Positive Rates for Compound DIF Inference Rules Quinn Lathrop, Northwest Evaluation Association Understanding how inference rules function under the null hypothesis is critical. This proposal presents a data-driven simulation method to determine the false positive rate of tests for DIF. The method does not assume a functional form of the item characteristic curves and also replicates impact from empirical data. Electronic Board #21 Simultaneous Evaluation of DIF and Its Sources Using Hierarchical Explanatory Models William Skorupski, Jennifer Brussow and Jessica Loughran, University of Kansas This study uses item-level features as explanatory variables for understanding DIF. Two approaches for DIF identification/explanation are compared: 1) two-stage DIF + regression, and 2) a simultaneous, hierarchical approach. Realistic data were simulated by varying the strength of the relationship between DIF and explanatory variables and reference/focal group sample sizes. Electronic Board #22 Comparing Imputation Methods for Trait Estimation Using the Rating Scale Model Christopher Runyon, Rose Stafford, Jodi Casabianaca and Barbara Dodd, The University of Texas at Austin This research investigates trait level estimation under the rating scale model using three imputation methods of handling missing data: (a) multiple imputation, (b) nearest-neighbor hot deck imputation, and (c) multiple hot deck imputation.
We compare the performance of these methods for three levels of missingness crossed with three scale lengths. Electronic Board #23 The Nonparametric Method to Analyze Multiple-Choice Items: Using the Hamming Distance Method Shibei Xiang, Wei Tian and Tao Xin, National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality Many data in education are in the form of multiple-choice (MC) items that are scored as dichotomous data. In order to obtain information from incorrect answers, we expand the Q-matrix to item options and use a nonparametric Hamming distance method to classify examinees; the approach can be used even with small sample sizes. 91 2016 Annual Meeting & Training Sessions Electronic Board #24 Automatic Scoring System for a Short Answer in Korean Large Scale Assessment EunYoung Lim, Eunhee Noh and Kyunghee Sung, Korean Institute for Curriculum and Evaluation The purpose of this study is to evaluate a prototype of the Korean automatic scoring system (KASS) for short answers and to explore the related features of KASS to improve the accuracy of automatic scoring. 92 Washington, DC, USA Saturday, April 9, 2016 4:05 PM - 7:00 PM, Convention Center, Level Two, Room 202 A The Life and Contributions of Robert L. Linn, Followed by a Reception Note: NCME is partnering with AERA to record this session. We will make this recording available to all NCME members, including those who have to miss this tribute for presentations and attendance at NCME sessions. 93 2016 Annual Meeting & Training Sessions Saturday, April 9, 2016 6:30 PM - 8:00 PM, Grand Ballroom South, Ballroom Level NCME and AERA Division D Joint Reception National Council on Measurement in Education and AERA Division D Welcome Reception for Current and New Members 94 Washington, DC, USA Annual Meeting Program - Sunday, April 10, 2016 95 2016 Annual Meeting & Training Sessions 96 Washington, DC, USA Sunday, April 10, 2016 8:00 AM - 9:00 AM, Marquis Salon 6, Marriott Marquis Hotel 2016 NCME Breakfast and Business Meeting (Ticketed Event) Join your friends and colleagues at the NCME Breakfast and Business Meeting at the Marriott Marquis Hotel. Theater-style seating will be available for those who did not purchase a breakfast ticket but wish to attend the Business Meeting. 97 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 9:00 AM - 9:40 AM, Marquis Salon 6, Marriott Marquis Hotel Presidential Address: Education and the Measurement of Behavioral Change Rich Patz, ACT, Iowa City, IA 98 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Renaissance East, Ballroom Level, Invited Session, F1 Award Session Career Award: Do Educational Assessments Yield Achievement Measurements? Winner: Mark Reckase Session Moderator: Kadriye Ercikan, University of British Columbia Because my original training in measurement/psychometrics was in psychology rather than education, I have noted the difference in approaches taken for the development of tests in those two disciplines. One begins with the concepts of a hypothetical construct and locating persons along a continuum, and the other begins with the definition of a domain of content and works to estimate the amount of the domain that a person has acquired. This presentation will address whether these two conceptions of test development are consistent with each other and with the assumptions of the IRT models that are often used to analyze the test results.
It will also address how test results are interpreted and whether those interpretations are consistent with the measurement model and the test design. Finally, there is a discussion of how users of test results would like to interpret results, and whether measurement experts can produce tests and analysis procedures that will support the desired interpretations. 99 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 10:35 AM - 12:05 PM, Renaissance West A, Ballroom Level, Invited Session, F2 Debate: Should the NAEP Mathematics Framework Be Revised to Align with the Common Core State Standards? Session Presenters: Michael Cohen, Achieve; Chester Finn, Fordham Institute Session Moderators: Bill Bushaw, National Assessment Governing Board; Terry Mazany, Chicago Community Trust The 2015 National Assessment of Educational Progress (NAEP) results showed declines in mathematics scores at grades 4 and 8 for the nation and several states and districts. The release of the 2015 NAEP results prompted discussion about the extent to which the results may have been affected by differences between the content of the NAEP mathematics assessments and the Common Core State Standards in mathematics. The National Assessment Governing Board wants to know what you think. The presenters will frame the issue and then audience members will engage in a thorough discussion providing important insights to Governing Board members. 100 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Renaissance West B, Ballroom Level, Coordinated Session, F3 Beyond Process: Theory, Policy, and Practice in Standard Setting Session Chair: Karla Egan, NCIEA Session Discussant: Chad Buckendahl, Alpine Testing Standard setting has become a routine and (largely) accepted part of the test development cycle for K-12 summative assessments. Conventional implementation of almost any K-12 standard setting method convenes teachers who study achievement level descriptors (ALDs) to make decisions about the knowledge, skills, and abilities (KSAs) expected of students. Traditionally, these cut scores have gone to state boards of education or education commissioners who are sometimes reluctant to adjust cut scores established by educators. While these conventional practices have served the field well, there are particular areas that deserve further scrutiny. The first area needing further scrutiny is the validity of the ALDs, which provide a common framework for panelists to use when recommending cut scores. These ALDs are often written months or years prior to the test, sometimes even providing guidance for item writers and test developers regarding the KSAs expected of students on the test (Egan, Schneider, & Ferrara, 2012). What happens when carefully developed ALDs are not well aligned to actual student performance? This occurs in practice, yet only a handful of studies have examined the issue (e.g., Schneider, Egan, Kim, & Brandstrom, 2012). The first paper seeks to validate the ALDs used in the development of a national alternate assessment against student performance on that assessment. The next area that needs a closer look is the use of educators as panelists in standard setting workshops. Educators may have a conflict of interest in recommending the cut scores. Educators are asked to recommend cut scores that have a direct consequence on accountability measures, such as teacher evaluation. There are other means of setting cut scores that do not involve teachers.
For example, when setting college-readiness cut scores, it may not even be necessary to bring in panelists if the state links performance on its high school test to a test like the ACT or SAT. The second paper investigates the positives and negatives of quantitative methods for setting cut scores. Another take on the same issue is to involve panelists who are able to reflect globally on how cut scores will impact school-, district-, and statewide systems. To this end, methods have been used that show different types of data to inform panelist decisions (Beimers, Way, McClarty, & Miles, 2012). Others have brought in district-level staff following the content-based standard setting to adjust cut scores from a system perspective. The third paper approaches this as a validity issue, and it examines the different types of evidence (beyond process) that should be used to support standard setting. The final issue that deserves further scrutiny is the use of panelists as evaluators of the standard setting. Panelists often serve as the only evaluators of the implementation and outcome of the method itself. Panelists fill out evaluations at the end of the standard setting, and these are often used as validity evidence supporting the cut scores. While this group represents an important perspective on the standard setting process, it is important to recognize that panelists are often heavily invested in the process by the time they participate in an evaluation of the standard setting workshop. The last paper considers the role that an external evaluator could play in standard setting. The Alignment of Achievement Level Descriptors to Student Performance Lori Nebelsick-Gullet, edCount Data-Based Standard Setting: Moving Away from Panelists? Joseph Martineau, NCIEA 101 2016 Annual Meeting & Training Sessions Examining Validity Evidence of Policy Reviews Juan d'Brot, DRC The Role of the External Evaluator in Standard Setting Karla Egan, NCIEA 102 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, F4 Exploring Timing and Process Data in Large-Scale Assessments Session Chairs: Matthias von Davier and Qiwei He, Educational Testing Service Session Discussant: Ryan Baker, Teachers College, Columbia University Computer-based assessments (CBAs) provide new insights into behavioral processes related to task completion that cannot be easily observed using paper-based instruments. In CBAs, a variety of timing and process data accompanies test performance data. This means that much more information is available than correctness or incorrectness alone. The analyses of these types of data are necessarily much more involved than those typically performed on traditional tests. This symposium provides examples of how sequences of actions and timing data are related to task performance and how to use process data to interpret students' computer and information literacy achievements in large-scale international education and skills surveys such as the Programme for International Student Assessment (PISA), the Programme for the International Assessment of Adult Competencies (PIAAC), and the International Computer and Information Literacy Study (ICILS). The methods applied in these talks draw on cognitive theories for guidance of what "good" problem solving is, as well as on modern data-analytic techniques that can be utilized to explore log file data.
These studies highlight the potential of analyzing students’ behavior stored in log files in computer-based large-scale assessments and show the promise of tracking students’ problemsolving strategies by using process data analysis. An Overview: Process Data – Why Do We Care? Matthias von Davier, Educational Testing Service Log File Analyses of Students’ Problem-Solving Behavior in PISA 2012 Assessment Samuel Greiff and Sascha Wüstenberg, University of Luxembourg Identifying Feature Sequences from Process Data in PIAAC Problem-Solving Items with N-Grams Qiwei He and Matthias von Davier, Educational Testing Service Predictive Feature Generation and Selection from Process Data in PISA Simulation-Based Environment Zhuangzhuang Han, Qiwei He and Matthias von Davier, Educational Testing Service 103 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, F5 Psychometric Challenges with the Machine Scoring of Short-Form Constructed Responses Session Chair: Mark Shermis, University of Houston—Clear Lake Session Discussant: Claudia Leacock, CTB/McGraw-Hill This session examines four methodological problems associated with machine scoring of short-form constructed responses. The first study looks at the detection of speededness with short-answer questions on a testlet-based science test. Because items in a testlet are scored together, speededness can have a negative and even irrecoverable impact on an examinee’s score. The second study attempted to detect speededness/differential speededness on Task Based Simulations (a type of short-form constructed response) that were part of a licensing exam. Since the TBSs are embedded in the same section of the exam as multiple-choice questions, the goal was to ensure that examinees will have enough time to complete the test. The third study used a new twist on adjudicating short-answer machine scores. Instead of using a second human rater to adjudicate discrepant scores between one human and one machine rater, the study employed two different machine scoring systems and used a human rater to resolve differences in scores. The last study attempted to explain DIF using linguistic feature sets of machine scored short-answer questions taken from middle- and high-school exam questions. The study suggests that focal and reference groups have different “linguistic profiles” that may explain differences in test performance on particular items. Speededness Effects in a Constructed Response Science Test Meereem Kim, Allan Cohen, Zhenqui Lu, Seohyun Kim, Cory Buxton and Martha Allexsaht-Snider, University of Georgia Speededness for Task Based Simulations Items in a Multi-Stage Licensure Examination Xinhui Xiong, American Institute for Certified Public Accountants Short-Form Constructed Response Machine Scoring Adjudication Methods Susan Lottridge, Pacific Metrics, Inc. Use of Automated Scoring to Generate Hypotheses Regarding Language Based DIF Mark Shermis, University of Houston--Clear Lake; Liyang Mao, IXL Learning; Matthew Mulholland, Educational Testing Service; Vincent Kieftenbeld, PacificMetrics, Inc. 
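The third study in the short-form scoring session above uses a human rater to resolve differences between two automated scoring engines rather than between a human and a machine. A minimal sketch of that routing idea, with hypothetical scores and a hypothetical discrepancy threshold, is given below; it is an illustration, not the session's actual adjudication rule.

```python
# Sketch of adjudication routing: two automated engines score each response, and
# only discrepant cases are sent to a human rater. Scores and threshold are hypothetical.
def route_for_adjudication(engine_a_scores, engine_b_scores, max_discrepancy=1):
    """Return indices of responses whose engine scores disagree by more than the threshold."""
    return [
        i for i, (a, b) in enumerate(zip(engine_a_scores, engine_b_scores))
        if abs(a - b) > max_discrepancy
    ]

engine_a = [2, 3, 0, 1, 4]
engine_b = [2, 1, 0, 1, 2]
print(route_for_adjudication(engine_a, engine_b))   # -> [1, 4]
```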
104 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, F6 Advances in Equating Session Discussant: Benjamin Andrews, ACT Bifactor MIRT Observed Score Equating of Testlet-Based Tests with Nonequivalent Groups Mengyao Zhang, National Conference of Bar Examiners; Won-Chan Lee, The University of Iowa; Min Wang, ACT This study extends a bifactor MIRT observed-score equating framework for testlet-based tests (Zhang et al., 2015) to accommodate nonequivalent groups. Binary data are simulated to represent varying degrees of testlet effect and group equivalence. Different procedures are evaluated regarding the estimated equating relationships for numbercorrect scores. Hierarchical Generalized Linear Models (hglms) for Testlet-Based Test Equating Ting Xu and Feifei Ye, University of Pittsburgh This simulation study was to investigate the effectiveness of Hierarchical Generalized Linear Models (HGLMs) as concurrent calibration models on testlet-based test equating under the anchor-test design. Three approaches were compared, including two under the HGLM framework and one using Rasch concurrent calibration. Degrees of testlet variance were manipulated. The Local Tucker Method and Its Standard Errors Sonya Powers, Pearson; Lisa Larsson, ERC Credit Modelling A new linear equating method is proposed that addresses limitations of the local and Tucker equating methods. This method uses a bivariate normal distribution to model common and non-common item scores. Simulation results indicate that this new method has comparable standard errors to the original Tucker method and less bias. Using Criticality Analysis to Select Loglinear Smoothing Models Arnond Sakworawich, National Institute of Development Administration; Han-Hui Por and Alina von Davier, Educational Testing Service; David Budescu, Fordham University This paper proposes “Criticality analysis” as a loglinear smoothing model selection procedure. We show that this method outperforms traditional methods that rely on global measures of fit of the original data set by providing a clearer and sharper differentiation between the competing models. 105 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 10:35 AM - 12:05 PM, Meeting Room 15, Meeting Room Level, Paper Session, F7 Novel Approaches for the Analysis of Performance Data Session Discussant: William Skorupski, University of Kansas Combining a Mixture IRT Model with a Nominal Random Item Mixture Model Hye-Jeong Choi and Allan Cohen, University of Georgia; Brian Bottge, University of Kentucky This study describes a psychometric model in which a mixture item response theory model (MixIRTM) is combined to a random item mixture nominal response model (RMixNRM). Inclusion of error and accuracy in one model has the potential to provide a more direct explanation about differences in response patterns. Bayesian Estimation of Null Categories in Constructed-Response Items Yong He, Ruitao Liu and Zhongmin Cui, ACT, Inc. Estimating item parameters in the presence of a null category in a constructed-response item is challenging. The problem has not been investigated in the generalized partial credit model (GPCM). A Bayesian estimation of null categories based on the GPCM framework is proposed in this study. 
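One paper in the session above proposes a Bayesian treatment of null categories within the generalized partial credit model (GPCM). For readers unfamiliar with the model, the sketch below computes GPCM category probabilities for a single item using hypothetical parameter values; it does not implement that paper's Bayesian estimation.

```python
# Category response probabilities under the generalized partial credit model (GPCM);
# parameter values are hypothetical.
import numpy as np

def gpcm_probs(theta, a, b):
    """GPCM category probabilities for one item.

    theta : examinee ability
    a     : item discrimination
    b     : step parameters b_1..b_K (category 0 has no step)
    """
    # Cumulative sums of a*(theta - b_j) define the numerators; category 0 gets 0.
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    numer = np.exp(steps)
    return numer / numer.sum()

print(np.round(gpcm_probs(theta=0.5, a=1.2, b=[-1.0, 0.0, 1.5]), 3))
```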
The Fast Model: Integrating Learning Science and Measurement José González-Brenes, Pearson; Yun Huang and Peter Brusilovsky, University of Pittsburgh The assessment and learning science communities rely on different paradigms to model student performance. Assessment uses models that capture different student abilities and problem difficulties, while learning science uses models that capture skill acquisition. We present our recent work on FAST (Feature Aware Student knowledge Tracing) to bridge both communities. Award Session: Brenda Loyd Dissertation Award 2016: Yuanchoa Emily Bo 106 Washington, DC, USA Sunday, April 10, 2016 10:35 AM - 12:05 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, F8 Electronic Board #1 Multilevel IRT: When is Local Independence Violated? Christine DeMars and Jessica Jacovidis, James Madison University Calibration data are often collected within schools. This illustration shows that random school effects for ability do not bias IRT parameter estimates or their standard errors. However, random school effects for item difficulty lead to bias in item discrimination estimates and inflated standard errors for difficulty and ability. Electronic Board #2 The Higher-Order IRT Model for Global and Local Person Dependence Kuan-Yu Jin and Wen-Chung Wang, The Hong Kong Institute of Education Persons from the same clusters may behave more similarly than those from different clusters. In this study, we proposed a higher-order partial credit model for person clustering to quantify global and local person dependence for clustered samples in multiple tests. Simulation studies supported good parameter recovery of the new model. Electronic Board #3 A Multidimensional Item Response Model for Local Dependence and Content Domain Structure Yue Liu, Sichuan Institute of Education Sciences; Lihua Yao, Defense Manpower Data Center; Hongyun Liu, Beijing Normal University, Department of Psychology This study proposed a multidimensional item response model for testlets to simultaneously account for local dependence due to item clustering and multidimensional structure. Within-testlet and between-testlet models are applied to real data from collaborative problem-solving assessments. The precision of the domain scores and overall scores for the proposed models is compared. Electronic Board #4 Distinguishing Struggling Learners from Unmotivated Students in an Intelligent Tutoring System Kimberly Colvin, University at Albany, SUNY To help teachers distinguish struggling learners from unmotivated students, a measure of examinee motivation designed for large-scale computer-based tests was modified and applied to an intelligent tutoring system. Proposed modifications addressed issues related to small sample sizes. The relationship of hint use and student motivation was also investigated. Electronic Board #5 Using Bayesian Networks for Prediction in a Comprehensive Assessment System Nathan Dadey and Brian Gong, The National Center for the Improvement of Educational Assessment This work shows how a Bayesian network can be used to predict student summative achievement classifications using assessment data collected throughout the school year. The structure of the network is based on a curriculum map. The ultimate aim is to examine the usefulness of the network information to teachers. 107 2016 Annual Meeting & Training Sessions Electronic Board #6 Comparability Within Computer-Based Assessment: Does Screen Size Matter?
Jie Chen and Marianne Perie, Center for Educational Testing and Evaluation Comparability studies are moving beyond paper-and-pencil versus computer-based assessments to analyze variation among computer devices. Using data from a large district giving tests on either Macs, with large, high-definition screens, or Chromebooks, with standard 14” screens, this study compares assessment results between devices by grade, subject, and item type. Electronic Board #7 Modeling Acquiescence and Extreme Response Styles and Wording Effects in Mixed-Format Items Hui-Fang Chen, City University of Hong Kong; Kuan-Yu Jin and Wen-Chung Wang, Hong Kong Institute of Education Acquiescence and extreme response styles and wording effects are commonly observed in rating scale or Likert items. In this study, a multidimensional IRT model was proposed to account for these two response styles and wording effects simultaneously. The effectiveness and feasibility of the new model were examined in simulation studies. Electronic Board #8 Accessibility: Consideration of the Learner, the Teacher, and Item Performance Bill Herrera, Charlene Turner and Lori Nebelsick-Gullett, edCount, LLC; Lietta Scott, Arizona Department of Education, Assessment Section To better understand the impact of federal legislation that required schools to provide access to academic curricula to students with intellectual disabilities, the National Center and State Collaborative examined differential performance of items with respect to students’ communication and opportunity to learn using data from three assessment administrations. Electronic Board #9 Examining the Growth and Achievement of STEM Majors Using Latent Growth Models Heather Rickels, Catherine Welch and Stephen Dunbar, University of Iowa, Iowa Testing Programs This study examined the use of latent growth models (LGM) when investigating the growth and college readiness of STEM majors versus non-STEM majors. Specifically, LGMs were used to compare growth on a state achievement test from Grades 6-11 of STEM majors and non-STEM majors at a public university. Electronic Board #10 Modeling NCTM and CCSS 5th Grade Math Growth Estimates and Interactions Dan Farley and Meg Guerreiro, University of Oregon This study compares NCTM and CCSS growth estimates. Multilevel models were used to compare the standards. The CCSS measures appear to be more sensitive to growth, but exhibit potential biases toward female students and English learners. Electronic Board #11 Norming and Psychometric Analysis for a Large-Scale Computerized Adaptive Early Literacy Assessment James Olsen, Renaissance Learning Inc. This paper presents psychometric analysis and norming information for a large-scale adaptive K-3 early-literacy assessment. It addresses validity, reliability, and later grade 3 reading proficiency. The norming involved sampling 586,380 fall/spring assessments, post-stratification weighting to a representative national sample, descriptive score statistics, and developing scale percentiles and grade equivalents. 108 Washington, DC, USA Electronic Board #12 The Impact of Ignoring the Multiple-Group Structure of Item Response Data Yoon Jeong Kang, American Institutes for Research; Hong Jiao and Robert Lissitz, University of Maryland This study examines model parameter estimation accuracy and proficiency level classification accuracy when the multiple-group structure of item response data is ignored.
The results show that the heterogeneity of the population distribution was the most influential factor affecting the accuracy of model parameter estimation and proficiency level classification. Electronic Board #13 Influential Factors on College Retention Based on Tree Models and Random Forests Chansoon Lee, Sonya Sedivy and James Wollack, University of Wisconsin-Madison The purpose of this study is to examine factors influencing college retention. Tree models and random forests will be applied to determine important factors in student retention and to improve the prediction of college retention. Electronic Board #14 Detecting Non-Effortful Responses to Short-Answer Items Ruth Childs, Gulam Khan and Amanda Brijmohan, Ontario Institute for Studies in Education, University of Toronto; Emily Brown, Sheridan College; Graham Orpwood, York University This study investigates the feasibility and effects of using the content of short-answer responses, in addition to response times, to improve the filtering of non-effortful responses from field test data and so improve item calibration. Electronic Board #15 Item Difficulty Modeling for an ELL Reading Comprehension Test Using LLTM Lingyun Gao, ACT, Inc.; Changjiang Wang, Pearson This study models cognitive complexity of the items included in a large-scale high-stakes reading comprehension test for English language learners (ELL), using the linear logistic test model (LLTM; Fischer, 2005). The findings will have implications for targeted test design and efficient item development. Electronic Board #16 The Effect of Unmotivated Test-Takers on Field Test Item Calibrations H. Jane Rogers and Hariharan Swaminathan, University of Connecticut A simulation study was conducted to investigate the effect of low motivation of test-takers on field-test item calibrations. Even small percentages of unmotivated test-takers resulted in substantial underestimation of discrimination parameters and overestimation of difficulty parameters. These calibration errors resulted in inaccurate estimation of trait parameters in a CAT administration. Electronic Board #17 Cognitive Analysis of Responses Scored Using a Learning Progression for Proportional Reasoning Edith Aurora Graf, ETS; Peter van Rijn, ETS Global Learning progressions are complex structures based on a synthesis of standards documents and research studies, and therefore require empirical verification. We describe a validity exercise in which we compare IRT-based classifications of students into the levels of a learning progression to classifications provided by a human rater. 109 2016 Annual Meeting & Training Sessions Electronic Board #18 Nonparametric Diagnostic Classification Analysis for Testlet-Based Tests Shuying Sha and Robert Henson, University of North Carolina at Greensboro This study investigates the impact of the testlet effect on the performance of parametric and nonparametric (Hamming distance method) diagnostic classification analysis. Results showed that the performance of both approaches deteriorated as the testlet effect size increased. Potential solutions to nonparametric classification for testlet-based tests are proposed. Electronic Board #19 An Application of Second-Order Growth Mixture Model for Educational Longitudinal Research Xin Li and Changhua Rich, ACT, Inc.; Hongyun Liu, Beijing Normal University Investigating change in individual achievement over time is of central importance in educational research.
The current study describes and illustrates the use of the second-order latent growth model and its extension to the growth mixture model, applied to real data, to help model growth while accounting for population heterogeneity. Electronic Board #20 Confirmatory Factor Analysis of TIMSS’ Mathematics Attitude Items with Recommendations for Change Thomas Hogan, University of Scranton This study reports results of confirmatory factor analysis for Trends in International Mathematics and Science Study (TIMSS) math attitude scales for national samples of students in the United States at grades 4 and 8. Recommendations are made for improvement of the scales, particularly for the Self-confidence latent variable. Electronic Board #21 Controlling for Multiplicity in Structural Equation Models Michael Zweifel and Weldon Smith, University of Nebraska-Lincoln When evaluating a structural equation model, several hypotheses are evaluated simultaneously, which increases the probability that a Type I error is committed. This proposal examined how several common multiple comparison procedures performed when the number of item response categories and the item variances were varied. Electronic Board #22 Alternative Approaches for Comparing Test Score Achievement Gap Trends Benjamin Shear, Stanford University; Yeow Meng Thum, Northwest Evaluation Association This paper compares trajectories of cross-sectional achievement gaps between subgroups to subgroup differences in longitudinal growth trajectories. The impact of vertical scaling assumptions is assessed with parallel analyses in an ordinal metric. We suggest ways to test inferences about closing gaps (“equalization”) across grades and cohorts, possibly for value-added analyses. 110 Washington, DC, USA Sunday, April 10, 2016 12:25 PM - 2:25 PM, Ballroom ABC, Level Three, Convention Center AERA Awards Luncheon AERA’s Awards Program is one of the most prominent ways for education researchers to recognize and honor the outstanding scholarship and service of their peers. Recipients of AERA awards are announced and recognized during the Annual Awards Luncheon. 111 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Renaissance East, Ballroom Level, Coordinated Session, G1 Challenges and Opportunities in the Interpretation of the Testing Standards Session Chair: Andrew Wiley, Alpine Testing Solutions, Inc. Session Discussant: Barbara Plake, University of Nebraska-Lincoln Across divisions of the professional assessment community, the Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014) and its requirements serve as the guiding principles for testing programs when determining procedures and policies. However, while the Standards do serve as the primary source for the assessment community, the interpretation of the Standards continues to be a somewhat subjective affair. Because validity is dependent on the context of each program, testing professionals are required to interpret and align the guidelines to prioritize and evaluate relevant evidence. For example, in some scenarios a term such as “representative” can be difficult to define, and reasonable people could interpret evidence with notably different expectations.
In practical terms, this can become problematic for the profession because if the Standards are not sufficiently clear for the purposes of interpretability and accountability within the profession, it creates more confusion when trying to communicate these expectations to policymakers and lay audiences. The purpose of this session is to focus on how assessment professionals use and interpret the Standards and the procedures that individuals and organizations use when applying them. Each of the four presenters will discuss the methods and procedures that their respective organizations have developed or how they have advised organizations they work with about interpreting and using the Standards to design or improve their programs. In addition, they will discuss the sections of the Standards that they have found to be particularly difficult to interpret, with recommendations about how additional interpretative guidance would make the Standards more effective to implement. The session will conclude with Dr. Barbara Plake serving as discussant. Dr. Plake is one of the leading voices on the value and importance of the Standards and will review each paper and share some of her experience in the use and interpretation of the Standards. Using the Testing Standards as the Basis for Developing a Validation Argument Wayne Camara, ACT Using the Standards to Support Assessment Quality Evaluation Erika Hall and Thanos Patelis, Center for Assessment Blurring the Lines Between Credentialing and Employment Testing Chad Buckendahl, Alpine Testing Solutions, Inc. Content Based Evidence and Test Score Validation Ellen Forte, edCount, LLC 112 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Renaissance West A, Ballroom Level, Coordinated Session, G2 Applications of Combinatorial Optimization in Educational Measurement Session Chairs: Wim van der Linden and Michelle Barrett, Pacific Metrics; Bernard Veldkamp, University of Twente; Dmitry Belov, Law School Admission Council Combinatorial optimization (CO) is concerned with searching for an element from a finite set (called a feasible set) that would optimize (minimize or maximize) a given objective function. Numerous practical problems can be formulated as CO problems, where a feasible set is not given explicitly but is represented implicitly by a list of inequalities and inclusions. Two unique features of CO problems should be mentioned: 1. In practice, a feasible set is so large that a straightforward approach to solving a corresponding CO problem by checking every element of the feasible set would take an astronomical amount of time. For example, in a traveling salesman problem (TSP) (given a list of n cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?), the corresponding feasible set contains (n–1)!/2 elements (routes). Thus, in the case of 25 cities there are 310,224,200,866,620,000,000,000 possible routes. Assuming that a computer can check each route in 1 microsecond (1/1,000,000 of a second), an optimal solution of the TSP with 25 cities would be found in about 9,837,144,878 years (a brief numerical check of this arithmetic appears in the sketch later in this session description). With respect to the size of a given CO problem (e.g., the number of cities, n, in the TSP), the time it takes to solve the problem therefore grows as an exponential function (e.g., c2^n or ce^n, where c is a constant), in contrast to a polynomial function (e.g., cn log n or cn^2). 2.
Often, a given CO problem can be reduced to another CO problem in polynomial time. Thus, if one CO problem can be solved efficiently (e.g., in polynomial time) then the whole class of CO problems can be solved efficiently as well. Fortunately, the modern CO literature provides methods that, during the search, allow us to identify and remove large portions of the feasible set that do not contain an optimal element. As a result, many real instances of CO problems can be solved in a reasonable amount of time. The most popular method is branch-and-bound (Papadimitriou & Steiglitz, 1982), which solves an instance of the TSP with 25 cities in less than one minute on a regular PC. The history of CO applications in educational measurement began in the early 1980s, when psychometricians started to use CO methods for automated test assembly (ATA). Theunissen (1985) reduced a special case of an ATA problem to a knapsack problem (Papadimitriou & Steiglitz, 1982). van der Linden and Boekkooi-Timminga (1989) formulated an ATA problem as a maximin problem. Later, Boekkooi-Timminga (1990) extended this approach to the assembly of multiple test forms with no common items. Soon after that, the ATA problem attracted many more researchers, whose major results are reviewed in van der Linden (2005). The first part of this coordinated session will introduce CO and then review its existing and potential future applications to educational measurement. More specifically, it will introduce mixed integer programming (MIP) modeling as a tool for finding solutions to CO problems, emphasizing such key notions as constraints, objective function, feasible and optimal feasible solutions, linear and nonlinear models, and heuristic and solver-based solutions. It will then review areas of educational measurement where CO has already provided or has the potential to provide optimal solutions to main problems, including areas such as optimal test assembly, automated test-form generation, item-pool design, adaptive testing, calibration sample design, controlling test speededness, parameter linking design, and test-based instructional assignment. The second part of this coordinated session will discuss three recent applications of CO in educational measurement. The first application relates to linking. For the common dichotomous and polytomous response models, linking response model parameters across test administrations that use separate item calibrations requires the use of common items and/or common examinees. Error in the estimated linking function parameters occurs as a result of propagation of estimation error in the response model parameters (van der Linden & Barrett, in press). When using a precision-weighted average approach to estimation of linking parameters, linking error appears to be additive in the contribution of each linking item. Therefore, minimizing linking error when selecting common items from the larger set of available items from the first test administration may be facilitated using CO. Three new MIP models used to optimize the selection of a set of linking items, subject to blueprint and practical test requirements, will be presented. Empirical results will demonstrate the use of the models. The second application is for ATA under uncertainty in item parameters. Commonly, in an ATA problem one assumes that item parameters are known precisely. However, they are always estimated from some dataset, which adds uncertainty into the corresponding CO problem.
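To make the notions of a feasible set, an objective function, and constraints concrete, the short Python sketch below is offered purely as an illustration (it is not part of the session materials): the first part re-checks the traveling-salesman arithmetic quoted in feature 1 above, and the second part solves a made-up, toy test-assembly problem by brute-force enumeration of the feasible set, the approach that stops scaling and motivates MIP solvers and branch-and-bound.

```python
import math
from itertools import combinations

# Part 1: check the traveling-salesman arithmetic from feature 1 above.
n_cities = 25
routes = math.factorial(n_cities - 1) // 2      # (n-1)!/2 distinct round trips
years = routes / 1_000_000 / (365 * 24 * 3600)  # 1 microsecond per route
print(f"{routes:,} routes; about {years:,.0f} years of brute-force checking")

# Part 2: a toy automated test assembly (ATA) problem solved by enumeration.
# The item pool, information values, and blueprint below are invented for illustration.
pool = [  # (item id, content area, item information at the cut score)
    (1, "algebra", 0.42), (2, "algebra", 0.35), (3, "algebra", 0.18),
    (4, "geometry", 0.50), (5, "geometry", 0.22), (6, "geometry", 0.31),
    (7, "data", 0.27), (8, "data", 0.44), (9, "data", 0.12), (10, "data", 0.38),
]
test_length = 5
areas = ("algebra", "geometry", "data")

best_form, best_info = None, -1.0
for form in combinations(pool, test_length):      # enumerate the feasible set
    if any(sum(a == area for _, a, _ in form) < 1 for area in areas):
        continue                                  # blueprint constraint: >= 1 item per area
    info = sum(v for _, _, v in form)             # objective function: total information
    if info > best_info:
        best_form, best_info = form, info
print("best form:", [i for i, _, _ in best_form], "information:", round(best_info, 2))
# Enumeration is fine for C(10, 5) = 252 candidate forms; operational pools with
# hundreds of items and many constraints are why MIP solvers and branch-and-bound
# are used instead.
```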
Several optimization strategies dealing with uncertainty in the objective function and/or constraints of a CO problem have been developed in the literature. This presentation will focus on robust and stochastic optimization strategies, which will be applied to both linear and adaptive test assembly. The impact of the uncertainty on the ATA process will be studied, and practical recommendations to minimize the impact will be provided. The third application relates to two important topics in test security: detection of item preknowledge and detection of aberrant answer changes (ACs). Item preknowledge describes a situation in which a group of examinees (called aberrant examinees) have had access to some items (called compromised items) from an administered test prior to the exam. Item preknowledge negatively affects both the corresponding testing program and its users (e.g., universities, companies, government organizations) because scores for aberrant examinees are invalid. In general, item preknowledge is difficult to detect due to three unknowns: (i) unknown subgroups of examinees at (ii) unknown test centers who (iii) had access to unknown subsets of compromised items prior to taking the test. To resolve the issue of multiple unknowns, two CO methods are applied. First, a random search detects suspicious test centers and suspicious subgroups of examinees. Second, given suspicious subgroups of examinees, simulated annealing identifies compromised items. Advantages and limitations of the methods will be demonstrated using both simulated and real data. The statistical analysis of ACs has uncovered multiple testing irregularities on large-scale assessments. However, existing statistics capitalize on the uncertainty in AC data, which may result in a large Type I error. Without loss of generality, for each examinee, two disjoint subsets of administered items are introduced: the first subset has items with ACs; the second subset has items without ACs, assembled by CO methods to minimize the distance between its characteristic curve and the characteristic curve of the first subset. A new statistic measures the difference in performance between these two subsets, where, to avoid the uncertainty, only final responses are used. In computer simulations, the new statistic demonstrated strong robustness to the uncertainty and higher detection rates in contrast to two popular statistics based on wrong-to-right ACs. 114 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Renaissance West B, Ballroom Level, Paper Session, G3 Psychometrics of Teacher Ratings Session Discussant: Tia Sukin, Pacific Metrics Psychometric Characteristics and Item Category Maps for a Student Evaluation of Teaching Patrick Meyer, Justin Doromal, Xiaoxin Wei and Shi Zhu, University of Virginia We describe psychometric characteristics of a student evaluation of teaching with four dimensions: Organization, Assessment, Interactions, and Rigor. Using data from 430 students and 65 university classrooms, we implemented an IRT-based approach to maximum information item category mapping to facilitate score interpretation and multilevel models to evaluate threats to validity. Psychometric Stability of Tripod Student Perception Surveys with Reduced Data Catherine McClellan, Clowder Consulting; John Donoghue, Educational Testing Service Student perception surveys such as Tripod™ are becoming more commonly used as part of PK-12 classroom teacher evaluations.
The loss of classroom time to survey administration remains a concern for teachers. This study examines the impact of various data reduction approaches on survey results. Does the ‘type’ of Rater Matter When Evaluating Special Education Teachers? Janelle Lawson, San Francisco State University; Carrie Semmelroth, Boise State University This study examined how school administrators without any formal experience in special education performed using the Recognizing Effective Special Education Teachers (RESET) Observation Tool compared with previous reliability studies that used experienced special education teachers as raters. Preliminary findings indicate that ‘type’ of rater matters when evaluating special education teachers. Measuring Score Consistency Between Teacher and Reader Scored Grades Yang Zhao, University of Kansas; Jonathan Rollins, University of North Carolina; Deanna Morgan and Priyank Patel, The College Board The purpose of this paper is to evaluate score consistency between teachers and readers. Measures such as the Pearson correlation, Root Mean Square Error, Mean Absolute Error, Root Mean Square Error in agreement, and the Concordance Correlation Coefficient in agreement are calculated. 115 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 3, Meeting Room Level, Paper Session, G4 Multidimensionality Session Discussant: Mark Reckase, Michigan State University An Index for Characterizing Construct Shift in Vertical Scales Jonathan Weeks, ETS The purpose of this study is to define an index that characterizes the amount of construct shift associated with a “unidimensional” vertical scale when the underlying data are multidimensional. The method is applied to large-scale math and reading assessments. Multidimensional Test Assembly of Parallel Test Forms Using a Kullback-Leibler Information Index Dries Debeer, University of Leuven; Usama Ali, Educational Testing Company; Peter van Rijn, ETS Global The statistical targets commonly used for the assembly of parallel test forms in unidimensional IRT are not directly transferable to multidimensional IRT. To fill this gap, a Kullback-Leibler-based information index (KLI) is proposed. The KLI is discussed and evaluated in the uni- and the multidimensional case. Evaluating the Use of Unidimensional IRT Procedures for Multidimensional Data Wei Wang, Chi-Wen Liao and Peng Lin, Educational Testing Service This study intends to investigate the feasibility of applying unidimensional IRT procedures (including item calibration and equating) for multidimensional data. Both simulated data and operational data will be used. The results will provide suggestions about under which conditions it is appropriate to use unidimensional IRT procedures to analyze multidimensional data. Classification Consistency and Accuracy Indices for Multidimensional Item Response Theory Wenyi Wang, Lihong Song and Shuliang Ding, Jiangxi Normal University; Hua-Hua Chang, University of Illinois at Urbana-Champaign For criterion-referenced tests, classification consistency and accuracy are important indicators to evaluate the reliability and validity of classification results. The purpose of this study is to explore these indices for complex decision rules under multidimensional item response theory. The results would be valuable for score interpretation and computerized classification testing.
116 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 4, Meeting Room Level, Paper Session, G5 Validating “Noncognitive”/Nontraditional Constructs I Session Discussant: William Lorié, Center for NextGen Learning & Assessment, Pearson Improving the NAEP SES Measure: Can NAEP Learn from Other Survey Programs? Young Yee Kim and Jonathan Phelan, American Institutes for Research; Jing Chen, National Center for Education Statistics; Grace Ji, Avar Consulting, Inc. This study is designed as part of NCES’s efforts to improve the NAEP SES measure. Based on findings from an extensive review of survey programs within and outside NCES and a review of the literature, suggestions are made to help NCES report a new SES measure in 2017. Investigating SES Using the NAEP-HSLS Overlap Sample Burhan Ogut, George Bohrnstedt and Markus Broer, American Institutes for Research This study examines the relationships among the three main SES components (parental education, occupational status and income) based on parent-reports on the one hand, and student-reports of SES proxy variables (parents’ education, household possessions, and NSLP eligibility) on the other hand, using multiple-indicators and multiple-causes models and seemingly unrelated regressions. Rethinking the Measurement of Noncognitive Attributes Andrew Maul, University of California, Santa Barbara The quality of “noncognitive” measurement lags behind the quality of measurement in traditional academic realms. This project identifies a potentially serious gap in the validity argument for a prominent measure of growth mindsets. New approaches to the measurement of growth mindsets are piloted and exemplified. Validating Relationships Among Mathematics-Related Self Efficacy, Self Concept, Anxiety and Achievement Measures Madhabi Chatterji and Meiko Lin, Teachers College, Columbia University In this construct validation study, we use structural equation modeling to validate theoretically specified pathways and correlations of mathematics-related self-efficacy, self-concept, and anxiety with math achievement scores. Results are consistent with past research with older students, and carry implications for research, policy and classroom practice. 117 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 5, Meeting Room Level, Paper Session, G6 Invariance Session Discussant: Ha Phan, Pearson The Impact of Measurement Noninvariance in Longitudinal Item Response Modeling In-Hee Choi, University of California, Berkeley This study investigates the impact of measurement noninvariance across time and group in longitudinal item response modeling, when researchers examine group differences in growth. First, measurement noninvariance is estimated from a large-scale longitudinal survey. These results are then used for a simulation study with different sample sizes. Measurement Invariance in International Large-Scale Assessments: Ordered-Categorical Outcomes in a Multidimensional Context Dubravka Svetina, Indiana University; Leslie Rutkowski, University of Oslo A critical precursor to comparing means on latent variables across cultures is that the measures are invariant across groups. A lack of consensus on cut-off values for evaluating model fit in the literature motivates this study, in which we consider the performance of fit measures when data are modeled as multidimensional, ordered-categorical.
Assessing Uniform Measurement Invariance Using Multilevel Latent Modeling Carrie Morris, University of Iowa College of Education; Xin Li, ACT This simulation study investigated the use of multilevel MIMIC and mixture models for assessing uniform measurement invariance. A multilevel model was generated with measurement error, and measurement and factorial noninvariances were imposed. Model fit, parameter and standard error bias, and power to detect noninvariance were assessed for all estimated models. Population Invariance of Equating Functions Across Subpopulations for a Large Scale Assessment Lucy Amati and Alina von Davier, Educational Testing Service In this study, we examine the population invariance assumption for a large-scale assessment. Results of the analysis demonstrated that the equating functions for subpopulations are very close to that of the total population. Results supported the invariance assumption of the equating function, helping to demonstrate the fairness of the test. 118 Washington, DC, USA Sunday, April 10, 2016 2:45 PM - 4:15 PM, Meeting Room 15, Meeting Room Level, Paper Session, G7 Detecting Aberrant Response Behaviors Session Discussant: John Donoghue, ETS Methods That Incorporate Response Times and Responses for Excluding Data Irregularities Heru Widiatmo, ACT, Inc. Two methods, which use both responses and response times for excluding data irregularities, are combined and compared to find an optimal method. The methods are Response Time Effort (RTE) and Effective Response Time (ERT). The 3-PL IRT model is used to calibrate data and to evaluate the results. Online Detection of Compromised Items with Response Times in CAT Hyeon-Ah Kang, University of Illinois at Urbana-Champaign An online-calibration-based CUSUM procedure is proposed to detect compromised items in CAT. The procedure utilizes both observed item responses and response times for evaluating changes in item parameter estimates that are obtained on-the-fly during the CAT administrations. Detecting Examinee Preknowledge of Items: A Comparison of Methods Xi Wang, University of Massachusetts Amherst; Frederic Robin, Hongwen Guo and Neil Dorans, Educational Testing Service; Yang Liu, University of California, Merced In a continuous testing program, examinees are likely to have preknowledge of some items due to the repeated use of items over time. In this study, two methods are proposed to detect item preknowledge at the person level, and their effectiveness is compared in a multistage adaptive testing context. Development of an R Package for Statistical Analysis in Test Security Jiyoon Park, Yu Zhang and Lorin Mueller, Federation of State Boards of Physical Therapy Statistical analysis of test results is the approach most widely employed by test sponsors. Different statistical methods can be used to capture the signs of security breaches and to evaluate the validity of test scores. We propose an R package that provides systematic and comprehensive analyses for test security.
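As a point of reference for the terminology in the first paper of this session, response time effort (RTE) is conventionally computed as the proportion of an examinee's responses whose response times exceed item-specific rapid-guessing thresholds, with faster responses treated as non-effortful. The Python sketch below only illustrates that general idea with invented thresholds and times; it is not code from any of these studies.

```python
def response_time_effort(response_times, thresholds):
    """Proportion of responses whose time exceeds the item's rapid-guessing
    threshold; values near 1.0 suggest effortful (solution) behavior."""
    flags = [rt >= th for rt, th in zip(response_times, thresholds)]
    return sum(flags) / len(flags)

# Invented example: item thresholds (in seconds) and one examinee's response times.
thresholds = [5, 8, 6, 10, 7]
times = [12, 2, 9, 15, 3]          # items 2 and 5 look like rapid guesses
print(response_time_effort(times, thresholds))  # 0.6
```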
119 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 2:45 PM - 4:15 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session: GSIC Graduate Student Poster Session, G8 Graduate Student Issues Committee Brian Leventhal, Chair; Masha Bertling, Laine Bradshaw, Lisa Beymer, Evelyn Johnson, Ricardo Neito, Ray Reichenberg, Latisha Sternod, Dubravka Svetina Electronic Board #1 Examining Test Irregularities Using a Multidimensional Scaling Approach Qing Xie, ACT/The University of Iowa The purpose of this simulation study is to explore the possibility of using multidimensional scaling in detecting test irregularities via the concept of consistency of a battery or test structure. The results will provide insights on how well this method can be applied in different test irregularity situations. Electronic Board #2 The Influence of Measurement Invariance in the Two-Wave, Longitudinal Mediation Model Oscar Gonzalez, Arizona State University Statistical mediation describes how two variables are related by examining intermediate mechanisms. The mediation model assumes an underlying longitudinal design and that the same constructs are measured over time. This study examines what happens to the mediated effect when longitudinal measurement invariance is violated in a two-wave mediation model. Electronic Board #3 Parallel Analysis of Unidimensionality with PCA and PAF in Dichotomously Scored Data Ismail Cukadar, Florida State University This Monte Carlo study investigates the impact of using two different factor extraction methods (principal component analysis and principal axis factoring) in the Kaiser rule and parallel analysis on decisions about unidimensionality in binary data with examinee guessing. Electronic Board #4 Reducing Data Demands of Using a Multidimensional Unfolding IRT Model Elizabeth Williams, Georgia Institute of Technology A simulation study will be performed to investigate using a multidimensional scaling (MDS) solution in conjunction with the Multidimensional Generalized Graded Unfolding Model (MGGUM) to reduce data demands. The expected results are that the data demands will be reduced without sacrificing the quality of true parameter recovery. Electronic Board #5 Challenging Conditions for MML and MH-RM Estimation of Multidimensional IRT Models Derek Sauder, James Madison University The MH-RM estimator is faster than the MML estimator, and generally gives comparable parameter estimates. In one real dataset, the two procedures estimated similar item parameter values but different correlations between the subscales. A simulation will be conducted to examine which factors might lead to discrepancies between the estimators. 120 Washington, DC, USA Electronic Board #6 The Effects of Dimensionality and Dimensional Structure on Composite Scores and Subscores Unhee Ju, Michigan State University Both composite scores and subscores can provide diagnostic information about students’ specific progress. A simulation study was conducted to examine the performance of composite scores and subscores under different conditions of the number of dimensions, dimensional structure, and correlation between dimensions. Their implications will be discussed in the presentation. Electronic Board #7 Simple Structure MIRT True Score Equating for Mixed-Format Tests Stella Kim, The University of Iowa This study proposes an SS-MIRT true-score equating procedure for mixed-format tests and investigates its performance based on the results from real data analyses and a simulation study.
Electronic Board #8 Conditions of Evaluating Models with Approximate Measurement Invariance Using Bayesian Estimation Ya Zhang, University of Pittsburgh A simulation study is performed to investigate approximate measurement invariance (MI) through Bayesian estimation. The size of differences in item intercepts, the proportion of items with differences, and the level of prior variability are manipulated. The study findings provide a general guideline for the use of approximate MI. Electronic Board #9 Detecting Nonlinear Item Position Effects with a Multilevel Model Logan Rome, University of Wisconsin-Milwaukee When tests utilize a design in which items appear in different orders in various booklets, the item position can impact item responses. This simulation study will examine the performance of a multilevel model in detecting several functions and sizes of non-linear item-specific position effects. Electronic Board #10 Comparison of Scoring Methods for Different Item Types Hongyu Diao, University of Massachusetts-Amherst This study will use a Monte Carlo simulation method to investigate the impact of concurrent calibration and separate calibration for mixed-format tests. The response data of Multiple Choice and Technology-Enhanced Items are simulated to represent two different dimensions. Electronic Board #11 IRT Approach to Estimate Reliability of Testlet with Balanced and Unbalanced Data Nana Kim, Yonsei University This study aims to investigate the effects of balanced and unbalanced data structures on the reliability estimates of testlet-based tests when applying item response theory (IRT) approaches using simulated data sets. We focus on the relationship between patterns of reliability estimates and the degree of imbalance in data structure. Electronic Board #12 Hierarchical Bayesian Modeling for Peer Assessment in a Massive Open Online Course Yao Xiong, The Pennsylvania State University Peer assessment has been widely used in most of the massive open online courses (MOOCs) to provide feedback for constructed-response questions. However, peer rater accuracy and reliability are a major concern. The current study proposes a hierarchical Bayesian approach to account for rater accuracy and reliability. Electronic Board #13 The Impact of Model Misspecification in the DCM-CAT Yu Bao, The University of Georgia Item parameters are usually assumed to be known in DCM-CAT simulations. When the assumption is violated, model misspecification may lead to different item information and posterior distribution, which are essential for item selection. The study shows how misfitting DCMs and overfitting DCMs will influence item bank usage and classification accuracy. Electronic Board #14 Interval Estimation of IRT Proficiency in Mixed-Format Tests Shichao Wang, The University of Iowa Interval estimation of proficiency can help to clearly present information to test users on how to interpret the uncertainty in their scores. This study intends to compare the performance of analytical and empirical approaches in constructing an interval for IRT-based proficiency for mixed-format tests using simulation techniques. Electronic Board #15 Analysis of Item Difficulty Predictors for Item Pool Development Feng Chen, The University of Kansas A systematic item difficulty prediction approach is introduced that accounts for all possible item features. The effect of these features on resulting item parameters is demonstrated using simulated and real data.
Results will provide statistical and evidentiary implications for item pool development and test construction. Electronic Board #16 Regressing Multiple Predictors into a Cognitive Diagnostic Model Kuan Xing, University of Illinois at Chicago This study investigates the stability of parameter estimates and classification when multiple covariates of different types are analyzed in the RDINA and HO-DINA models. Real-world (TIMSS) data analyses and a simulation study were conducted. The educational significance of examining the relationship between covariates and the CDM is discussed. Electronic Board #17 Non-Instructional Factors That Affect Student Mathematics Performance Michelle Boyer, University of Massachusetts, Amherst The effects of non-instructional factors on educational success are increasingly important for educational authorities to understand as they seek to improve student outcomes. This study evaluates a large number of such factors and their effects on mathematics performance for a large US nationally representative sample of students. Electronic Board #18 A Procedure to Improve Item Parameter Estimation in the Presence of Test Speededness Can Shao, University of Notre Dame In this study, we propose to use a data cleansing procedure based on change-point analysis to improve item parameter estimation in the presence of test speededness. Simulation results show that this procedure can dramatically reduce the bias and root mean square error of the item parameter estimates. Electronic Board #19 Simulation Study of Estimation Methods in Multidimensional Student Response Data Philip Grosse, University of Pittsburgh The purpose of this simulation study is to provide a comparison of WLSMV and BAYES estimators in a bifactor model based on simulated multidimensional student responses. The estimation methods are compared in terms of their item parameter recovery and ability estimation. Electronic Board #20 Detecting Testlet Effect Using Graph Theory Xin Luo, Michigan State University Testlet effects have a significant influence on measurement accuracy and test validity. This study proposed a new approach based on graph theory to detect testlet effects. Results of a simulation study supported the quality of this method. Electronic Board #21 Assessing Item Response Theory Dimensionality Assumptions Using DIMTEST and NOHARM-Based Methods Kirsten Hochstedt, Penn State University This study examined how select IRT dimensionality assessment methods performed for two- and three-parameter logistic models with combinations of short test lengths, small sample sizes, and ability distribution shapes (skewness, kurtosis). The capability of DIMTEST and three NOHARM-based methods to detect dimensionality assumption violations in simulated data was compared. Electronic Board #22 Evaluating the Invariance Property in IRT: A Case of Multi-State Assessment Seunghee Chung, Rutgers University This simulation study investigates how well the invariance property of IRT item parameters holds in a multi-state assessment situation, especially when the characteristics of member states are dissimilar to one another. Practical implications for multi-state assessment development are discussed to avoid potential measurement bias caused by a lack of invariance.
Electronic Board #23 Evaluating Predictive Accuracy of Alternative IRT Models and Scoring Methods Charles Iaconangelo, Rutgers University, The State University of New Jersey This paper uses longitudinal data from a large urban school system to evaluate different item response theory models and scoring methods for their value in predicting future test scores. It finds that both richer IRT models, and scoring methods based on response patterns rather than number correct, improve predictive accuracy. 123 2016 Annual Meeting & Training Sessions Electronic Board #24 A Comparison of Estimation Methods for the Multi-Unidimensional Three-Parameter IRT Model Tzu Chun Kuo, Southern Illinois University Carbondale Two marginal maximum likelihood (MML) approaches, three fully Bayesian algorithms, and a Metropolis-Hastings Robbins-Monro (MHRM) algorithm were compared for estimating multi-unidimensional three-parameter models using simulations. Preliminary results suggested that the two MML approaches, together with blocked Metropolis and MHRM, had overall better parameter recovery than the other estimation methods. Electronic Board #25 A Methodology for Item Condensation Rule Identification in Cognitive Diagnostic Models Diego Luna Bazaldua, Teachers College, Columbia University A methodology within a Bayesian framework is employed to identify the item condensation rules for cognitive diagnostic models (CDMs). Simulated and empirical data are used to analyze the ability of the methodology to detect the correct condensation rules for different CDMs. 124 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 5:50 PM, Ballroom C, Level Three, Convention Center AERA Presidential Address Public Scholarship to Educate Diverse Democracies Jeannie Oakes, AERA President; University of California - Los Angeles 125 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 4:35 PM - 6:05 PM, Renaissance East, Ballroom Level, Coordinated Session, H1 Advances in Balanced Assessment Systems: Conceptual Framework, Informational Analysis, Application to Accountability Session Chair: Scott Marion, National Center for the Improvement of Educational Assessment Session Discussant: Lorrie Shepard, University of Colorado, Boulder For more than a decade, there have been calls for multiple assessments to be designed and used in more integrated ways—for “balanced” or “comprehensive” assessment systems. However, there has been little focused work on clearly defining what is meant by a balanced assessment system as well as the characteristics that contribute to the quality of such assessment systems. Importantly, there have been scant analyses of such systems and in particular how instructional and accountability demands might both be addressed. This coordinated session presents advances in conceptualizing and analyzing balanced assessment systems. The session begins with an overview of the need for considering the quality of balanced assessment systems, with an emphasis on validity and usefulness. The second presentation focuses on conceptualizing the systems aspects of a balanced assessment system—what characterizes a system that goes beyond good individual assessments? The third presentation brings together two approaches, content-based alignment judgments and scale-based interpretations, to obtain content-referenced information from assessments to support instruction and learning. These approaches are based on the actual information available and the interpretations supported.
The fourth presentation presents a technical analysis of comparability in a balanced assessment system in the context of school accountability. Balanced Assessment Systems: Overview and Context Brian Gong and Scott Marion, National Center for the Improvement of Educational Assessment Systemic Aspects of Balanced Assessment Systems Rajendra Chattergoon, University of Colorado, Boulder Validity and Utility in a Balanced Assessment System: Use, Information, and Timing Phonraphee Thummaphan, University of Washington, Seattle; Nathan Dadey, Center for Assessment Comparability in Balanced Assessment Systems for State Accountability Carla Evans, University of New Hampshire; Susan Lyons, Center for Assessment 126 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 6:05 PM, Renaissance West A, Ballroom Level, Coordinated Session, H2 Minimizing Uncertainty: Effectively Communicating Results from CDM-Based Assessments Session Discussant: Jacqueline Leighton, University of Alberta Fueled by needs for educational tests that provide diagnostic feedback, researchers have made recent progress in designing statistical models that are well-suited to categorize examinees according to mastery levels for a set of latent skills or abilities. Cognitive diagnosis models (CDMs) yield probabilistic classifications of students according to multiple facets, termed attributes, of knowledge or reasoning. These results have the potential to inform instructional decision-making and learning, but in order to do so the results must be comprehensible to a variety of education stakeholders. This session will include four papers on CDMs and communicating CDM-based results. Laine Bradshaw and Roy Levy outline the challenges of reporting results from CDMs and provide context for subsequent papers. Tasmin Dhaliwal, Tracey Hembry and Laine Bradshaw provide empirical evidence of teacher interpretation and preference for viewing mastery probabilities and classification results, in an online reporting environment. Kristen DiCerbo and Jennifer Kobrin share findings on how to present learning progression-based assessment results to teachers to support their instructional decision-making. Valerie Shute and Diego Zapata-Rivera model (using Bayes nets) and visualize students’ beliefs in flexible belief networks. Interpreting Examinee Results from Classification-Based Models Laine Bradshaw (2015 Jason Millman Promising Measurement Scholar Award Winner), University of Georgia; Roy Levy, Arizona State University Achieving the Promise of CDMs: Communicating CDM-Based Assessment Results Tasmin Dhaliwal, Pearson; Tracey Hembry, Alpine Testing Solutions; Laine Bradshaw, University of Georgia Communicating Assessment Results Based on Learning Progressions Kristen DiCerbo and Jennifer Kobrin, Pearson Representing and Visualizing Beliefs Valerie Shute, Florida State University; Diego Zapata-Rivera, Educational Testing Service 127 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 16, Meeting Room Level, Coordinated Session, H3 Overhauling the SAT: Using and Interpreting Redesigned SAT Scores Session Chair: Maureen Ewing, College Board Session Discussant: Suzanne Lane, University of Pittsburgh In February of 2013, the College Board announced it would undertake a major redesign of the SAT® with the intent of making the test more transparent and useful. The redesigned test will assess skills, knowledge, and understandings that matter most for college and career readiness. 
Only relevant vocabulary (as opposed to the sometimes criticized obscure vocabulary measured today) will be assessed. The Math section will be focused on a smaller number of content areas. The essay will be optional. There will be a switch to rights-only scoring. The total score scale will revert to the original 400 to 1600, and there will be several cross-test scores and subscores. At the same time, scores on the redesigned assessment are expected to continue to meaningfully predict success in college and serve as a reliable indicator of college and career readiness. Throughout the redesign effort, many important research questions emerged, such as: (1) How can we be sure the content on the redesigned test measures what is most important for college and career readiness? (2) How can we develop concordance tables to relate scores on the redesigned assessment to current scores? (3) How do we define and measure college and career readiness? (4) How well can we expect scores on the redesigned assessment to predict first-year college grades? The purpose of this session is to describe the research the College Board has done to support the launch of the redesigned SAT. The session will begin with a brief overview of the changes to the SAT with a focus on how these changes are intended to make the test more transparent and useful. Four papers will follow that describe more specifically the test design and content validity argument for the new test, the development and practical implications of producing and delivering concordance tables, the methodology used to develop and validate college and career readiness benchmarks for the new test and, lastly, early results about the relationship between scores on the redesigned SAT and college grades gathered from a special, non-operational study. The discussant, Suzanne Lane, who is a nationally renowned expert on assessment design and validity research, will offer constructive comments on the fundamental ideas, approaches, and designs undergirding the research presentations. An Overview of the Redesigned SAT Jack Buckley, College Board The Redesigned SAT: Content Validity and Assessment Design Sherral Miller and Jay Happel, College Board Producing Concordance Tables for the Transition to the Redesigned SAT Pamela Kaliski, Rosemary Reshetar, Tim Moses, Hui Deng and Anita Rawls, College Board College and Career Readiness and the Redesigned SAT Benchmarks Jeff Wyatt and Kara Smith, College Board A First Look at the Predictive Validity of the Redesigned SAT Emily Shaw, Jessica Marini, Jonathan Beard and Doron Shmueli, College Board 128 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, H4 Quality Assurance Methods for Operational Automated Scoring of Essays and Speech Session Discussant: Vincent Kieftenbeld, Pacific Metrics The quality of current automated scoring systems is increasingly comparable with, or even surpasses, that of trained human raters. Ensuring score validity in automated scoring, however, requires sophisticated quality assurance methods both during the design and training of automated scoring models and during operational automated scoring. The four studies in this coordinated session present novel quality assurance methods for use in operational automated scoring of essay and speech responses. A common theme unifying these studies is the development of techniques to screen responses during operational scoring.
A wide variety of methods is used, ranging from ensemble learning and outlier detection to information retrieval and natural language processing and identification. This session complements the session Challenges and solutions in the operational use of automated scoring systems, which focuses on quality assurance during the design and training phases of automated scoring. Statistical High-Dimensional Outlier Detection Methods to Identify Abnormal Responses in Automated Scoring Raghuveer Kanneganti, Data Recognition Corporation CTB; Luyao Peng, University of California, Riverside Does Automated Speaking Response Scoring Favor Speakers of Certain First Language? Guangming Ling and Su-Youn Yoon, Educational Testing Service Feature Development for Scoring Source-Based Essays Claudia Leacock, McGraw-Hill Education CTB; Raghuveer Kanneganti, Data Recognition Corporation CTB Non-Scorable Spoken Response Detection Using NLP and Speech Processing Techniques Su-Youn Yoon, Educational Testing Service 129 2016 Annual Meeting & Training Sessions Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 4, Meeting Room Level, Paper Session, H5 Student Growth Percentiles Session Discussant: Damian Betebenner, Center for Assessment The Accuracy and Fairness of Aggregate Student Growth Percentiles as Indicators of Educator Performance Jason Millman Promising Measurement Scholar Award Winner 2016: Katherine Furgol Castellano, Daniel McCaffrey and J.R. Lockwood, ETS Aggregated SGP (AGP), the mean/median SGP for students linked to the same teacher/school, are a popular alternative to VAM-based measures of educator performance. However, we demonstrate that test score measurement error affects the accuracy and precision of typically used AGP. We also contrast standard AGP against several alternative AGP estimators. Cluster Growth Percentiles: An Alternative to Aggregated Student Growth Percentiles Scott Monroe, UMass Amherst; Li Cai, CRESST/UCLA Aggregates of Student Growth Percentiles (Betebenner, 2009) are used by numerous states for purposes of teacher evaluation. In this research, we propose an alternative statistic, a Cluster Growth Percentile, defined directly at the group or cluster level. The two approaches are compared, and simulated and empirical examples are provided. Evaluating Student Growth Percentiles: Perspective of Test-Retest Reliability Johnny Denbleyker, Houghton Mifflin Harcourt; Ye Lin, University of Iowa This study examines SGP calculations and corresponding NCEs where multiple test opportunities existed within the accountability testing window for an NCLB mathematics assessment. This allowed aspects of reliability to be assessed in a practical test-retest manner while accounting for measurement error associated with both sampling of items and occasions. 130 Washington, DC, USA Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, H6 Equating: From Theory to Practice Session Discussant: Ye Tong, Pearson Similarities Between Equating Equivalents Using Presmoothing and Postsmoothing Hyung Jin Kim and Robert Brennan, The University of Iowa Presmoothing and postsmoothing improve equating by reducing sampling error. However, little research has been conducted about similarities in equated equivalents between presmoothing and postsmoothing. This study examines how equated equivalents differ between presmoothing and postsmoothing for different smoothing degrees, and investigates the presmoothing degrees that give results similar to those from a specific postsmoothing degree.
Stability of IRT Calibration Methods for the Common-Item Nonequivalent Groups Equating Design
Yujin Kang and Won-Chan Lee, University of Iowa
The purpose of this study is to investigate accumulated equating error of item response theory (IRT) calibration methods in the common-item nonequivalent groups (CINEG) design. The factors of investigation are calibration methods, equating methods, types of change in the ability distribution, common-item compositions, and computer software for calibration.

Subscore Equating and Reporting
Euijin Lim and Won-Chan Lee, The University of Iowa
The purpose of this study is to address the necessity of subscore equating in terms of score profiles using real data sets and to discuss practical issues related thereto. Also, the performance of several equating methods for subscores is compared under various conditions using simulation techniques.

On the Effect of Varying Difficulty of Anchor Tests on Equating Accuracy
Irina Grabovsky and Daniel Julrich, NBME
This study investigates the question of the optimal location of an anchor test for equating minimum competency examinations. For examinations where the means of the distributions of examinee abilities and item difficulties are a distance apart, placement of an anchor test based on proximity to the examinee ability mean results in a more accurate equating procedure.

Sunday, April 10, 2016 4:35 PM - 6:05 PM, Meeting Room 15, Meeting Room Level, Paper Session, H7
Issues in Ability Estimation and Scoring
Session Discussant: Peter van Rijn

Practical and Policy Impacts of Ignoring Nested Data Structures on Ability Estimation
Kevin Shropshire, Virginia Tech; Yasuo Miyazaki, Virginia Tech
Consistent with the literature, the standard errors corresponding to item difficulty parameters are underestimated when clustering is part of the design but ignored in the estimation process. This research extends the focus to the impact of design clustering on ability estimation in IRT models for psychometricians and policy makers.

MIRT Ability Estimation: Effects of Ignoring the Partially Compensatory Nature
Janine Buchholz and Johannes Hartig, German Institute for International Educational Research (DIPF); Joseph Rios, Educational Testing Service (ETS)
The MIRT model most commonly employed to estimate within-item multidimensionality is compensatory. However, numerous examples in educational testing suggest partially compensatory relations among dimensions. We therefore investigated conditional bias in theta estimates when incorrectly applying the compensatory model. Findings demonstrate systematic underestimation for examinees highly proficient in one dimension.

Interval Estimation of Scale Scores in Item Response Theory
Yang Liu, University of California, Merced; Ji Seung Yang, University of Maryland, College Park
In finite samples, the uncertainty arising from item parameter estimation is often non-negligible and must be accounted for when calculating latent variable scores. Various Bayesian, fiducial, and frequentist interval estimators are harmonized under the framework of consistent predictive inference, and their performances are evaluated via Monte Carlo simulations.
Applying the Hajek Approach in the Delta Method of Variance Estimation
Jiahe Qian, Educational Testing Service
The variance formula derived by the delta method for a two-stage sampling design employs the joint inclusion probabilities in the first-stage selection of schools. The inquiry aims to apply the Hajek approximation to estimate the joint probabilities, which are often unavailable in analysis. The application is illustrated with real and simulation data.

2016 Bradley Hanson Award for Contributions to Educational Measurement: Sun-Joo Cho

Sunday, April 10, 2016 4:35 PM - 6:05 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, H8

Electronic Board #1
Asymmetric ICCs as an Alternative Approach to Accommodate Guessing Effects
Sora Lee and Daniel Bolt, University of Wisconsin, Madison
Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects are well documented (Han, 2012). We consider the use of a residual heteroscedasticity model (Molenaar, 2014) as an alternative, and compare its performance to the 3PL with real test datasets and through simulation analyses.

Electronic Board #2
Software Note for PARSCALE
Ying Lu, John Donoghue and Hanwook Yoo, Educational Testing Service
PARSCALE is one of the most popular commercial software packages for IRT calibration. PARSCALE users, however, should be aware of the issues associated with the software to ensure the quality of IRT calibration results. The purpose of this paper is to summarize these issues and to suggest solutions.

Electronic Board #3
Stochastic Approximation EM for Exploratory Item Factor Analysis
Eugene Geis and Greg Camilli, Rutgers Graduate School of Education
We present an item parameter estimation method combining stochastic approximation and Gibbs sampling for exploratory multivariate IRT analyses. It is characterized by drawing a missing random variable, updating post-burn-in sufficient statistics of missing data using the Robbins-Monro procedure, estimating factor loadings using a novel approach, and drawing samples of latent ability.

Electronic Board #4
Reporting Student Growth Percentiles: A Novel Tool for Displaying Growth
David Swift and Sid Sharairi, Houghton Mifflin Harcourt
The increased use of growth models has created a need for tools that help policy makers with growth decisions and inform stakeholders. The data tool presented meets this need through a feature-rich, user-friendly application that puts the policy maker in control.

Electronic Board #5
The Impact of Plausible Values When Used Incorrectly
Kyung Sun Chung, Pennsylvania State University
This study examined the effect of plausible values when used incorrectly, such as using only one of the five values provided or using the average of the five plausible values. Two previously published studies are replicated for practical relevance. The results show that appropriate use of plausible values is recommended for unbiased estimates.

Electronic Board #6
Missing Data – on How to Avoid Omitted and Not-Reached Items
Miriam Hacker, Frank Goldhammer and Ulf Kröhne, German Institute for International Educational Research (DIPF)
The problem of missing data is common in almost all measurements. In this study, the occurrence of missing data is examined, along with how to avoid it by presenting more time information at the item level. Results indicate that time information can reduce missing responses without affecting performance.
Electronic Board #7
Challenging Measurement in the Field of Multicultural Education: Validating a New Scale
Jessie Montana Cain, University of North Carolina at Chapel Hill
Measurement in the field of multicultural education has been scarce. In this study the psychometric properties of the newly developed Multicultural Teacher Capacity Scale were examined. The MTCS is a reliable and valid measure of multicultural teacher capacity for samples that mirror the development sample.

Electronic Board #8
Automated Test Assembly Methods Using Monte-Carlo-Based Linear-On-The-Fly (LOFT) Techniques
John Weiner and Gregory Hurtz, PSI Services LLC
Monte-Carlo-based linear-on-the-fly techniques of automated test assembly offer a number of advantages toward the goals of exam security, exam form equivalence, and efficiency in examination development activities. Classical-test-theory and Rasch/IRT approaches are compared, and issues of statistical sampling and analyses are discussed.

Electronic Board #9
DIF Related to Test Takers' Culture Background and Language Proficiency
Jinghua Liu, Secondary School Admission Test Board; Tim Moses, College Board
This study examines DIF from the perspective of test takers' cultural background by using operational data from a standardized admission test. We recommend that testing programs containing a large portion of test takers from different regions and cultural backgrounds ought to add region/culture DIF to routine DIF screening.

Electronic Board #10
Can a Two-Item Essay Test Be Reliable and Valid?
Brent Bridgeman and Donald Powers, Educational Testing Service
Psychometricians have long complained that a two-item essay test cannot be reliable and valid for predicting academic outcomes compared to a multiple-choice test (e.g., Wainer & Thissen, 1993). Recent evidence from predictive validity studies of Verbal Reasoning and Analytical Writing GRE scores challenges this point of view.

Electronic Board #11
Selecting Automatic Scoring Features Using Criticality Analysis
Han-Hui Por and Anastassia Loukina, Educational Testing Service
We apply the criticality analysis approach to select features in the automatic scoring of spoken responses in a language assessment. We show that this approach addresses issues of sample dependence and bias, and identifies salient features that are critical in improving model validity.

Electronic Board #12
A Meta-Analysis of the Predictive Validity of the Graduate Management Admission Test
Haixia Qian, Kim Trang and Neal Martin Kingston, University of Kansas
The purpose of the meta-analysis was to assess the Graduate Management Admission Test (GMAT) and undergraduate GPA (UGPA) as predictors of business school performance. Results showed both the GMAT and UGPA were significant predictors, with the GMAT as a stronger predictor compared to UGPA.

Electronic Board #13
A Fully Bayesian Approach to Smoothing the Linking Function in Equipercentile Equating
Zhehan Jiang and William Skorupski, University of Kansas
A fully Bayesian parametric method for robustly estimating the linking function in equipercentile equating is introduced, explicated, and evaluated via a Monte Carlo simulation study.

Electronic Board #14
Conducting a Post-Equating Check to Detect Unstable Items on Pre-Equated Tests
Keyin Wang, Michigan State University; Wonsuk Kim and Louis Roussos, Measured Progress
Pre-equated tests are increasingly common. Every item is assumed to behave in a stable manner. Thus, "post-equated" checks need to be conducted to detect and correct problematic items. Little research has been conducted directly on this topic. This study proposes possible procedures and begins to evaluate them.

Electronic Board #15
An Evaluation of Methods for Establishing Crosswalks Between Instruments
Mark Hansen, University of California, Los Angeles
In this study, we evaluate several approaches for obtaining projections (or crosswalks) between instruments measuring related, but somewhat distinct, constructs. Methods utilizing unidimensional and multidimensional item response theory models are compared. We examine the impact of test length, correlation between constructs, and sample characteristics on the quality of the projection.

Electronic Board #16
Exploration of Factors Affecting the Necessity of Reporting Test Subscores
Xiaolin Wang, Dubravka Svetina and Shenghai Dai, Indiana University, Bloomington
Interest in test subscore reporting has been growing rapidly for diagnostic purposes. This simulation study examined factors (correlation between subscales, number of items per subscale, complexity of test, and item parameter distribution) that affected the necessity of reporting subscores within the classical test theory framework.

Electronic Board #17
Evaluation of Psychometric Stability of Generated Items
Yu-Lan Su, Tingting Chen and Jui-Sheng Wang, ACT, Inc.
The study investigated the psychometric stability of generated items using operational data. The generated items were compared to their parents on classical item statistics, DIF, raw response distributions to the key, and IRT parameters. The empirical evidence will serve as groundwork for the growing applications of item generation.

Electronic Board #18
Creating Parallel Forms with Small Samples of Examinees
Lisa Keller, University of Massachusetts Amherst; Rob Keller, Measured Progress; Andrea Hebert, Bottom Line Technologies
This study investigates using item-specific priors in item calibration to assist in the creation of parallel forms in the presence of small samples of examinees. Results indicate that while the item parameters may still contain error, classification of examinees into performance categories might be improved using the method.

Electronic Board #19
Higher-Order G-DINA Model for Polytomous Attributes
Qin Yi and Tao Yang, Faculty of Education, Beijing Normal University; Tao Xin and Lou Liu, School of Psychology, Beijing Normal University
The G-DINA model for polytomous attributes (Jinsong Chen, 2013), which accounts for attribute levels, can provide additional diagnostic information. When a higher-order structure is involved, it can provide more fine-grained attribute information and a macro-level ability expression linked to IRT, which also increases the sensitivity of classification.

Electronic Board #20
New Search Algorithm for Q-matrix Validation
Ragip Terzi, Rutgers, The State University of New Jersey; Jimmy de la Torre, Rutgers University
The validity of the Q-matrix in cognitive diagnosis modeling has drawn significant attention due to the possibility of attribute misspecifications, which can result in model-data misfit and, ultimately, attribute misclassifications. The current study proposes a new method for Q-matrix validation. The results are also compared to other parametric and non-parametric methods.
Electronic Board #21
Generalized DCMs for Option-Based Scoring
Oksana Naumenko, Yanyan Fu and Robert Henson, The University of North Carolina at Greensboro; Bill Stout, University of Illinois at Urbana-Champaign; Lou DiBello, University of Illinois at Chicago
A recently proposed family of models, the Generalized Diagnostic Classification Models for Multiple Choice Option-Based Scoring (GDCM-MC), extracts information about examinee cognitive processing from all MC item options. This paper describes a set of simulation studies with factors such as test length and number of options that examine model performance.

Electronic Board #22
Evaluating Sampling Variability and Measurement Precision of Aggregated Scores in Large-Scale Assessment
Xiaohong Gao and Rongchun Zhu, ACT, Inc.
The study demonstrates how to conceptualize sources of measurement error and estimate sampling variability and reliability in large-scale assessment of educational quality. One international and one domestic assessment data set are used to shed light on potential sources of measurement uncertainty and improvement of measurement precision for aggregated scores.

Electronic Board #23
The Model for Dichotomously-Scored Multiple-Attempt Multiple-Choice Items
Igor Himelfarb and Katherine Furgol Castellano, Educational Testing Service (ETS); Guoliang Fang, Penn State University
This paper proposes a model for dichotomously-scored, multiple-attempt, multiple-choice item responses that may occur in scaffolded assessments. Assuming a 3PL IRT model, simulations were conducted using MCMC Metropolis-Hastings to recover the generated parameters. Results indicate that the best recovery was for item parameters of low and moderate difficulty and discrimination.

Electronic Board #24
Classical Test Theory Embraces Cognitive Load Theory: Measurement Challenges Keeping It Simple
Charles Secolsky, Mississippi Department of Education; Eric Magaram, Rockland Community College
The measurement community is challenged by advances in educational technology and psychology. On a basic level, classical test theory is used as a measurement model for understanding cognitive load theory and the influence that cognitive load theory has on test validity. The greater the germane cognitive load, the greater the true score.

Sunday, April 10, 2016 6:30 PM - 8:00 PM, Renaissance West B, Ballroom Level
President's Reception
By Invitation Only

Annual Meeting Program - Monday, April 11, 2016

Monday, April 11, 2016 5:45 AM - 7:00 AM
NCME Fitness Run/Walk
Session Organizers: Katherine Furgol Castellano, ETS; Jill R. van den Heuvel, Alpine Testing Solutions
Start your morning with NCME's annual 5k Walk/Run in Potomac Park. Meet in the lobby of the Renaissance Washington, DC Downtown Hotel at 5:45 AM. Pre-registration is required. Pick up your bib number and t-shirt at the NCME Information Desk in the hotel anytime prior to race day. Transportation will be provided. (Additional registration fee required)
The event is made possible through the sponsorship of:
National Center for the Improvement of Educational Assessment, Inc.
Measurement, Inc.
College Board
ACT
American Institutes for Research
Graduate Management Admission Council
Educational Testing Service
Pearson Educational Measurement
Houghton Mifflin Harcourt
Law School Admission Council
Applied Measurement Professionals, Inc.
WestEd
HumRRO

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 13/14, Meeting Room Level, Invited Session, I1
NCME Book Series Symposium: Technology and Testing
Session Editor: Fritz Drasgow, University of Illinois at Urbana-Champaign
Session Chair: Randy Bennett, ETS
This symposium draws on Technology and Testing: Improving Educational and Psychological Measurement, a recently published volume in the new NCME Book Series. The volume probes the remarkable opportunities for innovation and progress that have resulted from the convergence of advances in technology, measurement, and the cognitive and learning sciences. The book documents many of these new directions and provides suggestions for numerous further advances. It seems safe to predict that testing will be dramatically transformed over the next few decades – paper test booklets with opscan answer sheets will soon be as outdated as computer punch cards. The book is divided into four sections, each with several chapters and a section commentator. For purposes of this symposium, one chapter author per section will present his or her chapter in some depth, followed by the section commentator, who will briefly review each of the other chapters in the section. The symposium offers the measurement community a unique opportunity to learn about how technology will help to transform assessment practices and the challenges that transformation is already posing and will continue to present.

Issues in Simulation-Based Assessment
Brian Clauser and Melissa Margolis, National Board of Medical Examiners; Jerome Clauser, American Board of Internal Medicine; Michael Kolen, University of Iowa
Commentator: Stephen Sireci, University of Massachusetts, Amherst

Using Technology-Enhanced Processes to Generate Test Items in Multiple Languages
Mark Gierl, Hollis Lai, Karen Fung and Bin Zheng, University of Alberta
Commentator: Mark Reckase, Michigan State University

Increasing the Accessibility of Assessments through Technology
Elizabeth Stone, Cara Laitusis and Linda Cook, ETS
Commentator: Kurt Geisinger, University of Nebraska, Lincoln

From Standardization to Personalization: The Comparability of Scores Based on Different Testing Conditions, Modes, and Devices
Walter Way, Laurie Davis, Leslie Keng and Ellen Strain-Seymour, Pearson
Commentator: Edward Haertel, Stanford University

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, I2
Exploring Various Psychometric Approaches to Report Meaningful Subscores
Session Discussant: Li Cai, University of California, Los Angeles
The impetus of this session came directly from needs and concerns expressed by score users of K-12 large-scale Common Core State Standards (CCSS) aligned assessments. Subscores, also called domain scores (such as Reading, Listening, and Writing in an English language arts test), and subdomain scores that are based on detailed content standards nested within a domain are reported in assessments. As the CCSS have been adopted by many states, educators and parents need both domain and subdomain information from the state accountability tests to (1) explain the student's performance in certain content areas, (2) evaluate the effects of teaching and learning practices in the classroom, and (3) investigate the impact of implementation of the CCSS. However, the use of subscores has been criticized for low reliability (Thissen & Wainer, 2001) and little added value when correlations among subscores are high (Sinharay, 2010). In online adaptive testing, traditional observed subscores are usually not meaningful, because students respond to different items at different difficulty levels, which renders the subscores not comparable among students. Furthermore, in an online adaptive testing format, each student usually receives only a few items from the core content-related subdomain units. In that case, student-level subdomain scores are unlikely to be reliable. However, when school-level factors are collected from many students, the aggregated information may be meaningful. The issues of reporting subscores in K-12 CCSS-aligned assessments are addressed through four different approaches from both theoretical and empirical perspectives. Our studies show that reliabilities can be improved and additional information can be provided to test users even in an online adaptive testing setting. The first study presents results from a residual analysis of subscores, which has been widely applied in statewide assessments; the advantages, limitations, and possible solutions for improvement are also discussed. The second study uses a mixture of Item Response Theory (IRT) and higher-order cognitive diagnostic (HO-DINA) models to produce attribute classification profiles as an alternative to traditional subscores, along with general ability scores. The third study proposes a Multilevel Testlet (MLT) item factor model to produce school-level instructionally meaningful subscores. The fourth study incorporates collateral information by implementing a fully Bayesian approach to report more reliable subscores. This panel of studies provides insight into subscores from various approaches and from both within- and across-methodology perspectives. We hope this session can enrich the literature and methodology in subscore reporting and also support producing meaningful diagnostic information for teaching and learning.

Using Residual Analysis to Report Subscores in Statewide Assessments
Jon Cohen, American Institutes for Research

Applying a Mixture of IRT and HO-DINA Models in Subscore Reporting
Likun Hou, Educational Testing Service; Yan Huo, Educational Testing Service; Jimmy de la Torre, Rutgers University

Multilevel Testlet Item Factor Model for School-Level Instructionally-Meaningful Subscores
Megan Kuhfeld, University of California, Los Angeles

Incorporating Collateral Information and Fully Bayesian Approach for Subscores Reporting
Yi Du, Educational Testing Service; Shuqin Tao, Curriculum Associates; Feifei Li, Educational Testing Service

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 3, Meeting Room Level, Coordinated Session, I3
From Items to Policies: Big Data in Education
Session Discussant: Zachary Pardos, School of Information and Graduate School of Education, UC Berkeley
Data are woven into every sector of the global economy (McGuire et al., 2012), including education. As technology and analytics improve, the use of big data to derive insights that lead to system improvements is growing rapidly. The purpose of this panel is to share a collection of promising approaches for analyzing and leveraging big data in a wide range of education contexts. Each contribution is an application of machine learning, computer science, and/or statistical techniques to an education issue or question in which expert judgment would be costly, impractical, or otherwise hampered by the magnitude of the problem. We focus on the novel application of big data to address questions of construct validity for assessments; inferences about student abilities and learning needs when data are sparse or unstructured; decisions about course structure; and public sentiment about specific education policies. The ultimate goal for the use of big data and the application of these methods is to improve outcomes for learners. We conclude the session with lessons learned from the application of these methods to research questions across a broad spectrum of education issues, noting strengths and limitations.

What and When Students Learn: Q-Matrices and Student Models from Longitudinal Data
José González-Brenes, Center for Digital Data, Analytics & Adaptive Learning, Pearson

Misconceptions Revealed Through Error Responses
Thomas McTavish, Center for Digital Data, Analytics and Adaptive Learning, Pearson

Beyond Subscores: Mining Student Responses for Diagnostic Information
William Lorié, Center for NextGen Learning & Assessment, Pearson

Mining the Web to Leverage Collective Intelligence and Learn Student Preferences
Kathy McKnight, Center for Educator Learning & Effectiveness, Pearson; Antonio Moretti and Ansaf Salleb-Aouissi, Center for Computational Learning Systems, Columbia University; José González-Brenes, Center for Digital Data, Analytics & Adaptive Learning, Pearson

The Application of Sentiment and Topic Analysis to Teacher Evaluation Policy
Antonio Moretti and Ansaf Salleb-Aouissi, Center for Computational Learning Systems, Columbia University; Kathy McKnight, Center for Educator Learning & Effectiveness, Pearson

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 4, Meeting Room Level, Coordinated Session, I4
Methods and Approaches for Validating Claims of College and Career Readiness
Session Chair: Thanos Patelis, Center for Assessment
Session Discussant: Michael Kane, Educational Testing Service
The focus on college and career readiness has penetrated all aspects and segments of education, as well as economic and political rhetoric. Testing organizations, educational organizations, states, and institutions of higher education have made claims of college and career readiness. New large-scale assessments have been launched, and historic assessments used for college admissions and placements are being revised to represent current claims of college and career readiness. Validation evidence to substantiate these claims is important and expected (AERA, APA, & NCME, 2014). This session will involve four presentations by active participants and contributors in the conceptualization, design, and implementation of validation studies. Each presentation will present a validation framework and specific suggestions, recommendations, and examples of methodologies in undertaking the validation of these claims of college and career readiness. Concrete suggestions will be provided. A fifth presenter will offer comments about the presentations and also provide additional recommendations and insights.

Are We Ready for College and Career Readiness?
Stephen Sireci, University of Massachusetts-Amherst

Validating Claims for College and Career Readiness with Assessments Used for Accountability
Wayne Camara, ACT

Moving Beyond the Rhetoric: Urgent Call for Empirically Validating Claims of College-And-Career-Readiness
Catherine Welch and Stephen Dunbar, University of Iowa

Some Concrete Suggestions and Cautions in Evaluating/Validating Claims of College Readiness
Thanos Patelis, Center for Assessment

Monday, April 11, 2016 8:15 AM - 10:15 AM, Renaissance West A, Ballroom Level, Invited Session, I5
Recent Advances in Quantitative Social Network Analysis in Education
Presenters: Tracy Sweet, University of Maryland; Qiwen Zheng, University of Maryland; Mengxiao Zhu, ETS; Sam Adhikari, Carnegie Mellon University; Beau Dabbs, Carnegie Mellon University; I-Chien Chen, Michigan State University
Social network data are becoming increasingly common in education research, and the purpose of this symposium is both to summarize current research on social network methodology and to showcase how these methods can address substantive research questions in education and promote ongoing education research. Each presentation introduces exciting cutting-edge methodological research focusing on different aspects of social network analysis that will be of interest to both methodologists and education researchers. The session will begin with an introduction by Tracy Sweet, followed by several methodological talks showcasing exciting new research. Mengxiao Zhu will describe new ways to analyze network data from students' learning and problem-solving processes. Qiwen Zheng will discuss a model for multiple networks that focuses on subgroup integration. Sam Adhikari will discuss a longitudinal model that illustrates how network structure changes over time, and I-Chien Chen will also introduce new methods for multiple time points but will focus on how changes over time are related to changes in other outcomes. Finally, Beau Dabbs will discuss model selection methods.

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 15, Meeting Room Level, Paper Session, I6
Issues in Automated Scoring
Session Discussant: Shayne Miel, Turnitin

Modeling the Global Text Features for Enhancing the Automated Scoring System
Syed Muhammad Fahad Latifi and Mark Gierl, University of Alberta
We will introduce and demonstrate the innovative modeling of global text features for enhancing the performance of an automated essay scoring (AES) system. Representative datasets from PARCC and Smarter Balanced states were used. The results suggest that global text modeling consistently outperformed two state-of-the-art commercial AES systems.

Discretization of Scores from an Automated Scoring Engine Using Gradient Boosted Machines
Scott Wood, Pacific Metrics Corporation
In automated scoring engines using linear regression models, it is common to convert the continuous predicted scores into discrete scores for reporting. A recent study shows that special care must be taken when converting continuous predicted scores from gradient boosted machine modeling into discrete scores.

Automated Scoring of Constructed Response Items Measuring Computational Thinking
Daisy Rutstein, John Niekrasz and Eric Snow, SRI International
Increasingly, assessments contain constructed response items to measure hard-to-assess inquiry- and design-based concepts. These types of item responses are challenging to score reliably and efficiently. This paper discusses the adaptation of an automated scoring engine for scoring responses on constructed response items measuring computational thinking.

Automated Scoring of Complex Technology-Enhanced Tasks in a Middle School Science Unit
Samuel Crane, Aaron Harnly, Malorie Hughes and John Stewart, Amplify
We show how complex user-interaction data from a Natural Selection app can be auto-scored using several methods. We estimate validity using a comparative analysis of content-expert ratings, evidence rule scoring, and a machine learning approach. The machine learning approaches are shown to agree with expert human scoring.

Comparison of Human Rater and Automatic Scoring on Students' Ability Estimation
Zhen Wang, Educational Testing Service (ETS); Lihua Yao, DoD Data Center; Yu Sun
The purpose is to compare human rater scoring with automatic scoring in terms of examinees' ability estimation with an IRT-based rater model. Each speaking item is analyzed with IRT models both without and with rater effects. The effects of different rating designs may substantially increase the bias in examinees' ability estimation.

Issues to Consider When Examining Differential Item Functioning in Essays
Matthew Schultz, Jonathan Rubright and Aster Tessema, American Institute of Certified Public Accountants
The development of Automated Essay Scoring has propelled the increasing use of writing in high-stakes assessments. To date, DIF is rarely considered in such contexts. Here, methods to assess DIF in essays and considerations for practitioners are reviewed, and results of an application from an operational testing program are discussed.

Monday, April 11, 2016 8:15 AM - 10:15 AM, Meeting Room 16, Meeting Room Level, Paper Session, I7
Multidimensional and Multivariate Methods
Session Discussant: Irina Grabovsky, NBME

Information Functions of Multidimensional Forced-Choice IRT Models
Seang-hwane Joo, Philseok Lee and Stephen Stark, University of South Florida
This paper aimed to develop the concept of information functions for multidimensional forced-choice IRT models and demonstrate how statement parameters and test formats (pair, triplet, and tetrad) influence the item and test information. The implications for constructing fake-resistant noncognitive measures are further discussed using information functions.

Investigating Reverse-Worded Matched Item Pairs Using the GPCM and NRM
Ki Matlock, Oklahoma State University; Ronna Turner and Dent Gitchel, University of Arkansas
The GPCM is often used for polytomous data; however, the NRM allows for the investigation of how adjacent categories may discriminate differently when items are positively or negatively worded. In this study, responses to reverse-worded items are analyzed using the two models, and the estimated parameters are compared.

Item Response Theory Models for Ipsative Tests with Polytomous Multidimensional Forced-Choice Items
Xue-Lan Qiu and Wen-Chung Wang, The Hong Kong Institute of Education
Developments of IRT models for ipsative tests with dichotomous multidimensional forced-choice items have been witnessed in recent years. In this study, we develop a new class of IRT models for polytomous MFC items. We conducted simulation studies in a variety of conditions to evaluate parameter recovery and provided an empirical example.
Multivariate Generalizability Theory and Conventional Approaches for Obtaining More Accurate Disattenuated Correlations
Walter Vispoel, Carrie Morris and Murat Kilinc, University of Iowa
The standard approach for obtaining disattenuated correlations rests on assumptions easily violated in practice. We explore multiple methods for obtaining disattenuated correlations designed to limit the introduction of bias due to assumption violations, including methods based on applications of multivariate generalizability theory and a conventional alternative to such methods.

Comparing a Modified Alpha Coefficient to Split-Half Approaches in the LOFT Framework
Tammy Trierweiler, Law School Admission Council (LSAC); Charles Lewis, Educational Testing Service
In this study, the performance of a Modified Alpha coefficient was compared to split-half methods for estimating generic reliability in a LOFT framework. Simulations across different ability distributions, sample sizes, and ranges of item pool difficulties were considered, and results were compared to the corresponding theoretical population reliability.

Estimating Correlations Among School Relevant Categories in a Multidimensional Space
Se-Kang Kim, Fordham University; Joseph Grochowalski, College Board
The current study estimates correlations between row and column categories in a multidimensional space. The contingency table being analyzed consists of New York school districts as row categories and school-relevant categories (e.g., attendance, safety, etc.) as column categories. To calculate correlations, the biplot paradigm (Greenacre, 2010) is utilized.

Monday, April 11, 2016 10:35 AM - 12:05 PM, Renaissance West A, Ballroom Level, Invited Session, J1
Hold the Presses! How Measurement Professionals Can Speak More Effectively with the Press and the Public (Education Writers Association Session)
Session Chairs: Kristen Huff, ACT; Laurie Wise, HumRRO, Emeritus; Lori Crouch, EWA
Session Panelists: Caroline Hendrie, EWA; David Hoff, Hager Sharp; Andrew Ho, Harvard Graduate School of Education; Anya Kamenetz, NPR; Sarah Sparks, Education Week
How can members of the press help advance the assessment literacy of the general public? Could we have communicated better about the Common Core State Standards? Please join NCME for a panel session sponsored jointly with the Education Writers Association (EWA), the professional organization of journalists that covers education. In this panel discussion, EWA Executive Director Caroline Hendrie will lead a conversation with journalists and academics about the role of measurement experts and the press in the modern media era, with all its political polarization, sound bites, Twitter hashtags, and quotes on deadline. Approximately half the session will be reserved for audience questions and answers, so please take advantage of this unique opportunity to discuss how we can improve our communication about educational measurement.

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, J2
Challenges and Solutions in the Operational Use of Automated Scoring Systems
Session Chair: Su-Youn Yoon
Session Discussant: Klaus Zechner, ETS
An automated scoring system can assess constructed responses faster than human raters and at a lower cost. These advantages have prompted a strong demand for high-performing automated scoring systems for various applications. However, even state-of-the-art automated scoring systems face numerous challenges to their use in operational testing programs. This session will discuss four important issues that may arise when automated scoring systems are used in operational tests: features vulnerable to subgroup bias, accommodations for special test taker groups with disabilities, the development of new tests using a novel input type, and the addition of automated scoring to ongoing operational testing programs based only on human scoring. These issues may be associated with problems that cause aberrant performance of automated scoring systems and weaken the validity of automated scores. Also, the addition of machine scoring to prior all-human scoring may change the score distribution and make it difficult to interpret and maintain the reported scale. We will analyze problems associated with these issues and provide solutions. This session will demonstrate the importance of considering validity issues at the initial stage of automated scoring system design in order to overcome these challenges.

Fairness in Automated Scoring: Screening Features for Subgroup Differences
Ji An, University of Maryland; Vincent Kieftenbeld and Raghuveer Kanneganti, McGraw-Hill Education CTB

Use of Automated Scoring in Language Assessments for Candidates with Speech Impairments
Heather Buzick, Educational Testing Service; Anastassia Loukina, ETS

A Novel Automatic Handwriting Assessment System Built on Touch-Based Tablet
Xin Chen, Ran Xu and Richard Wang, Pearson; Tuo Zhao, University of Missouri

Ensuring Scale Continuity in Automated Scoring Deployment in Operational Programs
Jay Breyer, Shelby Haberman and Chen Li, ETS

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, J3
Novel Models to Address Measurement Errors in Educational Assessment and Evaluation Studies
Session Chair: Kilchan Choi, CRESST/UCLA
Session Discussant: Elizabeth Stuart, Johns Hopkins
Measurement error issues adversely affect results obtained from typical modeling approaches used to analyze data from assessment and evaluation studies. In particular, measurement error can weaken the validity of inferences from student assessment data, reduce the statistical power of impact studies, and diminish the ability of researchers to identify the causal mechanisms that lead to an intervention improving the desired outcome. This symposium proposes novel statistical models to account for the impact of measurement error. The first paper proposes a multilevel two-tier item factor model with latent change score parameterization in order to address conditional exchangeability of participants that routinely accompanies analysis of multisite randomized experiments with pre- and posttests. The second paper examines the consequences of correcting measurement errors in value-added models, addressing the question of which teachers benefit more than others when measurement errors are corrected. The third paper proposes a multilevel latent variable plausible values approach for more appropriately handling measurement error in predictors in multilevel modeling settings in which latent predictors are measured by observed categorical variables. The last paper proposes a three-level latent variable hierarchical model with a cluster-level measurement model using a one-stage full-information estimation approach.
On the Role of Multilevel Item Response Models in Multisite Evaluation Studies
Li Cai and Kilchan Choi, UCLA/CRESST; Megan Kuhfeld, UCLA

Consequence of Correcting Measurement Errors in Value-Added Models
Kilchan Choi, CRESST/UCLA; Yongnam Kim, University of Wisconsin

Handling Error in Predictors Using Multiple-Imputation/MCMC-Based Approaches: Sensitivity of Results to Priors
Michael Seltzer, UCLA; Jiseung Yang, University of Maryland

Three-Level Latent Variable Hierarchical Model with Level-2 Measurement Model
Kilchan Choi and Li Cai, UCLA/CRESST; Michael Seltzer, UCLA

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, J4
Mode Comparability Investigation of a CCSS-Based K-12 Assessment
Session Chair: David Chayer, Data Recognition Corporation
Session Discussant: Debora Harris, ACT
The recent introduction of the Common Core State Standards and accountability legislation has brought extensive attention to the online administration of K-12 large-scale assessments. In this coordinated session, a series of mode comparability investigations on a K-12 assessment that uses various item types, such as multiple-choice, technology-enhanced, and open-ended items, is presented in order to test three major comparability hypotheses (same test factor structure, same measurement precision, and same score properties) by applying various methods. A presentation of the most recent trends in mode comparability studies on K-12 assessments will be followed by presentations of findings from the mode comparability hypothesis investigations mentioned above. Finally, results from various equating methods are compared when a difference in difficulty exists between the two modes. This coordinated session will contribute to the measurement field by providing a summary of the most recent mode comparability studies, theoretical guidelines for mode comparability, and practical considerations for educators and practitioners.

Recent Trends of Mode Comparability Studies
Jong Kim, ACT

Comparison of OLT and PPT Structure
Karen Barton, Learning Analytics; Jungnam Kim, NBCE

Applying an IRT Method to Mode Comparability
Dong-In Kim, Keith Boughton and Joanna Tomkowicz, Data Recognition Corporation; Frank Rijiman, AAMC

Equating When Mode Effect Exists
Marc Julian, Dong-in Kim, Ping Wan and Litong Zhang, Data Recognition Corporation

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 16, Meeting Room Level, Paper Session, J5
Validating "Noncognitive"/Nontraditional Constructs II
Session Discussant: Andrew Maul, University of California, Santa Barbara

Using Response Times to Enhance Scores on Measures of Executive Functioning
Brooke Magnus, University of North Carolina at Chapel Hill; Michael Willoughby, RTI International; Yang Liu, University of California, Merced
We propose a novel response time model for the assessment of executive functioning in children transitioning from early to middle childhood. Using a model comparison approach, we examine the degree to which response times may be analyzed jointly with response accuracy to improve the precision and range of ability scores.

A Structural Equation Model Replication Study of Influences on Attitudes Towards Science
Rajendra Chattergoon, University of Colorado, Boulder
This paper replicates and extends a structural equation model using data from the Trends in International Mathematics and Science Study (TIMSS). A similar latent factor structure was obtained using TIMSS 1995 and 2011 data, but some items loaded on multiple factors. Three models fit the data equally well, suggesting multiple interpretations.

Experimental Validation Strategies Using the Example of a Performance-Based ICT-Skills Test
Lena Engelhardt and Frank Goldhammer, German Institute for International Educational Research; Johannes Naumann, Goethe University Frankfurt; Andreas Frey, Friedrich Schiller University Jena
Two experimental validation approaches are presented to investigate the construct interpretation of ability scores using the example of a performance-based ICT (information and communication technology) skills test. Construct-relevant task characteristics were manipulated experimentally, first to change only the difficulty of items, and second to also change the tapped construct.

Measuring Being Bullied in the Context of Racial and Religious DIF
Michael Rodriguez, Kory Vue and Jose Palma, University of Minnesota
To address the measurement and relevance of novel constructs in education, a measure of being bullied is anticipated to exhibit DIF on items about the role of race and religion. The scale is recalibrated to account for DIF and compared vis-à-vis correlations, mean differences, and criterion-referenced levels of being bullied.

Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 15, Meeting Room Level, Paper Session, J6
Differential Functioning - Theory and Applications
Session Discussant: Catherine McClellan, Clowder Consulting

Using the Partial Credit Model to Investigate the Comparability of Examination Standards
Qingping He and Michelle Meadows, Office of Qualifications and Examinations Regulation
This study explores the use of the Partial Credit Model (PCM) and differential step functioning (DSF) to investigate the comparability of standards in examinations that test the same subjects but are provided by different assessment providers. These examinations are used in the General Certificate of Secondary Education qualifications in England.

Handling Missing Data on DIF Detection Under the MIMIC Model
Daniella Reboucas and Ying Cheng, University of Notre Dame
In detecting differential item functioning (DIF), mistreatment of missing data can inflate Type I error and lower power. This study examines DIF detection with the MIMIC model under three missing data mechanisms. Results suggest that the full information maximum likelihood method works better than multiple imputation in this case.

Properties of Matching Criterion and Its Effect on Mantel-Haenszel DIF Procedure
Usama Ali, Educational Testing Service
This paper investigates the matching criterion used for the Mantel-Haenszel DIF procedure. The goal of this paper is to evaluate the robustness of DIF results under less-than-optimal conditions, as reflected in the number of items contributing to the criterion score, the number of score levels, and its reliability.

Impact of Differential Bundle Functioning on Test Performance of Focal Examinees
Kathleen Banks, LEAD Public Schools; Cindy Walker, University of Wisconsin-Milwaukee
The purpose of this study was to apply the Walker, Zhang, Banks, and Cappaert (2012) effect size criteria to bundles that showed statistically significant differential bundle functioning (DBF) against focal groups in past DBF studies. The question was whether the bundles biased the mean total scores for focal groups.
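As a quick reference for the Mantel-Haenszel procedure discussed in the session above, the following is a minimal sketch of the common odds-ratio estimator; it is standard background material and is not drawn from any of the papers. At each level k of the matching criterion (typically the total test score), let A_k and B_k be the numbers of reference-group examinees answering the studied item correctly and incorrectly, C_k and D_k the corresponding focal-group counts, and N_k the total number of examinees at that level:

\[
\hat{\alpha}_{MH} \;=\; \frac{\sum_{k} A_k D_k / N_k}{\sum_{k} B_k C_k / N_k},
\qquad
\text{MH D-DIF} \;=\; -2.35\,\ln \hat{\alpha}_{MH}.
\]

Values of \(\hat{\alpha}_{MH}\) near 1 (MH D-DIF near 0) indicate little uniform DIF; the abstract by Ali above examines how properties of the matching criterion, such as its length and reliability, affect the behavior of this procedure.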
Monday, April 11, 2016 10:35 AM - 12:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, J7
Latent Regression and Related Topics
Session Discussant: Matthias von Davier, ETS

Multidimensional IRT Calibration with Simultaneous Latent Regression in Large-Scale Survey Assessments
Lauren Harrell and Li Cai, University of California, Los Angeles
Multidimensional item response theory models, estimated simultaneously with latent regression models using an adaptation of the Metropolis-Hastings Robbins-Monro algorithm, are applied to data from the National Assessment of Educational Progress (NAEP) Science and Mathematics assessments. The impact of dimensionality on parameter estimation and plausible values is investigated.

Single-Stage Vs. Two-Stage Estimation of Latent Regression IRT Models
Peter van Rijn, ETS Global; Yasmine El Masri, Oxford University Centre for Educational Assessment
Item and population parameters of PISA 2012 data are compared between a single-stage and a two-stage approach. While item and population parameters remained similar, standard errors of population parameters were greater in the single-stage approach. Similar results were observed when fitting univariate and multivariate models. Practical implications are discussed.

Improving Score Precision in Large-Scale Assessments with the Multivariate Bayesian Lasso
Steven Culpepper, Trevor Park and James Balamuta, University of Illinois at Urbana-Champaign
The multivariate Bayesian Lasso (MBL) was developed for high-dimensional regression models, such as the conditioning model in large-scale assessments (e.g., NAEP). Monte Carlo results document the gains in score precision achieved when employing the MBL model versus Bayesian models that assume a multivariate normal prior for regression coefficients.

Performance of Missing Data Approaches in Retrieving Group-Level Parameters
Steffi Pohl, Freie Universität Berlin; Carmen Köhler and Claus Carstensen, Otto-Friedrich-Universität Bamberg
We investigate the performance of different missing data approaches in retrieving group-level parameters (e.g., regression coefficients) that are usually of interest in large-scale assessments. Results show that ignoring missing values performed almost equally well as model-based approaches for nonignorable missing data; both approaches outperformed treating missing values as incorrect responses.

Monday, April 11, 2016 11:00 AM - 2:00 PM, Meeting Room 12, Meeting Room Level
Past Presidents Luncheon
By invitation only

Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 8/9, Meeting Room Level, Invited Session, K1
The Every Student Succeeds Act (ESSA): Implications for Measurement Research and Practice
Session Moderator: Martin West, Harvard Graduate School of Education
Session Presenters: Peter Oppenheim, Education Policy Director and Counsel, U.S. Senate Committee on Health, Education, Labor, and Pensions (Majority); Sarah Bolton, Education Policy Director, U.S. Senate Committee on Health, Education, Labor, and Pensions (Minority)
Session Respondents: Sherman Dorn, Arizona State University; Marianne Perie, University of Kansas; John Easton, Spencer Foundation
The 2015 enactment of the Every Student Succeeds Act marked a major shift in federal education policy, allowing states greater flexibility with respect to the design of school accountability systems while at the same time directing them to incorporate additional performance metrics not based on test scores. In this session, key Congressional staff involved in crafting the new law will describe its rationale and how they hope states will respond. A panel of researchers will in turn consider the opportunities the law creates for innovation in and research on educational measurement and the design of school accountability systems.

Monday, April 11, 2016 12:25 PM - 1:55 PM, Renaissance West A, Ballroom Level, Coordinated Session, K2
Career Paths in Educational Measurement: Lessons Learned by Accomplished Professionals
Session Moderator: S E Phillips, Assessment Law Consultant
Session Panelists: Kathy McKnight, Pearson School Research; Joe Martineau, National Center for the Improvement of Educational Assessment; Barbara Plake, University of Nebraska Lincoln, Emeritus
Deciding what you want to do when you become a measurement professional can be a daunting task for a master's or doctoral student about to graduate. It can also be challenging for a graduate of a measurement program about to begin a first job. Sometimes, graduate students see the work of accomplished measurement professionals and wonder how they got there. Other times, graduate students know what they are interested in and the type of measurement activity they would like to engage in, but are uncertain which settings or career paths will provide the best fit. Careers in educational measurement are many and varied. As graduate students consider their career options, they must weigh their skills, abilities, interests, and preferences against the opportunities, expectations, demands, and advancement potential of various jobs and career paths. This session is designed to provide some food for thought for these difficult decisions. It is targeted particularly at graduate students in measurement programs, graduates in their first jobs, and career changers within measurement.

Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, K3
Recent Investigations and Extensions of the Hierarchical Rater Model
Session Chair: Jodi Casabianca, The University of Texas at Austin
Session Discussant: Brian Patterson, Questar Assessment
Rater effects in education testing and research have the potential to impact the quality of scores in constructed response and performance assessments. The hierarchical rater model (HRM) is a multilevel item response theory model for multiple ratings of behavior and performance that yields estimates of latent traits corrected for individual rater bias and variability (Casabianca, Junker, & Patz, 2015; Patz, Junker, Johnson, & Mariano, 2002). This session reports on some extensions and investigations of the basic HRM. The first paper serves as a primer to the session, providing the basic HRM formulae and notation, as well as comparisons to competing models. The second paper focuses on a parameterization of the longitudinal HRM that uses an autoregressive and/or moving average process in the estimation of latent traits over time. The third paper discusses a multidimensional extension to the HRM to be used with rubrics assessing more than one trait. The fourth paper evaluates HRM parameter estimates when the examinee population is nonnormal, and demonstrates the use of flexible options for the Bayesian prior on the latent trait.

The HRM and Other Modern Models for Multiple Ratings of Rich Responses
Brian Junker, Carnegie Mellon University

The Longitudinal Hierarchical Rater Model with Autoregressive and Moving Average Processes
Mark Bond and Jodi Casabianca, The University of Texas at Austin; Brian Junker, Carnegie Mellon University

The Hierarchical Rater Model for Multidimensional Rubrics
Ricardo Nieto, Jodi Casabianca and Brian Junker, The University of Texas at Austin

Parameter Recovery of the Hierarchical Rater Model with Nonnormal Examinee Populations
Peter Conforti and Jodi Casabianca, The University of Texas at Austin

Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, K4
The Validity of Scenario-Based Assessment: Empirical Results
Session Chair: Randy Bennett, ETS
Session Discussant: Brian Stecher, RAND
Scenario-based assessments are distinct from traditional tests in that the former present a unifying context with which all subsequent questions are associated. Among other things, that context, or scenario, is intended to provide a reasonably realistic setting and purpose for responding. The presence of the scenario should, at best, facilitate valid, fair, and reliable measurement but in no event should it impede such measurement. The facilitation of valid, fair, and reliable measurement may occur because the scenario increases motivation and engagement, provides background information to activate prior knowledge and make it more equal across students, or steps students through warm-up problems that prepare them better for undertaking a culminating performance task. Among the issues that have emerged with respect to scenario-based assessment are generalizability (e.g., students less knowledgeable or interested in the particular scenario may be disadvantaged); local dependency (i.e., items may be conditionally dependent, artificially inflating measurement precision); and scaffolding effects (e.g., the lead-in tasks may help students perform better than they otherwise would). This symposium will include three papers describing scenario-based assessments for K-12 reading, writing, and science, as well as empirical results related to their validity, fairness, and reliability. Brian Stecher, of RAND, will be the discussant.
Building and Scaling Theory-Based and Developmentally-Sensitive Scenario-Based Reading Assessments John Sabatini, Tenaha O’Reilly, Jonathan Weeks and Jonathan Steinberg, ETS Scenario-Based Assessments in Writing: An Experimental Study Randy Bennett and Mo Zhang, ETS SimScientists Assessments: Science System Framework Scenarios Edys Quellmalz, Matt Silberglitt, Barbara Buckley, Mark Loveland, Daniel Brenner and Kevin (Chun-Wei) Huang, WestEd 160 Washington, DC, USA Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 5, Meeting Room Level, Paper Session, K5 Item Design and Development Session Discussant: Ruth Childs, University of Toronto A Mixed Methods Examination of Reverse-Scored Items in Adolescent Populations Carol Barry and Haifa Matos-Elefonte, The College Board; Whitney Smiley, SAS This study is a mixed methods exploration of reverse-scored items administered to 8th graders. The quantitative portion examines the psychometric properties of a measure of academic perseverance. The qualitative portion uses think-aloud interviews to explore potential reasons for poor functioning of reverse-scored items on the instrument. Effects of Writing Skill on Scores on Justification/Evaluation Mathematics Items Tim Hazen and Catherine Welch, Iowa Testing Programs Justification/Explanation (J/E) items in Mathematics require students to justify or explain their answers, often through writing. This empirical study matches scores on J/E items with scores on Mathematics and Writing achievement tests to examine 1) unidimensionality assumptions and 2) potentially unwanted effects on scores on tests with J/E items. Economy of Multiple-Choice (MC) Versus Constructed-Response (CR) Items: Does CR Always Lose? Xuan-Adele Tan and Longjuan Liang, Educational Testing Service This study compares Multiple-Choice (MC) and Constructed-Response (CR) items across content areas and item types in terms of the cost and time required to reach a certain level of reliability. Results showed that CRs can have higher or comparable reliabilities for certain content areas. Results will help direct future test design efforts. Applying the Q-Diffusion IRT Model to Assess the Impact of Multi-Media Items Nick Redell, Qiongqiong Liu and Hao Song, National Board of Osteopathic Medical Examiners (NBOME) An application of the Q-diffusion IRT response process model to data from a timed, high-stakes licensure examination suggested that multi-media items convey additional information to examinees above and beyond the time needed to process and encode the item, and that multi-media alters response processes for select examinees. 161 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 15, Meeting Room Level, Paper Session, K6 English Learners Session Discussant: Michael Rodriguez, University of Minnesota Using Translanguaging to Assess Math Knowledge of Emergent Bilinguals: An Exploratory Study Alejandra Garcia and Fernanda Gandara, University of Massachusetts; Alexis Lopez, Educational Testing Service There are persisting gaps in mathematics scores between ELs (English learners) and non-ELs even with existing test accommodations. Translanguaging considers that bilinguals have one linguistic repertoire from which they select features strategically to communicate effectively. This study analyzed the performance of ELs on a math assessment that included items with translanguaging features.
Estimating Effects of Reclassification of English Learners Using a Propensity Score Approach Jinok Kim, Li Cai and Kilchan Choi, UCLA/CRESST Reclassification of English Learners (ELs) should be based on their readiness for mainstream classrooms. Drawing on propensity score methods, this paper estimates the effects of ELs’ reclassification on their subsequent academic outcomes in one state. Findings suggest small but positive effects for students reclassified in grades 4, 5, and 6. Comparability Study of Computer-Based and Paper-Based Tests for English Language Learners Nami Shin, Mark Hansen and Li Cai, University of California, Los Angeles/National Center for Research on Evaluation, Standards, and Student Testing (CRESST) The purpose of this study is to examine the extent to which English Language Learner (ELL) status interacts with mode of test administration on large-scale, end-of-year content assessments. Specifically, we examine whether differences in item performance or functioning across computer-based and paper-based administrations are similar for ELL and non-ELL students. Applying Hierarchical Latent Regression Models in Cross-Lingual Assessment Haiyan Lin and Xiaohong Gao, ACT, Inc. This study models the variation of examinees’ performance across groups and the interaction effect between group and person variables by applying 2- and 3-level hierarchical latent regression models in cross-lingual assessments. The simulation uses empirical estimates from two real datasets and explores different sample sizes, test lengths, and theta distributions. 162 Washington, DC, USA Monday, April 11, 2016 12:25 PM - 1:55 PM, Meeting Room 16, Meeting Room Level, Paper Session, K7 Differential Item and Test Functioning Session Discussant: Dubravka Svetina, Indiana University Examining Sources of Gender DIF Using Cross-Classified Multilevel IRT Models Liuhan Cai and Anthony Albano, University of Nebraska–Lincoln An understanding of the sources of DIF can lead to more effective test development. This study examined gender DIF and its relationship with item format and opportunity to learn using cross-classified multilevel IRT models fit to math achievement data from an international dataset. Implications for test development are discussed. Comparing Differential Test Functioning (DTF) for DFIT Mantel-Haenszel/Liu-Agresti Variance C. Hunter and T. Oshima, Georgia State University Using simulated data, DTF was calculated using DFIT and the Mantel-Haenszel/Liu-Agresti variance method. DFIT results show an unacceptable Type I error rate for DIF conditions with unequal sample sizes, but no susceptibility to distributional differences. The variance method showed expected high rates of DTF, being especially sensitive to distributional differences. When Can MIRT Models Be a Solution for DIF? Yuan-Ling Liaw and Elizabeth Sanders, University of Washington The present study was designed to examine whether multidimensional item response theory (MIRT) models might be useful in controlling for differential item functioning (DIF) when estimating primary ability, or whether traditional (and simpler) unidimensional item response theory (UIRT) models with DIF items removed are sufficient for accurately estimating primary ability. Power Formulas for Uniform and Non-Uniform Logistic Regression DIF Tests Zhushan Li, Boston College Power formulas for the popular logistic regression tests for uniform and non-uniform DIF are derived.
The formulas provide a means for sample size calculations in planning DIF studies with logistic regression DIF tests. Factors influencing the power are discussed. The correctness of the power formulas is confirmed by simulation studies. Detecting Group Differences in Item Response Processes: An Explanatory Speed-Accuracy Mixture Model Heather Hayes, AMTIS Inc.; Stephen Gunter and Sarah Morrisey, Camber Corporation; Michael Finger, Pamela Ing and Anne Thissen-Roe, Comira For the purpose of assessing construct validity, we extend previous conjoint speed-accuracy models to simultaneously examine a) the impact of cognitive components on performance for verbal reasoning items and b) how these effects (i.e., response processes) differ among groups who vary in educational breadth and depth. 163 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 12:25 PM - 1:55 PM, Mount Vernon Square, Meeting Room Level, Electronic Board Session, Paper Session, K8 Electronic Board #1 Extension of the Lz* Statistic to Mixed-Format Tests Sandip Sinharay, Pacific Metrics Corp Snijders (2001) suggested the lz* statistic, a popular IRT-based person-fit statistic (PFS). However, lz* can be computed only for tests consisting of dichotomous items and has not been extended to mixed-format tests. This paper extends lz* to mixed-format tests. Electronic Board #2 Examining Two New Fit Statistics for Dichotomous IRT Models Leanne Freeman and Bo Zhang, University of Wisconsin, Milwaukee This study introduces the Clarke and Vuong statistics for assessing model-data fit for dichotomous IRT models. Monte Carlo simulations will be conducted to examine the Type I error and power of the two statistics. Their performance will be compared to the likelihood ratio test, which most researchers currently use. Electronic Board #3 Automated Marking of Written Response Items in a National Medical Licensing Examination Maxim Morin, André-Philippe Boulais and André De Champlain, Medical Council of Canada Automated essay scoring (AES) offers a promising alternative to human scoring for the marking of constructed-response type items. Based on real data, the present study compared several AES conditions for scoring short-answer CR items and evaluated the impact of using AES on the overall statistics of a sample examination form. Electronic Board #4 Evaluating Automated Rater Performance: Is the State of the Art Improving? Michelle Boyer, University of Massachusetts, Amherst; Vincent Kieftenbeld, Pacific Metrics This study evaluates multiple automated raters across four different automated scoring studies to assess whether the state of the art in automated scoring is advancing. Beyond an item-by-item evaluation, the method used here investigates automated rater performance across many items. Electronic Board #5 Test-Taking Strategies and Ability Estimates in a Speeded Computerized Adaptive Test Hua Wei and Xin Li, Pearson This study compares ability estimates of examinees using different test-taking strategies towards the end of a computerized adaptive test (CAT) when they are unable to finish the test within the allotted time. Item responses will be simulated for fixed-length CAT administrations with different test lengths and different degrees of speededness. Electronic Board #6 Detecting Cheating When Examinees and Accomplices Are Not Physically Co-Located Chi-Yu Huang, Yang Lu and Nooree Huh, ACT, Inc.
A simulation study will be conducted to examine the efficiency of different statistics in detecting cheating among examinees who are physically in different locations but share highly similar item responses. Different statistics that will be investigated include a modified ω index, l_z index, H^T index, score estimation, and score prediction. 164 Washington, DC, USA Electronic Board #7 Detecting Differential Item Functioning (DIF) Using Boosting Regression Tree Xin Luo and Mark Reckase, Michigan State University; John Lockwood, ETS A classification method in data mining known as boosting regression tree (BRT) was applied to identify the items with DIF in a variety of test situations, and the effectiveness of this new method was compared with other DIF detection procedures. The results supported the quality of the BRT method. Electronic Board #8 Using Growth Mixture Modeling to Explore Test Takers’ Score Change Patterns Youhua Wei, Educational Testing Service For a large-scale and high-stakes testing program, some examinees take the test more than once and their score change patterns vary across individuals. This study uses latent class and growth mixture modeling to identify unobserved sub-populations and explore different latent score change patterns among repeaters in a testing program. Electronic Board #9 Studies of Growth in Reading in a Vertically Equated National Reading Test David Andrich and Ida Marais, University of Western Australia Australia’s yearly reading assessments for all Year 3, 5, 7 and 9 students are equated vertically. The rate of increase of the worst performing state is greater than that of the best performing one. The former’s efforts to improve reading may be missed if mean achievements alone are compared. Electronic Board #10 Examining the Impact of Longitudinal Measurement Invariance Violations on Growth Models Kelli Samonte, American Board of Internal Medicine; John Willse, University of North Carolina Greensboro Longitudinal analyses rely on the assumption that scales function invariantly across measurement occasions. Minimal research has been conducted to evaluate the impact longitudinal measurement invariance violations have on latent growth models (LGM). The current study aims to examine the impact varying degrees of longitudinal invariance violations have on LGM parameters. Electronic Board #11 Defining On-Track Towards College Readiness Using Advanced Latent Growth Modeling Techniques Anthony Fina, Iowa Testing Programs, University of Iowa The primary purpose of this exploratory study was to investigate growth at the individual level and examine how individual variability in growth is related to college readiness. Growth mixture models and a latent class growth analysis were used to define developmental trajectories from middle school through high school. Electronic Board #12 Impact of Sample Size and the Number of Common Items on Equating Hongyu Diao, Duy Pham and Lisa Keller, University of Massachusetts-Amherst Three methods of small sample equating in the non-equivalent groups anchor test design are investigated in this simulation study: circle-arc, nominal weights mean equating, and Rasch equating. Results indicate that in the presence of small samples, increasing the number of equating items might help mitigate the error.
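Of the three small-sample methods named in the abstract above, circle-arc equating may be the least familiar. The following Python sketch illustrates the symmetric variant of the method (after Livingston & Kim, 2009), assuming the two fixed end points and a mean-equating-based middle point have already been chosen; the function name and example values are hypothetical, and the code is illustrative rather than the authors' implementation.

import numpy as np

def circle_arc_equate(x, low, mid, high):
    # Circle-arc equating sketch: fit a circle through three
    # (raw score on X, equated score on Y) points and read equated values
    # off the arc that contains the middle point.
    (x1, y1), (x2, y2), (x3, y3) = low, mid, high
    # The circle center (a, b) is equidistant from all three points,
    # which yields two linear equations in a and b.
    A = np.array([[x2 - x1, y2 - y1],
                  [x3 - x1, y3 - y1]], dtype=float)
    c = 0.5 * np.array([x2**2 - x1**2 + y2**2 - y1**2,
                        x3**2 - x1**2 + y3**2 - y1**2], dtype=float)
    a, b = np.linalg.solve(A, c)   # fails if the three points are collinear (equating is then linear)
    r = np.hypot(x1 - a, y1 - b)   # circle radius
    x = np.asarray(x, dtype=float)
    # Use the upper or lower semicircle, whichever contains the middle point.
    sign = 1.0 if y2 >= b else -1.0
    return b + sign * np.sqrt(np.maximum(r**2 - (x - a)**2, 0.0))

# Hypothetical example: a 40-item form, end points fixed at (0, 0) and (40, 40),
# middle point taken from mean equating.
# equated = circle_arc_equate(np.arange(41), low=(0, 0), mid=(22.3, 24.1), high=(40, 40))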
165 2016 Annual Meeting & Training Sessions Electronic Board #13 Effect of Test Speededness on Item Parameter Estimation and Equating Can Shao, University of Notre Dame; Rongchun Zhu and Xiaohong Gao, ACT Test speededness often leads to biased parameter estimates and produces inaccurate equated scores, thus threatening test validity. In this study, we compare three different methods of dealing with test speededness and investigate their impact on item parameter estimation and equating. Electronic Board #14 Computation of Conditional Standard Error of Measurement with Compound Multinomial Models Hongling Wang, ACT, Inc. Compound multinomial models have been used to compute conditional standard error of measurement (CSEM) for tests containing polytomous scores. One problem hindering applications of these models is the great amount of computation for tests with complex item scoring. This study investigates strategies to simplify CSEM computation with compound multinomial models. Electronic Board #15 Exploring the Within-Item Speed-Accuracy Relationship with the Profile Method for Computer-Based Tests Shu-chuan Kao, Pearson The purpose of this study is to describe the effect of time on the item-person interaction for computer-based tests. The profile method shows the subgroup item difficulty conditioned on item latency. The profile trend can help testing practitioners easily inspect the effect of response time in empirical data. Electronic Board #16 Impact of Items with Minor Drift on Examinee Classification Aijun Wang, Yu Zhang and Lorin Mueller, Federation of State Boards of Physical Therapy This study examined the impact of items with minor drift on examinees’ classification accuracy at different ability levels. Results show that the pass/fail status of examinees at medium ability levels is more affected than at high or low ability levels. Electronic Board #17 Detecting DIF on Polytomous Items of Tests with Special Education Populations Kwang-lee Chu and Marc Johnson, Pearson; Pei-ying Lin, University of Saskatchewan Disability affects performance and interacts with gender/ethnicity; its impacts are more a matter of ability differences and should be isolated from DIF analysis. The effects of disability on polytomous item DIF analysis are examined. This study uses empirical data and simulations to investigate the accuracy of DIF models. Electronic Board #18 Online Calibration of Polytomous Items Using the Generalized Partial Credit Model Yi Zheng, Arizona State University Online calibration is a technology-enhanced calibration strategy that dynamically embeds pretest items in operational computerized adaptive tests and utilizes known operational item parameters to calibrate the pretest items. This study extends existing online calibration methods for dichotomous IRT models to the GPCM to model polytomous items such as performance-based items. 166 Washington, DC, USA Electronic Board #19 Identifying Intra-Individual Significant Growth in K-12 Reading and Mathematics with Adaptive Testing Chaitali Phadke, David Weiss and Theodore Christ, University of Minnesota Psychometrically significant intra-individual change in K-12 Math and Reading achievement was measured using the fixed-length (30-item) Adaptive Measurement of Change (AMC) method. Analyses indicated that the majority of change was nonlinear. Results supported the use of the AMC procedure for the detection of psychometrically significant change.
Electronic Board #20 A Comparison of Estimation Techniques for IRT Models with Small Samples Holmes Finch, Ball State University; Brian French, Washington State University Estimation accuracy of item response theory (IRT) model parameters is a concern with small samples. This can preclude the use of IRT and its associated advantages with low-incidence populations. This simulation study compares marginal maximum likelihood (ML) and pairwise estimation procedures. Results support the accuracy of pairwise estimation over ML. Electronic Board #21 Comparing Three Procedures for Preknowledge Detection in Computerized Adaptive Testing Jin Zhang and Ann Wang, ACT, Inc. One classical and two Bayesian procedures of item preknowledge detection based on the hierarchical lognormal response time model are compared for computerized adaptive testing. A simulation study is conducted to investigate the effectiveness of the methods in conditions with various proportions of items and examinees affected by item preknowledge. Electronic Board #22 Small Sample Equating for Different Uses of Test Scores in Higher Education HyeSun Lee, University of Nebraska-Lincoln; Katrina Roohr and Ou Lydia Liu, Educational Testing Service The current simulation examined four equating methods for small samples depending on the use of test scores in higher education. Mean equating performed better for the estimation of institution-level reliability, whereas identity equating performed slightly better for the estimation of value-added scores. The paper addresses practical implications of the findings. Electronic Board #23 Diagnostic Classification Modeling in Student Learning Progression Assessment Ruhan Circi, University of Colorado Boulder; Nathan Dadey, The National Center for the Improvement of Educational Assessment, Inc. A diagnostic classification model is used in this study to model a learning progression assessment. Results provided evidence of moderate item quality and support the use of the learning progression in the classroom to help students gain mastery of at least one learning outcome. 167 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Renaissance West A, Ballroom Level, Invited Session, L1 Learning from History: How K-12 Assessment Will Impact Student Learning Over the Next Decade (National Association of Assessment Directors) Session Organizer: Mary E Yakimowski, Sacred Heart University Session Panelists: Kenneth J Daly III Dale Whittington, Shaker Heights Schools Lou Fabrizio, North Carolina Department of Public Instruction Carlos Martínez, Jr, U.S. Department of Education James H McMillan, Virginia Commonwealth University Eva Baker, University of California, Los Angeles We have seen a remarkable evolution in the field of K-12 student assessment over the past 50 years. This increased attention has increased student learning, or has it? Through this invited session, you will hear panelists share insights from our history of K-12 assessment and offer lessons on how to best design and utilize assessment results so that they truly deepen student learning over the next decade. More specifically, this invited session brings together panelists representing practitioners (Mr. Kenneth J. Daly III, Dr. Dale Whittington), state and federal government agencies (Dr. Louis M. Fabrizio, Dr. Carlos Martinez) and higher education institutions (Dr. James H. McMillan, Dr. Eva Baker) with a combined experience in assessment of over 150 years.
For the introductory portion of this session, panelists have been charged with sharing reflections on significant developments in K-12 student assessment from the last half century. They will do this by reconstructing their collective memory of this assessment history. The major portion of the session will be allotted to the second charge given to the panelists; specifically, to present and discuss some lessons gained from this history to better construct and use assessments that are geared to deepen student learning during the next decade. The last part of this session will allow for interactions among the panelists and the audience on improving learning through assessments. 168 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, L2 Psychometric Issues on the Operational New-Generation Consortia Assessments Session Discussant: Timothy Davey, Educational Testing Service The theoretical foundation of online (adaptive and non-adaptive) testing has been well established. Basic components of computerized adaptive test (CAT) procedures and their implementations have also been sufficiently investigated from various perspectives (Weiss and Gage, 1984; Way, 2005; Davey, 2011). However, new and practical psychometric issues arose as online assessments moved to large-scale operational testing. In particular, newly developed new-generation Common Core State Standards (CCSS) aligned assessments became operational in a number of states. Psychometric designs affected by these changes, including scoring strategies, IRT model selection, and vertical scales, may have an impact on the validity of test scores. Furthermore, a complex test design with both a CAT component and a performance task was used for these CCSS-aligned assessments. The assessments also include innovative items in addition to traditional dichotomous and polytomous items. Therefore, findings and solutions from previous research may not be directly applicable to some of the issues that arise in operational online assessments. Innovative psychometric analyses and solutions are required. This session discusses the following practical psychometric issues addressed in the first-year operational practice of these new-generation CCSS-aligned assessments: (1) how to score an incomplete computerized adaptive test (CAT); (2) how to achieve an optimal balance between content/administration constraints and CAT efficiency in assessment designs that yield accurate ability estimates; and (3) which type of IRT model (unidimensional or multidimensional) produces more robust vertical scales for measuring student ability and growth. Three studies explore these questions using different psychometric and statistical methods based on operational data from multiple states or on simulations. Analyses and findings are not only useful in validating the characteristics of the assessments for future improvement, but will also inspire further investigation in these areas, which have not yet been fully explored.
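As background to question (1) above, one of several possible ways to score an incomplete CAT is simply to base the ability estimate on the items the examinee actually answered. The Python sketch below illustrates that option with an EAP estimate under an assumed 2PL model; it is illustrative only and does not describe any consortium's operational scoring rule (the function name and example values are hypothetical).

import numpy as np

def eap_theta(responses, a, b, nodes=np.linspace(-4, 4, 81)):
    # EAP ability estimate under a 2PL model, using only the items that were
    # administered and answered (one illustrative option for an incomplete
    # adaptive test, not an operational rule).
    responses = np.asarray(responses, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # 2PL probability of a correct response at each quadrature node
    p = 1.0 / (1.0 + np.exp(-a * (nodes[:, None] - b)))   # shape: (nodes, items)
    likelihood = np.prod(p**responses * (1.0 - p)**(1.0 - responses), axis=1)
    prior = np.exp(-0.5 * nodes**2)                        # standard normal prior (unnormalized)
    posterior = likelihood * prior
    return float(np.sum(nodes * posterior) / np.sum(posterior))

# Hypothetical example: five answered items from an unfinished adaptive test
# theta_hat = eap_theta([1, 1, 0, 1, 0],
#                       a=[1.2, 0.9, 1.5, 1.1, 0.8],
#                       b=[-0.5, 0.0, 0.3, 0.8, 1.2])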
Psychometric Issues and Approaches in Scoring Incomplete Online-Adaptive Tests Yi Du, Yanming Jiang, Terran Brown and Timothy Davey, Educational Testing Service Effects of CAT Designs on Content Balance and the Efficiency of Test Shudong Wang, Northwest Evaluation Association; Hong Jiao, University of Maryland Multidimensional Vertical Scaling for Tests with Complex Structures and Various Growth Patterns Yanming Jiang, Educational Testing Service 169 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, L3 Issues and Practices in Multilevel Item Response Models Session Chair: Ji Seung Yang, University of Maryland Session Discussant: Li Ci, University of California Educational assessment data are often collected under complex sampling designs that result in unavoidable dependency among examinees within clusters such as classrooms or schools. Multilevel item response theory (MLIRT) models have been developed (e.g., Adams, Wilson, and Wu, 1997; Fox, 2005; Kamata, 2001) to address the nested structure of item response data more properly and to draw sounder statistical inferences for both within- and between-cluster-level estimates (e.g., intraclass correlations or cluster-level latent scores). Combined with multidimensionality or local dependency among item responses (e.g., testlets), the complexity of multilevel item response models has increased and has drawn many methodologists’ attention with respect to issues and practices that cover not only modeling but also scoring and model choice. The purpose of this coordinated session is to introduce recent advanced topics in MLIRT and provide practical guidance to practitioners for implementing some of the extended MLIRT models. The session is composed of five papers. The first two papers are concerned with MLIRT models that properly reflect complex sampling designs, and the next two papers focus on the distribution of the latent density and on scoring at the between-cluster level. Finally, the last paper is on model selection methods in MLIRT. Multilevel Cross-Classified Dichotomous Item Response Theory Models for Complex Person Clustering Structures Chen Li and Hong Jiao, University of Maryland Multilevel Item Response Models with Sampling Weights Xiaying Zheng and Ji Seung Yang, University of Maryland School-Level Subscores Using Multilevel Item Factor Analysis Megan Kuhfeld and Li Cai, University of California Multilevel Item Bifactor Models with Nonnormal Latent Densities Ji Seung Yang, Ji An and Xiaying Zheng, University of Maryland Model Selection Methods for MLIRT Models: Gaining Information from Different Focused Parameters Xue Zhang and Jian Tao, Northeast Normal University; Chun Wang, University of Minnesota 170 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, L4 Psychometric Issues in Alternate Assessments Session Chair: Okan Bulut, University of Alberta Session Discussant: Michael Rodriguez, University of Minnesota Alternate assessments are designed for students with significant cognitive disabilities. They are characterized by semi-adaptive test designs, testlet-based forms, small sample sizes, and negatively skewed ability distributions.
This symposium aims to reflect on the common psychometric challenges in the context of alternate assessments, such as local item dependence (LID), differential item functioning (DIF), testlet and position effects, and the impact of cumulative item parameter drift (IPD). The alternate assessments used in this proposal are mixed-format tests that consist of both dichotomous and polytomous items. The first study explores the advantages of a four-level measurement model (1–item effect, 2–testlet effect, 3–person effect, and 4–disability type effect) for investigating local item dependence caused by item clustering and local person dependence caused by person clustering, relative to models that cannot handle both simultaneously. The second study employs the Linear Logistic Test Model (LLTM) to examine the consequences of item position and testlet position effects in alternate assessments. The use of the LLTM for investigating position effects in a semi-adaptive test form is demonstrated. The third study quantifies the advantages of three bi-factor models that take the testlet-based item structure into account and compares them with the 2PL IRT model. In addition, a DIF analysis based on each model included in the study is conducted, which helps clarify how the models differ in the context of DIF. The last study examines the cumulative impact of item parameter drift on item parameter and student ability estimates. It includes a Monte Carlo simulation for each operational administration in five states across three to nine years. Results from simulations and operational testing are compared. Effects of different equating methods are also compared. Multilevel Modeling of Item and Person Clustering Simultaneously in Alternate Assessments Chao Xie and Hyesuk Jang, American Institutes for Research Examining Item and Testlet Position Effects in Computer-Based Alternate Assessments Okan Bulut, University of Alberta; Xiaodong Hou and Ming Lei, American Institutes for Research An Application of Bi-Factor Model for Examining DIF in Alternate Assessments Hyesuk Jang and Chao Xie, American Institutes for Research Impact of Cumulative Drift on Parameter and Ability Estimates in Alternate Assessments Ming Lei, American Institutes for Research; Okan Bulut, University of Alberta 171 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 5, Meeting Room Level, Coordinated Session, L5 Recommendations for Addressing the Unintended Consequences of Increasing Examination Rigor Session Discussant: Betsy Becker, Florida State University The purpose of this symposium is to present findings from all development activities since the RTTT and address the unintended consequences of increasing examination rigor. The findings from the past 5 years of FTCE/FELE development, scoring, reporting, and standard setting procedures and outcomes will be presented. First, the FTCE/FELE program initiatives, as well as policy changes and outcomes that have occurred as a result of the increase in examination rigor, will be presented. Second, the session will provide an overview of the 1.5- to 2-year development cycle for the FTCE/FELE program and an in-depth explanation of the facilitation of each step in the test development process, based on the Standards for Educational and Psychological Testing. Third, the current psychometric, scoring and reporting, standard setting, and passing score adoption processes for the FTCE/FELE program will be discussed.
Lastly, an overview of educator candidates’ performance in response to the examinations’ increased rigor will be discussed, and analyses of student-level and test-level data will be presented to answer: What is the impact of increased rigor on the average difficulty of tests? Does increased rigor have a significant impact on test takers’ performances? Does increased rigor have a significant impact on passing rates? The Effect of Increased Rigor on Education Policy Phil Canto, Florida Department of Education Developing Assessments in an Ongoing Testing Environment Lauren White, Florida Department of Education FTCE/FELE Standard Setting and New Passing Scores: The Methodology Süleyman Olgar, Florida Department of Education Increased Rigor and Its Impact on Certification Examination Outcomes Onder Koklu, Florida Department of Education 172 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 15, Meeting Room Level, Paper Session, L6 Innovations in Assessment Session Discussant: TBA Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling Stephen Holmes, Michelle Meadows, Ian Stockford and Qingping He, Office of Qualifications and Examinations Regulation This research explores a new approach, combining comparative judgement and Rasch modelling, to investigate the comparability of difficulty of examinations. Findings from this study suggest that this approach could potentially be used as a proxy for pretesting assessments when security or other issues are a major concern. Improvements in Automated Capturing of Psycho-Linguistic Features in Reading Assessment Text Makoto Sano, Prometric This study explores psycho-linguistic features associated with reading passage MC item types that can be used to predict item difficulty levels of these item types. The effectiveness of new functions in the NLP tool PLIMAC (Sano, 2015) is evaluated using items from the NAEP Grade 8 Reading assessment. Generating Rubric Scores from Pairwise Comparisons Shayne Miel, Elijah Mayfield and David Adamson, Turnitin; Holly Garner, EverEd Technology Using pairwise comparisons to score essays on a holistic rubric is potentially a more reliable scoring method than traditional handscoring. We establish a metric for measuring the reliability of a scoring process and explore methods for assigning discrete rubric scores to the ranked list induced by the pairwise comparisons. Investigating Sequential Item Effects in a Testlet Model William Muntean and Joe Betts, Pearson Scenario-based assessments are well-suited for measuring professional decision-making skills such as clinical judgment. However, these types of items present a unique challenge to a testlet-based model because of potential sequential item effects. This research investigates the impact of sequential item effects within a testlet model. 173 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 12, Meeting Room Level, Paper Session, L7 Technology-Based Assessments Session Discussant: Mengxiao Zhu, ETS Theoretical Framework for Log-Data in Technology-Based Assessments with Empirical Applications from PISA Ulf Kroehne, Heiko Rölke, Susanne Kuger, Frank Goldhammer and Eckhard Klieme, German Institute for International Educational Research (DIPF) Indicators derived from log-data are often based on the ad hoc use of available events because a definition of log-data completeness has been missing.
This gap is filled with a theoretical framework that formalizes technology-based assessments with finite-state machines and provides completeness conditions, illustrated with empirical examples from PISA assessments. Investigating the Relations of Writing Process Features and the Final Product Chen Li, Mo Zhang and Paul Deane, Educational Testing Service Features extracted from the writing process, such as latency between keypresses, have the potential to provide evidence of one’s writing skills that is not available from the final product. This study investigates and compares the relations of process features with text quality as measured by two rubrics on writing fundamentals and higher-level skills. Interpretation of a Complex Assessment Focusing on Validity and Appropriate Reliability Assessment Steffen Brandt, Art of Reduction; Kristina Kögler, Goethe-Universität Frankfurt; Andreas Rausch, Universität Bamberg An analysis approach combining qualitative analyses of answer patterns and quantitative, IRT-based analyses is demonstrated on data from a test composed of three computer-based problem solving tasks (each 30-45 minutes). The strong qualitative component increases validity and additionally yields appropriate reliability estimates by avoiding local item dependence. Award Session: Brenda Loyd Dissertation Award 2016: Youn-Jeng Choi 174 Washington, DC, USA Monday, April 11, 2016 2:15 PM - 3:45 PM, Meeting Room 13/14, Meeting Room Level, Invited Session, L8 NCME Diversity and Testing Committee Sponsored Symposium: Implications of Computer-Based Testing for Assessing Diverse Learners: Lessons Learned from the Consortia Session Moderator: Priya Kannan, Educational Testing Service Session Discussant: Bob Dolan, Diverse Learners Consulting Six consortia developed and operationally delivered next-generation, large-scale assessments in 2015. These efforts provided opportunities to re-think the ways that assessment systems, and in particular computer-based tests, are designed to support valid assessment for all learners. In this session, representatives from each consortium will describe their lessons learned in the administration of computer-based tests to diverse learners.
Topics will include design features of the assessment systems that are intended to promote effective and inclusive assessment, research and evaluation on the 2014-15 assessment administration, and future challenges and opportunities. Smarter Balanced Assessment Consortium (SBAC) Tony Alpert, Smarter Balanced Assessment Consortium Partnership for Assessment of Readiness for College and Careers (PARCC) Trinell Bowman, Prince George’s County Public Schools in Maryland National Center and State Collaborative (NCSC) Rachel Quenemoen, National Center on Educational Outcomes Dynamic Learning Maps Alternate Assessment System (DLM) Russell Swinburne Romine, University of Kansas English Language Proficiency Assessment for the 21st Century (ELPA21) Martha Thurlow, National Center on Educational Outcomes WIDA Carsten Wilmes, University of Wisconsin 175 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 3:00 PM - 7:00 PM, Meeting Room 10/11, Meeting Room Level NCME Board of Directors Meeting Members of NCME are invited to attend as observers 176 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 8/9, Meeting Room Level, Coordinated Session, M1 Fairness Issues and Validation of Non-Cognitive Skills Session Chair: Haifa Matos-Elefonte, The College Board Session Discussant: Patrick Kyllonen, Educational Testing Service More research and attention are needed to ensure that assessments of noncognitive skills provide fair and valid inferences for all examinees. Four presenters will offer perspectives on non-cognitive skills and the fairness issues involved in assessing them in four contexts. The first presenter will discuss non-cognitive factors within the context of an international assessment, offering a framework for handling the interplay of cultural and linguistic diversity in developing the assessment to ensure fairness and valid interpretations for all test takers. The second presenter will provide an overview of non-cognitive skills in K-12 settings with thoughts on the issues surrounding the various threats to fair and valid interpretations. The third presenter will extend the evidence-centered-design approach to capture the needs of culturally and linguistically diverse populations in the design and development of a noncognitive assessment used in higher education, so as to ensure the fairness and validity of inferences for all examinees. The fourth presentation will provide an overview of the fairness issues involving non-cognitive measures in personnel selection and discuss specific aspects that permit these assessments to be used in fair and valid ways. Finally, a discussant will provide some comments on each of the presentations and offer additional insights.
Non-Cognitive Factors, Culture, and Fair and Valid Assessment of Culturally and Linguistically Diverse Learners Edynn Sato, Pearson Some Thoughts on Fairness Issues in Assessing Non-Cognitive Skills in K-12 Thanos Patelis, Center for Assessment An Application of Evidence-Centered-Design to Assess Collaborative Problem Solving in Higher Education Maria Elena Oliveri, Robert Mislevy and Rene Lawless, Educational Testing Service The Changing Use of Non-Cognitive Measures in Personnel Selection Kurt Geisinger, Buros Center for Testing, University of Nebraska-Lincoln 177 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 3, Meeting Room Level, Coordinated Session, M2 Thinking About Your Audience in Designing and Evaluating Score Reports Session Chair: Priya Kannan, Educational Testing Service Session Discussant: April Zenisky, University of Massachusetts, Amherst The information presented in score reports is often the single most important point of interaction between a score user and the outcomes of an assessment. Score reports are consumed by a variety of score users (e.g., test takers, parents, teachers, administrators, policy makers), and each of these users has a different level of understanding of the assessment and its intended outcomes. The degree to which these diverse users understand the information presented in score reports impacts their ability to draw reasonable conclusions. Recent score reporting frameworks have highlighted the importance of taking into account the needs, pre-existing knowledge, and attitudes of specific stakeholder groups (Zapata-Rivera & Katz, 2014) as well as the importance of iterative design in the development of score reports (Hambleton & Zenisky, 2013). The papers in this session employ a variety of methods to identify and understand the needs of diverse stakeholder groups, and the studies highlight the importance of sequential and iterative approaches (i.e., assessing needs, prototyping, and evaluating usability and accuracy of understanding) to the design and development of audience-focused score reports. This collection of studies demonstrates how a focus on stakeholder needs can bring substantive gains for the validity of interpretations and decisions made from assessment results. Designing and Evaluating Score Reports for a Medical Licensing Examination Amanda Clauser, National Board of Medical Examiners; Francis Rick, University of Massachusetts, Amherst Evaluating Validity of Score Reports with Diverse Subgroups of Parents Priya Kannan, Diego Zapata-Rivera and Emily Leibowitz, Educational Testing Service Designing Alternate Assessment Score Reports: Implications for Instructional Planning Amy Clark, Meagan Karvonen and Neal Kingston, University of Kansas Interactive Score Reports: a Strategic and Systematic Approach to Development Richard Tannenbaum, Priya Kannan, Emily Leibowitz, Ikkyu Choi and Spyridon Papageorgiou, Educational Testing Service Data Systems and Reports as Active Participants in Data Analyses Jenny Rankin, Illuminate Education 178 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 4, Meeting Room Level, Coordinated Session, M3 Use of Automated Tools in Listening and Reading Item Generation Session Chair: Su-Youn Yoon, ETS Session Discussant: Christy Schneider, Center for Assessment Creating a large pool of valid items with appropriate difficulty has been a continuing challenge for testing programs.
In order to address this need, several studies have focused on developing automated tools to predict the complexity of passages for reading or listening items. In addition to predicting text complexity, automated technologies can be used in a variety of ways in the context of item generation, which may contribute to increased efficiency, validity, and reliability in item development. This coordinated session will investigate the use of automated technology to support a wide range of processes for generating items that assess listening and reading skills. Aligning the TextEvaluator Reporting Scale with the Common Core Text Complexity Scale Kathleen Sheehan, ETS Prediction of Passage Acceptance/Rejection Using Linguistic Information Swapna Somasundaran, Yoko Futagi, Nitin Madnani, Nancy Glazer, Matt Chametsky and Cathy Wendler, ETS Measuring Text Complexity of Items for Adult English Language Learners Peter Foltz, Pearson and University of Colorado Boulder; Mark Rosenstein, Pearson Automatic Prediction of Difficulty of Listening Items Su-Youn Yoon, Anastassia Loukina, Youhua Wei and Jennifer Sakano, ETS Item Generation Using Natural Language Processing Based Tools and Resources Chong Min Lee, Melissa Lopez, Su-Youn Yoon, Jenifer Sakano, Anastassia Loukina, Bob Krovetz and Chi Lu, ETS 179 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 5, Meeting Room Level, Paper Session, M4 Practical Issues in Equating Session Discussant: Dongmei Li, ACT Empirical Item Characteristic Curve Pre-Equating with the Presence of Test Speededness Yuxi Qiu and Anne Huggins-Manley, University of Florida This simulation study is proposed to evaluate the accuracy of the empirical item characteristic curve (EICC) pre-equating method under combinations of varied levels of test speededness, sample size, and test length. Findings of this research provide guidelines for practitioners and promote better score equating practice. Investigating the Effect of Missing and Speeded Responses in Equating Hongwook Suh, JP Kim and Tony Thompson, ACT, Inc. This study investigates the effect on equating results of different ways of dealing with examinees who showed omitted and speeded responses, by applying the lognormal response time model (van der Linden, 2006). Empirical data are manipulated to design practical situations considered in the equating procedures. The Effects of Non-Representative Common Items on Linear Equating Relationships Lu Wang, ACT, Inc./The University of Iowa; Won-Chan Lee, University of Iowa This study investigates the effects of both content and statistical representation of common items on the accuracy of four linear equating relationships. The results of this study will assist practitioners in choosing the most accurate linear equating method(s) when the representativeness of common items is a concern. Pseudo-Equating Without Common Items or Common Persons Nooree Huh, Deborah Harris and Yu Fang, ACT, Inc. In some high-stakes testing programs, it is not possible to conduct standard equating such as common-item or random-groups equating because once an item is exposed, it is no longer secure. However, the need to compare scores across administrations may still exist. This paper demonstrates some alternative approaches.
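Several of the abstracts in this session refer to linear equating relationships; for reference, a linear equating function sets standardized deviation scores equal across forms,

\[
l_Y(x) \;=\; \mu_Y + \frac{\sigma_Y}{\sigma_X}\,\bigl(x - \mu_X\bigr),
\]

where, in a common-item (nonequivalent groups) design, the means and standard deviations refer to a synthetic population and are typically estimated by methods such as Tucker, Levine, or chained linear equating.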
Equating Item Difficulty Under Sub-Optimal Conditions Michael Walker, The College Board; Usama Ali, Educational Testing Service This paper evaluates two methods for equating item difficulty statistics: one using linear equating and the other using post-stratification. The paper evaluates these methods in terms of bias and error across a range of sample sizes and population ability differences, and across chains of equating of different lengths. Impact of Drifted Common Items on Proficiency Estimates Under the CIECP Design Juan Chen, Andrew Mroch, Mengyao Zhang, Joanne Kane, Mark Connally and Mark Albanese, National Conference of Bar Examiners The authors explore the detection and impact of drifted common items on examinee proficiency estimates and examinee classification. Two different detection methods, two approaches to setting item parameter estimates, and two different linking methods are examined. Both practical and theoretical implications of the findings are discussed. 180 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 16, Meeting Room Level, Paper Session, M5 The Great Subscore Debate Session Discussant: Sandip Sinharay, Pacific Metrics How Worthless Subscores Are Causing Excessively Long Tests Howard Wainer and Richard Feinberg, National Board of Medical Examiners Previous research overwhelmingly confirms the paucity of subscores worth reporting for either individuals or institutions. Given the excessive length of most standardized tests, particularly licensure/credentialing examinations, and the lack of evidence to support reporting more than a single score, we illustrate an approach for reducing test length while minimizing additional pass/fail misclassification. An Alternative Perspective on Subscores and Their Value Yuanchao Emily Bo, Mark Hansen and Li Cai, University of California, Los Angeles; Charles Lewis, Educational Testing Service, Fordham University Recent work has shown that observed subscores are often worse predictors of true subscores than the total score. However, we propose here that it is the specific component of the subscore that should be used to judge its value. From this perspective, we reach a quite different conclusion. Masking Distinct and Reliable Subscores: A Call to Assess Added Value Invariance Joseph Rios, Educational Testing Service Subscore added value is commonly assessed for the total sample; however, this study found that up to 30% of examinees with added value can be masked when treating subscores as invariant across groups. Therefore, we should consider that subscores may be valid and reliable for some examinees and not all. Why Do Value Added Ratios Differ Under Different Scoring Approaches? Brian Leventhal, University of Pittsburgh; Jonathan Rubright, American Institute of Certified Public Accountants Using classical test theory, Haberman (2008) developed an approach to calculate whether a subscore has value in being reported. This paper shows how value added ratios differ under item response theory, and provides an empirical example showing how various scoring options under IRT impact this ratio. Accuracy of the Person-Level Index for Conditional Subscore Reporting Richard Feinberg and Mark Raymond, National Board of Medical Examiners Recent research has proposed a conditional index to detect subscore value for certain test takers when more conventional methods suggest not reporting at all.
The current study furthers this research by investigating conditions under which conditional indices detect potentially meaningful score profiles that may be worthy of reporting. The Validity of Augmented Subscores When Used for Different Purposes Marc Gessaroli, National Board of Medical Examiners The validity of augmented subscores has been debated in the literature. This paper studies the validity of augmented subscores when they are used for different purposes. The findings suggest that the usefulness of augmented subscores varies depending upon the intended use of the scores. 181 2016 Annual Meeting & Training Sessions Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 12, Meeting Room Level, Paper Session, M6 Scores and Scoring Rules Session Discussant: Steven Culpepper, University of Illinois The Relationship Between Pass Rate and Multiple Attempts Ying Cheng and Cheng Liu, University of Notre Dame We analytically derive the relationship between the expected conditional and marginal pass rates and the number of allowed attempts at a test under two definitions of pass rate. It is shown that, depending on the definition, the pass rate can go up or down with the number of attempts. Classification Consistency and Accuracy with Atypical Score Distributions Stella Kim and Won-Chan Lee, The University of Iowa The primary purpose of this study is to evaluate the relative performance of various procedures for estimating classification consistency and accuracy indices with atypical score distributions. Three simulation studies are conducted, each of which is associated with an atypical observed score distribution. A Psychometric Evaluation of Item-Level Scoring Rules for Educational Tests Frederik Coomans and Han van der Maas, University of Amsterdam; Peter van Rijn, ETS Global, Amsterdam; Marjan Bakker, Tilburg University; Gunter Maris, Cito Institute for Educational Measurement and University of Amsterdam We develop a modeling framework in which psychometric models can be constructed directly from a scoring rule for dichotomous and polytomous items. By assessing the fit of such a model, we can infer the extent to which the population of test takers responds in accordance with the scoring rule. For Want of Subscores in Large-Scale Educational Survey Assessment: A Simulation Study Nuo Xi, Yue Jia, Xueli Xu and Longjuan Liang, Educational Testing Service The objective of the simulation study is to investigate the impact of varying the length of content area subscales (overall and per examinee) on their prospective use in large-scale educational survey assessments. Sample size and estimation method are also controlled to evaluate the overall effect on the estimation of group statistics. Comparability of Essay Scores Across Response Modes: A Complementary View Using Multiple Approaches Nina Deng and Jennifer Dunn, Measured Progress This study evaluates the comparability of essay scores between computer-typed and handwritten responses. Multiple approaches were integrated to provide a complementary view for assessing both the statistical and practical significance of essay score differences at the factorial, scoring-dimension, and item levels. 182 Washington, DC, USA Monday, April 11, 2016 4:05 PM - 6:05 PM, Meeting Room 13/14, Meeting Room Level, Invited Session, M7 On the Use and Misuse of Latent Variable Scores Session Presenter: Anders Skrondal, Norwegian Institute of Public Health One major purpose of latent variable modeling is the scoring of latent variables, such as ability estimation.
Another purpose is the investigation of relationships among latent (and possibly observed) variables. In this case, the state-of-the-art approach is simultaneous estimation of a measurement model (for the relationships between latent variables and the items measuring them) and a structural model (for the relationships between different latent variables and between latent and observed variables). An alternative approach, which is considered naive, is to use latent variable scores as proxies for latent variables. Here, estimation is simplified by first estimating the measurement model and obtaining latent variable scores, and subsequently treating the latent variable scores as observed variables in standard regression analyses. This approach will generally produce invalid estimates for the target parameters in the structural model, but we will demonstrate that valid estimates can be obtained if the scoring methods are judiciously chosen. Furthermore, the proxy approach can be superior to the state-of-the-art approach because it protects against certain misspecifications and allows doubly-robust causal inference in a class of latent variable models. 183 2016 Annual Meeting & Training Sessions 184 Washington, DC, USA Participant Index A Bertling, Masha . . . . . . . . . . . . 74, 120 Betebenner, Damian . . . . . . . . . . . . 44, 130 Betts, Joe . . . . . . . . . . . . 173 Beverly, Tanesia . . . . . . . . . . . . 89 Beymer, Lisa . . . . . . . . . . . . 74, 120 Bian, Yufang . . . . . . . . . . . . 61 Blood, Ian . . . . . . . . . . . . 84 Bo, Yuanchao Emily . . . . . . . . . . . . 181 Boeck, Paul De . . . . . . . . . . . . 49 Bohrnstedt, George . . . . . . . . . . . . 117 Bolt, Daniel . . . . . . . . . . . . 133 Bolton, Sarah . . . . . . . . . . . . 157 Bond, Mark . . . . . . . . . . . . 159 Bonifay, Wes . . . . . . . . . . . . 52 Bottge, Brian . . . . . . . . . . . . 106 Boughton, Keith . . . . . . . . . . . . 65, 152 Boulais, André-Philippe . . . . . . . . . . . . 164 Bowman, Trinell . . . . . . . . . . . . 175 Boyer, Michelle . . . . . . . . . . . . 122, 164 Bradshaw, Laine . . . . . . . . . . . . 74, 86, 120, 127, 127 Brandstrom, Adele . . . . . . . . . . . . 71 Brandt, Steffen . . . . . . . . . . . . 174 Braun, Henry . . . . . . . . . . . . 40 Brennan, Robert . . . . . . . . . . . . 30, 63, 131 Brenner, Daniel . . . . . . . . . . . . 160 Breyer, F. Jay . . . . . . . . . . . . 45 Breyer, Jay . . . . . . . . . . . . 150 Bridgeman, Brent . . . . . . . . . . . . 134 Briggs, Derek . . . . . . . . . . . . 44 Brijmohan, Amanda . . . . . . . . . . . . 109 Broaddus, Angela . . . . . . . . . . . . 72 Broer, Markus . . . . . . . . . . . . 117 Brophy, Tim . . . . . . . . . . . . 40 Brown, Derek . . . . . . . . . . . . 55 Brown, Emily . . . . . . .
. . . . . . . . . . . . . . 109 Brown, Nathaniel . . . . . . . . . . . . . . . . . . . . . . . 40 Brown, Terran . . . . . . . . . . . . . . . . . . . . . . . . . 169 Brusilovsky, Peter . . . . . . . . . . . . . . . . . . . . . . 106 Brussow, Jennifer . . . . . . . . . . . . . . . . . . . . . . . 91 Bryant, Rosalyn . . . . . . . . . . . . . . . . . . . . . . . . 76 Buchholz, Janine . . . . . . . . . . . . . . . . . . . . . . 132 Buckendahl, Chad . . . . . . . . . . . . . . 18, 83, 101, 112 Buckley, Barbara . . . . . . . . . . . . . . . . . . . . . . . 160 Buckley, Jack . . . . . . . . . . . . . . . . . . . . . . . . . 128 Budescu, David . . . . . . . . . . . . . . . . . . . . . . . 105 Bukhari, Nurliyana . . . . . . . . . . . . . . . . . . . . . . 65 Bulut, Okan . . . . . . . . . . . . . . . . . .60, 171, 171, 171 Burstein, Jill . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Bushaw, Bill . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Abad, Francisco . . . . . . . . . . . . . . . . . . . . . . . . 72 Adamson, David . . . . . . . . . . . . . . . . . . . . . . . 173 Adesope, Olusola . . . . . . . . . . . . . . . . . . . . . . . 88 Adhikari, Sam . . . . . . . . . . . . . . . . . . . . . . . . 146 Aguado, David . . . . . . . . . . . . . . . . . . . . . . . . . 72 Akbay, Lokman . . . . . . . . . . . . . . . . . . . . . . . . 74 Albanese, Mark . . . . . . . . . . . . . . . . . . . . . . . 180 Albano, Anthony . . . . . . . . . . . . . . . . . . . . 32, 163 Ali, Usama . . . . . . . . . . . . . . . . . . . . . 116, 154, 180 Allexsaht-Snider, Martha . . . . . . . . . . . . . . . . . 104 Almond, Russell . . . . . . . . . . . . . . . . . . . . . . . . 70 Alpert, Tony . . . . . . . . . . . . . . . . . . . . . . . . 67, 175 Alzen, Jessica . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Amati, Lucy . . . . . . . . . . . . . . . . . . . . . . . . . . 118 An, Ji . . . . . . . . . . . . . . . . . . . . . . . . . . . 150, 170 Anderson, Daniel . . . . . . . . . . . . . . . . . . . . . . . 59 Andrews, Benjamin . . . . . . . . . . . . . . . . . . . 46, 105 Andrich, David . . . . . . . . . . . . . . . . . . . . . . . . 165 Ankenmann, Robert . . . . . . . . . . . . . . . . . . . . . 63 Antal, Judit . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Austin, Bruce . . . . . . . . . . . . . . . . . . . . . . . . . . 88 B Baker, Eva . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 Baker, Ryan . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Bakker, Marjan . . . . . . . . . . . . . . . . . . . . . . . . 182 Balamuta, James . . . . . . . . . . . . . . . . . . . . . . . 155 Banks, Kathleen . . . . . . . . . . . . . . . . . . . . . . . 154 Bao, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Barocas, Solon . . . . . . . . . . . . . . . . . . . . . . . . . 48 Barrada, Juan . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Barrett, Michelle . . . . . . . . . . . . . . . . . . . . . 16, 113 Barry, Carol . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Barton, Karen . . . . . . . . . . . . . . . . . . . . . . . 53, 152 Bashkov, Bozhidar . . . . . . . . . . . . . . . . . . . . . . . 73 Baumer, Michal . . . . . . . . . . . . . . . . . . . . . . . . 83 Bazaldua, Diego Luna . . . . . . . . . . . . . . . . . . . 124 Beard, Jonathan . . . . . . . . . . . . . . . . . . . . . . . 128 Becker, Betsy . . . . . . . . . . . . . . . . . . . . . . . . . 172 Bejar, Isaac . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Bejar, Isaac I. . . . . . . . . . . . . 
. . . . . . . . . . . . . . 49 Belov, Dmitry . . . . . . . . . . . . . . . . . . . . . . . 54, 113 Bennett, Randy . . . . . . . . . . . . . 47, 47, 142, 160, 160 Benson, Martin . . . . . . . . . . . . . . . . . . . . . . . . . 53 Bertling, Jonas . . . . . . . . . . . . . . . . . . . . 17, 57, 57 Bertling, Maria . . . . . . . . . . . . . . . . . . . . . . . . . 53 185 2016 Annual Meeting & Training Sessions Participant Index Buxton, Cory . . . . . . . . . . . . . . . . . . . . . . . . . 104 Buzick, Heather . . . . . . . . . . . . . . . . . . . . . . . 150 Choe, Edison . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Choi, Hye-Jeong . . . . . . . . . . . . . . . . . . . . . . . 106 Choi, Ikkyu . . . . . . . . . . . . . . . . . . . . . . . . . . 178 Choi, In-Hee . . . . . . . . . . . . . . . . . . . . . . . 68, 118 Choi, Jinah . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Choi, Jiwon . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Choi, Kilchan . . . . . . . . . . . . .151, 151, 151, 151, 162 Christ, Theodore . . . . . . . . . . . . . . . . . . . . . . . 167 Chu, Kwang-lee . . . . . . . . . . . . . . . . . . . . . . . 166 Chung, Kyung Sun . . . . . . . . . . . . . . . . . . . . . 133 Chung, Seunghee . . . . . . . . . . . . . . . . . . . . 63, 123 Ci, Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Circi, Ruhan . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Cizek, Greg . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Clark, Amy . . . . . . . . . . . . . . . . . . . . . . . . 51, 178 Clauser, Amanda . . . . . . . . . . . . . . . . . . . . 43, 178 Clauser, Brian . . . . . . . . . . . . . . . . . . . . . . . . . 142 Clauser, Jerome . . . . . . . . . . . . . . . . . . . . . 63, 142 Cohen, Allan . . . . . . . . . . . . . . . . . . . . . . 104, 106 Cohen, Allan S. . . . . . . . . . . . . . . . . . . . . . . . . . 73 Cohen, Jon . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Cohen, Michael . . . . . . . . . . . . . . . . . . . . . . . 100 Colvin, Kimberly . . . . . . . . . . . . . . . . . . . . . . . 107 Conaway, Carrie . . . . . . . . . . . . . . . . . . . . . . . . 67 Conforti, Peter . . . . . . . . . . . . . . . . . . . . . . . . 159 Confrey, Jere . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Connally, Mark . . . . . . . . . . . . . . . . . . . . . . . . 180 Cook, Linda . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Coomans, Frederik . . . . . . . . . . . . . . . . . . . . . 182 Cottrell, Nicholas . . . . . . . . . . . . . . . . . . . . . . . 84 Crabtree, Ashleigh . . . . . . . . . . . . . . . . . . . . . . 62 Crane, Samuel . . . . . . . . . . . . . . . . . . . . . . . . 147 Croft, Michelle . . . . . . . . . . . . . . . . . . . . . . . . . 55 Crouch, Lori . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Cui, Zhongmin . . . . . . . . . . . . . . . . . . . . . . 85, 106 Cukadar, Ismail . . . . . . . . . . . . . . . . . . . . . . . . 120 Culpepper, Steven . . . . . . . . . . . . . . . . . . 155, 182 Cúri, Mariana . . . . . . . . . . . . . . . . . . . . . . . . . . 90 C Cahill, Aoife . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Cai, Li . . . 27, 130, 143, 151, 151, 155, 162, 162, 170, 181 Cai, Liuhan . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Cai, Yan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Cain, Jessie Montana . . . . . . . . . . . . . . . . . . . . 134 Caliço, Tiago . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Camara, Wayne . . . . . . . . . . . . . . 
. . 56, 81, 112, 145 Camilli, Greg . . . . . . . . . . . . . . . . . . . . . . . . . 133 Canto, Phil . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 Carstensen, Claus . . . . . . . . . . . . . . . . . . . . . . 155 Casabianaca, Jodi . . . . . . . . . . . . . . . . . . . . . . . 91 Casabianca, Jodi . . . . . . . . . . . . . . 159, 159, 159, 159 Castellano, Katherine Furgol . . . . . . 53, 59, 80, 80, 80, 130, 136, 141 Cavalie, Carlos . . . . . . . . . . . . . . . . . . . . . . . . . 84 Chajewski, Michael . . . . . . . . . . . . . . . . . . . . . . 50 Chametsky, Matt . . . . . . . . . . . . . . . . . . . . . . 179 Champlain, André De . . . . . . . . . . . . . . . . . . . 164 Chang, Hua-Hua . . . . . . . . . . . . . . . . . . . . . .61, 65 Chang, Hua-hua . . . . . . . . . . . . . . . . . . . . . . . . 91 Chang, Hua-Hua . . . . . . . . . . . . . . . . . . . . . . . 116 Chattergoon, Rajendra . . . . . . . . . . . . . . . . 126, 153 Chatterji, Madhabi . . . . . . . . . . . . . . . . . . . . . 117 Chayer, David . . . . . . . . . . . . . . . . . . . . . . . . . 152 Chen, Feng . . . . . . . . . . . . . . . . . . . . . . . . 72, 122 Chen, Hanwei . . . . . . . . . . . . . . . . . . . . . . . . . 85 Chen, Hui-Fang . . . . . . . . . . . . . . . . . . . . . . . 108 Chen, I-Chien . . . . . . . . . . . . . . . . . . . . . . . . . 146 Chen, Jie . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Chen, Jing . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Chen, Juan . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Chen, Keyu . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Chen, Pei-Hua . . . . . . . . . . . . . . . . . . . . . . . . . 83 Chen, Ping . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Chen, Tingting . . . . . . . . . . . . . . . . . . . . . . . . . 83 CHEN, TINGTING . . . . . . . . . . . . . . . . . . . . . . . 135 Chen, Xin . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Cheng, Britte . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Cheng, Ying . . . . . . . . . . . . . . . . . . . . . . . 154, 182 Chien, Yuehmei . . . . . . . . . . . . . . . . . . . . . . . . 86 Childs, Ruth . . . . . . . . . . . . . . . . . . . . . . . 109, 161 Cho, Youngmi . . . . . . . . . . . . . . . . . . . . . . . . . 62 Cho, YoungWoo . . . . . . . . . . . . . . . . . . . . . . . . 70 D d’Brot, Juan . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Dabbs, Beau . . . . . . . . . . . . . . . . . . . . . . . . . 146 Dadey, Nathan . . . . . . . . . . . . . . . . . . 107, 126, 167 Dai, Shenghai . . . . . . . . . . . . . . . . . . . . . . 62, 135 Daniels, Vijay . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Davey, Tim . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 Davey, Timothy . . . . . . . . . . . . . . . . . . 66, 169, 169 186 Washington, DC, USA Participant Participant Index Index F Davier, Alina von . . . .22, 41, 48, 60, 60, 85, 90, 105, 118 Davier, Matthias von . . . . . . . . . . . . . . . . . . . . . 57 Davier, Matthias Von . . . . . . . . . . . . . . . . . . . . . 68 Davier, Matthias von . . . . . . . .103, 103, 103, 103, 155 Davis, Laurie . . . . . . . . . . . . . . . . . . . . . 84, 84, 142 Davis-Becker, Susan . . . . . . . . . . . . . . . . . . . . . . 71 Deane, Paul . . . . . . . . . . . . . . . . . . . . . . . . . . 174 Debeer, Dries . . . . . . . . . . . . . . . . . . . . . . . . . 116 DeCarlo, Larry . . . . . . . . . . . . . . . . . . . . . . . . . 61 DeCarlo, Lawrence . . . . . . . . . . . . . . . . . . . . . . 
86 DeMars, Christine . . . . . . . . . . . . . . . . . . . . 73, 107 Denbleyker, Johnny . . . . . . . . . . . . . . . . . . . . . 130 Deng, Hui . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Deng, Nina . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Deters, Lauren . . . . . . . . . . . . . . . . . . . . . . . . . 43 Dhaliwal, Tasmin . . . . . . . . . . . . . . . . . . . . . . . 127 Diakow, Ronli . . . . . . . . . . . . . . . . . . . . . . . . . . 68 Diao, Hongyu . . . . . . . . . . . . . . . . . . . . . . 121, 165 DiBello, Lou . . . . . . . . . . . . . . . . . . . . . . . . . . 136 DiCerbo, Kristen . . . . . . . . . . . . . . . . . . . . . . . 127 Ding, Shuliang . . . . . . . . . . . . . . . . . . . . . . 91, 116 Dodd, Barbara . . . . . . . . . . . . . . . . . . . . . . . . . 91 Dodson, Jenny . . . . . . . . . . . . . . . . . . . . . . . . . 42 Dolan, Bob . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Domingue, Benjamin . . . . . . . . . . . . . . . . . . . . . 89 Donoghue, John . . . . . . . . . . . . 52, 52, 115, 119, 133 Dorans, Neil . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Dorn, Sherman . . . . . . . . . . . . . . . . . . . . . . . . 157 Doromal, Justin . . . . . . . . . . . . . . . . . . . . . . . 115 Drasgow, Fritz . . . . . . . . . . . . . . . . . . . . . . . . 142 Du, Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . 143, 169 Dunbar, Stephen . . . . . . . . . . . . . . . . . . . 108, 145 Dunbar, Steve . . . . . . . . . . . . . . . . . . . . . . . . . 43 Dunn, Jennifer . . . . . . . . . . . . . . . . . . . . . . 18, 182 Dunya, Beyza Aksu . . . . . . . . . . . . . . . . . . . . . . 60 Fabrizio, Lou . . . . . . . . . . . . . . . . . . . . . . . . . 168 Fahle, Erin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Famularo, Lisa . . . . . . . . . . . . . . . . . . . . . . . . . 63 Fan, Meichu . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Fan, Yuyu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Fang, Guoliang . . . . . . . . . . . . . . . . . . . . . . . . 136 Fang, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Farley, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Feinberg, Richard . . . . . . . . . . . . . . . . . . . 181, 181 Ferrara, Steve . . . . . . . . . . . . . . . . . . . . . . . .28, 49 Fina, Anthony . . . . . . . . . . . . . . . . . . . . . . . . 165 Finch, Holmes . . . . . . . . . . . . . . . . . . . . . . . . 167 Finger, Michael . . . . . . . . . . . . . . . . . . . . . . . . 163 Finn, Chester . . . . . . . . . . . . . . . . . . . . . . . . . 100 Foltz, Peter . . . . . . . . . . . . . . . . . . . . . . . . 29, 179 Forte, Ellen . . . . . . . . . . . . . . . . . . . . . . . . 79, 112 Freeman, Leanne . . . . . . . . . . . . . . . . . . . . . . 164 French, Brian . . . . . . . . . . . . . . . . . . . . . . . 88, 167 Frey, Andreas . . . . . . . . . . . . . . . . . . . . . . . . . 153 Fu, Yanyan . . . . . . . . . . . . . . . . . . . . . . . . . 86, 136 Fung, Karen . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Futagi, Yoko . . . . . . . . . . . . . . . . . . . . . . . . . . 179 G Gafni, Naomi . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Gandara, Fernanda . . . . . . . . . . . . . . . . . . . . . 162 Gao, Furong . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Gao, Lingyun . . . . . . . . . . . . . . . . . . . . . . . . . 109 Gao, Xiaohong . . . . . . . . . . . . . 58, 90, 136, 162, 166 Garcia, Alejandra . . . . . . . . . . . . . . . . . . . . . . 
162 Garner, Holly . . . . . . . . . . . . . . . . . . . . . . . . . 173 Gawade, Nandita . . . . . . . . . . . . . . . . . . . . . . . 59 Geis, Eugene . . . . . . . . . . . . . . . . . . . . . . . . . 133 Geisinger, Kurt . . . . . . . . . . . . . . . . . . . . . 142, 177 Gelbal, Selahattin . . . . . . . . . . . . . . . . . . . . . . . 45 Gessaroli, Marc . . . . . . . . . . . . . . . . . . . . . . . . 181 Gianopulos, Garron . . . . . . . . . . . . . . . . . . . . . . 44 Gierl, Mark . . . . . . . . . . . . . . . . . . . 53, 83, 142, 147 Gill, Brian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Gitchel, Dent . . . . . . . . . . . . . . . . . . . . . . . . . 148 Glazer, Nancy . . . . . . . . . . . . . . . . . . . . . . . . . 179 Goldhammer, Frank . . . . . . . . . . . . . . . 133, 153, 174 Gong, Brian . . . . . . . . . . . . . . . . . . . . . . . 107, 126 Gonzalez, Oscar . . . . . . . . . . . . . . . . . . . . . . . 120 González-Brenes, José . . . . . . . . . . . . . 106, 144, 144 González-Brenes, José Pablo . . . . . . . . . . . . . . . . 53 E Easton, John . . . . . . . . . . . . . . . . . . . . . . . . . 157 Egan, Karla . . . . . . . . . . . . . . . . . . . . . 18, 101, 102 Embretson, Susan . . . . . . . . . . . . . . . . . . . . . . . 49 Engelhardt, Lena . . . . . . . . . . . . . . . . . . . . . . 153 Ercikan, Kadriye . . . . . . . . . . . . . . . . . . . . . . . . 40 Erickan, Kadriye . . . . . . . . . . . . . . . . . . . . . . . . 99 Evans, Carla . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Ewing, Maureen . . . . . . . . . . . . . . . . . . . . . . . 128 187 2016 Annual Meeting & Training Sessions Participant Index Gotch, Chad . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Grabovsky, Irina . . . . . . . . . . . . . . . . . . . . 131, 148 Graesser, Art . . . . . . . . . . . . . . . . . . . . . . . . . . 41 Graf, Edith Aurora . . . . . . . . . . . . . . . . . . . . 49, 109 Greiff, Samuel . . . . . . . . . . . . . . . . . . . . . . 41, 103 Griffin, Patrick . . . . . . . . . . . . . . . . . . . . . . . . . 89 Grochowalski, Joe . . . . . . . . . . . . . . . . . . . . . . . 58 Grochowalski, Joseph . . . . . . . . . . . . . . . . . . . 148 Groos, Janet Koster van . . . . . . . . . . . . . . . . . . . 88 Grosse, Philip . . . . . . . . . . . . . . . . . . . . . . . . . 123 Gu, Lixiong . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Guerreiro, Meg . . . . . . . . . . . . . . . . . . . . . . . . 108 Gunter, Stephen . . . . . . . . . . . . . . . . . . . . . . . 163 Guo, Hongwen . . . . . . . . . . . . . . . . . . . .65, 88, 119 Guo, Qi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Guo, Rui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Henson, Robert . . . . . . . . . . . . . . . . . . 86, 110, 136 Herman, Joan . . . . . . . . . . . . . . . . . . . . . . . . . 47 Herrera, Bill . . . . . . . . . . . . . . . . . . . . . . . . 43, 108 Heuvel, Jill R. van den . . . . . . . . . . . . . . . . . . . 141 Hillier, Tracey . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Himelfarb, Igor . . . . . . . . . . . . . . . . . . . . . . . . 136 Ho, Andrew . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Ho, Emily . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Hochstedt, Kirsten . . . . . . . . . . . . . . . . . . . . . 123 Hochweber, Jan . . . . . . . . . . . . . . . . . . . . . . . . 43 Hoff, David . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Hogan, Thomas . . . . . . . . . . . . . . . . . . . . . . . 
110 Holmes, Stephen . . . . . . . . . . . . . . . . . . . . . . 173 Hong, Guanglei . . . . . . . . . . . . . . . . . . . . . . . . 21 Hou, Likun . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Hou, Xiaodong . . . . . . . . . . . . . . . . . . . . . . . . 171 Houts, Carrie R. . . . . . . . . . . . . . . . . . . . . . . . . . 27 Huang, Cheng-Yi . . . . . . . . . . . . . . . . . . . . . . . 83 Huang, Chi-Yu . . . . . . . . . . . . . . . . . . . . . . . . 164 Huang, Kevin (Chun-Wei) . . . . . . . . . . . . . . . . . 160 Huang, Xiaorui . . . . . . . . . . . . . . . . . . . . . . . . . 46 Huang, Yun . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Huff, Kristen . . . . . . . . . . . . . . . . . . . . . . . . . 149 Huggins-Manley, Anne . . . . . . . . . . . . . . . . . . . 180 Huggins-Manley, Anne Corinne . . . . . . . . . . . . . . 46 Hughes, Malorie . . . . . . . . . . . . . . . . . . . . . . . 147 Huh, Nooree . . . . . . . . . . . . . . . . . . . . . . 164, 180 Hunter, C. . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Huo, Yan . . . . . . . . . . . . . . . . . . . . . . . . . . 90, 143 Hurtz, Gregory . . . . . . . . . . . . . . . . . . . . . . 54, 134 Hwang, Dasom . . . . . . . . . . . . . . . . . . . . . . . . .75 H Haberman, Shelby . . . . . . . . . . . . . . . . . . . 80, 150 Hacker, Miriam . . . . . . . . . . . . . . . . . . . . . . . . 133 Haertel, Edward . . . . . . . . . . . . . . . . . . . . . . . 142 Hain, Bonnie . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Hakuta, Kenji . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Hall, Erika . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 Han, HyunSuk . . . . . . . . . . . . . . . . . . . . . . . . . 46 Han, Kyung Chris . . . . . . . . . . . . . . . . . . . . . . . 22 Han, Zhuangzhuang . . . . . . . . . . . . . . . . . . . . 103 Hansen, Mark . . . . . . . . . . . . . . . . .73, 135, 162, 181 Hao, Jiangang . . . . . . . . . . . . . . . . . . . . . . . . . 41 Happel, Jay . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Harnly, Aaron . . . . . . . . . . . . . . . . . . . . . . . . . 147 Harrell, Lauren . . . . . . . . . . . . . . . . . . . . . . 57, 155 Harring, Jeffrey . . . . . . . . . . . . . . . . . . . . . . . . . 62 Harris, Debora . . . . . . . . . . . . . . . . . . . . . . . . 152 Harris, Deborah . . . . . . . . . . . . . . . . . . . . . 25, 180 Hartig, Johannes . . . . . . . . . . . . . . . . . . . . 43, 132 Hattie, John . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Hayes, Benjamin . . . . . . . . . . . . . . . . . . . . . . . . 51 Hayes, Heather . . . . . . . . . . . . . . . . . . . . . . . . 163 Hayes, Stacy . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Hazen, Tim . . . . . . . . . . . . . . . . . . . . . . . . . . 161 He, Qingping . . . . . . . . . . . . . . . . . . . . . . 154, 173 He, Qiwei . . . . . . . . . . . . . . . . . . . . . 103, 103, 103 He, Yong . . . . . . . . . . . . . . . . . . . . . . . .70, 85, 106 Hebert, Andrea . . . . . . . . . . . . . . . . . . . . . . . 135 Hembry, Tracey . . . . . . . . . . . . . . . . . . . . . . . 127 Hendrie, Caroline . . . . . . . . . . . . . . . . . . . . . . 149 I Iaconangelo, Charles . . . . . . . . . . . . . . . . . . 80, 123 III, Kenneth J Daly . . . . . . . . . . . . . . . . . . . . . . 168 Ing, Pamela . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Insko, Bill . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Insko, William . . . . . . . . . . . . . . . . . . . . . . . . . . 
71 Invernizzi, Marcia . . . . . . . . . . . . . . . . . . . . . . . 50 Irribarra, David Torres . . . . . . . . . . . . . . . . . . . . 68 Iverson, Andrew . . . . . . . . . . . . . . . . . . . . . . . . 76 J Jacovidis, Jessica . . . . . . . . . . . . . . . . . . . . . . 107 Jang, Hyesuk . . . . . . . . . . . . . . . . . . . . 73, 171, 171 Jang, Yoonsun . . . . . . . . . . . . . . . . . . . . . . . . . 77 188 Washington, DC, USA Participant Index Jess, Nicole . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Jewsbury, Paul . . . . . . . . . . . . . . . . . . . . . . . . . 57 Ji, Grace . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Jia, Helena . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Jia, Yue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Jiang, Shengyu . . . . . . . . . . . . . . . . . . . . . . . . . 75 Jiang, Yanming . . . . . . . . . . . . . . . . . . . . . 169, 169 Jiang, Zhehan . . . . . . . . . . . . . . . . . . . . . . . . 134 Jiao, Hong . . . . . . . . . . . . . . . . 46, 89, 109, 169, 170 Jin, Kuan-Yu . . . . . . . . . . . . . . . . . . . . . . . 107, 108 Jin, Rong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Johnson, Evelyn . . . . . . . . . . . . . . . . . . . . . 74, 120 Johnson, Marc . . . . . . . . . . . . . . . . . . . . . . . . 166 Johnson, Matthew . . . . . . . . . . . . . . . . . . . . 52, 80 Jones,, Ryan Seth . . . . . . . . . . . . . . . . . . . . . . . 44 Joo, Seang-hwane . . . . . . . . . . . . . . . . . . . . . . 148 Ju, Unhee . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Julian, Marc . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Julrich, Daniel . . . . . . . . . . . . . . . . . . . . . . . . 131 Jung, KwangHee . . . . . . . . . . . . . . . . . . . . . . . .62 Junker, Brian . . . . . . . . . . . . . . . . . . . 159, 159, 159 Kim, Dong-in . . . . . . . . . . . . . . . . . . . . . . . . . 152 Kim, Doyoung . . . . . . . . . . . . . . . . . . . . . . . . . 85 Kim, Han Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 Kim, Hyung Jin . . . . . . . . . . . . . . . . . . . . . . . . 131 Kim, Ja Young . . . . . . . . . . . . . . . . . . . . . . . . . 70 Kim, Jinok . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Kim, Jong . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Kim, JP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Kim, Jungnam . . . . . . . . . . . . . . . . . . . . . . 64, 152 Kim, Meereem . . . . . . . . . . . . . . . . . . . . . . . . 104 Kim, Nana . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Kim, Se-Kang . . . . . . . . . . . . . . . . . . . . . . . 58, 148 Kim, Seohyun . . . . . . . . . . . . . . . . . . . . . . . . 104 Kim, Sooyeon . . . . . . . . . . . . . . . . . . . . . 64, 64, 90 Kim, Stella . . . . . . . . . . . . . . . . . . . . . . . . 121, 182 Kim, Sunhee . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Kim, Wonsuk . . . . . . . . . . . . . . . . . . . . . . . . . 135 Kim, Yongnam . . . . . . . . . . . . . . . . . . . . . . . . 151 Kim, Young Yee . . . . . . . . . . . . . . . . . . . . . . 19, 117 King, Kristin . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Kingston, Neal . . . . . . . . . . . . . . . . . . . . . . 47, 178 Kingston, Neal Martin . . . . . . . . . . . . . . . . . . . 134 Klieme, Eckhard . . . . . . . . . . . . . . . . . . . . . . . 174 Kobrin, Jennifer . . . . . . . . . . . . . . . . . . . . . 89, 127 Kögler, Kristina . . . . . . . . . . . . . . . . . . . . . . . . 
174 Köhler, Carmen . . . . . . . . . . . . . . . . . . . . . . . 155 Koklu, Onder . . . . . . . . . . . . . . . . . . . . . . . . . 172 Kolen, Michael . . . . . . . . . . . . . . . . . . . . . . 30, 142 Kong, Xiaojing . . . . . . . . . . . . . . . . . . . . . . .84, 84 Konold, Tim . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Kosh, Audra . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Kroehne, Ulf . . . . . . . . . . . . . . . . . . . . . . . . . 174 Kröhne, Ulf . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Krost, Kevin . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Krovetz, Bob . . . . . . . . . . . . . . . . . . . . . . . . . 179 Kuger, Susanne . . . . . . . . . . . . . . . . . . . . . . . 174 Kuhfeld, Megan . . . . . . . . . . . . . . . . . 143, 151, 170 Kuo, Tzu Chun . . . . . . . . . . . . . . . . . . . . . . . . 124 Kupermintz, Haggai . . . . . . . . . . . . . . . . . . . . . 41 Kyllonen, Patrick . . . . . . . . . . . . . . . . . . .17, 41, 177 K Kaliski, Pamela . . . . . . . . . . . . . . . . . . . . . . 40, 128 Kamenetz, Anya . . . . . . . . . . . . . . . . . . . . . . . 149 Kane, Joanne . . . . . . . . . . . . . . . . . . . . . . . . . 180 Kane, Michael . . . . . . . . . . . . . . . . . . . . . . . . 145 Kang, Hyeon-Ah . . . . . . . . . . . . . . . . . . . . . . . 119 Kang, Yoon Jeong . . . . . . . . . . . . . . . . . . . . . . 109 Kang, Yujin . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Kannan, Priya . . . . . . . . . . . . . 43, 175, 178, 178, 178 Kanneganti, Raghuveer . . . . . . . . . . . . 129, 129, 150 Kao, Shu-chuan . . . . . . . . . . . . . . . . . . . . . . . 166 Kaplan, David . . . . . . . . . . . . . . . . . . . . . . . . . 57 Kapoor, Shalini . . . . . . . . . . . . . . . . . . . . . . . . . 43 Karadavut, Tugba . . . . . . . . . . . . . . . . . . . . . . . 73 Karvonen, Meagan . . . . . . . . . . . . . . 51, 79, 89, 178 Keller, Lisa . . . . . . . . . . . . . . . . . . . . . . 18, 135, 165 Keller, Rob . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Kelly, Justin . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Keng, Leslie . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Kenyon, Dorry . . . . . . . . . . . . . . . . . . . . . . .42, 79 Kern, Justin . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Khan, Gulam . . . . . . . . . . . . . . . . . . . . . . . . . 109 Kieftenbeld, Vincent . . . . . . . . . . . 104, 129, 150, 164 Kilinc, Murat . . . . . . . . . . . . . . . . . . . . . . . . . 148 Kim, Dong-In . . . . . . . . . . . . . . . . . . . . .64, 65, 152 L LaFond, Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Lai, Emily . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Lai, Hollis . . . . . . . . . . . . . . . . . . . . . . . . . 53, 142 Laitusis, Cara . . . . . . . . . . . . . . . . . . . . . . . . . 142 Lane, Suzanne . . . . . . . . . . . . . . . . . . . .40, 79, 128 189 2016 Annual Meeting & Training Sessions Participant Index Lao, Hongling . . . . . . . . . . . . . . . . . . . . . . . . . 86 Larsson, Lisa . . . . . . . . . . . . . . . . . . . . . . . . . 105 Lash, Andrea . . . . . . . . . . . . . . . . . . . . . . . . . . 51 Lathrop, Quinn . . . . . . . . . . . . . . . . . . . . . . .50, 91 Latifi, Syed Muhammad Fahad . . . . . . . . . . . . . . 147 Lawless, Rene . . . . . . . . . . . . . . . . . . . . . . . . 177 Lawson, Janelle . . . . . . . . . . . . . . . . . . . . . . . 115 Leacock, Claudia . . . . . . . . . . . . . . . . . . 29, 104, 129 Lebeau, Adena . . . 
. . . . . . . . . . . . . . . . . . . . . . 45 LeBeau, Brandon . . . . . . . . . . . . . . . . . . . . . . . 56 Lee, Chansoon . . . . . . . . . . . . . . . . . . . . . . . . 109 Lee, Chong Min . . . . . . . . . . . . . . . . . . . . . . . 179 Lee, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Lee, HyeSun . . . . . . . . . . . . . . . . . . . . . . . . . 167 Lee, Philseok . . . . . . . . . . . . . . . . . . . . . . . . . 148 Lee, Richard . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Lee, Sora . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Lee, Won-Chan . . . . . 63, 70, 90, 105, 131, 131, 180, 182 Lee, Woo-yeol . . . . . . . . . . . . . . . . . . . . . . . . . 75 Lee, Yi-Hsuan . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Lee, Young-Sun . . . . . . . . . . . . . . . . . . . . . . 61, 86 Lei, Ming . . . . . . . . . . . . . . . . . . . . . . . . . 171, 171 Leibowitz, Emily . . . . . . . . . . . . . . . . . . . . 178, 178 Leighton, Jacqueline . . . . . . . . . . . . . . . . . . 63, 127 Leventhal, Brian . . . . . . . . . . . . . . . 74, 77, 120, 181 Levy, Roy . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Lewis, Charles . . . . . . . . . . . . . . 45, 52, 85, 148, 181 Li, Chen . . . . . . . . . . . . . . . . . . . . . . 150, 170, 174 Li, Cheng-Hsien . . . . . . . . . . . . . . . . . . . . . . . . 62 Li, Dongmei . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Li, Feifei . . . . . . . . . . . . . . . . . . . . . . . . . . 66, 143 Li, Feiming . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Li, Isaac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Li, Jie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Li, Ming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Li, Tongyun . . . . . . . . . . . . . . . . . . . . . . . . .62, 65 Li, Xiaomin . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Li, Xin . . . . . . . . . . . . . . . . . . . 25, 70, 110, 118, 164 Li, Ying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 Li, Zhushan . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Liang, Longjuan . . . . . . . . . . . . . . . . . . . . 161, 182 Liao, Chi-Wen . . . . . . . . . . . . . . . . . . . . . . . . . 116 Liao, Dandan . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Liaw, Yuan-Ling . . . . . . . . . . . . . . . . . . . . . . . 163 Lievens, Filip . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Lim, Euijin . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Lim, EunYoung . . . . . . . . . . . . . . . . . . . . . . . . . 92 Lim, MiYoun . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Lin, Chih-Kai . . . . . . . . . . . . . . . . . . . . . . . . . . 58 Lin, Haiyan . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Lin, Johnny . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Lin, Meiko . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Lin, Pei-ying . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Lin, Peng . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Lin, Ye . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Linden, Wim van der . . . . . . . . . . . . . . . . . . 83, 113 Ling, Guangming . . . . . . . . . . . . . . . . . . . . 84, 129 Lissitz, Robert . . . . . . . . . . . . . . . . . . . . 46, 89, 109 Liu, Cheng . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Liu, Chunyan . . . . . . . . . . . . . . . . . . . . . . . .85, 85 Liu, Hongyun . . . . . . . . . . . . . . . . . . . . 
. . 107, 110 Liu, Jinghua . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Liu, Lei . . . . . . . . . . . . . . . . . . . . . . . . . . . .41, 41 Liu, lou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Liu, Ou Lydia . . . . . . . . . . . . . . . . . . . . . . . . . 167 Liu, Qiongqiong . . . . . . . . . . . . . . . . . . . . . . . 161 Liu, Ren . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Liu, Ruitao . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Liu, Xiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Liu, Yang . . . . . . . . . . . . . . . . . . . . . . 119, 132, 153 Liu, Yanlou . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Liu, Yue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Lockwood, J.R. . . . . . . . . . . . . .80, 80, 80, 80, 80, 130 Lockwood, John . . . . . . . . . . . . . . . . . . . . . . . 165 Longabach, Tanya . . . . . . . . . . . . . . . . . . . . . . . 64 Lopez, Alexis . . . . . . . . . . . . . . . . . . . . . . . 84, 162 Lopez, Melissa . . . . . . . . . . . . . . . . . . . . . . . . 179 Lord-Bessen, Jennifer . . . . . . . . . . . . . . . . . . . . . 71 Lorié, William . . . . . . . . . . . . . . . . . . . . . . 117, 144 Lottridge, Susan . . . . . . . . . . . . . . . . . . . . . . . 104 Loughran, Jessica . . . . . . . . . . . . . . . . . . . . .65, 91 Loukina, Anastassia . . . . . . . . . . . . 134, 150, 179, 179 Loveland, Mark . . . . . . . . . . . . . . . . . . . . . . . . 160 Lu, Chi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Lu, Lucy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Lu, Ru . . . . . . . . . . . . . . . . . . . . . . . . . . . . .64, 64 Lu, Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Lu, Ying . . . . . . . . . . . . . . . . . . . . . . . . . . 66, 133 Lu, Zhenqui . . . . . . . . . . . . . . . . . . . . . . . . . . 104 LUO, Fen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Luo, Xiao . . . . . . . . . . . . . . . . . . . . . . . . . . 45, 85 Luo, Xin . . . . . . . . . . . . . . . . . . . . . . . 45, 123, 165 Lynch, Ryan . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Lyons, Susan . . . . . . . . . . . . . . . . . . . . . . . . . 126 190 Washington, DC, USA Participant Index M Mix, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Miyazaki, Yasuo . . . . . . . . . . . . . . . . . . . . . . . 132 Monroe, Scott . . . . . . . . . . . . . . . . . . . . . . 73, 130 Montee, Meg . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Montee, Megan . . . . . . . . . . . . . . . . . . . . . . . . 42 Moon, Jung Aa . . . . . . . . . . . . . . . . . . . . . . . . . 88 Moretti, Antonio . . . . . . . . . . . . . . . . . . . . 144, 144 Morgan, Deanna . . . . . . . . . . . . . . . . . . . . . 71, 115 Morin, Maxim . . . . . . . . . . . . . . . . . . . . . . . . 164 Morris, Carrie . . . . . . . . . . . . . . . . . . . . . . 118, 148 Morris, John . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Morrisey, Sarah . . . . . . . . . . . . . . . . . . . . . . . 163 Morrison, Kristin . . . . . . . . . . . . . . . . . . . . . .49, 84 Moses, Tim . . . . . . . . . . . . . . . . . . . . . . . 128, 134 Mroch, Andrew . . . . . . . . . . . . . . . . . . . . . . . 180 Mueller, Lorin . . . . . . . . . . . . . . . . . . . . 54, 119, 166 Mulholland, Matthew . . . . . . . . . . . . . . . . . . . 104 Muntean, William . . . . . . . . . . . . . . . . . . . . . . 173 Murphy, Stephen . . . . . . . . . . . . . . . . . . . . 
. 50, 71 Musser, Samantha . . . . . . . . . . . . . . . . . . . . . . . 42 Ma, Wenchao . . . . . . . . . . . . . . . . . . . . . . . . . . 90 Maas, Han van der . . . . . . . . . . . . . . . . . . . . . 182 MacGregor, David . . . . . . . . . . . . . . . . . . . . . . . 42 Macready, George . . . . . . . . . . . . . . . . . . . . . . . 62 Madnani, Nitin . . . . . . . . . . . . . . . . . . . . . . 48, 179 Maeda, Hotaka . . . . . . . . . . . . . . . . . . . . . . . . . 75 Magaram, Eric . . . . . . . . . . . . . . . . . . . . . . . . 136 Magnus, Brooke . . . . . . . . . . . . . . . . . . . . . . . 153 Malone, Meg . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Mao, Liyang . . . . . . . . . . . . . . . . . . . . . . . . 65, 104 Mao, Xia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Marais, Ida . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Margolis, Melissa . . . . . . . . . . . . . . . . . . . . . . 142 Marini, Jessica . . . . . . . . . . . . . . . . . . . . . . . . 128 Marion, Scott . . . . . . . . . . . . . . . . . 40, 44, 126, 126 Maris, Gunter . . . . . . . . . . . . . . . . . . . . . . . . . 182 Martineau, Joe . . . . . . . . . . . . . . . . . . . . . . . . 158 Martineau, Joseph . . . . . . . . . . . . . . . . . . . 44, 101 Martínez, Jr, Carlos . . . . . . . . . . . . . . . . . . . . . 168 Masri, Yasmine El . . . . . . . . . . . . . . . . . . . . . . 155 Masters, Jessica . . . . . . . . . . . . . . . . . . . . . . . . 63 Matlock, Ki . . . . . . . . . . . . . . . . . . . . . . . . 46, 148 Matos-Elefonte, Haifa . . . . . . . . . . . . . . . . 161, 177 Matovinovic, Donna . . . . . . . . . . . . . . . . . . . . . 67 Matta, Tyler . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 Maul, Andrew . . . . . . . . . . . . . . . . . . . . . 117, 153 Mayfield, Elijah . . . . . . . . . . . . . . . . . . . . . . . . 173 Mazany, Terry . . . . . . . . . . . . . . . . . . . . . . . . . 100 McBride, Yuanyuan . . . . . . . . . . . . . . . . . . . .84, 84 McCaffrey, Daniel . . . . . . . . . . . . . . . . . .80, 80, 130 McCall, Marty . . . . . . . . . . . . . . . . . . . . . . . . . . 79 McClellan, Catherine . . . . . . . . . . . . . . . . . 115, 154 McKnight, Kathy . . . . . . . . . . . . . . . . . 144, 144, 158 McMillan, James H . . . . . . . . . . . . . . . . . . . . . 168 McTavish, Thomas . . . . . . . . . . . . . . . . . . . . . . 144 Meador, Chris . . . . . . . . . . . . . . . . . . . . . . . . . 53 Meadows, Michelle . . . . . . . . . . . . . . . . . . 154, 173 Mehta, Vandhana . . . . . . . . . . . . . . . . . . . . . . . 53 Meng, Xiangbing . . . . . . . . . . . . . . . . . . . . . . . 73 Mercado, Ricardo . . . . . . . . . . . . . . . . . . . . . . . 71 Meyer, Patrick . . . . . . . . . . . . . . . . . . . . . . 50, 115 Meyer, Robert . . . . . . . . . . . . . . . . . . . . . . . . . 59 Miel, Shayne . . . . . . . . . . . . . . . . . . . . . . 147, 173 Miller, Sherral . . . . . . . . . . . . . . . . . . . . . . . . . 128 Minchen, Nathan . . . . . . . . . . . . . . . . . . . . . . . 74 Mislevy, Robert . . . . . . . . . . . . . . . . . . . . . . . 177 Mix, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 N Naumann, Alexander . . . . . . . . . . . . . . . . . . . . . 43 Naumann, Johannes . . . . . . . . . . . . . . . . . . . . 153 Naumenko, Oksana . . . . . . . . . . . . . . . . . . . . . 136 Nebelsick-Gullet, Lori . . . . . . . . . . . . . . . . . . 43, 101 Nebelsick-Gullett, Lori . . . . . . . . . . . . . . . . . . . 108 Neito, Ricardo . . . . . . . . . . 
. . . . . . . . . . . . 74, 120 Nicewander, W. . . . . . . . . . . . . . . . . . . . . . . . . 51 Niekrasz, John . . . . . . . . . . . . . . . . . . . . . . . . 147 Nieto, Ricardo . . . . . . . . . . . . . . . . . . . . . . . . 159 Noh, Eunhee . . . . . . . . . . . . . . . . . . . . . . . . . . 92 Norris, Mary . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Norton, Jennifer . . . . . . . . . . . . . . . . . . . . . . . . 42 Nydick, Steven . . . . . . . . . . . . . . . . . . . . . . . . . 85 O O’Brien, Sue . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 O’Connor, Brendan . . . . . . . . . . . . . . . . . . . . . . 48 O’Leary, Timothy . . . . . . . . . . . . . . . . . . . . . . . .89 O’Reilly, Tenaha . . . . . . . . . . . . . . . . . . . . . . . 160 Oakes, Jeannie . . . . . . . . . . . . . . . . . . . . . . . . 125 Ogut, Burhan . . . . . . . . . . . . . . . . . . . . . . . . . 117 Oh, Hyeon-Joo . . . . . . . . . . . . . . . . . . . . . . . . . 63 Olea, Julio . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Olgar, Süleyman . . . . . . . . . . . . . . . . . . . . . 70, 172 191 2016 Annual Meeting & Training Sessions Participant Index Oliveri, Maria Elena . . . . . . . . . . . . . . . . . . . . . 177 Olsen, James . . . . . . . . . . . . . . . . . . . . . . . . . 108 Oppenheim, Peter . . . . . . . . . . . . . . . . . . . . . 157 Orpwood, Graham . . . . . . . . . . . . . . . . . . . . . 109 Oshima, T. . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Özdemir, Burhanettin . . . . . . . . . . . . . . . . . . . . 45 Quellmalz, Edys . . . . . . . . . . . . . . . . . . . . . . . 160 Quenemoen, Rachel . . . . . . . . . . . . . . . . . . . . 175 R Rahman, Nazia . . . . . . . . . . . . . . . . . . . . . . . . . 52 Rankin, Jenny . . . . . . . . . . . . . . . . . . . . . . . . 178 Rausch, Andreas . . . . . . . . . . . . . . . . . . . . . . . 174 Rawls, Anita . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Raymond, Mark . . . . . . . . . . . . . . . . . . . . . . . 181 Reboucas, Daniella . . . . . . . . . . . . . . . . . . . . . 154 Reckase, Mark . . . . . . . . . 20, 42, 45, 99, 116, 142, 165 Redell, Nick . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Reichenberg, Ray . . . . . . . . . . . . . . . . . .74, 74, 120 Renn, Jennifer . . . . . . . . . . . . . . . . . . . . . . . . . 42 Reshetar, Rosemary . . . . . . . . . . . . . . . . . . . . . 128 Reshetnyak, Evgeniya . . . . . . . . . . . . . . . . . . . . 85 Ricarte, Thales . . . . . . . . . . . . . . . . . . . . . . . . . 90 Rich, Changhua . . . . . . . . . . . . . . . . . . . . . . . 110 Rick, Francis . . . . . . . . . . . . . . . . . . . . . . . . 43, 178 Rickels, Heather . . . . . . . . . . . . . . . . . . . . . . . 108 Rijiman, Frank . . . . . . . . . . . . . . . . . . . . . . . . 152 Rijmen, Frank . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Rijn, Peter van . . . . . . . . . . 63, 109, 116, 132, 155, 182 Rios, Joseph . . . . . . . . . . . . . . . . . . . . . . 132, 181 Risk, Nicole . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Roberts, Mary Roduta . . . . . . . . . . . . . . . . . . . . 64 Robin, Frederic . . . . . . . . . . . . . . . . . . . . . . 65, 119 Rodriguez, Michael . . . . . . . . 32, 56, 60, 153, 162, 171 Rogers, H. Jane . . . . . . . . . . . . . . . . . . . . . . . . 109 Rölke, Heiko . . . . . . . . . . . . . . . . . . . . . . . . . 174 Rollins, Jonathan . . . . . . . . . . . . . . . . . . . . 86, 115 Rome, Logan . . . . . . . . . . . . . . . . . . . . . . . . . 
121 Romine, Russell Swinburne . . . . . . . . . . . .51, 89, 175 Roohr, Katrina . . . . . . . . . . . . . . . . . . . . . . . . 167 Rorick, Beth . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Rosen, Yigal . . . . . . . . . . . . . . . . . . . . . . . . .41, 41 Rosenstein, Mark . . . . . . . . . . . . . . . . . . . . . . 179 Roussos, Louis . . . . . . . . . . . . . . . . . . . . . . 54, 135 Rubright, Jonathan . . . . . . . . . . . . . 46, 85, 147, 181 Runyon, Christopher . . . . . . . . . . . . . . . . . . . . . 91 Rupp, André . . . . . . . . . . . . . . . . . . . . . . . . 29, 53 Rutkowski, Leslie . . . . . . . . . . . . . . . . . . . . 57, 118 Rutstein, Daisy . . . . . . . . . . . . . . . . . . . . . . 40, 147 P Pak, Seohong . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Palma, Jose . . . . . . . . . . . . . . . . . . . . . . . . 60, 153 Pan, Tianshu . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Papa, Frank . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Papageorgiou, Spyridon . . . . . . . . . . . . . . . . . . 178 Pardos, Zachary . . . . . . . . . . . . . . . . . . . . . . . 144 Park, Jiyoon . . . . . . . . . . . . . . . . . . . . . . . . 54, 119 Park, Trevor . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Park, Yoon Soo . . . . . . . . . . . . . . . . . . . . . . . . . 61 Pashley, Peter . . . . . . . . . . . . . . . . . . . . . . . 52, 89 Patel, Priyank . . . . . . . . . . . . . . . . . . . . . . . 71, 115 Patelis, Thanos . . . . . . . . . . . . . . . 112, 145, 145, 177 Patterson, Brian . . . . . . . . . . . . . . . . . . . . . . . 159 Patz, Rich . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 Peabody, Michael . . . . . . . . . . . . . . . . . . . . . . . 71 Peck, Fred . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Peng, Luyao . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Perie, Marianne . . . . . . . . . . . . . . . . 65, 79, 108, 157 Peterson, Mary . . . . . . . . . . . . . . . . . . . . . . . . . 51 Phadke, Chaitali . . . . . . . . . . . . . . . . . . . . . . . 167 Pham, Duy . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Phan, Ha . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Phelan, Jonathan . . . . . . . . . . . . . . . . . . . . . . 117 Phillips, S E . . . . . . . . . . . . . . . . . . . . . . . . . . 158 Phillips, S.E. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Plake, Barbara . . . . . . . . . . . . . . . . . . . . . 112, 158 Pohl, Steffi . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Polikoff, Morgan . . . . . . . . . . . . . . . . . . . . . . . . 67 Por, Han-Hui . . . . . . . . . . . . . . . . . . . . . . 105, 134 Powers, Donald . . . . . . . . . . . . . . . . . . . . . . . 134 Powers, Sonya . . . . . . . . . . . . . . . . . . . . . . . . 105 Q QIAN, HAIXIA . . . . . . . . . . . . . . . . . . . . . . . . . 134 Qian, Hong . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Qian, Jiahe . . . . . . . . . . . . . . . . . . . . . . . . . . 132 Qiu, Xue-Lan . . . . . . . . . . . . . . . . . . . . . . . . . 148 Qiu, Yuxi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 S Sabatini, John . . . . . . . . . . . . . . . . . . . . . . . . 160 192 Washington, DC, USA Participant Index Sabol, Robert . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Şahin, Füsun . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Sahin, Sakine Gocer . . . . . . . . . . . . . . . . . . . . . . 88 Saiar, Amin . . . . . . . . . . . . . . . . . . . . . . . . . . . 
54 Sakano, Jenifer . . . . . . . . . . . . . . . . . . . . . . . . 179 Sakano, Jennifer . . . . . . . . . . . . . . . . . . . . . . . 179 Sakworawich, Arnond . . . . . . . . . . . . . . . . . . . 105 Salleb-Aouissi, Ansaf . . . . . . . . . . . . . . . . . 144, 144 Samonte, Kelli . . . . . . . . . . . . . . . . . . . . . . . . 165 Sanders, Elizabeth . . . . . . . . . . . . . . . . . . . . . . 163 Sandrock, Paul . . . . . . . . . . . . . . . . . . . . . . . . . 40 Sano, Makoto . . . . . . . . . . . . . . . . . . . . . . . . . 173 Sato, Edynn . . . . . . . . . . . . . . . . . . . . . . . . 89, 177 Sauder, Derek . . . . . . . . . . . . . . . . . . . . . . . . 120 Schmigdall, Jonathan . . . . . . . . . . . . . . . . . . . . 84 Schneider, Christina . . . . . . . . . . . . . . . . . . . . . 28 Schneider, Christy . . . . . . . . . . . . . . . . . . . . . . 179 Schultz, Matthew . . . . . . . . . . . . . . . . . . . . . . 147 Schwarz, Rich . . . . . . . . . . . . . . . . . . . . . . . . . . 20 Schwarz, Richard . . . . . . . . . . . . . . . . . . . . . . . 88 Scott, Lietta . . . . . . . . . . . . . . . . . . . . . . . . . . 108 Secolsky, Charles . . . . . . . . . . . . . . . . . . . . . . 136 Sedivy, Sonya . . . . . . . . . . . . . . . . . . . . . . . . . 109 Segall, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Seltzer, Michael . . . . . . . . . . . . . . . . . . . . 151, 151 Semmelroth, Carrie . . . . . . . . . . . . . . . . . . . . . 115 Sen, Sedat . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Sgammato, Adrienne . . . . . . . . . . . . . . . . . . .52, 52 Sha, Shuying . . . . . . . . . . . . . . . . . . . . . . . . . 110 Shao, Can . . . . . . . . . . . . . . . . . . . . . . . . 123, 166 Sharairi, Sid . . . . . . . . . . . . . . . . . . . . . . . . 50, 133 Shaw, Emily . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Shear, Benjamin . . . . . . . . . . . . . . . . . . . . . 80, 110 Sheehan, Kathleen . . . . . . . . . . . . . . . . . . . . . 179 Shepard, Lorrie . . . . . . . . . . . . . . . . . . . . . . . . 126 Shermis, Mark . . . . . . . . . . . . . . . . . . . 51, 104, 104 Shin, Hyo Jeong . . . . . . . . . . . . . . . . . . . . . . . . 68 Shin, Nami . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Shipman, Michelle . . . . . . . . . . . . . . . . . . . . . . 89 Shmueli, Doron . . . . . . . . . . . . . . . . . . . . . . . 128 Shropshire, Kevin . . . . . . . . . . . . . . . . . . . . . . 132 Shukla, Kathan . . . . . . . . . . . . . . . . . . . . . . . . . 63 Shuler, Scott . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Shute, Valerie . . . . . . . . . . . . . . . . . . . . . . . . . 127 Sikali, Emmanuel . . . . . . . . . . . . . . . . . . . . . . . 19 Silberglitt, Matt . . . . . . . . . . . . . . . . . . . . . . . 160 Sinharay, Sandip . . . . . . . . . . . . . . . . . . . . 164, 181 Sireci, Stephen . . . . . . . . . . . . . . . . . . . . . 142, 145 Skorupski, William . . . . . . . . . . . . . . 72, 91, 106, 134 Skrondal, Anders . . . . . . . . . . . . . . . . . . . . . . 182 Smiley, Whitney . . . . . . . . . . . . . . . . . . . . . . . 161 Smith, Kara . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Smith, Robert . . . . . . . . . . . . . . . . . . . . . . . . . 45 Smith, Weldon . . . . . . . . . . . . . . . . . . . . . . . . 110 Snow, Eric . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Somasundaran, Swapna . . . . . . . . . . . . . . . . . . 179 Song, Hao . . . . . . . . . . . . . . . . . . . . . . . . . 90, 161 Song, Lihong . . . . 
. . . . . . . . . . . . . . . . . . . . . 116 Sorrel, Miguel . . . . . . . . . . . . . . . . . . . . . . . . . 72 Sparks, Sarah . . . . . . . . . . . . . . . . . . . . . . . . . 149 Stafford, Rose . . . . . . . . . . . . . . . . . . . . . . . . . 91 Stanke, Luke . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Stark, Stephen . . . . . . . . . . . . . . . . . . . . . . . . 148 Stecher, Brian . . . . . . . . . . . . . . . . . . . . . . . . . 160 Steinberg, Jonathan . . . . . . . . . . . . . . . . . . . . 160 Sternod, Latisha . . . . . . . . . . . . . . . . . . . . . 74, 120 Stevens, Joseph . . . . . . . . . . . . . . . . . . . . . . . . 59 Stewart, John . . . . . . . . . . . . . . . . . . . . . . . . . 147 Stockford, Ian . . . . . . . . . . . . . . . . . . . . . . . . . 173 Stone, Clement . . . . . . . . . . . . . . . . . . . . . . . . 26 Stone, Elizabeth . . . . . . . . . . . . . . . . . . .54, 80, 142 Stout, Bill . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Strain-Seymour, Ellen . . . . . . . . . . . . . . . . . . . . 142 Stuart, Elizabeth . . . . . . . . . . . . . . . . . . . . . . . 151 Su, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .57 Su, Yu-Lan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 SU, YU-LAN . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Suh, Hongwook . . . . . . . . . . . . . . . . . . . . . . . 180 Sukin, Tia . . . . . . . . . . . . . . . . . . . . . . . . . 51, 115 Sullivan, Meghan . . . . . . . . . . . . . . . . . . . . . 31, 72 Sun, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Sung, Kyunghee . . . . . . . . . . . . . . . . . . . . . . . . 92 Svetina, Dubravka . . . . . . . . 62, 74, 118, 120, 135, 163 Swaminathan, Hariharan . . . . . . . . . . . . . . . . . 109 Sweet, Shauna . . . . . . . . . . . . . . . . . . . . . . . . . 83 Sweet, Tracy . . . . . . . . . . . . . . . . . . . . . . . . . 146 Swift, David . . . . . . . . . . . . . . . . . . . . . . . . . . 133 T Tan, Amy . . . . . . . . . . . . . . . . . . . . . . . . . . . . .53 Tan, Xuan-Adele . . . . . . . . . . . . . . . . . . . . . . . 161 Tang, Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Tannenbaum, Richard . . . . . . . . . . . . . . . . . . . 178 Tao, Jian . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Tao, Shuqin . . . . . . . . . . . . . . . . . . . . . . . . 85, 143 Templin, Jonathan . . . . . . . . . . . . . . . 31, 72, 72, 86 193 2016 Annual Meeting & Training Sessions Participant Index Terzi, Ragip . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Tessema, Aster . . . . . . . . . . . . . . . . . . . . . . 15, 147 Thissen, David . . . . . . . . . . . . . . . . . . . . . . . . . 44 Thissen-Roe, Anne . . . . . . . . . . . . . . . . . . . . . 163 Thompson, Tony . . . . . . . . . . . . . . . . . . . . . . . 180 Thum, Yeow Meng . . . . . . . . . . . . . . . . . . . 50, 110 Thummaphan, Phonraphee . . . . . . . . . . . . . . . 126 Thurlow, Martha . . . . . . . . . . . . . . . . . . . . . . . 175 Tian, Wei . . . . . . . . . . . . . . . . . . . . . . . . . . .61, 91 Tomkowicz, Joanna . . . . . . . . . . . . . . . . . . . 64, 152 Tong, Ye . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Topczewski, Anna . . . . . . . . . . . . . . . . . . . . . . . 50 Torre, Jimmy de la . . . . . . . . . . . . . . . . . . . . 72, 136 Torre, Jummy de la . . . . . . . . . . . . . . . . . . . . . 143 Towles, Elizabeth . . . . . . . . . . . . . . . . . . . . . . . 43 Toyama, Yukie . . . . . . . . . . . . . . . . 
. . . . . . . . . 77 Trang, Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Trierweiler, Tammy . . . . . . . . . . . . . . . . . . . 45, 148 Tu, Dongbo . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Turner, Charlene . . . . . . . . . . . . . . . . . . . . . 43, 108 Turner, Ronna . . . . . . . . . . . . . . . . . . . . . . . . 148 Tzou, Hueying . . . . . . . . . . . . . . . . . . . . . . . . . 86 Wang, Caroline . . . . . . . . . . . . . . . . . . . . . . . . . 59 Wang, Changjiang . . . . . . . . . . . . . . . . . . . . . 109 Wang, Chun . . . . . . . . . . . . . . . . . . . . . . . 45, 170 Wang, Hongling . . . . . . . . . . . . . . . . . . . . . . . 166 Wang, Jui-Sheng . . . . . . . . . . . . . . . . . . . . . . . . 83 WANG, JUI-SHENG . . . . . . . . . . . . . . . . . . . . . 135 Wang, Keyin . . . . . . . . . . . . . . . . . . . . . . . . . 135 Wang, Lu . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Wang, Min . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Wang, Richard . . . . . . . . . . . . . . . . . . . . . . . . 150 Wang, Shichao . . . . . . . . . . . . . . . . . . . . . . 85, 122 Wang, Shudong . . . . . . . . . . . . . . . . . . . . . . . 169 Wang, Tianyu . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Wang, Wei . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Wang, Wen-Chung . . . . . . . . . . . . . 61, 107, 108, 148 Wang, Wenyi . . . . . . . . . . . . . . . . . . . . . . . . . 116 Wang, Xi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Wang, Xiaolin . . . . . . . . . . . . . . . . . . . . . . . 62, 135 Wang, Xiaoqing . . . . . . . . . . . . . . . . . . . . . . . . 91 Wang, Zhen . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Way, Walter . . . . . . . . . . . . . . . . . . . . . . . . 84, 142 Weegar, Johanna . . . . . . . . . . . . . . . . . . . . . . . 89 Weeks, Jonathan . . . . . . . . . . . . . . . . . 60, 116, 160 Wei, Hua . . . . . . . . . . . . . . . . . . . . . . . . . . 70, 164 Wei, Xiaoxin . . . . . . . . . . . . . . . . . . . . . . . . 50, 115 Wei, Youhua . . . . . . . . . . . . . . . . . . . . . . 165, 179 Weiner, John . . . . . . . . . . . . . . . . . . . . . . . 54, 134 Weiss, David . . . . . . . . . . . . . . . . . . . . . . . . . 167 Welch, Catherine . . . . . 43, 56, 56, 62, 90, 108, 145, 161 Wendler, Cathy . . . . . . . . . . . . . . . . . . . . . . 51, 179 West, Martin . . . . . . . . . . . . . . . . . . . . . . . . . 157 White, Lauren . . . . . . . . . . . . . . . . . . . . . . . . 172 Whittington, Dale . . . . . . . . . . . . . . . . . . . . . . 168 Wiberg, Marie . . . . . . . . . . . . . . . . . . . . . . . . . 60 Widiatmo, Heru . . . . . . . . . . . . . . . . . . . . . . . 119 Wiley, Andrew . . . . . . . . . . . . . . . . . . . . . . . . 112 Williams, Elizabeth . . . . . . . . . . . . . . . . . . . . . 120 Williams, Jean . . . . . . . . . . . . . . . . . . . . . . . . . 84 Willis, James . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Willoughby, Michael . . . . . . . . . . . . . . . . . . . . 153 Willse, John . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Wilmes, Carsten . . . . . . . . . . . . . . . . . . . . . . . 175 Wilson, Mark . . . . . . . . . . . . . . . . . . . . . 68, 68, 68 Wind, Stefanie . . . . . . . . . . . . . . . . . . . . . . . . . 71 Winter, Phoebe . . . . . . . . . . . . . . . . . . . . . . 51, 79 Wise, Lauress . . . . . . . . . . . . . . . . . . . . . . . . . . 69 Wise, Laurie . . . . . . . . . . . . . . . . . . . . . . . . . . 
149 Wollack, James . . . . . . . . . . . . . . . . . . . . . . . . 109 Woo, Ada . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 U Underhill, Stephanie . . . . . . . . . . . . . . . . . . . . . 62 University, Sacred Heart . . . . . . . . . . . . . . . . . . 168 V van der Linden, Wim . . . . . . . . . . . . . . . . . . . . . 16 Vansickle, Tim . . . . . . . . . . . . . . . . . . . . . . . . . 55 Vasquez-Colina, Maria Donata . . . . . . . . . . . . . . . 69 Veldkamp, Bernard . . . . . . . . . . . . . . . . . . . . . 113 Vispoel, Walter . . . . . . . . . . . . . . . . . . . . . . . . 148 VonDavier, Alina . . . . . . . . . . . . . . . . . . . . . . . . 15 Vue, Kory . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 W Wain, Jennifer . . . . . . . . . . . . . . . . . . . . . . . . . 84 Wainer, Howard . . . . . . . . . . . . . . . . . . . . . . . 181 Walker, Cindy . . . . . . . . . . . . . . . . . . . . . . . 69, 154 Walker, Cindy M. . . . . . . . . . . . . . . . . . . . . . . . . 88 Walker, Michael . . . . . . . . . . . . . . . . . . . . . . . 180 Wan, Ping . . . . . . . . . . . . . . . . . . . . . . . . . 64, 152 wang, aijun . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Wang, Ann . . . . . . . . . . . . . . . . . . . . . . . . . . 167 194 Washington, DC, USA Participant Index Z Wood, Scott . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Wu, Yi-Fang . . . . . . . . . . . . . . . . . . . . . . . . .86, 90 Wüstenberg, Sascha . . . . . . . . . . . . . . . . . . . . 103 Wyatt, Jeff . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Wyse, Adam . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Zapata-Rivera, Diego . . . . . . . . . . . . . . . . . 127, 178 Zechner, Klaus . . . . . . . . . . . . . . . . . . . . . . . . 150 Zenisky, April . . . . . . . . . . . . . . . . . . . . . . . . . 178 Zhan, Peida . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Zhang, Bo . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Zhang, Jiahui . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Zhang, Jin . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Zhang, Jinming . . . . . . . . . . . . . . . . . . . . . . . . 58 Zhang, Litong . . . . . . . . . . . . . . . . . . . . . . . . 152 Zhang, Mengyao . . . . . . . . . . . . . . . . . . . 105, 180 Zhang, Mingcai . . . . . . . . . . . . . . . . . . . . . . . . 75 Zhang, Mo . . . . . . . . . . . . . . . . . . . . . 29, 160, 174 Zhang, Oliver . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Zhang, Susu . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Zhang, Xinxin . . . . . . . . . . . . . . . . . . . . . . . . . 77 Zhang, Xue . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Zhang, Ya . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Zhang, Yu . . . . . . . . . . . . . . . . . . . . . . . . . 54, 119 zhang, Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Zhao, Tuo . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Zhao, Yang . . . . . . . . . . . . . . . . . . . . . . . . 71, 115 Zhao, Yihan . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Zheng, Bin . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Zheng, Chanjin . . . . . . . . . . . . . . . . . . . . . . 61, 73 Zheng, Chunmei . . . . . . . . . . . . . . . . . . . . . . . . 86 Zheng, Qiwen . . . . . . . . . . . . . . . . . . . . . . . . 146 Zheng, Xiaying . . . . . . . . . . . . . . . . . . . . . 170, 170 Zheng, Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Zhu, Mengxiao . . . . . 
. . . . . . . . . . . . . . . . 146, 174 Zhu, Rongchun . . . . . . . . . . . . . . . . . . 90, 136, 166 Zhu, Shi . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Zweifel, Michael . . . . . . . . . . . . . . . . . . . . . . . 110 X Xi, Nuo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Xiang, Shibei . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Xie, Chao . . . . . . . . . . . . . . . . . . . . . . . . 171, 171 Xie, Qing . . . . . . . . . . . . . . . . . . . . . . . . . . 90, 120 Xin, Tao . . . . . . . . . . . . . . . . . . . . . . . . 61, 91, 135 Xing, Kuan . . . . . . . . . . . . . . . . . . . . . . . . 61, 122 Xiong, Jianhua . . . . . . . . . . . . . . . . . . . . . . . . . 91 Xiong, Xinhui . . . . . . . . . . . . . . . . . . . . . . . . . 104 Xiong, Yao . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Xu, Jing-Ru . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Xu, Ran . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Xu, Ting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Xu, Xueli . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Y Yakimowski, Mary E . . . . . . . . . . . . . . . . . . . . . 168 Yan, Duanli . . . . . . . . . . . . . . . . . . . . . . . . .22, 85 Yan, Ning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Yang, Ji Seung . . . . . . . . . . . . 73, 132, 170, 170, 170 Yang, Jiseung . . . . . . . . . . . . . . . . . . . . . . . . . 151 Yang, Tao . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Yao, Lihua . . . . . . . . . . . . . . . . . 20, 45, 88, 107, 147 Yao, Lili . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 Ye, Feifei . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Ye, Sangbeak . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Yi, qin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Yi, Qing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Yin, Ping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Yoo, Hanwook . . . . . . . . . . . . . . . . . . . . . . 63, 133 Yoon, Su-Youn . . . . . . . . .129, 129, 150, 179, 179, 179 Yu, Xin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 195 2016 Annual Meeting & Training Sessions Participant Index 196 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Aksu Dunya, Beyza Bennett, Randy E ETS [email protected] University of Illinois at Chicago [email protected] Ali, Usama S. Educational Testing Service [email protected] Bertling, Maria Harvard University [email protected] Alzen, Jessica School of Education University of Colorado Boulder [email protected] Beverly, Tanesia University of Connecticut [email protected] Amati, Lucy Educational Testing Service [email protected] Bo, Yuanchao Emily University of California, Los Angeles [email protected] An, Ji University of Maryland [email protected] Bond, Mark The University of Texas at Austin [email protected] Anderson, Daniel University of Oregon [email protected] Bonifay, Wes E University of Missouri [email protected] Andrews, Benjamin ACT [email protected] Boyer, Michelle University of Massachusetts, Amherst [email protected] Andrich, David University of Western Australia [email protected] Bradshaw, Laine University of Georgia [email protected] Austin, Bruce W Washington State University [email protected] Brandt, Steffen Art of Reduction [email protected] Banks, Kathleen LEAD Public Schools [email protected] Breyer, Jay F. 
ETS [email protected] Barry, Carol L The College Board [email protected] Bridgeman, Brent Educational Testing Service [email protected] Barton, Karen Learning Analytics [email protected] Briggs, Derek C University of Colorado [email protected] Bashkov, Bozhidar M American Board of Internal Medicine [email protected] Broaddus, Angela Center for Educational Testing and Evaluation University of Kansas [email protected] Bejar, Isaac I. ETS [email protected] Brown, Derek Oregon Department of Education [email protected] 197 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Buchholz, Janine German Institute for International Educational Research (DIPF) [email protected] Carstens, Ralph International Association for the Evaluation of Educational Achievement (IEA) Data Processing and Research Center [email protected] Buckendahl, Chad W. Alpine Testing Solutions, Inc. [email protected] Castellano, Katherine Furgol Educational Testing Service (ETS) [email protected] Buckley, Jack College Board [email protected] Chattergoon, Rajendra University of Colorado, Boulder [email protected] Bukhari, Nurliyana University of North Carolina at Greensboro [email protected] Chattergoon, Rajendra University of Colorado, Boulder [email protected] Bulut, Okan University of Alberta [email protected] Chatterji, Madhabi Teachers College, Columbia University [email protected] Buzick, Heather Educational Testing Service [email protected] Chen, Feng The University of Kansas [email protected] Cai, Li UCLA/CRESST [email protected] Chen, Hui-Fang City University of Hong Kong [email protected] Cai, Liuhan University of Nebraska-Lincoln [email protected] Chen, Jie Center for Educational Testing and Evaluation [email protected] Cain, Jessie Montana University of North Carolina at Chapel Hill [email protected] Chen, Juan National Conference of Bar Examiners [email protected] Caliço, Tiago A University of Maryland [email protected] Chen, Keyu University of Iowa [email protected] Camara, Wayne ACT [email protected] Chen, Pei-Hua National Chiao Tung University [email protected] Camara, Wayne J. ACT [email protected] Chen, Ping Beijing Normal University [email protected] Canto, Phil Florida Department of Education [email protected] Chen, Tingting ACT, Inc. [email protected] Carroll, Patricia E University of California - Los Angeles [email protected] 198 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Chen, Xin Pearson [email protected] Cizek, Greg University of North Carolina at Chapel Hill [email protected] Cheng, Ying Alison University of Notre Dame [email protected] Clark, Amy K. University of Kansas [email protected] Childs, Ruth A Ontario Institute for Studies in Education, University of Toronto [email protected] Clauser, Amanda L. 
National Board of Medical Examiners [email protected] Cohen, Allan University of Georgia [email protected] Cho, Youngmi Pearson [email protected] Cohen, Jon American Institutes for Research [email protected] Choi, Hye-Jeong University of Georgia [email protected] Colvin, Kimberly F University at Albany, SUNY [email protected] Choi, In-Hee University of California, Berkeley [email protected] Conforti, Peter The University of Texas at Austin [email protected] Choi, In-Hee University of California, Berkeley [email protected] Confrey, Jere North Carolina State University [email protected] Choi, Jinah The University of Iowa [email protected] Choi, Jiwon ACT/University of Iowa [email protected] Choi, Kilchan CRESST/UCLA [email protected] Coomans, Frederik University of Amsterdam [email protected] Choi, Kilchan CRESST/UCLA [email protected] Cottrell, Nicholas D Fulcrum [email protected] Chu, Kwang-lee Pearson [email protected] Crabtree, Ashleigh R University of Iowa [email protected] Chung, Kyung Sun Pennsylvania State University [email protected] Crane, Samuel Amplify [email protected] Circi, Ruhan University of Colorado Boulder [email protected] Croft, Michelle ACT, Inc. [email protected] Cui, Zhongmin ACT, Inc. [email protected] Diao, Hongyu University of Massachusetts-Amherst [email protected] Culpepper, Steven Andrew University of Illinois at Urbana-Champaign [email protected] DiCerbo, Kristen Pearson [email protected] Dadey, Nathan The National Center for the Improvement of Educational Assessment [email protected] Vasquez-Colina, Maria Donata Florida Atlantic University [email protected] Donoghue, John R Educational Testing Service [email protected] Davey, Tim Educational Testing Service [email protected] Du, Yi Educational Testing Service [email protected] Davis, Laurie L Pearson [email protected] Du, Yi Educational Testing Services [email protected] d’Brot, Juan DRC JD’[email protected] Egan, Karla NCIEA [email protected] De Boeck, Paul Ohio State University [email protected] Embretson, Susan E Georgia Institute of Technology [email protected] Debeer, Dries University of Leuven [email protected] DeCarlo, Lawrence T. Teachers College, Columbia University [email protected] Engelhardt, Lena German Institute for International Educational Research [email protected] DeMars, Christine E. James Madison University [email protected] Evans, Carla M. University of New Hampshire [email protected] Denbleyker, Johnny Houghton Mifflin Harcourt [email protected] Fan, Meichu ACT, Inc [email protected] Deng, Nina Measured Progress [email protected] Fan, Yuyu Fordham University [email protected] Deters, Lauren edCount, LLC [email protected] Farley, Dan University of Oregon [email protected] Dhaliwal, Tasmin Pearson [email protected] Feinberg, Richard A National Board of Medical Examiners [email protected] Fina, Anthony D Iowa Testing Programs, University of Iowa [email protected] Gocer Sahin, Sakine Hacettepe University [email protected] Finch, Holmes Ball State University [email protected] Gong, Brian National Center for the Improvement of Educational Assessment [email protected] Foltz, Peter W.
Pearson and University of Colorado Boulder [email protected] González-Brenes, José Center for Digital Data, Analytics & Adaptive Learning, Pearson [email protected] Forte, Ellen edCount [email protected] González-Brenes, José Pablo Pearson [email protected] Forte, Ellen edCount, LLC [email protected] Grabovsky, Irina NBME [email protected] Freeman, Leanne University of Wisconsin, Milwaukee [email protected] Graesser, Art University of Memphis [email protected] Fu, Yanyan UNCG [email protected] Graf, Edith Aurora ETS [email protected] Gafni, Naomi National Institute for Testing & Evaluation [email protected] Greiff, Samuel University of Luxembourg [email protected] Gao, Lingyun ACT, Inc. [email protected] Grochowalski, Joe The College Board [email protected] Gao, Xiaohong ACT, Inc. [email protected] Gu, Lixiong Educational Testing Service [email protected] Garcia, Alejandra Amador University of Massachusetts [email protected] Guo, Hongwen ETS [email protected] Geis, Eugene J Rutgers Graduate School of Education [email protected] Guo, Rui University of Illinois at Urbana-Champaign [email protected] Geisinger, Kurt F. Buros Center for Testing, University of Nebraska-Lincoln [email protected] Hacker, Miriam The German Institute for International Educational Research (DIPF) Centre for International Student Assessment (ZIB) [email protected] Gessaroli, Marc E National Board of Medical Examiners [email protected] Hakuta, Kenji Stanford University [email protected] Hogan, Thomas P University of Scranton [email protected] Hall, Erika Center for Assessment [email protected] Holmes, Stephen Office of Qualifications and Examinations Regulation [email protected] Han, Zhuangzhuang Teachers College, Columbia University [email protected] Hou, Likun Educational Testing Services [email protected] Hansen, Mark University of California, Los Angeles [email protected] Huang, Chi-Yu ACT, Inc. [email protected] Harrell, Lauren University of California, Los Angeles [email protected] Huang, Xiaorui East China Normal University [email protected] Hayes, Heather AMTIS Inc. [email protected] Huggins-Manley, Anne Corinne University of Florida [email protected] Hayes, Stacy Discovery Education [email protected] Huh, Nooree ACT, Inc. [email protected] Hazen, Tim Iowa Testing Programs [email protected] Hunter, C. Vincent Georgia State University [email protected] He, Qingping Office of Qualifications and Examinations Regulation [email protected] Huo, Yan Educational Testing Service [email protected] He, Qiwei Educational Testing Service [email protected] Insko, William R Houghton Mifflin Harcourt [email protected] He, Yong ACT, Inc.
[email protected] Jang, Hyesuk American Institutes for Research [email protected] Herrera, Bill edCount, LLC [email protected] Jang, Hyesuk American Institutes for Research [email protected] Himelfarb, Igor Educational Testing Service (ETS) [email protected] Jewsbury, Paul Educational Testing Service [email protected] Ho, Emily H College Board [email protected] Jiang, Yanming Educational Testing Service [email protected] Jiang, Zhehan University of Kansas [email protected] Karadavut, Tugba University of Georgia [email protected] Jin, Kuan-Yu The Hong Kong Institute of Education [email protected] Karvonen, Meagan University of Kansas [email protected] Joo, Seang-hwane University of South Florida [email protected] Keller, Lisa A University of Massachusetts Amherst [email protected] Julian, Marc Data Recognition Corporation [email protected] Kenyon, Dorry Center for Applied Linguistics [email protected] Junker, Brian W Carnegie Mellon University [email protected] Kern, Justin L. University of Illinois at Urbana-Champaign [email protected] Kaliski, Pamela College Board [email protected] Kim, Dong-In Data Recognition Corporation [email protected] Kang, Hyeon-Ah University of Illinois at Urbana-Champaign [email protected] Kim, Dong-In Data Recognition Corporation [email protected] Kang, Yoon Jeong American Institutes for Research [email protected] Kim, Han Yi Measured Progress [email protected] Kang, Yujin University of Iowa [email protected] Kim, Hyung Jin The University of Iowa [email protected] Kannan, Priya Educational Testing Service [email protected] Kim, Ja Young ACT, Inc. [email protected] Kanneganti, Raghuveer Data Recognition Corporation CTB [email protected] Kim, Jinok UCLA/CRESST [email protected] Kao, Shu-chuan Pearson [email protected] Kim, Jong ACT [email protected] Kaplan, David University of Wisconsin – Madison [email protected] Kim, Se-Kang Fordham University [email protected] Kapoor, Shalini ACT [email protected] Kim, Sooyeon Educational Testing Service [email protected] Kim, Stella Y The University of Iowa [email protected] Latifi, Syed Muhammad Fahad University of Alberta [email protected] Kim, Sunhee Prometric [email protected] Lawson, Janelle San Francisco State University [email protected] Kim, Young Yee American Institutes for Research [email protected] Leacock, Claudia McGraw-Hill Education CTB [email protected] Kobrin, Jennifer L.
Pearson [email protected] LeBeau, Brandon University of Iowa [email protected] Koklu, Onder Florida Department of Education [email protected] Lee, Chansoon University of Wisconsin-Madison [email protected] Konold, Tim R University of Virginia [email protected] Lee, Chong Min ETS [email protected] Kroehne, Ulf German Institute for International Educational Research (DIPF) [email protected] Lee, HyeSun University of Nebraska-Lincoln [email protected] Lee, Sora University of Wisconsin, Madison [email protected] Kuhfeld, Megan University of California [email protected] Lei, Ming American Institutes for Research [email protected] Kuhfeld, Megan University of California, Los Angeles [email protected] Leventhal, Brian University of Pittsburgh [email protected] Kupermintz, Haggai University of Haifa [email protected] Li, Chen Educational Testing Service [email protected] Lai, Hollis University of Alberta [email protected] Li, Chen University of Maryland [email protected] Lao, Hongling University of Kansas [email protected] Lash, Andrea A. WestEd [email protected] Li, Cheng-Hsien Department of Pediatrics, University of Texas Medical School at Houston [email protected] Lathrop, Quinn N Northwest Evaluation Association [email protected] Li, Feifei Educational Testing Service [email protected] 204 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Li, Feiming University of North Texas Health Science Center [email protected] Ling, Guangming ETS [email protected] Li, Jie McGraw-Hill Education [email protected] Liu, Jinghua Secondary School Admission Test Board [email protected] Li, Ming University of Maryland [email protected] Liu, Lei ETS [email protected] Li, Tongyun Educational Testing Service [email protected] Liu, Xiang Teachers College, Columbia University [email protected] Li, Xin ACT, Inc. [email protected] Liu, Yang University of California, Merced [email protected] Li, Ying American Nurses Credentialing Center [email protected] Liu, Yue Sichuan Institute Of Education Sciences [email protected] Li, Zhushan Mandy Boston College [email protected] Lockwood, J.R. Educational Testing Service [email protected] Liao, Dandan University of Maryland, College Park [email protected] Longabach, Tanya Excelsior College [email protected] Liaw, Yuan-Ling University of Washington [email protected] Lopez, Alexis A ETS [email protected] Lim, Euijin The University of Iowa [email protected] Lord-Bessen, Jennifer McGraw Hill Education CTB [email protected] Lin, Chih-Kai Center for Applied Linguistics (CAL) [email protected] Lorié, William A Center for NextGen Learning & Assessment, Pearson [email protected] Lin, Haiyan ACT, Inc. [email protected] Lottridge, Susan Pacific Metrics, Inc. [email protected] Lin, Johnny University of California, Los Angeles [email protected] Lu, Lucy NSW Department of Education, Australia [email protected] Ling, Guangming Educational Testing Service [email protected] Lu, Ru Educational Testing Service [email protected] 205 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Lu, Ying Educational Testing Service [email protected] Matta, Tyler H. Northwest Evaluation Association [email protected] LUO, Fen Jiangxi Normal University [email protected] Maul, Andrew University of California, Santa Barbara [email protected] Luo, Xiao National Council of State Boards of Nursing [email protected] McCaffrey, Daniel F. 
Educational Testing Service [email protected] Luo, Xin Michigan State University [email protected] McCall, Marty Smarter Balanced Assessment Consortium [email protected] Ma, Wenchao Rutgers, The State University of New Jersey [email protected] McClellan, Catherine A Clowder Consulting [email protected] MacGregor, David Center for Applied Linguistics [email protected] McKnight, Kathy Center for Educator Learning & Effectiveness, Pearson [email protected] Magnus, Brooke E University of North Carolina at Chapel Hill [email protected] McTavish, Thomas S Center for Digital Data, Analytics and Adaptive Learning, Pearson [email protected] Mao, Xia Pearson [email protected] Meyer, Patrick University of Virginia [email protected] Marion, Scott National Center for the Improvement of Educational Assessment [email protected] Meyer, Robert H Education Analytics, Inc. [email protected] Martineau, Joseph National Center for the Improvement of Educational Assessment [email protected] Miel, Shayne Turnitin [email protected] Miller, Sherral College Board [email protected] Martineau, Joseph NCIEA [email protected] Monroe, Scott UMass Amherst [email protected] Masters, Jessica Measured Progress [email protected] Montee, Megan Center for Applied Linguistics [email protected] Matlock, Ki Lynn Oklahoma State University [email protected] 206 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Moretti, Antonio Center for Computational Learning Systems, Columbia University [email protected] Nydick, Steven W Pearson VUE [email protected] Ogut, Burhan American Institutes for Research [email protected] Morgan, Deanna L The College Board [email protected] O’Leary, Timothy Mark University of Melbourne [email protected] Morin, Maxim Medical Council of Canada [email protected] Olgar, Süleyman Florida Department of Education [email protected] Morris, Carrie A University of Iowa College of Education [email protected] Olgar, Süleyman Florida Department of Education [email protected] Morrison, Kristin M Georgia Institute of Technology [email protected] Oliveri, Maria Elena Educational Testing Service [email protected] Muntean, William Joseph Pearson [email protected] Olsen, James B. Renaissance Learning Inc. 
[email protected] Murphy, Stephen T Houghton Mifflin Harcourt [email protected] Özdemir, Burhanettin Hacettepe University [email protected] Naumann, Alexander German Institute for International Educational Research (DIPF) [email protected] Pak, Seohong University of Iowa [email protected] Naumenko, Oksana The University of North Carolina at Greensboro [email protected] Pan, Tianshu Pearson [email protected] Nebelsick-Gullet, Lori edCount [email protected] Park, Jiyoon Federation of State Boards of Physical Therapy [email protected] Nieto, Ricardo The University of Texas at Austin [email protected] Park, Yoon Soo University of Illinois at Chicago [email protected] Noh, Eunhee Korean Institute for Curriculum and Evaluation [email protected] Patelis, Thanos Center for Assessment [email protected] Norton, Jennifer Center for Applied Linguistics [email protected] Patelis, Thanos Center for Assessment [email protected] 207 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Peabody, Michael American Board of Family Medicine [email protected] Reboucas, Daniella University of Notre Dame [email protected] Perie, Marianne Center for Educational Testing and Evaluation [email protected] Reckase, Mark Michigan State University [email protected] Perie, Marianne CETE University of Kansas [email protected] Redell, Nick National Board of Osteopathic Medical Examiners (NBOME) [email protected] Phadke, Chaitali University of Minnesota [email protected] Renn, Jennifer Center for Applied Linguistics [email protected] Pohl, Steffi Freie Universität Berlin [email protected] Reshetnyak, Evgeniya Fordham University [email protected] Por, Han-Hui Educational Testing Service [email protected] Ricarte, Thales Akira Matsumoto Institute of Mathematical and Computer Sciences (ICMC-USP) [email protected] Powers, Sonya Pearson [email protected] Rick, Francis University of Massachusetts, Amherst [email protected] QIAN, HAIXIA University of Kansas [email protected] Rickels, Heather Anne University of Iowa, Iowa Testing Programs [email protected] Qian, Jiahe Educational Testing Service [email protected] Rios, Joseph A. Educational Testing Service [email protected] QIU, Xue-Lan The Hong Kong Institute of Education [email protected] Risk, Nicole M American Medical Technologists [email protected] Qiu, Yuxi University of Florida [email protected] Roduta Roberts, Mary University of Alberta [email protected] Quellmalz, Edys S WestEd [email protected] Rogers, H. Jane University of Connecticut [email protected] Rahman, Nazia Law School Admission Council [email protected] Rorick, Beth National Parent-Teacher Association Rankin, Jenny G. Illuminate Education [email protected] Rosen, Yigal Pearson [email protected] 208 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors Rubright, Jonathan D American Institute of Certified Public Accountants [email protected] Seltzer, Michael UCLA [email protected] Runyon, Christopher R. The University of Texas at Austin [email protected] Sen, Sedat Harran University [email protected] Rutkowski, Leslie University of Oslo [email protected] Sgammato, Adrienne Educational Testing Service [email protected] Rutstein, Daisy W. 
SRI International [email protected] Sha, Shuying University of North Carolina at Greensboro [email protected] Sabatini, John ETS [email protected] Shao, Can University of Notre Dame [email protected] Şahin, Füsun University at Albany, State University of New York [email protected] Shaw, Emily College Board [email protected] Saiar, Amin PSI Services LLC [email protected] Shear, Benjamin Stanford University [email protected] Sakworawich, Arnond National Institute of Development Administration [email protected] Shear, Benjamin R. Stanford University [email protected] Samonte, Kelli M. American Board of Internal Medicine [email protected] Sheehan, Kathleen M. ETS [email protected] Sano, Makoto Prometric [email protected] Shermis, Mark D University of Houston--Clear Lake [email protected] Sato, Edynn Pearson [email protected] Shin, Hyo Jeong ETS [email protected] Schultz, Matthew T American Institute of Certified Public Accountants [email protected] Shin, Nami University of California, Los Angeles/ National Center for Research on Evaluation, Standards, and Student Testing (CRESST) [email protected] Schwarz, Richard D. ETS [email protected] Secolsky, Charles Mississippi Department of Education [email protected] 209 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Shropshire, Kevin O. Virginia Tech (note I graduated in May 2014). I currently work at the University of Georgia (OIR) and this research is not affiliated with that department / university. I am providing the school where my research was conducted. [email protected] Sweet, Shauna J University of Maryland, College Park [email protected] Swift, David Houghton Mifflin Harcourt [email protected] Swinburne Romine, Russell University of Kansas [email protected] Shute, Valerie Florida State University [email protected] Tan, Xuan-Adele Educational Testing Service [email protected] Sinharay, Sandip Pacific Metrics Corp [email protected] Tang, Wei University of Alberta [email protected] Sireci, Stephen G. University of Massachusetts-Amherst [email protected] Tannenbaum, Richard J. Educational Testing Service [email protected] Skorupski, William P University of Kansas [email protected] Tao, Shuqin Curriculum Associates [email protected] Somasundaran, Swapna ETS [email protected] Terzi, Ragip Rutgers, The State University of New Jersey [email protected] Sorrel, Miguel A. Universidad Autónoma de Madrid [email protected] Thissen, David University of North Carolina [email protected] Stanke, Luke Minneapolis Public Schools [email protected] Thomas, Larry University of California, Los Angeles [email protected] Stone, Elizabeth Educational Testing Service [email protected] Thummaphan, Phonraphee University of Washington, Seattle [email protected] SU, YU-LAN ACT.ING [email protected] Torres Irribarra, David Pontificia Universidad Católica de Chile [email protected] Suh, Hongwook ACT, inc. [email protected] Traynor, Anne Purdue University [email protected] Sukin, Tia M Pacific Metrics [email protected] Trierweiler, Tammy J. 
Law School Admission Council (LSAC) [email protected] Svetina, Dubravka Indiana University [email protected] 210 Washington, DC, USA Contact Information for Individual and Coordinated Sessions First Authors TU, DONGBO Jiangxi Normal University [email protected] Wang, Shichao The University of Iowa [email protected] Underhill, Stephanie Indiana University - Bloomington [email protected] Wang, Shudong Northwest Evaluation Association [email protected] van Rijn, Peter ETS Global [email protected] Wang, Wei Educational Testing Service [email protected] Vansickle, Tim Questar Assessment Inc., [email protected] Wang, Wenyi Jiangxi Normal University [email protected] Vispoel, Walter P University of Iowa [email protected] Wang, Xi University of Massachusetts Amherst [email protected] von Davier, Matthias Educational Testing Service [email protected] Wang, Xiaolin Indiana University, Bloomington [email protected] Vue, Kory University of Minnesota [email protected] Wang, Zhen Educational Testing Service (ETS) [email protected] Wainer, Howard National Board of Medical Examiners [email protected] Weeks, Jonathan P ETS [email protected] Walker, Cindy University of Wisconsin - Milwaukee [email protected] Wei, Hua Pearson [email protected] Walker, Michael E The College Board [email protected] Wei, Xiaoxin Elizabeth American Institutes for Research [email protected] wang, aijun federation of state boards of physical therapy [email protected] Wei, Youhua Educational Testing Service [email protected] Wang, Hongling ACT, Inc. [email protected] Weiner, John A. PSI Services LLC [email protected] Wang, Keyin Michigan State University [email protected] Welch, Catherine University of Iowa [email protected] Wang, Lu ACT, Inc./The University of Iowa [email protected] Welch, Catherine J University of Iowa [email protected] 211 2016 Annual Meeting & Training Sessions Contact Information for Individual and Coordinated Sessions First Authors Wendler, Cathy Educational Testing Service [email protected] Xin, Tao Beijing Normal University [email protected] White, Lauren Florida Department of Education [email protected] Xiong, Xinhui American Institute for Certified Public Accountants [email protected] Wiberg, Marie Umeå University [email protected] Xu, Jing-Ru Pearson VUE [email protected] Widiatmo, Heru ACT, Inc. [email protected] Xu, Ting University of Pittsburgh [email protected] Wilson, Mark University of California, Berkeley [email protected] Yang, Ji Seung University of Maryland [email protected] Wilson, Mark University of California, Berkeley [email protected] Yao, Lihua Defense manpower data center [email protected] Wood, Scott W Pacific Metrics Corporation [email protected] Ye, Sangbeak University of Illinois - Urbana Champaign [email protected] Wu, Yi-Fang University of Iowa [email protected] Yi, qin Faculty of Education, Beijing Normal University [email protected] Wyatt, Jeff College Board [email protected] Yi, Qing ACT, Inc. 
[email protected] Xi, Nuo Educational Testing Service [email protected] Yin, Ping Curriculum Associates [email protected] Xiang, Shibei National Cooperative Innovation Center for Assessment and Improvement of Basic Education Quality [email protected] Yoo, Hanwook Henry Educational Testing Service [email protected] Yoon, Su-Youn Educational Testing Service [email protected] Xie, Chao American Institutes for Research [email protected] Yoon, Su-Youn ETS [email protected] Xie, Qing ACT/The University of Iowa [email protected] Zhan, Peida Beijing Normal University [email protected] Zhang, Jin ACT Inc. [email protected] Zhang, Jinming University of Illinois at Urbana-Champaign [email protected] Zhang, Mengyao National Conference of Bar Examiners [email protected] Zhang, Xue Northeast Normal University [email protected] Zhang, Yu Federation of State Boards of Physical Therapy [email protected] Zhao, Yang University of Kansas [email protected] Zheng, Chanjin Jiangxi Normal University [email protected] Zheng, Chunmei Pearson [email protected] Zheng, Xiaying University of Maryland [email protected] Zheng, Yi Arizona State University [email protected] Zweifel, Michael University of Nebraska-Lincoln [email protected]
NCME 2016 • Schedule-At-A-Glance
Time Room Type ID Title
CS = Coordinated Session • EB = Electronic Board Session • IS = Invited Session • PS = Paper Session • TS = Training Session
Thursday, April 7, 2016
8:00 AM–12:00 PM Meeting Room 6 TS AA Quality Control Tools in Support of Reporting Accurate and Valid Test Scores
8:00 AM–12:00 PM Meeting Room 7 TS BB IRT Parameter Linking
8:00 AM–5:00 PM Meeting Room 5 TS CC 21st Century Skills Assessment: Design, Development, Scoring, and Reporting of Character Skills
8:00 AM–5:00 PM Meeting Room 2 TS DD Introduction to Standard Setting
8:00 AM–5:00 PM Meeting Room 16 TS EE Analyzing NAEP Data Using Plausible Values and Marginal Estimation with AM
8:00 AM–5:00 PM Meeting Room 4 TS FF Multidimensional Item Response Theory: Theory and Applications and Software
1:00 PM–5:00 PM Meeting Room 3 TS GG New Weighting Methods for Causal Mediation Analysis
1:00 PM–5:00 PM Meeting Room 6 TS II Computerized Multistage Adaptive Testing: Theory and Applications (Book by Chapman and Hall)
Friday, April 8, 2016
8:00 AM–12:00 PM Renaissance West B TS JJ Landing Your Dream Job for Graduate Students
8:00 AM–12:00 PM Meeting Room 4 TS KK Bayesian Analysis of IRT Models using SAS PROC MCMC
8:00 AM–5:00 PM Meeting Room 2 TS LL flexMIRT®: Flexible multilevel multidimensional item analysis and test scoring
8:00 AM–5:00 PM Meeting Room 5 TS MM Aligning ALDs and Item Response Demands to Support Teacher Evaluation Systems
8:00 AM–5:00 PM Renaissance East TS NN Best Practices for Lifecycles of Automated Scoring Systems for Learning and Assessment
8:00 AM–5:00 PM Meeting Room 3 TS OO Test Equating Methods and Practices
8:00 AM–5:00 PM Renaissance West A TS PP Diagnostic Measurement: Theory, Methods, Applications, and Software
1:00 PM–5:00 PM Renaissance West B TS QQ Effective Item Writing for Valid Measurement
3:00 PM–8:00 PM Meeting Room 11 Board Meeting
4:30 PM–6:30 PM Fado’s Irish Pub, 808 7th Street NW, Washington, DC 20001 Graduate Student Social
6:30 PM–10:00 PM Convention Center, Level Three, Ballroom C AERA Centennial Symposium & Centennial Reception
Saturday, April 9, 2016
6:30 AM–7:30 AM Meeting Room 7 Sunrise Yoga
8:15 AM–10:15 AM Renaissance East IS A1 NCME Book Series Symposium: The Challenges to Measurement in an Era of Accountability
8:15 AM–10:15 AM Renaissance West A CS A2 Collaborative Problem Solving Assessment: Challenges and Opportunities
8:15 AM–10:15 AM Renaissance West B CS A3 Harnessing Technological Innovation in Assessing English Learners: Enhancing Rather Than Hindering
8:15 AM–10:15 AM Meeting Room 3 PS A4 How can assessment inform classroom practice?
8:15 AM–10:15 AM Meeting Room 4 CS A5 Enacting a Learning Progression Design to Measure Growth
8:15 AM–10:15 AM Meeting Room 5 PS A6 Testlets and Multidimensionality in Adaptive Testing
8:15 AM–10:15 AM Meeting Room 12 PS A7 Methods for Examining Local Item Dependence and Multidimensionality
10:35 AM–12:05 PM Renaissance East CS B1 The End of Testing as We Know it?
10:35 AM–12:05 PM Renaissance West A CS B2 Fairness and Machine Learning for Educational Practice
10:35 AM–12:05 PM Renaissance West B CS B3 Item Difficulty Modeling: From Theory to Practice
10:35 AM–12:05 PM Meeting Room 3 PS B4 Growth and Vertical Scales
10:35 AM–12:05 PM Meeting Room 4 PS B5 Perspectives on Validation
10:35 AM–12:05 PM Meeting Room 5 PS B6 Model Fit
10:35 AM–12:05 PM Meeting Room 12 PS B7 Simulation- and Game-based Assessments
10:35 AM–12:05 PM Meeting Room 10 PS B8 Test Security and Cheating
12:25 PM–1:55 PM Renaissance East CS C1 Opting out of testing: Parent rights versus valid accountability scores
12:25 PM–1:55 PM Renaissance West A CS C2 Building toward a validation argument with innovative field test design and analysis
12:25 PM–1:55 PM Renaissance West B CS C3 Towards establishing standards for spiraling of contextual questionnaires in large-scale assessments
12:25 PM–1:55 PM Meeting Room 3 CS C4 Estimation precision of variance components: Revisiting generalizability theory
12:25 PM–1:55 PM Meeting Room 4 PS C5 Sensitivity of Value-Added Models
12:25 PM–1:55 PM Meeting Room 5 PS C6 Item and Scale Drift
12:25 PM–1:55 PM Meeting Room 12 PS C7 Cognitive Diagnostic Model Extensions
12:25 PM–1:55 PM Mount Vernon Square EB C8
2:15 PM–3:45 PM Renaissance East IS D1 Assessing the assessments: Measuring the quality of new college- and career-ready assessments
2:15 PM–3:45 PM Renaissance West A CS D2 Some psychometric models for learning progressions
2:15 PM–3:45 PM Renaissance West B CS D3 Multiple Perspectives on Promoting Assessment Literacy for Parents
2:15 PM–3:45 PM Meeting Room 3 PS D4 Equating Mixed-Format Tests
2:15 PM–3:45 PM Meeting Room 4 PS D5 Standard Setting
2:15 PM–3:45 PM Meeting Room 5 PS D6 Diagnostic Classification Models: Applications
2:15 PM–3:45 PM Meeting Room 12 PS D7 Advances in IRT Modelling and Estimation
2:15 PM–3:45 PM Mount Vernon Square EB D8 GSIC Poster Session
4:05 PM–6:00 PM Renaissance East CS E1 Do Large Scale Performance Assessments Influence Classroom Instruction? Evidence from the Consortia
4:05 PM–6:05 PM Renaissance West A CS E2 Applications of Latent Regression to Modeling Student Achievement, Growth, and Educator Effectiveness
4:05 PM–6:05 PM Renaissance West B CS E3 Jail Terms for Falsifying Test Scores: Yes, No or Uncertain?
4:05 PM–6:05 PM Meeting Room 3 PS E4 Test Design and Construction
4:05 PM–6:05 PM Meeting Room 4 CS E5 Tablet Use in Assessment
4:05 PM–6:05 PM Meeting Room 5 PS E6 Topics in Multistage and Adaptive Testing
4:05 PM–6:05 PM Meeting Room 12 PS E7 Cognitive Diagnosis Models: Exploration and Evaluation
4:05 PM–5:35 PM Mount Vernon Square EB E8
6:30 PM–8:00 PM Grand Ballroom South NCME and Division D Reception
Sunday, April 10, 2016
8:00 AM–9:00 AM Marriott Marquis Hotel, Marquis Salon 6 Breakfast and Business Session
9:00 AM–9:40 AM Marriott Marquis Hotel, Marquis Salon 6 Presidential Address: Education and the Measurement of Behavioral Change
10:35 AM–12:05 PM Renaissance East IS F1 Career Award: Do Educational Assessments Yield Achievement Measurements
10:35 AM–12:05 PM Renaissance West A IS F2 Debate: Should the NAEP Mathematics Framework be revised to align with the Common Core State Standards?
10:35 AM–12:05 PM Renaissance West B CS F3 Beyond process: Theory, policy, and practice in standard setting
10:35 AM–12:05 PM Meeting Room 3 CS F4 Exploring Timing and Process Data in Large-Scale Assessments
10:35 AM–12:05 PM Meeting Room 4 CS F5 Psychometric Challenges with the Machine Scoring of Short-Form Constructed Responses
10:35 AM–12:05 PM Meeting Room 5 PS F6 Advances in Equating
10:35 AM–12:05 PM Meeting Room 15 PS F7 Novel Approaches for the Analysis of Performance Data
10:35 AM–12:05 PM Mount Vernon Square EB F8
12:25 PM–2:25 PM Convention Center, Level Three, Ballroom ABC AERA Awards Luncheon
2:45 PM–4:15 PM Renaissance East CS G1 Challenges and Opportunities in the Interpretation of the Testing Standards
2:45 PM–4:15 PM Renaissance West A CS G2 Applications of Combinatorial Optimization in Educational Measurement
2:45 PM–4:15 PM Renaissance West B PS G3 Psychometrics of Teacher Ratings
2:45 PM–4:15 PM Meeting Room 3 PS G4 Multidimensionality
2:45 PM–4:15 PM Meeting Room 4 PS G5 Validating “Noncognitive”/Nontraditional Constructs I
2:45 PM–4:15 PM Meeting Room 5 PS G6 Invariance
2:45 PM–4:15 PM Meeting Room 15 PS G7 Detecting Aberrant Response Behaviors
2:45 PM–4:15 PM Mount Vernon Square EB G8 GSIC Poster Session
4:35 PM–5:50 PM Convention Center, Level Three, Ballroom C AERA Presidential Address
4:35 PM–6:05 PM Renaissance East CS H1 Advances in Balanced Assessment Systems: Conceptual framework, informational analysis, application to accountability
4:35 PM–6:05 PM Renaissance West A CS H2 Minimizing Uncertainty: Effectively Communicating Results from CDM-based Assessments
4:35 PM–6:05 PM Meeting Room 16 CS H3 Overhauling the SAT: Using and Interpreting Redesigned SAT Scores
4:35 PM–6:05 PM Meeting Room 3 CS H4 Quality Assurance Methods for Operational Automated Scoring of Essays and Speech
4:35 PM–6:05 PM Meeting Room 4 PS H5 Student Growth Percentiles
4:35 PM–6:05 PM Meeting Room 5 PS H6 Equating: From Theory to Practice
4:35 PM–6:05 PM Meeting Room 15 PS H7 Issues in Ability Estimation and Scoring
4:35 PM–6:05 PM Mount Vernon Square EB H8
6:30 PM–8:00 PM Renaissance West B President’s Reception
Monday, April 11, 2016
5:45 AM–7:00 AM NCME Fitness Run/Walk
8:15 AM–10:15 AM Meeting Room 13/14 IS I1 NCME Book Series Symposium: Technology and Testing
8:15 AM–10:15 AM Meeting Room 8/9 CS I2 Exploring Various Psychometric Approaches to Report Meaningful Subscores
8:15 AM–10:15 AM Meeting Room 3 CS I3 From Items to Policies: Big Data in Education
8:15 AM–10:15 AM Meeting Room 4 CS I4 Methods and Approaches for Validating Claims of College and Career Readiness
8:15 AM–10:15 AM Renaissance West A IS I5 Recent Advances in Quantitative Social Network Analysis in Education
8:15 AM–10:15 AM Meeting Room 15 PS I6 Issues in Automated Scoring
8:15 AM–10:15 AM Meeting Room 16 PS I7 Multidimensional and Multivariate Methods
10:35 AM–12:05 PM Renaissance West A IS J1 Hold the Presses! How Measurement Professionals can Speak More Effectively with the Press and the Public (Education Writers Association Session)
10:35 AM–12:05 PM Meeting Room 8/9 CS J2 Challenges and solutions in the operational use of automated scoring systems
10:35 AM–12:05 PM Meeting Room 3 CS J3 Novel Models to Address Measurement Errors in Educational Assessment and Evaluation Studies
10:35 AM–12:05 PM Meeting Room 4 CS J4 Mode Comparability Investigation of a CCSS based K-12 Assessment
10:35 AM–12:05 PM Meeting Room 16 PS J5 Validating “Noncognitive”/Nontraditional Constructs II
10:35 AM–12:05 PM Meeting Room 15 PS J6 Differential Functioning - Theory and Applications
10:35 AM–12:05 PM Meeting Room 5 PS J7 Latent Regression and Related Topics
11:00 AM–2:00 PM Meeting Room 12 Past Presidents Luncheon
12:25 PM–1:55 PM Meeting Room 8/9 IS K1 The Every Student Succeeds Act (ESSA): Implications for measurement research and practice
12:25 PM–1:55 PM Renaissance West A CS K2 Career Paths in Educational Measurement: Lessons Learned by Accomplished Professionals
12:25 PM–1:55 PM Meeting Room 3 CS K3 Recent Investigations and Extensions of the Hierarchical Rater Model
12:25 PM–1:55 PM Meeting Room 4 CS K4 The Validity of Scenario-Based Assessment: Empirical Results
12:25 PM–1:55 PM Meeting Room 5 PS K5 Item Design and Development
12:25 PM–1:55 PM Meeting Room 15 PS K6 English Learners
12:25 PM–1:55 PM Meeting Room 16 PS K7 Differential Item and Test Functioning
12:25 PM–1:55 PM Mount Vernon Square EB K8
2:15 PM–3:45 PM Renaissance West A IS L1 Learning from History: How K-12 Assessment Will Impact Student Learning Over the Next Decade (National Association of Assessment Directors)
2:15 PM–3:45 PM Meeting Room 8/9 CS L2 Psychometric Issues on the Operational New-Generation Consortia Assessments
2:15 PM–3:45 PM Meeting Room 3 CS L3 Issues and Practices in Multilevel Item Response Models
2:15 PM–3:45 PM Meeting Room 4 CS L4 Psychometric Issues in Alternate Assessments
2:15 PM–3:45 PM Meeting Room 5 CS L5 Recommendations for Addressing the Unintended Consequences of Increasing Examination Rigor
2:15 PM–3:45 PM Meeting Room 15 PS L6 Innovations in Assessment
2:15 PM–3:45 PM Meeting Room 12 PS L7 Technology-based Assessments
2:15 PM–3:45 PM Meeting Room 13/14 IS L8 NCME Diversity and Testing Committee Sponsored Symposium: Implications of Computer-Based Testing for Assessing Diverse Learners: Lessons Learned from the Consortia
3:00 PM–7:00 PM Meeting Room 10/11 Board Meeting
4:05 PM–6:05 PM Meeting Room 8/9 CS M1 Fairness Issues and Validation of Non-Cognitive Skills
4:05 PM–6:05 PM Meeting Room 3 CS M2 Thinking about your Audience in Designing and Evaluating Score Reports
4:05 PM–6:05 PM Meeting Room 4 CS M3 Use of automated tools in listening and reading item generation
4:05 PM–6:05 PM Meeting Room 5 PS M4 Practical Issues in Equating
4:05 PM–6:05 PM Meeting Room 16 PS M5 The Great Subscore Debate
4:05 PM–6:05 PM Meeting Room 12 PS M6 Scores and Scoring Rules
4:05 PM–6:05 PM Meeting Room 13/14 IS M7 On the use and misuse of latent variable scores
National Council on Measurement in Education is very grateful to the following organizations for their generous financial support of our 2016 Annual Meeting
National Council on Measurement in Education
100 North 20th Street, Suite 400, Philadelphia, PA 19103
(215) 461-6263
http://www.ncme.org/