CA-YASI Predictive Utility

How well do scores and classifications predict youths’ infractions and re-arrest?

Jennifer Skeem, PhD
Patrick Kennealy, PhD
Isaias Hernandez, Stephanie Clark, Joseph Tatar II
University of California, Irvine
Contact: skeem@uci.edu; (949) 294-1472

Table of Contents

Executive Summary
    Context
    Objective
    Method
    Results & Conclusion
    Recommendations
Rationale & Method
    Overview of evaluation
    Importance of predictive utility
    Development of (male) study samples
    Description of study participants
    Prediction variables: CA-YASI assessments
    Criterion variables: Infractions and arrests
Analyses & Results
    Does the predictive utility of the CA-YASI vary as a function of youths’ ethnicity?
    How well do CA-YASI scores predict youths’ infractions and arrests?
    How well do CA-YASI classifications predict youths’ infractions and arrests?
Conclusions
Recommendations
    The present: Understand what high CA-YASI scores mean (and don’t mean)
    The future: Streamline the DJJ approach to risk assessment
Appendix: Supplemental Tables

Executive Summary

Context

Years ago, the California Division of Juvenile Justice (DJJ) purchased the California-Youth Assessment and Screening Instrument (CA-YASI) from Orbis Partners Incorporated to help structure its decision-making about, and treatment of, youth. For the past three years, we have been conducting a court-mandated, independent evaluation of the CA-YASI.
The three specific aims of this evaluation are to (1) examine the extent to which DJJ staff are able to reliably score the CA-YASI, (2) evaluate how well the CA-YASI assesses the risk factors it purports to assess, and (3) assess the utility of this tool in predicting future infractions and re-arrest.

Aim One was addressed in our first report. We found that 60% of DJJ staff were able to score the CA-YASI with adequate reliability at the total score level. Because a tool cannot be valid when it is not scored reliably, we excluded the remaining 40% of DJJ staff from subsequent reports, which represent a “best case scenario” for the validity of the CA-YASI.

Aim Two was addressed in our second report. We found little evidence that central CA-YASI domains capture the variable risk factors they are meant to assess (e.g., anger/hostility, criminal thinking). Thus, the tool’s ability to guide evidence-based treatment efforts is limited.

Aim Three is addressed in this third and final report. Here, we use reliable CA-YASI ratings to assess how well the measure predicts both infractions during detention and arrests after release to the community.

Objective

Some risk assessment tools are relatively short and simple; many, like the CA-YASI, are relatively long and/or complex. Loosely, these tools represent an evolution in risk assessment over time, from prediction-oriented approaches (which were designed solely to achieve efficient prediction) to reduction-oriented approaches (which also emphasize variable risk factors that theoretically can be changed to reduce risk). Across tools and orientations, good predictive utility is the most fundamental criterion of risk assessment. Risk assessment tools generate scores and classifications (e.g., “low,” “medium,” “high” risk) that ostensibly predict youths’ future antisocial behavior (e.g., infractions, arrests).
Predictive utility may be defined as the accuracy with which a tool characterizes a youth’s likelihood of future antisocial behavior, compared to other youth. Good predictive utility is a prerequisite for sound legal and correctional applications of risk assessment tools. A tool with poor predictive utility will promote erroneous legal decisions that sometimes have serious consequences. For example, a youth erroneously classified as “low risk” and released to the community without supervision may violently reoffend. Or a youth erroneously classified as “high risk” and needlessly confined for months may be exposed to adverse conditions (e.g., procriminal peers; separation from prosocial family) that exacerbate risk. Moreover, when staff use functionally meaningless risk classifications to select levels of custody and treatment, correctional resources will not be maximized and offending will not be reduced.

The objective of this Phase III assessment is to determine how well CA-YASI scores and classifications predict infractions during detention, and arrest after release.

Method

To address this objective, we carefully produced relevant databases by merging data obtained from Orbis (CA-YASI assessments), the California Department of Justice (arrests), and DJJ (detention periods and infractions). Study eligibility criteria were (a) male gender (because there were too few girls to adequately assess predictive utility), and (b) assessment by a DJJ staff member with adequate reliability in scoring the CA-YASI (because unreliable scores cannot be valid).

Participants were 846 ethnically diverse male youth with an average age of 17 years. Over half had a history of a violent offense. During their DJJ detention, participants were assessed with the CA-YASI by a reliable staff member, as part of routine practice. All 846 youth were subsequently followed in DJJ facilities to determine whether they committed an infraction.
The subset of 364 youth who were released from DJJ facilities were also followed in the community to determine whether they were arrested. Youths’ time at risk (i.e., the time they were in the institution and at risk for an infraction, or in the community and at risk for an arrest) was controlled for in all analyses. On average, youth were at risk for an infraction for 7 months (SD = 4) and for an arrest for 10 months (SD = 6).

We used the “new” system recently developed by Orbis to score the CA-YASI (not the system currently in use). Analyses focused on Grand Total Scores, as well as three global subscales (Total Static Risk, Total Dynamic Risk, and Protective). Infractions and arrests were coded into three categories: Any (i.e., any kind of infraction or arrest, including minor ones), Serious (i.e., infractions that can increase detention by 120 or 240 days; arrest for a person or property crime), and Violent (i.e., infraction or arrest for physical violence).

Results & Conclusion

The results represent a best case scenario for the predictive utility of the CA-YASI with male youth because we (a) excluded the 47% of youth who were assessed by DJJ staff who could not score the tool reliably, and (b) used “new” CA-YASI scores developed by Orbis to maximize predictive utility, rather than the scoring system currently programmed into DJJ software. With this best case scenario framework in mind, the study yielded three main findings:

1. The predictive utility of the CA-YASI for infractions and arrests does not depend upon youths’ ethnicity. This is an important positive finding, given the diverse ethnic distribution of DJJ youth: the tool appears to work equally well for White, Black, and Hispanic youth.

2. Total CA-YASI scores and classifications perform quite well in predicting institutional infractions, particularly serious infractions and violent infractions.
For example, compared to youth classified as low risk by the CA-YASI, those classified as medium or high risk were three or six times more likely to commit a serious infraction, respectively.

3. Although total CA-YASI scores and classifications moderately predict “any” form of arrest, they perform poorly in predicting serious arrests and violent arrests. For example, compared to youth classified as low risk by the CA-YASI, those classified as medium or high risk were both about 1.5 times more likely to be arrested for a serious offense.

Having outlined the study’s three main findings, we now present more specific results on the performance of Grand Total CA-YASI scores and classifications. The performance of CA-YASI scores in predicting Any, Serious, and Violent infractions and arrests is summarized in Figure 1. The Y axis represents the Area Under the ROC Curve (AUC), which ranges from .50 (predictive accuracy no better than chance) to 1.0 (perfect predictive accuracy). As shown in Figure 1, CA-YASI scores strongly predict Serious Infractions and Violent Infractions, moderately predict Any Infraction and Any Arrest, and poorly predict Serious Arrests and Violent Arrests.

Figure 1. Predictive utility of CA-YASI Total Scores for Infractions and Arrests

Generally, instruments that “can produce AUC values of 0.70 or above are considered acceptable for clinical application purposes” (Zhang, Roberts, & Farabee, 2011). By this definition, the CA-YASI produces an acceptable level of predictive utility for infractions (serious and violent), but not arrests (any, serious, or violent). To place the tool’s predictive utility in context, relative to other instruments used in the CDCR system, the CA-YASI performs somewhat more poorly in predicting any arrest for youth (AUC=.66) than the COMPAS performs in predicting a general arrest for adults (AUC=.70; Zhang et al., 2011).
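To make the AUC concrete, the sketch below computes it the way it is often interpreted: as the probability that a randomly chosen youth who had the outcome received a higher score than a randomly chosen youth who did not (ties count as one half). The scores are hypothetical illustration values, not study data, and the function names `auc` and `meets_threshold` are assumptions introduced here; this is not the analysis code used in the report.

```python
from itertools import product


def auc(scores_with_outcome, scores_without_outcome):
    """Pairwise AUC: share of (outcome, no-outcome) pairs in which the
    youth with the outcome has the higher score; ties count as 0.5."""
    wins = 0.0
    for pos, neg in product(scores_with_outcome, scores_without_outcome):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


def meets_threshold(auc_value, threshold=0.70):
    """Apply the 0.70 'acceptable' benchmark quoted from Zhang et al. (2011)."""
    return auc_value >= threshold


# Hypothetical total scores (NOT study data).
infraction = [35, 50, 62, 80]       # youth with a serious infraction
no_infraction = [10, 20, 35, 55]    # youth without one

value = auc(infraction, no_infraction)
print(value)                    # 0.84375
print(meets_threshold(value))   # True
print(meets_threshold(0.66))    # False: the AUC reported for "any arrest"
```

An AUC of .50 under this definition means the youth with the outcome has the higher score in only half of the pairs, which is what chance-level ranking would produce.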
The results above indicate “in the abstract” (that is, in theory, across all possible cut scores) how well the CA-YASI predicts antisocial behavior. But CA-YASI scores are difficult to interpret, and tend not to be used in daily DJJ practice. Instead, CA-YASI classifications of youth as “low,” “medium,” or “high” risk are used to inform decision-making. As shown in Figure 2, Grand Total CA-YASI classifications perform well in differentiating among youth who will have low (21%), medium (46%), and high (71%) rates of Serious Infractions over the next six months, and poorly in differentiating among youth who will have low (46%), medium (61%), and high (57%) rates of Serious Arrests during the year after their release. They perform better in differentiating among youth who will have low (55%), medium (75%), and high (87%) rates of Any Arrests during the year after release.

Figure 2. Predictive utility of CA-YASI classifications for Serious Infractions and Serious Arrests

Recommendations

The implications of this study’s results largely depend on how DJJ is using the CA-YASI now, and whether/how it plans to use this instrument in the future. We would be happy to meet with relevant executives to discuss these issues. In the interim, we have two groups of general recommendations.

First, we recommend that DJJ executives, administrators, and staff bear in mind that high scores on the CA-YASI chiefly mean that youth are at risk for serious and violent infractions during detention and, to a lesser extent, any arrest after release. Youth with high total CA-YASI scores are not at disproportionate risk for a serious or violent arrest in the community. If DJJ personnel are interested in identifying youth at risk for serious recidivism after release, we recommend that they use Total Static Risk scores on the CA-YASI (with the realization that these scores are only moderately predictive).
As a whole, the CA-YASI may be more useful for structuring supervision and treatment decisions during detention than after release.

Second, given the length and complexity of the CA-YASI, we recommend that DJJ carefully consider whether the tool adds value to simpler measures of risk (e.g., the publicly available CAIS, which consists of only ten items; or a “home-grown” tool for youth that is modeled after the adult California Static Risk Assessment). In our Phase II study, we found little evidence of added value; that is, little support for the notion that the CA-YASI validly assesses strong variable risk factors that can inform evidence-based treatment efforts. One option is to begin anew by developing or adopting an efficient risk assessment tool. Another option is to streamline, simplify, and generally improve the CA-YASI. The fact that the CA-YASI has been customized for the DJJ population and adequately predicts several forms of infractions and “any arrest” makes the latter option appealing. If the CA-YASI is retained, DJJ must implement a system for training and monitoring staff to accurately score the tool, given that 40% of staff currently have inadequate scoring reliability.

Rationale & Method

Overview of evaluation

Few ideals have greater traction in current discourse than “evidence-based practice.” According to this ideal, the best research informs practice that improves outcomes. Across the United States, budget cuts are fueling interest in evidence-based corrections. Policymakers wish to spend limited dollars wisely, in the manner that will best protect public safety. How can they do so?

First, by supporting the use of well-validated, structured tools to assess offenders’ risk of recidivism and inform sentencing and placement decisions. Research has established that validated risk assessment tools significantly improve professionals’ ability to predict future criminal behavior.
Increasingly, these tools are being applied in response to regulations that require assessments to identify “high risk” individuals for detention or “low risk” individuals for release.

Second, by supporting correctional programs that (a) match the intensity of services and supervision to an offender’s level of risk (such that high risk cases get high intensity services/supervision), and (b) target variable risk factors for crime (e.g., criminal thinking patterns) rather than variables that are less crime-relevant (e.g., low self-esteem). Programs that follow these principles have been shown to significantly reduce recidivism.

Increased interest in evidence-based corrections has created an active market for risk assessment tools. Tools that measure variable risk factors and can therefore inform risk reduction efforts are particularly popular. A handful of companies are selling tools to corrections agencies across the nation. However, the evidence base for these tools varies considerably: some tools in widespread use are not well-validated.

The mission of the California Division of Juvenile Justice (DJJ) is to protect public safety, partly by providing youth with “a range of training and treatment services” that could help them desist from crime. Years ago, DJJ purchased the California-Youth Assessment and Screening Instrument (CA-YASI) from a company called Orbis Partners Incorporated (hereafter, “Orbis”) to help structure its decision-making about youth. Among tools currently on the market, the CA-YASI is an appealing option for reducing risk because it ostensibly taps variable factors that reliably predict recidivism. However, it has not been extensively evaluated.

In this court-mandated, independent evaluation, we assess whether the CA-YASI is a good tool for making placement and release decisions (i.e., assessing risk) and identifying supervision and intervention targets (i.e., identifying variable risk factors).
There are three specific aims:

Aim 1: to examine the extent to which DJJ staff are able to reliably score the CA-YASI,
Aim 2: to evaluate how well the CA-YASI assesses the treatment-relevant risk factors it purports to assess, and
Aim 3: to assess the utility of this tool in predicting future infractions and re-arrest.

Aim One was addressed in our first report (finalized May 2011). We found that 60% of DJJ staff were able to score the CA-YASI with adequate reliability at the total score level. Because a tool cannot be valid when it is not scored reliably, we excluded the remaining 40% of DJJ staff from subsequent reports, which represent a “best case scenario” for the validity of the CA-YASI.

Aim Two was addressed in our second report (submitted December 2012). We found little evidence that the CA-YASI domains capture the variable risk factors they are meant to assess (e.g., anger/hostility, criminal thinking). With the possible exception of substance abuse, these domains provide little guidance for evidence-based treatment efforts. In contrast, there was strong evidence that the CA-YASI captures relatively static individual risk factors for crime (e.g., criminal history, antisocial pattern). For example, CA-YASI Total scores were strongly associated with a well-validated measure of social deviance, i.e., “Factor 2” of the Psychopathy Checklist: Youth Version. This bodes well for the CA-YASI’s utility in predicting misbehavior, given that this scale, like simpler measures of criminal history, robustly predicts recidivism.

Aim Three is addressed in this third and final report. Here, we use reliable CA-YASI ratings to assess how well the measure predicts both infractions (during detention) and arrests (after release).

Importance of predictive utility

Some risk assessment tools are relatively short and simple; many, like the CA-YASI, are relatively long and/or complex. Loosely, these tools represent an evolution in risk assessment over time, from prediction-oriented approaches (which were designed solely to achieve efficient prediction) to reduction-oriented approaches (which also emphasize variable risk factors that theoretically can be changed to reduce risk).
Across tools and orientations, good predictive utility is the most fundamental criterion of risk assessment. Risk assessment tools generate scores and classifications (e.g., “low,” “medium,” “high” risk) that ostensibly predict youths’ future antisocial behavior (e.g., infractions, arrests). Predictive utility may be defined as the accuracy with which a tool characterizes a youth’s likelihood of future antisocial behavior, compared to other youth.

Good predictive utility is a prerequisite for sound legal and correctional applications of risk assessment tools. A tool with poor predictive utility will promote erroneous legal decisions that sometimes have serious consequences. For example, a youth erroneously classified as “low risk” and released to the community without supervision may violently reoffend. Or a youth erroneously classified as “high risk” and needlessly confined for months may be exposed to adverse conditions that exacerbate risk. At best, using a tool with poor predictive utility is a waste of precious juvenile justice resources. When risk classifications that are functionally meaningless are used to select levels of custody and treatment, correctional resources will not be maximized and recidivism will not be reduced. In brief, even when a risk assessment tool is oriented toward risk reduction, it must have adequate predictive utility to get to “first base.”

Development of (male) study samples

All study procedures were approved by institutional review boards (for both the state and the University of California, Irvine). Four datasets were carefully processed to assess the predictive utility of the CA-YASI for youths’ infractions and re-arrests. The first dataset (on CA-YASI assessments) defined the eligible population for the remaining three datasets.
1) CA-YASI assessments: Between November 2, 2010 (the completion date for our CA-YASI inter-rater reliability study) and April 4, 2012 (the date Orbis provided us with CA-YASI data), 1,830 DJJ youth were assessed with the CA-YASI. The inclusion criterion for the present study was assessment by a DJJ staff member who demonstrated adequate CA-YASI scoring reliability in our Phase I study. Of the 1,830 youth assessed, 978 (53%) were assessed by reliable DJJ staff and selected for the study. Of these 978 youth, only 33 were female (3% of the sample, as in the DJJ population). The female sample was too small and their base rate of recidivism was too low to support adequately powered, stable tests of predictive utility. For this reason, girls were excluded from this report. The 945 male youth scored by reliable DJJ staff were the eligible population for the remaining three databases (on time at risk, infractions, and arrests; see footnote 1).

2) Time at risk in the institution (for infractions) and in the community (for arrests): Of the 945 eligible youth (see #1), “time at risk” data were available for 881 youth. Specifically, these were data on youths’ dates of release and entry from DJJ institutions during the reference period (11/2/10 to 11/30/12), obtained from DJJ.

3) Infractions: Of eligible youth (see #1), infraction data were available for 903 youth. These were data on youths’ infractions (or lack thereof) that occurred during the reference period (11/2/10 to 4/7/12), obtained from DJJ.

4) Arrests: Of eligible youth (see #1), arrest data were available for 496 youth. These were data on youths’ arrests (or lack thereof) that occurred during the reference period (11/2/10 to 11/30/12), obtained from the CA DOJ. (The arrest sample is smaller than the infraction sample because only the subset of youth who are released to the community are at risk for re-arrest, whereas all youth had some time at risk for an institutional infraction.)
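The counts in the list above can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the list; the variable names and the `percent` helper are assumptions introduced for illustration, and the coverage percentages it derives are simple ratios, not figures stated in the report.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


assessed = 1830   # youth assessed with the CA-YASI in the assessment window
reliable = 978    # assessed by staff with adequate scoring reliability
female = 33       # excluded: sample too small for stable tests
eligible_male = reliable - female

# Eligible male youth with data available in each of the other datasets.
available = {"time_at_risk": 881, "infractions": 903, "arrests": 496}
coverage = {name: percent(n, eligible_male) for name, n in available.items()}

print(percent(reliable, assessed))  # 53
print(percent(female, reliable))    # 3
print(eligible_male)                # 945
print(coverage)
```

The low coverage for arrests reflects the point made in item 4: only youth released to the community can appear in the arrest data.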
We merged these datasets to create two samples: an “infraction” sample (from datasets 1, 2, and 3) and an “arrest” sample (from datasets 1, 2, and 4). The infraction sample is 846 youth (of the 903 male youth with reliable CA-YASI assessments and infraction data, 57 had no time at risk data). The arrest sample is 364 youth (of the 496 male youth with reliable CA-YASI assessments and arrest data, 132 had no time at risk data).

Footnote 1: We excluded 19 cases with reliable CA-YASI assessments because they had been included in a sample ...

Description of study participants

The arrest sample is essentially a subset of the infraction sample. Because the characteristics of the two samples are virtually identical, the larger sample will be described here. Participants were 846 male youth with CA-YASI assessments completed by reliable DJJ staff. This sample is representative of the total male DJJ population with respect to age, ethnicity, and index offense. For example, the average age of participants is 17.00 years (SD = 1.88), which is similar to that of the DJJ population (18.42 years, SD = 1.86; CDCR, 2012). Similarly, like the total DJJ population, participants were 60.5% Hispanic, 29.4% Black, 6.3% White, and 1.7% other ethnicities. In terms of legal history, participants were approximately 14 years old (M = 13.74, SD = 1.96) at first arrest, and over half (55.6%) had at least one assault or other violent offense in their criminal history.

Prediction variables: CA-YASI assessments

Assessment procedure. Eligible youth were assessed with the CA-YASI by reliable DJJ staff as part of routine practice. For the majority of eligible youth who were included in both samples, only one CA-YASI assessment was available, so that assessment was used to predict both infractions and arrests. However, for 160 eligible youth who were included in both samples, multiple CA-YASI assessments were available. For these youth, the earliest CA-YASI assessment was used to predict institutional infractions (to maximize time at risk), and the latest CA-YASI assessment was used to predict community arrests (to maximize recency). For these 160 cases, the average time between assessments was almost six months (M = 5.73, SD = 3.56). It is also important to note that the average time between a CA-YASI assessment that was used to predict arrests and youths’ DJJ release date was about six months (M = 5.88 months, SD = 5.22).

Use of new scoring system. As noted in our first two reports, three different scoring systems are available for the CA-YASI (i.e., “old,” “new,” and “simple”).
In the present report, we use the new scoring system, which assigns weights to items based on how strongly they predicted infractions and arrests in a sample of DJJ youth. The new system neutralizes several items (i.e., effectively deletes them) because they did not predict offending. In essence, the present study is a cross-validation analysis of this new scoring system developed by Orbis, with an independent sample of DJJ youth (see footnote 2 above). We focus on the predictive utility of continuous scores and risk classifications (e.g., low, medium, high) generated by the new scoring system.

Focus on global scales. All kinds of risk factors (fixed, variable, etc.) are relevant to prediction-oriented risk assessment. As summarized by Gottfredson and Moriarty, “if a variable can be measured reliably, and if it is predictive, then of course it should be used, absent legal or ethical challenge.” The content of the risk factors is empirically irrelevant, because the goal is to forecast offending as efficiently and accurately as possible. Understanding the process that leads to recidivism is useful only if it increases predictive efficiency or accuracy.

For this reason, we focus more on global CA-YASI scores in this report than on the twelve specific domain scores (i.e., Legal History, Correctional Response, Violence-Aggression, Social Influences, Substance Use, Attitudes, Social-Cognitive Skills, Family, Education-Employment, Health, Community Linkages, Community Stability; see Phase II report, Table 1 for definitions and psychometric properties). Specifically, we focus on Grand Total CA-YASI Scores and (a) Total Dynamic Risk (designed to capture variable risk factors across 10 specific domains), (b) Total Static Risk (designed to capture risk markers across 6 specific domains), and (c) Total Protective (designed to capture factors that protect against offending across 8 specific domains). As shown in Table 1, these global scores are weakly to strongly correlated with one another (statistics are based on the eligible population of 944 male DJJ youth). Total Static Risk scores tend to be the most weakly associated with the remaining global scales.

Table 1. Association Among Global CA-YASI Scales

                                Grand Total   Total Dynamic Risk   Total Static Risk   Total Protective
1. Total (Risk – Protective)    1.00          ---                  ---                 ---
2. Total Dynamic Risk           .89***        1.00                 ---                 ---
3. Total Static Risk            .46***        .29***               1.00                ---
4. Total Protective             -.88***       -.64***              -.20***             1.00

*** p < .001.

Descriptive statistics on the CA-YASI scores are provided in Appendix A (Table A).
Youths’ average Grand Total CA-YASI score was 21.0 (SD = 38.5).

Criterion variables: Infractions and arrests

Time at risk. For each youth, we used release and entry data to calculate the time that he was at risk for either an infraction (i.e., the CA-YASI date to the DJJ release date or to the first infraction) or an arrest (i.e., the DJJ release date to the date arrest data were obtained or to the first arrest, accounting for any reincarcerations). The average time at risk for an infraction was 6.94 months (SD = 4.29) and the average time at risk for an arrest was 10.44 months (SD = 5.70).

Infractions. Institutional infractions were varied and frequent in this sample. We classified over 65 different DJJ infractions (i.e., Levels 2 and 3) into one of eight descriptive categories, using a coding manual that is available upon request. The eight categories are listed in Table 2, along with their six-month base rate (i.e., proportion of the sample with that infraction, six months post-CA-YASI) and annualized rate (i.e., average number of infractions per youth, per year, taking into account time at risk). As shown in Table 2, the most common types of infraction involved Defiance/refusal and Disruptive behavior. Our analyses focus on three main measures of institutional infractions: (a) Any Infraction (i.e., all infractions, across descriptive categories), (b) Serious Infraction (i.e., infractions that can increase detention by 120 or 240 days, according to DJJ policy), and (c) Physical Violence (i.e., physical acts that cause, or are intended to cause, harm). Any Infraction is all-inclusive and occurs for virtually all youth, unlike Serious Infractions and Physical Violence.

Table 2. Infraction types and base rates

Infraction type                                                        Base rate, 6 mos.   Base rate, varied     Annualized rate
                                                                       (n=476)             follow-up (n=846)     per youth (n=846)
Any infraction                                                         86.5%               88.4%                 22.51 (34.06)
Serious infraction (120/240)                                           46.8%               49.2%                 2.79 (4.60)
Descriptive categories
Violent infraction (e.g., physical altercation)                        41.6%               45.4%                 2.05 (3.64)
Verbal aggression/intimidation (e.g., verbal or written harassment)    -                   39.3%                 2.05 (4.10)
Defiance/refusal (e.g., refusing directions; education failure)        -                   65.5%                 4.05 (6.43)
Disruptive behavior                                                    -                   59.1%                 9.47 (22.41)
Covert misbehavior (e.g., malingering)                                 -                   11.6%                 0.24 (0.85)
Drugs (e.g., possessing, concealing)                                   -                   48.3%                 2.07 (3.55)
Gang-related (e.g., displaying signs)                                  -                   15.5%                 0.45 (1.41)
Other                                                                  -                   48.0%                 2.01 (3.50)

Note: the measures of focus are Any infraction, Serious infraction, and Violent infraction; the varied follow-up base rate is for full follow-up periods that vary by youth; - indicates that the base rate was not calculated.

For each measure of infraction (i.e., Any, Serious, and Violent), we calculated full follow-up variables for the entire sample and fixed follow-up variables for the subsample who were at risk for at least six months, i.e., had at least six months in the institution after the CA-YASI assessment (n=476, 54% of the full sample). The full follow-up variables (for survival analyses, described in results below) were (a) the number of days between the youth’s CA-YASI assessment and either an infraction or DJJ release (if there was no infraction), and (b) presence/absence of infraction. The fixed follow-up variables (for ROC/DIFR analyses, described in results below) simply indicated whether or not an infraction occurred within six complete months of follow-up.

Arrests. We first classified CA DOJ arrest labels into one of seven categories, using the coding system from the MacArthur Violence Risk Assessment Study (i.e., violent, potentially violent, other crimes against person, sex, property, drug, and minor; see http://macarthur.virginia.edu/Data/Pdf/mac_crime_class.pdf ).
Our analyses focus on three arrest indices: Any arrest (i.e., all seven categories of arrest), Serious arrest, i.e., person or property arrest (all categories except drug and minor arrests), and Violent arrest. These may be viewed as increasingly serious forms of criminal behavior. These three measures are listed in Table 3, along with their twelve-month base rate (i.e., proportion of the sample with that arrest, twelve months post-release) and annualized rate (i.e., average number of arrests per youth, per year, taking into account time at risk). Although most youth were arrested for some kind of offense post-release, fewer were arrested for relatively serious (person/property) or violent crimes.

Table 3. Arrest types and base rates
Arrest type | Base rate, 12 month follow-up (n=169) | Base rate, varied follow-up (n=364) | Annualized rate per youth (n=364)
Any arrest | 71.0% | 78.1% | 3.15 (6.36)
Serious arrest (person/property) | 54.2% | 51.9% | 1.55 (3.74)
Violent arrest | 41.1% | 42.0% | 0.94 (2.84)

For each measure of arrest (i.e., Any, Serious, and Violent), we calculated full follow-up variables for the entire sample and fixed follow-up variables for the subsample who were at risk for at least twelve months (n=169, 46% of the full sample). The full follow-up variables (for survival analyses; described in results below) were (a) the number of days between the youth’s release and either an arrest or the end of the observation period (if there was no arrest), and (b) presence/absence of arrest. The fixed follow-up variables (for ROC/DIFR analyses; described in results below) simply indicated whether or not an arrest occurred within twelve complete months of follow-up.

Rationale for focus on offense severity, rather than specific type. We use “Any,” “Serious,” and “Violent” measures of infractions and arrests rather than specific categories (e.g., drug, property) because we found no evidence that the CA-YASI scales differentially predict specific categories of antisocial behavior. For example, there is no evidence that the CA-YASI (global scores or the Drug scale) predicts drug-related infractions any better or worse than other types of infractions.

Analyses & Results

We analyzed the predictive utility of the CA-YASI in three basic steps. First, we assessed whether the predictive utility of the instrument varied as a function of youths’ ethnicity (via hierarchical survival analyses). To avoid exacerbating ethnic disparities in the juvenile justice system, it is essential to determine whether an instrument’s performance varies by ethnicity. Second, we assessed how well CA-YASI scores predicted youths’ infractions and arrests. The results of these analyses indicate “in the abstract” (that is, in theory, across all possible cut scores) how well the tool predicts antisocial behavior.
Third, we assessed how well CA-YASI classifications predicted infractions and arrests (via survival and DIFR analyses). The results of these analyses indicate “concretely” (using cut scores that are actually applied in practice) how well the tool predicts antisocial behavior.

Does the predictive utility of the CA-YASI vary as a function of youths’ ethnicity?

Because the DJJ population is composed largely of minority youth, it is essential to ensure that the CA-YASI predicts infractions and arrests about equally well for Hispanic, Black, and White youth. For this reason, we began by assessing whether youths’ ethnicity moderated the predictive utility of the CA-YASI for infractions and arrests.

We used survival analysis to do so. The main outcome variable in survival analysis is the time until the occurrence of an event of interest (e.g., recidivism). In essence, this is the number of days a youth “survives” in custody after his CA-YASI assessment before committing an infraction, and the number of days a youth “survives” in the community after DJJ release before being arrested. Survival analysis is useful because it can handle the censoring of observations. Observations are called censored when information about their survival time is incomplete. For example, if most youth are followed for one year, but a given youth does not commit an infraction during that period, that youth is said to be right censored. Survival analysis is ideal for a study like ours, where youth have different follow-up periods (some are followed for a few months; others are followed for over a year).

Following Baron & Kenny’s (1986) model, we performed a series of hierarchical survival analyses to test whether ethnicity (Hispanic vs. Non-Hispanic; Black vs. non-Black; and White vs.
Non-White) moderated the utility of CA-YASI global scores (Grand Total, Dynamic Risk Total, Static Risk Total, and Protective Total) in predicting time to a serious infraction and time to a serious arrest. On the first step of each model, the CA-YASI score and ethnicity were entered. On the second step, the interaction between the CA-YASI and ethnicity was entered. A Cox proportional hazards regression was performed for each of the three ethnicity contrasts, four predictors, and two outcomes, for a total of 24 separate survival analyses.

The interaction term of interest was not significant in any of these 24 analyses. For example, CA-YASI scores did not interact with Hispanic/non-Hispanic ethnicity to predict serious arrests, Δχ² (df=1, n=351) = 0.571, p = .45. These results suggest that the predictive utility of CA-YASI global scores does not change as a function of youths’ ethnicity. The tool appears to work as well for Whites as non-Whites; Blacks as non-Blacks; and Hispanics as non-Hispanics. For this reason, the full samples (pooled across ethnicity) were used for all remaining analyses.

How well do CA-YASI scores predict youths’ infractions and arrests?

We used both survival analysis and ROC analysis to test how well CA-YASI scores predicted youths’ infractions and arrests. The goal is to determine, in the abstract (i.e., in theory), how well the tool predicts infractions and arrests.

Infractions. We first performed a series of survival analyses (explained in the previous section) to test whether each CA-YASI score significantly predicted each type of infraction (Any, Serious, and Violent). The results indicate that all four CA-YASI global scales significantly predicted time to all three types of infractions. The complete results are provided in the Appendix (Table B).
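The moderation tests described above rest on a change in chi-square with one degree of freedom when the interaction term is added to a Cox model. As a minimal sketch, the p-value for the reported change of 0.571 can be checked with the Python standard library alone. The function name is hypothetical, and the identity used (for df = 1, the upper-tail chi-square probability equals erfc(sqrt(x/2))) is a standard statistical result rather than anything taken from the study’s analysis code.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Change in chi-square reported in the text for the CA-YASI x
# Hispanic/non-Hispanic interaction (serious arrests, df = 1).
delta_chi2 = 0.571
p_value = chi2_sf_df1(delta_chi2)
print(f"p = {p_value:.2f}")
```

The result agrees with the reported p = .45. The same helper applies to any other one-degree-of-freedom change in chi-square reported in this section.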
Given that the global CA-YASI scores are correlated with one another (see Table 1) and all predict infractions (see Appendix Table B), it is important to determine whether each scale adds incremental utility to the other scales in predicting infractions. If a scale contributes independent predictive power to other scales, then it is worth scoring. If it doesn’t, then predictive efficiency will be achieved by omitting the scale. Some scholars have argued that “dynamic risk” scales add little or no predictive utility to “static risk” scales, which underscores the importance of assessing incremental utility.

To test the incremental utility of each global CA-YASI score (i.e., Total Dynamic Risk, Total Static Risk, and Total Protective), we performed a series of three hierarchical survival analyses in which two global scores were entered on the first step, and the remaining global score was entered on the second step, to predict time to a serious infraction. The results indicate that each global score adds incremental utility to the remaining scores. For example, after controlling for Total Dynamic Risk and Total Static Risk scores, the addition of Total Protective scores produced a significant change in χ² (df=1, n=788) = 22.23, p < .001, such that each one-point increase in scores reduced the risk of a serious infraction by 3% (HR = 0.97, p < .001). These results suggest that each global scale may be worth scoring, from a predictive standpoint.

The results thus far indicate that each CA-YASI global score (independently) predicts infractions statistically significantly better than chance. But effects that are statistically significant are not necessarily strong enough to be practically useful. To provide an estimate of how well the CA-YASI predicts infractions, we conducted ROC analyses to generate a measure of association called the Area Under the Curve (AUC). The AUC is commonly used to assess the predictive utility of risk assessment measures, in part because it is readily interpretable and its values are not heavily influenced by base rates of offending (which vary across studies and sites). AUC values range from 0.50 (i.e., accuracy is not improved over chance) to 1.00 (i.e., perfect accuracy). As shown by Rice and Harris’ analyses (1995), minimum AUCs of .56, .64, and .71 correspond to “small,” “medium,” and “large” effect sizes, respectively. Generally, instruments that “can produce AUC values of 0.70 or above are considered acceptable for clinical application purposes” (Zhang, Roberts, & Farabee, 2011).

As shown in Table 4 (see also Appendix Table C), Grand Total, Total Dynamic Risk, and Total Protective (see footnote 2) scores on the CA-YASI had large effect sizes for predicting serious and violent infractions, whereas Total Static Risk scores had a medium effect size. For example, the AUC of .75 for the Grand Total indicates a 75% probability that a youth randomly selected from those who committed a violent infraction will obtain a higher CA-YASI score than a youth randomly selected from those who did not commit a violent infraction.

Table 4. Predictive utility of global CA-YASI scores for infractions (n=476)
CA-YASI Global Scale | Any infraction | Serious infraction | Violent infraction
Grand Total | .65** | .74*** | .75***
Total Dynamic Risk | .62** | .74*** | .75***
Total Static Risk | .66*** | .65*** | .65***
Total Protective | .35** | .29*** | .28***
** p < .01, *** p < .001

Footnote 2: Given that protective scores predict decreased risk of infractions, an AUC protective value of .28 roughly corresponds to an AUC risk value of .72.

Arrests. Having determined that most scales of the CA-YASI predict serious and violent institutional infractions fairly well, we next assessed the scales’ predictive utility for arrests, after youth were released from DJJ institutions. Because the analyses were parallel to those completed for infractions, our presentation of results can be abbreviated.

First, we performed a series of survival analyses to test whether each CA-YASI score significantly predicted each type of arrest (Any, Serious, and Violent). The results indicate that three of the four CA-YASI global scales (all but Total Protective) significantly predicted time to all three types of arrests. The complete results are provided in the Appendix (Table D).

Second, because the Total Dynamic Risk and Total Static Risk scales correlate with one another (r = .27) and both predicted arrests, we tested their incremental utility by performing two hierarchical survival analyses in which one global score was entered on the first step, and the remaining global score was entered on the second step, to predict time to a serious arrest. The results indicate that the Total Static Risk scale adds incremental utility to the Total Dynamic Risk scale, but the reverse is not true. After controlling for Total Static Risk scores, the addition of Total Dynamic scores produced no significant change in χ² (df=1, n = 153) = 0.01, p = .92. This suggests that the static scale drives the CA-YASI’s utility for predicting arrests.

As noted earlier, statistically significant effects are not always large enough to be practically useful. So, third, to provide an estimate of how well the CA-YASI predicts arrests, we conducted ROC analyses to generate AUCs.
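The AUC interpretation used above (the probability that a randomly selected youth with the outcome scores higher than a randomly selected youth without it) can be computed directly by comparing every such pair. The sketch below uses only the Python standard library. The scores and outcome flags are made-up illustrative values, not study data, and the function name is hypothetical. It also illustrates why an AUC of .28 on a protective scale corresponds to roughly .72 once the scale is reversed.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Equals the proportion of (event, non-event) pairs in which the
    event case has the higher score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and 0/1 outcome flags (NOT study data).
risk_scores = [5, 12, 18, 25, 31, 40, 44, 52]
infraction = [0, 0, 1, 0, 1, 0, 1, 1]

risk_auc = auc(risk_scores, infraction)
# Reversing the direction of a scale mirrors its AUC around .50.
reversed_auc = auc([-s for s in risk_scores], infraction)
print(risk_auc, reversed_auc, risk_auc + reversed_auc)
```

Because every pair that counts as a "win" for a scale counts as a "loss" for its reversed version, the two AUCs always sum to 1.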
As noted earlier, minimum AUCs of .56, .64, and .71 correspond to “small,” “medium,” and “large” effect sizes, respectively (Rice & Harris, 1995). As shown in Table 5 (see also Appendix, Table E), only Total Static Risk scores had a large effect size, and only for predicting any arrest. Total Static Risk scores also moderately predicted serious and violent arrests, unlike the rest of the CA-YASI (which performed poorly with these types of arrests). Grand Total scores moderately predicted any arrest, but had no significant effect for serious or violent arrests. Thus, for arrests, the most efficient prediction can be achieved by focusing on static risk scores. Adding other variables, at least for serious forms of crime, appears to reduce predictive efficiency (judging from Grand Total scores).

Table 5. Predictive utility of global CA-YASI scores for arrests (n=169)
CA-YASI Global Scale | Any arrest | Serious arrest | Violent arrest
Grand Total | .66** | .56 | .56
Total Dynamic Risk | .62* | .54 | .55
Total Static Risk | .72*** | .65** | .66**
Total Protective | .39* | .47 | .48
*** p < .001, ** p < .01, * p < .05.

How well do CA-YASI classifications predict youths’ infractions and arrests?

The results above suggest that three CA-YASI global scores (all but Total Static Risk) are strongly predictive of infractions, and one (Total Static Risk) is moderately to strongly predictive of arrests. But CA-YASI scores are difficult to interpret, and tend not to be used in daily DJJ practice. Instead, CA-YASI classifications of youth as “low,” “medium,” or “high” risk are used to inform decision-making. As shown in a recent national study (see footnote 3), it is possible for classifications on a risk assessment tool to be functionally meaningless, even when scores on that tool have good predictive utility.

Risk classifications involve nothing more, and nothing less, than chopping up a continuous score on a risk assessment tool to create a number of ordinal categories.
Orbis used a sample of DJJ youth to optimize risk classifications for global scores on the CA-YASI, presumably by identifying cut scores that created reasonably sized groups of youth with offense rates that were as different as possible from one another.

We assessed the predictive utility of global CA-YASI risk classifications in two ways. First, we used survival analysis to assess whether risk classifications (rather than scores) predicted infractions and arrests. Second, we used the DIFR statistic to assess how well risk classifications distinguished among youth at low, medium, and high risk for infractions and arrests.

We focus on serious infractions here, because there was little difference in the predictive utility of the CA-YASI scales for particular types of infractions (i.e., any, serious, violent). We present results for both serious arrests and any arrests because the CA-YASI performed poorly in predicting serious arrests, but any arrest is also a policy-relevant outcome and the tool performed better for this all-inclusive outcome. We focus exclusively on Grand Total CA-YASI classifications as predictors because they are most likely to be used in practice (as opposed to classifications based on Total Dynamic Risk, Total Static Risk, or Total Protective).

Footnote 3: See Skeem et al. (2013), comment on Baird et al. (2013).

Figure 3. Survival to Serious Infraction by CA-YASI Risk Classification

Serious infractions. The results of the survival analysis indicated that Grand Total CA-YASI classifications significantly predicted serious infractions, χ² (2, n = 846) = 110.40, p < .001. Figure 3 displays days of survival in the institution without a serious infraction as a function of CA-YASI classifications. As shown in Figure 3, the low (n=160), medium (n=363), and high (n=231) risk groups are clearly differentiated in their rates of infractions. At six months (approx. 180 days), for example, approximately 15%, 45%, and 75% of the low, medium, and high scoring youths had committed a serious infraction. Moreover, relative to the low scoring group, the medium and high scoring groups were at over three and six times greater risk for a serious infraction (HR = 3.12, Wald = 31.34 and HR = 6.44, Wald = 79.15, both p < .001).

Given these promising results, we next tested how well the risk groupings performed in terms of “baserate dispersion” (see Silver, Smith, & Banks, 2000), or maximal differentiation among risk categories in their likelihood of infractions. We used youths’ base rates of serious infractions at 6 months (n = 445) to compute the Silver-Banks Dispersion Index for Risk (DIFR; Silver & Banks, 1999). The DIFR is a weighted, composite log-odds that assesses the difference between the baserate of infractions in the total sample (i.e., 49%) and the baserates of infractions in each of the risk classes produced by the classification model (i.e., 21%, 46%, and 72% for low, medium, and high risk, respectively). DIFR ranges from 0 to infinity, increasing as the classification model disperses cases into subgroups whose baserates of infractions are distant from the total sample baserate and whose subgroup sample sizes are large in proportion to the total sample size.
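As a rough illustration of the DIFR described above, the sketch below assumes the commonly cited form of the index: DIFR = sqrt( sum over classes of [logit(P) - logit(p_i)]^2 * n_i / N ), where P is the total-sample base rate, p_i is the base rate in risk class i, and n_i / N is that class’s share of the sample. The base rates are the ones given above. The class sizes are an assumption: they are borrowed from the full-sample risk groups (160, 363, 231), because the six-month subgroup sizes are not reported here. The output should therefore be read as an approximation under stated assumptions, not a reproduction of the analysis.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def difr(total_rate, class_rates, class_sizes):
    """Dispersion Index for Risk, in the form assumed in the lead-in.

    Squared log-odds distance of each class base rate from the total
    base rate, weighted by the class's share of the sample.
    """
    n_total = sum(class_sizes)
    weighted = sum(
        (logit(total_rate) - logit(p)) ** 2 * (n / n_total)
        for p, n in zip(class_rates, class_sizes)
    )
    return math.sqrt(weighted)


# Base rates of serious infractions at six months, from the text.
total_rate = 0.49
class_rates = [0.21, 0.46, 0.72]  # low, medium, high risk
# ASSUMPTION: full-sample class sizes used as weights; the six-month
# subgroup sizes are not reported in this section.
class_sizes = [160, 363, 231]

value = difr(total_rate, class_rates, class_sizes)
print(round(value, 2))
```

Under these assumptions the index comes out at roughly 0.8. The index is zero when every class has the same base rate as the total sample, and it grows as large classes move away from that rate.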
The DIFR for CA-YASI risk groupings was 0.82, which we view as relatively high, particularly compared to other risk assessment tools as implemented in “real world” juvenile justice settings (see Baird et al., 2013, where DIFRs for tools that performed moderately well were .68-.71).

Serious arrest. The results of the survival analysis indicated that Grand Total CA-YASI classifications do not predict serious arrests, χ² (2, n = 351) = 4.31, p = .11. Figure 4 displays days of survival in the community without a serious arrest as a function of CA-YASI classifications. As shown in Figure 4, the low (n=88), medium (n=183), and high (n=80) risk groups are poorly differentiated in their rates of recidivism, particularly the medium and high groups. At nine months (approx. 270 days), for example, approximately 30%, 55%, and 55% of the low, medium, and high scoring youths had been arrested for a serious offense. Relative to the low risk group, the medium and high risk groups were at 1.50 and 1.52 times greater risk, respectively (Wald = 3.81, p = .05; Wald = 3.13, p = .08).

Figure 4. Survival to Serious Arrest by CA-YASI Risk Classification

Next, we used the DIFR to test how well the risk groupings performed in terms of baserate dispersion, using youths’ base rates of serious arrests at 1 year (n = 169; Total sample = 56%; Low, medium, and high risk = 46%, 61%, and 57%, respectively). The DIFR for CA-YASI risk groupings in predicting serious arrests was .26, which we view as very poor.

Notably, the performance of the CA-YASI in predicting violent arrests was even poorer. Base rates for violent arrests for the low, medium, and high risk groups were 32%, 49%, and 43%; DIFR was not calculated because it cannot account for the fact that the rate of violent arrest was higher for medium-risk than for high-risk cases.

Any arrest. Survival analysis results indicated that Grand Total CA-YASI classifications significantly predicted any arrests, χ² (2, n = 351) = 15.26, p < .001.
Figure 5 displays days of survival in the community without any arrest as a function of CA-YASI classifications. As shown in Figure 5, the low (n=88), medium (n=183), and high (n=80) risk groups are poorly differentiated in their rates of recidivism, particularly the medium and high groups. At nine months (approx. 270 days), for example, approximately 43%, 62%, and 70% of the low, medium, and high scoring youths had been arrested for any offense. Relative to the low risk group, the medium and high risk groups were at 1.66 and 2.26 times greater risk, respectively (Wald = 7.32, p < .01; Wald = 14.67, p < .001).

Figure 5. Survival to Any Arrest by CA-YASI Risk Classification

Next, we used the DIFR to test how well the risk groupings performed in terms of baserate dispersion, using youths’ base rates of any arrests at 1 year (n = 169; Total sample = 72%; Low, medium, and high risk = 55%, 76%, and 87%, respectively). The DIFR for CA-YASI risk groupings in predicting any arrests was .61, which we view as small to moderate (see Baird et al., 2013, where DIFRs for risk assessment tools with moderate performance were .68-.71).

Conclusions

This is the third and final study of our independent evaluation of the performance of the CA-YASI. The results represent a best case scenario for the predictive utility of the CA-YASI because we (a) excluded the 40% of DJJ staff who cannot reliably score the tool, and (b) used “new” CA-YASI scores developed by Orbis to maximize predictive utility, rather than the scoring system currently programmed into DJJ software. The results represent male DJJ youth only, as the female sample was too small to support analyses. The results are sound, in the sense that missing data were limited and sample sizes and base rates of infractions and arrests (including serious and violent arrests) provided adequate statistical power to detect even small to medium effects.

The study yielded three main findings:

1. The predictive utility of the CA-YASI for infractions and arrests does not depend upon youths’ ethnicity. This is an important positive finding, given the diverse ethnic distribution of DJJ youth: the tool appears to work equally well for White, Black, and Hispanic youth.

2. Total CA-YASI scores and classifications perform quite well in predicting institutional infractions, particularly serious infractions and violent infractions. For example, compared to youth classified as low risk by the CA-YASI, those classified as medium or high risk were three or six times more likely to commit a serious infraction, respectively.

3. Although total CA-YASI scores and classifications moderately predict “any” form of arrest, they perform poorly in predicting serious arrests and violent arrests. For example, compared to youth classified as low risk by the CA-YASI, those classified as medium or high risk were both about 1.5 times more likely to be arrested for a serious offense.

The performance of Grand Total CA-YASI scores in predicting Any, Serious, and Violent infractions and arrests is summarized in Figure 1 (from the Executive Summary) below. As shown there, although Grand Total CA-YASI scores predict the all-inclusive categories of “Any” infraction or arrest moderately well, they predict the more policy-relevant categories of “Serious” and “Violent” offenses strongly for infractions, but poorly for arrests.
Generally, instruments that “can produce AUC values of 0.70 or above are considered acceptable for clinical application purposes” (Zhang, Roberts, & Farabee, 2011). By this definition, the CA-YASI produces an acceptable level of predictive utility for infractions (serious and violent), but not arrests (any, serious, or violent). To place the tool’s predictive utility in context, relative to other instruments used in the CDCR system, the CA-YASI performs somewhat more poorly in predicting any arrest for youth (AUC=.66) than the COMPAS performs in predicting a general arrest for adults (AUC=.70; Zhang et al., 2011).

Figure 1. Predictive utility of Grand Total CA-YASI Scores for Infractions and Arrests (AUCs)

Recommendations

The implications of this study’s results largely depend on how DJJ is using the CA-YASI now, and whether/how it plans to use the CA-YASI in the future. We would be happy to meet with relevant DJJ executives and staff to discuss these issues. In the interim, we note two groups of general implications that the findings of this study have for DJJ practice.

The present: Understand what high CA-YASI scores mean (and don’t mean)

The first group of implications addresses how DJJ can apply the CA-YASI now, to maximize its utility in assessing and classifying youths’ risk of infractions and arrests. These implications apply both to DJJ executives/administrators (in establishing policy for the system) and to DJJ staff (in making decisions about individual youth).

CA-YASI Grand Total classifications of youth as low, medium, or high risk are most likely to be used in everyday practice in DJJ. The CA-YASI tends to classify half of DJJ youth as medium risk, and the remaining half as low (one-quarter) or high (one-quarter) risk. One of the best-validated principles of correctional treatment is to reserve the most intensive services and supervision for high risk cases, leaving relatively little of each for low risk cases.

The results of this study suggest that it is important to consider, “high risk for what?” The one-quarter of youth classified as high risk by the CA-YASI are at relatively high risk for (a) serious and violent infractions while they are in the institution, and, to a lesser extent, (b) any arrest after release to the community. They are not specifically at high risk for serious or violent arrests. As shown in Figure 2 (repeated from the Executive Summary), if the focus is on the policy-relevant outcome of “serious” offenses, high risk youth are distinct from the large group of youth classified as moderate risk for infractions, but not arrests. Notably, low risk youth are more consistently distinct from the remaining three-quarters of (‘moderate’ and ‘high’ risk) youth.

Thus, in theory, if the CA-YASI is used to assign youth to intensive services, this would help prevent serious and violent infractions and, to a lesser extent, general arrests. When the CA-YASI is used to predict offenses, the outcome it will most strongly predict is infractions and, to a lesser extent, general arrests. To be clear, youth with high total CA-YASI scores are not at disproportionate risk for a serious or violent arrest.

Figure 2. Recidivism rates (in %) by CA-YASI Risk Classifications

If DJJ is interested in identifying youth at risk for relatively serious recidivism in the community, it should use Total Static Risk scores and classifications on the CA-YASI. Unlike Grand Total CA-YASI scores, scores on this subscale moderately (but not strongly) predict serious recidivism in the community.

The future: Streamline the DJJ approach to risk assessment

On the whole, these “best case scenario” results indicate that the CA-YASI can adequately characterize a youth’s likelihood of infractions and, to a lesser extent, arrests. With the important exception of serious and violent arrests, the CA-YASI performed quite well from a strictly prediction-oriented perspective.

Given the length and complexity of the CA-YASI, the question that DJJ executives must consider is whether the tool adds value to simpler measures of risk by assessing constructs that help explain the process that leads to recidivism. In our Phase II study, we found little evidence that it does. Because the CA-YASI cannot specify variable risk factors to target in treatment to reduce recidivism, its utility is limited from a risk reduction-oriented perspective.

Risk assessment tools that are short and simple tend to predict recidivism as well as those that are longer and more complex (for a review, see Monahan & Skeem, 2013; Skeem & Monahan, 2011). For this reason, Skeem & Monahan (2011) concluded:

Given a pool of instruments that are well validated for the groups to which an individual belongs, our view is that the choice among them should be driven by the ultimate purpose of the evaluation.
If the ultimate purpose is to characterize an individual’s likelihood of [recidivism] relative to other people, then choose the most efficient instrument available…If the ultimate purpose is to manage or reduce an individual’s risk, then value may be added by choosing an instrument that includes [variable] risk factors.

In choosing the CA-YASI, DJJ selected an instrument that specifically includes variable risk factors. We found little evidence that the CA-YASI validly measures these variable risk factors. At best, its added value is unclear. Frankly, this is the case for most risk reduction-oriented (AKA “risk-needs”) tools. Given the current state of the science, efficiency and simplicity are to be preferred in choosing a risk assessment approach.

Variable risk factors (“needs”) require specific assessment only if there is a realistic likelihood that they subsequently will be addressed with pertinent treatment services. We encourage DJJ to consider (a) their available treatment services, and (b) whether and how they use the CA-YASI to refer youth to these services. A simple prediction-oriented assessment is sufficient if the goal is to exclude all low risk offenders from services and provide all high risk offenders with the same generic services. Assessment of a specific variable risk factor may be added (e.g., anger problems) if a specific type of treatment is available to some, but not to all, high risk offenders (e.g., Aggression Replacement Therapy). It is a waste of time to assess variable risk factors that the system will not or cannot even attempt to change.

Having said this, DJJ has invested considerable resources in the CA-YASI. The tool has been customized for the DJJ population and adequately predicts several forms of infractions and “any” arrest. It may make sense to streamline, simplify, and improve the current system, rather than begin anew.
Once this is done, it will be important to adequately train and monitor DJJ staff, given that 40% are unable to score the tool reliably.

Resource Requirements

This project has been plagued by long delays associated with maintaining a contract. These are well documented in progress reports and emails between UC Irvine and CDCR over the past year. Outside of many months spent waiting to establish a new contract (after DJJ refused to extend the old one to account for already-accruing delays), the remainder of the period was spent on (a) data entry, cleaning, and analysis, and (b) interpretation and report-writing.

During the periods that were covered by a contract, the UCI Team held regular staff meetings. Given substantial turnover at CDCR (both in Research and DJJ), the UCI Team did not meet regularly with CDCR staff. However, they stayed in close contact with Steve Lesch and (when he left) Tammy McGuire.

The core UCI Team for Phase Three consisted of three individuals: the project manager (Kennealy), a data analyst (Kearney), and the principal investigator (Skeem). The Principal Investigator oversaw data collection, cleaning, and analysis; interpreted results; and drafted this report.

References

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173.

Monahan, J., & Skeem, J. (2013). Risk redux: The resurgence of risk assessment in criminal sanctioning. Unpublished paper under review (written for National Association of Sentencing Commissions symposium, August 2013).

Rice, M. E., & Harris, G. T. (1995). Violent recidivism: Assessing predictive validity. Journal of Consulting and Clinical Psychology, 63(5), 737.

Silver, E., Smith, W. R., & Banks, S. (2000). Constructing actuarial devices for predicting recidivism: A comparison of methods. Criminal Justice and Behavior, 27(6), 733-764.
Skeem, J., & Monahan, J. (2011). Current directions in violence risk assessment. Current Directions in Psychological Science, 20, 38-42.

Zhang, S. X., Roberts, R. E., & Farabee, D. (2011). An analysis of prisoner reentry and parole risk using COMPAS and traditional criminal history measures. Crime & Delinquency.

Appendix: Supplemental Tables

Table A. Descriptive Statistics for CA-YASI (N=846)
CA-YASI Scale | Mean | SD | Skew | Kurtosis
Grand Total | 21.03 | 28.53 | -0.58 | 0.38
Dynamic Risk
Total Dynamic Risk | 24.65 | 14.02 | 0.50 | -0.22
Aggression-Violence | 3.78 | 3.96 | 1.21 | 0.84
Social Influences | 6.37 | 4.85 | 0.15 | -1.28
Substance Abuse | 0.76 | 1.58 | 0.72 | -0.15
Attitudes | 3.55 | 1.80 | -0.31 | -0.76
Social-Cognitive Skills | 4.10 | 2.99 | 1.14 | 1.78
Family | 1.11 | 1.29 | 2.01 | 5.51
Education-Employment | 2.62 | 2.31 | 0.97 | 0.88
Health | 0.35 | 0.50 | 0.87 | -0.62
Community Linkages | 0.81 | 0.83 | 1.25 | 1.53
Community Stability | 0.34 | 1.35 | 4.95 | 28.37
Static Risk
Total Static Risk | 9.81 | 7.48 | 0.12 | -0.43
Legal History | 3.10 | 3.60 | 0.24 | -0.20
Correctional Response | 4.59 | 3.24 | 0.39 | -0.49
Aggression-Violence | 0.56 | 1.47 | -0.91 | -0.18
Substance Abuse | 0.38 | 0.78 | 1.59 | 0.54
Family | 0.43 | 0.74 | 0.37 | 0.68
Education-Employment | 0.47 | 1.19 | 3.35 | 11.26
Protective
Total Protective | 13.42 | 14.03 | 2.11 | 5.99
Aggression-Violence | 1.45 | 2.83 | 3.53 | 16.23
Social Influences | 2.15 | 3.09 | 1.60 | 2.03
Attitudes | 1.43 | 2.60 | 2.87 | 11.17
Social-Cognitive Skills | 1.40 | 3.87 | 4.27 | 21.06
Family | 1.45 | 1.94 | 2.74 | 8.23
Education-Employment | 2.95 | 2.81 | 1.65 | 3.87
Community Linkages | 1.10 | 2.36 | 2.65 | 7.19
Community Stability | 1.20 | 2.16 | 2.02 | 4.31

Table B.
Results of survival analyses, assessing predictive utility of CA-YASI for infractions (n=846 males)

CA-YASI scale                 Scale range    Any          Serious      Violent
                                             infraction   infraction   infraction
Grand Total                   -123 to 140    1.02***      1.03***      1.03***
Dynamic Risk
  Total Dynamic Risk          0 to 105       1.03***      1.05***      1.05***
  Aggression-Violence         0 to 17        1.09***      1.13***      1.14***
  Social Influences           0 to 17        1.07***      1.09***      1.09***
  Substance Abuse             0 to 7         1.19***      1.24***      1.26***
  Attitudes                   0 to 7         1.26***      1.37***      1.38***
  Social-Cognitive Skills     0 to 15        1.10***      1.15***      1.16***
  Family                      0 to 12        1.09**       1.12**       ---
  Education-Employment        0 to 14        1.14***      1.17***      1.20***
  Health                      0 to 2         1.16*        1.31**       1.35**
  Community Linkages          0 to 11        1.07*        ---          ---
  Community Stability         0 to 3         1.10*        ---          ---
Static Risk
  Total Static Risk           -9 to 55       1.04***      1.05***      1.05***
  Legal History               -4 to 23       1.05***      1.06***      1.06***
  Correctional Response       0 to 14        1.09***      1.11***      1.12***
  Aggression-Violence         -3 to 7        1.20***      1.24***      1.26***
  Substance Abuse             0 to 2         ---          ---          ---
  Family                      -2 to 2        1.20***      1.15*        ---
  Education-Employment        0 to 7         1.08**       ---          ---
Protective
  Total Protective            0 to 161       0.97***      0.94***      0.94***
  Aggression-Violence         0 to 23        0.86***      0.72***      0.71***
  Social Influences           0 to 20        0.88***      0.83***      0.83***
  Attitudes                   0 to 20        0.85***      0.80***      0.80***
  Social-Cognitive Skills     0 to 31        0.94***      0.84***      0.84***
  Family                      0 to 14        0.91***      0.86***      0.88**
  Education-Employment        0 to 26        0.89***      0.82***      0.81***
  Community Linkages          0 to 14        ---          0.92**       0.95*
  Community Stability         0 to 13        0.93***      0.86***      0.86***

*** p < .001, ** p < .01, * p < .05
--- = no significant effect

Note: Values are hazard ratios that indicate the degree of change in risk for each one-point change in the predictive scale. For example, for each 1-point increase in Total Dynamic Risk scores, which range from 0 to 105, there is a 5% increase in the hazard of a serious infraction.

Table C.
Results of ROC analyses, assessing predictive utility of CA-YASI for serious infractions (n=476 males)

CA-YASI Scale                 AUC
Grand Total                   .75***
Dynamic Risk
  Total Dynamic Risk          .74***
  C- Aggression-Violence      .71***
  D- Social Influences        .67***
  E- Substance Abuse          .64***
  F- Attitudes                .68***
  G- Social-Cognitive Skills  .68***
  H- Family                   .56*
  I- Education-Employment     .67***
  J- Health                   .55
  K- Community Linkages       .50
  L- Community Stability      .55
Static Risk
  Total Static Risk           .65***
  A- Legal History            .59**
  B- Correctional Response    .66***
  C- Aggression-Violence      .61***
  E- Substance Abuse          .50
  H- Family                   .55*
  I- Education-Employment     .52
Protective
  Total Protective            .28***
  C- Aggression-Violence      .32***
  D- Social Influences        .34***
  F- Attitudes                .36***
  G- Social-Cognitive Skills  .40***
  H- Family                   .41**
  I- Education-Employment     .35***
  K- Community Linkages       .44*
  L- Community Stability      .41**

*** p < .001, ** p < .01, * p < .05.

Note: This pattern of results for serious infractions is highly similar to the patterns obtained for any infractions and violent infractions (not reported here).

Table D.
Results of survival analyses, assessing predictive utility of CA-YASI for arrests (n=364 males)

CA-YASI scale                 Scale range    Any          Serious      Violent
                                             arrest       arrest       arrest
Grand Total                   -123 to 140    1.02**       1.01*        1.01*
Dynamic Risk
  Total Dynamic Risk          0 to 105       1.01*        ---          ---
  Aggression-Violence         0 to 17        ---          ---          ---
  Social Influences           0 to 17        1.04*        1.04*        1.06*
  Substance Abuse             0 to 7         1.11*        1.11*        ---
  Attitudes                   0 to 7         ---          ---          ---
  Social-Cognitive Skills     0 to 15        ---          ---          ---
  Family                      0 to 12        1.13**       1.23*        1.15*
  Education-Employment        0 to 14        ---          ---          ---
  Health                      0 to 2         ---          ---          ---
  Community Linkages          0 to 11        1.12*        ---          ---
  Community Stability         0 to 3         ---          ---          ---
Static Risk
  Total Static Risk           -9 to 55       1.09***      1.08***      1.08***
  Legal History               -4 to 23       1.14***      1.11***      1.12***
  Correctional Response       0 to 14        1.18***      1.19***      1.20***
  Aggression-Violence         -3 to 7        1.32***      1.36***      1.43***
  Substance Abuse             0 to 2         1.27**       1.23*        1.31*
  Family                      -2 to 2        ---          ---          ---
  Education-Employment        0 to 7         1.16**       1.15**       1.17*
Protective
  Total Protective            0 to 161       ---          ---          ---
  Aggression-Violence         0 to 23        ---          ---          ---
  Social Influences           0 to 20        ---          ---          ---
  Attitudes                   0 to 20        ---          ---          ---
  Social-Cognitive Skills     0 to 31        ---          ---          ---
  Family                      0 to 14        0.90*        ---          ---
  Education-Employment        0 to 26        0.94*        ---          ---
  Community Linkages          0 to 14        ---          ---          ---
  Community Stability         0 to 13        ---          ---          ---

*** p < .001, ** p < .01, * p < .05
--- = no significant effect

Note: Values are hazard ratios that indicate the degree of change in risk for each one-point change in the predictive scale.
For example, for each 1-point increase in Total Dynamic Risk scores, which range from 0 to 105, there is a 1% increase in the hazard of any arrest.

Table E. Results of ROC analyses, assessing predictive utility of CA-YASI for serious arrests (n=169 males)

CA-YASI Scale                 AUC
Grand Total                   .56
Dynamic Risk
  Total Dynamic Risk          .54
  C- Aggression-Violence      .55
  D- Social Influences        .55
  E- Substance Abuse          .61*
  F- Attitudes                .53
  G- Social-Cognitive Skills  .48
  H- Family                   .56
  I- Education-Employment     .55
  J- Health                   .50
  K- Community Linkages       .50
  L- Community Stability      .55
Static Risk
  Total Static Risk           .65**
  A- Legal History            .62**
  B- Correctional Response    .65**
  C- Aggression-Violence      .58
  E- Substance Abuse          .54
  H- Family                   .50
  I- Education-Employment     .54
Protective
  Total Protective            .47
  C- Aggression-Violence      .49
  D- Social Influences        .47
  F- Attitudes                .50
  G- Social-Cognitive Skills  .50
  H- Family                   .52
  I- Education-Employment     .46
  K- Community Linkages       .47
  L- Community Stability      .50

** p < .01, * p < .05.
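
As a reading aid for the hazard ratios and AUCs reported in this appendix, the following Python sketch illustrates two interpretive points. First, under the log-linear form that per-point hazard ratios assume, a k-point difference on a scale implies a hazard ratio of roughly the per-point value raised to the power k (only approximate here, because the tabled values are rounded). Second, an AUC is the proportion of (event, no-event) pairs in which the youth with the event has the higher score, so an AUC below .50 on a protective scale, such as the .28 reported for Total Protective and serious infractions, corresponds to 1 - AUC when the scale is reverse-scored. The function names and the toy scores are hypothetical illustrations and are not part of the evaluation's analysis code.

```python
# Illustrative helpers for reading hazard ratios and AUCs.
# All names and the toy data below are hypothetical, not taken from the report.


def compounded_hazard_ratio(hr_per_point: float, points: float) -> float:
    """Hazard ratio implied by a `points` change on a scale, assuming the
    log-linear (proportional hazards) form: HR(k) = HR(1) ** k."""
    return hr_per_point ** points


def auc_from_scores(event_scores, no_event_scores) -> float:
    """AUC as the share of (event, no-event) pairs in which the youth with
    the event has the higher score; tied pairs count as half."""
    wins = 0.0
    for e in event_scores:
        for n in no_event_scores:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(event_scores) * len(no_event_scores))


def reversed_auc(auc: float) -> float:
    """AUC that results when a scale is reverse-scored (e.g., protective scales)."""
    return 1.0 - auc


# Total Dynamic Risk, serious infraction: tabled hazard ratio of 1.05 per point.
# A 10-point difference then implies a hazard ratio of about 1.63.
print(round(compounded_hazard_ratio(1.05, 10), 2))

# Total Protective, serious infractions: tabled AUC of .28.
# Reverse-scored, this is equivalent to an AUC of about .72.
print(round(reversed_auc(0.28), 2))

# Made-up scores, used only to show the pairwise definition of AUC.
print(auc_from_scores([30, 25, 18], [20, 12, 18]))
```

Read this way, the protective-scale AUCs for serious infractions discriminate about as well as the corresponding risk scales, only in the opposite direction.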