1. preface - Danske validerede og ikke validerede spørgeskemaer
Erratum

Section 3.1.2. Internal Consistency (page 28)
In the first paragraph the word “inter-correlation” is used several times. This should read “inter-item correlation”.

Section 3.1.2. Internal Consistency (page 29)
The interpretation of Cronbach’s alpha in the first paragraph is unclear and has been clarified in the following description:

“Frequently reported internal consistency coefficients include the item-total correlation, split-half coefficient, Kuder-Richardson 20 and 21 coefficients (dichotomous response options), and Cronbach’s coefficient α (footnote 1). The item-total correlation coefficient enables selection of relevant items from a large pool of items by correlating the individual item with the scale total omitting that item. If the correlation is low (< 0.20) the item is irrelevant and can be discarded [11]. The split-half coefficient, on the other hand, divides the items into two subscales and calculates the correlation coefficient between the subscales. An internally consistent scale should have a high correlation coefficient between the two subscales [11]. However, the split-half coefficient does not identify which item(s) contribute to low reliability, nor does it account for the many different pairs of subscales that can be produced. This is addressed by the commonly reported Cronbach’s α, but interpretation of this coefficient is not straightforward. First, α depends on the number of items in the scale, thus increasing the number of items will increase α. Second, HMS measuring more than one latent variable will usually have a high α despite the different dimensions not being correlated with each other. Lastly, if α is too high this may suggest item redundancy, and a reasonable interval of 0.7-0.9 has been suggested [80, 81]. Consequently, a low Cronbach’s α equates to a scale which is not internally consistent; however, a high α is no guarantee of an internally consistent scale.
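The coefficients described above can be made concrete with a short calculation. The following is a minimal Python sketch under stated assumptions, not the analysis code used in the thesis: it computes Cronbach’s α as n/(n − 1) times one minus the ratio of the summed item variances to the total score variance, and the item-total correlation with the item omitted from the scale total. The function names and the small response matrix are hypothetical and serve only as an illustration.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha; rows are respondents, columns are items."""
    n_items = len(scores[0])
    items = list(zip(*scores))  # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)  # sum of squared item SDs
    total_var = variance([sum(row) for row in scores])    # squared SD of the total score
    return n_items / (n_items - 1) * (1 - item_var_sum / total_var)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_total_correlations(scores):
    """Correlate each item with the scale total omitting that item."""
    result = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total without item j
        result.append(pearson(item, rest))
    return result


# Hypothetical data: 4 respondents x 2 items (illustration only)
responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(responses)
item_total = item_total_correlations(responses)
print(round(alpha, 2), [round(r, 2) for r in item_total])  # prints: 0.75 [0.6, 0.6]
```

In this made-up example α falls inside the suggested 0.7-0.9 interval and both item-total correlations exceed the 0.20 threshold, so neither item would be discarded. As the text notes, such a result alone does not show that the items measure a single latent variable.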
A solution to this dilemma is to utilise the techniques of factor analysis to assess the relatedness of the various items [11]. Factor analysis investigates how many latent variables (or factors) underlie a set of items, and three common approaches exist: principal component analysis, exploratory factor analysis and confirmatory factor analysis. Once factor analysis has established the number of latent variables, Cronbach’s α can be calculated for each subscale as a measure of internal consistency.”

Footnote 1: Cronbach’s $\alpha = \frac{n}{n-1}\left(1 - \frac{\sum \delta_i^2}{\delta_T^2}\right)$, where α = alpha, n = number of items, $\delta_i$ = item score SD, $\delta_T$ = total score SD [11]

Section 6.1.2. Paper I-1 (page 55)
In the second paragraph, “Internal consistency”, the second sentence states that “...these belonged to the same latent variable of pain related function.” This was included by mistake and should have been omitted.

Paper I-3. Results (page 6)
In the Minimal Clinically Important Difference section, second paragraph, it states: “For each 25% increase in baseline entry score (original scale range), the MCID for all patients increased by: 12 points (ODI), 2 points (RMQ), 5 points (LBPRSdisability), 18 points (SF36 (pf)), 6 points (LBPRSpain), 13 points (SF36 (bp)), and 1 point for the NRSpain.” The values for the two subscales of the SF36 are on a 0-100 scale rather than their original scale range. The values for the original scale range are: 4 points (SF36 (pf)) and 1 point (SF36 (bp)).

Contents

List of Tables
List of Figures
1. Preface
  1.1. Supervisors
  1.2. List of Publications
    Part I: The Validation Study
    Part II: The Prospective Acceptable Outcome Study
  1.3. Foreword
  1.4. Dedication
  1.5. Abbreviations and Definitions
    Abbreviations
    Definitions
  1.6. Summary in English
  1.7. Summary in Danish (Dansk Resumé)
2. Introduction
  2.1. Outcome Measurement in Low Back Pain
  2.2. Problems of Outcome Measurement in Relation to Low Back Pain
    2.2.1. Choice of HMS
    2.2.2. HMS Validation
    2.2.3. HMS Responsiveness and Interpretation
3. Evaluation of Health Measurement Scales
  3.1. Concepts of Evaluation Criteria
    3.1.1. Cross-cultural Adaptation
    3.1.2. Internal Consistency
    3.1.3. Floor & Ceiling Effect (Scale Width)
    3.1.4. Reproducibility
    3.1.5. Validity
    3.1.6. Responsiveness and Interpretability
4. Objective and Aims
    Objective
    Aims
5. Methods
  5.1. Part I: The Responsiveness in Subgroups Study
    5.1.1. Translation and Cross-cultural Adaptation
    5.1.2. Patients and Setting
    5.1.3. Outcome Measures
    5.1.4. Subgroups
    5.1.5. Statistical Analyses
  5.2. Part II: The Prospective Acceptable Outcome Study
    5.2.1. Patients and Setting
    5.2.2. Pilot Study
    5.2.3. Main Study
    5.2.4. Outcome Measures
    5.2.5. Statistical Methods
6. Summary of Results
  6.1. Part I: The Responsiveness in Subgroups Study
    6.1.1. Participants
    6.1.2. Paper I-1
    6.1.3. Paper I-2
    6.1.4. Paper I-3
    6.1.5. Paper I-4
  6.2. Part II: The Prospective Acceptable Outcome Study
    6.2.1. Participants
    6.2.2. Main Study
7. Discussion
  7.1. Part I: The Responsiveness in Subgroups Study
    7.1.1. Discussion of Findings
    7.1.2. Discussion of Methodological Aspects
  7.2. Part II: The Prospective Acceptable Outcome Study
    7.2.1. Discussion of Findings
    7.2.2. Discussion of Methodological Aspects
8. Conclusions
  8.1. Part I: The Responsiveness in Subgroups Study
  8.2. Part II: The Prospective Acceptable Outcome Study
9. Recommendations
Bibliography
10. Appendices
  10.1. Appendix I
  10.2. Appendix II
  10.3. Appendix III
  10.4. Appendix IV
  10.5. Appendix V
  10.6. Appendix VI
  10.7. Appendix VII
  10.8. Appendix VIII
  10.9. Appendix IX
  10.10. Appendix X
11. Papers

List of Tables

2.1 Advantages and limitations of the four types of HMS.
2.2 Main problem areas using HMS as outcome measures.
2.3 A core set of HMS for patients with spinal disorders according to Bombardier.
2.4 Reliability - synonyms, definitions and measurement method.
2.5 Methods used to interpret HMS scores.
3.1 Essential measurement properties of HMS.
3.2 Definitions of responsiveness.
3.3 Proposed conceptual framework for responsiveness.
3.4 Distribution-based methods for determining change.
3.5 Anchor-based methods for determining change.
3.6 Methodological weaknesses of the TQs as reported in the literature.
3.7 Study designs and their corresponding analytic methods.
5.1 Merging transition questions 1 and 2.
5.2 Concurrent validity and statistical tests examined in study I.
7.1 Steps in standardising the construction of patients’ global assessment of treatment effect.

List of Figures

2.1 An algorithm for choosing an HMS in spinal research.
3.1 Graphic representation of the stages of cross-cultural adaptation.
3.2 “Floor” and “ceiling” effects versus scale width for a fictive HMS.
3.3 The concepts of agreement and reliability.
3.4 Bland and Altman’s limits of agreement plot.
3.5 Concepts of validity.
3.6 The construction of a ROC curve.
7.1 Choosing pain and disability HMS in subgroups of LBP patients - an algorithm.

1. Preface

1.1. Supervisors

Professor Niels Grunnet-Nilsson, MD, DC, PhD (main supervisor)
Clinical Locomotion Science
Institute of Sports Science and Clinical Biomechanics
University of Southern Denmark, Odense, Denmark

Associate Professor Jan Hartvigsen, DC, PhD (project supervisor)
Clinical Locomotion Science
Institute of Sports Science and Clinical Biomechanics
University of Southern Denmark, Odense, Denmark
and
Nordic Institute of Chiropractic and Clinical Biomechanics
Part of Clinical Locomotion Science

Professor Claus Manniche, MD, DMSc
Clinical Locomotion Science
Institute of Sports Science and Clinical Biomechanics
University of Southern Denmark, Odense, Denmark
and
Backcenter Funen
Ringe, Denmark

1.2. List of Publications

This thesis is based on the following papers:

Part I: The Validation Study

1. Lauridsen HH, Hartvigsen J, Manniche C, Korsholm L, Grunnet-Nilsson N. Danish version of the Oswestry Disability Index for patients with low back pain. Part 1: Cross-cultural adaptation, reliability and validity in two different populations. Eur Spine J 2006 (paper I-1)
2. Lauridsen HH, Hartvigsen J, Manniche C, Korsholm L, Grunnet-Nilsson N. Danish version of the Oswestry Disability Index for patients with low back pain. Part 2: Sensitivity, specificity and clinically significant improvement in two low back pain populations. Eur Spine J 2006 (paper I-2)
3. Lauridsen HH, Hartvigsen J, Manniche C, Korsholm L, Grunnet-Nilsson N. Responsiveness and minimal clinically important difference for pain and disability instruments in low back pain patients. BMC Musculoskelet Disord 2006;7:82-98 (paper I-3)
4. Lauridsen HH, Hartvigsen J, Korsholm L, Grunnet-Nilsson N, Manniche C. Choice of external criteria in back pain research: does it matter? Recommendations based on analysis of responsiveness. Pain, accepted for publication (paper I-4)

Part II: The Prospective Acceptable Outcome Study

1. Lauridsen HH, Manniche C, Korsholm L, Grunnet-Nilsson N, Hartvigsen J. Are low back pain patients able to determine acceptable outcome of treatment before it begins? J Clin Epidemiol, submitted (paper II-1)

The papers are reprinted in Section 11.

1.3. Foreword

This PhD thesis is based on two longitudinal cohort studies conducted from 2004 to 2006. Study one is a questionnaire validation study carried out on both acute and chronic low back pain patients seen in the primary and secondary sectors of the Danish health care system. The second study is a methodological study carried out on chronic low back pain patients seen only in the secondary sector.

I wish to express my appreciation to my supervisors and everyone who has been involved in this project. In particular, I wish to thank:

Niels Grunnet-Nilsson, principal supervisor, for initiating a visionary project and for providing constructive criticism and moral support.

Jan Hartvigsen, second supervisor, for his comprehensive view of the project. In particular, I want to express my profound gratitude for his enthusiastic, calm and humorous guidance during numerous meetings and his readiness to give constructive critical advice.

Claus Manniche from Backcenter Funen, Ringe, who allowed me to access the chronic low back pain patients included in both studies. His deep insights into the clinical aspects of low back pain and research methodology were invaluable to the project. I also want to express my warmest thanks to the staff at Backcenter Funen (in particular Ida Bhanderi) for their skilled handling of the questionnaires and patient records in both studies. I still owe them Danish pastry for their participation in the project.
Lars Korsholm for his excellent statistical support during numerous meetings. In particular I wish to thank him for his professional support in using Stata, which at times required complicated programming, and for his clear feedback during the manuscript preparation.

Jytte Johannesen from the Nordic Institute of Chiropractic and Clinical Biomechanics for her excellent management of data collection in both studies and for her always critical and helpful comments when proofreading manuscripts. I am grateful for her warmth and for helpful discussions while we shared an office.

Furthermore, I wish to thank the Nordic Institute of Chiropractic and Clinical Biomechanics for allowing me to use an office during most of my PhD.

Special thanks to the six chiropractic clinics involved in the project:

• Chiropractic Clinic in Nyborg (Henrik Wulff Christensen & Peter Højgaard)
• Hartvigsen & Hein Chiropractic Clinic in Odense (Lisbeth Hartvigsen & Tina Hein Lauridsen)
• Chiropractic Clinic in Odense (Rie Grunnet-Nilsson & Robert Devallier)
• Holt & Højer Chiropractic Clinic in Lyngby (Birgitte Holt & Kent Højer)
• Chiropractic Clinic in Fredericia (Susanne Bjerggaard & Kalle Buch)
• Chiropractic Clinic in Viby (Troels Gaarde & Gitte Mogensen)

Without their assistance and support I would not have been able to collect data from the primary sector patients included in project I.

Finally, I want to thank my colleagues at the Institute of Sports Science and Clinical Biomechanics (University of Southern Denmark) and Backcenter Funen (Ringe, Denmark) for moral support and encouragement during my PhD.

The project was supported by the (Danish) Foundation of Chiropractic Research and Post Graduate Education, the Faculty of Health Science, University of Southern Denmark, Odense and the European Chiropractic Union Research Council.

Henrik H. Lauridsen, Odense, 2007

1.4. Dedication

The PhD thesis is dedicated to my family: my wife Tina and my two children, Cecilia and Jonas, who endured endless evenings and weekends with me in front of a computer. Without their support and patience during the last three years, I would not have been able to finish this thesis on time.

Small secret: The source of your happiness is involvement in a relationship, not the person.
The Story Teller

1.5. Abbreviations and Definitions

Abbreviations

HMS: Health measurement scale(s)
ICC: Intraclass correlation coefficient
ICCagreement: Intraclass correlation coefficient including systematic error variance
ICCconsistency: Intraclass correlation coefficient excluding systematic error variance
LBP: Low back pain
LBPRSdisability: Disability subscale of Low Back Pain Rating Scale
LBPRSpain: Pain subscale of Low Back Pain Rating Scale
LOA: Limits of agreement
LOAlower: The lower limit of agreement
MDC95%: Minimal Detectable Change at the 95% confidence level
MCID: Minimal Clinically Important Difference of the raw change score
MCID%: Minimal Clinically Important Difference of the percentage change score
MCIDpost: Minimal Clinically Important Difference determined after treatment cessation
MCIDpre: Minimal Clinically Important Difference determined before start of treatment
MID: Minimal Important Difference
NNT: Numbers needed to treat
NRSimp: Numeric Rating Scale of importance
NRSpain: Numeric Pain Rating Scale
ODI: Oswestry Disability Index
PrS: Primary sector
RMQ: Roland Morris Disability Questionnaire
ROC: Receiver operating characteristic (curve)
ROCauc: Area under the ROC curve
SEM: Standard error of the measurement
SeS: Secondary sector
SF36 (bp): Bodily pain subscale of SF36
SF36 (pf): Physical function subscale of SF36
SRM: Standardised response mean
SRM%: Standardised response mean of the percentage change score
SRMraw: Standardised response mean of the raw change score
TQ: Transition question
TQ1: 7-point transition question
TQ2: 15-point transition question

Definitions

Clinimetrics: “The methodologic discipline focussing on measurement issues in clinical medicine” [1, 2]
Responsiveness: “The ability of an instrument to detect accurately change when it has occurred” [3–5]
Minimal detectable change: “A change score which falls outside the measurement error of the outcome measure” [6]
Interpretability: “The degree to which one can assign qualitative meaning to an instrument’s quantitative score” [7]
Minimal clinically important difference: “The smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management” [8]
Reproducibility: “The degree to which an instrument yields comparable results if it is used repeatedly on stable patients” [9]
Agreement: “The closeness of scores on repeated measurements” [9]
Reliability: “An instrument’s ability to discriminate between different levels of the measured health outcome” [9]

Nota Bene
Paper I-1 contains several inconsistencies regarding terminology and definitions of reproducibility, agreement and reliability compared to the description given in this thesis. The disparity stems from a lack of clear definitions supported by sound hypothetical frameworks at the time of writing. The definitions delineated in Section 3.1.4 on page 29 will be adhered to throughout the thesis, with the terminology used in paper I-1 given in brackets where appropriate.

1.6. Summary in English

Background
The Oswestry Disability Index (ODI) is one of two standardised functional health measurement scales (HMS) recommended for patients with low back pain (LBP). Despite extensive psychometric testing, little is known about HMS behaviour and the minimal clinically important difference (MCID) in subgroups of LBP patients. Moreover, the most commonly used retrospective method to establish the MCID has inherent methodological flaws.
Perhaps it would be more prudent to ask LBP patients what an acceptable result of the treatment is before it begins?

Objectives
The overall objective was to establish the responsiveness and MCID in specific subgroups of patients with LBP. In addition, we explored whether low back pain patients were able to determine an acceptable treatment outcome before treatment began.

Methods
The responsiveness in subgroups study. An extensive cross-cultural adaptation and validation of the ODI was carried out on patients seen in the primary (PrS) and secondary sectors (SeS) of the Danish health care system.
The prospective acceptable outcome study. A method for estimating LBP patients’ view of an acceptable change before treatment begins (MCIDpre) was developed and compared to a well-established retrospective method of determining the MCID (MCIDpost).

Results
The responsiveness in subgroups study. The ODI measurement error ranged between -11.5 and +13 points. Responsiveness was comparable to that of the external measures. A floor effect was seen in the PrS patients. The MCID was nine points in PrS patients and patients with LBP only, and eight points in SeS patients and patients with leg pain. Moreover, patients’ retrospective evaluation of treatment effect was more responsive in PrS patients than serial measurements.
The prospective acceptable outcome study. The prospective acceptable outcome method was reproducible. The MCIDpre was outside instrument measurement error and 1.5-4.5 times larger than the MCIDpost. Furthermore, the MCIDpre was almost comparable to patients’ post-treatment acceptable change, but only for the pain scale.

Conclusion
The Danish version of the ODI is a reliable, valid and responsive HMS which is psychometrically more appropriate in SeS patients. In addition, the Roland Morris Disability Questionnaire (RMQ) is the most suitable for patients with LBP only, whereas the ODI and RMQ are equally suitable for patients with leg pain.
The choice of pain scale is arbitrary in all subgroups, and the pain subscale of the Low Back Pain Rating Scale is recommended. The MCID was more or less stable across subgroups for most instruments and increased monotonically with baseline condition severity, but only in PrS patients and patients with LBP only. The clinical question “How are you now compared to when you started the treatment?” seems to be most sensitive to condition alterations in PrS patients and should be added as an outcome measure to standard questionnaires used serially.
The prospective acceptable outcome method offers a benchmark by which clinicians can balance any mismatch between what is an acceptable outcome to the patient and what is realistically obtainable with a certain treatment. Chronic LBP patients seem to have a reasonable idea of an acceptable change in pain but overestimate change in functional and psychological/affective domains.

1.7. Summary in Danish (Dansk Resumé)

Background
The Oswestry Disability Index (ODI) is one of two questionnaires recommended for measuring function in back pain patients. Psychometric testing of these questionnaires is well founded, but our knowledge of their applicability and of the minimal clinically important difference (MCID) in specific subgroups of patients with low back pain is limited. Last but not least, the customary method for determining the MCID has methodological weaknesses. A possible solution could be to ask back pain patients what they consider an acceptable result of the treatment before it is initiated.

Objectives
The aim of the thesis was to determine the responsiveness and MCID in specific subgroups of patients with low back pain. In addition, the thesis examines whether low back pain patients can determine what an acceptable result of a treatment regimen will be before it begins.

Methods & material
Study measuring responsiveness in subgroups. A comprehensive translation, adaptation to the Danish language and validation of the ODI was carried out on low back pain patients seen in the primary (PrS) and secondary sectors (SeS) of the Danish health care system.
Study measuring patients’ prospective acceptable treatment outcome. A method for measuring low back pain patients’ acceptable change (MCIDpre) before initiation of a treatment was developed and compared with the customary method of determining the MCID (MCIDpost).

Results
Study measuring responsiveness in subgroups. The measurement error of the ODI was found to lie between -11.5 and +13 points. Responsiveness was comparable to that of the other questionnaires. The ODI had a pronounced floor effect in PrS patients. The MCID was nine points in PrS and low back pain patients and eight points in SeS and leg pain patients. Furthermore, patients’ retrospective evaluation of treatment effect was more responsive in PrS patients than serial measurements.
Study measuring patients’ prospective acceptable treatment outcome. The method for determining low back pain patients’ prospective acceptable treatment outcome was reproducible. The MCIDpre was larger than the measurement error of the given questionnaire and 1.5-4.5 times larger than the MCIDpost. As regards pain, low back pain patients’ MCIDpre was almost comparable to their acceptable change determined after treatment.

Conclusion
The Danish version of the ODI is reproducible, valid and responsive, but for psychometric reasons it is most appropriate to use the questionnaire in the more chronic secondary sector patients. Furthermore, the Roland Morris Disability Questionnaire (RMQ) is the most suitable questionnaire for patients with low back pain only, while the ODI and RMQ are equally suitable for patients with leg pain. The choice of pain scale is more arbitrary in all subgroups, and the pain scale from the Low Back Pain Rating Scale is recommended. The MCID was more or less stable across all subgroups for the majority of the questionnaires and increased monotonically with the severity of the condition at baseline, but this applied only to PrS patients and patients with low back trouble. The clinical question “How are you now compared to when you started the treatment?” is most sensitive to clinically relevant changes in PrS patients and should be included as an outcome measure alongside the customary questionnaires applied serially.
The method for determining low back pain patients’ prospective acceptable treatment outcome is new. It allows the clinician to reconcile what is acceptable to the patient with what is achievable through treatment. It appears that chronic low back pain patients have a reasonable idea of what constitutes an acceptable change in pain, but that they overestimate changes in physical function and psychological/affective domains before treatment begins.

2. Introduction

“Whatever exists, exists in some amount and can be measured”
L. L. Thurstone, American psychometrician (1887-1955)

2.1. Outcome Measurement in Low Back Pain

During the last part of the twentieth century, professions dealing with back pain have become aware of the problem of providing services to patients based primarily on tradition and anecdote which are not justified by sound theory rooted in scientific evidence. Within the realm of a modern health care system, calls for “evidence-based practice” in the management of low back pain (LBP) are mounting. This mounting pressure has pushed clinicians and researchers dealing with back pain towards demonstrating that what they do “works”. The importance of differentiating the impact of their management programmes from the natural course of recovery following the onset of disease or injury has been recognised and has resulted in numerous clinical studies and randomised clinical trials.
Demonstrating efficacy of clinical interventions for patients with LBP has proved difficult, and many trials are unable to show significant differences between various treatment programmes and control groups. Two main reasons can, at least in part, explain this finding: first, the interventions are ineffective, and second, the trials have methodological shortcomings blurring the differences in efficacy. One such methodological shortcoming could well be found in the applied outcome measures, as research into their psychometric properties has always lagged behind the clinical intervention research itself. However, the momentum for clinimetric research has increased drastically in recent years with the common goal of providing scientific evidence to guide the choice of effective health care decisions by all parties involved (patients, providers, policy makers, and third party payers).

Producing high quality clinical outcome research depends, among other things, on the ability of outcome measures to capture a clinically relevant change. In the realm of spinal disorders, clinical success has traditionally relied on measuring physical changes which were directly observable. Consequently, measurements of mortality, physiological changes (e.g. nerve conduction) or impairments of body functions (e.g. range of motion, straight leg-raise) were predominant despite the weak correlation of physical outcomes with patient behaviour and symptoms [10]. In the last decade clinicians and researchers have increasingly recognised that the outcomes of these interventions are usually best seen from the perspective of the patient in terms of their ability to perform activities of daily living and participate in life.
This has resulted in a paradigm shift from relying on objective findings as primary outcome measures to relying primarily on questionnaires measuring the patient’s perception of function and pain as well as psychological/affective dimensions in clinical trials. Questionnaires designed to measure aspects of health are typically called health measurement scales (HMS) [11]. HMS are typically classified according to their general applicability and fall into four categories: a) generic, b) region-specific, c) disease-specific and d) patient-specific [12–16] (Table 2.1). The generic HMS are designed to be applicable across populations, conditions and interventions and are usually multidimensional. This is in contrast to region- and disease-specific instruments, which are designed to measure health status in specific anatomical regions or diagnostic groupings respectively. Last, the patient-specific instruments are structured to identify activity limitations specific to a particular patient.

Table 2.1. Advantages and limitations of the four types of HMS.

Generic
Advantages:
− Cost and time effective
− Identification of co-existing problems
− Comparisons of burden of illness and treatment outcome between and within groups
− Normative data often available
Limitations:
− May contain sections or items of irrelevance to the patient or condition under scrutiny
− Often lengthy
− May be less responsive compared to region- and disease-specific HMS

Region-specific
Advantages:
− Often brief
− Applicable to any condition in the body region
− High relevance to patients
− More responsive compared to generic measures
Limitations:
− Only applicable to the specified region
− Cannot address co-morbid problems
− May contain items of irrelevance to patients with specific condition

Disease-specific
Advantages:
− Often brief
− High relevance to patients
− May be more responsive compared to both generic and region-specific measures
Limitations:
− Only applicable to specific disease
− Cannot address co-morbid problems

Patient-specific
Advantages:
− Not condition or region-specific
− Content tailored to individual patient
− May be the most responsive measure
Limitations:
− Requires administration by interview
− Cannot be meaningfully aggregated
− Problematic in acute conditions

References [12–16]

2.2. PROBLEMS OF OUTCOME MEASUREMENT IN RELATION TO LOW BACK PAIN

The inclusion of self-reported HMS as a measurement tool in clinical trials or clinical practice is not without problems. The main areas of difficulty are outlined in Table 2.2 and are discussed below.

Table 2.2. Main problem areas using HMS as outcome measures.

Choice of HMS
• Considerations important to choice of HMS are confusing and poorly described
• The validity of using HMS in specific target populations and settings is poorly understood

HMS validation
• Measurement properties lack standardised definitions and are often poorly established

HMS responsiveness and interpretation
• Definitions of responsiveness lack standardisation
• Establishing responsiveness of HMS is challenging as the choice of measurement index is arbitrary
• Interpretation of HMS change scores is problematic as no gold standard exists

HMS, health measurement scale

2.2.1. CHOICE OF HMS

The quest for the “ideal” HMS to be used in clinical practice and research has sparked the development of a plethora of self-report outcome measures. As a result more than 82 back related instruments have been reported in the literature [17] and new questionnaires seem constantly to emerge. This makes the choice of a proper instrument for a given clinical situation complicated and confusing [18–20].
In addition, many HMS are adapted and published in several versions due to postulated shortcomings in the original HMS. A few examples are the popular and much used Roland Morris Disability Questionnaire (RMQ), which exists in six different versions [21–26], and the Oswestry Disability Index (ODI), which is published in four versions [27]. As a consequence, there is poor comparability of study results and little shared understanding of their clinical relevance. The urgent need for standardisation has resulted in recommendations of a set of HMS to be used by clinicians and researchers when dealing with spinal disorders (Table 2.3), and most clinical studies now include at least one of the recommended HMS [28]. However, it is important to recognise that the recommendations were based on agreement among a panel of medical experts and not on instrument superiority. Most instruments offer advantages and limitations depending on study setting and patient population. In addition, the included HMS are a core set of instruments covering only the most important domains. As a result, researchers and clinicians may be exposed to situations where the core set of instruments is insufficient to cover the requirements of a particular study setting and patient population, thus facing the difficulties of choosing an appropriate HMS.

Table 2.3. A core set of HMS for patients with spinal disorders according to Bombardier.

Back-specific function
• Roland Morris Disability Questionnaire
• Oswestry Disability Index

Generic health status
• SF-36 version 2.0

Pain
• Bodily Pain subscale of SF-36
• The Chronic Pain Grade Questionnaire

Work disability
• Work status – 10 categories
• Days off work and days of cut down work - number of days
• Time to return to work - number of days

Satisfaction
• Patient Satisfaction Scale – satisfaction with care
• Global satisfaction with treatment outcome

Reference [28]

To aid the choice of an appropriate HMS, the researcher or clinician may want to follow the algorithm outlined in Figure 2.1. The practical application of some of the steps outlined in Figure 2.1 may, however, prove difficult. One obstacle is comparing measurement properties of the various instruments. Many reviews of HMS specific to LBP have been published [19, 20, 27, 29–39] with the intent of making the choice more transparent; however, these often lack comprehensiveness in either the included HMS or the measurement properties reviewed, and standardised quality criteria are rarely applied [40]. A second problem is finding an HMS which has been tested under similar conditions. A recent trend in LBP research is to focus on the efficacy of certain intervention strategies in specific subgroups of LBP patients rather than comparing these strategies in LBP patients as a whole [41–43]. A similar trend of testing and concurrently comparing the behaviour of HMS in different settings and subgroups is almost completely lacking in the literature. As a result, it is often assumed that HMS are psychometrically sound when applied to different settings and subgroups despite the fact that patient characteristics may be very different. Clarification of which HMS is appropriate for which setting and subgroup is required to catch up with recent trends in LBP research.

In summary, choosing a proper HMS is often cumbersome and time consuming due to the large number of instruments and versions available. Consequently, a core set of standardised HMS has been proposed; however, this is at the expense of flexibility and may be insufficient for specific study settings. An algorithm for choosing an HMS has been outlined.
Using the algorithm requires close scrutiny of systematic reviews and concurrent comparisons of HMS and this may be problematic for two reasons: 1) systematic reviews of HMS are often methodologically debatable, and 2) concurrent comparisons of HMS in specific settings or subgroups are rare.

2.2.2. HMS VALIDATION

The methodological procedures applied in HMS validation studies reflect the confusion about which measurement properties are essential minimum requirements to legitimise their use in research or clinical settings. Some original measures have been published reporting only a few measurement properties [47] while others have undergone a profound validation process [48]. This is particularly true for instruments translated from one language to another. Examples are the RMQ and the ODI, which have both been extensively validated in the English language under many different settings and therefore must be said to be legitimate research tools for patients with LBP [17, 34]. In contrast, the same questionnaires have been translated and validated into several languages reporting only a small number of measurement properties [49–52]. Thus, the validity of using these HMS in languages other than the source language is questionable due to the lack of reporting of essential measurement properties as outlined by Terwee et al. [40].

Another area of great confusion and disagreement is the framework of HMS measurement properties. First, the list of synonyms for each concept is often long and confusing. Second, many of these terms lack clear definitions and, if definitions are given, they are oftentimes conflicting. Last, the methods used to establish these measurement properties are many and often interpreted in a variety of ways. An example of this is the measurement property of reliability (Table 2.4).
The concept of reliability has a wealth of synonyms and definitions in the HMS literature, and these reflect the multitude of described measurement methods. In addition, some studies do not describe the type of reliability coefficient used [57–60], making inferences from their findings difficult.

In summary, choosing the proper measurement method for a given construct is complex, leaving ample room for mistakes. Using standardised measurement properties in validation studies is uncommon, which probably reflects the numerous shortcomings and gaps seen in many HMS. We recommend researchers and clinicians to use the newly updated measurement properties as outlined by Terwee et al. [40].

Table 2.4. Reliability - synonyms, definitions and measurement methods.

Synonyms
Agreement, association, concordance, consistency, precision, repeatability, reproducibility, stability and more…

Definitions
− Streiner & Norman: (Reliability) is a fundamental way to reflect the amount of error, both random and systematic, inherent in any measurement.
− Fayers & Machin: (Reliability) consists of determining that a scale or measurement yields reproducible and consistent results.
− De Vet et al.: Reliability concerns the degree to which patients can be distinguished from each other, despite measurement error.
− Finch et al.: Reliability is a measure of consistency and the ability to differentiate among the objects of measurement.
− DeVellis: Scale reliability is the proportion of variance attributable to the true score of the latent variable.
− Oppenheim: Reliability refers to the purity and consistency of a measure, to repeatability, to the probability of obtaining the same result again if the measure were to be duplicated.
− Hansagi & Allebeck: Reliability is the absence of small or large random (or systematic) measurement errors.

Measurement methods
Correlation coefficients (Pearson correlation coefficient, intra-class correlation coefficients), internal consistency (e.g. coefficient α, split-half, Kuder-Richardson 20 and 21 coefficients), Kappa coefficient, Bland & Altman limits of agreement, standard error of the measurement.

References [9, 11, 16, 53–56]
Oppenheim, AN. Questionnaire design, interviewing and attitude measurement, 2nd edition, 1996, Pinter Publishers, London & New York, chapter 8, Question wording, pp. 119-149.
Hansagi, H; Allebeck, P. Enkät och intervju inom hälso- och sjukvård. Handbok för forskning och utvecklingsarbete, Studentlitteratur, 1994, chapter 5, Utformning av mätinstrumentet – frågeformuläret, pp. 38-63.

2.2.3. HMS RESPONSIVENESS AND INTERPRETATION

The use of HMS in longitudinal studies requires a detailed and meaningful understanding of the instrument in question. The properties of responsiveness and interpretation are of major importance to evaluative HMS. For an HMS to be valid in a longitudinal study it has to be sensitive to changes in the measured health domain - it has to be responsive. Consequently, it has to be able to detect change in patients’ condition when assessments are compared serially over time (Time2 - Time1).

The concept of responsiveness is a much sought after instrument property; however, it remains elusive and poorly understood. It presents numerous problems: 1) a commonly accepted definition of responsiveness does not exist (at least 26 definitions have been described), producing an elusive conceptual framework [61], 2) quantification of responsiveness is often by statistical indices, but the methodology is poorly understood; as a result, at least 31 different responsiveness indices have been reported in the literature, making between-study comparisons difficult [61], 3) there is a lack of consensus on the optimal measuring approach of responsiveness, and different indices may lead to different conclusions [62], 4) interpretation of some indices is unclear (e.g. effect size and standardised response mean) [63], and 5) several different responsiveness classification systems have been published - each with a unique conceptual framework of responsiveness [4, 63–65].

The second important property of an HMS is the interpretability of the serial change scores it produces, which poses a challenge of its own. This is illustrated in Case I.

Case I
A middle-aged car mechanic with LBP radiating to the left lateral thigh has received regular conservative care over a 4-week period and is progressing well. The pain in the thigh has disappeared; however, there is still some pain in the back, and he finds it difficult to bend forward, with occasional catching pain. He is back to work 3 out of 5 days a week. Functional and pain HMS were administered at the initial visit and again at 4 weeks. The initial functional score was 55 points (0-100 scale, high score = high disability), which was reduced to 42 points at the 4-week follow-up visit.

Several questions may puzzle the clinician who is responsible for the patient described in the case. For example: what does the summary score of the functional HMS mean? How do we interpret the change score? Has the patient achieved a “clinically significant change”? Is the change meaningful? Is there a need to change the management plan? Is the change score outside the measurement error which would typically occur in a routine administration of the HMS?
A prerequisite for answering some of these important questions is a clear definition of what is meant by interpretation:

“Interpretation has been defined as the degree to which one can assign qualitative meaning to an instrument’s quantitative score” [7]

Several approaches exist which can aid the researcher or clinician in interpreting HMS scores and these are summarised in Table 2.5.

Table 2.5. Methods used to interpret HMS scores.

Mean scores and SD
1) Of a reference group providing norm values
2) Of subgroups of patients who are expected to differ (e.g. LBP patients seen in the PrS and SeS of the health care system)
3) Of patients before and after a treatment programme of known efficacy (e.g. acute LBP patients receiving a course of NSAID)
4) Of patients in the different categories of an external anchor (patients’ global rating of treatment effect)

Minimal clinically important difference*
“Is the smallest difference in a score in a domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management”

* Synonyms: minimal important change, minimal important difference [8, 40, 66]

Work in the area of the minimal clinically important difference (MCID) has been important to advance our knowledge of interpretability, as small numerical differences in mean HMS scores may produce statistically significant results when large sample sizes are used. However, this may not be equivalent to clinical significance [67]. Accordingly, the MCID demarcates a threshold for when a person or group has begun experiencing an improvement which they consider important [4], and various methods have been proposed for establishing it [63] (see Section 3.1.6 for details).
Establishing the MCID for a particular HMS is not easy as many methodological challenges exist. First, there is no agreed upon gold standard used in MCID studies and there is some evidence that the magnitude of the MCID depends on the choice of external criteria [68]. Second, the validity of the retrospective external criteria is unclear as some authors argue against its use [69–71] while others believe it is valid [72, 73]. Third, few studies concurrently compare the methods used to calculate the MCID, and little is known about the similarity of results using different methods [74, 75]. Fourth, the MCID may vary according to which disease or condition it is applied to, the level of severity at baseline, socioeconomic status, nationality and other baseline characteristics [63, 75, 76]. Last, the reported MCID cannot always be distinguished from measurement error for a particular HMS (i.e. the MCID is smaller than the instrument measurement error) [40].

In summary, responsiveness and interpretability of evaluative HMS are closely linked. Instrument responsiveness is a necessity for ascertaining interpretability of the measured change. Establishing responsiveness of an HMS is methodologically complex, poorly understood and lacking consensus despite an overwhelming amount of literature. Likewise, ascertaining interpretability of HMS change scores also faces many methodological challenges but is slowly developing. Further research in both areas is needed to advance our understanding of change scores, which are indispensable for interpretation of results from clinical studies.

Figure 2.1. An algorithm for choosing an HMS in spinal research.

Identify measurement purpose
- Discriminative: HMS is designed to distinguish between individuals or groups of patients at a single point in time. No gold standard is available.
- Predictive: HMS is designed to classify patients into predefined measurement categories, either concurrently or prospectively. Gold standard is available.
- Evaluative: HMS is designed to measure the magnitude of longitudinal change in individuals or groups of patients.

Identify HMS measurement domain
- general health status, pain intensity, activity restrictions, work disability, patient satisfaction, social status, psychological/affective aspects, other

Define target patient population and setting
- Generic, region-specific, disease-specific, patient-specific

Consider the information required
- Global HMS: provide simplicity at the cost of detail
- Multi-item HMS: provide complete profile of construct, may increase burden

Reduce the list of potential HMS by considering:
1. Measurement properties - reproducibility & validity - responsiveness & MCID values (evaluative HMS)
2. Feasibility - administration, cost, respondent burden
3. Use in similar population - has HMS been tested in similar population as the target population?

Pilot test the chosen HMS for feasibility

MCID, minimal clinically important difference [16, 28, 44–46]

3. EVALUATION OF HEALTH MEASUREMENT SCALES

“A measurement is not an absolute thing, but only relates one entity to another”
H.T. Pledge (1966)

3.1. CONCEPTS OF EVALUATION CRITERIA

The field of health status measurement has been characterised by the proliferation of HMS varying in their methods of development, content and breadth of application. This has resulted in published HMS of inconsistent quality, adding to the confusion and difficulty when choosing an instrument. Moreover, most of these are developed in English-speaking countries and only exist in the source language [77]. Consequently, both international comparisons of study results and the number of multinational research projects have been impeded. The above problems have called for the establishment of principles, procedures and criteria for assessment of instrument quality.
Several articles have been published proposing blueprints for translating and cross-culturally adapting questionnaires [77, 78] while others have recommended evaluation criteria for the measurement properties of HMS [40, 79]. The cross-cultural adaptation procedures accompanied by quality criteria for measurement properties provide researchers with a comprehensive set of tools to undertake a validation study of an HMS. Table 3.1 outlines essential measurement properties for HMS evaluation, which are described briefly in Sections 3.1.1 to 3.1.6.

Table 3.1. Essential measurement properties of HMS.

Translation and cross-cultural adaptation
Internal consistency
Floor and ceiling effect (scale width)
Reproducibility
- agreement
- reliability
Validity
- content validity
- criterion validity
- construct validity
Responsiveness and interpretability

3.1.1. CROSS-CULTURAL ADAPTATION

The process of translation and cross-cultural adaptation of HMS for use in other languages has been thoroughly described and documented in two reviews by Guillemin et al. and Beaton et al. [77, 78]. The purpose of cross-culturally adapting an HMS is to look at issues of translation and cultural adaptation when preparing it for use in a language other than the source language and in different settings. It involves the adaptation of questionnaire instructions and items, including the response options, and proceeds in six stages:

Stage 1 is a forward translation of the HMS into the target language by two independent and bilingual translators (T1 & T2). Translator 1 should be aware of the underlying HMS constructs to provide clinical equivalency whereas translator 2 is naive to the concepts being quantified.

Stage 2 is a synthesis of the two translations by the two translators and the responsible investigator to produce one common translation (T-12).
Discrepancies are resolved by consensus and documented in a written report.

Stage 3 is a back translation of T-12 from the target to the source language by two translators (BT1 & BT2). Both translators are unaware of the content of the original HMS and have the source language as their mother tongue.

Stage 4 is a meeting of an expert committee which comprises methodologists, health professionals, language professionals, the translators, and the responsible investigator. The main purpose is to review all translations together with the written reports, discuss disagreements and reach consensus in four areas:

• Semantic Equivalence (differences in the meaning of the words, grammatical difficulties)
• Idiomatic Equivalence (jargon may be difficult to translate)
• Experiential Equivalence (experiences of daily life may differ from one culture to another)
• Conceptual Equivalence (words may hold different conceptual meaning between cultures)

The final translation of the HMS should be comprehensible to a 12-year-old person.

Stage 5 is testing of the pre-final version of the HMS on 30-40 patients from the target settings. All patients complete the questionnaire and are subjected to a structured interview. The interview is designed to probe the patient’s perception of the purpose and meaning of each item, difficulties in question comprehension, patterns of missing items, and comments on layout and content.

Stage 6 is a final audit of the translation process carried out by either the HMS developer or the coordinating committee and will not result in alterations of the content of the HMS.

A summary of the process of translation and cross-cultural adaptation is provided in Figure 3.1.

Figure 3.1. Graphic representation of the stages of cross-cultural adaptation.
Stage 1: Translation
- Two translations (T1 & T2) into target language
- Informed + uninformed translator
- Written report for each version (T1 & T2)

Stage 2: Synthesis
- Synthesize T1 & T2 into T-12
- Consensus on discrepancies with translators’ reports
- Written report

Stage 3: Back translation
- Two English first-language translators
- Naïve to outcome measurements
- Work from T-12
- Create two back translations BT1 & BT2
- Written report for each version (BT1 & BT2)

Stage 4: Expert committee review
- Review all reports
- Methodologist, developer, language professional, translators
- Consensus on discrepancies
- Produce pre-final version
- Written report

Stage 5: Pretesting
- n = 30-40
- Complete questionnaire
- Probe to get an understanding of item
- Written report

Stage 6: Submission and appraisal of all written reports by developers/committee

Reference [78]

3.1.2. INTERNAL CONSISTENCY

In HMS designed to measure one particular health construct (e.g. pain related function), it is desirable that all items reflect that particular latent variable or dimension so that the items may be summed [11]. Internal consistency (or item homogeneity) is based on parallel assessments of patients at an instant in time, and some authors consider it an aspect of reliability as it provides an index of a scale’s ability to differentiate among patients at an instant in time [16, 53]. It refers to the extent to which scores of the items are related to each other (item-item correlation) or to the total scale score (item-total correlation). With regard to scale construction, three important scenarios emerge which may result in item elimination: 1) items showing a high inter-item correlation coefficient may be redundant, 2) items with a low inter-item correlation coefficient may be measuring a different construct, and 3) items demonstrating either very high or low correlation with the total score should be discarded.

Frequently reported internal consistency coefficients include the item-total correlation, the split-half coefficient, the Kuder-Richardson 20 and 21 coefficients (dichotomous response options), and Cronbach’s coefficient α (footnote 1). The item-total correlation coefficient enables selection of relevant items from a large pool of items by correlating the individual item with the scale total omitting that item. If the correlation is low (< 0.20) the item is irrelevant and can be discarded [11]. The split-half coefficient, on the other hand, divides the items into two subscales and calculates the correlation coefficient between the subscales. An internally consistent scale should have a high correlation coefficient between the two subscales [11]. However, the split-half coefficient does not address which item(s) are contributing to low reliability, nor the many pairs of subscales which can be produced. This is addressed by the commonly reported Cronbach’s α, but interpretation of this coefficient is not straightforward. First, α depends on the number of items in the scale, thus increasing the number of items will increase α. Second, HMS measuring more than one latent variable will usually have a high α despite the different dimensions not being correlated to each other. Last, if α is too high this may suggest item redundancy, and a reasonable interval of 0.7-0.9 has been suggested [80, 81]. Consequently, a low Cronbach’s α indicates a scale which is not internally consistent, whereas a high α is no guarantee of an internally consistent scale.

3.1.3. FLOOR & CEILING EFFECT (SCALE WIDTH)

A useful HMS must cover a broad spectrum of the measured health construct to provide room on the response scale for patients to demonstrate improvement or deterioration. Problems arise when patients score either the highest (most of the scale attribute) or lowest (least of the scale attribute) possible scale score, as these scores do not allow for any further deterioration or improvement, respectively. Accordingly, a patient scoring the highest scale score is said to have reached the “ceiling” whereas patients scoring the lowest score have reached the “floor” of the scale [16]. “Floor” and “ceiling” effects are usually expressed as the proportion of patients returning the lowest and highest possible scores, and McHorney & Tarlov have suggested a rate of less than 15% to be acceptable [82]. The concepts of “floor” and “ceiling” effect can be extended to include the idea of scale width [83].
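The conventional “floor” and “ceiling” proportions described above can be illustrated with a minimal sketch. The function names and the example scores are hypothetical and serve only as illustration; they are not taken from the thesis data. The 15% threshold is the rule of thumb cited in the text [82].

```python
from typing import Sequence, Tuple


def floor_ceiling_proportions(
    scores: Sequence[float], scale_min: float, scale_max: float
) -> Tuple[float, float]:
    """Return the proportion of patients at the lowest and highest possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor = sum(1 for s in scores if s == scale_min) / n
    ceiling = sum(1 for s in scores if s == scale_max) / n
    return floor, ceiling


def is_acceptable(proportion: float, threshold: float = 0.15) -> bool:
    """Apply the 'less than 15%' rule of thumb cited in the text [82]."""
    return proportion < threshold


# Hypothetical scores on a 0-100 disability scale (illustration only).
example_scores = [0, 0, 12, 25, 40, 55, 60, 78, 90, 100]
floor, ceiling = floor_ceiling_proportions(example_scores, 0, 100)
print(f"floor: {floor:.0%}, ceiling: {ceiling:.0%}")
print("floor acceptable:", is_acceptable(floor))
print("ceiling acceptable:", is_acceptable(ceiling))
```

In this made-up sample two of ten patients score 0 and one scores 100, so the floor proportion (20%) exceeds the suggested limit while the ceiling proportion (10%) does not.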
Scale width is defined as the region of the score range of an HMS with the capacity to allow detection of change in scores over time which is not due to measurement error. Instrument measurement error is often reported as the minimal detectable change (MDC95%) or Bland and Altman’s limits of agreement (LOA). As an acceptable rate, one could suggest that HMS with more than 10% of the patients scoring within measurement error at each end of the scale range should not be used. This criterion has been used in the current thesis (Figure 3.2).

Figure 3.2. “Floor” and “ceiling” effects versus scale width for a fictive HMS.
[Figure: scale range from 0 (no disability) to 100 (worst possible disability). Conventional definition: “floor” < 15% and “ceiling” < 15% = acceptable. Scale width: “measurement error” (LOA or MDC95%) < 10% at each end of the scale = acceptable.]
LOA, Bland and Altman’s limits of agreement; MDC95%, minimal detectable change at 95% confidence level [82, 83]

3.1.4. REPRODUCIBILITY

Reproducibility of an HMS in research and clinical practice is a sine qua non (footnote 2). It has been defined as the degree to which an instrument yields comparable results if it is used repeatedly on stable patients [9]. An HMS will always show some degree of score fluctuation despite no changes in the patient and in the absence of treatment; however, large fluctuations provide results which cannot be trusted. The sources of score fluctuations are many, and in instruments related to LBP patients the stability is often influenced by within-patient, instrument and setting variances. For example, using a homogeneous patient sample with scores restricted to a small part of the available range will reduce the reliability coefficients as it is more difficult to discriminate between the subjects. A further consideration in reproducibility studies is the time frame for data collection, which may influence the coefficients. If the time period is long - say 3-4 weeks for LBP patients - then it is likely that the included patients have undergone a real change, especially if the condition is expected to change rapidly. The consequence is reduced coefficients. Conversely, a short time interval may increase the memory effect and overestimate the coefficients. To understand the concept of reproducibility it is important to differentiate between agreement and reliability [9, 40], which will be described briefly.

Footnote 1: Cronbach’s $\alpha = \frac{n}{n-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_T^2}\right)$, where n = number of items, $\sigma_i^2$ = variance of the scores on item i, $\sigma_T^2$ = variance of the total score [11].
Footnote 2: Latin: “without which it could not be”.

Agreement

Agreement involves the measurement error of the HMS and is an expression of how close the scores on repeated measurements are (footnote 3). It is expressed in the same units as the HMS in question and is an important property for evaluative instruments where clinically important changes have to be differentiated from measurement error. The concept of agreement including commonly used parameters is summarised in Figure 3.3.

Footnote 3: This is sometimes referred to as test-retest reliability [84].

Figure 3.3. The concepts of agreement and reliability.
[Figure: four panels on a 0-100 scale. Agreement: 5 measurements in 1 person, high versus low agreement; parameters SEM, LOA, MDC95%. Reliability: 5 measurements in 2 persons, low versus high reliability; parameters ICC, Cohen’s Kappa, (Pearson c.c.).]
SEM, standard error of the measurement; LOA, Bland and Altman’s limits of agreement; MDC95% , minimum detectable change at 95% confidence level; ICC, intra-class correlation coefficient; Pearson c.c., Pearson correlation coefficient [9] The standard error of the mean (SEM) is the error associated with a measurement taken at a single point in time [6, 85]. The SEM is one standard deviation of the error associated with a single measurement, so that in 95% of the cases the patient’s true score will lie approximately between ±2 SEMs of the observed value. Conceptually, the SEM can be estimated by two equivalent methods: The square root of the error variance derived from a repeated measures analysis of variance [86] The concepts of agreement and reliability Legend: p • The equation SEM = SD · (1 − R) [87] • Footer: References (de Vet). SEM, Standard error of the measurement; LOA, Bland and Altman’s limits of agreement; MDC95%, Minimum detectable change at 95% confidence level; ICC, intra-class correlation The reliability coefficient (R) can be coefficient either a test-retest parameter such as the intraclass coefficient; Pearson c.c., Pearson correlation correlation coefficients (ICC) or Cronbach’s α. It can be argued that the test-retest parameter is more appropriate in HMS Parameters: ICC scores, as it (in contrast to Parameters: SEM in the context of longitudinal changes LOA Cohen’s Kappa Cronbach’s α) represents temporal stability [6]. The SEM is considered a fixed characteristic MDC95% (Pearson c.c.) regardless of the subjects under investigation. The independence of the sample and the fact that it is expressed in the original metric of the instrument make the SEM more appropriate for interpreting individual scores within a population compared to effect size [87]. The SEM can be used to calculate what has been termed the minimum detectable change (MDC95% ). 
The MDC95% indicates when a change score of an HMS is outside measurement error at the 95% confidence level. It is calculated from the following formula:

$$\mathrm{MDC}_{95\%} = 1.96 \cdot \sqrt{2} \cdot \mathrm{SEM}$$

The 1.96 signifies the 95% confidence level, and the $\sqrt{2}$ accounts for the measurement error being present in both of the repeated measurements from which the change score is calculated.

A further popular method for establishing agreement is the method by Bland and Altman [88, 89]. It uses the idea of LOA, which compares two measurements obtained by the same method. The difference between the measurements is plotted against the mean of the same two measurements, with 95% limits calculated as the mean difference ±1.96 SDs of the differences. Thus, 95% of the differences between the two measurements lie between these limits. An example of a limits of agreement plot is shown in Figure 3.4.

Figure 3.4. Bland and Altman’s limits of agreement plot

Reliability

Reliability concerns the discriminative ability of an HMS and should be emphasised if discrimination between different levels of the measured health construct is important. In other words, how well can patients be distinguished from each other despite measurement error (Figure 3.3)? Reliability coefficients are expressed as a ratio between 0 and 1, zero indicating no reliability and one indicating no measurement error and perfect reliability. Reliability follows the basic formula [11]:

$$\text{Reliability} = \frac{\text{Subject variability}}{\text{Subject variability} + \text{Measurement error}}$$

Reliability is best described by the ICC, where ICCagreement accounts for systematic error variance and ICCconsistency does not (footnote 4). Both coefficients depend on the heterogeneity of the measured construct in the sample under study, and the basic formulas are:

$$\mathrm{ICC}_{\mathrm{agreement}} = \frac{\sigma^2_{\mathrm{subjects}}}{\sigma^2_{\mathrm{subjects}} + \sigma^2_{\mathrm{systematic}} + \sigma^2_{\mathrm{residual}}}$$

$$\mathrm{ICC}_{\mathrm{consistency}} = \frac{\sigma^2_{\mathrm{subjects}}}{\sigma^2_{\mathrm{subjects}} + \sigma^2_{\mathrm{residual}}}$$

In the formulas, $\sigma^2_{\mathrm{subjects}}$ is the variance resulting from the subjects (or patients) included in our study, $\sigma^2_{\mathrm{systematic}}$ represents the systematic difference between the two measurements, and $\sigma^2_{\mathrm{residual}}$ denotes the unexplained measurement error. For a more comprehensive review of reliability I refer to de Vet et al. [9].

3.1.5. Validity

Validity is a process of determining if the scale is measuring what we think it is; that is, can we make valid statements about a person based on his or her score on the HMS. Thus, validation processes are not so much directed towards the integrity of the scale as towards determining the degree of confidence we can place on inferences made about the scale scores. Consistent with this concept is the notion that knowledge of an instrument’s validity is constantly evolving as new information becomes available [11, 16, 53].

Traditionally, validity has been subdivided into the trinitarian Cs: Content validity, Criterion validity and Construct validity. Unfortunately, the concepts of validity have been defined and structured differently in the literature, creating a certain degree of confusion. However, as stated by Streiner and Norman [11]:

The important questions are, “Does the hypothesis of this validation study make sense in light of what the scale is designed to measure”, and “Do the results of this study allow us to draw the inferences that we wish to make?”

More recently, several additional terms have been added to distinguish among different assessment approaches. As these terms are presented in most textbooks and scientific articles they will be discussed briefly (Figure 3.5).
For a more in-depth description of validity and its many approaches I refer to well-known textbooks of psychometric theory [11, 16, 53].

Footnote 4: The terminology of agreement and consistency is not to be confused with other meanings used in the literature. The ICCagreement simply includes the systematic error variance ($\sigma^2_{\mathrm{systematic}}$) whereas the ICCconsistency does not. They are both reliability parameters.

Figure 3.5. Concepts of validity
[Figure: Validity is divided into three branches. Content validity: face validity, item coverage. Criterion validity: concurrent validity, predictive validity. Construct validity: convergent validity, known group validity, discriminant validity.]
References [11, 16, 53]

Content Validity

Content validity concerns the extent to which an HMS is composed of a representative sample of questions that assesses the target domain. High content validity will ensure broad applicability of a measure, as the inferences hold true under a variety of different situations. It involves a critical examination of the instrument structure, a review of the development procedures, and consideration of applicability to the intended research question. Two related concepts are central to content validity: 1) item coverage and relevance, and 2) face validity [53].

Item coverage and relevance involve selection of proper items from a pool of questions. The pool of questions is often collected from focus groups or key informant interviews. This usually leaves scale developers with far more items than will ultimately end up in the instrument, and a selection process has to occur. Two aspects are important for a rigorous selection process: 1) each item should be evaluated for relevance in terms of the target domain, and 2) the chosen items should cover the target domain in all aspects.
Face validity considers whether items in an instrument appear “on the face of it” to measure what they are intended to measure clearly and unambiguously. It is often considered an aspect of content validity, with the main difference being the timing of the critical review: face validity occurs after the HMS has been constructed, while most of the procedures of content validity take place during the development procedures [53].

Criterion Validity

Criterion validity has traditionally been defined as the correlation of a scale with the “true value”, or with some other standard that is accepted as providing an indication of the “true value” of the trait or disorder under study [11]. It can be divided into concurrent validity and predictive validity.

Concurrent validity is the correlation of a new HMS with the criterion measure that is obtained at approximately the same point in time. If agreement between the two instruments is considered to be poor, the concurrent validity is low.

Predictive validity has been defined as the ability of the instrument to predict future health status, future events, or future test results. Therefore, the future health status/event/test serves as a criterion to which the instrument is compared. A fictive example could be that HMS scores from patients with subacute LBP are predictive of their work status, thus providing additional prognostic information.

Construct Validity

The concept of construct validity refers to the extent to which a particular HMS relates to other instruments in a manner which is consistent with theoretically derived hypotheses concerning the concepts that are being measured [45]. It involves forming theories (or constructs) to explain the relationships among various behaviours or attitudes of interest and then assessing the extent to which the HMS provides results that are consistent with the theories.
More formally, construct validity embraces a variety of techniques which are aimed at two things: 1) whether the theoretically postulated construct appears to be an adequate model, and 2) whether the HMS appears to correspond to that postulated construct. Three main types of construct validity are described [53].

Known-groups validity is a simple form of construct validation. It refers to a validation process that examines two distinct groups, one of which has the attribute while the other does not. The group with the attribute should show higher scores on the HMS (higher = more of the attribute) in comparison to the other group. For example, it is likely that patients with longstanding chronic LBP will have higher scores on the pain catastrophising scale compared to persons without LBP.

Convergent validity examines the extent to which a dimension measured in an HMS correlates appreciably with all other dimensions that are believed to be related to it. For example, it is likely that patients with longstanding chronic LBP are predisposed to catastrophise [90], for which reason there would be a correlation between the pain score and the pain catastrophising scale.

Discriminant (or divergent) validity, on the other hand, recognises that a dimension measured in an HMS may be relatively unrelated to other dimensions not associated with it. For example, we would expect chronic LBP patients’ scores on a functional HMS to correlate more with the SF36 physical function subscale than with the role emotional subscale.

Study designs for known-groups, convergent and discriminant validity can be either cross-sectional or longitudinal depending on which construct one chooses to examine [16].

3.1.6. Responsiveness and Interpretability

Conceptual Framework

Numerous definitions of the concept of responsiveness have been described in the literature, and no single one is commonly accepted [61].
The many definitions have been grouped into three main categories which are summarised in Table 3.2.

Table 3.2. Definitions of responsiveness.
Definition categories of responsiveness | Type of change detected
1. “The ability to detect change in general” | Any type of change, regardless of whether it is relevant or meaningful
2. “The ability to detect clinically important change” | A clinically important change. Requires an explicit, although often subjective, judgement on what changes are important
3. “The ability to detect real changes in the concept being measured” | An extension of 1 and 2. Requires a “gold standard” in addition to judgement of what changes are important
Reference [61]

As an operational definition for this thesis, I have chosen a simple version which encompasses all types of change by omitting the nature of it [3–5]:

“The ability of an instrument to detect accurately change when it has occurred”

The literature shows some contention about whether responsiveness should be considered a part of validity or a separate attribute. Several theorists maintain that responsiveness is part and parcel of validity involving concurrent comparisons of the change, most akin to criterion validity [11, 91, 92]. Other authors regard it to be conceptually useful to consider responsiveness as a distinct measurement property from validity, albeit one that affects the range of valid applications [45, 93–95].

Several conceptual frameworks for understanding the various methodological approaches to responsiveness have been published (Table 3.3). Central to all of them is the link between methodological design features and the concepts being measured. For the purpose of this thesis I will focus on the framework of the distribution-based and anchor-based approaches, and the methods within these approaches which are relevant for the papers included in Section 11 will be dealt with in detail.
A complete description of the different indices can be found in Terwee et al. and Crosby et al. [61, 63].

Table 3.3. Proposed conceptual frameworks for responsiveness
Authors: Conceptual framework
Lydick et al. (1993): describes two approaches to define clinically meaningful change:
  a) distribution-based approaches are based on the statistical characteristics of the obtained sample, and three types have been identified:
     i. those based on statistical significance;
     ii. those based on change in relation to sample variation, and
     iii. those based on measurement precision
  b) anchor-based approaches are based on comparisons of HMS scores to other measures or phenomena that have clinical relevance
Stratford et al. (1996): identifies five study designs for assessing responsiveness. Presents theoretical considerations of which statistical analytical approach is optimal for each design
Husted et al. (2000): suggests two major areas of responsiveness:
  a) internal responsiveness, which characterises the ability of a measure to change over a particular pre-specified time frame
  b) external responsiveness, which reflects the extent to which changes in a measure over a specified time frame relate to corresponding changes in a reference measure of health status
Beaton et al. (2001): describes a taxonomy of responsiveness that provides a triple-axis matrix to classify responsiveness studies:
  a) the “Who” axis. Who is being studied: individuals or groups?
  b) the “Which” axis. Which information is being studied: i. between-person differences; ii. within-person differences, or iii. both
  c) the “What” axis. Differentiates five concepts of change: i. the minimum change; ii. the minimum detectable change; iii. the “observed change”; iv. the “estimated change”, and v. the “important change”
Stratford et al. (2005): identifies three study designs relevant for assessing sensitivity to change and suggests optimal statistical approaches for each design.
References [4, 62, 64, 65, 96]

Distribution-based Approaches

Probably the most popular strategies are the distribution-based methods, which are based on statistical characteristics of the sample. They rely on relating the difference between pre- and post-treatment scores to some measure of variability [65, 97, 98]. The values obtained by these methods can be used to concurrently compare the responsiveness of HMS when applied to the same sample. However, all the distribution-based indices are limited in that they do not sufficiently indicate the importance of the observed change [6]. A summary of commonly used approaches, with the sources used to define an important change, is given in Table 3.4.

Three sub-divisions of this approach have been described in the literature depending on: 1) statistical significance, 2) sample variation, or 3) measurement precision [63].

The first strategy is based on statistical significance and evaluates the probability of the observed change occurring by random variation. For example, LBP patients undergoing treatment with known efficacy are tested with a pain scale at baseline and after treatment cessation. The mean baseline score can be compared to the mean score at follow-up using the paired t-test or Wilcoxon signed-rank test.

A second broad strategy assesses change in relation to sample variation. Typically, the numerator is the mean change score for a group of subjects, which can be regarded as the “signal” [97]. This is divided by the variability in the sample, the “noise”. The most common example is the effect size, where the mean change score is divided by the standard deviation of the baseline scores. An effect size of 1 would therefore indicate a magnitude of change equal to one standard deviation of the baseline score [105].
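The first two strategies can be illustrated with a minimal sketch. The function names and the five pairs of pain scores below are hypothetical and chosen only for illustration; the formulas are the paired t statistic and the effect size as described in the text (mean change divided by the SD of the baseline scores).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(baseline, follow_up):
    """Mean change divided by the standard error of the change scores."""
    change = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(change) / (stdev(change) / sqrt(len(change)))

def effect_size(baseline, follow_up):
    """Mean change ("signal") divided by the SD of the baseline scores ("noise")."""
    change = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(change) / stdev(baseline)

# Hypothetical pain scores (0-100) for five patients, before and after treatment.
baseline = [60, 50, 70, 40, 80]
follow_up = [45, 40, 50, 30, 60]

t_value = paired_t_statistic(baseline, follow_up)
es_value = effect_size(baseline, follow_up)
print(f"paired t = {t_value:.2f}, effect size = {es_value:.2f}")
```

On a scale where lower scores mean less pain, both values are negative for an improving group; it is the absolute magnitude that is compared against the conventional thresholds.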
Interpretation of the effect size may follow the definitions by Cohen, who defined a small effect size to be between 0.2-0.5, a medium as 0.5-0.8, and a large as greater than 0.8 [106]. Several advantages of the effect size have been described. First, it uses standardised units comparable across HMS. Second, it uses the variability of the pre-test scores as the denominator (“noise”), which can be considered a proxy for control group scores [97]. Therefore, the effect size quantifies the extent to which the magnitude of the change scores exceeds the “noise” of normal variability. Last, the effect size is independent of sample size. However, critics have challenged the effect size since it: 1) decreases with increasing baseline sample variability, 2) does not consider the variability of the change scores, and 3) may vary widely among samples [63].

In contrast to the effect size, the standardised response mean (SRM) has the variability of the sample change scores as the denominator, whereas the numerator is the same. A large SRM indicates that the change is large relative to the background measurement variability. The advantages of the SRM are: 1) it uses standardised units, 2) it is independent of sample size, and 3) it is based on variability of the change [63]. Disadvantages are: 1) that seemingly comparable individual changes may have different SRM values, as it is dependent on the variability of the change in the sample, and 2) it varies as a function of the effectiveness of the treatment.

Table 3.4. Distribution-based methods for determining change
Measure | Gold standard* (reference) | Calculation
Paired t-test | T (Moayyedi et al., 1998); P (Guyatt et al., 1987) | $\frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\sum (d_i - \bar{d})^2 / (n(n-1))}}$
Growth curve analysis | T (Speer et al., 1995) | $\frac{B}{\sqrt{V}}$
Effect size | T (Kazis et al., 1989); P (Fitzpatrick et al., 1992) | $\frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\sum (x_0 - \bar{x}_0)^2 / (n-1)}}$
Standardised response mean | T (Norman et al., 1997); P (Beaton et al., 1997) | $\frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\sum (d_i - \bar{d})^2 / (n-1)}}$
Guyatt’s responsiveness statistic | D (Deyo et al., 1991)**; T (Guyatt et al., 1987) | $\frac{\bar{x}_1 - \bar{x}_0\,^{**}}{\sqrt{\sum (d_{i,\mathrm{stable}} - \bar{d}_{\mathrm{stable}})^2 / (n-1)}}$
Standard error of the measurement (SEM) | T (Wyrwich et al., 1999) | $\sqrt{\frac{\sum (x_0 - \bar{x}_0)^2}{n-1}} \cdot \sqrt{1 - R}$
Reliable change index | T (Jacobson et al., 1991) | $\frac{x_1 - x_0}{\sqrt{2(\mathrm{SEM})^2}}$
* The gold standard is the source which can be used to define an important change.
** Can be replaced by the minimal clinically important difference determined by the doctor.
D, important change according to the doctor; P, important change according to the patient; T, change due to treatment effect.
Key: $x_0$, pre-test score; $x_1$, post-test score; $d_i$, pre-to-post difference score for subject $i$; $\bar{d}$, mean difference score; $n$, sample size; $R$, reliability of the HMS; $B$, empirical Bayes estimate of the individual slope; $V$, empirical Bayes estimate of the standard error of the slope; SEM, standard error of the measurement; a bar denotes the sample mean [61, 63, 71, 85, 94, 97, 99–104]

The last strategy is based on the measurement precision of the instrument. Examples are the SEM (Section 3.1.4) or the reliable change index. These indices evaluate change in relation to variation of the instrument and not in relation to the sample.
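As a hedged illustration of the sample-variation and measurement-precision indices, the sketch below computes the SRM, the SEM, the MDC95% and the reliable change index. The function names, the scores and the reliability coefficient R = 0.90 are assumptions made for the example, not values from the thesis; the formulas are those given in Section 3.1.4 and Table 3.4.

```python
from math import sqrt
from statistics import mean, stdev

def standardised_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    change = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(change) / stdev(change)

def standard_error_of_measurement(sd_baseline, reliability):
    """SEM = SD * sqrt(1 - R)."""
    return sd_baseline * sqrt(1 - reliability)

def minimal_detectable_change_95(sem):
    """MDC95% = 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem

def reliable_change_index(pre, post, sem):
    """Individual change divided by the standard error of a difference score."""
    return (post - pre) / sqrt(2 * sem ** 2)

# Hypothetical data: five patients, and an assumed test-retest reliability of 0.90.
baseline = [60, 50, 70, 40, 80]
follow_up = [45, 40, 50, 30, 60]
assumed_reliability = 0.90

srm = standardised_response_mean(baseline, follow_up)
sem = standard_error_of_measurement(stdev(baseline), assumed_reliability)
mdc95 = minimal_detectable_change_95(sem)
rci_first_patient = reliable_change_index(baseline[0], follow_up[0], sem)
print(f"SRM={srm:.2f} SEM={sem:.2f} MDC95={mdc95:.2f} RCI={rci_first_patient:.2f}")
```

In this fictive sample the first patient's change of 15 points exceeds the MDC95% of roughly 13.9 points, which corresponds to an absolute reliable change index above 1.96.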
In summary, the distribution-based approaches are by far the most popular methods seen in clinimetric studies. They provide a means of measuring change beyond some level of random variation and have a common metric with comparable interpretation across HMS and study populations. On the downside, first, few agreed-upon benchmarks for establishing clinically significant improvement have been established. Second, the reported results of these methods do not provide the clinician or researcher with an intuitive sense of what is a clinically meaningful and relevant change. The SEM and reliable change index have been proposed as the most promising indices since they are only influenced to a minor degree by baseline score variability, variability of the change scores, and the sample size [63].

Anchor-based Approaches

The second strategy involves comparing changes in an HMS over a specified time frame to corresponding changes in a reference measure that has clinical relevance. This allows for determination of clinically meaningful change scores, and Jaeschke et al. [8] coined the term minimal clinically important difference in 1989 and defined it as:

“The smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management”

Several anchor-based methods have been reported in the literature, ranging from correlation methods through diagnostic testing methods to regression models, and both cross-sectional and longitudinal designs have been described [61, 63, 64]. A summary of the most important designs is presented in Table 3.5. The following description and discussion will be limited to the longitudinal design involving a global rating of change.
Table 3.5. Anchor-based methods for determining change
Method | Description | Gold standard (reference)
Cross-sectional designs
1. Comparison to disease-related criteria | Group comparisons in terms of standardised severity levels or diagnoses. Differences in mean scores across the groups are used to estimate the MCID | Disease severity or diagnosis (Deyo et al., 1982)
2. Comparison to non-disease related criteria | Links the change in HMS scores which occurs when an external non-disease event (e.g. death of a spouse) takes place, i.e. before event and after event score | External life event (Testa et al., 1996)
3. Preference ratings | Patient comparisons of own health state to hypothetical health states on a pairwise basis. Differences in HMS scores rated “barely different” equate to the MCID | Own health state (Llewellyn-Thomas et al., 1996)
4. Comparison to known population(s) | Comparison of dysfunctional to functional populations. Describes recovery status in relation to SDs between the groups | Functional or dysfunctional populations (Jacobson et al., 1991)
Longitudinal designs
1. Global ratings of change | Changes in HMS are correlated to patients’/clinicians’ global rating of improvement. Several methods to establish MCID, e.g. 1) differences in mean scores between global rating categories, 2) application of diagnostic test procedures | Patients’ or clinicians’ global rating of improvement (Stucki et al., 1995; Deyo et al., 1986)
2. Prognosis of future events | HMS predicting a future event (e.g. care use, costs etc.). Difference in mean scores among those experiencing and those not experiencing the event used to establish the MCID | Those experiencing and not experiencing a future event (Ware et al., 1984)
3. Changes in disease related outcome | Comparison of changes in HMS to obtained or not obtained changes in other disease-related measures of outcome. Difference in HMS change scores between obtained and not obtained groups equates to the MCID | Changes in clinical outcome (Kolotkin et al., 2002)
MCID, minimal clinically important difference [63, 93, 104, 112–116]

Correlation of Change Scores with a Global Rating of Change

This method examines the relationship between HMS change scores and an independent external measure (or anchor) serving as an aid to learning about how to interpret the HMS [96]. The most commonly used external criterion is the patients’ global retrospective assessment of treatment effect (transition question, TQ) [13, 60, 63, 69, 107–109], even though other criteria such as the presence or absence of symptoms, differences among diagnostic groups and amount of health care utilisation have been used [68, 110]. The TQ approach uses a retrospective question such as: "Are you feeling better or worse, and if so, what is the extent of the change?" and the response options usually range from "much worse" to "much better". However, the number of in-between categories varies from one study to another (range: 3 to 13) [69, 107, 111].

The selection of the independent anchor is made either by the patient him/herself, by a clinician/expert, or both [8, 74, 117]. Several methodological problems have been described when using the retrospective TQ [69, 71, 118] and these are summarised in Table 3.6. However, Hägg et al. [72] showed the recall bias to be a “bidirectional overestimation” affecting both improved and worsened patients and interpreted this as an expression of greater sensitivity to change. This was valid for a 5-10 year period [73]. In addition they found only weak evidence of motivational bias in a surgical patient group.
Table 3.6. Methodological weaknesses of the TQs as reported in the literature.
TQ weaknesses | Description
Recall bias | Refers to the inability of patients to recall their prior health state. This is most pronounced when the time span is long
Present-state bias | Refers to the correlation between the post-treatment score and the TQ. In other words, TQ ratings given by patients depend on how they are feeling “at present” and not the change experienced during the treatment period
Motivational bias | Refers to the overestimation of the treatment response by patients who have undergone a cumbersome treatment
Contamination bias | Refers to the inability of patients to separate the measured health construct from other co-morbidities concurrently present when rating the TQ. Consequently, factors other than the measured construct may influence the TQ rating (i.e. for a LBP patient, this could be neck pain, headaches, a sprained ankle etc.)
Dependence bias | Refers to the methodological problem of allowing patients to concurrently rate both the HMS and the TQ. Consequently, there is dependence between the HMS and the external anchor: one may affect the other and vice versa
References [71, 72, 118–120]

Additional aspects of the patients’ global retrospective assessment of treatment effect which deserve attention are their construction and implementation. This has been left at the discretion of individual researchers as no standardisation has been reported in the literature. Consequently, a variety of different global ratings has been published together with a number of different analytical strategies. Whether this affects the results reported in clinical trials using transition questions is unknown and deserves clarification.
Neither self-report HMS nor transition ratings are “gold standard” measures. The transition question is used as a de facto “gold standard” indicator of meaningful change of a person’s health. A change in health state is a subjective evaluation where the individual concerned is the best judge of whether the change was important and/or meaningful. Consequently, the patient rated global retrospective assessment of treatment effect is probably the most valid indicator of change.

Receiver Operating Characteristic Method

Receiver Operating Characteristic (ROC) curve analysis originates from the signal detection theory in the 1950s, where radar operators needed to be able to distinguish the “signal” of a real target from the background “noise” of the radar. Although primarily used in medicine to assess the ability of diagnostic tests to distinguish diseased from non-diseased individuals, Deyo and Centor [93] suggested that evaluative scales are analogous to “diagnostic tests” in the way that they should be able to discriminate between clinical improvement and non-improvement based on an external criterion (anchor). In this approach the ROC curve plots the “true positive rate” (sensitivity) against the “false positive rate” (1-specificity) for a series of change scores or “cut points”. The ROC curve is constructed using n change score cut-points that are plotted on the graph. The points are then joined with a smooth curve. The process of constructing a ROC-plot is illustrated in Figure 3.6.

Figure 3.6. The construction of a ROC curve.
[Figure, top: distribution of change scores (1 to 8) for the NC group (n = 500) and the Imp group (n = 500), with three stippled cut-point lines. Middle: classification against the TQ at each cut-point, as tabulated below. Bottom: ROC curve with sensitivity (true-positive rate) on the vertical axis and 1-specificity (false-positive rate) on the horizontal axis, both from 0 to 1.0, with the AUC indicated.]
Change (cut-point) | Imp: ≥ cut-point / ≤ cut-point | NC: ≥ cut-point / ≤ cut-point | Sensitivity | Specificity
2 | 500 / 0 | 350 / 150 | 1.00 | 0.30
4 | 480 / 20 | 30 / 470 | 0.96 | 0.94
6 | 325 / 175 | 0 / 500 | 0.65 | 1.00
TQ, transition question; Imp, patients who have changed an important amount according to an external anchor; NC, patients who have not changed according to an external anchor; AUC, area under the curve; n, number of patients.
Note: The graph at the top represents the distribution of a fictive cohort of 1000 LBP patients plotted according to their change scores. All patients have been classified as having changed an important amount (Imp, n = 500) or stayed the same (NC, n = 500). The stippled lines represent examples of cohort dichotomisations (cut-points) in relation to their change scores, and sensitivity and specificity have been calculated for each of the three cut-points. Finally, the sensitivity is plotted against 1-specificity for each cut-point and a ROC-curve generated.

The resulting area under the ROC curve (ROCauc) indicates the probability of correctly ranking an improved or non-improved patient. A value of 1 for the ROCauc represents perfect accuracy, i.e. 100% correct diagnosis or identification of improved health status. On the other hand, 0.5 (50%) represents health status identification that is no better than chance alone. A useful test will be one with a steep rising curve indicating that as sensitivity increases, the rate of false positives remains low (Figure 3.6).
A perfect test would be one where high sensitivity and specificity were attained simultaneously. In real life, tests rarely perform so well, and sensitivity usually decreases as specificity increases and vice versa. The change score cut-point closest to the top left corner of the graph indicates the value that gives the best sensitivity-specificity trade-off. This point represents the optimal cut-point and has been equated with the MCID of HMS [121]. The strengths of the ROC curve analysis have been summarised by Deyo et al. and Stratford et al. [103, 122]:

• The difference between competing HMS in their ability to correctly classify improved and non-improved patients can be statistically tested
• The use of an external criterion allows comparison of HMS abilities to measure clinically meaningful change
• The point on the upper left-hand corner of the curve indicates the change score that is the most efficient at correctly differentiating subjects who have improved from those who have not

However, determination of the optimal cut-point (MCID) by the ROC curve method has three main limitations. First, the choice of an appropriate anchor depends on the preference of the researcher, and this is likely to affect the size of the obtained optimal cut-point. Second, the external criterion has several methodological weaknesses, one of which is selecting the appropriate dividing-point on the transitional scale to dichotomise the cohort into improved and non-improved patients (please refer to Section 3.1.6 for further discussion). Last, it has been demonstrated that the optimal cut-point is dependent on the patient’s initial score [123], i.e. the higher the baseline score of the patient, the larger the change score required before it is clinically relevant. Whether baseline score is the only parameter which affects the optimal cut-point is unknown.
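The ROC procedure can be sketched with the three fictive cut-points of Figure 3.6. The function names are hypothetical, the area is approximated with straight-line (trapezoidal) segments rather than the smooth curve described in the text, and the optimal cut-point is taken as the point closest to the top left corner of the graph.

```python
from math import hypot

# Fictive counts from Figure 3.6: cut-point -> (Imp at/above, Imp below, NC at/above, NC below)
CUT_POINTS = {
    2: (500, 0, 350, 150),
    4: (480, 20, 30, 470),
    6: (325, 175, 0, 500),
}

def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity and specificity from a 2x2 classification against the anchor."""
    return tp / (tp + fn), tn / (tn + fp)

def roc_points(table):
    """(1 - specificity, sensitivity) pairs, including the corners (0, 0) and (1, 1)."""
    points = [(0.0, 0.0), (1.0, 1.0)]
    for counts in table.values():
        sens, spec = sensitivity_specificity(*counts)
        points.append((1 - spec, sens))
    return sorted(points)

def area_under_curve(points):
    """Trapezoidal approximation of the area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def optimal_cut_point(table):
    """Cut-point whose ROC point lies closest to the top left corner (0, 1)."""
    def distance_to_corner(item):
        sens, spec = sensitivity_specificity(*item[1])
        return hypot(1 - sens, 1 - spec)
    return min(table.items(), key=distance_to_corner)[0]

auc = area_under_curve(roc_points(CUT_POINTS))
best_cut = optimal_cut_point(CUT_POINTS)
print(f"AUC = {auc:.4f}, optimal cut-point = {best_cut}")
```

With only three cut-points the area is a coarse approximation; a real analysis would evaluate every observed change score as a candidate cut-point.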
In summary, the longitudinal anchor-based approaches are rapidly gaining popularity due to their clear advantage of linking change in a HMS to a meaningful external criterion. The advantage of global ratings is that they provide the single best measure of the clinical significance of the change experienced by the individual. This has to be weighed against the limitations of using transition ratings. First, and probably most important, is the absence of a gold standard of clinically meaningful change, and the consequent reliance on a transitional scale as a de facto gold standard. Second, change score interpretation depends on the reliability and validity of transition questions, which have been questioned by several authors. Finally, the optimal cut-point (MCID) may well lie within the measurement error of the HMS, which calls the validity of the obtained MCID into question.

Combined Approaches

The MCID can be determined using both distribution-based and anchor-based methods [110]. The distribution-based method has the advantage of being able to determine the measurement precision of an HMS at the expense of interpretability of the change scores. On the other hand, the anchor-based approach has the ability to establish the clinical significance of the change scores at the expense of often being within HMS measurement error. Consequently, a new breed of methods has been developed over the past few years which combine the advantages of the distribution-based and anchor-based methods [124–127]. The combined approaches have been developed to establish a more reliable minimal important difference (MID) in cancer research. In the method described by Eton et al. [125] results from both distribution-based and anchor-based analyses were synthesized into a range of MIDs for the FACT-B and its subscales.
Criteria of 1/3 SD, 1/2 SD and the SEM were chosen to represent the distribution-based MIDs and the range was established by calculating the mean of each criterion across all time points. The range of MIDs was refined by applying both cross-sectional and longitudinal anchor-based analyses. These analyses involved establishing clinically distinguishable comparison groups which were compared at a single point in time (cross-sectional scores) and over time (longitudinal change scores). The effect sizes and means of the group differences were calculated to establish the range of MIDs. Last, a synthesis of the distribution-based and anchor-based data was carried out and the mean differences corresponding to effect sizes between 0.2 and 0.6 were used to precisely specify the range of MIDs (effect sizes in the low to middle range are most likely to incorporate the MID).

The advantage of using a combined approach to establish a range of MIDs is the use of multiple external criteria which are derived from:
• The statistical characteristics of the sample, i.e. measurement error is taken into account
• Patient-reported global outcome of treatment
• Patient-reported functional and/or pain outcomes
• Physician-reported functional and/or pain outcomes

Choice of Appropriate Change Coefficient

The choice of change coefficient in studies of responsiveness is complicated and no consensus on the optimal strategy exists (see Section 2.2.3 on page 21). The reasons for this confusion are probably rooted in the many different definitions of responsiveness [61] and the absence of a gold standard for change in health status. Consequently, authors have typically applied multiple responsiveness indices to the same data set in order to increase confidence in their conclusions [3, 115, 128]. Stratford et al. [62] have called this approach a “shotgun analysis” since conflicts in the application of several indices exist.
They outline guidelines for choosing the optimal change coefficient based on study design and sample change characteristics (Table 3.7).

Table 3.7. Study designs and their corresponding analytic methods [62].

Study design: Coefficients based on homogeneity of patients’ change characteristics
Description: Sample of patients expected to change by approximately the same amount over the study period. Example: an effective intervention applied to a homogenous patient cohort who is expected to respond well.
Change coefficient: SRM
Statistical tests: Paired t-test; ANOVA (one within-patient factor, i.e. before and after treatment)

Study design: Between group contrast coefficients
Description: Two or more identifiable subgroups of patients who are expected to change by different amounts over the study period. Example: an effective intervention applied to patients with different severity of their problem or different diagnoses.
Change coefficient: ROC curve analysis
Statistical tests: Norman’s Srepeat; Unpaired t-test; ANOVA (with one within-patient factor and one grouping factor, i.e. amount of change)

Study design: Correlation coefficients
Description: Sample of patients, many of whom are expected to truly change by different amounts over the study period. Example: an effective intervention applied to a heterogeneous patient cohort where only some are expected to respond well.
Change coefficient: Correlation analysis; requires application of an external standard, i.e. another similar HMS or a transition scale.

SRM, standardised response mean; ANOVA, analysis of variance; ROC, receiver operating characteristic.
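For the first study design in Table 3.7, the change coefficient is the SRM, which is the mean change score divided by the standard deviation of the change scores. The sketch below is a minimal illustration only. The function name `srm` and the six pairs of scores are hypothetical, and the scores are assumed to lie on a 0-100 scale where a higher score means more disability, so improvement gives a positive change.

```python
from statistics import mean, stdev

def srm(baseline, follow_up):
    """Standardised response mean: mean change / SD of the change scores.

    Change is computed as baseline minus follow-up, so a positive value
    means improvement when higher scores indicate worse health.
    """
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)

# Hypothetical disability scores for six patients (not data from the thesis).
baseline = [40, 52, 36, 60, 44, 48]
follow_up = [30, 40, 30, 44, 36, 38]

value = srm(baseline, follow_up)
print(round(value, 2))
```

Because the denominator is the spread of the change scores, a cohort that changes by roughly the same amount yields a large SRM, which is why this coefficient suits the homogeneous-change design.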
In addition, they suggest that researchers focus on two important issues when planning studies of responsiveness: first, to develop a sound theoretical approach emphasizing the included cohorts’ likely change characteristics, and second, to select the more rigorous designs which allow both the assessment of change and discrimination among different groups of patients.

4. Objective and Aims

Objective

The overall objective of the PhD thesis was to establish which questionnaires were most appropriate in specific subgroups of patients with low back pain and to establish what patients think is a clinically relevant change when scoring these questionnaires. In addition, we wanted to explore whether low back pain patients would be able to determine an acceptable treatment outcome before treatment begins.

Aims

The specific aims of the PhD thesis were:

Part I: The Responsiveness in Subgroups Study
• To translate and cross-culturally adapt the Oswestry Disability Index into the Danish language and to validate it in two sub-populations of low back pain patients (paper I-1 & I-2).
• To concurrently compare responsiveness and minimal clinically important differences for commonly used pain and functional instruments in four sub-populations of low back pain patients (paper I-3).
• To propose a standardised use of patients’ retrospective perception of treatment effect based on analysis of responsiveness (paper I-4).

Part II: The Prospective Acceptable Outcome Study
• To develop a prospective method to determine patients’ acceptable outcome using standardised questionnaires and concurrently compare this to a well established retrospective method and measurement error of the questionnaires (paper II-1).

5. Methods

5.1. Part I: The Responsiveness in Subgroups Study

5.1.1.
Translation and Cross-cultural Adaptation

The translation and cross-cultural adaptation of the ODI (version 2.1) followed the five stages outlined in recent guidelines (see Section 3.1.1 on page 26) [77, 78]. Four independent translators, one methodologist, one clinician, one language specialist and a coordinator participated in the process. Divergences in translations were resolved by consensus and written documentation was produced for each stage of the process. The pre-final version was tested for content validity, wording, ease of understanding, and missing items in 40 patients (20 primary sector (PrS) and 20 secondary sector (SeS) patients) followed by a semi-structured interview. Further psychometric testing of the final version of the Danish ODI was carried out in a validation study.

5.1.2. Patients and Setting

Inclusion criteria for the study were:
• Age above 18
• Presence of low back pain and/or leg pain when presenting to: 1) one of the included chiropractic clinics, or 2) the out-patient hospital back pain clinic
• Able to read and understand Danish

Exclusion criteria were:
• Suspected pathological disorder of the spine (fractures, spinal infections or malignancy, ankylosing spondylitis, rheumatoid arthritis or other inflammatory diseases)
• Patients with a known psychiatric disorder

A total of 233 consecutive patients with acute and chronic LBP were included in the study; 94 from the PrS (seven chiropractic practices) and 97 from the SeS (a hospital based multidisciplinary spinal unit). Questionnaire booklets were collected at baseline, at day one for PrS patients, at one week for SeS patients, and at eight weeks follow-up. A telephone interview was conducted 3-5 days after the eight weeks follow-up by a professional interviewer from the Danish National Institute of Social Research to obtain the patients’ retrospective assessment of treatment effect (Appendix I on page 81).

5.1.3.
Outcome Measures

The questionnaire booklet included the final version of the Danish ODI, the 23-item RMQ [22, 50], the two subscales of the Low Back Pain Rating Scale: pain (LBPRSpain) and disability (LBPRSdisability) [47] as well as the two subscales of the SF36: physical function SF36 (pf) and bodily pain SF36 (bp) scales [129–131] (Appendix II on page 82). The test-retest booklet (1 day/1 week follow-up) contained the Danish ODI with the questions rearranged and a global question of change (Appendix III on page 91). The patients’ global retrospective assessment of treatment effect was measured using two different transition questions (Appendix IV on page 94). Transition question 1 (TQ1) was a 7-point Likert scale [108] and transition question 2 (TQ2) was a 15-point scale [69]. In addition, a 0-10 numeric rating scale of importance of the change was included. For paper I-2 and I-3 the TQs were combined into one according to Table 5.1.

Table 5.1. Merging transition questions 1 and 2.

Transition question 1: Transition question 2
Much better: A very great deal better; A great deal better
Better: A good deal better; Moderately better
A little better: Somewhat better; A little better
About the same: Almost the same, hardly any better; Almost the same, hardly any worse; No change
A little worse: A little worse; Somewhat worse
Worse: Moderately worse; A good deal worse
Much worse: A great deal worse; A very great deal worse

5.1.4. Subgroups

Patients available at the eight weeks follow-up were divided into four subgroups according to either pain location or point of entry into the health care system at baseline (condition severity). Patients were divided into LBP only and leg pain and/or LBP patients for pain location and PrS and SeS patients for patient entry point.

5.1.5.
Statistical Analyses

PAPER I-1

Internal Consistency
Internal consistency was measured using Cronbach’s α and item-total correlations (Section 3.1.2 on page 28). In addition, each item score was graphed against the five score categories as described by Fairbank et al. [132].

Reproducibility
Agreement. Agreement (reproducibility and repeatability in paper I-1) was measured using the LOA plot as outlined by Bland and Altman (Section 3.1.4 on page 30) [88].
Reliability. Reliability (reproducibility and repeatability in paper I-1) was calculated as the ICCagreement (Section 3.1.4 on page 32).

Floor & Ceiling Effect (scale width)
Conventional floor and ceiling effects were calculated. In addition, the scale width using the LOA at each end of the scale was estimated (Section 3.1.3 on page 29).

Validity
Cross-sectional discriminant validity. Discriminant validity was assessed for two levels of 1) symptom location, 2) pain duration, and 3) medication frequency.
Concurrent validity. Three aspects of concurrent validity were examined as outlined in Table 5.2.
Longitudinal external construct validity. This was assessed by comparing the change score of the ODI to that of the external measures using Pearson’s correlation coefficient (R).

PAPER I-2

Responsiveness
Change score comparisons. The ODI mean change scores were compared: 1) in PrS and SeS patients (paired t-test), and 2) to each of the external instruments (paired t-test). In addition, all HMS change scores were: 1) analysed for each TQ category using a robust linear regression analysis, and 2) compared in “important improvement” and “no change” patients.
Distribution-based responsiveness. SRMraw and SRM% were calculated for: 1) the overall change scores, 2) each TQ category, and 3) “important improvement” and “no change” patients (Section 3.1.6 on page 36).

Table 5.2. Concurrent validity and statistical tests examined in study I.
Aspect of concurrent validity: (a) Within- and between-scale systematic differences (baseline and 8 weeks follow-up)
− Comparison of the mean ODI score to that of the external HMS
− Comparison of the mean ODI score between PrS and SeS patients
− Comparison of the mean ODI score to that of the external HMS in PrS and SeS patients
Statistical test: Regression model with an interaction term

Aspect of concurrent validity: (b) Spread of HMS scores (baseline and 8 weeks follow-up)
− Comparison of ODI SD to that of the external HMS
− Comparison of ODI SD to that of the external HMS in PrS and SeS
Statistical test: Variance comparison test

Aspect of concurrent validity: (c) Individual patient score level
− Comparison of ODI score level to that of the external HMS
Statistical test: Bland & Altman LOA plots of standardised scores

ODI, Oswestry Disability Index; SD, standard deviation; PrS, primary sector; SeS, secondary sector; LOA, limits of agreement.

Anchor-based responsiveness. ROC statistics were used for the anchor-based approach (Section 3.1.6 on page 39). The ROCauc and the optimal cut-off change score in both the PrS and SeS patients were determined for all the questionnaires. In addition, PrS and SeS patients were stratified into six ODI baseline entry score categories and optimal cut-off change scores were calculated for each category and plotted against baseline entry scores. Weighted linear regression was used to determine the change in MCID with changing baseline entry score. The effects of both baseline entry score and patient entry point on classification of patients into “important improvement” and “no change” were analysed using diagnostic test statistics.
Transition question correlations. Spearman’s correlation coefficient was used to establish validity between the TQ and the HMS change scores [133, 134].

PAPER I-3

Responsiveness
Distribution-based responsiveness.
SRMraw was calculated for: 1) the change scores in each of the four subgroups, and 2) “important improvement” and “no change” patients (Section 3.1.6 on page 36). Confidence intervals were estimated using a bootstrap method [135]. To compare the SRMraw of the different questionnaires within each subgroup, the SRMraw was estimated using Stata’s regression command with group indicators and the cluster option to account for intra-individual correlation between responses. The differences between SRMraw were examined with a non-linear Wald test [136]. The same procedure was used to test the difference between “important improvement” and “no change” groups within each subpopulation.
Anchor-based responsiveness. ROC statistics were used for the anchor-based approach (Section 3.1.6 on page 39). The ROCauc was determined in each of the four subgroups and an omnibus comparison within each subpopulation was carried out using a non-parametric approach [137]. Overall, quarter-specific, pain location specific, and patient entry point specific MCIDs were determined by an optimal cut-point analysis using both the raw (MCID) and percentage (MCID%) change scores. Categories with fewer than 10 patients were excluded from the analysis. Moreover, the dependence of the MCID on baseline score was adjusted for by a weighted linear regression.

PAPER I-4

Measures of Serial Change and their Analyses
Three disability HMS (ODI, RMQ and SF36 (pf)) and two pain scales (LBPRSpain and NRSpain) were included in the analyses and transformed to cover an interval ranging from 0 - 100. The responsiveness of the serial change was expressed using the SRM.

Measures of Transition and their Analyses
Two different transition questions measuring the patients’ global retrospective assessment of treatment were included in the analyses.
Group A received TQ1 (7-point scale) and group B received TQ2 (15-point scale) and both scales had different introductory questions. In addition, both groups rated the importance of the change in health state experienced during the treatment on a NRSimp.
Dichotomisation of the transition questions. All patients were dichotomised as having either improved or stayed the same based on the transition question alone or based on the transition question and the global rating of importance in combination. Stringent and less stringent criteria for improved and unchanged patients were defined for four external criteria: TQ1, TQ1+NRSimp, TQ2 and TQ2+NRSimp. Patients who deteriorated were excluded from the analyses [72, 138–140].
Responsiveness of retrospective change. The SRM for the retrospective change was calculated from the coded TQ responses. TQ1 was coded as follows: much better = 3, better = 2, a little better = 1, no change = 0, a little worse = -1, worse = -2 and much worse = -3. Similarly, TQ2 was coded from 7 to -7. We used the mean of the coded post-treatment response as the numerator and the SD of the mean coded post-treatment response as the denominator. Hence, both positive and negative values can occur in the numerator making serial and retrospective SRMs comparable.

Comparison of the Measures of Serial Change and Transition
For each of the included instruments we calculated four different ROCauc - one for each external criterion - to determine the influence of different external criteria on the magnitude of the ROCauc. Moreover, the dependence of the ROCauc on the TQ dichotomisation procedure was determined by comparing stringent and less stringent dichotomisations in a Bland and Altman LOA plot.

5.2. Part II: The Prospective Acceptable Outcome Study

5.2.1. Patients and Setting

Patients suffering from treatment resistant chronic low back pain and/or leg pain were recruited from an out-patient hospital back pain clinic in 2005.
Inclusion and exclusion criteria were the same as those described in Section 5.1.2 on page 47.

5.2.2. Pilot Study

Face and content validity of the modified questionnaires (Section 5.2.4) was tested on 25 consecutive chronic LBP patients in a semi-structured interview (Appendix V on page 97) and documented in a written report.

5.2.3. Main Study

One hundred and forty-seven chronic LBP patients receiving conservative care were followed over an eight week period. Questionnaire booklets were filled in at baseline before commencing the treatment (Appendix VI on page 100), at one-week follow-up (Appendix VII on page 110), and at eight-weeks follow-up (Appendix VIII on page 111). In addition, a telephone interview was carried out at nine-weeks follow-up (Appendix IX on page 116).

5.2.4. Outcome Measures

Two sets of pain and functional/psychological outcome measures were completed at baseline: 1) ordinary outcome measures, and 2) modified outcome measures.
The ordinary outcome measures. These consisted of the ODI, the multidimensional Bournemouth Questionnaire (BQ) [141, 142] and the NRSpain measured over the past week.
The pre-treatment acceptable outcome measures. These consisted of modified versions of the ODI, BQ and NRSpain. First, the introduction to the modified questionnaires asked the patient to differentiate between what they considered an acceptable result and their expectations/hopes of the treatment. Second, all the questions in each HMS were modified to include the following basic question: “Please indicate what you consider to be an acceptable level of (e.g. pain) after completion of the treatment if you had to accept some (e.g. pain)?”
Design. At one week follow-up all patients completed both the ordinary and pre-treatment acceptable outcome measures including questions indicating change since baseline. At eight weeks follow-up patients completed the ordinary HMS and a 7-point TQ.

5.2.5.
Statistical Methods

PAPER II-1

Reproducibility
A summary score of the pre-treatment acceptable post-score was generated for each of the modified instruments. The pre-treatment acceptable summary scores were tested for agreement (reproducibility in paper II-1) using the Bland and Altman LOA plot.

Concurrent Validity
The MCID determined before treatment (MCIDpre) was compared to measurement error (MDC95% and the lower limit of agreement - LOAlower) and a post-treatment anchor-based method of establishing the minimal clinically important difference - the MCIDpost. This was established in four subgroups: 1) patients with LBP only, 2) patients with leg pain and/or LBP, 3) patients with LBP duration ≤6 months, and 4) patients with LBP duration > 6 months.
The MCIDpre. This was calculated by subtracting the acceptable post-treatment score determined pre-treatment from the ordinary pre-treatment score for each item. A summary of the change score was calculated for each instrument by summing the MCIDpre for each item.
The MDC95%. The minimal detectable change has been described in Section 3.1.4 on page 30. It was computed using ANOVA for random effects.
The LOAlower. This is the lower 95% confidence level of the LOA plot and can be interpreted as the instrument measurement error when patients improve. Thus, any score change outside the LOAlower should be considered a “real improvement”.
The MCIDpost. The MCIDpost was established by determining the optimal cut-off change score using ROC curve analysis. Confidence intervals for the MCIDpost were estimated using Stata’s programming function to calculate the optimal cut-point and a bootstrap procedure.

Acceptable Treatment Outcome
To establish whether our cohort of chronic patients was able to determine an acceptable outcome of treatment before it began, the MCIDpre was compared to: a) the post-treatment acceptable change, b) the MCIDpost and c) the overall post-treatment change. The post-treatment acceptable change was defined as the mean serial change score in patients who rated themselves as “better” or “much better” on the TQ. Statistical significance between the groups was tested using the Wilcoxon rank-sum test.

6. Summary of Results

6.1. Part I: The Responsiveness in Subgroups Study

6.1.1. Participants

A total of 191 LBP patients equally distributed between the PrS and SeS were available for analysis at eight weeks follow-up. PrS patients generally had LBP only and were more acute and less disabled compared to SeS patients; however, pain intensity was similar. In addition, all demographic characteristics were almost identical (except for more patients with LBP in group B - TQ2) when stratified according to the type of TQ.

6.1.2. Paper I-1

Translation and Cross-cultural Adaptation
The process of translating and cross-culturally adapting the ODI lasted almost four months and four Danish versions were produced before the final version was finished (Appendix X on page 117). The few disagreements that arose during the process were satisfactorily resolved by consensus.

Internal Consistency
Cronbach’s α was 0.88 for all patients, 0.89 in PrS and 0.85 in SeS patients. Moreover, all items contributed to the total score and these belonged to the same latent variable of pain related function.

Reproducibility
Agreement. The mean difference and 95% LOA for all patients were 0.8 [-11.5 to +13.0] with no noteworthy difference between PrS and SeS patients. (This is referred to as reproducibility and repeatability in paper I-1).
Reliability. The ICCagreement was 0.91 for all patients, 0.93 in PrS and 0.89 in SeS patients. (This is referred to as reproducibility and repeatability in paper I-1).

Floor & Ceiling Effect (scale width)
A total of 25 patients (10.7%) scored within the lower score range, i.e. 0 - 11.5 points. These were mainly PrS patients.

Validity
Cross-sectional discriminant validity.
The ODI could discriminate between subgroups of patients with regard to all the chosen medical history variables; however, only the difference in medication usage was considered clinically relevant.
Concurrent validity. The ODI was compared to all included external HMS and showed: 1) 10%-21% lower measurements with only small differences between PrS and SeS patients, 2) a statistically significant narrower score spread with no differences seen between the two groups, and 3) comparable standardised baseline scores lying in the range of ±1.3 to ±1.7 SDs, again with no differences between the two groups.
Longitudinal external construct validity. A change score correlation analysis showed coefficients ranging from 0.56 - 0.78.

6.1.3. Paper I-2

Responsiveness
Change score comparisons. The ODI was characterised by a significantly smaller change score reduction compared to most external HMS. Furthermore, there was an almost linear increase in mean change score with improving TQ category comparable to that of the external HMS. Last, the difference in mean change score between “important improvement” and “no change” patients was approximately 15 points, which was in agreement with the external measures.
Distribution-based responsiveness. The ODI showed comparable results to the external HMS with respect to: 1) a SRMraw of 0.7 and SRM% of 0.6 for all patients, 2) higher SRMs in the “important improvement” group compared to the “no change” group, and 3) a gradual increase in SRM with progressive patient improvement. Moreover, the ODI was less sensitive to change in PrS patients compared to the RMQ.
Anchor-based responsiveness. For all patients, the ODI ROCauc was 0.82 for the raw change score and 0.84 for the percentage change score. The MCID was nine points (71%) for PrS and eight points (27%) for SeS patients and dependent on baseline entry score primarily in PrS patients (6.6 points increase for every 10 points increase in baseline score).
Transition question correlations.
The correlation coefficient (R) between the ODI change score and the TQ was 0.6 for all patients.

6.1.4. Paper I-3

Responsiveness
Distribution-based responsiveness. The RMQ was the most responsive disability HMS for patients with LBP only (SRMraw: 0.5 - 1.4). The ODI and RMQ were equally responsive in leg pain patients (SRMraw: 0.3 - 0.9). For the pain measures, the SF36 (bp) had the highest responsiveness in all subgroups (SRMraw: 0.6 - 1.4). For pain and disability, the SF36 (bp) and RMQ, respectively, showed the largest differences in SRMraw between the “important improvement” and “no change” groups in all subgroups.
Anchor-based responsiveness. The RMQ showed the highest ROCauc in LBP only patients (both PrS and SeS) whereas the ODI was marginally superior in the leg pain patients. For the pain measures, the LBPRSpain was the superior instrument in the LBP only patients. Similar discriminative abilities were observed in the other subpopulations. Regarding the MCID, the following was observed: 1) the overall MCID showed only minor variations in the four subgroups, 2) the MCID increased with increasing baseline entry score mainly in the PrS patients, and 3) the MCID% was almost constant across the score groups for the ODI and RMQ but differed in the subgroups.

6.1.5. Paper I-4

Patient Classification
Using the stringent criteria, 6 - 7% fewer patients were classified as improved with TQ1 (7-point response scale) than with TQ2 (15-point response scale). No difference was seen using the less stringent criteria.

Responsiveness
Distribution-based responsiveness. The retrospective TQ showed slightly higher SRMs (range: 0.8-0.9) compared to the serial instruments (range: 0.6-0.7) when considering all patients. This difference was most pronounced in the PrS patients (serial SRMs: 0.9-1.2 vs. retrospective SRMs: 1.7) with no difference in the SeS patients.
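The retrospective SRM reported here follows from the TQ1 coding given in the methods (much better = 3 down to much worse = -3). The sketch below illustrates the calculation under two assumptions: the denominator is taken to be the standard deviation of the coded responses, and the six responses are hypothetical rather than data from the study. The names `TQ1_CODES` and `retrospective_srm` are also illustrative.

```python
from statistics import mean, stdev

# Coding of the 7-point transition question (TQ1) as listed in the methods.
TQ1_CODES = {
    "much better": 3,
    "better": 2,
    "a little better": 1,
    "no change": 0,
    "a little worse": -1,
    "worse": -2,
    "much worse": -3,
}

def retrospective_srm(responses):
    """Mean of the coded responses divided by their SD.

    Assumption: the denominator is the SD of the coded responses.
    """
    coded = [TQ1_CODES[r] for r in responses]
    return mean(coded) / stdev(coded)

# Hypothetical post-treatment responses (not data from the thesis).
responses = ["much better", "better", "better",
             "a little better", "no change", "a little worse"]

print(round(retrospective_srm(responses), 2))
```

Because the coding allows negative values for patients who worsened, the numerator can be positive or negative, which is what makes the retrospective SRM comparable to the serial SRM.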
Anchor-based responsiveness. The magnitude of the ROCauc varied only slightly across the four stringent external criteria for all HMS (largest difference between minimum and maximum criteria was 0.09) with slightly larger differences between criteria in PrS and SeS patients. The ROCauc was slightly smaller (average: 0.02) and the variation slightly larger using the less stringent criteria.

6.2. Part II: The Prospective Acceptable Outcome Study

6.2.1. Participants

A total of 119 LBP patients completed the eight weeks follow-up (response rate: 83.7%) and were available for analysis. The pre-treatment acceptable post-score was below 32 points for most patients; however, the distribution of the acceptable post-scores varied according to the type of HMS.

6.2.2. Main Study

Reproducibility
The systematic difference and 95% LOA were 0.8 [-6.6; 8.2] for the modified ODI, -0.2 [-8.8; 8.4] for the modified BQ, and 0.0 [-1.9; 1.9] for the modified NRSpain.

Concurrent Validity
The pre-treatment acceptable change for chronic LBP patients scoring the ODI was a 26% reduction whereas this figure was 36% for the BQ and 42% for the NRSpain. The MCIDpre was outside measurement error (MDC95% and LOAlower) and approximately 4.5 times larger compared to the MCIDpost for the ODI and 1.5 times larger for the BQ and NRSpain. Patients with leg pain ± LBP generally required a larger change, stated pre-treatment, before it was acceptable in comparison to patients with LBP only. No differences were seen with regard to symptom duration.

Acceptable Treatment Outcome
The MCIDpre was almost identical to the post-treatment acceptable change for the NRSpain whereas it was significantly larger for the ODI and BQ.

7. Discussion

7.1. Part I: The Responsiveness in Subgroups Study

7.1.1. Discussion of Findings

The Danish Oswestry Disability Index
Cross-cultural adaptation.
The English version of the ODI was translated and cross-culturally adapted into the Danish language using current guidelines [77, 78] and I consider it both reliable and conceptually valid. The process was time consuming and demanded considerable resources. In light of the relatively few discrepancies discussed during the five stages of the adaptation, one may question the value of this cumbersome process in comparison to a cross-cultural adaptation involving fewer resources. I suggest that the following points be considered when allocating resources for adapting a HMS:
• The similarity of the language and cultural setting in which the original HMS was developed.
• The complexity of the HMS. Does the HMS contain difficult language requiring experts to obtain semantic, idiomatic, experiential and conceptual equivalency to the original HMS?
• The translations. Are two translations and back translations necessary? If not, then stage 2 can be omitted and stage 1 and 3 shortened.

Internal consistency and reproducibility. Internal consistency was similar in the two populations (PrS, 0.89; SeS, 0.85) and comparable to previously reported coefficients [13, 143, 144]. Furthermore, reproducibility was measured using the ICCagreement and LOA in stable patients. The ICCagreement (0.91) was acceptable, falling within the range of 0.76 to 0.94 reported in the literature [83, 145, 146]. Similarly, the LOA measurement error (11.5 points for improvement; see footnote 1) was close to published values of the MDC95% ranging from 10-13 points [83, 138, 144, 146, 147]. However, caution has to be taken as the ODI version and the level of confidence differ in these studies.

Footnote 1: This was erroneously reported as 12 points for worsening rather than for improvement in paper I-1. Thus, a “real” worsening was an increase in change score of 13 points while a “real” improvement was a decrease of 12 points.

Floor and ceiling effects. The ODI showed no floor and ceiling effects using the conventional definition as described in Figure 3.2 on page 30. In contrast, there was a pronounced floor effect in PrS patients (14.1%) using the more sensible scale width method. Consequently, it seems logical to question: 1) the usefulness of the Danish ODI in PrS patients, and 2) the usefulness of the conventional method of establishing floor and ceiling effects. I recommend using the methodologically superior scale width method as a benchmark to detect instrument scaling problems at the extremes.

Validity. Several aspects of criterion and construct validity were tested for the Danish ODI and some deserve special attention. The ODI showed 10-21% lower mean scores compared to the external HMS, had the poorest spread of patient disabilities in both PrS and SeS patients, and showed individual score levels between ±1.3 to ±1.7 SDs compared to the external disability and pain scales, respectively. This confirms the belief that the ODI is more appropriate for patients with a greater degree of disability [20, 27, 34] but also that its ability to discriminate between patient disabilities may be problematic.

Responsiveness. Responsiveness of the ODI was scrutinised using both distribution-based and anchor-based methods and the most important findings will be discussed here. First, the ODI mean change score in PrS patients was generally lower compared to most of the external instruments, reinforcing the appropriateness of the ODI in patients with a high degree of disability. Moreover, the ODI mean change score difference between “important improvement” and “no change” patients was 13 and 10 points in PrS and SeS patients respectively, and this concurs well with other studies [138, 148]. Second, the ODI had the second highest SRM of all the disability instruments with the RMQ showing the highest values. The difference between ODI and RMQ was most pronounced in the PrS patients and agrees with several studies [13, 22, 140, 149].
Third, the ROCauc for the ODI (0.82) was comparable to that of the external instruments with no difference between the two patient populations. This was surprising as differences were seen in the SRM, and PrS and SeS patients differ in a number of baseline characteristics.

Minimal clinically important difference. ODI cut-off scores were determined to express the MCID. The change scores which seemed important to LBP patients were nine points (71%) in PrS and eight points (27%) in SeS patients and increased by approximately 5 points for every 10-point increase in baseline score. This was within reported values in the literature which vary widely (range from 4-23 points) [138, 140, 146, 148, 150–152], reflecting the diversity of available methodologies and the differences in the range of baseline scores. This seems to be a general problem as other HMS also show a wide range of reported MCID values (e.g. SF36 (pf): 7-16 points [22, 153]). Accordingly, it can be argued that the accuracy of classifying patients as improved or unchanged using a single cut-off change score may not be valid. I recommend that researchers report a range of MCIDs and either: 1) adjust for baseline dependence in a multivariate analysis, or 2) show how this varies according to baseline scores. A range of MCIDs can be obtained by using the combined approach which has several methodological advantages compared to the distribution-based and anchor-based methods alone as outlined in Section 3.1.6 on page 44.

The wide range of MCID values reported for the ODI has consequences for reporting proportions of improved patients and the numbers needed to treat (NNT) of clinical trials as: 1) the choice of ODI cut-point to dichotomise patients as improved or unchanged is almost arbitrary in reported studies, and 2) to the author’s knowledge, the effect of baseline score on the MCID is not accounted for in the reported results.
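To make the point about baseline dependence concrete, the following is a minimal sketch of how a baseline-dependent cut-off could enter the calculation of proportions improved and the NNT. It is an illustration only and not the procedure used in the papers: the linear form of the cut-off, the reference values `mcid_at_ref` and `ref_baseline`, the function names and the example patients are all hypothetical. Only the slope (roughly 5 points per 10 points of baseline score) is taken from the text above, and the NNT is computed in the usual way as the reciprocal of the difference in proportions improved.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    baseline: float  # baseline ODI score (0-100 scale)
    change: float    # improvement in points (baseline minus follow-up)


def mcid_for_baseline(baseline, mcid_at_ref=9.0, ref_baseline=30.0, slope=0.5):
    """Hypothetical linear cut-off that grows with the baseline score.

    slope=0.5 mirrors "approximately 5 points for every 10 points" of
    baseline score; mcid_at_ref and ref_baseline are illustrative
    assumptions, not values reported in the study.
    """
    return max(0.0, mcid_at_ref + slope * (baseline - ref_baseline))


def proportion_improved(patients):
    """Share of patients whose change reaches their own baseline-specific cut-off."""
    improved = sum(p.change >= mcid_for_baseline(p.baseline) for p in patients)
    return improved / len(patients)


def number_needed_to_treat(p_treatment, p_control):
    """NNT as the reciprocal of the absolute difference in proportions improved."""
    difference = p_treatment - p_control
    if difference <= 0:
        raise ValueError("treatment arm must show a larger proportion improved")
    return 1.0 / difference


# Made-up example data for two trial arms.
treated = [Patient(30, 10), Patient(50, 21), Patient(20, 5), Patient(60, 20)]
control = [Patient(30, 4), Patient(50, 19), Patient(20, 1), Patient(60, 30)]

p_t = proportion_improved(treated)
p_c = proportion_improved(control)
print(p_t, p_c, number_needed_to_treat(p_t, p_c))
```

With a single fixed cut-off the same change score would classify every patient identically regardless of baseline; here a 20-point change counts as improvement for a patient starting at 50 points but not for one starting at 60.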
As a result I suggest that researchers consider the following when reporting proportions and NNT:

• Carefully match own study population and settings to that of the MCID study or design own study to allow for MCID calculations
• Carefully consider the methodology used to establish the MCID
• Include MCID baseline dependence in the calculations of proportions and NNT

Responsiveness and Subgroups

Concurrent comparisons of responsiveness and MCID calculations in subgroups are rare [39, 72, 149] and lagging behind clinical intervention research (Section 2.1 on page 17). This study concurrently compared responsiveness and MCIDs in four subgroups of LBP patients relevant to clinical research: 1) primary sector patients, 2) secondary sector patients, 3) LBP only patients, and 4) leg pain ±LBP patients. The results demonstrate that responsiveness of an HMS varies according to the patient population it is applied to and that this was most pronounced between the disability instruments and between PrS and SeS patients. The RMQ or ODI proved to be the disability HMS of choice in all subgroups, however, the patients’ global retrospective assessment of treatment effect appeared to be the most responsive instrument in the PrS patients. Moreover, a moderate to large difference between the SF36 (bp) and the rest of the pain scales was observed. In spite of this, I conclude that the pain measures have similar responsiveness based on the fact that the SF36 (bp) showed poor specificity in all subgroups, questioning the validity of this scale in LBP patients. In an attempt to simplify the choice of pain and disability scales, an algorithm was developed based on the subgroup study and the timing of the intervention programme (Figure 7.1 on page 67).

Patients’ Global Assessment of Treatment Effect

The use of TQs in clinical research is becoming the norm rather than the exception, however, no standardisation on wording, response options and dichotomisation procedures has been agreed upon.
Thus, I set out to compare four different external criteria (two TQs and two TQs combined with a rating of importance) in PrS and SeS patients. We found that the choice of external criteria changed the proportion of patients classified as improved and unchanged (6 - 7% difference between TQ1 and TQ2) but did not influence the discriminative abilities of the HMS. This has two major implications: 1) interpretation of clinical trials may vary according to which procedure was chosen, and 2) between-study comparisons of e.g. proportions of improved patients or NNT are difficult if not impossible. Consequently, I recommend that investigators or clinicians who use the results of clinical trials including external criteria:

• Closely scrutinise the transition question and dichotomisation procedures used when evaluating the results and conclusions.
• Also pay attention to the transition question and dichotomisation procedures used when comparing trial results.

A proposal for standardised use of transition questions has been outlined and summarised in Table 7.1.

Table 7.1. Steps in standardising the construction of patients’ global assessment of treatment effect.

Steps | Standardisation
Step 1: Introductory question | Should be clear. Should have a well defined time frame. Focus on change in the area of interest.
Step 2: Response options | Should have seven response options with a “middle” representing no change. Should be short and clear. Should have a logical progression.
Step 3: Dichotomisation procedure | Should use a stringent definition of what represents a clinically relevant change from the patient’s point of view.*

* In a seven-point transition question, the improved patients would rate themselves as either “much better” or “better”, and the unchanged patients would rate themselves as “a little better”, “no change” or “a little worse”.
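The stringent dichotomisation in Step 3 can be sketched as a simple lookup. This is a hedged illustration, not the study's own scoring procedure: the footnote only names the five response options for improved and unchanged patients, so the labels “worse” and “much worse” for the two remaining options of the seven-point scale, the separate "worsened" class and the function names are assumptions made here.

```python
from collections import Counter

# Response options named in the footnote to Table 7.1.
IMPROVED = {"much better", "better"}
UNCHANGED = {"a little better", "no change", "a little worse"}
# Assumed labels for the two remaining options of a seven-point scale.
WORSENED = {"worse", "much worse"}


def dichotomise(rating):
    """Map one transition rating to 'improved', 'unchanged' or 'worsened'."""
    key = rating.strip().lower()
    if key in IMPROVED:
        return "improved"
    if key in UNCHANGED:
        return "unchanged"
    if key in WORSENED:
        return "worsened"
    raise ValueError(f"unknown response option: {rating!r}")


def classify(ratings):
    """Count how many ratings fall into each class."""
    return Counter(dichotomise(r) for r in ratings)


# Made-up example ratings.
example = ["Much better", "a little better", "no change", "better", "worse"]
print(classify(example))
```

A less stringent procedure would move “a little better” into the improved set, which is exactly the kind of choice that changes the proportion of patients classified as improved.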
7.1.2. DISCUSSION OF METHODOLOGICAL ASPECTS

The methodology used in this study was designed primarily for validating the ODI in Danish and secondarily for concurrent HMS comparisons and TQ analysis. As a result, several methodological design features not mentioned in the papers deserve attention.

The primary and secondary sector patients. The study was based on patients seen in the primary sector (chiropractic clinics) and secondary sector (out-patient hospital back pain clinic) of the Danish health care system to obtain a broad range of LBP patients. However, it can be questioned whether these patient groups are in fact representative samples of PrS and SeS patients as a whole. Back pain patients’ initial contact with the Danish health care system is via the primary health care sector which comprises medical doctors, chiropractors, and physiotherapists. In 1997 Lønnberg compared, among other things, LBP patients’ contact patterns between medical doctors, chiropractors, and physiotherapists [154]. It was found that chiropractic patients differed in several aspects compared to patients seen by medical doctors: 1) they were generally older, 2) they were less disabled, 3) they had a better general health, and 4) a larger proportion were engaged in active employment. If interpreted literally, then chiropractic patients are not representative of PrS patients. However, caution is advocated in interpreting the results this way as new evidence does not support the differences found by Lønnberg [155]. First, the age distribution of the chiropractic patients approximately equals that of the patients seen by medical doctors.
Second, the proportion of chiropractic patients engaged in active employment had decreased by 14% between 1997 and 2002. Third, the proportion of chiropractic patients without longstanding disease had decreased considerably in the same period. Last, the level of pain (visual analog scale = 63-65 mm) and disability (visual analog scale for activities of daily living = 62-64 mm) at the initial visit was moderate to high and comparable to patients with cardiac insufficiency and patients with chronic obstructive pulmonary disease. On the basis of this I consider the included chiropractic patients representative of LBP patients seen in the PrS of the Danish health care system.

For the SeS patients, we included only one out-patient hospital low back pain clinic and these patients may not be representative of all SeS patients in Denmark. On the one hand, the clinic receives referrals from the whole county of Funen which makes up 10% of the Danish population and is said to be representative thereof [156]. On the other hand, referrals to the hospital clinic are dependent on specific referral criteria which vary from county to county. Moreover, the surgical patients are probably underrepresented as many are seen elsewhere. In all, I believe these patients to be representative of the majority of the non-surgical SeS patients despite the limitations mentioned above.

Inclusion of consecutive patients. A further limitation of the study is the inclusion of consecutive patients. A total of 23% of the available patients (PrS patients: 13%; SeS patients: 9%) refused to participate in the study and it is unknown whether our results have been biased as the included patients were not strictly consecutive. However, in comparison to a cohort of 293 strictly consecutive patients seen in the same hospital back pain clinic, the SeS patients’ mean disability scores (LBPRSdisability) were similar (our SeS cohort: 45.2 [SD 17.8]; consecutive SeS cohort: 47.9 [SD 20.2]) (unpublished data).
From baseline to 8 weeks follow-up, a further 18% of the patients (PrS patients: 27%; SeS patients: 8%) dropped out, but only small differences were seen in the dropout analysis. In summary, I conclude that the included cohort of patients is representative of the PrS and SeS patients (but more so for the SeS patients) despite not being strictly consecutive.

Stable patients. Stable patients for the reproducibility calculations were selected on the basis of an external anchor of change. Difficulties arose in the PrS patients as true change was likely to occur shortly after initiation of treatment. This had to be balanced against the possible recall bias from administering the retest questionnaires with a short time interval. The questions in the retest booklet were shuffled to minimise recall bias, however, it is possible that the reproducibility results are inflated in the PrS patients. As our results were comparable to other reproducibility studies, I consider this bias negligible.

Merging transition questions. To obtain the same global ratings of change, the two TQs were merged into one question as outlined in Table 5.1 on page 48. As these transition ratings had different introductory questions and response options and were obtained by unvalidated telephone interviews, the critical reader may argue that this invalidates our results. However, close examination of the discriminative abilities of the two TQs did not show any disparity, and only small differences were seen in the patient classification (Section 6.1.5 on page 57). On the basis of this I believe that the procedure of merging the TQs is valid but whether the transition ratings were positively biased by the telephone interview remains speculative.

Outcome measures. Comparing ODI and RMQ to the dimensions of daily living (SF36 (pf)) and a combination of daily living and pain related function (LBPRSdisability) may be problematic.
Many items of the ODI and RMQ inquire about functional activities in relation to pain - a slightly different dimension. Comparing responsiveness of related but different dimensions in the same patient population is likely to give different results as patients tend to respond inconsistently to various dimensions during treatment (e.g. pain vs. function). Consequently, some of the variability in responsiveness seen in our results may be attributed to the measurement of different but related dimensions.

7.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY

7.2.1. DISCUSSION OF FINDINGS

A method for determining LBP patients’ acceptable level of treatment outcome a priori using standardised HMS was developed and compared to measurement error and a commonly used a posteriori anchor-based method.

MCIDpre versus MCIDpost. The results showed a considerable gap between the MCIDpre and the MCIDpost, some of which may be explained by the following two factors: 1) a response shift occurring during the treatment, and 2) patients’ (lack of) ability to differentiate between the concepts of “acceptable results of the treatment” and “expectations/hopes to the treatment”. Both would result in an underestimation of what is acceptable, thus, overestimating the MCIDpre. I suspect that the response shift had the greatest impact on our results for several reasons. First, patient information is a cornerstone in the management at the out-patient hospital back pain unit and is emphasized continuously during the course of treatment. Accordingly, it seems only natural if patients are influenced by the large amount of information received and in fact change their attitudes and behaviours. Second, the results showed that patients were in fact able to distinguish between what is acceptable and their expectations/hopes in a small randomised trial, and this is also supported in the literature [157].
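As a rough illustration of how an a priori acceptable change could be derived and set against measurement error, consider the sketch below. It rests on assumptions that are not spelled out in this section: that the pre-treatment acceptable change is the baseline score minus the post-treatment score the patient states as acceptable, that the group value is the mean of these individual changes, and that the percentage reduction is expressed relative to the mean baseline score. The function names and the example numbers are hypothetical.

```python
from statistics import mean


def acceptable_changes(baseline_scores, acceptable_post_scores):
    """Per-patient acceptable change: baseline minus stated acceptable post-score."""
    if len(baseline_scores) != len(acceptable_post_scores):
        raise ValueError("both lists must contain one value per patient")
    return [b - a for b, a in zip(baseline_scores, acceptable_post_scores)]


def mcid_pre(baseline_scores, acceptable_post_scores):
    """Group-level a priori acceptable change (assumed here to be the mean)."""
    return mean(acceptable_changes(baseline_scores, acceptable_post_scores))


def percent_reduction(baseline_scores, acceptable_post_scores):
    """Acceptable change as a percentage of the mean baseline score."""
    change = mcid_pre(baseline_scores, acceptable_post_scores)
    return 100.0 * change / mean(baseline_scores)


def exceeds_measurement_error(mcid, mdc95):
    """True if the acceptable change lies outside measurement error (MDC95%)."""
    return mcid > mdc95


# Made-up scores for three patients on a 0-100 scale.
baseline = [40, 50, 30]
acceptable = [30, 35, 25]
print(mcid_pre(baseline, acceptable), percent_reduction(baseline, acceptable))
```

The same comparison against the MDC95% can be repeated for a retrospectively derived MCIDpost, which makes any gap between the two estimates explicit.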
MCIDpre versus post-treatment acceptable change. The study compared the MCIDpre with what patients feel is acceptable change after treatment cessation, and the results show a disparity in the ODI and BQ but less so for the NRSpain. One likely interpretation is that LBP patients have a clearer understanding of what is an acceptable change in pain intensity before starting treatment in comparison to changes in functional and psychological/affective domains. This has implications for patient satisfaction and the results of clinical trials. If patient established acceptable outcomes are not matched to anticipated treatment efficacy before the treatment begins, it follows that these patients may become dissatisfied. Dissatisfied patients will almost certainly report less favourable results of the treatment in clinical trials. The results show that this is most important for functional and psychological/affective domains, and I suggest that researchers and clinicians incorporate this in clinical trials and clinical practice to enhance satisfaction and treatment outcomes.

7.2.2. DISCUSSION OF METHODOLOGICAL ASPECTS

The process of developing a novel a priori method to establish clinically relevant change was laborious and time consuming. I undertook this demanding task as the method has several methodological advantages compared to the commonly used retrospective method:

• It does not rely on an external anchor of interpretability which is vulnerable to biases as outlined in Section 3.1.6 on page 39.
• Interpretability is established directly on the HMS used in clinical studies.
• It allows clinicians and patients to discuss any mismatch between treatment efficacy and what is acceptable to the patient.

Several obstacles were encountered during the development process, and the most important will be highlighted here. First, the development of the modified questionnaires required the use of a language expert.
Precise wording of the introductory explanations and the questionnaire items was of paramount importance to ensure the highest possible validity. Emphasis was placed on clarity and avoiding misunderstandings, and several versions were developed before the pre-final version was tested in the pilot study. Second, problems were encountered during the pilot study as a considerable proportion of the included patients rated the smallest value possible as an acceptable outcome after treatment (i.e. zero for the BQ and the first option for the ODI). The reason given by most patients was that they expected to be “cured” and would not accept anything less. As a result, we decided to add the following to each of the modified questionnaire items: “...if you had to accept some (e.g. pain)”. Third, the pilot study resulted in more questions from the patients than expected and most pertained to the interpretation of “an acceptable result after the treatment”. Consequently, a system of “easy access for questions” was developed and involved secretarial (or head researcher) assistance by telephone and access to a website with questions and answers.
Evaluative HMS for patients with LBP

LBP only
  Secondary sector patients
    Follow-up <2 months
      Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (LBP only); Global anchors
    Follow-up ≥2 months
      Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (back); SF36
      Secondary outcomes: Global anchors
  Primary sector patients
    Follow-up <2 months
      Primary outcomes: Global anchors; Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (LBP only)
      Secondary outcomes: Patient Specific Function Scale
    Follow-up ≥2 months
      Primary outcomes: Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (LBP only); SF36
      Secondary outcomes: Global anchors; Patient Specific Function Scale

Leg pain ± LBP
  Primary sector patients
    Follow-up <2 months
      Primary outcomes: Global anchors; Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain)
      Secondary outcomes: Patient Specific Function Scale
    Follow-up ≥2 months
      Primary outcomes: Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain); SF36
      Secondary outcomes: Global anchors
  Secondary sector patients
    Follow-up <2 months
      Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain); Global anchors
    Follow-up ≥2 months
      Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain); SF36
      Secondary outcomes: Global anchors

Figure 7.1. Choosing pain and disability HMS in subgroups of LBP patients - an algorithm.

8. CONCLUSIONS

8.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY

The ODI was successfully translated and cross-culturally adapted into the Danish language.
It is a reliable, valid, and responsive tool to assess pain related function and is probably more appropriate in the chronic SeS patients. The patient established minimal important change was 8-9 points and dependent on the level of disability at baseline. HMS responsiveness varied according to the patient population it was applied to and the RMQ or ODI proved to be the disability HMS of choice in all subgroups. However, the patients’ global retrospective assessment of treatment effect appeared to be the most responsive instrument in the PrS patients. An algorithm simplifying the choice of pain and disability HMS has been proposed. A standardised use of patients’ global assessment of treatment effect (transition questions) was proposed to simplify interpretation and comparisons of clinical trial results.

8.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY

The prospective acceptable outcome method offers a benchmark by which patients’ acceptable outcome can be scrutinised before treatment begins. It yields results which are not comparable to the retrospective MCIDpost method and the disparity is possibly influenced by a response shift. Moreover, LBP patients have a clearer understanding of what is an acceptable change in pain intensity before starting treatment compared to changes in function and psychological/affective domains.

9. RECOMMENDATIONS

A multitude of areas within the realm of clinimetric research is ongoing, and most reflect methodological studies of the meaning and measurement of change, and the generality of MCID values. The present studies have focused on a new area - subgroups - and have fostered new information and ideas. These ideas have been drawn up in an agenda for future areas of research:

• What are the consequences of using a less extensive cross-cultural adaptation process for HMS?
Further work needs to validate a “light” version of the current cross-cultural adaptation process and delineate criteria for the situations where this version is acceptable.
• Future work should include clinimetric testing of HMS in subgroups other than the ones we have looked at. So far, dependence on initial disease severity [122, 123, 158, 159], acute and chronic patients [39], low back sprain vs. LBP with radiculopathy [149], surgical vs. non-surgical patients [72], point of entry into the health care system (PrS vs. SeS patients) and pain location (LBP only vs. leg pain ±LBP patients) have been investigated. It is likely that other sociodemographic factors, such as depression, co-morbidity, employment status, income, social status, duration of problem, duration of sick leave etc., may be important for the choice of HMS.
• Future work should include clinimetric testing of neck HMS in subgroups as very little work has been done in this area.
• Work should continue to explore the consequences of using different external criteria on responsiveness and the MCID. Recent work indicates that the magnitude of the responsiveness statistic depends on the type of criterion included in the study [68]. In particular, I recommend looking at the difference between patients’ global retrospective assessment of treatment effect and the rating of importance of such a change.
• Most MCID values have been established using one external criterion. To advance the confidence in the MCID for a particular HMS, future research should: 1) confirm the MCID values using other anchors, and 2) report MCID values using CIs as outlined in paper II-1. Accordingly, a range of MCIDs can be reported and scrutiny of individual MCID values for accuracy is possible.
• The advantages of combining distribution-based and anchor-based methods to back-specific HMS should be further explored to increase confidence in reported MCID ranges.
• Obtaining transition ratings by telephone interviews allows for independence between the HMS and the external criteria. In addition, it is quick and requires few resources. Future research is needed to establish the validity of this method.
• The prospective acceptable outcome method has been tested on patients seen in the SeS. Future studies should test whether the method yields similar results in PrS patients.
• Our results show that chronic LBP patients have unrealistically high expectations to the result of the treatment asked before it begins. This was especially true for disability and psychological/affective domains. Consequences of a mismatch between acceptable patient outcomes and expected treatment efficacy on the reporting of results in a clinical trial should be investigated.

Bibliography

[1] de Vet HC, Terwee CB, Bouter LM. Current challenges in clinimetrics. J Clin Epidemiol. 2003 Dec;56(12):1137–1141.
[2] Feinstein AR. An additional basic science for clinical medicine: IV. The development of clinimetrics. Ann Intern Med. 1983 Dec;99(6):843–848.
[3] Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997 Mar;50(3):239–246.
[4] Beaton DE, Bombardier C, Katz JN, Wright JG. A taxonomy for responsiveness. J Clin Epidemiol. 2001 Dec;54(12):1204–1217.
[5] de Bruin AF, Diederiks JP, de Witte LP, Stevens FC, Philipsen H. Assessing the responsiveness of a functional status measure: the Sickness Impact Profile versus the SIP68. J Clin Epidemiol. 1997 May;50(5):529–540.
[6] de Vet HC, Terwee CB, Ostelo RW, Beckerman H, Knol DL, Bouter LM. Minimal changes in health status questionnaires: distinction between minimally detectable change and minimally important change. Health Qual Life Outcomes. 2006 Aug;4(1):54–62.
[7] Lohr KN, Aaronson NK, Alonso J, Burnam MA, Patrick DL, Perrin EB, et al. Evaluating quality-of-life and health status instruments: development of scientific review criteria. Clin Ther.
1996;18(5):979–992.
[8] Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989 Dec;10(4):407–415.
[9] de Vet HC, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J Clin Epidemiol. 2006 Oct;59(10):1033–1039.
[10] Deyo RA. Measuring the functional status of patients with low back pain. Arch Phys Med Rehabil. 1988 Dec;69(12):1044–1053.
[11] Streiner DL, Norman GR. Health Measurement Scales. A Practical Guide to Their Development and Use. 3rd ed. Streiner DL, Norman GR, editors. Oxford: Oxford Medical Publications; 2003.
[12] Lurie J. A review of generic health status measures in patients with low back pain. Spine. 2000 Dec;25(24):3125–3129.
[13] Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping DL, et al. The Quebec Back Pain Disability Scale. Measurement properties. Spine. 1995 Feb;20(3):341–352.
[14] Stratford P, Gill C, Westaway M, Binkley J. Assessing disability and change on individual patients: a report of a patient specific measure. Physiother Can. 1995;47(4):258–263.
[15] Walsh TL, Hanscom B, Lurie JD, Weinstein JN. Is a condition-specific instrument for patients with low back pain/leg symptoms really necessary? The responsiveness of the Oswestry Disability Index, MODEMS, and the SF-36. Spine. 2003 Mar;28(6):607–615.
[16] Finch E, Brooks D, Stratford PW, Mayo NE. Physical Rehabilitation Outcome Measures. A Guide to Enhanced Clinical Decision Making. 2nd ed. Finch E, Brooks D, Stratford PW, Mayo NE, editors. BC Decker Inc.; 2002.
[17] Muller U, Roder C, Greenough CG. Back related outcome assessment instruments. Eur Spine J. 2006 Jan;15 Suppl 1:S25–S31.
[18] Grotle M, Brox JI, Vollestad NK. Functional Status and Disability Questionnaires: What Do They Assess?: A Systematic Review of Back-Specific Outcome Questionnaires. Spine. 2005 Jan;30(1):130–140.
[19] Muller U, Duetz MS, Roeder C, Greenough CG. Condition-specific outcome measures for low back pain. Part I: Validation. Eur Spine J. 2004 Mar;13:301–313.
[20] Muller U, Roeder C, Dubs L, Duetz MS, Greenough CG. Condition-specific outcome measures for low back pain. Part II: Scale construction. Eur Spine J. 2004 Mar;13:314–324.
[21] Roland M, Morris R. A Study of the Natural-History of Back Pain .1. Development of A Reliable and Sensitive Measure of Disability in Low-Back-Pain. Spine. 1983;8(2):141–144.
[22] Patrick DL, Deyo RA, Atlas SJ, Singer DE, Chapin A, Keller RB. Assessing health-related quality of life in patients with sciatica. Spine. 1995 Sep;20(17):1899–1908.
[23] Stratford PW, Binkley JM. Measurement properties of the RM-18. A modified version of the Roland-Morris Disability Scale. Spine. 1997 Oct;22(20):2416–2421.
[24] Williams RM, Myers AM. Support for a shortened Roland-Morris Disability Questionnaire for patients with acute low back pain. Physiother Can. 2001;53(1):60–66.
[25] Dionne CE, Von Korff M, Koepsell TD, Deyo RA, Barlow WE, Checkoway H. A comparison of pain, functional limitations, and work status indices as outcome measures in back pain research. Spine. 1999 Nov;24(22):2339–2345.
[26] Atlas SJ, Deyo RA, van den Ancker M, Singer DE, Keller RB, Patrick DL. The Maine-Seattle back questionnaire: a 12-item disability questionnaire for evaluating patients with lumbar sciatica or stenosis: results of a derivation and validation cohort analysis. Spine. 2003 Aug;28(16):1869–1876.
[27] Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine. 2000 Nov;25(22):2940–2952.
[28] Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary and general recommendations. Spine. 2000 Dec;25(24):3100–3103.
[29] Deyo RA, Andersson G, Bombardier C, Cherkin DC, Keller RB, Lee CK, et al. Outcome Measures for Studying Patients with Low-Back-Pain. Spine. 1994 Sep;19(18):S2032–S2036.
[30] Beurskens AJ, de Vet HC, Koke AJ, van der Heijden GJ, Knipschild PG. Measuring the functional status of patients with low back pain. Assessment of the quality of four disease-specific questionnaires. Spine. 1995 May;20(9):1017–1028.
[31] Kopec JA, Esdaile JM. Functional disability scales for back pain. Spine. 1995 Sep;20(17):1943–1949.
[32] Millard RW, Beattie PF, Jones RH. A comprehensive review of questionnaires to evaluate chronic pain-related disability. Critical Reviews in Physical and Rehabilitation Medicine. 1997;9(1):35–52.
[33] Kopec JA. Measuring functional outcomes in persons with back pain: a review of back-specific questionnaires. Spine. 2000 Dec;25(24):3110–3114.
[34] Roland M, Fairbank J. The Roland-Morris Disability Questionnaire and the Oswestry Disability Questionnaire. Spine. 2000 Dec;25(24):3115–3124.
[35] Beaton DE, Schemitsch E. Measures of health-related quality of life and physical function. Clin Orthop. 2003 Aug;(413):90–105.
[36] Resnik L, Dobrzykowski E. Guide to outcomes measurement for patients with low back pain syndromes. J Orthop Sports Phys Ther. 2003 Jun;33(6):307–316.
[37] Zanoli G, Stromqvist B, Padua R, Romanini E. Lessons learned searching for a HRQoL instrument to assess the results of treatment in persons with lumbar disorders. Spine. 2000 Dec;25(24):3178–3185.
[38] Schaufele MK, Boden SD. Outcome research in patients with chronic low back pain. Orthop Clin North Am. 2003 Apr;34(2):231–237.
[39] Grotle M, Brox JI, Vollestad NK. Concurrent comparison of responsiveness in pain and functional status measurements used for patients with low back pain. Spine. 2004 Nov;29(21):E492–E501.
[40] Terwee CB, Bot SD, Boers M, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2006;60(1):34–42.
[41] Leboeuf-Yde C, Axén I, Jones JJ, Rosenbaum A, Løvgren PW, Halasz L, et al.
The Nordic back pain subpopulation program: the long-term outcome pattern in patients with low back pain treated by chiropractors in Sweden. J Manipulative Physiol Ther. 2005 Sep;28(7):472–478. [42] Fritz JM, Brennan GP, Clifford SN, Hunter SJ, Thackeray A. An examination of the reliability of a classification algorithm for subgrouping patients with low back pain. Spine. 2006 Jan;31(1):77–82. [43] Brennan GP, Fritz JM, Hunter SJ, Thackeray A, Delitto A, Erhard RE. Identifying subgroups of patients with acute/subacute "nonspecific" low back pain: results of a randomized clinical trial. Spine. 2006 Mar;31(6):623–631. [44] Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life. A conceptual model of patient outcomes. JAMA. 1995 Jan;273(1):59–65. [45] Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis. 1985;38(1):27–36. [46] Sloan JA, Aaronson N, Cappelleri JC, Fairclough DL, Varricchio C. Assessing the clinical significance of single items relative to summated scores. Mayo Clin Proc. 2002 May;77(5):479–487. [47] Manniche C, Asmussen K, Lauritsen B, Vinterberg H, Kreiner S, Jordan A. Low Back Pain Rating scale: validation of a tool for assessment of low back pain. Pain. 1994 Jun;57(3):317–326. [48] Ware JE. SF-36 health survey update. Spine. 2000 Dec;25(24):3130–3139. [49] Boscainos PJ, Sapkas G, Stilianessi E, Prouskas K, Papadakis SA. Greek versions of the Oswestry and Roland-Morris Disability Questionnaires. Clin Orthop. 2003 Jun;(411):40–53. [50] Albert HB, Jensen AM, Dahl D, Rasmussen MN. Criteria validation of the Roland Morris questionnaire. A Danish translation of the international scale for the assessment of functional level in patients with low back pain and sciatica. Ugeskr Laeger. 2003 Apr;165(18):1875–1880 [In Danish]. [51] Johansson E, Lindberg P. Subacute and chronic low back pain. Reliability and validity of a Swedish version of the Roland and Morris Disability Questionnaire. 
Scand J Rehabil Med. 1998 Sep;30(3):139–143. [52] Nusbaum L, Natour J, Ferraz MB, Goldenberg J. Translation, adaptation and validation of the Roland-Morris questionnaire–Brazil Roland-Morris. Braz J Med Biol Res. 2001 Feb;34(2):203–210. [53] Fayers PM, Machin D. Quality of Life. Assessment, Analysis and Interpretation. Fayers PM, Machin D, editors. Chichester: John Wiley & Sons Ltd.; 2000. [54] DeVellis R. Scale development. Theory and application. 2nd ed. Seawell M, editor. Sage Publications; 2003. [55] Oppenheim AN. Questionnaire design, interviewing and attitude measurement. 2nd ed. Oppenheim AN, editor. Pinter Publishers; 1996. [56] Hansagi H, Allebeck P. Enkät och intervju inom hälso- och sjukvård. Handbok för forskning och utvecklingsarbete. Hansagi H, Allebeck P, editors. Studentlitteratur; 1996. [57] Parsons S, Carnes D, Pincus T, Foster N, Breen A, Vogel S, et al. Measuring troublesomeness of chronic pain by location. BMC Musculoskelet Disord. 2006 Apr;7(1):34–. [58] Wittink H, Turk DC, Carr DB, Sukiennik A, Rogers W. Comparison of the redundancy, reliability, and responsiveness to change among SF-36, Oswestry Disability Index, and Multidimensional Pain Inventory. Clin J Pain. 2004 May;20(3):133–142. [59] Bayar K, Bayar B, Yakut E, Yakut Y. Reliability and construct validity of the Oswestry Low Back Pain Disability Questionnaire in the elderly with low back pain. Pain Clinic. 2003;15(1):55–59. [60] Fritz JM, Piva SR. Physical impairment index: reliability, validity, and responsiveness in patients with acute low back pain. Spine. 2003 Jun;28(11):1189–1194. [61] Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res. 2003 Jun;12(4):349–362. [62] Stratford PW, Riddle DL. Assessing sensitivity to change: choosing the appropriate change coefficient. Health Qual Life Outcomes. 2005;3(1):23–.
[63] Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related quality of life. J Clin Epidemiol. 2003 May;56(5):395–407. [64] Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol. 2000 May;53(5):459–468. [65] Stratford PW, Binkley FM, Riddle DL. Health status measures: strategies and analytic methods for assessing change scores. Phys Ther. 1996 Oct;76(10):1109–1123. [66] Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. CMAJ. 1986 Apr;134(8):889–895. [67] Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in health-related quality-of-life scores. J Clin Oncol. 1998 Jan;16(1):139–144. [68] Kuijer W, Brouwer S, Dijkstra PU, Jorritsma W, Groothoff JW, Geertzen JH. Responsiveness of the Roland-Morris Disability Questionnaire: consequences of using different external criteria. Clin Rehabil. 2005 Aug;19(5):488–495. [69] Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol. 2002 Sep;55(9):900–908. [70] Elliott AM, Smith BH, Hannaford PC, Smith WC, Chambers WA. Assessing change in chronic pain severity: the chronic pain grade compared with retrospective perceptions. Br J Gen Pract. 2002 Apr;52(477):269–274. [71] Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997 Aug;50(8):869–879. [72] Hägg O, Fritzell P, Oden A, Nordwall A. Simplifying outcome measurement: evaluation of instruments for measuring outcome after fusion surgery for chronic low back pain. Spine. 2002 Jun;27(11):1213–1222. [73] Hägg O, Fritzell P, Nordwall A. Simplifying outcome measurement. Eur Spine J. 2005;14(Suppl. 1):S1–S30. [74] Redelmeier DA, Guyatt GH, Goldstein RS.
Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol. 1996 Nov;49(11):1215–1219. [75] Beaton DE, Boers M, Wells GA. Many faces of the minimal clinically important difference (MCID): a literature review and directions for future research. Curr Opin Rheumatol. 2002 Mar;14(2):109–114. [76] Hays RD, Woolley JM. The concept of clinically meaningful difference in health-related quality-of-life research. How meaningful is it? Pharmacoeconomics. 2000 Nov;18(5):419–423. [77] Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol. 1993 Dec;46(12):1417–1432. [78] Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000 Dec;25(24):3186–3191. [79] Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002 May;11(3):193–205. [80] Cortina JM. What Is Coefficient Alpha? An Examination of Theory and Applications. Journal of Applied Psychology. 1993;78(1):98–104. [81] Nunnally JC, Bernstein I. Psychometric Theory. 3rd ed. Vaicunas J, Belser JR, editors. McGraw Hill Higher Education; 1993. [82] McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995 Aug;4(4):293–307. [83] Davidson M, Keating JL. A comparison of five low back disability questionnaires: reliability and responsiveness. Phys Ther. 2002 Jan;82(1):8–24. [84] Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998 Oct;26(4):217–238. [85] Wyrwich KW, Tierney WM, Wolinsky FD. 
Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol. 1999 Sep;52(9):861–873. [86] Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek AL. Smallest real difference, a link between reproducibility and responsiveness. Qual Life Res. 2001;10(7):571–578. [87] Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care. 1999 May;37(5):469–478. [88] Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb;1(8476):307–310. [89] Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet. 1995 Oct;346(8982):1085–1087. [90] Lamé IE, Peters ML, Vlaeyen JWS, v Kleef M, Patijn J. Quality of life in chronic pain is more associated with beliefs about pain, than with pain intensity. Eur J Pain. 2005 Feb;9(1):15–24. [91] Patrick DL, Chiang YP. Measurement of health outcomes in treatment effectiveness evaluations: conceptual and methodological challenges. Med Care. 2000 Sep;38(9 Suppl):II14–II25. [92] Hays RD, Hadorn D. Responsiveness to change: an aspect of validity, not a separate dimension. Qual Life Res. 1992 Feb;1(1):73–75. [93] Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis. 1986;39(11):897–906. [94] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis. 1987;40(2):171–178. [95] Guyatt GH, Deyo RA, Charlson M, Levine MN, Mitchell A. Responsiveness and validity in health status measurement: a clarification. J Clin Epidemiol. 1989;42(5):403–408. [96] Lydick E, Epstein RS. Interpretation of quality of life changes.
Qual Life Res. 1993 Jun;2(3):221–226. [97] Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care. 1989 Mar;27(3 Suppl):S178–S189. [98] Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care. 2003 May;41(5):582–592. [99] Moayyedi P, Duffett S, Braunholtz D, Mason S, Richards ID, Dowell AC, et al. The Leeds Dyspepsia Questionnaire: a valid tool for measuring the presence and severity of dyspepsia. Aliment Pharmacol Ther. 1998 Dec;12(12):1257–1262. [100] Speer DC, Greenbaum PE. Five methods for computing significant individual client change and improvement rates: support for an individual growth curve approach. J Consult Clin Psychol. 1995 Dec;63(6):1044–1048. [101] Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. Importance of sensitivity to change as a criterion for selecting health status measures. Qual Health Care. 1992 Jun;1(2):89–93. [102] Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997 Jan;50(1):79–93. [103] Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991 Aug;12(4 Suppl):142S–158S. [104] Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol. 1991 Feb;59(1):12–19. [105] Jenkinson C. Measuring Health And Medical Outcomes. Jenkinson C, editor. Taylor & Francis; 1994. [106] Cohen J. Statistical Power Analysis for the Behavioral Sciences. vol. 2nd. edition. Cohen J, editor. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988. [107] Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. 
Transition Questions to Assess Outcomes in Rheumatoid-Arthritis. Br J Rheumatol. 1993 Sep;32(9):807–811. [108] Fischer D, Stewart AL, Bloch DA, Lorig K, Laurent D, Holman H. Capturing the patient’s view of change as a clinical outcome measure. JAMA. 1999 Sep;282(12):1157–1162. [109] Farrar JT, Portenoy RK, Berlin JA, Kinman JL, Strom BL. Defining the clinically important difference in pain outcome measures. Pain. 2000;88(3):287–294. [110] Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002 Apr;77(4):371–383. [111] Barck AL. Measurement of clinical change caused by knee replacement. Conventional score or special change indexes? Arch Orthop Trauma Surg. 1999;119(1-2):76–78. [112] Deyo RA, Inui TS, Leininger J, Overman S. Physical and psychosocial function in rheumatoid arthritis. Clinical use of a self-administered health status instrument. Arch Intern Med. 1982 May;142(5):879–882. [113] Testa MA, Simonson DC. Assessment of quality-of-life outcomes. N Engl J Med. 1996 Mar;334(13):835–840. [114] Llewellyn-Thomas HA, Williams JI, Levy L, Naylor CD. Using a trade-off technique to assess patients’ treatment preferences for benign prostatic hyperplasia. Med Decis Making. 1996;16(3):262–282. [115] Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol. 1995 Nov;48(11):1369–1378. [116] Kolotkin RL, Crosby RD, Williams GR. Health-related quality of life varies among obese subgroups. Obes Res. 2002 Aug;10(8):748–756. [117] Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a disease-specific Quality of Life Questionnaire. J Clin Epidemiol. 1994 Jan;47(1):81–87. [118] Middel B, Goudriaan H, de Greef M, Stewart R, van Sonderen E, Bouma J, et al.
Recall bias did not affect perceived magnitude of change in health-related functional status. J Clin Epidemiol. 2006 May;59(5):503–511. [119] Aseltine RH, Carlson KJ, Fowler FJ Jr, Barry MJ. Comparing prospective and retrospective measures of treatment outcomes. Med Care. 1995 Apr;33(4 Suppl):AS67–AS76. [120] Herrmann D. Reporting current, past, and changed health status. What we know about distortion. Med Care. 1995 Apr;33(4 Suppl):AS89–AS94. [121] Farrar JT, Young JP Jr, LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001 Dec;94(2):149–158. [122] Stratford PW, Binkley J, Solomon P, Finch E, Gill C, Moreland J. Defining the minimum level of detectable change for the Roland-Morris questionnaire. Phys Ther. 1996 Apr;76(4):359–365. [123] Stratford PW, Binkley JM, Riddle DL, Guyatt GH. Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 1. Phys Ther. 1998 Nov;78(11):1186–1196. [124] Cella D, Eton DT, Lai JS, Peterman AH, Merkel DE. Combining anchor and distribution-based methods to derive minimal clinically important differences on the Functional Assessment of Cancer Therapy (FACT) anemia and fatigue scales. J Pain Symptom Manage. 2002 Dec;24(6):547–561. [125] Eton DT, Cella D, Yost KJ, Yount SE, Peterman AH, Neuberg DS, et al. A combination of distribution- and anchor-based approaches determined minimally important differences (MIDs) for four endpoints in a breast cancer scale. J Clin Epidemiol. 2004 Oct;57(9):898–910. [126] Yost KJ, Cella D, Chawla A, Holmgren E, Eton DT, Ayanian JZ, et al. Minimally important differences were estimated for the Functional Assessment of Cancer Therapy-Colorectal (FACT-C) instrument using a combination of distribution- and anchor-based approaches. J Clin Epidemiol. 2005 Dec;58(12):1241–1251. [127] Cella D, Eton DT, Fairclough DL, Bonomi P, Heyes AE, Silberman C, et al.
What is a clinically meaningful change on the Functional Assessment of Cancer Therapy-Lung (FACT-L) Questionnaire? Results from Eastern Cooperative Oncology Group (ECOG) Study 5592. J Clin Epidemiol. 2002 Mar;55(3):285–295. [128] Chansirinukor W, Maher CG, Latimer J, Hush J. Comparison of the functional rating index and the 18-item Roland-Morris Disability Questionnaire: responsiveness and reliability. Spine. 2005 Jan;30(1):141–145. [129] Bjorner JB, Damsgaard MT, Watt T, Groenvold M. Tests of data quality, scaling assumptions, and reliability of the Danish SF-36. J Clin Epidemiol. 1998 Nov;51(11):1001–1011. [130] Bjorner JB, Kreiner S, Ware JE, Damsgaard MT, Bech P. Differential item functioning in the Danish translation of the SF-36. J Clin Epidemiol. 1998 Nov;51(11):1189–1202. [131] Bjorner JB, Thunedborg K, Kristensen TS, Modvig J, Bech P. The Danish SF-36 Health Survey: translation and preliminary validity studies. J Clin Epidemiol. 1998 Nov;51(11):991–999. [132] Fairbank JC, Couper J, Davies JB, O’Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980 Aug;66(8):271–273. [133] Stratford PW, Spadoni G, Kennedy D, Westaway MD, Alcock GK. Seven points to consider when investigating a measure’s ability to detect change. Physiother Can. 2002;54(1):16–24. [134] Guyatt GH. Making sense of quality-of-life data. Med Care. 2000 Sep;38(9 Suppl):II175–II179. [135] Efron B, Tibshirani RJ. An Introduction to the Bootstrap. vol. 1st ed. New York: Chapman and Hall; 1993. [136] Phillips PCB, Park JY. On the formulation of wald tests of nonlinear restrictions. Econometrica. 1988;56(5):1065–1083. [137] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988 Sep;44(3):837–845. [138] Hägg O, Fritzell P, Nordwall A. The clinical importance of changes in outcome scores after treatment for chronic low back pain.
Eur Spine J. 2003 Feb;12(1):12–20. [139] de Vet HC, Bouter LM, Bezemer PD, Beurskens AJ. Reproducibility and responsiveness of evaluative outcome measures. Theoretical considerations illustrated by an empirical example. Int J Technol Assess Health Care. 2001;17(4):479–487. [140] Beurskens AJ, de Vet HC, Koke AJ. Responsiveness of functional status in low back pain: a comparison of different instruments. Pain. 1996 Apr;65(1):71–76. [141] Bolton JE, Breen AC. The Bournemouth Questionnaire: a short-form comprehensive outcome measure. I. Psychometric properties in back pain patients. J Manipulative Physiol Ther. 1999 Oct;22(8):503–510. [142] Bolton JE, Humphreys BK. The Bournemouth Questionnaire: A short-form comprehensive outcome measure. II. Psychometric properties in neck pain patients. J Manipulative Physiol Ther. 2002 Mar;25(3):141–148. [143] Fisher K, Johnston M. Validation of the Oswestry Low Back Pain Disability Questionnaire, its sensitivity as a measure of change following treatment and its relationship with other aspects of the chronic pain experience. Physiotherapy Theory and Practice. 1997;13(1):67–80. [144] Grotle M, Brox JI, Vollestad NK. Cross-cultural adaptation of the Norwegian versions of the Roland-Morris Disability Questionnaire and the Oswestry Disability Index. J Rehabil Med. 2003 Sep;35(5):241–247. [145] Baker D, Pynsent PB, Fairbank JC. The Oswestry Disability Index revisited: its reliability, repeatability and validity, and a comparison with the St Thomas’s Disability Index. In: Roland M, Jenner J, editors. Back Pain: New Approaches to Rehabilitation and Education. 12. Manchester: Manchester University Press; 1989. p. 174–186. [146] Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire and the Quebec Back Pain Disability Scale. Phys Ther. 2001 Feb;81(2):776–788. [147] Mannion AF, Junge A, Fairbank JC, Dvorak J, Grob D. Development of a German version of the Oswestry Disability Index.
Part 1: cross-cultural adaptation, reliability, and validity. Eur Spine J. 2006;15(1):55–65. [148] Mannion AF, Junge A, Grob D, Dvorak J, Fairbank JC. Development of a German version of the Oswestry Disability Index. Part 2: sensitivity to change after spinal surgery. Eur Spine J. 2006;15(1):66–73. [149] Leclaire R, Blier F, Fortin L, Proulx R. A cross-sectional study comparing the Oswestry and Roland-Morris Functional Disability scales in two populations of patients with low back pain of different levels of severity. Spine. 1997 Jan;22(1):68–71. [150] Taylor SJ, Taylor AE, Foy MA, Fogg AJ. Responsiveness of common outcome measures for patients with low back pain. Spine. 1999 Sep;24(17):1805–1812. [151] Suarez-Almazor ME, Kendall C, Johnson JA, Skeith K, Vincent D. Use of health status measures in patients with low back pain in clinical settings. Comparison of specific, generic and preference-based instruments. Rheumatology. 2000 Jul;39(7):783–790. [152] Rantanen P. Physical measurements and questionnaires as diagnostic tools in chronic low back pain. J Rehabil Med. 2001 Jan;33(1):31–35. [153] Davidson M, Keating JL, Eyres S. A low back-specific version of the SF-36 Physical Functioning scale. Spine. 2004 Mar;29(5):586–594. [154] Lønnberg F. The management of back problems among the population. I. Contact patterns and therapeutic routines. Ugeskr Laeger. 1997 Apr;159(15):2207–2214 [In Danish]. [155] Sørensen LP. Chiropractor patients in Denmark - a patient profile. Hartvigsen J, editor. Nordic Institute of Chiropractic and Clinical Biomechanics; 2002 [In Danish]. [156] Gaist D, Sørensen HT, Hallas J. The Danish prescription registries. Dan Med Bull. 1997 Sep;44(4):445–448. [157] Yelland MJ, Schluter PJ. Defining Worthwhile and Desired Responses to Treatment of Chronic Low Back Pain. Pain Medicine. 2006;7(1):38–45. [158] Angst F, Aeschlimann A, Michel BA, Stucki G. 
Minimal clinically important rehabilitation effects in patients with osteoarthritis of the lower extremities. J Rheumatol. 2002 Jan;29(1):131–138. [159] Riddle DL, Stratford PW, Binkley JM. Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 2. Phys Ther. 1998 Nov;78(11):1197–1207.

10. APPENDICES

The appendices are divided into two sections:

“Responsiveness in Subgroups Study”
10.1 Detailed description of the instructions to the interviewer
10.2 Baseline and eight weeks follow-up questionnaire booklet
10.3 Test-retest booklet
10.4 Transition questions 1 and 2 (including cover letter)

“Prospective Acceptable Outcome Study”
10.5 The semi-structured interview form
10.6 Baseline questionnaire booklet
10.7 One week follow-up booklet
10.8 Eight weeks follow-up booklet
10.9 Nine weeks follow-up interview form
10.10 The Danish version of the Oswestry Disability Index

10.1. APPENDIX I

Detailed description of the instructions to the interviewer used in the “Responsiveness in Subgroups Study”.

Appendix 1

Instructions to the professional interviewer who obtained the patient’s global retrospective assessment of treatment effect and the global rating of importance.

The interviewer first introduced the baseline NRSpain score by reading the statement: “The first time you filled out a questionnaire booklet, you estimated your overall pain from the back and/or leg to be: ? (score read) on a 0 to 10 scale with 0 being “no pain” and 10 being “the worst imaginable pain”.” This was repeated if the patient did not understand it the first time. From this point the protocol divided into two separate protocols, one for TQ1 and one for TQ2.

TQ1: The interviewer read the sentence: “How would you describe your general low back and/or leg problems now, compared to how you were when the treatment started?” Following this, all the response options were read twice to ensure clarity and understanding.
The interviewer was strictly not allowed to help the patient with the answer. However, if a patient could not choose a specific category, the interviewer decided from the patient’s response whether the patient was “better”, “about the same” or “worse”. If the interviewer decided that the patient was better, the categories for improvement were read again (“much better”, “better”, “a little better”), and similarly for patients classified as worse. If the patient still could not choose, the response was recorded as missing.

TQ2: The interviewer read the sentence: “If you think about the first time you filled out the questionnaire booklet, how would you describe your average low back/leg problems?” Three options were read to the patient: 1) better, 2) about the same, or 3) worse. After the patient had chosen one, the interviewer read the question: “How much better/worse is your low back or leg problems compared to how you were when the treatment started?” Following this, the 7 response options were read twice to ensure clarity and understanding. The interviewer was strictly not allowed to help the patient with the answer. However, if a patient could not choose a specific category, the interviewer read the response options again. If the patient still could not choose, the response was recorded as missing.

10.2. APPENDIX II

Baseline and eight weeks follow-up questionnaire booklet used in the “Responsiveness in Subgroups Study”. The eight weeks follow-up booklet omitted page ii.

Lidt om dig selv

Køn? Mand / Kvinde
Hvad er din alder? ___________ år.
Hvor længe har dine nuværende smerter stået på? ca. ___________ dage
Hvor mange dage om ugen har du smerter (gennemsnitlig)? ___________ dage
Har du haft det nuværende problem før? Ja / Nej
Hvis Ja, ca. hvor mange gange? 0‐2 gange / 2‐10 gange / mere end 10 gange
Hvor har du ondt? (sæt ét kryds)
I lænderyggen (med ”lænderyggen” menes det skraverede område)
I det ene eller begge ben
Begge steder
Ingen af stederne.
Beskriv: ___________
Med benene menes det skraverede område. Smerter i benene som ikke stammer fra ryggen – fx slidgigt i knæet – medregnes ikke her.
Hvor ofte har du taget smertestillende medicin for dine ryg‐/ben‐smerter indenfor den sidste uge? Aldrig / Et par gange / Mere end et par gange, men ikke dagligt / Dagligt
Har du tidligere søgt behandling for samme problem? Ja / Nej
Hvis ja, hos hvem? Læge / Kiropraktor / Fysioterapeut / Andet
Kører der, på grund af ryggen, en erstatningssag (fx arbejdsskadesag, patientforsikringssag eller klagesag): Ja / Nej

Roland Morris‐spørgeskemaet

Når du har ondt i ryggen eller benene, er nogle af de ting, du plejer at gøre, måske blevet mere vanskelige. Dette skema indeholder nogle sætninger, som folk med rygsmerter eller bensmerter (iskias) har brugt til at beskrive sig selv med. Nogle af sætningerne skiller sig måske ud, fordi de netop beskriver dig, som du har det i dag. Efterhånden som du læser listen, skal du tænke på dig selv i dag. Når du læser en sætning, der beskriver, hvordan du har det i dag, skal du sætte kryds ved Ja. Hvis den pågældende sætning ikke beskriver din tilstand i dag, sætter du et kryds ved Nej.

Ja / Nej
1. Jeg bliver hjemme det meste af tiden på grund af mit rygproblem eller bensmerter (iskias)
2. Jeg skifter ofte stilling i et forsøg på at gøre det behageligt for ryg og ben
3. Jeg går langsommere end sædvanligt på grund af mit rygproblem eller bensmerter (iskias)
4. På grund af mit rygproblem eller bensmerter (iskias) foretager jeg mig ikke nogle af de ting, som jeg sædvanligvis gør i og omkring huset
5. På grund af mit rygproblem eller bensmerter (iskias) bruger jeg gelænderet, når jeg skal op ad trappen
6. På grund af mit rygproblem eller bensmerter (iskias) er jeg nødt til at holde ved noget, når jeg skal op fra en lænestol
7. Jeg kommer langsommere i tøjet end sædvanligt på grund af mit rygproblem eller bensmerter (iskias)
8. Jeg står kun op i kort tid på grund af mit rygproblem eller bensmerter (iskias)
9. På grund af mit rygproblem eller bensmerter (iskias) prøver jeg at undgå at bukke mig eller at gå ned i knæ
10. Jeg synes det er vanskeligt for mig at komme op fra en lænestol på grund af mit rygproblem eller bensmerter (iskias)
11. Jeg har næsten hele tiden ondt i min ryg eller ben
12. Jeg synes det er svært at vende mig i sengen på grund af mit rygproblem eller bensmerter (iskias)
13. Jeg har vanskeligt ved at tage mine sokker eller strømper på, på grund af smerterne i ryg eller ben
14. Jeg spadserer kun korte afstande på grund af mit rygproblem eller bensmerter (iskias)
15. Jeg sover mindre godt på grund af mit rygproblem eller bensmerter (iskias)
16. Jeg undgår tungt arbejde i og omkring huset på grund af mit rygproblem eller bensmerter (iskias)
17. På grund af mit rygproblem eller bensmerter (iskias) er jeg mere irritabel og i dårligt humør overfor folk end ellers
18. På grund af mit rygproblem eller bensmerter (iskias) går jeg langsommere op ad trapper end ellers
19. Jeg bliver i sengen det meste af tiden på grund af mine ryg‐ eller bensmerter (iskias)
20. På grund af mit rygproblem eller bensmerter (iskias) er min seksuelle aktivitet faldet
21. Jeg bliver ved med at gnide på eller holde på de steder på min krop, hvor det gør ondt eller er ubehageligt
22. På grund af mit rygproblem eller bensmerter (iskias) laver jeg mindre af det daglige arbejde i og omkring huset end, hvad jeg ellers ville gøre
23. Jeg giver overfor andre folk ofte udtryk for bekymring over, hvad der måske er ved at ske med mit helbred

SF‐36

Vejledning: Dette spørgeskema handler om din opfattelse af din fysiske funktion og din smerte. Besvar hvert spørgsmål ved at sætte ring om det svar, der passer bedst på dig. Hvis du er i tvivl om, hvordan du skal svare, svar da venligst så godt du kan.

1. De følgende spørgsmål handler om aktiviteter i dagligdagen. Er du på grund af dit helbred begrænset i disse aktiviteter? I så fald, hvor meget? (Sæt ring om ét tal for hver linie)
Ja, meget begrænset: 1 / Ja, lidt begrænset: 2 / Nej, slet ikke begrænset: 3
a. Krævende aktiviteter, som fx at løbe, løfte tunge ting, deltage i anstrengende sport 1 2 3
b. Lettere aktiviteter, såsom at flytte et bord, støvsuge eller cykle 1 2 3
c. At løfte eller bære dagligvarer 1 2 3
d. At gå flere etager op ad trapper 1 2 3
e. At gå én etage op ad trapper 1 2 3
f. At bøje sig ned eller gå ned i knæ 1 2 3
g. Gå mere end én kilometer 1 2 3
h. Gå nogle hundrede meter 1 2 3
i. Gå 100 meter 1 2 3
j. Gå i bad eller tage tøj på 1 2 3

2. Hvor stærke fysiske smerter har du haft i den sidste uge? (Sæt kun én ring)
Ingen smerter: 1
Meget lette smerter: 2
Lette smerter: 3
Middelstærke smerter: 4
Stærke smerter: 5
Meget stærke smerter: 6

3. Indenfor den sidste uge hvor meget har fysisk smerte vanskeliggjort dit daglige arbejde (både arbejde uden for hjemmet og husarbejde)? (Sæt kun én ring)
Slet ikke: 1
Lidt: 2
Noget: 3
En hel del: 4
Virkelig meget: 5
A PPENDICES Oswestry‐spørgeskema Dette spørgeskema er lavet for at give os viden om, hvordan dine ryg‐ eller bensmerter påvirker din evne til at klare dig i hverdagen. Sæt kun ét kryds i hvert afsnit. Vælg det udsagn, der passer bedst på dig i dag. Vi er klar over, at du måske mener, at to eller flere udsagn i samme afsnit passer på dig i dag, men af hensyn til undersøgelsens klarhed, beder vi dig om kun at markere det udsagn, som bedst beskriver dit problem. Afsnit 1: Smerter Jeg har ingen smerter for øjeblikket Smerterne er meget svage for øjeblikket Smerterne er moderate for øjeblikket Smerterne er forholdsvis kraftige for øjeblikket Smerterne er meget kraftige for øjeblikket Smerterne er de værst tænkelige for øjeblikket Afsnit 2: Personlig pleje (f.eks. vaske sig, klæde sig på) Jeg kan klare mig selv som normalt, uden at det giver flere smerter Jeg kan klare mig selv som normalt, men det giver smerter Det er smertefuldt at klare mig selv, og jeg er langsom og forsigtig Jeg har brug for nogen hjælp, men kan klare det meste af min personlige pleje selv Jeg skal have hjælp hver dag til det meste af min personlige pleje Jeg tager ikke tøj på, kan kun vanskeligt vaske mig og bliver i sengen Afsnit 3: Løfte Jeg kan løfte noget tungt uden at få flere smerter Jeg kan løfte noget tungt, men det giver mig flere smerter Smerterne hindrer mig i at løfte noget tungt fra gulvet, men jeg kan klare det, hvis det er anbragt bekvemt, f.eks. 
on a table
Pain prevents me from lifting heavy objects, but I can manage light to medium weights if they are conveniently placed
I can only lift very light objects
I cannot lift or carry anything at all

Section 4: Walking
I can walk as far as I like even though I have pain
Pain prevents me from walking more than 2 kilometres
Pain prevents me from walking more than 1 kilometre
Pain prevents me from walking more than 500 metres
I can only walk using a stick or crutches
I am in bed most of the time and have to crawl to the toilet

Section 5: Sitting
I can sit in any chair for as long as I like
I can only sit in my favourite chair for as long as I like
Pain prevents me from sitting for more than 1 hour
Pain prevents me from sitting for more than half an hour
Pain prevents me from sitting for more than 10 minutes
I cannot sit at all because of the pain

Section 6: Standing
I can stand for as long as I want without extra pain
I can stand for as long as I want, but it gives me extra pain
Pain prevents me from standing for more than 1 hour
Pain prevents me from standing for more than half an hour
Pain prevents me from standing for more than 10 minutes
I cannot stand at all because of the pain

Section 7: Sleeping
My sleep is never disturbed by pain
My sleep is occasionally disturbed by pain
Because of pain I get less than 6 hours of sleep
Because of pain I get less than 4 hours of sleep
Because of pain I get less than 2 hours of sleep
I cannot sleep at all because of the pain

Section 8: Sex life (if applicable)
My sex life is normal and causes no extra pain
My sex life is normal, but causes extra pain
My sex life is nearly normal, but causes a lot of pain
My sex life is severely restricted by pain
My sex life has nearly ceased because of pain
Pain prevents any sex life at all

Section 9: My social life
My social life is normal and does not give me
extra pain
My social life is normal, but increases my pain
Pain does not restrict my social life significantly, apart from the more physical activities such as sport etc.
Pain has restricted my social life, and I do not go out as often
Pain has restricted my social life to my home
I have no social life because of the pain

Section 10: Travelling
I can travel anywhere without pain
I can travel anywhere, but it gives me extra pain
The pain is bad, but I can manage journeys of over 2 hours
Pain restricts my journeys to less than 1 hour
Pain restricts my journeys to short, necessary journeys of under 30 minutes
Pain prevents me from travelling except to receive treatment

Thank you very much for your help.

Low back pain rating scale

Tick only one box on each line, where 0 corresponds to no pain at all and 10 corresponds to the worst possible pain. “10” corresponds to the worst possible pain you can imagine, and thus not (necessarily) to the most severe back pain you have experienced.

BACK PAIN
Your back pain RIGHT NOW:
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst possible pain
The WORST back pain you have had within the last 14 days:
0 1 2 3 4 5 6 7 8 9 10
The AVERAGE back pain during the last 14 days:
0 1 2 3 4 5 6 7 8 9 10

LEG PAIN
Your leg pain RIGHT NOW:
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst possible pain
The WORST leg pain you have had within the last 14 days:
0 1 2 3 4 5 6 7 8 9 10
The AVERAGE leg pain during the last 14 days:
0 1 2 3 4 5 6 7 8 9 10

YOUR ASSESSMENT OF YOUR PHYSICAL/MENTAL ABILITY IN EVERYDAY LIFE DURING THE LAST 14 DAYS.
Put one tick on each line. Response options: Yes; May cause problems; No; Don't know

1. Do you wake at night because of back/leg pain?
2. Can you manage daily tasks without your back reducing your activity?
3. Can you manage light tasks at home, such as watering flowers or carrying plates from the table?
4. Can you put on shoes and socks yourself?
5.
Can you carry two full shopping bags (10 kg in total)?
6. Can you get up from a low armchair without difficulty?
7. Can you lean forward over the washbasin to brush your teeth?
8. Can you climb the stairs from one floor to the next without resting because of back/leg pain?
9. Can you walk 400 m without resting because of back/leg pain?
10. Can you run 100 m without resting because of back/leg pain?
11. Can you cycle or travel by car/bus without back/leg pain?
12. Do you feel that the back/leg pain affects your emotional relationships with your closest family?
13. Does the back/leg pain hamper your sex life?
14. Do you think there is any work that your back cannot cope with?
15. Do you think the back disorder will affect your future?

General back/leg pain
If you were to describe overall how your general back/leg pain has been today, how have you been? Tick only one box.
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

(Baseline)

10.3. APPENDIX III
Test-retest booklet used in the “Responsiveness in Subgroups Study”.

Oswestry questionnaire

This questionnaire is designed to give us information about how your back or leg pain affects your ability to manage in everyday life. Tick only one box in each section. Choose the statement that best applies to you today. We realise that you may feel that two or more statements in the same section apply to you today, but for the sake of the clarity of the study we ask you to mark only the statement that best describes your problem.

Section 1: Personal care (e.g.
washing, dressing)
I can look after myself normally without it causing extra pain
I can look after myself normally, but it causes pain
It is painful to look after myself, and I am slow and careful
I need some help, but can manage most of my personal care myself
I need help every day with most of my personal care
I do not get dressed, can wash only with difficulty and stay in bed

Section 2: Pain
I have no pain at the moment
The pain is very mild at the moment
The pain is moderate at the moment
The pain is fairly severe at the moment
The pain is very severe at the moment
The pain is the worst imaginable at the moment

Section 3: Sitting
I can sit in any chair for as long as I like
I can only sit in my favourite chair for as long as I like
Pain prevents me from sitting for more than 1 hour
Pain prevents me from sitting for more than half an hour
Pain prevents me from sitting for more than 10 minutes
I cannot sit at all because of the pain

Section 4: Lifting
I can lift something heavy without extra pain
I can lift something heavy, but it gives me extra pain
Pain prevents me from lifting something heavy off the floor, but I can manage if it is conveniently placed, e.g. on a table
Pain prevents me from lifting heavy objects, but I can manage light to medium weights if they are conveniently placed
I can only lift very light objects
I cannot lift or carry anything at all

Section 5: Standing
I can stand for as long as I want without extra pain
I can stand for as long as I want, but it gives me extra pain
Pain prevents me from standing for more than 1 hour
Pain prevents me from standing for more than half an hour
Pain prevents me from standing for more than 10 minutes
I cannot stand at all because of the pain
Section 6: Walking
I can walk as far as I like even though I have pain
Pain prevents me from walking more than 2 kilometres
Pain prevents me from walking more than 1 kilometre
Pain prevents me from walking more than 500 metres
I can only walk using a stick or crutches
I am in bed most of the time and have to crawl to the toilet

Section 7: Travelling
I can travel anywhere without pain
I can travel anywhere, but it gives me extra pain
The pain is bad, but I can manage journeys of over 2 hours
Pain restricts my journeys to less than 1 hour
Pain restricts my journeys to short, necessary journeys of under 30 minutes
Pain prevents me from travelling except to receive treatment

Section 8: Sleeping
My sleep is never disturbed by pain
My sleep is occasionally disturbed by pain
Because of pain I get less than 6 hours of sleep
Because of pain I get less than 4 hours of sleep
Because of pain I get less than 2 hours of sleep
I cannot sleep at all because of the pain

Section 9: My social life
My social life is normal and does not give me extra pain
My social life is normal, but increases my pain
Pain does not restrict my social life significantly, apart from the more physical activities such as sport etc.
Pain has restricted my social life, and I do not go out as often
Pain has restricted my social life to my home
I have no social life because of the pain

Section 10: Sex life (if applicable)
My sex life is normal and causes no extra pain
My sex life is normal, but causes extra pain
My sex life is nearly normal, but causes a lot of pain
My sex life is severely restricted by pain
My sex life has nearly ceased because of pain
Pain prevents any sex life at all

Thank you very much for your help.

General back/leg pain
How has your back/leg pain been in general since yesterday? Tick only one box.
Better
Unchanged
Worse
Not sure/don't know

(primary, 1d)

10.4. APPENDIX IV
Transition questions 1 and 2 (including cover letter) used in the “Responsiveness in Subgroups Study”.

Group A, 8-week follow-up
Ref.: _________

Change in back/leg pain
How would you describe the general condition of your back and legs now, compared with how you were when you started treatment? Tick only one box.
Much better
Better
A little better
About the same ...... You are finished.
A little worse
Worse
Much worse

How important to you is the change you have experienced in your back and leg pain since treatment started? Tick only one box.
Not important  0 1 2 3 4 5 6 7 8 9 10  Very important

Group B, 8-week follow-up
Ref.: _________

Change in back/leg pain
If you think about how you were the first time you filled in the questionnaires, how has your back/leg pain been on average since then? Tick only one box.
Better ...... Go to question 1.
Unchanged ...... You are finished.
Worse ...... Go to question 2.

Question 1: How much better has the pain in your back and legs become since you started treatment?
Almost the same, hardly any improvement
A little better
Somewhat better
Quite a bit better
Much better
Very much better
Completely recovered

Question 2: How much worse has the pain in your back and legs become since you started treatment?
Almost the same, hardly any worsening
A little worse
Somewhat worse
Quite a bit worse
Much worse
Very much worse
Worst imaginable

How important to you is the change you have experienced in your back/leg pain since treatment started? Tick only one box.
Not important/indifferent  0 1 2 3 4 5 6 7 8 9 10  Very important

Dear XXXX
Re: usXXXX
The following patient has been included in our study of the effect of back treatment. As agreed, I am sending you the details so that you can conduct a telephone interview.
Remember to remind the patient how he/she was the first time before you read out the enclosed questionnaire. Please write the reference number on the interview form and use it when entering the results. Thank you for your help.

Henrik H. Lauridsen
Syddansk Universitet
Institut for Idræt og Biomekanik
Campusvej 55
5230 Odense M
Tel: 65503487

Reference number: ____________________
Patient's name: ____________________
Telephone number: ____________________

Information for the respondent, to be read out first:
The first time you filled in the questionnaires, you rated your general pain from the back and legs as follows:
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

10.5. APPENDIX V
The semi-structured interview form used in the “Prospective Acceptable Outcome Study”.

Respondent no. _____
Questions for the pilot study: baseline questionnaires
If there is no time now, may I call? □ Yes □ No  Telephone number: _________________

Difficulty
1. Overall, were the questions difficult to answer? □ Yes □ No
If yes, please describe what was difficult:
________________
2. In question 8 on page 2 (back and/or leg pain), was it difficult to choose a response category for what was acceptable after treatment? □ Yes □ No
If yes, please describe what was difficult:
________________

Comprehension
3.
Please describe how you understand the phrase: “which result you will accept after treatment?” (may be tape-recorded)
________________
4. Do you think there is a difference between “what one expects/hopes for from treatment” and “what one will accept from treatment”? □ Yes □ No □ Don't know
If yes, can you describe what the difference is? (may be tape-recorded)
________________
If no, can you describe why not? (may be tape-recorded)
________________

Importance of a change
If you were to judge whether a given improvement/worsening is either important or not important, where would you split the scale shown below? (draw a vertical line between the two numbers where it splits)
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

Unanswered questions
5. Why have you not answered question _________________?
Answer: ________________
6. Why have you not answered question _________________?
Answer: ________________

Comments written on the questionnaires
7. Comment written at ________________
Why did you write a comment instead of ticking the questionnaire?
________________
8. Comment written at question(s) ________________
Why did you write a comment instead of ticking the questionnaire?
________________

Other
9. Other comments/objections?
________________

10.6. APPENDIX VI
Baseline questionnaire booklet used in the “Prospective Acceptable Outcome Study”.
A little about yourself

Date _______________
Name ____________________
Address ____________________
Postcode ____________________
Town ____________________
Tel. no.  Home ________________  Work __________________  Mobile ____________________

1. Have you had the current problem before?
Yes
No

2. If yes, approximately how many times?
0-2 times
3-10 times
More than 10 times

3. Where does it hurt? (tick one)
In the lower back (“lower back” means the shaded area)
In one or both legs (the legs means the shaded area; leg pain that does not originate from the back, e.g. osteoarthritis of the knee, is not counted here)
Both places
Neither place. Describe:

4. How often have you taken pain-relieving medication for your back/leg pain within the last week?
Never
A couple of times
More than a couple of times, but not daily
Daily

5. Have you previously sought treatment for the same problem?
Yes
No

General back and/or leg pain
These two questions concern your overall back and/or leg pain.

6. If you were to rate your back and/or leg pain overall during the last 2 weeks, how has it been? Tick only one box on the scale from “0” to “10”.
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

7. If you were to describe your back and/or leg pain overall during the last 2 weeks, how has it been? Tick only one box.
No pain
Mild pain
Moderate pain
Severe pain

The following questions concern your back and/or leg pain. Tick only one box on each scale from “0” to “10”. Please read each question carefully before answering. Each section consists of two questions: The first question is about how you have been during the last few days.
The second question is about which result you will accept after treatment, if, for example, you had to accept some pain. It is thus not what you expect/hope for from treatment, but what you will accept.

Here is an example of how to do it:

8. How would you describe your back and/or leg pain during the last few days?
No pain  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain
(Here you show how you have been during the last few days.)

9. What would be acceptable pain after treatment, if you have to accept some pain?
No pain  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain
(Here you show what you think is acceptable after treatment, if you had to accept some pain.)

8. How would you describe your back and/or leg pain during the last few days?
No pain  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

9. What would be acceptable pain after treatment, if you have to accept some pain?
No pain  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

10. How has your back and/or leg pain affected your daily activities (housework, washing, dressing, lifting, walking, driving, climbing stairs, sitting down in/getting up from a chair, getting into/out of bed, sleeping) during the last few days?
No effect  0 1 2 3 4 5 6 7 8 9 10  Unable to perform activities

11. What would be an acceptable effect on your daily activities after treatment, if you have to accept some effect?
No effect  0 1 2 3 4 5 6 7 8 9 10  Unable to perform activities

12. How much has your back and/or leg pain affected your social, family and leisure activities during the last few days?
No effect  0 1 2 3 4 5 6 7 8 9 10  Unable to perform activities

13. What would be an acceptable effect on your social activities, if you have to accept some effect?
No effect  0 1 2 3 4 5 6 7 8 9 10  Unable to perform activities

14. How tense (nervous, irritable, difficulty relaxing/concentrating) have you been during the last few days?
Not tense at all  0 1 2 3 4 5 6 7 8 9 10  Extremely tense

15. What would be acceptable tension after treatment, if you have to accept some tension?
Not tense at all  0 1 2 3 4 5 6 7 8 9 10  Extremely tense

16. How depressed (down, sad, in low spirits, pessimistic, lethargic) have you been lately?
Not depressed at all  0 1 2 3 4 5 6 7 8 9 10  Deeply depressed

17. How depressed can you accept being after treatment, if you have to accept some degree of depression?
Not depressed at all  0 1 2 3 4 5 6 7 8 9 10  Deeply depressed

18. How do you think your work (both inside and outside the home) has affected your back and/or leg pain during the last few days?
Has not aggravated the pain  0 1 2 3 4 5 6 7 8 9 10  Has aggravated the pain a lot

19. How much will you accept that your work (both inside and outside the home) affects your back and/or leg pain after treatment, if you have to accept some effect?
Does not aggravate the pain  0 1 2 3 4 5 6 7 8 9 10  Aggravates the pain a lot

20. How much have you yourself been able to control (relieve/reduce) and cope with your back and/or leg pain during the last few days?
Complete control  0 1 2 3 4 5 6 7 8 9 10  No control at all

21. What would be acceptable self-control of your back and/or leg pain, if you have to accept that you will not gain full control over the pain?
Complete control  0 1 2 3 4 5 6 7 8 9 10  No control at all

The next questionnaire is also designed to give us information about how your back and/or leg pain affects your ability to manage in everyday life. Tick only one box in each section.
We realise that you may feel that two or more statements in the same section apply to you today, but for the sake of the clarity of the study we ask you to mark only the statement that best describes your problem. Please read each question carefully before answering. Each section consists of two questions: The first question is about how you are today. The second question is about which result you will accept after treatment, if, for example, you have to accept some pain. It is thus not what you expect/hope for from treatment, but what you will accept.

Here is an example of how to do it:

Section 22a (today): Pain
1. I have no pain
2. The pain is very mild
3. The pain is moderate
4. The pain is fairly severe
5. The pain is very severe
6. The pain is the worst imaginable
(Here you show how you are today.)

Section 22b (after treatment): What will you accept with regard to your pain after treatment, if you have to accept some pain?
1. I have no pain
2. The pain is very mild
3. The pain is moderate
4. The pain is fairly severe
5. The pain is very severe
6. The pain is the worst imaginable
(Here you show what you think is acceptable after treatment, if you have to accept some pain.)

Section 22a (today): Pain
1. I have no pain
2. The pain is very mild
3. The pain is moderate
4. The pain is fairly severe
5. The pain is very severe
6. The pain is the worst imaginable

Section 22b (after treatment): What will you accept with regard to your pain after treatment, if you have to accept some pain?
1. I have no pain
2. The pain is very mild
3. The pain is moderate
4. The pain is fairly severe
5. The pain is very severe
6. The pain is the worst imaginable

Section 23a (today): Personal care (e.g. washing, dressing)
1.
I can look after myself normally without it causing extra pain
2. I can look after myself normally, but it causes pain
3. It is very painful to look after myself, and I am slow and careful
4. I need some help, but can manage most of my personal care myself
5. I need help every day with most of my personal care
6. I do not get dressed, can wash only with difficulty and stay in bed

Section 23b (after treatment): What will you accept with regard to your personal care after treatment, if you have to accept some limitation?
1. I can look after myself normally without it causing extra pain
2. I can look after myself normally, but it causes pain
3. It is very painful to look after myself, and I am slow and careful
4. I need some help, but can manage most of my personal care myself
5. I need help every day with most of my personal care
6. I do not get dressed, can wash only with difficulty and stay in bed

Section 24a (today): Lifting
1. I can lift something heavy without extra pain
2. I can lift something heavy, but it gives me extra pain
3. Pain prevents me from lifting something heavy off the floor, but I can manage if it is conveniently placed, e.g. on a table
4. Pain prevents me from lifting heavy objects, but I can manage light to medium weights if they are conveniently placed
5. I can only lift very light objects
6. I cannot lift or carry anything at all

Section 24b (after treatment): What will you accept with regard to being able to lift after treatment, if you have to accept some limitation?
1. I can lift something heavy without extra pain
2. I can lift something heavy, but it gives me extra pain
3. Pain prevents me from lifting something heavy off the floor, but I can manage if it is conveniently placed, e.g. on a table
4.
Pain prevents me from lifting heavy objects, but I can manage light to medium weights if they are conveniently placed
5. I can only lift very light objects
6. I cannot lift or carry anything at all

Section 25a (today): Walking
1. I can walk as far as I like, even though I have pain
2. Pain prevents me from walking more than 2 kilometres
3. Pain prevents me from walking more than 1 kilometre
4. Pain prevents me from walking more than 500 metres
5. I can only walk using a stick or crutches
6. I am in bed most of the time and have to crawl to the toilet

Section 25b (after treatment): What will you accept with regard to being able to walk after treatment, if you have to accept some limitation?
1. I can walk as far as I like, even though I have pain
2. Pain prevents me from walking more than 2 kilometres
3. Pain prevents me from walking more than 1 kilometre
4. Pain prevents me from walking more than 500 metres
5. I can only walk using a stick or crutches
6. I am in bed most of the time and have to crawl to the toilet

Section 26a (today): Sitting
1. I can sit in any chair for as long as I like
2. I can only sit in my favourite chair for as long as I like
3. Pain prevents me from sitting for more than 1 hour
4. Pain prevents me from sitting for more than half an hour
5. Pain prevents me from sitting for more than 10 minutes
6. I cannot sit at all because of the pain

Section 26b (after treatment): What will you accept with regard to being able to sit after treatment, if you have to accept some limitation?
1. I can sit in any chair for as long as I like
2. I can only sit in my favourite chair for as long as I like
3. Pain prevents me from sitting for more than 1 hour
4. Pain prevents me from sitting for more than half an hour
5. Pain prevents me from sitting for more than 10 minutes
6. I cannot sit at all because of the pain

Section 27a (today): Standing
1.
I can stand for as long as I want without extra pain
2. I can stand for as long as I want, but it gives me extra pain
3. Pain prevents me from standing for more than 1 hour
4. Pain prevents me from standing for more than half an hour
5. Pain prevents me from standing for more than 10 minutes
6. I cannot stand at all because of the pain

Section 27b (after treatment): What will you accept with regard to being able to stand after treatment, if you have to accept some limitation?
1. I can stand for as long as I want without extra pain
2. I can stand for as long as I want, but it gives me extra pain
3. Pain prevents me from standing for more than 1 hour
4. Pain prevents me from standing for more than half an hour
5. Pain prevents me from standing for more than 10 minutes
6. I cannot stand at all because of the pain

Section 28a (today): Sleeping
1. My sleep is never disturbed by pain
2. My sleep is occasionally disturbed by pain
3. Because of pain I get less than 6 hours of sleep
4. Because of pain I get less than 4 hours of sleep
5. Because of pain I get less than 2 hours of sleep
6. I cannot sleep at all because of the pain

Section 28b (after treatment): What will you accept with regard to being able to sleep after treatment, if you have to accept some disturbance?
1. My sleep is never disturbed by pain
2. My sleep is occasionally disturbed by pain
3. Because of pain I get less than 6 hours of sleep
4. Because of pain I get less than 4 hours of sleep
5. Because of pain I get less than 2 hours of sleep
6. I cannot sleep at all because of the pain

Section 29a (today): Sex life (if applicable)
1. My sex life is normal and causes no extra pain
2. My sex life is normal, but causes extra pain
3. My sex life is nearly normal, but causes a lot of pain
4. My sex life is severely restricted by pain
5.
Mit sexliv er næsten ophørt på grund af smerterne 6. Smerterne hindrer sexliv overhovedet Afsnit 29b: Hvad vil du acceptere med hensyn til dit sexliv (hvis relevant) efter behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning? 1. Mit sexliv er som normalt og giver ikke flere smerter 2. Mit sexliv er som normalt, men giver flere smerter 3. Mit sexliv er næsten som normalt, men giver mange smerter 4. Mit sexliv er alvorligt hæmmet af smerterne 5. Mit sexliv er næsten ophørt på grund af smerterne 6. Smerterne hindrer sexliv overhovedet I dag Efter behandlingen I dag Efter behandlingen Afsnit 30a: Mit sociale liv 1. Mit sociale liv er som normalt og giver mig ikke ekstra smerter 2. Mit sociale liv er som normalt, men øger mine smerter 3. Smerterne begrænser ikke mit sociale liv væsentligt, bortset fra de mere fysiske aktiviteter som f.eks. sport osv. 4. Smerterne har begrænset mit sociale liv, og jeg går ikke ud så ofte 5. Smerterne har begrænset mit sociale liv til mit hjem 6. Jeg har ikke noget socialt liv på grund af smerterne I dag Afsnit 30b: Hvad vil du acceptere med hensyn til dit sociale liv efter behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning? 1. Mit sociale liv er som normalt og giver mig ikke ekstra smerter 2. Mit sociale liv er som normalt, men øger mine smerter 3. Smerterne begrænser ikke mit sociale liv væsentligt, bortset fra de mere fysiske aktiviteter som f.eks. sport osv. 4. Smerterne har begrænset mit sociale liv, og jeg går ikke ud så ofte 5. Smerterne har begrænset mit sociale liv til mit hjem 6. Jeg har ikke noget socialt liv på grund af smerterne Efter behandlingen 10 109 10.6. A PPENDIX VI Afsnit 31a: Rejse 1. Jeg kan rejse hvorhen jeg vil uden smerter 2. Jeg kan rejse hvorhen jeg vil, men det giver mig flere smerter 3. Smerterne er slemme, men jeg kan godt klare over 2 timers rejse 4. Smerterne begrænser mine rejser til mindre end 1 time 5. 
Smerterne begrænser mine rejser til korte, nødvendige rejser under 30 minutter 6. Smerterne hindrer mig i at rejse, undtagen for at få behandling I dag Afsnit 31b: Hvad vil du acceptere med hensyn til at kunne rejse efter behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning? 1. Jeg kan rejse hvorhen jeg vil uden smerter 2. Jeg kan rejse hvorhen jeg vil, men det giver mig flere smerter 3. Smerterne er slemme, men jeg kan godt klare over 2 timers rejse 4. Smerterne begrænser mine rejser til mindre end 1 time 5. Smerterne begrænser mine rejser til korte, nødvendige rejser under 30 minutter 6. Smerterne hindrer mig i at rejse, undtagen for at få behandling Efter behandlingen 11 110 10. A PPENDICES 10.7. A PPENDIX VII The one week follow-up booklet used in the “Prospective Acceptable Outcome Study” was the same as appendix VI except for: 1) page 1 and 2 was omitted, and 2) an additional page was added at the end (see below). Generelle ryg- og/eller bensmerter Disse to spørgsmål drejer sig om dine samlede ryg- og/eller bensmerter. 35. Hvis du samlet skal beskrive dine ryg- og/eller bensmerter i den sidste uge, hvordan har de da været? Sæt kun ét kryds i én boks på skalaen fra ”0” til ”10”. Slet ingen smerter 0 1 2 3 4 5 6 7 8 Værst tænkelige smerter 9 10 36. Hvordan har dine ryg- og/eller bensmerter generelt været i den sidste uge? Afkryds kun ét felt. Bedre Uændret Værre Ikke sikker/ved ikke Generel holdning til resultatet 37. Siden du sidst udfyldte skemaerne - har du da ændret mening/holdning til, hvad der er et acceptabelt resultat af behandlingen? Afkryds kun ét felt. Jeg har ikke ændret mening/holdning Jeg har ændret mening/holdning 10 111 10.8. A PPENDIX VIII 10.8. A PPENDIX VIII Eight weeks follow-up booklet used in the “Prospective Acceptable Outcome Study”. Skriv venligst dato for udfyldelse: ______________ Generelle ryg- og/eller bensmerter Disse to spørgsmål drejer sig om dine samlede ryg- og/eller bensmerter. 1. 
If you were to rate your back and/or leg pain overall during the last 2 weeks, how has it been? Mark only one box on the scale from “0” to “10”.
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

2. If you were to describe your back and/or leg pain overall during the last 2 weeks, how has it been? Tick only one box.
No pain
Mild pain
Moderate pain
Severe pain

This questionnaire concerns your back and leg pain. The information will be treated confidentially. Mark only one box on each scale from “0” to “10”. Your answers should be an average of how your condition has been over the last few days. Please read each question carefully before answering.

3. How would you describe your back and leg pain?
No pain  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

4. How has your back and/or leg pain affected your daily activities (housework, washing, dressing, lifting, walking, driving, climbing stairs, getting into/out of a chair, getting into/out of bed, sleeping)?
No effect  0 1 2 3 4 5 6 7 8 9 10  Unable to carry out activities

5. How much has your back and/or leg pain affected your social, family and leisure activities?
No effect  0 1 2 3 4 5 6 7 8 9 10  Unable to carry out activities

6. How tense (nervous, irritable, difficulty relaxing/concentrating) have you been?
Not tense at all  0 1 2 3 4 5 6 7 8 9 10  Extremely tense

7. How depressed (down, sad, in low spirits, pessimistic, lethargic) have you been?
Not depressed at all  0 1 2 3 4 5 6 7 8 9 10  Deeply depressed

8. How do you think your work (both inside and outside the home) has affected your back and/or leg pain?
Has not made the pain worse  0 1 2 3 4 5 6 7 8 9 10  Has made the pain much worse

9. How much have you been able to control (relieve/reduce) and cope with your back and/or leg pain yourself?
Complete control  0 1 2 3 4 5 6 7 8 9 10  No control at all

This questionnaire is designed to give us information about how your back or leg pain affects your ability to manage in everyday life. Mark only one box in each section. Choose the statement that best describes you today. We realise that you may feel that two or more statements in the same section apply to you today, but for the clarity of the study we ask you to mark only the statement that best describes your problem.

Section 1: Pain
I have no pain at the moment
The pain is very mild at the moment
The pain is moderate at the moment
The pain is fairly severe at the moment
The pain is very severe at the moment
The pain is the worst imaginable at the moment

Section 2: Personal care (e.g. washing, dressing)
I can look after myself normally without it causing extra pain
I can look after myself normally, but it causes pain
It is painful to look after myself, and I am slow and careful
I need some help but manage most of my personal care myself
I need help every day with most of my personal care
I do not get dressed, wash only with difficulty and stay in bed

Section 3: Lifting
I can lift heavy weights without extra pain
I can lift heavy weights, but it gives me extra pain
Pain prevents me from lifting heavy weights off the floor, but I can manage if they are conveniently positioned, e.g. on a table
Pain prevents me from lifting heavy weights, but I can manage light to medium weights if they are conveniently positioned
I can lift only very light weights
I cannot lift or carry anything at all

Section 4: Walking
I can walk as far as I like, even though I have pain
Pain prevents me from walking more than 2 kilometres
Pain prevents me from walking more than 1 kilometre
Pain prevents me from walking more than 500 metres
I can only walk using a stick or crutches
I am in bed most of the time and have to crawl to the toilet

Section 5: Sitting
I can sit in any chair as long as I like
I can only sit in my favourite chair as long as I like
Pain prevents me from sitting for more than 1 hour
Pain prevents me from sitting for more than ½ hour
Pain prevents me from sitting for more than 10 minutes
I cannot sit at all because of the pain

Section 6: Standing
I can stand as long as I want without extra pain
I can stand as long as I want, but it gives me extra pain
Pain prevents me from standing for more than 1 hour
Pain prevents me from standing for more than ½ hour
Pain prevents me from standing for more than 10 minutes
I cannot stand at all because of the pain

Section 7: Sleeping
My sleep is never disturbed by pain
My sleep is occasionally disturbed by pain
Because of pain I get less than 6 hours of sleep
Because of pain I get less than 4 hours of sleep
Because of pain I get less than 2 hours of sleep
I cannot sleep at all because of the pain

Section 8: Sex life (if applicable)
My sex life is normal and causes no extra pain
My sex life is normal but causes extra pain
My sex life is nearly normal but is very painful
My sex life is severely restricted by pain
My sex life is nearly absent because of pain
Pain prevents any sex life at all

Section 9: My social life
My social life is normal and gives me no extra pain
My social life is normal but increases my pain
Pain does not significantly limit my social life, apart from the more physical activities such as sport
Pain has restricted my social life, and I do not go out as often
Pain has restricted my social life to my home
I have no social life because of the pain

Section 10: Travelling
I can travel anywhere without pain
I can travel anywhere, but it gives me extra pain
The pain is bad, but I can manage journeys of over 2 hours
Pain restricts my journeys to less than 1 hour
Pain restricts my journeys to short, necessary trips under 30 minutes
Pain prevents me from travelling except to receive treatment

Thank you very much for your help.

Evaluation of your treatment
In the 3 questions below, please give your overall assessment of the treatment you have received for your back and/or leg problems. (Circle one number on each line)

                                                     Strongly          Neither agree              Strongly
                                                     agree     Agree   nor disagree    Disagree   disagree
20. The treatment met all my expectations               5        4          3             2          1
21. The result of the treatment was acceptable          5        4          3             2          1
22. The treatment was worth receiving                   5        4          3             2          1

Change in back and/or leg pain

23. The first time you completed the questionnaires, you rated your general pain from the back and/or legs as follows:
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

24. How would you describe the general condition of your back and/or legs now, compared with how you were when you started treatment? Tick only one box.
Much better
Better
A little better
About the same
A little worse
Worse
Much worse

25. Overall, how important is the change you have experienced in your back and/or leg pain since treatment began? Tick only one box.
The change I have experienced is important and relevant to me.
The change (if any) I have experienced is of no importance.

26.
If you have experienced a change, where on a scale from 0 to 10 would you place the change? Tick only one box.
Not important, of no consequence  0 1 2 3 4 5 6 7 8 9 10  Very important

10.9. APPENDIX IX

Nine-week follow-up interview form used in the “Prospective Acceptable Outcome Study”.

Change in back/leg pain
Telephone follow-up: week 9

Ref. no. _________
Name: ____________________________
Tel. no.: (H) ______________________ (A) ______________________ (M) ______________________
Date of completion: ___________________

The first time you completed the questionnaires, you rated your general pain from the back and legs as follows:
No pain at all  0 1 2 3 4 5 6 7 8 9 10  Worst imaginable pain

1. How would you describe the general condition of your back and legs now, compared with how you were when you started treatment? Tick only one box.
Much better
Better
A little better
About the same
A little worse
Worse
Much worse

2. Have there been any changes in your back/leg pain over the last week? Tick only one box.
Tendency towards worsening
Unchanged
Tendency towards improvement

10.10. APPENDIX X

The Danish version of the Oswestry Disability Index (in English translation).

Oswestry questionnaire

This questionnaire is designed to give us information about how your back or leg pain affects your ability to manage in everyday life. Mark only one box in each section. Choose the statement that best describes you today. We realise that you may feel that two or more statements in the same section apply to you today, but for the clarity of the study we ask you to mark only the statement that best describes your problem.

Section 1: Pain
I have no pain at the moment
The pain is very mild at the moment
The pain is moderate at the moment
The pain is fairly severe at the moment
The pain is very severe at the moment
The pain is the worst imaginable at the moment

Section 2: Personal care (e.g. washing, dressing)
I can look after myself normally without it causing extra pain
I can look after myself normally, but it causes pain
It is painful to look after myself, and I am slow and careful
I need some help but manage most of my personal care myself
I need help every day with most of my personal care
I do not get dressed, wash only with difficulty and stay in bed

Section 3: Lifting
I can lift heavy weights without extra pain
I can lift heavy weights, but it gives me extra pain
Pain prevents me from lifting heavy weights off the floor, but I can manage if they are conveniently positioned, e.g. on a table
Pain prevents me from lifting heavy weights, but I can manage light to medium weights if they are conveniently positioned
I can lift only very light weights
I cannot lift or carry anything at all

Section 4: Walking
I can walk as far as I like, even though I have pain
Pain prevents me from walking more than 2 kilometres
Pain prevents me from walking more than 1 kilometre
Pain prevents me from walking more than 500 metres
I can only walk using a stick or crutches
I am in bed most of the time and have to crawl to the toilet

Section 5: Sitting
I can sit in any chair as long as I like
I can only sit in my favourite chair as long as I like
Pain prevents me from sitting for more than 1 hour
Pain prevents me from sitting for more than ½ hour
Pain prevents me from sitting for more than 10 minutes
I cannot sit at all because of the pain

Section 6: Standing
I can stand as long as I want without extra pain
I can stand as long as I want, but it gives me extra pain
Pain prevents me from standing for more than 1 hour
Pain prevents me from standing for more than ½ hour
Pain prevents me from standing for more than 10 minutes
I cannot stand at all because of the pain

Section 7: Sleeping
My sleep is never disturbed by pain
My sleep is occasionally disturbed by pain
Because of pain I get less than 6 hours of sleep
Because of pain I get less than 4 hours of sleep
Because of pain I get less than 2 hours of sleep
I cannot sleep at all because of the pain

Section 8: Sex life (if applicable)
My sex life is normal and causes no extra pain
My sex life is normal but causes extra pain
My sex life is nearly normal but is very painful
My sex life is severely restricted by pain
My sex life is nearly absent because of pain
Pain prevents any sex life at all

Section 9: My social life
My social life is normal and gives me no extra pain
My social life is normal but increases my pain
Pain does not significantly limit my social life, apart from the more physical activities such as sport
Pain has restricted my social life, and I do not go out as often
Pain has restricted my social life to my home
I have no social life because of the pain

Section 10: Travelling
I can travel anywhere without pain
I can travel anywhere, but it gives me extra pain
The pain is bad, but I can manage journeys of over 2 hours
Pain restricts my journeys to less than 1 hour
Pain restricts my journeys to short, necessary trips under 30 minutes
Pain prevents me from travelling except to receive treatment

Thank you very much for your help.

11. PAPERS

Paper I-1 Danish version of the Oswestry Disability Index for patients with low back pain.
Part 1: Cross-cultural adaptation, reliability and validity in two different populations
Paper I-2 Danish version of the Oswestry Disability Index for patients with low back pain. Part 2: Sensitivity, specificity and clinically significant improvement in two low back pain populations
Paper I-3 Responsiveness and minimal clinically important difference for pain and disability instruments in low back pain patients
Paper I-4 Choice of external criteria in back pain research: does it matter? Recommendations based on analysis of responsiveness
Paper II-1 Are low back pain patients able to determine acceptable outcome of treatment before it begins?