1. preface - Danske validerede og ikke validerede spørgeskemaer

Erratum
Section 3.1.2. Internal Consistency (page 28)
In the first paragraph the word “inter-correlation” is used several times. This should read
“inter-item correlation”.
Section 3.1.2. Internal Consistency (page 29)
Interpretation of Cronbach’s Alpha in the first paragraph is unclear and has been clarified
in the following description:
“Frequently reported internal consistency coefficients include the item-total correlation,
split-half coefficient, Kuder-Richardson 20 and 21 coefficients (dichotomous response options),
and Cronbach’s coefficient α¹. The item-total correlation coefficient enables selection of relevant
items from a large pool of items by correlating each individual item with the scale total omitting
that item. If the correlation is low (< 0.20) the item is irrelevant and can be discarded [11]. The
split-half coefficient, on the other hand, divides the items into two subscales and calculates the
correlation coefficient between the subscales. An internally consistent scale should have a high
correlation coefficient between the two subscales [11]. However, the split-half coefficient does
not address which item(s) are contributing to low reliability, nor the many pairs of subscales
which can be produced. This is addressed by the commonly reported Cronbach’s α, but
interpretation of this coefficient is not straightforward. First, α depends on the number of items in
the scale, thus increasing the number of items will increase α. Second, HMS measuring more
than one latent variable will usually have a high α despite the different dimensions not being
correlated to each other. Lastly, if α is too high this may suggest item redundancy, and a
reasonable interval of 0.7-0.9 has been suggested [80, 81]. Consequently, a low Cronbach’s α indicates
a scale which is not internally consistent; however, a high α is no guarantee of an internally
consistent scale. A solution to this dilemma is to utilise the techniques of factor analysis to
assess the relatedness of the various items [11]. Factor analysis investigates how many latent
variables (or factors) underlie a set of items and three common approaches exist: principal
component analysis, exploratory factor analysis and confirmatory factor analysis. Once factor
analysis has established the number of latent variables, Cronbach’s α can be established for
each subscale as a measure of internal consistency.”
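¹ Cronbach’s $\alpha = \frac{n}{n-1}\left(1 - \frac{\sum \delta_i^2}{\delta_T^2}\right)$, where $n$ = number of items, $\delta_i$ = SD of the scores on item $i$ (so $\delta_i^2$ is the item variance), and $\delta_T$ = SD of the total score [11]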
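The two coefficients described above can be illustrated with a minimal Python sketch. It assumes complete data (one row of item scores per respondent) and uses made-up scores; the function names `cronbach_alpha` and `corrected_item_total` are hypothetical and are not taken from the thesis.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents (squared item SD).
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the respondents' total scores (squared total SD).
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(scores, item):
    """Correlate one item with the scale total omitting that item."""
    item_scores = [row[item] for row in scores]
    rest_totals = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_totals)


# Hypothetical data: five respondents answering a three-item scale.
example = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4], [5, 4, 5]]
alpha = cronbach_alpha(example)
r_item0 = corrected_item_total(example, 0)
```

With these illustrative numbers, α falls above the 0.7-0.9 interval mentioned in the text, which by the argument above would point towards item redundancy rather than towards a better scale.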
Section 6.1.2. Paper I-1 (page 55)
In the second paragraph “Internal consistency”, second sentence. It states that “...these belonged to the same latent variable of pain related function.” This has been included by mistake and
should have been omitted.
Paper I-3. Results (page 6)
In the Minimal Clinically Important Difference section, second paragraph, it states: “For
each 25% increase in baseline entry score (original scale range), the MCID for all patients increased by:
12 points (ODI), 2 points (RMQ), 5 points (LBPRSdisability), 18 points (SF36 (pf)), 6 points (LBPRSpain),
13 points (SF36 (bp)), and 1 point for the NRSpain.” The values for the two subscales of the SF36
are on a 0-100 scale rather than their original scale range. The values for the original scale range
are: 4 points (SF36 (pf)) and 1 point (SF36 (bp)).
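The erratum does not say how the corrected values were obtained. One reading that reproduces them is a linear rescaling from the 0-100 scale back to the raw score range. The sketch below assumes raw ranges of 10-30 for SF36 (pf) and 2-12 for SF36 (bp); these ranges come from conventional SF-36 scoring and are an assumption, not something stated in the thesis. The function name `to_raw_points` is hypothetical.

```python
def to_raw_points(points_0_100, raw_min, raw_max):
    """Rescale a score difference on a 0-100 scale to the raw scale width."""
    if raw_max <= raw_min:
        raise ValueError("raw_max must exceed raw_min")
    return points_0_100 * (raw_max - raw_min) / 100


# Assumed raw ranges (not given in the source): pf 10-30, bp 2-12.
pf_raw = to_raw_points(18, 10, 30)  # 18 points on the 0-100 scale
bp_raw = to_raw_points(13, 2, 12)   # 13 points on the 0-100 scale

pf_rounded = round(pf_raw)
bp_rounded = round(bp_raw)
```

Under these assumptions the rounded results agree with the corrected values of 4 points and 1 point given above.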
Contents
List of Tables
List of Figures
1. PREFACE
    1.1. SUPERVISORS
    1.2. LIST OF PUBLICATIONS
        PART I: THE VALIDATION STUDY
        PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
    1.3. FOREWORD
    1.4. DEDICATION
    1.5. ABBREVIATIONS AND DEFINITIONS
        ABBREVIATIONS
        DEFINITIONS
    1.6. SUMMARY IN ENGLISH
    1.7. SUMMARY IN DANISH (DANSK RESUMÉ)
2. INTRODUCTION
    2.1. OUTCOME MEASUREMENT IN LOW BACK PAIN
    2.2. PROBLEMS OF OUTCOME MEASUREMENT IN RELATION TO LOW BACK PAIN
        2.2.1. CHOICE OF HMS
        2.2.2. HMS VALIDATION
        2.2.3. HMS RESPONSIVENESS AND INTERPRETATION
3. EVALUATION OF HEALTH MEASUREMENT SCALES
    3.1. CONCEPTS OF EVALUATION CRITERIA
        3.1.1. CROSS-CULTURAL ADAPTATION
        3.1.2. INTERNAL CONSISTENCY
        3.1.3. FLOOR & CEILING EFFECT (SCALE WIDTH)
        3.1.4. REPRODUCIBILITY
        3.1.5. VALIDITY
        3.1.6. RESPONSIVENESS AND INTERPRETABILITY
4. OBJECTIVE AND AIMS
    OBJECTIVE
    AIMS
5. METHODS
    5.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
        5.1.1. TRANSLATION AND CROSS-CULTURAL ADAPTATION
        5.1.2. PATIENTS AND SETTING
        5.1.3. OUTCOME MEASURES
        5.1.4. SUBGROUPS
        5.1.5. STATISTICAL ANALYSES
    5.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
        5.2.1. PATIENTS AND SETTING
        5.2.2. PILOT STUDY
        5.2.3. MAIN STUDY
        5.2.4. OUTCOME MEASURES
        5.2.5. STATISTICAL METHODS
6. SUMMARY OF RESULTS
    6.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
        6.1.1. PARTICIPANTS
        6.1.2. PAPER I-1
        6.1.3. PAPER I-2
        6.1.4. PAPER I-3
        6.1.5. PAPER I-4
    6.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
        6.2.1. PARTICIPANTS
        6.2.2. MAIN STUDY
7. DISCUSSION
    7.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
        7.1.1. DISCUSSION OF FINDINGS
        7.1.2. DISCUSSION OF METHODOLOGICAL ASPECTS
    7.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
        7.2.1. DISCUSSION OF FINDINGS
        7.2.2. DISCUSSION OF METHODOLOGICAL ASPECTS
8. CONCLUSIONS
    8.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
    8.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
9. RECOMMENDATIONS
Bibliography
10. APPENDICES
    10.1. APPENDIX I
    10.2. APPENDIX II
    10.3. APPENDIX III
    10.4. APPENDIX IV
    10.5. APPENDIX V
    10.6. APPENDIX VI
    10.7. APPENDIX VII
    10.8. APPENDIX VIII
    10.9. APPENDIX IX
    10.10. APPENDIX X
11. PAPERS
List of Tables
2.1  Advantages and limitations of the four types of HMS.
2.2  Main problem areas using HMS as outcome measures.
2.3  A core set of HMS for patients with spinal disorders according to Bombardier.
2.4  Reliability - synonyms, definitions and measurement method.
2.5  Methods used to interpret HMS scores.
3.1  Essential measurement properties of HMS.
3.2  Definitions of responsiveness.
3.3  Proposed conceptual framework for responsiveness.
3.4  Distribution-based methods for determining change.
3.5  Anchor-based methods for determining change.
3.6  Methodological weaknesses of the TQs as reported in the literature.
3.7  Study designs and their corresponding analytic methods.
5.1  Merging transition questions 1 and 2.
5.2  Concurrent validity and statistical tests examined in study I.
7.1  Steps in standardising the construction of patients’ global assessment of treatment effect.
List of Figures
2.1  An algorithm for choosing an HMS in spinal research.
3.1  Graphic representation of the stages of cross-cultural adaptation.
3.2  “Floor” and “ceiling” effects versus scale width for a fictive HMS.
3.3  The concepts of agreement and reliability.
3.4  Bland and Altman’s limits of agreement plot.
3.5  Concepts of validity.
3.6  The construction of a ROC curve.
7.1  Choosing pain and disability HMS in subgroups of LBP patients - an algorithm.
1. PREFACE
1.1. SUPERVISORS
Professor Niels Grunnet-Nilsson, MD, DC, PhD (main supervisor)
Clinical Locomotion Science
Institute of Sports Science and Clinical Biomechanics
University of Southern Denmark, Odense, Denmark
Associate Professor Jan Hartvigsen, DC, PhD (project supervisor)
Clinical Locomotion Science
Institute of Sports Science and Clinical Biomechanics
University of Southern Denmark, Odense, Denmark
and
Nordic Institute of Chiropractic and Clinical Biomechanics
Part of Clinical Locomotion Science
Professor Claus Manniche, MD, DMSc
Clinical Locomotion Science
Institute of Sports Science and Clinical Biomechanics
University of Southern Denmark, Odense, Denmark
and
Backcenter Funen
Ringe, Denmark
1.2. LIST OF PUBLICATIONS
This thesis is based on the following papers:
PART I: THE VALIDATION STUDY
1. Lauridsen HH, Hartvigsen J, Manniche C, Korsholm L, Grunnet-Nilsson N. Danish version
of the Oswestry Disability Index for patients with low back pain. Part 1: Cross-cultural
adaptation, reliability and validity in two different populations. Eur Spine J 2006 (paper
I-1)
2. Lauridsen HH, Hartvigsen J, Manniche C, Korsholm L, Grunnet-Nilsson N. Danish version
of the Oswestry Disability Index for patients with low back pain. Part 2: Sensitivity, specificity and clinically significant improvement in two low back pain populations. Eur Spine J
2006 (paper I-2)
3. Lauridsen HH, Hartvigsen J, Manniche C, Korsholm L, Grunnet-Nilsson N. Responsiveness
and minimal clinically important difference for pain and disability instruments in low back
pain patients. BMC Musculoskelet Disord 2006;7:82-98 (paper I-3)
4. Lauridsen HH, Hartvigsen J, Korsholm L, Grunnet-Nilsson N, Manniche C. Choice of external criteria in back pain research: does it matter? Recommendations based on analysis of
responsiveness. Pain, accepted for publication (paper I-4)
PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
1. Lauridsen HH, Manniche C, Korsholm L, Grunnet-Nilsson N, Hartvigsen J. Are low back
pain patients able to determine acceptable outcome of treatment before it begins? J Clin
Epidemiol, submitted (paper II-1)
The papers are reprinted in Section 11.
1.3. FOREWORD
This PhD thesis is based on two longitudinal cohort studies conducted from 2004 to 2006.
Study one is a questionnaire validation study carried out on both acute and chronic low back
pain patients seen in the primary and secondary sectors of the Danish health care system. The
second study is a methodological study carried out on chronic low back pain patients seen only
in the secondary sector.
I wish to express my appreciation and thank my supervisors and everyone who has been
involved in this project.
In particular, I wish to thank:
Niels Grunnet-Nilsson, principal supervisor, for initiating a visionary project, providing constructive criticism and moral support.
Jan Hartvigsen, second supervisor, for his comprehensive view of the project. In particular,
I want to express my profound gratitude for his enthusiastic, calm and humorous guidance
during numerous meetings and his readiness for constructive critical advice.
Claus Manniche from Backcenter Funen, Ringe, who allowed me to access the chronic low
back pain patients used in both studies. His deep insights into the clinical aspects of low back
pain and research methodology were invaluable to the project. I also want to express my
warmest thanks to the staff at Backcenter Funen (in particular Ida Bhanderi) for their skilled
handling of the questionnaires and patient records in both studies. I still owe them Danish
pastry for their participation in the project.
Lars Korsholm for his excellent statistical support during numerous meetings. In particular
I wish to thank him for his professional support using Stata which, at times, required complicated programming and for his clear feedback during the manuscript preparation.
Jytte Johannesen from the Nordic Institute of Chiropractic and Clinical Biomechanics for her
excellent management of data collection in both studies and for her always critical and helpful
comments when proofreading manuscripts. I am grateful for her warmth and helpful discussions during my time of sharing the same office. Furthermore, I wish to thank the Nordic
Institute of Chiropractic and Clinical Biomechanics for allowing me to use an office during
most of my PhD.
Special thanks to the six chiropractic clinics involved in the project:
• Chiropractic Clinic in Nyborg (Henrik Wulff Christensen & Peter Højgaard)
• Hartvigsen & Hein Chiropractic Clinic in Odense (Lisbeth Hartvigsen & Tina Hein Lauridsen)
• Chiropractic Clinic in Odense (Rie Grunnet-Nilsson & Robert Devallier)
• Holt & Højer Chiropractic Clinic in Lyngby (Birgitte Holt & Kent Højer)
• Chiropractic Clinic in Fredericia (Susanne Bjerggaard & Kalle Buch)
• Chiropractic Clinic in Viby (Troels Gaarde & Gitte Mogensen)
Without their assistance and support I would not have been able to collect data from the primary sector patients included in project I.
Finally, I want to thank my colleagues at the Institute of Sports Science and Clinical Biomechanics (University of Southern Denmark) and Backcenter Funen (Ringe, Denmark) for moral
support and encouragement during my PhD.
The project was supported by the (Danish) Foundation of Chiropractic Research and Post
Graduate Education, the Faculty of Health Science, University of Southern Denmark, Odense
and the European Chiropractic Union Research Council.
Henrik H. Lauridsen, Odense, 2007
1.4. DEDICATION
The PhD thesis is dedicated to my family, my wife Tina, and my two children, Cecilia and
Jonas who endured endless evenings and weekends with me in front of a computer. Without
their support and patience during the last three years, I would not have been able to finish this
thesis on time.
Small secret: The source of your happiness is involvement in a relationship, not the person.
The Story Teller
1.5. ABBREVIATIONS AND DEFINITIONS
ABBREVIATIONS
HMS               Health measurement scale(s)
ICC               Intraclass correlation coefficient
ICCagreement      Intraclass correlation coefficient including systematic error variance
ICCconsistency    Intraclass correlation coefficient excluding systematic error variance
LBP               Low back pain
LBPRSdisability   Disability subscale of Low Back Pain Rating Scale
LBPRSpain         Pain subscale of Low Back Pain Rating Scale
LOA               Limits of agreement
LOAlower          The lower limit of agreement
MDC95%            Minimal Detectable Change at the 95% confidence level
MCID              Minimal Clinically Important Difference of the raw change score
MCID%             Minimal Clinically Important Difference of the percentage change score
MCIDpost          Minimal Clinically Important Difference determined after treatment cessation
MCIDpre           Minimal Clinically Important Difference determined before start of treatment
MID               Minimal Important Difference
NNT               Numbers needed to treat
NRSimp            Numeric Rating Scale of importance
NRSpain           Numeric Pain Rating Scale
ODI               Oswestry Disability Index
PrS               Primary sector
RMQ               Roland Morris Disability Questionnaire
ROC               Receiver operating characteristic (curve)
ROCauc            Area under the ROC curve
SEM               Standard error of the measurement
SeS               Secondary sector
SF36 (bp)         Bodily pain subscale of SF36
SF36 (pf)         Physical function subscale of SF36
SRM               Standardised response mean
SRM%              Standardised response mean of the percentage change score
SRMraw            Standardised response mean of the raw change score
TQ                Transition question
TQ1               7-point transition question
TQ2               15-point transition question
DEFINITIONS
Clinimetrics
“The methodologic discipline focussing on measurement issues in clinical medicine” [1, 2]
Responsiveness
“The ability of an instrument to detect accurately change when it has occurred” [3–5]
Minimal detectable change
“A change score which falls outside the measurement error of the outcome measure” [6]
Interpretability
“The degree to which one can assign qualitative meaning to an instrument’s quantitative score” [7]
Minimal clinically important difference
“The smallest difference in score in the domain of interest which patients perceive as beneficial and which
would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s
management” [8]
Reproducibility
“The degree to which an instrument yields comparable results if it is used repeatedly on stable patients” [9]
Agreement
“The closeness of scores on repeated measurements” [9]
Reliability
“An instrument’s ability to discriminate between different levels of the measured health outcome” [9]
Nota Bene
Paper I-1 contains several inconsistencies regarding terminology and definitions of
reproducibility, agreement and reliability compared to the description given in this
thesis. The disparity stems from lack of clear definitions supported by sound hypothetical frameworks at the time of writing. The definitions delineated in Section 3.1.4
on page 29 will be adhered to throughout the thesis with the terminology used in
paper I-1 described in brackets where appropriate.
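The definition of minimal detectable change above can be made concrete with a small sketch. It uses two formulas that are common in the clinimetric literature, SEM = SD × √(1 − ICC) and MDC95% = 1.96 × √2 × SEM. Both are assumptions for illustration only: the thesis describes its own procedures in Section 3.1.4, which may differ, and the function names and numbers below are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of the measurement from a sample SD and an ICC (assumed formula)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level (assumed formula)."""
    return 1.96 * math.sqrt(2.0) * sem


def exceeds_measurement_error(change, sem):
    """True if an observed change score falls outside the measurement error."""
    return abs(change) > mdc95(sem)


# Hypothetical numbers: baseline SD of 10 points and an ICC of 0.91.
sem = sem_from_icc(10.0, 0.91)
threshold = mdc95(sem)
detectable = exceeds_measurement_error(9.0, sem)
```

In this illustration a change of 9 points would count as detectable, whereas a change of 8 points would not be distinguishable from measurement error.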
1.6. SUMMARY IN ENGLISH
Background
The Oswestry Disability Index (ODI) is one of two standardised health measurement scales
(HMS) recommended for measuring function in back pain patients. Despite extensive psychometric
testing, little is known about HMS behaviour and the minimal clinically important difference
(MCID) in subgroups of low back pain (LBP) patients. Moreover, the most commonly used
retrospective method to establish the MCID has inherent methodological flaws. Perhaps it would
be more prudent to ask LBP patients what an acceptable result of the treatment is before it begins?
Objectives
The overall objective was to establish the responsiveness and MCID in specific subgroups
of patients with LBP. In addition, we explored whether LBP patients were able to
determine an acceptable treatment outcome before treatment began.
Methods
The responsiveness in subgroups study. An extensive cross-cultural adaptation and validation of the ODI was carried out on patients seen in the primary (PrS) and secondary sectors
(SeS) of the Danish health care system.
The prospective acceptable outcome study. A method for estimating LBP patients’ view of an
acceptable change before treatment begins (MCIDpre ) was developed and compared to a well
established retrospective method of determining the MCID (MCIDpost ).
Results
The responsiveness in subgroups study. The ODI measurement error ranged between -11.5
and +13 points. Responsiveness was comparable to the external measures. A floor effect was
seen in the PrS patients. The MCID was nine points in PrS and LBP only patients and eight
points in SeS and leg pain patients. Moreover, patients’ retrospective evaluation of treatment
effect was more responsive in PrS patients compared to serial measurements.
The prospective acceptable outcome study. The prospective acceptable outcome method was
reproducible. The MCIDpre was outside instrument measurement error and 1.5-4.5 times larger
compared to the MCIDpost . Furthermore, the MCIDpre was almost comparable to patients’
post-treatment acceptable change, but only for the pain scale.
Conclusion
The Danish version of the ODI is a reliable, valid and responsive HMS which is psychometrically
more appropriate in SeS patients. In addition, the Roland Morris Disability Questionnaire (RMQ)
is the most suitable for patients with LBP only, whereas the ODI and RMQ are
equally suitable for patients with leg pain. The choice of pain scale is arbitrary in all subgroups
and the pain subscale of the Low Back Pain Rating Scale is recommended. The MCID was
more or less stable across subgroups for most instruments and increased monotonically with
baseline condition severity, but only in PrS patients and LBP only patients.
The clinical question: “how are you now compared to when you started the treatment” seems to
be most sensitive to condition alterations in PrS patients and should be added as an outcome
measure to standard questionnaires used serially.
The prospective acceptable outcome method offers a benchmark by which clinicians can
balance any mismatch between what outcomes are acceptable to the patient and what is
realistically obtainable by a certain treatment. Chronic LBP patients seem to have a reasonable
idea of an acceptable change in pain but overestimate change in functional and
psychological/affective domains.
1.7. SUMMARY IN DANISH (DANSK RESUMÉ)
Background
The Oswestry Disability Index (ODI) is one of two recommended questionnaires for measuring
function in back pain patients. Psychometric testing of these questionnaires is well established, but our
knowledge of their applicability and of the minimal clinically important difference (MCID) in
specific subgroups of patients with low back pain is limited. Last but not least,
the usual method of determining the MCID has methodological weaknesses. One possible solution
could be to ask back pain patients what they consider an acceptable result of
the treatment before it is started.
Aim
The aim of the thesis was to determine responsiveness and the MCID in specific subgroups
of patients with low back pain. In addition, it examines whether low back pain patients
can determine what an acceptable result of a treatment regimen would be before it
begins.
Methods & material
Study measuring responsiveness in subgroups. A comprehensive translation, adaptation to
the Danish language and validation of the ODI was carried out on low back pain patients seen in the
primary (PrS) and secondary (SeS) sectors of the Danish health care system.
Study measuring patients’ prospective acceptable treatment outcome. A method for
measuring low back pain patients’ acceptable change before the start of treatment (MCIDpre) was
developed and compared with the usual method of determining the MCID (MCIDpost).
Results
Study measuring responsiveness in subgroups. The measurement error of the ODI was found to lie
between -11.5 and +13 points. Responsiveness was comparable to that of the other questionnaires. The ODI
had a pronounced floor effect in PrS patients. The MCID was nine points in PrS and
low back pain patients and eight points in SeS and leg pain patients. Moreover, patients’
retrospective evaluation of treatment effect was more responsive in PrS patients than
serial measurements.
Study measuring patients’ prospective acceptable treatment outcome. The method for
determining low back pain patients’ prospective acceptable treatment outcome was reproducible.
The MCIDpre was larger than the measurement error of the questionnaire in question and 1.5-4.5 times larger
than the MCIDpost. For pain, low back pain patients’ MCIDpre was almost comparable
to their acceptable change determined after treatment.
Conclusion
The Danish version of the ODI is reproducible, valid and responsive, but for psychometric
reasons it is most appropriate to use the questionnaire for the more chronic secondary
sector patients. Furthermore, the Roland Morris Disability Questionnaire (RMQ) is the most
suitable questionnaire for patients with low back pain only, while the ODI and RMQ are equally
suitable for patients with leg pain. The choice of pain scale is more arbitrary in all subgroups,
and the pain scale of the Low Back Pain Rating Scale is recommended. The MCID was more or less
stable across all subgroups for most of the questionnaires and increased monotonically
with the severity of the condition at baseline, but this applied only to PrS patients and
patients with low back complaints.
The clinical question: “how are you now compared to when you started the treatment” is
most sensitive to clinically relevant changes in PrS patients and should be included as an outcome measure
alongside the usual questionnaires applied serially.
The method for determining low back pain patients’ prospective acceptable treatment outcome
is new. It allows the clinician to reconcile what is acceptable to the patient with what is
achievable through treatment. It appears that chronic low back pain patients have a reasonable
idea of what an acceptable change in pain is, but that they overestimate changes in physical
function and psychological/affective domains before treatment begins.
2. INTRODUCTION
“Whatever exists, exists in some amount and can be measured”
L. L. Thurstone, American psychometrician (1887-1955)
2.1. OUTCOME MEASUREMENT IN LOW BACK PAIN
During the last part of the twentieth century, professions dealing with back pain became
aware of the problem of providing services to patients based primarily on tradition and
anecdote rather than on sound theory rooted in scientific evidence. Within the realm
of a modern health care system, calls for “evidence-based practice” in the management of low
back pain (LBP) are mounting. This pressure has pushed clinicians and researchers
dealing with back pain towards demonstrating that what they do “works”. The importance of
differentiating the impact of their management programmes from the natural course of recovery
following the onset of disease or injury has been recognised and has resulted in numerous clinical
studies and randomised clinical trials.
Demonstrating efficacy of clinical interventions for patients with LBP has proved difficult,
and many trials are unable to show significant differences between various treatment
programmes and control groups. Two main reasons can, at least in part, explain this finding:
first, the interventions are ineffective, and second, the trials have methodological shortcomings
that blur the differences in efficacy. One such methodological shortcoming could well be found
in the applied outcome measures, as research into their psychometric properties has always
lagged behind the clinical intervention research itself. However, the momentum for clinimetric
research has increased drastically in recent years, with the common goal of providing scientific
evidence to guide health care decisions by all parties involved (patients,
providers, policy makers, and third party payers).
Producing high quality clinical outcome research depends, among other things, on the ability of outcome measures to capture a clinically relevant change. In the realm of spinal disorders,
clinical success has traditionally relied on measuring physical changes which were directly
observable. Consequently, measurements of mortality, physiological changes (e.g. nerve conduction) or impairments of body functions (e.g. range of motion, straight leg-raise) were predominant despite the weak correlation of physical outcomes with patient behaviour and
symptoms [10]. In the last decade clinicians and researchers have increasingly recognised that
the outcomes of these interventions are usually best seen from the perspective of the patient
in terms of their ability to perform activities of daily living and participate in life. This has resulted in a paradigm shift in clinical trials, from relying on objective findings as primary outcome measures
to relying primarily on questionnaires measuring the patient’s perception
of function and pain as well as psychological/affective dimensions. Questionnaires designed to measure aspects of health are typically called health measurement scales
(HMS) [11].
HMS are typically classified according to their general applicability and fall into four categories: a) generic, b) region-specific, c) disease-specific and d) patient-specific [12–16] (Table
2.1). The generic HMS are designed to be applicable across populations, conditions and interventions and are usually multidimensional. This is in contrast to region- and disease-specific
instruments, which are designed to measure health status in specific anatomical regions or
diagnostic groupings, respectively. Last, the patient-specific instruments are structured to identify
activity limitations specific to a particular patient.
Table 2.1. Advantages and limitations of the four types of HMS.
Generic
Advantages:
− Cost and time effective
− Identification of co-existing problems
− Comparisons of burden of illness and treatment outcome between and within groups
− Normative data often available
Limitations:
− May contain sections or items of irrelevance to the patient or condition under scrutiny
− Often lengthy
− May be less responsive compared to region- and disease-specific HMS
Region-specific
Advantages:
− Often brief
− Applicable to any condition in the body region
− High relevance to patients
− More responsive compared to generic measures
Limitations:
− Only applicable to the specified region
− Cannot address co-morbid problems
− May contain items of irrelevance to patients with specific condition
Disease-specific
Advantages:
− Often brief
− High relevance to patients
− May be more responsive compared to both generic and region-specific measures
Limitations:
− Only applicable to specific disease
− Cannot address co-morbid problems
Patient-specific
Advantages:
− Not condition or region-specific
− Content tailored to individual patient
− May be the most responsive measure
Limitations:
− Requires administration by interview
− Cannot be meaningfully aggregated
− Problematic in acute conditions
References [12–16]
2.2. PROBLEMS OF OUTCOME MEASUREMENT IN RELATION TO LOW BACK PAIN
The inclusion of self-reported HMS as a measurement tool in clinical trials or clinical practice is not without problems. The main areas of difficulty have been outlined in Table 2.2 and
will be discussed.
Table 2.2. Main problem areas using HMS as outcome measures.
Problem area, followed by short description:
Choice of HMS
• Considerations important to choice of HMS are confusing and poorly described
• The validity of using HMS in specific target populations and settings is poorly understood
HMS validation
• Measurement properties lack standardised definitions and are often poorly established
HMS responsiveness and interpretation
• Definitions of responsiveness lack standardisation
• Establishing responsiveness of HMS is challenging as the choice of measurement index is arbitrary
• Interpretation of HMS change scores is problematic as no gold standard exists
HMS, health measurement scale.
2.2.1. CHOICE OF HMS
The quest for the “ideal” HMS to be used in clinical practice and research has sparked the
development of a plethora of self-report outcome measures. As a result more than 82 back-related
instruments have been reported in the literature [17] and new questionnaires seem to emerge constantly. This makes the choice of a proper instrument for a given clinical situation
complicated and confusing [18–20]. In addition, many HMS are adapted and published in
several versions due to postulated shortcomings in the original HMS. A few examples are the
popular and much used Roland Morris Disability Questionnaire (RMQ) which exists in six
different versions [21–26] and the Oswestry Disability Index (ODI) which is published in four
versions [27]. As a consequence, there is poor comparability of study results and little shared
understanding of their clinical relevance. The urgent need for standardisation has resulted
in recommendations of a set of HMS to be used by clinicians and researchers when dealing
with spinal disorders (Table 2.3), and most clinical studies now include at
least one of the recommended HMS [28]. However, it is important to recognise that the recommendations were based on agreement among a panel of medical experts and not on instrument
superiority. Most instruments offer advantages and limitations depending on study setting and
patient population. In addition, the included HMS are a core set of instruments covering only
the most important domains. As a result, researchers and clinicians may be exposed to situations where the core set of instruments is insufficient to cover the requirements of a particular
study setting and patient population, thus facing the difficulties of choosing an appropriate
HMS.
Table 2.3. A core set of HMS for patients with spinal disorders according to Bombardier.
Domain, followed by recommended HMS:
Back-specific function
• Roland Morris Disability Questionnaire
• Oswestry Disability Index
Generic health status
• SF-36 version 2.0
Pain
• Bodily Pain subscale of SF-36
• The Chronic Pain Grade Questionnaire
Work disability
• Work status – 10 categories
• Days off work and days of cut down work - number of days
• Time to return to work - number of days
Satisfaction
• Patient Satisfaction Scale – satisfaction with care
• Global satisfaction with treatment outcome
Reference [28]
To aid the choice of an appropriate HMS, the researcher or clinician may want to follow
the algorithm outlined in Figure 2.1. The practical application of some of the steps
outlined in Figure 2.1 may, however, prove difficult. One obstacle is comparing measurement
properties of the various instruments. Many reviews of HMS specific to LBP have been published [19, 20, 27, 29–39] with the intent of making the choice more transparent; however, these
often lack comprehensiveness in either the included HMS or the measurement properties reviewed, and standardised quality criteria are rarely applied [40]. A second problem is finding
an HMS which has been tested under similar conditions. A recent trend in LBP research is to
focus on the efficacy of certain intervention strategies in specific subgroups of LBP patients
rather than comparing these strategies in LBP patients as a whole [41–43]. A similar trend of
testing and concurrently comparing the behaviour of HMS in different settings and subgroups
is almost completely lacking in the literature. As a result, it is often assumed that HMS are
psychometrically sound when applied to different settings and subgroups despite the fact that
patient characteristics may be very different. Clarification of which HMS is appropriate for
which setting and subgroup is required to catch up with recent trends in LBP research.
In summary, choosing a proper HMS is often cumbersome and time consuming due to the
large number of instruments and versions available. Consequently, a core set of standardised HMS has been proposed; however, this comes at the expense of flexibility and may be
insufficient for specific study settings. An algorithm for choosing an HMS has been outlined. Using the algorithm requires close scrutiny of systematic reviews and concurrent
comparisons of HMS and this may be problematic for two reasons: 1) systematic reviews
of HMS are often methodologically debatable, and 2) concurrent comparisons of HMS in
specific settings or subgroups are rare.
2.2.2. HMS VALIDATION
The methodological procedures applied in HMS validation studies reflect the confusion of
what measurement properties are essential minimum requirements to legitimise their use in
research or clinical settings. Some original measures have been published reporting only a few
measurement properties [47] while others have undergone a profound validation process [48].
This is particularly true for instruments translated from one language to another. Examples are
the RMQ and the ODI which have both been extensively validated in the English language
under many different settings and therefore must be said to be legitimate research tools for
patients with LBP [17, 34]. In contrast, the same questionnaires have been translated and validated into several languages reporting only a small number of measurement properties [49–52].
Thus, the validity of using these HMS in languages other than the source language is questionable due to lack of reporting essential measurement properties as outlined by Terwee et al. [40].
Another area of great confusion and disagreement is the framework of HMS measurement
properties. First, the list of synonyms for each concept is often long and confusing. Second,
many of these terms lack clear definitions and if definitions are given, they are oftentimes
conflicting. Last, the methods used to establish these measurement properties are many and
often interpreted in a variety of ways. An example of this is the measurement property of
reliability (Table 2.4). The concept of reliability has a wealth of synonyms and
definitions in the HMS literature, and these reflect the multitude of described
measurement methods. In addition, some studies do not describe the type of reliability coefficient used [57–60], making inferences from their findings difficult.
In summary, choosing the proper measurement method for a given construct is complex,
leaving ample room for mistakes. Using standardised measurement properties in validation studies is uncommon, which probably reflects the numerous shortcomings and gaps
seen in many HMS. We recommend that researchers and clinicians use the newly updated
measurement properties as outlined by Terwee et al. [40].
2.2.3. HMS RESPONSIVENESS AND INTERPRETATION
The use of HMS in longitudinal studies requires a detailed and meaningful understanding
of the instrument in question. The properties of responsiveness and interpretation are of major
importance to evaluative HMS.
For an HMS to be valid in a longitudinal study it has to be sensitive to changes in the measured health domain; that is, it has to be responsive. Consequently, it has to be able to detect change
in patients’ condition when assessments are compared serially over time (Time2 - Time1). The
concept of responsiveness is a much sought after instrument property; however, it remains
elusive and poorly understood. It presents numerous problems: 1) a commonly accepted
definition of responsiveness does not exist (at least 26 definitions have been described), producing an elusive conceptual framework [61], 2) quantification of responsiveness is often by
statistical indices; however, the methodology is poorly understood. As a result, at least 31
different responsiveness indices have been reported in the literature, making between-study
comparisons difficult [61], 3) there is a lack of consensus on the optimal measuring approach of
responsiveness, and different indices may lead to different conclusions [62], 4) interpretation
of some indices is unclear (e.g. effect size and standardised response mean) [63], and 5) several
different responsiveness classification systems have been published - each with a unique
conceptual framework of responsiveness [4, 63–65].
Table 2.4. Reliability - synonyms, definitions and measurement method.
Synonyms
Agreement, association, concordance, consistency, precision, repeatability,
reproducibility, stability and more…
Definitions
Streiner & Norman: (Reliability) is a fundamental way to reflect the amount of error, both
random and systematic, inherent in any measurement.
Fayers & Machin: (Reliability) consists of determining that a scale or measurement yields
reproducible and consistent results.
De Vet et al.: Reliability concerns the degree to which patients can be distinguished from
each other, despite measurement error.
Finch et al.: Reliability is a measure of consistency and the ability to differentiate among
the objects of measurement.
DeVellis: Scale reliability is the proportion of variance attributable to the true score of
the latent variable.
Oppenheim: Reliability refers to the purity and consistency of a measure, to repeatability,
to the probability of obtaining the same result again if the measure were to
be duplicated.
Hansagi & Allebeck: Reliability is the absence of small or large random (or systematic)
measurement errors.
Measurement methods
Correlation coefficients (Pearson correlation coefficient, Intra-class
correlation coefficients), Internal consistency (e.g. coefficient α, split-half, Kuder-Richardson 20 and 21 coefficients), Kappa coefficient, Bland &
Altman limits of agreement, Standard error of the measurement coefficients.
References [9, 11, 16, 53–56].
Oppenheim, AN. Questionnaire design, interviewing and attitude measurement, 2nd edition, 1996,
Printer Publishers, London & New York, chapter 8, Question wording, pp. 119-149.
Hansagi, H; Allebeck, P. Enkät och intervju inom hälso- och sjukvård. Handbok för forsking och
utvecklingsarbete, Studenterlitteratur, 1994, chapter 5, Utformning av mätinstrumentet – frågeformuläret, pp. 38-63.
The second important property of an HMS is the interpretability of the serial
change scores it produces, which is equally challenging. This is illustrated in Case I.
Case I
A middle-aged car mechanic with LBP radiating to the left lateral thigh has received regular
conservative care over a 4-week period and is progressing well. The pain in the thigh has disappeared; however, there is still some pain in the back, and he finds it difficult to bend forward with
occasional catching pain. He is back to work 3 out of 5 days a week. Functional and pain HMS
were administered at the initial visit and again at 4 weeks. The initial functional score was 55
points (0-100 scale, high score = high disability) which was reduced to 42 points at the 4-week
follow-up visit.
Several questions may puzzle the clinician who is responsible for the patient described in the
case. For example: what does the summary score of the functional HMS mean? How do we
interpret the change score? Has the patient achieved a “clinically significant change”? Is the
change meaningful? Is there a need to change the management plan? Is the change score
outside measurement error which would typically occur in a routine administration of the
HMS? A prerequisite for answering some of these important questions is a clear definition of
what is meant by interpretation:
“Interpretation has been defined as the degree to which one can assign qualitative meaning to an
instrument’s quantitative score” [7]
Several approaches exist which can aid the researcher or clinician in interpreting HMS scores
and these are summarised in Table 2.5.
Table 2.5. Methods used to interpret HMS scores.
Index for interpretation, followed by description:
Mean scores and SD
1) Of a reference group providing norm values
2) Of subgroups of patients who are expected to differ (e.g. LBP
patients seen in the PrS and SeS of the health care system)
3) Of patients before and after a treatment programme of known
efficacy (e.g. acute LBP patients receiving a course of NSAID)
4) Of patients in the different categories of an external anchor
(patients’ global rating of treatment effect)
Minimal clinically important difference*
“Is the smallest difference in a score in a domain of interest which
patients perceive as beneficial and which would mandate, in the
absence of troublesome side effects and excessive cost, a change
in the patient’s management”
* Synonyms: minimal important change, minimal important difference [8, 40, 66]
Work in the area of the minimal clinically important difference (MCID) has been important
to advance our knowledge of interpretability, as small numerical differences in mean HMS
scores may produce statistically significant results when large sample sizes are used. However,
this may not be equivalent to clinical significance [67]. Accordingly, the MCID demarcates a
threshold for when a person or group has begun experiencing an improvement which they consider important [4] and various methods have been proposed [63] (see Section 3.1.6 for details).
Establishing the MCID for a particular HMS is not easy as many methodological challenges
exist. First, there is no agreed upon gold standard used in MCID studies and there is some evidence that the magnitude of the MCID depends on the choice of external criteria [68]. Second,
the validity of the retrospective external criteria is unclear as some authors argue against its
use [69–71] while others believe it is valid [72, 73]. Third, few studies concurrently compare the
methods used to calculate the MCID, and little is known about the similarity of results using
different methods [74,75]. Fourth, the MCID may vary according to which disease or condition
it is applied to, the level of severity at baseline, socioeconomic status, nationality and other
baseline characteristics [63, 75, 76]. Last, the reported MCID cannot always be distinguished
from measurement error for a particular HMS (i.e. MCID is smaller than the instrument measurement error) [40].
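The last point, together with the questions raised by Case I, can be illustrated with a small calculation. The sketch below is only an illustration and not the procedure used in this thesis. It assumes the commonly used relation MDC95% = 1.96 × √2 × SEM, and the SEM of 4 points and the MCID of 10 points are hypothetical values chosen to show the logic; only the scores of 55 and 42 points are taken from Case I.

```python
import math


def mdc95(sem: float) -> float:
    """Minimal detectable change at the 95% level (assumed: 1.96 * sqrt(2) * SEM)."""
    return 1.96 * math.sqrt(2.0) * sem


def interpret_change(baseline: float, follow_up: float, sem: float, mcid: float) -> dict:
    """Compare an observed change score with measurement error and the MCID.

    High score = high disability, so a positive change means improvement.
    """
    change = baseline - follow_up
    threshold = mdc95(sem)
    return {
        "change": change,
        "mdc95": threshold,
        # Is the change larger than what measurement error alone could produce?
        "exceeds_measurement_error": abs(change) > threshold,
        # Is the change at least as large as the (hypothetical) MCID?
        "exceeds_mcid": abs(change) >= mcid,
        # Problem case from the text: the MCID is smaller than the measurement error.
        "mcid_within_measurement_error": mcid < threshold,
    }


# Case I scores; SEM and MCID are hypothetical illustration values.
result = interpret_change(baseline=55, follow_up=42, sem=4.0, mcid=10.0)
for key, value in result.items():
    print(f"{key}: {value}")
```

With these assumed values the 13-point change exceeds both thresholds, yet the MCID itself lies below the MDC95%, so a change of exactly one MCID could not be distinguished from measurement error.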
In summary, responsiveness and interpretability of evaluative HMS are closely linked. Instrument responsiveness is a necessity for ascertaining interpretability of the measured
change. Establishing responsiveness of an HMS is methodologically complex, poorly understood and lacking consensus despite an overwhelming amount of literature. Likewise,
ascertaining interpretability of HMS change scores also faces many methodological challenges but is slowly developing. Further research in both areas is needed to advance our
understanding of change scores which are indispensable for interpretation of results from
clinical studies.
Figure 2.1. An algorithm for choosing an HMS in spinal research.
Identify measurement purpose
− Discriminative: HMS is designed to distinguish between individuals or groups of patients at a single point in time. No gold standard is available.
− Predictive: HMS is designed to classify patients into predefined measurement categories, either concurrently or prospectively. Gold standard is available.
− Evaluative: HMS is designed to measure the magnitude of longitudinal change in individuals or groups of patients.
Identify HMS measurement domain
− general health status, pain intensity, activity restrictions, work disability, patient satisfaction, social status, psychological/affective aspects, other
Define target patient population and setting
− Generic, region-specific, disease-specific, patient-specific
Consider the information required
− Global HMS: provide simplicity at the cost of detail
− Multi-item HMS: provide complete profile of construct, may ↑ burden
Reduce the list of potential HMS by considering:
1. Measurement properties
- reproducibility & validity
- responsiveness & MCID* values (evaluative HMS)
2. Feasibility
- administration, cost, respondent burden
3. Use in similar population
- has HMS been tested in similar population as the target population?
Pilot test the chosen HMS for feasibility
* MCID, minimal clinically important difference [16, 28, 44–46]
3. EVALUATION OF HEALTH MEASUREMENT SCALES
“A measurement is not an absolute thing, but only relates one entity to another”
H.T. Pledge (1966)
3.1. CONCEPTS OF EVALUATION CRITERIA
The field of health status measurement has been characterised by the proliferation of HMS
varying in their methods of development, content and breadth of application. This has resulted
in published HMS of inconsistent quality adding to the confusion and difficulty when choosing
an instrument. Moreover, most of these are developed in English-speaking countries and only
exist in the source language [77]. Consequently, both international comparisons of study results
and the number of multinational research projects have been impeded.
The above problems have called for the establishment of principles, procedures and criteria for assessment of instrument quality. Several articles have been published proposing
blueprints for translating and cross-culturally adapting questionnaires [77, 78] while others
have recommended evaluation criteria for the measurement properties of HMS [40, 79]. The
cross-cultural adaptation procedures accompanied by quality criteria for measurement properties provide researchers with a comprehensive set of tools to undertake a validation study
of an HMS. Table 3.1 outlines the essential measurement properties for HMS
evaluation, which are described briefly in Sections 3.1.1 to 3.1.6.
Table 3.1. Essential measurement properties of HMS.
Translation and cross-cultural adaptation
Internal consistency
Floor and ceiling effect (scale width)
Reproducibility
- agreement
- reliability
Validity
- content validity
- criterion validity
- construct validity
Responsiveness and interpretability
3.1.1. CROSS-CULTURAL ADAPTATION
The process of translation and cross-cultural adaptation of HMS for use in other languages
has been thoroughly described and documented in two reviews by Guillemin et al. and Beaton
et al. [77, 78].
The purpose of cross-culturally adapting HMS is to look at issues of translation and cultural
adaptation when preparing it for use in another language than the source language and in
different settings. It involves the adaptation of questionnaire instructions and items including
the response options and comprises six stages:
Stage 1 is a forward translation of the HMS into the target language by two independent
and bilingual translators (T1 & T2). Translator 1 should be aware of the underlying HMS
constructs to provide clinical equivalency whereas translator 2 is naive to the concepts being
quantified.
Stage 2 is a synthesis of the two translations by the two translators and the responsible investigator to produce one common translation (T-12). Discrepancies are resolved by consensus
and documented in a written report.
Stage 3 is a back translation of T-12 from the target to the source language by two translators
(BT1 & BT2). Both translators are unaware of the content of the original HMS and have the
source language as their mother tongue.
Stage 4 is a meeting of an expert committee which comprises methodologists, health professionals, language professionals, the translators, and the responsible investigator. The main
purpose is to review all translations together with the written reports, discuss disagreements
and reach consensus in four areas:
• Semantic Equivalence (differences in the meaning of the words, grammatical difficulties)
• Idiomatic Equivalence (jargon may be difficult to translate)
• Experiential Equivalence (experiences of daily life may differ from one culture to another)
• Conceptual Equivalence (words may hold different conceptual meaning between cultures)
The final translation of the HMS should be comprehensible to a 12-year-old person.
Stage 5 is testing of the pre-final version of the HMS on 30-40 patients from the target settings. All patients complete the questionnaire and are subjected to a structured interview. The
interview is designed to probe the patient’s perception of the purpose and meaning of each item,
difficulties in question comprehension, patterns of missing items, and comments on layout and
content.
Stage 6 is a final audit of the translation process carried out by either the HMS developer
or the coordinating committee and will not result in alterations of the content of the HMS.
A summary of the process of translation and cross-cultural adaptation is provided in Figure
3.1.
Figure 3.1. Graphic representation of the stages of cross-cultural adaptation.
Stage 1: Translation
- Two translations (T1 & T2) into target language
- Informed + uninformed translator
Written report for each version (T1 & T2)
Stage 2: Synthesis
- Synthesize T1 & T2 into T-12
- Consensus on discrepancies with translators’ reports
Written report
Stage 3: Back translation
- Two English first-language translators
- Naïve to outcome measurements
- Work from T-12
- Create two back translations BT1 & BT2
Written report for each version (BT1 & BT2)
Stage 4: Expert committee review
- Review all reports
- Methodologist, developer, language professional, translators
- Consensus on discrepancies
- Produce pre-final version
Written report
Stage 5: Pretesting
- n = 30-40
- Complete questionnaire
- Probe to get an understanding of item
Written report
Stage 6: Submission and appraisal of all written reports by developers/committee
Reference [78]
3.1.2. INTERNAL CONSISTENCY
In HMS designed to measure one particular health construct (e.g. pain related function), it
is desirable that all items reflect that particular latent variable or dimension so that the items
may be summed [11]. Internal consistency (or item homogeneity) is based on parallel assessments of patients at an instant in time, and some authors consider it an aspect of reliability as it
provides an index of a scale’s ability to differentiate among patients at an instant in time [16,53].
It refers to the extent to which scores of the items are related to each other (inter-item correlation) or to the total scale score (item-total correlation). With regard to scale construction, three
important scenarios emerge which may result in item elimination: 1) items showing a high
inter-item correlation coefficient may be redundant, 2) items with a low inter-item correlation coefficient
may be measuring a different construct, and 3) items demonstrating either very high or low
correlation with the total score should be discarded.
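The item-total screening described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the response data are invented, the function names are hypothetical, and the item-total correlation is computed against the scale total omitting the item itself, with the 0.20 cut-off taken from the erratum to this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(scores):
    """Correlate each item with the scale total omitting that item.

    scores: one list of item scores per respondent.
    """
    n_items = len(scores[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]
        correlations.append(pearson(item, rest))
    return correlations


def flag_items(scores, low=0.20):
    """Return the indices of items whose item-total correlation is below `low`."""
    return [j for j, r in enumerate(corrected_item_total(scores)) if r < low]


# Invented data: 6 respondents, 4 items; the last item is unrelated to the rest.
scores = [
    [1, 2, 1, 3],
    [2, 2, 3, 1],
    [3, 4, 3, 3],
    [4, 4, 5, 1],
    [5, 5, 4, 2],
    [2, 3, 2, 2],
]
print(flag_items(scores))  # indices of candidate items for elimination
```

A very high item-total correlation would be screened in the same way with an upper cut-off; none is stated in the text, so none is assumed here.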
Frequently reported internal consistency coefficients include the split-half coefficient, item-total
correlation, Kuder-Richardson 20 and 21 coefficients, and Cronbach’s coefficient α. Of
these, Cronbach’s α¹ is reported most frequently as it is useful to determine which items to
retain and which to reject to form an internally consistent HMS. Cronbach’s coefficient α ranges
from 0 (uncorrelated items) to 1 (perfect item correlation); however, interpretation is not as
straightforward as this may imply. First, α depends on the number of items in the scale, thus
increasing the number of items will increase α. Second, HMS measuring more than one latent
variable will usually have a high α as the different dimensions are often correlated to each other.
Last, if α is too high this may suggest item redundancy and a reasonable interval of 0.7-0.9 has
been suggested [80, 81].
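As an illustration of the coefficient and of its dependence on scale length, the sketch below computes α from item and total score variances, α = n/(n − 1) × (1 − Σσi²/σT²), and also evaluates the standardised form α = k·r̄ / (1 + (k − 1)·r̄) for a fixed mean inter-item correlation r̄. The function names and the numbers are assumptions made for this example; the standardised form is a textbook identity and is not taken from this thesis.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha from raw scores (one list of item scores per respondent)."""
    n_items = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


def standardised_alpha(n_items, mean_inter_item_r):
    """Alpha implied by n_items items sharing a given mean inter-item correlation."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


# Three items that order the respondents identically: alpha reaches its maximum.
identical_items = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(identical_items), 3))

# Same mean inter-item correlation (0.3, hypothetical), increasing scale length.
for k in (5, 10, 20):
    print(k, round(standardised_alpha(k, 0.3), 3))
```

The loop shows why a longer scale alone can push α into or beyond the suggested 0.7-0.9 interval without the items becoming more closely related.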
3.1.3. FLOOR & CEILING EFFECT (SCALE WIDTH)
A useful HMS must cover a broad spectrum of the measured health construct to provide
room on the response scale for patients to demonstrate improvement or deterioration. Problems arise when patients score either the highest (most of the scale attribute) or lowest (least
of the scale attribute) possible scale score as these scores do not allow for any further deterioration or improvement, respectively. Accordingly, a patient scoring the highest scale score is
said to have reached the “ceiling” whereas patients scoring the lowest score have reached the
“floor” of the scale [16]. “Floor” and “ceiling” effects are usually expressed as the proportion
of patients returning the lowest and highest possible scores, and McHorney & Tarlov have
suggested a rate of less than 15% to be acceptable [82].
The concepts of “floor” and “ceiling” effect can be extended to include the idea of scale
width [83]. Scale width is defined as the region of the score range of an HMS with the capacity
to allow detection of change in scores over time which is not due to measurement error. Instrument measurement error is often reported as the minimal detectable change (MDC95%) or
Bland and Altman’s limits of agreement (LOA). As an acceptable rate, one could suggest that
HMS with more than 10% of the patients scoring within measurement error at each end of the
scale range should not be used. This has been used in the current thesis (Figure 3.2).
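Both criteria reduce to simple proportions, which the following sketch makes explicit. It is an illustration only: the scores and the measurement error of 10 points are invented, the function names are hypothetical, and "within measurement error" is interpreted here as lying strictly closer to a scale end than the measurement error, which is an assumption.

```python
def floor_ceiling(scores, minimum=0, maximum=100):
    """Proportion of patients at the lowest and at the highest possible score."""
    n = len(scores)
    floor = sum(s == minimum for s in scores) / n
    ceiling = sum(s == maximum for s in scores) / n
    return floor, ceiling


def scale_width(scores, measurement_error, minimum=0, maximum=100):
    """Proportion of patients scoring within measurement error of each scale end."""
    n = len(scores)
    low_end = sum(s < minimum + measurement_error for s in scores) / n
    high_end = sum(s > maximum - measurement_error for s in scores) / n
    return low_end, high_end


# Invented baseline scores on a 0-100 scale; measurement error (LOA or MDC95%) assumed to be 10.
scores = [0, 5, 12, 30, 45, 50, 62, 70, 88, 100]

floor, ceiling = floor_ceiling(scores)
low_end, high_end = scale_width(scores, measurement_error=10)

# Conventional criterion: less than 15% at floor and at ceiling.
conventional_ok = floor < 0.15 and ceiling < 0.15
# Scale width criterion: not more than 10% within measurement error at each end.
scale_width_ok = low_end <= 0.10 and high_end <= 0.10

print(floor, ceiling, conventional_ok)
print(low_end, high_end, scale_width_ok)
```

In this invented sample the conventional criterion is met while the scale width criterion is not, which shows that the latter is the stricter of the two.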
3.1.4. REPRODUCIBILITY
Reproducibility of an HMS in research and clinical practice is sine qua non². It has been
defined as the degree to which an instrument yields comparable results if it is used repeatedly
on stable patients [9]. An HMS will always show some degree of score fluctuation despite no
changes in the patient and in the absence of treatment, however, large fluctuations provide
results which cannot be trusted. The sources of score fluctuations are many and in instruments
related to LBP patients the stability is often influenced by within-patient, instrument and setting
variances. For example, using a homogeneous patient sample with scores restricted to a
small part of the available range will reduce the reliability coefficients as it is more difficult to
discriminate between the subjects.
A further consideration in reproducibility studies is the time frame for data collection,
which may influence the coefficients. If the time period is long, say 3-4 weeks for LBP patients, then it is likely that the included patients have undergone a real change, especially if the
condition is expected to change rapidly. The consequence is reduced coefficients. Conversely,
a short time interval may increase the memory effect and overestimate the coefficients. To
understand the concept of reproducibility it is important to differentiate between agreement
and reliability [9, 40], which will be described briefly.
Figure 3.2. “Floor” and “ceiling” effects versus scale width for a fictive HMS.
[Figure: scale range from 0 (no disability) to 100 (worst possible disability). Conventional definition: “floor” (patients scoring 0) and “ceiling” (patients scoring 100), each < 15% = acceptable. Scale width: patients scoring within “measurement error” (LOA or MDC95%) of either end of the scale, each < 10% = acceptable.]
LOA, Bland and Altman’s limits of agreement; MDC95%, minimal detectable change at 95% confidence
level [82, 83]
¹ Cronbach’s $\alpha = \frac{n}{n-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_T^2}\right)$, where α = alpha, n = number of items, $\sigma_i^2$ = variance of the scores on item i, $\sigma_T^2$ = variance of the total scores [11]
² Latin: “without which it could not be”
Agreement
Agreement involves the measurement error of the HMS and is an expression of how close
the scores on repeated measurements are3 . It is expressed in the same units as the HMS in
question and is an important property for evaluative instruments where clinically important
changes have to be differentiated from measurement error. The concept of agreement including
commonly used parameters is summarised in Figure 3.3 on the facing page.
3 This is sometimes referred to as test-retest reliability [84]
Figure 3.3. The concepts of agreement and reliability.
[Figure: four panels, each on a scale from 0 to 100. Agreement: 5 measurements in 1 person, shown for high and for low agreement; parameters: SEM, LOA, MDC95%. Reliability: 5 measurements in 2 persons, shown for low and for high reliability; parameters: ICC, Cohen’s Kappa, (Pearson c.c.).]
SEM, standard error of the measurement; LOA, Bland and Altman’s limits of agreement; MDC95%, minimum detectable change at 95% confidence level; ICC, intra-class correlation coefficient; Pearson c.c.,
Pearson correlation coefficient [9]
The standard error of the measurement (SEM) is the error associated with a measurement taken at
a single point in time [6, 85]. The SEM is one standard deviation of the error associated with a
single measurement, so that in 95% of the cases the patient’s true score will lie within
approximately ±2 SEMs of the observed value. Conceptually, the SEM can be estimated by two
equivalent methods:
• The square root of the error variance derived from a repeated measures analysis of variance [86]
• The equation $SEM = SD \cdot \sqrt{1 - R}$ [87]
The reliability coefficient (R) can be either a test-retest parameter such as the intraclass
correlation coefficient (ICC) or Cronbach’s α. It can be argued that the test-retest parameter is
more appropriate in the context of longitudinal changes in HMS scores, as it (in contrast to
Cronbach’s α) represents temporal stability [6]. The SEM is considered a fixed characteristic
regardless of the subjects under investigation. The independence of the sample and the fact
that it is expressed in the original metric of the instrument make the SEM more appropriate for
interpreting individual scores within a population compared to effect size [87].
The SEM can be used to calculate what has been termed the minimum detectable change
(MDC95%). This coefficient indicates when a change score of an HMS is outside measurement
error at the 95% confidence level. It is calculated from the following formula:
$$MDC_{95\%} = 1.96 \cdot \sqrt{2} \cdot SEM$$
The 1.96 signifies the 95% confidence level and the √2 accounts for the magnitude of the
measurement error in repeated measurements.
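As a hedged illustration of the two formulas (the SEM from SD and R, and the MDC95% from the SEM), the sketch below uses made-up values for the baseline SD and the reliability coefficient; the function names are hypothetical and not taken from any of the cited sources.

```python
import math


def sem_from_reliability(sd, reliability):
    """SEM = SD * sqrt(1 - R), with R a reliability coefficient such as the ICC."""
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sem):
    """MDC95% = 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


# Assumed example values: SD of 20 scale points and a test-retest ICC of 0.91
sd = 20.0
icc = 0.91

sem = sem_from_reliability(sd, icc)  # about 6 scale points
mdc = mdc95(sem)                     # about 16.6 scale points

print(f"SEM = {sem:.2f}, MDC95% = {mdc:.2f}")
```

Under these assumed values, an individual change score would have to exceed roughly 16.6 points before it could be considered outside measurement error at the 95% confidence level.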
Another popular method for establishing agreement is the method by Bland and Altman
[88, 89]. It uses the idea of LOA which compares two measurements obtained by the same
method. The difference between the measurements is plotted against the mean of the same
measurements with 95% limits calculated as the mean difference ±1.96 SDs of the differences. Thus,
approximately 95% of the differences between the two measurements are expected to lie between
these limits. An example of a limits of agreement plot is shown in Figure 3.4.
Figure 3.4. Bland and Altman’s limits of agreement plot
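The limits of agreement described above can be sketched as follows. The test and retest scores are invented for illustration and the function name is hypothetical; the calculation itself is simply the mean difference ±1.96 SDs of the differences.

```python
import statistics


def limits_of_agreement(first, second):
    """Return (lower limit, mean difference, upper limit) for paired measurements."""
    differences = [b - a for a, b in zip(first, second)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample SD of the differences
    half_width = 1.96 * sd_difference
    return mean_difference - half_width, mean_difference, mean_difference + half_width


# Hypothetical test and retest scores for six stable patients
test_scores = [40, 52, 31, 65, 48, 57]
retest_scores = [42, 50, 35, 63, 49, 60]

lower_limit, bias, upper_limit = limits_of_agreement(test_scores, retest_scores)

# Coordinates for a limits of agreement plot: mean of each pair (x) against difference (y)
plot_points = [((a + b) / 2, b - a) for a, b in zip(test_scores, retest_scores)]
```

A plot would show `plot_points` together with horizontal lines at `lower_limit`, `bias` and `upper_limit`.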
Reliability
Reliability concerns the discriminative ability of an HMS and should be emphasized if discrimination between different levels of the measured health construct is important. In other
words, how well can patients be distinguished from each other, despite measurement error
(Figure 3.3)? Reliability coefficients are expressed as a ratio between 0 and 1, with zero indicating
no reliability, and one indicating no measurement error and perfect reliability. It follows the
basic formula [11]:
$$\text{Reliability} = \frac{\text{Subject Variability}}{\text{Subject Variability} + \text{Measurement Error}}$$
Reliability is best described by the ICC where ICCagreement accounts for systematic error
variance and ICCconsistency does not⁴. Both coefficients depend on the heterogeneity of the
measured construct in the sample under study, and basic formulas are:
$$ICC_{agreement} = \frac{\sigma^2_{subjects}}{\sigma^2_{subjects} + \sigma^2_{systematic} + \sigma^2_{residual}}$$
$$ICC_{consistency} = \frac{\sigma^2_{subjects}}{\sigma^2_{subjects} + \sigma^2_{residual}}$$
In the formulas, $\sigma^2_{subjects}$ is the variance resulting from the subjects (or patients) included
in our study, $\sigma^2_{systematic}$ represents the systematic difference between the two measurements,
and $\sigma^2_{residual}$ denotes the unexplained measurement error. For a more comprehensive review of reliability I refer to de Vet et al. [9].
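To make the difference between the two coefficients concrete, the sketch below estimates the three variance components from a subjects × occasions table and inserts them into the formulas. It assumes a two-way analysis of variance with one score per cell and method-of-moments estimates of the components; this estimation route and all names are assumptions made for illustration, not a procedure prescribed in this thesis.

```python
def icc_agreement_and_consistency(scores):
    """scores: one row per subject, one column per measurement occasion."""
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of measurement occasions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    subject_means = [sum(row) / k for row in scores]
    occasion_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_subjects - ss_occasions

    ms_subjects = ss_subjects / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))

    # Variance components (negative estimates are truncated at zero)
    var_residual = ms_residual
    var_subjects = max((ms_subjects - ms_residual) / k, 0.0)
    var_systematic = max((ms_occasions - ms_residual) / n, 0.0)

    agreement = var_subjects / (var_subjects + var_systematic + var_residual)
    consistency = var_subjects / (var_subjects + var_residual)
    return agreement, consistency


# Hypothetical test-retest data in which every patient scores one point higher at retest
data = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc_agreement, icc_consistency = icc_agreement_and_consistency(data)
```

In this constructed example the ranking of the patients is perfectly preserved, so the consistency coefficient is 1, whereas the agreement coefficient is lower because the systematic one-point shift is counted as error.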
3.1.5. VALIDITY
Validity is a process of determining if the scale is measuring what we think it is; that is,
can we make valid statements about a person based on his or her score on the HMS. Thus,
validation processes are not so much directed towards the integrity of the scale but more a
process of determining the degree of confidence we can place on inferences made about the
scale scores. Consistent with this concept is the notion that knowledge of an instrument’s
validity is constantly evolving as new information becomes available [11, 16, 53].
Traditionally, validity has been subdivided into the trinitarian Cs: Content validity, Criterion validity and Construct validity. Unfortunately, the concepts of validity have been defined
and structured differently in the literature creating a certain degree of confusion. However, as
stated by Streiner and Norman [11]:
The important questions are, “Does the hypothesis of this validation study make sense in light
of what the scale is designed to measure”, and “Do the results of this study allow us to draw the
inferences that we wish to make?”
More recently, several additional terms have been added to distinguish among different assessment approaches. As these terms are presented in most textbooks and scientific articles they
will be discussed briefly (Figure 3.5 on the next page). For a more in-depth description of validity and its many approaches I refer to well-known textbooks of psychometric theory [11, 16, 53].
Content Validity
Content validity concerns the extent to which an HMS is composed of a representative
sample of questions that assesses the target domain. High content validity will ensure broad
applicability of a measure as the inferences hold true under a variety of different situations.
It involves a critical examination of the instrument structure, a review of the development
procedures, and consideration of applicability to the intended research question. Two related
concepts are central to content validity: 1) item coverage and relevance, and 2) face validity
[53].
4 The terminology of agreement and consistency is not to be confused with other meanings used in the literature.
The ICCagreement simply includes the systematic error variance ($\sigma^2_{systematic}$) whereas the ICCconsistency does not. They are
both reliability parameters.
Figure 3.5. Concepts of validity
[Figure: tree diagram. Validity divides into content validity (face validity, item coverage), criterion validity (concurrent validity, predictive validity) and construct validity (convergent validity, known group validity, discriminant validity).]
References [11, 16, 53]
Item coverage and relevance involve selection of proper items from a pool of questions. The
pool of questions is often collected from focus groups or key informant interviews. This usually
leaves scale developers with far more items than will ultimately end up in the instrument, and
a selection process has to occur. Two aspects are important for a rigorous selection process: 1)
each item should be evaluated for relevance in terms of the target domain, and 2) the chosen
items should cover the target domain in all aspects.
Face validity considers whether items in an instrument appear “on the face of it” to measure
what they are intended to measure clearly and unambiguously. It is often considered an aspect
of content validity with the main difference being the timing of the critical review: face validity
occurs after the HMS has been constructed while most of the procedures of content validity
take place during the development procedures [53].
Criterion Validity
Criterion validity has traditionally been defined as the correlation of a scale with the “true
value”, or with some other standard that is accepted as providing an indication of the “true
value” of the trait or disorder under study [11]. It can be divided into concurrent validity and
predictive validity.
Concurrent validity is the correlation of a new HMS with the criterion measure that is obtained at approximately the same point in time. If agreement between the two instruments is
considered to be poor, the concurrent validity is low.
Predictive validity has been defined as the ability of the instrument to predict future health
status, future events, or future test results. Therefore, the future health status/event/test serves
as a criterion to which the instrument is compared. A fictive example could be that HMS scores
from patients with subacute LBP are predictive of their work status, thus providing additional
prognostic information.
Construct Validity
The concept of construct validity refers to the extent to which a particular HMS relates to
other instruments in a manner which is consistent with theoretically derived hypotheses concerning the concepts that are being measured [45]. It involves forming theories (or constructs)
to explain the relationships among various behaviours or attitudes of interest and then assessing the extent to which the HMS provides results that are consistent with the theories. More
formally, construct validity embraces a variety of techniques which are aimed at two things: 1)
whether the theoretically postulated construct appears to be an adequate model, and 2) whether
the HMS appears to correspond to that postulated construct. Three main types of construct
validity are described [53].
Known-groups validity is a simple form of construct validation. It refers to a validation process that examines two distinct groups - one of which has the attribute, and the other does
not. The group with the attribute should show higher scores on the HMS (higher = more of
the attribute) in comparison to the other group. For example, it is likely that patients with
longstanding chronic LBP will have higher scores on the pain catastrophising scale compared
to persons without LBP.
Convergent validity examines the extent to which a dimension measured in an HMS correlates appreciably with all other dimensions that are believed to be related to it. For example, it
is likely that patients with longstanding chronic LBP are predisposed to catastrophize [90] for
which reason there would be a correlation between the pain score and the pain catastrophizing
scale.
Discriminant (or divergent) validity, on the other hand, recognises that a dimension measured
in an HMS may be relatively unrelated to other dimensions not associated with it. For example,
we would expect chronic LBP patients’ scores on a functional HMS to correlate more with the
SF36 physical function subscale than with the role emotional subscale.
Study designs for known-group, convergent and discriminant validity can be either cross-sectional or longitudinal depending on which construct one chooses to examine [16].
3.1.6. RESPONSIVENESS AND INTERPRETABILITY
Conceptual Framework
Numerous definitions of the concept of responsiveness have been described in the literature, and no single one is commonly accepted [61]. The many definitions have been grouped
into three main categories which are summarised in Table 3.2 on the following page.
Table 3.2. Definitions of responsiveness.
Definition categories of responsiveness | Type of change detected
1. “The ability to detect change in general” | Any type of change, regardless of whether it is relevant or meaningful
2. “The ability to detect clinically important change” | A clinically important change. Requires an explicit, although often subjective, judgement on what changes are important
3. “The ability to detect real changes in the concept being measured” | An extension of 1 & 2. Requires a “gold standard” in addition to judgement of what changes are important
Reference [61]
As an operational definition for this thesis, I have chosen a simple version which encompasses all types of change by omitting the nature of it [3–5]:
“The ability of an instrument to detect accurately change when it has occurred”
The literature shows some contention about whether responsiveness should be considered a
part of validity or a separate attribute. Several theorists maintain that responsiveness is part
and parcel of validity involving concurrent comparisons of the change, most akin to criterion
validity [11,91,92]. Other authors regard it to be conceptually useful to consider responsiveness
as a distinct measurement property from validity, albeit one that affects the range of valid
applications [45, 93–95].
Several conceptual frameworks for understanding the various methodological approaches
to responsiveness have been published (Table 3.3 on the next page). Central to all of them is the
link between methodological design features and the concepts being measured. For the purpose of this thesis I will focus on the framework of the distribution-based and anchor-based
approaches, and the methods within these approaches which are relevant for the papers included in Section 11 will be dealt with in detail. A complete description of the different indices
can be found in Terwee et al. and Crosby et al. [61, 63].
Distribution-based Approaches
Probably the most popular strategies are the distribution-based methods which are based
on statistical characteristics of the sample. They rely on relating the difference between pre- and post-treatment scores to some measure of variability [65, 97, 98]. The values obtained by
these methods can be used to concurrently compare the responsiveness of HMS when applied
to the same sample. However, all the distribution-based indices are limited in that they do not indicate the importance of the observed change [6]. A summary of commonly used approaches
with the sources used to define an important change is given in Table 3.4 on page 38.
Table 3.3. Proposed conceptual framework for responsiveness
Authors
Conceptual framework
Lydick et al. (1993)
- describes two approaches to define clinically meaningful change:
a) distribution-based approaches are based on the statistical characteristics of the
obtained sample, and three types have been identified: i. those based on
statistical significance; ii. those based on change in relation to sample
variation, and iii. those based on measurement precision
b) anchor-based approaches are based on comparisons of HMS scores to other
measures or phenomena that have clinical relevance
Stratford et al. (1996)
- identifies five study designs for assessing responsiveness. Presents theoretical
considerations of which statistical analytical approach is optimal for each design
Husted et al. (2000)
- suggests two major areas of responsiveness:
a) internal responsiveness, which characterises the ability of a measure to
change over a particular pre-specified time frame
b) external responsiveness, which reflects the extent to which changes in a
measure over a specified time frame relate to corresponding changes in a
reference measure of health status
Beaton et al. (2001)
- describes a taxonomy of responsiveness that provides a triple-axis matrix to
classify responsiveness studies:
a) the “Who” axis. Who is being studied: individuals or groups?
b) the “Which” axis. Which information is being studied: i. between-person
differences; ii. within-person differences, or iii. both
c) the “What” axis. Differentiates five concepts of change: i. the minimum
change; ii. the minimum detectable change; iii. the “observed change”; iv.
the “estimated change”, and v. the “important change”
Stratford et al. (2005)
- identifies three study designs relevant for assessing sensitivity to change and
suggests optimal statistical approaches for each design.
References [4, 62, 64, 65, 96]
Three sub-divisions of this approach have been described in the literature depending on:
1) statistical significance, 2) sample variation, or 3) measurement precision [63]. The first strategy is based on statistical significance and evaluates the probability of the observed change
occurring by random variation. For example, LBP patients undergoing treatment with known
efficacy are tested with a pain scale at baseline and after treatment cessation. The mean baseline score can be compared to the mean score at follow-up using the paired t-test or Wilcoxon
signed-rank test.
A second broad strategy assesses change in relation to sample variation. Typically, the
numerator is the mean change score for a group of subjects which can be regarded as the “signal” [97]. This is divided by the variability in the sample - the “noise”. The most common
example is the effect size where the mean change score is divided by the standard deviation of the
baseline scores. An effect size of 1 would therefore indicate a magnitude of change equal
to one standard deviation of the baseline score [105]. Interpretation of the effect size may follow the definitions by Cohen who defined a small effect size to be between 0.2-0.5, a medium
as 0.5-0.8, and large as greater than 0.8 [106]. Several advantages of the effect size have been
described: 1) it uses standardised units comparable across HMS, and 2) it uses the pre-test scores as the
denominator (“noise”), which can be considered a proxy for control group scores [97]. Therefore, the effect size quantifies the extent to which the magnitude of the change scores exceeds
the “noise” of normal variability. Last, effect size is independent of sample size. However,
critics have challenged the effect size since it: 1) decreases with increasing baseline sample
variability, 2) does not consider the variability of the change scores, and 3) may vary widely
among samples [63].
Table 3.4. Distribution-based methods for determining change
Measure | Gold standard* (reference) | Calculation
Paired t-test | T (Moayyedi et al., 1998); P (Guyatt et al., 1987) | $(x_1 - x_0) / \sqrt{\sum (d_i - \bar{d})^2 / (n(n-1))}$
Growth curve analysis | T (Speer et al., 1995) | $B / \sqrt{V}$
Effect size | T (Kazis et al., 1989); P (Fitzpatrick et al., 1992) | $(x_1 - x_0) / \sqrt{\sum (x_0 - \bar{x}_0)^2 / (n-1)}$
Standardised response mean | T (Norman et al., 1997); P (Beaton et al., 1997) | $(x_1 - x_0) / \sqrt{\sum (d_i - \bar{d})^2 / (n-1)}$
Guyatt’s responsiveness statistic | D (Deyo et al., 1991)**; T (Guyatt et al., 1987) | $(x_1 - x_0)^{**} / \sqrt{\sum (d_{i,stable} - \bar{d}_{stable})^2 / (n-1)}$
Standard error of the measurement (SEM) | T (Wyrwich et al., 1999) | $\sqrt{\sum (x_0 - \bar{x}_0)^2 / (n-1)} \cdot \sqrt{1 - R}$
Reliable change index | T (Jacobson et al., 1991) | $(x_1 - x_0) / \sqrt{2(SEM)^2}$
* The gold standard is the source which can be used to define an important change.
** Can be replaced by the minimal clinically important difference determined by the doctor.
D, important change according to the doctor; P, important change according to the patient; T, change due
to treatment effect.
Key: x0, pre-test score; x1, post-test score; di, pre-to-post difference score for subject i; d̄, mean difference
score; n, sample size; R, reliability of the HMS; B, empirical Bayes estimate of the individual slope; V,
empirical Bayes estimate of the standard error of the slope; SEM, standard error of the measurement
[61, 63, 71, 85, 94, 97, 99–104]
In contrast to the effect size, the standardised response mean (SRM) has variability of the
sample change scores as the denominator whereas the numerator is the same. A large SRM
indicates that the change is large relative to the background measurement variability. The
advantages of the SRM are: 1) it uses standardised units, 2) it is independent of sample size, and
3) it is based on variability of the change [63]. Disadvantages are: 1) that seemingly comparable
individual changes may have different SRM values as it is dependent on the variability of the
change in the sample, and 2) it varies as a function of the effectiveness of the treatment.
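The difference between the effect size and the SRM lies only in the denominator, which a short calculation makes visible. The sketch below is illustrative: the pre- and post-treatment scores are invented, the function names are hypothetical, and sample standard deviations (n − 1 in the denominator) are assumed throughout. The paired t statistic from the first strategy is included for comparison.

```python
import math
import statistics


def effect_size(pre, post):
    """Mean change divided by the SD of the baseline scores."""
    mean_change = statistics.mean(post) - statistics.mean(pre)
    return mean_change / statistics.stdev(pre)


def standardised_response_mean(pre, post):
    """Mean change divided by the SD of the individual change scores."""
    changes = [b - a for a, b in zip(pre, post)]
    return statistics.mean(changes) / statistics.stdev(changes)


def paired_t(pre, post):
    """Mean change divided by the standard error of the mean change."""
    changes = [b - a for a, b in zip(pre, post)]
    standard_error = statistics.stdev(changes) / math.sqrt(len(changes))
    return statistics.mean(changes) / standard_error


# Hypothetical scores for five patients before and after treatment
pre_scores = [10, 20, 30, 40, 50]
post_scores = [14, 22, 36, 44, 59]

es = effect_size(pre_scores, post_scores)
srm = standardised_response_mean(pre_scores, post_scores)
t_value = paired_t(pre_scores, post_scores)
```

In this example the baseline scores vary far more than the change scores, so the same mean change of 5 points yields a small effect size but a large SRM, which shows why the two indices cannot be interpreted interchangeably.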
The last strategy is based on the measurement precision of the instrument. Examples are
the SEM (Section 3.1.4) or the reliable change index. These indices evaluate change in relation
to variation of the instrument and not in relation to the sample.
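A minimal sketch of the reliable change index for a single patient, using the calculation listed in Table 3.4 (change score divided by √(2·SEM²)), is given below. The scores and the SEM are made-up values, and the threshold of 1.96 for calling a change reliable is the conventional 95% level, stated here as an assumption rather than taken from the text.

```python
import math


def reliable_change_index(pre_score, post_score, sem):
    """Change score divided by the standard error of the difference, sqrt(2 * SEM^2)."""
    return (post_score - pre_score) / math.sqrt(2.0 * sem ** 2)


# Hypothetical patient: score falls from 60 to 40 on an HMS with an assumed SEM of 6 points
rci = reliable_change_index(60.0, 40.0, 6.0)

# Assumed convention: an absolute RCI above 1.96 is unlikely to be measurement error alone
is_reliable_change = abs(rci) > 1.96
```

Because only the SEM enters the denominator, the result depends on the precision of the instrument and not on the variability of the sample in which the patient happens to be studied.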
In summary, the distribution-based approaches are by far the most popular methods seen
in clinimetric studies. They provide a means of measuring change beyond some level of
random variation and have a common metric with comparable interpretation across HMS
and study populations. On the downside, few agreed-upon benchmarks for establishing
clinically significant improvement have been established. Second, the reported results of
these methods do not provide the clinician or researcher with an intuitive sense of what is
a clinically meaningful and relevant change. The SEM and reliable change index have been
proposed as the most promising indices since they are only influenced to a minor degree
by baseline score variability, variability of the change scores, and the sample size [63].
Anchor-based Approaches
The second strategy involves comparing changes in an HMS over a specified time frame
to corresponding changes in a reference measure that has clinical relevance. This allows for
determination of clinically meaningful change scores, and Jaeschke et al. [8] coined the term
minimal clinically important difference in 1989 and defined it as:
“The smallest difference in score in the domain of interest which patients perceive as beneficial
and which would mandate, in the absence of troublesome side effects and excessive cost, a change
in the patient’s management”
Several anchor-based methods have been reported in the literature, ranging from correlation
methods through diagnostic testing methods to regression models, and both cross-sectional
and longitudinal designs have been described [61, 63, 64]. A summary of the most important
designs is presented in Table 3.5 on the next page. The following description and discussion
will be limited to the longitudinal design involving a global rating of change.
Correlation of Change Scores with a Global Rating of Change
This method examines the relationship between HMS change scores and an independent
external measure (or anchor) serving as an aid to learning about how to interpret the HMS [96].
The most commonly used external criterion is the patients’ global retrospective assessment of
treatment effect (transition question, TQ) [13,60,63,69,107–109] even though other criteria such
as the presence or absence of symptoms, differences among diagnostic groups and amount of
health care utilisation have been used [68, 110].
The TQ approach uses a retrospective question such as: "Are you feeling better or worse,
and if so, what is the extent of the change?" and the response options usually range from "much
worse" to "much better". However, the number of in-between categories varies from one study
to another (range: 3 to 13) [69, 107, 111].
Table 3.5. Anchor-based methods for determining change
Method | Description | Gold standard (reference)
Cross-sectional designs
1. Comparison to disease-related criteria | Group comparisons in terms of standardised severity levels or diagnoses. Differences in mean scores across the groups are used to estimate the MCID | Disease severity or diagnosis (Deyo et al., 1982)
2. Comparison to non-disease related criteria | Links the change in HMS scores which occurs when an external non-disease event (e.g. death of a spouse) takes place i.e. before event and after event score | External life event (Testa et al., 1996)
3. Preference ratings | Patient comparisons of own health state to hypothetical health states on a pairwise basis. Differences in HMS scores rated “barely different” equates the MCID | Own health state (Llewellyn-Thomas et al., 1996)
4. Comparison to known population(s) | Comparison of dysfunctional to functional populations. Describes recovery status in relation to SDs between the groups | Functional or dysfunctional populations (Jacobson et al., 1991)
Longitudinal designs
1. Global ratings of change | Changes in HMS are correlated to patients’/clinicians’ global rating of improvement. Several methods to establish MCID e.g. 1) differences in mean scores between global rating categories, 2) application of diagnostic test procedures | Patients’ or clinicians’ global rating of improvement (Stucki et al., 1995; Deyo et al., 1986)
2. Prognosis of future events | HMS predicting a future event (e.g. care use, costs etc.). Difference in mean scores among those experiencing and those not experiencing the event used to establish the MCID | Those experiencing and not experiencing a future event (Ware et al., 1984)
3. Changes in disease related outcome | Comparison of changes in HMS to obtained or not obtained changes in other disease-related measures of outcome. Difference in HMS change scores between obtained and not obtained groups equates the MCID | Changes in clinical outcome (Kolotkin et al., 2002)
MCID, minimal clinically important difference [63, 93, 104, 112–116]
The selection of the independent anchor is made either by the patient him/herself, by a
clinician/expert, or both [8, 74, 117]. Several methodological problems have been described
when using the retrospective TQ [69, 71, 118] and these are summarised in Table 3.6 on the
facing page. However, Hägg et al. [72] showed the recall bias to be a “bidirectional overestimation” affecting both improved and worsened patients and interpreted this as an expression of
greater sensitivity to change. This was valid for a 5-10 year period [73]. In addition they found
only weak evidence of motivational bias in a surgical patient group. Additional aspects of the
patients’ global retrospective assessment of treatment effect which deserve attention are their
construction and implementation. This has been left at the discretion of individual researchers
as no standardisation has been reported in the literature. Consequently, a variety of different
global ratings has been published together with a number of different analytical strategies.
Whether this affects the results reported in clinical trials using transition questions is unknown
and deserves clarification.
Table 3.6. Methodological weaknesses of the TQs as reported in the literature.
TQ weaknesses | Description
Recall bias | Refers to the inability of patients to recall their prior health state. This is most pronounced when the time span is long
Present-state bias | Refers to the correlation between the post-treatment score and the TQ. In other words, TQ ratings given by patients depend on how they are feeling “at present” and not the change experienced during the treatment period.
Motivational bias | Refers to the overestimation of the treatment response by patients who have undergone a cumbersome treatment
Contamination bias | Refers to the inability of patients to separate the measured health construct from other co-morbidities concurrently present when rating the TQ. Consequently, factors other than the measured construct may influence the TQ rating (i.e. for a LBP patient, this could be neck pain, headaches, a sprained ankle etc.)
Dependence bias | Refers to the methodological problem of allowing patients to concurrently rate both the HMS and the TQ. Consequently, there is dependence between the HMS and the external anchor – one may affect the other and vice versa.
References [71, 72, 118–120]
Neither self-report HMS nor transition ratings are “gold standard” measures. The transition question is nevertheless used as a de facto “gold standard” indicator of meaningful change in a person’s
health. A change in health state is a subjective evaluation where the individual concerned is the
best judge of whether the change was important and/or meaningful. Consequently, the patient-rated
global retrospective assessment of treatment effect is probably the most valid indicator of
change.
Receiver Operating Characteristic Method
Receiver Operating Characteristic (ROC) curve analysis originates from the signal detection
theory in the 1950s, where radar operators needed to be able to distinguish the “signal” of a
real target, from the background “noise” of the radar. Although primarily used in medicine to
assess the ability of diagnostic tests to identify diseased from non-diseased individuals, Deyo
and Centor [93] suggested that evaluative scales are analogous to “diagnostic tests” in the way
that they should be able to discriminate between clinical improvement and non-improvement
based on an external criterion (anchor). In this approach the ROC curve plots the “true positive
rate” (sensitivity) against the “false positive rate” (1-specificity) for a series of change scores or
“cut points”. The ROC curve is constructed using n change score cut-points that are plotted
on the graph. The points are then joined with a smooth curve. The process of constructing a
ROC-plot is illustrated in Figure 3.6.
Figure 3.6. The construction of a ROC curve.
[Figure: top panel, distribution of change scores (x-axis: change score, 1 to 8) for NC (n = 500) and Imp (n = 500) patients, with three stippled cut-point lines; middle panels, one classification table per cut-point; bottom panel, ROC curve with sensitivity (true-positive rate) on the y-axis, 1-specificity (false-positive rate) on the x-axis, and the AUC marked.]
Cut-point 2: change ≥2: Imp 500, NC 350; change ≤2: Imp 0, NC 150. Sensitivity: 1.00. Specificity: 0.30
Cut-point 4: change ≥4: Imp 480, NC 30; change ≤4: Imp 20, NC 470. Sensitivity: 0.96. Specificity: 0.94
Cut-point 6: change ≥6: Imp 325, NC 0; change ≤6: Imp 175, NC 500. Sensitivity: 0.65. Specificity: 1.00
TQ, transition question; Imp, patients who have changed an important amount according to an external
anchor; NC, patients who have not changed according to an external anchor; AUC, area under the curve; n,
number of patients.
Note: The graph at the top represents the distribution of a fictive cohort of 1000 LBP patients plotted according to their change scores. All patients have been classified as having changed an important amount
(Imp, n = 500) or stayed the same (NC, n = 500). The stippled lines represent examples of cohort dichotomisations (cut-points) in relation to their change scores, and sensitivity and specificity have been
calculated for each of the three cut-points. Finally, the sensitivity is plotted against 1-specificity for each
cut-point and a ROC-curve generated.
The resulting area under the ROC curve (ROCauc) indicates the probability of correctly ranking an improved or non-improved patient. A value of 1 for the ROCauc represents perfect accuracy - 100% correct diagnosis or identification of improved health status. On the other hand,
0.5 (50%) represents health status identification that is no better than chance alone. A useful
test will be one with a steep rising curve indicating that as sensitivity increases, the rate of
false positives remains low (Figure 3.6). A perfect test would be one where high sensitivity and
specificity were attained simultaneously. In real life, tests rarely perform so well, and sensitivity usually decreases as specificity increases and vice versa. The change score cut-point closest
to the top left corner of the graph indicates the value that gives the best sensitivity-specificity
trade-off. This point represents the optimal cut-point and has been equated with the MCID of
HMS [121].
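The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the thesis: the counts are those read from Figure 3.6, the function names are hypothetical, "closest to the top left corner" is taken to mean the smallest Euclidean distance to the point (0, 1), and the area is approximated with the trapezoidal rule rather than a smooth curve.

```python
import math

# Counts read from Figure 3.6: for each change-score cut-point, the number of
# improved (Imp) and no-change (NC) patients scoring at or above the cut-point.
N_IMP = 500
N_NC = 500
COUNTS = {
    2: {"imp_above": 500, "nc_above": 350},
    4: {"imp_above": 480, "nc_above": 30},
    6: {"imp_above": 325, "nc_above": 0},
}


def roc_points(counts, n_imp, n_nc):
    """Return (cut_point, sensitivity, specificity) for every cut-point."""
    points = []
    for cut, c in sorted(counts.items()):
        sensitivity = c["imp_above"] / n_imp          # true-positive rate
        specificity = (n_nc - c["nc_above"]) / n_nc   # true-negative rate
        points.append((cut, sensitivity, specificity))
    return points


def optimal_cut_point(points):
    """Cut-point whose ROC point lies closest to the top left corner (0, 1)."""
    best = min(points, key=lambda p: math.hypot(1 - p[2], 1 - p[1]))
    return best[0]


def auc_trapezoid(points):
    """Coarse trapezoidal approximation of the area under the ROC curve."""
    xy = sorted([(0.0, 0.0), (1.0, 1.0)] + [(1 - sp, se) for _, se, sp in points])
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


points = roc_points(COUNTS, N_IMP, N_NC)
best_cut = optimal_cut_point(points)
area = auc_trapezoid(points)
```

With only three cut-points the area is a rough estimate; a real analysis would use every observed change score as a candidate cut-point.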
The strengths of the ROC curve analysis have been summarised by Deyo et al. and Stratford
et al. [103, 122]:
• The difference between competing HMS to correctly classify improved and non-improved
patients can be statistically tested
• The use of an external criterion allows comparison of HMS abilities to measure clinically
meaningful change
• The point on the upper left-hand corner of the curve indicates the change score that is the
most efficient at correctly differentiating subjects who have improved from those who have
not
However, determination of the optimal cut-point (MCID) by the ROC curve method has three
main limitations. First, the choice of an appropriate anchor depends on the preference of the
researcher, and this is likely to affect the size of the obtained optimal cut-point. Second, the
external criterion has several methodological weaknesses, one of which is selecting the appropriate dividing-point on the transitional scale to dichotomise the cohort into improved and
non-improved patients (please refer to Section 3.1.6 for further discussion). Last, it has been
demonstrated that the optimal cut-point is dependent on the patient’s initial score [123], i.e. the
higher the patient’s baseline score, the larger the change score required before it is clinically
relevant. Whether baseline score is the only parameter which affects the optimal cut-point is
unknown.
In summary, the longitudinal anchor-based approaches are rapidly gaining popularity due
to their clear advantage of linking change in a HMS to a meaningful external criterion.
The advantage of global ratings is that they provide the single best measure of the clinical
significance of the change experienced by the individual. This has to be weighed against
the limitations of using transition ratings. First, and probably most important, is the absence
of a gold standard of clinically meaningful change, and the consequent reliance on a transitional scale as a de facto gold standard. Second, change score interpretation depends on
the reliability and validity of transition questions which have been questioned by several
authors. Finally, the optimal cut-point (MCID) may well lie within the
measurement error of the HMS, calling into question the validity of the obtained MCID.
Combined Approaches
The MCID can be determined using both distribution-based and anchor-based methods
[110]. The distribution-based method has the advantage of being able to determine the measurement precision of an HMS at the expense of interpretability of the change scores. On the
other hand, the anchor-based approach has the ability to establish the clinical significance of
the change scores at the expense of often being within HMS measurement error. Consequently,
a new breed of methods has been developed over the past few years which combine the advantages of the distribution-based and anchor-based methods [124–127].
The combined approaches have been developed to establish a more reliable minimal important difference (MID) in cancer research. In the method described by Eton et al. [125] results from both distribution-based and anchor-based analyses were synthesized into a range
of MIDs for the FACT-B and its subscales. Criteria of 1/3 SD, 1/2 SD and the SEM were chosen to represent the distribution-based MIDs and the range was established by calculating the
means of each criterion at all time points. The range of MIDs was refined by applying both
cross-sectional and longitudinal anchor-based analyses. These analyses involved establishing
clinically distinguishable comparison groups which were compared at a single point in time
(cross-sectional scores) and over time (longitudinal change scores). The effect sizes and means
of the group differences were calculated to establish the range of MIDs. Last, a synthesis of
the distribution-based and anchor-based data was carried out and the mean differences corresponding to effect sizes between 0.2 and 0.6 used to precisely specify the range of MIDs (effect
sizes in the low- to mid-part are most likely to incorporate the MID).
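The three distribution-based criteria named above can be illustrated with a short sketch. It assumes the standard definition SEM = SD × √(1 − reliability), uses hypothetical scores and a hypothetical reliability coefficient, and works on a single time point, whereas Eton et al. averaged each criterion over all time points.

```python
import math
import statistics


def distribution_based_mids(scores, reliability):
    """Return the 1/3 SD, 1/2 SD and SEM criteria for one set of scores.

    `reliability` is a reliability coefficient between 0 and 1 (for example
    an ICC); the SEM formula used here is the standard SD * sqrt(1 - r).
    """
    sd = statistics.stdev(scores)
    return {
        "one_third_sd": sd / 3,
        "half_sd": sd / 2,
        "sem": sd * math.sqrt(1 - reliability),
    }


# Hypothetical scale scores and reliability, for illustration only.
example_scores = [10, 20, 30, 40, 50]
mids = distribution_based_mids(example_scores, reliability=0.91)
mid_range = (min(mids.values()), max(mids.values()))
```

The smallest and largest of the three values give a first, purely statistical range, which the anchor-based analyses are then used to refine.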
The advantage of using a combined approach to establish a range of MIDs is the use of
multiple external criteria which are derived from:
• The statistical characteristics of the sample i.e. takes measurement error into account
• Patient-reported global outcome of treatment
• Patient-reported functional and/or pain outcomes
• Physician-reported functional and/or pain outcomes
Choice of Appropriate Change Coefficient
The choice of change coefficient in studies of responsiveness is complicated and no consensus on the optimal strategy exists (see Section 2.2.3 on page 21). The reasons for this confusion
are probably rooted in the many different definitions of responsiveness [61] and the absence
of a gold standard for change in health status. Consequently, authors have typically applied
multiple responsiveness indices to the same data set in order to increase confidence in their
conclusions [3, 115, 128]. Stratford et al. [62] have called this approach a “shotgun analysis”
since conflicts in the application of several indices exist. They outline guidelines for choosing the optimal change coefficient based on study design and sample change characteristics
(Table 3.7 on the next page).
Table 3.7. Study designs and their corresponding analytic methods [62].

Study design: Coefficients based on homogeneity of patients’ change characteristics
Description: Sample of patients expected to change by approximately the same amount over the study period.
Example: an effective intervention applied to a homogenous patient cohort who is expected to respond well.
Change coefficient: SRM
Statistical tests: Paired t-test; ANOVA (one within-patient factor i.e. before and after treatment)

Study design: Between group contrast coefficients
Description: Two or more identifiable subgroups of patients who are expected to change by different amounts over the study period.
Example: an effective intervention applied to patients with different severity of their problem or different diagnoses.
Change coefficient: ROC curve analysis
Statistical tests: Norman’s Srepeat; Unpaired t-test; ANOVA (with one within patient factor and one grouping factor i.e. amount of change)

Study design: Correlation coefficients
Description: Sample of patients, many of whom are expected to truly change by different amounts over the study period.
Example: an effective intervention applied to a heterogeneous patient cohort where only some are expected to respond well.
Change coefficient: correlation analysis; requires application of an external standard i.e. another similar HMS or a transition scale.

SRM, standardised response mean; ANOVA, analysis of variance; ROC, receiver operating characteristic
In addition, they suggest that researchers focus on two important issues when planning studies
of responsiveness: first, to develop a sound theoretical approach emphasizing the included
cohort’s likely change characteristics, and second, to select the more rigorous designs which
allow both the assessment of change and discrimination among different groups of patients.
4. OBJECTIVE AND AIMS
OBJECTIVE
The overall objective of the PhD thesis was to establish which questionnaires were most
appropriate in specific subgroups of patients with low back pain and to establish what patients
think is a clinically relevant change when scoring these questionnaires. In addition, we wanted
to explore whether low back pain patients would be able to determine an acceptable treatment
outcome before treatment begins.
AIMS
The specific aims of the PhD thesis were:
PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
• To translate and cross-culturally adapt the Oswestry Disability Index into the Danish language and to validate it in two sub-populations of low back pain patients (paper I-1 & I-2).
• To concurrently compare responsiveness and minimal clinically important differences for
commonly used pain and functional instruments in four sub-populations of low back pain
patients (paper I-3).
• To propose a standardised use of patients’ retrospective perception of treatment effect based
on analysis of responsiveness (paper I-4).
PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
• To develop a prospective method to determine patients’ acceptable outcome using standardised questionnaires and concurrently compare this to a well established retrospective
method and measurement error of the questionnaires (paper II-1).
5. METHODS
5.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
5.1.1. TRANSLATION AND CROSS-CULTURAL ADAPTATION
The translation and cross-cultural adaptation of the ODI (version 2.1) followed the five
stages outlined in recent guidelines (see Section 3.1.1 on page 26) [77, 78]. Four independent
translators, one methodologist, one clinician, one language specialist and a coordinator participated in the process. Divergences in translations were resolved by consensus and written
documentation was produced for each stage of the process. The pre-final version was tested for
content validity, wording, ease of understanding, and missing items in 40 patients (20 primary
sector (PrS) and 20 secondary sector (SeS) patients) followed by a semi-structured interview.
Further psychometric testing of the final version of the Danish ODI was carried out in a validation study.
5.1.2. PATIENTS AND SETTING
Inclusion criteria for the study were:
• Age above 18
• Presence of low back pain and/or leg pain when presenting to: 1) one of the included
chiropractic clinics, or 2) the out-patient hospital back pain clinic
• Able to read and understand Danish
Exclusion criteria were:
• Suspected pathological disorder of the spine (fractures, spinal infections or malignancy,
ankylosing spondylitis, rheumatoid arthritis or other inflammatory diseases)
• Patients with a known psychiatric disorder
A total of 233 consecutive patients with acute and chronic LBP were included in the study;
94 from the PrS (seven chiropractic practices) and 97 from the SeS (a hospital based multidisciplinary spinal unit). Questionnaire booklets were collected at baseline, at day one for PrS
patients, at one week for SeS patients, and eight weeks follow-up. A telephone interview was
conducted 3-5 days after the eight weeks follow-up by a professional interviewer from the
Danish National Institute of Social Research to obtain the patients’ retrospective assessment of
treatment effect (Appendix I on page 81).
5.1.3. OUTCOME MEASURES
The questionnaire booklet included the final version of the Danish ODI, the 23-item RMQ
[22, 50], the two subscales of the Low Back Pain Rating Scale: pain (LBPRSpain) and disability
(LBPRSdisability) [47] as well as the two subscales of the SF36: physical function SF36 (pf) and
bodily pain SF36 (bp) scales [129–131] (Appendix II on page 82).
The test-retest booklet (1 day/1 week follow-up) contained the Danish ODI with the questions rearranged and a global question of change (Appendix III on page 91).
The patients’ global retrospective assessment of treatment effect was measured using two
different transition questions (Appendix IV on page 94). Transition question 1 (TQ1) was a
7-point Likert scale [108] and transition question 2 (TQ2) was a 15-point scale [69]. In addition,
a 0-10 numeric rating scale of importance of the change was included. For paper I-2 and I-3 the
TQs were combined to one according to Table 5.1.
Table 5.1. Merging transition questions 1 and 2.
Transition question 2                    Transition question 1
A very great deal better                 Much better
A great deal better                      Much better
A good deal better                       Better
Moderately better                        Better
Somewhat better                          A little better
A little better                          A little better
Almost the same, hardly any better       No change
About the same                           No change
Almost the same, hardly any worse        No change
A little worse                           A little worse
Somewhat worse                           A little worse
Moderately worse                         Worse
A good deal worse                        Worse
A great deal worse                       Much worse
A very great deal worse                  Much worse
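The merging rule in Table 5.1 amounts to a lookup from the 15 TQ2 categories to the 7 TQ1 categories. The sketch below is one way to express it; the dictionary and function names are hypothetical, and it assumes that “About the same” is the middle TQ2 category and “No change” the middle TQ1 category.

```python
# Lookup from each TQ2 (15-point) category to its TQ1 (7-point) category,
# following Table 5.1. Names are illustrative, not taken from the thesis.
TQ2_TO_TQ1 = {
    "A very great deal better": "Much better",
    "A great deal better": "Much better",
    "A good deal better": "Better",
    "Moderately better": "Better",
    "Somewhat better": "A little better",
    "A little better": "A little better",
    "Almost the same, hardly any better": "No change",
    "About the same": "No change",
    "Almost the same, hardly any worse": "No change",
    "A little worse": "A little worse",
    "Somewhat worse": "A little worse",
    "Moderately worse": "Worse",
    "A good deal worse": "Worse",
    "A great deal worse": "Much worse",
    "A very great deal worse": "Much worse",
}


def merge_tq2(response):
    """Map one TQ2 response onto the merged 7-point scale."""
    return TQ2_TO_TQ1[response]


merged = [merge_tq2(r) for r in ("Moderately better", "About the same")]
```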
5.1.4. SUBGROUPS
Patients available at the eight weeks follow-up were divided into four subgroups according to either pain location or point of entry into the health care system at baseline (condition severity).
Patients were divided into LBP only and leg pain and/or LBP patients for pain location and
PrS and SeS patients for patient entry point.
5.1.5. STATISTICAL ANALYSES
PAPER I-1
Internal Consistency
Internal consistency was measured using Cronbach’s α and item-total correlations (Section 3.1.2 on page 28). In addition, each item score was graphed against the five score categories
as described by Fairbank et al. [132].
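As an illustration of the two internal consistency statistics named above, the following sketch computes Cronbach’s α from the item variances and the variance of the total score, and the item-total correlation with the item itself omitted from the total. The data are hypothetical and the function names are illustrative; this is not the code used in paper I-1.

```python
import statistics


def cronbach_alpha(responses):
    """responses: one row of item scores per patient."""
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total_correlations(responses):
    """Correlate each item with the scale total omitting that item."""
    correlations = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical scores of five patients on a three-item scale.
example = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 5, 5], [5, 4, 4]]
alpha = cronbach_alpha(example)
item_total = item_total_correlations(example)
```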
Reproducibility
Agreement. Agreement (reproducibility and repeatability in paper I-1) was measured using
the LOA plot as outlined by Bland and Altman (Section 3.1.4 on page 30) [88].
Reliability. Reliability (reproducibility and repeatability in paper I-1) was calculated as the
ICCagreement (Section 3.1.4 on page 32).
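The agreement statistic can be sketched as follows, assuming the usual Bland and Altman definition of the 95% limits of agreement as the mean test-retest difference ± 1.96 times the SD of the differences. The scores are hypothetical and the function name is illustrative.

```python
import statistics


def limits_of_agreement(test, retest):
    """Return (mean difference, lower LOA, upper LOA) for paired scores."""
    diffs = [t - r for t, r in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical test and retest scores of four patients.
test_scores = [20, 30, 40, 50]
retest_scores = [18, 33, 39, 52]
mean_diff, loa_lower, loa_upper = limits_of_agreement(test_scores, retest_scores)
```

A mean difference close to zero suggests no systematic shift between test and retest, while the width of the interval reflects the measurement error for an individual patient.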
Floor & Ceiling Effect (scale width)
Conventional floor and ceiling effects were calculated. In addition, the scale width using the
LOA at each end of the scale was estimated (Section 3.1.3 on page 29).
Validity
Cross-sectional discriminant validity. Discriminant validity was assessed for two levels of 1)
symptom location, 2) pain duration, and 3) medication frequency.
Concurrent validity. Three aspects of concurrent validity were examined as outlined in Table 5.2 on the next page.
Longitudinal external construct validity. This was assessed by comparing the change score of
the ODI to that of the external measures using Pearson’s correlation coefficient (R).
PAPER I-2
Responsiveness
Change score comparisons. The ODI mean change scores were compared: 1) in PrS and SeS
patients (paired t-test), and 2) to each of the external instruments (paired t-test). In addition,
all HMS change scores were: 1) analysed for each TQ category using a robust linear regression
analysis, and 2) compared in “important improvement” and “no change” patients.
Distribution-based responsiveness. SRMraw and SRM% were calculated for: 1) the overall
change scores, 2) for each TQ category, and 3) for “important improvement” and “no change”
patients (Section 3.1.6 on page 36).
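A minimal sketch of the two SRM variants is given below. It assumes the usual definition of the SRM as the mean change score divided by the SD of the change scores, assumes that SRM% is the same ratio computed on percentage change from baseline, and counts a decrease in score as improvement (as for the ODI). The data and function names are hypothetical.

```python
import statistics


def srm_raw(baseline, follow_up):
    """Mean raw change divided by the SD of the raw change scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)


def srm_percent(baseline, follow_up):
    """Same ratio on percentage change; patients scoring 0 at baseline are skipped."""
    changes = [100 * (b - f) / b for b, f in zip(baseline, follow_up) if b != 0]
    return statistics.mean(changes) / statistics.stdev(changes)


# Hypothetical baseline and eight-week scores of four patients.
baseline_scores = [40, 50, 30, 60]
follow_up_scores = [30, 35, 25, 40]
raw = srm_raw(baseline_scores, follow_up_scores)
pct = srm_percent(baseline_scores, follow_up_scores)
```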
Table 5.2. Concurrent validity and statistical tests examined in study I.

(a) Within- and between-scale systematic differences (baseline and 8 weeks follow-up)
− Comparison of the mean ODI score to that of the external HMS
− Comparison of the mean ODI score between PrS and SeS patients
− Comparison of the mean ODI score to that of the external HMS in PrS and SeS patients
Statistical test: Regression model with an interaction term

(b) Spread of HMS scores (baseline and 8 weeks follow-up)
− Comparison of ODI SD to that of the external HMS
− Comparison of ODI SD to that of the external HMS in PrS and SeS
Statistical test: Variance comparison test

(c) Individual patient score level
− Comparison of ODI score level to that of the external HMS
Statistical test: Bland & Altman LOA plots of standardised scores

ODI, Oswestry Disability Index; SD, standard deviation; PrS, primary sector; SeS, secondary sector; LOA,
limits of agreement
Anchor-based responsiveness. ROC statistics were used for the anchor-based approach (Section 3.1.6 on page 39). The ROCauc and the optimal cut-off change score in both the PrS and
SeS patients were determined for all the questionnaires. In addition, PrS and SeS patients were
stratified into six ODI baseline entry score categories and optimal cut-off change scores were
calculated for each category and plotted against baseline entry scores. Weighted linear regression was used to determine the change in MCID with changing baseline entry score. The effects
of both baseline entry score and patient entry point on classification of patients into “important
improvement” and “no change” were analysed using diagnostic test statistics.
Transition question correlations. Spearman’s correlation coefficient was used to establish validity between the TQ and the HMS change scores [133, 134].
PAPER I-3
Responsiveness
Distribution-based responsiveness. SRMraw was calculated for: 1) the change scores in each
of the four subgroups, and 2) for “important improvement” and “no change” patients (Section 3.1.6 on page 36). Confidence intervals were estimated using a bootstrap method [135].
To compare the SRMraw of the different questionnaires within each subgroup, the SRMraw was
estimated using Stata’s regression command with group indicators and the cluster option to
account for intra-individual correlation between responses. The differences between SRMraw
were examined with a non-linear Wald test [136]. The same procedure was used to test the
difference between “important improvement” and “no change” groups within each subpopulation.
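The bootstrap step can be illustrated with a simple percentile interval. This is a sketch under stated assumptions, not the procedure of reference [135] or the Stata analysis described above: patients are resampled with replacement, the SRM is recomputed for each resample, and the 2.5th and 97.5th percentiles are taken as the interval. The change scores, seed and function names are hypothetical.

```python
import random
import statistics


def srm(changes):
    """Standardised response mean of a list of change scores."""
    return statistics.mean(changes) / statistics.stdev(changes)


def bootstrap_srm_ci(changes, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the SRM."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_resamples:
        sample = rng.choices(changes, k=len(changes))
        if len(set(sample)) > 1:  # skip degenerate resamples with zero SD
            estimates.append(srm(sample))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical change scores of eight patients.
change_scores = [10, 15, 5, 20, 0, 12, 8, 18]
ci_lower, ci_upper = bootstrap_srm_ci(change_scores, n_resamples=500, seed=1)
```

Fixing the seed makes the interval reproducible; comparing SRMs between questionnaires, as in paper I-3, needs the paired structure of the data to be respected and is not covered by this sketch.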
Anchor-based responsiveness. ROC statistics were used for the anchor-based approach (Section 3.1.6 on page 39). The ROCauc was determined in each of the four subgroups and an
omnibus comparison within each subpopulation was carried out using a non-parametric approach [137]. Overall, quarter-specific, pain location specific, and patient entry point specific
MCIDs were determined by an optimal cut-point analysis using both the raw (MCID) and percentage (MCID%) change scores. Categories with fewer than 10 patients were excluded from the
analysis. Moreover, the dependence of the MCID on baseline score was adjusted by a weighted
linear regression.
PAPER I-4
Measures of Serial Change and their Analyses
Three disability HMS (ODI, RMQ and SF36 (pf)) and two pain scales (LBPRSpain and NRSpain )
were included in the analyses and transformed to cover an interval ranging from 0 - 100. The
responsiveness of the serial change was expressed using the SRM.
Measures of Transition and their Analyses
Two different transition questions measuring the patients’ global retrospective assessment
of treatment were included in the analyses. Group A received TQ1 (7-point scale) and group
B received TQ2 (15-point scale) and both scales had different introductory questions. In addition, both groups rated the importance of the change in health state experienced during the
treatment on a NRSimp .
Dichotomisation of the transition questions. All patients were dichotomised as having either
improved or stayed the same based on the transition question alone or based on the transition
question and the global rating of importance in combination.
Stringent and less stringent criteria for improved and unchanged patients were defined for
four external criteria: TQ1, TQ1+NRSimp , TQ2 and TQ2+NRSimp . Patients who deteriorated
were excluded from the analyses [72, 138–140].
Responsiveness of retrospective change. The SRM for the retrospective change was calculated from the coded TQ responses. TQ1 was coded as follows: much better = 3, better = 2, a
little better = 1, no change = 0, a little worse = -1, worse = -2 and much worse = -3. Similarly,
TQ2 was coded from 7 to -7. We used the mean of the coded post-treatment response as the
numerator and the SD of the mean coded post-treatment response as the denominator. Hence,
both positive and negative values can occur in the numerator making serial and retrospective
SRMs comparable.
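The coding and the retrospective SRM described above can be sketched as follows. The TQ1 codes are those given in the text; the responses are hypothetical, the function name is illustrative, and the denominator is taken to be the sample SD of the coded responses.

```python
import statistics

# TQ1 coding as described in the text.
TQ1_CODES = {
    "much better": 3,
    "better": 2,
    "a little better": 1,
    "no change": 0,
    "a little worse": -1,
    "worse": -2,
    "much worse": -3,
}


def retrospective_srm(responses, codes=TQ1_CODES):
    """Mean of the coded responses divided by their SD."""
    coded = [codes[r] for r in responses]
    return statistics.mean(coded) / statistics.stdev(coded)


# Hypothetical post-treatment TQ1 responses of four patients.
tq1_responses = ["much better", "better", "a little better", "no change"]
tq1_srm = retrospective_srm(tq1_responses)
```

For TQ2 the same function could be reused with a dictionary running from 7 to -7.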
Comparison of the Measures of Serial Change and Transition
For each of the included instruments we calculated four different ROCauc - one for each
external criterion - to determine the influence of different external criteria on the magnitude of
the ROCauc . Moreover, the dependence of the ROCauc on the TQ dichotomisation procedure
was determined by comparing stringent and less stringent dichotomisations in a Bland and
Altman LOA plot.
5.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
5.2.1. PATIENTS AND SETTING
Patients suffering from treatment resistant chronic low back pain and/or leg pain were
recruited from an out-patient hospital back pain clinic in 2005. Inclusion and exclusion criteria
were the same as those described in Section 5.1.2 on page 47.
5.2.2. PILOT STUDY
Face and content validity of the modified questionnaires (Section 5.2.4) was tested on 25
consecutive chronic LBP patients in a semi-structured interview (Appendix V on page 97) and
documented in a written report.
5.2.3. MAIN STUDY
One hundred and forty-seven chronic LBP patients receiving conservative care were followed over an eight week period. Questionnaire booklets were filled in at baseline before commencing the treatment (Appendix VI on page 100), at one-week follow-up (Appendix VII on
page 110), and at eight-weeks follow-up (Appendix VIII on page 111). In addition, a telephone
interview was carried out at nine-weeks follow-up (Appendix IX on page 116).
5.2.4. OUTCOME MEASURES
Two sets of pain and functional/psychological outcome measures were completed at baseline: 1) ordinary outcome measures, and 2) modified outcome measures.
The ordinary outcome measures. These consisted of the ODI, the multidimensional Bournemouth
Questionnaire (BQ) [141, 142] and the NRSpain measured over the past week.
The pre-treatment acceptable outcome measures. These consisted of modified versions of the
ODI, BQ and NRSpain. The introduction to the modified questionnaires asked the patient to
differentiate between what they considered an acceptable result and their expectations/hopes
of the treatment. In addition, all the questions in each HMS were modified to include the following
basic question:
“Please indicate what you consider to be an acceptable level of (e.g. pain) after completion of the
treatment if you had to accept some (e.g. pain)?”
Design. At one week follow-up all patients completed both the ordinary and pre-treatment
acceptable outcome measures including questions indicating change since baseline. At eight
weeks follow-up patients completed the ordinary HMS and a 7-point TQ.
5.2.5. STATISTICAL METHODS
PAPER II-1
Reproducibility
A summary score of the pre-treatment acceptable post-score was generated for each of the
modified instruments. The pre-treatment acceptable summary scores were tested for agreement (reproducibility in paper II-1) using the Bland and Altman LOA plot.
Concurrent Validity
The MCID determined before treatment (MCIDpre ) was compared to measurement error
(MDC95% and the lower limit of agreement - LOAlower ) and a post-treatment anchor-based
method of establishing the minimal clinically important difference - the MCIDpost . This was
established in four subgroups: 1) patients with LBP only, 2) patients with leg pain and/or LBP,
3) patients with LBP duration ≤6 months, and 4) patients with LBP duration > 6 months.
The MCIDpre . This was calculated by subtracting the acceptable post-treatment score determined pre-treatment from the ordinary pre-treatment score for each item. A summary of the
change score was calculated for each instrument by summing the MCIDpre for each item.
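The MCIDpre calculation described above is a per-item subtraction followed by a sum. A minimal sketch with hypothetical item scores and an illustrative function name:

```python
def mcid_pre(pre_treatment_items, acceptable_items):
    """Sum over items of (ordinary pre-treatment score - acceptable post-treatment score)."""
    if len(pre_treatment_items) != len(acceptable_items):
        raise ValueError("both questionnaires must have the same number of items")
    item_changes = [pre - acc for pre, acc in zip(pre_treatment_items, acceptable_items)]
    return sum(item_changes)


# Hypothetical item scores of one patient on a three-item instrument.
ordinary_pre = [3, 2, 4]
acceptable_post = [1, 1, 2]
patient_mcid_pre = mcid_pre(ordinary_pre, acceptable_post)
```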
The MDC95% . The minimal detectable change has been described in Section 3.1.4 on page 30.
It was computed using ANOVA for random effects.
The LOAlower . This is the lower 95% confidence level of the LOA plot and can be interpreted
as the instrument measurement error when patients improve. Thus, any score change outside
the LOAlower should be considered a “real improvement”.
The MCIDpost . The MCIDpost was established by determining the optimal cut-off change score
using ROC curve analysis. Confidence intervals for the MCIDpost were estimated using Stata’s
programming function to calculate the optimal cut-point and a bootstrap procedure.
Acceptable Treatment Outcome
To establish whether our cohort of chronic patients was able to determine an acceptable
outcome of treatment before it began, the MCIDpre was compared to: a) the post-treatment acceptable change, b) the MCIDpost and c) the overall post-treatment change. The post-treatment
acceptable change was defined as the mean serial change score in patients who rated themselves as “better” or “much better” on the TQ. Statistical significance between the groups was
tested using Wilcoxon rank-sum test.
6. SUMMARY OF RESULTS
6.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
6.1.1. PARTICIPANTS
A total of 191 LBP patients equally distributed between the PrS and SeS were available for
analysis at eight weeks follow-up. PrS patients generally had LBP only, were more acute and
less disabled compared to SeS patients; however, pain intensity was similar. In addition, all
demographic characteristics were almost identical (except for more patients with LBP in group
B - TQ2) when stratified according to the type of TQ.
6.1.2. PAPER I-1
Translation and Cross-cultural Adaptation
The process of translating and cross-culturally adapting the ODI lasted almost four months
and four Danish versions were produced before the final version was finished (Appendix X on
page 117). The few disagreements that arose during the process were satisfactorily resolved by
consensus.
Internal Consistency
Cronbach’s α was 0.88 for all patients, 0.89 in PrS and 0.85 in SeS patients. Moreover, all
items contributed to the total score and these belonged to the same latent variable of pain
related function.
Reproducibility
Agreement. The mean difference and 95% LOA for all patients were 0.8 [-11.5 to +13.0] with
no noteworthy difference between PrS and SeS patients. (This is referred to as reproducibility
and repeatability in paper I-1).
Reliability. The ICCagreement was 0.91 for all patients, 0.93 in PrS and 0.89 in SeS patients. (This
is referred to as reproducibility and repeatability in paper I-1).
Floor & Ceiling Effect (scale width)
A total of 25 patients (10.7%) scored within the lower score range i.e. 0 - 11.5 points. These
were mainly PrS patients.
Validity
Cross-sectional discriminant validity. The ODI could discriminate between subgroups of patients with regard to all the chosen medical history variables; however, only the difference in
medication usage was considered clinically relevant.
Concurrent validity. The ODI was compared to all included external HMS and showed: 1)
10%-21% lower measurements with only small differences between PrS and SeS patients, 2) a
statistically significant narrower score spread with no differences seen between the two groups,
and 3) comparable standardised baseline scores lying in the range of ±1.3 to ±1.7 SDs, again
with no differences among the two groups.
Longitudinal external construct validity. A change score correlation analysis showed coefficients ranging from 0.56 - 0.78.
6.1.3. PAPER I-2
Responsiveness
Change score comparisons. The ODI was characterised by a significantly smaller change score
reduction compared to most external HMS. Furthermore, there was an almost linear increase
in mean change score with improving TQ category comparable to that of the external HMS.
Last, the difference in mean change score between “important improvement” and “no change”
patients was approximately 15 points, which was in agreement with the external measures.
Distribution-based responsiveness. The ODI showed comparable results to the external HMS
with respect to: 1) a SRMraw of 0.7 and SRM% of 0.6 for all patients, 2) higher SRMs in the “important improvement” group compared to the “no change” group, and 3) a gradual increase in
SRM with a progressive patient improvement. Moreover, the ODI was less sensitive to change
in PrS patients compared to the RMQ.
Anchor-based responsiveness. For all patients, the ODI ROCauc was 0.82 for the raw change
score and 0.84 for the percentage change score. The MCID was nine points (71%) for PrS and
eight points for SeS (27%) patients and dependent on baseline entry score primarily in PrS
patients (6.6 points increase for every 10 points increase in baseline score).
Transition question correlations. The correlation coefficient (R) between the ODI change score
and the TQ was 0.6 for all patients.
6.1.4. PAPER I-3
Responsiveness
Distribution-based responsiveness. The RMQ was the most responsive disability HMS for
patients with LBP only (SRMraw : 0.5 - 1.4). The ODI and RMQ were equally responsive in
leg pain patients (SRMraw : 0.3 - 0.9). For the pain measures, the SF36 (bp) had the highest
responsiveness in all subgroups (SRMraw : 0.6 - 1.4). For pain and disability, the SF36 (bp) and
RMQ showed the largest differences in SRMraw between the “important improvement” and
“no change” groups in all subgroups, respectively.
Anchor-based responsiveness. The RMQ showed the highest ROCauc in LBP only patients
(both PrS and SeS) whereas the ODI was marginally superior in the leg pain patients. For the
pain measures, the LBPRSpain was the superior instrument in the LBP only patients. Similar
discriminative abilities were observed in the other subpopulations.
Regarding the MCID, the following was observed: 1) the overall MCID showed only minor
variations in the four subgroups, 2) the MCID increased with increasing baseline entry score
mainly in the PrS patients, and 3) the MCID% was almost constant across the score groups for
the ODI and RMQ but differed in the subgroups.
6.1.5. PAPER I-4
Patient Classification
Using stringent criteria, 6 - 7% fewer patients were classified as improved when using TQ1 (7-point response scale) compared to TQ2
(15-point response scale). No difference was seen using the less stringent criteria.
Responsiveness
Distribution-based responsiveness. The retrospective TQ showed slightly higher SRMs (range:
0.8-0.9) compared to the serial instruments (range: 0.6-0.7) when considering all patients. This
difference was most pronounced in the PrS patients (serial SRMs: 0.9-1.2 vs. retrospective
SRMs: 1.7) with no difference in the SeS patients.
Anchor-based responsiveness. The magnitude of the ROCauc varied only slightly across the
four stringent external criteria for all HMS (largest difference between minimum and maximum criteria was 0.09) with slightly larger differences between criteria in PrS and SeS patients.
The ROCauc was slightly smaller (average: 0.02) and the variation slightly larger using the less
stringent criteria.
6.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
6.2.1. PARTICIPANTS
A total of 119 LBP patients completed the eight weeks follow-up (response rate: 83.7%) and
were available for analysis. The pre-treatment acceptable post-score was below 32 points for
most patients, however, the distribution of the acceptable post-scores varied according to the
type of HMS.
6.2.2. MAIN STUDY
Reproducibility
The systematic difference and 95% LOA were 0.8 [-6.6; 8.2] for the modified ODI, -0.2 [-8.8;
8.4] for the modified BQ, and 0.0 [-1.9; 1.9] for the modified NRSpain.
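The systematic difference and 95% LOA can be sketched as the mean of the test-retest differences and that mean plus or minus 1.96 standard deviations of the differences. This is a minimal example assuming the standard Bland and Altman formulation; the function name, the direction of the difference (test minus retest) and the scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest):
    """Return (systematic difference, lower 95% LOA, upper 95% LOA)."""
    # Assumption: difference = test - retest for each stable patient.
    diffs = [t - r for t, r in zip(test, retest)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical test and retest scores (illustration only, not study data).
test = [30, 42, 25, 50, 38, 44]
retest = [28, 45, 24, 47, 40, 43]
bias, lower, upper = limits_of_agreement(test, retest)
print(f"{bias:.1f} [{lower:.1f}; {upper:.1f}]")
```

The printed format mirrors the way the results are reported in the text, with the systematic difference followed by the interval in brackets.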
Concurrent Validity
The pre-treatment acceptable change for chronic LBP patients scoring the ODI was a 26%
reduction whereas this figure was 36% for the BQ and 42% for the NRSpain.
The MCIDpre was outside measurement error (MDC95% and LOAlower) and approximately
4.5 times larger compared to the MCIDpost for the ODI and 1.5 times larger for the BQ and
NRSpain. Before treatment, patients with leg pain ± LBP generally required a larger change
for the outcome to be acceptable than patients with LBP only. No differences were seen with
regard to symptom duration.
Acceptable Treatment Outcome
The MCIDpre was almost identical to the post-treatment acceptable change for the NRSpain
whereas it was significantly larger for the ODI and BQ.
7. DISCUSSION
7.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
7.1.1. DISCUSSION OF FINDINGS
The Danish Oswestry Disability Index
Cross-cultural adaptation. The English version of the ODI was translated and cross-culturally
adapted into the Danish language using current guidelines [77, 78] and I consider it both reliable and conceptually valid. The process was time consuming and demanded considerable resources. In light of the relatively few discrepancies discussed during the five stages
of the adaptation, one may question the value of this cumbersome process in comparison to a
cross-cultural adaptation involving fewer resources. I suggest that the following points be considered when allocating resources for adapting an HMS:
• The similarity of the language and cultural setting in which the original HMS was developed.
• The complexity of the HMS. Does the HMS contain difficult language requiring experts to obtain semantic, idiomatic, experiential and conceptual equivalency to the original HMS?
• The translations. Are two translations and back translations necessary? If not, then stage 2 can be omitted and stage 1 and 3 shortened.
Internal consistency and reproducibility. Internal consistency was similar in the two populations (PrS, 0.89; SeS, 0.85) and comparable to previously reported coefficients [13, 143, 144].
Furthermore, reproducibility was measured using the ICCagreement and LOA in stable patients.
The ICCagreement (0.91) was acceptable falling within the range of 0.76 to 0.94 reported in the
literature [83, 145, 146]. Similarly, the LOA measurement error (11.5 points for improvement1)
was close to published values of the MDC95% ranging from 10-13 points [83, 138, 144, 146, 147].
However, caution has to be taken as the ODI version and the level of confidence differ in these
studies.
1 This was erroneously reported as 12 points for worsening rather than for improvement in paper I-1. Thus, a “real”
worsening was an increase in change score of 13 points while a “real” improvement was a decrease of 12 points.
Floor and ceiling effects. The ODI showed no floor and ceiling effects using the conventional
definition as described in Figure 3.2 on page 30. In contrast, there was a pronounced floor effect
in PrS patients (14.1%) using the more sensible scale width method. Consequently, it seems
logical to question: 1) the usefulness of the Danish ODI in PrS patients, and 2) the usefulness
of the conventional method of establishing floor and ceiling effects. I recommend using the
methodologically superior scale width method as a benchmark to detect instrument scaling
problems at the extremes.
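For orientation, a floor or ceiling check can be sketched as the percentage of patients scoring at the extremes of the scale. The example below assumes that the conventional definition counts patients at the lowest and highest possible scores; the exact definition in Figure 3.2 and the scale width method are not reproduced here, and the function name and scores are hypothetical.

```python
def floor_ceiling_percentages(scores, scale_min, scale_max):
    """Return the percentage of patients at the lowest and highest possible score."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == scale_min) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == scale_max) / n
    return floor_pct, ceiling_pct


# Hypothetical baseline scores on a 0-100 scale (illustration only).
scores = [0, 0, 12, 24, 36, 48, 60, 72, 100, 18]
floor_pct, ceiling_pct = floor_ceiling_percentages(scores, 0, 100)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
```

A high percentage at the floor means the instrument cannot register further improvement in those patients, which is the scaling problem discussed above.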
Validity. Several aspects of criterion and construct validity were tested for the Danish ODI and
some deserve special attention. The ODI showed 10-21% lower mean scores compared to the
external HMS, had the poorest spread of patient disabilities in both PrS and SeS patients, and
showed individual score levels between ±1.3 to ±1.7 SDs compared to the external disability
and pain scales, respectively. This confirms the belief that the ODI is more appropriate for
patients with a greater degree of disability [20, 27, 34] but also that its ability to discriminate
between patient disabilities may be problematic.
Responsiveness. Responsiveness of the ODI was scrutinised using both distribution and anchor-based methods and the most important findings will be discussed here. First, the ODI
mean change score in PrS patients was generally lower compared to most of the external instruments reinforcing the appropriateness of the ODI in patients with a high degree of disability.
Moreover, the ODI mean change score difference between “important improvement” and “no
change” patients was 13 and 10 points in PrS and SeS patients respectively, and this concurs
well with other studies [138, 148]. Second, the ODI had the second highest SRM of all the disability instruments with the RMQ showing the highest values. The difference between ODI and
RMQ was most pronounced in the PrS patients and agrees with several studies [13, 22, 140, 149].
Third, the ROCauc for the ODI (0.82) was comparable to that of the external instruments with
no difference between the two patient populations. This was surprising as differences were
seen in the SRM and PrS and SeS patients differ in a number of baseline characteristics.
Minimal clinically important difference. ODI cut-off scores were determined to express the
MCID. The change scores which seemed important to LBP patients were nine points (71%)
in PrS and eight points (27%) in SeS patients and increased by approximately 5 points for
every 10-point increase in baseline score. This was within reported values in the literature
which vary widely (range from 4-23 points) [138, 140, 146, 148, 150–152] reflecting the diversity
of available methodologies and the differences in the range of baseline scores. This seems to
be a general problem as other HMS also show a wide range of reported MCID values (e.g.
SF36 (pf): 7-16 points [22, 153]). Accordingly, it can be argued that classifying
patients as improved or unchanged using a single cut-off change score may not be accurate. I recommend that researchers report a range of MCIDs and either: 1) adjust for baseline dependence
in a multivariate analysis, or 2) show how the MCID varies according to baseline scores. A range
of MCIDs can be obtained by using the combined approach which has several methodological
advantages compared to the distribution-based and anchor-based methods alone as outlined
in Section 3.1.6 on page 44.
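One common anchor-based way of deriving a cut-off change score is to pick the point on the ROC curve that maximises sensitivity plus specificity. The sketch below illustrates that idea only; it is not the combined approach of Section 3.1.6, and the function name and change scores are hypothetical. Repeating the calculation within baseline score groups would be one way of showing how the cut-off varies with baseline.

```python
def optimal_cutoff(improved_changes, unchanged_changes):
    """Return (cutoff, sensitivity, specificity) maximising sensitivity + specificity - 1."""
    candidates = sorted(set(improved_changes) | set(unchanged_changes))
    best = None
    for c in candidates:
        # A patient is classified as improved when the change score is >= c.
        sens = sum(x >= c for x in improved_changes) / len(improved_changes)
        spec = sum(y < c for y in unchanged_changes) / len(unchanged_changes)
        youden = sens + spec - 1
        if best is None or youden > best[0]:
            best = (youden, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical change scores (illustration only, not study data).
improved = [12, 9, 15, 8, 10]
unchanged = [3, 6, 1, 9, 4]
cutoff, sens, spec = optimal_cutoff(improved, unchanged)
print(f"cut-off: {cutoff} points (sensitivity {sens:.2f}, specificity {spec:.2f})")
```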
The wide range of MCID values reported for the ODI has consequences for reporting proportions of improved patients and the numbers needed to treat (NNT) of clinical trials as: 1) the
choice of ODI cut-point to dichotomise patients as improved or unchanged is almost arbitrary
in reported studies, and 2) to the author’s knowledge, the effect of baseline score on the MCID
is not accounted for in the reported results. As a result I suggest that researchers consider the
following when reporting proportions and NNT:
• Carefully match own study population and settings to that of the MCID study or design own study to allow for MCID calculations
• Carefully consider the methodology used to establish the MCID
• Include MCID baseline dependence in the calculations of proportions and NNT
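The sensitivity of the NNT to the chosen cut-off can be illustrated with a small calculation. The sketch assumes the usual definition of the NNT as the reciprocal of the difference in proportions improved, rounded up; the function names, change scores and the two cut-offs (8 and 12 points) are hypothetical. Exact fractions are used to avoid rounding artefacts before the final rounding up.

```python
import math
from fractions import Fraction


def proportion_improved(changes, mcid):
    """Proportion of patients whose change score reaches the MCID cut-off."""
    return Fraction(sum(c >= mcid for c in changes), len(changes))


def number_needed_to_treat(p_treatment, p_control):
    """NNT = 1 / (difference in proportions improved), rounded up."""
    difference = p_treatment - p_control
    if difference <= 0:
        raise ValueError("NNT is undefined when treatment shows no benefit")
    return math.ceil(1 / difference)


# Hypothetical change scores in two trial arms (illustration only).
treatment = [14, 9, 12, 5, 16, 8, 3, 13, 10, 6]
control = [4, 9, 2, 13, 6, 1, 8, 5, 12, 3]

nnt_by_cutoff = {}
for mcid in (8, 12):
    nnt_by_cutoff[mcid] = number_needed_to_treat(
        proportion_improved(treatment, mcid), proportion_improved(control, mcid)
    )
    print(f"MCID {mcid} points: NNT = {nnt_by_cutoff[mcid]}")
```

With the same data, the two cut-offs give different NNTs, which is the reason for matching the MCID study to the trial population and for accounting for baseline dependence.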
Responsiveness and Subgroups
Concurrent comparisons of responsiveness and MCID calculations in subgroups are rare
[39, 72, 149] and lagging behind clinical intervention research (Section 2.1 on page 17). This
study concurrently compared responsiveness and MCIDs in four subgroups of LBP patients
relevant to clinical research: 1) primary sector patients, 2) secondary sector patients, 3) LBP
only patients, and 4) leg pain ±LBP patients.
The results demonstrate that responsiveness of an HMS varies according to the patient population it is applied to and that this was most pronounced between the disability instruments
and between PrS and SeS patients. The RMQ or ODI proved to be the disability HMS of choice
in all subgroups, however, the patients’ global retrospective assessment of treatment effect
appeared to be the most responsive instrument in the PrS patients. Moreover, a moderate to
large difference between the SF36 (bp) and the rest of the pain scales was observed. In spite of
this, I conclude that the pain measures have similar responsiveness based on the fact that the
SF36 (bp) showed poor specificity in all subgroups questioning the validity of this scale in LBP
patients.
In an attempt to simplify the choice of pain and disability scales, an algorithm was developed based on the subgroup study and the timing of the intervention programme (Figure 7.1
on page 67).
Patients’ Global Assessment of Treatment Effect
The use of TQs in clinical research is becoming the norm rather than the exception, however, no standardisation on wording, response options and dichotomisation procedures has
been agreed upon. Thus, I set out to compare four different external criteria (two TQs and two
TQs combined with a rating of importance) in PrS and SeS patients. We found that the choice
of external criteria changed the proportion of patients classified as improved and unchanged
(6 - 7% difference between TQ1 and TQ2) but did not influence the discriminative abilities of
the HMS. This has two major implications: 1) interpretation of clinical trials may vary according to which procedure was chosen, and 2) between-study comparisons of e.g. proportions
of improved patients or NNT are difficult if not impossible. Consequently, I recommend that
investigators or clinicians who use the results of clinical trials including external criteria:
• Closely scrutinise the transition question and dichotomisation procedures used when evaluating the results and conclusions.
• Also pay attention to the transition question and dichotomisation procedures used when comparing trial results.
A proposal for standardised use of transition questions has been outlined and summarised in
Table 7.1.
Table 7.1. Steps in standardising the construction of patients’ global assessment of treatment
effect.
Steps                                 Standardisation
Step 1: Introductory question         • Should be clear
                                      • Should have a well defined time frame
                                      • Focus on change in the area of interest
Step 2: Response options              • Should have seven response options with a “middle” representing no change
                                      • Should be short and clear
                                      • Should have a logical progression
Step 3: Dichotomisation procedure     • Should use a stringent definition of what represents a clinical relevant change from the patients point-of-view*
* In a seven-point transition question, the improved patients would rate themselves as either “much better” or
“better” and the unchanged patients would rate themselves as “a little better”, “no change” or “a little worse”.
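The stringent dichotomisation described in the table footnote can be written down as a simple mapping. The sketch below follows the footnote for the improved and unchanged groups; the labels of the two remaining response options (“worse” and “much worse”) and the name of the third group are assumptions, as the text does not state them, and the function name is hypothetical.

```python
# Response options named in the table footnote.
IMPROVED = {"much better", "better"}
UNCHANGED = {"a little better", "no change", "a little worse"}
# Assumption: labels of the two remaining options on the seven-point scale.
DETERIORATED = {"worse", "much worse"}


def classify_transition_rating(response):
    """Map a seven-point transition rating to a patient group."""
    rating = response.strip().lower()
    if rating in IMPROVED:
        return "improved"
    if rating in UNCHANGED:
        return "unchanged"
    if rating in DETERIORATED:
        return "deteriorated"
    raise ValueError(f"unknown response option: {response!r}")


# Hypothetical ratings (illustration only).
ratings = ["Much better", "a little better", "no change", "better", "worse"]
groups = [classify_transition_rating(r) for r in ratings]
print(groups)
```

Note that “a little better” counts as unchanged under the stringent definition, which is what makes the proportion of improved patients sensitive to the procedure chosen.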
7.1.2. DISCUSSION OF METHODOLOGICAL ASPECTS
The methodology used in this study was designed primarily for validating the ODI in Danish and secondarily for concurrent HMS comparisons and TQ analysis. As a result, several
methodological design features not mentioned in the papers deserve attention.
The primary and secondary sector patients. The study was based on patients seen in the primary sector (chiropractic clinics) and secondary sector (out-patient hospital back pain clinic)
of the Danish health care system to obtain a broad range of LBP patients. However, it can be
questioned whether these patient groups are in fact representative samples of PrS and SeS
patients as a whole.
Back pain patients’ initial contact with the Danish health care system is via the primary
health care sector which comprises medical doctors, chiropractors, and physiotherapists. In
1997 Lønnberg compared, among other things, LBP patients’ contact patterns between medical
doctors, chiropractors, and physiotherapists [154]. It was found that chiropractic patients differed in several aspects compared to patients seen by medical doctors: 1) they were generally
older, 2) they were less disabled, 3) they had a better general health, and 4) a larger proportion
were engaged in active employment. If interpreted literally, then chiropractic patients are not
representative of PrS patients. However, caution is advocated interpreting the results this way
as new evidence does not support the differences found by Lønnberg [155]. First, the age distribution of the chiropractic patients approximately equals the patients seen by medical doctors.
Second, the proportion of chiropractic patients engaged in active employment had decreased
by 14% between 1997 and 2002. Third, the proportion of chiropractic patients without longstanding disease had decreased considerably in the same period. Last, the level of pain (visual
analog scale = 63-65 mm.) and disability (visual analog scale for activities of daily living =
62-64 mm.) at the initial visit was moderate to high and comparable to patients with cardiac
insufficiency and patients with chronic obstructive pulmonary disease. On the basis of this I
consider the included chiropractic patients representative of LBP patients seen in the PrS of the
Danish health care system.
For the SeS patients, we included only one out-patient hospital low back pain clinic and
these patients may not be representative for all SeS patients in Denmark. On the one hand, the
clinic receives referrals from the whole county of Funen which makes up 10% of the Danish
population and is said to be representative thereof [156]. On the other hand, referrals to the
hospital clinic are dependent on specific referral criteria which vary from county to county.
Moreover, the surgical patients are probably underrepresented as many are seen elsewhere. In
all, I believe these patients to be representative of the majority of the non-surgical SeS patients
despite the limitations mentioned above.
Inclusion of consecutive patients. A further limitation of the study is the inclusion of consecutive patients. A total of 23% of the available patients (PrS patients: 13%; SeS patients: 9%)
refused to participate in the study and it is unknown whether our results have been biased as
the included patients were not strictly consecutive. However, in comparison to a cohort of 293
strictly consecutive patients seen in the same hospital back pain clinic, the SeS patients’ mean
disability scores (LBPRSdisability ) were similar (our SeS cohort: 45.2 [SD 17.8]; consecutive SeS
cohort: 47.9 [SD 20.2]) (unpublished data). From baseline to 8 weeks follow-up, a further 18%
of the patients (PrS patients: 27%; SeS patients: 8%) dropped out, but only small differences
were seen in the dropout analysis. In summary, I conclude that the included cohort of patients
is representative for the PrS and SeS patients (but more so for the SeS patients) despite not
being strictly consecutive.
Stable patients. The selection of stable patients for the reproducibility calculations was chosen
on the basis of an external anchor of change. Difficulties arose in the PrS patients as true change
was likely to occur shortly after initiation of treatment. This had to be balanced against the
possible recall bias from administering the retest questionnaires with a short time interval. The
questions in the retest booklet were shuffled to minimise recall bias, however, it is possible that
the reproducibility results are inflated in the PrS patients. As our results were comparable to
other reproducibility studies, I consider this bias negligible.
Merging transition questions. To obtain the same global ratings of change, the two TQs were
merged into one question as outlined in Table 5.1 on page 48. As these transition ratings had
different introductory questions and response options and were obtained by unvalidated telephone interviews, the critical reader may argue that this invalidates our results. However,
close examination of the discriminative abilities of the two TQs did not show any disparity,
and only small differences were seen in the patient classification (Section 6.1.5 on page 57).
On the basis of this I believe that the procedure of merging the TQs is valid but whether the
transition ratings were positively biased by the telephone interview remains speculative.
Outcome measures. Comparing ODI and RMQ to the dimensions of daily living (SF36 (pf))
and a combination of daily living and pain related function (LBPRSdisability ) may be problematic. Many items of the ODI and RMQ inquire about functional activities in relation to pain - a
slightly different dimension. Comparing responsiveness of related but different dimensions in
the same patient population is likely to give different results as patients tend to respond inconsistently to various dimensions during treatment (e.g. pain vs. function). Consequently, some
of the variability in responsiveness seen in our results may be attributed to the measurement
of different but related dimensions.
7.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
7.2.1. DISCUSSION OF FINDINGS
A method for determining LBP patients’ acceptable level of treatment outcome a priori using
standardised HMS was developed and compared to measurement error and a commonly used
a posteriori anchor-based method.
MCIDpre versus MCIDpost. The results showed a considerable gap between the MCIDpre and
the MCIDpost, some of which may be explained by the following two factors: 1) a response
shift occurring during the treatment, and 2) patients’ (lack of) ability to differentiate between
the concepts of “acceptable results of the treatment” and “expectations/hopes to the treatment”. Both would result in an underestimation of what is acceptable, thus, overestimating
the MCIDpre. I suspect that the response shift had the greatest impact on our results for several
reasons. First, patient information is a cornerstone in the management at the out-patient hospital back pain unit and is emphasized continuously during the course of treatment. Accordingly,
it seems only natural if patients are influenced by the large amount of information received and
in fact change their attitudes and behaviours. Second, the results showed that patients were
in fact able to distinguish between what is acceptable and their expectations/hopes in a small
randomised trial, and this is also supported in the literature [157].
MCIDpre versus post-treatment acceptable change. The study compared the MCIDpre with
what patients feel is acceptable change after treatment cessation, and the results show a disparity in the ODI and BQ but less so for the NRSpain. One likely interpretation is that LBP patients
have a clearer understanding of what is an acceptable change in pain intensity before starting
treatment in comparison to changes in functional and psychological/affective domains. This
has implications for patient satisfaction and the results of clinical trials. If patient established
acceptable outcomes are not matched to anticipated treatment efficacy before the treatment
begins it follows that these patients may become dissatisfied. Dissatisfied patients will almost
certainly report less favourable results of the treatment in clinical trials. The results show that
this is most important for functional and psychological/affective domains, and I suggest that researchers and clinicians incorporate this in clinical trials and clinical practice to enhance
satisfaction and treatment outcomes.
7.2.2. DISCUSSION OF METHODOLOGICAL ASPECTS
The process of developing a novel a priori method to establish clinically relevant change was
laborious and time consuming. I undertook this demanding task as the method has several
methodological advantages compared to the commonly used retrospective method:
• It does not rely on an external anchor of interpretability which is vulnerable to biases as outlined in Section 3.1.6 on page 39.
• Interpretability is established directly on the HMS used in clinical studies.
• It allows clinicians and patients to discuss any mismatch between treatment efficacy and what is acceptable to the patient.
Several obstacles were encountered during the development process, and the most important
will be highlighted here.
First, the development of the modified questionnaires required the use of a language expert. Precise wording of the introductory explanations and the questionnaire items was of
paramount importance to ensure the highest possible validity. Emphasis was placed on clarity
and avoiding misunderstandings, and several versions were developed before the pre-final
version was tested in the pilot study. Second, problems were encountered during the pilot
study as a considerable proportion of the included patients rated the smallest value possible as
an acceptable outcome after treatment (i.e. zero for the BQ and the first option for the ODI). The
reason given by most patients was that they expected to be “cured” and would not accept anything less. As a result, we decided to add the following to each of the modified questionnaire
items: “...if you had to accept some (e.g. pain)”. Third, the pilot study resulted in more questions
from the patients than expected and most pertained to the interpretation of “an acceptable result after the treatment”. Consequently, a system of “easy access for questions” was developed
and involved secretarial (or head researcher) assistance by telephone and access to a website
with questions and answers.
Evaluative HMS for patients with LBP
LBP only
• Secondary sector patients
  Follow-up < 2 months
    Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (LBP only); Global anchors
  Follow-up ≥ 2 months
    Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (back); SF36
    Secondary outcomes: Global anchors
• Primary sector patients
  Follow-up < 2 months
    Primary outcomes: Global anchors; Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (LBP only)
    Secondary outcomes: Patient Specific Function Scale
  Follow-up ≥ 2 months
    Primary outcomes: Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (LBP only); SF36
    Secondary outcomes: Global anchors; Patient Specific Function Scale
Leg pain ± LBP
• Primary sector patients
  Follow-up < 2 months
    Primary outcomes: Global anchors; Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain)
    Secondary outcomes: Patient Specific Function Scale
  Follow-up ≥ 2 months
    Primary outcomes: Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain); SF36
    Secondary outcomes: Global anchors
• Secondary sector patients
  Follow-up < 2 months
    Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain); Global anchors
  Follow-up ≥ 2 months
    Primary outcomes: Oswestry Disability Index or Roland Morris Disability Questionnaire; Low Back Pain Rating Scale pain (leg pain); SF36
    Secondary outcomes: Global anchors
Figure 7.1. Choosing pain and disability HMS in subgroups of LBP patients - an algorithm.
8. CONCLUSIONS
8.1. PART I: THE RESPONSIVENESS IN SUBGROUPS STUDY
The ODI was successfully translated and cross-culturally adapted into the Danish language.
It is a reliable, valid, and responsive tool to assess pain related function and is probably more
appropriate in the chronic SeS patients. The patient established minimal important change was
8-9 points and dependent on the level of disability at baseline.
HMS responsiveness varied according to the patient population it was applied to and the
RMQ or ODI proved to be the disability HMS of choice in all subgroups. However, the patients’ global retrospective assessment of treatment effect appeared to be the most responsive
instrument in the PrS patients. An algorithm simplifying the choice of pain and disability HMS
has been proposed.
A standardised use of patients’ global assessment of treatment effect (transition questions)
was proposed to simplify interpretation and comparisons of clinical trial results.
8.2. PART II: THE PROSPECTIVE ACCEPTABLE OUTCOME STUDY
The prospective acceptable outcome method offers a benchmark by which patients’ acceptable outcome can be scrutinised before treatment begins. It yields results which are not
comparable to the retrospective MCIDpost method and the disparity is possibly influenced by
a response shift. Moreover, LBP patients have a clearer understanding of what is an acceptable change in pain intensity before starting treatment compared to changes in function and
psychological/affective domains.
9. RECOMMENDATIONS
Research is ongoing in many areas of clinimetrics, and most of it reflects
methodological studies of the meaning and measurement of change, and the generality of
MCID values. The present studies have focused on a new area - subgroups - and have fostered
new information and ideas. These ideas have been drawn up in an agenda for future areas of
research:
• What are the consequences of using a less extensive cross-cultural adaptation process for HMS? Further work needs to validate a “light” version of the current cross-cultural adaptation process and delineate criteria for the situations where this version is acceptable.
• Future work should include clinimetric testing of HMS in subgroups other than the ones we have looked at. So far, dependence on initial disease severity [122, 123, 158, 159], acute and chronic patients [39], low back sprain vs. LBP with radiculopathy [149], surgical vs. non-surgical patients [72], point of entry into the health care system (PrS vs. SeS patients) and pain location (LBP only vs. leg pain ±LBP patients) have been investigated. It is likely that other sociodemographic factors, such as depression, co-morbidity, employment status, income, social status, duration of problem, duration of sick leave etc., may be important for the choice of HMS.
• Future work should include clinimetric testing of neck HMS in subgroups as very little work has been done in this area.
• Work should continue to explore the consequences of using different external criteria on responsiveness and the MCID. Recent work indicates that the magnitude of the responsiveness statistic depends on the type of criterion included in the study [68]. In particular, I recommend looking at the difference between patients’ global retrospective assessment of treatment effect and the rating of importance of such a change.
• Most MCID values have been established using one external criterion. To advance the confidence in the MCID for a particular HMS, future research should: 1) confirm the MCID values using other anchors, and 2) report MCID values using CIs as outlined in paper II-1. Accordingly, a range of MCIDs can be reported and scrutiny of individual MCID values for accuracy is possible.
• The advantages of combining distribution-based and anchor-based methods for back-specific HMS should be further explored to increase confidence in reported MCID ranges.
• Obtaining transition ratings by telephone interviews allows for independence between the HMS and the external criteria. In addition, it is quick and requires few resources. Future research is needed to establish the validity of this method.
• The prospective acceptable outcome method has been tested on patients seen in the SeS. Future studies should test whether the method yields similar results in PrS patients.
• Our results show that chronic LBP patients have unrealistically high expectations of the result of the treatment when asked before it begins. This was especially true for disability and psychological/affective domains. Consequences of a mismatch between acceptable patient outcomes and expected treatment efficacy on the reporting of results in a clinical trial should be investigated.
Bibliography
[1] de Vet HC, Terwee CB, Bouter LM. Current challenges in clinimetrics. J Clin Epidemiol. 2003
Dec;56(12):1137–1141.
[2] Feinstein AR. An additional basic science for clinical medicine: IV. The development of clinimetrics. Ann Intern Med. 1983 Dec;99(6):843–848.
[3] Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997
Mar;50(3):239–246.
[4] Beaton DE, Bombardier C, Katz JN, Wright JG. A taxonomy for responsiveness. J Clin Epidemiol.
2001 Dec;54(12):1204–1217.
[5] de Bruin AF, Diederiks JP, de Witte LP, Stevens FC, Philipsen H. Assessing the responsiveness of
a functional status measure: the Sickness Impact Profile versus the SIP68. J Clin Epidemiol. 1997
May;50(5):529–540.
[6] de Vet HC, Terwee CB, Ostelo RW, Beckerman H, Knol DL, Bouter LM. Minimal changes in health
status questionnaires: distinction between minimally detectable change and minimally important
change. Health Qual Life Outcomes. 2006 Aug;4(1):54–62.
[7] Lohr KN, Aaronson NK, Alonso J, Burnam MA, Patrick DL, Perrin EB, et al. Evaluating
quality-of-life and health status instruments: development of scientific review criteria. Clin Ther.
1996;18(5):979–992.
[8] Jaeschke R, Singer J, Guyatt GH. Measurement of health status. Ascertaining the minimal clinically
important difference. Control Clin Trials. 1989 Dec;10(4):407–415.
[9] de Vet HC, Terwee CB, Knol DL, Bouter LM. When to use agreement versus reliability measures. J
Clin Epidemiol. 2006 Oct;59(10):1033–1039.
[10] Deyo RA. Measuring the functional status of patients with low back pain. Arch Phys Med Rehabil.
1988 Dec;69(12):1044–1053.
[11] Streiner DL, Norman GR. Health Measurement Scales. A Practical Guide to Their Development and
Use. vol. Third. Streiner DL, Norman GR, editors. Oxford: Oxford Medical Publications; 2003.
[12] Lurie J. A review of generic health status measures in patients with low back pain. Spine. 2000
Dec;25(24):3125–3129.
[13] Kopec JA, Esdaile JM, Abrahamowicz M, Abenhaim L, Wood-Dauphinee S, Lamping DL, et al. The
Quebec Back Pain Disability Scale. Measurement properties. Spine. 1995 Feb;20(3):341–352.
[14] Stratford P, Gill C, Westaway M, Binkley J. Assessing disability and change on individual patients:
a report of a patient specific measure. Physiother Can. 1995;47(4):258–263.
[15] Walsh TL, Hanscom B, Lurie JD, Weinstein JN. Is a condition-specific instrument for patients with
low back pain/leg symptoms really necessary? The responsiveness of the Oswestry Disability
Index, MODEMS, and the SF-36. Spine. 2003 Mar;28(6):607–615.
[16] Finch E, Brooks D, Stratford PW, Mayo NE. Physical Rehabilitation Outcome Measures. A Guide
to Enhanced Clinical Decision Making. vol. Second. Finch E, Brooks D, Stratford PW, Mayo NE,
editors. BC Decker Inc.; 2002.
[17] Muller U, Roder C, Greenough CG. Back related outcome assessment instruments. Eur Spine J.
2006 Jan;15 Suppl 1:S25–S31.
[18] Grotle M, Brox JI, Vollestad NK. Functional Status and Disability Questionnaires: What Do
They Assess?: A Systematic Review of Back-Specific Outcome Questionnaires. Spine. 2005
Jan;30(1):130–140.
[19] Muller U, Duetz MS, Roeder C, Greenough CG. Condition-specific outcome measures for low back
pain. Part I: Validation. Eur Spine J. 2004 Mar;13:301–313.
[20] Muller U, Roeder C, Dubs L, Duetz MS, Greenough CG. Condition-specific outcome measures for
low back pain. Part II: Scale construction. Eur Spine J. 2004 Mar;13:314–324.
[21] Roland M, Morris R. A Study of the Natural-History of Back Pain .1. Development of A Reliable
and Sensitive Measure of Disability in Low-Back-Pain. Spine. 1983;8(2):141–144.
[22] Patrick DL, Deyo RA, Atlas SJ, Singer DE, Chapin A, Keller RB. Assessing health-related quality
of life in patients with sciatica. Spine. 1995 Sep;20(17):1899–1908.
[23] Stratford PW, Binkley JM. Measurement properties of the RM-18. A modified version of the
Roland-Morris Disability Scale. Spine. 1997 Oct;22(20):2416–2421.
[24] Williams RM, Myers AM. Support for a shortened Roland-Morris Disability Questionnaire for
patients with acute low back pain. Physiother Can. 2001;53(1):60–66.
[25] Dionne CE, Von Korff M, Koepsell TD, Deyo RA, Barlow WE, Checkoway H. A comparison of
pain, functional limitations, and work status indices as outcome measures in back pain research.
Spine. 1999 Nov;24(22):2339–2345.
[26] Atlas SJ, Deyo RA, van den Ancker M, Singer DE, Keller RB, Patrick DL. The Maine-Seattle back
questionnaire: a 12-item disability questionnaire for evaluating patients with lumbar sciatica or
stenosis: results of a derivation and validation cohort analysis. Spine. 2003 Aug;28(16):1869–1876.
[27] Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine. 2000 Nov;25(22):2940–2952.
[28] Bombardier C. Outcome assessments in the evaluation of treatment of spinal disorders: summary
and general recommendations. Spine. 2000 Dec;25(24):3100–3103.
[29] Deyo RA, Andersson G, Bombardier C, Cherkin DC, Keller RB, Lee CK, et al. Outcome Measures
for Studying Patients with Low-Back-Pain. Spine. 1994 Sep;19(18):S2032–S2036.
[30] Beurskens AJ, de Vet HC, Koke AJ, van der Heijden GJ, Knipschild PG. Measuring the functional
status of patients with low back pain. Assessment of the quality of four disease-specific questionnaires. Spine. 1995 May;20(9):1017–1028.
[31] Kopec JA, Esdaile JM. Functional disability scales for back pain. Spine. 1995 Sep;20(17):1943–1949.
[32] Millard RW, Beattie PF, Jones RH. A comprehensive review of questionnaires to evaluate chronic
pain-related disability. Critical Reviews in Physical and Rehabilitation Medicine. 1997;9(1):35–52.
[33] Kopec JA. Measuring functional outcomes in persons with back pain: a review of back-specific
questionnaires. Spine. 2000 Dec;25(24):3110–3114.
[34] Roland M, Fairbank J. The Roland-Morris Disability Questionnaire and the Oswestry Disability
Questionnaire. Spine. 2000 Dec;25(24):3115–3124.
[35] Beaton DE, Schemitsch E. Measures of health-related quality of life and physical function. Clin
Orthop. 2003 Aug;(413):90–105.
[36] Resnik L, Dobrzykowski E. Guide to outcomes measurement for patients with low back pain
syndromes. J Orthop Sports Phys Ther. 2003 Jun;33(6):307–316.
[37] Zanoli G, Stromqvist B, Padua R, Romanini E. Lessons learned searching for a HRQoL
instrument to assess the results of treatment in persons with lumbar disorders. Spine. 2000
Dec;25(24):3178–3185.
[38] Schaufele MK, Boden SD. Outcome research in patients with chronic low back pain. Orthop Clin
North Am. 2003 Apr;34(2):231–237.
[39] Grotle M, Brox JI, Vollestad NK. Concurrent comparison of responsiveness in pain and functional
status measurements used for patients with low back pain. Spine. 2004 Nov;29(21):E492–E501.
[40] Terwee CB, Bot SD, Boers M, van der Windt DA, Knol DL, Dekker J, et al. Quality criteria
were proposed for measurement properties of health status questionnaires. J Clin Epidemiol.
2006;60(1):34–42.
[41] Leboeuf-Yde C, Axén I, Jones JJ, Rosenbaum A, Løvgren PW, Halasz L, et al. The Nordic back pain
subpopulation program: the long-term outcome pattern in patients with low back pain treated by
chiropractors in Sweden. J Manipulative Physiol Ther. 2005 Sep;28(7):472–478.
[42] Fritz JM, Brennan GP, Clifford SN, Hunter SJ, Thackeray A. An examination of the reliability of a
classification algorithm for subgrouping patients with low back pain. Spine. 2006 Jan;31(1):77–82.
[43] Brennan GP, Fritz JM, Hunter SJ, Thackeray A, Delitto A, Erhard RE. Identifying subgroups of
patients with acute/subacute "nonspecific" low back pain: results of a randomized clinical trial.
Spine. 2006 Mar;31(6):623–631.
[44] Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life. A conceptual
model of patient outcomes. JAMA. 1995 Jan;273(1):59–65.
[45] Kirshner B, Guyatt G. A methodological framework for assessing health indices. J Chronic Dis.
1985;38(1):27–36.
[46] Sloan JA, Aaronson N, Cappelleri JC, Fairclough DL, Varricchio C. Assessing the clinical significance of single items relative to summated scores. Mayo Clin Proc. 2002 May;77(5):479–487.
[47] Manniche C, Asmussen K, Lauritsen B, Vinterberg H, Kreiner S, Jordan A. Low Back Pain Rating
scale: validation of a tool for assessment of low back pain. Pain. 1994 Jun;57(3):317–326.
[48] Ware JE. SF-36 health survey update. Spine. 2000 Dec;25(24):3130–3139.
[49] Boscainos PJ, Sapkas G, Stilianessi E, Prouskas K, Papadakis SA. Greek versions of the Oswestry
and Roland-Morris Disability Questionnaires. Clin Orthop. 2003 Jun;(411):40–53.
[50] Albert HB, Jensen AM, Dahl D, Rasmussen MN. Criteria validation of the Roland Morris questionnaire. A Danish translation of the international scale for the assessment of functional level in
patients with low back pain and sciatica. Ugeskr Laeger. 2003 Apr;165(18):1875–1880 [In Danish].
[51] Johansson E, Lindberg P. Subacute and chronic low back pain. Reliability and validity of a
Swedish version of the Roland and Morris Disability Questionnaire. Scand J Rehabil Med. 1998
Sep;30(3):139–143.
[52] Nusbaum L, Natour J, Ferraz MB, Goldenberg J. Translation, adaptation and validation of the
Roland-Morris questionnaire–Brazil Roland-Morris. Braz J Med Biol Res. 2001 Feb;34(2):203–210.
[53] Fayers PM, Machin D. Quality of Life. Assessment, Analysis and Interpretation. Fayers PM,
Machin D, editors. Chichester: John Wiley & Sons Ltd.; 2000.
[54] DeVellis R. Scale development. Theory and application. 2nd ed. Seawell M, editor. Sage Publications; 2003.
[55] Oppenheim AN. Questionnaire design, interviewing and attitude measurement. 2nd ed. Oppenheim AN, editor. Pinter Publishers; 1996.
[56] Hansagi H, Allebeck P. Enkät och intervju inom hälso- och sjukvård. Handbok för forskning och
utvecklingsarbete. Hansagi H, Allebeck P, editors. Studentlitteratur; 1996.
[57] Parsons S, Carnes D, Pincus T, Foster N, Breen A, Vogel S, et al. Measuring troublesomeness of
chronic pain by location. BMC Musculoskelet Disord. 2006 Apr;7(1):34–.
[58] Wittink H, Turk DC, Carr DB, Sukiennik A, Rogers W. Comparison of the redundancy, reliability,
and responsiveness to change among SF-36, Oswestry Disability Index, and Multidimensional Pain
Inventory. Clin J Pain. 2004 May;20(3):133–142.
[59] Bayar K, Bayar B, Yakut E, Yakut Y. Reliability and construct validity of the Oswestry Low Back
Pain Disability Questionnaire in the elderly with low back pain. Pain Clinic. 2003;15(1):55–59.
[60] Fritz JM, Piva SR. Physical impairment index: reliability, validity, and responsiveness in patients
with acute low back pain. Spine. 2003 Jun;28(11):1189–1194.
[61] Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM. On assessing responsiveness
of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res.
2003 Jun;12(4):349–362.
[62] Stratford PW, Riddle DL. Assessing sensitivity to change: choosing the appropriate change coefficient. Health Qual Life Outcomes. 2005;3(1):23–.
[63] Crosby RD, Kolotkin RL, Williams GR. Defining clinically meaningful change in health-related
quality of life. J Clin Epidemiol. 2003 May;56(5):395–407.
[64] Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical
review and recommendations. J Clin Epidemiol. 2000 May;53(5):459–468.
[65] Stratford PW, Binkley FM, Riddle DL. Health status measures: strategies and analytic methods for
assessing change scores. Phys Ther. 1996 Oct;76(10):1109–1123.
[66] Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials.
CMAJ. 1986 Apr;134(8):889–895.
[67] Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in
health-related quality-of-life scores. J Clin Oncol. 1998 Jan;16(1):139–144.
[68] Kuijer W, Brouwer S, Dijkstra PU, Jorritsma W, Groothoff JW, Geertzen JH. Responsiveness of the
Roland-Morris Disability Questionnaire: consequences of using different external criteria. Clin
Rehabil. 2005 Aug;19(5):488–495.
[69] Guyatt GH, Norman GR, Juniper EF, Griffith LE. A critical look at transition ratings. J Clin Epidemiol. 2002 Sep;55(9):900–908.
[70] Elliott AM, Smith BH, Hannaford PC, Smith WC, Chambers WA. Assessing change in chronic
pain severity: the chronic pain grade compared with retrospective perceptions. Br J Gen Pract.
2002 Apr;52(477):269–274.
[71] Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of
responsiveness to change: the lesson of Cronbach. J Clin Epidemiol. 1997 Aug;50(8):869–879.
[72] Hägg O, Fritzell P, Oden A, Nordwall A. Simplifying outcome measurement: evaluation of instruments for measuring outcome after fusion surgery for chronic low back pain. Spine. 2002
Jun;27(11):1213–1222.
[73] Hägg O, Fritzell P, Nordwall A. Simplifying outcome measurement. Eur Spine J. 2005;14(Suppl.
1):S1–S30.
[74] Redelmeier DA, Guyatt GH, Goldstein RS. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol. 1996 Nov;49(11):1215–1219.
[75] Beaton DE, Boers M, Wells GA. Many faces of the minimal clinically important difference (MCID): a
literature review and directions for future research. Curr Opin Rheumatol. 2002 Mar;14(2):109–114.
[76] Hays RD, Woolley JM. The concept of clinically meaningful difference in health-related
quality-of-life research. How meaningful is it? Pharmacoeconomics. 2000 Nov;18(5):419–423.
[77] Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life
measures: literature review and proposed guidelines. J Clin Epidemiol. 1993 Dec;46(12):1417–1432.
[78] Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural
adaptation of self-report measures. Spine. 2000 Dec;25(24):3186–3191.
[79] Scientific Advisory Committee of the Medical Outcomes Trust. Assessing health status and
quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002 May;11(3):193–205.
[80] Cortina JM. What Is Coefficient Alpha? An Examination of Theory and Applications. Journal of
Applied Psychology. 1993;78(1):98–104.
[81] Nunnally JC, Bernstein I. Psychometric Theory. 3rd ed. Vaicunas J, Belser JR, editors. McGraw Hill
Higher Education; 1993.
[82] McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health
status surveys adequate? Qual Life Res. 1995 Aug;4(4):293–307.
[83] Davidson M, Keating JL. A comparison of five low back disability questionnaires: reliability and
responsiveness. Phys Ther. 2002 Jan;82(1):8–24.
[84] Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998 Oct;26(4):217–238.
[85] Wyrwich KW, Tierney WM, Wolinsky FD. Further evidence supporting an SEM-based criterion for
identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol.
1999 Sep;52(9):861–873.
[86] Beckerman H, Roebroeck ME, Lankhorst GJ, Becher JG, Bezemer PD, Verbeek AL. Smallest real
difference, a link between reproducibility and responsiveness. Qual Life Res. 2001;10(7):571–578.
[87] Wyrwich KW, Nienaber NA, Tierney WM, Wolinsky FD. Linking clinical relevance and statistical
significance in evaluating intra-individual changes in health-related quality of life. Med Care. 1999
May;37(5):469–478.
[88] Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb;1(8476):307–310.
[89] Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against
standard method is misleading. Lancet. 1995 Oct;346(8982):1085–1087.
[90] Lamé IE, Peters ML, Vlaeyen JWS, van Kleef M, Patijn J. Quality of life in chronic pain is more
associated with beliefs about pain, than with pain intensity. Eur J Pain. 2005 Feb;9(1):15–24.
[91] Patrick DL, Chiang YP. Measurement of health outcomes in treatment effectiveness evaluations:
conceptual and methodological challenges. Med Care. 2000 Sep;38(9 Suppl):II14–II25.
[92] Hays RD, Hadorn D. Responsiveness to change: an aspect of validity, not a separate dimension.
Qual Life Res. 1992 Feb;1(1):73–75.
[93] Deyo RA, Centor RM. Assessing the responsiveness of functional scales to clinical change: an
analogy to diagnostic test performance. J Chronic Dis. 1986;39(11):897–906.
[94] Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative
instruments. J Chronic Dis. 1987;40(2):171–178.
[95] Guyatt GH, Deyo RA, Charlson M, Levine MN, Mitchell A. Responsiveness and validity in health
status measurement: a clarification. J Clin Epidemiol. 1989;42(5):403–408.
[96] Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res. 1993 Jun;2(3):221–226.
[97] Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care.
1989 Mar;27(3 Suppl):S178–S189.
[98] Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life: the
remarkable universality of half a standard deviation. Med Care. 2003 May;41(5):582–592.
[99] Moayyedi P, Duffett S, Braunholtz D, Mason S, Richards ID, Dowell AC, et al. The Leeds Dyspepsia Questionnaire: a valid tool for measuring the presence and severity of dyspepsia. Aliment
Pharmacol Ther. 1998 Dec;12(12):1257–1262.
[100] Speer DC, Greenbaum PE. Five methods for computing significant individual client change and
improvement rates: support for an individual growth curve approach. J Consult Clin Psychol. 1995
Dec;63(6):1044–1048.
[101] Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. Importance of sensitivity to change as
a criterion for selecting health status measures. Qual Health Care. 1992 Jun;1(2):89–93.
[102] Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and
responsiveness of five generic health status measures in workers with musculoskeletal disorders.
J Clin Epidemiol. 1997 Jan;50(1):79–93.
[103] Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures.
Statistics and strategies for evaluation. Control Clin Trials. 1991 Aug;12(4 Suppl):142S–158S.
[104] Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change
in psychotherapy research. J Consult Clin Psychol. 1991 Feb;59(1):12–19.
[105] Jenkinson C. Measuring Health And Medical Outcomes. Jenkinson C, editor. Taylor & Francis;
1994.
[106] Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Cohen J, editor.
Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
[107] Fitzpatrick R, Ziebland S, Jenkinson C, Mowat A, Mowat A. Transition Questions to Assess Outcomes in Rheumatoid-Arthritis. Br J Rheumatol. 1993 Sep;32(9):807–811.
[108] Fischer D, Stewart AL, Bloch DA, Lorig K, Laurent D, Holman H. Capturing the patient’s view of
change as a clinical outcome measure. JAMA. 1999 Sep;282(12):1157–1162.
[109] Farrar JT, Portenoy RK, Berlin JA, Kinman JL, Strom BL. Defining the clinically important difference in pain outcome measures. Pain. 2000;88(3):287–294.
[110] Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR. Methods to explain the clinical significance of health status measures. Mayo Clin Proc. 2002 Apr;77(4):371–383.
[111] Barck AL. Measurement of clinical change caused by knee replacement. Conventional score or
special change indexes? Arch Orthop Trauma Surg. 1999;119(1-2):76–78.
[112] Deyo RA, Inui TS, Leininger J, Overman S. Physical and psychosocial function in rheumatoid
arthritis. Clinical use of a self-administered health status instrument. Arch Intern Med. 1982
May;142(5):879–882.
[113] Testa MA, Simonson DC. Assessment of quality-of-life outcomes. N Engl J Med. 1996
Mar;334(13):835–840.
[114] Llewellyn-Thomas HA, Williams JI, Levy L, Naylor CD. Using a trade-off technique to
assess patients’ treatment preferences for benign prostatic hyperplasia. Med Decis Making.
1996;16(3):262–282.
[115] Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and
generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol. 1995
Nov;48(11):1369–1378.
[116] Kolotkin RL, Crosby RD, Williams GR. Health-related quality of life varies among obese subgroups. Obes Res. 2002 Aug;10(8):748–756.
[117] Juniper EF, Guyatt GH, Willan A, Griffith LE. Determining a minimal important change in a
disease-specific Quality of Life Questionnaire. J Clin Epidemiol. 1994 Jan;47(1):81–87.
[118] Middel B, Goudriaan H, de Greef M, Stewart R, van Sonderen E, Bouma J, et al. Recall bias did not
affect perceived magnitude of change in health-related functional status. J Clin Epidemiol. 2006
May;59(5):503–511.
[119] Aseltine RH, Carlson KJ, Fowler FJ Jr, Barry MJ. Comparing prospective and retrospective measures
of treatment outcomes. Med Care. 1995 Apr;33(4 Suppl):AS67–AS76.
[120] Herrmann D. Reporting current, past, and changed health status. What we know about distortion.
Med Care. 1995 Apr;33(4 Suppl):AS89–AS94.
[121] Farrar JT, Young JP Jr, LaMoreaux L, Werth JL, Poole RM. Clinical importance of changes in chronic
pain intensity measured on an 11-point numerical pain rating scale. Pain. 2001 Dec;94(2):149–158.
[122] Stratford PW, Binkley J, Solomon P, Finch E, Gill C, Moreland J. Defining the minimum level of
detectable change for the Roland-Morris questionnaire. Phys Ther. 1996 Apr;76(4):359–365.
[123] Stratford PW, Binkley JM, Riddle DL, Guyatt GH. Sensitivity to change of the Roland-Morris Back
Pain Questionnaire: part 1. Phys Ther. 1998 Nov;78(11):1186–1196.
[124] Cella D, Eton DT, Lai JS, Peterman AH, Merkel DE. Combining anchor and distribution-based
methods to derive minimal clinically important differences on the Functional Assessment of Cancer Therapy (FACT) anemia and fatigue scales. J Pain Symptom Manage. 2002 Dec;24(6):547–561.
[125] Eton DT, Cella D, Yost KJ, Yount SE, Peterman AH, Neuberg DS, et al. A combination of
distribution- and anchor-based approaches determined minimally important differences (MIDs)
for four endpoints in a breast cancer scale. J Clin Epidemiol. 2004 Oct;57(9):898–910.
[126] Yost KJ, Cella D, Chawla A, Holmgren E, Eton DT, Ayanian JZ, et al. Minimally important differences were estimated for the Functional Assessment of Cancer Therapy-Colorectal (FACT-C)
instrument using a combination of distribution- and anchor-based approaches. J Clin Epidemiol.
2005 Dec;58(12):1241–1251.
[127] Cella D, Eton DT, Fairclough DL, Bonomi P, Heyes AE, Silberman C, et al. What is a clinically
meaningful change on the Functional Assessment of Cancer Therapy-Lung (FACT-L) Questionnaire? Results from Eastern Cooperative Oncology Group (ECOG) Study 5592. J Clin Epidemiol.
2002 Mar;55(3):285–295.
[128] Chansirinukor W, Maher CG, Latimer J, Hush J. Comparison of the functional rating index and
the 18-item Roland-Morris Disability Questionnaire: responsiveness and reliability. Spine. 2005
Jan;30(1):141–145.
[129] Bjorner JB, Damsgaard MT, Watt T, Groenvold M. Tests of data quality, scaling assumptions, and
reliability of the Danish SF-36. J Clin Epidemiol. 1998 Nov;51(11):1001–1011.
[130] Bjorner JB, Kreiner S, Ware JE, Damsgaard MT, Bech P. Differential item functioning in the Danish
translation of the SF-36. J Clin Epidemiol. 1998 Nov;51(11):1189–1202.
[131] Bjorner JB, Thunedborg K, Kristensen TS, Modvig J, Bech P. The Danish SF-36 Health Survey:
translation and preliminary validity studies. J Clin Epidemiol. 1998 Nov;51(11):991–999.
[132] Fairbank JC, Couper J, Davies JB, O’Brien JP. The Oswestry low back pain disability questionnaire.
Physiotherapy. 1980 Aug;66(8):271–273.
[133] Stratford PW, Spadoni G, Kennedy D, Westaway MD, Alcock GK. Seven points to consider when
investigating a measure’s ability to detect change. Physiother Can. 2002;54(1):16–24.
[134] Guyatt GH. Making sense of quality-of-life data. Med Care. 2000 Sep;38(9 Suppl):II175–II179.
[135] Efron B, Tibshirani RJ. An Introduction to the Bootstrap. 1st ed. New York: Chapman and
Hall; 1993.
[136] Phillips PCB, Park JY. On the formulation of Wald tests of nonlinear restrictions. Econometrica.
1988;56(5):1065–1083.
[137] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated
receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988
Sep;44(3):837–845.
[138] Hägg O, Fritzell P, Nordwall A. The clinical importance of changes in outcome scores after treatment for chronic low back pain. Eur Spine J. 2003 Feb;12(1):12–20.
[139] de Vet HC, Bouter LM, Bezemer PD, Beurskens AJ. Reproducibility and responsiveness of evaluative outcome measures. Theoretical considerations illustrated by an empirical example. Int J
Technol Assess Health Care. 2001;17(4):479–487.
[140] Beurskens AJ, de Vet HC, Koke AJ. Responsiveness of functional status in low back pain: a comparison of different instruments. Pain. 1996 Apr;65(1):71–76.
[141] Bolton JE, Breen AC. The Bournemouth Questionnaire: a short-form comprehensive outcome
measure. I. Psychometric properties in back pain patients. J Manipulative Physiol Ther. 1999
Oct;22(8):503–510.
[142] Bolton JE, Humphreys BK. The Bournemouth Questionnaire: A short-form comprehensive outcome measure. II. Psychometric properties in neck pain patients. J Manipulative Physiol Ther.
2002 Mar;25(3):141–148.
[143] Fisher K, Johnston M. Validation of the Oswestry Low Back Pain Disability Questionnaire, its
sensitivity as a measure of change following treatment and its relationship with other aspects of
the chronic pain experience. Physiotherapy Theory and Practice. 1997;13(1):67–80.
[144] Grotle M, Brox JI, Vollestad NK. Cross-cultural adaptation of the Norwegian versions of the
Roland-Morris Disability Questionnaire and the Oswestry Disability Index. J Rehabil Med. 2003
Sep;35(5):241–247.
[145] Baker D, Pynsent PB, Fairbank JC. The Oswestry Disability Index revisited: its reliability, repeatability and validity, and a comparison with the St Thomas’s Disability Index. In: Roland M, Jenner J,
editors. Back Pain: New Approaches to Rehabilitation and Education. 12. Manchester: Manchester
University Press; 1989. p. 174–186.
[146] Fritz JM, Irrgang JJ. A comparison of a modified Oswestry Low Back Pain Disability Questionnaire
and the Quebec Back Pain Disability Scale. Phys Ther. 2001 Feb;81(2):776–788.
[147] Mannion AF, Junge A, Fairbank JC, Dvorak J, Grob D. Development of a German version of the
Oswestry Disability Index. Part 1: cross-cultural adaptation, reliability, and validity. Eur Spine J.
2006;15(1):55–65.
[148] Mannion AF, Junge A, Grob D, Dvorak J, Fairbank JC. Development of a German version of
the Oswestry Disability Index. Part 2: sensitivity to change after spinal surgery. Eur Spine J.
2006;15(1):66–73.
[149] Leclaire R, Blier F, Fortin L, Proulx R. A cross-sectional study comparing the Oswestry and
Roland-Morris Functional Disability scales in two populations of patients with low back pain of
different levels of severity. Spine. 1997 Jan;22(1):68–71.
[150] Taylor SJ, Taylor AE, Foy MA, Fogg AJ. Responsiveness of common outcome measures for patients
with low back pain. Spine. 1999 Sep;24(17):1805–1812.
[151] Suarez-Almazor ME, Kendall C, Johnson JA, Skeith K, Vincent D. Use of health status measures in patients with low back pain in clinical settings. Comparison of specific, generic and
preference-based instruments. Rheumatology. 2000 Jul;39(7):783–790.
[152] Rantanen P. Physical measurements and questionnaires as diagnostic tools in chronic low back
pain. J Rehabil Med. 2001 Jan;33(1):31–35.
[153] Davidson M, Keating JL, Eyres S. A low back-specific version of the SF-36 Physical Functioning
scale. Spine. 2004 Mar;29(5):586–594.
[154] Lønnberg F. The management of back problems among the population. I. Contact patterns and
therapeutic routines. Ugeskr Laeger. 1997 Apr;159(15):2207–2214 [In Danish].
[155] Sørensen LP. Chiropractor patients in Denmark - a patient profile. Hartvigsen J, editor. Nordic
Institute of Chiropractic and Clinical Biomechanics; 2002 [In Danish].
[156] Gaist D, Sørensen HT, Hallas J. The Danish prescription registries. Dan Med Bull. 1997
Sep;44(4):445–448.
[157] Yelland MJ, Schluter PJ. Defining Worthwhile and Desired Responses to Treatment of Chronic Low
Back Pain. Pain Medicine. 2006;7(1):38–45.
[158] Angst F, Aeschlimann A, Michel BA, Stucki G. Minimal clinically important rehabilitation effects
in patients with osteoarthritis of the lower extremities. J Rheumatol. 2002 Jan;29(1):131–138.
[159] Riddle DL, Stratford PW, Binkley JM. Sensitivity to change of the Roland-Morris Back Pain Questionnaire: part 2. Phys Ther. 1998 Nov;78(11):1197–1207.
10. APPENDICES
The appendices included are divided into two sections:
“Responsiveness in Subgroups Study”
10.1 Detailed description of the instructions to the interviewer
10.2 Baseline and eight weeks follow-up questionnaire booklet
10.3 Test-retest booklet
10.4 Transition questions 1 and 2 (including cover letter)
“Prospective Acceptable Outcome Study”
10.5 The semi-structured interview form
10.6 Baseline questionnaire booklet
10.7 One week follow-up booklet
10.8 Eight weeks follow-up booklet
10.9 Nine weeks follow-up interview form
10.10 The Danish version of the Oswestry Disability Index
10.1. APPENDIX I
Detailed description of the instructions to the interviewer used in the “Responsiveness in Subgroups Study”.
Appendix 1
Instructions to the professional interviewer who obtained the patient’s global retrospective
assessment of treatment effect and the global rating of importance.
The interviewer first introduced the baseline NRS pain score by reading the statement:
“The first time you filled out a questionnaire booklet, you estimated your overall pain from the back
and/or leg to be: ? (score read) on a 0 to 10 scale with 0 being “no pain” and 10 being “the worst
imaginable pain””
This was repeated if the patient didn’t understand it the first time. The protocol was now divided
into two separate protocols – one for TQ1 and one for TQ2.
TQ1:
The interviewer read the sentence:
“How would you describe your general low back and/or leg problems now, compared to how you
were when the treatment started?”
Following this, all the response options were read twice to ensure clarity and understanding. The
interviewer was strictly not allowed to help the patient with the answer; however, if a patient
couldn’t choose a specific category, the interviewer decided from the patient’s response whether
the patient was “better”, “about the same” or “worse”. If the interviewer decided that the patient
was better, the categories for improvement were read again (“much better”, “better”, “a little
better”), and similarly for patients classified as worse. If the patient still couldn’t choose, the
response was recorded as missing.
TQ2:
The interviewer read the sentence:
“If you think about the first time you filled out the questionnaire booklet, how would you describe
your average low back/leg problems?”
Three options were read to the patient: 1) better, 2) about the same, or 3) worse.
After the patient had chosen one, the interviewer read the question:
“How much better/worse are your low back or leg problems compared to how you were when the
treatment started?”
Following this, the 7 response options were read twice to ensure clarity and understanding. The
interviewer was strictly not allowed to help the patient with the answer; however, if a
patient couldn’t choose a specific category, the interviewer read the response options again. If the
patient still couldn’t choose, the response was recorded as missing.
10.2. APPENDIX II
Baseline and eight weeks follow-up questionnaire booklet used in the “Responsiveness in Subgroups Study”. The eight weeks follow-up booklet omitted page ii.
A little about yourself

Sex? [ ] Male [ ] Female
What is your age? ___________ years.
How long have your current pains lasted? approx. ____ days
How many days per week do you have pain (on average)? ____ days
Have you had the current problem before? [ ] Yes [ ] No
If yes, approximately how many times? [ ] 0-2 times [ ] 2-10 times [ ] more than 10 times
Where does it hurt? (tick one box)
[ ] In the lower back. ("The lower back" means the shaded area.)
[ ] In one or both legs. (The legs mean the shaded area. Leg pain that does not originate from the back, e.g. osteoarthritis of the knee, is not counted here.)
[ ] Both places
[ ] Neither place. Describe:
How often have you taken pain-relieving medication for your back/leg pain within the last week?
[ ] Never [ ] A couple of times [ ] More than a couple of times, but not daily [ ] Daily
Have you previously sought treatment for the same problem? [ ] Yes [ ] No
If yes, from whom? [ ] Medical doctor [ ] Chiropractor [ ] Physiotherapist [ ] Other
Is there, because of your back, an ongoing compensation case (e.g. a workers' compensation case, patient insurance case or complaint case)? [ ] Yes [ ] No
[Booklet page ii, marked "primær" (primary)]
The Roland Morris Questionnaire

When your back or legs hurt, some of the things you usually do may have become more difficult. This form contains some sentences that people with back pain or leg pain (sciatica) have used to describe themselves. Some of the sentences may stand out because they describe you as you are today. As you read the list, think of yourself today. When you read a sentence that describes how you are today, tick Yes. If the sentence does not describe your condition today, tick No.

1. I stay at home most of the time because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
2. I change position frequently to try to make my back and legs comfortable. [ ] Yes [ ] No
3. I walk more slowly than usual because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
4. Because of my back problem or leg pain (sciatica), I am not doing any of the things that I usually do in and around the house. [ ] Yes [ ] No
5. Because of my back problem or leg pain (sciatica), I use the handrail to get upstairs. [ ] Yes [ ] No
6. Because of my back problem or leg pain (sciatica), I have to hold on to something to get out of an armchair. [ ] Yes [ ] No
7. I get dressed more slowly than usual because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
8. I only stand up for short periods of time because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
9. Because of my back problem or leg pain (sciatica), I try to avoid bending or kneeling down. [ ] Yes [ ] No
10. I find it difficult to get out of an armchair because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
11. My back or leg is painful almost all the time. [ ] Yes [ ] No
12. I find it difficult to turn over in bed because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
13. I have trouble putting on my socks or stockings because of the pain in my back or leg. [ ] Yes [ ] No
14. I only walk short distances because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
15. I sleep less well because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
16. I avoid heavy jobs in and around the house because of my back problem or leg pain (sciatica). [ ] Yes [ ] No
17. Because of my back problem or leg pain (sciatica), I am more irritable and bad tempered with people than usual. [ ] Yes [ ] No
18. Because of my back problem or leg pain (sciatica), I go upstairs more slowly than usual. [ ] Yes [ ] No
19. I stay in bed most of the time because of my back or leg pain (sciatica). [ ] Yes [ ] No
20. Because of my back problem or leg pain (sciatica), my sexual activity has decreased. [ ] Yes [ ] No
21. I keep rubbing or holding the areas of my body that hurt or are uncomfortable. [ ] Yes [ ] No
22. Because of my back problem or leg pain (sciatica), I am doing less of the daily work in and around the house than I would otherwise do. [ ] Yes [ ] No
23. I often express concern to other people over what might be happening to my health. [ ] Yes [ ] No
10.2. APPENDIX II
SF-36
Instructions: This questionnaire is about your perception of your physical function and your pain. Answer each question by circling the answer that fits you best. If you are unsure how to answer, please answer as best you can.
1. The following questions are about activities in daily life. Does your health limit you in these activities? If so, how much? (Circle one number on each line)
(1 = Yes, limited a lot; 2 = Yes, limited a little; 3 = No, not limited at all)
a. Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports 1 2 3
b. Lighter activities, such as moving a table, vacuuming or cycling 1 2 3
c. Lifting or carrying groceries 1 2 3
d. Climbing several flights of stairs 1 2 3
e. Climbing one flight of stairs 1 2 3
f. Bending down or squatting 1 2 3
g. Walking more than one kilometre 1 2 3
h. Walking a few hundred metres 1 2 3
i. Walking 100 metres 1 2 3
j. Bathing or dressing 1 2 3
2. How severe has your physical pain been during the past week? (Circle one only)
No pain: 1
Very mild pain: 2
Mild pain: 3
Moderate pain: 4
Severe pain: 5
Very severe pain: 6
3. During the past week, how much has physical pain interfered with your daily work (both work outside the home and housework)? (Circle one only)
Not at all: 1
A little: 2
Somewhat: 3
Quite a bit: 4
Very much: 5
Oswestry questionnaire
This questionnaire is designed to give us knowledge about how your back or leg pain affects your ability to manage in everyday life. Tick only one box in each section. Choose the statement that fits you best today. We realise that you may feel that two or more statements in the same section apply to you today, but for the sake of the clarity of the study we ask you to mark only the statement that best describes your problem.
Section 1: Pain
□ I have no pain at the moment
□ The pain is very mild at the moment
□ The pain is moderate at the moment
□ The pain is fairly severe at the moment
□ The pain is very severe at the moment
□ The pain is the worst imaginable at the moment
Section 2: Personal care (e.g. washing, dressing)
□ I can look after myself normally without it causing extra pain
□ I can look after myself normally, but it causes pain
□ It is painful to look after myself, and I am slow and careful
□ I need some help, but can manage most of my personal care myself
□ I need help every day with most of my personal care
□ I do not get dressed, can wash only with difficulty, and stay in bed
Section 3: Lifting
□ I can lift something heavy without extra pain
□ I can lift something heavy, but it gives me extra pain
□ Pain prevents me from lifting something heavy off the floor, but I can manage if it is conveniently placed, e.g. on a table
□ Pain prevents me from lifting heavy things, but I can manage something light to medium-heavy if it is conveniently placed
□ I can only lift something very light
□ I cannot lift or carry anything at all
Section 4: Walking
□ I can walk as far as I like even though I have pain
□ Pain prevents me from walking more than 2 kilometres
□ Pain prevents me from walking more than 1 kilometre
□ Pain prevents me from walking more than 500 metres
□ I can only walk using a stick or crutches
□ I am in bed most of the time and have to crawl to the toilet
Section 5: Sitting
□ I can sit in any chair as long as I like
□ I can only sit in my favourite chair as long as I like
□ Pain prevents me from sitting for more than 1 hour
□ Pain prevents me from sitting for more than ½ hour
□ Pain prevents me from sitting for more than 10 minutes
□ I cannot sit at all because of the pain
Section 6: Standing
□ I can stand as long as I want without extra pain
□ I can stand as long as I want, but it gives me extra pain
□ Pain prevents me from standing for more than 1 hour
□ Pain prevents me from standing for more than ½ hour
□ Pain prevents me from standing for more than 10 minutes
□ I cannot stand at all because of the pain
Section 7: Sleeping
□ My sleep is never disturbed by pain
□ My sleep is occasionally disturbed by pain
□ Because of pain I get less than 6 hours of sleep
□ Because of pain I get less than 4 hours of sleep
□ Because of pain I get less than 2 hours of sleep
□ I cannot sleep at all because of the pain
Section 8: Sex life (if applicable)
□ My sex life is normal and causes no extra pain
□ My sex life is normal, but causes extra pain
□ My sex life is nearly normal, but causes a lot of pain
□ My sex life is severely restricted by pain
□ My sex life has almost ceased because of pain
□ Pain prevents any sex life at all
Section 9: My social life
□ My social life is normal and gives me no extra pain
□ My social life is normal, but increases my pain
□ Pain does not significantly limit my social life, apart from the more physical activities such as sport etc.
□ Pain has restricted my social life, and I do not go out as often
□ Pain has restricted my social life to my home
□ I have no social life because of the pain
Section 10: Travelling
□ I can travel anywhere without pain
□ I can travel anywhere, but it gives me extra pain
□ The pain is bad, but I can manage journeys of over 2 hours
□ Pain restricts my journeys to less than 1 hour
□ Pain restricts my journeys to short, necessary journeys of under 30 minutes
□ Pain prevents me from travelling except to receive treatment
Thank you for your help.
Low back pain rating scale
Tick only one box on each line, where 0 corresponds to no pain at all and 10 corresponds to the worst possible pain. “10” corresponds to the worst possible pain you can imagine, and thus not (necessarily) to the most severe back pain you have experienced.
BACK PAIN
Your back pain RIGHT NOW:
No pain at all 0 1 2 3 4 5 6 7 8 9 10 Worst possible pain
The WORST back pain you have had within the last 14 days:
0 1 2 3 4 5 6 7 8 9 10
The AVERAGE back pain over the last 14 days:
0 1 2 3 4 5 6 7 8 9 10
LEG PAIN
Your leg pain RIGHT NOW:
No pain at all 0 1 2 3 4 5 6 7 8 9 10 Worst possible pain
The WORST leg pain you have had within the last 14 days:
0 1 2 3 4 5 6 7 8 9 10
The AVERAGE leg pain over the last 14 days:
0 1 2 3 4 5 6 7 8 9 10
YOUR ASSESSMENT OF YOUR PHYSICAL/MENTAL CAPACITY IN EVERYDAY LIFE OVER THE LAST 14 DAYS.
Tick one box on each line. Each line has four response boxes, labelled: Can cause problems, No, Don't know, Yes.
1. Do you wake up at night because of back/leg pain?
2. Can you manage daily tasks without your back reducing your activity?
3. Can you manage lighter tasks at home, such as watering flowers or carrying plates from the table?
4. Can you put on shoes and socks yourself?
5. Can you carry two full shopping bags (10 kg in total)?
6. Can you get up from a low armchair without difficulty?
7. Can you lean forward over the washbasin to brush your teeth?
8. Can you climb the stairs from one floor to the next without resting because of back/leg pain?
9. Can you walk 400 m without resting because of back/leg pain?
10. Can you run 100 m without resting because of back/leg pain?
11. Can you cycle or travel by car/bus without back/leg pain?
12. Do you feel that the back/leg pain affects your emotional relationships with your closest family?
13. Does the back/leg pain hamper your sex life?
14. Do you think there is any work that your back cannot cope with?
15. Do you think the back disorder will affect your future?
General back/leg pain
If you were to describe overall how your general back/leg pain has been today, how has it been? Tick only one box.
No pain at all 0 1 2 3 4 5 6 7 8 9 10 Worst imaginable pain
(Baseline)
10.3. APPENDIX III
Test-retest booklet used in the “Responsiveness in Subgroups Study”.
Oswestry questionnaire
This questionnaire is designed to give us knowledge about how your back or leg pain affects your ability to manage in everyday life. Tick only one box in each section. Choose the statement that fits you best today. We realise that you may feel that two or more statements in the same section apply to you today, but for the sake of the clarity of the study we ask you to mark only the statement that best describes your problem.
Section 1: Personal care (e.g. washing, dressing)
□ I can look after myself normally without it causing extra pain
□ I can look after myself normally, but it causes pain
□ It is painful to look after myself, and I am slow and careful
□ I need some help, but can manage most of my personal care myself
□ I need help every day with most of my personal care
□ I do not get dressed, can wash only with difficulty, and stay in bed
Section 2: Pain
□ I have no pain at the moment
□ The pain is very mild at the moment
□ The pain is moderate at the moment
□ The pain is fairly severe at the moment
□ The pain is very severe at the moment
□ The pain is the worst imaginable at the moment
Section 3: Sitting
□ I can sit in any chair as long as I like
□ I can only sit in my favourite chair as long as I like
□ Pain prevents me from sitting for more than 1 hour
□ Pain prevents me from sitting for more than ½ hour
□ Pain prevents me from sitting for more than 10 minutes
□ I cannot sit at all because of the pain
Section 4: Lifting
□ I can lift something heavy without extra pain
□ I can lift something heavy, but it gives me extra pain
□ Pain prevents me from lifting something heavy off the floor, but I can manage if it is conveniently placed, e.g. on a table
□ Pain prevents me from lifting heavy things, but I can manage something light to medium-heavy if it is conveniently placed
□ I can only lift something very light
□ I cannot lift or carry anything at all
Section 5: Standing
□ I can stand as long as I want without extra pain
□ I can stand as long as I want, but it gives me extra pain
□ Pain prevents me from standing for more than 1 hour
□ Pain prevents me from standing for more than ½ hour
□ Pain prevents me from standing for more than 10 minutes
□ I cannot stand at all because of the pain
Section 6: Walking
□ I can walk as far as I like even though I have pain
□ Pain prevents me from walking more than 2 kilometres
□ Pain prevents me from walking more than 1 kilometre
□ Pain prevents me from walking more than 500 metres
□ I can only walk using a stick or crutches
□ I am in bed most of the time and have to crawl to the toilet
Section 7: Travelling
□ I can travel anywhere without pain
□ I can travel anywhere, but it gives me extra pain
□ The pain is bad, but I can manage journeys of over 2 hours
□ Pain restricts my journeys to less than 1 hour
□ Pain restricts my journeys to short, necessary journeys of under 30 minutes
□ Pain prevents me from travelling except to receive treatment
Section 8: Sleeping
□ My sleep is never disturbed by pain
□ My sleep is occasionally disturbed by pain
□ Because of pain I get less than 6 hours of sleep
□ Because of pain I get less than 4 hours of sleep
□ Because of pain I get less than 2 hours of sleep
□ I cannot sleep at all because of the pain
Section 9: My social life
□ My social life is normal and gives me no extra pain
□ My social life is normal, but increases my pain
□ Pain does not significantly limit my social life, apart from the more physical activities such as sport etc.
□ Pain has restricted my social life, and I do not go out as often
□ Pain has restricted my social life to my home
□ I have no social life because of the pain
Section 10: Sex life (if applicable)
□ My sex life is normal and causes no extra pain
□ My sex life is normal, but causes extra pain
□ My sex life is nearly normal, but causes a lot of pain
□ My sex life is severely restricted by pain
□ My sex life has almost ceased because of pain
□ Pain prevents any sex life at all
Thank you for your help.
General back/leg pain
How has your back/leg pain been in general since yesterday? Tick only one box.
□ Better
□ Unchanged
□ Worse
□ Not sure/don't know
(primary, 1d)
10.4. APPENDIX IV
Transition questions 1 and 2 (including cover letter) used in the “Responsiveness in Subgroups
Study”.
Group A, 8-week follow-up
Ref.: _________
Change in back/leg pain
How would you describe the general condition of your back and legs now, compared with how you were when you started treatment? Tick only one box.
□ Much better
□ Better
□ A little better
□ About the same ...... You are finished.
□ A little worse
□ Worse
□ Much worse
How important to you is the change you have experienced in your back and leg pain since treatment started? Tick only one box.
Not important 0 1 2 3 4 5 6 7 8 9 10 Very important
Group B, 8-week follow-up
Ref.: _________
Change in back/leg pain
If you think about how you were the first time you completed the questionnaires, how has your back/leg pain been on average since then? Tick only one box.
□ Better ..................... Go to question 1.
□ Unchanged ................ You are finished.
□ Worse .................... Go to question 2.
Question 1: How much better has the pain in your back and legs become since you started treatment?
□ About the same, almost no improvement
□ A little better
□ Somewhat better
□ Quite a bit better
□ Much better
□ Very much better
□ Completely well
Question 2: How much worse has the pain in your back and legs become since you started treatment?
□ About the same, almost no worsening
□ A little worse
□ Somewhat worse
□ Quite a bit worse
□ Much worse
□ Very much worse
□ Worst imaginable
How important to you is the change you have experienced in your back/leg pain since treatment started? Tick only one box.
Not important / Indifferent 0 1 2 3 4 5 6 7 8 9 10 Very important
Dear XXXX
Re: usXXXX
The following patient has been included in our study of the effect of back treatment. As agreed,
I am sending you the details so that you can conduct a telephone interview. Remember to remind the
patient how he/she was the first time, before you read out the enclosed questionnaire.
Please write the reference number on the interview form and use it when entering the
results.
Thank you for your help.
Henrik H. Lauridsen
Syddansk Universitet
Institut for Idræt og Biomekanik
Campusvej 55
5230 Odense M
Tel.: 65503487
Reference number: ____________________
Patient's name: _______________________________________________________________
Telephone number: ____________________
Information for the respondent, to be read out first:
The first time you completed the questionnaires, you rated your general pain from the back and
legs as follows:
No pain at all 0 1 2 3 4 5 6 7 8 9 10 Worst imaginable pain
10.5. APPENDIX V
The semi-structured interview form used in the “Prospective Acceptable Outcome Study”.
Respondent no. _____
Questions for the pilot study: baseline forms
If there is no time now, may I call? □ Yes □ No
Telephone number:_________________
Difficulty
1. Overall, were the questions difficult to answer?
□ Yes
□ No
If yes, please describe what was difficult:
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
2. In question 8 on page 2 (back and/or leg pain), was it difficult to choose a response category for what
was acceptable after treatment?
□ Yes
□ No
If yes, please describe what was difficult:
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
_______________________________________________________________________________
Comprehension
3. Please describe how you understand the sentence: “what result you would accept after
treatment?” (may be tape-recorded)
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
_______________________________________________________________________________
4. Do you think there is a difference between “what one expects/hopes for from the treatment” and “what one
will accept from the treatment”?
□ Yes
□ No
□ Don't know
If yes, can you describe what the difference is? (may be tape-recorded)
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
_______________________________________________________________________________
If no, can you describe why not? (may be tape-recorded)
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
_______________________________________________________________________________
Importance of a change
If you were to judge whether a given improvement/worsening is either important or not important, where
would you split the scale shown below? (Draw a vertical line between the two numbers where it splits.)
No pain at all 0 1 2 3 4 5 6 7 8 9 10 Worst imaginable pain
Unanswered questions
5. Why did you not answer question _________________?
Answer:
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
6. Why did you not answer question _________________?
Answer:
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
Comments written on the forms
7. Comment written at ________________
Why did you write a comment instead of ticking the form?
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
8. Comment written at question(s) ________________
Why did you write a comment instead of ticking the form?
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
Other
9. Other comments/objections?
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
________________________________________________________________________________
10.6. APPENDIX VI
Baseline questionnaire booklet used in the “Prospective Acceptable Outcome Study”.
A little about yourself
Date _______________
Name _________________________________________________________________
Address _______________________________________________________________
Postcode ___________________________________________________________
Town ____________________________________________________________________
Tel. no. Home ________________ Work __________________ Mobile ____________________
1. Have you had the current problem before? □ Yes □ No
2. If Yes, approximately how many times? □ 0-2 times □ 3-10 times □ more than 10 times
3. Where do you have pain? (tick one)
□ In the lower back
□ In one or both legs
□ Both places
□ Neither place. Describe:
“Lower back” means the shaded area. “Legs” means the shaded area. Pain in the legs that does not originate from the back, e.g. osteoarthritis of the knee, is not included here.
4. How often have you taken pain-relieving medication for your back/leg pain within the last week?
□ Never
□ A couple of times
□ More than a couple of times, but not daily
□ Daily
5. Have you previously sought treatment for the same problem? □ Yes □ No
General back and/or leg pain
These two questions concern your overall back and/or leg pain.
6. If you were to rate your back and/or leg pain overall during the last 2 weeks,
how has it been?
Tick only one box on the scale from “0” to “10”.
No pain at all 0 1 2 3 4 5 6 7 8 9 10 Worst imaginable pain
7. If you were to describe your back and/or leg pain overall during the last 2 weeks, how has it
been?
Tick only one box.
□ No pain
□ Mild pain
□ Moderate pain
□ Severe pain
The following questions concern your back and/or leg pain. Tick only one box on each scale
from “0” to “10”. Please read each question carefully before answering.
Each section consists of two questions: The first question is about how you have been over the last
couple of days. The second question is about what result you would accept after treatment, if, for
example, you had to accept a certain amount of pain. In other words, it is not what you expect/hope for
from the treatment, but what you will accept.
Here is an example of how to do it:
8. How would you describe your back and/or leg pain over the last couple of days?
(Here you show how you have been over the last couple of days.)
No pain □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Worst imaginable pain
9. What would be acceptable pain after treatment, if you have to accept a certain amount of pain?
(Here you show what you think is acceptable after treatment, if you had to accept a certain amount of pain.)
No pain □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Worst imaginable pain
8. How would you describe your back and/or leg pain over the last couple of days?
No pain □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Worst imaginable pain
9. What would be acceptable pain after treatment, if you have to accept a certain amount of pain?
No pain □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Worst imaginable pain
10. How has your back and/or leg pain affected your daily activities (housework, washing, dressing,
lifting, walking, driving, climbing stairs, sitting down in/getting up from a chair, getting into/out of bed, sleeping) over the last couple of
days?
No effect □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Unable to carry out activities
11. What would be an acceptable effect on your daily activities after treatment, if you have to
accept a certain effect?
No effect □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Unable to carry out activities
12. How much has your back and/or leg pain affected your social, family and leisure activities over the last couple of
days?
No effect □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Unable to carry out activities
13. What would be an acceptable effect on your social activities, if you have to accept a certain
effect?
No effect □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Unable to carry out activities
14. How tense (nervous, irritable, difficulty relaxing/concentrating) have you been over the last couple of days?
Not tense at all □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Extremely tense
15. What would be acceptable tension after treatment, if you have to accept a certain amount of
tension?
Not tense at all □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Extremely tense
16. How depressed (down, sad, in low spirits, pessimistic, lethargic) have you been recently?
Not depressed at all □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Deeply depressed
17. How depressed can you accept being after treatment, if you have to accept a certain degree
of depression?
Not depressed at all □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Deeply depressed
18. How do you think your work (both inside and outside the home) has affected your back and/or leg pain over the last couple of
days?
Has not made the pain worse □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Has made the pain much worse
19. How much will you accept that your work (both inside and outside the home) affects your back and/or leg pain
after treatment, if you have to accept a certain effect?
Does not make the pain worse □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 Makes the pain much worse
20. How much have you been able to control (relieve/reduce) and cope with your back and/or leg pain yourself over the last
couple of days?
Complete control □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 No control at all
21. What would be acceptable self-control of your back and/or leg pain, if you have to accept that
you will not gain full control of the pain?
Complete control □ 0 □ 1 □ 2 □ 3 □ 4 □ 5 □ 6 □ 7 □ 8 □ 9 □ 10 No control at all
The next questionnaire is also designed to give us knowledge about how your back and/or leg pain affects
your ability to manage in everyday life. Tick only one box in each section. We realise that you may feel that
two or more statements in the same section apply to you today, but for the sake of the clarity of the study we ask
you to mark only the statement that best describes your problem. Please read each question
carefully before answering.
Each section consists of two questions: The first question is about how you are today. The second
question is about what result you would accept after treatment, if, for example, you have to
accept a certain amount of pain. In other words, it is not what you expect/hope for from the treatment, but what you will
accept.
Here is an example of how to do it:
Section 22a: Pain (today)
(Here you show how you are today.)
1. I have no pain □
2. The pain is very mild □
3. The pain is moderate □
4. The pain is fairly severe □
5. The pain is very severe □
6. The pain is the worst imaginable □
Section 22b: What would you accept with regard to your pain after treatment, if you
have to accept a certain amount of pain?
(Here you show what you think is acceptable after treatment, if you have to accept a certain amount of pain.)
1. I have no pain □
2. The pain is very mild □
3. The pain is moderate □
4. The pain is fairly severe □
5. The pain is very severe □
6. The pain is the worst imaginable □
I dag
Afsnit 22a: Smerter
1. Jeg har ingen smerter
2. Smerterne er meget svage
3. Smerterne er moderate
4. Smerterne er forholdsvis kraftige
5. Smerterne er meget kraftige
6. Smerterne er de værst tænkelige
‰
‰
‰
‰
‰
‰
Afsnit 22b: Hvad vil du acceptere med hensyn til din smerte efter
behandlingen, hvis du bliver nødt til at acceptere en vis smerte?
1. Jeg har ingen smerter
2. Smerterne er meget svage
3. Smerterne er moderate
4. Smerterne er forholdsvis kraftige
5. Smerterne er meget kraftige
6. Smerterne er de værst tænkelige
Efter
behandlingen
‰
‰
‰
‰
‰
‰
I dag
Afsnit 23a: Personlig pleje (f.eks. vaske sig, klæde sig på)
1. Jeg kan klare mig selv som normalt, uden at det giver flere smerter
2. Jeg kan klare mig selv som normalt, men det giver smerter
3. Det er meget smertefuldt at klare mig selv, og jeg er langsom og forsigtig
4. Jeg har brug for nogen hjælp, men kan klare det meste af min personlige pleje selv
5. Jeg skal have hjælp hver dag til det meste af min personlige pleje
6. Jeg tager ikke tøj på, kan kun vanskeligt vaske mig og bliver i sengen
Efter behandlingen
Afsnit 23b: Hvad vil du acceptere med hensyn til din personlige pleje efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Jeg kan klare mig selv som normalt, uden at det giver flere smerter
2. Jeg kan klare mig selv som normalt, men det giver smerter
3. Det er meget smertefuldt at klare mig selv, og jeg er langsom og forsigtig
4. Jeg har brug for nogen hjælp, men kan klare det meste af min personlige pleje selv
5. Jeg skal have hjælp hver dag til det meste af min personlige pleje
6. Jeg tager ikke tøj på, kan kun vanskeligt vaske mig og bliver i sengen
I dag
Afsnit 24a: Løfte
1. Jeg kan løfte noget tungt uden at få flere smerter
2. Jeg kan løfte noget tungt, men det giver mig flere smerter
3. Smerterne hindrer mig i at løfte noget tungt fra gulvet, men jeg kan klare det, hvis det
er anbragt bekvemt, f.eks. på et bord
4. Smerterne hindrer mig i at løfte tunge ting, men jeg kan klare noget let til middeltungt,
hvis det er anbragt bekvemt
5. Jeg kan kun løfte noget meget let
6. Jeg kan ikke løfte eller bære noget som helst
Efter behandlingen
Afsnit 24b: Hvad vil du acceptere med hensyn til at kunne løfte efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning.
1. Jeg kan løfte noget tungt uden at få flere smerter
2. Jeg kan løfte noget tungt, men det giver mig flere smerter
3. Smerterne hindrer mig i at løfte noget tungt fra gulvet, men jeg kan klare det, hvis det
er anbragt bekvemt, f.eks. på et bord
4. Smerterne hindrer mig i at løfte tunge ting, men jeg kan klare noget let til middeltungt,
hvis det er anbragt bekvemt
5. Jeg kan kun løfte noget meget let
6. Jeg kan ikke løfte eller bære noget som helst
I dag
Afsnit 25a: Gå
1. Jeg kan gå så langt jeg har lyst, selvom jeg har smerter
2. Smerterne hindrer mig i at gå mere end 2 kilometer
3. Smerterne hindrer mig i at gå mere end 1 kilometer
4. Smerterne hindrer mig i at gå mere end 500 meter
5. Jeg kan kun gå, når jeg bruger stok eller krykker
6. Jeg ligger i sengen det meste af tiden og må kravle ud til toilettet
Efter behandlingen
Afsnit 25b: Hvad vil du acceptere med hensyn til at kunne gå efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Jeg kan gå så langt jeg har lyst, selvom jeg har smerter
2. Smerterne hindrer mig i at gå mere end 2 kilometer
3. Smerterne hindrer mig i at gå mere end 1 kilometer
4. Smerterne hindrer mig i at gå mere end 500 meter
5. Jeg kan kun gå, når jeg bruger stok eller krykker
6. Jeg ligger i sengen det meste af tiden og må kravle ud til toilettet
I dag
Afsnit 26a: Sidde
1. Jeg kan sidde i en hvilken som helst stol, så længe jeg har lyst
2. Det er kun min yndlingsstol jeg kan sidde i, så længe jeg har lyst
3. Smerterne hindrer mig i at sidde mere end 1 time
4. Smerterne hindrer mig i at sidde mere end ½ time
5. Smerterne hindrer mig i at sidde mere end 10 minutter
6. Jeg kan overhovedet ikke sidde på grund af smerterne
Efter behandlingen
Afsnit 26b: Hvad vil du acceptere med hensyn til at kunne sidde efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Jeg kan sidde i en hvilken som helst stol, så længe jeg har lyst
2. Det er kun min yndlingsstol jeg kan sidde i, så længe jeg har lyst
3. Smerterne hindrer mig i at sidde mere end 1 time
4. Smerterne hindrer mig i at sidde mere end ½ time
5. Smerterne hindrer mig i at sidde mere end 10 minutter
6. Jeg kan overhovedet ikke sidde på grund af smerterne
I dag
Afsnit 27a: Stå
1. Jeg kan stå op så længe jeg vil uden at få flere smerter
2. Jeg kan stå op så længe jeg vil, men det giver mig flere smerter
3. Smerterne hindrer mig i at stå op i mere end 1 time
4. Smerterne hindrer mig i at stå op i mere end ½ time
5. Smerterne hindrer mig i at stå op i mere end 10 minutter
6. Jeg kan overhovedet ikke stå på grund af smerterne
Efter behandlingen
Afsnit 27b: Hvad vil du acceptere med hensyn til at kunne stå efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Jeg kan stå op så længe jeg vil uden at få flere smerter
2. Jeg kan stå op så længe jeg vil, men det giver mig flere smerter
3. Smerterne hindrer mig i at stå op i mere end 1 time
4. Smerterne hindrer mig i at stå op i mere end ½ time
5. Smerterne hindrer mig i at stå op i mere end 10 minutter
6. Jeg kan overhovedet ikke stå på grund af smerterne
I dag
Afsnit 28a: Sove
1. Min søvn forstyrres aldrig af smerterne
2. Min søvn forstyrres af og til af smerterne
3. På grund af smerterne får jeg mindre end 6 timers søvn
4. På grund af smerterne får jeg mindre end 4 timers søvn
5. På grund af smerterne får jeg mindre end 2 timers søvn
6. Jeg kan overhovedet ikke sove på grund af smerterne
Efter behandlingen
Afsnit 28b: Hvad vil du acceptere med hensyn til at kunne sove efter
behandlingen, hvis du bliver nødt til at acceptere en vis forstyrrelse?
1. Min søvn forstyrres aldrig af smerterne
2. Min søvn forstyrres af og til af smerterne
3. På grund af smerterne får jeg mindre end 6 timers søvn
4. På grund af smerterne får jeg mindre end 4 timers søvn
5. På grund af smerterne får jeg mindre end 2 timers søvn
6. Jeg kan overhovedet ikke sove på grund af smerterne
I dag
Afsnit 29a: Sexliv (hvis relevant)
1. Mit sexliv er som normalt og giver ikke flere smerter
2. Mit sexliv er som normalt, men giver flere smerter
3. Mit sexliv er næsten som normalt, men giver mange smerter
4. Mit sexliv er alvorligt hæmmet af smerterne
5. Mit sexliv er næsten ophørt på grund af smerterne
6. Smerterne hindrer sexliv overhovedet
Efter behandlingen
Afsnit 29b: Hvad vil du acceptere med hensyn til dit sexliv (hvis relevant) efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Mit sexliv er som normalt og giver ikke flere smerter
2. Mit sexliv er som normalt, men giver flere smerter
3. Mit sexliv er næsten som normalt, men giver mange smerter
4. Mit sexliv er alvorligt hæmmet af smerterne
5. Mit sexliv er næsten ophørt på grund af smerterne
6. Smerterne hindrer sexliv overhovedet
I dag
Afsnit 30a: Mit sociale liv
1. Mit sociale liv er som normalt og giver mig ikke ekstra smerter
2. Mit sociale liv er som normalt, men øger mine smerter
3. Smerterne begrænser ikke mit sociale liv væsentligt, bortset fra de mere fysiske
aktiviteter som f.eks. sport osv.
4. Smerterne har begrænset mit sociale liv, og jeg går ikke ud så ofte
5. Smerterne har begrænset mit sociale liv til mit hjem
6. Jeg har ikke noget socialt liv på grund af smerterne
Efter behandlingen
Afsnit 30b: Hvad vil du acceptere med hensyn til dit sociale liv efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Mit sociale liv er som normalt og giver mig ikke ekstra smerter
2. Mit sociale liv er som normalt, men øger mine smerter
3. Smerterne begrænser ikke mit sociale liv væsentligt, bortset fra de mere fysiske
aktiviteter som f.eks. sport osv.
4. Smerterne har begrænset mit sociale liv, og jeg går ikke ud så ofte
5. Smerterne har begrænset mit sociale liv til mit hjem
6. Jeg har ikke noget socialt liv på grund af smerterne
I dag
Afsnit 31a: Rejse
1. Jeg kan rejse hvorhen jeg vil uden smerter
2. Jeg kan rejse hvorhen jeg vil, men det giver mig flere smerter
3. Smerterne er slemme, men jeg kan godt klare over 2 timers rejse
4. Smerterne begrænser mine rejser til mindre end 1 time
5. Smerterne begrænser mine rejser til korte, nødvendige rejser under 30 minutter
6. Smerterne hindrer mig i at rejse, undtagen for at få behandling
Efter behandlingen
Afsnit 31b: Hvad vil du acceptere med hensyn til at kunne rejse efter
behandlingen, hvis du bliver nødt til at acceptere en vis begrænsning?
1. Jeg kan rejse hvorhen jeg vil uden smerter
2. Jeg kan rejse hvorhen jeg vil, men det giver mig flere smerter
3. Smerterne er slemme, men jeg kan godt klare over 2 timers rejse
4. Smerterne begrænser mine rejser til mindre end 1 time
5. Smerterne begrænser mine rejser til korte, nødvendige rejser under 30 minutter
6. Smerterne hindrer mig i at rejse, undtagen for at få behandling
10.7. APPENDIX VII
The one-week follow-up booklet used in the “Prospective Acceptable Outcome Study” was the
same as Appendix VI, except that 1) pages 1 and 2 were omitted, and 2) an additional page was
added at the end (see below).
Generelle ryg- og/eller bensmerter
Disse to spørgsmål drejer sig om dine samlede ryg- og/eller bensmerter.
35. Hvis du samlet skal beskrive dine ryg- og/eller bensmerter i den sidste uge, hvordan
har de da været?
Sæt kun ét kryds i én boks på skalaen fra ”0” til ”10”.
Slet ingen smerter  0  1  2  3  4  5  6  7  8  9  10  Værst tænkelige smerter
36. Hvordan har dine ryg- og/eller bensmerter generelt været i den sidste uge?
Afkryds kun ét felt.
☐ Bedre
☐ Uændret
☐ Værre
☐ Ikke sikker/ved ikke
Generel holdning til resultatet
37. Siden du sidst udfyldte skemaerne - har du da ændret mening/holdning til, hvad der er et
acceptabelt resultat af behandlingen?
Afkryds kun ét felt.
☐ Jeg har ikke ændret mening/holdning
☐ Jeg har ændret mening/holdning
10.8. APPENDIX VIII
Eight-week follow-up booklet used in the “Prospective Acceptable Outcome Study”.
Skriv venligst dato for udfyldelse: ______________
Generelle ryg- og/eller bensmerter
Disse to spørgsmål drejer sig om dine samlede ryg- og/eller bensmerter.
1. Hvis du samlet skal bedømme dine ryg- og/eller bensmerter i de sidste 2 uger hvordan har de
da været?
Sæt kun ét kryds i én boks på skalaen fra ”0” til ”10”.
Slet ingen smerter  0  1  2  3  4  5  6  7  8  9  10  Værst tænkelige smerter
2. Hvis du samlet skal beskrive dine ryg- og/eller bensmerter i de sidste 2 uger hvordan har de da været?
Afkryds kun ét felt.
☐ Ingen smerter
☐ Lette smerter
☐ Moderate smerter
☐ Stærke smerter
Dette spørgeskema omhandler dine ryg- og bensmerter. Oplysningerne vil blive behandlet fortroligt. Sæt kun ét kryds
i én boks på hver skala fra ”0” til ”10”. Svarene skal være et gennemsnit af, hvordan din tilstand har været de sidste
par dage. Læs venligst hvert spørgsmål grundigt, inden du svarer.
3. Hvordan vil du beskrive dine ryg- og bensmerter?
Ingen smerter  0  1  2  3  4  5  6  7  8  9  10  Værst tænkelige smerter
4. Hvordan har dine ryg- og/eller bensmerter påvirket dine daglige aktiviteter (husligt arbejde, vaske sig,
tage tøj på, løfte, gå, køre, gå på trapper, sætte sig i/rejse sig fra stol, lægge sig i/komme ud af seng, sove)?
Ingen påvirkning  0  1  2  3  4  5  6  7  8  9  10  Ikke i stand til at udføre aktiviteter
5. Hvor meget har dine ryg- og/eller bensmerter påvirket dine sociale, familiære og fritidsaktiviteter?
Ingen påvirkning  0  1  2  3  4  5  6  7  8  9  10  Ikke i stand til at udføre aktiviteter
6. Hvor anspændt (nervøs, irritabel, svært ved at slappe af/koncentrere sig) har du været?
Ikke anspændt overhovedet  0  1  2  3  4  5  6  7  8  9  10  Ekstremt anspændt
7. Hvor deprimeret (nede, ked af det, i dårligt humør, pessimistisk, sløv) har du været?
Ikke deprimeret overhovedet  0  1  2  3  4  5  6  7  8  9  10  Dybt deprimeret
8. Hvordan tror du dit arbejde (både i og udenfor hjemmet) har påvirket dine ryg- og/eller bensmerter?
Ikke forværret smerterne  0  1  2  3  4  5  6  7  8  9  10  Forværret smerterne meget
9. Hvor meget har du selv kunnet kontrollere (afhjælpe/mindske) og magte dine ryg- og/eller bensmerter?
Fuldstændig kontrol  0  1  2  3  4  5  6  7  8  9  10  Ingen kontrol overhovedet
Dette spørgeskema er lavet for at give os viden om, hvordan dine ryg- eller bensmerter påvirker din evne til at klare dig i hverdagen. Sæt kun ét kryds i hvert afsnit. Vælg det udsagn, der passer bedst på dig i dag. Vi er klar over, at du måske mener, at to eller flere udsagn i samme afsnit passer på dig i dag, men af hensyn til undersøgelsens klarhed, beder vi dig om kun at markere det udsagn, som bedst beskriver dit problem.
Afsnit 1: Smerter
☐ Jeg har ingen smerter for øjeblikket
☐ Smerterne er meget svage for øjeblikket
☐ Smerterne er moderate for øjeblikket
☐ Smerterne er forholdsvis kraftige for øjeblikket
☐ Smerterne er meget kraftige for øjeblikket
☐ Smerterne er de værst tænkelige for øjeblikket
Afsnit 2: Personlig pleje (f.eks. vaske sig, klæde sig på)
☐ Jeg kan klare mig selv som normalt, uden at det giver flere smerter
☐ Jeg kan klare mig selv som normalt, men det giver smerter
☐ Det er smertefuldt at klare mig selv, og jeg er langsom og forsigtig
☐ Jeg har brug for nogen hjælp, men kan klare det meste af min personlige pleje selv
☐ Jeg skal have hjælp hver dag til det meste af min personlige pleje
☐ Jeg tager ikke tøj på, kan kun vanskeligt vaske mig og bliver i sengen
Afsnit 3: Løfte
☐ Jeg kan løfte noget tungt uden at få flere smerter
☐ Jeg kan løfte noget tungt, men det giver mig flere smerter
☐ Smerterne hindrer mig i at løfte noget tungt fra gulvet, men jeg kan klare det, hvis det er anbragt bekvemt, f.eks. på et bord
☐ Smerterne hindrer mig i at løfte tunge ting, men jeg kan klare noget let til middeltungt, hvis det er anbragt bekvemt
☐ Jeg kan kun løfte noget meget let
☐ Jeg kan ikke løfte eller bære noget som helst
Afsnit 4: Gå
☐ Jeg kan gå så langt jeg har lyst selvom jeg har smerter
☐ Smerterne hindrer mig i at gå mere end 2 kilometer
☐ Smerterne hindrer mig i at gå mere end 1 kilometer
☐ Smerterne hindrer mig i at gå mere end 500 meter
☐ Jeg kan kun gå, når jeg bruger stok eller krykker
☐ Jeg ligger i sengen det meste af tiden og må kravle ud til toilettet
Afsnit 5: Sidde
☐ Jeg kan sidde i en hvilken som helst stol, så længe jeg har lyst
☐ Det er kun min yndlingsstol jeg kan sidde i, så længe jeg har lyst
☐ Smerterne hindrer mig i at sidde mere end 1 time
☐ Smerterne hindrer mig i at sidde mere end en ½ time
☐ Smerterne hindrer mig i at sidde mere end 10 minutter
☐ Jeg kan overhovedet ikke sidde på grund af smerterne
Afsnit 6: Stå
☐ Jeg kan stå op så længe jeg vil uden at få flere smerter
☐ Jeg kan stå op så længe jeg vil, men det giver mig flere smerter
☐ Smerterne hindrer mig i at stå op i mere end 1 time
☐ Smerterne hindrer mig i at stå op i mere end en ½ time
☐ Smerterne hindrer mig i at stå op i mere end 10 minutter
☐ Jeg kan overhovedet ikke stå på grund af smerterne
Afsnit 7: Sove
☐ Min søvn forstyrres aldrig af smerterne
☐ Min søvn forstyrres af og til af smerterne
☐ På grund af smerterne får jeg mindre end 6 timers søvn
☐ På grund af smerterne får jeg mindre end 4 timers søvn
☐ På grund af smerterne får jeg mindre end 2 timers søvn
☐ Jeg kan overhovedet ikke sove på grund af smerterne
Afsnit 8: Sexliv (hvis relevant)
☐ Mit sexliv er som normalt og giver ikke flere smerter
☐ Mit sexliv er som normalt, men giver flere smerter
☐ Mit sexliv er næsten som normalt, men giver mange smerter
☐ Mit sexliv er alvorligt hæmmet af smerterne
☐ Mit sexliv er næsten ophørt på grund af smerterne
☐ Smerterne hindrer sexliv overhovedet
Afsnit 9: Mit sociale liv
☐ Mit sociale liv er som normalt og giver mig ikke ekstra smerter
☐ Mit sociale liv er som normalt, men øger mine smerter
☐ Smerterne begrænser ikke mit sociale liv væsentligt, bortset fra de mere fysiske aktiviteter som f.eks. sport osv.
☐ Smerterne har begrænset mit sociale liv, og jeg går ikke ud så ofte
☐ Smerterne har begrænset mit sociale liv til mit hjem
☐ Jeg har ikke noget socialt liv på grund af smerterne
Afsnit 10: Rejse
☐ Jeg kan rejse hvorhen jeg vil uden smerter
☐ Jeg kan rejse hvorhen jeg vil, men det giver mig flere smerter
☐ Smerterne er slemme, men jeg kan godt klare over 2 timers rejse
☐ Smerterne begrænser mine rejser til mindre end 1 time
☐ Smerterne begrænser mine rejser til korte, nødvendige rejser under 30 minutter
☐ Smerterne hindrer mig i at rejse, undtagen for at få behandling
Mange tak for hjælpen.
Vurdering af din behandling
I nedenstående 3 spørgsmål skal du give din samlede vurdering af den behandling, som du har
modtaget i forhold til dine ryg- og/eller benproblemer.
(Sæt ring om ét tal for hver linie)
                                                    Helt enig  Enig  Hverken enig eller uenig  Uenig  Helt uenig
20. Behandlingen opfyldte alle mine forventninger   5          4     3                         2      1
21. Resultatet af behandlingen var acceptabelt      5          4     3                         2      1
22. Behandlingen var værd at modtage                5          4     3                         2      1
Ændring af ryg- og/eller bensmerter
23. Den første gang du udfyldte spørgeskemaerne, vurderede du dine generelle smerter fra ryggen
og/eller benene således:
Slet ingen smerter  0  1  2  3  4  5  6  7  8  9  10  Værst tænkelige smerter
24. Hvordan vil du beskrive din generelle tilstand i ryggen og/eller benene nu, hvis du
sammenligner med, hvordan du havde det, da du påbegyndte behandlingen?
Afkryds kun ét felt.
☐ Meget bedre
☐ Bedre
☐ Lidt bedre
☐ Næsten det samme
☐ Lidt værre
☐ Værre
☐ Meget værre
25. Samlet set, hvor vigtig er den ændring du har oplevet i dine ryg- og/eller bensmerter siden
behandlingen begyndte?
Afkryds kun ét felt.
☐ Den ændring jeg har oplevet er vigtig og relevant for mig.
☐ Den ændring (hvis nogen) jeg har oplevet er ligegyldig.
26. Hvis du har oplevet en ændring, hvor på en skala fra 0 – 10 vil du så placere ændringen?
Afkryds kun ét felt.
Ikke vigtig, ligegyldig  0  1  2  3  4  5  6  7  8  9  10  Meget vigtig
10.9. APPENDIX IX
Nine-week follow-up interview form used in the “Prospective Acceptable Outcome Study”.
Ændring af ryg-/bensmerter
Telefonopfølgning – uge 9
Ref. nr._________
Navn:____________________________
Tlf. nr.: (H) ______________________ (A) ______________________ (M) ______________________
Dato for udfyldelse: ___________________
Den første gang du udfyldte spørgeskemaerne, vurderede du dine generelle smerter fra ryggen og benene således:
Slet ingen smerter  0  1  2  3  4  5  6  7  8  9  10  Værst tænkelige smerter
1. Hvordan vil du beskrive din generelle tilstand i ryggen og benene nu, hvis du sammenligner med hvordan du havde det, da du startede behandlingen?
Afkryds kun ét felt.
☐ Meget bedre
☐ Bedre
☐ Lidt bedre
☐ Næsten det samme
☐ Lidt værre
☐ Værre
☐ Meget værre
2. Har der været nogen ændringer i dine ryg-/bensmerter over den sidste uge?
Afkryds kun ét felt.
☐ Tendens mod en forværring
☐ Uændret
☐ Tendens mod en forbedring
10.10. APPENDIX X
The Danish version of the Oswestry Disability Index.
Oswestry-spørgeskema
Dette spørgeskema er lavet for at give os viden om, hvordan dine ryg- eller bensmerter påvirker din evne til at klare dig i hverdagen. Sæt kun ét kryds i hvert afsnit. Vælg det udsagn, der passer bedst på dig i dag. Vi er klar over, at du måske mener, at to eller flere udsagn i samme afsnit passer på dig i dag, men af hensyn til undersøgelsens klarhed, beder vi dig om kun at markere det udsagn, som bedst beskriver dit problem.
Afsnit 1: Smerter
☐ Jeg har ingen smerter for øjeblikket
☐ Smerterne er meget svage for øjeblikket
☐ Smerterne er moderate for øjeblikket
☐ Smerterne er forholdsvis kraftige for øjeblikket
☐ Smerterne er meget kraftige for øjeblikket
☐ Smerterne er de værst tænkelige for øjeblikket
Afsnit 2: Personlig pleje (f.eks. vaske sig, klæde sig på)
☐ Jeg kan klare mig selv som normalt, uden at det giver flere smerter
☐ Jeg kan klare mig selv som normalt, men det giver smerter
☐ Det er smertefuldt at klare mig selv, og jeg er langsom og forsigtig
☐ Jeg har brug for nogen hjælp, men kan klare det meste af min personlige pleje selv
☐ Jeg skal have hjælp hver dag til det meste af min personlige pleje
☐ Jeg tager ikke tøj på, kan kun vanskeligt vaske mig og bliver i sengen
Afsnit 3: Løfte
☐ Jeg kan løfte noget tungt uden at få flere smerter
☐ Jeg kan løfte noget tungt, men det giver mig flere smerter
☐ Smerterne hindrer mig i at løfte noget tungt fra gulvet, men jeg kan klare det, hvis det er anbragt bekvemt, f.eks. på et bord
☐ Smerterne hindrer mig i at løfte tunge ting, men jeg kan klare noget let til middeltungt, hvis det er anbragt bekvemt
☐ Jeg kan kun løfte noget meget let
☐ Jeg kan ikke løfte eller bære noget som helst
Afsnit 4: Gå
☐ Jeg kan gå så langt jeg har lyst selvom jeg har smerter
☐ Smerterne hindrer mig i at gå mere end 2 kilometer
☐ Smerterne hindrer mig i at gå mere end 1 kilometer
☐ Smerterne hindrer mig i at gå mere end 500 meter
☐ Jeg kan kun gå, når jeg bruger stok eller krykker
☐ Jeg ligger i sengen det meste af tiden og må kravle ud til toilettet
Afsnit 5: Sidde
☐ Jeg kan sidde i en hvilken som helst stol, så længe jeg har lyst
☐ Det er kun min yndlingsstol jeg kan sidde i, så længe jeg har lyst
☐ Smerterne hindrer mig i at sidde mere end 1 time
☐ Smerterne hindrer mig i at sidde mere end en ½ time
☐ Smerterne hindrer mig i at sidde mere end 10 minutter
☐ Jeg kan overhovedet ikke sidde på grund af smerterne
Afsnit 6: Stå
☐ Jeg kan stå op så længe jeg vil uden at få flere smerter
☐ Jeg kan stå op så længe jeg vil, men det giver mig flere smerter
☐ Smerterne hindrer mig i at stå op i mere end 1 time
☐ Smerterne hindrer mig i at stå op i mere end en ½ time
☐ Smerterne hindrer mig i at stå op i mere end 10 minutter
☐ Jeg kan overhovedet ikke stå på grund af smerterne
Afsnit 7: Sove
☐ Min søvn forstyrres aldrig af smerterne
☐ Min søvn forstyrres af og til af smerterne
☐ På grund af smerterne får jeg mindre end 6 timers søvn
☐ På grund af smerterne får jeg mindre end 4 timers søvn
☐ På grund af smerterne får jeg mindre end 2 timers søvn
☐ Jeg kan overhovedet ikke sove på grund af smerterne
Afsnit 8: Sexliv (hvis relevant)
☐ Mit sexliv er som normalt og giver ikke flere smerter
☐ Mit sexliv er som normalt, men giver flere smerter
☐ Mit sexliv er næsten som normalt, men giver mange smerter
☐ Mit sexliv er alvorligt hæmmet af smerterne
☐ Mit sexliv er næsten ophørt på grund af smerterne
☐ Smerterne hindrer sexliv overhovedet
Afsnit 9: Mit sociale liv
☐ Mit sociale liv er som normalt og giver mig ikke ekstra smerter
☐ Mit sociale liv er som normalt, men øger mine smerter
☐ Smerterne begrænser ikke mit sociale liv væsentligt, bortset fra de mere fysiske aktiviteter som f.eks. sport osv.
☐ Smerterne har begrænset mit sociale liv, og jeg går ikke ud så ofte
☐ Smerterne har begrænset mit sociale liv til mit hjem
☐ Jeg har ikke noget socialt liv på grund af smerterne
Afsnit 10: Rejse
☐ Jeg kan rejse hvorhen jeg vil uden smerter
☐ Jeg kan rejse hvorhen jeg vil, men det giver mig flere smerter
☐ Smerterne er slemme, men jeg kan godt klare over 2 timers rejse
☐ Smerterne begrænser mine rejser til mindre end 1 time
☐ Smerterne begrænser mine rejser til korte, nødvendige rejser under 30 minutter
☐ Smerterne hindrer mig i at rejse, undtagen for at få behandling
Mange tak for hjælpen.
11. PAPERS
Paper I-1
Danish version of the Oswestry Disability Index for patients with low back pain. Part 1: Cross-cultural adaptation, reliability and validity in two different populations
Paper I-2
Danish version of the Oswestry Disability Index for patients with low back pain. Part 2: Sensitivity, specificity and clinically significant improvement in two low back pain populations
Paper I-3
Responsiveness and minimal clinically important difference for pain and disability instruments
in low back pain patients
Paper I-4
Choice of external criteria in back pain research: does it matter? Recommendations based on
analysis of responsiveness
Paper II-1
Are low back pain patients able to determine acceptable outcome of treatment before it begins?