- Manual Therapy
Available online at www.sciencedirect.com

Manual Therapy 14 (2009) 152–159
www.elsevier.com/math

Original Article

Interobserver reliability of physical examination of shoulder girdle

Jettie G. Nomden a,b,*, Anton J. Slagers a,b, Gert J.D. Bergman c, Jan C. Winters c, Thomas J.B. Kropmans d, Pieter U. Dijkstra a,b,e

a Department of Rehabilitation, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, 9700 RB Groningen, The Netherlands
b Share, Graduate School for Health Research, University Medical Center Groningen, University of Groningen, The Netherlands
c Department of General Practice, University Medical Center Groningen, University of Groningen, The Netherlands
d Department of Medical Informatics & Medical Education, University of Ireland, Galway, Ireland
e Department of Oral and Maxillofacial Surgery, University Medical Center Groningen, University of Groningen, The Netherlands

Received 2 March 2007; received in revised form 20 December 2007; accepted 6 January 2008

Abstract

The object of this study was to assess interobserver reliability in 23 tests concerning physical examination of the shoulder girdle. A physical therapist and a physical therapist/manual therapist independently performed a physical examination of the shoulder girdle in 91 patients with shoulder complaints of varying severity and duration. The observers assessed 23 items in total: active and passive abduction, passive external rotation, hand in neck (HIN) test, hand in back (HIB) test, impingement test according to Neer, springing test of the first rib and joint play test of the acromioclavicular joint. The interobserver reliability was evaluated by means of Cohen’s Kappa, the weighted Kappa and the intraclass correlation (ICC). Criteria for acceptable reliability were: Kappa value ≥ 0.60, ICC ≥ 0.75 or an absolute agreement ≥ 80%.
The results showed that Kappa values varied from 0.09 (springing test first rib, stiffness) to 0.66 (springing test first rib, pain), weighted Kappa varied from 0.35 (pain during HIB) to 0.73 (range of motion HIB) and ICC varied from 0.54 (abduction passive starting point painful arc) to 0.96 (active and passive ranges of motion in abduction). In total 11 (48%) items fulfilled the criteria of acceptable reliability. In conclusion, there appears to be a great deal of variation in the reliability of the tests used in the physical examination of the shoulder girdle. Over 50% of the tests did not meet the statistical criteria for acceptable reliability.
© 2008 Elsevier Ltd. All rights reserved.

Keywords: Reliability; Observer; Shoulder girdle; Physical examination

* Corresponding author. +31 50 3613651. E-mail address: [email protected] (J.G. Nomden).
1356-689X/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.math.2008.01.005

1. Introduction

Shoulder complaints are common in the locomotor system. The yearly prevalence of shoulder complaints ranges from 100 to 160 per 1000 patients in the general population (Winters et al., 1999). The diagnosis in patients with shoulder complaints is difficult because currently no uniformity exists as to how shoulder complaints should be labelled or defined (Green et al., 1998a). Diagnostic criteria for defining shoulder disorders are neither consistently nor reliably applied (Green et al., 2003). According to the Guidelines for Shoulder Complaints of the Dutch College of General Practitioners (Winters et al., 1999) most shoulder complaints are elicited by shoulder disorders, probably resulting from strain, aseptic inflammation or degeneration of soft tissues of the glenohumeral joint or of structures in the immediate surroundings. In most cases it cannot be determined accurately which structure is affected. Hence, the term ‘shoulder complaints’ is used as a working as well as a final diagnosis (Winters et al., 1999).

Shoulder complaints may result in considerable disability (Green et al., 2003). Shoulder pain often impairs the ability to sleep, and restricted and/or painful range of motion of the shoulder influences performance of activities of daily living (Green et al., 2003). Treatment of shoulder complaints is aimed at reducing symptoms such as pain and restricted range of motion, increasing functional activities and re-starting participation in work and social activities. In order to focus treatment and to evaluate effectiveness of treatment, reliable tests are an important prerequisite.

Reliability of assessment of shoulder complaints and function of the shoulder differs per study, ranging from low to moderate (Green et al., 1998b; de Winter, 1999; Hoving et al., 2002; Terwee et al., 2005). Recently, movement tests of the shoulder and shoulder girdle, as recommended in the Guidelines for Shoulder Complaints of the Dutch College of General Practitioners (Winters et al., 1999), together with additional functional tests were used as outcome measures in a randomised controlled trial (Bergman et al., 2002). Thus these tests were used for evaluation of treatment efficacy. To interpret the outcomes of this study it is important to evaluate the reliability of the tests used. Differences found in the trial within or between groups may be caused by differences in treatment effects but also by differences between observers.

The aim of the present study is to determine the interobserver reliability of the physical examination of the shoulder girdle as performed in the above-mentioned randomised controlled trial.

2. Methods

Consecutive patients eligible for participation in the randomised controlled trial were invited to participate in this reliability study.
Inclusion criteria for patients in that trial were presence of shoulder complaints, not being treated for these complaints in the past 3 months and aged over 18 yrs. Shoulder complaints were defined as pain at rest or provoked or aggravated by movement in the area between neck and elbow. Informed consent was obtained from all patients. Extension of the pain to the region between the scapulae, to the cervical spine or to the lower part of the arm was not an exclusion criterion. Exclusion criteria for patients were presence of specific rheumatic disorders, shoulder complaints caused by acute severe trauma or previous surgery, signs of cervical nerve root compression, or shoulder complaints related to general internal pathologic conditions of thoracic and abdominal organs (Bergman et al., 2002). Most patients included in the randomised controlled trial also participated in this reliability study.

Physical examinations were performed independently by a physical therapist and a physical therapist/manual therapist (JGN and AJS, 27 and 12 yrs practice experience, respectively). Before the study (clinical trial and reliability study) all tests were standardised and the observers received training in the application of the tests. The diagnosis was unknown to the observers. The order of examination by the two observers varied. Each observer examined about half of the patients as first observer, followed by the second observer who performed the same examination a few minutes later. Patients were sitting upright during all examinations. All tests were performed in the morning. During the study the two physical therapists did not exchange information concerning the outcome of the assessments. Patients were instructed not to give any comment about the previous examination.

2.1. Examination of shoulder girdle

The examination of the shoulder girdle was based upon the Guidelines for Shoulder Complaints of the Dutch College of General Practitioners (Winters et al., 1999).
The examination was focused on range of motion of the shoulder (visually assessed to the nearest 5°), on pain experienced (four point ordinal scale: no pain, little pain, much pain, and excruciating pain) and on occurrence of pain during movement. The following movements were examined:

2.1.1. Functional tests: hand in neck (HIN) test and hand in back (HIB) test
Both tests were slightly modified from the tests described by Solem-Bertoft et al. (1996) (Appendix 1). The HIN and HIB were graded into a score (range 0–7) based upon the end point reached. Additionally, during the HIN and HIB pain was assessed on a four point ordinal scale: no pain, little pain, much pain, and excruciating pain.

2.1.2. Active abduction
The starting position of the patient was arm stretched alongside the body, held in external rotation and thumb directed sideward. The patient lifted his extended arm sideways and upwards in the frontal plane until it was beside his head. The range of motion and pain were assessed.

2.1.3. Painful arc during active abduction
Presence of a painful arc was assessed and, if present, starting point and end point were visually estimated.

2.1.4. Passive abduction
The starting position of the patient was arm stretched alongside the body, held in external rotation and thumb directed sideward. The patient was asked to keep the shoulder arm muscles relaxed. The observer lifted the extended arm sideways and upwards in the frontal plane until it was beside the patient’s head. The range of motion and pain were assessed.

2.1.5. Painful arc during passive abduction
Presence of a painful arc was assessed and, if present, starting point and end point were visually estimated.

2.1.6. Passive external rotation
The starting position of the upper arm was 0° elevation, elbow held in 90° and forearm in neutral position. The patient was asked to keep the shoulder arm muscles relaxed. The observer supported the arm at the wrist, locked the elbow, held the arm bent at 90° and rotated it outwards in the transverse plane. Range of motion and pain were assessed.

2.1.7. Impingement test
The impingement test was only performed if no glenohumeral restrictions were found. The starting position was similar to passive abduction. During the test the observer prevented scapular rotation with one hand, while the other hand raised the patient’s arm in abduction, causing the greater tuberosity to impinge against the acromion. The results of the tests were interpreted as positive or negative (Neer, 1983).

2.1.8. Springing test of the first rib
The observer exerted force with the second metacarpophalangeal joint on the first rib of the patient, assessing range of motion (normal or restricted), pain (present or absent), and joint stiffness (present or absent) (Jirout, 1986).

2.1.9. Acromioclavicular joint assessment
Visual assessment of swelling (present or absent) and joint play test of the acromioclavicular joint. The observer manipulated the joint in the sagittal plane, assessing presence of pain (present or absent).

According to the protocol in the randomised clinical trial each observer assessed the active and passive movements in one or two movements. No verbal encouragements were given by the observers during active tests.

2.2. Statistical analysis

Data analyses were performed in SPSS (version 12). Percentage of absolute agreement (calculated as the number of observations in which both observers agreed with each other divided by the total number of observations), Cohen’s Kappa and weighted Cohen’s Kappa were calculated to quantify the interobserver agreement for dichotomous data and ordinal data. Regarding range of motion of the shoulder, t-tests for related samples were performed and intraclass correlations (ICCs) were calculated.
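The agreement statistics named above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function name `agreement_and_kappa`, the example ratings and the linear weighting scheme are assumptions made for illustration (the article does not state which weights were used for the weighted Kappa).

```python
def agreement_and_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Return (absolute agreement, Cohen's Kappa) for two observers.

    `categories` lists the possible scores in their natural order.
    With weighted=True, linear disagreement weights are applied; this
    weighting scheme is an assumption, not taken from the article.
    Kappa is None when chance-expected disagreement is zero.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}

    # Cross-tabulate the paired ratings (observer A in rows, B in columns).
    table = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        table[index[a]][index[b]] += 1

    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 on the diagonal, 1 (or scaled distance) elsewhere.
        if i == j:
            return 0.0
        return abs(i - j) / (k - 1) if weighted else 1.0

    observed = sum(weight(i, j) * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))

    agreement = sum(table[i][i] for i in range(k)) / n
    kappa = None if expected == 0 else 1.0 - observed / expected
    return agreement, kappa


# Hypothetical dichotomous ratings (1 = present, 0 = absent) for 10 patients.
observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(agreement_and_kappa(observer_1, observer_2, categories=[0, 1]))

# Hypothetical ordinal pain ratings on the four point scale used in the study.
scale = ["no pain", "little pain", "much pain", "excruciating pain"]
pain_1 = ["no pain", "little pain", "much pain", "much pain"]
pain_2 = ["no pain", "much pain", "much pain", "excruciating pain"]
print(agreement_and_kappa(pain_1, pain_2, categories=scale, weighted=True))
```

In the first example the observers agree on 8 of 10 patients (80%), yet Kappa is about 0.58, because with these marginal totals 52% agreement is already expected by chance.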
Additionally, Bland and Altman (1986) plots were made for range of motion of the shoulder to analyse if the differences between observers were consistent across the range of measurements. Criteria for acceptable reliability were a Kappa value ≥ 0.60 and an ICC ≥ 0.75 (Landis and Koch, 1977; Brouwer et al., 2003). A poor Kappa value can be present although absolute agreement is very high, probably related to lack of variation in cell filling. Therefore, an absolute agreement ≥ 80% was also a criterion for an acceptable agreement.

This study was approved by the Medical Ethics Committee of the University Medical Center Groningen, University of Groningen, The Netherlands.

3. Results

A total of 91 participants were included in the study. Table 1 shows baseline characteristics of the patients. The most frequently reported duration of shoulder complaints was 3–5 weeks. Many patients had had previous periods of shoulder complaints. In total 76 participants were assessed 6 weeks after inclusion in the trial and 15 participants were assessed 12 weeks after inclusion in the trial.

Table 1
Baseline characteristics of the participating patients.

Variables                                          N = 91
Age in years, mean (SD)                            48.5 (11.8)
Male                                               43 (47.3%)
Female                                             48 (52.7%)
Duration complaints
  0–2 weeks                                        9 (9.9%)
  3–5 weeks                                        28 (30.8%)
  6–8 weeks                                        13 (14.3%)
  9–11 weeks                                       11 (12.1%)
  12–26 weeks                                      12 (13.2%)
  >26 weeks                                        18 (19.8%)
Previous periods of shoulder complaints
  No                                               31 (34.1%)
  Yes, left shoulder                               23 (25.3%)
  Yes, right shoulder                              28 (30.8%)
  Yes, both shoulders                              9 (9.9%)
Previous neck complaints (minimally 1 week)
  No                                               36 (39.6%)
  Yes                                              55 (60.4%)
Development of complaints
  Rapid/acute                                      28 (31%)
  Gradual                                          63 (69%)
Shoulder pain (range 0–10)                         3.4 (2.2)
Shoulder restrictions (range 0–10)                 4.5 (2.8)

Table 2 shows Cohen’s Kappa and absolute agreement for dichotomous data. For one test (‘acromioclavicular swelling’) Cohen’s Kappa could not be calculated because of incomplete filling of the 2 × 2 tables. For two tests (‘acromioclavicular swelling’ and ‘springing test first rib pain’) acceptable reliability (absolute agreement > 80%) was found.

Table 2
Cohen’s Kappa and absolute agreement for dichotomous data.

Variables                                                      Kappa   Absolute agreement (%)
Active painful arc (present, absent)                           0.46    74
Passive painful arc (present, absent)                          0.52    76
Impingement (present, absent)                                  0.47    74 b
Acromioclavicular swelling (present, absent)                   –       99 a
Springing test first rib range of motion (normal, restricted)  0.26    66
Springing test first rib stiff (present, absent)               0.09    68
Springing test first rib pain (present, absent)                0.66    82 a

–: Cohen’s Kappa could not be calculated because of incomplete filling of the 2 × 2 tables.
a Tests fulfilling criteria for acceptable reliability.
b Test only performed if no restrictions in glenohumeral range of motion were found.

Table 3 shows weighted Kappa and absolute agreement for ordinal data. In two functional tests (‘pain HIN’ and ‘pain HIB’) the absolute agreement was less than 80%. In the other seven tests the reliability was acceptable.

Table 3
Weighted Kappa and absolute agreement for ordinal data.

Variables                        Kappa   Absolute agreement (%)
Range of motion
  HIN                            0.52    85 a
  HIB                            0.73    94 a
Pain
  HIN                            0.52    79
  HIB                            0.35    73
  Abduction active               0.65    90 a
  Abduction passive              0.69    91 a
  External rotation passive      0.50    82 a
  Impingement                    0.62    91 a
  Acromioclavicular joint        0.51    90 a

a Tests fulfilling criteria for acceptable reliability.

Data of the differences between observers, results of t-tests for differences in mean range of motion between observers, and the corresponding ICC are shown in Table 4. For the tests ‘abduction passive starting point of painful arc’ and ‘passive external rotation’ the difference between the observers was statistically significant. For these outcome variables no plots were made because systematic differences between the observers exist (Bland and Altman, 1986). In Figs. 1 and 2 Bland and Altman plots are shown for ‘abduction range of motion active’ and ‘abduction active starting point of painful arc’ to illustrate the magnitude and direction of differences across the range of measurements. No funnel shape was observed in the plots. Similar results were found in Bland and Altman plots for ‘abduction passive range of motion’, ‘abduction active end point of painful arc’ and ‘abduction passive end point of painful arc’. Thus differences between observers were consistent across the range of measurements for these tests.

In two tests (range of motion in active and passive abduction) an ICC of >0.75 was observed. For these tests the interobserver reliability was acceptable. In summary, 11 of the 23 tests (48%) had an acceptable interobserver reliability.

Table 4
Differences between observer 1 and observer 2, results of t-test for related samples and ICCs.

Variable                          Observer 1     Observer 2     Difference    p value    ICC (one way
                                  mean (SD)      mean (SD)      mean (SD)                random)
Abduction range of motion
  Active                          160.2 (40.0)   160.2 (38.8)   0.0 (11.1)    1.000      0.96 a
  Passive                         165.9 (33.0)   165.0 (34.3)   1.0 (10.0)    0.346      0.96 a
Abduction active
  Starting point of painful arc   104.8 (39.2)   110.7 (37.2)   5.9 (28.5)    0.180      0.72
  End point of painful arc        158.0 (26.4)   153.0 (31.4)   5.0 (26.7)    0.226      0.57
Abduction passive
  Starting point of painful arc   114.7 (35.2)   126.9 (36.3)   12.2 (33.1)   0.032 b    0.54
  End point of painful arc        162.6 (24.8)   160.9 (26.5)   1.6 (19.5)    0.617      0.72
External rotation range of
motion passive                    55.5 (19.4)    63.2 (21.5)    7.7 (14.2)    <0.001 b   0.70

a Tests fulfilling criteria for acceptable reliability between observers.
b Tests showing significant differences between observers.

Fig. 1. Bland and Altman plot of the mean (of the two observers) active range of motion abduction plotted against the difference in active range of motion abduction between observers. Note that some data points represent more than one observation.

Fig. 2. Bland and Altman plot of the mean (of the two observers) starting point of painful arc abduction active plotted against the difference between observers of starting point of painful arc abduction active. Note that some data points represent more than one observation.

4. Discussion

Substantial variation in the interobserver reliability, ranging from poor to good, was found in this study in the tests of physical examination of the shoulder girdle. Of the 23 tests performed, 11 (48%) fulfilled the criteria of an acceptable reliability. For the tests on dichotomous data two out of seven tests showed acceptable reliability, for tests on ordinal data seven out of nine tests showed acceptable reliability and for tests on interval data two out of seven tests showed acceptable reliability (Tables 2–4). Thus, tests on ordinal data showed a higher reliability than tests on dichotomous or interval data.

One might consider several explanations for the overall moderate reliability reported in this study. These explanations are related to the data level of the physical examination, training effects within patients, difference between observers and changes of the outcome as a result of the first physical examination.

4.1. Data level

An explanation for better reliability results of tests at ordinal data level could be that patients prefer more response options. Answering on a more gradual, ordinal, scale (no pain, little pain, much pain, and excruciating pain) might be easier than answering on a dichotomous scale: pain absent or present. On a gradual scale patients can indicate more precisely how they experience the pain during the test.

The tests producing interval data were all tests based on visual estimation by the observer of active/passive range of motion and starting/end point of a painful arc. Following the trial protocol, the examiner had to make his assessment within two movements at most. For active and passive abduction a good reliability was found despite the large standard deviations of the mean difference between the observers. For the observer it may be more difficult (i.e. less reliable) to assess range of motion during the movement, as for instance the starting point or end point of a painful arc, than in an end position of active and passive abduction. A significant difference between the assessments of the two observers was found in ‘abduction passive starting point of painful arc’ and ‘passive external rotation’.

The standard deviations of the mean difference between the observers provide an indication of the range of differences found between these observers. These differences are illustrated in the Bland and Altman plots (Figs. 1 and 2). The standard deviation of the mean difference between the observers for ‘abduction active’ (11.1°) indicates that if two observers measure the same patients, a difference of up to 2 × 11.1° is to be expected in 95% of the patients. For ‘abduction passive end point of painful arc’ a difference of up to 2 × 19.5° is to be expected in 95% of the patients. These differences are considerable in the light of the total range measured.

4.2. Training effects

It is remarkable that the tests on an ordinal scale ‘pain HIN’ and ‘pain HIB’ did not show an acceptable reliability. It is possible that a training effect occurs within the patient during the physical examinations because ‘pain HIN’ and ‘pain HIB’ tests were the first tests in the examination. Patients may find it difficult, initially, to indicate the experienced pain level (no pain, little pain, much pain, and excruciating pain) during the test.

4.3. Observer differences

Examinations were carried out by two experienced physical therapists, who had been trained extensively in performing the tests. However, one of them was also a manual therapist. Manual therapy is a postgraduate course undertaken following a physical therapy course. Manual therapists are specialised in diagnosing and treatment of dysfunction of the musculoskeletal system. Therefore, it is possible that the physical signs and symptoms were interpreted differently by the two observers.

Practical issues dictated which of the two observers performed the first or the second examination. In a post-hoc analysis the influence of observer sequence was analysed for differences in active and passive abduction, for passive external rotation and for start and end of painful arc, active and passive. Only for two movements, passive external rotation and start of the painful arc (passive), did the sequence have a significant influence on the differences between the observers. It is not clear why this phenomenon occurred only in these two movements. For all other movements the observer sequence had no effect on the differences between observers.

4.4. Systematic changes of the outcome as a result of the first examination

It is possible that the first examination induces a change in magnitude or presence of an outcome measure and as a consequence the results of the second examination will differ from those of the first. For instance, pain provoked during the first examination of active abduction may increase pain perception during the second examination or may even influence the outcome of the assessment of the range of motion.

4.5. Random changes of outcome as a result of the first examination

Finally, it is possible that the differences between the first and the second examinations are based on random changes within the outcome variables assessed. An explanation for these differences cannot be given.

Theoretically it might be possible that current neck pain influenced reliability of physical examination. This influence would only be possible if the influence of neck pain were different for the two observers, thereby inducing a difference in outcomes of the observers. This differential influence of neck pain on reliability results was not analysed in this study.

4.6. Other considerations

The tests analysed in the reliability study are all tests commonly used in physical therapy practice and in clinical medical practice. The choice to include a test in this reliability study was pragmatic. Retrospectively it might have been more interesting or clinically more relevant if other tests focussing on functional limitations or pathophysiology had been investigated.

For the tests in this study no technical instruments were used, which makes these tests suitable for use in daily practice. Some reliability studies on shoulder movement have been performed using instruments (Riddle et al., 1987; Green et al., 1998b; Hoving et al., 2002), but it has not been incontestably shown that using instruments results in higher reliability. In Tables 5 and 6 an overview of the results of studies similar to the current study is presented.
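The limits-of-agreement reasoning used in Section 4.1 can be reproduced from the mean differences and standard deviations reported in Table 4. The Python sketch below is illustrative only: the function names are hypothetical, and the multiplier of 2 follows the wording of Section 4.1 (the normal-theory value 1.96 gives nearly the same interval).

```python
from statistics import mean, stdev


def limits_from_summary(mean_difference, sd_difference, multiplier=2.0):
    """Interval mean ± multiplier * SD, expected to contain about 95% of
    the between-observer differences if these are roughly normal."""
    half_width = multiplier * sd_difference
    return mean_difference - half_width, mean_difference + half_width


def limits_of_agreement(differences, multiplier=2.0):
    """Same interval, computed from raw paired differences
    (observer 1 minus observer 2, one value per patient)."""
    if len(differences) < 2:
        raise ValueError("at least two paired differences are required")
    bias = mean(differences)
    lower, upper = limits_from_summary(bias, stdev(differences), multiplier)
    return bias, lower, upper


# Summary values taken from Table 4 (degrees).
print(limits_from_summary(0.0, 11.1))   # abduction range of motion, active
print(limits_from_summary(1.6, 19.5))   # abduction passive, end point of painful arc
```

For active abduction this gives an interval of roughly -22° to +22°, which is the "2 × 11.1°" quoted in Section 4.1. `limits_of_agreement` would be applied to patient-level data, which are not available in the article.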
Comparing the present results with those of other studies is difficult because of differences in research methodology, for instance differences in diagnostic tests applied, joints assessed, active and passive motions, testing positions, and the profession of the observers (Riddle et al., 1987; Croft et al., 1994; Green et al., 1998b; de Winter, 1999; Hoving et al., 2002; Terwee et al., 2005). Within these studies and in the current study a similar variability was found concerning interobserver reliability (Tables 5 and 6).

In the studies by Green et al. (1998b) and Hoving et al. (2002) the same design was used for a similar patient group. The physiotherapists achieved overall better results for interobserver reliability than the rheumatologists. Perhaps the training of physical therapists in physical examination during these studies was more extensive than that of the rheumatologists.

In Terwee’s study (Terwee et al., 2005) five movements of the shoulder were estimated visually. Three tests and test conditions were similar to those in the current study. Active and passive abduction showed acceptable reliability in the current study as well as in Terwee’s study. The mean difference and the standard deviation for active and passive abduction were higher in Terwee’s study than in the current study.

In de Winter’s (1999) study interobserver agreement of the examination of the shoulder joint was assessed and Kappas and absolute agreement were calculated. Five tests in that study were similar to the tests in the current study and similar reliability results were found (Table 6).

In the current study two observers were used for logistical reasons. Because of the use of two observers we felt obligated to investigate interobserver differences. Within the time limits of this trial it was not possible to assess additionally the intraobserver reliability. In daily practice it is possible that two colleagues may temporarily take over each other’s duties.
In that case, interobserver reliability as assessed in this study is important. Differences in assessment results may be caused by improvements of the complaints but they may also reflect interobserver differences.

The strength of the current study is the substantial number of patients (n = 91) that participated. All patients who were asked to participate in this reliability study actually participated. However, not all patients participating in the trial of Bergman et al. (2002) could be recruited because of logistical reasons. The authors have no reason to believe that the selection of the patients for the reliability study may have influenced the results.

Interobserver reliability of physical tests was moderate in this study as well as in other studies. Differences in assessments performed by two observers on the same subject do not automatically indicate actual change in the outcome measures of that subject. Determining improvement or deterioration is not easy. It is still not clear which (combination of) tests should be used in diagnosing shoulder disorders and evaluation of shoulder treatment. It is recommended that more interobserver reliability studies should be carried out on tests producing ordinal data in order to analyse sources of measurement variation.

5. Conclusion

A great variability in reliability exists in physical tests of the shoulder girdle. Despite the use of a standardised protocol to assess physical examination of the shoulder girdle, acceptable interobserver reliability was hard to achieve. In this study overall reliability was moderate. The most reliable tests in the study were tests at ordinal data level. In other reliability studies substantial variability has also been found in interobserver reliability. Unfortunately, it is difficult to compare these studies. Further investigations have to be carried out to find out which (combination of) tests is most suitable to assess shoulder complaints. Clinicians and researchers should interpret outcomes of physical examination of the shoulder girdle cautiously because outcomes might be biased by observer differences, but also by other sources of variation.

Table 5
ICC reliability in similar shoulder movements in different studies.

Study                                  Observers  Patients  Year     Profession       Professional     Method                    Standard-  Time      Movements/comparable
                                       (n)        (n)                                 experience                                 ization    interval  movements (n)
Riddle (Riddle et al., 1987)           16         50        1987     PT               6.3 yrs (mean)   Goniometer, large, small  –          –         7/2
Croft (Croft et al., 1994) study 1     6          6         1994     PT               –                Visual                    Yes        15 min    2/2
Croft (Croft et al., 1994) study 2     6          6         1994     PT               –                Visual                    Yes        –         2 (4 pos)/2
Green (Green et al., 1998b)            6          6         1998     PT/MT            Experienced      Inclinometer              Yes        1 hr      8/2
Hoving (Hoving et al., 2002)           6          6         2002     Rheumatologists  Experienced      Inclinometer              Yes        1 hr      8/2
Terwee (Terwee et al., 2005)           2          201       2005     PT               3 and 10 yrs     Visual                    Yes        <1 hr     5/3
Nomden (current)                       2          91        current  PT/MT            27 and 15 yrs    Visual                    Yes        <5 min    5

Flexion, act. Abduction, act. Abduction, pass. 0.72 0.77a 0.72 0.49 0.88a 0.87a 0.96a 0.96a 4.7 (20.1) 4.1 (22.7) 0.0 (11.1) 1.0 (10.0) 0.88a (lying) 0.29 0.73 0.70 11.2 (12.0) 7.7 (14.2) 0.87a (large), 0.84a (small)
External rotation, act. External rotation, pass. 0.88a (large), 0.90a (small)
Hand behind back 0.95a 0.99a 0.43 0.37 0.80a 0.73
Terwee (Terwee et al., 2005) mean difference (SD); Nomden current mean difference (SD) 94a (abs. agr.)

–: not reported.
a Acceptable reliability.

Table 6
Kappa and absolute agreement in shoulder tests.

                                   de Winter (1999)  Nomden (current)  de Winter (1999)  Nomden (current)
Patients (n)                       201               91                201               91
Statistics                         Kappa             Kappa             Abs. agreement    Abs. agreement
Abduction active, pain             0.73 a            0.65 a            95% a             90% a
Abduction passive, pain            0.44              0.69 a            89% a             91% a
External rotation passive, pain    0.45              0.50              80% a             82% a
Presence painful arc active        0.67 a            0.46              88% a             74%
Presence painful arc passive       0.59              0.52              89% a             76%

a Acceptable reliability.
Appendix 1

HIN and HIB as assessed in the randomised controlled trial concerning the effectiveness of manual therapy of the shoulder girdle

HIN, an external rotation movement pattern

Score 1: From hand on thigh up to and including HIN on affected side, underarm in sagittal plane (90° flexed elbow fixed against hip)
Score 2: From HIN at affected side and underarm in sagittal plane just to touching with fingertips processus spinosi C7 and underarm (about) in sagittal plane
Score 3: From fingertips on processus spinosi C7 with underarm (about) in sagittal plane just to elbow in frontal plane
Score 4: From fingertips on processus spinosi C7 and underarm in frontal plane just to fingertips at heterolateral angulus superior scapulae with underarm in sagittal plane
Score 5: From fingertips on heterolateral angulus superior scapulae with underarm in sagittal plane just to elbow in frontal plane
Score 6: From fingertips on heterolateral angulus superior scapulae with elbow in frontal plane just to (almost) full abduction/elevation, but painful terminal passive abduction/elevation
Score 7: Active full abduction/elevation and (almost) painless terminal abduction/elevation

HIB, an internal rotation movement pattern

Score 1: From hand on thigh till lateral side thigh-bone with palm of the hand
Score 2: From palm of the hand on lateral side of thigh-bone till back of the hand on homolateral buttock
Score 3: From back of the hand on homolateral buttock till back of the hand on lumbosacral crossing (the height of processus spinosus L5)
Score 4: From back of the hand on lumbosacral crossing till fist on waist (the height of processus spinosi L3)
Score 5: From fist on waist till back of the hand on thoracolumbal crossing (the height of processus spinosi Th 12)
Score 6: From back of the hand on thoracolumbal crossing to fingertips on heterolateral angulus inferior scapulae
Score 7: From fingertips on heterolateral angulus inferior scapulae till back of the hand between scapulae (the height of processus spinosi Th 7)

HIN and HIB slightly modified from Solem-Bertoft et al. (1996).

References

Bergman GJ, Winters JC, van der Heijden GJ, Postema K, Meyboom-de Jong B. Groningen manipulation study. The effect of manipulation of the structures of the shoulder girdle as additional treatment for symptom relief and for prevention of chronicity or recurrence of shoulder symptoms. Design of a randomized controlled trial within a comprehensive prognostic cohort study. Journal of Manipulative and Physiological Therapeutics 2002;25(9):543–9.
Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307–10.
Brouwer S, Reneman MF, Dijkstra PU, Groothoff JW, Schellekens JM, Goeken LN. Test–retest reliability of the Isernhagen work systems functional capacity evaluation in patients with chronic low back pain. Journal of Occupational Rehabilitation 2003;13(4):207–18.
Croft P, Pope D, Boswell R, Rigby A, Silman A. Observer variability in measuring elevation and external rotation of the shoulder. Primary Care Rheumatology Society Shoulder Study Group. British Journal of Rheumatology 1994;33(10):942–6.
Green S, Buchbinder R, Glazier R, Forbes A. Systematic review of randomised controlled trials of interventions for painful shoulder: selection criteria, outcome assessment, and efficacy. BMJ 1998a;316(7128):354–60.
Green S, Buchbinder R, Forbes A, Bellamy N. A standardized protocol for measurement of range of movement of the shoulder using the Plurimeter-V inclinometer and assessment of its intrarater and interrater reliability. Arthritis Care and Research 1998b;11(1):43–52.
Green S, Buchbinder R, Hetrick S. Physiotherapy interventions for shoulder pain. Cochrane Database of Systematic Reviews 2003;2:CD004258.
Hoving JL, Buchbinder R, Green S, Forbes A, Bellamy N, Brand C, et al. How reliably do rheumatologists measure shoulder movement? Annals of the Rheumatic Diseases 2002;61(7):612–6.
Jirout J. X-ray studies on the dynamics of the first rib. Manual Medicine 1986;2:59–61.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33(1):159–74.
Neer CS. Impingement lesions. Clinical Orthopaedics and Related Research 1983;173:70–7.
Riddle DL, Rothstein JM, Lamb RL. Goniometric reliability in a clinical setting. Shoulder measurements. Physical Therapy 1987;67(5):668–73.
Solem-Bertoft E, Lundh I, Westerberg CE. Pain is a major determinant of impaired performance in standardized active motor tests. A study in patients with fracture of the proximal humerus. Scandinavian Journal of Rehabilitation Medicine 1996;28(2):71–8.
Terwee CB, de Winter AF, Scholten RJ, Jans MP, Deville W, van Schaardenburg D, et al. Interobserver reproducibility of the visual estimation of range of motion of the shoulder. Archives of Physical Medicine and Rehabilitation 2005;86(7):1356–61.
de Winter AF. Diagnosis and classification of shoulder complaints. Vrije Universiteit; 1999. p. 23–37.
Winters JC, Sobel JS, van der Windt DAWM, Jonquiere M, de Winter AF, van der Heijden GJ, et al. NHG Standaard Schouderklachten (versie 1999) (Guidelines for Shoulder Complaints of the Dutch College of General Practitioners (version 1999)). Huisarts en Wetenschap 1999;42:222–31.