
Volume 29, Issue 6, Pages 475-485.e10 (July 2006)



Manual Examination of the Spine: A Systematic Critical Literature Review of Reproducibility

Mette Jensen Stochkendahl, DC (corresponding author); Henrik Wulff Christensen, DC, MD, PhD; Jan Hartvigsen, DC, PhD; Werner Vach, PhD; Mitchell Haas, DC, MA; Lise Hestbaek, DC, PhD; Alan Adams, DC, MS, MSEd; Gert Bronfort, DC, PhD

Received 15 September 2005; received in revised form 2 February 2006

Abstract 

Objective

Poor reproducibility of spinal palpation has been reported in previously published literature, and authors of recent reviews have criticized study quality. This article critically analyzes the literature pertaining to the inter- and intraobserver reproducibility of spinal palpation to investigate the consistency of study results and assess the level of evidence for reproducibility.

Methods

Systematic review and meta-analysis were performed on relevant literature published from 1965 to 2005, identified using the electronic databases MEDLINE, MANTIS, and CINAHL and checking of reference lists. Descriptive data from included articles were extracted independently by 2 reviewers. A 6-point scale was constructed to assess the methodological quality of original studies. A meta-analysis was conducted among the high-quality studies to investigate the consistency of data, separately on motion palpation, static palpation, osseous pain, soft tissue pain, soft tissue changes, and global assessment. A standardized method was used to determine the level of evidence.

Results

The quality score of 48 included studies ranged from 0% to 100%. There was strong evidence that the interobserver reproducibility of osseous and soft tissue pain is clinically acceptable (κ ≥ 0.4) and that intraobserver reproducibility of soft tissue pain and global assessment are clinically acceptable. Other spinal procedures are either not reproducible or the evidence is conflicting or preliminary.

Article Outline

Abstract

Methods

Definitions

Study Selection

Data Extraction

Assessment of Methodological Quality of Trials

Meta-Analysis

Assessment of the Level of Evidence

Sensitivity Analysis

Results

Results of the Literature Search

Methodological Quality

Meta-Analysis

Evidence of Reproducibility

Sensitivity Analysis

Discussion

Summary of Results

Methodological and Clinical Considerations

Statistical Considerations

Limitations of this Review

Conclusions

Appendix A. 

Appendix B. 

Appendix C. Intra-observer reproducibility studies

Appendix D. Inter-observer reproducibility studies

References

Copyright

Biomechanical dysfunction is thought to be an important contributor to spinal pain, and manual palpation is a widely used procedure for the diagnosis of such dysfunctions among providers of manual medicine.1, 2, 3 Contrary to the expectations of many clinicians, unacceptable levels of reproducibility have been shown in the majority of the previously published literature, and authors of newer reviews have questioned the utility of manual examination procedures in spinal diagnosis altogether.4, 5, 6, 7 Severe criticism has been directed at the design of the original studies, including the use of asymptomatic subjects,4, 5 inexperienced observers,5 parallel testing,4 unclear definitions of positive findings and rating scales,4, 6 weak description of study results,4, 5, 7 and the need for improvement in overall study quality.4, 7 Furthermore, the dependence of Cohen's κ (the most widely used statistical method in studies on reproducibility) on the prevalence of positive findings and the composition of the study population has been the subject of discussion.8, 9

Unfortunately, these reviews themselves have important limitations. For instance, some deal with only a subset of manual examination procedures, such as chiropractic procedures only,4 a single spinal region,4, 6, 10 or motion palpation only.5 In only 3 reviews was a predefined quality system applied to assess study quality,4, 6, 7 and in none of the reviews were the number of studies, the methodological quality, and the consistency of the outcomes all considered, as recommended by van Tulder and others.11, 12, 13 Finally, in none of these reviews was the impact of the predefined criteria on the conclusions tested. Therefore, the value of palpation as a diagnostic tool is, at present, still unknown and so are the abilities of practitioners of manual therapy to reliably diagnose spinal dysfunctions using palpation.

We therefore decided that another systematic review taking into account the above issues was warranted. Furthermore, a meta-analysis including comparable studies of adequate methodological standard and assessment of the consistency of study outcomes would be highly useful. The purpose of this paper is therefore to systematically review and critically assess the design and statistical methodology of the literature pertaining to reproducibility of spinal palpation adopting standardized criteria for judging diagnostic studies. A meta-analysis was conducted to evaluate consistency of study outcomes. Finally, the level of evidence for the reproducibility of spinal palpation was determined.

Methods 


Definitions 

Palpation was defined according to Bergmann and Petersen,1 and results of the original articles were analyzed according to the palpation procedure, using the following annotations: motion palpation (MP), static palpation (SP) (palpation for alignment and/or structure), osseous pain (OP) (pain generated from palpation of osseous structures), soft tissue pain (STP), soft tissue changes (STC), and global assessment (GA) (the latter was introduced to describe the use of 2 or more of the above procedures to make 1 single judgement on the presence/absence of mechanical dysfunction). Each palpation procedure could be applied under 5 conditions (standing, sitting, prone, supine, or side lying) and at different segmental levels. Consequently, a palpation procedure applied under a specific condition at 1 or more segmental levels is denoted a test. A paper could consider a single test or several tests and only 1 palpation procedure or several palpation procedures.

Reproducibility refers to the ability of a single observer to find the same result using the same diagnostic procedure in the same patient on 2 separate moments in time (intraobserver agreement) and/or the ability of 2 observers to find the same result of a given diagnostic procedure in a patient (interobserver agreement).14

Study Selection 

Studies were identified by a comprehensive search of the MANTIS (1966-2005), CINAHL (1982-2005), and MEDLINE (1965-2005) databases using the index terms reproducibility, reliability, or observer variation in combination with palpation, motion palpation, physical examination procedures, or spine in text and abstracts. Bibliographies of retrieved documents were checked for any additional studies. The principal investigator (MJS) screened the documents retrieved from this search twice to determine eligibility according to inclusion and exclusion criteria, as listed in Figure 1.



Fig 1. Inclusion and exclusion criteria.


Data Extraction 

Using a checklist, data from included documents were extracted and recorded independently by 2 of the authors (MJS and HWC). Completed checklists were then compared, and discordances were resolved by discussion until consensus was reached. If consensus could not be reached, a third investigator (JH) was available to mediate.

Assessment of Methodological Quality of Trials 

No standardized and validated method for assessing the quality of reproducibility studies exists. Therefore, a 6-point scale was constructed based on recognized requirements for clinical trials of reproducibility and standard recommendations for systematic reviews of test accuracy.12, 15, 16 The operational definitions of the quality criteria are described in Figure 2. A study was considered high-quality if the methodological quality score, expressed as a percentage of the maximum score, was 50% or higher and low-quality if the score was less than 50%. The quality score reflects the relevance and appropriateness of 3 separate dimensions that may affect interpretation of results: study population, study design, and statistical analysis. The quality scoring of the trials was performed independently by 2 reviewers (MJS and HWC). Differences in scores were resolved through consensus by the 2 reviewers. The quality scores of the individual trials were used as part of the evidence determination.



Fig 2. Operational definitions of the quality criteria.


Meta-Analysis 

To assess the consistency of study outcomes in articles included in the systematic review, a meta-analysis was conducted. Not eligible for inclusion in the meta-analysis were (1) low quality studies (<50%), (2) studies not using a binary classification of the test outcome, (3) studies not reporting any results at all, (4) studies using a binary outcome but not reporting κ values, and (5) studies not reporting an adequate description of the palpation procedure.

When possible, single results from included studies (κ and confidence intervals [CI]) were drawn directly from the original articles. If CIs were not reported in the original studies, CIs were calculated according to Altman17 if the necessary information (prevalence and sample size) was available. Results for individual segmental levels not in sequence were included separately in the analysis. In case of multiple reproducibility results reported for several pairs of observers or several spinal segments in sequence, we took the average of the reported κ values and computed a CI, again by applying the Altman formula with the original sample size. This is a conservative approach ignoring a possible gain in precision due to taking the average.
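As an illustration only, the interval calculation described above can be sketched in Python. The text does not spell out the formula, so this sketch assumes the common large-sample standard error SE(κ) = sqrt(po(1 − po) / (n(1 − pe)²)) and further assumes that both observers share the same prevalence of positive findings; the function names and the z value of 1.96 are hypothetical choices made for the example.

```python
import math


def kappa_ci_from_prevalence(kappa, prevalence, n, z=1.96):
    """Approximate CI for a binary kappa from prevalence and sample size.

    Assumption (not stated in the article): both observers have the same
    prevalence p of positive findings, so chance agreement is
    p_e = p^2 + (1 - p)^2 and observed agreement is recovered from kappa.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_e = prevalence ** 2 + (1.0 - prevalence) ** 2
    p_o = kappa * (1.0 - p_e) + p_e  # invert kappa = (p_o - p_e) / (1 - p_e)
    if not 0.0 <= p_o <= 1.0:
        raise ValueError("kappa is not attainable at this prevalence")
    se = math.sqrt(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))
    return kappa - z * se, kappa + z * se


def averaged_kappa_ci(kappas, prevalence, n):
    """Average several reported kappas, then compute one CI with the
    ORIGINAL sample size (the conservative choice described in the text)."""
    mean_kappa = sum(kappas) / len(kappas)
    low, high = kappa_ci_from_prevalence(mean_kappa, prevalence, n)
    return mean_kappa, low, high
```

Because the averaged κ is paired with the original sample size rather than a pooled one, the resulting interval is no narrower than that of a single result, which is the conservative behavior the paragraph describes.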

We displayed all available original results in a forest plot. No formal modeling and analysis of heterogeneity was performed because (1) information on the precision of the single results was not available in all studies, (2) we used partially a conservative assessment in the single studies, and (3) multiple results within a study cannot be regarded as independent.

Overall κ values were computed by taking first the mean κ value within each study and then by averaging these mean κ values. Confidence intervals for the overall κ values are based on the empirical variation of the mean κ values, and were only computed if at least 4 studies constituted a mean κ value.
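The two-step averaging can be sketched as follows. This is a minimal illustration, not the authors' code: the article says only that the interval is "based on the empirical variation of the mean κ values", so the normal-approximation interval used here is an assumption (a t-based interval would be wider for small numbers of studies), and the function name and input layout are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def overall_kappa(kappas_by_study, min_studies_for_ci=4, level=0.95):
    """Mean of within-study mean kappas, with an optional CI.

    kappas_by_study maps a study label to the list of kappa values it
    reported. The CI is only returned when at least `min_studies_for_ci`
    studies contribute, mirroring the rule stated in the text.
    """
    study_means = [mean(values) for values in kappas_by_study.values() if values]
    if not study_means:
        raise ValueError("no kappa values supplied")
    overall = mean(study_means)
    if len(study_means) < min_studies_for_ci:
        return overall, None
    # Assumed interval: normal quantile times the standard error of the
    # study means (empirical between-study variation).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = stdev(study_means) / math.sqrt(len(study_means))
    return overall, (overall - z * se, overall + z * se)
```

Averaging within studies first gives each study equal weight regardless of how many segmental levels or observer pairs it reported, which avoids treating multiple results from one study as independent.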

In a secondary analysis, the association between several study characteristics and the mean κ value of the study was tested by an analysis of covariance, including the type of palpation, separately for the intra- and interobserver results. The study characteristics were as follows: publication year, definition of positive findings, segmental region, standardization (ie, agreement on procedure, written instructions, and training sessions), application condition, occupation, experience, symptomatic status of test population, multiple tests.

Assessment of the Level of Evidence 

Criteria for determining the level of evidence for reproducibility of spinal palpation were adapted from the Agency for Health Care Policy and Research's guidelines for acute low back pain.18 This method has been used to assess the level of evidence of risk factors for low back pain in systematic reviews of epidemiological studies.13, 19 The method takes into account all available included studies which describe a palpation procedure, report results, and use a valid statistical method (κ, κw, or intraclass correlation coefficient [ICC]).8

The system evaluates the evidence by taking into account (1) the number of studies, (2) the methodological quality expressed by quality scores, and (3) the consistency of the study outcomes. Consistency was checked by visual inspection of the forest plots. The rating system was applied to each palpation procedure. Five categories were used to describe evidence levels:


-Strong evidence: provided by generally consistent findings in multiple (≥2) high-quality studies

-Moderate evidence: provided by generally consistent findings in 1 high-quality study and 1 or more low-quality studies or in multiple (≥2) low-quality studies

-Preliminary evidence: only 1 study available

-Conflicting evidence: inconsistent findings in multiple (≥2) studies

-No evidence: no studies were identified
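The five categories above can be expressed as a small decision rule. The sketch below is illustrative only: the order in which the categories are checked is an assumption, since the list does not state a precedence, and consistency is passed in as a flag because the review judged it by visual inspection of the forest plots.

```python
def level_of_evidence(n_high, n_low, consistent=True):
    """Map study counts and a consistency judgement to an evidence level.

    n_high / n_low: number of high- and low-quality studies available.
    consistent: reviewer's judgement that findings are generally consistent.
    The check order (none, single study, conflicting, strong, moderate) is
    an assumption made for this example.
    """
    if n_high < 0 or n_low < 0:
        raise ValueError("study counts cannot be negative")
    total = n_high + n_low
    if total == 0:
        return "No evidence"
    if total == 1:
        return "Preliminary evidence"
    if not consistent:
        return "Conflicting evidence"
    if n_high >= 2:
        return "Strong evidence"
    # Remaining cases: 1 high-quality plus >=1 low-quality study,
    # or >=2 low-quality studies.
    return "Moderate evidence"
```

For example, `level_of_evidence(1, 0)` returns "Preliminary evidence" and `level_of_evidence(3, 1, consistent=False)` returns "Conflicting evidence". The rule says nothing about direction; whether the evidence supports acceptable or unacceptable reproducibility is decided separately from the κ values.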

The level of acceptable reproducibility has traditionally, and somewhat arbitrarily, been set at κ > 0.4 in studies of manual medicine,8, 20, 21, 22, 23, 24, 25 and thus, a κ value above 0.4 was considered clinically acceptable reproducibility in this review. Levels of clinically acceptable reproducibility expressed in κw or ICC were arbitrarily chosen at 0.4 and 0.8, respectively.

Sensitivity Analysis 

To test the robustness of the assumptions behind the weighting of the evidence, the prespecified cut points for adequate methodological quality (50%) and minimal clinically acceptable reproducibility (κ ≥ 0.4) were subjected to increases and decreases of the cut points of ±25% in the quality score and ± .1 in reproducibility.
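A sensitivity analysis of this kind amounts to re-running the classification over a grid of cut points. The sketch below uses invented study records purely for illustration; the field names, the use of a simple mean κ, and the grid helper are hypothetical and are not taken from the review.

```python
from itertools import product

QUALITY_CUTS = (25, 50, 75)   # prespecified 50% cut point, plus and minus 25%
KAPPA_CUTS = (0.3, 0.4, 0.5)  # prespecified 0.4 cut point, plus and minus 0.1


def summarize(studies, quality_cut, kappa_cut):
    """Return (number of high-quality studies, acceptable?) for one cut pair.

    `studies` is a list of dicts with hypothetical keys "quality" (percent)
    and "kappa". Acceptability is judged on the mean kappa of the studies
    that pass the quality cut; None means no study passed.
    """
    high = [s for s in studies if s["quality"] >= quality_cut]
    if not high:
        return 0, None
    mean_kappa = sum(s["kappa"] for s in high) / len(high)
    return len(high), mean_kappa >= kappa_cut


def sensitivity_grid(studies):
    """Evaluate every combination of quality and kappa cut points."""
    return {
        (q, k): summarize(studies, q, k)
        for q, k in product(QUALITY_CUTS, KAPPA_CUTS)
    }


# Invented example data, not results from the review.
example_studies = [
    {"quality": 33, "kappa": 0.20},
    {"quality": 50, "kappa": 0.35},
    {"quality": 67, "kappa": 0.55},
    {"quality": 83, "kappa": 0.45},
]
grid = sensitivity_grid(example_studies)
```

Conclusions that hold across the whole grid can be called robust to the choice of cut points; cells where the verdict flips show which assumption drives the result.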

Results 


Results of the Literature Search 

More than 900 publications were retrieved, and 48 original articles published between 1980 and 2005 were included according to the inclusion criteria.20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67 In all 48 studies, interobserver reproducibility was reported, and in 19 studies, intraobserver reproducibility was also reported (Appendix A, Appendix B, available online at www.mosby.com/jmpt). All predefined categories of palpation, spinal segments, and application conditions were evaluated. In 25 articles, a single test was evaluated, and in 22 articles, multiple tests (parallel testing) were assessed. Classification of the palpation procedure was not possible in 1 study due to insufficient description.63 Altogether, 58 tests were considered for interobserver reproducibility and 26 tests for intraobserver reproducibility (Table 1). Motion palpation was the most frequently investigated palpation procedure, followed by studies of palpation for pain.

Table 1.

Basic characteristics of the selected articles for the systematic review

Region                 No. of articles
                       Inter (n = 48)   Intra (n = 19)
Cervical               16               3
Thoracic               5                2
Lumbar                 19               8
SI joints              8                6

Palpation procedure    No. of tests considered
                       Inter (n = 58)   Intra (n = 26)
MP                     28               15
SP                     3                0
OP                     6                1
STP                    11               5
STC                    3                0
GA                     7                5

Methodological Quality 

The methodological quality of the studies ranged from 0% to 100% (Appendix C, Appendix D, available online at www.mosby.com/jmpt). Overall, 30 studies (63%) were of high quality; however, only 8 of 19 studies (42%) investigating intraobserver reproducibility were high-quality. The proportion of high-quality articles was higher among those investigating the cervical and thoracic spine than among those investigating the lumbar spine and the sacroiliac (SI) joints (67% vs 59%). A trend for increasing quality was seen for more recent articles. The average quality score increased from 27% in articles published before 1988 to 48% in articles published between 1988 and 1995 and to 54% in articles published after 1996.

Meta-Analysis 

Of 48 original studies addressing interobserver reproducibility, 22 were considered both high-quality and eligible for inclusion in the meta-analysis according to the predetermined criteria. Twenty-six articles were not included (Fig 3). Figs 4 and 5 give an overview of the single results available for the meta-analysis.



Fig 3. Flow chart of study inclusion in the meta-analysis of interobserver reproducibility studies.




Fig 4. Meta-analysis: intraobserver reproducibility.




Fig 5. Meta-analysis: interobserver reproducibility.


Eight original studies addressing intraobserver reproducibility were included in the meta-analysis (Fig 4). Eleven studies were not eligible. Ten studies were low-quality,34, 37, 48, 53, 60, 61, 63, 64, 65, 66 and 1 paper did not use a binary classification of the test outcome.55 Results were only available for 4 procedures (STP, OP, MP, and GA). Within each procedure, results seem to be comparable and point to midrange to high-range κ values, except for the study of Meijne et al.39

With respect to interobserver reproducibility, most of the results for STP indicate midrange reproducibility (Fig 5). The exception is the results from Boline,58 which showed low-range reproducibility; however, the κ estimate was very imprecise here (wide CI). For STC, the results suggest low-range reproducibility, whereas SP shows inconsistent results. Results of OP all suggest mid- to high-range κ values. Most of the results for MP suggest low reproducibility. κ values were inconsistent for GA but had wide, overlapping confidence intervals.

We found no significant effect of year of publication, segmental region, standardization of procedures, observer profession or experience, symptomatic status of test population, or number of tests performed on the κ values (data not shown). Thus, our investigation showed that most study characteristics had little influence on the study results. A notable exception was seen when comparing the application conditions, where sitting palpation was associated with slightly smaller κ values and standing palpation was associated with distinctly smaller κ values. These differences were significant (P = .042) for the interobserver studies, and the same tendency could also be seen in the intraobserver studies (nonsignificant). In the intraobserver analysis, we also observed a tendency toward lower mean κ values in studies without parallel testing (κ = 0.23) than in studies with parallel testing (κ = 0.61) (nonsignificant).

Evidence of Reproducibility 

Thirty-one articles were available for the assessment of level of evidence, including 6 studies not reporting a binary outcome (Fig 6).20, 21, 25, 40, 47, 49 Results from the 6 studies using weighted κ or ICC were not directly comparable to the studies using κ, but all 6 studies showed results with similar trends of low interobserver agreement on MP and higher interobserver agreement on evaluation of pain (Table 2). Similarly, we also included 5 low-quality studies, which showed similar trends (Table 2).33, 36, 37, 56, 57



Fig 6. Flow chart of study inclusion in the assessment of level of evidence of interobserver reproducibility studies.


Table 2.

Results of studies using ICC or κw, and low-quality studies included in the level of evidence of interobserver reproducibility

Palpation procedure    κw or ICC                          Low quality
MP                     ICC: 0.09 to 0.25 (47)             κ: 0.05 (37)
                       ICC: −0.4 to 0.73 (49)             κ: 0.01 (56)
                       κw: −0.16 to 0.49 (21)             κ: −0.17 to 0.17 (57)
                       κw: 0.42 to 0.75 (40)
OP and STP             ICC: OP 0.27 to 0.85 (49)          κ: OP 0.00 to 1.0 (36)
                       ICC: OP 0.22 to 0.80 (20)          κ: STP 0.35 to 0.87 (36)
                       κw: OP 0.47 to 0.52 (25)
                       κw: STP 0.24 to 0.56 (25)
STC                                                       κ: 0.07 (33)
SP                                                        κ: 0.14 to 0.37 (36)

κw represents weighted κ.

Numbers in parentheses are in-text reference numbers.

Taking all 31 studies together, strong evidence of clinically acceptable intraobserver reproducibility (κ ≥ 0.4) was found for STP and GA (Table 3). Strong evidence for clinically acceptable interobserver reproducibility was found for OP and STP according to the predefined criteria for assessment of levels of evidence. Strong evidence of clinically unacceptable reproducibility was found for intraobserver MP and interobserver MP and STC. Conflicting evidence was found for interobserver reproducibility of SP and GA. Preliminary evidence of clinically acceptable reproducibility was found for intraobserver OP, and no evidence was found for intraobserver SP and STC.

Table 3.

Articles included in the meta-analysis and the assessment of level of evidence in categories of palpation procedures

Column groups, each reported for interobserver (Inter) and intraobserver (Intra) studies:
(1) Total number of articles in systematic review (n = HQ/LQ)
(2) No. of HQ articles eligible for meta-analysis
(3) No. of used test results in the meta-analysis
(4) No. of articles eligible for level of evidence (n = HQ/LQ)
(5) Conflicting evidence
(6) Level of evidence
(7) Average κ value from the meta-analysis (95% CI)*

           (1)               (2)                (3)                 (4)             (5)           (6)             (7)
Procedure  Inter    Intra    Inter     Intra    Inter     Intra     Inter   Intra   Inter  Intra  Inter   Intra   Inter              Intra
           (30/18)  (8/11)   (n = 22)  (n = 8)  (n = 57)  (n = 26)  (25/6)  (11/3)
OP         8/2      1/1      5         1        5         1         8/1     1/0     No            Strong  Pre     0.53 (0.32-0.74)   0.91
STP        8/2      2/1      7         2        11        5         8/1     2/0     No     No     Strong  Strong  0.42 (0.29-0.55)   0.65
MP         22/14    7/8      16        6        27        15        20/3    6/2     No     No     Strong  Strong  0.17 (0.10-0.24)   0.35 (0.13-0.58)
STC        5/2      0/0      3         0        3         0         3/1     0       No            Strong  No      0.03
SP         4/1      0/0      3         0        3         0         3/1     0       Yes           Conf    No
GA         4/1      2/1      4         2        7         5         4/0     2/1     Yes    No     Conf    Strong                     0.44

HQ, High-quality; LQ, low-quality; Pre, preliminary; Conf, conflicting.

* Calculated if 4 or more results were available.

Sensitivity Analysis 

In the meta-analysis, only high-quality studies were included. If low-quality studies reporting binary outcomes and κ values or high-quality studies using κw or ICC had been included, the results would have been unaffected (data not shown).

Raising the cut point for adequate methodological quality from 50% to 75%, or any amount of decrease in the cut point, did not affect the weight of the evidence or the overall conclusions, except for intraobserver MP and intraobserver GA, where an increase to 75% would result in conflicting evidence derived from only 2 studies for intraobserver MP and moderate evidence for clinically acceptable intraobserver GA. Raising the cut point for clinical acceptability has an obvious impact, with results for pain being most robust due to high overall κ values.

Discussion 


Summary of Results 

After reviewing studies dealing with reproducibility of manual palpation of the entire spine, including the SI joints, we found strong evidence for clinically acceptable reproducibility both within and between observers for palpation of osseous and STP and within the same observer for GA. Strong evidence for clinically unacceptable levels of reproducibility for intra- and interobserver MP and STC was found. Intraobserver reproducibility was consistently higher than interobserver reproducibility, and reproducibility of palpation for pain response was consistently higher than reproducibility of palpation for motion.

The most recent and comprehensive review evaluating the reproducibility of spinal palpation by Seffinger et al7 applied different inclusion and general review criteria, and thus, only 27 of 44 articles and 9 of 19 high-quality articles included in this review were evaluated. Furthermore, we included several more recent publications and articles dealing with the SI joints and GA, and we evaluated single results from multiple test regimens. Our conclusions are based on predefined criteria and an evaluation of consistency of high-quality studies, a method not previously applied, whereas the conclusions by Seffinger et al7 were based on both high- and low-quality studies without an evaluation of consistency. The authors concluded that pain provocation tests are most reliable and that soft tissue paraspinal palpatory diagnostic tests are not reliable. Among the 12 highest-quality articles, pain provocation, motion, and landmark location tests were reliable within the same observer, but not always among observers under similar conditions. Overall, examiners' discipline, experience level, consensus on procedures used, training, or the use of symptomatic subjects did not improve reliability. This is in agreement with our findings. Furthermore, we conclude that palpation of pain is reproducible both within and among observers, whereas MP may be reproducible within the same observer.

Methodological and Clinical Considerations 

The experimental design of reproducibility studies has been criticized in previous reviews,4, 5, 6, 7, 68, 69, 70, 71 and we found that 26 of 48 articles were of low methodological quality, had invalid statistical methods, or insufficient reporting of palpation procedures or test results.

Comparability of the studies included in a review is an important requirement to ensure valid generalizations. We ensured comparability with respect to the palpation procedures used, but the studies were rather heterogeneous with respect to characteristics such as definition of positive findings, segmental region, standardization, occupation, experience, symptomatic status of test population, and parallel testing. However, our investigation showed that most study characteristics had little influence on the study results, with the exception of the application condition. In particular, standing palpation was associated with very low κ values. Among the reviewed studies, standing palpation is used solely in the “Gillet test” of SI biomechanical dysfunction, and only 2 studies reporting this condition were included in our analysis.39, 59 However, both contributed to the evaluation of the inter- and intraobserver agreement of MP. If we remove these 2 studies, then the average κ for the interobserver agreement increases to 0.19 (0.13-0.26), and the intraobserver agreement increases to 0.44 (0.14-0.73), such that the intraobserver agreement of MP can be regarded as acceptable.

Poor reproducibility of MP may reflect the design of reproducibility studies, rather than the quality of the palpation procedure.29, 30, 72 Greater reproducibility may be attained by allowing positive findings in a neighboring spinal segment to count in assessing agreement.29 However, this implies that we define a new, different diagnostic test which, then, requires a clinical rationale of test meaningfulness, beyond just an increase in κ values.8 Further, parallel testing (test regimens) seems to aid the observer in making the clinical decision, thus enhancing reproducibility;30, 42 a tendency we could also observe in our data. The acceptable intraobserver reproducibility for GA is also in line with this finding. However, when evaluating a combination of tests, information is only given about the reproducibility of the single test as part of this exact combination of tests.14, 73 Moreover, we must be aware that conclusions on a single test from a study involving several tests may be only valid if the test is applied as part of this exact combination of tests. From a clinical perspective, increased reproducibility with parallel testing indicates that at this point, clinicians should not base their diagnosis on a single clinical examination finding such as palpation but, rather, conduct a range of tests. It is, however, premature to make clinical guidelines on how to use palpation because many aspects of palpation, such as the validity, still need to be investigated.

The reproducibility of palpation for pain response is consistently higher than palpation for motion and, consistently, substantially higher within an observer than among different observers. However, both palpatory pain studies and intraobserver studies in general have inherent problems with blinding of observers. In intraobserver studies, conscious and unconscious cues may render blinding of the observers impossible, and the independence of measures cannot be guaranteed. In palpatory pain studies, blinding of subjects is impossible. Both situations imply the risk of overestimating reproducibility. It should also be noted that intraobserver reproducibility is somewhat higher than interobserver reproducibility by definition (depending on the magnitude of observer by subject interaction).74

A dilemma between high internal validity and clinical applicability arises when designing studies of reproducibility. For example, training studies contrast maximal (ideal) reproducibility with actual reproducibility in practice. To enhance the internal validity, rigid testing conditions should be set up with considerations to blinding, randomization, standardization and training, and parallel testing. However, rigid enforcement of testing conditions often diverges from the clinical situation and, hence, may reduce the external validity. In a clinical situation, a mix of both asymptomatic and symptomatic patients will most likely present to practitioners of manual medicine. Therefore, the study population should consist of a mix of both symptomatic and asymptomatic subjects so that the reproducibility of the testing procedure has a relation to the characteristics of the study population.14 Finally, in spite of their use in everyday clinical routines, test procedures do not necessarily evaluate the clinical entity they are intended to evaluate, and it is therefore important to discuss the content of the test procedure.14, 75

Statistical Considerations 

κ is widely accepted as the statistical method of choice for evaluating agreement between 2 observers for a binary classification.8 It is, however, not without problems to use κ as the sole measure of observer agreement because information is lost when a 4-fold table is summarized into 1 number. Consequently, when a moderate κ value is obtained in a study of reproducibility, we do not know whether it reflects a difference in prevalence estimates between observers or a lack of agreement in spite of similar prevalence.

κ has been criticized for its dependence on the prevalence of positive findings, which limits its usefulness in meta-analyses, because studies with varying prevalence are typically compared. However, the composition of the study population may have greater impact on κ than the prevalence of positive findings.9 Both a binary outcome and a reported κ value were required for studies to be part of our meta-analysis. However, binary outcomes may vary according to the definition of positive findings (ie, prevalence is directly dependent on the definition of positive findings). For example, if the observer is asked to identify any hypomobile segment(s) in a spinal region, the prevalence can vary from 0% to 100%, depending on the study population. If the observer is to identify the most hypomobile segment, the overall prevalence of positive findings will be 100%, but at any particular segment under investigation, the prevalence of the most hypomobile can be 0% to 100%. However, we found no association between the prevalence of positive findings and κ values. This supports that the composition of the study populations is probably of greater importance than the prevalence of positive findings, as suggested by Vach.9
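The dependence of κ on prevalence can be made concrete with a short worked example. The sketch below computes Cohen's κ from a 4-fold table using the standard definition κ = (po − pe) / (1 − pe); the two tables are invented for illustration and are not data from any reviewed study.

```python
def cohen_kappa_2x2(both_pos, only_a_pos, only_b_pos, both_neg):
    """Cohen's kappa for two observers (A and B) and a binary finding.

    Arguments are the four cell counts of the 4-fold table.
    """
    n = both_pos + only_a_pos + only_b_pos + both_neg
    if n == 0:
        raise ValueError("table is empty")
    p_o = (both_pos + both_neg) / n                # observed agreement
    a_pos = (both_pos + only_a_pos) / n            # prevalence per observer A
    b_pos = (both_pos + only_b_pos) / n            # prevalence per observer B
    p_e = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Two invented tables with identical observed agreement (80%).
balanced = cohen_kappa_2x2(40, 10, 10, 40)  # prevalence 50% for both observers
skewed = cohen_kappa_2x2(75, 10, 10, 5)     # prevalence 85% for both observers
```

Both tables show 80% raw agreement, yet the balanced table gives κ = 0.60 and the skewed table gives κ of about 0.22, because chance agreement rises from 0.50 to 0.745 as the prevalence moves away from 50%. The single number therefore cannot be interpreted without the underlying table.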

Different words and schemes have been used to evaluate the strength of reproducibility, but there are no definitive guidelines for interpreting good concordance.8, 76 Moreover, little research has been done to establish minimal, clinically acceptable reproducibility, and perhaps more important than qualifying the strength of concordance, the quantitative reproducibility indices need to be evaluated in terms of their clinical application.8

Limitations of this Review 

Different methodologies have been advocated for systematic reviews of trials addressing therapeutic efficacy,12 but little consensus exists when it comes to assessing the quality of reproducibility studies. We have chosen to evaluate the strength of evidence based on a best-evidence synthesis method, and this is one of the main differences between this review and previously published reviews on the same topic. Heterogeneity across studies, in terms of test procedures, inclusion criteria, study design and presentation of results, may be masked by the best-evidence approach. Considerable heterogeneity in study characteristics was noted across studies included in this review. However, despite this heterogeneity, the meta-analysis showed very consistent overall findings and only moderate impact of the specific design characteristics on the study outcomes.

The exclusion from the meta-analysis of studies that did not report a binary outcome is another important difference between this and previous reviews. To compare studies of reproducibility, the same type of outcome and method of statistics must be applied. On this account, we had to exclude 5 high-quality studies from the meta-analysis. Results from these studies are not directly comparable to the included studies, but all 5 articles show results with similar trends of low interobserver agreement on MP and higher interobserver agreement on evaluation of pain; they were included in the level of evidence assessment. The restricted number of articles causes the strength of evidence to be preliminary or nonexistent in 3 categories. In return, the power of the conclusions with respect to pain and motion testing is compelling. However, results were, in some categories, based on a relatively small number of original studies, making the conclusions very sensitive to just a few future high-quality studies with different results.

A κ value was reported in all high-quality studies using a binary classification. Hence, there was no need to calculate these from a published 4-fold table. No attempts were made to retrieve additional, original results or materials from the primary authors.
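For readers who need to derive κ from a published 4-fold table, the calculation can be sketched as follows. This is a minimal Python illustration of Cohen's κ for two observers and a binary finding; the function name and the cell labels a, b, c, and d are assumptions made for the example and are not taken from any included study.

```python
def kappa_from_fourfold(a: int, b: int, c: int, d: int) -> tuple:
    """Return (percentage agreement, Cohen's kappa) for a 2x2 table.

    a: both observers positive
    b: observer 1 positive, observer 2 negative
    c: observer 1 negative, observer 2 positive
    d: both observers negative
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    # Observed proportion of agreement (PA).
    p_observed = (a + d) / n
    # Agreement expected by chance, from the marginal totals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Hypothetical counts, chosen only to illustrate the arithmetic.
pa, kappa = kappa_from_fourfold(20, 5, 10, 15)
```

With these hypothetical counts, the observed agreement is 70%, the chance-expected agreement is 50%, and κ is approximately 0.40, which shows how a seemingly high percentage agreement can correspond to a κ at the borderline of clinical acceptability.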

Although every effort was made to find all published reproducibility studies, selection bias may have occurred because we included only English-language articles. Publication bias may have resulted in an overestimation of test reproducibility because studies arriving at positive conclusions are more likely to get published.77, 78 Furthermore, reviewer bias is also a possible limitation of this review. Reviewers were not blinded to the authors or the results of the individual trials when the methodological scoring was performed because of our familiarity with the literature.

Despite acceptable study quality according to our criteria, many trials still had methodological limitations or, at best, inadequate reporting of methods. Nonetheless, reproducibility of spinal manual palpation has been very thoroughly investigated, and more than 40 original articles have been evaluated in this review. However, to shed light on the clinical usefulness of palpation, the validity needs to be investigated, and innovative research that addresses the concomitant problems of selecting a gold standard in motion testing is warranted. Future research should also address the question of palpation in the overall assessment of neck and back pain patients and the importance of palpation as part of the complete clinical evaluation of patients.

Conclusions 

Palpation for pain is reproducible at a clinically acceptable level, both within the same observer and among observers. Palpation for GA is reproducible within the same observer but not among different observers. The level of evidence to support these conclusions is strong. The reproducibility of MP, STC, and SP is not clinically acceptable. The level of evidence is strong for interobserver reproducibility of MP and STC, whereas no evidence or conflicting evidence exists for SP and intraobserver reproducibility of STC. Results are overall robust with respect to the predefined levels of acceptable quality. However, the results are sensitive to changes in the preset level of clinically acceptable reproducibility and to the number of included studies.
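The sensitivity to the preset level of acceptable reproducibility can be illustrated with a short Python sketch. The κ values below are hypothetical and were chosen only to show the mechanism; they are not results from the included studies, and the function name is likewise an assumption.

```python
def share_acceptable(kappas, threshold=0.4):
    """Fraction of kappa estimates at or above the acceptability threshold."""
    if not kappas:
        raise ValueError("no kappa values supplied")
    return sum(k >= threshold for k in kappas) / len(kappas)


# Hypothetical kappa estimates, for illustration only.
example_kappas = [0.35, 0.42, 0.48, 0.55, 0.61]

for cutoff in (0.4, 0.5, 0.6):
    print(cutoff, share_acceptable(example_kappas, cutoff))
```

In this hypothetical set, 4 of 5 estimates are acceptable at a threshold of 0.4, but only 2 of 5 at 0.5 and 1 of 5 at 0.6, so a modest change in the threshold can reverse the overall judgment.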

Practical Applications 


Palpation for pain is reproducible between observers at a clinically acceptable level.

Most spinal palpatory procedures investigated are reproducible within the same observer but not between observers.

Appendix A. 

Reference | Test procedure | Segmental level/patient position | Study population (no. [M/F], category, symptomatic status) | Examiners (no., occupation, experience) | Standardization | Additional procedures | Definition of positive findings/acceptable reliability | Statistics (type, prevalence/CI reported) | Summary of results/κ (PA) | Quality score
Christensen et al 29 | MP STP | T1-T8 Sitting + prone | 107 (68/39) Outpatient Sympt + Asympt | 2 Chiropractors; experience NR | + | | Abnormality κ > 0.5 | κ (expanded κ): +/+ | MP: 0.13-0.45 (0.60-0.68) (82%-88%); STP: 0.34-0.57 (0.63-0.77) (81%-88%) | 100%
Horneij et al 30 | MP STP | T7-L5 prone | 84 (sex NR) Gen pop Sympt + Asympt | 3 Physiotherapists, 18-25 y | + | Muscle length | Pain | κ: −/+ | MP: 0.56-0.78 (78%-89%); STP: 0.64-0.78 (83%-89%) | 50.0%
French et al 34 | GA | T11-L5 + SI observers own choice | 19 (14/5) Recruitment NR Sympt | 5 Chiropractors 5-18 y | | History posture x-ray Neuro Clin | Joint in need of adjustment; allows ± 1 segment | κ: −/− | −0.21 to 1.00 (30%-100%) | 25.0%
Vincent-Smith and Gibbons 37 | MP | SI standing | 9 (5/4) Edu/staff Asympt | 9 Osteopathic stud 4-5 y | + | | Unsymmetrical movement, L> < R | κ: −/− | 0.46 (42%) | 25.0%
Hawk et al 38 | GA | T12-S1 Observers own choice | 18 (14/4) Edu/staff Sympt + Asympt | 4 Chiropractors 2 > 20 y 2 < 3 y | | Manual examination | Joint in need of adjustment (segment and functional unit) | κ: +/− | segment: −0.1 to 0.85 unit: −0.1 to 0.77 | 50.0%
Meijne et al 39 | MP | SI Standing | 41 (41/0) Edu/staff Sympt + Asympt | 2 Physiotherapy stud experience NR | + | | Fixation | κ: −/+ | 0.03-0.08 (71%-83%) | 75.0%
Cattrysse et al 41 | GA | Cx supine + sitting | 11 (sex NR) Research Status NR | 4 Manual practitioners 1.5-13 y | | 3 tests of instability | Instability | κ: −/− | −0.27 to 1.0 (63.6%-100%) | 75.0%
Inscoe et al 48 | MP | T12-S1 Side posture | 6 (2/4) Edu/staff Sympt | 2 Physiotherapists 4-5 y | + | | Mobility | Percent agreement | | 0%
Paydar et al 51 | MP OP | SI Sitting | 32 (17/15) Edu/staff Asympt | 2 Chiropractic stud 1 y | + | Posture | Restriction tenderness | κ: −/se | MP: 0.29 (58%) OP: 0.91 (97%) | 50.0%
Mior et al 53 | MP | SI | >15 (sex NR) Recruitment NR Status NR | 74 Chiropractic stud Experience NR 2 Chiropractors >5 y | +/− | | Fixation | κ: −/− | NR | 25.0%
Leboeuf 54 | MP OP STP | Lx + SI sitting | 45 (29/16) Gen pop Sympt | 4 Chiropractic stud Experience NR | NR | | NR | Percent agreement | | 25.0%
Herzog et al 55 | MP | SI Standing | 11 (sex NR) Prim Care Sympt + Asympt | 10 Chiropractors 1-11 y | + | Gait analysis | Fixation, 3-point scale | Percentage agreement, χ² | | 50.0%
Mootz et al 57 | MP | Lx Sitting | 60 (sex NR) Edu/staff Status NR | 2 Chiropractors 7 + 10 y | + | | Fixation | κ: +/− | −0.09 to 0.48 | 25.0%
Love and Brodeur 60 | MP | T1-L5 Sitting | 32 (32/0) Edu/staff Status NR | 8 Chiropractic stud 1 y | | | Most hypomobile motor unit | Pearson | | 0%
Carmichael 59 | MP | SI Standing | 54 (sex NR) Edu/staff Asympt | 10 stud. 1-3 y | + | | Fixation | κ: +/se | 0.31 (90%) | 50.0%
Bergstrøm and Courtis 61 | MP | Lx Sitting | 100 (sex NR) Edu/staff Status NR | 2 Chiropractic stud. Experience NR | | | Fixation | Percent agreement | | 0%
Deboer et al 63 | Insuff descrip | Cx Sitting | 40 (40/0) Research + Edu/staff Asympt | 3 Chiropractors | | | Fixation Pain Muscle | κ | | 25.0%
Mior and King 62 | MP | C1 Supine | 62 (sex NR) Edu/staff Status NR | 2 Chiropractic stud Experience NR | NR | | Fixation | κ: +/− | 0.37-0.52 (71%-79%) | 50.0%
Gonella et al 66 | MP | T12-S1 | 5 (0/5) Edu/staff Asympt | 5 Physiotherapists 3-20 y | + | | Mobility, 7-point scale | Mean, SD | | 0%

Cx, cervical spine; Tx, thoracic spine; NR, not reported; NA, not applicable; Sympt, symptomatic; Asympt, asymptomatic; Prim Care, primary care; Edu/staff, educational (students) or staff members; Gen pop, general population; Outpatient, outpatient clinic; Research, research setting; Stud, student; M/F, male/female; PA, percentage agreement; CI, confidence interval; Neuro, neurologic testing, such as sensitivity, reflexes, muscular strength; Clin, clinical testing, such as active and passive range of motion, axial compression test, manual traction test, straight leg raise, and shoulder abduction test.

Appendix B. 

Reference | Test procedure | Segmental level/patient position | Study population (number [M/F], category, symptomatic status) | Examiners (number, occupation, experience) | Standardization | Additional procedures | Definition of positive findings/acceptable reliability | Statistics (type, prevalence/CI reported) | Summary of results/κ (PA) | Quality score
Pool et al 20 | MP OP | Cx Supine | 32 (12/20) Prim Care Sympt | 2 Physiotherapists Experience NR | + | Clin | Mobility Pain, 11-point scale κ > 0.4, ICC > 0.75 | κ and ICC (2.1) +/− | MP: -0.09-0.63 (48%-90%) OP: 0.22-0.80 (40.6%-87.4%) | 50%
Hicks et al 27 | MP OP | Lx Prone | 63 (25/38) Outpatient + Research Sympt | 3 Physiotherapists 1 Physiotherapist/chiropractor 3-8 y | + | Clin General mobility test | Mobility Pain | κ: +/+ | MP: -0.02-0.26 (52%-69%) OP: 0.25-0.55 (65%-87%) | 50%
Downey et al 28 | MP | Lx Prone | 60 (28/32) Prim Care Sympt | 6 Physiotherapists 3-11 y | - | History Clin | Most symptomatic level | κ: +/+ | 0.37 | 50%
Sebastian and Chovvath 26 | MP | L5 Sitting + prone | 31 (sex NR) Recruitment NR Sympt | 2 Physiotherapists 5-8 y | + | - | Dysfunction | κ: +/− | 0.69 | 16.7%
Christensen et al 29 | MP STP | T1-T8 Sitting + prone | 107 (68/39) Outpatient Sympt + Asympt | 2 Chiropractors Experience NR | + | - | Abnormality κ > 0.5 | κ (expanded κ): +/+ | MP: −0.03-0.0 (0.22-0.24) (68%-80%) STP: 0.38 (0.67-0.70) (77%-79%) | 100%
Horneij et al 30 | MP STP | T7-L5 Prone | 84 (sex NR) Gen pop Sympt + Asympt | 3 Physiotherapists 18-25 y | + | Muscle length | Pain | κ: −/+ | MP: 0.12-0.49 (61%-77%) STP: 0.31-0.88 (80%-95%) | 66.7%
Marcotte et al 31 | MP | Cx Supine | 3 (sex NR) Edu/staff Asympt | 24 Chiropractic stud + 1 Chiropractor Experience NR | + | | Fixation Inclination = 6° | κ: +/se | 0.337-0.682 (81%-90%) | 16.7%
Comeaux et al 32 | MP STC | C2-T8 Sitting | 54 (27/28) Gen pop Status NR | 3 Occupation NR >10 y | | | The most dysfunctional segment | κ: +/− | NR | 50.0%
Ghoukassian et al 33 | STC | Tx Sitting | 19 (19/0) Recruitment NR Asympt | 10 Osteopathic stud 2 y | + | | The most significant area of tissue tension | κ: −/− | 0.07 | 33.3%
French et al 34 | GA | T11-L5 + SI Observers own choice | 19 (14/5) Recruitment NR Sympt | 5 Chiropractors 5-18 y | | History Posture X-ray Neuro Clin | Joint in need of adjustment Allows ± 1 segment | κ: −/− | −0.16 to 0.25 (48%-64%) | 50.0%
Smedmark and Wallin 35 | MP | C1-3 + C7-T1 Sitting + prone + side lying | 61 (15/46) Prim Care Sympt | 2 Physiotherapists >25 y | + | 4 tests of mobility | Stiffness (reduced mobility) | κ: −/− | 0.28-0.43 (79%-87%) | 66.7%
Van Suijlekom et al 36 | SP OP STP | Cx Position NR | 24 (13/11) Outpatient + Research Sympt | 2 Neurologists Experience NR | | History Clin Tender points | Facet joint pain Impairment | κ: −/− | SP: 0.14-0.37 OP: 0.0-1.0 STP: 0.35-0.87 | 33.3%
Vincent-Smith and Gibbons 37 | MP | SI Standing | 9 (5/4) Edu/staff Asympt | 9 Osteopathic stud. 4-5 y | + | | Unsymmetrical movement, L> < R | κ: −/− | 0.05 (42%) | 16.7%
Hawk et al 38 | GA | T12-S1 Observers own choice | 18 (14/4) Edu/staff Sympt + Asympt | 4 Chiropractors 2 > 20 y 2 < 3 y | | Manual examination | Joint in need of adjustment (segment and functional unit) | κ: +/− | segment: −0.42 to 0.44 unit: −0.39 to 0.54 | 66.7%
Meijne et al 39 | MP | SI Standing | 41 (41/0) Edu/staff Sympt + Asympt | 2 Physiotherapy stud. Experience NR | + | | Fixation | κ: −/+ | −0.05 to 0.0 (76%-77%) | 66.7%
Fjellner et al 21 | MP | C0-C5 Sitting + supine | 48 (8/40) Edu/staff + Gen pop Asympt | 2 Physiotherapists 6 + 12 y | + | Clin | If not normal κ > 0.4 | κ(w): +/+ | −0.16 to 0.49 (41%-92%) | 66.7%
Lundberg and Gerdle 40 | MP | Lx Side posture | 156 (0/156) Gen pop Status NR | 3 Physiotherapists Experience NR | + | Posture Clin | Mobility, 5-point scale | κ(w): −/+ | 0.42-0.75 | 66.7%
Strender et al 22 | MP SP OP STP STC | C0-C3 Supine | 50 (13/37) Gen pop Sympt + Asympt | 2 Physiotherapists 21 + 23 y | + | Clin | Mobility Consistency Pain Difference between L/R, the most pronounced side κ > 0.4 | κ: +/+ | MP: 0.05-0.15 (26%-44%) SP: 0.24 (70%) OP: 0.37 (58%) STP: 0.31-0.52 (62%-68%) STC: −0.18 (36%) | 75.0%
Strender et al 23 | MP STP | Lx Prone | 71 (28/43) Outpatient + Prim Care Sympt | 2 Physiotherapists 2 Physicians Experience NR | + | Clin Neuro | Mobility Normality versus pathology κ > 0.4 | κ: +/+ | MP: PT: 0.38-0.75 (72%-88%) MD: -0.08-0.24 (48%-62%) STP: PT: 0.27-0.56 (72%-86%) MD: 0.22-0.40 (71%-76%) | 66.7%
Cattrysse et al 41 | GA | Cx Supine + sitting | 11 (sex NR) Research Status NR | 4 Manual practitioners 1.5-13 y | | 3 tests of instability | Instability | κ: −/− | −0.64 to 1.0 (18%-100%) | 83.3%
Jull and Zito 42 | GA | C0-C3 Position NR | 40 (12/28) Outpatient Sympt + Asympt | 7 Physiotherapists Experience NR | | Manual examination | Most dysfunctional segment Order of magnitude | κ: −/− | 0.25-1.0 | 66.7%
McPartland and Goodridge 43 | MP SP STC | C0-C3 Position NR | 7 + 11 (1/6 + 5/6) Research + Edu/staff Sympt + Asympt | 2 Osteopaths 10 + 40 y 36 Osteopathic stud | NR | | Dysfunction. Facet joint tenderness. Tissue texture. (Rating 0-10) | κ: −/− | MP: 0.34 (67%) SP: 0.53 (77%) STC: 0.19 (70%) | 58.3%
Tuchin et al 44 | GA | C1-C7 Position NR | 53 (sex NR) Edu/staff Sympt + Asympt | 8 Chiropractors 2-14 y | | Manual examination | Vertebral dysfunction | Logistic regression χ² | | 16.7%
Haas 45 | MP | T3-T12 Sitting | 73 (2/3 males) Edu/staff Sympt/Asympt | 2 Chiropractors >15 y | + | | End play restriction | κ: −/SE | 0.14 | 100%
Lindsay 46 | MP | Lx + SI Supine + prone | 8 (sex NR) Gen pop Asympt | 2 Physiotherapists 6 + 10 y | | Posture Clin Muscle length | Beyond slight anomaly | κ: +/− | Lx: −0.30 to 0.0 (14%-50%) SI: 0.0-0.60 (75%-86%) | 66.7%
Binkley et al 47 | MP | L1-S1 Prone | 18 (9/9) Outpatient Sympt | 6 Physiotherapists 6-13 y | + | - | Motion, 9-point scale | ICC −/+ | 0.09-0.25 | 33.3%
Inscoe et al 48 | MP | T12-S1 Side posture | 6 (2/4) Edu/staff Sympt | 2 Physiotherapists 4-5 y | + | | Mobility | Percent agreement | | 16.7%
Maher and Adams 49 | MP OP | Lx Prone | 90 (34/56) Prim Care Sympt | 6 Physiotherapists 8-21 y | | | Stiffness, 11-point scale Pain, 11-point scale | ICC (1,1) +/+ | MP: −0.40 to 0.73 OP: 0.27-0.85 | 58.3%
Hubka and Phelan 50 | SP | C0-C7 Sitting | 30 (11/19) Private Clinic Sympt | 2 Chiropractors 1 + 5 y | | | The most tender spot | κ: +/+ | 0.68 (77%) | 75.0%
Paydar et al 51 | MP OP | SI Sitting | 32 (17/15) Edu/staff Asympt | 2 Chiropractic stud. 1 y | + | Posture | Restriction Tenderness | κ: −/se | MP: 0.09 (34%) OP: 0.73 (91%) | 50.0%
Boline et al 52 | OP STP | Lx prone | 28 (+/+) Prim Care Sympt | 3 Chiropractors Experience NR | NR | Posture Dermothermography Surface electromyography | Presence of abnormality | κ: +/− | OP: 0.48-0.90 (75%-96%) STP: 0.40-0.78 (89%) | 50.0%
Keating et al 24 | MP SP OP STP STC | Lx Prone + sitting | 46 (20/26) Recruitment NR Sympt + Asympt | 3 Chiropractors 2-10 y | + | Posture Dermothermography Temperature | Misalignment Pain Fixation κ > 0.4 | κ: +/− | MP: 0.07-0.09 SP: 0.0 OP: 0.48 STP: 0.30 STC: 0.07 | 75.0%
Mior et al 53 | MP | SI | >15 (sex NR) Recruitment NR Status NR | 74 Chiropractic stud. Experience NR 2 Chiropractors >5 y | +/− | | Fixation | κ: −/− | NR | 16.7%
Leboeuf 54 | MP OP STP | Lx + SI Sitting | 45 (29/16) Gen pop Sympt | 4 Chiropractic stud Experience NR | NR | | NR | Percent agreement | | 16.7%
Herzog et al 55 | MP | SI Standing | 11 (sex NR) Prim Care Sympt + Asympt | 10 Chiropractors 1-11 y | + | Gait analysis | Fixation, 3-point scale | Percentage agreement, χ² | | 50.0%
Nansel et al 56 | MP | Middle + lower Cx Sitting + supine | 270 (approximately 50% males) Edu/staff Asympt | 4 Chiropractors Experience NR | + | | The side of greatest resistance (L> <R) - marked segment | κ: +/− | 0.01 (46%-54%) | 16.7%
Mootz et al 57 | MP | Lx Sitting | 60 (sex NR) Edu/staff Status NR | 2 Chiropractors 7 + 10 y | + | - | Fixation | κ: +/− | −0.17 to 0.17 | 33.3%
Boline 58 | MP STP STC | Lx Sitting | 50 (27/23) Edu/staff + Outpatient + Prim Care Sympt + Asympt | 2 Chiropractors Experience NR | + | | Presence of severe abnormality, fixation | κ: +/− | MP: −0.05 to 0.31 (78%-91%) STP: −0.03 to 0.49 (90%-96%) STC: 0.10-0.31 (70%) | 66.7%
Carmichael 59 | MP | SI Standing | 54 (sex NR) Edu/staff Asympt | 10 stud. 1-3 y | + | | Fixation | κ: +/se | 0.02 (85%) | 50.0%
Love and Brodeur 60 | MP | T1-L5 Sitting | 32 (32/0) Edu/staff Status NR | 8 Chiropractic stud 1 y | | | Most hypomobile motor unit | Pearson | | 16.7%
Viikari-Juntura 25 | OP STP | Cx Seated | 69 (29/23) Outpatient Sympt | 1 Physician 1 Physiotherapist Experience NR | + | Neuro Clin | Tenderness Rating (0-3) κ > 0.4 | κ(w): +/− | OP: 0.47-0.52 STP: 0.24-0.56 | 50.0%
Bergstrøm and Courtis 61 | MP | Lx Sitting | 100 (sex NR) Edu/staff Status NR | 2 Chiropractic stud. Experience NR | | | Fixation | Percent agreement | | 0%
Mior and King 62 | MP | C1 Supine | 62 (sex NR) Edu/staff Status NR | 2 Chiropractic stud Experience NR | NR | | Fixation | κ: +/− | 0.15 (61%) | 50.0%
Deboer et al 63 | Insuff descrip | Cx Sitting | 40 (40/0) Research + Edu/staff Asympt | 3 Chiropractors Experience NR | | | Fixation Pain Muscle | κ | | 50.0%
Potter and Rothstein 64 | MP | SI Standing + sitting + side posture + prone | 17 (10/7) Outpatient Sympt | 8 Physiotherapists 2-18 y | + | 13 SI joint tests | Restriction | Percentage agreement, χ² | | 33.3%
Johnston et al 65 | STC | C7-T12 Standing | 30 (sex NR) Edu/staff Status NR | 1 Osteopath 5 Osteopathic stud Experience NR | NR | | Decreased rebound/dullness | Percent agreement | −(79%-86%) | 0%
Gonella et al 66 | MP | T12-S1 | 5 (0/5) Edu/staff Asympt | 5 Physiotherapists 3-20 y | + | | Mobility, 7-point scale | Mean, SD | | 16.7%
Wiles 67 | MP | SI | 46 (sex NR) Edu/staff Asympt | 12 Chiropractors average 2.75 y | NR | | Restriction, 5-point scale | Percentage agreement, Pearson | | 0%

Appendix C. Intra-observer reproducibility studies 

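Reference | Case mix | Blinding of observers to confounding info | Subject blinding | κ/ICC | Total (max 4 points) | Total percentage
Christensen et al 29 | 1 | 1 | 1 | 1 | 4 | 100.00
Horneij et al 30 | 1 | 0 | 0 | 1 | 2 | 50.00
French et al 34 | 0 | 0 | 0 | 1 | 1 | 25.00
Vincent-Smith and Gibbons 37 | 0 | 0 | 0 | 1 | 1 | 25.00
Hawk et al 38 | 1 | 0 | 0 | 1 | 2 | 50.00
Meijne et al 39 | 0 | 1 | 1 | 1 | 3 | 75.00
Cattrysse et al 41 | 0 | 1 | 1 | 1 | 3 | 75.00
Inscoe et al 48 | 0 | 0 | 0 | 0 | 0 | 0.00
Paydar et al 51 | 1 | 0 | 0 | 1 | 2 | 50.00
Mior et al 53 | 0 | 0 | 0 | 1 | 1 | 25.00
Leboeuf 54 | 1 | 0 | 0 | 0 | 1 | 25.00
Herzog et al 55 | 1 | 0 | 1 | 0 | 2 | 50.00
Mootz et al 57 | 0 | 0 | 0 | 1 | 1 | 25.00
Love and Brodeur 60 | 0 | 0 | 0 | 0 | 0 | 0.00
Carmichael 59 | 0 | 0 | 1 | 1 | 2 | 50.00
Bergstrøm and Courtis 61 | 0 | 0 | 0 | 0 | 0 | 0.00
Deboer et al 63 | 0 | 0 | 1 | 0 | 1 | 25.00
Mior and King 62 | 0 | 0 | 1 | 1 | 2 | 50.00
Gonella et al 66 | 0 | 0 | 0 | 0 | 0 | 0.00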

Appendix D. Inter-observer reproducibility studies 

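Reference | Randomized order of observer | Case mix | Blinding of observers to other observers | Blinding of observers to confounding info | Subject blinding | κ/ICC | Total (max 6 points) | Total percentage
Pool et al 20 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 50.00
Hicks et al 27 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 50.00
Downey et al 28 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 50.00
Sebastian and Chovvath 26 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 16.67
Christensen et al 29 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 100.00
Horneij et al 30 | 1 | 1 | 1 | 0 | 0 | 1 | 4 | 66.67
Marcotte et al 31 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 16.67
Comeaux et al 32 | 0 | 0 | 1 | 1 | 1 | 0 | 3 | 50.00
Ghoukassian et al 33 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 33.33
French et al 34 | 1 | 0 | 1 | 0 | 0 | 1 | 3 | 50.00
Smedmark and Wallin 35 | 1 | 1 | 1 | 0 | 0 | 1 | 4 | 66.67
Van Suijlekom et al 36 | 0 | 1 | 0 | 0 | 0 | 1 | 2 | 33.33
Vincent-Smith and Gibbons 37 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 16.67
Hawk et al 38 | 1 | 1 | 1 | 0 | 0 | 1 | 4 | 66.67
Meijne et al 39 | 0 | 0 | 1 | 1 | 1 | 1 | 4 | 66.67
Fjellner et al 21 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 66.67
Lundberg and Gerdle 40 | 1 | 1 | 1 | 0 | 0 | 1 | 4 | 66.67
Strender et al 22 | 1 | 1 | 1 | 0 | 0.5 | 1 | 4.5 | 75.00
Strender et al 23 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 66.67
Cattrysse et al 41 | 1 | 0 | 1 | 1 | 1 | 1 | 5 | 83.33
Jull and Zito 42 | 0 | 1 | 1 | 1 | 0 | 1 | 4 | 66.67
McPartland and Goodridge 43 | 1 | 1 | 1 | 0 | 0.5 | 0 | 3.5 | 58.33
Tuchin et al 44 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 16.67
Haas 45 | 1 | 1 | 1 | 1 | 1 | 1 | 6 | 100.00
Lindsay 46 | 1 | 0 | 1 | 0 | 1 | 1 | 4 | 66.67
Binkley et al 47 | 0 | 1 | 0 | 0 | 0 | 1 | 2 | 33.33
Inscoe et al 48 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 16.67
Maher and Adams 49 | 1 | 0 | 1 | 0 | 0.5 | 1 | 3.5 | 58.33
Hubka and Phelan 50 | 1 | 1 | 1 | 0 | 0.5 | 1 | 4.5 | 75.00
Paydar et al 51 | 1 | 1 | 0 | 0 | 0 | 1 | 3 | 50.00
Boline et al 52 | 1 | 1 | 0 | 0 | 0 | 1 | 3 | 50.00
Keating et al 24 | 1 | 1 | 1 | 0 | 0.5 | 1 | 4.5 | 75.00
Mior et al 53 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 16.67
Leboeuf 54 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 16.67
Herzog et al 55 | 0 | 1 | 1 | 0 | 1 | 0 | 3 | 50.00
Nansel et al 56 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 16.67
Mootz et al 57 | 1 | 0 | 0 | 0 | 0 | 1 | 2 | 33.33
Boline 58 | 1 | 1 | 1 | 0 | 0 | 1 | 4 | 66.67
Carmichael 59 | 0 | 0 | 1 | 0 | 1 | 1 | 3 | 50.00
Love and Brodeur 60 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 16.67
Viikari-Juntura 25 | 1 | 1 | 0 | 0 | 0 | 1 | 3 | 50.00
Bergstrøm and Courtis 61 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00
Mior and King 62 | 0 | 0 | 1 | 0 | 1 | 1 | 3 | 50.00
Deboer et al 63 | 1 | 0 | 1 | 0 | 1 | 0 | 3 | 50.00
Potter and Rothstein 64 | 0 | 1 | 1 | 0 | 0 | 0 | 2 | 33.33
Johnston et al 65 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00
Gonella et al 66 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 16.67
Wiles 67 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.00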
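
The total percentage in Appendices C and D appears to be the sum of the item scores divided by the maximum attainable points (4 for intraobserver and 6 for interobserver studies). The Python sketch below reproduces that arithmetic under this assumption; the function name is hypothetical, and the item scores of 0, 0.5, or 1 are read from the tables above.

```python
def quality_percentage(item_scores, max_points):
    """Express a summed methodological quality score as a percentage."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    total = sum(item_scores)
    if not 0 <= total <= max_points:
        raise ValueError("total score lies outside the allowed range")
    return round(100.0 * total / max_points, 2)


# Item scores as listed for Strender et al 22 in Appendix D (6 items).
strender_22 = quality_percentage([1, 1, 1, 0, 0.5, 1], max_points=6)

# Item scores as listed for Meijne et al 39 in Appendix C (4 items).
meijne_39 = quality_percentage([0, 1, 1, 1], max_points=4)
```

Both calls return 75.0, matching the tabulated percentages for these two studies.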

References 

1. Bergmann TF, Petersen DH. Joint principles and procedures. In: Bergmann TF, Petersen DH, Lawrence DJ, editors. Chiropractic technique: principles and procedures. New York: Churchill Livingstone Inc; 1993. p. 51–121.

2. Schafer RC, Faye LJ. Introduction to the dynamic chiropractic paradigm. In: Schafer RC, Faye LJ, editors. Motion palpation and chiropractic technique. 1st ed. Huntington Beach, Calif: The Motion Palpation Institute; 1989. p. 1–41.

3. Maitland GD. Vertebral manipulation. 3rd ed. London: Butterworths; 1977.

4. Hestbaek L, Leboeuf-Yde C. Are chiropractic tests for the lumbo-pelvic spine reliable and valid? A systematic critical literature review. J Manipulative Physiol Ther. 2000;23:258–275.

5. Huijbregts PA. Spinal motion palpation: a review of reliability studies. J Man Manip Ther. 2002;10:24–39.

6. van der Wurff P, Hagmeijer RH, Meyne W. Clinical tests of the sacroiliac joint. A systemic methodological review. Part 1: reliability. Man Ther. 2000;5:30–36.

7. Seffinger MA, Najm WI, Mishra SI, Adams A, Dickerson VM, Murphy LS, et al. Reliability of spinal palpation for diagnosis of back and neck pain: a systematic review of the literature. Spine. 2004;29:E413–E425.

8. Haas M. Statistical methodology for reliability studies. J Manipulative Physiol Ther. 1991;14:119–132.

9. Vach W. The dependence of Cohen's kappa on the prevalence does not matter. J Clin Epidemiol. 2005;58:655–661.

10. Vaughan B. Inter-examiner reliability in detecting cervical spine dysfunction: a short review. J Osteopath Med. 2002;5:24–27.

11. van Tulder MW, Assendelft WJ, Koes BW, et al. Method guidelines for systematic reviews in the Cochrane collaboration back review group for spinal disorders. Spine. 1997;22:2323–2330.

12. Clarke M, Oxmann AD. Cochrane reviewers' handbook 4.2.0. Oxford: Cochrane Collaboration; 2003 [cited 2004 Jun 1].

13. Hoogendoorn WE, van Poppel MN, Bongers PM, Koes BW, Bouter LM. Systematic review of psychosocial factors at work and private life as risk factors for back pain. Spine. 2000;25:2114–2125.

14. Patijn J. Reproducibility and validity studies of diagnostic procedures in manual/musculoskeletal medicine. In: International Federation for Manual/Musculoskeletal Medicine Scientific committee. Protocol Formats. 2004.

15. Deeks JJ. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157–162.

16. Irwig L, Macaskill P, Glasziou P, et al. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol. 1995;48:119–130.

17. Altman DG. Some common problems in medical research. In: Altman DG, editor. Practical statistics for medical research. London: Chapman & Hall; 1991. p. 396–439.

18. Bigos S, Bowyer O, Braen G, et al. Acute low back problems in adults. Clinical Practice Guideline No. 14. AHCPR Publication No. 95-0642. Rockville (Md): Agency for Health Care Policy and Research, Public Health Service, U.S. Department of Health and Human Services; December 1994. Available from: www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat6.chapter.25870.

19. Hartvigsen J, Lings S, Leboeuf-Yde C, Bakketeig L. Psychosocial factors at work in relation to low back pain and consequences of low back pain; a systematic, critical review of prospective cohort studies. Occup Environ Med. 2004;61:e2.

20. Pool JJ, Hoving JL, De Vet HC, van Mameren H, Bouter LM. The interexaminer reproducibility of physical examination of the cervical spine. J Manipulative Physiol Ther. 2004;27:84–90.

21. Fjellner A, Bexander C, Faleij R, Strender LE. Interexaminer reliability in physical examination of the cervical spine. J Manipulative Physiol Ther. 1999;22:511–516.

22. Strender LE, Lundin M, Nell K. Interexaminer reliability in physical examination of the neck. J Manipulative Physiol Ther. 1997;20:516–520.

23. Strender LE, Sjoblom A, Sundell K, Ludwig R, Taube A. Interexaminer reliability in physical examination of patients with low back pain. Spine. 1997;22:814–820.

24. Keating JC, Bergmann TF, Jacobs GE, Finer BA, Larson K. Interexaminer reliability of eight evaluative dimensions of lumbar segmental abnormality. J Manipulative Physiol Ther. 1990;13:463–470.

25. Viikari-Juntura E. Interexaminer reliability of observations in physical examinations of the neck. Phys Ther. 1987;67:1526–1532.

26. Sebastian D, Chovvath R. Reliability of palpation assessment in non-neutral dysfunctions of the lumbar spine. Orthop Phys Ther Pract. 2004;16:23–26.

27. Hicks GE, Fritz JM, Delitto A, Mishock J. Interrater reliability of clinical examination measures for identification of lumbar segmental instability. Arch Phys Med Rehabil. 2003;84:1858–1864.

28. Downey B, Nicholas T, Niere K. Can manipulative physiotherapists agree on which lumbar level to treat based on palpation? Physiotherapy. 2003;89:74–81.

29. Christensen HW, Vach W, Manniche C, Haghfelt T, Hartvigsen L, Høilund-Carlsen PF. Palpation of the upper thoracic spine: an observer reliability study. J Manipulative Physiol Ther. 2002;25:285–292.

30. Horneij E, Hemborg B, Johnsson B, Ekdahl C. Clinical tests on impairment level related to low back pain: a study of test reliability. J Rehabil Med. 2002;34:176–182.

31. Marcotte J, Normand MC, Black P. The kinematics of motion palpation and its effect on the reliability for cervical spine rotation. J Manipulative Physiol Ther. 2002;25:E7.

32. Comeaux Z, Eland D, Chila A, Pheley A, Tate M. Measurement challenges in physical diagnosis: refining interrater palpation, perception and communication. J Bodyw Mov Ther. 2001;5:245–253.

33. Ghoukassian M, Nicholls B, McLaughlin P. Inter-examiner reliability of the Johnson and Friedman percussion scan of the thoracic spine. J Osteopath Med. 2001;4:15–20.

34. French SD, Green S, Forbes A. Reliability of chiropractic methods commonly used to detect manipulable lesions in patients with chronic low-back pain. J Manipulative Physiol Ther. 2000;23:231–238.

35. Smedmark V, Wallin M. Inter-examiner reliability in assessing passive intervertebral motion of the cervical spine. Man Ther. 2000;5:97–101.

36. van Suijlekom HA, de Vet HC, van den Berg SG, Weber WE. Interobserver reliability in physical examination of the cervical spine in patients with headache. Headache. 2000;40:581–586.

37. Vincent-Smith B, Gibbons P. Inter-examiner and intra-examiner reliability of standing flexion test. Man Ther. 1999;4:87–93.

38. Hawk C, Phongphua C, Bleecker J, Swank L, Lopez D, Rubley T. Preliminary study of the reliability of assessment procedures for indications for chiropractic adjustments of the lumbar spine. J Manipulative Physiol Ther. 1999;22:382–389.

39. Meijne W, van Neerbos K, Aufdemkampe G, van der Wurff P. Intraexaminer and interexaminer reliability of the Gillet test. J Manipulative Physiol Ther. 1999;22:4–9.

40. Lundberg G, Gerdle B. The relationships between spinal sagittal configuration, joint mobility, general low back mobility and segmental mobility in female homecare personnel. Scand J Rehabil Med. 1999;31:197–206.

41. Cattrysse E, Swinkels RAH, Oostendorp RAB, Duquet W. Upper cervical instability: are clinical tests reliable? Man Ther. 1997;2:91–97.

42. Jull G, Zito G. Inter-examiner reliability to detect painful upper cervical joint dysfunction. Aust J Physiother. 1997;43:125–129.

43. McPartland JM, Goodridge JP. Counterstrain and traditional osteopathic examination of the cervical spine compared. J Bodyw Mov Ther. 1997;1:173–178.

44. Tuchin P, Hart J, Colman R, Johnson C, Gee A, Edwards I, et al. Interexaminer reliability of chiropractic evaluation for cervical spine problems: a pilot study. Chiropr J Aust. 1996;5:23–29.

45. Haas M. Reliability of manual end-play palpation of the thoracic spine. Chiropr Tech. 1995;7:120–124.

46. Lindsay DM. Interrater reliability of manual therapy assessment techniques. Phys Ther Can. 1995;47:173–180.

47. Binkley J, Stratford PW, Gill C. Interrater reliability of lumbar accessory motion mobility testing. Phys Ther. 1995;75:786–792.

48. Inscoe EL, Witt PL, Gross MT, Mitchell RU. Reliability in evaluating passive intervertebral motion of the lumbar spine. J Man Manip Ther. 1995;3:135–143.

49. Maher C, Adams R. Reliability of pain and stiffness assessments in clinical manual lumbar spine examination. Phys Ther. 1994;74:801–809.

50. Hubka MJ, Phelan SP. Interexaminer reliability of palpation for cervical spine tenderness. J Manipulative Physiol Ther. 1994;17:591–595.

51. Paydar D, Thiel H, Gemmell H. Intra- and interexaminer reliability of certain pelvic palpatory procedures and the sitting flexion test for sacroiliac joint mobility and dysfunction. J Neuromusculoskel Syst. 1994;2:65–69.

52. Boline PD, Haas M, Meyer JJ, Kassak K, Nelson C, Keating JC. Interexaminer reliability of eight evaluative dimensions of lumbar segmental abnormality: part II. J Manipulative Physiol Ther. 1993;16:363–374.

53. Mior SA, McGregor M, Schut B. The role of experience in clinical accuracy. J Manipulative Physiol Ther. 1990;13:68–71.

54. Leboeuf C. Chiropractic examination procedures: a reliability and consistency study. J Aust Chiropr Assoc. 1989;19:101–104.

55. Herzog W, Read LJ, Conway PJ, Shaw LD, McEwen MC. Reliability of motion palpation procedures to detect sacroiliac joint fixations. J Manipulative Physiol Ther. 1989;12:86–92.

56. Nansel DD, Peneff AL, Jansen RD, Cooperstein R. Interexaminer concordance in detecting joint-play asymmetries in the cervical spines of otherwise asymptomatic subjects. J Manipulative Physiol Ther. 1989;12:428–433.

57. Mootz RD, Keating JC, Kontz HP, Milus TB, Jacobs GE. Intra- and interobserver reliability of passive motion palpation of the lumbar spine. J Manipulative Physiol Ther. 1989;12:440–445.

58. Boline PD. Interexaminer reliability of palpatory evaluations of the lumbar spine. Am J Chiropr Med. 1988;1:5–11.

59. Carmichael JP. Inter- and intra-examiner reliability of palpation for sacroiliac joint dysfunction. J Manipulative Physiol Ther. 1987;10:164–171.

60. Love RM, Brodeur RR. Inter- and intra-examiner reliability of motion palpation for the thoracolumbar spine. J Manipulative Physiol Ther. 1987;10:1–4.

61. Bergstrøm E, Courtis G. An inter- and intra-examiner reliability study of motion palpation of the lumbar spine in lateral flexion in the seated position. Eur J Chiropr. 1986;34:121–141.

62. Mior SA, King R. Intra and interexaminer reliability of motion palpation in the cervical spine. J Can Chiropr Assoc. 1985;29:195–199.

63. Deboer KF, Harmon R, Tuttle CD, Wallace H. Reliability study of detection of somatic dysfunctions in the cervical spine. J Manipulative Physiol Ther. 1985;8:9–16.

64. Potter NA, Rothstein JM. Intertester reliability for selected clinical tests of the sacroiliac joint. Phys Ther. 1985;65:1671–1675.

65. Johnston WL, Allan BR, Hendra JL, Neff DR, Rosen ME, Sills LD, et al. Interexaminer study of palpation in detecting location of spinal segmental dysfunction. J Am Osteopath Assoc. 1983;82:839–845.

66. Gonella C, Paris SV, Kutner M. Reliability in evaluating passive intervertebral motion. Phys Ther. 1982;62:436–444.

67. Wiles MR. Reproducibility and interexaminer correlation of motion palpation findings of the sacroiliac joints. J Can Chiropr Assoc. 1980;24:59–69.

68. Oldreive WL. Manual therapy rounds. A critical review of the literature on tests of the sacroiliac joint. J Man Manip Ther. 1995;3:157–161.

69. Keating JC. Inter-examiner reliability of motion palpation of the lumbar spine: a review of quantitative literature. Am J Chiropr Med. 1989;2:107–110.

70. Panzer DM. The reliability of lumbar motion palpation. J Manipulative Physiol Ther. 1992;15:518–524.

71. Haas M. The reliability of reliability. J Manipulative Physiol Ther. 1991;14:199–208.

72. Humphreys K, Delahaye M, Peterson CK. An investigation into the validity of cervical spine motion palpation using subjects with congenital block vertebrae as a “gold standard”. BMC Musculoskelet Disord. 2004;5:19.

73. van Deursen L, Patijn J, Ockhuysen A, Vortman BJ. The value of some clinical tests of the sacro-iliac joint. Man Med. 1990;5:96–99.

74. Feldt LS, McKee ME. Estimation of the reliability of skill tests. Res Q. 1958;29:279–293.

75. Haas M, Groupp E, Panzer D, Partna L, Lumsden S, Aickin M. Efficacy of cervical endplay assessment as an indicator for spinal manipulation. Spine. 2003;28:1091–1096.

76. Landis JR, Koch GC. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.

77. Huxley R, Neil A, Collins R. Unravelling the fetal origins hypothesis: is there really an inverse association between birthweight and subsequent blood pressure? Lancet. 2002;360:659–665.

78. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291:2457–2465.

a Research Fellow, Nordic Institute of Chiropractic and Clinical Biomechanics, Part of Clinical Locomotion Science, Odense, Denmark

b Senior Researcher, Nordic Institute of Chiropractic and Clinical Biomechanics, Part of Clinical Locomotion Science, Odense, Denmark

c Senior Researcher, Nordic Institute of Chiropractic and Clinical Biomechanics, Part of Clinical Locomotion Science, Odense, Denmark; and Associate Professor, Institute of Sports Science and Clinical Biomechanics, Part of Clinical Locomotion Science, University of Southern Denmark, Denmark

d Professor, The Department of Statistics, University of Southern Denmark, Denmark

e Professor, Center for Outcomes Studies, Western States Chiropractic College, Portland, Ore

f Senior Researcher, The Back Research Center, Backcenter Funen; and Part of Clinical Locomotion Science, University of Southern Denmark, Denmark

g Professor, Texas Chiropractic College, Pasadena, Tex

h Professor, Department of Research, Wolfe-Harris Center for Clinical Studies, Northwestern Health Sciences University, Bloomington, Minn

Submit requests for reprints to: Mette Jensen Stochkendahl, DC, Nordic Institute of Chiropractic and Clinical Biomechanics, Research Department, Klosterbakken 20, DK-5000 Odense C, Denmark.

 This study was funded by the Nordic Institute of Chiropractic and Clinical Biomechanics, Odense, Denmark and the Foundation for Chiropractic Education and Research, grant no. 03-09-01.

PII: S0161-4754(06)00155-2

doi:10.1016/j.jmpt.2006.06.011

