USERS’ GUIDES TO THE MEDICAL LITERATURE

How to Use an Article Reporting a Multiple Treatment Comparison Meta-analysis
Edward J. Mills, PhD, MSc
John P. A. Ioannidis, MD, DSc
Kristian Thorlund, PhD, MSc
Holger J. Schünemann, MD, PhD, MSc
Milo A. Puhan, MD, PhD
Gordon H. Guyatt, MD, MSc
CLINICAL SCENARIO
You are seeing a 45-year-old patient for
whom, 6 weeks previously, you prescribed paroxetine, a selective serotonin reuptake inhibitor (SSRI), for treatment of generalized anxiety disorder
(GAD). The patient reports reduced
anxiety, but also insomnia and a reduced interest in sex. You wonder if there
is another drug the patient might tolerate better while still maintaining treatment response. You retrieve a multiple
treatment comparison (MTC) meta-analysis that evaluates response and tolerability of all available GAD drugs.1 You
are not familiar with this type of study,
and you wonder if there are special issues to which you should attend in evaluating its methods and results.
MTC META-ANALYSES
Traditionally, a systematic review addresses the merits of one intervention
vs another (eg, placebo, or another active intervention). Data are combined
from all studies—often randomized
clinical trials (RCTs)—that meet eligibility criteria in what we will term a
pairwise meta-analysis. Compared with
single RCTs, meta-analysis improves
the power to detect differences and also
facilitates the examination of the extent to which there are important differences in treatment effects across
eligible RCTs—variability that is frequently called heterogeneity.2,3 Large,
1246
JAMA, September 26, 2012—Vol 308, No. 12
Multiple treatment comparison (MTC) meta-analysis uses both direct (head-to-head) randomized clinical trial (RCT) evidence as well as indirect evidence from RCTs to compare the relative effectiveness of all included interventions. The methodological quality of MTCs may be difficult for clinicians
to interpret because the number of interventions evaluated may be large and
the methodological approaches may be complex. Clinicians and others evaluating an MTC should be aware of the potential biases that can affect the
interpretation of these analyses. Readers should consider whether the primary studies are sufficiently homogeneous to combine; whether the different interventions are sufficiently similar in their populations, study designs,
and outcomes; and whether the direct evidence is sufficiently similar to the
indirect evidence to consider combining. This article uses the existing Users’
Guides format to address study validity, interpretation of results, and application to a patient scenario.
www.jama.com
JAMA. 2012;308(12):1246-1253
unexplained heterogeneity may reduce a reader’s confidence in estimates of treatment effects.
A drawback of pairwise meta-analysis is that it evaluates the effects of
only 1 intervention vs 1 comparator
and does not permit inferences about the
relative effectiveness of several interventions unless all have been compared
directly in head-to-head trials. Yet, for
many medical conditions, there are numerous available interventions that
have—unfortunately—most frequently been compared with placebo and
seldom with one another.4,5 For example, despite 91 completed and ongoing RCTs addressing the effectiveness of
drugs for the treatment of rheumatoid arthritis, only 5 compare directly against
each other.4
Recently, another form of meta-analysis, the MTC meta-analysis (also
known as network meta-analysis because it involves creating a network of
treatments), has emerged.6,7 The MTC
approach provides estimates of effect
sizes for all possible pairwise comparisons whether or not they have been
compared head-to-head in RCTs.
FIGURE 1 displays examples of common networks of treatments. The eAppendix (available at http://www.jama.com) provides a glossary of common
nomenclature found in MTCs.
When 2 interventions, A and B (eg,
paroxetine and lorazepam in FIGURE 2A),
Author Affiliations: Faculty of Health Sciences, University of Ottawa, Ottawa, Ontario, Canada (Dr
Mills); Department of Clinical Epidemiology and
Biostatistics, McMaster University, Hamilton,
Ontario, Canada (Drs Mills, Thorlund, Schünemann, and Guyatt); Stanford Prevention Research
Center, Departments of Medicine and Health
Research and Policy, Stanford University School of
Medicine, and Department of Statistics, Stanford
University School of Humanities and Sciences, Stanford, California (Dr Ioannidis); Department of Epidemiology, Johns Hopkins Bloomberg School of
Public Health, Baltimore, Maryland (Dr Puhan).
Corresponding Author: Edward J. Mills, PhD, MSc, Faculty of Health Sciences, University of Ottawa, Room
031, Thompson Hall, 35 University Private, Ottawa,
ON, K1N 7K4, Canada ([email protected]).
Users’ Guides to the Medical Literature Section Editor: Drummond Rennie, MD, Deputy Editor, JAMA.
©2012 American Medical Association. All rights reserved.
Figure 1. Examples of Possible Network Geometry

[Network diagrams: star network, single closed loop, connected network, and complex network; nodes are treatments or interventions, and lines are direct comparisons in RCTs.]

The figure shows 4 network graphs. In each graph, lines show where direct comparisons exist from one or more trials. The star shows a network for which all interventions have a single mutual comparator. A single closed loop involves 3 interventions and can provide data to calculate both direct comparisons and indirect comparisons. In a connected network, all interventions have been compared against each other in multiple randomized controlled trials (RCTs). The complex network has multiple loops and arms that may have sparse connections.
have not been compared directly, one
can still estimate their relative effect if
each has been compared directly against
another intervention, C (eg, placebo).
This is called an adjusted indirect comparison of A and B.8 Multiple treatment comparisons simultaneously include both direct and indirect evidence.
Indirect comparisons (and MTCs) make
the assumption that the relevant trials
are similar enough in essential features (eg, patient characteristics, definitions and measurements of outcomes, and risk of bias in the studies)9
to be combined.
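The arithmetic behind an adjusted indirect comparison can be sketched on the log odds ratio scale (often called the Bucher method): the indirect estimate for A vs B is the difference between the two direct estimates against the common comparator C, and its variance is the sum of their variances. The numbers below are hypothetical and are not taken from the GAD review:

```python
import math

def bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc):
    """Adjusted indirect comparison of A vs B through a common comparator C.

    The indirect log odds ratio is the difference of the two direct
    log odds ratios; its variance is the sum of their variances.
    """
    log_or_ab = log_or_ac - log_or_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    ci = (log_or_ab - 1.96 * se_ab, log_or_ab + 1.96 * se_ab)
    return log_or_ab, se_ab, ci

# Hypothetical summaries: A vs placebo and B vs placebo,
# each as a log odds ratio with its standard error.
est, se, ci = bucher_indirect(0.65, 0.15, 0.40, 0.20)
print(f"indirect log OR = {est:.2f}, SE = {se:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that the summed variance makes the indirect estimate less precise than either direct estimate, which is why direct head-to-head evidence is generally preferred when available.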
There are 3 chief questions concerning the conduct of an MTC. First,
among trials available for pairwise comparisons, are the studies sufficiently homogeneous to combine for each intervention? Second, across the trials
involved in all interventions, are the
studies sufficiently similar, with the exception of the intervention (eg, in important features such as populations,
design, or outcomes), that they can be
compared? Third, where direct and indirect evidence exists, are the findings
sufficiently consistent that both direct
and indirect evidence can be relied
upon?
By including evidence from both direct and indirect comparisons, an MTC
may increase precision in estimates of
the relative effects of treatments and facilitate simultaneous comparisons, or
even ranking, of these treatments.7
However, because MTCs are methodologically sophisticated, they are often challenging to interpret.10
Herein, we will demystify the MTC
by using the 3-step validity-results-applicability approach of other Users’
Guides.11 BOX 1 includes all issues relevant to evaluating systematic reviews, with a discussion that highlights the issues most important in
MTCs.
ARE THE RESULTS
OF THE STUDY VALID?
Figure 2. A Simple Indirect Comparison and Simple Closed Loop

[Panel A, indirect comparison: paroxetine and lorazepam are each compared directly with placebo in RCTs; the paroxetine vs lorazepam comparison is indirect.]
Did the Review Explicitly Address
a Sensible Clinical Question?
One can formulate questions of optimal patient management in terms of patients, interventions, comparators, and
outcomes.
Broader eligibility criteria may enhance generalizability of the results, but
may be misleading if participants are too
dissimilar and, as a consequence,
heterogeneity is large. Diversity of interventions may also be excessive if authors pool results from different doses,
or even different agents in the same
class (eg, all statins), based on the assumption that effects are similar. Readers should ask whether investigators
have been too broad in their inclusion
of different populations, of different
doses or different agents in the same
class, or of different outcomes, and
[Panel B, closed loop: direct RCT comparisons link nicotine replacement therapy, varenicline, and bupropion.]

A, In an indirect comparison, there is direct evidence from paroxetine compared with placebo and direct evidence of lorazepam compared with placebo. Therefore, the indirect comparison can be applied to determine the effect of paroxetine compared with lorazepam, even if no direct head-to-head comparison exists for these 2 agents. B, In the closed loop, there is direct evidence that compares nicotine replacement therapy with both varenicline and also bupropion. There is also direct evidence comparing bupropion with varenicline. Therefore, enough information exists to evaluate whether the results are coherent between direct and indirect evidence.
Box 1. Critical Appraisal Guide to a Multiple Treatment Comparison
Meta-analysis
A. Are the results of the study valid?
Did the review explicitly address a sensible clinical question?
Was the search for relevant studies exhaustive?
Were there major biases in the primary studies?
B. What are the results?
What was the amount of evidence in the network?
Were the results similar from study to study?
Were the results consistent in direct and indirect comparisons?
What were the overall treatment effects and their uncertainty, and how did the
treatments rank?
Were the results robust to sensitivity assumptions and potential biases?
C. How can I apply the results to patient care?
Were all patient-important outcomes considered?
Were all potential treatment options considered?
Are any postulated subgroup effects credible?
What is the overall quality and what are limitations of the evidence?
Box 2. Using the Guide
Returning to our opening scenario, the
multiple treatment comparison we identified compared the efficacy and tolerability of available generalized anxiety disorder (GAD) drugs, with a specific focus on drugs licensed in the United Kingdom (because it was a British study) using RCT evidence.1 (P, GAD patients; I and C, available drugs; O, efficacy and tolerability.)
Patients in the included RCTs met similarly broad diagnostic criteria, including
being older and receiving inpatient and
outpatient care.29 The authors assumed all
doses of each drug were equivalent (no
good evidence exists for or against this assumption). For outcomes, the authors considered response (≥50% reduction in Hamilton Anxiety Scale), remission (final score ≤7), and tolerability (not withdrawing because of adverse events). These outcomes are important to patients, and the
should look carefully at the resulting
heterogeneity.
When substantial clinical variability or statistical heterogeneity is present,
authors may conduct subgroup analyses or meta-regression to explain
heterogeneity. If such analyses are successful in explaining heterogeneity, the
definitions of the outcomes were consistent across trials.
The search for published literature was
comprehensive, but the authors did not
search for unpublished data and did not
include RCTs of reboxetine, buspirone, or alprazolam, interventions for
which data were available. Because many
mental health RCTs are unpublished,20
the risk of publication bias is substantial. Eligibility assessment was done by
a single individual with a 10% random
selection of articles additionally reviewed by another reviewer. These same
reviewers independently extracted prespecified information on study characteristics and outcomes. The authors did
not perform a risk-of-bias assessment of
the included trials.
P indicates patients; I, interventions; C, comparators; and O, outcomes.
MTC may provide results that more optimally match the clinical settings and
the characteristics of the patient.12 For
example, in an MTC evaluating different statins for cardiovascular disease
protection, the authors used meta-regression to address whether it was appropriate to combine results across
primary and secondary prevention
populations, different statins, and different doses of statins by examining
whether trials with these features exhibited different treatment effects.13
Including multiple control interventions (eg, placebo, no intervention, or
older standard of care) may enhance the
robustness and connectedness of the
treatment network. However, it is important to gauge and account for potential differences between control
groups. For example, due to potential
placebo effects, patients receiving placebo in a blinded RCT may have differing responses than patients receiving no intervention in a nonblinded
RCT. Thus, if 2 of 3 active treatments,
A and B, have been compared with placebo, and 2 of 3 active treatments, B and
C, have been compared with no intervention, the different choice of “control groups” may produce misleading
results. By examining whether certain
trials exhibit different treatment effects, meta-regression may address this
problem.
For example, in an MTC evaluating
the effectiveness of smoking cessation
therapies, the authors combined placebo-controlled groups with standard-of-care control groups and then used
meta-regression to examine whether the
choice of control changed the effect
size.14 The authors found that trials
using placebo controls had smaller effect sizes than those using standard of
care, and this explained the identified
heterogeneity.
Was the Search for Relevant
Studies Exhaustive?
Some published MTCs have used the
search strategies from other systematic reviews as the basis for identifying potentially eligible trials. Readers
can be confident in such approaches
only if the authors have updated the
search to include recently published
trials.15
The eligible interventions can be unrestricted. Sometimes, however, the authors may choose to include only a specific set of interventions; eg, those
available in their country. Some industry-initiated MTCs may choose to consider only a sponsored agent and its direct competitors.16 This may omit the
optimal agent for some situations and
tends to give a fragmented picture of the
evidence. It is typically best to include
all interventions.17 Data on clearly suboptimal or abandoned interventions
may still offer indirect evidence for
other comparisons.17
In an MTC of 12 treatments for major depression, the authors chose to exclude placebo-controlled RCTs and included only head-to-head active
treatment RCTs.18 However, publication bias in the antidepressant literature is well acknowledged,19,20 and by excluding placebo-controlled trials, the
analysis loses the opportunity to benefit from additional available evidence.21 Exclusion of eligible interventions, in this case placebos, may not just
decrease statistical power. Placebo-controlled trials may be different from
head-to-head comparison trials in their
conduct or in the degree of bias (eg, they
may have more or less publication bias
or selective outcome and analysis reporting). Thus, their exclusion may also affect
the point estimates of the effects of pairwise comparisons and may affect the relative ranking of regimens. Placing too
great an emphasis on the interpretation
of rankings rather than the relative comparisons may be misleading to clinicians and patients. When a network
meta-analysis of second-generation antidepressants was later conducted and did
include placebo-controlled trials, relying only on the relative differences between treatments using the same depression scale, the authors reached a different
interpretation from the earlier MTC.18,22,23
Finally, original trials often address
multiple outcomes. Selection of MTC
outcomes should not be data-driven but
should be based on importance for patients and consider both benefit and
harm outcomes.
Were There Major Biases
in the Primary Studies?
Trial-level limitations (eg, lack of concealment or blinding, or loss to followup) may bias results of RCTs.24 The
Box 3. Potential Reasons for Incoherence Between the Results
of Direct and Indirect Comparisons
Chance
Genuine diversity
Differences in enrolled participants (eg, entry criteria, clinical setting, disease
spectrum, baseline risk, selection based on prior response)
Differences in the interventions (eg, dose, duration of administration, prior administration [second-line treatment])
Differences in background treatment and management (eg, evolving treatment
and management in more recent years)
Differences in definition or measurement of outcomes
Bias in head-to-head (direct) comparisons
Optimism bias with unconcealed analysis
Publication bias
Selective reporting of outcomes and analyses
Inflated effect size in early stopped trials and in early evidence
Limitations in allocation concealment, blinding, loss to follow-up, analysis as
randomized
Bias in indirect comparisons
Each of the biasing issues above can affect the results of the direct comparisons
on which the indirect comparisons are based
Cochrane Collaboration provides detailed advice on dealing with triallevel biases in meta-analysis.25,26
Publication bias and selective outcome reporting bias27,28 are major threats to the results of pairwise meta-analyses and may affect the availability of information among MTCs. The
effect of these biases depends also on
whether specific interventions and comparisons are affected more than others
(BOX 2).
WHAT ARE THE RESULTS?
What Was the Amount of Evidence
in the Treatment Network?
One can gauge the amount of evidence in the treatment network from
the number of trials, total sample size,
and number of events for each treatment and comparison. Furthermore,
the extent to which the treatments are
connected in the network is an important determinant in the quality of the
evidence. Understanding the geometry of the network (nodes and links)
will permit the reader to examine the
larger picture and see what is compared with what.30 The authors should
present the structure of the network (as
in Figure 1 examples). Multiple treatment comparison analyses are feasible
when there is a consistent connection
in the network.
When different interventions have
only been compared with a single common comparator (eg, placebo), this is
a star network (Figure 1). A star network only allows for indirect comparison between active treatments, which
reduces confidence in effects, particularly if there are a limited number of
trials, patients, and events.31 When there
are data available using both direct and
indirect evidence of the same interventions, this is a closed loop. The presence of direct evidence increases confidence in the estimates of interest.
Often, a treatment network will include a mixture of exclusively indirect
links and closed loops. Most networks
have unbalanced shapes with many
trials of some comparisons but few or
none of others.30 In this situation, evidence may be of high quality for some
treatments and comparisons but of low
Box 4. Using the Guide
Returning to our clinical scenario, Figure 3
displays the network of considered treatments. The authors excluded 13 relevant
RCTs because the interventions could not
be connected to any of the treatments in
the network. All treatments are informed
by placebo comparisons, some by more
trials than others, and some by direct comparison of active treatments. There are different degrees of evidence supporting the
comparative effectiveness of each of the
treatments. For example, the fluoxetine evidence comes only from a subgroup analysis of a larger trial that compares it with venlafaxine and placebo. The evidence base for fluoxetine is thus a very weak mix of direct (1 trial) and indirect evidence (1 fluoxetine vs venlafaxine trial and 8 venlafaxine vs placebo trials). In contrast, evidence for pregabalin is based on a larger amount of evidence from both direct and indirect comparisons. Five trials have compared pregabalin with placebo, 2 trials have compared pregabalin with lorazepam, and 1 trial has compared pregabalin with venlafaxine.
Here, the availability of 3 head-to-head
trials increases our confidence in effect
estimates.
Of 27 included RCTs, 23 reported
response to treatment, 23 reported withdrawals due to adverse effects, and 14
reported on remission, raising concern
about missing outcomes and selective
quality for others. Unfortunately, MTC
studies often fail to point out these differences in the evidence quality.
Were the Results Similar From
Study to Study?
Marked, unexplained differences in
treatment effects across trials in direct
comparisons of alternative agents, or of
single agents with no-treatment controls, lowers confidence in effect estimates. There are a number of statistical
measures available to guide assessment of whether the variability (heterogeneity) of results is high; eg, I2 and
other metrics.32 Authors should report the degree of heterogeneity in all
comparisons—the greater the heterogeneity, the less certain the results.
reporting, in particular for remission. Studies predominantly included patients receiving treatment for between 4 and 12 weeks.
Because the authors did not report tests
for pairwise heterogeneity, we cannot tell
if the magnitude of effect was similar from
study to study in each of the direct comparisons or in each drug-placebo comparison. The authors did, however, conduct
several sensitivity analyses. The authors also checked the coherence between direct and indirect comparisons from closed loops and did not find significant incoherence.
They found that all drugs, except tiagabine, exhibited significantly better response rates compared with placebo. Most
treatments offered about a 50% improvement in response rates over placebo. Regarding our patient’s current drug (paroxetine), no drug appeared significantly
better. When the authors calculated the
probability of which drug was best, fluoxetine ranked the highest (63%), whereas
paroxetine ranked third. Paroxetine had
a higher rate of drug discontinuation than
placebo (odds ratio, 2.51; 95% credible interval, 1.50-4.19), and ranked fourth for
tolerability. By comparison, fluoxetine
ranked first for response and third for tolerability. The results were not entirely robust in sensitivity analysis because the
best-ranked treatment for effectiveness
changed from fluoxetine to duloxetine.
Possible explanations of differences
in treatment effects can be examined
using subgroup analysis and meta-regression. However, these analyses are
limited in the presence of small numbers of trials, and apparent subgroup effects often prove spurious, an issue to
which we will return in our discussion of applicability.33-35
Were the Results Consistent in
Direct and Indirect Comparisons?
Head-to-head direct comparisons of
treatments are generally more trustworthy than indirect comparisons.
However, head-to-head trials can also
yield misleading estimates; eg, when
conflicts of interest influence the choice
of comparators used or result in selec-
tive reporting. Therefore, indirect comparisons may on occasion provide more
trustworthy estimates.36
One can assess whether direct and indirect estimates yield similar effects
whenever there is a closed loop in the
network (as in Figure 2B). Statistical
methods exist for checking this type of
inconsistency, typically called a test for
incoherence.37,38
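One common approach (a sketch of one of several methods described in the cited literature) compares the direct and indirect estimates with an approximate z-test, assuming the two estimates are independent. Applied to the paracetamol-plus-codeine numbers reported below in this section, with standard errors back-calculated from the 95% CIs, it gives a P value close to the reported P=.02:

```python
import math

def incoherence_z_test(direct_est, direct_se, indirect_est, indirect_se):
    """Approximate z-test for incoherence between the direct and the
    indirect estimate of the same comparison (usable in a closed loop)."""
    diff = direct_est - indirect_est
    se_diff = math.sqrt(direct_se ** 2 + indirect_se ** 2)
    z = diff / se_diff
    # two-sided P value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, z, p

def se_from_ci(low, high):
    """Back-calculate a standard error from a 95% confidence interval."""
    return (high - low) / (2 * 1.96)

# Direct: 6.97 (95% CI, 3.56 to 10.37); indirect: -1.16 (95% CI, -6.95 to 4.64)
diff, z, p = incoherence_z_test(6.97, se_from_ci(3.56, 10.37),
                                -1.16, se_from_ci(-6.95, 4.64))
print(f"difference = {diff:.2f}, z = {z:.2f}, P = {p:.3f}")
```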
An evaluation examining 112 examples of direct vs indirect evidence
found that the results were statistically inconsistent 14% of the time.9 This
same evaluation found that comparisons with smaller number of trials and
measuring subjective outcomes had a
greater risk of incoherence. When incoherence is present, there are many explanations for the authors—and for
readers—to consider (BOX 3).
For example, a meta-analysis
examining the analgesic efficacy of
paracetamol (acetaminophen) plus
codeine in surgical pain displayed a
direct comparison indicating the intervention was more efficacious than
paracetamol alone (mean difference in
pain intensity change, 6.97; 95% CI,
3.56 to 10.37). The adjusted indirect
comparison did not show a significant
difference between paracetamol plus
codeine and paracetamol alone (−1.16,
95% CI; −6.95 to 4.64). 39 In this
example, the direct and indirect evidence was statistically significantly incoherent (P=.02). This indirect analysis
addresses a continuous subjective outcome (ie, pain intensity), and is thus
more likely to statistically demonstrate incoherence. The explanation for
incoherence may be that the direct trials
included patients who had lower pain
intensity at baseline, who had greater
representation in the direct trials, and
who may have been more responsive to
the addition of codeine.
Statistical testing is often not sufficient to document incoherence. Because MTCs are often imprecise, important differences may still exist in the
absence of a statistically significant difference. Authors should report on
whether they assessed incoherence and,
if they did, what they found.
What Were the Overall Treatment
Effects and Their Uncertainty,
and How Did Treatments Rank?
The treatment effects in an MTC are
typically displayed with common effect estimates along with 95% credible
intervals (CrIs). Credible intervals are
the Bayesian equivalent to the more
commonly understood, frequentist confidence intervals. When there are K interventions included in the treatment
network, there are K × (K − 1)/2 possible pairwise comparisons. For example, if there are 7 interventions, then there are 21 [7 × (7 − 1)/2] possible pairwise comparisons.
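The comparison count grows quadratically with the number of interventions, as a one-line check shows:

```python
def n_pairwise(k):
    """Number of possible pairwise comparisons among k interventions."""
    return k * (k - 1) // 2

print(n_pairwise(7))   # 21 comparisons among 7 interventions
print(n_pairwise(10))  # 45 comparisons among 10 interventions
```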
Besides presenting treatment effects, authors may also present the probability that each treatment is superior
to all other treatments, allowing ranking of treatments.40,41 Although this approach is appealing, it may also oversimplify an analysis because of fragility
in the rankings, because differences between the ranks may be too small to be
important, or because bias in the MTC
may importantly affect the rank order.
For example, an MTC examining direct-acting agents for hepatitis C found no statistical difference for sustained virological response between telaprevir and boceprevir (odds ratio, 1.42; 95% CrI, 0.89-2.25); based on these results, the probability of being the best favors telaprevir by far (93%) over boceprevir (7%).42 However, this 93% probability provides a misleadingly strong endorsement for telaprevir. One may wish to know the probability that telaprevir is better than boceprevir by a clinically important margin, eg, at least 50% better, and this probability is only about 40%. Therefore, probabilities of being the best should be interpreted with caution.
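The gap between "probably better at all" and "probably better by a clinically important margin" can be illustrated by simulation. The posterior below is a hypothetical normal distribution on the log odds ratio scale, tuned only to roughly mimic a credible interval like the one reported; it is not the actual MTC posterior:

```python
import math
import random

random.seed(1)

# Hypothetical posterior for the log odds ratio of treatment A vs B,
# chosen so the 95% interval roughly matches OR 1.42 (0.89 to 2.25).
mean, sd = math.log(1.42), 0.24
draws = [random.gauss(mean, sd) for _ in range(100_000)]

p_better = sum(d > 0 for d in draws) / len(draws)
p_much_better = sum(d > math.log(1.5) for d in draws) / len(draws)

print(f"P(A better than B at all):             {p_better:.2f}")
print(f"P(A better by a 50% margin, OR > 1.5): {p_much_better:.2f}")
```

Under these assumed numbers, the probability of being better at all is roughly 90% or more, while the probability of being better by the 50% margin is well under half, mirroring the caution urged in the text.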
Were the Results Robust
to Sensitivity Assumptions
and Potential Biases?
Given the complexity of some MTC
meta-analyses, authors may assess the
robustness of their study findings by applying sensitivity analyses that show
how the results change if some criteria or assumptions change. Sensitivity
Figure 3. Treatment Network for the Drugs Considered in the Example Multiple Treatment Comparison on Generalized Anxiety Disorder

[Network diagram: nodes are paroxetine, lorazepam, pregabalin, fluoxetine, sertraline, duloxetine, tiagabine, escitalopram, venlafaxine, and placebo; each line carries the number of RCTs in that direct comparison.]

The lines between treatment nodes indicate the comparisons made throughout randomized clinical trials (RCTs). The numbers on the lines indicate the number of RCTs informing a particular comparison. (The figure is based on Baldwin et al.1)
analyses may include restricting the
analyses to trials with a low risk of bias
only or by examining different but related outcomes. The Cochrane Handbook for Systematic Reviews of Interventions25 provides a discussion of
sensitivity analyses.
In an MTC on prevention of chronic
obstructive pulmonary disease (COPD)
exacerbations, the authors used the incidence rate as the primary outcome.
However, there is some debate on
whether incidence rates should be used
in COPD trials,43 so the authors conducted sensitivity analyses using the binary outcome of ever having an exacerbation. The results were sufficiently
similar to consider the analyses robust44 (BOX 4, FIGURE 3).
HOW CAN I APPLY THE RESULTS
TO PATIENT CARE?
Were All Patient-Important
Outcomes Considered?
Many MTCs report only 1 or a few selected outcomes of interest. For example, a recent MTC comparing the efficacy of antihypertensive treatments
only looked at heart failure and mortality,45 whereas an older MTC of antihypertensive treatments also considered coronary heart disease and stroke.46
Adverse events are infrequently assessed in meta-analyses and in MTCs,
reflecting poor reporting in primary
trials.47,48 Multiple treatment comparisons conducted in the context of health
technology assessment submissions and
evidence-based practice reports are
more likely to include multiple outcomes and assessments of harms than
the less lengthy, journal-based MTCs.
Were All Potential Treatment
Options Considered?
Multiple treatment comparisons may
place restrictions on what treatments
are examined. For example, for irritable bowel syndrome, an MTC may
focus on pharmacological agents,
neglecting RCTs of diet, peppermint
oil, and counseling. 49 Decisions to
focus on subclasses of drugs may also
be problematic. For example, in rheumatoid arthritis, biologic disease-modifying antirheumatic drugs (biologics) are used for patients failing
conventional drugs. Five of the 9
available biologics are antitumor
necrosis factor agents (anti-TNFs).
Box 5. Using the Guide
The assessed outcomes (response, remission, and overall tolerability) are
important, but some additional outcomes would be useful to know (eg,
specific harms such as sexual dysfunction or insomnia). A concern is that
although 9 treatments were evaluated, only one randomized controlled trial evaluated the drug that
ranked the best (fluoxetine, comparing it to placebo and venlafaxine) and
that evidence was from a subgroup
analysis in the original trial where
fluoxetine was not better than its active comparator (venlafaxine). Only 3
drugs had been evaluated in direct
comparisons against at least 2 other
drugs. In the subgroup analysis examining only licensed dose drugs
available in the United Kingdom, the
results and rankings were different
than the primary analysis, thus lowering our confidence. In summary,
confidence in estimates is limited by
uncertainty regarding the risk of bias
in individual studies, the possibility of
publication bias, the small number of
patients studied and the correspondingly imprecise estimates, the lack of
information regarding heterogeneity
in individual comparisons, and a paucity of direct comparisons.
Are Any Postulated Subgroup
Effects Credible?
There are very few situations in which investigators have convincingly established important differences in the relative effect of treatment according to patient characteristics.51 Criteria exist for
determining the credibility of subgroup
analyses.51 Multiple treatment comparisons allow a greater number of RCTs to
be evaluated and may offer more opportunities for subgroup analysis, but with
due skepticism and respect for credibility criteria.
In an MTC examining inhaled drugs for COPD, the authors examined whether a patient's disease stage affected response according to severity of airflow obstruction, measured by forced expiratory volume in 1 second (FEV1).52 If FEV1 was 40% or less of predicted, long-acting anticholinergics, inhaled corticosteroids, and combination treatment reduced exacerbations significantly compared with long-acting β-agonists alone, but not if FEV1 was greater than 40% of predicted. This difference was significant for inhaled corticosteroids (P=.02 for interaction) and combination treatment (P=.01 for interaction), but not for long-acting anticholinergics (P=.46 for interaction).
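For readers curious how such P values for interaction arise, a common approach (the Altman-Bland test of interaction) compares two independent subgroup estimates on the log scale. The sketch below is illustrative only; the effect estimates and standard errors are hypothetical, not the values from the COPD MTC.

```python
import math


def interaction_p(est1: float, se1: float, est2: float, se2: float) -> float:
    """Two-sided P value for the difference between two independent
    subgroup effect estimates (eg, log rate ratios), per the
    Altman-Bland test of interaction."""
    diff = est1 - est2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    z = diff / se_diff
    # Two-sided P value from the standard normal distribution,
    # using the error function to obtain the normal CDF.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))


# Hypothetical example: a rate ratio of 0.7 (SE of log RR 0.10) in one
# subgroup vs 1.0 (SE 0.12) in the other; yields P between .02 and .03.
p = interaction_p(math.log(0.7), 0.10, math.log(1.0), 0.12)
```

If the two subgroup estimates are identical, the test returns P = 1.0, reflecting no evidence of interaction.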
What Is the Overall Quality of the Evidence, and What Are the Limitations?
Box 6. Clinical Resolution

You conclude that low-quality evidence allows only weak inferences regarding the effectiveness of the treatment your patient is currently receiving relative to other options. You recognize, however, that other drugs appear to have better tolerability profiles and will discuss with your patient the possibility of switching to one of these agents.
One recent MTC considered only anti-TNFs and excluded the other biologics.50 To the extent that the other biologic agents are equivalent or superior to the anti-TNFs, their exclusion risks giving clinicians a mistaken impression of which biologic agents are best.
It is important to assess the overall quality of an MTC so that a reader can determine whether the evidence supports strong inferences. The following hallmarks characterize high-quality evidence in an MTC: individual studies are at low risk of bias and publication bias is unlikely; results are consistent in individual direct comparisons and in individual comparisons with no-treatment controls, and are consistent between direct and indirect comparisons; sample size is large and confidence intervals are correspondingly narrow; and the evidence includes direct comparisons. If all of these hallmarks are present and the differences in effect sizes are large, high confidence in estimates may be warranted. In most cases, however, only some of these hallmarks are present, and the evidence is likely to warrant only moderate or low confidence in key estimates.
As MTC methods become popular, clinicians will find more than one meta-analysis addressing the same question, and, as in the case of the second-generation antidepressants mentioned earlier, these analyses may sometimes arrive at different conclusions.18,54 Differences in eligibility criteria and in the outcomes assessed may explain the divergent conclusions. To illustrate, there are at least 15 published MTCs and health technology assessments on the comparative effectiveness of biologics for rheumatoid arthritis. Across these published reports, one can observe considerable discrepancies with respect to the numbers and sets of included trials, the assessment of heterogeneity and risk of bias, and the inclusion, exclusion, or modification of outcomes.53 Concerns with each of these issues, as well as discrepancies between MTCs, leave uncertainty regarding which of the biologics has the greatest treatment effect (BOX 5 and BOX 6).
Author Contributions: Dr Mills had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Financial Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Mills reported consulting for Merck & Co Inc, Pfizer Ltd, Novartis, and Takeda on MTC issues; receiving grant funding from the Canadian Institutes of Health Research (CIHR) Drug Safety & Effectiveness Network (DSEN) NETMAN project to develop methods and educational materials on MTCs; and receiving salary support from the CIHR through a Canada Research Chair. Dr Ioannidis reported receiving grant funding from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs. Dr Thorlund reported consulting for Merck & Co Inc, Pfizer Ltd, Novartis, Takeda, and GlaxoSmithKline on MTC issues; receiving grant funding from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs; and receiving salary support from the CIHR DSEN NETMAN project. Dr Schünemann is a coinvestigator on a grant from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs. Dr Guyatt reported consulting for Bristol-Myers Squibb and UpToDate Inc on MTC issues and is a coinvestigator on a grant from the CIHR DSEN NETMAN project to develop methods and educational materials on MTCs.
Funding/Support: The CIHR, through the DSEN NETMAN project, provided support for this article.
Role of the Sponsor: The funding agency had no role in the design or conduct of the study; in the collection, management, analysis, or interpretation of the data; or in the preparation, review, or approval of the manuscript. DSEN had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript.
©2012 American Medical Association. All rights reserved.
Disclaimer: Several examples used to demonstrate concepts in this article originated from consulting projects.
Online-Only Material: The eAppendix is available at
www.jama.com.
Additional Contributions: We thank David Baldwin,
MBBS, DM, FRCPsych (University of Southampton),
for clarifications on his study.
REFERENCES
1. Baldwin D, Woods R, Lawson R, Taylor D. Efficacy of drug treatments for generalised anxiety disorder: systematic review and meta-analysis. BMJ. 2011;342:d1199.
2. Lau J, Ioannidis JP, Schmid CH. Summing up evidence: one answer is not always enough. Lancet. 1998;351(9096):123-127.
3. Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. N Engl J Med. 1987;316(8):450-455.
4. Estellat C, Ravaud P. Lack of head-to-head trials and fair control arms: randomized controlled trials of biologic treatment for rheumatoid arthritis. Arch Intern Med. 2012;172(3):237-244.
5. Lathyris DN, Patsopoulos NA, Salanti G, Ioannidis JP. Industry sponsorship and selection of comparators in randomized clinical trials. Eur J Clin Invest. 2010;40(2):172-182.
6. Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat Methods Med Res. 2008;17(3):279-301.
7. Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105-3124.
8. Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50(6):683-691.
9. Song F, Xiong T, Parekh-Bhurke S, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ. 2011;343:d4909.
10. Mills EJ, Bansback N, Ghement I, et al. Multiple treatment comparison meta-analyses: a step forward into complexity. Clin Epidemiol. 2011;3:193-202.
11. Guyatt G, Straus S, Meade MO, et al. Therapy (randomized trials). In: Guyatt G, Rennie D, Meade MO, Cook DJ, eds. Users' Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. New York, NY: McGraw-Hill Co; 2008:67-86.
12. Nixon RM, Bansback N, Brennan A. Using mixed treatment comparisons and meta-regression to perform indirect comparisons to estimate the efficacy of biologic treatments in rheumatoid arthritis. Stat Med. 2007;26(6):1237-1254.
13. Mills EJ, Wu P, Chong G, et al. Efficacy and safety of statin treatment for cardiovascular disease: a network meta-analysis of 170,255 patients from 76 randomized trials. QJM. 2011;104(2):109-124.
14. Mills EJ, Wu P, Lockhart I, Thorlund K, Puhan M, Ebbert JO. Comparisons of high-dose and combination nicotine replacement therapy, varenicline and bupropion for smoking cessation: a systematic review and multiple treatment meta-analysis. Ann Med. 2012;44(6):588-597.
15. Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann Intern Med. 2009;151(4):W65-W94.
16. Sutton A, Ades AE, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics. 2008;26(9):753-767.
17. Kyrgiou M, Salanti G, Pavlidis N, Paraskevaidis E, Ioannidis JP. Survival benefits with diverse chemotherapy regimens for ovarian cancer: meta-analysis of multiple treatments. J Natl Cancer Inst. 2006;98(22):1655-1663.
18. Cipriani A, Furukawa TA, Salanti G, et al. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet. 2009;373(9665):746-758.
19. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358(3):252-260.
20. Ioannidis JP. Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials? Philos Ethics Humanit Med. 2008;3:14.
21. Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Stat Med. 1996;15(24):2733-2749.
22. Ioannidis JP. Ranking antidepressants. Lancet. 2009;373(9677):1759-1760.
23. Gartlehner G, Hansen RA, Morgan LC, et al. Comparative benefits and harms of second-generation antidepressants for treating major depressive disorder: an updated meta-analysis. Ann Intern Med. 2011;155(11):772-785.
24. Devereaux PJ, Choi PT, El-Dika S, et al. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol. 2004;57(12):1232-1236.
25. Sterne JAC, Egger M, Moher D. Addressing reporting biases. In: Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011.
26. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352(9128):609-613.
27. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291(20):2457-2465.
28. Young NS, Ioannidis JP, Al-Ubaydli O. Why current publication practices may distort science. PLoS Med. 2008;5(10):e201.
29. Wittchen HU, Kessler RC, Beesdo K, Krause P, Höfler M, Hoyer J. Generalized anxiety and depression in primary care: prevalence, recognition, and management. J Clin Psychiatry. 2002;63(suppl 8):24-34.
30. Salanti G, Kavvoura FK, Ioannidis JP. Exploring the geometry of treatment networks. Ann Intern Med. 2008;148(7):544-553.
31. Mills EJ, Ghement I, O'Regan C, Thorlund K. Estimating the power of indirect comparisons: a simulation study. PLoS One. 2011;6(1):e16237.
32. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557-560.
33. Smith GD, Egger M. Going beyond the grand mean: subgroup analysis in meta-analysis of randomised trials. In: Smith GD, Egger M, Altman DG, eds. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd ed. London, England: BMJ Publishing Group; 2001:143-156.
34. Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and interpreted? Stat Med. 2002;21(11):1559-1573.
35. Jansen J, Schmid C, Salanti G. When do indirect and mixed treatment comparisons result in invalid findings? a graphical explanation. Poster presented at: 19th Cochrane Colloquium; October 19-22, 2011; Madrid, Spain. P3B379.
36. Song F, Harvey I, Lilford R. Adjusted indirect comparison may be less biased than direct comparison for evaluating new pharmaceutical interventions. J Clin Epidemiol. 2008;61(5):455-463.
37. Lu G, Ades A. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc. 2006;101:447-459.
38. Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29(7-8):932-944.
39. Zhang WY, Li Wan Po A. Analgesic efficacy of paracetamol and its combination with codeine and caffeine in surgical pain: a meta-analysis. J Clin Pharm Ther. 1996;21(4):261-282.
40. Salanti G, Ades AE, Ioannidis JP. Graphical methods and numerical summaries for presenting results from multiple-treatment meta-analysis: an overview and tutorial. J Clin Epidemiol. 2011;64(2):163-171.
41. Golfinopoulos V, Salanti G, Pavlidis N, Ioannidis JP. Survival and disease-progression benefits with treatment regimens for advanced colorectal cancer: a meta-analysis. Lancet Oncol. 2007;8(10):898-911.
42. Diels J, Cure S, Gavart S. PIN7: the comparative efficacy of telaprevir versus boceprevir in treatment-naive and treatment-experienced patients with genotype 1 chronic hepatitis. Value Health. 2011;14(7):A266.
43. Aaron SD, Fergusson D, Marks GB, et al; Canadian Thoracic Society/Canadian Respiratory Clinical Research Consortium. Counting, analysing and reporting exacerbations of COPD in randomised controlled trials. Thorax. 2008;63(2):122-128.
44. Mills EJ, Druyts E, Ghement I, Puhan MA. Pharmacotherapies for chronic obstructive pulmonary disease: a multiple treatment comparison meta-analysis. Clin Epidemiol. 2011;3:107-129.
45. Sciarretta S, Palano F, Tocci G, Baldini R, Volpe M. Antihypertensive treatment and development of heart failure in hypertension: a Bayesian network meta-analysis of studies in patients with hypertension and high cardiovascular risk. Arch Intern Med. 2011;171(5):384-394.
46. Psaty BM, Lumley T, Furberg CD, et al. Health outcomes associated with various antihypertensive therapies used as first-line agents: a network meta-analysis. JAMA. 2003;289(19):2534-2544.
47. Hernandez AV, Walker E, Ioannidis JP, Kattan MW. Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone. Am Heart J. 2008;156(1):23-30.
48. Ioannidis JP, Evans SJ, Gøtzsche PC, et al; CONSORT Group. Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med. 2004;141(10):781-788.
49. Ford AC, Talley NJ, Spiegel BM, et al. Effect of fibre, antispasmodics, and peppermint oil in the treatment of irritable bowel syndrome: systematic review and meta-analysis. BMJ. 2008;337:a2313.
50. Schmitz S, Adams R, Walsh CD, Barry M, FitzGerald O. A mixed treatment comparison of the efficacy of anti-TNF agents in rheumatoid arthritis for methotrexate non-responders demonstrates differences between treatments: a Bayesian approach. Ann Rheum Dis. 2012;71(2):225-230.
51. Sun X, Briel M, Walter SD, Guyatt GH. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses. BMJ. 2010;340:c117.
52. Puhan MA, Bachmann LM, Kleijnen J, Ter Riet G, Kessels AG. Inhaled drugs to reduce exacerbations in patients with chronic obstructive pulmonary disease: a network meta-analysis. BMC Med. 2009;7:2.
53. Mills EJ, Thorlund K, Bansback N. Multiple treatment meta-analysis in rheumatoid arthritis can provide weak inferences due to methodological approaches. In: 2012 CADTH Symposium; April 15-17, 2012; Ottawa. OB1.
54. Gartlehner G, Morgan LC, Thieda P, et al. Drug Class Review: Second Generation Antidepressants: Final Report Update 4. Portland, OR: Oregon Health & Science University; 2008.