Randomised trials in surgery: problems and possible solutions
Peter McCulloch, senior lecturer in surgery (a), Irving Taylor, professor of
surgery (b), Mitsuru Sasako, professor of surgery (c), Bryony Lovett,
lecturer in surgery (d), Damian Griffin, clinical reader (e).
(a) Academic Unit of Surgery, University of Liverpool, Clinical
Sciences Centre, University Hospital Aintree, Liverpool L9 7AL, (b) Department
of Surgery, Royal Free and University College Medical School, Charles Bell
House, London W1W 7EJ, (c) Gastric Surgery Division, National Cancer
Centre Hospital, Tsukiji, 5-1-1 Chuo-Ku, Tokyo, Japan, (d) Basildon
Hospital, Nethermayne, Basildon SS16 5NL, (e) Nuffield Department of
Orthopaedic Surgery, Orthopaedic Centre, Oxford OX3 7LD
Correspondence to: P McCulloch, Academic Unit of Surgery, University of
Liverpool, Clinical Sciences Centre, University Hospital Aintree, Long Lane,
Liverpool L9 7AL petermcculloch@cs.com
The quality and quantity of randomised trials of surgical techniques are
acknowledged to be limited. According to Peter McCulloch and
colleagues, however, some aspects of surgery present special
difficulties for randomised trials. In this article they analyse what
these difficulties are and propose some solutions for improving the
standards of clinical research in surgery.
The improvement in the quality of clinical research in the past decade is to
be welcomed, but it carries its own dangers. Some have extrapolated
the advantages of the randomised controlled trial (RCT) into the
dogma that it is the only valid method for comparing treatments,1
ignoring the difficulties that have hampered the use of RCTs in some
disciplines. The RCT has theoretical advantages over other study
designs, but experimental studies comparing treatment effect
estimates in randomised and non-randomised studies have not
consistently confirmed this,2 3 w1-w3 and the superiority of RCTs should not therefore be accepted as axiomatic.
Small, poorly conducted RCTs are more likely to result when RCTs are
difficult to conduct, and these may then be misleading because their
design affords them unwarranted credibility. Surgery seems to be such
an area. Until recently, most studies of operations were
retrospective case series, with RCTs accounting for less than 10% of
the total.w4-w6 RCTs declined from 14% of research articles in the
British Journal of Surgery in 1985 to 5% in 1992.4 5 Treatments in general
surgery are half as likely to be based on RCT evidence as treatments
in internal medicine.6 7
Methodological quality was poor in 56% of RCTs comparing cancer
surgery techniques.8 Only 58% of these
studies described satisfactory randomisation, and few significant
outcome differences were found, probably because of type II
statistical errors.
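To illustrate why small trials are prone to type II errors, the sketch below uses the standard normal approximation for comparing two proportions to estimate how many patients each arm would need. The function name and the complication rates in the usage note are hypothetical and are not taken from any of the trials cited here.

```python
import math
from statistics import NormalDist


def patients_per_arm(p_control: float, p_new: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm to detect a difference
    between two event rates (two sided test, normal approximation)."""
    if p_control == p_new:
        raise ValueError("the two event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_new) ** 2
    return math.ceil(n)


# Hypothetical example: complication rate falling from 10% to 5%.
print(patients_per_arm(0.10, 0.05))
```

Under these assumed rates the approximation asks for several hundred patients in each arm, and the requirement rises steeply as the expected difference shrinks; a trial that recruits far fewer patients can easily miss a real effect.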
Why is surgery so deficient? Some of the obstacles militate against all
scientific studies, but in view of previous specific criticism,w7
we focus on randomised trials and try to evaluate the problems and
suggest potential solutions.
Summary points
Research in surgery is disadvantaged by the limited quality and
quantity of randomised trials of surgical techniques
Some aspects of surgery present special difficulties for randomised
trials
The existence and nature of these difficulties need to be recognised,
with strategies developed to overcome them
A proposed strategy involves the integration of modified randomised
trials with prospective audit and quality control studies
Obstacles to randomised trials in surgery
Historical, structural, and cultural
History
History did not favour the validation of surgery by RCTs. After the
invention of anaesthesia and antiseptic techniques, surgical
treatments were rapidly developed for many previously untreatable
conditions. Many current operations were therefore introduced well
before randomised trials became established in medicine, unlike
most modern drugs. Once a treatment is accepted as standard, testing
it against placebo becomes difficult. Rarely, treatment benefits are
so obvious that a trial would clearly be unethical,9
but often lack of equipoise (see below) simply prevents studies. This
problem applies equally to old drugs (for example, digoxin), which
are also difficult to study in RCTs using placebo. For fields such as
cardiac surgery, transplantation, orthopaedics, and neurosurgery,
however, which have developed rapidly since 1950, surgeons cannot
fall back on history to explain the lack of rigour in surgical research.
Commercial competition and personal prestige
Doctors can be tempted to ignore evidence that threatens their
personal interests. Objectivity about procedures central to a
surgeon's reputation is difficult, and RCTs may seem threatening.
Private sector competition may affect surgeons particularly strongly,
and it arguably influenced the introduction of laparoscopic
cholecystectomy. A 1994 consensus conference10
quoted many reports of increased bile duct injuries and only two RCTs.11 12
The benefits
that these showed were not overwhelming against this evidence of
possible harm, but further RCTs were declared infeasible because the
technique was already so widespread. Surgeons' eagerness to learn the
operation seemed related more to commercial concerns than to concern
for patients.
Surgeons' equipoise
Other doctors regard surgeons as making up in self confidence for
what they lack in patience, a stereotype containing a kernel of
truth. Career surgeons are selected for traits that include comfort
with making important clinical decisions quickly with incomplete
information. This quality, required for decisive action during
operations, may make it difficult for them to be consciously
uncertain which of two treatments is better. This state of equipoise,
however, is a prerequisite for performing RCTs.
Box 1: Problems of performing randomised trials in surgery
Structural, cultural, and psychological resistance exists to
the use of randomisation
The inherent variability of surgery requires precise
definition of interventions and close monitoring of quality
Surgical learning curves cause difficulty in timing and
performing randomised trials of new techniques
Comparisons of surgical and non-surgical treatments with
greatly different risks cause difficulties with patients'
equipoise
Rare conditions and urgent and life threatening situations
cause difficulties with recruitment, consent, and randomisation
Lack of funding, infrastructure, and experience of data collection
These are real and major problems for surgical trials.w8
The difficulty is partly self inflicted as funding bodies are
influenced by the poor quality of much previous surgical research.w9
Lack of education in clinical epidemiology
Subjectively, surgeons' knowledge of clinical epidemiology remains
poor despite relevant publications in surgical journals w10-w17:
we have no objective evidence that they receive less specific
education than other doctors.13 w15
Surgeons recruit patients for cancer chemotherapy trials14 w18
but less readily for trials of surgical technique. Whether lack of education can explain this is unclear.
Rare conditions and life threatening and urgent situations
Emergency surgery often occurs outside normal working hours and
involves urgent lifesaving treatment, making consent and
randomisation difficult. Uncommon conditions are difficult to
investigate when accrual of patients takes over two years.13
Special technical problems
The learning curve
Some authors suggest that RCTs of new operations should begin with
the first patient.15 w19 Operations,
however, are complex procedures, and quality in performance requires
frequent repetition over time. Learning curves of similar lengths are
reported for disparate operations.16 17 w20 During the learning curve,
errors and adverse outcomes are more likely. Randomising between a
familiar and an unfamiliar operation therefore introduces bias
against the latter, as observed for gastrectomy.18
This problem for surgical RCTs has few parallels in drug
trials.
Definition
Variations on an operation are common and may influence success
rates. When comparing operations, clear definitions are therefore
needed of the limits on acceptable technical variation. A standard
description may be necessary, proscribing all modifications. If
definitions are not precise, the treatments delivered may overlap,
whereas in drug trials, treatments are usually simple to define exactly.
Quality control monitoring
The technical quality of operations undoubtedly affects outcome. Poor
quality surgery represents failure to deliver the intended treatment,
causing a difference between efficacy and effectiveness. Trials then
measure deliverability, not efficacy.w21 Quality control failures may
narrow important differences in the surgery received (for
example, for gastric cancer19 20) and may influence outcomes.w22 w23 Defining and enforcing minimum
quality standards may be difficult for surgical trials.
Development versus research
RCTs consume substantial resources and are therefore not justified
for some questions about small modifications to treatments. Surgical
technique typically progresses via such modifications, which
individually are unlikely to produce detectable benefits, but which
collectively may do so. During the historical progression through
hand washing via the use of antiseptics to the aseptic surgical
environment, the change in morbidity from surgical infection was
huge, but the increment with each step was small enough to allow
persistent scepticism.21 Small randomised trials of components of this progression showed no benefit.22 w24
If a positive RCT were required before adopting each small
improvement, most would be rejected, and progress would be slowed.
RCTs are appropriate where a clear, clinically important choice
exists between contrasting alternatives. For smaller changes, an
industrial paradigm may be needed.
Patients' equipoise
Three types of RCT are commonly described as "surgical." Type
1 trials (standard RCTs comparing medical treatments in surgical
patients) account for 75% of "surgical trials."23 Type
2 trials (comparing surgical techniques) pose the problems
described above. Type 3 trials (comparing surgical and non-surgical
treatments) pose particular difficulties with the equipoise of
patients:w25
patients often reject RCTs because they do not wish their treatment
to be decided by chance.w26 Type 3 trials increase this discomfort
because the adverse effects of the options often differ enormously
and the surgical option is irreversible. Eighty two per cent of
problems preventing type 3 trials are related to patients' equipoise.13
Examples of choices include aspirin versus carotid endarterectomy to
prevent embolic stroke24 and goserelin
versus castration for prostate cancer.25 w27
Such trials may recruit slowly, or select an unusual subgroup of
patients, making them impractical or their results difficult to
generalise.w28
Blinding
Blinding is particularly difficult in surgical trials, although
creative solutions (such as the use of standardised wound
dressings) can succeed.w29 Only a third of surgical trials examined
by Solomon et al had adequate blinding of patients and/or surgeons.23
Proposed solutions
History: a comprehensive review of the evidence base is needed to indicate
areas warranting new trials of old techniques.
Commercial competition and prestige may be less obstructive in a
framework of comprehensive continuous performance evaluation (see below).
Surgeons' equipoise, if confirmed, may need to be accommodated by including parallel, non-randomised preference arms alongside RCTs.
Lack of funding, infrastructure, and experience of data collection
requires a change to a culture of cooperation rather than competition.
This would facilitate the creation of large groups to perform
specific trials, thereby attracting funding and developing the
infrastructure. This change would require support from bodies
responsible for funding clinical research.
Lack of education in clinical epidemiology needs to be investigated
and if necessary corrected through the bodies responsible for
postgraduate surgical education and training.
Rare conditions and life threatening and urgent situations will always
be challenging areas for RCTs, but have been successfully studied in
other disciplines.26 w30 Paediatric
oncologists have illustrated the enormous value of cooperation
through their success in trials on childhood leukaemia.27 w31
The learning curve needs to be recognised and evaluated using appropriate statistical techniques.28 Trial
methodology will need modification (for
example, to show completion of the curve before beginning
randomisation w32), as in two recent trials.29 30 In theory, patients could also be
randomised not to operations but to surgeons, who would perform their
operation of preference, although this option remains untested in practice.
Definition of intervention and quality control monitoring
Precisely defined photographic or video evidence and/or pathological
specimens could document the nature and quality of the treatment
delivered, as in a recent trial of total mesorectal excision in
rectal cancer.31 Norms for pre-trial success rates and complications could provide a basis for defining acceptable
quality, making reliable surgical audit data essential for participation in RCTs.
Development v research
Surgeons should adopt industrial quality assessment techniques to
evaluate changes in technique where RCTs are inappropriate.32 The Japanese term "kaizen" defines an evaluative system akin to
the classical audit loop.w33 Sequential approaches such as CUSUM33
and the "control curve"32 are also
applicable to surgical innovation.
Patients' equipoise in type 3 trials may be helped by decision analysis techniques w34 and carefully designed composite end
points w35 to reflect the contrasting possible outcomes of trial arms.
Blinding will always be difficult for surgical treatments,34 but blinded observers should be used routinely for evaluating
outcomes.w36
Proposed framework for clinical research in surgery
This analysis of the problems shows why current practices are not working. We
need a framework that reflects the difficulties of evaluation in
surgery.
Audit data collection
The baseline for the scientific study of surgery is routine
collection of comprehensive data about practice and outcomes. The
culture and organisation necessary for this should permit easy
participation in trials, whereas where these are absent, trialists
have to develop the trial infrastructure and run it simultaneously.
Surgeons need the resources to record a meaningful audit dataset,
entailing considerable investment in data acquisition and management resources.
Continuous performance evaluation
Systems for continuous quality control, using instruments such as
CUSUM, CRAM, or VLAD plots33 35 36 or control curves32
should be used for the analysis of technical innovations. Indications
of outcome changes from this surveillance should lead to an audit or
kaizen assessment, using decision analysis techniques to determine
whether an RCT is warranted.w37 Where it is not, continuing
prospective data collection and regular re-evaluation using bayesian
analysis w38 provide the best available data on outcome changes and
allow reconsideration of the need for an RCT.
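As a minimal sketch of the kind of continuous monitoring proposed here, the Python fragment below accumulates observed minus expected failures over consecutive operations, which is the simplest form of a CUSUM chart. The function names, the acceptable failure rate, and the alarm threshold are illustrative assumptions rather than values from the cited studies; risk adjusted charts such as CRAM or VLAD plots would replace the fixed rate with a predicted risk for each patient.

```python
from typing import Iterable, List, Optional


def cusum_path(outcomes: Iterable[int], acceptable_rate: float) -> List[float]:
    """Cumulative sum of (observed failure - acceptable failure rate).

    Each outcome is 1 for a failure (for example, a major complication)
    and 0 for a success. The path drifts upwards when failures occur
    more often than the acceptable rate and downwards when they are rarer.
    """
    total = 0.0
    path = []
    for failed in outcomes:
        total += failed - acceptable_rate
        path.append(total)
    return path


def first_alarm(path: List[float], threshold: float) -> Optional[int]:
    """Return the 1-based case number at which the path first reaches
    the threshold, or None if it never does."""
    for case_number, value in enumerate(path, start=1):
        if value >= threshold:
            return case_number
    return None


# Hypothetical series of eight consecutive operations (1 = complication).
series = [0, 0, 1, 0, 1, 1, 0, 1]
path = cusum_path(series, acceptable_rate=0.25)
print(path)
print(first_alarm(path, threshold=1.5))
```

In this invented series the path crosses the assumed threshold at the sixth case, which in the framework above would prompt an audit or kaizen assessment rather than an immediate conclusion about the technique.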
Conduct of RCTs
When RCTs are necessary, they should routinely be preceded by
preliminary phase 2S (phase 2 surgical) studies. These would develop
satisfactory definition criteria for the procedure, test measures of
surgical quality, define suitable end points, estimate the required
sample size, and analyse the learning curve of participants. Such
studies would reduce the problems of timing surgical RCTs, and
randomisation could be introduced early using "tracker" designs if
desired.w39 During randomised data entry, continuous quality control
should be linked to preplanned interim analyses by the trial review
committee and appropriate stopping rules. Objective validation of
quality should evaluate images, pathological specimens, and outcome
data against criteria drawn up in the phase 2S study. Parallel
preference arms may be used to improve overall power and evaluate
generalisability. For type 3 trials, end point design and decision
analysis tools to help patients understand their choices may be important.
Other sources of evidence
Historically, the surgical literature is poor in RCTs. Meta-analysis
of non-randomised evidence should therefore be used wherever
appropriate. Where RCTs are difficult for sound reasons, prospective
non-randomised designs that minimise known biases should be
considered sympathetically by journals and funding bodies.
Conclusion
The substantial obstacles to RCTs of surgical techniques should be recognised.
Alternative methods of studying operations should be based on
comprehensive prospective audit data. Where RCTs are appropriate they
require attention to the issues of the learning curve, intervention
definition, and quality control; a preliminary non-randomised phase
is also recommended.
Box 2: Suggestions for progress in surgical research
Detailed prospective "audit" data collection is essential for
surgical research
Continuous quality control techniques should be used to help
determine whether randomised trials are appropriate
Larger randomised trials are needed, requiring better
cooperation
Learning curves and variations in technique and in quality of
surgery must be measured and controlled
Trials should incorporate a non-randomised initial phase to
permit these evaluations, determine suitable end points, and allow
sample size calculations
The need for study types other than randomised trials should
be recognised
Acknowledgments
This work was partly inspired by interactions with members of the Cochrane
Non-randomised Studies Methodology Group and by the activities of its
surgical subgroup. We thank Laurent Audige and Barney Reeves in
particular for their helpful criticisms. The final article is the
responsibility of the authors and not of the surgical
subgroup.
Footnotes
Funding: None.
Competing interests: PMcC and DG are members of the Cochrane Non-randomised
Studies Methodology Group and its surgical subgroup. PMcC is a member
of the Centre for Evidence Based Medicine and is paid to facilitate
at its Oxford teaching courses once a year.
References cited in the text with the prefix "w" are available on bmj.com
Concato J, Shah N, Horwitz RI. Randomised controlled
trials, observational studies and the hierarchy of research designs. N
Engl J Med 2000; 342: 1887-1892.
Lovett B, Sawyer W, Houghton J, Taylor I. Systematic review
of the methodological quality of randomized controlled trials of the
surgical excision of cancer [abstract]. Eur J Surg Oncol 2000; 26:
840.
Neugebauer E, Troidl H, Kum CK, Eypasch E, Miserez M. The
EAES consensus development conferences on laparoscopic cholecystectomy,
appendectomy and hernia repair. Surg Endosc 1995; 9: 550-563.
Barkun JS, Barkun AN, Sampalis JS, Fried G, Taylor B,
Wexler MJ, et al. Randomised controlled trial of laparoscopic versus mini-cholecystectomy.
The McGill gallstone treatment group. Lancet 1992; 340: 1116-1119.
McMahon AJ, Russell IT, Baxter JN, Ross S, Anderson JR,
Morran CG, et al. Laparoscopic versus mini-laparotomy cholecystectomy: a
randomised controlled trial. Lancet 1994; 343: 135-138.
Comparison of fluorouracil with additional levamisole,
higher-dose folinic acid, or both, as adjuvant chemotherapy for colorectal
cancer: a randomised trial. QUASAR Collaborative Group. Lancet 2000;
355: 1588-1596.
Parikh D, Chagla L, Johnson M, Lowe D, McCulloch P. D2
gastrectomy: lessons from a prospective audit of the learning curve. Br J
Surg 1996; 83: 1595-1599.
Testori M, Bartolomei M, Grana C, Mezzetti M, Chinol M,
Mazzarol G, et al. Sentinel node localization in primary melanoma: learning
curve and results. Melanoma Res 1999; 9: 587-593.
Bonenkamp JJ, Songun I, Hermans J, Sasako M, Welvaart K,
Plukker JTM, et al. Randomised comparison of morbidity and mortality after
D1 and D2 dissection for gastric cancer in Dutch patients. Lancet
1995; 345: 745-748.
Bonenkamp JJ, Hermans J, Sasako M, van de Velde CJH.
Extended lymph node dissection for gastric cancer. N Engl J Med 1999;
340: 908-914.
Cuschieri A, Weeden S, Fielding J, Bancewicz J, Craven J,
Joypaul V, et al. Patient survival after D1 and D2 resections for gastric
cancer: long term results of the MRC randomised surgical trial. Br J
Cancer 1999; 79: 1522-1530.
Vogelzang NJ, Chodak GW, Soloway MS, Block NL, Schellhammer
PF, Smith Jr JA, et al. Goserelin versus orchiectomy in the treatment of
advanced prostate cancer: final results of a randomized trial. Zoladex
Prostate Study Group. Urology 1995; 46: 220-226.
Gausche M, Lewis RJ, Stratton SJ, Haynes BE, Gunter CS,
Goodrich SM, et al. Effect of out-of-hospital pediatric endotracheal
intubation on survival and neurological outcome: a controlled clinical
trial. JAMA 2000; 283: 783-790.
Nesbit ME, Sather H, Robison LL, Donaldson M, Littman P,
Ortega JA, et al. Sanctuary therapy: a randomized trial of 724 children with
previously untreated acute lymphoblastic leukemia: a report from Children's
Cancer Study Group. Cancer Res 1982; 42: 674-680.
Ramsay CR, Grant AM, Wallace SA, Garthwaite PH, Monk AF,
Russell IT. Statistical assessment of the learning curves of health
technologies. Health Technology Assess 2001; 5: 1-79.
Deguili M, Sasako M, Ponti A, Soldati T, Danese F, Calvo F.
Morbidity and mortality after D2 gastrectomy for gastric cancer: results of
the Italian Gastric Cancer Study Group prospective multicenter surgical
study. J Clin Oncol 1998; 16: 1-6.
Kapiteijn E, Kranenbarg EK, Steup WH, Taat CW, Rutten HJ,
Wiggers T, et al. Total mesorectal excision (TME) with or without
preoperative radiotherapy in the treatment of primary rectal cancer.
Prospective randomised trial with standard operative and histopathological
techniques. Dutch ColoRectal Cancer Group. Eur J Surg 1999; 165:
410-420.
Mohammed MA, Cheng KK, Rouse A, Marshall T. Bristol,
Shipman, and clinical governance: Shewhart's forgotten lessons. Lancet
2001; 357: 463-467.
Van Rij AM, McDonald JR, Pettigrew RA, Putterill MJ, Reddy
CK, Wright JJ. CUSUM as an aid to early assessment of the surgical trainee.
Br J Surg 1995; 82: 1500-1503.
Poloniecki J, Valencia O, Littlejohns P. Cumulative risk
adjusted mortality chart for detecting changes in death rate: observational
study of heart surgery. BMJ 1998; 316: 1697-1700.
Lovegrove J, Valencia O, Treasure T, Sherlaw-Johnson C,
Gallivan S. Monitoring the results of cardiac surgery by variable
life-adjusted display. Lancet 1997; 350: 1128-1130.