Prevention & Treatment, Volume 5, Article 23, posted July 15, 2002
Copyright 2002 by the American Psychological Association
The Emperor's New Drugs: An Analysis of Antidepressant
Medication Data Submitted to the U.S. Food and Drug Administration
Irving Kirsch
University of Connecticut
Thomas J. Moore
The George Washington University School of Public Health and
Health Services
Alan Scoboria and Sarah S. Nicholls
University of Connecticut
ABSTRACT
This article reports an analysis of the efficacy data submitted to the
U.S. Food and Drug Administration for approval of the 6 most widely
prescribed antidepressants approved between 1987 and 1999. Approximately
80% of the response to medication was duplicated in placebo control
groups, and the mean difference between drug and placebo was approximately
2 points on the 17-item (50-point) and 21-item (62-point) Hamilton
Depression Scale. Improvement at the highest doses of medication was not
different from improvement at the lowest doses. The proportion of the drug
response duplicated by placebo was significantly greater with observed
cases (OC) data than with last observation carried forward (LOCF) data. If
drug and placebo effects are additive, the pharmacological effects of
antidepressants are clinically negligible. If they are not additive,
alternative experimental designs are needed for the evaluation of
antidepressants.
Keywords: drug efficacy, placebo, meta-analysis, depression
Correspondence concerning this article should be addressed
to Irving Kirsch, Ph.D., Department of Psychology, University of
Connecticut, 406 Babbidge Road, U-20, Storrs, CT 06269-1020.
E-mail: irving.kirsch@uconn.edu
Although antidepressant medication is widely regarded as efficacious, a
recent meta-analysis of published clinical trials indicates that 75 percent
of the response to antidepressants is duplicated by placebo (Kirsch
& Sapirstein, 1998). These data have been challenged on a number of
grounds, including the restriction of the analyses to patients who had
completed the trials, the limited number of clinical trials assessed, the
methodological characteristics of those trials, and the use of meta-analytic
statistical procedures (Klein, 1998).
The present article reports analyses of a data set to which these
objections do not apply, namely, the data submitted to the U.S. Food and
Drug Administration (FDA) for approval of recent antidepressant medications.
We analyzed the efficacy data submitted to the FDA for the six most widely
prescribed antidepressants approved between 1987 and 1999 (RxList:
The Internet Drug Index, 1999): fluoxetine (Prozac), paroxetine (Paxil),
sertraline (Zoloft), venlafaxine (Effexor), nefazodone (Serzone), and
citalopram (Celexa). These represent all but one of the selective serotonin
reuptake inhibitors (SSRI) approved during the study period. The FDA data
set includes analyses of data from all patients who attended at least one
evaluation visit, even if they subsequently dropped out of the trial
prematurely. Results are reported from all well controlled efficacy trials
of the use of these medications for the treatment of depression. FDA medical
and statistical reviewers had access to the raw data and evaluated the
trials independently. The findings of the primary medical and statistical
reviewers were verified by at least one other reviewer, and the analysis was
also assessed by an independent advisory panel. More important, the FDA data
constitute the basis on which these medications were approved. Approval of
these medications implies that these particular data are strong enough and
reliable enough to warrant approval. To the extent that these data are
flawed, the medications should not have been approved.
Khan, Warner, and Brown (2000) recently reported
the results of a concurrent analysis of the FDA database. Similar to the
Kirsch and Sapirstein report, their analysis revealed that 76% of the response
to antidepressants was duplicated by placebo. In several respects, our
analyses of the FDA data differ from, and supplement, those reported by Khan
et al. First, although information on all efficacy trials for depression is
included in the FDA database, mean change scores were not reported to the
FDA for some trials on which a significant difference between drug and
placebo was not obtained. Thus, the summary data reported by Khan et al.
overestimate drug/placebo differences. In contrast, we provide an estimate
of drug/placebo differences that is based on those medications for which
data from all clinical trials were reported, thus eliminating the bias due to
the exclusion of trials least favorable to the medication.
Second, the means reported by Khan et al. (2000)
were not adjusted for sample size. Thus, trials with small numbers of
participants were given equal weight with the more reliable data from larger
trials. In our analysis, mean scores were weighted by sample size, and
summary statistics were calculated across medications for which full data
were available.
Third, two methods of accounting for attrition were used in the data
reported to the FDA: last observation carried forward (LOCF) and observed
cases (OC). In LOCF analyses, when a patient drops out of a trial, the
results of the last evaluation visit are carried forward as if the patient
had continued to the completion of the trial without further change. In OC
analyses, the results are reported only for those patients who are still
participating at the end of the time period being assessed. Because patients
who discontinue medication are regarded as treatment failures, LOCF analyses
are widely considered to provide a more conservative test of drug effects,
and the Khan et al. (2000) analysis was confined to
those data. We used the FDA database to test this hypothesis empirically by
comparing LOCF and OC data for all trials in which both were reported.
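The two attrition rules described above can be illustrated with a minimal sketch. The patient scores below are invented for illustration and are not taken from the FDA reviews; the function names `locf_change` and `oc_change` are likewise hypothetical.

```python
from statistics import mean

# Hypothetical HAM-D scores: baseline followed by weekly visits.
# None marks visits missed after a patient dropped out.
PATIENTS = [
    [24, 20, 16, 12],      # completed the trial
    [26, 25, None, None],  # dropped out after the first visit
    [22, 18, 15, 10],      # completed the trial
    [25, 24, 24, None],    # dropped out before the final visit
]

def locf_change(scores):
    """Improvement from baseline to the last visit actually observed."""
    observed = [s for s in scores if s is not None]
    return observed[0] - observed[-1]

def oc_change(scores):
    """Improvement for completers only; None if the final visit is missing."""
    if scores[-1] is None:
        return None
    return scores[0] - scores[-1]

# LOCF keeps every patient; OC keeps only those present at the final visit.
locf_mean = mean(locf_change(p) for p in PATIENTS)
oc_values = [oc_change(p) for p in PATIENTS]
oc_mean = mean(v for v in oc_values if v is not None)

print(f"LOCF mean improvement: {locf_mean}")
print(f"OC mean improvement:   {oc_mean}")
```

In this invented data set the dropouts had improved little when they left, so the LOCF mean is lower than the OC mean. Whether the same pattern holds for drug/placebo differences in real trials is the empirical question addressed in the Results.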
Finally, in many of the trials reported to the FDA, various fixed doses
of the active medication were evaluated in separately randomized arms.
Finding a dose-response relationship is one method of establishing the
presence of true drug effects. Also, a dose-response relationship suggests
that the drug effect may be underestimated in trials involving low dosages.
Therefore, our analyses include a comparison of treatment effects at the
lowest doses employed in fixed-dose trials with those at the highest doses.
Method
Using the Freedom of Information Act, we obtained the medical and
statistical reviews of every placebo controlled clinical trial for
depression reported to the FDA for initial approval of the six most widely
used antidepressant drugs approved within the study period. We received
information about 47 randomized placebo controlled short-term efficacy
trials conducted for the six drugs in support of an approved indication of
treatment of depression. The breakdown by efficacy trial was as follows:
fluoxetine (5), paroxetine (16), sertraline (7), venlafaxine (6), nefazodone
(8), and citalopram (5). Data on relapse prevention trials were not
analyzed.
In order to generalize the findings of the clinical trial to a larger
patient population, FDA reviewers sought a completion rate of 70% or better
for these typically 6-week trials. Only 4 of 45 trials, however, reached
this objective. Completion rates were not reported for two trials. Attrition
rates were comparable between drug and placebo conditions. Of those trials
for which these rates were reported, 60% of the placebo patients and 63% of
the study drug patients completed a 4-, 5-, 6-, or 8-week trial.
Thirty-three of the 47 trials lasted 6 weeks, 6 trials lasted 4 weeks, 2 lasted
5 weeks, and 6 lasted 8 weeks. Patients were evaluated on a weekly basis.
For the present meta-analysis, the data were taken from the last visit prior
to trial termination.
Although the FDA approved the drugs for "the treatment of depression" not
otherwise specified, all but one of the clinical trials were conducted on
patients described as moderately to severely depressed (their mean baseline
Hamilton Depression Scale [HAM-D] scores ranged from 21.0 to 29.7). One of
the trials was conducted on patients with mild depression (mean baseline
HAM-D score = 17.21). Thirty-nine of the 47 clinical trials focused on
outpatients, 3 included both inpatients and outpatients, 3 were conducted
with elderly patients (including one of the trials with both inpatients and
outpatients), and 2 were conducted among patients hospitalized for severe
depression. No trial was reported for the treatment of children or
adolescents.
After 2 weeks, replacement of patients was allowed for those who
investigators determined were not improving in three fluoxetine trials and
in the three sertraline trials for which data were reported. The trials also
included a 1- to 2-week placebo washout period, during which patients were
given placebo. Those whose scores improved 20 percent or more were excluded
from the study. The use of other psychoactive medication was reported in 25
trials. In most trials, a chloral hydrate sedative was permitted in doses
ranging from 500 mg to 2000 mg per day. Other psychoactive medication was
usually prohibited but still was reported as having been taken in several
trials.
A shortcoming in the FDA data is the absence of standard deviations in
many of the reports. This precludes direct calculation of effect
sizes. Calculating effect sizes by dividing mean differences by standard
deviations allows researchers to combine the results of trials on which
different outcome measurement scales had been used. However, when the
same scale is used across studies, it is possible to combine the results of
the studies without first dividing them by the standard deviation of the
scales (Hunter & Schmidt, 1990). The HAM-D was the
primary endpoint for all of the reported trials in this analysis, thereby
allowing direct comparisons of outcome data without conversion into
conventional effect size (D) scores. The HAM-D is a widely used
measure of depression, with interjudge reliability coefficients ranging from
r = .84 to r = .90 (Hamilton, 1960).
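As a brief illustration of the distinction drawn above, in notation introduced here rather than taken from the FDA reviews: the conventional standardized effect size divides the mean difference by a standard deviation (typically pooled across groups), whereas the raw difference needs no standard deviation when every trial reports the same scale.

```latex
d = \frac{\bar{X}_{\text{drug}} - \bar{X}_{\text{placebo}}}{SD}
\qquad \text{versus} \qquad
\Delta = \bar{X}_{\text{drug}} - \bar{X}_{\text{placebo}}
\quad \text{(in HAM-D points)}
```

Here the means are mean improvement scores in each group. Only the raw difference can be computed from reports that omit standard deviations.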
For each clinical trial, we recorded the mean improvement in HAM-D scores
in the drug and placebo groups. Next, improvement in the placebo group was
divided by improvement in the drug group to provide an estimate of the
degree of improvement in the drug-treated patients that was duplicated in
the placebo group. Then, the mean of these values across trials, weighted
for sample size, was calculated for each drug.
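The procedure can be sketched with the five fluoxetine rows of Table 1. This is one plausible reading of the weighting, assuming drug means are weighted by drug-arm sample sizes and placebo means by placebo-arm sample sizes; the exact computation used for the reviews is not spelled out, and the helper names `weighted_mean` and `summarize` are hypothetical.

```python
# Fluoxetine rows from Table 1:
# (drug change, drug N, placebo change, placebo N)
FLUOXETINE = [
    (-12.50, 22, -5.50, 24),
    (-7.20, 18, -8.80, 24),
    (-11.00, 181, -8.40, 163),
    (-5.89, 299, -5.82, 56),
    (-8.82, 297, -5.69, 48),
]

def weighted_mean(values, weights):
    """Mean of values, each weighted by its sample size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def summarize(trials):
    """Return weighted drug improvement, placebo improvement, and their ratio."""
    # Change scores are negative (lower HAM-D), so negate to get improvement.
    drug = weighted_mean([-t[0] for t in trials], [t[1] for t in trials])
    placebo = weighted_mean([-t[2] for t in trials], [t[3] for t in trials])
    return drug, placebo, placebo / drug

# Per-trial proportion of the drug response duplicated by placebo.
per_trial = [t[2] / t[0] for t in FLUOXETINE]

drug, placebo, proportion = summarize(FLUOXETINE)
print(f"drug = {drug:.2f}, placebo = {placebo:.2f}, proportion = {proportion:.2f}")
```

Under this assumption the weighted means come out within rounding of the fluoxetine entries in Table 2. Small discrepancies in the last digit are to be expected, because the tabled change scores are themselves rounded.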
Results
Sample size and mean change on the HAM-D in drug and placebo conditions
are presented in Table 1 for each of the 38
clinical trials on which LOCF data were reported.
Table 1
Mean LOCF HAM-D Change in Drug and Placebo Conditions on Each Clinical Trial

| Drug and study | Drug change | Drug N | Placebo change | Placebo N |
| --- | --- | --- | --- | --- |
| Fluoxetine | | | | |
| 19 | -12.50 | 22 | -5.50 | 24 |
| 25 | -7.20 | 18 | -8.80 | 24 |
| 27 | -11.00 | 181 | -8.40 | 163 |
| 62 (mild) | -5.89 | 299 | -5.82 | 56 |
| 62 (moderate) | -8.82 | 297 | -5.69 | 48 |
| Paroxetine | | | | |
| 01-001 | -13.50 | 24 | -10.50 | 24 |
| 02-001 | -12.30 | 51 | -6.81 | 53 |
| 02-002 | -10.90 | 36 | -5.77 | 34 |
| 02-003 | -9.73 | 33 | -7.15 | 33 |
| 02-004 | -12.70 | 36 | -7.61 | 38 |
| 03-001 | -10.80 | 40 | -4.70 | 38 |
| 03-002 | -8.00 | 40 | -6.22 | 40 |
| 03-003 | -9.90 | 41 | -10.00 | 42 |
| 03-004 | -10.40 | 37 | -6.65 | 37 |
| 03-005 | -10.00 | 40 | -4.07 | 42 |
| 03-006 | -9.08 | 39 | -2.97 | 37 |
| Par 09 | -9.14 | 403 | -8.23 | 51 |
| Sertraline | | | | |
| 103 | -9.92 | 261 | -7.60 | 86 |
| 104 | -10.60 | 142 | -8.20 | 141 |
| 315 | -8.90 | 76 | -7.80 | 73 |
| Venlafaxine | | | | |
| 203 | -11.20 | 231 | -6.70 | 92 |
| 301 | -13.90 | 64 | -9.45 | 78 |
| 302 | -11.90 | 65 | -8.88 | 75 |
| 303 | -10.10 | 69 | -9.89 | 79 |
| 313 | -11.00 | 227 | -9.49 | 75 |
| 206 | -14.20 | 46 | -4.80 | 47 |
| Nefazodone | | | | |
| 03A0A-003 | -9.57 | 101 | -8.00 | 52 |
| 03A0A-004A | -8.90 | 153 | -8.90 | 77 |
| 03A0A-004B | -11.40 | 156 | -9.50 | 75 |
| 030A2-0004 / 0005 | -10.00 | 74 | -9.84 | 70 |
| 030A2-0007 | -12.30 | 175 | -9.80 | 47 |
| CN104-002 | -10.80 | 57 | -8.20 | 57 |
| CN104-005 | -12.00 | 86 | -8.00 | 90 |
| CN104-006 | -10.00 | 80 | -8.90 | 78 |
| Citalopram | | | | |
| 85A | -8.78 | 82 | -6.63 | 87 |
| 91206 | -9.95 | 521 | -8.32 | 129 |
| 89303 | -11.76 | 134 | -10.24 | 66 |
| 86141 | -6.26 | 98 | -4.74 | 51 |
Mean improvement (weighted for sample size) for each of the six
medications is presented in Table 2.
Table 2
Mean Improvement (Weighted for Sample Size) in Drug and Placebo Conditions, and Proportion of the Drug Response That Was Duplicated in Placebo Groups for Each Antidepressant

| Drug | K | N | Drug improvement | Placebo improvement | Proportion |
| --- | --- | --- | --- | --- | --- |
| Fluoxetine | 5 | 1,132 | 8.30 | 7.34 | .89 |
| Paroxetine | 12 | 1,289 | 9.88 | 6.67 | .68 |
| Sertraline | 3 | 779 | 9.96 | 7.93 | .80 |
| Venlafaxine | 6 | 1,148 | 11.54 | 8.38 | .73 |
| Nefazodone | 8 | 1,428 | 10.71 | 8.87 | .83 |
| Citalopram | 4 | 1,168 | 9.69 | 7.71 | .80 |

Note. Data were not reported from four paroxetine trials, four sertraline trials, and one citalopram trial in which no significant differences were found. K = number of trials.
The 17-item version of the HAM-D was used in all trials of paroxetine,
sertraline, nefazodone, and citalopram. The 21-item version was used in
trials of fluoxetine and venlafaxine. One citalopram trial reported scores
on both the 17-item scale and the 21-item scale, and another reported scores
on the 17-item scale and a 24-item version of the scale. We used the 17-item
scores for citalopram studies because this version of the scale was used in
all of the clinical trials of that medication. Calculation of response to
drug and placebo for the two studies using different forms of the scale
reveals that the drug/placebo comparison is comparable, regardless of which
scale is used.
Mean improvement scores were not reported in 9 of the 47 trials.
Specifically, four paroxetine trials involving 165 participants, four
sertraline trials involving 486 participants, and one citalopram trial
involving 274 participants were reported as having failed to achieve a
statistically significant drug effect, but the mean HAM-D scores were not
reported. This represents 11% of the patients in paroxetine trials,
38% of the patients in sertraline trials, and 23% of the patients in
citalopram trials. In each case, the statistical or medical reviewers stated
that no drug effect was found.
Including data from paroxetine, sertraline, and citalopram trials in summary
statistics would produce an inflated estimate of drug effects. Therefore, to
obtain an unbiased estimate of drug and placebo effects across medications,
we calculated weighted means of all medications for which data on all
clinical trials were reported. This included the data for fluoxetine,
venlafaxine, and nefazodone. The weighted mean difference between the drug
and placebo groups across these three medications was 1.80 points on the
HAM-D, and 82% of the drug response was duplicated by the placebo response.
A t-test, weighted for sample size, indicated that the drug/placebo
difference was statistically significant, t(18) = 5.01, p <
.001.
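The pooled figures can be approximated from the published tables. The sketch below assumes that drug means are weighted by drug-arm sample sizes and placebo means by placebo-arm sample sizes. The arm sizes are obtained here by summing the per-trial Ns in Table 1 and do not appear in the reviews in this form, and the name `pooled` is hypothetical.

```python
# (drug mean, drug-arm N, placebo mean, placebo-arm N) for each medication
# with complete reporting. Means come from Table 2; arm sizes are sums of
# the per-trial sample sizes in Table 1 (derived for this illustration).
SUMMARY = {
    "fluoxetine": (8.30, 817, 7.34, 315),
    "venlafaxine": (11.54, 702, 8.38, 446),
    "nefazodone": (10.71, 882, 8.87, 546),
}

def pooled(summary):
    """Sample-size-weighted mean improvement in drug and placebo arms."""
    rows = list(summary.values())
    drug_n = sum(r[1] for r in rows)
    placebo_n = sum(r[3] for r in rows)
    drug = sum(r[0] * r[1] for r in rows) / drug_n
    placebo = sum(r[2] * r[3] for r in rows) / placebo_n
    return drug, placebo

drug, placebo = pooled(SUMMARY)
difference = drug - placebo   # drug/placebo difference in HAM-D points
proportion = placebo / drug   # share of drug response duplicated by placebo

print(f"difference = {difference:.2f}, proportion = {proportion:.2f}")
```

Under these assumptions the result lands close to the 1.80-point difference and 82% proportion reported above, which suggests, but does not establish, that the summary statistics were computed along these lines.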
On most of the clinical trials, medication dose was titrated individually
for each patient within a specified range. However, in 12 trials involving
1,942 patients, various fixed doses of a medication were evaluated in
separately randomized arms. It is possible that some of the doses used in
these trials were subclinical. If this is the case, inclusion of these data
could result in an underestimate of the drug effect. To test this
possibility, we compared LOCF data at the lowest and highest doses reported
in each study. Across these 12 trials, mean improvement (weighted for sample
size) was 9.57 points on the HAM-D at the lowest dose evaluated and 9.97 at
the highest dose. This difference between high and low doses of
antidepressant medication was not statistically significant.
Finally, we tested the hypothesis that LOCF analyses provide more
conservative tests of drug effects than do OC analyses. LOCF means were
reported for all 38 of the 46 trials in which means of any kind were
reported. OC means were reported for 27 of these 38 trials. In 22 trials,
the difference between drug and placebo group was not statistically
significant with either LOCF or OC measures. In 12 trials, the difference
was statistically significant with both measures. In 8 trials, the
difference was significant with LOCF but not with OC, and 4 trials were
reported to have shown no difference between drug and placebo without
specifying an attrition rule. For the 27 trials for which both sets of means
were reported, correlated t-tests indicated that mean improvement
scores were significantly greater with OC data than with LOCF data for both
drug, t(26) = 12.46, p < .001, and placebo, t(26) =
10.56, p < .001, as was the proportion of the drug response
duplicated by placebo, t(26) = 3.36, p < .01. In the LOCF
data, 79% of the drug response was duplicated in the placebo groups; in the
OC data, 85% of the drug response was duplicated by placebo. Thus, LOCF
analyses indicate a greater drug/placebo difference than do OC analyses.
Discussion
In clinical trials, the effect of the active drug is assumed to be the
difference between the drug response and the placebo response. Thus, the FDA
clinical trials data indicate that 18% of the drug response is due to the
pharmacological effects of the medication. This is based on LOCF data, in
which the drug effect was significantly stronger than in OC data, and it is
obtained after those who show the greatest response to placebo are excluded
from the study. Overall, the drug/placebo difference was less than 2 points
on the HAM-D, a highly reliable physician-rated scale that has been reported
to be more sensitive than patient-rated scales to drug/placebo differences (Murray,
1989). The range was from a 3-point drug/placebo difference for
venlafaxine to a 1-point difference for fluoxetine, both of which were on
the 21-item (64-point) version of the scale. As intimated in FDA memoranda (Laughren,
1998; Leber, 1998), the clinical significance of
these differences is questionable.
The proportion of the drug response duplicated in placebo groups is
greater in the FDA clinical trials data than in previous meta-analyses (Khan
et al., 2000; Kirsch & Sapirstein, 1998). The
differences may be due to two factors: publication bias and missing data.
Publication bias is avoided in the FDA data by the requirement that the
results of all trials for an indication be reported. Calculating summary
statistics only for medications for which means on all trials were reported
circumvented the missing data problem.
Of the two widely used methods of coping with attrition in clinical
trials, LOCF analyses are considered the more stringent. The FDA data set
calls this assumption into question. The proportion of the drug effect
duplicated by placebo was significantly larger in the OC data set than in
the corresponding LOCF data set. In addition, the degrees of freedom are
necessarily larger in LOCF analyses, thereby making it more likely that a
mean difference will be statistically significant. In the 47 clinical trials
obtained from the FDA, there were no reported instances in which OC data
yielded significant differences that were not detected in LOCF analyses.
However, in 8 trials, LOCF data yielded significant differences that were
not detected when OC data were analyzed. These data indicate that, compared
with LOCF analyses, OC analyses provide more conservative tests of
drug/placebo differences.
Although mean differences were small, most of them favored the active
drug, and overall, the difference was statistically significant. There were
only 4 trials in which mean improvement scores in the placebo condition were
equal to or higher than those in the drug condition, and in no case was
placebo significantly more effective than active drug. This may indicate a
small but significant drug effect. However, it is also possible that this
difference between drug and placebo is an enhanced placebo effect due to the
breaking of blind. Antidepressant clinical trial data indicate that the
ability of patients and doctors to deduce whether they have been assigned to
the drug or placebo condition exceeds chance levels (Rabkin
et al., 1986), possibly because of the greater occurrence of side
effects in the drug condition. Knowing that one has been randomized to the
active drug condition is likely to enhance the placebo effect, whereas
knowledge of assignment to the placebo group ought to decrease its effect (Fisher
& Greenberg, 1993). Enhanced drug effects due to breaking blind in
clinical trials may be small, but evaluation of the FDA database indicates
that the drug/placebo difference is also very small, amounting to about 2
points on the HAM-D.
Although our data suggest that the effects of antidepressant drugs are
very small and of questionable clinical significance, this conclusion rests
on the assumption that drug effects and placebo effects are additive.
However, it is also possible that antidepressant drug and placebo effects
are not additive and that the true drug effect is greater than the
drug/placebo difference. Clinical trials are based on the assumption of
additivity (Kirsch, 2000). That is, the drug is deemed
effective only if the response to it is significantly greater than the
response to placebo, and the magnitude of the drug effect is assumed to be
the difference between the response to drug and the response to placebo. However, drug
and placebo responses are not always additive. Alcohol and stimulant drugs,
for example, produce at least some drug and placebo effects that are not
additive. Placebo alcohol produces effects that are not observed when
alcohol is administered surreptitiously, and alcohol produces effects that
are not duplicated by placebo alcohol (Hull & Bond, 1986).
The placebo and pharmacological effects of caffeine are additive for
feelings of alertness but not for feelings of tension (Kirsch
& Rosadino, 1993), and similarly mixed results have been reported for
other stimulants (Lyerly, Ross, Krugman, & Clyde, 1964;
Ross, Krugman, Lyerly, & Clyde, 1962).
If antidepressant drug effects and antidepressant placebo effects are not
additive, the ameliorating effects of antidepressants might be obtained even
if patients did not know the drug was being administered. If that is the
case, then antidepressant drugs have substantial pharmacologic effects that
are duplicated or masked by placebo. In this case, conventional clinical
trials are inappropriate for testing the effects of these drugs, as they may
result in the rejection of effective medications. Conversely, if drug and
placebo effects of antidepressant medication are additive, then the data
clearly show that those effects are small, at best, and of questionable
clinical efficacy. Finally, it is conceivable that the effects are partially
additive, with the true drug effect being somewhere in between these
extremes. The problem is that we do not know which of these models is most
accurate because the assumption of additivity has never been tested with
antidepressant medication.
One method of testing additivity is the use of the balanced placebo
design (Marlatt & Rohsenow, 1980). In this design,
informed consent is first obtained for a study in which active drug or
placebo will be administered. Half of the participants are told they are
receiving active drug and half are led to believe they are not. In fact,
half of the participants are given an active drug and half are not. Thus,
half of the participants are misinformed about what they will receive and
are debriefed after participation in the trial. As shown in
Figure 1, there are four cells in the balanced
placebo design.

Figure 1. The balanced placebo design.
Depending on assignment, participants are (a) told they are getting the
drug and do in fact receive it, (b) told they are getting drug but in fact
receive placebo, (c) told they are getting placebo but in fact receive drug,
and (d) told they are getting placebo and in fact receive placebo. This
permits independent and combined assessment of drug and placebo effects.
This design has been used with healthy volunteers and has provided
interesting data on the additive and nonadditive effects of alcohol (Hull
& Bond, 1986) and caffeine (Kirsch & Rosadino, 1993).
It has not been used in clinical trials, in which its use might pose a more
difficult ethical problem because of the temporary deception that is
involved. However, there is also an ethical risk involved in not assessing
the additivity assumption underlying clinical trials. If that assumption is
unwarranted, effective medications may be rejected because their effects are
masked by placebo effects. Conversely, if the assumption is warranted, then
current antidepressants may be little more than active placebos. Thus, some
means of assessing the additivity hypothesis is a crucial task.
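The logic of the four-cell design can be sketched as a 2 x 2 contrast. The cell means below are invented solely to show the arithmetic; they are not estimates of any real antidepressant effect, and the function name `effects` is hypothetical.

```python
# Hypothetical mean improvement scores for the four cells of a balanced
# placebo design (instruction given x substance received).
ADDITIVE = {
    ("told drug", "given drug"): 10.0,
    ("told drug", "given placebo"): 8.0,
    ("told placebo", "given drug"): 4.0,
    ("told placebo", "given placebo"): 2.0,
}

# A second invented pattern in which the drug adds little once patients
# already believe they are receiving it.
NONADDITIVE = {
    ("told drug", "given drug"): 10.0,
    ("told drug", "given placebo"): 8.0,
    ("told placebo", "given drug"): 8.0,
    ("told placebo", "given placebo"): 2.0,
}

def effects(cells):
    """Return (drug main effect, expectancy main effect, interaction)."""
    a = cells[("told drug", "given drug")]
    b = cells[("told drug", "given placebo")]
    c = cells[("told placebo", "given drug")]
    d = cells[("told placebo", "given placebo")]
    drug_effect = ((a - b) + (c - d)) / 2        # averaged over instructions
    expectancy_effect = ((a - c) + (b - d)) / 2  # averaged over substances
    interaction = (a - b) - (c - d)              # zero when effects are additive
    return drug_effect, expectancy_effect, interaction

print(effects(ADDITIVE))
print(effects(NONADDITIVE))
```

A conventional trial observes only the two "told it may be either" analogues of these cells and so cannot estimate the interaction term. In the second pattern the drug/placebo difference among those told they are getting the drug (2 points) understates the drug effect seen when it is given covertly (6 points).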
Without the assumption of additivity, the FDA data do not allow one to
determine the effectiveness of antidepressant medication. That is, it is not
possible to determine the degree to which the antidepressant response is a
drug effect and the degree to which it is a placebo effect. If one does make
the assumption that the drug effect is the difference between the drug
response and the placebo response, then it is very small and of questionable
clinical value. By far, the greatest part of the change is also observed
among patients treated with inert placebo. The active agent enhances this
effect, but to a degree that may be clinically meaningless.
These data raise questions about the criteria used by the FDA in
approving antidepressant medications. The FDA required positive findings
from at least two controlled clinical trials, but the total number of trials
can vary. Positive findings consist of statistically significant
drug/placebo differences. The clinical significance of these differences is
not considered.
The problems associated with these criteria are illustrated in a
memorandum from the director of the FDA Division of Neuropharmacological
Drug Products (DNDP; Leber, 1998) on the approvable
action on Celexa (citalopram) for the management of depression. Two
controlled efficacy trials showed significant drug/placebo differences.
Three others "failed to provide results confirming the positive findings" (Leber,
1998, p. 6).¹ This led to the conclusion
that "there is clear evidence from more than one adequate and well
controlled clinical investigation that citalopram exerts an antidepressant
effect. The size of that effect, and more importantly, the clinical value of
that effect, is not something that can be validly measured, at least not in
the kind of experiments conducted. Accordingly, substantial evidence in the
present case, as it has in all other evaluations of antidepressant
effectiveness, speaks to proof in principle [emphasis added] of a
product's effectiveness" (Leber, 1998, p. 7).
Similarly, the DNDP team leader for psychiatric drug products commented,
"While it is difficult to judge the clinical significance of this
difference, similar findings for other SSRIs and other recently approved
antidepressants have been considered sufficient to support the approvals of
those other products" (Laughren, 1998, p. 6).
Laughren noted that "while the reasons for negative outcomes for [these
studies] are unknown," about 25% of the patients in one of the failed
studies did not meet criteria for major depression, and in the other two,
"there was a substantial placebo response, making it difficult to
distinguish drug from placebo" (Laughren, 1998, p. 4). On
the basis of these concerns, he concluded, "I feel there were sufficient
reasons to speculate about the negative outcomes and, therefore, not count
these studies against citalopram" (Laughren, 1998, p. 6).
To summarize, the data submitted to the FDA reveal a small but
significant difference between antidepressant drug and inert placebo. This
difference may be a true pharmacological effect, or it may be an artifact
associated with the breaking of blind by clinical trial patients and the
psychiatrists who are rating the severity of their conditions. Further
research is needed to determine which of these is the case.
In any case, the difference is relatively small (about 2 points on the
HAM-D), and its clinical significance is dubious. Research is therefore
needed to assess the additivity of antidepressant drug and placebo effects.
If there is a powerful antidepressant effect, then it is being masked by a
nonadditive placebo effect, in which case current clinical trial methodology
may be inappropriate for evaluating these medications, and alternative
methodologies need to be developed. Conversely, if the drug effect is as small
as it appears when drug/placebo differences are estimated, then there may be
little justification for the clinical use of these medications. The problem,
then, would be to find an alternative, as the clinical response to both drug
and placebo is substantial. Placebo treatment has the advantage of eliciting
fewer side effects. However, the deception that is inherent in clinical
administration of placebos inhibits their use. Thus, the development of
nondeceptive methods of eliciting the placebo effect would be of great
importance.
References
Fisher, S., & Greenberg, R. P. (1993). How sound is the double-blind
design for evaluating psychotropic drugs? Journal of Nervous and Mental
Disease, 181, 345-350.
Hamilton, M. A. (1960). A rating scale for depression. Journal of
Neurology, Neurosurgery, and Psychiatry, 23, 56-61.
Hull, J. G., & Bond, C. F. (1986). Social and behavioral consequences of
alcohol consumption and expectancy: A meta-analysis. Psychological
Bulletin, 99, 347-360.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis:
Correcting error and bias in research findings. Newbury Park, CA: Sage.
Khan, A., Warner, H. A., & Brown, W. A. (2000). Symptom reduction and
suicide risk in patients treated with placebo in antidepressant clinical
trials: An analysis of the Food and Drug Administration database.
Archives of General Psychiatry, 57, 311-317.
Kirsch, I. (2000). Are drug and placebo effects in depression additive?
Biological Psychiatry, 47, 733-73.
Kirsch, I., & Rosadino, M. J. (1993). Do double-blind studies with
informed consent yield externally valid results? An empirical test.
Psychopharmacology, 110, 437-442.
Kirsch, I., & Sapirstein, G. (1998). Listening to Prozac but hearing
placebo: A meta-analysis of antidepressant medication. Prevention &
Treatment, 1, Article 0002a. Available on the World Wide Web:
http://www.journals.apa.org/prevention/volume1/pre0010002a.html.
Klein, D. F. (1998). Listening to meta-analysis but hearing bias.
Prevention & Treatment, 1, Article 0006c. Available on the World Wide
Web:
http://www.journals.apa.org/prevention/volume1/pre0010006c.html.
Laughren, T. P. (1998, March 26). Recommendation for approvable action
for Celexa (citalopram) for the treatment of depression. Memorandum:
Department of Health and Human Services, Public Health Service, Food and
Drug Administration, Center for Drug Evaluation and Research, Washington,
DC.
Leber, P. (1998, May 4). Approvable action on Forest Laboratories,
Inc. NDA 20-822 Celexa (citalopram HBr) for the management of depression.
Memorandum: Department of Health and Human Services, Public Health Service,
Food and Drug Administration, Center for Drug Evaluation and Research,
Washington, DC.
Lyerly, S. B., Ross, S., Krugman, A. D., & Clyde, D. J. (1964). Drugs and
placebos: The effects of instructions upon performance and mood under
amphetamine sulphate and chloral hydrate. Journal of Abnormal and Social
Psychology, 68, 321-327.
Marlatt, G. A., & Rohsenow, D. J. (1980). Cognitive processes in alcohol
use: Expectancy and the balanced placebo design. In N. K. Mello (Ed.),
Advances in substance abuse: Behavioral and biological research (pp.
159-199). Greenwich, CT: JAI Press.
Murray, E. J. (1989). Measurement issues in the evaluation of
pharmacological therapy. In S. Fisher & R. P. Greenberg (Eds.), The limits
of biological treatments for psychological distress: Comparisons with
psychotherapy and placebo (pp. 39-67). Hillsdale, NJ: Erlbaum.
Rabkin, J. G., Markowitz, J. S., Stewart, J. W., McGrath, P. J., Harrison,
W., Quitkin, F. J., & Klein, D. F. (1986). How blind is blind? Assessment of
patient and doctor medication guesses in a placebo-controlled trial of
imipramine and phenelzine. Psychiatry Research, 19, 75-86.
Ross, S., Krugman, A. D., Lyerly, S. B., & Clyde, D. J. (1962). Drugs and
placebos: A model design. Psychological Reports, 10, 383-392.
RxList: The Internet Drug Index. (1999). The top 200 prescriptions for
1999 by number of U.S. prescriptions dispensed. Retrieved November 19,
2001, from
http://www.rxlist.com/99top.htm
Footnote
¹ Data on two maintenance studies were also reported by the
manufacturer of Celexa. In these relapse prevention trials, participants who
had responded to citalopram were randomized to drug or placebo. HAM-D scores
did not distinguish between drug and placebo in one of these trials and were
not assessed in the other. The primary outcome in these studies was time to
relapse (Laughren, 1998). Mean time to relapse was 21
weeks for citalopram versus 18 weeks for placebo in one of these studies and
was not reported in the other.