IN RE: ZOLOFT (SERTRALINE HYDROCHLORIDE) PRODUCTS LIABILITY LITIGATION
No. 16-2247
United States Court of Appeals, Third Circuit.
June 2, 2017
Jennifer Adams, et al, Plaintiffs appealing dismissal by order entered April 5, 2016, Appellants
Argued on January 25, 2017
858 F.3d 787
Sheila L. Birnbaum, Mark S. Cheffo [Argued], Quinn Emanuel Urquhart & Sullivan, 51 Madison Avenue, 22nd Floor, New York, NY 10010, Robert C. Heim, Judy L. Leone, Dechert, 2929 Arch Street, 18th Floor, Cira Centre, Philadelphia, PA 19104, Counsel for Appellees
Cory L. Andrews, Washington Legal Foundation, 2009 Massachusetts Avenue, N.W., Washington, DC 20036, Counsel for Amicus Washington Legal Foundation
Brian D. Boone, Alston & Bird, 101 South Tryon Street, Suite 4000, Charlotte, NC 28280, David R. Venderbush, Alston & Bird, 90 Park Avenue, 15th Floor, New York, NY 10016, Counsel for Amicus Chamber of Commerce of the United States
Joe G. Hollingsworth, Hollingsworth, 1350 I Street, N.W., Washington, DC 20005, Counsel for Amicus American Tort Reform Association and Pharmaceutical Research and Manufacturers of America
Before: CHAGARES, RESTREPO and ROTH, Circuit Judges
OPINION
ROTH, Circuit Judge:
This case involves allegations that the anti-depressant drug Zoloft, manufactured by Pfizer, causes cardiac birth defects when taken during early pregnancy. In support of their position, plaintiffs, through a Plaintiffs’ Steering Committee (PSC), depended upon the testimony of Dr. Nicholas Jewell, Ph.D. Dr. Jewell used the “Bradford Hill” criteria1 to analyze existing literature on the causal connection between Zoloft and birth defects. The District Court excluded this testimony and granted summary judgment to defendants. The PSC now appeals these orders, alleging that 1) the District Court erroneously held that an expert opinion on general causation must be supported by replicated observational studies reporting a statistically significant association between the drug and the adverse effect, and 2) it was an abuse of discretion to exclude Dr. Jewell‘s testimony. Because we find that the District Court did not establish such a legal standard and did not abuse its discretion in excluding Dr. Jewell‘s testimony, we will affirm the District Court‘s orders.
I.
This case arises from multi-district litigation involving 315 product liability claims against Pfizer, alleging that Zoloft, a selective serotonin reuptake inhibitor (SSRI), causes cardiac birth defects. The PSC introduced a number of experts in order to establish causation. The testimony of each of these experts was excluded in whole or in part. In particular, the court excluded all of the testimony of Dr. Anick Bérard (an epidemiologist), which relied on the “novel technique of drawing conclusions by examining ‘trends’ (often statistically non-significant) across selected stud-
The District Court considered Dr. Jewell‘s application of various methodologies, reviewing his expert report, rebuttal reports, party briefs, and oral testimony. The District Court first examined how Dr. Jewell applied the traditional methodology of analyzing replicated, significant results. While Dr. Jewell discussed many groupings of cardiac birth defects, he focused on the significant findings for all cardiac defects and septal defects. Dr. Jewell presented two studies reporting a significant association between Zoloft and all cardiac defects (Kornum (2010)4 and Jimenez-Solem (2012)5). He also presented five studies reporting a significant association between Zoloft and septal defects (Kornum (2010), Jimenez-Solem (2012), Louik (2007),6 Pedersen (2009),7 and Bérard (2015)8). After excluding two studies from its consideration,9 the District Court expressed two concerns with the remaining studies: Jimenez-Solem (2012), Kornum (2010), and Pedersen (2009). First, despite the fact that the remaining studies produced consistent results, the District Court did not consider them to be independent replications because they used overlapping Danish populations. Second, a larger study, Furu (2015),10 included almost all the data from Jimenez-Solem (2012), Kor-
The court then examined Dr. Jewell‘s reliance on insignificant results, noting that it was very similar to Dr. Bérard‘s methodology. The court noted that Dr. Jewell did not provide any evidence that the epidemiology or teratology11 communities value statistical significance12 any less than it has traditionally been understood.13 The court also expressed concern that Dr. Jewell inconsistently applied his “technique” of multiplying p-values14 and his trend analysis.
The District Court critiqued several other techniques Dr. Jewell used in analyzing the evidence. First, Dr. Jewell rejected meta-analyses on which he had previously relied in a lawsuit against another SSRI, Prozac. The meta-analyses reported insignificant associations with birth defects for Zoloft but not for Prozac. Dr. Jewell rationalized his decision to ignore these meta-analyses because the “heterogeneity”15 within its Zoloft studies was significant; the District Court accepted this explanation but questioned why Dr. Jewell “fails to statistically calculate the heterogeneity” across other studies instead of relying on trends.16 Second, Dr. Jewell reanalyzed two studies, Jimenez-Solem (2012) and Huybrechts (2014),17 both of which had originally concluded that there was no significant effect attributable to Zoloft.18 The District Court questioned his rationale for conducting, and tactics for implementing, this reanalysis. Finally, Dr. Jewell conducted a meta-analysis with Huybrechts (2014) and Jimenez-Solem (2012). The District Court questioned why he used only those particular studies.19
Based on this analysis, the District Court found that Dr. Jewell, tasked with explaining his opinion about Zoloft‘s effect on birth defects and reconciling contrary studies, “failed to consistently apply the scientific methods he articulates, has deviated from or downplayed certain well-established principles of his field, and has inconsistently applied methods and standards to the data so as to support his a priori opinion.”20 For this reason, on December 2, 2015, the District Court entered an order, excluding Dr. Jewell‘s testimony, and on April 5, 2016, the court granted Pfizer‘s motion for summary judgment. The PSC appeals the exclusion of Dr. Jewell and the grant of summary judgment.21
II.22
In general, courts serve as gatekeepers for expert witness testimony. “A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if,” inter alia, “the testimony is the product of reliable principles and methods[ ] and ... the expert has reliably applied the principles and methods to the facts of the case.”23 In determining the reliability of novel scientific methodology, courts can consider multiple factors, including the testability of the hypothesis, whether it has been peer reviewed or published, the error rate, whether standards controlling the technique‘s operation exist, and whether the methodology is generally accepted.24 Both an expert‘s methodology and the application of that methodology must be reviewed for reliability.25 A court should not, however, usurp the role of the fact-finder; instead, an expert should only be excluded if “the flaw is large enough that the expert lacks the ‘good grounds’ for his or her conclusions.”26
We review the decision to exclude expert testimony for abuse of discretion. In re Paoli, 35 F.3d at 749. However, when the exclusion of such evidence results in a summary judgment, we perform a “hard look” analysis to determine if a district court has abused its discretion. Id. at 750. An abuse of discretion occurs when a court‘s decision “rests upon a clearly erroneous finding of fact, an errant conclusion of law or an improper application of law to fact” or “when no reasonable person would adopt the district court‘s view.” Oddi v. Ford Motor Co., 234 F.3d 136, 146 (3d Cir. 2000).
With this in mind, we proceed to the issues at hand. The PSC raises two issues on appeal: 1) whether the District Court erroneously concluded that reliability requires replicated, statistically significant findings, and 2) whether Dr. Jewell‘s testimony was properly excluded.
A.
The PSC argues that the District Court erroneously held that replicated, statistically significant findings are necessary to satisfy reliability. This argument seems to have been originally raised in the motion for reconsideration of Dr. Bérard‘s exclusion. Explaining its decision to exclude Dr. Bérard, the District Court cited a previous case, Wade-Greaux v. Whitehall Labs., Inc., for the proposition that the teratology community generally requires replicated, significant epidemiological results before inferring causality.28 The PSC claims that in so doing, the District Court was asserting a legal standard that required replicated, significant findings for reliability.29 Pfizer contends that the District Court merely made a factual finding about what the teratology community generally accepts.
Second, the course of the proceedings makes clear that the replication of significant results was not dispositive in establishing whether the testimony of either Dr. Bérard or Dr. Jewell was reliable. In fact, the District Court expressly rejected Pfizer‘s argument that the existence of a statistically significant, replicated result is a threshold issue before an expert can conduct the Bradford-Hill analysis.35 In doing so, the District Court was clear that it was not requiring a threshold showing of statistical significance. Similarly, the District Court did not end its inquiry after analyzing whether there were replicated, significant results. Instead, the District Court examined other techniques of general trend analysis, reanalysis of other studies, and meta-analysis. Even though it ultimately rejected the application of these techniques as unreliable, it did not categorically reject alternative techniques, suggesting that it did not make a legal standard requiring replicated, significant results.
For these reasons, we find that the District Court did not require replication of
B.
The second issue on appeal is whether it was an abuse of discretion for the District Court to exclude Dr. Jewell‘s testimony. Dr. Jewell utilized a combination of two methods: the “weight of the evidence” analysis and the Bradford Hill criteria. The “weight of the evidence” analysis involves a series of logical steps used to “infer[] to the best explanation[.]”37 The Bradford Hill criteria are metrics that epidemiologists use to distinguish a causal connection from a mere association. These metrics include strength of the association, consistency, specificity, temporality, coherence, biological gradient, plausibility, experimental evidence, and analogy.38 In his expert report, Dr. Jewell seems to utilize numerous “techniques” in implementing the weight of the evidence methodology. Dr. Jewell discusses whether the conclusions drawn from these techniques satisfy the Bradford Hill criteria and support the existence of a causal connection.39
Pfizer does not seem to contest the reliability of the Bradford Hill criteria or weight of the evidence analysis generally; the dispute centers on whether the specific methodology implemented by Dr. Jewell is reliable. Flexible methodologies, such as the “weight of the evidence,” can be implemented in multiple ways; despite the fact that the methodology is generally reliable, each application is distinct and should be analyzed for reliability. In In re Paoli R.R. Yard PCB Litigation, this Circuit noted that while differential diagnosis—also a flexible methodology—is generally accepted, “no particular combination of techniques chosen by a doctor to assess an individual patient is likely to have been generally accepted.”40 Accordingly, we subjected the expert‘s specific differential diagnosis process to a Daubert inquiry.41 We noted that “to the extent that a doctor utilizes standard diagnostic techniques in gathering this information, the more likely we are to find that the doctor‘s methodology is reliable.”42 While we did not require the expert to run specific tests or ascertain full information in order for the differential diagnosis to be reliable, we did require
This standard, while articulated with respect to differential diagnoses, applies to the weight of the evidence analysis. We have briefly encountered the Bradford Hill criteria/weight of the evidence methodology in Magistrini v. One Hour Martinizing Dry Cleaning, a nonprecedential affirmance of the District of New Jersey‘s exclusion of an expert.44 The expert followed the weight of the evidence methodology, including epidemiological findings assessed using the Bradford Hill criteria. The District Court acknowledged that although the weight of the evidence methodology was generally reliable, “[t]he particular combination of evidence considered and weighed here has not been subjected to peer review.”45 Similar concerns are arguably present for the Bradford Hill criteria, which are neither an exhaustive nor a necessary list.46 An expert can theoretically assign the most weight to only a few factors, or draw conclusions about one factor based on a particular combination of evidence. The specific way an expert conducts such an analysis must be reliable; “all of the relevant evidence must be gathered, and the assessment or weighing of that evidence must not be arbitrary, but must itself be based on methods of science.”47 To ensure that the Bradford Hill/weight of the evidence criteria “is truly a methodology, rather than a mere conclusion-oriented selection process ... there must be a scientific method of weighting that is used and explained.”48 For this reason, the specific techniques by which the weight of the evidence/Bradford Hill methodology is conducted must themselves be reliable according to the principles articulated in Daubert.49
In short, despite the fact that both the Bradford Hill and the weight of the evidence analyses are generally reliable, the “techniques” used to implement the analysis must be 1) reliable and 2) reliably applied. In discussing the conclusions produced by such techniques in light of the Bradford Hill criteria, an expert must explain 1) how conclusions are drawn for each Bradford Hill criterion and 2) how the criteria are weighed relative to one another. Here, we accept that the Brad-
1.
It was not an abuse of discretion for the District Court to find Dr. Jewell‘s application of trend analysis, reanalysis, and meta-analysis to the body of evidence to be unreliable. Here, we assume the techniques listed are generally reliable and rest on the fact that they were unreliably applied. As stated in In re Paoli, use of standard techniques bolsters the inference of reliability;51 nonstandard techniques need to be well-explained. Additionally, if an expert applies certain techniques to a subset of the body of evidence and other techniques to another subset without explanation, this raises an inference of unreliable application of methodology.52
First, we find no abuse of discretion in the District Court‘s determination that Dr. Jewell unreliably analyzed the trend in insignificant results. Dr. Jewell applied this technique by qualitatively discussing the probative value of multiple positive, insignificant results. In justifying this approach, he relied on a quantitative method by which one can calculate the likelihood of seeing multiple positive but insignificant results if there were actually no true effect.53 However, after alluding to this presumably reliable mathematical calculation technique for analyzing trends in even insignificant results, Dr. Jewell did not actually implement it; instead he qualitatively discussed the general trend in the data. In light of the opportunity to actually conduct such quantitative analysis, his refusal to do so—without explanation—suggests that he did not reliably apply his stated methodology.54
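By way of illustration only, one simple form such a quantitative calculation could take is a binomial "sign test." The sketch below is not Dr. Jewell's calculation and is not drawn from the record. It assumes, hypothetically, that the studies are independent and that under no true effect each study is equally likely to report an estimate above or below one. The function name and the study counts are invented for the example.

```python
from math import comb


def prob_at_least_k_positive(n_studies: int, k_positive: int,
                             p_positive: float = 0.5) -> float:
    """Probability that at least k of n independent studies report a
    positive estimate (odds ratio above 1) when there is no true effect.

    Assumption (hypothetical): under the null, each study lands above 1
    with probability p_positive, independently of the others.
    """
    return sum(
        comb(n_studies, j) * p_positive**j * (1 - p_positive)**(n_studies - j)
        for j in range(k_positive, n_studies + 1)
    )


# Hypothetical example: 8 of 10 studies report odds ratios above 1,
# none of them individually significant.
p = prob_at_least_k_positive(10, 8)
print(round(p, 4))  # 0.0547
```

The independence assumption matters: studies drawn from overlapping populations, such as the Danish studies discussed above, would not satisfy it, and the resulting figure would overstate the strength of the trend.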
Even assuming the reliability of Dr. Jewell‘s version of trend analysis, Dr. Jewell identified trends and interpreted insignificant results differently based on the outcome of the study. The District Court concluded that Dr. Jewell “selectively emphasize[d] observed consistency only when the consistent studies support his opinion.”55 Dr. Jewell emphasized the insignificance of results reporting odds ratios below 1 but not the insignificance of those reporting odds ratios above 1. He also paid attention to the upper bounds of the confidence intervals associated with odds ratios below 1, but not to the lower bounds.
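The symmetry at issue can be shown with a short calculation. The sketch below computes an odds ratio and an approximate 95% confidence interval (the standard Wald interval on the log scale) from a two-by-two table. All counts are hypothetical and are not taken from any study in the record. The function name is likewise invented for the example.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate (Wald) confidence interval.

    a: exposed with defect      b: exposed without defect
    c: unexposed with defect    d: unexposed without defect
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical study A: point estimate above 1, interval spans 1.
print(odds_ratio_ci(12, 988, 80, 9920))
# Hypothetical study B: point estimate below 1, interval spans 1.
print(odds_ratio_ci(8, 992, 100, 9900))
```

In both hypothetical studies the interval contains one, so neither result is statistically significant at the conventional level. A consistent analysis would treat the insignificance of the two results, and both bounds of each interval, in the same way.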
Finally, Dr. Jewell reanalyzed two studies to control for confounding by indication. The need for conducting this reanalysis on Huybrechts (2014) was unclear. Dr. Jewell said that he wanted to control for indication by comparing the outcomes for “paused” Zoloft users to “exposed” Zoloft users; however, the study already controlled for indication. If Dr. Jewell wanted to correct for misclassification, the original study already controlled for that as well through extensive sensitivity analyses.57 Given that the study originally concluded that Zoloft was not associated with a statistically significant increase in the likelihood of birth defects, this reanalysis seems conclusion-driven.
Ultimately, the fact that Dr. Jewell applied these techniques inconsistently, without explanation, to different subsets of the body of evidence raises real issues of reliability. Conclusions drawn from such unreliable application are themselves questionable.
2.
Using the techniques discussed above, Dr. Jewell went on to evaluate the Bradford Hill criteria. While Dr. Jewell did discuss the applicable Bradford Hill criteria and how he weighed the factors together, he did not explain how he drew conclusions for certain criteria, namely the strength of association and consistency.
Dr. Jewell concluded that the strength of association weighs in favor of causality. In doing so, he focused on studies reporting odds ratios between two and three (Colvin (2011),58 Jimenez-Solem (2012), Malm (2011),59 Pedersen (2009), and Louik
Similarly, while Dr. Jewell found that the causal effect of Zoloft on cardiac birth defects is consistent, it is not clear how he drew this conclusion. As noted above, Dr. Jewell classified insignificant odds ratios above one as supporting a “consistent” causality result, downplaying the possibility that they support no association between Zoloft use and cardiac birth defects. While an insignificant result may be consistent with a causal effect, Dr. Jewell‘s discussion is too far-reaching, sometimes understating the importance of statistical significance. For example, Furu (2015)—a study that incorporated almost all the data in Pedersen (2009), Jimenez-Solem (2012), and Kornum (2010)—included a larger sample but, unlike the former three studies, reported no significant association between Zoloft and cardiac birth defects. Insignificant results can occur merely because a study lacks power to produce a significant result, and, all else being equal, a larger sample size increases the power of a test.62 Unless there are other significant differences, we would expect Furu to be better able to capture a true effect than the preceding three studies. While an insignificant result from a low-powered study does not necessarily undermine a statistically significant result from a higher-powered study, the opposite argument (i.e., that an insignificant finding from a presumably better-powered study is evidence of consistency with significant findings from lower-powered studies) requires further explanation.63 While there may be a reason that such a result could be consistent with the past significant effects, Dr. Jewell did not meaningfully discuss why this may be.64 Without adequate explanation, this argument understates the importance of statistical significance. Like the expert in Magistrini, Dr. Jewell should have “sufficiently discredit[ed] other studies that found no association or a negative association with much more precise confidence intervals, [or] sufficiently
For these reasons, the District Court determined that Dr. Jewell did not consistently assess the evidence supporting each criterion or explain his method for doing so. Thus, it was not an abuse of discretion to find that Dr. Jewell‘s application of the Bradford Hill criteria was unreliable.
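The relationship between sample size and power noted in the consistency discussion above can be sketched numerically. The example below uses a normal approximation for a two-sided comparison of two proportions. The baseline rate, the exposed-group rate, and the group sizes are all hypothetical and are not drawn from Furu (2015) or any other study in the record. The function name is also an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_unexposed: float, p_exposed: float,
                 n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions,
    with equal group sizes, using the normal approximation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = sqrt(p_unexposed * (1 - p_unexposed) / n_per_group
              + p_exposed * (1 - p_exposed) / n_per_group)
    shift = abs(p_exposed - p_unexposed) / se
    # Probability the test statistic falls in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical rates: 1.0% baseline versus 1.5% among the exposed.
for n in (2_000, 20_000):
    print(n, round(approx_power(0.010, 0.015, n), 3))
```

With the same assumed true effect, the larger hypothetical sample detects it far more often than the smaller one. That is why an insignificant result from a larger study calls for explanation before it can be treated as consistent with significant results from smaller studies.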
This is not to suggest that all of the District Court‘s criticisms were necessarily justified. For example, the fact that in his reanalysis Dr. Jewell drew a different conclusion from a study than its authors did is not necessarily a problem. Similarly, his imposition of a different assumption about the “exposed” group in Huybrechts (2014) did not require expert knowledge about psychology; he was merely testing the robustness of the results to Huybrechts’ original assumption. Similarly, the District Court credited the claim that overlapping samples did not provide replicated results, despite the fact that Dr. Jewell claimed it provided some informational value.67 These inquiries are more appropriately left to the jury.
On the whole, however, the District Court did not improperly usurp the jury‘s role in assessing Dr. Jewell‘s credibility. There is sufficient reason to find Dr. Jewell‘s testimony was unreliable. Indeed, “any step that renders the analysis unreliable under the Daubert factors renders the expert‘s testimony inadmissible.”68 The fact that Dr. Jewell unreliably applied the techniques underlying the weight of the evidence analysis and the factors of the Bradford Hill analysis satisfies this standard for inadmissibility.
III.
This case involves complicated facts, statistical methodology, and competing claims of appropriate standards for assessing causality from observational epidemiological studies. Ultimately, however, the issue is quite clear. As gatekeepers, courts are supposed to ensure that the testimony given to the jury is reliable and will be more informative than confusing. Dr. Jewell‘s application of his purported methods does not satisfy this standard. By applying different techniques to subsets of the data and inconsistently discussing statistical significance, Dr. Jewell does not reliably analyze the weight of the evidence. Selecting these conclusions to discuss certain Bradford Hill factors also contributes to the unreliability. While the District Court may have flagged a few issues that are not necessarily indicative of an unreliable application of methods, there is certainly sufficient evidence on the record to suggest that the court did not abuse its discretion in excluding Dr. Jewell as an expert on the basis of the unreliability of his methods. For these reasons, we will affirm the orders of the District Court, excluding the testimony of Dr. Jewell and granting summary judgment in favor of Pfizer.
