Mr. & Mrs. Floyd Brock filed suit in federal district court on behalf of their minor child, Rachel Brock, to recover damages for birth defects that allegedly resulted from Mrs. Brock’s ingestion during her pregnancy of the anti-nausea drug Bendectin, which is manufactured by Merrell-Dow Pharmaceuticals, Inc. (“Merrell-Dow”). The Brocks obtained a jury verdict in the amount of $550,000 against Merrell-Dow, representing $240,000 in compensatory damages and $310,000 in punitive damages. Merrell-Dow appeals that verdict here, arguing that the Brocks did not present sufficient evidence to allow the jury to conclude that Bendectin caused Rachel Brock’s birth defect. After reviewing the record and decisions of other courts confronted with similar suits regarding Bendectin, we hold that Merrell-Dow was entitled to judgment notwithstanding the verdict, and the judgment in favor of the Brocks is therefore reversed and the case will be dismissed.
Background
Mrs. Brock conceived Rachel Brock on or around July 2, 1981. On July 28, 1981, Mrs. Brock began to experience morning sickness, and she began to take Bendectin, a prescription drug manufactured by Defendant, Merrell-Dow. 1 Rachel Brock was born on March 19, 1982 with a limb reduction defect known as Poland’s Syndrome, which is recognized by a shortening or absence of fingers with a decrease in the corresponding pectoralis muscle on one side.
Mr. and Mrs. Brock filed a diversity suit against Merrell-Dow on behalf of their daughter in the U.S. District Court for the Eastern District of Texas. The complaint alleged theories of improper inspection, design defect, and failure to warn. Causation was a hotly contested issue, with both sides presenting expert testimony and studies regarding the possible teratogenicity 2 of Bendectin. At the end of trial, Merrell-Dow moved for a directed verdict, arguing that there was no credible evidence tending to show that Bendectin causes birth defects. Merrell-Dow’s motion was denied, and the issue of whether Bendectin caused Rachel Brock’s birth defect was given to the jury. The jury found for the Brocks, and awarded both compensatory and punitive damages. Merrell-Dow then moved for judgment notwithstanding the verdict, and that motion was denied. Merrell-Dow here appeals the denial of its motions for directed verdict and for judgment notwithstanding the verdict. 3
Standard for Determining Sufficiency of the Evidence
The standard for granting a judgment notwithstanding the verdict is the same as that governing rulings on directed verdicts: judgment notwithstanding the verdict is proper only when there can be only one reasonable conclusion drawn from the evidence.
Dun & Bradstreet, Inc. v. Miller,
These general and abstract formulations lose much of their usefulness, however, when we attempt to apply them to the concrete factual situation at hand. One certainly might infer from the evidence in the case that Bendectin causes birth defects, and further that Bendectin caused Rachel Brock’s limb reduction defect — in fact, the jury concluded that this very thing occurred. However, the court must determine whether this is a reasonable inference to be drawn from the evidence presented, and the formulae provide us with little guidance as to what constitutes a reasonable, as opposed to unreasonable, inference that a jury could draw from the evidence. 4 Ultimately, the “correctness” of our decision that there was insufficient evidence presented by plaintiff on the issue of whether Bendectin caused Rachel Brock’s limb reduction defect to enable a jury to draw a reasonable inference may be just a matter of opinion, but hopefully the reasoning below will persuade others of the insights of our perspective.
This case is one of a series of many cases filed against Merrell-Dow by parents of children with birth defects allegedly caused by the ingestion of Bendectin during pregnancy. Academic commentators have dubbed this case and others like it “mass toxic torts.” 5 This represents a growing realization among academics, lawyers, and judges that cases such as this present special problems and challenges to traditional ideas regarding the role of the jury as a decisionmaker.
The first problem is that there is often no consensus in the medical community regarding whether a given substance is teratogenic; this is the case with Bendectin. Moreover, while we now recognize some of the many factors which can cause birth defects, medical science is now unable, and will undoubtedly remain unable for the foreseeable future, to trace a known birth defect back to its precipitating cause. 6 The second problem, in addition to the problem of unknowability, is that juries are asked to resolve these questions, upon which even our brightest medical minds disagree, in order to resolve the case at hand and decide whether the plaintiff is entitled to recovery, and in so doing must necessarily resort to speculation.
Under the traditional approach to scientific evidence, courts would not peer beneath the conclusions of medical experts to question their reasoning. 7 Confronted, as we now are, with difficult medical questions, courts must critically evaluate the reasoning process by which the experts connect data to their conclusions in order for courts to consistently and rationally resolve the disputes before them. Moreover, in mass torts the same issue is often presented over and over to juries in different cases, and the juries often split both ways on the issue. 8 The effect of this is to create a state of uncertainty among manufacturers contemplating the research and development of new and potentially lifesaving drugs. 9 Appellate courts, if they take the lead in resolving those questions upon which juries will go both ways, can reduce some of the uncertainty which can tend to produce a sub-optimal amount of new drug development.
We are not without precedent in our approach to this problem. The case before us parallels in many respects the recently conducted Agent Orange Litigation. 10 In those cases, plaintiffs attempted to prove that exposure to Agent Orange, a defoliant used during the Vietnam War, had caused them adverse health effects. Judge Weinstein granted summary judgment against opt-out plaintiffs on the basis that they had been unable to prove that exposure to low levels of dioxin caused their health problems. Although plaintiffs had provided the affidavits of experts indicating that exposure to Agent Orange had caused their health problems, the court attacked the reasoning of the experts and found it to be inadequate. 11
Courts have not always been so willing to analyze the reasoning employed by experts to reach their conclusions. In
Ferebee v. Chevron Chemical Co.,
The District of Columbia Circuit retreated from this approach recently when it considered, in a Bendectin case, the very same issue we are addressing here. In
Richardson by Richardson v. Richardson-Merrell, Inc.,
Sufficiency of the Evidence Presented
Undoubtedly, the most useful and conclusive type of evidence in a case such as this is epidemiological studies. Epidemiology attempts to define a relationship between a disease and a factor suspected of causing it, in this case ingestion of Bendectin during pregnancy. To define that relationship, the epidemiologist examines the general population, comparing the incidence of the disease among those people exposed to the factor in question to those not exposed. The epidemiologist then uses statistical methods and reasoning to allow her to draw a biological inference between the factor being studied and the disease’s etiology. 13
One difficulty with epidemiologic studies is that often several factors can cause the same disease. Birth defects are known to be caused by mercury, nicotine, alcohol, radiation, and viruses, among other factors. When epidemiologists compare the birth defect rates for women who took Bendectin during pregnancy against those who did not take Bendectin during pregnancy, there is a chance that the distribution of the other causal factors may not be even between the two groups. Usually, the larger the size of the sample, the more likely that random chance will lead to an even distribution of these factors among the two comparison groups, unless there is a dependence between some of the other factors and the factor being studied. For example, there would be a dependence between variables if women who took Bendectin during pregnancy were more or less likely to smoke than women who did not take Bendectin. Another source of error in epidemiological studies is selective recall, i.e., women who have children with birth defects may be more likely to remember taking Bendectin during pregnancy than those women with normal children. Fortunately, we do not have to resolve any of the above questions, since the studies presented to us incorporate the possibility of these factors by use of a confidence interval. The purpose of our mentioning these sources of error is to provide some background regarding the importance of confidence intervals.
In this case, the parties described the results of epidemiologic studies in terms of two numbers: a relative risk and a confidence interval. The relative risk is a number which describes the increased or decreased incidence of the disease in question in the population exposed to the factor as compared to the control population not exposed to the factor. In this case, the relative risk describes the increased or decreased incidence of birth defects in the group of women who took Bendectin versus women who did not take Bendectin. A relative risk of 1.0 means that the incidence of birth defects in the two groups was the same. A relative risk greater than 1.0 means that there were more birth defects in the group of women who took Bendectin.
Just because an epidemiological study concludes that a relative risk is greater than 1.0 does not establish that the factor caused the disease. If the confidence interval is so great that it includes the number 1.0, then the study will be said to show no statistically significant association between the factor and the disease. For example, if a study concluded that the relative risk for Bendectin was 1.30, which is consistent with a 30% elevated risk of harm, but the confidence interval was from 0.95 to 1.82, then no statistically significant conclusions could be drawn from this study because the relative risk, when adjusted by the confidence interval, includes 1.0. Again, it is important to remember that the confidence interval attempts to express mathematically the magnitude of possible error, due to the above mentioned sources as well as others, and therefore a study with a relative risk of greater than 1.0 must always be considered in light of its confidence interval before one can draw conclusions from it.
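The rule described above can be restated as a short sketch. It assumes only what the paragraph says: a study shows no statistically significant association when its confidence interval includes 1.0. The function names are hypothetical and are not drawn from the record or from any study cited here.

```python
def includes_no_effect(ci_low: float, ci_high: float, null_value: float = 1.0) -> bool:
    """Return True when the confidence interval contains the no-effect value (1.0)."""
    return ci_low <= null_value <= ci_high


def is_statistically_significant(ci_low: float, ci_high: float) -> bool:
    """A relative risk is treated as significant only if its interval excludes 1.0."""
    return not includes_no_effect(ci_low, ci_high)


# The hypothetical from the text: relative risk 1.30, interval 0.95 to 1.82.
example_relative_risk = 1.30
example_interval = (0.95, 1.82)

print(is_statistically_significant(*example_interval))  # False: the interval includes 1.0
```

On this reading, the point estimate of 1.30 plays no part in the test; only the position of the interval relative to 1.0 matters.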
The Brocks relied on a reanalysis conducted by Dr. Jay Glasser of the previously conducted Heinonen study. The Heinonen study, which was conducted by Professor O.P. Heinonen under the auspices of the U.S. National Institute of Neurological and Communicative Disorders, was based on over 50,000 pregnancy records collected in the United States. Dr. Heinonen, in analyzing these records, found that approximately 1,000 women had taken Bendectin during the first four months of their pregnancies. 63 of those women had infants with malformations as opposed to 3,200 out of 49,000 who had not taken Bendectin. This yielded a relative risk of 0.97 with confidence limits from 0.75 to 1.26. Therefore, this study does not support the proposition that Bendectin causes birth defects.
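The arithmetic behind a relative risk of this kind can be sketched from the rounded counts quoted above. Two assumptions are made for illustration only: the counts are treated as exact, and the interval is computed with a common log-scale approximation at the 95% level. The opinion does not say which method or confidence level the Heinonen study used, so the interval below is not a reconstruction of the study's own figures.

```python
import math


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the incidence among the exposed to the incidence among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def log_scale_interval(exposed_cases, exposed_total,
                       unexposed_cases, unexposed_total, z=1.96):
    """Approximate confidence interval for a relative risk on the log scale.

    Assumption: z = 1.96 (a 95% interval); the source does not state the level.
    """
    rr = relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total)
    standard_error = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    return rr * math.exp(-z * standard_error), rr * math.exp(z * standard_error)


# Rounded counts as quoted in the opinion (approximations, not the study's raw data).
counts = (63, 1000, 3200, 49000)
rr = relative_risk(*counts)
low, high = log_scale_interval(*counts)
print(f"relative risk ~ {rr:.2f}, interval ~ ({low:.2f}, {high:.2f})")
```

With these rounded inputs the ratio comes out near 0.96 and the interval straddles 1.0, which is in line with, though not identical to, the reported 0.97 and limits of 0.75 to 1.26. Some difference is expected because the quoted counts are approximations.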
The Heinonen study considered as birth defects all malformations, of which limb reduction defects of the type experienced by Rachel Brock are a subset. Dr. Glasser’s reanalysis of the data used in the Heinonen study only considered limb reduction defects. That study found a relative risk of 1.49. However, Dr. Glasser admits that the confidence interval was from 0.17 to 3; this renders the study statistically insignificant. The plaintiffs did not offer one statistically significant (one whose confidence interval did not include 1.0) study that concludes that Bendectin is a human teratogen. No published epidemiological study has found a statistically significant increased risk between exposure to Bendectin and birth defects. One of plaintiff’s experts, Dr. Snodgrass, conceded that he was not aware of any such studies. Transcript at 198-99. Nor have any such studies been presented to the other two federal appeals courts which have considered this matter. 14
Although we find Dr. Glasser’s results inconclusive due to the fact that the confidence intervals include 1.0, we further note that Dr. Glasser has not published his study or conclusions for the purposes of peer review. While we do not hold that this failure, in and of itself, renders his conclusions inadmissible, courts must nonetheless be especially skeptical of medical and other scientific evidence that has not been subjected to thorough peer review. Clearly, “the examination of a scientific study by a cadre of lawyers is not the same as its examination by others trained in the field of science or medicine.”
Richardson,
We find, in this case, the lack of conclusive epidemiological proof to be fatal to the Brocks’ case. While we do not hold that epidemiologic proof is a necessary element in all toxic tort cases, it is certainly a very important element. This is especially true when the only other evidence is in the form of animal studies of questionable applicability to humans. We are not the first court to emphasize the importance of epidemiologic analysis. For instance, in
Heyman v. United States,
The Brocks have also introduced animal studies in order to prove that Bendectin is a teratogen.
This circuit has previously realized the very limited usefulness of animal studies when confronted with questions of toxicity. In
Gulf South Insulation v. U.S. Consumer Product Safety Comm.,
We need not address at length the animal studies presented by plaintiffs below, except to note several of the more important studies and their methodological flaws. The plaintiffs presented an in vitro study conducted by Drs. Hassell and Horigan. This study used cells cut from the limbs of mice and chickens which were then exposed to various test compounds, including Bendectin. According to plaintiff’s experts, limb bud cells which normally form in six days were reduced if those cells were exposed to Bendectin; these limb bud cells ultimately form the arms and legs. However, Dr. Hassell himself cautioned that the body may break down doxylamine, the active ingredient in Bendectin, into a metabolic product which may differ from the pure test compound. Thus, human limb bud cells in a fetus may not be exposed to doxylamine, but rather the metabolic product of doxylamine. Moreover, extrapolation of these findings to humans cannot be done without knowing the dosage level and the corresponding drug level in the bloodstream of the mother. Transcript at 674. Taking the concept of metabolic products one step further, the plaintiff introduced the testimony of Drs. Snodgrass and Newman, who hypothesized that the human body breaks down doxylamine into less complex molecules called metabolites. These metabolites, some of which are negatively charged, are attracted to the relatively alkaline embryotic fluids, and ultimately bond with the cells in the embryo, producing tissue damage. Transcript at 629-30. However, both experts admitted that different species of animals metabolize chemicals differently, and that there are no studies which show that doxylamine is broken down by humans into toxic metabolites. Transcript at 181. Thus, we must view the limb bud tests as quite speculative.
Other plaintiffs’ experts conducted animal tests and reanalyses of previously performed animal tests. Drs. Thiersch and Glauser both reanalyzed results of previous studies on animals. Both admitted that the results of their animal studies are difficult to extrapolate to humans. Dr. Thiersch admitted in his deposition that one must rely, in part, on epidemiological studies to determine whether a substance is toxic in humans. Similarly, Dr. Glauser admitted that the only way to tell whether a substance is teratogenic in humans is to look to the human experience.
Dr. McBride, another expert called by plaintiffs, conducted research in Australia, exposing white rabbits and marmosets to high doses of doxylamine, up to 500 times the normal human dose. 16 Thirteen of 18 of the rabbits died due to either the toxic effects of doxylamine or the improper insertion of gavage tubes which were used to administer the doxylamine to the rabbits. Of the marmosets, all four which were given high doses of doxylamine (500 times the human dosage) aborted their fetuses, and it was impossible to tell if those fetuses were malformed since the aborted fetus is generally eaten by the mother. Dr. McBride hypothesized that the abortions were the result of the teratogenic effect of doxylamine, rather than the high dosage. However, the fifth marmoset was given a lesser dose (100 times the human dosage), and two of the three fetuses later examined lacked a hind leg. On the basis of this, Dr. McBride concluded that doxylamine is a teratogen. He hypothesized that Bendectin, an anticholinergic drug, reduces the amount of acetylcholine, a neurotransmitter released by nerve cells, in the embryo. If the amount of acetylcholine in the embryo is diminished, the trophic effect on body growth is interfered with. Essentially, Dr. McBride speculated that the development of body tissue depends on the initial development of the nervous system. He cited another hypothesis, that of Professor Alexander Karczman, as supporting his theory of the anticholinergic action of doxylamine. However, the theory regarding the effects of doxylamine on acetylcholine, the nervous system and, ultimately, tissue development is nothing more than unproven medical speculation lacking any sort of consensus. Assuredly, one day in the future, medical science may have a clearer understanding of the mechanics of tissue development in the fetus. However, that is not the case today, and speculation unconfirmed by epidemiologic proof cannot form the basis for causation in a court of law.
In light of the evidence presented, we are convinced that the Brocks did not present sufficient evidence regarding causation to allow a trier of fact to make a reasonable inference that Bendectin caused Rachel Brock’s limb reduction defect. We expect that our decision here will have a precedential effect on other cases pending in this circuit which allege Bendectin as the cause of birth defects. Hopefully, our decision will have the effect of encouraging district judges faced with medical and epidemiologic proof in subsequent toxic tort cases to be especially vigilant in scrutinizing the basis, reasoning, and conclusiveness of studies presented by both sides. However, we do not wish this case to stand as a bar to future Bendectin cases in the event that new and conclusive studies emerge which would give a jury a firmer basis on which to determine the issue of causation.
Accordingly, the judgment below is REVERSED and RENDERED, and the court below will enter an order of dismissal.
Notes
. Bendectin, a trade name, consists of 10 mg. doxylamine succinate and 10 mg. pyridoxin (Vitamin B-6).
. A teratogen is a substance that causes birth defects.
. Merrell-Dow raises two other points on appeal, specifically, 1) that a new trial should be ordered because one of plaintiffs’ experts presented a study that allegedly relied on fraudulent data, and 2) that punitive damages were inappropriate. Our reversal on the grounds that there was insufficient credible evidence that Bendectin causes birth defects to send this case to the jury obviates the need for us to address these other two points.
. One commentator notes that “[i]t is commonplace that in large part the problem of defining the range of reasonable inference is dependent on the unique circumstances of each particular case. No self-applying rule can be offered for determining what inferences are reasonable.” Cooper, Directions for Directed Verdicts: A Compass for Federal Courts, 55 Minn.L.Rev. 903, 956 (1971).
. See Abraham, Individual Action and Collective Responsibility: the Dilemma of Mass Tort Reform, 73 Va.L.Rev. 845 (1987); Black & Lilienfeld, Epidemiologic Proof in Toxic Tort Litigation, 52 Fordham L.Rev. 732 (1984); Sugarman, Doing Away with Tort Law, 73 Calif.L.Rev. 555 (1985).
. Advances in science and medical technology have shed new light on the way we perceive our world, but in so doing have raised new questions for our legal system. Professor Black notes that:
[a]round the turn of the twentieth century, however, advances in physiology and psychology and the advent of the quantum and relativity theories destroyed simple, mechanistic certainty. Quantum theory tells us that certainty is a physical impossibility, relativity that time is not absolute, and psychology that preconceptions color supposedly objective accounts of the natural world.
Black, A Unified Theory of Scientific Evidence, 56 Fordham L.Rev. 595 (1988). See generally Kuhn, The Structure of Scientific Revolutions 5-7 (critiquing scientific method by noting that "normal science” is based on the assumptions of the commonly accepted paradigms, but a new theory can lead to a change in the accepted paradigms of normal science).
.Pike County Highway Dept v. Fowler,
. Over a century ago, Holmes identified this problem: But supposing a state of facts is often repeated in practice, is it to be imagined that the court is to go on leaving the standard to the jury forever? Is it not manifest, on the contrary, that if the jury is, on the whole, as fair a tribunal as it is represented to be, the lesson which can be got from that source will be learned? Either the court will find that the fair teaching of experience is that the conduct complained of usually is or is not blameworthy, and therefore, unless explained, is or is not a ground of liability; or it will find the jury oscillating to and fro, and will see the necessity of making up its mind for itself.
O. Holmes, The Common Law 98 (1881).
. Merrell-Dow has already taken Bendectin off the market out of fear of liability, even though there is not one study which definitively proves that Bendectin is a teratogen. Doctors are now forced to prescribe less extensively studied drugs, or perhaps nothing at all. The effects of uncertainty regarding liability extend beyond Bendectin. Private companies are withdrawing, not only from marketing but even from developing, birth control technology. Gladwell, U.S. Firms Abandoning Birth Control Industry in Wake of Lawsuits, Washington Post, May 1, 1988 at H-1, col. 1.
.
In Re Agent Orange Prod. Liab. Litig.,
. The court noted that:
[plaintiff’s expert's] testimony amounts to this: the affiants complain of various medical problems; animals and workers exposed to extensive dosages of TCDD have suffered from related difficulties; therefore, assuming nothing else caused the affiants’ afflictions, Agent Orange caused them. One need hardly be a doctor of medicine to make the statement that if X is a possible cause of Y, and if there is no other possible cause of Y, X must have caused Y. [Plaintiff’s expert’s] formulation avoids the problem before us: which of myriad [sic] possible causes of Y created a particular veteran’s problems_ His analysis, in addition to being speculative, is so guarded as to be worthless.
.
See also Lynch v. Merrell-National Laboratories,
. See A. Lilienfeld & D. Lilienfeld, Foundations of Epidemiology 3 (2d Ed.1980).
.
See Richardson by Richardson v. Richardson-Merrell, Inc., 857 F.2d 823, 831 (D.C.Cir.1988) (“Dr. Done [plaintiffs’ expert] further admitted that no one who has published work on Bendectin has concluded that there is a statistically significant association between Bendectin and limb reduction defects of the type at issue in this case.”);
Lynch v. Merrell National Laboratories,
. In Richardson, the plaintiffs below submitted a reanalysis of data performed by Dr. Done. In holding this study to be inconclusive, the court prominently noted that Dr. Done had not published his findings nor had he submitted them for peer review.
Swan’s study has never been refereed or published in a scientific journal or elsewhere. We are informed of it only by the defendant’s excerpts. On the basis of what we have, it could not form the foundation for an expert opinion challenging the scientific consensus and making the issue of causation a factual question to be decided by the jury.
Lynch,
. At trial, there was some question raised as to whether Dr. McBride had falsified the results of his study. One of Dr. McBride’s lab assistants during the experiments, Mr. Phillip Vardy, testified during the damages phase of trial that Dr. McBride’s published results differed from the notes that Mr. Vardy kept during the experiments. The Brocks, in turn, have alleged that Mr. Vardy was improperly financially reimbursed by Merrell-Dow for his testimony, and that Mr. Vardy failed to disclose this financial arrangement when he was questioned about it. The Brocks filed a post appeal motion to supplement the record in regards to this issue, and that motion was denied. We need not address the merits of this argument, since we have held that, taken at face value, Dr. McBride’s conclusions are too speculative to support causation in the absence of epidemiological support.
