Blum Ex Rel. Blum v. Merrell Dow Pharmaceuticals, Inc.

705 A.2d 1314 | Pa. Super. Ct. | 1997

705 A.2d 1314 (1997)

Jeffrey BLUM, a minor by his Parents and Natural Guardians, Joan and Fred BLUM, and Joan and Fred Blum, in their own right, Appellees
v.
MERRELL DOW PHARMACEUTICALS, INC., and Rite Aid Discount Pharmacy.
Appeal of MERRELL DOW PHARMACEUTICALS, INC. ("Merrell Dow").

Superior Court of Pennsylvania.

Argued May 13, 1997.
Filed December 29, 1997.

*1315 Edward W. Madeira, Jr., Philadelphia, for appellant.

Thomas R. Kline, Philadelphia, for appellees.

Arlin M. Adams, Philadelphia, for Chemical Manufacturing Association, Amicus Curiae.

Before BECK and HUDOCK, JJ., and CERCONE, President Judge Emeritus.

BECK, Judge:

This is a pharmaceutical products liability action. Plaintiffs-appellees are Jeffrey Blum, a minor, and his parents and natural guardians Joan and Fred Blum. Jeffrey Blum was born with clubfeet. The Blums filed this action against defendant-appellant Merrell Dow Pharmaceuticals, Inc. ("Merrell Dow"), the manufacturer of the drug Bendectin. While pregnant with Jeffrey, Joan Blum took Bendectin, which was prescribed by her doctor to relieve pregnancy-related nausea. After trial in 1986, the jury returned a verdict in favor of the Blums, finding specifically that his mother's ingestion of Bendectin during pregnancy caused Jeffrey Blum's clubfeet. However, the verdict was ultimately vacated because it was rendered by only eleven jurors. Blum v. Merrell Dow Pharmaceuticals, Inc., 385 Pa.Super. 151, 560 A.2d 212 (1989), aff'd, 534 Pa. 97, 626 A.2d 537 (1993) (Blum I).

The matter was remanded for a new trial, which was held between May 5 and June 14, 1994, when the jury found in favor of the Blums, awarding $4 million in compensatory damages to Jeffrey Blum, $200,000 in compensatory damages to his parents, and $15 million in punitive damages. The court denied appellant's motions for judgment notwithstanding the verdict (j.n.o.v.) or for a new trial, and molded the verdict to include delay damages for a total award of $24,111,147. This timely appeal followed.

On appeal, Merrell Dow argues that the trial court should have entered j.n.o.v. in its favor because the Blums did not present sufficient admissible evidence of causation to hold Merrell Dow liable for Jeffrey Blum's injuries. Merrell Dow further argues that, even if j.n.o.v. is not entered, it is entitled to a new trial because of several trial court errors, including: 1) the admission of incompetent expert testimony on the issue of causation; 2) permitting the jury to learn that Merrell Dow lost the first trial of this case; 3) instructing the jury on fraud despite the fact that there was no evidence of fraud; 4) instructing the jury on implied warranty of *1316 fitness for a particular purpose where no evidence of a particular purpose was shown and where there is no such cause of action in a prescription drug case; 5) instructing the jury on express warranty where a breach was not shown; and 6) instructing the jury on punitive damages where Merrell Dow's conduct was neither outrageous nor reckless and where the award of punitive damages in this case violates the United States Constitution. After our exhaustive review of the complex arguments and extensive record, we reluctantly reverse and remand with instructions to the trial court to enter j.n.o.v. in favor of Merrell Dow.

Standard of Review

Faced with a motion for j.n.o.v., the court must decide whether, viewing the evidence in the light most favorable to the verdict winner, there was sufficient evidence to sustain the verdict; if there was, j.n.o.v. should not be granted. Sheely v. Beard, 696 A.2d 214 (Pa.Super.1997); Gray v. H.C. Duke & Sons, Inc., 387 Pa.Super. 95, 563 A.2d 1201 (1989). We hold that the trial judge abused his discretion in allowing certain scientific expert testimony on causation to be admitted at trial. In the absence of this causation evidence, judgment should have been entered in favor of Merrell Dow as a matter of law. We therefore reverse the trial court's order denying judgment n.o.v. We need not reach appellant's other issues on appeal.

Plaintiffs Must Prove Causation

In any tort action based on a theory of negligence or products liability, the plaintiff is required to prove by a preponderance of the evidence that the defendant's conduct was the proximate cause of the plaintiff's damage. Sherk v. Daisy-Heddon, 498 Pa. 594, 450 A.2d 615 (1982); Christian v. Pennsylvania Financial Responsibility Assigned Claims Plan, 454 Pa.Super. 512, 686 A.2d 1 (1996). The test for proximate causation is whether the defendant's acts or omissions were a substantial factor in bringing about the plaintiff's harm. First v. Zem Zem Temple, 454 Pa.Super. 548, 686 A.2d 18 (1996). In this case, the Blums were required to prove that Joan Blum's ingestion of Bendectin during her pregnancy was the proximate cause of her son's injuries. This general causation issue involves two underlying questions: 1) Does the drug Bendectin cause birth defects such as clubfeet? and 2) Did Bendectin cause Jeffrey Blum's clubfeet?

It is extremely difficult to answer these basic questions. Birth defects occur in two to three percent of births regardless of exposure to Bendectin. Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F.3d 1311, 1313 (9th Cir.1995). Most birth defects occur for no known reason. Id. The causation evidence in this case, therefore, must necessarily come in the form of probabilities rather than certainties. Of course, circumstantial evidence may suffice to prove causation in a tort case, but it must establish by a preponderance of the evidence that the alleged cause was a substantial factor in bringing about the claimed effect. Finney v. G.C. Murphy Co., 406 Pa. 555, 178 A.2d 719 (1962) (plaintiff in tort case is not required to prove with mathematical exactness and caliper precision that an incident could only happen in one manner to the exclusion of all other possibilities).

In an effort to answer the critical causation question, the Blums proffered scientific expert testimony from several witnesses. Alan K. Done, M.D., and Adrian Gross, D.V.M., testified at the first trial, and their testimony was read to the jury during the second trial. In the second trial, Stuart Newman, Ph.D., testified via videotaped deposition. These witnesses offered their opinions that Bendectin is a human "teratogen"[1] while conceding that birth defects occur even in the absence of Bendectin exposure. Done, Gross and Newman all testified as to general causation, that is, the teratogenic potential of Bendectin. Only Dr. Done opined more specifically that Bendectin caused Jeffrey Blum's clubfeet.

An expert witness is qualified to offer an opinion if he or she has sufficient skill, knowledge, or experience in a field or calling as to make it appear that his or her *1317 opinion or inference will probably aid the trier in its search for truth. Dambacher v. Mallis, 336 Pa.Super. 22, 485 A.2d 408 (1984). To be admissible, expert evidence on scientific matters must pass through an additional hoop.

The Frye Test

Our law is well established that the trial court enjoys broad discretion in admitting or excluding evidence. However, this discretion is tempered with regard to the admission of scientific evidence, that which "draws its convincing force from some principle of science, mathematics and the like." Before scientifically adduced evidence may be considered admissible, it must first be shown that it meets the standard established in Frye v. United States....

Commonwealth v. Rodgers, 413 Pa.Super. 498, 509-510, 605 A.2d 1228, 1234 (1992) (citations omitted).

Merrell Dow challenged the admissibility of the causation evidence proffered by the Blums, arguing that the opinions held by the Blums' expert witnesses did not meet the requirements for admissibility of scientific evidence set forth in Frye v. United States, 293 F. 1013 (D.C.Cir.1923), and adopted by our supreme court in Commonwealth v. Topa, 471 Pa. 223, 369 A.2d 1277 (1977). The Frye test represents an attempt to measure the quality of scientific evidence prior to admission, so that jurors are not misled by unreliable evidence. Our courts have considered this to be necessary whenever science enters the courtroom, because "there is the danger that the trial judge or jury will ascribe a degree of certainty to the testimony of the expert ... which may not be deserved." Topa, 471 Pa. at 230, 369 A.2d at 1281. Therefore, because scientific testimony should aid jurors rather than mislead them, admissibility of scientific evidence depends upon "the general acceptance of its validity by those scientists active in the field to which the evidence belongs." Id. at 231, 369 A.2d at 1281.[2]

Frye v. United States involved the admissibility of the "systolic blood pressure deception test," a test designed to determine whether a defendant was answering questions truthfully based on variations in blood pressure. 293 F. at 1013-1014. In rejecting evidence of Frye's test results, the court concluded that "the systolic blood pressure deception test ha[d] not yet gained such standing and scientific recognition among physiological and psychological authorities as would justify the courts in admitting expert testimony deduced" from it. Id. at 1014. The court rejected the test because its results were not "deduced from a well-recognized scientific principle or discovery." Id. In other words, the concept that a suspect's blood pressure rises while telling a lie must be a scientific fact that is "sufficiently established to have gained general acceptance in the particular field to which it belongs," before the results of a test based on this concept can be admitted in a court of law. Id.

In Commonwealth v. Topa, the Pennsylvania Supreme Court adopted this line of reasoning in rejecting expert testimony by a *1318 police lieutenant based on spectrograph, or "voiceprint" analysis of a recorded telephone call. The Commonwealth had proffered evidence of the defendant's voiceprint in order to prove that he had made a crucial telephone call. The voiceprint analysis was based on the theory that "if you use unique voice mechanisms to produce the sounds of your voice, then the sounds will also be unique," and the voiceprint would serve as a tool to identify a suspect's voice. The supreme court decided that this underlying scientific principle did not meet the requirements of Frye, and held that voiceprint analysis evidence should be rejected as unreliable:

The testimony of one expert, Lieutenant Nash, cannot satisfy this standard. Furthermore, we must conclude, from our study of those cases and commentaries which have addressed the issue, that the reliability of the sound spectrograph and voiceprint identification has not, as yet, been generally accepted by the scientific community concerned with acoustical science.

Topa, 471 Pa. at 232, 369 A.2d at 1282. See also Commonwealth v. Gee, 467 Pa. 123, 354 A.2d 875 (1976), overruled on other grounds, 510 Pa. 123, 507 A.2d 66 (1986) (the results of a polygraph examination are inadmissible for any purpose because the scientific reliability of such tests has not been sufficiently established).

It is for much the same reasons that testimony about the results of a certain field sobriety test, the "horizontal gaze nystagmus" (HGN) test is inadmissible as evidence that a driver was drunk. Commonwealth v. Apollo, 412 Pa.Super. 453, 603 A.2d 1023 (1992); Commonwealth v. Miller, 367 Pa.Super. 359, 532 A.2d 1186 (1987). The underlying principle operating in the HGN test is that when a person is intoxicated, his or her eyes will move in a distinctive, involuntary fashion, allowing a trained observer to estimate blood alcohol content. Miller, 367 Pa.Super. at 365-67, 532 A.2d at 1188-89. In Apollo, the Commonwealth's expert witness testified that "he was aware of no studies evaluating the reliability of the HGN test that have reached any conclusion other than it is the most accurate field sobriety test available." 412 Pa.Super. at 459, 603 A.2d at 1027. The defense, however, presented evidence that the data supporting the reliability of HGN tests have been criticized, and that a positive HGN response can occur from causes other than the consumption of alcohol. The trial court excluded the HGN evidence and this court affirmed, holding that the Commonwealth's expert evidence "did not establish the general acceptance in the scientific community of the HGN test as required by Topa." Id. In other words, the underlying principle of the test, that a positive HGN response reveals intoxication, was not sufficiently reliable to aid the factfinder rather than mislead it.

By way of contrast, our courts have concluded that it is generally accepted in the scientific community that individuals have unique DNA patterns, and that the methods for analyzing DNA have likewise gained general acceptance. Evidence of DNA test results has therefore been admitted in criminal trials in order to connect a defendant to a specific crime through his or her DNA "fingerprint." Commonwealth v. Crews, 536 Pa. 508, 640 A.2d 395 (1994); Commonwealth v. Rodgers, supra. Because the basic theory underlying DNA testing—that all individuals possess unique DNA patterns that can be accurately analyzed through physical matching procedures—is generally accepted, the Frye standard is met and testimony about the existence of a "match" may be admitted at trial. Crews, supra; Rodgers, supra.

However, the Crews court held that testimony about the statistical probability of a random DNA match did not meet the Frye test, and therefore was not admissible in our courts:

What has not yet achieved universal agreement is the less objective selection of the appropriate population for statistical analysis which is to be applied to the physical analysis carried out in the laboratory. About the statistical treatment of the physical evidence there remains disagreement and continuing theoretical development. In short, scientists are almost certain to agree, assuming competent laboratory techniques, that two DNA samples do or do not match at a given number of critical loci, called alleles, based on generally accepted *1319 physical testing procedures. What is not universally agreed is what conclusions can validly be drawn from the matches observed in the samples.

Crews, 536 Pa. at 520, 640 A.2d at 401. Although the Commonwealth's expert was permitted to testify that the DNA samples taken from the crime scene were "extremely strongly associated with the defendant," he was not allowed to draw conclusions stating numerical or statistical probability because these methods had "not achieved widespread acceptance within the scientific community." Id. at 522, 640 A.2d at 402. The supreme court held that the reliable portions of DNA testing and analysis, though perhaps inconclusive, were admissible under Frye, while the less reliable, possibly misleading statistical counterpart to this evidence was properly kept from the jury. Id.

The same concerns for reliability that led to the adoption and application of Frye in criminal cases "are no less present because the action is civil in nature." Liles v. Balmer, 439 Pa.Super. 238, 653 A.2d 1237 (1994). It is clear that Frye is the applicable standard for admissibility in this case.

Admissibility of Causation Evidence

The admission of expert testimony lies within the discretion of the trial court, and we may not reverse such a decision absent a clear abuse of discretion. Commonwealth v. Zook, 532 Pa. 79, 615 A.2d 1 (1992). Merrell Dow challenged the admissibility of the Blums' expert testimony on causation before, during and after the second trial, on the basis that their expert witnesses' opinions were not based on methodologies generally accepted by scientists who study the causes of birth defects in humans.[3] The trial judge deferred ruling on appellant's pre-trial motion, and did not hold a hearing to determine whether the proffered testimony met the Frye standard of admissibility. The judge did, however, engage in an extended colloquy with each live witness regarding his or her methods and conclusions, and concluded that he had conducted the type of inquiry required by Frye. The court also heard argument on the admissibility of testimony from the first trial by Drs. Done and Gross, transcripts of which were read to the jury in the second trial, and the videotaped deposition of Dr. Newman.

Dr. Done testified about four different sources of information he used to reach his conclusion that Bendectin caused Jeffrey Blum's clubfeet. First he considered "chemical structure analysis," and stated that the molecular structure of doxylamine (an antihistamine which he described as the part of Bendectin that is "harmful and teratogenic") makes the drug "suspect" as a "possible" teratogen. However, even if the science leading to these statements were valid, such statements would lack the certainty necessary to establish causation. Smail v. Flock, 407 Pa. 148, 180 A.2d 59 (1962) (it is not enough for an expert to say something could have happened or to guess; expert testimony must assert that the result came from the cause alleged).

*1320 Dr. Done then analyzed the effects of Bendectin on animal cells in in vitro studies, and testified about live animal (in vivo) studies, while conceding that such studies do not prove that a drug will have the same effect on humans, or on any individual. Dr. Done was "unsure" whether one could extrapolate the results from animals to humans, and acknowledged that based on animal studies alone, he could not determine to a reasonable degree of scientific certainty that Bendectin was teratogenic in humans.

Dr. Done finally testified about human, or epidemiological,[4] studies. Despite the fact that no published epidemiological studies demonstrated a statistically significant association between Bendectin and limb defects, Dr. Done found evidence that Bendectin causes clubfeet when he recalculated some data in one published study, the Heinonen study, even though the authors of the published study had reached the contrary conclusion.[5] As will be explained later, Dr. Done's "recalculation" was based on a methodology that was not generally accepted. Other human data used by Dr. Done in reaching his conclusions were derived from unreported preliminary data generated by Dr. Jick. Dr. Jick himself criticized this data as "biased, outdated, or premature and preliminary." Dr. Done admitted that it would be inappropriate for a scientist to rely upon such data. R. 5290a-91a. Finally, Dr. Done agreed that the FDA, after complete review in 1980, found that available evidence showed no basis for a conclusion that Bendectin causes or increases the risk of birth defects in humans.[6] It is important to note that Dr. Done has never published his conclusions in any scientific journal so as to enable his peers to evaluate them scientifically.

Dr. Gross also testified for the Blums, inferring from animal studies alone that Bendectin causes birth defects. He claimed to have found a dose/response relationship in the data. He testified that:

the overall conclusion is that in each of the studies, the agent on test, which was either bendectin, the three ingredients, bendectin or doxylamine succinate or one of its ingredients, can be regarded as teratogenic, in that it significantly affects and it increases the frequency of birth defects, resorption, death, in the totality of all of these studies... in sum total adds up to the same picture.... These agents interfere with normal development of the young.

Blum I, 385 Pa.Super. at 155, 560 A.2d at 214. However, even Dr. Done admitted that animal studies alone, without corroborating human data, are not reliable because drugs do not have the same effect on humans as on animals; there are 600 proven teratogens for animals and only 15 for humans. N.T. 12/12/86 at 179. Like Dr. Done, Dr. Gross never published his conclusions in a scientific journal.

In addition, the testimony of Stuart Newman, Ph.D., a molecular biologist, was presented through a videotaped deposition taken in an unrelated case. He studied Bendectin through the literature, and concluded that the drug was a human teratogen. Newman based his opinion on in vitro animal cell studies and his belief that Bendectin had a certain chemical solubility and structural similarities to antihistamines, which *1321 he identified as teratogenic. Like Drs. Done and Gross, Dr. Newman never published his conclusions in a scientific journal.

Finally, the Blums read selected portions of the first trial testimony of Dr. Paul Stolley, who testified that certain unpublished recalculations revealed that a woman exposed to Bendectin during the first 12 weeks of pregnancy is three times more likely to have a child with a malformation than a woman whose first exposure came later. However, Stolley also testified that no causal relationship can be inferred from occasional associations that have been shown in some studies but not in others. Blum I, 385 Pa.Super. at 159, 560 A.2d at 215.

Dr. Done was the only witness to testify on behalf of the Blums that Bendectin specifically caused Jeffrey Blum's clubfeet. He concluded that Bendectin is a teratogen and, to a reasonable degree of certainty, was "`capable of contributing to the formation of Jeffrey's limb malformation[;]' there `being no reasonable likelihood ... that there could be any other basis for explaining it.'" Blum I, 385 Pa.Super. at 154, 560 A.2d at 213.

The Trial Court's Opinion

The Honorable Mark I. Bernstein stated that he applied Frye to the scientific evidence presented in this case, and made inquiries during trial of several witnesses in an attempt to discover what methodologies were generally accepted in many relevant fields, including epidemiology, toxicology, and pharmacology. At the same time, he criticized the limits that Frye places on the factfinder, and opined that exclusion of the Blums' causation evidence, though it represented unorthodox ideas, would needlessly "castrate" the jury. The judge, who wrote a remarkably passionate opinion with a painstaking analysis of the many complicated issues raised in this case,[7] evidently believed that new scientific ideas which are not yet "generally accepted" nonetheless deserve a hearing in court during a jury trial in a tort action for money damages. The court had confidence in the power of cross-examination to expose incredible, unscientific and unreliable expert testimony for what it is.

The Frye test, however, is designed to ensure that unreliable scientific evidence never reaches—or misleads—the factfinder. Moreover, the trial court's conclusion that both the plaintiffs' and defendant's experts used the same methodology to reach different results is not supported by the record. The Blums simply did not meet their burden of proving that their experts' reasoning and methodology—let alone their conclusions—were generally accepted by the relevant scientific communities. Dr. Done's own testimony that his opinions were based on generally accepted methods was not sufficient to carry this burden. Commonwealth v. Apollo, 412 Pa.Super. at 461-62, 603 A.2d at 1028. Compare Commonwealth v. Middleton, 379 Pa.Super. 502, 550 A.2d 561 (Pa.Super.1988) (scientific testimony admissible where expert's opinion was not merely based on personal views, or views of a small segment of scientific community, but rather were generally accepted). Therefore, the Blums' evidence on causation should not have been admitted at trial. Without such evidence, the Blums could not meet their burden of proof in this tort case, and j.n.o.v. should have been entered.[8]

*1322 The legal process has always relied on cross-examination of the expert to test the veracity of the expert's testimony. However, in dealing with complex scientific theories, cross-examination is not the appropriate tool to test the speciousness or accuracy of the expert's testimony where the evidence on which that testimony is based is not deemed reliable.

The judge in considering admissibility does not decide whether the propositions or theories are true or false. Rather the judge as gatekeeper decides whether the expert is offering sufficiently reliable, solid, trustworthy science. The question is: is the science good enough to serve as the basis for the jury's findings of fact, or is it dressed up to look good enough, but basically so untrustworthy that no finding of fact can properly be based on it. If the latter is true, the integrity of the trial process would be tainted were the jury to consider it.

The Scientific Principles Underlying the Blums' Expert Testimony Do Not Meet the Frye Test

A close reading of the relevant cases yields two ways to analyze the question of whether the causation testimony proffered in this case meets the Frye/Topa standard. One focuses on whether the causal relationship is generally accepted by the scientific community, and the other on whether the methodology is generally accepted by the scientific community. The Blums' evidence is inadmissible under either analysis.

In McKenzie v. Westinghouse Elec. Corp., 674 A.2d 1167 (Pa.Cmwlth. 1996), the Pennsylvania Commonwealth Court focused on the causal relationship. The court was faced with facts and legal issues almost identical to those in this case. The McKenzies sought recovery from defendant Westinghouse for injuries suffered by their daughter, who was born with, and eventually died from, a heart malformation. The plaintiffs alleged that the defendant's electrical plant produced certain substances that contaminated drinking water used by Mrs. McKenzie during her pregnancy, leading to her daughter's birth defect. The McKenzies presented expert testimony purporting to prove that the chemicals (TCE and DCE) were teratogenic, while conceding that this view was not generally accepted in the scientific community. They argued, however, that the methodology used by their experts was generally accepted, and that their testimony establishing causation should therefore be admissible under Frye and Topa. Id., 674 A.2d at 1171. If the expert is wrong, the McKenzies asserted, it is up to the jury to determine that fact after hearing the evidence. Id. This is essentially the argument made by the Blums in this case, and accepted by the trial judge.

The Commonwealth Court rejected this argument, interpreting Frye and Topa to hold that "there must be a showing, not that the studies establishing the causal relationship follow generally accepted methodologies, but that the existence of the causal relationship is generally accepted by the relevant medical community." Id., 674 A.2d at 1172 (emphasis added). In other words, the McKenzie court held that the underlying scientific principle of the plaintiffs' expert testimony was that TCE and DCE are human teratogens that cause malformations in a developing fetus, and that since this principle was not generally accepted, it rendered the scientific evidence based upon it inadmissible. The court thus held that the trial court did not abuse its discretion in precluding the evidence, and granting summary judgment to the defendant. Id.

Although we are not bound by the Commonwealth Court's analysis in McKenzie, we do note that its application to the record in this case would yield the same results.[9] Not one published study of Bendectin has reached the conclusion espoused by the Blums' experts. The underlying scientific principle of their testimony—that Bendectin causes birth defects, and more specifically, that it caused Jeffrey Blum's clubfeet—is not generally accepted in the relevant scientific communities. See Blum I, 385 Pa.Super. at 155, 560 A.2d at 214 (Drs. Done and Gross "held fast to their opinions—despite the flaws which plagued some of the studies they relied upon and the existence of a body of *1323 contrary views held by others in the field"). In applying McKenzie, we conclude that if the critical "underlying principle" that must be generally accepted before scientific evidence is admissible is that "Bendectin causes clubfeet," the Blums' causation evidence is not reliable, and inadmissible under Frye/Topa.

Turning to the second type of analysis used by Pennsylvania courts in applying Frye, we consider whether the methodology underlying the proffered expert testimony is generally accepted by the scientific community. In this case, Merrell Dow challenged the admissibility of the Blums' scientific evidence on the basis that the methodology utilized by their expert witnesses was not generally accepted.[10] The trial court asked questions of the live witnesses that satisfied him that Drs. Done, Gross and Newman used the same methods and techniques as the defense experts, and that these methods—chemical structure analysis, in vitro studies, in vivo studies and epidemiological studies—were generally accepted. The court thus concluded that their opinions on causation met the Frye standard for admissibility.

It is true that even Merrell Dow's witnesses agreed that each of these types of analysis had scientific validity and could contribute to an accurate picture of scientific fact. However, Merrell Dow argues that it is the way in which these studies were used by the Blums' experts that is not generally accepted.

We are persuaded by this argument. Testimony by a qualified expert "doesn't become `scientific knowledge' just because it's uttered by a scientist; nor can an expert's self-serving assertion that his conclusions were `derived by the scientific method' be deemed conclusive." Daubert, 43 F.3d at 1315-16. We are faced in this case with the use by scientists of statistical probabilities, and complicated epidemiological evidence. The Ninth Circuit cogently observed in Daubert that "scientists often have vigorous and sincere disagreements as to what research methodology is proper, what should be accepted as sufficient proof for the existence of a `fact,' and whether information derived from a particular method can tell us anything useful about the subject under study." 43 F.3d at 1316. Under this analysis, as distinguished from McKenzie, we do not ask whether the expert's conclusions regarding the teratogenic effects of Bendectin are generally accepted. Rather, we consider the "underlying principle" which must be generally accepted to be that the methods used by the experts to arrive at their conclusions actually give an accurate prediction of human teratogenicity.

The methodology used to assess the teratogenicity of drugs is more complex than simply collecting certain types of data, i.e., from chemical structure analysis, in vitro and in vivo studies, and re-analysis of epidemiological studies. Replicated epidemiological studies consistently finding a strong association are necessary to establish causation; chemical structure analysis and in vitro testing can confirm the biological plausibility of a causal relationship suggested by epidemiology, but without an epidemiologically demonstrated association, they contribute nothing to the demonstration of causation. Richardson v. Richardson-Merrell, Inc., 857 F.2d 823 (D.C.Cir.1988). Animal studies can also provide evidence suggestive of causation. However, animal studies without epidemiological studies cannot prove causation in humans because drugs do not have the same effect on humans as they do on animals; the doses given to animals in animal studies are very different from those given to humans. Even Dr. Done admitted that animal studies would not be sufficient to prove that Bendectin is teratogenic in human beings. N.T. 12/12/86 at 178. The fact that a few of the animal species tested in studies discussed in this case developed some kinds of birth defects after being given many times the human dose of Bendectin cannot substitute for the lack of epidemiological evidence that Bendectin causes clubfeet in humans. No epidemiological study of Bendectin concludes that there is a statistically significant relative risk high enough to support a claim of general causation of clubfeet.

Epidemiology deals with population samples and seeks to generalize those results; it goes from the specific, i.e., a sample, to the *1324 general, i.e., a population. Epidemiology provides useful information as to whether there is a relationship between an agent and a disease and, when properly interpreted, can provide insight into whether the agent can cause the disease. Dr. Done did not perform an epidemiological study. What he did do was reanalyze the results of other studies. Expert testimony may indeed be based upon data of a type "customarily relied upon by experts in the practice of their profession." Primavera v. Celotex Corp., 415 Pa.Super. 41, 608 A.2d 515, 518 (1992). However, in this case, Dr. Done did not properly utilize the data he relied upon to reach his controversial conclusions.

While epidemiologists choose their data and engage in statistical analysis in order to ensure that their experimental populations are not biased, Dr. Done did not. The Heinonen study used by Dr. Done included data from women who gave birth to children with clubfeet. Some of the women chosen for the study had taken Bendectin. Dr. Heinonen standardized the raw data for factors such as hospital mix,[11] ethnic group, and age of the mother in order to remove bias. This means the study—as a statistical, epidemiological test—was designed to account for certain systemic errors such as selection bias and confounding bias. Selection bias occurs when the subjects chosen to participate in the study are not representative of the general population. Confounding bias occurs when a study fails to control adequately for certain factors that vary between groups and, as a result, the groups being compared are not truly similar. Standardizing the data can adjust for these biases. Dr. Heinonen's published conclusions based on standardized data did not support plaintiffs' claim that Bendectin caused clubfeet.

Dr. Done eliminated all standardization in the Heinonen study data and thereby calculated a crude relative risk to support his opinion on causation, using "simple arithmetic." His conclusions, as a result, are not free of the bias—or "background noise"— that is taken into account in valid epidemiological studies. Epidemiological analyses that are not standardized are not generally accepted.
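The difference between a crude and a standardized relative risk can be illustrated with a minimal sketch. It is offered only as an illustration and rests on several assumptions: every figure below is invented and none is drawn from the Heinonen study or the record; the names `Stratum`, `crude_relative_risk`, and `mantel_haenszel_relative_risk` are hypothetical; and the Mantel-Haenszel estimator is used as one common way of adjusting across strata, not as the method Dr. Heinonen actually applied, which the record discussed here does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Counts for one hospital (hypothetical figures, not from any study)."""
    name: str
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def crude_relative_risk(strata):
    """Pool all strata and compare incidence in exposed vs. unexposed ("simple arithmetic")."""
    exposed_cases = sum(s.exposed_cases for s in strata)
    exposed_total = sum(s.exposed_total for s in strata)
    unexposed_cases = sum(s.unexposed_cases for s in strata)
    unexposed_total = sum(s.unexposed_total for s in strata)
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def mantel_haenszel_relative_risk(strata):
    """Stratum-adjusted risk ratio (Mantel-Haenszel), an assumed stand-in for standardization."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        stratum_size = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / stratum_size
        denominator += s.unexposed_cases * s.exposed_total / stratum_size
    return numerator / denominator


# Hypothetical "hospital mix": the drug is used more often at the hospital
# whose patients have a higher baseline rate of the defect, yet within each
# hospital the exposed and unexposed groups have identical incidence.
HYPOTHETICAL_STRATA = [
    Stratum("Hospital A (higher baseline risk)", 40, 800, 10, 200),
    Stratum("Hospital B (lower baseline risk)", 2, 200, 8, 800),
]

crude = crude_relative_risk(HYPOTHETICAL_STRATA)
adjusted = mantel_haenszel_relative_risk(HYPOTHETICAL_STRATA)
print(f"crude relative risk:    {crude:.2f}")
print(f"adjusted relative risk: {adjusted:.2f}")
```

On these invented figures the crude relative risk is roughly 2.33 while the adjusted relative risk is 1.0. The apparent association comes entirely from the difference in patient mix between the two hospitals, which is the kind of bias that standardization is designed to remove.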

Moreover, he has never seen fit to publish his conclusions.[12] His methods call to mind the statistical evidence rejected in Crews, where our supreme court held that the fact-finder would be misled rather than aided by statistical evidence that has not "achieved widespread acceptance within the scientific community." Crews, 536 Pa. at 522, 640 A.2d at 402.

Although the general types of studies relied on by the Blums' experts are universally *1325 accepted as good science, the way they have utilized them to draw conclusions is not. Results derived from chemical analysis, in vitro and in vivo studies do not yield sufficiently reliable conclusions as to causation unless supported by epidemiological evidence. Dr. Done, who was the only witness to testify specifically that Jeffrey Blum's clubfeet were caused by Bendectin, relied on epidemiological evidence. But his elimination of standardization from the epidemiological analysis made the epidemiological methodology not generally accepted. To be more specific, his epidemiological analysis was so flawed as to render his conclusions unreliable and therefore inadmissible.

Because Dr. Done's conclusions were too inherently unreliable to be submitted to the jury and no other witness proffered by the Blums would have testified that Bendectin caused Jeffrey's clubfeet, the conclusion is inescapable that the Blums failed to present properly admissible evidence raising a jury question as to causation.

Conclusions

It is true that effective cross-examination is a powerful tool, and suffices to reveal the weaknesses in a witness's testimony where the lay jury is faced with common-sense questions of credibility or abilities of observation. However, the complex, confusing and possibly misleading details of scientific testimony do not so readily lend themselves to accurate assessment by even the most discerning jury. Much of such testimony is sophisticated and difficult to comprehend, and an analysis of the scientific validity of the methodologies underlying the testimony is simply beyond the capabilities of most lay persons. Therefore, the gatekeeping role of the court, far from detracting from the jury's function, is in fact essential to it: scientific methodology and conclusions must initially be scrutinized by the court to ensure that what might appear to the jury to be science is not in fact speculation in disguise. Properly supported scientific evidence, however complex, can then reach the jury for its consideration, while material whose complexity merely hides its unreliability is winnowed out. This is, in essence, the teaching of Frye, and that teaching remains valid.

As we have demonstrated, the evidence submitted by the Blums on the issue of causation fails to meet the requirements of Frye, and the trial court therefore abused its discretion in admitting it.

Without such evidence, the appellees could not meet their burden of proof in this tort action. We have reviewed the record and decide that j.n.o.v. should have been entered on it; there is nothing to be gained by returning the matter for a third trial. The enormous record in this case, including more than 7,000 pages from two trials spanning 16 weeks, contains all relevant expert testimony proffered by the Blums, and we have reached our holding through an application of the relevant law to that ponderous record. We reverse the trial judge's decision denying j.n.o.v., and remand with instructions to the trial court to enter that judgment in favor of Merrell Dow.

Reversed and remanded for entry of judgment n.o.v.

NOTES

[1] A "teratogen" is an agent that causes the production of physical defects in the developing embryo.

[2] We are aware that the Frye test has been superseded in the federal courts by the Federal Rules of Evidence, and the interpretation of these rules set forth by the United States Supreme Court in Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579, 113 S. Ct. 2786, 125 L. Ed. 2d 469 (1993). See also Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F.3d 1311 (9th Cir.1995) (new test for admissibility of scientific evidence applied on remand from United States Supreme Court; scientific evidence inadmissible under Federal Rules of Evidence). We are not bound by the Federal Rules of Evidence, and our own supreme court has confirmed in a recent footnote that Daubert, interpreting and applying the Federal Rules of Evidence, does not control our analysis of the admissibility of scientific evidence. Commonwealth v. Crews, 536 Pa. 508, 518, 640 A.2d 395, 400 n. 2 (1994). It is not clear what effect, if any, the adoption of codified Pennsylvania Rules of Evidence, modeled quite closely on the Federal Rules, would have on this matter. Be that as it may, the Pennsylvania Supreme Court stopped short of expressing an interest in abrogating the Frye/Topa rule. Crews, supra; Dalrymple v. Brown, 549 Pa. 217, 701 A.2d 164 (1997) (Newman, J. concurring). See also McKenzie v. Westinghouse Elec. Corp., 674 A.2d 1167, 1172 n. 4 (Pa.Cmwlth. 1996), alloc. denied, 547 Pa. 733, 689 A.2d 237 (1997) (no Pennsylvania case law has addressed the effect of Daubert on the continued viability of the Frye standard in Pennsylvania). We therefore proceed in this case under the Frye rule, although as discussed in more detail infra at footnote 8, the application of Daubert would not, in our opinion, alter the outcome here.

[3] The Blums argue that Merrell Dow waived any arguments on this issue because it failed to challenge the evidence at the first trial, and on appeal after the first trial. It is true that neither this court nor the Pennsylvania Supreme Court considered any argument by Merrell Dow on the admissibility of the Blums' causation evidence in Blum I. Blum v. Merrell Dow Pharmaceuticals, Inc., 385 Pa.Super. 151, 560 A.2d 212 (1989), aff'd, 534 Pa. 97, 626 A.2d 537 (1993). The Blums claim that these decisions effectively estopped Merrell Dow from ever raising this issue during retrial, and that it is likewise waived in this appeal. We disagree. The grant of a new trial wipes the slate clean of the former trial. Commonwealth v. Oakes, 481 Pa. 343, 392 A.2d 1324, 1326 (1978). The case is left as though no trial had been held. Id. (citing Commonwealth v. Hart, 479 Pa. 84, 387 A.2d 845 (1978)). The new trial should be "unfettered by the rulings, pro or con, made at the first trial, and with the right to have new rulings on evidence, points for charge and other matters which arise in the course of a trial." Oakes, 481 Pa. at 348, 392 A.2d at 1327. Nor does the rule of Dilliplaine v. Lehigh Valley Trust Co., 457 Pa. 255, 322 A.2d 114 (1974), foreclose Merrell Dow's claim on appeal; that rule applies only where error at trial has been waived such that the trial court was never given an opportunity to remedy its own errors, and where an appellate court does not have an appropriate record to consider on appeal. In the instant appeal, we must concern ourselves with claimed errors that have been preserved in the second trial, which was a wholly new trial. See Commonwealth v. Throckmorton, 241 Pa.Super. 62, 359 A.2d 444 (1976) (no waiver of issue raised for first time at new trial, even though issue not raised or preserved by objection at first trial).

[4] Epidemiology is the study of the distribution and determinants of disease in human populations. Blum I, 385 Pa.Super. at 157, 560 A.2d at 214-15. Epidemiologists consider whether causation may be inferred by comparing the incidence of a disease in a group of humans who have been exposed to the substance in question with the incidence in a group of humans who have not been exposed to the substance. This ratio is described as an "odds-ratio" or "relative risk." R.2097a, 2098a, 2123a.

[5] In fact, the authors cautioned the reader of the raw data taken from their appendix, since "the positive associations even when striking are uninterpretable without independent confirmatory evidence. Also, estimates of statistical significance are improper and none are presented.... It must be reemphasized that the data presented in this appendix cannot be used to infer that a particular drug causes a particular malformation or group of malformations without independent confirmation." Birth Defects and Pregnancy, Heinonen, Slone and Shapiro. R. 5078. Even Dr. Done agreed that a "positive association" is distinguished from one that is "statistically significant."

[6] To this day, Bendectin remains approved by the FDA for sale in the United States; it was voluntarily withdrawn from the market by Merrell Dow in 1983. Apparently no decrease in the incidence of birth defects was detected when Bendectin was removed from the market.

[7] Judge Bernstein's 62-page opinion is accompanied by three appendices (totaling another 20 pages), entitled "Scientific Uniformity," "Science and Justice," and "Frye, Daubert, Policy and Pennsylvania Law."

[8] Even were we to analyze the evidence under the new federal standard, as set forth in Daubert v. Merrell Dow, 43 F.3d 1311, we would reach the same conclusion. Although the Daubert standard would appear to be more "open" or flexible than Frye with regard to the admission of scientific evidence, federal courts utilizing the Daubert test have uniformly decided that the evidence proffered by plaintiffs like the Blums to prove that Bendectin is teratogenic, most particularly the testimony of Alan Done, is not admissible. Daubert, supra. See, e.g., Lust v. Merrell Dow Pharmaceuticals, Inc., 89 F.3d 594 (9th Cir.1996) (Dr. Done's causation testimony properly excluded by trial court); Ealy v. Richardson-Merrell, Inc., 897 F.2d 1159 (D.C.Cir.1990) (j.n.o.v. should have been entered in Bendectin/birth defect case because there was insufficient evidence of causation); Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307 (5th Cir.1989) (plaintiffs did not present sufficient evidence to allow jury to make reasonable inference that Bendectin caused limb reduction defect); DeLuca v. Merrell Dow Pharmaceuticals, Inc., 791 F. Supp. 1042 (D.N.J.1992) (Dr. Done's testimony excluded in Bendectin/limb reduction defect case); Ambrosini v. Richardson-Merrell, Inc., 1989 WL 298429 (D.D.C.1989).

[9] We also note that the Pennsylvania Supreme Court denied review of the Commonwealth Court's decision. McKenzie v. Westinghouse Elec. Corp., 547 Pa. 733, 689 A.2d 237 (1997).

[10] This is the pivotal question under a Daubert analysis. 43 F.3d at 1318.

[11] Some hospitals have older, sicker, poorer patients than do other hospitals, e.g., an inner city hospital as compared to a suburban hospital. The race, diet, and prenatal medical care, among many other things, are likely to vary. Standardization of data across hospitals allows researchers to take into account those different patient "mixes." Because failing to standardize hospital-based studies would inject biases into the results of any study, such failure is inconsistent with the generally accepted practices of epidemiology. See, e.g., 55 Pa.Code § 1163.126(b) (requiring standardization as part of the formula for reimbursing hospitals under Medicaid).

[12] Publication is a critical factor in the Daubert analysis, but it also serves to illuminate the question under Frye of whether a particular method or conclusion is generally accepted in the relevant scientific community. The reason publication and peer review are essential is that the basis for the expert's testimony is exposed to evaluation by knowledgeable scientists. It is inadequate for an expert to self-certify his own methodology and results. An important function of peer review is to screen out totally unsubstantiated or bogus science. While peer review does not ensure the validity of scientific findings, it subjects the theories or experiments to careful scrutiny.

It has been said that peer review prevents nonconventional, scientific theories from entering the scientific debate. This is not an accurate criticism. In every field there is a wide array of scientific periodicals. An unconventional experiment or theory or an unknown scientific author may find the pages of premier journals unavailable, but this does not mean that other publications in the field will not publish such material and thereby generate robust scientific debate. The revolutionary scientific advances of the twentieth century would not have occurred if scientific publications of all stripes did not encourage both conventional and non-conventional science.

However, where the science has not been exposed to peer review the courts must examine with caution whether the expert's scientific observations and conclusions are based on the accepted scientific method before admitting or rejecting such evidence.
