This appeal requires us to clarify and apply the harmless error test applicable to civil trials in our circuit.
I.
Appellant, Ronald L. Obrey, Jr., originally filed suit for declaratory and injunctive relief, alleging that he was twice denied a promotion to the position of Production Resource Manager at the Pearl Harbor Naval Shipyard (hereinafter, the “Shipyard”) on the basis of his race in violation of Title VII of the Civil Rights Act of 1964, as amended, 42 U.S.C. § 2000e et seq. (2000). Obrey alleged that the defendant, the Secretary of the Navy, had engaged in a pattern or practice of discriminating against qualified candidates of Asian-Pacific ancestry in favor of Caucasian applicants for senior management positions at the Shipyard. In a pre-trial hearing, the district court issued several evidentiary rulings excluding the principal evidence supporting Obrey’s pattern or practice claim. After a jury trial, judgment was entered against Obrey. The district court’s evidentiary rulings form the basis for this appeal.
The Pearl Harbor Shipyard is one of four Navy shipyards operated by the Navy organizational unit, the Naval Sea Systems Command. Obrey, an Asian-Pacific Islander, worked as a Project Superintendent at the Shipyard from 1995 to 2002. In 2002, Obrey applied for the Production Resource Manager (“PRM”) position at the Shipyard, a position which carried a promotion from his current grade level of GM-14 to a GS-15 grade. Nine other individuals also applied. Pursuant to Navy guidelines, the applicants were rated in three categories: relevant knowledge, ability to plan and manage resources, and ability to perform supervisory management functions. On the basis of this rating, Obrey was ranked sixth out of ten applicants during the first round of hiring, and fifth out of the eight competitive applicants in the second round. The PRM position was subsequently offered to Ernest Chamberlain in the first round of hiring, and then to David Reilly in the second, both of whom are Caucasian males and both of whom declined the offer. Recruitment was then cancelled.
In this appeal, Obrey claims that the district court abused its discretion in failing to admit three pieces of evidence: (1) a statistical report showing a correlation between race and promotion at the Shipyard; (2) the testimony of a Shipyard employee who recalled conversations in which Shipyard officials expressed discriminatory bias toward the local Asian-Pacific Islanders; and (3) the anecdotal testimony of three Shipyard employees who also believed they had suffered race discrimination at the Shipyard. The Navy argues that the exclusion was proper but that, even if the district court erred, the error was harmless. Addressing each evidentiary ruling in turn, we find that the district court’s decision excluding this evidence was an abuse of discretion as to all. We further conclude that the error was not harmless.
A.
The district court denied Obrey’s motion in limine to admit statistical evidence regarding hiring practices for senior-level positions at the Shipyard. The hiring practice evidence at issue was compiled through discovery and included the hiring history of the Pearl Harbor Shipyard for the period 1999-2002. Obrey retained James Dannemiller, a statistician with SMS Research & Marketing Services, Inc., to analyze this data and provide a statistical report and opinion. Dannemiller’s report concludes that “[t]here is no statistical evidence ... that the selection process for GS13 through GS15 positions between 1999 and 2002 were unbiased with respect to race.”
The government challenged the admission of Dannemiller’s report on the ground that it was so incomplete that it was inadmissible as irrelevant, unfairly prejudicial, and unreliable. See Fed. R. Evid. 402, 403, 702. In the government’s view, the statistical analysis was inadmissible because it failed to account for the relative qualifications of the applicants being studied. The district court denied Obrey’s motion to admit Dannemiller’s statistical evidence. Although the court did not specify its reasons, presumably its ruling was based on the perceived irrelevance and unreliability of the statistics. While we review evidentiary rulings for an abuse of discretion,
Coursen v. A.H. Robins Co., Inc.,
Obrey’s claim was premised on the theory that the Navy had engaged in a pattern or practice of discriminatory hiring practices. Employment discrimination claims styled in this manner are governed by “controlling legal principles that are relatively clear.”
Int’l Bhd. of Teamsters v. United States,
As the plaintiff, Obrey bore the initial burden of making out a prima facie case of discrimination.
Cooper v. Fed. Reserve Bank of Richmond,
In a case in which the plaintiff has alleged that his employer has engaged in a “pattern or practice” of discrimination, “Statistical data is relevant because it can be used to establish a general discriminatory pattern in an employer’s hiring or promotion practices. Such a discriminatory pattern is probative of motive and can therefore create an inference of discriminatory intent with respect to the individual employment decision at issue.”
Diaz v. Am. Tel. & Tel.,
Obrey’s statistical evidence was not rendered irrelevant under Rule 402 simply because it failed to account for the relative qualifications of the applicant pool. See Fed. R. Evid. 402 (“All relevant evidence is admissible, except as otherwise provided [by law]. Evidence which is not relevant is not admissible.”). A statistical study may fall short of proving the plaintiff’s case, but still remain relevant to the issues in dispute. The Dannemiller study may be relevant, and therefore admissible, even if it is not sufficient to establish Obrey’s prima facie case or a claim of pretext. Thus, objections to a study’s completeness generally go to “the weight, not the admissibility of the statistical evidence,”
Mangold v. Cal. Pub. Utils. Comm’n,
Statistics showing racial or ethnic imbalance are probative ... because such imbalance is often a telltale sign of purposeful discrimination;.... Considerations such as small sample size may, of course, detract from the value of such evidence, and evidence showing that the figures for the general population might not accurately reflect the pool of qualified job applicants would also be relevant.
Teamsters,
In some cases, statistical evidence may suffer from serious methodological flaws and can be excluded, consistent with the trial court’s “gatekeeping” power, under Rule 702.
See Kumho Tire Co. v. Carmichael,
Here, the Dannemiller study is based entirely on statistical disparities. While we, and other courts, have commented on the inadequacy of such studies, we have typically done so in the context of finding insufficient evidence to support a prima facie case of discrimination, and not to rule those studies inadmissible for purposes of Rule 702.
See, e.g., Coleman v. Quaker Oats Co.,
In sum, Dannemiller’s study was relevant for what it purported to analyze: the race of managers selected at the Shipyard compared to the race of those who applied for managerial positions. While, by itself, this cannot constitute proof that the Navy discriminated against Obrey, see Cooper,
B.
The district court also excluded the testimony of a single Shipyard worker, Mr. Toyama, on the grounds that his evidence was irrelevant. Fed. R. Evid. 401 (“ ‘Relevant evidence’ means evidence having any tendency to make the existence of any fact that is of consequence to the determination of the action more probable or less probable than it would be without the evidence.”). Toyama was expected to testify that Shipyard officials had informed him that off-yard employees were rotated to Pearl Harbor on a temporary basis because the “local” workers “were not good enough” and “can’t do a good job.”
Toyama’s testimony was plainly relevant to the issue of whether the defendant preferred off-yard, predominantly Caucasian, workers over the “local” Asian-Pacific Islanders. We have observed that “evidence that the defendant has made disparaging remarks about the class of persons to which plaintiff belongs[ ] may be introduced to show that the defendant harbors prejudice toward that group.”
Lam v. Univ. of Haw.,
Toyama’s testimony was also relevant to whether the Navy’s proffered race-neutral reasons for preferring off-yard workers were a pretext for unlawful race discrimination. Obrey asserts that Toyama also would have challenged the Navy’s claim that off-yard managers were more capable of performing their tasks within the Shipyard’s budget by demonstrating that the imported managers were funded by budgeted funds separate and apart from the Shipyard’s budget. According to Obrey, this testimony would have cast doubt on the Navy’s explanation by demonstrating that the off-yard managers exerted no effect whatsoever on the Shipyard’s budget.
Because Toyama’s testimony tended to make the existence of discriminatory bias and pretext more probable than it would be without his testimony, we find that the district court abused its discretion by excluding this evidence.
C.
The district court also excluded the testimony of three Shipyard workers, Kawachi, Pestaña and Tai See, who were prepared to testify that the Shipyard discriminated against them on the basis of race when it failed to select them for supervisory positions. The court found that the testimony at issue would require the jury to assess the discrimination claims of each of the three proposed witnesses by, essentially, conducting three abbreviated employment discrimination trials. The court concluded that the testimony should be excluded on the basis of Federal Rule of Evidence 403, presumably because considerations of undue delay and waste of time outweighed its probative value. See Fed. R. Evid. 403 (“Although relevant, evidence may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury, or by considerations of undue delay, waste of time, or needless presentation of cumulative evidence.”).
Like statistical evidence, anecdotal evidence of past discrimination can be used to establish a general discriminatory pattern in an employer’s hiring or promotion practices. While such evidence might prove inadmissible in the typical case of individual discrimination, in a case involving a claim of discriminatory pattern or practice “the combination of convincing anecdotal and statistical evidence is potent.”
Coral Constr. Co.,
We recognize, however, that the district court retains broad discretion to determine whether the probative value of the evidence at issue is substantially outweighed by considerations of “undue delay, waste of time, or needless presentation of cumulative evidence.” Fed. R. Evid. 403;
see also R.B. Matthews, Inc. v. Transamerica Transp. Servs., Inc.,
We acknowledge that the trial court was properly concerned with the prospect of mini-trials on the witnesses’ own claims of discrimination. The trial court should have first addressed these concerns with the parties through other, less restrictive means. On balance, we believe that this proposed testimony was likely to be relevant, and Rule 403 considerations do not warrant exclusion in this case. Consequently, we find that the district court abused its discretion when it excluded this testimony. On remand, the district court, of course, will retain discretion to decide that the witnesses’ claims so overwhelm the issues in the trial that their testimony must be excluded under Rule 403.
II.
Turning to the question of harmless error, we note, initially, that judicial error alone does not mandate reversal. Rather, in order to reverse, we must find that the error affected the substantial rights of the appellant. See Fed. R. Evid. 103(a) (“Error may not be predicated upon a ruling which admits or excludes evidence unless a substantial right of the party is affected . . . .”); Fed. R. Civ. P. 61 (“The court at every stage of the proceeding must disregard any error or defect in the proceeding which does not affect the substantial rights of the parties.”). In other words, we require a finding of prejudice.
See Kisor v. Johns-Manville Corp.,
In a somewhat contradictory fashion, however, we have formulated two variations of the test for prejudice in civil cases. In Haddad, we held that the reviewing court must find prejudice unless it concludes that the verdict is “more probably than not untainted by the error.” Id.
Purporting to restate the standard set forth in Haddad, we later wrote in Kisor, that “[t]o reverse, we must say that more probably than not, the error tainted the verdict.” Kisor,
Making matters worse, we have inconsistently applied Haddad and Kisor.
We have cited both without recognizing the contradiction.
See, e.g., Baker v. Delta Air Lines, Inc.,
We must follow Haddad.
We believe that our contrary language in Kisor inadvertently reversed the presumption of prejudice observed in Haddad. See Pau,
Apart from its precedential pedigree, we adopt Haddad’s formulation of the harmless error standard for the additional reason that we believe it to be correct on the merits. First, Haddad is in keeping with “the original common-law harmless-error rule [that] put the burden on the beneficiary of the error either to prove that there was no injury or to suffer a reversal of his erroneously obtained judgment.” Chapman v. California,
Second, we recognized in Haddad that “appellate courts have three possible standards of review: harmless beyond a reasonable doubt; high probability of harmlessness; and more probably than not harmless.”
Third, presuming prejudice, rather than harmlessness, is required by Supreme Court precedent. In O’Neal, the Court rejected both the premise and conclusion of the argument that a presumption of harmlessness applies in civil cases and that therefore such a presumption should apply in habeas cases. The Court held:
[P]recedent suggests that civil and criminal harmless-error standards do not differ in their treatment of grave doubts as to the harmlessness of errors affecting substantial rights.... [E]ven if, for argument’s sake, we were to assume that the civil standard for judging harmlessness applies to habeas proceedings (despite the fact that they review errors in state criminal trials), it would make no difference with respect to the matter before us. For relevant authority rather clearly indicates that, either way, the courts should treat similarly the matter of “grave doubt” regarding the harmlessness of errors affecting substantial rights, and as Kotteakos provides.
O’Neal,
Thus, when reviewing the effect of erroneous evidentiary rulings, we will begin with a presumption of prejudice. That presumption can be rebutted by a showing that it is more probable than not that the jury would have reached the same verdict even if the evidence had been admitted.
Haddad,
Applying this standard to the facts before us, the Navy would have us hold that it is more probable than not that the district court’s erroneous exclusion of evidence probative of its alleged discriminatory bias and pretext did not taint the jury’s verdict. Although recognizing the burden that an additional trial would place on the parties, we decline to do so.
As we noted in Haddad: “The danger of the harmless error doctrine is that an appellate court may usurp the jury’s function, by merely deleting improper evidence from the record and assessing the sufficiency of the evidence to support the verdict below.”
We cannot conclude, based upon the facts of this case, that the erroneous exclusion of evidence directly probative of the defendant’s discriminatory bias and pretext did not taint the jury’s verdict. The evidence at issue was not merely tangential or cumulative; rather, it was directly probative of the central issues in dispute. Although the Dannemiller study is in the record, neither Toyama nor the three Shipyard workers actually testified; we know only what Obrey claimed they would say. We are reluctant to judge a fact-intensive case on the basis of mere proffers of evidence. We thus cannot state that it is more probable than not that the jury was unaffected by the erroneous exclusion of the plaintiff’s principal evidence. Accordingly, we hold that the district court’s erroneous exclusion of the Dannemiller study, the testimony of Mr. Toyama, and the anecdotal testimony of three Shipyard workers was an abuse of discretion requiring reversal. The erroneous exclusion was not harmless.
III.
For the foregoing reasons, the judgment of the district court is REVERSED and the case is REMANDED for proceedings consistent with this opinion.
Notes
. The Navy argues that Obrey abandoned his pattern or practice claim at trial. If Obrey did so, it was because the trial court excluded his evidence. Any abandonment was compelled and was not a waiver of the claim.
. The Supreme Court has suggested on several occasions that a statistical comparison is a valuable tool with which to evaluate a claim of employment discrimination.
See, e.g., Furnco Constr. Corp. v. Waters,
. Rule 702 provides:
If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles and methods, and (3) the witness has applied the principles and methods reliably to the facts of the case.
Fed. R. Evid. 702.
. The Navy argues that these comments were not directed at the “locals” (meaning the Asian-Pacific Islanders) but were critical of the general efforts of all Navy employees at the Shipyard. The inferences to be drawn from these comments should be resolved by a jury.
