MEMORANDUM OPINION
Before the Court is defendant’s motion to exclude the testimony of plaintiffs’ statistical expert pursuant to
Daubert v. Merrell Dow Pharmaceuticals, Inc.,
In essence, the parties invite this Court to become enmeshed in a classic “battle of the experts,” but courts are well advised to avoid such a role, absent a showing that the challenged evidence will prove either unreliable or unhelpful to the trier of fact.
See, e.g., Dukes v. Wal-Mart, Inc.,
ANALYSIS
I. Legal Standard
Daubert made clear that expert testimony should not be considered in a case unless the expert has genuine expertise and the testimony will assist the trier of fact to understand or determine a fact in issue.
If the Court finds Siskin’s opinions to be clearly unreliable, it may disregard his reports in deciding whether plaintiffs have created a genuine issue of material fact.
Munoz v. Orr,
However, “the question before [the Court] is not whether the reports proffered by plaintiffs prove the entire case; it is whether they were prepared in a reliable and statistically sound way, such that they contained relevant evidence that a trier of fact would have been entitled to consider.”
Id. at 425. “No one piece of evidence has to prove every element of the plaintiffs’ case; it need only make the existence of ‘any fact that is of consequence’ more or less probable.” Id. (citing Fed.R.Evid. 401). Thus, it may be the case that although the expert’s analysis is admissible, it is nonetheless insufficient to establish a prima facie case of discrimination.
See, e.g., Scales v. George Wash. Univ.,
No. 89-0796,
The party offering the expert’s testimony must establish by a preponderance of the evidence that the expert testimony is admissible and that the expert is qualified.
See Meister,
II. Defendant’s Criticisms of Siskin and His Work
A. Siskin’s Qualifications to Testify About the Analyses of Data
Defendant claims that Siskin’s “unhesitating acceptance” of the data provided to him and the analyses done by his staff “violate[ ] the principles of the scientific method,” and on that basis, he must be excluded as lacking the requisite level of reliability. (Def.’s Mot. at 9.) Specifically, defendant argues:
1. Siskin is not an expert in computer programming (see id. at 5 (citing Def.’s Ex. A1 [Siskin dep.] at 134)), and he allegedly “cannot review any computer program language for accuracy.” (Def.’s Reply at 5.) Instead, upon receiving the electronic data from Sodexho, “someone other than Siskin created and ran computer programs, which in turn generated printouts for Siskin to review.” (Def.’s Mot. at 7.) “Shockingly,” Siskin relied on an online discussion in deciding how to structure part of an analysis using the STATA statistical program, and allegedly no one on his team is sufficiently familiar with STATA, the program used to analyze the stratified sample data in the logistic regressions. (Id. at 9; Def.’s Reply at 7 & n. 5.)
2. Siskin did not determine whether the programmers conducted the analyses he requested and cannot attest to the accuracy of the computer output upon which he relied in drafting his reports. (Def.’s Mot. at 7-8 (citing his testimony that “hopefully his staff did it correctly,” and “the other side will ... make all the corrections and pick up any errors that occur.”)) Moreover, he relied on analyses generated by “unidentified programmers.” (Id. at 9.)
Essentially, defendant argues that Siskin’s reports lack foundation and are unreliable. But defendant’s protestations amount to disputed factual issues, which are insufficient as a matter of law to warrant his exclusion. For instance, Siskin testified that, although he is not familiar with STATA programming, he is familiar with the techniques used in structuring an analysis for the program, and the results were always reviewed for errors after running a program. (Def.’s Ex. A1 [Siskin dep.] at 134-36.) As plaintiffs point out, many senior statisticians design tests, but for various reasons, including costs to the client, do not personally run them, but instead rely on their assistants to do so, reviewing their output to ensure that the test was properly conducted. (Pls.’ Opp. at 10.) (See also Pls.’ Ex. F (Haworth dep. at 16 in Bryant v. George Wash. Univ., No. 94-5522 (D.D.C. Mar. 21, 1995)) (“A: [U]sually I will work with one of the three senior economists to help get the work done in a reasonable cost effective way. Q: Do you have people to do programming for you? A: Yes.”); Pls.’ Ex. H (Haworth dep. in instant case at 56 (“It probably would be more efficient to have someone else [check back-up data] because it’s been a long time since I programmed.”)); see generally Pls.’ Exs. G & H (naming various programmers and assistants on whom Haworth relies).)
As a Ph.D. statistician, past chairman of the Temple University Department of Statistics, and an expert with decades of experience performing analyses such as those at issue in the instant case (see Pls.’ Ex. A(2) at ¶ 1), Siskin need not personally be an expert in STATA in order to be a qualified expert under Daubert.
Rather, pursuant to Fed.R.Evid. 703, an expert may rely on any facts or data “of a type reasonably relied upon by experts in the particular field,” including facts, data, and opinions that are otherwise inadmissible. This includes relying on one’s assistants to carry out analyses that the expert designed.
See Astra Aktiebolag v. Andrx Pharmaceuticals, Inc.,
In
Derrickson v. Circuit City Stores, Inc.,
No. 95-3296,
In his report, Dr. Medoff presented various tables regarding compensation and promotions at Circuit City, and he based his opinion in part on the information in those tables. That information, however, is not the raw data produced by Circuit City. Instead, the tables are based on data that was subjected to various selection, aggregation and weighting processes performed by Dr. Medoff’s assistant. Dr. Medoff testified at deposition that he told his assistant in general terms what manipulations and analyses he wanted performed on the data. His assistant then wrote a series of computer instructions using a commercially available statistics program that analyzed the data and produced the tables upon which Dr. Medoff relied.
Id.
at
Furthermore, defendant’s reliance upon
Washington v. Vogel,
B. Siskin’s Knowledge About Sodexho and the Lack of Independent Verification of Sodexho’s Data
Defendant argues that Siskin did not conduct independent investigations and does not know how different divisions in the company function, which purported failures “diminish[ ] unacceptably” the reliability of his opinions. (Def.’s Mot. at 22, 26.) Defendant also contends that Siskin failed to independently verify the data that defendant supplied him. (Id. at 6.) Plaintiffs respond that Siskin’s analyses did not require intimate familiarity with the inner workings of Sodexho, because as a statistician, his role was not to interview individual Sodexho employees to determine what they considered to be a promotion, but rather his job was to look to the company’s own internal data in determining what constituted a promotion. (Pls.’ Opp. at 12.) Plaintiffs further note that Haworth and Siskin both relied on basically the same data, such as Sodexho’s MARRPAY database, which was used to analyze promotions and which was “the most accurate and reliable data available.” (Id. at 11 n. 8, 12.) 3
The Federal Rules of Evidence specifically provide that an expert may rely on facts or data “perceived by or made known to the expert at or before the hearing.” Fed.R.Evid. 703.
See also Gussack Realty Co. v. Xerox Corp.,
Furthermore, the cases defendant cites to support its contention regarding Siskin’s “ignorance” about the company are inapposite. As a statistician working with company-supplied data, Siskin does not need an understanding of “what the laundry operation does or how it conducts its operations.” (Def.’s Mot. at 22 n. 7.)
C. Purported Errors in Siskin’s Reports
Defendant claims that Siskin frequently relied upon analytic output generated from the wrong data files (Def.’s Mot. at 10), and his various reports are so riddled with errors as to render them unreliable. (Id. at 11-19.) In his declaration, Siskin rebuts each alleged “error,” and according to plaintiffs, the supposed errors are, in many cases, not mistakes at all, but rather misrepresentations of the record; some of the supposed “errors” were corrected in subsequent reports; and even where flaws may arguably exist, they are minor in nature and do not affect Siskin’s conclusions. 4 (See Pls.’ Opp. at 19-24 (“the few times that mistakes were made in the over 200 pages of reports and supporting tables that Dr. Siskin submitted in this case, they were of a minor nature, and he often corrected them and resubmitted the corrected versions to defendant”).) For example, defendant alleges that Siskin “ ‘made up’ promotions that did not actually exist at Sodexho in order to gain a significant outcome.” (Def.’s Mot. at 13.) Plaintiffs (and Siskin) rebut this charge by explaining that in fact Siskin was making an assumption to test an hypothesis and was not asserting facts, and he identifies the hypothesis as such in his report. (Pls.’ Opp. at 22; Pls.’ Ex. A [5/04 Siskin decl.] at ¶ 13.)
The gist of defendant’s argument is that Siskin’s work product is so bad that his testimony should be excluded. (See Def.’s Mot. at 3 (“His failure to carefully review the analytic output provided to him combined with his repeated production of incorrect information in this case should cause the Court to conclude that all of his work in this case is inherently unreliable and as such inadmissible.”).) However, beyond enumerating alleged errors, defendant fails to explain how these errors had any substantial bearing on the reliability of his reports, particularly when most purported mistakes were not errors to begin with or were contained only in early reports that were subsequently corrected or clarified. Absent such a showing of materiality, and in light of Siskin’s reasonable explanations for many of the purported deficiencies, the Court is unprepared to exclude his testimony on this basis. Further, the fact that Siskin was open to and in fact did correct deficiencies in his preliminary reports argues for the reliability of his testimony, not for its exclusion.
Moreover, insofar as many of Siskin’s “errors” result from a different understanding of the facts, his methodological decisions are not properly the subject of a Daubert motion. Instead, such issues are addressed in the Court’s accompanying summary judgment Memorandum Opinion, and as held there, they may ultimately need to be resolved by the finder of fact in determining what weight to give Siskin’s analysis.
See Adams,
It is clear that the Court exercises a “gatekeeper” function,
see Gen. Elec. Co. v. Joiner,
Moreover, focusing on errors Siskin may have made earlier during years of data gathering and refining of analyses misses the mark, for plaintiffs have shown that the end products on which they now rely meet the Daubert standard. Neither the record nor the cases cited by defendant support the contention that Siskin’s testimony should be excluded because, even though his final products are reliable, his alleged carelessness earlier in the process disqualifies him as an expert.
In sum, any errors in Siskin’s analysis do not rise to the level of warranting exclusion, but rather bear only on the probativeness of his reports and his credibility. Despite defendant’s arguments to the contrary, its contention that Siskin’s statistical methodology is “so flawed” that it should be deemed inadmissible as a matter of law is rejected; the issue is one of the evidence’s weight, not its admissibility.
See Diehl v. Xerox Corp.,
Similarly, the Court rejects defendant’s contention that Siskin’s testimony must be excluded because he is untrustworthy, as allegedly evidenced by his “willing[ness] to mislead the Court.” (Tr. at 6.) Defendant maintains that, regardless of Siskin’s experience and expertise, contradictions between his depositions and declarations show that he would also be willing to mislead a jury. (Id. at 3, 6.) The Court notes as a threshold matter that the purported inconsistencies and misstatements are not in fact necessarily contradictory or false. (See, e.g., id. at 6 (criticizing Siskin for stating in his deposition that he could not “do a good job” of reviewing computer programs to ascertain whether the programming was proper, but stating in his subsequent declaration that he can determine from the resultant computer output whether the underlying programming was flawed).) Further, these are the type of criticisms that go to credibility, rather than Daubert’s standard of admissibility, particularly where defendant is unable to show that any such alleged misstatements and mistakes have affected the bottom-line reliability of Siskin’s analyses and testimony.
D. Siskin’s Definition of Promotion Within Sodexho
Defendant contends that Siskin improperly defined promotion in generating data for his analyses, and that because his definition does not comport with the evidence, his analyses must be rejected. (Def.’s Mot. at 30, citing,
inter alia, McCleskey v. Kemp,
Moreover, many of the supposed flaws are rebutted by Siskin or were addressed by his subsequent reports. For instance, defendant argues that Siskin did not include certain job moves that individual plaintiffs claim should have been analyzed as promotions. (Def.’s Mot. at 29.) As such, according to defendant, his analysis is based on assumptions that do not match the evidence in the case. In response, plaintiffs explain that Siskin did include the very promotions that defendant identifies. (Pls.’ Opp. at 15.) Plaintiffs further argue that Siskin properly arrived at a consistent definition of promotion for statistical purposes instead of relying upon what a particular individual considered to be a promotion. (Id. at 16.)
Defendant also claims that in Siskin’s October 2001 report, he failed in some instances to use the Sodexho MARRPAY computer database’s definition of promotion (certain job moves coded as 3 or 4). (Def.’s Mot. at 31.) As shown by plaintiffs, whether he failed to look at the two codes is irrelevant, because, based on those codes, only nine promotions were included in Haworth’s data that were not in Siskin’s. (Pls.’ Opp. at 15.) Defendant likewise criticizes Siskin for not including as a promotion any increase in grade, but rather identifying a promotion only where an employee’s job title changed. (Def.’s Mot. at 30-31.) But this is a mischaracterization: Siskin’s analysis was based on occupation codes, not job titles. (Pls.’ Opp. at 15.) Furthermore, defendant contends that Siskin failed to count many promotions because he created files by identifying employees present at work at the beginning of each year, and as a result, those employees who had been promoted but then left before the beginning of the next year were not included. (Def.’s Mot. at 31.) Additionally, if an employee had been promoted more than once in a year, she would be counted as a single promotion. (Id.) But these contentions are largely irrelevant because they ignore the fact that it was only Siskin’s first report that relied on yearly data, and by the time of his November 2001 report, the data included month-to-month changes. (Pls.’ Opp. at 16.)
Generally, plaintiffs explain the inconsistencies identified by defendant by reference to the fact that Siskin’s initial October 2001 report analyzed data available to him at that time, and that after he received new files from defendant containing more complete data, his report the next month took that data (Market Reference Rates (“MRRs”)) into account, confirming the promotion shortfall he had initially observed. (Id.) Thereafter, his September 2003 report was again refined to respond to additional evidence that had not been in Sodexho’s computer records but which Haworth had pulled from personnel files. As contended by plaintiffs, “Dr. Haworth’s analysis unjustifiably included all of these undocumented events as promotions, and Dr. Siskin responded by showing that a much more reasonable adjustment to account for these events would still have shown promotion differences that are statistically significantly adverse.” (Id.) Thus, he was “not changing his definition of promotion but simply demonstrating that even a liberal interpretation of these undocumented events as representing promotions would not alter his findings in this case.” (Id. at 16-17.)
More specifically, plaintiffs have proffered evidence to raise substantial factual issues as to whether defendant’s expert correctly defined promotion for purposes of this litigation, thereby precluding any finding that Siskin’s definition is “wrong.” (Id. at 14-15.) The term “promotion” is defined in Sodexho’s official personnel manuals to mean either a move from one grade or band to the next higher grade or band, or, more ambiguously, “career moves” to positions within the same band but with higher MRRs. Plaintiffs (and Siskin) think that the most accurate way of capturing this definition when working with Sodexho’s computerized MARRPAY data is to define a promotion as an increase in grade or band, or a change in job or unit with either a salary change code of 3 or 4 or an increase in MRR. Plaintiffs contest Haworth’s definition of promotion on the grounds that it does not mirror Sodexho’s actual operations because she considers an employee who remains in the same job, in the same unit, at the same pay and same salary band but with an increase in MRR (which may simply be due to a new market survey, which reflects external factors rather than any factor related to an individual employee) 5 as having been “promoted.” (Id. at 14 n. 11.) In sum, while Siskin concedes that his definition of promotion may be somewhat under-inclusive, he persuasively demonstrates that Haworth’s definition may be overinclusive. (Def.’s S.J. Mot. Ex. 45 [12/03 Siskin report] at 2.)
When the factual underpinnings of an expert’s opinion are in dispute, it is not the role of the court to determine the correctness of the facts underlying the expert’s testimony.
Defendants confuse the requirement for sufficient facts and data with the necessity for a reliable foundation in principles and method, and end up complaining that Fiorito’s testimony was not based on “reliable facts.” The parties disputed many of the facts relevant in determining a reasonable royalty .... When, as here, the parties’ experts rely on conflicting sets of facts, it is not the role of the trial court to evaluate the correctness of facts underlying one expert’s testimony.
Micro Chemical, Inc. v. Lextron, Inc.,
When facts are in dispute, experts sometimes reach different conclusions based on competing versions of the facts. The emphasis in the amendment on “sufficient facts or data” is not intended to authorize a trial court to exclude an expert’s testimony on the ground that the court believes one version of the facts and not the other.
Id. See also Pipitone v. Biomatrix, Inc.,
Indeed, as the Supreme Court stated in Daubert, “[v]igorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.”
E. The Time Period Covered by Siskin’s Analyses
Defendant takes issue with the fact that Siskin considered data from 1995-2001 in his analysis. Defendant argues that statistics are invalid on their face if they fail to exclude acts of discrimination that are outside the class period (March 27, 1998 through July 1, 2001). (Def.’s Mot. at 84.)
Defendant’s argument is, however, wrong as a matter of law. Cases in this Circuit clearly permit analysis of employment data that predates the class time frame.
See, e.g., Palmer v. Shultz,
F. Siskin’s Destruction of Underlying Documents
Defendant complains that Siskin’s process “cannot be replicated” because he destroyed the very documents from which he obtained the numbers for his reports. (Def.’s Mot. at 7-8 (accusing Siskin of throwing away or otherwise discarding the programs and output upon which he relied).) He produced “computer programs and output” for reports that are dated after the reports themselves, and thus, defendant does not have the handwritten notations he initially made about the programs. (Id. at 8.) Defendant also complains that Siskin failed to produce drafts of his reports that were sent to counsel and counsel’s comments. (Id. at 35.) Significantly, defendant does not claim that it has been prejudiced by this alleged failure to produce. It claims only that the “process cannot be replicated.” (Id. at 35.)
Plaintiffs claim that they turned over drafts of Siskin’s reports in his files and in counsel’s files and that Siskin provided Haworth with “exact replicas” of the documents he reviewed to prepare his reports. (Pls.’ Opp. at 8.) According to plaintiffs, Siskin’s practice was to review printouts from programs he instructed his staff to run, which printouts were subsequently discarded. However, the underlying data was preserved in original form so identical printouts could be and were created for Haworth; the printouts were even annotated, explaining which data corresponded to which table in Siskin’s reports. (Pls.’ Opp. at 9.) Haworth, by contrast, only provided her back-up data to Siskin in electronic form. (Id. at 9 n. 7.)
Fed.R.Civ.P. 26(a)(2)(B) requires that an expert produce “the data or other information considered by the witness in forming the opinions.” It is far from clear that Siskin did not meet this standard, and in fact, it appears that if he did not, Haworth’s noncompliance was even more striking. But more importantly, any failure to produce documents is not a basis for invoking exclusion under Daubert.
G. Siskin’s Segregation Analyses
Defendant claims that Siskin is not an expert on segregation analyses, and he is therefore not qualified to offer such an analysis into evidence. (Def.’s Mot. at 36, Def.’s Reply at 16.) During his deposition, Siskin could not recall two segregation analysis-related arithmetical terms (Def.’s Mot. at 37), even though they appeared in an article that Siskin had read, but in the same deposition Siskin also explained that he performed the exact analyses that those terms describe. (Pls.’ Ex. A [5/04 Siskin decl.] at ¶ 28; Def.’s Ex. A(1) [Siskin dep.] at 495-97.) Although Siskin has never before offered a segregation analysis into evidence, he has performed such analyses during his career. (Pls.’ Ex. A [5/04 Siskin decl.] at ¶ 27.) Further, as a well-regarded statistician, Siskin is qualified to perform segregation analyses, which, as pointed out by plaintiffs, are “not qualitatively different from the statistical studies he has performed and analyzed over the last 30 years.” (Pls.’ Opp. at 24.)
“The Daubert test must be applied with due regard for the specialization of modern science. A scientist, however well credentialed he may be, is not permitted to be the mouthpiece of a scientist in a different specialty.”
Dura,
Defendant also attempts to cast doubt on Siskin’s segregation analysis because of the small size of his sample. (Def.’s Mot. at 36.) Although problems may arise when segregation analyses are conducted for small samples if gauged against “evenness,” that is not the approach Siskin used. (Carrington & Troske, at 405; Def.’s Ex. A1 [Siskin dep.] at 353.) Rather, he followed the approach proposed by Carrington and Troske, which measures segregation by looking at deviations from randomness and which is more accurate with small samples. (Pls.’ Ex. A [5/04 Siskin decl.] at ¶ 27.) As such, the Court sees no reliability problem in admitting this evidence and leaving it to the jury to determine its probativeness. 6
H. The Utility of a Breslow-Day Analysis
At the oral argument on this motion on December 3, 2004, defendant presented for the first time in nearly four years of litigation a “Breslow-Day” analysis that it claims is vital to establishing the reliability of Siskin’s aggregated pools analysis. (Tr. at 41, 136.) The Court refused to consider the results of the Breslow-Day analysis since neither the Court nor plaintiffs’ counsel had seen this analysis prior to the hearing. 7 (Id. at 124-25.) But defendant nonetheless argued at the hearing that plaintiffs bear the burden of showing that Siskin’s pools analyses are sufficiently homogeneous to be admitted. (Id. at 136-37.) While this is a correct statement of law,
see Meister,
I. Prior Rulings Regarding Siskin’s Testimony
Finally, defendant insinuates that this Court should deem Siskin an unqualified witness, because other courts have previously discounted his opinions.
See, e.g., Calloway v. Westinghouse Elec. Corp.,
Of course, the question is not whether courts have admitted Siskin’s testimony in the past, but rather whether his testimony in the instant case is sufficiently reliable and relevant as to warrant admission here. Both Siskin and Haworth have testified numerous times in Title VII cases, including on the same side
(see, e.g., Green v. United States Steel Corp.,
CONCLUSION
The instant motion has done much to bring home the truth of the Supreme Court’s observation in Teamsters: “[S]tatistics ... come in infinite variety .... [T]heir usefulness depends on all of the surrounding facts and circumstances.”
Notes
1. Defendant's contention that Siskin’s testimony must be rejected because he did not consider the major, nondiscriminatory variables (education and experience) that purportedly played a role in the promotion decisions at issue (Def.’s Mot. at 26) goes to the heart of defendant's summary judgment motion, and therefore, it will not be addressed here, but is considered in the Court’s Memorandum Opinion relating to that motion.
2. Defendant cites Dura for the proposition that the expert must be able to testify that he "supervised his staff carefully," claiming that Siskin cannot and did not do so here. Dura, however, does not counsel exclusion of the expert if the expert cannot (or did not) appropriately supervise the work of his assistants. Instead, the proper approach is to question the expert regarding his reliance on the assistant's work. "The opposing party can depose them in order to make sure they performed their tasks competently; and the expert witness can be asked at his deposition whether he supervised them carefully and whether his relying on their assistance was standard practice in his field.” Dura,
3. It is also noteworthy that Siskin did in fact interview several present and former Sodexho employees regarding defendant's reliance on RVP codes. (See Pls.’ Opp. to Def.'s S.J. Mot. Ex. 158 [5/04 Siskin decl.] at ¶¶ 7-10).
4. The Court notes that, as is probably inevitable in a case as complicated and based on so much data and different analyses as this one, Haworth also apparently made errors. (See, e.g., Def.’s S.J. Mot. Ex. 45 [12/03 Siskin report] at 11 n. 6.)
5. Sodexho's 1999 Management Career Structure Salary Administration Guidelines explain MRRs as follows: "An MRR represents the competitive or 'going' rate for a position within the national market.” (See Pls.’ Opp. to Def.'s S.J. Mot. Ex. 105.)
6. It is noteworthy that, of defendant's numerous complaints about Siskin’s segregation analysis, Sodexho does not actually claim that Siskin's segregation analysis was improperly performed, and apparently Haworth dropped her major objections to it after Siskin pointed out that her main criticism was erroneous. (Pls.’ Ex. A [5/04 Siskin decl.] at ¶ 29.)
7. Defendant also complained at the hearing that Siskin re-ran his multiple regression analysis following the close of discovery. (See, e.g., Tr. at 129-30.) But the evidence in question had been before the Court since May 2004 and defendant responded to his new regression analyses in its reply brief and in Haworth’s accompanying declaration. (See, e.g., Def.’s S.J. Mot. Ex. 70 [5/17/04 Haworth decl.] at ¶¶ 10-12.) Moreover, Siskin’s introduction of analyses merely modifying preexisting results a few months after the close of discovery, at a time when defendant had ample time to respond in whatever manner it saw fit, is not comparable to defendant's attempted introduction of an entirely new form of analysis during oral argument.
8. Similarly, the Court will not consider the computer codes introduced by defendant for the first time at oral argument, which purportedly show that Siskin did not code various analyses in the same manner as Haworth. (Tr. at 141.)
