Plaintiffs-appellants appeal from a final judgment of the United States District Court for the Western District of New York (David G. Larimer, Chief Judge) entered January 16, 1998, granting summary judgment for Xerox Corporation (“Xerox”) on the plaintiffs’ employment discrimination claims based on both disparate treatment and disparate impact theories.
I. BACKGROUND
A. Facts
The facts of this case are more fully set forth by the district court in its decision, see Wado v. Xerox Corp.,
In late Fall 1993, Xerox announced plans for a world-wide involuntary reduction in force (“IRIF”) which would reduce its 97,500 member workforce by about 10,000 persons over the next two to three years. Each decentralized organization within Xerox was responsible for determining whether and by how much its workforce would be reduced. The organizations that chose to eliminate positions utilized the same decision-making process to determine which employees to retain.
In each work-unit an immediate supervisor ranked each employee in Work Quality, Work Speed, Work Orientation, and Work Skills, entering the scores on a Contribution Assessment Form (“CAF”). The Work Quality category purported to measure reliability and accuracy, as well as use of methods, tools, and processes. The Work Speed category was intended to measure the employee’s ability to plan, prioritize, execute a plan, and meet due dates. Work Orientation included action orientation, business orientation, team orientation, and customer orientation. Work Skills were assessed as to adequacy, self-development, and continuous learning. The employee was given a score of 0-5 in each of the four areas, for a total of 0-20 points. A group of senior managers then reviewed the CAFs from each work-unit for fairness and consistency and made any adjustments deemed warranted.
Subsequent to receiving a final score of 0-20, the employees were stack-ranked on a matrix against other employees from their respective work-units. The vertical axis of the matrix represented the employee’s total CAF score and the horizontal axis represented years of service at Xerox, either less than 20 years or greater than or equal to 20 years. Selections for termination were then made in a pattern of assessment score/tenure combinations that favored workers with greater years, with the exception of certain employees with special skills. For example, out of two employees each receiving a CAF score of 12, the employee with less than twenty
B. Proceedings Below
Fifteen Xerox employees selected for termination as part of the 1994 wave of the IRIF each filed suit against Xerox in federal district court pursuant to a Right to Sue letter issued to each complainant by the Equal Employment Opportunity Commission (“EEOC”). In their respective complaints the plaintiffs asserted various theories of employment discrimination under the following: (1) the Age Discrimination in Employment Act, 29 U.S.C. § 621 et seq. (“ADEA”), (2) Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq. (“Title VII”), (3) the Americans with Disabilities Act, 42 U.S.C. § 12101 et seq. (“ADA”), and (4) the New York State Human Rights Law, N.Y. Exec. Law § 296 (“NYSHRL”).
Xerox moved for summary judgment against all plaintiffs on April 21, 1997. The court consolidated the actions pursuant to Fed.R.Civ.P. 42(a) since they involved common questions of law and fact. The court heard oral argument on December 5, 1997 and granted the defendant’s motion for summary judgment on January 16, 1998. In its ruling on the defendant’s summary judgment motion, the court first addressed the disparate impact claims brought by thirteen
The court next addressed the non-statistical evidence presented by each plaintiff to prove the respective disparate treatment claims. The court assumed that each plaintiff had made out a prima facie case of discrimination and focused on whether each plaintiff raised a genuine issue of material fact as to whether Xerox’s legitimate nondiscriminatory reason
Plaintiff Pedro Santiago also raised a retaliation claim under Title VII, contending that Xerox had terminated him because he had previously complained that he was being discriminated against because he is Hispanic. Noting that the plaintiff’s most recent complaint was made four years before he was terminated, the court held that the plaintiff had not established any causal connection between his protected activity under Title VII and his termination. Id. at 202.
Three plaintiffs, Philip Cufari, Eugene Hosenfeld, and Patricia Rake, also asserted below that Xerox had discriminated against them because they were disabled. The court found as to all three of them that they failed to connect their disabilities to their terminations in any manner, as required by the ADA. See id. at 196, 200, 209-210. The court further held as to Cufari that he did not make out a prima facie case because he failed to show that he was disabled within the meaning of the ADA. See id. at 209-210.
Twelve of the plaintiffs timely filed a notice of appeal.
II. DISCUSSION
On appeal, the plaintiffs contend that the district court erred in finding that their statistical evidence was not probative of either disparate impact or disparate treatment. In addition, each plaintiff argues that he or she presented sufficient non-statistical evidence to support a jury finding that Xerox used the IRIF as a pretext to discriminate on the basis of age or gender. This Court reviews a district court’s grant of summary judgment de novo. See Young v. County of Fulton,
A. Disparate Impact Claims in General
Ten of the twelve appellants claim that the IRIF disparately impacted certain groups of workers, specifically, employees 40 years of age or older under the ADEA, and either men or women under Title VII, depending on which plaintiff made the claim. A plaintiff need not prove discriminatory intent to make out a claim of disparate impact. See Griggs v. Duke Power Co.,
Once the plaintiff establishes a prima facie case, the employer must show the business necessity of the challenged employment practice. See Griggs v. Duke Power Co.,
B. Statistics in Disparate Impact Cases
Although no bright line rules exist to guide courts in deciding whether plaintiffs’ statistics raise an inference of discrimination, several overarching principles inform the issue. Among these is Congress’s intent that employers not be required to treat any individual or group preferentially because of a protected characteristic or to establish a numerical quota system. See 42 U.S.C. § 2000e-2(j). Accordingly, the Supreme Court has established safeguards to prevent these results. First, plaintiffs are required to identify a specific employment practice, rather than rely on bottom line numbers in an employer’s workforce. See Watson,
In evaluating disparate impact claims under Title VII, this Court has primarily relied on two methods of measuring disparities between groups. First, we have considered persuasive the EEOC Guideline that states that:
A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact. Smaller differences in selection rate may nevertheless constitute adverse impact, where they are significant in both statistical and practical terms....
29 C.F.R. § 1607.4D (1998). See, e.g., Waisome v. Port Authority of New York & New Jersey,
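The arithmetic behind the four-fifths guideline quoted above can be sketched briefly. The following is a minimal illustration only: the function names and the counts are hypothetical and are not drawn from the record, and the guideline itself notes that smaller differences may still constitute adverse impact.

```python
def selection_rate(selected, total):
    """Fraction of a group that was selected (e.g., hired, promoted, retained)."""
    if total <= 0:
        raise ValueError("group must contain at least one person")
    return selected / total


def impact_ratio(rate_a, rate_b):
    """Lower selection rate divided by the higher selection rate."""
    low, high = sorted((rate_a, rate_b))
    if high == 0:
        raise ValueError("no one was selected from either group")
    return low / high


def below_four_fifths(ratio, threshold=0.8):
    """True when the ratio falls under the four-fifths (80%) benchmark."""
    return ratio < threshold


# Hypothetical counts, chosen only to illustrate the calculation.
protected_rate = selection_rate(60, 100)   # 60 of 100 selected
comparison_rate = selection_rate(90, 100)  # 90 of 100 selected
ratio = impact_ratio(protected_rate, comparison_rate)
flagged = below_four_fifths(ratio)
```

With these illustrative counts the ratio is two-thirds, which falls below the benchmark; a ratio of 0.85 to 0.90 would not.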
As an alternative measure of differences between groups, we have also looked to whether the plaintiff can show a statistically significant disparity of two standard deviations. A standard deviation is a measure of variance from the mean (or average) value in a given sample. Basically, looking at standard deviations indicates how far an obtained result varies from an expected result. See Waisome,
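One common way to express how far an obtained result lies from an expected result is sketched below. It assumes a simple binomial model, which is an assumption made here for illustration and not necessarily the calculation used in the cited cases; the function name and the numbers are hypothetical.

```python
import math


def standard_deviations_from_expected(observed, n, p):
    """Distance between observed and expected counts, in standard deviations.

    Assumes a binomial model: n independent selections, each with
    probability p of coming from the group of interest.
    """
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (observed - expected) / sd


# Hypothetical figures: 400 selections, half expected from the group,
# but only 170 observed.
gap = standard_deviations_from_expected(observed=170, n=400, p=0.5)
exceeds_two_sd = abs(gap) > 2
```

Here the expected count is 200 and the standard deviation is 10, so the observed count sits three standard deviations below expectation.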
Although courts have considered both the four-fifths rule and standard deviation calculations in deciding whether a disparity is sufficiently substantial to establish a prima facie case of disparate impact, there is no one test that always answers the question. Instead, the substantiality of a disparity is judged on a case-by-case basis. See Watson,
C. Plaintiffs’ Statistics
The plaintiffs’ statistician, Dr. Philip Smethurst, ran statistical tests for each of the plaintiffs’ disparate impact claims. Generally, he attempted to group each plaintiff with the coworkers to whom that particular plaintiff was compared for selection purposes. However, when the number of persons in a particular unit was too small to yield a statistically valid result he pooled units that he thought were reasonably homogeneous. For example, plaintiff John Smith worked in the Integrated Supply Chain (“ISC”) which consisted of 111 employees before the IRIF. Smethurst determined that this number was insufficiently large to yield valid statistical results and thus decided to combine ISC (also known as M1) with M2, both of which were Manufacturing Support groups within Corporate Strategic Services. The combination increased the group size to 2480 pre-IRIF employees.
On each plaintiff’s work-group, many of which were reconstituted as described above, Smethurst performed hypothesis testing using a “t-test.” This methodology posits a null hypothesis, in this case that there was no disparity between the two groups compared, i.e., persons under 40 years of age compared to persons 40 or over, or women compared to men, as to rate of selection for retention. The data are analyzed using the t-test to determine whether the null hypothesis can be rejected, given a selected level of statistical significance, which in this case was p=.05, or 95% certainty. That is, if the null hypothesis can be rejected, then we can be 95% certain that chance does not account for the favored group of employees having a higher probability of being selected for retention. See Ramona L. Paetzold & Steven L. Willborn, The Statistics of Discrimination § 2.04 at 9-14 (1998). The results of a t-test can only tell us that it is very unlikely that chance is responsible for a disparity; this method cannot pinpoint what the causative factor is. See id. § 2.04 at 10 n. 12. Another way of saying that a t-test is statistically significant at the p=.05 level is to say that the obtained result, i.e., an observed difference between the two groups compared, varied from the expected result, i.e., no difference between the two groups compared, by two standard deviations. Cf. Waisome,
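The correspondence between the p=.05 level and roughly two standard deviations can be checked numerically. The sketch below assumes a two-sided test and samples large enough that the test statistic is approximately normal; both are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_value(alpha=0.05):
    """Number of standard deviations needed to reject at a two-sided level alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def two_sided_p_value(z):
    """Probability of a result at least |z| standard deviations from expected."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


cutoff = critical_value(0.05)         # close to 1.96
p_at_two_sd = two_sided_p_value(2.0)  # a little under .05
```

Under these assumptions the cutoff is slightly less than two standard deviations, which is why the two formulations are often treated as interchangeable.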
Dr. Smethurst discovered statistically significant results for each of the t-tests he executed, indicating that the IRIF ad
D. Plaintiffs’ Claims
1. Disparate Impact
This Court generally assesses claims brought under the ADEA identically to those brought pursuant to Title VII, including disparate impact claims.
All plaintiffs to the current action identify the overall decision-making process utilized in the 1994 wave of the IRIF as the specific employment practice which allegedly had an adverse impact on older (or male) workers. Xerox argues that a decision-making process cannot constitute a specific employment practice. Defendant is correct that a plaintiff generally cannot rely on the overall decision-making process of the employer as a specific employment practice. See Wards Cove,
After specifying the employment practice allegedly responsible for excluding members of their protected class from a benefit, plaintiffs must identify the correct population for analysis. In the typical disparate impact case the proper population for analysis is the applicant pool or the eligible labor pool. The composition of this population is compared to the composition of the employer’s workforce in a relevant manner, depending on the nature of the benefit sought. See, e.g., Hazelwood Sch. Dist. v. United States,
The corresponding population in a reduction-in-force situation consists of workers subject to termination. As in a promotion scenario like that of Waisome, the relevant population is divided into protected and non-protected groups and the selection rates of the two groups are compared. See, e.g., AFSCME,
In the present case, Xerox planned to reduce its workforce of approximately 97,500 by about 10,000. In order to determine whether the IRIF had a disparate impact on older (or male) workers we would first need to know how many of these 97,500 employees were subject to termination in the 1994 wave of lay-offs as part of which the plaintiffs were fired. From that population the rate of retention for older (or male) employees and the rate of retention for younger (or female) workers should be calculated. These rates could then be compared to ascertain whether there is a “gross statistical disparity” in the selection rates between groups. Waisome,
This idea can be illustrated by considering what would happen if the plaintiffs are correct in their claims that their respective supervisors purposely lowered the CAF scores of older (or male) employees. If that were the case, we would certainly expect to see a significant disparity in the selection rates between older and younger (or male and female) workers in those work-groups. However, the facially neutral selection process would not be the cause of the disparity; instead, the difference in retention rates would stem from the intentional discrimination of those supervisors. Under these circumstances, we would not know if the IRIF decision-making process itself caused a disparate impact. And indeed, given that only 8,444 employees were included in the plaintiffs’ analyses, it is very possible that when the total number of workers subject to termination in 1994 were considered, the differences caused by the intentional discrimination in these discrete work-groups would no longer be reflected in the statistics. The bottom line is we cannot reasonably infer from statistics based on a sub-set of work-groups that the IRIF caused the observed disparate impact, rather than some other factor relevant only to those work-groups.
The problematic nature of isolating work-groups in this manner is further highlighted by the results obtained in this case. In plaintiff Judith Caruana’s work-group, workers 40 years of age and older were retained at 88.79% the rate of workers who had not yet reached their fortieth birthday. In Harold Wado’s work-group, older workers were retained at 96.69% the rate of younger workers. The differences in retention rates for the remaining eight plaintiffs are the same or lie somewhere between these two figures: Smith = 89.29%; Lalik = 89.29%; Bernhard = 96.69%; Hamann = 90.99%; Gusciora = 93.77%; Rake = 90.99%; Hosenfeld = 90.36%; and Santiago = 90.86%. A similarly large range of selection rates was discovered in the work-groups chosen by Dr. Smethurst for analysis of the gender-based claims. In plaintiff Harold Wado’s group the retention rate of the protected group was 98.59% that of the favored group. The retention rate of men in plaintiff Eugene Hosenfeld’s work-group was 90.36% that of the retention rate for women.
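The age-based figures recited above can be tabulated and set against the four-fifths benchmark. The sketch below uses only the percentages stated in this paragraph; the variable names are illustrative, and the comparison to 80% is offered as a simple check rather than as a legal test.

```python
# Relative retention rates (older workers' rate as a percentage of
# younger workers' rate), as recited for each plaintiff's work-group.
age_retention_ratios = {
    "Caruana": 88.79,
    "Wado": 96.69,
    "Smith": 89.29,
    "Lalik": 89.29,
    "Bernhard": 96.69,
    "Hamann": 90.99,
    "Gusciora": 93.77,
    "Rake": 90.99,
    "Hosenfeld": 90.36,
    "Santiago": 90.86,
}

FOUR_FIFTHS = 80.0  # benchmark, in percent

lowest = min(age_retention_ratios, key=age_retention_ratios.get)
highest_value = max(age_retention_ratios.values())
spread = highest_value - age_retention_ratios[lowest]
below_benchmark = [
    name for name, pct in age_retention_ratios.items() if pct < FOUR_FIFTHS
]
```

On these figures the lowest ratio belongs to Caruana's work-group, the ratios span about eight percentage points, and none falls below 80%.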
In some workforces, a disparate impact might well be actionable if older workers were retained at 88.79% of the rate for younger workers, but not if the comparison were 96.69%, especially considering the finding of statistical significance. Yet, it would be nonsensical for a court to decide that only some of these plaintiffs established a prima facie case of disparate impact when they all purport to specify the identical employment practice as causing a disparate impact. The decision-making
For these reasons, we conclude that plaintiffs relied on the wrong population for their statistical analyses. It is only reasonable to infer a disparate impact from the IRIF decision-making process if all persons who were subject to the process are included in the analysis.
This is not to say that there is no disparate impact on a protected group as long as the bottom line numbers show no adverse effect. An employer may not defend against a disparate impact claim by arguing that its workforce is balanced overall. See Connecticut v. Teal,
In other words, if the plaintiffs in this case had demonstrated statistically that some portion of the IRIF decision-making process, such as the evaluation of work speed, produced a disparate impact on older (or male) workers, Xerox could not defend by showing that overall the same percentages of older and younger (or male and female) workers were selected for retention. However, the plaintiffs here alleged that the overall decision-making process itself, not some component thereof, resulted in an adverse effect on older (or male) workers. Having chosen the overall process, they must present statistics that support that contention. As discussed above, isolating a few work-groups and analyzing the effect of the IRIF on each work-group is misleading at best. Cf. Fisher v. Vassar College,
Accordingly, we affirm the district court’s holding as to the lack of probative value of plaintiffs’ statistics for the disparate impact claim, albeit for somewhat different reasons than those on which the court below based its holding. The plaintiffs did not use incorrect statistical methodology; they applied appropriate analyses but mismatched the population and the specific employment practice.
2. Disparate Treatment
A plaintiff may also present statistical findings as circumstantial evidence of intentional discrimination. See Hollander v. American Cyanamid Co.,
Xerox argues persuasively that plaintiffs’ statistics are inadequate to support the individual plaintiffs’ disparate treatment claims both because the work-units were pooled incorrectly and because Smethurst should have conducted multiple regression analyses to control for each plaintiff’s performance evaluation. First, Dr. Smethurst pooled some plaintiffs into work-groups that included workers to whom the plaintiff was not directly compared in the IRIF process and who were, in fact, rated by other decision-makers. Because intent is the critical issue, only a comparison between persons evaluated by the same decision-maker is probative of
Moreover, plaintiffs’ statistical analyses fail to account for other possible causes for the fact that older (or male) workers were more likely to be terminated. See Hollander,
III. CONCLUSION
The district court correctly held that plaintiffs’ statistics were of little probative value in determining whether Xerox’s 1994 reduction-in-force caused a disparate impact on employees forty years of age or older or on male employees. The plaintiffs chose the overall decision-making process utilized in the IRIF as the specific employment practice which allegedly caused a disparity. Yet, plaintiffs’ statistical analyses, which isolate work-groups rather than assessing the effect of the decision-making process on the population of Xerox employees subject to termination, do not support a finding that the IRIF decision-making process resulted in a harsher effect on the protected groups. As to plaintiffs’ disparate treatment claims, the district court correctly held that the proffered evidence, both statistical and non-statistical, did not suffice to raise an inference of intentional discrimination. We have considered all of plaintiffs’ other claims raised on appeal
Notes
. Beyond mentioning that the plaintiffs alleged violations of the New York State Human Rights Law, the district court did not explicitly evaluate the plaintiffs’ claims under that law. However, since claims under the NYSHRL are analyzed identically to claims under the ADEA and Title VII, the outcome of an employment discrimination claim made pursuant to the NYSHRL is the same as it is under the ADEA and Title VII. See Leopold v. Baccarat, Inc.,
. Philip Cufari and Salvatore Catalano stipulated to dismiss their disparate impact claims.
. Multiple regression analysis is a statistical test which identifies factors, called independent variables, that might influence the outcome of an observed phenomenon, called a dependent variable. In the employment discrimination context the dependent variable is the employment decision, such as hiring, promotion, termination. The statistician identifies legitimate factors that could have influenced the decision, e.g., education and experience, and determines via multiple regression analyses how well these legitimate factors account for the employment decision. In this manner the influence of a protected characteristic on the employment decision can be statistically isolated. See Ottaviani v. State Univ. of New York,
. A plaintiff in a disparate impact case usually complains that persons from the favored group were “selected” for some benefit, often related to hiring or promotion, at a greater rate than members of the protected class to which the plaintiff belongs. In the case of a reduction-in-force the use of the word “selection” is somewhat counter-intuitive, since plaintiffs are complaining that they were selected for termination, which is hardly a benefit. In order to utilize the word “selection” in a manner consistent with its usage in the majority of disparate impact cases, we will refer to being selected for retention.
. We acknowledge that a different analysis may apply to claims brought under the ADEA under some circumstances, because age tends to be highly correlated to certain factors an employer is permitted to consider when making employment decisions, such as pension status. See Hazen Paper Co. v. Biggins,
. The viability of the disparate impact theory under the ADEA is far from settled among the circuits. Several circuits have rejected or called into question the availability of a disparate impact cause of action under the ADEA in light of Hazen Paper. See, e.g., Mullin v. Raytheon Co.,
. As for the gender-based claims brought by the two female plaintiffs, Judith Caruana and Patricia Rake, their own expert found that there was no statistically significant difference in retention rates between male and female workers in their work-groups. In fact, females were retained at a slightly higher rate than males, 89% versus 85% in Caruana’s work-group and 89% versus 87% in Rake’s work-group. Since there was no disparate impact on women even on plaintiffs’ terms, we need not consider the validity of their statistical analyses as to these two claims.
. We would be presented with a different proposition if the groups used by plaintiffs’ expert purported to be randomly drawn samples of the total population of Xerox workers subject to the IRIF. A properly chosen random sample that evidenced a disparate impact would reflect a disparate impact on the entire population. However, the groups chosen by plaintiffs' expert were not randomly drawn from all such Xerox workers.
. Because we hold that plaintiffs’ statistical methodology was flawed we need not decide whether these selection rates would serve to establish a prima facie case of disparate impact. However, it is interesting to note that even were we to accept the plaintiffs’ statistics as valid, no plaintiff demonstrated that the retention rate for the protected group was less than 80% of that of the group supposedly favored by the IRIF selection process.
. Of course, this point assumes that Xerox did not direct supervisors to treat older (or male) workers more harshly, but that certain supervisors may have chosen on their own to discriminate. The assumption is warranted in this case because plaintiffs present no evidence that Xerox instructed all of its supervisors to give lower scores than were deserved to male workers and/or workers forty years of age or over. In fact, those plaintiffs who were themselves managers within the company at some point, testified at their depositions that Xerox had never directed them or requested them to consider an employee's age or sex in decision-making.
. Plaintiffs argue that inclusion of their CAF scores as independent variables in multiple regression analyses would not have been appropriate since they all contend that they were purposely given scores lower than they deserved. They are correct that tainted variables do not further the causation inquiry. Cf. Ottaviani,
