The Illinois Central Gulf Railroad (ICG) needs many strong backs. Would-be laborers apply to a central office, where a clerk hires from among those whose applications are less than six months old. During 1979 and 1980 the ICG had eight divisions, autonomous in hiring. The hiring office of the St. Louis division was in Carbondale, Illinois, where a single clerk hired most laborers. During 1979 the Carbondale office hired 11% of the black applicants for laborer jobs (including applicants with no preference for type of job) and 39% of the white applicants for these positions. In 1980 the railroad hired no black applicants and 6.1% of the white applicants. We have the ensuing class action, filed under Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., by one of 1979’s disappointed black applicants.
The district court certified a class of applicants for laborers’ positions in the ICG’s St. Louis division between December 8, 1978, and the filing of the case in early 1981. After a lengthy trial, the court entered judgment for the ICG.
The applicant class contends on appeal that the ICG has not established that it had a policy of hiring the applicants who lived closest to their places of work, let alone that this policy produced the racially disparate effects or was justified if it did. The ICG, for its part, attacks the district court’s conclusion that the class made out a prima facie case; it maintains that it did not need to offer any justification at all. We conclude, with the district court, that the class established a high probability of disparate treatment requiring the ICG to offer a race-neutral explanation for its hiring. Cf.
Furnco Construction Corp. v. Waters,
I
The disparate treatment claim is that the ICG considered race when making hiring decisions. The class offered several pieces of evidence that, the district court believed, established a presumptive (prima facie) case on this score. (1) The railroad hired a much larger portion of white applicants than of black applicants. The plaintiffs’ expert witness testified that there was less than one chance in a million that this disparity was consistent with race-neutral hiring. (2) The railroad hired more than 500 laborers in 1979, including 11 for work in East St. Louis. About 150 people from East St. Louis, most of them black, applied for work on March 27, 1979, while the ICG had more than 100 laborer jobs open on a repair project. The ICG did not hire a single one of these applicants, though when it hired laborers to work in East St. Louis itself it turned outside that city (sometimes far outside it) to hire white laborers. (3) Although the railroad argued that its desire to hire local labor explained the disparate consequences, a survey of its employment records showed that 122 laborers holding non-mobile jobs (that is, jobs with a single place of work) lived more than 50 miles from their job sites. All but three of these laborers are white.
This combination powerfully suggests that the Carbondale office of the ICG had race on its mind when hiring laborers in 1979 and 1980. It is hard to imagine a stronger case, short of an announcement of discrimination. The ICG attacks only the statistical component of this showing, on two grounds: bad data and inaccurate assumptions about the labor market. The first is unavailing. The plaintiffs’ expert used the best data available; that the data were not better is the ICG’s fault, and we agree with the district court that the data at hand were good enough even though imperfect.
The argument about the labor market goes to the heart of the case. The plaintiffs' expert assumed that the St. Louis division of the ICG is a single labor market. This meant that every applicant was interested in every job, and the railroad potentially interested in every applicant for every job. Yet the St. Louis division extended more than 200 miles north to south, from central Illinois to western Tennessee, and as the ICG points out it is unlikely that applicants from one end of the division really wanted to work at the other. It adds that it was not interested in people who lived far from the places that needed laborers. Only an analysis of many geographically smaller labor markets, the ICG insists, could support a valid conclusion about its hiring processes.
The district court did not agree with this argument, and neither do we. The ICG used one hiring office for the entire division and maintained one set of employment records. It was not possible to construct geographic markets out of preferences expressed in the applications, for there were none. The forms the railroad provided to applicants did not ask them where in the division they wanted to work or suggest that they had any say in the matter. Anyway, the expert witness did not need to examine smaller labor markets unless the geographic distribution of applicants by race was uneven. If black and white applicants throughout the ICG’s St. Louis division lived (on average) equal distances from the ICG’s work sites, then the assumption that there was only one labor market made no difference. Smaller labor markets would mirror the division as a whole; if the ICG’s performance were better in some, it was bound to be worse in others. Only non-uniform distribution would make the study inappropriate. If, say, the black applicants predominantly lived in the center of the district while the work was located at the far north and far south, then the plaintiffs’ assumption would invalidate the results. The study was enough to cast on the railroad the burden of showing that the applicants were not homogeneously distributed by race with respect to places of work. But the ICG never showed that there was such a non-uniform distribution of applicants; indeed, as we conclude below, the railroad did not show even that distance from work mattered much.
The court (and therefore the expert witness) wants to test hypotheses relevant to the litigation. The witness formulates the hypothesis (such as: “the ICG was hiring without regard to race” or “the ICG was hiring labor from nearby”) and uses this hypothesis to make predictions. Then the expert examines the data to see whether they confirm the predictions and thereby support the hypothesis. Much judicial discussion about statistical evidence labors over questions whether findings are reliable enough to confirm a prediction and support or reject the hypothesis. That is, how confident can we be that chance or irrelevant details did not account for the pattern we observe in the data? Unless a chance outcome is unlikely, the study does not confirm or refute the hypothesis. Drawing from offhand footnotes in two cases,
Hazelwood School District v. United States,
The ICG’s argument about labor markets is an effort to persuade us that the plaintiffs’ expert specified the hypothesis incorrectly. The plaintiffs replied that the ICG’s decision to hire from Carbondale shows that there was a single labor market. Neither side asked whether the specification of the test adequately accounted for the hiring decision. Plaintiffs’ statistical work leaves much to be desired. The plaintiffs’ expert used no independent variables other than race. He assumed, in other words, that all applicants are identical in every respect except race. This is not necessarily true. For laborers’ jobs, other variables matter: physical condition (for which age and a weight/height ratio may be proxies) and employment history (has the person been fired from other jobs or convicted of job-related offenses?) are important to any employer. The omission of these variables weakens the plaintiffs’ case by leaving open the possibility that important, non-racial variables account for the hiring decisions. But the ICG does not suggest that these things account for the pattern of its hiring. The ICG’s expert replicated the findings of the plaintiffs’ expert, and the ICG does not offer any explanation other than distance from work for the startling disparity in rates of hiring. We are not about to discount the plaintiffs’ statistical work on grounds that the employer, with the best access to data, chose not to raise. We therefore conclude, with the district court, that the plaintiff class made out a presumptive case of disparate treatment.
II
The railroad’s response to the plaintiffs’ evidence is to say that distance accounts for everything. The district court found that the ICG had a preference for people who lived close to work (where “close” meant less than 50 miles), though not a hard-and-fast rule. We accept this finding. The ICG then asserts that black applicants lived farther from the ICG’s places of work. If this disparity in distance produced the disparity in hiring, the ICG would have a sufficient answer to the disparate treatment claim. (Whether it would have a sufficient reply to the disparate impact claim we need not say.) The essential ingredient of the ICG’s case was proof that black applicants did live farther from the worksites, enough so to account for the different rates of hiring.
The ICG had a straightforward way to establish its claim. Applications for employment gave the applicants’ addresses. The ICG could have measured the distance between the residences of “active” applicants (those who applied within six months) and the places for which it was hiring at a given moment. If the data showed that it hired the applicants who lived close by, without regard to race — that is, if black and white applicants who lived 10 miles from work had equal probabilities of being hired, and applicants who lived 10 miles from the site had higher probabilities than applicants who lived farther away — it would have had a good reply to the plaintiffs’ case. But the ICG did not conduct this inquiry. Its expert regarded the applicant data as incomplete and had trouble matching the applications active at a given moment against jobs then available. See
This made it difficult, though not impossible, to test the ICG’s hypothesis. Showing (as the ICG did) that 80% of the laborers for which it compiled data live within 50 miles of their assigned station does not prove that the ICG used any sort of distance test in hiring. People generally want jobs close to home, and laborers especially are unlikely to drive more than 100 miles round trip daily for the wage being offered. An employer that paid no attention to the residence of its applicants likely would find that 80% or more of its employees lived within 50 miles of work; self-selection would see to that. Furthermore, the observation that 80% of those hired lived within 50 miles of work is consistent with the hypothesis that 80% of those not hired lived within 50 miles of the ICG’s worksites. It is consistent, indeed, with the proposition that black applicants lived closer to the jobsites than white applicants, and that had the ICG hired more blacks it would have had a more compact labor force.
The hypotheses that black applicants live farther from the ICG’s yards than white applicants and that the ICG tried to minimize the commuting distance have some testable implications, however, even if the data are limited to people actually hired. One is that the white laborers live closer to work than the black laborers do. If, for example, white applicants live (on average) 30 miles from the ICG’s worksites, while black applicants live 60 miles away, and if the distributions around these means are normal, then the ICG will be hiring from the “tail” of the black applicant pool and the center of the distribution of the white applicant pool. The whites hired will live closer to work than the blacks hired; the distribution of commuting distances for white employees also will be more compact. (That is, the standard deviation of distances from work for white employees will be lower than the standard deviation for black employees.) Although the ICG hired as its expert witness a statistician who had written extensively on the subject, the ICG either neglected to test for these implications or elected not to inform the district court of its findings.
The ICG’s expert introduced a number of tabulations that have allowed us to determine the averages, though with some imprecision because we used only the aggregate data the expert compiled. Black laborers hired for non-mobile jobs lived only half as far from their worksites as white laborers. The distribution of commuting distances for black laborers was much more compact.
                                                     1976-1980          1979-1980
                                                  Blacks   Whites    Blacks   Whites
Mean distance from work, miles                     14.2     29.4      16.5     30.7
Standard deviation of distance from work, miles    15.9     33.7      18.8     28.6
The ICG’s expert did not have data for all of the ICG’s employees, so we used a standard test of statistical significance to inquire whether the distances (and distributions) of white and black laborers might be the same, appearing different only because of the sampling from the pool of employees. The values of the t-test exceed 4.0 for both sets of years, which implies a probability less than one in a thousand.[1] So far the ICG’s data support the plaintiffs’ case rather than the ICG’s.
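As a rough illustration of the significance test just described, the sketch below computes a two-sample t statistic from the tabulated 1976-1980 means and standard deviations. The opinion does not report the number of laborers in each group, so `n_black` and `n_white` are hypothetical placeholders, and the unequal-variance (Welch) form is an assumption; the court says only that it used a standard test.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic allowing unequal variances (Welch form)."""
    standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error

# Means and standard deviations (miles) from the 1976-1980 columns of the table.
# The sample sizes are HYPOTHETICAL; the opinion does not state them.
n_black, n_white = 100, 500
t_stat = welch_t(14.2, 15.9, n_black, 29.4, 33.7, n_white)
print(f"t = {t_stat:.2f}")
```

With these assumed sample sizes the statistic is well beyond 4 in absolute value. Smaller samples would shrink it, which is why the actual counts matter to the inference.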
There is a related implication: if black applicants live farther from the jobsites than white applicants, then the “exceptions” to the ICG’s desire to minimize distance should favor black applicants, who will be overrepresented in the pool eligible for exceptional treatment. The St. Louis division had 122 non-traveling laborers who lived more than 50 miles from work. All but three were white. The ICG has no explanation for this datum, which was introduced at trial and is inconsistent with its contention that black applicants lived farther from work than white applicants.
It might be possible to explain these findings by showing that the ICG had some rural worksites, with predominantly white workers commuting long distances, and some urban worksites with black laborers commuting short distances; the laborers commuting more than 50 miles also would be concentrated at the rural sites. But the ICG did not suggest that this accurately represents its employment picture, and none of its sites had a labor force more than 30% black, which seems to exclude the possibility we mention. (The largest urban area in the district, St. Louis and East St. Louis, easily could produce a labor force more than 30% black, and the white workers at these sites would not have commuting distances longer than the black workers.)
The ICG’s statistical expert took a different approach altogether. See
The district court was not convinced by this evidence,
We have used an extreme hypothetical, but the problem is real. Suppose there are two counties, equal in size and distance to the ICG’s worksites. If one county’s labor force is 30% black and the other county’s labor force 10% black, race-blind hiring from these two counties as a unit should produce a labor force 20% black (if people apply for work in proportion to their numbers). Suppose, however, the ICG hires only from the second county (with 90% white laborers). The first county will receive a weight of zero, and if the ICG’s hires are 10% black the study will conclude that there has been no discrimination. This is an artifact of the way in which the hypothesis was formulated. We need not go into the details of statistical significance to see that the results are unreliable. The problem is more than hypothetical. The plaintiffs offered evidence that some counties, with relatively large black labor forces, had been skipped over while the ICG hired from other, “whiter” counties farther from its workplaces. Four counties in the St. Louis division have “diminished civilian labor forces” more than 20% black. The ICG hired significant numbers of employees from only one of these counties, while hiring larger numbers from adjacent counties.
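The two-county hypothetical can be worked through numerically. The sketch below is only an illustration of the weighting problem described in the text; the function name and the representation of weights are assumptions made for the example, not a description of how the ICG’s expert actually built the study.

```python
def benchmark_black_share(black_shares, weights):
    """Weighted average of county black labor-force shares."""
    total = sum(weights)
    return sum(s * w for s, w in zip(black_shares, weights)) / total

county_black_shares = [0.30, 0.10]  # the two counties in the hypothetical

# Equal weights: the counties are equal in size and distance.
pooled = benchmark_black_share(county_black_shares, [1, 1])

# Weights taken from where the employer actually hired: only county two.
hire_weighted = benchmark_black_share(county_black_shares, [0, 1])

print(f"pooled benchmark: {pooled:.0%}, hire-weighted benchmark: {hire_weighted:.0%}")
```

A work force that is 10% black falls well short of the pooled benchmark yet matches the hire-weighted benchmark exactly, because the decision to skip the first county is built into the yardstick.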
So the defendants’ statistical work simply does not show that distance accounts for the racial differences in the rate of hiring. The district court found that the ICG used distance as a ground of decision and apparently believed that this finding carried the day for the railroad on the disparate treatment claim. But it does not; there must be a causal link between criterion and consequence. Only that causal link would compel the plaintiff class to demonstrate (in the disparate treatment portion of the litigation) that the railroad was using distance to get at race (because of, not just in spite of, its disparate consequence). The only piece of evidence providing the causal link was the statistical study, which the district court set aside (correctly, we have held). That left the ICG’s burden of production unsatisfied.
This is so even though we have considered only the plaintiffs’ disparate treatment theory. Under
Texas Department of Community Affairs v. Burdine,
When the defendant pronounces a reason unrelated to the plaintiff (“a midget can’t do the job”, followed by silence on the plaintiff's height), it has not adequately articulated a neutral reason within the meaning of Burdine. The employer’s burden of production means that it must introduce facts sufficient in principle to explain what happened. The ICG did not need to “prove” the validity of its explanation, but it did need to give one. In a pattern-or-practice case like this the employer need not show that distance accounted for the decision concerning any particular applicant, or that distance was important to rail operations (“business necessity”); it had at least to produce evidence that black and white applicants lived different distances from work.
Just in case the evidence was sufficient to satisfy the burden of production, we add that the explanation was pretextual. The definition of a pretext, which we discussed in
Pollard,
is a statement that does not describe the actual reasons for the decision. The employer need not have “good” reasons, and a mistaken business decision is not on that account a “pretext”. See also, e.g.,
Friedel v. City of Madison,
There remains only one possibility: that the ICG’s studies, although failures as attempts to relate hiring to distance, undercut the plaintiffs’ prima facie case of discrimination. They do not do so, however. Studies based on Census labor pool data rarely overcome studies based on actual applicants. Each data base produces some biases. When used to study unskilled jobs, each set of biases will favor the employer. As a result, an inference of discrimination from either is strong, and failure to replicate the inference using other data does not support a finding of no discrimination.
Statistical analysis of the actual applicants has the advantage of self-selection: the study examines how the employer actually treated the people who wanted the job. Applicant studies are preferable as a rule because Title VII governs the treatment of applicants. What better statistical base than the people to whom Title VII applies? See
Movement for Opportunity and Equality v. General Motors Corp.,
People apply because they want the job more than their current employment (or prefer it to unemployment). Even those who want the job will not apply, however, unless they think they have some chance of being hired. They must travel to Carbondale, which may be a long and costly trip (costly not only in money but also in time, which includes income from other work foregone). To determine whether a study of applicants will identify an employer’s rule of decision (taking account of race and taking account of distance are rules of decision), we must first know how the rule of decision influences who applies.
Take the first possibility, that the firm is using race as a rule of decision. If this is known (either to applicants or to those ubiquitous informational intermediaries, employment agencies, which act as applicants’ proxies), it will deter blacks from applying. Cf.
Dothard v. Rawlinson,
An employer’s decision to take distance into account also affects who applies. If the employer announces that it will hire only people who live within 50 miles of work, then only people with this characteristic will apply. It will make no difference whether blacks in the St. Louis district live farther from the ICG’s yards than whites; the distant blacks will select themselves out. If white workers predominate in the vicinity of the ICG’s worksites, and people know of the ICG’s preferences, most applicants will be white. In equilibrium people will self-select so that each applicant has an equal chance of being hired, which will mean, if the ICG is race-blind, that the average distance from work of black and white applicants is identical.[2] Whether or not black people in the district live farther from the ICG, black applicants should be situated similarly to white applicants. This, too, produces a bias in the ICG’s favor. If what the ICG says about its hiring policy is true, hiring rates of actual applicants by race should be similar even if the average distances of potential applicants are dissimilar. When the study shows a substantial disparity in hiring rates even among applicants, it has made a strong case.
Now consider the potential bias in a study based on population rather than applicants. The labor force in any county contains many workers satisfied with their jobs, perhaps because they are already earning more than the ICG offers to unskilled labor. Because black workers in any given labor force earn lower wages than their white counterparts, 1987 Statistical Abstract of the United States Table 680, and are unemployed more frequently, id. at Tables 664, 667, they are more likely to be available for unskilled work than are white workers in the same labor force. Knowledge that 10% of the workers in the “diminished civilian labor force” of a given county are black does not imply that only 10% of the laborers realistically available to the ICG are black. The ICG’s offer will beat the status quo more frequently for black laborers than for white laborers. If 10% of the labor force in a given market is black, 20% or more of the laborers available to the ICG may be black. A study disclosing that the employer’s hires match the color distribution of the “diminished civilian labor force” (in this hypothetical, that 10% of the laborers are black) does not permit a confident inference that the employer is not discriminating.[3]
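The step from a labor force that is 10% black to an available pool that is 20% black can be made concrete. In the sketch below the availability rates (the fraction of each group for whom the ICG’s offer beats the status quo) are hypothetical numbers chosen only to reproduce the 20% figure in the text; the opinion supplies no such rates.

```python
def available_black_share(labor_force_black_share, avail_rate_black, avail_rate_white):
    """Share of realistically available laborers who are black."""
    black = labor_force_black_share * avail_rate_black
    white = (1 - labor_force_black_share) * avail_rate_white
    return black / (black + white)

# HYPOTHETICAL availability rates: 45% of black workers and 20% of white
# workers would prefer the offered job to their current situation.
share = available_black_share(0.10, avail_rate_black=0.45, avail_rate_white=0.20)
print(f"black share of available laborers: {share:.0%}")
```

Under these assumed rates, an employer whose hires are 10% black matches the labor force figure while hiring black laborers at only half their share of the pool actually available.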
The record in this case contains both applicant and labor force studies. Each kind of study, for unskilled labor, is biased against finding discrimination. When the applicant pool study produces a powerful inference of discrimination, and the labor force study produces at best a weak inference of race-neutral action, the plaintiffs have done what they must.
Another, more general, effect is at work, and it cuts in the other direction. Either kind of study can overstate discrimination because of the way statistical methods deal with unknown variables. Much statistical work — including that in this case — implicitly attributes “unexplained” variance to race. Yet labor markets are poorly understood; to chalk up all unexplained variation to race is to find more discrimination than there is. There is also a selection bias in selecting cases to be litigated. A statistician calls a result “significant” if there is less than one chance in 20 that it is consistent with the null hypothesis (here, race-blind hiring). In a world with 1000 employers, 50 will meet this criterion of significance by chance. These will become targets of litigation, and courts are apt to find discrimination in all 50 cases even though by hypothesis none of the 1000 employers is discriminating. Because many employers have multiple divisions, and each job for each year in each division may be the subject of a separate study, each employer is bound to have some job categories in which statistical techniques will show chance variations that permit the inference of discrimination. See David W. Barnes, The Problem of Multiple Components or Divisions in Title VII Litigation: A Comment, 46 L. & Contemp.Prob. 185 (Aut. 1983).
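The arithmetic behind the 1000-employer example, and behind the point about multiple divisions and job categories, can be sketched as follows. The sketch assumes that the tests are independent, which the opinion does not say and which real studies of one employer may not satisfy; the function names are illustrative only.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is 'significant' by chance."""
    return 1 - (1 - alpha) ** n_tests

# 1000 non-discriminating employers, each tested once at the 1-in-20 level.
print(expected_false_positives(1000))

# One non-discriminating employer with 20 separate job/year/division studies
# (20 is a HYPOTHETICAL count chosen for illustration).
print(round(prob_at_least_one(20), 3))
```

Even a modest number of separate studies makes a chance "finding" more likely than not, which is the concern the paragraph raises.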
These troubling effects of statistical inferences require thoughtful consideration in each case. Courts ought not demand that employers do the impossible and explain behavior that baffles the best labor economists. And it would be most unpleasant if statistical techniques were simply being used to find the inevitable imbalances in a population of non-discriminating firms. Parties should address explicitly how to handle unexplained variance. There are several potentially useful methods. A court may use the EEOC’s “80% rule” (which Professors Meier, Sacks & Zabell explain and endorse, and which was satisfied here). Perhaps a court should say that only statistical models explaining a substantial portion of the variance may be used to infer discrimination (which Professor Campbell believes the best approach). See also Richard Goldstein, Two Types of Statistical Errors in Employment Discrimination Cases, 26 Jurimetrics 32 (1985). The disparities in this case are so striking, however, and the ICG’s efforts to explain them so unsuccessful, that the tendency of statistical devices in general to assign as “discrimination” too much of what is simply puzzling does not assist the ICG. The prima facie case of discrimination stands, and as we have concluded that the ICG’s studies do not satisfy the employer’s need to articulate a reason unrelated to race, the class must prevail on the merits.
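For reference, the “80% rule” mentioned above compares one group’s selection rate with another’s and treats a ratio below four-fifths as evidence of adverse impact. The sketch below applies that comparison to the hiring rates reported at the start of the opinion; the function names are illustrative, and the calculation is offered only as a reading of how the rule would apply to those figures, not as the computation any witness performed.

```python
def selection_ratio(rate_focal, rate_reference):
    """Ratio of one group's hiring rate to the reference group's hiring rate."""
    return rate_focal / rate_reference

def below_four_fifths(rate_focal, rate_reference, threshold=0.8):
    """True when the ratio falls under the four-fifths threshold."""
    return selection_ratio(rate_focal, rate_reference) < threshold

# Hiring rates reported in the opinion for the Carbondale office.
ratio_1979 = selection_ratio(0.11, 0.39)   # 11% of black vs 39% of white applicants
ratio_1980 = selection_ratio(0.0, 0.061)   # no black applicants vs 6.1% of white applicants

print(f"1979 ratio: {ratio_1979:.2f}, 1980 ratio: {ratio_1980:.2f}")
```

Both ratios fall far below 0.8, which is consistent with the court’s description of the disparities as striking.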
III
Robert Earl Mister, the class representative, pressed a disparate treatment claim on his own behalf. The district court resolved this adversely to Mister.
The judgment is affirmed to the extent it concerns Mister’s personal claim. The remainder of the judgment is reversed, and the case is remanded for the award of appropriate relief. The class shall recover its costs. Circuit Rule 36 shall not apply on remand.
Notes
1. The distribution is not “normal”, so the t-test may not be the best, but its values are so large (and there are so many degrees of freedom) that we have confidence that any different method of analysis would not come up with a different answer.
2. One caveat. Societal discrimination against black workers would reduce the opportunity cost of their time. That is, black applicants on average would be giving up less income from other employment to apply for a new job, and therefore would travel farther and accept lower probabilities of being hired. The railroad then would hire a smaller portion of black applicants without engaging in discrimination. This undercuts, to an unknowable degree, both self-selection effects discussed in the text. It is impossible to believe, and the ICG does not contend, that this counterweight is large enough to account for disparate effects of the magnitude the plaintiffs established.
3. The bias comes from the supposition that the employer is hiring unskilled labor. It can easily be reversed if the employer is hiring skilled labor. That 10% of the workers in the labor force are black does not imply that 10% of the workers qualified to be aeronautical engineers or professors of Greek are black. Proof that an employer of engineers hired in proportion to the race of the labor force as a whole would acquit it of discriminating against blacks. We mention this to reinforce the point that it is necessary to review the inferential process case by case; the biases of one study in one case may be avoided or reversed in the next.
