Marilyn JOHNSON, et al., Plaintiffs-Appellants/Cross-Appellees, v. CITY OF MEMPHIS, Defendant-Appellee/Cross-Appellant.
Nos. 13-5452, 13-5454
United States Court of Appeals, Sixth Circuit.
Argued: Jan. 30, 2014. Decided and Filed: Oct. 27, 2014.
Rehearing En Banc Denied Dec. 30, 2014.
770 F.3d 464
* Judge Donald recused herself from participation in this ruling.
Before: SUHRHEINRICH, GIBBONS, and COOK, Circuit Judges.
OPINION
COOK, Circuit Judge.
After more than thirteen years of litigation, including a bench trial, numerous preliminary injunctions, and a previous appeal affirming the grant of injunctive relief for some plaintiffs, see Johnson v. City of Memphis (“Johnson Appeal I”), 444 Fed. Appx. 856, 861 (6th Cir.2011), three consolidated cases challenging the City of Memphis‘s (“City“) police promotional processes as racially discriminatory return on cross-appeals. The appeals address two allegedly discriminatory sergeant promotional processes that occurred in 2000 and 2002 (the “2000 process” and “2002 process”1), targeting three matters decided by the district court at different phases of the litigation: (1) the order dismissing plaintiffs’ negligence claim concerning the already-invalidated 2000 process under Tennessee‘s governmental-immunity statute,
For the following reasons, we affirm in part and reverse in part the district court‘s judgment, and we remand the fees issues for further consideration.
I. BACKGROUND
We briefly summarize the factual background of these cases thoroughly detailed in the district court‘s bench-trial opinion. The City‘s promotional processes have engendered controversy for nearly forty years, prompting numerous lawsuits alleging racial and gender discrimination by
The City responded with a 1996 promotional process (“1996 process“) designed by Dr. Mark Jones, an industrial and organizational psychologist, and overseen by a Department of Justice consultant. The 1996 process consisted of four components, weighted as follows: a “high-fidelity” law enforcement role-play exercise, 50%; written test, 20%; performance evaluations, 20%; and seniority, 10%. Arbitration proceedings involving claims under the City‘s Memorandum of Understanding with the police union ensued, but no Title VII litigation resulted.
Dr. Jones modeled the City‘s next promotion protocol after the 1996 process, replacing the role-play component with a video-based practical test because of security and practicability concerns. The 1996 simulation had taken more than two months (testing and scoring) to evaluate individually more than 400 candidates, and the City discovered problems with candidate coaching during the exercise. The following components initially comprised the 2000 process: a “low-fidelity” (i.e., no role-play) video-based practical test, 50%; job knowledge test, 20%; performance evaluations, 20%; seniority, 10%. After the City discovered that leaked answers compromised the results of the video test, the City excluded the video test and reweighted the remaining test components. The adjustments to the 2000 process prompted the first of these disparate-impact cases, Johnson v. City of Memphis, No. 00-2608, and the City ultimately consented to the invalidation of the 2000 process by Judge Jon McCalla in June 2001. (See R. 58, Order at 1-2.2)
Attempting to avoid the test-security issues encountered in the previous two promotional periods, the City hired outside consultants Jeanneret & Associates to design the replacement tests that would become the 2002 process. After the City submitted a testing proposal to the district court, Judge McCalla held a status conference to hear plaintiffs’ objections and instructed plaintiffs’ expert to work with the City‘s expert, Dr. Richard Jeanneret. The City addressed the concerns raised by plaintiffs’ expert, and the district court granted the City‘s motion to proceed with the 2002 process. The 2002 process included the following equally weighted test components: an investigative logic test; a job-knowledge test; an application-of-knowledge test; a grammar and clarity test; and a “low-fidelity” video-based practical test.
The City administered the 2002 process to 517 applicants between September 27-29, 2001, and completed grading in fall 2002. Raw scores ranged from 174.75-358.75 out of a possible 384.5 points. The City converted these scores to a 100-point scale and then—honoring an agreement with the officers’ union—added up to 10 points for seniority to the final promotion score. Promotion scores ranged from 53.511-103.303, of a possible 110 points. Despite the City‘s efforts, the 2002 process resulted in minority candidates scoring disproportionately worse than white candidates. Using Dr. Jeanneret‘s rank-or-
The district court held a bench trial in July 2005 and issued its decision in December 2006. Its Memorandum Opinion and Order on Remedies rejected all claims except plaintiffs’ Title VII disparate-impact claims as to the 2002 process. The court found that, while the 2002 sergeant test was valid and reliable, less discriminatory valid alternatives were available and, thus, the 2002 process violated Title VII. Though the court ordered the promotion of all minority plaintiffs, with back pay and seniority, it denied plaintiffs’ request, at that time, to compete for promotion to the rank of lieutenant because they lacked the requisite two years’ experience as sergeant. See Johnson Appeal I, 444 Fed. Appx. at 857 (detailing district court‘s procedural history).
Following the bench-trial decision, the district court fielded a variety of remedies-related motions for injunctions and stays between 2007 and 2010. Because so much time had passed since the problematic 2000 and 2002 processes, plaintiffs’ alleged injuries, in terms of lost pay and seniority, spilled over into subsequent promotional processes, as plaintiffs were denied the opportunity to apply for additional promotions. At different points, court orders relying on the Title VII judgment invalidating the 2002 process permitted plaintiffs to participate in those promotions, see generally Johnson Appeal I, 444 Fed. Appx. at 857 (lieutenant promotions), but the district court repeatedly denied plaintiffs’ request for additional retroactive seniority and back pay.
In March 2010, the court entered a preliminary injunction ordering the immediate promotion to the rank of lieutenant of 28 plaintiffs with passing exam scores and sufficient work experience, and we affirmed in Johnson Appeal I, 444 Fed. Appx. at 857-58, 861. In affirming the preliminary injunction, the panel expressed “concern[] at the degree of delay” of “this case, now in its eleventh year,” and admonished that it would entertain a mandamus petition if the district court failed to enter a final judgment within the next six months. Id. at 861 (noting that the district court‘s 2006 bench-trial decision “remains interlocutory almost five years later“). After plaintiffs petitioned for mandamus in January 2013, the district court awarded back pay, interest, and attorneys’ fees and entered a final judgment, whereupon plaintiffs voluntarily dismissed their mandamus action.
The plaintiffs appeal the immunity-based denial of their negligence claim related to the 2000 process and various remedies and attorneys’ fees issues related to the 2000 and 2002 processes; the City cross-appeals the district court‘s Title VII judgment invalidating the 2002 process and the related million-dollar attorneys’ fees award; and the plaintiffs present an alternative legal justification3 for the Title VII judgment against the 2002 process.
II. JOHNSON I PLAINTIFFS’ APPEAL: NEGLIGENCE CLAIM, 2000 PROCESS
First, the non-minority Johnson I plaintiffs dispute the application of governmental immunity to their negligence claim, targeting the already-invalidated 2000 process. They press this claim—their only one seeking damages—arguing that the decisionmakers responsible for the 2000 process committed non-discretionary acts ineligible for immunity. We review the district court‘s grant of summary judgment de novo. Ciminillo v. Streicher, 434 F.3d 461, 464 (6th Cir.2006).
According to the Johnson I plaintiffs, City officials violated a key provision of the City Charter requiring the use of “practical tests” in the promotion process. Specifically, they object to the City‘s exclusion of the interactive, video-based component of the 2000 process upon discovering that some candidates received advance notice of the questions.
The district court rejected this argument, finding that “the decisions concerning what type of test to use, how to weight the various testing components, and how the tests are to be administered are left to the discretion of the director of personnel,” and noting that the Charter‘s practical-test requirement “must be interpreted by those in a position to make such decisions for [the City].” We agree with the district court.
Tennessee‘s Governmental Tort Liability Act (GTLA) immunizes the state‘s public officials from negligence suits where “the injury arises out of . . . [t]he exercise or performance . . . of a discretionary function, whether or not the discretion is abused.”
Contrary to the Johnson I plaintiffs’ suggestion, the City Charter and related ordinance do not require “practical tests.” Rather, they provide that employment examinations “shall be of a practical nature and relate to such matters as will fairly test the relative competency of the applicant to discharge the duties of the particular position.” (R. 656-25, City Charter § 250.1 (emphasis added); accord R. 656-26, Civil Service Ordinance § 9-3.) This subtle difference suggests that the regulations provide a broad instruction that examinations test actual job functions, instead of a strict requirement for a specific
The district court correctly recognized that City officials must interpret and implement the Charter‘s broad guidance in devising fair and effective promotional processes. In the absence of specific regulations confining the City‘s discretion, GTLA immunity shields this discretionary decision. See Giggers, 363 S.W.3d at 507-08. We therefore AFFIRM the district court‘s grant of partial summary judgment to the City on this claim.
III. CITY‘S CROSS-APPEAL: TITLE VII JUDGMENT, 2002 PROCESS
Next, the City cross-appeals the district court‘s bench-trial ruling finding a Title VII disparate-impact violation. The parties agree that plaintiffs presented a prima facie case of the 2002 process‘s disparate impact; the City promoted 264 of the 517 candidates, with a substantial disparity between the success rate of non-minority (175/240) and African-American candidates (86/274). The City argues, however, that the court applied an unduly deferential legal standard in finding that plaintiffs showed less discriminatory alternatives to the 2002 process. We review the court‘s legal conclusions de novo and findings of fact for clear error. E.g., Beaven v. U.S. Dep‘t of Justice, 622 F.3d 540, 547 (6th Cir.2010).
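The promotion figures quoted above can be turned into selection rates with a few lines of arithmetic. The sketch below is illustrative only and is not part of the court's analysis: the function name is hypothetical, and the four-fifths comparison is an assumption borrowed from the EEOC Uniform Guidelines' rule of thumb (29 C.F.R. § 1607.4(D)), which the opinion does not apply here because the parties agreed that a prima facie case existed.

```python
from fractions import Fraction


def selection_rate(promoted: int, candidates: int) -> Fraction:
    """Return the share of candidates in a group who were promoted."""
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    return Fraction(promoted, candidates)


# Figures quoted in the opinion for the 2002 process.
non_minority = selection_rate(175, 240)
african_american = selection_rate(86, 274)

# Ratio of the lower selection rate to the higher one.
impact_ratio = african_american / non_minority

# Assumption for illustration: the four-fifths (80%) rule of thumb.
FOUR_FIFTHS = Fraction(4, 5)

print(f"Non-minority selection rate:     {float(non_minority):.1%}")
print(f"African-American selection rate: {float(african_american):.1%}")
print(f"Impact ratio:                    {float(impact_ratio):.2f}")
print(f"Below four-fifths threshold:     {impact_ratio < FOUR_FIFTHS}")
```

On these figures the selection rates are roughly 72.9% and 31.4%, an impact ratio of about 0.43, which is consistent with the parties' agreement that the disparity was substantial.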
A. The Title VII Disparate-Impact Standard
Though Title VII disparate-impact claims originated with the Supreme Court‘s decision in Griggs v. Duke Power Co., 401 U.S. 424, 91 S.Ct. 849, 28 L.Ed.2d 158 (1971), Congress codified the disparate-impact standard in the Civil Rights Act of 1991. See
[First,] a plaintiff establishes a prima facie violation by showing that an employer uses “a particular employment practice that causes a disparate impact on the basis of race, color, religion, sex, or national origin.”
The City contests plaintiffs’ step-three showing of less discriminatory alternatives. To satisfy this element, the plaintiff must demonstrate: (1) the availability of alternative procedures that serve the employer‘s legitimate interests and (2) produce “substantially equally valid” results, but with (3) less discriminatory outcomes.
B. Components of the 2002 Process & Plaintiffs’ Proposed Alternatives
As noted above, the 2002 process consisted of five testing components: (1) a “low-fidelity” video test, which required oral responses to video depictions of law enforcement scenarios; (2) an investigative logic test, consisting of multiple-choice and short-answer questions; (3) an open-book job-knowledge test; (4) an application test, with weighted scores differentiating between the most and least effective responses; and (5) a written communications exam testing for grammar and clarity.
As they did before the district court, plaintiffs assert three available alternatives to improve the 2002 process: (1) the 1996 process‘s high-fidelity role-playing exercise, which required candidates to respond to simulated law-enforcement scenarios (“1996 simulation“); (2) assessments of candidates’ “integrity” and “conscientiousness“; and (3) a merit-promotion system similar to one used by the Chicago Police Department, which consists of interviews by merit-review boards. Yet, in arguing before this court for these alternatives, they shirk their duty to demonstrate the benefits of the Chicago-plan and integrity/conscientiousness theories, defending only the 1996 simulation as equally valid and less discriminatory. (Third Br. at 31-37.) Similarly problematic, plaintiffs neglect to explain how any of these alternatives would fit into the 2002 process, but we gather that they would either replace or complement its existing components.
Plaintiffs vouch for the 1996 simulation by pointing to its past success, including a sterling validation report documenting its non-discriminatory results. They also tout its benefits compared to the less practical (i.e., less like actual job duties), low-fidelity video test used in the 2002 process. Finally, they rely on their expert‘s claim that the 1996 simulation is more valid than the 2002 tests and “easily replicated.” (See Third Br. at 32-35; R. 648-13, Trial Tr. (DeShon) at 1681-82; see also R. 648-15, Trial Tr. (DeShon) at 1848 (likening the difference between high-fidelity simulations and low-fidelity response exercises to “knowing versus doing“).)
C. The District Court‘s Bench-Trial Findings Regarding Available Alternatives
After summarizing the proffered alternatives, which the court characterized as “broad suggestions [of] alternative testing modalities,” the court found that plaintiffs satisfied the step-three burden of demonstrating available, equally valid, less discriminatory alternatives. It reasoned as follows:
It is of considerable significance that the City had achieved a successful promotional program in 1996 and yet failed to build upon that success. While the 1996 process was not perfect it appears to have satisfied all of the legal requirements of promotional processes. The 2000 process departed substantially from the 1996 model in its abandonment of the practical exercise and reweighting of the remaining elements. The 2002 processes, while arguably more sophisticated than its predecessors, suffered from a grossly disproportionate impact on minority candidates.
It is unnecessary for the Court to scrutinize the advisability of incorporating assessments of qualities such as integrity and conscientiousness or the relative merits of the Chicago process. It is sufficient to acknowledge that the existence of such alternative measures and methods belies, as Plaintiffs suggest, Defendants’ position that they had no choice but to go forward with the 2002 promotion process despite its adverse impact because no alternative methods with less adverse impact were available.
Defendant argues that Plaintiffs have failed to meet their burden because none of the alternatives now suggested were proposed at the time the 2002 process was implemented. This argument misconstrues the appropriate standard. Plaintiffs must prove that there was “another available method of evaluation which was equally valid and less discriminatory.” Bryant v. City of Chicago, 200 F.3d 1092, 1094 (7th Cir.2000) (emphasis added). Plaintiffs are not required to have proposed the alternative. The requirement is only that the alternative was available. The Court reads “availability” in this context to mean that Defendant either knew or should have known that such an alternative existed. Plaintiffs have amply demonstrated that Defendant knew of all three alternatives they have set forth.
(R. 388, Bench Trial Op. at 25-26.)
Notably, the court relies on the relative success of the 1996 test, without (1) requiring evidence that the 2002 process would benefit from incorporating the 1996 test‘s simulation, or (2) addressing the City‘s interest in test-security, in light of the 1996 simulation‘s documented cheating. Also, the district court expressly declines to consider the merits of the integrity/conscientiousness and Chicago-plan alternatives, resting its conclusion solely on the City‘s denial of alternatives.
D. The City‘s Challenge to the Court‘s Analysis
The City challenges the district court‘s judgment, asserting both legal error and factual deficiencies with plaintiffs’ step-three showing. Though plaintiffs characterize the City‘s argument as an attack on the district court‘s factual findings, invoking the deference of clear-error review, the district court‘s analysis contains legal errors subject to our de novo review. Beaven, 622 F.3d at 547.
First, the district court readily admits crediting the Chicago-plan and integrity/conscientiousness alternatives without considering their relative merit; this approach conflicts with Title VII‘s requirement that plaintiffs prove the availability
Second, the district court accords “considerable significance” to the results of the 1996 simulation with no discussion of the City‘s test-security concerns. Courts recognize employers’ legitimate interest in preserving the integrity of their employment processes. E.g., Hearn v. City of Jackson, 340 F.Supp.2d 728, 742 (S.D.Miss.2003) (overruling disparate-impact plaintiffs’ proposal requiring all applicants to complete a lengthy, interview-based selection procedure, noting the city‘s legitimate interests in resource preservation, avoiding the appearance of selection bias, and preventing later applicants from obtaining the questions in advance), aff‘d, 110 Fed.Appx. 424 (5th Cir.2004) (per curiam).
Here, the City presented undisputed evidence that leaked information and candidate coaching compromised both the 1996 simulation and its 2000-process replacement, a video-based test of law enforcement techniques. (R. 648-6, Trial Tr. (Jones) at 863-65 (discussing the “coaching” problems experienced with the 1996 simulation); R. 648-16, Trial Tr. (Claxton) at 2003 (explaining that City employees were excluded from the creation of the 2002 process, because “city employees are accused of funneling questions and/or answers to participants in a prior process“).) Though candidate coaching did not affect the outcome of the 1996 simulation—evaluators helped poor-performing candidates who would not qualify for promotion—it exposed a security flaw, and the 1996 process‘s designer testified that the simulator “was [the] weakest link” of the process, noting that “it contributed to most of the race differences” arising from the 1996 process‘s testing methodologies. (R. 648-7, Trial Tr. (Jones) at 921-22.) The parties certainly knew of these security problems during the development of the 2002 process, as evidenced by Judge McCalla‘s statements at the parties’ June 27, 2001 status conference. (See, e.g., R. 656-17, 6/27/01 Hr‘g Tr. at 42 (“[T]he issues that arose in the previous test, we don‘t want to run the chance of affecting the outcome of the test by giving out unnecessary information. . . .“).)
Third, the district court‘s analysis elides the City‘s concern regarding the impracticability of the 1996 simulation, which required numerous actors to portray the two-hour law enforcement scenarios and took nearly three months to evaluate more than 400 applicants. (See R. 648-6, Trial Tr. (Jones) at 863-66.) As the City‘s expert explained, the protracted nature of simulation testing and the number of moving parts reinforced the City‘s concerns about testing security. (Id.; see also R. 648-11, Trial Tr. (Jeanneret) at 1461 (citing “all of the issues that had been raised about the [City‘s testing] and the confidentiality and . . . prior knowledge of the test and . . . the integrity of the process” as reasons he declined to use the 1996 process).) The court should have accounted for the City‘s legitimate interests in test security and practicability in assessing plaintiffs’ proffered alternatives. See Watson, 487 U.S. at 998, 108 S.Ct. 2777 (plurality) (“Factors such as the cost or other burdens of proposed alternative selection devices are relevant in determining whether they would be equally as effective as the challenged practice in serving the employer‘s legitimate business goals.“); see also Allen, 351 F.3d at 314-15 (considering proposal‘s effect on the city-employer‘s financial interests); Clady v. Cnty. of Los Angeles, 770 F.2d 1421, 1432 (9th Cir.1985) (“Financial concerns are legitimate needs of the employer.“); Chrisner v. Complete Auto Transit, Inc., 645 F.2d 1251, 1263 (6th Cir.1981) (“Of course, the marginal cost of another hiring policy and its implications for public safety are factors which should not be omitted from consideration.“).
Finally, the Seventh Circuit‘s decision in Allen persuades us that the district court erred by relying solely on the past success of the 1996 process in determining that the 2002 process should have incorporated a live simulation. Allen similarly involved police officers’ challenge to a city‘s promotion process. The officers proposed eliminating the written job-skills test from the process, so as to give full weight to merit-review boards. See Allen, 351 F.3d at 316-17. Noting the absence of “evidence that merit selection is inherently less likely to cause a disparate impact” than the other testing procedures, the court rejected this proposal and affirmed the grant of summary judgment to the city, explaining that “[t]he non-discriminatory history of past merit selection in the [Chicago Police Department] is not sufficient evidence to withstand the City‘s motion for summary judgment.” Id. at 317.
In sum, these legal errors improperly shifted plaintiffs’ evidentiary burden to the City, undermining the district court‘s judgment. At a minimum, we must vacate the district court‘s Title VII judgment. The City asks us to go further, though, and find plaintiffs’ step-three showing insufficient as a matter of law. We thus must decide whether plaintiffs’ evidence presents a triable issue as to the availability of equally valid, less discriminatory testing alternatives. It does not.
E. Plaintiffs’ Insufficient Step-Three Showing
As noted above, the plaintiffs’ appellate briefing defends the validity and racial impact of only the 1996 simulation. The plaintiffs first point to the 1996 process‘s validation report and the City‘s Answer, which concedes that the 1996 process resulted in no adverse impact. The plaintiffs next highlight their expert‘s testimony regarding the difference between high-fidelity simulations and the 2002 process‘s low-fidelity video test. Third, the plaintiffs claim that statistical evidence shows that the 1996 simulation had higher content validity and lower disparate-impact scores than the 2002 process‘s tests. Finally, the plaintiffs stress the simplicity and affordability of the 1996 process compared to the 2002 process. The scant evidence supporting these claims dooms plaintiffs’ reliance on the 1996 simulation as satisfying its step-three burden.
Beginning with the results of the 1996 process as a whole, that evidence does not persuade inasmuch as plaintiffs do not seek to substitute the entire 1996 process for the 2002 process.
As for the expert testimony, plaintiffs’ expert, Dr. Richard DeShon, asserted that high-fidelity exercises have greater validity than video-based tests, explaining that law enforcement simulations, like pilot simulators, require the candidate to perform the necessary tasks under realistic conditions. (See R. 648-4, Trial Tr. (DeShon) at 533; R. 648-15, Trial Tr. (DeShon) at 1848.4) But plaintiffs’ briefing
Subjective testing mechanisms open the door to random results and real and perceived scoring bias. See, e.g., Allen, 351 F.3d at 315 (“This court previously has noted the potential objection to subjective components of evaluation in selection procedures.“); Hearn, 340 F.Supp.2d at 742 (rejecting panel-interviews proposal, explaining that they “could have contributed to a feeling among candidates that the process was not fair and unbiased“); Nash v. Consol. City of Jacksonville, 895 F.Supp. 1536, 1553 (M.D.Fla.1995) (rejecting subjective performance evaluations, expressing concern that they “would open the process to favoritism, politics and tokenism“), aff‘d, 85 F.3d 643 (11th Cir.1996). Tellingly, plaintiffs’ counsel acknowledged this problem during the formulation of the 2002 process when he objected to the inclusion of subjective testing components. (See R. 657-1, Feb. 26, 2001 Letter to City‘s Expert at 4.) Equally revealing, plaintiffs’ appellate briefing remains silent on the subjectivity problem.
We might overlook this pitfall if plaintiffs proffered evidence detailing how a subjective component could be scored so as to minimize disparate impact. But, as discussed, they provide no explanation for how the City should have meshed the 1996 simulation into the 2002 process, whether as a replacement or supplement for the low-fidelity video test, other testing components, or the entire process. Without that type of evidence, plaintiffs lose their argument that use of a high-fidelity simulation would produce better outcomes, because plaintiffs acknowledge that “[e]very single component of the 2002 testing process resulted in ‘very substantial’ adverse impact.” (Third Br. at 34; see also First Br. at 23 (detailing the adverse impact of each testing component).)
The plaintiffs likewise neglect to account for the City‘s legitimate interests in test security and efficiency. The 1996 simulation, which individually evaluated more than 400 candidates’ law-enforcement techniques via two-hour role-play scenarios, required numerous actors to produce, lasted three weeks, and took two months to grade. (R. 648-6, Trial Tr. (Jones) at 863-66.) Then the City discovered instances of candidate coaching, for which the plaintiffs prescribe no remedy, seemingly content with their expert‘s unqualified assurance that the 1996 simulation would be “easily replicated” at a lesser cost than the 2002 process. (Third Br. at 35 (comparing the costs of the two processes: $79,250 for 1996, more than $400,000 for 2002).) But the costs argument overlooks the cheating problems associated with the 1996 and 2000 testing; the City hired outside consultants to design the 2002 process to insulate the exam from the potential biases of City employees. (See Second Br. at 14-15; R. 648-16, Trial Tr. (Claxton) at 2003.) And plaintiffs point to no evidence showing administration of a reliable simulation exercise to more than 500 candidates at a reasonable cost (time and money) and in a manner that minimizes the likelihood of candidate coaching or information leaking. The City‘s expert report advised the parties in 2001 that simulations pose such
At bottom, plaintiffs rest their proposal on the actual results of the 1996 simulation, stressing that it produced less racial disparity than the 2002 process. (See Third Br. at 35 (comparing the 1996 simulation‘s race-disparity score, d=.21, to that of the 2002 process, d=.83).) Yet, as the Seventh Circuit explained in Allen—and we agree—past practice alone does not suffice. 351 F.3d at 315-17. The “[p]ast success” of a specific testing process “merely predicts, but does not establish, success” in future applications. Id. at 315. This broadest of Title VII remedies—which requires no showing of discriminatory motive, see Griggs, 401 U.S. at 431, 91 S.Ct. 849—demands evidence that plaintiffs’ preferred alternative would have improved upon the challenged practice. See Allen, 351 F.3d at 315 (“We cannot require the City to [incorporate plaintiffs’ alternative testing proposal based] on mere speculation.“); Zamlen v. City of Cleveland, 906 F.2d 209, 220 (6th Cir.1990) (rejecting test-rescoring proposal, where plaintiffs offered only speculation of a less discriminatory impact). This is especially true here, where plaintiffs propose a cumbersome exercise with a track record of security problems, no objective measures of candidate performance, and no explanation for how it could fit into the 2002 process or why it would produce better outcomes. The one-off results of the 1996 simulation, without more, do not carry plaintiffs’ burden.
Though arguably forfeited by plaintiffs’ minimalist briefing, the Chicago-plan and integrity/conscientiousness-testing proposals fare no better. Again, plaintiffs offer no justification for their comparative validity or discriminatory effect, as compared to the 2002 process‘s testing features. We further note that the Chicago plan‘s use of merit-review boards suffers from the same subjectivity and speculation problems identified by the Seventh Circuit in Allen. See 351 F.3d at 315-17. As for integrity/conscientiousness testing, EEOC guidelines generally disfavor tests that measure abstract character traits by making inferences about candidates’ mental processes. See
Ultimately, the district court aptly described plaintiffs’ proposed alternatives as “broad suggestions.” No doubt, the 2002 process resulted in a substantially higher percentage of unsuccessful African-American applicants. But plaintiffs must offer more to establish a Title VII disparate-impact violation. Because plaintiffs failed
Perhaps anticipating this outcome, plaintiffs offer an alternative defense of the district court‘s Title VII judgment that assails the City‘s step-two showing (credited by the district court) that the 2002 process was job-related and consistent with business necessity. See Ricci, 557 U.S. at 578, 129 S.Ct. 2658. Accordingly, we backtrack to the step-two standard.
IV. PLAINTIFFS’ ALTERNATIVE DEFENSE OF TITLE VII JUDGMENT: THE CITY‘S STEP-TWO SHOWING
“Once the plaintiff succeeds in making a prima facie disparate-impact case, the defendant may avoid liability by showing that the protocol in question has a manifest relationship to the employment.” Davis, 717 F.3d at 494 (citation and internal quotation marks omitted). The City may meet its step-two burden by showing through “professionally acceptable methods, [that its testing methodology is] predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.” City of Akron, 824 F.2d at 480 (citation and internal quotation marks omitted). Courts often refer to a test‘s job-relatedness and business necessity in terms of its “validity“—denoting the test‘s relationship to relevant job content—and “reliability“—referring to its ability to produce consistent results. See, e.g., Guardians Ass‘n of N.Y. City Police Dep‘t, Inc. v. Civil Serv. Comm‘n, 630 F.2d 79, 101 (2d Cir.1980). When the employment position involves public safety, we accord greater latitude to the employer‘s showing of job-relatedness and business necessity. Chrisner, 645 F.2d at 1262-63 (finding sufficient support for an employer‘s truckdriving experience requirements, noting that “[a]n industry with the primary function of managing the safety of large numbers of passengers must be allowed more latitude in structuring the requirements which could [a]ffect the performance of a primary business objective“); see also Spurlock v. United Airlines, Inc., 475 F.2d 216, 219 (10th Cir.1972) (“[W]hen the job clearly requires a high degree of skill and the economic and human risks involved in hiring an unqualified applicant are great, the employer bears a correspondingly lighter burden to show that his employment criteria are job-related.“).
The City used a “content validity” model for the 2002 process that tests a “representative sample of the content of the job.”
A. District Court‘s Validity Findings
Here, in deeming the 2002 process‘s testing methods valid, the district court detailed Dr. Jeanneret‘s “comprehensive job analysis,” on behalf of the City, to identify the most important knowledge, skills, abilities, and personal characteristics (KSAPs) for the sergeant position.
B. District Court‘s Findings Regarding Reliability & Rank Ordering
Plaintiffs devote most of their alternative argument to the district court‘s findings regarding reliability and rank ordering. On reliability, the court found:
[The City‘s expert and the designer of the 2002 process] Dr. Jeanneret testified that he did not include a reliability estimate in the validation report because the 2002 process was heterogeneous, i.e., it measured numerous broad KSAP dimensions that were correlated with one another, and he felt that there was no appropriate estimate of reliability. According to Dr. Jeanneret, the most appropriate approach to reliability for such a heterogeneous test was test-retest reliability, which was not feasible under the circumstances. A reasonable alternative, Dr. Jeanneret asserted, would have been to develop an alternate form, requiring two identical tests which, he believed, was not possible in light of the particular testing environment. Since neither multiple administrations of the test nor parallel administration of identical tests were practicable, Dr. Jeanneret believed the only potentially applicable method of assessing reliability was to measure internal consistency using “coefficient alpha.” Dr. Jeanneret did not initially compute coefficient alpha because he intentionally designed a very heterogeneous test and making coefficient alpha, in his opinion, an inappropriate index of reliability.
Both Dr. Jeanneret and [plaintiffs’ expert] Dr. DeShon subsequently measured coefficient alpha, using somewhat different methodologies. Dr. DeShon reported an overall reliability coefficient of .76 using a method known as stratified alpha. Dr. DeShon included seniority in his analysis, which Dr. Jeanneret testified was inappropriate because seniority was not part of the measurement process. (Jeanneret, Tr. Vol. 11, 1287-88; DeShon, Tr. Vol. 5, 575; Tr. Vol. 16, 1898, 1912.) The Court agrees that inclusion of seniority was inappropriate in assessing the reliability of the test.
Since seniority was an administrative add-on component, there is no reason to expect that there would be a significant correlation or internal consistency between seniority and test items. Dr. Jeanneret eventually performed a reliability analysis using a “linear composite,” which resulted in a coefficient of .82. He also computed reliability using the formula for stratified alpha, which resulted in a coefficient of .83. The Court finds credible Dr. Jeanneret‘s testimony as to the limited applicability of coefficient alpha in measuring reliability of a heterogeneous test which draws material for test items from multiple sources. The Court further finds that Dr. Jeanneret‘s computations of stratified alpha without inclusion of seniority scores to be more appropriate than Dr. DeShon‘s computation, which included seniority. Finally, the Court finds that Dr. Jeanneret‘s conclusion that the 2002 process was sufficiently reliable is consistent with professional standards and is supported by relevant law. See Hearn v. City of Jackson, 340 F.Supp.2d 728, 740-41 (S.D.Miss.2003) (finding that a reliability coefficient of .79 is a common and acceptable value in the context of a heterogeneous test environment).
(R. 388, Bench Trial Op. at 21-22 (transcript citations omitted).)
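For readers unfamiliar with the statistics at issue, the following is a minimal Python sketch of coefficient alpha and stratified alpha using their standard textbook formulas. It is not drawn from the record: the function names and the toy scores are hypothetical, and nothing here reproduces either expert's actual computation.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha. `items` is a list of item-score columns,
    each holding one score per candidate."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

def stratified_alpha(strata):
    """Stratified alpha. `strata` is a list of test components,
    each a list of item-score columns as accepted by cronbach_alpha."""
    subtotals = [[sum(s) for s in zip(*items)] for items in strata]
    composite = [sum(s) for s in zip(*subtotals)]
    # Error variance contributed by each component: var_j * (1 - alpha_j).
    error = sum(variance(sub) * (1 - cronbach_alpha(items))
                for sub, items in zip(subtotals, strata))
    return 1 - error / variance(composite)

# Hypothetical scores for four candidates on two unrelated components,
# each made of two perfectly consistent items.
component_a = [[1, 2, 3, 4], [1, 2, 3, 4]]
component_b = [[4, 1, 3, 2], [4, 1, 3, 2]]

plain = cronbach_alpha(component_a + component_b)
stratified = stratified_alpha([component_a, component_b])
print(round(plain, 2), round(stratified, 2))
```

On this toy data, plain alpha over all four items comes to about 0.22, while stratified alpha is 1.0. That gap illustrates why an internal-consistency index computed across unrelated components can understate the reliability of a deliberately heterogeneous test.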
On the subject of rank ordering, the court found:
Under both Sixth Circuit precedent and the Guidelines, ranking of candidates is appropriate where it can be shown that a higher score correlates with higher job performance. See Williams v. Vukovich, 720 F.2d 909, 924 (6th Cir.1983); 29 C.F.R. § 1607.14(C)(9). The requirements for rank ordering can be met through a substantial demonstration of job-relatedness, variance in test scores, and an adequate degree of test reliability. Guardians Ass‘n of N.Y. City Police Dep‘t, Inc. v. Civil Serv., 630 F.2d 79, 104 (2d Cir.1980).
As discussed above, the test content of the 2002 process was substantially job-related and there was an acceptable level of test reliability. Many sections of the test consisted of items in which there were several right answers, with differing point values for various elements, and/or opportunities for additional credit, all of which serve to distinguish better performing candidates from lesser performing candidates. (Def‘s Ex. 22, pp. 43-46.) The written test was closely modeled after the like section in the 2000 process, which Dr. DeShon acknowledged was able to differentiate between those candidates with more job knowledge from those with less knowledge. (DeShon, Tr. Vol.5, 546-47.) Additionally, the raw scores on the 2002 assessment show a substantial variance, with the highest raw score of 358.750 and the lowest of 174.750, among 517 candidates. (Def‘s Ex. 17.) See City of Columbus, 916 F.2d at 1102-03 (upholding rank ordering where score range was 40 points among 71 candidates).
Based on the foregoing, the Court finds that rank ordering of the results of the 2002 process was proper, given that the test had an acceptable level of test reliability, was substantially job-related, and had substantial variance among the scores.
(Id. at 22-23.)
Plaintiffs lodge several objections to the reliability and rank-ordering findings, laced with a variety of counter-evidence in the opening of their response brief. (See Third Br. at 3-15, 44-62.) We distill three primary arguments: (1) that the district court incorrectly determined that Dr. De-
1. Dr. DeShon‘s Non-Use of Seniority & the Court‘s Credibility Finding
First, plaintiffs deny the district court‘s factual assertion that Dr. DeShon included seniority in his reliability calculations. The City appears to concede the inconclusive nature of the evidence cited by the district court (see Fourth Br. at 27-28), but notes that any error in this regard is harmless because both experts’ reliability scores (.76 from DeShon, .82-.83 from Jeanneret) fall within the range of reliability scores accepted by courts. See, e.g., Hearn, 340 F.Supp.2d at 740 (approving of exam with .79 reliability coefficient). Yet any mistake regarding the constituent parts of Dr. DeShon‘s composite reliability score (.76) leaves undisturbed the court‘s remaining credibility determinations pertaining to Dr. Jeanneret‘s reliability methodology and testimony—namely, its approval of (1) “Dr. Jeanneret‘s testimony as to the limited applicability of coefficient alpha in measuring reliability of a heterogeneous test which draws material for test items from multiple sources,” and (2) his “conclusion that the 2002 process was sufficiently reliable.” (R. 388, Bench Trial Op. at 21-22.)
The court‘s remaining conclusion—choosing Dr. Jeanneret‘s reliability estimates (.82-.83) over that of Dr. DeShon (.76)—suffers only from the court‘s mistaken belief that Dr. DeShon‘s figure included seniority. So far as we can tell, plaintiffs accept the court‘s related finding that these specific reliability calculations should not include seniority. Surprisingly, for all their complaints about Dr. Jeanneret‘s methods, plaintiffs voice no concern for the higher result he achieved (.82 or .83) using their preferred calculation method, stratified alpha. Arguably, the district court selected Dr. Jeanneret‘s number because it found his testimony more credible (consistent with its other credibility findings on this issue), not because it believed that Dr. DeShon made a calculation error. And even if the district court chose Dr. DeShon‘s reliability number (.76), the district court cited authority approving a similar reliability coefficient. Hearn, 340 F.Supp.2d at 740-41 (.79); cf. Nash, 895 F.Supp. at 1548 (stating that a reliability coefficient “above 0.70 is considered to be reliable“). Plaintiffs provide no authority compelling the conclusion that either a .76 or .82-.83 reliability score for this type of test fails as a matter of law.
Instead, plaintiffs charge that Dr. Jeanneret conceded the inappropriateness of his own reliability estimate. To the extent plaintiffs suggest that Dr. Jeanneret rejected his own calculations, they misread his testimony. (See R. 648-12, Trial Tr. (Jeanneret) at 1507 (acknowledging that his original report excluded a reliability coefficient, because it would not be an appropriate measure for the test, and stating his belief “that the coefficient alpha or internal consistency index of reliability [would not be] the most appropriate or even really an appropriate index for the reliability of the [2002 process]“).) As the district court noted, Dr. Jeanneret‘s testimony explains the difficulty of calculating a reliability coefficient for a heterogeneous test—i.e., one consisting of multiple, unrelated components that evaluate multiple tasks and characteristics. (See R. 648-10, Trial Tr. (Jeanneret) at 1273-81.) In choosing between the parties’ similar reliability estimates, the district court reasonably credited Dr. Jeanneret‘s testimony that the best reliability measures—retesting candidates or administering duplicate tests—were impracticable for a process administered to more than 500 candidates. See, e.g., Anderson v. City of Bessemer City, 470 U.S. 564, 573-74, 105 S.Ct. 1504, 84 L.Ed.2d 518 (1985) (“If the district court‘s account of the evidence is plausible in light of the record viewed in its entirety, the court of appeals may not reverse it even though convinced that had it been sitting as the trier of fact, it would have weighed the evidence differently.“).
2. Rank Ordering
Next, plaintiffs challenge the district court‘s approval of the City‘s use of rank ordering to distinguish between the candidates’ scores, arguing that the court misapplied three legal requirements for this scoring method set by this court in Police Officers for Equal Rights: (1) sufficient raw score spread, (2) composite and component reliability, and (3) reasonable job analysis. Yet, as the City points out, our decision in Police Officers for Equal Rights included no such rule; it merely observed that the employer‘s expert used those requirements. See 916 F.2d at 1102. Our standard states that “[r]anking is a valid, job-related selection technique only where the test scores vary directly with job performance.” Id. (quoting Williams v. Vukovich, 720 F.2d 909, 924 (6th Cir.1983)). The EEOC guidelines for content-validity studies support this approach:
If a user can show, by a job analysis or otherwise, that a higher score on a content valid selection procedure is likely to result in better job performance, the results may be used to rank persons who score above minimum levels.
The City‘s evidence clears this hurdle.
a. Job-Relatedness
First, the district court found that the City‘s consultants conducted a “compre-
b. Score Variance
Second, the district court found “substantial variance” among the promotion scores: of the 517 tested candidates, the 2002 process yielded a raw-score point spread of 184 points between the highest and lowest candidates (358.75-174.75), out of a possible 384.5 points. (Id. at 23.) Our review of the exam results reveals no clear error in this finding. (R. 656-23, 2002 Process Exam Results at 1-14.) Nor do we detect clear error in the court‘s finding of significant variance. Cf. Police Officers for Equal Rights, 916 F.2d at 1102-03 (permitting rank ordering where “[t]here was a spread of more than forty points among 71 test takers,” the highest score was 89.66, and the passing score was 70).
Though plaintiffs stress that only one point separated approximately 30 of the more than 500 candidate scores, that circumstance pales in comparison to the sort of score-bunching found problematic elsewhere. See Guardians, 630 F.2d at 103 & nn. 19-20 (finding insufficient reliability for rank ordering where nearly 9,000 applicants, or 2/3 of the passing scores, had scores between 94 and 97, out of 110 possible points). Moreover, the focus on promotional scores here exaggerates the 2002 process‘s bunching effect, because the same candidates’ raw scores ranged between 303 and 341, or 79.0 and 88.7 on a 100-point scale. (See R. 656-23, 2002 Process Exam Results at 3-4.) Varying seniority points (1-10) contributed significantly to this purported bunching problem.
c. Reliability
Third, the district court found sufficient test reliability, crediting Dr. Jeanneret‘s composite reliability scores of .82-.83. Again, we find no clear error with the court‘s factual findings and no error with its legal conclusion.
Plaintiffs briefly mention that the individual components of the 2002 process received poor reliability scores ranging from .32-.79. Indeed, the relatively low component reliability scores give pause. See Police Officers for Equal Rights, 916 F.2d at 1102 (allowing rank ordering where the exam‘s component tests achieved reliability scores ranging from .85-.97). Though the district court did not make specific findings regarding component reliability scores, plaintiffs point to no authority requiring such findings to sustain a rank-ordering test. Cf. id. at 1103 (holding that “the trial court was not clearly erroneous in accepting . . . [expert] testimony . . . on the issue of reliability and rank order scoring” that happened to include a component reliability estimate) (footnote omitted).
“The district judge is entitled in questions of this kind which require expert [statistical] opinion to rely on that opinion.” Id. So too here, where the district court relied on Dr. Jeanneret‘s opinion that the heterogeneous nature of the 2002 process‘s component tests made reliability coefficients less appropriate measures of reliability than other, impracticable methods, like test/re-test consistency or dual-test administration. (R. 388, Bench Trial Op. at 21-22.) And, as we said, both the plaintiffs’ expert and the City‘s expert attained composite reliability figures greater than .75 regardless of any reliability problems with the component tests.
On the topic of the standard error of measurement (SEM), plaintiffs offer no authority explaining why an SEM range of 2.8 (Dr. Jeanneret‘s corrected estimate calculated during trial) to 3.7, by itself, renders the 2002 process inherently unreliable or trumps other measurements of reliability. They do not show, for instance, the sort of score-bunching and passage-rates deemed problematic by the Second Circuit in Guardians. See 630 F.2d at 103 & n. 19 (finding unreliable a rank-ordered promotional test with an SEM of 2.4, explaining that the test “was too easy” and resulted in “8,928 applicants, two-thirds of all who passed, [with] bunched [scores] between 94 and 97” out of a possible 110 points).
As for the standard error of difference (SED), Dr. Jeanneret‘s supplemental report provides detailed reasons, supported by industry publications, for not relying on this measurement. (See R. 656-7, Jeanneret Resp. Suppl. Rpt. at 34-35.) Specifically, he opposes using large SED bands to equate broad ranges of test scores, explaining that SED bands “are calculated based on the normal probability distribution,” meaning that “the further apart two scores are, the more likely those scores are to be truly different.” (Id. at 34.) He elaborates, citing an industry publication finding that “even when a test is quite reliable, a typical SED band covers so large a part of the test score range that the preferred interpretation of banding advocates . . . is false.” Dr. Jeanneret goes on to note that “test score bands . . . try[ing] to account for measurement error . . . [are] not required, or even endorsed by the professional standards in the field of industrial and organizational psychology (i.e., Principles, 2003; Standards, 1999).” (Id.)
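The two measures discussed above are tied to reliability by standard psychometric formulas: SEM equals the score standard deviation times the square root of one minus the reliability coefficient, and SED equals SEM times the square root of two. The sketch below applies the second formula to the SEM range quoted in the opinion. The function names and the 1.96 multiplier for a 95% band are illustrative assumptions; the record excerpted here does not say which formulas or confidence level the experts used.

```python
import math

def sem(score_sd, reliability):
    """Standard error of measurement from a score standard deviation
    and a reliability coefficient (textbook formula)."""
    return score_sd * math.sqrt(1.0 - reliability)

def sed(sem_value):
    """Standard error of the difference between two observed scores."""
    return sem_value * math.sqrt(2.0)

def band_width(sem_value, z=1.96):
    """Width of a score band within which two scores are treated as
    statistically indistinguishable (z = 1.96 is an assumed 95% level)."""
    return z * sed(sem_value)

# SEM endpoints quoted in the opinion; their score scale is not stated there.
for quoted_sem in (2.8, 3.7):
    print(quoted_sem, round(sed(quoted_sem), 2), round(band_width(quoted_sem), 2))
```

Under these assumptions, an SEM of 2.8 implies an SED of about 3.96 and a band of about 7.76 points, while an SEM of 3.7 implies an SED of about 5.23 and a band of about 10.26 points.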
Ultimately, the district court heard the parties’ competing evidence regarding reliability, SEM, and SED, and the court found that the City justified the use of rank ordering with a substantial demonstration of job-relatedness, score variance, and an adequate degree of reliability supporting the likelihood that test scores would correlate to job performance. We find no clear error with the court‘s findings of fact in this regard and no error with its ultimate legal conclusion regarding rank ordering.
3. Seniority Scoring
Last, plaintiffs denounce the City‘s use and weighting of candidates’
Though not quarreling with this standard, plaintiffs challenge the binding effect of the MOU on the City. But, contractual enforceability aside, without showing discriminatory intent or illegal purpose, plaintiffs have no grounds to impugn the City‘s use of seniority. As for weighting, the plaintiffs suggest that the City‘s scoring errors inflated seniority‘s impact from an intended 10% to 25%. The cited testimony, however, appears to refer to something other than a tabulation error; Dr. DeShon differentiates between a “nominal weight” of 10% and an “effective” or “actual weight” of 25%, referring to the degree to which seniority affected promotion score variance. (R. 648-14, Trial Tr. (DeShon) at 1753-55.) Review of the test results (raw scores, scaled scores, and promotion scores) confirms this, revealing that seniority accounted for up to 10 points of the promotion score, out of a possible 110 points. (See generally R. 656-23.) Regardless of the nature of the alleged scoring error, in the absence of evidence that the City‘s weighting of seniority reflects a discriminatory intent or other illegal purpose, plaintiffs gain no ground. See City of Akron, 824 F.2d at 481. Because the seniority component required no additional validation, the district court properly rejected this aspect of the plaintiffs’ challenge.
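The distinction between a nominal and an effective weight can be illustrated with a short Python sketch. It assumes one common definition of effective weight, namely a component's covariance with the composite score divided by the composite's variance; whether Dr. DeShon used that definition is not stated in the passage above. The candidate scores are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import mean

def covariance(x, y):
    """Sample covariance of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def effective_weights(components):
    """Share of composite-score variance attributable to each component.
    `components` maps a name to that component's points per candidate."""
    composite = [sum(vals) for vals in zip(*components.values())]
    var_composite = covariance(composite, composite)
    return {name: covariance(vals, composite) / var_composite
            for name, vals in components.items()}

# Hypothetical candidates: tightly bunched test points (max 100)
# and widely spread seniority points (max 10).
test_points = [80, 82, 84, 86, 88]
seniority_points = [2, 10, 4, 8, 6]

nominal_seniority = 10 / 110  # about 9% of the points available
weights = effective_weights({"test": test_points,
                             "seniority": seniority_points})
print(round(nominal_seniority, 3), weights)
```

In this toy data seniority supplies roughly 9% of the available points yet accounts for half of the variance in the composite score, because the test scores are bunched while seniority is spread out. The example shows only how the two figures can diverge without any tabulation error; it does not reproduce the 10% and 25% figures in the record.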
V. CONCLUSION
For these reasons, we affirm in part and reverse in part the district court‘s judgment. We AFFIRM the district court‘s immunity-based dismissal of plaintiffs’ negligence claim related to the 2000 process, but we REVERSE the district court‘s Title VII judgment invalidating the 2002 process, thereby MOOTING plaintiffs’ challenge to the district court‘s choice of remedies for the 2002 process. We VACATE the district court‘s fees award and REMAND for further consideration in light of these developments.
James FRAZIER, Petitioner-Appellant, v. Charlotte JENKINS, Warden, Respondent-Appellee.
No. 11-4262
United States Court of Appeals, Sixth Circuit.
Argued: Nov. 18, 2013. Decided and Filed: Oct. 27, 2014.
Rehearing En Banc Denied Dec. 18, 2014.
