771 F.2d 1035 | 7th Cir. | 1985
The plaintiff appeals the findings of the district court, 583 F.Supp. 1475 (D.Wis.1984), that the defendants’ written employment test was job-related and further that the plaintiff failed to establish that other tests would serve the defendants’ legitimate employment interests without a disparate impact on minority applicants. We affirm.
I.
Ronald A. Gillespie is the named plaintiff of a class of approximately forty unsuccessful minority applicants for the position of Personnel Specialist I or Personnel Manager I with the State of Wisconsin. The examination challenged by Gillespie was the first step in the hiring process for these positions. Candidates whose scores on the written examination were above an acceptable level were invited to an interview.
The Wisconsin Department of Employment Relations (“DER”) developed the new written examination in question as former tests utilized by the Department were previously held to have had an inordinate adverse impact on minority applicants. As an initial step in the test development process, the DER convened a committee of “job specialists” who not only had personnel experience in positions similar to the Personnel Specialist/Personnel Manager I positions but also supervised employees in these positions. The job specialists were asked to analyze the positions in order that they might determine what tasks were required of a Personnel Specialist/Personnel Manager I to perform and to identify the knowledges, skills or abilities necessary to perform these tasks. The job specialists delineated the skills, abilities and knowledges required as: an ability to write standard English; inter-personal skills; decision-making; the ability to work under pressure; and the ability to establish priorities. After the committee reviewed and discussed the skills, abilities, and knowledge required, they rated each as to their respective importance.
Deborah Koyen, a DER employee specializing in recruiting and in developing employment examinations, utilized the list of knowledges, skills, and abilities to construct the written examination. In determining what form of test to implement, Koyen interviewed other government personnel departments concerning the type of test they utilize to screen applicants for entry level professional positions such as the Personnel Specialist/Personnel Manager I positions. After reviewing, weighing and considering their responses, the adverse impact of the previous test (the multiple-choice test), and the skills to be measured by the written test, Koyen decided to construct an essay type rather than a multiple-choice question and answer examination. The test written by Koyen was designed to test the applicants’ abilities to use standard English and to analyze and organize information, leaving inter-personal skills to be tested at the interview level. The essay examination contained three questions. In the first question, the applicants were given a narrative description of a groundkeeper’s duties and were given instructions on how to write a job description utilizing the narrative description. The instructions also contained an example of a complete job description. These extensive instructions on how to write a job description were also included in an instruction packet to be sent to applicants before the examination to aid the applicant in preparing for the exam. The second question directed the applicants to write a memorandum to another department member requesting certain information required for a departmental meeting. The third question presented statistical information about the minority enrollment in two schools and posed a hypothetical recruiting trip in which the recruiter, whose goal was to contact potential minority applicants, had time only to visit one school.
The applicants were asked two questions: First, they were asked to choose which school to visit and to justify that choice. Secondly, they were asked what additional information about the schools they would like to have had before making their selection. Koyen presented the test to the job specialists and the committee agreed that the examination tested the necessary basic skills required for the position.
Furthermore, Koyen administered this test to lower level personnel specialists at the Division of Personnel to determine if the questions and the instructions were clear. Koyen also devised a grading system, established rating criteria, and trained graders to evaluate the examination. Koyen gave the graders sample examination answers to grade and examined the results
The test was given on February 9, 1980 to 451 applicants. Koyen divided the sixteen employees who were to grade the examinations into eight teams of two graders. The papers were distributed among the teams and each paper assigned to a team was evaluated by both graders. In addition, ten to twelve papers were given to all the grading teams. The teams were not told that all the graders were evaluating these particular papers. Koyen ran statistical tests of the reliability of the scores both between and within the grading teams and found that the graders were reliable in their application of the rating criteria. Koyen and other DER employees next examined the scores to determine a proper cut-off point; only applicants whose scores were above the cut-off point would be invited to the interviews. In arriving at the cut-off score, the DER attempted to maximize the number of minority applicants to be invited to the interview while at the same time restricting the total number of invitees in order that the interviewers might not be overwhelmed. One hundred eighty-four candidates, including eleven minority applicants, were invited to the interviews and, after the completion of the interviewing and rating process, nine applicants, three of whom were minorities, were offered positions.
Gillespie, at the time of the examination a Personnel Specialist I for the Wisconsin Department of Health and Social Services on a temporary basis for a year, was advised that he did not qualify to be advanced to the interview stage and lodged a complaint with the DER. Because of the complaint, Koyen had Gillespie’s examination re-tabulated by a different team, unaware that they were re-evaluating a test previously scored. This re-evaluated test was rated three points lower than Gillespie’s initial test result. Gillespie subsequently filed a charge of employment discrimination with the EEOC, commenced this action in the United States District Court for the Western District of Wisconsin, and received a determination in a bench trial that the written examination had an adverse impact on minority applicants. Advancing to the second stage of the analysis of a disparate impact claim — whether the test was job-related — the court held that the job specialists were familiar with the tasks performed by Personnel Specialists/Personnel Managers I. Furthermore, the judge decided that the purpose of the test was to determine each applicant’s abilities to “communicate in standard written English; to prepare basic written position descriptions; to place the analysis of a recruiting problem in an acceptable written form; and to identify fundamental and basic skills which are required to perform the duties of the position.” Thus, the district court found that the examination was job-related and was designed to directly measure specific skills important to job performance. Finally, the court determined that the plaintiff failed to show that other tests without a similarly undesirable disparate impact would serve the defendants’ legitimate interest in obtaining qualified applicants.
II.
The Supreme Court in dealing with the disparate impact of an employment test has established a three-part analysis:
“Title VII forbids the use of employment tests that are discriminatory in effect unless the employer meets ‘the burden of showing that any given requirement [has] ... a manifest relationship to the employment in question.’ This burden arises, of course, only after the complaining party or class has made out a prima facie case of discrimination, i.e., has shown that the tests in question select applicants for hire or promotion in a racial pattern significantly different from that of the pool of applicants. If an employer does then meet the burden of proving that its tests are ‘job-related,’ it remains open to the complaining party to show that other tests or selection devices, without a similarly undesirable racial effect, would also serve the employer’s legitimate interest in ‘efficient and trustworthy workmanship.’ Such a showing would be evidence that the employer was using its test merely as a ‘pretext’ for discrimination.”
Albemarle Paper Co. v. Moody, 422 U.S. 405, 425, 95 S.Ct. 2362, 2375, 45 L.Ed.2d 280 (1975) (citations omitted).
A test is job-related if it measures traits that are significantly related to the applicant’s ability to perform the job. Griggs v. Duke Power Co., 401 U.S. 424, 436, 91 S.Ct. 849, 856, 28 L.Ed.2d 158 (1971). The Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. § 1607 et seq. (1978) (“Guidelines”),
A. Appropriateness of the Choice of a Content Validation Strategy.
1. The Uniform Guidelines do not favor criterion-related validity.
Gillespie points to two sections of the Uniform Guidelines and the APA Standards as support for his contention that the Uniform Guidelines prefer criterion-related validity. According to Gillespie, the APA Standards, incorporated by reference into the Uniform Guidelines, express a preference for criterion-related validity by providing:
“Other forms of validity are not substitutes for criterion-related validity. In choosing the test to select people for a job, for example, an abundance of evidence of the construct validity of a test of flexibility in divergent thinking, or of the content validity of a test of elementary calculus, is of no predictive value without reason to believe that flexibility of thinking or knowledge of calculus aids performance on that job.”
APA Standards, at 27. Furthermore, Gillespie interprets § 1607.14C(5) of the Uniform Guidelines as reflecting a concern that content validated tests are inferior. Section 1607.14C(5) provides:
“Reliability. The reliability of selection procedures justified on the basis of content validity should be a matter of concern to the user. Whenever it is feasible, appropriate statistical estimates should be made of the reliability of the selection procedure.”
A review of the Uniform Guidelines and the relevant psychological literature reveals no preference for criterion-related va
Furthermore, the sections of the APA Standards and the Uniform Guidelines that Gillespie relies upon fail to support his argument that criterion-related validity is preferred. The paragraph of the APA Standards Gillespie selectively quotes merely sets forth two basic principles of psychometrics: (1) the choice of validation strategy is dictated by the type of inference the user wishes to draw; (2) a test is irrelevant for employment testing purposes if the factor or trait it measures is not important for job performance. When read in its full and in its complete context, rather than in Gillespie’s selective quotation, the paragraph cannot fairly be construed as expressing a preference for criterion-related validity. Furthermore, the Uniform Guidelines do not express a preference for criterion-related validity by admonishing users to check the reliability of a content validated test. Reliability is a technical term referring to the consistency of the test results. Anastasi at 102. All tests must be statistically examined for evidence of reliability before the test developer can establish the validity of the test. Id. Thus, § 1607.14C of the Uniform Guidelines emphasizes that a content validated test must be examined for reliability, just as a criterion-related or construct validated test must be tested for reliability. Gillespie’s argument that this admonition displays a preference for criterion-related validity results only from his reading the section out of context, and is without merit and displays merely a failure, or a refusal, on the part of Gillespie to grasp a basic principle of psychometrics — that all tests must be reliable.
2. Whether the choice of a content validation strategy was appropriate for the characteristics to be measured.
Gillespie points to the district court’s finding that one purpose of the test was to measure an applicant’s ability to prepare written position descriptions and argues that the testing of this ability was inappropriate because writing a job description is a skill readily learned within the first few months of employment. The Uniform Guidelines caution that, “[i]n general, users should avoid making employment decisions on the basis of measures of knowledges, skills or abilities which are normally learned in a brief orientation period, and which have an adverse impact.” § 1607.5F. Use of tests which measure knowledge of factual information that will be acquired in a training program, “risks favoring applicants with prior exposure to that information, a course likely to discriminate against a disadvantaged minority.” Guardians Ass’n of New York City v. Civil Service Comm’n of City of New York, 630 F.2d 79, 94 (2d Cir.1980). A test must measure ability rather than factual information normally acquired during the training period. Id. The defendant in this action developed an extensive set of instructions for writing job descriptions and not only included these instructions with the test but also forwarded the instructions to the applicants before the test to aid the
The one argument of the plaintiff’s that merits discussion is that “skills in decision-making and priority setting ... are mere rephrasings of such traits as common sense and judgment, items specifically precluded from a content-validated examination.” The Uniform Guidelines prohibit testing for constructs with a content-validated test:
“A selection procedure based upon inferences about mental processes cannot be supported solely or primarily on the basis of content validity. Thus, a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, common sense, judgment, leadership, and spatial ability.”
§ 1607.14C. The specific issue raised by Gillespie is whether a plaintiff sufficiently demonstrates that a content validated test attempted to measure a construct merely by alleging that the test’s goals are “re-phrasings” of a construct.
Gillespie’s free use of the lay definitions of “skills, decision-making and priority setting” fails to acknowledge that psychometricians and the Uniform Guidelines utilize operational definitions of terms. See, e.g., §§ 1607.16A; 1607.14C(4); 1607.16T. The Guidelines specifically provide:
“Where the job analysis also identified the knowledges, skills, and abilities used in work behavior(s), an operational definition for each knowledge in terms of a body of learned information and for each skill and ability in terms of observable behaviors and outcomes, and the relationship between each knowledge, skill or ability and each work behavior, as well as the method used to determine this relationship, should be provided (essential).”
§ 1607.15C(3). By resorting to lay definitions of terms in his “rephrasing,” the plaintiff asks the court to ignore the Guidelines’ explicit requirement that knowledges, skills and abilities be operationally defined. Furthermore, the plaintiff’s reliance on lay definitions distracts attention from a critical question under Title VII in determining whether a content validation strategy could be used — whether the trait is too abstract to be measured with a content validated test. Accordingly, we hold that “rephrasing” the goal of the test as a construct is insufficient to prove that a content validated test attempted to measure a construct rather than existing job skills, knowledge or behavior.
The test for determining whether a characteristic is too abstract to be measured by a content validated test is given by the Uniform Guidelines:
“[T]o be content valid, a selection procedure measuring a skill or ability should either closely approximate an observable work behavior, or its product should closely approximate an observable work product. If a test purports to sample a work behavior or to provide a sample of a work product, the manner and setting of the selection procedure and its level and complexity should closely approximate the work situation. The closer the content and the context of the selection procedure are to work samples or work behaviors, the stronger is the basis for showing content validity. As the content of the selection procedure less resembles a work behavior, or the setting and manner of the administration of the selection procedure less resemble the work situation, or the result less resembles the work product, the less likely the selection procedure is to be content valid, and the greater the need for other evidence of validity.”
§ 1607.14C(4). Thus, the court must evaluate the test for: (1) the degree to which the nature of the examination procedure approximates the job conditions; (2) whether the test measures abstract or concrete qualities; and (3) the combination of these factors, i.e. whether the test attempts to measure an abstract trait with a test that fails to closely approximate the working situation. Guardians, 630 F.2d at 93; Wollack at 405-06.
After analyzing Gillespie’s misstatements of the standards to be applied, we reach the ultimate issue of whether the content validation strategy adopted by the DER was appropriate under the circumstances. Both Deborah Koyen and Daniel Wallock, a testing expert employed by the DER, testified that the purpose of the written examination was to screen out those persons who did not possess the fundamental skills necessary to perform as a Personnel Specialist/Personnel Manager I. A test that screens candidates for fundamental skills may be validated by the content validation procedure. Wollack at 404-05. Furthermore, a review of the characteristics to be measured by the test and of the form of the test reveals that a content validation method was appropriate. The abilities to communicate in standard written English, to prepare a written job description, and to place the analysis of a recruiting problem in acceptable form are concrete, observable and quantifiable characteristics. Moreover, planning a recruiting trip and writing a job description and a memorandum closely simulate the actual work performed by a Personnel Specialist/Personnel Manager I. Because the test measured concrete characteristics in a form that simulated actual work behavior, we hold that the content validation procedure could be employed to validate the test.
B. Sufficiency of the Content Validation Study Conducted by the DER.
Gillespie argues that the job analysis performed by the DER as part of its content validation of the written test was inadequate, thus rendering the content validation study insufficient, because the job specialists were not trained in psychometrics and were unfamiliar with the job of Personnel Specialist/Personnel Managers I.
We turn now to Gillespie’s second attack on the sufficiency of the content validation study, his contention that the skills measured by the test did not reflect the work done in the personnel positions. Gillespie alleges that the test was not representative of the tasks performed by Personnel Specialist/Personnel Managers I because the examination did not test all or nearly all skills required. Title VII does not require an employer to test all or nearly all skills required for the occupation. Guardians, 630 F.2d at 99. To be representative for Title VII purposes, an employment test must neither: (1) focus exclusively on a minor aspect of the position; nor (2) fail to test a significant skill required by the position. See id. Testimony at trial established that the ability to communicate in standard written English, to analyze and organize information, and to use these skills to write, edit or review job descriptions; to write memoranda; or to plan recruiting trips were vitally important in performing as a Personnel Specialist/Personnel Manager I; thus, the record reveals that the test did not focus exclusively on a minor aspect of the work. Moreover, the plaintiff does not suggest that the examination failed to test a significant skill required by the position. Accordingly, we hold that the examination was representative because it tested skills necessary for adequate performance as a Personnel Specialist/Personnel Manager I.
C. Reliability and Use of the Test Scores
According to the record, a pretest was conducted to determine whether the questions and instructions were clear.
Gillespie also argues that the cutoff score was not logically chosen and that the DER improperly used the scores on the written examination to rank-order the applicants. An employer using a test to screen job applicants must fulfill two requirements to justify the choice of a cut-off score: The test scores must be reliable, Guardians, 630 F.2d at 101-02, and the employer must have some justifiable rea
Finally, we turn to Gillespie’s contention that the DER improperly used the written examination scores to rank-order the applicants. “Use of a selection procedure on a ranking basis may be supported by content validity if there is evidence from job analysis or other empirical data that what is measured by the selection procedure is associated with differences in levels of job performance.” EEOC Questions and Answers, Q. 62. Gillespie contends in his appellate brief that the testimony of his expert, Dr. Raynor, established that the written examination score caused a disparate impact on the final, rank-ordered score. However, on cross-examination Gillespie’s own expert witness stated that his testimony had not addressed the questions of whether “there was any statistically adverse impact as to the composite exam or as to the oral exam.” We hold that the plaintiff failed to demonstrate that the use of the written exam scores in determining a final score caused a disparate impact on minority applicants.
D. The Plaintiff Failed to Produce an Alternative Test.
Gillespie contends that the DER could have employed an essay examination that sought short answers to a large number of questions; could have developed a multiple-choice test; or could have utilized a commercially developed test. However, this bare assertion, without any supporting examples, much less supporting data, that using other tests was possible fails to satisfy the plaintiff’s burden of demonstrating that other tests or selection devices without a similar undesirable racial effect would also serve the employer's legitimate interests; specifically, the plaintiff fails to demonstrate that each of these hypothetical alternatives would have adequately measured the applicant’s ability to communicate in standard written English or the ability to organize and analyze information. Furthermore, Gillespie’s suggestion that the DER should have used a multiple-choice test ignores Koyen’s testimony that the DER previously used multiple-choice tests and found that there was a disparate impact on minorities. Finally, Gillespie’s bald assertion that commercially available tests are valid ignores requirements imposed on the employer by the Uniform Guidelines. An employer may support the
The judgment of the district court is AFFIRMED.
. The Guidelines are issued by the agencies having primary responsibility for the enforcement of the Federal Equal Employment Opportunities Laws — the EEOC, the Civil Service Commission, the Department of Labor, and the Department of Justice. 29 C.F.R. § 1607.1A.
. APA, Standards for Educational and Psychological Tests (1974) (“Standards”).
. A criterion-related validation study determines whether the test is adequately correlated with the applicant’s future job performance. Wollack, Content Validity: Its Legal and Psychometric Bases, Personnel Management, Nov-Dec 1976, 397 at 402 (hereinafter “Wollack”). Criterion-related tests are constructed to measure certain traits or characteristics thought to be relevant to future job performance. Id. at 403. An example of an employment test that would be validated by the criterion-related validation method is an intelligence test. The content validation strategy is utilized when a test purports to measure existing job skills, knowledge or behaviors. Id. “The purpose of content validity is to show that the test measures the job or adequately reflects the skills or knowledge required by the job.” Id. For example, a typing test given to prospective typists would be validated by the content validation method. Construct validity is used to determine the extent to which a test may be said to measure a theoretical construct or trait. Anastasi, Psychological Testing, 144 (1982) (hereinafter “Anastasi”). For example, if a psychologist gave vocabulary, analogies, opposites and sentence completion tests to a group of subjects and found that the tests have a high correlation with one another, he might infer the presence of a construct — a verbal comprehension factor. Anastasi at 146.
. Additionally, Gillespie contends that the job analysis was inadequate because the relative importance of the qualities was not adequately measured. However, the DER employees’ testimony supports the district court’s finding that “the job elements were ranked on the basis of the importance to the position, and a weighting was done as it concerns the critical elements.” The findings of fact made by a district court will be upheld unless they are clearly erroneous. Fed.R.Civ.P. 52(a). Because the plaintiff is unable to show that the district court’s finding that the job elements were ranked was clearly erroneous, it will not be set aside by this court.
. Gillespie argues that Koyen failed to correct problems revealed by the pretest — specifically, that the pretestees chose schools A and B in the recruiting question with equal regularity and that Question 1 failed to supply sufficient information to calculate percentages called for in the question. The record reveals that Koyen was aware of the problem with the recruiting question and discussed it with the graders. Furthermore, Gillespie’s alleged problem with lack of information to calculate the percentages asked for in Question 1 was not identified either by the pretestees or the graders; indeed, the “problem” only appears in Gillespie’s strained reading of the instructions. We hold that the plaintiff’s allegation that the defendant ignored problems revealed by the pretest is not supported by the record.