United States v. Yonkers Board of Education

123 F. Supp. 2d 694 | S.D.N.Y. | 2000


UNITED STATES of America, Plaintiff,
and
YONKERS BRANCH—NAACP, et al., Plaintiffs-Intervenors,
v.
Yonkers Board of Education, et al., Defendants.

No. 80 Civ. 6761(LBS).

United States District Court, S.D. New York.

November 30, 2000.

*695 Michael H. Sussman, Law Offices of Michael H. Sussman, Goshen, NY, for plaintiff-intervenor, Yonkers Branch—NAACP.

Raymond P. Fitzpatrick, Jr., Fitzpatrick, Cooper & Clark, Birmingham, AL, for defendant City of Yonkers.

Stephen M. Jacoby, Eliot Spitzer, New York City, Attorney General for the State of New York, for State defendants.

OPINION

SAND, District Judge.

PREFACE

This Opinion was prepared and ready for filing this past August. However, apprised of ongoing settlement discussions, the Court, on August 21, 2000, issued the following Order:

The Court notes that Governor George E. Pataki has issued a statement asserting the State's commitment to a consensual resolution of the pending school litigation and the payment of a $10,000,000 advance toward such settlement.
In light of the foregoing, this Court will withhold its response to the Court of Appeals remand pending submission to it of a proposed settlement agreement.
If at any time any party shall be of the opinion that progress toward achieving a settlement is not being made at a satisfactory pace or that settlement efforts are at an impasse, this Court shall be promptly advised of such circumstances.
SO ORDERED.

On November 29, 2000, the Court received a letter from the NAACP urging that the Court "issue its vestiges remand decision at the earliest possible occasion and move the process of final resolution forward."

OPINION

Plaintiffs brought this action alleging that public housing and education in the City of Yonkers had been unlawfully segregated according to race. The Court finds that, as of 1997, vestiges of segregation existed in the Yonkers public schools. We therefore refer the matter to the Court-appointed School Monitor to report and recommend, after appropriate proceedings, as to a suitable remedy.

I. BACKGROUND

A. Procedural History

In 1985, this Court found that the City of Yonkers ("the City") and the Yonkers Board of Education ("the YBE") had intentionally segregated the Yonkers public schools ("the YPS"). See United States v. Yonkers Bd. of Educ., 624 F. Supp. 1276, *696 1376-1545 (S.D.N.Y.1985) ("Yonkers I")[1], aff'd, 837 F.2d 1181 (2d Cir.1987) ("Yonkers III"). The following year, we ordered a remedy, which came to be known as the "educational improvement plan," or "EIP I." See United States v. Yonkers Bd. of Educ., 635 F. Supp. 1538 (S.D.N.Y.1986) ("Yonkers II"), aff'd, 837 F.2d 1181 (2d Cir.1987). The centerpiece of EIP I was a voluntary magnet school program that was designed to eliminate the severe racial imbalance that had previously existed with respect to student and faculty assignments, as well as to alleviate inequalities in facilities and extracurricular offerings. See id. By all accounts, the plan—which organized schools and programs around particular themes and assigned students based on their thematic and programmatic preferences—was a dramatic success. School enrollments were totally desegregated within one year of EIP I's implementation and, moreover, "[t]he transition took place in a relatively smooth and peaceful manner, without the disturbances and disruption which plagued desegregating school districts elsewhere in this country." United States v. City of Yonkers, 833 F. Supp. 214, 216 (S.D.N.Y.1993) ("Yonkers IV").

Despite EIP I's obvious successes, local school officials in Yonkers came to believe that it had only partially remedied the many entrenched problems which, they believed, were the legacy of the prior segregation.[2] These officials were unable to implement more thorough reform, however, because all available funds were being used to implement EIP I. The YBE, therefore, in 1987, filed a cross-claim against the State of New York and various state agencies and officials (collectively, "the State Defendants"), seeking a contribution of state funds that could be used to eradicate all remaining vestiges of public school segregation in Yonkers.[3]

After the State Defendants' motions to dismiss and for summary judgment were denied,[4] the Court commenced a trial, which was to be conducted in three stages. Because the State Defendants would not be liable for remedial funding if segregation had been completely eradicated by EIP I, the first stage ("the 1993 trial") sought to determine whether or not there were vestiges of segregation. Our conclusion—that vestiges of segregation remained—was premised upon two findings of fact. We found, first, that a disparity existed with respect to the level of academic achievement attained by minority and non-minority students, see Yonkers IV, 833 F.Supp. at 220-22; and second, that the causes of that disparity were a combination of low teacher expectations for minority students and a curriculum that predated desegregation and had become anachronistic, see id. at 222.

Having found that vestiges existed, the Court then turned, in the trial's second phase ("the 1994 trial"), to the question of whether or not the State could be held liable for the pre-1985 segregation of the YPS and, therefore, required to contribute to the remedy. At the conclusion of that *697 phase, we found that, as a matter of fact, the State Defendants' conduct had been a contributing cause to the pre-1985 segregation, but nevertheless concluded, following Arthur v. Nyquist, 573 F.2d 134 (2d Cir.1978), that the State Defendants could not be held liable for their role in that violation. See United States v. City of Yonkers, 880 F. Supp. 212 (S.D.N.Y.1995). The Court of Appeals affirmed our factual finding (i.e., that the State Defendants' conduct had contributed to the pre-1985 segregation), but reversed our legal conclusion. The court held that the State Defendants were liable, along with the City and the YBE, for the prior segregation of the YPS and that the State Defendants could therefore be required to contribute funding for remedial measures. See United States v. City of Yonkers, 96 F.3d 600 (2d Cir.1996) ("Yonkers V").

The court's decision in Yonkers V required this Court to proceed to the third, and final, stage of the trial that had begun in 1993. The principal focus of the third stage ("the 1997 trial") was to determine an appropriate remedy. However, because four years had by that time elapsed since our initial finding that vestiges of segregation remained, the parties were also permitted to present evidence as to whether or not any vestiges continued to exist as of 1997. At the conclusion of the trial's third stage, we re-affirmed our prior finding that vestiges of segregation remained and ordered the state to contribute funding for additional remedial measures, which came to be known as "EIP II." See United States v. Yonkers Bd. of Educ., 984 F. Supp. 687 (S.D.N.Y.1997) ("Yonkers VI").

The State Defendants and the City appealed, but sought no stay of our order, which was therefore in effect from October 8, 1997 until the Court of Appeals' stay order of August 5, 1999.[5] Because it believed that we had not articulated in detail all of the reasoning that underlay our findings, nor provided a detailed summary of the evidence, the Court of Appeals characterized our findings with respect to vestiges as "vague." See United States v. City of Yonkers, 197 F.3d 41, 51 (2d Cir.1999) ("Yonkers VII"). Nevertheless, a majority of the appellate panel was able to discern two vestiges that, it believed, we had identified. It characterized those vestiges as (1) that "Yonkers' curriculum and teaching techniques are insufficiently multi-cultural," id. at 51; and (2) "low teacher expectations for minority students," id. at 52. The majority found the record to be legally insufficient to support those findings, and, therefore, reversed. The majority also explained that it had scrutinized the record and (at least initially) determined that it could support no "alternative findings" of vestiges. See id. at 45.

The third member of the panel, Judge Sack, filed an opinion concurring in part and dissenting in part. Although Judge Sack agreed that this Court had not set forth an adequate basis to support our finding of vestiges, he dissented on the ground that it would have been preferable for the Court to remand for further factual findings, rather than to reverse the findings we had made, scrutinize the record, and foreclose the possibility of any alternative findings of vestiges. See United States v. City of Yonkers, 181 F.3d 301, 321-30 (2d Cir.1999) (Sack, J., concurring in part and dissenting in part). After the NAACP sought reargument and an en banc hearing, the majority, for reasons that are not disclosed, came to agree with Judge Sack's views. It, therefore, vacated its prior opinion and remanded the case. The "limited purpose" of the remand was to permit this Court

to make further findings on the present record and in light of this opinion as to *698 whether—or not—there are remaining vestiges of segregation in the Yonkers school system, and if so what they are and what record evidence is relied on for support.

Yonkers VII, 197 F.3d at 46.

B. Scope of Remand

It is not without some trepidation that we now approach that limited task. We have been directed to render more detailed findings of fact in support of a conclusion that two members of the panel that will hear this case on appeal[6] have expressly rejected. According to a published, though withdrawn, opinion, those two have "conducted [their] own careful scrutiny of the record, to see if it could support findings of vestige[s] ..." and, after that careful study, were "convinced that a remand would waste judicial resources and put off what in the end would be the same result." United States v. City of Yonkers, 181 F.3d at 313 n. 3.

Our trepidation is enhanced by the fact that, despite our best efforts and the guidance of the parties, we are genuinely puzzled as to the scope of the issues that have been remanded. Judge Sack's view—of which the majority says it ultimately became convinced, see Yonkers VII, 197 F.3d at 46—was that our findings were insufficiently detailed, as required by Federal Rule of Civil Procedure 52(a). See United States v. City of Yonkers, 181 F.3d at 322-25. Ordinarily, when findings are vacated on Rule 52(a) grounds, the purpose of the remand is to permit the district court to either supplement its findings or to conclude that in light of the Court of Appeals' analysis, its findings are unsustainable. See Davis v. New York City Housing Authority, 166 F.3d 432, 435 (2d Cir.1999) (citations omitted); Inverness Corp. v. Whitehall Labs., 819 F.2d 48, 50-51 (2d Cir.1987). The passage quoted above, which describes the purpose of the remand as being to provide this Court with an opportunity "to make further findings on the present record and in light of this opinion," Yonkers VII, 197 F.3d at 46, seems consistent with this view. The panel also characterizes its opinion as being "carefully limited to a review of the findings actually made by the district court and of the record evidence cited by the Board of Education and the NAACP," id. at 49, which suggests, as well, that the panel has rejected its prior decision to scrutinize the record and has decided, instead, to limit its review to an analysis of the adequacy of our findings. Those two passages, read in light of the fact that the panel claims to have been persuaded by Judge Sack's analysis, would lead us to conclude that the panel wishes us to reexamine the record and either render more detailed findings in support of our conclusion that vestiges of segregation remained as of 1997, or to conclude that no such finding is possible in light of the panel's reasoning.

Other portions of the remanding opinion, however, indicate that, at least with respect to some issues, no amount of elaboration or explanation would suffice. The introductory paragraph of the majority's discussion of what it calls the "educational theory" vestige concludes with the following sentence:

*699 Our review is somewhat hampered by the district court's failure to make specific factual findings on the subject, but because we do not want to prolong unnecessarily this already-lengthy litigation, we look to the record ourselves (and specifically to passages highlighted by the Board) rather than remand the case for a further articulation of findings on this particular issue.

Id. at 51 (citing Wessmann v. Gittens, 160 F.3d 790, 802 (1st Cir.1998)) (emphasis added); see also id. at 45 (indicating that the panel adheres to its prior conclusion "that there was insufficient record support for the only two vestiges found by the district court....") Although this passage strikes us as inconsistent with the panel's decision to remand (which necessarily entails a prolongation of this "already-lengthy litigation"), and with the majority's characterization of its opinion as being limited to a review of our findings, we must acknowledge that the sentence is included in the panel's second, substituted opinion and is, therefore, legally binding upon this Court. See United States v. Tenzer, 213 F.3d 34, 40 (2d Cir.2000). The remanding panel has definitively resolved, therefore, that the record cannot support any findings that vestiges of segregation existed in the Yonkers public schools as of 1997 with respect to curriculum or teaching techniques. We therefore do not address those issues in this opinion.[7]

The panel's discussion of the "teacher expectations" vestige, by contrast, is summarized with the following observation:

[T]he evidence that teachers have low expectations of minority students is entirely based on scattered anecdotes, and the evidence supporting a causal link between these low expectations and prior de jure segregation is a set of subjective, intuitive impressions. This is not enough.

Id. at 53 (citations omitted). The State Defendants read this passage to mean that the panel has also adhered to its conclusion that the record cannot support a finding that low teacher expectations are a vestige of segregation. (See State Defendants' Reply Memorandum of Law on Remand, Pursuant to the November 16, 1999 Order of the Court of Appeals for the Second Circuit ("State Reply") at 5-6.) We agree. See Yonkers VII, 197 F.3d at 45 (indicating that the second opinion adheres to the conclusion that the two vestiges (educational theory and teacher expectations) discussed in the earlier opinion were insupportable).

But the State Defendants also read this passage to signify that this Court may not in any way, directly or indirectly, refer to "teacher expectations," nor the evidence in the record on that subject. (See State Reply at 16; State Defendants' Proposed Findings of Fact and Conclusions of Law on Remand Pursuant to the November 16, 1999 Order of the Court of Appeals for the Second Circuit ("State Proposed Findings") at 61, ¶ 7.)[8] With this interpretation of the panel's discussion, we do not agree. The remanding panel, apparently, held that the record failed to support our finding that low teacher expectations for minority students are a vestige of segregation. It said nothing, however, about whether evidence of teacher expectations, *700 when considered in conjunction with other evidence, might support a finding that an alternative vestige of segregation exists.

Similarly, the panel's opinion addressed the disparity in achievement test scores, upon which this Court placed great reliance in our earlier opinions. The court noted that "using achievement test scores as a measure, either direct or indirect, of a school system's movement away from segregation is deeply problematic." Yonkers VII, 197 F.3d at 54 (citing Missouri v. Jenkins, 515 U.S. 70, 101, 115 S. Ct. 2038, 132 L. Ed. 2d 63 (1995); People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537 (7th Cir.1997); Coalition to Save Our Children v. State Bd. of Educ., 90 F.3d 752, 776-78 (3d Cir.1996)). However, the court did not examine the propriety of our reliance on such a measure because it reasoned that even "[a]ccepting arguendo the study's conclusion of a racial disparity, the study fails to show that the disparity was caused by pre-1986 segregation in Yonkers, as opposed to, for example, generalized `societal discrimination.'" Id. at 54-55 (citing Wessmann, 160 F.3d at 803-04; Swann v. Charlotte-Mecklenburg Bd. of Educ., 402 U.S. 1, 22, 91 S. Ct. 1267, 28 L. Ed. 2d 554 (1971)). "In short," the court concluded, "a finding of prior segregation coupled with a finding of present day racial differences in educational achievement, is an insufficient positive test for the presence of residual segregative effects." Id. at 55 (citing Wessmann, 160 F.3d at 801).

The State Defendants argue that these passages indicate that the Court of Appeals has foreclosed any consideration of the demonstrated gap in minority achievement in Yonkers. (See State Reply at 5-7; State Proposed Findings at 61, ¶ 7.) We, however, read the panel's discussion of this issue far more narrowly. We adhere to the panel's conclusion that statistical analyses of test scores, standing alone, fail to establish the existence of vestiges, but we do not believe that the panel's conclusion on this point implies that statistical analyses of test scores may not be an evidentiary factor weighed by the Court along with other evidence in reaching a conclusion that vestiges of segregation existed.[9]

We understand, of course, that "[t]his Court does not function as an appellate court from the Court of Appeals." (State Reply at 4.) We have no desire or inclination to contradict or somehow evade the rulings of that court. But this case was remanded because the appellate panel came to believe that it was "worthwhile ... to ensure" that it had "the full benefit" of our views. Yonkers VII, 197 F.3d at 46. We have attempted, in the pages that follow, to provide the panel with that "full benefit," though we recognize that some of our views have already been rejected. We do not therefore consider, in this opinion, whether educational theories, teacher expectations, or disparate test results are themselves vestiges of segregation—the remanding panel has foreclosed any such consideration. But for us to determine whether or not alternative vestiges existed and to explain the evidentiary basis of any such finding, it is necessary for us to consider evidence with respect to disparities in test scores and teacher expectations. That evidence is part of the record in this case. The Court of Appeals' conclusion that it provides insufficient support for certain findings does not render it inadmissible. Our consideration of test scores and teacher expectations is not therefore, as the City and the State Defendants would have it, an attempt to "indirectly" overrule or evade the Court of Appeals' ruling. To the contrary, it is the only way we can complete meaningfully the task that court has assigned us.

II. LEGAL STANDARDS

A vestige of segregation is a policy or practice which is traceable to the prior de jure system of segregation and which continues to have discriminatory effects. See United States v. Fordice, 505 U.S. 717, *701 727-28, 112 S. Ct. 2727, 120 L. Ed. 2d 575 (1992); Freeman v. Pitts, 503 U.S. 467, 495-96, 112 S. Ct. 1430, 118 L. Ed. 2d 108 (1992).[10] This Court's approach to the question of vestiges, both in our prior opinion and as amplified below, is to focus, first, on whether or not current policies or practices in Yonkers were, as of 1997, having a segregative effect in the public schools.

As courts do in a variety of legal contexts that involve intricate and subtle questions of causation, we examine the question of segregative effect inferentially. The Supreme Court's approach in employment discrimination cases provides perhaps the most familiar analogy. See, e.g., St. Mary's Honor Ctr. v. Hicks, 509 U.S. 502, 113 S. Ct. 2742, 125 L. Ed. 2d 407 (1993); Texas Dep't of Community Affairs v. Burdine, 450 U.S. 248, 101 S. Ct. 1089, 67 L. Ed. 2d 207 (1981); McDonnell Douglas Corp. v. Green, 411 U.S. 792, 93 S. Ct. 1817, 36 L. Ed. 2d 668 (1973). Gross statistical disparities in hiring data may justify an inference of discrimination because "absent explanation, it is ordinarily to be expected that nondiscriminatory hiring practices will in time result in a work force more or less representative of the racial and ethnic composition of the population in the community from which employees are hired." Hazelwood Sch. District v. United States, 433 U.S. 299, 307, 97 S. Ct. 2736, 53 L. Ed. 2d 768 (1977) (citations and internal quotation marks omitted). A prima facie showing of a discriminatory employment practice "raises an inference of discrimination ... because we presume those acts, if otherwise unexplained, are more likely than not based on the consideration of impermissible factors." Burdine, 450 U.S. at 254, 101 S. Ct. 1089 (quoting Furnco Construction Corp. v. Waters, 438 U.S. 567, 577, 98 S. Ct. 2943, 57 L. Ed. 2d 957 (1978) (internal quotation marks omitted)). A similar approach is followed in disparate impact cases, see Connecticut v. Teal, 457 U.S. 440, 446, 102 S. Ct. 2525, 73 L. Ed. 2d 130 (1982); Albemarle Paper Co. v. Moody, 422 U.S. 405, 425, 95 S. Ct. 2362, 45 L. Ed. 2d 280 (1975); Dothard v. Rawlinson, 433 U.S. 321, 329, 97 S. Ct. 2720, 53 L. Ed. 2d 786 (1977); Griggs v. Duke Power Co., 401 U.S. 424, 431, 91 S. Ct. 849, 28 L. Ed. 2d 158 (1971), which we find particularly instructive here, since the question we are addressing is remarkably similar to the one addressed in that context. See Griggs, 401 U.S. at 430-31, 91 S. Ct. 849 (describing goal of disparate impact section of Title VII as being "to achieve equality of employment opportunities and remove barriers that have operated in the past to favor an identifiable group of white employees over other employees.")

We begin, in this case, with two premises shared by all parties: (1) the function of schools is to teach; and (2) all children can learn, without regard to their ethnic or racial heritage.[11] The implication of those premises is that, absent some explanation, one would expect students of *702 different races to achieve similar levels of academic success. If statistical data demonstrates a racial disparity in academic achievement, as the record in this case indisputably does, then an explanation is required, just as a prima facie showing of a discriminatory employment practice requires an employer to come forward with an explanation for the apparent disparity. The employer's burden in that context, and the City and the State Defendants' burden here, is one of production only, i.e., the burden of coming forward with a nondiscriminatory, or non-segregative, explanation. See Reeves v. Sanderson Plumbing Products, Inc., 530 U.S. 133, 120 S. Ct. 2097, 147 L. Ed. 2d 105 (2000); Burdine, 450 U.S. at 254-56, 101 S. Ct. 1089; Meiri v. Dacon, 759 F.2d 989, 996-97 (2d Cir. 1985).

The YBE[12] and NAACP have attempted to demonstrate that the explanation for the shortfall in minority achievement in Yonkers is a cluster of policies and practices that have a disparate, negative impact on minority students. The City and the State Defendants maintain that the explanation is a combination of non-racial factors (such as socioeconomic status, birth weight, and levels of parental education) that disproportionately disadvantage minority students, as well as ambient societal discrimination, as reflected by the fact that an achievement gap exists in several other districts that have not been the subject of a judicial finding of unlawful segregation.

Although we address in detail the parties' arguments on these issues below,[13] we note in advance that we recognize the temptation to ascribe the shortfall in minority achievement to a concept as amorphous[14] and imperceptible as "ambient societal discrimination"—a problem for which no individual or group bears any particular responsibility. It is similarly tempting to attribute the disparity to certain entrenched realities of this nation's economic and social history—such as the levels of parental education in, and socioeconomic status of, Latino and African-American households—which are simply beyond the remedial reach of courts, schools, or other government officials. We believe, however, that it is essential for this Court, having already found a constitutional violation which we have attempted to remedy, to insure that these tempting explanations are not accepted as a more palatable surrogate for what is, in reality, a denial of our aforementioned fundamental premise that all children can learn.

To provide a degree of assurance that the City and the State Defendants are advancing a tangible and credible explanation for the shortfall in minority achievement, rather than merely sloughing off responsibility under the guise of "ambient societal discrimination," we assign those parties the burden of producing evidence that might explain the shortfall in minority achievement and exclude the explanation urged by the YBE and NAACP. Cf. Burdine, 450 U.S. at 255 n. 8, 101 S. Ct. 1089 ("[A]ssessing the burden of production helps the judge determine whether the litigants have created an issue of fact to be decided.... In a Title VII case, the allocation of burdens and the creation of a presumption by the establishment of a prima facie case is intended progressively to sharpen the inquiry into the elusive factual question of intentional discrimination."). *703 In other words, the combination of our fundamental premise that all children can learn with the demonstrated shortfall in minority achievement, leads the Court to allocate the City and the State Defendants, if not the ultimate burden of persuasion,[15] at least a burden of production. See Fleming James, Jr. et al., Civil Procedure § 7.15, at 342 ("Allocation of the burden of production is determined by rules extrinsic to the rules of evidence and trial procedure. These rules are specific to various issues and generally are correlated with the allocation of the burden of pleading.").

The State Defendants rely, for example, on the fact that four districts other than *704 Yonkers which are said to be comparable demographically to Yonkers, have never been judicially determined to have engaged in racial discrimination and have some achievement test scores (PEP scores) which are comparable to Yonkers in terms of their racial disparity. The State proffers nothing more concerning these four districts which it selected for comparison. That is, it offered nothing to show that these districts are indeed comparable or that, despite the absence of a judicial decree, they did not engage in de jure segregative practices.[16] If any burden is placed on the party with the readiest access to information on the contested issue, see Fleming James, Jr. et al., Civil Procedure § 7.16, at 344, cited in United States v. City of Yonkers, 181 F.3d at 310, then with respect to this particular issue, the State surely bears a burden of production. Moreover, if one rejects the State's reliance on those four other school districts about which they have produced relatively little information (as we do, for the reasons set forth below, see infra Part *705 III(A)(2)), their defense is reduced to a totally ineffective statistical presentation and anecdotal testimony by persons totally unfamiliar with the YPS. Such a defense fails to sustain even the minimal burden of production described above and therefore fails to rebut the presumption—grounded in the premise that all children can learn and the statistical evidence of a significant gap in minority achievement—that existing disparities in educational quality are the effect of segregative practices.

Of course, for the Court to conclude that vestiges of segregation existed, it is also necessary for the YBE and the NAACP to identify those segregative policies and practices and to establish that they are "traceable" to the prior segregation. Although a policy or practice must "have a causal link to the de jure violation" in order to constitute a vestige, Freeman, 503 U.S. at 496, 112 S. Ct. 1430, that link need not be the exclusive cause for the policy in question. "Traceable" does not mean "exclusively caused by," or even "predominantly caused by." So long as the current policy had its roots in the prior regime, or had an antecedent in the prior regime, it may constitute a vestige of segregation if it has a segregative effect.[17]

At its core, the question of whether or not vestiges of segregation existed in Yonkers as of 1997 is a question about the adequacy of prior remedial measures. We certainly understand that this Court's remedial authority extends only so far as the constitutional violation to which it is addressed. See Milliken v. Bradley, 418 U.S. 717, 744, 94 S. Ct. 3112, 41 L. Ed. 2d 1069 (1974); Swann, 402 U.S. at 22, 91 S. Ct. 1267 ("One vehicle can carry only a limited amount of baggage."). But before we abandon all remedial efforts and conclude that any existing disparities in the Yonkers public schools are the inevitable consequence of intractable social and economic inequalities, it is incumbent upon all interested parties to be certain that there is nothing further that can be done, practicably, to address the problems caused by the City and the State Defendants' past violations. "Ambient societal discrimination" is an acceptable residual position if but only if all feasible remedies have been exhausted.[18] It is this Court's view, after *706 twenty years of intense involvement with this case, that we have not yet reached that point.

III. FINDINGS OF FACT[19]

A. The Disparity in Academic Achievement

In our earlier opinions, we explained that this Court has found that "minority students (black and Hispanic) lag behind majority students (for these purposes, white and Asian students) in reading and math ... and that to a statistically significant extent, race is a factor with regard to levels of academic achievement in the Yonkers public schools." Yonkers IV, 833 F.Supp. at 221; see Yonkers VI, 984 F.Supp. at 690. The evidentiary basis for that finding was our conclusion that the statistical expert proffered by the YBE, Dr. Jomills Braddock, presented a more credible and persuasive explanation of the test score data than did the State Defendants' expert, Dr. David Rindskopf. Moreover, we found that upon close scrutiny, Dr. Rindskopf's analysis actually corroborated Dr. Braddock's conclusion. See Yonkers IV, 833 F.Supp. at 221-22.

The remanding panel did not disturb our finding on this subject. It believed it was unnecessary to address the issue, reasoning that a finding of a racial disparity in test scores failed to support the conclusion that vestiges of segregation remained. See Yonkers VII, 197 F.3d at 54-55 ("Accepting arguendo the study's conclusion of a racial disparity, the study fails to show that the disparity was caused by pre-1986 segregation in Yonkers, as opposed to, for example, generalized `societal discrimination.'") (citations omitted). Upon remand, for the reasons set forth in Appendix A, we re-affirm and elaborate upon our finding that race is a statistically significant factor in explaining the gap in achievement between minority and non-minority students in Yonkers. We take this opportunity to explain further, in light of the remanding panel's concerns, the significance of that finding in our reasoning as to the existence of vestiges.

We agree with the remanding panel that "a finding of prior segregation, coupled with a finding of present day racial differences in educational achievement, is an insufficient positive test for the presence of residual segregative effects." Yonkers VII, 197 F.3d at 55 (citing Wessmann, 160 F.3d at 801). Such a test has never been applied by this Court. However, when one begins with the premise that all children can learn, a "finding of present day racial differences in educational achievement" creates a strong basis for inferring that, in some way, the school district is failing to teach minority students. Because the State Defendants have failed to come forward with any evidence that might rebut that inference, we find that some set of policies or practices in the Yonkers public schools, which inadequately serves the needs of minority students, must be responsible for the shortfall in minority achievement.[20] The more closely that shortfall is correlated with the students' racial or ethnic heritage, the more confident we are in the reliability of that inference.

The Court has received testimony about a variety of different measures of academic achievement: (1) Metropolitan Achievement Test ("MAT") scores; (2) Pupil Evaluation Program ("PEP") scores; (3) dropout *707 and graduation rates; and (4) the rates at which students pursue post-secondary education.[21] Because MAT scores have been the subject of two detailed studies by statistical experts, they necessarily occupy a prominent place in our reasoning. It is principally due to the disparity in MAT scores and the statistical analyses thereof that we conclude that the record demonstrates a correlation between the gap in minority achievement and the students' ethnic or racial heritage. Because all of the other measures of academic achievement manifest a gap in minority achievement that is consistent with the gap in MAT scores, we believe those measures provide additional corroborative support for our finding of a correlation between the achievement gap and students' race. That finding, however, rests principally on the detailed analyses of MAT scores discussed below in Appendix A.

The State Defendants argue, repeatedly,[22] that any statistical evidence we have received that has not been the subject of an expert's scrutiny (i.e., evidence about all measures other than MAT scores) is utterly worthless. Without a regression analysis or some other method of isolating the effect attributable to race, they maintain, evidence of a statistical disparity with respect to an educational outcome is meaningless.[23] We believe this contention misperceives the nature of statistical evidence. Cf. Bazemore v. Friday, 478 U.S. 385, 400, 106 S. Ct. 3000, 92 L. Ed. 2d 315 (1986) (explaining that regression analysis may be sufficient to establish Title VII plaintiff's case even if it does not establish proof of discrimination "with scientific certainty"). The purpose of the experts' analyses is to establish an explanation for a disparity in student outcomes; if that explanation holds, then we believe it is plausible to presume that it holds regardless of which measure of student performance is used. Approached somewhat differently, for the State Defendants' argument to be at all persuasive, it is incumbent upon them, at the very least, to suggest some reason that socioeconomic status, or any other variable, might fail to account for the disparity in MAT scores, yet nevertheless account for a similar disparity in, for example, PEP scores. Because they have not suggested any such reason, we presume that the explanatory value of the experts' analyses of MAT scores is applicable to the other measures of student achievement as to which we have received evidence.

1. MAT Scores

The MAT has been administered annually to almost every student in the Yonkers public school system, from the time of this Court's initial remedial decree in 1986 until 1996.[24] The only students who did not *708 take the MAT were those students in a special education program whose individual learning plan did not include standardized tests, and students who demonstrated limited proficiency in English ("LEP") and had received less than 20 months of instruction in English. (See 1993 Trial Tr. at 208-09 (Batista).) The exam is administered each May to students in grades 1 through 9. (See 1993 Tr. 350 (Weinberger).) Similar or identical forms of the test are given to students in grades 1 and 2, grades 3 and 4, grades 5 and 6, and grades 7 through 9. (See 1993 State Trial Ex. N at 14.) The test measures student achievement in three categories—Reading, Mathematics, and Language. (See id.) Because the "MAT was the outcome for which the widest set of data was available over time," all of the experts who testified at trial agreed that it was "one of the key indicators through which" the district's performance can be assessed. (See 1993 Tr. at 1378 (Braddock).)

One method of measuring a student's performance on the MAT is to compare the student's performance to that which would be expected of a hypothetical average student, based on national norms, at that particular grade level. (See 1993 Tr. at 350 (Weinberger).) Because the examination occurs in May, which is the eighth month of a ten-month school year, the norm for a first grader taking the exam, for example, would be the performance that would be expected of a student who had completed one year and eight months of school, which is represented numerically as 1.8. (See id.) This numerical representation of the performance expected of an average student at a particular grade level is referred to as a "grade equivalent." (See id.) Educators and administrators use grade-equivalent data to determine how their students' performance compares to that of other students nationwide.
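By way of illustration only, the grade-equivalent arithmetic described above can be sketched in a few lines of Python. The function names and the default values (testing in the eighth month of a ten-month school year) are assumptions made for this sketch and are not part of the record; the observed value of 2.3 corresponds to the "second graders in their third month" figure reported for white first graders on the 1991-92 test.

```python
def norm_grade_equivalent(grade: int, test_month: int = 8, months_per_year: int = 10) -> float:
    """Grade equivalent expected of an average student in `grade` who is
    tested in `test_month` of a school year lasting `months_per_year` months."""
    if not 0 <= test_month < months_per_year:
        raise ValueError("test_month must fall within the school year")
    # A first grader tested in month 8 of 10 has a norm of 1.8.
    return round(grade + test_month / months_per_year, 1)


def gap_from_norm(observed_ge: float, grade: int) -> float:
    """Observed grade equivalent minus the norm for that grade.
    Positive values are above the norm; negative values are below it."""
    return round(observed_ge - norm_grade_equivalent(grade), 1)


print(norm_grade_equivalent(1))  # 1.8
print(gap_from_norm(2.3, 1))     # 0.5
```

A negative result from `gap_from_norm` would indicate performance below the national norm for that grade.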

Dr. David Weinberger, the YBE's Director of Research and Evaluation, testified that for every single grade level in the YPS, between the years 1987 and 1996, the average MAT score of white students was consistently higher than the national norm, but that the average MAT score of minority students was consistently lower than the national norm. (See 1997 YBE Ex. 4, at ¶¶ 9-11 & Tabs 3-4; 1993 Trial Tr. at 352-55 (Weinberger); 1993 YBE Exs. 5A-5F.) Moreover, the disparity in grade equivalents was greater in the higher grades than in the lower grades.[25] (See 1997 YBE Ex. 4, at ¶¶ 9, 11 & Tabs 3-4; 1993 YBE Exs. 5A-5F.) For example, focusing on the 1991-92 test, when the students' performance on all three of the test's components (reading, language, and math) was combined (producing a figure referred to as "total battery"),[26] white first graders in Yonkers scored ½ of a grade above grade equivalent (i.e., as second graders in their third month), but minority students performed slightly below grade *709 equivalent. (See 1993 YBE Ex. 5A.) The disparity among first graders, therefore, was approximately half a grade level. White ninth graders, by comparison, performed almost a whole grade level higher than their grade equivalents (i.e., as tenth graders), while black ninth graders performed over a grade level below expected and Hispanic ninth graders performed two grade levels lower than expected. (See id.; 1993 Trial Tr. at 354-55 (Weinberger).) The disparity among ninth graders, therefore, was between two and three grade levels. These trends of widening gaps from expected grade level, upward for white students and downward for blacks and Hispanics, were consistently found in Yonkers between 1987-88 and 1996-97, on all three components of the MAT (or CAT)[27] and in terms of total battery. (See 1997 YBE Ex. 4, at 9-12 and Tabs 3-4; YBE 1993 Trial Exs. 5A-5F; 1993 Trial Tr. at 356-57 (Weinberger).)
Moreover, the disparity does not appear to be abating over time; the largest disparity is observed in 1994-95, in which "the gap between both non-minority and Black achievement and non-minority and Hispanic achievement grew to 2 grade equivalents by grade 3; was at 3 grade equivalents by grade 6; and reached between 3.5 and 4.0 grade equivalents by grades 8 and 9." (1997 YBE Ex. 4, at ¶ 12 and Tabs 3-4.)

This evidence of a gap in minority achievement, as explained and analyzed by the statistical experts, see infra Appendix A, provides the principal evidentiary basis for our finding that, as of 1997, race was meaningfully correlated with the disparity in academic achievement in the Yonkers public schools.

2. PEP Scores

Virtually all students[28] in the State of New York were, as of 1997, required to take an exam called the Pupil Evaluation Program ("PEP"). (See 1997 YBE Ex. 4, at ¶ 17; 1993 Trial Tr. at 365 (Weinberger).) The PEP consists of reading and math tests that are given to 3rd and 6th graders and a writing test that is administered to 5th graders. (See 1993 Trial Tr. at 365 (Weinberger).) The test measures whether a student has acquired competence in certain basic skills. It does not rank students against each other; it only reports whether a student has passed certain thresholds. (See 1997 YBE Ex. 4, at ¶ 18; 1993 Trial Tr. at 365 (Weinberger).) The thresholds are a state reference point ("SRP"), which represents the number of questions a student must answer correctly to be considered minimally competent, and a mastery level, which represents the number of questions that a student must answer correctly to be considered to have mastered the skills being tested.[29] (See 1997 YBE Ex. 4, at ¶ 18; 1993 Trial Tr. at 365 (Weinberger).)

The YBE and the NAACP presented evidence disaggregating by race the Yonkers students' performance on the PEP for the years 1992 through 1997.[30] (See 1997 YBE Ex. 4, at Tabs 6-7; 1993 YBE Ex. *710 6A-6B.) The data reveal a consistent racial disparity in the rates with which students meet the SRP. (See 1997 YBE Ex. 4, at ¶¶ 19-20.) For all six years presented, on all 5 of the individual tests, a significantly lower percentage of white students than minority students failed to achieve the SRP level. (See 1993 YBE Ex. 6A-6B.) Minority students were at least twice as likely to fail to achieve the SRP as were non-minority students, and on many of the tests, they were almost three times as likely to fail.[31] Between 1993 and 1997, students in the YPS improved, overall, in their performance on the PEP tests, but the racial disparity remained consistent, with African-American and Latino students failing at rates between two and four times higher than those for non-minority students. (See 1997 YBE Ex. 4, at ¶ 20 and Tab 7.) The disparity is even greater when one focuses on the rates at which students achieve the mastery level, with non-minority students achieving that level approximately twice as frequently as minority students and, on some tests, as much as five times more frequently. (See id. at ¶ 21 and Tab 8.) With respect to the 1992 data, Dr. Weinberger determined that the observed disparity was statistically significant to a factor of p [...]. (See 1993 Trial Tr. at 447.)

The Defendants' principal response[32] to the PEP data is their claim that a similar disparity exists in four other school districts (New York City, two community school districts within New York City, and Freeport), the racial demographics of which are allegedly similar to those found in Yonkers. (See Ct.Ex. A of 9/16/97.) In a prior opinion, we explained that we found this argument unpersuasive because we felt it inappropriate to assume that no segregation had occurred in those districts merely because they had not been the subject of a judicial finding of unlawful segregation. See Yonkers VI, 984 F.Supp. at 690. The Court of Appeals found our reasoning to be flawed, noting that "the district court's approach would invalidate reality-checking comparisons with any and all other districts." Yonkers VII, 197 F.3d at 55.[33]

*711 In light of the Court of Appeals' opinion, we have considered the Defendants' argument anew, but still find it to be, ultimately, unpersuasive. We are troubled by the fact that the analysis was presented in terms of PEP scores.[34] The PEP only measures whether or not a student has passed a certain threshold; it does not say by how much. We are unable to conclude on the basis of this evidence, therefore, that the disparity in achievement in these other four districts is of a similar magnitude as that observed in Yonkers. Moreover, even if the disparity in Freeport and New York City was of a similar magnitude, the object of this Court's remedy order was not to raise Yonkers schools to the level of other districts which were not adequately meeting the needs of their minority students, but to the level that would have obtained had there been no unlawful segregation.

For all these reasons, the Court finds that the Yonkers students' PEP scores provide additional evidentiary support, in corroboration of the MAT data, for a finding that, as of 1997, there was a meaningful racial disparity with respect to the levels of academic achievement found among students in the Yonkers public schools. That four districts outside Yonkers, not shown to be highly comparable or to have been free of de jure segregation, exhibit a similar disparity does not negate this finding.

3. Dropout and Graduation Rates

The New York State Education Department ("SED") defines a dropout as a student over the age of compulsory attendance (17 years old) who has not transferred to another district or to an approved education program. (See 1993 Trial Tr. at 388 (Weinberger); 1993 YBE Ex. 12, at n. *; 1997 YBE Ex. 4, at ¶ 30.) The YPS only considers a student to be a dropout if it can confirm that the student is older than the age of compulsory attendance and that the student has not entered another approved program.[35] For example, a student who left school at the age of 16 or younger, or one who left without any indication of his future plans would not be considered a formal dropout, even though it is possible that the student will never return to school again. (See 1997 YBE Ex. 4, at ¶ 30.)

Between 1987 and 1997, African-American and Latino students dropped out of the YPS at a higher rate than white students. (See 1997 YBE Ex. 4, at ¶ 30 & tab 18; 1993 YBE Ex. 12; 1993 Trial Tr. at 388 (Weinberger).) The African-American and Latino dropout rates ranged from approximately 4 to 9% over the ten years for which evidence is available, with an average in the 6 to 7% range. (See 1997 YBE Ex. 4, at Tab 18; 1993 YBE Ex. 12.) Over the same period, the white students' dropout rate only exceeded 4% in one year (1992) and seems to have averaged approximately 2.5% to 3.5%.[36] (See id.) Among those students who remained in school long enough to begin their senior year,[37] the graduation rate among African-American and Latino students was substantially lower than it was for white students. (See 1997 YBE Ex. 4, at ¶¶ 30-31 & Tab 18; 1993 YBE Ex. 13; 1993 Trial *712 Tr. at 396-97 (Weinberger).) Throughout the ten year period examined, the minority students were 1.5 to 2 times more likely not to graduate than were white students; the minority non-graduation rate, for those students who earned enough credits to be qualified as seniors, ranged between 20 and 30% while the white non-graduation rate remained within the 10% to 15% range. (See 1997 YBE Ex. 4, at ¶¶ 30-31 & Tab 18; 1993 YBE Ex. 13; 1993 Trial Tr. at 396-97 (Weinberger).)
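The comparison of rates described above amounts to a simple ratio, which may be sketched as follows. This is an illustration only: the function name is hypothetical, and the two input values are assumed midpoints of the approximate average dropout ranges recited above (6 to 7% and 2.5 to 3.5%), not figures drawn from the exhibits themselves.

```python
def rate_ratio(group_rate_pct: float, reference_rate_pct: float) -> float:
    """How many times larger one group's rate is than the reference group's rate."""
    if reference_rate_pct <= 0:
        raise ValueError("reference rate must be positive")
    return group_rate_pct / reference_rate_pct


# Assumed midpoints of the approximate ranges described in the text.
minority_dropout_pct = 6.5  # midpoint of the 6 to 7% range (assumption)
white_dropout_pct = 3.0     # midpoint of the 2.5 to 3.5% range (assumption)

print(round(rate_ratio(minority_dropout_pct, white_dropout_pct), 2))  # 2.17
```

On these assumed midpoints, the minority dropout rate is roughly twice the white dropout rate; the ratio would differ for any particular year.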

Poor academic performance is, obviously, not the only reason that a student might drop out of high school. Any number of other factors could, conceivably, motivate that decision. A student's economic circumstances, work schedule, or immigration status are just a few examples of noneducational factors that might influence a decision to drop out of high school. However, we believe that the evidence of a disparity in dropout rates does, to a limited degree, account for such factors. If a student leaves the YPS prior to reaching the age of seventeen, which is likely an accurate description of some of the most transient students upon whom the State Defendants focus, he would not be counted as a dropout, due to the narrow way that dropouts are defined.[38] Moreover, the YBE supplemented its report of a disparity in raw dropout rates, by showing that the data was consistent when refined to include only those students who earned enough credits to be classified as seniors. Certainly, some of the students whose dropouts were caused by transient lifestyles or persistent economic disruption would not be able to earn enough credits to be classified as seniors and would, therefore, have been excluded from this analysis.[39] (See NAACP's Reply Submission on Vestiges, Ex. 1, at ¶ 230 ("Even those black and Hispanic students who have the desire and support to remain in school into their senior year lack the credits or skills needed to graduate.").)

Although the evidence is not sufficiently refined to support a conclusion that poor academic performance is the exclusive cause of the racial disparity in dropout rates, it is persuasive enough, in light of our findings with respect to MAT and PEP scores, to provide additional, corroborative support for a finding that, as of 1997, a racial disparity existed in the YPS with respect to the levels of academic achievement. The Defendants argue that the disparity is consistent with national trends. However, the very witness upon whose testimony they rely for that argument, Professor Darling-Hammond, (see State Defendants' Proposed Findings of Fact at ¶ J(8)) explained that the reason for that national disparity is that educational policies or practices nationwide have a disparate impact on racial minorities. (See Darling-Hammond 8/13/97 dep. at 92.) Her testimony, therefore, further supports our finding, and even supports the inference we draw therefrom.

4. Post-Secondary Education

The disparate impact of the YPS's educational policies and practices is also reflected, to a very limited degree, in the disparity, as of 1997, between minority and non-minority students' rates of applying to, and of being accepted by, post-secondary educational programs. Between 1988 and 1996, approximately 55% of the African-American students in the YPS applied to post-secondary programs.[40] (See 1997 *713 YBE Ex. 4, at Tab 22.) For Latino students, the application rate demonstrated a downward trend over the same period, declining from approximately 68% in 1988 to approximately 55% in 1996. (See id.) During those years, the application rate among white students increased dramatically, from a low of approximately 55% in 1990 to a high of almost 90% in 1995, and averaged somewhere between 60 and 70%. (See id.) While, as of 1997, the rate at which students from the YPS had been accepted into post-secondary educational programs had declined for all three ethnic groups, the overall rate was significantly lower (approximately 10%) for minority students than it was for non-minority students. (See id.)

We recognize, as the State argues, that many students who do, in fact, apply to post-secondary educational programs and who are accepted will not be captured in the YBE's data. The data only accounts for those students who applied for, and were accepted by, post-secondary programs while they were seniors in the Yonkers school system. (See 1993 Trial Tr. at 484-85 (Weinberger).) If a student waited for a year, after graduating from high school, before applying, the student would not be reflected in the Board's data. However, we have no reason to assume that any particular racial or ethnic group will fall into this category at a higher rate than another. The Defendants argue that waiting to apply for post-secondary education is more common among poor students, who are disproportionately minority, but we decline to engage in such speculation.

Evidence of a disparity in the rates with which students pursued post-secondary education would not be sufficiently probative, taken alone, to support an inference of a segregative effect. However, we believe that this evidence provides some additional corroboration of our finding, based on the test score data and dropout rates, that there was, as of 1997, a racial disparity with respect to the quality of education offered to students in the Yonkers public schools.

5. Conclusion

All of the evidence of academic achievement outcomes indicates, consistently, that, as of 1997, a disparity existed between the academic performance of minority and non-minority students in Yonkers. Regardless of the measure that is used, white students were, as of 1997, performing at higher levels than minority students. Expert analyses of some of the numerical data convince us that race is a statistically meaningful variable in explaining the disparity. The Court finds, therefore, by a preponderance of the evidence, that, as of 1997, a racial disparity in academic performance existed in the Yonkers Public Schools. Given our premise that all children can learn, and the City and the State Defendants' total failure to establish an alternative explanation for that gap, we infer that the cause of the disparity was a set of policies and practices that existed in the Yonkers public schools as of 1997, an issue which we address next.

B. Current Policies and Practices

Notwithstanding our inference that educational policies and practices were responsible for the gap in minority performance, to conclude that vestiges of segregation existed, we must identify those practices and determine whether or not they are traceable to the prior segregation. Below, we address five[41] *714 of the policies and practices suggested by the YBE and the NAACP: (1) tracking; (2) disciplinary practices; (3) special education referrals; (4) inadequate provision of pupil personnel services; and (5) inadequate services for LEP students.

Before summarizing and analyzing the record with respect to each of those five policies and practices, we note that it is not alleged that any of the putative vestiges is the product of intentional discrimination, nor that any of the policies or practices operate in an openly discriminatory manner.[42] Rather, the YBE's and NAACP's claim is that racism is so embedded in the YPS, as a result of what occurred prior to 1985, that teachers and other administrators unintentionally administer facially neutral policies in a racially discriminatory manner, or, at least, without sufficient sensitivity to the disparate negative impact that they have on minority students. The evidence offered in support of this position consists primarily of testimony from educators in the Yonkers public schools as to their perception of the attitudes of their colleagues, and their understanding of the sources of those attitudes.

Many YPS administrators, almost all of whom had formerly been teachers in the YPS, testified that they believed non-minority teachers had lower expectations for minority students than they did for non-minority students. (See 1993 Trial Tr. at 234-37 (Batista), 542, 545-47 (Pack), 804-05 (Duncan), 878 (Jamieson); 972 (Cardona-Zuckerman)). Those judgments were formed after classroom visits (see id. at 731-42 (Pack), 809-11 (Duncan)), reviews of teacher evaluations (see id. at 958 (Jamieson)), reviews of reports prepared by outside consultants (see id. at 58 (Batista)), and conversations with teachers and students (see, e.g., id. at 234-37 (Batista), 878 (Jamieson), 804-07 (Duncan)).[43] The low expectations were demonstrated in a lack of homework assigned to minority students (see id. at 1047 (Fries)), seating patterns in classrooms (see id. at 809-10 (Duncan)), and a general lack of concern for minority students' inattentiveness in class (see id. at 731-42 (Pack)).

The witnesses testified that, in their judgment, the improper attitudes they observed were rooted in the district's history of segregation and its attempts to integrate. (See id. at 551 (Pack); 1191 (Cardona-Zuckerman).)[44] Dr. Gladys Pack, *715 the YPS's Assistant Superintendent for Restructuring, explained one way in which the district's history of segregation influenced teachers' current attitudes—the move away from neighborhood schools (because of the segregation caused thereby) required some parents to travel some distance to attend parent-teacher conferences; minority parents, many of whom were poor, often lacked the resources to undertake such travel; some non-minority teachers, unaware of the parents' hardships, inferred that the parents were not interested in their children's schooling and, consequently, developed low expectations for those students. (See id. at 552-53; see also id. at 70-81 (Batista).) That schools that had previously been identified as minority schools had also been perceived to be inferior schools, has also, according to the witnesses, left many non-minority teachers with the impression that minority students are unable to achieve at high levels. (See id. at 972 (Cardona-Zuckerman); 878 (Jamieson).)

We find this testimony to be highly credible and persuasive. The racism that existed in Yonkers prior to 1985—which was demonstrated overwhelmingly to this Court in the course of literally hundreds of days of trial and hearings—was invidious and pervasive. Our liability opinion contains numerous examples of the ways in which blatantly racist attitudes were exhibited in the administration of school policies. See, e.g., Yonkers I, 624 F.Supp. at 1454-62 (describing pervasive racist attitudes exhibited in the administration of special education program). In light of that history, the testimony of local school officials who perceive the continuation of racist expectations and attitudes in the Yonkers public schools is highly credible and persuasive. See Village of Arlington Heights v. Metropolitan Housing Development Corp., 429 U.S. 252, 267-68, 97 S. Ct. 555, 50 L. Ed. 2d 450 (1977) (noting that historical background is an "evidentiary source" that can be used to interpret other forms of evidence); Brown v. Bd. of Educ., 978 F.2d 585, 590 (10th Cir.1992) ("To expect the effects of legally mandated segregation to magically dissolve is to expect too much.").

The Court of Appeals rejected the testimony about teacher expectations on the ground that it was nothing more than "scattered anecdotes" and "subjective, intuitive impressions." Yonkers VII, 197 F.3d at 53 (citations omitted). We do not deny that much of the evidence is, obviously, anecdotal. But it is anecdotal evidence that is fully consistent with the history and background of this case, and with all of the available quantitative evidence.[45] Cf. Wessmann v. Gittens, 160 F.3d at 806 ("[A]necdotal evidence may prove powerful when proffered in conjunction with admissions or valid statistical evidence ....") (citation omitted). Moreover, it is evidence that was provided, without contradiction, by a group of educators who, collectively, have decades of experience in the *716 Yonkers public schools.[46] The Court of Appeals doubted the credibility of those witnesses, reasoning that they have a financial incentive to exaggerate the effects of the prior segregation. See Yonkers VII, 197 F.3d at 54 ("[I]t is clear enough that the Board has no incentive to rid itself of that taint so long as its self-accusation generates a flow of state remedial funds through this litigation."). But none of the YBE's witnesses has any personal, financial interest in the outcome of this litigation. To the contrary, they have demonstrated themselves, over the course of the last several years, to be committed educators. They have testified before this Court that, in their experienced judgment, racist attitudes and expectations affect the administration of a variety of YPS policies. Cf. People Who Care, 111 F.3d at 536 (reasoning that "consensus of ... educational authorities ... deserves some consideration by a federal court").
Because the City and the State Defendants did not call a single witness who had any degree of experience in the YPS to contradict that testimony, and because (as we have indicated) we find it credible, the Court finds that, as of 1997, there were teachers and administrators in the YPS who exhibited reduced expectations for minority students and that those expectations are traceable to the pre-1985 segregation of the YPS. Although we do not consider those expectations, themselves, to be a vestige of segregation, they play an integral role in our assessment of the five policies and practices that the YBE and the NAACP suggest are vestiges of segregation. We turn now to an analysis of the record with respect to those five policies and practices.

1. Tracking

Like many other school districts, the YPS, as of 1997, separated its students into discrete groups based on some assessment of the students' abilities.[47] At the top of the spectrum is the Century Honors program, which provides enrichment programs to a select group of students who have demonstrated high academic ability. The students participate in these enrichment programs as a group and their participation is supplemental to their other courses. (See 1993 Trial Tr. at 90-91, 258 (Batista).) Students apply for participation in the Century Honors program and are selected on the basis of their grades in junior high school. (See id. at 91.) High-achieving students may also take Advanced Placement, or "AP" courses. AP courses conclude with a national AP Exam, prepared by the College Board, and may permit passing students to receive college credit. Most students who participate in the Century Honors program take AP classes, but there are also students in the AP classes who do not participate in the Century Honors program. (See id. at 147-48, 258 (Batista).)

*717 As of 1997, a distinction was also made between courses that ended with a Regents exam, and those that did not, with the former being considered the more academically-rigorous or challenging courses. (See id. at 91.) The courses that did not end with a Regents exam were called "survey classes." (See id.) Only if a student took the Regents courses could the student qualify for a Regents Diploma, (see id. at 91-92 (Batista), 398 (Weinberger)), which was generally considered to be more prestigious than an ordinary diploma earned by completion of survey classes (see id. at 259-60 (Batista), 399 (Weinberger)).

Dr. Weinberger presented the Court with a collection of data he had prepared in conjunction with a research team from Columbia University that examined whether all students who were capable were taking advantage of the YPS' "college-bound" curriculum.[48] The study focused on those students that were capable of pursuing high-level academic work, which the study defined as those students who scored above the 50th percentile on their most recent MAT.[49] While minority students were substantially under-represented within that group,[50] even more striking was the fact that within the group, minority students were consistently observed to have been selected for, and to have selected, the less demanding programs of study. White students accounted for 81.82% of those students who undertook the most demanding curriculum (consisting of Century Honors, AP, or College link), and for 61.11% of those students who took the next most demanding curriculum (consisting of Century Honors, AP, and College link courses mixed with Regents courses). By contrast, only 9.09% of the students in the most demanding courses and 12.63% of the students in the second most demanding curriculum were African-American. Hispanic students accounted for 7.58% of the students in the most demanding curriculum and 17.17% of the students in the second most demanding curriculum. Even among those African-American and Latino students who were included in the College Bound Study (those scoring at or above the 50th percentile on the MAT), a majority (72.35% of African-Americans; 68.55% of the Hispanics) enrolled in curricula that were characterized as Medium, Low, or Very Low, in terms of the level of academic rigor. A Regents diploma was awarded to a smaller percentage of minority students than it was to non-minority students. (See 1997 YBE Ex. 4 at ¶ 32; 1993 Trial Tr. at 399 (Weinberger).)
Over the entire ten-year period studied, white students were 2 to 3 times more likely than minority students to be awarded a Regents diploma; on average, 25 to 35% of non-minority students received Regents diplomas, but during the same period only 7 to 13% of minority students did. (See 1997 YBE Ex. 4, at 32 & Tabs 19-20; 1993 YBE Ex. 14.)

It is not suggested by any party that the apparent disparity in the types of courses taken by minority and non-minority students reflects any intentional segregation on the part of officials associated with the YPS. (See 1993 Tr. at 232, 264 *718 (Batista) (indicating that intentional tracking of students would be rejected as inconsistent with the district's desegregative goals)); cf. Hart v. Community Sch. Bd. of Ed., New York Sch. Dist. # 21, 512 F.2d 37, 45 n. 11 (2d Cir.1975) (citing United States Comm'n on Civil Rights, Racial Isolation in the Public Schools 161-62 (1967)) (indicating that tracking has been used as means of intentionally evading desegregation orders); McNeal v. Tate County Sch. Dist., 508 F.2d 1017 (5th Cir. 1975) (same). However, we nevertheless find that the academic tracking reflected in Dr. Weinberger's data is the result of segregative policies and practices. We reach that conclusion, in part, because a principal determinant for placement in the higher level courses is the test scores as to which we have already found a statistically significant racial disparity. Moreover, our finding as to teacher expectations, coupled with testimony that those expectations influence students' course selections (see id. at 523, 719 (Pack), 813-14, 863 (Duncan), 880-81 (Jamieson), 1022-23, 1033 (Fries)),[51] convinces us that those expectations are also responsible for the segregative tracking depicted by Dr. Weinberger's data. In addition, we find credible the testimony that the racial disparity in the Century Honors program "perpetuates the myth that white is smart or only white people can be smart," (1993 Trial Tr. at 1056-57 (Fries); see also id. at 817-18 (Duncan)), which results in further entrenchment of segregative tracking practices. Finally, the evidence of segregative tracking has a clear antecedent in the pre-1985 regime, when minority students were quite frequently enrolled in the least demanding curriculum.[52]

Because tracking practices in the YPS were, as of 1997, segregative in nature, and because they were based on teacher attitudes and expectations that are traceable to the prior segregation, the Court finds that those practices are vestiges of segregation. Even if we were persuaded, as the State Defendants urge, that the principal reason for the continuation of invidious tracking in Yonkers is intransigence on the part of the teachers, (see id. at 94-95, 225-28 (Batista)) such intransigence is not a reason for denying that tracking is traceable to the prior segregation; in fact, it reinforces that conclusion. We find that there is a causal connection between conditions and attitudes which developed during the segregative regime and the subsequent tracking practices.

2. Discipline

Principals may impose many types of discipline on students who misbehave, including detention and suspension. (See id. at 82-83 (Batista).) Disciplinary actions require a report to the superintendent. (See id. at 241-42 (Batista).) If a principal wishes to suspend a student for more than five days, then there must be a superintendent's hearing. (See id.) Dr. Batista testified that his impression, after reviewing many of the reports of disciplinary action that he had received and the records of superintendent's hearings, was that there was a racial disparity in the number of students who were disciplined. (See id. at 83.) According to Dr. Batista, that disparity was directly attributable to the prior segregation in that it is based on teacher attitudes about minority *719 students that were formed during the era of segregation. (See id. at 83-84, 173.)

Bedelia Fries, the principal of Lincoln High School, shared Dr. Batista's impression. She testified that many non-minority teachers consider infractions by minority students to be more serious than infractions by non-minority students. As she put it, when non-minority students fight, it's just a fight, but when minority students fight, the teachers consider it an assault and, consequently, recommend a more severe disciplinary response.[53] (See 1993 Trial Tr. at 1033-34.) Dr. Barbara Cox, the Assistant Superintendent of Pupil Services and Assessment, testified that, in her view, there was an "inability among part of the staff to deal with the minority student," with the result that "they are more quick to suspend a minority male than they would a non-minority male." (Id. at 1679.) Moreover, according to Dr. Cox, the disproportionate rate at which minority males continued to be suspended, as of 1997, was a continuation of the racist practices that existed prior to desegregation. (See id. at 1680-81.)[54]

The data with respect to student disciplinary measures confirms the educators' subjective impressions. Between 1989 and 1997, African-American students were twice as likely to be suspended from school as non-minority students, and Latino students were 1.5 times as likely to be suspended. (See 1997 YBE Ex. 4, at ¶ 23 & Tabs 9-10; 1993 YBE Exs. 8A-8B; 1993 Trial Tr. at 373-75 (Weinberger).)[55] With respect to the students who were the subject of a superintendent's hearing, which is required for the more serious disciplinary matters, African-American students constituted approximately 50-60% of the total, although they constitute only approximately 30% of the overall population. By comparison, non-minority students represented only approximately 7 to 10% of the students who were the subject of superintendent's hearings. Latino students were also over-represented, as compared to non-minority students, but not disproportionately given the size of the Latino population. (See 1997 YBE Ex. 4, at ¶ 25 & Tab 12; 1993 YBE Ex. 9; 1993 Trial Tr. at 375-77 (Weinberger).)[56] Finally, in terms of incident reports, which are filed by building administrators to alert central administration to the existence of a problem (most typically, a fight), African-American students were approximately twice as likely as non-minority students to be mentioned and Latino students were approximately 1.5 times as likely to have been involved. (See 1997 YBE Ex. 4, at ¶¶ 26-27 & Tab 14; 1993 Trial Tr. at 377-78 (Weinberger).)
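The over-representation in superintendent's hearings can be restated as a ratio of a group's share of hearings to its share of enrollment. The sketch below is illustrative only: it uses the approximate percentages recited in the paragraph above, and the function name is a hypothetical label rather than anything drawn from the record.

```python
def representation_ratio(share_of_hearings, share_of_enrollment):
    """Return how many times a group's share of hearings exceeds its share
    of enrollment. A value above 1.0 indicates over-representation."""
    return share_of_hearings / share_of_enrollment


# Approximate figures recited above for African-American students:
# 50-60% of superintendent's hearings, about 30% of overall enrollment.
low = representation_ratio(0.50, 0.30)
high = representation_ratio(0.60, 0.30)
print(f"{low:.2f} to {high:.2f} times the enrollment share")
```

On those figures, African-American students appeared in superintendent's hearings at roughly 1.7 to 2 times their share of enrollment.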

To the extent this disparate use of disciplinary measures is tied to the reduced expectations that we have found to be *720 traceable to the prior segregation, we find that this practice is itself traceable to the prior segregation. We find therefore that it is a vestige of segregation.

3. Special Education Referrals

A classroom teacher, a parent, or other professional staff can identify a child who they believe requires a special education program by referring that student for evaluation. (See 1993 Trial Tr. at 1692(Cox).) If the parent consents, the Department of Special Education conducts an evaluation, which includes an examination by a psychologist, a social worker, an educational assessment, a classroom observation, and a medical report. (See id. at 1692-93(Cox).) All of that material is reviewed by a Committee on Special Education, in consultation with a parent, and a determination is made as to whether or not the child is to be placed in a special education program. (See id. at 1693(Cox).) Pursuant to state guidelines, the Committee on Special Education can classify a child in one of a variety of ways—emotionally handicapped, learning disabled, mentally retarded, physically handicapped, multiply handicapped, or speech impaired. (See id. at 1694(Cox).) The most common classifications are "learning disabled" or "LD" and "emotionally disturbed" or "ED." (See id.)

Historically, prior to desegregation, minorities were over-included in the district's referrals to special education. (See 1993 Trial Tr. at 21-23 (Batista), 518-19 (Pack).) Moreover, among those students that were referred for special education, minority students were more likely to be classified as emotionally disturbed, as opposed to learning disabled. (See id. at 29-30 (Batista), 519-20 (Pack), 1709-10(Cox).) According to Dr. Pack, who was in charge of the YPS Special Education programs from 1979-1982, the reason for these excessive referrals of minority students in special education and over-diagnoses as emotionally disturbed was the racist attitudes and expectations of the faculty. (See id. at 521 (Pack).) Dr. Cox's impression was the same: she testified that she has consistently observed an over-referral of minority males, dating from the time prior to desegregation and continuing to the present. (See id. at 1694-95.) The reasons for that over-referral were a combination of low teacher expectations and student frustrations derived from a lack of achievement. (See id. at 1695(Cox).) Dr. Cox specifically recalled that in the period immediately after desegregation, there were many inappropriate referrals of minority students who had enrolled in east-side, non-minority schools. (See id. at 1696.) Moreover, she testified that, although in her judgment such inappropriate referrals were declining, they still continued to be an observable phenomenon. (See id. at 1697(Cox).)

Data presented at trial reflect a continuing, and growing, trend of disproportionate special education referrals of minority students that corroborates the educators' subjective impressions. Between 1986 and 1997, the percentage of non-minority high school students[57] that were enrolled in special education ranged from just above 6% to approximately 8%. (See 1997 YBE Ex. 4 at Tab 16.)[58] Over the same period the percentage of minority high school students that were referred was significantly higher: between 12 and 18% of all African-American high school students were referred for special education, and approximately 11% of Latino high school students were referred.[59] Moreover, the disparity *721 appears to be widening over time. (See id.) In terms of the numbers of students referred, but not placed,[60] for special education, the disparity is somewhat smaller, though it continues to exist for each school year between 1988 and 1996, with African-American students being referred at the highest rates, and non-minority students being referred at the lowest rates. (See id.)

Although the State Defendants presented evidence indicating that, as of 1997, the YPS' special education program was in compliance with state requirements, that state compliance review does not examine the racial or ethnic composition of the group of students that are referred, or placed, in special education. (See id. at 1759-60(Cox).) The Court finds, therefore, that despite the YPS' compliance with state requirements, its special education policies and practices, as of 1997, were segregative and, to the extent they were influenced by teacher expectations that we have found to be traceable to the prior segregation, they were themselves traceable to the prior segregation. The Court finds, therefore, that the special education programs in the YPS were administered in a manner that constitutes a vestige of segregation.

4. Pupil Personnel Services

The record reveals that the YPS has inadequately provided its students with the services of professional guidance counselors, psychologists, and social workers. (See id. at 979 (Cardona-Zuckerman).) Each guidance counselor in the district is responsible, at the elementary school level, for somewhere between 600 and 800 students; in some instances, the ratio is one for 1,000 students. (See id. at 1673(Cox).) The counselors are responsible for students in multiple buildings, and have to divide their time between different buildings. (See id. at 1673-74(Cox).) These ratios are below those which exist in other parts of Westchester County, and lower than state educational policymakers believe to be ideal. (See id. at 1731-32(Cox).) As a result, counselors focus on the crisis situations and are unable to intervene, in a preventative manner, with those students whose problems have not yet reached crisis status. (See id. at 1674-75(Cox).) The effect of this is felt disproportionately by those minority students who are most in need of the services offered by these professionals.

The shortfall in the availability of guidance counselors, social workers, and psychologists has a clear antecedent in the history of pre-1985 segregation. Prior to desegregation, the students most in need of these services were segregated into a few schools. As those students spread into multiple schools, there has been no increase in the number of service professionals, causing them to be spread too thin. (See id. at 1672(Cox).) Moreover, according to Dr. Cox, there has been too little sensitivity to the need for these service professionals on the part of those responsible for the district's operating budget. (See id. at 1683.) Because we believe that insensitivity derives, at least in part, from expectations and attitudes that are traceable to the prior segregation, the Court finds that the denial of adequate pupil personnel services was a vestige of segregation.

5. LEP Services

A student is classified as an LEP student based upon a test administered upon *722 entering school. Children who fall below the 40th percentile on that test, and whose first language at home is not English, qualify for LEP services. (See id. at 1006-07 (Fries).) LEP students are given the option of taking ESL classes or pursuing a bilingual education. (See id.)

The school district makes an inadequate effort to integrate these students into the mainstream of the school; they remain linguistically segregated. (See id. at 1010 (Fries), 1187 (Cardona-Zuckerman).) Many of the support service professionals (counselors, psychologists, and social workers) are not bilingual, causing the LEP population to be under-served. (See id. at 979 (Cardona-Zuckerman), 1690-91(Cox).) Moreover, many of the teachers who are provided for bilingual programs are not certified in the subjects they teach. (See id. at 1008-10 (Fries).) Some of the students who participated in an ESL pull-out program are educated in inadequate facilities. (See id. at 1685(Cox).) There is insufficient native language instruction. (See id. at 1685-86(Cox).) The LEP students are also less able to participate in the magnet programs because they are not offered in languages other than English. (See id. at 1686-87(Cox).)

The State contends that the lack of adequate numbers of LEP teachers and other professionals is caused by an overall lack of a qualified pool of applicants, rather than by an inadequate commitment on the part of the school system. (See id. at 1745-46(Cox) ("We have had some difficulty in finding teachers.").) However, one principal of a school in which many of the students have LEP status testified that she believed the attitudes of the school faculty and administration also contribute. (See id. at 1187-88.) Dr. Cox, who bears responsibility for recruitment of LEP teachers and professionals, testified that, in her view, more could be done to attract an adequate number of LEP certified staff. We find that testimony to be credible.

Because we find that the inadequate provision of LEP education and services is tied, at least in part, to the prior segregation, the Court finds that it is a vestige of segregation.

6. Conclusion

No educator would deny, and all parties to this litigation agree, that a teacher's expectations for a student and a student's self-esteem are crucial factors in explaining a student's level of academic achievement. Nevertheless, a student's expectations for himself, and a teacher's expectations for a student, are exceedingly difficult to measure and observe, and are even more difficult to quantify directly. The Court is persuaded, by a preponderance of the evidence, that the disproportionately high enrollment of minority students in the YPS' least demanding academic program, minority students' disproportionately high involvement in disciplinary matters, and the disproportionately high rate of referrals of minority students for special education, as demonstrated both through objective data and the testimony of experienced educators in Yonkers as to their subjective impressions, both reflect and contribute toward a reduction in teacher expectations for minority students and a diminution of minority students' own expectations for themselves. We further find that school administrators have been insufficiently sensitive to the special needs of many minority students and, therefore, provide inadequate pupil personnel services and inadequate services for LEP students. Moreover, all of these factors surely influence one another and have a collective impact that is greater than the sum of each policy's effect, in contributing to the demonstrated shortfall in minority achievement. (See, e.g., id. at 863 (Duncan), 1679-80(Cox).) They are, therefore, vestiges of segregation and must be remedied.

CONCLUSION

When a school district that was once racially segregated and has not yet been found to have achieved unitary status educates *723 its children in accordance with policies and practices that are traceable to the prior segregation, the parties responsible for that segregation have the burden of producing evidence that factors for which they have no responsibility, rather than those practices, are the cause of any continuing disparities. Ascribing the cause to "ambient societal discrimination" does not satisfy that burden but is rather an admission of helplessness which cannot and should not be invoked until all reasonable efforts to rectify the disparities have been exhausted.

The Court finds that vestiges of segregation existed in the Yonkers public schools as of 1997 with respect to academic tracking, disciplinary practices, administration of special education programs, pupil personnel services, and services for LEP students. The State Defendants have failed to come forward with any persuasive evidence that racial disparities in achievement scores are attributable not to vestiges of segregation but to some other causes for which they are not responsible.

Pursuant to the terms of the Court of Appeals' remand, having found that vestiges of segregation existed, we are now to "fashion an order of relief." Yonkers VII, 197 F.3d at 58. It is apparent to the Court that EIP II is no longer a feasible option. Although the focus of this opinion has been upon whether or not vestiges remained in the YPS as of 1997, any remedial order of this Court, sitting in equity, must be directed to current conditions. Events that have occurred over the last few years require further development of a factual record before any remedy can be decreed. For example, because no stay was sought of this Court's 1997 EIP II order, the YPS budgeted and hired teachers and otherwise engaged in implementation of that order, with the expectation that state funds would be available. When the Court of Appeals granted a stay of EIP II on August 5, 1999, disputes developed between the YBE and the City administration with respect to school budgets and whether or not to seek reinstatement of this Court's prior order. A school budget shortfall of approximately $16 million is now said to exist. Moreover, in June 2000, the YBE discharged its Superintendent of Schools, Dr. Andre Hornsby, noting that the City administration and Dr. Hornsby disagreed as to whether or not vestiges of segregation remained and, consequently, as to the position the YBE should take in connection with this remand. While none of these events impact the question of whether vestiges of segregation existed in the YPS as of 1997, they necessarily have a significant impact on any remedy which we might order.[61] We therefore refer the case to the Court-appointed School Monitor to report and recommend, after appropriate proceedings, a suitable remedy. Needless to say, even more desirable would be a consensual resolution of this case, which would of course obviate the need or at least alter the nature of any proceedings before the Monitor.

SO ORDERED.

APPENDIX A

STATISTICAL ANALYSES OF MAT SCORES

1. Multiple Regression Analysis

The YBE's expert, Dr. Jomills Braddock, examined the MAT score data using a statistical methodology called a multiple regression analysis, which permits a researcher to identify relationships between a set of independent variables and a single dependent variable. The goal is to examine the relationship among the set of variables and thereby ascertain the relative degrees to which those variables account for *724 variance in the dependent variable.[1] (See 1993 Trial Tr. at 1373 (Braddock).)

Dr. Braddock's regression analysis was based on both subjective and objective evidence. The subjective evidence was used to determine which independent variables should be considered, and consisted of interviews that Dr. Braddock conducted with school administrators and teachers (see id. at 1367-68), observation of classes and other activities at the schools (see id. at 1368), and a review of formal reports that had been prepared by outside experts and consultants (see id. at 1367).[2] The objective evidence consisted of all of the statistical data about YPS students that could be obtained. This data was derived from archival computer tapes maintained by the YBE. (See 1993 YBE Ex. 27C.) The archival data included, with respect to each student in the YPS, information about the student's racial identification, (see 1993 Tr. at 1387), whether or not the student participated in a subsidized lunch program, (see id.), information about the student's age and gender (see id.), and the student's MAT scores. Dr. Braddock explained that he (working with a colleague) integrated these sets of data into a single, longitudinal file that identified each student (with some sort of identification number) and listed that student's race, age, gender, LEP status, participation in a subsidized lunch program, and annual MAT scores. (See id. at 1387-88.) Only those students for whom all of those items of information were available were included in the analysis.

Dr. Braddock studied a total of four MAT tests—the reading and math tests for 1990-91 and 1991-92. His analysis was presented, graphically, in a series of four charts, one for each test. See Yonkers IV, 833 F.Supp. at 226-36, Appendix A; (1993 YBE Ex. 27B.) Each chart contains 18 rows—two rows each (one for African-Americans and one for Latinos) for each of the nine grade levels of students who took the test. Each chart, therefore, contains 72 discrete comparisons. Moreover, because there are four charts, each column (representing the correlation between test scores and a different independent variable or set of independent variables) is the subject of 72 comparisons—4 tests, 2 racial or ethnic groups, and 9 grade levels.
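The counts in the preceding paragraph can be checked by enumerating the combinations. The sketch below is illustrative only; the grade labels are placeholders, and the list of four columns is an assumption drawn from the column-by-column discussion in this Appendix rather than from the exhibit itself.

```python
from itertools import product

tests = ["reading 1990-91", "math 1990-91", "reading 1991-92", "math 1991-92"]
groups = ["African-American", "Latino"]
grade_levels = range(1, 10)  # nine grade levels; the labels 1-9 are placeholders

# Assumed column layout: race alone, then each successive set of controls.
columns = [
    "race alone",
    "+ student background",
    "+ prior test scores",
    "+ school characteristics",
]

# One row per (group, grade level) pairing on a single chart.
rows_per_chart = len(list(product(groups, grade_levels)))

# Every row is compared once under each column of the chart.
comparisons_per_chart = rows_per_chart * len(columns)

# A single column is repeated across all four charts (one chart per test).
comparisons_per_column = len(list(product(tests, groups, grade_levels)))

print(rows_per_chart, comparisons_per_chart, comparisons_per_column)
```

Both routes arrive at 72: 18 rows across four columns within one chart, and 18 rows across four charts within one column.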

The first column of each of Dr. Braddock's four charts presents the correlation between test scores and the students' racial and ethnic identification, without adjusting for any control variables. For every one of the 72 comparisons, a statistically significant disparity was observed. With respect to each comparison, there was a likelihood smaller than one in 1,000 that the observed disparity would occur by chance, a relationship which is represented numerically with the symbol "p < .001."[3] On average, correlating test scores with the students' race accounted for approximately *725 11% of the total variance in MAT scores.[4]

With each successive column, moving to the right on Dr. Braddock's charts, the analysis includes another independent variable (or set of variables) which function as controls. After adding each control variable, Dr. Braddock examined the correlation between test scores and the entire set of independent variables that had been included up to that point.[5] With the inclusion of each control variable, therefore, one would expect the size of the disparity, in absolute terms, to decrease and the percentage of the variance that is explained to increase. With a few puzzling exceptions,[6] that is the pattern demonstrated on Dr. Braddock's charts.
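The column-by-column procedure described above, in which control variables are added in stages and the explained variance is recomputed each time, can be sketched as follows. This is a minimal illustration on synthetic data, not a reconstruction of Dr. Braddock's model: every variable name, coefficient, and sample size is invented, and student background is reduced to a single lunch-program indicator for brevity.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(columns, y):
    """Regress y on an intercept plus the given columns; return (coefs, R^2)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(rows[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum(
        (y[i] - sum(b * x for b, x in zip(beta, rows[i]))) ** 2 for i in range(n)
    )
    return beta, 1.0 - ss_res / ss_tot


# Synthetic students (all numbers invented for illustration).
rng = random.Random(0)
n = 2000
group = [1.0 if rng.random() < 0.4 else 0.0 for _ in range(n)]
lunch = [1.0 if rng.random() < (0.7 if g else 0.3) else 0.0 for g in group]
prior = [50.0 - 4.0 * g - 5.0 * l + rng.gauss(0, 10) for g, l in zip(group, lunch)]
poverty = [0.3 + 0.3 * l + rng.gauss(0, 0.1) for l in lunch]
score = [
    20.0 + 0.6 * p - 3.0 * g - 4.0 * l - 5.0 * s + rng.gauss(0, 10)
    for p, g, l, s in zip(prior, group, lunch, poverty)
]

# Each model adds one more block of controls to the previous one.
models = [
    ("group alone", [group]),
    ("+ student background", [group, lunch]),
    ("+ prior test score", [group, lunch, prior]),
    ("+ school characteristics", [group, lunch, prior, poverty]),
]
results = []
for label, cols in models:
    beta, r2 = ols(cols, score)
    results.append((label, beta[1], r2))
    print(f"{label:26s} group coefficient={beta[1]:7.2f}  R^2={r2:.3f}")

r_squared = [r2 for _, _, r2 in results]
```

Because each model contains every variable in the one before it, the explained variance cannot fall from one column to the next. The group coefficient, by contrast, is free to move in either direction, which is one reason an occasional exception to the expected pattern is not impossible.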

The first control variable considered was student background, which included the student's age and gender, LEP status, and socioeconomic status ("SES"). SES was defined in terms of whether or not the student participated in a subsidized lunch program. (See id. at 1379 (Braddock).) Controlling for student background eliminated a sizable degree of the variance that had been observed when test scores were correlated with race alone. Whereas the disparity was statistically significant to a degree of p[7] The correlation between test scores and student *726 background and race, together, accounted for approximately 22.3% of the overall variance observed, as compared to approximately 11% of the variance, which was accounted for by the correlation with race alone.

The second control variable considered was the student's prior test scores, for which Dr. Braddock used the earliest test score available for each student. (See id. at 1384.) The rationale behind using this particular control variable is that if a student's early test scores were, for example, low, then a subsequent low test score might not be attributable to his experiences in school.[8] If all of the apparent racial disparity could be correlated with prior test scores, it would undermine the inference that the cause of the shortfall in minority achievement was a set of policies or practices in the YPS that are having a disparate, negative impact on minority students.

Inclusion of a student's prior test scores in the set of independent variables had a dramatic effect on the correlation between test scores and the independent variables, rendering the disparity between minority and non-minority students' test scores statistically insignificant in 24 of the 72 comparisons, and reducing the degree of significance to p[9] The amount of the variance that was explained once the prior test scores were included in the analysis increased to 44%, as compared to 22.3% when student background and race were the only independent variables considered.

The final control variable utilized by Dr. Braddock was school characteristics, which included school size, measured in terms of total enrollment, and poverty concentration, measured in terms of the percentage of the total enrollment that participated in a subsidized lunch program. Inclusion of school characteristics as an independent variable had a relatively minor effect on the correlation of test scores with the set of independent variables. None of the statistically significant disparities were rendered insignificant by the inclusion of school characteristics, and the degree of significance was only reduced with respect to two discrete comparisons.[10] The percentage of the variance that was accounted for once school characteristics were factored into the analysis increased only slightly from an average of 44% to an average of 45.6%. When all of the independent variables were included in the analysis, a statistically significant disparity remained for 54 of 72 comparisons made by Dr. Braddock. With respect to 4 of those comparisons, the degree of significance was only at the p[11]

*727 2. Census Tract Study

The Defendants presented the Court with a very different type of expert analysis of the MAT data. Instead of controlling, numerically, for the correlation between test scores and non-racial factors, the State Defendants' statistical expert, Dr. David Rindskopf, attempted to identify students of different races who were similar in terms of a set of non-racial factors. If the students shared a set of non-racial characteristics but differed in terms of race, he reasoned, any disparity that was observed could be attributed uniquely to the students' race.

Like Dr. Braddock, Dr. Rindskopf began by simply recording the disparity between minority and non-minority students' performance on the MAT, without adjusting for any control variables. His observations were identical to those of Dr. Braddock: whites consistently outperformed both Latinos and African-Americans on all portions of the exam to a statistically significant degree.[12]

The first non-racial factor for which Dr. Rindskopf controlled was whether a student was in special education or was an LEP student. (See 1993 Trial Tr. at 2001, 2002 (Rindskopf).) Pursuant to the methodology described above, therefore, Dr. Rindskopf eliminated all special education and LEP students from his model.[13] With those students eliminated, it remained the case that white students scored higher than both minority groups on every single test recorded.[14]

The second non-racial factor for which Dr. Rindskopf controlled was a student's socioeconomic status, measured (like Dr. Braddock) by whether or not the student participated in a subsidized lunch program.[15] He reported data both for those students that did participate in a subsidized lunch program (excluding those who did not) and data for those students who did not (excluding those who did). In both categories, there was a noticeable disparity between minority and non-minority student performance. Among the higher income students (those who did not participate in a subsidized lunch program), nonminorities scored significantly higher than both minority groups on all tests; among the lower income students, white students scored higher than African-American students on all tests, though not to as noticeable a degree, and only scored higher than Latino students on a majority of the tests reported.

Finally, Dr. Rindskopf attempted to control for an unspecified set of non-racial factors by controlling for the census tract[16] in which the student lived. The theory behind doing so is that individuals living in the same census tract share many characteristics,[17] which are difficult to disaggregate *728 and separately identify. Census tracts are a proxy, therefore, for a set of unidentified factors,[18] which one is asked to assume includes factors relevant to academic achievement, such as parental education (see id. at 2059 (Rindskopf)), and percentage of single-parent homes (see id. at 2062 (Rindskopf)).[19] However, census tract is quite clearly only a very rough proxy for those non-racial characteristics. No data was presented to substantiate the assumption that people with similar levels of education, or similar family structures, in Yonkers live in the same census tract. (See Trial Tr. at 2060 (Rindskopf).)

Dr. Rindskopf thought it would be inappropriate to compare students' performance in a census tract in which a particular race predominated. He therefore only included in his analysis those six tracts in which the non-white population was between 20 and 50 percent.[20] (See 1993 Trial Tr. at 2000 (Rindskopf).) Within those six tracts, the only students whose scores were compared were those students that participated in a subsidized lunch program, on the theory that one could be certain that those families' incomes would necessarily fall within a relatively limited range. (See 1993 Trial Tr. at 2001-02 (Rindskopf).) It was only when census tract was included in his analysis that the achievement disparity, for the most part, disappeared.[21]
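The successive exclusions described in this section can be sketched as a filter followed by a within-tract comparison. The sketch below is a minimal illustration using invented records. The tract labels, scores, and field layout are hypothetical, and treating the 20 to 50 percent band as inclusive, and carrying the special education and LEP exclusion into the tract comparison, are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records:
# (tract, group, in_lunch_program, special_ed_or_lep, score)
students = [
    ("T1", "white", True, False, 52.0),
    ("T1", "minority", True, False, 50.0),
    ("T1", "minority", False, False, 61.0),  # dropped: not in lunch program
    ("T2", "white", True, False, 55.0),
    ("T2", "minority", True, True, 40.0),    # dropped: special education or LEP
    ("T3", "white", True, False, 58.0),      # dropped: tract outside the band
    ("T3", "minority", True, False, 47.0),   # dropped: tract outside the band
]
tract_nonwhite_share = {"T1": 0.35, "T2": 0.45, "T3": 0.80}


def eligible_tracts(shares, low=0.20, high=0.50):
    """Keep only tracts whose non-white share falls inside the band."""
    return {tract for tract, share in shares.items() if low <= share <= high}


def within_tract_gaps(records, shares):
    """Return the white-minus-minority mean score gap for each eligible tract
    that still has students from both groups after the exclusions."""
    keep = eligible_tracts(shares)
    by_cell = defaultdict(list)
    for tract, group, lunch, excluded, score in records:
        if tract in keep and lunch and not excluded:
            by_cell[(tract, group)].append(score)
    gaps = {}
    for tract in sorted(keep):
        white = by_cell.get((tract, "white"))
        minority = by_cell.get((tract, "minority"))
        if white and minority:
            gaps[tract] = mean(white) - mean(minority)
    return gaps


gaps = within_tract_gaps(students, tract_nonwhite_share)
print(gaps)
```

Each filter shrinks the pool of students being compared. In this toy data only one tract retains students from both groups, which illustrates how narrow the comparison becomes once all of the exclusions are applied.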

3. Analysis

It is not, and could not be, disputed that African-American and Latino students in the Yonkers Public Schools have consistently earned lower scores on the MAT than white students. All of the available evidence (Dr. Braddock's unadjusted figures, Dr. Rindskopf's unadjusted figures, and the grade equivalents data presented by Dr. Weinberger) supports that conclusion. No party claims that the minority students' racial or ethnic heritage is the cause of the shortfall; such a position would, of course, contradict the premise that all children can learn. The disputed issue is whether the students' race is correlated in a meaningful way with their academic performance, or only appears to be so correlated due to the disproportionate occurrence in minority communities of certain non-racial factors (such as poverty or low levels of parental education) that more accurately explain disparities in academic performance.

We begin by noting that the magnitude of the disparity that Dr. Braddock claimed could be attributed uniquely to race is, in our view, not overwhelming. Race accounted for approximately 11% of the total variance in Dr. Braddock's model, as compared, for example, to 54% of the variance, which was left totally unexplained. Approached somewhat differently, as we noted above, once all measurable variables *729 were controlled, a statistically significant racial disparity only existed with respect to 54 of the 72 comparisons he examined (all 36 of the comparisons between African-American students' scores and white students' scores and 18 of the 36 comparisons of Latino students' scores and white students' scores). We nevertheless believe, for three reasons, that Dr. Braddock's study provides strong evidentiary support for a conclusion that race accounts for a significant portion of the shortfall in minority achievement.

First, many of the independent variables for which Dr. Braddock controlled undoubtedly incorporated elements of racial bias which, strictly speaking, should not have been excluded from the analysis. For example, prior test scores, which had, by far, the most dramatic effect on the final result, are surely affected by the policies the racial impact of which we are trying to assess. If a student suffered from an unequal educational opportunity in his earlier years in a school district, his earliest test scores would, presumably, incorporate the effects of that inequality. When that factor was used as a control, therefore, Dr. Braddock seems to have excluded some portion of an effect that could be attributed to race.[22] See James v. Stockham Valves and Fittings Co., 559 F.2d 310, 332 (5th Cir.1977) (finding control variables used in regression analysis to have improperly incorporated the principal independent variable being studied). Consequently, his conclusion that 11% of the variance is uniquely attributable to race is quite likely an understatement, which is to say that the 11% figure is a highly conservative estimate.
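The point that controlling for a variable already shaped by the effect under study will understate that effect can be illustrated with a toy calculation. The numbers below are invented and noise-free so that the arithmetic is exact; they do not describe YPS students. In this hypothetical, a group gap of 10 points is already built into the earlier score, and a further 2 points arise afterward.

```python
def cov_sum(xs, ys):
    """Sum of cross-products of deviations from the mean."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


n = 20
group = [float(i % 2) for i in range(n)]                # 0 or 1, alternating
spread = [float((i // 2) % 5 - 2) for i in range(n)]    # balanced values -2..2
prior = [50.0 - 10.0 * g + e for g, e in zip(group, spread)]  # gap of 10 built in
score = [10.0 + p - 2.0 * g for p, g in zip(prior, group)]    # 2 more points later

# Gap with no controls: slope of score on group.
total_gap = cov_sum(group, score) / cov_sum(group, group)

# Gap after controlling for the prior score (two-regressor least squares).
sgg, sgp, spp = cov_sum(group, group), cov_sum(group, prior), cov_sum(prior, prior)
sgy, spy = cov_sum(group, score), cov_sum(prior, score)
controlled_gap = (sgy * spp - sgp * spy) / (sgg * spp - sgp ** 2)

print(total_gap, controlled_gap)
```

In this hypothetical the uncontrolled gap is 12 points, but only 2 points remain once the prior score is held constant, even though all 12 are associated with group membership. The 10 points absorbed by the control are the kind of effect that the preceding paragraph describes as having been excluded.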

Second, we agree with the Defendants that some of the variables utilized by Dr. Braddock are only very rough proxies for the factors they are supposed to have measured. Most notably, participation in a subsidized lunch program is quite clearly only a very rough proxy for a student's socioeconomic status. See People Who Care, 111 F.3d at 537 (noting that use of participation in school lunch program as proxy for poverty does "not even measure poverty," but only identifies "students who were below a poverty line").[23] However, *730 according to Dr. Braddock, it is a common measurement used in his field for identifying a student's SES[24] (see id. at 1380), and the State Defendants' expert, Dr. Rindskopf, used the same measurement. We conclude, therefore, that though participation in a subsidized lunch program is an imprecise measure of a student's SES, it is not an inappropriate measure of that factor.

Similarly, we recognize that the regression analysis did not control for all of the variables that contribute to a student's academic performance. Dr. Braddock acknowledged, in fact, that his study only accounted for 46.5% of the total variance, leaving 53.5% totally unexplained. The failure of a regression analysis to account for every conceivable variable might render it less probative than it otherwise might be; however, the inclusion of "the major factors" is sufficient for the analysis to be acceptable as evidence of discrimination. Bazemore v. Friday, 478 U.S. 385, 400, 106 S. Ct. 3000, 92 L. Ed. 2d 315 (1986); see Bickerstaff v. Vassar College, 196 F.3d 435, 448-50 (2d Cir.1999) (citing Koger v. Reno, 98 F.3d 631, 637 (D.C.Cir.1996)), cert. denied, ___ U.S. ___, 120 S. Ct. 2688, 147 L. Ed. 2d 960 (2000). Moreover, we believe that Dr. Braddock's study was as complete as possible, given the data which was available. (See 1993 Trial Tr. at 2146 (Steinberg).) Although there are undoubtedly a whole range of variables that might affect a student's academic performance, cf. People Who Care, 111 F.3d at 537 ("The social scientific literature on educational achievement identifies a number of other variables besides poverty and discrimination that explain differences in scholastic achievement, such as the educational attainments of the student's parents and the extent of their involvement in their children's schooling.") (citing David J. Armor, Forced Justice: School Desegregation and the Law 96, 98; James S. Coleman, Equality of Educational Opportunity 302 (1966)); (1993 Trial Tr. at 70-71, 106-09, 191-92 (Batista), 2131-32 (Steinberg)), Dr. Braddock seems to have included in his analysis all of the factors that were "measurable."[25] Bazemore, 478 U.S. at 400, 106 S. Ct. 3000. Overall, therefore, the imprecision in some of Dr. Braddock's variables, and the failure to consider every factor that influences student achievement, weaken the force of Dr. Braddock's conclusion, but do not render it totally unpersuasive.

Third, Dr. Rindskopf's testimony fails to undermine our confidence in Dr. Braddock's analysis. To the contrary, Dr. Rindskopf's analysis actually confirms that conclusion. Until Dr. Rindskopf controlled for census tract (which, as explained below, improperly skewed his data), his results were consistent with those of Dr. Braddock. Even using Dr. Rindskopf's alternative methodology for controlling for variables, a racial disparity was observed, when controlling for LEP status, special education status, and participation in a subsidized lunch program.

We place little weight on the fact that the disparity disappeared once census tract was used as a control, because the use of that control variable reduced the sample size to such a large extent that the resulting conclusions are not statistically significant. Dr. Rindskopf's data accounts for approximately 800-1,000 students, out of a total of 12 to 13 thousand students who took the exams in question. (See Trial Tr. at 2026-27 (Rindskopf).) In other words, his analysis only considered 8% of the total population under consideration. (See id.) See Pollis v. New Sch. for Social Research, 132 F.3d 115, 121-22 (2d Cir. 1997) ("The smaller the sample, the greater the likelihood that an observed pattern is attributable to other factors and accordingly the less persuasive the inference ... to be drawn from it.") (citing Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605, 621, 94 S. Ct. 1323, 39 L. Ed. 2d 630 (1974); Haskell v. Kaman Corp., 743 F.2d 113, 121 (2d Cir.1984); Coble v. Hot Springs Sch. Dist. No. 6, 682 F.2d 721, 733-34 (8th Cir.1982)).[26] Moreover, because the six census tracts he examined were plainly not representative of the district as a whole,[27] the results observed for that 8% of the population cannot be generalized as an explanation for the whole population. Nor can it be shown that the census tract is a factor that operates independently of race (see Trial Tr. at 2141 (Steinberg)), especially in a city like Yonkers where unlawful segregation permeated the low-income housing market for years.
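By way of illustration only, the share of test takers covered by the census-tract analysis can be bounded from the approximate counts recited above. The following is a minimal sketch; the function name `coverage_range` is a hypothetical label chosen for this example, and the inputs are the rounded figures from the testimony rather than exact enrollment data.

```python
# Approximate counts recited in the opinion: roughly 800-1,000 students
# analyzed out of roughly 12,000-13,000 students who took the exams.
sample_low, sample_high = 800, 1_000
population_low, population_high = 12_000, 13_000


def coverage_range(s_lo, s_hi, p_lo, p_hi):
    """Return the smallest and largest possible share of the population covered.

    The smallest share pairs the low sample count with the high population
    count; the largest share pairs the high sample count with the low
    population count.
    """
    return s_lo / p_hi, s_hi / p_lo


low, high = coverage_range(sample_low, sample_high, population_low, population_high)
print(f"Share of test takers analyzed: {low:.1%} to {high:.1%}")
```

On these rounded inputs the share falls between roughly 6% and 8%, so the 8% figure sits at the upper end of the range.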

NOTES

[26] This problem was magnified by Dr. Rindskopf's use of unweighted means across the six tracts to determine aggregate scores. In determining the mean score for each race for the six tracts, in the aggregate, Dr. Rindskopf simply averaged the six mean scores that he found for each of the six tracts, weighting them all equally. In some of the tracts, however, as few as 7 students were considered; their scores were weighted equivalently to other tracts in which 23 students were considered. Dr. Rindskopf justified this approach on the ground that the six tracts were relatively homogeneous—one implication of which is that it makes little sense to differentiate among the six tracts. It would seem to make little sense therefore to aggregate the scores for the six tracts by differentiating the mean for each of the tracts and then computing an unweighted average for all of them. (See generally Trial Tr. at 2067-74 (Rindskopf).)
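The difference between averaging tract means equally and weighting them by the number of students can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of Dr. Rindskopf's data: only the tract sizes of 7 and 23 come from the record, and every other size and every mean score below is hypothetical.

```python
# Hypothetical per-tract data: (number of students, mean score).
# Only the sizes 7 and 23 echo the record; all other values are invented
# for illustration.
tracts = [(7, 40.0), (23, 60.0), (15, 55.0), (12, 50.0), (20, 58.0), (10, 45.0)]


def unweighted_mean(tracts):
    """Average the tract means, giving each tract equal weight."""
    return sum(mean for _, mean in tracts) / len(tracts)


def weighted_mean(tracts):
    """Average the tract means, weighting each by its number of students."""
    total_students = sum(n for n, _ in tracts)
    return sum(n * mean for n, mean in tracts) / total_students


print(f"Unweighted mean of tract means: {unweighted_mean(tracts):.2f}")
print(f"Student-weighted mean:          {weighted_mean(tracts):.2f}")
```

With equal weighting, the 7 students in the smallest hypothetical tract move the aggregate as much as the 23 students in the largest, which is why the two figures diverge.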

[27] The white students in the six census tracts with which Dr. Rindskopf worked are significantly less well off than white students, on average, in the City of Yonkers. (See Trial Tr. at 2041 (Rindskopf).) Relative to African-American and Latino families in the other 43 census tracts, those in the six tracts studied were at average levels of poverty. The six census tracts examined, then, can be characterized as ones including "poor whites with average blacks in Hispanic neighborhoods." (See Trial Tr. at 2041 (Rindskopf).)

[3] At the same time, Plaintiff-Intervenor, the Yonkers Branch—NAACP, amended its complaint to include claims against the State Defendants.

[4] See United States v. Yonkers Bd. of Educ., No. 80 Civ. 6761(LBS), 1989 WL 88698 (S.D.N.Y. Aug. 1, 1989), appeal dismissed, 893 F.2d 498 (2d Cir.1990) (motion to dismiss); United States v. Yonkers Bd. of Educ., No. 80 Civ. 6761(LBS), 1992 WL 176953, 1992 U.S.Dist. LEXIS 10059 (S.D.N.Y. July 10, 1992) (summary judgment).

[5] The remedy ordered by this Court in 1997 contained detailed provisions for the monitoring, by an outside expert observer, of the efficacy of the steps mandated by EIP II. As a consequence of the stay, there has not yet been any appraisal of the impact, if any, of EIP II during the time it was in effect.

[6] The remanding panel provided for "automatic restoration of appellate jurisdiction without a new notice of appeal" once this Court had completed the limited task it had been assigned. Yonkers VII, at 46 (citing United States v. Salameh, 84 F.3d 47, 50 (2d Cir.1996); United States v. Jacobson, 15 F.3d 19, 22 (2d Cir.1994)). Appellate jurisdiction is to be restored "when any of the parties furnishes a copy of the district court's ruling on remand to the clerk of [the Court of Appeals]." Id. at 58. The clerk is then to refer the matter to the panel that decided Yonkers VII. As we explain below, this opinion—being limited to an explanation of our finding that vestiges of segregation existed as of 1997 without discussing the question of a remedy, as called for by the terms of the remand, see Yonkers VII, 197 F.3d at 58—does not yet fully satisfy the terms of the remand. Filing of this opinion, therefore, with the Clerk of the Court of Appeals would not trigger the restoration of appellate jurisdiction.

[7] Because we foresee a possibility that, after the remanding panel has reviewed the findings set forth herein, the NAACP might renew its request for rehearing en banc, we emphasize that this Court does not agree with the panel's analysis of the record. We nevertheless recognize, of course, that the appellate panel's finding is the law of this case and we therefore have no authority to address the issue in this opinion.

[8] The State Defendants also read this passage as an articulation of a "standard of legally sufficient evidence," which they insist this Court is bound to apply in its consideration of any other putative vestiges of segregation. (See id. at 11.) We do not understand, however, the State's reasoning in urging that evidence which is insufficient for one purpose is necessarily insufficient for other purposes. That is not our understanding of the panel's opinion and it is not the approach we pursue in this opinion.

[9] For this Court's approach to the achievement gap, see infra Part III(A) & Appendix A.

[10] Our definition of a vestige of segregation is identical to the one we applied in our earlier opinion on this subject. See Yonkers IV, 833 F.Supp. at 218-19. The Court of Appeals expressed no disagreement with that definition and we are aware of no intervening legal development which would cause us to alter it.

[11] The fundamental document upon which New York's State Education Department bases its programs is the New Compact for Learning ("New Compact"). (See 1993 YBE Ex. 71.) The New Compact lists a set of fundamental principles which underlie the state's educational system, first among which is the principle that "All children can learn.... No child should be permitted to fail." Indeed the phrase, "all children can learn" is the title of a supplemental report prepared by the State Board of Regents in 1993-94 "to support schools and implement A New Compact for Learning." (See 1993 YBE Ex. 72.) During the 1993 trial, one of the State Defendants' witnesses, Barbara J. Martinage, an SED employee, testified that the principle that all children can learn is a fundamental premise shared by all parties. (See 1993 Trial Tr. at 1892.) Similarly, Dr. Batista testified that, in his view, student outcome was a universally recognized measure of a school's success. (See id. at 84; see also id. at 1884-85 (Martinage).)

[12] The YBE takes no position, and has made no submission to the Court, with respect to this remand. Because the findings on which we elaborate today are based on the record developed in the 1993 and 1997 trials, we refer throughout this opinion to the positions taken by the YBE in the course of those proceedings, prior to the remand and have made occasional use of the YBE's putative submission (see NAACP's Reply Submission on Vestiges, Ex. 1) solely as an aid to highlighting relevant portions of the extensive record before the Court.

[13] See infra Part III(A) & Appendix A.

[14] Cf. Wygant v. Jackson Bd. of Educ., 476 U.S. 267, 276, 106 S. Ct. 1842, 90 L. Ed. 2d 260 (1986) (Powell, J., for four-Justice plurality) ("characterizing the concept of societal discrimination as being too `amorphous' to justify race-conscious remedial measures").

[15] The Court of Appeals' withdrawn opinion concluded that it was appropriate to assign the burden of persuasion to the YBE and the NAACP. See United States v. City of Yonkers, 181 F.3d at 309-12 (citations omitted). The court did not address the question in its final opinion, reasoning that it was "immaterial" because the evidence was "inadequate to support the district court's findings of vestiges regardless of where the burden of persuasion falls." Yonkers VII, 197 F.3d at 49-50; but see id. at 55 (addressing expert analysis of test scores and explaining that "[t]he burden of proof comes into play at this juncture. If (as is the case) the Yonkers Board cannot demonstrate salient differences between its experience with changing school demographics and the experience of other school districts, there is no reason to attribute the Yonkers experience to circumstances particular to Yonkers and its history of segregation."). Although the court does, therefore, seem to assign the burden of persuasion to the YBE and NAACP in the course of its opinion, we treat the issue as open given the court's express assertion that the burden of proof issue is immaterial to its opinion.

Although it is not necessary to our reasoning, we would assign the burden of persuasion to the City and the State Defendants. We recognize that, on a few occasions, courts have assigned the burden of proof to the party claiming that vestiges exist. See Coalition, 90 F.3d at 776-77; Sch. Bd. v. Baliles, 829 F.2d 1308, 1312 (4th Cir.1987); Riddick by Riddick v. Sch. Bd., 784 F.2d 521, 534 (4th Cir.1986). But the more common practice has always been to allocate the burden of persuasion to the parties who have been found to have violated the Constitution. See Fordice, 505 U.S. at 739, 112 S. Ct. 2727; Keyes v. School Dist. No. 1, Denver, 413 U.S. 189, 209-10, 93 S. Ct. 2686, 37 L. Ed. 2d 548 (1973); Swann, 402 U.S. at 26, 91 S. Ct. 1267. The decisions to the contrary in Coalition, Baliles, and Riddick are unpersuasive in this case because in each of those cases there had been either a declaration or, at least, a finding of unitary status. See Coalition, 90 F.3d at 758; Baliles, 829 F.2d at 1311; Riddick, 784 F.2d at 525. In Riddick, the vestiges issue was raised pursuant to an application to reinstate the case after it had been dismissed. See Riddick, 784 F.2d at 525. Similarly, in Wessmann v. Gittens, 160 F.3d 790 (1st Cir.1998)—a case heavily relied upon by the remanding panel— unitary status had been declared, see id. at 792 (citing Morgan v. Nucci, 831 F.2d 313, 326 (1st Cir.1987)), and the issue of vestiges was before the court pursuant to a constitutional challenge to the race-conscious remedial policy that had been instituted during the period of federal supervision. By contrast, in this case, there has been no finding or declaration of unitary status, and no party has even filed an application to have unitary status declared. Once unitary status has been declared, a party alleging the existence of vestiges is in a posture similar to that of a plaintiff at the liability stage and, quite naturally, therefore bears the burden of proof. 
But where, as is the case here, there has been no finding of unitary status, the parties alleging the existence of vestiges are entitled to a presumption that current disparities are tied to the violations we have previously found to have occurred.

We also recognize, as did the courts in Coalition and Baliles, that the traditional presumption that lingering disparities are the effect of prior segregation has most commonly been applied with respect to the so-called Green factors—faculty, staff, transportation, extracurricular activities and facilities. See Green v. County Sch. Bd., 391 U.S. 430, 435, 88 S. Ct. 1689, 20 L. Ed. 2d 716 (1968). But the presumption has certainly not been limited, exclusively, to cases in which the only alleged vestiges are Green factors. See Jenkins v. Missouri, 205 F.3d 361, 366 (8th Cir. 2000) (placing the burden of proof on the defendant with respect to "the issue of reduction in student achievement and the achievement gap"), rev'd on other grounds, 216 F.3d 720 (8th Cir.2000) (en banc). In fact, in Coalition itself, the court notes that at earlier stages in the proceedings, the court had allocated the burden of proof to the defendants with respect to educational quality vestiges of segregation. See Coalition, 90 F.3d at 776. We see no reason, therefore, to alter in this case the traditional rule that the parties who have been found to have violated the Constitution bear the burden of proving that the effects of their violations have been eradicated to the extent practicable.

[16] The State was given every opportunity to provide such information, but has never done so. When data about these four other school districts was first proffered to the Court, the following colloquy ensued:

THE COURT: I have looked at those graphs, and I have questions with respect to them. Are you planning to call the proponent of those graphs?

COUNSEL: I wasn't because ... he was saying this is what it is, but I will if the court has questions about them.

THE COURT: I don't really know what they tell me. They compare findings in Yonkers with other communities. I don't know whether those other communities' public school systems suffered from some of the discriminations which were operative in Yonkers. I don't know that Yonkers is claiming that it is unique in American society. ...

COUNSEL: [The proponent of those graphs] does not address that question. All he does is collect the information that is filed with him.

THE COURT: What is the [probative] value absent that?

COUNSEL: Because, your Honor, the point is Yonkers is claiming it has special needs because it had to share desegregation. ... [T]he graphs show ... communities where it has never been alleged they had segregation, the gaps are the same or larger. Yonkers—

THE COURT: But let me interrupt you. You say in communities in which it has never been alleged that there is racial discrimination?

COUNSEL: It has never been a determination, as far as I know, never been a court case involving the district, the entire districts that we have given you.

THE COURT: What does that tell me? What does that tell me with respect to the probative value of a comparison of the communities? There have not been school desegregation cases in every community in the state, to my knowledge, in which racial discrimination existed. In some instances, ... self-correction has been attempted. In some instances I think—I don't know the situation has been address for whatever reason.

COUNSEL: .... A district cannot be presumed to have had segregation. That has to be something proved in the case by the preponderance of the evidence, and it is not our burden to show that these districts have never had—never been de jure segregated.

THE COURT: It is the burden at least of these graphs.

COUNSEL: All the graphs are doing is providing the information.....

THE COURT: ... [T]hat is not really a meaningful statement to say is all it is providing the information. What I'm trying to explore with you is what the probative value of the information is. If I understand what you're saying is that these charts show there are other school districts in the state in which there is not a court decree finding that the school system is racially segregated, in which there are gaps between minority and non-minority student achievement— is that it?

COUNSEL: These are the school systems in the state, only school systems close to Yonkers in ethnicity and the ethnicity of their students and Yonkers is—the minority students in Yonkers are in most cases doing better tha[n] the minority students in these other districts.....

THE COURT: All right. I understand what you're saying.

(1997 Trial Tr. at 8-12.) When the proponent of the graphs was called, he reported that he was merely summarizing data filed with the state and had no other information about any of the four districts. (See id. at 187-209 (Streeter).) To date, the State Defendants have never come forward with any information about these four districts which might explain the probative value of the data they present.

[17] Moreover, while liability for segregation requires intentional segregative conduct, a finding that vestiges exist does not. See Dayton Bd. of Educ. v. Brinkman, 443 U.S. 526, 538, 99 S. Ct. 2971, 61 L. Ed. 2d 720 (1979) (explaining that the inquiry with respect to vestiges focuses on "the effectiveness, not the purpose, of the [school district's] actions"); but cf. Yonkers VII, 197 F.3d at 52 (rejecting the possibility that the Yonkers curriculum might be a vestige of segregation because "[t]here was no demonstration that those who drafted the curriculum in 1980 acted with racial animus to craft a school program such that children of certain ethnicities or races would fail to learn, or that the curriculum represented anything other than the pedagogical thinking of the time.").

[18] Judicial experience with desegregating school districts belies the notion that a racial disparity in test scores cannot be remedied through court supervision. See Capacchione v. Charlotte-Mecklenburg Bd. of Educ., 57 F. Supp. 2d 228, 274 (W.D.N.C.1999) (noting that remedial measures had resulted in a seven-fold increase in black students' enrollment in advanced placement courses, a greater percentage of black students being prepared for the next grade level, and a reduction in the gap in achievement scores); Tasby v. Woolery, 869 F. Supp. 454, 476-77 (N.D.Tex.1994) (noting that court-ordered remedial measures had reduced the achievement gap from an average of 20 points to an average of 10-14 points); United States v. Bd. of Educ. of Chicago, 588 F. Supp. 132, 163 (N.D.Ill.1984) (finding that gap had been narrowed through remedial program); see also John A. Powell, Living and Learning: Linking Housing and Education, 80 Minn.L.Rev. 749, 793 n. 132 (1996) (noting a 19-point reduction in the achievement gap in Dallas schools and a 17-point reduction in Louisville schools); see generally id. at 788-89 ("In cities across the country the achievement gap between black students and white students narrowed considerably with the implementation of school integration plans."); Paul Gewirtz, Choice in the Transition: School Desegregation and the Corrective Ideal, 86 Colum.L.Rev. 728, 798 n. 75 (1986) ("Integration has been linked to a range of equality goals: ... reducing achievement gaps between whites and blacks, especially educational achievement gaps.").

[19] There is no evidence currently before the Court as to current practices or conditions in the YPS, from which we might render a finding. In several places in the pages that follow we refer to circumstances using the present tense, because we do not wish to suggest that the practices we discuss no longer exist. It is expressly to be understood, however, that regardless of which tense is used, all findings of fact rendered herein relate to circumstances as they existed in 1997.

[20] Because we further find that there are policies and practices in Yonkers that disproportionately impact minority students and are traceable to the prior segregation, see infra Part III(B), we conclude that those policies and practices are vestiges of segregation.

[21] We might have added a fifth area about which the Court has received a great deal of evidence— the degree to which students participate in a more rigorous, academic curriculum. As is set forth below, selection for advanced classes in Yonkers is made, at least in part, on the basis of test scores and other measures of academic achievement. See Part III(B)(1). Participation in those classes, therefore, seems like a measure of academic performance. However, we believe that the record establishes that a variety of factors, in addition to academic performance, influence whether or not a student participates in a more demanding academic curriculum. See id. We therefore defer discussion of the issue until the portion of this opinion in which we discuss the policies and practices of the YPS with respect to tracking.

[22] (See, e.g., 1993 Trial Tr. at 455-57 (Weinberger); State Proposed Findings at 11-12, 30; State Reply at 6, 11.)

[23] The State Defendants' incessant repetition of this argument makes particularly puzzling their failure to subject evidence from the four comparison districts, upon which they seem to stake their entire defense, to a regression analysis. There is not a single bit of evidence in the record about these four districts which might be consulted to explain or analyze the "raw, gross, unanalyzed quantitative data" (State Reply at 11) urged upon the Court by the State Defendants.

[24] In 1996, due to a change in the MAT's publication policies, the YPS stopped administering the MAT and began administering the California Achievement Test ("CAT") instead. (See 1997 YBE Ex. 4, at ¶ 13.) The switch does not alter our analysis, but where relevant, we note any significant differences between MAT scores and CAT scores.

[25] One could not infer from this statistic, however, that this trend reflects a failure on the part of the YPS, since many of the students in the higher grades could be transferring into the YPS. To evaluate whether the widening disparity in grade equivalents is due to failures in the YPS programs, it would be necessary to track a particular cohort of students as they progress through the school. The capability of conducting such an analysis was not available as of the time of the 1993 or 1997 trials. (See 1993 Trial Tr. at 359-60, 363-64 (Weinberger).)

[26] The Defendants argue that using the "total battery" overstates the magnitude of the disparity. (See 1993 Trial Tr. at 453-54 (Weinberger).) Because the MAT is taken by all students, including LEP students, they argue that language deficiencies account for a substantial portion of the disparity on the reading and language portions of the exam, but do not have as significant an effect on the mathematics portion of the exam. (See id.) Indeed, the disparity in grade equivalents is smaller on the mathematics portion of the MAT than it is on the reading and language portions. (See 1993 Trial Tr. at 454-55 (Weinberger); 1993 YBE Ex. 5F.) It is not, however, non-existent.

[27] Beginning with the 1995-96 school year, the YPS switched from administering the MAT test to using the California Achievement Test ("CAT"), because of a change in the MAT's publication policy. (See 1997 YBE Ex. 4, at ¶ 13.) Although the CAT scores were more erratic than the MAT scores, the trend observed on the MAT scores (a widening of racial disparity from lower grades to higher grades), was also observed on the CAT. (See id. at ¶¶ 14-16 and Tab 5.)

[28] The only students who are not required to take the PEP exam are "[s]pecial education students for which there is a clinical and educational judgment that the test is not appropriate," and LEP students who have not yet received 20 months of instruction in English. (1993 Trial Tr. at 367 (Weinberger).)

[29] For example, on the 1996-97 6th grade math test, the SRP was 23 questions out of 65 (35%) and the mastery level was 55 questions (85%). (See 1997 YBE Ex. 4, at ¶ 18.)

[30] The 1992 data, however, does not indicate the number or percentage of students who achieved mastery level. (See 1997 YBE Ex. 4, at Tab 6; 1993 YBE Ex. 6A.)

[31] In 1992, for example, only 6.9% of the white students failed to meet the SRP on the third grade math PEP, but 22% of the Latino students and 19.1% of the African-American students failed. (See 1993 YBE Ex. 6B.) For the sixth grade math test, the failure rate was 9.2% for white students, 22.1% for Latino students, and 24.5% for African-American students. (See id.) On the 1992 reading test, the third grade failure rates were 26.1% for white students, 53.5% of the Latinos, and 44.9% of the African-Americans; the sixth grade rates were 14.5% of whites, 39.7% of Latinos and 36.2% of African-Americans. (See id.) Minority students were approximately three to four times more likely to fail to meet the SRP on the fifth grade writing test in 1992; only 4.1% of the white students failed, as compared to 18.2% of the Latinos and 15.1% of the African-Americans. (See)
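The "three to four times" comparison for the fifth grade writing test can be checked by dividing each group's failure rate by the rate for white students. The following is a minimal sketch using only the three percentages recited in this note; the function name `ratio_to_baseline` is a hypothetical label chosen for this example.

```python
# 1992 fifth grade writing PEP failure rates (percent), as recited in the note.
failure_rates = {"white": 4.1, "Latino": 18.2, "African-American": 15.1}


def ratio_to_baseline(rates, baseline="white"):
    """Return each group's failure rate divided by the baseline group's rate."""
    base = rates[baseline]
    return {group: rate / base for group, rate in rates.items() if group != baseline}


ratios = ratio_to_baseline(failure_rates)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.1f} times the white failure rate")
```

On these figures the ratios come to roughly 4.4 for Latino students and 3.7 for African-American students.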

[32] The Defendants also contend that the disparities are not meaningful because they have not been subjected to a regression analysis. This is true, but as we have noted, not persuasive. In addition, we note that in another portion of Dr. Weinberger's testimony, he explained that the math portion of the PEP exam is provided in alternate languages, including Spanish, and that his analysis of those test results included the exams taken in the alternate languages. (See id. at 491-92 (Weinberger).) That the racial disparity in PEP scores was actually higher, on average, for the math tests (which are available in alternate languages) than it was for the reading tests suggests to us that the LEP status of some of the minority students does not provide a thorough explanation for the apparent disparity in PEP scores.

[33] This is also the portion of the opinion where the panel indicates, contrary to its earlier express assertion that the burden of proof question was immaterial, that "[t]he burden of proof comes into play at this juncture." Id. We agree that one must consider the question of which party has what burden in connection with the state's reference to scores outside Yonkers. We submit, however, that, at a minimum, the State has the burden of proffering evidence to show that the out-of-Yonkers districts were truly comparable and had not in fact engaged in de jure segregation regardless of whether or not they were subject to a court finding on that subject. See supra at pp. 703-05.

[34] We do not suggest bad faith on the part of the State Defendants. The State does not mandate that all schools administer the MAT (as it does with the PEP) and it is possible, therefore, that comparable MAT scores were not available in the other districts.

[35] The SED penalizes school districts that have a dropout rate higher than 10%, which creates an incentive on the part of school districts to under-report the number of dropouts in their district. (See 1993 Trial Tr. at 388-89 (Weinberger).)

[36] Overall, the dropout rate in the YPS declined consistently between 1987 and 1996, with an upward spike occurring in 1992. These trends were consistent across all three racial/ethnic groups.

[37] That group was defined as "those in the high schools who are, by the credits that they have earned, moved forward from 11th grade to 12th grade, and if, as of the beginning of the year." (1993 Trial Tr. at 396 (Weinberger).)

[38] Moreover, there is no evidence in the record to support an inference that the uncounted dropouts are disproportionately members of minority groups. If Dr. Weinberger's characterization was intended to convey his belief that the racial disparity in dropout rates is actually stronger than his data indicate, we disagree and do not so find.

[39] The State urges us to reject this wholly plausible inference on the ground that it is "an untested assumption," which "has no evidentiary value." (State Defendants' Proposed Findings of Fact at J(11).) However, we believe this is an inference grounded in logic and analysis and does not, therefore, require an evidentiary basis.

[40] The data reflected strikingly little upward or downward movement, staying consistent at approximately 55%, between 1988 and 1996. (See id.)

[41] We are somewhat unsure as to the extent of the finding rendered by the Court of Appeals with respect to what it called the "educational theory" vestige. Our review of the court's discussion, and the evidence cited therein, convinces us that it is limited to curricular changes, and the introduction of new pedagogical methods such as the "cooperative learning" technique which was the subject of a great deal of testimony in the 1993 trial. Since the NAACP's discussion of ability grouping in the elementary schools is, in substance, a discussion of pedagogical methods such as cooperative learning, the finding rendered by the remanding panel precludes this Court from addressing that putative vestige of segregation. By contrast, because academic tracking relates neither to the curriculum nor to a pedagogical method, per se, we believe it is still open for us to consider whether a vestige of segregation existed as of 1997 with respect to academic tracking. We do not consider the policies or practices that the NAACP refers to as "curricular inadequacies" and "inappropriate classroom practices" because we believe any finding with respect to those practices is foreclosed by the Court of Appeals' conclusion that the record with respect to "educational theory" does not support a finding that those practices are vestiges of segregation.

[42] See supra note 17.

[43] Donald Duncan, the director of Human Resources at the YPS, and a former New York City police officer, testified about one occasion in which church officials in Yonkers approached him because their parishioners informed them that public school teachers in the YPS had expectations of them that were too low. (See id. at 816.) Although this testimony would be inadmissible hearsay if considered for its truth, it is admissible as evidence of the sources of this witness's understanding with respect to teacher expectations.

[44] For one explanation of the historical dynamic, consider the testimony of Lincoln High School principal, Bedelia Fries:

If you've only had to work with a very homogeneous clientele that is self-driven, self-motivated, upper middle class or middle class parents who value education, most of them were college bound, basically kids who could teach themselves, if you've only taught that type of youngster for 10, 20, 30 years and suddenly you are working with students who come from families where there is neglect, where education is perhaps not as valued, where there are dysfunctional behavior exhibited, minority students that feel, for whatever reason right or wrong, that perhaps society hasn't dealt them a fair hand and these kind of students now walk into your classroom and you haven't had any training or very little training, that's going to be a tough, a tough thing to deal with.

(Id. at 1038; see also id. at 1189 (Cardona-Zuckerman).) As of 1993, approximately 1200 of the 1600 teachers in the YPS had worked there prior to 1986. (See id. at 1614-15 (Guerney).)

[45] Indeed, it is entirely unclear to this Court how a party could establish a vestige of segregation with respect to educational quality without relying, at least in part, on anecdotal evidence. The only vestiges which might be proved without relying at all on such evidence would be the relatively objective and quantifiable factors identified by the Court in Green. But, as we noted above, it is well established that the Green factors are not an exclusive list of factors that might be vestiges of segregation. Given that to be the case, it seems to us to be necessary for anecdotal evidence to be permitted to support a finding that vestiges of segregation exist with respect to educational quality, much as anecdotal evidence is relied upon in other areas of the law concerning discrimination.

[46] By contrast, the State Defendants' witnesses provided testimony about the lack of racism which could, equally, be characterized as "anecdotal." (See, e.g., 1849, 1855 (Adams).) Not a single one of those witnesses, however, has had a single day of experience working in the Yonkers public schools. On balance, therefore, we credit the anecdotal evidence of those who have a degree of familiarity and experience with local conditions over that of witnesses who have none.

[47] The educational merit of this approach is the subject of a significant controversy among educators and policymakers. See People Who Care, 111 F.3d at 536-37; Jack W. Londen, School Desegregation and Tracking: A Dual System Within Schools, 29 U.S.F.L.Rev. 705, 710 (1995) ("[S]ubstantial educational research show[s] that tracking is harmful, particularly to students placed in low-track classrooms.") (citations omitted); Kimberly C. West, Note, A Desegregation Tool that Backfired: Magnet Schools and Classroom Segregation, 103 Yale L.J. 2567, 2577-79 (1994). We understand, and respect, that it is educators rather than a federal court that should resolve that controversy. Our treatment of this issue relates solely to the question of whether such a policy is consistent with the City and State Defendants' duty to eradicate all vestiges of segregation in the YPS. That question is one as to which the Court is competent, and as to which judicial experience is relevant.

[48] The College Bound Study produced a lengthy report detailing academic tracking in the YPS and the causes thereof. (See 1993 YBE Ex. 22.) While this report was the subject of a substantial amount of testimony during the 1993 trial, it was not admitted for its truth, but rather to explain the bases for conclusions reached by various administrators. (See id. at 58 (Batista).)

[49] The Defendants challenge the accuracy of the conclusions reached in the College Bound study on the grounds that defining students with high ability as those students who had scored at or above the 50th percentile on the MAT was arbitrary, and that the study did not distinguish among students within that broadly-defined category. (See 1993 Trial Tr. at 493-94 (Weinberger).) While these criticisms appear, to us, to be valid criticisms of the College Bound study and its conclusions, they do not affect the validity of the underlying data prepared by Dr. Weinberger.

[50] This result is, of course, not surprising in light of our finding about the achievement gap. See supra Part III(A) & Appendix A.

[51] "Kids are so sensitive, so honest, and they can smell attitudes on the part of adults toward them. Yeah, when I encounter youngsters in the halls and [ask] why aren't you in class if they're cutting, well, that teacher, and then you get the student's version of what's not happening in the classroom. Often times it is an issue, what I would call low expectation, the child may not call it that but I would view it as that." (1993 Trial Tr. at 1047 (Fries).) The YBE did not, however, present any quantitative evidence of minority students' self-esteem. (See id. at 177, 234-39 (Batista), 706 (Pack), 833 (Duncan).)

[52] Dr. Batista testified that, as a guidance counselor during that period, he routinely was encouraged to steer minority students towards that curriculum. (See id. at 18, 72 (Batista).)

[53] Ms. Fries also testified that some non-minority teachers are more confrontational with minority students than they are with non-minority students. (See id. at 1037.) However, she testified that other non-minority teachers felt threatened by minority students and were, therefore, less likely to confront them. (See id. at 1035.)

[54] Dr. Cox testified that the basis for her judgment with respect to suspensions was the review she conducts of reports prepared from superintendent's hearings, in the course of determining an appropriate placement for the child upon their return to school. (See id. at 1751.)

[55] The data presented with respect to suspensions reports the total number of suspensions, which, of course, is greater than the total number of students suspended because some students are suspended more than once. (See 1997 YBE Ex. 4, at ¶ 22; 1993 Trial Tr. at 373 (Weinberger).) To account for this, Dr. Weinberger also presented data on the total number of students suspended. The results, with respect to the proportion of minority students suspended were roughly the same regardless of which measurement technique was used. (See 1997 YBE Ex. 4, at ¶ 22 & Tab 9.)

[56] (See also 1993 Trial Tr. at 1681-82 (Cox) (testifying that, in her experience as a counselor working with students who have been suspended pursuant to a superintendent's hearing, "[a]n overwhelming number of the children are Afro American males").)

[57] The disproportion was of a similar magnitude for junior high and elementary school students, but the absolute numbers were slightly lower. (See id.)

[58] As with the disproportionate rate at which minority students were subject to discipline, the overall data represents a smaller disparity than it would if it disaggregated students by gender. (See id. at 1699-1703 (Cox).)

[59] Moreover, if one were to focus solely on African-American and Latino males, the disproportion would be even more pronounced. (See 1993 Trial Tr. at 381-83 (Weinberger); 1993 YBE Ex. 25.) African-American and Latino males were referred for special education at a much more disproportionate rate than were African-American and Latino females, who were not referred at a noticeably disproportionate rate. The overall figures described above, therefore, are averages that are brought down somewhat by the inclusion of both genders.

[60] According to Dr. Cox, many students who are referred, but not placed, are stigmatized by the experience and are the subject of extremely low expectations on the part of the staff. (See id. at 1761-62.)

[61] And, of course, if it should ultimately be concluded by the appellate courts that no vestiges exist, current conditions will impact on whether the YBE will be found to have demonstrated the good faith required to warrant a finding of unitary status.

[1] "Regression analysis is a statistical method that permits analysis of a group of variables simultaneously as part of an attempt to explain a particular phenomenon .... The method attempts to isolate the effects of various factors on the phenomenon." James v. Stockham Valves and Fittings Co., 559 F.2d 310, 332 (5th Cir.1977).

[2] Dr. Braddock also testified that he reviewed many of the numerous court documents that this case has generated, including this Court's opinion with respect to the YBE's and the City's liability for intentional segregation, see Yonkers I, to develop a sense of the case's history and background (see id. at 1366).

[3] A similar notation system is used to denote other degrees of statistical significance. pSee Smith v. Xerox, 196 F.3d 358, 365-66 (2d Cir.1999) (citing Waisome v. Port Authority of New York & New Jersey, 948 F.2d 1370, 1376 (2d Cir. 1991)) (explaining that statistical significance of at least p

[4] The correlation between race and achievement scores on the reading test was greater than the correlation on the math test. It accounted for approximately 12% of the variance on the reading test, but only 10% of the variance on the math test. It seems plausible that this trend is due to the disproportionately high number of LEP students who are minorities. Because performance on the reading test is, presumably, more significantly affected by one's language ability than is performance on the math test, the disproportionate number of LEP students who are minorities might account for there being a seemingly larger race effect on the reading test than on the math test. This hypothesis is partially confirmed by disaggregating the disparity between white and African-American scores from the disparity between Latino and white scores. The average disparity between African-American scores and white scores was 28.25 (28.7 on reading and 27.8 on math). For Latinos, the disparity with white scores was 32.2 (37.45 on reading and 26.95 on math). When student background, including LEP status, was taken into account, the disparity between white scores and African-American scores dropped to 24.14 (24.03 on reading and 24.26 on math), but the disparity between white and Latino scores dropped to only 13.95 (15.35 on reading and 12.56 on math).
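The arithmetic in this note can be checked with a short sketch. The figures are those reported in the note itself; the function and variable names are illustrative assumptions, and the percentage reduction is a derived quantity that the note does not state.

```python
# Score disparities relative to white students, as reported in note 4.
# Each entry is (reading gap, math gap); no new data is introduced here.
unadjusted = {"African-American": (28.7, 27.8), "Latino": (37.45, 26.95)}
adjusted = {"African-American": (24.03, 24.26), "Latino": (15.35, 12.56)}


def mean_gap(gaps):
    """Simple average of the reading and math gaps."""
    return sum(gaps) / len(gaps)


def percent_reduction(before, after):
    """Share of the unadjusted gap that disappears once controls are added."""
    return 100.0 * (before - after) / before


# For each group: (unadjusted mean gap, adjusted mean gap, percent reduction).
summary = {}
for group in unadjusted:
    before = mean_gap(unadjusted[group])
    after = mean_gap(adjusted[group])
    summary[group] = (before, after, percent_reduction(before, after))

for group, (before, after, drop) in summary.items():
    print(f"{group}: {before:.2f} -> {after:.2f} ({drop:.1f}% smaller)")
```

On these figures the Latino disparity falls by more than half once student background is controlled for, while the African-American disparity falls by roughly 15 percent, which is consistent with the LEP hypothesis described in the note.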

[5] For example, the first control variable considered was student background. When that variable was included, Dr. Braddock did not report the correlation between test scores and student background alone; he reported the correlation between test scores and the entire set of independent variables that had been included up to that point (i.e., race and student background). When the next independent variable was included, prior test score, the reported correlation was between test scores and all the independent variables included up to that point (i.e., race, student background, and prior test scores). Finally, when the last independent variable was included (school characteristics), the reported correlation was that observed between test scores and all the independent variables (i.e., race, student background, prior test scores, and school characteristics).
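The cumulative reporting described in this note can be illustrated with a minimal sketch. It is not Dr. Braddock's model: the data are synthetic, the coefficients are invented, the helper names are hypothetical, and R-squared is used as the fit statistic on the assumption that the reported figure was a measure of that kind. The only point shown is that each reported value reflects every variable entered so far, not the newest variable alone.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(rows[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data only; every coefficient below is an invented assumption.
rng = random.Random(0)
n = 200
race = [float(rng.random() < 0.5) for _ in range(n)]
background = [rng.gauss(0, 1) for _ in range(n)]
prior = [0.5 * b + rng.gauss(0, 1) for b in background]
school = [rng.gauss(0, 1) for _ in range(n)]
score = [50 - 5 * g + 3 * b + 4 * p + s + rng.gauss(0, 5)
         for g, b, p, s in zip(race, background, prior, school)]

# Variables are entered in the order the note describes.
blocks = [("race", race), ("student background", background),
          ("prior test scores", prior), ("school characteristics", school)]
included, cumulative_r2 = [], []
for name, column in blocks:
    included.append(column)
    cumulative_r2.append(r_squared(included, score))
    print(f"after adding {name}: cumulative R^2 = {cumulative_r2[-1]:.3f}")
```

Because each model contains all the variables of the one before it, the cumulative figure cannot decrease as variables are added; the contribution of any single variable must be read from the change between successive steps.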

[6] For example, when Dr. Braddock included school characteristics in his model, the size of the overall disparity for 1990-91 seventh grade math scores increased from 11.810 to 14.148 (for African-Americans) and from 6.148 to 7.457 (for Latinos). A similar pattern can be seen with respect to the 1990-91 eighth grade math scores, almost all of the 1991-92 reading scores, and many of the 1991-92 math scores.

[7] Including student background as a control variable had a more noticeable effect on the disparity between the scores of Latino students and white students than it did on the disparity between African-American students' scores and white students' scores. Of the nine comparisons in which the degree of significance dropped from p

[8] "The best predictor of student performance on some academic outcome is how they have performed previously on that same academic outcome.... [T]hat's the reasoning behind using grades and tests for selection into college programs." (Trial Tr. at 1383 (Braddock).) Accordingly, "[h]aving a measure of prior performance and finding an effect of either factors over and above the effect of prior performance on an achievement outcome is strong evidence of the veracity of that effect." (Id.)

[9] Consideration of prior test scores had a more substantial effect in the later grades than it did in the earlier grades. Of the 24 comparisons made for students in grades 1-3, the disparity in minority achievement scores remained at least at the .01 level for all but 5, consisting primarily of the comparison of white and Latino scores on the math test. Of the 24 comparisons made in grades 4-6, only 12 remained at the .01 level or higher. And, of the 24 comparisons made for grades 7-9, only 7 were at the .01 level or higher.

[10] Those two comparisons were the comparison of African-American students' scores and white students' scores on the 1990-91 first grade math test and on the 1991-92 third grade math test. With respect to both of these comparisons, inclusion of school characteristics reduced the degree of significance of the disparity from p

[11] There was a greater degree of disparity between African-American students' scores and white students' scores, which was statistically significant for all 36 comparisons, than there was between white students' scores and Latino students' scores, which was only statistically significant for 18 of the 36 comparisons.

[12] When no controls were used, African-American students also seemed to have scored higher than Latino students on the reading and language tests.

[13] A degree of bias was thereby introduced into his study, because, as Dr. Rindskopf acknowledged, minority students are disproportionately of LEP status and are disproportionately enrolled in special education programs. In fact, specifically in the six census tracts he examined, exclusion of LEP students amounted to an exclusion of 1/3 of all the Latino students (the lowest performing 1/3) (see id. at 2033-34 (Rindskopf)), but less than 1% of the white students and 8-9% of the non-minorities. (See id. at 2032 (Rindskopf).)

[14] The Latino students, however, scored higher than the African-American students on many of the tests, suggesting that LEP status was an important factor in causing the unadjusted scores of Latino students to be lower than the unadjusted scores of African-American students.

[15] Dr. Rindskopf defined socioeconomic status as consisting of "income, education and occupation." (Tr. at 2061.) Census tracts do not seem to provide any useful information about any of those three items. (See id.)

[16] Census tracts were used, as opposed, for example, to postal zip code, because they describe a much more circumscribed geographical area. (See 1993 Trial Tr. at 1999 (Rindskopf).)

[17] The authority for this conclusion comes from experience with commercial marketing, in which products are marketed according to census tract. (See Trial Tr. at 2092 (Rindskopf).)

[18] As Dr. Rindskopf described it: "Past research has shown that within a census tract, people are more alike on a lot of different characteristics than would be two people from different census tracts. So that census tract is being used as a geographic control." (Trial Tr. at 2062.) Dr. Rindskopf explained that he did not know one way or another whether the six census tracts he studied controlled for the traditional SES factors. (See id. at 2064.)

[19] Dr. Rindskopf was clear, however, that a census tract is not a proxy for income (see Trial Tr. at 2059 (Rindskopf)); Dr. Rindskopf acknowledged that, at least with respect to a poor person's income (which was his focus in this study), participation in a subsidized lunch program is a better measure of an individual family's income than is the census tract in which he lives. (See Trial Tr. at 2042.) Moreover, as Dr. Rindskopf acknowledged, there is substantial variation in income within any given census tract. (See Trial Tr. at 2043-47 (Rindskopf).)

[20] Each of the six census tracts that Dr. Rindskopf utilized is located in Southwest Yonkers. (See Trial Tr. at 2058 (Rindskopf).)

[21] A statistically significant disparity was still observed between African-American and Latino students.

[22] The use of that control variable is somewhat analogous to the frequently litigated issue in the employment context as to whether the relevant sample is the entire minority population, or just the pool of qualified minority applicants.

[W]here the issue is not present discrimination but rather whether past discrimination has resulted in the continuing exclusion of minorities from a historically tight-knit industry, a contrast between population and work force is entirely appropriate to help gauge the degree of the exclusion. In Johnson v. Transportation Agency, Santa Clara County, [480 U.S. 616, 107 S. Ct. 1442, 94 L. Ed. 2d 615 (1987)], Justice O'Connor specifically observed that, when it is alleged that discrimination has prevented blacks from `obtaining th[e] experience' needed to qualify for a position, the `relevant comparison' is not to the percentage of blacks in the pool of qualified candidates, but to `the total percentage of blacks in the labor force.' Id. at 651, 107 S. Ct. 1442; see also Steelworkers v. Weber, 443 U.S. 193, 198-99 and n. 1, 99 S. Ct. 2721, 61 L. Ed. 2d 480 (1979); Teamsters, 431 U.S. at 339 n. 20, 97 S. Ct. 1843.

City of Richmond v. J.A. Croson Co., 488 U.S. 469, 541, 109 S. Ct. 706, 102 L. Ed. 2d 854 (1989) (Marshall, J., dissenting); see also E.C. Dothard v. Rawlinson, 433 U.S. 321, 330, 97 S. Ct. 2720, 53 L. Ed. 2d 786 (1977) (finding total population, rather than pool of qualified applicants, to be relevant sample because "[t]he application process might itself not adequately reflect the actual potential applicant pool, since otherwise qualified people might be discouraged from applying because of a self-recognized inability to meet the very standards challenged as being discriminatory"). Similarly, here, where the issue is whether lingering effects of discrimination exist, to control for students with low prior test scores is tantamount to excluding those who have already been the victims of the prior discrimination.

[23] "Even if the line were correctly chosen, the black students eligible for free lunches could be on average significantly poorer than the white students eligible for them; they could be further below the poverty line, and this could make a difference in their educational achievement." Id.

[24] Any review of pertinent caselaw quickly confirms the accuracy of this testimony. See, e.g., Capacchione v. Charlotte-Mecklenburg Schs., 57 F.Supp.2d at 275; Manning v. Sch. Bd., 24 F. Supp. 2d 1277, 1323 (M.D.Fla.1998); Reed v. Rhodes, 1 F. Supp. 2d 705, 739 (N.D.Ohio 1998); Tasby v. Woolery, 869 F.Supp. at 477. According to Dr. Braddock, participation in a subsidized lunch program was even used as a proxy for socioeconomic status in a Congressional Reauthorization of Chapter I funding. (See id. at 1380.)

[25] Even the City's statistical expert, Dr. Steinberg, concluded that Dr. Braddock's study was as complete as it could have been, given the data available. (See Trial Tr. at 2146 (Steinberg).)

[26] This problem was magnified by Dr. Rindskopf's use of unweighted means across the six tracts to determine aggregate scores. In determining the mean score for each race for the six tracts, in the aggregate, Dr. Rindskopf simply averaged the six mean scores that he found for each of the six tracts, weighting them all equally. In some of the tracts, however, as few as 7 students were considered; their scores were weighted equivalently to other tracts in which 23 students were considered. Dr. Rindskopf justified this approach on the ground that the six tracts were relatively homogeneous, one implication of which is that it makes little sense to differentiate among the six tracts. It would seem to make little sense, therefore, to aggregate the scores for the six tracts by differentiating the mean for each of the tracts and then computing an unweighted average for all of them. (See generally Trial Tr. at 2067-74 (Rindskopf).)
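The difference between the two aggregation methods can be seen in a small sketch. Only the tract sizes (7 and 23 students) come from the note; the tract mean scores are hypothetical values chosen for illustration, and the function names are assumptions.

```python
def unweighted_mean(tract_means):
    """Average the tract means, treating every tract equally regardless of size."""
    return sum(tract_means) / len(tract_means)


def weighted_mean(tract_means, tract_sizes):
    """Average the tract means in proportion to the number of students in each tract.

    This equals the mean of all students pooled together.
    """
    total_students = sum(tract_sizes)
    weighted_sum = sum(m * n for m, n in zip(tract_means, tract_sizes))
    return weighted_sum / total_students


tract_sizes = [7, 23]        # student counts mentioned in the note
tract_means = [60.0, 40.0]   # hypothetical mean scores, for illustration only

unweighted = unweighted_mean(tract_means)
weighted = weighted_mean(tract_means, tract_sizes)
print(f"unweighted: {unweighted:.2f}, weighted by tract size: {weighted:.2f}")
```

With these hypothetical means, the 7 students in the small tract move the unweighted aggregate as much as the 23 students in the larger one, so each of their scores counts for more than three times as much as a score from the larger tract.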

[27] The white students in the six census tracts with which Dr. Rindskopf worked are significantly less well off than white students, on average, in the City of Yonkers. (See Trial Tr. at 2041 (Rindskopf).) Relative to African-American and Latino families in the other 43 census tracts, those in the six tracts studied were at average levels of poverty. The six census tracts examined, then, can be characterized as ones including "poor whites with average blacks in Hispanic neighborhoods." (See Trial Tr. at 2041 (Rindskopf).)