Following a bench trial, the superior court upheld the State Bar's denial of Petitioners' request on five independent grounds. We need address only the first of them. The court correctly found Petitioners' request to be beyond the purview of the California Public Records Act ( Gov. Code § 6250 et seq. (CPRA) ) because it would compel the State Bar to create new records.
BACKGROUND
I. Phase I Litigation
This case has taken a long and well-documented path between the trial
At issue in Sander was the trial court's determination that the State Bar has no common law or state Constitutional duty to disclose records in its admissions database. The Supreme Court reversed and remanded. It explained: "The question presented is whether any law requires disclosure of the State Bar's admissions database on bar applicants. We conclude that under the common law right of public access, there is sufficient public interest in the information contained in the admissions database such that the State Bar is required to provide access to it if the information can be provided in a form that protects the privacy of applicants and if no countervailing interest outweighs the public's interest in disclosure. Because the trial court concluded that there was no legal basis for requiring disclosure of the admissions database, the parties did not litigate, and the trial court did not decide, whether and how the admissions database might be redacted or otherwise modified to protect applicants' privacy and whether any countervailing interests weigh in favor of nondisclosure. Consequently, the Court of Appeal will be directed to remand the case to the trial court." (Sander, supra ,
The Supreme Court in Sander expressly left open whether changes to the admissions database necessary to protect applicants' privacy would entail the creation of new records, and thereby exceed the scope of disclosure required under public access laws. ( Sander, supra ,
Following remand, more than a dozen individuals who had applied to take the California Bar Exam since 1972 and two non-profit professional associations of African American lawyers, the Black Women Lawyers Association of Los Angeles, Inc. and the John M. Langston Bar Association
II. The Admissions Database
Subject to a stringent stipulated protective order, the State Bar provided Petitioners' experts with highly confidential data from its admissions databases to facilitate expert analysis concerning the issues to be tried upon remand. The admissions databases consist of five separate text files containing for each applicant who sat for the bar examination between 1972 and 2008: (1) the number of times the applicant took the exam, and whether he or she passed or failed; (2) law school graduation date and a code for any law school attended; (3) LSAT score and law school GPA; (4) race and ethnicity; (5) a file linking the law school codes and law school names.
III. Post-Sander Legislation
In 2015 the Legislature enacted Business and Professions Code section 6026.11, which for the first time made records of the State Bar subject to production under the CPRA. (Stats. 2015, ch. 537, § 6.) It also enacted a specific confidentiality statute governing State Bar admission records. With exceptions not relevant here, section 6060.25 provides that "Notwithstanding any other law, any identifying information submitted by an applicant to the State Bar for admission and a license to practice law and all State Bar admission records, including, but not limited to, bar examination scores, law school grade point average (GPA), undergraduate GPA, Law School Admission Test scores, race or ethnicity, and any information contained within the State Bar Admissions database or any file or other data created by the State Bar with information submitted by the applicant that may identify an individual applicant ... shall be confidential and shall not be disclosed pursuant to any state law, including, but not limited to, the California Public Records Act. ..." ( Bus. & Prof. Code, § 6060.25, subd. (a), added by Stats 2015, ch. 537, § 8.)
IV. The "Phase II" Trial
The "Phase II" issues were tried to the court over five days. Much of the trial was devoted to competing expert testimony about whether disclosure of the admission records may reveal bar applicants' private information or
A. Data Anonymization and Re-Identification
Dr. Latanya Sweeney, a leading expert in data privacy and anonymization, provided a 67-page report and testified for the State Bar about the risks to bar candidates' privacy in releasing data in the manner proposed by Petitioners. Dr. Sweeney is a professor at Harvard University, where she leads the Data Privacy Lab and teaches classes in data privacy. She holds a Master's Degree in computer science and electrical engineering and a Ph.D. in computer science from M.I.T. Her work in the field of data privacy has been cited in over 5,000 scientific publications.
Dr. Sweeney testified that re-identification refers to whether someone can "use reasonable effort to match the person's identity to details in the released dataset sufficient to know enough information about the person to identify him or her as a specific person." "We use the term 'named person' to refer to having sufficient information to individually identify the person who is the subject of the data. Thus, if records in the DataSet can be associated with named people, then the DataSet would be re-identified. Harm from a reidentification can result if sensitive information contained in the data becomes known about named persons. Although in most circumstances privacy concerns relate to identification of an individual by strangers, in some cases, targeted identification of a specific known person, and even self-identification, can be problematic where either the data is not intended to be known to the person to whom it pertains or the facts that enable self-identification can be compelled (such as by a prospective employer) even where they are not generally known to the public. In each case the question is the same: can the 'anonymous' data be re-identified such that information is learned about specific individuals?"
Dr. Sweeney explained: "A unique re-identification occurs when a record in the data matches to one person's information uniquely. A 'group re-identification' occurs when a few records in the dataset match to a small number of people. Both are examples of reidentification that raise privacy
B. Petitioners' De-Identification Protocols
Petitioners' experts proposed four methods or "protocols" for rendering data anonymous that they believed could protect privacy rights without unduly burdening the State Bar. Under Protocol One, the State Bar would set up a physical "data enclave" to house a version of the admissions database stripped of personal identifiers (name, address, social security numbers and the like) and specified records (e.g., records of students who attended unaccredited or correspondence schools or whose race is coded American Indian, Alaska native, Filipino, Pacific Islander or from the Indian subcontinent and for law schools that graduated fewer than 10 students who took the bar exam in a given year). These processes would exclude approximately 30 percent of all student records. The remaining data would be converted into a format compatible with a statistical analysis software package and maintained in a "safe room" where members of the public could conduct research under the supervision of an on-site operator. Anyone seeking access to the data would have to explain their purpose for seeking the information and sign an agreement not to re-identify individual applicants from the data. Once granted access, users could use only electronic equipment and software provided within the safe room and would be strictly limited as to hours of access and the kind of information they could take away.
Under Protocols Two, Three and Four, Petitioners proposed to apply various techniques to the admissions database, such as data redaction, recoding and binning, that would conceal applicant identities and prevent the risk of reidentification.
Applying k-anonymity to a dataset requires choosing both the size of k and the data fields (e.g., bar passage, school, graduation year and race), also called variables or attributes, to be anonymized.
Under Protocols 2 and 4 Petitioners would apply a k of 11 to four different variables: law school, graduation year, race, and whether the person ever passed the bar.
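By way of illustration only, the k-anonymity condition can be sketched in a few lines of Python. The sketch assumes the commonly used definition, under which every combination of values for the chosen variables must be shared by at least k observations, and it assumes that observations in smaller cells are simply suppressed. The field names, the sample rows and the helper functions are hypothetical and are not drawn from the record or from Petitioners' actual programs.

```python
from collections import Counter

# Hypothetical field names standing in for the four variables named above.
VARIABLES = ("school", "grad_year", "race", "ever_passed")


def cell_sizes(rows, variables):
    """Count the observations in each cell (one combination of values)."""
    return Counter(tuple(row[v] for v in variables) for row in rows)


def is_k_anonymous(rows, variables, k):
    """True when every cell holds at least k observations."""
    return all(n >= k for n in cell_sizes(rows, variables).values())


def suppress_small_cells(rows, variables, k):
    """Drop every observation whose cell holds fewer than k observations."""
    sizes = cell_sizes(rows, variables)
    return [r for r in rows if sizes[tuple(r[v] for v in variables)] >= k]


# Invented sample: one cell of 12 observations and one cell of 3.
rows = (
    [{"school": "A", "grad_year": 1990, "race": "white", "ever_passed": True}] * 12
    + [{"school": "B", "grad_year": 1990, "race": "black", "ever_passed": False}] * 3
)

released = suppress_small_cells(rows, VARIABLES, k=11)
```

In this invented sample the three-observation cell falls below a k of 11, so those observations are withheld and only the twelve-observation cell is released. Variables left outside `VARIABLES` are not protected by the check at all.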
Protocol 3 did not employ k-anonymity. Instead, it principally relied upon a statistical analysis or, "more precisely, it describes intense computations of mathematically determined standardizations of the data that report relative positions within various distributions found in the data." According to Dr. Sweeney, "it is clear that this protocol requires a great deal of effort, even for me writing a Python [software] program." "The first step in Protocol 3 removes huge portions of the data. All data from 1982-1984, and 1999-2005 are removed. Records from 1985-1998 and 2006-2008 remain. The second step recodes race into the same four categories used in Protocol 2 and Protocol 4, namely, black, white, Hispanic and other. Then come the steps involving statistical computations. The computations are done only on those records for law schools having 20 or more LSAT scores in a given year; all other records are dropped. The computations themselves require computing different averages and standard deviations and then recoding the data with new LSAT fields that contain the relative position of the original LSAT score within the distributions computed." The next step in Protocol 3 manipulates the GPA field similarly, while the year of graduation is replaced with a 4-year period and, finally, law school names are (with some exceptions) replaced with "California" or "out of state." Petitioners' experts describe Protocol 3 as "a more radical approach to de-identifying the data" and conceded it was "not a method we recommend, because of its impact upon data utility."
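The kind of computation described for the LSAT field can be sketched as follows. This is a simplified illustration under stated assumptions, not the protocol itself: it assumes the "relative position" is an ordinary standard score (the distance of a score from the school-year average, measured in standard deviations), which the record does not specify, and the field names and sample data are hypothetical. Only the threshold of 20 scores per school and year comes from the description above.

```python
from collections import defaultdict
from statistics import mean, pstdev

MIN_SCORES = 20  # threshold stated in the description of Protocol 3


def standardize_lsat(rows):
    """Replace each LSAT score with its standard score within school and year.

    Assumption: "relative position" is modeled as (score - mean) / std dev.
    Groups with fewer than MIN_SCORES scores are dropped entirely.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["school"], row["year"])].append(row)

    recoded = []
    for members in groups.values():
        if len(members) < MIN_SCORES:
            continue  # all other records are dropped
        scores = [m["lsat"] for m in members]
        mu, sigma = mean(scores), pstdev(scores)
        for m in members:
            z = (m["lsat"] - mu) / sigma if sigma else 0.0
            # The original score is not carried over; only the new field is.
            recoded.append(
                {"school": m["school"], "year": m["year"], "lsat_z": round(z, 2)}
            )
    return recoded


# Invented sample: 20 scores for one school-year, 5 for another.
sample = [{"school": "A", "year": 1990, "lsat": 150 + 10 * (i % 2)} for i in range(20)]
sample += [{"school": "B", "year": 1990, "lsat": 160} for _ in range(5)]

recoded = standardize_lsat(sample)
```

Every value in the hypothetical `lsat_z` field is computed from the data rather than copied from it, and the five-score group disappears from the output altogether.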
Protocol Four, as described by Petitioners' experts, incorporated Protocol Two, and (1) randomly redacted the applicant's law school for 25% of the observations (see infra, fn. 4); (2) rounded or suppressed law school GPA to no more than 2 digits; and (3) redacted unique GPA values.
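The three additional steps of Protocol Four can be illustrated with a short sketch. It rests on several assumptions that the record does not settle: that rounding means rounding to one decimal place (two significant digits for a GPA such as 3.47), that a GPA value is "unique" if it appears only once after rounding, and that the 25 percent of observations are chosen by simple random sampling. The function, field names and sample data are hypothetical.

```python
import random
from collections import Counter


def apply_protocol_four_steps(rows, redact_share=0.25, seed=0):
    """Sketch of the three added steps; returns new rows, leaving input intact."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    out = [dict(r) for r in rows]

    # (1) Redact the law school for a random share of the observations.
    n_redact = round(len(out) * redact_share)
    for i in rng.sample(range(len(out)), n_redact):
        out[i]["school"] = None

    # (2) Round GPA to one decimal place (an assumed reading of "2 digits").
    for r in out:
        r["gpa"] = round(r["gpa"], 1)

    # (3) Redact any GPA value that appears only once after rounding.
    counts = Counter(r["gpa"] for r in out)
    for r in out:
        if counts[r["gpa"]] == 1:
            r["gpa"] = None
    return out


# Invented sample of eight observations.
gpas = [3.41, 3.44, 3.38, 2.91, 2.88, 3.97, 3.12, 3.09]
sample = [{"school": "A", "gpa": g} for g in gpas]

result = apply_protocol_four_steps(sample)
```

In the invented sample, two of the eight school entries are blanked, and the single GPA that rounds to 4.0 is redacted because no other observation shares it.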
Dr. Sweeney concluded that "[a]lthough the level of risk varies amongst the protocols, each of Petitioners' proposals for releasing the State Bar admissions data presents cognizable risks that individuals may be specifically identified in the data, and thus their bar scores, academic history, and other private information publicly revealed. Not only may the information reveal specific individuals, as I have demonstrated it clearly does reveal information on specific individuals. [¶] I am not opining that it is impossible to anonymize the data, quite the contrary. However, the proper way to anonymize this kind of sensitive data requires anonymization of all fields in the data ... and to do so using scientifically proven methods, not ad hoc binning, including replacing substantial portions of the data with more generalized data or codes, and potentially adding additional fictitious data. Petitioners' protocols come nowhere close to meeting those standards or otherwise assuring that none of the individuals in the DataSet can be identified with reasonable certainty to
Admissions director Murphy testified that the State Bar does not maintain admissions data in the clustered and banded formats Petitioners were requesting and that it would have to create new documents to provide the requested records.
Petitioners introduced an expert report that explained their protocols and disagreed with Dr. Sweeney's analysis. Their expert, Luk Arbuckle, testified about the efficacy of the protocols to produce useful data while protecting the privacy of bar applicants.
The court denied the petition in a detailed 22-page order on five independent grounds. "1. The disclosure of the requested records pursuant to any of
Petitioners filed a timely appeal and petition for writ of mandate from the superior court judgment. We consolidated the appeal and the writ petition and issued an order to show cause why the petition should not be granted. We have also considered amicus curiae briefs filed on Petitioners' behalf by the Reporters Committee for Freedom of the Press and 13 media organizations; the Pacific Legal Foundation; the National Association of Scholars; Gail Heriot and Peter Kirsanow; and the Electronic Frontier Foundation.
DISCUSSION
The court's initial ground for denying the petition is that disclosure of the bar admissions data would require the Bar to create new records, a duty not imposed by California's access to public records laws. Petitioners contend this was erroneous as a matter of law and that undisputed evidence shows the data manipulations required to institute their proposed protocols are within the duties imposed by the CPRA. We disagree.
I. Standard of Review
We independently review the trial court's interpretation of the CPRA and its application to undisputed facts, but accept the court's findings of historical fact if supported by substantial evidence. ( American Civil Liberties Union of Northern Cal. v. Superior Court (2011)
" ' "A court's overriding purpose in construing a statute is to ascertain legislative intent. ... [Citation.] In interpreting a statute to determine legislative intent, a court looks first to the words of the statute and gives them
II. Analysis
"The core purposes of the CPRA are to prevent secrecy in government and to contribute significantly to the public understanding of government activities." ( Fredericks , supra ,
As manifested by this case, an unavoidable tension exists between the CPRA's laudable purposes of transparency and disclosure and " 'the equally important public interest in protecting citizens and public servants from unwarranted exposure of private matters.' " ( Fredericks, supra ,
The threshold question in this case is not whether the information Petitioners seek is subject to one of the CPRA's statutory exemptions from disclosure. Nor is it whether, as Petitioners would have it, the trial court improperly created a nonstatutory "new exemption" for the records sought. The question, rather, is whether the information in the form Petitioners ask the State Bar to release it is subject to the obligations imposed by the CPRA in the first instance. The trial court correctly concluded that it is not.
In any event, the argument is not convincing. The trial court expressly stated that "the State Bar has demonstrated that disclosure of the requested records is prohibited by Business and Professions Code 6060.25 because individual applicants may be identified from the data resulting from application of any of Petitioners' protocols. Accordingly, the Court finds that the State Bar has met its burden. "
The trial court also expressly acknowledged the State Bar's burden under section 6255, subdivision (a) (the "catch-all" exemption) to show that the public interest served by nondisclosure outweighed the public interest served by disclosure: "The CPRA provides that a public agency is justified in withholding records if it demonstrates that 'the public interest served by not disclosing the record clearly outweighs the public interest served by disclosure of the record.' Gov't Code § 6255(a)." (Italics added.) We will not infer error where it is not shown on the record.
But, as we have indicated, the trial court was correct for another, independent reason. It is well established under California law and guiding federal precedent under the Freedom of Information Act (FOIA) (see Regents , supra ,
Federal law construing the FOIA is in accord. "The Act does not obligate agencies to create or retain documents; it only obligates them to provide access to those which it in fact has created and retained. ... [O]nly the Federal Records Act, and not the FOIA, requires an agency to actually create records, even though the agency's failure to do so deprives the public of information which might have otherwise been available to it." ( Kissinger v. Reporters Committee for Freedom of the Press (1980)
We have found no cases addressing proposals for data manipulations as complex as those proposed by Petitioners, but Center for Public Integrity,
Petitioners argue that much of this authority predates the emergence of electronic databases as a commonplace repository of government information, and that more recent cases require disclosure of electronically stored information "even if it requires extensive compilation or extraction of data contained in electronic public records." Their argument is premised upon mischaracterizing the cited cases, which merely distinguish between searching, extracting, compiling or redacting electronically stored data, which our state and federal public access laws require, and creating new records, which they do not. (See Schladetsch v. U.S. Dept. of H.U.D (D.D.C. 2000)
Here, the trial court determined that each of Petitioners' four proposed protocols would require the creation of new records. "All of the protocols require the State Bar to recode its original data into new values. ... For example, the protocols group law schools into three classes, designating a 'school class' code, which is not present in the original Admissions Database. [Citations.] The protocols also involve recoding race/ethnicity values to reflect four categories (Asian, Hispanic, Black, or White) instead of the State Bar's original eight race categories. Similar codes are created with respect to year of graduation. [Citations.]
"Protocols Two, Three, and Four require the creation of even more new data. For example, Protocol Two involves replacing some applicants' actual LSAT scores with a calculated median, as well as possibly creating a new 'underrepresented minority' or 'URM' category that does not exist in the original Admissions Database. [Citation.] Protocol Four involves rounding off actual law school GPAs to two significant digits. [Citation.] Finally, Protocol Three, which both the State Bar's and Petitioners' experts agree requires drastic changes to the State Bar's original data, requires calculating new values for GPAs and LSAT scores, as well as creating a variable indicating whether an applicant's law school is located in California or out-of-state. [Citations.]" The court cited Dr. Sweeney's testimony that none of the variables in Protocol Three exist in the raw Bar data, and that every variable would have to be calculated or recoded or both, as well as testimony from one of Petitioners' experts that " '[s]o much information has been changed or removed entirely from the data, law school name, the exact GPA, LSAT, all the data cleaning steps, they do a lot to change the structure of the data.' "
The court also rejected the Petitioners' contention that their protocols merely required the State Bar to redact or manipulate existing data and do some computer programming. "It is clear that the various steps outlined do more than simply redact or omit existing data. In order to achieve the 'manipulated' data contemplated under each of the protocols, Petitioners had to produce a 'Stata' software that applied a code specifically created to generate new data. [Citations.] Indeed, this case is vastly distinct from the two Illinois district court cases cited by Petitioners, in which the public agencies were ordered to produce a computer program that could delete certain information." Requiring the Bar to recode its existing data, the court concluded, would thus require it to create new records. We agree.
Petitioners argue that two provisions of the CPRA demonstrate that it in fact does impose a duty on public agencies to create new records. Section 6253.9, subdivision (b) authorizes an agency to charge a party requesting electronic records "the cost to construct a record, and the cost of programming and computer services necessary to produce a copy of the record" if the
In short, the trial court got the law right. There is no doubt that a government agency is required to produce non-exempt responsive computer records in the same manner as paper records and can be required to compile, redact or omit information from an electronic record. (See, e.g., §§ 6253.9, 6253 subd. (a); Sierra Club v. Superior Court (2013)
Petitioners also contend the trial court got it wrong on the facts. In their view, notwithstanding Dr. Sweeney's and Murphy's testimony, the evidence established that disclosure pursuant to their protocols "would not create a new record, but would at most require data extraction, compilation and programming." But "[w]hen a trial court's factual determination is attacked on the ground that there is no substantial evidence to sustain it, the power of an appellate court begins and ends with the determination as to whether, on the entire record, there is substantial evidence, contradicted or uncontradicted, which will support the determination, and when two or more inferences can
Petitioners assert that, if nothing else, the court on its own initiative should have concocted a plan for disclosing the bar application data "subject to a process that entails only redaction of information, which would not require creating anything." As we understand it, they premise this suggestion on the requirement that it is the public agency's burden to prove a basis for nondisclosure of a public record. (See § 6255, subd. (a) [agency must show withheld record is exempt from disclosure]; American Civil Liberties Union of Northern Cal. v. Superior Court (2011)
In summary, the trial court's determination that Petitioners' requests are beyond the purview of the CPRA is legally correct and supported by the record. That finding is an independently sufficient ground to deny the petition, so we need not address the trial court's four additional stated bases for its decision.
The judgment is affirmed. The petition for writ of mandate is denied.
We concur:
Pollak, J.
Jenkins, J.
Notes
Unless otherwise indicated, further statutory citations are to the Government Code.
Four of the Intervenors testified to their concerns about the release of information they provided to the State Bar with the understanding it would remain confidential. State Bar admissions director Gayle Murphy testified, inter alia, about the Bar's collection and treatment of confidential information from applicants for admission to the bar and the State Bar's response to Petitioners' requests.
Binning refers to the practice of grouping and segregating data of reasonably equivalent values into a single group or set.
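As a simple illustration of binning, a graduation year can be replaced with a multi-year period that contains it. The function below is a hypothetical sketch; the starting year and the four-year width are assumptions chosen for the example and are not taken from the protocols' actual specifications.

```python
def bin_year(year, start=1985, width=4):
    """Map a year to the (first, last) years of the period that contains it."""
    offset = (year - start) // width  # floor division also handles earlier years
    first = start + offset * width
    return (first, first + width - 1)


periods = [bin_year(y) for y in (1985, 1988, 1989, 1990)]
```

Under these assumed parameters, 1985 and 1988 fall in the same period, as do 1989 and 1990, so the released value no longer distinguishes between the years within a period.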
A particular instance of the variable is called a "value." A row of data is an "observation." For any particular set of variables, a group of observations that share the same value for those variables is a "cell."
They would not k-anonymize LSAT or GPA scores or whether the applicant took the bar more than once. Dr. Sweeney opined that these data fields are "potentially knowable" by third parties and, if not anonymized, can be used to re-identify applicants from the dataset using personally known or searchable public information.
Protocol 1 would also employ k-anonymity, but using a k of 5.
Arbuckle holds Master of Science degrees in Statistics and Mathematics. He had five years' experience in the field of data anonymization and re-identification risk at the time of trial. In addition, labor economist Dr. Peter Arcidiacono testified for Petitioners about the usefulness of the de-identified datasets under the different protocols for economic and social research. Petitioners also presented deposition testimony from Dr. Felicia LeClere, who developed and analyzed the protocols with Arbuckle, and social science researcher Samuel Canas.
Business and Professions Code section 6060.25, subdivision (a) provides: "Notwithstanding any other law, any identifying information submitted by an applicant to the State Bar for admission and a license to practice law and all State Bar admission records, including, but not limited to, bar examination scores, law school grade point average (GPA), undergraduate GPA, Law School Admission Test scores, race or ethnicity, and any information contained within the State Bar Admissions database or any file or other data created by the State Bar with information submitted by the applicant that may identify an individual applicant,
Our rejection of Petitioners' claim that sections 6253.9 and 6253, subdivision (c)(4) imply a break with settled law that public agencies are not required to create new records also dispenses with Petitioners' argument that federal cases interpreting FOIA on this issue have no bearing on the CPRA because FOIA "has no equivalent provisions."
We previously deferred ruling on Petitioners' January 17, 2018 request for judicial notice of a 2018 report by the State Bar of California to the Supreme Court titled "Report to the Supreme Court of the State of California: Final Report on the 2017 California Bar Exam Standard Setting Study" and related correspondence. We now deny the request because the documents to be judicially noticed have no bearing on our analysis and disposition here. (Mangini v. R.J. Reynolds Tobacco Co. (1994)
