SARAH ANDERSEN, et al. v. STABILITY AI LTD., et al.
Case No. 23-cv-00201-WHO (LJC)
UNITED STATES DISTRICT COURT NORTHERN DISTRICT OF CALIFORNIA
June 27, 2025
Re: Dkt. No. 307
ORDER DENYING PLAINTIFFS’ REQUEST FOR NON-LAION TRAINING DATASETS
The parties dispute whether Defendant Midjourney must produce all datasets it used to train its generative AI models or produce only datasets sourced from LAION.1 Plaintiffs argue that the production of all datasets is warranted and these “datasets are among the most crucial evidence in this case.” ECF No. 307 at 2. Midjourney objects that non-LAION datasets are irrelevant to Plaintiffs’ claims and their production would be overly burdensome. The Court sides with Midjourney and denies Plaintiffs’ request for an order compelling production of the non-LAION datasets.
Plaintiffs’ Count Four, brought on behalf of the “LAION-400M Registered Plaintiffs and Damages Subclass,”3 asserts that Midjourney directly infringed on copyrighted works contained in the LAION-400M dataset. Id. ¶¶ 271-75. Their Count Five, brought on behalf of the “LAION-5B Registered Plaintiffs and Damages Subclass,”4 asserts that Midjourney directly infringed on copyrighted works contained in the LAION-5B dataset. Id. ¶¶ 276-82.5 These copyright claims are explicitly limited to Midjourney’s alleged copying from the LAION datasets. See ECF No. 223 at 19-21 (denying Midjourney’s motion to dismiss Plaintiffs’ copyright claims where Plaintiffs had plausibly pled that “their works were included in the LAION datasets”). Despite this, Plaintiffs first argue that obtaining non-LAION datasets is warranted because evidence that Midjourney copied Plaintiffs’ registered works from other, non-LAION sources would “definitively” establish “a violation of copyright law[.]” ECF No. 307 at 2. This argument both puts the cart before the horse—whether using registered works to train an AI model is “a violation
Plaintiffs’ second argument why non-LAION datasets are relevant to this lawsuit is that they anticipate Midjourney will argue that “Plaintiffs’ works comprise a very small fraction of its models[’] datasets, and thus any infringement would be fair use.” ECF No. 307 at 2. The fair use doctrine establishes that “the fair use of a copyrighted work... for purposes such as criticism, comment, news reporting, teaching, scholarship, or research, is not an infringement of copyright.” Kadrey v. Meta Platforms, Inc., No. 23-cv-03417-VC, 2025 WL 1752484, at *3 (N.D. Cal. June 25, 2025) (quoting
Plaintiffs argue that they need to have access to Midjourney’s entire training corpus to rebut the third factor and show that “their works (or LAION in general)” is not such a “small proportion” of Midjourney’s training data that “any infringement is excused.” ECF No. 307 at 2. But this misconstrues the fair use doctrine. Under the third factor, courts consider the amount of the copyrighted work the alleged copier uses relative to the “copyrighted work as a whole[.]”
The Court accordingly finds that non-LAION datasets are not relevant to Plaintiffs’ copyright claims against Midjourney. However, the Court notes that Plaintiffs’ argument regarding Midjourney’s anticipated fair use defense is, at this point, abstract. Plaintiffs may renew their request for non-LAION training datasets based on Midjourney’s fair use defense only if, at a later stage in the case, they can concretely articulate how the content (rather than the overall size or sources) of non-LAION datasets would be relevant to rebutting Midjourney’s fair use defense.
Plaintiffs assert two Lanham Act claims against Midjourney. Midjourney allegedly published a list of artists (the Name List), including many named Plaintiffs (the Name List Plaintiffs), whose styles its generative AI models could emulate. Second Am. Compl. ¶¶ 254-56. Plaintiffs’ Count Seven, for false endorsement, asserts that Midjourney violated the Lanham Act by releasing the Name List, which “created a likelihood of confusion over whether the” Name List Plaintiffs endorsed or were affiliated with Midjourney’s products. Second Am. Compl. ¶ 299. Plaintiffs’ Count Eight, for vicarious trade dress infringement, alleges that Midjourney violated the Lanham Act by profiting from imitations of the Name List Plaintiffs’ protectable artistic styles. Id. ¶¶ 312-17. Both of these claims hinge on whether Midjourney’s use of the Name List Plaintiffs’ names and styles was likely to cause confusion or mistake as to the affiliation between
Plaintiffs do not offer any argument as to how the non-LAION datasets could be necessary to their false endorsement claim, which turns on whether the Name List gave the false impression that the Name List Plaintiffs approved of or were associated with Midjourney. The content of Midjourney’s training data has no bearing on that claim. Plaintiffs argue that the non-LAION datasets are relevant to their vicarious trade dress claim, because “comparing whether there is a likelihood of confusion between Plaintiffs’ works and Midjourney’s outputs require comprehensive comparison of Plaintiffs’ works with the entirety of what is in Midjourney’s [entire] training corpus[.]” ECF No. 307 at 2-3. Comparing Plaintiffs’ works and Midjourney’s outputs will certainly be necessary, but, Plaintiffs’ conclusory statements to the contrary notwithstanding, the undersigned does not see how Midjourney’s training data bears on this comparison.
The undersigned accordingly finds that Plaintiffs have not established that non-LAION training data is relevant to their claims against Midjourney. In re Glumetza Antitrust Litig., No. 19-cv-05822-WHA (RMI), 2020 WL 3498067, at *7 (N.D. Cal. June 29, 2020) (the party seeking discovery has the “initial burden of establishing that the request satisfies the relevancy requirements of
IT IS SO ORDERED.
Dated: June 27, 2025
LISA J. CISNEROS
United States Magistrate Judge
