Reddit Alleges Systematic Data Theft
Reddit has escalated the ongoing battle over AI training data with a federal lawsuit against Perplexity AI, accusing the company of orchestrating what sources describe as an industrial-scale scheme to illegally scrape and profit from its content. According to court documents filed in New York federal court, the social media platform alleges Perplexity turned to data-scraping service providers after being explicitly told to stop accessing Reddit’s content directly.
The complaint states that “these AI companies, worth up to tens of billions of dollars, desperately need access to more and more high quality, current data to support their ambitions, and Reddit is a top-cited source of data for them.” This legal action represents a significant test of how user-generated content on social platforms can be protected from commercial exploitation by artificial intelligence firms.
The “Bank Robber” Analogy
Reddit’s legal filing employs vivid imagery to describe the alleged operation, comparing data-scraping service providers SerpApi, Oxylabs, and AWMProxy to “would-be bank robbers.” The lawsuit states: “In a very real sense, these Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.”
Analysts suggest the “vault” represents Reddit’s platform protected by technological barriers, while the “armored truck” refers to Google Search. The complaint alleges these specialized providers mask their identities and locations to circumvent Google’s controls, scraping billions of search results pages containing Reddit content that they then sell to companies like Perplexity.
Massive Scale of Alleged Data Harvesting
The scale of the alleged operation is staggering, with reports indicating the defendants accessed nearly three billion pages containing Reddit content during a two-week period in July 2025 alone. Perplexity, described as an advertised client of SerpApi, is portrayed in the lawsuit as the willing beneficiary of this massive data harvesting operation.
According to the complaint, Perplexity “will apparently do anything to get the Reddit data it desperately needs… anything other than enter into an agreement with Reddit directly, as some of its competitors have done.” This appears to reference the licensing deals Reddit has already established with major technology firms including Google and OpenAI.
Pattern of Legal Enforcement
This lawsuit against Perplexity continues Reddit’s aggressive legal strategy to protect its data assets. In June 2025 the company filed a similar action against AI giant Anthropic, alleging that Anthropic trained on Reddit data without authorization and kept accessing Reddit’s servers after promising to stop.
The current case reveals that Reddit sent a cease-and-desist letter to Perplexity in May 2024. In response, Perplexity reportedly claimed it did not use Reddit content to train its models and would respect the site’s robots.txt protocol. However, sources indicate that instead of decreasing, Reddit citations on Perplexity allegedly increased forty-fold following this exchange.
Testing the Evidence
Reddit appears to have conducted its own investigation to build its case, creating a specific post configured to be accessible only to Google’s crawler. The company states that “within hours,” Perplexity allegedly “produced the contents” of that exclusive post, suggesting the AI company was accessing Reddit content through indirect channels.
“As an advertised client of SerpApi, there can be little doubt where and how Perplexity is getting its illicit Reddit data,” Reddit states in its legal filing. This evidence forms a crucial part of Reddit’s argument that Perplexity knowingly participated in circumventing access controls.
Broader Implications for AI Industry
The outcome of this lawsuit could establish critical precedents for how AI companies access and use publicly available human conversations. Reddit’s chief legal officer, Ben Lee, told The Verge that “AI companies are locked in an arms race for quality human content — and that pressure has fueled an industrial-scale ‘data-laundering’ economy.”
Meanwhile, Perplexity’s Jesse Dwyer stated that the company had not yet received the lawsuit but “will always fight vigorously for users’ rights to freely and fairly access public knowledge.” This positions the case as a clash between competing visions of data access rights in the AI era.
Potential Industry Transformation
Legal analysts suggest this case poses a direct test of the Digital Millennium Copyright Act as a tool to protect both discrete copyrighted works and entire databases of public-yet-commercially-valuable human expression from unauthorized commercial exploitation. The lawsuit also extends the logic of Reddit’s controversial 2023 API changes, which were framed as requiring commercial entities to pay for access to its data.
A ruling in Reddit’s favor would solidify its data as a protected commercial asset, creating a major shift in how AI companies operate. Conversely, a victory for Perplexity could legitimize new approaches to data acquisition, potentially undermining the emerging data-licensing economy that platforms such as Reddit are betting their future on. Either outcome appears likely to establish a transformative precedent for the value of online discourse in the artificial intelligence age.
This article aggregates information from publicly available sources. All trademarks and copyrights belong to their respective owners.