Reddit Files Lawsuit Against Perplexity AI Over Alleged Mass Data Scraping Operation

Legal Action Over Alleged Data Theft

Reddit has filed a lawsuit against AI search company Perplexity and three data scraping companies, accusing them of what the complaint describes as an “industrial-scale” scheme to illegally harvest the social media platform’s content. According to court documents filed in the Southern District of New York, the defendants allegedly collaborated to bypass Reddit’s data protections and steal copyrighted user conversations to train and power AI products.

The Alleged Scraping Network

The lawsuit names scraping firms Oxylabs UAB, AWMProxy, and SerpApi as participants in what it describes as a coordinated effort to evade Reddit’s anti-scraping measures. According to the complaint, these companies accessed Reddit content through backdoor methods, with some scraping performed on Google search results pages rather than on Reddit’s platform itself. This approach reportedly allowed them to circumvent the technical protections Reddit had implemented.

Perplexity’s Business Model Under Scrutiny

Reddit’s legal filing sharply criticizes Perplexity’s core technology, which the complaint calls “nothing groundbreaking” and describes as built on “retrieval-augmented generation” (RAG). According to the complaint, this means Perplexity’s business model effectively involves taking Reddit’s content from Google search results, processing it through another company’s large language model, and presenting the output as a new product. The lawsuit notes that while this approach has reportedly translated into a $20 billion valuation, it has not made Perplexity willing to pay for content that other companies have licensed.

Evidence Gathering Through Deliberate Trap

According to the lawsuit, Reddit set a deliberate trap to gather evidence against Perplexity, creating a unique “test post” that was accessible only to Google’s search crawler and unavailable elsewhere online. Content from this hidden post allegedly appeared in Perplexity’s search results within hours, which Reddit says proves the AI company was scraping protected content despite previous assurances that it would respect Reddit’s robots.txt file. The suit follows similar complaints from Cloudflare in August alleging that Perplexity ignored robots.txt files and used stealth crawlers to evade blocking attempts.

Broader Context of AI Data Scraping Battles

This lawsuit represents the latest escalation in Reddit’s efforts to control how its data is used by artificial intelligence companies. In June, Reddit reportedly sued Anthropic over similar alleged unauthorized data scraping, accusing the Claude creator of publicly advocating for responsible AI while privately scraping data in violation of Reddit’s terms of service. The pattern suggests growing tension between content platforms and AI companies over how training data is sourced, with platforms increasingly demanding compensation for content that fuels AI development.

Legal Remedies Sought

Reddit is asking the court to stop the defendants from scraping its data and is requesting damages for the harm caused, including the “disgorgement of any ill-gotten gains” earned from unauthorized use of its content. The outcome of this case could set important precedents for how AI companies source training data and what obligations they have to content creators and platforms.