After Cloudflare publicly criticized Perplexity's crawling strategy, some began to defend it

When Cloudflare on Monday accused AI search engine Perplexity of secretly scraping website data while ignoring the specific methods websites used to block it, many came to Perplexity's defense. They argued that Perplexity's behavior in accessing websites against their owners' wishes was controversial but acceptable. As artificial intelligence agents proliferate on the internet, this debate is set to intensify: Should an agent that visits a website on behalf of a user be treated as a bot? Or should it be treated like a human making the same request?

Cloudflare is known for providing anti-bot crawling and other cybersecurity services to millions of websites. Essentially, Cloudflare's test case involved setting up a new website on a new domain that had never been crawled by any bot; creating a robots.txt file that specifically blocked Perplexity's known AI crawlers; and then asking Perplexity about the website's content. Perplexity answered the question.
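To make the test concrete, here is a minimal sketch in Python of how a robots.txt file of this kind is evaluated, using the standard library's `urllib.robotparser`. The user-agent tokens (`PerplexityBot`, `Perplexity-User`), the example URL, and the `is_allowed` helper are assumptions for illustration; the article does not reproduce the rules Cloudflare actually used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the test site. The user-agent tokens below
# are assumptions; the article does not list the exact rules.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL
    for agent in ("PerplexityBot", "Perplexity-User", "Mozilla/5.0 (Macintosh) Chrome"):
        print(agent, "->", is_allowed(agent, url))
```

Note that robots.txt is advisory: it only works if a client identifies itself honestly and chooses to honor the answer. A request that presents a generic browser user agent is not matched by the crawler-specific rules at all.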

Researchers at Cloudflare found that when the AI search engine's own web crawler was blocked, it used "a universal browser designed to mimic Google Chrome on macOS." Cloudflare CEO Matthew Prince published the study on

But many disagreed with Prince's assessment, arguing that this was not truly bad behavior. Those who have defended Perplexity on sites like

“If I as a human request a website, then I should be able to see its content,” one user on Hacker News wrote, adding, “Why would the large language model accessing the website on my behalf be in a different legal category than my Firefox web browser?”

A spokesperson for Perplexity previously denied that the bots were the company's and called Cloudflare's blog post a sales pitch for Cloudflare. Then, on Tuesday, Perplexity published a blog post defending itself (and generally attacking Cloudflare), claiming that the behavior came from a third-party service the company occasionally uses.

But the crux of Perplexity's post made much the same appeal as its online defenders. It read: "The difference between automated and user-driven scraping isn't just technical, it's about who has access to information on the open web. This controversy demonstrates that Cloudflare's systems are fundamentally inadequate at distinguishing between legitimate AI assistants and real threats."

Perplexity's accusations aren't entirely fair either. In criticizing Perplexity's approach, Prince and Cloudflare pointed out that OpenAI does not behave the same way.

Cloudflare writes: "OpenAI is an excellent example of a leading AI company following these best practices. They respect robots.txt files and do not attempt to circumvent robots.txt directives or network-level blocking. ChatGPT Agent signs HTTP requests using the newly proposed open standard Web Bot Auth."

Web Bot Auth is a Cloudflare-backed standard being developed by the Internet Engineering Task Force in hopes of creating a cryptographic method for identifying AI agent web requests.
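The general idea of a signed agent request can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Web Bot Auth wire format: it uses a shared-secret HMAC from the standard library as a stand-in for whatever signature scheme the standard specifies, and the header names, key ID, secret, and helper functions are all hypothetical.

```python
import base64
import hashlib
import hmac
import time

# Hypothetical key material and key id, for illustration only.
SECRET = b"demo-secret-not-for-production"
KEY_ID = "example-agent-key"


def _signing_string(method: str, authority: str, path: str, created: int) -> bytes:
    """Join the request parts covered by the signature into one byte string."""
    parts = [method.upper(), authority.lower(), path, str(created), KEY_ID]
    return "\n".join(parts).encode()


def sign_request(method: str, authority: str, path: str, created=None) -> dict:
    """Return hypothetical headers an agent would attach to identify itself."""
    created = int(time.time()) if created is None else created
    mac = hmac.new(SECRET, _signing_string(method, authority, path, created),
                   hashlib.sha256).digest()
    return {
        "X-Agent-Key-Id": KEY_ID,
        "X-Agent-Created": str(created),
        "X-Agent-Signature": base64.b64encode(mac).decode(),
    }


def verify_request(method: str, authority: str, path: str, headers: dict,
                   max_age: int = 300, now=None) -> bool:
    """Check that the signature matches the request and is not stale."""
    now = int(time.time()) if now is None else now
    try:
        created = int(headers["X-Agent-Created"])
        claimed = base64.b64decode(headers["X-Agent-Signature"])
    except (KeyError, ValueError):
        return False
    if headers.get("X-Agent-Key-Id") != KEY_ID or abs(now - created) > max_age:
        return False
    expected = hmac.new(SECRET, _signing_string(method, authority, path, created),
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, claimed)
```

The point of such a scheme is that a site can verify who sent a request rather than trusting a self-declared user-agent string, which any client can change.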

The debate comes as bot activity reshapes the internet. As TechCrunch previously reported, bots trying to crawl large amounts of content to train AI models have become a threat, especially for smaller websites.

According to Imperva's Bad Bot Report released last month, bot activity exceeded human activity online for the first time, accounting for more than 50% of all traffic, most of it driven by LLMs. But the report also found that malicious bots now account for 37% of all internet traffic. Their activity ranges from persistent data scraping to unauthorized login attempts.

Before the advent of large language models (LLMs), there was a general consensus that websites could and should block most bot activity, often using CAPTCHAs and other services (such as Cloudflare). Websites also had a clear incentive to work with specific good actors such as Googlebot, using robots.txt to tell it what content should not be indexed. Google indexed the internet, which in turn sent traffic to websites.

Today, LLMs are gobbling up more and more of that traffic. Gartner predicts that search engine traffic will decline by 25% by 2026. Currently, people tend to click through from an LLM to a website at the moment they are most valuable to the site: when they are ready to make a transaction.

But if humans adopt agents as the tech industry predicts we will, to arrange our travel, make our dinner reservations, and shop for us, will sites that block these agents harm their own business interests? The debate on X illustrates this dilemma perfectly:

“I want Perplexity to be able to access any public content on my behalf when I send requests/tasks to it!” one person wrote in response to Cloudflare's post calling out Perplexity.

"What if the site owner doesn't want that? They just want you to go directly to their homepage and look at their stuff," another user countered, noting that the site owner who created the content wants traffic and potential ad revenue, not for Perplexity to take it.

"Here's why I don't think 'proxy browsing' will really work - it's a much harder problem than people think. Most website owners will just block it," predicted a third.
