Cloudflare’s new policy urges AI companies to pay publishers for content and separate search from training crawlers

Cloudflare has announced a major change to the default configuration of websites that use its services, in effect setting a deadline for the entire artificial intelligence industry. AI companies must clearly separate the web crawlers they use for traditional search from those they use for AI agents and model training by September 15 this year; otherwise, these "mixed-use" crawlers will be blocked by default on a large number of pages that carry advertising.

According to the details Cloudflare published, any crawler that serves search, AI agent calls, and model training at the same time will be blocked by default when it requests a web page that hosts advertisements, unless the site owner actively changes the setting. The new defaults apply to new Cloudflare customers, to new sites created by existing customers, and to all existing sites on the free plan. The move will directly affect how AI model providers obtain web content for training and for their services, and it will reshape the data supply behind AI agent services.
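The default described above can be read as a simple decision rule. The sketch below is only an illustration of that reading, not Cloudflare's implementation: the `Crawler` type, the purpose labels, and the `blocked_by_default` function are hypothetical names, and it assumes a crawler counts as "mixed-use" when it combines search with agent or training use.

```python
from dataclasses import dataclass

# Illustrative purpose labels, based on the three uses the article names.
SEARCH, AGENT, TRAINING = "search", "agent", "training"


@dataclass(frozen=True)
class Crawler:
    """Hypothetical description of a crawler and its declared uses."""
    name: str
    purposes: frozenset


def blocked_by_default(crawler: Crawler, page_has_ads: bool,
                       owner_allows: bool = False) -> bool:
    """Return True if the default described in the article would block the request.

    Assumption: "mixed-use" means search combined with agent or training use.
    """
    if owner_allows:
        # The site owner has changed the default setting.
        return False
    mixed_use = SEARCH in crawler.purposes and bool(
        crawler.purposes & {AGENT, TRAINING}
    )
    return mixed_use and page_has_ads


mixed = Crawler("mixed-bot", frozenset({SEARCH, TRAINING}))
search_only = Crawler("search-bot", frozenset({SEARCH}))

print(blocked_by_default(mixed, page_has_ads=True))        # mixed-use crawler, ad page
print(blocked_by_default(search_only, page_has_ads=True))  # search-only crawler, ad page
```

Under this reading, a crawler that separates its search traffic from its agent and training traffic is unaffected by the default, which is the outcome the policy is meant to encourage.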

Cloudflare said most website owners want their content to be discoverable through traditional search engines and are willing to be cited by AI services under certain conditions, but do not want their intellectual property taken at scale, for free, and without authorization. In its description, Cloudflare singled out "the world's largest search engine" (apparently Google), saying it has "approximately twice the amount of accessible information" compared with other AI companies. The reason, according to Cloudflare, is that the search giant makes it difficult for sites to stay visible in search while opting out of AI use entirely.

Google has consistently rejected broad accusations of this kind, pointing out that it offers sites a robots.txt control called "Google-Extended". Sites can use it to explicitly opt out of having their content used for AI training and for AI products and services such as Gemini Apps and the Vertex AI API, without affecting their inclusion in Google Search. However, Google's core crawler, Googlebot, indexes pages for search and also supplies data to the AI features built into search, such as AI Overviews and AI Mode.
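In practice, this opt-out is expressed in a site's robots.txt file, where the token is written with a hyphen, as Google-Extended. The sketch below uses Python's standard `urllib.robotparser` to show how such rules are read; the robots.txt content and the example.com URL are illustrative assumptions, and the parser only shows what the rules say, not whether any given crawler honors them.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of Google-Extended, allow everything else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # placeholder URL
for agent in ("Google-Extended", "Googlebot"):
    # can_fetch reports whether the rules permit this user agent to fetch the URL.
    print(agent, parser.can_fetch(agent, url))
```

Here the rules deny Google-Extended and permit Googlebot, which illustrates the gap the article describes: a site that uses only this opt-out still allows Googlebot, and Googlebot's crawl also feeds the AI features embedded in search.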

Matthew Prince, co-founder and CEO of Cloudflare, said in the announcement that the structure of Internet traffic has changed and that "the vast majority of traffic on the Internet today is no longer accessed by humans." The industry had previously expected the inflection point at which bot traffic exceeds human traffic to arrive no earlier than next year. "In this case, we must go further and move faster to truly form a sustainable ecosystem," he said.

Prince said Cloudflare's new tools and partnerships will give website owners greater visibility and more business opportunities in the AI era, while also benefiting AI crawlers that have a clear purpose and transparent intent. By changing the default policy, he hopes to push "mixed-purpose crawlers" to clearly separate traditional search from agent calls and training. On the commercial side, Cloudflare sells a range of products that help customers build their own AI systems; at the same time, it has launched a series of "control enhancement" tools for publishers and content owners in recent years.

As early as 2024, Cloudflare launched a tool designed specifically to counter AI crawlers, and in 2025 it introduced a marketplace called "Pay Per Crawl" that lets websites charge AI crawlers for access. The latest announcement shows this model evolving into "Pay Per Use": instead of charging only for the act of crawling, it charges AI companies according to the value the content actually creates inside an AI system.

Cloudflare said the "pay-per-use" model not only gives publishers a new revenue channel but also saves them bandwidth and computing resources: its internal data shows that more than 50% of AI crawler traffic goes to re-crawling pages that have not changed. With the new billing and control mechanisms, publishers can devote limited resources to genuinely valuable requests while putting a financial cost on wasteful repeat crawls.
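The article does not describe how Cloudflare's billing mechanism works, but the underlying waste is easy to illustrate with standard HTTP conditional requests, a separate and long-established technique. In the sketch below, a simulated origin answers with 304 Not Modified and an empty body when the crawler presents a matching ETag. The `serve` function, the `PoliteCrawler` class, and the hash-based ETag scheme are all hypothetical and exist only for illustration; no real network traffic is involved.

```python
import hashlib
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    """Derive a validator from the page content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def serve(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Simulated origin: reply 304 with no payload if the client's copy is current."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, body


class PoliteCrawler:
    """Hypothetical crawler that remembers ETags and revalidates instead of re-downloading."""

    def __init__(self) -> None:
        self.etags: Dict[str, str] = {}
        self.bytes_downloaded = 0

    def fetch(self, url: str, current_body: bytes) -> int:
        status, headers, payload = serve(current_body, self.etags.get(url))
        self.etags[url] = headers["ETag"]
        self.bytes_downloaded += len(payload)
        return status


crawler = PoliteCrawler()
page = b"<html>unchanged article</html>"
print(crawler.fetch("/article", page))         # first visit: full download
print(crawler.fetch("/article", page))         # unchanged page: revalidation only
print(crawler.fetch("/article", page + b"!"))  # page changed: full download again
```

A crawler that revalidates this way transfers a page body only when the content has changed, which is the kind of repeat traffic the reported 50% figure refers to.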

Cloudflare has launched pilot projects with two partners, Ceramic.ai and You.com. Publishers who opt in to the program are compensated whenever their content appears in Ceramic's AI search results or is accessed by You.com as "paid premium content". Cloudflare said other AI companies can adapt and extend this payment model to fit their own products.

With regulators and the public paying closer attention to AI crawling and copyright, Cloudflare's policy change and business model upgrade are clearly intended to give publishers more leverage and more room to profit, while putting new transparency and compliance pressure on AI companies. For an AI industry that continues to rely on vast amounts of web content to train and run its agents, balancing technical convenience against the rights and interests of content owners will be an unavoidable core issue.