Claude is an AI assistant developed by Anthropic. Like most AI developers, Anthropic sends out crawlers that fetch massive amounts of content from the Internet every day to train its models. iFixit is a well-known teardown and repair website with many illustrated teardown articles, so Anthropic's crawler also hit iFixit with an aggressive crawl.
The webmaster complained on X/Twitter: I know you're hungry for data and Claude is really smart, but do you really need to hit our servers a million times in 24 hours? Not only are you stealing our content without paying for it, but you're also taking away our DevOps resources, which is just not cool.
Website logs show that ClaudeBot made thousands of requests to iFixit every minute. This hurts the iFixit servers, because crawling at that rate consumes both server CPU and network bandwidth. No website wants to be in this situation.
iFixit said in an interview with 404 Media:
We have the largest repair information database in the world, and if they took all the information without permission, it would crash our servers. iFixit currently has millions of links to various repair guides, repair revision history, blogs, news posts, research, forums, community-contributed repair guides, Q&A, and more.
Anthropic's support team did not apologize in reply to the complaint and gave the following response:
Following industry standards, Anthropic uses a variety of data sources for model development, such as publicly available data on the Internet collected through web crawlers. Our crawling should not be intrusive or destructive, and our goal is to minimize disruption by respecting Crawl-delay where appropriate.
The easiest option for a website is to block the Claude crawler outright. Bluedot.com has also faced what amounts to a DDoS attack from the Claude crawler. The crawler really does make thousands of requests per minute, which affects the Bluedot.com server, so we blocked the Claude crawler early on.
If you want to block it, you can add the following content to robots.txt:
User-agent: ClaudeBot
Disallow: /
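Before deploying, it can be worth confirming that the rule parses the way you expect. The following is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the example.com URL are assumptions made for illustration, and the check only shows how a compliant parser reads the rule, not how ClaudeBot itself behaves. Note that User-agent and Disallow must sit on separate lines.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article, with User-agent and Disallow on separate lines.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    Hypothetical helper for illustration; it parses the text locally
    instead of downloading robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain, not a site from the article.
    url = "https://example.com/Guide/1"
    print("ClaudeBot allowed:", is_allowed(ROBOTS_TXT, "ClaudeBot", url))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

Because the file contains no wildcard entry, crawlers other than ClaudeBot remain allowed, so the rule does not affect ordinary search engine indexing.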
Of course, to be on the safe side, we also use a regular expression in Nginx to match the ClaudeBot user agent. If the crawler ignores robots.txt and keeps crawling, its requests can be intercepted directly.
So that the crawler still has a chance to fetch the robots.txt file, webmasters should update robots.txt first instead of blocking at Nginx right away. If the website log still shows ClaudeBot requesting files other than robots.txt after a few days, the protocol is not being followed. You can then have Nginx return status 444, a non-standard Nginx code that closes the connection without sending a response, to reduce the server load.
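The log check described above can be sketched as a small Python script. It assumes the access log uses Nginx's default "combined" format. The function name non_robots_hits and the sample log lines are made up for illustration (the IP addresses come from documentation-only ranges), so adjust the pattern if your log format differs.

```python
import re

# Case-insensitive match on the crawler token in the User-Agent header.
CLAUDEBOT = re.compile(r"claudebot", re.IGNORECASE)

# Assumes the "combined" log format:
# ip - user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def non_robots_hits(lines):
    """Return paths requested by ClaudeBot other than /robots.txt."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if not CLAUDEBOT.search(match.group("agent")):
            continue  # a different client
        path = match.group("path")
        # Ignore any query string when comparing against /robots.txt.
        if path.split("?", 1)[0] != "/robots.txt":
            hits.append(path)
    return hits


# Made-up sample lines for illustration only, not real log data.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Aug/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" '
    '200 42 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [01/Aug/2024:10:00:01 +0000] "GET /Guide/123 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [01/Aug/2024:10:00:02 +0000] "GET /Guide/123 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for path in non_robots_hits(SAMPLE_LOG):
        print("ClaudeBot fetched:", path)
```

If the list is still non-empty a few days after robots.txt was updated, that is the signal to start blocking at the server. In Nginx, the equivalent test is a case-insensitive `~*` match against `$http_user_agent`.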