Bytespider tops list of AI crawlers, Cloudflare finds
Despite their activity, many website operators are unaware of these crawlers, and only 2.98% of the top one million websites actively block or challenge AI bot requests.
Cloudflare has revealed that the most active AI web crawler over the past year was Bytespider, operated by ByteDance, which uses it to gather training data for its AI models, including the ChatGPT rival Doubao. Amazonbot, which indexes content for Alexa, and ClaudeBot, which collects training data for the Claude chatbot, rank second and third, respectively. OpenAI’s GPTBot comes in fourth.
Bytespider also leads in both request volume and how often it is blocked, with GPTBot ranking second in both. Even so, many website operators remain unaware that these AI crawlers are visiting their sites.
Cloudflare’s analysis shows that only a small share of websites, 2.98% of the top one million, take measures to block or challenge AI bot requests. This is despite the fact that more popular websites are both more frequently targeted by such crawlers and more likely to block them.
The study also found that although many sites name GPTBot, CCBot, and Google in their robots.txt files, they do not specifically disallow other popular AI crawlers such as Bytespider and ClaudeBot. In any case, blocking through robots.txt is only effective if bot operators choose to honor these instructions.
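robots.txt groups its rules by user-agent token, so a crawler that is not named in a restrictive group (and is not caught by a restrictive `*` group) is allowed by default. The sketch below uses Python’s standard `urllib.robotparser` to check which crawler tokens a given file permits. Both robots.txt files are hypothetical examples written to mirror the pattern described above, and the token list assumes the crawlers identify themselves with the names used in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt matching the common pattern: GPTBot and CCBot
# are disallowed, but Bytespider and ClaudeBot are never named.
TYPICAL_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical variant that also names Bytespider and ClaudeBot.
EXTENDED_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
User-agent: Bytespider
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

# Crawler tokens mentioned in the article (assumed robots.txt names).
AI_CRAWLERS = ["Bytespider", "Amazonbot", "ClaudeBot", "GPTBot", "CCBot"]


def allowed_crawlers(robots_txt: str, agents: list[str], url: str = "/") -> list[str]:
    """Return the agents that robots_txt permits to fetch url.

    This only reports what the file asks for; it cannot stop a crawler
    that chooses to ignore robots.txt.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(allowed_crawlers(TYPICAL_ROBOTS_TXT, AI_CRAWLERS))
    print(allowed_crawlers(EXTENDED_ROBOTS_TXT, AI_CRAWLERS))
```

With the first file, Bytespider, Amazonbot, and ClaudeBot are still permitted; naming Bytespider and ClaudeBot in the disallow group leaves only Amazonbot allowed. Either way, the file expresses a request, and compliance is up to each crawler.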