
AI Crawlers Reference Guide

Overview

This document provides comprehensive information about AI crawlers used by major platforms including ChatGPT, Perplexity, Microsoft Copilot, Google Gemini, and Claude. Each crawler has specific purposes ranging from AI training to search indexing to on-demand content fetching.

AI Crawlers by Platform

ChatGPT (OpenAI)

| Crawler | User Agent | Purpose |
|---|---|---|
| GPTBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot | AI training - collects data for training GPT models |
| OAI-SearchBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot | Search functionality - retrieves web content for ChatGPT search |
| ChatGPT-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot | On-demand fetcher - when users ask ChatGPT to visit a page |

IP Ranges & Documentation:

Perplexity

| Crawler | User Agent | Purpose |
|---|---|---|
| PerplexityBot | Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://www.perplexity.ai/bot) | AI search indexing - indexes content for search results (not for model training) |
| Perplexity-User | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity/1.0; +https://www.perplexity.ai) | On-demand fetcher - retrieves content when users ask questions |

IP Ranges & Documentation:

Microsoft Copilot / Bing

| Crawler | User Agent | Purpose |
|---|---|---|
| Bingbot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 | Search indexing - powers Bing Search and Microsoft Copilot answers |

Documentation: https://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0

Google (Gemini / AI Overview)

| Crawler | User Agent | Purpose |
|---|---|---|
| Googlebot | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Search indexing - indexes content for Google Search, Discover, Images, Video, News |
| Google-Extended | Google-Extended (robots.txt token; it has no separate user agent string) | AI training control - governs whether content crawled by Google may be used for Gemini Apps and Vertex AI (does NOT affect Google Search) |
| Gemini-Deep-Research | Varies | On-demand fetcher - used by Gemini Deep Research feature for user queries |

IP Ranges & Documentation:

Claude (Anthropic)

| Crawler | User Agent | Purpose |
|---|---|---|
| ClaudeBot | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +https://www.anthropic.com/) | AI training - collects web content for training Claude AI models |
| Claude-SearchBot | Claude-SearchBot | Search indexing - indexes content to improve Claude search result quality |
| Claude-User | Claude-User | On-demand fetcher - retrieves web content at users' direction when using Claude |

Documentation: https://support.anthropic.com/en/articles/8896518 (Note: Anthropic does NOT publish IP ranges)
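The crawler names and purposes listed above can be used to label incoming requests in server logs. The following is a minimal Python sketch, not an official detection method: the `KNOWN_CRAWLERS` list and the `classify_user_agent` helper are hypothetical names introduced here, and the purpose labels ("training", "search", "user-fetch") are shorthand for the Purpose entries in this guide. Google-Extended is left out because it is a robots.txt token rather than a request user agent.

```python
from typing import Optional, Tuple

# (token, platform, purpose) triples taken from the crawler listings in this guide.
# Hypothetical structure for illustration only.
KNOWN_CRAWLERS = [
    ("GPTBot", "OpenAI", "training"),
    ("OAI-SearchBot", "OpenAI", "search"),
    ("ChatGPT-User", "OpenAI", "user-fetch"),
    ("PerplexityBot", "Perplexity", "search"),
    ("Perplexity-User", "Perplexity", "user-fetch"),
    ("bingbot", "Microsoft", "search"),
    ("Googlebot", "Google", "search"),
    ("Gemini-Deep-Research", "Google", "user-fetch"),
    ("ClaudeBot", "Anthropic", "training"),
    ("Claude-SearchBot", "Anthropic", "search"),
    ("Claude-User", "Anthropic", "user-fetch"),
]


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str, str]]:
    """Return (token, platform, purpose) for a known crawler, else None.

    Matching is a case-insensitive substring test on the User-Agent header.
    """
    ua = user_agent.lower()
    for token, platform, purpose in KNOWN_CRAWLERS:
        if token.lower() in ua:
            return token, platform, purpose
    return None


if __name__ == "__main__":
    sample = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
        "compatible; GPTBot/1.1; +https://openai.com/gptbot"
    )
    print(classify_user_agent(sample))
```

A User-Agent header can be set to anything by the client, so a match here is only a hint about who is making the request.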

 

If you want to block some of these AI crawlers or agents, the guide below offers some tips.

Firewall Configuration Guide

Option 1: Using robots.txt (Recommended)

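The major AI crawlers are documented as respecting robots.txt directives (see Special Considerations below for reported exceptions). To block a crawler, add its group from the list below to your site's robots.txt file: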

# Block OpenAI GPTBot (training)
User-agent: GPTBot
Disallow: /

# Block OpenAI SearchBot
User-agent: OAI-SearchBot
Disallow: /

# Block ChatGPT-User
User-agent: ChatGPT-User
Disallow: /

# Block Perplexity
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

# Block Bingbot (affects Bing Search AND Copilot)
User-agent: bingbot
Disallow: /

# Block Googlebot (affects ALL Google products)
User-agent: Googlebot
Disallow: /

# Block Google-Extended (AI training ONLY)
User-agent: Google-Extended
Disallow: /

# Block Anthropic ClaudeBot (training)
User-agent: ClaudeBot
Disallow: /

# Block Claude-SearchBot
User-agent: Claude-SearchBot
Disallow: /

# Block Claude-User
User-agent: Claude-User
Disallow: /
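Before deploying a robots.txt file, it can help to check which crawlers a given set of rules actually blocks. The sketch below uses Python's standard `urllib.robotparser` module for that purpose. The `ROBOTS_TXT` sample, the `is_allowed` helper, and the `example.com` URL are assumptions made for illustration, and each vendor's own parser may interpret edge cases differently from Python's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks two training crawlers and nothing else.
ROBOTS_TXT = """\
# Block OpenAI GPTBot (training)
User-agent: GPTBot
Disallow: /

# Block Anthropic ClaudeBot (training)
User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt: str, crawler: str, url: str) -> bool:
    """Return True if `crawler` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(crawler, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    for name in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "Googlebot"):
        status = "allowed" if is_allowed(ROBOTS_TXT, name, url) else "blocked"
        print(f"{name}: {status}")
```

Crawlers with no matching group and no `User-agent: *` group are treated as allowed, which is why only the two listed crawlers are reported as blocked.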

Option 2: IP-Based Firewall Rules

For platforms that publish IP ranges, you can implement IP-based blocking:

Platforms with Published IP Ranges:

  • OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User (see JSON links above)
  • Perplexity: PerplexityBot, Perplexity-User (see JSON links above)
  • Google: Googlebot (see JSON link above)
  • Microsoft: Verify via Bing Webmaster Tools

Important: Anthropic (Claude) does NOT publish IP ranges and advises against IP-based blocking as it prevents the crawler from reading your robots.txt file.
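For platforms that do publish ranges, the core of an IP-based rule is a membership test against a list of CIDR blocks. The following Python sketch shows that test with the standard `ipaddress` module. The `ip_in_ranges` helper is a hypothetical name, and `EXAMPLE_RANGES` contains reserved documentation-only prefixes, not any vendor's real ranges; in practice the list would be loaded from the vendor's published file and refreshed regularly.

```python
import ipaddress
from typing import Iterable

# Placeholder prefixes reserved for documentation (not real crawler ranges).
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        # strict=False tolerates prefixes written with host bits set.
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False


if __name__ == "__main__":
    for candidate in ("192.0.2.55", "198.51.100.1", "2001:db8::1"):
        print(candidate, ip_in_ranges(candidate, EXAMPLE_RANGES))
```

The same check can be used in the opposite direction: a request whose User-Agent claims to be a given crawler but whose source IP is outside the published ranges is likely spoofed.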

Option 3: Web Application Firewall (WAF)

Cloudflare WAF Configuration:

  • Go to Security → WAF
  • Create custom rules
  • Field: User-Agent
  • Operator: Contains
  • Value: Crawler name (e.g., PerplexityBot)
  • Action: Block or Allow as needed
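When several crawlers need the same action, the per-crawler "User-Agent contains" conditions can be combined into one rule expression. The Python sketch below builds such an expression as a string. The `build_user_agent_expression` helper is hypothetical, and the `http.user_agent contains "..."` form is assumed to match Cloudflare's rule expression syntax; verify it against Cloudflare's documentation before pasting it into the expression editor.

```python
from typing import Iterable


def build_user_agent_expression(crawler_names: Iterable[str]) -> str:
    """Join one 'User-Agent contains' clause per crawler with 'or'.

    Returns an empty string when no crawler names are given.
    """
    clauses = []
    for name in crawler_names:
        # Escape backslashes and double quotes so the name stays a valid string literal.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'(http.user_agent contains "{escaped}")')
    return " or ".join(clauses)


if __name__ == "__main__":
    print(build_user_agent_expression(["PerplexityBot", "GPTBot"]))
```

For the two names in the usage example, the function returns `(http.user_agent contains "PerplexityBot") or (http.user_agent contains "GPTBot")`.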

Important Notes

About Blocking Crawlers:

  • robots.txt is the recommended method - All major AI companies respect it
  • IP blocking has limitations: ranges change frequently, and some vendors don't publish their IPs
  • Allow 24-48 hours for robots.txt changes to take effect

Selective Blocking Strategy:

  • Allow search crawlers for visibility in search results
  • Block training crawlers if you don't want content used for AI training
  • Allow user-initiated fetchers to enable AI assistant access
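The selective strategy above can be expressed as a small generator that emits robots.txt groups only for the purposes you want to block. This is a sketch under stated assumptions: `CRAWLER_PURPOSES` and `build_robots_txt` are hypothetical names, and the purpose labels are shorthand for the Purpose entries listed earlier in this guide. Crawlers not covered by a generated group remain allowed by default.

```python
from typing import Dict, Iterable

# Crawler token -> purpose, following the robots.txt tokens used in this guide.
CRAWLER_PURPOSES: Dict[str, str] = {
    "GPTBot": "training",
    "Google-Extended": "training",
    "ClaudeBot": "training",
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "bingbot": "search",
    "Googlebot": "search",
    "Claude-SearchBot": "search",
    "ChatGPT-User": "user-fetch",
    "Perplexity-User": "user-fetch",
    "Claude-User": "user-fetch",
}


def build_robots_txt(blocked_purposes: Iterable[str]) -> str:
    """Return robots.txt text that disallows every crawler with a blocked purpose."""
    blocked = set(blocked_purposes)
    groups = [
        f"User-agent: {crawler}\nDisallow: /"
        for crawler, purpose in CRAWLER_PURPOSES.items()
        if purpose in blocked
    ]
    # Groups are separated by a blank line; an empty policy yields an empty file.
    return "\n\n".join(groups) + ("\n" if groups else "")


if __name__ == "__main__":
    # Block training crawlers only; search crawlers and user fetchers stay allowed.
    print(build_robots_txt(["training"]))
```

Keeping the purpose mapping in one place makes it easier to change the policy later, for example to also block user-initiated fetchers, without editing each group by hand.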

Special Considerations:

  • Perplexity Warning: Reports indicate Perplexity has used undeclared crawlers to bypass robots.txt. Consider using WAF rules in addition to robots.txt.
  • Google-Extended: Only affects Gemini Apps and Vertex AI training - blocking it does NOT affect Google Search rankings.
  • Bingbot: Blocking Bingbot affects BOTH Bing Search AND Microsoft Copilot functionality.

Last Updated: January 2026

Note: Crawler information and IP ranges are subject to change. Always verify with official documentation.