
Which AI Crawlers Read llms.txt?

A breakdown of which AI systems, search engines, coding assistants, and open-source RAG tools actively read llms.txt — and how each one uses the file.


How AI crawlers use llms.txt

llms.txt is a passive file — your server simply makes it available at /llms.txt. AI systems that support the standard will fetch it when they crawl or analyse your site. The file tells them:

  • What the site is about (via the H1 and blockquote summary).
  • Which URLs contain the most important documentation and content.
  • How to prioritise pages when context windows are limited.

Unlike robots.txt, which tells crawlers which paths to stay away from, llms.txt is a positive signal: it invites AI systems to read your content and points them to the best material.
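To make the structure described above concrete, here is a minimal sketch of how a consumer might read the file. It assumes the layout from the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections containing Markdown link lists). The `LlmsTxt` class, the `parse_llms_txt` function, and the sample file are hypothetical and used only for illustration; real tools may parse the file differently.

```python
import re
from dataclasses import dataclass, field

# Matches list items such as "- [Title](https://example.com/page.md): optional note"
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?:\s*:\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Section name -> list of (link title, URL) pairs, in file order
    sections: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Extract the H1 title, the first blockquote line, and the links per H2 section."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and not doc.summary:
            doc.summary = line.lstrip("> ").strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append((match["title"], match["url"]))
    return doc


# Made-up sample file; the project and URLs are placeholders.
SAMPLE = """\
# Example Project

> A hypothetical library used only for illustration.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and first run
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title, "|", doc.summary)
for section, links in doc.sections.items():
    print(section, links)
```

A consumer with a limited context window could, for example, read the sections in file order and skip the "Optional" section first; that policy is a design choice of the consumer, not something the file enforces.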

Systems that read llms.txt

Perplexity AI

Perplexity is the highest-profile AI search engine to officially announce llms.txt support. PerplexityBot reads the file during crawls to determine which pages to prioritize for its answer engine. If you want your documentation to appear in Perplexity answers, a well-structured llms.txt is a direct signal.

AI coding assistants

Tools like Cursor, GitHub Copilot (with context fetching), Cline, and Aider actively fetch llms.txt when users add a URL as a project context source. This is one of the most immediate practical use cases: your documentation becomes instantly accessible to developers using AI-assisted editors.

ChatGPT (manual fetch)

ChatGPT's web-browsing mode can fetch llms.txt when a user or plugin provides the URL. While OpenAI's GPTBot crawler does not automatically prioritize llms.txt, power users reference it explicitly to prime ChatGPT with accurate documentation context.

Claude (Anthropic)

Anthropic's Claude can retrieve llms.txt via its tool-use and computer-use capabilities. ClaudeBot (Anthropic's crawler) respects robots.txt; future versions may add automatic llms.txt discovery. Today, developers reference llms.txt manually in Claude Projects to give the model accurate context about a codebase.

Developer tools and RAG pipelines

Open-source and commercial tools that build knowledge bases from web content have been among the earliest adopters:

  • LlamaIndex — has a built-in LlmsTxtReader loader that parses llms.txt and fetches the linked pages to build a document index automatically.
  • LangChain — the LlmsTxtLoader community integration reads llms.txt and recursively fetches linked URLs for ingestion into vector stores.
  • Firecrawl — the web scraping API uses llms.txt to prioritize which pages to include when a user requests a full site crawl.
  • Mintlify, GitBook, Docusaurus — popular documentation platforms now offer llms.txt auto-generation, meaning their hosted sites automatically expose the file.
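The general pattern these ingestion tools follow (read llms.txt, then fetch the linked pages) can be sketched in a few lines. This is a generic illustration, not the API of LlamaIndex, LangChain, or Firecrawl: `load_documents`, the injected `fetch` callable, and the in-memory `FAKE_SITE` are all hypothetical names chosen so the example runs without network access.

```python
import re
from typing import Callable
from urllib.parse import urljoin

# Any Markdown link "[title](href)" in the index file
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def load_documents(site_url: str, fetch: Callable[[str], str], max_pages: int = 20) -> list[dict]:
    """Fetch /llms.txt for a site, then fetch each linked page once, in file order."""
    index_url = urljoin(site_url, "/llms.txt")
    index_text = fetch(index_url)
    docs: list[dict] = []
    seen: set[str] = set()
    for title, href in MD_LINK.findall(index_text):
        if len(docs) >= max_pages:
            break
        url = urljoin(index_url, href)  # resolves relative links against the index
        if url in seen:
            continue
        seen.add(url)
        docs.append({"title": title, "url": url, "text": fetch(url)})
    return docs


# Offline stand-in for HTTP: a dict lookup plays the role of the fetcher.
FAKE_SITE = {
    "https://example.com/llms.txt": (
        "# Example\n\n## Docs\n\n- [Guide](/guide.md)\n- [API](https://example.com/api.md)\n"
    ),
    "https://example.com/guide.md": "Guide body",
    "https://example.com/api.md": "API body",
}

docs = load_documents("https://example.com/docs/", FAKE_SITE.__getitem__)
for d in docs:
    print(d["url"], "->", d["text"])
```

Passing the fetcher in as a parameter keeps the sketch testable; in a real pipeline it would be replaced by an HTTP client, and the returned documents would be chunked and embedded before going into a vector store.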

Systems with pending / partial support

  • Google AI Overview / Gemini — no public announcement yet; Google uses its own crawling signals. A well-structured llms.txt does not hurt, and Google may add explicit support as the standard matures.
  • Bing Copilot — Microsoft's Bing crawler (Bingbot) does not yet explicitly support llms.txt, but Bing has expressed interest in AI content signals.
  • Meta AI — no announced support; Meta's web crawler (FacebookBot) is primarily used for social graph data.

llms.txt vs robots.txt

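robots.txt tells crawlers what they should not access; compliance is voluntary, so it is a request rather than an enforcement mechanism. llms.txt tells AI systems what they should focus on. They serve complementary purposes:

  • Use robots.txt to ask AI crawlers to stay out of low-value pages. Truly private pages need authentication, because robots.txt only works for crawlers that choose to honour it.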
  • Use llms.txt to highlight your best documentation and guide AI systems to the content that accurately represents your project.
  • Neither file replaces the other — deploy both for maximum control over how AI reads your site.
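Because the two files are deployed together, it can be worth checking that llms.txt does not point a crawler at URLs that robots.txt asks the same crawler to avoid. The sketch below uses Python's standard `urllib.robotparser` for that check. The robots.txt rules, the URLs, and the `allowed_links` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is asked to avoid /private/, everyone else is unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def allowed_links(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return the URLs that robots.txt permits for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# URLs as they might appear in an llms.txt file (placeholders).
LLMS_TXT_URLS = [
    "https://example.com/docs/guide.md",
    "https://example.com/private/notes.md",
]

for agent in ("GPTBot", "PerplexityBot"):
    print(agent, allowed_links(ROBOTS_TXT, agent, LLMS_TXT_URLS))
```

Any URL that drops out of the list for a crawler you care about is a sign that the two files disagree and one of them should be adjusted.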

FAQ

Does ChatGPT read llms.txt?

ChatGPT's browsing feature can retrieve llms.txt when a user provides a URL. It does not automatically crawl every site for llms.txt, but users and plugin developers can reference it explicitly.

Does Claude read llms.txt?

Anthropic's Claude can fetch llms.txt via its tool-use capabilities. Developers commonly add llms.txt as a context source in Claude Projects to give the model accurate knowledge of a codebase or product.

What is the llms.txt crawler user agent?

There is no single user agent for llms.txt crawlers. Each AI system uses its own crawler identity (PerplexityBot, GPTBot, ClaudeBot, etc.). llms.txt is a passive file — crawlers must explicitly fetch it.
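Since each crawler identifies itself separately, one practical way to see which of them request the file on your own site is to count /llms.txt hits per user agent in the server access log. The sketch below assumes the common "combined" log format, where the request line and the user agent are quoted fields; the log lines and the user-agent strings in them are invented for the example, and `count_llms_txt_fetches` is a hypothetical helper.

```python
from collections import Counter

# Crawler names taken from the text above; extend the tuple as needed.
KNOWN_AI_AGENTS = ("PerplexityBot", "GPTBot", "ClaudeBot")


def count_llms_txt_fetches(log_lines):
    """Count requests for /llms.txt per known crawler name (assumes combined log format)."""
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 6:
            continue  # not a combined-format line
        request, user_agent = parts[1], parts[5]
        fields = request.split()
        # fields: [method, path, protocol]; ignore any query string on the path
        if len(fields) < 2 or fields[1].split("?")[0] != "/llms.txt":
            continue
        lowered = user_agent.lower()
        for agent in KNOWN_AI_AGENTS:
            if agent.lower() in lowered:
                counts[agent] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Invented log lines; the user-agent strings are illustrative, not authoritative.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleAgent (compatible; GPTBot)"',
    '203.0.113.6 - - [01/Jan/2025:10:01:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleAgent (compatible; ClaudeBot)"',
    '203.0.113.7 - - [01/Jan/2025:10:02:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "ExampleAgent (compatible; GPTBot)"',
    '203.0.113.8 - - [01/Jan/2025:10:03:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "curl/8.0"',
]

print(count_llms_txt_fetches(SAMPLE_LOG))
```

User-agent strings can be spoofed, so counts like these are an indication of interest rather than proof of which system fetched the file.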
