Which AI Crawlers Read llms.txt?
A breakdown of which AI systems, search engines, coding assistants, and open-source RAG tools actively read llms.txt — and how each one uses the file.
How AI crawlers use llms.txt
llms.txt is a passive file — your server simply makes it available at /llms.txt. AI systems that support the standard will fetch it when they crawl or
analyse your site. The file tells them:
- What the site is about (via the H1 and blockquote summary).
- Which URLs contain the most important documentation and content.
- How to prioritise pages when context windows are limited.
Unlike robots.txt, which asks crawlers to stay away from certain paths, llms.txt is a positive signal:
it invites AI systems to read your content and explains where to find the best material.
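To make the three points above concrete, here is a minimal sketch of how a consumer might read the file. The sample file, the `parse_llms_txt` and `core_links` names, and the regular expression are illustrative assumptions, not part of any particular crawler. The sketch assumes the layout described in the llms.txt proposal: an H1 title, a blockquote summary, and H2 sections containing Markdown link lists, with an "Optional" section whose links can be skipped when context is tight.

```python
import re

# Hypothetical sample file, used only for illustration.
SAMPLE = """\
# Example Project

> A short summary of what the project does.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): Install and run
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

# Matches "- [title](url)" with an optional ": note" suffix.
LINK = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Return the title, summary, and per-section links of an llms.txt file."""
    parsed = {"title": None, "summary": None, "sections": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and parsed["title"] is None:
            parsed["title"] = line[2:].strip()
        elif line.startswith("> ") and parsed["summary"] is None:
            parsed["summary"] = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
            parsed["sections"][section] = []
        elif section is not None:
            match = LINK.match(line)
            if match:
                parsed["sections"][section].append(match.groupdict())
    return parsed


def core_links(parsed):
    """URLs outside the 'Optional' section, kept when context is limited."""
    return [
        link["url"]
        for name, links in parsed["sections"].items()
        if name != "Optional"
        for link in links
    ]
```

Calling `parse_llms_txt(SAMPLE)` gives the site description (title and summary) and the grouped links, and `core_links` shows one simple way a tool could drop lower-priority pages first.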
Systems that read llms.txt
Perplexity AI
Perplexity is the highest-profile AI search engine to officially announce llms.txt support. PerplexityBot reads the file during crawls to determine which pages
to prioritize for its answer engine. If you want your documentation to appear in Perplexity answers,
a well-structured llms.txt is a direct signal.
AI coding assistants
Tools like Cursor, GitHub Copilot (with context fetching), Cline, and Aider actively fetch llms.txt when users add a URL as a project context source. This is one of the most immediate
practical use cases: your documentation becomes instantly accessible to developers using AI-assisted
editors.
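As a rough illustration of the first step such a tool has to take, the sketch below turns any documentation URL a user pastes in into the site's root-level /llms.txt location. The `llms_txt_url` helper is a hypothetical name, and nothing here is taken from how Cursor, Copilot, Cline, or Aider are actually implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url):
    """Return the root-level /llms.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep scheme and host; drop the original path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))
```

For example, `llms_txt_url("https://docs.example.com/guide/intro?x=1#top")` resolves to `https://docs.example.com/llms.txt`.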
ChatGPT (manual fetch)
ChatGPT's web-browsing mode can fetch llms.txt when a user or plugin provides the URL.
While OpenAI's GPTBot crawler does not automatically prioritize llms.txt, power users reference it explicitly to prime ChatGPT with accurate
documentation context.
Claude (Anthropic)
Anthropic's Claude can retrieve llms.txt via its tool-use and computer-use capabilities.
ClaudeBot (Anthropic's crawler) respects robots.txt; future versions may add automatic llms.txt
discovery. Today, developers reference llms.txt manually in Claude Projects to give the
model accurate context about a codebase.
Developer tools and RAG pipelines
Open-source and commercial tools that build knowledge bases from web content have been among the earliest adopters:
- LlamaIndex: has a built-in LlmsTxtReader loader that parses llms.txt and fetches the linked pages to build a document index automatically.
- LangChain: the LlmsTxtLoader community integration reads llms.txt and recursively fetches linked URLs for ingestion into vector stores.
- Firecrawl: the web scraping API uses llms.txt to prioritize which pages to include when a user requests a full site crawl.
- Mintlify, GitBook, Docusaurus: popular documentation platforms now offer llms.txt auto-generation, meaning their hosted sites automatically expose the file.
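The general pattern these loaders follow (parse llms.txt, then fetch the linked pages) can be sketched with the standard library alone. This is not the API of LlamaIndex, LangChain, or Firecrawl; `collect_documents`, `http_fetch`, and the `limit` parameter are hypothetical names chosen for the example, and the fetcher is passed in so the logic can be exercised without network access.

```python
import re
from typing import Callable, Dict
from urllib.request import urlopen

# Pulls the URL out of each Markdown link of the form [title](url).
LINK_URL = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")


def collect_documents(
    llms_text: str, fetch: Callable[[str], str], limit: int = 20
) -> Dict[str, str]:
    """Fetch each page linked from llms.txt, in file order, up to `limit` pages."""
    documents: Dict[str, str] = {}
    for url in LINK_URL.findall(llms_text):
        if url in documents:
            continue  # the same page may be linked from several sections
        if len(documents) >= limit:
            break
        documents[url] = fetch(url)
    return documents


def http_fetch(url: str) -> str:
    """Plain HTTP fetcher; a real pipeline would add caching and rate limiting."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A pipeline would call `collect_documents(llms_text, http_fetch)` and hand the resulting URL-to-text mapping to its chunking and embedding steps.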
Systems with pending / partial support
- Google AI Overview / Gemini: no public announcement yet; Google uses its own crawling signals. A well-structured llms.txt does not hurt, and Google may add explicit support as the standard matures.
- Bing Copilot: Microsoft's Bing crawler (Bingbot) does not yet explicitly support llms.txt, but Bing has expressed interest in AI content signals.
- Meta AI: no announced support; Meta's web crawler (FacebookBot) is primarily used for social graph data.
llms.txt vs robots.txt
robots.txt tells crawlers which paths they should not crawl. llms.txt tells AI systems what they should focus on. They serve complementary
purposes:
- Use robots.txt to block AI crawlers from private or low-value pages.
- Use llms.txt to highlight your best documentation and guide AI systems to the content that accurately represents your project.
- Neither file replaces the other; deploy both for maximum control over how AI reads your site.
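The robots.txt side of this pairing can be checked with Python's standard `urllib.robotparser`. The rules below are a hypothetical example (the /private/ and /drafts/ paths are assumptions for illustration); the point is that /llms.txt stays reachable while the excluded paths do not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep GPTBot out of /private/, everyone else out of /drafts/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path):
    """True if the parsed rules permit user_agent to fetch path."""
    return parser.can_fetch(user_agent, path)
```

Here `allowed("GPTBot", "/llms.txt")` is permitted while `allowed("GPTBot", "/private/report.html")` is not. Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it.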
FAQ
Does ChatGPT read llms.txt?
ChatGPT's browsing feature can retrieve llms.txt when a user provides a URL. It does
not automatically crawl every site for llms.txt, but users and plugin developers
can reference it explicitly.
Does Claude read llms.txt?
Anthropic's Claude can fetch llms.txt via its tool-use capabilities. Developers commonly
add llms.txt as a context source in Claude Projects to give the model accurate knowledge
of a codebase or product.
What is the llms.txt crawler user agent?
There is no single user agent for llms.txt crawlers. Each AI system uses its own crawler
identity (PerplexityBot, GPTBot, ClaudeBot, etc.). llms.txt is a passive file — crawlers must explicitly fetch it.
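Because each system identifies itself differently, one practical way to see who actually requests the file is to count /llms.txt hits in your access log by user agent. The sketch below assumes logs in the common "combined" format and matches only the three tokens named above; the `llms_txt_fetches` name and the regular expression are illustrative assumptions, and the token list would need extending for other crawlers.

```python
import re
from collections import Counter

# Crawler tokens mentioned in this article; extend as needed.
AI_CRAWLER_TOKENS = ("PerplexityBot", "GPTBot", "ClaudeBot")

# Combined Log Format tail: "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_fetches(log_lines):
    """Count requests for /llms.txt per known AI crawler token."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        agent = match["agent"].lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                counts[token] += 1
                break
        else:
            counts["other"] += 1  # fetched by something not in the token list
    return counts
```

Feeding it the lines of an access log, for example `llms_txt_fetches(open("access.log"))`, returns a `Counter` keyed by crawler token, with unrecognized clients grouped under "other". User-agent strings can be spoofed, so treat the counts as indicative rather than exact.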
Related pages
- Who uses llms.txt — real-world adoption evidence.
- llms.txt format reference — spec details.
- How to create llms.txt — step-by-step for any stack.
- Validator · Generator.