Your site can rank first on Google and still be completely absent from ChatGPT. The cause is rarely your content. It is technical. On July 1, 2025, Cloudflare switched on default blocking of AI bots, cutting thousands of sites off from ChatGPT, Perplexity and Claude without their owners knowing (Cloudflare, “Content Independence Day”, 2025). AI visibility therefore starts with one simple question: can generative engines even read your pages? This technical GEO (generative engine optimization) audit rests on 5 checks, ordered from the most severe block to the lightest optimization signal. I’ll give you the method to test them yourself.
Key takeaways
- A site that ranks well on Google can still be invisible to AI: the block is almost always technical, not editorial.
- AI bots (GPTBot, ClaudeBot, PerplexityBot) do not execute JavaScript: your content must exist in the raw HTML.
- A WAF, Cloudflare or a security plugin often blocks AI bots without your knowledge, without touching your robots.txt.
- Five checks cover a first diagnosis: indexability, robots.txt, JavaScript, WAF, structured data.
Why can your site be invisible in ChatGPT while ranking well on Google?
A good Google ranking does not guarantee AI visibility. These are two different systems. Google crawls your site with Googlebot, which can render JavaScript. Generative engines use their own bots, with their own rules.
There are three families of AI bots, and confusing them is costly. Training bots (GPTBot, ClaudeBot, Google-Extended) collect content to feed the models. Search bots (OAI-SearchBot, PerplexityBot, Claude-SearchBot) index your pages to enable real-time citation. User-triggered fetchers (ChatGPT-User, Claude-User) pull a specific page when someone asks a question.
This distinction changes everything. You can block a training bot to protect your intellectual property while staying eligible for citation by the search bot. Blocking GPTBot does not necessarily stop OAI-SearchBot from citing you. You just need to configure the two separately.
The real danger sits higher up, at the infrastructure layer. If your server, your CDN or your firewall returns an error to the bot, that bot never sees your content. Quality is irrelevant. An invisible page will never be cited.
The 5 technical checks of a GEO audit (in order of priority)
A technical GEO audit follows a logic of priority. You start with what fully blocks access, then move down toward what optimizes readability. There is no point polishing your structured data if your pages return a 403 error to bots.
| Check | What blocks it | Severity |
|---|---|---|
| 1. Indexability | noindex tag, X-Robots-Tag | Total block |
| 2. Robots.txt | Disallow on AI bots | Total block |
| 3. JavaScript | Client-side rendered content | Partial block |
| 4. WAF / Cloudflare / plugin | 403 or 429 error to the bot | Total and invisible block |
| 5. Structured data | Missing JSON-LD | Optimization |
1. Indexability: the forgotten noindex tag
The noindex tag is the first culprit to check. A single line is enough to make a page invisible. It tells engines not to index the page, and most AI bots respect it.
The problem often comes from an oversight. A page kept in staging retains its noindex when it goes live. An SEO plugin applies it by mistake to a whole category. The result is the same: the page never enters the models’ memory.
Check two places. First the meta robots tag in the page source. Then the X-Robots-Tag HTTP header, more discreet because it does not appear in the visible HTML. This one traps many sites, because it is set at server level and slips under the radar of standard audits.
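To make the two places concrete, here is a minimal sketch that inspects both the meta robots tag and the X-Robots-Tag header. It assumes you already have the raw HTML and the response headers in hand (from any HTTP client); the names `RobotsMetaParser` and `find_noindex` are illustrative, not part of any existing tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def find_noindex(html, headers):
    """Return where a noindex directive is set: "meta", "header", or both."""
    found = []
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    if any("noindex" in directive for directive in parser.directives):
        found.append("meta")
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {key.lower(): value.lower() for key, value in headers.items()}
    if "noindex" in normalised.get("x-robots-tag", ""):
        found.append("header")
    return found


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(find_noindex(page, {"X-Robots-Tag": "noindex"}))
```

An empty list means neither location carries a noindex. The sketch only looks at the generic `robots` meta name; bot-specific meta names, if your site uses them, would need an extra check.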
2. Robots.txt: are you blocking the right bots or the wrong ones?
Your robots.txt file dictates who can access what. A misplaced Disallow blocks AI bots without you noticing. Many files date from 2022 and simply ignore the existence of GPTBot or ClaudeBot.
One technical nuance deserves attention. The robots.txt governs crawling, not indexing. If you block the crawl of a page, the bot can never read its noindex tag. To prevent indexing, leave crawling open and use noindex. Blocking both at once defeats the purpose: the noindex is never read, so the directive has no effect.
To open access to citation bots, your file must explicitly allow the useful agents:
- OAI-SearchBot and ChatGPT-User for visibility in ChatGPT.
- PerplexityBot for Perplexity, which always displays its sources.
- Claude-SearchBot for Claude.
Check your file by typing your-domain.com/robots.txt in a browser. If you see “Disallow: /” under one of these agents, you are cutting yourself off from AI answers.
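If you prefer a scripted check over reading the file by eye, Python's standard `urllib.robotparser` can evaluate a robots.txt against each agent. The sketch below uses a hypothetical file that blocks the training bot while allowing the citation bots; the file content and the `ai_access` helper are assumptions for illustration. Note that this parser applies the first matching rule, which may differ from how a given bot resolves conflicting Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: training bot blocked, citation bots allowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: Claude-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
"""

AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "Claude-SearchBot",
]


def ai_access(robots_txt, url="https://your-domain.com/"):
    """Map each AI agent to True (may crawl the URL) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


for agent, allowed in ai_access(SAMPLE_ROBOTS_TXT).items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To test your own file, paste its content in place of `SAMPLE_ROBOTS_TXT`. A `False` next to OAI-SearchBot, ChatGPT-User, PerplexityBot or Claude-SearchBot is the "Disallow: /" situation described above.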
3. JavaScript: the invisible wall of 2026
AI bots do not execute JavaScript. It is the most underestimated technical difference of 2026. An analysis of more than 500 million GPTBot requests found no trace of JavaScript execution: the bot downloads the initial HTML, then moves to the next page (Passionfruit, 2026).
Googlebot can render JavaScript through a Chromium environment. Generative engines cannot. GPTBot, ClaudeBot and OAI-SearchBot behave like plain text readers. If your price, your product description or your FAQ appears only after a script runs, these bots see an empty shell.
The risk applies particularly to WordPress sites built with page builders. WPBakery, Elementor or Divi sometimes generate content via JavaScript, in tabs or accordions. A human clicks to reveal the content. An AI bot does not click.
The no-code test: on your page, right-click then “View page source”. That is the raw HTML an AI bot sees. If your key content is missing there and appears only with “Inspect” (the rendered DOM), it is invisible to AI.
The solution is server-side rendering (SSR) or static generation. Your server then delivers complete, ready-to-read HTML, without depending on the browser. For a WordPress site, keep your essential content in native HTML rather than in interactive elements.
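The "View page source" test can also be approximated in a few lines. The sketch below extracts the text present in the raw HTML, ignoring anything inside `<script>` and `<style>`, and reports which key phrases are missing. The helper names and the two sample pages are hypothetical; it assumes the HTML you pass in is the unrendered server response, not a DOM saved from the browser.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Extracts text from raw HTML, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_raw_html(html, key_phrases):
    """Return the key phrases that do not appear in the raw HTML text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in key_phrases if phrase.lower() not in text]


# Client-side rendered page: the price only exists inside a script.
csr_page = '<html><body><div id="root"></div><script>render("Price: 49 EUR")</script></body></html>'
# Server-side rendered page: the price is in the HTML itself.
ssr_page = "<html><body><h1>Pro plan</h1><p>Price: 49 EUR</p></body></html>"

print(missing_from_raw_html(csr_page, ["Price: 49 EUR"]))
print(missing_from_raw_html(ssr_page, ["Price: 49 EUR"]))
```

Any phrase returned by `missing_from_raw_html` is content a non-rendering bot would not see. Pick phrases that matter for citation: a price, a product description, an FAQ answer.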
4. WAF, Cloudflare and WordPress plugins: the silent block on your AI visibility
The web application firewall (WAF) is the most insidious trap of a GEO audit. It blocks bots before they even reach your server. Your robots.txt can be perfect, your content flawless: if the WAF shuts the door, nothing gets through.
On July 1, 2025, Cloudflare took a major step. The company switched on default blocking of AI bots, an event it named “Content Independence Day”. Thousands of sites found themselves cut off from ChatGPT, Perplexity and Claude overnight. Most owners still don’t know it (SEO Engico, 2026). According to field audits, nearly 30% of sites that believed they were open were in fact returning an error to AI bots (ViaMetric, 2026).
The WordPress case deserves a separate mention. On managed hosting, the block can come from a layer you don’t control. Worse, the error returned is often a 429 code (too many requests) rather than a 403 (forbidden). This nuance fools audits, because a 429 looks like a simple rate limit while the real block happens elsewhere.
Security plugins add a layer of opacity. Wordfence, Sucuri or Solid Security sometimes ship block lists covering GPTBot or ClaudeBot by default. You install the plugin to protect against spam, and you cut your AI citations without knowing it.
Three layers determine an AI bot’s access to your content:
- The robots.txt, which provides rules that compliant bots respect.
- The CDN or WAF (Cloudflare, AWS, Fastly), which filters by signature and IP address.
- The application plugins, which block at the WordPress level itself.
To diagnose, check your server logs and look for the “GPTBot” or “ChatGPT-User” agents. If you use Cloudflare, check the AI crawl metrics dashboard and disable blocking for the agents you want to keep.
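For the log review, a short script saves time on large files. This sketch counts HTTP status codes per AI agent, assuming your server writes the common "combined" access log format; the regular expression, the agent list and the sample lines are illustrative and will need adjusting to your own log layout.

```python
import re
from collections import Counter

AI_AGENTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Combined log format: "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[A-Z]+ \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def ai_bot_statuses(lines):
    """Count (agent, status) pairs for requests made by known AI agents."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        user_agent = match.group("ua").lower()
        for agent in AI_AGENTS:
            if agent.lower() in user_agent:
                counts[(agent, int(match.group("status")))] += 1
                break
    return counts


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [12/Jan/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 403 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '203.0.113.8 - - [12/Jan/2026:10:00:05 +0000] "GET / HTTP/1.1" 429 0 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '198.51.100.2 - - [12/Jan/2026:10:00:09 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for (agent, status), hits in sorted(ai_bot_statuses(SAMPLE_LOG).items()):
    print(f"{agent} -> {status}: {hits}")
```

One caveat: origin server logs only record requests that reached your server. If a CDN or WAF blocks the bot at the edge, the request never appears here, so a complete absence of AI agents in your logs is itself a signal to check the CDN dashboard.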
5. Structured data: essential or overrated?
Structured data helps machines understand your content. The reference format is JSON-LD, the format Google recommends for structured data markup. It explicitly describes what a page is: an article, a person, an FAQ, a product.
The argument in its favor is solid. A Data World study shows that GPT-4 response accuracy on niche questions rises from 16% to 54% when content relies on structured data (Data World, 2024, secondary source to verify).
Caution still applies. A Search Atlas analysis found no correlation between structured data coverage and LLM citation rate across OpenAI, Gemini and Perplexity (Search Atlas, 2026, secondary source to verify). To date, no peer-reviewed study confirms a direct impact of schema on AI visibility.
My position as a consultant is nuanced. Structured data is not a magic formula. But it remains cheap to implement, useful for classic SEO, and it clarifies your entities. The FAQPage schema structures your content into standalone question-answer pairs, easy to extract. I recommend it as a foundation, not a single lever.
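To show what "standalone question-answer pairs" looks like in practice, here is a minimal sketch that builds a schema.org FAQPage object and wraps it in a JSON-LD script tag. The helper names `faq_jsonld` and `as_script_tag` are hypothetical; the property names (`mainEntity`, `acceptedAnswer`, `text`) follow the schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialise the object into a JSON-LD <script> tag for the page HTML."""
    # Escape "</" so an answer containing "</script>" cannot close the tag early.
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


faq = faq_jsonld(
    [
        (
            "Does blocking GPTBot remove my site from ChatGPT?",
            "Not entirely. GPTBot serves training, while OAI-SearchBot and "
            "ChatGPT-User serve search and real-time answers.",
        )
    ]
)
print(as_script_tag(faq))
```

The point of generating the tag server-side is that it lands in the raw HTML. JSON-LD injected afterward by a client-side script runs into the same JavaScript limit described in check 3.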
How to run your own technical GEO mini-audit?
You can run a first diagnosis without advanced technical skills. Follow these five steps in order.
- Check indexability. View your page source and search for “noindex”. If it appears on an important page, that is your first emergency.
- Read your robots.txt. Type your-domain.com/robots.txt. Spot any “Disallow: /” tied to an AI bot.
- Test JavaScript. Compare “View source” and “Inspect”. If your key content is missing in the first, it is invisible to AI.
- Check your firewall. Review your server logs or the Cloudflare dashboard. Look for 403 and 429 errors on AI agents.
- Audit your structured data. Make sure your JSON-LD appears in the raw HTML, not injected afterward by a script.
This diagnosis gives you a first map. To then measure whether your brand is actually cited by AI, you need dedicated tracking, beyond simple technical accessibility.
FAQ
Do AI bots respect the robots.txt file?
Compliant bots like GPTBot, ClaudeBot and OAI-SearchBot respect robots.txt. Others ignore it. In January 2026, Cloudflare documented cases of crawlers using disguised agents to bypass blocks. Robots.txt remains a useful signal, but not a security guarantee.
Does blocking GPTBot remove my site from ChatGPT?
Not entirely. GPTBot serves training, while OAI-SearchBot and ChatGPT-User serve search and real-time answers. Blocking the first does not stop citation by the others, provided you allow them separately.
Is my JavaScript site necessarily invisible to AI?
Not if the content exists in the HTML served by the server. The problem comes from client-side rendering, where content appears only after a script runs. Server-side rendering or static generation solves this problem.
How do I know if Cloudflare is blocking AI bots on my site?
Log into your Cloudflare dashboard and check the AI crawl metrics page. Also check Bot Fight Mode, enabled by default. Your server logs reveal the errors returned to AI agents.
Does the llms.txt file improve my AI visibility?
Its effect remains limited today. No major AI provider has confirmed using it as a citation signal, and crawl data shows bots largely ignore it. It costs little to set up as a long-term bet, but it does not replace a readable architecture.
Does structured data guarantee an AI citation?
No. The evidence is contradictory: one study shows a strong accuracy gain, another finds no correlation with citations. Structured data remains a useful technical foundation, not a guarantee.
How often should I redo this technical GEO audit?
At least once a quarter, and after every major change: new host, new security plugin, redesign with a page builder, or activation of a Cloudflare service. These events alter bot access without warning.
Sources
- Passionfruit, “JavaScript Rendering and AI Crawlers: Can LLMs Read Your SPA?”, March 2026 — getpassionfruit.com
- SEO Engico, “Cloudflare AI Bot Blocking: Is Your Site Locked Out of AI Search?”, May 2026 — seoengico.com
- ViaMetric, “Is your firewall (WAF) accidentally blocking ChatGPT?”, January 2026 — viametric.app
- Cloudflare, “Content Independence Day”, July 1, 2025 — blog.cloudflare.com


