People are asking ChatGPT, Perplexity, and Google's AI what to buy, who to hire, and what to read. Whether your site appears in those answers, or doesn't, is increasingly a business decision rather than a technical accident. Here's exactly how to control it.

First: do you actually need this?

Before optimising for AI visibility, it's worth asking whether it's the right priority for your business right now.

AI search referral traffic is still a small share of overall web traffic, roughly 1% of total visits across the web per recent Similarweb data. Organic search from Google still drives the majority of traffic for most sites, and that's not changing overnight. If your fundamentals (content quality, technical SEO, backlinks) are weak, fixing those will outperform AI-specific optimisation by a wide margin.

Where AI visibility matters most:

E-commerce and direct-to-consumer brands have the strongest case. Adobe Analytics reported that AI-referred traffic converts 42% better than non-AI traffic, because buyers arriving from AI referrals are further along in their decision process.

B2B companies targeting professional buyers have an equally strong case. Perplexity, which cites sources on every response, attracts a disproportionately senior audience: 80% college graduates, 65% high-income white-collar professionals, and 30% in senior leadership roles, per a 2025 WARC audit. When those people ask Perplexity who the best vendor is in your space and you're not in the answer, you're not in the conversation.

SaaS, professional services, agencies, and publishers are all in active need of AI visibility strategies. The channel is small but growing at 357% year-over-year in referral volume, and it converts at rates that outpace traditional organic.

Where urgency is lower: purely local brick-and-mortar businesses with a tightly bounded service area; geo-restricted services where users rarely compare options via AI before visiting in person; and sites with very thin content, where improving the fundamentals offers a higher ROI than AI-specific work.

The rest of this guide assumes AI visibility is a legitimate priority for your business. If it is, read on.

Understanding how each AI tool actually works

This is the part most guides skip, and skipping it leads to generic advice that doesn't differentiate between platforms. Each major AI tool retrieves content differently. What gets your site cited in Perplexity is not the same as what gets you cited in ChatGPT.

Perplexity

Perplexity performs real-time web retrieval on every single query. It runs its own crawler (PerplexityBot), indexes the web continuously, and fetches live content at the moment someone asks a question. That means new content can appear in Perplexity citations within hours or days of being published and indexed.

Perplexity averages 21.9 citations per response, more than double ChatGPT's average. It's the platform where fresh, well-structured, authoritative content pays off fastest. Reddit accounts for 46.7% of Perplexity's citations across most industries, which is not a typo. It's nearly twice Wikipedia's share, and it signals that community-validated, conversational content performs strongly here.

Perplexity also cites sources inline on every response, by default, making it the most transparent of all the major AI search tools about where its information comes from.

ChatGPT

ChatGPT operates on a two-layer system. The base is its training data, which is static and built from content crawled before the model's training cutoff. The retrieval layer (web search via Bing) is activated primarily for queries that include commercial signals: terms like "reviews," "comparison," "best," "features," or a specific year. For everything else, it's answering from memory.

That means two different strategies apply for ChatGPT: appearing in its training data (a long game, dependent on your content being published, indexed, and widely referenced before the next training run) and appearing in its live Bing-powered web search results (a shorter game, dependent on your Bing indexing and ranking).

ChatGPT cites brands at a far lower rate than Perplexity, roughly 0.59% of responses in one large-scale analysis, versus 13.05% for Perplexity. It's a harder platform to get cited in, but with over 900 million weekly active users as of early 2026, being part of the training data or the Bing-retrieved sources is still enormously valuable.

Google AI Overviews and AI Mode

Google AI Overviews appear on approximately 48% of US Google searches as of early 2026, up from around 6.5% a year prior. They use Google's own search index, which means your traditional SEO work feeds directly into AI Overview eligibility. However, the link between ranking position and citation is weakening: in mid-2025, 76% of AI Overview citations came from top-10 organic results, while BrightEdge research in early 2026 put that figure as low as 17%. Structured data, E-E-A-T signals, and content comprehensiveness now influence selection independently of ranking position.

Google AI Mode (the full conversational interface) is a distinct system that cites different URLs than AI Overviews around 86% of the time, even when reaching similar conclusions. If you're optimising for Google's AI, you need to think about both surfaces.

Claude (Anthropic)

Claude with web access fetches content directly from the open web at query time. It doesn't route through Bing or a proprietary index; it retrieves URLs directly. Claude tends to favour longer, more substantive content and demonstrates a stronger preference for depth and thoroughness over brevity. Getting cited by Claude is less about gaming an index and more about being the most comprehensive and credible source available on a topic.

Step 1: Check whether AI crawlers can actually reach you

Before anything else, verify that you aren't accidentally invisible to every AI tool at once. This happens more than you'd expect, because a robots.txt directive intended to block scrapers often blocks legitimate AI retrieval crawlers at the same time.

Visit yourdomain.com/robots.txt in a browser. Look for any of the following:

User-agent: *
Disallow: /

or

User-agent: GPTBot
Disallow: /

or broad wildcard disallows that catch all bots. If any of these are present, AI crawlers may be blocked entirely.
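The same check can be scripted rather than read by eye. The sketch below uses Python's standard urllib.robotparser module to report which AI user agents a given robots.txt text would block. The AI_AGENTS list and the blocked_agents helper are illustrative names, not part of any tool, and Python's parser is only an approximation of how each crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens mentioned in this guide; extend the list as needed.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]


def blocked_agents(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the agents from AI_AGENTS that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Paste the contents of yourdomain.com/robots.txt here.
    sample = "User-agent: GPTBot\nDisallow: /\n"
    print(blocked_agents(sample))
```

An empty result means none of the listed agents is blocked for that URL. It says nothing about blocks applied elsewhere, such as at a CDN or firewall.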

Also check that your pages are server-rendered and accessible without JavaScript execution. AI retrieval systems typically cannot execute JavaScript; they receive the HTML response from your server and parse it directly. If your site renders content client-side via React, Vue, Angular, or a similar framework, the content AI crawlers see may be an empty shell. Either implement server-side rendering (SSR), use static site generation, or make sure critical content is present in the HTML response before JavaScript runs.
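One rough way to test this is to look at the raw HTML your server returns and see whether a phrase you expect on the page is present as text, outside any script or style block. The sketch below does that with Python's standard html.parser. The function names are hypothetical, and the check is a heuristic: it shows what a parser that never runs JavaScript can read, not what any specific crawler does.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text outside <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text content of the raw HTML, without executing any JavaScript."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def phrase_in_raw_html(html: str, phrase: str) -> bool:
    """True if the phrase is readable in the server response itself."""
    return phrase.lower() in visible_text(html).lower()
```

To use it, fetch a page's HTML (for example with urllib.request.urlopen or curl), pass the response body in along with a sentence from the page's main content, and treat a False result as a sign that the content is probably being rendered client-side.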

Check for Cloudflare or CDN bot challenge pages that may be inadvertently serving AI crawlers a JavaScript challenge rather than your actual content. If your Cloudflare bot fight mode is aggressive, add explicit exceptions for known AI retrieval agents.

Step 2: Configure robots.txt for AI crawlers

This is the most important technical step, and the most misunderstood. There are two entirely separate categories of AI crawlers with completely different implications.

Training crawlers. Bots that collect your content to train future AI models. They improve the model's general knowledge but do not directly cause your site to appear in AI answers. Examples: GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider.

Retrieval crawlers. Bots that fetch your content at query time to generate real-time AI answers. Blocking these removes you from AI search results directly. Examples: OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot (Anthropic's dedicated search crawler, launched 2026).

The strategic choice: most businesses in 2026 block training-only crawlers while explicitly allowing retrieval crawlers. This keeps your content eligible to appear in AI-generated answers while limiting how much of your content enters future training datasets. One data point worth knowing: GPTBot has a crawl-to-referral ratio of approximately 1,255:1, meaning it crawls 1,255 of your pages for every visitor it sends back to you. ClaudeBot's ratio is around 20,583:1. The retrieval bots send actual traffic; the training bots primarily take content.

Here is a working robots.txt template for this approach:

# Traditional search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# AI retrieval crawlers: ALLOW (these power AI search citations)
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# AI training crawlers: BLOCK (these collect for model training)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Diffbot
Disallow: /

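# Default: protect sensitive paths
# Note: a crawler matched by a named group above follows only that group,
# so repeat these Disallow lines there if those bots must also stay out.
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /api/

Sitemap: https://yourdomain.com/sitemap.xml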

A few important caveats. Bytespider (the crawler operated by ByteDance, TikTok's parent company) is notorious for ignoring robots.txt entirely. HAProxy reported nearly 90% of AI crawler traffic across their customer base came from Bytespider, much of it bypassing disallow rules. Block it at the WAF or server level rather than relying on robots.txt alone. Perplexity has also been documented using undeclared crawlers that rotate user agents and IP addresses to bypass disallow directives; Cloudflare published a detailed report on this in August 2025. robots.txt is a voluntary compliance mechanism: well-behaved bots respect it, poorly behaved ones don't.
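To see whether a crawler you have disallowed is still requesting pages, you can count hits per crawler in your access logs. The sketch below assumes each log line contains the request's user-agent string, as in the common "combined" log format. AI_TOKENS and count_ai_hits are illustrative names.

```python
from collections import Counter

# Crawler tokens named in this guide; matching is a simple substring test.
AI_TOKENS = ["GPTBot", "ClaudeBot", "Bytespider", "PerplexityBot", "CCBot", "OAI-SearchBot"]


def count_ai_hits(log_lines) -> Counter:
    """Count log lines that mention each crawler token (case-insensitive)."""
    counts: Counter = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # attribute each line to at most one crawler
    return counts
```

Steady traffic from a token you have disallowed suggests that crawler is not honouring robots.txt. Crawlers that hide behind undeclared user agents will not show up in this count, which is one reason server-level rules matter.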

Related reading: If you're not familiar with WAF-level bot blocking, Cloudflare's Bot Fight Mode, or how to identify rogue crawlers in your server logs, we cover all of it in detail in Bot Traffic Has Overtaken Human Traffic. Start there before setting server-level rules, so you don't accidentally block traffic you want to keep.

Review and update your robots.txt quarterly. New AI crawlers launch regularly, and existing ones occasionally change their user-agent strings.

Step 3: Structure your content for machine extraction

AI retrieval systems don't read your page the way a human does. They operate at the passage level, retrieving specific paragraphs or sections that match the query rather than the page as a whole. The practical implication is that how you structure each section matters as much as the overall quality of the page.

The answer-first rule

Lead each major section with a direct, concise answer in the first 40 to 75 words. This is the passage AI systems are most likely to extract and cite. If a section heading is a question ("How does X work?") and the very next sentence answers it clearly, you've pre-formatted that section for AI retrieval.

Compare:

Harder to extract: "The history of X is complex and multifaceted. In the early years, practitioners debated various approaches before the field eventually converged on what we now recognize as the standard method. This method involves…"

Easier to extract: "X works by [concise direct answer in one to two sentences]. The process involves three steps: [clear enumeration]. Here's why each step matters…"
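A quick way to audit existing pages against the 40-to-75-word guideline is to count the words in each section's opening paragraph. The sketch below assumes sections are plain text with paragraphs separated by blank lines. It checks length only, so a human still has to judge whether the opening actually answers the question. Both function names are hypothetical.

```python
def opening_word_count(section_text: str) -> int:
    """Word count of the first paragraph (the text up to the first blank line)."""
    first_paragraph = section_text.strip().split("\n\n", 1)[0]
    return len(first_paragraph.split())


def is_answer_first_length(section_text: str, low: int = 40, high: int = 75) -> bool:
    """True if the opening paragraph falls inside the guide's 40-75 word range."""
    return low <= opening_word_count(section_text) <= high
```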

Make sections self-contained

Each major section of your content should be comprehensible without the surrounding context. AI retrieval pulls passages, not full pages. If a paragraph requires the reader to have absorbed the previous three paragraphs to make sense of it, it won't function well as a standalone citation.

Use descriptive headings that mirror how people ask questions

Headings formatted as questions ("What is the best way to X?", "How long does X take?", "What's the difference between X and Y?") align your content structure with the natural language queries AI systems process. When a user asks Perplexity "how long does X take?" and your H2 is "How long does X take?", the structural match strengthens your candidacy as the cited source.

Tables, numbered lists, and specific data

AI systems extract and present structured data efficiently. A table comparing options, a numbered list of steps, or a specific statistic with attribution is easier to cite precisely than a descriptive paragraph covering the same ground. Where the information naturally fits a table or list format, use it.

Step 4: Implement schema markup

Schema markup is machine-readable metadata that tells AI systems and search engines explicitly what your content is, who created it, and what it contains. It doesn't replace content quality, but it gives AI retrieval systems a faster, cleaner path to understanding and trusting your content.

Priority schema types for AI visibility:

  • FAQPage, the most direct alignment with how AI systems extract answers. Each question-answer pair is cleanly structured for retrieval. Pages with FAQPage schema appear in Google AI Overviews at rates 2.7 to 3.2 times higher than pages without it, per 2025 studies. Note that Google restricted FAQ rich results in traditional search (the dropdown expansion in SERPs) to health and government sites in August 2023, but the schema remains highly effective for AI retrieval systems specifically. → schema.org/FAQPage
  • Article, which provides explicit metadata about author, publication date, and modification date. The dateModified field matters, since AI systems with freshness preferences use it to assess whether content is current. → schema.org/Article
  • Organization and Person, which establish entity identity. AI systems are increasingly entity-aware, meaning they understand the relationship between your brand, your authors, and the topics you cover. Explicit entity markup strengthens the model's confidence in citing you as an authoritative source on specific subjects. → schema.org/Organization · schema.org/Person
  • HowTo, for step-by-step instructional content. HowTo schema labels each step explicitly, and AI systems that encounter this markup can extract and present steps in their responses with higher precision. → schema.org/HowTo
  • BreadcrumbList, which helps AI systems understand where a page sits within your site structure and what broader topic it belongs to. → schema.org/BreadcrumbList

All schema should be implemented in JSON-LD format, placed in the <head> of the page. JSON-LD is Google's recommended format and is the most reliably parsed by AI systems.
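As an illustration, FAQPage markup can be generated from a list of question-answer pairs instead of being written by hand. The sketch below follows the schema.org FAQPage structure (Question items with an acceptedAnswer). The faq_jsonld function name and the sample content are placeholders.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build a JSON-LD <script> tag for a FAQPage from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so answer text can never close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(faq_jsonld([("How long does X take?", "About two weeks.")]))
```

The questions and answers in the markup should match text that is visible on the page itself.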

Validate your implementation with the Schema Markup Validator (validator.schema.org) and Google's Rich Results Test after deployment.

Step 5: Submit to the right indexes

AI tools use different indexes. Making sure you're in all of them matters.

Google Search Console. Submit your XML sitemap and monitor coverage. Google's index feeds both traditional search and Google AI Overviews/AI Mode. Address any crawl errors promptly, because pages that Google can't reliably access won't be cited.

Bing Webmaster Tools. This is the one most people skip, and it's increasingly important. ChatGPT's web search is Bing-powered. If you're not in Bing's index, you don't exist to ChatGPT Search. Submit your sitemap, verify ownership, and monitor your Bing indexing status. It's free and takes under 30 minutes to set up.

Request indexing for high-priority pages in both tools whenever you publish or significantly update content. Don't wait for passive crawling on pages you need indexed quickly.

Step 6: Create an llms.txt file

llms.txt is an emerging standard, created by Jeremy Howard and analogous to robots.txt for AI, that places a structured, Markdown-formatted summary of your site at yourdomain.com/llms.txt. The file is designed to help AI systems understand what your site is, what it offers, and which pages are most important, without having to crawl and parse your entire site.

Important context on adoption: as of mid-2026, no major AI platform has officially confirmed reading llms.txt as a first-class input during inference. Semrush's controlled study found no statistically significant correlation between implementing llms.txt and improved AI search performance. Google's John Mueller has noted that major crawlers don't currently prioritize these files over standard HTML.

Implement it anyway. Here's why. The pattern for web standards is that sites publish first and platform adoption follows once a critical mass exists. robots.txt followed exactly this trajectory. Yoast SEO shipped an automatic llms.txt generator in version 25.3 in June 2025. The cost of implementation is low, and the option value if adoption formalizes is real.

What to include: keep it concise. An audit of 30 production llms.txt files found the most common failure was treating it as a second sitemap with 800 to 1,200 unsorted links. Aim for 20 to 50 high-signal links covering:

  • Core product or service pages
  • Pricing and plan pages
  • Your most authoritative content by topic
  • Technical documentation or API references (if relevant)
  • Key comparison or decision-support pages

Basic format:

# Your Brand Name

> One sentence description of what your site is and who it serves.

## Products / Services
- [Product Name](https://yourdomain.com/product): Brief description
- [Service Name](https://yourdomain.com/service): Brief description

## Key Resources
- [Guide Title](https://yourdomain.com/guide): What it covers
- [Documentation](https://yourdomain.com/docs): What it covers

## About
- [About Us](https://yourdomain.com/about)
- [Contact](https://yourdomain.com/contact)

Place the file at your domain root: yourdomain.com/llms.txt. Verify it's publicly accessible, served as plain text (text/plain), and update it when your product or content library changes significantly.
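If your key pages already live in a CMS or a spreadsheet, the file can be generated rather than maintained by hand. The sketch below produces the basic format shown above from a dictionary of sections and counts the links so you can stay within the 20-to-50 range. The function names and the input shape are assumptions made for illustration.

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render llms.txt text; sections maps a heading to (label, url, description) tuples."""
    lines = [f"# {name}", "", f"> {summary}"]
    for title, links in sections.items():
        lines += ["", f"## {title}"]
        for label, url, description in links:
            entry = f"- [{label}]({url})"
            if description:
                entry += f": {description}"
            lines.append(entry)
    return "\n".join(lines) + "\n"


def link_count(sections: dict) -> int:
    """Total number of links across all sections."""
    return sum(len(links) for links in sections.values())


if __name__ == "__main__":
    sections = {
        "Products / Services": [
            ("Product Name", "https://yourdomain.com/product", "Brief description"),
        ],
        "About": [("About Us", "https://yourdomain.com/about", "")],
    }
    print(build_llms_txt("Your Brand Name", "One sentence description.", sections))
```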

Step 7: Build external presence

AI systems don't only read your website. They retrieve content from across the web, and the more often your brand is mentioned, cited, or discussed on external authoritative sources, the more confidently an AI will include you in a relevant response.

Wikipedia. One of the highest-authority sources for AI training data and retrieval. If your brand, product, or key concepts have Wikipedia entries that mention you accurately, this is a strong signal. Don't create promotional pages (they'll be deleted), but make sure factual information about your brand is accurate where it does appear.

Reddit. Non-negotiable for Perplexity visibility. Reddit accounts for 46.7% of Perplexity's citations across most industries. Genuine participation in relevant subreddits (answering questions, sharing expertise, being a useful member of communities related to your topic) creates citation-worthy content. This is not about posting links to your site; it's about building presence in the discussions AI systems are actively reading.

Quora, LinkedIn, industry publications. The same principle applies. Substantive, expertise-demonstrating content on platforms that AI systems regularly retrieve from extends your footprint beyond your own domain.

Third-party reviews. G2, Capterra, Trustpilot, and similar review platforms are heavily cited by AI systems for product and service comparisons. Maintaining accurate profiles on relevant platforms and encouraging genuine reviews matters for AI visibility, not just traditional reputation management.

Press coverage and backlinks from authoritative sites. Brand mentions and citations in reputable publications serve as credibility signals that AI systems weight in determining which sources to trust. This is the same logic as traditional link building, applied to AI retrieval.

Brand entity consistency. Make sure your brand name, description, founding date, product names, and other factual details are consistent across your website, social profiles, Wikipedia, review platforms, and press mentions. AI systems synthesise information across sources, and inconsistency creates ambiguity about who you are. It can also result in hallucinated or inaccurate information being attached to your brand.

Step 8: Measure whether it's working

Traditional rank tracking doesn't capture AI visibility. A position-one ranking in Google doesn't mean you're being cited in AI Overviews, and Perplexity doesn't have a "position" in any conventional sense. You need different measurement approaches.

Manual sampling

The most accessible starting point is to regularly ask AI tools the questions your potential customers are asking, in your category, and observe whether your site is cited or mentioned.

For Perplexity, look at the Sources section of each response. For ChatGPT Search, observe which URLs appear in the search results it draws from. For Google AI Overviews, look for your domain in the source citations that expand from the overview. Do this weekly for your ten most important queries, track which competitors appear, and note whether your presence changes after implementing the steps in this guide.
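Keeping the weekly samples in a simple structure makes the trend easy to compute. The sketch below assumes you record, for each query you test, the list of URLs the AI tool cited. It then reports the share of sampled responses that cite your domain. The record shape and function names are hypothetical.

```python
from urllib.parse import urlparse


def cites_domain(cited_urls: list[str], domain: str) -> bool:
    """True if any cited URL is on the domain or one of its subdomains."""
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def citation_rate(samples: list[dict], domain: str) -> float:
    """Fraction of sampled responses that cite the domain at least once."""
    if not samples:
        return 0.0
    hits = sum(cites_domain(sample["cited_urls"], domain) for sample in samples)
    return hits / len(samples)
```

Run it on each week's samples and compare the result over time, ideally per platform, since a change on Perplexity may not be mirrored elsewhere.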

GA4 referral traffic monitoring

In GA4, create a segment for referral traffic from AI platforms. The main domains to watch:

  • chat.openai.com / chatgpt.com for ChatGPT referrals
  • perplexity.ai for Perplexity referrals
  • claude.ai for Claude referrals
  • gemini.google.com for Gemini referrals
  • bing.com (with AI referral parameters) for Copilot referrals

AI referral traffic is still small in absolute terms for most sites, but track it as a trend line. Growth here is a leading indicator that your AI visibility efforts are working. Also note how this traffic behaves versus organic: AI-referred visitors typically have higher engagement rates and convert at higher rates than organic search visitors, which makes even small volumes worth tracking carefully.
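If you export referrer data or work from raw logs, the same grouping can be done outside GA4. The sketch below maps a referrer URL to an AI platform using the domains listed above. Copilot is left out because identifying it depends on referral parameters this guide does not specify. AI_REFERRERS and classify_referrer are illustrative names.

```python
from typing import Optional
from urllib.parse import urlparse

# Referrer domains listed in this guide, mapped to a platform label.
AI_REFERRERS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}


def classify_referrer(referrer_url: str) -> Optional[str]:
    """Return the AI platform for a referrer URL, or None if it is not a known AI domain."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for domain, platform in AI_REFERRERS.items():
        # Match the domain itself or any subdomain, such as www.perplexity.ai.
        if host == domain or host.endswith("." + domain):
            return platform
    return None
```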

Bing Webmaster Tools

Since ChatGPT Search uses Bing's index, monitoring your Bing crawl coverage, indexed pages, and any crawl errors is a proxy signal for ChatGPT Search eligibility.

Dedicated AI visibility tools

A category of purpose-built tools has emerged specifically for tracking brand mentions in AI responses. Semrush's AI Toolkit, Profound, Otterly, and similar platforms run systematic queries across AI platforms and report on brand citation frequency, context, and accuracy.

These tools are most relevant for brands where AI visibility is a significant business priority and where manual sampling at scale isn't feasible. If your team is regularly losing deals to competitors who are getting recommended by AI tools, the investment in tooling to understand the gap is justified.

How long does it take?

Timelines vary significantly by platform and by what you're fixing.

Perplexity: fastest. Structural fixes (updating robots.txt, adding schema, rewriting content to answer-first format) can produce results within two to six weeks. One documented case study showed a brand moving from zero to a 16.5% citation rate on relevant queries within six weeks.

Google AI Overviews: aligned with traditional SEO timelines. Content and structural improvements take two to four weeks to affect AI Overview citations; authority and backlink changes take longer.

ChatGPT Search (Bing-powered): a few weeks for content indexed by Bing, tied to Bing's crawl frequency for your site.

ChatGPT base model (training data): measured in months or longer, dependent on model retraining cycles. Publishing high-quality, widely referenced content now builds the probability of appearing in future training datasets, but there's no direct feedback loop.

The sequence that works

If you're starting from scratch, here's the order of operations.

Week 1: Audit your robots.txt and fix any blocks on retrieval crawlers. Verify server-side rendering. Set up Bing Webmaster Tools and submit your sitemap. Check Google Search Console for crawl errors.

Weeks 2 to 3: Audit your top ten pages for answer-first structure. Rewrite section openings to lead with direct answers in 40 to 75 words. Add or fix schema markup on key pages: at minimum, FAQPage where relevant, Article on blog content, and Organization on your homepage.

Month 1: Create your llms.txt file. Build out or refresh your Bing-indexed content. Begin systematic manual sampling of AI tool responses for your target queries to establish a baseline.

Ongoing: Develop genuine external presence through Reddit participation, third-party directory accuracy, and press coverage. Refresh content regularly and update dateModified timestamps. Track AI referral traffic in GA4 monthly. Review and update your robots.txt quarterly as the AI crawler landscape evolves.