Research Series

LLM Citation vs Organic Visibility

When AI cites what Google won't index: a cross-client analysis

🔬
What This Research Looks At

This research cross-references GA4 LLM referral traffic (sessions from ChatGPT, Perplexity, Gemini, Copilot, Claude and Kagi) against Google Search Console coverage data to understand the relationship between organic visibility and LLM citation. The headline finding across both clients: Google indexation is the strongest predictor of LLM citation. Indexed pages are 5–7x more likely to receive LLM traffic than non-indexed ones, so getting content indexed remains the foundation. The secondary finding is that a small number of specific, factual pages do receive LLM citations without Google indexation, which runs against the common claim among SEO practitioners that LLMs cannot cite what is not indexed.

Double Win: indexed by Google and cited by LLMs (the gold standard)
LLM Only: cited by LLMs but Google refuses to index (hidden opportunity)
GEO Gap: indexed by Google but zero LLM citations (visible but AI-invisible)
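
As a rough illustration of the first step in this kind of analysis, tagging sessions by LLM source, the sketch below maps a referrer to one of the six sources named above. The hostnames in `LLM_REFERRER_HOSTS` and the `llm_source` helper are assumptions made for this example, not values taken from the research; check them against the referrers that actually appear in your own GA4 data.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed referrer hostnames (hypothetical mapping); verify against the
# session source / referrer values present in your own GA4 property.
LLM_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "claude.ai": "Claude",
    "kagi.com": "Kagi",
}


def llm_source(referrer: str) -> Optional[str]:
    """Return the LLM source for a referrer URL or bare hostname, else None."""
    # Accept both full URLs and bare hostnames such as "claude.ai".
    target = referrer if "://" in referrer else "//" + referrer
    host = urlparse(target).hostname
    if not host:
        return None
    if host.startswith("www."):
        host = host[4:]
    for known, name in LLM_REFERRER_HOSTS.items():
        # Match the host itself or any subdomain of it.
        if host == known or host.endswith("." + known):
            return name
    return None


print(llm_source("https://chatgpt.com/"))
print(llm_source("www.perplexity.ai"))
print(llm_source("https://www.google.com/search?q=drones"))
```

Sessions where `llm_source` returns a name would count as LLM referral traffic; everything else stays in its usual channel.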
💡
Cross-Client Takeaways
Client A
Double Wins: 15
LLM-Only (not indexed): 2
LLM citations as % of indexed: 9%
Total LLM sessions: 233
Top LLM source: ChatGPT (80%)
Client B (ContentForge)
Double Wins: 34
LLM-Only (not indexed): 3
LLM citations as % of indexed: 52%
Total LLM sessions: 122
Top LLM source: ChatGPT (52%)
Indexation Is the Strongest Predictor of LLM Citation
Across both clients, indexed pages are 5–7x more likely to receive LLM citations than non-indexed ones. Client B: 52% of indexed pages cited vs 7.7% of non-indexed (about 6.8x). Client A: 9.4% vs 1.8% (about 5.3x). Getting Google to index content remains the single most important step for LLM visibility: GEO does not bypass SEO fundamentals, it builds on them.
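
The multiplier can be reproduced from the page counts reported in the classification sections of this report. The sketch below is a minimal worked example; the `citation_rate` and `indexation_multiplier` names are illustrative, and the counts are the cited and total page counts for indexed and non-indexed pages per client.

```python
def citation_rate(cited_pages: int, total_pages: int) -> float:
    """Share of pages in a group that received at least one LLM session."""
    return cited_pages / total_pages


def indexation_multiplier(indexed: tuple, not_indexed: tuple) -> float:
    """Ratio of the indexed citation rate to the non-indexed citation rate."""
    return citation_rate(*indexed) / citation_rate(*not_indexed)


# (cited pages, total pages) taken from the report's classification counts.
clients = {
    "Client A": {"indexed": (15, 160), "not_indexed": (2, 113)},
    "Client B": {"indexed": (34, 65), "not_indexed": (3, 39)},
}

for name, groups in clients.items():
    idx_rate = citation_rate(*groups["indexed"])
    non_rate = citation_rate(*groups["not_indexed"])
    ratio = indexation_multiplier(groups["indexed"], groups["not_indexed"])
    print(f"{name}: {idx_rate:.1%} vs {non_rate:.1%} -> {ratio:.1f}x")
```

With only 2 and 3 non-indexed cited pages in the denominators, the ratio is sensitive to a single page moving between groups, so it is better read as a range than as a precise figure.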
Structured, Factual Content Wins Both Channels
The same content types dominate both Google rankings and LLM citations: cost comparisons, regulatory guides, "X vs Y" formats, and data-led pages. Both algorithms reward content that directly and specifically answers a question. ContentForge's structured output achieved a 52% Double Win rate from indexed pages in 3 months versus Client A's 9% with manually produced content.
LLMs Occasionally Override Google's Rankings
A small but notable exception: Client B's /services/drones-for-surveying-and-mapping sits at an average position of 29.7 on Google yet attracted 8 LLM sessions. Across both clients, 5 pages received LLM citations despite not being indexed. These are almost always highly specific, factual posts. They are the edge case worth fixing, not the rule to build strategy around.
AI Is Already a Commercial Referral Channel
Both clients see LLMs referring traffic to commercial and brand pages: /apply-now and /founder-profile for Client A; /contact-us, /who-we-are, and /services for Client B. AI is acting as a bottom-of-funnel channel, not just a research tool. These sessions need conversion tracking separate from organic.
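
One way to separate these sessions in reporting is to label each LLM landing page by intent. The sketch below is only an illustration: the rule that `/post/` and `/blog/` paths are informational and everything else is commercial or brand is an assumption made for this example, and `page_intent` is a hypothetical helper rather than part of the study's method.

```python
from collections import Counter

# Assumption: blog-style path prefixes mark informational content.
INFORMATIONAL_PREFIXES = ("/post/", "/blog/")


def page_intent(path: str) -> str:
    """Label a landing page path as informational or commercial/brand."""
    if path.startswith(INFORMATIONAL_PREFIXES):
        return "informational"
    return "commercial_or_brand"


# Landing pages named in this report, used here purely as sample input.
landing_pages = [
    "/apply-now",
    "/founder-profile",
    "/contact-us",
    "/who-we-are",
    "/services",
    "/blog/drone-thermography",
    "/post/insead-mba-deadlines",
]

intent_counts = Counter(page_intent(path) for path in landing_pages)
print(intent_counts)
```

Sessions labelled `commercial_or_brand` are the ones worth attaching conversion events to, reported separately from organic search.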
📋
Client Analysis 1 of 2
Client A
MBA & Masters Admissions Consultancy · Data: April 2026
Site Indexation Snapshot
Total Pages: 273 (known to Google)
Indexed: 160 (59% of site)
Crawled, Not Indexed: 48 (Google visited, rejected)
Discovered, Not Indexed: 65 (never crawled)
Indexation Context
41% of the site (113 pages) is invisible to Google. The bulk of non-indexed pages are question-format posts published in late 2025, which Google appears to be treating as thin or duplicate content. This makes the LLM citation picture below more striking: AI models are citing content that Google has explicitly declined to index.
LLM Traffic Snapshot
Total LLM Sessions: 233 (Dec 2025 – Feb 2026)
Pages Cited: 48 (unique URLs receiving LLM traffic)
Top Source: ChatGPT, 186 sessions (80%)
Other Sources: 4 (Perplexity, Gemini, Copilot, Kagi)
Citation vs Indexation Classification
✅ Double Win: 15 (pages indexed by Google and cited by LLMs, capturing both channels)
⚡ LLM Only, Google Rejected: 2 (LLMs cite these pages but Google refuses to index them)
⚠️ GEO Gap: 145 (indexed by Google but receiving zero LLM citations)
🔕 Dark Pages: 111 (not indexed by Google, no LLM citations either)
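
The four categories above reduce to two yes/no questions per page: is it indexed, and did it receive any LLM sessions? The sketch below shows that rule as a small function. The `classify_page` name and the sample pages are hypothetical; the final check uses Client A's reported counts to confirm the quadrants add up to the indexation snapshot.

```python
from collections import Counter


def classify_page(indexed: bool, llm_sessions: int) -> str:
    """Place a page in one of the four quadrants used in this report."""
    cited = llm_sessions > 0
    if indexed and cited:
        return "Double Win"
    if cited:
        return "LLM Only"
    if indexed:
        return "GEO Gap"
    return "Dark Page"


# Hypothetical pages: (path, indexed in GSC, LLM sessions in GA4).
pages = [
    ("/example/fees-guide", True, 5),
    ("/example/interview-tips", False, 3),
    ("/example/about", True, 0),
    ("/example/old-post", False, 0),
]
quadrants = Counter(classify_page(indexed, sessions) for _, indexed, sessions in pages)

# Consistency check on Client A's reported quadrant counts.
client_a = {"Double Win": 15, "LLM Only": 2, "GEO Gap": 145, "Dark Page": 111}
indexed_total = client_a["Double Win"] + client_a["GEO Gap"]      # indexed pages
not_indexed_total = client_a["LLM Only"] + client_a["Dark Page"]  # non-indexed pages
print(quadrants, indexed_total, not_indexed_total)
```

The two totals match the 160 indexed and 113 non-indexed pages in the site indexation snapshot.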
The 2 Pages LLMs Cite That Google Won't Index
Page URL | GSC Status | Last Crawled | Google Clicks | LLM Sessions
/post/insead-mba-interview-preparation-guide | Crawled – Not Indexed | Nov 2025 | 0 | ~3
/post/how-to-discuss-setbacks-and-challenges-in-your-insead-mba-interview | Crawled – Not Indexed | Nov 2025 | 0 | ~2
What This Means
Both are INSEAD interview posts, question-format content that Google is rejecting as thin. Despite that, ChatGPT is citing them and sending real sessions. These are the exception that proves the rule: specific, factual content can get LLM pickup without Google indexation, but the far more valuable opportunity is fixing the indexation so these pages capture both channels rather than just one.
15 Double Win Pages β€” Indexed + LLM Cited
Page | GSC Clicks | Impressions | LLM Source(s)
/post/insead-mba-fees-cost-breakdown | 52 | 54,845 | ChatGPT, Perplexity
/post/insead-mba-deadlines | 36 | 14,408 | ChatGPT
/post/which-european-mba-is-best-for-consulting-careers-insead-lbs-or-iese | 54 | 10,126 | ChatGPT, Perplexity
/post/insead-mba-gmat-gre-scores | 49 | 7,628 | ChatGPT
/post/finding-value-a-guide-to-europe-s-most-affordable-top-mba-programmes | 39 | 17,227 | ChatGPT
/post/insead-mba-cv-resume-formatting-tips-templates | 47 | 2,591 | ChatGPT
/post/insead-vs-lbs-vs-oxford-mba | 42 | 5,276 | ChatGPT, Perplexity
/post/best-mba-programs-in-europe | 23 | 10,095 | ChatGPT
/post/insead-scholarships-application-guide | 7 | 642 | ChatGPT
/post/insead-mba-essays | 3 | 97 | ChatGPT
/post/insead-vs-lbs | 7 | 1,510 | ChatGPT
/founder-profile (founder profile page) | - | - | ChatGPT
/apply-now (commercial page) | - | - | ChatGPT
/emba/cambridge | - | - | ChatGPT
/post/will-lbs-and-cambridge-judge-accept-my-gre-score-instead-of-gmat | 15 | 1,602 | Perplexity
Pattern in the Double Wins
The 15 Double Win pages are mostly one of three types: data-led pages (fees, GMAT scores, deadlines), comparative pages ("X vs Y" or "best for Z"), or structured factual guides. These formats answer AI queries directly and concisely, which is exactly what both Google and LLMs reward. The founder profile and /apply-now pages being cited by ChatGPT is notable: AI is acting as a sales referral channel for branded search.
📋
Client Analysis 2 of 2
Client B
Drone Inspection & Services · ContentForge (Jan–Mar 2026) · Data: April 2026
Context
Client B is a GEOforge ContentForge client. ContentForge is an automated content production programme publishing AI-assisted blog posts targeting drone inspection, insurance, energy, and construction use cases. This report covers data from January to end of March 2026, showing what structured automated content delivery produced across both Google and LLM channels during that period.
Site Indexation Snapshot
Total Pages: 104 (known to Google)
Indexed: 65 (63% of site)
Crawled, Not Indexed: 21 (Google visited, rejected)
Discovered, Not Indexed: 18 (never crawled)
Indexation Context
37% of the site (39 pages) is not indexed, a similar ratio to Client A. The crawled-not-indexed pages are predominantly newer ContentForge posts from Feb–March 2026, suggesting Google is applying quality filtering to the higher-volume automated content. The 63% index rate on a programmatic content site is a reasonable baseline and reflects well-structured, topic-specific content.
LLM Traffic Snapshot (Jan–Mar 2026)
Total LLM Sessions: 122 (across 5 AI sources)
Pages Cited: 37 (unique URLs receiving LLM traffic)
Top Source: ChatGPT, 64 sessions (52%)
Notable Source: Claude, 27 sessions (22%), 2nd largest
Claude.ai as a Referral Source
Claude.ai sent 27 sessions, making it the second-largest LLM referral source, ahead of Gemini (18) and Perplexity (12). This is notable for a ContentForge client: AI-assisted content is being indexed and cited by the same AI models used to help produce it. Claude referred traffic to the homepage, disaster relief posts, and the /services page, covering both informational and commercial intent.
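
The source shares quoted for Client B follow directly from the session counts. The sketch below is a small worked example using the four named sources and the 122-session total; the remainder attributed to the fifth source is derived by subtraction, and the report does not say which source it is.

```python
# Session counts for Client B as reported (Jan–Mar 2026).
total_sessions = 122
named_sources = {"ChatGPT": 64, "Claude": 27, "Gemini": 18, "Perplexity": 12}

# Sessions not covered by the four named sources (derived, not reported).
remainder = total_sessions - sum(named_sources.values())

sessions_by_source = dict(named_sources)
sessions_by_source["Other (unnamed fifth source)"] = remainder

# Percentage share of total LLM sessions, rounded to whole percent.
shares = {
    source: round(100 * count / total_sessions)
    for source, count in sessions_by_source.items()
}

# List sources from largest to smallest share.
for source, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{source}: {sessions_by_source[source]} sessions ({share}%)")
```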
Citation vs Indexation Classification
✅ Double Win: 34 (pages indexed by Google and cited by LLMs; 92% of all LLM-cited pages)
⚡ LLM Only, Google Rejected: 3 (LLMs cite these pages but Google has not indexed them)
⚠️ GEO Gap: 31 (indexed by Google but receiving zero LLM citations)
🔕 Dark Pages: 36 (not indexed by Google, no LLM citations either)
3 Pages LLMs Cite That Google Won't Index
Page URL | GSC Status | Last Crawled | LLM Source | Sessions
/blog/blue-uas-compliance-requirements-for-critical-energy-infrastructure-inspections | Crawled – Not Indexed | Recent | Claude | 1
/blog/how-drones-preserve-disaster-scenes-for-legal-investigations | Discovered – Not Indexed | - | Claude | 1
/blog/drone-assisted-roof-inspection | Discovered – Not Indexed | - | Claude | 1
What This Means
Only 3 of 37 LLM-cited pages (8%) are not indexed, a healthier ratio than Client A's. All three were cited exclusively by Claude.ai. The topics (Blue UAS compliance, legal drone evidence, roof inspection) are specific and factual, exactly the kind of content LLMs pick up even when Google filters pages out. With content production stopped, fixing indexation on these 3 would recover the lost LLM referral potential at no additional content cost.
Top Double Win Pages β€” Indexed + LLM Cited
Page | GSC Clicks | Impressions | Avg Position | LLM Sessions | LLM Sources
/blog/drone-thermography | 11 | 3,166 | 15.2 | 10 | ChatGPT
/blog/roi-of-drone-pipeline-inspections-vs-manual-crews | 8 | 9,073 | 9.2 | 8 | ChatGPT, Gemini
/services/drones-for-surveying-and-mapping | 4 | 6,602 | 29.7 | 8 | ChatGPT
/blog/drone-vs-manual-inspections-cost-for-adjusters | 22 | 5,365 | 7.3 | 7 | ChatGPT, Gemini
/blog/drones-in-disaster-relief-drones-for-emergency-management | 13 | 4,182 | 9.7 | 7 | ChatGPT, Claude, Perplexity
/blog/can-ai-powered-drones-auto-detect-roof-hail-damage | 40 | 6,910 | 7.3 | 6 | ChatGPT, Perplexity
/blog/drones-thermal-imaging-for-detecting-pipeline-leaks | 30 | 21,124 | 8.5 | 5 | ChatGPT, Gemini, Perplexity
/blog/state-regulations-for-drone-use-in-insurance-inspections | 37 | 9,308 | 7.4 | 3 | ChatGPT
/blog/insurance-companies-that-accept-drone-inspection-reports | 36 | 6,508 | 9.6 | 4 | ChatGPT
/who-we-are (company profile page) | 21 | 837 | 5.9 | 4 | ChatGPT, Claude
Pattern in the Double Wins
The top LLM-cited pages follow a consistent pattern: cost comparisons (drone vs manual, cost breakdowns), regulatory/compliance topics (FEMA guidelines, state regulations, OSHA), and sector-specific use cases (pipeline leaks, disaster relief, hail damage). These are exactly the types of factual, decision-relevant queries AI users ask. Notably, /services/drones-for-surveying-and-mapping is ranked at an average position of 29.7 on Google yet attracted 8 LLM sessions: LLMs are citing a page Google has buried on page 3.
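
The "page 3" reading of position 29.7 can be checked with a one-line calculation. The sketch below assumes ten organic results per results page, which is a simplification; `results_page` is an illustrative helper, and because GSC reports an average position, individual queries may land on other pages.

```python
import math


def results_page(avg_position: float, results_per_page: int = 10) -> int:
    """Approximate results page for an average position (assumes 10 per page)."""
    return math.ceil(avg_position / results_per_page)


# Average positions taken from the Client B Double Win table.
for path, position in [
    ("/services/drones-for-surveying-and-mapping", 29.7),
    ("/blog/drone-thermography", 15.2),
    ("/blog/drone-vs-manual-inspections-cost-for-adjusters", 7.3),
]:
    print(f"{path}: position {position} -> page {results_page(position)}")
```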
🏗️
Study Findings
Proof Points
→ Google indexation is the single strongest predictor of LLM citation: indexed pages are 5–7x more likely to receive LLM traffic across both clients
→ Structured, factual content formats (cost comparisons, "X vs Y", regulatory guides) dominate both Google rankings and LLM citations: the same content wins both channels
→ LLMs are already acting as a bottom-of-funnel referral channel: commercial and brand pages received AI-referred traffic, not just informational posts
→ Contrary to prevailing expert opinion, LLMs can and do cite pages that Google has not indexed: 5 confirmed instances across both clients, all highly specific and factual in nature
Open Questions
! What content characteristics cause indexed pages to fall into the GEO Gap (indexed by Google but zero LLM citations)? Format, depth, and structured data are likely factors worth testing
! Does the 5–7x indexation multiplier hold across industries beyond admissions and drone services, or is it sector-dependent?
! How quickly do LLM citation rates decay when content production stops, and how much of the indexed base retains citations over a 6–12 month window?