How AI Search Grounding Actually Works: Google vs OpenAI vs Anthropic


An analysis of how Google, OpenAI, and Anthropic handle web grounding, comparing their search processes, citation rates, and how they process page content.


When you ask a modern artificial intelligence model a question that needs fresh facts, it does not just answer from memory. It runs a web search, reads the results, and weaves those pages into its response. This process is called grounding. But behind the scenes, different platforms handle this search in wildly different ways.

Every platform uses a pipeline called a grounding funnel. They query the web, receive pages, extract readable text, and finally, cite their sources. But how they tighten this funnel reveals their unique personalities.

Google is highly economical. Its model, Gemini, retrieves only a few pages and cites almost every single one of them. It does not waste space on sources it did not use, though it wraps the actual links in redirect web addresses.

OpenAI takes the opposite approach. It is incredibly fast and casts a massive net. It might pull in nearly forty pages to read short snippets of text, but then cite only two. With OpenAI, there is a huge gap between what the model reads and what it actually credits.

Finally, Anthropic's Claude is the most thorough, but also the slowest. It performs a deep, two-pass reading process. Claude gives you high visibility, showing you not only what it cited, but also the pages it considered and rejected.

Ultimately, a single search query on the same day can look completely different depending on the model you use. What one artificial intelligence considers a crucial source, another might ignore entirely.

When you ask a modern AI model a question that needs fresh facts, it doesn’t answer from memory alone. It runs its own web search, reads what comes back, and weaves some of those pages into its answer. That process is called grounding. But “it searches the web” hides a lot. Each platform receives a different number of pages, keeps a different fraction of them, hands back a different kind of evidence, and cites at a wildly different rate. To show this concretely, we ran the exact same query (“best ai seo agency 2026”) through all three on the same day, with no location set, and inspected the raw grounding data each one returned.

The three runs:

Three answers to one question, three completely different shapes underneath.

The one idea that explains all three: the grounding funnel

Every platform runs the same basic pipeline, just with different valves:

search query → pages RECEIVED → pages with READABLE content → pages CITED in the answer

Received = every URL the search step handed back to the model.
Readable content = the subset the model actually got text for (a snippet, a window, or an encrypted blob).
Cited = the pages whose URL ends up footnoted in the final answer.
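To make the three funnel stages concrete, here is a minimal Python sketch that stores the counts reported in this article and computes a citation rate for each run. The `Funnel` class and its field names are illustrative assumptions, not part of any platform's API; `readable` is left empty where the article gives no separate count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Funnel:
    """One grounding run. Hypothetical container for illustration only."""
    platform: str
    received: int                    # URLs the search step handed back
    cited: int                       # URLs footnoted in the final answer
    readable: Optional[int] = None   # None when no separate count is reported

    @property
    def citation_rate(self) -> float:
        """Fraction of received pages that end up cited."""
        return self.cited / self.received

    @property
    def uncited(self) -> int:
        """Pages that were received but never footnoted."""
        return self.received - self.cited


# Counts as reported in this article for "best ai seo agency 2026".
RUNS = [
    Funnel("Google", received=7, cited=7),
    Funnel("OpenAI", received=39, cited=2, readable=37),
    Funnel("Anthropic", received=14, cited=9),
]

for run in RUNS:
    print(f"{run.platform:<10} {run.cited}/{run.received} cited "
          f"({run.citation_rate:.0%})")
```

With these inputs the rates come out at 100%, about 5%, and about 64%.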

The gap between received and cited is where each platform reveals its personality. Google cites almost everything it receives; OpenAI receives a flood and cites a trickle; Anthropic sits in between. Let’s walk through each.

Google (Gemini): “I only show you what I used”

Google’s run is the most economical on input. The request used only 205 input tokens, basically just the question. The heavy lifting happened server-side, and the model streamed back a long, fully written answer (15,321 output tokens).

It rewrote the question into two searches before answering:

top ai seo agencies 2026
best ai seo agency 2026

The funnel barely narrows: 7 received, 7 cited. This is the defining trait of Google’s grounding — it doesn’t expose pages it didn’t use. Every source you see is a grounding chunk that genuinely supported a sentence. Behind those 7 sources sit 24 support segments: individual answer sentences mapped back to the page that justifies them. For example, this sentence —

“In 2026, the landscape of search engine optimization has shifted dramatically from traditional ‘blue links’ to Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO)”

— is backed by both europeanbusinessreview.com and thriveagency.com.
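The mapping from support segments to sources can be sketched as a small inverted count. The `SupportSegment` class and its `source_indices` field are hypothetical names chosen for this example and are not claimed to match Google's response schema; the sample data reuses the sentence and the two domains quoted above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SupportSegment:
    """One answer sentence plus the sources that back it (hypothetical shape)."""
    text: str
    source_indices: list[int]   # positions in the list of sources


def cites_per_source(sources: list[str], segments: list[SupportSegment]) -> Counter:
    """Count how many support segments point at each source."""
    counts: Counter = Counter()
    for segment in segments:
        # set() guards against the same source being listed twice for one segment
        for index in set(segment.source_indices):
            counts[sources[index]] += 1
    return counts


sources = ["europeanbusinessreview.com", "thriveagency.com"]
segments = [
    SupportSegment(
        # Opening words of the sentence quoted above
        text="In 2026, the landscape of search engine optimization has shifted dramatically",
        source_indices=[0, 1],
    ),
]

counts = cites_per_source(sources, segments)
print(counts.most_common())
```

Running the same count over all 24 support segments of a run would give a per-source tally of how often each page was leaned on.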

The URL catch: everything arrives wrapped in a redirect. Google never hands you the real link directly. Each source comes as a vertexaisearch.cloud.google.com/grounding/... redirect that has to be resolved to recover the true destination:

| What Google sends | What it resolves to | Cites |
| --- | --- | --- |
| vertexaisearch.cloud.google.com/grounding/... | https://thriveagency.com/news/best-ai-seo… | 6 |
| vertexaisearch.cloud.google.com/grounding/... | https://dejanmarketing.com/best-ai-seo… | 6 |
| vertexaisearch.cloud.google.com/grounding/... | https://wellows.com/blog/top-ai-seo… | 5 |
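One way to unwrap these redirects is to follow them and read back the final URL. The sketch below uses only Python's standard library and assumes the redirect endpoint answers a plain HEAD request, which is not verified here. The function names and the injectable `open_url` parameter are choices made for this example.

```python
import urllib.request
from urllib.parse import urlsplit

REDIRECT_HOST = "vertexaisearch.cloud.google.com"


def is_grounding_redirect(url: str) -> bool:
    """True for URLs on the redirect host whose path starts with /grounding."""
    parts = urlsplit(url)
    return parts.hostname == REDIRECT_HOST and parts.path.startswith("/grounding")


def resolve(url: str, open_url=urllib.request.urlopen, timeout: float = 10.0) -> str:
    """Return the destination behind a grounding redirect.

    Non-redirect URLs are returned unchanged without any network call.
    Assumption: the endpoint accepts HEAD; switch to GET if it does not.
    """
    if not is_grounding_redirect(url):
        return url
    request = urllib.request.Request(url, method="HEAD")
    # urlopen follows HTTP redirects; geturl() reports where it ended up.
    with open_url(request, timeout=timeout) as response:
        return response.geturl()
```

Passing `open_url` as a parameter keeps the function testable offline and lets a caller swap in a cached or rate-limited opener.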

It also gives you the retrieved page content per source — e.g. a Catalyst review snippet stamped (2026-06-09).

Google’s philosophy: tight, citation-first, nothing wasted — but the real URLs are hidden behind redirects you must unwrap.

OpenAI (gpt-5.5): “Here’s a huge pile — I’ll cite two of them”

OpenAI was the fastest by far (14 seconds) and ran with reasoning effort set to none (just 60 reasoning tokens), yet it cast the widest net.

It fanned the question into three searches:

best AI SEO agencies 2026 AI search optimization agency
top AI SEO agency generative engine optimization 2026
best GEO agency AI search optimization 2026

The funnel is dramatic: 39 received → 37 with readable content → 2 cited.

That’s roughly a 20:1 drop. OpenAI is happy to show you 37 pages of evidence but footnotes only the two it leaned on.

The snippets are a sliding window. This is OpenAI’s most distinctive feature.

Every page comes back as a short extractive window, explicitly tagged with a word limit. In this run:

6 pages capped at a 100-word window (these are “news”-class results)
31 pages capped at a 200-word window (general web / forum results)

Each window is a ...-joined collage of the most relevant passages plus freshness metadata. Here is the actual snippet OpenAI kept for the page it ended up citing, marketingltb.com (search-class, 200-word window, published “2 weeks ago”):

10 Best AI SEO Agencies in 2026
… In 2026, Directive has extended this approach into AI search optimization, building content architectures designed to appear in Google AI Overviews, ChatGPT recommendations, and Perplexity answers …
… iPullRank, a New York-based enterprise SEO and content strategy agency founded by Mike King … is one of the most technically distinctive AI SEO agencies in the market.
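How OpenAI actually selects passages is not visible in the grounding data, so the following is only a sketch of the general idea: a word-capped extractive window assembled from pre-ranked passages. The greedy selection, the `build_window` name, and the joiner are assumptions; only the 100-word and 200-word limits per class come from this run.

```python
# Word limits observed in this run, keyed by result class.
WINDOW_WORDS = {"news": 100, "search": 200}


def build_window(passages: list[str], result_class: str, joiner: str = " ... ") -> str:
    """Greedily pack passages into a word-capped window.

    Assumes `passages` is already ordered from most to least relevant.
    A passage that does not fit the remaining budget is skipped, and
    later (shorter) passages may still be added.
    """
    budget = WINDOW_WORDS[result_class]
    kept = []
    for passage in passages:
        length = len(passage.split())
        if length > budget:
            continue
        kept.append(passage)
        budget -= length
    return joiner.join(kept)
```

The joiner itself is not counted against the budget here; a stricter version could subtract it.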

The two it actually cited, and the claim each one supports:

| Cited page | Class | Window | Claim it grounds |
| --- | --- | --- | --- |
| marketingltb.com | search | 200 w | “Several recent 2026 lists rank agencies like Directive, iPullRank, Searchbloom…” |
| techradar.com (Ahrefs) | news | 100 w | “Recent SEO platforms are also adding AI visibility tracking across ChatGPT, Perplexity, Google AI Overviews…” |

URLs are clean and direct — no redirect wrapper like Google — though cited links carry a ?utm_source=openai tracking tag.
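If you compare cited links against your own URL list, the tracking tag needs to come off first. A minimal sketch using Python's standard `urllib.parse`; the `strip_tracking` name and the example URLs are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str, params: tuple[str, ...] = ("utm_source",)) -> str:
    """Remove the named query parameters and leave the rest of the URL alone.

    Note: the remaining query string is re-encoded by urlencode, so unusual
    escaping in the original may be normalised.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in params
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.com/post?page=2&utm_source=openai#top"))
```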

OpenAI’s philosophy: retrieve aggressively, expose readable windows for everything, but cite conservatively. Great for seeing the evidence pool, weak if you equate “cited” with “considered.”

Anthropic (Claude): “I’ll read deeply, but the snippets are sealed”

Claude took the longest (138 seconds) and consumed by far the most input tokens (83,708) — because it grounds in two passes. First it narrates and searches (“I’ll search for current information on the top AI SEO agencies in 2026.”), then it feeds the retrieved material back through itself to write the final answer.

The funnel: 14 received → 9 cited, 5 unselected. Unlike Google, Claude does keep pages it ultimately didn’t cite (the 5 “unselected” ones), so you can see what it considered and passed over — e.g. it received frase.io, revvgrowth.com and stridec.com but cited none of them.
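The cited and unselected split is a plain partition of the received list. In the sketch below, the three unselected domains are the ones named above, while `cited-page.example` is a hypothetical placeholder and `split_considered` is a name made up for this example.

```python
def split_considered(received: list[str], cited: list[str]) -> tuple[list[str], list[str]]:
    """Partition received pages into (cited, unselected), keeping received order."""
    cited_set = set(cited)
    unknown = cited_set - set(received)
    if unknown:
        # A cited page that was never received points at a data problem.
        raise ValueError(f"cited but not received: {sorted(unknown)}")
    selected = [page for page in received if page in cited_set]
    unselected = [page for page in received if page not in cited_set]
    return selected, unselected


received = ["cited-page.example", "frase.io", "revvgrowth.com", "stridec.com"]
selected, unselected = split_considered(received, cited=["cited-page.example"])
print(unselected)
```

Applied to the full run, the same partition would yield 9 cited and 5 unselected pages out of 14.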

The catch: the snippet content is encrypted. Each received page carries a sealed blob — between roughly 3,200 and 4,800 bytes — that you cannot read directly. It’s opaque by design.

You recover it as “boundary lines.” By passing the encrypted material back in a second turn, the snippet can be reconstructed as a list of passages, long ones collapsed to a start […] end form. One page alone yielded 44 of these lines; across the 14 sources they range from 19 to 44 each. A sample from spicymargarita.co:

“Intero Digital is a large, full-service […] into AI-driven search environments.”
“Good AI SEO is also good SEO.”
“There is no separation.”
“Unlike traditional SEO where you […] mentions, and references inside generative outputs.”
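The start […] end form can be imitated with a few lines of Python. This is only a sketch of the collapsing idea: the real cut-off rule is not visible in the data, so the word-based `head` and `tail` thresholds and the `to_boundary_line` name are assumptions.

```python
def to_boundary_line(passage: str, head: int = 6, tail: int = 6, marker: str = "[…]") -> str:
    """Collapse a long passage to its first and last few words.

    Passages of at most head + tail words are returned unchanged.
    The thresholds are illustrative, not Anthropic's actual rule.
    """
    words = passage.split()
    if len(words) <= head + tail:
        return passage
    return " ".join(words[:head] + [marker] + words[-tail:])


print(to_boundary_line("Good AI SEO is also good SEO."))
```

Short passages such as the one printed here pass through untouched, which matches the uncollapsed lines in the sample above.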

And Claude records the exact verbatim span it quoted into the answer (its cited_text), e.g.:

“Good AI SEO is also good SEO. There is no separation. Thrive integrates AI-powered SEO into every campaign…”

Anthropic’s philosophy: deep, two-pass reading with a generous citation rate and full visibility into considered-but-rejected sources. The raw snippets are sealed and only recoverable indirectly, and the second pass is what makes it the slowest and most token-hungry of the three.

Side by side

What this means in practice

If you want to know what an AI cited, the platform changes the answer. The same query yields 7, 2, and 9 cited sources. “AI mentioned us” is only meaningful relative to a specific platform’s citation behaviour.

OpenAI’s wide-receive / narrow-cite split is a visibility trap. Being one of the 37 pages OpenAI read is very different from being one of the 2 it cited. Both are worth tracking, but they’re not the same win.

Google’s redirect wrapping means any tool reading Gemini grounding must resolve URLs before it can match them to your domains.

Anthropic gives the richest “considered set” (cited + rejected) but hides the snippet text, so reconstructing what it actually read costs a second pass, plus real time and tokens.
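For tracking purposes, the distinction between "read" and "cited" can be reduced to a three-way status per domain. A minimal sketch, assuming URLs arrive with a scheme and already unwrapped; `domain_of`, `visibility`, the status labels, and the URL paths in the example are all illustrative.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Lower-cased host without a leading 'www.'. Requires a URL with a scheme."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def visibility(domain: str, received_urls: list[str], cited_urls: list[str]) -> str:
    """Classify a domain as 'cited', 'read, not cited', or 'absent' for one run."""
    if domain in {domain_of(url) for url in cited_urls}:
        return "cited"
    if domain in {domain_of(url) for url in received_urls}:
        return "read, not cited"
    return "absent"


# Hypothetical paths; only the domain names come from the runs described above.
received = ["https://www.marketingltb.com/a?utm_source=openai", "https://frase.io/b"]
cited = ["https://www.marketingltb.com/a?utm_source=openai"]

print(visibility("marketingltb.com", received, cited))
print(visibility("frase.io", received, cited))
```

Keeping the status per platform, rather than merging runs, preserves the point that the same page can be cited by one model and ignored by another.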

Same question. Same day. Three very different machines deciding what counts as evidence.

Dan Petrovic · Jun 13, 16:31