SRO & Grounding Snippets

Source: dejan.ai/blog/category/ai-seo/sro/
Author: Dan Petrovic, DEJAN AI
Posts analyzed: 5 (Sep 2025 – Feb 2026)


What is SRO?

SRO — Selection Rate Optimization — is a new discipline coined by DEJAN that addresses visibility in AI-powered search (Google AI Mode, Gemini Chat, AI Overviews). It is the AI-native successor to traditional SEO click-through-rate optimization.

The core premise: ranking #1 in traditional search is necessary but no longer sufficient. In AI search, your page content goes through a grounding pipeline that extracts only select sentences to feed to the generative model. If your content isn’t selected and grounded, you’re invisible — even if you rank.


The Core Metric: Selection Rate (SR)

Selection Rate (SR) measures how often an AI system selects and incorporates a specific source from the total set of grounding results it retrieves.

SR = (Number of selections / Total available results) × 100

SR is the Gen AI equivalent of CTR. Unlike CTR, which requires a user click, SR captures the AI’s implicit selection behavior — what information actually influences outputs versus what gets retrieved but ignored.
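The SR formula above is simple enough to express directly. A minimal sketch (the function name and guard for zero results are my additions, not DEJAN's):

```python
def selection_rate(selections: int, total_results: int) -> float:
    """Selection Rate (SR): share of retrieved grounding results the
    model actually incorporates, expressed as a percentage."""
    if total_results == 0:
        return 0.0
    return selections / total_results * 100

# e.g. cited in 3 of the 12 grounding results retrieved for a query
print(selection_rate(3, 12))  # → 25.0
```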


How Google’s Grounding Pipeline Works

DEJAN reverse-engineered Google’s Gemini grounding pipeline by examining raw groundingSupports and groundingChunks from the API. The pipeline operates in this sequence:

  1. User enters a prompt.
  2. Query fanout: A model decomposes the prompt into single-intent sub-queries (fanout queries).
  3. Retrieval: For each fanout query, Google’s search index returns ranked results, narrowed to ~5–20 sources per query.
  4. Extractive summarization (snippet construction): For each selected result, the system builds a grounding snippet. Page content is chunked into sentences, each scored against the query, and the highest-scoring chunks are assembled into the snippet — joined by ellipses where non-contiguous.
  5. Grounding context assembly: All snippets across all sources are supplied to the model as context alongside the user prompt, media, and personalization signals.
  6. Synthesis & attribution: The model generates its answer, and each claim is attributed back to specific source sentences.

Key insight: Because snippets are query-dependent, the same page yields different extractions for different fanout queries.
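Step 4 of the pipeline can be sketched as a toy extractive summarizer. Google's scorer is a learned relevance model; a plain token-overlap score stands in for it here, and the ellipsis-joining of non-contiguous picks mirrors what the API's groundingChunks show. Everything below is an illustrative assumption, not the actual implementation:

```python
import re

def build_snippet(page_text: str, query: str, top_k: int = 3) -> str:
    """Toy sketch of extractive snippet construction: split into
    sentences, score each against the query, keep the best, and join
    non-contiguous picks with ellipses."""
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    q_tokens = set(re.findall(r"\w+", query.lower()))
    scored = []
    for i, s in enumerate(sentences):
        overlap = len(q_tokens & set(re.findall(r"\w+", s.lower())))
        scored.append((overlap, i, s))
    # keep the highest-scoring sentences, restored to page order
    top = sorted(sorted(scored, reverse=True)[:top_k], key=lambda t: t[1])
    parts, prev = [], None
    for _, i, s in top:
        if prev is not None and i != prev + 1:
            parts.append("…")  # gap between non-contiguous sentences
        parts.append(s)
        prev = i
    return " ".join(parts)

page = ("Alpha sells custom cycling jerseys. We ship worldwide. "
        "Our jerseys use recycled fabric. Contact us anytime.")
print(build_snippet(page, "custom jerseys fabric", top_k=2))
# → "Alpha sells custom cycling jerseys. … Our jerseys use recycled fabric."
```

Run the same page against a different query and different sentences win, which is exactly the query-dependence noted above.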


The Extraction Method: Extractive Summarization

Google uses extractive (not abstractive) summarization for grounding. This means it pulls exact sentences from your page — it does not rewrite or paraphrase your content for the grounding context.

Observed Extraction Characteristics

  • Query-focused selection: Sentences semantically close to the query are strongly preferred. Unrelated sections on the same page are skipped entirely.
  • Heavy positional/lead bias: Opening paragraphs are extracted almost wholesale, regardless of content.
  • Structural noise ingestion: Table-of-contents entries, section headers, link artifacts, and markers are treated as sentences and scored alongside prose.
  • Sentence-level granularity: The extraction unit is individual sentences, not passages or paragraphs.
  • Confidence scores: Each chunk carries a score from 0.1 to 1.0, representing how strongly the grounding source supports the generated chunk it is attributed to.

DEJAN successfully fine-tuned microsoft/deberta-v3-large to produce results similar to Google’s extraction behavior.


The ~2,000 Word Budget

A pivotal finding from analysis of 7,060 queries with 2,275 tokenized pages and 883,262 total snippets:

Each query operates under a fixed grounding budget of approximately 2,000 words total, distributed across sources by relevance rank.

  Percentile      Total Words per Query
  p25             1,546
  p50 (median)    1,929
  p75             2,325
  p95             2,798

This budget is remarkably consistent regardless of the number of sources used or the length of individual pages. The average grounding chunk is ~15.5 words.


Rank Determines Your Share of the Budget

The fixed budget is divided among sources based on relevance ranking:

  Rank    Median Words    Share of Total
  #1      531             28%
  #2      433             23%
  #3      378             20%
  #4      330             17%
  #5      266             13%

The #1 source gets 2× the grounding of the #5 source. You’re competing for share of a fixed pie, not expanding it.
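The per-rank math is easy to reproduce from the median shares above (treating the reported percentages as fixed fractions is my simplification; the real split varies query to query):

```python
# Median share of the grounding budget by relevance rank (from the table)
SHARES = {1: 0.28, 2: 0.23, 3: 0.20, 4: 0.17, 5: 0.13}

def words_for_rank(rank: int, budget: int = 2000) -> int:
    """Approximate grounding words a source at a given rank receives
    under the ~2,000-word per-query budget."""
    return round(budget * SHARES.get(rank, 0.0))

print(words_for_rank(1))  # → 560
print(words_for_rank(5))  # → 260
```

560 vs. 260 words reproduces the roughly 2× gap between rank #1 and rank #5.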


Content Survival Rate: Only ~32% Gets Through

On average, only about one-third of a page’s content makes it through the AI search filter into the grounding context. But this varies dramatically by page length:

  Page Length    Avg Grounding Words    Coverage
  <1K words      370                    61%
  1–2K words     492                    35%
  2–3K words     532                    22%
  3K+ words      544                    13%

Grounding plateaus at ~540 words / ~3,500 characters. Pages over 2,000 words see sharply diminishing returns — more content dilutes your coverage percentage without increasing what gets selected.
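Because grounded words plateau near ~540 while page length keeps growing, coverage falls as roughly plateau ÷ length. A rough model of the table above (the plateau-as-ceiling assumption is mine, not a stated rule):

```python
PLATEAU_WORDS = 540  # observed grounding ceiling, in words

def coverage_pct(page_words, grounded=None):
    """Percent of a page that survives into the grounding context.
    If no measured grounded-word count is given, assume extraction
    caps out at the observed plateau."""
    g = grounded if grounded is not None else min(PLATEAU_WORDS, page_words)
    return g / page_words * 100

print(round(coverage_pct(2500), 1))  # → 21.6, close to the 2–3K row
print(round(coverage_pct(4000), 1))  # → 13.5, matching the 3K+ row
```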


What Gets Included vs. Excluded

Based on DEJAN’s annotated analysis of actual grounding extractions:

Included (Green)

  • Core service/value proposition statements
  • Specific customization and feature details
  • Key process steps and how-to information
  • Pricing, product specifications, concrete details
  • Customer support/interaction information
  • High-relevance keyword/topic phrases

Excluded (Red)

  • Navigation and structural elements (menus, footers, headers)
  • Time-sensitive promotional claims (“50% off”)
  • Unrelated product categories
  • Verbose customer review text (only summaries survive)
  • Legal/copyright boilerplate
  • Decorative or redundant text

Primary Bias on Selection Rate

The primary bias affecting SR is the model’s internal relevance perception of the grounding entity (brand, site, source). This is essentially the model’s pre-existing “worldview” about how relevant a source is for a given topic — formed during training and fine-tuning.

How It Works

If a brand is perceived as highly relevant for a topic (e.g., “custom cycling jerseys”), it’s much more likely to achieve a higher SR when supplied as a grounding source. A brand with low primary bias for that topic will be deprioritized even if it appears in the result set.

Can Primary Bias Be Influenced?

  • Pre-training data: Very unlikely to influence, as LLMs have moved past noisy web data in favor of curated datasets.
  • Fine-tuning data: Definitely a factor. Traditional SEO on-page and off-page work can influence fine-tuning data.
  • Timeline: ~3–6 months for major fine-tunes and sub-releases; ~annually for major model releases.

Primary Bias Detection: Tree Walker Algorithm

DEJAN developed a “Tree Walker” algorithm that walks the probability paths of what a model wants to say about a brand, identifying high-uncertainty spots — token positions where the model is least confident about associating a concept with the brand. These represent opportunities for brand-association strengthening.
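DEJAN has not published Tree Walker's internals, but the core idea — flagging token positions where the model is least confident — can be illustrated with per-position entropy over next-token distributions. This is a speculative toy, not the actual algorithm:

```python
import math

def uncertainty_spots(step_probs, threshold=1.0):
    """Flag token positions with high-entropy (low-confidence)
    next-token distributions. step_probs is a list of {token: prob}
    dicts, one per generation step."""
    spots = []
    for pos, dist in enumerate(step_probs):
        entropy = -sum(p * math.log2(p) for p in dist.values() if p > 0)
        if entropy >= threshold:
            spots.append((pos, entropy))
    return spots

steps = [
    {"brand": 1.0},                 # fully confident
    {"jerseys": 0.5, "kits": 0.5},  # maximally uncertain between two
    {"custom": 0.9, "team": 0.1},   # mostly confident
]
print(uncertainty_spots(steps))  # → [(1, 1.0)]
```

Positions like index 1 are the "high-uncertainty spots" where brand-association strengthening would be targeted.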


Search Compression Effect

Analysis of 158 grounding responses revealed a power-law relationship between snippet count and snippet length:

$$\bar{L} = 1283.15 \times N^{-0.07}$$

The exponent β ≈ 0.07 shows a weak but consistent compression effect: as more snippets are added, average snippet length decreases slightly. The system emphasizes coverage over brevity, compressing only mildly — a sign of balanced aggregation rather than aggressive summarization.

Total text volume remains relatively stable across responses, implying word-limit constraints operate at the response level rather than per snippet.
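Plugging numbers into the fitted power law makes the "mild compression" claim concrete:

```python
def mean_snippet_len(n_snippets: int) -> float:
    """Fitted power law from the 158-response analysis:
    mean length = 1283.15 * N^(-0.07)."""
    return 1283.15 * n_snippets ** (-0.07)

# Doubling the snippet count shrinks mean snippet length by only ~4.7%
shrink = 1 - mean_snippet_len(20) / mean_snippet_len(10)
print(round(shrink * 100, 1))  # → 4.7
```

A doubling of N costing under 5% of mean length is the "balanced aggregation" pattern: the system adds snippets rather than aggressively shortening them.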


Strategic Implications for Content Creation

1. Density Beats Length

A tight 800-word page can get 50%+ of its content grounded. A 4,000-word page gets ~13%. Focus on making every sentence count rather than adding volume.

2. Front-Load Key Information

The lead/positional bias means sentences appearing early and standalone are much more likely to be extracted. Put your most important, query-relevant statements at the top.

3. Write Grounding-Friendly Copy

Clear, factual, self-contained statements perform best. Each sentence should be able to stand alone as a useful answer fragment. Avoid sentences that rely heavily on surrounding context to make sense.

4. Address Fanout Query Angles

The AI decomposes prompts into sub-queries. Structure content to directly address multiple facets of intent, not just the primary keyword.

5. Eliminate Structural Noise

Google’s system ingests ToC entries, headers, and navigation artifacts as “sentences.” Clean, well-structured pages reduce noise competing with your actual content for selection.
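A simple pre-publish audit can catch the kinds of artifacts that get scored as "sentences." The patterns and fragment heuristic below are my assumptions for illustration, not Google's actual filtering logic:

```python
import re

# Heuristic: lines that look like navigation, ToC, or boilerplate
NOISE_PATTERNS = re.compile(
    r"^(home|menu|share|skip to content|table of contents|©|copyright)",
    re.IGNORECASE,
)

def is_structural_noise(line: str) -> bool:
    """Flag text fragments likely to compete with real prose for
    selection: known boilerplate openers, or very short lines that
    don't end like a sentence."""
    text = line.strip()
    looks_like_fragment = (
        len(text.split()) <= 3 and not text.endswith((".", "!", "?"))
    )
    return bool(NOISE_PATTERNS.match(text)) or looks_like_fragment

print(is_structural_noise("Home | About | Blog"))                  # → True
print(is_structural_noise("Jerseys ship within five business days."))  # → False
```

Running every extracted "sentence" of a page through a check like this shows how much of your grounding budget noise would consume.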

6. Consider Modular Content Architecture

Dan Petrovic suggested testing “small modular content pieces that can be assembled into different content units like lego blocks” — controlling completeness of context and avoiding undesirable narrative fragmentation.

7. Build Brand-Level Relevance Signals

Primary bias is the biggest lever on SR and it’s rooted in model training data. Invest in the traditional off-page and on-page signals that shape how models perceive your brand’s topical authority.


Tools & Resources

  • Grounding Snippet Extraction Tool: snippets.dejan.ai — Enter a query to see which URLs and exact sentences Gemini extracts for grounding.
  • AI Rank: airank.dejan.ai — Measures brand relevance as perceived by Google’s AI.
  • Tree Walker Algorithm: Early-access tool for detecting low-confidence brand associations in model outputs.
  • Fine-tuned DeBERTa model: Demo replicating Google’s extractive behavior using microsoft/deberta-v3-large.

Key Takeaways (TL;DR)

  1. AI search uses extractive summarization — Google pulls exact sentences from your pages, not paraphrases.
  2. There’s a ~2,000 word budget per query split among sources by rank. You’re fighting for a share of a fixed pie.
  3. Only ~32% of your content survives the grounding filter on average. Long pages fare worse.
  4. Rank still matters — the #1 source gets 2× the grounding share of #5.
  5. Primary bias (model’s internal brand perception) is the biggest factor in whether your content gets selected. It takes 3–6 months to shift via fine-tuning cycles.
  6. Density > Length — concise, front-loaded, self-contained sentences that directly address query intent win the grounding game.
  7. SRO is the new discipline — optimizing not just for ranking, but for being selected and grounded by AI systems.

