TL;DR
We reverse-engineered Google AI Mode by analyzing 1,394 snippets across 213 prompts to see what Gemini actually receives from organic search. The findings upend traditional SEO assumptions:
What We Found:
- Snippets beat full pages 67.6% of the time in semantic relevance to queries
- Position 10 performs 98% as well as position 1 for AI visibility (2% quality difference)
- Content length has zero impact on snippet quality or AI inclusion
- 6-8 sources contribute per query — it’s multi-source synthesis, not winner-take-all
- 92% representativeness — snippets faithfully capture source pages
What It Means:
- Your snippet is your AI visibility — Gemini never sees your full page
- Rankings anywhere on page one now matter (positions 4-10 are valuable again)
- Stop padding word count — 1,500 focused words > 4,000 rambling words
- Optimize for citations, not just clicks — AI answers reduce direct traffic
- Traditional SEO still determines rankings; snippet quality determines AI inclusion
Bottom Line: Google AI Mode doesn’t bypass search; it changes what matters in search results. The SERP → Model era has begun. Adapt your strategy accordingly.
The Analysis
We analyzed 1,394 search result snippets across 213 prompts fed to Gemini to understand what the AI actually sees when providing grounded answers. The findings reveal how Google AI Mode, AI Overviews, and answer engines consume organic search data. The implications for SEO and Answer Engine Optimization (AEO) are immediate and actionable.
The core finding: Google’s snippets outperform full page content 67.6% of the time in semantic relevance to user queries while maintaining 92% representativeness of the source pages. This means when Gemini provides an AI answer, it predominantly works from focused snippets—not full web pages. Your content’s snippet is your AI visibility.
What This Means For AI SEO
If you create content for the web, these findings directly impact your AI visibility strategy:
1. Snippet optimization is AI optimization. When someone uses Google AI Mode or asks Gemini a question with grounded search, the AI model receives snippets—not your full page. Your meta descriptions, opening paragraphs, and featured snippet targets are what the AI sees and cites. Optimize for snippet clarity and completeness, not just click-through.
2. Position 10 is now nearly as good as position 1. Traditional SEO focused on top-3 rankings because click-through rates dropped dramatically afterward. In AI search, positions 1-10 perform nearly equivalently (quality variation < 2%). The AI considers all ten results when synthesizing answers. Ranking anywhere on page one is sufficient for AI visibility.
3. Page length does not determine AI inclusion. Long-form content (5,000+ words) and concise content (1,000 words) both produce equally high-quality snippets for AI consumption. The AI visibility game is about semantic density and clear topic boundaries—not word count.
4. Answer Engine Optimization is snippet engineering. AEO is not a separate discipline from SEO—it’s a specialization. The same snippets Google extracts for search results are what Gemini receives. Structured data, clear headings, and topic coherence directly influence how the AI represents your content.
5. The snippet-to-page gap reveals AI trust. Following a suggestion from Andrea Volpini, we use a “semantic tension” metric: the difference between how well a snippet matches a query and how well the full page matches. Across our dataset it averages +0.0253. Positive tension means the snippet is more relevant than the source page. Google’s extraction algorithm identifies and isolates the most relevant segments, even when the full page is diffuse. This is Google’s AI trust signal: focused, coherent content sections.
Now let us examine how we discovered these patterns.
Probing Search-Grounded Responses
We prompted Gemini with grounded search enabled. This configuration forces Gemini to cite organic search results when formulating answers—the same mechanism underlying Google AI Mode and AI Overviews.
For each query, we captured:
- The user’s search query (prompt)
- All snippets returned in search results (typically 8-10 per query)
- The full page content each snippet was extracted from
- Gemini’s semantic understanding of each element
Using Google’s embedding-gemma-300m model (the same embedding family Gemini uses internally), we computed:
cos_p_s = cosine_similarity(prompt_embedding, snippet_embedding)
cos_p_g = cosine_similarity(prompt_embedding, page_embedding)
Δ = cos_p_s - cos_p_g
This “semantic tension” (Δ) quantifies whether the snippet is more or less relevant than its source page. Positive values indicate the snippet extraction added value by focusing content. Negative values indicate the full page would have been more relevant.
We also measured:
- Representativeness: How faithfully the snippet captures the source page’s meaning
- Rank position: Where the result appeared in search rankings (1-10)
- Length ratios: Snippet and page character counts
- Consistency: Variance in quality within each query’s result set
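Concretely, each of these metrics reduces to a cosine similarity between two of three embeddings (prompt, snippet, page). A minimal sketch of the computation, assuming the sentence-transformers library and a locally available embedding-gemma-300m checkpoint (the exact model identifier may differ from the one shown):

```python
# Minimal sketch of the core metrics for one prompt/snippet/page triple.
# Assumes the sentence-transformers library and a local embedding-gemma-300m
# checkpoint; the exact model identifier on the Hub may differ.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embedding-gemma-300m")  # assumed identifier

def embed(text: str) -> np.ndarray:
    # L2-normalized embedding, so a plain dot product equals cosine similarity
    return model.encode(text, normalize_embeddings=True)

def tension_metrics(prompt: str, snippet: str, page: str) -> dict:
    e_p, e_s, e_g = embed(prompt), embed(snippet), embed(page)
    cos_p_s = float(np.dot(e_p, e_s))        # prompt vs snippet
    cos_p_g = float(np.dot(e_p, e_g))        # prompt vs full page
    return {
        "cos_p_s": cos_p_s,
        "cos_p_g": cos_p_g,
        "delta": cos_p_s - cos_p_g,                      # semantic tension
        "representativeness": float(np.dot(e_s, e_g)),   # snippet vs page
    }
```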
Dataset: 213 unique queries, 1,394 individual snippet-page pairs, collected November 2025.
Primary Findings
Snippets Dominate AI Context, And They’re Better
Mean Δ = +0.0253 with 67.6% of snippets showing positive semantic tension. When Google extracts a snippet for Gemini to process, that snippet is typically more semantically aligned with the user’s query than the full page it came from.
This is not marginal. The distribution shows:
- 67.6% positive Δ: snippet more relevant than page
- 32.4% negative Δ: page more relevant than snippet
- Median Δ = +0.0189
Statistical significance: t-test p < 0.001. The null hypothesis—that Google randomly extracts snippets regardless of semantic concentration—is decisively rejected.
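The test is easy to reproduce with SciPy once the per-snippet similarities are collected; a sketch, with hypothetical file and column names standing in for whatever export the collection pipeline produces:

```python
# Sketch: test whether mean semantic tension is reliably above zero.
# "snippet_page_pairs.csv" and its column names are hypothetical placeholders
# (one row per snippet-page pair).
import pandas as pd
from scipy import stats

df = pd.read_csv("snippet_page_pairs.csv")                       # columns: cos_p_s, cos_p_g, ...
delta = df["cos_p_s"] - df["cos_p_g"]                            # semantic tension per pair
t_stat, p_value = stats.ttest_rel(df["cos_p_s"], df["cos_p_g"])  # paired t-test
print(f"mean delta = {delta.mean():+.4f}, positive share = {(delta > 0).mean():.1%}")
print(f"paired t = {t_stat:.2f}, p = {p_value:.2g}")
```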
Implication for content strategy: Google’s algorithm identifies and extracts the most query-relevant portion of your page. If your most valuable content is buried in paragraph 12, that’s what the AI might see—not your carefully crafted introduction. Semantic density beats narrative structure for AI visibility.
Snippets Are Highly Representative Despite Compression
Mean representativeness = 0.9201. When Google shows Gemini a snippet, that snippet faithfully captures 92% of the source page’s semantic meaning. The distribution is heavily skewed toward perfect representation:
- 88.7% of snippets: representativeness > 0.90
- 9.1% of snippets: representativeness 0.80-0.90
- 2.2% of snippets: representativeness < 0.80
This high fidelity means snippets are not misleading excerpts—they genuinely represent page content. When Gemini cites your page based on a snippet, the citation is usually accurate to your full content.
Implication for content integrity: You cannot game the system by front-loading a snippet-optimized introduction that contradicts the rest of your content. Google’s representativeness checks ensure snippets align with full page semantics. Authenticity scales to AI.
Rank Position 1-10 Shows Minimal Quality Variation
Traditional SEO emphasizes position 1-3 because organic click-through rates plummet afterward. But when feeding AI models, Google appears to treat the entire first page as roughly equivalent in quality:
| Position | Avg Snippet-Query Similarity | Avg Representativeness |
|---|---|---|
| 1 | 0.7164 | 0.9195 |
| 2 | 0.7121 | 0.9223 |
| 3 | 0.7089 | 0.9187 |
| 5 | 0.7076 | 0.9201 |
| 8 | 0.7042 | 0.9196 |
| 10 | 0.7018 | 0.9184 |
Maximum quality drop from position 1 to position 10: only about 2%. One-way ANOVA shows no statistically significant difference across positions (F = 1.23, p = 0.287).
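A sketch of the rank-position test, again using hypothetical file and column names:

```python
# Sketch: one-way ANOVA of snippet-query similarity across rank positions 1-10.
# File and column names are hypothetical placeholders for the collected data.
import pandas as pd
from scipy import stats

df = pd.read_csv("snippet_page_pairs.csv")                 # columns: rank, cos_p_s, ...
top10 = df[df["rank"].between(1, 10)]
groups = [g["cos_p_s"].to_numpy() for _, g in top10.groupby("rank")]
f_stat, p_value = stats.f_oneway(*groups)

print(top10.groupby("rank")["cos_p_s"].mean().round(4))    # per-position means
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.3f}")
```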
Implication for SEO strategy: Ranking anywhere on page one gives you AI visibility. The obsessive focus on top-3 positions may be outdated in the AI search era. A position 8 ranking for a long-tail query is as valuable for AI citation as position 1, because Gemini considers the full result set when synthesizing answers.
This fundamentally changes keyword targeting. Competitive head terms where you rank position 6-10 are now viable AI visibility targets, whereas traditional SEO would write them off.
Content Length Is Irrelevant to AI Inclusion
Correlation analysis:
cor(snippet_length, relevance_score) = +0.05
cor(page_length, relevance_score) = +0.02
Both correlations are negligible. Whether your page is 1,000 or 10,000 words makes no meaningful difference to snippet quality or AI inclusion probability. We tested this across length buckets:
- Short snippets (< 1,500 chars): mean relevance 0.7089
- Medium snippets (1,500-3,500 chars): mean relevance 0.7142
- Long snippets (> 3,500 chars): mean relevance 0.7098
The difference is statistically insignificant (p = 0.412).
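Both the correlations and the bucket comparison can be reproduced along these lines (hypothetical file and column names):

```python
# Sketch: correlation between length and relevance, plus length-bucket means.
# File and column names are hypothetical placeholders for the collected data.
import pandas as pd
from scipy import stats

df = pd.read_csv("snippet_page_pairs.csv")   # columns: snippet_len, page_len, cos_p_s, ...
r_snip, p_snip = stats.pearsonr(df["snippet_len"], df["cos_p_s"])
r_page, p_page = stats.pearsonr(df["page_len"], df["cos_p_s"])
print(f"snippet length vs relevance: r = {r_snip:+.2f} (p = {p_snip:.3f})")
print(f"page length vs relevance:    r = {r_page:+.2f} (p = {p_page:.3f})")

buckets = pd.cut(df["snippet_len"], bins=[0, 1500, 3500, float("inf")],
                 labels=["short", "medium", "long"])
print(df.groupby(buckets, observed=True)["cos_p_s"].mean().round(4))
```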
Implication for content strategy: The “comprehensive content” strategy—writing 3,000+ word articles to outrank competitors—does not translate to AI visibility advantages. A focused 1,200-word article with clear topic boundaries performs equivalently to a 4,000-word comprehensive guide when fed to Gemini.
This doesn’t mean short content is always better. It means semantic coherence and query alignment matter more than raw word count. A 3,000-word article that stays tightly on-topic will outperform a 1,500-word article that digresses, but not because of length—because of focus.
Multi-Snippet Synthesis Is the Norm
The average query resulted in 6.5 snippets being analyzed. Google doesn’t pick one “winner” to show Gemini—it provides multiple sources. This multi-source approach enables:
- Cross-reference validation (checking facts across sources)
- Diverse perspective synthesis (combining viewpoints)
- Coverage of subtopics (different snippets address different query aspects)
Snippet quality stays consistently high across the whole result set, with no significant degradation at lower positions. This means Gemini receives a rich, multi-source context for every query.
Implication for competitive analysis: You’re not competing for “the” AI answer position—you’re competing to be included in the synthesis set. Multiple sources can all win AI visibility for the same query. The pie is bigger than traditional position 1 optimization suggested.
The SERP→Model Architecture: How Google AI Mode Actually Works
Based on these findings, we can infer the architecture of Google’s search-grounded AI responses:
Stage 1: Traditional Search Ranking
Google’s core search algorithm ranks pages as usual. The top 10 results are determined by traditional ranking factors (relevance, authority, user signals, etc.).
Stage 2: Snippet Extraction
For each ranked page, Google’s extraction algorithm identifies the most query-relevant segment. This extraction:
- Targets positive semantic tension (snippet more relevant than full page)
- Maintains high representativeness (>90% fidelity to source)
- Operates independently of length (extracts what’s semantically coherent)
- Preserves topic boundaries (doesn’t split concepts)
Stage 3: Multi-Source Context Assembly
The top 8-10 snippets are assembled into a context package. This package:
- Provides diverse sources (not just position 1)
- Maintains quality across all positions
- Enables cross-reference validation
- Totals ~18,000-25,000 characters of focused, relevant content
Stage 4: Model Synthesis
Gemini receives this multi-snippet context and synthesizes a grounded answer. The model:
- Can cite specific sources (it knows which snippet came from which URL)
- Benefits from pre-filtered, relevant content (not full pages)
- Works with high signal-to-noise ratio (92% representative snippets)
- Processes 68% fewer tokens than full-page alternatives
This architecture explains why traditional SEO rankings still matter (Stage 1) but why snippet optimization is increasingly critical (Stage 2).
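To make Stage 3 concrete, here is a schematic sketch of how a multi-snippet context package might be assembled. It illustrates the inferred architecture only; Google’s actual implementation is not public, and every name and budget below is an assumption.

```python
# Schematic sketch of Stage 3: assemble ranked snippets into a grounding context.
# Purely illustrative; the real system, its data structures, and its budgets are not public.
from dataclasses import dataclass

@dataclass
class Snippet:
    rank: int
    url: str
    text: str

def assemble_context(snippets: list[Snippet], char_budget: int = 25_000) -> str:
    """Concatenate snippets in rank order until the character budget is reached."""
    parts, used = [], 0
    for s in sorted(snippets, key=lambda s: s.rank):
        block = f"[Source {s.rank}] {s.url}\n{s.text}\n"
        if used + len(block) > char_budget:
            break
        parts.append(block)
        used += len(block)
    return "\n".join(parts)   # handed to the model together with the user query
```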
Implications for AI SEO and Answer Engine Optimization
For Content Creators
1. Write for snippet extraction, not just human readers.
Your opening paragraphs, section introductions, and topic sentences are extraction targets. Ask: “If an AI saw only these 200 words, could it understand and cite my key point?”
Bad for AI: “Before we dive into the main topic, let me tell you a story about…”
Good for AI: “The primary causes of X are Y and Z. Here’s why…”
2. Use explicit structure and semantic markers.
Headings, lists, and structured data help Google identify extraction boundaries. A well-structured page produces better snippets:
<h2>What is [Topic]?</h2>
<p>[Concise definition with key details]</p>
<h2>Why [Topic] Matters</h2>
<ul>
<li><strong>Reason 1:</strong> [Explanation]</li>
<li><strong>Reason 2:</strong> [Explanation]</li>
</ul>
This structure lets Google extract clean, coherent snippets that maintain high representativeness.
3. Optimize for topical coherence over length.
Don’t pad content to hit arbitrary word counts. A tightly focused 1,500-word article outperforms a rambling 3,500-word article for AI visibility. Every paragraph should advance the core topic.
4. Embrace multi-source strategies.
Since positions 1-10 all contribute to AI answers, you can:
- Target multiple related queries with different focused articles
- Accept page-one rankings instead of obsessing over position 1
- Create complementary content that addresses different query aspects
- Build topical authority across a cluster of related rankings
Traditional SEO: One article targeting one keyword to rank position 1
AI SEO: Five articles targeting related queries, all ranking positions 3-8, all contributing to AI answer synthesis
For SEO Practitioners
1. Track AI visibility metrics, not just rankings.
Traditional rank tracking is insufficient. New metrics to monitor:
- Snippet inclusion rate: What % of your rankings produce snippets in AI answers?
- Citation frequency: How often does your domain get cited in AI responses?
- Representativeness score: Do your snippets accurately reflect your full content?
- Semantic tension: Are your snippets more relevant than your full pages?
2. Audit for semantic coherence.
Use embedding models to measure:
- How well each section of your content aligns with target queries
- Whether your highest-alignment sections are extractable as clean snippets
- If your full page semantics match your snippet-level semantics
Pages with low representativeness (snippet doesn’t match page) are AI visibility risks.
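A minimal audit sketch along these lines, reusing an embed() helper like the one shown earlier and assuming the page has already been split into (heading, text) sections:

```python
# Sketch: audit a page's sections against a target query.
# Assumes an embed() helper returning L2-normalized vectors (see the earlier sketch).
import numpy as np

def audit_page(query: str, sections: dict[str, str]) -> None:
    e_query = embed(query)
    e_page = embed(" ".join(sections.values()))            # whole-page embedding
    for heading, text in sections.items():
        e_sec = embed(text)
        relevance = float(np.dot(e_query, e_sec))          # section vs query
        fidelity = float(np.dot(e_sec, e_page))            # section vs full page
        tension = relevance - float(np.dot(e_query, e_page))
        print(f"{heading[:40]:<40} rel={relevance:.3f} rep={fidelity:.3f} delta={tension:+.3f}")
```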
3. Optimize for the synthesis set, not the top position.
Competitive analysis should identify:
- Which competitors get included in AI answers (not just who ranks #1)
- What combination of perspectives the AI synthesizes
- Where gaps exist that your content could fill
A position 7 ranking that fills a synthesis gap is more valuable than a position 3 ranking that duplicates existing sources.
4. Prepare for SERP fragmentation.
Traditional organic results and AI-synthesized answers will coexist. Some queries will predominantly show AI answers (informational), others will show traditional results (transactional, navigational). Strategy must account for both:
- Informational queries: Optimize for snippet extraction and AI synthesis
- Transactional queries: Optimize for traditional CTR and conversion
- Navigational queries: Focus on brand authority and direct traffic
For Publishers and Platforms
1. The snippet is your distribution unit.
In traditional web publishing, the page is the atomic unit. You optimize pages, share pages, monetize pages. In AI-mediated search, the snippet is the atomic unit. Your content’s value is increasingly determined by:
- How well snippets extract from it
- How often those snippets get cited
- How accurately those snippets represent your authority
This shifts editorial strategy toward modular, snippet-optimized content sections rather than long-form narrative structures.
2. Attribution and citation become critical.
When AI answers synthesize multiple sources, clear attribution is your brand visibility. Optimize for:
- Distinctive voice and perspective (so citations are recognizable as yours)
- Quotable, specific claims (so AI models cite you directly)
- Data and statistics (highly citable elements)
- Expert credentialing (so the AI identifies you as an authority)
Generic, consensus content blends into synthesis invisibly. Specific, authoritative, distinctive content gets cited.
3. The pageview metric is dying.
If users get answers from AI synthesis without clicking through, traditional pageview-based business models suffer. Alternative metrics and monetization approaches:
- Brand impressions in AI citations
- Attribution-based compensation models
- Premium content beyond free AI reach
- Interactive experiences AI cannot replicate
Publishers must adapt to AI-mediated distribution or face declining direct traffic.
What Drives Semantic Tension?
The Mathematics of Snippet Superiority
Why do snippets outperform full pages 67.6% of the time? Three factors:
1. Signal concentration: Snippets eliminate off-topic content, noise, and digression. The compression ratio averages 32% (snippets are 32% the length of source pages), but semantic relevance increases. This is positive compression—the algorithm selectively preserves high-signal content.
2. Query alignment: Snippets are extracted with the specific query in mind (at least implicitly, through search ranking signals). Full pages address broader topics. The snippet-query alignment is by design; the page-query alignment is often incidental.
3. Boundary coherence: Good snippets respect semantic boundaries—they don’t start mid-sentence or split concepts. This coherence improves embedding quality. Full pages often have weak transitions, tangential sections, or introductory fluff that dilutes embedding quality.
Mathematically:
Relevance(snippet) = Signal(snippet) / Noise(snippet)
Relevance(page) = Signal(page) / Noise(page)
Signal(snippet) ≈ Signal(page) [high representativeness]
Noise(snippet) < Noise(page) [focused extraction]
Therefore: Relevance(snippet) > Relevance(page)
When Snippets Fail: The 32.4% Negative Tension Cases
Not all snippets outperform their source pages. The 32.4% negative tension cases (where Δ < 0) reveal important failure modes:
Type 1: Context-dependent content (43% of failures)
The snippet requires surrounding context to make sense. Example: A snippet discussing “the second approach” without explaining what the first approach was.
Type 2: Incomplete coverage (29% of failures)
The query requires synthesizing multiple page sections, but the snippet only extracts one section.
Type 3: Ambiguous extraction (18% of failures)
The extraction boundary splits a concept or includes transitional language that reduces clarity.
Type 4: Multi-topic pages (10% of failures)
The page covers multiple distinct topics. The snippet extracts one topic, but the full page’s multiple topics collectively better match the query.
Implication: These failure modes are addressable through content structure. Pages with clear topic separation, self-contained sections, and minimal context dependencies produce better snippets.
The Representativeness Paradox
High representativeness (92%) coexisting with positive semantic tension (snippets more relevant than pages) seems contradictory. How can a snippet be more relevant while also being highly representative?
The resolution: representativeness measures semantic fidelity, not topical focus.
A page might say: “Topic X involves A, B, and C. [500 words on A] [500 words on B] [500 words on C].”
If the query asks specifically about B, the snippet extracts the B section. This snippet:
- Is highly representative (faithfully captures the page’s position on B)
- Shows positive tension (more relevant to the query because it focuses on B)
Representativeness is not about including everything the page says—it’s about accurately reflecting what the page says about the snippet’s topic. A 1,000-word page about three topics can produce three different 300-word snippets, each highly representative of the page’s treatment of that specific topic.
This is why content with clear topical sections outperforms undifferentiated content. Each section can serve as a high-quality snippet for related queries.
The Future of Search: From SERP Clicks to Model Context
These findings illuminate a fundamental shift in how search engines operate:
Traditional Search (Web 1.0 – Web 2.0)
- User enters query
- Search engine ranks pages
- User sees 10 blue links
- User clicks through to pages
- User synthesizes information across multiple page visits
Optimization target: Click-through rate from position 1-3
AI-Mediated Search (Web 3.0)
- User enters query
- Search engine ranks pages
- Search engine extracts snippets
- AI model synthesizes snippets into answer
- User receives synthesized answer with citations
Optimization target: Snippet inclusion and citation in AI synthesis
The shift is from SERP→User→Content to SERP→Model→User. The model becomes the intermediary consumer of your content. The user consumes the model’s synthesis, not your content directly.
Implications:
For creators: Your audience is increasingly the AI model, not the end user. Write for machine understanding while maintaining human readability.
For platforms: The SERP itself becomes a product, not just a navigation layer. Google AI Mode isn’t replacing search—it’s upgrading what the SERP delivers.
For users: Increased efficiency (answers without clicks) but decreased serendipity (no browsing multiple sources). AI answers optimize for accuracy and speed at the cost of exploration.
For the web ecosystem: Potential traffic decline to original sources. If AI provides sufficient answers, click-throughs decrease. Publishers must find value beyond being source material for AI synthesis.
Limitations and Future Research
This analysis has constraints:
1. Single AI model: We only tested Gemini with Google Search grounding. Other AI search products (Perplexity, You.com, Bing Copilot) may exhibit different snippet selection behaviors.
2. English-only queries: Multilingual and cross-lingual search may show different patterns.
3. Informational queries only: We did not analyze transactional or navigational queries, which may not benefit from snippet synthesis approaches.
4. Snapshot in time: Google’s algorithms evolve. These patterns reflect November 2025 behavior and may shift.
5. No user outcome testing: We measured semantic metrics, not user satisfaction. High semantic tension doesn’t guarantee users prefer AI answers over clicking through to sources.
Future research should:
- Compare snippet behavior across Perplexity, You.com, ChatGPT search, and Bing
- Test multilingual queries and translation quality
- Measure user satisfaction with snippet-based vs full-page answers
- Track how snippet optimization affects traditional SEO metrics over time
- Investigate whether AI citation drives brand awareness and trust
Conclusion
Google’s AI Mode, AI Overviews, and answer engines don’t bypass search—they enhance how search results are consumed. The findings are clear:
- Snippets are what AI models see (not full pages)
- Snippets outperform full pages 68% of the time
- Positions 1-10 contribute roughly equally to AI synthesis
- Content length doesn’t determine AI inclusion
- Snippet optimization is the new frontier of SEO
The practical path forward for content creators and SEOs:
- Audit your content for snippet quality. Use embedding models to measure semantic tension and representativeness on your own pages.
- Restructure for semantic coherence. Clear topical sections, self-contained paragraphs, and explicit structure improve snippet extraction.
- Track AI visibility metrics. Monitor snippet inclusion and citation rates, not just traditional rankings.
- Optimize for the synthesis set. Target page-one rankings across related queries rather than position 1 dominance on single keywords.
- Prepare for the SERP→Model future. AI-mediated search is not replacing traditional search—it’s augmenting it. Strategy must address both channels.
The semantic tension framework provides a rigorous method for measuring and optimizing content for AI visibility. By quantifying the gap between snippet and page relevance, we identify exactly what makes content AI-friendly. This precision enables targeted optimization rather than speculative adjustments.
For those who adapt, AI search represents expanded visibility opportunities. Positions 4-10 become viable. Multi-query strategies scale better. Content gets discovered through synthesis rather than requiring direct clicks.
For those who don’t adapt, AI search represents risk. Traffic declines as users get answers without clicking. Rankings without snippets become invisible. Traditional SEO tactics fail to translate to AI visibility.
The data points one direction: optimize for snippets, embrace synthesis, and prepare for search where the model is the primary consumer of your content.
The SERP→Model era has begun.
Appendix: Methodology Details
Embedding Model Configuration
Model: google/embedding-gemma-300m
Precision: bfloat16 (GPU) or float32 (CPU)
Context window: 2,048 tokens (truncation enabled)
Normalization: L2 norm applied to all embeddings
Similarity metric: Cosine similarity
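This configuration can be reproduced roughly as follows, assuming a recent sentence-transformers release; the checkpoint identifier and the model_kwargs argument are assumptions to adapt to your own setup.

```python
# Sketch of the stated embedding configuration; identifiers and kwargs are
# assumptions, adjust them to the actual checkpoint and library version you use.
import torch
from sentence_transformers import SentenceTransformer

dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32
model = SentenceTransformer(
    "google/embedding-gemma-300m",               # assumed checkpoint identifier
    model_kwargs={"torch_dtype": dtype},
)
model.max_seq_length = 2048                      # truncate longer inputs

emb = model.encode(["example text"], normalize_embeddings=True)  # L2-normalized
```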
Query Selection
Queries were sampled from:
- Informational search patterns (how, what, why, when)
- Technical topics (to maximize grounding need)
- Current events (to trigger fresh search results)
- Long-tail queries (to test specificity)
All queries triggered Gemini grounded search responses with citations.
Data Collection Pipeline
def collect_semantic_tension_data(query):
    # Get Gemini response with grounded search
    response = gemini.generate(query, grounding=True)

    # Extract cited sources and their snippets
    snippets = extract_search_snippets(response.grounding_metadata)

    # Fetch full page content for each snippet
    pages = fetch_full_pages(snippets)

    # Compute embeddings
    emb_query = embed(query)
    emb_snippets = [embed(s) for s in snippets]
    emb_pages = [embed(p) for p in pages]

    # Calculate metrics for each snippet-page pair
    for snip, page, emb_s, emb_p in zip(snippets, pages, emb_snippets, emb_pages):
        cos_p_s = cosine_sim(emb_query, emb_s)   # snippet vs query
        cos_p_g = cosine_sim(emb_query, emb_p)   # page vs query
        delta = cos_p_s - cos_p_g                # semantic tension
        rep = cosine_sim(emb_s, emb_p)           # representativeness
        record_metrics(query, snip, page, cos_p_s, cos_p_g, delta, rep)
Statistical Testing
All hypothesis tests use α = 0.05 significance threshold.
- T-tests: Paired tests for snippet vs page comparisons
- ANOVA: One-way ANOVA for rank position effects
- Correlation: Pearson correlation for continuous variables
- Normality: Shapiro-Wilk tests for distribution assumptions
Software Environment
- Python 3.10+
- transformers >= 4.35.0
- torch >= 2.0.0
- numpy >= 1.24.0
- pandas >= 2.0.0
- scipy >= 1.11.0
Analysis conducted November 2025. Dataset: 213 Google queries with grounded Gemini responses, 1,394 snippet-page pairs. Compute: NVIDIA RTX 4090. All embeddings computed using embedding-gemma-300m with bfloat16 precision.
Related Reads
- Analysis of grounding context snippet generation.
- Google’s AI Mode content fetching mechanisms.
The Visual Guide

Chart 1: Distribution of Snippet-Page Similarity
1. What is this
This is a histogram showing how well snippets represent their full source pages. The x-axis shows similarity scores from 0 to 1 (where 1 is perfect match), and the y-axis shows how many queries fall into each range. The red dashed line marks the average at 0.916, meaning snippets capture 91.6% of their source page’s meaning on average. Most bars cluster on the right side, showing high similarity.
2. How it was generated
Data: 213 queries from the aggregate-level CSV file.
Process:
- For each query, we took all the snippets Google returned
- We calculated how similar each snippet was to its full source page using embeddings
- Embeddings are mathematical representations of text created by Google’s embedding-gemma-300m model
- We used cosine similarity to measure how close the snippet’s meaning is to the page’s meaning
- We averaged these scores for each query
- We plotted all 213 average scores as a histogram
3. What do I do with this information
If your data shows high similarity (most bars on the right, above 0.9): Your content structure is working well. Google can extract representative snippets from your pages. Keep doing what you’re doing.
If you see scores below 0.8: You have a problem. Your snippets don’t accurately represent your pages, which means:
- AI might misrepresent your content when citing you
- Your pages cover too many unrelated topics
- Your content structure needs fixing
Action: Find the pages with low representativeness scores (bottom 20%) and audit them. Look for pages that jump between multiple topics or bury important information. Restructure these pages to have clear, focused sections.
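For readers who want to reproduce this kind of chart on their own data, a matplotlib sketch (the file and column names are hypothetical placeholders for your own aggregate export):

```python
# Sketch: histogram of per-query mean snippet-page similarity (as in Chart 1).
# File and column names are hypothetical; use your own aggregate export.
import matplotlib.pyplot as plt
import pandas as pd

agg = pd.read_csv("aggregate_level.csv")         # one row per query
scores = agg["snip_page_mean"]                   # mean snippet-page similarity per query

plt.hist(scores, bins=30, edgecolor="black")
plt.axvline(scores.mean(), color="red", linestyle="--",
            label=f"mean = {scores.mean():.3f}")
plt.xlabel("Snippet-page similarity (per-query mean)")
plt.ylabel("Number of queries")
plt.legend()
plt.show()
```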
Chart 2: Prompt vs Snippet Similarity

1. What is this
This scatter plot compares overall search quality to snippet quality. Each dot represents one query. The x-axis shows how well the entire set of search results matches the query. The y-axis shows how well just the snippets match the query. The red diagonal line represents “equal performance” – dots above the line mean snippets perform better than overall results. Colors show snippet-page similarity (yellow = high, purple = low).
2. How it was generated
Data: 213 queries from the aggregate-level CSV file.
Process:
- For each query, we calculated similarity between the query and all search results combined (x-axis)
- We calculated similarity between the query and just the snippets (y-axis)
- Both calculations use embeddings (mathematical text representations) and cosine similarity
- We colored each dot based on how well snippets represent their source pages
- We added the diagonal reference line to show where snippet performance equals overall performance
3. What do I do with this information
If most dots are on or above the diagonal line: Your snippet extraction is working well. Snippets are as good as or better than full results for matching queries. This is ideal for AI visibility.
If many dots fall below the line: Your full search results are better than their extracted snippets. This means:
- The extraction is choosing the wrong sections
- Your best content isn’t in easily extractable locations
- You need clearer structure (H2/H3 headings) to help extraction work better
Action: For queries where dots are far below the line, identify which pages were returned. Check where the most valuable content is located on those pages. Restructure to move key information to the top or into clearly marked sections with descriptive headings.
Chart 3: Distribution of Snippet Counts per Query

1. What is this
This bar chart shows how many snippets Google typically returns per query. The x-axis lists the number of snippets (1 through 10), and the y-axis shows how many queries returned that count. You can see most queries return either 4, 8, or 10 snippets. Very few queries return just 1-3 snippets. The average is 6.5 snippets per query.
2. How it was generated
Data: 213 queries from the aggregate-level CSV file.
Process:
- For each of the 213 queries, we simply counted how many snippets Google returned
- We grouped queries by their snippet count
- We plotted the frequency of each count as bars
- This is straightforward counting – no complex calculations needed
3. What do I do with this information
This fundamentally changes SEO strategy. Google shows Gemini multiple snippets (average 6.5), not just the top result. This means:
- It’s not winner-take-all anymore
- Multiple sources contribute to AI answers
- Position 2-10 have real value now
- You don’t need to rank #1 to get AI visibility
Action: Stop obsessing over position 1 for every keyword. Instead:
- Target clusters of related long-tail queries where you can rank page one
- Build multiple focused articles on related topics instead of one massive article trying to rank #1
- Accept positions 5-8 as valuable AI visibility opportunities, not losses
- Focus on being “one of the good sources” rather than “the only source”
Chart 4: Snippet vs Page Similarity Comparison

1. What is this
This scatter plot directly compares snippet quality (x-axis) to full page quality (y-axis) for each query. Each dot is one query. The black diagonal line means “equal quality” – dots above the line mean full pages performed better than snippets, dots below mean snippets performed better. The colors show snippet-page representativeness (red = high, blue = low). Most dots cluster along or below the diagonal.
2. How it was generated
Data: 213 queries from the aggregate-level CSV file.
Process:
- For each query, we calculated average similarity between the query and all snippets (x-axis)
- We calculated average similarity between the query and all full pages (y-axis)
- Both use embedding-based cosine similarity measurements
- We colored each dot by how representative the snippets are of their pages
- We added the diagonal line as a reference for equal performance
3. What do I do with this information
Dots below the line (snippets win): Your extraction is adding value by focusing content. The snippet is more relevant than the bloated full page. This is good.
Dots above the line (pages win): Full pages are more relevant than their snippets. This happens when:
- The query needs information from multiple page sections
- The best content isn’t in extractable chunks
- The page has comprehensive coverage that gets diluted when extracted
Most dots should be red/orange colored: This means even when snippets are more focused, they still accurately represent the full page content.
Action:
- For dots far above the line: These pages have scattered information. Consolidate related information into clear sections.
- For dots below the line with blue/purple colors: These snippets outperform pages but don’t represent them well. Usually means the page covers too many topics. Split multi-topic pages into focused single-topic pages.
Chart 5: Similarity Quality by Number of Snippets

1. What is this
This box plot shows whether having more snippets per query hurts or helps quality. Each green box represents queries grouped by snippet count (1, 2, 3, etc.). The box shows the range of quality scores for that group – the line in the middle is the median, the box shows the middle 50% of values, and circles show outliers. The y-axis measures snippet-page representativeness from 0 to 1.
2. How it was generated
Data: 213 queries grouped by their snippet count.
Process:
- We grouped queries by how many snippets they returned (1 snippet, 2 snippets, etc.)
- For each group, we collected all the snippet-page representativeness scores
- We calculated statistical distributions (median, quartiles, outliers)
- We plotted these distributions as box plots
- The triangle marks show the mean for each group
3. What do I do with this information
Key finding: Quality stays consistently high across all snippet counts. More snippets does NOT dilute quality.
What this means:
- Google can safely show Gemini 8-10 snippets without quality degradation
- Multiple sources provide better coverage, not noise
- The AI benefits from diverse perspectives on the same query
For your content strategy:
- Stop fighting to be “the one answer” for a query
- Build topic clusters with multiple pages on related angles
- All your pages ranking on page one contribute to AI synthesis
- Embrace multi-source strategies instead of single-hero-page approaches
Action: Review your content strategy. If you’re trying to create one massive comprehensive article to dominate a topic, consider splitting it into 3-5 focused articles targeting related queries. Each can rank page one and contribute to AI answers.
Chart 6: Feature Correlation Matrix

1. What is this
This heatmap shows how different metrics relate to each other. Each cell shows the correlation between two metrics. Red means positive correlation (when one goes up, the other goes up), blue means negative correlation (inverse relationship), white means no correlation. The numbers show correlation strength from -1 to +1. The diagonal is always 1.0 because everything correlates perfectly with itself.
2. How it was generated
Data: 213 queries with 7 metrics each.
Process:
- We took 7 key metrics: sim_prompt_all, snip_mean, snip_std, page_mean, page_std, snip_page_mean, and n_snippets
- We calculated Pearson correlation coefficients between every pair of metrics
- This measures how much two metrics move together
- We visualized the correlation matrix as a color-coded heatmap
- The matrix is symmetric – top-right mirrors bottom-left
3. What do I do with this information
Key correlations to notice:
sim_prompt_all and snip_mean (0.93): Strong positive correlation. When overall search quality is high, snippet quality is high. This is expected.
page_std and snip_page_mean (-0.65): Strong negative correlation. When page similarity scores vary wildly (inconsistent quality across pages), snippet-page alignment suffers. This tells you inconsistent pages produce worse snippets.
n_snippets and everything else (around 0.05): Almost no correlation. Number of snippets doesn’t predict quality. More is not better or worse.
What to do:
- Aim for consistency within each page – don’t mix high-value and low-value content
- Create focused pages with one clear purpose
- Don’t pad content with more snippets thinking it improves things
- Stop writing multi-topic pages that would have high page_std scores
Action: Audit your pages for topical consistency. Pages that jump between multiple unrelated topics or mix quality levels will hurt snippet extraction. Split them into focused single-topic pages.
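The correlation matrix itself takes only a few lines to rebuild; a sketch with a hypothetical file name and the column names listed above:

```python
# Sketch: Pearson correlation heatmap of the aggregate metrics (as in Chart 6).
# The file name is hypothetical; the columns follow the description above.
import matplotlib.pyplot as plt
import pandas as pd

cols = ["sim_prompt_all", "snip_mean", "snip_std",
        "page_mean", "page_std", "snip_page_mean", "n_snippets"]
agg = pd.read_csv("aggregate_level.csv")
corr = agg[cols].corr(method="pearson")

fig, ax = plt.subplots(figsize=(7, 6))
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(cols)), cols, rotation=45, ha="right")
ax.set_yticks(range(len(cols)), cols)
for i in range(len(cols)):
    for j in range(len(cols)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center", fontsize=8)
fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
```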
Chart 7: Semantic Tension (Delta Distribution)

1. What is this
This histogram shows “semantic tension” – the mathematical difference between how well a snippet matches a query versus how well the full page matches. The x-axis shows delta values (positive means snippet wins, negative means page wins). The red line at zero represents equal performance. The orange dashed line shows the average delta of +0.0253. The yellow box highlights that 67.6% of snippets have positive delta (they outperform their source pages).
2. How it was generated
Data: 1,394 individual snippets from the granular CSV file.
Process:
- For each snippet, we calculated its similarity to the query using embeddings
- We calculated the full page’s similarity to the same query
- We subtracted: Delta = (snippet-query similarity) – (page-query similarity)
- Positive delta means the snippet is more relevant than the full page
- We plotted all 1,394 delta values as a histogram
- We marked the zero line (equal) and mean line for reference
3. What do I do with this information
This is THE critical metric for AI visibility. Positive delta means Google’s extraction is working – it’s pulling the most relevant parts and giving them to Gemini.
Why snippets win (67.6% of cases):
- Focus: Snippets remove fluff, ads, navigation, tangents
- Relevance: Extraction targets the most query-relevant section
- Coherence: Good snippets respect topic boundaries
For your content:
- Front-load value: Put your best content in the first few paragraphs
- Use clear structure: H2/H3 tags help define extraction boundaries
- One topic per section: Each section should be independently valuable
- Kill the storytelling intros: “Let me tell you about the time…” is death for AI
Negative delta cases (32.4%): The full page is more relevant. This happens when the query needs information from multiple sections or context from elsewhere.
Action: Audit pages with negative delta. Your most valuable content is probably buried mid-article or scattered across sections. Restructure to make your best content snippet-friendly (extractable as a coherent 200-300 word chunk).
Chart 8: Does Rank Position Affect Quality?

1. What is this
This line chart shows quality metrics across rank positions 1-10. The blue line (left y-axis) shows how well snippets match queries. The orange line (right y-axis) shows how well snippets represent their source pages. Both lines stay relatively flat across all positions, meaning quality doesn’t significantly drop from position 1 to position 10.
2. How it was generated
Data: 1,394 individual snippets grouped by rank position.
Process:
- We grouped all snippets by their rank position (1st result, 2nd result, etc.)
- For each rank group, we calculated the average snippet-query similarity
- We calculated the average snippet-page representativeness
- We plotted both metrics across ranks 1-10
- We used dual y-axes because the metrics have different scales
- The chart only shows ranks 1-10 for clarity
3. What do I do with this information
This changes everything about SEO strategy.
Traditional SEO: You MUST rank top 3 because click-through rates drop dramatically after that.
AI search reality: Positions 1-10 are nearly equal in quality. Google feeds all of them to Gemini.
The numbers:
- Position 1: 0.7164 relevance
- Position 5: 0.7076 relevance
- Position 10: 0.7018 relevance
- Quality drop: Only 2% (essentially negligible)
What this means:
- Position 7 now has real value for AI visibility
- You don’t need position 1 to contribute to AI answers
- Page-one presence is enough
- Competitive keywords where you rank 4-10 are now valuable
Strategic shift:
- Old: Fight for position 1-3 on high-volume keywords
- New: Secure page-one presence across multiple related queries
Action: Review all keywords where you currently rank positions 4-10. These are NOW valuable for AI citations (previously considered “losses”). Stop abandoning these keywords. Instead, invest in maintaining these page-one positions and add more related page-one rankings rather than fighting for position 1 on one keyword.
Chart 9: Does Snippet Length Affect Quality?

1. What is this
This scatter plot shows snippet length on the x-axis (in characters) and snippet quality on the y-axis (how well it matches the query). Each dot is one snippet. The colors show representativeness (yellow/green = high). If length mattered, you’d see a clear upward or downward trend – dots moving up or down as you go right. Instead, the dots are randomly scattered with no pattern.
2. How it was generated
Data: 1,394 individual snippets from the granular CSV.
Process:
- For each snippet, we measured its length in characters (x-axis)
- We measured its similarity to the query using embeddings (y-axis)
- We colored each dot by how well the snippet represents its source page
- We limited the x-axis to 10,000 characters for readability
- No trend line was added because there’s no meaningful correlation (r = 0.05)
3. What do I do with this information
THE BIG TAKEAWAY: Length doesn’t matter for quality.
The correlation between length and quality is 0.05 – essentially zero.
What this means:
- A 1,500-character snippet can be just as good as a 4,000-character snippet
- Short snippets aren’t worse quality
- Long snippets aren’t automatically better
- The “comprehensive long-form content” strategy doesn’t guarantee AI visibility
Why this matters:
- Stop padding word count to hit arbitrary targets (3,000+ words)
- A tight 1,200-word article can outperform a rambling 4,000-word guide
- Semantic density matters more than length
- Quality is about relevance and focus, not volume
What drives quality instead:
- Topical focus: Stay on one clear topic
- Query alignment: Match what users are actually asking
- Clear structure: Make extraction easy with headers
- Coherence: Each section should stand alone
Action: Audit your content strategy. If you’re writing long-form content just to be “comprehensive,” stop. Instead, write focused content that directly addresses specific queries, regardless of final word count. A focused 1,500-word article beats an unfocused 4,000-word article for AI visibility.
Chart 10: How Well Do Snippets Represent Their Source Pages?

1. What is this
This histogram shows representativeness scores – how faithfully snippets capture their source pages’ meaning. The x-axis goes from 0 (not representative at all) to 1 (perfectly representative). The y-axis shows frequency. The red line marks the mean at 0.92 (92%). The green box notes that 88.7% of snippets score above 0.9. The distribution is heavily skewed left with most values clustered at the high end.
2. How it was generated
Data: 1,394 individual snippets.
Process:
- For each snippet, we created an embedding (mathematical representation) of its text
- We created an embedding of its full source page
- We calculated cosine similarity between these embeddings
- This gives a 0-1 score of how much the snippet captures the page’s meaning
- We plotted all 1,394 representativeness scores as a histogram
- We added the mean line and calculated the percentage above 0.9
3. What do I do with this information
This measures extraction quality and content integrity.
High representativeness (>0.90, which is 88.7% of cases): The snippet accurately reflects what the page is about. When Gemini cites your page based on a snippet, the citation is accurate to your actual content.
Medium representativeness (0.80-0.90, which is 9.1%): The snippet is somewhat representative but might miss important nuances or context.
Low representativeness (<0.80, which is 2.2%): RED FLAG. The snippet doesn’t match the page well. This happens when:
- The page covers multiple unrelated topics
- Google’s extraction grabbed the wrong section
- Important context is scattered throughout the page
Why this matters:
- AI accuracy depends on representative snippets
- Accurate citations build trust in AI answers
- Low scores reveal structural problems in your content
Action:
- Find your bottom 2.2% of snippets (representativeness < 0.80)
- These are your problem cases – only 31 snippets out of 1,394 but worth investigating
- Manually review these pages to understand why snippets don’t match
- Common fixes: split multi-topic pages into focused single-topic pages, move key information to better locations, add clear section markers
Chart 11: Do Long Pages Hurt Snippet Quality?

1. What is this
This scatter plot shows page length on the x-axis (in characters) and snippet representativeness on the y-axis. Each dot represents one snippet-page pair. Colors show semantic tension (red/pink means page performed better, blue means snippet performed better). If long pages hurt extraction, you’d see representativeness decline as you move right. Instead, dots maintain high values across all page lengths.
2. How it was generated
Data: 1,394 snippet-page pairs with length information.
Process:
- X-axis: Each snippet’s source page length in characters
- Y-axis: That snippet’s representativeness score (how well it captures the page)
- Colors: Delta (semantic tension) – whether the snippet or page was more relevant to the query
- We capped the x-axis at 30,000 characters for visual clarity
- No trend line because there’s no significant correlation (r = -0.07)
3. What do I do with this information
Good news: Page length doesn’t hurt snippet extraction quality.
What the data shows:
- Short pages (1,000 chars): produce good snippets
- Medium pages (5,000 chars): produce good snippets
- Long pages (20,000+ chars): still produce good snippets
- No quality degradation as length increases
Why this matters:
- Long-form content is safe: If you prefer writing comprehensive 5,000+ word guides, extraction still works well
- Concise content also works: Shorter 1,500-word articles extract just as well
- The extraction algorithm is sophisticated: Google handles both short and long sources effectively
What DOES matter (since length doesn’t):
- Topic coherence: Don’t mix unrelated topics regardless of length
- Clear sections: Use H2/H3 structure so extraction finds boundaries
- Front-loading: Put key information early, though long pages still work
Action: Stop worrying about page length as a factor in AI visibility. A well-structured 6,000-word article will produce better snippets than a poorly-structured 2,000-word article. Focus on structure and coherence, not hitting or avoiding certain word counts.
Chart 12: Snippet vs Page – Direct Comparison

1. What is this
This scatter plot directly compares each snippet to its full page. The x-axis shows snippet quality (similarity to query), the y-axis shows page quality (similarity to query). Each dot is one snippet-page pair. The black diagonal line means “equal” – dots below the line mean snippets are better, dots above mean pages are better. Colors show representativeness (yellow = high). Most dots fall below the line.
2. How it was generated
Data: 1,394 snippet-page pairs.
Process:
- For each snippet, we calculated its similarity to the query using embeddings (x-axis)
- We calculated its source page’s similarity to the same query (y-axis)
- We colored each dot by snippet-page representativeness
- We added the diagonal equality line for visual reference
- Both axes use the same 0-1 cosine similarity scale
3. What do I do with this information
This validates the semantic tension concept.
Most dots below the line: Snippets outperform their source pages 67.6% of the time.
What this means:
- Extraction ADDS VALUE by focusing content
- Google successfully identifies and isolates the most relevant portions
- Gemini gets better input from snippets than from full pages
- Your snippet quality (x-axis position) determines AI citations
The color pattern matters:
- Yellow dots below the line = IDEAL: Snippet wins AND accurately represents the page
- Purple dots below the line = RISKY: Snippet wins but doesn’t represent page well (cherry-picking)
- Yellow dots above the line: Page wins, but snippet still captures the essence
Why this matters:
- Gemini sees the snippet, not the full page
- Your x-axis position (snippet quality) drives AI visibility
- Pages that produce dots above the line are losing potential AI citations
Action:
- Identify dots that fall above the line (pages winning)
- Ask: “Why isn’t my snippet as good as my full page?”
- Usually means: best content is scattered throughout the page or buried
- Solution: Restructure so the most relevant content is in a clear, extractable section near the top or with clear H2 markers
Chart 13: Performance Delta by Rank Position

1. What is this
These box plots show semantic tension (snippet quality minus page quality) across rank positions 1-10. Each green box represents one rank. The y-axis shows delta values – positive means snippets outperform pages, negative means pages outperform snippets. The red dashed line at zero represents equal performance. All boxes hover around or above zero, showing snippets consistently win across all ranks.
2. How it was generated
Data: 1,394 snippets grouped by rank position.
Process:
- For each rank position (1 through 10), we collected all delta values
- Delta = (snippet-query similarity) – (page-query similarity)
- We calculated the distribution of deltas for each rank
- We plotted as box plots showing median, quartiles, and outliers
- The box shows the middle 50% of values, circles are outliers, diamonds are means
3. What do I do with this information
Key finding: Snippets outperform pages consistently across ALL rank positions, not just the top ones.
What this tells you:
- Even rank 8-10 snippets beat their source pages
- Google’s extraction quality doesn’t degrade at lower ranks
- Gemini receives high-quality focused content regardless of rank position
Why this matters for SEO:
- Every page-one position contributes quality snippets to AI
- Lower ranks (positions 5-10) are viable AI visibility targets
- You don’t need position 1 to get your content into AI answers
- Multi-source synthesis works because all sources maintain quality
Strategic implication:
- Target long-tail queries where you can rank anywhere on page one
- Build topical authority across multiple page-one rankings
- Don’t abandon a keyword just because you’re position 7
- The AI considers all page-one results when synthesizing answers
Action:
- Review keywords where you rank positions 5-10
- These are now valuable AI visibility opportunities (not “losses”)
- Create complementary content to build page-one presence on related queries
- Stop fighting for position 1 if you can secure multiple page-one positions on related terms
Chart 14: Quality Quadrants – Where Do Your Snippets Fall?

1. What is this
This scatter plot divides snippets into four quality categories using median values as dividers. The x-axis shows snippet relevance to the query, the y-axis shows snippet representativeness of the page. Black dashed lines mark the medians. Green dots (top-right) are ideal – high relevance and high representativeness. Orange dots (bottom-right) are relevant but not representative. Yellow dots (top-left) are representative but not relevant. Red dots (bottom-left) are poor on both metrics.
2. How it was generated
Data: 1,394 individual snippets.
Process:
- X-axis: Snippet-query similarity for each snippet
- Y-axis: Snippet-page representativeness for each snippet
- We calculated the median value for both metrics
- We drew lines at the medians to create four quadrants
- We colored dots based on which quadrant they fall into
- Green = both high, orange = relevant only, yellow = representative only, red = both low
3. What do I do with this information
This is your diagnostic tool for content quality.
GREEN quadrant (top-right) – IDEAL:
- High relevance to the query
- High fidelity to the source page
- These snippets drive AI citations
- This is where you want to be
ORANGE quadrant (bottom-right) – Relevant but not representative:
- Snippet matches the query well
- But doesn’t reflect the full page accurately
- Risky: AI might cite you but misrepresent your content
- Usually caused by: multi-topic pages or cherry-picked snippets
YELLOW quadrant (top-left) – Representative but not relevant:
- Snippet accurately reflects the page
- But the page isn’t that relevant to the query
- Means: extraction worked fine, but you’re targeting the wrong keywords
- Usually caused by: general content or poor keyword targeting
RED quadrant (bottom-left) – POOR:
- Low relevance AND low representativeness
- Fundamental mismatch or extraction failure
- These are your problem cases requiring immediate attention
Action:
- Count your dots in each quadrant – calculate what % are green
- Target: 75%+ in the green quadrant
- For orange cases: Review pages for multiple topics, split them up
- For yellow cases: Improve keyword targeting or make content more specific
- For red cases: Complete audit needed – fundamental issues with content or targeting
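The quadrant assignment is easy to replicate for your own snippets; a sketch with hypothetical file and column names:

```python
# Sketch: median-split quadrant assignment (as in Chart 14).
# File and column names are hypothetical placeholders for the per-snippet data.
import pandas as pd

df = pd.read_csv("snippet_page_pairs.csv")       # columns: cos_p_s, representativeness, ...
rel_med = df["cos_p_s"].median()
rep_med = df["representativeness"].median()

def quadrant(row) -> str:
    relevant = row["cos_p_s"] >= rel_med
    faithful = row["representativeness"] >= rep_med
    if relevant and faithful:
        return "green: relevant and representative"
    if relevant:
        return "orange: relevant only"
    if faithful:
        return "yellow: representative only"
    return "red: neither"

df["quadrant"] = df.apply(quadrant, axis=1)
print(df["quadrant"].value_counts(normalize=True).round(3))   # share per quadrant
```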
Chart 15: Distribution of Snippet Ranks Used

1. What is this
This bar chart shows how frequently each rank position (1 through 10) appears in the dataset. The x-axis lists rank positions, the y-axis shows the count of snippets at each position. You can see that positions 1, 3, 4, 5, 8, and 10 are most common, with some variation. This represents Google’s snippet selection patterns across all queries.
2. How it was generated
Data: 1,394 snippets with rank position labels.
Process:
- For each snippet in the dataset, we recorded which rank position it came from
- We counted how many snippets came from each rank (1-10)
- We plotted these counts as bars
- This is simple frequency counting – no calculations needed
- Limited to ranks 1-10 for clarity
3. What do I do with this information
This reveals Google’s snippet selection patterns.
What the distribution shows:
- Google doesn’t always use all 10 results
- Some positions appear more frequently (1, 3, 4, 5, 8, 10)
- Selection is dynamic based on query needs
- Multiple sources are standard practice
Why this matters:
- Not every page-one ranking automatically becomes a snippet
- Google curates which results to extract from
- Just ranking isn’t enough – quality signals matter after ranking
- Multiple sources contribute but selection is quality-based
What affects snippet selection probability after you rank:
- Content structure: Clear H2/H3 sections help
- Query alignment: Focused topic match improves odds
- E-E-A-T signals: Authority and expertise markers
- Technical SEO: Proper markup and structured data
- Extractability: Content that can stand alone as a snippet
Action:
- Don’t assume every page-one ranking becomes an AI source
- Optimize ranked pages for snippet extraction quality
- Use schema markup and clear structure to signal snippet-worthiness
- Add H2/H3 headings that directly answer potential queries
- Monitor which of your ranked pages actually appear in AI answers (track citation rates)
