What extraction method is Google using to build grounding snippets?


I’ve been reverse-engineering Google’s Gemini grounding pipeline (AI Mode, Gemini Chat, and so on) by examining the raw groundingSupports and groundingChunks returned by the API. Specifically, I’m interested in the snippet construction step: the part where, given a query and a retrieved web page, the system selects which sentences to include in the grounding context supplied to the model.
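For reference, this is roughly what inspecting that metadata looks like. The field names below follow the public Gemini API response shape (groundingChunks, groundingSupports, segment, confidenceScores); the sample payload itself is invented for illustration.

```python
# Sketch: pairing each grounded text segment with its source URIs and
# per-chunk confidence scores. The sample dict mimics the API's raw
# grounding metadata; real responses carry many more chunks and supports.

def extract_supports(metadata: dict) -> list[dict]:
    """Join groundingSupports to the groundingChunks they index into."""
    chunks = metadata.get("groundingChunks", [])
    supports = []
    for sup in metadata.get("groundingSupports", []):
        uris = [chunks[i]["web"]["uri"] for i in sup["groundingChunkIndices"]]
        supports.append({
            "text": sup["segment"]["text"],
            "sources": uris,
            "confidence": sup.get("confidenceScores", []),
        })
    return supports

sample = {
    "groundingChunks": [
        {"web": {"uri": "https://example.com/extractive",
                 "title": "Extractive summarization"}},
    ],
    "groundingSupports": [
        {"segment": {"startIndex": 0, "endIndex": 71,
                     "text": "Extractive summarization selects sentences verbatim from the source."},
         "groundingChunkIndices": [0],
         "confidenceScores": [0.87]},
    ],
}

for s in extract_supports(sample):
    print(s["confidence"], s["sources"], "->", s["text"][:40])
```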

From examining the extracted sentences against full source content, I’ve observed:

  • Query-focused selection: Sentences semantically close to the query are strongly preferred. Sections about “abstractive summarization” on the same page are skipped entirely.
  • Heavy positional / lead bias: Opening paragraphs are extracted almost wholesale regardless of content.
  • Structural noise ingestion: ToC entries, section headers, “link code” artifacts, and markers are treated as sentences and scored alongside prose.
  • Sentence-level granularity: The unit of extraction is clearly individual sentences, not passages or paragraphs.
  • Confidence scores: Observed per-chunk confidence scores range from 0.1 to 1.0, representing the relevance of a grounding source to the generated chunk it supports.

Note: I’ve successfully fine-tuned microsoft/deberta-v3-large on this task, and it produces results fairly similar to Google’s. Here’s a demo.
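One way to frame that fine-tuning is as a query/sentence relevance classifier: label each page sentence by whether Google’s snippet included it, and train on the resulting pairs. The helper and data below are purely illustrative, not my actual training code.

```python
# Sketch of assembling (query, sentence, label) training pairs from observed
# extractions. In practice the labels come from diffing the API's snippets
# against the full page text; everything here is toy data.

def build_pairs(query: str, page_sentences: list[str], extracted: set[str]):
    """Label each page sentence 1 if it appeared in the snippet, else 0."""
    return [(query, s, int(s in extracted)) for s in page_sentences]

sentences = [
    "Extractive summarization selects sentences verbatim.",
    "The history of NLP dates back to the 1950s.",
]
pairs = build_pairs("what is extractive summarization",
                    sentences, {sentences[0]})
print(pairs[0][2], pairs[1][2])  # prints: 1 0
```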


Below: full pipeline diagram, raw grounding snippets, and one source article annotated to show which sentences were extracted (green) vs skipped.

Google’s Extractive Summarization in the Grounding Pipeline

Google’s extractive summarization takes place as part of their model grounding pipeline — the system that connects Gemini’s generative output to real web sources.

When a user enters a prompt, a query fanout model deconstructs it into single-intent queries — essentially a separation of concerns where a multi-faceted prompt is broken into individual dimensions of intent.
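The fanout model itself is proprietary; the toy mapping below only illustrates the contract: one multi-faceted prompt in, several single-intent queries out. The decomposition is hand-written, not produced by any model.

```python
# Hypothetical input/output of the query fanout step. Each fanout query
# isolates one dimension of intent from the original prompt.

prompt = "compare extractive and abstractive summarization and when to use each"

fanout_queries = [
    "what is extractive summarization",         # dimension 1: definition A
    "what is abstractive summarization",        # dimension 2: definition B
    "extractive vs abstractive summarization",  # dimension 3: comparison
    "when to use extractive summarization",     # dimension 4: applicability
]

for q in fanout_queries:
    print(q)
```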

For each fanout query, Google’s search index returns a ranked list of relevant results. A selection step narrows these down to a limited set, typically 5–20 sources per query.

Here’s where the extractive summarization happens: for each selected result, the system builds a grounding snippet relative to the specific query. Page content is chunked into sentences, each chunk is scored against the query, and the highest-scoring chunks are assembled into the final snippet — joined by ellipses (...) where non-contiguous. Because the snippet is query-dependent, the same page will yield different extractions for different fanout queries.
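The step above can be sketched end to end. Google’s actual scorer is unknown; plain token overlap with the query is a crude stand-in here, but the chunk-score-assemble shape and the ellipsis joins for non-contiguous selections match what the snippets show.

```python
# Sketch of snippet construction: split the page into sentences, score each
# against the query (token overlap as a stand-in scorer), keep the top-k,
# and join non-contiguous selections with "..." as seen in real snippets.
import re

def build_snippet(query: str, page_text: str, top_k: int = 3) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(s.lower().split())), i, s)
              for i, s in enumerate(sentences)]
    # Top-k by score, then restored to document order for assembly.
    keep = sorted(sorted(scored, reverse=True)[:top_k], key=lambda t: t[1])
    parts, prev = [], None
    for _, i, s in keep:
        if prev is not None and i != prev + 1:
            parts.append("...")
        parts.append(s)
        prev = i
    return " ".join(parts)

demo = ("Extractive summarization selects sentences from the source. "
        "It has a long history. "
        "Each sentence is scored against the query. "
        "Unrelated sections are skipped.")
print(build_snippet("how are sentences scored against the query", demo, top_k=2))
```

Because the scorer takes the query as input, re-running this with a different fanout query over the same page yields a different snippet, matching the query-dependence observed above.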

The complete set of grounding snippets across all sources is then supplied to the model as grounding context, alongside the user prompt, any attached media, and personalization signals (history, user data, location, time, etc.).

Once the model synthesizes its final answer, each generative claim is supported by one or more grounding sources. Attribution annotation is attached by the system using internal indexation logic — mapping each claim back to specific source sentences.
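Google’s internal indexation logic is not public; the sketch below shows one plausible shape for it: map each generated claim to its best-overlapping source sentence, keeping the overlap ratio as a rough confidence value.

```python
# Hypothetical attribution step: for a generated claim, find the source
# sentence with the highest Jaccard token overlap. Returns (-1, 0.0) when
# nothing overlaps at all.

def attribute(claim: str, source_sentences: list[str]) -> tuple[int, float]:
    c = set(claim.lower().split())
    best_i, best_score = -1, 0.0
    for i, s in enumerate(source_sentences):
        tokens = set(s.lower().split())
        score = len(c & tokens) / max(len(c | tokens), 1)  # Jaccard overlap
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score

src = ["Snippets are built per query.", "The sky is blue."]
print(attribute("snippets are built per query", src))
```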

The pipeline looks like this:

Annotated Content Example

Full Grounding Context Example

