Implicit Queries in AI Search


Back in 2015 I wrote about Google’s reliance on user-behaviour signals for ranking purposes. In that article I already covered their use of implicit signals, but now there’s an update!

While investigating Google’s grounding pipeline (the system that feeds web content to Gemini before it generates an answer), I came across the same patent most of us have already looked at (US11769017B1), titled “Generative summaries for search results”, filed March 2023 and assigned to Google LLC. Most of it describes the AI Overview pipeline we already know: select search result documents, extract content, feed it to an LLM, generate a summary, linkify portions back to sources. Standard grounding architecture.

But buried in the system description are two components that had escaped my attention: the Context Engine and the Implied Input Engine.

The patent describes a client-side system architecture with named components. Here’s what it outlines, in Google’s own words:

The client device 110 can include a context engine 113 that is configured to determine a context (e.g., current or recent context) of the client device 110 and/or of a user of the client device 110.

This context engine monitors:

  • Current or recent interactions on the device
  • Device location
  • User profile data
  • Which application is active in the foreground
  • Content currently being rendered on screen
  • Current state of a query session

Then it feeds all of this into the next component:

The client device 110 can include an implied input engine 114 that is configured to: generate an implied query independent of any user input directed to formulating the implied query; to submit an implied query, optionally independent of any user input that requests submission of the implied query; and/or to cause rendering of result(s) for an implied query, optionally independent of any user input that requests rendering of the result(s).

Read that again. The system:

  1. Watches what you’re doing on your device
  2. Decides what you might want to know
  3. Formulates a search query you never typed
  4. Submits it silently
  5. Generates an AI summary from the results
  6. Pushes it to search, without you asking
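The six steps above can be sketched in code. This is a minimal illustration of the flow, not Google’s implementation: the class, function, and field names are mine, and the query-formulation logic is a deliberately crude stand-in for whatever the patent’s engines actually do.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the client-side flow the patent describes:
# a context snapshot (context engine) feeds an implied-query generator
# (implied input engine), which formulates a query the user never typed.

@dataclass
class DeviceContext:
    foreground_app: str        # which application is active
    on_screen_content: str     # content currently being rendered
    profile_interests: list = field(default_factory=list)  # user profile data

def generate_implied_query(ctx):
    """Formulate a query independent of any user input (step 3)."""
    if ctx.profile_interests:
        # e.g. profile data indicating interest in patents -> "patent news"
        return f"{ctx.profile_interests[0]} news"
    if ctx.on_screen_content:
        # Fall back to whatever is currently rendered on screen
        return ctx.on_screen_content
    return None

ctx = DeviceContext("news_app", "", ["patent"])
query = generate_implied_query(ctx)  # "patent news", never typed by the user
```

From here the system would submit `query` silently, summarise the results with the LLM, and push the summary (steps 4–6).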

The Example Google Gives

The patent provides a concrete example:

The implied query can be “patent news” based on profile data indicating interest in patents, the implied query periodically submitted, and a corresponding NL based summary result automatically rendered. It is noted that the provided NL based summary result can vary over time in view of e.g., presence of new/fresh search result document(s) over time.

So the system profiles your interests, generates a standing query, resubmits it at intervals, and auto-renders updated AI summaries as new content appears on the web. A personalised, recurring, AI-curated news feed, driven entirely by inferred intent.

It Gets More Specific

The context engine doesn’t just know what app you’re using. It knows what you’re looking at inside the app:

The context engine 113 can determine a current context based on which application is active in the foreground of the client device 110, a current or recent state of the active application, and/or content currently or recently rendered by the active application.

And it uses this to rewrite your actual queries or generate entirely new ones:

A context determined by the context engine 113 can be utilized, for example, in supplementing or rewriting a query that is formulated based on user input, in generating an implied query (e.g., a query formulated independent of user input), and/or in determining to submit an implied query and/or to render result(s) (e.g., an NL based summary) for an implied query.

The patent even describes the push mechanism:

The implied input engine 114 can automatically push result(s) to the implied query to cause them to be automatically rendered or can automatically push a notification of the result(s), such as a selectable notification that, when selected, causes rendering of the result(s).
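The quoted passage describes two delivery modes: auto-render the result, or push a selectable notification. A rough sketch of that branch, with a made-up confidence gate (the patent doesn’t say what decides between the two modes; the threshold here is my assumption):

```python
def push_results(results, confidence, auto_render_threshold=0.8):
    """Sketch of the two push modes in the patent: auto-render the
    result directly, or deliver a selectable notification that renders
    the result when tapped. The gating criterion is a placeholder."""
    if confidence >= auto_render_threshold:
        return {"mode": "auto_render", "payload": results}
    return {"mode": "notification", "payload": results, "selectable": True}
```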

What This Means for Search

This isn’t a search engine anymore. It’s an anticipatory information system. The shift is fundamental:

Traditional search: User has intent → types query → receives results.

This patent: Device observes behaviour → system infers intent → generates query → retrieves results → pushes AI summary.

The user never searches. The system decides what information to deliver, when to deliver it, and how to present it, all wrapped in an LLM-generated natural language summary grounded in real search results.

The Pipeline Behind It

For those following our grounding research, this patent describes the full architecture behind what we’ve been reverse-engineering from the API side:

  1. SRD Selection Engine — picks which search result documents to include
  2. LLM Selection Engine — chooses which model to use (informational, creative, or even image generation)
  3. LLM Input Engine — generates the prompt from selected content
  4. LLM Response Generation Engine — produces the summary
  5. Response Linkifying Engine — maps portions of the summary back to source documents using embedding distance
  6. Response Confidence Engine — assigns confidence scores to summary portions

This maps directly to the grounding metadata structure we’ve observed: source indices, snippets, confidence scores, and the redirect URLs through vertexaisearch.cloud.google.com.
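To make the mapping concrete, here is roughly what that grounding metadata looks like from the API side. Field names approximate what the responses expose rather than an official schema, and the redirect URI is a placeholder:

```python
# Approximate shape of the grounding metadata observed in API responses.
# Not an official schema; the URI below is a placeholder, not a real redirect.
grounding_metadata = {
    "grounding_chunks": [
        # One entry per selected search result document (SRD Selection Engine)
        {"web": {"uri": "https://vertexaisearch.cloud.google.com/grounding-api-redirect/PLACEHOLDER",
                 "title": "example.com"}},
    ],
    "grounding_supports": [
        # One entry per linkified summary portion (Response Linkifying Engine)
        {"segment": {"start_index": 0, "end_index": 42},
         "grounding_chunk_indices": [0],      # maps the portion back to sources
         "confidence_scores": [0.91]},        # Response Confidence Engine output
    ],
}
```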

The Confidence System

The patent also describes the confidence annotation system:

A portion with a high confidence measure can be annotated in a first color (e.g., green), a portion with a medium confidence measure can be annotated in a second color (e.g., orange), and a portion with a low confidence measure can be annotated in a third color (e.g., red).

And it uses confidence to decide whether to even show you the AI summary at all, or fall back to traditional search results:

If confidence measure(s) for portion(s) and/or a confidence measure for the NL based summary as a whole satisfies upper threshold(s) most indicative of confidence, the NL based summary can be rendered responsive to the query and without any initial rendering of any additional search results.

When confidence is high, search results are suppressed entirely. Only the AI summary appears.
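The colour annotation and the render-or-suppress decision can be sketched together. The patent gives no numeric thresholds, so the values below are placeholders, as is the choice to gate on the weakest portion:

```python
def annotate_confidence(score):
    """Colour-code a summary portion as the patent describes.
    Threshold values are my placeholders; the patent specifies none."""
    if score >= 0.85:
        return "green"   # high confidence
    if score >= 0.5:
        return "orange"  # medium confidence
    return "red"         # low confidence

def render_decision(portion_scores, upper=0.85, lower=0.5):
    """Decide whether the summary replaces, accompanies, or yields to
    traditional search results. Gating on the weakest portion is my
    assumption about how per-portion scores roll up."""
    overall = min(portion_scores)
    if overall >= upper:
        return "summary_only"          # search results suppressed entirely
    if overall >= lower:
        return "summary_plus_results"
    return "results_only"              # fall back to classic search
```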

The Evolving Summary

One more detail worth flagging. The patent describes a system where the AI summary evolves as you interact with search results:

The system generates a revised NL based summary based on processing revised input using the LLM or an additional LLM. The revised input reflects the occurrence of the interaction(s) with the search result document(s).

Click on a source about router IP addresses? The next version of the summary assumes you already know that and skips ahead to the next step. The LLM prompt is literally revised to include instructions like “assuming the user already knows X”.

The summary isn’t static. It’s a living document that rewrites itself based on your behaviour within the session.
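The session feedback loop amounts to rewriting the LLM input as the user clicks through sources. A minimal sketch, with the instruction wording being mine rather than a quote from the patent:

```python
def revise_prompt(base_prompt, clicked_topics):
    """Sketch of the revised-input mechanism: the LLM prompt is rewritten
    to reflect which source documents the user has already interacted
    with, so the next summary skips what they've read."""
    if not clicked_topics:
        return base_prompt
    known = ", ".join(clicked_topics)
    return (f"{base_prompt}\n"
            f"Assume the user already knows about: {known}. "
            f"Skip ahead to the next step.")
```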

Here’s what I take away from this:

  1. Google has patented the infrastructure for proactive AI search. Not reactive, proactive. The system generates and submits queries on your behalf based on behavioural signals.
  2. The grounding pipeline is designed to suppress traditional results when confidence is high enough. AI summaries aren’t a complement to search results, they’re architected to replace them.
  3. Content selection feeds through embedding distance, not keyword matching. If your content doesn’t land close to query embeddings in vector space, it won’t be selected as grounding material, regardless of how well it ranks in traditional search.
  4. The summary adapts to user behaviour in real-time. This creates a feedback loop where what content gets surfaced depends on what the user has already consumed.
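Point 3 is worth sketching, because it is the part most SEO intuition gets wrong. Selection here is vector proximity, not keyword rank; the distance cutoff below is a placeholder I chose for illustration:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def select_grounding_docs(query_embedding, docs, max_distance=0.4):
    """Sketch of embedding-based selection: a document is only eligible
    as grounding material if its embedding lands close enough to the
    query embedding, regardless of its traditional ranking."""
    return [d for d in docs
            if cosine_distance(query_embedding, d["embedding"]) <= max_distance]

docs = [
    {"id": "close_match", "embedding": [1.0, 0.0]},   # near the query in vector space
    {"id": "high_ranker", "embedding": [0.0, 1.0]},   # ranks well, but far away
]
selected = select_grounding_docs([1.0, 0.0], docs)    # only "close_match" survives
```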

