Google uses dynamic retrieval to decide when Gemini models should use grounding. A prediction score and configurable threshold determine if a query needs search data.
When you ask Google's Gemini model a question, it wants to give you the most accurate and up-to-date answer possible. It does this through a process called grounding, which connects the model's responses to real-time information from Google Search.
But searching the web for every single query isn't practical. It adds lag time and increases costs, especially for simple questions that don't need fresh data. To solve this, Google uses a system called dynamic retrieval.
Before answering, the system evaluates your query and assigns it a prediction score between 0 and 1. A higher score means the query is more likely to benefit from a live search. Developers can set a threshold, which defaults to 0.3. If the query's score meets or exceeds that threshold, the system triggers Google Search to ground the response. If the score is lower, Gemini simply relies on its own pre-trained knowledge.
This smart, selective approach keeps costs down and response times fast, while ensuring you get the most accurate, grounded answers when they matter most.
Google’s Gemini models are designed to provide users with accurate, timely, and trustworthy responses. A key innovation in this process is grounding, the ability to enhance model responses by anchoring them to up-to-date information from Google Search. However, not every query benefits from grounding, and Google has implemented a smart mechanism to decide when to activate this feature.
Even when grounding is available, grounding every query can lead to unnecessary cost and latency. To tackle this, Google uses a dynamic retrieval configuration that evaluates each query before deciding whether to ground the response. This configuration assigns each prompt a prediction score, a value between 0 and 1, that estimates the likelihood a query will benefit from grounding.
“…the dynamic retrieval configuration assigns the prompt a prediction score, which is a floating point value between 0 and 1. The value is higher when a prompt is more likely to benefit from grounding. In their requests, developers can set a threshold for what scores should result in grounding (the default threshold value is 0.3).”
This score-driven approach allows developers to fine-tune when grounding should be applied. For instance, if a query involves recent events or requires highly accurate data, it is more likely to receive a higher prediction score and trigger grounding. Conversely, queries that rely on general knowledge may bypass grounding, reducing unnecessary processing overhead.
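As an illustration of where a developer might set that threshold, here is a minimal sketch that builds a request body as a plain Python dictionary. The field names (google_search_retrieval, dynamic_retrieval_config, mode, dynamic_threshold) follow the Gemini API request shape as best understood at the time of writing and should be checked against the current API reference. The helper names build_grounding_tool and build_request are hypothetical, and no network call is made.

```python
import json

# Default threshold as reported in the quoted passage above.
DEFAULT_THRESHOLD = 0.3


def build_grounding_tool(threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Return a tool entry that asks for dynamic retrieval at a given threshold.

    Field names are an assumption based on the Gemini API request shape;
    verify them against the current API reference before relying on them.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return {
        "google_search_retrieval": {
            "dynamic_retrieval_config": {
                "mode": "MODE_DYNAMIC",
                "dynamic_threshold": threshold,
            }
        }
    }


def build_request(prompt: str, threshold: float = DEFAULT_THRESHOLD) -> dict:
    """Assemble a request body with one user prompt and the grounding tool."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [build_grounding_tool(threshold)],
    }


if __name__ == "__main__":
    # A stricter threshold than the default: fewer prompts would be grounded.
    print(json.dumps(build_request("What changed in this week's release?", 0.5), indent=2))
```

Raising the threshold makes grounding rarer, and lowering it makes grounding more frequent. The sketch only shows where the value would sit in a request.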
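The prediction score is at the heart of the decision-making process: if the score meets or exceeds the threshold, the response is grounded with Google Search; if it falls below the threshold, the model answers from its own knowledge.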
This dynamic evaluation ensures that grounding is applied selectively, enhancing the model’s accuracy and relevance only when necessary.
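The threshold rule itself is simple enough to sketch in a few lines. The function below is a hypothetical stand-in for the decision, not Google's implementation. The prediction score is produced inside the service, so the scores used here are made-up values for illustration only.

```python
def should_ground(prediction_score: float, threshold: float = 0.3) -> bool:
    """Return True when a prompt's score meets or exceeds the threshold.

    Both values are expected to lie between 0 and 1, as described in the article.
    """
    for name, value in (("prediction_score", prediction_score), ("threshold", threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return prediction_score >= threshold


if __name__ == "__main__":
    # Hypothetical scores: a recent-events question and a general-knowledge one.
    examples = {
        "Who won last night's match?": 0.92,
        "What is the capital of France?": 0.08,
    }
    for prompt, score in examples.items():
        action = "ground with Google Search" if should_ground(score) else "answer from model knowledge"
        print(f"{prompt!r} (score {score}): {action}")
```

Under this rule, a threshold of 0 grounds every prompt, because every score is at least 0.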
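By using dynamic retrieval with a configurable threshold, Google avoids the cost and latency of searching for queries that do not need fresh data, while still grounding the responses that do.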
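To make the cost trade-off concrete, the sketch below counts what share of a batch of prompts would trigger search at different thresholds. The scores are invented for illustration, and grounded_fraction is a hypothetical helper. Real prediction scores come from the service and are not reproduced here.

```python
from typing import Sequence

# Made-up prediction scores for eight prompts; not real data.
HYPOTHETICAL_SCORES = [0.05, 0.12, 0.28, 0.30, 0.45, 0.62, 0.81, 0.97]


def grounded_fraction(scores: Sequence[float], threshold: float) -> float:
    """Return the share of prompts whose score meets or exceeds the threshold."""
    if not scores:
        return 0.0
    grounded = sum(1 for score in scores if score >= threshold)
    return grounded / len(scores)


if __name__ == "__main__":
    for threshold in (0.0, 0.3, 0.7):
        share = grounded_fraction(HYPOTHETICAL_SCORES, threshold)
        print(f"threshold={threshold:.1f}: {share:.1%} of prompts would be grounded")
```

A lower threshold sends more prompts to search, which raises cost and latency. A higher threshold reserves search for the prompts most likely to benefit.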
Google’s method for deciding whether to use Gemini grounding is a thoughtful balance between performance, cost, and response quality. By assigning a prediction score to each query and applying a configurable threshold, the dynamic retrieval system ensures that grounding is used judiciously, delivering richer and more accurate answers when they matter most.
Source: Google Developers Blog