- Can your favourite LLM accurately estimate query search volumes? No.
- Does it have a general idea? Yes.
We put Google’s Gemini to the test by comparing its keyword volume predictions to actual search data from Google Search Console (GSC). Here’s what we learned and how we did it.
How We Collected and Compared the Data
- Data Sources
- Predicted volumes: For each search query, we asked Google Gemini for a monthly search volume estimate and keyword difficulty, specifying the country for localization.
- Actual volumes: We extracted the real number of impressions for the same queries from our verified GSC property, aggregating over a full month.
- Automation Pipeline
- Queries were selected from GSC data using Python, focusing on top-performing keywords and filtering out outliers or brand terms as needed.
- For each query, the Gemini API was called to generate search volume and difficulty estimates.
- Results were automatically stored in a database, along with actual impressions, clicks, and positions from GSC.
- The analysis and all visualizations were produced using custom scripts and dashboards.
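The selection and storage steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not our actual scripts. GSC rows are assumed to be already exported as dictionaries with `query`, `impressions`, `clicks` and `position` keys. `select_queries`, `run_pipeline` and the `estimate` callable are hypothetical names, and `estimate` stands in for whatever wrapper calls the Gemini API and parses its volume and difficulty answer. Storage uses Python's built-in `sqlite3` module purely for illustration.

```python
import sqlite3
from typing import Callable, Iterable, Optional

# Hypothetical row shape: {"query": str, "impressions": int, "clicks": int, "position": float}
GscRow = dict


def select_queries(rows: Iterable[GscRow], brand_terms: list,
                   top_n: int = 500, max_impressions: Optional[int] = None) -> list:
    """Keep the top_n queries by impressions, dropping brand terms and (optionally) outliers."""
    brand = [b.lower() for b in brand_terms]
    kept = [
        r for r in rows
        # Brand filter: drop any query containing a brand term (case-insensitive).
        if not any(b in r["query"].lower() for b in brand)
        # Optional outlier cap on impressions.
        and (max_impressions is None or r["impressions"] <= max_impressions)
    ]
    kept.sort(key=lambda r: r["impressions"], reverse=True)
    return kept[:top_n]


def run_pipeline(rows: Iterable[GscRow], brand_terms: list,
                 estimate: Callable[[str, str], tuple],
                 country: str, db: sqlite3.Connection) -> int:
    """Estimate volume/difficulty for each selected query and store it next to the GSC metrics."""
    db.execute(
        """CREATE TABLE IF NOT EXISTS results (
               query TEXT PRIMARY KEY, country TEXT,
               predicted_volume INTEGER, predicted_difficulty INTEGER,
               impressions INTEGER, clicks INTEGER, position REAL)"""
    )
    selected = select_queries(rows, brand_terms)
    for r in selected:
        # Placeholder for the Gemini call: returns (monthly volume, difficulty) for a country.
        volume, difficulty = estimate(r["query"], country)
        db.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?, ?, ?)",
            (r["query"], country, volume, difficulty,
             r["impressions"], r["clicks"], r["position"]),
        )
    db.commit()
    return len(selected)
```

Passing the estimator in as a function keeps the model call swappable, so the same storage and analysis code can be tested with a fake estimator before any API quota is spent.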

What Did We Find?
1. Direct Correlation Is Weak-to-Moderate
- Pearson correlation (linear): ~0.41
- Spearman correlation (rank order): ~0.57
Gemini’s estimates capture the relative ordering of queries (which are bigger, which are smaller) better than their actual magnitudes, so the relationship is not reliably linear.
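To make the difference between the two coefficients concrete, here is a small pure-Python sketch (the function names are ours, for illustration only). Spearman’s coefficient is Pearson’s coefficient computed on ranks, with tied values given their average rank, so a prediction that gets the order right but the magnitudes wrong scores high on Spearman and lower on Pearson. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same statistics.

```python
from math import sqrt


def pearson(x, y):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank-order (Spearman) correlation: Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, predictions of 10, 100, 1,000 and 10,000 against actual values of 1, 2, 3 and 4 give a Spearman coefficient of 1 (the ordering is perfect) but a Pearson coefficient of only about 0.82.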
2. Bucket Accuracy: More Forgiving, Still Limited
- We grouped both the Gemini estimates and the GSC impressions into five quantile-based buckets (very low, low, medium, high, very high), so each bucket holds roughly a fifth of the queries.
- Exact bucket match: Only 35% of predictions landed in the same bucket as reality (random assignment would give about 20%).
- Exact or adjacent bucket: ~69% were at least “close” (the right bucket or one away), against roughly 52% by chance.
- Accuracy varied by bucket: The medium and high buckets were matched more often than the extremes (very low and very high).
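The bucket comparison can be sketched as below. This is an illustration under assumptions, not our exact analysis code. Buckets are equal-frequency, with each value’s bucket taken from its position in sorted order, so tied values may be split across a boundary. Bucket 0 is “very low” and bucket 4 is “very high”, and per-bucket accuracy is measured against the actual (GSC) bucket. `quantile_buckets` and `bucket_agreement` are hypothetical names. pandas’ `qcut` is a common alternative for the bucketing step.

```python
BUCKET_NAMES = ["very low", "low", "medium", "high", "very high"]


def quantile_buckets(values, k=5):
    """Assign each value to one of k equal-frequency buckets (0 = lowest, k-1 = highest)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    buckets = [0] * n
    for pos, i in enumerate(order):
        # Position in sorted order scaled to 0..k-1; ties are broken by sort order.
        buckets[i] = pos * k // n
    return buckets


def bucket_agreement(pred, actual, k=5):
    """Return (exact match rate, exact-or-adjacent rate, per-actual-bucket exact rate)."""
    bp, ba = quantile_buckets(pred, k), quantile_buckets(actual, k)
    n = len(bp)
    exact = sum(p == a for p, a in zip(bp, ba)) / n
    near = sum(abs(p - a) <= 1 for p, a in zip(bp, ba)) / n
    per_bucket = {}
    for b in range(k):
        idx = [i for i in range(n) if ba[i] == b]
        # Share of queries whose actual bucket is b that the prediction also put in b.
        per_bucket[BUCKET_NAMES[b] if k == 5 else b] = (
            sum(bp[i] == b for i in idx) / len(idx) if idx else None
        )
    return exact, near, per_bucket
```

With five equal-sized buckets, a predictor that ignored the query entirely would match exactly about 20% of the time and land within one bucket about 52% of the time (13 of the 25 bucket pairs), which is the baseline the 35% and ~69% figures should be read against.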

3. Visuals Make It Clear
- Scatterplots of predicted volume against actual impressions show wide dispersion around only a loose trend.
- Bucket heatmaps and per-bucket bar charts show that the model is often “in the ballpark” but frequently misses the exact bucket.
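A bucket heatmap of this kind is a confusion matrix: one row per actual (GSC) bucket, one column per predicted (Gemini) bucket. The sketch below builds that matrix from bucket labels that have already been computed. The function names are hypothetical, and the row-normalisation is one reasonable choice for display, not necessarily the one used in our dashboards. The resulting nested list could be passed to a plotting call such as matplotlib’s `imshow` to draw the heatmap.

```python
def bucket_confusion(pred_buckets, actual_buckets, k=5):
    """Count matrix: rows = actual (GSC) bucket, columns = predicted (Gemini) bucket."""
    matrix = [[0] * k for _ in range(k)]
    for p, a in zip(pred_buckets, actual_buckets):
        matrix[a][p] += 1
    return matrix


def row_normalise(matrix):
    """Turn each row of counts into shares, so every actual bucket sums to 1."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def per_bucket_hit_rate(normalised):
    """The diagonal of the row-normalised matrix: the per-bucket bar chart values."""
    return [normalised[b][b] for b in range(len(normalised))]
```

Mass on the diagonal corresponds to exact matches; mass on the cells directly beside it corresponds to the “one bucket away” predictions counted as close.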
Why the Discrepancy?
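- GSC impressions and keyword volumes measure different things: Impressions count only the searches in which your site appeared in the results, so they depend on your ranking, page coverage, and seasonality, while search volume counts all searches for a query.
- AI predictions reflect general, web-scale patterns, not your site’s visibility.
- Noise in both sources: GSC can undercount (for example, it omits anonymized queries from query-level data), Gemini can overgeneralize, and search volumes themselves are inherently rough estimates.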
Practical Takeaways
- AI keyword volumes are directionally useful: They help spot “big” vs. “small” topics, but don’t expect precision.
- Use buckets, not raw numbers: Focus on opportunity tiers (e.g., “high potential” keywords), not exact volume predictions.
- Always verify with your real data: AI tools are a shortcut for ideation, not a replacement for analytics.