Fan-Out Query Search Volume Prediction Using Deep Learning

A deep learning approach using a Query Demand Estimator to automatically predict search volume ranges for long-tail queries generated by a fan-out model.

Finding new search terms for your website is easy with modern language models, but figuring out which ones actually get traffic is a massive bottleneck. That is why we developed the Query Demand Estimator, or QDE.

The QDE is a deep learning model designed to automatically predict search volume ranges for millions of generated keywords. We trained the model using historical search data, grouping queries into twelve distinct volume buckets.

While predicting the exact volume bucket is challenging, our model gets it right, or very close, nearly fifty-five percent of the time. Because it is choosing from twelve possible ranges, this performance is far better than random guessing. It gives content teams a fast, reliable way to prioritize their search engine optimization efforts.

For queries that have never been seen before, the most promising approach is entity tracking. Since most searches are about specific people, products, or brands, we can predict search volume by analyzing how popular those entities are across Wikipedia, knowledge graphs, and news trends.

By combining deep learning with entity analysis, businesses can turn a massive list of raw keywords into a prioritized, actionable roadmap for organic growth.

While traditional keyword research tools provide valuable data, they often fall short in discovering truly novel or long-tail search query variations that a business might not yet rank for, or even be aware of. This is where our query fan-out model comes in: it uses advanced language models to generate a vast array of related search queries from existing organic queries.

https://dejan.ai/tools/fanout

However, generating a massive list of potential keywords creates a new challenge: how do you efficiently assess the search volume potential of these new, unproven queries? Manually checking each one is impractical. In this article we present a deep learning approach developed to automatically predict the search volume ranges for these fan-out queries, transforming a broad list into an actionable, prioritized asset.

The Challenge: Scaling Keyword Research

Content teams and SEO strategists constantly seek to expand their keyword footprint. Given a primary query like “AI SEO” and a target URL (e.g., dejan.ai), a fan-out generation model can suggest many diverse, yet related queries.

Here’s the exact output from the fan-out model for a single search query:

ai seo tools
ai powered seo tools
ai powered search engine optimization
ai search engine optimization
ai for search engine optimization
seo automation with ai
artificial intelligence for seo
best ai seo tools
ai for seo optimization
seo with artificial intelligence
best ai for seo
artificial intelligence seo tools
ai for seo ranking
ai for seo a/b testing
ai powered seo
artificial intelligence in seo
ai in search engine optimization
ai for seo agencies
ai for seo beginners
artificial intelligence for search engines
ai seo automation
ai for seo
AI-powered SEO tools
ai seo best practices
benefits of ai for SEO
ai powered keyword research
ai for seo content
ai powered search engine
ai seo tools 2025
ai website optimization
artificial intelligence seo
benefits of ai in seo
search engine optimization ai
ai powered seo services
AI for SEO automation
ai SEO tools comparison
ai seo optimization
best ai tools for seo
SEO with AI tools
improve seo with ai
ai in seo
ai powered seo software
machine learning for seo
ai seo examples
AI SEO implementation
automate seo with AI
ai powered website ranking
ai marketing automation
ai seo platforms
zendesk ai seo
artificial intelligence for website ranking
ai for seo tracking
ai seo algorithm
ai for digital marketing
ai for seo audit
artificial intelligence seo strategy
AI for SEO strategy
ai seo automation tools
dejan ai SEO
ai SEO agency
AI SEO services
best ai SEO software
website seo with ai
dejan ai seo expert
predictive seo ai
AI SEO software
AI for website seo
generate backlinks with ai
ai for seo reviews
website ranking automation with ai
google ai SEO
ai website traffic

While invaluable for identifying new opportunities, this explosion of data quickly becomes overwhelming. Each generated query ideally needs a search volume estimate to determine its potential value and prioritize content efforts. Relying on external tools for millions of queries is costly and time-consuming.

Query Demand Estimator

To address this, we developed a Query Demand Estimator (QDE) using a deep learning model. The core idea is to train a sequence classification model to categorize a given query into predefined search volume buckets.

A tool driven by the QDE model, trained for one specific industry.

Model Training

1. Data Preparation: The Ground Truth

The success of any supervised learning model hinges on the quality and quantity of its training data. Our approach involved:

Collecting Organic Performance Data: We aggregated historical search performance data (impressions and clicks) for millions of queries where our digital properties ranked well (positions 1-10). Ranking well implies that the impression data is a good proxy for actual search demand, as the content is visible to a significant portion of searchers.
Defining Volume Buckets: We established 12 distinct search volume ranges, from very low (“51-100” impressions) to very high (“200,001+” impressions). These ranges became our target labels.
Labeling Queries: Each query from our high-ranking dataset was assigned to its corresponding impression bucket, creating a (query_text, volume_label) pair dataset. For example, “dejan ai query fan-out tool” might be labeled as “501-1000”, while “top AI SEO agencies”

label_id,label_text
0,51-100
1,101-150
2,151-200
3,201-250
4,251-500
5,501-1000
6,1001-2000
7,2001-5000
8,5001-10000
9,10001-100000
10,100001-200000
11,200001+
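
The labeling step can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function name is hypothetical, the bucket edges are copied from the table above, and returning None for queries with fewer than 51 impressions is an assumption, since the table does not say how such queries were handled.

```python
from bisect import bisect_left
from typing import Optional

# Upper bounds of buckets 0..10, copied from the label table; bucket 11 is open-ended.
UPPER_BOUNDS = [100, 150, 200, 250, 500, 1000, 2000, 5000, 10000, 100000, 200000]
LABELS = [
    "51-100", "101-150", "151-200", "201-250", "251-500", "501-1000",
    "1001-2000", "2001-5000", "5001-10000", "10001-100000",
    "100001-200000", "200001+",
]
MIN_IMPRESSIONS = 51  # lowest impression count covered by the table


def impressions_to_label_id(impressions: int) -> Optional[int]:
    """Return the label_id for an impression count, or None below the lowest bucket.

    Dropping queries under 51 impressions is an assumption of this sketch.
    """
    if impressions < MIN_IMPRESSIONS:
        return None
    # bisect_left returns the index of the first upper bound >= impressions;
    # anything above 200000 falls past the list and lands in bucket 11.
    return bisect_left(UPPER_BOUNDS, impressions)


label_id = impressions_to_label_id(750)
print(label_id, LABELS[label_id])
```

Keeping the edges in one sorted list means the same table drives both labeling and any later decoding of predictions back to range text.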

2. Model Architecture and Training

We leveraged a pre-trained transformer model, specifically mDeBERTa-v3-base, known for its strong performance across various natural language understanding tasks, including classification. The choice of mDeBERTa also offers multilingual capabilities, which is advantageous for global businesses.

The model was fine-tuned as a sequence classifier:

Input: A search query.
Output: One of the 12 predefined search volume buckets.

The training process involved:

Tokenization: Converting text queries into numerical tokens using the mDeBERTa tokenizer, with a fixed maximum input length (MAX_LENGTH=256).
Batching and Epochs: Training in batches (BATCH_SIZE=16) over several epochs (EPOCHS=3) to allow the model to learn from the data efficiently.
Optimization: Using the AdamW optimizer with a low learning rate (LR=2e-5) and weight decay (WEIGHT_DECAY=0.01) to limit overfitting.
Evaluation: Regular evaluation on a held-out validation set to monitor performance using metrics like accuracy, precision, recall, and F1-score. Weights & Biases (WandB) was used for experiment tracking.

MODEL_NAME = "microsoft/mdeberta-v3-base"
WANDB_PROJECT = "mdeberta-finetune"
NUM_LABELS = 12
MAX_LENGTH = 256
EPOCHS = 3
BATCH_SIZE = 16
LR = 2e-5
WEIGHT_DECAY = 0.01
WARMUP_RATIO = 0.1
LOGGING_STEPS = 10
EVAL_STEPS = 200
SAVE_TOTAL_LIMIT = 10
OUTPUT_DIR = "./finetuned-mdeberta"
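
To make these hyperparameters concrete, the sketch below derives the number of optimizer steps and the warmup length from a dataset size. Several things here are assumptions rather than details from the training run: the dataset size of 160,000 examples is hypothetical, the run is assumed to use a single device without gradient accumulation, and the schedule is assumed to be linear warmup followed by linear decay.

```python
import math

# Values copied from the configuration above.
EPOCHS = 3
BATCH_SIZE = 16
LR = 2e-5
WARMUP_RATIO = 0.1


def schedule_summary(num_train_examples: int) -> dict:
    """Derive step counts (assumes one device and no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


def lr_at(step: int, total_steps: int, warmup_steps: int) -> float:
    """Learning rate at a step under an ASSUMED linear warmup and linear decay."""
    if step < warmup_steps:
        fraction = step / warmup_steps
    else:
        fraction = max(0, total_steps - step) / max(1, total_steps - warmup_steps)
    return LR * fraction


summary = schedule_summary(160_000)  # hypothetical dataset size
print(summary)
print(lr_at(summary["warmup_steps"], summary["total_steps"], summary["warmup_steps"]))
```

With EVAL_STEPS=200, a run of this hypothetical size would be evaluated roughly 150 times, which is why the step count is worth checking before training starts.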

3. Integration into the Fan-Out Workflow

Once trained, the QDE model was integrated into our fan-out query generation system. As the fan-out model generated new query variations for a given URL and seed query, each new variation was immediately passed to the QDE model for a volume prediction. This allowed the system to:

Generate an extensive list of relevant keywords.
Assign an estimated search volume range and a confidence score to each generated keyword.
Store these predictions alongside the fan-out query and its original source, making the data directly actionable.
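
The article does not specify how the confidence score is computed. One common choice, assumed here, is to apply a softmax to the classifier's 12 output logits and report the highest probability. The sketch below shows that calculation in plain Python; the function name and the example logits are hypothetical.

```python
import math
from typing import List, Tuple

LABELS = [
    "51-100", "101-150", "151-200", "201-250", "251-500", "501-1000",
    "1001-2000", "2001-5000", "5001-10000", "10001-100000",
    "100001-200000", "200001+",
]


def predict_bucket(logits: List[float]) -> Tuple[str, float]:
    """Softmax the logits and return (top label, its probability).

    Using the top softmax probability as 'confidence' is an assumption
    of this sketch, not a detail confirmed by the article.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per volume bucket")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


example_logits = [0.0] * 12   # made-up logits for illustration
example_logits[5] = 3.0
print(predict_bucket(example_logits))
```

Storing the probability next to the predicted range lets a downstream report filter out low-confidence rows instead of treating every prediction equally.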

Validation: How Accurate Are the Predictions?

Validation is crucial. To assess the QDE model’s real-world utility, we compared its predictions against a gold standard: a subset of queries from a held-out dataset, representing terms where our properties consistently ranked in positions 1-10. For these queries, impression data closely reflects actual search volume.

The validation process involved:

Extracting the QDE model’s volume predictions for all fan-out queries.
Identifying queries that overlapped with our high-ranking ground truth dataset.
Comparing the QDE predicted_volume bucket with the actual_volume_bucket from our ground truth.

Key Findings:

Exact Match Accuracy: 23.31%

Initially, this might seem modest. It means that for 23.31% of the overlapping queries, the model predicted the exact search volume bucket.

Combined Accuracy (Exact + Adjacent): 54.80%

This metric is far more representative of the model’s practical value. It indicates that for 54.80% of the queries, the model’s prediction was either exactly correct OR within one adjacent search volume bucket (e.g., predicting “501-1000” when the actual was “251-500” or “1001-2000”). This level of accuracy is highly beneficial for prioritizing.
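
Both metrics are simple to compute once predictions and ground truth are expressed as label IDs (0 to 11). The sketch below is illustrative only: the function name is hypothetical and the eight label pairs are made-up toy data, not the validation set.

```python
from typing import Sequence, Tuple


def bucket_accuracy(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[float, float]:
    """Return (exact-match accuracy, exact-or-adjacent accuracy) over label IDs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(actual, predicted))
    exact = sum(a == p for a, p in pairs)
    # "Adjacent" means the predicted bucket is at most one step from the actual one.
    within_one = sum(abs(a - p) <= 1 for a, p in pairs)
    return exact / len(pairs), within_one / len(pairs)


# Toy data for illustration only.
actual = [5, 5, 4, 6, 0, 11, 7, 3]
predicted = [5, 4, 4, 8, 1, 9, 7, 2]
print(bucket_accuracy(actual, predicted))  # (0.375, 0.75)
```

Note that adjacency is measured in bucket steps, not in impressions, so being one bucket off at the top of the scale spans a far wider numeric range than at the bottom.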

What the numbers mean

Exact Match Accuracy (23%): Out of all predictions, only about 1 in 4 were exactly correct.
Combined Accuracy (55%): If we also count predictions that were very close (off by just one “volume bucket”), the model got it right more than half the time.

Why 50% isn’t “coin flip” odds

This isn’t a yes/no problem. The model isn’t picking between just 2 outcomes (like heads vs. tails). Instead, it has to choose among 12 different possible volume ranges (labels).

If the model were guessing randomly, each guess would have about a 1 in 12 chance (~8%) of being correct.
Getting ~23% exact match accuracy is much better than random chance—it means the model is finding real patterns.
The ~55% combined accuracy shows that even when it misses, it’s often close to the right bucket, not completely wrong. That’s useful for practical decision-making.
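
The random-guess baseline can be worked out exactly for the 12 buckets in the label table. The sketch below assumes guesses are drawn uniformly and, for the adjacent case, that true labels are spread evenly across buckets. The real class distribution is not given in the article, so a baseline that always predicts the most common bucket could score differently.

```python
from fractions import Fraction
from typing import Tuple

NUM_BUCKETS = 12  # from the label table


def uniform_guess_baselines(num_buckets: int) -> Tuple[Fraction, Fraction]:
    """Expected exact and exact-or-adjacent accuracy of uniform random guessing.

    The adjacent figure ASSUMES true labels are uniform over the buckets.
    """
    exact = Fraction(1, num_buckets)
    # For each true bucket, count the guesses that land within one bucket of it.
    # Edge buckets have a single neighbour, inner buckets have two.
    hits = sum(
        sum(1 for guess in range(num_buckets) if abs(guess - true) <= 1)
        for true in range(num_buckets)
    )
    within_one = Fraction(hits, num_buckets * num_buckets)
    return exact, within_one


exact, within_one = uniform_guess_baselines(NUM_BUCKETS)
print(float(exact), float(within_one))
```

Under these assumptions the baselines are about 8.3% exact and about 23.6% exact-or-adjacent, against the reported 23.31% and 54.80%.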

How to read the confusion matrix

The diagonal shows “perfect hits.” Those are the exact matches.
The cells right next to the diagonal are “near misses” (predicted slightly higher or lower than reality).
Cells far from the diagonal mean the model got it very wrong. These are the cases we want to reduce.

Insights from the Confusion Matrix:

The confusion matrix (a table showing actual vs. predicted labels) provided deeper insights:

Directional Correctness: The predictions clustered strongly around the diagonal, confirming the model’s ability to broadly categorize queries into low, medium, and high-volume ranges.
Systematic Biases:
Under-prediction in Low-Mid Range: The model tended to predict a slightly lower volume bucket (e.g., 51-100) for queries that actually fell into the next higher categories (101-150, 151-200). This is a useful bias, as it means potentially under-valued queries might be identified, encouraging further investigation.
Slight Over-prediction in Mid-Range: Conversely, some mid-range queries were occasionally over-predicted by one or two buckets, which can help flag terms as potentially more valuable than initially perceived.
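
One way to surface biases like these is to build the confusion matrix and then average the signed bucket offset for each actual bucket. The sketch below is a minimal illustration with hypothetical function names and made-up label pairs. Negative values indicate under-prediction, positive values over-prediction.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int],
                     num_labels: int = 12) -> List[List[int]]:
    """matrix[a][p] counts queries with actual bucket a and predicted bucket p."""
    matrix = [[0] * num_labels for _ in range(num_labels)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def mean_signed_error(matrix: List[List[int]]) -> Dict[int, float]:
    """Average (predicted - actual) per actual bucket, skipping empty rows."""
    result = {}
    for a, row in enumerate(matrix):
        count = sum(row)
        if count:
            result[a] = sum((p - a) * n for p, n in enumerate(row)) / count
    return result


# Toy data for illustration only.
actual = [1, 1, 2, 2, 6, 6]
predicted = [0, 1, 0, 2, 7, 8]
matrix = confusion_matrix(actual, predicted)
print(mean_signed_error(matrix))  # {1: -0.5, 2: -1.0, 6: 1.5}
```

Reading the result row by row shows where the model leans low and where it leans high, which is harder to see from the overall accuracy figures alone.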

A Powerful Tool for SEO Strategy

The deep learning-powered QDE model, integrated with fan-out query generation, transforms a previously manual and time-consuming process into an automated, scalable, and data-driven one. While not always achieving perfect exact-bucket accuracy, its ability to correctly or nearly correctly classify query search volume over 50% of the time provides an invaluable, actionable signal.

This system empowers SEO teams to:

Rapidly identify and prioritize millions of new keyword opportunities.
Uncover long-tail queries that traditional tools might miss.
Strategically plan content and optimize existing pages with a clearer understanding of potential demand, moving beyond guesswork with the power of deep learning.

The future of SEO keyword research is increasingly augmented by AI, allowing businesses to be more agile, comprehensive, and ultimately, more successful in capturing organic search demand.

State of the Art – Leading Research

Entity popularity signals offer the most promising foundation for zero-shot query volume prediction, according to a synthesis of recent machine learning and information retrieval literature. While direct research on predicting volume for never-before-seen queries remains sparse, converging evidence from entity importance estimation, query performance prediction, knowledge graph embeddings, and emerging LLM prompt analysis provides a viable methodological path. The core insight: because queries fundamentally seek information about entities—people, products, brands, concepts—modeling entity-level demand can generalize to arbitrary query formulations containing those entities.

Entity popularity prediction provides the cold-start foundation

The most directly relevant work comes from Van Gysel et al. (SIGIR 2020), who developed entity popularity prediction for virtual assistants at Apple. Their system forecasts which entities will trend before substantial query history exists, achieving 20% relative error reduction on emerging entity recognition. The key finding: temporal patterns of entity mentions in external signals (news, social media) correlate with future search demand, enabling predictions for entities with minimal query history.

Yang et al.’s Named Entity Topic Model (NETM) from Knowledge-Based Systems (2020) provides a theoretical framework for cold-start popularity prediction. The model assigns each named entity a “popularity-gain distribution over semantic topics”—essentially learning that certain entity-topic combinations drive engagement. For novel content, the model aggregates popularity gains from constituent entities across their topic distributions, predicting volume without any prior behavioral data.

The KB-PPN architecture (KDD 2018 Deep Learning Day) demonstrates how knowledge graph embeddings enable volume transfer between entities. By linking items to Freebase entities and encoding them via TransE embeddings, an LSTM with attention can identify similar entities in the knowledge graph and transfer their popularity dynamics. This directly addresses zero-shot prediction: a new product launch can borrow volume patterns from similar products identified via embedding proximity.

Approach	Key Signal	Cold-Start Capability	Venue
Van Gysel et al.	Historical entity interactions + temporal patterns	High	SIGIR 2020
NETM	Entity × topic popularity matrix	High	KBS 2020
KB-PPN	TransE entity embeddings + KB neighbors	Medium-High	KDD 2018
GENI	GNN on KG predicate-aware importance	Medium	KDD 2019

Knowledge graphs encode entity importance implicitly

Park et al.’s GENI (KDD 2019) directly addresses the question: “How can we estimate the importance of nodes in a knowledge graph?” Their graph neural network aggregates importance scores (not just embeddings) across predicate-aware relationships, outperforming PageRank by 5-17% on NDCG@100 for importance prediction. The architecture captures that entities connected to other important entities via meaningful relations inherit importance—a signal correlating with search demand.

Thalhammer and Rettinger (ESWC 2016) found that combining Wikipedia’s internal link structure (PageRank) with page-view data provides the strongest entity importance estimates. Wikipedia page views serve as a direct search demand proxy, while link-based importance captures “objective relevance” independent of current interest. Their methodology enables entity importance estimation even for entities lacking search engine volume data.

Entity salience offers another avenue. Google’s work on entity salience (Dunietz & Gillick, EACL 2014) introduced automatic corpus generation using document-abstract pairs—entities appearing in abstracts are deemed salient. Features like first mention position, mention frequency, and headline presence predict which entities will be searched. The NYT-Salience dataset with millions of training examples is publicly available at google-research-datasets/nyt-salience, enabling training of salience predictors.

The recent GUM-SAGE dataset (ACL Findings 2025) advances this with graded salience scores (1-5) based on summary inclusion across 5 annotators and 12 genres. This graded approach better models varying levels of search interest than binary salience. Code and data are available at github.com/jessicaxylin/GUM-SAGE.

Query performance prediction offers transferable methodology

The information retrieval community has extensively studied Query Performance Prediction (QPP)—predicting query difficulty before or after retrieval. While QPP targets retrieval effectiveness rather than volume, its methods transfer remarkably well.

Cronen-Townsend et al.’s Clarity Score (SIGIR 2002) measures the KL-divergence between a query language model and the collection language model. High clarity indicates focused, unambiguous queries; low clarity suggests ambiguity. The connection to volume: ambiguous queries may have higher aggregate volume (multiple intents summed) while highly specific queries have lower volume but clearer intent. Query specificity features from QPP can inform volume estimation.
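
The idea can be illustrated with a simplified, pre-retrieval variant. The original Clarity Score estimates the query language model from retrieved documents. The sketch below instead uses the query's own term frequencies, which is a deliberate simplification. The function name, the add-one smoothing, and the four-document toy collection are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def simplified_clarity(query: str, collection: List[str]) -> float:
    """KL divergence (in bits) between the query's terms and the collection's terms."""
    coll_counts = Counter(tok for doc in collection for tok in doc.lower().split())
    coll_total = sum(coll_counts.values())
    vocab_size = len(coll_counts)
    query_counts = Counter(query.lower().split())
    query_total = sum(query_counts.values())

    score = 0.0
    for term, count in query_counts.items():
        p_query = count / query_total
        # Add-one smoothing (an assumption) keeps unseen terms from causing division by zero.
        p_coll = (coll_counts[term] + 1) / (coll_total + vocab_size + 1)
        score += p_query * math.log2(p_query / p_coll)
    return score


# Toy collection for illustration only.
collection = ["ai seo tools", "best ai seo tools", "ai for seo", "zendesk ai seo"]
print(simplified_clarity("ai seo", collection))
print(simplified_clarity("zendesk seo", collection))
```

In this toy collection the query containing the rarer term scores higher, matching the intuition that specific queries diverge more from the background distribution.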

Carmel and Yom-Tov’s synthesis book (2010) provides a comprehensive taxonomy: pre-retrieval predictors use only query features (IDF statistics, linguistic features), while post-retrieval methods use search results. For novel query volume prediction, pre-retrieval features are essential—they require no behavioral data.

Modern neural approaches show promise. BERT-QPP (Arabzadeh et al., CIKM 2021) fine-tunes BERT for query performance prediction, demonstrating that contextualized embeddings capture query characteristics correlating with retrieval outcomes. Deep-QPP (Datta et al., WSDM 2022) uses 2D convolution over query-document interaction matrices. These architectures could be adapted for volume prediction by replacing retrieval effectiveness targets with volume labels.

Query representation determines prediction architecture

Three representation paradigms dominate:

Semantic embeddings via Sentence-BERT enable similarity-based volume transfer. The intuition: if a novel query is semantically similar to known queries with volume data, transfer their volumes weighted by similarity. Nishikawa et al. (NAACL 2025 Industry) advanced this with user-behavior-driven contrastive learning at Yahoo Japan—training embeddings where queries leading to the same click or appearing in the same session are pulled together. This captures behavioral intent rather than surface lexical similarity.
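
Similarity-based transfer can be sketched as a weighted nearest-neighbour vote. The example below is an illustration under stated assumptions: the two-dimensional vectors stand in for real sentence embeddings, the function names are hypothetical, and the choice of k=3 neighbours with a similarity-weighted vote over buckets is one of several reasonable designs.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transfer_bucket(novel_vec: Sequence[float],
                    known: List[Tuple[Sequence[float], int]],
                    k: int = 3) -> int:
    """Pick a volume bucket by similarity-weighted vote over the k nearest known queries."""
    if not known:
        raise ValueError("need at least one known query")
    scored = sorted(
        ((cosine(novel_vec, vec), bucket) for vec, bucket in known),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for similarity, bucket in scored:
        votes[bucket] += max(similarity, 0.0)  # ignore negative similarities
    return max(votes, key=votes.get)


# Toy embeddings paired with known bucket IDs, for illustration only.
known = [([1.0, 0.0], 5), ([0.9, 0.1], 5), ([0.0, 1.0], 2), ([0.1, 0.9], 2)]
print(transfer_bucket([0.8, 0.2], known))  # 5
```

In practice the vectors would come from an embedding model and the neighbour search from an approximate index, but the voting logic stays the same.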

Entity-based features leverage named entity recognition to extract persons, organizations, locations, and products from queries. Features include entity type, entity popularity (Wikipedia pageviews), entity recency (trending vs. evergreen), and entity embeddings from knowledge graphs. Queries containing trending celebrities or new product launches can be flagged for higher predicted volume based on entity-level signals alone.

Hybrid representations combining embeddings, entity features, and lexical statistics (query length, average IDF) perform best in production. Query length correlates strongly with volume—shorter queries tend to have higher volume (head queries) while longer, more specific queries populate the tail.

Recommended feature vector:
SBERT embedding (768-dim)
Entity types present (one-hot)
Entity popularity scores (continuous)
Query length, avg IDF
Query type (navigational/informational/transactional)
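
A feature vector like this can be assembled by simple concatenation. The sketch below is hypothetical throughout: the entity type list, the function name, the idea of passing popularity as a per-type dictionary, and the three-value stand-in for a 768-dimensional embedding are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Sequence

ENTITY_TYPES = ["PERSON", "ORG", "PRODUCT", "LOCATION"]  # hypothetical type inventory
QUERY_TYPES = ["navigational", "informational", "transactional"]


def build_features(embedding: Sequence[float],
                   entity_popularity: Dict[str, float],
                   query: str,
                   idf: Dict[str, float],
                   query_type: str) -> List[float]:
    """Concatenate embedding, entity, lexical, and query-type features into one list."""
    tokens = query.lower().split()
    # One-hot flags for which entity types were found in the query.
    entity_one_hot = [1.0 if t in entity_popularity else 0.0 for t in ENTITY_TYPES]
    # Popularity score per entity type (0.0 when that type is absent).
    popularity = [entity_popularity.get(t, 0.0) for t in ENTITY_TYPES]
    avg_idf = sum(idf.get(tok, 0.0) for tok in tokens) / len(tokens) if tokens else 0.0
    type_one_hot = [1.0 if query_type == t else 0.0 for t in QUERY_TYPES]
    return [*embedding, *entity_one_hot, *popularity,
            float(len(tokens)), avg_idf, *type_one_hot]


features = build_features(
    embedding=[0.1, 0.2, 0.3],              # stand-in for a 768-dim embedding
    entity_popularity={"ORG": 0.7},          # made-up popularity score
    query="zendesk ai seo",
    idf={"zendesk": 4.0, "ai": 1.0, "seo": 1.0},  # made-up IDF values
    query_type="informational",
)
print(len(features))  # 16
```

Keeping each feature group in a fixed position makes it straightforward to ablate one group at a time when testing which signals actually help.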

LLM prompt volume prediction is nascent but datasets exist

The emergence of conversational AI has created new query surfaces. LMSYS-Chat-1M (ICLR 2024) provides 1 million real ChatGPT/LLaMA/Vicuna conversations across 25 models, revealing that prompt distributions follow power-law patterns similar to web search. WildChat-4.8M extends this with demographic metadata, showing geographic heterogeneity (21% US, 15% Russia, 10% China) and task distribution (creative writing dominates, followed by analysis and coding).

Critically, no published research exists on predicting novel LLM prompt volume or transferring web search patterns to prompt prediction. This represents a significant research gap. The datasets enable such work: BERTopic analysis of LMSYS-Chat-1M identified 29+ coherent topics, providing a taxonomy for prompt volume modeling.

Yelp’s engineering team demonstrated practical query volume prediction: they found query distributions follow power laws enabling pre-computation of LLM responses for 95% of traffic. This validates that prompt/query volume is predictable enough for industrial applications.

Transfer learning across search domains remains underexplored

While cross-domain recommendation research provides architectural patterns—DSAM uses shared LSTMs with attention for preference transfer, CDTM addresses feature dimensional heterogeneity—no work specifically addresses transferring volume models from web search to e-commerce search to LLM prompts.

Domain generalization frameworks (Wang et al., IEEE TKDE 2022) suggest approaches: domain-invariant representation learning, meta-learning for few-shot adaptation, and self-supervised pretraining. The recipe would be: pretrain query embeddings on web search volume prediction, then fine-tune on the target domain (e-commerce, LLM prompts) with limited labeled data.

Google Trends preprocessing research (arXiv 2024) offers practical methodology: hierarchical clustering groups semantically similar queries to overcome data sparsity, combined with correlation-based filtering. This semantic grouping approach directly applies to novel query volume estimation—cluster novel queries with known queries, then estimate volume from cluster characteristics.

Available resources for implementation

Datasets enabling entity-centric volume prediction:

Dataset	Description	Access
NYT-Salience	Millions of entity salience labels	github.com/google-research-datasets
GUM-SAGE	Graded entity salience (12 genres)	github.com/jessicaxylin/GUM-SAGE
WikiPopular	Cold-start web traffic with text	Recent WWW paper
LMSYS-Chat-1M	LLM conversation logs	huggingface.co/datasets/lmsys
WildChat-4.8M	ChatGPT conversations + metadata	huggingface.co/datasets/allenai
Wikipedia Pageviews	Direct popularity signals	dumps.wikimedia.org

Key code resources:

Sentence-Transformers library (sbert.net) for query embeddings
spaCy and BERT-NER for entity extraction
TransE implementations in PyKEEN for entity embeddings
GENI implementation for GNN-based entity importance

Research gaps and recommended directions

Confirmed gaps in the literature:

No direct zero-shot query volume models exist—most work targets content popularity or query performance, not explicit volume estimation for unseen queries
Entity-to-query volume transfer is undertheorized—while entity importance predicts entity-seeking query volume, the mapping from entity demand to specific query formulation demand lacks formal treatment
Multi-entity query composition is unexplored—queries containing multiple entities require compositional volume estimation methods
Cross-domain volume transfer from web search to e-commerce to LLM prompts has no published research despite strong industrial motivation
Temporal dynamics of novel entity volume (predicting when an emerging entity will peak in search demand) combines entity popularity forecasting with time-series modeling, a combination rarely studied

The most promising research direction combines entity importance from knowledge graphs with semantic similarity to known queries: extract entities from a novel query, estimate entity-level importance via KG signals, identify semantically similar known queries, then fuse these signals in a learned model. This entity-centric architecture leverages the key insight that entities are the fundamental units of search demand, enabling generalization to arbitrary query formulations about those entities.

The path to predicting volume for unseen queries runs through entities. Entity popularity prediction from knowledge graphs, Wikipedia signals, and content salience provides cold-start capability. Query performance prediction offers proven pre-retrieval features. Semantic embeddings enable similarity-based volume transfer from known queries. The missing piece—direct volume prediction models combining these signals—represents both the primary research gap and the clearest opportunity. Industrial applications at Apple, Yahoo Japan, and Yelp validate the approach’s viability; academic formalization and public benchmarks remain the field’s next frontier.

Want one for your industry?

If you’d like a custom QDE model trained for your own website or client, please apply below. This type of model training is best suited for websites with at least 100K queries, ideally 1M. We’ll evaluate your dataset and advise whether it’s suitable for model training.

Expressions of Interest

Dan Petrovic · Aug 30, 13:09