Most text classification systems depend on a fixed set of labels. This is a major limitation for tasks like search query classification where label sets can be open or constantly changing. We present a method that classifies queries using arbitrary labels provided at inference. The system uses a dual encoder architecture: one encoder processes queries, the other processes label texts. Both are mapped into the same vector space. The model is trained to score the semantic compatibility between each query and each candidate label, using binary relevance targets for supervision. By framing classification as a matching problem, the model generalizes to labels not seen in training. Experiments show that this approach supports robust multi-label and zero-shot classification, making it practical for any scenario where new or domain-specific labels must be handled without retraining.
Fixed-label text classifiers dominate industrial practice, but real-world tasks often need open, domain-specific labels. Web search, digital marketing, and customer support all involve classification problems where the label set changes constantly. In search engine optimization (SEO) agencies, for example, teams are expected to deliver high-quality query intent classification for clients across dozens of industries. Each industry, and often each client, comes with its own taxonomy of intents, topics, and conversion goals. Building and retraining custom classifiers for each use case is wasteful, slow, and labor intensive. On the other hand, generalist classifiers, trained to cover all industries at once, tend to underperform on fine-grained, domain-specific queries because they cannot meaningfully distinguish subtle industry context and label nuance.
Recent advances in token classification, especially the GLiNER architecture, have shown that models can assign arbitrary labels to entities in text by modeling the semantic relationship between a span and a label description. Inspired by this, we propose a semantic matching approach for query classification, using arbitrary labels as textual inputs and framing the problem as semantic relevance prediction.
Practical Scenario: SEO at Scale
Consider an SEO agency tasked with intent classification for search queries across hundreds of clients. A law firm wants to classify queries as “Legal Advice”, “Attorney Profile”, “Appointment Booking”, or “FAQ”. An e-commerce retailer wants categories like “Product Inquiry”, “Shipping Policy”, “Returns”, or “In-Store Availability”. Meanwhile, a travel company asks for labels like “Flight Booking”, “Hotel Search”, “Travel Restrictions”, and “Visa Information”. Even within a single client, the label taxonomy may change month to month as new campaigns launch. Maintaining a separate classifier for each client and taxonomy quickly becomes impossible. Teams spend most of their time relabeling data and retraining models, which is slow and error-prone.
General-purpose classifiers, trained on pooled data across clients and label sets, also fail in practice. Their outputs are diluted by conflicting definitions and inconsistent coverage of industry-specific terms. They often miss subtle distinctions such as “Legal Consultation” versus “Attorney Referral”, or cannot adapt when the taxonomy changes.
Method
Our solution is a generalist, semantic classifier that accepts any set of candidate labels at inference. The architecture uses dual encoders, both based on state-of-the-art language models. One encoder processes the search query; the other processes each candidate label, expressed as a short phrase or full-sentence description. Both outputs are projected into a shared vector space using linear projection layers. The similarity between a query vector and each label vector is computed using cosine similarity. The model is trained to produce high similarity for relevant query-label pairs and low similarity for unrelated ones, using binary cross-entropy loss.
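The scoring step described above can be sketched in a few lines. This is a minimal toy sketch, not the production implementation: the projection matrices, embedding dimensions, and input vectors below are illustrative placeholders, and the real encoder outputs would come from the language-model backbones.

```python
import math

def project(vec, weights):
    """Linear projection into the shared space: one output per weight row (no bias, for brevity)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def cosine(u, v):
    """Cosine similarity between two vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def bce_loss(score, target):
    """Binary cross-entropy on a sigmoid-squashed similarity score."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# Toy 3-d encoder outputs projected into a shared 2-d space.
W_query = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
W_label = [[0.5, 0.5, 0.0], [0.0, 0.5, 1.0]]

q = project([0.2, 0.9, 0.4], W_query)   # query embedding
lab = project([0.1, 0.8, 0.3], W_label) # label embedding

sim = cosine(q, lab)                    # high for relevant pairs
loss = bce_loss(sim, target=1)          # a relevant pair is pushed toward score 1
```

Training minimizes this loss over many (query, label) pairs, pushing relevant pairs toward high cosine similarity and irrelevant pairs toward low similarity.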

During training, the model sees a rotating set of candidate labels, many of which are paraphrases or descriptions never seen before. The training set is built to maximize diversity: different domains, different industries, overlapping and fine-grained intent types, and both simple and descriptive label text. This setup teaches the model to focus on the underlying semantic compatibility, not just memorizing fixed label IDs.
Leveraging Modern LLM Technology
This method is directly inspired by GLiNER, which solves arbitrary-label entity recognition by treating labels as text and matching them to spans in context. Our classifier extends this paradigm to full queries and intent classification, but instead of token-level matching, we operate at the sentence level.

To maximize generalization and semantic capacity, we leverage Google’s latest large language model technologies. Gemma 3, an open model built from the same research and technology behind Gemini, is used as the encoder backbone. This brings the benefits of superior pretraining, high-quality tokenization, and transfer learning from Gemini-scale data and methods. Both query and label encoders are initialized from Gemma 3 weights, enabling robust understanding of a wide range of topics, domains, and phrasing. This LLM foundation is essential for handling open vocabularies and subtle semantic distinctions in label text.
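To use an LLM backbone as a sentence-level encoder, its token-level hidden states must be reduced to a single vector per input. One common choice is masked mean pooling, sketched below with toy numbers; the pooling strategy and dimensions here are our assumptions for illustration, and the real hidden states would come from the Gemma 3 backbone.

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            total = [t + x for t, x in zip(total, vec)]
            count += 1
    return [t / count for t in total]

# Three token vectors; the last position is padding and is excluded.
hs = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = mean_pool(hs, mask)  # -> [2.0, 3.0]
```

The same pooling is applied to both the query encoder and the label encoder before the projection into the shared space.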
Training Objective and Data
Each training example consists of a query, a list of candidate labels (which can change per example), and a subset of correct labels. The system constructs pairs of (query, label) and trains the model to score 1 for correct pairs and 0 for incorrect ones. By constantly varying the label set and including paraphrases, hard negatives, and descriptive label text, we force the model to learn general semantic compatibility, not just association with seen label IDs.
Example training instance:
{
  "query": "how to appeal a traffic ticket",
  "labels": ["Legal Advice", "Attorney Referral", "DIY Guide", "Case Law"],
  "correct_labels": ["Legal Advice", "DIY Guide"]
}
The model never assumes any fixed set of labels; at inference, new labels can be introduced by simply providing their text.
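A training instance of this shape expands into one binary supervision pair per candidate label. The sketch below follows the JSON field names from the example above; the helper name `expand_instance` is ours, introduced only for illustration.

```python
def expand_instance(instance):
    """Expand one training instance into (query, label, target) pairs."""
    query = instance["query"]
    correct = set(instance["correct_labels"])
    return [(query, label, 1 if label in correct else 0)
            for label in instance["labels"]]

instance = {
    "query": "how to appeal a traffic ticket",
    "labels": ["Legal Advice", "Attorney Referral", "DIY Guide", "Case Law"],
    "correct_labels": ["Legal Advice", "DIY Guide"],
}

pairs = expand_instance(instance)
# Each pair is then scored by the dual encoder and trained toward its target,
# e.g. ("...", "Legal Advice", 1) and ("...", "Attorney Referral", 0).
```

At inference the same machinery runs without targets: every provided label is scored against the query, so introducing a new label requires nothing more than writing its text.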
Results
We evaluate the model on synthetic and real-world SEO query datasets. For each test case, we present a new set of candidate labels, often containing paraphrases and domain-specific descriptions unseen in training. The model consistently outperforms generalist, fixed-label classifiers in accuracy and coverage, especially for previously unseen or rephrased label sets. Zero-shot and few-shot scenarios show that the model can handle label drift and evolving taxonomies without retraining or relabeling data.

In head-to-head tests, traditional classifiers fall apart when asked to distinguish between new intent categories or industry terms. Our method maintains performance, with only minor drops for rare or highly ambiguous cases.
Conclusion
Query classification with arbitrary labels is critical for many industries where label sets evolve rapidly. Legacy methods, both per-client retraining and generic generalist classifiers, are inefficient and inaccurate. By reframing the task as semantic query-to-label matching and leveraging dual encoders built from advanced LLM backbones, we enable robust, flexible classification for any label set, at any time. This approach, inspired by GLiNER and powered by Gemma 3, unlocks open-set classification for real-world, production-scale applications like SEO, marketing, and search, eliminating the need for constant retraining and manual curation of label sets.