Search Query Quality Classifier

A search query classifier using ALBERT architecture to identify well-formed queries with 80% accuracy, improving upon Google's LSTM-based model by 10%.

Building on research from Google AI, a new search query classifier is helping make sense of how people search online. The model identifies well-formed search queries and extends the work of Manaal Faruqui and Dipanjan Das.

By using a modern ALBERT architecture instead of an older Long Short-Term Memory (LSTM) network, the model achieves an accuracy of 80%. That is a 10% improvement over Google's original classifier.

Because of this high accuracy, the model is already fully deployed in Dejan AI's query processing pipeline. Its main job is to flag ambiguous queries retrieved from the Google Search Console, helping the system find better candidates for query expansion.

To train the model, developers used Google’s original dataset alongside data provided by Owayo. It is a production-ready solution that represents a significant step forward in understanding the quality of search queries.

We build on the work of Manaal Faruqui and Dipanjan Das from the Google AI Language team to train a classifier that identifies well-formed search queries. Our model offers a 10% improvement over Google’s classifier by using the ALBERT architecture instead of an LSTM. With an accuracy of 80%, the model is production-ready and has already been deployed in Dejan AI’s query processing pipeline. Its role is to help identify query expansion candidates by flagging ambiguous queries retrieved via the Google Search Console API.
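The flagging step can be sketched as a simple threshold filter. This is a minimal illustration, not Dejan AI’s production code: the function name `flag_expansion_candidates`, the `score_fn` callable (anything that maps a query to a well-formedness score between 0 and 1) and the 0.5 cut-off are all assumptions made for the example.

```python
from typing import Callable, Iterable


def flag_expansion_candidates(
    queries: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the article
) -> list[tuple[str, float]]:
    """Return (query, score) pairs whose score falls below the threshold.

    Low-scoring queries are treated as ambiguous and therefore as
    candidates for query expansion. Results are sorted lowest score first.
    """
    scored = [(query, score_fn(query)) for query in queries]
    flagged = [(query, score) for query, score in scored if score < threshold]
    return sorted(flagged, key=lambda pair: pair[1])
```

Keeping the scorer as a plain callable means the same filter works with the classifier itself or with a stub during testing.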

Model Files

The model can be downloaded as a zip file.

Archive:  model_query_quality_classifier.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
      792  08-31-2024 03:48   model/config.json
 46743912  08-31-2024 03:48   model/model.safetensors
      301  08-31-2024 03:48   model/special_tokens_map.json
   760289  08-31-2024 03:48   model/spiece.model
     1304  08-31-2024 03:48   model/tokenizer_config.json
---------                     -------
 47506598                     5 files
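As a quick sanity check after downloading, the archive’s contents can be compared against the listing above. The following is a minimal sketch using only Python’s standard library; the helper name `missing_model_files` and the local archive path in the usage comment are assumptions for illustration.

```python
import zipfile

# File names taken from the archive listing above.
EXPECTED_FILES = {
    "model/config.json",
    "model/model.safetensors",
    "model/special_tokens_map.json",
    "model/spiece.model",
    "model/tokenizer_config.json",
}


def missing_model_files(archive) -> set[str]:
    """Return the expected files that are absent from the zip archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return EXPECTED_FILES - set(zf.namelist())


# Usage (hypothetical local path; adjust to wherever the zip was saved):
# missing = missing_model_files("model_query_quality_classifier.zip")
# print("missing files:", sorted(missing) or "none")
```

An empty result means all five files are present and the archive can be unzipped for loading.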

Training Data

For training, we use Google’s training dataset, supplemented in part by data provided by Owayo.

Model Demo

You can see the model in action by comparing natural-language questions with keyword-based queries.
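The same comparison can be run locally. The sketch below assumes the Hugging Face `transformers`, `torch` and `sentencepiece` packages are installed, that the archive has been unzipped to a `model/` directory, that the checkpoint has a two-class sequence classification head, and that class index 1 means "well-formed". None of these details are confirmed by the article, so check `config.json` before relying on them. The function name `load_scorer` and the example queries are illustrative only.

```python
import math
from typing import Callable, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_scorer(
    model_dir: str = "model",    # assumed unzip location
    well_formed_index: int = 1,  # assumed label order; verify in config.json
) -> Callable[[str], float]:
    """Load the classifier and return a function mapping a query to a score."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    def score(query: str) -> float:
        inputs = tokenizer(query, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits[0].tolist()
        return softmax(logits)[well_formed_index]

    return score


# Usage (requires the unzipped model directory):
# score = load_scorer("model")
# for query in ["how do I wash a cycling jersey?", "cycling jersey wash"]:
#     print(f"{score(query):.3f}  {query}")
```

If the assumptions hold, the natural-language question should receive the higher score of the two.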

Dan Petrovic · Aug 31, 04:07

I have started using your query expansion classifier. However, I am unable to see the benefit of the well-formedness factor in the result. Could you please elaborate on this?

Nishant · Questions · Sep 02, 05:06