Our Machine Learning Models

We believe in small, dedicated models trained on the highest quality data. Each model we train does one thing only and does it really well.

Interested in hearing how we can transform your SEO using machine learning techniques?

AI Content Detection Model

Binary classification model for detecting AI-generated vs human-written text

A fine-tuned DeBERTa-v3 model trained to distinguish between organic (human-written) and AI-generated content. It uses class-weighted training to handle imbalanced datasets and is optimized for high precision in content authenticity detection.

Key Features

  • Binary classification – Organic vs AI-generated content detection
  • DeBERTa-v3 architecture – State-of-the-art transformer for text understanding
  • Class-weighted training – Handles imbalanced datasets effectively
  • High precision – Optimized for content authenticity verification
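
As a rough illustration of class-weighted training, the sketch below derives inverse-frequency class weights and applies them to a cross-entropy loss. The weighting scheme, the function names and the toy data are assumptions made for illustration, not the settings used to train this model.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted negative log-likelihood, normalised by the total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)

# Hypothetical imbalanced toy set: 0 = organic, 1 = AI-generated.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = inverse_frequency_weights(labels)

# Hypothetical predicted probabilities [p(organic), p(AI)] for each sample.
probs = [[0.9, 0.1]] * 8 + [[0.6, 0.4]] * 2
loss = weighted_cross_entropy(probs, labels, weights)
print(weights, round(loss, 4))
```

With these weights, each example from the rare class contributes more to the loss than each example from the common class, so the model is not rewarded for simply predicting the majority label.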

Use Cases

  • Content moderation – Detect AI-generated spam or fake content
  • Academic integrity – Identify AI-assisted writing in submissions
  • Publishing verification – Ensure content authenticity for publications
  • SEO quality control – Filter AI-generated content in content strategies

Universal Search Query Classifier

Generalist, Open‑Set Classification for Any Label Taxonomy

Multi-label text classification for search queries with arbitrary label support. The Universal Query Classifier is a specialized model for classifying search queries into multiple intent categories simultaneously. Unlike traditional single-label classifiers, this model supports threshold-based multi-label assignment and works with any custom label set.

Key Features

  • Multi-label classification – Assigns multiple relevant labels per query
  • Arbitrary label support – Works with any custom label set and descriptions
  • Threshold-based assignment – Configurable confidence thresholds for label assignment
  • Special token format – Uses [QUERY], [LABEL_NAME], and [LABEL_DESCRIPTION] tokens for structured input
  • Multiple model sizes – Available in X-Small, Small, Base, and Large variants
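
A minimal sketch of how the structured input and threshold-based assignment could fit together. The ordering and spacing of the special tokens, the example labels and the scores are all assumptions; the model itself is not called here.

```python
def build_input(query, label_name, label_description):
    """Join the three fields with the special tokens (assumed ordering and spacing)."""
    return (
        f"[QUERY] {query} "
        f"[LABEL_NAME] {label_name} "
        f"[LABEL_DESCRIPTION] {label_description}"
    )

def assign_labels(scores, threshold=0.5):
    """Keep every label whose score meets the threshold, highest score first."""
    kept = [(name, score) for name, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]

# Hypothetical custom label set with descriptions.
label_set = {
    "commercial": "The searcher is considering a purchase.",
    "brand": "The query mentions a specific brand.",
}
query = "best running shoes under 100"
inputs = [build_input(query, name, desc) for name, desc in label_set.items()]

# Hypothetical per-label scores standing in for model output.
scores = {"commercial": 0.91, "brand": 0.12}
print(inputs[0])
print(assign_labels(scores, threshold=0.5))
```

Raising the threshold trades recall for precision: fewer labels are assigned, but each one is assigned with more confidence.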

Use Cases

  • SEO intent analysis – Classify search queries by commercial intent, brand awareness, etc.
  • Content strategy – Understand query intent to optimize content targeting
  • Search advertising – Categorize keywords for campaign optimization
  • Customer support – Route queries based on intent classification

Model Variants

  • X-Small – Fast inference for real-time applications
  • Small – Balanced speed and accuracy
  • Base – Recommended for most use cases
  • Large – Highest accuracy for complex classification tasks

Classic Query Intent Classifier

Multi-label search query classification model developed by Dejan AI. The model is designed to be deployed in an automated pipeline capable of classifying search query intent for large volumes of search queries from common data sources such as ad campaigns and organic search tools and platforms.

Classification Labels

LABEL_0: ‘Commercial’
LABEL_1: ‘Non-Commercial’
LABEL_2: # Unused
LABEL_3: # Unused
LABEL_4: ‘Informational’
LABEL_5: ‘Navigational’
LABEL_6: ‘Transactional’
LABEL_7: ‘Commercial Investigation’
LABEL_8: ‘Local’
LABEL_9: ‘Entertainment’
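
The label map above can be applied to raw predictions along these lines. This is a sketch that assumes predictions arrive as label/score dictionaries (the shape a Hugging Face text-classification pipeline returns) and that a 0.5 threshold is appropriate; both are assumptions rather than documented behaviour of this model.

```python
# Label ids and names as listed above; the unused ids are left out on purpose.
ID2LABEL = {
    "LABEL_0": "Commercial",
    "LABEL_1": "Non-Commercial",
    "LABEL_4": "Informational",
    "LABEL_5": "Navigational",
    "LABEL_6": "Transactional",
    "LABEL_7": "Commercial Investigation",
    "LABEL_8": "Local",
    "LABEL_9": "Entertainment",
}

def decode(predictions, threshold=0.5):
    """Map raw label ids to names, dropping unused ids and low-scoring labels."""
    return [
        ID2LABEL[p["label"]]
        for p in predictions
        if p["label"] in ID2LABEL and p["score"] >= threshold
    ]

# Hypothetical model output for a single query.
predictions = [
    {"label": "LABEL_0", "score": 0.93},
    {"label": "LABEL_2", "score": 0.88},  # unused id, ignored
    {"label": "LABEL_6", "score": 0.71},
    {"label": "LABEL_8", "score": 0.04},
]
print(decode(predictions))
```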

Base Models:

LinkBERT

LinkBERT is a fine-tuned version of Google’s BERT model, designed to predict natural link placement within web content. This binary classification model excels in identifying distinct token ranges that web authors are likely to choose as anchor text for links. By analysing never-before-seen texts, LinkBERT can predict areas within the content where links might naturally occur, effectively simulating web author behaviour in link creation.

LinkBERT is positioned as a powerful tool for content creators, SEO specialists, and webmasters, offering unparalleled support in optimizing web content for both user engagement and search engine recognition. Its predictive capabilities not only streamline the content creation process but also offer insights into the natural integration of links, enhancing the overall quality and relevance of web content.

Use Cases

  • Spam and inorganic SEO detection – Helps identify unnatural link patterns, contributing to the detection of spam and inorganic SEO tactics.
  • Anchor text suggestion – Acts as a mechanism during internal link optimization, suggesting potential anchor texts to web authors.
  • Evaluation of existing links – Assesses the naturalness of link placements within existing content, aiding in the refinement of web pages.
  • Link placement guide – Offers guidance to link builders by suggesting optimal placement for links within content.
  • Anchor text idea generator – Provides creative anchor text suggestions to enrich content and improve SEO strategies.
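
To make the idea of predicted token ranges concrete, the sketch below groups consecutive tokens flagged as link-worthy into candidate anchor-text spans. The whitespace tokenisation and the 0/1 flags are hypothetical stand-ins for LinkBERT's actual tokenizer and per-token output.

```python
def anchor_spans(tokens, flags):
    """Group consecutive flagged tokens into (start, end_exclusive, text) spans."""
    spans, start = [], None
    # A trailing 0 acts as a sentinel so a span ending on the last token is closed.
    for i, flag in enumerate(list(flags) + [0]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    return spans

# Hypothetical sentence and per-token predictions (1 = likely anchor text).
tokens = "Read our guide to internal linking before you start".split()
flags = [0, 0, 0, 0, 1, 1, 0, 0, 0]
print(anchor_spans(tokens, flags))
```

Each span can then be offered to an author as an anchor text suggestion, or compared against existing links to judge how natural their placement is.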

Sentiment

Multi-label sentiment classification model developed by Dejan Marketing. The model is designed to be deployed in an automated pipeline capable of classifying sentiment for thousands (or even millions) of text chunks, or as part of a scraping pipeline.

Classification Labels

0: “very positive”,
1: “positive”,
2: “somewhat positive”,
3: “neutral”,
4: “somewhat negative”,
5: “negative”,
6: “very negative”
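
For large volumes of text, a pipeline would typically feed the model in batches and map the class ids above back to their names. The sketch below shows that flow with a placeholder classifier; `stand_in_classifier`, the batch size and the sample chunks are hypothetical and only illustrate the shape of such a pipeline.

```python
from itertools import islice

# Class ids and names as listed above.
SENTIMENT_LABELS = {
    0: "very positive", 1: "positive", 2: "somewhat positive", 3: "neutral",
    4: "somewhat negative", 5: "negative", 6: "very negative",
}

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk

def classify_in_batches(chunks, classify_batch, batch_size=32):
    """Run a batch classifier over all chunks and map class ids to label names."""
    labels = []
    for batch in batched(chunks, batch_size):
        labels.extend(SENTIMENT_LABELS[class_id] for class_id in classify_batch(batch))
    return labels

def stand_in_classifier(batch):
    """Hypothetical placeholder for the real model: always predicts class 3."""
    return [3 for _ in batch]

chunks = ["Great service.", "It arrived on time.", "Would not buy again."]
print(classify_in_batches(chunks, stand_in_classifier, batch_size=2))
```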

Sources of Training Data

Synthetic data generated with Llama 3.

Query Form Quality Classifier

We build on the work of Manaal Faruqui and Dipanjan Das from the Google AI Language team to train a classifier that identifies well-formed search queries. Our model offers a 10% improvement over Google’s classifier by utilising the ALBERT architecture instead of an LSTM.

Practical Application

With an accuracy of 80%, the model is production-ready and has already been deployed in Dejan AI’s query processing pipeline. Its role is to help identify query expansion candidates by flagging ambiguous queries retrieved via the Google Search Console API.

Most search queries are ambiguous, making it difficult to classify intent and decide how to optimise for them. Query expansion helps, but only if you know which queries to expand. This is where our model comes in. Take it for a spin and try proper questions vs raw keyword queries to see the model in action.
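
Flagging expansion candidates can be as simple as thresholding the well-formedness score. In the sketch below, the scores, the 0.5 cut-off and the sample queries are hypothetical; in practice the scores would come from the classifier and the queries from a Search Console export.

```python
def expansion_candidates(scored_queries, threshold=0.5):
    """Return queries whose well-formedness score falls below the threshold."""
    return [query for query, score in scored_queries if score < threshold]

# Hypothetical (query, well-formedness score) pairs.
scored_queries = [
    ("how do i fix a leaking tap", 0.92),
    ("tap leak", 0.18),
    ("what is the best way to descale a kettle", 0.88),
    ("kettle descale vinegar", 0.27),
]
print(expansion_candidates(scored_queries))
```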

Model Quality

Our robust model validation process ensures model quality for most common classification and natural language processing tasks.

Metrics

  • Recall
  • Precision
  • Accuracy
  • F1
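
For reference, the four metrics can be computed from binary predictions as follows. This is a generic sketch using the standard definitions, with made-up labels; it is not our validation code.

```python
def binary_metrics(y_true, y_pred):
    """Compute recall, precision, accuracy and F1 for 0/1 labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}

# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```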

Custom Models

Our team can work with you to design and train your very own industry-, language- or task-specific model.

Example

Bulgarian Search Query Intent

This model is fine-tuned from mDeBERTa V3 for search query intent classification in the Bulgarian language. It predicts one of five intent categories:

  • COMMERCIAL_INVESTIGATION – Queries with a purchase intent but requiring additional research.
  • INFORMATIONAL – Queries seeking knowledge or facts.
  • LOCAL – Queries related to local services or locations.
  • NAVIGATIONAL – Queries aiming to reach a specific website or service.
  • TRANSACTIONAL – Queries with a direct intent to complete an action (e.g., purchase, reservation).
