We believe in small, dedicated models trained on the highest quality data. Each model we train does one thing only and does it really well.

Interested in hearing how we can transform your SEO using machine learning techniques?

AI Content Detection Model

Binary classification model for detecting AI-generated vs human-written text

A fine-tuned DeBERTa-v3 model trained to distinguish between organic (human-written) and AI-generated content. It uses class-weighted training to handle imbalanced datasets and is optimized for high precision in content authenticity detection.

Key Features

Binary classification – Organic vs AI-generated content detection
DeBERTa-v3 architecture – State-of-the-art transformer for text understanding
Class-weighted training – Handles imbalanced datasets effectively
High precision – Optimized for content authenticity verification
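
To illustrate what class-weighted training means in practice, here is a minimal sketch. It assumes inverse-frequency ("balanced") weights and a weighted cross-entropy loss; the weighting scheme actually used for this model is not stated here, and the function names and the toy dataset are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of the per-example negative log-likelihood.

    probs   -- list of dicts mapping class name -> predicted probability
    labels  -- list of true class names
    weights -- dict mapping class name -> class weight
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy imbalanced dataset (illustrative only): 8 organic samples, 2 AI-generated.
labels = ["organic"] * 8 + ["ai"] * 2
weights = balanced_class_weights(labels)
```

With this scheme the rarer class receives the larger weight, so mistakes on it contribute more to the loss than mistakes on the majority class.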

MODEL DEMO

Use Cases

Content moderation – Detect AI-generated spam or fake content
Academic integrity – Identify AI-assisted writing in submissions
Publishing verification – Ensure content authenticity for publications
SEO quality control – Filter AI-generated content in content strategies

Link Spam Algorithm

A link spam algorithm that can identify money links on any page. If our algorithm can spot your link, so can Google’s, which means your links are either being devalued or pose a risk of penalty.

Batch Processing

Batch analysis is only available to our paying clients.

If you’re interested in batch processing, please get in touch.

Public tool:

Link Spam Algorithm Demo

Search Query Fan-Out

The model generates diverse, contextually relevant search query variations for a given URL and seed query.

By reformulating queries, it helps capture a broader range of search intents, improving organic search visibility and click-through rates.

This is especially valuable for SEO, content optimization, and keyword targeting, enabling discovery of traffic-driving variations that may not surface through manual keyword research.

Inference

High Effort (Deep Analysis) – Stochastic sampling in large batches with varied seeds, high temperature/top-p diversity, duplicate suppression, and log-probability/length scoring. Produces up to 200 unique candidates, sorted by quality.

Quick Fan-Out – Beam search with diversity penalty and no sampling. Small set (default 10) of deterministic, diverse expansions generated quickly.
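
The duplicate suppression and scoring step of the High Effort mode can be sketched roughly as follows. This is an illustration under assumptions, not the production code: it assumes each candidate arrives with its per-token log-probabilities, that "log-probability/length scoring" means the mean token log-probability, and that duplicates are matched after lowercasing and collapsing whitespace. All names are hypothetical.

```python
def normalize(query):
    """Case- and whitespace-insensitive key used for duplicate suppression."""
    return " ".join(query.lower().split())


def rank_candidates(candidates, limit=200):
    """Deduplicate candidates and sort by length-normalised log-probability.

    candidates -- iterable of (query_text, token_logprobs) pairs
    limit      -- maximum number of unique candidates to return
    """
    best = {}
    for text, token_logprobs in candidates:
        if not token_logprobs:
            continue  # skip empty generations
        score = sum(token_logprobs) / len(token_logprobs)
        key = normalize(text)
        # Keep the highest-scoring surface form of each duplicate.
        if key not in best or score > best[key][1]:
            best[key] = (text, score)
    ranked = sorted(best.values(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up samples: the second entry duplicates the first after normalisation.
samples = [
    ("best running shoes", [-0.2, -0.4, -0.3]),
    ("Best  Running Shoes", [-0.5, -0.5, -0.5]),
    ("running shoes for flat feet", [-0.1, -0.2, -0.1, -0.3, -0.3]),
]
ranked = rank_candidates(samples)
```

Dividing by length keeps longer reformulations from being penalised simply for having more tokens.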

Search Query Fan-Out Demo

Training ran for 70 hours, making five passes over 15 million training samples.

Rigorously following all available steps in Google’s query fan-out process, we train a search query reformulation model with optimizations and improvements geared towards SEO use. The process has two steps. First, we create a custom-architecture Gemma 3 1B model for feature extraction and use it to augment our training data by interpolating between the query and the target documents, traversing the vector embedding space. Second, we combine Google Search Console data (query and URL pairs) with this synthetic data and fine-tune a large multilingual T5 model for query expansion.
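
As a rough picture of what interpolating between a query and a target document in embedding space can look like, the sketch below walks a straight line between two vectors. Linear interpolation is an assumption made for illustration; the traversal method actually used, and how intermediate points are turned back into training queries, are not described here. The function name and the two-dimensional toy vectors are hypothetical.

```python
def interpolate(query_vec, doc_vec, steps=5):
    """Return evenly spaced points on the line from query_vec to doc_vec."""
    if len(query_vec) != len(doc_vec):
        raise ValueError("vectors must have the same dimensionality")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the query, 1.0 at the document
        points.append([(1 - t) * q + t * d for q, d in zip(query_vec, doc_vec)])
    return points


# Toy 2-D embeddings; real embeddings would have hundreds of dimensions.
query_embedding = [0.0, 1.0]
document_embedding = [1.0, 0.0]
path = interpolate(query_embedding, document_embedding, steps=3)
```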

Universal Search Query Classifier

Generalist, Open‑Set Classification for Any Label Taxonomy

Multi-label text classification for search queries with arbitrary label support. The Universal Query Classifier is a specialized model for classifying search queries into multiple intent categories simultaneously. Unlike traditional single-label classifiers, this model supports threshold-based multi-label assignment and works with any custom label set.

Key Features

Multi-label classification – Assigns multiple relevant labels per query
Arbitrary label support – Works with any custom label set and descriptions
Threshold-based assignment – Configurable confidence thresholds for label assignment
Special token format – Uses [QUERY], [LABEL_NAME], and [LABEL_DESCRIPTION] tokens for structured input
Multiple model sizes – Available in X-Small, Small, Base, and Large variants
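
The structured input and threshold-based assignment can be pictured with the sketch below. The three special tokens come from the feature list above, but their order and spacing in the real input format are assumptions, as are the function names, the example scores and the default threshold of 0.5.

```python
def build_input(query, label_name, label_description):
    """Join query and label fields with the special tokens (assumed layout)."""
    return (
        f"[QUERY] {query} "
        f"[LABEL_NAME] {label_name} "
        f"[LABEL_DESCRIPTION] {label_description}"
    )


def assign_labels(scores, threshold=0.5):
    """Return every label whose score meets the threshold, highest first.

    scores -- dict mapping label name -> confidence in [0, 1]
    """
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [label for label, _ in kept]


# Hypothetical scores for one query scored against three custom labels.
example_scores = {"Commercial": 0.91, "Brand": 0.12, "Local": 0.64}
assigned = assign_labels(example_scores, threshold=0.5)
```

Raising the threshold trades recall for precision: fewer labels are assigned, but each one is more likely to be correct.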

Use Cases

SEO intent analysis – Classify search queries by commercial intent, brand awareness, etc.
Content strategy – Understand query intent to optimize content targeting
Search advertising – Categorize keywords for campaign optimization
Customer support – Route queries based on intent classification

Model Variants

X-Small – Fast inference for real-time applications
Small – Balanced speed and accuracy
Base – Recommended for most use cases
Large – Highest accuracy for complex classification tasks

Classic Query Intent Classifier

Multi-label search query classification model developed by Dejan AI. The model is designed to be deployed in an automated pipeline capable of classifying search query intent for large volumes of search queries from common data sources such as ad campaigns and organic search tools and platforms.

Classification Labels

LABEL_0: ‘Commercial’
LABEL_1: ‘Non-Commercial’
LABEL_2: # Unused
LABEL_3: # Unused
LABEL_4: ‘Informational’
LABEL_5: ‘Navigational’
LABEL_6: ‘Transactional’
LABEL_7: ‘Commercial Investigation’
LABEL_8: ‘Local’
LABEL_9: ‘Entertainment’
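
In a pipeline, the raw label IDs above need to be mapped back to readable names. A minimal sketch follows, assuming the model output arrives as a list of label/score dictionaries (the shape commonly returned by text-classification pipelines) and that a score of 0.5 or more counts as a positive label; both are assumptions, and the function name is hypothetical.

```python
# Mapping taken from the label list above; LABEL_2 and LABEL_3 are unused.
ID2LABEL = {
    "LABEL_0": "Commercial",
    "LABEL_1": "Non-Commercial",
    "LABEL_4": "Informational",
    "LABEL_5": "Navigational",
    "LABEL_6": "Transactional",
    "LABEL_7": "Commercial Investigation",
    "LABEL_8": "Local",
    "LABEL_9": "Entertainment",
}


def decode(predictions, threshold=0.5):
    """Convert raw predictions into a list of readable intent names.

    predictions -- list of {"label": "LABEL_n", "score": float} dicts
    """
    return [
        ID2LABEL[p["label"]]
        for p in predictions
        if p["label"] in ID2LABEL and p["score"] >= threshold
    ]


# Made-up output for a single query.
raw = [
    {"label": "LABEL_0", "score": 0.9},
    {"label": "LABEL_2", "score": 0.8},  # unused slot, ignored
    {"label": "LABEL_6", "score": 0.7},
    {"label": "LABEL_4", "score": 0.1},
]
intents = decode(raw)
```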

Base Models:

LinkBERT

LinkBERT is a fine-tuned version of Google’s BERT model, designed to predict natural link placement within web content. This binary classification model excels in identifying distinct token ranges that web authors are likely to choose as anchor text for links. By analysing never-before-seen texts, LinkBERT can predict areas within the content where links might naturally occur, effectively simulating web author behaviour in link creation.
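
To show how token-level binary predictions become anchor-text candidates, the sketch below groups consecutive flagged tokens into spans. It assumes the model yields one 0/1 flag per token and uses whitespace tokens for readability; the real model works on its own subword tokens, and the function name and example flags are hypothetical.

```python
def link_spans(tokens, flags):
    """Group consecutive flagged tokens into candidate anchor-text spans.

    Returns a list of (start, end, text) tuples, with end exclusive.
    """
    if len(tokens) != len(flags):
        raise ValueError("tokens and flags must have the same length")
    spans = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i  # a new span begins
        elif not flag and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    if start is not None:  # span runs to the end of the text
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


tokens = "Read our guide to technical SEO audits before you start".split()
flags = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]  # made-up predictions
spans = link_spans(tokens, flags)
```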

LinkBERT is positioned as a powerful tool for content creators, SEO specialists, and webmasters, offering unparalleled support in optimizing web content for both user engagement and search engine recognition. Its predictive capabilities not only streamline the content creation process but also offer insights into the natural integration of links, enhancing the overall quality and relevance of web content.

Spam and Inorganic SEO Detection: Helps identify unnatural link patterns, contributing to the detection of spam and inorganic SEO tactics.

Anchor Text Suggestion: Acts as a mechanism during internal link optimization, suggesting potential anchor texts to web authors.

Evaluation of Existing Links: Assesses the naturalness of link placements within existing content, aiding in the refinement of web pages.

Link Placement Guide: Offers guidance to link builders by suggesting optimal placement for links within content.

Anchor Text Idea Generator: Provides creative anchor text suggestions to enrich content and improve SEO strategies.

DEMO

Sentiment

Multi-label sentiment classification model developed by Dejan Marketing. The model is designed to be deployed in an automated pipeline capable of classifying text sentiment for thousands (or even millions) of text chunks or as a part of a scraping pipeline.

Classification Labels

0: “very positive”,
1: “positive”,
2: “somewhat positive”,
3: “neutral”,
4: “somewhat negative”,
5: “negative”,
6: “very negative”
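
When the model runs over many text chunks, the per-chunk outputs are usually rolled up into a summary. The sketch below assumes each chunk yields seven scores indexed by the class IDs above and takes the highest-scoring class as the chunk's label; the output shape, the function names and the sample scores are all assumptions.

```python
from collections import Counter

# Class IDs and names as listed above.
SENTIMENT_LABELS = {
    0: "very positive",
    1: "positive",
    2: "somewhat positive",
    3: "neutral",
    4: "somewhat negative",
    5: "negative",
    6: "very negative",
}


def top_label(scores):
    """Return the name of the highest-scoring class for one text chunk."""
    if len(scores) != len(SENTIMENT_LABELS):
        raise ValueError("expected one score per sentiment class")
    best_id = max(range(len(scores)), key=scores.__getitem__)
    return SENTIMENT_LABELS[best_id]


def summarise(batch_scores):
    """Count how often each sentiment label wins across a batch of chunks."""
    return Counter(top_label(scores) for scores in batch_scores)


# Made-up scores for three chunks.
batch = [
    [0.05, 0.10, 0.60, 0.10, 0.05, 0.05, 0.05],
    [0.02, 0.03, 0.05, 0.10, 0.10, 0.20, 0.50],
    [0.01, 0.04, 0.70, 0.10, 0.05, 0.05, 0.05],
]
summary = summarise(batch)
```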

Sources of Training Data

Synthetic data generated with Llama 3.

Try the model:

DEMO

Query Form Quality Classifier

We build on the work of Manaal Faruqui and Dipanjan Das from the Google AI Language team to train a classifier that identifies well-formed search queries. Our model offers a 10% improvement over Google’s classifier by using the ALBERT architecture instead of an LSTM.

https://huggingface.co/dejanseo/Query-Quality-Classifier

Practical Application

With an accuracy of 80%, the model is production-ready and has already been deployed in Dejan AI’s query processing pipeline. Its role is to help identify query expansion candidates by flagging ambiguous queries retrieved via the Google Search Console API.

Most search queries are ambiguous, making it difficult to classify intent and decide how to optimise for them. Query expansion helps, but only if you know which queries to expand. This is where our model comes in. Take it for a spin and compare proper questions with raw keyword queries to see the model in action.
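
Selecting expansion candidates can be as simple as the sketch below. It assumes the classifier returns a well-formedness score between 0 and 1 for each query and that anything under 0.5 is treated as ambiguous; the score scale, the cut-off, the function name and the sample scores are assumptions for illustration.

```python
def expansion_candidates(scored_queries, threshold=0.5):
    """Return queries whose well-formedness score falls below the threshold.

    scored_queries -- iterable of (query, score) pairs, score in [0, 1]
    """
    return [query for query, score in scored_queries if score < threshold]


# Made-up scores: full questions score high, raw keyword strings score low.
scored = [
    ("how do I fix a leaking tap", 0.93),
    ("tap leak fix", 0.21),
    ("plumber sydney cheap", 0.18),
    ("what does a plumber charge per hour", 0.88),
]
to_expand = expansion_candidates(scored)
```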

DEMO

Model Quality

Our robust model validation process ensures model quality for most common classification and natural language processing tasks.

Metrics

Recall
Precision
Accuracy
F1
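
For reference, the four metrics can be computed from the confusion counts of a binary classifier as sketched below. The function name and the example counts are illustrative and are not results from any of our models.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute recall, precision, accuracy and F1 from confusion counts.

    tp, fp, fn, tn -- true positives, false positives,
                      false negatives, true negatives
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```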

Evaluation Metrics

Custom Models

Our team can work with you to design and train your very own industry-, language- or task-specific model.

Example

Bulgarian Search Query Intent

This model is fine-tuned from mDeBERTa V3 for search query intent classification in the Bulgarian language. It predicts one of five intent categories:

COMMERCIAL_INVESTIGATION – Queries with purchase intent that require additional research.
INFORMATIONAL – Queries seeking knowledge or facts.
LOCAL – Queries related to local services or locations.
NAVIGATIONAL – Queries aiming to reach a specific website or service.
TRANSACTIONAL – Queries with a direct intent to complete an action (e.g., purchase, reservation).
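
Because this model predicts exactly one category per query, its raw outputs are typically turned into probabilities with a softmax and the top class is selected. The sketch below assumes the model emits five logits in the order the categories are listed above; that ordering, the function names and the sample logits are assumptions.

```python
import math

# Assumed class order: the order in which the categories are listed above.
INTENTS = [
    "COMMERCIAL_INVESTIGATION",
    "INFORMATIONAL",
    "LOCAL",
    "NAVIGATIONAL",
    "TRANSACTIONAL",
]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(logits):
    """Return (intent_name, probability) for the highest-scoring class."""
    if len(logits) != len(INTENTS):
        raise ValueError("expected one logit per intent category")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return INTENTS[best], probs[best]


# Made-up logits for a single query.
intent, confidence = predict_intent([0.1, 2.0, 0.3, -1.0, 0.5])
```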

Model Link

DEMO
