Category: Machine Learning

  • Cosine Similarity or Dot Product?

    Google’s embedder uses the dot product between normalized vectors, which is computationally more efficient than, but mathematically equivalent to, cosine similarity. How Googlers work and think internally typically aligns with their open-source code (Gemini -> Gemma), and Chrome is no exception. It’s why I look there for answers and clarity on Google’s machine learning approaches. After…
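
    The equivalence is easy to verify numerically. A minimal sketch (NumPy, illustrative vectors only):

    ```python
    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])

    # Cosine similarity: dot product divided by the product of the norms.
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Normalize first, then a plain dot product gives the same value,
    # and at query time the division is already paid for.
    a_hat = a / np.linalg.norm(a)
    b_hat = b / np.linalg.norm(b)
    dot = a_hat @ b_hat

    print(np.isclose(cos, dot))  # True
    ```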

  • Universal Query Classifier

    Generalist, Open‑Set Classification for Any Label Taxonomy We’ve developed a search query classifier that takes any list of labels you hand it at inference time and tells you which ones match each search query. No retraining, ever. Just swap in new labels as they appear. Old workflow Pain New workflow Build + label data + retrain…
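
    The excerpt doesn’t show the implementation. As a hedged illustration of the idea (labels supplied at inference time, no retraining), here’s a sketch using Hugging Face’s NLI-based zero-shot pipeline; this is a stand-in, not the classifier described in the post:

    ```python
    from transformers import pipeline

    # NLI-based zero-shot classification: candidate labels are passed at
    # inference time, so new labels never require retraining.
    clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

    query = "best running shoes for flat feet"
    labels = ["commercial", "informational", "navigational", "local"]

    result = clf(query, candidate_labels=labels, multi_label=True)
    for label, score in zip(result["labels"], result["scores"]):
        print(f"{label}: {score:.3f}")
    ```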

  • Vector Embedding Optimization

    Embedding Methods Evaluation: Results, Key Findings, and a Surprising Insight On June 6, 2025, we ran a comprehensive evaluation comparing four embedding methods (regular, binary, mrl, and mrl_binary) on a dataset of paired sentences. The goal was to measure each method’s speed, storage footprint, similarity quality, and accuracy against a ground-truth set of sentence pairs. Below, we…
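
    The four methods aren’t defined in the excerpt. Assuming the usual meanings (binary = sign quantization, mrl = Matryoshka-style truncation to a prefix of dimensions), a sketch of how the variants derive from one regular embedding:

    ```python
    import numpy as np

    def variants(embedding: np.ndarray, mrl_dims: int = 256):
        """Derive the four assumed variants from one full-precision embedding."""
        regular = embedding                          # full float32 vector
        binary = (embedding > 0).astype(np.uint8)    # 1 bit per dimension
        mrl = embedding[:mrl_dims]                   # Matryoshka prefix truncation
        mrl_binary = (mrl > 0).astype(np.uint8)      # truncate, then binarize
        return regular, binary, mrl, mrl_binary

    emb = np.random.randn(768).astype(np.float32)
    regular, binary, mrl, mrl_binary = variants(emb)

    # Storage per vector: 3072 B vs 96 B vs 1024 B vs 32 B for these sizes.
    print(regular.nbytes, np.packbits(binary).nbytes,
          mrl.nbytes, np.packbits(mrl_binary).nbytes)
    ```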

  • Live Blog: Hacking Gemini Embeddings

    Prompted by Darwin Santos on the 22nd of May and a few days later by Dan Hickley, I had no choice but to jump on this experiment; it’s just too fun to skip. Especially now that I’m aware of the Gemini embedding model. The objective is to reproduce the claims of this research paper…

  • Content Substance Classification

    Demo: https://dejan.ai/tools/substance/ Preface In 1951, Isaac Asimov proposed an NLP method called Symbolic Logic Analysis (SLA) where text is reduced to its essential logical components. This method involves breaking down sentences into symbolic forms, allowing for a precise examination of salience and semantics analogous to contemporary transformer-based NER (named entity recognition) and summarisation techniques. In…

  • Chrome’s New Embedding Model: Smaller, Faster, Same Quality

    TL;DR Discovery and Extraction During routine analysis of Chrome’s binary components, I discovered a new version of the embedding model in the browser’s optimization guide directory. This model is used for history clustering and semantic search. Model directory: Technical Analysis Methodology To analyze the models, I developed a multi-faceted testing approach: Key Findings 1. Architecture…
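
    The extraction details are truncated above. As a hedged sketch, assuming the model ships as a .tflite file (the format Chrome’s optimization guide components typically use; path below is hypothetical), a first inspection of its tensors might look like:

    ```python
    import tensorflow as tf

    # Hypothetical filename; the actual model directory is elided in the post.
    interpreter = tf.lite.Interpreter(model_path="embedding_model.tflite")
    interpreter.allocate_tensors()

    # Input/output shapes reveal sequence length and embedding dimensionality.
    for detail in interpreter.get_input_details():
        print("input:", detail["name"], detail["shape"], detail["dtype"])
    for detail in interpreter.get_output_details():
        print("output:", detail["name"], detail["shape"], detail["dtype"])
    ```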

  • AI Content Detection

    As models advance, AI content detection tools are struggling to keep up. Text generated by the latest Gemini, GPT and Claude models is fooling even the best of them. We’ve decided to bring AI content detection back in-house in order to keep pace. Each time a new model comes out, the classifier needs a fine-tune…
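
    The post doesn’t show the training setup. A minimal sketch of the recurring fine-tune step, assuming a binary human/AI label set, a placeholder base model, and a standard transformers Trainer loop:

    ```python
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Hypothetical data: texts labeled 0 (human) or 1 (AI-generated).
    data = Dataset.from_dict({"text": ["...", "..."], "label": [0, 1]})

    base = "microsoft/deberta-v3-base"  # illustrative choice of base model
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=512)

    train = data.map(tokenize, batched=True)

    # Re-run this fine-tune whenever a new generator model ships.
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="detector", num_train_epochs=3),
        train_dataset=train,
    )
    trainer.train()
    ```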

  • Introducing Grounding Classifier

    Using the same tech behind AI Rank, we prompted Google’s latest Gemini 2.5 Pro model with search grounding enabled in the API request. A total of 10,000 prompts were collected and analysed to determine each prompt’s grounding status. The resulting data was then used to train a replica of Google’s internal classifier which…
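
    For reference, enabling search grounding in a Gemini API request looks roughly like this with the google-genai SDK; the post’s actual collection pipeline isn’t shown, so treat this as a sketch:

    ```python
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment

    # Attach the Google Search tool so the model may ground its answer.
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents="Who won the most recent Ballon d'Or?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())]
        ),
    )

    # grounding_metadata is populated only when the model actually grounded,
    # which is the signal a grounding classifier would learn to predict.
    print(response.candidates[0].grounding_metadata)
    ```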

  • Advanced Interpretability Techniques for Tracing LLM Activations

    Activation Logging and Internal State Monitoring One foundational approach is activation logging, which involves recording the internal activations (neuron outputs, attention patterns, etc.) of a model during its forward pass. By inspecting these activations, researchers can identify which parts of the network are highly active or contributing to a given output. Many open-source transformer models…
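
    A minimal sketch of activation logging with PyTorch forward hooks; the model and layer names here are illustrative (GPT-2 blocks), not specific to any model in the article:

    ```python
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "gpt2"  # illustrative; any transformer with accessible blocks works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    activations = {}

    def make_hook(name):
        # Record the module's output tensor during the forward pass.
        def hook(module, inputs, output):
            out = output[0] if isinstance(output, tuple) else output
            activations[name] = out.detach()
        return hook

    # Register a hook on every transformer block.
    for i, block in enumerate(model.h):
        block.register_forward_hook(make_hook(f"block_{i}"))

    inputs = tokenizer("Activation logging in action.", return_tensors="pt")
    with torch.no_grad():
        model(**inputs)

    for name, act in activations.items():
        print(name, tuple(act.shape),
              f"mean |activation| = {act.abs().mean().item():.4f}")
    ```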

  • Temperature Parameter for Controlling AI Randomness

    The Temperature parameter is a crucial setting used in generative AI models, such as large language models (LLMs), to influence the randomness and perceived creativity of the generated output. It directly affects the probability distribution of potential next words. Understanding the Basics What the Temperature Value Does In Practical Terms Using the sentence “The cat sat on…
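
    The mechanics are easy to see directly: logits are divided by the temperature before the softmax, so T < 1 sharpens the distribution and T > 1 flattens it. A sketch over illustrative next-token logits (the numbers below are made up for demonstration):

    ```python
    import numpy as np

    def softmax_with_temperature(logits, temperature):
        # Divide logits by T before softmax: T < 1 sharpens, T > 1 flattens.
        scaled = np.asarray(logits) / temperature
        exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
        return exp / exp.sum()

    # Illustrative logits for candidate next words after "The cat sat on the".
    tokens = ["mat", "chair", "roof", "moon"]
    logits = [4.0, 2.5, 1.5, 0.2]

    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: " + ", ".join(f"{w}={p:.2f}" for w, p in zip(tokens, probs)))
    ```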