Category: Machine Learning
-

OpenAI’s Sparse Circuits Breakthrough and What It Means for AI SEO
OpenAI recently released research showing that AI models can be built with far fewer active connections inside them. This makes them easier to understand because each part of the model does fewer things and is less tangled up with everything else. Think of it like taking a spaghetti bowl and straightening the noodles into clean,…
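To make “fewer active connections” concrete: most weights are zero, so each unit participates in only a handful of circuits. A minimal sketch of weight sparsity, assuming simple magnitude-based masking (OpenAI trains models to be sparse from the start, so this is illustration only, not their method):

```python
import numpy as np

def sparsify(weights: np.ndarray, keep_fraction: float = 0.05) -> np.ndarray:
    """Zero out all but the largest-magnitude weights (illustrative only)."""
    k = max(1, int(weights.size * keep_fraction))
    # Threshold at the magnitude of the k-th largest entry.
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
w_sparse = sparsify(w, keep_fraction=0.1)
print(np.count_nonzero(w), "->", np.count_nonzero(w_sparse))  # 64 -> 6
```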
-

BlockRank: A Faster, Smarter Way to Rank Documents with LLMs
Large Language Models (LLMs) have revolutionized many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the…
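For orientation, the basic ICR pattern packs the query and every candidate document into a single prompt and asks the model for an ordering. A hedged sketch (`call_llm` is a hypothetical completion function; BlockRank’s actual prompt structure differs):

```python
def build_icr_prompt(query: str, docs: list[str]) -> str:
    """Assemble the query and all candidates into one ranking prompt.

    Every document can attend to every other document, which is why
    naive in-context ranking gets expensive as the candidate list grows.
    """
    numbered = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, 1))
    return (
        f"Query: {query}\n\n"
        f"Candidate documents:\n\n{numbered}\n\n"
        "Return the document numbers ordered from most to least relevant."
    )

# ranking = call_llm(build_icr_prompt("best espresso grinder", docs))  # hypothetical
```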
-

From Free-Text to Likert Distributions: A Practical Guide to SSR for Purchase Intent
Instead of forcing LLMs to pick a number on a 1–5 scale, ask them to speak like a person and map the text to a Likert distribution via Semantic Similarity Rating (SSR). In benchmarks across 57 personal-care concept surveys (9.3k human responses), SSR reproduced human purchase intent signals with ~90% of human test–retest reliability and…
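The core mapping is easy to sketch: embed the free-text answer, compare it to one anchor statement per scale point, and normalize the similarities into a distribution. A minimal sketch assuming a sentence-transformers encoder and home-made anchors (the paper’s exact anchors, encoder, and normalization may differ):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

ANCHORS = [  # one statement per Likert point, 1 through 5 (assumed wording)
    "I would definitely not buy this product.",
    "I would probably not buy this product.",
    "I might or might not buy this product.",
    "I would probably buy this product.",
    "I would definitely buy this product.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")

def ssr_distribution(free_text: str, temperature: float = 0.05) -> np.ndarray:
    """Map a free-text answer to a probability distribution over 1-5."""
    vecs = model.encode([free_text] + ANCHORS, normalize_embeddings=True)
    sims = vecs[0] @ vecs[1:].T            # cosine similarity to each anchor
    logits = sims / temperature            # sharpen before normalizing
    probs = np.exp(logits - logits.max())  # stable softmax
    return probs / probs.sum()

print(ssr_distribution("Looks nice, but I already own something similar."))
```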
-

TimesFM-ICF
In-Context Fine-Tuning for Time-Series: The Next Evolution Beyond Prophet and Traditional Forecasting. How Google’s TimesFM-ICF achieves fine-tuned-model performance without training, and why this changes everything for production forecasting systems. If you’re reading this, you’ve likely wrestled with time-series forecasting in production. Perhaps you’ve implemented Facebook Prophet for its interpretable seasonality decomposition, experimented with…
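Conceptually, in-context fine-tuning means the forecast is conditioned not only on the target series’ history but also on related series supplied as in-context examples, with no gradient updates. A purely illustrative sketch of that idea (hypothetical interface, not the real TimesFM API):

```python
import numpy as np

def forecast_icf(model, history: np.ndarray,
                 support_series: list[np.ndarray], horizon: int):
    """Hypothetical ICF call: adapt via context, not via training."""
    context: list[float] = []
    for series in support_series:        # related series act as examples
        context.extend(series.tolist())
        context.append(model.separator)  # hypothetical boundary token
    context.extend(history.tolist())     # target history goes last
    return model.predict(np.asarray(context), horizon=horizon)
```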
-

RexBERT
RexBERT is a domain-specialized language model trained on massive volumes of e-commerce text (product titles, descriptions, attributes, reviews, FAQs). Unlike general-purpose transformers, it is optimized to understand the quirks of product data and the way consumers phrase queries. For a technical SEO professional, this means better alignment between how search engines interpret product content and…
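In practice it plugs in like any other Hugging Face encoder. A sketch assuming a BERT-style masked-LM checkpoint (the model id below is a placeholder; use the actual published RexBERT checkpoint):

```python
from transformers import pipeline

# Placeholder model id - substitute the real RexBERT checkpoint.
fill = pipeline("fill-mask", model="your-org/rexbert-base")

# E-commerce phrasing that often trips up general-purpose models.
for pred in fill("Stainless steel water bottle, 32 [MASK], leak-proof lid."):
    print(pred["token_str"], round(pred["score"], 3))
```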
-

Analysis of Gemini Embed Task-Based Dimensionality Deltas
When generating vector embeddings for your text using Gemini Embed, there are several embedding optimisation modes. For each one you get slightly different embeddings, each optimised for the task at hand. The embeddings for semantic similarity are the most distinct from all other types, while retrieval query, retrieval document, and fact verification embeddings are most…
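One way to measure those deltas yourself: embed the same text under each task type and compare the vectors pairwise. A sketch using the google-generativeai SDK (model name and available task types may differ by API version):

```python
import numpy as np
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

TASKS = ["RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT",
         "SEMANTIC_SIMILARITY", "FACT_VERIFICATION"]
TEXT = "How do I descale an espresso machine?"

def embed(task: str) -> np.ndarray:
    result = genai.embed_content(model="models/text-embedding-004",
                                 content=TEXT, task_type=task)
    return np.asarray(result["embedding"])

vecs = {t: embed(t) for t in TASKS}
for a in TASKS:
    for b in TASKS:
        if a < b:  # each unordered pair once
            cos = vecs[a] @ vecs[b] / (np.linalg.norm(vecs[a]) * np.linalg.norm(vecs[b]))
            print(f"{a} vs {b}: {cos:.3f}")
```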
-

Dynamic per-label thresholds for large-scale search query classification with Otsu’s method
Solving the “Which Score Is Good Enough?” Puzzle. The real-world problem: arbitrary-label search-query intent classifiers spit out a confidence score per label. On clean demos you set one global cut-off, say 0.50, and move on. In production, manual tuning per label quickly turns into a never-ending game of whack-a-mole, especially when the taxonomy is customized client-by-client (e.g., SaaS…
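Otsu’s method, borrowed from image binarization, picks the cut-off that maximizes between-class variance in a score histogram, giving each label its own data-driven threshold. A minimal sketch over one label’s confidence scores:

```python
import numpy as np

def otsu_threshold(scores: np.ndarray, bins: int = 256) -> float:
    """Return the cut-off maximizing between-class variance of the scores."""
    hist, edges = np.histogram(scores, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()                 # probability mass per bin
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                     # weight of the "below" class
    w1 = 1.0 - w0                         # weight of the "above" class
    mu = np.cumsum(p * centers)           # cumulative mean
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * w0 - mu) ** 2 / (w0 * w1)
    return float(centers[np.nanargmax(between[:-1])])

# One threshold per label, no manual tuning:
# thresholds = {label: otsu_threshold(s) for label, s in scores_by_label.items()}
```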
-

Top 10 Most Recent Papers by MUVERA Authors
MUVERA Authors: 1. Laxman Dhulipala (Google Research & UMD), 2. Majid Hadian (Google DeepMind), 3. Jason Lee (Google Research & UC Berkeley), 4. Rajesh Jayaram (Google Research). Each entry covers the author’s top 10 recent papers (2023–2025) and research focus areas…
-

Training Gemma‑3‑1B Embedding Model with LoRA
In our previous post, Training a Query Fan-Out Model, we demonstrated how to generate millions of high-quality query reformulations without human labelling, by navigating the embedding space between a seed query and its target document and then decoding each intermediate vector back into text using a trained query decoder. That decoder’s success critically depends on…
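For context, the LoRA part typically looks like this with peft: freeze the base model and train low-rank adapters on the attention projections. A sketch with assumed checkpoint name and hyperparameters (the post’s actual config, pooling, and loss may differ):

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "google/gemma-3-1b-pt"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModel.from_pretrained(BASE, torch_dtype=torch.bfloat16)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapters update during training

def embed(texts: list[str]) -> torch.Tensor:
    """Mean-pool the last hidden states into one vector per text."""
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    out = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (out * mask).sum(dim=1) / mask.sum(dim=1)
```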
-

Training a Query Fan-Out Model
Google discovered how to generate millions of high-quality query reformulations without human input by literally traversing the mathematical space between queries and their target documents. Here’s how it works: the approach generated 863,307 training examples for a query suggestion model (qsT5) that outperforms all existing baselines, using a query decoder plus latent-space traversal. Step 1: Build a…
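The traversal itself is just interpolation: step from the query embedding toward the document embedding and decode each intermediate point back into text. A sketch where `decode` stands in for the trained query decoder from the post (hypothetical here):

```python
import numpy as np

def traverse(query_vec: np.ndarray, doc_vec: np.ndarray, steps: int = 8):
    """Yield evenly spaced points on the line from query to document."""
    for alpha in np.linspace(0.0, 1.0, steps):
        yield (1.0 - alpha) * query_vec + alpha * doc_vec

# Each intermediate vector decodes to a reformulation somewhere between
# the original query and the document's content (decode is hypothetical):
# reformulations = [decode(v) for v in traverse(q_vec, d_vec)]
```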
