• The Future of Google

    Sundar Pichai, in his post-I/O discussion with Nilay Patel, framed the surge in AI products not as an existential threat to the web, but as the dawn of its “new era.” Confronted with the critical question of what happens when AI agents dominate browsing, Pichai projected an evolution rather than an obsolescence. Google’s AI Strategy…

  • The Inner Workings of GPT’s file_search Tool

    The file_search tool enables GPT models to extract specific information directly from documents uploaded by users. This feature is essential when user queries require precise answers based explicitly on the contents of these documents. The exact hidden system instruction is as follows: How the Tool Functions Upon receiving a file from a user, such as…

  • Live Blog: Hacking Gemini Embeddings

    Prompted by Darwin Santos on the 22nd of May and a few days later by Dan Hickley, I had no choice but to jump on this experiment; it’s just too fun to skip, especially now that I’m aware of the Gemini embedding model. The objective is to reproduce the claims of this research paper…

  • Google’s New URL Context Tool

    Google has just released a new system that allows Gemini to fetch text directly from a supplied page. OpenAI has had this ability for a while now, but for Google, this is completely new. Previously, their models were limited to the Search Grounding tool alone. Gemini now employs a combination of tools and processes with the ability…

  • LLM-Based Search Volume Prediction

    We put Google’s Gemini to the test by comparing its keyword volume predictions to actual search data from Google Search Console (GSC). Here’s what we learned and how we did it. How We Collected and Compared the Data. What Did We Find? 1. Direct Correlation Is Weak-to-Moderate. 2. Bucket Accuracy: More Forgiving, Still Limited. 3.…
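
    The two measures named in this excerpt, direct correlation and bucket accuracy, can be sketched roughly as follows. This is a minimal illustration, not the method used in the post: the order-of-magnitude bucketing, the function names, and the sample numbers are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def bucket(volume):
    """Assumed bucketing scheme: order of magnitude (0 for under 10, 1 for 10-99, ...)."""
    return len(str(max(int(volume), 1))) - 1


def bucket_accuracy(predicted, actual):
    """Share of keywords whose predicted and actual volumes land in the same bucket."""
    hits = sum(1 for p, a in zip(predicted, actual) if bucket(p) == bucket(a))
    return hits / len(predicted)


# Hypothetical volumes, invented purely for illustration.
predicted = [120, 900, 15, 4000]
actual = [150, 1100, 30, 2500]

correlation = pearson(predicted, actual)
accuracy = bucket_accuracy(predicted, actual)
```

    Bucketing is more forgiving than correlation because a prediction only has to land in the right range, not near the exact value.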

  • How Google grounds its LLM, Gemini.

    In previous analyses (Gemini System Prompt Breakdown, Google’s Grounding Decision Process, and Hacking Gemini), we uncovered key aspects of how Google’s Gemini large language model verifies its responses through external grounding. A recent accidental exposure has provided deeper insights into Google’s internal processes, confirming and significantly expanding our earlier findings. Accidental Exposure of Gemini’s Grounding…

  • Google Lens Modes

    lns_mode is a parameter that classifies Google Lens queries into text, un (unimodal), or mu (multimodal). Google Lens has quietly become one of the most advanced visual search tools in the world. Behind the scenes, it works by constructing detailed, context-rich search queries that include a growing set of parameters. One of the newest additions…
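
    As a rough sketch of how the parameter could be read from a Lens URL, assuming it appears as an ordinary query-string parameter. The helper name and the example URLs are hypothetical; only the three values (text, un, mu) come from the description above.

```python
from urllib.parse import parse_qs, urlparse

# Labels for the three values described above.
LNS_MODES = {"text": "text", "un": "unimodal", "mu": "multimodal"}


def lens_mode(url):
    """Return the label for a URL's lns_mode value, or None if the parameter is absent."""
    params = parse_qs(urlparse(url).query)
    values = params.get("lns_mode")
    if not values:
        return None
    # Values outside the three known ones are reported as "unknown".
    return LNS_MODES.get(values[0], "unknown")


# Hypothetical URL, for illustration only.
mode = lens_mode("https://example.com/search?q=shoes&lns_mode=mu")
```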

  • Content Substance Classification

    Demo: https://dejan.ai/tools/substance/ Preface: In 1951, Isaac Asimov proposed an NLP method called Symbolic Logic Analysis (SLA) where text is reduced to its essential logical components. This method involves breaking down sentences into symbolic forms, allowing for a precise examination of salience and semantics analogous to contemporary transformer-based NER (named entity recognition) and summarisation techniques. In…

  • Chrome’s New Embedding Model: Smaller, Faster, Same Quality

    TL;DR Discovery and Extraction During routine analysis of Chrome’s binary components, I discovered a new version of the embedding model in the browser’s optimization guide directory. This model is used for history clustering and semantic search. Model directory: Technical Analysis Methodology To analyze the models, I developed a multi-faceted testing approach: Key Findings 1. Architecture…

  • AI Content Detection

    As models advance, AI content detection tools are struggling to keep up. Text generated by the latest Gemini, GPT and Claude models is fooling even the best of them. We’ve decided to bring AI content detection back in-house in order to keep up. Each time a new model comes out the classifier needs a fine-tune…