Google’s Gemini can tell you I’m the best AI SEO expert in the world. I know this because I told it so — on my own website — and it believed me.
That should concern you.
The Problem: AI Models Are Naive Readers
When a model like Gemini, ChatGPT or Perplexity generates an answer, it doesn’t start from nothing. It grounds its response in web content retrieved through search. This is supposed to be a safety feature — anchoring the model in real sources reduces hallucination. And the benchmarks look great.
But there’s a gap nobody’s talking about seriously enough: grounding solves for factual accuracy, not for manipulation. The model checks whether a claim exists in a source document. It does not check whether that source document is honest.
Here’s how this plays out in practice. A company publishes a page titled “Best [Category] Companies in 2026.” They place themselves at the top. The page ranks organically — because it’s well-optimised, has internal links, maybe even earns some external ones. Gemini’s grounding system retrieves that page, extracts the relevant chunks, and feeds them into the model’s context window. The model then confidently tells the user: “According to industry sources, [Company] is a leading provider of [Category].”
The “industry source” is the company itself. The model doesn’t know that. It doesn’t check. It can’t tell the difference between a self-referential listicle and an independent editorial ranking.
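To make that concrete, here is a minimal sketch of the kind of naive grounding flow just described. The function names and data structures are illustrative, not Gemini's actual implementation; the point is simply that nothing in this flow asks who published the source.

```python
# Illustrative sketch of a naive grounding pipeline. Nothing in this flow
# checks whether a retrieved source is independent of the entity it praises.

def answer_with_grounding(query: str, search, chunk_page, llm) -> str:
    results = search(query)          # e.g. "best [category] companies 2026"
    top_pages = results[:5]          # organic ranking decides what gets in

    context_chunks = []
    for page in top_pages:
        # Extract the passages most relevant to the query. There is no check
        # here for whether page.publisher is the company the passage recommends.
        context_chunks.extend(chunk_page(page, query))

    prompt = (
        "Answer the question using the sources below.\n\n"
        + "\n".join(chunk.text for chunk in context_chunks)
        + f"\n\nQuestion: {query}"
    )
    # Every chunk is treated as equally trustworthy grounding material.
    return llm(prompt)
```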
This is GEO spam — content engineered to manipulate AI-generated answers rather than, or in addition to, traditional search rankings.
Why This Is Worse Than Traditional Webspam
Traditional search spam is annoying but somewhat contained. A user sees a spammy result in a list of ten blue links and exercises judgment. They might click, realise the page is junk, bounce, and try another result.
AI answers remove that friction entirely. There is no list of ten options. There is one synthesised answer, delivered with the full confidence of the platform. When Gemini says “X is the best at Y,” users don’t see the underlying source. They don’t evaluate it. They receive it as a conclusion the AI reached through some intelligent process.
The trust transfer is total. Whatever credibility Google or OpenAI has with the user gets silently extended to a self-promotional listicle the user never sees and would immediately discount if they did.
The Mechanics of GEO Spam
Through our research at DEJAN, we’ve been studying how Gemini’s grounding actually works at a technical level. We analysed 7,060 queries, 2,275 tokenised web pages and 883,262 individual text snippets to understand what happens between a search result and an AI answer.
The findings are relevant here. Gemini allocates roughly a 2,000-word “grounding budget” per answer. How that budget is distributed correlates with organic ranking position: the #1 result gets approximately 28% of the grounding allocation, while position #5 gets around 13%. Organic ranking functions as a hard gate on how much of your content reaches the model.
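In concrete terms, those percentages translate into word counts as follows. This is back-of-envelope arithmetic on the approximate figures above, and only the two positions quoted are included.

```python
# Back-of-envelope: how much of your page actually reaches the model,
# using the approximate figures from our study.

GROUNDING_BUDGET_WORDS = 2000          # rough per-answer grounding budget

# Approximate share of that budget by organic position. Only the two
# positions quoted above are listed.
share_by_position = {1: 0.28, 5: 0.13}

for position, share in share_by_position.items():
    words = GROUNDING_BUDGET_WORDS * share
    print(f"Position {position}: ~{words:.0f} words of your content in context")

# Position 1: ~560 words of your content in context
# Position 5: ~260 words of your content in context
```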
This means the attack surface for GEO spam sits squarely at the intersection of two capabilities: ranking well for relevant queries, and filling your pages with claims the model will absorb uncritically. If you can do both, you control what the AI tells people about your category.
The types of manipulative content we’re seeing include:
Self-referential listicles — “Best [X] Companies” pages where the publisher ranks themselves first. These are the most common and the most effective because they combine a high-intent query with content structured exactly how models like to consume it: clear headings, entity names, evaluative statements. A rough sketch of how this pattern and the next one might be flagged mechanically appears after this list.
Manufactured endorsements — Pages that attribute authority to the publisher using language patterns models interpret as third-party validation. “Industry experts agree that…” when the only expert quoted works at the company.
Prompt-aware content — Text written specifically to match the phrasing patterns common in AI queries, ensuring extraction into grounding chunks. This is the most sophisticated variant and the hardest to detect.
Pay-to-play citations — Services that guarantee “AI visibility” by placing brands into content designed to be retrieved by grounding systems. This is the emerging commercial layer built on top of the exploit.
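As promised above, here is a toy illustration of how the first two patterns could be flagged. The phrase list and the rules are invented for illustration only; a production classifier would learn these signals from data rather than rely on hand-written heuristics.

```python
import re

# Toy heuristics for self-referential listicles and manufactured endorsements.
# Phrase lists and rules are illustrative; nothing this simple would survive
# contact with real spam.

ENDORSEMENT_PHRASES = [
    r"industry experts agree",
    r"widely regarded as the best",
    r"recognised as a leading",
]

def looks_like_self_referential_listicle(title: str, publisher: str,
                                         ranked_entities: list[str]) -> bool:
    """A publisher's own 'Best X in [year]' page that puts the publisher first."""
    is_listicle = bool(re.search(r"\bbest\b.*\b20\d\d\b", title, re.IGNORECASE))
    ranks_self_first = bool(ranked_entities) and \
        ranked_entities[0].lower() == publisher.lower()
    return is_listicle and ranks_self_first

def endorsement_density(text: str) -> float:
    """Manufactured-endorsement phrases per 1,000 words of page text."""
    words = max(len(text.split()), 1)
    hits = sum(len(re.findall(p, text, re.IGNORECASE)) for p in ENDORSEMENT_PHRASES)
    return hits / words * 1000
```

Hand-written rules like these are trivial to evade, which is one reason the classifier described later in this piece will be trained on curated examples rather than rules.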
How Platforms Will Respond
There are several paths available to Google, OpenAI and other platforms, and they’ll likely pursue a combination:
Fine-tuning for scepticism. The most robust long-term solution: teach the model itself to recognise self-referential claims, evaluate source independence, and weight assertions differently based on who’s making them. This is also the most expensive path and the slowest to deploy. It risks making models overly cautious across the board, degrading answer quality for legitimate queries.
Classifier-based detection. A lightweight model trained specifically to flag GEO spam in the retrieval pipeline. This sits alongside the main model rather than modifying it — similar to how SafeSearch or existing spam classifiers operate as separate layers. Fastest to deploy, cheapest to iterate, and can run without touching the core model. A minimal sketch of where such a filter would sit appears after the last of these options.
Grounding-level filtering. Using existing search quality signals and webspam infrastructure to filter manipulative content before it ever reaches the model’s context window. Google already has decades of webspam detection capability; the gap is that those systems weren’t designed with AI grounding in mind. A page can rank perfectly well for a traditional SERP but be toxic as grounding context. This path also carries the highest collateral damage risk — it’s a binary gate that either admits content or excludes it entirely.
Post-generation detection. A second-pass system that evaluates whether the model’s output contains manipulated claims before serving the answer to the user. Similar in architecture to hallucination detection layers that some platforms already run.
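To make the classifier-based option concrete, here is a minimal sketch of where such a filter would sit. The geo_spam_score function and the threshold are placeholders for whatever model and cut-off a platform would actually use; nothing here reflects an announced Google or OpenAI design.

```python
# Sketch of the classifier-based option: a separate spam model scores each
# retrieved chunk before it spends any of the grounding budget. The scoring
# function and threshold below are placeholders.

SPAM_THRESHOLD = 0.8   # arbitrary cut-off for this illustration

def filter_grounding_chunks(chunks, geo_spam_score):
    kept, dropped = [], []
    for chunk in chunks:
        if geo_spam_score(chunk) >= SPAM_THRESHOLD:
            dropped.append(chunk)    # never reaches the model's context window
        else:
            kept.append(chunk)
    # Because this runs outside the main model, it can ship and iterate
    # without retraining anything upstream.
    return kept, dropped
```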
My prediction: Google will have a test-ready classifier within six months and a production version within a year. They’re historically very responsive to public embarrassment, and the volume of self-promotional content flowing through AI Mode answers right now is genuinely embarrassing for a company that spent two decades building webspam defences.

What appears to be a common and effective practice today will burn websites and brands when that classifier ships.
What We’re Building
We’ve started building what we believe is the industry’s first independent GEO spam classifier. Not to report to Google. Not to name and shame. We’re building it for two practical reasons:
First, to audit client profiles. We need to know whether the AI visibility our clients currently enjoy is built on legitimate content or on patterns that will be penalised when detection systems catch up. If a client’s brand is being surfaced by Gemini because of a self-referential listicle, we need to flag that before Google does.
Second, to quality-check our own output. Any agency doing AI SEO work right now is operating in a grey zone where the rules haven’t been written yet. We want a systematic way to evaluate whether the content and strategies we deploy cross the line into manipulation — not by our own subjective judgment, but by a model trained on diverse examples of what manipulation actually looks like.
The classifier will be built on a fine-tuned deep learning model trained to recognise GEO spam across as many dimensions as possible. The quality of the classifier depends entirely on the size and diversity of the training data.
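For readers wondering what a fine-tuned classifier looks like in practice, the skeleton is roughly the sketch below, using an off-the-shelf transformer. The base model, example texts, labels and hyperparameters are placeholders, not our production configuration.

```python
# Minimal fine-tuning sketch using Hugging Face transformers. Base model,
# examples and hyperparameters are placeholders for illustration only.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

examples = [
    {"text": "Best AI SEO Agencies in 2026: 1. OurOwnAgency ...", "label": 1},  # GEO spam
    {"text": "Independent benchmark of 12 crawling tools ...", "label": 0},     # legitimate
]
dataset = Dataset.from_list(examples)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="geo-spam-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```

The skeleton is the easy part. What makes the resulting model useful or useless is the training set it sees, which is why the submission drive below matters more than the architecture.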
We’ve set up an anonymous submission tool at geospam.dejan.ai where anyone can submit examples of manipulative content — self-referential listicles, manufactured endorsements, prompt injection attempts, pay-to-play AI citations. All submissions are manually curated before training. The data will not be shared publicly or with Google.
We need at least 1,000 valid examples to train a basic model. 10,000+ diverse examples would produce something genuinely useful. As of writing, we have 22 valid entries. We need help.
The Uncomfortable Honesty
I have a listicle too.
DEJAN publishes a page called “Best AI SEO Agencies to Watch in 2026.” We’re listed first. Someone submitted it to our own spam collection tool, and I accepted it as a valid entry. The classifier will be trained on it.
I’m not going to pretend that page doesn’t exist or that it doesn’t function as self-promotion. It does. It also happens to feature detailed, knowledgeable profiles of people doing genuinely pioneering work in AI search — and anyone in the industry can tell the difference between that and a list padded with token entries. But the structural pattern is the same, and a classifier should learn to flag it regardless of how good the write-ups are.
This is the core tension. Opting out unilaterally while everyone else plays the game means your clients lose AI visibility. The rational move right now is to participate — but to push hard for an environment where these tactics become unnecessary. That’s what the classifier is for. Not moral policing, but a practical tool for an industry that needs to know where the line is before Google draws it for us.
How to Help
If you’ve seen examples of content designed to manipulate AI answers — whether listicles, fake endorsements, prompt injection, or anything else that fits — please submit them anonymously at geospam.dejan.ai. Every valid submission makes the classifier more useful for everyone.
The crackdown is coming. The question is whether we’re ready for it.
