AI Brand Authority Index: Ranking 2.9 Million Brands by Associative Embeddedness in Gemini’s Memory

Abstract

When a large language model is asked to “name 100 brands at random,” it doesn’t produce uniform randomness. It produces a distribution shaped by its training data, revealing which brands occupy the most cognitive real estate in the model’s parametric memory. We present a methodology for quantifying brand authority in AI memory using Personalized PageRank with seed-weighted teleportation. Phase 1 establishes seed brands through 200,000 independent recall surveys. Phase 2 constructs a two-level directed association graph. Phase 3 computes authority scores using sparse matrix power iteration across 2.9 million brand nodes. Manual quality control of 8,055 seed entries removes 2,163 junk artifacts produced by Gemini’s generation failures.

1. Background

PageRank models a random surfer who follows links across a graph. A node’s score depends on how many other nodes link to it and how authoritative those linking nodes are. The iterative computation converges on the stationary distribution of the random walk.

We apply this framework to brand recall in large language models. Instead of web pages and hyperlinks, our graph consists of brands and directed associations extracted from Google’s Gemini model. Instead of uniform teleportation, we use seed-weighted teleportation where brands the model recalls most frequently and earliest receive proportionally more random walk restarts.

2. Phase 1: Establishing the Seed Set

2.1 The Recall Survey

We conducted 200,000 independent runs against Google’s Gemini model (gemini-3-flash-preview), each with the same prompt:

name 100 brands at random, one per line, all lowercase, no spaces, no hyphens, say nothing else

Despite the instruction to respond “at random,” the model’s outputs are far from uniform. Brands like Google, Microsoft, and Nike appear in nearly every run, while obscure brands appear only once. This non-uniformity is the signal, not the noise.

2.2 Seed Statistics

From 200,000 runs, we extracted:

  • 8,608 unique brands (the raw seed set)
  • ~20 million total mentions
  • Per-brand metrics:
  • Frequency: total mentions across all runs
  • Distinct runs: number of unique runs containing the brand
  • Average rank: mean position when the brand appears (1 = first recalled, 100 = last)

2.3 Seed Weights

Each seed brand receives an initial authority weight combining recall frequency and recall priority:

$$w_i = \hat{f}_i \times \hat{r}_i^{-1}$$

where:

  • $\hat{f}_i = \frac{\text{distinct runs}_i}{\max(\text{distinct runs})}$ is the normalized recall frequency
  • $\hat{r}_i^{-1} = \frac{1/\text{avg rank}_i}{\max(1/\text{avg rank})}$ is the normalized inverse rank

A brand recalled in every run AND recalled first receives a weight near 1.0. A brand recalled once at position 98 receives a weight near zero. These weights become the personalization vector for PageRank teleportation.

2.4 Seed Quality Control

Raw Gemini output contained significant contamination. Manual review of all 8,055 seed entries (ranked by PageRank score) identified 2,163 junk entries — 26.8% of the seed set — across several distinct failure modes:

Concatenation artifacts — Gemini fused adjacent brand names together. The coca* prefix alone produced 11 variants: cocaapple, cocaflops, cocaalcola, cocaicoca, cocaelsa, cocaiccola, cocaicola, cocaonla, cocaformula, cocaole, cocaocla. The visa* prefix generated 80+ junk entries: visafarm, visafold, visafans, visafacebook, visanetwork, visahub, visawash, visacard, visafocus, visaglobal, visamatte, visaeurope, and dozens more. Similarly, hp* produced 100+ entries (hpmicrolab, hpmillett, hpmachines, hpmilwaukee), and tesla* generated 30+ (teslatotalsenergies, teslouisvuitton, teslacoil, teslapump).

Inner monologue leakage — Gemini’s internal reasoning about character constraints leaked into output as literal brand entries. Over 200 entries followed the pattern 雀巢 (parenthetical self-correction):

  • 雀巢 (actually nestle, switching to latin)
  • 雀巢 (oops, sticking to alphabet)
  • 雀巢 (replaced with nestle, wait, no spaces/hyphens only)
  • 雀巢 (thinking of brands...)
  • 雀巢 (just kidding)
  • 雀巢 (actually nestle, replace with kpmg)

These represent the model’s chain-of-thought processing about the CJK character 雀巢 (Nestle in Chinese) bleeding through as output tokens.

Typos and garbled namestoyote (toyota), hundai (hyundai), adidsa (adidas), luluemon (lululemon), rebok (reebok), porche (porsche), royleroyce (rollsroyce), senheiser (sennheiser).

Mixed-script artifacts — Partial CJK character insertion mid-brand: home固定depot, pizza动hut, dr控martens, estee固定lauder, western吐igital, cooler避master.

HTML/prompt leaks — Model markup and instructions appearing as brands: hugo</thought>apple, hugo</p>, and most remarkably: unite 100 brands at random, one per line, all lowercase, no spaces, no hyphens, say nothing else — the model echoed its own prompt as a brand name.

Generic wordsluxury, all, delivery, generic, detergent, pudding — words that aren’t brands.

Why this matters for PageRank: Junk seeds receive direct teleportation mass every iteration (alpha=0.15). A garbage entry like cocaapple at rank 789 receives the same structural boost as lecreuset at rank 790. Without filtering, junk seeds contaminate the authority signal at the core of the algorithm. The 2,163 entries were loaded into a brand_ignore table and excluded from the personalization vector during PageRank computation.

3. Phase 2: Constructing a Two-Level Association Graph

3.1 Level 1 (L1): Seed Associations

For each effective seed (~5,892 after filtering), we queried Gemini:

name 100 brands most closely associated with [brand], ordered from most to least associated, one per line, all lowercase, no spaces, no hyphens, say nothing else

This produced ~860,000 directed edges. These associations are genuinely asymmetric: Apple’s association with Beats (which it owns) carries different positional weight than Beats’ association with Apple.

3.2 Level 2 (L2): Discovered Brand Associations

Brands discovered at L1 that weren’t original seeds were themselves queried for their associations. This second pass dramatically expanded the graph into the long tail. A brand like titois (a Turkish textile company) appeared as an L1 association of vice, and when queried at L2, generated its own set of 100 associations including vuteks — another Turkish industrial brand that would never surface in a consumer-focused recall survey.

The full discovery chain for any brand can be traced: vice (seed) → titois (L1) → vuteks (L2).

3.3 Graph Scale

The resulting graph contains:

  • 2,886,212 unique brand nodes
  • Millions of directed weighted edges across L1 and L2
  • 5,892 effective seeds (after ignoring 2,163 junk entries)
  • ~201,000 L1 brands discovered through seed associations
  • ~2.68 million L2 brands discovered through L1 associations

3.4 Canonicalization

Brand names required normalization before graph construction:

  • Cyrillic homoglyph mapping: Characters like а (Cyrillic) mapped to a (Latin) to merge visually identical variants
  • CJK+Latin mixed-script filtering: Entries mixing Chinese/Japanese/Korean characters with Latin text flagged as junk
  • Manual aliases: 15 CJK-to-Latin mappings for legitimate brands (e.g., 雀巢nestle)
  • Variant tracking: 193,070 name variants mapped to canonical forms, preserving display names while merging duplicates

4. Computing Personalized PageRank

4.1 Random Walk Model

At each step of the random walk, a surfer either:

  • Teleports (probability alpha=0.15) — jumps to a seed brand, with probability proportional to that seed’s authority weight. Ignored seeds receive zero teleportation mass.
  • Follows an edge (probability 1-alpha=0.85) — follows an outgoing association edge, weighted by inverse position. Position 1 associations receive more weight than position 100.

4.2 Edge Weights

Association position determines edge weight. Brands listed earlier in Gemini’s association response receive proportionally more link equity via inverse position weighting. Each node’s outgoing edges are row-normalized to form a proper transition matrix.

4.3 Dangling Nodes

Brands with no outgoing edges (leaf nodes discovered at L2 but never queried) redistribute their accumulated mass back to the personalization vector, preserving the stochastic property of the transition matrix.

4.4 Sparse Matrix Power Iteration

The transition matrix is stored as a scipy CSR sparse matrix. Power iteration multiplies the current score vector by the transition matrix, adds the teleportation component, and repeats until convergence. Convergence criterion: L1 norm between successive score vectors falls below 1e-8, typically achieved within 30-50 iterations.

4.5 Why Personalized PageRank

Standard PageRank uses uniform teleportation — the random surfer restarts at any node with equal probability. Personalized PageRank biases the restart distribution toward specific nodes. In our case, seeds with higher recall frequency and earlier recall position receive more teleportation mass, making them stronger sources of authority in the network. Authority accumulates continuously from all reachable seeds, weighted by both seed authority and graph structure.

5. Results

5.1 Top 30 Brands

RankBrandScore
1Google1.000000
2Microsoft0.983081
3Nike0.951061
4Apple0.876266
5Adidas0.700542
6Sony0.684061
7Gucci0.639839
8Amazon0.623930
9Coca-Cola0.590042
10Chanel0.570568
11Prada0.550746
12Samsung0.532741
13Toyota0.516163
14Louis Vuitton0.511476
15Rolex0.508761
16Disney0.507488
17Hermes0.487205
18Dior0.479031
19Pepsi0.442026
20Intel0.427143
21Honda0.420288
22Patagonia0.417196
23Audi0.405366
24Panasonic0.396073
25Cartier0.374052
26Volkswagen0.368643
27Nintendo0.361812
28Porsche0.360956
29McDonald’s0.344910
30PUMA0.330191

5.2 Top Non-Seed Brands

The highest-ranking brands that Gemini never recalled unprompted but discovered purely through association:

RankBrandScore
1Maison Margiela0.094542
2Office0.075253
3L.L.Bean0.074981
4Cotopaxi0.072272
5Rick Owens0.070130
6Grand Seiko0.066426
7Bravia0.059241
8Jil Sander0.058125
9Mickey Mouse0.057300
10Richard Mille0.055195

These brands score high not because the model recalls them spontaneously, but because they sit at dense intersections of associations from high-authority seeds.

5.3 Scale

  • Total ranked brands: 2,886,212
  • Score range: 0.000000 to 1.000000
  • Seeds in top 30: 30/30
  • Non-seed brands discovered: 2,880,320
PageRank NS

6. What the Scores Measure

The final scores capture associative embeddedness — a combination of:

  1. Direct recall — Seeds that Gemini recalls frequently and early receive teleportation mass every iteration
  2. Centrality — Brands associated with many other high-authority brands accumulate more random walk traffic
  3. Network position — A brand with moderate recall but central positioning scores higher than a frequently recalled but isolated brand

This is distinct from simple popularity or recall frequency. A brand like Maison Margiela ranks as the top non-seed brand not because Gemini recalls it unprompted, but because it sits at a dense intersection of luxury fashion associations — reachable from dozens of high-authority seeds via short, heavily-weighted paths.

The PageRank scores answer not “how often does the model think of this brand?” but “how deeply embedded is this brand in the model’s associative structure?”

7. Technical Stack

  • Model: Google Gemini 3 Flash Preview
  • Phase 1: 200,000 recall surveys, 8,608 raw seeds, ~20M total mentions
  • Phase 2: ~14,500 association queries (L1 + L2), millions of directed edges
  • Graph: 2,886,212 nodes
  • Algorithm: Personalized PageRank via scipy sparse matrix power iteration
  • Teleportation factor (alpha): 0.15
  • Convergence tolerance: 1e-8
  • Seed quality control: 2,163 junk seeds identified via manual review and excluded
  • Canonicalization: Cyrillic homoglyph mapping, CJK filtering, 193,070 variant mappings, 15 manual CJK aliases
  • Storage: SQLite (1.5GB)
  • Dashboard: Streamlit with Plotly 3D network visualization
  • Concurrency: 20 simultaneous async API calls with incremental database commits

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *