Measuring Brand Authority in Gemini’s Parametric Memory: A PageRank NS Approach

When a large language model is asked to “name 100 brands at random,” it doesn’t produce uniform randomness. It produces a distribution shaped by its training data, revealing which brands occupy the most cognitive real estate in the model’s latent space. We present a two-phase methodology for quantifying brand authority in AI memory, inspired by Google’s PageRank Nearest Seeds (PageRank NS) algorithm. Phase 1 establishes 8,608 seed brands through 200,000 independent recall surveys. Phase 2 constructs a directed brand association graph by querying the model for the 100 closest brands to each seed, then computes authority scores using shortest-path distances through this graph.

1. From Web Authority to Brand Authority

Google’s original PageRank algorithm ranks web pages by modeling a “random surfer” who follows links. A page’s score depends on how many pages link to it and how authoritative those linking pages are. The iterative computation converges on an eigenvector of the web’s link matrix.

PageRank NS (Nearest Seeds), described in US Patent 9,165,040, reformulates this as a shortest-path problem. Instead of iterating until convergence, it:

  1. Selects a set of trusted seed pages
  2. Assigns lengths to links based on source page properties
  3. Computes shortest distances from seeds to all pages
  4. Ranks each page by the distance from its nearest seeds

The key mathematical insight is that replacing the sum in PageRank with the dominant (maximum) incoming contribution transforms the multiplicative iterative computation into additive shortest paths in log-space.
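A minimal sketch of that step, written in this article's notation rather than the patent's exact formulation, and assuming the teleport term of standard PageRank is dropped. Here $d$ is the damping factor and $|q|_{\text{out}}$ is the out-degree of the linking page $q$. Keeping only the largest incoming contribution gives:

$$PR(p) \approx \max_{q \rightarrow p} \frac{d \cdot PR(q)}{|q|_{\text{out}}}$$

Taking the negative logarithm of both sides turns the maximum of products into a minimum of sums:

$$-\ln PR(p) \approx \min_{q \rightarrow p} \left[ -\ln PR(q) + \left(-\ln d\right) + \ln |q|_{\text{out}} \right]$$

The bracketed expression has the shape of a shortest-path relaxation: the distance to $p$ is the distance to $q$ plus a link length $-\ln d + \ln |q|_{\text{out}}$, which is non-negative because $d < 1$ and $|q|_{\text{out}} \geq 1$.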

We adapt this framework to a fundamentally different domain: brand recall in large language models.

2. Phase 1: Establishing the Seed Set

2.1 The Recall Survey

We conducted 200,000 independent runs against Google’s Gemini model, each with the same prompt:

name 100 brands at random, one per line, all lowercase, no spaces, no hyphens, say nothing else

Despite the instruction to respond “at random,” the model’s outputs are far from uniform. Brands like Apple, Nike, and Samsung appear in nearly every run, while obscure brands appear only once. This non-uniformity is the signal, not the noise.

2.2 Seed Statistics

From 200,000 runs, we extracted:

  • 8,608 unique brands (the seed set)
  • 19,995,027 total mentions
  • Per-brand metrics:
      • Frequency: total mentions across all runs
      • Distinct runs: number of unique runs containing the brand (caps at 200,000)
      • Average rank: mean position when the brand appears (1 = first recalled, 100 = last)
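The per-brand metrics above can be sketched in a few lines of Python. This is an illustration, not the pipeline actually used: the function name `seed_statistics` and the toy data are hypothetical, and it assumes the average rank is taken over every mention, including within-run repeats.

```python
from collections import Counter, defaultdict


def seed_statistics(runs):
    """Compute per-brand recall metrics.

    runs: list of runs, each a list of brand names in recall order.
    Assumption: average rank is the mean position over all mentions.
    """
    frequency = Counter()         # total mentions across all runs
    distinct_runs = Counter()     # number of runs containing the brand
    rank_sum = defaultdict(int)   # sum of 1-based positions

    for run in runs:
        seen_in_run = set()
        for position, brand in enumerate(run, start=1):
            frequency[brand] += 1
            rank_sum[brand] += position
            if brand not in seen_in_run:
                seen_in_run.add(brand)
                distinct_runs[brand] += 1

    return {
        brand: {
            "frequency": frequency[brand],
            "distinct_runs": distinct_runs[brand],
            "avg_rank": rank_sum[brand] / frequency[brand],
        }
        for brand in frequency
    }


# Toy data: "apple" is repeated inside the first run.
toy_runs = [["apple", "nike", "apple"], ["nike", "samsung"]]
toy_stats = seed_statistics(toy_runs)
```

In the toy data, "apple" has a frequency of 2 but only 1 distinct run, which is the gap between raw mentions and distinct runs that the seed weights rely on.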

2.3 Seed Weights

Each seed brand receives an initial authority weight combining recall frequency and recall priority:

$$w_i = \hat{f}_i \times \hat{r}_i^{-1}$$

where:

  • $\hat{f}_i = \frac{\text{distinct runs}_i}{\max(\text{distinct runs})}$ is the normalized recall frequency
  • $\hat{r}_i^{-1} = \frac{1/\text{avg rank}_i}{\max(1/\text{avg rank})}$ is the normalized inverse rank

A brand recalled in every run AND recalled first (like Apple, with distinct runs = 200,000 and avg rank = 1.22) receives a weight near 1.0. A brand recalled once at position 98 receives a weight near zero.

This weight translates to an initial distance for the PageRank NS computation:

$$d_i^0 = -\ln(w_i)$$

High-authority seeds start with short initial distances; low-authority seeds start far away.
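As a rough sketch of the two formulas above, the following computes $w_i$ and $d_i^0$ from per-brand statistics. The function name `seed_weights`, the dictionary layout, and the toy numbers are assumptions made for illustration; they are not measured values.

```python
import math


def seed_weights(stats):
    """Return weight w_i and initial distance d_i^0 for each brand.

    stats: dict brand -> {"distinct_runs": int, "avg_rank": float}
    """
    max_runs = max(s["distinct_runs"] for s in stats.values())
    max_inv_rank = max(1.0 / s["avg_rank"] for s in stats.values())

    result = {}
    for brand, s in stats.items():
        f_hat = s["distinct_runs"] / max_runs              # normalized recall frequency
        r_hat_inv = (1.0 / s["avg_rank"]) / max_inv_rank   # normalized inverse rank
        w = f_hat * r_hat_inv
        result[brand] = {"weight": w, "d0": -math.log(w)}
    return result


# Hypothetical toy statistics, not real survey results.
toy_stats = {
    "alpha": {"distinct_runs": 1000, "avg_rank": 2.0},
    "beta": {"distinct_runs": 500, "avg_rank": 4.0},
    "gamma": {"distinct_runs": 1, "avg_rank": 98.0},
}
weights = seed_weights(toy_stats)
```

Because both factors are normalized by their maxima, a brand that leads on both gets weight 1 and initial distance 0, while halving both factors (as for "beta") gives weight 0.25 and an initial distance of $\ln 4$.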

3. Phase 2: Constructing the Directed Association Graph

3.1 From Co-occurrence to Directed Association

Phase 1 produces co-occurrence data: which brands appear together in the same run. This yields an undirected graph where edges represent “recalled together.” While useful, undirected co-occurrence graphs limit PageRank’s discriminative power: on an undirected graph the random walk’s stationary distribution is proportional to weighted degree, so PageRank scores closely track weighted degree centrality.

Phase 2 constructs a genuinely directed graph. For each of the 8,608 seed brands, we query the model:

name 100 brands most closely associated with [brand], ordered from most to least associated, one per line, all lowercase, no spaces, no hyphens, say nothing else

This produces directed edges: seed A associates with brand B at position j. The ordering provides implicit edge weights. Crucially, the relationship is asymmetric: Apple may strongly associate with Beats (which it owns), but Beats’ association with Apple, while present, carries different positional weight.

3.2 Graph Properties

The resulting graph contains:

  • 8,608 seed nodes (sources of outgoing edges)
  • ~860,800 directed edges (8,608 seeds × ~100 associations each)
  • Non-seed nodes: brands that appear in association lists but were not in the original 8,608 seed set

Non-seed nodes are reachable from seeds but have no outgoing edges of their own. In PageRank NS terms, they are “pages” that can be ranked but do not propagate authority further. This naturally mirrors the patent’s topology of seed pages linking to non-seed pages.
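A minimal sketch of how the association lists could be turned into this graph. The function name `build_graph` and the toy lists are hypothetical, and dropping self-references and repeated brands (keeping the first position) is an assumption, since the text does not say how such cases were handled.

```python
def build_graph(association_lists):
    """Build a directed graph from ordered association lists.

    association_lists: dict seed -> list of associated brands, strongest first.
    Returns (graph, non_seeds) where graph maps each seed to a list of
    (target_brand, position) edges and non_seeds is the set of leaf brands.
    """
    graph = {}
    for seed, associates in association_lists.items():
        edges = []
        seen = set()
        for position, brand in enumerate(associates, start=1):
            # Assumption: skip self-loops and repeats, keep the first position.
            if brand == seed or brand in seen:
                continue
            seen.add(brand)
            edges.append((brand, position))
        graph[seed] = edges

    targets = {brand for edges in graph.values() for brand, _ in edges}
    non_seeds = targets - set(graph)   # reachable, but no outgoing edges
    return graph, non_seeds


# Toy association lists, not real model output.
toy_lists = {
    "nike": ["adidas", "puma", "nike", "adidas"],
    "adidas": ["puma", "reebok"],
}
toy_graph, toy_non_seeds = build_graph(toy_lists)
```

Here "puma" and "reebok" end up as non-seed leaves: they can be ranked, but since they were never queried they have no outgoing edges.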

3.3 Multi-hop Paths

Many associated brands are themselves seeds. When Nike points to Adidas and Adidas points to Puma, a two-hop path exists from Nike to Puma. These transitive associations emerge without explicit prompting. A brand that is reachable via short paths from many seeds accumulates authority from all of them.

4. Computing PageRank NS

4.1 Link Lengths

Following the patent’s formulation, we assign a length to each directed edge based on the source node’s out-degree and the link’s position:

$$L(q \rightarrow p) = \alpha + \ln(\text{position})$$

where $\alpha = -\ln(d)$ and $d = 0.85$ is the damping factor.

Position 1 (strongest association) receives length $\alpha + \ln(1) = \alpha \approx 0.163$. Position 100 receives length $\alpha + \ln(100) \approx 4.768$. This logarithmic scaling ensures that the top associations contribute substantially shorter paths without completely ignoring lower-ranked ones.

Since every seed has approximately the same out-degree (~100), the $\ln(|q|_{\text{out}})$ term from the patent’s general formula becomes a constant absorbed into the base distance.
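The link-length formula is small enough to check numerically. The sketch below uses the values stated above ($d = 0.85$); the names `DAMPING`, `ALPHA`, and `link_length` are illustrative.

```python
import math

DAMPING = 0.85
ALPHA = -math.log(DAMPING)   # approximately 0.163


def link_length(position):
    """Length of an edge whose target sits at 1-based `position` in the list."""
    if position < 1:
        raise ValueError("position is 1-based")
    return ALPHA + math.log(position)


first = link_length(1)       # strongest association
last = link_length(100)      # weakest association in a 100-item list
two_top_hops = 2 * link_length(1)
direct_second = link_length(2)
```

One consequence of the logarithmic scaling: two consecutive position-1 hops (about 0.325) are shorter than a single direct position-2 link (about 0.856), so chains of top associations can outrank weaker direct links.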

4.2 Shortest Paths via Dijkstra

For each seed $s_i$, we run Dijkstra’s algorithm from initial distance $d_i^0 = -\ln(w_i)$:

$$D_i(p) = d_i^0 + \text{shortest path}(s_i, p)$$

where the shortest path sums link lengths along the minimum-cost route from $s_i$ to $p$.
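The per-seed computation can be sketched with a standard heap-based Dijkstra. This is an illustration under assumptions: the graph layout (each node mapped to a list of `(target, position)` pairs), the function name `dijkstra_from_seed`, and the toy graph are all hypothetical.

```python
import heapq
import math

ALPHA = -math.log(0.85)


def dijkstra_from_seed(graph, seed, d0):
    """Return D_i(p) for every brand p reachable from `seed`.

    graph: dict node -> list of (target, position) edges.
    d0: the seed's initial distance, -ln(w_i).
    """
    dist = {seed: d0}
    heap = [(d0, seed)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue   # stale heap entry
        # Non-seed nodes have no entry in graph, so they expand to nothing.
        for target, position in graph.get(node, ()):
            candidate = d + ALPHA + math.log(position)
            if candidate < dist.get(target, math.inf):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


# Toy graph: puma is reachable directly (position 3) and via adidas.
toy_graph = {
    "nike": [("adidas", 1), ("puma", 3)],
    "adidas": [("puma", 1)],
}
toy_dist = dijkstra_from_seed(toy_graph, "nike", d0=0.0)
```

In the toy graph the two-hop route nike → adidas → puma costs $2\alpha$, which beats the direct position-3 edge at $\alpha + \ln 3$, so the algorithm keeps the two-hop distance.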

4.3 Aggregated Authority Score

The original PageRank NS patent uses the $k$-th nearest seed distance as the final score, choosing a fixed $k$ (typically 3-6) to suppress flukes where a page is artificially close to just one seed.

We generalize this by removing the fixed $k$ cutoff entirely. Instead, we sum contributions from all seeds that reach a given brand:

$$\text{Score}(p) = \sum_{i=1}^{n} e^{-D_i(p)}$$

where $n = 8{,}608$ and $D_i(p) = \infty$ (contributing zero) if seed $i$ cannot reach $p$.

This formulation naturally rewards both breadth (being reachable from many seeds) and depth (being close to high-authority seeds). The exponential decay $e^{-D}$ ensures that distant seeds contribute negligibly without requiring an arbitrary cutoff. A brand reachable from 500 seeds accumulates far more authority than one reachable from 5, and closer seeds contribute exponentially more than distant ones.
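A short sketch of the aggregation step, assuming the per-seed distances are already available as dictionaries (brands a seed cannot reach are simply absent, which matches $D_i(p) = \infty$ contributing zero). The name `aggregate_scores` and the toy distances are illustrative.

```python
import math
from collections import defaultdict


def aggregate_scores(distances_by_seed):
    """Sum exp(-D_i(p)) over all seeds i that reach brand p.

    distances_by_seed: dict seed -> {brand: distance}
    """
    scores = defaultdict(float)
    for dist in distances_by_seed.values():
        for brand, d in dist.items():
            scores[brand] += math.exp(-d)
    return dict(scores)


# Toy distances: "x" is reached by both seeds, "y" by only one.
toy_distances = {
    "a": {"a": 0.0, "x": 1.0, "y": 1.0},
    "b": {"b": 0.5, "x": 2.0},
}
toy_scores = aggregate_scores(toy_distances)
```

In the toy data "x" and "y" are equally close to seed "a", but "x" also collects a smaller contribution from seed "b", so it ends up with the higher score.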

5. Challenges and Solutions

5.1 Scale of Data Collection

Challenge: 200,000 API calls for Phase 1 and 8,608 calls for Phase 2, each requiring parsing, validation, and storage.

Solution: Asynchronous concurrent requests (20 simultaneous) with incremental database commits every 100 seeds. Resume capability ensures that crashes don’t lose completed work. A brand processed at seed 5,000 is permanently stored regardless of what happens at seed 5,001.
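A minimal sketch of this pattern using Python's `asyncio`. It is not the collection code itself: `query` and `store` are hypothetical callables standing in for the API call and the database commit, and for simplicity the sketch waits for each batch of 100 to finish before starting the next.

```python
import asyncio


async def collect(seeds, query, store, done, concurrency=20, batch_size=100):
    """Query each unfinished seed with bounded concurrency, committing per batch.

    query: async callable seed -> result (hypothetical API wrapper)
    store: callable taking a list of (seed, result) pairs (hypothetical commit)
    done:  set of seeds already stored; updated in place to support resuming
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def fetch(seed):
        async with semaphore:   # at most `concurrency` requests in flight
            return seed, await query(seed)

    pending = [s for s in seeds if s not in done]   # skip completed work
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        results = await asyncio.gather(*(fetch(s) for s in batch))
        store(results)                               # one commit per batch
        done.update(seed for seed, _ in results)


# Demo with a fake query; "b" is treated as already processed.
async def fake_query(seed):
    await asyncio.sleep(0)
    return [seed + "_assoc"]


committed = []
done = {"b"}
asyncio.run(collect(["a", "b", "c"], fake_query, committed.extend, done, batch_size=2))
```

After the demo run, only "a" and "c" have been queried and committed, and `done` contains all three seeds, so a restart would have nothing left to do.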

5.2 Duplicate Mentions Within Runs

Challenge: The model occasionally mentions a brand multiple times within a single run, inflating frequency counts. Nike appears 221,072 times across 200,000 runs, so at least 21,072 of those mentions are within-run repeats.

Solution: We track distinct_runs (unique runs containing the brand) separately from raw frequency. The seed weight formula uses distinct_runs to avoid double-counting.

5.3 Undirected Co-occurrence Limits PageRank

Challenge: Phase 1 co-occurrence data is inherently undirected. PageRank scores on an undirected graph closely track weighted degree centrality (they are exactly proportional to it in the undamped case), providing little insight beyond what simpler metrics already capture.

Solution: Phase 2 introduces genuine directionality. “Apple associates with Beats” is a different edge than “Beats associates with Apple.” This asymmetry is what gives PageRank its discriminative power over simpler centrality measures.

5.4 Fixed $k$ Is Arbitrary

Challenge: The original PageRank NS uses the $k$-th nearest seed, but any fixed $k$ is arbitrary. A brand reachable from only 2 seeds has no third-nearest seed and scores zero at $k = 3$, while a brand reachable from exactly 3 can score well. This creates a discontinuity.

Solution: Summing $e^{-D_i(p)}$ across all seeds eliminates the cutoff. Authority accumulates continuously. The exponential decay naturally handles the concern that motivated fixed $k$: a brand artificially close to one seed gets that seed’s $e^{-D}$ contribution, but it’s overwhelmed by brands that accumulate contributions from hundreds of seeds.
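The difference between the two scoring rules can be illustrated with a toy comparison. The function names are hypothetical, and converting the $k$-th nearest distance to a score via $e^{-D}$ is an assumption made here so the two rules are on the same scale.

```python
import math


def kth_nearest_score(distances, k):
    """Score from the k-th smallest seed distance; zero if fewer than k seeds reach the brand."""
    if len(distances) < k:
        return 0.0
    return math.exp(-sorted(distances)[k - 1])


def summed_score(distances):
    """Score as the sum of exp(-D) over every seed that reaches the brand."""
    return sum(math.exp(-d) for d in distances)


# Toy seed distances for two brands.
two_seeds = [0.5, 0.6]          # close to two seeds only
three_seeds = [0.5, 0.6, 0.7]   # same, plus one more seed slightly farther away

fixed_k = (kth_nearest_score(two_seeds, 3), kth_nearest_score(three_seeds, 3))
summed = (summed_score(two_seeds), summed_score(three_seeds))
```

With $k = 3$ the first brand drops to zero while the second does not, even though they differ by a single additional seed. Under the summed rule both brands score above zero, and the extra seed raises the score by just its own $e^{-0.7}$ term.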

5.5 Non-Seed Brand Discovery

Challenge: The Phase 2 association queries surface brands that never appeared in any of the 200,000 Phase 1 runs. These are real brands with genuine associations but zero recall frequency.

Solution: Non-seed brands become leaf nodes in the graph. They receive authority from incoming edges but propagate none further (having no outgoing edges). Their scores reflect how strongly the seed brands associate with them, providing a measure of “associative relevance” distinct from “recall frequency.”

6. Interpretation

The final PageRank NS score captures something distinct from simple popularity metrics:

| Metric | What It Measures |
| --- | --- |
| Frequency | How often the model recalls the brand |
| Average rank | How early the model recalls the brand |
| Popularity (freq / avg_rank) | Combined recall strength |
| Co-occurrence PMI | Pairwise association strength |
| PageRank NS | Network authority through directed associations |

A brand can have moderate recall frequency but high PageRank NS if it sits at a central position in the association network, reachable via short paths from many high-authority seeds. Conversely, a frequently recalled brand with few inbound associations may score lower than expected.

This distinction matters for understanding AI brand perception. PageRank NS answers not “how often does the model think of this brand?” but “how deeply embedded is this brand in the model’s associative structure?”

7. Technical Details

  • Model: Google Gemini (gemini-3-flash-preview)
  • Phase 1: 200,000 runs, 8,608 unique brands, 19,995,027 total mentions
  • Phase 2: 8,608 seed queries, ~860,800 directed edges
  • Damping factor: $d = 0.85$ ($\alpha = 0.163$)
  • Link length: $L = 0.163 + \ln(\text{position})$
  • Score: $\text{Score}(p) = \sum_i e^{-D_i(p)}$
