How GPT Sees the Web

Reverse Engineering

A technical walkthrough of how GPT handles web search, including snippets, expansions, context size settings, and the sliding window mechanism for retrieval.

Many people assume that a Generative Pre-trained Transformer, or GPT, browses the web just like humans do, loading full pages, layouts, and images. In reality, the process is heavily restricted. GPT does not actually browse or ingest entire documents. Instead, it relies on a controlled, windowed system.

When GPT first performs a web search, it receives only a tiny snippet of text, usually just one to three sentences. To see more, the model must request specific expansions. It can use an open command to fetch a larger slice of text from a specific line number, which acts like scrolling down the page. It can also use a click command to follow links.

This creates a sliding window pattern. While GPT can request multiple expansions to read further down a page, it can never reconstruct the entire document. There are strict limits on how much text each window can hold and how much information the model can output.

Developers using these search tools in their assistants can choose low, medium, or high context settings. A higher setting provides larger slices of text, but the hard limits remain. Ultimately, GPT never sees raw HTML or full web pages. It only reads sequential, plain-text fragments, piece by piece.

A Technical Walkthrough of Web Search, Snippets, Expansions, Context Sizes, and Sliding Windows

Many people assume GPT “views” the web the way humans do: full pages, HTML, images, layout, and complete articles. Reality is very different. GPT doesn’t browse. It doesn’t load pages. It doesn’t ingest entire documents. What it sees is controlled, windowed, and heavily restricted.

This article explains:

What GPT receives from Web Search
How grounding snippets work
How expansions (open / click) operate
How context size settings (Low/Medium/High) affect retrieval
How the sliding window mechanism reveals more of a page
What developers actually receive in API responses

All examples are based on real interactions during testing.

What GPT Receives From a Web Search

When GPT requests a web search result, it receives a small structured object:

Title
URL
Short text snippet (1–3 sentences)
Optional metadata such as date or score
A unique internal ID (turn0search0, etc.)

This is all the grounding GPT gets initially.

It does not receive:

Full pages
Raw HTML
Full article content
Site navigation or structure

These snippets behave like compact search-engine result cards.
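
As a rough illustration, a result card like the one described above could be modeled as a small record. This is only a sketch: the class name, field names, and the placeholder URL are assumptions made for the example, not the tool's actual schema. Only the kinds of fields (title, URL, snippet, optional metadata, internal ID) come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    """One search result card as the model sees it (field names are hypothetical)."""
    ref_id: str                     # internal ID, e.g. "turn0search0"
    title: str
    url: str
    snippet: str                    # short excerpt, typically 1-3 sentences
    date: Optional[str] = None      # optional metadata
    score: Optional[float] = None   # optional metadata


# Example card; the URL is a placeholder, not the real article address.
result = SearchResult(
    ref_id="turn0search0",
    title="BlockRank: A Faster, Smarter Way to Rank Documents with LLMs",
    url="https://example.com/article",
    snippet="Large Language Models (LLMs) have revolutionized many areas ...",
    date="10 Nov 2025",
)
print(result.ref_id, result.title)
```

Nothing in this record carries page layout, navigation, or markup, which is the point: the initial grounding is a card, not a document.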

What “open” and “click” Do

Each snippet comes with a retrieval ID. GPT can request more with:

open()

Fetches a larger slice of text from the same page, centered around a line number.

This is how GPT “scrolls.”

click()

Follows an outgoing link from the snippet.

The new page is fetched as another snippet, using the same rules as the original search.
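
To make the open() behaviour concrete, here is a minimal sketch of a windowed read over a page that has already been reduced to plain-text lines. The function name `open_window`, the `window` parameter, and the default of 20 lines are assumptions for illustration; the real tool's signature and window size are not documented. click() is not modeled here.

```python
def open_window(page_lines, lineno, window=20):
    """Return up to `window` (line_number, text) pairs centered on 1-indexed `lineno`.

    Hypothetical stand-in for the tool's open() call, not its real implementation.
    """
    if not page_lines:
        return []
    # Clamp the requested line into the page, then center the window on it.
    target = min(max(lineno, 1), len(page_lines))
    first = max(1, target - window // 2)
    last = min(len(page_lines), first + window - 1)
    return [(n, page_lines[n - 1]) for n in range(first, last + 1)]


page = [f"line {i}" for i in range(1, 201)]   # a fake 200-line page
view = open_window(page, lineno=50, window=20)
print(view[0], view[-1])
```

Opening at line 1 simply returns the top of the page, since there is nothing above it to center on.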

Are Expansions Limited to One Round?

No. There is no single-round limit.

GPT can:

open at line 1
then open at line 50
then open at line 120
and so on

Each call retrieves a new window of text.

This creates a sliding-window browsing pattern.
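
The sequence above (open at line 1, then 50, then 120) can be simulated to show what stays unseen between windows. This is a sketch under assumptions: a 200-line page and a 20-line window centered on the requested line are illustrative values, and `unseen_ranges` is a hypothetical helper, not part of any tool.

```python
def unseen_ranges(total_lines, open_at, window=20):
    """Return the (first, last) line ranges never covered by any window."""
    seen = [False] * (total_lines + 1)   # index 0 is unused; lines are 1-indexed
    for lineno in open_at:
        target = min(max(lineno, 1), total_lines)
        first = max(1, target - window // 2)
        last = min(total_lines, first + window - 1)
        for n in range(first, last + 1):
            seen[n] = True

    gaps, gap_start = [], None
    for n in range(1, total_lines + 1):
        if not seen[n] and gap_start is None:
            gap_start = n                       # a gap begins
        elif seen[n] and gap_start is not None:
            gaps.append((gap_start, n - 1))     # a gap ends
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, total_lines))   # gap runs to the end of the page
    return gaps


gaps = unseen_ranges(total_lines=200, open_at=[1, 50, 120], window=20)
print(gaps)
```

With these assumed numbers, three calls cover 60 of 200 lines, and everything between the windows remains invisible unless more calls are spent on it.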

Can GPT Retrieve the Entire Page With Enough Windows?

In theory, repeated expansions let it see many parts of a page.

In practice, no. Strict limits apply:

Each expansion is a capped-size window
Tool responses per turn are capped
The model cannot output more quoted text than the quoting limits allow

So despite multiple expansions, GPT cannot reconstruct or reproduce entire pages.
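
The effect of combining a capped window with a capped number of tool responses can be shown with simple arithmetic. In this sketch the page length, window size, and call budget are made-up values, and `scroll_with_caps` is a hypothetical helper; the real caps are not published.

```python
def scroll_with_caps(total_lines, window, max_calls):
    """Open back-to-back windows from the top until the page ends
    or the call budget runs out.

    Returns (calls_used, last_line_seen).
    """
    last_seen = 0
    calls = 0
    while last_seen < total_lines and calls < max_calls:
        last_seen = min(total_lines, last_seen + window)
        calls += 1
    return calls, last_seen


calls, last_seen = scroll_with_caps(total_lines=1000, window=40, max_calls=5)
print(f"{calls} calls reached line {last_seen} of 1000")
```

Under these assumptions the reader stops at line 200 of 1000. A short page fits inside the budget, while a long one is cut off wherever the budget ends.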

Hard Limits: Retrieval and Output

Two types of restrictions exist:

Retrieval limits

Each open call returns only a fixed window, even if the page is thousands of words long.

Output limits

Even if GPT has seen multiple windows, it cannot return:

Full content
Long passages
More quoted text than the limits allow

It must summarise.
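
An output-side quote cap can be pictured as a word budget applied to any passage before it is returned. This is an illustration only: the 25-word limit and the `clip_quote` helper are assumptions, and the actual quoting rules are not documented here.

```python
def clip_quote(text, max_words=25):
    """Trim a passage to at most `max_words` words, marking the cut with '...'."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


passage = (
    "BlockRank modifies the attention mechanism to enforce the observed "
    "block sparsity. This is achieved by restricting the attention flow."
)
print(clip_quote(passage, max_words=8))
```

Anything past the budget has to be conveyed in the model's own words rather than quoted.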

Is This a Special Internal Tool?

No.

Everything demonstrated here uses the same Web Search tool available in the Assistants API.

Developers enabling Web Search in their assistants get the same capabilities.

Context Size Settings

In the Assistants configuration panel, developers choose how large the web context should be.

GPT cannot choose this.

It simply receives:

Low: smallest windowed slices
Medium: more expansive slices
High: largest allowable slices

Exact numeric values are not publicly documented.

These settings affect:

How much snippet text is returned
How large each expansion window is
How much surrounding content comes with a search result
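
Since the exact values are not published, the sketch below uses placeholder window sizes purely to show the shape of the relationship: a larger setting means fewer calls to pass over the same page. `ContextSize`, `WINDOW_LINES`, and the numbers 10, 25, and 60 are all assumptions.

```python
import math
from enum import Enum


class ContextSize(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Placeholder window sizes in lines; real values are not publicly documented.
WINDOW_LINES = {
    ContextSize.LOW: 10,
    ContextSize.MEDIUM: 25,
    ContextSize.HIGH: 60,
}


def calls_to_scroll(total_lines, size):
    """Number of back-to-back windows needed to pass over every line once."""
    return math.ceil(total_lines / WINDOW_LINES[size])


for size in ContextSize:
    print(size.value, calls_to_scroll(300, size))
```

The setting is fixed by the developer's configuration, so in this model it is an input to the retrieval step rather than something the model can change mid-conversation.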

Which brings us to…

The Sliding Window in Action

We tested a live page using increasing context sizes and repeated expansions.

This illustrates exactly how GPT “walks” through a page using windows.

Step 1 — Initial snippet (Low context)

The search returned only a tiny excerpt of the article.

The excerpt is limited and barely useful for understanding the page.

Snippet 1: from Dan Petrovic’s blog page on DEJAN

“From Free-Text to Likert Distributions: A Practical Guide to SSR for Purchase Intent Oct 15, 2025 — by Dan Petrovic” dejan.ai

“# Author: Dan Petrovic … ## BlockRank: A Faster, Smarter Way to Rank Documents with LLMs 10 Nov 2025 — by Dan Petrovic” dejan.ai

Snippet 2: blog list (same source)

“Instead of sending a user to one “best” page, Google’s AI Mode assembles an answer from short text extracts … 9 Nov 2025 — by Dan Petrovic” dejan.ai

Snippet 3: bio/reference page

“Dan Petrovic is the most prominent AI SEO … applies his AEO framework to numerous global brands.” dejan.ai

Snippet 4: conference page

“Beyond Rank Tracking: Analyzing Brand Perceptions Through Language Model Association Networks Featuring Dan Petrovic … Posted On March 18, 2025” SEO Week

Step 2 — Expanding at line 1

First open() call reveals the start of the article:

Title
Date
First paragraph
Some introductory context

This is still capped: the entire introduction does not fit.

Result snippet (initial)

From the search result:

“Large Language Models (LLMs) have revolutionized many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the attention mechanism …” dejan.ai

Expanded view (open the page around line ~1)

“# BlockRank: A Faster, Smarter Way to Rank Documents with LLMs

10 Nov 2025 — by Dan Petrovic — in Machine Learning

Large Language Models (LLMs) have revolutionised many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the attention mechanism in LLMs scales quadratically with the length of the input context, making it slow and expensive to rank a large number of documents.” dejan.ai

Further down the page (open around line ~30)

“How BlockRank Works: A Two-Pronged Approach

Based on these insights, BlockRank introduces two key innovations to the standard LLM architecture and fine-tuning process:

1. Structured Sparse Attention

BlockRank modifies the attention mechanism to enforce the observed block sparsity. This is achieved by restricting the attention flow as follows:

• Document tokens only attend to other tokens within the same document and to the initial instruction tokens.

• Query tokens attend to all tokens in the prompt (instructions and all documents) to gather the necessary context for ranking.

This structured attention pattern reduces the computational complexity from quadratic (O(n²)) to linear (O(n)), resulting in a significant speed-up in both training and inference.” dejan.ai

One more section (open around line ~60)

“Performance: Faster and More Accurate

The BlockRank paper presents a comprehensive evaluation of the method on several standard information retrieval benchmarks. The results are impressive:

• State-of-the-art performance: On the BEIR benchmark, BlockRank outperforms existing state-of-the-art listwise rankers like FIRST, RankZephyr, and RankVicuna.

• Significant speed-up: BlockRank is 4.7x faster than a standard fine-tuned Mistral-7B model when ranking 100 documents.

• Scalability: BlockRank can rank up to 500 documents (≈100 000 tokens) in under a second, with its latency scaling linearly with the number of documents.” dejan.ai

The tool returned the title, date, and author metadata in the expansion.
It gives the first few paragraphs, but they are still truncated.
Content arrives as plain text converted from HTML, with markdown-style formatting.
You can scroll further by opening the page at a different line number (lineno).
You don’t get the full page, but you do get useful sections.
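
The plain-text, markdown-flavoured output observed above is consistent with some HTML-to-text step running before the model sees anything. The real extraction pipeline is not documented, so the following is only a guess at its general shape using Python's standard `html.parser`. The class `TextExtractor`, the tag sets, and the heading prefixes are all assumptions.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Reduce HTML to plain-text lines with markdown-style headings (illustrative only)."""
    SKIP = {"script", "style"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"p", "div", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buffer = []
        self._prefix = ""
        self._skip_depth = 0

    def _flush(self):
        # Join buffered text, collapse whitespace, and emit one line per block.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(self._prefix + text)
        self._buffer = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCKS:
            self._flush()
            self._prefix = self.HEADINGS.get(tag, "")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCKS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buffer.append(data)


def html_to_lines(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()   # emit any trailing text outside a block element
    return parser.lines


sample = "<h1>Title</h1><p>Hello <b>world</b></p><script>x()</script>"
print(html_to_lines(sample))
```

Once a page is a list of lines like this, line-numbered windows become a natural way to serve it, which matches the lineno-based scrolling seen in the expansions.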

Step 3 — Expanding deeper (line 30, line 60, etc.)

Each expansion retrieves more of the page:

Body sections
Headings
Explanatory paragraphs
Lists and examples

But the view is still windowed.

You can see sequential slices, but not all slices, and not the full article.

“## Open-Source Implementation

The authors have released the code for BlockRank on GitHub [2], making it easy for researchers and practitioners to use and build upon their work. The repository includes:

• The core BlockRank attention implementation in both standard PyTorch and optimized Triton kernels.

• The auxiliary attention loss module.

• Training and evaluation scripts.

• A pre-trained BlockRank model based on Mistral-7B, available on Hugging Face.

• A quickstart notebook to help you get started.

The code is well-documented and provides a solid foundation for experimenting with BlockRank on your own datasets.

“## Conclusion

BlockRank is a significant step forward in making LLM-based in-context ranking more practical and accessible. By identifying and exploiting the inherent structure of the attention mechanism for this task, the authors have developed a method that is both faster and more accurate than existing approaches. The open-source release of the code and a pre-trained model further lowers the barrier to entry for using this powerful technique.

As LLMs continue to grow in size and capability, methods like BlockRank that focus on efficiency and scalability will become increasingly important. We’re excited to see how the community will build upon this work and apply it to new and challenging information retrieval problems.

References

[1] Gupta, N., You, C., … & Yu, F. (2025). Scalable In-context Ranking with Generative Models. arXiv preprint arXiv:2510.05396. https://arxiv.org/abs/2510.05396

[2] BlockRank GitHub Repository. https://github.com/dejanai/BlockRank”

Comments

Cancel reply

Your email address will not be published. Required fields are marked *

Name *

Email *

Website

Save my name, email, and website in this browser for the next time I comment.

I am a robot. I am a human.

←Previous: In AI SEO #10 is the new #1

DEJAN

AI Marketing Agency

AI Rank Privacy Policy | Dan Petrovic | Noli esse malus.”

Step 4 — High context mode

Switching to High context makes each window taller, so expansions return:

Longer excerpts
More adjacent paragraphs
Larger text blocks per request

But even on High, expansions eventually hit tool caps.

The sliding window becomes more efficient but still cannot reveal the full page.

This directly demonstrates that:

GPT sees sequential slices, not the full page
Retrieval is windowed
Windows are capped
Each window is a plaintext extraction
Developers never see any of these windows

GPT does not browse the web.

It receives:

Small snippets
Structured metadata
Limited expansions through windowed retrieval
No HTML, no full pages

Dan Petrovic · Nov 14, 00:50