A Technical Walkthrough of Web Search, Snippets, Expansions, Context Sizes, and Sliding Windows
Many people assume GPT “views” the web the way humans do: full pages, HTML, images, layout, and complete articles. Reality is very different. GPT doesn’t browse. It doesn’t load pages. It doesn’t ingest entire documents. What it sees is controlled, windowed, and heavily restricted.
This article explains:
- What GPT receives from Web Search
- How grounding snippets work
- How expansions (open/click) operate
- How context size settings (Low/Medium/High) affect retrieval
- How the sliding window mechanism reveals more of a page
- What developers actually receive in API responses
All examples are based on real interactions during testing.
What GPT Receives From a Web Search
When GPT requests a web search result, it receives a small structured object:
- Title
- URL
- Short text snippet (1–3 sentences)
- Optional metadata such as date or score
- A unique internal ID (turn0search0, etc.)
This is all the grounding GPT gets initially.
It does not receive:
- Full pages
- Raw HTML
- Full article content
- Site navigation or structure
These snippets behave like compact search-engine result cards.
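Putting those fields together, a grounding snippet can be pictured as a small record like the one below. This is an illustrative sketch only: every field name and value here is an assumption, not a documented wire format.

```python
# Illustrative shape of a grounding snippet; field names and values are
# assumptions for illustration, not the documented schema.
snippet = {
    "id": "turn0search0",  # unique internal retrieval ID
    "title": "BlockRank: A Faster, Smarter Way to Rank Documents with LLMs",
    "url": "https://dejan.ai/blog/",  # hypothetical URL
    "text": "Large Language Models (LLMs) have revolutionized many areas...",
    "date": "2025-11-10",  # optional metadata
}

# Notably absent: raw HTML, the full article body, site structure.
print(sorted(snippet))
```

The whole object is on the order of a search-engine result card, which is why a single snippet is rarely enough to answer a detailed question.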
What “open” and “click” Do
Each snippet comes with a retrieval ID. GPT can request more with:
open()
Fetches a larger slice of text from the same page, centered around a line number.
This is how GPT “scrolls.”
click()
Follows an outgoing link from the snippet.
The new page is fetched as another snippet, using the same rules as the original search.
Are Expansions Limited to One Round?
No. There is no single-round limit.
GPT can:
- open at line 1
- then open at line 50
- then open at line 120
- and so on
Each call retrieves a new window of text.
This creates a sliding-window browsing pattern.
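The pattern can be modelled with a toy `open_window()` that serves capped slices from a cached plaintext page. The function name, the window size, and the page contents are all assumptions; real caps are not published. Only the mechanism matters:

```python
# Toy model of windowed retrieval: the tool holds the page as plaintext
# lines, and each call returns a capped window starting at `lineno`.
WINDOW_LINES = 40  # assumed cap; real values are not documented

# A stand-in 500-line page, keyed by its retrieval ID.
PAGES = {"turn0search0": [f"line {i}" for i in range(1, 501)]}

def open_window(page_id: str, lineno: int = 1) -> list[str]:
    """Return up to WINDOW_LINES lines starting at 1-based `lineno`."""
    lines = PAGES[page_id]
    return lines[lineno - 1 : lineno - 1 + WINDOW_LINES]

# Sliding through the page: each call reveals a new slice.
for lineno in (1, 50, 120):
    window = open_window("turn0search0", lineno)
    print(lineno, "->", window[0], "...", window[-1])
```

Each call is independent; nothing forces the windows to be adjacent or non-overlapping, which is exactly what the repeated expansions in the test below look like.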
Can GPT Retrieve the Entire Page With Enough Windows?
In theory: it can see many parts of a page.
In practice: no, because of strict limits:
- Each expansion is a capped-size window
- Tool responses per turn are capped
- The model cannot output more than allowed quoting limits
So despite multiple expansions, GPT cannot reconstruct or reproduce entire pages.
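A back-of-the-envelope calculation shows why. With a capped window size and a capped number of tool responses per turn, the total visible text is bounded well below a long page's length. All numbers below are invented for illustration:

```python
# Toy numbers, all assumptions: a long page vs. capped retrieval.
PAGE_LINES = 5_000        # a long article
WINDOW_LINES = 40         # assumed per-expansion cap
EXPANSIONS_PER_TURN = 10  # assumed per-turn tool-response cap

# Best case (no overlap between windows): an upper bound on visibility.
max_visible = WINDOW_LINES * EXPANSIONS_PER_TURN
print(max_visible, "of", PAGE_LINES, "lines reachable in one turn")
```

Even in the no-overlap best case, most of a long page stays unseen in any single turn.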
Hard Limits: Retrieval and Output
Two types of restrictions exist:
Retrieval limits
Each open call returns only a fixed window, even if the page is thousands of words long.
Output limits
Even if GPT has seen multiple windows, it cannot return:
- Full content
- Long passages
- More than allowed quoted text
It must summarise.
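The output side can be sketched as a simple gate. The budget value is an assumption; the point is only the behaviour: short passages can be quoted, long ones must be summarised.

```python
# Toy output gate: text over the quoting budget must be summarised.
MAX_QUOTE_CHARS = 200  # assumed budget; real limits are not published

def render(passage: str) -> str:
    """Quote short passages verbatim; otherwise fall back to a summary."""
    if len(passage) <= MAX_QUOTE_CHARS:
        return f'"{passage}"'
    return f"[summary of {len(passage)}-character passage]"

print(render("BlockRank is 4.7x faster."))
print(render("x" * 1_000))
```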
Is This a Special Internal Tool?
No.
Everything demonstrated here uses the same Web Search tool available in the Assistants API.
Developers enabling Web Search in their assistants get the same capabilities.
Context Size Settings
In the Assistants configuration panel, developers choose how large the web context should be.
GPT cannot choose this.
It simply receives:
- Low: smallest windowed slices
- Medium: more expansive slices
- High: largest allowable slices
Exact numeric values are not publicly documented.
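As a concrete reference point, OpenAI's hosted web search tool exposes this as a `search_context_size` option on the tool configuration. The sketch below follows the shape of the Responses API's web search tool; verify exact field names against the current API reference.

```python
# Developer-side context size setting for the hosted web search tool
# (shape modelled on OpenAI's Responses API `web_search_preview` tool;
# check field names against the current API reference).
web_search_tool = {
    "type": "web_search_preview",
    "search_context_size": "high",  # one of "low", "medium", "high"
}
print(web_search_tool["search_context_size"])
```

The model itself never sees or chooses this value; it only sees the resulting slice sizes.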
These settings affect:
- How much snippet text is returned
- How large each expansion window is
- How much surrounding content comes with a search result
Which brings us to…
The Sliding Window in Action
We tested a live page using increasing context sizes and repeated expansions.
This illustrates exactly how GPT “walks” through a page using windows.
Step 1 — Initial snippet (Low context)
Returned only a tiny excerpt of the article.
Limited, barely useful for understanding the page.
Snippet 1: Dan Petrovic’s blog page on DEJAN
“From Free-Text to Likert Distributions: A Practical Guide to SSR for Purchase Intent Oct 15, 2025 — by Dan Petrovic” dejan.ai
“# Author: Dan Petrovic … ## BlockRank: A Faster, Smarter Way to Rank Documents with LLMs 10 Nov 2025 — by Dan Petrovic” dejan.ai
Snippet 2: blog list (same source)
“Instead of sending a user to one “best” page, Google’s AI Mode assembles an answer from short text extracts … 9 Nov 2025 — by Dan Petrovic” dejan.ai
Snippet 3: bio/reference page
“Dan Petrovic is the most prominent AI SEO … applies his AEO framework to numerous global brands.” dejan.ai
Snippet 4: conference page
“Beyond Rank Tracking: Analyzing Brand Perceptions Through Language Model Association Networks Featuring Dan Petrovic … Posted On March 18, 2025” SEO Week
Step 2 — Expanding at line 1
First open() call reveals the start of the article:
- Title
- Date
- First paragraph
- Some introductory context
This is still capped: the entire introduction does not fit.
Result snippet (initial)
From the search result:
“Large Language Models (LLMs) have revolutionized many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the attention mechanism …” dejan.ai
Expanded view (open the page around line ~1)
“# BlockRank: A Faster, Smarter Way to Rank Documents with LLMs
10 Nov 2025 — by Dan Petrovic — in Machine Learning
Large Language Models (LLMs) have revolutionised many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the attention mechanism in LLMs scales quadratically with the length of the input context, making it slow and expensive to rank a large number of documents.” dejan.ai
Further down the page (open around line ~30)
“How BlockRank Works: A Two-Pronged Approach
Based on these insights, BlockRank introduces two key innovations to the standard LLM architecture and fine-tuning process:
1. Structured Sparse Attention
BlockRank modifies the attention mechanism to enforce the observed block sparsity. This is achieved by restricting the attention flow as follows:
• Document tokens only attend to other tokens within the same document and to the initial instruction tokens.
• Query tokens attend to all tokens in the prompt (instructions and all documents) to gather the necessary context for ranking.
This structured attention pattern reduces the computational complexity from quadratic (O(n²)) to linear (O(n)), resulting in a significant speed-up in both training and inference.” dejan.ai
One more section (open around line ~60)
“Performance: Faster and More Accurate
The BlockRank paper presents a comprehensive evaluation of the method on several standard information retrieval benchmarks. The results are impressive:
• State-of-the-art performance: On the BEIR benchmark, BlockRank outperforms existing state-of-the-art listwise rankers like FIRST, RankZephyr, and RankVicuna.
• Significant speed-up: BlockRank is 4.7x faster than a standard fine-tuned Mistral-7B model when ranking 100 documents.
• Scalability: BlockRank can rank up to 500 documents (≈100 000 tokens) in under a second, with its latency scaling linearly with the number of documents.” dejan.ai
- The tool returned title, date, and author metadata in the expansion.
- It gives the first few paragraphs (still truncated).
- Content arrives as markdown/HTML-converted plain text.
- You can scroll further by opening at a different lineno.
- You don’t get the full page — but you get useful sections.
Step 3 — Expanding deeper (line 30, line 60, etc.)
Each expansion retrieves more of the page:
- Body sections
- Headings
- Explanatory paragraphs
- Lists and examples
But still windowed.
You can see sequential slices, but not all slices, and not the full article.
“## Open-Source Implementation
The authors have released the code for BlockRank on GitHub [2], making it easy for researchers and practitioners to use and build upon their work. The repository includes:
• The core BlockRank attention implementation in both standard PyTorch and optimized Triton kernels.
• The auxiliary attention loss module.
• Training and evaluation scripts.
• A pre-trained BlockRank model based on Mistral-7B, available on Hugging Face.
• A quickstart notebook to help you get started.
The code is well-documented and provides a solid foundation for experimenting with BlockRank on your own datasets.” dejan.ai
“## Conclusion
BlockRank is a significant step forward in making LLM-based in-context ranking more practical and accessible. By identifying and exploiting the inherent structure of the attention mechanism for this task, the authors have developed a method that is both faster and more accurate than existing approaches. The open-source release of the code and a pre-trained model further lowers the barrier to entry for using this powerful technique.
As LLMs continue to grow in size and capability, methods like BlockRank that focus on efficiency and scalability will become increasingly important. We’re excited to see how the community will build upon this work and apply it to new and challenging information retrieval problems.
References
[1] Gupta, N., You, C., … & Yu, F. (2025). Scalable In-context Ranking with Generative Models. arXiv preprint arXiv:2510.05396. https://arxiv.org/abs/2510.05396
[2] BlockRank GitHub Repository. https://github.com/dejanai/BlockRank”
The deepest window even runs past the article body into the page chrome: the comment form, anti-bot widget, navigation, and footer all arrive as plain text:
“Comments
Leave a Reply
Cancel reply
Your email address will not be published. Required fields are marked *
Name *
Email *
Website
Save my name, email, and website in this browser for the next time I comment.
I am a robot. I am a human.
←Previous: In AI SEO #10 is the new #1
DEJAN
AI Marketing Agency
AI Rank Privacy Policy | Dan Petrovic | Noli esse malus.”
Step 4 — High context mode
Switching to High context makes each window taller, so expansions return:
- Longer excerpts
- More adjacent paragraphs
- Larger text blocks per request
But even on High, expansions eventually hit tool caps.
The sliding window becomes more efficient but still cannot reveal the full page.
This directly demonstrates that:
- GPT sees sequential slices, not the full page
- Retrieval is windowed
- Windows are capped
- Each window is a plaintext extraction
- Developers never see any of these windows
GPT does not browse the web.
It receives:
- Small snippets
- Structured metadata
- Limited expansions through windowed retrieval
- No HTML, no full pages
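What developers do receive in API responses is citation metadata attached to the model's output, not the retrieval windows themselves. The sketch below is modelled on the `url_citation` annotations surfaced by OpenAI's Responses API; all values are hypothetical.

```python
# Developer-visible citation metadata (values hypothetical). The windows
# GPT actually read are never exposed in the API response.
annotation = {
    "type": "url_citation",
    "url": "https://dejan.ai/blog/",
    "title": "BlockRank: A Faster, Smarter Way to Rank Documents with LLMs",
    "start_index": 102,  # span in the model's output text
    "end_index": 164,
}
print(annotation["type"], "->", annotation["url"])
```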
