A Technical Walkthrough of Web Search, Snippets, Expansions, Context Sizes, and Sliding Windows
Many people assume GPT “views” the web the way humans do: full pages, HTML, images, layout, and complete articles. Reality is very different. GPT doesn’t browse. It doesn’t load pages. It doesn’t ingest entire documents. What it sees is controlled, windowed, and heavily restricted.
This article explains:
- What GPT receives from Web Search
- How grounding snippets work
- How expansions (open/click) operate
- How context size settings (Low/Medium/High) affect retrieval
- How the sliding window mechanism reveals more of a page
- What developers actually receive in API responses
All examples are based on real interactions during testing.
What GPT Receives From a Web Search
When GPT requests a web search result, it receives a small structured object:
- Title
- URL
- Short text snippet (1–3 sentences)
- Optional metadata such as date or score
- A unique internal ID (turn0search0, etc.)
This is all the grounding GPT gets initially.
It does not receive:
- Full pages
- Raw HTML
- Full article content
- Site navigation or structure
These snippets behave like compact search-engine result cards.
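Putting those fields together, a grounding snippet can be pictured as a small record like the one below. This is an illustrative sketch only: every field name and value here is an assumption, not a documented wire format.

```python
# Illustrative shape of a grounding snippet; field names and values are
# assumptions for illustration, not the documented schema.
snippet = {
    "id": "turn0search0",  # unique internal retrieval ID
    "title": "BlockRank: A Faster, Smarter Way to Rank Documents with LLMs",
    "url": "https://dejan.ai/blog/",  # hypothetical URL
    "text": "Large Language Models (LLMs) have revolutionized many areas...",
    "date": "2025-11-10",  # optional metadata
}

# Notably absent: raw HTML, the full article body, site structure.
print(sorted(snippet))
```

The whole object is on the order of a search-engine result card, which is why a single snippet is rarely enough to answer a detailed question.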
What “open” and “click” Do
Each snippet comes with a retrieval ID. GPT can request more with:
open()
Fetches a larger slice of text from the same page, centered around a line number.
This is how GPT “scrolls.”
click()
Follows an outgoing link from the snippet.
The new page is fetched as another snippet, using the same rules as the original search.
Are Expansions Limited to One Round?
No. There is no single-round limit.
GPT can:
- open at line 1
- then open at line 50
- then open at line 120
- and so on
Each call retrieves a new window of text.
This creates a sliding-window browsing pattern.
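The pattern can be modelled with a toy `open_window()` that serves capped slices from a cached plaintext page. The function name, the window size, and the page contents are all assumptions; real caps are not published. Only the mechanism matters:

```python
# Toy model of windowed retrieval: the tool holds the page as plaintext
# lines, and each call returns a capped window starting at `lineno`.
WINDOW_LINES = 40  # assumed cap; real values are not documented

# A stand-in 500-line page, keyed by its retrieval ID.
PAGES = {"turn0search0": [f"line {i}" for i in range(1, 501)]}

def open_window(page_id: str, lineno: int = 1) -> list[str]:
    """Return up to WINDOW_LINES lines starting at 1-based `lineno`."""
    lines = PAGES[page_id]
    return lines[lineno - 1 : lineno - 1 + WINDOW_LINES]

# Sliding through the page: each call reveals a new slice.
for lineno in (1, 50, 120):
    window = open_window("turn0search0", lineno)
    print(lineno, "->", window[0], "...", window[-1])
```

Each call is independent; nothing forces the windows to be adjacent or non-overlapping, which is exactly what the repeated expansions in the test below look like.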
Can GPT Retrieve the Entire Page With Enough Windows?
In theory: it can see many parts of a page.
In practice: no, because of strict limits:
- Each expansion is a capped-size window
- Tool responses per turn are capped
- The model cannot output more than allowed quoting limits
So despite multiple expansions, GPT cannot reconstruct or reproduce entire pages.
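A back-of-the-envelope calculation shows why. With a capped window size and a capped number of tool responses per turn, the total visible text is bounded well below a long page's length. All numbers below are invented for illustration:

```python
# Toy numbers, all assumptions: a long page vs. capped retrieval.
PAGE_LINES = 5_000        # a long article
WINDOW_LINES = 40         # assumed per-expansion cap
EXPANSIONS_PER_TURN = 10  # assumed per-turn tool-response cap

# Best case (no overlap between windows): an upper bound on visibility.
max_visible = WINDOW_LINES * EXPANSIONS_PER_TURN
print(max_visible, "of", PAGE_LINES, "lines reachable in one turn")
```

Even in the no-overlap best case, most of a long page stays unseen in any single turn.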
Hard Limits: Retrieval and Output
Two types of restrictions exist:
Retrieval limits
Each open call returns only a fixed window, even if the page is thousands of words long.
Output limits
Even if GPT has seen multiple windows, it cannot return:
- Full content
- Long passages
- More than allowed quoted text
It must summarise.
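The output side can be sketched as a simple gate. The budget value is an assumption; the point is only the behaviour: short passages can be quoted, long ones must be summarised.

```python
# Toy output gate: text over the quoting budget must be summarised.
MAX_QUOTE_CHARS = 200  # assumed budget; real limits are not published

def render(passage: str) -> str:
    """Quote short passages verbatim; otherwise fall back to a summary."""
    if len(passage) <= MAX_QUOTE_CHARS:
        return f'"{passage}"'
    return f"[summary of {len(passage)}-character passage]"

print(render("BlockRank is 4.7x faster."))
print(render("x" * 1_000))
```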
Is This a Special Internal Tool?
No.
Everything demonstrated here uses the same Web Search tool available in the Assistants API.
Developers enabling Web Search in their assistants get the same capabilities.
Context Size Settings
In the Assistants configuration panel, developers choose how large the web context should be.
GPT cannot choose this.
It simply receives:
- Low: smallest windowed slices
- Medium: more expansive slices
- High: largest allowable slices
Exact numeric values are not publicly documented.
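As a concrete reference point, OpenAI's hosted web search tool exposes this as a `search_context_size` option on the tool configuration. The sketch below follows the shape of the Responses API's web search tool; verify exact field names against the current API reference.

```python
# Developer-side context size setting for the hosted web search tool
# (shape modelled on OpenAI's Responses API `web_search_preview` tool;
# check field names against the current API reference).
web_search_tool = {
    "type": "web_search_preview",
    "search_context_size": "high",  # one of "low", "medium", "high"
}
print(web_search_tool["search_context_size"])
```

The model itself never sees or chooses this value; it only sees the resulting slice sizes.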
These settings affect:
- How much snippet text is returned
- How large each expansion window is
- How much surrounding content comes with a search result
Which brings us to…
The Sliding Window in Action
We tested a live page using increasing context sizes and repeated expansions.
This illustrates exactly how GPT “walks” through a page using windows.
Step 1 — Initial snippet (Low context)
Returned only a tiny excerpt of the article.
Limited, barely useful for understanding the page.
Snippet 1: Dan Petrovic’s blog page on DEJAN
“From Free-Text to Likert Distributions: A Practical Guide to SSR for Purchase Intent Oct 15, 2025 — by Dan Petrovic” dejan.ai
“# Author: Dan Petrovic … ## BlockRank: A Faster, Smarter Way to Rank Documents with LLMs 10 Nov 2025 — by Dan Petrovic” dejan.ai
Snippet 2: blog list (same source)
“Instead of sending a user to one “best” page, Google’s AI Mode assembles an answer from short text extracts … 9 Nov 2025 — by Dan Petrovic” dejan.ai
Snippet 3: bio/reference page
“Dan Petrovic is the most prominent AI SEO … applies his AEO framework to numerous global brands.” dejan.ai
Snippet 4: conference page
“Beyond Rank Tracking: Analyzing Brand Perceptions Through Language Model Association Networks Featuring Dan Petrovic … Posted On March 18, 2025” SEO Week
Step 2 — Expanding at line 1
First open() call reveals the start of the article:
- Title
- Date
- First paragraph
- Some introductory context
This is still capped: the entire introduction does not fit.
Result snippet (initial)
From the search result:
“Large Language Models (LLMs) have revolutionized many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the attention mechanism …” dejan.ai
Expanded view (open the page around line ~1)
“# BlockRank: A Faster, Smarter Way to Rank Documents with LLMs
10 Nov 2025 — by Dan Petrovic — in Machine Learning
Large Language Models (LLMs) have revolutionised many areas of natural language processing, and information retrieval is no exception. A promising new paradigm called In-Context Ranking (ICR) leverages the contextual understanding of LLMs to re-rank a list of candidate documents for a given query. However, this power comes at a cost: the computational complexity of the attention mechanism in LLMs scales quadratically with the length of the input context, making it slow and expensive to rank a large number of documents.” dejan.ai
Further down the page (open around line ~30)
“How BlockRank Works: A Two-Pronged Approach
Based on these insights, BlockRank introduces two key innovations to the standard LLM architecture and fine-tuning process:
1. Structured Sparse Attention
BlockRank modifies the attention mechanism to enforce the observed block sparsity. This is achieved by restricting the attention flow as follows:
• Document tokens only attend to other tokens within the same document and to the initial instruction tokens.
• Query tokens attend to all tokens in the prompt (instructions and all documents) to gather the necessary context for ranking.
This structured attention pattern reduces the computational complexity from quadratic (O(n²)) to linear (O(n)), resulting in a significant speed-up in both training and inference.” dejan.ai
One more section (open around line ~60)
“Performance: Faster and More Accurate
The BlockRank paper presents a comprehensive evaluation of the method on several standard information retrieval benchmarks. The results are impressive:
• State-of-the-art performance: On the BEIR benchmark, BlockRank outperforms existing state-of-the-art listwise rankers like FIRST, RankZephyr, and RankVicuna.
• Significant speed-up: BlockRank is 4.7x faster than a standard fine-tuned Mistral-7B model when ranking 100 documents.
• Scalability: BlockRank can rank up to 500 documents (≈100 000 tokens) in under a second, with its latency scaling linearly with the number of documents.” dejan.ai
- The tool returned title, date, and author metadata in the expansion.
- It gives the first few paragraphs (still truncated).
- Content arrives as markdown/HTML-converted plain text.
- You can scroll further by opening at a different lineno.
- You don’t get the full page — but you get useful sections.
Step 3 — Expanding deeper (line 30, line 60, etc.)
Each expansion retrieves more of the page:
- Body sections
- Headings
- Explanatory paragraphs
- Lists and examples
But still windowed.
You can see sequential slices, but not all slices, and not the full article.
“## Open-Source Implementation
The authors have released the code for BlockRank on GitHub [2], making it easy for researchers and practitioners to use and build upon their work. The repository includes:
• The core BlockRank attention implementation in both standard PyTorch and optimized Triton kernels.
• The auxiliary attention loss module.
• Training and evaluation scripts.
• A pre-trained BlockRank model based on Mistral-7B, available on Hugging Face.
• A quickstart notebook to help you get started.
The code is well-documented and provides a solid foundation for experimenting with BlockRank on your own datasets.” dejan.ai
“## Conclusion
BlockRank is a significant step forward in making LLM-based in-context ranking more practical and accessible. By identifying and exploiting the inherent structure of the attention mechanism for this task, the authors have developed a method that is both faster and more accurate than existing approaches. The open-source release of the code and a pre-trained model further lowers the barrier to entry for using this powerful technique.
As LLMs continue to grow in size and capability, methods like BlockRank that focus on efficiency and scalability will become increasingly important. We’re excited to see how the community will build upon this work and apply it to new and challenging information retrieval problems.
References
[1] Gupta, N., You, C., … & Yu, F. (2025). Scalable In-context Ranking with Generative Models. arXiv preprint arXiv:2510.05396. https://arxiv.org/abs/2510.05396
[2] BlockRank GitHub Repository. https://github.com/dejanai/BlockRank”
The deepest window even runs past the article body into the page chrome: the comment form, anti-bot widget, navigation, and footer all arrive as plain text:
“Comments
Leave a Reply
Cancel reply
Your email address will not be published. Required fields are marked *
Name *
Email *
Website
Save my name, email, and website in this browser for the next time I comment.
I am a robot. I am a human.
←Previous: In AI SEO #10 is the new #1
DEJAN
AI Marketing Agency
AI Rank Privacy Policy | Dan Petrovic | Noli esse malus.”
Step 4 — High context mode
Switching to High context makes each window taller, so expansions return:
- Longer excerpts
- More adjacent paragraphs
- Larger text blocks per request
But even on High, expansions eventually hit tool caps.
The sliding window becomes more efficient but still cannot reveal the full page.
This directly demonstrates that:
- GPT sees sequential slices, not the full page
- Retrieval is windowed
- Windows are capped
- Each window is a plaintext extraction
- Developers never see any of these windows
GPT does not browse the web.
It receives:
- Small snippets
- Structured metadata
- Limited expansions through windowed retrieval
- No HTML, no full pages
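What developers do receive in API responses is citation metadata attached to the model's output, not the retrieval windows themselves. The sketch below is modelled on the `url_citation` annotations surfaced by OpenAI's Responses API; all values are hypothetical.

```python
# Developer-visible citation metadata (values hypothetical). The windows
# GPT actually read are never exposed in the API response.
annotation = {
    "type": "url_citation",
    "url": "https://dejan.ai/blog/",
    "title": "BlockRank: A Faster, Smarter Way to Rank Documents with LLMs",
    "start_index": 102,  # span in the model's output text
    "end_index": 164,
}
print(annotation["type"], "->", annotation["url"])
```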
