AI Mode is Not Live Web

An experiment testing Google's AI Mode suggests it may rely on Google's existing index or cached web data rather than performing live HTTP requests for all URLs.

When you use Google's AI Mode to fetch a web page, you might assume it is browsing the live internet in real time. But a recent experiment reveals that this is not exactly how it works.

To test this, a brand-new web page was uploaded to a live server. A standard connection check confirmed the page was live and fully accessible. However, when a Python script was run inside Google's AI Mode to fetch that same new web address, it returned a "not found" error.

The script was then changed to fetch an older, already indexed page on the same server. This time, the AI successfully retrieved the page and its content.

This suggests that Google's AI Mode does not perform a fresh, live request to the target server every time. Instead, it seems to consult Google's existing search index or cache first. If a page is brand-new and has not been crawled yet, the AI acts as if it does not exist and simulates a 404 error, even if the page is actually live.

For developers and researchers, this is a crucial distinction. It means the AI's web access is mediated through Google's existing snapshot of the internet, rather than direct, real-time browsing. If you need up-to-the-second accuracy on a newly published page, the AI might tell you it simply is not there.

I recently stumbled upon a fascinating aspect of how Google’s AI Mode (powered by a custom Gemini model) interacts with the internet. I ran a simple test, and the results suggest that instead of performing truly live fetches for all URLs, AI Mode relies on Google’s existing index or a cached version of the web. This can lead to some surprising discrepancies when dealing with brand-new or unindexed content.

Here’s What I Did:

First, I disabled the use of search_tool and had AI Mode run Python code in its local environment.

My experiment was straightforward:

1. Create a New, Live Web Page: I created a new HTML file named test123.html and uploaded it to my live web server at dejan.ai.
2. Verify Live Access (The Old-Fashioned Way):
   - Using the curl command in my terminal, I successfully fetched the new test123.html page. This proved it was live and accessible directly from the internet.
   - I also checked my server’s access logs, which confirmed the curl request and showed the page was successfully delivered.
3. Test with Google’s AI Mode (Python):
   - Next, I wrote a simple Python script using the urllib.request library to fetch the same new URL: https://dejan.ai/test123.html.
   - When I executed this script within Google’s AI Mode, it reported “HTTP Error 404: Not Found.”
4. Test with a Known, Indexed Page:
   - To cross-check, I slightly modified my Python script to fetch an existing, presumably indexed, PHP file (test.php) on the same domain.
   - When I ran this version in AI Mode, it returned “Status: 200” (OK) and the content of the PHP file: “Hello from a php file!”.
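
The exact script is not reproduced here. A minimal sketch of this kind of fetch, using only urllib from the standard library, might look like the following. The function name fetch and the 10-second timeout are assumptions made for illustration, not details from the original test.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Return (status, text) for url.

    status is the HTTP status code, or None if no HTTP response was received.
    text is the response body on success, or the error message otherwise.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # Whatever answered the request returned an HTTP error status.
        # str(err) has the form "HTTP Error 404: Not Found".
        return err.code, str(err)
    except urllib.error.URLError as err:
        # No HTTP response at all, e.g. DNS failure or refused connection.
        return None, str(err.reason)
```

Under these assumptions, calling fetch("https://dejan.ai/test123.html") corresponds to the new-page test above, and fetch("https://dejan.ai/test.php") to the indexed-page test. Returning the status separately from the text makes it easy to compare what AI Mode reports with what curl reports for the same URL.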

I then repeated the test with another file (test.php) and reproduced the result.

My Observations and Implications:

The key takeaway for me was the stark difference in how AI Mode handled the newly created page:

- My external tools (curl, server logs): Confirmed the page was live and accessible.
- Google’s AI Mode: Reported the page as “Not Found” (404).

However, for a page that is likely already known to Google (indexed or cached), AI Mode fetched it and reported its status and content correctly.

This strongly suggests to me that when Google’s AI Mode (or its Python execution environment) attempts to access a URL, it doesn’t necessarily perform a fresh, live HTTP request to the target server every single time. It seems more likely that it first consults Google’s vast index or a cached representation of the web.

- If the URL is not found in this internal knowledge base (as was the case with my brand-new, unindexed page), the AI seems to simulate a 404 error, even if the page is actually live.
- If the URL is found, the AI can then proceed, possibly by serving cached content or using the cached information to then perform a live fetch (though my test points more towards reliance on pre-existing knowledge for the initial status check).
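
One way to separate the "cached content" case from the "live fetch" case, which was not part of the test described here, is to request a known page with a one-off query token and then look for that token in the server's access log. The sketch below is hypothetical: the helper names tagged_url and hits_for_token, the probe parameter, and the assumption that the log uses the common or combined log format are all illustrative choices.

```python
import re
import uuid


def tagged_url(base_url):
    """Return (url, token), where url is base_url plus a random one-off query token."""
    token = uuid.uuid4().hex
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}probe={token}", token


def hits_for_token(log_lines, token):
    """Return the access-log lines whose quoted request line contains token.

    Assumes the common/combined log format, where the request appears as
    "GET /path?probe=<token> HTTP/1.1".
    """
    pattern = re.compile(r'"[A-Z]+ \S*' + re.escape(token) + r'\S* HTTP/[\d.]+"')
    return [line for line in log_lines if pattern.search(line)]
```

If the tagged URL is fetched from inside AI Mode and the token later shows up in the access log, the request reached the server. If a response comes back but the token never appears in the log, that would be more consistent with the content being served from somewhere other than the live server.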

Why This Matters (To Me, and Maybe To You):

This behavior has several implications:

- Real-time Data Accuracy: If you’re relying on AI Mode for tasks requiring access to the absolute latest version of a web page or data from a newly published URL, it might not provide up-to-the-second accuracy if its first check is against an internal cache/index.
- Development and Testing: If you’re a developer testing new web endpoints or pages, you might find AI Mode reporting them as inaccessible until they’ve had a chance to be crawled and indexed by Google.
- Understanding AI “Web Access”: It’s a good reminder that “web access” for AI models can mean different things. In this instance, it appears to be web access mediated through Google’s existing snapshot of the internet, rather than a direct, unfettered live browsing capability for every single request.

Dan Petrovic · May 29, 18:30