404

AI Mode is Not Live Web

I recently stumbled upon a fascinating aspect of how Google’s AI Mode (powered by a custom Gemini model) interacts with the internet. I ran a simple test, and the results suggest that AI Mode does not perform a truly live fetch for every URL. Instead, it appears to rely on Google’s existing index or a cached version of the web. This can lead to some surprising discrepancies when dealing with brand-new or unindexed content.

Here’s What I Did:

First, I disabled the use of search_tool and had AI Mode run Python code in its own execution environment.

My experiment was straightforward:

  1. Create a New, Live Web Page: I created a new HTML file named test123.html and uploaded it to my live web server at dejan.ai.
  2. Verify Live Access (The Old-Fashioned Way):
    • Using the curl command in my terminal, I successfully fetched this new test123.html page. This proved it was live and accessible directly from the internet.
    • I also checked my server’s access logs, which confirmed the curl request and showed the page was successfully delivered.
  3. Test with Google’s AI Mode (Python):
    • Next, I wrote a simple Python script using the urllib.request library to fetch this same new URL: https://dejan.ai/test123.html.
    • When I executed this script within Google’s AI Mode, it reported an “HTTP Error 404: Not Found.”
  4. Test with a Known, Indexed Page:
    • To cross-check, I slightly modified my Python script to fetch an existing, presumably indexed, PHP file (test.php) on the same domain.
    • When I ran this version in AI Mode, it successfully returned a “Status: 200” (OK) and the content of the PHP file: “Hello from a php file!”.
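
The exact script is not reproduced in this post, so the following is only a minimal sketch of the kind of urllib.request check described in steps 3 and 4. The function name fetch_status, the User-Agent string, and the timeout are illustrative assumptions, not details from the original run.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Fetch url and return (status_code, text).

    HTTP errors such as 404 are returned rather than raised, so the
    result for a missing page can be compared with a successful one.
    """
    # The User-Agent value is an arbitrary placeholder (assumption).
    request = urllib.request.Request(url, headers={"User-Agent": "fetch-test/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # str(err) has the form "HTTP Error 404: <reason phrase>"
        return err.code, str(err)
    except urllib.error.URLError as err:
        # DNS failure, refused connection, timeout and similar
        return None, f"URLError: {err.reason}"
```

Calling fetch_status("https://dejan.ai/test123.html") from an ordinary machine should agree with what curl shows. Inside AI Mode, the equivalent call for the new page reported 404, while the call for test.php on the same domain reported 200.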

I then repeated the test with another file (test.php) and reproduced the same result.

My Observations and Implications:

The key takeaway for me was the stark difference in how AI Mode handled the newly created page:

  • My external tools (curl, server logs): Confirmed the page was live and accessible.
  • Google’s AI Mode: Reported the page as “Not Found” (404).

However, for a page that is likely already known to Google (indexed or cached), AI Mode fetched the page and correctly reported its status and content.

This strongly suggests to me that when Google’s AI Mode (or its Python execution environment) attempts to access a URL, it doesn’t necessarily perform a fresh, live HTTP request to the target server every single time. It seems more likely that it first consults Google’s vast index or a cached representation of the web.

  • If the URL is not found in this internal knowledge base (as was the case with my brand-new, unindexed page), the AI seems to simulate a 404 error, even if the page is actually live.
  • If the URL is found, the AI can then proceed, either by serving cached content or by using the cached information as a starting point for a live fetch. My test points more towards reliance on pre-existing knowledge for the initial status check.
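
To make the hypothesis concrete, here is a toy model of the two branches above. It is purely illustrative: nothing here reflects Google's actual implementation, and the names mediated_fetch, index, and live_fetch are hypothetical. The index is just a dictionary standing in for "pages Google already knows about".

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, str]  # (status code, body or reason)


def mediated_fetch(
    url: str,
    index: Dict[str, str],
    live_fetch: Optional[Callable[[str], Response]] = None,
) -> Response:
    """Toy model of index-mediated web access (hypothetical)."""
    # Branch 1: the URL is unknown to the index, so a 404 is reported
    # without any request ever reaching the origin server.
    if url not in index:
        return 404, "Not Found"
    # Branch 2a: the index only acts as a gate, then a live fetch happens.
    if live_fetch is not None:
        return live_fetch(url)
    # Branch 2b: the cached copy is served directly.
    return 200, index[url]
```

Under this model, a brand-new page returns 404 no matter how healthy the server is, which matches what I saw. The test cannot tell branch 2a from 2b on its own; checking the server logs for an incoming request during the AI Mode run is what would separate them.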

Why This Matters (To Me, and Maybe To You):

This behavior has several implications:

  • Real-time Data Accuracy: If you’re relying on AI Mode for tasks requiring access to the absolute latest version of a web page or data from a newly published URL, it might not provide up-to-the-second accuracy if its first check is against an internal cache/index.
  • Development and Testing: If you’re a developer testing new web endpoints or pages, you might find AI Mode reporting them as inaccessible until they’ve had a chance to be crawled and indexed by Google.
  • Understanding AI “Web Access”: It’s a good reminder that “web access” for AI models can mean different things. In this instance, it appears to be web access mediated through Google’s existing snapshot of the internet, rather than a direct, unfettered live browsing capability for every single request.
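
If you want to check this on your own server, the most direct evidence is whether a request for the test page ever shows up in the access log. Below is a minimal sketch of such a check. It assumes the common/combined log format used by default in Apache and nginx, and the names LOG_PATTERN, Hit, and hits_for_path are illustrative, so adjust the pattern if your server logs differently.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumes the common/combined access log format (assumption, see above).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


class Hit(NamedTuple):
    ip: str
    time: str
    status: int
    agent: str


def hits_for_path(log_lines: Iterable[str], path: str) -> List[Hit]:
    """Return every logged request whose path (ignoring any query string) equals path."""
    hits = []
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        if match.group("path").split("?", 1)[0] != path:
            continue
        hits.append(
            Hit(
                ip=match.group("ip"),
                time=match.group("time"),
                status=int(match.group("status")),
                agent=match.group("agent") or "",
            )
        )
    return hits
```

Pass it the lines of your access log (the file location depends on your server setup) and the path of the test page, for example "/test123.html". If the only hits are your own curl requests and nothing arrives while AI Mode reports a 404, the response did not come from your server.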
