Sorry Google, I was wrong.

  • I blamed Google for a $2,000 Gemini API bill I thought was a billing bug. It wasn’t.
  • The cost came from the URL Context tool, which ingests entire web pages as input tokens.
  • If you hit a million-token PDF or a large page, you pay for all of it.
  • I think allowing users to control the input size is a sensible thing for Google to implement.

What Happened

I run several tools on the Gemini API. One of them is a grounded search analysis tool that works in two stages: Gemini 2.0 Flash does a Google Search grounded query, then Gemini 3 Pro visits each source page using the URL Context tool to classify its content.

Through a strange coincidence, two things happened at the same time. I was running a large API job for a client using Gemini Flash, and separately, Aleyda Solis shared my grounded search tool with her audience. Overnight it went from a quiet internal tool to hundreds of users running searches, each triggering the full pipeline.

I noticed the client job. I didn’t notice the tool going viral.

When I checked billing and saw $2,000 in four days, I assumed the client job was the cause. The charges showed up under a “Gemini 3 Pro” SKU, but I was certain I’d only called Flash. I contacted Google convinced I’d found a billing bug.

Logan Kilpatrick and his team looked into it and told me: you’re calling Gemini 3 Pro. I pushed back. Built test scripts, pulled database records. I was sure.

They were right.

The Real Cause

The client job was a red herring. The cost was coming from the search tool’s second stage: 2,731 calls to Gemini 3 Pro with URL Context over four days.

Here’s the thing about URL Context that caught me off guard: when you give it a URL, the entire HTML page goes in as input tokens. You have no control over this. You don’t get to choose what parts of the page to ingest. You don’t get a size estimate beforehand. The model fetches the page and bills you for every token on it.

Now imagine your tool hits a PDF. A single PDF can be 1 million tokens. That’s one API call. Now imagine your tool is processing search results at scale and it encounters hundreds of PDFs, or thousands of long-form pages. Each one goes in at full size. You can see how this adds up fast.
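One way to guard against this is a pre-flight check before a URL ever reaches the model. What follows is a minimal sketch, not something from my tool: it sends a HEAD request and refuses PDFs and anything over a byte cap. The names `should_fetch`, `head_headers`, and `MAX_BYTES` are hypothetical, and the 2 MB cap is an arbitrary assumption. Bytes are only a rough proxy for tokens, some servers ignore HEAD or omit Content-Length, and what Google fetches may differ from what your own request sees.

```python
from urllib.request import Request, urlopen

# Hypothetical cap (an assumption, not a recommendation): skip anything over ~2 MB.
MAX_BYTES = 2_000_000
# Content types this sketch refuses outright.
SKIP_TYPES = ("application/pdf",)


def should_fetch(headers: dict, max_bytes: int = MAX_BYTES,
                 allow_unknown: bool = False) -> bool:
    """Decide from response headers whether a URL looks safe to hand to the model."""
    lowered = {key.lower(): value for key, value in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type in SKIP_TYPES:
        return False
    length = lowered.get("content-length")
    if length is None:
        # Size unknown: conservative by default, caller can opt in.
        return allow_unknown
    try:
        return int(length) <= max_bytes
    except ValueError:
        # Malformed Content-Length: treat as unsafe.
        return False


def head_headers(url: str, timeout: float = 10.0) -> dict:
    """Issue a HEAD request and return the response headers as a plain dict."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return dict(response.headers.items())
```

In a pipeline you would call `should_fetch(head_headers(url))` for each search result and only pass the survivors to the URL Context stage.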

On top of that, I’d set thinking to HIGH and max output to 65,535 tokens with instructions not to truncate. So each call was expensive on input (entire pages), expensive on thinking (HIGH-level reasoning over those pages), and expensive on output (effectively uncapped).

The result: 4.5 million words of output alone, with unmeasured input and thinking tokens on top. I wasn’t logging token usage on these calls, so I had zero cost visibility on the most expensive part of my stack.
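Here is a minimal sketch of the logging I should have had. It reads token counts off a `usage_metadata`-style object and logs them per call. The field names are what I understand the google-genai Python SDK to expose, so treat them as assumptions and check them against your SDK version. `summarize_usage` and `log_usage` are hypothetical helpers, and the numbers in the example are made up.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("gemini_usage")

# Assumed field names on response.usage_metadata; verify against your SDK version.
FIELDS = (
    "prompt_token_count",           # tokens in your own prompt
    "tool_use_prompt_token_count",  # tokens pulled in by tools such as URL Context
    "thoughts_token_count",         # thinking tokens
    "candidates_token_count",       # visible output tokens
    "total_token_count",
)


def summarize_usage(usage) -> dict:
    """Collect token counts from a usage_metadata-like object; missing or None becomes 0."""
    return {name: getattr(usage, name, None) or 0 for name in FIELDS}


def log_usage(label: str, usage) -> dict:
    """Log one line per API call and return the counts for further accounting."""
    counts = summarize_usage(usage)
    logger.info("%s %s", label, counts)
    return counts


# Stand-in for response.usage_metadata, with made-up numbers.
fake_usage = SimpleNamespace(
    prompt_token_count=120,
    tool_use_prompt_token_count=48_000,
    thoughts_token_count=None,
    candidates_token_count=900,
)
counts = log_usage("url_context_call", fake_usage)
```

With a real response you would pass `response.usage_metadata` in place of `fake_usage` and write the counts somewhere you can query later.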

Why I’m Writing This

Some bad PR is circulating about me criticising Google’s API billing. I want to be clear: there was no billing bug. Every charge was legitimate. I was calling the model I was being billed for. The cost came from my own tool, my own configuration choices, and my failure to log and limit token usage.

The Google team, particularly Logan Kilpatrick, was generous with their time and patient while I was pointing fingers in the wrong direction. They didn’t have to help, but they did. I appreciate that.

What to Watch Out For

If you’re using URL Context in the Gemini API, be aware:

  • It ingests entire pages as input tokens. You don’t control or know the size in advance.
  • PDFs and large pages can send input tokens through the roof.
  • HIGH thinking level multiplies costs with invisible tokens you’re billed for but never see.
  • Google Cloud budgets are alert-only. There is no hard spending cap.

My recommendations: Log usage_metadata on every call. Set output token limits. Use MINIMAL thinking for batch jobs. Build your own circuit breakers. And if your tool goes public, be ready for costs to scale with traffic.
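A circuit breaker can be very small. The sketch below is one possible shape, assuming you already collect a total token count per call: it keeps a running total and refuses further calls once a limit is reached. `TokenBudget`, `BudgetExceeded`, and `run_batch` are hypothetical names, and the per-call token counts stand in for real API responses.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the running token total has reached the configured limit."""


class TokenBudget:
    """In-memory token counter that trips once the limit is reached."""

    def __init__(self, max_tokens: int):
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        self.max_tokens = max_tokens
        self.used = 0

    @property
    def tripped(self) -> bool:
        return self.used >= self.max_tokens

    def check(self) -> None:
        """Call before each API request; raises if the budget is spent."""
        if self.tripped:
            raise BudgetExceeded(f"used {self.used} of {self.max_tokens} tokens")

    def record(self, tokens: int) -> None:
        """Call after each API request with the tokens it consumed."""
        self.used += max(0, tokens)


def run_batch(token_counts, budget: TokenBudget) -> int:
    """Simulate a batch: each item stands in for one API call's total tokens."""
    processed = 0
    for tokens in token_counts:
        try:
            budget.check()
        except BudgetExceeded:
            break  # stop the batch instead of spending more
        # A real pipeline would call the model here and read its usage.
        budget.record(tokens)
        processed += 1
    return processed


processed = run_batch([400, 400, 400, 400], TokenBudget(max_tokens=1000))
```

Note that the breaker only trips after the call that crosses the line, so one oversized page can still overshoot the budget. This version also forgets everything on restart. A production version would persist the total and reset it on a schedule.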

A Note for Google

The URL Context tool is powerful but still early. Input size warnings, per-tool token breakdowns, and hard budget caps would help developers avoid this situation. These are suggestions, not complaints. The Gemini API and Google Search grounding are excellent.

The Numbers

  • Total spend: ~$2,000 in 4 days
  • URL Context calls: 2,731
  • Total output: 4.5 million words
  • Billing bug? No.
  • Does the URL Context tool need an input size cap? Yes!
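For a sense of scale, here are rough per-call averages from those figures. This assumes the whole ~$2,000 is attributed to the 2,731 URL Context calls, which is a simplification, so treat the per-call cost as an upper-bound estimate.

```python
# Figures from the post; the spend is approximate.
total_spend_usd = 2_000
url_context_calls = 2_731
total_output_words = 4_500_000

# Simplifying assumption: all spend is attributed to the URL Context calls.
cost_per_call = total_spend_usd / url_context_calls
words_per_call = total_output_words / url_context_calls

print(f"about ${cost_per_call:.2f} per call")
print(f"about {words_per_call:.0f} output words per call")
```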
