Annotated Page Content (APC)
The structured, machine-readable representation Chrome builds from a page's rendering tree when a tab is shared with Gemini. It captures every visible element (text, links, images, forms, tables) as a tree of content nodes, each tagged with geometry, styling, interaction data, and a unique node ID.
When you share a browser tab with Gemini, Google Chrome doesn't just send over the raw HTML code. Instead, it translates the active webpage into a highly structured, machine-readable format.
Chrome does this by analyzing its own internal rendering tree—the actual, visual state of the page after all styling, layouts, and scripts have run. It then builds a clean tree of content nodes, classifying every element into one of twenty-one distinct types, such as text, images, forms, or lists. Each node carries vital details, including its exact location on the screen, text styling, and whether it can be clicked or focused.
At the same time, privacy protections are built into the extraction process. Chrome redacts password field values, replaces cross-origin iframe content with origin-only metadata, and flags paywalled pages.
One of the output formats is structured Markdown in which every element is tagged with a unique node ID. These IDs are what let Gemini act on the page as well as read it: by referencing a specific ID, it can click buttons, fill out forms, and follow links on your behalf.
Rather than sending raw HTML, Chrome walks its internal rendering tree (the version of the page that reflects what's actually visible on screen after all CSS, layout, and JavaScript have been applied) and produces a tree of content nodes.
Each node in the tree is classified as one of 21 types: Root, Container, Text, Paragraph, Heading, Anchor, Image, SvgRoot, Canvas, Video, Form, FormControl, Table, TableRow, TableCell, OrderedList, UnorderedList, ListItem, Iframe, DialogModal, and DialogModeless. Every node carries geometry data (bounding boxes in viewport coordinates), text styling (size, emphasis, color), interaction metadata (whether it's clickable, focusable, disabled, and why), and a unique sequential node ID.
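As a rough illustration of the node shape described above, the following Python sketch models a content node with its type, geometry, interaction flags, and sequential ID. The class and field names (ContentNode, BoundingBox, TreeBuilder, is_clickable, and so on) are hypothetical, chosen for this example; they are not Chrome's actual identifiers, and starting IDs at 1 is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class NodeType(Enum):
    """The 21 node types listed in the text."""
    ROOT = "Root"
    CONTAINER = "Container"
    TEXT = "Text"
    PARAGRAPH = "Paragraph"
    HEADING = "Heading"
    ANCHOR = "Anchor"
    IMAGE = "Image"
    SVG_ROOT = "SvgRoot"
    CANVAS = "Canvas"
    VIDEO = "Video"
    FORM = "Form"
    FORM_CONTROL = "FormControl"
    TABLE = "Table"
    TABLE_ROW = "TableRow"
    TABLE_CELL = "TableCell"
    ORDERED_LIST = "OrderedList"
    UNORDERED_LIST = "UnorderedList"
    LIST_ITEM = "ListItem"
    IFRAME = "Iframe"
    DIALOG_MODAL = "DialogModal"
    DIALOG_MODELESS = "DialogModeless"


@dataclass
class BoundingBox:
    """Geometry in viewport coordinates (hypothetical field names)."""
    x: float
    y: float
    width: float
    height: float


@dataclass
class ContentNode:
    node_id: int
    type: NodeType
    bbox: BoundingBox
    text: str = ""
    is_clickable: bool = False
    is_focusable: bool = False
    is_disabled: bool = False
    children: list["ContentNode"] = field(default_factory=list)


class TreeBuilder:
    """Hands out sequential node IDs; starting at 1 is an assumption."""

    def __init__(self) -> None:
        self._ids = count(1)

    def node(self, type: NodeType, bbox: BoundingBox, **attrs) -> ContentNode:
        return ContentNode(node_id=next(self._ids), type=type, bbox=bbox, **attrs)


# Example: a root holding one clickable link.
builder = TreeBuilder()
root = builder.node(NodeType.ROOT, BoundingBox(0, 0, 1280, 720))
link = builder.node(NodeType.ANCHOR, BoundingBox(16, 40, 120, 20),
                    text="Pricing", is_clickable=True, is_focusable=True)
root.children.append(link)
```

Text styling and the reason a node is disabled are left out to keep the sketch short; they would be additional fields on the same node.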
Privacy protections are applied during extraction. Password field values are redacted, cross-origin iframe content is replaced with origin-only metadata, and paywalled content is flagged using schema.org markup detection.
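The three protections can be sketched as a filtering pass over the node tree. This is a minimal illustration under stated assumptions, not Chrome's implementation: nodes are modelled as plain dicts with hypothetical keys (input_type, value, origin), and the paywall check looks at the schema.org isAccessibleForFree property in JSON-LD, which is the property commonly used to mark paywalled content. Whether Chrome checks exactly this property is an assumption.

```python
import json


def redact(node: dict, page_origin: str) -> dict:
    """Return a privacy-filtered copy of a node dict (hypothetical shape)."""
    # Cross-origin iframes keep only origin-level metadata; their subtree is dropped.
    if node.get("type") == "Iframe" and node.get("origin") != page_origin:
        return {k: node[k] for k in ("type", "node_id", "origin") if k in node}

    out = {k: v for k, v in node.items() if k != "children"}

    # Password field values never leave the extraction step.
    if node.get("type") == "FormControl" and node.get("input_type") == "password":
        out["value"] = ""
        out["redacted"] = True

    out["children"] = [redact(child, page_origin) for child in node.get("children", [])]
    return out


def is_paywalled(json_ld_blocks: list[str]) -> bool:
    """Flag a page whose JSON-LD declares isAccessibleForFree as false."""
    for raw in json_ld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore malformed markup
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("isAccessibleForFree") in (False, "false", "False"):
                return True
    return False
```

The function returns a copy rather than mutating its input, so the original tree stays available to the browser while only the filtered version is shared.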
The resulting tree can be output as JSON, plain text, cleaned HTML, passage-sized chunks for LLM context windows, or structured Markdown where every element is tagged with a node ID reference like {#14}. This last format is what enables Gemini to not only read the page but interact with it — clicking buttons, filling forms, and following links by referencing specific node IDs.
