1. Introduction
What is APC?
Annotated Page Content (APC) is a structured and actionable representation of a webpage’s content and layout. Its primary function is to enable a deep understanding of page structure, content, and interactive elements by downstream clients, who can receive the information as a protobuf tree.
Core Principles
APC is designed with the following principles in mind:
- Completeness: Capture all relevant page information, including text, images, forms, and tables. This encompasses content that is both visible in the viewport and findable through user actions like scrolling or searching.
- Actionability: Allow systems to not only parse content but also identify and support interactions with elements like buttons, links, and form fields.
- Consistency: Provide a stable representation of the page, even as it changes, to support reliable multi-step interactions.
- Efficiency: Minimize the computational cost and data size required.
- Extensibility: Support a wide variety of current and future features.
- Privacy & Security: Prevent the leakage of sensitive user information and protect against security threats like cross-origin attacks.
- Safety: Support pre-action verification before performing requested tasks on a page.
2. The APC Data Structure
The foundation of APC is the AnnotatedPageContent protobuf message, which organizes page content into a hierarchical tree.
A Tree of ContentNodes
The representation is a tree of ContentNodes. These nodes can represent layout containers on the page, grouping related information in a structure derived from the layout tree. This includes:
- Content sectioning elements (<article>, <nav>, <section>)
- Lists, tables, and forms
- Text, images, paragraphs, headings, and links
- Interactive elements
- Iframes, with origin information
Key Information in Each Node (ContentAttributes)
Each ContentNode contains attributes that describe the element in detail:
- General Metadata: Includes a unique content node ID, DOM Node ID(s), the role of the content (e.g., header, main), and the node’s type.
- Geometry: Bounding box coordinates for each node are provided, allowing its content to be mapped to visual representations of the page like screenshots.
- Text (TextInfo): The text content, along with styling information like size, emphasis, and color.
- Images (ImageInfo): The image’s alt text or caption, its URL, and security origin.
- Links (AnchorData): The destination URL and the link’s rel attribute.
- Forms (FormInfo, FormControlData): Includes the form’s name/ID and data for individual controls like field name, value, and type. Password field values are omitted unless the user has made them visible on the page.
- Interaction (InteractionInfo): Describes the node’s interactivity (e.g., clickable, editable, focusable).
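To make the shape of the tree concrete, here is a minimal Python sketch of a node and its attributes. It is an illustration only: the real structure is a protobuf message, and every field name below (node_id, node_type, anchor_url, is_clickable, and so on) is a hypothetical stand-in, not the actual proto schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Geometry:
    # Bounding box of the node; the coordinate space is an assumption.
    x: int
    y: int
    width: int
    height: int


@dataclass
class ContentAttributes:
    node_id: int                         # unique content node ID
    node_type: str                       # e.g. "heading", "anchor", "form"
    role: Optional[str] = None           # e.g. "header", "main"
    geometry: Optional[Geometry] = None
    text: Optional[str] = None           # simplified stand-in for TextInfo
    anchor_url: Optional[str] = None     # simplified stand-in for AnchorData
    is_clickable: bool = False           # simplified stand-in for InteractionInfo


@dataclass
class ContentNode:
    attributes: ContentAttributes
    children: List["ContentNode"] = field(default_factory=list)


def iter_nodes(node):
    """Yield every node in the tree in depth-first order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


root = ContentNode(
    ContentAttributes(node_id=1, node_type="container", role="main"),
    children=[
        ContentNode(ContentAttributes(node_id=2, node_type="heading", text="Welcome")),
        ContentNode(
            ContentAttributes(
                node_id=3,
                node_type="anchor",
                text="Docs",
                anchor_url="https://example.com/docs",
                is_clickable=True,
            )
        ),
    ],
)

# Collect the IDs of all nodes a client could click.
clickable_ids = [n.attributes.node_id for n in iter_nodes(root) if n.attributes.is_clickable]
```

A client that only cares about interactive elements can walk the tree once and keep the IDs it needs, as clickable_ids does here.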
What’s Not Currently Included
The following elements are under consideration for future inclusion but are not currently part of the APC structure:
- Multimedia (<audio>, <video>)
- Canvas (<canvas>) and SVG (<svg>)
) - Scripts (explicitly excluded as they are not user-visible content)
- Structured PDF content (currently, only raw bytes are sent)
3. How APC is Generated
APC is generated by traversing Blink’s layout tree, not the DOM tree. This is a critical distinction because the layout tree only includes content that is actually rendered on the page.
The generation algorithm recursively traverses the layout tree, creating a ContentNode for each rendered object with structured content or a significant semantic role. It extracts relevant data and organizes the nodes into a hierarchy that preserves the visual order of the page.
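The recursive traversal can be sketched roughly as follows. This is a simplified model, not Blink's implementation: LayoutObject, the SEMANTIC_TAGS set, and the build function are all hypothetical, and the rule "keep a node only if its tag is in a fixed set" is an assumption standing in for the real semantic-role checks. Because the input is a layout tree, unrendered content is assumed to be absent from it already.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LayoutObject:
    # Hypothetical stand-in for a rendered layout object.
    tag: str
    text: str = ""
    children: List["LayoutObject"] = field(default_factory=list)


@dataclass
class ContentNode:
    node_type: str
    text: str = ""
    children: List["ContentNode"] = field(default_factory=list)


# Assumed set of tags considered to carry structure or a semantic role.
SEMANTIC_TAGS = {"article", "nav", "section", "p", "h1", "a", "img", "form", "table", "text"}


def build(obj):
    """Return the ContentNodes that `obj` contributes to its parent, in order."""
    child_nodes = []
    for child in obj.children:
        child_nodes.extend(build(child))
    if obj.tag in SEMANTIC_TAGS:
        return [ContentNode(obj.tag, obj.text, child_nodes)]
    # Wrapper without a semantic role: skip it and hoist its children.
    return child_nodes


layout = LayoutObject("div", children=[
    LayoutObject("section", children=[
        LayoutObject("div", children=[
            LayoutObject("p", children=[LayoutObject("text", text="Hello")]),
        ]),
    ]),
])

nodes = build(layout)
```

In this sketch the two div wrappers disappear, and the resulting hierarchy is section, then p, then the text node, with child order preserved.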
4. Using APC: Formats and Interactions
Available Formats (“Views”)
On the browser side, the raw APC proto can be converted into various consumable formats, including:
- Structured Markdown: A Markdown representation of the page that preserves structure and visual order. Elements in the Markdown can be labeled with unique IDs ({#ID}) that link back to the original ContentNode.
- Passage Chunks: The visible content of the page broken down into consistently sized passages, useful for citing specific sections of the page.
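As a rough illustration of the two views, the sketch below renders a tiny tree to Markdown with {#ID} labels and splits text into fixed-size passages. The dictionary layout, the to_markdown and chunk_passages helpers, and the word-count chunking rule are all assumptions made for the example; the actual converters may work quite differently.

```python
def to_markdown(node, lines=None):
    """Render a node tree as Markdown lines, labeling each element with {#ID}."""
    if lines is None:
        lines = []
    kind, node_id, text = node["type"], node["id"], node.get("text", "")
    if kind == "heading":
        lines.append(f"# {text} {{#{node_id}}}")
    elif kind == "paragraph":
        lines.append(f"{text} {{#{node_id}}}")
    elif kind == "link":
        lines.append(f"[{text}]({node['url']}) {{#{node_id}}}")
    # Containers emit nothing themselves; children follow in visual order.
    for child in node.get("children", []):
        to_markdown(child, lines)
    return lines


def chunk_passages(text, max_words=5):
    """Split text into passages of at most `max_words` words each."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


page = {
    "type": "container", "id": 1, "children": [
        {"type": "heading", "id": 2, "text": "Welcome"},
        {"type": "paragraph", "id": 3, "text": "Read the docs for details."},
        {"type": "link", "id": 4, "text": "Docs", "url": "https://example.com/docs"},
    ],
}

markdown_lines = to_markdown(page)
passages = chunk_passages("one two three four five six seven", max_words=3)
```

Keeping the ID next to each rendered element lets a consumer map a line of Markdown back to the node it came from.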
Enabling Page Interactions
A key goal of APC is to enable reliable interactions with webpages, even when they change dynamically.
To handle dynamic page changes, an algorithm robustly identifies the target element by matching key properties like its type, interactivity, and location. If needed, it can further verify the element by comparing its text content to ensure the correct action is taken.
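One way to picture this matching step is the sketch below. It is a guess at the general idea, not the actual algorithm: the Element type, the find_target function, the Euclidean distance measure, and the max_distance threshold are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical snapshot of the properties used for matching.
    node_type: str
    clickable: bool
    x: float
    y: float
    text: str = ""


def find_target(target, candidates, max_distance=50.0, verify_text=True):
    """Find the element in a fresh snapshot that best matches `target`.

    Returns None when no candidate is a confident match, so the caller
    can decline to act instead of acting on the wrong element.
    """
    # Step 1: keep candidates with the same type and interactivity.
    matches = [
        c for c in candidates
        if c.node_type == target.node_type and c.clickable == target.clickable
    ]
    if not matches:
        return None

    # Step 2: pick the candidate closest to the original location.
    def distance(c):
        return math.hypot(c.x - target.x, c.y - target.y)

    best = min(matches, key=distance)
    if distance(best) > max_distance:
        return None

    # Step 3 (optional): verify the text content before acting.
    if verify_text and best.text.strip() != target.text.strip():
        return None
    return best


target = Element("button", True, 100, 200, "Submit")

# The page reflowed: the button moved down by 30 pixels.
after_reflow = [
    Element("button", True, 100, 230, "Submit"),
    Element("button", True, 300, 230, "Cancel"),
    Element("text", False, 100, 228, "Submit"),
]
found = find_target(target, after_reflow)

# A different button now occupies the old location.
replaced = [Element("button", True, 100, 205, "Cancel")]
rejected = find_target(target, replaced)
```

Returning None when the text check fails is a deliberate choice in this sketch: it favors skipping an action over performing it on the wrong element.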
5. Critical Considerations for Implementation
Using APC requires careful attention to privacy and security. While APC provides data to help mitigate risks, feature owners bear ultimate responsibility.
- Data Exfiltration and Origin Tracking: Webpages often contain content from multiple origins (e.g., in iframes). APC tags all data with its source origin, allowing consumers to detect and handle cross-origin information appropriately.
- Handling Password Fields: Values from password fields are removed from the APC representation unless the user has explicitly made them visible on the page.
- Paywalled Content: APC’s design helps exclude most paywalled content. Websites can also use specific markup ([isAccessibleForFree=false](https://developers.google.com/search/docs/appearance/structured-data/paywalled-content)) to flag paid content, and APC includes this signal.
- Data from Protected Environments: Systems using APC should be aware that content may originate from sources with special data handling requirements. Consumers of APC data are responsible for enforcing all applicable data protection and access control rules.
- Guidelines for Storing APC Data: Due to the potential for private information, APC data or its derivatives should not be persisted beyond the scope of a user’s immediate task without explicit user consent.
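As an example of how a consumer might act on the origin tags described above, the sketch below groups text by source origin and keeps only first-party content. The flat list of dictionaries and the helper names group_text_by_origin and first_party_text are assumptions for illustration; the appropriate cross-origin policy depends on the feature and is not prescribed here.

```python
from collections import defaultdict


def group_text_by_origin(nodes):
    """Group node text by the origin each node is tagged with."""
    grouped = defaultdict(list)
    for node in nodes:
        grouped[node["origin"]].append(node["text"])
    return dict(grouped)


def first_party_text(nodes, main_origin):
    """Return only the text whose origin matches the main frame's origin."""
    return group_text_by_origin(nodes).get(main_origin, [])


# Hypothetical flattened nodes: two from the main frame, one from an iframe.
nodes = [
    {"origin": "https://shop.example", "text": "Your cart"},
    {"origin": "https://ads.example", "text": "Sponsored offer"},
    {"origin": "https://shop.example", "text": "Checkout"},
]

grouped = group_text_by_origin(nodes)
safe_text = first_party_text(nodes, "https://shop.example")
```

A stricter consumer could drop cross-origin content entirely, while a more permissive one could keep it but label it with its origin before passing it on.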