Read Webpage

Controlled node

Overview

The Read Webpage node fetches and extracts text content from a specified URL. It supports two extraction modes—quick mode for fast retrieval and standard mode for full page rendering—and includes built-in caching to avoid redundant fetches. The node can optionally chunk the extracted text into smaller pieces for downstream processing.

By default, the node returns markdown-formatted text. Enable Raw Output to receive the raw HTML instead. Results are cached for the current day, keyed on the URL and the other input parameters.

Extraction Modes

Mode	Description	Use Case
Standard (default)	Uses full browser rendering to extract text	JavaScript-heavy sites, SPAs, or when accurate text extraction is critical
Quick Mode	Uses fast HTTP retrieval with markdown conversion	Static sites, blogs, or when speed is prioritized over rendering accuracy

Caching Behavior

The node automatically caches successful extractions for the current day (based on UTC timestamp). If the same URL and parameters are requested again, the cached result is returned immediately without re-fetching the webpage.
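A daily cache key of this kind can be pictured as a hash over the inputs plus the current UTC date. The sketch below is illustrative only: the function name `cache_key`, the use of SHA-256, and the JSON serialization are assumptions made for the example, not the node's actual implementation.

```python
import hashlib
import json
from datetime import datetime, timezone


def cache_key(url, quick_mode=False, raw_output=False, chunk=None, now=None):
    """Build a hypothetical cache key from the node inputs and the UTC day."""
    now = now or datetime.now(timezone.utc)
    # Only the UTC calendar date enters the key, so it changes at midnight UTC.
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    payload = json.dumps(
        {
            "url": url,
            "quick_mode": quick_mode,
            "raw_output": raw_output,
            "chunk": chunk,
            "day": day,
        },
        sort_keys=True,  # stable ordering so equal inputs hash identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under these assumptions, two requests with the same URL and parameters on the same UTC day produce the same key, while changing any input or crossing midnight UTC produces a new one.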

Inputs

Input	Type	Description	Default
Run	Event	Triggers the webpage fetch operation	-
URL	Text	The webpage URL to fetch and extract text from	-
Quick Mode	Boolean	When enabled, uses fast HTTP extraction instead of full browser rendering	`false`
Raw Output	Boolean	When enabled, returns raw HTML instead of markdown text	`false`
Chunk	Options	Configuration for splitting the extracted text into chunks (see Chunk Options below)	-

Chunk Options

When the Chunk input is configured, the node splits the extracted text according to the specified strategy:

Strategy	Description
Structure	Returns an array of elements (sentences, paragraphs, or words)
Count	Groups elements into chunks of a specified count
Divide	Divides the text into a specified number of equal parts
Separator	Splits text by a custom separator string
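To make the four strategies concrete, here is a minimal Python sketch of what each one might do. The function names, the naive sentence-splitting rule, and the way Divide balances part lengths are all assumptions for illustration; the node's real splitting rules may differ.

```python
import re


def split_elements(text, unit="sentences"):
    """Structure: split text into sentences, paragraphs, or words."""
    if unit == "paragraphs":
        parts = re.split(r"\n\s*\n", text)
    elif unit == "words":
        parts = text.split()
    else:
        # Naive sentence split: break after ., !, or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_by_count(elements, count, joiner=" "):
    """Count: group consecutive elements into chunks of `count` items."""
    return [
        joiner.join(elements[i:i + count])
        for i in range(0, len(elements), count)
    ]


def chunk_divide(text, parts):
    """Divide: cut the text into `parts` pieces of near-equal length."""
    size, extra = divmod(len(text), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` pieces absorb one leftover character each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(text[start:end])
        start = end
    return chunks


def chunk_by_separator(text, separator):
    """Separator: split on a custom separator string."""
    return text.split(separator)
```

In this sketch, Count builds on the elements produced by Structure, so `chunk_by_count(split_elements(text), 10)` would yield groups of ten sentences.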

Outputs

Output	Type	Description
Done	Event	Fires when the webpage extraction is complete
Text	Data	The extracted text content: a single string, or an array of strings if chunked. Returns `'undefined'` on failure
Success	Data	Boolean indicating whether the extraction succeeded (`true`) or failed (`false`)

Runtime Behavior and Defaults

Controlled Execution: This node requires a Run event to execute. It will not run automatically when inputs change.
Daily Caching: Successful extractions are cached using a hash of the inputs and current day timestamp. Cache is checked before making any external requests.
Error Handling: If the URL is invalid, the request fails, or the page cannot be parsed, the node sets Success to `false` and outputs `'undefined'` as the text.
Chunking: When chunking is enabled, the output is an array of text chunks rather than a single string. If chunking fails or the options are invalid, the full unchunked text is returned instead.
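The error-handling and chunking fallback rules can be modeled in a few lines of Python. This is a sketch under stated assumptions: `read_webpage`, `fetch`, and `chunker` are hypothetical names, and treating a chunking failure as a successful extraction is a guess, since the behavior of Success in that case is not specified here.

```python
def read_webpage(url, fetch, chunker=None):
    """Hypothetical model of the node's outputs: returns (text, success)."""
    try:
        text = fetch(url)
    except Exception:
        # Invalid URL, failed request, or unparseable page.
        return "undefined", False
    if chunker is not None:
        try:
            return chunker(text), True
        except Exception:
            # Chunking failed: fall back to the unchunked text.
            # Assumption: the extraction itself still counts as a success.
            return text, True
    return text, True
```

Passing the fetcher and chunker in as arguments keeps the sketch free of network access, so the branching logic can be checked in isolation.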

Example

Basic Webpage Extraction

Connect a Start node to Read Webpage, then to a Show node to display the extracted text:

The Start node fires the Run event
Read Webpage receives the event and fetches the URL (e.g., https://example.com)
Show displays the extracted markdown text when Done fires

Chunked Processing for Large Pages

To process a large webpage in manageable pieces:

Set the Chunk option to Count with a value of 10 (groups of 10 sentences/paragraphs)
Connect the Text output to a For Each node to iterate through chunks
Process each chunk individually (e.g., send to AI Write for analysis)
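Because the Text output may be a single string or an array of strings, a downstream step usually needs to handle both shapes and skip failed extractions. The sketch below shows one way to do that in Python; `iter_chunks` and `summarize` are hypothetical helpers, with `summarize` standing in for a real processing step such as AI Write.

```python
def iter_chunks(text, success):
    """Yield chunks from the Text output, whether it was chunked or not."""
    if not success:
        # On failure Text is 'undefined', so there is nothing to process.
        return
    if isinstance(text, str):
        yield text
    else:
        yield from text


def summarize(chunk):
    # Hypothetical stand-in for a downstream step such as AI Write.
    return f"{len(chunk.split())} words"


results = [summarize(c) for c in iter_chunks(["a b", "c d e"], True)]
```

Checking Success before iterating avoids sending the literal `'undefined'` text to later steps.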

Quick Mode Limitations

Quick mode may not execute JavaScript or render dynamic content. For modern web applications or pages requiring login, use standard mode (Quick Mode unchecked).
