Read Webpage
Controlled node
Overview
The Read Webpage node fetches a specified URL and extracts its text content. It supports two extraction modes, quick mode for fast retrieval and standard mode for full page rendering, and includes built-in caching to avoid redundant fetches. The node can optionally split the extracted text into smaller chunks for downstream processing.
By default, the node returns markdown-formatted text. Enable Raw Output to receive the raw HTML instead. Results are cached based on the URL and input parameters for the current day.
Extraction Modes
| Mode | Description | Use Case |
|---|---|---|
| Standard (default) | Uses full browser rendering to extract text | JavaScript-heavy sites, SPAs, or when accurate text extraction is critical |
| Quick Mode | Uses fast HTTP retrieval with markdown conversion | Static sites, blogs, or when speed is prioritized over rendering accuracy |
The node automatically caches successful extractions for the current day (based on UTC timestamp). If the same URL and parameters are requested again, the cached result is returned immediately without re-fetching the webpage.
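One way to picture the daily cache is as a key built from the inputs plus the UTC calendar day. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical `CacheInputs` shape, FNV-1a as a stand-in hash, and a hypothetical `getOrFetch` helper with an injected, synchronous `fetchText` function. The node's real hash and key format are not documented.

```typescript
// Sketch only: the node's real hash and key layout are not documented.
// Field names below are hypothetical.
interface CacheInputs {
  url: string;
  quickMode: boolean;
  rawOutput: boolean;
  chunk?: unknown; // chunk options, if configured
}

// FNV-1a (32-bit): a simple stand-in for whatever hash the node actually uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// UTC calendar day such as "2024-05-01"; the key rolls over at UTC midnight.
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

// Same inputs on the same UTC day produce the same key.
// JSON.stringify depends on property order, so inputs must be built consistently.
function cacheKey(inputs: CacheInputs, now: Date = new Date()): string {
  return `${fnv1a(JSON.stringify(inputs))}:${utcDay(now)}`;
}

const cache = new Map<string, string>();

// Checks the cache before any external request. Only successful fetches are stored,
// because a throwing fetchText exits before cache.set runs.
function getOrFetch(
  inputs: CacheInputs,
  fetchText: (url: string) => string,
  now: Date = new Date(),
): string {
  const key = cacheKey(inputs, now);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const text = fetchText(inputs.url);
  cache.set(key, text);
  return text;
}
```

Because the day comes from `toISOString()`, which always reports UTC, a request at 23:59 UTC and another at 00:01 UTC the next day produce different keys, so the second request fetches the page again.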
Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the webpage fetch operation | - |
| URL | Text | The webpage URL to fetch and extract text from | - |
| Quick Mode | Boolean | When enabled, uses fast HTTP extraction instead of full browser rendering | false |
| Raw Output | Boolean | When enabled, returns raw HTML instead of markdown text | false |
| Chunk | Options | Configuration for splitting the extracted text into chunks (see Chunk Options below) | - |
Chunk Options
When the Chunk input is configured, the node splits the extracted text according to the specified strategy:
| Strategy | Description |
|---|---|
| Structure | Returns an array of elements (sentences, paragraphs, or words) |
| Count | Groups elements (sentences, paragraphs, or words) into chunks that each contain a specified number of elements |
| Divide | Divides the text into a specified number of equal parts |
| Separator | Splits text by a custom separator string |
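As a rough illustration of the four strategies, the following sketch models the Chunk input as a discriminated union. The option names (`strategy`, `unit`, `count`, `parts`, `separator`), the naive sentence and paragraph detection, and the character-based Divide are assumptions, not the node's documented behavior. Invalid options throw, so a caller can fall back to the unchunked text.

```typescript
// Sketch only: the node's real splitting rules are not documented.
// Option names and the naive sentence/paragraph detection below are assumptions.
type Unit = "sentence" | "paragraph" | "word";

type ChunkOptions =
  | { strategy: "structure"; unit: Unit }
  | { strategy: "count"; unit: Unit; count: number }
  | { strategy: "divide"; parts: number }
  | { strategy: "separator"; separator: string };

// Paragraphs split on blank lines, sentences end at . ! or ?, words split on whitespace.
function splitUnits(text: string, unit: Unit): string[] {
  let pieces: string[];
  if (unit === "paragraph") pieces = text.split(/\n\s*\n/);
  else if (unit === "sentence") pieces = text.match(/[^.!?]+[.!?]*/g) ?? [];
  else pieces = text.split(/\s+/);
  return pieces.map((p) => p.trim()).filter((p) => p.length > 0);
}

// Throws on invalid options so the caller can fall back to the unchunked text.
function chunkText(text: string, options: ChunkOptions): string[] {
  switch (options.strategy) {
    case "structure":
      return splitUnits(text, options.unit);
    case "count": {
      if (!Number.isInteger(options.count) || options.count < 1) {
        throw new Error("count must be a positive integer");
      }
      const units = splitUnits(text, options.unit);
      const joiner = options.unit === "paragraph" ? "\n\n" : " ";
      const chunks: string[] = [];
      for (let i = 0; i < units.length; i += options.count) {
        chunks.push(units.slice(i, i + options.count).join(joiner));
      }
      return chunks;
    }
    case "divide": {
      if (!Number.isInteger(options.parts) || options.parts < 1) {
        throw new Error("parts must be a positive integer");
      }
      // Equal parts by character count; floored boundaries yield exactly `parts` pieces.
      const { parts } = options;
      const chunks: string[] = [];
      for (let k = 0; k < parts; k++) {
        const start = Math.floor((k * text.length) / parts);
        const end = Math.floor(((k + 1) * text.length) / parts);
        chunks.push(text.slice(start, end));
      }
      return chunks;
    }
    case "separator":
      if (options.separator === "") throw new Error("separator must not be empty");
      return text.split(options.separator);
    default:
      throw new Error("unknown chunk strategy");
  }
}
```

For example, `chunkText("One. Two! Three? Four.", { strategy: "count", unit: "sentence", count: 2 })` returns `["One. Two!", "Three? Four."]`.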
Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the webpage extraction is complete |
| Text | Data | The extracted text content (string or array of strings if chunked). Returns 'undefined' on failure |
| Success | Data | Boolean indicating whether the extraction succeeded (true) or failed (false) |
Runtime Behavior and Defaults
- Controlled Execution: This node requires a Run event to execute. It does not run automatically when inputs change.
- Daily Caching: Successful extractions are cached using a hash of the inputs and the current day's timestamp. The cache is checked before any external request is made.
- Error Handling: If the URL is invalid, the request fails, or the page cannot be parsed, the node sets Success to false and outputs 'undefined' as the text.
- Chunking: When chunking is enabled, the output is an array of text chunks rather than a single string. If chunking fails or the options are invalid, the unchunked text is returned instead.
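The runtime rules above can be read as a single Run handler. This is a minimal sketch under stated assumptions. `fetchPage`, `chunk`, and the cache map are hypothetical dependencies passed in by the caller. Failure is represented by the JavaScript value `undefined`. The cached value is assumed to be the final, possibly chunked, output. In the node itself, Done would presumably fire after the handler returns.

```typescript
// Sketch only: fetchPage, chunk, and the cache are hypothetical stand-ins supplied by the caller.
// The node's internals (including whether chunked or unchunked text is cached) are not documented.
type TextOutput = string | string[];

interface RunInputs {
  url: string;
  quickMode: boolean;
  rawOutput: boolean;
}

interface RunOutputs {
  text: TextOutput | undefined; // undefined stands in for the documented 'undefined' on failure
  success: boolean;
}

interface RunDeps {
  fetchPage: (inputs: RunInputs) => string; // throws on invalid URL, failed request, or parse error
  chunk?: (text: string) => string[];       // present only when Chunk is configured; throws on bad options
  cache: Map<string, TextOutput>;
  cacheKey: string;                         // hash of inputs + UTC day, computed by the caller
}

// Invoked only when a Run event arrives; changing inputs alone never calls it.
function onRun(inputs: RunInputs, deps: RunDeps): RunOutputs {
  const cached = deps.cache.get(deps.cacheKey);
  if (cached !== undefined) return { text: cached, success: true }; // no external request

  let raw: string;
  try {
    raw = deps.fetchPage(inputs);
  } catch {
    return { text: undefined, success: false }; // failures are reported, not cached
  }

  let text: TextOutput = raw;
  if (deps.chunk) {
    try {
      text = deps.chunk(raw);
    } catch {
      text = raw; // chunking failed: fall back to the unchunked text
    }
  }
  deps.cache.set(deps.cacheKey, text);
  return { text, success: true };
}
```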
Example
Basic Webpage Extraction
Connect a Start node to Read Webpage, then to a Show node to display the extracted text:
- Start node fires the Run event
- Read Webpage receives the event and fetches the URL (e.g., https://example.com)
- Show displays the extracted markdown text when Done fires
Chunked Processing for Large Pages
To process a large webpage in manageable pieces:
- Set the Chunk option to Count with a value of 10 (groups of 10 sentences/paragraphs)
- Connect the Text output to a For Each node to iterate through chunks
- Process each chunk individually (e.g., send to AI Write for analysis)
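Text can be a single string (including when chunking falls back to the unchunked text) or an array of strings, so a downstream loop is simpler if it normalizes the value first. The sketch below shows that step. `processChunk` is a hypothetical placeholder for per-chunk work such as an AI Write call. Checking Success before reading Text is an assumed convention rather than a documented requirement.

```typescript
// Sketch only: processChunk is a hypothetical stand-in for the per-chunk step (e.g. AI Write).
// Text is a string, an array of strings when chunked, or a failure value.
type TextOutput = string | string[] | undefined;

// Success is checked first, so the failure value of Text never reaches the loop.
// A plain string (unchunked output or chunking fallback) becomes a one-element array.
function toChunks(text: TextOutput, success: boolean): string[] {
  if (!success || text === undefined) return [];
  return Array.isArray(text) ? text : [text];
}

// Placeholder work: report the size of each chunk.
function processChunk(chunk: string, index: number): string {
  return `chunk ${index + 1}: ${chunk.length} characters`;
}

function forEachChunk(text: TextOutput, success: boolean): string[] {
  return toChunks(text, success).map((chunk, i) => processChunk(chunk, i));
}
```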
Quick mode may not execute JavaScript or render dynamic content. For modern web applications or pages requiring login, use standard mode (Quick Mode unchecked).