Read Webpage

Controlled node

Overview

The Read Webpage node fetches and extracts text content from a specified URL. It supports two extraction modes—quick mode for fast retrieval and standard mode for full page rendering—and includes built-in caching to avoid redundant fetches. The node can optionally chunk the extracted text into smaller pieces for downstream processing.

By default, the node returns markdown-formatted text. Enable Raw Output to receive the raw HTML instead. Results are cached based on the URL and input parameters for the current day.

Extraction Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| Standard (default) | Uses full browser rendering to extract text | JavaScript-heavy sites, SPAs, or when accurate text extraction is critical |
| Quick Mode | Uses fast HTTP retrieval with markdown conversion | Static sites, blogs, or when speed is prioritized over rendering accuracy |

Caching Behavior

The node automatically caches successful extractions for the current day (based on UTC timestamp). If the same URL and parameters are requested again, the cached result is returned immediately without re-fetching the webpage.
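The day-scoped cache lookup described above can be pictured with a small sketch. The node's actual key format is not documented here, so the function name `daily_cache_key`, the field names, and the choice of SHA-256 over a JSON payload are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def daily_cache_key(url, quick_mode=False, raw_output=False, chunk=None, now=None):
    """Build a cache key from the inputs plus the current UTC date.

    Hypothetical scheme: the real node may hash different fields.
    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    # Only the UTC calendar day is kept, so every request made on the
    # same UTC day with the same inputs maps to the same key.
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    payload = json.dumps(
        {
            "url": url,
            "quick_mode": quick_mode,
            "raw_output": raw_output,
            "chunk": chunk,
            "day": day,
        },
        sort_keys=True,  # stable ordering so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this scheme, changing any input (for example toggling Quick Mode) or crossing midnight UTC produces a different key, which matches the behavior described: a repeat request is served from the cache only when both the inputs and the UTC day are unchanged.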

Inputs

| Input | Type | Description | Default |
| --- | --- | --- | --- |
| Run | Event | Triggers the webpage fetch operation | - |
| URL | Text | The webpage URL to fetch and extract text from | - |
| Quick Mode | Boolean | When enabled, uses fast HTTP extraction instead of full browser rendering | false |
| Raw Output | Boolean | When enabled, returns raw HTML instead of markdown text | false |
| Chunk | Options | Configuration for splitting the extracted text into chunks (see Chunk Options below) | - |

Chunk Options

When the Chunk input is configured, the node splits the extracted text according to the specified strategy:

| Strategy | Description |
| --- | --- |
| Structure | Returns an array of elements (sentences, paragraphs, or words) |
| Count | Groups elements into chunks of a specified count |
| Divide | Divides the text into a specified number of equal parts |
| Separator | Splits text by a custom separator string |
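To make the four strategies concrete, here is a rough Python sketch of what each one could do to a string. It is not the node's implementation: the function names, the naive sentence-splitting rule, joining grouped elements with a space, and dividing by character length are all assumptions.

```python
import re


def split_elements(text, unit="sentences"):
    """Structure: return the text as a list of sentences, paragraphs, or words."""
    if unit == "paragraphs":
        parts = re.split(r"\n\s*\n", text)  # blank line between paragraphs
    elif unit == "words":
        parts = text.split()
    else:
        # Naive sentence rule (assumption): break after . ! or ? plus whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_by_count(elements, count):
    """Count: group consecutive elements into chunks of `count` elements."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [" ".join(elements[i:i + count]) for i in range(0, len(elements), count)]


def chunk_by_divide(text, parts):
    """Divide: cut the text into `parts` pieces of near-equal character length."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(text), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` pieces take one additional character each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(text[start:end])
        start = end
    return chunks


def chunk_by_separator(text, separator):
    """Separator: split on a custom separator string, dropping empty pieces."""
    if not separator:
        raise ValueError("separator must be a non-empty string")
    return [p for p in text.split(separator) if p]
```

For instance, `chunk_by_count(split_elements(text), 10)` would mirror a Count setting of 10 applied to sentences, while `chunk_by_divide(text, 4)` would cut the same text into four parts regardless of sentence boundaries.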

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Done | Event | Fires when the webpage extraction is complete |
| Text | Data | The extracted text content (string or array of strings if chunked). Returns 'undefined' on failure |
| Success | Data | Boolean indicating whether the extraction succeeded (true) or failed (false) |

Runtime Behavior and Defaults

  • Controlled Execution: This node requires a Run event to execute. It will not run automatically when inputs change.
  • Daily Caching: Successful extractions are cached using a hash of the inputs and the current day's timestamp. The cache is checked before any external request is made.
  • Error Handling: If the URL is invalid, the request fails, or the page cannot be parsed, the node sets Success to false and outputs 'undefined' as the text.
  • Chunking: When chunking is enabled, the output is an array of text chunks rather than a single string. If chunking fails or the options are invalid, the unchunked text is returned as a single string instead.
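The order of operations in the points above (cache check, fetch, chunk with fallback) can be sketched as a single function. This is an illustrative model only: `run_read_webpage`, the injected `fetch` and `chunker` callables, and the in-memory `cache` dict are hypothetical, and `None` stands in for the node's 'undefined'. As a simplification, this sketch caches the unchunked text and re-applies chunking on each run, whereas the node is described as keying its cache on all input parameters.

```python
from datetime import datetime, timezone


def run_read_webpage(url, fetch, chunker=None, quick_mode=False, cache=None, day=None):
    """Return (text, success), mimicking the node's documented behavior.

    `fetch(url)` returns page text or raises; `chunker(text)` returns a list
    of chunks or raises. Both are stand-ins supplied by the caller.
    """
    cache = cache if cache is not None else {}
    day = day or datetime.now(timezone.utc).strftime("%Y-%m-%d")
    key = (url, quick_mode, day)

    if key in cache:
        # Cache hit: no external request is made.
        text = cache[key]
    else:
        try:
            text = fetch(url)
        except Exception:
            # Any fetch or parse failure yields no text and Success = false.
            return None, False
        cache[key] = text  # only successful extractions are stored

    if chunker is not None:
        try:
            return chunker(text), True
        except Exception:
            # Chunking failed: fall back to the unchunked text.
            return text, True
    return text, True
```

A caller that wants to observe the fallback can pass a `chunker` that raises; the function then returns the full text with success still true, which is why downstream logic should not assume the result is always a list.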

Example

Basic Webpage Extraction

Connect a Start node to Read Webpage, then to a Show node to display the extracted text:

  1. Start node fires the run event
  2. Read Webpage receives the event and fetches the URL (e.g., https://example.com)
  3. Show displays the extracted markdown text when the Done event fires

Chunked Processing for Large Pages

To process a large webpage in manageable pieces:

  1. Set the Chunk option to Count with a value of 10 (each chunk then groups 10 elements, such as sentences or paragraphs)
  2. Connect the Text output to a For Each node to iterate through chunks
  3. Process each chunk individually (e.g., send to AI Write for analysis)
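
Because the Text output may be an array of chunks, a single string, or empty on failure, the iteration step in this workflow benefits from normalizing the value first. The sketch below models that in Python; `as_chunks`, `process_chunks`, and the `handle` callback are hypothetical names, and `None` stands in for 'undefined'.

```python
def as_chunks(text_output):
    """Normalize the Text output to a list of chunks."""
    if text_output is None:
        return []  # extraction failed: nothing to iterate
    if isinstance(text_output, str):
        return [text_output]  # unchunked text is treated as one chunk
    return list(text_output)


def process_chunks(text_output, handle):
    """Apply `handle` to each chunk in order, like a For Each loop would."""
    return [handle(chunk) for chunk in as_chunks(text_output)]
```

Here `handle` plays the role of whatever runs inside the loop, for example a call that sends one chunk at a time for analysis.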

Quick Mode Limitations

Quick mode may not execute JavaScript or render dynamic content. For modern web applications or pages requiring login, use standard mode (Quick Mode unchecked).