Read Webpage

Controlled node

Overview

This node parses the text and content of a webpage so that you can manipulate and process the data within your workflow. A common use is to include the parsed web content in a prompt for an AI generation task. The web parser makes a best-effort attempt to extract the main content of the page and remove non-essential elements such as ads and navigation bars. The content is returned as markdown text, which can be used in other nodes or saved to a file or the library.
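To make the idea of main-content extraction concrete, here is a minimal Python sketch built only on the standard library. It is not the node's actual implementation: the `SKIP_TAGS` set, the `MainTextExtractor` class, and the `html_to_markdown` function are hypothetical names chosen for illustration, and the rule of dropping everything inside `nav`, `header`, `footer`, `aside`, `script`, and `style` is an assumed heuristic. It handles only headings and paragraphs.

```python
from html.parser import HTMLParser

# Tags whose content is treated as non-essential in this sketch (an assumption).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
BLOCK_TAGS = ("h1", "h2", "h3", "p")


class MainTextExtractor(HTMLParser):
    """Collects headings and paragraphs as markdown lines, skipping SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.prefix = None    # markdown prefix of the block being read, or None
        self.buffer = []      # text pieces of the current block
        self.lines = []       # finished markdown lines

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            # "h2" becomes "## ", a paragraph gets no prefix.
            self.prefix = "#" * int(tag[1]) + " " if tag != "p" else ""
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS and self.prefix is not None:
            # Collapse runs of whitespace inside the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html):
    """Return the headings and paragraphs of `html` as markdown text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)


SAMPLE = """<html><body>
<nav><p>Home | About</p></nav>
<h1>Title</h1>
<p>First <b>paragraph</b>.</p>
<script>var x = 1;</script>
<footer><p>Copyright</p></footer>
</body></html>"""

print(html_to_markdown(SAMPLE))
```

In this sample the navigation bar, script, and footer are dropped, and the heading and paragraph are kept. A real parser has to cope with far messier markup, which is why the node describes its extraction as best effort.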

Inputs

Input	Type	Description	Default
URL	Text	The URL of the webpage	-
Run	Event	Fires when the node starts running	-

Outputs

Output	Type	Description
Done	Event	Fires when the node finishes running
Text	Text | List	The parsed text content of the webpage as markdown. If Chunk is enabled, a list of text chunks is returned instead.

Panel Controls

The panel provides optional flags that configure how the web parser behaves.

Quick mode: This mode is faster but less accurate; it does not parse the webpage as thoroughly as the default mode. It works with simple pages, but content may be missing from complex modern websites.
Chunk: This works in a similar way to chunking in the Read Document node. If the webpage content is large, you can split it into a list of smaller pieces and process each one individually.
Raw output: This will return the raw HTML content of the webpage instead of the parsed text.
Deep clean: This will make a best effort to remove content from the output that does not contribute to the main content of the page. For example, it will attempt to remove ads, navigation bars, and other non-essential elements.
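
The Chunk option can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the node's actual chunking algorithm: the `chunk_text` function and its `max_chars` parameter are hypothetical, and splitting on blank lines with a character limit is just one plausible strategy.

```python
def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are kept together where possible;
    a single paragraph longer than max_chars is hard-split.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split a paragraph that cannot fit in one chunk.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The paragraph does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


page = "aaaa\n\nbbbb\n\ncccc"
for piece in chunk_text(page, max_chars=10):
    print(repr(piece))
```

Each element of the returned list can then be processed individually, which mirrors how the Text output becomes a list when Chunk is enabled.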
