Read Document

Controlled node

Overview

The Read Document node extracts text content from documents stored in the Intellectible library. It supports PDF, DOCX, TXT, MD, and HTML file formats. The node can optionally chunk the extracted text into smaller pieces (based on words, sentences, or paragraphs) for downstream processing in workflows.

Supported File Types

Format	Description	Notes
PDF	Portable Document Format	Extracts raw text content from PDF documents
DOCX	Word Document	Converts document structure to Markdown format
TXT	Plain Text	Reads raw text content
MD	Markdown	Reads raw text content
HTML	HyperText Markup Language	Extracts text content from HTML documents

Inputs

Input	Type	Description	Default
Run	Event	Triggers the node to start running	-
File	FileSource	The document file to read from the library. Accepts single files via the file picker or file objects from other nodes.	-
Chunk	Chunk Options	Optional configuration to split the document into smaller pieces. Configure in the properties panel or pass as data. If not provided, returns the full text as a single string.	-

Chunk Options

When chunking is enabled, the node supports the following strategies:

Count: Groups elements (words, sentences, or paragraphs) into chunks of a specified size
Divide: Divides the text into a specified number of equal parts
Separator: Splits text by a custom delimiter string
Structure: Returns the elements as an array without combining them
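
The four strategies above can be sketched in plain Python to show how their outputs differ. This is an illustrative sketch only, not the node's implementation: the function names (`chunk_by_count`, `chunk_by_divide`, `chunk_by_separator`, `chunk_by_structure`) are hypothetical, the elements are assumed to have been split already, and details such as how leftover elements or empty pieces are handled are assumptions.

```python
from typing import List


def chunk_by_count(elements: List[str], size: int, joiner: str = " ") -> List[str]:
    """Count: group consecutive elements into chunks of `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [joiner.join(elements[i:i + size]) for i in range(0, len(elements), size)]


def chunk_by_divide(elements: List[str], parts: int, joiner: str = " ") -> List[str]:
    """Divide: split the elements into `parts` chunks of near-equal length."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(elements), parts)
    chunks: List[str] = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional element (assumed policy).
        end = start + base + (1 if i < extra else 0)
        if end > start:  # skip empty parts when there are fewer elements than parts
            chunks.append(joiner.join(elements[start:end]))
        start = end
    return chunks


def chunk_by_separator(text: str, separator: str) -> List[str]:
    """Separator: split on a custom delimiter, dropping empty pieces (assumed)."""
    if not separator:
        raise ValueError("separator must be a non-empty string")
    return [piece.strip() for piece in text.split(separator) if piece.strip()]


def chunk_by_structure(elements: List[str]) -> List[str]:
    """Structure: return the elements unchanged, one per array item."""
    return list(elements)
```

For example, with the elements `["one", "two", "three", "four", "five"]`, `chunk_by_count(..., 2)` groups them two at a time, while `chunk_by_divide(..., 2)` produces exactly two chunks regardless of how many elements there are.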

Outputs

Output	Type	Description
Text	Text / Array	The extracted text content. If chunking is enabled, returns an array of text chunks; otherwise returns a single string.
Success	Boolean	Indicates whether the document was successfully parsed (`true`) or if an error occurred (`false`).
Done	Event	Fires when the node has finished processing the document.
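
Because Text can be either a single string or an array of chunks, downstream logic often needs to treat both cases uniformly. The following is a minimal sketch of that normalization, assuming a hypothetical helper named `normalize_text_output`; Python's `None` stands in for an undefined Text value, and none of these names come from Intellectible itself.

```python
from typing import List, Optional, Union


def normalize_text_output(
    text: Optional[Union[str, List[str]]], success: bool
) -> List[str]:
    """Return the node's Text output as a list of chunks.

    A failed run (or a missing Text value) yields an empty list, a single
    string becomes a one-item list, and an array is returned as a copy.
    """
    if not success or text is None:
        return []
    if isinstance(text, str):
        return [text]
    return list(text)
```

With this shape, a loop over the result works the same way whether or not chunking was enabled, and a failed run simply produces no iterations.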

Runtime Behavior and Defaults

File Validation: The node checks for a valid project ID and file object at runtime. If the file is missing or invalid, it returns undefined for text and false for success.
Extension Detection: The node attempts to determine the file type from the filename extension. If no extension is found, it falls back to the MIME type provided in the file metadata.
Default Chunking: If chunking is enabled but no specific strategy is configured, the node defaults to creating chunks of approximately 700 words.
Element Types: When chunking by sentence, paragraph, or word, the node uses natural language processing to identify boundaries accurately.
Error Handling: Unsupported file types will result in success: false and text: undefined.
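
The extension detection and fallback described above can be illustrated with a short sketch. It is not the node's actual code: the function name `detect_format` and the MIME-type mapping are assumptions chosen for illustration (the MIME types shown are the standard ones for these formats), and returning `None` represents the unsupported case in which the node reports success: false.

```python
import os
from typing import Optional

# Formats listed under Supported File Types.
SUPPORTED = {"pdf", "docx", "txt", "md", "html"}

# Assumed mapping from standard MIME types to the supported formats.
MIME_TO_FORMAT = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "text/plain": "txt",
    "text/markdown": "md",
    "text/html": "html",
}


def detect_format(filename: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Pick a format from the filename extension, else from the MIME type."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext:
        # An extension was found: it decides the outcome, supported or not.
        return ext if ext in SUPPORTED else None
    if mime_type:
        # No extension: fall back to the MIME type from the file metadata.
        return MIME_TO_FORMAT.get(mime_type.lower())
    return None
```

In this sketch, a file named `contract.PDF` resolves to `pdf`, a file named `notes` with MIME type `text/markdown` resolves to `md`, and `image.png` resolves to `None` because its extension is present but unsupported.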

Example Usage

Scenario: Extract text from a PDF contract and split it into paragraphs for clause-by-clause analysis.

Add a Read Document node to your workflow.
Connect a trigger event (like Start) to the Run input.
Select a PDF file from the library using the File input in the properties panel.
Enable Chunk in the properties panel, select paragraphs as the element type, and choose the "Structure" strategy so that each paragraph is returned as a separate array item.
Connect the Text output to a For Each node to iterate through each paragraph.
Connect the Done event to trigger downstream processing after all chunks are ready.

Tip: For large documents, use the "Count" strategy with sentences as the element type and a size of 1 to process the document sentence by sentence, or use "Divide" to split the document into a fixed number of chunks for parallel processing.
