Read Documents

Controlled node

Overview

The Read Documents node extracts text content from multiple files in your library. It supports batch processing of PDF, DOCX, TXT, and MD files, returning the extracted text as an array.

When the Chunk option is enabled, documents are split into smaller segments (chunks) based on your specified strategy (count, divide, separator, or structure). This is useful for processing large documents in manageable pieces or for creating embeddings.

The node processes all files in parallel and outputs an array where each element corresponds to one input file. If chunking is enabled with the Flatten option, all chunks from all documents are combined into a single flat array.
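The nested-versus-flat output shapes can be sketched in Python (the document names here are hypothetical; the node's actual output values depend on your files):

```python
from itertools import chain

# Hypothetical per-document chunk lists, as produced with Chunk enabled.
per_document_chunks = [
    ["doc1-chunk1", "doc1-chunk2"],
    ["doc2-chunk1", "doc2-chunk2"],
]

# Without Flatten: one inner array per input file.
nested = per_document_chunks

# With Flatten: all chunks from all documents combined into one flat array.
flattened = list(chain.from_iterable(per_document_chunks))
# flattened == ["doc1-chunk1", "doc1-chunk2", "doc2-chunk1", "doc2-chunk2"]
```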

Inputs

| Input | Type | Description | Default |
| --- | --- | --- | --- |
| Run | Event | Triggers the document reading process. | - |
| Files | FileSource | One or more documents to read from the library. Supports PDF, DOCX, TXT, and MD files. | - |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Results | Data | An array containing the extracted text. Without chunking: `["text from doc1", "text from doc2"]`. With chunking (no flatten): `[["chunk1", "chunk2"], ["chunk1", "chunk2"]]`. With chunking and flatten: `["chunk1", "chunk2", "chunk1", "chunk2"]`. |
| Done | Event | Fires when all documents have been processed. |

Properties

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| Chunk | Chunk | Optional chunking configuration. If enabled, splits documents into smaller pieces based on strategy (count, divide, separator, or structure). | Disabled |
| Flatten | Boolean | When chunking is enabled, flattens all chunks from all documents into a single array instead of nested arrays. | false |

Runtime Behavior and Defaults

  • File Processing: The node processes all input files in parallel. Supported formats are automatically detected by file extension and MIME type.
  • Text Extraction:
    • PDF: Extracts raw text content
    • DOCX: Converts to Markdown format
    • TXT/MD: Reads as plain text
  • Chunking: When enabled, the node uses the same chunking engine as the Chunk Document node. Available strategies include:
    • Count: Split by number of words/sentences/paragraphs per chunk
    • Divide: Split document into N equal parts
    • Separator: Split by custom delimiter (e.g., \n\n for double newlines)
    • Structure: Return as array of sentences/paragraphs/words
  • Error Handling: If a file cannot be read, it is skipped and an error is logged. The node continues processing remaining files.
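The four strategies behave roughly as follows. This is a simplified sketch, not the node's actual chunking engine, which may differ in details such as whitespace handling and sentence detection:

```python
import re

def chunk_by_count(text: str, words_per_chunk: int) -> list[str]:
    """Count: a fixed number of words per chunk."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def chunk_by_divide(text: str, n: int) -> list[str]:
    """Divide: split the document into N roughly equal word groups."""
    words = text.split()
    size = -(-len(words) // n)  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def chunk_by_separator(text: str, sep: str = "\n\n") -> list[str]:
    """Separator: split on a custom delimiter, dropping empty pieces."""
    return [p for p in text.split(sep) if p.strip()]

def chunk_by_structure(text: str) -> list[str]:
    """Structure: return an array of sentences (naive split on ., !, ?)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```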

Example Usage

Basic Document Reading

  1. Connect a File Source node (or multiple files) to the Files input
  2. Trigger the Run event
  3. The Results output contains: ["Full text of document 1...", "Full text of document 2..."]
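Outside the node editor, the equivalent behavior for the plain-text formats can be sketched like this (TXT/MD only; PDF and DOCX extraction would require dedicated libraries, and this parallel-read helper is an illustration, not the node's implementation):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_document(path: Path) -> str:
    """Plain-text read for TXT/MD files."""
    return path.read_text(encoding="utf-8")

def read_documents(paths: list[Path]) -> list[str]:
    """Read all files in parallel; one result per input file, in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(read_document, paths))
```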

Chunking Multiple Documents

  1. Enable the Chunk property and set strategy to "Count" with 500 words per chunk
  2. Connect multiple PDF files to Files
  3. Results outputs nested arrays: [["chunk1", "chunk2"], ["chunk1", "chunk2"]]
  4. To process all chunks in a single loop, enable Flatten in the Chunk options
  5. With flatten enabled: ["chunk1", "chunk2", "chunk1", "chunk2"]

Processing with AI

Connect the Results output to a For Each node to iterate through documents, or to an AI Write node with a prompt like: "Summarize each of these documents: {{results}}"
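As a rough illustration, the prompt the AI node receives can be approximated by substituting the results array into the template. The plain-Python substitution below is a stand-in, not the product's template engine:

```python
def render_prompt(template: str, results: list[str]) -> str:
    """Naive stand-in for {{results}} substitution."""
    return template.replace("{{results}}", "\n\n".join(results))

prompt = render_prompt(
    "Summarize each of these documents: {{results}}",
    ["Full text of document 1...", "Full text of document 2..."],
)
```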