Read Documents

Controlled node

Overview

The Read Documents node extracts text content from multiple files in your library. It supports batch processing of PDF, DOCX, TXT, and MD files, returning the extracted text as an array.

When the Chunk option is enabled, documents are split into smaller segments (chunks) based on your specified strategy (count, divide, separator, or structure). This is useful for processing large documents in manageable pieces or for creating embeddings.

The node processes all files in parallel and outputs an array where each element corresponds to one input file. If chunking is enabled with the Flatten option, all chunks from all documents are combined into a single flat array.
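The nested-versus-flat output shapes can be sketched in Python (the document names here are hypothetical; the node's actual output values depend on your files):

```python
from itertools import chain

# Hypothetical per-document chunk lists, as produced with Chunk enabled.
per_document_chunks = [
    ["doc1-chunk1", "doc1-chunk2"],
    ["doc2-chunk1", "doc2-chunk2"],
]

# Without Flatten: one inner array per input file.
nested = per_document_chunks

# With Flatten: all chunks from all documents combined into one flat array.
flattened = list(chain.from_iterable(per_document_chunks))
# flattened == ["doc1-chunk1", "doc1-chunk2", "doc2-chunk1", "doc2-chunk2"]
```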

Inputs

| Input | Type | Description | Default |
| --- | --- | --- | --- |
| Run | Event | Triggers the document reading process. | - |
| Files | FileSource | One or more documents to read from the library. Supports PDF, DOCX, TXT, and MD files. | - |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Results | Data | An array containing the extracted text. Without chunking: `["text from doc1", "text from doc2"]`. With chunking (no flatten): `[["chunk1", "chunk2"], ["chunk1", "chunk2"]]`. With chunking and flatten: `["chunk1", "chunk2", "chunk1", "chunk2"]`. |
| Done | Event | Fires when all documents have been processed. |

Properties

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| Chunk | Chunk | Optional chunking configuration. If enabled, splits documents into smaller pieces based on strategy (count, divide, separator, or structure). | Disabled |
| Flatten | Boolean | When chunking is enabled, flattens all chunks from all documents into a single array instead of nested arrays. | false |

Runtime Behavior and Defaults

  • File Processing: The node processes all input files in parallel. Supported formats are automatically detected by file extension and MIME type.
  • Text Extraction:
    • PDF: Extracts raw text content
    • DOCX: Converts to Markdown format
    • TXT/MD: Reads as plain text
  • Chunking: When enabled, the node uses the same chunking engine as the Chunk Document node. Available strategies include:
    • Count: Split by number of words/sentences/paragraphs per chunk
    • Divide: Split document into N equal parts
    • Separator: Split by custom delimiter (e.g., \n\n for double newlines)
    • Structure: Return as array of sentences/paragraphs/words
  • Error Handling: If a file cannot be read, it is skipped and an error is logged. The node continues processing remaining files.
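The four strategies behave roughly as follows. This is a simplified sketch, not the node's actual chunking engine, which may differ in details such as whitespace handling and sentence detection:

```python
import re

def chunk_by_count(text: str, words_per_chunk: int) -> list[str]:
    """Count: a fixed number of words per chunk."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def chunk_by_divide(text: str, n: int) -> list[str]:
    """Divide: split the document into N roughly equal word groups."""
    words = text.split()
    size = -(-len(words) // n)  # ceiling division
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def chunk_by_separator(text: str, sep: str = "\n\n") -> list[str]:
    """Separator: split on a custom delimiter, dropping empty pieces."""
    return [p for p in text.split(sep) if p.strip()]

def chunk_by_structure(text: str) -> list[str]:
    """Structure: return an array of sentences (naive split on ., !, ?)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```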

Example Usage

Basic Document Reading

  1. Connect a File Source node (or multiple files) to the Files input
  2. Trigger the Run event
  3. The Results output contains: ["Full text of document 1...", "Full text of document 2..."]
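Outside the node editor, the equivalent behavior for the plain-text formats can be sketched like this (TXT/MD only; PDF and DOCX extraction would require dedicated libraries, and this parallel-read helper is an illustration, not the node's implementation):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_document(path: Path) -> str:
    """Plain-text read for TXT/MD files."""
    return path.read_text(encoding="utf-8")

def read_documents(paths: list[Path]) -> list[str]:
    """Read all files in parallel; one result per input file, in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(read_document, paths))
```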

Chunking Multiple Documents

  1. Enable the Chunk property and set strategy to "Count" with 500 words per chunk
  2. Connect multiple PDF files to Files
  3. Results outputs nested arrays: [["chunk1", "chunk2"], ["chunk1", "chunk2"]]
  4. To process all chunks in a single loop, enable Flatten in the Chunk options
  5. With flatten enabled: ["chunk1", "chunk2", "chunk1", "chunk2"]

Processing with AI

Connect the Results output to a For Each node to iterate through documents, or to an AI Write node with a prompt like: "Summarize each of these documents: {{results}}"
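As a rough illustration, the prompt the AI node receives can be approximated by substituting the results array into the template. The plain-Python substitution below is a stand-in, not the product's template engine:

```python
def render_prompt(template: str, results: list[str]) -> str:
    """Naive stand-in for {{results}} substitution."""
    return template.replace("{{results}}", "\n\n".join(results))

prompt = render_prompt(
    "Summarize each of these documents: {{results}}",
    ["Full text of document 1...", "Full text of document 2..."],
)
```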