Read Document
Controlled node
Overview
The Read Document node extracts text content from documents stored in the Intellectible library. It supports PDF, DOCX, TXT, MD, and HTML file formats. The node can optionally chunk the extracted text into smaller pieces—such as sentences, paragraphs, or words—for downstream processing in workflows.
Supported File Types
| Format | Description | Notes |
|---|---|---|
| PDF | Portable Document Format | Extracts raw text content from PDF documents |
| DOCX | Word Document | Converts document structure to markdown format |
| TXT | Plain Text | Reads raw text content |
| MD | Markdown | Reads raw text content |
| HTML | HyperText Markup Language | Extracts text content from HTML documents |
Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Fires when the node starts running | - |
| File | FileSource | The document file to read from the library. Accepts single files via the file picker or file objects from other nodes. | - |
| Chunk | Chunk Options | Optional configuration to split the document into smaller pieces. Configure in the properties panel or pass as data. If not provided, returns the full text as a single string. | - |
Chunk Options
When chunking is enabled, the node supports the following strategies:
- Count: Groups elements (words, sentences, or paragraphs) into chunks of a specified size
- Divide: Divides the text into a specified number of equal parts
- Separator: Splits text by a custom delimiter string
- Structure: Returns the elements as an array without combining them
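The four strategies can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the node's actual implementation: the function names are hypothetical, the elements are assumed to be already split into words, sentences, or paragraphs, and joining elements with a single space is an assumption.

```python
from typing import List


def chunk_by_count(elements: List[str], size: int) -> List[str]:
    """Count: group consecutive elements into chunks of `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [" ".join(elements[i:i + size]) for i in range(0, len(elements), size)]


def chunk_by_divide(elements: List[str], parts: int) -> List[str]:
    """Divide: split the elements into `parts` chunks of near-equal length."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(elements), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + base + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when parts > len(elements)
            chunks.append(" ".join(elements[start:end]))
        start = end
    return chunks


def chunk_by_separator(text: str, separator: str) -> List[str]:
    """Separator: split raw text on a custom delimiter, dropping empty pieces."""
    return [piece.strip() for piece in text.split(separator) if piece.strip()]


def chunk_by_structure(elements: List[str]) -> List[str]:
    """Structure: return the elements themselves, without combining them."""
    return list(elements)


words = "the quick brown fox jumps over the lazy dog".split()
print(chunk_by_count(words, 4))
print(chunk_by_divide(words, 2))
print(chunk_by_separator("a---b---c", "---"))
print(chunk_by_structure(words[:3]))
```

Count fixes the size of each chunk and lets the number of chunks vary, while Divide fixes the number of chunks and lets their size vary.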
Outputs
| Output | Type | Description |
|---|---|---|
| Text | Text / Array | The extracted text content. If chunking is enabled, returns an array of text chunks; otherwise returns a single string. |
| Success | Boolean | Indicates whether the document was successfully parsed (true) or if an error occurred (false). |
| Done | Event | Fires when the node has finished processing the document. |
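Because Text can be either a single string or an array of chunks, downstream logic often benefits from normalizing it to one shape. The helper below is a hypothetical sketch of that idea in Python. `normalize_text_output` is not part of the node, and the missing text value is represented here as `None`.

```python
from typing import List, Optional, Union


def normalize_text_output(
    text: Optional[Union[str, List[str]]], success: bool
) -> List[str]:
    """Return the Text output as a list of chunks, whatever shape it arrived in."""
    if not success or text is None:
        return []  # parsing failed or nothing was extracted
    if isinstance(text, str):
        return [text]  # chunking disabled: wrap the full text as one chunk
    return list(text)  # chunking enabled: already a list of chunks


print(normalize_text_output("full document text", True))
print(normalize_text_output(["chunk one", "chunk two"], True))
print(normalize_text_output(None, False))
```

With this shape, the same iteration logic works whether or not chunking is enabled.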
Runtime Behavior and Defaults
- File Validation: The node checks for a valid project ID and file object at runtime. If the file is missing or invalid, it returns `undefined` for text and `false` for success.
- Extension Detection: The node attempts to determine the file type from the filename extension. If no extension is found, it falls back to the MIME type provided in the file metadata.
- Default Chunking: If chunking is enabled but no specific strategy is configured, the node defaults to creating chunks of approximately 700 words.
- Element Types: When chunking by sentence, paragraph, or word, the node uses natural language processing to identify boundaries accurately.
- Error Handling: Unsupported file types result in `success: false` and `text: undefined`.
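The extension lookup with MIME fallback and the default chunk size can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function names, the MIME-to-extension table, and the exact 700-word split (the node is only documented as producing chunks of approximately 700 words).

```python
import os
from typing import List, Optional

SUPPORTED_EXTENSIONS = {"pdf", "docx", "txt", "md", "html"}

# Assumed mapping; the node's real MIME table is not documented here.
MIME_TO_EXTENSION = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "text/plain": "txt",
    "text/markdown": "md",
    "text/html": "html",
}

DEFAULT_CHUNK_WORDS = 700  # documented default; exact boundary handling is assumed


def resolve_file_type(filename: str, mime_type: Optional[str] = None) -> Optional[str]:
    """Use the filename extension first; fall back to MIME only if there is none."""
    _, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()
    if not ext and mime_type:
        ext = MIME_TO_EXTENSION.get(mime_type.lower(), "")
    # None stands in for the unsupported case (success: false, text: undefined).
    return ext if ext in SUPPORTED_EXTENSIONS else None


def default_chunks(text: str, size: int = DEFAULT_CHUNK_WORDS) -> List[str]:
    """Split text into chunks of `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


print(resolve_file_type("contract.PDF"))
print(resolve_file_type("contract", "application/pdf"))
print(resolve_file_type("archive.zip"))
print(len(default_chunks("word " * 1500)))
```

Note that in this sketch a file with an unsupported extension is rejected even if its MIME type is supported, which mirrors the documented rule that the MIME type is consulted only when no extension is found.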
Example Usage
Scenario: Extract text from a PDF contract and split it into paragraphs for clause-by-clause analysis.
- Add a Read Document node to your workflow.
- Connect a trigger event (like Start) to the Run input.
- Select a PDF file from the library using the File input in the properties panel.
- Enable Chunk in the properties panel and choose paragraphs as the element type so the document is split at paragraph breaks.
- Connect the Text output to a For Each node to iterate through each paragraph.
- Connect the Done event to trigger downstream processing after all chunks are ready.
Tip: For large documents, use the "Count" strategy on sentences with a size of 1 to process the document sentence by sentence, or use "Divide" to split the document into a fixed number of chunks for parallel processing.