Chunk Document
Controlled node
Overview
The Chunk Document node splits a large text document into smaller, manageable pieces (chunks) based on configurable strategies. This is essential for processing long documents that exceed AI model context windows or for creating semantic chunks for vector databases.
The node supports two primary chunking strategies:
- Count: Divides text into chunks of a specified size (e.g., 500 words per chunk)
- Divide: Splits text into a specified number of equal parts (e.g., divide into 10 chunks)
You can chunk by various units including words, sentences, paragraphs, pages, or custom separators. The node also supports overlap between chunks to maintain context continuity.
Chunk Options
Configure chunking behavior through the properties panel:
| Strategy | Description |
|---|---|
| Count | Creates chunks containing a specific number of units (words, sentences, etc.) |
| Divide | Divides the entire text into a specified number of equal parts |

| Unit | Description |
|---|---|
| Word | Chunks by word count |
| Sentence | Chunks by sentence boundaries |
| Paragraph | Chunks by paragraph breaks (\n\n) |
| Page | Chunks by page breaks |
| Custom | Chunks using a custom separator string |
Use the Overlap setting to include units from the previous chunk at the start of the next chunk. This helps maintain context between chunks, especially useful when processing chunks independently through AI models.
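To make the count strategy and overlap behavior concrete, here is a minimal Python sketch (an illustration only, not the node's actual implementation): each chunk holds `size` words, and the last `overlap` words of one chunk are repeated at the start of the next.

```python
def chunk_by_count(text, size=700, overlap=0):
    """Split text into chunks of `size` words, repeating the last
    `overlap` words of each chunk at the start of the next one."""
    words = text.split()
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final chunk reached the end of the text
    return chunks

chunk_by_count("the quick brown fox jumps over the lazy dog", size=4, overlap=1)
# ["the quick brown fox", "fox jumps over the", "the lazy dog"]
```

Note how the word "fox" (and later "the") appears at both a chunk boundary's end and the next chunk's start, giving each chunk a small window of shared context.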
Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the chunking operation | - |
| Text | Text | The document content to be chunked | - |
| Options | Data | Chunking configuration object (set via properties panel) | See defaults below |
Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when chunking is complete |
| Chunks | Data | Array of chunk metadata objects containing start, end, and index positions |
| Texts | Data | Array of the actual text strings for each chunk |
Runtime Behavior and Defaults
When the Run event fires, the node processes the input text according to the configured options:
- If no text is provided, the node outputs empty arrays for both `chunks` and `texts`
- The node extracts substring ranges from the original text based on the chunk boundaries calculated by the chunking algorithm
- Both `chunks` (metadata) and `texts` (content) are output simultaneously when processing completes
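The two outputs can be sketched in Python as follows (an assumption-laden illustration, not the node's implementation): word boundaries are located in the original string, chunk spans are recorded as character offsets, and the text for each chunk is taken as a substring of the original.

```python
import re

def chunk_document(text, size=700, overlap=0):
    """Return (chunks, texts): metadata dicts with start/end character
    offsets and an index, plus the matching substrings of `text`."""
    if not text:
        return [], []  # no input text: both outputs are empty arrays
    # Character spans of each word in the original text
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    step = size - overlap
    chunks, texts = [], []
    start, index = 0, 0
    while start < len(spans):
        group = spans[start:start + size]
        begin, end = group[0][0], group[-1][1]
        chunks.append({"start": begin, "end": end, "index": index})
        texts.append(text[begin:end])  # substring of the original text
        index += 1
        if start + size >= len(spans):
            break
        start += step
    return chunks, texts
```

Because the metadata stores character offsets into the original string, `text[c["start"]:c["end"]]` always reproduces the chunk exactly.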
Default Options
```json
{
  "strategy": "count",
  "unit": "word",
  "size": 700,
  "parts": 3,
  "separator": "\n\n",
  "overlap": 0
}
```
- Strategy: `count` (creates chunks of the specified size)
- Unit: `word` (chunks by word count)
- Size: `700` (units per chunk when using the count strategy)
- Parts: `3` (number of chunks when using the divide strategy)
- Separator: `\n\n` (used when unit is set to custom)
- Overlap: `0` (no overlap between chunks)
Examples
Basic Document Chunking
Scenario: Split a long article into 500-word chunks for processing by an AI model with a limited context window.
- Connect a Text node or document reader node to the Text input
- Set the properties panel options:
  - Strategy: `count`
  - Unit: `word`
  - Size: `500`
  - Overlap: `50` (to maintain context between chunks)
- Connect the Done event to an AI Write node
- Use the Texts output to feed chunks sequentially into the AI model
Semantic Chunking by Paragraph
Scenario: Split a document by paragraphs to preserve semantic boundaries.
- Set Strategy to `count`
- Set Unit to `paragraph`
- Set Size to `1` (one paragraph per chunk)
- The Texts output will contain each paragraph as a separate array element
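A rough Python equivalent of this configuration (a sketch under the assumption that paragraphs are delimited by blank lines, as the Unit table notes):

```python
def chunk_by_paragraph(text):
    """One paragraph per chunk: split on blank lines and drop empties."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

doc = "First paragraph.\n\nSecond paragraph.\n\nThird."
chunk_by_paragraph(doc)
# ['First paragraph.', 'Second paragraph.', 'Third.']
```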
Fixed Number of Chunks
Scenario: Divide a document into exactly 5 equal parts for parallel processing.
- Set Strategy to `divide`
- Set Unit to `word` (or `sentence` for better semantic splits)
- Set Parts to `5`
- The node will output exactly 5 chunks in the Texts array
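The divide strategy can be sketched like this (an illustration, not the node's code; "equal" here means as close to equal as integer division allows when the word count is not a multiple of `parts`):

```python
def divide_into_parts(text, parts=5):
    """Divide the text's words into `parts` roughly equal chunks."""
    words = text.split()
    n = len(words)
    out = []
    for i in range(parts):
        lo = i * n // parts        # start index of part i
        hi = (i + 1) * n // parts  # end index (exclusive) of part i
        out.append(" ".join(words[lo:hi]))
    return out

divide_into_parts("a b c d e f g h i j", parts=5)
# ['a b', 'c d', 'e f', 'g h', 'i j']
```

This always yields exactly `parts` chunks, which is what makes the strategy convenient for fanning work out to a fixed number of parallel branches.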
Custom Separator Chunking
Scenario: Split a markdown document by headers.
- Set Unit to `custom`
- Set Separator to `##` (or whatever delimiter marks your sections)
- The node will split the text at each occurrence of the separator
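A minimal sketch of custom-separator splitting (assuming, as one plausible behavior, that empty pieces are dropped; the node's exact handling of empty segments is not specified here):

```python
def chunk_by_separator(text, separator="##"):
    """Split at each occurrence of the separator, dropping empty pieces."""
    return [part.strip() for part in text.split(separator) if part.strip()]

md = "Intro text\n## Section One\nbody\n## Section Two\nmore"
chunk_by_separator(md, "##")
# ['Intro text', 'Section One\nbody', 'Section Two\nmore']
```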
The Chunks output provides metadata including start and end character indices, allowing you to map processed results back to the original document positions.
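As a sketch of that mapping (hypothetical helper, not part of the node), the `start` and `end` offsets in each metadata object let you pair a processed result with the exact span of the original document it came from:

```python
def map_back(original, chunks, results):
    """Attach each processed result to its source span, using the
    start/end character offsets from the Chunks metadata output."""
    return [
        {"index": c["index"],
         "source": original[c["start"]:c["end"]],
         "result": r}
        for c, r in zip(chunks, results)
    ]

doc = "aa bb cc"
chunks = [{"start": 0, "end": 2, "index": 0},
          {"start": 3, "end": 5, "index": 1}]
map_back(doc, chunks, ["summary-0", "summary-1"])
# [{'index': 0, 'source': 'aa', 'result': 'summary-0'},
#  {'index': 1, 'source': 'bb', 'result': 'summary-1'}]
```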