Chunk Document

Controlled node

Overview

The Chunk Document node splits a large text document into smaller, manageable pieces (chunks) based on configurable strategies. This is essential for processing long documents that exceed AI model context windows or for creating semantic chunks for vector databases.

The node supports two primary chunking strategies:

  • Count: Divides text into chunks of a specified size (e.g., 500 words per chunk)
  • Divide: Splits text into a specified number of equal parts (e.g., divide into 10 chunks)

You can chunk by various units including words, sentences, paragraphs, pages, or custom separators. The node also supports overlap between chunks to maintain context continuity.

Chunk Options

Configure chunking behavior through the properties panel:

Strategy options:

  • Count: Creates chunks containing a specific number of units (words, sentences, etc.)
  • Divide: Divides the entire text into a specified number of equal parts

Unit options:

  • Word: Chunks by word count
  • Sentence: Chunks by sentence boundaries
  • Paragraph: Chunks by paragraph breaks (\n\n)
  • Page: Chunks by page breaks
  • Custom: Chunks using a custom separator string

Overlap

Use the Overlap setting to include units from the previous chunk at the start of the next chunk. This helps maintain context between chunks, which is especially useful when each chunk is processed independently by an AI model.
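The effect of overlap can be illustrated with a short Python sketch. This is illustrative only; `chunk_words` is a hypothetical helper, not the node's actual implementation. It mimics the count strategy with the word unit:

```python
def chunk_words(text, size, overlap=0):
    """Split text into word chunks of `size` words, repeating the last
    `overlap` words of each chunk at the start of the next one."""
    words = text.split()
    chunks, start = [], 0
    step = max(size - overlap, 1)  # guard against overlap >= size
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        start += step
    return chunks

chunk_words("a b c d e f g h", size=4, overlap=1)
# → ["a b c d", "d e f g", "g h"]
```

Note how the last word of each chunk ("d", "g") reappears at the start of the next chunk, carrying context across the boundary.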

Inputs

  • Run (Event): Triggers the chunking operation. Default: none
  • Text (Text): The document content to be chunked. Default: none
  • Options (Data): Chunking configuration object, set via the properties panel. Default: see Default Options below

Outputs

  • Done (Event): Fires when chunking is complete
  • Chunks (Data): Array of chunk metadata objects containing start, end, and index positions
  • Texts (Data): Array of the actual text strings for each chunk

Runtime Behavior and Defaults

When the Run event fires, the node processes the input text according to the configured options:

  • If no text is provided, the node outputs empty arrays for both chunks and texts
  • The node extracts substring ranges from the original text based on the chunk boundaries calculated by the chunking algorithm
  • Both chunks (metadata) and texts (content) are output simultaneously when processing completes
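The output contract described above can be sketched in Python. The helper name `emit_outputs` is hypothetical, and the boundary calculation itself is omitted; the sketch only shows how the two outputs relate to the original text:

```python
def emit_outputs(text, boundaries):
    """Given (start, end) character offsets, produce the Chunks
    metadata array and the Texts array in one pass."""
    if not text:
        return [], []  # no input text → empty arrays on both outputs
    chunks = [{"start": s, "end": e, "index": i}
              for i, (s, e) in enumerate(boundaries)]
    texts = [text[s:e] for s, e in boundaries]
    return chunks, texts

chunks, texts = emit_outputs("Hello world. Goodbye.", [(0, 12), (13, 21)])
# texts → ["Hello world.", "Goodbye."]
```

Each entry in `chunks` pairs with the entry at the same index in `texts`.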

Default Options

{
  "strategy": "count",
  "unit": "word",
  "size": 700,
  "parts": 3,
  "separator": "\n\n",
  "overlap": 0
}

  • Strategy: count (creates chunks of specified size)
  • Unit: word (chunks by word count)
  • Size: 700 (units per chunk when using count strategy)
  • Parts: 3 (number of chunks when using divide strategy)
  • Separator: \n\n (used when unit is set to custom)
  • Overlap: 0 (no overlap between chunks)

Example

Basic Document Chunking

Scenario: Split a long article into 500-word chunks for processing by an AI model with a limited context window.

  1. Connect a Text node or document reader node to the Text input
  2. Set the properties panel options:
    • Strategy: count
    • Unit: word
    • Size: 500
    • Overlap: 50 (to maintain context between chunks)
  3. Connect the Done event to an AI Write node
  4. Use the Texts output to feed chunks sequentially into the AI model

Semantic Chunking by Paragraph

Scenario: Split a document by paragraphs to preserve semantic boundaries.

  1. Set Strategy to count
  2. Set Unit to paragraph
  3. Set Size to 1 (one paragraph per chunk)
  4. The Texts output will contain each paragraph as a separate array element
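In plain Python terms, this configuration behaves roughly like splitting on the paragraph separator (a sketch of the behavior, not the node's code):

```python
doc = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
# unit=paragraph, size=1 → one paragraph per chunk
texts = doc.split("\n\n")
# → ["First paragraph.", "Second paragraph.", "Third paragraph."]
```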

Fixed Number of Chunks

Scenario: Divide a document into exactly 5 equal parts for parallel processing.

  1. Set Strategy to divide
  2. Set Unit to word (or sentence for better semantic splits)
  3. Set Parts to 5
  4. The node will output exactly 5 chunks in the Texts array
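A sketch of the divide strategy, assuming it balances word counts as evenly as possible (the node's exact balancing rule may differ; `divide_words` is a hypothetical helper):

```python
def divide_words(text, parts):
    """Split text into exactly `parts` word chunks of near-equal size."""
    words = text.split()
    base, extra = divmod(len(words), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(" ".join(words[start:start + size]))
        start += size
    return chunks

divide_words("one two three four five six seven", parts=3)
# → ["one two three", "four five", "six seven"]
```

The word count (7) does not divide evenly by 3, so the remainder is spread across the first chunks.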

Custom Separator Chunking

Scenario: Split a markdown document by headers.

  1. Set Unit to custom
  2. Set Separator to ## (or whatever delimiter marks your sections)
  3. The node will split the text at each occurrence of the separator
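One caveat when using a bare `##` separator on markdown: it also matches inside `###` headers. A line-anchored split avoids that (illustrative Python, not the node's implementation):

```python
import re

md = "## Intro\nSome text.\n## Usage\nMore text."
# Naive split on the separator string, roughly as described above:
naive = [s for s in md.split("## ") if s]
# Line-anchored split keeps "##" from matching inside "###" headers:
sections = [s for s in re.split(r"(?m)^## ", md) if s]
# → ["Intro\nSome text.\n", "Usage\nMore text."]
```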

Chunk Metadata

The Chunks output provides metadata including start and end character indices, allowing you to map processed results back to the original document positions.
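For example, a downstream step can use a chunk's metadata to locate its span in the source document (`locate` is a hypothetical helper):

```python
def locate(chunk_meta, original):
    """Return the slice of the original document that a chunk covers."""
    return original[chunk_meta["start"]:chunk_meta["end"]]

original = "Alpha beta gamma."
locate({"start": 6, "end": 10, "index": 1}, original)
# → "beta"
```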