# Chunk Text

*Uncontrolled node*

## Overview
The Chunk Text node splits a large text input into smaller, manageable pieces (chunks) based on configurable strategies. This is useful for processing long documents in batches, preparing text for AI models with token limits, or segmenting content for analysis.
By default, the node chunks text into approximately 700-word segments. However, you can customize the chunking behavior through the Options input to split by sentence, paragraph, word count, or custom separators.
## Chunking Strategies
The node supports several chunking strategies via the Options configuration:
| Strategy | Description |
|---|---|
| Count | Creates chunks containing a specific number of units (words, sentences, or paragraphs). |
| Divide | Divides the text into a specified number of equal parts. |
| Separator | Splits text using a custom delimiter (e.g., `\n\n` for a double line break). |
| Structure | Returns the extracted elements (sentences, paragraphs, or words) as individual items without combining them. |
### Unit Types
When using Count or Divide strategies, you can specify the unit to chunk by:
| Unit | Description |
|---|---|
| Word | Chunks based on word boundaries using NLP tokenization. |
| Sentence | Chunks based on sentence boundaries using NLP analysis. |
| Paragraph | Chunks based on paragraph breaks (double line breaks). |
| Custom | Uses a custom separator string for splitting (only available with Separator strategy). |
## Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Text | Text | The input text to be chunked. | - |
| Options | Data | Configuration object specifying chunking strategy, unit type, size/parts, overlap, and separator. Configure via the properties panel. | See defaults below |
### Options Configuration
The Options input accepts an object with the following structure:
```json
{
  "type": "count",       // "count", "divide", "separator", or "structure"
  "options": {
    "element": "word",   // "word", "sentence", or "paragraph" (for count/divide)
    "count": 100,        // for the "count" strategy: units per chunk
    "divisor": 5,        // for the "divide" strategy: number of parts
    "value": "\\n\\n"    // for the "separator" strategy: delimiter string
  },
  "overlap": 0           // number of units to overlap between chunks
}
```
**Defaults:** If no options are provided, the node chunks the text into segments of approximately 700 words each.
## Outputs
| Output | Type | Description |
|---|---|---|
| Chunks | Data | An array of text strings, where each string is a chunk of the original text. |
## Runtime Behavior and Defaults
- **Uncontrolled Node:** The node runs automatically when its inputs change and does not require an event trigger.
- **Default Chunking:** Without configuration, splits text into ~700-word chunks using NLP tokenization.
- **NLP Processing:** When using "word" or "sentence" units, the node uses natural language processing to identify proper boundaries rather than simple string splitting.
- **Overlap Support:** You can configure overlapping content between chunks to maintain context (configured via `overlap` in the Options).
- **Empty Input:** Returns an empty array if the input text is empty or null.
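The count strategy (which also underlies the 700-word default) can be sketched in Python. This is a simplified illustration using whitespace splitting in place of the node's NLP tokenization; `chunk_by_count` is an illustrative helper, not part of the node's API:

```python
def chunk_by_count(text, count, overlap=0):
    """Split text into chunks of `count` words, sharing `overlap` words
    between consecutive chunks. Whitespace splitting stands in for the
    node's NLP tokenization."""
    words = text.split()
    if not words:
        return []  # empty input yields an empty array, matching the node
    step = max(count - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + count]))
        if start + count >= len(words):
            break  # final chunk reached; avoid a trailing overlap-only chunk
    return chunks

# Ten words, chunks of 4 with 1 word of overlap:
print(chunk_by_count("a b c d e f g h i j", count=4, overlap=1))
# → ['a b c d', 'd e f g', 'g h i j']
```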
## Example Usage

### Basic Chunking (Default)
Connect a text source to the Text input. The node will automatically output chunks of approximately 700 words each.
```text
Input:  "Long document text..."
Output: ["First ~700 words...", "Next ~700 words...", ...]
```
### Chunk by Sentences (Count Strategy)
Configure the Options to create chunks of 5 sentences each:
```json
{
  "type": "count",
  "options": {
    "element": "sentence",
    "count": 5
  },
  "overlap": 1
}
```
This creates chunks containing 5 sentences each, with 1 sentence overlapping between consecutive chunks.
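The arithmetic behind that configuration can be checked with a short sketch, where numbered placeholder strings stand in for NLP-detected sentences:

```python
sentences = [f"S{i}." for i in range(1, 14)]  # 13 placeholder sentences
count, overlap = 5, 1
step = count - overlap  # each new chunk starts 4 sentences after the last
chunks = [sentences[i:i + count] for i in range(0, len(sentences) - overlap, step)]

# First and last sentence of each chunk, showing the 1-sentence overlap:
print([(c[0], c[-1]) for c in chunks])
# → [('S1.', 'S5.'), ('S5.', 'S9.'), ('S9.', 'S13.')]
```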
### Divide into Equal Parts
Split a document into exactly 3 equal parts by word count:
```json
{
  "type": "divide",
  "options": {
    "element": "word",
    "divisor": 3
  }
}
```
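Conceptually, the divide strategy distributes units as evenly as possible across the requested number of parts. A minimal sketch, again using whitespace splitting in place of NLP tokenization (`divide_into_parts` is an illustrative helper, not part of the node's API):

```python
def divide_into_parts(text, divisor):
    """Split text into `divisor` near-equal parts by word count;
    remainder words go to the earliest parts."""
    words = text.split()
    size, extra = divmod(len(words), divisor)
    parts, start = [], 0
    for i in range(divisor):
        end = start + size + (1 if i < extra else 0)
        parts.append(" ".join(words[start:end]))
        start = end
    return parts

print(divide_into_parts("one two three four five six seven", 3))
# → ['one two three', 'four five', 'six seven']
```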
### Split by Custom Separator
Break text at specific delimiters (e.g., markdown headers):
```json
{
  "type": "separator",
  "options": {
    "value": "## "
  }
}
```
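Separator chunking behaves like an ordinary string split. A sketch of the equivalent operation (how the node treats empty or whitespace-only fragments is an assumption here):

```python
text = "intro text\n## First\nbody one\n## Second\nbody two"

# Split on the delimiter and drop empty fragments:
chunks = [c.strip() for c in text.split("## ") if c.strip()]
print(chunks)
# → ['intro text', 'First\nbody one', 'Second\nbody two']
```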
### Processing Chunks
Connect the Chunks output to a For Each node to process each chunk individually:
1. Connect the Chunks output to the Object input of a For Each node.
2. The For Each node iterates through the chunks one at a time.
3. Process each chunk with AI nodes, database operations, or other transformations.
When preparing text for AI processing, use the Token Count node to estimate tokens per chunk and adjust your chunk size accordingly to stay within model context windows.
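While tuning chunk size, the common rule of thumb of roughly 4 characters per token for English text can approximate what the Token Count node will report (this heuristic is an assumption for illustration, not the node's actual method):

```python
def estimate_tokens(chunk):
    """Rough token estimate using the ~4 characters per token rule of
    thumb for English text; use the Token Count node for accuracy."""
    return max(1, len(chunk) // 4)

chunks = ["alpha beta gamma", "delta epsilon"]
print([estimate_tokens(c) for c in chunks])
# → [4, 3]
```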