Chunk Text

Uncontrolled node

Overview

The Chunk Text node splits a large text input into smaller, manageable pieces (chunks) based on configurable strategies. This is useful for processing long documents in batches, preparing text for AI models with token limits, or segmenting content for analysis.

By default, the node chunks text into approximately 700-word segments. However, you can customize the chunking behavior through the Options input to split by sentence, paragraph, word count, or custom separators.

Chunking Strategies

The node supports several chunking strategies via the Options configuration:

  • Count: Creates chunks containing a specific number of units (words, sentences, or paragraphs).
  • Divide: Divides the text into a specified number of roughly equal parts.
  • Separator: Splits text using a custom delimiter (e.g., \n\n for double line breaks).
  • Structure: Returns the extracted elements (sentences, paragraphs, or words) as individual items without combining them.
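The four strategies can be sketched in plain Python. This is a simplified illustration only: the node itself uses NLP tokenization, and the function names here are hypothetical, not the node's API.

```python
def chunk_by_count(units, count):
    """Count: a fixed number of units per chunk."""
    return [units[i:i + count] for i in range(0, len(units), count)]

def chunk_by_divide(units, parts):
    """Divide: split into a given number of roughly equal parts."""
    size = -(-len(units) // parts)  # ceiling division
    return [units[i:i + size] for i in range(0, len(units), size)]

def chunk_by_separator(text, sep):
    """Separator: split on a custom delimiter, dropping empty pieces."""
    return [p for p in text.split(sep) if p]

def chunk_by_structure(units):
    """Structure: return each extracted unit as its own item."""
    return list(units)

words = "one two three four five six".split()
print(chunk_by_count(words, 2))
# → [['one', 'two'], ['three', 'four'], ['five', 'six']]
```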

Unit Types

When using Count or Divide strategies, you can specify the unit to chunk by:

  • Word: Chunks based on word boundaries using NLP tokenization.
  • Sentence: Chunks based on sentence boundaries using NLP analysis.
  • Paragraph: Chunks based on paragraph breaks (double line breaks).
  • Custom: Uses a custom separator string for splitting (only available with the Separator strategy).
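Unit extraction can be approximated as follows. Note the hedge: the node uses NLP analysis for word and sentence boundaries, whereas this sketch uses simple regular expressions, so it will disagree with the node on abbreviations, decimals, and similar edge cases.

```python
import re

def extract_units(text, element):
    """Approximate the three built-in unit types with regexes."""
    if element == "paragraph":
        # Paragraph breaks: one or more blank lines
        return [p.strip() for p in re.split(r"\n\n+", text) if p.strip()]
    if element == "sentence":
        # Naive sentence boundary: whitespace after ., !, or ?
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if element == "word":
        return text.split()
    raise ValueError(f"unknown element: {element}")

text = "First sentence. Second one!\n\nNew paragraph."
print(extract_units(text, "sentence"))
# → ['First sentence.', 'Second one!', 'New paragraph.']
```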

Inputs

  • Text (Text): The input text to be chunked. No default.
  • Options (Data): Configuration object specifying the chunking strategy, unit type, size/parts, overlap, and separator. Configure via the properties panel. See defaults below.

Options Configuration

The Options input accepts an object with the following structure:

{
  "type": "count",       // "count", "divide", "separator", or "structure"
  "options": {
    "element": "word",   // "word", "sentence", or "paragraph" (for count/divide)
    "count": 100,        // for "count" strategy: units per chunk
    "divisor": 5,        // for "divide" strategy: number of parts
    "value": "\n\n"      // for "separator" strategy: delimiter string
  },
  "overlap": 0           // number of units to overlap between chunks
}

Defaults: If no options are provided, the node chunks by approximately 700 words per chunk.
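The way the node interprets this object can be sketched as a small dispatcher. The key names ("type", "options", "overlap") follow the structure above, but everything else here is a simplified assumption: units are plain words, overlap is omitted, and the real implementation is internal to the node.

```python
def apply_options(text, opts=None):
    """Route text through a chunking strategy chosen by the options object."""
    opts = opts or {"type": "count", "options": {"element": "word", "count": 700}}
    o = opts.get("options", {})
    kind = opts.get("type", "count")
    if kind == "separator":
        return [p for p in text.split(o.get("value", "\n\n")) if p]
    words = text.split()  # simplified: word units only
    if kind == "divide":
        size = -(-len(words) // o.get("divisor", 1))  # ceiling division
    else:  # "count" (and the default)
        size = o.get("count", 700)
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

print(apply_options("a b c d e",
                    {"type": "count", "options": {"element": "word", "count": 2}}))
# → ['a b', 'c d', 'e']
```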

Outputs

  • Chunks (Data): An array of text strings, where each string is one chunk of the original text.

Runtime Behavior and Defaults

  • Uncontrolled Node: The node runs automatically when its inputs change and does not require an event trigger.
  • Default Chunking: Without configuration, splits text into ~700-word chunks using NLP tokenization.
  • NLP Processing: When using "word" or "sentence" units, the node uses natural language processing to identify proper boundaries rather than simple string splitting.
  • Overlap Support: You can configure overlapping content between chunks to maintain context (configured in options).
  • Empty Input: Returns an empty array if the input text is empty or null.

Example Usage

Basic Chunking (Default)

Connect a text source to the Text input. The node will automatically output chunks of approximately 700 words each.

Input: "Long document text..."
Output: ["First ~700 words...", "Next ~700 words...", ...]

Chunk by Sentences (Count Strategy)

Configure the Options to create chunks of 5 sentences each:

{
  "type": "count",
  "options": {
    "element": "sentence",
    "count": 5
  },
  "overlap": 1
}

This creates chunks containing 5 sentences each, with 1 sentence overlapping between consecutive chunks.
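The overlap mechanics can be sketched like this, assuming sentences have already been extracted (the node does this with NLP; here they are just a list of strings). The window advances by `count - overlap` each step, so overlap must be smaller than count.

```python
def chunk_with_overlap(sentences, count, overlap):
    """Fixed-size windows that share `overlap` trailing/leading units."""
    assert overlap < count, "overlap must be smaller than the chunk size"
    step = count - overlap
    chunks = []
    for i in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[i:i + count]))
        if i + count >= len(sentences):
            break
    return chunks

sents = [f"Sentence {n}." for n in range(1, 12)]  # 11 sentences
for c in chunk_with_overlap(sents, count=5, overlap=1):
    print(c)
```

With 11 sentences, this yields three chunks, and the last sentence of each chunk reappears as the first sentence of the next.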

Divide into Equal Parts

Split a document into exactly 3 equal parts by word count:

{
  "type": "divide",
  "options": {
    "element": "word",
    "divisor": 3
  }
}
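Division into a fixed number of parts amounts to distributing the words as evenly as possible; when the word count is not a multiple of the divisor, earlier parts carry one extra word. A sketch (not the node's implementation):

```python
def divide_words(text, parts):
    """Split a text's words into exactly `parts` near-equal chunks."""
    words = text.split()
    base, extra = divmod(len(words), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(" ".join(words[start:start + size]))
        start += size
    return chunks

print(divide_words("a b c d e f g", 3))
# → ['a b c', 'd e', 'f g']
```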

Split by Custom Separator

Break text at specific delimiters (e.g., markdown headers):

{
  "type": "separator",
  "options": {
    "value": "## "
  }
}
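One caveat with a plain split: the delimiter itself is consumed, so each chunk loses its "## " header. If you want to keep each header attached to its section, a lookahead split does the job. This Python sketch illustrates the idea; whether the node itself consumes or keeps the separator is determined by its own behavior.

```python
import re

def split_on_headers(text):
    """Split before each markdown H2 header, keeping the header with its section."""
    parts = re.split(r"(?=^## )", text, flags=re.MULTILINE)
    return [p for p in parts if p.strip()]

doc = "intro\n## One\nbody\n## Two\nmore"
print(split_on_headers(doc))
# → ['intro\n', '## One\nbody\n', '## Two\nmore']
```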

Processing Chunks

Connect the Chunks output to a For Each node to process each chunk individually:

  1. Connect Chunks to the Object input of a For Each node
  2. The For Each node will iterate through each chunk
  3. Process each chunk with AI nodes, database operations, or other transformations

Use with AI Write

When preparing text for AI processing, use the Token Count node to estimate tokens per chunk and adjust your chunk size accordingly to stay within model context windows.
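For a quick back-of-the-envelope check before wiring up Token Count, a common heuristic is roughly 4 characters per token for English text. This is only an estimate (the heuristic and the numbers below are assumptions, and actual tokenization varies by model); use the Token Count node for exact figures.

```python
def estimate_tokens(chunk):
    """Rough token estimate using the ~4 characters/token heuristic."""
    return max(1, len(chunk) // 4)

chunk = "word " * 700  # a default-sized ~700-word chunk
print(estimate_tokens(chunk))  # → 875
```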