Read Parquet

Controlled node

Overview

The Read Parquet node reads Apache Parquet files from the Intellectible library and converts them into a list of records (objects) that can be used within your workflow. Parquet is a columnar storage format optimized for efficient, large-scale data processing and analytics.

This node is useful when you need to:

  • Import tabular data stored in Parquet format
  • Process large datasets efficiently
  • Convert Parquet files into JSON-like records for manipulation by other nodes

Performance Note

Parquet files are stored in a columnar format, making them efficient for reading specific columns. The Read Parquet node loads the data as rows (records), which is ideal for row-based operations in workflows.
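The columnar-to-rows conversion described above can be pictured as pivoting a mapping of column name → values into a list of row objects. This is an illustrative sketch in plain Python (the `columns_to_records` helper and the sample data are hypothetical, not part of the node's actual implementation):

```python
def columns_to_records(columns):
    """Pivot a mapping of column name -> list of values into row dicts."""
    names = list(columns)
    # zip(*values) walks the columns in lockstep, yielding one row at a time.
    return [dict(zip(names, row)) for row in zip(*columns.values())]

# Illustrative columnar data, as a Parquet reader might decode it.
columns = {
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
}

records = columns_to_records(columns)
print(records[0])  # {'id': 1, 'name': 'Alice'}
```

Each resulting record is a self-contained object, which is what makes the output convenient for row-based nodes downstream.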

Inputs

| Input    | Type       | Description | Default |
|----------|------------|-------------|---------|
| Run      | Event      | Triggers the node to read the Parquet file. | - |
| File     | FileSource | The Parquet file to read from the library. Only accepts files with MIME type `application/vnd.apache.parquet`. | - |
| Num Rows | Number     | Optional. Limits the number of rows to read from the file. If not specified or set to 0, all rows are read. | - |

Outputs

| Output | Type  | Description |
|--------|-------|-------------|
| Done   | Event | Fires when the file has been successfully read and the data is available. |
| Output | Data  | A list of records (objects) representing the rows in the Parquet file. Each object contains key-value pairs where keys are column names and values are the cell data. |

Runtime Behavior and Defaults

  • File Validation: The node validates that the input file is a valid Parquet file. If an unsupported file type is provided, the node returns an error.
  • Row Limiting: When Num Rows is provided, only that many rows are read from the beginning of the file. If omitted, the entire file is read.
  • Data Structure: The output is always a list (array) of objects, even if the Parquet file contains a single row. Each object represents one row with column names as keys.
  • Error Handling: If the file is missing, the project ID cannot be determined, or an error occurs during reading, the Output will contain an error object with a descriptive message.
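The validation, row-limiting, and error-handling rules above can be summarized in a small sketch. This is a hypothetical simulation of the node's documented behavior, not its real implementation; the `file` argument is modeled here as a simple dict with `mime_type` and `rows` keys for illustration:

```python
PARQUET_MIME = "application/vnd.apache.parquet"

def read_parquet_node(file, num_rows=0):
    """Illustrative model of the Read Parquet node's runtime rules."""
    if file is None:
        # Missing file: the Output carries an error object.
        return {"error": "No input file provided"}
    if file.get("mime_type") != PARQUET_MIME:
        # File validation: only the Parquet MIME type is accepted.
        return {"error": f"Unsupported file type: {file.get('mime_type')}"}
    rows = file["rows"]
    if num_rows and num_rows > 0:
        # Row limiting: read only the first num_rows rows.
        rows = rows[:num_rows]
    # The output is always a list of record objects.
    return {"output": rows}
```

Note that `num_rows=0` (or omitting it) falls through to reading every row, matching the documented default.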

Example Usage

Basic File Reading

Connect a Start node or any trigger to the Run input, and provide a Parquet file via the File input (using a File Source control or connecting from another node that provides file objects):

[Start Node] → Run
[File Source] → File

The Output will contain:

[
  {"id": 1, "name": "Alice", "age": 30},
  {"id": 2, "name": "Bob", "age": 25}
]
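Because the Output is a plain list of JSON-like objects, downstream nodes can treat it like any other record list. A hypothetical Python sketch using the sample output above (the column extraction and aggregate are illustrative, not a feature of the node itself):

```python
# The sample records produced by the Read Parquet node above.
records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

# Pull out a single column and compute a simple aggregate.
names = [r["name"] for r in records]
average_age = sum(r["age"] for r in records) / len(records)

print(names)        # ['Alice', 'Bob']
print(average_age)  # 27.5
```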

Limiting Rows for Preview

To read only the first 100 rows for a preview or testing:

| Input    | Value                    |
|----------|--------------------------|
| File     | my_large_dataset.parquet |
| Num Rows | 100                      |

This is useful for testing your workflow on a subset of a large Parquet file before processing the entire dataset.
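The Num Rows semantics amount to taking a prefix of the record list, with 0 meaning "all rows". A minimal sketch, assuming a hypothetical `preview` helper (not part of the node's API):

```python
def preview(records, num_rows=0):
    """Return the first num_rows records; 0 (or None) means all rows."""
    return records if not num_rows else records[:num_rows]

rows = [{"id": i} for i in range(1000)]
print(len(preview(rows, 100)))  # 100
print(len(preview(rows, 0)))    # 1000
```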

Processing with Loop

Connect the Output to a For Each node to process each record individually:

Read Parquet → Output → For Each → [Process Each Record]
Read Parquet → Done → [Next Step]
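Conceptually, the For Each node applies a per-record step to every object in the Output list. A sketch of that pattern in plain Python (the `process_record` function is a hypothetical stand-in for whatever nodes you attach inside the loop):

```python
# Records as produced by the Read Parquet node's Output.
records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

def process_record(record):
    """Placeholder for the per-record work done inside the For Each loop."""
    return f"{record['name']} is {record['age']}"

# For Each visits every record in order.
results = [process_record(r) for r in records]
print(results)  # ['Alice is 30', 'Bob is 25']
```

The Done event fires once the whole file has been read, so wiring Done to the next step guarantees the loop only starts with the full record list available.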