# Read Parquet
Controlled node
## Overview
The Read Parquet node reads Apache Parquet files from the Intellectible library and converts them into a list of records (objects) that can be used within your workflow. Parquet is a columnar storage format optimized for efficient storage and analytical processing of large datasets.
This node is useful when you need to:
- Import tabular data stored in Parquet format
- Process large datasets efficiently
- Convert Parquet files into JSON-like records for manipulation by other nodes
Parquet files are stored in a columnar format, making them efficient for reading specific columns. The Read Parquet node loads the data as rows (records), which is ideal for row-based operations in workflows.
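The columnar-to-row conversion described above can be sketched in plain Python. This is an illustrative sketch, not the node's actual implementation: the `columns` dict stands in for data as it might be decoded from a Parquet file, and `columns_to_records` is a hypothetical helper name.

```python
# Hypothetical column-oriented data, as it might be decoded from a Parquet file.
columns = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [30, 25],
}

def columns_to_records(columns):
    """Pivot column-oriented data into a list of row records (dicts)."""
    names = list(columns)
    # zip(*values) walks the columns in lockstep, yielding one row at a time.
    return [dict(zip(names, row)) for row in zip(*columns.values())]

records = columns_to_records(columns)
print(records[0])  # first row as a key-value record
```

Each record pairs column names with that row's cell values, which is the shape the node emits on its Output port.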
## Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the node to read the Parquet file. | - |
| File | FileSource | The Parquet file to read from the library. Only accepts files with MIME type application/vnd.apache.parquet. | - |
| Num Rows | Number | Optional. Limits the number of rows to read from the file. If not specified or set to 0, all rows are read. | - |
## Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the file has been successfully read and the data is available. |
| Output | Data | A list of records (objects) representing the rows in the Parquet file. Each object contains key-value pairs where keys are column names and values are the cell data. |
## Runtime Behavior and Defaults
- File Validation: The node validates that the input file is a valid Parquet file. If an unsupported file type is provided, the node returns an error.
- Row Limiting: When `Num Rows` is provided, only that many rows are read from the beginning of the file. If omitted, the entire file is read.
- Data Structure: The output is always a list (array) of objects, even if the Parquet file contains a single row. Each object represents one row with column names as keys.
- Error Handling: If the file is missing, the project ID cannot be determined, or an error occurs during reading, the `Output` will contain an error object with a descriptive message.
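The validation, row-limiting, and error-handling behavior above can be summarized as control flow. This is a minimal sketch under stated assumptions, not the node's real code: `read_parquet_node` is a hypothetical function, the file is modeled as a dict with `mime_type` and pre-decoded `records`, and the actual Parquet decode step is elided.

```python
def read_parquet_node(file, num_rows=None):
    """Sketch of the node's behavior: validate, decode, limit, or error out."""
    if file is None:
        return {"error": "No file provided"}
    if file.get("mime_type") != "application/vnd.apache.parquet":
        return {"error": "Unsupported file type: expected a Parquet file"}
    records = file["records"]  # stand-in for the actual Parquet decode step
    if num_rows:  # None or 0 means read all rows
        records = records[:num_rows]
    return records
```

Note that a `Num Rows` of 0 falls through the truthiness check, which matches the documented default of reading the entire file.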
## Example Usage
### Basic File Reading
Connect a Start node or any trigger to the Run input, and provide a Parquet file via the File input (using a File Source control or connecting from another node that provides file objects):
```
[Start Node] → Run
[File Source] → File
```
The Output will contain:
```json
[
  {"id": 1, "name": "Alice", "age": 30},
  {"id": 2, "name": "Bob", "age": 25}
]
```
### Limiting Rows for Preview
To read only the first 100 rows for a preview or testing:
| Input | Value |
|---|---|
| File | my_large_dataset.parquet |
| Num Rows | 100 |
This is useful for testing your workflow on a subset of a large Parquet file before processing it in full.
### Processing with Loop
Connect the Output to a For Each node to process each record individually:
```
Read Parquet → Output → For Each → [Process Each Record]
Read Parquet → Done → [Next Step]
```
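The per-record processing done by a For Each node can be sketched as a plain loop. The records below mirror the sample output shown earlier; the formatting step is a hypothetical stand-in for whatever your workflow does with each row.

```python
# Records as emitted by the Read Parquet node's Output port.
records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

# For Each visits one record per iteration; here the "process" step
# simply formats a summary string from each row's fields.
summaries = [f"{r['name']} ({r['age']})" for r in records]
for line in summaries:
    print(line)
```

Because each record is an independent object, per-row steps like filtering, transformation, or enrichment compose naturally after the Read Parquet node.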