# Read Parquet
Controlled node
## Overview
The Read Parquet node reads Apache Parquet files from the Intellectible library and converts them into a list of records (objects) that can be used within your workflow. Parquet is a columnar storage format optimized for efficient storage and analytical processing of large datasets.
This node is useful when you need to:
- Import tabular data stored in Parquet format
- Process large datasets efficiently
- Convert Parquet files into JSON-like records for manipulation by other nodes
Parquet files are stored in a columnar format, making them efficient for reading specific columns. The Read Parquet node loads the data as rows (records), which is ideal for row-based operations in workflows.
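The columnar-to-row conversion described above can be sketched in plain Python. This is an illustrative sketch, not the node's actual implementation: the `columns` dict stands in for data as it might be decoded from a Parquet file, and `columns_to_records` is a hypothetical helper name.

```python
# Hypothetical column-oriented data, as it might be decoded from a Parquet file.
columns = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [30, 25],
}

def columns_to_records(columns):
    """Pivot column-oriented data into a list of row records (dicts)."""
    names = list(columns)
    # zip(*values) walks the columns in lockstep, yielding one row at a time.
    return [dict(zip(names, row)) for row in zip(*columns.values())]

records = columns_to_records(columns)
print(records[0])  # first row as a key-value record
```

Each record pairs column names with that row's cell values, which is the shape the node emits on its Output port.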
## Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the node to read the Parquet file. | - |
| File | FileSource | The Parquet file to read from the library. Only accepts files with MIME type application/vnd.apache.parquet. | - |
| Num Rows | Number | Optional. Limits the number of rows to read from the file. If not specified or set to 0, all rows are read. | - |
## Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the file has been successfully read and the data is available. |
| Output | Data | A list of records (objects) representing the rows in the Parquet file. Each object contains key-value pairs where keys are column names and values are the cell data. |
## Runtime Behavior and Defaults
- File Validation: The node validates that the input file is a valid Parquet file. If an unsupported file type is provided, the node returns an error.
- Row Limiting: When `Num Rows` is provided, only that many rows are read from the beginning of the file. If omitted, the entire file is read.
- Data Structure: The output is always a list (array) of objects, even if the Parquet file contains a single row. Each object represents one row with column names as keys.
- Error Handling: If the file is missing, the project ID cannot be determined, or an error occurs during reading, the `Output` will contain an error object with a descriptive message.
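The validation, row-limiting, and error-handling behavior above can be summarized as control flow. This is a minimal sketch under stated assumptions, not the node's real code: `read_parquet_node` is a hypothetical function, the file is modeled as a dict with `mime_type` and pre-decoded `records`, and the actual Parquet decode step is elided.

```python
def read_parquet_node(file, num_rows=None):
    """Sketch of the node's behavior: validate, decode, limit, or error out."""
    if file is None:
        return {"error": "No file provided"}
    if file.get("mime_type") != "application/vnd.apache.parquet":
        return {"error": "Unsupported file type: expected a Parquet file"}
    records = file["records"]  # stand-in for the actual Parquet decode step
    if num_rows:  # None or 0 means read all rows
        records = records[:num_rows]
    return records
```

Note that a `Num Rows` of 0 falls through the truthiness check, which matches the documented default of reading the entire file.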
## Example Usage
### Basic File Reading
Connect a Start node or any trigger to the Run input, and provide a Parquet file via the File input (using a File Source control or connecting from another node that provides file objects):
```
[Start Node] → Run
[File Source] → File
```
The Output will contain:
```json
[
  {"id": 1, "name": "Alice", "age": 30},
  {"id": 2, "name": "Bob", "age": 25}
]
```
### Limiting Rows for Preview
To read only the first 100 rows for a preview or testing:
| Input | Value |
|---|---|
| File | my_large_dataset.parquet |
| Num Rows | 100 |
This is useful for testing your workflow on a subset of a large Parquet file before processing it in full.
### Processing with Loop
Connect the Output to a For Each node to process each record individually:
```
Read Parquet → Output → For Each → [Process Each Record]
Read Parquet → Done → [Next Step]
```
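The per-record processing done by a For Each node can be sketched as a plain loop. The records below mirror the sample output shown earlier; the formatting step is a hypothetical stand-in for whatever your workflow does with each row.

```python
# Records as emitted by the Read Parquet node's Output port.
records = [
    {"id": 1, "name": "Alice", "age": 30},
    {"id": 2, "name": "Bob", "age": 25},
]

# For Each visits one record per iteration; here the "process" step
# simply formats a summary string from each row's fields.
summaries = [f"{r['name']} ({r['age']})" for r in records]
for line in summaries:
    print(line)
```

Because each record is an independent object, per-row steps like filtering, transformation, or enrichment compose naturally after the Read Parquet node.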