Write Parquet
Controlled node
Overview
The Write Parquet node converts tabular data (arrays of records/objects) into Apache Parquet format and saves it to the project's library. Parquet files are columnar storage files that provide efficient compression and encoding schemes, making them ideal for storing large datasets and analytical workloads.
This node is commonly used to persist processed dataframes, database query results, or transformed CSV data in a format optimized for downstream analytics and storage efficiency.
Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the write operation. The node will not execute until this event fires. | - |
| Data | Data | The tabular data to write. Accepts an array of objects/records (e.g., [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]). | - |
| Name | Text | The filename for the output. If the name does not end with .parquet, the extension is automatically appended. | - |
| Path | Text | Optional folder path within the library where the file should be saved (e.g., "exports/2024"). If the path does not exist, directories are created automatically. If omitted, the file is saved to the root directory. | root |
Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the Parquet file has been successfully written to storage and metadata has been updated. |
| File | Data | Returns a file object containing metadata about the saved Parquet file: id, name, mimeType (application/vnd.apache.parquet), dir (directory ID), and size (bytes). |
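The shape of the File output can be illustrated with a hypothetical helper (field names are taken from the table above; the `id` generation and function name are assumptions):

```python
import uuid

def write_parquet_result(name: str, dir_id: str, size: int) -> dict:
    """Build a file-metadata object in the shape the File output describes."""
    return {
        "id": str(uuid.uuid4()),          # assumed: a generated file ID
        "name": name,                      # filename, ending in .parquet
        "mimeType": "application/vnd.apache.parquet",
        "dir": dir_id,                     # directory ID within the library
        "size": size,                      # file size in bytes
    }

meta = write_parquet_result("monthly_sales_report.parquet", "dir_123", 2048)
```

Downstream nodes can read `meta["id"]` to locate the file in the library.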
Runtime Behavior and Defaults
- Automatic Extension: If the provided filename does not include the `.parquet` extension, it is automatically appended to ensure proper file typing.
- Directory Creation: The node automatically creates any missing directories specified in the Path input using the library's folder structure.
- Storage: Files are written to Google Cloud Storage at `gs://{bucket}/libraryDocuments/{projectId}/{fileId}.parquet` and registered in the project's document metadata.
- Error Handling: If no data is provided, the project ID is unavailable, or the filename is invalid, the node outputs an error object on the File output instead of the file metadata.
- Data Format: The input data should be an array of plain objects. Nested objects and arrays are supported but will be serialized according to Parquet schema inference rules.
Example Usage
Connect the output of a database query or data processing node to the Data input, specify a filename, and trigger the Run event to save the results:
- Connect a Query Database node's `result` output to the Data input of Write Parquet.
- Set the Name input to `"monthly_sales_report"` (the node will save it as `monthly_sales_report.parquet`).
- Set the Path input to `"reports/2024"` to organize the file in a subfolder.
- Connect a Start node or event trigger to the Run input.
- When executed, the node fires the Done event and outputs the file reference, which can then be passed to Get Library Download URLs to generate a shareable link or used as input to other file processing nodes.