# CSV to Parquet

*Controlled node*

## Overview
Converts CSV files to Apache Parquet format for efficient data processing and storage. This node is essential when working with large datasets that need to be queried or processed using the Query Dataframes or Pivot Dataframe nodes, as Parquet provides significantly better performance and compression than CSV.
The node automatically handles the conversion process, storing the resulting Parquet file in a temporary directory within your project library.
The node enforces a maximum file size limit of 30GB for CSV files. Attempting to convert files larger than this will result in an error.
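The validation described above can be sketched in Python. The helper name `validate_csv_input` and the interpretation of 30GB as decimal gigabytes are assumptions for illustration, not the node's actual internals:

```python
import os

# The node's documented 30GB limit (decimal gigabytes assumed here).
MAX_CSV_BYTES = 30 * 1000**3

def validate_csv_input(path: str) -> None:
    """Reject missing, non-CSV, or oversized files before conversion."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    if not path.lower().endswith(".csv"):
        raise ValueError("Input must be a CSV file (text/csv)")
    if os.path.getsize(path) > MAX_CSV_BYTES:
        raise ValueError("CSV file exceeds the 30GB size limit")
```

A file that passes all three checks proceeds to conversion; otherwise the node reports an error instead of a file object.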
## Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the conversion process. | - |
| File | FileSource | The CSV file to convert. Only accepts files with MIME type `text/csv`. | - |
## Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the conversion is complete and the Parquet file has been created. |
| Output | Data | The converted Parquet file as a library file object. Contains `id`, `name`, `mimeType` (`application/vnd.apache.parquet`), `dir`, and `size` properties. |
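The shape of the library file object can be illustrated with a small dict. The property names and the MIME type come from the table above; the `id`, `name`, and `size` values are invented placeholders:

```python
# Illustrative example of the object emitted on the Output socket.
# id, name, and size are placeholder values, not real library data.
parquet_file = {
    "id": "file-123",                              # placeholder library id
    "name": "data.parquet",                        # placeholder file name
    "mimeType": "application/vnd.apache.parquet",  # Parquet media type
    "dir": "__temporaryFiles__",                   # temp dir in the project library
    "size": 1048576,                               # placeholder size in bytes
}
```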
## Runtime Behavior
When triggered, the node performs the following operations:
- Validates that the input file exists and is a CSV file
- Checks that the file size does not exceed 30GB
- Converts the CSV to Parquet format using the `dataframeSystem` service
- Creates a new file entry in the `__temporaryFiles__` directory of your project library
- Returns the file metadata object on the Output socket
If the conversion fails or the file exceeds size limits, the Output will contain an error object with a descriptive message instead of the file data.
## Example Usage
Scenario: Converting a large CSV dataset for efficient querying.
- Connect a Read CSV node or a file input to the File socket of the CSV to Parquet node.
- Trigger the Run event (typically from a Start node or button event).
- Connect the Output to a Query Dataframes node to perform SQL queries on the converted data, or to a Write To Library node to save it permanently.
```
[Start Node] → Run → [CSV to Parquet] → Done → [Query Dataframes]
                          ↓
                 Output (Parquet file object)
```