# CSV to Parquet

*Controlled node*

## Overview
Converts CSV files to Apache Parquet format for efficient data processing and storage. This node is essential when working with large datasets that need to be queried or processed using the Query Dataframes or Pivot Dataframe nodes, as Parquet provides significantly better performance and compression than CSV.
The node automatically handles the conversion process, storing the resulting Parquet file in a temporary directory within your project library.
The node enforces a maximum file size limit of 30GB for CSV files. Attempting to convert files larger than this will result in an error.
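The validation described above can be sketched in Python. The helper name `validate_csv_input` and the interpretation of 30GB as decimal gigabytes are assumptions for illustration, not the node's actual internals:

```python
import os

# The node's documented 30GB limit (decimal gigabytes assumed here).
MAX_CSV_BYTES = 30 * 1000**3

def validate_csv_input(path: str) -> None:
    """Reject missing, non-CSV, or oversized files before conversion."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    if not path.lower().endswith(".csv"):
        raise ValueError("Input must be a CSV file (text/csv)")
    if os.path.getsize(path) > MAX_CSV_BYTES:
        raise ValueError("CSV file exceeds the 30GB size limit")
```

A file that passes all three checks proceeds to conversion; otherwise the node reports an error instead of a file object.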
## Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the conversion process. | - |
| File | FileSource | The CSV file to convert. Only accepts files with MIME type `text/csv`. | - |
## Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the conversion is complete and the Parquet file has been created. |
| Output | Data | The converted Parquet file as a library file object. Contains `id`, `name`, `mimeType` (`application/vnd.apache.parquet`), `dir`, and `size` properties. |
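The shape of the library file object can be illustrated with a small dict. The property names and the MIME type come from the table above; the `id`, `name`, and `size` values are invented placeholders:

```python
# Illustrative example of the object emitted on the Output socket.
# id, name, and size are placeholder values, not real library data.
parquet_file = {
    "id": "file-123",                              # placeholder library id
    "name": "data.parquet",                        # placeholder file name
    "mimeType": "application/vnd.apache.parquet",  # Parquet media type
    "dir": "__temporaryFiles__",                   # temp dir in the project library
    "size": 1048576,                               # placeholder size in bytes
}
```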
## Runtime Behavior
When triggered, the node performs the following operations:
- Validates that the input file exists and is a CSV file
- Checks that the file size does not exceed 30GB
- Converts the CSV to Parquet format using the `dataframeSystem` service
- Creates a new file entry in the `__temporaryFiles__` directory of your project library
- Returns the file metadata object on the Output socket
If the conversion fails or the file exceeds size limits, the Output will contain an error object with a descriptive message instead of the file data.
## Example Usage
Scenario: Converting a large CSV dataset for efficient querying.
- Connect a Read CSV node or a file input to the File socket of the CSV to Parquet node.
- Trigger the Run event (typically from a Start node or button event).
- Connect the Output to a Query Dataframes node to perform SQL queries on the converted data, or to a Write To Library node to save it permanently.
```
[Start Node] → Run → [CSV to Parquet] → Done → [Query Dataframes]
                          ↓
                 Output (Parquet file object)
```