CSV to Parquet

Controlled node

Overview

Converts CSV files to Apache Parquet format for efficient data processing and storage. This node is essential when working with large datasets that need to be queried or processed using the Query Dataframes or Pivot Dataframe nodes, as Parquet provides significantly better performance and compression than CSV.

The node automatically handles the conversion process, storing the resulting Parquet file in a temporary directory within your project library.

File Size Limit

The node enforces a maximum file size limit of 30GB for CSV files. Attempting to convert files larger than this will result in an error.

Inputs

| Input | Type | Description | Default |
| --- | --- | --- | --- |
| Run | Event | Triggers the conversion process. | - |
| File | FileSource | The CSV file to convert. Only accepts files with MIME type text/csv. | - |

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Done | Event | Fires when the conversion is complete and the Parquet file has been created. |
| Output | Data | The converted Parquet file as a library file object. Contains id, name, mimeType (application/vnd.apache.parquet), dir, and size properties. |

Runtime Behavior

When triggered, the node performs the following operations:

  1. Validates that the input file exists and is a CSV file
  2. Checks that the file size does not exceed 30GB
  3. Converts the CSV to Parquet format using the dataframeSystem service
  4. Creates a new file entry in the __temporaryFiles__ directory of your project library
  5. Returns the file metadata object on the Output socket

If the conversion fails or the file exceeds the size limit, the Output socket carries an error object with a descriptive message instead of the file data.
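Downstream logic therefore has to branch on which shape arrived. A minimal sketch, assuming the error object exposes its message under an error key (the exact key name is not specified here):

```python
def unwrap_output(output: dict) -> dict:
    """Return the file object, or raise if the node emitted an error object.
    The 'error' key is an assumed shape, based on the description above."""
    if "error" in output:
        raise RuntimeError(f"CSV to Parquet failed: {output['error']}")
    return output
```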

Example Usage

Scenario: Converting a large CSV dataset for efficient querying.

  1. Connect a Read CSV node or a file input to the File socket of the CSV to Parquet node.
  2. Trigger the Run event (typically from a Start node or button event).
  3. Connect the Output to a Query Dataframes node to perform SQL queries on the converted data, or to a Write To Library node to save it permanently.
[Start Node] → Run → [CSV to Parquet] → Done → [Query Dataframes]

Output (Parquet file object)