
Query Dataframes

Controlled node

Overview

The Query Dataframes node allows you to execute SQL queries against one or more dataframes (Parquet or CSV files) from your library. This node is useful for filtering, joining, aggregating, and transforming tabular data using standard SQL syntax.

The node accepts multiple input files, assigns them table names (either automatically or via the Table Names input), executes the provided SQL query, and outputs the results as a new Parquet file in your library.

Supported File Formats

While the node primarily works with Parquet files (.parquet, .pqt), it also supports CSV files (.csv) which are automatically handled during query execution.

Inputs

| Input | Type | Description | Default |
| --- | --- | --- | --- |
| Run | Event | Triggers the node to execute the SQL query against the provided files. | - |
| Files | Data | One or more files to query. Accepts Parquet or CSV files from the library. Supports up to 1000 files. | - |
| Table Names | Text | Comma-separated list of table names to use in the SQL query (e.g., "users, orders"). If not provided, files are automatically named df0, df1, etc. based on their order. | df0, df1, ... |
| Query | Code | The SQL query to execute against the dataframes. Use the table names defined in Table Names (or the defaults) to reference the files. | - |
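The naming rule described for Table Names can be sketched as follows. This is an illustrative helper, not the node's actual implementation; the function name and signature are assumptions:

```python
def resolve_table_names(names, file_count):
    """Sketch of the documented naming rule: explicit names win,
    otherwise files get df0, df1, ... in input order."""
    if not names:
        return [f"df{i}" for i in range(file_count)]
    if isinstance(names, list):
        return [str(n).strip() for n in names]
    # Comma-separated string, e.g. "users, orders"
    return [n.strip() for n in names.split(",")]
```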

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Done | Event | Fires when the query has completed successfully. |
| Output | Data | A library file object containing the query results as a Parquet file. The object includes id, name, mimeType, dir, and size properties. |

Runtime Behavior and Defaults

When the Query Dataframes node runs:

  1. File Validation: The node validates that at least one file is provided and that all files are in a supported format (Parquet or CSV).

  2. Table Naming:

    • If Table Names is provided as a comma-separated string (e.g., "customers, transactions"), these names are used in the SQL query.
    • If Table Names is provided as an array, those values are used.
    • If not provided, files are automatically assigned names df0, df1, df2, etc., corresponding to their order in the Files input.
  3. Query Execution: The SQL query is executed against the dataframes using the specified table names. The query runs in a temporary context and does not modify the original files.

  4. Output Generation: The results are saved as a new Parquet file in the library under the __temporaryFiles__ directory. The file is named query_result_{timestamp}.parquet.

  5. Error Handling: If no files are provided, no query is provided, or the query fails, the Output will contain an error object with a descriptive message.
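The run sequence above can be sketched in Python. The node's actual SQL engine is not documented here, so this sketch uses an in-memory SQLite database as a stand-in, takes tables as lists of row dicts instead of library files, and assumes each table has at least one row; the function name and error shapes mirror the steps above but are illustrative:

```python
import sqlite3

def query_dataframes(tables, sql):
    """Sketch of the run sequence: validate inputs, load each table
    under its resolved name, run the query in a temporary context,
    and return rows or an error object."""
    if not tables:
        return {"error": "No files provided"}
    if not sql:
        return {"error": "No query provided"}
    con = sqlite3.connect(":memory:")  # temporary context; originals untouched
    try:
        for name, rows in tables.items():
            cols = list(rows[0].keys())
            con.execute(f"CREATE TABLE {name} ({', '.join(cols)})")
            con.executemany(
                f"INSERT INTO {name} VALUES ({', '.join('?' for _ in cols)})",
                [tuple(r[c] for c in cols) for r in rows],
            )
        cur = con.execute(sql)
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]
    except sqlite3.Error as e:
        return {"error": str(e)}
    finally:
        con.close()
```

In the real node the result rows would then be written out as the `query_result_{timestamp}.parquet` file rather than returned directly.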

Output Format

The Output is always a Parquet file, regardless of whether the input files were CSV or Parquet. The output file object follows the standard Intellectible library format:

{
  "id": "unique-file-id",
  "name": "query_result_1699123456789.parquet",
  "mimeType": "application/vnd.apache.parquet",
  "dir": "directory-id",
  "size": 1024
}

Examples

Basic Query on a Single File

  1. Connect a Read Parquet node (or CSV to Parquet node) to the Files input of the Query Dataframes node.

  2. Leave Table Names empty (or set it to "sales").

  3. In the Query input, enter:

    SELECT * FROM df0 WHERE amount > 1000

    (or FROM sales if you named the table).

  4. Trigger the Run event.

  5. The Output will contain a new Parquet file with only the rows where the amount exceeds 1000.
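The filtering step can be reproduced with any SQL engine; the snippet below uses an in-memory SQLite table as a stand-in for the df0 dataframe, with made-up sample rows:

```python
import sqlite3

# Stand-in for the single-file example: the input file becomes a table
# named df0 (the default for the first file) and the query filters on amount.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df0 (id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO df0 VALUES (?, ?)",
    [(1, 500.0), (2, 1500.0), (3, 2500.0)],
)
rows = con.execute("SELECT * FROM df0 WHERE amount > 1000").fetchall()
# Only the rows with amount above 1000 remain.
```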

Joining Multiple Dataframes

  1. Connect two Read Parquet nodes to the Files input (combine them using a Combine Data node or connect both to the same input).
  2. Set Table Names to "users, orders".
  3. In the Query input, enter:
    SELECT u.name, o.total 
    FROM users u
    JOIN orders o ON u.id = o.user_id
    WHERE o.status = 'completed'
  4. The Output will contain the joined results as a new Parquet file.
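The same join can be checked against an in-memory SQLite database; the users and orders rows below are made-up sample data:

```python
import sqlite3

# The two input files become the tables named in Table Names: users, orders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, name TEXT)")
con.execute("CREATE TABLE orders (user_id INTEGER, total REAL, status TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 99.0, "completed"), (2, 10.0, "pending")],
)
rows = con.execute(
    "SELECT u.name, o.total FROM users u "
    "JOIN orders o ON u.id = o.user_id "
    "WHERE o.status = 'completed'"
).fetchall()
# Only completed orders survive the join.
```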

File Limits

The node supports up to 1000 input files. Exceeding this limit will result in an error.
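A guard for this limit might look like the following; the function name and error message are illustrative, not the node's actual code:

```python
MAX_FILES = 1000  # documented input limit

def validate_file_count(files):
    """Reject the run before any query executes if too many files are given."""
    if len(files) > MAX_FILES:
        raise ValueError(f"Too many files: {len(files)} (max {MAX_FILES})")
    return files
```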