Skip to main content

Pivot Dataframe

Controlled node

Overview

The Pivot Dataframe node transforms tabular data by pivoting (reshaping) it, similar to Excel pivot tables or pandas pivot operations. It takes a dataframe file (Parquet or CSV) and reorganizes it by turning unique values from one column into new columns, while aggregating values from another column.

This is useful for:

  • Reshaping data from long format to wide format
  • Creating summary tables with aggregated metrics
  • Preparing data for visualization or reporting
  • Transforming categorical data into feature columns

Inputs

InputTypeDescriptionDefault
RunEventTriggers the pivot operation-
FileDataThe input dataframe file (Parquet or CSV format)-
Index ColumnsTextComma-separated list of column names to use as row identifiers (index)-
Column to PivotTextThe column whose unique values will become new columns-
Value ColumnTextThe column containing values to aggregate-
Aggregation FunctionEnumHow to aggregate values when multiple rows map to the same index/pivot combination: sum, mean, median, min, max, count, first, lastsum

Outputs

OutputTypeDescription
DoneEventFires when the pivot operation completes
OutputDataThe pivoted dataframe as a new Parquet file object

Runtime Behavior and Defaults

When the node runs, it:

  1. Reads the input file (Parquet or CSV) from storage
  2. Parses the indexColumns input - if provided as a string, splits by commas and trims whitespace
  3. Performs the pivot operation using the specified aggregation function
  4. Creates a new Parquet file in the project's temporary files directory
  5. Returns a file object with metadata including the new file ID, name, and size

Default Values:

  • aggFunction: sum

File Handling:

  • Input files must be Parquet (.parquet, .pqt) or CSV (.csv) format
  • Output is always a Parquet file
  • The output file is automatically named with a timestamp (e.g., pivot_result_1699123456789.parquet)
  • Files are stored in the __temporaryFiles__ directory of the project library

Example Usage

Scenario: You have a sales dataset with columns date, product, and revenue, and you want to see total revenue per product for each date.

Configuration:

  • File: Connect your sales dataset file
  • Index Columns: date
  • Column to Pivot: product
  • Value Column: revenue
  • Aggregation Function: sum

Result: The output will be a pivoted table where:

  • Rows represent unique dates
  • Columns represent unique products (e.g., "Product_A", "Product_B")
  • Cells contain the sum of revenue for that date-product combination

Input data (long format):

dateproductrevenue
2024-01-01A100
2024-01-01B150
2024-01-02A200

Output data (wide format):

dateAB
2024-01-01100150
2024-01-02200null

Note: If multiple rows exist for the same index/pivot combination, they are aggregated according to the selected function. For example, with count, you would get the number of transactions per date-product pair.