Pivot Dataframe
Controlled node
Overview
The Pivot Dataframe node transforms tabular data by pivoting (reshaping) it, similar to Excel pivot tables or pandas pivot operations. It takes a dataframe file (Parquet or CSV) and reorganizes it by turning unique values from one column into new columns, while aggregating values from another column.
This is useful for:
- Reshaping data from long format to wide format
- Creating summary tables with aggregated metrics
- Preparing data for visualization or reporting
- Transforming categorical data into feature columns
Inputs
| Input | Type | Description | Default |
|---|---|---|---|
| Run | Event | Triggers the pivot operation | - |
| File | Data | The input dataframe file (Parquet or CSV format) | - |
| Index Columns | Text | Comma-separated list of column names to use as row identifiers (index) | - |
| Column to Pivot | Text | The column whose unique values will become new columns | - |
| Value Column | Text | The column containing values to aggregate | - |
| Aggregation Function | Enum | How to aggregate values when multiple rows map to the same index/pivot combination: sum, mean, median, min, max, count, first, last | sum |
Outputs
| Output | Type | Description |
|---|---|---|
| Done | Event | Fires when the pivot operation completes |
| Output | Data | The pivoted dataframe as a new Parquet file object |
Runtime Behavior and Defaults
When the node runs, it:
- Reads the input file (Parquet or CSV) from storage
- Parses the
indexColumnsinput - if provided as a string, splits by commas and trims whitespace - Performs the pivot operation using the specified aggregation function
- Creates a new Parquet file in the project's temporary files directory
- Returns a file object with metadata including the new file ID, name, and size
Default Values:
aggFunction:sum
File Handling:
- Input files must be Parquet (
.parquet,.pqt) or CSV (.csv) format - Output is always a Parquet file
- The output file is automatically named with a timestamp (e.g.,
pivot_result_1699123456789.parquet) - Files are stored in the
__temporaryFiles__directory of the project library
Example Usage
Scenario: You have a sales dataset with columns date, product, and revenue, and you want to see total revenue per product for each date.
Configuration:
- File: Connect your sales dataset file
- Index Columns:
date - Column to Pivot:
product - Value Column:
revenue - Aggregation Function:
sum
Result: The output will be a pivoted table where:
- Rows represent unique dates
- Columns represent unique products (e.g., "Product_A", "Product_B")
- Cells contain the sum of revenue for that date-product combination
Input data (long format):
| date | product | revenue |
|---|---|---|
| 2024-01-01 | A | 100 |
| 2024-01-01 | B | 150 |
| 2024-01-02 | A | 200 |
Output data (wide format):
| date | A | B |
|---|---|---|
| 2024-01-01 | 100 | 150 |
| 2024-01-02 | 200 | null |
Note: If multiple rows exist for the same index/pivot combination, they are aggregated according to the selected function. For example, with count, you would get the number of transactions per date-product pair.