Kontext Copilot - An AI assistant for data analytics. Learn more
Expression of Interest
Spark SQL - PIVOT Clause
insights Stats
warning Please login first to view stats information.
Raymond
Spark & PySpark
Apache Spark installation guides, performance tuning tips, general tutorials, etc.
*Spark logo is a registered trademark of Apache Spark.
Like other SQL engines, Spark also supports PIVOT clause. PIVOT is usually used to calculated aggregated values for each value in a column and the calculated values will be included as columns in the result set.
Syntax for PIVOT clause
PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ] FOR column_list IN ( expression_list ) )
- aggregate_expression: an aggregate expression like MAX, MEAN, etc.
- aggregate_expression_alias: alias name for the aggregation expression.
- column_list: column name list that we can to replace in the FROM clause. Note it can be multiple columns.
- expression_list: to specify new column names for the aggregated columns. Alias can also be used for these new columns use AS clause.
Pivot one column
The following code snippet pivot one column named Product:
SELECT * FROM VALUES (101,'A',1000.01), (101,'B',2000), (101,'C',5000), (102,'A',2000.01), (102,'B',4000.1), (103,'A',2000.01), (103,'B',4000.1), (101,'A',3000.01) AS Sales(Employee,Product,Amount) PIVOT ( SUM(Amount) AS amt, COUNT(Amount) AS cnt FOR Product IN ( 'A' AS a, 'B' as b, 'C' AS c) ) ;
The output looks like the following:
Employee a_amt a_cnt b_amt b_cnt c_amt c_cnt 101 4000.02 2 2000.00 1 5000.00 1 103 2000.01 1 4000.10 1 NULL NULL 102 2000.01 1 4000.10 1 NULL NULL
Pivot two columns
You can pivot multiple columns at the same time in Spark SQL. For example, combinations of product and channels.
Example:
SELECT * FROM VALUES (101,'A',1,1000.01), (101,'B',0,2000), (102,'A',0,2000.01), (102,'B',1,4000.1), (103,'A',0,2000.01), (103,'B',1,4000.1), (101,'A',1,3000.01) AS Sales(Employee,Product,Online,Amount) PIVOT ( SUM(Amount) AS amt FOR (Product,Online) IN ( ('A',1) AS a_online, ('B',1) as b_online, ('A',0) AS a_other, ('B',0) as b_other) ) ;
Output:
Employee a_online b_online a_other b_other 101 4000.02 NULL NULL 2000.00 103 NULL 4000.10 2000.01 NULL 102 NULL 4000.10 2000.01 NULL
infoYou can also specify multiple aggregation functions in the above example. For simplicity, I only used SUM as the aggregation function.
copyright
This page is subject to Site terms.
comment Comments
No comments yet.