Kontext Copilot - An AI assistant for data analytics. Learn more
Expression of Interest
Spark SQL - Average (AVG) Calculation
insights Stats
warning Please login first to view stats information.
Raymond
Code Snippets & Tips
Code snippets and tips for various programming languages/frameworks. All code examples are under MIT or Apache 2.0 license unless specified otherwise.
Spark provides flexible APIs to perform average calculation against a data set. You can either use Spark SQL or fluent APIs to implement it.
Spark SQL - AVG
The follow code snippet shows you how to use AVG directly via Spark SQL. You can run the query against Hive databases or directly in a Spark-SQL shell.
from pyspark.sql import SparkSession from pyspark.sql.functions import avg appName = "PySpark Average (AVG) Example" master = "local" # Create Spark session with Hive supported. spark = SparkSession.builder \ .appName(appName) \ .master(master) \ .getOrCreate() spark.sparkContext.setLogLevel("ERROR") # GroupBY df = spark.sql("""SELECT TXN_DT, AVG(AMT) FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (102,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT,TXN_DT) GROUP BY TXN_DT""") df.show()
Result:
+----------+----------+ | TXN_DT| avg(AMT)| +----------+----------+ |2021-01-01| 68.340000| |2021-01-03|900.560000| |2021-01-02|913.100000| +----------+----------+
Use avg API
The above example can also be changed to use avg API directly. This is useful is you already have an dataframe and if you don't want to use Spark SQL:
# AVG df = spark.sql("""SELECT ACCT,AMT,TXN_DT FROM VALUES (101,10.01, DATE'2021-01-01'), (101,102.01, DATE'2021-01-01'), (102,93., DATE'2021-01-01'), (103,913.1, DATE'2021-01-02'), (102,913.1, DATE'2021-01-02'), (101,900.56, DATE'2021-01-03') AS TXN(ACCT,AMT,TXN_DT) """) df.groupBy("TXN_DT").agg(avg("AMT").alias("AVG_AMT")).show()
The result will be the same.
+----------+----------+ | TXN_DT| AVG_AMT| +----------+----------+ |2021-01-01| 68.340000| |2021-01-03|900.560000| |2021-01-02|913.100000| +----------+----------+
Remember to import avg function:
from pyspark.sql.functions import avg
copyright
This page is subject to Site terms.
comment Comments
No comments yet.