access_time 7 months ago languageEnglish
more_vert

Spark SQL - Standard Deviation Calculation

visibility 435 comment 0

In Spark SQL, function std or stddev or  stddev_sample can be used to calculate sample standard deviation from values of a group. 

Function signature

std(expr)
stddev(expr)
stddev_samp(expr)

The first two functions are the alias of stddev_sample function.

Code snippets

SELECT ACCT, std(AMT) FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT)
group by ACCT;

Output:

ACCT    std(CAST(AMT AS DOUBLE))
101     489.76616954760493
103     NaN
102     579.8982712510876

Another example:

SELECT std(AMT) FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);	

Output:

std(CAST(AMT AS DOUBLE))
461.543305097149

Population standard deviation stddev_pop

Compared with sample standard deviation, there is a population standard deviation named stddev_pop.

Example:

SELECT stddev_pop(AMT) FROM VALUES 
(101,10.01, DATE'2021-01-01'),
(101,102.01, DATE'2021-01-01'),
(102,93., DATE'2021-01-01'),
(103,913.1, DATE'2021-01-02'),
(102,913.1, DATE'2021-01-02'),
(101,900.56, DATE'2021-01-03')
AS TXN(ACCT,AMT, TXN_DT);

Output:

stddev_pop(CAST(AMT AS DOUBLE))
421.3294657786627

The results are different because sample and population standard deviation have different formula definitions. Find the implementation difference here: spark/CentralMomentAgg.scala at master ยท apache/spark (github.com).

info Last modified by Raymond 7 months ago copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Follow Kontext

Get our latest updates on LinkedIn.

Want to contribute on Kontext to help others?

Learn more

More from Kontext

visibility 402
thumb_up 0
access_time 7 months ago