
Spark SQL - Calculate Covariance


Spark SQL provides two aggregate functions for calculating the covariance of a set of number pairs: covar_pop(expr1, expr2) and covar_samp(expr1, expr2). The first returns the population covariance, while the second returns the sample covariance.
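For reference, the two quantities follow the usual textbook definitions, sketched below. The symbols are assumptions introduced here and do not come from the Spark documentation: n is the number of pairs, and x̄ and ȳ are the means of the first and second column. The only difference between the two is the divisor, n versus n - 1.

```latex
\operatorname{cov}_{\text{pop}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
\operatorname{cov}_{\text{samp}}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

It follows that, on the same data, the sample covariance equals the population covariance multiplied by n / (n - 1).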

covar_pop

Example:

SELECT covar_pop(col1,col2) FROM VALUES 
(1,10.),
(2,20.1),
(3,29.86),
(4,41.8),
(10,101.5)
AS tab(col1, col2);

Output:

covar_pop(CAST(col1 AS DOUBLE), CAST(col2 AS DOUBLE))
101.788

covar_samp

Example:

SELECT covar_samp(col1,col2) FROM VALUES 
(1,10.),
(2,20.1),
(3,29.86),
(4,41.8),
(10,101.5)
AS tab(col1, col2);

Output:

covar_samp(CAST(col1 AS DOUBLE), CAST(col2 AS DOUBLE))
127.235
The difference between the sample and population covariance implementations can be seen in the Spark source code: spark/Covariance.scala at master · apache/spark (github.com)
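The two results can be cross-checked without Spark. The following is a minimal plain-Python sketch that applies the textbook covariance definitions to the same five pairs. The helper name `covariance` and its `sample` flag are hypothetical and chosen for illustration; this is not how Spark implements the functions internally.

```python
# Same (col1, col2) pairs as in the SQL examples.
pairs = [(1, 10.0), (2, 20.1), (3, 29.86), (4, 41.8), (10, 101.5)]


def covariance(pairs, sample=False):
    """Textbook covariance: divide the co-moment by n (population) or n - 1 (sample)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of products of deviations from the two means.
    co_moment = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return co_moment / (n - 1 if sample else n)


cov_pop = covariance(pairs)                 # corresponds to covar_pop
cov_samp = covariance(pairs, sample=True)   # corresponds to covar_samp
print(round(cov_pop, 3), round(cov_samp, 3))
```

Rounded to three decimals, the printed values should agree with the covar_pop and covar_samp outputs shown in the examples, and their ratio is 5 / 4 because there are five pairs.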