Spark SQL - Calculate Covariance
Spark SQL provides two aggregate functions for calculating the covariance of a set of number pairs: covar_pop(expr1, expr2) and covar_samp(expr1, expr2). The first returns the population covariance, while the second returns the sample covariance.
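For reference, the two quantities differ only in the divisor. The textbook definitions are given below, where n is the number of pairs and x̄ and ȳ are the column means. These symbols are notation introduced here for illustration; they are not Spark identifiers.

```latex
\operatorname{cov}_{\text{pop}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad
\operatorname{cov}_{\text{samp}}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

For the same data, the sample covariance is therefore always n / (n - 1) times the population covariance.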
covar_pop
Example:
SELECT covar_pop(col1,col2) FROM VALUES (1,10.), (2,20.1), (3,29.86), (4,41.8), (10,101.5) AS tab(col1, col2);
Output:
covar_pop(CAST(col1 AS DOUBLE), CAST(col2 AS DOUBLE)) 101.788
covar_samp
Example:
SELECT covar_samp(col1,col2) FROM VALUES (1,10.), (2,20.1), (3,29.86), (4,41.8), (10,101.5) AS tab(col1, col2);
Output:
covar_samp(CAST(col1 AS DOUBLE), CAST(col2 AS DOUBLE)) 127.235
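The two results above can be cross-checked outside Spark. The following is a minimal plain-Python sketch, assuming the textbook two-pass formulas; the helper names covar_pop and covar_samp are chosen here only to mirror the SQL functions, and this is not how Spark computes the values internally.

```python
def covar_pop(xs, ys):
    """Population covariance: sum of products of deviations divided by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covar_samp(xs, ys):
    """Sample covariance: same sum, divided by n - 1 instead of n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Same five pairs as in the SQL examples.
col1 = [1, 2, 3, 4, 10]
col2 = [10.0, 20.1, 29.86, 41.8, 101.5]

print(round(covar_pop(col1, col2), 3))   # 101.788
print(round(covar_samp(col1, col2), 3))  # 127.235
```

With five pairs, the sample covariance is 5/4 of the population covariance, which matches the two outputs (101.788 × 1.25 = 127.235).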
The difference between the sample and population covariance implementations can be found here: spark/Covariance.scala at master · apache/spark (github.com)