Spark SQL - Calculate Covariance


Spark SQL provides aggregate functions to calculate the covariance of a set of number pairs. There are two: covar_pop(expr1, expr2) and covar_samp(expr1, expr2). The first calculates the population covariance, while the second calculates the sample covariance.
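The two definitions differ only in the divisor: population covariance divides the sum of cross-deviations by n, while sample covariance divides by n - 1. As a sanity check, here is a minimal plain-Python sketch (not Spark code) of both formulas, using the same five data pairs as the SQL examples below:

```python
# Plain-Python sketch of the two covariance formulas (illustration only, not Spark code).
# The data pairs match the VALUES clause in the SQL examples.
xs = [1, 2, 3, 4, 10]
ys = [10.0, 20.1, 29.86, 41.8, 101.5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Both definitions share the same sum of cross-deviations.
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

covar_pop = cross / n         # population covariance: divide by n
covar_samp = cross / (n - 1)  # sample covariance: divide by n - 1

print(round(covar_pop, 3))   # 101.788
print(round(covar_samp, 3))  # 127.235
```

The printed values match the outputs of covar_pop and covar_samp shown in the examples that follow.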

covar_pop

Example:

SELECT covar_pop(col1,col2) FROM VALUES 
(1,10.),
(2,20.1),
(3,29.86),
(4,41.8),
(10,101.5)
AS tab(col1, col2);

Output:

covar_pop(CAST(col1 AS DOUBLE), CAST(col2 AS DOUBLE))
101.788

covar_samp

Example:

SELECT covar_samp(col1,col2) FROM VALUES 
(1,10.),
(2,20.1),
(3,29.86),
(4,41.8),
(10,101.5)
AS tab(col1, col2);

Output:

covar_samp(CAST(col1 AS DOUBLE), CAST(col2 AS DOUBLE))
127.235
Note: the difference between the sample and population covariance implementations can be found in spark/Covariance.scala in the apache/spark repository on GitHub.
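Since both functions share the same sum of cross-deviations, the two results are related by a simple scaling factor: covar_pop = covar_samp * (n - 1) / n. With the five rows used above, that gives 127.235 * 4 / 5 = 101.788, consistent with the two outputs. A quick check in plain Python (an illustration, not Spark code):

```python
# Relation between the two results from the examples above:
#   covar_pop = covar_samp * (n - 1) / n
n = 5                 # number of rows in the VALUES clause
covar_samp = 127.235  # output of covar_samp from the example
covar_pop = covar_samp * (n - 1) / n
print(round(covar_pop, 3))  # 101.788
```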