Spark SQL Functions
Spark SQL - PIVOT Clause
Like other SQL engines, Spark also supports the PIVOT clause. PIVOT is typically used to calculate aggregated values for each value in a column, and the calculated values are included as columns in the result set. PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ] FOR ...
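As a minimal sketch of the syntax, assuming a hypothetical table sales with columns year, quarter and amount, the following query turns the four quarter values into columns:

SELECT * FROM (
    SELECT year, quarter, amount FROM sales
)
PIVOT (
    SUM(amount)
    FOR quarter IN (1 AS Q1, 2 AS Q2, 3 AS Q3, 4 AS Q4)
);

The result set contains one row per year, with the aggregated amount in columns Q1 through Q4.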
Spark SQL - Calculate Covariance
Spark SQL provides functions to calculate the covariance of a set of number pairs. There are two functions: covar_pop(expr1, expr2) and covar_samp(expr1, expr2). The first calculates population covariance while the second calculates sample covariance. Example: SELECT ...
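A self-contained example using an inline VALUES table (the table and column names are made up for illustration):

spark-sql> SELECT covar_pop(c1, c2), covar_samp(c1, c2) FROM VALUES (1, 2), (2, 4), (3, 6) AS t(c1, c2);
1.3333333333333333	2.0

Population covariance divides by the number of pairs (3 here), while sample covariance divides by one less (2), hence the larger result.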
Spark SQL - Calculate Standard Deviation
In Spark SQL, function std, stddev, or stddev_samp can be used to calculate the sample standard deviation of the values in a group. std(expr) stddev(expr) stddev_samp(expr) The first two functions are aliases of the stddev_samp function. SELECT ACCT ...
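For example, with an inline VALUES table (names invented for illustration):

spark-sql> SELECT stddev_samp(c), stddev_pop(c) FROM VALUES (10.0), (20.0), (30.0) AS t(c);
10.0	8.16496580927726

stddev_pop is the population counterpart, dividing by the number of values instead of one less.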
Spark SQL - FIRST_VALUE or LAST_VALUE
In Spark SQL, functions FIRST_VALUE (FIRST) and LAST_VALUE (LAST) can be used to find the first or the last value of a given column or expression for a group of rows. If parameter `isIgnoreNull` is specified as true, they return only non-null values (unless all values are null). first(expr[ ...
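A quick sketch with an inline table (column name invented; note that first and last depend on the order in which rows are processed, so in a real query the input order should be made deterministic):

spark-sql> SELECT first(c, true), last(c, true) FROM VALUES (CAST(NULL AS INT)), (5), (20), (NULL) AS t(c);
5	20

With isIgnoreNull set to true, the leading and trailing NULLs are skipped.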
Spark SQL - Array Functions
Unlike traditional relational database systems, Spark SQL supports complex types such as array and map. There are a number of built-in functions to operate efficiently on array values. ArrayType columns can be created directly using the array or array_repeat function. The latter repeats one element multiple times ...
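For instance:

spark-sql> SELECT array(1, 2, 3), array_repeat('x', 3);
[1,2,3]	["x","x","x"]

Other built-ins such as array_contains, array_distinct and size then operate on these values.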
Spark SQL - Map Functions
In Spark SQL, MapType is designed for key-value pairs, similar to the dictionary type in many other programming languages. This article summarizes the commonly used map functions in Spark SQL. Function map is used to create a map. Example: spark-sql> select ...
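A short example of creating a map and reading its keys and values back:

spark-sql> SELECT map('a', 1, 'b', 2), map_keys(map('a', 1, 'b', 2)), map_values(map('a', 1, 'b', 2));
{"a":1,"b":2}	["a","b"]	[1,2]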
The article Scala: Parse JSON String as Spark DataFrame shows how to convert a JSON string to a Spark DataFrame; this article shows the other way around: converting complex columns to a JSON string using the to_json function. Function to_json(expr[, options]) returns a JSON string with a ...
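For example, a struct or map value can be serialized as follows:

spark-sql> SELECT to_json(named_struct('id', 1, 'name', 'abc')), to_json(map('a', 1));
{"id":1,"name":"abc"}	{"a":1}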
JSON string values can be extracted using built-in Spark functions like get_json_object or json_tuple. Function get_json_object has two parameters: json_txt and path. The first is the JSON text itself, for example a string column in your Spark ...
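Two short examples (the JSON literals are made up for illustration):

spark-sql> SELECT get_json_object('{"a": {"b": 1}}', '$.a.b');
1
spark-sql> SELECT json_tuple('{"x": 1, "y": "two"}', 'x', 'y');
1	two

get_json_object navigates with a JsonPath-style path, while json_tuple extracts several top-level fields in one call.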
Spark SQL - Convert JSON String to Map
Spark SQL function from_json(jsonStr, schema[, options]) returns a struct value parsed from the given JSON string according to the given schema. Parameter options is used to control how the JSON is parsed. It accepts the same options as the JSON data source in Spark DataFrame reader APIs. The following code ...
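For instance, the schema can be given as a DDL string, including a map type to match this article's title:

spark-sql> SELECT from_json('{"a": 1, "b": 2}', 'map<string, int>');
{"a":1,"b":2}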
Spark SQL - Convert String to Timestamp
Similar to Convert String to Date using Spark SQL, you can convert a timestamp string to the Spark SQL timestamp data type. Function to_timestamp(timestamp_str[, fmt]) parses the `timestamp_str` expression with the `fmt` expression to a timestamp data type in Spark. Example ...
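For example, with and without an explicit format pattern:

spark-sql> SELECT to_timestamp('2021-01-09 17:34:59');
2021-01-09 17:34:59
spark-sql> SELECT to_timestamp('09/01/2021', 'dd/MM/yyyy');
2021-01-09 00:00:00

When fmt is omitted, Spark expects a standard timestamp layout such as yyyy-MM-dd HH:mm:ss.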