Spark SQL Functions

Articles tagged with spark-sql-function.

ROW_NUMBER in Spark assigns a unique sequential number (starting from 1) to each record based on the ordering of rows in each window partition. It is commonly used to deduplicate data. The following sample SQL uses the ROW_NUMBER function without a PARTITION BY clause: SELECT TXN.*, ROW_NUMBER() OVER ...
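The ROW_NUMBER behaviour described above can be tried without a Spark cluster. The following is a minimal sketch that uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in engine, on the assumption that this standard window syntax behaves the same way in Spark SQL. The txn table and its columns are hypothetical.

```python
import sqlite3

# SQLite stands in for Spark here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (acct TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [("A", 30), ("A", 10), ("B", 20), ("B", 20)],
)

# No PARTITION BY: the whole table is one window, numbered 1..N.
numbered = conn.execute(
    "SELECT acct, amount, "
    "       ROW_NUMBER() OVER (ORDER BY amount, acct) AS rn "
    "FROM txn ORDER BY rn"
).fetchall()

# Deduplication: keep only the first row of each account's window.
deduped = conn.execute(
    "SELECT acct, amount FROM ("
    "  SELECT acct, amount, "
    "         ROW_NUMBER() OVER (PARTITION BY acct ORDER BY amount DESC) AS rn "
    "  FROM txn) t "
    "WHERE rn = 1 ORDER BY acct"
).fetchall()

print(numbered)
print(deduped)
```

Filtering on rn = 1 is what makes ROW_NUMBER useful for deduplication: each partition contributes exactly one row, even when its rows are identical.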

RANK in Spark calculates the rank of a value in a group of values. It returns one plus the number of rows preceding or equal to the current row in the ordering of a partition. The returned values are not necessarily sequential, because ties leave gaps. The following sample SQL uses the RANK function without PARTITION BY ...

DENSE_RANK is similar to Spark SQL - RANK Window Function. It calculates the rank of a value in a group of values. Each new distinct value receives one plus the previously assigned rank. The returned values are sequential in each window thus no ...
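The difference between RANK and DENSE_RANK is easiest to see on tied values. The sketch below again uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Spark, assuming both engines follow the standard definitions of these functions. The txn table is hypothetical.

```python
import sqlite3

# SQLite stands in for Spark here; the txn table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (amount INTEGER)")
conn.executemany("INSERT INTO txn VALUES (?)", [(10,), (20,), (20,), (30,)])

# Both functions share one ordering; only the handling of ties differs.
ranks = conn.execute(
    "SELECT amount, "
    "       RANK() OVER (ORDER BY amount) AS rnk, "
    "       DENSE_RANK() OVER (ORDER BY amount) AS dense_rnk "
    "FROM txn ORDER BY amount"
).fetchall()

for amount, rnk, dense_rnk in ranks:
    print(amount, rnk, dense_rnk)
# 10 1 1
# 20 2 2
# 20 2 2
# 30 4 3   <- RANK skips 3 after the tie, DENSE_RANK does not
```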

The Spark NTILE function divides the rows in each window into 'n' buckets numbered from 1 to at most 'n' (n is the specified parameter). The following sample SQL uses the NTILE function to divide the records in each window into two buckets. SELECT TXN.*, NTILE(2) OVER (PARTITION BY ...
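As a rough illustration of how NTILE(2) splits each window, the following sketch runs the same kind of query against Python's sqlite3 module (SQLite 3.25 or later) instead of Spark. The assumption is that both engines give the extra row to the lower-numbered bucket when a window does not divide evenly. Table and column names are hypothetical.

```python
import sqlite3

# SQLite stands in for Spark here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (acct TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [("A", 10), ("A", 20), ("A", 30), ("B", 5), ("B", 15)],
)

# Each account is its own window, split into two buckets by amount.
buckets = conn.execute(
    "SELECT acct, amount, "
    "       NTILE(2) OVER (PARTITION BY acct ORDER BY amount) AS bucket "
    "FROM txn ORDER BY acct, amount"
).fetchall()

for row in buckets:
    print(row)
# Account A has three rows, so bucket 1 holds two of them and bucket 2 holds one.
```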

The Spark LAG function provides access to a row at a given offset that comes before the current row in the window. This function can be used in a SELECT statement to compare values in the current row with values in a previous row. lag(input[, offset[, default]]) OVER ([PARTITION BY ..] ORDER BY ...) ...

The Spark LEAD function provides access to a row at a given offset that follows the current row in a window. This analytic function can be used in a SELECT statement to compare values in the current row with values in a following row. It mirrors Spark SQL - LAG Window Function, which looks backward instead of forward.
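LAG and LEAD are easiest to compare side by side. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Spark, assuming the same lag(input, offset, default) and lead(input) semantics. The txn table and its columns are hypothetical.

```python
import sqlite3

# SQLite stands in for Spark here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (day INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO txn VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# LAG looks one row back (default 0 when there is no previous row);
# LEAD looks one row ahead (NULL when there is no following row).
neighbours = conn.execute(
    "SELECT day, amount, "
    "       LAG(amount, 1, 0) OVER (ORDER BY day) AS prev_amount, "
    "       LEAD(amount) OVER (ORDER BY day) AS next_amount "
    "FROM txn ORDER BY day"
).fetchall()

for row in neighbours:
    print(row)
# (1, 10, 0, 20)
# (2, 20, 10, 30)
# (3, 30, 20, None)
```

Subtracting prev_amount from amount in the same SELECT is a common way to compute day-over-day changes.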

The function current_date() or current_date returns the current date at the start of query evaluation. Example:
spark-sql> select current_date();
current_date()
2021-01-09
spark-sql> select current_date;
current_date()
2021-01-09
*Brackets are optional for this ...

The function unix_timestamp() returns the UNIX timestamp of the current time. You can also specify an input timestamp value. Example:
spark-sql> select unix_timestamp();
unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss)
1610174099
spark-sql> select unix_timestamp(current_timestamp ...

Similar to Convert String to Date using Spark SQL, you can convert a timestamp string to the Spark SQL timestamp data type. The function to_timestamp(timestamp_str[, fmt]) parses the `timestamp_str` expression with the `fmt` expression to a timestamp data type in Spark. Example ...
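The relationship between a formatted timestamp string and a UNIX timestamp, as handled by to_timestamp and unix_timestamp above, can be sketched in plain Python. This is not Spark code: the helper names are hypothetical, Python's %Y-%m-%d %H:%M:%S pattern plays the role of Spark's yyyy-MM-dd HH:mm:ss, and the string is assumed to be in UTC, whereas Spark interprets it in the session time zone.

```python
from datetime import datetime, timezone

# Python equivalent of the Spark pattern yyyy-MM-dd HH:mm:ss.
FMT = "%Y-%m-%d %H:%M:%S"


def to_timestamp_utc(timestamp_str, fmt=FMT):
    """Parse a timestamp string and pin it to UTC (hypothetical helper)."""
    return datetime.strptime(timestamp_str, fmt).replace(tzinfo=timezone.utc)


def unix_seconds(ts):
    """Whole seconds since 1970-01-01 00:00:00 UTC (hypothetical helper)."""
    return int(ts.timestamp())


ts = to_timestamp_utc("2021-01-09 06:34:59")
epoch = unix_seconds(ts)
print(ts.isoformat(), epoch)  # 2021-01-09T06:34:59+00:00 1610174099
```

Under the UTC assumption, the value 1610174099 is the same number of seconds that appears in the unix_timestamp() sample output.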

The Spark SQL function from_json(jsonStr, schema[, options]) returns a struct value parsed from the given JSON string according to the given schema. The options parameter controls how the JSON is parsed. It accepts the same options as the JSON data source in the Spark DataFrame reader APIs. The following code ...
