Apache Spark installation guides, performance tuning tips, general tutorials, etc.

*The Spark logo is a trademark of the Apache Software Foundation.

Spark Scala: Load Data from MySQL
Views: 4 | Likes: 0 | Posted: 1 day ago

In the article Connect to MySQL in Spark (PySpark), I showed how to connect to MySQL in PySpark. In this article, I will use the JDBC driver directly to load data from a MySQL database with Scala. Download the JDBC driver for MySQL from the following website: MySQL :: Download Connector/J ...

Connect to MySQL in Spark (PySpark)
Views: 9 | Likes: 0 | Posted: 1 day ago

Spark is an analytics engine for big data processing. There are various ways to connect to a MySQL database in Spark. This page summarizes some common approaches to connecting to MySQL using Python as the programming language. Similar to Connect to SQL Server in Spark (PySpark), there ...
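
As a rough illustration of the JDBC approach mentioned above, the following plain-Python sketch builds the option map that Spark's JDBC data source expects. The host, port, database, table, and credentials are hypothetical placeholders, and the driver class name assumes Connector/J 8.x; the actual read call is only shown as a comment because it needs a Spark session and the driver JAR.

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Build an option map for Spark's JDBC data source (all values are placeholders)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",  # assumption: Connector/J 8.x driver class
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Hypothetical connection details, for illustration only.
opts = mysql_jdbc_options(
    "localhost", 3306, "test_db", "test_table",
    "hypothetical_user", "hypothetical_password",
)

# With PySpark installed and Connector/J on the classpath (not executed here):
# df = spark.read.format("jdbc").options(**opts).load()
```

Keeping the options in one dictionary makes it easy to reuse the same connection settings for several tables by changing only `dbtable`.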

Apache Spark 3.0.1 Installation on macOS
Views: 6 | Likes: 0 | Posted: 8 days ago

Spark is written in Scala, which runs on the JVM (Java Virtual Machine); thus it is also feasible to run Spark on macOS. This article provides a step-by-step guide to installing the latest version of Apache Spark, 3.0.1, on macOS. The version I'm using is macOS Big Sur version 11.1. Hadoop 3.3.0 ...

Views: 20 | Likes: 0 | Posted: 15 days ago

Like other SQL engines, Spark also supports the PIVOT clause. PIVOT is usually used to calculate aggregated values for each value in a column, and the calculated values are included as columns in the result set. PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ] FOR ...
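
To make the idea of pivoting concrete, here is a minimal plain-Python sketch (not Spark code, and not how Spark implements it) that sums a value per group and turns each listed pivot value into its own column. The function name, the column names, and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def pivot_sum(rows, group_key, pivot_key, value_key, pivot_values):
    """Sum value_key per group, producing one output column per listed pivot value."""
    # Every group starts with all pivot columns set to None (no data seen yet).
    totals = defaultdict(lambda: {v: None for v in pivot_values})
    for row in rows:
        cell = totals[row[group_key]]
        p = row[pivot_key]
        if p in cell:
            cell[p] = (cell[p] or 0) + row[value_key]
    return {group: dict(columns) for group, columns in totals.items()}


# Hypothetical sales rows: one row per (year, quarter) sale.
rows = [
    {"year": 2020, "quarter": "Q1", "amount": 10},
    {"year": 2020, "quarter": "Q2", "amount": 20},
    {"year": 2020, "quarter": "Q1", "amount": 5},
    {"year": 2021, "quarter": "Q1", "amount": 7},
]

result = pivot_sum(rows, "year", "quarter", "amount", ["Q1", "Q2"])
print(result)
```

Each distinct `quarter` value listed in `pivot_values` becomes a column, which mirrors the role of the value list in a PIVOT clause.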

Views: 9 | Likes: 0 | Posted: 15 days ago

Unlike traditional RDBMS systems, Spark SQL supports complex types such as array and map. There are a number of built-in functions to operate efficiently on array values. ArrayType columns can be created directly using the array or array_repeat function. The latter repeats one element multiple times ...

Views: 11 | Likes: 0 | Posted: 15 days ago

In Spark SQL, MapType is designed for key-value pairs, much like the dictionary type in many other programming languages. This article summarizes the commonly used map functions in Spark SQL. Function map is used to create a map. Example: spark-sql> select ...

Views: 9 | Likes: 0 | Posted: 16 days ago

Spark SQL function from_json(jsonStr, schema[, options]) returns a struct value with the given JSON string and format. Parameter options is used to control how the JSON is parsed. It accepts the same options as the JSON data source in Spark DataFrame reader APIs. The following code ...
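
The general idea of parsing a JSON string against a schema can be sketched in plain Python. This is only an analogy, not the Spark function: the helper name `from_json_sketch` is hypothetical, the schema is modeled as a dictionary of field names to Python casting functions, and returning `None` for malformed input is a choice made in this sketch rather than a statement about Spark's behavior.

```python
import json


def from_json_sketch(json_str, schema):
    """Parse json_str and keep only the fields named in schema, cast to the given types."""
    try:
        parsed = json.loads(json_str)
    except json.JSONDecodeError:
        return None  # sketch-level choice for malformed input
    if not isinstance(parsed, dict):
        return None
    struct = {}
    for name, caster in schema.items():
        value = parsed.get(name)
        # Fields missing from the input stay None instead of raising.
        struct[name] = caster(value) if value is not None else None
    return struct


record = from_json_sketch('{"a": 1, "b": "0.8", "c": "x"}', {"a": int, "b": float})
print(record)
```

Fields that are not in the schema (here `c`) are dropped, and fields in the schema that are absent from the input come back as `None`.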

Views: 14 | Likes: 0 | Posted: 16 days ago

Similar to Convert String to Date using Spark SQL, you can convert a timestamp string to the Spark SQL timestamp data type. Function to_timestamp(timestamp_str[, fmt]) parses the `timestamp_str` expression with the `fmt` expression to a timestamp data type in Spark. Example ...
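
For readers more familiar with Python, the parsing step can be approximated with `datetime.strptime`. The sketch below is an analogy rather than Spark code: the helper name is hypothetical, only six common pattern letters are translated, and the default pattern is an assumption made for this example.

```python
from datetime import datetime

# Translation of a few Spark-style pattern tokens to strptime directives.
# Only these six tokens are handled in this sketch.
_TOKENS = [
    ("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
    ("HH", "%H"), ("mm", "%M"), ("ss", "%S"),
]


def to_timestamp_sketch(timestamp_str, fmt="yyyy-MM-dd HH:mm:ss"):
    """Parse timestamp_str using a Spark-style pattern; return None if it does not match."""
    py_fmt = fmt
    for spark_token, py_token in _TOKENS:
        py_fmt = py_fmt.replace(spark_token, py_token)
    try:
        return datetime.strptime(timestamp_str, py_fmt)
    except ValueError:
        return None  # sketch-level choice for unparsable input


parsed = to_timestamp_sketch("2021-01-09 17:34:59")
custom = to_timestamp_sketch("09/01/2021 17:34", "dd/MM/yyyy HH:mm")
print(parsed, custom)
```

Note that the pattern letters are case-sensitive in both worlds: `MM` is the month while `mm` is the minute.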

Views: 12 | Likes: 0 | Posted: 16 days ago

Function unix_timestamp() returns the UNIX timestamp of the current time. You can also specify an input timestamp value. Example: spark-sql> select unix_timestamp(); unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss) 1610174099 spark-sql> select unix_timestamp(current_timestamp ...
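
A UNIX timestamp is the number of seconds elapsed since 1970-01-01 00:00:00 UTC. The plain-Python sketch below illustrates that arithmetic; it is an analogy, not Spark code. The helper name is hypothetical, and the input string is assumed to be in UTC, whereas an engine would typically interpret it in its configured time zone.

```python
import time
from datetime import datetime, timezone


def unix_timestamp_sketch(ts=None, fmt="%Y-%m-%d %H:%M:%S"):
    """Return seconds since the epoch for ts (assumed UTC), or for now if ts is omitted."""
    if ts is None:
        return int(time.time())
    parsed = datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


epoch = unix_timestamp_sketch("1970-01-01 00:00:00")
start_of_day = unix_timestamp_sketch("2021-01-09 00:00:00")
print(epoch, start_of_day, unix_timestamp_sketch())
```

Because the result depends on the time zone used to interpret the input, the same string can map to different values on differently configured systems.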

Views: 8 | Likes: 0 | Posted: 16 days ago

Function current_date() or current_date can be used to return the current date at the start of query evaluation. Example: spark-sql> select current_date(); current_date() 2021-01-09 spark-sql> select current_date; current_date() 2021-01-09 *Brackets are optional for this ...