Apache Spark installation guides, performance tuning tips, general tutorials, etc.

*The Spark logo is a trademark of The Apache Software Foundation.

Views: 57 | Likes: 0 | Posted: 2 months ago

This article shows how to read a CSV or TSV file as a Spark DataFrame using Scala. The CSV file can be a local file or a file in HDFS (Hadoop Distributed File System). SparkSession.read returns a DataFrameReader, whose csv method reads CSV files. def csv(path: String): DataFrame Loads a CSV file and returns the ...
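As a rough illustration of the approach described in this teaser, the sketch below wraps `spark.read.csv` in a small helper. The object name `CsvReadSketch`, the helper name `readDelimited`, and the assumption that the file has a header row are mine, not taken from the article.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not from the original article.
object CsvReadSketch {
  // Reads a delimited text file into a DataFrame.
  // Pass sep = "\t" for TSV files. The path may be local or an HDFS URI.
  def readDelimited(spark: SparkSession, path: String, sep: String = ","): DataFrame =
    spark.read
      .option("header", "true")      // assumption: the first line holds column names
      .option("sep", sep)            // field delimiter
      .option("inferSchema", "true") // let Spark guess column types (costs an extra pass)
      .csv(path)
}
```

For a TSV file, the call would look like `CsvReadSketch.readDelimited(spark, "data.tsv", "\t")`, where `data.tsv` is a placeholder path.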

Views: 101 | Likes: 1 | Posted: 2 months ago

This article shows how to convert a JSON string to a Spark DataFrame using Scala. It can be used for processing small in-memory JSON strings. The following sample JSON string will be used. It is a simple JSON array with three items. Each item has two attributes named ...
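One possible way to do this, sketched under my own assumptions, is to wrap the string in a `Dataset[String]` and pass it to `spark.read.json`. The names `JsonStringSketch` and `fromJson` are hypothetical, and the article itself may use a different technique.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not from the original article.
object JsonStringSketch {
  // Parses a small in-memory JSON string into a DataFrame.
  def fromJson(spark: SparkSession, json: String): DataFrame = {
    import spark.implicits._          // brings the String encoder into scope for toDS()
    spark.read.json(Seq(json).toDS()) // the schema is inferred from the JSON content
  }
}
```

This keeps everything in driver memory until the Dataset is created, so it is only suitable for small strings, as the teaser notes.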

Views: 20 | Likes: 0 | Posted: 2 months ago

This article shows how to change the column types of a Spark DataFrame using Scala, for example converting StringType to DoubleType, StringType to IntegerType, or StringType to DateType. Follow the article "Scala: Convert List to Spark Data Frame" to construct a DataFrame.
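The three conversions mentioned above could be sketched with `Column.cast`. The column names `Count`, `Amount`, and `Day` are placeholders I chose for illustration, and `CastSketch` is a hypothetical name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DateType, DoubleType, IntegerType}

// Hypothetical helper; the column names are illustrative assumptions.
object CastSketch {
  // Replaces three string columns with typed versions of themselves.
  def castColumns(df: DataFrame): DataFrame =
    df.withColumn("Count", col("Count").cast(IntegerType))  // StringType -> IntegerType
      .withColumn("Amount", col("Amount").cast(DoubleType)) // StringType -> DoubleType
      .withColumn("Day", col("Day").cast(DateType))         // StringType -> DateType
}
```

Because `withColumn` is called with an existing column name, each call replaces the column instead of adding a new one.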

Views: 15 | Likes: 0 | Posted: 2 months ago

This article shows you how to filter NULL/None values out of a Spark data frame using Scala. Either DataFrame.filter or DataFrame.where can be used to filter out null values.
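A minimal sketch of both calls, assuming a single column is being checked. The names `NullFilterSketch`, `dropNulls`, and `dropNullsWhere` are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helpers, not from the original article.
object NullFilterSketch {
  // Keeps only the rows where the given column is not null.
  def dropNulls(df: DataFrame, column: String): DataFrame =
    df.filter(col(column).isNotNull)

  // where is an alias of filter, so this returns the same rows.
  def dropNullsWhere(df: DataFrame, column: String): DataFrame =
    df.where(col(column).isNotNull)
}
```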

Views: 24 | Likes: 0 | Posted: 2 months ago

This article shows how to add a constant or literal column to a Spark data frame using Scala. Follow the article "Scala: Convert List to Spark Data Frame" to construct a Spark data frame. +----------+-----+------------------+ | Category|Count| ...
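A constant column is typically added with `withColumn` and `lit`. The sketch below assumes that approach; `LiteralColumnSketch` and `addConstant` are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Hypothetical helper, not from the original article.
object LiteralColumnSketch {
  // Appends a column whose value is the same literal in every row.
  def addConstant(df: DataFrame, name: String, value: Any): DataFrame =
    df.withColumn(name, lit(value)) // lit wraps a Scala value in a Column
}
```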

Views: 18 | Likes: 0 | Posted: 2 months ago

This article shows how to 'remove' a column from a Spark data frame using Scala. Follow the article "Scala: Convert List to Spark Data Frame" to construct a data frame. The DataFrame object looks like the following: +----------+-----+------------------+ | Category|Count| ...
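Since a DataFrame is immutable, 'removing' a column means producing a new DataFrame without it. A sketch using `DataFrame.drop` follows; `DropColumnSketch` and `without` are hypothetical names.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not from the original article.
object DropColumnSketch {
  // Returns a new DataFrame without the named columns; the input is unchanged.
  // Names that do not exist in the DataFrame are silently ignored by drop.
  def without(df: DataFrame, names: String*): DataFrame =
    df.drop(names: _*)
}
```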

Views: 18 | Likes: 0 | Posted: 2 months ago

Column renaming is a common action when working with data frames. In this article, I will show you how to rename columns in a Spark data frame using Scala. Info: this is the Scala version of the article "Change DataFrame Column Names in PySpark". The following code snippet creates a ...
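Two common ways to rename columns are `withColumnRenamed` for a single column and `toDF` for all columns at once. The sketch below assumes those two calls; `RenameSketch` and its method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helpers, not from the original article.
object RenameSketch {
  // Renames a single column; a no-op if `from` does not exist.
  def renameOne(df: DataFrame, from: String, to: String): DataFrame =
    df.withColumnRenamed(from, to)

  // Renames every column positionally; names.length must match the column count.
  def renameAll(df: DataFrame, names: Seq[String]): DataFrame =
    df.toDF(names: _*)
}
```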

Views: 13 | Likes: 0 | Posted: 2 months ago

In Spark 2.0+, SparkSession can directly create a Spark data frame using the createDataFrame function. On this page, I am going to show you how to convert the following Scala list to a Spark data frame: val data = Array(List("Category A", 100, "This is category A"), List("Category B", 120 ...
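One way this conversion could be written is to turn each inner list into a `Row` and supply an explicit schema. The column names `Category` and `Count` are taken from the table shown in the related teasers. The third column name `Description` and the object and method names are my own assumptions.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical helper; "Description" is an assumed column name.
object ListToDataFrameSketch {
  val schema: StructType = StructType(Seq(
    StructField("Category", StringType, nullable = true),
    StructField("Count", IntegerType, nullable = true),
    StructField("Description", StringType, nullable = true)))

  // Each inner list must hold (String, Int, String) values in that order.
  def toDataFrame(spark: SparkSession, data: Seq[List[Any]]): DataFrame = {
    val rows = data.map(values => Row.fromSeq(values)) // List[Any] -> Row
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }
}
```

An explicit schema is needed here because a `List[Any]` carries no static type information that Spark could use to derive column types.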

Fix - ERROR SparkUI: Failed to bind SparkUI
Views: 45 | Likes: 0 | Posted: 2 months ago

When starting Spark shell on a Windows 10 machine, I encountered an error: ERROR SparkUI: Failed to bind SparkUI. The detailed error message looks like the following: 20/12/13 20:47:34 ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Failed to bind to /0.0.0.0:4056: Service 'SparkUI' ...

Views: 870 | Likes: 0 | Posted: 3 months ago

In Spark, the to_date function can be used to convert a string to a date. This function has been available since Spark 1.5.0. SELECT to_date('2020-10-23', 'yyyy-MM-dd'); SELECT to_date('23Oct2020', 'ddMMMyyyy'); Refer to the official documentation for all the datetime patterns.  ...
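The SQL calls above have a DataFrame API counterpart in `org.apache.spark.sql.functions.to_date`. The sketch below uses the two-argument overload, which I believe was added in Spark 2.2. The output column name `parsed` and the names `ToDateSketch` and `parseDates` are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, to_date}

// Hypothetical helper, not from the original article.
object ToDateSketch {
  // Adds a DateType column "parsed" by parsing `column` with the given pattern,
  // for example "yyyy-MM-dd".
  def parseDates(df: DataFrame, column: String, pattern: String): DataFrame =
    df.withColumn("parsed", to_date(col(column), pattern))
}
```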