Apache Spark installation guides, performance tuning tips, general tutorials, etc.

*The Spark logo is a trademark of the Apache Software Foundation.

Fix - ERROR SparkUI: Failed to bind SparkUI

When starting Spark shell on a Windows 10 machine, I encountered an error: ERROR SparkUI: Failed to bind SparkUI. The detailed error message looks like the following: 20/12/13 20:47:34 ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Failed to bind to /0.0.0.0:4056: Service 'SparkUI' ...
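
The excerpt is cut off before the fix, so the following is only a sketch of one common approach, not necessarily the one the article uses. Spark's UI starts at port 4040 by default and retries on consecutive ports up to spark.port.maxRetries times (16 by default), which is consistent with the failure at port 4056 in the log. Assuming the ports are simply taken, you can look for a free port yourself and pass it through the spark.ui.port setting. The helper names find_free_port and spark_shell_command are hypothetical.

```python
import socket


def find_free_port(start: int = 4040, max_retries: int = 16, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + max_retries] that can be bound on host."""
    last = min(start + max_retries, 65535)
    for port in range(start, last + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is already in use, try the next one
            return port  # the socket is closed on exit, releasing the port again
    raise RuntimeError(f"no free port between {start} and {last}")


def spark_shell_command(port: int) -> str:
    """Build a spark-shell command line that pins the UI to the given port."""
    return f"spark-shell --conf spark.ui.port={port}"


if __name__ == "__main__":
    print(spark_shell_command(find_free_port()))
```

Another process can still grab the port between this check and the moment Spark starts, so treat the result as a hint. Raising spark.port.maxRetries is an alternative when many Spark sessions share one machine.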


In Spark, the function to_date converts a string to a date. It has been available since Spark 1.5.0. SELECT to_date('2020-10-23', 'yyyy-MM-dd'); SELECT to_date('23Oct2020', 'ddMMMyyyy'); Refer to the official documentation for all the datetime patterns. ...

.NET for Apache Spark v1.0.0 Released

.NET for Apache Spark v1.0.0 was released officially on 2020-10-14. This page summarizes some important resources for you to get started on .NET for Spark. *Image credit: https://github.com/dotnet/spark/raw/master/docs/img/dotnetsparklogo-6.png Release Notes on GitHub ...


Recently, one of my colleagues asked me a question about Spark: for the same SQL statement that finds the max value of a partition column, Spark SQL and Hive/Impala SQL return different values. The SQL statement looks like the following: SELECT MAX(PART_COL) FROM HiveDb.TestSQL; ...


This article shows you how to filter NULL/None values from a Spark data frame using Python. The functions DataFrame.filter and DataFrame.where (an alias of filter) can both be used to filter out null values.


Spark is a robust framework with logging implemented in all modules. Showing all the INFO logs can make the console output too verbose. This article shows you how to hide those INFO logs in the console output. The log level can be set using the function pyspark.SparkContext.setLogLevel. The ...


This article shows how to change the column types of a Spark DataFrame using Python, for example converting StringType to DoubleType, StringType to IntegerType, or StringType to DateType. Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a dataframe.


This article shows how to add a constant or literal column to a Spark data frame using Python. Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a dataframe. +----------+---+------+ | Category| ID| Value| +----------+---+------+ |Category A| 1| ...


This article shows how to 'delete' a column from a Spark data frame using Python. Follow the article Convert Python Dictionary List to PySpark DataFrame to construct a dataframe. +----------+---+------+ | Category| ID| Value| +----------+---+------+ |Category A| 1| 12.40| |Category B| ...


Column renaming is a common action when working with data frames. In this article, I will show you how to rename columns in a Spark data frame using Python. The following code snippet creates a DataFrame from a Python native dictionary list. PySpark SQL types are used to create the ...