Scala: Convert List to Spark Data Frame


In Spark 2.0+, SparkSession can directly create a Spark data frame using the createDataFrame function.

In this page, I am going to show you how to convert the following Scala list to a Spark data frame:

val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))

Import types

First, let’s import the data types we need for the data frame.

import org.apache.spark.sql._
import org.apache.spark.sql.types._

Define the schema

Define a schema for the data frame based on the structure of the Scala list.

// Create a schema for the data frame
val schema = StructType(
  StructField("Category", StringType, true) ::
  StructField("Count", IntegerType, true) ::
  StructField("Description", StringType, true) :: Nil)

Convert the list to data frame

The list can be converted to an RDD through the parallelize function:

// Convert the array of lists to a List of Row objects
val rows = data.map(t => Row(t(0), t(1), t(2))).toList

// Create RDD
val rdd = spark.sparkContext.parallelize(rows)

// Create the data frame from the RDD and the schema
val df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()

Sample output

scala> print(df.schema)
StructType(StructField(Category,StringType,true), StructField(Count,IntegerType,true), StructField(Description,StringType,true))
scala> df.show()
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
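
Alternative: use a case class and toDF

As a side note, when the row structure is known at compile time, the same data frame can be built without defining an explicit schema. The sketch below models each row as a case class and calls the toDF function, letting Spark infer the schema from the case class fields; it assumes the same spark session, and the Category case class name is just illustrative:

```scala
import spark.implicits._

// Each field of the case class becomes a column with an inferred type
case class Category(Category: String, Count: Int, Description: String)

val df2 = Seq(
  Category("Category A", 100, "This is category A"),
  Category("Category B", 120, "This is category B"),
  Category("Category C", 150, "This is category C")).toDF()

df2.show()
```

This approach trades the explicit nullability control of a StructType schema for less boilerplate; columns derived from case class fields of non-Option types are inferred as non-nullable.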

Reference

Refer to the Scala API documentation for more information about the SparkSession class:

Spark 3.0.1 ScalaDoc - org.apache.spark.sql.SparkSession

Last modified by Raymond 2 months ago.