Scala: Convert List to Spark Data Frame
In Spark 2.0+, SparkSession can create a Spark data frame directly using the createDataFrame function.
On this page, I am going to show you how to convert the following Scala list (an array of lists, one inner list per record) to a Spark data frame:
val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))
Import types
First, let’s import the data types we need for the data frame.
import org.apache.spark.sql._
import org.apache.spark.sql.types._
Define the schema
Define a schema for the data frame based on the structure of the Scala list.
// Create a schema for the dataframe
val schema = StructType(
  StructField("Category", StringType, true) ::
  StructField("Count", IntegerType, true) ::
  StructField("Description", StringType, true) :: Nil)
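As a quick check on the schema definition, the sketch below builds the same three columns a second way, with the StructType.add method, and compares the two. The object name SchemaSketch and the value name builtSchema are made up for illustration and are not part of the original article.

```scala
import org.apache.spark.sql.types._

// Hypothetical wrapper object so the snippet compiles on its own;
// in spark-shell the body can be entered with :paste instead.
object SchemaSketch {
  // Schema as defined in this article, using the :: list syntax
  val schema: StructType = StructType(
    StructField("Category", StringType, true) ::
    StructField("Count", IntegerType, true) ::
    StructField("Description", StringType, true) :: Nil)

  // The same columns added one by one with StructType.add
  val builtSchema: StructType = new StructType()
    .add("Category", StringType, true)
    .add("Count", IntegerType, true)
    .add("Description", StringType, true)

  def main(args: Array[String]): Unit = {
    // Compare the two definitions and list the column names
    println(schema == builtSchema)
    println(schema.fieldNames.mkString(", "))
  }
}
```

Either form can be passed to createDataFrame; the third argument of each field marks the column as nullable.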
Convert the list to data frame
The list is first mapped to Row objects, converted to an RDD with the parallelize function, and then passed to createDataFrame together with the schema:
// Convert list to List of Row
val rows = data.map(t => Row(t(0), t(1), t(2))).toList

// Create RDD
val rdd = spark.sparkContext.parallelize(rows)

// Create data frame
val df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()
Sample output
scala> print(df.schema)
StructType(StructField(Category,StringType,true), StructField(Count,IntegerType,true), StructField(Description,StringType,true))

scala> df.show()
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
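The snippets on this page assume a spark variable is already available, as it is in spark-shell. For running the same steps as a standalone program, a minimal sketch might look like the following. The object name ListToDataFrame, the helper build, the application name, and the local[*] master are all assumptions for illustration and do not come from the original article.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical standalone wrapper around the steps shown on this page
object ListToDataFrame {
  // Same sample data as the article; the element type is stated explicitly
  val data: Array[List[Any]] = Array(
    List("Category A", 100, "This is category A"),
    List("Category B", 120, "This is category B"),
    List("Category C", 150, "This is category C"))

  val schema: StructType = StructType(
    StructField("Category", StringType, true) ::
    StructField("Count", IntegerType, true) ::
    StructField("Description", StringType, true) :: Nil)

  // Rows -> RDD -> data frame, in the same order as the article
  def build(spark: SparkSession): DataFrame = {
    val rows = data.map(t => Row(t(0), t(1), t(2))).toList
    val rdd = spark.sparkContext.parallelize(rows)
    spark.createDataFrame(rdd, schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation only
    val spark = SparkSession.builder()
      .appName("ListToDataFrame")
      .master("local[*]")
      .getOrCreate()
    try {
      val df = build(spark)
      df.printSchema()
      df.show()
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the conversion in a separate build function makes it possible to reuse it with whatever SparkSession the surrounding application already has.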
Reference
Refer to the Scala API documentation for more information about the SparkSession class.
Last modified by Raymond 3 years ago