Scala: Convert List to Spark Data Frame
In Spark 2.0+, SparkSession can create a Spark data frame directly using the createDataFrame function.
On this page, I am going to show you how to convert the following Scala collection (an Array of Lists, one List per row) to a Spark data frame:
val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))
This is a structured version of the article Convert List to Spark Data Frame in Scala / Spark.
Import types
First, let’s import the data types we need for the data frame.
import org.apache.spark.sql._
import org.apache.spark.sql.types._
Define the schema
Define a schema for the data frame based on the structure of the Scala list.
// Create a schema for the dataframe
val schema = StructType(
  StructField("Category", StringType, true) ::
  StructField("Count", IntegerType, true) ::
  StructField("Description", StringType, true) :: Nil)
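The same schema can be written in other ways. The following is a minimal sketch, not part of the original article, that builds it once from a Seq and once with StructType.add; the names schemaFromSeq and schemaFromAdd are illustrative only.

```scala
import org.apache.spark.sql.types._

// Same three columns as above, passed as a Seq instead of a :: / Nil chain
val schemaFromSeq = StructType(Seq(
  StructField("Category", StringType, nullable = true),
  StructField("Count", IntegerType, nullable = true),
  StructField("Description", StringType, nullable = true)
))

// Same schema built incrementally; each add returns a new StructType
val schemaFromAdd = new StructType()
  .add("Category", StringType, nullable = true)
  .add("Count", IntegerType, nullable = true)
  .add("Description", StringType, nullable = true)

// Both definitions describe the same fields in the same order
println(schemaFromSeq == schemaFromAdd)
```

Either form can be passed to createDataFrame in place of the schema defined above; which one reads better is mostly a matter of taste.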
Convert the list to data frame
The list can be converted to an RDD with the parallelize function, and the RDD can then be converted to a data frame with createDataFrame:
// Convert list to List of Row
val rows = data.map(t => Row(t(0), t(1), t(2))).toList
// Create RDD
val rdd = spark.sparkContext.parallelize(rows)
// Create data frame
val df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()
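The snippet above relies on the spark variable that spark-shell provides. Outside the shell, a session has to be created first. The sketch below puts the steps together under that assumption; the application name and the local[*] master are placeholders chosen for illustration, not values from the original article.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session for experimentation (spark-shell already provides `spark`)
val spark = SparkSession.builder()
  .appName("ListToDataFrame") // hypothetical application name
  .master("local[*]")         // run locally using all available cores
  .getOrCreate()

val data = Array(
  List("Category A", 100, "This is category A"),
  List("Category B", 120, "This is category B"),
  List("Category C", 150, "This is category C"))

val schema = StructType(Seq(
  StructField("Category", StringType, nullable = true),
  StructField("Count", IntegerType, nullable = true),
  StructField("Description", StringType, nullable = true)))

// Each inner list is a List[Any], so its values are copied into a Row by position
val rows = data.map(t => Row(t(0), t(1), t(2))).toList

// Distribute the rows as an RDD, then attach the schema
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.show()
```

Because the inner lists are typed as List[Any], Spark cannot infer column types from them, which is why the explicit schema is passed alongside the rows.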
Sample output
scala> print(df.schema)
StructType(StructField(Category,StringType,true), StructField(Count,IntegerType,true), StructField(Description,StringType,true))
scala> df.show()
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+
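For comparison, when the rows are available as typed tuples rather than lists of mixed values, the schema can be inferred and the Row and RDD steps skipped. This is an alternative sketch, not the method used in this article; typedData and typedDf are hypothetical names, and the session settings are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

// Assumption: a local session; in spark-shell the existing `spark` can be used instead
val spark = SparkSession.builder()
  .appName("TupleToDataFrame") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Brings toDF into scope for local Scala collections
import spark.implicits._

// Tuples keep the static type of every column: (String, Int, String)
val typedData = Seq(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C"))

// Column names are given explicitly; column types come from the tuple type
val typedDf = typedData.toDF("Category", "Count", "Description")
typedDf.printSchema()
typedDf.show()
```

The inferred schema may differ from the explicit one in nullability, since a Scala Int column is reported as non-nullable, so it is worth checking printSchema when the exact schema matters.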
Reference
Refer to the Scala API documentation for more information about the SparkSession class:
Spark 3.0.1 ScalaDoc - org.apache.spark.sql.SparkSession
Last modified by Raymond 4 years ago