Convert List to Spark Data Frame in Scala / Spark


In Spark, the SparkContext.parallelize function can be used to convert a list of objects to an RDD, and the RDD can then be converted to a DataFrame through SparkSession.createDataFrame.

As in PySpark, we can use the SparkContext.parallelize function to create an RDD; alternatively, the SparkContext.makeRDD function can also be used to convert a list to an RDD.
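Both functions accept a local Scala collection and return an equivalent RDD. A minimal sketch of the two side by side (the session setup and the items list here are illustrative, not part of the snippet below):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("makeRDD example").master("local").getOrCreate()
val items = List(1, 2, 3)

/* makeRDD is identical to parallelize for a local Scala collection. */
val rdd1 = spark.sparkContext.parallelize(items)
val rdd2 = spark.sparkContext.makeRDD(items)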

The output looks like the following:

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+

Code snippet

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val appName = "Scala Example - List to Spark Data Frame"
val master = "local"

/* Create Spark session */
val spark = SparkSession.builder.appName(appName).master(master).getOrCreate()

/* List of rows */
val data = List(
  Row("Category A", 100, "This is category A"),
  Row("Category B", 120, "This is category B"),
  Row("Category C", 150, "This is category C"))

val schema = StructType(List(
  StructField("Category", StringType, true),
  StructField("Count", IntegerType, true),
  StructField("Description", StringType, true)
))

/* Convert list to RDD */
val rdd = spark.sparkContext.parallelize(data)

/* Create data frame */
val df = spark.createDataFrame(rdd, schema)
println(df.schema)
df.show()
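
As a side note (a sketch, not part of the original snippet): for a list of tuples, Spark's implicits offer a shorter path through toDF, skipping the explicit Row objects and schema definition:

/* Assumes the same SparkSession as above. */
import spark.implicits._

val df2 = List(
  ("Category A", 100, "This is category A"),
  ("Category B", 120, "This is category B"),
  ("Category C", 150, "This is category C")).toDF("Category", "Count", "Description")
df2.show()

With toDF, Spark infers the column types from the tuple elements; the explicit StructType approach above is preferable when you need precise control over types and nullability.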