Spark Dataset and DataFrame


Spark Dataset

The Dataset API was introduced in Spark 1.6. It combines the benefits of RDDs (strong typing and lambda functions) with the benefits of Spark SQL's optimized execution engine. A Dataset is a distributed collection of data.

The Dataset API is available in Scala and Java. It is not supported in Python or R because of the dynamic nature of those languages. However, that same dynamic nature means you can still access columns easily by name through the DataFrame object in Python or R.

Spark DataFrame

Spark DataFrame is a Dataset of Rows with named columns. It is like a table in a relational database.
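
To illustrate the table analogy, here is a minimal sketch that builds a small DataFrame with named columns, registers it as a temporary view, and queries it with SQL. The local session settings, the view name person, and the sample rows are assumptions chosen for illustration; in spark-shell the spark session already exists.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataframe-as-table")
  .getOrCreate()
import spark.implicits._

// Build a DataFrame with two named columns from in-memory tuples.
val df = Seq(("Raymond", "Tang"), ("Kontext", "Admin")).toDF("FirstName", "LastName")

// The schema lists column names and types, much like a table definition.
df.printSchema()

// Register the DataFrame as a temporary view (hypothetical name) and query it with SQL.
df.createOrReplaceTempView("person")
val admins = spark.sql("SELECT FirstName FROM person WHERE LastName = 'Admin'")
admins.show()
```

The temporary view exists only for the lifetime of the session, so this is a way to query a DataFrame with SQL rather than a way to persist it.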

In Java, a Spark DataFrame is a Dataset of Row type (i.e. Dataset<Row>). In Scala, the DataFrame type is an alias for the type Dataset[Row]. In Python and R, the DataFrame type provides similar functions.
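
The Scala alias can be seen directly in code. The sketch below is meant to be pasted into spark-shell or run as a Scala script; the Person case class, the sample rows, and the local session settings are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical case class used only for this illustration.
case class Person(FirstName: String, LastName: String)

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataset-vs-dataframe")
  .getOrCreate()
import spark.implicits._

// A typed Dataset: each element is a Person object.
val people: Dataset[Person] =
  Seq(Person("Raymond", "Tang"), Person("Kontext", "Admin")).toDS()

// toDF() drops the static element type; the result is a DataFrame.
val df: DataFrame = people.toDF()

// This assignment compiles because DataFrame is a type alias for Dataset[Row].
val rows: Dataset[Row] = df

// as[Person] returns to the typed view; column names must match the field names.
val typedAgain: Dataset[Person] = df.as[Person]
```

Converting in either direction does not copy the data; both views describe the same distributed collection.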

Spark Dataset example via Scala

The following code snippet shows how to create a Dataset using Scala.

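import spark.implicits._

// Case class whose field names match the CSV header columns.
case class Person(FirstName: String, LastName: String)

// Read the CSV file as a DataFrame, then convert it to a typed Dataset[Person].
val ds = spark.read.format("csv").option("header", "true").load("file:///F:/big-data/person.csv").as[Person]

// Select and filter with column expressions; where() is an alias for filter().
ds.select($"FirstName").show()
ds.where($"FirstName" === "Raymond").show()
ds.filter($"FirstName" === "Raymond").show()

The code first defines a case class named Person, then creates a Dataset object from a CSV file. The spark.implicits._ import must be in scope before the .as[Person] call because it supplies the encoder for Person and the $"..." column syntax. In spark-shell this import is applied automatically.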

The file content looks like the following:

FirstName,LastName
"Raymond","Tang"
"Kontext","Admin"

Output:

scala> ds.filter($"FirstName" === "Raymond").show()
+---------+--------+
|FirstName|LastName|
+---------+--------+
|  Raymond|    Tang|
+---------+--------+
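
Because a Dataset is typed, the same kind of filter can also be written with an ordinary Scala lambda instead of a column expression. The following is a minimal sketch under stated assumptions: it uses in-memory sample rows rather than the CSV file, and the local session settings are chosen for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same shape as the Person class in the example above.
case class Person(FirstName: String, LastName: String)

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typed-dataset-operations")
  .getOrCreate()
import spark.implicits._

// In-memory stand-in for the CSV content.
val people: Dataset[Person] =
  Seq(Person("Raymond", "Tang"), Person("Kontext", "Admin")).toDS()

// Typed filter: the lambda receives a Person, so field access is checked at compile time.
val raymonds: Dataset[Person] = people.filter(p => p.FirstName == "Raymond")

// Typed map: produces a Dataset[String] of full names.
val fullNames: Dataset[String] = people.map(p => s"${p.FirstName} ${p.LastName}")

raymonds.show()
fullNames.show()
```

A misspelled field in the lambda version fails at compile time, whereas a misspelled column name in $"..." is only reported when the query is analyzed at runtime.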

Spark DataFrame examples

Kontext provides many examples of Spark DataFrame usage and transformations.

Refer to the series: Spark DataFrame Transformation Tutorials

References

Scala: Read CSV File as Spark DataFrame
