Convert List to Spark Data Frame in Python / Spark

access_time 2 years ago visibility3042 comment 0

In Spark, SparkContext.parallelize function can be used to convert list of objects to RDD and then RDD can be converted to DataFrame object through SparkSession.

In PySpark, we can convert a Python list to RDD using SparkContext.parallelize function.

|  Category|Count|       Description|
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|

Code snippet

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructField, StructType, StringType, IntegerType, DecimalType
from decimal import Decimal

appName = "PySpark Example - Python Array/List to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \

# List
data = [('Category A', Decimal(100), "This is category A"),
        ('Category B', Decimal(120), "This is category B"),
        ('Category C', Decimal(150), "This is category C")]

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', DecimalType(), True),
    StructField('Description', StringType(), True)

# Convert list to RDD
rdd = spark.sparkContext.parallelize(data)

# Create data frame
df = spark.createDataFrame(rdd,schema)
info Last modified by Raymond at 2 months ago copyright This page is subject to Site terms.
Like this article?
Share on

Please log in or register to comment.

account_circle Log in person_add Register

Log in with external accounts

Kontext Column

Created for everyone to publish data, programming and cloud related articles.
Follow three steps to create your columns.

Learn more arrow_forward

More from Kontext

Pandas DataFrame Plot - Line Chart

local_offer plot local_offer pandas local_offer jupyter-notebook local_offer python local_offer pandas-plot

visibility 213
thumb_up 0
access_time 6 months ago

This article provides examples about plotting line chart using pandas.DataFrame.plot function. The data I'm going to use is the same as the other article  Pandas DataFrame Plot - Bar Chart . I'm also using Jupyter Notebook to plot them. The DataFrame has 9 records: DATE TYPE SALES ...

local_offer teradata local_offer python local_offer python-database

visibility 1741
thumb_up 1
access_time 5 months ago

Pandas is commonly used by Python users to perform data operations. In many scenarios, the results need to be saved to a storage like Teradata. This article shows you how to do that easily using JayDeBeApi or  sqlalchemy-teradata   package.  JayDeBeApi package and Teradata JDBC ...

local_offer pyspark local_offer spark-2-x local_offer spark local_offer python local_offer spark-dataframe

visibility 5611
thumb_up 1
access_time 9 months ago

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python. data = [{"Category": 'Category A', "ID": 1, "Value": 12.40}, {"Category": 'Category B', "ID": 2, "Value": 30.10}, {"Category": 'Category C', "ID": 3, "Value": 100.01} ] The ...

About column

Code snippets for various programming languages/frameworks.

rss_feed Subscribe RSS