
In Spark, the SparkContext.parallelize function converts a list of objects to an RDD, which can then be converted to a DataFrame through SparkSession.

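In PySpark, we can convert a Python list to an RDD using the SparkContext.parallelize function and then pass that RDD, together with a schema, to SparkSession.createDataFrame. The code snippet in this article prints the following data frame: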

+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A|  100|This is category A|
|Category B|  120|This is category B|
|Category C|  150|This is category C|
+----------+-----+------------------+

Code snippet

from pyspark.sql import SparkSession
# Only the types used in the schema below are imported
from pyspark.sql.types import StructField, StructType, StringType, DecimalType
from decimal import Decimal

appName = "PySpark Example - Python Array/List to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# List
data = [('Category A', Decimal(100), "This is category A"),
        ('Category B', Decimal(120), "This is category B"),
        ('Category C', Decimal(150), "This is category C")]

# Create a schema for the dataframe
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', DecimalType(), True),
    StructField('Description', StringType(), True)
])

# Convert list to RDD
rdd = spark.sparkContext.parallelize(data)

# Create data frame
df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()

Last modified by Raymond 10 months ago.
