In Spark, the SparkContext.parallelize function converts a list of objects into an RDD, and that RDD can then be converted into a DataFrame through the SparkSession. In PySpark, this means a plain Python list can be turned into an RDD with spark.sparkContext.parallelize and then into a DataFrame with spark.createDataFrame.
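For example, a minimal sketch of the parallelize step alone (assuming a SparkSession named spark already exists):

# Convert a small Python list to an RDD (assumes an existing SparkSession `spark`)
numbers = [1, 2, 3, 4, 5]
rdd = spark.sparkContext.parallelize(numbers)
print(rdd.collect())   # [1, 2, 3, 4, 5]

The complete example later in this article produces the following DataFrame: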
+----------+-----+------------------+
|  Category|Count|       Description|
+----------+-----+------------------+
|Category A| 100|This is category A|
|Category B| 120|This is category B|
|Category C| 150|This is category C|
+----------+-----+------------------+
Code snippet
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, DecimalType
from decimal import Decimal

appName = "PySpark Example - Python Array/List to Spark Data Frame"
master = "local"

# Create Spark session
spark = SparkSession.builder \
    .appName(appName) \
    .master(master) \
    .getOrCreate()

# List of tuples to convert
data = [('Category A', Decimal(100), "This is category A"),
        ('Category B', Decimal(120), "This is category B"),
        ('Category C', Decimal(150), "This is category C")]

# Create a schema for the data frame
schema = StructType([
    StructField('Category', StringType(), True),
    StructField('Count', DecimalType(), True),
    StructField('Description', StringType(), True)
])

# Convert list to RDD
rdd = spark.sparkContext.parallelize(data)

# Create data frame
df = spark.createDataFrame(rdd, schema)
print(df.schema)
df.show()
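Note that building an intermediate RDD is not strictly required: createDataFrame also accepts a local Python list together with a schema. A minimal sketch, reusing the data and schema objects defined above:

# Create the data frame directly from the list, skipping parallelize
# (assumes the `spark`, `data` and `schema` objects from the snippet above)
df2 = spark.createDataFrame(data, schema)
df2.show()

Both approaches produce the same DataFrame; the explicit parallelize step is mainly useful when you want to apply RDD transformations before creating the DataFrame.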