3 years ago · English

Convert Python Dictionary List to PySpark DataFrame

This article shows how to convert a Python dictionary list to a DataFrame in Spark using Python.

data = [{"Category": 'Category A', "ID": 1, "Value": 12.40},
        {"Category": 'Category B', "ID": 2, "Value": 30.10},
        {"Category": 'Category C', "ID": 3, "Value": 100.01}]

The ...
Last modified by Raymond 2 years ago

Comments
Raymond, 2 years ago
#339 Re: Convert Python Dictionary List to PySpark DataFrame

Correct, that is more about Python syntax than anything specific to Spark.

I feel that explicitly specifying the attributes for each Row sometimes makes the code easier to read.
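
To illustrate the readability point, here is a minimal sketch that names each attribute explicitly. It uses collections.namedtuple under the hypothetical name RowLike as a stand-in for PySpark's Row, purely so the example runs without Spark. The explicit form shows which fields end up in each row and lets values be converted along the way.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, used only so this runs without Spark.
RowLike = namedtuple("RowLike", ["Category", "ID", "Value"])

data = [{"Category": "Category A", "ID": 1, "Value": 12.40},
        {"Category": "Category B", "ID": 2, "Value": 30.10}]

# Each attribute is spelled out, so the row layout is visible at a glance.
rows = [RowLike(Category=d["Category"], ID=int(d["ID"]), Value=float(d["Value"]))
        for d in data]

print(rows[0])
```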

In reply to Swapnil's comment #338.

Swapnil, 2 years ago
#338 Re: Convert Python Dictionary List to PySpark DataFrame

Like here:

I am reading a list in which each item is a CSV line:

rdd_f_n_cnt=['/usr/lcoal/app/,100,s3-xyz,emp.txt','/usr/lcoal/app/,100,s3-xyz,emp.txt']

and putting it into key=val format:

rdd_f_n_cnt_2 = rdd_f_n_cnt.map(lambda l:Row(path=l.split(",")[0],file_count=l.split(",")[1],folder_name=l.split(",")[2],file_name=l.split(",")[3]))


Indirectly, you are doing the same with **.
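
A possible tidier version of the parsing step above, offered as a sketch: split each line once and name the four fields in a dict. The helper name parse_line is hypothetical. The commented lines at the end assume an existing SparkContext named sc, since a plain Python list has no map method and would first need to be turned into an RDD.

```python
def parse_line(line):
    """Split one CSV line once and name its four fields."""
    path, file_count, folder_name, file_name = line.split(",")
    return {
        "path": path,
        "file_count": file_count,  # still a string; cast with int() if needed
        "folder_name": folder_name,
        "file_name": file_name,
    }

lines = ['/usr/lcoal/app/,100,s3-xyz,emp.txt',
         '/usr/lcoal/app/,100,s3-xyz,emp.txt']
records = [parse_line(l) for l in lines]
print(records[0])

# With PySpark (assuming a SparkContext named `sc` exists), each dict could be
# unpacked into a Row inside an RDD map:
#   from pyspark.sql import Row
#   rows = sc.parallelize(lines).map(lambda l: Row(**parse_line(l)))
```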

In reply to Raymond's comment #335.

Swapnil, 2 years ago
#337 Re: Convert Python Dictionary List to PySpark DataFrame

OK, got it.

I thought it needed only the format below:

key=val

Row(Category='Category A', ID=1, Value=1)


In reply to Raymond's comment #335.

Swapnil, 2 years ago
#336 Re: Convert Python Dictionary List to PySpark DataFrame

Hi Raymond,

Wonderful article. I was just confused by the line below:

df = spark.createDataFrame([Row(**i) for i in data])

I assume the Row class needs input like

row = Row(Category='Category A', ID=1, Value=1)

so how is this getting translated here? Or is it that when we give it key/value input, it understands it and creates the schema correctly?

Raymond, 2 years ago
#335 Re: Convert Python Dictionary List to PySpark DataFrame

Hi Swapnil,

Is this a question or comment?

If I understand your question correctly, you were asking about the following:

**i

** (double asterisk) denotes dictionary unpacking. It unpacks the dictionary's contents as keyword arguments to the Row constructor.
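
As a minimal illustration of the unpacking described above, the sketch below uses a hypothetical stand-in function, make_row, in place of PySpark's Row so that it runs without Spark. Row accepts keyword arguments in the same way.

```python
def make_row(**fields):
    """Hypothetical stand-in for Row: collect keyword arguments into a dict."""
    return dict(fields)

record = {"Category": "Category A", "ID": 1, "Value": 12.40}

# Explicit keyword arguments, written out by hand.
explicit = make_row(Category="Category A", ID=1, Value=12.40)

# ** unpacks the dictionary into the same keyword arguments.
unpacked = make_row(**record)

print(explicit == unpacked)  # True
```

Both calls receive identical keyword arguments, which is why Row(**i) and the hand-written key=val form produce the same fields.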
