Convert Python Dictionary List to PySpark DataFrame
Comments
#338 Re: Convert Python Dictionary List to PySpark DataFrame
For example:
I am reading a list in which each item is a CSV line:
rdd_f_n_cnt = spark.sparkContext.parallelize(['/usr/lcoal/app/,100,s3-xyz,emp.txt', '/usr/lcoal/app/,100,s3-xyz,emp.txt'])
and putting it into key=val format:
rdd_f_n_cnt_2 = rdd_f_n_cnt.map(lambda l: Row(path=l.split(",")[0], file_count=l.split(",")[1], folder_name=l.split(",")[2], file_name=l.split(",")[3]))
Indirectly, you are doing the same thing with **.
Raymond, 2 years ago
Re: Convert Python Dictionary List to PySpark DataFrame
Hi Swapnil,
Is this a question or a comment?
If I understand your question correctly, you were asking about the following:
**i
** (double asterisk) denotes dictionary unpacking. It unpacks the dictionary's key-value pairs and passes them as keyword arguments to the Row constructor.
#337 Re: Convert Python Dictionary List to PySpark DataFrame
Oh OK, got it.
I thought it only accepted the format below:
key=val
Row(Category='Category A', ID=1, Value=1)
#336 Re: Convert Python Dictionary List to PySpark DataFrame
Hi Raymond,
Wonderful article. I was just confused by the line below:
df = spark.createDataFrame([Row(**i) for i in data])
I assume the Row class needs input like
row = Row(Category='Category A', ID=1, Value=1)
so how is this getting translated here?
Or is it that when we give it input as key-value pairs, it understands them and creates the schema correctly?
#335 Re: Convert Python Dictionary List to PySpark DataFrame
Hi Swapnil,
Is this a question or a comment?
If I understand your question correctly, you were asking about the following:
**i
** (double asterisk) denotes dictionary unpacking. It unpacks the dictionary's key-value pairs and passes them as keyword arguments to the Row constructor.
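To illustrate the unpacking described above, here is a minimal plain-Python sketch. The function make_record is a hypothetical stand-in for a constructor that accepts keyword arguments (such as Row), used only so the example runs without Spark installed; the sample values are taken from the question.

```python
def make_record(**fields):
    """Hypothetical stand-in for a keyword-argument constructor such as Row."""
    # **fields collects every keyword argument into a dict.
    return fields


record = {"Category": "Category A", "ID": 1, "Value": 1}

# Writing the keyword arguments out by hand...
explicit = make_record(Category="Category A", ID=1, Value=1)

# ...is equivalent to unpacking the dictionary with **.
unpacked = make_record(**record)

print(explicit == unpacked)  # True
```

In other words, each key of the dictionary becomes an argument name and each value becomes that argument's value, which is why the keys must be strings that are valid argument names.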
#339 Re: Convert Python Dictionary List to PySpark DataFrame
Correct, that is more about Python syntax than anything specific to Spark.
I feel that explicitly specifying the attributes for each Row can sometimes make the code easier to read.
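For comparison, the CSV example from this thread could also be written in dictionary form. This is only a sketch in plain Python, with no Spark dependency: the field names come from the comment being replied to, while FIELDS and parse_line are hypothetical names introduced here. Note that the values stay as strings, just as they do with split in the original.

```python
# Field names as used in the key=val example from the thread.
FIELDS = ("path", "file_count", "folder_name", "file_name")


def parse_line(line):
    """Split a CSV line once and pair each value with its field name."""
    return dict(zip(FIELDS, line.split(",")))


lines = [
    "/usr/lcoal/app/,100,s3-xyz,emp.txt",
    "/usr/lcoal/app/,100,s3-xyz,emp.txt",
]

records = [parse_line(line) for line in lines]

# Each record is now a dict that could be passed on as keyword
# arguments, for example Row(**record).
print(records[0])
```

Which style reads better is a judgment call: the explicit key=val form shows the attribute names at the call site, while the dictionary form avoids calling split once per field.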