JSON DataFrame to CSV DataFrame

RAVULA ANILKUMAR · 2022-11-27 · 66 views · 6 comments

Why do we need a schema for a DataFrame before converting a JSON DataFrame to a CSV DataFrame in PySpark?

Comments
RAVULA ANILKUMAR · 2 years ago

Consider this array:

    {
      "Array 1": [
        {
          "Info_": {
            "a": 1,
            "b": 4
          },
          "Arry_inside": ["Item1", "Item2"]
        },
        {
          "Info_": {
            "a": 5,
            "b": 9
          },
          "Arry_inside": ["Item1", "Item2", "Item3", "Item4"]
        }
      ]
    }


In the JSON above, the inner array (Arry_inside) grows: its length differs from object to object. So how can we flatten it? I don't think that will work. Can you please look at this kind of JSON file?

Maybe there is some way to do it, but I don't know it.

So basically, in my case the JSON comes from a REST API response to a POST request sent to a server.

The response is JSON. It looks like the example above, which looks very simple, but the real one is not 😂, and I can't share it.

The response JSON mainly contains one array (Array 1), and inside it we get around 10k to 30k objects of struct type.

Inside each object, 10 keys hold struct-type values and 10 keys hold array-type values.

Null is allowed for all values.

So here is the deal: one of the array-type keys holds the messages. It sits inside the complex types, similar to the inner array in the JSON I mentioned: if you go inside its items, each one has a message key-value pair.

We need to extract each message into its own row.
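The extraction described above can be sketched in plain Python as a rough illustration. The real payload was not shared, so the key names ("Array 1", "Info_", "Arry_inside", "message") and the sample values are assumptions. Objects whose array is null or empty still yield one row with a null message, which is similar in spirit to what PySpark's `explode_outer` does for array columns.

```python
import json

# Hypothetical response shape: the real payload was not shared in the thread,
# so every key name and value here is an assumption made for illustration.
response_text = """
{
  "Array 1": [
    {"Info_": {"a": 1, "b": 4},
     "Arry_inside": [{"message": "first"}, {"message": "second"}]},
    {"Info_": {"a": 5, "b": 9},
     "Arry_inside": null},
    {"Info_": null,
     "Arry_inside": [{"message": "third"}, {"message": null}]}
  ]
}
"""


def message_rows(payload):
    """Yield one flat dict per message.

    Null structs become null columns, and a null or empty array
    still produces a single row whose message is None.
    """
    for obj in payload.get("Array 1") or []:
        info = obj.get("Info_") or {}
        items = obj.get("Arry_inside") or [None]
        for item in items:
            yield {
                "a": info.get("a"),
                "b": info.get("b"),
                "message": (item or {}).get("message"),
            }


rows = list(message_rows(json.loads(response_text)))
for row in rows:
    print(row)
```

Because the loop runs once per array element, it does not matter that the inner array has a different length in every object.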



Raymond · 2 years ago

Can you please define what you mean by a JSON or CSV DataFrame?

If a JSON document is loaded into memory as a DataFrame, it can then be saved in another format such as CSV.

Before you load a JSON document, you can usually define a schema, or Spark can infer the schema.
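To illustrate the difference between declaring and inferring a schema, here is a small plain-Python analogy. It is not Spark's actual inference algorithm, and the records, the `infer_schema` helper, and the type names are all hypothetical. In PySpark itself, a declared schema is typically passed to the reader with `spark.read.schema(...)` before calling `.json(...)`.

```python
# Hypothetical records: "b" is null everywhere, and "c" appears only in the second one.
records = [
    {"a": 1, "b": None},
    {"a": 5, "b": None, "c": "x"},
]


def infer_schema(rows):
    """Scan every row and record the type name seen for each key.

    A key that only ever holds None keeps the type None.
    """
    schema = {}
    for row in rows:
        for key, value in row.items():
            if value is not None:
                schema[key] = type(value).__name__
            else:
                schema.setdefault(key, None)
    return schema


# A declared schema fixes the column names and types up front, without scanning the data.
declared_schema = {"a": "int", "b": "int", "c": "str"}

inferred = infer_schema(records)
print(inferred)         # the all-null key "b" gives inference nothing to work with
print(declared_schema)
```

When every value is allowed to be null, as in the response described in this thread, a declared schema keeps the columns stable even if a given batch contains only nulls for some key.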

RAVULA ANILKUMAR · 2 years ago

For CSV:

The data can easily go through ETL jobs.

But not for JSON DataFrames.

For JSON with complex nested levels, for example:

    struct {
      Array 1: [
        Info_: {
          Some values
        },
        Arry_inside: [
          Some values
        ]
      ]
    }

we don't have the option of treating it like CSV, so we need to extract the values first. We need to produce the output file as CSV from the JSON.

Raymond · 2 years ago

The CSV format is flat and doesn't support nested types, so you will need to flatten your DataFrame before writing it as CSV.
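As a minimal sketch of what flattening means here, the plain-Python example below lifts the struct fields to top-level columns, emits one row per array element, and writes the result as CSV. The records mirror the example earlier in the thread; the `flatten` helper and the column name `item` are made up for illustration. In PySpark the same idea is usually expressed by selecting the struct fields (for example `Info_.a`) and applying `explode` to the array column before writing.

```python
import csv
import io

# Hypothetical nested records shaped like the example earlier in the thread.
nested = [
    {"Info_": {"a": 1, "b": 4}, "Arry_inside": ["Item1", "Item2"]},
    {"Info_": {"a": 5, "b": 9}, "Arry_inside": ["Item1", "Item2", "Item3", "Item4"]},
]


def flatten(records):
    """Lift struct fields to top-level columns and emit one row per array element."""
    for record in records:
        info = record["Info_"]
        for item in record["Arry_inside"]:
            yield {"a": info["a"], "b": info["b"], "item": item}


flat_rows = list(flatten(nested))

# Every row now has the same scalar columns, so it fits the CSV format.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["a", "b", "item"], lineterminator="\n")
writer.writeheader()
writer.writerows(flat_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The struct values are repeated on each row that comes from the same object, which is the usual price of turning a nested array into flat rows.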

