Spark & PySpark
Apache Spark installation guides, performance tuning tips, general tutorials, etc.
*Spark logo is a registered trademark of Apache Spark.
Hi Raymond,
But it takes a lot of time because of df.collect().
Is there any way to speed up this process? I tried using --num-executors 5 in spark-submit, but there was no change in performance. If possible, please also explain how --num-executors can be leveraged in this case. Since it's a PySpark DataFrame, I also tried df1 = df.toPandas(), but again there was no change in performance.