Diagram Raymond Raymond

Spark Application Anatomy

2022-08-23

This diagram depicts the relationships among a Spark application and its jobs, stages and tasks.

One Spark application can contain multiple actions, and each action corresponds to one Spark job. Running the computation within a job may involve multiple stages, because some actions cannot be completed in a single stage. Each stage consists of many tasks, and the task count is determined by the number of partitions in the RDD/DataFrame. A task is the smallest unit of parallelism in Spark.
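The hierarchy described above can be sketched as a small toy model in plain Python. This is not the Spark API: the class names (`Application`, `Job`, `Stage`), the `run_action` helper and the partition counts are all hypothetical and exist only to illustrate the counting. The second job assumes a stage boundary such as a shuffle, which is one common reason a job needs more than one stage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    stage_id: int
    num_partitions: int  # assumption of this toy model: one task per partition

    @property
    def task_count(self) -> int:
        return self.num_partitions


@dataclass
class Job:
    action: str  # the action that triggered this job
    stages: List[Stage] = field(default_factory=list)

    @property
    def task_count(self) -> int:
        # A job's tasks are the sum of the tasks of all its stages.
        return sum(stage.task_count for stage in self.stages)


@dataclass
class Application:
    name: str
    jobs: List[Job] = field(default_factory=list)

    def run_action(self, action: str, partitions_per_stage: List[int]) -> Job:
        """Record one job for one action; each list entry is one stage's partition count."""
        stages = [Stage(i, n) for i, n in enumerate(partitions_per_stage)]
        job = Job(action, stages)
        self.jobs.append(job)
        return job


app = Application("toy-app")
# An action that fits in a single stage over 8 partitions: 8 tasks.
app.run_action("count", [8])
# An action that needs two stages (8 partitions, then 4): 12 tasks.
app.run_action("collect", [8, 4])

for job in app.jobs:
    print(job.action, len(job.stages), job.task_count)
# prints:
# count 1 8
# collect 2 12
```

In a real application the same numbers (jobs per action, stages per job, tasks per stage) can be read from the Spark web UI rather than computed by hand.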
