Spark Application Anatomy

This diagram depicts the relationships among a Spark application and its jobs, stages, and tasks.

One Spark application can contain multiple actions, and each action typically triggers one Spark job. Running the computation within a job may involve multiple stages, because some actions cannot be completed within a single stage (for example, when data must be shuffled between partitions). Each stage consists of many tasks, and the number of tasks in a stage is determined by the number of partitions in the RDD/DataFrame that the stage processes. A task is the smallest unit of parallel execution in Spark.
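To make the hierarchy concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `Stage`, `Job`, and `plan_job` are hypothetical names introduced only for illustration. It assumes that a job is split into a new stage at every shuffle and that each stage runs one task per partition. As a simplification, the shuffling operation is listed in the stage that follows the shuffle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    """Toy stand-in for a Spark stage (hypothetical, not a Spark class)."""
    operations: List[str]
    num_partitions: int

    @property
    def num_tasks(self) -> int:
        # Assumption from the text: one task per partition.
        return self.num_partitions


@dataclass
class Job:
    """Toy stand-in for the job triggered by a single action."""
    action: str
    stages: List[Stage]

    @property
    def num_tasks(self) -> int:
        return sum(stage.num_tasks for stage in self.stages)


def plan_job(
    transformations: List[Tuple[str, Optional[int]]],
    action: str,
    input_partitions: int,
) -> Job:
    """Split a chain of transformations into stages at shuffle boundaries.

    Each transformation is (name, shuffle_partitions). A value of None means
    no shuffle is needed; an integer means the operation shuffles the data
    into that many partitions, which starts a new stage.
    """
    stages: List[Stage] = []
    current_ops: List[str] = []
    partitions = input_partitions

    for name, shuffle_partitions in transformations:
        if shuffle_partitions is None:
            # No shuffle: the operation stays in the current stage.
            current_ops.append(name)
        else:
            # Shuffle: close the current stage and start a new one.
            stages.append(Stage(current_ops, partitions))
            current_ops = [name]
            partitions = shuffle_partitions

    # The action runs at the end of the final stage.
    current_ops.append(action)
    stages.append(Stage(current_ops, partitions))
    return Job(action, stages)


# One action -> one job; the shuffle splits the job into two stages.
job = plan_job(
    transformations=[("map", None), ("reduceByKey", 4), ("filter", None)],
    action="collect",
    input_partitions=8,
)

for index, stage in enumerate(job.stages):
    print(f"stage {index}: ops={stage.operations}, tasks={stage.num_tasks}")
print(f"total tasks in job: {job.num_tasks}")
```

In this sketch, an input with 8 partitions that is shuffled into 4 partitions yields a job with two stages, running 8 and 4 tasks respectively. A real Spark scheduler takes more into account (for instance, cached data and skipped stages), so treat the numbers as an illustration of the relationships only.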
