Job

A piece of code in which all the operations to be performed are specified

Task

A single unit of work performed on one partition of data

Stage

A set of operations that are pipelined together
A shuffle operation always creates a new stage

No. of tasks = No. of partitions
1 task runs on 1 partition on 1 executor
No. of stages = No. of shuffles + 1 (see the sketch below)
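
A minimal PySpark sketch of these rules, assuming a local[*] SparkContext and made-up data (the app name and numbers are illustrative, not from the source): narrow transformations such as map and filter are pipelined into one stage, while reduceByKey forces a shuffle and therefore a second stage, and each stage runs one task per partition.

```python
from pyspark import SparkContext

# Assumed local setup, for illustration only.
sc = SparkContext("local[*]", "stage-task-demo")

# 4 partitions -> the first stage runs 4 tasks, one per partition on an executor.
rdd = sc.parallelize(range(100), numSlices=4)

result = (rdd.map(lambda x: (x % 10, x))          # narrow: pipelined into stage 0
             .filter(lambda kv: kv[1] > 5)        # narrow: still stage 0
             .reduceByKey(lambda a, b: a + b)     # wide: shuffle -> new stage
             .collect())                          # action: one job, 1 shuffle + 1 = 2 stages

print(result)
sc.stop()
```

While an application is running, the Spark UI (default port 4040) shows the same breakdown: jobs, the stages in each job, and the tasks in each stage.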

Job Execution

When an action is encountered in the code, a job is created
The DAG Scheduler is responsible for building the graph of the computation and splitting it into stages of tasks, which it submits to the Task Scheduler
The Task Scheduler launches the tasks via the cluster manager (responsible for allocating workers, and executors within the workers)
The Cluster Manager instructs the worker nodes to execute the job; it keeps track of the jobs and reports their status back (a sketch of this flow follows)
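
A standalone sketch of this flow, again under assumed local[*] settings and with illustrative data: nothing runs while the lineage is being built; the job is only created when the action (collect) executes, which is when the DAG Scheduler, Task Scheduler, and cluster manager come into play.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "job-execution-demo")   # assumed local setup

counts = (sc.parallelize(["a b a", "b c"], numSlices=2)
            .flatMap(lambda line: line.split())        # narrow: stage 0
            .map(lambda w: (w, 1))                     # narrow: still stage 0
            .reduceByKey(lambda a, b: a + b))          # wide: shuffle boundary -> stage 1

# No job has been created yet; only the lineage (DAG) exists.
# toDebugString() prints that lineage; indentation marks the shuffle/stage boundaries.
print(counts.toDebugString())

# The action creates the job: the DAG Scheduler splits the graph into stages,
# the Task Scheduler launches the tasks, and executors on the workers run them.
print(counts.collect())
sc.stop()
```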

Source: Spark Basics: RDDs, Stages, Tasks and DAG | by Saurabh Goyal | Medium