Master
sbin/start-master.sh
localhost:8080
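A quick sketch of bringing up a local standalone cluster (assumes a working Spark install under $SPARK_HOME; host and ports are the defaults):

```shell
# Start the standalone master; its web UI serves on localhost:8080 by default
$SPARK_HOME/sbin/start-master.sh

# Start a worker and register it with the local master
# (7077 is the default master RPC port, distinct from the 8080 web UI)
$SPARK_HOME/sbin/start-worker.sh spark://localhost:7077
```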
Spark Jobs
pyspark
localhost:4040
sc.uiWebUrl
: Returns the URL of the running Spark UI, including the port number
If we open a second shell, its UI binds to the next free port, localhost:4041, and so on; the number of ports tried is controlled by spark.port.maxRetries (default 16)
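A hedged sketch of finding the UI port from inside the shell (the printed URL is illustrative):

```shell
# Launch an interactive PySpark shell; its UI takes the first free port starting at 4040
pyspark

# Inside the shell, ask the SparkContext for the actual UI address:
#   >>> sc.uiWebUrl
#   'http://localhost:4040'   (4041, 4042, ... if earlier ports are taken)
```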
Spark History Server
start-history-server.sh
localhost:18080
Spark Script (Spark 1.0)
For RDDs, the SparkContext is the entry point - legacy API (Spark versions below 2.0)
For Datasets and DataFrames, the SparkSession is the entry point (Spark 2.0+)
Run a Spark Script
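A minimal sketch of the old Spark 1.x style, where the SparkContext is created explicitly (file name and RDD contents are made up for illustration):

```shell
# Write a tiny SparkContext-based script, then submit it
cat > rdd_example.py <<'EOF'
# Spark < 2.0 style: SparkContext is the entry point for RDDs
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("RDDExample")
sc = SparkContext(conf=conf)

rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * 2).collect())

sc.stop()
EOF

spark-submit rdd_example.py
```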
Spark Scripts (Spark 2.0+)
Run a Spark Script
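A minimal sketch of the Spark 2.0+ style, where the SparkSession is the unified entry point (file name, app name, and data are made up for illustration):

```shell
# Write a tiny SparkSession-based script, then submit it
cat > df_example.py <<'EOF'
# Spark 2.0+ style: SparkSession is the unified entry point
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DFExample").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()

spark.stop()
EOF

spark-submit df_example.py
```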
Run Spark Job using YARN
--master
: Selects the cluster manager used to run the job (yarn, mesos, spark://... for standalone, local, etc.)
--executor-memory 20G
: Maximum memory available to each executor
--total-executor-cores 200
: Total cores across all executors (standalone/Mesos only; on YARN, use --num-executors and --executor-cores instead)
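Putting the flags together, a sketch of a YARN submission (app file, deploy mode, and sizes are illustrative; note the per-executor flags that YARN uses in place of --total-executor-cores):

```shell
# Submit a job to a YARN cluster; driver runs inside the cluster in cluster mode
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  --executor-cores 4 \
  my_job.py
```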
Spark History Server
These configurations enable monitoring for all Spark jobs
We can view information about the jobs even after they complete
Config Location: $SPARK_HOME/conf/spark-defaults.conf
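A minimal sketch of the relevant settings in spark-defaults.conf (the event-log directory path is illustrative; it must exist before jobs run):

```
# Enable event logging so finished jobs appear in the History Server
spark.eventLog.enabled           true
spark.eventLog.dir               file:///tmp/spark-events
# The History Server reads completed-application logs from here
spark.history.fs.logDirectory    file:///tmp/spark-events
```

With these set, start-history-server.sh serves the completed-job UI on localhost:18080.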
Spark-Submit Command Line Arguments - Gankrin
Understanding Apache Spark on YARN · Sujith Jay Nair
Things you need to know about Hadoop and YARN being a Spark developer
Running Spark Jobs on YARN | Medium