Read file from Local FS
For HDFS, specify the path as "hdfs://localhost:9000/lti/sample.txt"
Create Partitions of List
Flat Map
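Applies a function that returns a sequence for each element and flattens the results into one RDD. Used to break data into smaller chunks, for example lines into words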
Flat Map Values
Used to flatten the values of a key-value pair while keeping the key. The function can only access the value field
Map
Applies a function to every element, producing exactly one output per input. Often used to create (key, value) pairs of data. Returns a new RDD
Map Values
Used to apply a map function to the values of (key, value) pairs. The function can only access the value field; keys are left unchanged
Filter
The supplied function returns a Boolean value. Only elements for which it returns True are allowed to pass through
Caching of Data
- An action needs to be performed before the data is actually cached
- Cached RDDs are listed under the Storage tab in the Spark Web UI
Number of Partitions
Partitions
Actions
Print the RDD
Save the RDD as a text file
Count By Value
Groups identical values together and counts them. Returns a defaultdict mapping each value to its count
Reduce by Key
Aggregates the values that share the same key
Values are first reduced within each partition (the mapper stage), and the partial results are then combined after the shuffle (the reducer stage)
Group By Key
Groups values with the same key. Nothing is combined at the mapper stage: all values are shuffled and the grouping is performed at the reducer stage
Sort By Key
Sorts a (key, value) RDD by its keys
Create RDD of just Keys or Values