Read File
When a file is read with the wrong format, Spark may not raise an error, but the data will not be read correctly and may be only partially loaded.
Write Data to Disk
View Structure of Data
View Data
Number of Partitions
Change Column Names
Replace Values in a Column
Change Datatype of Column
Creating New Column (Constant Value)
Filter Data
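Use & (and), | (or), and ~ (not) to chain multiple conditions together. Wrap each condition in parentheses, because these operators bind more tightly than comparisons in Python.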
Remove Duplicate Values
Convert RDD Data to DF