Delta Lake adds ACID properties to data stored in a data lake
Delta Lake API: enforces and validates schema on top of the files in the data lake
JSON transaction logs: record the changes and operations performed on files
Delta tables keep their operation metadata in the JSON logs (_delta_log)
By default, all tables created in Databricks use the Delta format (Databricks Runtime 8.0 and above)
Understanding the Delta Lake Transaction Log - Databricks Blog
Delta Lake Tutorial: How to Easily Delete, Update, and Merge Using DML
Files marked as add in the transaction log are part of the current active version of the table, while files marked as remove are no longer needed by it
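The add/remove bookkeeping can be sketched in plain Python. This is a minimal illustration, not Delta Lake's own reader: it assumes each commit file holds one JSON action per line, and the function name active_files, the file names, and the trimmed-down log contents are all hypothetical.

```python
import json


def active_files(commits):
    """Replay commits (oldest first) and return the set of active file paths.

    Each commit is the text of one JSON log file: one JSON action per line.
    """
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                # The file becomes part of the current table version.
                files.add(action["add"]["path"])
            elif "remove" in action:
                # The file is no longer part of the current table version.
                files.discard(action["remove"]["path"])
    return files


# Hypothetical log contents, trimmed to the fields used in this sketch.
commit_0 = "\n".join([
    json.dumps({"commitInfo": {"operation": "WRITE"}}),
    json.dumps({"add": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0001.parquet"}}),
])
commit_1 = "\n".join([
    json.dumps({"commitInfo": {"operation": "DELETE"}}),
    json.dumps({"remove": {"path": "part-0001.parquet"}}),
    json.dumps({"add": {"path": "part-0002.parquet"}}),
])

print(sorted(active_files([commit_0, commit_1])))
```

Replaying only commit_0 gives the table as it was at that version, which is roughly how reading an older version of the table works.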
NOTE
- ZORDER co-locates rows with similar values in the chosen columns in the same files, which can speed up lookups
- A Delta table can only be created on Parquet-format files
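As a rough illustration of the idea behind ZORDER (not Delta Lake's actual implementation), the sketch below interleaves the bits of two integer columns into a single Z-order (Morton) key and sorts rows by it. The function name z_value and the sample rows are assumptions made for this example.

```python
def z_value(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into a single Z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


# Hypothetical rows with two integer columns (x, y).
rows = [(7, 7), (0, 1), (6, 7), (1, 0), (7, 6), (0, 0)]

# Sorting by the interleaved key keeps rows that are close in both columns together.
ordered = sorted(rows, key=lambda r: z_value(*r))
print(ordered)
```

After sorting, the rows with small values in both columns sit next to each other, and so do the rows with large values. If such runs are written to separate files, a lookup on either column can skip the files whose value range does not match.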
IMPORTANT
- By default, VACUUM will not delete data that is less than 7 days old. The retention duration safety check (spark.databricks.delta.retentionDurationCheck.enabled) must be explicitly disabled before a shorter retention period can be used.
- Sometimes older versions of the data can still be accessed even after running the VACUUM command. This happens because the values were cached by the active cluster.
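The 7-day rule can be pictured with a small Python sketch. It is a simplification under stated assumptions, not how VACUUM is implemented: it only looks at files already marked as removed, and the names vacuum_candidates and DEFAULT_RETENTION, the file names, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed default retention window, matching the 7-day rule described above.
DEFAULT_RETENTION = timedelta(days=7)


def vacuum_candidates(removed, now, retention=DEFAULT_RETENTION):
    """Return paths of removed files whose removal time is older than the retention window."""
    cutoff = now - retention
    return [path for path, removed_at in removed.items() if removed_at < cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)

# Hypothetical removed files with the time each one was marked as removed.
removed = {
    "part-0001.parquet": datetime(2024, 1, 1, tzinfo=timezone.utc),  # removed 9 days ago
    "part-0003.parquet": datetime(2024, 1, 8, tzinfo=timezone.utc),  # removed 2 days ago
}

print(vacuum_candidates(removed, now))
print(vacuum_candidates(removed, now, retention=timedelta(0)))
```

With the default window, only the file removed 9 days ago qualifies. Shrinking the window to zero makes both files eligible, which is the situation the safety check is there to prevent by default.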