Databases
Default location of a database: dbfs:/user/hive/warehouse/<database_name>.db/
The folder that contains the database's objects is named after the database and ends with the .db extension
JSON Data
Nested values in a JSON string column can be accessed using the : operator
We can use the schema_of_json() function to derive a schema from a sample JSON string, and the from_json() function to parse the JSON column into a struct using that schema
Once a schema has been applied to the data, nested fields can be accessed using the . operator
Once a schema has been applied to JSON data, the * operator can be used to expand the fields of the resulting struct into separate columns
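The steps above can be modelled in plain Python to show what each operator does to the data. This is only a sketch of the semantics, not Spark code: the sample record, its field names, and the helper expand_struct are assumptions made up for illustration.

```python
import json

# Hypothetical sample value of a JSON string column (field names are assumptions).
raw = '{"first_name": "Ana", "address": {"city": "Lisbon", "country": "Portugal"}}'

# Parsing the string into a structured value is the role from_json() plays
# once a schema is known.
profile = json.loads(raw)

# Nested access on the parsed value, the analogue of profile.address.city.
city = profile["address"]["city"]


def expand_struct(row, column):
    """Model of the * operator: promote each top-level field of the struct
    stored in `column` to its own column. Deeper levels stay nested."""
    expanded = {name: value for name, value in row.items() if name != column}
    expanded.update(row[column])
    return expanded


row = {"customer_id": 1, "profile": profile}
flat_row = expand_struct(row, "profile")
```

Here flat_row has the columns customer_id, first_name, and address; address is still a nested value, because the expansion only goes one level deep.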
Array Functions
The explode function puts each element of an array on its own row
The collect_set function collects the unique values of a field, including fields within arrays (NULL values are excluded)
The flatten function combines an array of arrays into a single array
The array_distinct function removes duplicate elements from an array
The TRANSFORM function is a higher-order function that applies a given function to each element of an array
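The behaviour of the array functions above can be illustrated with small plain-Python equivalents. These are hypothetical stand-ins written for this note, not the Spark implementations, and the orders data is invented sample input.

```python
# Hypothetical sample rows; "items" plays the role of an array column.
orders = [
    {"customer": "a", "items": ["pen", "ink"]},
    {"customer": "a", "items": ["pen", None]},
    {"customer": "b", "items": []},
]


def explode(rows, column):
    """One output row per array element; an empty array yields no rows."""
    return [{**row, column: element} for row in rows for element in row[column]]


def collect_set(values):
    """Aggregate values into a set of unique items, skipping NULL (None)."""
    return {value for value in values if value is not None}


def flatten(nested):
    """Combine an array of arrays into a single array (one level deep)."""
    return [element for inner in nested for element in inner]


def array_distinct(values):
    """Remove duplicates; this sketch keeps the first occurrence of each."""
    return list(dict.fromkeys(values))


def transform(values, fn):
    """Apply fn to every element of the array and return the new array."""
    return [fn(value) for value in values]


exploded = explode(orders, "items")
unique_items = collect_set(row["items"] for row in exploded)
all_items = array_distinct(flatten(row["items"] for row in orders))
doubled = transform([1, 2, 3], lambda x: x * 2)
```

Combining flatten with array_distinct, as in all_items, is a common way to merge several arrays and then drop the duplicates.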
Pivot Table
A pivot table converts the distinct values of a column into separate columns, aggregating the rows that fall under each value
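As a rough illustration of what a pivot does to the shape of the data, the following plain-Python sketch pivots quarter values into columns and sums the amounts. The sales rows and the pivot_sum helper are assumptions for this example; it is not the SQL PIVOT syntax itself.

```python
from collections import defaultdict

# Hypothetical input rows: (year, quarter, amount).
sales = [
    ("2023", "Q1", 10),
    ("2023", "Q2", 15),
    ("2024", "Q1", 7),
    ("2023", "Q1", 5),
]


def pivot_sum(rows):
    """Group by the first field, turn each distinct value of the second field
    into a column, and sum the third field within each cell."""
    table = defaultdict(dict)
    for key, pivot_value, amount in rows:
        table[key][pivot_value] = table[key].get(pivot_value, 0) + amount
    return dict(table)


pivoted = pivot_sum(sales)
```

Each key of pivoted is one output row and each inner key is a column produced from a quarter value. A combination with no input rows is simply absent in this sketch, where a SQL pivot would typically show NULL.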
Pivot in SQL - Databricks