Custom Partition
Partition is generally created on the column that has repeated data
By having partitions we can avoid entire table scan when we perform aggregate functions
User decides the value based on which the partition will be created (User control movement of data)
Custom partitioning can be used when we have definite data
The amount of files that are created for each partition depends on the number of files that are present in the source table
The partitioned data is stored in their own folder under the table folder in the format /tb_name/partition_column=value
(e.g. /sales/country=India)
Static Partition - Using Existing Column
The column used for partitioning should not be included in the table creating part of the query
The column values are case sensitive
Static Partition - Using New Column
This approach is rarely used in real time environment
Dynamic Partitioning
Dynamic is used when we have no idea about the data that is going to be stored so we only specify the column based on which the partition should be created
By default dynamic partition is not allowed it needs to be enabled
When inserting values into the table it should be specified in the same order as set during table creation
In the table a folder with the 1st partition name will be created which will have folders with the name of the 2nd partition which will have files equal to the number to the source table
NOTE
- During table creating Static and Dynamic partitioning can be specified together
- Static partition should always be specified first
PARTITION BY(country='India', year)