In bucketing, the user does not choose which bucket (partition) a row goes to; it is decided by the hash value of the bucketing column(s).
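Bucketing is mainly used when we want to sample data. It is also used when a column has too many distinct values (high cardinality) to be partitioned on, because partitioning on it would create a very large number of partitions.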
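To see why bucketing helps sampling, here is a minimal Python sketch, assuming `zlib.crc32` as a stand-in for Hive's hash function (Hive uses its own per-type hashing, so real bucket numbers will differ). Keeping only the rows whose bucket is `x - 1` out of `y` buckets gives a deterministic sample of roughly `1/y` of the distinct keys. This mirrors the idea behind Hive's `TABLESAMPLE(BUCKET x OUT OF y ON col)` clause. The helper names `bucket_of` and `sample_bucket` are hypothetical.

```python
import zlib

def bucket_of(key: str, num_buckets: int) -> int:
    # Assumption: zlib.crc32 stands in for Hive's hash function. It is
    # deterministic across runs, unlike Python's built-in hash() on str.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def sample_bucket(keys, x: int, y: int):
    # Keep only keys in 1-based bucket x out of y buckets, i.e. 0-based
    # bucket x - 1. Each distinct key always lands in the same bucket.
    return [k for k in keys if bucket_of(k, y) == x - 1]
```

For example, `sample_bucket([f"user{i}" for i in range(1000)], 1, 10)` returns roughly a tenth of the keys, and the same ones every time. In Hive, when the table is already bucketed into `y` buckets on the sampled column, this kind of sample can read just the matching bucket file instead of scanning the whole table.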
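Bucketing uses the same idea MapReduce uses to partition map output among reducers: each key is assigned by its hash value, and each bucket is written out as a separate file.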
Bucketing on multiple columns is possible; the values of all the bucketing columns together are taken as the key.
Formula: bucket number = hash(key) mod (number of buckets)
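The formula, including the multi-column case, can be sketched in Python as below. This is an illustration under stated assumptions, not Hive's implementation: it assumes a Java-style `String.hashCode` for text, treats integers as hashing to themselves, and folds several columns together with the common `31 * h + next` pattern. Hive's real per-type hash functions may give different bucket numbers. Clearing the sign bit before the modulo follows the same `(hash & Integer.MAX_VALUE) % n` shape that Hadoop's `HashPartitioner` uses to pick a reducer. All function names here are hypothetical.

```python
from collections import defaultdict

def to_int32(x: int) -> int:
    # Wrap a Python int to a signed 32-bit value, like Java's int overflow
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def java_string_hash(s: str) -> int:
    # Java-style String.hashCode (matches Java for ASCII text)
    h = 0
    for ch in s:
        h = to_int32(31 * h + ord(ch))
    return h

def column_hash(value) -> int:
    # Simplification (assumption): ints hash to themselves, others via str()
    if isinstance(value, int):
        return to_int32(value)
    return java_string_hash(str(value))

def bucket_for(key: tuple, num_buckets: int) -> int:
    # Multi-column key: fold all column hashes into one hash value
    h = 0
    for value in key:
        h = to_int32(31 * h + column_hash(value))
    # Clear the sign bit so the bucket number is never negative
    return (h & 0x7FFFFFFF) % num_buckets

def write_buckets(rows, key_columns, num_buckets: int):
    # Group rows by bucket number; each group stands in for one bucket file
    buckets = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        buckets[bucket_for(key, num_buckets)].append(row)
    return dict(buckets)
```

For instance, `bucket_for((5,), 4)` is `5 mod 4 = 1`, and `write_buckets(rows, ["id", "country"], 4)` groups rows so that every row with the same `(id, country)` pair ends up in the same bucket.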
INFO
- To create a table in a Hive database, create it using the Hive CLI
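- Spark's own bucketing (bucketBy) uses a different hash function than Hive and, in many Spark versions, does not write Hive-compatible bucketed data, so bucketed Hive tables should be created and loaded using the Hive CLI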