They are used to limit and account for the resources (CPU, memory, I/O, network, etc.) that processes can use
Each resource has its own hierarchy tree
Every process belongs to exactly one node in each resource's hierarchy tree
Each hierarchy starts out with only one node (the root node), and every process starts out at this node
Every node in the hierarchy tree is a group of processes that share the same resource
PID 1 is at the root of each hierarchy
A new process starts in its parent's group
Groups are materialized by a pseudo-filesystem: /sys/fs/cgroup
A new cgroup can be created using mkdir in the pseudo-fs: mkdir /sys/fs/cgroup/memory/mycgroup
Move a process to a group: echo $PID > /sys/fs/cgroup/../tasks
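The mkdir and echo steps above can be wrapped in two small helpers. This is a minimal sketch assuming a cgroup v1 layout; the names cg_create, cg_attach and the CGROUP_ROOT variable are hypothetical, introduced only so the root can be pointed somewhere other than /sys/fs/cgroup.

```shell
# Root of the cgroup pseudo-filesystem. CGROUP_ROOT is a hypothetical override
# added for illustration; it defaults to the usual mount point.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# cg_create CONTROLLER NAME
# Create a group under the given controller's hierarchy.
cg_create() {
    mkdir -p "$CGROUP_ROOT/$1/$2"
}

# cg_attach CONTROLLER NAME PID
# Move a process into the group by writing its PID to the "tasks" file
# (on a real cgroup v1 mount the kernel creates this file for every group).
cg_attach() {
    echo "$3" > "$CGROUP_ROOT/$1/$2/tasks"
}
```

Run as root, `cg_create memory mycgroup` followed by `cg_attach memory mycgroup $$` would move the current shell into the new group.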
Memory Cgroup
Accounting
Keeps track of the amount of memory used by each process or group of processes
Memory is accounted in pages (4 KB on most systems)
Memory pages are of two types:
- file (backed by data present on disk)
- anonymous (not backed by data on disk)
There are two pools: active (recently accessed) and inactive (candidate for eviction). Based on the memory available, the kernel decides which pages need to be evicted
Each page is charged to (tagged with) a group
If multiple groups access a page, only one group is charged for it (the first one to touch it), i.e. the memory occupied by the page is counted against only one group
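The accounting described above can be read back from the group's directory. A minimal sketch, assuming the cgroup v1 file names memory.usage_in_bytes and memory.stat; the function names are hypothetical, and the group directory is passed in explicitly.

```shell
# cg_mem_pages DIR [PAGE_SIZE]
# Print the group's current memory usage in pages. PAGE_SIZE defaults to the
# system page size reported by getconf.
cg_mem_pages() {
    page_size="${2:-$(getconf PAGESIZE)}"
    usage_bytes=$(cat "$1/memory.usage_in_bytes")
    echo $(( usage_bytes / page_size ))
}

# cg_mem_pools DIR
# Print the active/inactive counters for file and anonymous pages.
# The leading ^ skips the hierarchical "total_" variants of the same counters.
cg_mem_pools() {
    grep -E '^(active|inactive)_(anon|file) ' "$1/memory.stat"
}
```

For example, `cg_mem_pages /sys/fs/cgroup/memory/mycgroup` prints the number of pages currently charged to the group.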
Limits
Each group can be assigned an optional limit
Limits are of two types: hard and soft
When a group exceeds its hard limit, the Out of Memory (OOM) killer is triggered for that group and kills one of its processes (the victim is chosen by the kernel's OOM scoring, not at random)
Soft limits are not enforced, but when the system comes under memory pressure they are checked: the further a group is above its soft limit, the more likely its pages are to be reclaimed
These limits can be set for different kinds of memory as well: physical, kernel, and total (RAM + swap)
OOM notification can be enabled, which causes a group whose hard limit is exceeded to freeze instead of having a process killed. A notification is raised that the user can act on, and once the group's memory usage is below the limit its processes can be unfrozen
Every time the kernel gives a page to a process or removes a page from it, it has to update a counter
There is a slight overhead associated with this. Accounting cannot be enabled or disabled per process; it is set for the whole machine
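The hard limit, soft limit, and OOM behaviour described above map onto files in the group's directory. A minimal sketch, assuming the cgroup v1 file names memory.limit_in_bytes, memory.soft_limit_in_bytes, and memory.oom_control; the function names are hypothetical.

```shell
# cg_mem_set_limits DIR HARD_BYTES SOFT_BYTES
# Set the hard and soft memory limits of a group, in bytes.
cg_mem_set_limits() {
    echo "$2" > "$1/memory.limit_in_bytes"
    echo "$3" > "$1/memory.soft_limit_in_bytes"
}

# cg_mem_disable_oom_killer DIR
# Writing 1 to memory.oom_control disables the OOM killer for the group, so
# tasks that hit the hard limit are paused instead of being killed.
cg_mem_disable_oom_killer() {
    echo 1 > "$1/memory.oom_control"
}
```

Receiving the notification itself goes through an eventfd registered in cgroup.event_control, which is not practical from plain shell, so it is left out of this sketch.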
HugeTLB cgroup
Controls the amount of huge pages that can be used by a cgroup
By default a process can use all the Huge Pages
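Restricting a group's huge page usage could look like the sketch below. It assumes the cgroup v1 naming scheme hugetlb.<pagesize>.limit_in_bytes (for example hugetlb.2MB.limit_in_bytes on a system with 2 MB huge pages); the function name is hypothetical.

```shell
# cg_hugetlb_limit DIR PAGE_SIZE_LABEL BYTES
# Limit how many bytes of huge pages of the given size the group may use.
# PAGE_SIZE_LABEL is the size as it appears in the file name, e.g. 2MB.
cg_hugetlb_limit() {
    echo "$3" > "$1/hugetlb.$2.limit_in_bytes"
}
```

For example, `cg_hugetlb_limit /sys/fs/cgroup/hugetlb/mycgroup 2MB 4194304` would allow the group two 2 MB huge pages.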
CPU cgroup
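Keeps track of user/system CPU time
Keeps track of usage per CPU
CPU time is divided through relative weights (cpu.shares) rather than absolute limits; on kernels with CFS bandwidth control, a hard cap can also be set with cpu.cfs_quota_us and cpu.cfs_period_us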
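The per-CPU accounting and scheduler weights can be exercised as follows. A minimal sketch, assuming cgroup v1, where the accounting files belong to the cpuacct controller (often mounted together with cpu as cpu,cpuacct); the function names are hypothetical.

```shell
# cg_cpu_usage_percpu DIR
# Print the CPU time consumed by the group on each CPU, one line per CPU.
# cpuacct.usage_percpu holds space-separated values in nanoseconds.
cg_cpu_usage_percpu() {
    awk '{ for (i = 1; i <= NF; i++) print "cpu" (i - 1) ": " $i " ns" }' \
        "$1/cpuacct.usage_percpu"
}

# cg_cpu_set_shares DIR SHARES
# Set the group's relative scheduler weight (the default is 1024).
# This is a weight, not a cap: it only matters when CPUs are contended.
cg_cpu_set_shares() {
    echo "$2" > "$1/cpu.shares"
}
```

A group with 512 shares competing against a group with 1024 shares would get roughly one third of the contended CPU time.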
CPUset cgroup
Pins groups to specific CPU(s)
Reserves CPUs for certain processes/apps
Avoids processes bouncing between CPUs
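Pinning a group could look like the sketch below. It assumes the cgroup v1 files cpuset.cpus and cpuset.mems; the function name is hypothetical. In cgroup v1 both files must be populated before any task can be attached to a new cpuset group.

```shell
# cg_cpuset_pin DIR CPUS MEMS
# Restrict the group to a set of CPUs and memory nodes.
# CPUS and MEMS use list syntax such as "0-3" or "0,2".
cg_cpuset_pin() {
    echo "$2" > "$1/cpuset.cpus"
    echo "$3" > "$1/cpuset.mems"
}
```

For example, `cg_cpuset_pin /sys/fs/cgroup/cpuset/mycgroup 0-1 0` confines the group to CPUs 0 and 1 on memory node 0.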
Block IO cgroup
Keeps track of I/O for each group on each block device
I/O can be tracked by the number of read and write operations performed by a group, as well as by the type of operation (sync or async)
Allows setting limits for each group on each block device
Limits can be based on read and write throughput (bytes per second) or on the number of operations per second
Allows setting a weight for each group as well
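The per-device limits and weights can be set along these lines. A minimal sketch, assuming the cgroup v1 files blkio.throttle.read_bps_device, blkio.throttle.write_iops_device, and blkio.weight; the function names are hypothetical. Devices are identified by their major:minor numbers (8:0 is typically the first SCSI/SATA disk).

```shell
# cg_blkio_limit_read_bps DIR MAJOR:MINOR BYTES_PER_SEC
# Cap the group's read throughput on one block device.
cg_blkio_limit_read_bps() {
    echo "$2 $3" > "$1/blkio.throttle.read_bps_device"
}

# cg_blkio_limit_write_iops DIR MAJOR:MINOR OPS_PER_SEC
# Cap the group's write operations per second on one block device.
cg_blkio_limit_write_iops() {
    echo "$2 $3" > "$1/blkio.throttle.write_iops_device"
}

# cg_blkio_set_weight DIR WEIGHT
# Set the group's relative I/O weight across devices.
cg_blkio_set_weight() {
    echo "$2" > "$1/blkio.weight"
}
```

For example, `cg_blkio_limit_read_bps /sys/fs/cgroup/blkio/mycgroup 8:0 1048576` limits reads from device 8:0 to 1 MiB per second.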
Net_cls & Net_prio cgroup
Automatically sets the class or priority of the traffic generated by a group
It only applies to egress traffic
Net_cls adds a tag (class ID) to the traffic generated by a group, which can then be shaped as needed using tools like tc or iptables
Net_prio assigns a priority to the traffic, which is used by queuing algorithms
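Tagging and prioritising a group's traffic could be sketched as below, assuming the cgroup v1 files net_cls.classid and net_prio.ifpriomap; the function names are hypothetical. The class ID is written as 0xAAAABBBB, where AAAA is the major and BBBB the minor number of a tc handle, both in hexadecimal.

```shell
# cg_net_set_classid DIR MAJOR_HEX MINOR_HEX
# Tag the group's egress traffic with the tc class MAJOR:MINOR.
cg_net_set_classid() {
    printf '0x%04x%04x\n' "0x$2" "0x$3" > "$1/net_cls.classid"
}

# cg_net_set_prio DIR INTERFACE PRIORITY
# Assign a priority to the group's egress traffic on one interface.
cg_net_set_prio() {
    echo "$2 $3" > "$1/net_prio.ifpriomap"
}
```

After `cg_net_set_classid DIR 10 1`, a tc filter matching class 10:1 can shape the group's traffic; the tc rules themselves are outside this sketch.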
Devices cgroup
Controls what a group can do on a device node
Allows controlling permissions such as read, write, and mknod
Used to prevent containers from having access to all the device nodes on the system
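A typical pattern is to deny everything and then allow a few specific device nodes. A minimal sketch, assuming the cgroup v1 files devices.deny and devices.allow; the function names are hypothetical. Each entry has the form "TYPE MAJOR:MINOR ACCESS", where TYPE is c (character), b (block), or a (all), and ACCESS combines r (read), w (write), and m (mknod).

```shell
# cg_devices_deny_all DIR
# Remove access to every device for the group.
cg_devices_deny_all() {
    echo a > "$1/devices.deny"
}

# cg_devices_allow DIR TYPE MAJOR:MINOR ACCESS
# Grant the group access to one device node.
cg_devices_allow() {
    echo "$2 $3 $4" > "$1/devices.allow"
}
```

For example, `cg_devices_deny_all DIR` followed by `cg_devices_allow DIR c 1:3 rwm` leaves the group with access to /dev/null only (character device 1:3 on Linux).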
Freezer cgroup
Allows freezing/thawing a group of processes
Similar in functionality to SIGSTOP/SIGCONT
Unlike SIGSTOP/SIGCONT, freezing is not delivered as a signal, so it cannot be detected by the processes and does not impede ptrace/debugging
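Freezing and thawing come down to writing a state name into the group's directory. A minimal sketch, assuming the cgroup v1 file freezer.state; the function names are hypothetical.

```shell
# cg_freeze DIR
# Suspend every process in the group.
cg_freeze() {
    echo FROZEN > "$1/freezer.state"
}

# cg_thaw DIR
# Resume every process in the group.
cg_thaw() {
    echo THAWED > "$1/freezer.state"
}
```

On a real system, reading freezer.state right after `cg_freeze` may briefly report FREEZING while the kernel is still stopping the group's tasks.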