CPU Controller
--------------

The CPU controller is responsible for grouping tasks together so that the
scheduler views them as a single unit. The CFS scheduler first divides CPU
time equally between all entities at the same level, and then does the same
at the next level. Basic use cases are described in the main cgroup
documentation file, cgroups.txt.

Users of this functionality should be aware that deep hierarchies impose
scheduler overhead, since the scheduler has to take extra steps and look up
additional data structures to make its final decision.

Through the CPU controller, the scheduler is also able to cap the CPU
utilization of a particular group. This is particularly useful in
environments in which CPU is paid for by the hour, and one values
predictability over performance.

CPU Accounting
--------------

The CPU cgroup also provides additional files under the prefix "cpuacct".
Those files provide accounting statistics and were previously provided by
the separate cpuacct controller. Although the cpuacct controller will still
be kept around for compatibility reasons, its usage is discouraged. If both
the CPU and cpuacct controllers are present in the system, distributors are
encouraged to always mount them together.

Files
-----

The CPU controller exposes the following files to the user:

 - cpu.shares:

 - cpu.cfs_period_us: The duration in microseconds of each scheduler period,
   for bandwidth decisions. This defaults to 100000us or 100ms. Larger
   periods will improve throughput at the expense of latency, since the
   scheduler will be able to sustain a cpu-bound workload for longer. The
   opposite is true for smaller periods. Note that this only affects non-RT
   tasks that are scheduled by the CFS scheduler.

 - cpu.cfs_quota_us: The maximum time in microseconds during each
   cfs_period_us for which the current group will be allowed to run.
   For instance, if it is set to half of cfs_period_us, the cgroup will be
   able to run for at most 50% of the time. One should note that this
   represents aggregate time over all CPUs in the system. Therefore, in
   order to allow full usage of two CPUs, for instance, one should set this
   value to twice the value of cfs_period_us.

 - cpu.stat: statistics about the bandwidth controls. No data will be
   presented if cpu.cfs_quota_us is not set. The file presents three
   numbers:
	nr_periods: how many full periods have elapsed.
	nr_throttled: number of times the group exhausted its full allowed
	bandwidth.
	throttled_time: total time the tasks were not run due to being over
	quota.

 - cpu.rt_runtime_us and cpu.rt_period_us: Those files are the RT-task
   analogues of the CFS files cfs_quota_us and cfs_period_us. One important
   difference, though, is that while the cfs quotas are upper bounds that
   won't necessarily be met, the rt runtimes form a stricter guarantee.
   Therefore, no overlap is allowed. This implies that, given a hierarchy
   with multiple children, the sum of all their rt_runtime_us values may
   not exceed the runtime of the parent. Also, an rt_runtime_us of 0 means
   that no rt tasks can ever be run in this cgroup.

 - cpuacct.usage: The aggregate CPU time, in microseconds, consumed by all
   tasks in this group.

 - cpuacct.usage_percpu: The CPU time, in microseconds, consumed by all
   tasks in this group, separated by CPU. The format is a space-separated
   array of time values, one for each present CPU.

 - cpuacct.stat: aggregate user and system time consumed by tasks in this
   group. The format is:
	user: x
	system: y
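
The arithmetic behind the bandwidth files can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not part of
any kernel interface: the helper names (quota_for_cpus, parse_cpu_stat,
throttled_ratio, rt_runtimes_fit) are hypothetical, and the parser assumes
that cpu.stat holds one "name value" pair per line, which this document
does not specify.

```python
# Illustrative helpers only; all names here are hypothetical.

DEFAULT_CFS_PERIOD_US = 100000  # default cpu.cfs_period_us stated above


def quota_for_cpus(cpus, period_us=DEFAULT_CFS_PERIOD_US):
    """Return a cpu.cfs_quota_us value allowing `cpus` CPUs' worth of
    runtime per period (quota is aggregate time over all CPUs)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return int(round(cpus * period_us))


def parse_cpu_stat(text):
    """Parse cpu.stat contents.

    Assumption: one whitespace-separated 'name value' pair per line.
    Lines that do not match that shape are skipped.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats


def throttled_ratio(stats):
    """Fraction of elapsed periods in which the group was throttled."""
    nr_periods = stats.get("nr_periods", 0)
    if nr_periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / nr_periods


def rt_runtimes_fit(parent_runtime_us, children_runtime_us):
    """Check that the children's rt_runtime_us values do not add up to
    more than the parent's runtime."""
    return sum(children_runtime_us) <= parent_runtime_us
```

For example, quota_for_cpus(0.5) gives 50000 with the default period, which
matches the 50% case above, and quota_for_cpus(2) gives 200000, the value
needed for full usage of two CPUs. A throttled_ratio close to 1 would
suggest that the group hits its quota in nearly every period.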