linux-kernel - [RFC PATCH] sched: Enable root level cgroup bandwidth control

Message-Id: <20220518100841.1497391-1-fam.zheng@bytedance.com>
Date:   Wed, 18 May 2022 11:08:41 +0100
From:   Fam Zheng <fam.zheng@...edance.com>
To:     linux-kernel@...r.kernel.org
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        zhouchengming@...edance.com,
        Vincent Guittot <vincent.guittot@...aro.org>, fam@...hon.net,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>, Ingo Molnar <mingo@...hat.com>,
        songmuchun@...edance.com, Juri Lelli <juri.lelli@...hat.com>,
        Fam Zheng <fam.zheng@...edance.com>
Subject: [RFC PATCH] sched: Enable root level cgroup bandwidth control

In the data center there is sometimes a need to throttle down a whole
server. Cgroup bandwidth control is a natural way to reduce the cpu quota
of running tasks, but there is no such interface for the root group.

Alternative solutions such as cpufreq control exist, with the help of
e.g. intel-pstate or acpi-cpufreq, but they are not always available,
depending on the hardware and BIOS.

This patch allows capping the global cpu utilization.

Currently, writing a positive integer to the v1 root cgroup file:

        /sys/fs/cgroup/cpu/cpu.cfs_quota_us

is rejected by the kernel with -EINVAL. In v2 the corresponding entries
(cpu.max and cpu.max.burst) do not exist on the root at all, because they
carry the CFTYPE_NOT_ON_ROOT flag.
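
For reference, the rejection described above can be observed from user
space with a small probe along these lines. This is only a sketch:
write_knob() is a hypothetical helper name, and the path assumes the v1
cpu controller is mounted at /sys/fs/cgroup/cpu.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the string `value` to the control file at
 * `path`. Returns 0 if the write was accepted, otherwise the errno value
 * reported by open() or write().
 */
static int write_knob(const char *path, const char *value)
{
	ssize_t n;
	int fd, err;

	assert(path && value);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return errno;

	n = write(fd, value, strlen(value));
	err = (n < 0) ? errno : 0;

	close(fd);
	return err;
}
```

Calling write_knob("/sys/fs/cgroup/cpu/cpu.cfs_quota_us", "50000") as
root on an unpatched kernel is expected to return EINVAL, as described
above.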

Remove this limitation: accept a quota on the root task group, expose the
v2 files on the root, and check the root cfs_rq's throttled state in
pick_next_task_fair().
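
As an illustration of how a global cap could be chosen, the sketch below
computes a quota value, assuming the root group follows the same
quota/period semantics as any other group (quota is the total CPU time,
summed over all CPUs, that may be consumed per period). root_quota_us()
is a hypothetical helper and not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: quota in microseconds per period that caps the
 * machine at `pct` percent of `nr_cpus` fully busy CPUs.
 * Assumption: root quota is accounted like any other group's quota.
 */
static uint64_t root_quota_us(uint64_t period_us, unsigned int nr_cpus,
			      unsigned int pct)
{
	assert(pct >= 1 && pct <= 100);

	return period_us * nr_cpus * pct / 100;
}
```

For example, with a 100000us period on a 64-cpu server, a 50% cap gives
root_quota_us(100000, 64, 50) == 3200000.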

Signed-off-by: Chengming Zhou <zhouchengming@...edance.com>
Signed-off-by: Fam Zheng <fam.zheng@...edance.com>
---
 kernel/sched/core.c | 13 ++++---------
 kernel/sched/fair.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d58c0389eb23..c30c8a4d006a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10402,9 +10402,6 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota,
 	int i, ret = 0, runtime_enabled, runtime_was_enabled;
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 
-	if (tg == &root_task_group)
-		return -EINVAL;
-
 	/*
 	 * Ensure we have at some amount of bandwidth every period.  This is
 	 * to prevent reaching a state of large arrears when throttled via
@@ -10632,12 +10629,10 @@ static int tg_cfs_schedulable_down(struct task_group *tg, void *data)
 	struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 	s64 quota = 0, parent_quota = -1;
 
-	if (!tg->parent) {
-		quota = RUNTIME_INF;
-	} else {
+	quota = normalize_cfs_quota(tg, d);
+	if (tg->parent) {
 		struct cfs_bandwidth *parent_b = &tg->parent->cfs_bandwidth;
 
-		quota = normalize_cfs_quota(tg, d);
 		parent_quota = parent_b->hierarchical_quota;
 
 		/*
@@ -10983,13 +10978,13 @@ static struct cftype cpu_files[] = {
 #ifdef CONFIG_CFS_BANDWIDTH
 	{
 		.name = "max",
-		.flags = CFTYPE_NOT_ON_ROOT,
+		.flags = 0,
 		.seq_show = cpu_max_show,
 		.write = cpu_max_write,
 	},
 	{
 		.name = "max.burst",
-		.flags = CFTYPE_NOT_ON_ROOT,
+		.flags = 0,
 		.read_u64 = cpu_cfs_burst_read_u64,
 		.write_u64 = cpu_cfs_burst_write_u64,
 	},
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a68482d66535..dd8c7eb9b648 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7310,7 +7310,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 			if (unlikely(check_cfs_rq_runtime(cfs_rq))) {
 				cfs_rq = &rq->cfs;
 
-				if (!cfs_rq->nr_running)
+				if (!cfs_rq->nr_running || cfs_rq_throttled(cfs_rq))
 					goto idle;
 
 				goto simple;
-- 
2.25.1