Message-Id: <20240430090431.1619622-1-ankit-aj.jain@broadcom.com>
Date: Tue, 30 Apr 2024 14:34:31 +0530
From: Ankit Jain <ankit-aj.jain@...adcom.com>
To: yury.norov@...il.com,
	linux@...musvillemoes.dk,
	akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Cc: juri.lelli@...hat.com,
	pauld@...hat.com,
	ajay.kaher@...adcom.com,
	alexey.makhalov@...adcom.com,
	vasavi.sirnapalli@...adcom.com,
	Ankit Jain <ankit-aj.jain@...adcom.com>
Subject: [PATCH] lib/cpumask: Boot option to disable tasks distribution within cpumask

commit 46a87b3851f0 ("sched/core: Distribute tasks within affinity masks")
and commit 14e292f8d453 ("sched,rt: Use cpumask_any*_distribute()")
introduced logic to distribute tasks within a cpumask upon initial
wakeup. Telco RAN deployments need isolcpus to meet the requirements of
low-latency applications. These isolcpus are generally tickless so that
high-priority SCHED_FIFO tasks can execute without any OS jitter. Since
load balancing is disabled on isolcpus, a task placed on one of these
CPUs cannot be migrated on its own.

For RT applications to execute on isolcpus, a guaranteed Kubernetes pod
containing all isolcpus is required, and each RT application is affined
to a specific isolcpu within the pod. However, non-RT tasks may also be
scheduled in the same pod without being affined to any specific CPU
(they inherit the pod's cpuset affinity). When multiple containers are
spawned and running inside the pod, the container runtime spawns several
non-RT initialization tasks ("runc init") inside the pod. Due to the
above-mentioned commits, these non-RT tasks may be placed on any
isolcpu, and a task may starve if it happens to wake up on the same CPU
as a SCHED_FIFO task, because RT throttling is also disabled in telco
setups. The RAN deployment then fails, which eventually leads to system
hangs.

The new kernel cmdline parameter 'sched_pick_firstcpu' gives such use
cases an option to disable the distribution of tasks within the cpumask
and to fall back to the previous 'pick first cpu' approach for initial
placement of tasks. This helps because many telco vendors configure the
system so that the first CPU within a pod's cpuset doesn't run any
SCHED_FIFO or other high-priority tasks.

Co-developed-by: Alexey Makhalov <alexey.makhalov@...adcom.com>
Signed-off-by: Alexey Makhalov <alexey.makhalov@...adcom.com>
Signed-off-by: Ankit Jain <ankit-aj.jain@...adcom.com>
---
 lib/cpumask.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/lib/cpumask.c b/lib/cpumask.c
index e77ee9d46f71..3dea87d5ec1f 100644
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@ -154,6 +154,23 @@ unsigned int cpumask_local_spread(unsigned int i, int node)
 }
 EXPORT_SYMBOL(cpumask_local_spread);
 
+/*
+ * Task distribution within the cpumask feature disabled?
+ */
+static bool cpumask_pick_firstcpu __read_mostly;
+
+/*
+ * Disable Tasks distribution within the cpumask feature
+ */
+static int __init cpumask_pick_firstcpu_setup(char *str)
+{
+	cpumask_pick_firstcpu = 1;
+	pr_info("cpumask: Tasks distribution within cpumask is disabled.\n");
+	return 1;
+}
+
+__setup("sched_pick_firstcpu", cpumask_pick_firstcpu_setup);
+
 static DEFINE_PER_CPU(int, distribute_cpu_mask_prev);
 
 /**
@@ -171,6 +188,13 @@ unsigned int cpumask_any_and_distribute(const struct cpumask *src1p,
 {
 	unsigned int next, prev;
 
+	/*
+	 * Don't distribute, if tasks distribution
+	 * within cpumask feature is disabled
+	 */
+	if (cpumask_pick_firstcpu)
+		return cpumask_any_and(src1p, src2p);
+
 	/* NOTE: our first selection will skip 0. */
 	prev = __this_cpu_read(distribute_cpu_mask_prev);
 
-- 
2.23.1

