Message-ID: <20221114120453.3233-1-xuewen.yan@unisoc.com>
Date:   Mon, 14 Nov 2022 20:04:53 +0800
From:   Xuewen Yan <xuewen.yan@...soc.com>
To:     <peterz@...radead.org>, <mingo@...hat.com>,
        <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>
CC:     <dietmar.eggemann@....com>, <rostedt@...dmis.org>,
        <bsegall@...gle.com>, <mgorman@...e.de>, <bristot@...hat.com>,
        <vschneid@...hat.com>, <ke.wang@...soc.com>,
        <linux-kernel@...r.kernel.org>
Subject: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

While running stress tests related to cpu hotplug, we found that when
only two cpus are left in the system (for example, cpu0 and cpu1) and
cpu1 goes offline, the following infinite loop may occur:

When cpu0 has more than one rt task, it pushes the extra rt tasks to
cpu1 for execution. If cpu1 is going through the hotplug process at
this time, it starts a stop_machine that migrates the tasks on cpu1
to the other online cpus and prevents any task from migrating to this
cpu. As a result, when cpu0 pushes an rt task to cpu1, the
stop_machine migrates the rt task straight back to cpu0, so the
rt.pushable_tasks list of rq0 never becomes empty: the rt tasks keep
being migrated back and forth between cpu0 and cpu1.
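
Roughly, the ping-pong looks like this (an illustrative sketch of the
sequence; push_rt_task() and find_lowest_rq() are in kernel/sched/rt.c,
and the cpu1 side is the hotplug migration described above):

    CPU0 (rt overloaded)              CPU1 (going offline)

    push_rt_task()
      find_lowest_rq() -> cpu1
      migrate the rt task to cpu1
                                      stop_machine moves the rt task
                                      back to cpu0
    task is queued on rq0's
    rt.pushable_tasks again
    push_rt_task() ...                ... and so on, forever.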

To prevent this situation, use cpu_active_mask to keep
find_lowest_rq() from selecting an inactive cpu. At the same time,
when there is only one active cpu left in the system, terminate
rto_push_irq_work early.
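
For example (a hypothetical but representative state), when cpu0 is
the only remaining active, rt-overloaded cpu: rd->rto_mask and
cpu_active_mask intersect to { cpu0 }, whose weight is 1 and which
contains the current cpu, so rto_next_cpu() bails out instead of
keeping the IPI push chain alive.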

Co-developed-by: Ke Wang <ke.wang@...soc.com>
Signed-off-by: Ke Wang <ke.wang@...soc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
---
 kernel/sched/cpupri.c | 1 +
 kernel/sched/rt.c     | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e726eb4b..42c40cfdf836 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..9433696dae50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
 {
 	int next;
 	int cpu;
+	struct cpumask tmp_cpumask;
 
 	/*
 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
@@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
 		/* When rto_cpu is -1 this acts like cpumask_first() */
 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
 
+		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
+		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
+		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
+			break;
+
 		rd->rto_cpu = cpu;
 
 		if (cpu < nr_cpu_ids)
-- 
2.25.1
