Message-ID: <20221114120453.3233-1-xuewen.yan@unisoc.com>
Date:   Mon, 14 Nov 2022 20:04:53 +0800
From:   Xuewen Yan <xuewen.yan@...soc.com>
To:     <peterz@...radead.org>, <mingo@...hat.com>,
        <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>
CC:     <dietmar.eggemann@....com>, <rostedt@...dmis.org>,
        <bsegall@...gle.com>, <mgorman@...e.de>, <bristot@...hat.com>,
        <vschneid@...hat.com>, <ke.wang@...soc.com>,
        <linux-kernel@...r.kernel.org>
Subject: [PATCH] sched/rt: Use cpu_active_mask to prevent rto_push_irq_work's dead loop

While running stress tests related to cpu hotplug, we found that when
only two cpus are left in the system (for example, cpu0 and cpu1) and
cpu1 goes offline, the following infinite loop may occur:

When cpu0 has more than one rt task, it pushes the extra rt tasks to
cpu1 for execution. If cpu1 is going through the hotplug process at
this time, it starts a stop_machine that migrates the tasks on cpu1
to the other online cpus and prevents any task from migrating to this
cpu. As a result, when cpu0 pushes an rt task to cpu1, the
stop_machine migrates the rt task straight back to cpu0, so the
rt.pushable_tasks list of rq0 never becomes empty: the rt tasks keep
being migrated back and forth between cpu0 and cpu1.
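
Roughly, the ping-pong looks like this (an illustrative sketch of the
sequence; push_rt_task() and find_lowest_rq() are in kernel/sched/rt.c,
and the cpu1 side is the hotplug migration described above):

    CPU0 (rt overloaded)              CPU1 (going offline)

    push_rt_task()
      find_lowest_rq() -> cpu1
      migrate the rt task to cpu1
                                      stop_machine moves the rt task
                                      back to cpu0
    task is queued on rq0's
    rt.pushable_tasks again
    push_rt_task() ...                ... and so on, forever.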

To prevent this situation, use cpu_active_mask to keep
find_lowest_rq() from selecting an inactive cpu. At the same time,
when there is only one active cpu left in the system, terminate
rto_push_irq_work early.
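
For example (a hypothetical but representative state), when cpu0 is
the only remaining active, rt-overloaded cpu: rd->rto_mask and
cpu_active_mask intersect to { cpu0 }, whose weight is 1 and which
contains the current cpu, so rto_next_cpu() bails out instead of
keeping the IPI push chain alive.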

Co-developed-by: Ke Wang <ke.wang@...soc.com>
Signed-off-by: Ke Wang <ke.wang@...soc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
---
 kernel/sched/cpupri.c | 1 +
 kernel/sched/rt.c     | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e726eb4b..42c40cfdf836 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ed2a47e4ddae..9433696dae50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2219,6 +2219,7 @@ static int rto_next_cpu(struct root_domain *rd)
 {
 	int next;
 	int cpu;
+	struct cpumask tmp_cpumask;
 
 	/*
 	 * When starting the IPI RT pushing, the rto_cpu is set to -1,
@@ -2238,6 +2239,11 @@ static int rto_next_cpu(struct root_domain *rd)
 		/* When rto_cpu is -1 this acts like cpumask_first() */
 		cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);
 
+		cpumask_and(&tmp_cpumask, rd->rto_mask, cpu_active_mask);
+		if (rd->rto_cpu == -1 && cpumask_weight(&tmp_cpumask) == 1 &&
+		    cpumask_test_cpu(smp_processor_id(), &tmp_cpumask))
+			break;
+
 		rd->rto_cpu = cpu;
 
 		if (cpu < nr_cpu_ids)
-- 
2.25.1
