Message-ID: <Zy48-zbUnjTzxXXB@jlelli-thinkpadt14gen4.remote.csb>
Date: Fri, 8 Nov 2024 17:31:55 +0100
From: Juri Lelli <juri.lelli@...hat.com>
To: Waiman Long <llong@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, Phil Auld <pauld@...hat.com>,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH] sched/deadline: Skip overflow check if 0 capacity
On 08/11/24 08:39, Waiman Long wrote:
> On 11/8/24 8:25 AM, Juri Lelli wrote:
> > On 07/11/24 23:29, Waiman Long wrote:
> > > By properly setting up a 1-cpu sched domain (partition) with no
> > > task, it was found that offlining that particular CPU failed because
> > > dl_bw_check_overflow() in cpuset_cpu_inactive() returned -EBUSY. This
> > > is because dl_bw_capacity() returns 0 when the sched domain has no
> > > active CPU, causing a false positive in the overflow check.
> > >
> > > Fix this corner case by skipping the __dl_overflow() check in
> > > dl_bw_manage() when the returned capacity is 0.
> > >
> > > Signed-off-by: Waiman Long <longman@...hat.com>
> > > ---
> > > kernel/sched/deadline.c | 8 +++++++-
> > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index be1b917dc8ce..0195f350d6d3 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -3479,7 +3479,13 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
> > >  	} else {
> > >  		unsigned long cap = dl_bw_capacity(cpu);
> > >  
> > > -		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
> > > +		/*
> > > +		 * In the unlikely case of 0 capacity (e.g. a sched domain
> > > +		 * with no active CPUs), skip the overflow check as it will
> > > +		 * always return a false positive.
> > > +		 */
> > > +		if (likely(cap))
> > > +			overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
> > The remaining total_bw that makes this check fail should be the one
> > relative to the dl_server on the CPU that is going offline. I wonder if
> > we shouldn't rather clean that up (remove the dl_server contribution)
> > before we get to this point during a hotplug operation. Need to think
> > about it a little more.
> static inline bool
> __dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw)
> {
> 	return dl_b->bw != -1 &&
> 	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
> }
>
> With a 0 cap, cap_scale(dl_b->bw, cap) will always be 0. As long as total_bw
> isn't 0 and bw isn't -1, the condition will be true.
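For concreteness, that degenerate arithmetic can be reproduced with a
minimal userspace sketch (cap_scale() mirrors the kernel helper in
kernel/sched/deadline.c; the bandwidth numbers are made up for
illustration):

#include <stdio.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10

/* Mirrors cap_scale() from kernel/sched/deadline.c. */
static unsigned long cap_scale(unsigned long dl_bw, unsigned long cap)
{
	return (dl_bw * cap) >> SCHED_CAPACITY_SHIFT;
}

int main(void)
{
	uint64_t bw = 996147;		/* illustrative root-domain bw limit */
	uint64_t total_bw = 52428;	/* illustrative reserved bandwidth */
	unsigned long cap = 0;		/* dl_bw_capacity() == 0 */

	/*
	 * Same shape as __dl_overflow(dl_b, cap, 0, 0): with cap == 0 the
	 * left-hand side collapses to 0, so any nonzero total_bw reports
	 * overflow, i.e. the false positive described above.
	 */
	int overflow = bw != (uint64_t)-1 &&
		       cap_scale(bw, cap) < total_bw;

	printf("overflow = %d\n", overflow);	/* prints 1 */
	return 0;
}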
Right, but I fear that by hiding this special corner case we would also
miss the cases where there are DEADLINE tasks with bandwidth on that
single CPU, and end up ignoring them.
So, maybe something like the below?
---
kernel/sched/deadline.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index be1b917dc8ce..7884e566557c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -130,11 +130,24 @@ static inline int dl_bw_cpus(int i)
 	if (cpumask_subset(rd->span, cpu_active_mask))
 		return cpumask_weight(rd->span);
 
+	/*
+	 * Hotplug extreme case in which the last remaining online CPU in a
+	 * root domain is going offline. We get here after that CPU has been
+	 * cleared from cpu_active_mask, so the loop below would return 0
+	 * CPUs, which of course doesn't make much sense. Return at least 1
+	 * CPU.
+	 */
+
+	if (cpumask_weight(rd->span) == 1)
+		return 1;
+
 	cpus = 0;
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 
+	WARN_ON(!cpus);
+
 	return cpus;
 }
---
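For reference, with dl_bw_cpus() returning at least 1, dl_bw_capacity()
on a symmetric system (the fast path where all CPUs have capacity
SCHED_CAPACITY_SCALE) should yield 1 << SCHED_CAPACITY_SHIFT == 1024
rather than 0 for that last CPU. Then cap_scale(dl_b->bw, 1024) ==
dl_b->bw, so __dl_overflow() only trips when total_bw really exceeds the
single CPU's limit, instead of unconditionally.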
That said though, I believe I just found an additional issue. With the
above, the system doesn't crash (it did w/o it), but it happily moves
DEADLINE tasks out of a domain whose single CPU is going offline. Last
time I looked at this we were properly checking and failing the hotplug
operation, but that was indeed a while ago, so I'm not sure yet what
changed. More staring.
Oh, so broken, yay. :)
Best,
Juri