Message-ID: <xhsmhmsqiayjq.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Thu, 28 Mar 2024 21:31:21 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>, Thomas Gleixner
<tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>, Ingo Molnar
<mingo@...nel.org>, Anna-Maria Behnsen <anna-maria@...utronix.de>, Alex
Shi <alexs@...nel.org>, Peter Zijlstra <peterz@...radead.org>, Vincent
Guittot <vincent.guittot@...aro.org>, Barry Song
<song.bao.hua@...ilicon.com>
Subject: Re: for_each_domain()/sched_domain_span() has offline CPUs (was Re:
[PATCH 2/2] timers: Fix removed self-IPI on global timer's enqueue in
nohz_full)

On 28/03/24 17:58, Frederic Weisbecker wrote:
> On Thu, Mar 28, 2024 at 03:08:08PM +0100, Valentin Schneider wrote:
>> On 27/03/24 15:28, Valentin Schneider wrote:
>> > On 27/03/24 13:42, Frederic Weisbecker wrote:
>> >>> On Tue, Mar 26, 2024 at 05:46:07PM +0100, Valentin Schneider wrote:
>> >>> > Then with that patch I ran TREE07, just some short iterations:
>> >>> >
>> >>> > tools/testing/selftests/rcutorture/bin/kvm.sh --configs "10*TREE07" --allcpus --bootargs "rcutorture.onoff_interval=200" --duration 2
>> >>> >
>> >>> > And the warning triggers very quickly. At least since v6.3 but maybe since
>> >>> > earlier. Is this expected behaviour or am I right to assume that
>> >>> > for_each_domain()/sched_domain_span() shouldn't return an offline CPU?
>> >>> >
>> >>>
>> >>> I would very much assume an offline CPU shouldn't show up in a
>> >>> sched_domain_span().
>> >>>
>> >>> Now, on top of the above, there's one more thing worth noting:
>> >>> cpu_up_down_serialize_trainwrecks()
>> >>>
>> >>> This just flushes the cpuset work, so after that the sched_domain
>> >>> topology should be sane. However, I see it's invoked at the tail end of
>> >>> _cpu_down(), IOW /after/ takedown_cpu() has run, which sounds too late.
>> >>> The comments around this vs. the lock ordering aren't very reassuring
>> >>> either, so I need to look into this more.
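
(Side note, from memory and simplified, so take it with a grain of salt -
the ordering I was worried about looks roughly like this in kernel/cpu.c:

	_cpu_down(cpu)
	  cpuhp down callbacks:
	    sched_cpu_deactivate()
	      // CONFIG_CPUSETS=y: the sched_domain rebuild is deferred
	      // to the cpuset workqueue; =n: it's done synchronously here
	    takedown_cpu()
	      // the CPU actually goes offline here
	  cpu_up_down_serialize_trainwrecks()
	    cpuset_wait_for_hotplug()
	      // flushes the deferred cpuset work, i.e. *after*
	      // takedown_cpu() has already run

so with cpusets there's a window where the CPU is offline but still
present in the sched_domain spans.)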
>> >>
>> >> Ouch...
>> >>
>> >>>
>> >>> Maybe as a "quick" test to see if this is the right culprit, you could
>> >>> try that with CONFIG_CPUSETS=n? In that case the sched_domain update is
>> >>> run within sched_cpu_deactivate().
>> >>
>> >> I just tried and I fear that doesn't help. It still triggers even without
>> >> cpusets :-s
>> >>
>> >
>> > What, you mean I can't always blame cgroups? What has the world come to?
>> >
>> > That's interesting, it means the deferred work item isn't the (only)
>> > issue. I'll grab your test patch and try to reproduce on TREE07.
>> >
>>
>> Unfortunately I haven't been able to trigger your warning with ~20 runs of
>> TREE07 & CONFIG_CPUSETS=n; however, it does trigger reliably with
>> CONFIG_CPUSETS=y, so I'm back to thinking the cpuset work is a likely
>> culprit...
>
> Funny, I just checked again and I can still reliably reproduce with:
>
> ./tools/testing/selftests/rcutorture/bin/kvm.sh --kconfig "CONFIG_CPUSETS=n CONFIG_PROC_PID_CPUSET=n" --configs "10*TREE07" --allcpus --bootargs "rcutorture.onoff_interval=200" --duration 2
>
> I'm thinking there might be several culprits... ;-)

Hmm, frustrating that I can't seem to reproduce this...

Could you run this with CONFIG_SCHED_DEBUG=y and sched_verbose on the
cmdline? And maybe tweak the warning to print which CPU's sched_domain is
being scanned, and which CPU in the span was found to be offline.
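
Something like the below is what I have in mind - an untested sketch, and
I'm guessing at where your check lives, so placement and names are
illustrative only:

	struct sched_domain *sd;
	int i;

	/*
	 * Hypothetical debug check: walk this CPU's domain hierarchy and
	 * flag any CPU that sits in a span while being offline.
	 * for_each_domain() walks RCU-protected pointers, hence the
	 * rcu_read_lock(); sd->name needs CONFIG_SCHED_DEBUG=y.
	 */
	rcu_read_lock();
	for_each_domain(cpu, sd) {
		for_each_cpu(i, sched_domain_span(sd)) {
			if (!cpu_online(i))
				pr_warn("CPU%d: offline CPU%d in span of %s\n",
					cpu, i, sd->name);
		}
	}
	rcu_read_unlock();

That should also tell us whether the stale CPU shows up at a single level
or all the way up the hierarchy.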