lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <afe86f8d-e412-48e8-b2f6-a0879c832d4d@paulmck-laptop>
Date:   Wed, 11 Oct 2023 03:25:00 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, patches@...ts.linux.dev,
        linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
        akpm@...ux-foundation.org, linux@...ck-us.net, shuah@...nel.org,
        patches@...nelci.org, lkft-triage@...ts.linaro.org, pavel@...x.de,
        jonathanh@...dia.com, f.fainelli@...il.com,
        sudipm.mukherjee@...il.com, srw@...dewatkins.net, rwarsow@....de,
        conor@...nel.org, Chengming Zhou <zhouchengming@...edance.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ovidiu Panait <ovidiu.panait@...driver.com>,
        Ingo Molnar <mingo@...nel.org>, rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 5.15 000/183] 5.15.134-rc1 review

On Wed, Oct 11, 2023 at 05:05:04AM +0000, Joel Fernandes wrote:
> On Tue, Oct 10, 2023 at 06:34:35PM -0700, Paul E. McKenney wrote:
> [...]
> > > > > > > > It's also worth noting that the bug this fixes wasn't exposed until the
> > > > > > > > maple tree (added in v6.1) was used for the IRQ descriptors (added in
> > > > > > > > v6.5).
> > > > > > >
> > > > > > > Lots of latent bugs, to be sure, even with rcutorture.  :-/
> > > > > >
> > > > > > The Right Thing is to fix the bug all the way back to the introduction,
> > > > > > but what fallout makes the backport less desirable than living with the
> > > > > > unexposed bug?
> > > > >
> > > > > You are quite right that it is possible for the risk of a backport to
> > > > > exceed the risk of the original bug.
> > > > >
> > > > > I defer to Joel (CCed) on how best to resolve this in -stable.
> > > > 
> > > > Maybe I am missing something but this issue should also be happening
> > > > in mainline right?
> > > > 
> > > > Even though mainline has 897ba84dc5aa ("rcu-tasks: Handle idle tasks
> > > > for recently offlined CPUs") , the warning should still be happening
> > > > due to Liam's "kernel/sched: Modify initial boot task idle setup"
> > > > because the warning is just rearranged a bit but essentially the same.
> > > > 
> > > > IMHO, the right thing to do then is to drop Liam's patch from 5.15 and
> > > > fix it in mainline (using the ideas described in this thread), then
> > > > backport both that new fix and Liam's patch to 5.15.
> > > > 
> > > > Or is there a reason this warning does not show up on the mainline?
> > 
> > There is not a whole lot of commonality between the v5.15.134 version of
> > RCU Tasks Trace and that of mainline.  In theory, in mainline, CPU hotplug
> > is supposed to be disabled across all calls to trc_inspect_reader(),
> > which means that there would not be any CPU coming or going.
> > 
> > But there could potentially be some time between when a CPU was
> > marked as online and its idle task was marked PF_IDLE.  And in
> > fact x86 start_secondary() invokes set_cpu_online() before it calls
> > cpu_startup_entry(), and it is the latter than sets PF_IDLE.
> > 
> > The same is true of alpha, arc, arm, arm64, csky, ia64, loongarch, mips,
> > openrisc, parisc, powerpc, riscv, s390, sh, sparc32, sparc64, x86 xen,
> > and xtensa, which is everybody.
> > 
> > One reason why my testing did not reproduce this is because I was running
> > against v6.6-rc1, and cff9b2332ab7 ("kernel/sched: Modify initial boot
> > task idle setup") went into v6.6-rc3.  An initial run merging in current
> > mainline also failed to reproduce this, but I am running overnight.
> > If that doesn't reproduce, I will try inserting delays between the
> > set_cpu_online() and the cpu_startup_entry().
> 
> I thought the warning happens before set_cpu_online() is even called, because
> under such situation, ofl == true and the task is not set to PF_IDLE yet:
> 
>                   WARN_ON_ONCE(ofl && task_curr(t) && !is_idle_task(t));

That case is supposed to be excluded by the cpus_read_lock() calls.
Yes, key phrase "supposed to be".  ;-)

> > If this problem is real, fixes include:
> > 
> > o	Revert Liam's patch and make Tiny RCU's call_rcu() deal with
> > 	the problem.  This is overhead and non-tinyness, but to Joel's
> > 	point, it might be best.
> > 
> > o	Go back to something more like Liam's original patch, which
> > 	cleared PF_IDLE only for the boot CPU.
> > 
> > o	Set PF_IDLE before calling set_cpu_online().  This would work,
> > 	but it would also be rather ugly, reaching into each and every
> > 	architecture.
> > 
> > o	Move the call to set_cpu_online() into cpu_startup_entry().
> > 	This would require some serious inspection to prove that it is
> > 	safe, assuming that it is in fact safe.
> > 
> > o	Drop the WARN_ON_ONCE() from trc_inspect_reader().  Not all
> > 	that excited by losing this diagnostic, but then again it
> > 	has been awhile since it has caught anything.
> > 
> > o	Make the WARN_ON_ONCE() condition in trc_inspect_reader() instead
> > 	to a "return false" to retry later.  Ditto, also not liking the
> > 	possibility of indefinite deferral with no warning.
> 
> Just for completeness, 
> 
>  o      Since it just a warning, checking for task_struct::pid == 0 instead of is_idle_task()?
>         Though PF_IDLE is also set in play_idle_precise().
> 
>  o	Change warning to:
>                   WARN_ON_ONCE(ofl && task_curr(t) && (!is_idle_task(t) && t->pid != 0));

This change does look promising, thank you!

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ