lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 23 Nov 2022 11:09:32 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     "Elliott, Robert (Servers)" <elliott@....com>
Cc:     Mirsad Goran Todorovac <mirsad.todorovac@....unizg.hr>,
        Phillip Lougher <phillip@...ashfs.org.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        "phillip.lougher@...il.com" <phillip.lougher@...il.com>,
        Thorsten Leemhuis <regressions@...mhuis.info>
Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in
 squashfs_readahead())

On Wed, Nov 23, 2022 at 04:08:03PM +0000, Elliott, Robert (Servers) wrote:
> > -----Original Message-----
> > From: Paul E. McKenney <paulmck@...nel.org>
> > Sent: Wednesday, November 23, 2022 8:44 AM
> > Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls
> > in squashfs_readahead())
> > 
> > On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
> ...
> 
> > > commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> > > Author: Christoph Hellwig <hch@....de>
> > > Date:   Wed Jun 29 17:01:02 2022 +0200
> > >
> > >     remove CONFIG_ANDROID
> > 
> > But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
> > As in 20 milliseconds.
> > 
> > Is that intentional?
> ...
> > > > And one approach is to take a less aggressive RCU CPU stall timeout,
> > > > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > > > way down to 20 milliseconds.  This could allow you to ease into the
> > > > latency-reduction work.
> > > >
> > > > Alternatively, consider that response time is a property of the
> > > > entire system plus the environment that it runs in.  So I suspect that
> > > > the Android folks are accompanying that 20-millisecond timeout with
> > > > some restrictions on what the on-phone workloads are permitted to do.
> > > > Maybe ask the Android guys what those restrictions are and loosen them
> > > > slightly, again allowing you to ease into the latency-reduction work.
> 
> Since there is no longer a kernel default of 20 ms for Android since
> commit 1045a06724f3 ("remove CONFIG_ANDROID"), it might be appropriate
> to drop that from stallwarn.rst. Something like:

Fair point, except that the Android guys asked for this.  In fact, the
Android guys supplied the patches.  I like the traceability.

> diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
> index e38c587067fc..d86a8b47504f 100644
> --- a/Documentation/RCU/stallwarn.rst
> +++ b/Documentation/RCU/stallwarn.rst
> @@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
>         Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
>         the expedited grace period. This parameter defines the period
>         of time that RCU will wait from the beginning of an expedited
> -       grace period until it issues an RCU CPU stall warning. This time
> -       period is normally 20 milliseconds on Android devices.  A zero
> -       value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
> -       after conversion to milliseconds.
> +       grace period until it issues an RCU CPU stall warning.
> +
> +       A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
> +        used, after conversion to milliseconds, limited to a maximum of
> +        21 seconds.
> 
>         This configuration parameter may be changed at runtime via the
>         /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 1b0c41d490f0..4477eeb8a54f 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
>           If the RCU grace period persists, additional CPU stall warnings
>           are printed at more widely spaced intervals.  A value of zero
>           says to use the RCU_CPU_STALL_TIMEOUT value converted from
> -         seconds to milliseconds.
> +         seconds to milliseconds, limited to a maximum of 21 seconds.
> 
> > > > But please keep in mind that cond_resched() helps only in kernels
> > > > built with CONFIG_PREEMPTION=n.
> 
> I think they also help with
> 	CONFIG_PREEMPT_VOLUNTARY=y
> which is used by Fedora and RHEL. For example, adding more of those in
> the crypto tcrypt test module eliminated some RCU stalls when running
> speed tests (that send lots of data through the algorithms).
> 
> > > I have bad news that 6.1-r6 is still affected with
> > > squashfs_xz_uncompress bug, despite having both of your fixes
> > > (as visible in above command's output -- double checked):
> > >
> > > [   91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
> > 
> > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > this still happen?
> 
> BTW, you don't need to rebuild the kernel to change those parameters; they're
> module parameters, so can be modified on the kernel command line (if needed
> during boot) and sysfs (if only needed after boot).
> 
> For sysfs the syntax is:
> #!/bin/bash
> # set rcu timeouts to specified values
> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout

Excellent point, thank you!

I hope that this makes Mirsad's life easier, perhaps featuring less time
waiting for kernel builds and reboots.  ;-)

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ