lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 28 Jan 2022 17:08:48 +0100
From:   Sven Schnelle <svens@...ux.ibm.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yinan Liu <yinan@...ux.alibaba.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Sachin Sant <sachinp@...ux.ibm.com>,
        linuxppc-dev@...ts.ozlabs.org,
        Russell King <linux@...linux.org.uk>,
        linux-arm-kernel@...ts.infradead.org, hca@...ux.ibm.com,
        linux-s390@...r.kernel.org, "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: ftrace hangs waiting for rcu

Hi Mark,

Mark Rutland <mark.rutland@....com> writes:

> On arm64 I bisected this down to:
>
>   7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
>
> Which was going wrong because ilog2() rounds down, and so the shift was wrong
> for any nr_cpus that was not a power-of-two. Paul had already fixed that in
> rcu-next, and just sent a pull request to Linus:
>
>   https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/
>
> With that applied, I no longer see these hangs.
>
> Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
> the issue for you?

We noticed the PR from Paul and are currently testing the fix. So far
it's looking good. The configuration where we have seen the hang is a
bit unusual:

- 16 physical CPUs on the kvm host
- 248 logical CPUs inside kvm
- debug kernel both on the host and kvm guest

So things are likely a bit slow in the kvm guest. Interesting is that
the number of CPUs is even. But maybe RCU sees an odd number of CPUs
and gets confused before all cpus are brought up. Have to read code/test
to see whether that could be possible.

Thanks for investigating!
Sven

Powered by blists - more mailing lists