linux-kernel - Re: kdump always hangs in rcu_barrier() -> wait_for

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20201126154630.GR1437@paulmck-ThinkPad-P72>
Date:   Thu, 26 Nov 2020 07:46:30 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Dexuan Cui <decui@...rosoft.com>
Cc:     "boqun.feng@...il.com" <boqun.feng@...il.com>,
        Ingo Molnar <mingo@...hat.com>,
        "rcu@...r.kernel.org" <rcu@...r.kernel.org>,
        vkuznets <vkuznets@...hat.com>,
        Michael Kelley <mikelley@...rosoft.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: kdump always hangs in rcu_barrier() -> wait_for_completion()

On Wed, Nov 25, 2020 at 04:56:33AM +0000, Dexuan Cui wrote:
> Hi,
> I happened to hit a kdump hang issue in a Linux VM running on some
> Hyper-V host. Please see the attached log: the kdump kernel always hangs,
> even if I configure only 1 virtual CPU to the VM.
> 
> I firstly hit the issue in RHEL 8.3's 4.18.x kernel, but later I found that 
> the latest upstream v5.10-rc5 also has the same issue (at least the
> symptom is exactly the same), so I dug into v5.10-rc5 and found that the
> kdump kernel always hangs in kernel_init() -> mark_readonly() ->
> rcu_barrier() -> wait_for_completion(&rcu_state.barrier_completion).
> 
> Let's take the 1-vCPU case for example (refer to the attached log): in the
> below code, somehow rcu_segcblist_n_cbs() returns true, so the call
> smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1) increases
> the counter by 1, and hence later the counter is still 1 after the
> atomic_sub_and_test(), and the complete() is not called.
> 
> static void rcu_barrier_func(void *cpu_in)
> {
>         ...
>         if (rcu_segcblist_entrain(&rdp->cblist, &rdp->barrier_head)) {
>                 atomic_inc(&rcu_state.barrier_cpu_count);
>         } else {
>         ...
> }
> 
> void rcu_barrier(void)
> {
>         ...
>         atomic_set(&rcu_state.barrier_cpu_count, 2);
>         ...
>         for_each_possible_cpu(cpu) {
>                 rdp = per_cpu_ptr(&rcu_data, cpu);
>                 ...
>                 if (rcu_segcblist_n_cbs(&rdp->cblist) && cpu_online(cpu)) {
>                         ...
>                         smp_call_function_single(cpu, rcu_barrier_func, (void *)cpu, 1);
>                         ...
>                 }
>         }
>         ...
>         if (atomic_sub_and_test(2, &rcu_state.barrier_cpu_count))
>                 complete(&rcu_state.barrier_completion);
> 
>         ...
>         wait_for_completion(&rcu_state.barrier_completion);
> 
> Sorry for my ignorance of RCU -- I'm not sure why the rcu_segcblist_n_cbs()
> returns 1 here. In the normal kernel, it returns 0, so the normal kernel does not
> hang.

The rcu_segcblist_n_cbs() function returns non-zero because something
invoked call_rcu() some time previously.  The ftrace facility (or just
a printk) should help you work out where that call_rcu() is located.

> Note: in the case of kdump kernel, if I remove the kernel parameter
> console=ttyS0 OR if I build the kernel with CONFIG_HZ=250, the issue can
> no longer reproduce. Currently my kernel uses CONFIG_HZ=1000 and I use
> console=ttyS0, so I'm able to reproduce the isue every time.
> 
> Note: the same kernel binary can not reproduce the issue when the VM
> runs on another Hyper-V host.
> 
> It looks there is some kind of race condition?

Quite possibly.  Perhaps there is some code somewhere in the kernel that
invokes call_rcu() after some number of jiffies, in which case reducing
CONFIG_HZ would delay that call_rcu() invocation until after this call
to rcu_barrier().  Perhaps that call_rcu() is also associated with the
console=ttyS0, so that it is delayed sufficiently with reduced CONFIG_HZ
and missing entirely without console=ttyS0.

> Looking forward to your insights!
> 
> I'm happy to test any patch or enable more tracing, if necessary. Thanks!

My best guess is that the underlying bug is that you are invoking
rcu_barrier() before the RCU grace-period kthread has been created.
This means that RCU grace periods cannot complete, which in turn means
that if there has been even one invocation of call_rcu() since boot,
rcu_barrier() cannot complete, which is what you are in fact seeing.
Please note that it is perfectly legal to invoke call_rcu() very early in
the boot process, as in even before the call to rcu_init().  Therefore,
if this is the case, the bug is the early call to rcu_barrier(), not
the early calls to call_rcu().

To check this, at the beginning of rcu_barrier(), check the value of
rcu_state.gp_kthread.  If my guess is correct, it will be NULL.

Another possibility is that rcu_state.gp_kthread is non-NULL, but that
something else is preventing RCU grace periods from completing, but in
that case you should see RCU CPU stall warnings.  Unless of course they
have been disabled.

Please let me know what you find!

							Thanx, Paul