Message-ID: <MW2PR2101MB18014505C01027A9486D45EEBFF91@MW2PR2101MB1801.namprd21.prod.outlook.com>
Date: Thu, 26 Nov 2020 21:25:28 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: "paulmck@...nel.org" <paulmck@...nel.org>
CC: "boqun.feng@...il.com" <boqun.feng@...il.com>,
Ingo Molnar <mingo@...hat.com>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
vkuznets <vkuznets@...hat.com>,
Michael Kelley <mikelley@...rosoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: kdump always hangs in rcu_barrier() -> wait_for_completion()
> From: Paul E. McKenney <paulmck@...nel.org>
> Sent: Thursday, November 26, 2020 7:47 AM
> ...
> The rcu_segcblist_n_cbs() function returns non-zero because something
> invoked call_rcu() some time previously. The ftrace facility (or just
> a printk) should help you work out where that call_rcu() is located.
call_rcu() is indeed called multiple times, but as you said, this should
be normal.
> My best guess is that the underlying bug is that you are invoking
> rcu_barrier() before the RCU grace-period kthread has been created.
> This means that RCU grace periods cannot complete, which in turn means
> that if there has been even one invocation of call_rcu() since boot,
> rcu_barrier() cannot complete, which is what you are in fact seeing.
> Please note that it is perfectly legal to invoke call_rcu() very early in
> the boot process, as in even before the call to rcu_init(). Therefore,
> if this is the case, the bug is the early call to rcu_barrier(), not
> the early calls to call_rcu().
>
> To check this, at the beginning of rcu_barrier(), check the value of
> rcu_state.gp_kthread. If my guess is correct, it will be NULL.
Unfortunately, it's not NULL here. :-)
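Paul's argument can be illustrated with a toy, single-threaded model. This is not kernel code: toy_rcu, toy_call_rcu(), toy_grace_period() and toy_rcu_barrier() are hypothetical names. In the model, a barrier finishes only after every queued callback has been invoked, and callbacks are invoked only when a grace period ends, which requires the grace-period kthread. With no call_rcu() since boot the barrier completes at once. With even one pending callback and no kthread it never does.

```c
#include <stdbool.h>

/* Toy model of the argument above; hypothetical names, NOT the kernel's
 * implementation. gp_kthread_running stands in for
 * rcu_state.gp_kthread != NULL. */
struct toy_rcu {
	bool gp_kthread_running;
	int pending_cbs;	/* callbacks queued by toy_call_rcu() */
};

/* Queue a callback; legal at any time, even before the kthread exists. */
static void toy_call_rcu(struct toy_rcu *r)
{
	r->pending_cbs++;
}

/* End one grace period and invoke all pending callbacks. Only the
 * grace-period kthread can do this, so without it nothing happens. */
static bool toy_grace_period(struct toy_rcu *r)
{
	if (!r->gp_kthread_running)
		return false;
	r->pending_cbs = 0;
	return true;
}

/* Return true if all pending callbacks were invoked within max_gps
 * grace-period attempts; false models a barrier that would hang. */
static bool toy_rcu_barrier(struct toy_rcu *r, int max_gps)
{
	for (int i = 0; i < max_gps && r->pending_cbs > 0; i++)
		toy_grace_period(r);
	return r->pending_cbs == 0;
}
```

The max_gps bound only stands in for "waiting forever". The real rcu_barrier() blocks in wait_for_completion() instead of giving up.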
>
> Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> something else is preventing RCU grace periods from completing, but in
> that case you should see RCU CPU stall warnings. Unless of course they
> have been disabled.

It looks like scheduling is somehow not working here: in rcu_barrier(), if I
replace wait_for_completion() with
wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the hang
persists, so even the 30-second timeout apparently never fires.
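The timeout experiment relies on how a timed wait behaves. Even if the completion is never signaled, the waiter should be woken once the timeout expires. A wait that still never returns therefore suggests that scheduling or timers are broken, rather than the completion itself. A userspace analog built on POSIX pthread_cond_timedwait() shows the expected behavior. struct ucompletion, ucomplete() and uwait_for_completion_timeout() are hypothetical names, not the kernel's completion API. They share only its convention of returning 0 on timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace stand-in for struct completion (hypothetical, not kernel API). */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void ucompletion_init(struct ucompletion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void ucomplete(struct ucompletion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Return 0 on timeout, 1 if completed. Like the kernel version, this only
 * works if the clock advances and the waiter gets woken up again. */
static int uwait_for_completion_timeout(struct ucompletion *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	int completed;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop over spurious wakeups; stop on ETIMEDOUT or any other error. */
	while (!c->done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	completed = c->done;
	pthread_mutex_unlock(&c->lock);
	return completed;
}
```

The kernel's wait_for_completion_timeout() returns the remaining jiffies (at least 1) on success rather than exactly 1.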
> Thanx, Paul
I guess I didn't disable the warnings (I don't even know how to do that :)
$ grep RCU .config
# RCU Subsystem
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_NOCB_CPU=y
# end of RCU Subsystem
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=30
CONFIG_RCU_TRACE=y
CONFIG_RCU_EQS_DEBUG=y
# end of RCU Debugging
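The config above sets a 30-second stall timeout (CONFIG_RCU_CPU_STALL_TIMEOUT=30). Stall warnings can also be turned off at boot with the rcupdate.rcu_cpu_stall_suppress= kernel parameter, so the kernel command line (e.g. /proc/cmdline, or the command line passed to the kdump kernel) is worth checking too. Below is a minimal sketch of such a check. cmdline_suppresses_stalls() is a hypothetical helper. It ignores quoting and the '-'/'_' equivalence that the kernel's parameter parser allows.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define STALL_SUPPRESS_KEY "rcupdate.rcu_cpu_stall_suppress="

/* Hypothetical helper: return true if the kernel command line sets
 * rcupdate.rcu_cpu_stall_suppress to a nonzero value. Parameters are
 * applied in order, so the last occurrence wins. Simplified: no quoting
 * support and no '-'/'_' equivalence in the parameter name. */
static bool cmdline_suppresses_stalls(const char *cmdline)
{
	const size_t keylen = strlen(STALL_SUPPRESS_KEY);
	bool suppressed = false;
	const char *p = cmdline;

	while (*p) {
		/* Skip whitespace separating parameters. */
		while (isspace((unsigned char)*p))
			p++;
		if (*p == '\0')
			break;

		/* Find the end of this parameter. */
		const char *end = p;
		while (*end && !isspace((unsigned char)*end))
			end++;

		/* Only "key=value" with a non-empty value counts. */
		if ((size_t)(end - p) > keylen &&
		    strncmp(p, STALL_SUPPRESS_KEY, keylen) == 0)
			suppressed = strtol(p + keylen, NULL, 0) != 0;

		p = end;
	}
	return suppressed;
}
```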
Thanks,
-- Dexuan