Message-ID: <20240610000726.146177-1-fuweid89@gmail.com>
Date: Mon, 10 Jun 2024 08:07:26 +0800
From: Wei Fu <fuweid89@...il.com>
To: oleg@...hat.com
Cc: Sudhanva.Huruli@...rosoft.com,
akpm@...ux-foundation.org,
apais@...ux.microsoft.com,
axboe@...nel.dk,
boqun.feng@...il.com,
brauner@...nel.org,
ebiederm@...ssion.com,
frederic@...nel.org,
fuweid89@...il.com,
j.granados@...sung.com,
jiangshanlai@...il.com,
joel@...lfernandes.org,
josh@...htriplett.org,
linux-kernel@...r.kernel.org,
mathieu.desnoyers@...icios.com,
michael.christie@...cle.com,
mjguzik@...il.com,
neeraj.upadhyay@...nel.org,
paulmck@...nel.org,
qiang.zhang1211@...il.com,
rachelmenge@...ux.microsoft.com,
rcu@...r.kernel.org,
rostedt@...dmis.org,
weifu@...rosoft.com
Subject: Re: [RCU] zombie task hung in synchronize_rcu_expedited
>
> On 06/07, Oleg Nesterov wrote:
> >
> > On 06/07, Wei Fu wrote:
> > >
> > > Yes. I applied your patch to v5.15.160 and ran the reproducer for 5 hours.
> > > I didn't see this issue. So far it looks good! I will continue the test
> > > over the weekend.
> >
> > Great, thanks!
> >
> > > In your last reply, you mentioned that TIF_NOTIFY_SIGNAL is related to the
> > > busy-wait loop. Would you please explain why clearing the flag works here?
> >
> > Sure, I'll write the changelog with the explanation and send the patch over
> > the weekend, if it passes your testing.
>
> Please see the patch I've sent. The changelog doesn't bother to describe this
> particular problem because busy-waiting can obviously cause multiple problems,
> especially without CONFIG_PREEMPT or if rt_task().
>
> So let me add more details about this particular deadlock here.
>
> The sub-namespace init task T spins in a tight loop calling kernel_wait4()
> which returns -EINTR without sleeping because its child C has not exited
> yet and signal_pending(T) is true due to TIF_NOTIFY_SIGNAL.
>
> The exiting child C sleeps in synchronize_rcu(), which hangs exactly because
> T never calls schedule()/rcu_note_context_switch(), and T can't be preempted
> because CONFIG_PREEMPT is not enabled.
>
> Note also that without PREEMPT_RCU, __rcu_read_lock() is just preempt_disable(),
> which is a nop without CONFIG_PREEMPT.
>
> Oleg.
>
>
Thanks for the update. That's really helpful!
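
To check my understanding of the explanation above, here is a rough userspace
model of the loop. It is only a sketch, not kernel code: struct model_task and
the model_*() helpers are made-up names for illustration, the notify_signal
field merely stands in for TIF_NOTIFY_SIGNAL, and "sleeping" is reduced to a
counter. It assumes the wait returns -EINTR without sleeping while the flag is
set, and that the child can only finish exiting once the waiter has reached a
sleep point.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_task {
	bool notify_signal;	/* stands in for TIF_NOTIFY_SIGNAL */
	int children_left;	/* children that have not exited yet */
	int schedule_calls;	/* times the task reached a sleep point */
};

/* Models a wait that bails out early while a signal is pending. */
static int model_wait(struct model_task *t)
{
	if (t->children_left == 0)
		return -ECHILD;		/* nothing left to reap */
	if (t->notify_signal)
		return -EINTR;		/* returns without sleeping */
	t->schedule_calls++;		/* would sleep here (context switch) */
	t->children_left--;		/* the child can now finish exiting */
	return 0;
}

/*
 * Models the reaping loop, bounded by max_iters so that the stuck case
 * terminates in this model. Returns the number of iterations that ran
 * before -ECHILD was seen, or max_iters if the loop never got there.
 */
static int model_reap(struct model_task *t, int max_iters, bool clear_flag)
{
	int i;

	if (clear_flag)
		t->notify_signal = false;	/* the flag-clear discussed above */

	for (i = 0; i < max_iters; i++) {
		if (model_wait(t) == -ECHILD)
			return i;
	}
	return max_iters;
}
```

With the flag left set, model_reap() uses up every iteration and
schedule_calls stays at 0, which corresponds to T spinning without ever
giving RCU a quiescent state. With clear_flag set, the first wait reaches the
sleep point and the loop ends on the next call.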
Wei