Date: Sat, 8 Jun 2024 14:42:37 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Wei Fu <fuweid89@...il.com>
Cc: Sudhanva.Huruli@...rosoft.com, akpm@...ux-foundation.org,
	apais@...ux.microsoft.com, axboe@...nel.dk, boqun.feng@...il.com,
	brauner@...nel.org, ebiederm@...ssion.com, frederic@...nel.org,
	j.granados@...sung.com, jiangshanlai@...il.com,
	joel@...lfernandes.org, josh@...htriplett.org,
	linux-kernel@...r.kernel.org, mathieu.desnoyers@...icios.com,
	michael.christie@...cle.com, mjguzik@...il.com,
	neeraj.upadhyay@...nel.org, paulmck@...nel.org,
	qiang.zhang1211@...il.com, rachelmenge@...ux.microsoft.com,
	rcu@...r.kernel.org, rostedt@...dmis.org, weifu@...rosoft.com
Subject: Re: [RCU] zombie task hung in synchronize_rcu_expedited

On 06/07, Oleg Nesterov wrote:
>
> On 06/07, Wei Fu wrote:
> >
> > Yes. I applied your patch on v5.15.160 and run reproducer for 5 hours.
> > I didn't see this issue. Currently, it looks good!. I will continue that test
> > on this weekend.
>
> Great, thanks!
>
> > In last reply, you mentioned TIF_NOTIFY_SIGNAL related to busy-wait loop.
> > Would you please explain why flag-clear works here?
>
> Sure, I'll write the changelog with the explanation and send the patch on
> weekend. If it passes your testing.

Please see the patch I've sent. The changelog doesn't bother to describe this
particular problem because busy-waiting can obviously cause multiple problems,
especially without CONFIG_PREEMPT or if rt_task().

So let me add more details about this particular deadlock here.

The sub-namespace init task T spins in a tight loop calling kernel_wait4()
which returns -EINTR without sleeping because its child C has not exited
yet and signal_pending(T) is true due to TIF_NOTIFY_SIGNAL.

The exiting child C sleeps in synchronize_rcu(), which hangs precisely because
T never calls schedule()/rcu_note_context_switch(), and T can't be preempted
because CONFIG_PREEMPT is not enabled.

Note also that without PREEMPT_RCU, __rcu_read_lock() is just preempt_disable(),
which is a nop without CONFIG_PREEMPT.
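
To make the spin concrete, here is a toy userspace model of the loop described
above. It is not kernel code: every toy_* name is hypothetical, the two bools
merely stand in for TIF_SIGPENDING and TIF_NOTIFY_SIGNAL, and "sleeping" is
modelled as the only point where the child can make progress and exit. It
assumes the reaping loop clears only the first flag unless asked otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the sub-namespace init task T. */
struct toy_task {
	bool sigpending;	/* models TIF_SIGPENDING */
	bool notify_signal;	/* models TIF_NOTIFY_SIGNAL */
	int children_alive;	/* children that have not exited yet */
};

/* Models signal_pending(): true if either flag is set. */
static bool toy_signal_pending(const struct toy_task *t)
{
	return t->sigpending || t->notify_signal;
}

/*
 * Models the wait: with a pending signal and a live child it returns
 * -EINTR without sleeping. Otherwise it "sleeps", which in this model
 * is the only way the child gets to finish and exit.
 */
static int toy_wait4(struct toy_task *t)
{
	if (t->children_alive == 0)
		return -ECHILD;
	if (toy_signal_pending(t))
		return -EINTR;		/* no sleep, no context switch */
	t->children_alive--;		/* slept; child exited meanwhile */
	return 0;
}

/*
 * Models the reaping loop. Returns how many iterations spun without
 * sleeping, giving up after max_spins so the model terminates.
 */
static int toy_reap_loop(struct toy_task *t, bool clear_notify, int max_spins)
{
	int spins = 0;
	int rc;

	assert(max_spins > 0);
	do {
		t->sigpending = false;
		if (clear_notify)
			t->notify_signal = false;
		rc = toy_wait4(t);
		if (rc == -EINTR && ++spins >= max_spins)
			break;		/* the real loop would spin forever */
	} while (rc != -ECHILD);

	return spins;
}
```

With notify_signal set and clear_notify false, toy_reap_loop() burns through
all max_spins iterations without ever sleeping. With clear_notify true, the
first wait sleeps, the child exits, and the loop ends on -ECHILD.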

Oleg.

