lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun,  9 Jun 2024 22:12:44 +0800
From: Wei Fu <fuweid89@...il.com>
To: oleg@...hat.com
Cc: Sudhanva.Huruli@...rosoft.com,
	akpm@...ux-foundation.org,
	apais@...ux.microsoft.com,
	axboe@...nel.dk,
	boqun.feng@...il.com,
	brauner@...nel.org,
	frederic@...nel.org,
	fuweid89@...il.com,
	j.granados@...sung.com,
	jiangshanlai@...il.com,
	joel@...lfernandes.org,
	josh@...htriplett.org,
	linux-kernel@...r.kernel.org,
	mathieu.desnoyers@...icios.com,
	michael.christie@...cle.com,
	mjguzik@...il.com,
	neeraj.upadhyay@...nel.org,
	paulmck@...nel.org,
	qiang.zhang1211@...il.com,
	rachelmenge@...ux.microsoft.com,
	rcu@...r.kernel.org,
	rostedt@...dmis.org
Subject: [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING

> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
> 
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.
> 
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <rachelmenge@...ux.microsoft.com>
> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
> Signed-off-by: Oleg Nesterov <oleg@...hat.com>

Tested-By: Wei Fu <fuweid89@...il.com>

This change looks good to me!

I used [rcudeadlock-v1][1] to verify this patch on v5.15.160 for more than 30
hours. The soft lockup didn't show up. If there is no such patch, that
test will trigger soft-lockup in 10 minutes.

```
root@(none):/# uname -a
Linux (none) 5.15.160-dirty #7 SMP Fri Jun 7 15:25:30 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

root@(none):/# ps -ef | grep rcu
root 3 2 0 Jun07 ? 00:00:00 [rcu_gp]
root 4 2 0 Jun07 ? 00:00:00 [rcu_par_gp]
root 11 2 0 Jun07 ? 00:00:00 [rcu_tasks_rude_]
root 12 2 0 Jun07 ? 00:00:00 [rcu_tasks_trace]
root 15 2 0 Jun07 ? 00:03:31 [rcu_sched]
root 145 141 0 Jun07 ? 00:15:29 ./rcudeadlock
root 5372 141 0 13:37 ? 00:00:00 grep rcu

root@(none):/# date
Sun Jun 9 13:37:38 UTC 2024
```


I used [rcudeadlock-v2][2] to verify this patch on v6.10-rc2 for more than 2
hours. The soft lockup didn't show up. If there is no such patch, that
test will trigger soft-lockup in 1 minute.

```
root@(none):/# uname -a
Linux (none) 6.10.0-rc2-dirty #4 SMP Sun Jun 9 11:19:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

root@(none):/# ps -ef | grep rcu
root 4 2 0 11:20 ? 00:00:00 [kworker/R-rcu_g]
root 13 2 0 11:20 ? 00:00:00 [rcu_tasks_rude_kthread]
root 14 2 0 11:20 ? 00:00:00 [rcu_tasks_trace_kthread]
root 16 2 0 11:20 ? 00:00:03 [rcu_sched]
root 17 2 0 11:20 ? 00:00:00 [rcu_exp_par_gp_kthread_worker/0]
root 18 2 0 11:20 ? 00:00:12 [rcu_exp_gp_kthread_worker]
root 117 108 0 11:21 ? 00:01:06 ./rcudeadlock
root 14451 108 0 13:37 ? 00:00:00 grep rcu

root@(none):/# date
Sun Jun 9 13:37:15 UTC 2024
```


It's about data-race during cleanup active iou-wrk-thread. I shares that idea
about how to verify this patch.

> ---
>  kernel/pid_namespace.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	 */
>  	do {
>  		clear_thread_flag(TIF_SIGPENDING);
> +		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>  		rc = kernel_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);
>  
> -- 
> 2.25.1.362.g51ebf55
> 
> 
> 

Let's assume that there is new pid namespace unshared by host pid namespace,
named by `PA`. There are two processes in `PA`. The init process is named by
`X` and its child is named by `Y`.

```
unshare(CLONE_NEWPID|CLONE_NEWNS)

	X
	|__ Y
```

The main-thread of process X creates one active iouring worker thread
`iou-wrk-X`. When process X exits, that main-thread of process X wakes up and
set `TIF_NOTIFY_SIGNAL` flag on `iou-wrk-X` thread.

However, if `iou-wrk-X` thread receives signal from main-thread and wakes up,
that thread isn't able to clear `TIF_NOTIFY_SIGNAL` flag. And that `iou-wrk-X`
thread is last thread in process-X and it will carry `TIF_NOTIFY_SIGNAL` flag
to enter `zap_pid_ns_processes`. It can be described by the following comment.


```
== X main-thread ==		== X iou-wrk-X ==	== Y main-thread ==

do_exit
  kill iou-wrk-X thread
  io_uring_files_cancel		io_wq_worker
    set TIF_NOTIFY_SIGNAL on
     iou-wrk-X thread
				do_exit(0)
  exit_task_namespace		

				  exit_task_namespace
  do_task_dead

				  exit_notify
				    forget_original_parent
				      find_child_reaper
				        zap_pid_ns_processes	do_exit
								  exit_task_namespace
								    ...
								    namespace_unlock
								      synchronize_rcu_expedited
```


The `iou-wrk-X` thread kills process-Y which is only one holding the mount
namespace reference. The process-Y will get into `synchronize_rcu_expedited`.

Since kernel doesn't enable preempt and `iou-wrk-X` thread has
`TIF_NOTIFY_SIGNAL` flag, the `iou-wrk-X` thread will get into infinity loop,
which cause soft lockup.

So, in [rcudeadlock-v2][2] test, I create more active iou-wrk- threads in
init process so that there is high chance to have iou-wrk- thread in
`zap_pid_ns_processes` function.

Hope it can help.
Thanks,

Wei

[1]: https://github.com/rlmenge/rcu-soft-lock-issue-repro/blob/662b8e414ff15d75419e2286b8121b7c2049a37c/rcudeadlock.go#L1
[2]: https://github.com/rlmenge/rcu-soft-lock-issue-repro/pull/1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ