Message-ID: <87a5jpqamx.fsf@email.froward.int.ebiederm.org>
Date: Thu, 13 Jun 2024 07:40:06 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Rachel Menge
<rachelmenge@...ux.microsoft.com>, linux-kernel@...r.kernel.org,
rcu@...r.kernel.org, Wei Fu <fuweid89@...il.com>,
apais@...ux.microsoft.com, Sudhanva Huruli
<Sudhanva.Huruli@...rosoft.com>, Jens Axboe <axboe@...nel.dk>, Christian
Brauner <brauner@...nel.org>, Mike Christie
<michael.christie@...cle.com>, Joel Granados <j.granados@...sung.com>,
Mateusz Guzik <mjguzik@...il.com>, "Paul E. McKenney"
<paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>, Neeraj
Upadhyay <neeraj.upadhyay@...nel.org>, Joel Fernandes
<joel@...lfernandes.org>, Josh Triplett <josh@...htriplett.org>, Boqun
Feng <boqun.feng@...il.com>, Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Lai Jiangshan
<jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>
Subject: Re: [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along
with TIF_SIGPENDING
Oleg Nesterov <oleg@...hat.com> writes:
> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
>
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.
I took a look through the code. It used to be that TIF_NOTIFY_SIGNAL
was all about waking up a task so that task_work_run can be used.
io_uring still mostly uses it that way. There is also a use in
kthread_stop that just uses it as a TIF_SIGPENDING without having a
pending signal.
At the point in do_exit where exit_notify and thus zap_pid_ns_processes
is called I can't possibly see a use for TIF_NOTIFY_SIGNAL.
exit_task_work, exit_signals, and io_uring_cancel have all been called.
So TIF_NOTIFY_SIGNAL should be spurious at this point and safe to clear.
Why it remains set is a mystery to me.
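For illustration only, here is a rough user-space model of the loop (not
kernel code). Every model_* name and the MODEL_TIF_* constants are
made-up stand-ins, and model_wait4() is a simplification: it pretends
that a call which would have slept returns once one child has exited.
It assumes, as the patch description says, that the wait returns -EINTR
without sleeping while signal_pending() is true.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the two kernel thread flags. */
enum {
	MODEL_TIF_SIGPENDING    = 1u << 0,
	MODEL_TIF_NOTIFY_SIGNAL = 1u << 1,
};

struct model_task {
	unsigned int flags;	/* pending-work flags of the reaping task */
	int children;		/* children that have not been reaped yet */
};

/* Assumption: signal_pending() is true if either flag is set. */
static bool model_signal_pending(const struct model_task *t)
{
	return (t->flags & (MODEL_TIF_SIGPENDING | MODEL_TIF_NOTIFY_SIGNAL)) != 0;
}

/*
 * Simplified wait: -ECHILD when no children are left, -EINTR instead of
 * sleeping when a signal looks pending.  Otherwise pretend we slept
 * until one child exited and reap it.
 */
static int model_wait4(struct model_task *t)
{
	if (t->children == 0)
		return -ECHILD;
	if (model_signal_pending(t))
		return -EINTR;
	t->children--;
	return 0;
}

/*
 * Run the loop for at most max_iters iterations (at least one) and
 * return how many calls came back with -EINTR, i.e. how often the loop
 * spun without sleeping.  clear_notify selects the patched behaviour.
 */
static int model_zap_loop(struct model_task *t, bool clear_notify, int max_iters)
{
	int spins = 0;
	int rc;

	do {
		t->flags &= ~MODEL_TIF_SIGPENDING;
		if (clear_notify)
			t->flags &= ~MODEL_TIF_NOTIFY_SIGNAL;
		rc = model_wait4(t);
		if (rc == -EINTR)
			spins++;
	} while (rc != -ECHILD && --max_iters > 0);

	return spins;
}
```

With both flags set and children remaining, calling model_zap_loop() with
clear_notify == false spins until the iteration cap, while
clear_notify == true reaps every child and ends on -ECHILD.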
If I had infinite time and energy the ideal would be to rework the pid
namespace exit logic so that waiting for everything to exit works like
delay_group_leader in wait_consider_task. That is, simply block reaping of
the pid namespace leader until everything in the pid namespace has been
reaped. I think acct_exit_ns is the only piece of code that needs
to be moved to allow that, and acct_exit_ns is purely bookkeeping so
does not affect userspace visible semantics.
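A very rough user-space sketch of that idea, purely to show its shape.
None of these names exist in the kernel; struct model_pid_ns and the
model_* helpers are hypothetical, and the real change would live in the
wait and exit paths rather than in anything like this.

```c
#include <stdbool.h>

/* Hypothetical model of a pid namespace as seen by a waiter. */
struct model_pid_ns {
	int others;		/* unreaped tasks other than the leader */
	bool leader_dead;	/* leader has exited and is a zombie */
	bool leader_reaped;	/* leader has already been reaped */
};

/* Loose analogue of delay_group_leader(): hold the leader back while others remain. */
static bool model_delay_ns_leader(const struct model_pid_ns *ns)
{
	return ns->others > 0;
}

/*
 * What a waiter in the parent namespace would do when it considers the
 * leader.  Returns true only when the leader was actually reaped.
 */
static bool model_reap_leader(struct model_pid_ns *ns)
{
	if (!ns->leader_dead || ns->leader_reaped)
		return false;
	if (model_delay_ns_leader(ns))
		return false;	/* not eligible yet; the waiter just sleeps */
	ns->leader_reaped = true;
	return true;
}

/* Reaping any other task; the last one makes the leader eligible. */
static void model_reap_other(struct model_pid_ns *ns)
{
	if (ns->others > 0)
		ns->others--;
}
```

In this model the leader never loops waiting for its children. It simply
stays unreapable until model_reap_other() has brought the count to zero.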
This active waiting is weird and non-standard in the kernel, and because
of that it winds up causing a problem every couple of years.
>
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <rachelmenge@...ux.microsoft.com>
> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.microsoft.com/
> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
> ---
> kernel/pid_namespace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
>  	 */
>  	do {
>  		clear_thread_flag(TIF_SIGPENDING);
> +		clear_thread_flag(TIF_NOTIFY_SIGNAL);
>  		rc = kernel_wait4(-1, NULL, __WALL, NULL);
>  	} while (rc != -ECHILD);