[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250923120344.GA12377@redhat.com>
Date: Tue, 23 Sep 2025 14:03:45 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Demi Marie Obenour <demiobenour@...il.com>,
Mateusz Guzik <mjguzik@...il.com>,
Christian Brauner <brauner@...nel.org>
Cc: Linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kernel: Prevent prctl(PR_SET_PDEATHSIG) from racing with
parent process exit
Hi Demi,
(add more CC's)
I think your patch is correct in that it fixes the problems described in
the changelog. Thanks for the detailed explanations.
But I obviously dislike the fact it adds read_lock(tasklist_lock) into
prctl(PR_SET_PDEATHSIG) ;)
For the moment lets forget about the data race. Of course you right, but
this is simple, we can add READ/WRITE_ONCE(pdeath_signal).
Lets only discuss the 2nd problem: the exiting parent can miss
child->pdeath_signal != 0 while the child can still see the old getppid().
I honestly do not know if we really care, I always thought that this API
is ancient/obsolete and "strange". If nothing else, pdeath_signal is sent
even in the case of a threaded reparent...
But OK, lets suppose we need to fix this problem. Not that it matters, but
I guess you didn't hit it in practice?
As you correctly pointed out, forget_original_parent/prctl lack the necessary
barries. So lets add the barriers instead of abusing tasklist? As for sys_prctl(),
I think that ret-to-user-mode + enter-the-kernel-mode should act as a full
barrier, so it only needs WRITE_ONCE()...
Or perhaps user-space can do something else to sync with the exiting parent
instead of using getppid() ?
Oleg.
On 09/22, Andrew Morton wrote:
>
> From: Demi Marie Obenour <demiobenour@...il.com>
> Subject: kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
> Date: Sat, 13 Sep 2025 18:28:49 -0400
>
> If a process calls prctl(PR_SET_PDEATHSIG) at the same time that the
> parent process exits, the child will write to me->pdeath_sig at the same
> time the parent is reading it. Since there is no synchronization, this is
> a data race.
>
> Worse, it is possible that a subsequent call to getppid() can continue to
> return the previous parent process ID without the parent death signal
> being delivered. This happens in the following scenario:
>
> parent child
>
> forget_original_parent() prctl(PR_SET_PDEATHSIG, SIGKILL)
> sys_prctl()
> me->pdeath_sig = SIGKILL;
> getppid();
> RCU_INIT_POINTER(t->real_parent, reaper);
> if (t->pdeath_signal) /* reads stale me->pdeath_sig */
> group_send_sig_info(t->pdeath_signal, ...);
>
> And in the following:
>
> parent child
>
> forget_original_parent()
> RCU_INIT_POINTER(t->real_parent, reaper);
> /* also no barrier */
> if (t->pdeath_signal) /* reads stale me->pdeath_sig */
> group_send_sig_info(t->pdeath_signal, ...);
>
> prctl(PR_SET_PDEATHSIG, SIGKILL)
> sys_prctl()
> me->pdeath_sig = SIGKILL;
> getppid(); /* reads old ppid() */
>
> As a result, the following pattern is racy:
>
> pid_t parent_pid = getpid();
> pid_t child_pid = fork();
> if (child_pid == -1) {
> /* handle error... */
> return;
> }
> if (child_pid == 0) {
> if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
> /* handle error */
> _exit(126);
> }
> if (getppid() != parent_pid) {
> /* parent died already */
> raise(SIGKILL);
> }
> /* keep going in child */
> }
> /* keep going in parent */
>
> If the parent is killed at exactly the wrong time, the child process can
> (wrongly) stay running.
>
> I didn't manage to reproduce this in my testing, but I'm pretty sure the
> race is real. KCSAN is probably the best way to spot the race.
>
> Fix the bug by holding tasklist_lock for reading whenever pdeath_signal is
> being written to. This prevents races on me->pdeath_sig, and the locking
> and unlocking of the rwlock provide the needed memory barriers. If
> prctl(PR_SET_PDEATHSIG) happens before the parent exits, the signal will
> be sent. If it happens afterwards, a subsequent getppid() will return the
> new value.
>
> Link: https://lkml.kernel.org/r/20250913-fix-prctl-pdeathsig-race-v1-1-44e2eb426fe9@gmail.com
> Signed-off-by: Demi Marie Obenour <demiobenour@...il.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
> kernel/sys.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> --- a/kernel/sys.c~kernel-prevent-prctlpr_set_pdeathsig-from-racing-with-parent-process-exit
> +++ a/kernel/sys.c
> @@ -2533,7 +2533,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
> error = -EINVAL;
> break;
> }
> + /*
> + * Ensure that either:
> + *
> + * 1. Subsequent getppid() calls reflect the parent process having died.
> + * 2. forget_original_parent() will send the new me->pdeath_signal.
> + *
> + * Also prevent the read of me->pdeath_signal from being a data race.
> + */
> + read_lock(&tasklist_lock);
> me->pdeath_signal = arg2;
> + read_unlock(&tasklist_lock);
> break;
> case PR_GET_PDEATHSIG:
> error = put_user(me->pdeath_signal, (int __user *)arg2);
> _
>
Powered by blists - more mailing lists