lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250923120344.GA12377@redhat.com>
Date: Tue, 23 Sep 2025 14:03:45 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
	Demi Marie Obenour <demiobenour@...il.com>,
	Mateusz Guzik <mjguzik@...il.com>,
	Christian Brauner <brauner@...nel.org>
Cc: Linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kernel: Prevent prctl(PR_SET_PDEATHSIG) from racing with
 parent process exit

Hi Demi,

(add more CC's)

I think your patch is correct in that it fixes the problems described in
the changelog. Thanks for the detailed explanations.

But I obviously dislike the fact it adds read_lock(tasklist_lock) into
prctl(PR_SET_PDEATHSIG) ;)

For the moment lets forget about the data race. Of course you right, but
this is simple, we can add READ/WRITE_ONCE(pdeath_signal).

Lets only discuss the 2nd problem: the exiting parent can miss
child->pdeath_signal != 0 while the child can still see the old getppid().
I honestly do not know if we really care, I always thought that this API
is ancient/obsolete and "strange". If nothing else, pdeath_signal is sent
even in the case of a threaded reparent...

But OK, lets suppose we need to fix this problem. Not that it matters, but
I guess you didn't hit it in practice?

As you correctly pointed out, forget_original_parent/prctl lack the necessary
barries. So lets add the barriers instead of abusing tasklist? As for sys_prctl(),
I think that ret-to-user-mode + enter-the-kernel-mode should act as a full
barrier, so it only needs WRITE_ONCE()...

Or perhaps user-space can do something else to sync with the exiting parent
instead of using getppid() ?

Oleg.

On 09/22, Andrew Morton wrote:
>
> From: Demi Marie Obenour <demiobenour@...il.com>
> Subject: kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
> Date: Sat, 13 Sep 2025 18:28:49 -0400
>
> If a process calls prctl(PR_SET_PDEATHSIG) at the same time that the
> parent process exits, the child will write to me->pdeath_sig at the same
> time the parent is reading it.  Since there is no synchronization, this is
> a data race.
>
> Worse, it is possible that a subsequent call to getppid() can continue to
> return the previous parent process ID without the parent death signal
> being delivered.  This happens in the following scenario:
>
> parent                                                 child
>
> forget_original_parent()                               prctl(PR_SET_PDEATHSIG, SIGKILL)
>                                                          sys_prctl()
>                                                            me->pdeath_sig = SIGKILL;
>                                                        getppid();
>   RCU_INIT_POINTER(t->real_parent, reaper);
>   if (t->pdeath_signal) /* reads stale me->pdeath_sig */
>            group_send_sig_info(t->pdeath_signal, ...);
>
> And in the following:
>
> parent                                                 child
>
> forget_original_parent()
>     RCU_INIT_POINTER(t->real_parent, reaper);
>     /* also no barrier */
>      if (t->pdeath_signal) /* reads stale me->pdeath_sig */
>              group_send_sig_info(t->pdeath_signal, ...);
>
>                                                        prctl(PR_SET_PDEATHSIG, SIGKILL)
>                                                          sys_prctl()
>                                                            me->pdeath_sig = SIGKILL;
>                                                        getppid(); /* reads old ppid() */
>
> As a result, the following pattern is racy:
>
> 	pid_t parent_pid = getpid();
> 	pid_t child_pid = fork();
> 	if (child_pid == -1) {
> 		/* handle error... */
> 		return;
> 	}
> 	if (child_pid == 0) {
> 		if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
> 			/* handle error */
> 			_exit(126);
> 		}
> 		if (getppid() != parent_pid) {
> 			/* parent died already */
> 			raise(SIGKILL);
> 		}
> 		/* keep going in child */
> 	}
> 	/* keep going in parent */
>
> If the parent is killed at exactly the wrong time, the child process can
> (wrongly) stay running.
>
> I didn't manage to reproduce this in my testing, but I'm pretty sure the
> race is real.  KCSAN is probably the best way to spot the race.
>
> Fix the bug by holding tasklist_lock for reading whenever pdeath_signal is
> being written to.  This prevents races on me->pdeath_sig, and the locking
> and unlocking of the rwlock provide the needed memory barriers.  If
> prctl(PR_SET_PDEATHSIG) happens before the parent exits, the signal will
> be sent.  If it happens afterwards, a subsequent getppid() will return the
> new value.
>
> Link: https://lkml.kernel.org/r/20250913-fix-prctl-pdeathsig-race-v1-1-44e2eb426fe9@gmail.com
> Signed-off-by: Demi Marie Obenour <demiobenour@...il.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
>  kernel/sys.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> --- a/kernel/sys.c~kernel-prevent-prctlpr_set_pdeathsig-from-racing-with-parent-process-exit
> +++ a/kernel/sys.c
> @@ -2533,7 +2533,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
>  			error = -EINVAL;
>  			break;
>  		}
> +		/*
> +		 * Ensure that either:
> +		 *
> +		 * 1. Subsequent getppid() calls reflect the parent process having died.
> +		 * 2. forget_original_parent() will send the new me->pdeath_signal.
> +		 *
> +		 * Also prevent the read of me->pdeath_signal from being a data race.
> +		 */
> +		read_lock(&tasklist_lock);
>  		me->pdeath_signal = arg2;
> +		read_unlock(&tasklist_lock);
>  		break;
>  	case PR_GET_PDEATHSIG:
>  		error = put_user(me->pdeath_signal, (int __user *)arg2);
> _
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ