Message-ID: <CAGudoHED4nx8QT-yw-zdcUApUyvt2HCOR9c3SQ3tAm9J7Q1jEQ@mail.gmail.com>
Date: Tue, 23 Sep 2025 15:39:06 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Demi Marie Obenour <demiobenour@...il.com>,
Christian Brauner <brauner@...nel.org>,
Linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kernel: Prevent prctl(PR_SET_PDEATHSIG) from racing with
parent process exit
On Tue, Sep 23, 2025 at 2:05 PM Oleg Nesterov <oleg@...hat.com> wrote:
> As you correctly pointed out, forget_original_parent/prctl lack the necessary
> barriers. So let's add the barriers instead of abusing tasklist? As for sys_prctl(),
> I think that ret-to-user-mode + enter-the-kernel-mode should act as a full
> barrier, so it only needs WRITE_ONCE()...
>
So I looked over this and I think I see why you are not eager to fix
the problem to begin with. ;)
I agree with the reluctance to take tasklist_lock to handle
PR_SET_PDEATHSIG, but I wonder whether in practice this is used rarely
enough that the lock trip would not be a problem. It would avoid any
modifications to the exit code path.
By barriers I presume you mean an smp_mb() between
RCU_INIT_POINTER(t->real_parent, reaper) and
READ_ONCE(t->pdeath_signal) in forget_original_parent(). That's very
nasty, as a full fence is quite expensive and it would be executed for
every child. This could be done with just one fence for the entire call
by iterating the list twice (first reparent every child, then issue the
fence, then check every pdeath_signal), but that's still preferably
avoided.
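For illustration, a userspace model of the two-pass variant, using C11
atomics with atomic_thread_fence(memory_order_seq_cst) standing in for
smp_mb(). struct model_task and reparent_two_pass() are hypothetical
names, the children are a flat array rather than the real children list
plus per-thread walk, and the child side is assumed to get its own full
barrier (per Oleg, from the syscall boundary). This is a sketch of the
ordering idea only, not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the task fields involved. */
struct model_task {
	_Atomic int real_parent;	/* models t->real_parent (as a pid) */
	_Atomic int pdeath_signal;	/* models t->pdeath_signal */
};

/*
 * Reparent n children to `reaper` using a single full fence.
 * Signals that would be sent are stored in sigs_out (room for n
 * entries); the return value is how many there are.
 */
static size_t reparent_two_pass(struct model_task *children, size_t n,
				int reaper, int *sigs_out)
{
	size_t nsig = 0;

	assert(n == 0 || (children != NULL && sigs_out != NULL));

	/* Pass 1: every RCU_INIT_POINTER(t->real_parent, reaper). */
	for (size_t i = 0; i < n; i++)
		atomic_store_explicit(&children[i].real_parent, reaper,
				      memory_order_relaxed);

	/* One smp_mb() for the whole list rather than one per child. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Pass 2: every READ_ONCE(t->pdeath_signal). */
	for (size_t i = 0; i < n; i++) {
		int sig = atomic_load_explicit(&children[i].pdeath_signal,
					       memory_order_relaxed);

		if (sig)	/* would be group_send_sig_info(sig, ...) */
			sigs_out[nsig++] = sig;
	}
	return nsig;
}
```

The trade-off is one fence per exit instead of one per child, at the cost
of walking the list a second time.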
> Or perhaps user-space can do something else to sync with the exiting parent
> instead of using getppid() ?
>
I never put any thought into this mechanism, but I do think it
nicely showcases that the prctl at hand is kind of crap. The non-crap
version would take the PID you think your parent is, so that you can
do this race-free. I don't know if it makes any sense to add this.
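To make the "pass the PID you think your parent is" idea concrete, here is
a userspace model of such an operation. Nothing like this exists in the
kernel: set_pdeathsig_checked(), reparent_child(), struct model_child and
the -ESRCH return are all hypothetical, and a pthread mutex stands in for
whatever lock the exit path holds while reparenting. Because the check and
the store happen under the same lock as the reparenting, the caller either
arms the signal while the named parent is still its parent, or learns
immediately that the parent is already gone.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical model of the child's state. */
struct model_child {
	pthread_mutex_t lock;	/* stand-in for the lock held on reparent */
	pid_t ppid;		/* current parent, changed on reparent */
	int pdeath_signal;
};

/* Returns 0 on success, -ESRCH if the parent is no longer expected_ppid. */
static int set_pdeathsig_checked(struct model_child *c, int sig,
				 pid_t expected_ppid)
{
	int ret = 0;

	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	if (c->ppid != expected_ppid)
		ret = -ESRCH;		/* parent already died: caller acts now */
	else
		c->pdeath_signal = sig;	/* exit path reads this under the lock */
	pthread_mutex_unlock(&c->lock);
	return ret;
}

/* Models the exit path: reparent, then report the signal to send (0 = none). */
static int reparent_child(struct model_child *c, pid_t reaper)
{
	int sig;

	pthread_mutex_lock(&c->lock);
	c->ppid = reaper;
	sig = c->pdeath_signal;
	pthread_mutex_unlock(&c->lock);
	return sig;
}
```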
I'm wondering if the fact that tasklist_lock is write-locked in that code
path could be utilized to synchronize this in a manner other than
taking it.
Pseudo-code wise, something like this:
WRITE_ONCE(me->pdeath_signal, arg2);
/* publish the above store and load the lock after */
smp_mb();
/* here spin waiting until tasklist_lock is not write-locked */
smp_rmb();
Unless I'm missing something, this should guarantee that you see
the updated parent, if any.
I don't see an existing routine to do the waiting, though, and knowing
memory barriers there might be some bullshit hiding there that makes this
not work, so it's not my first choice unless someone with more memory
barrier clue can chime in.
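A userspace C11 model of the pseudo-code above, for poking at the idea.
All names are hypothetical: model_tasklist_write_locked is a plain flag
standing in for "tasklist_lock is write-locked", the model assumes a
single writer (so a plain store models taking the lock), and the writer
is given a full fence after taking it. Whether the real write_lock()
provides that last ordering is exactly the open question above, so this
models the intended ordering rather than showing the kernel version works.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel objects in the pseudo-code. */
static atomic_bool model_tasklist_write_locked;	/* writer holds the lock */
static _Atomic int model_pdeath_signal;		/* me->pdeath_signal */
static _Atomic int model_real_parent;		/* me->real_parent, as a pid */

/* Child side: publish the signal, wait out any writer, read the parent. */
static int set_pdeathsig_then_read_parent(int sig)
{
	atomic_store_explicit(&model_pdeath_signal, sig, memory_order_relaxed);
	/* smp_mb(): order the store above before the lock-word loads. */
	atomic_thread_fence(memory_order_seq_cst);
	/* Spin while a writer (the exiting parent) holds the lock. */
	while (atomic_load_explicit(&model_tasklist_write_locked,
				    memory_order_relaxed))
		;
	/* Acquire fence standing in for smp_rmb(): pairs with the release. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&model_real_parent, memory_order_relaxed);
}

/* Exit side (single writer assumed): reparent and return the signal seen. */
static int model_exit_reparent(int reaper)
{
	int sig;

	atomic_store_explicit(&model_tasklist_write_locked, true,
			      memory_order_relaxed);
	/* Assumption: taking the write lock orders like a full fence. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&model_real_parent, reaper, memory_order_relaxed);
	sig = atomic_load_explicit(&model_pdeath_signal, memory_order_relaxed);
	atomic_store_explicit(&model_tasklist_write_locked, false,
			      memory_order_release);
	return sig;
}
```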
>
> On 09/22, Andrew Morton wrote:
> >
> > From: Demi Marie Obenour <demiobenour@...il.com>
> > Subject: kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
> > Date: Sat, 13 Sep 2025 18:28:49 -0400
> >
> > If a process calls prctl(PR_SET_PDEATHSIG) at the same time that the
> > parent process exits, the child will write to me->pdeath_sig at the same
> > time the parent is reading it. Since there is no synchronization, this is
> > a data race.
> >
> > Worse, it is possible that a subsequent call to getppid() can continue to
> > return the previous parent process ID without the parent death signal
> > being delivered. This happens in the following scenario:
> >
> > parent                                        child
> >
> > forget_original_parent()                      prctl(PR_SET_PDEATHSIG, SIGKILL)
> >                                                 sys_prctl()
> >                                                   me->pdeath_sig = SIGKILL;
> >                                                 getppid();
> >   RCU_INIT_POINTER(t->real_parent, reaper);
> >   if (t->pdeath_signal) /* reads stale me->pdeath_sig */
> >     group_send_sig_info(t->pdeath_signal, ...);
> >
> > And in the following:
> >
> > parent                                        child
> >
> > forget_original_parent()
> >   RCU_INIT_POINTER(t->real_parent, reaper);
> >   /* also no barrier */
> >   if (t->pdeath_signal) /* reads stale me->pdeath_sig */
> >     group_send_sig_info(t->pdeath_signal, ...);
> >
> >                                               prctl(PR_SET_PDEATHSIG, SIGKILL)
> >                                                 sys_prctl()
> >                                                   me->pdeath_sig = SIGKILL;
> >                                                 getppid(); /* reads old ppid() */
> >
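Both scenarios are an instance of the classic store-buffering pattern: the
child stores pdeath_signal and then loads real_parent (via getppid()),
while the parent stores real_parent and then loads pdeath_signal. Without
a full barrier on both sides, both loads may return the old value. Below
is a userspace C11 model with hypothetical names, threads standing in for
the two tasks, and atomic_thread_fence(memory_order_seq_cst) standing in
both for the proposed smp_mb() and for the assumed barrier at the syscall
boundary. With both fences, C11 forbids the outcome where neither side
sees the other's store, so the count should stay at zero; dropping the
parent's fence makes that outcome permitted, though not necessarily
observable in any given run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model flags: 0 = old value, 1 = new value. */
static _Atomic int sig_armed;		/* child: me->pdeath_signal = SIGKILL */
static _Atomic int parent_changed;	/* parent: RCU_INIT_POINTER(real_parent) */
static int child_saw_reaper;		/* child's getppid() saw the new parent */
static int parent_saw_sig;		/* parent's pdeath_signal check saw it */

static void *child_side(void *unused)
{
	(void)unused;
	atomic_store_explicit(&sig_armed, 1, memory_order_relaxed);
	/* Assumed full barrier from syscall exit + entry (prctl -> getppid). */
	atomic_thread_fence(memory_order_seq_cst);
	child_saw_reaper = atomic_load_explicit(&parent_changed,
						memory_order_relaxed);
	return NULL;
}

static void *parent_side(void *unused)
{
	(void)unused;
	atomic_store_explicit(&parent_changed, 1, memory_order_relaxed);
	/* The smp_mb() proposed for forget_original_parent(). */
	atomic_thread_fence(memory_order_seq_cst);
	parent_saw_sig = atomic_load_explicit(&sig_armed, memory_order_relaxed);
	return NULL;
}

/* Run the race `iters` times; count runs where both sides read stale data. */
static size_t count_lost_pdeathsig(size_t iters)
{
	size_t lost = 0;

	for (size_t i = 0; i < iters; i++) {
		pthread_t c, p;

		atomic_store(&sig_armed, 0);
		atomic_store(&parent_changed, 0);
		if (pthread_create(&c, NULL, child_side, NULL) != 0 ||
		    pthread_create(&p, NULL, parent_side, NULL) != 0)
			abort();
		pthread_join(c, NULL);
		pthread_join(p, NULL);
		/* The bug: no signal sent and getppid() still old. */
		if (!child_saw_reaper && !parent_saw_sig)
			lost++;
	}
	return lost;
}
```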
> > As a result, the following pattern is racy:
> >
> > pid_t parent_pid = getpid();
> > pid_t child_pid = fork();
> > if (child_pid == -1) {
> >     /* handle error... */
> >     return;
> > }
> > if (child_pid == 0) {
> >     if (prctl(PR_SET_PDEATHSIG, SIGKILL) != 0) {
> >         /* handle error */
> >         _exit(126);
> >     }
> >     if (getppid() != parent_pid) {
> >         /* parent died already */
> >         raise(SIGKILL);
> >     }
> >     /* keep going in child */
> > }
> > /* keep going in parent */
> >
> > If the parent is killed at exactly the wrong time, the child process can
> > (wrongly) stay running.
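The quoted userspace pattern, wrapped into a self-contained function for
experimenting. spawn_with_pdeathsig() and example_child_work() are
hypothetical names; the only changes are that the child's work is passed
as a callback and SIGKILL is cast to unsigned long for the variadic
prctl(). It remains exposed to the kernel race described above; this only
shows the userspace side of the check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder for whatever the child actually does. */
static int example_child_work(void)
{
	return 0;
}

/*
 * Returns the child's pid in the parent (or -1 if fork() failed);
 * the child never returns from this function.
 */
static pid_t spawn_with_pdeathsig(int (*child_work)(void))
{
	pid_t parent_pid = getpid();	/* taken before fork(), in the parent */
	pid_t child_pid = fork();

	if (child_pid != 0)
		return child_pid;

	if (prctl(PR_SET_PDEATHSIG, (unsigned long)SIGKILL) != 0)
		_exit(126);
	/* Parent already gone before the prctl() took effect: die now. */
	if (getppid() != parent_pid)
		raise(SIGKILL);
	/* Without a kernel fix, the check above can still be fooled. */
	_exit(child_work());
}
```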
> >
> > I didn't manage to reproduce this in my testing, but I'm pretty sure the
> > race is real. KCSAN is probably the best way to spot the race.
> >
> > Fix the bug by holding tasklist_lock for reading whenever pdeath_signal is
> > being written to. This prevents races on me->pdeath_sig, and the locking
> > and unlocking of the rwlock provide the needed memory barriers. If
> > prctl(PR_SET_PDEATHSIG) happens before the parent exits, the signal will
> > be sent. If it happens afterwards, a subsequent getppid() will return the
> > new value.
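A userspace model of why the patch's approach closes the race, with a
pthread rwlock as a hypothetical stand-in for tasklist_lock and C11
atomics for the fields. The read-side and write-side critical sections
cannot overlap, so one completes first: if the exit path goes first, its
unlock happens-before the child's read lock, and the child's later
getppid() sees the new parent; if prctl goes first, the exit path sees
the new pdeath_signal. Either way, the "no signal and stale parent" count
stays at zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: the rwlock stands in for tasklist_lock. */
static pthread_rwlock_t model_tasklist_lock = PTHREAD_RWLOCK_INITIALIZER;
static _Atomic int model_pdeath_signal;	/* me->pdeath_signal */
static _Atomic int model_ppid;		/* what getppid() would return */
static int child_ppid_seen;		/* child's getppid() result */
static int parent_sig_seen;		/* what forget_original_parent() read */

static void *prctl_side(void *unused)
{
	(void)unused;
	/* The patch: store pdeath_signal under read_lock(&tasklist_lock). */
	pthread_rwlock_rdlock(&model_tasklist_lock);
	atomic_store_explicit(&model_pdeath_signal, 9, memory_order_relaxed);
	pthread_rwlock_unlock(&model_tasklist_lock);
	/* getppid() itself does not take the lock. */
	child_ppid_seen = atomic_load_explicit(&model_ppid, memory_order_relaxed);
	return NULL;
}

static void *exit_side(void *unused)
{
	(void)unused;
	/* The exit path reparents with tasklist_lock held for writing. */
	pthread_rwlock_wrlock(&model_tasklist_lock);
	atomic_store_explicit(&model_ppid, 1, memory_order_relaxed);
	parent_sig_seen = atomic_load_explicit(&model_pdeath_signal,
					       memory_order_relaxed);
	pthread_rwlock_unlock(&model_tasklist_lock);
	return NULL;
}

/* Count runs where no signal was seen AND getppid() still saw pid 100. */
static size_t count_lost_with_rwlock(size_t iters)
{
	size_t lost = 0;

	for (size_t i = 0; i < iters; i++) {
		pthread_t c, p;

		atomic_store(&model_pdeath_signal, 0);
		atomic_store(&model_ppid, 100);	/* the original parent */
		if (pthread_create(&c, NULL, prctl_side, NULL) != 0 ||
		    pthread_create(&p, NULL, exit_side, NULL) != 0)
			abort();
		pthread_join(c, NULL);
		pthread_join(p, NULL);
		if (child_ppid_seen == 100 && parent_sig_seen == 0)
			lost++;
	}
	return lost;
}
```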
> >
> > Link: https://lkml.kernel.org/r/20250913-fix-prctl-pdeathsig-race-v1-1-44e2eb426fe9@gmail.com
> > Signed-off-by: Demi Marie Obenour <demiobenour@...il.com>
> > Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> > ---
> >
> > kernel/sys.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > --- a/kernel/sys.c~kernel-prevent-prctlpr_set_pdeathsig-from-racing-with-parent-process-exit
> > +++ a/kernel/sys.c
> > @@ -2533,7 +2533,17 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
> > error = -EINVAL;
> > break;
> > }
> > + /*
> > + * Ensure that either:
> > + *
> > + * 1. Subsequent getppid() calls reflect the parent process having died.
> > + * 2. forget_original_parent() will send the new me->pdeath_signal.
> > + *
> > + * Also prevent the read of me->pdeath_signal from being a data race.
> > + */
> > + read_lock(&tasklist_lock);
> > me->pdeath_signal = arg2;
> > + read_unlock(&tasklist_lock);
> > break;
> > case PR_GET_PDEATHSIG:
> > error = put_user(me->pdeath_signal, (int __user *)arg2);
> > _
> >
>