[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZXIGZq18Bb6LK1qt@tycho.pizza>
Date: Thu, 7 Dec 2023 10:52:38 -0700
From: Tycho Andersen <tycho@...ho.pizza>
To: Christian Brauner <brauner@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
Tycho Andersen <tandersen@...flix.com>,
Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders
On Thu, Dec 07, 2023 at 06:21:18PM +0100, Christian Brauner wrote:
> [Cc fsdevel & Jan because we had some discussions about fanotify
> returning non-thread-group pidfds. That's just for awareness or in case
> this might need special handling.]
>
> On Thu, Nov 30, 2023 at 09:39:44AM -0700, Tycho Andersen wrote:
> > From: Tycho Andersen <tandersen@...flix.com>
> >
> > We are using the pidfd family of syscalls with the seccomp userspace
> > notifier. When some thread triggers a seccomp notification, we want to do
> > some things to its context (munge fd tables via pidfd_getfd(), maybe write
> > to its memory, etc.). However, threads created with ~CLONE_FILES or
> > ~CLONE_VM mean that we can't use the pidfd family of syscalls for this
> > purpose, since their fd table or mm are distinct from the thread group
> > leader's. In this patch, we relax this restriction for pidfd_open().
> >
> > In order to avoid dangling poll() users we need to notify pidfd waiters
> > when individual threads die, but once we do that all the other machinery
> > seems to work ok viz. the tests. But I suppose there are more cases than
> > just this one.
> >
> > Another weirdness is the open-coding of this vs. exporting using
> > do_notify_pidfd(). This particular location is after __exit_signal() is
> > called, which does __unhash_process() which kills ->thread_pid, so we need
> > to use the copy we have locally, vs do_notify_pid() which accesses it via
> > task_pid(). Maybe this suggests that the notification should live somewhere
> > in __exit_signals()? I just put it here because I saw we were already
> > testing if this task was the leader.
> >
> > Signed-off-by: Tycho Andersen <tandersen@...flix.com>
> > ---
>
> So we've always said that if there's a use-case for this then we're
> willing to support it. And I think that stance hasn't changed. I know
> that others have expressed interest in this as well.
>
> So currently the series only enables pidfds for threads to be created
> and allows notifications for threads. But all places that currently make
> use of pidfds refuse non-thread-group leaders. We can certainly proceed
> with a patch series that only enables creation and exit notification but
> we should also consider unlocking additional functionality:
>
> * audit of all callers that use pidfd_get_task()
>
> (1) process_madvise()
> (2) process_mrlease()
>
> I expect that both can handle threads just fine but we'd need an Ack
> from mm people.
>
> * pidfd_prepare() is used to create pidfds for:
>
> (1) CLONE_PIDFD via clone() and clone3()
> (2) SCM_PIDFD and SO_PEERPIDFD
> (3) fanotify
>
> (1) is what this series here is about.
>
> For (2) we need to check whether fanotify would be ok to handle pidfds
> for threads. It might be fine but Jan will probably know more.
>
> For (3) the change doesn't matter because SCM_CREDS always use the
> thread-group leader. So even if we allowed the creation of pidfds for
> threads it wouldn't matter.
> * audit all callers of pidfd_pid() whether they could simply be switched
> to handle individual threads:
>
> (1) setns() handles threads just fine so this is safe to allow.
> (2) pidfd_getfd() I would like to keep restricted and essentially
> freeze new features for it.
>
> I'm not happy that we did didn't just implement it as an ioctl to
> the seccomp notifier. And I wouldn't oppose a patch that would add
> that functionality to the seccomp notifier itself. But that's a
> separate topic.
> (3) pidfd_send_signal(). I think that one is the most interesting on
> to allow signaling individual threads. I'm not sure that you need
> to do this right now in this patch but we need to think about what
> we want to do there.
This all sounds reasonable to me, I can take a look as time permits.
pidfd_send_signal() at the very least would have been useful while
writing these tests.
Tycho
Powered by blists - more mailing lists