linux-kernel - Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220502160415.GA1289934@ircssh-3.c.rugged-nimbus-611.internal>
Date:   Mon, 2 May 2022 16:04:15 +0000
From:   Sargun Dhillon <sargun@...gun.me>
To:     Rodrigo Campos <rodrigo@...volk.io>
Cc:     Kees Cook <keescook@...omium.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        Will Drewry <wad@...omium.org>,
        Andy Lutomirski <luto@...capital.net>,
        Alban Crequy <alban@...volk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp
 user notifier

On Mon, May 02, 2022 at 04:15:07PM +0200, Rodrigo Campos wrote:
> On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@...gun.me> wrote:
> > diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
> > index 539e9d4a4860..204cf5ba511a 100644
> > --- a/Documentation/userspace-api/seccomp_filter.rst
> > +++ b/Documentation/userspace-api/seccomp_filter.rst
> > @@ -271,6 +271,14 @@ notifying process it will be replaced. The supervisor can also add an FD, and
> >  respond atomically by using the ``SECCOMP_ADDFD_FLAG_SEND`` flag and the return
> >  value will be the injected file descriptor number.
> >
> > +The notifying process can be preempted, resulting in the notification being
> > +aborted. This can be problematic when trying to take actions on behalf of the
> > +notifying process that are long-running and typically retryable (mounting a
> > +filesytem). Alternatively, the at filter installation time, the
> > +``SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV`` flag can be set. This flag makes it
> > +such that when a user notification is received by the supervisor, the notifying
> > +process will ignore non-fatal signals until the response is sent.
> 
> Maybe:
> 
> This flags ignores non-fatal signals that arrive after the supervisor
> received the notification
> 
> I mean, I want to make it clear that if a signal arrives before the
> notification was received by the supervisor, then it will be
> interrupted anyways.
> 
> 
Added: Signals that are sent prior to the notification being received by 
userspace are handled normally.

> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index db10e73d06e0..9291b0843cb2 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -1485,6 +1512,9 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
> >                 mutex_lock(&filter->notify_lock);
> >                 knotif = find_notification(filter, unotif.id);
> >                 if (knotif) {
> > +                       /* Reset the process to make sure it's not stuck */
> > +                       if (should_sleep_killable(filter, knotif))
> > +                               complete(&knotif->ready);
> >                         knotif->state = SECCOMP_NOTIFY_INIT;
> >                         up(&filter->notif->request);
> 
> (I couldn't git-am this locally, so maybe I'm injecting this at the
> wrong parts mentally when looking at the other code for more context.
> Sorry if that is the case :))
> 
> Why do we need to complete() only in this error path? As far as I can
> see this is on the error path where the copy to userspace failed and
> we want to reset this notification.
This error path acts as follows
(Say, S: Supervisor, P: Notifying Process, U: User)

P: 2 <-- Pid
P: getppid() // Generated notification
P: Waiting in wait_interruptible state
S: Calls receive notification, and the codepath gets up to the poin
   where it's copying the notification to userspace
U: kill -SIGURG 2 // Send a kill signal to the notifying process
P: Waiting in the wait_killable state
S: Kernel fails to copy notification into user memory, and resets
   the notification and returns an error

If we do not have the reset, the P will never return to wait interruptible.
This is the only code path that a notification can go init -> sent -> init.

> 
> I think that is wrong, we want to wake up the other side not just on
> the error path, but on the non-error path (in fact, do we want to do
> this on the error path? It seems like a no-op, but don't see any
> reason to do it).
>

It's unneccessary. Why do it? It just means we wake up a process without reason.
Wakeups are expensive.
 
> We _need_ to call complete() in the non error path here so the other
> side wakes up and switches to a killable wait. As we are not doing
> this (for the non error path), this will basically not achieve a
> wait_killable() at all.
> 
No, because here, we check if we were waiting interruptible, and
then we switch to wait_killable:
/*
 * Check to see if the notifcation got picked up and
 * whether we should switch to wait killable.
 */
if (!wait_killable && should_sleep_killable(match, &n))
	continue;

This could probably be:
if (fatal_signal_pending(current))
	break;
if (!wait_killable && should_sleep_killable(match, &n))
	continue;

But, that check for fatal_signal_pending seems to be unneccessary (or something we'll get
for free in the next iteration).

> I think this was probably an oversight adapting the patch from last
> year. Is it possble? Because it seems that in the previous version we
> sent last year[1] (if you can link them next time it will be way
> simpler :)) you had a new ioctl() and the call to complete() was
> handled there, in seccomp_notify_set_wait_killable(). Now, as this is
> part of the filter (and as I said last year, I think this way looks
> better) that call to complete() was completely forgotten.
> 
> Is it possible that this is not really working as intended, then? Am I
> missing something?
> 
> 
> Best,
> Rodrigo
> 
> 
> [1]: https://lore.kernel.org/all/20210430204939.5152-3-sargun@sargun.me/