linux-kernel - Re: [seccomp] Request for a "enable on execve" mode for Seccomp filters

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20201028185011.GF534@brightrain.aerifal.cx>
Date:   Wed, 28 Oct 2020 14:50:11 -0400
From:   Rich Felker <dalias@...c.org>
To:     Jann Horn <jannh@...gle.com>
Cc:     Camille Mougey <commial@...il.com>,
        Kees Cook <keescook@...omium.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Tycho Andersen <tycho@...ho.pizza>,
        Sargun Dhillon <sargun@...gun.me>,
        Christian Brauner <christian.brauner@...ntu.com>,
        "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>,
        Denis Efremov <efremov@...ux.com>,
        Andy Lutomirski <luto@...nel.org>
Subject: Re: [seccomp] Request for a "enable on execve" mode for Seccomp
 filters

On Wed, Oct 28, 2020 at 07:39:41PM +0100, Jann Horn wrote:
> On Wed, Oct 28, 2020 at 7:35 PM Rich Felker <dalias@...c.org> wrote:
> > On Wed, Oct 28, 2020 at 07:25:45PM +0100, Jann Horn wrote:
> > > On Wed, Oct 28, 2020 at 6:52 PM Rich Felker <dalias@...c.org> wrote:
> > > > On Wed, Oct 28, 2020 at 06:34:56PM +0100, Jann Horn wrote:
> > > > > On Wed, Oct 28, 2020 at 5:49 PM Rich Felker <dalias@...c.org> wrote:
> > > > > > On Wed, Oct 28, 2020 at 01:42:13PM +0100, Jann Horn wrote:
> > > > > > > On Wed, Oct 28, 2020 at 12:18 PM Camille Mougey <commial@...il.com> wrote:
> > > > > > > You're just focusing on execve() - I think it's important to keep in
> > > > > > > mind what happens after execve() for normal, dynamically-linked
> > > > > > > binaries: The next step is that the dynamic linker runs, and it will
> > > > > > > poke around in the file system with access() and openat() and fstat(),
> > > > > > > it will mmap() executable libraries into memory, it will mprotect()
> > > > > > > some memory regions, it will set up thread-local storage (e.g. using
> > > > > > > arch_prctl(); even if the process is single-threaded), and so on.
> > > > > > >
> > > > > > > The earlier you install the seccomp filter, the more of these steps
> > > > > > > you have to permit in the filter. And if you want the filter to take
> > > > > > > effect directly after execve(), the syscalls you'll be forced to
> > > > > > > permit are sufficient to cobble something together in userspace that
> > > > > > > effectively does almost the same thing as execve().
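To make the syscall list above concrete, here is a minimal sketch of a classic-BPF seccomp allowlist that permits only the syscalls named there (access, openat, fstat, mmap, mprotect, arch_prctl), assuming Linux kernel headers from 4.14 or later and an x86_64 or aarch64 host. The helpers `build_ldso_filter` and `run_filter` are hypothetical. The allowlist is illustrative, not complete: a real dynamic linker also needs read, close, pread64 and more. The filter is never installed. `run_filter` is a tiny userspace interpreter for just the three opcodes the filter uses, so its logic can be checked without calling seccomp(2).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>   /* SYS_* syscall numbers */
#include <linux/audit.h>   /* AUDIT_ARCH_* */
#include <linux/filter.h>  /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h> /* struct seccomp_data, SECCOMP_RET_* */

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

/* Only the syscalls named in the message; some don't exist on every arch. */
static const int ldso_syscalls[] = {
#ifdef SYS_access
    SYS_access,
#endif
    SYS_openat,
#ifdef SYS_fstat
    SYS_fstat,
#endif
    SYS_mmap,
    SYS_mprotect,
#ifdef SYS_arch_prctl
    SYS_arch_prctl,
#endif
};

#define N_ALLOWED (sizeof ldso_syscalls / sizeof ldso_syscalls[0])
/* arch check (3) + load nr (1) + 2 per allowed syscall + default (1) */
#define PROG_LEN (4 + 2 * N_ALLOWED + 1)

/* Hypothetical helper: fills out[PROG_LEN] and returns the instruction count. */
static size_t build_ldso_filter(struct sock_filter *out, __u32 default_action)
{
    size_t i = 0;
    /* Kill on a foreign architecture, whose syscall numbers differ.
     * (A production x86_64 filter should also reject x32 numbers.) */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                            NATIVE_AUDIT_ARCH, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    for (size_t k = 0; k < N_ALLOWED; k++) {
        /* Equal: fall through to the ALLOW; otherwise skip over it. */
        out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                (__u32)ldso_syscalls[k], 0, 1);
        out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    }
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, default_action);
    return i;
}

/* Hypothetical test interpreter for exactly the opcodes used above. */
static __u32 run_filter(const struct sock_filter *prog, size_t len,
                        const struct seccomp_data *d)
{
    __u32 acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *f = &prog[pc];
        switch (f->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            memcpy(&acc, (const char *)d + f->k, sizeof acc);
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (acc == f->k) ? f->jt : f->jf; /* relative to pc + 1 */
            break;
        case BPF_RET | BPF_K:
            return f->k;
        default:
            return SECCOMP_RET_KILL_PROCESS; /* opcode not modelled */
        }
    }
    return SECCOMP_RET_KILL_PROCESS; /* ran off the end */
}
```

Passing `SECCOMP_RET_USER_NOTIF` as `default_action` instead of an errno value would hand every other syscall to a supervisor, which is the direction the reply takes.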
> > > > > >
> > > > > > I would assume you use SECCOMP_RET_USER_NOTIF to implement policy for
> > > > > > controlling these operations and allowing only the ones that are valid
> > > > > > during dynamic linking. This also allows you to defer application of
> > > > > > the filter until after execve. So unless I'm missing some reason why
> > > > > > this doesn't work, I think the requested functionality is already
> > > > > > available.
> > > > >
> > > > > Ah, yeah, good point.
> > > > >
> > > > > > If you really just want the "activate at exec" behavior, it might be
> > > > > > possible (depending on how SECCOMP_RET_USER_NOTIF behaves when there's
> > > > > > no notify fd open; I forget)
> > > > >
> > > > > syscall returns -ENOSYS. Yeah, that'd probably do the job. (Even
> > > > > though it might be a bit nicer if userspace had control over the errno
> > > > > there, such that it could be EPERM instead... oh well.)
> > > >
> > > > EPERM is a major bug in current sandbox implementations, so ENOSYS is
> > > > at least mildly better, but indeed it should be controllable, probably
> > > > by allowing a code path for the BPF to continue with a jump to a
> > > > different logic path if the notify listener is missing.
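One common reason ENOSYS is the safer default: libc and language runtimes often try a newer syscall first and fall back to an older one only when the kernel answers ENOSYS (for example clone3() vs. clone()). A sandbox that answers EPERM for a syscall it does not recognize prevents that fallback, so the operation fails outright. In this sketch, `call_with_fallback` and the stub functions are hypothetical stand-ins for real syscall wrappers.

```c
#include <errno.h>

/* Hypothetical stand-in for a syscall wrapper: 0 on success,
 * -1 with errno set on failure, like the libc wrappers. */
typedef int (*syscall_fn)(void);

/* Try the newer call; fall back only if it is unknown (ENOSYS).
 * Any other error, including EPERM, is reported to the caller. */
static int call_with_fallback(syscall_fn newer, syscall_fn older)
{
    if (newer() == 0)
        return 0;
    if (errno != ENOSYS)
        return -1;
    return older();
}

/* Stubs simulating how a sandbox might answer the newer syscall. */
static int newer_blocked_enosys(void) { errno = ENOSYS; return -1; }
static int newer_blocked_eperm(void)  { errno = EPERM;  return -1; }
static int older_ok(void)             { return 0; }
```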
> > >
> > > I guess we might be able to expose the listener status through a bit /
> > > a field in the struct seccomp_data, and then filters could branch on
> > > that. (And the kernel would run the filter twice if we raced with
> > > filter detachment.) I don't know whether it would look pretty, but I
> > > think it should be doable...
> >
> > I was thinking the race wouldn't be salvageable, but indeed since the
> > filter is side-effect-free you can just re-run it if the status
> > changes between start of filter processing and the attempt at
> > notification. This sounds like it should work.
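The proposal discussed above can be modelled in a few lines. Everything here is hypothetical: `hypothetical_seccomp_data`, its `listener_attached` field, `example_filter`, and `decide` do not exist in the kernel API. The sketch only illustrates why re-running works: the filter is a pure function of its input, so if the listener detaches between evaluating the filter and queueing the notification, evaluating it again with the fresh status gives a consistent answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL: the real struct seccomp_data has no such field. */
struct hypothetical_seccomp_data {
    int nr;
    uint32_t listener_attached; /* 1 if a supervisor holds the notify fd */
};

enum action { ACT_ALLOW, ACT_NOTIFY, ACT_ERRNO_EPERM };

#define HYPOTHETICAL_ALWAYS_ALLOWED_NR 0 /* placeholder syscall number */

/* Example policy with no side effects: defer to the supervisor if one
 * exists, otherwise fail with an errno the filter author chose. */
static enum action example_filter(const struct hypothetical_seccomp_data *d)
{
    if (d->nr == HYPOTHETICAL_ALWAYS_ALLOWED_NR)
        return ACT_ALLOW;
    return d->listener_attached ? ACT_NOTIFY : ACT_ERRNO_EPERM;
}

/* Kernel-side retry sketch. listener_attached() samples the current state;
 * a real implementation would have to bound or serialize the retries. */
static enum action decide(int nr, bool (*listener_attached)(void))
{
    for (;;) {
        struct hypothetical_seccomp_data d = { nr, listener_attached() };
        enum action a = example_filter(&d);
        /* Re-check where the notification would be queued; if the listener
         * vanished in between, the answer is stale, so run the filter again. */
        if (a != ACT_NOTIFY || listener_attached())
            return a;
    }
}

/* Test stubs: a listener that is always present, and one that detaches
 * right after the first probe (simulating the race). */
static bool always_attached(void) { return true; }
static int probe_count;
static bool detaches_after_first_probe(void) { return probe_count++ == 0; }
```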
> >
> > I guess it's not possible to chain two BPF filters to do this, because
> > that only works when the first one allows? Or am I misunderstanding
> > the multiple-filters case entirely? (I've never gotten that far with
> > programming it.)
> 
> I'm not sure if I'm understanding the question correctly...
> At the moment you basically can't have multiple filters with notifiers.
> The rule with multiple filters is always that all the filters get run,
> and the actual action taken is the most restrictive result of all of
> them.
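The "most restrictive result wins" rule can be written out directly. This sketch mirrors the comparison the kernel performs when it walks the attached filters: only the action bits are compared, as a signed 32-bit value, and on a tie the first result seen is kept. The `RET_*` constants are local copies of the `SECCOMP_RET_*` values from the UAPI header `<linux/seccomp.h>`, and `combine_filter_results` is a hypothetical helper that takes already-computed filter results, newest filter first (the order in which the kernel runs them).

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the SECCOMP_RET_* action values in <linux/seccomp.h>. */
#define RET_KILL_PROCESS 0x80000000U
#define RET_KILL_THREAD  0x00000000U
#define RET_TRAP         0x00030000U
#define RET_ERRNO        0x00050000U
#define RET_USER_NOTIF   0x7fc00000U
#define RET_TRACE        0x7ff00000U
#define RET_LOG          0x7ffc0000U
#define RET_ALLOW        0x7fff0000U
#define RET_ACTION_FULL  0xffff0000U /* action bits; low 16 bits are data */

/* Combine per-filter results, newest filter first. A smaller signed action
 * value is more restrictive; KILL_PROCESS becomes negative as int32_t
 * (the conversion wraps on GCC and Clang, as the kernel relies on). */
static uint32_t combine_filter_results(const uint32_t *results, size_t n)
{
    uint32_t ret = RET_ALLOW; /* no filters: allow */
    for (size_t i = 0; i < n; i++) {
        int32_t cur = (int32_t)(results[i] & RET_ACTION_FULL);
        int32_t best = (int32_t)(ret & RET_ACTION_FULL);
        if (cur < best) /* strict: on a tie the earlier result is kept */
            ret = results[i];
    }
    return ret;
}
```

Under this rule a second filter cannot special-case the no-listener situation: combining `USER_NOTIF` with `ALLOW` still yields `USER_NOTIF`, and combining it with `ERRNO` yields `ERRNO` whether or not a listener exists.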

I probably just don't understand how multiple filters work then, which
is pretty much what I expected. But in any case it seems correct that
they're not a tool for solving the problem here.

Rich