[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202005191434.57253AD@keescook>
Date: Tue, 19 May 2020 14:40:00 -0700
From: Kees Cook <keescook@...omium.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Aleksa Sarai <cyphar@...har.com>, Jann Horn <jannh@...gle.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Tycho Andersen <tycho@...ho.ws>,
Sargun Dhillon <sargun@...gun.me>,
Matt Denton <mpdenton@...gle.com>,
Chris Palmer <palmer@...gle.com>,
Jeffrey Vander Stoep <jeffv@...gle.com>,
Linux Containers <containers@...ts.linux-foundation.org>,
Linux API <linux-api@...r.kernel.org>,
kernel list <linux-kernel@...r.kernel.org>,
Daniel Borkmann <daniel@...earbox.net>,
bpf <bpf@...r.kernel.org>
Subject: Re: seccomp feature development
On Tue, May 19, 2020 at 09:18:47AM -0700, Alexei Starovoitov wrote:
> On Mon, May 18, 2020 at 7:53 PM Aleksa Sarai <cyphar@...har.com> wrote:
> >
> > On 2020-05-19, Jann Horn <jannh@...gle.com> wrote:
> > > On Mon, May 18, 2020 at 11:05 PM Kees Cook <keescook@...omium.org> wrote:
> > > > ## deep argument inspection
> > > >
> > > > The argument caching bit is, I think, rather mechanical in nature since
> > > > it's all "just" internal to the kernel: seccomp can likely adjust how it
> > > > allocates seccomp_data (maybe going so far as to have it split across two
> > > > pages with the syscall argument struct always starting on the 2nd page
> > > > boundary), and copying the EA struct into that page, which will be both
> > > > used by the filter and by the syscall.
> > >
> > > We could also do the same kind of thing the eBPF verifier does in
> > > convert_ctx_accesses(), and rewrite the context accesses to actually
> > > go through two different pointers depending on the (constant) offset
> > > into seccomp_data.
> >
> > My main worry with this is that we'll need to figure out what kind of
> > offset mathematics are necessary to deal with pointers inside the
> > extensible struct. As a very ugly proposal, you could make it so that
> > you multiply the offset by PAGE_SIZE each time you want to dereference
> > the pointer at that offset (unless we want to add new opcodes to cBPF to
> > allow us to represent this).
>
> Please don't. cbpf is frozen.
https://www.youtube.com/watch?v=L0MK7qz13bU
If the only workable design paths for deep arg inspection end up needing
BPF helpers, I would agree that it's time for seccomp to grow eBPF
language support. I'm still hoping there's a clean solution that doesn't
require a seccomp language extension.
> > > We don't need to actually zero-fill memory for this beyond what the
> > > kernel supports - AFAIK the existing APIs already say that passing a
> > > short length is equivalent to passing zeroes, so we can just replace
> > > all out-of-bounds loads with zeroing registers in the filter.
> > > The tricky case is what should happen if the userspace program passes
> > > in fields that the filter doesn't know about. The filter can see the
> > > length field passed in by userspace, and then just reject anything
> > > where the length field is bigger than the structure size the filter
> > > knows about. But maybe it'd be slightly nicer if there was an
> > > operation for "tell me whether everything starting at offset X is
> > > zeroes", so that if someone compiles with newer kernel headers where
> > > the struct is bigger, and leaves the new fields zeroed, the syscalls
> > > still go through an outdated filter properly.
> >
> > I think the best way of handling this (without breaking programs
> > senselessly) is to have filters essentially emulate
> > copy_struct_from_user() semantics -- which is along the lines of what
> > you've suggested.
>
> and cpbf load instruction will become copy_from_user() underneath?
No, this was meaning internal checking about struct sizes needs to exist
(not the user copy parts).
> I don't see how that can work.
> Have you considered implications to jits, register usage, etc ?
>
> ebpf will become sleepable soon. It will be able to do copy_from_user()
> and examine any level of user pointer dereference.
> toctou is still going to be a concern though,
> but such ebpf+copy_from_user analysis and syscall sandboxing
> will not need to change kernel code base around syscalls at all.
> No need to invent E-syscalls and all the rest I've seen in this thread.
To avoid the ToCToU, the seccomp infrastructure must do the
copy_from_user(), so there's not need for the sleepable stuff in seccomp
that I can see. The question is mainly one of flattening.
--
Kees Cook
Powered by blists - more mailing lists