netdev - Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190221203613.q6k757fi3wxtoj5y@ast-mbp.dhcp.thefacebook.com>
Date:   Thu, 21 Feb 2019 12:36:15 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Kees Cook <keescook@...omium.org>
Cc:     Jann Horn <jannh@...gle.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andy Lutomirski <luto@...capital.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption
 splat for cbpf->ebpf progs

On Thu, Feb 21, 2019 at 11:53:06AM -0800, Kees Cook wrote:
> On Thu, Feb 21, 2019 at 11:29 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> > > On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> > > > On 02/21/2019 06:31 AM, Kees Cook wrote:
> > > > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@...il.com> wrote:
> > > > >>
> > > > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov
> > > > >> <alexei.starovoitov@...il.com> wrote:
> > > > >>>
> > > > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> > > > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > > > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is
> > > > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does
> > > > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does
> > > > >>>> not make use of any functionality that would require this assertion. Fix this
> > > > >>>> false positive by adding and using SECCOMP_RUN() variant that does not have
> > > > >>>> the cant_sleep(); check.
> > > > >>>>
> > > > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > > > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@...kaller.appspotmail.com
> > > > >>>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> > > > >>>> Acked-by: Kees Cook <keescook@...omium.org>
> > > > >>>
> > > > >>> Applied, Thanks
> > > > >>
> > > > >> Actually I think it's a wrong approach to go long term.
> > > > >> I'm thinking to revert it.
> > > > >> I think it's better to disable preemption for duration of
> > > > >> seccomp cbpf prog.
> > > > >> It's short and there is really no reason for it to be preemptible.
> > > > >> When seccomp switches to ebpf we'll have this weird inconsistency.
> > > > >> Let's just disable preemption for seccomp as well.
> > > > >
> > > > > A lot of changes will be needed for seccomp ebpf -- not the least of
> > > > > which is convincing me there is a use-case. ;)
> > > > >
> > > > > But the main issue is that I'm not a huge fan of dropping two
> > > > > barriers() across syscall entry. That seems pretty heavy-duty for
> > > > > something that is literally not needed right now.
> > > >
> > > > Yeah, I think it's okay to add once actually technically needed. Last
> > > > time I looked, if I recall correctly, at least Chrome installs some
> > > > heavy duty seccomp programs that go close to prog limit.
> > >
> > > Half of that is probably because that seccomp BPF code is so
> > > inefficient, though.
> > >
> > > This snippet shows that those programs constantly recheck the high
> > > halves of arguments:
> > >
> > > Some of the generated code is pointless because all reachable code
> > > from that point on has the same outcome (the last "ret ALLOW" in the
> > > following sample is unreachable because they've already checked that
> > > the high bit of the low half is set, so the low half can't be 3):
> >
> > and with ebpf these optimizations will be available for free
> > because llvm will remove unnecessary loads and simplify branches.
> > There is no technical reason not to use ebpf in seccomp.
> >
> > When we discussed preemption of classic vs extended in socket filters
> > context we agreed to make it a requirement that preemption must be
> > disabled though it's not strictly necessary. RX side of socket filters
> > was already non-preempt while TX was preemptible.
> > We must not make an exception of this rule for seccomp.
> > Hence I've reverted this commit.
> >
> > Here is the actual fix for seccomp:
> > From: Alexei Starovoitov <ast@...nel.org>
> > Date: Thu, 21 Feb 2019 10:40:14 -0800
> > Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog
> >
> > All BPF programs must be called with preemption disabled.
> >
> > Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> > ---
> >  kernel/seccomp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index e815781ed751..a43c601ac252 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >          * All filters in the list are evaluated and the lowest BPF return
> >          * value always takes priority (ignoring the DATA).
> >          */
> > +       preempt_disable();
> >         for (; f; f = f->prev) {
> >                 u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
> >
> > @@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >                         *match = f;
> >                 }
> >         }
> > +       preempt_enable();
> >         return ret;
> >  }
> >  #endif /* CONFIG_SECCOMP_FILTER */
> > --
> >
> > Doing per-cpu increment of cache hot data is practically free and it makes seccomp
> > play by the rules.
> 
> Other accesses should dominate the run time, yes. I'm still not a big
> fan of unconditionally adding this, but I won't NAK. :P

Thank you.

I also would like to touch on your comment:
"A lot of changes will be needed for seccomp ebpf"
There were two attempts to add it in the past and the patches were
small and straightforward.
If I recall correctly both times you nacked them because performance gains
and ease of use arguments were not convincing enough, right?
Are you still not convinced ?