Date:   Thu, 21 Feb 2019 11:29:18 -0800
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Jann Horn <jannh@...gle.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Kees Cook <keescook@...omium.org>,
        Andy Lutomirski <luto@...capital.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption
 splat for cbpf->ebpf progs

On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@...earbox.net> wrote:
> > On 02/21/2019 06:31 AM, Kees Cook wrote:
> > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov
> > > <alexei.starovoitov@...il.com> wrote:
> > >>
> > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov
> > >> <alexei.starovoitov@...il.com> wrote:
> > >>>
> > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > >>>> a check was added to BPF_PROG_RUN() asserting that preemption is disabled on
> > >>>> every invocation, so as not to break eBPF assumptions (e.g. per-cpu maps).
> > >>>> This does not apply to seccomp, because only cBPF -> eBPF is loaded here and
> > >>>> it does not make use of any functionality that would require the assertion.
> > >>>> Fix this false positive by adding and using a SECCOMP_RUN() variant that
> > >>>> does not have the cant_sleep() check.
> > >>>>
> > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@...kaller.appspotmail.com
> > >>>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> > >>>> Acked-by: Kees Cook <keescook@...omium.org>
> > >>>
> > >>> Applied, Thanks
> > >>
> > >> Actually I think it's the wrong approach long term.
> > >> I'm thinking of reverting it.
> > >> I think it's better to disable preemption for the duration of
> > >> the seccomp cbpf prog.
> > >> It's short, and there is really no reason for it to be preemptible.
> > >> When seccomp switches to ebpf we'll have this weird inconsistency.
> > >> Let's just disable preemption for seccomp as well.
> > >
> > > A lot of changes will be needed for seccomp ebpf -- not the least of
> > > which is convincing me there is a use-case. ;)
> > >
> > > But the main issue is that I'm not a huge fan of dropping two
> > > barriers() across syscall entry. That seems pretty heavy-duty for
> > > something that is literally not needed right now.
> >
> > Yeah, I think it's okay to add once it's actually technically needed. Last
> > time I looked, if I recall correctly, at least Chrome installs some
> > heavy-duty seccomp programs that come close to the prog limit.
> 
> Half of that is probably because that seccomp BPF code is so
> inefficient, though.
> 
> This snippet shows that those programs constantly recheck the high
> halves of arguments:
> 
> Some of the generated code is pointless because all reachable code
> from that point on has the same outcome (the last "ret ALLOW" in the
> following sample is unreachable because they've already checked that
> the high bit of the low half is set, so the low half can't be 3):
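
For illustration, since the quoted samples aren't reproduced above: cBPF
only has 32-bit loads, so a generator checking 64-bit syscall arguments
tends to emit something like the following hypothetical fragment (not
Chrome's actual output), which re-loads and re-checks the high half of
args[0] before every low-half comparison, little-endian layout assumed:

#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* hypothetical generated filter fragment:
 * allow arg0 == 1 or arg0 == 2, trap otherwise */
static struct sock_filter argcheck[] = {
	/* arg0 == 1 ? */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[0]) + 4),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 2),	/* high != 0 -> next check */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[0])),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 4, 0),	/* low == 1 -> ALLOW */
	/* arg0 == 2 ? -- every path to here has already determined the
	 * high half's outcome, but it is reloaded and rechecked anyway */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[0]) + 4),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 0, 3),	/* high != 0 -> TRAP */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, args[0])),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 0, 1),	/* low != 2 -> TRAP */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
};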

and with ebpf these optimizations will be available for free
because llvm will remove unnecessary loads and simplify branches.
There is no technical reason not to use ebpf in seccomp.
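
As a sketch of that claim (hypothetical, since seccomp does not accept
ebpf today and no such hook exists): the same check written in restricted
C and compiled with clang -O2 -target bpf boils down to one 64-bit load
that llvm reuses across both comparisons:

#include <linux/types.h>
#include <linux/seccomp.h>

/* hypothetical ebpf seccomp filter: llvm loads args[0] once as a
 * 64-bit value instead of re-reading the high half per comparison */
static __u32 filter(struct seccomp_data *sd)
{
	__u64 arg0 = sd->args[0];	/* single 8-byte load */

	if (arg0 == 1 || arg0 == 2)
		return SECCOMP_RET_ALLOW;
	return SECCOMP_RET_TRAP;
}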

When we discussed preemption of classic vs extended bpf in the socket
filter context, we agreed to make it a requirement that preemption must
be disabled, even though it's not strictly necessary. The RX side of
socket filters was already non-preemptible while TX was preemptible.
We must not make an exception to this rule for seccomp.
Hence I've reverted this commit.

Here is the actual fix for seccomp:
From: Alexei Starovoitov <ast@...nel.org>
Date: Thu, 21 Feb 2019 10:40:14 -0800
Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog

All BPF programs must be called with preemption disabled.

Signed-off-by: Alexei Starovoitov <ast@...nel.org>
---
 kernel/seccomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e815781ed751..a43c601ac252 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
         * All filters in the list are evaluated and the lowest BPF return
         * value always takes priority (ignoring the DATA).
         */
+       preempt_disable();
        for (; f; f = f->prev) {
                u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

@@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
                        *match = f;
                }
        }
+       preempt_enable();
        return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
--

Doing a per-cpu increment on cache-hot data is practically free, and it
makes seccomp play by the rules.

I'm working on another set of patches that will introduce bpf program stats as:
#define BPF_PROG_RUN(prog, ctx) ({				\
	u32 ret;						\
	cant_sleep();						\
	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
		u64 start = sched_clock();			\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
		this_cpu_inc(prog->aux->stats->cnt);		\
		this_cpu_add(prog->aux->stats->nsecs,		\
			     sched_clock() - start);		\
	} else {						\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
	}							\
	ret; })

and seccomp progs will get their stats just like socket filters and the rest of
classic and extended bpf.
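
As a sketch of how user space could read those counters back once the
bpf_stats_enabled_key static key is switched on (assuming the series ends
up exposing them via struct bpf_prog_info -- the run_cnt and run_time_ns
field names below are my assumption, mirroring the per-cpu struct above):

#include <stdio.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <sys/syscall.h>

/* sketch: dump accumulated stats for an already-loaded program fd */
static int print_prog_stats(int prog_fd)
{
	struct bpf_prog_info info = {};
	union bpf_attr attr = {};

	attr.info.bpf_fd   = prog_fd;
	attr.info.info_len = sizeof(info);
	attr.info.info     = (__u64)(unsigned long)&info;

	if (syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr)))
		return -1;

	printf("run_cnt=%llu run_time_ns=%llu\n",
	       (unsigned long long)info.run_cnt,
	       (unsigned long long)info.run_time_ns);
	return 0;
}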

