netdev - Re: Self-XORing BPF registers is undefined behavior

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAG_fn=XR_dRG4vpo-jDS1L-LFD8pkuL8yWaTWbJAAQ679C3big@mail.gmail.com>
Date:   Tue, 2 Jun 2020 15:31:40 +0200
From:   Alexander Potapenko <glider@...gle.com>
To:     Edward Cree <ecree@...arflare.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Michal Kubecek <mkubecek@...e.cz>,
        Alexei Starovoitov <ast@...nel.org>,
        Dmitriy Vyukov <dvyukov@...gle.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: Self-XORing BPF registers is undefined behavior

On Mon, Jun 1, 2020 at 11:55 AM Edward Cree <ecree@...arflare.com> wrote:
>
> If it's an alias of BPF_XOR r,r, then the interpreter will surely still
>  interpret it with the XOR code.  Unless you make the interpreter
>  special-case this, in which case you've added an extra branch to every
>  XOR the interpreter handles :(

You are right, the interpreter can be probably fixed in a different
way that doesn't introduce extra branches.

> > Given the increased popularity of Clang in the kernel these days, I
> > don't think it's a good idea for a single compiler to further diverge
> > from the standard.The standard in question isn't C89, but "--std=gnu89", which is
>  whatever GCC says it is :grin:
> So if GCC declares that some class of optimisations are not legal under
>  --std=gnu89, then they're not legal and Clang has to adapt to that.

Hm, I never thought of such an approach to standards. This sounds like
a legitimate solution, which will however result in:
 - GCC developers saying "no, we won't do this", or
 - Clang developers saying "no, we won't do this"
, unless we present them with good reasons to do so.

Because the use-case is quite narrow, and prohibiting this
optimization may break other optimizations (at least in Clang), I
won't expect anyone spending time on making Clang compliant with
--std=gnu89 in this particular case.

> > I wouldn't call this particular use case "extremely annoying".
> To be clear, what's "annoying" is the double-bind we're in as a result
>  of trying to optimise the prologue for both JITs (whose semantics are
>  whatever we define eBPF to be) and the interpreter (which has to be
>  implemented with reasonable efficiency as C code).

> > If I understand correctly, these two instructions are only executed
> > once per program.
> > Are they really expected to impact performance that much?
> If you have a very short program that's bound to a very frequent event,
>  then they might.  But I don't have, and haven't seen, any numbers...

So we're back to the question how much people care about the
interpreter performance.

> > I don't have evidence that such a transformation is currently possible
> > for the BPF code in question, but all the building blocks are there,
> > so it's probably just a matter of time.
> I'm not so sure.  Consider the following sequence of BPF instructions:
>     xor r0, r0
>     ld r0, 42
>     xor r0, r0
>     exit
> I hope you'll agree that at entry to the second XOR, the value of r0 is
>  not indeterminate.  So any optimisation the compiler does for the first
>  XOR ("oh, we don't need to fill the existing regs[] values, we can just
>  use whatever's already in whichever register we allocate") will need to
>  be predicated on something that makes it only happen for the first XOR.
> But the place it's doing this optimisation is in the interpreter, which
>  is a big fetch-execute loop.  I don't think even Clang is smart enough
>  to figure out "BPF programs always start with a prologue, I'll stick in
>  something that knows which prologue the prog uses and branches to a
>  statically compiled, optimised version thereof".
> (If Clang *is* that smart, then it's too clever by half...)

I don't think Clang does this at the moment, but I can certainly
imagine it unrolling the first two iterations of the interpretation
loop (since the prologue instructions are known at compile time) and
replacing them with explicit XOR instructions.

I however suggest we get to the practical question of how to deal with
this issue in the long run.
Currently these error reports prevent us from testing BPF with syzbot,
so we're using https://github.com/google/kmsan/commit/69b987d53462a7f3b5a41c62eb731340b53165f8
to work around them.
This seems to be the easiest fix that doesn't affect JIT performance
and removes the UB.

Once KMSAN makes it to the mainline, we'll need to either come up with
a similar fix, or disable fuzzing BPF, which isn't what we want.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg