lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 1 Jun 2020 10:55:39 +0100
From:   Edward Cree <ecree@...arflare.com>
To:     Alexander Potapenko <glider@...gle.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
CC:     Daniel Borkmann <daniel@...earbox.net>,
        Michal Kubecek <mkubecek@...e.cz>,
        Alexei Starovoitov <ast@...nel.org>,
        Dmitriy Vyukov <dvyukov@...gle.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: Self-XORing BPF registers is undefined behavior

On 29/05/2020 13:28, Alexander Potapenko wrote:> If the performance is really critical here, perhaps a better
> alternative is to introduce a BPF instruction (which could be an alias
> of BPF_XOR REG, REG) for zeroing out a register? Then different
> architectures may choose more efficient implementations for it, and
> the interpreter will be just assigning zero to the register without
> violating the C standard.If it's an alias of BPF_XOR r,r, then the interpreter will surely still
 interpret it with the XOR code.  Unless you make the interpreter
 special-case this, in which case you've added an extra branch to every
 XOR the interpreter handles :(

> Given the increased popularity of Clang in the kernel these days, I
> don't think it's a good idea for a single compiler to further diverge
> from the standard.The standard in question isn't C89, but "--std=gnu89", which is
 whatever GCC says it is :grin:
So if GCC declares that some class of optimisations are not legal under
 --std=gnu89, then they're not legal and Clang has to adapt to that.

> I wouldn't call this particular use case "extremely annoying".
To be clear, what's "annoying" is the double-bind we're in as a result
 of trying to optimise the prologue for both JITs (whose semantics are
 whatever we define eBPF to be) and the interpreter (which has to be
 implemented with reasonable efficiency as C code).

> If I understand correctly, these two instructions are only executed
> once per program.
> Are they really expected to impact performance that much?
If you have a very short program that's bound to a very frequent event,
 then they might.  But I don't have, and haven't seen, any numbers...

> I don't have evidence that such a transformation is currently possible
> for the BPF code in question, but all the building blocks are there,
> so it's probably just a matter of time.
I'm not so sure.  Consider the following sequence of BPF instructions:
    xor r0, r0
    ld r0, 42
    xor r0, r0
    exit
I hope you'll agree that at entry to the second XOR, the value of r0 is
 not indeterminate.  So any optimisation the compiler does for the first
 XOR ("oh, we don't need to fill the existing regs[] values, we can just
 use whatever's already in whichever register we allocate") will need to
 be predicated on something that makes it only happen for the first XOR.
But the place it's doing this optimisation is in the interpreter, which
 is a big fetch-execute loop.  I don't think even Clang is smart enough
 to figure out "BPF programs always start with a prologue, I'll stick in
 something that knows which prologue the prog uses and branches to a
 statically compiled, optimised version thereof".
(If Clang *is* that smart, then it's too clever by half...)

-ed

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ