[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <38ff5e15-bf76-2d17-f524-3f943a5b8846@solarflare.com>
Date: Mon, 1 Jun 2020 10:55:39 +0100
From: Edward Cree <ecree@...arflare.com>
To: Alexander Potapenko <glider@...gle.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
CC: Daniel Borkmann <daniel@...earbox.net>,
Michal Kubecek <mkubecek@...e.cz>,
Alexei Starovoitov <ast@...nel.org>,
Dmitriy Vyukov <dvyukov@...gle.com>,
Networking <netdev@...r.kernel.org>
Subject: Re: Self-XORing BPF registers is undefined behavior
On 29/05/2020 13:28, Alexander Potapenko wrote:> If the performance is really critical here, perhaps a better
> alternative is to introduce a BPF instruction (which could be an alias
> of BPF_XOR REG, REG) for zeroing out a register? Then different
> architectures may choose more efficient implementations for it, and
> the interpreter will be just assigning zero to the register without
> violating the C standard.If it's an alias of BPF_XOR r,r, then the interpreter will surely still
interpret it with the XOR code. Unless you make the interpreter
special-case this, in which case you've added an extra branch to every
XOR the interpreter handles :(
> Given the increased popularity of Clang in the kernel these days, I
> don't think it's a good idea for a single compiler to further diverge
> from the standard.The standard in question isn't C89, but "--std=gnu89", which is
whatever GCC says it is :grin:
So if GCC declares that some class of optimisations are not legal under
--std=gnu89, then they're not legal and Clang has to adapt to that.
> I wouldn't call this particular use case "extremely annoying".
To be clear, what's "annoying" is the double-bind we're in as a result
of trying to optimise the prologue for both JITs (whose semantics are
whatever we define eBPF to be) and the interpreter (which has to be
implemented with reasonable efficiency as C code).
> If I understand correctly, these two instructions are only executed
> once per program.
> Are they really expected to impact performance that much?
If you have a very short program that's bound to a very frequent event,
then they might. But I don't have, and haven't seen, any numbers...
> I don't have evidence that such a transformation is currently possible
> for the BPF code in question, but all the building blocks are there,
> so it's probably just a matter of time.
I'm not so sure. Consider the following sequence of BPF instructions:
xor r0, r0
ld r0, 42
xor r0, r0
exit
I hope you'll agree that at entry to the second XOR, the value of r0 is
not indeterminate. So any optimisation the compiler does for the first
XOR ("oh, we don't need to fill the existing regs[] values, we can just
use whatever's already in whichever register we allocate") will need to
be predicated on something that makes it only happen for the first XOR.
But the place it's doing this optimisation is in the interpreter, which
is a big fetch-execute loop. I don't think even Clang is smart enough
to figure out "BPF programs always start with a prologue, I'll stick in
something that knows which prologue the prog uses and branches to a
statically compiled, optimised version thereof".
(If Clang *is* that smart, then it's too clever by half...)
-ed
Powered by blists - more mailing lists