Date:   Tue, 2 Jun 2020 10:32:16 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Alexander Potapenko <glider@...gle.com>
Cc:     Edward Cree <ecree@...arflare.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Michal Kubecek <mkubecek@...e.cz>,
        Alexei Starovoitov <ast@...nel.org>,
        Dmitriy Vyukov <dvyukov@...gle.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: Self-XORing BPF registers is undefined behavior

On Tue, Jun 02, 2020 at 03:31:40PM +0200, Alexander Potapenko wrote:
> 
> So we're back to the question of how much people care about
> interpreter performance.

No. The question is whether people care about UB fuzzing.

> 
> > > I don't have evidence that such a transformation is currently possible
> > > for the BPF code in question, but all the building blocks are there,
> > > so it's probably just a matter of time.
> > I'm not so sure.  Consider the following sequence of BPF instructions:
> >     xor r0, r0
> >     ld r0, 42
> >     xor r0, r0
> >     exit
> > I hope you'll agree that at entry to the second XOR, the value of r0 is
> >  not indeterminate.  So any optimisation the compiler does for the first
> >  XOR ("oh, we don't need to fill the existing regs[] values, we can just
> >  use whatever's already in whichever register we allocate") will need to
> >  be predicated on something that makes it only happen for the first XOR.
> > But the place it's doing this optimisation is in the interpreter, which
> >  is a big fetch-execute loop.  I don't think even Clang is smart enough
> >  to figure out "BPF programs always start with a prologue, I'll stick in
> >  something that knows which prologue the prog uses and branches to a
> >  statically compiled, optimised version thereof".
> > (If Clang *is* that smart, then it's too clever by half...)
> 
> I don't think Clang does this at the moment, but I can certainly
> imagine it unrolling the first two iterations of the interpretation
> loop (since the prologue instructions are known at compile time) and
> replacing them with explicit XOR instructions.
> 
> However, I suggest we move on to the practical question of how to
> deal with this issue in the long run.
> Currently these error reports prevent us from testing BPF with syzbot,
> so we're using https://github.com/google/kmsan/commit/69b987d53462a7f3b5a41c62eb731340b53165f8
> to work around them.
> This seems to be the easiest fix that doesn't affect JIT performance
> and removes the UB.
> 
> Once KMSAN makes it to the mainline, we'll need to either come up with
> a similar fix, or disable fuzzing BPF, which isn't what we want.

Disabling fuzzing altogether would be unfortunate,
but disabling UB fuzzing is perfectly fine.
I don't think the UB fuzzer has even scratched the surface yet.
See fixup_bpf_calls() for example. It's the same UB due to uninitialized
regs, and I have to add the same Nack on messing with it.
There could be other such places in bpf codegen. We're not going to
chase them one by one just because the standard says it's UB and the
interpreter is written in C. The target for bpf codegen is the JITs.
JITs on some architectures optimize "mov r, 0" into an xor; some don't.
The bpf interpreter is simulating hw. If gcc/clang ever messes with it
to the point that this UB becomes a real issue, we will rewrite the
affected pieces of the interpreter in asm.
For now, if you want a UB fuzzer running in your environment, please add
an _out_of_tree_ patch that inits all interpreter registers to zero.
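
To make the shape of the discussion concrete, here is a toy model of the
interpreter's fetch-execute loop run on the four-instruction example from
earlier in the thread. This is a self-contained sketch, not the kernel's
actual ___bpf_prog_run(); the opcode names, struct insn, and run() are
made up for illustration. The register file is zeroed up front, which is
what the out-of-tree workaround does to avoid reading an indeterminate
value in "xor rX, rX":

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mini-ISA for illustration only. */
enum op { OP_XOR_REG, OP_MOV_IMM, OP_EXIT };

struct insn {
	enum op	op;
	int	dst, src;
	int64_t	imm;
};

static uint64_t run(const struct insn *prog)
{
	uint64_t regs[11];

	/* The workaround: registers start as defined zeros, so the
	 * prologue's self-XOR reads a defined value.  The real
	 * interpreter leaves regs[] uninitialized here, which is the
	 * UB being reported. */
	memset(regs, 0, sizeof(regs));

	for (;;) {
		const struct insn *i = prog++;

		switch (i->op) {
		case OP_XOR_REG:
			regs[i->dst] ^= regs[i->src];
			break;
		case OP_MOV_IMM:
			regs[i->dst] = (uint64_t)i->imm;
			break;
		case OP_EXIT:
			return regs[0];
		}
	}
}
```

Running this on { xor r0, r0; mov r0, 42; xor r0, r0; exit } returns 0,
and at entry to the second XOR r0 holds 42, i.e. a determinate value, which
is exactly the point made upthread about why the compiler cannot treat
every self-XOR in the loop body as operating on garbage.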
