Date:   Fri, 29 May 2020 14:28:48 +0200
From:   Alexander Potapenko <glider@...gle.com>
To:     Edward Cree <ecree@...arflare.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Michal Kubecek <mkubecek@...e.cz>,
        Alexei Starovoitov <ast@...nel.org>,
        Dmitriy Vyukov <dvyukov@...gle.com>,
        Networking <netdev@...r.kernel.org>
Subject: Re: Self-XORing BPF registers is undefined behavior

On Fri, May 29, 2020 at 2:17 AM Edward Cree <ecree@...arflare.com> wrote:
>
> On 28/05/2020 17:00, Alexei Starovoitov wrote:
> > xoring of two identical values is undefined in standard?
> I believe it is in this case, yes; even without the complication
>  of array references that happen to alias, Alexander's foo1() is
>  undefined behaviour under C89 (and also C99 which handles the
>  case differently).
>
> From the definitions section (1.6) of the C89 draft [1]:
> > * Undefined behavior --- behavior, upon use of a nonportable or
> > erroneous program construct, of erroneous data, or of
> > indeterminately-valued objects, for which the Standard imposes
> > no requirements.
> And from 3.5.7 'Initialization':
> > If an object that has automatic storage duration is not
> > initialized explicitly, its value is indeterminate.
> Since the standard doesn't say anything about self-XORing that
>  could make it 'special' in this regard, the compiler isn't
>  required to notice that it's a self-XOR, and (in the tradition
>  of compiler-writers the world over) is entitled to optimise the
>  program based on the assumption that the programmer has not
>  committed UB, so in the foo1() example would be strictly within
>  its rights to generate a binary that contained no XOR
>  instruction at all.  UB, as you surely know, isn't guaranteed to
>  do something 'sensible'.
> And in the BPF example, if the compiler at some point manages to
>  statically figure out that regs[insn->dst_reg] is uninitialised,
>  it might say "hey, I can just grab any old free register and
>  declare that that's now regs[insn->dst_reg] without filling it."
>  And then it can do the same for regs[insn->src_reg], or heck,
>  even choose to fill that one (this is now legal even though the
>  pointers alias, because you already committed UB), and do a xor
>  with different regs and produce garbage results.

Thanks for this writeup!
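To make the hazard concrete, here is a minimal sketch of a toy register-machine interpreter. It is loosely shaped like ___bpf_prog_run() but is not taken from the kernel: `struct insn`, the `OP_*` opcodes, `REG_A` and `run_uninit_regs()` are all hypothetical. The XOR handler reads both operands. Using it to clear a register whose slot in `regs[]` was never written therefore reads an indeterminate value. A MOV_IMM 0, by contrast, only writes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ISA; not the kernel's BPF encoding. */
enum op { OP_XOR_REG, OP_MOV_IMM, OP_ADD_IMM, OP_EXIT };

struct insn {
    enum op code;
    uint8_t dst_reg;
    uint8_t src_reg;
    int32_t imm;
};

#define NREGS 11
#define REG_A 0   /* hypothetical stand-in for BPF_REG_A */

/* Interpreter with a deliberately uninitialized register file, as in
 * the discussion: a program stays well-defined only if every register
 * is written before it is read. */
uint64_t run_uninit_regs(const struct insn *insn)
{
    uint64_t regs[NREGS];   /* indeterminate until written */

    for (;; insn++) {
        assert(insn->dst_reg < NREGS && insn->src_reg < NREGS);
        switch (insn->code) {
        case OP_XOR_REG:
            /* Reads regs[dst] and regs[src]: UB if either is still
             * indeterminate, even when dst_reg == src_reg. */
            regs[insn->dst_reg] ^= regs[insn->src_reg];
            break;
        case OP_MOV_IMM:
            /* Write-only: safe as the first access to a register. */
            regs[insn->dst_reg] = (uint64_t)(int64_t)insn->imm;
            break;
        case OP_ADD_IMM:
            regs[insn->dst_reg] += (uint64_t)(int64_t)insn->imm;
            break;
        case OP_EXIT:
            return regs[REG_A];
        }
    }
}
```

Under this sketch, a program that starts with `{OP_MOV_IMM, REG_A, 0, 0}` is well-defined. One that starts with `{OP_XOR_REG, REG_A, REG_A, 0}` is the self-XOR pattern that KMSAN reports.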

> Is this annoying?  Extremely; the XOR-clearing _would_ be fine
>  if the standard had chosen to define things differently (e.g.
>  it's fine under a hypothetical 'C99 but uninitialised auto
>  variables have unspecified rather than indeterminate values').

I wouldn't call this particular use case "extremely annoying". I think
so far this is the only case of initializing something by XOR we've
seen with both MSan and KMSAN.

> I can't see a way to work around it that doesn't have a possible
>  performance cost (alternatives to Alexander's MOV_IMM 0 include
>  initialising regs[BPF_REG_A] and regs[BPF_REG_X] in PROG_NAME
>  and PROG_NAME_ARGS), although there is the question of whether
>  anyone who cares about performance (or security) will be using
>  BPF without the JIT anyway.

If I understand correctly, these two instructions are only executed
once per program.
Are they really expected to impact performance that much?
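For comparison, Edward's other alternative is to initialise regs[BPF_REG_A] and regs[BPF_REG_X] in PROG_NAME. In a sketch like the one below, that costs roughly two extra stores per call. As before, this is a hypothetical toy interpreter rather than the kernel's code, and `REG_A` and `REG_X` only stand in for BPF_REG_A and BPF_REG_X.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ISA; not the kernel's BPF encoding. */
enum op { OP_XOR_REG, OP_ADD_IMM, OP_EXIT };

struct insn {
    enum op code;
    uint8_t dst_reg;
    uint8_t src_reg;
    int32_t imm;
};

#define NREGS 11
#define REG_A 0   /* hypothetical stand-in for BPF_REG_A */
#define REG_X 7   /* hypothetical stand-in for BPF_REG_X */

/* Entry point in the spirit of PROG_NAME(): A and X get a determinate
 * value before the first instruction runs, so a later self-XOR of
 * either register reads defined data and yields 0. Other registers
 * are still uninitialized and must be written before being read. */
uint64_t run_with_init(const struct insn *insn)
{
    uint64_t regs[NREGS];

    regs[REG_A] = 0;   /* one extra store per invocation... */
    regs[REG_X] = 0;   /* ...and one more */

    for (;; insn++) {
        assert(insn->dst_reg < NREGS && insn->src_reg < NREGS);
        switch (insn->code) {
        case OP_XOR_REG:
            regs[insn->dst_reg] ^= regs[insn->src_reg];
            break;
        case OP_ADD_IMM:
            regs[insn->dst_reg] += (uint64_t)(int64_t)insn->imm;
            break;
        case OP_EXIT:
            return regs[REG_A];
        }
    }
}
```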

It's also an interesting question whether the JIT compiler emits
consistently better code for BPF_XOR than for MOV_IMM 0 on every
architecture - while "xorl %eax, %eax" is probably shorter and faster
on x86, on arm64 a better alternative would be "mov w0, wzr".
If the performance is really critical here, perhaps a better
alternative is to introduce a BPF instruction (which could be an alias
of BPF_XOR REG, REG) for zeroing out a register? Then different
architectures may choose more efficient implementations for it, and
the interpreter will be just assigning zero to the register without
violating the C standard.
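Here is one possible interpreter side of such an instruction. It assumes a hypothetical `OP_ZERO` opcode, which is not an existing BPF instruction, in the same kind of toy interpreter. The handler never reads the old register value, so it is well-defined even as the first access to a register. A JIT would remain free to choose the cheapest zeroing idiom for each architecture.

```c
#include <assert.h>
#include <stdint.h>

/* OP_ZERO is hypothetical; the rest is a toy ISA, not real BPF. */
enum op { OP_ZERO, OP_ADD_IMM, OP_EXIT };

struct insn {
    enum op code;
    uint8_t dst_reg;
    uint8_t src_reg;
    int32_t imm;
};

#define NREGS 11
#define REG_A 0   /* hypothetical stand-in for BPF_REG_A */
#define REG_X 7   /* hypothetical stand-in for BPF_REG_X */

uint64_t run_with_zero_op(const struct insn *insn)
{
    uint64_t regs[NREGS];   /* left uninitialized on purpose */

    for (;; insn++) {
        assert(insn->dst_reg < NREGS && insn->src_reg < NREGS);
        switch (insn->code) {
        case OP_ZERO:
            /* Write-only: no indeterminate value is ever read. A JIT
             * could emit e.g. "xorl %eax, %eax" on x86-64 or
             * "mov w0, wzr" on arm64 for the same instruction. */
            regs[insn->dst_reg] = 0;
            break;
        case OP_ADD_IMM:
            regs[insn->dst_reg] += (uint64_t)(int64_t)insn->imm;
            break;
        case OP_EXIT:
            return regs[REG_A];
        }
    }
}
```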

> But I don't think "Alexander has to do the data-flow analysis in
>  KMSAN" is the right answer; KMSAN's diagnostic here is _correct_
>  in that ___bpf_prog_run() invokes UB on this XOR.
> Now, since it would be rather difficult and pointless for the
>  compiler to statically prove that the reg is uninitialised (it
>  would need to generate a special code-path just for this one
>  case)

The godbolt link above actually shows a case (foo3()) in which Clang
knows that a local is uninitialized and transforms it into a special
`undef` value that can be then e.g. passed around and take part in
various optimizations, making them more efficient.
I don't have evidence that such a transformation is currently possible
for the BPF code in question, but all the building blocks are there,
so it's probably just a matter of time.

>, maybe the best thing to do is to get GCC folks to bless
>  this usage (perhaps defining uninitialised variables to have
>  what C99 would call an unspecified value), at which point it
>  becomes defined under the "gnu89" pseudo-standard which is what
>  we compile the kernel with.

Given the increased popularity of Clang in the kernel these days, I
don't think it's a good idea for a single compiler to further diverge
from the standard. Again, this code pattern doesn't really seem to be
popular enough to justify such a change.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing Directors: Paul Manicle, Halimah DeLaine Prado
Register court and number: Hamburg, HRB 86891
Registered office: Hamburg
