lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 1 May 2020 14:56:17 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
        bpf@...r.kernel.org, linux-kernel@...r.kernel.org, x86@...nel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH] bpf: Tweak BPF jump table optimizations for objtool
 compatibility

On Fri, May 01, 2020 at 12:40:53PM -0700, Alexei Starovoitov wrote:
> On Fri, May 01, 2020 at 02:22:04PM -0500, Josh Poimboeuf wrote:
> > On Fri, May 01, 2020 at 12:09:30PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Apr 30, 2020 at 02:07:43PM -0500, Josh Poimboeuf wrote:
> > > > Objtool decodes instructions and follows all potential code branches
> > > > within a function.  But it's not an emulator, so it doesn't track
> > > > register values.  For that reason, it usually can't follow
> > > > intra-function indirect branches, unless they're using a jump table
> > > > which follows a certain format (e.g., GCC switch statement jump tables).
> > > > 
> > > > In most cases, the generated code for the BPF jump table looks a lot
> > > > like a GCC jump table, so objtool can follow it.  However, with
> > > > RETPOLINE=n, GCC keeps the jump table address in a register, and then
> > > > does 160+ indirect jumps with it.  When objtool encounters the indirect
> > > > jumps, it can't tell which jump table is being used (or even whether
> > > > they might be sibling calls instead).
> > > > 
> > > > This was fixed before by disabling an optimization in ___bpf_prog_run(),
> > > > using the "optimize" function attribute.  However, that attribute is bad
> > > > news.  It doesn't append options to the command-line arguments.  Instead
> > > > it starts from a blank slate.  And according to recent GCC documentation
> > > > it's not recommended for production use.  So revert the previous fix:
> > > > 
> > > >   3193c0836f20 ("bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()")
> > > > 
> > > > With that reverted, solve the original problem in a different way by
> > > > getting rid of the "goto select_insn" indirection, and instead just goto
> > > > the jump table directly.  This simplifies the code a bit and helps GCC
> > > > generate saner code for the jump table branches, at least in the
> > > > RETPOLINE=n case.
> > > > 
> > > > But, in the RETPOLINE=y case, this simpler code actually causes GCC to
> > > > generate far worse code, ballooning the function text size by +40%.  So
> > > > leave that code the way it was.  In fact Alexei prefers to leave *all*
> > > > the code the way it was, except where needed by objtool.  So even
> > > > non-x86 RETPOLINE=n code will continue to have "goto select_insn".
> > > > 
> > > > This stuff is crazy voodoo, and far from ideal.  But it works for now.
> > > > Eventually, there's a plan to create a compiler plugin for annotating
> > > > jump tables.  That will make this a lot less fragile.
> > > 
> > > I don't like this commit log.
> > > Here you're saying that the code recognized by objtool is sane and good
> > > whereas well optimized gcc code is somehow voodoo and bad.
> > > That is just wrong.
> > 
> > I have no idea what you're talking about.
> > 
> > Are you saying that ballooning the function text size by 40% is well
> > optimized GCC code?  It seems like a bug to me.  That's the only place I
> > said anything bad about GCC code.
> 
> It could be a bug, but did you benchmark the speed of interpreter ?
> Is it faster or slower with 40% more code ?
> Did you benchmark it on other archs ?

I thought we were in agreement that 40% text growth is bad.  Isn't that
why you wanted to keep 'goto select_insn' for the retpoline case?

If there's some other reason, let me know and I'll put it in the patch
description instead.

> > When I said "this stuff is crazy voodoo" I was referring to the patch
> > itself.  I agree it's horrible, it's only the best approach we're able
> > to come up with at the moment.
> 
> please reword it then.

Ok, so: This *patch* is crazy voodoo ?

-- 
Josh

Powered by blists - more mailing lists