Message-ID: <CAADnVQKWDvzsvyjGoFvSQV7VGr2hF2zzCsC9vnpncWMxOJWYdw@mail.gmail.com>
Date: Mon, 8 Jul 2019 15:49:33 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, Song Liu <songliubraving@...com>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Kairui Song <kasong@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/urgent] bpf: Fix ORC unwinding in non-JIT BPF code
On Mon, Jul 8, 2019 at 3:38 PM Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>
> On Mon, Jul 08, 2019 at 03:15:37PM -0700, Alexei Starovoitov wrote:
> > > 2)
> > >
> > > After doing the first optimization, GCC then does another one which is
> > > a little trickier. It replaces:
> > >
> > > select_insn:
> > > jmp *jumptable(, %rax, 8)
> > > ...
> > > ALU64_ADD_X:
> > > ...
> > > jmp *jumptable(, %rax, 8)
> > > ALU_ADD_X:
> > > ...
> > > jmp *jumptable(, %rax, 8)
> > >
> > > with
> > >
> > > select_insn:
> > > mov jumptable, %r12
> > > jmp *(%r12, %rax, 8)
> > > ...
> > > ALU64_ADD_X:
> > > ...
> > > jmp *(%r12, %rax, 8)
> > > ALU_ADD_X:
> > > ...
> > > jmp *(%r12, %rax, 8)
> > >
> > > The problem is that it only moves the jumptable address into %r12
> > > once, for the entire function, then it goes through multiple recursive
> > > indirect jumps which rely on that %r12 value. But objtool isn't yet
> > > smart enough to be able to track the value across multiple recursive
> > > indirect jumps through the jump table.
> > >
> > > After some digging I found that the quick and easy fix is to disable
> > > -fgcse. In fact, this seems to be recommended by the GCC manual, for
> > > code like this:
> > >
> > > -fgcse
> > > Perform a global common subexpression elimination pass. This
> > > pass also performs global constant and copy propagation.
> > >
> > > Note: When compiling a program using computed gotos, a GCC
> > > extension, you may get better run-time performance if you
> > > disable the global common subexpression elimination pass by
> > > adding -fno-gcse to the command line.
> > >
> > > Enabled at levels -O2, -O3, -Os.
> > >
> > > This code indeed relies extensively on computed gotos. I don't know
> > > *why* disabling this optimization would improve performance. In fact
> > > I really don't see how it could make much of a difference either way.
> > >
> > > Anyway, using -fno-gcse makes optimization #2 go away and makes
> > > objtool happy, with only a fix for #1 needed.
> > >
> > > If -fno-gcse isn't an option, we might be able to fix objtool by using
> > > the "first_jump_src" thing which Peter added, improving it such that
> > > it also takes table jumps into account.
> >
> > Sorry for the delay. I'm mostly off-grid until next week.
> > As far as -fno-gcse goes, I don't mind as long as it doesn't hurt
> > performance. Which I suspect it will :(
> > All these indirect gotos are there for performance.
> > A single indirect goto and a bunch of "jmp select_insn"s
> > are way slower, since there is only one instruction
> > for the CPU branch predictor to work with.
> > When every insn is followed by "jmp *jumptable"
> > there is more room for the CPU to speculate.
> > It's been a long time, but when I wrote it the difference
> > between all indirect gotos vs a single indirect goto was almost 2x.
>
> Just to clarify, -fno-gcse doesn't get rid of any of the indirect jumps.
> It still has 166 indirect jumps. It just gets rid of the second
> optimization, where the jumptable address is placed in a register.
What about the other functions in core.c?
Maybe it's easier to teach objtool to recognize that pattern?
> If you have a benchmark which is relatively easy to use, I could try to
> run some tests.
modprobe test_bpf
selftests/bpf/test_progs
Both print runtimes.
Some of the test_progs tests have high run-to-run variation, though.