Message-ID: <0a9da587b0330bafdf612c3d51285e144b0b9e46.camel@redhat.com>
Date: Fri, 09 Apr 2021 15:21:49 -0400
From: David Malcolm <dmalcolm@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ard Biesheuvel <ardb@...nel.org>, linux-toolchains@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Jason Baron <jbaron@...mai.com>,
"Steven Rostedt (VMware)" <rostedt@...dmis.org>
Subject: Re: static_branch/jump_label vs branch merging
On Fri, 2021-04-09 at 20:40 +0200, Peter Zijlstra wrote:
> On Fri, Apr 09, 2021 at 09:48:33AM -0400, David Malcolm wrote:
> > You tried __pure on arch_static_branch; did you try it on
> > static_branch_unlikely?
>
> static_branch_unlikely() is a CPP macro that expands to a statement
> expression, or as with the later patch, a _Generic(). I'm not sure how
> to apply the attribute to either of them since it is a function
> attribute.
>
> I was hoping the attribute would percolate through, so to speak.
>
> > With the caveat that my knowledge of GCC's middle-end is mostly about
> > implementing warnings, rather than optimization, I did some
> > experimentation, with gcc trunk on x86_64 FWIW.
> >
> > Given:
> >
> > int __attribute__((pure)) foo(void);
> >
> > int t(void)
> > {
> >   int a = 0;
> >   if (foo())
> >     a++;
> >   if (foo())
> >     a++;
> >   if (foo())
> >     a++;
> >   return a;
> > }
> >
> > At -O1 and above this is optimized to a single call to foo,
> > returning 0 or 3 accordingly.
> >
> > -fdump-tree-all shows that it's the "fre1" pass that eliminates the
> > subsequent calls to foo, replacing them with reuses of the result of
> > the first call.
> >
> > This is in gcc/tree-ssa-sccvn.c, a value-numbering pass.
> >
> > I think you want to somehow "teach" the compiler that:
> > static_branch_unlikely(&sched_schedstats)
> > is "pure-ish", that for some portion of the surrounding code that you
> > want the result to be treated as pure - though I suspect compiler
> > maintainers with more experience than me are thinking "but which
> > portion? what is it safe to assume, and what will users be annoyed
> > about if we optimize away? what if t itself is inlined somewhere?"
> > and similar concerns.
>
> Right, pure or even const. As to the scope, as wide as possible. It
> literally is a global constant, the value returned is the same
> everywhere.

[Caveat: I'm a gcc developer, not a kernel expert]

But it's not *quite* a global constant, or presumably you would be
simply using a global constant, right? As the optimizer gets smarter,
you don't want to have it one day decide that actually it really is
constant, and optimize away everything at compile-time (e.g. when LTO
is turned on, or whatnot).

I get the impression that you're resorting to assembler because you're
pushing beyond what the C language can express. Taking things to a
slightly higher level, am I right in thinking that what you're trying
to achieve is a control flow construct that almost always takes one of
the given branches, but which can (very rarely) be switched to
permanently take one of the other branches? And that you want the
lowest possible overhead for the common case where the control flow
hasn't been touched yet (and presumably little overhead for when it
has been), and that you want to be able to merge repeated such
conditionals?

Perhaps a __builtin_ to hint that a conditional should work that way
(analogous to __builtin_expect)? I can imagine a way of implementing
such a construct in gcc's gimple and RTL representations, but it would
be a ton of work (and I have a full plate already).
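
Purely to illustrate the call-site shape I have in mind, here is a
sketch that fakes such a hint with the existing __builtin_expect. The
name rarely_taken() and the schedstats_on flag are made up for this
example, and this gives none of the patching or merging semantics
discussed above; it only shows how the hint might read in source:

```c
#include <stdbool.h>

/* Hypothetical name. Approximated with the real __builtin_expect,
 * which only tells the compiler which outcome to expect. */
#define rarely_taken(cond) __builtin_expect(!!(cond), 0)

/* Made-up stand-in for a switch that is flipped very rarely. */
static bool schedstats_on;

int account(int x)
{
    if (rarely_taken(schedstats_on))
        return x + 1;   /* uncommon path: extra bookkeeping */
    return x;           /* common path */
}
```

Unlike a jump label, this still loads and tests the flag on every call,
which is exactly the overhead the real construct is trying to avoid.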

Or maybe another way of thinking about it is that you're reading a
value and you would like the compiler to amortize away repeated reads
of the value (perhaps just within the current function).

It's kind of the opposite of "volatile" - something that the user is
happy for the compiler to treat as not changing much, as opposed to
something the user is warning the compiler about changing from under
it. A "const-ish" value?
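
To make the "amortize the reads" idea concrete, here is a sketch of
what I'd expect the result to look like if done by hand in plain C. The
names (feature_enabled, read_feature, reads) are invented for the
example, and the volatile flag is just a stand-in for something the
compiler would otherwise have to re-read each time:

```c
#include <stdbool.h>

/* Invented stand-in for a value that changes very rarely. */
static volatile bool feature_enabled;

/* Counts how often the flag is actually read. */
static int reads;

static bool read_feature(void)
{
    reads++;
    return feature_enabled;
}

int t_cached(void)
{
    /* Read once, then treat the result as fixed for this function. */
    const bool on = read_feature();
    int a = 0;

    if (on)
        a++;
    if (on)
        a++;
    if (on)
        a++;
    return a;
}
```

The three conditionals all test the same local, so the compiler is free
to merge them, while both outcomes remain reachable at run time.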

Sorry if I'm being incoherent; I'm kind of thinking aloud here.

Hope this is constructive.

Dave

>
> All we need GCC to do for the static_branch construct is to emit both
> branches; that is, it must not treat the result as a constant and
> elide the other branches. But it can consider consecutive calls (as
> far and wide as it wants) to return the same value.
>
> > Or maybe the asm stmt itself could somehow be marked as pure???
> > (with similar concerns about semantics as above)
>
> Yeah, not sure, someone with more clue will have to inform us what, if
> anything more than marking it either pure or const is required.
> Perhaps that attribute is sufficient and the compiler just isn't
> optimizing for an unrelated reason.
>
> Regards,
>
> Peter
>