[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220317223756.GM614@gate.crashing.org>
Date: Thu, 17 Mar 2022 17:37:56 -0500
From: Segher Boessenkool <segher@...nel.crashing.org>
To: Bill Wendling <morbo@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Juergen Gross <jgross@...e.com>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>, llvm@...ts.linux.dev,
LKML <linux-kernel@...r.kernel.org>,
linux-toolchains <linux-toolchains@...r.kernel.org>
Subject: Re: [PATCH v5] x86: use builtins to read eflags
Hi!
On Thu, Mar 17, 2022 at 12:45:30PM -0700, Bill Wendling wrote:
> On Thu, Mar 17, 2022 at 11:52 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> > It is perhaps telling that the LLVM discussion I found seems to talk
> > more about the performance impact, not about the fact that THE
> > GENERATED CODE WAS WRONG.
> >
> I think that's a bit unfair. There seemed to be a general consensus
> that the code was wrong and needed to be fixed. However, the compiler
> is also expected to generate the most performant code that it can.
I think Linus' point is that correctness is required, and performance
is just a nice to have. I don't think you disagree, I sure don't :-)
> > In fact, gcc pretty much documents that "cc" clobbers doesn't do
> > anything on x86, and isn't needed. It just assumes that the arithmetic
> > flags always get clobbered, because on x86 that's pretty much the
> > case. They have sometimes been added for documentation purposes in the
> > kernel, but that's all they are.
GCC on x86 clobbers the arithmetic flags in every asm. Long ago the x86
port still used CC0 (an internal representation of the condition code
flags), which acted effectively like clobbering the flags on every
instruction (CC0 will finally be gone in GCC 12 btw). The x86 port
stopped using CC0 (and so started using the automatic "cc" clobber, to
keep old inline asm working) in 1999 :-)
"cc" is a valid clobber on every port, but it doesn't necessarily do
anything, and if it does, it does not necessarily correspond to any more
specific register, neither to a GCC register nor to a hardware register.
It also is not really useful for generic code, since the asm itself is
by nature pretty much target-specific :-)
> > But the whole "you can't move _other_ things that you don't even
> > understand around this either" is equally important. A "disable
> > interrupts" could easily be protecting a "read and modify a CPU MSR
> > value" too - no real "memory" access necessarily involved, but
> > "memory" is the only way we can tell you "don't move this".
> >
> And yet that's not guaranteed. From
> https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html:
>
> ```
> Note that the compiler can move even volatile asm instructions
> relative to other code, including across jump instructions. For
> example, on many targets there is a system register that controls the
> rounding mode of floating-point operations. Setting it with a volatile
> asm statement, as in the following PowerPC example, does not work
> reliably.
>
> asm volatile("mtfsf 255, %0" : : "f" (fpenv));
> sum = x + y;
>
> The solution is to reference the "sum" variable in the asm:
>
> asm volatile ("mtfsf 255,%1" : "=X" (sum) : "f" (fpenv));
> ```
>
> Note that the solution _isn't_ to add a "memory" clobber, because it's
> not guaranteed to work, as it's explicitly defined to be a read/write
> _memory_ barrier, despite what kernel writers wish it would do.
It doesn't work because *other* code can change the fp environment as
well; adding "memory" clobbers to both asms will keep them in order (if
they both are not deleted, etc). This hinders optimisation very
seriously for code like this btw, another important reason to not do it
here.
And what is "memory" here anyway? New memory items can be added very
late in the pass pipeline, and anything done in an earlier pass will
not have considered those of course. The most important case of this
(but not the only case) is new slots on the stack.
> Your assertion that compilers don't know about control registers isn't
> exactly true. In the case of "pushf/popf", those instructions know
> about the eflags registers. All subsequent instructions that read or
> modify eflags also know about it. In essence, the compiler can
> determine its own clobber list, which include MSRs.
The compiler only knows about some bits in the flags register. The
compiler does not know about most machine resources, including all of
the model-specific registers.
Segher
Powered by blists - more mailing lists