Message-ID: <CAHk-=wikqihO-ai-SLWGqVm1SPmYh60AujgpDniDWmXDBryKoQ@mail.gmail.com>
Date: Wed, 4 Dec 2024 10:49:31 -0800
From: Linus Torvalds <torvalds@...uxfoundation.org>
To: David Laight <David.Laight@...lab.com>
Cc: "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>,
Andrew Cooper <andrew.cooper3@...rix.com>, Josh Poimboeuf <jpoimboe@...nel.org>,
"bp@...en8.de" <bp@...en8.de>
Subject: Re: [PATCH next] x86: mask_user_address() return base of guard page
for kernel addresses
On Sun, 1 Dec 2024 at 14:24, David Laight <David.Laight@...lab.com> wrote:
>
> Agner's tables pretty much show that Intel implemented as
> x = cond ? y : x
> so it suffers from being a 2 u-op instruction (the same as sbb)
> on older core-2 cpu.
So I don't worry about a 2-cycle latency here; you'll find the same
for 'sbb' too, and there you have the additional 'or' operation that
then adds another cycle.
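(A minimal C sketch of the two sequences being compared, not the kernel's
actual code: USER_PTR_MAX_SKETCH is a made-up limit for illustration, the
function names are hypothetical, and a compiler is free to emit something
other than sbb/or or cmov for these expressions.)

```c
#include <stdint.h>

/* Hypothetical limit, for illustration only; the real value is an
 * assumption here and is not taken from the patch. */
#define USER_PTR_MAX_SKETCH UINT64_C(0x00007ffffffff000)

/* The "cmp + sbb + or" shape: build a 0 / all-ones mask from the
 * comparison, then OR it in. Addresses above the limit become ~0. */
static inline uint64_t mask_sbb_or(uint64_t addr)
{
	uint64_t mask = -(uint64_t)(addr > USER_PTR_MAX_SKETCH); /* 0 or ~0 */

	return addr | mask;
}

/* The "cmp + cmov" shape: select between the address and the limit.
 * Addresses above the limit are clamped to the limit itself. */
static inline uint64_t mask_cmov(uint64_t addr)
{
	return addr > USER_PTR_MAX_SKETCH ? USER_PTR_MAX_SKETCH : addr;
}
```

Both leave addresses at or below the limit untouched. They differ only in
what an out-of-range address turns into: all ones in the first case, the
limit (the guard page base in the patch under discussion) in the second.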
And Intel has documented that cmov is a data dependency, so it's
mainly just AMD that I'd worry about:
> OTOH AMD have it as '4 per clock' (the same as mov) so could be
> a 'mov' with the write disabled (but I'm not sure how that
> would work if 'mov' is a register rename).
So that's the part that really worried me. "4 per clock, just like
'mov'" makes me worry it's a clever predicted mov instruction.
However, it looks like Agner is actually wrong here. Going to
https://uops.info/table.html
and looking up 'cmovbe' (which I think is the op we'd want), says that
Zen 4 is 2 per cycle (I hate how they call that 0.5 "throughput" - at
least Agner correctly calls it the "reciprocal throughput").
So that actually looks ok.
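(For reference, the conversion between the two ways of quoting the number,
as a small C sketch. The helper name is hypothetical; it only inverts the
reciprocal throughput, so 0.5 cycles per instruction is 2 per cycle and
0.25 would be the "4 per clock" figure.)

```c
/* Convert reciprocal throughput (cycles per instruction) into
 * instructions per cycle. Hypothetical helper, for illustration. */
static double insns_per_cycle(double reciprocal_throughput)
{
	return 1.0 / reciprocal_throughput;
}
```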
I'd still be happier if I could find some official AMD doc that says
that cmov is a data dependency and is not predicted, but at least now
the numbers line up for it.
Linus