linux-kernel - Re: [PATCH next] x86: mask_user_address() return base of guard page for kernel addresses

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <CAHk-=wikqihO-ai-SLWGqVm1SPmYh60AujgpDniDWmXDBryKoQ@mail.gmail.com>
Date: Wed, 4 Dec 2024 10:49:31 -0800
From: Linus Torvalds <torvalds@...uxfoundation.org>
To: David Laight <David.Laight@...lab.com>
Cc: "x86@...nel.org" <x86@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Andrew Cooper <andrew.cooper3@...rix.com>, Josh Poimboeuf <jpoimboe@...nel.org>, 
	"bp@...en8.de" <bp@...en8.de>
Subject: Re: [PATCH next] x86: mask_user_address() return base of guard page
 for kernel addresses

On Sun, 1 Dec 2024 at 14:24, David Laight <David.Laight@...lab.com> wrote:
>
> Agner's tables pretty much show that Intel implemented it as
>         x = cond ? y : x
> so it suffers from being a 2 u-op instruction (the same as sbb)
> on older Core 2 CPUs.

So I don't worry about a 2-cycle latency here; you'll find the same
for 'sbb' too, and there you have the additional 'or' operation that
then adds another cycle.
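
To make the "sbb plus an extra 'or'" versus "cmov" comparison concrete, here is a portable C sketch of the two masking strategies. It is not the kernel's mask_user_address(): GUARD_BASE, mask_sbb_or() and mask_cmov() are hypothetical names, and the guard-page address is an assumed value for a 64-bit layout. The point is the shape: the sbb variant builds a mask and then needs an extra 'or', while the cmov variant is a single select of the form x = cond ? y : x.

```c
#include <stdint.h>

/* Hypothetical stand-in for the upper user-address limit: the base of
 * the guard page just below the kernel range. The value is an assumption
 * chosen for illustration only. */
#define GUARD_BASE UINT64_C(0x00007ffffffff000)

/* "sbb + or" style: widen the compare result into a 0 / all-ones mask
 * (the job sbb does in an asm version), then spend one more operation
 * ('or') applying it. Kernel addresses collapse to all ones. */
static inline uint64_t mask_sbb_or(uint64_t addr)
{
    uint64_t mask = UINT64_C(0) - (uint64_t)(addr > GUARD_BASE);
    return addr | mask;
}

/* "cmov" style: x = cond ? y : x. Kernel addresses are replaced by the
 * guard page base in one select; user addresses pass through unchanged.
 * Whether the compiler actually emits cmov here is up to the compiler. */
static inline uint64_t mask_cmov(uint64_t addr)
{
    return addr > GUARD_BASE ? GUARD_BASE : addr;
}
```

The select form matches the patch subject: a kernel address becomes the base of the guard page rather than all ones. It is only a safe speculation barrier if cmov is a true data dependency and not a predicted move, which is the question the rest of this message is about.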

And Intel has documented that cmov is a data dependency, so it's
mainly just AMD that I'd worry about:

> OTOH AMD have it as '4 per clock' (the same as mov), so it could be
> a 'mov' with the write disabled (but I'm not sure how that
> would work if 'mov' is a register rename).

So that's the part that really worried me. "4 per clock, just like
'mov'" makes me worry it's a clever predicted mov instruction.

However, it looks like Agner is actually wrong here. Going to

    https://uops.info/table.html

and looking up 'cmovbe' (which I think is the op we'd want), says that
ZEN 4 is 2 per cycle (I hate how they call that 0.5 "throughput" - at
least Agner correctly calls it the "reciprocal throughput").
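
The "throughput" figures in these tables are reciprocal throughputs (average cycles per instruction), so instructions per cycle is simply the inverse. A trivial C sketch of that conversion; the helper name is made up for illustration.

```c
/* Convert a reciprocal throughput (cycles per instruction, as listed in
 * instruction tables) into instructions per cycle.
 * Hypothetical helper for illustration only. */
static double insns_per_cycle(double reciprocal_throughput)
{
    return 1.0 / reciprocal_throughput;
}
```

So the 0.5 listed for cmovbe on Zen 4 means 2 per cycle, while the "4 per clock" quoted from Agner corresponds to a reciprocal throughput of 0.25.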

So that actually looks ok.

I'd still be happier if I could find some official AMD doc that says
that cmov is a data dependency and is not predicted, but at least now
the numbers line up for it.

          Linus