linux-kernel - RE: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence

Message-ID: <PH7PR11MB75720F8E94AF9E7FA05FC663BBC42@PH7PR11MB7572.namprd11.prod.outlook.com>
Date: Thu, 20 Feb 2025 18:28:21 +0000
From: "Constable, Scott D" <scott.d.constable@...el.com>
To: "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>, Peter Zijlstra
	<peterz@...radead.org>, "x86@...nel.org" <x86@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Milburn,
 Alyssa" <alyssa.milburn@...el.com>, "joao@...rdrivepizza.com"
	<joao@...rdrivepizza.com>, "jpoimboe@...nel.org" <jpoimboe@...nel.org>,
	"jose.marchesi@...cle.com" <jose.marchesi@...cle.com>, "hjl.tools@...il.com"
	<hjl.tools@...il.com>, "ndesaulniers@...gle.com" <ndesaulniers@...gle.com>,
	"samitolvanen@...gle.com" <samitolvanen@...gle.com>, "nathan@...nel.org"
	<nathan@...nel.org>, "ojeda@...nel.org" <ojeda@...nel.org>, "kees@...nel.org"
	<kees@...nel.org>, "alexei.starovoitov@...il.com"
	<alexei.starovoitov@...il.com>, "mhiramat@...nel.org" <mhiramat@...nel.org>,
	"jmill@....edu" <jmill@....edu>
Subject: RE: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence

Hi Andrew,

I can elaborate, if only "a bit." Your intuition about branches is pretty accurate, and the difference between taken vs. not-taken should, on average, be marginal. I can quote from Intel's software optimization manual: "Conditional branches that are never taken do not consume BTB resources." Additionally, there are some more subtle reasons that not-taken branches can be preferable--these vary by microarchitecture.

Regards,

Scott Constable

-----Original Message-----
From: Andrew Cooper <andrew.cooper3@...rix.com> 
Sent: Wednesday, February 19, 2025 9:15 AM
To: Peter Zijlstra <peterz@...radead.org>; x86@...nel.org
Cc: linux-kernel@...r.kernel.org; Milburn, Alyssa <alyssa.milburn@...el.com>; Constable, Scott D <scott.d.constable@...el.com>; joao@...rdrivepizza.com; jpoimboe@...nel.org; jose.marchesi@...cle.com; hjl.tools@...il.com; ndesaulniers@...gle.com; samitolvanen@...gle.com; nathan@...nel.org; ojeda@...nel.org; kees@...nel.org; alexei.starovoitov@...il.com; mhiramat@...nel.org; jmill@....edu
Subject: Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence

On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> Scott notes that non-taken branches are faster. Abuse overlapping code 
> that traps instead of explicit UD2 instructions.
>
> And LEA does not modify flags and will have less dependencies.
>
> Suggested-by: Scott Constable <scott.d.constable@...el.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>

Can we get a bit more info on this "non-taken branches are faster" ?

For modern cores which have branch prediction pre-decode, a branch unknown to the predictor will behave as non-taken until the Jcc executes[1].

Something the size of Linux is surely going to exceed the branch predictor capacity, so it's perhaps fair to say that there's a reasonable chance of missing in the predictor.

But, for a branch known to the predictor, taken branches ought to be bubble-less these days.  At least, this is what the marketing material claims.

And, this doesn't account for branches which alias in the predictor and end up with a wrong prediction.

~Andrew

[1] Yes, I know RWC has the reintroduced 0xee prefix with the decode resteer.
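
For readers following along, here is a minimal C sketch of the "keep the common case not-taken" idea discussed above. It is not taken from the patch, which works at the instruction level. The names checked_call, type_mismatch and add_one are hypothetical, and __builtin_expect is a GCC/Clang builtin that only hints at layout. Whether the hot path really ends up as the fall-through depends on the compiler and optimization level.

```c
#include <stdint.h>
#include <stdlib.h>

/* GCC/Clang builtin: hint that the condition is almost always false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper, not from the patch: stands in for the trap path. */
static void type_mismatch(void)
{
	abort();
}

/*
 * Hypothetical check, loosely modelled on comparing a caller-supplied
 * hash against the callee's expected hash. With the mismatch marked
 * unlikely, compilers typically place the abort path out of line, so
 * the conditional branch falls through (is not taken) on a valid call.
 */
static int checked_call(uint32_t expected, uint32_t actual,
			int (*fn)(int), int arg)
{
	if (unlikely(expected != actual))
		type_mismatch();	/* cold path: reached only on a mismatch */
	return fn(arg);			/* hot path: intended fall-through */
}

/* Hypothetical callee used only to exercise the sketch. */
static int add_one(int x)
{
	return x + 1;
}
```

Calling checked_call(h, h, add_one, 41) with matching hashes returns add_one(41); a mismatch ends in abort(). Inspecting the generated assembly (for example with objdump -d) is the only way to confirm which direction the compiler actually chose for the branch.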