Message-ID: <PH7PR11MB75720F8E94AF9E7FA05FC663BBC42@PH7PR11MB7572.namprd11.prod.outlook.com>
Date: Thu, 20 Feb 2025 18:28:21 +0000
From: "Constable, Scott D" <scott.d.constable@...el.com>
To: "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>, Peter Zijlstra
<peterz@...radead.org>, "x86@...nel.org" <x86@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Milburn,
Alyssa" <alyssa.milburn@...el.com>, "joao@...rdrivepizza.com"
<joao@...rdrivepizza.com>, "jpoimboe@...nel.org" <jpoimboe@...nel.org>,
"jose.marchesi@...cle.com" <jose.marchesi@...cle.com>, "hjl.tools@...il.com"
<hjl.tools@...il.com>, "ndesaulniers@...gle.com" <ndesaulniers@...gle.com>,
"samitolvanen@...gle.com" <samitolvanen@...gle.com>, "nathan@...nel.org"
<nathan@...nel.org>, "ojeda@...nel.org" <ojeda@...nel.org>, "kees@...nel.org"
<kees@...nel.org>, "alexei.starovoitov@...il.com"
<alexei.starovoitov@...il.com>, "mhiramat@...nel.org" <mhiramat@...nel.org>,
"jmill@....edu" <jmill@....edu>
Subject: RE: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
Hi Andrew,
I can elaborate, if only "a bit." Your intuition about branches is pretty accurate, and the difference between taken vs. not-taken should, on average, be marginal. I can quote from Intel's software optimization manual: "Conditional branches that are never taken do not consume BTB resources." Additionally, there are some more subtle reasons that not-taken branches can be preferable--these vary by microarchitecture.
Regards,
Scott Constable
-----Original Message-----
From: Andrew Cooper <andrew.cooper3@...rix.com>
Sent: Wednesday, February 19, 2025 9:15 AM
To: Peter Zijlstra <peterz@...radead.org>; x86@...nel.org
Cc: linux-kernel@...r.kernel.org; Milburn, Alyssa <alyssa.milburn@...el.com>; Constable, Scott D <scott.d.constable@...el.com>; joao@...rdrivepizza.com; jpoimboe@...nel.org; jose.marchesi@...cle.com; hjl.tools@...il.com; ndesaulniers@...gle.com; samitolvanen@...gle.com; nathan@...nel.org; ojeda@...nel.org; kees@...nel.org; alexei.starovoitov@...il.com; mhiramat@...nel.org; jmill@....edu
Subject: Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> Scott notes that non-taken branches are faster. Abuse overlapping code
> that traps instead of explicit UD2 instructions.
>
> And LEA does not modify flags and will have fewer dependencies.
>
> Suggested-by: Scott Constable <scott.d.constable@...el.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Can we get a bit more info on this "non-taken branches are faster" claim?
For modern cores which have branch prediction pre-decode, a branch unknown to the predictor will behave as non-taken until the Jcc executes[1].
Something the size of Linux is surely going to exceed the branch predictor capacity, so it's perhaps fair to say that there's a reasonable chance of missing in the predictor.
But, for a branch known to the predictor, taken branches ought to be bubble-less these days. At least, this is what the marketing material claims.
And, this doesn't account for branches which alias in the predictor and end up with a wrong prediction.
~Andrew
[1] Yes, I know RWC has the reintroduced 0xee prefix with the decode resteer.