Message-ID: <6d89fb36-c143-2bdd-9898-6053058b2e12@citrix.com>
Date:   Thu, 9 Feb 2023 01:33:07 +0000
From:   Andrew.Cooper3@...rix.com
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, mhiramat@...nel.org,
        kirill.shutemov@...ux.intel.com, jpoimboe@...hat.com
Subject: Re: [PATCH v3 3/4] x86/alternative: Rewrite optimize_nops() some

On 08/02/2023 8:29 pm, Peter Zijlstra wrote:
> On Wed, Feb 08, 2023 at 07:52:04PM +0000, Andrew.Cooper3@...rix.com wrote:
>> On 08/02/2023 5:10 pm, Peter Zijlstra wrote:
>>> This rewrite addresses two issues:
>>>
>>>  - it no longer hard-requires single-byte NOP runs; it now accepts
>>>    any NOP- or NOPL-encoded instruction (but not the more
>>>    complicated 32-bit NOPs).
>>>
>>>  - it writes a single 'instruction' replacement.
>>>
>>> Specifically, the ORC unwinder relies on the tail NOP of an
>>> alternative being a single instruction; in particular, it relies on
>>> the inner bytes not being executed.
>>>
>>> Once we reach the max supported NOP length (currently 8, could
>>> easily be extended to 11 on x86_64), it switches to JMP.d8 plus
>>> INT3 padding to achieve the same result.
>>>
>>> The ORC unwinder uses this guarantee in its analysis of
>>> alternative/overlapping CFI state.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>> How lucky are you feeling for your game of performance roulette?
> Yeah, not very lucky... I've been talking about this with Boris for a
> bit already.
>
>> Unconditional jmps consume branch-prediction resources these days,
>> and won't be successfully predicted until first taken.
> IKR, insane, but that's what it is.

In terms of rationalising how things work, sure, but the resulting perf
numbers speak for themselves.
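
For concreteness, here is a minimal userspace sketch of the padding
scheme the patch describes.  This is not the kernel's optimize_nops();
emit_pad() is a hypothetical helper, the table holds the canonical
Intel SDM long-NOP encodings, and the 8-byte cutoff is the "currently
8" from the patch description above.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Canonical single-instruction x86 NOPs, indexed by length. */
static const uint8_t nop_tab[9][8] = {
	[1] = { 0x90 },
	[2] = { 0x66, 0x90 },
	[3] = { 0x0f, 0x1f, 0x00 },
	[4] = { 0x0f, 0x1f, 0x40, 0x00 },
	[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	[6] = { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	[7] = { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	[8] = { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

#define MAX_NOP_LEN 8	/* "currently 8" per the patch */

/*
 * Pad 'len' bytes at 'buf' with a single instruction, so the inner
 * bytes are never an execution target -- the property the ORC unwinder
 * relies on.  The rel8 displacement limits 'len' to 129 bytes here.
 */
static void emit_pad(uint8_t *buf, size_t len)
{
	if (!len)
		return;
	if (len <= MAX_NOP_LEN) {
		memcpy(buf, nop_tab[len], len);	/* one long NOP */
	} else {
		buf[0] = 0xeb;			/* JMP.d8 ... */
		buf[1] = (uint8_t)(len - 2);	/* ... over the pad */
		memset(buf + 2, 0xcc, len - 2);	/* INT3 fill */
	}
}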

For the benefit of others reading this who aren't following the
details: modern x86 processors perform branch prediction pre-decode,
not post-decode, to reduce misprediction latency.

Branch prediction operates using the current %rip and past history, and
selects the $I (instruction cache) lines to send for decode.  The
"decoded bytes disagree with prediction metadata" feedback cycle is
fast, but missing this disagreement is the root cause of the Branch
Type Confusion speculation issue (a.k.a. AMD Retbleed).
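
As a rough userspace sketch of the effect (a hypothetical
micro-benchmark: the iteration count is arbitrary, __rdtsc() isn't
serialised, and the numbers will vary by microarchitecture), compare an
8-byte pad done as one NOPL against the same pad done as JMP.d8 over
INT3 fill:

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>		/* __rdtsc() */

int main(void)
{
	enum { ITERS = 10 * 1000 * 1000 };
	uint64_t t0, t_nop, t_jmp;
	int i;

	t0 = __rdtsc();
	for (i = 0; i < ITERS; i++)	/* 8-byte NOPL: no branch */
		asm volatile(".byte 0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00");
	t_nop = __rdtsc() - t0;

	t0 = __rdtsc();
	for (i = 0; i < ITERS; i++)	/* JMP.d8 over 6 bytes of INT3 */
		asm volatile("jmp 1f\n\t.rept 6\n\tint3\n\t.endr\n1:");
	t_jmp = __rdtsc() - t0;

	printf("nop pad: %llu cycles\n", (unsigned long long)t_nop);
	printf("jmp pad: %llu cycles\n", (unsigned long long)t_jmp);
	return 0;
}

The taken JMP occupies branch-predictor state and caps front-end
throughput at roughly one taken branch per cycle on most cores; the
NOPL involves no branch at all.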

~Andrew
