linux-kernel - Re: [PATCH v3 3/4] x86/alternative: Rewrite optimize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c23540da-5f73-dede-124f-529b01ce5273@citrix.com>
Date:   Thu, 9 Feb 2023 01:11:17 +0000
From:   Andrew.Cooper3@...rix.com
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, mhiramat@...nel.org,
        kirill.shutemov@...ux.intel.com, jpoimboe@...hat.com
Subject: Re: [PATCH v3 3/4] x86/alternative: Rewrite optimize_nops() some

On 08/02/2023 9:21 pm, Peter Zijlstra wrote:
> On Wed, Feb 08, 2023 at 10:08:12PM +0100, Peter Zijlstra wrote:
>> On Wed, Feb 08, 2023 at 09:44:04PM +0100, Peter Zijlstra wrote:
>>
>>> [   11.584069] SMP alternatives: ffffffff82000095: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.590068] SMP alternatives: ffffffff820001f3: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.720069] SMP alternatives: ffffffff8200189f: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.731069] SMP alternatives: ffffffff820019ae: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.738069] SMP alternatives: ffffffff82001a4a: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.746069] SMP alternatives: ffffffff82001b2d: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.766069] SMP alternatives: ffffffff82001d14: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.770069] SMP alternatives: ffffffff82001dd5: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>>> [   11.779069] SMP alternatives: ffffffff82001f35: [0:20) optimized NOPs: eb 12 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
>> UNTRAIN_RET -- specifically RESET_CALL_DEPTH
> 19:       48 c7 c0 80 00 00 00    mov    $0x80,%rax
> 20:       48 c1 e0 38             shl    $0x38,%rax
> 24:       65 48 89 04 25 00 00 00 00      mov    %rax,%gs:0x0     29: R_X86_64_32S        pcpu_hot+0x10
>
> Is ofc an atrocity.
>
> We can easily trim that by 5 bytes to:
>
> 0:   b0 80                   mov    $0x80,%al
> 2:   48 c1 e0 38             shl    $0x38,%rax
> 6:   65 48 89 04 25 00 00 00 00      mov    %rax,%gs:0x0
>
> Who cares about the top bytes, we're explicitly shifting them out
> anyway. But that's still 15 bytes or so.
>
> If it weren't for those pesky prefix penalties that would make exactly
> one instruction :-)

Yeah, but then you're taking a merge penalty instead.

Given that you can't reduce enough anyway, while only a 4 byte reduction
rather than 5, you're probably better off with:

0:   31 c0                   xor    %eax,%eax
2:   48 0f ba e8 3f          bts    $0x3f,%rax
7:   65 48 89 04 25 00 00 00 00      mov    %rax,%gs:0x0

because of the zeroing idiom splitting these 3 instructions away from
the previous operation on rax.

It's a shame that x86 doesn't have a mov $imm8, %d32 form, because
loading 1 into a register is an incredibly common operation to perform.

~Andrew