Message-ID: <a12179a9f31b4b05b7430d5ba743d615@AcuMS.aculab.com>
Date:   Thu, 9 Feb 2023 22:27:05 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     "'Andrew.Cooper3@...rix.com'" <Andrew.Cooper3@...rix.com>,
        Peter Zijlstra <peterz@...radead.org>
CC:     "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mhiramat@...nel.org" <mhiramat@...nel.org>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        "jpoimboe@...hat.com" <jpoimboe@...hat.com>
Subject: RE: [PATCH v3 3/4] x86/alternative: Rewrite optimize_nops() some

From: Andrew.Cooper3@...rix.com
> Sent: 09 February 2023 01:11
...
> >> UNTRAIN_RET -- specifically RESET_CALL_DEPTH
> > 19:       48 c7 c0 80 00 00 00    mov    $0x80,%rax
> > 20:       48 c1 e0 38             shl    $0x38,%rax
> > 24:       65 48 89 04 25 00 00 00 00      mov    %rax,%gs:0x0     29: R_X86_64_32S pcpu_hot+0x10
> >
> > Is ofc an atrocity.
> >
> > We can easily trim that by 5 bytes to:
> >
> > 0:   b0 80                   mov    $0x80,%al
> > 2:   48 c1 e0 38             shl    $0x38,%rax
> > 6:   65 48 89 04 25 00 00 00 00      mov    %rax,%gs:0x0
> >
> > Who cares about the top bytes, we're explicitly shifting them out
> > anyway. But that's still 15 bytes or so.
> >
> > If it weren't for those pesky prefix penalties that would make exactly
> > one instruction :-)
> 
> Yeah, but then you're taking a merge penalty instead.
> 
> Given that you can't reduce enough anyway, while only a 4 byte reduction
> rather than 5, you're probably better off with:
> 
> 0:   31 c0                   xor    %eax,%eax
> 2:   48 0f ba e8 3f          bts    $0x3f,%rax
> 7:   65 48 89 04 25 00 00 00 00      mov    %rax,%gs:0x0
> 
> because of the zeroing idiom splitting these 3 instructions away from
> the previous operation on rax.

How about:
		31 c0     xor %eax,%eax
		f9        stc
		48 d1 d8  rcr $1,%rax
So 6 bytes total.
But that might be a partial dependency on flags.
(Although that isn't any worse than the xor.)
It is also a longer dependency chain - so the execution time
rather depends on what else is going on.
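
As a sanity check on the sequences discussed in this thread, here is a minimal
sketch in Python that models each one on a 64-bit register and confirms they
all leave 0x8000000000000000 in %rax. The function names and the SIZES table
are made up for illustration; the byte counts are taken from the encodings
quoted above and exclude the common 9-byte store to %gs. It only models the
architectural result, and says nothing about merge penalties or flag
dependencies.

```python
MASK64 = (1 << 64) - 1  # model %rax as a 64-bit value

def mov_shl():
    # mov $0x80,%rax ; shl $0x38,%rax
    rax = 0x80
    return (rax << 0x38) & MASK64

def movb_shl(stale_rax):
    # mov $0x80,%al ; shl $0x38,%rax
    # Only the low byte is written; the stale upper bits are shifted out.
    rax = (stale_rax & ~0xFF & MASK64) | 0x80
    return (rax << 0x38) & MASK64

def xor_bts():
    # xor %eax,%eax ; bts $0x3f,%rax
    rax = 0
    return rax | (1 << 0x3F)

def xor_stc_rcr():
    # xor %eax,%eax ; stc ; rcr $1,%rax
    rax = 0   # xor zeroes the register (and clears CF)
    cf = 1    # stc sets the carry flag
    # rcr $1 rotates right through carry: old CF lands in bit 63
    return (rax >> 1) | (cf << 63)

# Bytes needed to build the constant, excluding the store
SIZES = {
    "mov imm32 + shl": 7 + 4,
    "mov imm8 + shl": 2 + 4,
    "xor + bts": 2 + 5,
    "xor + stc + rcr": 2 + 1 + 3,
}

if __name__ == "__main__":
    results = {
        "mov imm32 + shl": mov_shl(),
        "mov imm8 + shl": movb_shl(0xDEADBEEFCAFEF00D),
        "xor + bts": xor_bts(),
        "xor + stc + rcr": xor_stc_rcr(),
    }
    for name, value in results.items():
        print(f"{name:18s} {SIZES[name]:2d} bytes  rax={value:#018x}")
```

The stale value passed to movb_shl() is arbitrary; any input gives the same
result because only the low byte survives the shift by 0x38.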

	David
