Date:   Thu, 9 Sep 2021 14:28:42 -0700
From:   "H. Peter Anvin" <hpa@...or.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in
 static_cpu_has()

On 9/9/21 10:01 AM, Borislav Petkov wrote:
> On Wed, Sep 08, 2021 at 10:17:16AM -0700, H. Peter Anvin (Intel) wrote:
> 
>> Subject: Re: [PATCH] x86/asm: pessimize the pre-initialization case in static_cpu_has()
> 
> "pessimize" huh? :)
> 
> Why not simply
> 
> "Do not waste registers in the pre-initialization case..."
> 

Because it is shorter, and thus leaves room for more content.

> ?
> 
>> gcc will sometimes manifest the address of boot_cpu_data in a register
>> as part of constant propagation. When multiple static_cpu_has() are
>> used this may foul the mainline code with a register load which will
>> only be used on the fallback path, which is unused after
>> initialization.
> 
> So a before-after thing looks like this here:
> 
> before:
> 
> ffffffff89696517 <.altinstr_aux>:
> ffffffff89696517:       f6 05 cb 09 cb ff 80    testb  $0x80,-0x34f635(%rip)        # ffffffff89346ee9 <boot_cpu_data+0x69>
> ffffffff8969651e:       0f 85 fc 3e fb ff       jne    ffffffff8964a420 <intel_pmu_init+0x14e7>
> ffffffff89696524:       e9 ee 3e fb ff          jmp    ffffffff8964a417 <intel_pmu_init+0x14de>
> ffffffff89696529:       f6 45 6a 08             testb  $0x8,0x6a(%rbp)
> ffffffff8969652d:       0f 85 45 b9 97 f7       jne    ffffffff81011e78 <intel_pmu_lbr_filter+0x68>
> ffffffff89696533:       e9 95 b9 97 f7          jmp    ffffffff81011ecd <intel_pmu_lbr_filter+0xbd>
> ffffffff89696538:       41 f6 44 24 6a 08       testb  $0x8,0x6a(%r12)
> ffffffff8969653e:       0f 85 d3 bc 97 f7       jne    ffffffff81012217 <intel_pmu_store_lbr+0x77>
> ffffffff89696544:       e9 d9 bc 97 f7          jmp    ffffffff81012222 <intel_pmu_store_lbr+0x82>
> ffffffff89696549:       41 f6 44 24 6a 08       testb  $0x8,0x6a(%r12)
> 
> after:
> 
> ffffffff89696517 <.altinstr_aux>:
> ffffffff89696517:       f6 04 25 e9 6e 34 89    testb  $0x80,0xffffffff89346ee9
> ffffffff8969651e:       80
> ffffffff8969651f:       0f 85 fb 3e fb ff       jne    ffffffff8964a420 <intel_pmu_init+0x14e7>
> ffffffff89696525:       e9 ed 3e fb ff          jmp    ffffffff8964a417 <intel_pmu_init+0x14de>
> ffffffff8969652a:       f6 04 25 ea 6e 34 89    testb  $0x8,0xffffffff89346eea
> ffffffff89696531:       08
> ffffffff89696532:       0f 85 37 b9 97 f7       jne    ffffffff81011e6f <intel_pmu_lbr_filter+0x5f>
> ffffffff89696538:       e9 89 b9 97 f7          jmp    ffffffff81011ec6 <intel_pmu_lbr_filter+0xb6>
> ffffffff8969653d:       f6 04 25 ea 6e 34 89    testb  $0x8,0xffffffff89346eea
> ffffffff89696544:       08
> ffffffff89696545:       0f 85 b5 bc 97 f7       jne    ffffffff81012200 <intel_pmu_store_lbr+0x70>
> ffffffff8969654b:       e9 bb bc 97 f7          jmp    ffffffff8101220b <intel_pmu_store_lbr+0x7b>
> ffffffff89696550:       f6 04 25 ea 6e 34 89    testb  $0x8,0xffffffff89346eea
> 
> so you're basically forcing an immediate thing.
> 
> And you wanna get rid of the (%<reg>) relative addressing and force it
> to be rip-relative.
> 
>> Explicitly force gcc to use immediate (rip-relative) addressing for
> 
> Right, the rip-relative addressing doesn't happen here:
> 

Indeed it doesn't (egg on my face), nor, it turns out, is there
currently a way to do so (just adding (%%rip) breaks i386, and there is
no equivalent to %{pP} which adds the suffix). Let me fix both; will
have a patchset shortly.

	-hpa
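
For reference, the address arithmetic behind the two encodings in the
listings quoted above can be checked with a small C sketch. It is only an
illustration: the helper names (sext32, rip_relative_target,
absolute_disp32_target) are made up here and are not part of the kernel
or of the patch. The inputs are meant to be the addresses and
displacement bytes of the first testb in each listing.

```c
#include <stdint.h>

/* Sign-extend a 32-bit displacement to 64 bits without relying on
 * implementation-defined signed conversions. */
static uint64_t sext32(uint32_t v)
{
	return (v & 0x80000000u) ? (v | 0xffffffff00000000ull) : v;
}

/*
 * RIP-relative operand (ModRM 0x05): the signed 32-bit displacement is
 * added to the address of the *next* instruction, i.e. the address of
 * this instruction plus its length.
 */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned int insn_len,
				    uint32_t disp32)
{
	return insn_addr + insn_len + sext32(disp32);
}

/*
 * Absolute operand in 64-bit mode (ModRM 0x04, SIB 0x25): there is no
 * base register, so the target is just the sign-extended displacement.
 */
static uint64_t absolute_disp32_target(uint32_t disp32)
{
	return sext32(disp32);
}
```

Fed the first "before" instruction (address 0xffffffff89696517, length 7,
displacement bytes cb 09 cb ff), rip_relative_target() gives
0xffffffff89346ee9, matching objdump's boot_cpu_data+0x69 annotation. The
"after" form reaches the same address through the sign-extended absolute
displacement e9 6e 34 89, at the cost of one extra SIB byte (8 bytes
instead of 7).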
