linux-kernel - Re: [PATCH v2 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFyvGkuM33C8ki=5-11idWWK4tHKuPaSrSb0FPTaJmC_iQ@mail.gmail.com>
Date:   Mon, 5 Feb 2018 13:58:56 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andi Kleen <ak@...ux.intel.com>, X86 ML <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v2 1/3] x86/entry: Clear extra registers beyond syscall
 arguments for 64bit kernels

On Mon, Feb 5, 2018 at 1:33 PM, Dan Williams <dan.j.williams@...el.com> wrote:
>
> On a suggestion from Arjan it also appears worthwhile to interleave
> 'mov' with 'xor'. Perf stat says that this test gets 3.45 instructions
> per cycle:

Ugh.

A "xor %reg/reg" is two bytes (three for the high regs due to REX
prefix). A "mov $0" is 7 bytes because unlike most of the ALU ops,
"mov" doesn't have a 8-bit expanding immediate.

So replacing those xors with movq's will add at least four bytes per
replacement.  So you may well end up adding an L1 cache miss.

At which point "3.45 ipc" vs "2.88 ipc" is pretty much a non-issue.

I suspect that a bigger win would be if you try to interleave those
"xor" instructions with the "pushq" instructions in the entry code.
Because those push instructions tend to be limited by the LSU store
bandwidth, so you can probably put in xor instructions almost for free
in there.

                   Linus