Message-ID: <CA+55aFxEwWespnh+=pmdUR6WkBresxde9eoXm2nHVUEw6ZzsyQ@mail.gmail.com>
Date:   Tue, 5 Jun 2018 16:27:24 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Alexey Dobriyan <adobriyan@...il.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Peter Anvin <hpa@...or.com>,
        Denys Vlasenko <dvlasenk@...hat.com>
Subject: Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL]
 x86/asm changes for v4.18")

On Tue, Jun 5, 2018 at 4:20 PM Alexey Dobriyan <adobriyan@...il.com> wrote:
>
> This is Broadwell Xeon E5-2620 v4.
> Which is somewhat strange indeed because it should be modern enough.

Yeah, odd.

Here's the benchmark I used:

  #define SIZE 4068

  int main(int argc, char **argv)
  {
    int i;
    unsigned char buffer[SIZE], *p;

    for (i = 0; i < 1000000; i++)
        asm volatile(
            "1: movq %[zero],(%[mem]); addq %[eight],%[mem]; decl %[count]; jne 1b"
            : [mem] "=r" (p)
            : [zero] "i" (0l), [eight] "i" (8l),
             "0" (buffer), [count] "r" (SIZE/8));
  }

where you can change that "i" for [zero] and [eight] to be "r" to get
the register version.
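
To make the "i" versus "r" swap concrete, here is a minimal sketch of both variants side by side. It assumes x86-64 and GCC or Clang extended asm; the function names clear_imm and clear_reg are made up for illustration and are not from the message above. The constraints are also stricter than in the benchmark as posted: the pointer and the counter are declared read-write ("+r") because the loop modifies both, and "memory" and "cc" clobbers are added so the compiler knows the stores and the flag changes happen.

```c
/* Sketch only: x86-64, GCC or Clang extended asm, AT&T syntax. */
#define SIZE 4068

/* Immediate variant: $0 and $8 are encoded in the instructions ("i"). */
void clear_imm(unsigned char *buf)
{
    unsigned char *p = buf;
    int count = SIZE / 8;   /* 508 eight-byte stores = 4064 bytes */

    __asm__ volatile(
        "1: movq %[zero],(%[mem]); addq %[eight],%[mem]; decl %[count]; jne 1b"
        : [mem] "+r" (p), [count] "+r" (count)
        : [zero] "i" (0l), [eight] "i" (8l)
        : "memory", "cc");
}

/* Register variant: the same loop, with zero and the stride held in
 * registers ("r") instead of being encoded as immediates. */
void clear_reg(unsigned char *buf)
{
    unsigned char *p = buf;
    int count = SIZE / 8;

    __asm__ volatile(
        "1: movq %[zero],(%[mem]); addq %[eight],%[mem]; decl %[count]; jne 1b"
        : [mem] "+r" (p), [count] "+r" (count)
        : [zero] "r" (0l), [eight] "r" (8l)
        : "memory", "cc");
}
```

Calling either function 1000000 times from main() on a SIZE-byte buffer and running the binary under time(1) gives the same kind of rough comparison as the benchmark above. Note that SIZE/8 truncates, so the last 4 bytes of the buffer are never written.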

I just timed it, because I'm lazy and perf seemed to be overkill.

It might be some very specific loop buffer issue or something.

Or maybe my benchmark above is broken, I didn't really verify that the
end result was any good (I just did an objdump to verify the asm code
superficially).

                 Linus
