linux-kernel - Re: [PATCH] lib/string: Bring optimized memcmp from glibc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wgMyXh3gGuSzj_Dgw=Gn_XPxGSTPq6Pz7dEyx6JNuAh9g@mail.gmail.com>
Date:   Wed, 21 Jul 2021 11:45:36 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nikolay Borisov <nborisov@...e.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc

On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov <nborisov@...e.com> wrote:
>
> I find it somewhat arbitrary that we choose to align the 2nd pointer and
> not the first.

Yeah, that's a bit odd, but I don't think it matters.

The hope is obviously that they are mutually aligned, and in that case
it doesn't matter which one you aim to align.

> So you are saying that the current memcmp could indeed use improvement
> but you don't want it to be based on the glibc's code due to the ugly
> misalignment handling?

Yeah. I suspect that this (very simple) patch gives you the same
performance improvement that the glibc code does.

NOTE! I'm not saying this patch is perfect. This one doesn't even
_try_ to do the mutual alignment, because it's really silly. But I'm
throwing this out here for discussion, because

 - it's really simple

 - I suspect it gets you 99% of the way there

 - the code generation is actually quite good with both gcc and clang.
This is gcc:

        memcmp:
                jmp     .L60
        .L52:
                movq    (%rsi), %rax
                cmpq    %rax, (%rdi)
                jne     .L53
                addq    $8, %rdi
                addq    $8, %rsi
                subq    $8, %rdx
        .L60:
                cmpq    $7, %rdx
                ja      .L52
                testq   %rdx, %rdx
                je      .L61
        .L53:
                xorl    %ecx, %ecx
                jmp     .L56
        .L62:
                addq    $1, %rcx
                cmpq    %rcx, %rdx
                je      .L51
        .L56:
                movzbl  (%rdi,%rcx), %eax
                movzbl  (%rsi,%rcx), %r8d
                subl    %r8d, %eax
                je      .L62
        .L51:
                ret
        .L61:
                xorl    %eax, %eax
                ret

and notice how there are no spills, no extra garbage, just simple and
straightforward code.

Those things ends mattering too - it's good for I$, it's good for the
small cases, and it's good for debugging and reading the code.

If this is "good enough" for your test-case, I really would prefer
something like this. "Make it as simple as possible, but no simpler"

I can do the mutual alignment too, but I'd actually prefer to do it as
a separate patch, for when there are numbers for that.

And I wouldn't do it as a byte-by-byte case, because that's just stupid.

I'd do it using a separate first single "get unaligned word from both
sources, compare them for equality, and then only add enough bytes to
align"

                  Linus

View attachment "patch.diff" of type "text/x-patch" (975 bytes)