linux-kernel - Re: [PATCH] lib/string: Bring optimized memcmp from glibc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whqJKKc9wUacLEkvTzXYfYOUDt=kHKX6Fa8Kb4kQftbbQ@mail.gmail.com>
Date:   Wed, 21 Jul 2021 11:00:59 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Nikolay Borisov <nborisov@...e.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc

On Wed, Jul 21, 2021 at 6:59 AM Nikolay Borisov <nborisov@...e.com> wrote:
>
> This is glibc's memcmp version. The upside is that for architectures
> which don't have an optimized version the kernel can provide some
> solace in the form of a generic, word-sized optimized memcmp. I tested
> this with a heavy IOCTL_FIDEDUPERANGE(2) workload and here are the
> results I got:

Hmm. I suspect the usual kernel use of memcmp() is _very_ skewed to
very small memcmp calls, and I don't think I've ever seen that
(horribly bad) byte-wise default memcmp in most profiles.

I suspect that FIDEDUPERANGE thing is most likely a very special case.

So I don't think you're wrong to look at this, but I think you've gone
from our old "spend no effort at all" to "look at one special case".

And I think the glibc implementation is horrible and doesn't know
about machines where unaligned loads are cheap - which is all
reasonable ones.

That MERGE() macro is disgusting, and memcmp_not_common_alignment()
should not exist on any sane architecture. It's literally doing extra
work to make for slower accesses, when the hardware does it better
natively.

So honestly, I'd much rather see a much saner and simpler
implementation that works well on the architectures that matter, and
that don't want that "align things by hand".

Aligning one of the sources by hand is fine and makes sense - so that
_if_ the two strings end up being mutually aligned, all subsequent
accesses are aligned.

 But then trying to do shift-and-masking for the possibly remaining
unaligned source is crazy and garbage. Don't do it.

And you never saw that, because your special FIDEDUPERANGE testcase
will never have anything but mutually aligned cases.

Which just shows that going from "don't care at all' to "care about
one special case" is not the way to go.

So I'd much rather see a simple default function that works well for
the sane architectures, than go with the default code from glibc - and
bad for the common modern architectures.

Then architectures could choose that one with some

        select GENERIC_MEMCMP

the same way we have

        select GENERIC_STRNCPY_FROM_USER

for the (sane, for normal architectures) common optimized case for a
special string instruction that matters a lot for the kernel.

                     Linus