linux-kernel - Re: [PATCH] lib/string: Bring optimized memcmp from glibc


Message-ID: <792949a2-d987-f6a0-a153-8c5fe1e3a073@suse.com>
Date:   Thu, 22 Jul 2021 08:54:14 +0300
From:   Nikolay Borisov <nborisov@...e.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        David Sterba <dsterba@...e.cz>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc



On 21.07.21 at 23:27, Linus Torvalds wrote:
> On Wed, Jul 21, 2021 at 1:13 PM David Sterba <dsterba@...e.cz> wrote:
>>
>> adding a memcmp_large that compares by native words or u64 could be
>> the best option.
> 
> Yeah, we could just special-case that one place.

This whole thread started because I first implemented a special case just
for dedupe, and Dave Chinner suggested that, instead of playing
whack-a-mole, we get something decent for the generic memcmp so that the
whole kernel benefits.

> 
> But see the patches I sent out - I think we can get the best of both worlds.
> 
> A small and simple memcmp() that is good enough and not the
> _completely_ stupid thing we have now.
> 
> The second patch I sent out even gets the mutually aligned case right.
> 
> Of course, the glibc code also ended up unrolling things a bit, but
> honestly, the way it did it was too disgusting for words.
> 
> And if it really turns out that the unrolling makes a big difference -
> although I doubt it's meaningful with any modern core - I can add a
> couple of lines to that simple patch I sent out to do that too.
> Without getting the monster that is that glibc code.
> 
> Of course, my patch depends on the fact that "get_unaligned()" is
> cheap on all CPU's that really matter, and that caches aren't
> direct-mapped any more. The glibc code seems to be written for a world
> where registers are cheap, unaligned accesses are prohibitively
> expensive, and unrolling helps because L1 caches are direct-mapped and
> you really want to do chunking to not get silly way conflicts.
> 
> If old-style Sparc or MIPS was our primary target, that would be one
> thing. But it really isn't.
> 
>               Linus
>
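
For readers following along, here is a rough userspace sketch of the kind
of "small and simple" word-at-a-time memcmp being discussed. It is not the
patch Linus sent. The names memcmp_wordwise() and load_word() are
hypothetical, and memcpy() stands in for the kernel's get_unaligned() on
the assumption that unaligned loads are cheap on the target CPU.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: load one native word from a possibly unaligned
 * address. memcpy() stands in for the kernel's get_unaligned() here.
 */
static inline unsigned long load_word(const unsigned char *p)
{
	unsigned long w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical name; an illustration only, not the kernel's memcmp(). */
static int memcmp_wordwise(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a, *pb = b;

	/* Skip over equal data one native word at a time. */
	while (n >= sizeof(unsigned long)) {
		if (load_word(pa) != load_word(pb))
			break;	/* the first differing byte is in this word */
		pa += sizeof(unsigned long);
		pb += sizeof(unsigned long);
		n -= sizeof(unsigned long);
	}

	/*
	 * The byte loop locates the first difference inside the word, or
	 * handles the tail. Resolving the result bytewise keeps the sign
	 * of the return value independent of endianness.
	 */
	while (n--) {
		int d = *pa++ - *pb++;

		if (d)
			return d;
	}
	return 0;
}
```

The sketch does no unrolling and no alignment handling, so it only shows
the basic idea of comparing by words and falling back to bytes.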