Message-ID: <CAHk-=whCygw44p30Pmf+Bt8=LVtmij3_XOxweEA3OQNruhMg+A@mail.gmail.com>
Date: Wed, 21 Jul 2021 13:27:56 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Sterba <dsterba@...e.cz>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Nikolay Borisov <nborisov@...e.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc
On Wed, Jul 21, 2021 at 1:13 PM David Sterba <dsterba@...e.cz> wrote:
>
> adding a memcmp_large that compares by native words or u64 could be
> the best option.
Yeah, we could just special-case that one place.
But see the patches I sent out - I think we can get the best of both worlds.
A small and simple memcmp() that is good enough and not the
_completely_ stupid thing we have now.
The second patch I sent out even gets the mutually aligned case right.
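For readers without the patches at hand, here is a minimal C sketch of the word-at-a-time idea. It assumes a target where unaligned loads are cheap. It is not the patch from this thread. The helper `load_word()` is a hypothetical userspace stand-in for the kernel's `get_unaligned()`, built on a fixed-size `memcpy()` that GCC and Clang usually lower to a single load on x86-64 and arm64. The sketch also skips the mutually aligned special case mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's get_unaligned(): a fixed-size
 * memcpy() that the compiler can turn into one (possibly unaligned) load. */
static inline unsigned long load_word(const unsigned char *p)
{
	unsigned long v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sketch of a small word-at-a-time memcmp(), not the actual patch. */
int simple_memcmp(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a = cs, *b = ct;

	/* Skip over equal words; the loads may be unaligned. */
	while (count >= sizeof(unsigned long)) {
		if (load_word(a) != load_word(b))
			break;	/* the byte loop below locates the difference */
		a += sizeof(unsigned long);
		b += sizeof(unsigned long);
		count -= sizeof(unsigned long);
	}

	/* Byte-by-byte tail, also used to find the first differing byte. */
	while (count--) {
		int res = *a++ - *b++;

		if (res)
			return res;
	}
	return 0;
}
```

Comparing whole words only for equality keeps the sketch endian-neutral: once a word differs, the byte loop finds the first differing byte and returns its sign, which is what memcmp() has to report.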
Of course, the glibc code also ended up unrolling things a bit, but
honestly, the way it did it was too disgusting for words.
And if it really turns out that the unrolling makes a big difference -
although I doubt it's meaningful with any modern core - I can add a
couple of lines to that simple patch I sent out to do that too.
Without getting the monster that is that glibc code.
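If unrolling did turn out to matter, the change could plausibly stay small. The sketch below is again hypothetical, with `load_ulong()` standing in for `get_unaligned()`. It checks two words per iteration and merges both differences with XOR and OR, so the hot loop has a single branch. It then falls back to a one-word loop and a byte loop to locate the first difference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's get_unaligned(). */
static inline unsigned long load_ulong(const unsigned char *p)
{
	unsigned long v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Sketch: the simple word loop with a 2x unrolled front end. */
int unrolled_memcmp(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a = cs, *b = ct;
	const size_t w = sizeof(unsigned long);

	/* Two words per iteration; any nonzero bit means a mismatch. */
	while (count >= 2 * w) {
		unsigned long d = (load_ulong(a) ^ load_ulong(b)) |
				  (load_ulong(a + w) ^ load_ulong(b + w));
		if (d)
			break;
		a += 2 * w;
		b += 2 * w;
		count -= 2 * w;
	}

	/* One word at a time, for leftovers or to narrow down a mismatch. */
	while (count >= w) {
		if (load_ulong(a) != load_ulong(b))
			break;
		a += w;
		b += w;
		count -= w;
	}

	/* Bytes: find the first difference and return its sign. */
	while (count--) {
		int res = *a++ - *b++;

		if (res)
			return res;
	}
	return 0;
}
```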
Of course, my patch depends on the fact that "get_unaligned()" is
cheap on all CPUs that really matter, and that caches aren't
direct-mapped any more. The glibc code seems to be written for a world
where registers are cheap, unaligned accesses are prohibitively
expensive, and unrolling helps because L1 caches are direct-mapped and
you really want to do chunking to not get silly way conflicts.
If old-style Sparc or MIPS was our primary target, that would be one
thing. But it really isn't.
Linus