Message-ID: <CAHk-=wgMyXh3gGuSzj_Dgw=Gn_XPxGSTPq6Pz7dEyx6JNuAh9g@mail.gmail.com>
Date: Wed, 21 Jul 2021 11:45:36 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Nikolay Borisov <nborisov@...e.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH] lib/string: Bring optimized memcmp from glibc
On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov <nborisov@...e.com> wrote:
>
> I find it somewhat arbitrary that we choose to align the 2nd pointer and
> not the first.
Yeah, that's a bit odd, but I don't think it matters.
The hope is obviously that they are mutually aligned, and in that case
it doesn't matter which one you aim to align.
> So you are saying that the current memcmp could indeed use improvement
> but you don't want it to be based on the glibc's code due to the ugly
> misalignment handling?
Yeah. I suspect that this (very simple) patch gives you the same
performance improvement that the glibc code does.
NOTE! I'm not saying this patch is perfect. This one doesn't even
_try_ to do the mutual alignment, because it's really silly. But I'm
throwing this out here for discussion, because
- it's really simple
- I suspect it gets you 99% of the way there
- the code generation is actually quite good with both gcc and clang.
This is gcc:
memcmp:
	jmp	.L60
.L52:
	movq	(%rsi), %rax
	cmpq	%rax, (%rdi)
	jne	.L53
	addq	$8, %rdi
	addq	$8, %rsi
	subq	$8, %rdx
.L60:
	cmpq	$7, %rdx
	ja	.L52
	testq	%rdx, %rdx
	je	.L61
.L53:
	xorl	%ecx, %ecx
	jmp	.L56
.L62:
	addq	$1, %rcx
	cmpq	%rcx, %rdx
	je	.L51
.L56:
	movzbl	(%rdi,%rcx), %eax
	movzbl	(%rsi,%rcx), %r8d
	subl	%r8d, %eax
	je	.L62
.L51:
	ret
.L61:
	xorl	%eax, %eax
	ret
and notice how there are no spills, no extra garbage, just simple and
straightforward code.
Those things end up mattering too - it's good for I$, it's good for the
small cases, and it's good for debugging and reading the code.
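For readers following along without the attachment, here is a minimal user-space sketch of the kind of C that would compile to a loop shaped like the gcc output above: a word-at-a-time equality loop with no alignment handling, followed by a byte-by-byte tail that produces the return value. It is reconstructed from the assembly and is not the attached patch. The name memcmp_words is hypothetical, and memcpy() stands in here for whatever unaligned-load helper the kernel version would use.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration only: not the kernel's memcmp() and not the
 * attached patch. Assumes unaligned word loads are cheap on the target.
 */
static int memcmp_words(const void *cs, const void *ct, size_t count)
{
	const unsigned char *s1 = cs, *s2 = ct;
	int res = 0;

	/* Compare one word at a time for as long as the words are equal. */
	while (count >= sizeof(unsigned long)) {
		unsigned long a, b;

		/* memcpy() is the portable way to spell an unaligned load. */
		memcpy(&a, s1, sizeof(a));
		memcpy(&b, s2, sizeof(b));
		if (a != b)
			break;
		s1 += sizeof(a);
		s2 += sizeof(b);
		count -= sizeof(a);
	}

	/* Byte loop: handles the tail and locates the first differing byte. */
	for (; count > 0; s1++, s2++, count--) {
		res = *s1 - *s2;
		if (res != 0)
			break;
	}
	return res;
}
```

When the word compare fails, the byte loop rescans at most one word to find the differing byte, so the result does not depend on endianness.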
If this is "good enough" for your test-case, I really would prefer
something like this. "Make it as simple as possible, but no simpler".
I can do the mutual alignment too, but I'd actually prefer to do it as
a separate patch, for when there are numbers for that.
And I wouldn't do it as a byte-by-byte case, because that's just stupid.
I'd do it using a separate first single "get unaligned word from both
sources, compare them for equality, and then only add enough bytes to
align"
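One way to read that last idea, as a hedged user-space sketch rather than anything from the attached patch: compare the first word from both sources unaligned, and if they match, advance by only as many bytes as are needed to align the first pointer. Some bytes may get compared twice, which is harmless because they are already known to be equal. The names memcmp_align_first and load_word are hypothetical, and the choice to align the first pointer rather than the second is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable unaligned word load. */
static unsigned long load_word(const unsigned char *p)
{
	unsigned long w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical illustration only; not the attached patch. */
static int memcmp_align_first(const void *cs, const void *ct, size_t count)
{
	const unsigned char *s1 = cs, *s2 = ct;
	const size_t ws = sizeof(unsigned long);
	int res = 0;

	/* One unaligned compare up front instead of a byte-by-byte prologue. */
	if (count >= ws && load_word(s1) == load_word(s2)) {
		/*
		 * The first ws bytes are equal, so skipping 1..ws of them is
		 * safe. After the skip s1 is word-aligned, and s2 is too if
		 * the buffers are mutually aligned.
		 */
		size_t skip = ws - ((uintptr_t)s1 & (ws - 1));

		s1 += skip;
		s2 += skip;
		count -= skip;

		while (count >= ws && load_word(s1) == load_word(s2)) {
			s1 += ws;
			s2 += ws;
			count -= ws;
		}
	}

	/* Byte loop: handles the tail and locates the first differing byte. */
	for (; count > 0; s1++, s2++, count--) {
		res = *s1 - *s2;
		if (res != 0)
			break;
	}
	return res;
}
```

Since skip is never larger than one word and the prologue only runs when count is at least one word, count cannot underflow.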
Linus
View attachment "patch.diff" of type "text/x-patch" (975 bytes)