Message-ID: <c31eb59df5f8426aaf0009ab15587cee@AcuMS.aculab.com>
Date:   Fri, 23 Jul 2021 14:02:18 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Linus Torvalds' <torvalds@...ux-foundation.org>,
        Nikolay Borisov <nborisov@...e.com>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>
Subject: RE: [PATCH] lib/string: Bring optimized memcmp from glibc

From: Linus Torvalds
> Sent: 21 July 2021 19:46
> 
> On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov <nborisov@...e.com> wrote:
> >
> > I find it somewhat arbitrary that we choose to align the 2nd pointer and
> > not the first.
> 
> Yeah, that's a bit odd, but I don't think it matters.
> 
> The hope is obviously that they are mutually aligned, and in that case
> it doesn't matter which one you aim to align.
> 
> > So you are saying that the current memcmp could indeed use improvement
> > but you don't want it to be based on the glibc's code due to the ugly
> > misalignment handling?
> 
> Yeah. I suspect that this (very simple) patch gives you the same
> performance improvement that the glibc code does.
> 
> NOTE! I'm not saying this patch is perfect. This one doesn't even
> _try_ to do the mutual alignment, because it's really silly. But I'm
> throwing this out here for discussion, because
> 
>  - it's really simple
> 
>  - I suspect it gets you 99% of the way there
> 
>  - the code generation is actually quite good with both gcc and clang.
> This is gcc:
> 
>         memcmp:
>                 jmp     .L60
>         .L52:
>                 movq    (%rsi), %rax
>                 cmpq    %rax, (%rdi)
>                 jne     .L53
>                 addq    $8, %rdi
>                 addq    $8, %rsi
>                 subq    $8, %rdx
>         .L60:
>                 cmpq    $7, %rdx
>                 ja      .L52

I wonder how fast that can be made to run.
I think the two conditional branches have to run in separate clocks.
So you may get all 5 arithmetic operations to run in the same 2 clocks.
But that may be pushing things on everything except the very latest cpu.
The memory reads aren't limiting at all, the cpu can do two per clock.
So even though (IIRC) misaligned ones cost an extra clock it doesn't matter.

That looks like a *dst++ = *src++ loop.
The array copy dst[i] = src[i]; i++ requires one less 'addq'
provided the cpu has 'register + register' addressing.
Not decrementing the length also saves the 'subq'.
So the loop:
	for (i = 0; i < length - 7; i += 8)
		dst[i] = src[i];  /* Hacked to be right in C */
probably only has one addq and one cmpq per iteration.
That is much more likely to run in the 2 clocks.
(If you can persuade gcc not to transform it!)
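
As a rough illustration of the single-index form applied to the compare
discussed in this thread, here is a minimal sketch. The helper name
match_words_indexed and its return convention (count of leading bytes that
match, a multiple of 8) are assumptions for illustration, not code from the
patch. It uses "i + 8 <= len" rather than "len - 7" so that an unsigned
length below 7 cannot wrap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns how many leading bytes (a multiple of 8)
 * compare equal when read as 64-bit words. A single index 'i' addresses
 * both buffers, so only one value is advanced per iteration.
 */
static size_t match_words_indexed(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	size_t i;

	for (i = 0; i + 8 <= len; i += 8) {
		uint64_t wa, wb;

		/* memcpy keeps the loads well-defined for any alignment */
		memcpy(&wa, pa + i, sizeof(wa));
		memcpy(&wb, pb + i, sizeof(wb));
		if (wa != wb)
			break;
	}
	return i;
}
```

Whether the compiler keeps the indexed addressing or rewrites the loop back
into two incremented pointers depends on the compiler and options.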

It may also be possible to remove the cmpq by arranging
that the flags from the addq contain the right condition.
That needs something like:
	dst += len; src += len; len = -len;
	do
		dst[len] = src[len];
	while ((len += 8) < 0);
That probably isn't necessary for x86, but is likely to help sparc.
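
The negative-index trick above could be written out in C roughly as
follows, again for the compare case. The name match_words_negidx and the
precondition (len is a non-zero multiple of 8) are assumptions made to keep
the sketch short; a real routine would need a tail loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: both pointers are moved to the end of their
 * buffers and a negative index counts up towards zero, so the loop
 * condition can be taken from the result of the add itself.
 * Returns the number of leading bytes that compare equal.
 */
static size_t match_words_negidx(const void *a, const void *b, size_t len)
{
	const unsigned char *ea = (const unsigned char *)a + len;
	const unsigned char *eb = (const unsigned char *)b + len;
	ptrdiff_t i = -(ptrdiff_t)len;

	/* Assumed precondition: the do/while body always runs once. */
	assert(len != 0 && len % 8 == 0);

	do {
		uint64_t wa, wb;

		memcpy(&wa, ea + i, sizeof(wa));
		memcpy(&wb, eb + i, sizeof(wb));
		if (wa != wb)
			break;
	} while ((i += 8) < 0);

	/* i is in [-len, 0], so len + i is the offset reached. */
	return (size_t)((ptrdiff_t)len + i);
}
```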

For mips-like cpu (with 'compare and jump', only 'reg + constant'
addressing) you really want a loop like:
	dst_end = dst + length;
	do
		*dst++ = *src++;
	while (dst < dst_end);
This has two adds and a compare per iteration.
That might be a good compromise for aligned copies.
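
For comparison, a sketch of the end-pointer form as a compare loop. The
name match_words_endptr and the precondition (len is a non-zero multiple
of 8) are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: two pointer increments plus one pointer compare
 * against a precomputed end, with no index register or length counter.
 * Returns the number of leading bytes that compare equal.
 */
static size_t match_words_endptr(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	const unsigned char *pa_end = pa + len;

	/* Assumed precondition: the do/while body always runs once. */
	assert(len != 0 && len % 8 == 0);

	do {
		uint64_t wa, wb;

		memcpy(&wa, pa, sizeof(wa));
		memcpy(&wb, pb, sizeof(wb));
		if (wa != wb)
			break;
		pa += 8;
		pb += 8;
	} while (pa < pa_end);

	return (size_t)(pa - (const unsigned char *)a);
}
```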

I'm not at all sure it is ever worth aligning either pointer
if misaligned reads don't fault.
Most compares (of any size) will be aligned.
So you get the 'hit' of the test when it cannot help.
That almost certainly exceeds any benefit.
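
To make the "no alignment prologue" idea concrete, here is a minimal
sketch of a complete compare that never tests or adjusts pointer alignment.
It is not the patch from this thread; the name memcmp_noalign is
hypothetical, and it assumes unaligned 64-bit loads are cheap on the target
(memcpy is used so the C stays well-defined either way).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical memcmp-like routine with no alignment handling:
 * compare 8 bytes at a time, then finish (or locate the differing
 * byte) with a plain byte loop.
 */
static int memcmp_noalign(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;

	while (len >= 8) {
		uint64_t wa, wb;

		memcpy(&wa, pa, sizeof(wa));
		memcpy(&wb, pb, sizeof(wb));
		if (wa != wb)
			break;	/* the byte loop below finds the difference */
		pa += 8;
		pb += 8;
		len -= 8;
	}

	while (len--) {
		if (*pa != *pb)
			return *pa < *pb ? -1 : 1;
		pa++;
		pb++;
	}
	return 0;
}
```

Falling back to the byte loop on a mismatch avoids any dependence on
endianness when deciding which buffer sorts first.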

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
