lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <AANLkTimeWSEUU6EYa4yWY11OyAVQqNu5eoBZc5ddqHQL@mail.gmail.com>
Date:	Mon, 13 Dec 2010 19:25:05 +1100
From:	Nick Piggin <npiggin@...il.com>
To:	"J. R. Okajima" <hooanon05@...oo.co.jp>
Cc:	Nick Piggin <npiggin@...nel.dk>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-arch@...r.kernel.org, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp

On Mon, Dec 13, 2010 at 6:29 PM, J. R. Okajima <hooanon05@...oo.co.jp> wrote:
>
> Nick Piggin:
>> It's not scaling but just single-threaded performance. gcc turns memcmp
>> into rep cmp, which has quite a long latency, so it's not appropriate
>> for short strings.
>
> Honestly, I doubt how effective this 'long *' approach would be
> (of course, that does not mean I doubt your result with 'char *').
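For reference, the 'char *' approach amounts to replacing the memcmp call with an explicit byte loop, so gcc has no memcmp call to expand into rep cmp. A minimal sketch follows; the name name_cmp_bytes is hypothetical, and it only reports equal or not equal, on the assumption that a name lookup needs no ordering.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compare two short names one byte at a time
 * instead of calling memcmp(). Returns 0 if equal, nonzero otherwise
 * (no ordering information).
 */
static int name_cmp_bytes(const unsigned char *cs, size_t scount,
			  const unsigned char *ct, size_t tcount)
{
	if (scount != tcount)
		return 1;		/* different lengths can never match */
	while (tcount--) {
		if (*cs++ != *ct++)
			return 1;	/* first mismatching byte ends the scan */
	}
	return 0;
}
```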

Well, let's see what turns up. We can certainly try the long *
approach. I suspect that on architectures where byte loads are
very slow, gcc will turn the byte loop into larger loads, so it
should be no worse than a normal memcmp call. But if we pad the
names explicitly, we can avoid all the problems associated with
handling the tail bytes.
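A rough sketch of the explicit-padding idea, under assumptions of my own: each name is copied into a zero-filled array of unsigned long (struct padded_name, NAME_MAX_WORDS and both helpers are hypothetical, not existing kernel code), the lengths are compared first, and the loop then compares whole words. Because the bytes past the end of the name are zero in both buffers, the last partial word compares correctly without a separate tail loop.

```c
#include <stddef.h>
#include <string.h>

#define NAME_MAX_WORDS 32	/* hypothetical cap: 256 bytes with 64-bit longs */

struct padded_name {
	size_t len;				/* name length in bytes */
	unsigned long w[NAME_MAX_WORDS];	/* name bytes, zero-padded */
};

/* Copy a name into a zero-filled word buffer; returns -1 if it does not fit. */
static int padded_name_init(struct padded_name *pn, const char *s, size_t len)
{
	if (len > sizeof(pn->w))
		return -1;
	memset(pn->w, 0, sizeof(pn->w));
	memcpy(pn->w, s, len);
	pn->len = len;
	return 0;
}

/*
 * Compare whole words. Equal lengths mean both names span the same
 * number of words, and the zero padding makes the final word safe.
 */
static int padded_name_cmp(const struct padded_name *a,
			   const struct padded_name *b)
{
	size_t i, words;

	if (a->len != b->len)
		return 1;
	words = (a->len + sizeof(unsigned long) - 1) / sizeof(unsigned long);
	for (i = 0; i < words; i++)
		if (a->w[i] != b->w[i])
			return 1;
	return 0;
}
```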

Padding the names and comparing them as longs would be practically
free (because the slab allocator rounds objects up to sizeof(long long)
anyway), so if any architecture prefers to do long loads, I'd be
interested to hear, and we could whip up a patch.
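The "practically free" argument is rounding arithmetic. The sketch below assumes a simplified allocator that rounds every object up to sizeof(long long), as described above; align_up() mirrors the kernel's ALIGN() for power-of-two alignments, and padding_overhead() is a hypothetical helper. Because sizeof(long) divides sizeof(long long) on the usual ABIs (ILP32, LP64, LLP64), padding a name to whole longs never changes the rounded allocation size.

```c
#include <stddef.h>

/* Round n up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) & ~(a - 1);
}

/*
 * Extra bytes allocated once a name of len bytes is padded to whole
 * longs, assuming the allocator already rounds objects up to
 * sizeof(long long). Expected to be 0 whenever sizeof(long) divides
 * sizeof(long long).
 */
static size_t padding_overhead(size_t len)
{
	size_t unpadded = align_up(len, sizeof(long long));
	size_t padded = align_up(align_up(len, sizeof(long)), sizeof(long long));

	return padded - unpadded;
}
```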

> But is the "rep cmp has quite a long latency" issue common to all x86
> CPUs, or specific to Westmere systems?

I don't believe it is Westmere-specific. Intel and AMD have
been improving these instructions over the past few years, so
Westmere is probably as good as or better than any.

That said, rep cmp may not be as heavily optimized as the
set and copy string instructions.

In short, I think the change should be suitable for all x86 CPUs,
but I would like to hear more opinions or see numbers for other
cores.
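For anyone wanting to produce numbers on other cores, a crude user-space microbenchmark might look like the following. It is a sketch, not kernel code: eq_rep_cmpsb() uses inline asm to issue repe cmpsb directly, approximating (as an assumption) the kind of sequence gcc emits when it expands memcmp inline as rep cmp, while eq_byte_loop() is the plain byte loop. time_eq() and all function names here are hypothetical. On non-x86 targets the asm path falls back to a byte loop, so the comparison is only meaningful on x86.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Equality test using the x86 repe cmpsb string instruction. */
static int eq_rep_cmpsb(const unsigned char *a, const unsigned char *b, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned char eq;

	if (n == 0)
		return 1;	/* rep with a zero count leaves the flags untouched */
	/* The ABI guarantees the direction flag is clear, so the scan runs forward. */
	__asm__ volatile("repe cmpsb\n\tsete %0"
			 : "=q"(eq), "+S"(a), "+D"(b), "+c"(n)
			 :
			 : "memory", "cc");
	return eq;
#else
	/* Not x86: fall back to a byte loop so the sketch still builds. */
	while (n--)
		if (*a++ != *b++)
			return 0;
	return 1;
#endif
}

/* Equality test with a plain byte loop (the 'char *' variant). */
static int eq_byte_loop(const unsigned char *a, const unsigned char *b, size_t n)
{
	while (n--)
		if (*a++ != *b++)
			return 0;
	return 1;
}

typedef int (*eq_fn)(const unsigned char *, const unsigned char *, size_t);

/* Average nanoseconds per call over iters calls on n-byte inputs. */
static double time_eq(eq_fn fn, const unsigned char *a, const unsigned char *b,
		      size_t n, long iters)
{
	struct timespec t0, t1;
	volatile int sink = 0;	/* keeps the calls from being optimized away */
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		sink += fn(a, b, n);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;
	return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		(double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Running time_eq() for both functions with names of a few bytes up to a few dozen, and a few million iterations, on several CPU generations would give a rough per-core comparison of the two approaches.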

Thanks,
Nick
