linux-kernel - Re: Big git diff speedup by avoiding x86 "fast string" memcmp

Date:	Mon, 13 Dec 2010 19:25:05 +1100
From:	Nick Piggin <npiggin@...il.com>
To:	"J. R. Okajima" <hooanon05@...oo.co.jp>
Cc:	Nick Piggin <npiggin@...nel.dk>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-arch@...r.kernel.org, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp

On Mon, Dec 13, 2010 at 6:29 PM, J. R. Okajima <hooanon05@...oo.co.jp> wrote:
>
> Nick Piggin:
>> It's not scaling but just single threaded performance. gcc turns memcmp
>> into rep cmp, which has quite a long latency, so it's not appropriate
>> for short strings.
>
> Honestly speaking, I doubt that this 'long *' approach will be effective
> (of course, that does not mean your result with 'char *' is doubtful).

Well, let's see what turns up. We certainly can try the long *
approach. I suspect on architectures where byte loads are
very slow, gcc will block the loop into larger loads, so it should
be no worse than a normal memcmp call, but if we do explicit
padding we can avoid all the problems associated with tail
handling.
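
For reference, the 'char *' approach under discussion amounts to an
open-coded byte loop in place of the memcmp call. The following is only
a minimal sketch, not the actual patch: the name name_cmp_bytes and its
"0 means equal" return convention are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): compare two names of the
 * same length one byte at a time. Returns 0 when all len bytes match
 * and 1 otherwise. A short name exits after a few iterations and never
 * pays the startup latency of a rep cmp string instruction.
 */
static int name_cmp_bytes(const unsigned char *a, const unsigned char *b,
                          size_t len)
{
    while (len--) {
        if (*a++ != *b++)
            return 1;
    }
    return 0;
}
```

Only equality is reported here, not ordering, which is all a name
lookup needs.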

Doing name padding and long * comparison will be practically
free (because slab allocator will align out to sizeof(long long)
anyway), so if any architecture prefers to do the long loads, I'd
be interested to hear and we could whip up a patch.
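
To make the padding idea concrete, here is a rough userspace sketch,
assuming names are stored zero-padded out to a multiple of sizeof(long).
The helpers padded_len, name_alloc_padded and name_cmp_longs are
hypothetical names for illustration, and calloc stands in for the slab
allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round len up to the next multiple of sizeof(long). */
static size_t padded_len(size_t len)
{
    return (len + sizeof(long) - 1) & ~(sizeof(long) - 1);
}

/*
 * Hypothetical allocator: copy a name into a zero-filled buffer whose
 * size is a multiple of sizeof(long), so the bytes past the name are
 * well-defined padding. Returns NULL on allocation failure.
 */
static unsigned char *name_alloc_padded(const char *name, size_t len)
{
    size_t plen = padded_len(len);
    unsigned char *buf = calloc(plen ? plen : sizeof(long), 1);

    if (buf)
        memcpy(buf, name, len);
    return buf;
}

/*
 * Compare two padded names of the same length one long at a time.
 * Returns 0 when equal, 1 otherwise. There is no tail handling because
 * the padding bytes are zero in both buffers. memcpy is used for the
 * loads to stay clear of aliasing and alignment problems; compilers
 * typically turn it into a single load.
 */
static int name_cmp_longs(const unsigned char *a, const unsigned char *b,
                          size_t len)
{
    size_t n = padded_len(len) / sizeof(long);

    while (n--) {
        unsigned long x, y;

        memcpy(&x, a, sizeof(x));
        memcpy(&y, b, sizeof(y));
        if (x != y)
            return 1;
        a += sizeof(long);
        b += sizeof(long);
    }
    return 0;
}
```

Both names must have been allocated with the padding for this to be
valid; comparing against an unpadded buffer would read past its end.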

> But is the "rep cmp has quite a long latency" issue generic to all x86
> architectures, or specific to Westmere systems?

I don't believe it is Westmere specific. Intel and AMD have
been improving these instructions in the past few years, so
Westmere is probably as good as or better than any.

That said, rep cmp may not be as heavily optimized as the
set and copy string instructions.

In short, I think the change should be suitable for all x86 CPUs,
but I would like to hear more opinions or see numbers for other
cores.

Thanks,
Nick