linux-kernel - Re: Big git diff speedup by avoiding x86 "fast string" memcmp

Message-ID: <AANLkTinZ=Bk53KCr4_8Vjpb6M+RWq6n2XCz=rY2DOLRx@mail.gmail.com>
Date:	Thu, 16 Dec 2010 08:51:04 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Boaz Harrosh <bharrosh@...asas.com>
Cc:	David Miller <davem@...emloft.net>, npiggin@...il.com,
	hooanon05@...oo.co.jp, npiggin@...nel.dk,
	linux-arch@...r.kernel.org, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp

On Thu, Dec 16, 2010 at 1:53 AM, Boaz Harrosh <bharrosh@...asas.com> wrote:
>
> You misunderstood me. I'm saying that we know the beginning of the
> string is aligned, and Nick offered to pad the last long, so surely
> a shift by 2 (or 3) plus the reduction of the 12 dec-and-test to 3
> should give you an optimization?

Sadly, right now we don't know that the string is necessarily even aligned.

Yes, it's always aligned in a dentry, because it's either the inline
short string, or it's the longer string we explicitly allocated to the
dentry.
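
To make the dentry-side guarantee concrete, here is a minimal sketch. It uses a hypothetical, heavily simplified `struct dname_sketch`, not the kernel's `struct dentry`. Short names live in an inline buffer and longer ones in a separately allocated buffer. Because both buffers are arrays of `unsigned long`, the name always starts on a word boundary, and the unused bytes of the final word are zeroed (the padding Nick offered). `INLINE_NAME_WORDS`, `dname_set()` and `dname_release()` are illustrative names only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_NAME_WORDS 4 /* hypothetical inline capacity, in words */

/* Hypothetical, simplified dentry-name storage (not the real struct dentry).
 * Both buffers are unsigned long arrays, so the name is always word-aligned,
 * and unused bytes of the last word are zero. Do not copy the struct by
 * value: name may point into its own iname. */
struct dname_sketch {
    size_t len;
    unsigned long *name;                    /* iname, or a heap buffer */
    unsigned long iname[INLINE_NAME_WORDS]; /* inline short-name storage */
};

/* Store src[0..len) in d; returns 0 on success, -1 on allocation failure. */
static int dname_set(struct dname_sketch *d, const unsigned char *src, size_t len)
{
    size_t words = (len + sizeof(unsigned long) - 1) / sizeof(unsigned long);

    assert(src != NULL);
    if (words <= INLINE_NAME_WORDS) {
        memset(d->iname, 0, sizeof(d->iname)); /* zero padding */
        d->name = d->iname;
    } else {
        d->name = calloc(words, sizeof(unsigned long)); /* zero-filled */
        if (d->name == NULL)
            return -1;
    }
    memcpy(d->name, src, len);
    d->len = len;
    return 0;
}

/* Free the heap buffer if one was allocated. */
static void dname_release(struct dname_sketch *d)
{
    if (d->name != d->iname)
        free(d->name);
}
```

Only the dentry side can offer this guarantee, because the dentry code controls where its own names are stored.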

But when we do name compares in __d_lookup, only one part of that is a
dentry. The other is a qstr, and the name there is not aligned. In
fact, it's not even NUL-terminated. It's the data directly from the
path itself.
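
The following minimal sketch shows why the qstr side forces a length-bounded compare. It assumes a hypothetical, simplified `struct qstr_sketch` holding only a pointer and a length; the real `struct qstr` carries more, such as the hash. The name points into the middle of the path being looked up, so `strcmp()` would run on into the following path components. The loop instead performs one decrement-and-test per byte. That per-byte cost is what the "12 dec-and-test" in the quoted message counts for a 12-byte name. `name_cmp_bytes()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct qstr:
 * a pointer into the path being looked up plus a length, no NUL. */
struct qstr_sketch {
    const unsigned char *name;
    size_t len;
};

/* Byte-at-a-time compare of a dentry name against a qstr; returns 0 on
 * match. Bounded by the length, because q->name is not NUL-terminated. */
static int name_cmp_bytes(const unsigned char *dname, size_t dlen,
                          const struct qstr_sketch *q)
{
    const unsigned char *qn = q->name;
    size_t n = q->len;

    assert(q->name != NULL);
    if (dlen != n)
        return 1;
    while (n) {            /* one decrement-and-test per byte */
        if (*dname != *qn)
            return 1;
        dname++;
        qn++;
        n--;
    }
    return 0;
}
```

For the path `usr/include/stdio.h`, the qstr for the second component would be `{ path + 4, 7 }`. The byte after it is `/`, not NUL.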

So we can certainly do compares a "long" at a time, but it's not
entirely trivial. And just making the dentries be aligned and
null-padded is not enough. Most likely, you'd have to make the dentry
name compare function do an unaligned load from the qstr part, and
then do the masking.
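
The following minimal sketch illustrates the approach described above. It rests on several assumptions:

- the dentry name is word-aligned and zero-padded;
- the qstr name has arbitrary alignment;
- the caller has already checked that the lengths match;
- the qstr buffer may be read up to the next whole word past the name. This is the "not entirely trivial" part, which the sketch simply assumes away.

`name_cmp_words()` and `first_bytes_mask()` are hypothetical names. The unaligned load goes through `memcpy()`, which compilers typically turn into a single load on x86. On strict-alignment architectures, the same call may expand into several byte loads, which is the architecture-dependent cost Linus raises next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mask keeping the first `rem` bytes (in memory
 * order) of an unsigned long. Requires 0 < rem < sizeof(unsigned long). */
static unsigned long first_bytes_mask(size_t rem)
{
    assert(rem > 0 && rem < sizeof(unsigned long));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return ~0UL << (8 * (sizeof(unsigned long) - rem));
#else
    /* Little-endian (e.g. x86): the first bytes are the low-order bytes. */
    return (1UL << (8 * rem)) - 1;
#endif
}

/* Hypothetical word-at-a-time name compare; returns 0 on match.
 *   dname: dentry-side name, word-aligned, zero-padded to a whole word.
 *   qname: qstr-side name, any alignment, not NUL-terminated.
 *   len:   common length (the caller already compared the lengths).
 * ASSUMPTION: qname is readable up to the next whole word past len. */
static int name_cmp_words(const unsigned long *dname,
                          const unsigned char *qname, size_t len)
{
    const size_t w = sizeof(unsigned long);
    unsigned long q;

    /* Full words: aligned load on the dentry side, unaligned on the qstr side. */
    while (len >= w) {
        memcpy(&q, qname, w);
        if (*dname != q)
            return 1;
        dname++;
        qname += w;
        len -= w;
    }

    /* Tail: load a whole word from the qstr (over-reading past the name,
     * see ASSUMPTION) and mask off the bytes that belong to whatever follows.
     * The dentry word needs no mask because its padding is already zero. */
    if (len) {
        memcpy(&q, qname, w);
        if (*dname != (q & first_bytes_mask(len)))
            return 1;
    }
    return 0;
}
```

With a 64-bit `unsigned long`, a 12-byte name takes one full-word compare plus one masked tail compare, instead of 12 byte iterations.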

That is likely still the fastest approach on something like x86, where
unaligned loads are cheap, but on other architectures it might be less
so.

                                     Linus