linux-kernel - Re: Big git diff speedup by avoiding x86 "fast string" memcmp


Message-ID: <19324.1291990997@jrobl>
Date:	Fri, 10 Dec 2010 23:23:17 +0900
From:	"J. R. Okajima" <hooanon05@...oo.co.jp>
To:	Nick Piggin <npiggin@...nel.dk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-arch@...r.kernel.org, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: Big git diff speedup by avoiding x86 "fast string" memcmp


Nick Piggin:
> The standard memcmp function on a Westmere system shows up hot in
> profiles in the `git diff` workload (both parallel and single threaded),
> and it is likely due to the costs associated with trapping into
> microcode, and little opportunity to improve memory access (dentry
> name is not likely to take up more than a cacheline).

Let me make sure I understand.
What you are pointing out is:
- asm("repe; cmpsb") may hold the CPU for a long time, which can be a
  hazard for scaling.
- breaking the comparison into smaller pieces improves the chances of
  scaling.
Right?
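
For reference, "breaking it into pieces" would mean an open-coded loop in place of the single string instruction. The following is only a minimal sketch of that idea, not the code from the patch under discussion; the name bytewise_differs is hypothetical, and like the example further down it reports only equal or not equal.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 0 if the first 'count' bytes of cs and ct are equal, 1 otherwise.
 * Each iteration is an ordinary load, compare and branch, so there is no
 * single long-running string instruction.
 */
static int bytewise_differs(const unsigned char *cs, const unsigned char *ct,
			    size_t count)
{
	while (count--) {
		if (*cs++ != *ct++)
			return 1;
	}
	return 0;
}
```

With count == 0 the loop body never runs and the function returns 0, which matches what memcmp() reports for empty ranges.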

Anyway, this approach of replacing the smallest code with larger but
faster code is interesting.
How about mixing 'unsigned char *' and 'unsigned long *' when referencing
the given strings?
For example,

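#include <stddef.h>	/* size_t */

/*
 * Returns 0 if the first 'count' bytes are equal and 1 otherwise; unlike
 * memcmp() it gives no ordering. The 'long' loads assume that unaligned
 * access is acceptable, as it is on x86.
 */
int f(const unsigned char *cs, const unsigned char *ct, size_t count)
{
	int ret;
	union {
		const unsigned long *l;
		const unsigned char *c;
	} s, t;

/* this macro is your dentry_memcmp() actually */
#define cmp(s, t, c, step)		      \
	do {				      \
		while ((c) >= (step)) {	      \
			ret = (*(s) != *(t)); \
			if (ret)	      \
				return ret;   \
			(s)++;		      \
			(t)++;		      \
			(c) -= (step);	      \
		}			      \
	} while (0)

	s.c = cs;
	t.c = ct;
	/* compare sizeof(long) bytes at a time, then the remaining bytes */
	cmp(s.l, t.l, count, sizeof(*s.l));
	cmp(s.c, t.c, count, sizeof(*s.c));
	return 0;
#undef cmp
}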

What I am thinking here is:
- for a load and compare, there is probably no difference in cost
  between 'char *' and 'long *'.
- obviously, stepping by sizeof(long) will reduce the number of
  iterations.
- but I am not sure whether the strings are generally longer than
  4 (or 8) bytes.
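
The same stepping idea can also be written without casting a byte pointer to 'unsigned long *'. The following is a sketch under my own assumptions, not a tested kernel patch: the name wordwise_differs is hypothetical, and memcpy() is swapped in for the direct 'long' load so that the code does not depend on alignment. Compilers typically turn such a fixed-size memcpy() into a single load on x86, but that is not guaranteed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 0 if the first 'count' bytes of cs and ct are equal, 1 otherwise.
 */
static int wordwise_differs(const unsigned char *cs, const unsigned char *ct,
			    size_t count)
{
	unsigned long a, b;

	/* Step by sizeof(long) while at least one full word remains. */
	while (count >= sizeof(a)) {
		memcpy(&a, cs, sizeof(a));	/* alignment-safe word load */
		memcpy(&b, ct, sizeof(b));
		if (a != b)
			return 1;
		cs += sizeof(a);
		ct += sizeof(b);
		count -= sizeof(a);
	}

	/* Compare the remaining 0 .. sizeof(long) - 1 bytes one at a time. */
	while (count--) {
		if (*cs++ != *ct++)
			return 1;
	}
	return 0;
}
```

For a name shorter than sizeof(long) the first loop never runs, so the word step gains nothing for such names; that is the open question in the last point above.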


J. R. Okajima