Message-ID: <20120119121859.GA3936@elte.hu>
Date:	Thu, 19 Jan 2012 13:18:59 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Jan Beulich <JBeulich@...e.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>, tglx@...utronix.de,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, hpa@...or.com
Subject: Re: [PATCH] x86-64: fix memset() to support sizes of 4Gb and above


* Jan Beulich <JBeulich@...e.com> wrote:

> >>> On 18.01.12 at 19:16, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > On Wed, Jan 18, 2012 at 2:40 AM, Jan Beulich <JBeulich@...e.com> wrote:
> >>
> >>> For example the kernel's memcpy routine is slightly faster than
> >>> glibc's:
> >>
> >> This is an illusion - since the kernel's memcpy_64.S also defines a
> >> "memcpy" (not just "__memcpy"), the static linker resolves the
> >> reference from mem-memcpy.c against this one. Apparent
> >> performance differences rather point at effects like (guessing)
> >> branch prediction (using the second vs the first entry of
> >> routines[]). After fixing this, on my Westmere box glibc's is quite
> >> a bit slower than the unrolled kernel variant (4% fewer
> >> instructions, but about 15% more cycles).
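
( A minimal sketch of the symbol-resolution effect described above --
  not perf's actual mem-memcpy.c, and the file name here is made up:
  when an object in the link defines the external symbol "memcpy",
  the static linker resolves calls against that definition instead of
  glibc's.  Build with "gcc -O2 -fno-builtin-memcpy shadow.c" so the
  compiler emits a real call rather than expanding it inline. )

/* shadow.c */
#include <stdio.h>
#include <stddef.h>

static int local_copy_used;

/* Same external name as the libc routine -- analogous to memcpy_64.S
 * providing "memcpy" and thereby capturing the benchmark's calls. */
void *memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	local_copy_used = 1;
	while (n--)
		*d++ = *s++;
	return dst;
}

int main(void)
{
	char a[16] = "which memcpy?", b[16];

	memcpy(b, a, sizeof(b));
	printf("%s (local copy used: %d)\n", b, local_copy_used);
	return 0;
}
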
> > 
> > Please don't bother doing memcpy performance analysis using 
> > hot-cache cases (or entirely cold-cache for that matter) 
> > and/or big memory copies.
> 
> I realize that - I just was asked to do this analysis, to 
> (hopefully) turn down arguments against the $subject patch.

The other problem with such repeated measurements, beyond their 
very isolated and artificially sterile nature, is what i 
mentioned: the inter-test variability is too small to reflect 
the real variance that occurs in a live system. That too can be 
deceiving.
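
( The run-to-run spread of such a sterile test is easy to quantify --
  "perf stat --repeat N" prints a +- percentage across runs -- and a
  hand-rolled harness along these lines, a sketch rather than the
  benchmark under discussion, shows how small it typically is for a
  cache-hot 30-byte copy.  Build with -fno-builtin-memcpy to time the
  real call rather than an inlined expansion; link with -lm, and with
  -lrt on older glibc for clock_gettime. )

/* variance.c: time the same cache-hot memcpy loop repeatedly and
 * print the spread across runs.  Every run reuses the same hot
 * caches and branch history, so the spread stays far smaller than
 * what the same copy would see on a loaded system. */
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <time.h>

#define RUNS	20
#define ITERS	200000
#define LEN	30

static double one_run(char *dst, const char *src)
{
	struct timespec t0, t1;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ITERS; i++) {
		memcpy(dst, src, LEN);
		asm volatile("" : : "r" (dst) : "memory"); /* keep the copy */
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
	static char src[LEN], dst[LEN];
	double t[RUNS], mean = 0.0, var = 0.0;
	int r;

	for (r = 0; r < RUNS; r++) {
		t[r] = one_run(dst, src);
		mean += t[r] / RUNS;
	}
	for (r = 0; r < RUNS; r++)
		var += (t[r] - mean) * (t[r] - mean) / RUNS;

	printf("mean %.0f ns/run, stddev %.2f%%\n",
	       mean, 100.0 * sqrt(var) / mean);
	return 0;
}
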

Note that your patch is a special case which makes measurement 
easier: from the nature of your changes i expected *at most* 
some minimal micro-performance impact, not any larger access 
pattern related changes.

But Linus is right that this cannot be generalized to the 
typical patch.

So i realize all those limitations and fully agree with being 
aware of them, but compared to measuring *nothing* (which is the 
status quo) we have to start *somewhere*.

> > The *normal* memory copy size tends to be in the 10-30 byte 
> > range, and the cache issues (both code *and* data) are 
> > unclear. Running microbenchmarks is almost always 
> > counter-productive, since it actually shows numbers for 
> > something that has absolutely *nothing* to do with the 
> > actual patterns.
> 
> This is why I added a way to do meaningful measurement on 
> small size operations (albeit still cache-hot) with perf.
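
( Not Jan's actual perf change -- just a rough, standalone sketch of
  the kind of small-size, cache-hot measurement meant here: build it
  once per routine under test and run it under something like
  "perf stat -e cycles,instructions ./small_copy 10", then again with
  30, to get the sort of instruction-vs-cycle comparison quoted
  further up. )

/* small_copy.c: copy "size" bytes, cache-hot, many times over. */
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
	size_t size  = argc > 1 ? strtoul(argv[1], NULL, 0) : 30;
	long   iters = 10 * 1000 * 1000;
	char src[64] = { 0 }, dst[64];

	if (size > sizeof(dst))
		size = sizeof(dst);

	while (iters--) {
		memcpy(dst, src, size);
		asm volatile("" : : "r" (dst) : "memory");
	}
	return 0;
}
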

We could add test points for 10 and 30 bytes, and the two 
corner cases: one measurement with an I$ that is thrashing and a 
measurement where the D$ is thrashing in a non-trivial way.

( I have used test code before to achieve high I$ thrashing: a
  function with a million NOPs. )
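
( For reference, a minimal sketch -- not that actual test code -- of
  how such an I$-thrashing function can be generated with the
  assembler's .rept directive, together with the analogous D$ case: )

/* thrash.c: crude ways to run the benchmarked copy with a cold I$
 * and a thrashed D$. */
#include <string.h>

/* ~1 MB of straight-line NOPs -- far more than any L1/L2 I$ holds,
 * so executing this between iterations evicts memcpy's code. */
__attribute__((noinline)) static void icache_thrash(void)
{
	asm volatile(".rept 1000000\n\tnop\n\t.endr");
}

/* Walk a buffer much larger than the last-level cache, one cache
 * line at a time, so the copy's data is not D$-resident either. */
__attribute__((noinline)) static void dcache_thrash(unsigned char *buf,
						     size_t size)
{
	size_t i;

	for (i = 0; i < size; i += 64)
		buf[i]++;
}

static void cold_copy(char *dst, const char *src, size_t n,
		      unsigned char *scratch, size_t scratch_size)
{
	icache_thrash();
	dcache_thrash(scratch, scratch_size);
	memcpy(dst, src, n);
}

int main(void)
{
	static unsigned char scratch[64 * 1024 * 1024]; /* >> LLC size */
	char src[32] = "payload", dst[32];

	cold_copy(dst, src, sizeof(src), scratch, sizeof(scratch));
	return 0;
}
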

Once we have the typical sizes and the edge cases covered we can 
at least hope that reality is a healthy mix of all those 
"eigen-vectors".

Once we have that in place we can at least have one meaningful 
result: if a patch improves *all* these edge cases on the CPU 
models that matter, then it's typically true that it will 
improve the generic 'mixed' workload as well.

If a patch is not so clear-cut then it has to be measured with 
real loads as well, etc.

Anyway, i'll apply your current patches and play with them a 
bit.

Thanks,

	Ingo
