Message-ID: <CA+55aFxKkfKNKqt4rWanWXX3g2p0-K+=SVeAR8H-TLAXTqTZjg@mail.gmail.com>
Date:	Wed, 18 Jan 2012 10:16:51 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jan Beulich <JBeulich@...e.com>
Cc:	Ingo Molnar <mingo@...e.hu>, tglx@...utronix.de,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, hpa@...or.com
Subject: Re: [PATCH] x86-64: fix memset() to support sizes of 4Gb and above

On Wed, Jan 18, 2012 at 2:40 AM, Jan Beulich <JBeulich@...e.com> wrote:
>
>> For example the kernel's memcpy routine is slightly faster than
>> glibc's:
>
> This is an illusion - since the kernel's memcpy_64.S also defines a
> "memcpy" (not just "__memcpy"), the static linker resolves the
> reference from mem-memcpy.c against this one. Apparent
> performance differences rather point at effects like (guessing)
> branch prediction (using the second vs the first entry of
> routines[]). After fixing this, on my Westmere box glibc's is quite
> a bit slower than the unrolled kernel variant (4% fewer
> instructions, but about 15% more cycles).

Please don't bother doing memcpy performance analysis using hot-cache
cases (or entirely cold-cache for that matter) and/or big memory
copies.

The *normal* memory copy size tends to be in the 10-30 byte range, and
the cache issues (both code *and* data) are unclear. Running
microbenchmarks is almost always counter-productive, since it actually
shows numbers for something that has absolutely *nothing* to do with
the actual usage patterns.
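
To make the size point concrete, a throwaway sketch along these lines
(the file name, sizes and iteration counts are made up, and its numbers
inherit exactly the caveats above) contrasts the regime a typical
micro-benchmark measures - one big, cache-hot buffer copied in a loop -
with many varied 10-30 byte copies spread across memory:

/* small_vs_big.c -- hypothetical file name; only illustrates that the
 * two regimes are not the same thing, not a benchmarking methodology.
 *
 * Build with:  gcc -O2 -o small_vs_big small_vs_big.c
 */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BIG	(1 << 20)	/* 1 MiB */
#define ITERS	10000

static unsigned char src[BIG], dst[BIG];

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	double t0;
	size_t i, off = 0;

	memset(src, 0xa5, sizeof(src));

	/* Classic micro-benchmark: the same 1 MiB copy, hot in cache
	 * after the first iteration. */
	t0 = now_sec();
	for (i = 0; i < ITERS; i++)
		memcpy(dst, src, BIG);
	printf("big hot-cache copies: %.3f s\n", now_sec() - t0);

	/* Small, varied 10-30 byte copies, walking through the buffer so
	 * the data is not always hot. */
	t0 = now_sec();
	for (i = 0; i < (size_t)ITERS * 1000; i++) {
		size_t len = 10 + (i % 21);	/* 10..30 bytes */

		memcpy(dst + off, src + off, len);
		off = (off + 4096) % (BIG - 32);
	}
	printf("small varied copies: %.3f s\n", now_sec() - t0);
	return 0;
}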

End result: people do crazy things and tune memcpy for their
benchmarks, doing things like instruction (or prefetch) scheduling
that only makes the code more complicated, and has no actual basis in
reality. And then other people are afraid to touch the end result,
even though it's shit - simply because it *looks* like it was done
with lots of actual testing and effort. Never mind that all the
testing and effort was likely crap.

                  Linus
