Date:	Thu, 12 Nov 2009 10:12:14 +0800
From:	"Ma, Ling" <ling.ma@...el.com>
To:	"H. Peter Anvin" <hpa@...or.com>
CC:	Ingo Molnar <mingo@...e.hu>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH RFC] [X86] performance improvement for memcpy_64.S by
 fast string.

>-----Original Message-----
>From: H. Peter Anvin [mailto:hpa@...or.com]
>Sent: November 12, 2009 7:21
>To: Ma, Ling
>Cc: Ingo Molnar; Ingo Molnar; Thomas Gleixner; linux-kernel
>Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast
>string.
>
>On 11/10/2009 11:57 PM, Ma, Ling wrote:
>> Hi Ingo
>>
>> This program is for 64bit version, so please use 'cc -o memcpy  memcpy.c -O2
>-m64'
>>
>
>I did some measurements with this program; I added power-of-two
>measurements from 1-512 bytes, plus some different alignments, and found
>some very interesting results:
>
>Nehalem:
>	memcpy_new is a win for 1024+ bytes, but *also* a win for 2-32
>	bytes, where the old code apparently performs appallingly bad.
>
>	memcpy_new loses in the 64-512 byte range, so the 1024
>	threshold is probably justified.
>
>Core2:
>	memcpy_new is a win for <= 512 bytes, but a lose for larger
>	copies (possibly a win again for 16K+ copies, but those are
>	very rare in the Linux kernel.)  Surprise...
>
>	However, the difference is very small.
>
>However, I had overlooked something much more fundamental about your
>patch.  On Nehalem, at least *it will never get executed* (except during
>very early startup), because we replace the memcpy code with a jmp to
>memcpy_c on any CPU which has X86_FEATURE_REP_GOOD, which includes Nehalem.
>
>So the patch is a no-op on Nehalem, and any other modern CPU.
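For readers following the thread, the effect described above (the generic memcpy body is bypassed on CPUs that advertise X86_FEATURE_REP_GOOD) can be approximated in user space. This is only a rough analogy under stated assumptions: the kernel patches in a jmp to memcpy_c at boot, whereas the sketch below picks a function pointer once. The names copy_unrolled, copy_string, select_copy, check_copy and the cpu_has_rep_good flag are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generic copy loop in memcpy_64.S. */
static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical stand-in for memcpy_c (the string-instruction variant);
 * here it simply defers to the C library. */
static void *copy_string(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Choose the routine once, based on an assumed feature flag.  When the
 * flag is set, copy_unrolled is never reached, so changes to it have no
 * effect on that CPU. */
static copy_fn select_copy(int cpu_has_rep_good)
{
    return cpu_has_rep_good ? copy_string : copy_unrolled;
}

/* Sanity check that a selected routine really copies the bytes. */
static void check_copy(copy_fn f)
{
    const char src[8] = "abcdefg";
    char dst[8] = {0};

    f(dst, src, sizeof src);
    assert(memcmp(dst, src, sizeof src) == 0);
}
```

With the flag set, select_copy() never returns copy_unrolled, which mirrors why a change confined to the generic routine would not run on such a CPU.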

[Ma Ling]
That is good for modern CPUs. Our original intention was also to introduce movsq for Nehalem, and the method above is smarter.

>Am I guessing that the perf numbers you posted originally were all from
>your user space test program?

[Ma Ling] 
Yes, they are all from this program. I'm confused that the measured values differ for only one case across multiple tests
(at least 3 times on my Core2 platform).
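One common way to look into run-to-run variation of this kind is to repeat each measurement and keep the fastest run. The sketch below assumes a plain user-space harness built on the standard clock() call; it is not the memcpy.c program discussed in this thread, and the names time_copy, best_of_runs, ITERATIONS and RUNS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

enum { BUF_SIZE = 4096, MAX_OFFSET = 64, ITERATIONS = 100000, RUNS = 5 };

static unsigned char src_buf[BUF_SIZE + MAX_OFFSET];
static unsigned char dst_buf[BUF_SIZE + MAX_OFFSET];

/* Time ITERATIONS copies of len bytes, reading from src_buf + src_off.
 * Returns the elapsed CPU time in seconds. */
static double time_copy(size_t len, size_t src_off)
{
    volatile unsigned char sink = 0;
    clock_t start = clock();

    for (int i = 0; i < ITERATIONS; i++) {
        memcpy(dst_buf, src_buf + src_off, len);
        sink ^= dst_buf[len - 1];   /* discourage removal of the copy */
    }

    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Repeat the measurement RUNS times and keep the fastest result, which
 * is usually the one least disturbed by interrupts and scheduling. */
static double best_of_runs(size_t len, size_t src_off)
{
    assert(len >= 1 && len <= BUF_SIZE && src_off < MAX_OFFSET);

    double best = time_copy(len, src_off);
    for (int r = 1; r < RUNS; r++) {
        double t = time_copy(len, src_off);
        if (t < best)
            best = t;
    }
    return best;
}
```

Calling best_of_runs(len, off) for len = 1, 2, 4, ..., 512 and a few values of off would give a power-of-two sweep with different alignments. If a single case still stands out after taking the minimum, the difference is less likely to be measurement noise.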

Thanks
Ling
