linux-kernel - RE: [PATCH RFC] [X86] performance improvement for memcpy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8FED46E8A9CA574792FC7AACAC38FE7714FE830398@PDSMSX501.ccr.corp.intel.com>
Date:	Wed, 11 Nov 2009 15:05:34 +0800
From:	"Ma, Ling" <ling.ma@...el.com>
To:	Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>
CC:	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH RFC] [X86] performance improvement for memcpy_64.S by
 fast string.

Hi All
Please use the memcpy.c(cc -o memcpy memcpy.c -O2) to test more cases,
if you have interest. In this program we did simple modification
on memcpy_new function.

Thanks
Ling


>-----Original Message-----
>From: Ingo Molnar [mailto:mingo@...e.hu]
>Sent: 2009年11月9日 16:09
>To: H. Peter Anvin
>Cc: Ma, Ling; Ingo Molnar; Thomas Gleixner; linux-kernel
>Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast
>string.
>
>
>* H. Peter Anvin <hpa@...or.com> wrote:
>
>> On 11/08/2009 11:24 PM, Ma, Ling wrote:
>> > Hi All
>> >
>> > Today we run our benchmark on Core2 and Sandy Bridge:
>> >
>>
>> Hi Ling,
>>
>> Thanks for doing that.  Do you also have access to any older CPUs?  I
>> suspect that the CPUs that Andi are worried about are older CPUs like
>> P4, K8 or Pentium M/Core 1.  (Andi: please do clarify if you have
>> additional information.)
>>
>> My personal opinion is that if we can show no significant slowdown on
>> P4, K8, P-M/Core 1, Core 2, and Nehalem then we can simply use this
>> code unconditionally.  If one of them is radically worse than
>> baseline, then we have to do something conditional, which is a lot
>> more complicated.
>>
>> [Ingo, Thomas: do you agree?]
>
>Yeah. IIRC the worst-case were the old P2's which had a really slow,
>microcode based string ops. (Some of them even had erratums in early
>prototypes although we can certainly ignore those as string ops get
>relied on quite frequently.)
>
>IIRC the original PPro core came up with some nifty, hardwired string
>ops, but those had to be dumbed down and emulated in microcode due to
>SMP bugs - making it an inferior choice in the end.
>
>But that should be ancient history and i'd suggest we ignore the P4
>dead-end too, unless it's some really big slowdown (which i doubt). If
>anyone cares then some optional assembly implementations could be added
>back.
>
>Ling, if you are interested, could you send a user-space test-app to
>this thread that everyone could just compile and run on various older
>boxes, to gather a performance profile of hand-coded versus string ops
>performance?
>
>( And i think we can make a judgement based on cache-hot performance
>  alone - if then the strings ops will perform comparatively better in
>  cache-cold scenarios, so the cache-hot numbers would be a conservative
>  estimate. )
>
>	Ingo

View attachment "memcpy.c" of type "text/plain" (5683 bytes)