Date:	Wed, 11 Nov 2009 15:05:34 +0800
From:	"Ma, Ling" <>
To:	Ingo Molnar <>, "H. Peter Anvin" <>
CC:	Ingo Molnar <>,
	Thomas Gleixner <>,
	linux-kernel <>
Subject: RE: [PATCH RFC] [X86] performance improvement for memcpy_64.S by
 fast string.

Hi All,
Please use the attached memcpy.c (build with: cc -o memcpy memcpy.c -O2)
to test more cases, if you are interested. In this program we made a
simple modification to the memcpy_new function.
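
For readers without the attachment, the kind of user-space comparison requested in the quoted message below can be sketched roughly as follows. This is not the attached memcpy.c and not the kernel's memcpy_64.S; it is a minimal illustration that assumes GCC or Clang on x86-64 Linux, and the names copy_string_op, copy_unrolled, time_copy and BUF_MAX are made up for this sketch. It measures cache-hot copies only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BUF_MAX (64 * 1024)   /* hypothetical upper bound on the copy size */

/* Copy with the x86 string instruction ("rep movsb").
 * Assumption: x86-64 with GCC/Clang inline asm; other targets fall back
 * to the library memcpy(), so the comparison is only meaningful on x86-64. */
static void *copy_string_op(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    void *d = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

/* Hand-coded copy: 8 bytes per iteration, then a byte-wise tail.
 * Note: an optimizing compiler may turn these loops into a memcpy() call;
 * check the disassembly before trusting the numbers. */
static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, 8);     /* alignment-safe 8-byte load */
        memcpy(d, &w, 8);     /* alignment-safe 8-byte store */
        d += 8; s += 8; n -= 8;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Average nanoseconds per call for repeated (cache-hot) copies of n bytes.
 * Returns -1.0 if n exceeds BUF_MAX or iters is zero. */
static double time_copy(copy_fn fn, size_t n, unsigned iters)
{
    static unsigned char src[BUF_MAX], dst[BUF_MAX];
    struct timespec t0, t1;

    if (n > BUF_MAX || iters == 0)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u + 7u);

    fn(dst, src, n);          /* warm-up: pull both buffers into cache */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < iters; i++) {
        fn(dst, src, n);
        __asm__ volatile("" ::: "memory");  /* compiler barrier: keep every copy */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    assert(memcmp(dst, src, n) == 0);       /* sanity check: the copy is correct */

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Calling time_copy(copy_string_op, 4096, 100000) and time_copy(copy_unrolled, 4096, 100000) from a small main() and printing both values gives a rough cache-hot comparison for one size. A real test would also sweep sizes and source/destination alignments, which this sketch does not attempt.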


>-----Original Message-----
>From: Ingo Molnar []
>Sent: November 9, 2009 16:09
>To: H. Peter Anvin
>Cc: Ma, Ling; Ingo Molnar; Thomas Gleixner; linux-kernel
>Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast
>* H. Peter Anvin <> wrote:
>> On 11/08/2009 11:24 PM, Ma, Ling wrote:
>> > Hi All
>> >
>> > Today we ran our benchmark on Core2 and Sandy Bridge:
>> >
>> Hi Ling,
>> Thanks for doing that.  Do you also have access to any older CPUs?  I
>> suspect that the CPUs that Andi is worried about are older CPUs like
>> P4, K8 or Pentium M/Core 1.  (Andi: please do clarify if you have
>> additional information.)
>> My personal opinion is that if we can show no significant slowdown on
>> P4, K8, P-M/Core 1, Core 2, and Nehalem then we can simply use this
>> code unconditionally.  If one of them is radically worse than
>> baseline, then we have to do something conditional, which is a lot
>> more complicated.
>> [Ingo, Thomas: do you agree?]
>Yeah. IIRC the worst case was the old P2s, which had really slow,
>microcode-based string ops. (Some of them even had errata in early
>prototypes, although we can certainly ignore those as string ops get
>relied on quite frequently.)
>IIRC the original PPro core came up with some nifty, hardwired string
>ops, but those had to be dumbed down and emulated in microcode due to
>SMP bugs - making it an inferior choice in the end.
>But that should be ancient history and i'd suggest we ignore the P4
>dead-end too, unless it's some really big slowdown (which i doubt). If
>anyone cares then some optional assembly implementations could be added.
>Ling, if you are interested, could you send a user-space test-app to
>this thread that everyone could just compile and run on various older
>boxes, to gather a performance profile of hand-coded versus string ops?
>( And i think we can make a judgement based on cache-hot performance
>  alone - if anything, the string ops will perform comparatively better
>  in cache-cold scenarios, so the cache-hot numbers would be a
>  conservative estimate. )
>	Ingo

View attachment "memcpy.c" of type "text/plain" (5683 bytes)
