Message-ID: <20091109080830.GI453@elte.hu>
Date:	Mon, 9 Nov 2009 09:08:30 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	"Ma, Ling" <ling.ma@...el.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by
 fast string.


* H. Peter Anvin <hpa@...or.com> wrote:

> On 11/08/2009 11:24 PM, Ma, Ling wrote:
> > Hi All
> > 
> > Today we ran our benchmark on Core2 and Sandy Bridge:
> > 
> 
> Hi Ling,
> 
> Thanks for doing that.  Do you also have access to any older CPUs?  I 
> suspect that the CPUs that Andi is worried about are older CPUs like 
> P4, K8 or Pentium M/Core 1.  (Andi: please do clarify if you have 
> additional information.)
> 
> My personal opinion is that if we can show no significant slowdown on 
> P4, K8, P-M/Core 1, Core 2, and Nehalem then we can simply use this 
> code unconditionally.  If one of them is radically worse than 
> baseline, then we have to do something conditional, which is a lot 
> more complicated.
> 
> [Ingo, Thomas: do you agree?]

Yeah. IIRC the worst case was the old P2s, which had really slow, 
microcode-based string ops. (Some of them even had errata in early 
prototypes, although we can certainly ignore those, as string ops get 
relied on quite frequently.)

IIRC the original PPro core came up with some nifty, hardwired string 
ops, but those had to be dumbed down and emulated in microcode due to 
SMP bugs - making it an inferior choice in the end.

But that should be ancient history and i'd suggest we ignore the P4 
dead-end too, unless it's some really big slowdown (which i doubt). If 
anyone cares then some optional assembly implementations could be added 
back.

Ling, if you are interested, could you send a user-space test-app to 
this thread that everyone could just compile and run on various older 
boxes, to gather a performance profile of hand-coded versus string ops 
performance?

( And i think we can make a judgement based on cache-hot performance
  alone - if anything, the string ops will perform comparatively better
  in cache-cold scenarios, so the cache-hot numbers would be a
  conservative estimate. )
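
For illustration only, here is a minimal sketch of what such a user-space
test could look like. It is not Ling's benchmark and not the memcpy_64.S
code from the patch: every name in it (copy_string_op, copy_loop,
bench_ns_per_copy, run_benchmark, BUF_BYTES) is made up for this example,
the "hand-coded" side is a plain C loop standing in for the assembly
version, and the string-op path assumes GCC or Clang on x86-64 (it falls
back to memcpy() elsewhere). It measures the cache-hot case only, on
buffers small enough to stay in L1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#if defined(__GNUC__)
#define BARRIER() __asm__ volatile("" ::: "memory")
#else
#define BARRIER() ((void)0)
#endif

#define BUF_BYTES 4096  /* small on purpose: stays cache-hot */
static uint64_t src_buf[BUF_BYTES / 8], dst_buf[BUF_BYTES / 8];

typedef void (*copy_fn)(void *, const void *, size_t);

/* String-op copy: rep movsq for whole qwords, rep movsb for the tail. */
static void copy_string_op(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    size_t qwords = n / 8, tail = n % 8;

    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(qwords) : : "memory");
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(tail) : : "memory");
#else
    memcpy(dst, src, n);  /* assumption: no string ops on this target */
#endif
}

/* Stand-in for a hand-coded copy; expects 8-byte aligned buffers. */
static void copy_loop(void *dst, const void *src, size_t n)
{
    uint64_t *d = dst;
    const uint64_t *s = src;
    size_t i, qwords = n / 8;

    for (i = 0; i < qwords; i++)
        d[i] = s[i];

    unsigned char *db = (unsigned char *)(d + qwords);
    const unsigned char *sb = (const unsigned char *)(s + qwords);
    for (i = 0; i < n % 8; i++)
        db[i] = sb[i];
}

/* Average CPU time per copy in nanoseconds, after one warm-up pass. */
static double bench_ns_per_copy(copy_fn fn, size_t n, long iters)
{
    clock_t t0, t1;
    long i;

    fn(dst_buf, src_buf, n);  /* warm-up: pull both buffers into cache */
    t0 = clock();
    for (i = 0; i < iters; i++) {
        fn(dst_buf, src_buf, n);
        BARRIER();  /* keep the compiler from dropping repeated copies */
    }
    t1 = clock();
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}

/* Prints one line per copy size; returns 0 when all copies verified. */
static int run_benchmark(void)
{
    static const size_t sizes[] = { 64, 256, 1024, 4096 };
    size_t i;

    for (i = 0; i < BUF_BYTES / 8; i++)
        src_buf[i] = i * 0x9E3779B97F4A7C15ull + 1;

    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double t_str = bench_ns_per_copy(copy_string_op, sizes[i], 200000);
        assert(memcmp(dst_buf, src_buf, sizes[i]) == 0);
        double t_loop = bench_ns_per_copy(copy_loop, sizes[i], 200000);
        assert(memcmp(dst_buf, src_buf, sizes[i]) == 0);
        printf("%6zu bytes: string op %8.1f ns, loop %8.1f ns\n",
               sizes[i], t_str, t_loop);
    }
    return 0;
}
```

A main() that just returns run_benchmark() is enough to get a table per
box. Note that an optimizing compiler may vectorize copy_loop or turn it
into a memcpy() call, so the disassembly is worth a look before trusting
the comparison; a real test would plug in the actual assembly routines
instead.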

	Ingo
