Message-ID: <20091109080830.GI453@elte.hu>
Date: Mon, 9 Nov 2009 09:08:30 +0100
From: Ingo Molnar <mingo@...e.hu>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: "Ma, Ling" <ling.ma@...el.com>, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC] [X86] performance improvement for memcpy_64.S by fast string.

* H. Peter Anvin <hpa@...or.com> wrote:

> On 11/08/2009 11:24 PM, Ma, Ling wrote:
> > Hi All
> >
> > Today we run our benchmark on Core2 and Sandy Bridge:
> >
>
> Hi Ling,
>
> Thanks for doing that. Do you also have access to any older CPUs? I
> suspect that the CPUs that Andi is worried about are older CPUs like
> P4, K8 or Pentium M/Core 1. (Andi: please do clarify if you have
> additional information.)
>
> My personal opinion is that if we can show no significant slowdown on
> P4, K8, P-M/Core 1, Core 2, and Nehalem then we can simply use this
> code unconditionally. If one of them is radically worse than
> baseline, then we have to do something conditional, which is a lot
> more complicated.
>
> [Ingo, Thomas: do you agree?]

Yeah. IIRC the worst case was the old P2s, which had really slow,
microcode-based string ops. (Some of them even had errata in early
prototypes, although we can certainly ignore those as string ops get
relied on quite frequently.)

IIRC the original PPro core came up with some nifty, hardwired string
ops, but those had to be dumbed down and emulated in microcode due to
SMP bugs - making them an inferior choice in the end.

But that should be ancient history and i'd suggest we ignore the P4
dead-end too, unless it's some really big slowdown (which i doubt). If
anyone cares then some optional assembly implementations could be added
back.

Ling, if you are interested, could you send a user-space test-app to
this thread that everyone could just compile and run on various older
boxes, to gather a performance profile of hand-coded versus string ops
performance?

( And i think we can make a judgement based on cache-hot performance
  alone - if anything, the string ops will perform comparatively better
  in cache-cold scenarios, so the cache-hot numbers would be a
  conservative estimate. )
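
For illustration only, a minimal sketch of what such a cache-hot
user-space test could look like. This is not Ling's benchmark and not
the memcpy_64.S code: the names copy_loop, copy_string_op and
bench_seconds are made up here, the "hand-coded" loop is a plain
word-at-a-time stand-in, and the string-op variant assumes GCC-style
inline asm on x86 (it falls back to memcpy() elsewhere).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Keeps the compiler from dropping or merging the timed copies. */
#if defined(__GNUC__)
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void)0)
#endif

enum { BUF_SIZE = 4096 };	/* small enough to stay cache-hot */

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for a hand-coded copy loop. An optimizer may
 * turn this into a memcpy() call, so check the generated assembly. */
void *copy_loop(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n >= sizeof(unsigned long)) {
		unsigned long w;

		memcpy(&w, s, sizeof w);	/* unaligned-safe load */
		memcpy(d, &w, sizeof w);	/* unaligned-safe store */
		d += sizeof w;
		s += sizeof w;
		n -= sizeof w;
	}
	while (n--)
		*d++ = *s++;
	return dst;
}

/* String-op variant: a bare "rep movsb". Assumes the direction flag
 * is clear, as the x86 calling conventions require. */
void *copy_string_op(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	void *d = dst;
	const void *s = src;

	__asm__ volatile("rep movsb"
			 : "+D"(d), "+S"(s), "+c"(n)
			 :
			 : "memory");
#else
	memcpy(dst, src, n);	/* non-x86 fallback, not a string op */
#endif
	return dst;
}

/* Times iters copies of len bytes between two reused static buffers
 * and returns the CPU time in seconds as reported by clock(). */
double bench_seconds(copy_fn fn, size_t len, int iters)
{
	static unsigned char src[BUF_SIZE], dst[BUF_SIZE];
	clock_t t0, t1;
	int i;

	assert(len <= BUF_SIZE);
	memset(src, 0xA5, len);
	fn(dst, src, len);		/* warm-up pass: pull buffers into cache */

	t0 = clock();
	for (i = 0; i < iters; i++) {
		fn(dst, src, len);
		COMPILER_BARRIER();
	}
	t1 = clock();

	assert(memcmp(dst, src, len) == 0);	/* the copy must be correct */
	return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

A small main() that calls bench_seconds() with both functions over a
range of lengths and prints the two times side by side would be enough
to collect numbers from older boxes. Because the same 4 KiB buffers are
reused on every pass, this only measures the cache-hot case.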
Ingo