Message-ID: <20121014105821.GB2165@liondog.tnic>
Date: Sun, 14 Oct 2012 12:58:21 +0200
From: Borislav Petkov <bp@...en8.de>
To: "Ma, Ling" <ling.ma@...el.com>
Cc: Konrad Rzeszutek Wilk <konrad@...nel.org>,
"mingo@...e.hu" <mingo@...e.hu>, "hpa@...or.com" <hpa@...or.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"iant@...gle.com" <iant@...gle.com>,
George Spelvin <linux@...izon.com>
Subject: Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging
instruction sequence and saving register
On Fri, Oct 12, 2012 at 08:04:11PM +0200, Borislav Petkov wrote:
> Right, so the benchmark shows around 20% speedup on Bulldozer but this
> is a microbenchmark and before pursuing this further, we need to verify
> whether this brings any palpable speedup with a real benchmark, I
> don't know, kernbench, netbench, whatever. Even something as boring as
> a kernel build. And probably check for perf regressions on the rest of
> the uarches.
Ok, so to summarize, on AMD we're using REP MOVSQ, which is even
faster than the unrolled version. I've added the REP MOVSQ version
to the µbenchmark. It nicely validates that we're correctly setting
X86_FEATURE_REP_GOOD on everything >= F10h and some K8s.
So, to answer Konrad's question: those patches don't concern AMD
machines.
Thanks.
--
Regards/Gruss,
Boris.
View attachment "copy-page.c" of type "text/x-csrc" (6206 bytes)