lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 14 Oct 2012 12:58:21 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	"Ma, Ling" <ling.ma@...el.com>
Cc:	Konrad Rzeszutek Wilk <konrad@...nel.org>,
	"mingo@...e.hu" <mingo@...e.hu>, "hpa@...or.com" <hpa@...or.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"iant@...gle.com" <iant@...gle.com>,
	George Spelvin <linux@...izon.com>
Subject: Re: [PATCH RFC 2/2] [x86] Optimize copy_page by re-arranging
 instruction sequence and saving register

On Fri, Oct 12, 2012 at 08:04:11PM +0200, Borislav Petkov wrote:
> Right, so benchmark shows around 20% speedup on Bulldozer but this is
> a microbenchmark and before pursue this further, we need to verify
> whether this brings any palpable speedup with a real benchmark, I
> don't know, kernbench, netbench, whatever. Even something as boring as
> kernel build. And probably check for perf regressions on the rest of
> the uarches.

Ok, so to summarize, on AMD we're using REP MOVSQ which is even
faster than the unrolled version. I've added the REP MOVSQ version
to the µbenchmark. It nicely validates that we're correctly setting
X86_FEATURE_REP_GOOD on everything >= F10h and some K8s.

So, to answer Konrad's question: those patches don't concern AMD
machines.

Thanks.

-- 
Regards/Gruss,
    Boris.

View attachment "copy-page.c" of type "text/x-csrc" (6206 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ