linux-kernel - Re: [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p..

Message-ID: <20111202233122.GA12556@phenom.dumpdata.com>
Date:	Fri, 2 Dec 2011 18:31:22 -0500
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	linux-kernel@...r.kernel.org, x86@...nel.org, len.brown@...el.com,
	tglx@...utronix.de, jeremy@...p.org, hpa@...or.com, bp@...en8.de,
	tj@...nel.org, trenn@...e.de
Cc:	mingo@...hat.com, xen-devel@...ts.xensource.com, stable@...nel.org
Subject: Re: [PATCH 2/3] x86/cpa: Use pte_attrs instead of pte_flags on
 CPA/set_p.._wb/wc operations.

> The fix, which this patch proposes, is to wrap the pte_pgprot in the CPA
> code with a newly introduced pte_attrs, which can go through the pvops interface
> to get the "emulated" value instead of the raw one. Naturally, if CONFIG_PARAVIRT
> is not set, it would end up calling native_pte_val.
> 
> The other way to fix this is to wrap pte_flags so that it goes through the pvops
> interface, and that really is the Right Thing to do. The problem is that past
> experience with the mprotect code shows that this can be really expensive in inner
> loops, and pte_flags() is used in some very performance-critical areas.

I have not yet verified the mprotect claims, as I still need to chase down the
details, but I did run some benchmarks using kernbench on three different boxes:

 AMD A8-3850 (8GB) - tst005
 Intel i3-2100 (8GB) - tst007
 Nehalem EX (32 logical CPUs) (32GB) - tst010

I've put all the kernbench results in https://www.dumpdata.com/results/baseline_pte_flags_pte_attrs/
(the chart for the AMD machine is attached).

Each box has a fresh install of F16, running a kernel based on 3.2-rc3 and built
with the .config that F16 shipped with. I just hit Enter (accepting the default)
whenever oldconfig asked me to choose.

The baseline is virgin v3.2-rc3. The pte_attrs kernel is the patch that this email
is replying to (on top of v3.2-rc3). The pte_flags kernel adds two patches that wrap
pte_flags in paravirt and use alternative_asm to patch the code (on top of v3.2-rc3).

The patches are available at the URL above and in my git branch
devel/pte_attrs.v1 (git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git).
I am also attaching them to this email.

In summary, the numbers only showed a difference under the maximum load, and
_only_ on the AMD machine. The two Intel machines (the i3-2100 and the Nehalem EX)
showed no difference. On the AMD machine, the result was 13% worse when pte_flags
(i.e. alternative_asm) was used instead of pte_attrs.

The way I ran these tests was to boot with 'init=/bin/bash', remount / read-write,
activate the swap disk, and run kernbench on the v3.2-rc3 Linux tree. Then I unplugged
the machine for a tea break and repeated the cycle with a different kernel.

Attachment: AMD-A8-3850.png (image/png, 9629 bytes)
Attachment: 0001-x86-paravirt-xen-Introduce-pte_flags.patch (text/plain, 5164 bytes)
Attachment: 0002-x86-paravirt-xen-Optimize-pte_flags-by-marking-it-as.patch (text/plain, 4346 bytes)