linux-kernel - Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction

Message-ID: <483729E7.9010002@goop.org>
Date:	Fri, 23 May 2008 21:32:39 +0100
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Zachary Amsden <zach@...are.com>
CC:	Ingo Molnar <mingo@...e.hu>, LKML <linux-kernel@...r.kernel.org>,
	xen-devel <xen-devel@...ts.xensource.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Hugh Dickins <hugh@...itas.com>,
	kvm-devel <kvm-devel@...ts.sourceforge.net>,
	Virtualization Mailing List <virtualization@...ts.osdl.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction

Zachary Amsden wrote:
> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor.  How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI?  Is this why
> it is still 7x?
>   

No, you just use cmpxchg.  It's pretty lightweight really.  Xen holds a 
lock internally to stop other cpus from updating the pte in software, so 
the only source of modification is the hardware itself; the cmpxchg loop 
is guaranteed to terminate because the A/D bits can only transition from 
0->1.
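
As a rough illustration of the cmpxchg loop described above, here is a
minimal user-space sketch using C11 atomics. It is not the Xen code: the
function name pte_update_preserve_ad and the returned retry counter are
made up for this example, and a real hypervisor would run this while
holding its internal lock. The bit positions assume the x86 pte layout
(Accessed = bit 5, Dirty = bit 6).

```c
#include <stdatomic.h>
#include <stdint.h>

/* x86 pte hardware-managed bits (assumed layout for this sketch). */
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)
#define PTE_AD_MASK  (PTE_ACCESSED | PTE_DIRTY)

/*
 * Hypothetical helper: install the non-A/D bits of newval into *ptep
 * while keeping whatever A/D bits are set at the moment the update lands.
 * Returns the number of retries, purely so the behaviour can be observed.
 */
static int pte_update_preserve_ad(_Atomic uint64_t *ptep, uint64_t newval)
{
    int retries = 0;
    uint64_t old = atomic_load(ptep);

    for (;;) {
        /* Carry over the A/D bits seen in the last snapshot. */
        uint64_t val = (newval & ~PTE_AD_MASK) | (old & PTE_AD_MASK);

        /* On failure, 'old' is refreshed with the current pte value. */
        if (atomic_compare_exchange_strong(ptep, &old, val))
            return retries;
        retries++;
    }
}
```

If software writers are excluded and the only concurrent change is the
hardware setting A or D, each failed compare means at least one of those
two bits went from 0 to 1, so the loop can fail at most twice per pte.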

I haven't really gone into depth as to exactly where the 7x number comes 
from.  I could increase the batch size (currently max of 32 pte 
updates/hypercall), and some of it is plain overhead from the in-kernel 
infrastructure.  A simpler and more hackish approach which basically 
pastes the Xen hypercall directly into the mprotect loop gets the 
overhead down to about 5.5x.
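
To make the batching idea concrete, here is a small self-contained
sketch of queueing pte updates and flushing them in groups of at most 32,
the batch limit mentioned above. Everything in it is an assumption for
illustration: struct pte_batch, batch_queue and batch_flush are invented
names, and the "hypercall" is simulated by writing the values directly
and counting flushes. It does not reflect the actual in-kernel
infrastructure or the Xen hypercall interface.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32  /* max pte updates per (simulated) hypercall */

struct pte_update {
    uint64_t *ptep;   /* which pte to change */
    uint64_t  val;    /* value to install */
};

struct pte_batch {
    struct pte_update ops[BATCH_MAX];
    size_t            n;        /* updates currently queued */
    unsigned long     flushes;  /* number of simulated hypercalls issued */
};

/* Apply all queued updates in one go; stands in for a single hypercall. */
static void batch_flush(struct pte_batch *b)
{
    if (b->n == 0)
        return;
    for (size_t i = 0; i < b->n; i++)
        *b->ops[i].ptep = b->ops[i].val;
    b->flushes++;
    b->n = 0;
}

/* Queue one update, flushing first if the batch is already full. */
static void batch_queue(struct pte_batch *b, uint64_t *ptep, uint64_t val)
{
    if (b->n == BATCH_MAX)
        batch_flush(b);
    b->ops[b->n].ptep = ptep;
    b->ops[b->n].val = val;
    b->n++;
}
```

With this shape, an mprotect over 70 ptes would cost three flushes (32,
32 and 6 updates) rather than 70 individual traps; the caller has to
issue a final batch_flush once the loop is done.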

> Still, a 7x gain from asynchronous batching is very nice.  I wonder if
> that means the average mprotect size in your benchmark is 7 pages.
>   

Yeah, it's around 7x.  The batching pays off even for single page 
mprotects, because the trap and emulate of xchg is so expensive.

>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>>     
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the 
same trick at that point.  If not, then presumably you take a fault for 
the first pte updated in the mprotect and then sync the shadow up when 
the tlb flush happens; batching that trap and the tlb flush would give 
you some benefit for small mprotects.

    J