Message-ID: <483729E7.9010002@goop.org>
Date: Fri, 23 May 2008 21:32:39 +0100
From: Jeremy Fitzhardinge <jeremy@...p.org>
To: Zachary Amsden <zach@...are.com>
CC: Ingo Molnar <mingo@...e.hu>, LKML <linux-kernel@...r.kernel.org>,
xen-devel <xen-devel@...ts.xensource.com>,
Thomas Gleixner <tglx@...utronix.de>,
Hugh Dickins <hugh@...itas.com>,
kvm-devel <kvm-devel@...ts.sourceforge.net>,
Virtualization Mailing List <virtualization@...ts.osdl.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction

Zachary Amsden wrote:
> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor. How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI? Is this why
> it is still 7x?
>
No, you just use cmpxchg. It's pretty lightweight really. Xen holds a
lock internally to stop other cpus from updating the pte in software, so
the only source of modification is the hardware itself; the cmpxchg loop
is guaranteed to terminate because the A/D bits can only transition from
0->1.
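As a rough illustration of the cmpxchg loop described above, here is a minimal userspace sketch. It is not the Xen code: the function name and the PTE_* macros are made up for the example (the bit positions follow the x86 pte layout), and a C11 atomic compare-exchange stands in for the cmpxchg instruction. Software writers are assumed to be excluded by a lock, so only the A/D bits can change underneath the loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names for illustration; bits 5 and 6 are A and D on x86. */
#define PTE_ACCESSED ((uint64_t)1 << 5)
#define PTE_DIRTY    ((uint64_t)1 << 6)
#define PTE_AD_MASK  (PTE_ACCESSED | PTE_DIRTY)

/*
 * Store newval into *ptep, ORing in any A/D bits found in the current pte.
 * Assumes software writers are serialized by a lock (not shown), so the
 * only concurrent change is hardware setting A or D. Each failed
 * compare-exchange therefore means a bit went 0->1, which can happen at
 * most twice, so the loop terminates.
 * Returns the value that was finally stored.
 */
static uint64_t pte_update_preserve_ad(_Atomic uint64_t *ptep, uint64_t newval)
{
    uint64_t old = atomic_load(ptep);
    uint64_t want;

    do {
        /* Carry over the A/D bits seen in the most recent snapshot. */
        want = newval | (old & PTE_AD_MASK);
        /* On failure, old is refreshed with the current pte contents. */
    } while (!atomic_compare_exchange_strong(ptep, &old, want));

    return want;
}
```

Calling it on a pte that already has the dirty bit set keeps that bit in the stored value, even though the caller's new value did not include it.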

I haven't looked in depth at exactly where the 7x number comes from.
I could increase the batch size (currently a max of 32 pte
updates/hypercall), and some of it is plain overhead from the in-kernel
infrastructure. A simpler, more hackish approach, which basically
pastes the Xen hypercall directly into the mprotect loop, gets the
overhead down to about 5.5x.
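To make the batching idea concrete, here is a small sketch of a queue that collects pte updates and flushes them in groups. Only the limit of 32 updates per batch comes from the numbers above; the struct and function names are hypothetical, and the flush just applies the stores locally and counts itself where the real thing would issue one hypercall.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32  /* max pte updates per simulated hypercall */

/* One queued update: where to write and what to write (hypothetical layout). */
struct pte_update {
    uint64_t *ptep;
    uint64_t val;
};

struct pte_batch {
    struct pte_update req[BATCH_MAX];
    size_t count;           /* updates currently queued */
    unsigned long flushes;  /* simulated hypercalls issued so far */
};

/* Stand-in for the hypercall: apply every queued update in one go. */
static void batch_flush(struct pte_batch *b)
{
    if (b->count == 0)
        return;
    for (size_t i = 0; i < b->count; i++)
        *b->req[i].ptep = b->req[i].val;
    b->flushes++;
    b->count = 0;
}

/* Queue one update, flushing first if the batch is already full. */
static void batch_queue(struct pte_batch *b, uint64_t *ptep, uint64_t val)
{
    if (b->count == BATCH_MAX)
        batch_flush(b);
    b->req[b->count].ptep = ptep;
    b->req[b->count].val = val;
    b->count++;
}
```

The caller queues each pte in the mprotect range and calls batch_flush() once at the end, so the per-trap cost is paid once per batch rather than once per pte.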
> Still, a 7x gain from asynchronous batching is very nice. I wonder if
> that means the average mprotect size in your benchmark is 7 pages.
>
Yeah, it's around 7x. The batching pays off even for single page
mprotects, because the trap and emulate of xchg is so expensive.
>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>>
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the
same trick at that point. If not, then presumably you take a fault for
the first pte updated in the mprotect and then sync the shadow up when
the tlb flush happens; batching that trap and the tlb flush would give
you some benefit for small mprotects.

J