Message-ID: <49AB9336.7010103@goop.org>
Date: Mon, 02 Mar 2009 00:05:10 -0800
From: Jeremy Fitzhardinge <jeremy@...p.org>
To: Nick Piggin <nickpiggin@...oo.com.au>
CC: Andrew Morton <akpm@...ux-foundation.org>,
"H. Peter Anvin" <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Xen-devel <xen-devel@...ts.xensource.com>
Subject: Re: [PATCH] xen: core dom0 support

Nick Piggin wrote:
>> Those would be pertinent questions if I were suddenly popping up and
>> saying "hey, let's add Xen support to the kernel!" But Xen support has
>> been in the kernel for well over a year now, and is widely used, enabled
>> in distros, etc. The patches I'm proposing here are not a whole new
>> thing, they're part of the last 10% to fill out the kernel's support to
>> make it actually useful.
>>
>
> As a guest, I guess it has been agreed that guest support for all
> different hypervisors is "a good thing". dom0 is more like a piece
> of the hypervisor itself, right?
>
Hm, I wouldn't put it like that. dom0 is no more part of the hypervisor
than the hypervisor is part of dom0. The hypervisor provides one set of
services (domain isolation and multiplexing). Domains with direct
hardware access and drivers provide arbitration for virtualized device
access. They provide orthogonal sets of functionality which are both
required to get a working system.

Also, the machinery needed to allow a kernel to operate as dom0 is more
than that: it allows direct access to hardware in general. An otherwise
unprivileged domU can be given access to a specific PCI device via
PCI-passthrough so that it can drive it directly. This is often used
for direct access to 3D hardware, or high-performance networking (esp
with multi-context hardware that's designed for virtualization use).

>> Because Xen is dedicated to just running virtual machines, its internal
>> architecture can be more heavily oriented towards that task, which
>> affects things from how its scheduler works to its use and multiplexing of
>> physical memory. For example, Xen manages to use new hardware
>> virtualization features pretty quickly, partly because it doesn't need
>> to trade-off against normal kernel functions. The clear distinction
>> between the privileged hypervisor and the rest of the domains makes the
>> security people happy as well. Also, because Xen is small and fairly
>> self-contained, there's quite a few hardware vendors shipping it burned
>> into the firmware so that it really is the first thing to boot (many of
>> the instant-on features that laptops have are based on Xen). Both HP and
>> Dell, at least, are selling servers with Xen pre-installed in the firmware.
>>
>
> That would kind of seem like Xen has a better design to me, OTOH if it
> needs this dom0 for most device drivers and things, then how much
> difference is it really? Is KVM really disadvantaged by being a part of
> the kernel?
>
Well, you can lump everything together in dom0 if you want, and that is
a common way to run a Xen system. But there's no reason you can't
disaggregate drivers into their own domains, each with the
responsibility for a particular device or set of devices (or indeed, any
other service you want provided). Xen can use hardware features like
VT-d to really enforce the partitioning so that the domains can't
program their hardware to touch anything except what they're allowed to
touch, so nothing is trusted beyond its actual area of responsibility.
It also means that killing off and restarting a driver domain is a
fairly lightweight and straightforward operation because the state is
isolated and self-contained; guests using a device have to be able to
deal with a disconnect/reconnect anyway (for migration), so it doesn't
affect them much. Part of the reason there's a lot of academic interest
in Xen is because it has the architectural flexibility to try out lots
of different configurations.

I wouldn't say that KVM is necessarily disadvantaged by its design; it's
just a particular set of tradeoffs made up-front. It loses Xen's
flexibility, but the result is very familiar to Linux people. A guest
domain just looks like a qemu process that happens to run in a strange
processor mode a lot of the time. The qemu process provides virtual
device access to its domain, and accesses the normal device drivers like
any other usermode process would. The domains are as isolated from each
other as processes normally are, but they're all floating around
in the same kernel; whether that provides enough isolation for whatever
technical, billing, security, compliance/regulatory or other
requirements you have is up to the user to judge.

>> One important area of paravirtualization is that Xen guests directly
>> use the processor's pagetables; there is no shadow pagetable or use of
>> hardware pagetable nesting. This means that a tlb miss is just a tlb
>> miss, and happens at full processor performance. This is possible
>> because 1) pagetables are always read-only to the guest, and 2) the
>> guest is responsible for looking up in a table to map guest-local pfns
>> into machine-wide mfns before installing them in a pte. Xen will check
>> that any new mapping or pagetable satisfies all the rules, by checking
>> that the writable reference count is 0, and that the domain owns (or has
>> been allowed access to) any mfn it tries to install in a pagetable.
>>
>
> Xen's memory virtualization is pretty neat, I'll give it that. Is it
> faster than KVM on a modern CPU?

It really depends on the workload. There are three cases to consider:
software shadow pagetables, hardware nested pagetables, and Xen direct
pagetables. Even now, Xen's (highly optimised) shadow pagetable code
generally out-performs modern nested pagetables, at least when running
Windows (for which that code was most heavily tuned). Shadow pagetables
and nested pagetables will generally outperform direct pagetables when
the workload does lots of pagetable updates compared to accesses. (I
don't know what the current state of kvm's shadow pagetable performance
is, but it seems OK.)

But if you're mostly accessing the pagetable, direct pagetables still
win. A direct-pagetable tlb miss costs 4 memory accesses, whereas a nested
pagetable tlb miss needs 24 memory accesses; and a nested tlb hit means
that you have 24 tlb entries being tied up to service the hit, vs 4.
(Though the chip vendors are fairly secretive about exactly how they
structure their tlbs to deal with nested lookups, so I may be off
here.) (It also depends on whether you arrange to put the guest, host
or both memory into large pages; doing so helps a lot.)
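
To spell out where the 4 and 24 come from, here is a back-of-the-envelope
sketch in Python. It assumes 4-level pagetables in both guest and host,
and no paging-structure caches or other walk shortcuts (which real
hardware may well have). The function names are made up for illustration.

```python
def direct_walk_accesses(levels: int = 4) -> int:
    """Memory reads for one tlb miss with direct (or native) pagetables."""
    # One read per pagetable level; the entries already hold machine addresses.
    return levels


def nested_walk_accesses(guest_levels: int = 4, host_levels: int = 4) -> int:
    """Memory reads for one tlb miss with nested pagetables (assumed model)."""
    # Each guest pagetable pointer is a guest-physical address, so reading a
    # guest entry costs a full host walk plus the read of the entry itself.
    per_guest_level = host_levels + 1
    # The guest-physical address that comes out of the guest walk needs one
    # more host walk before the data can be reached.
    final_translation = host_levels
    return guest_levels * per_guest_level + final_translation


print(direct_walk_accesses())   # 4
print(nested_walk_accesses())   # 24
```

With 4 levels on each side that is 4 * (4 + 1) + 4 = 24 reads, against 4
for a direct walk.
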
> Would it be possible I wonder to make
> a MMU virtualization layer for CPUs without support, using Xen's page
> table protection methods, and have KVM use that? Or does that amount
> to putting a significant amount of Xen hypervisor into the kernel..?
>
At one point Avi was considering doing it, but I don't think he ever
made any real effort in that direction. KVM is pretty wedded to having
hardware support anyway, so there's not much point in removing it in
this one area.

The Xen technique gets its performance from collapsing a level of
indirection, but that has a cost in terms of flexibility; the hypervisor
can't do as much mucking around behind the guest's back (for example,
the guest sees real hardware memory addresses in the form of mfns, so
Xen can't move pages around, at least not without some form of explicit
synchronisation).
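
As a rough illustration of the direct-pagetable rules discussed in this
thread (the guest translates pfn to mfn itself, and Xen validates what
gets installed), here is a toy model in Python. It is only a sketch:
the class, its methods and the p2m dict are hypothetical names, not
Xen's real hypercall interface, and the "has been allowed access to"
case is simplified to plain ownership.

```python
class ToyHypervisor:
    """Toy model of the validation rules; not Xen's real interface."""

    def __init__(self):
        self.owner = {}          # mfn -> id of the owning domain
        self.writable_refs = {}  # mfn -> number of writable mappings
        self.pagetables = set()  # mfns currently in use as pagetables

    def give_frame(self, domid, mfn):
        self.owner[mfn] = domid
        self.writable_refs[mfn] = 0

    def _check_owner(self, domid, mfn):
        if self.owner.get(mfn) != domid:
            raise PermissionError(f"dom{domid} does not own mfn {mfn}")

    def make_pagetable(self, domid, mfn):
        # A frame may only become a pagetable if nothing maps it writable.
        self._check_owner(domid, mfn)
        if self.writable_refs[mfn] != 0:
            raise PermissionError(f"mfn {mfn} still has writable mappings")
        self.pagetables.add(mfn)

    def install_pte(self, domid, mfn, writable):
        # The domain must own the target frame, and pagetable frames must
        # stay read-only to the guest.
        self._check_owner(domid, mfn)
        if writable:
            if mfn in self.pagetables:
                raise PermissionError(f"mfn {mfn} is a pagetable")
            self.writable_refs[mfn] += 1
        return (mfn, writable)


def guest_map(hv, domid, p2m, pfn, writable):
    # Guest side: translate the guest-local pfn to a machine-wide mfn
    # before asking the hypervisor to install the mapping.
    mfn = p2m[pfn]
    return hv.install_pte(domid, mfn, writable)


hv = ToyHypervisor()
hv.give_frame(1, 10)
hv.give_frame(1, 11)
hv.give_frame(2, 20)
p2m = {0: 10, 1: 11}                   # dom1's pfn -> mfn table
hv.make_pagetable(1, 10)
print(guest_map(hv, 1, p2m, 1, True))  # (11, True)
```

Because the pte holds the mfn itself, moving the page behind mfn 11 in
this model would mean finding and rewriting every pte that refers to
it, which is the flexibility cost mentioned above.
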
J