Message-ID: <4BAA0DFE.1080700@redhat.com>
Date: Wed, 24 Mar 2010 15:05:02 +0200
From: Avi Kivity <avi@...hat.com>
To: Joerg Roedel <joro@...tes.org>
CC: Anthony Liguori <anthony@...emonkey.ws>,
Ingo Molnar <mingo@...e.hu>,
Pekka Enberg <penberg@...helsinki.fi>,
"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Sheng Yang <sheng@...ux.intel.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Marcelo Tosatti <mtosatti@...hat.com>,
Jes Sorensen <Jes.Sorensen@...hat.com>,
Gleb Natapov <gleb@...hat.com>, ziteng.huang@...el.com,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Frédéric Weisbecker <fweisbec@...il.com>,
Gregory Haskins <ghaskins@...ell.com>
Subject: Re: [RFC] Unify KVM kernel-space and user-space code into a single
project
On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>
>> You can always provide the kernel and module paths as command line
>> parameters. It just won't be transparently usable, but if you're using
>> qemu from the command line, presumably you can live with that.
>>
> I don't want the tool for myself only. A typical perf user expects that
> it works transparently.
>
A typical kvm user uses libvirt, so we can integrate it with that.
>>> Could be easily done using notifier chains already in the kernel.
>>> Probably implemented with much less than 100 lines of additional code.
>>>
>> And a userspace interface for that.
>>
> Not necessarily. The perf event is configured to measure systemwide kvm
> by userspace. The kernel side of perf takes care that it stays
> system-wide even with added vm instances. So in this case the consumer
> for the notifier would be the perf kernel part. No userspace interface
> required.
>
Someone needs to know about the new guest to fetch its symbols. Or do
you want that part in the kernel too?
>> If we make an API, I'd like it to be generally useful.
>>
> That's hard to do at this point since we don't know what people will use
> it for. We should keep it simple in the beginning and add new features
> as they are requested and make sense in this context.
>
IMO this use case is too rare to warrant its own API, especially as there
are alternatives.
>> It's a total headache. For example, we'd need security module hooks to
>> determine access permissions. So far we managed to avoid that since kvm
>> doesn't allow you to access any information beyond what you provided it
>> directly.
>>
> Depends on how it is designed. A filesystem approach was already
> mentioned. We could create /sys/kvm/ for example to expose information
> about virtual machines to userspace. This would not require any new
> security hooks.
>
Who would set the security context on those files? Plus, we need cgroup
support so you can't see one container's guests from an unrelated container.
>> Copying the objects is a one time cost. If you run perf for more than a
>> second or two, it would fetch and cache all of the data. It's really
>> the same problem with non-guest profiling, only magnified a bit.
>>
> I don't think we can cache filesystem data of a running guest on the
> host. It is too hard to keep such a cache coherent.
>
I don't see any choice. The guest can change its symbols at any time
(say by kexec), without any notification.
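
To illustrate the caching argument, here is a minimal sketch of a host-side symbol cache that revalidates on every lookup instead of relying on a notification. Everything in it is an assumption made for illustration: `GuestSymbolCache`, the two callbacks, and the idea of a cheap per-guest "fingerprint" (for example a build ID or a hash of the running kernel image) are hypothetical, and nothing like this exists in perf or kvm.

```python
from typing import Callable, Dict, Tuple

SymbolTable = Dict[int, str]  # address -> symbol name


class GuestSymbolCache:
    """Hypothetical host-side cache of guest symbols.

    The guest may replace its kernel (say via kexec) without telling the
    host, so each lookup first asks for a cheap fingerprint and only
    re-fetches the expensive symbol table when the fingerprint changed.
    """

    def __init__(self,
                 fetch_fingerprint: Callable[[str], str],
                 fetch_symbols: Callable[[str], SymbolTable]) -> None:
        # Both callbacks are assumed to be supplied by whatever talks to
        # the guest (qemu, libvirt, an agent); they are not real APIs.
        self._fetch_fingerprint = fetch_fingerprint
        self._fetch_symbols = fetch_symbols
        self._cache: Dict[str, Tuple[str, SymbolTable]] = {}

    def symbols(self, guest_id: str) -> SymbolTable:
        fingerprint = self._fetch_fingerprint(guest_id)
        cached = self._cache.get(guest_id)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # still valid, no copy needed
        # First lookup, or the guest changed its symbols: pay the copy cost.
        table = self._fetch_symbols(guest_id)
        self._cache[guest_id] = (fingerprint, table)
        return table
```

The copy stays a one-time cost per kernel image; the price is one cheap fingerprint query per lookup, and samples taken between a kexec and the next lookup would still be resolved against stale symbols.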
>>>> Other userspaces can also provide this functionality, like they have to
>>>> provide disk, network, and display emulation. The kernel is not a huge
>>>> library.
>>>>
> If two userspaces run in parallel what is the single instance where perf
> can get a list of guests from?
>
I don't know. Surely that's solvable though.
>> kvm.ko has only a small subset of the information that is used to define
>> a guest.
>>
> The subset is not small. It contains all guest vcpus, the complete
> interrupt routing hardware emulation and even manages the guest's
> memory.
>
It doesn't contain most of the mmio and pio address space. Integration
with qemu would allow perf to tell us that the guest is hitting the
interrupt status register of a virtio-blk device in pci slot 5 (the
information is already available through the kvm_mmio trace event, but
only qemu can decode it).
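
As a rough sketch of what that decoding could look like: the snippet below maps the guest physical address from a kvm_mmio trace line onto a device register using an address map of the kind only qemu has. The trace line layout ("mmio <type> len <n> gpa 0x<addr> val 0x<val>") is my assumption about the event's output, and the `Region` type, base address, and register offset are made up for illustration.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Region(NamedTuple):
    """One entry of a hypothetical qemu-side MMIO address map."""
    start: int                 # guest physical base address
    size: int                  # length of the region in bytes
    device: str                # human-readable device description
    registers: Dict[int, str]  # offset within the region -> register name


# Assumed layout of a kvm_mmio trace line.
MMIO_RE = re.compile(
    r"mmio (\S+) len (\d+) gpa 0x([0-9a-fA-F]+) val 0x([0-9a-fA-F]+)")


def decode(line: str, regions: List[Region]) -> Optional[str]:
    """Describe a kvm_mmio trace line in device terms, or return None
    if the line does not look like a kvm_mmio event."""
    m = MMIO_RE.search(line)
    if m is None:
        return None
    access, length, gpa = m.group(1), int(m.group(2)), int(m.group(3), 16)
    for region in regions:
        if region.start <= gpa < region.start + region.size:
            offset = gpa - region.start
            reg = region.registers.get(offset, "offset 0x%x" % offset)
            return "%s %s of %s (len %d)" % (access, reg, region.device, length)
    return "%s unmapped gpa 0x%x (len %d)" % (access, gpa, length)


# Made-up map: base address and register offset are illustrative only.
EXAMPLE_MAP = [
    Region(0xfebf0000, 0x1000, "virtio-blk in pci slot 5",
           {0x13: "interrupt status"}),
]

if __name__ == "__main__":
    print(decode("kvm_mmio: mmio read len 1 gpa 0xfebf0013 val 0x1",
                 EXAMPLE_MAP))
```

kvm.ko sees only the address and the value; the table that turns them into "interrupt status of virtio-blk" lives in the device model.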
--
error compiling committee.c: too many arguments to function