Message-ID: <4BA7CFF4.8080102@redhat.com>
Date: Mon, 22 Mar 2010 22:15:48 +0200
From: Avi Kivity <avi@...hat.com>
To: Ingo Molnar <mingo@...e.hu>
CC: Anthony Liguori <anthony@...emonkey.ws>,
Pekka Enberg <penberg@...helsinki.fi>,
"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Sheng Yang <sheng@...ux.intel.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Marcelo Tosatti <mtosatti@...hat.com>,
Joerg Roedel <joro@...tes.org>,
Jes Sorensen <Jes.Sorensen@...hat.com>,
Gleb Natapov <gleb@...hat.com>,
Zachary Amsden <zamsden@...hat.com>, ziteng.huang@...el.com,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Frédéric Weisbecker <fweisbec@...il.com>,
Gregory Haskins <ghaskins@...ell.com>
Subject: Re: [RFC] Unify KVM kernel-space and user-space code into a single project
On 03/22/2010 10:06 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@...hat.com> wrote:
>
>
>> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
>>
>>> * Avi Kivity<avi@...hat.com> wrote:
>>>
>>>
>>>>> Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>>>> Anthony. There are numerous ways that this can break:
>>>>>
>>>> I don't like it either. We have libvirt for enumerating guests.
>>>>
>>> Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
>>> obviously.
>>>
>> It doesn't follow. The libvirt daemon could/should own guests from all
>> users. I don't know if it does so now, but nothing is preventing it
>> technically.
>>
> It's hard for me to argue against a hypothetical implementation, but all
> user-space driven solutions for resource enumeration I've seen so far had
> weaknesses that kernel-based solutions don't have.
>
Correct. Kernel-based solutions also have issues.
>> If qemu hangs, the guest hangs a few milliseconds later.
>>
> I think you didn't understand my point. I am talking about 'perf kvm top'
> hanging if Qemu hangs.
>
Use non-blocking I/O, report that guest as dead. No point in profiling
it, it isn't making any progress.
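
As a rough illustration of that approach, the sketch below probes every
monitor socket in a directory with a bounded timeout and reports a guest as
dead if it does not answer in time, so the enumerating tool itself never
blocks on a hung process. It is Python for brevity. The names `probe_guest`
and `enumerate_guests`, the 0.2 second timeout, and the assumption that a live
monitor sends some bytes on connect (as a QMP greeting would) are illustrative
choices, not something perf or qemu is known to implement.

```python
import os
import socket


def probe_guest(sock_path, timeout=0.2):
    """Return 'alive' if the socket answers within `timeout`, else 'dead'."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # A timeout puts the socket in non-blocking mode internally and bounds
    # both connect() and recv(), so a hung peer cannot stall the caller.
    s.settimeout(timeout)
    try:
        s.connect(sock_path)
        data = s.recv(4096)  # assumed: a live monitor greets on connect
        return "alive" if data else "dead"
    except OSError:
        # Covers timeouts, refused connections, and stale or missing paths.
        return "dead"
    finally:
        s.close()


def enumerate_guests(directory, timeout=0.2):
    """Map each entry in `directory` to 'alive' or 'dead'."""
    return {
        name: probe_guest(os.path.join(directory, name), timeout)
        for name in sorted(os.listdir(directory))
    }
```

A profiler built this way would simply skip, or flag, the entries reported as
dead and carry on with the rest.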
> With a proper in-kernel enumeration the kernel would always guarantee the
> functionality, even if the vcpu does not make progress (i.e. it's "hung").
>
> With this implemented in Qemu we lose that kind of robustness guarantee.
>
If qemu has a bug in the resource enumeration code, you can't profile
one guest. If the kernel has a bug in the resource enumeration code,
the system either panics or needs to be rebooted later.
> And especially during development (when developers use instrumentation the
> most) it is important to have robust instrumentation that does not hang along
> with the Qemu process.
>
It's nice not to have kernel oopses either. So when code can be in
userspace, that's where it should be.
>> If qemu fails, you lose your guest. If libvirt forgets about a
>> guest, you can't do anything with it any more. These are more
>> serious problems than 'perf kvm' not working. [...]
>>
> How on earth can you justify a bug ("perf kvm top" hanging) by pointing out
> that there are other bugs as well?
>
There's no reason for 'perf kvm top' to hang if some process is not
responsive. That would be a perf bug.
> Basically you are arguing the equivalent of saying it would be fine for a gdb
> session to become unresponsive if the debugged task hangs. Fortunately ptrace
> is kernel-based and it never 'hangs' if the user-space process hangs somewhere.
>
Neither gdb nor perf should hang.
> This is an essential property of good instrumentation.
>
> So the enumeration method you suggested is a poor, sub-par solution, simple
> as that.
>
Or, you misunderstood it.
>> [...] Qemu and libvirt have to be robust anyway, we can rely on them. Like
>> we have to rely on init, X, sshd, and a zillion other critical tools.
>>
> We can still profile any of those tools without the profiler breaking if the
> debugged tool breaks ...
>
You can't profile without qemu.
>>> By your argument it would be perfectly fine to implement /proc purely via
>>> user-space, correct?
>>>
>> I would have preferred /proc to be implemented via syscalls called directly
>> from tools, and good tools written to expose the information in it. When
>> computers were slower 'top' would spend tons of time opening and closing all
>> those tiny files and parsing them. Of course the kernel needs to provide
>> the information.
>>
> (Then you'll be pleased to hear that perf has enabled exactly that, and that
> we are working towards that precise use case.)
>
Are you exporting /proc/pid data via the perf syscall? If so, I think
that's a good move.
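
To make the /proc cost mentioned above concrete, here is a minimal sketch of
what a top-like tool has to do for every sample: open, read, and parse one
small text file per process. The field positions follow proc(5) (utime and
stime are fields 14 and 15 of /proc/[pid]/stat). The helper names `parse_stat`
and `sample_all` are hypothetical, and the code assumes a Linux-style /proc
layout.

```python
import os


def parse_stat(line):
    """Parse one /proc/[pid]/stat line into a small dict."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    lpar = line.index("(")
    rpar = line.rindex(")")
    rest = line[rpar + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": int(line[:lpar]),
        "comm": line[lpar + 1:rpar],
        "state": rest[0],
        "utime": int(rest[11]),  # field 14
        "stime": int(rest[12]),  # field 15
    }


def sample_all(proc="/proc"):
    """One open/read/parse/close cycle per process, as top has to do."""
    samples = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                samples.append(parse_stat(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file was unreadable/malformed
    return samples
```

Every refresh repeats that loop over all processes, which is the overhead a
direct syscall interface would avoid.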
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.