linux-kernel - Re: [RFC] Unify KVM kernel-space and user-space code into a single project

Message-ID: <4BA7CFF4.8080102@redhat.com>
Date:	Mon, 22 Mar 2010 22:15:48 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Anthony Liguori <anthony@...emonkey.ws>,
	Pekka Enberg <penberg@...helsinki.fi>,
	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Sheng Yang <sheng@...ux.intel.com>,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Joerg Roedel <joro@...tes.org>,
	Jes Sorensen <Jes.Sorensen@...hat.com>,
	Gleb Natapov <gleb@...hat.com>,
	Zachary Amsden <zamsden@...hat.com>, ziteng.huang@...el.com,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Gregory Haskins <ghaskins@...ell.com>
Subject: Re: [RFC] Unify KVM kernel-space and user-space code into a single project

On 03/22/2010 10:06 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@...hat.com>  wrote:
>
>    
>> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@...hat.com>   wrote:
>>>
>>>        
>>>>> Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>>>> Anthony. There's numerous ways that this can break:
>>>>>            
>>>> I don't like it either.  We have libvirt for enumerating guests.
>>>>          
>>> Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
>>> obviously.
>>>        
>> It doesn't follow.  The libvirt daemon could/should own guests from all
>> users.  I don't know if it does so now, but nothing is preventing it
>> technically.
>>      
> It's hard for me to argue against a hypothetical implementation, but all
> user-space driven solutions for resource enumeration I've seen so far had
> weaknesses that kernel-based solutions don't have.
>    

Correct.  Kernel-based solutions also have issues.

>> If qemu hangs, the guest hangs a few milliseconds later.
>>      
> I think you didn't understand my point. I am talking about 'perf kvm top'
> hanging if Qemu hangs.
>    

Use non-blocking I/O, report that guest as dead.  No point in profiling 
it, it isn't making any progress.
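To make the non-blocking suggestion concrete, here is a minimal sketch of
how a tool could probe a per-guest Unix socket without ever blocking on it.
Everything in it is an assumption for illustration: the probe_guest() name,
the idea that the endpoint sends some greeting bytes after connect, and the
0.5 second timeout.  It is not how perf or qemu actually do this.

```python
import select
import socket


def probe_guest(sock_path, timeout=0.5):
    """Return "alive" if the endpoint greets us within `timeout`, else "dead".

    Hypothetical helper: assumes the guest's monitor socket sends some
    bytes right after a client connects.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.setblocking(False)  # no call on this socket may block the tool
    try:
        # Non-zero errno: no such socket, nobody listening, or backlog full.
        if s.connect_ex(sock_path) != 0:
            return "dead"
        # Wait for the greeting, but only for a bounded amount of time.
        readable, _, _ = select.select([s], [], [], timeout)
        if not readable:
            return "dead"  # connected, but the process is not responding
        # Empty read means the peer closed without saying anything.
        return "alive" if s.recv(4096) else "dead"
    except OSError:
        return "dead"
    finally:
        s.close()
```

A hung process that still holds its listening socket open lets the connect
succeed but never sends a greeting, so the probe returns "dead" after the
timeout and the tool can move on to the next guest.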

> With a proper in-kernel enumeration the kernel would always guarantee the
> functionality, even if the vcpu does not make progress (i.e. it's "hung").
>
> With this implemented in Qemu we lose that kind of robustness guarantee.
>    

If qemu has a bug in the resource enumeration code, you can't profile 
one guest.  If the kernel has a bug in the resource enumeration code, 
the system either panics or needs to be rebooted later.

> And especially during development (when developers use instrumentation the
> most) it is important to have robust instrumentation that does not hang along
> with the Qemu process.
>    

It's nice not to have kernel oopses either.  So when code can be in 
userspace, that's where it should be.

>> If qemu fails, you lose your guest.  If libvirt forgets about a
>> guest, you can't do anything with it any more.  These are more
>> serious problems than 'perf kvm' not working. [...]
>>      
> How on earth can you justify a bug ("perf kvm top" hanging) by pointing out
> that there are other bugs as well?
>    

There's no reason for 'perf kvm top' to hang if some process is not 
responsive.  That would be a perf bug.

> Basically you are arguing the equivalent that a gdb session would be fine to
> become unresponsive if the debugged task hangs. Fortunately ptrace is
> kernel-based and it never 'hangs' if the user-space process hangs somewhere.
>    

Neither gdb nor perf should hang.

> This is an essential property of good instrumentation.
>
> So the enumeration method you suggested is a poor, sub-par solution, simple
> as that.
>    

Or, you misunderstood it.

>> [...] Qemu and libvirt have to be robust anyway, we can rely on them.  Like
>> we have to rely on init, X, sshd, and a zillion other critical tools.
>>      
> We can still profile any of those tools without the profiler breaking if the
> debugged tool breaks ...
>    

You can't profile without qemu.

>>> By your argument it would be perfectly fine to implement /proc purely via
>>> user-space, correct?
>>>        
>> I would have preferred /proc to be implemented via syscalls called directly
>> from tools, and good tools written to expose the information in it.  When
>> computers were slower 'top' would spend tons of time opening and closing all
>> those tiny files and parsing them.  Of course the kernel needs to provide
>> the information.
>>      
> (Then you'll be pleased to hear that perf has enabled exactly that, and that we
> are working towards that precise usecase.)
>    

Are you exporting /proc/pid data via the perf syscall?  If so, I think 
that's a good move.
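For readers unfamiliar with the /proc cost mentioned in the quoted text
above, the following is a rough sketch of what a 'top'-like tool has to do
per process: open a small text file, read it, and parse it.  The function
names are made up for illustration; the field positions follow the
documented /proc/<pid>/stat layout (pid, comm in parentheses, state, ...,
utime as field 14, stime as field 15).

```python
import os


def parse_stat_line(line):
    """Extract a few fields from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it via the first '(' and the last ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    rest = line[rpar + 1:].split()  # rest[0] is field 3 (state)
    return {
        "pid": int(line[:lpar]),
        "comm": line[lpar + 1:rpar],
        "state": rest[0],
        "utime": int(rest[11]),  # field 14, in clock ticks
        "stime": int(rest[12]),  # field 15, in clock ticks
    }


def scan_proc(root="/proc"):
    """One open/read/close/parse cycle per process, as the text describes."""
    results = []
    for name in os.listdir(root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(root, name, "stat")) as f:
                results.append(parse_stat_line(f.read()))
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the line was unexpected
    return results
```

A syscall that returned the same fields in binary form would avoid both the
per-file open/close and the text parsing, which is the preference expressed
in the quoted paragraph.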

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
