linux-kernel - Re: hardwired VMI crap

Message-ID: <45F09521.1060308@vmware.com>
Date:	Thu, 08 Mar 2007 14:58:41 -0800
From:	Zachary Amsden <zach@...are.com>
To:	Andi Kleen <ak@...e.de>
CC:	tglx@...utronix.de, Ingo Molnar <mingo@...e.hu>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	john stultz <johnstul@...ibm.com>, akpm@...ux-foundation.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Pratap Subrahmanyam <pratap@...are.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Daniel Hecht <dhecht@...are.com>,
	Daniel Arai <arai@...are.com>,
	Chris Wright <chrisw@...s-sol.org>,
	Virtualization Mailing List <virtualization@...ts.osdl.org>
Subject: Re: hardwired VMI crap

Andi Kleen wrote:

> At least in Linux we don't really work with deadlines; if there 
> are issues they need to be fixed even if it takes longer. I don't 
> expect the version in .21 to be really usable anyways; it is clearly
> still in development.
>   

It was working, and I expect to have it working again.  It is not in 
development, but we urgently need to find a way to fix the problems 
created when Ingo hobbled it by removing NO_IDLE_HZ code from 2.6.21.

>   
>> we re-used the APIC and IO-APIC, this is  
>> uber rocket science.  We've been doing things this way, with public 
>> patches for over a year, and you've even been CC'd on some of the 
>> discussions.  So it is a little late to tell us - "redesign your 
>> hypervisor, or else.."
>>     
>
> It shouldn't touch the hypervisor, just the paravirt VMI backend shouldn't it?
> I assume you could do a very minimal APIC layer that is just enough to 
> talk to your softapic and a genapic backend for IPIs.
>
> At least I would welcome anything that shrinks the number of 
> paravirt hooks.
>
> I'm just not sure it would be less hooks: you would probably need
> functions to start other CPUs at least.
>   
 
Anything that attempts to create this uber multi-virtual interrupt / 
timer / IPI / clock management beast is going to add a huge number of 
paravirt hooks, because the vendor backends will be different for all of 
these.

>  
> I must admit I also didn't quite get what was the big problem with
> hooking apic_read/apic_write.
>   

You mean why we need them?  They make APIC writes faster, since 
otherwise each write would trap and be emulated, which is slow, and the 
APIC is on critical paths.  Or why people object to them?  I don't get 
the latter either.
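
To make the hook argument concrete, here is a minimal user-space sketch 
of the idea.  It is not the VMI or paravirt code; every name in it is 
hypothetical, and the "register page" is just an array.  It only shows 
how routing accesses through a small table of function pointers lets a 
backend replace a trapping MMIO access with a direct call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 4 KiB local APIC register window:
 * registers sit on 16-byte boundaries, so index = offset >> 4. */
#define SKETCH_APIC_REGS 256
#define SKETCH_APIC_EOI  0xB0u  /* architectural EOI register offset */

static uint32_t sketch_apic_regs[SKETCH_APIC_REGS];
static unsigned long sketch_traps;      /* accesses that would trap and be emulated */
static unsigned long sketch_hypercalls; /* accesses handed straight to the backend */

/* Hook table (hypothetical): one read and one write entry point. */
struct sketch_apic_ops {
    uint32_t (*read)(unsigned int offset);
    void (*write)(unsigned int offset, uint32_t value);
};

static uint32_t *sketch_reg(unsigned int offset)
{
    assert((offset & 0xFu) == 0 && (offset >> 4) < SKETCH_APIC_REGS);
    return &sketch_apic_regs[offset >> 4];
}

/* Unhooked path: models a plain MMIO access that the hypervisor must trap. */
static uint32_t sketch_mmio_read(unsigned int offset)
{
    sketch_traps++;
    return *sketch_reg(offset);
}

static void sketch_mmio_write(unsigned int offset, uint32_t value)
{
    sketch_traps++;
    *sketch_reg(offset) = value;
}

/* Hooked path: models a direct call into the hypervisor backend. */
static uint32_t sketch_pv_read(unsigned int offset)
{
    sketch_hypercalls++;
    return *sketch_reg(offset);
}

static void sketch_pv_write(unsigned int offset, uint32_t value)
{
    sketch_hypercalls++;
    *sketch_reg(offset) = value;
}

/* Default to the trapping path; a paravirt backend swaps the table. */
static struct sketch_apic_ops sketch_apic = { sketch_mmio_read, sketch_mmio_write };

void sketch_use_paravirt_apic(void)
{
    sketch_apic.read = sketch_pv_read;
    sketch_apic.write = sketch_pv_write;
}

/* Callers always go through these two wrappers. */
uint32_t sketch_apic_read(unsigned int offset)
{
    return sketch_apic.read(offset);
}

void sketch_apic_write(unsigned int offset, uint32_t value)
{
    sketch_apic.write(offset, value);
}
```

Callers are unchanged either way: an EOI is still 
sketch_apic_write(SKETCH_APIC_EOI, 0), and only the table decides 
whether that costs a trap or a direct call.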

> For the timer you just need to use a own exclusive 
> clocksource that never touches PIT.
>   

We have that working fine.  It is getting the clock event to work 
independently from the lapic timer that is difficult because of the i386 
backend.
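
The split being discussed can be sketched as two independent objects: 
something that reads the current time, and something that arms the next 
timer event.  The following is a simplified user-space model under my 
own assumptions, not the kernel's clocksource or clockevents API; all 
names are hypothetical, and the mult/shift scaling is only similar in 
spirit to how cycle counts are usually converted to nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Time source (hypothetical): reads a free-running counter.
 * Nanoseconds are (cycles * mult) >> shift. */
struct sketch_clocksource {
    uint64_t (*read_cycles)(void);
    uint32_t mult;
    uint32_t shift;
};

/* Event device (hypothetical): arms the next timer interrupt.
 * It knows nothing about the PIT or the lapic timer. */
struct sketch_clock_event {
    void (*set_next_event)(uint64_t delta_ns);
};

static uint64_t sketch_hv_counter;     /* fake hypervisor cycle counter */
static uint64_t sketch_hv_deadline_ns; /* last deadline handed to the "hypervisor" */

static uint64_t sketch_hv_read(void)
{
    return sketch_hv_counter;
}

/* Assumed 500 MHz counter: 2 ns per cycle, so mult = 2 << 10, shift = 10. */
static const struct sketch_clocksource sketch_hv_clock = {
    sketch_hv_read, 2048, 10
};

/* Small values only; a real implementation must guard against overflow. */
uint64_t sketch_now_ns(const struct sketch_clocksource *cs)
{
    assert(cs != 0 && cs->read_cycles != 0);
    return (cs->read_cycles() * cs->mult) >> cs->shift;
}

static void sketch_hv_set_next_event(uint64_t delta_ns)
{
    sketch_hv_deadline_ns = sketch_now_ns(&sketch_hv_clock) + delta_ns;
}

static const struct sketch_clock_event sketch_hv_timer = {
    sketch_hv_set_next_event
};

/* Arm the timer through the event device only. */
void sketch_arm_timer(uint64_t delta_ns)
{
    sketch_hv_timer.set_next_event(delta_ns);
}
```

In this model the time source and the event device never reference 
each other's hardware, which is the independence that is hard to get 
when the event side is tied to the lapic timer.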

>   
>> We faithfully emulate lapic, io_apic, the pit, pic, and a normal 
>> interrupt subsystem. We can't magically stop using these things because  
>> we have to support traditional full virtualization.  Which means any 
>> version of Linux, virtual interrupt controller or not, is going to boot 
>> up, find these things, and try to use them.  So for a paravirt kernel, 
>> either we have to disable each of these things in the Linux code or just 
>> re-use them.
>>     
>
> If you don't enable them they should be already disabled as default 
> state, shouldn't they? 
>
> With an own custom clocksource and possible own APIC layer nobody
> would ever enable the APICs.
>   

But we enable and use them, in both full-virt and paravirt mode.  So we 
really would need to duplicate the code almost exactly for our "virtual 
interrupt controller", which would really just be a wrapper on top of a 
nearly identical APIC or IO-APIC implementation.

>> 1) Rewrite the interrupt subsystem of our hypervisor, making it 
>> incompatible with full virtualization, so that we can support an 
>> abstract interrupt controller with a "clean" interface
>>     
>
> What do you mean with rewrite? It's quite easy to add a new
> backend to the generic IRQ code. They aren't a lot of code.
>   

Yes, but we would then need to duplicate the APIC or IO-APIC 
implementation, because that is the hardware we emulate and use.  We 
just want a different way to fire local timers, that is all.

> You could probably do a much simpler version, couldn't you? A lot of 
> the stuff in apic.c/io_apic.c shouldn't be needed for a clean virtual
> interface. But yes it would probably be still a lot of code.
>   

Yes, we could do a cleaner simpler version.  But then we need to write 
this new interrupt controller code for both the hypervisor and for 
Linux.  And the fact that it is cleaner doesn't make it any nicer or 
perform any better - it is just another dependency between the kernel 
and hypervisor that then becomes hard to change.  So we would rather 
stay as close to the hardware design as possible.

> Still (2) is probably best for now, but the other alternatives
> are not as ridiculous as you paint them.
>   

We have (2) working.  But Thomas apparently hated it.  The idea I have 
about a single-IRQ source interrupt controller for timers seems pretty 
nice, and does almost exactly encapsulate the one difference we have 
from standard APIC / IO-APIC hardware - a different way to drive local 
timers.
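
As a rough illustration of what a single-IRQ-source controller amounts 
to, here is a small user-space model.  It is only a sketch under my own 
assumptions, not the proposed kernel code: the structure, the function 
names, and the mask / pending semantics are all hypothetical.  The 
point is how little state one timer source needs compared with a full 
APIC or IO-APIC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*sketch_timer_handler)(void *ctx);

/* One interrupt source, used only for the local timer (hypothetical). */
struct sketch_timer_irq {
    bool masked;                  /* delivery currently blocked */
    bool pending;                 /* raised but not yet delivered */
    sketch_timer_handler handler; /* called on delivery */
    void *ctx;
};

/* Deliver a pending timer interrupt if nothing blocks it. */
static void sketch_timer_irq_deliver(struct sketch_timer_irq *src)
{
    if (src->masked || !src->pending || src->handler == NULL)
        return;
    src->pending = false;         /* acknowledge before running the handler */
    src->handler(src->ctx);
}

/* Called when the (virtual) timer fires. */
void sketch_timer_irq_raise(struct sketch_timer_irq *src)
{
    assert(src != NULL);
    src->pending = true;
    sketch_timer_irq_deliver(src);
}

void sketch_timer_irq_mask(struct sketch_timer_irq *src)
{
    assert(src != NULL);
    src->masked = true;
}

/* Unmasking replays an interrupt that arrived while masked. */
void sketch_timer_irq_unmask(struct sketch_timer_irq *src)
{
    assert(src != NULL);
    src->masked = false;
    sketch_timer_irq_deliver(src);
}

/* Example handler: counts ticks in the unsigned int that ctx points to. */
void sketch_count_tick(void *ctx)
{
    (*(unsigned int *)ctx)++;
}
```

Everything else (device interrupts, IPIs) would stay on the existing 
APIC / IO-APIC paths; only timer delivery would go through a source 
like this.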

Thanks for your feedback,

Zach