Message-ID: <45F09521.1060308@vmware.com>
Date: Thu, 08 Mar 2007 14:58:41 -0800
From: Zachary Amsden <zach@...are.com>
To: Andi Kleen <ak@...e.de>
CC: tglx@...utronix.de, Ingo Molnar <mingo@...e.hu>,
Jeremy Fitzhardinge <jeremy@...p.org>,
john stultz <johnstul@...ibm.com>, akpm@...ux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Pratap Subrahmanyam <pratap@...are.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Daniel Hecht <dhecht@...are.com>,
Daniel Arai <arai@...are.com>,
Chris Wright <chrisw@...s-sol.org>,
Virtualization Mailing List <virtualization@...ts.osdl.org>
Subject: Re: hardwired VMI crap
Andi Kleen wrote:
> At least in Linux we don't really work with deadlines; if there
> are issues they need to be fixed even if it takes longer. I don't
> expect the version in .21 to be really usable anyways; it is clearly
> still in development.
>
It was working, and I expect to have it working again. It is not in
development, but we urgently need to find a way to fix the problems
created when Ingo hobbled it by removing the NO_IDLE_HZ code from 2.6.21.
>
>> we re-used the APIC and IO-APIC, this is
>> uber rocket science. We've been doing things this way, with public
>> patches for over a year, and you've even been CC'd on some of the
>> discussions. So it is a little late to tell us - "redesign your
>> hypervisor, or else.."
>>
>
> It shouldn't touch the hypervisor, just the paravirt VMI backend, should it?
> I assume you could do a very minimal APIC layer that is just enough to
> talk to your softapic and a genapic backend for IPIs.
>
> At least I would welcome anything that shrinks the number of
> paravirt hooks.
>
> I'm just not sure it would be less hooks: you would probably need
> functions to start other CPUs at least.
>
Anything that attempts to create this uber multi-virtual interrupt /
timer / IPI / clock management beast is going to add a huge number of
paravirt hooks, because the vendor backends will be different for all of
these.
>
> I must admit I also didn't quite get what the big problem was with
> hooking apic_read/apic_write.
>
You mean why we need them? They make APIC writes faster: otherwise
every access would trap to the hypervisor and be emulated, which is
slow, and the APIC is on critical paths. Or why people object to them?
I don't understand the latter either.
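The cost argument can be sketched in C. This is a toy model and makes several assumptions. Every native LAPIC MMIO access in a guest is counted as one trap-and-emulate exit. The paravirt backend is a hypothetical direct call with no exit. None of the names (`apic_ops`, `pv_apic_write`, `use_paravirt_apic`) are the kernel's or VMI's actual API. `APIC_EOI` (0xB0) is the real LAPIC end-of-interrupt offset.

```c
#include <assert.h>
#include <stdint.h>

#define LAPIC_REGS 64   /* LAPIC registers sit at 16-byte strides up to 0x3F0 */
#define APIC_EOI   0xB0 /* end-of-interrupt register offset */

static uint32_t lapic_regs[LAPIC_REGS]; /* stand-in for the emulated LAPIC page */
static unsigned long trap_count;        /* simulated VM exits */

/* Native path: models an MMIO access that faults and is decoded/emulated. */
static void native_apic_write(uint32_t reg, uint32_t val)
{
    trap_count++;               /* one simulated exit per access */
    lapic_regs[reg >> 4] = val;
}

static uint32_t native_apic_read(uint32_t reg)
{
    trap_count++;
    return lapic_regs[reg >> 4];
}

/* Hypothetical paravirt path: direct call into the APIC model, no exit. */
static void pv_apic_write(uint32_t reg, uint32_t val)
{
    lapic_regs[reg >> 4] = val;
}

static uint32_t pv_apic_read(uint32_t reg)
{
    return lapic_regs[reg >> 4];
}

/* The hook: all APIC accesses go through these pointers. */
struct apic_ops {
    void (*write)(uint32_t reg, uint32_t val);
    uint32_t (*read)(uint32_t reg);
};

static struct apic_ops apic_ops = { native_apic_write, native_apic_read };

/* Boot-time switch to the paravirt backend (hypothetical). */
static void use_paravirt_apic(void)
{
    apic_ops.write = pv_apic_write;
    apic_ops.read = pv_apic_read;
}

/* A hot path: each handled interrupt ends with one EOI write. */
static void ack_interrupts(int n)
{
    for (int i = 0; i < n; i++)
        apic_ops.write(APIC_EOI, 0);
}
```

The callers and the emulated register state are identical in both modes. Only the cost per access changes. That is why a narrow read/write hook is attractive compared with a separate interrupt-controller driver.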
> For the timer you just need to use your own exclusive
> clocksource that never touches PIT.
>
We have that working fine. What is difficult, because of the i386
backend, is getting the clock event device to work independently of
the LAPIC timer.
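The distinction between the two timekeeping roles can be sketched in C. This is a simplified, hypothetical model, loosely inspired by the kernel's clocksource and clock event abstractions. The types `toy_clocksource` and `toy_clockevent` are invented and do not match the kernel's real structures or signatures. A clocksource only reads a counter, so it needs no interrupt hardware. A clock event device must arrange for an interrupt to arrive later, and that interrupt has to be delivered through some interrupt controller.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t hv_counter;     /* stand-in for a hypervisor-provided counter */
static uint64_t hv_alarm;       /* expiry time of the simulated one-shot timer */
static int      hv_alarm_armed;
static int      ticks_delivered;

/* Clocksource role: read-only; never programs an interrupt. */
struct toy_clocksource {
    const char *name;
    uint64_t (*read)(void);
};

/* Clock event role: request an interrupt "delta" units from now. */
struct toy_clockevent {
    const char *name;
    int  (*set_next_event)(uint64_t delta);
    void (*event_handler)(void);
};

static uint64_t hv_read(void) { return hv_counter; }

static int hv_set_next_event(uint64_t delta)
{
    hv_alarm = hv_counter + delta;
    hv_alarm_armed = 1;
    return 0;
}

static void tick_handler(void) { ticks_delivered++; }

static struct toy_clocksource hv_cs = { "hv-counter", hv_read };
static struct toy_clockevent  hv_ce = { "hv-alarm", hv_set_next_event, tick_handler };

/* Simulate time passing; fire the handler once when the alarm expires. */
static void advance(uint64_t units)
{
    for (uint64_t i = 0; i < units; i++) {
        hv_counter++;
        if (hv_alarm_armed && hv_counter >= hv_alarm) {
            hv_alarm_armed = 0;     /* one-shot: disarm before delivering */
            hv_ce.event_handler();
        }
    }
}
```

In this toy, `advance()` calls the handler directly. In a real guest, the delivery step is exactly where the choice between reusing the LAPIC timer and some other interrupt path arises.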
>
>> We faithfully emulate lapic, io_apic, the pit, pic, and a normal
>> interrupt subsystem. We can't magically stop using these things because
>> we have to support traditional full virtualization. Which means any
>> version of Linux, virtual interrupt controller or not, is going to boot
>> up, find these things, and try to use them. So for a paravirt kernel,
>> either we have to disable each of these things in the Linux code or just
>> re-use them.
>>
>
> If you don't enable them they should be already disabled as default
> state, shouldn't they?
>
> With your own custom clocksource and possibly your own APIC layer, nobody
> would ever enable the APICs.
>
But we enable and use them in both full-virt and paravirt mode. So we
would really need to duplicate the code almost exactly for our "virtual
interrupt controller", which would just be a wrapper on top of a
nearly identical APIC or IO-APIC implementation.
>> 1) Rewrite the interrupt subsystem of our hypervisor, making it
>> incompatible with full virtualization, so that we can support an
>> abstract interrupt controller with a "clean" interface
>>
>
> What do you mean by rewrite? It's quite easy to add a new
> backend to the generic IRQ code. Backends aren't a lot of code.
>
Yes, but we would then need to duplicate the APIC or IO-APIC
implementation, because that is the hardware we emulate and use. We
just want a different way to fire local timers, that is all.
> You could probably do a much simpler version, couldn't you? A lot of
> the stuff in apic.c/io_apic.c shouldn't be needed for a clean virtual
> interface. But yes it would probably be still a lot of code.
>
Yes, we could do a cleaner, simpler version. But then we would need to
write this new interrupt controller code for both the hypervisor and
Linux. And the fact that it is cleaner doesn't make it any nicer or make
it perform any better - it is just another dependency between the kernel
and hypervisor that then becomes hard to change. So we would rather
stay as close to the hardware design as possible.
> Still (2) is probably best for now, but the other alternatives
> are not as ridiculous as you paint them.
>
We have (2) working, but Thomas apparently hated it. The idea I have
about a single-IRQ-source interrupt controller for timers seems pretty
nice, and encapsulates almost exactly the one difference we have from
standard APIC / IO-APIC hardware: a different way to drive local
timers.
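The single-IRQ-source idea can be sketched in C. This is a hypothetical illustration, not the proposed patch. The controller manages exactly one line, the virtual local timer. All other devices would keep using the emulated APIC/IO-APIC unchanged. Every name here (`timer_irq_chip`, `vtimer_raise`, and so on) is invented for the sketch. The one piece of controller logic shown is latching a tick that arrives while the line is masked and replaying it on unmask.

```c
#include <assert.h>

/* One-line interrupt controller for the local timer (hypothetical). */
struct timer_irq_chip {
    const char *name;
    int  masked;            /* 1 while the guest has the line masked */
    int  pending;           /* 1 if raised while masked */
    void (*handler)(void);  /* e.g. the clock event handler */
};

static int handled;
static void local_timer_handler(void) { handled++; }

static struct timer_irq_chip vtimer_chip = {
    .name = "vtimer", .masked = 1, .pending = 0, .handler = local_timer_handler,
};

/* Hypervisor side (simulated): raise the single timer line. */
static void vtimer_raise(struct timer_irq_chip *chip)
{
    if (chip->masked) {
        chip->pending = 1;  /* latch; deliver once unmasked */
        return;
    }
    chip->handler();
}

static void vtimer_mask(struct timer_irq_chip *chip) { chip->masked = 1; }

static void vtimer_unmask(struct timer_irq_chip *chip)
{
    chip->masked = 0;
    if (chip->pending) {    /* replay the tick that arrived while masked */
        chip->pending = 0;
        chip->handler();
    }
}
```

Because there is only one source, no routing, vector allocation, or IPI logic is needed. That keeps the interface between kernel and hypervisor limited to the one behaviour that actually differs from the emulated hardware.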
Thanks for your feedback,
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/