Message-ID: <45F0761C.6060107@vmware.com>
Date: Thu, 08 Mar 2007 12:46:20 -0800
From: Zachary Amsden <zach@...are.com>
To: tglx@...utronix.de
CC: Ingo Molnar <mingo@...e.hu>, Jeremy Fitzhardinge <jeremy@...p.org>,
john stultz <johnstul@...ibm.com>, akpm@...ux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Pratap Subrahmanyam <pratap@...are.com>,
Rusty Russell <rusty@...tcorp.com.au>, Andi Kleen <ak@...e.de>,
Daniel Hecht <dhecht@...are.com>,
Daniel Arai <arai@...are.com>,
Chris Wright <chrisw@...s-sol.org>,
Virtualization Mailing List <virtualization@...ts.osdl.org>
Subject: Re: hardwired VMI crap

Thomas Gleixner wrote:
> On Thu, 2007-03-08 at 02:06 -0800, Zachary Amsden wrote:
>
>>>> The correct solution here is to properly separate the APIC, SMP, and
>>>> timer code so the logic of it which we want to reuse is separated from
>>>> the hardware dependence. Clock events and clocksources take care of
>>>> most of the timer issues, but there is still ugliness from SMP timer
>>>> events depending on having part of the APIC infrastructure for wiring
>>>> the interrupt gates.
>>>>
>>>>
>>> what are you talking about? A clockevents driver does not need to know
>>> about lapic details, at all. In terms of interrupt gates for the
>>> hypervisor to notify about clock events - use a virtual interrupt
>>> controller via genirq.
>>>
>>>
>> See my last e-mail. It is not possible on i386, since local per-cpu
>> interrupts are only supported via the APIC.
>>
>
> It is not possible from your POV. It is possible, as we have already a
> complete irq abstraction layer, which supports _ALL_ of the
> requirements.
>
> To make use of it in a maintainable way, it just needs the work of doing
> a proper client for the genirq layer, which gets its interrupt injected
> by the hypervisor.
>
> genirq() does not care by which mechanism handle_percpu_irq() is called.
>
> We provided the abstractions and you just tell us straight in the face,
> that your hypervisor works that way and therefore we have to accept that
> you do it that way.
>
> It's not rocket science to implement an abstract interrupt controller,
> which lets you inject per cpu or global interrupts into the generic
> layer. It needs some preparatory work to disentangle the boot code
> assumptions from the implicit hardware, but this is a better spent time,
> than another set of hackery, which you already advertised for smpboot.c
>
When we're about two weeks away from a product release and you are
threatening to unmerge or block our code because we re-used the APIC and
IO-APIC instead of creating an abstract interrupt controller, this is
uber rocket science. We've been doing things this way, with public
patches, for over a year, and you've even been CC'd on some of the
discussions. So it is a little late to tell us: "redesign your
hypervisor, or else..."

> All we want you and the other hypervisor folks to do is to
>
> - use existing abstractions in the way they are designed
> - create new ones where applicable
>
Great.

> - break the hardwired hardware assumptions, so a sane emulation model
> can be used.
>
Why? This is your own invention, as you think it would make life
easier. It doesn't - you still have real hardware to deal with, and
your code will always be designed to operate on silicon with these
hardwired assumptions. Breaking away from that can actually make the
code more complex, both in the hypervisor and in Linux.
>
>> So far, all you have done is not complain about our code until it was
>> merged, then pursue every tactic possible to break it. It is not us that
>> are stonewalling.
>>
>
> You have been told before. Andi asked you more than once to move to
> clockevents.
>
Which we have done. And now you refuse to give any feedback on
technical points, but maintain an objection to the way we have done it.

> If you can not change your hypervisor model to use a sane abstraction of
> interrupts, then please emulate lapic, io_apic and everything else
> _OUTSIDE_ of the kernel.
>
We faithfully emulate lapic, io_apic, the pit, pic, and a normal
interrupt subsystem. We can't magically stop using these things because
we have to support traditional full virtualization. Which means any
version of Linux, virtual interrupt controller or not, is going to boot
up, find these things, and try to use them. So for a paravirt kernel,
either we have to disable each of these things in the Linux code or just
re-use them.

So we re-use them. We don't even change their semantics. Where we get
into trouble is the fact that only the lapic can deliver per-cpu timer
IRQs, and we need to provide a better time abstraction than TSC. So we
need a time device, but there is no way to implement it in the
traditional hardware model.

And I ask again for your feedback on which approach you think is correct:
1) Rewrite the interrupt subsystem of our hypervisor, making it
incompatible with full virtualization, so that we can support an
abstract interrupt controller with a "clean" interface
2) Reuse the same method that HPET, PIT and other time clients in i386
use - the global_clock_event pointer which allows you to wrest control
back from the APIC and reuse the lapic_events local clockevents.
3) Create a new low level interrupt handler for the per-cpu VMI timer
IRQs instead of re-using the APIC handler
4) Use the irq APIs to allocate IRQ-0 as a percpu IRQ, then change the
IO-APIC code so it knows not to convert this PIC IRQ into an IO-APIC
edge IRQ.
5) Disable the io-apic code entirely in paravirt mode. Rather than
change it, merge a parallel copy of it into the VMI code
so that we can use the 99% of the code we need, with the one bugfix for
#4 above
6) Disable the apic code entirely in paravirt mode. Rather than change
it, merge a parallel copy of it into the VMI code so that we can use the
90% of the code we need, with changes to the LVT0 timer handling.
7) For SMP only, allocate a non-shared IO-APIC IRQ, then after the
IO-APIC is initialized, magically switch this to a percpu handler and
start delivering local timer interrupts via this IRQ.
8) Create a pie-in-the-sky single interrupt source, reserve an IDT
vector for it (or steal the lapic timer slot), and use the irq apis to
set it up to be handled as a per-cpu interrupt. This actually sounds
pretty good, to me. The only problem is we will need to switch the
timer IRQ from IRQ 0 to this vector when the APIC is initialized, but I
think we already have all the machinery we need to handle that.
9) ???
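
To make option 8 a bit more concrete, here is a rough userspace model of
the idea: one timer source, one reserved vector, a per-cpu handler, and a
switch from the IRQ 0 vector to the reserved vector once the APIC is up.
This is only a sketch. Every model_* name is hypothetical and none of it
is a kernel API; the vector numbers and the CPU count are placeholders
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only. All names and numbers below are assumptions. */
#define MODEL_NR_CPUS   4
#define MODEL_NR_VECS   256
#define MODEL_IRQ0_VEC  0x20  /* placeholder: vector used for legacy IRQ 0 */
#define MODEL_TIMER_VEC 0xef  /* placeholder: vector reserved for the timer */

typedef void (*model_handler_t)(unsigned int cpu);

/* One handler slot per vector, standing in for the IDT. */
static model_handler_t model_vector_table[MODEL_NR_VECS];

/* Per-cpu tick counters, standing in for per-cpu clock event handling. */
static unsigned long model_ticks[MODEL_NR_CPUS];

/* Vector the timer is currently wired to. */
static unsigned int model_timer_vec = MODEL_IRQ0_VEC;

/* Per-cpu timer handler: touches only the state of the cpu it runs on. */
static void model_timer_tick(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    model_ticks[cpu]++;
}

/* Wire the timer to 'vec', unhooking it from whatever vector it used. */
static void model_setup_timer(unsigned int vec)
{
    assert(vec < MODEL_NR_VECS);
    model_vector_table[model_timer_vec] = NULL;
    model_vector_table[vec] = model_timer_tick;
    model_timer_vec = vec;
}

/*
 * What the hypervisor side would do: deliver 'vec' on 'cpu'.
 * Returns 1 if a handler ran, 0 if the vector is not hooked.
 */
static int model_inject(unsigned int cpu, unsigned int vec)
{
    assert(cpu < MODEL_NR_CPUS && vec < MODEL_NR_VECS);
    if (model_vector_table[vec] == NULL)
        return 0;
    model_vector_table[vec](cpu);
    return 1;
}
```

In this model, early boot would call model_setup_timer(MODEL_IRQ0_VEC),
and the point where the APIC is initialized would call
model_setup_timer(MODEL_TIMER_VEC). After that, model_inject() on the old
vector does nothing and each cpu only ever counts its own ticks.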

This is a serious question, and I would appreciate a serious response
instead of snide comments about the crappiness of our interface and our
code. Those do help a little, because by process of elimination we
can rule out the approaches you don't like. But it would be more
productive if we could carry on a traditional dialogue, where I ask a
question and you answer, and vice versa.

Zach