[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <461C084C.9020009@vmware.com>
Date: Tue, 10 Apr 2007 14:57:32 -0700
From: Zachary Amsden <zach@...are.com>
To: Chris Wright <chrisw@...s-sol.org>
CC: Andrew Morton <akpm@...l.org>, Andi Kleen <ak@....de>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Virtualization Mailing List <virtualization@...ts.osdl.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Dan Hecht <dhecht@...are.com>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 9/10] Vmi timer update.patch
Chris Wright wrote:
> * Zachary Amsden (zach@...are.com) wrote:
>
>>>> +void __init vmi_time_init(void)
>>>> +{
>>>> + /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
>>>> + outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */
>>>>
>>> That shouldn't be necessary using clockevents.
>>>
>> Actually, I'm not so sure. If clockevents simply masks the PIT when
>> disabling it, we still have overhead of keeping the latch in sync, which
>> requires a timer at the PIT frequency. I can instrument to see how
>> exactly the PIT gets disabled.
>>
>
> It should switch from pit to vmi-timer, and the switch should do the state
> transistions on pit to go to unused mode.
>
Yes, but unfortunately that is a nop:
/*
* Avoid unnecessary state transitions, as it confuses
* Geode / Cyrix based boxen.
*/
case CLOCK_EVT_MODE_SHUTDOWN:
if (evt->mode == CLOCK_EVT_MODE_UNUSED)
break;
case CLOCK_EVT_MODE_UNUSED:
if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
break;
case CLOCK_EVT_MODE_ONESHOT:
/* One shot setup */
outb_p(0x38, PIT_MODE);
So switching from PIT to VMI does not disable PIT timer interrupts.
Thus I have to keep this part of the patch.
>>> isn't this just your ->startup?
>>>
>> Which structure has a ->startup function we can use? Sorry if this
>> seems ignorant, I'm not quite sure what you mean.
>>
>
> The irq_chip. IOW, it looks like a liberal sprinkling of LVTT vector
> initialization.
>
Ahh, ok.
>
>>>> + local_irq_enable();
>>>> + clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
>>>>
>>> ...and resume? Instead of letting clockevents core handle all of that,
>>> and just registering right here?
>>>
>> It wasn't clear that clockevents would issue a resume notify for us; if
>> so we could handle this setup in the callback, but it has to be done on
>> the correct CPU. I can try it and see if that works.
>>
>
> I would've expected to simply register the clockevents device right here,
> and that should do the proper state transitions on the old device, as well
> as the new device. Why do you need resume notify?
I was confused. The problem is, time init is also highly confused on
i386; the system comes up running on the PIT, then switches to the
APIC. This changes when the APIC gets activated, which is why we
suspend and resume clockevents while making the irq wiring changes for
the switch to APIC mode. Internally, clockevents does the proper state
transition for us on VMI clock events - they get suspended properly,
subject to all the proper precautions needed to avoid spurious interrupt
that are pending in hardware, then they get resumed properly without us
having to worry about missed interrupts or failure to restart the clock.
But to get the clockevents to do state transitions for us without this
explicit suspend / resume, we would need to model two clockevents; a
clockevent running through the PIT, and a higher priority clockevent
running through the APIC. Then the logic could conceivably switch over
for us, but there is an unavoidable race, since we are using the same
IRQ in each case - hardware IRQ-0. And an IRQ can only have one irq
chip, so we can't put the code to switch to the new irq chip in the
->startup for that chip, since it will never get called unless we set
the chip before shutting down the old irq chip (through the PIT), which
means the old PIT irq never gets shut down.
We can't workaround it however without one of these options:
1) stealing a different IRQ and knowing the fixed 1-1 IRQ<->IDT vector
mapping for it. We could conceivably hijack any number of IRQs, in any
order by reserving them during VMI platform initialization, but due to
the non-linearity with which IRQs are re-mapped to IDT vectors when the
IO-APIC is activated, I though it simpler to just continue using IRQ-0,
as this is linearly mapped by constants (instead of offset by 8,
skipping some vectors and wrapping around).
2) Reusing the local timer IDT vector, since APIC won't be using it.
Reset the IDT handler to point to our own handler. We used to do this,
providing smp_vmi_timer_interrupt and entry.S assembler code for our own
low-level interrupt handlers. There were objections to our adding
low-level interrupt handling code instead of using the proper genirq
infrastructure.
3) Reuse the local timer IDT vector as our fixed IDT vector. We must
reset the IDT vector for this to point to a do_IRQ style handler which
enters the genirq code. So, reserve a non-zero platform IRQ and set the
IDT vector for local timer to vector to interrupt[IRQ]. Now, set the
irq chip for this IRQ to be a per-cpu handler and give it the VMI
interrupt chip.
I can't really think of anything else that is feasible. And keep in
mind, all of these options require modeling the VMI timer as two
separate clockevents. Is it worth the complexity to avoid the suspend /
resume? These hacks seem even less palatable, to me, while suspend /
resume of clock event scheduling seems to be a well defined operation.
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists