[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45EF2FB5.1080604@vmware.com>
Date: Wed, 07 Mar 2007 13:33:41 -0800
From: Dan Hecht <dhecht@...are.com>
To: tglx@...utronix.de
Cc: Jeremy Fitzhardinge <jeremy@...p.org>,
James Morris <jmorris@...ei.org>,
Virtualization Mailing List <virtualization@...ts.osdl.org>,
akpm@...ux-foundation.org, john stultz <johnstul@...ibm.com>,
Ingo Molnar <mingo@...e.hu>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree
On 03/07/2007 01:21 PM, Thomas Gleixner wrote:
> On Wed, 2007-03-07 at 11:49 -0800, Dan Hecht wrote:
>> Jeremy, I saw you sent out the Xen version earlier, thanks. Here's ours
>> for reference (please excuse any formating issues); it's also lean.
>> We'll send out a proper patch later after some more testing:
>
> Ah. Bitching loud enough speeds things up. :)
>
We've always planned to do this. We just didn't want to create the
dependency between paravirt_ops and clockevents too early such that they
would depend on each other to merge to main line. Now that they are
both there, we are all for it.
>> /** vmi clockevent */
>>
>> static struct clock_event_device vmi_global_clockevent;
>>
>> static inline u32 vmi_alarm_wiring(struct clock_event_device *evt)
>> {
>> return (evt == &vmi_global_clockevent) ?
>> VMI_ALARM_WIRED_IRQ0 : VMI_ALARM_WIRED_LVTT;
>> }
>>
>> static void vmi_timer_set_mode(enum clock_event_mode mode,
>> struct clock_event_device *evt)
>> {
>> u32 wiring;
>> cycle_t now, cycles_per_hz;
>> BUG_ON(!irqs_disabled());
>>
>> wiring = vmi_alarm_wiring(evt);
>> if (wiring == VMI_ALARM_WIRED_LVTT)
>> /* Route the interrupt to the correct vector */
>> apic_write_around(APIC_LVTT, LOCAL_TIMER_VECTOR);
>
> Wire that in the hypervisor.
>
>> switch (mode) {
>> case CLOCK_EVT_MODE_ONESHOT:
>> break;
>> case CLOCK_EVT_MODE_PERIODIC:
>> cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
>> (void)do_div(cycles_per_hz, HZ);
>> now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
>> vmi_timer_ops.set_alarm(wiring | VMI_PERIODIC,
>> now, cycles_per_hz);
>
> paravirt_ops->paravirt_clockevent->set_periodic(vcpu, period);
>
Huh? paravirt_ops isn't a hypervisor interface, it's just a linux code
abstraction. The code on both sides of paravirt_ops is *linux* code,
any way you cut it. clockevents is already a linux code abstraction.
why introduce the redundancy?
>> break;
>> case CLOCK_EVT_MODE_UNUSED:
>> case CLOCK_EVT_MODE_SHUTDOWN:
>
> paravirt_ops->paravirt_clockevent->stop_event(vcpu, mode);
>
You would be introducing the same redundancy.
>
>> switch (evt->mode) {
>> case CLOCK_EVT_MODE_ONESHOT:
>> vmi_timer_ops.cancel_alarm(VMI_ONESHOT);
>> break;
>> case CLOCK_EVT_MODE_PERIODIC:
>> vmi_timer_ops.cancel_alarm(VMI_PERIODIC);
>> break;
>> default:
>> break;
>> }
>> break;
>> default:
>> break;
>> }
>> }
>
> This whole vmi_timer_ops thing is horrible. All hypervisors can share
> paravirt_ops->paravirt_clockevent and retrieve the methods on boot.
>
vmi_timer_ops.whatever is where the kernel <-> hypervisor boundary is
crossed for VMI.
>> static int vmi_timer_next_event(unsigned long delta,
>> struct clock_event_device *evt)
>> {
>> /* Unfortunately, set_next_event interface only passes relative
>> * expiry, but we want absolute expiry. It'd be better if were
>> * were passed an aboslute expiry, since a bunch of time may
>> * have been stolen between the time the delta is computed and
>> * when we set the alarm below. */
>> cycle_t now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_ONESHOT));
>>
>> BUG_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
>> vmi_timer_ops.set_alarm(vmi_alarm_wiring(evt) | VMI_ONESHOT,
>> now + delta, 0);
>> return 0;
>> }
>
> Great. Now we have:
>
> s64 event = startup_offset + ktime_to_ns(evt->next_event);
>
> if (HYPERVISOR_set_timer_op(event) < 0)
> BUG();
> and
>
> vmi_timer_ops.set_alarm(vmi_alarm_wiring(evt) | VMI_ONESHOT, now + delta, 0);
>
> How will the next implementations look like ?
>
> lguest_program_timer(delta + lguest_current_time(), LGUEST_TIMER_SHOOT_ONCE);
>
> virt_nextgen_ops.set_timer_event(delta, NO_WE_NEED_NO_FLAGS);
>
> .......
>
> This is tinkering of the best. My understanding of the paravirt
> discussion at Kernel Summit was, that paravirt ops are exactly there to
> prevent the above random hackery in the kernel and to allow _ALL_
> hypervisors to interact via a sane interface inside of the kernel.
>
No, that was not the point of paravirt_ops. It is actually the complete
opposite of the intention of paravirt_ops. paravirt_ops' intent is
exactly to allow for *multiple* hypervisor ABIs to exist in the kernel.
At kernel summit, paravirt_ops was proposed to allow for multiple
hypervisor ABI's to be targeted by the kernel. The code on both sides
of paravirt_ops is *linux* code.
> You are just perverting the whole idea of a standartized
> paravirtualization interface.
>
> This things can be done for clocksources, clockevents, interrupts (the
> generic irq code allows this) and probaly for a whole bunch of other
> stuff.
>
> The current paravirt interface is completely insane and will explode
> into an unmaintainable nightmare within no time, if we keep accepting
> that crap further.
>
> No thanks.
>
Again, you are misunderstanding the intent of paravirt_ops and history
behind it's development.
>> #ifdef CONFIG_X86_LOCAL_APIC
>>
>> /* Replacement for lapic timer local clock event.
>> * paravirt_ops.setup_boot_clock = vmi_nop
>> * (continue using global_clock_event on cpu0)
>> * paravirt_ops.setup_secondary_clock = vmi_timer_setup_local_alarm
>> */
>> void __devinit vmi_timer_setup_local_alarm(void)
>> {
>> struct clock_event_device *evt = &__get_cpu_var(local_clock_events);
>>
>> /* Then, start it back up as a local clockevent device. */
>> memcpy(evt, &vmi_clockevent, sizeof(*evt));
>> evt->cpumask = cpumask_of_cpu(smp_processor_id());
>>
>> printk(KERN_WARNING "vmi: registering clock event %s. mult=%lu
>> shift=%u\n",
>> evt->name, evt->mult, evt->shift);
>> clockevents_register_device(evt);
>> }
>>
>> #endif
>
> Why the hell do you need an lapic emulator here? This is exactly the
> kind of crap, we do not want to have. clockevents do not care which
> piece of hardware is calling them and we do not care how a particular
> hypervisor is wiring that hardware.
>
Again, I said in a previous mail that we am fine with introducing our
own interrupt handler rather than using the lapic one.
Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists