[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ACB0833.2050203@redhat.com>
Date: Tue, 06 Oct 2009 11:04:51 +0200
From: Avi Kivity <avi@...hat.com>
To: Jeremy Fitzhardinge <jeremy.fitzhardinge@...rix.com>
CC: Xen-devel <xen-devel@...ts.xensource.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
kurt.hackel@...cle.com,
Dan Magenheimer <dan.magenheimer@...cle.com>,
Keir Fraser <keir.fraser@...citrix.com>,
Glauber de Oliveira Costa <gcosta@...hat.com>,
Zach Brown <zach.brown@...cle.com>,
the arch/x86 maintainers <x86@...nel.org>,
Chris Mason <chris.mason@...cle.com>
Subject: Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation
On 10/06/2009 02:50 AM, Jeremy Fitzhardinge wrote:
> This patch allows the pvclock mechanism to be used in usermode. To
> do this, we map an extra page into usermode containing an array of
> pvclock_vcpu_time_info structures which give the information required
> to compute a global system clock from the tsc. With this, we can
> implement pvclock_clocksource_vread().
>
> One complication is that usermode is subject to two levels of scheduling:
> kernel scheduling of tasks onto vcpus, and hypervisor scheduling of
> vcpus onto pcpus. In either case the underlying pcpu changed, and with
> it, the correct set of parameters to compute tsc->system clock. To
> address this we install a preempt notifier on sched_out to increment
> that vcpu's version number. Usermode can then check the version number
> is unchanged while computing the time and retry if it has (the only
> difference from the kernel's version of the algorithm is that the vcpu
> may have changed, so we may need to switch pvclock_vcpu_time_info
> structures.
>
> To use this feature, hypervisor-specific code is required
> to call pvclock_init_vsyscall(), and if successful:
> - cause the pvclock_vcpu_time_info structure at
> pvclock_get_vsyscall_time_info(cpu) to be updated appropriately for
> each vcpu.
> - use pvclock_clocksource_vread as the implementation of clocksource
> .vread.
>
> +
> +cycle_t __vsyscall_fn pvclock_clocksource_vread(void)
> +{
> + const struct pvclock_vcpu_time_info *pvti_base;
> + const struct pvclock_vcpu_time_info *pvti;
> + cycle_t ret;
> + u32 version;
> +
> + pvti_base = (struct pvclock_vcpu_time_info *)fix_to_virt(FIX_PVCLOCK_TIME_INFO);
> +
> + /*
> + * When looping to get a consistent (time-info, tsc) pair, we
> + * also need to deal with the possibility we can switch vcpus,
> + * so make sure we always re-fetch time-info for the current vcpu.
> + */
> + do {
> + unsigned cpu;
> +
> + vgetcpu(&cpu, NULL, NULL);
> + pvti =&pvti_base[cpu];
> +
> + version = __pvclock_read_cycles(pvti,&ret);
> + } while (unlikely(pvti->version != version));
> +
> + return ret;
> +}
>
Instead of using vgetcpu() and rdtsc() independently, you can use rdtscp
to read both atomically. This removes the need for the preempt notifier.
> +
> +/*
> + * Initialize the generic pvclock vsyscall state. This will allocate
> + * a/some page(s) for the per-vcpu pvclock information, set up a
> + * fixmap mapping for the page(s)
> + */
> +int __init pvclock_init_vsyscall(void)
> +{
> + int cpu;
> +
> + /* Just one page for now */
> + if (nr_cpu_ids * sizeof(struct vcpu_time_info)> PAGE_SIZE) {
> + printk(KERN_WARNING "pvclock_vsyscall: too many CPUs to fit time_info into a single page\n");
> + return -ENOSPC;
> + }
> +
> + pvclock_vsyscall_time_info =
> + (struct pvclock_vcpu_time_info *)get_zeroed_page(GFP_KERNEL);
> + if (pvclock_vsyscall_time_info == NULL)
> + return -ENOMEM;
> +
>
Need to align the vcpu_time_infos on a cacheline boundary.
> + for (cpu = 0; cpu< nr_cpu_ids; cpu++)
> + pvclock_vsyscall_time_info[cpu].version = ~0;
> +
> + __set_fixmap(FIX_PVCLOCK_TIME_INFO, __pa(pvclock_vsyscall_time_info),
> + PAGE_KERNEL_VSYSCALL);
> +
> + preempt_notifier_init(&pvclock_vsyscall_notifier,
> + &pvclock_vsyscall_preempt_ops);
> + preempt_notifier_register(&pvclock_vsyscall_notifier);
> +
>
preempt notifiers are per-thread, not global, and will upset the cycle
counters. I'd drop them and use rdtscp instead (and give up if the
processor doesn't support it).
--
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists