linux-kernel - Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4ACB0833.2050203@redhat.com>
Date:	Tue, 06 Oct 2009 11:04:51 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Jeremy Fitzhardinge <jeremy.fitzhardinge@...rix.com>
CC:	Xen-devel <xen-devel@...ts.xensource.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	kurt.hackel@...cle.com,
	Dan Magenheimer <dan.magenheimer@...cle.com>,
	Keir Fraser <keir.fraser@...citrix.com>,
	Glauber de Oliveira Costa <gcosta@...hat.com>,
	Zach Brown <zach.brown@...cle.com>,
	the arch/x86 maintainers <x86@...nel.org>,
	Chris Mason <chris.mason@...cle.com>
Subject: Re: [PATCH 3/5] x86/pvclock: add vsyscall implementation

On 10/06/2009 02:50 AM, Jeremy Fitzhardinge wrote:
> This patch allows the pvclock mechanism to be used in usermode.  To
> do this, we map an extra page into usermode containing an array of
> pvclock_vcpu_time_info structures which give the information required
> to compute a global system clock from the tsc.  With this, we can
> implement pvclock_clocksource_vread().
>
> One complication is that usermode is subject to two levels of scheduling:
> kernel scheduling of tasks onto vcpus, and hypervisor scheduling of
> vcpus onto pcpus.  In either case the underlying pcpu changed, and with
> it, the correct set of parameters to compute tsc->system clock.  To
> address this we install a preempt notifier on sched_out to increment
> that vcpu's version number.  Usermode can then check the version number
> is unchanged while computing the time and retry if it has (the only
> difference from the kernel's version of the algorithm is that the vcpu
> may have changed, so we may need to switch pvclock_vcpu_time_info
> structures.
>
> To use this feature, hypervisor-specific code is required
> to call pvclock_init_vsyscall(), and if successful:
>   - cause the pvclock_vcpu_time_info structure at
>     pvclock_get_vsyscall_time_info(cpu) to be updated appropriately for
>     each vcpu.
>   - use pvclock_clocksource_vread as the implementation of clocksource
>     .vread.
>
> +
> +cycle_t __vsyscall_fn pvclock_clocksource_vread(void)
> +{
> +	const struct pvclock_vcpu_time_info *pvti_base;
> +	const struct pvclock_vcpu_time_info *pvti;
> +	cycle_t ret;
> +	u32 version;
> +
> +	pvti_base = (struct pvclock_vcpu_time_info *)fix_to_virt(FIX_PVCLOCK_TIME_INFO);
> +
> +	/*
> +	 * When looping to get a consistent (time-info, tsc) pair, we
> +	 * also need to deal with the possibility we can switch vcpus,
> +	 * so make sure we always re-fetch time-info for the current vcpu.
> +	 */
> +	do {
> +		unsigned cpu;
> +
> +		vgetcpu(&cpu, NULL, NULL);
> +		pvti =&pvti_base[cpu];
> +
> +		version = __pvclock_read_cycles(pvti,&ret);
> +	} while (unlikely(pvti->version != version));
> +
> +	return ret;
> +}
>    

Instead of using vgetcpu() and rdtsc() independently, you can use rdtscp 
to read both atomically.  This removes the need for the preempt notifier.

> +
> +/*
> + * Initialize the generic pvclock vsyscall state.  This will allocate
> + * a/some page(s) for the per-vcpu pvclock information, set up a
> + * fixmap mapping for the page(s)
> + */
> +int __init pvclock_init_vsyscall(void)
> +{
> +	int cpu;
> +
> +	/* Just one page for now */
> +	if (nr_cpu_ids * sizeof(struct vcpu_time_info)>  PAGE_SIZE) {
> +		printk(KERN_WARNING "pvclock_vsyscall: too many CPUs to fit time_info into a single page\n");
> +		return -ENOSPC;
> +	}
> +
> +	pvclock_vsyscall_time_info =
> +		(struct pvclock_vcpu_time_info *)get_zeroed_page(GFP_KERNEL);
> +	if (pvclock_vsyscall_time_info == NULL)
> +		return -ENOMEM;
> +
>    

Need to align the vcpu_time_infos on a cacheline boundary.

> +	for (cpu = 0; cpu<  nr_cpu_ids; cpu++)
> +		pvclock_vsyscall_time_info[cpu].version = ~0;
> +
> +	__set_fixmap(FIX_PVCLOCK_TIME_INFO, __pa(pvclock_vsyscall_time_info),
> +		     PAGE_KERNEL_VSYSCALL);
> +
> +	preempt_notifier_init(&pvclock_vsyscall_notifier,
> +			&pvclock_vsyscall_preempt_ops);
> +	preempt_notifier_register(&pvclock_vsyscall_notifier);
> +
>    

preempt notifiers are per-thread, not global, and will upset the cycle 
counters.  I'd drop them and use rdtscp instead (and give up if the 
processor doesn't support it).

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/