Message-Id: <200611140305.00383.ak@suse.de>
Date: Tue, 14 Nov 2006 03:05:00 +0100
From: Andi Kleen <ak@...e.de>
To: Suleiman Souhlal <ssouhlal@...ebsd.org>
Cc: Linux Kernel ML <linux-kernel@...r.kernel.org>, vojtech@...e.cz,
Jiri Bohac <jbohac@...e.cz>
Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().
On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote:
> This is done by a per-cpu vxtime structure that stores the last TSC and HPET
> values.
>
> Whenever we switch to a userland process after a HLT instruction has been
> executed or after the CPU frequency has changed, we force a new read of the
> TSC, HPET and xtime so that we know the correct frequency we have to deal
> with.
Hmm, interesting approach.
>
> With this, we can safely use RDTSC in gettimeofday() in CPUs where the
> TSCs are not synchronized, such as Opterons, instead of doing a very expensive
> HPET read.
> + cpu = hard_smp_processor_id();
Why not smp_processor_id()?
> +
> + apicid = hard_smp_processor_id();
The apicid <-> Linux cpuid mapping should be 1:1, so I don't know
why you do this separately. All the uses of hard_smp_processor_id()
are bogus.
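Something like this would do (sketch only; vxtime.pcpu[] and its fields
are from your patch, get_cpu()/put_cpu() pin the task so the index stays
valid while the slot is used):

#include <linux/smp.h>

static unsigned long read_pcpu_last_tsc(void)
{
	/* get_cpu() == smp_processor_id() + preempt_disable() */
	int cpu = get_cpu();
	unsigned long tsc = vxtime.pcpu[cpu].last_tsc;

	put_cpu();
	return tsc;
}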
> +
> + /*
> + * We will need to also do this when switching to kernel tasks if we
> + * want to use the per-cpu monotonic_clock in the kernel
> + */
> + if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL)
> + vxtime_update_pcpu();
Why? It should really only be needed after a HLT or a cpufreq change,
shouldn't it?
Anyway, I'm not very fond of adding code to the context-switch critical
path.
> seq = read_seqbegin(&xtime_lock);
> + preempt_disable();
> + cpu = hard_smp_processor_id();
Again, this shouldn't use hard_smp_processor_id().
>
> - sec = xtime.tv_sec;
> - usec = xtime.tv_nsec / NSEC_PER_USEC;
> + sec = vxtime.pcpu[cpu].tv_sec;
> + usec = vxtime.pcpu[cpu].tv_usec;
>
> /* i386 does some correction here to keep the clock
> monotonous even when ntpd is fixing drift.
> @@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv)
> be found. Note when you fix it here you need to do the same
> in arch/x86_64/kernel/vsyscall.c and export all needed
> variables in vmlinux.lds. -AK */
> - usec += do_gettimeoffset();
> + t = get_cycles_sync();
> + x = (((t - vxtime.pcpu[cpu].last_tsc) *
> + vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
> + usec += x;
>
> } while (read_seqretry(&xtime_lock, seq));
> + preempt_enable();
If it can be implemented without races in user space, why does
it need preempt_disable() in kernel space?
The faster way to access all the per-CPU vxtime data would be to stick
a pointer to it into the PDA.
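Roughly like this (sketch; the vxtime_pcpu member is hypothetical,
read_pda()/write_pda() are the existing x86-64 PDA accessors):

#include <asm/pda.h>

/* hypothetical new member in struct x8664_pda:
 *	struct pcpu_vxtime *vxtime_pcpu;
 */

/* once per CPU, running on that CPU during bringup: */
static void vxtime_init_pda(void)
{
	write_pda(vxtime_pcpu, &vxtime.pcpu[smp_processor_id()]);
}

/* then in the do_gettimeofday() fast path: */
static inline struct pcpu_vxtime *this_vxtime(void)
{
	return read_pda(vxtime_pcpu);	/* one %gs-relative load, no index math */
}

That turns the per-CPU lookup into a single %gs-relative load.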
> +#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
> +extern unsigned long hpet_tick;
> +
> +extern unsigned long vxtime_hz;
No externs in .c files. That happened earlier too, I think.
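I.e. declare them once in a shared header and include that (the exact
header below is just an example):

/* e.g. include/asm-x86_64/timex.h, next to the definitions */
extern unsigned long vxtime_hz;
extern unsigned long hpet_tick;

/* users then just do */
#include <asm/timex.h>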
> +
> static __always_inline void timeval_normalize(struct timeval * tv)
> {
> time_t __sec;
> @@ -57,35 +67,107 @@ static __always_inline void timeval_norm
> }
> }
>
> +inline int apicid(void)
> +{
> + int cpu;
> +
> + __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx");
> + return (cpu >> 24);
The faster way to do this is to use LSL from a magic GDT entry.
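Something along these lines (sketch; the selector value and the limit
encoding are made up for illustration, the point is that LSL reads a
GDT segment limit without a privileged instruction):

static inline int cpu_from_gdt_limit(void)
{
	unsigned int limit;

	/* read the limit of a per-CPU GDT descriptor the kernel set up;
	 * 0x7b is a placeholder selector, not a real one */
	asm("lsl %1, %0" : "=r" (limit) : "r" (0x7bU));
	return limit & 0xfff;	/* assume the cpu id sits in the low bits */
}

That is one unprivileged instruction instead of a serializing CPUID.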
> +}
> +
> static __always_inline void do_vgettimeofday(struct timeval * tv)
> {
> long sequence, t;
> unsigned long sec, usec;
> + int cpu;
>
> do {
> sequence = read_seqbegin(&__xtime_lock);
> -
> - sec = __xtime.tv_sec;
> - usec = __xtime.tv_nsec / 1000;
> -
> - if (__vxtime.mode != VXTIME_HPET) {
> - t = get_cycles_sync();
> - if (t < __vxtime.last_tsc)
> - t = __vxtime.last_tsc;
> - usec += ((t - __vxtime.last_tsc) *
> - __vxtime.tsc_quot) >> 32;
> - /* See comment in x86_64 do_gettimeofday. */
> - } else {
> - usec += ((readl((void __iomem *)
> - fix_to_virt(VSYSCALL_HPET) + 0xf0) -
> - __vxtime.last) * __vxtime.quot) >> 32;
> - }
> - } while (read_seqretry(&__xtime_lock, sequence));
> + cpu = apicid();
> +
> + sec = __vxtime.pcpu[cpu].tv_sec;
> + usec = __vxtime.pcpu[cpu].tv_usec;
> + rdtscll(t);
> +
> + usec += (((t - __vxtime.pcpu[cpu].last_tsc) *
> + __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
> + } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu);
Hmm, isn't there still an (unlikely, but possible) race:
CPU #0                                      CPU #1
read cpu number
        switch to CPU #1
                                            read TSC
        switch back to CPU #0
check cpu number
check succeeds, but you got wrong TSC data
How do you prevent that? I suppose you could force the seqlock to retry
in this case in the context switch, but I don't think you do that?
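For what it's worth, the retry could look something like this (sketch
only; whether the context-switch path may touch xtime_lock at all is
exactly the cost concern above):

#include <linux/seqlock.h>

/* called from the migration/context-switch path of a task that uses
 * the per-CPU vxtime data; bumping the sequence makes any reader that
 * straddled the switch fail read_seqretry() and loop */
static void vxtime_force_gtod_retry(void)
{
	unsigned long flags;

	write_seqlock_irqsave(&xtime_lock, flags);
	write_sequnlock_irqrestore(&xtime_lock, flags);
}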
Also I'm not sure what your context switch code is good for.
-Andi