[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070227035456.GA15444@Krystal>
Date: Mon, 26 Feb 2007 22:54:56 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Daniel Walker <dwalker@...sta.com>
Cc: mbligh@...gle.com, linux-kernel@...r.kernel.org,
johnstul@...ibm.com, mingo@...e.hu
Subject: Re: [RFC] Fast assurate clock readable from user space and NMI handler
* Daniel Walker (dwalker@...sta.com) wrote:
> On Mon, 2007-02-26 at 17:14 -0500, Mathieu Desnoyers wrote:
>
>
> > For kernel and user space tracing, those small jumps are very annoying :
> > it can show, in a trace, that a fork() appears on a CPU after the first
> > schedule() of the thread on the other CPU : scheduling causality relationship
> > can become very hard to follow. This is only a sample case. Inaccuracy and
> > periodical modification of the clock time (non monotonic) can cause important
> > inaccuracy in performance tests, even on UP systems. A monotonic clock,
> > accessible from anywhere in kernel space (including NMI handler) and
> > from user space is very useful for performance analysis and, more
> > generally, for timestamping data in per cpu buffers so it can be later
> > reordered correctly.
>
> What about adding a layer below do_gettimeofday() which just scheds the
> adjustment process? That might be reasonable .. The NMI, and userspace
> cases aren't very compelling right now, at least I'm not convinced a
> whole new timing interface is needed ..
>
> The latency tracing system in the -rt branch modifies the gettimeofday
> facilities , I'm not sure of the correctness of it but it gets called
> from anyplace in the kernel including NMI's .
>
> Here's the function,
>
> cycle_t notrace get_monotonic_cycles(void)
> {
> cycle_t cycle_now, cycle_delta;
>
> /* read clocksource: */
> cycle_now = clocksource_read(clock);
>
> /* calculate the delta since the last update_wall_time: */
> cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
>
> return clock->cycle_last + cycle_delta;
> }
>
> That looks safe. When converting this to nanoseconds you would still get
> the time adjustments but it would be all at once instead of in little
> increments ..
>
ouch... if the clocksource used is the PIT on x86 :
static cycle_t pit_read(void)
{
unsigned long flags;
int count;
u32 jifs;
static int old_count;
static u32 old_jifs;
spin_lock_irqsave(&i8253_lock, flags);
If an NMI nests over the spinlock, we have a deadlock.
In addition, clock->cycle_last is a cycle_t, defined as a 64 bits on
x86. If is therefore not updated atomically by change_clocksource,
timekeeping_init, timekeeping_resume and update_wall_time. If an NMI
fires right on top of the update, especially around the 32 bits wrap
around, the time will be really fuzzy.
Mathieu
> Daniel
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists