Message-Id: <1224546444.7092.58.camel@localhost.localdomain>
Date: Mon, 20 Oct 2008 16:47:23 -0700
From: john stultz <johnstul@...ibm.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
"Luck, Tony" <tony.luck@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Thomas Gleixner <tglx@...utronix.de>,
David Miller <davem@...emloft.net>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [RFC patch 15/15] LTTng timestamp x86
On Mon, 2008-10-20 at 15:06 -0700, Linus Torvalds wrote:
>
> On Mon, 20 Oct 2008, john stultz wrote:
> >
> > I'm not quite sure I followed your per-cpu xtime thoughts. Could you
> > explain further your thinking as to why the entire timekeeping
> > subsystem should be per-cpu instead of just keeping that back in the
> > arch-specific clocksource implementation? In other words, why keep
> > things synced at the nanosecond level instead of keeping the per-cpu
> > TSC synched at the cycle level?
>
> I don't think you can keep them sync'ed without taking frequency drift into
> account. When you have multiple boards (ie big boxes), they simply _will_
> be in different clock domains. They won't have the exact same frequency.
>
> So the "rewrite the TSC every once in a while" approach (where "after
> coming out of idle" is just a special case of "once in a while" due to
> many CPU's losing TSC in idle) works well in the kind of situation where
> you really only have a single clock domain, and the TSC's are all
> basically from the same reference clock. And that's a common case, but it
> certainly isn't the _only_ case.
>
> What about fundamentally different frequencies (old TSC's that change with
> cpufreq)? Or what about just subtle different ones (new TSC's but on
> separate sockets that use separate external clocks)?
Ok. Thanks, the clarification about dealing with the multiple frequency
domains helps me understand what you're looking for and why per-cpu time
bases would be needed.
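
As a rough illustration of what a per-cpu time base might look like, here is
a minimal sketch. It assumes the usual mult/shift style of cycles-to-ns
conversion; the struct and function names (percpu_timebase, make_timebase,
timebase_read_ns) are hypothetical and not existing kernel interfaces. Each
cpu carries its own scaling factor, so counters running at different
frequencies still map onto the same nanosecond scale.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-cpu time base; not the kernel's actual structures. */
struct percpu_timebase {
    uint64_t base_cycles; /* counter value at the last sync point */
    uint64_t base_ns;     /* nanoseconds at the last sync point */
    uint32_t mult;        /* ns = (cycles * mult) >> shift */
    uint32_t shift;
};

/* Derive mult for a counter running at freq_hz (assumed known per cpu). */
struct percpu_timebase make_timebase(uint64_t freq_hz, uint32_t shift)
{
    struct percpu_timebase tb = {0};

    assert(freq_hz != 0 && shift < 32);
    /* Assumes the result fits in 32 bits, true for GHz-range counters. */
    tb.mult = (uint32_t)((1000000000ULL << shift) / freq_hz);
    tb.shift = shift;
    return tb;
}

/*
 * Convert a raw counter reading into nanoseconds using this cpu's own
 * scaling. The multiply can overflow for very large deltas, so a real
 * implementation would need to re-base periodically.
 */
uint64_t timebase_read_ns(const struct percpu_timebase *tb, uint64_t cycles)
{
    uint64_t delta = cycles - tb->base_cycles;

    return tb->base_ns + ((delta * tb->mult) >> tb->shift);
}
```

With this, a 2GHz counter that has advanced 2000000000 cycles and a 1GHz
counter that has advanced 1000000000 cycles both read as one second. What
the sketch does not cover is the hard part: keeping base_ns and mult on each
cpu steered toward a common reference as the frequencies drift.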
I was assuming that we were just looking at the single frequency domain,
but unsynced TSCs due to idle halting (or maybe just very slight
frequency skew).
<snip>
> Oh, I'm sure you can do hacky things, and work around known issues, and
> consider the TSC to be globally stable in a lot of common scenarios.
> That's what you get by re-syncing after idle etc. And it's going to work
> in a lot of situations.
Yea, and indeed this is the path we've been on, because folks have had quite
a bit of difficulty getting the single freq domain solution working. So
small hacks have been added over time, hoping to get there for just one
freq.
> But it's not going to solve the "hey, I have 512 CPU's, they are all on
> different boards, and no, they are _not_ synchronized to one global
> clock!".
Yep. And for now we dodge that by pushing to use a stable global
clocksource like HPET for these cases, at the cost of performance.
> That's why I'd suggest making _purely_ local time, and then aiming for
> something NTP-like. But maybe there are better solutions out there.
The difficulty with an NTP-like approach is that distributed systems tend to
expect slight deltas between machines, whereas userland gettimeofday() users
do not expect detectable skew between cpus.
Getting that last part right without those "at least don't go backwards"
hacks is hard.
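
To make that kind of hack concrete, here is a minimal sketch of a global
"never go backwards" clamp. It assumes C11 atomics and a single shared
last-returned value; the names last_ns and clamp_monotonic are made up for
illustration and this is not how any particular kernel path is written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Largest timestamp handed out so far, shared by all cpus (hypothetical). */
static _Atomic uint64_t last_ns;

/*
 * Return now_ns, unless some cpu has already returned a later value, in
 * which case return that later value instead. Callers then never observe
 * time going backwards, at the cost of a shared cacheline on every read.
 */
uint64_t clamp_monotonic(uint64_t now_ns)
{
    uint64_t last = atomic_load(&last_ns);
    uint64_t result = last;

    while (now_ns > last) {
        /* On failure the CAS reloads 'last' with the current value. */
        if (atomic_compare_exchange_weak(&last_ns, &last, now_ns)) {
            result = now_ns;
            break;
        }
        result = last;
    }

    /* The clamped value is never earlier than the raw reading. */
    assert(result >= now_ns);
    return result;
}
```

This only hides the skew rather than removing it: a cpu whose local clock is
behind will keep returning the same stale value until it catches up, and the
shared variable is exactly the kind of global state a purely local time
base is trying to avoid.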
I'll keep thinking about it.
thanks
-john