Message-ID: <CALAqxLXhm3wGVZ0EVe-pOH2i3cq1S3AsROaanJ0xX51tQ3+fHg@mail.gmail.com>
Date: Fri, 26 May 2017 20:58:13 -0700
From: John Stultz <john.stultz@...aro.org>
To: Paul Mackerras <paulus@...abs.org>
Cc: Michael Ellerman <mpe@...erman.id.au>,
lkml <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Miroslav Lichvar <mlichvar@...hat.com>,
Richard Cochran <richardcochran@...il.com>,
Prarit Bhargava <prarit@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>,
Anton Blanchard <anton@...ba.org>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Tony Luck <tony.luck@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>
Subject: Re: [RFC][PATCH] time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
On Thu, May 25, 2017 at 5:03 AM, Paul Mackerras <paulus@...abs.org> wrote:
> On Mon, May 22, 2017 at 12:06:04PM -0700, John Stultz wrote:
>>
>> Basically long ago, timekeeping was handled (roughly) like:
>>
>> clock_gettime():
>>     now = tk->clock->read()
>>     offset_ns = ((now - tk->cycle_last) * tk->clock->mult) >> tk->clock->shift;
>>     return timespec_add_ns(tk->xtime, offset_ns);
>>
>> But since the error handling uses sub-ns precision, and since for
>> update performance we accumulate in fixed intervals, there were
>> situations where an update could accumulate half of a nanosecond
>> into the base tk->xtime value while leaving half of a nanosecond in
>> the offset. That split nanosecond was then truncated out by the
>> math, causing 1ns discontinuities.
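>>
>> A worked example with made-up numbers: say each cycle is worth
>> 0.75ns. With cycle_last = 0 and xtime = 0, a reader at cycle 3 sees
>> floor(3 * 0.75) = 2ns. If an update then accumulates 2 cycles,
>> xtime only gains floor(2 * 0.75) = 1ns and the 0.5ns remainder is
>> dropped, so the same reader at cycle 3 now sees
>> 1 + floor(1 * 0.75) = 1ns, i.e. time stepped backwards by 1ns.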
>>
>> So to address this, we came up with sort of a hack: when we
>> accumulate, we round up that partial nanosecond and add the amount
>> we rounded up to the error (which causes the freq correction code
>> to slow the clock down slightly). This is what is now done in the
>> old_vsyscall_fixup() logic.
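>>
>> Roughly, as a sketch of the idea (from memory, so field names and
>> details are approximate, not the exact kernel code):
>>
>> old_vsyscall_fixup(tk):
>>     /* sub-ns remainder left in the shifted accumulator */
>>     remainder = tk->tkr_mono.xtime_nsec & ((1 << shift) - 1);
>>     if (remainder) {
>>         /* round up to the next whole (shifted) nanosecond... */
>>         tk->tkr_mono.xtime_nsec += (1 << shift) - remainder;
>>         /* ...and charge the round-up to the NTP error term so
>>            the freq correction steers the clock back down */
>>         tk->ntp_error -= ((1 << shift) - remainder) << tk->ntp_error_shift;
>>     }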
>>
>> Unfortunately this fix (which generates up to a nanosecond of error
>> per tick) then made the freq correction code do more work and made it
>> more difficult to have a stable clock.
>>
>> So we went for a more proper fix, which was to handle the
>> sub-nanosecond portion of the timekeeping correctly throughout the
>> logic, doing the truncation last.
>>
>> clock_gettime():
>>     now = tk->clock->read()
>>     ret.tv_sec = tk->xtime_sec;
>>     offset_sns = (now - tk->cycle_last) * tk->clock->mult;
>>     ret.tv_nsec = (offset_sns + tk->tkr_mono.xtime_nsec) >> tk->clock->shift;
>>     return ret;
>>
>> So in the above we now use tk->tkr_mono.xtime_nsec (which, despite
>> its unfortunate name, stores the accumulated shifted nanoseconds),
>> add it to (current_cycle_delta * clock->mult), and do the shift
>> last, to preserve as much precision as we can.
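>>
>> Continuing the made-up 0.75ns/cycle example above: after the update
>> accumulates 2 cycles, xtime_nsec holds the full 1.5ns in shifted
>> units, so a reader at cycle 3 computes floor(0.75 + 1.5) = 2ns, the
>> same answer it saw before the update, and there is no discontinuity.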
>>
>> Unfortunately we needed all the reader code to do the same, which
>> wasn't easy to transition in some cases. So we provided the
>> CONFIG_GENERIC_TIME_VSYSCALL_OLD option to preserve the old
>> round-up behavior while arch maintainers made the transition.
>
> The VDSO code on PPC computes the offset in units of 2^-32 seconds,
> not nanoseconds, because that makes it easy to handle the split of the
> offset into whole seconds and fractional seconds (which is handled in
> the generic code by the slightly icky __iter_div_u64_rem function),
> and also means that we can use PPC's instruction that computes
> (a * b) >> 32 to convert the fractional part to either nanoseconds or
> microseconds without doing a division.
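>
> (For illustration, a sketch of that conversion in C, not the actual
> PPC VDSO asm:
>
>     /* frac is in units of 2^-32 seconds; returns nanoseconds.
>      * The widening multiply plus high-word extract is exactly
>      * what a mulhwu-style (a * b) >> 32 instruction computes,
>      * so no division is needed. */
>     static inline u32 frac32_to_ns(u32 frac)
>     {
>         return ((u64)frac * NSEC_PER_SEC) >> 32;
>     }
>
> Substituting USEC_PER_SEC gives microseconds the same way.)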
>
> I could pretty easily change the computations done at update_vsyscall
> time to convert the tk->tkr_mono.xtime_nsec value to units of 2^-32
> seconds for use by the VDSO. That would mean we would no longer need
> CONFIG_GENERIC_TIME_VSYSCALL_OLD, and would give us values returned by
> the VDSO gettimeofday() and clock_gettime() that should be within
> about 1/4 ns of what the generic code in the kernel would give (on
> average, I mean, given that the results have at best nanosecond
> resolution). Since that corresponds to about 1 CPU clock cycle, it
> seems like it should be good enough.
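>
> (One unit of 2^-32 seconds is about 0.23ns, which is where the
> roughly 1/4 ns figure comes from. A sketch of the update-time
> conversion, assuming shift <= 32, with hypothetical naming:
>
>     /* xtime_nsec holds nanoseconds << shift; convert the
>      * fractional second to units of 2^-32 seconds.  Scaling by
>      * 2^(32 - shift) first cannot overflow a u64, since
>      * xtime_nsec is kept below NSEC_PER_SEC << shift. */
>     static inline u64 shifted_ns_to_frac32(u64 xtime_nsec, u32 shift)
>     {
>         return div_u64(xtime_nsec << (32 - shift), NSEC_PER_SEC);
>     }
>
> The truncation here is what keeps the error under one 2^-32 s unit.)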
>
> Alternatively I could make the VDSO computations use a smaller unit
> (maybe 2^-36 or 2^-40 seconds), or else rewrite them to use exactly
> the same algorithm as the generic code - which would be a bigger
> change, and would mean having to do an iterative division.
>
> So, do you think the 1/4 ns resolution is good enough for the VDSO
> computations?
Sorry not to have gotten back to you yesterday on this! So yea, the
above sounds reasonable to me. We only return ns resolution to
userspace, so I don't believe one could make a normal syscall that
calculates time and then make a gettime call within a quarter of a
nanosecond of it.
But I can't say I've ever really grokked the ppc logic, so take my
opinion with a grain of salt. :)
thanks
-john