Message-ID: <CALAqxLVXiV9VuJtSAH_zS5UERGopnHNoxw=C33NhCtW69vBbpw@mail.gmail.com>
Date:	Fri, 2 Oct 2015 13:25:46 -0700
From:	John Stultz <john.stultz@...aro.org>
To:	Miroslav Lichvar <mlichvar@...hat.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Nuno Gonçalves <nunojpg@...il.com>,
	Prarit Bhargava <prarit@...hat.com>,
	Richard Cochran <richardcochran@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Shuah Khan <shuahkh@....samsung.com>
Subject: Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate
 adjtimex() tick adjustments

On Mon, Sep 14, 2015 at 7:48 AM, Miroslav Lichvar <mlichvar@...hat.com> wrote:
> On Thu, Sep 10, 2015 at 11:14:25AM -0700, John Stultz wrote:
>> On Thu, Sep 10, 2015 at 10:42 AM, John Stultz <john.stultz@...aro.org> wrote:
>> > On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar <mlichvar@...hat.com> wrote:
>> >> The precision of the clock is better than microsecond, so that
>> >> wouldn't explain a 12 ppm error over the 15 second interval. I guess
>> >> it's due to a larger xtime_remainder, which basically is a hidden
>> >> frequency offset added (and not multiplied) to the NTP frequency
>> >> offset. Would that explain it?
>> >
>> > I think it's due to the ntp_error being large enough prior to (or
>> > during) the freq transition that we're still applying a single unit
>> > freq adjustment for that error. But I'm guessing that on the acpi_pm
>> > clocksource the shift is low enough that a single unit adjustment is
>> > coarse enough to affect the ppm, since I see the same consistently
>> > measured ppm result if I increase both the settling time and the
>> > measurement sleep times. If I left it for a long, long time, the
>> > single unit correction would likely null the error out and we'd get
>> > the desired result, but I don't think the test has time for that.
>
> I ran a few tests and it doesn't seem to be a problem with a large
> ntp_error or an extremely slow adjustment of the multiplier for the
> new frequency.
>
> I think it really is the xtime_remainder correction. It is a fixed
> offset added to the ntp error on each tick to compensate for the
> cycle_interval rounding error. With the acpi_pm clocksource and a
> 1000Hz update rate, xtime_remainder is -127 ns, which effectively
> speeds up the clock by 127 ppm. When NTP slows the clock down by 10%,
> the correction is not decreased by 10%, and we can observe the clock
> running 12.7 ppm faster than expected.

Sorry for taking so long to get back to you here. Had a conference
(and related prep) that pulled me away.

So yeah.. I've spent some more time looking at this, and your argument
above looks pretty convincing, while my theory didn't prove out (clearing
out the ntp_error value on frequency changes doesn't avoid the issue).
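Just to make the arithmetic concrete, here's a quick standalone sketch
(plain userspace C, using the 1 ms tick and -127 ns remainder from your
numbers above; nothing here is kernel code):

#include <stdio.h>

int main(void)
{
	/* From the mail: with the acpi_pm clocksource and a 1000Hz update
	 * rate, xtime_remainder comes out to -127 ns per 1 ms tick. */
	double tick_ns = 1000000.0;   /* one NTP interval, in ns */
	double remainder_ns = 127.0;  /* magnitude of the fixed per-tick correction */

	/* The fixed correction expressed as a frequency offset. */
	double correction_ppm = remainder_ns / tick_ns * 1e6;  /* = 127 ppm */

	/* NTP slews the clock 10% slower, but the fixed correction is not
	 * scaled down with it, so a tenth of it survives as error. */
	double residual_ppm = correction_ppm * 0.10;           /* = 12.7 ppm */

	printf("correction %.1f ppm, residual after 10%% slew %.1f ppm\n",
	       correction_ppm, residual_ppm);
	return 0;
}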

> Is there a cheap way to calculate this?
> xtime_remainder * (ntp_tick >> ntp_error_shift) / NTP_INTERVAL_LENGTH



Hrm.. So:

   xtime_remainder = (NTP_INTERVAL_LENGTH << tk->tkr_mono.clock->shift)
                       - (tk->cycle_interval * tk->tkr_mono.clock->mult)

And, for simplification, we want to scale it as you pointed out above
(though slightly fixed here) by:

   (tk->ntp_tick >> tk->ntp_error_shift)
       / (NTP_INTERVAL_LENGTH << tk->tkr_mono.clock->shift)

So this comes out to:

   (tk->ntp_tick >> tk->ntp_error_shift)
       - (tk->ntp_tick >> tk->ntp_error_shift)
           * (tk->cycle_interval * tk->tkr_mono.clock->mult)
           / (NTP_INTERVAL_LENGTH << tk->tkr_mono.clock->shift)

And since tk->xtime_interval is tk->cycle_interval *
tk->tkr_mono.clock->mult, would:

   xtime_remainder = (tk->ntp_tick >> tk->ntp_error_shift) - tk->xtime_interval

after we've adjusted xtime_interval, give us the equivalent?
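To sanity-check that algebra, a standalone numerical sketch (userspace C,
all constants invented for illustration; only the relationships between
them mirror the formulas above). It shows the cheap form agreeing with the
scaled remainder, assuming the mult adjustment tracks the frequency change
exactly:

#include <stdio.h>

int main(void)
{
	/* Invented toy values: a 1 ms NTP interval with shift 16, set up so
	 * that xtime_interval falls 127 ns/tick short of the shifted
	 * interval (i.e. a -127 ns remainder, in shifted ns). */
	double nsec_interval   = 1000000.0 * 65536;           /* NTP_INTERVAL_LENGTH << shift */
	double xtime_interval  = nsec_interval - 127 * 65536; /* cycle_interval * mult */
	double xtime_remainder = nsec_interval - xtime_interval;

	/* NTP slows the clock 10%: ntp_tick >> ntp_error_shift shrinks by
	 * that factor, and (assuming the mult adjustment tracks it exactly)
	 * so does xtime_interval. */
	double freq_factor        = 0.9;
	double ntp_tick_shifted   = nsec_interval * freq_factor;
	double adj_xtime_interval = xtime_interval * freq_factor;

	/* The scaled remainder, expanded as in the mail ... */
	double scaled = ntp_tick_shifted
			- ntp_tick_shifted * xtime_interval / nsec_interval;

	/* ... and the proposed cheap form, using the adjusted interval. */
	double cheap = ntp_tick_shifted - adj_xtime_interval;

	printf("remainder %.0f, scaled %.0f, cheap %.0f\n",
	       xtime_remainder, scaled, cheap);
	return 0;
}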


>> So bumping the fail level to > 100ppm avoids false positives due to
>> long-term error correction with coarse clocksources, but still is
>> tight enough to catch the dampened approximation issue caused by the
>> abs(s64) problem.
>>
>> Any objection to moving to that? It is still a 0.01% error bound.
>
> No objection from me as long as we understand where that error is
> coming from.

Yeah, I think we agree. I did find one bug with the test (we can't
clear an existing offset if STA_PLL isn't set), so I'll be
resubmitting it here in a bit.
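For reference, the fix amounts to enabling STA_PLL before zeroing the
offset, since the kernel's ntp_update_offset() bails out early while
STA_PLL is clear. A rough userspace sketch against adjtimex(2), not the
actual test code:

#include <stdio.h>
#include <sys/timex.h>

int main(void)
{
	struct timex tx = { 0 };

	/* Enable the PLL so offset updates take effect; ADJ_OFFSET is
	 * silently ignored by the kernel while STA_PLL is clear. */
	tx.modes = ADJ_STATUS;
	tx.status = STA_PLL;
	if (adjtimex(&tx) < 0)
		perror("adjtimex(STA_PLL)");

	/* Now the leftover offset can actually be cleared. */
	tx.modes = ADJ_OFFSET;
	tx.offset = 0;
	if (adjtimex(&tx) < 0)
		perror("adjtimex(offset)");

	/* Drop STA_PLL again to leave the clock in a quiet state. */
	tx.modes = ADJ_STATUS;
	tx.status = 0;
	if (adjtimex(&tx) < 0)
		perror("adjtimex(status)");

	return 0;
}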

thanks
-john