linux-kernel - Re: TSC to Mono-raw Drift

Message-ID: <20181102102613.GL19434@localhost>
Date:   Fri, 2 Nov 2018 11:26:13 +0100
From:   Miroslav Lichvar <mlichvar@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     John Stultz <john.stultz@...aro.org>,
        Christopher Hall <christopher.s.hall@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        linux-rt-users <linux-rt-users@...r.kernel.org>,
        jesus.sanchez-palencia@...el.com,
        Gavin Hindman <gavin.hindman@...el.com>,
        liam.r.girdwood@...el.com, Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: TSC to Mono-raw Drift

On Thu, Nov 01, 2018 at 06:41:00PM +0100, Thomas Gleixner wrote:
> On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> > The error is too large to be corrected by stepping on clock updates.
> > For a typical TSC frequency we have a multiplier in the range of a few
> > million, so that's a frequency error of up to a few hundred ppb. In the
> > old days when the clock was updated 1000 times per second that would
> > be hidden in the resolution of the clock, but now with tickless
> > kernels those steps would be very noticeable.

> That only happens when the system was completely idle for a second and in
> that case it's a non-issue because the clock is updated before it's
> used. So nothing will be able to observe the time jumping forward by a few
> or even a few hundred nanoseconds.

That's great news (to me). I think we should do the same with the
mono/real clock. A periodic 4ns step would be better than a slew
correcting tens or hundreds of nanoseconds. This would be a
significant improvement in accuracy on idle systems, in theory
identical to running with nohz=off.
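
To put rough numbers on the figures quoted above, here is a minimal
sketch of the arithmetic. It assumes that one unit of mult changes the
clock frequency by a relative 1/mult. The function names and the
example multiplier of 4,000,000 are illustrative only and are not taken
from the kernel.

```python
def mult_granularity_ppb(mult):
    """Relative frequency change, in ppb, caused by changing mult by one.

    Assumption: the clock advances by mult / 2**shift ns per cycle, so a
    +/-1 change in mult is a relative change of 1/mult.
    """
    return 1e9 / mult


def error_per_update_ns(freq_error_ppb, updates_per_second):
    """Time error, in ns, that accumulates between two clock updates."""
    interval_ns = 1e9 / updates_per_second
    return freq_error_ppb * 1e-9 * interval_ns


if __name__ == "__main__":
    ppb = mult_granularity_ppb(4_000_000)  # hypothetical multiplier
    for hz in (1, 100, 250, 1000):
        print(f"{hz:5d} updates/s: {error_per_update_ns(ppb, hz):8.2f} ns")
```

With these assumed values the granularity is 250 ppb, which is 250 ns
over a one-second idle period and 2.5 ns per update at 100 Hz.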

Maybe I am missing some important detail, but I think we can just drop
the +1 mult adjustment and step on each update by the (truncated)
amount that has accumulated in the NTP error register. With the
changes that have been made earlier this year the clock should never
be ahead, so the step would always be forward.
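
A toy model of that idea, not the kernel's timekeeping code: the
accumulated error is kept as an exact fraction of a nanosecond, and each
update steps the clock forward by the truncated whole-nanosecond part,
leaving the remainder for the next update. The name
simulate_truncated_steps and the per-update error of 2.5 ns are
hypothetical, and the model assumes the clock only ever falls behind.

```python
from fractions import Fraction


def simulate_truncated_steps(error_per_update_ns, updates):
    """Step forward by the truncated accumulated error on each update.

    `accumulated` stands in for the NTP error register. It is assumed to
    be non-negative, i.e. the clock is never ahead.
    """
    accumulated = Fraction(0)
    steps = []
    for _ in range(updates):
        accumulated += Fraction(error_per_update_ns)
        step = accumulated.numerator // accumulated.denominator  # truncate
        accumulated -= step  # keep the sub-nanosecond remainder
        steps.append(step)
    return steps, accumulated


if __name__ == "__main__":
    steps, remainder = simulate_truncated_steps(Fraction(5, 2), 4)
    print(steps, remainder)
```

In this model every step is forward and the residual error after an
update stays below 1 ns.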

> For the regular case, where CPUs are
> busy and the update happens 100/250/1000 times per second the jump forward
> will not be noticeable at all.

I think a 4ns jump at 100 Hz might be noticeable with a good reference
clock and a large number of measurements, but so would the current +1
mult adjustment.

-- 
Miroslav Lichvar