linux-kernel - Re: Time keeping while suspended in the presence of persistent clock drift

Message-Id: <4bb238e1-e8fa-44e6-9f5e-d047d1d4a892@www.fastmail.com>
Date:   Mon, 13 Dec 2021 06:39:43 -0700
From:   "Joel Daniels" <jdaniels@...t.com>
To:     "Thomas Gleixner" <tglx@...utronix.de>,
        "John Stultz" <john.stultz@...aro.org>,
        "Stephen Boyd" <sboyd@...nel.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: Time keeping while suspended in the presence of persistent clock drift

Hi Thomas,

On Sat, 11 Dec 2021 14:36 +0100, Thomas Gleixner wrote:
> Can you please verify that the problem persists with NTP enabled and
> synchronized?

Yes, I just verified that the problem still exists while
synchronized to NTP. Specifically, I connected the machine to the
internet and launched chrony with:

    $ sudo chronyd -d -L 0

using this /etc/chrony.conf:

    server time.cloudflare.com iburst
    driftfile /var/lib/chrony/drift
    maxslewrate 500
    rtcsync

I waited for chrony to stabilize:

    $ chronyc tracking && echo && chronyc sources
    [...]
    Ref time (UTC)  : Sun Dec 12 00:18:00 2021
    System time     : 0.000099906 seconds slow of NTP time
    [...]
    Frequency       : 2.096 ppm fast
    Residual freq   : -0.041 ppm
    Skew            : 0.985 ppm
    [...]

    MS Name/IP address         Stratum Poll Reach LastRx Last sample               
    ====================================================================
    ^* time.cloudflare.com           3   6   377    26   -282us[ -424us]

    $ sntp -K/dev/null time.cloudflare.com
    [...]
    2021-12-11 17:18:57.339207 (+0700) +0.001868 +/- 0.012247 [...]

Then I suspended the computer. I woke it 37 hours later and my system
clock was ahead by ~6 seconds:

    $ sntp -K/dev/null time.cloudflare.com
    [...]
    2021-12-13 06:24:28.425752 (+0700) -5.767757 +/- 3.856178 [...]
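
As a rough cross-check of the figures above, the apparent drift rate can be
estimated by dividing the accumulated offset by the time spent suspended.
The sketch below is illustrative only: drift_ppm is a hypothetical helper
name, the 37 hour figure is rounded, and the sntp error bound (+/- 3.86 s)
is large, so the result is an order-of-magnitude estimate.

```c
/*
 * Hypothetical helper (not part of any kernel or chrony API):
 * estimate drift in parts per million from an offset accumulated
 * over an elapsed interval, both given in seconds.
 */
double drift_ppm(double offset_s, double elapsed_s)
{
    return offset_s / elapsed_s * 1e6;
}
```

Calling drift_ppm(5.767757, 37 * 3600.0) gives roughly 43 ppm, far more than
the ~2 ppm frequency error chrony reported for the system clock while the
machine was running.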

Chrony, of course, started to slew away the error, and at the current
rate (maxslewrate 500 ppm) my system time will be correct in about 3 hours:

    $ chronyc tracking && echo && chronyc sources
    [...]
    Ref time (UTC)  : Mon Dec 13 13:30:52 2021
    System time     : 5.597892284 seconds fast of NTP time
    [...]
    Frequency       : 2.252 ppm fast
    Residual freq   : +0.603 ppm
    Skew            : 4.259 ppm
    [...]

    MS Name/IP address         Stratum Poll Reach LastRx Last sample               
    ====================================================================
    ^* time.cloudflare.com           3   6   377    15    +80us[ +197us]
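
The "about 3 hours" estimate follows from the configured maxslewrate. A
minimal sketch of the arithmetic, assuming chrony slews at the full
500 ppm limit for the whole correction (in practice the effective rate
can differ); slew_seconds is a hypothetical helper name:

```c
/*
 * Hypothetical helper: seconds needed to remove an offset (seconds)
 * when the clock is slewed at a constant rate given in ppm.
 */
double slew_seconds(double offset_s, double slew_ppm)
{
    return offset_s / (slew_ppm * 1e-6);
}
```

For the 5.597892284 second offset shown above, slew_seconds(5.597892284, 500.0)
is about 11196 seconds, or a little over 3.1 hours.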

This behavior is expected from my reading of the timekeeping_resume
function (kernel/time/timekeeping.c). Specifically these lines:

    read_persistent_clock64(&ts_new);

    ...

    cycle_now = tk_clock_read(&tk->tkr_mono);
    nsec = clocksource_stop_suspend_timing(clock, cycle_now);
    if (nsec > 0) {
            ts_delta = ns_to_timespec64(nsec);
            inject_sleeptime = true;
    } else if (timespec64_compare(&ts_new, &timekeeping_suspend_time) > 0) {
            ts_delta = timespec64_sub(ts_new, timekeeping_suspend_time);
            inject_sleeptime = true;
    }

    if (inject_sleeptime) {
            suspend_timing_needed = false;
            __timekeeping_inject_sleeptime(tk, &ts_delta);
    }

The "if" branch does not apply, as I have no clock sources flagged as
CLOCK_SOURCE_SUSPEND_NONSTOP, but the "else if" branch does apply.

The kernel seems to believe that the time spent sleeping is exactly
the difference of two calls to read_persistent_clock64 with no option
to adjust for persistent clock drift.

I would like to provide a way for user space to inform the kernel
that the persistent clock drifts so it can make a corresponding
adjustment when resuming from a long suspend period.
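
To make the idea concrete, here is a user-space sketch of the kind of
adjustment such a parameter could drive. It is not existing kernel code:
compensate_sleep_ns and the drift_ppb parameter (parts per billion,
positive when the persistent clock runs fast) are hypothetical names, and
the correction is a first-order approximation that is only reasonable
for small drift rates.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical: correct a sleep interval measured by the persistent
 * clock (in ns) for a known drift rate in parts per billion.
 * true ~= measured * (1 - drift), valid for small drift.
 */
int64_t compensate_sleep_ns(int64_t measured_ns, int64_t drift_ppb)
{
    int64_t sec = measured_ns / NSEC_PER_SEC;
    int64_t rem = measured_ns % NSEC_PER_SEC;

    /* 1 ppb of one second is 1 ns, so sec * drift_ppb is already in ns.
     * Splitting seconds and remainder avoids 64-bit overflow on long sleeps. */
    int64_t correction = sec * drift_ppb + (rem * drift_ppb) / NSEC_PER_SEC;

    return measured_ns - correction;
}
```

With a 37 hour suspend and an assumed drift of 43300 ppb, the correction is
about 5.77 seconds, which is in line with the error observed above.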

In my use case it would be enough for me to set this parameter on
boot. In use cases with continuous network access, NTP daemons
could be enhanced to periodically update this parameter with the
daemon's best estimate of the persistent clock drift.

Regards,
Joel