linux-kernel - Re: Time keeping while suspended in the presence of persistent clock drift

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <2ab24da8-e37d-426a-9500-b7541d21f8a3@www.fastmail.com>
Date:   Tue, 14 Dec 2021 10:43:00 -0700
From:   "Joel Daniels" <jdaniels@...t.com>
To:     "Thomas Gleixner" <tglx@...utronix.de>,
        "John Stultz" <john.stultz@...aro.org>,
        "Stephen Boyd" <sboyd@...nel.org>
Cc:     linux-kernel@...r.kernel.org,
        "Alessandro Zummo" <a.zummo@...ertech.it>,
        "Alexandre Belloni" <alexandre.belloni@...tlin.com>,
        linux-rtc@...r.kernel.org, x86@...nel.org
Subject: Re: Time keeping while suspended in the presence of persistent clock drift

Thomas,

On Tue, Dec 14, 2021, at 6:57 AM, Thomas Gleixner wrote:
> thanks for making sure that this is really a RTC issue on that machine.

And thank you for taking an interest. I've measured the RTC drift over
a number of days and it is stable at around 3.8 seconds per day (or 44
ppm).

>> The "if" branch does not apply as I have no clock sources flagged as
>> CLOCK_SOURCE_SUSPEND_NONSTOP but the "else if" branch does apply.
>
> Which CPU is in that box?

Intel Celeron N4120. This is a Gemini Lake Refresh (Atom) chip.

The relevant bit from the early_init_intel function
(linux/arch/x86/kernel/cpu/intel.c) is:

    /* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
    if (c->x86 == 6) {
            switch (c->x86_model) {
            case INTEL_FAM6_ATOM_SALTWELL_MID:
            case INTEL_FAM6_ATOM_SALTWELL_TABLET:
            case INTEL_FAM6_ATOM_SILVERMONT_MID:
            case INTEL_FAM6_ATOM_AIRMONT_NP:
                    set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC_S3);
                    break;
            default:
                    break;
            }
    }

> The kernel does not believe. It relies on the accuracy of the CMOS clock
> which is usually pretty good.

The references I have found on CMOS clock accuracy [1, 2, 3, 4]
indicate that a drift of 1 or 2 seconds per day (10 to 20 ppm) is
typical. Hopefully people on linux-rtc can confirm?

If that is correct then my clock, at +44ppm, is an outlier but I
suspect that people with a consistent drift of only 1 second per day
would still benefit from being able to correct for it. Indeed, people
have been using hwclock and /etc/adjtime to correct for CMOS RTC
drift for decades.

> > I would like to provide a way for user space to inform the kernel
> > that the persistent clock drifts so it can make a corresponding
> > adjustment when resuming from a long suspend period.
> >
> > ...
> 
> That needs some thought. The RTC people (cc'ed now) might have opinions
> on that.

I agree that this needs thought. Three issues that I am particularly
worried about:

   [A] On machines with a persistent clock how is userspace supposed
       to be sure what drift to measure? Can it assume that the drift
       of the persistent clock used for sleep time injection is the
       same as the drift of /dev/rtc? This seems dangerous.

   [B] Sleep time injection can come from the "persistent clock" or,
       if there is no persistent clock, from an RTC driver. I'd like
       to correct for drift from the perisistant clock but not touch
       the RTC driver sleep time injection mechanism. Is this
       acceptable or do people feel that any drift correction should
       work with both mechanisms in order to ensure a polished
       interface?
       
   [C] Some users may want to correct for drift during suspend-to-RAM
       but during suspend-to-disk they might boot into some other
       operating system which itself sets the CMOS RTC. Hopefully,
       this could be solved from userspace by changing the drift
       correction parameter to 0 just before a suspend-to-disk
       operation.

I suspect that there are other things about which I should also be
worried if only I were less ignorant. That is why I am asking here.

Thanks,
Joel Daniels

[1] http://www.ntp.org/ntpfaq/NTP-s-trbl-spec.htm#AEN5674 :
    "A PC used a stratum 1 server with PPS had had a hardware fault,
    and it had been powered off for about 18 days. ... when the system
    was rebooted the RTC clock was off by 18 seconds. That would be
    an error of roughly 12 PPM."

[2] https://www.hindawi.com/journals/jcnc/2008/583162/ :
    "In IBM PC compatible computers, the RTC circuit is the Motorola
    146818, with a resolution of approximately one second and a
    significant drift"

[3] https://www.maximintegrated.com/en/design/
            technical-documents/app-notes/5/58.html :
    Tables 1, 2 and 3 list 32.768Khz crystals with typical frequency
    tolerances of around +/- 20ppm at 25 degrees celsius.

[4] https://www.greyware.com/software/domaintime/technical/
            accuracy/pcclocks.asp:
    "The resolution of most PC real-time clocks is one full second,
    and most RTCs drift considerably over time."