lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALAqxLXf6TmOn_jCOv68oop=4On+CN-p_KkN-70BDt9OjQhzUw@mail.gmail.com>
Date:   Wed, 15 Dec 2021 13:06:30 -0800
From:   John Stultz <john.stultz@...aro.org>
To:     Joel Daniels <jdaniels@...t.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>, linux-kernel@...r.kernel.org,
        Alessandro Zummo <a.zummo@...ertech.it>,
        Alexandre Belloni <alexandre.belloni@...tlin.com>,
        linux-rtc@...r.kernel.org, x86@...nel.org
Subject: Re: Time keeping while suspended in the presence of persistent clock drift

On Tue, Dec 14, 2021 at 9:43 AM Joel Daniels <jdaniels@...t.com> wrote:
> On Tue, Dec 14, 2021, at 6:57 AM, Thomas Gleixner wrote:
> > thanks for making sure that this is really a RTC issue on that machine.
>
> And thank you for taking an interest. I've measured the RTC drift over
> a number of days and it is stable at around 3.8 seconds per day (or 44
> ppm).
>
> >> The "if" branch does not apply as I have no clock sources flagged as
> >> CLOCK_SOURCE_SUSPEND_NONSTOP but the "else if" branch does apply.
> >
> > Which CPU is in that box?
>
> Intel Celeron N4120. This is a Gemini Lake Refresh (Atom) chip.
>
> The relevant bit from the early_init_intel function
> (linux/arch/x86/kernel/cpu/intel.c) is:
>
>     /* Penwell and Cloverview have the TSC which doesn't sleep on S3 */
>     if (c->x86 == 6) {
>             switch (c->x86_model) {
>             case INTEL_FAM6_ATOM_SALTWELL_MID:
>             case INTEL_FAM6_ATOM_SALTWELL_TABLET:
>             case INTEL_FAM6_ATOM_SILVERMONT_MID:
>             case INTEL_FAM6_ATOM_AIRMONT_NP:
>                     set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC_S3);
>                     break;
>             default:
>                     break;
>             }
>     }
>
> > The kernel does not believe. It relies on the accuracy of the CMOS clock
> > which is usually pretty good.
>
> The references I have found on CMOS clock accuracy [1, 2, 3, 4]
> indicate that a drift of 1 or 2 seconds per day (10 to 20 ppm) is
> typical. Hopefully people on linux-rtc can confirm?
>
> If that is correct then my clock, at +44ppm, is an outlier but I
> suspect that people with a consistent drift of only 1 second per day
> would still benefit from being able to correct for it. Indeed, people
> have been using hwclock and /etc/adjtime to correct for CMOS RTC
> drift for decades.
>
> > > I would like to provide a way for user space to inform the kernel
> > > that the persistent clock drifts so it can make a corresponding
> > > adjustment when resuming from a long suspend period.
> > >
> > > ...
> >
> > That needs some thought. The RTC people (cc'ed now) might have opinions
> > on that.
>
> I agree that this needs thought. Three issues that I am particularly
> worried about:

I'm not really active in this space much anymore, but a few of my
(possibly wrongheaded) thoughts:

>    [A] On machines with a persistent clock how is userspace supposed
>        to be sure what drift to measure? Can it assume that the drift
>        of the persistent clock used for sleep time injection is the
>        same as the drift of /dev/rtc? This seems dangerous.

Yea, there can be multiple RTCs as well.

>    [B] Sleep time injection can come from the "persistent clock" or,
>        if there is no persistent clock, from an RTC driver. I'd like
>        to correct for drift from the perisistant clock but not touch
>        the RTC driver sleep time injection mechanism. Is this
>        acceptable or do people feel that any drift correction should
>        work with both mechanisms in order to ensure a polished
>        interface?

This dual interface comes from the desire to support both the more
atomic/earlier correction we can do w/ the persistent_clock interface
while holding the timekeeping lock, while also supporting RTC devices
that may sleep when being read, or may have dependencies that aren't
ready that early in resume.

Admittedly having two separate abstractions here is a bit of a pain,
and fixing just one side doesn't make it better.

>    [C] Some users may want to correct for drift during suspend-to-RAM
>        but during suspend-to-disk they might boot into some other
>        operating system which itself sets the CMOS RTC. Hopefully,
>        this could be solved from userspace by changing the drift
>        correction parameter to 0 just before a suspend-to-disk
>        operation.

Oof. This feels particularly complex and fragile to try to address.

> I suspect that there are other things about which I should also be
> worried if only I were less ignorant. That is why I am asking here.

Personally, I'm not sure this warrants adding new userland interfaces
for. I'd probably lean towards having the RTC framework internally
measure and correct for drift, rather than adding an extra knob in
userland.

Alternatively I'd go very simple and just put the correction factor in
a boot argument.

thanks
-john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ