Date:   Wed, 4 Oct 2017 09:11:13 -0700
From:   Gabriel Beddingfield <gabe@...tlabs.com>
To:     LKML <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...eaurora.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        John Stultz <john.stultz@...aro.org>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
        linux-rtc@...r.kernel.org
Cc:     Guy Erb <guy@...tlabs.com>, hharte@...tlabs.com
Subject: Extreme time jitter with suspend/resume cycles

TL;DR: the "delta_delta" hack [1][2] in kernel/time/timekeeping.c
and drivers/rtc/class.c undermines the NTP system. It's not
appropriate to use when sub-second precision is available. I've attached
a patch to resolve this... please let me know the ways you hate it.
:-)

Hello Kernel Timekeeping Maintainers,

We have been developing a device that has a very aggressive power
policy, doing suspend/resume cycles a few times a minute ("echo mem >
/sys/power/state"). In doing so, we found that the system time
experiences a lot of jitter (compared to, say, an NTP server). It was
not uncommon for us to see time corrections of 1s to 4s on a regular
basis. This didn't happen when the device stayed awake, only when it
was allowed to do suspend/resume.
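For anyone trying to reproduce this, the suspend trigger is only a write of "mem" to /sys/power/state. A minimal C sketch of that step follows. The helper name request_suspend is hypothetical, and the path is a parameter so the helper can be exercised against an ordinary file. Pointed at /sys/power/state it needs root and really will suspend the machine.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the C equivalent of `echo mem > <state_path>`.
 * Returns 0 on success, -1 on failure. */
static int request_suspend(const char *state_path)
{
	assert(state_path != NULL);

	FILE *f = fopen(state_path, "w");
	if (f == NULL)
		return -1;

	/* Write the same bytes echo would produce, including the newline. */
	int ok = fputs("mem\n", f) != EOF;

	/* stdio buffers the write, so it reaches the kernel at fclose();
	 * any error reported by the write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling it in a loop, and comparing the system clock against an NTP server after each resume, mirrors the "few times a minute" cycling described above.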

We found that the problem is an interaction between the NTP code and
what I call the "delta_delta hack" (see [1] and [2]). This code keeps
a static variable inside a function that holds the offset from the
system time to the persistent/rtc clock. It uses that offset to fudge
the suspend timestamp so that the sleep time computed on resume is
compensated. It's kind of a statistical hack that assumes things will
average out. It seems to rest on two main assumptions:

  1. The persistent/rtc clock has only single-second precision.
  2. The system does not frequently suspend/resume.

The code treats these assumptions as "true" whenever delta_delta is
less than 2 seconds.
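The mechanism can be modeled in a few lines of C. This is a simplified userspace sketch of one reading of the v4.13 suspend/resume path, not the kernel code itself: times are plain signed nanoseconds instead of struct timespec64, the persistent clock is assumed to report whole seconds, and the names dd_model, dd_suspend and dd_resume are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical, simplified state kept across suspends (all in ns). */
struct dd_model {
	int64_t old_delta;     /* plays the role of the static old_delta */
	int64_t suspend_stamp; /* persistent-clock stamp taken at suspend */
};

/* Suspend path: record the persistent clock, then either accept a large
 * change in the system-to-persistent offset (>= 2 s) as a real time
 * change, or shift the stamp by the small change (delta_delta). */
static void dd_suspend(struct dd_model *m, int64_t sys_now, int64_t persist_now)
{
	assert(m != NULL);
	int64_t delta = sys_now - persist_now;
	int64_t delta_delta = delta - m->old_delta;

	m->suspend_stamp = persist_now;
	if (llabs(delta_delta) >= 2 * NSEC_PER_SEC)
		m->old_delta = delta;            /* assume a deliberate time change */
	else
		m->suspend_stamp += delta_delta; /* fudge the suspend timestamp */
}

/* Resume path: inject the apparent sleep time into the system clock and
 * return the new system time. */
static int64_t dd_resume(const struct dd_model *m, int64_t sys_at_suspend,
			 int64_t persist_now)
{
	assert(m != NULL);
	int64_t sleep = persist_now - m->suspend_stamp;
	return sleep > 0 ? sys_at_suspend + sleep : sys_at_suspend;
}
```

For example, once old_delta is established at 4000.7 s, an NTP correction of +0.3 s applied before the next suspend makes delta_delta 0.3 s. The suspend stamp is shifted by exactly that amount, so after resume the system clock again reads persistent clock + 4000.7 s and the correction is gone.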

Because the delta_delta hack is trying to maintain an offset from the
system time to the persistent/rtc clock, any small (under 2 seconds) NTP
corrections that have occurred since the last suspend will be discarded. However,
the NTP subsystem isn't notified that this is happening -- and so it
causes some level of instability in its PLL logic.
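Following the arithmetic of the suspend and resume paths shows why. The notation here is ours, not the kernel's: $x_s$ is the system time at suspend, $p_s$ and $p_r$ are the persistent clock readings at suspend and resume, $\Delta_0$ is the saved old_delta, and the derivation assumes $|\delta\delta| < 2\,\mathrm{s}$, so the suspend timestamp gets adjusted.

```latex
\begin{aligned}
\Delta &= x_s - p_s, \qquad \delta\delta = \Delta - \Delta_0, \\
\tilde{p}_s &= p_s + \delta\delta, \\
x_r &= x_s + (p_r - \tilde{p}_s) \\
    &= x_s + p_r - p_s - (x_s - p_s - \Delta_0) \\
    &= p_r + \Delta_0 .
\end{aligned}
```

The post-resume system time $x_r$ no longer depends on $x_s$, so any NTP adjustment folded into $x_s$ since $\Delta_0$ was recorded is dropped on resume, without the PLL being told.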

This problem affects any device that does "frequent" suspend/resume
cycles, i.e. essentially any battery-powered "linux" device (like Android phones).

Many ARM systems provide a "persistent clock." Most of them are backed
by a 32kHz clock that gives good precision and makes the delta_delta
hack unnecessary. However, devices whose persistent clock has only
single-second precision, or that are forced to use the RTC (whose API
only allows for single-second precision), still need this hack.
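To put numbers on the precision difference, here is a small C helper that converts ticks of a free-running counter into nanoseconds. It assumes the common 32768 Hz rate for a "32kHz" clock; the name ticks_to_ns is ours and is not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Convert ticks of a counter running at `hz` into nanoseconds (rounded
 * down). Splitting into whole seconds plus a remainder keeps the
 * rem * NSEC_PER_SEC product below 2^62 for any 32-bit hz. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t hz)
{
	assert(hz > 0);
	uint64_t sec = ticks / hz;
	uint64_t rem = ticks % hz;
	return sec * NSEC_PER_SEC + rem * NSEC_PER_SEC / hz;
}
```

One tick at 32768 Hz is about 30.5 µs, so rounding a suspend or resume timestamp to the tick costs at most that much, versus up to a full second for a clock that only counts whole seconds.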

I've attached a patch that we developed in-house. I have a feeling you
won't like it, since it pushes the responsibility onto whoever
configures the kernel. It also ignores the RTC problem (which will
still affect a lot of battery-powered devices).

Please let me know what you think -- and what the right approach for
solving this would be.

Thanks,
Gabe

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/kernel/time/timekeeping.c?h=v4.13.4#n1717
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/drivers/rtc/class.c?h=v4.13.4#n76

[Attachment: "0001-time-add-CONFIG_PERSISTENT_CLOCK_IS_LOW_PRECISION-to.patch" (text/x-patch, 2565 bytes)]
