Message-ID: <alpine.DEB.2.20.1710051216050.2083@nanos>
Date:   Thu, 5 Oct 2017 13:01:17 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Gabriel Beddingfield <gabe@...tlabs.com>
cc:     LKML <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...eaurora.org>,
        John Stultz <john.stultz@...aro.org>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
        linux-rtc@...r.kernel.org, Guy Erb <guy@...tlabs.com>,
        hharte@...tlabs.com
Subject: Re: Extreme time jitter with suspend/resume cycles

On Wed, 4 Oct 2017, Gabriel Beddingfield wrote:
> On Wed, Oct 4, 2017 at 11:22 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> Long story short: you can't always have your low-power clock be your
> monotonic/sched clock.

sched_clock and the clocksource for timekeeping, which feeds monotonic, are
not required to be the same thing.
 
> The SoC we use backs the monotonic clock (sched_clock_register()) with a

Again. monotonic clock != sched clock. The clocksource which feeds the
monotonic timekeeper clock is registered via clocksource_register() & al.

> counter that is high frequency (>10 MHz) in their reference
> implementation. But it does not count when the system is in low-power
> mode. However, it can be configured to use a 32kHz clock that *does*
> count when the system is in low-power mode. So, we started by using this
> clock and setting the CLOCK_SOURCE_SUSPEND_NONSTOP flag. It worked
> great... at first.
> 
> Then we found that devices would randomly experience a 36-hour time jump.
> While we don't have a definitive root cause, the current theory is that
> we are getting non-atomic reads because the clock source isn't
> synchronized with the high frequency clock (which is used for most of
> the digital logic on the SoC).

Groan. Engineering based on theories is doomed to begin with.

Your 36 hour time jump is probably exactly 36.4089 hours as that's

     ((1 << 32) / 32768) / 3600

i.e. the 32bit rollover of the clocksource. So, if the clocksource->read()
function returns a full 64bit counter value, then it must have protection
against observing the rollover independent of the clock which feeds that
counter. Of course the frequency changes the probability of observing it,
but still the read function must be protected against observing the
rollover unconditionally.
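
To illustrate the kind of protection meant here, the following is a minimal
sketch of the common hi/lo/hi read pattern for a 64bit counter that is
exposed as two 32bit registers. It is not the driver under discussion (which
is unknown at this point). The register accessors reg_read_lo() and
reg_read_hi() and the sim_counter variable are hypothetical and only
simulate hardware that keeps counting between two register accesses.

```c
#include <stdint.h>

/* Hypothetical simulated hardware: a free-running 64bit counter. */
static uint64_t sim_counter;

/* Hypothetical accessor for the low 32 bits. The counter advances by one
 * tick after every access to model hardware that never stops counting. */
static uint32_t reg_read_lo(void)
{
	uint32_t v = (uint32_t)sim_counter;

	sim_counter++;
	return v;
}

/* Hypothetical accessor for the high 32 bits, same tick model. */
static uint32_t reg_read_hi(void)
{
	uint32_t v = (uint32_t)(sim_counter >> 32);

	sim_counter++;
	return v;
}

/* Unprotected read: if the low word wraps between the two accesses, the
 * halves belong to different counter values. */
static uint64_t read_naive(void)
{
	uint64_t lo = reg_read_lo();
	uint64_t hi = reg_read_hi();

	return (hi << 32) | lo;
}

/* Protected read: sample the high word before and after the low word and
 * retry when the high word changed in between. */
static uint64_t read_safe(void)
{
	uint32_t hi, lo, hi2;

	do {
		hi = reg_read_hi();
		lo = reg_read_lo();
		hi2 = reg_read_hi();
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

With sim_counter preset to 0xFFFFFFFF, read_naive() returns 0x1FFFFFFFF,
i.e. it is off by 1 << 32 ticks, which at 32768 Hz is the 131072 seconds
(36.4089 hours) from above. read_safe() retries once and returns a value
which the counter actually passed through during the read.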

Which SoC/clocksource driver are you talking about?

> Therefore, we moved the monotonic/sched clock back to the high-frequency source.

Please stop confusing timekeeping clock source and sched clock. They might
be the same physical device but conceptually they are different.

> Meanwhile, we can directly read the RTC clock on this system, and it will
> give us 32kHz resolution and also runs non-stop. Since reads are
> non-atomic, we have to read the registers in a loop. We used this
> register to implement read_persistent_clock64().  Because we have to read
> the registers in a loop, it seemed unfit for use as the monotonic/sched
> clock.

I can understand that. Though, using that value for injecting accurate
sleep time should just work with the existing code no matter how long the
actual sleep time was. The timekeeping core takes the nsec part of the
timespec value retrieved via read_persistent_clock64() into account.
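
As a rough sketch of what "taking the nsec part into account" requires from
the provider side: a raw tick count of a 32kHz counter can be split into
seconds and nanoseconds as below. The frequency of exactly 32768 Hz, the
struct ts64 stand-in for struct timespec64 and the helper name
ticks_to_ts64() are assumptions for illustration, not code from the SoC
driver in question.

```c
#include <stdint.h>

#define RTC_HZ		32768u		/* assumed counter frequency */
#define NSEC_PER_SEC	1000000000ull

/* Userspace stand-in for the kernel's struct timespec64. */
struct ts64 {
	int64_t	tv_sec;
	long	tv_nsec;
};

/* Hypothetical helper: convert a raw tick count into seconds plus the
 * sub-second remainder in nanoseconds (rounded down). */
static struct ts64 ticks_to_ts64(uint64_t ticks)
{
	struct ts64 ts;

	ts.tv_sec = (int64_t)(ticks / RTC_HZ);
	/* The remainder is below 32768, so the multiplication cannot overflow 64 bits. */
	ts.tv_nsec = (long)(((ticks % RTC_HZ) * NSEC_PER_SEC) / RTC_HZ);
	return ts;
}
```

A single tick maps to 30517 ns here, so the sub-second part of the sleep
time is preserved at roughly 30.5 us granularity rather than being
truncated to full seconds.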

I still have a hard time figuring out what you are trying to achieve.

Thanks,

	tglx
