linux-kernel - Re: Extreme time jitter with suspend/resume cycles

Message-ID: <alpine.DEB.2.20.1710051943570.2398@nanos>
Date:   Thu, 5 Oct 2017 20:01:16 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Gabriel Beddingfield <gabe@...tlabs.com>
cc:     LKML <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...eaurora.org>,
        John Stultz <john.stultz@...aro.org>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
        linux-rtc@...r.kernel.org, Guy Erb <guy@...tlabs.com>,
        Howard Harte <hharte@...tlabs.com>
Subject: Re: Extreme time jitter with suspend/resume cycles

Gabriel,

On Thu, 5 Oct 2017, Gabriel Beddingfield wrote:
> On Thu, Oct 5, 2017 at 4:01 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> > i.e. the 32bit rollover of the clocksource. So, if the clocksource->read()
> > function returns a full 64bit counter value, then it must have protection
> > against observing the rollover independent of the clock which feeds that
> > counter. Of course the frequency changes the probability of observing it,
> > but still the read function must be protected against observing the
> > rollover unconditionally.
> 
> Right, but isn't this what clocksource->mask is supposed to do? When we change
> the back-end frequency, we're still using the same front-end 32-bit register and
> we don't see the same jumps.

Right. That's what the mask should protect. I was assuming that this is one
of the fancy clocksources which expose two 32bit registers of a 64bit
counter and the rollover protection was missing. So that's not the
case. Good, or not so good :)
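
For reference, the rollover protection meant here for a 64bit counter split
across two 32bit registers is usually a high/low/high read loop. The sketch
below is an illustration only: read_counter64() and its register pointers are
hypothetical names and are not taken from any driver in this thread.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: read a 64bit counter exposed as two 32bit
 * registers. If the high word changed while the low word was being
 * read, the low word rolled over in between, so retry.
 */
static uint64_t read_counter64(const volatile uint32_t *hi_reg,
			       const volatile uint32_t *lo_reg)
{
	uint32_t hi, lo, hi_again;

	do {
		hi = *hi_reg;
		lo = *lo_reg;
		hi_again = *hi_reg;
	} while (hi != hi_again);

	return ((uint64_t)hi << 32) | lo;
}
```

Without the retry, a read that straddles the low word rollover could combine
an old high word with a new low word and appear to jump backwards by about
2^32 cycles.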

> > Which SoC/clocksource driver are you talking about?
> 
> NXP i.MX 6SoloX
> drivers/clocksource/timer-imx-gpt.c

So that clocksource driver looks correct. Do you have an idea in which
context this time jump happens? Does it happen when you exercise your high
frequency suspend/resume dance or is that happening just when you let the
machine run forever as well?

The timekeeping_resume() path definitely has an issue:

        cycle_now = tk_clock_read(&tk->tkr_mono);
        if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
                cycle_now > tk->tkr_mono.cycle_last) {

This works nicely for clocksources which won't wrap across suspend/resume,
but not for those which can. That cycle_now > cycle_last check should take
cs->mask into account ...
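
To illustrate why the plain comparison misbehaves on a wrapping counter, here
is a minimal userspace sketch. The helper names are hypothetical, and this is
not a proposed patch; it only contrasts the unmasked comparison with a masked
delta, assuming at most one wrap happened between the two reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the unmasked test, as in the resume path quoted above. */
static inline bool naive_advanced(uint64_t now, uint64_t last)
{
	return now > last;
}

/*
 * Hypothetical: mask aware delta. The subtraction is done modulo the
 * counter width, so a single wrap between 'last' and 'now' still
 * yields the number of cycles that actually elapsed.
 */
static inline uint64_t masked_delta(uint64_t now, uint64_t last,
				    uint64_t mask)
{
	return (now - last) & mask;
}
```

With a 32bit mask, last = 0xfffffff0 and now = 0x10, naive_advanced() says
the clock did not advance, while masked_delta() returns 0x20 cycles.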

Of course, for clocksources which can wrap within realistic suspend times,
and 36 hours might well count as one, this would need an extra sanity
check against an RTC to see whether the wrap time has been exceeded.
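
A rough sketch of such a sanity check, under stated assumptions: the function
names are hypothetical, the RTC is assumed to report elapsed suspend time in
whole seconds, and the counter frequency is assumed to be constant across
suspend.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: whole seconds a counter of width 'mask' running at
 * 'freq_hz' can count before wrapping, rounded down. Returns 0 for an
 * unknown frequency so that callers treat the delta as untrustworthy.
 */
static uint64_t wrap_seconds(uint64_t mask, uint64_t freq_hz)
{
	if (freq_hz == 0)
		return 0;
	return mask / freq_hz;
}

/*
 * Hypothetical: the masked counter delta is only unambiguous if the
 * RTC says we were suspended for less than one wrap period. A real
 * check would likely want extra margin for RTC granularity.
 */
static bool delta_is_unambiguous(uint64_t rtc_elapsed_sec, uint64_t mask,
				 uint64_t freq_hz)
{
	return rtc_elapsed_sec < wrap_seconds(mask, freq_hz);
}
```

As an example, a 32bit mask with an assumed 32768 Hz input gives
wrap_seconds() = 131071, i.e. roughly 36.4 hours.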

I haven't thought it through whether that buggered check fully explains
what you are observing, but it's wrong nevertheless. John?

Thanks,

	tglx