linux-kernel - Re: Extreme time jitter with suspend/resume cycles

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOdF7nssJZZG_VKO+G_K6+aukOApmXp823yA9k3sUsSYpSB71g@mail.gmail.com>
Date:   Thu, 5 Oct 2017 13:51:03 -0700
From:   Gabriel Beddingfield <gabe@...tlabs.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...eaurora.org>,
        John Stultz <john.stultz@...aro.org>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
        linux-rtc@...r.kernel.org, Guy Erb <guy@...tlabs.com>,
        Howard Harte <hharte@...tlabs.com>
Subject: Re: Extreme time jitter with suspend/resume cycles

Hi Thomas,

On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> > Which SoC/clocksource driver are you talking about?
>>
>> NXP i.MX 6SoloX
>> drivers/clocksource/timer-imx-gpt.c
>
> So that clocksource driver looks correct. Do you have an idea in which
> context this time jump happens? Does it happen when you exercise your high
> frequency suspend/resume dance or is that happening just when you let the
> machine run forever as well?

We couldn't devise any reproduction steps. We observed it happening at
unexpected
times in a fleet of devices -- and we couldn't find any patterns to clue us in.

>
> The timekeeping_resume() path definitely has an issue:
>
>         cycle_now = tk_clock_read(&tk->tkr_mono);
>         if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
>                 cycle_now > tk->tkr_mono.cycle_last) {
>
> This works nice for clocksources which wont wrap across suspend/resume but
> not for those which can. That cycle_now -> cycle_last check should take
> cs-mask into account ...
>
> Of course for clocksources which can wrap within realistic suspend times,
> which 36 hours might be accounted for, this would need an extra sanity
> check against a RTC whether wrap time has been exceeded.
>
> I haven't thought it through whether that buggered check fully explains
> what you are observing, but it's wrong nevertheless. John?

Nah. It looks like the consequence is that you'll either fail to
inject the sleep time
or you'll fall back to having the RTC inject the sleep time. In our
case, we never
sleep for more than a couple of minutes so the error would be seconds rather
than hours.

-gabe