lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOdF7nssJZZG_VKO+G_K6+aukOApmXp823yA9k3sUsSYpSB71g@mail.gmail.com>
Date:   Thu, 5 Oct 2017 13:51:03 -0700
From:   Gabriel Beddingfield <gabe@...tlabs.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Stephen Boyd <sboyd@...eaurora.org>,
        John Stultz <john.stultz@...aro.org>,
        Alessandro Zummo <a.zummo@...ertech.it>,
        Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
        linux-rtc@...r.kernel.org, Guy Erb <guy@...tlabs.com>,
        Howard Harte <hharte@...tlabs.com>
Subject: Re: Extreme time jitter with suspend/resume cycles

Hi Thomas,

On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> > Which SoC/clocksource driver are you talking about?
>>
>> NXP i.MX 6SoloX
>> drivers/clocksource/timer-imx-gpt.c
>
> So that clocksource driver looks correct. Do you have an idea in which
> context this time jump happens? Does it happen when you exercise your high
> frequency suspend/resume dance or is that happening just when you let the
> machine run forever as well?

We couldn't devise any reproduction steps. We observed it happening at
unexpected
times in a fleet of devices -- and we couldn't find any patterns to clue us in.

>
> The timekeeping_resume() path definitely has an issue:
>
>         cycle_now = tk_clock_read(&tk->tkr_mono);
>         if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
>                 cycle_now > tk->tkr_mono.cycle_last) {
>
> This works nice for clocksources which wont wrap across suspend/resume but
> not for those which can. That cycle_now -> cycle_last check should take
> cs-mask into account ...
>
> Of course for clocksources which can wrap within realistic suspend times,
> which 36 hours might be accounted for, this would need an extra sanity
> check against a RTC whether wrap time has been exceeded.
>
> I haven't thought it through whether that buggered check fully explains
> what you are observing, but it's wrong nevertheless. John?

Nah. It looks like the consequence is that you'll either fail to
inject the sleep time
or you'll fall back to having the RTC inject the sleep time. In our
case, we never
sleep for more than a couple of minutes so the error would be seconds rather
than hours.

-gabe

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ