Message-ID: <alpine.DEB.2.20.1710052302440.2398@nanos>
Date: Thu, 5 Oct 2017 23:04:42 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Gabriel Beddingfield <gabe@...tlabs.com>
cc: LKML <linux-kernel@...r.kernel.org>,
Stephen Boyd <sboyd@...eaurora.org>,
John Stultz <john.stultz@...aro.org>,
Alessandro Zummo <a.zummo@...ertech.it>,
Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
linux-rtc@...r.kernel.org, Guy Erb <guy@...tlabs.com>,
Howard Harte <hharte@...tlabs.com>
Subject: Re: Extreme time jitter with suspend/resume cycles
On Thu, 5 Oct 2017, Gabriel Beddingfield wrote:
> Hi Thomas,
>
> On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >> > Which SoC/clocksource driver are you talking about?
> >>
> >> NXP i.MX 6SoloX
> >> drivers/clocksource/timer-imx-gpt.c
> >
> > So that clocksource driver looks correct. Do you have an idea in which
> > context this time jump happens? Does it happen when you exercise your high
> > frequency suspend/resume dance or is that happening just when you let the
> > machine run forever as well?
>
> We couldn't devise any reproduction steps. We observed it happening at
> unexpected times in a fleet of devices -- and we couldn't find any
> patterns to clue us in.
Ok. Did you talk to NXP about that? Or did you try to exercise reads in a
loop to detect the wreckage and maybe a pattern in there?
> > The timekeeping_resume() path definitely has an issue:
> >
> > 	cycle_now = tk_clock_read(&tk->tkr_mono);
> > 	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
> > 		cycle_now > tk->tkr_mono.cycle_last) {
> >
> > This works nicely for clocksources which won't wrap across suspend/resume but
> > not for those which can. That cycle_now > cycle_last check should take
> > cs->mask into account ...
> >
> > Of course, for clocksources which can wrap within realistic suspend times
> > (and 36 hours might count as one), this would need an extra sanity
> > check against an RTC to determine whether the wrap time has been exceeded.
> >
> > I haven't thought through whether that buggered check fully explains
> > what you are observing, but it's wrong nevertheless. John?
>
> Nah. It looks like the consequence is that you'll either fail to inject
> the sleep time or you'll fall back to having the RTC inject the sleep
> time. In our case, we never sleep for more than a couple of minutes, so
> the error would be seconds rather than hours.
Fair enough. It's still wrong though and wants to be fixed.
Thanks,
tglx