linux-kernel - Re: [PATCH RFC v2] PM / sleep: Fix racing timers


Message-ID: <cdd55961-e247-4385-b86e-a0a0b794ab58@BN1BFFO11FD020.protection.gbl>
Date:	Tue, 29 Jul 2014 16:22:31 -0700
From:	Sören Brinkmann <soren.brinkmann@...inx.com>
To:	Stephen Boyd <sboyd@...eaurora.org>
CC:	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
	<linux-kernel@...r.kernel.org>, <linux-pm@...r.kernel.org>,
	Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH RFC v2] PM / sleep: Fix racing timers

On Tue, 2014-07-29 at 04:05PM -0700, Stephen Boyd wrote:
> On 07/28/14 13:02, Sören Brinkmann wrote:
> > On Mon, 2014-07-28 at 12:38PM -0700, Stephen Boyd wrote:
> >>
> >> Agreed. Perhaps I put it the wrong way. I'm worried that some timer
> >> needs to run just when we go into suspend. As long as that timer is the
> >> scheduler tick we should be ok, but if it isn't the scheduler tick then
> >> it would be good to know what it is and why it's pending. Unless the
> >> idea is that if we get this far into suspend and there's a pending timer
> >> we should just ignore it and go to sleep anyway?
> > Well, that is pretty much what happens currently. The IRQs are disabled
> > and nobody cares about the pending timer. 
> 
> Yep. It sounds like we don't know what it is so let's hope it's the
> sched tick. I suspect that driver suspend paths are canceling their
> timers because their hardware has been quiesced.
Drivers probably do, and the suspend path in general masks all non-timer
and non-wakeup interrupts.

> 
> > My problem with that is, that
> > "suspend" for Zynq is just waiting in WFI. Hence, the pending interrupts
> > causes an immediate resume.
> > So, it should hopefully be more or less fine since the current
> > implementation basically ignores the timer. With this patch we just shut
> > them down a little earlier to prevent this pending interrupt - at least
> > that is the intention.
> >
> 
> That sort of WFI based suspend doesn't actually sound like a memory
> suspend at all. It's really the "freeze" state where we would sit in the
> deepest CPU idle state waiting for some prescribed wakeup event (power
> button press, etc.) that would then trigger a wakeup_source to be
> activated and then wakeup the suspend thread.
> 
> Unless the WFI actually triggers some power state controller? For
> example, on the ARM platforms I have we trigger suspend via a WFI, which
> causes a power state controller to pull the power from the CPU that
> triggered the WFI and then goes ahead and turns off the rest of the SoC
> power and puts the ddr in self-refresh. If we have a pending irq then
> the power state controller would abort suspend and we'd come right back
> almost immediately (similar to your situation). The thing is we don't
> see any pending irqs and we don't have this patch, so I wonder if we
> just haven't hit this case, or if there's something more fundamental
> going on that causes a difference. Or maybe we do see this pending irq
> sometimes but we don't care because we'll try and go right back to
> suspend again.
On Zynq we don't have such sophisticated external helpers. The ARM core
does everything on its own, and powering it down is pretty much impossible
by design.
So, for Zynq we enable some low-power features, move execution to OCM,
shut down PLLs and DRAM as far as possible, and then just sit in WFI
(which might trigger some external low-power features such as SCU and L2$
standby). There is no smart external power controller involved.

One thing that might make this happen more frequently on Zynq is
that we use a 16-bit timer. I guess that timer overflowing all the
time probably causes more frequent interrupts than you'd see on
platforms with wider timers. After all, the window in which this
problem can occur is rather small.

	Thanks,
	Sören