lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 1 Feb 2022 23:59:36 -0800
From:   Kelly Rossmoyer <krossmo@...gle.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Lee Jones <lee.jones@...aro.org>,
        Vijay Nayak <nayakvij@...gle.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Zichar Zhang <zichar.zhang@...aro.org>
Subject: Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support

+Zichar, to try to pull the threads together.

On Sun, Jan 30, 2022 at 6:46 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Sat, Jan 29, 2022 at 9:27 AM Kelly Rossmoyer <krossmo@...gle.com> wrote:
> >
> > On Thu, Jan 27, 2022 at 12:10 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > >
> > > That said, the general idea behind wakeup_source objects is that every
> > > system wakeup event should be recorded in one of them which then can
> > > be used for later analysis.
> > >
> > > If there are reasons why this cannot work in general, what are they?
> >
> > I won't presume to say that it "cannot work in general."  Nearly everyone on
> > this thread has more expertise here than I do, and I'm keenly aware of how
> > much I don't know.  :-)
> >
> > What I will say is that - across the chips and architectures I've worked upon
> > over the last few years - that concept has not appeared to match observed
> > reality.  From what I've seen (which is a very narrow slice of the Linux
> > universe, but I suspect is at least pretty representative of Android):
> > * resumes from successful suspends are typically accompanied by a flurry of
> >   wakesource activity from which it is not possible to determine what actually
> >   caused the resume (despite last changed timestamps)
>
> So I wonder how you are going to determine what actually caused the
> resume reliably.

I don't have the cross-platform/cross-arch experience to know if this is
broadly applicable, but right now what we're doing is taking:
* the first suspend abort reason that was captured if there was one, or if
  not...
* all pending wakeup-armed IRQs, if there are any, or if not...
* the first non-IRQ, non-abort reason logged, which mostly tends to emerge
  from platform-specific code (e.g. the Foo logic block decided it was
  time to power on for reasons that aren't reflected in any IRQs)

> > * resumes that aren't accompanied by wakeup-armed IRQs can be even
> >   less well-reflected by wakesource activity
>
> Do you have examples of these other than the aborts mentioned below?

Nothing super detailed, as my area of power work has only gotten down to
relatively high-level portions of the kernel as opposed to lower-level SOC
architecture.  But two examples that come to mind include:
* a secure watchdog interrupt fires, which wakes up a hypervisor that
  handles and clears the interrupt, leaving no IRQ pending by the time
  kernel execution resumes
* power control logic outside of the CPUs decides to turn CPU clusters
  back on to support a use case currently driven by other logic on the
  chip that doesn't involve any wakeup-armed IRQs (this could plausibly be
  something like a low-power logic block that handles audio playback
  causing the CPUs to be woken up so it can receive the next several
  seconds of audio data to buffer up before the kernel suspends again)

> > * I believe inferring wakeup reasons from wakesource stats would require
> >   having a snapshot from the last moment prior to suspend, which seems
> >   unsolvable from userspace
>
> That can be addressed by extending the wakeup sources in principle.
>
> > * suspend aborts (which can be even more harmful for battery life than
> >   "true" wakeups) are often caused by things that aren't reflected by specific
> >   wakesources (e.g. a driver failing to suspend)
>
> Which again can be addressed by using special wakeup sources for
> registering these "wakeups"  or similar.

Do you have the outline of a concept in mind, or is this more about the
general principle of extending what's there vs. adding something new?
(Apologies if what you're alluding to should be clear to me... I'm afraid
I don't have the relevant experience to envision what this could look like.)

> > And as I mentioned in my reply to Zichar, this isn't solely about
> > troubleshooting.  There's a lot of room to improve on user-focused power
> > attribution, and I'm hoping to build change in that direction upon the same
> > foundation.  Having the best possible data about "why we're awake" serves both
> > goals.
>
> Generally speaking, there is one wakeup-related framework in the
> kernel (wakeup sources) and you want to add another one sort of on top
> of it and it is still quite unclear to me what can be done with the
> new framework that cannot be achieved with the old one (possibly with
> some extensions),

I guess my (neophytic) impression has been that there are actually already
three frameworks (not just one) for wakeup-related data (where "wakeup" in
this sense is "the reason(s) we aren't suspended", including aborts):
* wakeup_source stats
* pm_wakeup_irq
* suspend_stats

And if those were extended, such that:
* the name of the last active wakeup_source was exposed to userspace
  to decode suspend aborts due to wakeup_count changes
* pm_wakeup_irq captured potentially multiple pending wakeup_armed
  IRQs during resume instead of just one
* some sense of causation was added to wakeup_sources to enable their
  use for non-IRQ resume causes (e.g. PM core outside of the CPUs
  turned CPUs back on for reasons that only platform logic can decode
  and report)
* the userspace interface to wakeup_source stats wasn't so flawed

Then very determined userspace code could combine those things with their
pre-suspend-attempt states and the fourth key piece of data (the return
value from the write to /sys/power/state, assuming kernel autosuspend
isn't used) to put together something that would be close to what the
Android wakeup_reason code is doing.  And then it would break because
there's nothing in the kernel tying those different pieces together in a
cohesive way that's guaranteed to work with some degree of stability.

> Let's first talk about the specific problems to address and then we'll
> decide whether or not we need yet another piece of infrastructure to
> address them.

I'm probably being overly optimistic, but my intention is more to tie
together and shore up the gaps in existing pieces that are currently
disjoint and incomplete, as opposed to just throwing yet another
framework into the mix.

--

Kelly Rossmoyer | Software Engineer | krossmo@...gle.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ