[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAE9iGoiCZZBkyX9ZWnhSDMjWmucOmybCOp=XTr6Hz5rN9GNyrw@mail.gmail.com>
Date: Wed, 26 Jan 2022 13:09:21 +0800
From: Zichar Zhang <zichar.zhang@...aro.org>
To: John Stultz <john.stultz@...aro.org>
Cc: Kelly Rossmoyer <krossmo@...gle.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Lee Jones <lee.jones@...aro.org>,
Vijay Nayak <nayakvij@...gle.com>, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org,
Sumit Semwal <sumit.semwal@...aro.org>
Subject: Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support
hello John, Kelly!
I'm working on upstream "wakeup_reason" patch.
But I found It's a little bit hard to do that, cause it is too heavy.
The android patch put "wakeup_reason", suspend/resume process and
interrupt subsystem together make it hard to upstream all these at once. I
think maybe we can cooperate with each other, and do it step by step.
And I make a simplified "wakeup_reason" patch as the start.
It just report the normal IRQs which cause system wake up from suspend
and non-device "wakeups" which use kernel interface "wakeup source". It
can cover most of the situations.
Here I placed the patch, so that people can review it:
https://android-review.googlesource.com/c/kernel/common/+/1958188
Here is the differences from the requirement:
>> As of today, the active set of patches surface the following
>> suspend-related data:
>>
>> * wakeup IRQs, including:
>> * multiple IRQs if more than one is pending during resume flow
>> * unmapped HW IRQs (wakeup-capable in HW) that should not be
>> occurring
>> * misconfigured IRQs (e.g. both enable_irq_wake() and
>> IRQF_NO_SUSPEND)
>> * threaded IRQs (not just the parent chip's IRQ)
-- If we do these things, the additional codes should be added in
interrupt subsystem or some interrupt controller drivers. These
codes are no relationship with the interrupt subsystem or the
interrupt controllers. And we can't distinguish these IRQs from
"non-suspend" environment, even the "Android patch" can't do that.
So it will make the things complicated.
-- If these IRQs do hanpen, the code in this commit will catch
them as "unknown wakeup reason" and suggest user to check the
kernel log for more information.
-- I think We should cooperate with interrupt subsystem to log
these "abnormal" IRQs. And it could be several additional
commits to accomplish this work together then.
>> * abort reasons, including:
>> * wakeup_source activity
>> * failure to freeze userspace
>> * failure to suspend devices
>> * failed syscore_suspend callback
-- These codes are "intrusive" to kernel (suspend/resume).
-- These "errors" are already printed in kernel log, if we add
these log to "wakeup_reason" either, it will cause double
"log string" in section ".data", which just waste of the memory.
-- As mentioned before, if these "abort action" happened, you
can catch string "unknown wakeup reason", and check the kernel
log then.
>> * durations from the most recent cycle, including:
>> * time spent doing suspend/resume work
>> * time spent in suspend
-- Just separate these from "wakeup reason".
It should be done in another commit.
The patch is not complete, these is the next steps:
1. add interface to show time spend in suspend/resume work.
(after this, I think it could be works and replace the android patch)
2. we need solve the "unmapped HW IRQs", "misconfigured IRQs" and
"threaded IRQs" problem.
(this is the hardest one)
3. add a string report interface for the "wakeup reason" not belongs to
any thing above.
(kernel actually didn't know the hardware wake up reason. The IRQs just
one of the reason for wake up. So it just report a "wakeup reason" from
interrupt subsystem. It just a coincidence that most hardware "wakeup
reason" is also the interupt signal. Even the "interrupt" and "wake up"
signal are separated from each other in GIC700.
So give them a chance to report the their "wakeup reason".)
This patch can be tested in Android using the following change to AOSP:
https://android-review.googlesource.com/c/platform/system/hardware/interfaces/+/1958666
And there is a stress test code for the interfaces in kernel:
https://android-review.googlesource.com/c/kernel/common/+/1958189
Best
zichar
On Tue, 25 Jan 2022 at 01:38, John Stultz <john.stultz@...aro.org> wrote:
>
> On Tue, Jan 11, 2022 at 5:06 AM Kelly Rossmoyer <krossmo@...gle.com> wrote:
> >
> > # Introduction
> >
> > To aid optimization, troubleshooting, and attribution of battery life, the
> > Android kernel currently includes a set of patches which provide enhanced
> > visibility into kernel suspend/resume/abort behaviors. The capabilities
> > and implementation of this feature have evolved significantly since an
> > unsuccessful attempt to upstream the original code
> > (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> > conversation about upstreaming, starting with the central question: is
> > there support for upstreaming this set of features?
> >
> > # Motivation
> >
> > Of the many factors influencing battery life on Linux-powered mobile
> > devices, kernel suspend tends to be amongst the most impactful. Maximizing
> > time spent in suspend and minimizing the frequency of net-negative suspend
> > cycles are both important contributors to battery life optimization. But
> > enabling that optimization - and troubleshooting when things go wrong -
> > requires more observability of suspend/resume/abort behavior than Linux
> > currently provides. While mechanisms like `/sys/power/pm_wakeup_irq` and
> > wakeup_source stats are useful, they are incomplete and scattered. The
> > Android kernel wakeup reason patches implement significant improvements in
> > that area.
> >
> > # Features
> >
> > As of today, the active set of patches surface the following
> > suspend-related data:
> >
> > * wakeup IRQs, including:
> > * multiple IRQs if more than one is pending during resume flow
> > * unmapped HW IRQs (wakeup-capable in HW) that should not be
> > occurring
> > * misconfigured IRQs (e.g. both enable_irq_wake() and
> > IRQF_NO_SUSPEND)
> > * threaded IRQs (not just the parent chip's IRQ)
> >
> > * non-IRQ wakeups, including:
> > * wakeups caused by an IRQ that was consumed by lower-level SW
> > * wakeups from SOC architecture that don't manifest as IRQs
> >
> > * abort reasons, including:
> > * wakeup_source activity
> > * failure to freeze userspace
> > * failure to suspend devices
> > * failed syscore_suspend callback
> >
> > * durations from the most recent cycle, including:
> > * time spent doing suspend/resume work
> > * time spent in suspend
> >
> > In addition to battery life optimization and troubleshooting, some of these
> > capabilities also lay the groundwork for efforts around improving
> > attribution of wakeups/aborts (e.g. to specific processes, device features,
> > external devices, etc).
> >
> > # Shortcomings
> >
> > While the core implementation (see below) is relatively straightforward and
> > localized, calls into that core are somewhat widely spread in order to
> > capture the breadth of events of interest. The pervasiveness of those
> > hooks is clearly an area where improvement would be beneficial, especially
> > if a cleaner solution preserved equivalent capabilities.
> >
> > # Existing Code
> >
> > As a reference for how Android currently implements the core code for these
> > features (which would need a bit of work before submission even if all
> > features were included), see the following link:
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
> >
>
> Hey Kelly!
> So Zichar (added to the thread here) has been working for a little
> while on his own approach to upstream a simplified version of the
> wakeup_reason functionality. He's just gotten it to a place where it
> can be shared, so I wanted to pull him in so he could reply with his
> proposal.
>
> thanks
> -john
Powered by blists - more mailing lists