Message-ID: <CAJZ5v0gLMSPsaS7Jnsr8DhevaQamsVk=pu=BfXZxrT+SBAM=fQ@mail.gmail.com>
Date: Thu, 27 Jan 2022 21:10:25 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Kelly Rossmoyer <krossmo@...gle.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Lee Jones <lee.jones@...aro.org>,
Vijay Nayak <nayakvij@...gle.com>,
Linux PM <linux-pm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support

On Thu, Jan 27, 2022 at 8:54 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Mon, Jan 10, 2022 at 7:49 PM Kelly Rossmoyer <krossmo@...gle.com> wrote:
> >
> > # Introduction
> >
> > To aid optimization, troubleshooting, and attribution of battery life, the
> > Android kernel currently includes a set of patches which provide enhanced
> > visibility into kernel suspend/resume/abort behaviors. The capabilities
> > and implementation of this feature have evolved significantly since an
> > unsuccessful attempt to upstream the original code
> > (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> > conversation about upstreaming, starting with the central question: is
> > there support for upstreaming this set of features?
> >
> > # Motivation
> >
> > Of the many factors influencing battery life on Linux-powered mobile
> > devices, kernel suspend tends to be amongst the most impactful. Maximizing
> > time spent in suspend and minimizing the frequency of net-negative suspend
> > cycles are both important contributors to battery life optimization. But
> > enabling that optimization - and troubleshooting when things go wrong -
> > requires more observability of suspend/resume/abort behavior than Linux
> > currently provides. While mechanisms like `/sys/power/pm_wakeup_irq` and
> > wakeup_source stats are useful, they are incomplete and scattered. The
> > Android kernel wakeup reason patches implement significant improvements in
> > that area.
> >
> > # Features
> >
> > As of today, the active set of patches surface the following
> > suspend-related data:
> >
> > * wakeup IRQs, including:
> >    * multiple IRQs if more than one is pending during resume flow
> >    * unmapped HW IRQs (wakeup-capable in HW) that should not be
> >      occurring
> >    * misconfigured IRQs (e.g. both enable_irq_wake() and
> >      IRQF_NO_SUSPEND)
> >    * threaded IRQs (not just the parent chip's IRQ)
> >
> > * non-IRQ wakeups, including:
> >    * wakeups caused by an IRQ that was consumed by lower-level SW
> >    * wakeups from SOC architecture that don't manifest as IRQs
> >
> > * abort reasons, including:
> >    * wakeup_source activity
> >    * failure to freeze userspace
> >    * failure to suspend devices
> >    * failed syscore_suspend callback
> >
> > * durations from the most recent cycle, including:
> >    * time spent doing suspend/resume work
> >    * time spent in suspend
> >
> > In addition to battery life optimization and troubleshooting, some of these
> > capabilities also lay the groundwork for efforts around improving
> > attribution of wakeups/aborts (e.g. to specific processes, device features,
> > external devices, etc).
> >
> > # Shortcomings
> >
> > While the core implementation (see below) is relatively straightforward and
> > localized, calls into that core are somewhat widely spread in order to
> > capture the breadth of events of interest. The pervasiveness of those
> > hooks is clearly an area where improvement would be beneficial, especially
> > if a cleaner solution preserved equivalent capabilities.
> >
> > # Existing Code
> >
> > As a reference for how Android currently implements the core code for these
> > features (which would need a bit of work before submission even if all
> > features were included), see the following link:
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
>
> So as Zichar said, this is quite heavy-weight.
>
> I'm not fundamentally against adding more infrastructure to help
> identify issues related to system suspend, but there needs to be a
> clear benefit associated with any change in this direction.

That said, the general idea behind wakeup_source objects is that every
system wakeup event should be recorded in one of them which then can
be used for later analysis.

If there are reasons why this cannot work in general, what are they?