linux-kernel - Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support

Message-ID: <CAJZ5v0jrU4Xw2wzdUL9Vd2C6u8NVx5J79DeiRY6KU1xT6ZSuqw@mail.gmail.com>
Date:   Thu, 27 Jan 2022 20:54:16 +0100
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Kelly Rossmoyer <krossmo@...gle.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Lee Jones <lee.jones@...aro.org>,
        Vijay Nayak <nayakvij@...gle.com>,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support

On Mon, Jan 10, 2022 at 7:49 PM Kelly Rossmoyer <krossmo@...gle.com> wrote:
>
> # Introduction
>
> To aid optimization, troubleshooting, and attribution of battery life, the
> Android kernel currently includes a set of patches which provide enhanced
> visibility into kernel suspend/resume/abort behaviors.  The capabilities
> and implementation of this feature have evolved significantly since an
> unsuccessful attempt to upstream the original code
> (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> conversation about upstreaming, starting with the central question: is
> there support for upstreaming this set of features?
>
> # Motivation
>
> Of the many factors influencing battery life on Linux-powered mobile
> devices, kernel suspend tends to be amongst the most impactful.  Maximizing
> time spent in suspend and minimizing the frequency of net-negative suspend
> cycles are both important contributors to battery life optimization.  But
> enabling that optimization - and troubleshooting when things go wrong -
> requires more observability of suspend/resume/abort behavior than Linux
> currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
> wakeup_source stats are useful, they are incomplete and scattered.  The
> Android kernel wakeup reason patches implement significant improvements in
> that area.
>
> # Features
>
> As of today, the active set of patches surfaces the following
> suspend-related data:
>
> * wakeup IRQs, including:
>    * multiple IRQs if more than one is pending during resume flow
>    * unmapped HW IRQs (wakeup-capable in HW) that should not be
>      occurring
>    * misconfigured IRQs (e.g. both enable_irq_wake() and
>      IRQF_NO_SUSPEND)
>    * threaded IRQs (not just the parent chip's IRQ)
>
> * non-IRQ wakeups, including:
>    * wakeups caused by an IRQ that was consumed by lower-level SW
>    * wakeups originating in the SoC architecture that don't manifest as IRQs
>
> * abort reasons, including:
>    * wakeup_source activity
>    * failure to freeze userspace
>    * failure to suspend devices
>    * failed syscore_suspend callback
>
> * durations from the most recent cycle, including:
>    * time spent doing suspend/resume work
>    * time spent in suspend
>
> In addition to battery life optimization and troubleshooting, some of these
> capabilities also lay the groundwork for efforts around improving
> attribution of wakeups/aborts (e.g. to specific processes, device features,
> external devices, etc).
>
> # Shortcomings
>
> While the core implementation (see below) is relatively straightforward and
> localized, calls into that core are somewhat widely spread in order to
> capture the breadth of events of interest.  The pervasiveness of those
> hooks is clearly an area where improvement would be beneficial, especially
> if a cleaner solution preserved equivalent capabilities.
>
> # Existing Code
>
> As a reference for how Android currently implements the core code for these
> features (which would need a bit of work before submission even if all
> features were included), see the following link:
>
> https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
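
The limitation described under Motivation is visible from user space: `/sys/power/pm_wakeup_irq` reports a single IRQ number for the most recent suspend/resume cycle. A minimal sketch follows, assuming the file holds one decimal IRQ number followed by a newline, that reading it fails when the kernel has nothing to report, and that the file may be absent entirely (as far as we know it is only provided when CONFIG_PM_SLEEP_DEBUG is enabled). `parse_wakeup_irq()` and `read_wakeup_irq()` are hypothetical helpers, not part of any existing tool.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text of /sys/power/pm_wakeup_irq, assumed
 * to be a single decimal IRQ number followed by a newline. Returns 0 and
 * stores the IRQ on success, -1 if the text is anything else. */
int parse_wakeup_irq(const char *buf, unsigned int *irq)
{
    char *end;
    unsigned long val;

    /* Reject empty input, signs, and leading whitespace up front. */
    if (buf == NULL || !isdigit((unsigned char)*buf))
        return -1;

    errno = 0;
    val = strtoul(buf, &end, 10);
    if (errno != 0 || val > UINT_MAX)
        return -1;

    /* Only trailing whitespace (normally a single '\n') may follow. */
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;

    *irq = (unsigned int)val;
    return 0;
}

/* Hypothetical helper: read the wakeup IRQ recorded for the most recent
 * cycle. Returns -1 if the file is absent (assumed: kernel built without
 * the debug option that provides it), if the read fails (assumed: no
 * wakeup IRQ was recorded), or if the contents do not parse. */
int read_wakeup_irq(unsigned int *irq)
{
    char buf[32];
    FILE *f = fopen("/sys/power/pm_wakeup_irq", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_wakeup_irq(buf, irq);
}
```

Whatever actually woke the system, this interface yields at most one IRQ number per cycle, whereas the quoted feature list also covers multiple pending IRQs, non-IRQ wakeups, and abort reasons.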

So, as Zichar said, this is quite heavyweight.

I'm not fundamentally against adding more infrastructure to help
identify issues related to system suspend, but there needs to be a
clear benefit associated with any change in this direction.  Also
adding significant overhead just for this purpose alone is rather out
of the question.

I would advise you to follow the suggestion to split the work into
smaller pieces and submit them one at a time, possibly starting with
the ones bringing the most significant benefits to the table.
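
As an illustration of a small, self-contained piece, consider the "durations from the most recent cycle" item on the quoted feature list. The sketch below is hypothetical: `struct suspend_cycle`, `enum abort_reason`, and the four timestamps are illustrative names, not taken from the Android wakeup_reason.c code, and it assumes all timestamps come from one clock that keeps running while the system is suspended.

```c
#include <stdint.h>

/* Hypothetical abort categories mirroring the quoted "abort reasons" list. */
enum abort_reason {
    ABORT_NONE = 0,         /* cycle completed: the system actually slept */
    ABORT_WAKEUP_SOURCE,    /* a wakeup_source was active */
    ABORT_FREEZE_USERSPACE, /* failed to freeze user space */
    ABORT_DEVICE_SUSPEND,   /* a device failed to suspend */
    ABORT_SYSCORE_SUSPEND,  /* a syscore_suspend callback failed */
};

/* Hypothetical per-cycle record. Timestamps are nanoseconds on a single
 * clock assumed to keep counting across suspend. */
struct suspend_cycle {
    uint64_t suspend_start_ns; /* suspend work begins */
    uint64_t sleep_enter_ns;   /* last point before the low-power state */
    uint64_t sleep_exit_ns;    /* first point after wakeup */
    uint64_t resume_end_ns;    /* resume (or abort rollback) work finished */
    enum abort_reason abort;
};

/* Time actually spent suspended; zero for an aborted cycle, whose
 * sleep_* timestamps are never filled in. */
uint64_t cycle_sleep_ns(const struct suspend_cycle *c)
{
    if (c->abort != ABORT_NONE)
        return 0;
    return c->sleep_exit_ns - c->sleep_enter_ns;
}

/* Time spent doing suspend/resume work: everything except the sleep. */
uint64_t cycle_work_ns(const struct suspend_cycle *c)
{
    return (c->resume_end_ns - c->suspend_start_ns) - cycle_sleep_ns(c);
}
```

A completed cycle with timestamps 1000, 1300, 9300 and 9500 ns yields 8000 ns asleep and 500 ns of suspend/resume work; an aborted cycle reports all of its elapsed time as work.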