lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAE9iGoiCZZBkyX9ZWnhSDMjWmucOmybCOp=XTr6Hz5rN9GNyrw@mail.gmail.com>
Date:   Wed, 26 Jan 2022 13:09:21 +0800
From:   Zichar Zhang <zichar.zhang@...aro.org>
To:     John Stultz <john.stultz@...aro.org>
Cc:     Kelly Rossmoyer <krossmo@...gle.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Pavel Machek <pavel@....cz>, Len Brown <len.brown@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Lee Jones <lee.jones@...aro.org>,
        Vijay Nayak <nayakvij@...gle.com>, linux-pm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Sumit Semwal <sumit.semwal@...aro.org>
Subject: Re: [RFC] PM: suspend: Upstreaming wakeup reason capture support

hello John,  Kelly!

I'm working on upstream "wakeup_reason" patch.
But I found It's a little bit hard to do that, cause it is too heavy.

The android patch put "wakeup_reason",  suspend/resume process and
interrupt subsystem together make it hard to upstream all these at once. I
think maybe we can cooperate with each other, and do it step by step.
And I make a simplified "wakeup_reason" patch as the start.

It just report the  normal IRQs which cause system wake up from suspend
and non-device "wakeups" which use kernel interface "wakeup source". It
can cover most of the situations.

Here I placed the patch, so that people can review it:
https://android-review.googlesource.com/c/kernel/common/+/1958188

Here is the differences from the requirement:

>> As of today, the active set of patches surface the following
>> suspend-related data:
>>
>> * wakeup IRQs, including:
>>    * multiple IRQs if more than one is pending during resume flow
>>    * unmapped HW IRQs (wakeup-capable in HW) that should not be
>>      occurring
>>    * misconfigured IRQs (e.g. both enable_irq_wake() and
>>      IRQF_NO_SUSPEND)
>>    * threaded IRQs (not just the parent chip's IRQ)

    -- If we do these things, the additional codes should be added in
    interrupt subsystem or some interrupt controller drivers. These
    codes are no relationship with the interrupt subsystem or the
    interrupt controllers. And we can't distinguish these IRQs from
    "non-suspend" environment, even the "Android patch" can't do that.
    So it will make the things complicated.
    -- If these IRQs do hanpen, the code in this commit will catch
    them as "unknown wakeup reason" and suggest user to check the
    kernel log for more information.
    -- I think We should cooperate with interrupt subsystem to log
    these "abnormal" IRQs. And it could be several additional
    commits to accomplish this work together then.

>> * abort reasons, including:
>>    * wakeup_source activity
>>    * failure to freeze userspace
>>    * failure to suspend devices
>>    * failed syscore_suspend callback

    -- These codes are "intrusive" to kernel (suspend/resume).
    -- These "errors" are already printed in kernel log, if we add
    these log to "wakeup_reason" either, it will cause double
    "log string" in section ".data", which just waste of the memory.
    -- As mentioned before, if these "abort action" happened, you
    can catch string "unknown wakeup reason", and check the kernel
    log then.

>> * durations from the most recent cycle, including:
>>    * time spent doing suspend/resume work
>>    * time spent in suspend

    -- Just separate these from "wakeup reason".
    It should be done in another commit.

The patch is not complete, these is the next steps:
1. add interface to show time spend in suspend/resume work.
(after this, I think it could be works and replace the android patch)
2. we need solve the "unmapped HW IRQs", "misconfigured IRQs" and
"threaded IRQs" problem.
(this is the hardest one)
3. add a string report interface for the "wakeup reason" not belongs to
any thing above.
(kernel actually didn't know the hardware wake up reason. The IRQs just
one of the reason for wake up. So it just report a "wakeup reason" from
interrupt subsystem. It just a coincidence that most hardware "wakeup
reason" is also the interupt signal.  Even the "interrupt" and "wake up"
signal are separated from each other in GIC700.
So give them a chance to report the their "wakeup reason".)

This patch can be tested in Android using the following change to AOSP:
 https://android-review.googlesource.com/c/platform/system/hardware/interfaces/+/1958666

And there is a stress test code for the interfaces in kernel:
 https://android-review.googlesource.com/c/kernel/common/+/1958189

Best
zichar


On Tue, 25 Jan 2022 at 01:38, John Stultz <john.stultz@...aro.org> wrote:
>
> On Tue, Jan 11, 2022 at 5:06 AM Kelly Rossmoyer <krossmo@...gle.com> wrote:
> >
> > # Introduction
> >
> > To aid optimization, troubleshooting, and attribution of battery life, the
> > Android kernel currently includes a set of patches which provide enhanced
> > visibility into kernel suspend/resume/abort behaviors.  The capabilities
> > and implementation of this feature have evolved significantly since an
> > unsuccessful attempt to upstream the original code
> > (https://lkml.org/lkml/2014/3/10/716), and we would like to (re)start a
> > conversation about upstreaming, starting with the central question: is
> > there support for upstreaming this set of features?
> >
> > # Motivation
> >
> > Of the many factors influencing battery life on Linux-powered mobile
> > devices, kernel suspend tends to be amongst the most impactful.  Maximizing
> > time spent in suspend and minimizing the frequency of net-negative suspend
> > cycles are both important contributors to battery life optimization.  But
> > enabling that optimization - and troubleshooting when things go wrong -
> > requires more observability of suspend/resume/abort behavior than Linux
> > currently provides.  While mechanisms like `/sys/power/pm_wakeup_irq` and
> > wakeup_source stats are useful, they are incomplete and scattered.  The
> > Android kernel wakeup reason patches implement significant improvements in
> > that area.
> >
> > # Features
> >
> > As of today, the active set of patches surface the following
> > suspend-related data:
> >
> > * wakeup IRQs, including:
> >    * multiple IRQs if more than one is pending during resume flow
> >    * unmapped HW IRQs (wakeup-capable in HW) that should not be
> >      occurring
> >    * misconfigured IRQs (e.g. both enable_irq_wake() and
> >      IRQF_NO_SUSPEND)
> >    * threaded IRQs (not just the parent chip's IRQ)
> >
> > * non-IRQ wakeups, including:
> >    * wakeups caused by an IRQ that was consumed by lower-level SW
> >    * wakeups from SOC architecture that don't manifest as IRQs
> >
> > * abort reasons, including:
> >    * wakeup_source activity
> >    * failure to freeze userspace
> >    * failure to suspend devices
> >    * failed syscore_suspend callback
> >
> > * durations from the most recent cycle, including:
> >    * time spent doing suspend/resume work
> >    * time spent in suspend
> >
> > In addition to battery life optimization and troubleshooting, some of these
> > capabilities also lay the groundwork for efforts around improving
> > attribution of wakeups/aborts (e.g. to specific processes, device features,
> > external devices, etc).
> >
> > # Shortcomings
> >
> > While the core implementation (see below) is relatively straightforward and
> > localized, calls into that core are somewhat widely spread in order to
> > capture the breadth of events of interest.  The pervasiveness of those
> > hooks is clearly an area where improvement would be beneficial, especially
> > if a cleaner solution preserved equivalent capabilities.
> >
> > # Existing Code
> >
> > As a reference for how Android currently implements the core code for these
> > features (which would need a bit of work before submission even if all
> > features were included), see the following link:
> >
> > https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/kernel/power/wakeup_reason.c
> >
>
> Hey Kelly!
>   So Zichar (added to the thread here) has been working for a little
> while on his own approach to upstream a simplified version of the
> wakeup_reason functionality. He's just gotten it to a place where it
> can be shared, so I wanted to pull him in so he could reply with his
> proposal.
>
> thanks
> -john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ