linux-kernel - Re: [RFC PATCH] PM / core: skip suspend next time if resume returns an error

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJZ5v0hSohUZvJcVcQzy_DdSyPVppOABk0HELsa3dpvf12LMPQ@mail.gmail.com>
Date:   Tue, 2 Oct 2018 23:16:23 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Doug Anderson <dianders@...omium.org>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Pavel Machek <pavel@....cz>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>, dkota@...eaurora.org,
        Dmitry Torokhov <dtor@...omium.org>, swboyd@...omium.org,
        Linux PM <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Len Brown <len.brown@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [RFC PATCH] PM / core: skip suspend next time if resume returns
 an error

On Tue, Oct 2, 2018 at 11:01 PM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> On Tue, Oct 2, 2018 at 1:29 AM Rafael J. Wysocki <rafael@...nel.org> wrote:

[cut]

> > I guess so.
> >
> > Doing that in all cases is kind of risky IMO, because we haven't taken
> > the return values of the ->resume* callbacks into account so far
> > (except for printing the information that is), so there may be
> > non-lethal cases when that happens and the $subject patch would make
> > them not work any more.
>
> I think you're arguing that the best option is to leave the code / API
> exactly as-is because someone could be relying on the existing
> behavior?  That is certainly the least likely to introduce any new
> bugs.  ;-P
>
> ...would you accept a patch adding a comment codifying the existing
> behavior (AKA suspend will be called again even if resume failed) as
> the officially documented behavior?

It is documented already IIRC, but yes.

> Then we can start making new
> drivers behave correctly at least.  If nothing else I can add a
> boolean inside my driver data that says "resume failed, ignore the
> next suspend".

Or maybe "fail the next suspend" even?

> ...or if the official word is that if your resume fails you're totally
> unrecoverable then I can start simplifying the error handling in
> resume.  AKA instead of:
>
>   hypothetical_resume(...) {
>     ret = clk_enable(...);
>     if (ret)
>       return ret;
>     ret = regulator_enable(...);
>     if (ret)
>       clk_disable(...);
>     return ret;
>
> ...I can just change it to:
>
>   hypothetical_resume(...) {
>     ret = clk_enable(...);
>     if (ret)
>       return ret;
>     return regulator_enable(...);
>
> ...the above would leave no way to recover the system because if
> hypothetical_resume() returned an error we'd have no idea if the clock
> was left enabled or not.  ...but if we're unrecoverable anyway why not
> save the code?

This really depends on the particular case.

If you deal with clocks directly, then you pretty much know whether or
not things are recoverable after a failing device resume, but if AML
tells you that it failed (say), you don't really know what happened.
In many cases the device that failed to resume will not work correctly
in the working state, but attempting to suspend it again may be fine.
It may recover after the next suspend-resume cycle even sometimes.  So
IMO drivers can do "smart" things if they really want to and know
enough, but there really is too much variation to handle it in the
core in a uniform way.

Thanks,
Rafael