linux-kernel - Re: PM runtime_error handling missing in many drivers?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5caa944f-c841-6f74-8e43-a278b2b93b06@suse.com>
Date:   Tue, 21 Jun 2022 11:38:33 +0200
From:   Oliver Neukum <oneukum@...e.com>
To:     Vincent Whitchurch <vincent.whitchurch@...s.com>,
        rafael.j.wysocki@...el.com, jic23@...nel.org
Cc:     linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-iio@...r.kernel.org
Subject: Re: PM runtime_error handling missing in many drivers?

On 20.06.22 16:42, Vincent Whitchurch wrote:

Hi,

> Many drivers do something like the following to resume their hardware
> before performing some hardware access when user space ask for it:
> 
> 	ret = pm_runtime_resume_and_get(dev);
> 	if (ret)
> 		return ret;
> 
> But if the ->runtime_resume() callback fails, then the
> power.runtime_error is set and any further attempts to use
> pm_runtime_resume_and_get() will fail, as documented in
> Documentation/power/runtime_pm.rst.

Whether this is properly documented is debatable.
4. Runtime PM Device Helper Functions (as a chapter)
does _not_ document it.

> [110778.050000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
> [110778.050000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
> 
> The following patch fixes the issue on vcnl4000, but is this the right in the

> fix?  And, unless I'm missing something, there are dozens of drivers
> with the same problem.

Yes. The point of pm_runtime_resume_and_get() is to remove the need
for handling errors when the resume fails. So I fail to see why a
permanent record of a failure makes sense for this API.

> diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
> index e02e92bc2928..082b8969fe2f 100644
> --- a/drivers/iio/light/vcnl4000.c
> +++ b/drivers/iio/light/vcnl4000.c
> @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
>  
>  	if (on) {
>  		ret = pm_runtime_resume_and_get(dev);
> +		if (ret)
> +			pm_runtime_set_suspended(dev);
>  	} else {
>  		pm_runtime_mark_last_busy(dev);
>  		ret = pm_runtime_put_autosuspend(dev);

If you need to add this to every driver, you can just as well add it to
pm_runtime_resume_and_get() to avoid the duplication.

But I am afraid we need to ask a deeper question. Is there a point
in recording failures to resume? The error code is reported back.
If a driver wishes to act upon it, it can. The core really only
uses the result to block new PM operations.
But nobody requests a resume unless it is necessary. Thus I fail
to see the point of checking this flag in resume as opposed to
suspend. If we fail, we fail, why not retry? It seems to me that the
record should be used only during runtime suspend.

And as an immediate band aid, some errors like ENOMEM should
never be recorded.

	Regards
		Oliver