linux-kernel - Re: PM runtime_error handling missing in many drivers?


Message-ID: <4ca77763-53d0-965a-889e-be2eafadfd2f@intel.com>
Date:   Fri, 8 Jul 2022 22:10:34 +0200
From:   "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
To:     Vincent Whitchurch <vincent.whitchurch@...s.com>,
        Oliver Neukum <oneukum@...e.com>
CC:     "jic23@...nel.org" <jic23@...nel.org>,
        "linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-iio@...r.kernel.org" <linux-iio@...r.kernel.org>
Subject: Re: PM runtime_error handling missing in many drivers?

On 7/8/2022 1:03 PM, Vincent Whitchurch wrote:
> On Tue, Jun 21, 2022 at 11:38:33AM +0200, Oliver Neukum wrote:
>> On 20.06.22 16:42, Vincent Whitchurch wrote:
>>> [110778.050000][   T27] rpm_resume: 0-0009 flags-4 cnt-1  dep-0  auto-1 p-0 irq-0 child-0
>>> [110778.050000][   T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
>>>
>>> The following patch fixes the issue on vcnl4000, but is this the right
>>> fix?  And, unless I'm missing something, there are dozens of drivers
>>> with the same problem.
>> Yes. The point of pm_runtime_resume_and_get() is to remove the need
>> for handling errors when the resume fails. So I fail to see why a
>> permanent record of a failure makes sense for this API.
> I don't understand it either.
>
>>> diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
>>> index e02e92bc2928..082b8969fe2f 100644
>>> --- a/drivers/iio/light/vcnl4000.c
>>> +++ b/drivers/iio/light/vcnl4000.c
>>> @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
>>>   
>>>   	if (on) {
>>>   		ret = pm_runtime_resume_and_get(dev);
>>> +		if (ret)
>>> +			pm_runtime_set_suspended(dev);
>>>   	} else {
>>>   		pm_runtime_mark_last_busy(dev);
>>>   		ret = pm_runtime_put_autosuspend(dev);
>> If you need to add this to every driver, you can just as well add it to
>> pm_runtime_resume_and_get() to avoid the duplication.
> Yes, the documentation says that the error should be cleared, but it's
> unclear why the driver is expected to do it.  From the documentation it
> looks like the driver is supposed to choose between pm_runtime_set_active()
> and pm_runtime_set_suspended() to clear the error, but how/why is this
> choice supposed to be made in the driver when the driver doesn't know
> more than the framework about the status of the device?
>
> Perhaps Rafael can shed some light on this.

The driver always knows more than the framework about the device's 
actual state.  The framework only knows that something failed, but it 
doesn't know what it was or in what way it failed.
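
To illustrate the point about the driver knowing more, here is a minimal
userspace sketch. It is a toy model and not kernel code: every demo_* name
is hypothetical, and the hw_powered field stands in for whatever
device-specific check a real driver would use to learn the hardware state.
The only thing it is meant to show is that the choice between the
"active" and "suspended" status depends on information that only the
driver can obtain.

```c
/* Toy userspace model, NOT kernel code. All demo_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum demo_status { DEMO_SUSPENDED, DEMO_ACTIVE };

struct demo_dev {
	enum demo_status status; /* what the framework believes */
	int runtime_error;       /* sticky error recorded by the framework */
	bool hw_powered;         /* assumption: only the driver can read this back */
};

/* Modelled on the role of pm_runtime_set_active(): clear error, mark active. */
void demo_set_active(struct demo_dev *d)
{
	d->runtime_error = 0;
	d->status = DEMO_ACTIVE;
}

/* Modelled on the role of pm_runtime_set_suspended(): clear error, mark suspended. */
void demo_set_suspended(struct demo_dev *d)
{
	d->runtime_error = 0;
	d->status = DEMO_SUSPENDED;
}

/*
 * Driver-side recovery: the framework only sees runtime_error != 0,
 * so the driver inspects the hardware and picks the matching status.
 */
void demo_driver_recover(struct demo_dev *d)
{
	if (d->hw_powered)
		demo_set_active(d);
	else
		demo_set_suspended(d);
}
```

In this model, a device whose hardware is still powered ends up marked
active and one that lost power ends up marked suspended, regardless of
what status was recorded before the failure.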


>> But I am afraid we need to ask a deeper question. Is there a point
>> in recording failures to resume? The error code is reported back.
>> If a driver wishes to act upon it, it can. The core really only
>> uses the result to block new PM operations.
>> But nobody requests a resume unless it is necessary. Thus I fail
>> to see the point of checking this flag in resume as opposed to
>> suspend. If we fail, we fail, why not retry? It seems to me that the
>> record should be used only during runtime suspend.
> I guess this is also a question for Rafael.
>
> Even if the error recording is removed from runtime_resume and only done
> on suspend failures, all these drivers still have the problem of not
> clearing the error, since the next resume will fail if that is not done.

The idea was that drivers would clear these errors.
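
The behaviour discussed in this thread (a failed resume leaving a record
that makes every later resume return -EINVAL, i.e. the "ret=-22" in the
trace above, until the driver clears it) can be sketched with a small
userspace model. This is a simplification under stated assumptions, not
the kernel implementation: all toy_* names and the hw_fails switch are
hypothetical, and details such as locking, usage counting rules and the
exact return values of the real helpers are left out.

```c
/* Toy userspace model, NOT the kernel implementation. toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_status { TOY_SUSPENDED, TOY_ACTIVE };

struct toy_dev {
	enum toy_status status;
	int runtime_error; /* sticky record of the last failure */
	int usage_count;
	bool hw_fails;     /* assumption: makes the resume callback fail */
};

/* Resume refuses to run while an error is recorded. */
int toy_resume(struct toy_dev *d)
{
	if (d->runtime_error)
		return -EINVAL;
	if (d->status == TOY_ACTIVE)
		return 0;
	if (d->hw_fails) {
		d->runtime_error = -EIO; /* failure is remembered */
		return -EIO;
	}
	d->status = TOY_ACTIVE;
	return 0;
}

/* Take a reference and resume; drop the reference again on failure. */
int toy_resume_and_get(struct toy_dev *d)
{
	int ret;

	d->usage_count++;
	ret = toy_resume(d);
	if (ret < 0)
		d->usage_count--;
	return ret;
}

/* Driver-side clearing of the recorded error. */
void toy_set_suspended(struct toy_dev *d)
{
	d->runtime_error = 0;
	d->status = TOY_SUSPENDED;
}
```

In this model, a first toy_resume_and_get() with hw_fails set returns
-EIO; a second call returns -EINVAL even after hw_fails is cleared; and
only after toy_set_suspended() does a further call succeed. That mirrors
the pattern in the vcnl4000 patch quoted earlier, where the driver calls
pm_runtime_set_suspended() when pm_runtime_resume_and_get() fails.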


>> And as an immediate band aid, some errors like ENOMEM should
>> never be recorded.