Message-ID: <CAJZ5v0iFPf=WT3CjNqtioUoiX9jc5nmZLJnAkQOhBTmGq_ioAw@mail.gmail.com>
Date: Fri, 25 Apr 2025 20:43:33 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Vinod Koul <vkoul@...nel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Linux PM <linux-pm@...r.kernel.org>, 
	Bard Liao <yung-chuan.liao@...ux.intel.com>, linux-sound@...r.kernel.org
Subject: Re: [PATCH v1] soundwire: intel_auxdevice: Fix system suspend/resume handling

On Fri, Apr 25, 2025 at 8:10 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Fri, Apr 25, 2025 at 7:14 PM Pierre-Louis Bossart
> <pierre-louis.bossart@...ux.dev> wrote:
> >
> > On 4/24/25 20:13, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > >
> > > The code in intel_suspend() and intel_resume() needs to be properly
> > > synchronized with runtime PM which is not the case currently, so fix
> > > it.
> > >
> > > First of all, prevent runtime PM from triggering after intel_suspend()
> > > has started because the changes made by it to the device might be
> > > undone by a runtime resume of the device.  For this purpose, add a
> > > pm_runtime_disable() call to intel_suspend().
> >
> > Allow me to push back on this, because we have to be very careful with a hidden state transition that needs to happen.
> >
> > If a controller was suspended by pm_runtime, it will enter the clock stop mode.
> >
> > If the system needs to suspend, the controller has to be forced to exit the clock stop mode and the bus has to restart before we can suspend it, and that's why we had those pm_runtime_resume().
> >
> > Disabling pm_runtime when entering system suspend would be problematic for Intel hardware, it's a known can of worms.
>
> No, it wouldn't AFAICS.
>
> > It's quite possible that some of the code in intel_suspend() is no longer required because the .prepare will resume the bus properly, but I wanted to make sure this state transition out of clock-stop is known and taken into consideration.
>
> This patch doesn't change the functionality in intel_suspend(), it
> just prevents runtime resume running in parallel with it or after it
> from messing up with the hardware.
>
> I don't see why it would be unsafe to do and please feel free to prove me wrong.

Or just tell me what I'm missing in the reasoning below.

This code:

-    if (pm_runtime_suspended(dev)) {
-        dev_dbg(dev, "pm_runtime status was suspended, forcing active\n");
-
-        /* follow required sequence from runtime_pm.rst */
-        pm_runtime_disable(dev);
-        pm_runtime_set_active(dev);
-        pm_runtime_mark_last_busy(dev);
-        pm_runtime_enable(dev);
-
-        pm_runtime_resume(bus->dev);
-
-        link_flags = md_flags >> (bus->link_id * 8);
-
-        if (!(link_flags & SDW_INTEL_MASTER_DISABLE_PM_RUNTIME_IDLE))
-            pm_runtime_idle(dev);
-    }

that is being removed by my patch (because it is invalid - more about
that later) had never run before commit bca84a7b93fd ("PM: sleep: Use
DPM_FLAG_SMART_SUSPEND conditionally") because setting
DPM_FLAG_SMART_SUSPEND had caused the core to call
pm_runtime_set_active() on the device in the noirq resume phase, so it
had never been seen as runtime-suspended in intel_resume().  After
commit bca84a7b93fd the core doesn't do that any more, so if the
device has been runtime-suspended before intel_suspend() runs,
intel_resume() will see that its status is RPM_SUSPENDED.  The code in
question will run and it will crash and burn if
SDW_INTEL_MASTER_DISABLE_PM_RUNTIME_IDLE is not set in the link flags,
because pm_runtime_idle() is called in that case.

That code is invalid because the
pm_runtime_set_active() call in it causes the status to change to
RPM_ACTIVE, but it doesn't actually change the state of the device
(that is still physically suspended).  The subsequent
pm_runtime_resume() sees that the status is RPM_ACTIVE and it doesn't
do anything.  At this point, the device is still physically suspended,
but its runtime PM status is RPM_ACTIVE, so if pm_runtime_idle() runs,
it will trigger an attempt to suspend and that will break because the
device is already suspended.
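
To make the sequence above concrete, here is a toy model of it in plain
C.  It is only a sketch and not kernel code: struct toy_dev and all of
the toy_*() helpers are made up for illustration, and the negative
return values are arbitrary.  The only thing it tracks is the difference
between the runtime PM status and the physical state of the device.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RPM_ACTIVE and RPM_SUSPENDED. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

struct toy_dev {
	enum toy_rpm_status status;	/* what runtime PM believes */
	bool hw_powered;		/* what the hardware is really doing */
};

/* Models pm_runtime_set_active(): status update only, hardware untouched. */
static void toy_set_active(struct toy_dev *d)
{
	d->status = TOY_RPM_ACTIVE;
}

/* Models pm_runtime_resume(): does nothing if the status is already active. */
static int toy_resume(struct toy_dev *d)
{
	if (d->status == TOY_RPM_ACTIVE)
		return 1;	/* nothing to do */

	d->hw_powered = true;
	d->status = TOY_RPM_ACTIVE;
	return 0;
}

/* Models pm_runtime_idle() triggering a suspend of an "active" device. */
static int toy_idle(struct toy_dev *d)
{
	if (d->status != TOY_RPM_ACTIVE)
		return -1;	/* not active, no suspend attempt */

	if (!d->hw_powered)
		return -2;	/* suspending hardware that is already suspended */

	d->hw_powered = false;
	d->status = TOY_RPM_SUSPENDED;
	return 0;
}

/* The order of calls in the removed code, for a runtime-suspended device. */
static int toy_removed_sequence(struct toy_dev *d)
{
	toy_set_active(d);	/* status: active, hardware: still suspended */
	(void)toy_resume(d);	/* sees active status, does nothing */
	return toy_idle(d);	/* attempts to suspend suspended hardware */
}
```

Starting from { TOY_RPM_SUSPENDED, false }, toy_removed_sequence()
returns the "already suspended" error and leaves the status active
while the hardware is still powered down, which is the mismatch
described above.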

So this code had never run before and it demonstrably doesn't work, so
I don't see why removing it could be incorrect.

Thanks!
