[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0jG7crhwnK0Tr0aswGWDFMc3o5QSPEDfQ+WRo+VPbYJKQ@mail.gmail.com>
Date: Sat, 26 Apr 2025 15:30:25 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Vinod Koul <vkoul@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, Linux PM <linux-pm@...r.kernel.org>,
Bard Liao <yung-chuan.liao@...ux.intel.com>, linux-sound@...r.kernel.org
Subject: Re: [PATCH v1] soundwire: intel_auxdevice: Fix system suspend/resume handling
On Fri, Apr 25, 2025 at 8:43 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Fri, Apr 25, 2025 at 8:10 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> >
> > On Fri, Apr 25, 2025 at 7:14 PM Pierre-Louis Bossart
> > <pierre-louis.bossart@...ux.dev> wrote:
> > >
> > > On 4/24/25 20:13, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > > >
> > > > The code in intel_suspend() and intel_resume() needs to be properly
> > > > synchronized with runtime PM which is not the case currently, so fix
> > > > it.
> > > >
> > > > First of all, prevent runtime PM from triggering after intel_suspend()
> > > > has started because the changes made by it to the device might be
> > > > undone by a runtime resume of the device. For this purpose, add a
> > > > pm_runtime_disable() call to intel_suspend().
> > >
> > > Allow me to push back on this, because we have to be very careful with a hidden state transition that needs to happen.
> > >
> > > If a controller was suspended by pm_runtime, it will enter the clock stop mode.
> > >
> > > If the system needs to suspend, the controller has to be forced to exit the clock stop mode and the bus has to restart before we can suspend it, and that's why we had those pm_runtime_resume().
> > >
> > > Disabling pm_runtime when entering system suspend would be problematic for Intel hardware, it's a known can of worms.
> >
> > No, it wouldn't AFAICS.
> >
> > > It's quite possible that some of the code in intel_suspend() is no longer required because the .prepare will resume the bus properly, but I wanted to make sure this state transition out of clock-stop is known and taken into consideration.
> >
> > This patch doesn't change the functionality in intel_suspend(), it
> > just prevents runtime resume running in parallel with it or after it
> > from messing up with the hardware.
> >
> > I don't see why it would be unsafe to do and please feel free to prove me wrong.
>
> Or just tell me what I'm missing in the reasoning below.
>
> This code:
>
> - if (pm_runtime_suspended(dev)) {
> - dev_dbg(dev, "pm_runtime status was suspended, forcing active\n");
> -
> - /* follow required sequence from runtime_pm.rst */
> - pm_runtime_disable(dev);
> - pm_runtime_set_active(dev);
> - pm_runtime_mark_last_busy(dev);
> - pm_runtime_enable(dev);
> -
> - pm_runtime_resume(bus->dev);
> -
> - link_flags = md_flags >> (bus->link_id * 8);
> -
> - if (!(link_flags & SDW_INTEL_MASTER_DISABLE_PM_RUNTIME_IDLE))
> - pm_runtime_idle(dev);
> - }
Well, I actually missed a couple of things.
First off, the core increases the runtime PM reference counter for
every device in the "prepare" phase of system suspend and it is still
nonzero when the code above runs, so the pm_runtime_idle() call at the
end of it has no effect (the bottom line is that devices don't
runtime-suspend during system-wide suspend and resume transitions,
they can only runtime-resume then).
Second, the argument of the pm_runtime_resume() call in it is
bus->dev, not dev. I have to admit ignorance regarding the reason why
this code attempts to resume a different device, but this is what
breaks things AFAICS.
The pm_runtime_mark_last_busy() call is kind of fine, but it is
redundant because it is repeated at the end of intel_resume() and even
if the autosuspend timer triggers between these two calls, it will not
cause the device to suspend ("devices don't runtime-suspend during
system-wide suspend and resume transitions").
The pm_runtime_set_active() is questionable because it takes place
before the device is technically activated, so if anything in the
meantime actually depended on it being active, it would break.
So I need to rewrite the changelog, but I still cannot find anything
problematic in the patch itself.
First, it removes some code that had not been run earlier and it
started to break things after it had been allowed to run (and which is
questionable for that matter). Next, it adds protection against races
that probably don't happen in practice, but if they happened, they
would be problematic. It also replaces two checks with simpler
versions of them that can be used at this point, nothing wrong with
that. Finally, it makes intel_resume() call pm_runtime_set_active()
when it is needed because the device actually becomes active at the
end of that function and it removes the setting of a driver flag that
has no effect on the given device any more.
I'm going to resend it with a new changelog as a v2.
Thanks!
Powered by blists - more mailing lists