Message-ID: <CAJZ5v0gQ-6ZehL5HNhFvOWDEyXdS++uaMn1AOB7whoMTKzj-ZQ@mail.gmail.com>
Date: Thu, 24 Apr 2025 13:02:58 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Raag Jadav <raag.jadav@...el.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, mahesh@...ux.ibm.com, oohall@...il.com,
bhelgaas@...gle.com, linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
ilpo.jarvinen@...ux.intel.com, lukas@...ner.de,
aravind.iddamsetty@...ux.intel.com
Subject: Re: [PATCH v2] PCI/PM: Avoid suspending the device with errors

On Thu, Apr 24, 2025 at 7:38 AM Raag Jadav <raag.jadav@...el.com> wrote:
>
> On Wed, Apr 23, 2025 at 02:41:52PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Apr 22, 2025 at 3:55 PM Raag Jadav <raag.jadav@...el.com> wrote:
> > >
> > > If an error is triggered before or during system suspend, any bus level
> > > power state transition will result in unpredictable behaviour by the
> > > device with failed recovery. Avoid suspending such a device and leave
> > > it in its existing power state.
> > >
> > > This only covers the devices that rely on PCI core PM for their power
> > > state transition.
> > >
> > > Signed-off-by: Raag Jadav <raag.jadav@...el.com>
> > > ---
> > >
> > > v2: Synchronize AER handling with PCI PM (Rafael)
> > >
> > > More discussion on [1].
> > > [1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
> > >
> > > drivers/pci/pci-driver.c | 3 ++-
> > > drivers/pci/pcie/aer.c | 11 +++++++++++
> > > include/linux/aer.h | 2 ++
> > > 3 files changed, 15 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > index f57ea36d125d..289a1fa7cb2d 100644
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -884,7 +884,8 @@ static int pci_pm_suspend_noirq(struct device *dev)
> > > }
> > > }
> > >
> > > - if (!pci_dev->state_saved) {
> > > + /* Avoid suspending the device with errors */
> > > + if (!pci_aer_in_progress(pci_dev) && !pci_dev->state_saved) {
> >
> > Apart from the potential raciness mentioned by Bjorn, doing it just
> > here is questionable because this is not the only place where the PCI
> > device power state can change.
> >
> > It would be better to catch this in pci_set_low_power_state() IMO.
>
> I'm not sure if we should prevent power state transition for the users
> that explicitly want to transition.
>
> Also, the device state can potentially be corrupted because of the errors,
> so we'd probably want to avoid pci_save_state() as well, which is what
> I attempted here.

But it's not what the changelog is saying.

If you want to avoid pci_save_state(), there are also other places
where it is called, and then you may also want to avoid the transition,
because if the state is not saved, it won't be possible to restore it.
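
To make the coupling between saving state and changing power state
concrete, here is a minimal userspace sketch. It is not kernel code and
not the patch under review: every mock_* name and the struct layout are
hypothetical stand-ins, and aer_in_progress merely models whatever
pci_aer_in_progress() from the quoted diff would report. The sketch
assumes that both the save and the transition are guarded, so a device
never enters a low-power state without saved state to restore from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
enum mock_power_state { MOCK_D0, MOCK_D3HOT };

struct mock_pci_dev {
	bool aer_in_progress;	/* models pci_aer_in_progress(pci_dev) */
	bool state_saved;	/* models pci_dev->state_saved */
	enum mock_power_state power_state;
};

/* Models a guarded save: refuse while an error is being handled. */
bool mock_save_state(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	if (dev->aer_in_progress)
		return false;
	dev->state_saved = true;
	return true;
}

/* Models a guarded transition: refuse without saved state to restore. */
bool mock_set_low_power(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	if (dev->aer_in_progress || !dev->state_saved)
		return false;
	dev->power_state = MOCK_D3HOT;
	return true;
}

/* Models one call site; returns true if the device was put to sleep. */
bool mock_suspend_noirq(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	if (!dev->state_saved && !mock_save_state(dev))
		return false;	/* leave the device in its current state */
	return mock_set_low_power(dev);
}
```

With the checks inside the two helpers, any other hypothetical call site
that saves state or lowers the power state gets the same behaviour
without repeating the condition.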