[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0hJbKEJKRKv67bcQaHbL7h5e_WDGNPg4BA_P4JY-mk_nw@mail.gmail.com>
Date: Thu, 5 Jun 2025 14:26:05 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Raag Jadav <raag.jadav@...el.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Mario Limonciello <superm1@...nel.org>,
Denis Benato <benato.denis96@...il.com>, mahesh@...ux.ibm.com, oohall@...il.com,
bhelgaas@...gle.com, linux-pci@...r.kernel.org, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org, ilpo.jarvinen@...ux.intel.com, lukas@...ner.de,
aravind.iddamsetty@...ux.intel.com,
"amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>, Alex Deucher <alexander.deucher@....com>
Subject: Re: [PATCH v4] PCI: Prevent power state transition of erroneous device
On Thu, Jun 5, 2025 at 1:44 PM Raag Jadav <raag.jadav@...el.com> wrote:
>
> On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> > On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@...el.com> wrote:
> > > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@...el.com> wrote:
> > > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@...el.com> wrote:
> > > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> > >
> > > ...
> > >
> > > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > > the driver did the right thing. If the driver was expecting the root port
> > > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > > >
> > > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > > detected under root port during suspend.
> > > > > > >
> > > > > > > And IMV this driver's expectation is questionable at least.
> > > > > > >
> > > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > > D3cold during system suspend.
> > > > > >
> > > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > > d3cold_allowed attribute in sysfs.
> > > > > >
> > > > > > If the driver cannot handle this, it needs to be fixed.
> > > > >
> > > > > Thanks for confirming. So should we consider this patch to be valid
> > > > > and worth moving forward?
> > > >
> > > > It doesn't do anything that would be invalid in principle IMV.
> > > >
> > > > You need to consider one more thing, though: It may be necessary to
> > > > power-cycle the device in order to kick it out of the erroneous state
> > > > and the patch effectively blocks this if I'm not mistaken.
> > > >
> > > > But admittedly I'm not sure if this really matters.
> > >
> > > Wouldn't something like bus reset (SBR) be more predictable?
> >
> > Maybe.
> >
> > The device state is most likely inconsistent in that case, so it depends.
>
> My limited understanding is that if SBR doesn't help, at that point all
> bets are off including PMCSR configuration and probably a cold boot is
> needed.
I'm not talking about PMCSR, I'm talking about power removal (D3cold).
This should be equivalent to a cold boot for the particular device
except that cold boot would also reset the hierarchy above it.
Powered by blists - more mailing lists