[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEGDL0IF10QX3Abr@black.fi.intel.com>
Date: Thu, 5 Jun 2025 14:44:47 +0300
From: Raag Jadav <raag.jadav@...el.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Mario Limonciello <superm1@...nel.org>,
Denis Benato <benato.denis96@...il.com>, mahesh@...ux.ibm.com,
oohall@...il.com, bhelgaas@...gle.com, linux-pci@...r.kernel.org,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
ilpo.jarvinen@...ux.intel.com, lukas@...ner.de,
aravind.iddamsetty@...ux.intel.com,
"amd-gfx@...ts.freedesktop.org" <amd-gfx@...ts.freedesktop.org>,
Alex Deucher <alexander.deucher@....com>
Subject: Re: [PATCH v4] PCI: Prevent power state transition of erroneous
device
On Wed, Jun 04, 2025 at 08:19:34PM +0200, Rafael J. Wysocki wrote:
> On Wed, Jun 4, 2025 at 5:43 PM Raag Jadav <raag.jadav@...el.com> wrote:
> > On Fri, May 30, 2025 at 07:49:26PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, May 30, 2025 at 7:23 PM Raag Jadav <raag.jadav@...el.com> wrote:
> > > > On Fri, May 23, 2025 at 05:23:10PM +0200, Rafael J. Wysocki wrote:
> > > > > On Wed, May 21, 2025 at 1:27 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > > > > On Wed, May 21, 2025 at 10:54 AM Raag Jadav <raag.jadav@...el.com> wrote:
> > > > > > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote:
> > > > > > > > On 5/20/2025 1:42 PM, Raag Jadav wrote:
> > > > > > > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote:
> >
> > ...
> >
> > > > > > > > From the driver perspective it does have expectations that the parts outside
> > > > > > > > the driver did the right thing. If the driver was expecting the root port
> > > > > > > > to be powered down at suspend and it wasn't there are hardware components
> > > > > > > > that didn't power cycle and that's what we're seeing here.
> > > > > > >
> > > > > > > Which means the expectation set by the driver is the opposite of the
> > > > > > > purpose of this patch, and it's going to fail if any kind of error is
> > > > > > > detected under root port during suspend.
> > > > > >
> > > > > > And IMV this driver's expectation is questionable at least.
> > > > > >
> > > > > > There is no promise whatsoever that the device will always be put into
> > > > > > D3cold during system suspend.
> > > > >
> > > > > For instance, user space may disable D3cold for any PCI device via the
> > > > > d3cold_allowed attribute in sysfs.
> > > > >
> > > > > If the driver cannot handle this, it needs to be fixed.
> > > >
> > > > Thanks for confirming. So should we consider this patch to be valid
> > > > and worth moving forward?
> > >
> > > It doesn't do anything that would be invalid in principle IMV.
> > >
> > > You need to consider one more thing, though: It may be necessary to
> > > power-cycle the device in order to kick it out of the erroneous state
> > > and the patch effectively blocks this if I'm not mistaken.
> > >
> > > But admittedly I'm not sure if this really matters.
> >
> > Wouldn't something like bus reset (SBR) be more predictable?
>
> Maybe.
>
> The device state is most likely inconsistent in that case, so it depends.
My limited understanding is that if SBR doesn't help, at that point all
bets are off including PMCSR configuration and probably a cold boot is
needed.
Please correct me if I've misunderstood.
Raag
Powered by blists - more mailing lists