Message-ID: <20250919121739.53f79518.alex.williamson@redhat.com>
Date: Fri, 19 Sep 2025 12:17:39 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Farhan Ali <alifm@...ux.ibm.com>
Cc: Bjorn Helgaas <helgaas@...nel.org>, linux-s390@...r.kernel.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, schnelle@...ux.ibm.com, mjrosato@...ux.ibm.com
Subject: Re: [PATCH v3 01/10] PCI: Avoid saving error values for config space
On Tue, 16 Sep 2025 13:00:30 -0700
Farhan Ali <alifm@...ux.ibm.com> wrote:
> On 9/16/2025 11:09 AM, Bjorn Helgaas wrote:
> > On Thu, Sep 11, 2025 at 11:32:58AM -0700, Farhan Ali wrote:
> >> The current reset process saves the device's config space state before
> >> reset and restores it afterward. However, when a device is in an error
> >> state before reset, config space reads may return error values instead of
> >> valid data. This results in saving corrupted values that get written back
> >> to the device during state restoration.
> >>
> >> Avoid saving the state of the config space when the device is in error.
> >> While restoring we only restore the state that can be restored through
> >> kernel data such as BARs or doesn't depend on the saved state.
> >>
> >> Signed-off-by: Farhan Ali <alifm@...ux.ibm.com>
> >> ---
> >> drivers/pci/pci.c | 29 ++++++++++++++++++++++++++---
> >> drivers/pci/pcie/aer.c | 5 +++++
> >> drivers/pci/pcie/dpc.c | 5 +++++
> >> drivers/pci/pcie/ptm.c | 5 +++++
> >> drivers/pci/tph.c | 5 +++++
> >> drivers/pci/vc.c | 5 +++++
> >> 6 files changed, 51 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index b0f4d98036cd..4b67d22faf0a 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -1720,6 +1720,11 @@ static void pci_restore_pcie_state(struct pci_dev *dev)
> >> struct pci_cap_saved_state *save_state;
> >> u16 *cap;
> >>
> >> + if (!dev->state_saved) {
> >> + pci_warn(dev, "Not restoring pcie state, no saved state");
> >> + return;
> Hi Bjorn
>
> Thanks for taking a look.
>
> > Seems like a lot of messages. If we want to warn about this, why
> > don't we do it once in pci_restore_state()?
>
> I thought providing messages about which state is not restored would be
> better and meaningful as we try to restore some of the state. But if the
> preference is to just have a single warn message in pci_restore_state
> then I can update it. (would also like to hear if Alex has any
> objections to that)

I thought it got a bit verbose as well.

> > I guess you're making some judgment about what things can be restored
> > even when !dev->state_saved. That seems kind of hard to maintain in
> > the future as other capabilities are added.
> >
> > Also seems sort of questionable if we restore partial state and keep
> > using the device as if all is well. Won't the device be in some kind
> > of inconsistent, unpredictable state then?

To an extent that's always true. Reset is a lossy process; we're
intentionally throwing away the internal state of the device and
attempting to restore the architected config space as best as we can.
It's hard to guarantee it's complete though.

In this case we're largely just trying to determine whether the
pre-reset config space is already broken, which would mean that some
forms of reset are unavailable and our restore data is bogus. In
addition to the s390x specific scenario resolved here, I hope this
might eliminate some of the "device stuck in D3" or "device stuck with
pending transaction" errors we currently see trying to do PM or FLR
resets on broken devices. Failing to actually reset the device in any
way, then trying to write back -1 for restore data is what we'd see
today, which also isn't what we intend.
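
To make the "write back -1" failure mode concrete, here is a rough
user-space sketch of the idea, not the code from the patch. The struct
mock_dev, CFG_ERROR_RESPONSE, and mock_save_state() names are all
hypothetical stand-ins; the only assumption borrowed from real hardware
is that a failed config read typically comes back as all-ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a failed config read returns all-ones (i.e. -1). */
#define CFG_ERROR_RESPONSE 0xFFFFFFFFu
#define CFG_DWORDS 16

/* Hypothetical stand-in for a device; NOT the kernel's struct pci_dev. */
struct mock_dev {
	uint32_t cfg[CFG_DWORDS];   /* simulated live config space */
	uint32_t saved[CFG_DWORDS]; /* snapshot taken before reset */
	bool state_saved;           /* true only if saved[] holds valid data */
};

/*
 * Dword 0 holds Vendor ID / Device ID. A Vendor ID of 0xFFFF is not a
 * valid ID, so all-ones here means the read itself failed.
 */
static bool cfg_accessible(const struct mock_dev *dev)
{
	return dev->cfg[0] != CFG_ERROR_RESPONSE;
}

/*
 * Snapshot config space only when it is readable. Returns false, and
 * leaves state_saved clear, when the device is already in error, so a
 * later restore never writes the error value back.
 */
static bool mock_save_state(struct mock_dev *dev)
{
	if (!cfg_accessible(dev)) {
		dev->state_saved = false;
		return false;
	}

	memcpy(dev->saved, dev->cfg, sizeof(dev->saved));
	dev->state_saved = true;
	return true;
}
```

With a device whose dword 0 reads back as 0xFFFFFFFF, mock_save_state()
refuses to snapshot and leaves state_saved false, which is the signal a
restore path could key off.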

It probably doesn't make sense to note the specific capabilities that
aren't being restored. A single pci_warn indicating that the device
config space was inaccessible prior to reset and will only be partially
restored is probably sufficient. Thanks,
Alex
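
For illustration only, the single-warning shape suggested above could
look roughly like the user-space sketch below. Everything in it is a
hypothetical stand-in (struct sketch_dev, sketch_restore_state(), the
field names, the message text); it is not the kernel's
pci_restore_state() and makes no claim about what the next revision of
the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BARS 6

/* Hypothetical stand-in for a device; NOT the kernel's struct pci_dev. */
struct sketch_dev {
	bool state_saved;                      /* false if pre-reset config space was unreadable */
	uint32_t bar_from_resources[NUM_BARS]; /* values known independently of the snapshot */
	uint32_t saved_cap_ctrl;               /* only meaningful when state_saved is true */
	uint32_t hw_bar[NUM_BARS];             /* simulated device registers */
	uint32_t hw_cap_ctrl;
	int warnings;                          /* counts emitted warnings, for testing */
};

static void sketch_restore_state(struct sketch_dev *dev)
{
	int i;

	/* One warning at the top level instead of one per capability. */
	if (!dev->state_saved) {
		fprintf(stderr,
			"config space inaccessible prior to reset, only partially restoring\n");
		dev->warnings++;
	}

	/* BARs can be rebuilt from data held outside the snapshot. */
	for (i = 0; i < NUM_BARS; i++)
		dev->hw_bar[i] = dev->bar_from_resources[i];

	/* Anything that depends on the snapshot is skipped when it is bogus. */
	if (dev->state_saved)
		dev->hw_cap_ctrl = dev->saved_cap_ctrl;
}
```

The per-capability helpers then need no messages of their own: the
top-level function warns once and the snapshot-dependent steps are
simply skipped.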