[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260207003936.GA112515@bhelgaas>
Date: Fri, 6 Feb 2026 18:39:36 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Farhan Ali <alifm@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, lukas@...ner.de, alex@...zbot.org,
clg@...hat.com, stable@...r.kernel.org, schnelle@...ux.ibm.com,
mjrosato@...ux.ibm.com, julianr@...ux.ibm.com
Subject: Re: [PATCH v8 3/9] PCI: Avoid saving config space state if
inaccessible
On Thu, Jan 22, 2026 at 11:44:31AM -0800, Farhan Ali wrote:
> The current reset process saves the device's config space state before
> reset and restores it afterward. However errors may occur unexpectedly and
> it may then be impossible to save config space because the device may be
> inaccessible (e.g. DPC) or config space may be corrupted. This results in
> saving corrupted values that get written back to the device during state
> restoration.
>
> With a reset we want to recover/restore the device into a functional state.
> So avoid saving the state of the config space when the device config space
> is inaccessible.
>
> Signed-off-by: Farhan Ali <alifm@...ux.ibm.com>
> ---
> drivers/pci/pci.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index c105e285cff8..e7beaf1f65a7 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4960,6 +4960,7 @@ EXPORT_SYMBOL_GPL(pci_dev_unlock);
>
> static void pci_dev_save_and_disable(struct pci_dev *dev)
> {
> + u32 val;
> const struct pci_error_handlers *err_handler =
> dev->driver ? dev->driver->err_handler : NULL;
>
> @@ -4980,6 +4981,19 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> */
> pci_set_power_state(dev, PCI_D0);
>
> + /*
> + * If device's config space is inaccessible it can return ~0 for
> + * any reads. Since VFs can also return ~0 for Device and Vendor ID
> + * check Command and Status registers. At the very least we should
> + * avoid restoring config space for device with error bits set in
> + * Status register.
> + */
> + pci_read_config_dword(dev, PCI_COMMAND, &val);
> + if (PCI_POSSIBLE_ERROR(val)) {
> + pci_warn(dev, "Device config space inaccessible\n");
> + return;
> + }
This seems related to Lukas' recent patches:
a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
5e09895b4063 ("Documentation: PCI: Amend error recovery doc with pci_save_state() rules")
My poor understanding is that the PCI core now saves config space for
every device at enumeration time, and if a driver wants to capture an
updated config space after it has changed things, it is responsible
for doing that explicitly.
a2f1e22390ac gives us a baseline saved state that will be restored in
some cases. This pci_dev_save_and_disable() is part of the reset path
and saves a potentially different state. I'm a little uncomfortable
about the fact that we save the state at different times, including
unpredictable times after an error, and I'm not sure the driver can
always know what gets restored.
Maybe the reset path shouldn't even try to save config space again,
since we're now guaranteed to have at least the state from
enumeration? I suppose skipping the save here would expose cases
where the driver changed config space without doing a pci_save_state()
itself, and a driver- or sysfs-initiated reset would lose that change,
whereas today we preserve it because we save/restore as part of that
reset.
That change would also be lost if the reset was unintentional, e.g.,
during error recovery, but I guess in that case the driver does know
that an error occurred, so it could redo the change.
It seems like the ideal thing would be if every restore used the
enumeration saved-state or the driver's intentional saved-state, never
a state saved at the unpredictable time of an error recovery reset.
I hope Lukas and Alex can chime in here.
> pci_save_state(dev);
> /*
> * Disable the device by clearing the Command register, except for
> --
> 2.43.0
>
Powered by blists - more mailing lists