[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200104003452.GA63827@google.com>
Date: Fri, 3 Jan 2020 18:34:52 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: sathyanarayanan.kuppuswamy@...ux.intel.com
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
ashok.raj@...el.com, keith.busch@...el.com
Subject: Re: [PATCH v11 1/8] PCI/ERR: Update error status after reset_link()
On Thu, Dec 26, 2019 at 04:39:07PM -0800, sathyanarayanan.kuppuswamy@...ux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
>
> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
> reset_link() to recover from fatal errors. But, if the reset is
> successful there is no need to continue the rest of the error recovery
> checks. Also, during fatal error recovery, if the initial value of error
> status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then
> even after successful recovery (using reset_link()) pcie_do_recovery()
> will report the recovery result as failure. So update the status of
> error after reset_link().
I like the part about updating "status" with the result of
reset_link(), and I split that into its own patch because it
seems like a fix that *can* be separated.
But I'm not convinced that we should skip the ->slot_reset()
callbacks if the reset_link() was successful. According to
Documentation/PCI/pci-error-recovery.rst, we should call
->slot_reset() after completion of the reset.
For example, rsxx_err_handler implements ->slot_reset(), but
not ->resume(). If we reset the device, we'll claim success and
return, but we won't call rsxx_slot_reset(), which does a bunch
of important-looking recovery stuff.
If pci-error-recovery.rst is wrong, we should fix that (after
auditing all the drivers to make sure they match).
> Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
> Cc: Ashok Raj <ashok.raj@...el.com>
> Cc: Keith Busch <keith.busch@...el.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> Acked-by: Keith Busch <keith.busch@...el.com>
> ---
> drivers/pci/pcie/err.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index b0e6048a9208..53cd9200ec2c 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -204,9 +204,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> else
> pci_walk_bus(bus, report_normal_detected, &status);
>
> - if (state == pci_channel_io_frozen &&
> - reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> - goto failed;
> + if (state == pci_channel_io_frozen) {
> + status = reset_link(dev, service);
> + if (status != PCI_ERS_RESULT_RECOVERED)
> + goto failed;
> + goto done;
> + }
>
> if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> status = PCI_ERS_RESULT_RECOVERED;
> @@ -228,6 +231,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> if (status != PCI_ERS_RESULT_RECOVERED)
> goto failed;
>
> +done:
> pci_dbg(dev, "broadcast resume message\n");
> pci_walk_bus(bus, report_resume, &status);
>
> --
> 2.21.0
>
Powered by blists - more mailing lists