lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 9 Jan 2020 17:26:39 -0600
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        ashok.raj@...el.com, keith.busch@...el.com
Subject: Re: [PATCH v11 1/8] PCI/ERR: Update error status after reset_link()

On Wed, Jan 08, 2020 at 04:14:09PM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 1/3/20 6:54 PM, Bjorn Helgaas wrote:
> > On Fri, Jan 03, 2020 at 05:03:03PM -0800, Kuppuswamy Sathyanarayanan wrote:
> > > On 1/3/20 4:34 PM, Bjorn Helgaas wrote:
> > > > On Thu, Dec 26, 2019 at 04:39:07PM -0800, sathyanarayanan.kuppuswamy@...ux.intel.com wrote:
> > > > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> > > > > 
> > > > > Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
> > > > > reset_link() to recover from fatal errors. But, if the reset is
> > > > > successful there is no need to continue the rest of the error recovery
> > > > > checks. Also, during fatal error recovery, if the initial value of error
> > > > > status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then
> > > > > even after successful recovery (using reset_link()) pcie_do_recovery()
> > > > > will report the recovery result as failure. So update the status of
> > > > > error after reset_link().
> > > > I like the part about updating "status" with the result of
> > > > reset_link(), and I split that into its own patch because it
> > > > seems like a fix that *can* be separated.
> > > > 
> > > > But I'm not convinced that we should skip the ->slot_reset()
> > > > callbacks if the reset_link() was successful.
> > > If reset_link() call is successful then the result value will be
> > > "PCI_ERS_RESULT_RECOVERED". So even if you proceed with
> > > rest of the code, slot_reset() will never get called right ?
> > The current code:
> > 
> >          if (state == pci_channel_io_frozen &&
> >              reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> >                  goto failed;
> >          ...
> >          if (status == PCI_ERS_RESULT_NEED_RESET) {
> >                  status = PCI_ERS_RESULT_RECOVERED;
> >                  pci_walk_bus(bus, report_slot_reset, &status);
> > 
> > doesn't save the result of reset_link(), so if status was
> > PCI_ERS_RESULT_NEED_RESET and the reset succeeds, we will call
> > ->slot_reset().
> > 
> > After your patch, if "state == pci_channel_io_frozen", we *never* call
> > ->slot_reset().
> > 
> > Do you think that matches pci-error-recovery.rst?  It doesn't seem
> > like it to me, but perhaps I haven't read it closely enough.
> Documentation does not have clear details on what to do with return
> value of reset_link() (step 3). But IMO, if step 3 recovers the device and
> returns PCI_ERS_RESULT_RECOVERED then there is no need to proceed
> to slot reset (step 4). May be we should update the Documentation?

Are you suggesting we don't need to call a driver callback after
resetting the device?  Note that the ->slot_reset() doesn't *perform*
a reset; it is called *after* completion of a reset.

The doc says:

  ... Upon completion of slot reset, the platform will call the device
  slot_reset() callback.
  ...
  This call gives drivers the chance to re-initialize the hardware
  (re-download firmware, etc.).  At this point, the driver may assume
  that the card is in a fresh state and is fully functional. The slot
  is unfrozen and the driver has full access to PCI config space,
  memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
  will also be available.

After we reset a device, the driver certainly needs a chance to
reinitialize it.

> > > > According to
> > > > Documentation/PCI/pci-error-recovery.rst, we should call
> > > > ->slot_reset() after completion of the reset.
> > > > 
> > > > For example, rsxx_err_handler implements ->slot_reset(), but
> > > > not ->resume().  If we reset the device, we'll claim success and
> > > > return, but we won't call rsxx_slot_reset(), which does a bunch
> > > > of important-looking recovery stuff.
> > > > 
> > > > If pci-error-recovery.rst is wrong, we should fix that (after
> > > > auditing all the drivers to make sure they match).
> > > > 
> > > > > Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
> > > > > Cc: Ashok Raj <ashok.raj@...el.com>
> > > > > Cc: Keith Busch <keith.busch@...el.com>
> > > > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> > > > > Acked-by: Keith Busch <keith.busch@...el.com>
> > > > > ---
> > > > >    drivers/pci/pcie/err.c | 10 +++++++---
> > > > >    1 file changed, 7 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> > > > > index b0e6048a9208..53cd9200ec2c 100644
> > > > > --- a/drivers/pci/pcie/err.c
> > > > > +++ b/drivers/pci/pcie/err.c
> > > > > @@ -204,9 +204,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> > > > >    	else
> > > > >    		pci_walk_bus(bus, report_normal_detected, &status);
> > > > > -	if (state == pci_channel_io_frozen &&
> > > > > -	    reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> > > > > -		goto failed;
> > > > > +	if (state == pci_channel_io_frozen) {
> > > > > +		status = reset_link(dev, service);
> > > > > +		if (status != PCI_ERS_RESULT_RECOVERED)
> > > > > +			goto failed;
> > > > > +		goto done;
> > > > > +	}
> > > > >    	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> > > > >    		status = PCI_ERS_RESULT_RECOVERED;
> > > > > @@ -228,6 +231,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
> > > > >    	if (status != PCI_ERS_RESULT_RECOVERED)
> > > > >    		goto failed;
> > > > > +done:
> > > > >    	pci_dbg(dev, "broadcast resume message\n");
> > > > >    	pci_walk_bus(bus, report_resume, &status);
> > > > > -- 
> > > > > 2.21.0
> > > > > 
> > > -- 
> > > Sathyanarayanan Kuppuswamy
> > > Linux kernel developer
> > > 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux kernel developer
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ