Date:   Thu, 1 Sep 2016 07:52:00 +0000
From:   Yuval Mintz <Yuval.Mintz@...gic.com>
To:     "Guilherme G. Piccoli" <gpiccoli@...ux.vnet.ibm.com>
CC:     netdev <netdev@...r.kernel.org>,
        Ariel Elior <Ariel.Elior@...gic.com>
Subject: RE: [PATCH net v2] bnx2x: don't reset chip on cleanup if PCI function
 is offline

> When a PCI error is detected, some architectures (like PowerPC) perform a slot
> reset - the driver's error handlers are in charge of disabling the device
> before the reset and re-enabling it after a successful slot reset.
> 
> There are two cases, though, in which the code takes another path: if the slot
> reset is not successful, or if too many errors have already happened on the
> specific adapter (meaning that the device is possibly experiencing a HW failure
> that a slot reset is not able to solve), the core PCI error mechanism (called
> EEH on PowerPC) will remove the adapter from the system, since it will consider
> this a permanent failure of the device. In this case, a path is taken that
> leads to bnx2x_chip_cleanup() calling bnx2x_reset_hw(), which then tries to
> perform a HW reset on the chip. This reset won't succeed since the HW is in a
> fault state, which can be seen from multiple messages in the kernel log like
> the ones below:
> 
> 	bnx2x: [bnx2x_issue_dmae_with_comp:552(eth1)]DMAE timeout!
> 	bnx2x: [bnx2x_write_dmae:600(eth1)]DMAE returned failure -1
> 
> After some time, the PCI error mechanism gives up waiting for the driver's
> correct removal procedure and forcibly removes the adapter from the system.
> We can see a soft lockup while the core PCI error mechanism is waiting for the
> driver to complete the right removal process.
> 
> This patch adds a check to avoid a chip reset whenever the function is in a
> PCI error state - since this case is only reached when a device is being
> removed because of a permanent failure, the HW chip reset is neither expected
> to work nor necessary.
> 
> Also, as a minor improvement in the error path, we avoid the MCP information
> dump in case of a non-recoverable PCI error (when the adapter is about to be
> removed), since it will certainly fail.
> 
> Reported-by: Harsha Thyagaraja <hathyaga@...ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@...ux.vnet.ibm.com>
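
For readers following the thread without the diff at hand, the guard described
above can be sketched roughly as below. This is a userspace model, not the
actual patch: struct mock_dev and every mock_* function are hypothetical
stand-ins, and the channel_offline flag only models the kind of answer the
kernel's pci_channel_offline() helper gives. That the real change uses that
helper is an assumption here; the quoted description only says a check on the
PCI error state is added.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the adapter state; not the real bnx2x structures. */
struct mock_dev {
	bool channel_offline;	/* models a function in permanent PCI error state */
	int reset_attempts;	/* how many times a HW reset was tried */
	int mcp_dumps;		/* how many times the MCP info dump was tried */
};

/* Models a HW reset: it cannot succeed once the function is offline. */
static int mock_reset_hw(struct mock_dev *dev)
{
	dev->reset_attempts++;
	if (dev->channel_offline) {
		fprintf(stderr, "mock: DMAE timeout!\n");
		return -1;
	}
	return 0;
}

/* Cleanup path: skip the chip reset entirely when the function is offline. */
static int mock_chip_cleanup(struct mock_dev *dev)
{
	if (dev->channel_offline)
		return 0;	/* device is being removed; a reset is pointless */
	return mock_reset_hw(dev);
}

/* Error path: skip the MCP dump when it is certain to fail. */
static void mock_error_dump(struct mock_dev *dev)
{
	if (dev->channel_offline)
		return;
	dev->mcp_dumps++;
}
```

With the flag set, mock_chip_cleanup() returns immediately instead of issuing
a reset that can only time out, which is the behaviour the description asks
for on the removal path.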

Thanks.

Acked-by: Yuval Mintz <Yuval.Mintz@...gic.com>
