lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 27 May 2020 13:35:06 +1000
From:   "Oliver O'Halloran" <oohall@...il.com>
To:     "Kuppuswamy, Sathyanarayanan" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc:     Yicong Yang <yangyicong@...ilicon.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        jay.vosburgh@...onical.com, linux-pci@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        ashok.raj@...el.com, Sam Bobroff <sbobroff@...ux.ibm.com>
Subject: Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for
 non-hotplug capable devices

On Wed, May 27, 2020 at 1:06 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com> wrote:
>
> Yes, in case of DPC (Fatal errors) link is already reset. So we
> don't need any special handling. This reset logic is mainly for
> non-fatal errors.

Why? In our experience most fatal errors aren't all that fatal and can
be recovered by resetting the device. The base spec backs that up (see
gen5 base, sec 6.2) too saying the main point of distinction between
fatal and non-fatal errors is whether handling the error requires a
reset or not. For EEH we always try to recover the device and only
mark it as permanently failed once the devices goes over the max error
threshold (5 errors per hour, by default). Doing something similar for
(native) DPC would make sense IMO.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ