Message-ID: <CAOSf1CE00f_3KxWAPvWngsW8z_frw6=qB70H+VmdSULaspHWhQ@mail.gmail.com>
Date: Wed, 27 May 2020 13:35:06 +1000
From: "Oliver O'Halloran" <oohall@...il.com>
To: "Kuppuswamy, Sathyanarayanan"
<sathyanarayanan.kuppuswamy@...ux.intel.com>
Cc: Yicong Yang <yangyicong@...ilicon.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
jay.vosburgh@...onical.com, linux-pci@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
ashok.raj@...el.com, Sam Bobroff <sbobroff@...ux.ibm.com>
Subject: Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for
non-hotplug capable devices
On Wed, May 27, 2020 at 1:06 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com> wrote:
>
> Yes, in case of DPC (Fatal errors) link is already reset. So we
> don't need any special handling. This reset logic is mainly for
> non-fatal errors.
Why? In our experience most fatal errors aren't all that fatal and can
be recovered by resetting the device. The base spec backs that up too
(see gen5 base, sec 6.2), saying the main point of distinction between
fatal and non-fatal errors is whether handling the error requires a
reset or not. For EEH we always try to recover the device and only
mark it as permanently failed once the device goes over the max error
threshold (5 errors per hour, by default). Doing something similar for
(native) DPC would make sense IMO.