[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55b3a469-c306-acf1-f97e-f07f40054974@linux.intel.com>
Date: Tue, 26 May 2020 20:06:01 -0700
From: "Kuppuswamy, Sathyanarayanan"
<sathyanarayanan.kuppuswamy@...ux.intel.com>
To: Oliver O'Halloran <oohall@...il.com>
Cc: Yicong Yang <yangyicong@...ilicon.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
jay.vosburgh@...onical.com, linux-pci@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
ashok.raj@...el.com
Subject: Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for
non-hotplug capable devices
Hi,
On 5/26/20 8:00 PM, Oliver O'Halloran wrote:
> On Wed, May 27, 2020 at 12:00 PM Kuppuswamy, Sathyanarayanan
> <sathyanarayanan.kuppuswamy@...ux.intel.com> wrote:
>>
>> Hi,
>>
>> On 5/21/20 7:56 PM, Yicong Yang wrote:
>>>
>>>
>>> On 2020/5/22 3:31, Kuppuswamy, Sathyanarayanan wrote:
>>>>
>>> Not exactly. In pci_bus_error_reset(), we call pci_slot_reset() only if it's
>>> hotpluggable. But we always call pci_bus_reset() to perform a secondary bus
>>> reset for the bridge. That's what I think is unnecessary for a normal link,
>>> and that's what reset link indicates us to do. The slot reset is introduced
>>> in the process only to solve side effects. (c4eed62a2143, PCI/ERR: Use slot reset if available)
>>
>> IIUC, pci_bus_reset() will do slot reset if its supported (hot-plug
>> capable slots). If its not supported then it will attempt secondary
>> bus reset. So secondary bus reset will be attempted only if slot
>> reset is not supported.
>>
>> Since reported_error_detected() requests us to do reset, we will have
>> to attempt some kind of reset before we call ->slot_reset() right?
>
> Yes, the driver returns PCI_ERS_RESULT_NEED_RESET from
> ->error_detected() to indicate that it doesn't know how to recover
> from the error. How that reset is performed doesn't really matter, but
> it does need to happen.
>
>
>>> PCI_ERS_RESULT_NEED_RESET indicates that the driver
>>> wants a platform-dependent slot reset and its ->slot_reset() method to be called then.
>>> I don't think it's same as slot reset mentioned above, which is only for hotpluggable
>>> ones.
>> What you think is the correct reset implementation ? Is it something
>> like this?
>>
>> if (hotplug capable)
>> try_slot_reset()
>> else
>> do_nothing()
>
> Looks broken to me, but all the reset handling is a rat's nest so
> maybe I'm missing something. In the case of a DPC trip the link is
> disabled which has the side-effect of hot-resetting the downstream
> device. Maybe it's fine?
Yes, in case of DPC (Fatal errors) link is already reset. So we
don't need any special handling. This reset logic is mainly for
non-fatal errors.
>
> As an aside, why do we have both ->slot_reset() and ->reset_done() in
> the error handling callbacks? Seems like their roles are almost
> identical.
Not sure.I think reset_done() is final cleanup.
>
> Oliver
>
Powered by blists - more mailing lists