[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a73a2156-124b-47f7-b545-6751f6e87d54@nvidia.com>
Date: Mon, 27 Nov 2023 20:13:50 +0530
From: Vidya Sagar <vidyas@...dia.com>
To: Bjorn Helgaas <helgaas@...nel.org>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Lukas Wunner <lukas@...ner.de>,
Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
"kbusch@...nel.org" <kbusch@...nel.org>
Cc: Vikram Sethi <vsethi@...dia.com>,
Krishna Thota <kthota@...dia.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
sagar.tv@...il.com
Subject: Re: Race b/w pciehp and FirmwareFirst DPC->EDR flow - Reg
Hi Bjorn and others,
Could you please share your thoughts on this?
Thanks,
Vidya Sagar
On 11/10/2023 10:31 PM, Vidya Sagar wrote:
> There was a typo. Corrected it.
>
> s/If DPC->EDR flow checks DPC also/If DPC->EDR flow checks 'PD Change' also
>
> On 11/10/2023 6:30 PM, Vidya Sagar wrote:
>> Hi folks,
>> Here are the platform details.
>> - System with a Firmware First approach for both AER and DPC
>> - CPERs are sent to the OS for the AER events
>> - DPC is notified to the OS through EDR mechanism
>> - System doesn't have support for in-band PD and supports only OOB PD
>> where writing to a private register would set the PD state
>> - System has this design where PD state gets cleared whenever there is
>> a link down (without even writing to the private register)
>>
>> In this system, if the root port has to experience a DPC because of
>> any right reasons (say SW trigger of DPC for time being), I'm
>> observing the following flow which is causing a race.
>> 1. Since DPC brings the link down, there is a DLLSC and an MSI is sent
>> to the OS hence PCIe HP ISR is invoked.
>> 2. ISR then takes stock of the Slot_Status register to see what all
>> events caused the MSI generation.
>> 3. It sees both DLLSC and PDC bits getting set.
>> 4. PDC is set because of the aforementioned hardware design where for
>> every link down, PD state gets cleared and since this is considered as
>> a change in the PD state, PDC also gets set.
>> 5. PCIe HP ISR flow transitions to the PCIe HP IST (interrupt
>> thread/bottom half) and waits for the DPC_EDR to complete (because
>> DLLSC is one of the events)
>> 6. Parallel to the PCIe HP ISR/IST, DPC interrupt is raised to the
>> firmware and that would then send an EDR event to the OS. Firmware
>> also sets the PD state to '1' before that, as the endpoint device is
>> still available.
>> 7. Firmware programming of the private register to set the PD state
>> raises second interrupt and PCIe HP ISR takes stock of the events and
>> this time it is only the PDC and not DLLSC. ISR sends a wake to the
>> IST, but since there is an IST in progress already, nothing much
>> happens at this point.
>> 8. Once the DPC is completed and link comes back up again, DPC
>> framework asks the endpoint drivers to handle it by calling the
>> 'pci_error_handlers' callabacks registered by the endpoint device
>> drivers.
>> 9. The PCIe HP IST (of the very first MSI) resumes after receiving the
>> completion from the DPC framework saying that DPC recovery is successful.
>> 10. Since PDC (Presence Detect Change) bit is also set for the first
>> interrupt, IST attempts to remove the devices (as part of
>> pciehp_handle_presence_or_link_change())
>>
>> At this point, there is a race between the device driver that is
>> trying to work with the device (through pci_error_handlers callback)
>> and the IST that is trying to remove the device.
>> To be fair to pciehp_handle_presence_or_link_change(), after removing
>> the devices, it checks for the link-up/PD being '1' and scans the
>> devices again if the device is still available. But unfortunately, IST
>> is deadlocked (with the device driver) while removing the devices
>> itself and won't go to the next step.
>>
>> The rationale given in the form of a comment in
>> pciehp_handle_presence_or_link_change() for removing the devices first
>> (without checking PD/link-up) and adding them back after checking
>> link-up/PD is that, since there is a change in PD, the new link-up
>> need not be with the same old device rather it could be with a
>> different device. So, it justifies in removing the existing devices
>> and then scanning for new ones. But this is causing deadlock with the
>> already initiated DPC recovery process.
>>
>> I see two ways to avoid the deadlock in this scenario.
>> 1. When PCIe HP IST is looking at both DLLSC and PDC, why should
>> DPC->EDR flow look at only DLLSC and not DPC? If DPC->EDR flow checks
>> 'PD Change' also (along with DLL) and declares that the DPC recovery
>> can't happen (as there could be a change of device) hence returning
>> DPC recovery failure, then, the device driver's pci_error_handlers
>> callbacks won't be called and there won't be any deadlock.
>> 2. Check for the possibility of a common lock so that PCIe HP IST and
>> device driver's pci_error_handlers callbacks don't race.
>>
>> Let me know your comments on this.
>>
>> Thanks,
>> Vidya Sagar
Powered by blists - more mailing lists