[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8ab986f2-6aa5-401a-aa21-e8b21f68eaad@amd.com>
Date: Mon, 22 May 2023 15:23:57 -0700
From: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>, oohall@...il.com,
Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
Yazen Ghannam <yazen.ghannam@....com>,
Fontenot Nathan <Nathan.Fontenot@....com>
Subject: Re: [PATCH v2 1/2] PCI: pciehp: Add support for async hotplug with
native AER and DPC/EDR
Hi,
On 5/16/2023 3:10 AM, Lukas Wunner wrote:
> On Tue, Apr 18, 2023 at 09:05:25PM +0000, Smita Koralahalli wrote:
>> According to Section 6.7.6 of PCIe Base Specification [1], async removal
>> with DPC and EDR may be unexpected and may result in a Surprise Down error.
>> This error is just a side effect of hot remove. Most of the time, these
>> errors will be abstract to the OS as current systems rely on Firmware-First
>> model for AER and DPC, where the error handling (side effects of async
>> remove) and other necessary HW sequencing actions is taken care by the FW
>> and is abstract to the OS. However, FW-First model poses issues while
>> rolling out updates or fixing bugs as the servers need to be brought down
>> for firmware updates.
>>
>> Add support for async hot-plug with native AER and DPC/EDR. Here, OS is
>> responsible for handling async add and remove along with handling of AER
>> and DPC events which are generated as a side-effect of async remove.
>
> I think you can probably leave out the details about Firmware-First.
> Pointing to 6.7.6 and the fact that Surprise Down Errors may occur as
> an expected side-effect of surprise removal is sufficient. They should
> be treated as such.
>
> You also want to point out what you're trying to achieve here, i.e. get
> rid of irritating log messages and a 1 sec delay while pciehp waits for
> DPC recovery.
Okay.
>
>
>> Please note that, I have provided explanation why I'm not setting the
>> Surprise Down bit in uncorrectable error mask register in AER.
>> https://lore.kernel.org/all/fba22d6b-c225-4b44-674b-2c62306135ed@amd.com/
>
> Add an explanation to the commit message that masking Surprise Down Errors
> was explored as an alternative approach, but scrapped due to the odd
> behavior that masking only avoids the interrupt, but still records an
> error per PCIe r6.0.1 sec 6.2.3.2.2. That stale error is going to be
> reported the next time some error other than Surprise Down is handled.
>
>
Okay.
>> Also, while testing I noticed PCI_STATUS and PCI_EXP_DEVSTA will be set
>> on an async remove and will not be cleared while the device is brought
>> down. I have included clearing them here in order to mask any kind of
>> appearance that there was an error and as well duplicating our BIOS
>> functionality. I can remove if its not necessary.
>
> Which bits are set exactly? Can you constrain the register write to
> those bits?
>
Hmm, I was mostly trying to follow the similar approach done for AER.
pci_aer_raw_clear_status(), where they clear status registers
unconditionally. Also, thought might be better if we are dealing with
legacy endpoints or so.
I see these bits set in status registers:
PCI_ERR_UNCOR_STATUS 0x20 (Surprise Down)
PCI_EXP_DPC_RP_PIO_STATUS 0x10000 (Memory Request received URCompletion)
PCI_EXP_DEVSTA 0x604 (fatal error detected)
>
>> +static void dpc_handle_surprise_removal(struct pci_dev *pdev)
>> +{
>> + if (pdev->dpc_rp_extensions && dpc_wait_rp_inactive(pdev))
>> + return;
>
> Emit an error message here?
Okay
>
>
>> + /*
>> + * According to Section 6.7.6 of the PCIe Base Spec 6.0, since async
>> + * removal might be unexpected, errors might be reported as a side
>> + * effect of the event and software should handle them as an expected
>> + * part of this event.
>> + */
>
> I'd move that code comment to into dpc_handler() above the
> "if (dpc_is_surprise_removal(pdev))" check.
Okay.
>
>
>> + pci_aer_raw_clear_status(pdev);
>> + pci_clear_surpdn_errors(pdev);
>> +
>> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
>> + PCI_EXP_DPC_STATUS_TRIGGER);
>> +}
>
> Do you need a "wake_up_all(&dpc_completed_waitqueue);" at the end
> of the function to wake up a pciehp handler waiting for DPC recovery?
I don't think so. The pciehp handler is however getting invoked
simultaneously due to PDSC or DLLSC state change right.. Let me know if
I'm missing anything here.
>
>
>> +static bool dpc_is_surprise_removal(struct pci_dev *pdev)
>> +{
>> + u16 status;
>> +
>> + pci_read_config_word(pdev, pdev->aer_cap + PCI_ERR_UNCOR_STATUS, &status);
>> +
>> + if (!(status & PCI_ERR_UNC_SURPDN))
>> + return false;
>> +
>
> You need an additional check for pdev->is_hotplug_bridge here.
>
> And you need to read PCI_EXP_SLTCAP and check for PCI_EXP_SLTCAP_HPS.
>
> Return false if either of them isn't set.
Return false, if PCI_EXP_SLTCAP isn't set only correct?
PCI_EXP_SLTCAP_HPS should be disabled if DPC is enabled.
Implementation notes in 6.7.6 says that:
"The Hot-Plug Surprise (HPS) mechanism, as indicated by the Hot-Plug
Surprise bit in the Slot Capabilities Register being Set, is deprecated
for use with async hot-plug. DPC is the recommended mechanism for
supporting async hot-plug."
Platform FW will disable the SLTCAP_HPS bit at boot time to enable async
hotplug on AMD devices.
Probably check if SLTCAP_HPS bit is set and return false?
>
>
>> + dpc_handle_surprise_removal(pdev);
>> +
>> + return true;
>> +}
>
> A function named "..._is_..." should not have any side effects.
> So move the dpc_handle_surprise_removal() call down into dpc_handler()
> before the "return IRQ_HANDLED;" statement.
Okay.
>
>
> What about ports which support AER but not DPC? Don't you need to
> amend aer.c in that case? I suppose you don't have hardware without
> DPC to test that?
Yeah right. Also, if DPC isn't supported we leave HPS bit set and we
won't see the DPC event at that time.
Thanks,
Smita
>
> Thanks,
>
> Lukas
Powered by blists - more mailing lists