lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fba22d6b-c225-4b44-674b-2c62306135ed@amd.com>
Date:   Tue, 14 Feb 2023 01:33:54 -0800
From:   Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To:     Lukas Wunner <lukas@...ner.de>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Bjorn Helgaas <bhelgaas@...gle.com>, oohall@...il.com,
        Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Yazen Ghannam <yazen.ghannam@....com>,
        Fontenot Nathan <Nathan.Fontenot@....com>
Subject: Re: [PATCH 1/2] PCI: pciehp: Add support for OS-First Hotplug and
 AER/DPC

On 11/4/2022 3:15 AM, Lukas Wunner wrote:
> On Tue, Nov 01, 2022 at 12:07:18AM +0000, Smita Koralahalli wrote:
>> The implementation is as follows: On an async remove a DPC is triggered as
>> a side-effect along with an MSI to the OS. Determine it's an async remove
>> by checking for DPC Trigger Status in DPC Status Register and Surprise
>> Down Error Status in AER Uncorrected Error Status to be non-zero. If true,
>> treat the DPC event as a side-effect of async remove, clear the error
>> status registers and continue with hot-plug tear down routines. If not,
>> follow the existing routine to handle AER/DPC errors.
> Instead of having the OS recognize and filter Surprise Down events,
> it would also be possible to simply set the Surprise Down bit in the
> Uncorrectable Error Mask Register.  This could be constrained to
> Downstream Ports capable of surprise removal, i.e. those where the
> is_hotplug_bridge in struct pci_dev is set.  And that check and the
> register change could be performed in pci_dpc_init().
>
> Have you considered such an alternative approach?  If you have, what
> was the reason to prefer the more complex solution you're proposing?

By setting the Surprise down bit in Uncorrectable Error Mask register we 
will
not get a DPC event. What I know so far is, we cannot set this bit at 
run-time
after we determine its a surprise down event or probably I don't know
enough how to do it! (once an pciehp interrupt is triggered..).

And setting this bit at initialization might not trigger true DPC events..

Second thing, is masking Surprise Down bit has no impact on logging 
errors in
AER registers.

So, I think that approach probably will not resolve the issue of 
clearing the logs
in AER registers and complicate things while differentiating true errors 
vs surprise
down events. Please correct me if I'm wrong!!

I did few testing after I read your comments.  What I realized is that, 
these DPC
events (side effects of hot remove) are actually benign on AMD systems. 
On a hot
remove I see a Presence Detect change and a DPC event. This PD state change
will trigger a pciehp isr and calls pciehp_handle_presence_or_link_change()
and disables the slot normally. So essentially, this patch will boil 
down to the point
to clearing the logs in AER registers and also handling those error 
messages ("device
recovery failed"....) in dmesg which might confuse users on a hot remove.
What do you think?

Now, I'm not sure whether there will be a PD state change across other 
systems on a
hot remove when AER/DPC is native (OS First) in which case we should 
call the
pciehp_disable_slot() from dpc handler as well.. Any inputs would be
appreciated here..
>
>
>> +static void pci_clear_surpdn_errors(struct pci_dev *pdev)
>> +{
>> +	u16 reg16;
>> +	u32 reg32;
>> +
>> +	pci_read_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, &reg32);
>> +	pci_write_config_dword(pdev, pdev->dpc_cap + PCI_EXP_DPC_RP_PIO_STATUS, reg32);
>> +
>> +	pci_read_config_word(pdev, PCI_STATUS, &reg16);
>> +	pci_write_config_word(pdev, PCI_STATUS, reg16);
>> +
>> +	pcie_capability_read_word(pdev, PCI_EXP_DEVSTA, &reg16);
>> +	pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, reg16);
>> +}
> I don't understand why PCI_STATUS and PCI_EXP_DEVSTA need to be
> touched here?

This is just to mask any kind of appearance that there was an error since
the errors would have been induced by the hot plug event (just duplicating
our BIOS functionality here..).  But, please let me know if OS is already
handling the things here and if it is not required.
>
>
>> +static void pciehp_handle_surprise_removal(struct pci_dev *pdev)
> Since this function is located in dpc.c and is strictly called from
> other functions in the same file, it should be prefixed dpc_, not
> pciehp_.

Sure, will fix in v2.
>
>
>> +	/*
>> +	 * According to Section 6.13 and 6.15 of the PCIe Base Spec 6.0,
>> +	 * following a hot-plug event, clear the ARI Forwarding Enable bit
>> +	 * and AtomicOp Requester Enable as its not determined whether the
>> +	 * next device inserted will support these capabilities. AtomicOp
>> +	 * capabilities are not supported on PCI Express to PCI/PCI-X Bridges
>> +	 * and any newly added component may not be an ARI device.
>> +	 */
>> +	pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL2,
>> +				   (PCI_EXP_DEVCTL2_ARI | PCI_EXP_DEVCTL2_ATOMIC_REQ));
> That looks like a reasonable change, but it belongs in a separate
> patch.  And I think it should be performed as part of (de-)enumeration,
> not as part of DPC error handling.  What about Downstream Ports which
> are not DPC-capable, I guess the bits should be cleared as well, no?

AFAIK, DPC will work on all AMD root ports. I'm not sure how could we handle
on a per port basis if the bridges/ports downstream to root ports don't 
support
DPC..
>
> How about clearing the bits in pciehp_unconfigure_device()?

Okay.

Thanks,
Smita
>
> Thanks,
>
> Lukas


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ