lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cc36bb5b-6a4a-258b-6707-4d019154e019@amd.com>
Date:   Mon, 22 May 2023 15:23:31 -0700
From:   Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To:     Lukas Wunner <lukas@...ner.de>
Cc:     linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
        Bjorn Helgaas <bhelgaas@...gle.com>, oohall@...il.com,
        Mahesh J Salgaonkar <mahesh@...ux.ibm.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Yazen Ghannam <yazen.ghannam@....com>,
        Fontenot Nathan <Nathan.Fontenot@....com>
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Clear the optional capabilities in
 DEVCTL2 on a hot-plug

Hi Lukas,

Sorry for the delay in replying to this patch. We had some internal 
discussions after the review comments. Please find the responses inline..

On 5/11/2023 4:19 AM, Lukas Wunner wrote:
> On Tue, Apr 18, 2023 at 09:05:26PM +0000, Smita Koralahalli wrote:
>> Clear all capabilities in Device Control 2 register as they are optional
>> and it is not determined whether the next device inserted will support
>> these capabilities. Moreover, Section 6.13 of the PCIe Base
>> Specification [1], recommends clearing the ARI Forwarding Enable bit on
>> a hot-plug event as its not guaranteed that the newly added component
>> is in fact an ARI device.
> 
> Clearing ARI Forwarding Enable sounds reasonable, but I don't see
> why all the other bits in the Device Control 2 register need to be
> cleared.  If there isn't a reason to clear them, I'd be in favor of
> leaving them alone.

I understand. The SPEC doesn't "clearly" state to clear the other bits 
except ARI on a hot-plug event.

But, we came across issues when a device with 10-bit tags was removed 
and replaced with a device that didn't support 10-bit tags. The device 
became inaccessible and the port was not able to be recovered without a 
system reset. So, we thought it would be better to cherry pick all bits 
that were negotiated between endpoint and root port and decided that we 
should clear them all on removal. Hence, my first revision of this patch 
series had aimed to clear only ARI, AtomicOp Requestor and 10 bit tags 
as these were the negotiated settings between endpoint and root port. 
Ideally, these settings should be re-negotiated and set up for optimal 
operation on a hot add.

We had some internal discussions to understand if SPEC has it documented 
somewhere. And we could see in Section 2.2.6.2, it implies that:
[i] If a Requester sends a 10-Bit Tag Request to a Completer that lacks 
10-Bit Completer capability, the returned Completion(s) will have Tags 
with Tag[9:8] equal to 00b. Since the Requester is forbidden to generate 
these Tag values for 10-Bit Tags, such Completions will be handled as 
Unexpected Completions, which by default are Advisory Non-Fatal Errors. 
The Requester must follow standard PCI Express error handling requirements.
[ii] In configurations where a Requester with 10-Bit Tag Requester 
capability needs to target multiple Completers, one needs to ensure that 
the Requester sends 10-Bit Tag Requests only to Completers that have 
10-Bit Tag Completer capability.

Now, we might wonder, why clear them (especially 10-bit tags and 
AtomicOps) if Linux hasn't enabled them at all as the "10-Bit Tag 
Requester Enable" bit is not defined in Linux currently. But, these 
features might be enabled by Platform FW for "performance reasons" if 
the endpoint supports and now it is the responsibility of the operating 
system to disable it on a hot remove event.

According to implementation notes in 2.2.6.2: "For platforms where the 
RC supports 10-Bit Tag Completer capability, it is highly recommended 
for platform firmware or operating software that configures PCIe 
hierarchies to Set the 10-Bit Tag Requester Enable bit automatically in 
Endpoints with 10-Bit Tag Requester capability. This enables the 
important class of 10-Bit Tag capable adapters that send Memory Read 
Requests only to host memory." So if the endpoint and root port both 
support 10-bit tags BIOS is enabling it at boot time..

I ran a quick check to see how DEV_CTL2 registers for root port look on 
a 10-bit tag supported NVMe device.

pci 0000:80:05.1: DEVCTL2 0x1726 (Bit 12: 10-bit tag is enabled..)
pci 0000:80:05.1: DEVCAP2 0x7f19ff

So, it seems like BIOS has enabled 10-bit tags at boot time even though 
Linux hasn't enabled it.

Some couple of ways we think could be:
[1] Check if these bits are enabled by Platform at boot time, clear them 
only it is set during hotplug flow.
[2] Clear them unconditionally as I did..
[3] Enable 10-bits tags in Linux when a device is probed just like how 
we do for ARI..

Similarly call pci_enable_atomic_ops_to_root() during a hot add..

Let me know what you think..

> 
> As for clearing ARI Forwarding Enable, it seems commit b0cc6020e1cc
> ("PCI: Enable ARI if dev and upstream bridge support it; disable
> otherwise") already solved this problem.  Quoth its commit message:
> 
>     "Currently, we enable ARI in a device's upstream bridge if the bridge and
>      the device support it.  But we never disable ARI, even if the device is
>      removed and replaced with a device that doesn't support ARI.
>      
>      This means that if we hot-remove an ARI device and replace it with a
>      non-ARI multi-function device, we find only function 0 of the new device
>      because the upstream bridge still has ARI enabled, and next_ari_fn()
>      only returns function 0 for the new non-ARI device.
>      
>      This patch disables ARI in the upstream bridge if the device doesn't
>      support ARI.  See the PCIe spec, r3.0, sec 6.13."
> 
> My superficial understanding of that patch is that we do find function 0,
> while enumerating it we clear the ARI Forwarding Enable bit and thus the
> remaining functions become accessible and are subsequently enumerated.
> 
> Are you seeing issues when removing an ARI-capable endpoint from a
> hotplug slot and replacing it with a non-ARI-capable device?
> If you do, the question is why the above-quoted commit doesn't avoid them.

Yeah! Sorry I missed this. ARI is already checked and enabled during 
device initialization.

> 
> 
>> --- a/drivers/pci/hotplug/pciehp_pci.c
>> +++ b/drivers/pci/hotplug/pciehp_pci.c
>> @@ -104,6 +104,7 @@ void pciehp_unconfigure_device(struct controller *ctrl, bool presence)
>>   	list_for_each_entry_safe_reverse(dev, temp, &parent->devices,
>>   					 bus_list) {
>>   		pci_dev_get(dev);
>> +		pcie_capability_clear_word(dev, PCI_EXP_DEVCTL2, 0xffff);
>>   		pci_stop_and_remove_bus_device(dev);
>>   		/*
>>   		 * Ensure that no new Requests will be generated from
> 
> This clears the Device Control 2 register on the hotplugged device,
> but to clear ARI Forwarding Enable, you'd have to clear the register
> of the hotplug port, i.e. the *parent* of the hotplugged device.

I agree!

Thanks,
Smita
> 
> Also, this patch doesn't apply cleanly to v6.4-rc1 because of a context
> change by f5eff5591b8f ("PCI: pciehp: Fix AB-BA deadlock between
> reset_lock and device_lock").
> 
> Thanks,
> 
> Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ