[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8f4d543a-0b6d-65ca-59ef-70370fbfcb12@sangfor.com.cn>
Date: Wed, 3 May 2023 21:31:42 +0800
From: Ding Hui <dinghui@...gfor.com.cn>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: dinghui@...gfor.com.cn, bhelgaas@...gle.com,
sathyanarayanan.kuppuswamy@...ux.intel.com, vidyas@...dia.com,
david.e.box@...ux.intel.com, kai.heng.feng@...onical.com,
michael.a.bottini@...ux.intel.com, rajatja@...gle.com,
qinzongquan@...gfor.com.cn, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/ASPM: fix UAF by removing cached downstream
On 2023/5/2 12:52 上午, Bjorn Helgaas wrote:
> On Mon, May 01, 2023 at 11:50:50AM +0800, Ding Hui wrote:
>> On 2023/5/1 10:10, Bjorn Helgaas wrote:
>>> On Sat, Apr 29, 2023 at 09:26:04PM +0800, Ding Hui wrote:
>>>> If the function 0 of a multifunction device is removed, an freed
>>>> downstream pointer will be left in struct pcie_link_state, and then
>>>> when pcie_config_aspm_link() be invoked from any path, we will get a
>>>> KASAN use-after-free report.
>>>
>>> Thanks for finding this problem, debugging it, and the patch!
>>>
>>> In this case we're doing a "software remove" and the other functions
>>> are still present, right? It's kind of annoying that there's only one
>>> link, but all the functions of a multifunction device have a Link
>>> Control register, and the spec "recommends" that software program the
>>> same ASPM control value for all the functions.
>>
>> Yes, that is the case.
>>
>>> The hardware of course doesn't know anything about this software
>>> remove; all the functions are still physically present and powered up.
>>>
>>> That makes me think that if software ignores the "removed" function
>>> and continues to operate ASPM on the N-1 remaining functions, we're
>>> outside the spec recommendations because the ASPM configuration is no
>>> longer the same across all the functions.
>>>
>>> So my inclination would be disable ASPM completely when any function
>>> of a multi-function device is removed. What are your thoughts on
>>> this?
>>
>> Agree with you.
>>
>> Previously, I thought another fix that was if the function 0 is removed,
>> we can free the link state to disable ASPM for this link.
>>
>> Now following you suggestion, it can be expanded to any child function.
>>
>> How about fixing like this?
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 66d7514ca111..657e0647d19f 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1011,12 +1011,11 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
>> down_read(&pci_bus_sem);
>> mutex_lock(&aspm_lock);
>> /*
>> - * All PCIe functions are in one slot, remove one function will remove
>> - * the whole slot, so just wait until we are the last function left.
>> + * All PCIe functions are in one slot.
>> + * The spec "recommends" that software program set the same ASPM control
>> + * value for all the functions.
>> + * Disable ASPM when any child function is removed.
>
> Since we're updating the comment anyway, let's clean up the "slot"
> language here. The PCIe spec doesn't use "slot" in the context of the
> bus/device/function PCIe topology; it only uses it when referring to a
> physical connector where a card might be installed. What we want here
> is "Device," and then we have to consider whether ARI makes any
> difference here.
>
> The spec says (referring to ASPM Control):
>
> For Multi-Function Devices (including ARI Devices), it is
> recommended that software program the same value for this field in
> all Functions. For non-ARI Multi-Function Devices, only capabilities
> enabled in all Functions are enabled for the component as a whole.
>
> For ARI Devices, ASPM Control is determined solely by the setting in
> Function 0, regardless of Function 0’s D-state. The settings in the
> other Functions always return whatever value software programmed for
> each, but otherwise are ignored by the component.
>
> A spec reference, e.g., "PCIe r6.0, sec 7.5.3.7", would be good here.
>
> Anyway, I think the idea of "software removing" a single function is
> kind of a niche situation that we don't need to worry about
> optimizing, and I think turning off ASPM completely will avoid a lot
> of weird corner cases.
>
Thanks for details.
I'll redo the patch with different title.
>> */
>> - if (!list_empty(&parent->subordinate->devices))
>> - goto out;
>> -
>> link = parent->link_state;
>> root = link->root;
>> parent_link = link->parent;
>>
>>
>> --
>> Thanks,
>> -dinghui
>>
>
--
Thanks,
-dinghui
Powered by blists - more mailing lists