lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b077a691-5790-40a0-8539-0f5294d0fc28@linux.intel.com>
Date: Wed, 10 Jan 2024 16:37:26 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, kevin.tian@...el.com,
 bhelgaas@...gle.com, dwmw2@...radead.org, will@...nel.org,
 robin.murphy@....com, lukas@...ner.de
Cc: linux-pci@...r.kernel.org, iommu@...ts.linux.dev,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation
 request when device is disconnected


On 1/10/2024 1:24 PM, Baolu Lu wrote:
> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>> Except those aggressive hotplug cases - surprise remove a hotplug device
>> while its safe removal is requested and handled in process by:
>>
>> 1. pull it out directly.
>> 2. turn off its power.
>> 3. bring the link down.
>> 4. just died there that moment.
>>
>> etc, in a word, 'gone' or 'disconnected'.
>>
>> Mostly are regular normal safe removal and surprise removal unplug.
>> these hot unplug handling process could be optimized for fix the ATS
>> Invalidation hang issue by calling pci_dev_is_disconnected() in function
>> devtlb_invalidation_with_pasid() to check target device state to avoid
>> sending meaningless ATS Invalidation request to iommu when device is 
>> gone.
>> (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>
>> For safe removal, device wouldn't be removed untill the whole software
>> handling process is done, it wouldn't trigger the hard lock up issue
>> caused by too long ATS Invalidation timeout wait. In safe removal path,
>> device state isn't set to pci_channel_io_perm_failure in
>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will 
>> return
>> false there, wouldn't break the function.
>>
>> For surprise removal, device state is set to 
>> pci_channel_io_perm_failure in
>> pciehp_unconfigure_device(), means device is already gone (disconnected)
>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
>> return true to break the function not to send ATS Invalidation 
>> request to
>> the disconnected device blindly, thus avoid the further long time 
>> waiting
>> triggers the hard lockup.
>>
>> safe removal & surprise removal
>>
>> pciehp_ist()
>>     pciehp_handle_presence_or_link_change()
>>       pciehp_disable_slot()
>>         remove_board()
>>           pciehp_unconfigure_device(presence)
>>
>> Tested-by: Haorong Ye <yehaorong@...edance.com>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@...ux.intel.com>
>> ---
>>   drivers/iommu/intel/pasid.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>> index 715943531091..3d5ed27f39ef 100644
>> --- a/drivers/iommu/intel/pasid.c
>> +++ b/drivers/iommu/intel/pasid.c
>> @@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>> *iommu,
>>       if (!info || !info->ats_enabled)
>>           return;
>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>> +        return;
>
> Why do you need the above after changes in PATCH 2/5? It's unnecessary
> and not complete. We have other places where device TLB invalidation is
> issued, right?

This one could be regarded as optimization, no need to trapped into rabbit

hole if we could predict the result. because the bad thing is we don't know

what response to us in the rabbit hole from third party switch (bridges will

feedback timeout to requester as PCIe spec mentioned if the endpoint is

gone).


Thanks,

Ethan

>
>>       /*
>>        * When PASID 0 is used, it indicates RID2PASID(DMA request w/o 
>> PASID),
>>        * devTLB flush w/o PASID should be used. For non-zero PASID under
>
> Best regards,
> baolu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ