lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 25 Dec 2023 09:16:00 +0800
From: Ethan Zhao <haifeng.zhao@...ux.intel.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: bhelgaas@...gle.com, baolu.lu@...ux.intel.com, dwmw2@...radead.org,
 will@...nel.org, robin.murphy@....com, linux-pci@...r.kernel.org,
 iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v6 4/4] iommu/vt-d: break out devTLB invalidation if
 target device is gone


On 12/24/2023 6:47 PM, Lukas Wunner wrote:
> On Sun, Dec 24, 2023 at 12:06:57AM -0500, Ethan Zhao wrote:
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -1423,6 +1423,13 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>>   	writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
>>   
>>   	while (qi->desc_status[wait_index] != QI_DONE) {
>> +		/*
>> +		 * if the devTLB invalidation target device is gone, don't wait
>> +		 * anymore, it might take up to 1min+50%, causes system hang.
>> +		 */
>> +		if (type == QI_DIOTLB_TYPE && iommu->flush_target_dev)
>> +			if (!pci_device_is_present(to_pci_dev(iommu->flush_target_dev)))
>> +				break;
> As a general approach, this is much better now.
>
> Please combine the nested if-clauses into one.
That would be harder to read ?
> Please amend the code comment with a spec reference, i.e.
> "(see Implementation Note in PCIe r6.1 sec 10.3.1)"
> so that readers of the code know where the magic number "1min+50%"
> is coming from.
Yup.
>
> Is flush_target_dev guaranteed to always be a pci_dev?

yes, as Baolu said, only PCI and ATS capable device supports

devTLB invalidation operation, this is checked by its caller path.

>
> I'll let iommu maintainers comment on whether storing a flush_target_dev
> pointer is the right approach.  (May store a back pointer from
> struct intel_iommu to struct device_domain_info?)

One of them,  wonder which one is better, but device_domain_info

is still per device...seems no good to back it there.

>
> Maybe move the "to_pci_dev(iommu->flush_target_dev)" lookup outside the
> loop to avoid doing this over and over again?

hmm. that is a macro renam of container_of(), exactly, doesn't matter.

right ?

>
> I think we still have a problem here if the device is not removed
> but simply takes a long time to respond to Invalidate Requests
> (as it is permitted to do per the Implementation Note).  We'll
> busy-wait for the completion and potentially run into the watchdog's
> time limit again.  So I think you or someone else in your org should
> add OKRs to refactor the code so that it sleeps in-between polling

refactor code would be long story, so far still a quick fix for the issue.

and I think developers have other justifiction or conern about the

non-sync version, once again, thanks for your comment.


regards,

Ethan

> for Invalidate Completions (instead of busy-waiting with interrupts
> disabled).
>
> Thanks,
>
> Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ