lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 24 Dec 2023 11:47:09 +0100
From: Lukas Wunner <lukas@...ner.de>
To: Ethan Zhao <haifeng.zhao@...ux.intel.com>
Cc: bhelgaas@...gle.com, baolu.lu@...ux.intel.com, dwmw2@...radead.org,
	will@...nel.org, robin.murphy@....com, linux-pci@...r.kernel.org,
	iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v6 4/4] iommu/vt-d: break out devTLB invalidation if
 target device is gone

On Sun, Dec 24, 2023 at 12:06:57AM -0500, Ethan Zhao wrote:
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1423,6 +1423,13 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>  	writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
>  
>  	while (qi->desc_status[wait_index] != QI_DONE) {
> +		/*
> +		 * if the devTLB invalidation target device is gone, don't wait
> +		 * anymore, it might take up to 1min+50%, causes system hang.
> +		 */
> +		if (type == QI_DIOTLB_TYPE && iommu->flush_target_dev)
> +			if (!pci_device_is_present(to_pci_dev(iommu->flush_target_dev)))
> +				break;

As a general approach, this is much better now.

Please combine the nested if-clauses into one.

Please amend the code comment with a spec reference, i.e.
"(see Implementation Note in PCIe r6.1 sec 10.3.1)"
so that readers of the code know where the magic number "1min+50%"
is coming from.

Is flush_target_dev guaranteed to always be a pci_dev?

I'll let iommu maintainers comment on whether storing a flush_target_dev
pointer is the right approach.  (May store a back pointer from
struct intel_iommu to struct device_domain_info?)

Maybe move the "to_pci_dev(iommu->flush_target_dev)" lookup outside the
loop to avoid doing this over and over again?

I think we still have a problem here if the device is not removed
but simply takes a long time to respond to Invalidate Requests
(as it is permitted to do per the Implementation Note).  We'll
busy-wait for the completion and potentially run into the watchdog's
time limit again.  So I think you or someone else in your org should
add OKRs to refactor the code so that it sleeps in-between polling
for Invalidate Completions (instead of busy-waiting with interrupts
disabled).

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ