Date: Mon, 1 Jul 2024 06:39:18 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
	"Will Deacon" <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
	"Jason Gunthorpe" <jgg@...pe.ca>
CC: "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 1/2] iommu/vt-d: Add helper to flush caches for context
 change

> From: Baolu Lu <baolu.lu@...ux.intel.com>
> Sent: Friday, June 28, 2024 7:25 PM
> 
> On 2024/6/28 15:43, Tian, Kevin wrote:
> >> From: Baolu Lu <baolu.lu@...ux.intel.com>
> >> Sent: Thursday, June 27, 2024 4:22 PM
> >>
> >> On 2024/6/27 14:08, Tian, Kevin wrote:
> >>>> From: Lu Baolu <baolu.lu@...ux.intel.com>
> >>>> Sent: Thursday, June 27, 2024 10:31 AM
> >>>>
> >>>> + */
> >>>> +void intel_context_flush_present(struct device_domain_info *info,
> >>>> +				 struct context_entry *context,
> >>>> +				 bool flush_domains)
> >>>> +{
> >>>> +	struct intel_iommu *iommu = info->iommu;
> >>>> +	u16 did = context_domain_id(context);
> >>>> +	struct pasid_entry *pte;
> >>>> +	int i;
> >>>> +
> >>>> +	assert_spin_locked(&iommu->lock);
> >>>> +
> >>>> +	/*
> >>>> +	 * Device-selective context-cache invalidation. The Domain-ID field
> >>>> +	 * of the Context-cache Invalidate Descriptor is ignored by hardware
> >>>> +	 * when operating in scalable mode. Therefore the @did value doesn't
> >>>> +	 * matter in scalable mode.
> >>>> +	 */
> >>>> +	iommu->flush.flush_context(iommu, did, PCI_DEVID(info->bus, info->devfn),
> >>>> +				   DMA_CCMD_MASK_NOBIT, DMA_CCMD_DEVICE_INVL);
> >>>> +
> >>>> +	/*
> >>>> +	 * For legacy mode:
> >>>> +	 * - Domain-selective IOTLB invalidation
> >>>> +	 * - Global Device-TLB invalidation to all affected functions
> >>>> +	 */
> >>>> +	if (!sm_supported(iommu)) {
> >>>> +		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> >>>> +		__context_flush_dev_iotlb(info);
> >>>> +
> >>>> +		return;
> >>>> +	}
> >>>> +
> >>>> +	/*
> >>>> +	 * For scalable mode:
> >>>> +	 * - Domain-selective PASID-cache invalidation to affected domains
> >>>> +	 * - Domain-selective IOTLB invalidation to affected domains
> >>>> +	 * - Global Device-TLB invalidation to affected functions
> >>>> +	 */
> >>>> +	if (flush_domains) {
> >>>> +		for (i = 0; i < info->pasid_table->max_pasid; i++) {
> >>>> +			pte = intel_pasid_get_entry(info->dev, i);
> >>>> +			if (!pte || !pasid_pte_is_present(pte))
> >>>> +				continue;
> >>>> +
> >>>> +			did = pasid_get_domain_id(pte);
> >>>> +			qi_flush_pasid_cache(iommu, did, QI_PC_ALL_PASIDS, 0);
> >>>> +			iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
> >>>> +		}
> >>>> +	}
> >>>> +
> >>>> +	__context_flush_dev_iotlb(info);
> >>>> +}
> >>>> --
> >>>> 2.34.1
> >>>>
> >>>
> >>> This change moves the entire cache invalidation flow inside
> >>> iommu->lock. Though the directly-affected operations are not on a
> >>> critical path, the indirect impact applies to all other paths that
> >>> acquire iommu->lock, given that the lock will now be held
> >>> unnecessarily long after this change.
> >>>
> >>> If the only reason for holding iommu->lock is to walk the pasid
> >>> table, we could instead collect a bitmap of DIDs during the locked
> >>> walk and then invalidate each of them in a loop outside of
> >>> iommu->lock. At this point we only care about DIDs associated with
> >>> the old context entry. A new pasid attach will hit the new context
> >>> entry, and a concurrent pasid detach may at worst cause duplicated
> >>> invalidations.
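
(For concreteness, below is a minimal sketch of the two-phase flow
suggested above. The helper name, the bitmap sizing and the GFP flag
are illustrative only; it reuses the existing pasid-table accessors
and invalidation calls from the patch:)

static void context_flush_pasid_domains(struct device_domain_info *info)
{
	struct intel_iommu *iommu = info->iommu;
	struct pasid_entry *pte;
	unsigned long *dids;
	unsigned int bit;
	int i;

	/* One bit per domain-id supported by this IOMMU unit. */
	dids = bitmap_zalloc(cap_ndoms(iommu->cap), GFP_ATOMIC);
	if (!dids)
		return;

	/* Phase 1: collect the DIDs of present pasid entries under the lock. */
	spin_lock(&iommu->lock);
	for (i = 0; i < info->pasid_table->max_pasid; i++) {
		pte = intel_pasid_get_entry(info->dev, i);
		if (!pte || !pasid_pte_is_present(pte))
			continue;
		set_bit(pasid_get_domain_id(pte), dids);
	}
	spin_unlock(&iommu->lock);

	/*
	 * Phase 2: invalidate outside the lock. A concurrent pasid detach
	 * at worst duplicates an invalidation, which is harmless.
	 */
	for_each_set_bit(bit, dids, cap_ndoms(iommu->cap)) {
		qi_flush_pasid_cache(iommu, bit, QI_PC_ALL_PASIDS, 0);
		iommu->flush.flush_iotlb(iommu, bit, 0, 0, DMA_TLB_DSI_FLUSH);
	}

	bitmap_free(dids);
}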
> >>
> >> The iommu->lock is not only for the PASID table walk. The basic
> >> scheme here is that once a present context table entry is changed,
> >> no PASID entry may be altered until all of the related caches have
> >> been flushed, because the configuration of the context entry might
> >> also affect PASID translation.
> >
> > Is that what the spec explicitly requires? My impression was that we
> > need to invalidate any cache that may be associated with the old
> > context entry, which is not the same as prohibiting PASID entries
> > from being changed at the same time (as long as those changes cannot
> > leave a stale cache entry active).
> 
> No, I didn't find the VT-d spec explicitly requiring this. But
> conceptually, the context entry controls features of the whole device,
> while pasid entries control the translation of individual pasids. A
> change to the context entry therefore potentially impacts pasid
> translation ...
> 
> >
> > e.g. let's say this helper collects valid pasid entries (2, 3, 4) and
> > their associated DIDs (x, y, z) in a locked walk of the pasid table,
> > and then follows the spec's sequence to invalidate the pasid cache
> > for (2, 3, 4) and the iotlb for (x, y, z) outside of the lock.
> >
> > there are several concurrency scenarios if iommu->lock only guards
> > the pasid table walk:
> >
> > 1) pasid entry #1 is torn down before the locked walk. The
> > teardown path will invalidate its pasid cache and iotlb properly.
> > 2) pasid entry #4 is torn down after the locked walk. Then we may
> > see duplicated invalidations, both in this helper and in the
> > teardown path.
> > 3) a new pasid attach before the locked walk may be associated with
> > either the old or the new context entry, depending on whether it
> > happens before or after the context cache invalidation. In either
> > case it will be captured by the locked walk and have its related
> > caches invalidated in the helper.
> > 4) a new pasid attach after the locked walk is always safe, as its
> > related caches will only be associated with the new context entry.
> 
> ... hence, from the driver's point of view, let's start with
> simplicity unless there's a real use case.
> 
> >>
> >> Previously, we did not apply this lock because all those cases involved
> >> changing the context entry from present to non-present, and we were
> >> certain that all PASID entries were empty. Now, as we are making it a
> >> generic helper that also serves scenarios where the entry remains
> >> present after the change, we need this lock to ensure that no PASID
> >> entry changes occur at the same time.
> >>
> >
> > Even if we want conservative locking, I'd prefer not to apply it to
> > existing paths that clearly have no need for the extended lock.
> >
> > Then a specific helper for the PRI flip probably makes more sense
> > than generalizing the code and incurring unnecessary lock overhead
> > on the existing paths...
> 
> Yes, agreed. We don't need to extend this locking change to the
> existing cases where it's unnecessary.
> 
> How about below additional changes?

It's better.

> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index c5d9c2283977..523407f6f6b2 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1954,8 +1954,8 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
> 
>   	context_clear_entry(context);
>   	__iommu_flush_cache(iommu, context, sizeof(*context));
> -	intel_context_flush_present(info, context, true);
>   	spin_unlock(&iommu->lock);
> +	intel_context_flush_present(info, context, true);
>   }
> 
>   static int domain_setup_first_level(struct intel_iommu *iommu,
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 01156135e60f..9a7b5668c723 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -690,8 +690,8 @@ static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
> 
>   	context_clear_entry(context);
>   	__iommu_flush_cache(iommu, context, sizeof(*context));
> +	spin_unlock(&iommu->lock);
>   	intel_context_flush_present(info, context, false);
> -	spin_unlock(&iommu->lock);
>   }
> 
>   static int pci_pasid_table_teardown(struct pci_dev *pdev, u16 alias, void *data)
> @@ -889,8 +889,6 @@ void intel_context_flush_present(struct device_domain_info *info,
>   	struct pasid_entry *pte;
>   	int i;
> 
> -	assert_spin_locked(&iommu->lock);
> -
>   	/*
>   	 * Device-selective context-cache invalidation. The Domain-ID field
>   	 * of the Context-cache Invalidate Descriptor is ignored by hardware
> @@ -919,6 +917,13 @@ void intel_context_flush_present(struct device_domain_info *info,
>   	 * - Global Device-TLB invalidation to affected functions
>   	 */
>   	if (flush_domains) {
> +		/*
> +		 * If the IOMMU is running in scalable mode and there might
> +		 * be potential PASID translations, the caller should hold
> +		 * the lock to ensure that context changes and cache flushes
> +		 * are atomic.
> +		 */
> +		assert_spin_locked(&iommu->lock);
>   		for (i = 0; i < info->pasid_table->max_pasid; i++) {
>   			pte = intel_pasid_get_entry(info->dev, i);
>   			if (!pte || !pasid_pte_is_present(pte))
> 
> Best regards,
> baolu
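
For reference, here is how device_pasid_table_teardown() would read
with the hunks above applied (a sketch assembled from the quoted diff
context, not a verbatim copy of the file). With flush_domains == false
the helper never walks the pasid table, so dropping the lock before
calling it is fine on this path:

static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
{
	struct device_domain_info *info = dev_iommu_priv_get(dev);
	struct intel_iommu *iommu = info->iommu;
	struct context_entry *context;

	spin_lock(&iommu->lock);
	context = iommu_context_addr(iommu, bus, devfn, 0);
	if (!context) {
		spin_unlock(&iommu->lock);
		return;
	}

	context_clear_entry(context);
	__iommu_flush_cache(iommu, context, sizeof(*context));
	spin_unlock(&iommu->lock);

	/* Runs unlocked now: no pasid-table walk when flush_domains is false. */
	intel_context_flush_present(info, context, false);
}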
