Message-ID: <aWafiUwBI8togJvP@google.com>
Date: Tue, 13 Jan 2026 20:39:53 +0100
From: Dmytro Maluka <dmaluka@...omium.org>
To: Lu Baolu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
	Robin Murphy <robin.murphy@....com>,
	Kevin Tian <kevin.tian@...el.com>, Jason Gunthorpe <jgg@...dia.com>,
	Samiullah Khawaja <skhawaja@...gle.com>, iommu@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	"Vineeth Pillai (Google)" <vineeth@...byteword.org>,
	Aashish Sharma <aashish@...hishsharma.net>
Subject: Re: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement

On Tue, Jan 13, 2026 at 11:00:48AM +0800, Lu Baolu wrote:
> The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
> hardware may fetch this entry in multiple 128-bit chunks, updating the
> entire entry while it is active (P=1) risks a "torn" read where the
> hardware observes an inconsistent state.
> 
> However, certain updates (e.g., changing page table pointers while
> keeping the translation type and domain ID the same) can be performed
> hitlessly. This is possible if the update is limited to a single
> 128-bit chunk while the other chunks remain stable.
> 
> Introduce a hitless replacement mechanism for PASID entries:
> 
> - Update 'struct pasid_entry' with a union to support 128-bit
>   access via the newly added val128[4] array.
> - Add pasid_support_hitless_replace() to determine if a transition
>   between an old and new entry is safe to perform atomically.
>   - For First-level/Nested translations: The first 128 bits (chunk 0)
>     must remain identical; chunk 1 is updated atomically.
>   - For Second-level/Pass-through: The second 128 bits (chunk 1)
>     must remain identical; chunk 0 is updated atomically.
> - If hitless replacement is supported, use intel_iommu_atomic128_set()
>   to commit the change in a single 16-byte burst.
> - If the changes are too extensive to be hitless, fall back to the
>   safe "tear down and re-setup" flow (clear present -> flush -> setup).
> 
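
For readers following the thread, the chunk rule in the list above can be
modelled in userspace C roughly as follows. This is only a sketch under stated
assumptions, not the driver code: struct pasid_entry_model, enum pgtt_model,
chunk_equal() and hitless_chunk() are hypothetical names made up for
illustration. It mirrors only the comparison described above (one chunk must
stay identical, the other is rewritten) and does not model the atomic 128-bit
store or the flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the 512-bit PASID entry. */
struct pasid_entry_model {
	uint64_t val[8];	/* val[0..1] is chunk 0, val[2..3] is chunk 1, ... */
};

/* Translation types, named here for illustration only. */
enum pgtt_model {
	PGTT_FL_ONLY,
	PGTT_SL_ONLY,
	PGTT_NESTED,
	PGTT_PT,
};

/* Return true if 128-bit chunk 'idx' (0..3) is identical in both entries. */
bool chunk_equal(const struct pasid_entry_model *a,
		 const struct pasid_entry_model *b, int idx)
{
	assert(a && b && idx >= 0 && idx < 4);
	return a->val[2 * idx] == b->val[2 * idx] &&
	       a->val[2 * idx + 1] == b->val[2 * idx + 1];
}

/*
 * Return the index of the single chunk that would be rewritten, or -1 if
 * the chunk that must stay stable differs and the caller has to fall back
 * to the tear-down and re-setup flow.
 */
int hitless_chunk(const struct pasid_entry_model *old_entry,
		  const struct pasid_entry_model *new_entry,
		  enum pgtt_model type)
{
	switch (type) {
	case PGTT_FL_ONLY:
	case PGTT_NESTED:
		/* Chunk 0 must be unchanged; chunk 1 is the one to update. */
		return chunk_equal(old_entry, new_entry, 0) ? 1 : -1;
	case PGTT_SL_ONLY:
	case PGTT_PT:
		/* Chunk 1 must be unchanged; chunk 0 is the one to update. */
		return chunk_equal(old_entry, new_entry, 1) ? 0 : -1;
	}
	return -1;
}
```

With two entries that differ only in val[2], hitless_chunk() returns 1 for the
first-level and nested types and -1 for the second-level and pass-through
types, which corresponds to the fallback case in the last bullet.
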
> Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
> ---
>  drivers/iommu/intel/pasid.h | 26 ++++++++++++++++-
>  drivers/iommu/intel/pasid.c | 57 ++++++++++++++++++++++++++++++++++---
>  2 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index 35de1d77355f..b569e2828a8b 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -37,7 +37,10 @@ struct pasid_dir_entry {
>  };
>  
>  struct pasid_entry {
> -	u64 val[8];
> +	union {
> +		u64 val[8];
> +		u128 val128[4];
> +	};
>  };
>  
>  #define PASID_ENTRY_PGTT_FL_ONLY	(1)
> @@ -297,6 +300,27 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
>  	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
>  }
>  
> +static inline bool pasid_support_hitless_replace(struct pasid_entry *pte,
> +						 struct pasid_entry *new, int type)
> +{
> +	switch (type) {
> +	case PASID_ENTRY_PGTT_FL_ONLY:
> +	case PASID_ENTRY_PGTT_NESTED:
> +		/* The first 128 bits remain the same. */
> +		return READ_ONCE(pte->val[0]) == READ_ONCE(new->val[0]) &&
> +			READ_ONCE(pte->val[1]) == READ_ONCE(new->val[1]);
> +	case PASID_ENTRY_PGTT_SL_ONLY:
> +	case PASID_ENTRY_PGTT_PT:
> +		/* The second 128 bits remain the same. */
> +		return READ_ONCE(pte->val[2]) == READ_ONCE(new->val[2]) &&
> +			READ_ONCE(pte->val[3]) == READ_ONCE(new->val[3]);
> +	default:
> +		WARN_ON(true);

nit: WARN_ON(false) seems a bit more suitable?

> +	}
> +
> +	return false;
> +}
> +
>  extern unsigned int intel_pasid_max_id;
>  int intel_pasid_alloc_table(struct device *dev);
>  void intel_pasid_free_table(struct device *dev);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 4f36138448d8..da7ab18d3bfe 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -452,7 +452,20 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_FL_ONLY)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_first_level(iommu, dev, fsptptr,
> +						     pasid, did, flags);
> +	}
> +
> +	/*
> +	 * A first-only hitless replace requires the first 128 bits to remain
> +	 * the same. Only the second 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -563,7 +576,19 @@ int intel_pasid_replace_second_level(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_SL_ONLY)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
> +	}
> +
> +	/*
> +	 * A second-only hitless replace requires the second 128 bits to remain
> +	 * the same. Only the first 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -707,7 +732,19 @@ int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_PT)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_pass_through(iommu, dev, pasid);
> +	}
> +
> +	/*
> +	 * A passthrough hitless replace requires the second 128 bits to remain
> +	 * the same. Only the first 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -903,7 +940,19 @@ int intel_pasid_replace_nested(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_NESTED)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_nested(iommu, dev, pasid, domain);
> +	}
> +
> +	/*
> +	 * A nested hitless replace requires the first 128 bits to remain
> +	 * the same. Only the second 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> -- 
> 2.43.0
> 
