Message-ID: <20260113030052.977366-4-baolu.lu@linux.intel.com>
Date: Tue, 13 Jan 2026 11:00:48 +0800
From: Lu Baolu <baolu.lu@...ux.intel.com>
To: Joerg Roedel <joro@...tes.org>,
	Will Deacon <will@...nel.org>,
	Robin Murphy <robin.murphy@....com>,
	Kevin Tian <kevin.tian@...el.com>,
	Jason Gunthorpe <jgg@...dia.com>
Cc: Dmytro Maluka <dmaluka@...omium.org>,
	Samiullah Khawaja <skhawaja@...gle.com>,
	iommu@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	Lu Baolu <baolu.lu@...ux.intel.com>
Subject: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement

The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
hardware may fetch this entry in multiple 128-bit chunks, updating the
entire entry while it is active (P=1) risks a "torn" read where the
hardware observes an inconsistent state.

However, certain updates (e.g., changing page table pointers while
keeping the translation type and domain ID the same) can be performed
hitlessly. This is possible if the update is limited to a single
128-bit chunk while the other chunks remain stable.
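
To illustrate the two paragraphs above, here is a minimal userspace
sketch, not driver code. It assumes a simplified model in which the
writer replaces the entry one 128-bit chunk at a time, in order, and the
reader may sample the entry between any two chunk writes. The names
entry_model, snapshot() and can_tear() are hypothetical and do not exist
in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNKS		4	/* 4 x 128 bits = 512 bits */
#define WORDS_PER_CHUNK	2	/* 2 x 64 bits = 128 bits */

/* Hypothetical stand-in for the 64-byte PASID table entry. */
struct entry_model {
	uint64_t val[CHUNKS * WORDS_PER_CHUNK];
};

static bool entry_equal(const struct entry_model *a,
			const struct entry_model *b)
{
	return memcmp(a->val, b->val, sizeof(a->val)) == 0;
}

/*
 * What a reader would observe if only the first 'done' chunks of
 * 'from' had been overwritten with the chunks of 'to' at read time.
 */
static struct entry_model snapshot(const struct entry_model *from,
				   const struct entry_model *to, int done)
{
	struct entry_model s = *from;

	memcpy(s.val, to->val,
	       (size_t)done * WORDS_PER_CHUNK * sizeof(uint64_t));
	return s;
}

/* True if some intermediate state matches neither the old nor the new entry. */
static bool can_tear(const struct entry_model *from,
		     const struct entry_model *to)
{
	for (int done = 1; done < CHUNKS; done++) {
		struct entry_model s = snapshot(from, to, done);

		if (!entry_equal(&s, from) && !entry_equal(&s, to))
			return true;
	}
	return false;
}
```

In this model, can_tear() returns false when the two entries differ in
only one chunk and true once two or more chunks differ, which is the
distinction the hitless path relies on.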

Introduce a hitless replacement mechanism for PASID entries:

- Update 'struct pasid_entry' with a union to support 128-bit
  access via the newly added val128[4] array.
- Add pasid_support_hitless_replace() to determine if a transition
  between an old and new entry is safe to perform atomically.
  - For First-level/Nested translations: The first 128 bits (chunk 0)
    must remain identical; chunk 1 is updated atomically.
  - For Second-level/Pass-through: The second 128 bits (chunk 1)
    must remain identical; chunk 0 is updated atomically.
- If hitless replacement is supported, use intel_iommu_atomic128_set()
  to commit the change in a single 16-byte burst.
- If the changes are too extensive to be hitless, fall back to the
  safe "tear down and re-setup" flow (clear present -> flush -> setup).
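
The decision flow listed above can be sketched in userspace as follows.
This is an illustrative model under stated assumptions, not the driver
implementation: the enum names, entry_model, stable_chunk() and
replace_entry() are hypothetical, the memcpy() of one chunk merely
stands in for the single 128-bit atomic store, and the fallback branch
stands in for the clear present -> flush -> setup sequence without
modelling any flushes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the translation types and the outcome. */
enum pgtt_model { FL_ONLY, SL_ONLY, NESTED, PT };
enum replace_path { PATH_HITLESS, PATH_TEARDOWN };

struct entry_model {
	uint64_t val[8];	/* 512 bits, viewed as four 128-bit chunks */
};

static bool chunk_equal(const struct entry_model *a,
			const struct entry_model *b, int chunk)
{
	return memcmp(&a->val[2 * chunk], &b->val[2 * chunk], 16) == 0;
}

/* Chunk that must stay identical: 0 for FL/nested, 1 for SL/pass-through. */
static int stable_chunk(enum pgtt_model type)
{
	return (type == FL_ONLY || type == NESTED) ? 0 : 1;
}

static enum replace_path replace_entry(struct entry_model *cur,
				       const struct entry_model *next,
				       enum pgtt_model type)
{
	int keep = stable_chunk(type);
	int update = 1 - keep;

	if (!chunk_equal(cur, next, keep)) {
		/* Model of the fallback: clear the entry, then set it up anew. */
		memset(cur, 0, sizeof(*cur));
		*cur = *next;
		return PATH_TEARDOWN;
	}

	/* Model of the hitless path: only one 16-byte chunk is written. */
	memcpy(&cur->val[2 * update], &next->val[2 * update], 16);
	return PATH_HITLESS;
}
```

As in the patch, the hitless branch of this sketch compares only the
chunk that must stay stable and writes only the other one of the first
two chunks.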

Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
---
 drivers/iommu/intel/pasid.h | 26 ++++++++++++++++-
 drivers/iommu/intel/pasid.c | 57 ++++++++++++++++++++++++++++++++++---
 2 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 35de1d77355f..b569e2828a8b 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -37,7 +37,10 @@ struct pasid_dir_entry {
 };
 
 struct pasid_entry {
-	u64 val[8];
+	union {
+		u64 val[8];
+		u128 val128[4];
+	};
 };
 
 #define PASID_ENTRY_PGTT_FL_ONLY	(1)
@@ -297,6 +300,27 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
 }
 
+static inline bool pasid_support_hitless_replace(struct pasid_entry *pte,
+						 struct pasid_entry *new, int type)
+{
+	switch (type) {
+	case PASID_ENTRY_PGTT_FL_ONLY:
+	case PASID_ENTRY_PGTT_NESTED:
+		/* The first 128 bits remain the same. */
+		return READ_ONCE(pte->val[0]) == READ_ONCE(new->val[0]) &&
+			READ_ONCE(pte->val[1]) == READ_ONCE(new->val[1]);
+	case PASID_ENTRY_PGTT_SL_ONLY:
+	case PASID_ENTRY_PGTT_PT:
+		/* The second 128 bits remain the same. */
+		return READ_ONCE(pte->val[2]) == READ_ONCE(new->val[2]) &&
+			READ_ONCE(pte->val[3]) == READ_ONCE(new->val[3]);
+	default:
+		WARN_ON(true);
+	}
+
+	return false;
+}
+
 extern unsigned int intel_pasid_max_id;
 int intel_pasid_alloc_table(struct device *dev);
 void intel_pasid_free_table(struct device *dev);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 4f36138448d8..da7ab18d3bfe 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -452,7 +452,20 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_FL_ONLY)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_first_level(iommu, dev, fsptptr,
+						     pasid, did, flags);
+	}
+
+	/*
+	 * A first-level hitless replace requires the first 128 bits to remain
+	 * the same. Only the second 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
@@ -563,7 +576,19 @@ int intel_pasid_replace_second_level(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_SL_ONLY)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
+	}
+
+	/*
+	 * A second-level hitless replace requires the second 128 bits to remain
+	 * the same. Only the first 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
@@ -707,7 +732,19 @@ int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_PT)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_pass_through(iommu, dev, pasid);
+	}
+
+	/*
+	 * A passthrough hitless replace requires the second 128 bits to remain
+	 * the same. Only the first 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
@@ -903,7 +940,19 @@ int intel_pasid_replace_nested(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_NESTED)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_nested(iommu, dev, pasid, domain);
+	}
+
+	/*
+	 * A nested hitless replace requires the first 128 bits to remain
+	 * the same. Only the second 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
-- 
2.43.0

