Message-ID: <CAAywjhSv56Y9rJLVdqV9N54c7S30ZUjyNh05xc-EW2+dS74GFQ@mail.gmail.com>
Date: Tue, 13 Jan 2026 11:27:30 -0800
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Lu Baolu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>, Kevin Tian <kevin.tian@...el.com>,
Jason Gunthorpe <jgg@...dia.com>, Dmytro Maluka <dmaluka@...omium.org>, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement
On Mon, Jan 12, 2026 at 7:03 PM Lu Baolu <baolu.lu@...ux.intel.com> wrote:
>
> The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
> hardware may fetch this entry in multiple 128-bit chunks, updating the
> entire entry while it is active (P=1) risks a "torn" read where the
> hardware observes an inconsistent state.
>
> However, certain updates (e.g., changing page table pointers while
> keeping the translation type and domain ID the same) can be performed
> hitlessly. This is possible if the update is limited to a single
> 128-bit chunk while the other chunks remain stable.
>
> Introduce a hitless replacement mechanism for PASID entries:
>
> - Update 'struct pasid_entry' with a union to support 128-bit
>   access via the newly added val128[4] array.
> - Add pasid_support_hitless_replace() to determine if a transition
>   between an old and new entry is safe to perform atomically.
> - For First-level/Nested translations: The first 128 bits (chunk 0)
>   must remain identical; chunk 1 is updated atomically.
Looking at the spec, the DID is part of the first 128 bits (chunk 0).
Since each domain will have a different DID, chunk 0 would change on a
first-level replace, so I guess the hitless path would never be taken
in that case?
> - For Second-level/Pass-through: The second 128 bits (chunk 1)
>   must remain identical; chunk 0 is updated atomically.
> - If hitless replacement is supported, use intel_iommu_atomic128_set()
>   to commit the change in a single 16-byte burst.
> - If the changes are too extensive to be hitless, fall back to the
>   safe "tear down and re-setup" flow (clear present -> flush -> setup).
>
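To make the DID question above concrete, here is a minimal userspace
sketch of the chunk rule from the list. It is not kernel code: every
model_* name and the MODEL_PGTT_* values are hypothetical, and the field
positions (DID in the low 16 bits of val[1], first-stage pointer in
val[2]) are assumptions about the entry layout rather than something
this patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 512-bit PASID entry as eight 64-bit words. */
struct model_pasid_entry {
	uint64_t val[8];
};

static_assert(sizeof(struct model_pasid_entry) == 64, "entry is 64 bytes");

/* Placeholder translation types for the model only. */
enum model_pgtt {
	MODEL_PGTT_FL_ONLY = 1,
	MODEL_PGTT_SL_ONLY,
	MODEL_PGTT_NESTED,
	MODEL_PGTT_PT,
};

/* Assumption: the domain ID sits in bits 15:0 of val[1], i.e. in chunk 0. */
void model_set_did(struct model_pasid_entry *pe, uint16_t did)
{
	pe->val[1] = (pe->val[1] & ~UINT64_C(0xffff)) | did;
}

/* Assumption: the first-stage table pointer sits in val[2], i.e. in chunk 1. */
void model_set_fsptr(struct model_pasid_entry *pe, uint64_t ptr)
{
	pe->val[2] = (pe->val[2] & UINT64_C(0xfff)) | (ptr & ~UINT64_C(0xfff));
}

/* Compare one 128-bit chunk (two consecutive 64-bit words). */
bool model_chunk_equal(const struct model_pasid_entry *a,
		       const struct model_pasid_entry *b, int chunk)
{
	return a->val[2 * chunk] == b->val[2 * chunk] &&
	       a->val[2 * chunk + 1] == b->val[2 * chunk + 1];
}

/*
 * Mirror of the rule in the commit message: first-level and nested
 * entries need chunk 0 unchanged, second-level and pass-through
 * entries need chunk 1 unchanged.
 */
bool model_hitless_ok(const struct model_pasid_entry *old,
		      const struct model_pasid_entry *next,
		      enum model_pgtt type)
{
	switch (type) {
	case MODEL_PGTT_FL_ONLY:
	case MODEL_PGTT_NESTED:
		return model_chunk_equal(old, next, 0);
	case MODEL_PGTT_SL_ONLY:
	case MODEL_PGTT_PT:
		return model_chunk_equal(old, next, 1);
	}

	return false;
}
```

Under those assumptions, a first-level replace that only changes the
table pointer passes the check, while one that also changes the DID
fails it and would take the tear-down-and-re-setup path.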
> Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
> ---
> drivers/iommu/intel/pasid.h | 26 ++++++++++++++++-
> drivers/iommu/intel/pasid.c | 57 ++++++++++++++++++++++++++++++++++---
> 2 files changed, 78 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index 35de1d77355f..b569e2828a8b 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -37,7 +37,10 @@ struct pasid_dir_entry {
> };
>
> struct pasid_entry {
> -	u64 val[8];
> +	union {
> +		u64 val[8];
> +		u128 val128[4];
> +	};
> };
>
> #define PASID_ENTRY_PGTT_FL_ONLY (1)
> @@ -297,6 +300,27 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
> pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
> }
>
> +static inline bool pasid_support_hitless_replace(struct pasid_entry *pte,
> +						 struct pasid_entry *new, int type)
> +{
> +	switch (type) {
> +	case PASID_ENTRY_PGTT_FL_ONLY:
> +	case PASID_ENTRY_PGTT_NESTED:
> +		/* The first 128 bits remain the same. */
> +		return READ_ONCE(pte->val[0]) == READ_ONCE(new->val[0]) &&
> +		       READ_ONCE(pte->val[1]) == READ_ONCE(new->val[1]);
> +	case PASID_ENTRY_PGTT_SL_ONLY:
> +	case PASID_ENTRY_PGTT_PT:
> +		/* The second 128 bits remain the same. */
> +		return READ_ONCE(pte->val[2]) == READ_ONCE(new->val[2]) &&
> +		       READ_ONCE(pte->val[3]) == READ_ONCE(new->val[3]);
> +	default:
> +		WARN_ON(true);
> +	}
> +
> +	return false;
> +}
> +
> extern unsigned int intel_pasid_max_id;
> int intel_pasid_alloc_table(struct device *dev);
> void intel_pasid_free_table(struct device *dev);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 4f36138448d8..da7ab18d3bfe 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -452,7 +452,20 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
>
> 	WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_FL_ONLY)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_first_level(iommu, dev, fsptptr,
> +						     pasid, did, flags);
> +	}
> +
> +	/*
> +	 * A first-only hitless replace requires the first 128 bits to remain
> +	 * the same. Only the second 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
> 	spin_unlock(&iommu->lock);
>
> 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -563,7 +576,19 @@ int intel_pasid_replace_second_level(struct intel_iommu *iommu,
>
> 	WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_SL_ONLY)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
> +	}
> +
> +	/*
> +	 * A second-only hitless replace requires the second 128 bits to remain
> +	 * the same. Only the first 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
> 	spin_unlock(&iommu->lock);
>
> 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -707,7 +732,19 @@ int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
>
> 	WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_PT)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_pass_through(iommu, dev, pasid);
> +	}
> +
> +	/*
> +	 * A passthrough hitless replace requires the second 128 bits to remain
> +	 * the same. Only the first 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
> 	spin_unlock(&iommu->lock);
>
> 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -903,7 +940,19 @@ int intel_pasid_replace_nested(struct intel_iommu *iommu,
>
> 	WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_NESTED)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_nested(iommu, dev, pasid, domain);
> +	}
> +
> +	/*
> +	 * A nested hitless replace requires the first 128 bits to remain
> +	 * the same. Only the second 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
> 	spin_unlock(&iommu->lock);
>
> 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> --
> 2.43.0
>