Message-ID: <CAAywjhSv56Y9rJLVdqV9N54c7S30ZUjyNh05xc-EW2+dS74GFQ@mail.gmail.com>
Date: Tue, 13 Jan 2026 11:27:30 -0800
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Lu Baolu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>, 
	Robin Murphy <robin.murphy@....com>, Kevin Tian <kevin.tian@...el.com>, 
	Jason Gunthorpe <jgg@...dia.com>, Dmytro Maluka <dmaluka@...omium.org>, iommu@...ts.linux.dev, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement

On Mon, Jan 12, 2026 at 7:03 PM Lu Baolu <baolu.lu@...ux.intel.com> wrote:
>
> The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
> hardware may fetch this entry in multiple 128-bit chunks, updating the
> entire entry while it is active (P=1) risks a "torn" read where the
> hardware observes an inconsistent state.
>
> However, certain updates (e.g., changing page table pointers while
> keeping the translation type and domain ID the same) can be performed
> hitlessly. This is possible if the update is limited to a single
> 128-bit chunk while the other chunks remain stable.
>
> Introduce a hitless replacement mechanism for PASID entries:
>
> - Update 'struct pasid_entry' with a union to support 128-bit
>   access via the newly added val128[4] array.
> - Add pasid_support_hitless_replace() to determine if a transition
>   between an old and new entry is safe to perform atomically.
>   - For First-level/Nested translations: The first 128 bits (chunk 0)
>     must remain identical; chunk 1 is updated atomically.

Looking at the spec, the DID is part of the first 128 bits (chunk 0).
So I guess for first-level translation the hitless replacement would
not be supported when switching domains, since each domain will have a
different DID?
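
To make the concern concrete, here is a minimal user-space sketch (not
kernel code). It assumes the DID occupies bits 79:64 of the entry, i.e.
the low 16 bits of val[1], which is my reading of the spec and of
pasid_set_domain_id(). The struct, macro and helper names below are
made up for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct pasid_entry: 8 x 64-bit words. */
struct pasid_entry_sketch {
	uint64_t val[8];
};

/* Assumption: DID lives in the low 16 bits of val[1] (entry bits 79:64). */
#define SKETCH_DID_MASK 0xffffULL

static inline void sketch_set_did(struct pasid_entry_sketch *pe, uint16_t did)
{
	pe->val[1] = (pe->val[1] & ~SKETCH_DID_MASK) | did;
}

/*
 * Compare one 128-bit chunk (two consecutive 64-bit words) of two entries.
 * chunk 0 covers val[0..1], chunk 1 covers val[2..3].
 */
static inline bool sketch_chunk_equal(const struct pasid_entry_sketch *a,
				      const struct pasid_entry_sketch *b,
				      unsigned int chunk)
{
	return a->val[2 * chunk] == b->val[2 * chunk] &&
	       a->val[2 * chunk + 1] == b->val[2 * chunk + 1];
}
```

With two entries that differ only in DID, chunk 0 compares unequal
while chunk 1 still compares equal. So the first-level/nested check in
this patch would take the tear-down path on any domain change, whereas
the second-level/pass-through check would still pass.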
>   - For Second-level/Pass-through: The second 128 bits (chunk 1)
>     must remain identical; chunk 0 is updated atomically.
> - If hitless replacement is supported, use intel_iommu_atomic128_set()
>   to commit the change in a single 16-byte burst.
> - If the changes are too extensive to be hitless, fall back to the
>   safe "tear down and re-setup" flow (clear present -> flush -> setup).
>
> Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
> ---
>  drivers/iommu/intel/pasid.h | 26 ++++++++++++++++-
>  drivers/iommu/intel/pasid.c | 57 ++++++++++++++++++++++++++++++++++---
>  2 files changed, 78 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index 35de1d77355f..b569e2828a8b 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -37,7 +37,10 @@ struct pasid_dir_entry {
>  };
>
>  struct pasid_entry {
> -       u64 val[8];
> +       union {
> +               u64 val[8];
> +               u128 val128[4];
> +       };
>  };
>
>  #define PASID_ENTRY_PGTT_FL_ONLY       (1)
> @@ -297,6 +300,27 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
>         pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
>  }
>
> +static inline bool pasid_support_hitless_replace(struct pasid_entry *pte,
> +                                                struct pasid_entry *new, int type)
> +{
> +       switch (type) {
> +       case PASID_ENTRY_PGTT_FL_ONLY:
> +       case PASID_ENTRY_PGTT_NESTED:
> +               /* The first 128 bits remain the same. */
> +               return READ_ONCE(pte->val[0]) == READ_ONCE(new->val[0]) &&
> +                       READ_ONCE(pte->val[1]) == READ_ONCE(new->val[1]);
> +       case PASID_ENTRY_PGTT_SL_ONLY:
> +       case PASID_ENTRY_PGTT_PT:
> +               /* The second 128 bits remain the same. */
> +               return READ_ONCE(pte->val[2]) == READ_ONCE(new->val[2]) &&
> +                       READ_ONCE(pte->val[3]) == READ_ONCE(new->val[3]);
> +       default:
> +               WARN_ON(true);
> +       }
> +
> +       return false;
> +}
> +
>  extern unsigned int intel_pasid_max_id;
>  int intel_pasid_alloc_table(struct device *dev);
>  void intel_pasid_free_table(struct device *dev);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 4f36138448d8..da7ab18d3bfe 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -452,7 +452,20 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
>
>         WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -       *pte = new_pte;
> +       if (!pasid_support_hitless_replace(pte, &new_pte,
> +                                          PASID_ENTRY_PGTT_FL_ONLY)) {
> +               spin_unlock(&iommu->lock);
> +               intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +               return intel_pasid_setup_first_level(iommu, dev, fsptptr,
> +                                                    pasid, did, flags);
> +       }
> +
> +       /*
> +        * A first-level hitless replace requires the first 128 bits to remain
> +        * the same. Only the second 128-bit chunk needs to be updated.
> +        */
> +       intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
>         spin_unlock(&iommu->lock);
>
>         intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -563,7 +576,19 @@ int intel_pasid_replace_second_level(struct intel_iommu *iommu,
>
>         WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -       *pte = new_pte;
> +       if (!pasid_support_hitless_replace(pte, &new_pte,
> +                                          PASID_ENTRY_PGTT_SL_ONLY)) {
> +               spin_unlock(&iommu->lock);
> +               intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +               return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
> +       }
> +
> +       /*
> +        * A second-level hitless replace requires the second 128 bits to remain
> +        * the same. Only the first 128-bit chunk needs to be updated.
> +        */
> +       intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
>         spin_unlock(&iommu->lock);
>
>         intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -707,7 +732,19 @@ int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
>
>         WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -       *pte = new_pte;
> +       if (!pasid_support_hitless_replace(pte, &new_pte,
> +                                          PASID_ENTRY_PGTT_PT)) {
> +               spin_unlock(&iommu->lock);
> +               intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +               return intel_pasid_setup_pass_through(iommu, dev, pasid);
> +       }
> +
> +       /*
> +        * A passthrough hitless replace requires the second 128 bits to remain
> +        * the same. Only the first 128-bit chunk needs to be updated.
> +        */
> +       intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
>         spin_unlock(&iommu->lock);
>
>         intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -903,7 +940,19 @@ int intel_pasid_replace_nested(struct intel_iommu *iommu,
>
>         WARN_ON(old_did != pasid_get_domain_id(pte));
>
> -       *pte = new_pte;
> +       if (!pasid_support_hitless_replace(pte, &new_pte,
> +                                          PASID_ENTRY_PGTT_NESTED)) {
> +               spin_unlock(&iommu->lock);
> +               intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +               return intel_pasid_setup_nested(iommu, dev, pasid, domain);
> +       }
> +
> +       /*
> +        * A nested hitless replace requires the first 128 bits to remain
> +        * the same. Only the second 128-bit chunk needs to be updated.
> +        */
> +       intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
>         spin_unlock(&iommu->lock);
>
>         intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> --
> 2.43.0
>
