Message-ID: <BN9PR11MB52760881132AD4513E32373C8C8FA@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Wed, 14 Jan 2026 07:26:10 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, Samiullah Khawaja
	<skhawaja@...gle.com>
CC: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>, "Robin
 Murphy" <robin.murphy@....com>, Jason Gunthorpe <jgg@...dia.com>, "Dmytro
 Maluka" <dmaluka@...omium.org>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement

> From: Baolu Lu <baolu.lu@...ux.intel.com>
> Sent: Wednesday, January 14, 2026 1:45 PM
> 
> On 1/14/26 03:27, Samiullah Khawaja wrote:
> > On Mon, Jan 12, 2026 at 7:03 PM Lu Baolu <baolu.lu@...ux.intel.com> wrote:
> >> The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
> >> hardware may fetch this entry in multiple 128-bit chunks, updating the
> >> entire entry while it is active (P=1) risks a "torn" read where the
> >> hardware observes an inconsistent state.
> >>
> >> However, certain updates (e.g., changing page table pointers while
> >> keeping the translation type and domain ID the same) can be performed
> >> hitlessly. This is possible if the update is limited to a single
> >> 128-bit chunk while the other chunks remain stable.
> >>
> >> Introduce a hitless replacement mechanism for PASID entries:
> >>
> >> - Update 'struct pasid_entry' with a union to support 128-bit
> >>    access via the newly added val128[4] array.
> >> - Add pasid_support_hitless_replace() to determine if a transition
> >>    between an old and new entry is safe to perform atomically.
> >>    - For First-level/Nested translations: The first 128 bits (chunk 0)
> >>      must remain identical; chunk 1 is updated atomically.
> > Looking at the specs, the DID is part of the first 128 bits (chunk 0),
> > so I guess for the first level the hitless replacement would not be
> > supported since each domain will have a different DID?
> 
> It's not necessarily true that each domain will have a different DID. On
> Intel IOMMU, all SVA domains can share a single DID. Similarly, multiple
> nested domains sitting on top of the same second-stage page table can
> also share a DID.
> 
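For readers following the quoted patch description, here is a minimal userspace sketch of the 128-bit view of the entry and the first-level rule. The names `struct pasid_entry_model` and `first_level_hitless_ok()` are hypothetical stand-ins, not the patch's actual `struct pasid_entry` or `pasid_support_hitless_replace()`. The sketch assumes the GCC/Clang `unsigned __int128` extension and assumes that chunks 2 and 3 must also stay unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a 512-bit PASID entry, viewable either
 * as eight 64-bit words or as four 128-bit chunks (GCC/Clang __int128). */
struct pasid_entry_model {
	union {
		uint64_t val[8];
		unsigned __int128 val128[4];
	};
};

static_assert(sizeof(struct pasid_entry_model) == 64, "entry must be 512 bits");

/* Hypothetical first-level/nested check: chunk 0 (which holds the DID)
 * and chunks 2 and 3 must be identical, so only chunk 1 may differ. That
 * chunk can then be rewritten with a single 128-bit store. */
static bool first_level_hitless_ok(const struct pasid_entry_model *old,
				   const struct pasid_entry_model *new)
{
	return old->val128[0] == new->val128[0] &&
	       old->val128[2] == new->val128[2] &&
	       old->val128[3] == new->val128[3];
}
```

Words 2 and 3 fall in chunk 1, so changing only those passes the check, while any change to word 0 or 1 (chunk 0, where the DID lives) fails it. That is exactly Samiullah's concern when the DID differs between domains.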

I think Samiullah was referring to a DMA domain using first-stage
translation, where each DMA domain has its own DID. The spec says that
the DID and the page-table pointer must be updated in one atomic
operation. That works for second-stage, but not for first-stage, whose
pointer sits in a different chunk from the DID.

But thinking about it more, I wonder whether that guidance is too strict.

The key requirement is below:

  When modifying fields in present (P=1) entries, software must ensure
  that at any point of time during the modification (performed through 
  single or multiple write operations), the before and after state of the
  entry being modified is individually self-consistent. 

i.e. no IOMMU fault should be triggered if the hardware reads a
partially modified entry during the transition; it should translate
either via the old table or via the new table.

It is then up to whoever initiates the replacement to ensure that
in-flight DMAs only target addresses that have the same mapping in both
the old and new tables. Otherwise, that is the initiator's own problem.
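The caller-side obligation can be expressed as a simple check. This is a sketch over a hypothetical flat page table (`struct table_model`, `NPAGES`, and `replace_safe_for_inflight()` are illustrative names, not kernel APIs); a real IOMMU page table is multi-level, but the condition being checked is the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat "page table": page index -> physical frame number,
 * with 0 meaning "not mapped". */
#define NPAGES 16

struct table_model {
	uint64_t pfn[NPAGES];
};

/* Returns true if every page that may still be targeted by in-flight DMA
 * translates identically through the old and the new table, so it does
 * not matter which table the hardware walks during the switch. */
static bool replace_safe_for_inflight(const struct table_model *old,
				      const struct table_model *new,
				      const size_t *inflight, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		size_t page = inflight[i];

		assert(page < NPAGES);
		if (old->pfn[page] != new->pfn[page])
			return false;
	}
	return true;
}
```

If the check fails, the replacement is not hitless from the device's point of view, and the initiator has to quiesce that DMA first.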

Now consider the following flow in the IOMMU driver:

 1) update the first-stage page-table pointer (in the second 128-bit chunk)
 2) update the DID (in the first 128-bit chunk)
 3) flush the IOMMU cache
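The ordering in steps 1 and 2 can be modeled by recording every state the hardware could observe between the two 128-bit stores. This is a sketch under simplifying assumptions: `struct fs_entry_model` keeps only the two fields that matter, and `self_consistent()` is a hypothetical stand-in for "the hardware would not fault on this entry" (any non-zero DID and table pointer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-field view of the entry: the DID lives in chunk 0 and
 * the first-stage page-table pointer in chunk 1, so each can be changed
 * with one 128-bit store, but not both together. */
struct fs_entry_model {
	uint16_t did;		/* chunk 0 */
	uint64_t fs_ptr;	/* chunk 1 */
};

/* Stand-in for "no fault when the hardware reads this entry". */
static bool self_consistent(const struct fs_entry_model *e)
{
	return e->did != 0 && e->fs_ptr != 0;
}

/* Applies steps 1 and 2 (pointer first, then DID) and records each state
 * the hardware could observe. Returns the number of states (always 3). */
static int replace_fs_entry(struct fs_entry_model *e, uint16_t new_did,
			    uint64_t new_ptr, struct fs_entry_model seen[3])
{
	int n = 0;

	assert(new_did != 0 && new_ptr != 0);
	seen[n++] = *e;		/* old DID, old table */
	e->fs_ptr = new_ptr;	/* step 1: single store to chunk 1 */
	seen[n++] = *e;		/* old DID, new table */
	e->did = new_did;	/* step 2: single store to chunk 0 */
	seen[n++] = *e;		/* new DID, new table */
	/* step 3 (cache flush) is not modeled here */
	return n;
}
```

Each intermediate state is individually self-consistent, which is what the quoted spec requirement asks for; the "old DID, new table" state is the one the cache discussion then has to cover.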

Before the cache is flushed, it may contain:

 - entries tagged with the old DID, with content loaded from the old table
 - entries tagged with the old DID, with content loaded from the new table
 - entries tagged with the new DID, with content loaded from the new table

Compared to the second-stage case, where the DID and the pointer change
together, the only additional, potentially problematic combination is
old DID + new table.

According to section 6.2.1 (Tagging of Cached Translations), the
page-table root address is not used for tagging, and a DID-based
invalidation flushes all entries tagged with the old DID, whether they
were loaded from the old or the new table.
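The tagging argument can be illustrated with a toy cache whose entries are tagged only by DID and IOVA page, not by table root. This is a sketch of the reasoning, not of real VT-d hardware; `struct iotlb`, `iotlb_fill()`, `iotlb_flush_did()` and `iotlb_count_did()` are hypothetical names, and `src_table` is kept only so the example can label where an entry came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy IOTLB: the tag is (DID, IOVA page) only; the page-table root is not
 * part of the tag. src_table is informational and never used to match. */
#define IOTLB_SLOTS 8

struct iotlb_entry {
	bool valid;
	uint16_t did;
	uint64_t iova_page;
	int src_table;		/* 0 = old table, 1 = new table */
};

struct iotlb {
	struct iotlb_entry e[IOTLB_SLOTS];
};

static void iotlb_fill(struct iotlb *t, size_t slot, uint16_t did,
		       uint64_t page, int src_table)
{
	assert(slot < IOTLB_SLOTS);
	t->e[slot] = (struct iotlb_entry){ true, did, page, src_table };
}

/* DID-selective invalidation: drops every entry carrying this DID,
 * regardless of which table it was loaded from. */
static void iotlb_flush_did(struct iotlb *t, uint16_t did)
{
	for (size_t i = 0; i < IOTLB_SLOTS; i++)
		if (t->e[i].valid && t->e[i].did == did)
			t->e[i].valid = false;
}

static size_t iotlb_count_did(const struct iotlb *t, uint16_t did)
{
	size_t n = 0;

	for (size_t i = 0; i < IOTLB_SLOTS; i++)
		if (t->e[i].valid && t->e[i].did == did)
			n++;
	return n;
}
```

Filling one entry for each of the three combinations listed above and then flushing the old DID leaves only the "new DID, new table" entry, which is the outcome the argument relies on.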

Then it should just work!

P.S. Jason said that the atomic write size is 128 bits on AMD and 64 bits
on ARM. Both have a DID concept and two page-table pointers, so I assume
the same reasoning applies there as well?
