lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <BN9PR11MB52764C6A9F1774DDAFED41258C85A@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Thu, 8 Jan 2026 02:09:37 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Dmytro Maluka <dmaluka@...omium.org>, Jason Gunthorpe <jgg@...pe.ca>
CC: David Woodhouse <dwmw2@...radead.org>, Lu Baolu
	<baolu.lu@...ux.intel.com>, "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
	Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>, Robin Murphy
	<robin.murphy@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Vineeth Pillai (Google)"
	<vineeth@...byteword.org>, Aashish Sharma <aashish@...hishsharma.net>,
	Grzegorz Jaszczyk <jaszczyk@...omium.org>, "Dong, Chuanxiao"
	<chuanxiao.dong@...el.com>
Subject: RE: [PATCH v2 0/5] iommu/vt-d: Ensure memory ordering in context &
 root entry updates

> From: Dmytro Maluka <dmaluka@...omium.org>
> Sent: Tuesday, January 6, 2026 11:50 PM
> 
> On Tue, Jan 06, 2026 at 10:23:01AM -0400, Jason Gunthorpe wrote:
> > On Tue, Jan 06, 2026 at 02:51:38PM +0100, Dmytro Maluka wrote:
> > > Regarding flushing caches right after that - what for? (BTW the Intel
> > > driver doesn't do that either.) If we don't do that and as a result the
> > > HW is using an old entry cached before we cleared the present bit, it
> > > is not affected by our later modifications anyway.
> >
> > You don't know what state the HW fetcher is in. This kind of race is possible:
> >
> >      CPU                 FETCHER
> >                         read present = 1
> >     present = 0
> >     mangle qword 1
> >                         read qword 1
> >                         < fail - HW sees a corrupted entry >
> >
> > The flush is not just a flush but a barrier to synchronize with the HW,
> > confirming it has done all fetches that may have been dependent on
> > seeing present = 1.
> >
> > So missing a flush after clearing present is possibly a bug today - I
> > don't remember what the guaranteed atomic size is for the Intel IOMMU,
> > though; if the atomic size is the whole entry it is OK since there is
> > only one fetcher read. Though AMD is 128 bits and ARM is 64 bits.
> 
> Indeed, may be a bug... In the VT-d spec I don't immediately see a
> guarantee that context and PASID entries are fetched atomically. (And
> for PASID entries, which are 512 bits, that seems particularly
> unlikely.)
> 

512-bit atomicity is possible, but not for the PASID entry.

VT-d spec, head of section 9 (Translation Structure Formats):

"
This chapter describes the memory-resident structures for DMA and
interrupt remapping. Hardware must access structure entries that
are 64-bit or 128-bit atomically. Hardware must update a 512-bit
Posted Interrupt Descriptor (see Section 9.11 for details) atomically.
Other than the Posted Interrupt Descriptor (PID), hardware is allowed 
to break access to larger than 128-bit entries into multiple aligned
128-bit accesses.
"

The root entry, scalable root entry, context entry and IRTE are 128 bits,
so they are OK.

The scalable context entry is 256 bits, but only the lower 128 bits are
defined, so it's OK for now.

The scalable PASID directory entry is 64 bits: OK.

The posted interrupt descriptor is 512 bits, with atomicity guaranteed.

But we do have a problem with the scalable PASID entry, which is 512 bits:
  - bits beyond 191 are for future hardware, not a problem now
  - bits 128-191 are for the 1st-stage configuration
  - bits 0-127 manage stage selection, the 2nd-stage, and some 1st-stage
    fields

So in theory the 1st-stage and nesting cases are affected by this bug.
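To make the layout concrete, here is a minimal user-space model of the
512-bit entry as eight qwords. Only the Present bit position (bit 0 of
qword 0) comes from the spec; the struct and helper names are merely
illustrative, not the driver's:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model, not driver code: a scalable-mode PASID entry is
 * 512 bits, i.e. eight 64-bit qwords (four 128-bit fetch units). */
struct pasid_entry {
	uint64_t val[8];
};

/* The P (Present) field is bit 0 of qword 0 per the VT-d spec. */
static bool pasid_pte_is_present(const struct pasid_entry *pe)
{
	return pe->val[0] & 1ULL;
}
```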

In reality:
  - The iommu driver shouldn't receive an attach request on an in-use
    PASID entry, so the cache should remain clear (either in its initial
    state or flushed by a previous teardown), and hw won't use a partial
    1st-stage config after seeing the entry as non-present.

  - Replace is already broken, as the entry should not be cleared in the
    first place; this bug will be fixed when replace is reworked.
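The first point could be enforced with a defensive check in the attach
path. This is a hypothetical guard, not what the driver does today, and
the names are illustrative:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct pasid_entry {
	uint64_t val[8];
};

static bool pasid_pte_is_present(const struct pasid_entry *pe)
{
	return pe->val[0] & 1ULL;	/* P field, bit 0 */
}

/* Hypothetical: reject an attach on an in-use entry so hardware can
 * never observe a half-written 1st-stage configuration. */
static int pasid_entry_check_free(const struct pasid_entry *pe)
{
	return pasid_pte_is_present(pe) ? -EBUSY : 0;
}
```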

If I haven't overlooked anything (Baolu?), we probably don't need to fix
it by strictly following Jason's pseudo logic at this point. Instead, just
rename pasid_clear_entry() to pasid_clear_entry_no_flush() for now (with
a comment to clarify the expectation), and rework the replace path in
parallel.

We may never require a pasid_clear_entry_flush_cache() once hitless
replace is in place. 😊
