Message-ID: <20260123195939.GE1134360@nvidia.com>
Date: Fri, 23 Jan 2026 15:59:39 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Will Deacon <will@...nel.org>
Cc: Nicolin Chen <nicolinc@...dia.com>, jean-philippe@...aro.org,
	robin.murphy@....com, joro@...tes.org, balbirs@...dia.com,
	miko.lenczewski@....com, peterz@...radead.org, kevin.tian@...el.com,
	praan@...gle.com, linux-arm-kernel@...ts.infradead.org,
	iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v9 7/7] iommu/arm-smmu-v3: Perform per-domain
 invalidations using arm_smmu_invs

On Fri, Jan 23, 2026 at 05:07:15PM +0000, Will Deacon wrote:
> On Fri, Dec 19, 2025 at 12:11:29PM -0800, Nicolin Chen wrote:
> > Replace the old invalidation functions with arm_smmu_domain_inv_range() in
> > all the existing invalidation routines. And deprecate the old functions.
> > 
> > The new arm_smmu_domain_inv_range() handles the CMDQ_MAX_TLBI_OPS as well,
> > so drop it in the SVA function.
> > 
> > Since arm_smmu_cmdq_batch_add_range() has only one caller now, and it must
> > be given a valid size, add a WARN_ON_ONCE to catch any missed case.
> > 
> > Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@...dia.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   7 -
> >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  29 +--
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 165 +-----------------
> >  3 files changed, 11 insertions(+), 190 deletions(-)
> 
> It's one thing replacing the invalidation implementation but I think you
> need to update some of the old ordering comments, too. In particular,
> the old code relies on the dma_wmb() during cmdq insertion to order
> updates to in-memory structures, which includes the pgtable in non-strict
> mode.
> 
> I don't think any of that is true now?

You are talking about this comment?

	/*
	 * NOTE: when io-pgtable is in non-strict mode, we may get here with
	 * PTEs previously cleared by unmaps on the current CPU not yet visible
	 * to the SMMU. We are relying on the dma_wmb() implicit during cmd
	 * insertion to guarantee those are observed before the TLBI. Do be
	 * careful, 007.
	 */

Maybe we can restate that a little bit:

	/*
	 * If the DMA API is running in non-strict mode then another CPU could
	 * have changed the page table and not invoked any flush op. Instead the
	 * other CPU will do an atomic_read() and this CPU will have done an
	 * atomic_write(). That handshake is enough to acquire the page table
	 * writes from the other CPU.
	 *
	 * All command execution has a dma_wmb() to release all the in-memory
	 * structures written by this CPU, that barrier must also release the
	 * writes acquired from all the other CPUs too.
	 *
	 * There are other barriers and atomics on this path, but the above is
	 * the essential mechanism for ensuring that HW sees the page table
	 * writes from another CPU before it executes the IOTLB invalidation.
	 */

I'm sure this series adds more barriers that might move things earlier
in the sequence, but that isn't why those barriers exist.

I think the original documentation was on to something important so
I'd like to keep the information, though perhaps the comment belongs
in dma-iommu.c as it really applies to all iommu drivers on all
architectures.

It is also why things were done as smp_XX rather than dma_XX. The
intention was not to release things all the way to DMA, but just to
release them enough that the thread writing the STE can acquire
them, and that thread will ensure they are released to DMA.

For instance, on UP it is fine to have no barriers at all in the
invalidation code; the dma_wmb() in the STE store command is
sufficient.
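As a rough pseudocode sketch of that ordering (names like clear_pte, flag and write_cmd are simplified placeholders, not the actual helpers):

```
/* unmapping CPU (on UP, the same CPU) */
clear_pte(ptep);                        /* plain store to the pgtable  */
smp_store_release(&flag, 1);            /* release to other CPUs only  */

/* invalidating CPU */
smp_load_acquire(&flag);                /* acquires the pgtable writes */
write_cmd(cmdq, CMDQ_OP_TLBI_...);      /* build the invalidation cmd  */
dma_wmb();                              /* release everything to HW    */
update_prod(cmdq);                      /* SMMU now observes the cmd   */
```

On UP the smp_* lines compile away, and program order plus the dma_wmb() before the producer update is all that is needed for the HW to observe the page-table writes before the TLBI.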

Jason
