linux-kernel - Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush

Message-ID: <20250711083252.GE1099709@noisy.programming.kicks-ass.net>
Date: Fri, 11 Jul 2025 10:32:52 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Baolu Lu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
	Robin Murphy <robin.murphy@....com>,
	Kevin Tian <kevin.tian@...el.com>, Jason Gunthorpe <jgg@...dia.com>,
	Jann Horn <jannh@...gle.com>, Vasant Hegde <vasant.hegde@....com>,
	Dave Hansen <dave.hansen@...el.com>,
	Alistair Popple <apopple@...dia.com>,
	Uladzislau Rezki <urezki@...il.com>,
	Jean-Philippe Brucker <jean-philippe@...aro.org>,
	Andy Lutomirski <luto@...nel.org>,
	"Tested-by : Yi Lai" <yi1.lai@...el.com>, iommu@...ts.linux.dev,
	security@...nel.org, linux-kernel@...r.kernel.org,
	stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB
 flush

On Fri, Jul 11, 2025 at 11:00:06AM +0800, Baolu Lu wrote:
> Hi Peter Z,
> 
> On 7/10/25 21:54, Peter Zijlstra wrote:
> > On Wed, Jul 09, 2025 at 02:28:00PM +0800, Lu Baolu wrote:
> > > The vmalloc() and vfree() functions manage virtually contiguous, but not
> > > necessarily physically contiguous, kernel memory regions. When vfree()
> > > unmaps such a region, it tears down the associated kernel page table
> > > entries and frees the physical pages.
> > > 
> > > In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
> > > shares and walks the CPU's page tables. Architectures like x86 share
> > > static kernel address mappings across all user page tables, allowing the
> > > IOMMU to access the kernel portion of these tables.
> > > 
> > > Modern IOMMUs often cache page table entries to optimize walk performance,
> > > even for intermediate page table levels. If kernel page table mappings are
> > > changed (e.g., by vfree()), but the IOMMU's internal caches retain stale
> > > entries, a Use-After-Free (UAF) vulnerability condition arises. If these
> > > freed page table pages are reallocated for a different purpose, potentially
> > > by an attacker, the IOMMU could misinterpret the new data as valid page
> > > table entries. This allows the IOMMU to walk into attacker-controlled
> > > memory, leading to arbitrary physical memory DMA access or privilege
> > > escalation.
> > > 
> > > To mitigate this, introduce a new iommu interface to flush IOMMU caches
> > > and fence pending page table walks when kernel page mappings are updated.
> > > This interface should be invoked from architecture-specific code that
> > > manages combined user and kernel page tables.
> > 
> > I must say I liked the kPTI based idea better. Having to iterate and
> > invalidate an unspecified number of IOMMUs from non-preemptible context
> > seems 'unfortunate'.
> 
> The cache invalidation path in IOMMU drivers is already critical and
> operates within a non-preemptible context. This approach is, in fact,
> already utilized for user-space page table updates since the beginning
> of SVA support.

OK, fair enough I suppose. What kind of delays are we talking about
here? The fact that you basically have an unbounded list of IOMMUs
(although in practice I suppose it is limited by the number of GPUs and
other fancy stuff you can stick in your machine) does slightly worry me.

At some point the low latency folks are going to come hunting you down.
Do you have a plan on how to deal with this; or are we throwing up our
hands and saying, the hardware sucks, deal with it?
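
To make the latency concern above concrete, here is a minimal userspace sketch of the pattern under discussion: one kernel-range invalidation fanned out to every IOMMU on a list. It is not the patch's code and not a kernel API; `struct model_iommu`, `model_iommu_flush_range()` and `model_flush_kva_range()` are hypothetical names made up for illustration, and the locking and non-preemptible context are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one IOMMU instance; not a kernel structure. */
struct model_iommu {
	unsigned long flushed_start;	/* start of the last range flushed */
	unsigned long flushed_end;	/* end (exclusive) of the last range flushed */
	unsigned int flush_count;	/* how many flushes this IOMMU has seen */
	struct model_iommu *next;	/* next IOMMU on the list, or NULL */
};

/* Hypothetical per-IOMMU invalidation: only records what it was asked to drop. */
static void model_iommu_flush_range(struct model_iommu *iommu,
				    unsigned long start, unsigned long end)
{
	iommu->flushed_start = start;
	iommu->flushed_end = end;
	iommu->flush_count++;
}

/*
 * Invalidate [start, end) on every IOMMU reachable from head.
 * Returns the number of IOMMUs visited. The loop is linear in the
 * length of the list, so the time spent here grows with the number
 * of IOMMUs in the machine.
 */
static size_t model_flush_kva_range(struct model_iommu *head,
				    unsigned long start, unsigned long end)
{
	size_t visited = 0;

	assert(start < end);
	for (struct model_iommu *i = head; i; i = i->next) {
		model_iommu_flush_range(i, start, end);
		visited++;
	}
	return visited;
}
```

Each call does work proportional to the list length, and in the real path each step would also wait on hardware, which is where the delay being asked about would come from.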