Message-ID: <7zpxrs7wgnflfc6eypf5ngrncztvqzp4rriedahmzyehpkeikd@5mbhgvqctqmh>
Date: Thu, 10 Jul 2025 17:37:40 +0800
From: Yu Zhang <zhangyu1@...ux.microsoft.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
Cc: Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>, 
	Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>, 
	Jason Gunthorpe <jgg@...dia.com>, Jann Horn <jannh@...gle.com>, 
	Vasant Hegde <vasant.hegde@....com>, "Hansen, Dave" <dave.hansen@...el.com>, 
	Alistair Popple <apopple@...dia.com>, Peter Zijlstra <peterz@...radead.org>, 
	Uladzislau Rezki <urezki@...il.com>, Jean-Philippe Brucker <jean-philippe@...aro.org>, 
	Andy Lutomirski <luto@...nel.org>, "Lai, Yi1" <yi1.lai@...el.com>, 
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, "security@...nel.org" <security@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB
 flush

On Thu, Jul 10, 2025 at 08:15:27AM +0000, Tian, Kevin wrote:
> > From: Yu Zhang <zhangyu1@...ux.microsoft.com>
> > Sent: Thursday, July 10, 2025 4:11 PM
> > 
> > On Thu, Jul 10, 2025 at 03:02:07AM +0000, Tian, Kevin wrote:
> > > > From: Lu Baolu <baolu.lu@...ux.intel.com>
> > > > Sent: Wednesday, July 9, 2025 2:28 PM
> > > >
> > > > The vmalloc() and vfree() functions manage virtually contiguous, but not
> > > > necessarily physically contiguous, kernel memory regions. When vfree()
> > > > unmaps such a region, it tears down the associated kernel page table
> > > > entries and frees the physical pages.
> > > >
> > > > In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
> > > > shares and walks the CPU's page tables. Architectures like x86 share
> > > > static kernel address mappings across all user page tables, allowing the
> > >
> > > I'd remove 'static'
> > >
> > > > IOMMU to access the kernel portion of these tables.
> > > >
> > > > Modern IOMMUs often cache page table entries to optimize walk performance,
> > > > even for intermediate page table levels. If kernel page table mappings are
> > > > changed (e.g., by vfree()), but the IOMMU's internal caches retain stale
> > > > entries, a Use-After-Free (UAF) vulnerability condition arises. If these
> > > > freed page table pages are reallocated for a different purpose, potentially
> > > > by an attacker, the IOMMU could misinterpret the new data as valid page
> > > > table entries. This allows the IOMMU to walk into attacker-controlled
> > > > memory, leading to arbitrary physical memory DMA access or privilege
> > > > escalation.
> > >
> > > this lacks the background that the iommu driver is currently notified
> > > only of changes to user VA mappings, so the IOMMU's internal caches
> > > may retain stale entries for kernel VA.
> > >
> > > >
> > > > To mitigate this, introduce a new iommu interface to flush IOMMU caches
> > > > and fence pending page table walks when kernel page mappings are updated.
> > > > This interface should be invoked from architecture-specific code that
> > > > manages combined user and kernel page tables.
> > >
> > > this also needs some words about the fact that new flushes are triggered
> > > not just for freeing page tables.
> > >
> > Thank you, Kevin. A question about the background of this issue:
> > 
> > My understanding of the attack scenario is that a malicious user application
> > could initiate DMAs to some vmalloc'ed address, causing the paging structure
> > cache to be loaded and then possibly used after that paging structure
> > is freed (and perhaps allocated to some other user later).
> > 
> > If that is the case, we only need to do the flush when the paging structures
> > are freed. I mean, the IOTLB entries may not be loaded at all when the
> > permission check fails. Did I miss anything? :)
> > 
> 
> It's about the paging structure cache instead of IOTLB.
> 
> You may look at the discussion in v1 for more background, especially
> the latest reply from Baolu about a detailed example:
> 
> https://lore.kernel.org/linux-iommu/2080aaea-0d6e-418e-8391-ddac9b39c109@linux.intel.com/
> 

Thank you, Kevin. This really helps.

And by pointing out that "this also needs some words about the fact that new
flushes are triggered not just for freeing page tables", do you mean this
fix is not optimal, because iommu_sva_invalidate_kva_range() is
triggered for each unmapping of a vmalloc'ed address?

Do we have any alternative, e.g., not triggering the flush when the page
table (or directory, etc.) is not freed?
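
The alternative asked about above can be pictured with a toy user-space model
in C. This is only a sketch under stated assumptions, not kernel code:
struct toy_iommu, toy_iommu_walk(), toy_iommu_flush() and toy_unmap() are
hypothetical names invented for illustration, and the paging-structure cache
is reduced to a small array of page-table page ids. The unmap path flushes
only when the caller reports that a page-table page is being freed. Whether
that condition is sufficient on real hardware is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, NOT kernel code. Every name below is hypothetical. */
#define TOY_PSC_SLOTS 4
#define TOY_EMPTY     (-1)

struct toy_iommu {
    int psc[TOY_PSC_SLOTS];  /* ids of page-table pages held in the cache */
    size_t flushes;          /* number of flushes issued so far */
};

static void toy_iommu_init(struct toy_iommu *iommu)
{
    assert(iommu != NULL);
    for (size_t i = 0; i < TOY_PSC_SLOTS; i++)
        iommu->psc[i] = TOY_EMPTY;
    iommu->flushes = 0;
}

/* A device-initiated walk caches the page-table page it traversed. */
static void toy_iommu_walk(struct toy_iommu *iommu, int table_page)
{
    size_t free_slot = TOY_PSC_SLOTS;

    for (size_t i = 0; i < TOY_PSC_SLOTS; i++) {
        if (iommu->psc[i] == table_page)
            return;  /* already cached */
        if (iommu->psc[i] == TOY_EMPTY && free_slot == TOY_PSC_SLOTS)
            free_slot = i;  /* remember the first empty slot */
    }
    if (free_slot < TOY_PSC_SLOTS)
        iommu->psc[free_slot] = table_page;
}

/* Drop every cached entry and count the flush. */
static void toy_iommu_flush(struct toy_iommu *iommu)
{
    for (size_t i = 0; i < TOY_PSC_SLOTS; i++)
        iommu->psc[i] = TOY_EMPTY;
    iommu->flushes++;
}

/* True if the cache still refers to the given page-table page. */
static bool toy_iommu_caches(const struct toy_iommu *iommu, int table_page)
{
    for (size_t i = 0; i < TOY_PSC_SLOTS; i++)
        if (iommu->psc[i] == table_page)
            return true;
    return false;
}

/* Unmap path: flush only when a page-table page is actually freed. */
static void toy_unmap(struct toy_iommu *iommu, bool frees_table_page)
{
    if (frees_table_page)
        toy_iommu_flush(iommu);
}
```

In this model, an unmap that only clears leaf entries leaves the cache alone
and costs no flush, while an unmap that frees a page-table page removes every
cached reference before the page can be reused.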

B.R.
Yu


