Message-ID: <20250710082808.52399e31@DESKTOP-0403QTC.>
Date: Thu, 10 Jul 2025 08:28:08 -0700
From: Jacob Pan <jacob.pan@...ux.microsoft.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Joerg Roedel <joro@...tes.org>, Will
Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>, Kevin Tian
<kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>, Vasant Hegde
<vasant.hegde@....com>, Dave Hansen <dave.hansen@...el.com>, Alistair
Popple <apopple@...dia.com>, Peter Zijlstra <peterz@...radead.org>,
Uladzislau Rezki <urezki@...il.com>, Jean-Philippe Brucker
<jean-philippe@...aro.org>, Andy Lutomirski <luto@...nel.org>,
iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org, jacob.pan@...ux.microsoft.com
Subject: Re: [PATCH 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
Hi Baolu,
On Thu, 10 Jul 2025 10:57:19 +0800
Baolu Lu <baolu.lu@...ux.intel.com> wrote:
> Hi Jacob,
>
> On 7/10/25 02:15, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Wed, 9 Jul 2025 13:27:24 -0300
> > Jason Gunthorpe <jgg@...dia.com> wrote:
> >
> >> On Wed, Jul 09, 2025 at 08:51:58AM -0700, Jacob Pan wrote:
> >>>> In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU
> >>>> hardware shares and walks the CPU's page tables. Architectures
> >>>> like x86 share static kernel address mappings across all user
> >>>> page tables, allowing the IOMMU to access the kernel portion of
> >>>> these tables.
> >>
> >>> Is there a use case where a SVA user can access kernel memory in
> >>> the first place?
> >>
> >> No. It should be fully blocked.
> >>
> > Then I don't understand what is the "vulnerability condition" being
> > addressed here. We are talking about KVA range here.
>
> Let me take a real example:
>
> A device might be mistakenly configured to access memory at IOVA
> 0xffffa866001d5000 (a vmalloc'd memory region) with user-mode access
> permission. The corresponding page table entries for this IOVA
> translation, assuming a five-level page table, would appear as
> follows:
>
> PGD: Entry present with U/S bit set (1)
> P4D: Entry present with U/S bit set (1)
> PUD: Entry present with U/S bit set (1)
> PMD: Entry present with U/S bit set (1)
> PTE: Entry present with U/S bit clear (0)
>
> When the IOMMU walks this page table, it may potentially cache all
> present entries, regardless of the U/S bit's state. Upon reaching the
> leaf PTE, the IOMMU performs a permission check. This involves
> comparing the device's DMA access mode (in this case, user mode)
> against the cumulative U/S permission derived from an AND operation
> across all U/S bits in the traversed page table entries (which here
> results in U/S == 0).
Why would the IOMMU cache all the entries if the walk is not successful?
Also, per the x86 VM map, how could this example (UUUUS) arise for SVA,
i.e. with intermediate levels shared?
    Start addr    |  Offset  |     End addr     |  Size  | VM area description
 ffffc90000000000 |  -55 TB  | ffffe8ffffffffff |  32 TB | vmalloc/ioremap
 0000000000000000 |    0     | 00007fffffffffff | 128 TB | user-space
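
For reference, a small sketch of how the example IOVA splits into
per-level table indices. It assumes the usual x86-64 layout of 4 KiB
pages and 9 index bits per level; the helper name table_index() is
made up for illustration and is not a kernel API.

```c
#include <assert.h>
#include <stdint.h>

enum { PAGE_SHIFT = 12, BITS_PER_LEVEL = 9, MAX_LEVELS = 5 };

/*
 * Hypothetical helper: index into the table at a given level.
 * level 0 = PTE, 1 = PMD, 2 = PUD, 3 = P4D, 4 = PGD (five-level paging).
 */
static unsigned int table_index(uint64_t va, unsigned int level)
{
	assert(level < MAX_LEVELS);
	return (unsigned int)((va >> (PAGE_SHIFT + BITS_PER_LEVEL * level)) & 0x1ff);
}
```

For 0xffffa866001d5000 this gives PGD 511, P4D 336, PUD 408, PMD 0 and
PTE 469, while the top of the user range, 0x00007fffffffffff, has PGD
index 0 under the same five-level split. With four-level paging the
top-level index is the one computed at level 3, which is 336 for the
example address and 255 for the top of the user range.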
> The IOMMU correctly blocks this DMA access because the device's
> requested access (user mode) exceeds the permissions granted by the
> page table (supervisor-only at the PTE level). However, the PGD, P4D,
> PUD, and PMD entries that were traversed might remain cached within
> the IOMMU's paging structure cache.
>
> Now, consider a scenario where the page table leaf page is freed and
> subsequently repurposed, and the U/S bit at its previous location is
> modified to 1. From the IOMMU's perspective, the page table for the
> aforementioned IOVA would now appear as follows:
>
> PGD: Entry present with U/S bit set (1) [retrieved from paging cache]
> P4D: Entry present with U/S bit set (1) [retrieved from paging cache]
> PUD: Entry present with U/S bit set (1) [retrieved from paging cache]
> PMD: Entry present with U/S bit set (1) [retrieved from paging cache]
> PTE: Entry present with U/S bit set (1) {read from physical memory}
>
> As a result, the device could then potentially access the memory at
> IOVA 0xffffa866001d5000 with user-mode permission, which was
> explicitly disallowed.
>
> Thanks,
> baolu
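
The sequence described in the quoted example can be sketched as a toy
model. This is only an illustration under stated assumptions, not how
any real IOMMU or the kernel is implemented: it follows a single
translation path, treats the paging-structure cache as one slot per
non-leaf level, and assumes that the kernel clears the PMD entry in
memory when the leaf page-table page is freed. All names (walk_model,
user_access_allowed, flush_cache) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { LEVELS = 5 };	/* index 0..4 = PGD, P4D, PUD, PMD, PTE */

struct entry {
	bool present;
	bool user;	/* U/S bit: true = user access permitted */
};

/* One translation path plus a toy paging-structure cache. */
struct walk_model {
	struct entry mem[LEVELS];		/* current contents of memory */
	struct entry cached[LEVELS - 1];	/* cached non-leaf entries */
	bool cache_valid[LEVELS - 1];
};

/* Returns true if a user-mode DMA access would be allowed. */
static bool user_access_allowed(struct walk_model *m)
{
	bool user = true;

	for (size_t i = 0; i < LEVELS; i++) {
		bool leaf = (i == LEVELS - 1);
		struct entry e;

		if (!leaf && m->cache_valid[i]) {
			e = m->cached[i];	/* possibly stale */
		} else {
			e = m->mem[i];		/* read from memory */
			if (!e.present)
				return false;
			if (!leaf) {
				/* Cached regardless of the U/S bit. */
				m->cached[i] = e;
				m->cache_valid[i] = true;
			}
		}
		user = user && e.user;	/* cumulative AND of U/S bits */
	}
	return user;
}

/* Models invalidating the paging-structure cache for this range. */
static void flush_cache(struct walk_model *m)
{
	for (size_t i = 0; i < LEVELS - 1; i++)
		m->cache_valid[i] = false;
}
```

Starting from U/S = 1,1,1,1,0, the first walk is blocked but leaves the
four non-leaf entries cached. If the PMD entry is then cleared in
memory and the data at the old leaf location changes to U/S = 1, a
second walk without a flush is allowed through the stale cached
entries. After flush_cache() the walk re-reads the PMD from memory,
finds it not present, and the access is blocked again.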