Message-ID: <2080aaea-0d6e-418e-8391-ddac9b39c109@linux.intel.com>
Date: Thu, 10 Jul 2025 10:57:19 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: jacob.pan@...ux.microsoft.com, Jason Gunthorpe <jgg@...dia.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>, Kevin Tian <kevin.tian@...el.com>,
Jann Horn <jannh@...gle.com>, Vasant Hegde <vasant.hegde@....com>,
Dave Hansen <dave.hansen@...el.com>, Alistair Popple <apopple@...dia.com>,
Peter Zijlstra <peterz@...radead.org>, Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, iommu@...ts.linux.dev,
security@...nel.org, linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
Hi Jacob,
On 7/10/25 02:15, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 9 Jul 2025 13:27:24 -0300
> Jason Gunthorpe <jgg@...dia.com> wrote:
>
>> On Wed, Jul 09, 2025 at 08:51:58AM -0700, Jacob Pan wrote:
>>>> In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU
>>>> hardware shares and walks the CPU's page tables. Architectures
>>>> like x86 share static kernel address mappings across all user
>>>> page tables, allowing the IOMMU to access the kernel portion of
>>>> these tables.
>>
>>> Is there a use case where a SVA user can access kernel memory in the
>>> first place?
>>
>> No. It should be fully blocked.
>>
> Then I don't understand what "vulnerability condition" is being
> addressed here. We are talking about the KVA range here.
Let me take a real example:
A device might be mistakenly configured to access memory at IOVA
0xffffa866001d5000 (a vmalloc'd memory region) with user-mode access
permission. The corresponding page table entries for this IOVA
translation, assuming a five-level page table, would appear as follows:
PGD: Entry present with U/S bit set (1)
P4D: Entry present with U/S bit set (1)
PUD: Entry present with U/S bit set (1)
PMD: Entry present with U/S bit set (1)
PTE: Entry present with U/S bit clear (0)
When the IOMMU walks this page table, it may cache all of the present
entries, regardless of the state of their U/S bits. Upon reaching the
leaf PTE, the IOMMU performs a permission check: it compares the
device's DMA access mode (in this case, user mode) against the
cumulative U/S permission, i.e. the AND of the U/S bits of all the
traversed page table entries (which here results in U/S == 0).
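
To put that check in code form, here is a tiny user-space model. It is
purely illustrative (not IOMMU or kernel code; the struct and function
names are made up for this example). The effective permission is simply
the AND of the U/S bits of every entry walked, checked against the
requested user-mode access:

#include <stdbool.h>
#include <stdio.h>

/* One entry per level of the five-level walk: PGD, P4D, PUD, PMD, PTE. */
struct pt_entry {
        bool present;
        bool user;      /* U/S bit: 1 = user, 0 = supervisor-only */
};

/*
 * The effective user permission is the AND of the U/S bits of all the
 * entries walked; a non-present entry terminates the walk.
 */
static bool user_access_allowed(const struct pt_entry walk[5])
{
        bool user_ok = true;
        int level;

        for (level = 0; level < 5; level++) {
                if (!walk[level].present)
                        return false;
                user_ok &= walk[level].user;
        }
        return user_ok;
}

int main(void)
{
        /* The example above: U/S set at PGD..PMD, clear at the PTE. */
        const struct pt_entry walk[5] = {
                { true, true }, { true, true }, { true, true },
                { true, true }, { true, false },
        };

        printf("user-mode DMA is %s\n",
               user_access_allowed(walk) ? "allowed" : "blocked");
        return 0;
}

With the entries listed above this prints "blocked", because the leaf
PTE clears the U/S bit.
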
The IOMMU correctly blocks this DMA access because the device's
requested access (user mode) exceeds the permissions granted by the page
table (supervisor-only at the PTE level). However, the PGD, P4D, PUD,
and PMD entries that were traversed might remain cached within the
IOMMU's paging structure cache.
Now, consider a scenario where the page table leaf page is freed and
subsequently repurposed, and the U/S bit at its previous location is
modified to 1. From the IOMMU's perspective, the page table for the
aforementioned IOVA would now appear as follows:
PGD: Entry present with U/S bit set (1) [retrieved from paging cache]
P4D: Entry present with U/S bit set (1) [retrieved from paging cache]
PUD: Entry present with U/S bit set (1) [retrieved from paging cache]
PMD: Entry present with U/S bit set (1) [retrieved from paging cache]
PTE: Entry present with U/S bit set (1) [read from physical memory]
As a result, the device could then access the memory at IOVA
0xffffa866001d5000 with user-mode permission, which was explicitly
disallowed.
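
This is why the patch ties the invalidation to the kernel TLB flush:
whenever the CPU flushes its TLB for a kernel VA range, the IOMMU's
cached entries for that range are invalidated as well, so a freed page
table page can no longer be combined with stale upper-level entries in
the paging structure cache. Roughly (names simplified, only a sketch of
the idea rather than the patch itself):

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
        /* ... existing CPU TLB invalidation ... */

        /*
         * Sketch: tell the SVA code to invalidate anything the IOMMU
         * has cached for this kernel VA range.  The helper name here
         * is illustrative only.
         */
        iommu_sva_invalidate_kva_range(start, end);
}
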
Thanks,
baolu