Message-ID: <20250704133807.GB1410929@nvidia.com>
Date: Fri, 4 Jul 2025 10:38:07 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Lu Baolu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>,
Dave Hansen <dave.hansen@...el.com>,
Alistair Popple <apopple@...dia.com>,
Peter Zijlstra <peterz@...radead.org>,
Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, iommu@...ts.linux.dev,
security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH 1/1] iommu/sva: Invalidate KVA range on kernel TLB flush
On Fri, Jul 04, 2025 at 09:30:56PM +0800, Lu Baolu wrote:
> The vmalloc() and vfree() functions manage virtually contiguous, but not
> necessarily physically contiguous, kernel memory regions. When vfree()
> unmaps such a region, it tears down the associated kernel page table
> entries and frees the physical pages.
>
> In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
> shares and walks the CPU's page tables. Architectures like x86 share
> static kernel address mappings across all user page tables, allowing the
> IOMMU to access the kernel portion of these tables.
>
> Modern IOMMUs often cache page table entries to optimize walk performance,
> even for intermediate page table levels. If kernel page table mappings are
> changed (e.g., by vfree()), but the IOMMU's internal caches retain stale
> entries, a Use-After-Free (UAF) condition arises. If these
> freed page table pages are reallocated for a different purpose, potentially
> by an attacker, the IOMMU could misinterpret the new data as valid page
> table entries. This allows the IOMMU to walk into attacker-controlled
> memory, leading to arbitrary physical memory DMA access or privilege
> escalation.
>
> To mitigate this, introduce a new iommu interface to flush IOMMU caches
> and fence pending page table walks when kernel page mappings are updated.
> This interface should be invoked from architecture-specific code that
> manages combined user and kernel page tables.
>
> Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices")
> Cc: stable@...r.kernel.org
> Co-developed-by: Jason Gunthorpe <jgg@...dia.com>
> Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
> ---
> arch/x86/mm/tlb.c | 2 ++
> drivers/iommu/iommu-sva.c | 32 +++++++++++++++++++++++++++++++-
> include/linux/iommu.h | 4 ++++
> 3 files changed, 37 insertions(+), 1 deletion(-)
Reported-by: Jann Horn <jannh@...gle.com>
> @@ -1540,6 +1541,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
> kernel_tlb_flush_range(info);
>
> put_flush_tlb_info();
> + iommu_sva_invalidate_kva_range(start, end);
> }
This is far fewer call sites than I guessed!
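(For readers of the archive: the static key and mm list used in the hunk
below are set up elsewhere in the patch, which is not quoted here.
Presumably the bind side does something roughly like the sketch below, so
the flush added to flush_tlb_kernel_range() stays a near no-op while no
SVA domains exist. The helper name is hypothetical; the symbol and field
names are taken from the quoted hunk.)

	/*
	 * Sketch only, not quoted from the patch: track each SVA-enabled
	 * mm on the global list and flip the static key under the same
	 * mutex the invalidation path takes.
	 */
	static void iommu_sva_track_mm(struct iommu_mm_data *iommu_mm)
	{
		guard(mutex)(&iommu_sva_lock);
		list_add(&iommu_mm->mm_list_elm, &iommu_sva_mms);
		static_branch_enable(&iommu_sva_present);
	}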
> +void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end)
> +{
> + struct iommu_mm_data *iommu_mm;
> +
> + might_sleep();
> +
> + if (!static_branch_unlikely(&iommu_sva_present))
> + return;
> +
> + guard(mutex)(&iommu_sva_lock);
> + list_for_each_entry(iommu_mm, &iommu_sva_mms, mm_list_elm)
> + mmu_notifier_arch_invalidate_secondary_tlbs(iommu_mm->mm, start, end);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_invalidate_kva_range);
I don't think it needs to be exported if only arch code is calling it?
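If the export is dropped, the header side could be just a plain
declaration plus a stub for the !CONFIG_IOMMU_SVA case, something like
this sketch (assuming iommu.h already has a CONFIG_IOMMU_SVA #ifdef block
to hang it off):

	/* include/linux/iommu.h -- sketch only */
	#ifdef CONFIG_IOMMU_SVA
	void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end);
	#else
	static inline void iommu_sva_invalidate_kva_range(unsigned long start,
							  unsigned long end)
	{
	}
	#endif /* CONFIG_IOMMU_SVA */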
Looks Ok to me:
Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
Jason