lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a83fbbdd-ab31-4439-a6e9-594a3d4a837e@linux.intel.com>
Date: Thu, 17 Jul 2025 09:51:13 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Yi Liu <yi.l.liu@...el.com>, Joerg Roedel <joro@...tes.org>,
 Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
 Kevin Tian <kevin.tian@...el.com>, Jason Gunthorpe <jgg@...dia.com>,
 Jann Horn <jannh@...gle.com>, Vasant Hegde <vasant.hegde@....com>,
 Dave Hansen <dave.hansen@...el.com>, Alistair Popple <apopple@...dia.com>,
 Peter Zijlstra <peterz@...radead.org>, Uladzislau Rezki <urezki@...il.com>,
 Jean-Philippe Brucker <jean-philippe@...aro.org>,
 Andy Lutomirski <luto@...nel.org>, "Tested-by : Yi Lai" <yi1.lai@...el.com>
Cc: iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB
 flush

On 7/16/25 18:54, Yi Liu wrote:
> On 2025/7/9 14:28, Lu Baolu wrote:
>> The vmalloc() and vfree() functions manage virtually contiguous, but not
>> necessarily physically contiguous, kernel memory regions. When vfree()
>> unmaps such a region, it tears down the associated kernel page table
>> entries and frees the physical pages.
>>
>> In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
>> shares and walks the CPU's page tables. Architectures like x86 share
>> static kernel address mappings across all user page tables, allowing the
>> IOMMU to access the kernel portion of these tables.
> 
> I remember Jason once clarified that no support for kernel SVA. I don't
> think linux has such support so far. If so, may just drop the static
> mapping terms. This can be attack surface mainly because the page table
> may include both user and kernel mappings.

Yes. Kernel SVA has already been removed from the tree.

>> Modern IOMMUs often cache page table entries to optimize walk 
>> performance,
>> even for intermediate page table levels. If kernel page table mappings 
>> are
>> changed (e.g., by vfree()), but the IOMMU's internal caches retain stale
>> entries, Use-After-Free (UAF) vulnerability condition arises. If these
>> freed page table pages are reallocated for a different purpose, 
>> potentially
>> by an attacker, the IOMMU could misinterpret the new data as valid page
>> table entries. This allows the IOMMU to walk into attacker-controlled
>> memory, leading to arbitrary physical memory DMA access or privilege
>> escalation.
> 
> Does this fix cover the page compaction and de-compaction as well? It is

It should.

> valuable to call out the mm subsystem does not notify iommu per page table
> modifications except for the modifications related to user VA, hence SVA is
> in risk to use stale intermediate caches due to this.
> 
>> To mitigate this, introduce a new iommu interface to flush IOMMU caches
>> and fence pending page table walks when kernel page mappings are updated.
>> This interface should be invoked from architecture-specific code that
>> manages combined user and kernel page tables.
> 
> aha, this is what I'm trying to find. Using page tables with both kernel
> and user mappings is the prerequisite for this bug. :)

Yes.

> 
>> Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices")
>> Cc: stable@...r.kernel.org
>> Reported-by: Jann Horn <jannh@...gle.com>
>> Co-developed-by: Jason Gunthorpe <jgg@...dia.com>
>> Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
>> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
>> Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
>> Reviewed-by: Vasant Hegde <vasant.hegde@....com>
>> Tested-by: Yi Lai <yi1.lai@...el.com>
>> ---
>>   arch/x86/mm/tlb.c         |  2 ++
>>   drivers/iommu/iommu-sva.c | 34 +++++++++++++++++++++++++++++++++-
>>   include/linux/iommu.h     |  4 ++++
>>   3 files changed, 39 insertions(+), 1 deletion(-)
>>
>> Change log:
>> v2:
>>   - Remove EXPORT_SYMBOL_GPL(iommu_sva_invalidate_kva_range);
>>   - Replace the mutex with a spinlock to make the interface usable in the
>>     critical regions.
>>
>> v1: https://lore.kernel.org/linux-iommu/20250704133056.4023816-1- 
>> baolu.lu@...ux.intel.com/
>>
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 39f80111e6f1..a41499dfdc3f 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/task_work.h>
>>   #include <linux/mmu_notifier.h>
>>   #include <linux/mmu_context.h>
>> +#include <linux/iommu.h>
>>   #include <asm/tlbflush.h>
>>   #include <asm/mmu_context.h>
>> @@ -1540,6 +1541,7 @@ void flush_tlb_kernel_range(unsigned long start, 
>> unsigned long end)
>>           kernel_tlb_flush_range(info);
>>       put_flush_tlb_info();
>> +    iommu_sva_invalidate_kva_range(start, end);
>>   }
>>   /*
>> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
>> index 1a51cfd82808..fd76aefa5a88 100644
>> --- a/drivers/iommu/iommu-sva.c
>> +++ b/drivers/iommu/iommu-sva.c
>> @@ -10,6 +10,9 @@
>>   #include "iommu-priv.h"
>>   static DEFINE_MUTEX(iommu_sva_lock);
>> +static DEFINE_STATIC_KEY_FALSE(iommu_sva_present);
>> +static LIST_HEAD(iommu_sva_mms);
>> +static DEFINE_SPINLOCK(iommu_mms_lock);
>>   static struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
>>                              struct mm_struct *mm);
>> @@ -42,6 +45,7 @@ static struct iommu_mm_data 
>> *iommu_alloc_mm_data(struct mm_struct *mm, struct de
>>           return ERR_PTR(-ENOSPC);
>>       }
>>       iommu_mm->pasid = pasid;
>> +    iommu_mm->mm = mm;
>>       INIT_LIST_HEAD(&iommu_mm->sva_domains);
>>       /*
>>        * Make sure the write to mm->iommu_mm is not reordered in front of
>> @@ -132,8 +136,15 @@ struct iommu_sva *iommu_sva_bind_device(struct 
>> device *dev, struct mm_struct *mm
>>       if (ret)
>>           goto out_free_domain;
>>       domain->users = 1;
>> -    list_add(&domain->next, &mm->iommu_mm->sva_domains);
>> +    if (list_empty(&iommu_mm->sva_domains)) {
>> +        scoped_guard(spinlock_irqsave, &iommu_mms_lock) {
>> +            if (list_empty(&iommu_sva_mms))
>> +                static_branch_enable(&iommu_sva_present);
>> +            list_add(&iommu_mm->mm_list_elm, &iommu_sva_mms);
>> +        }
>> +    }
>> +    list_add(&domain->next, &iommu_mm->sva_domains);
>>   out:
>>       refcount_set(&handle->users, 1);
>>       mutex_unlock(&iommu_sva_lock);
>> @@ -175,6 +186,15 @@ void iommu_sva_unbind_device(struct iommu_sva 
>> *handle)
>>           list_del(&domain->next);
>>           iommu_domain_free(domain);
>>       }
>> +
>> +    if (list_empty(&iommu_mm->sva_domains)) {
>> +        scoped_guard(spinlock_irqsave, &iommu_mms_lock) {
>> +            list_del(&iommu_mm->mm_list_elm);
>> +            if (list_empty(&iommu_sva_mms))
>> +                static_branch_disable(&iommu_sva_present);
>> +        }
>> +    }
>> +
>>       mutex_unlock(&iommu_sva_lock);
>>       kfree(handle);
>>   }
>> @@ -312,3 +332,15 @@ static struct iommu_domain 
>> *iommu_sva_domain_alloc(struct device *dev,
>>       return domain;
>>   }
>> +
>> +void iommu_sva_invalidate_kva_range(unsigned long start, unsigned 
>> long end)
>> +{
>> +    struct iommu_mm_data *iommu_mm;
>> +
>> +    if (!static_branch_unlikely(&iommu_sva_present))
>> +        return;
>> +
>> +    guard(spinlock_irqsave)(&iommu_mms_lock);
>> +    list_for_each_entry(iommu_mm, &iommu_sva_mms, mm_list_elm)
>> +        mmu_notifier_arch_invalidate_secondary_tlbs(iommu_mm->mm, 
>> start, end);
> 
> is it possible the TLB flush side calls this API per mm?

Nope.

> 
>> +}
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 156732807994..31330c12b8ee 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -1090,7 +1090,9 @@ struct iommu_sva {
>>   struct iommu_mm_data {
>>       u32            pasid;
>> +    struct mm_struct    *mm;
>>       struct list_head    sva_domains;
>> +    struct list_head    mm_list_elm;
>>   };
>>   int iommu_fwspec_init(struct device *dev, struct fwnode_handle 
>> *iommu_fwnode);
>> @@ -1571,6 +1573,7 @@ struct iommu_sva *iommu_sva_bind_device(struct 
>> device *dev,
>>                       struct mm_struct *mm);
>>   void iommu_sva_unbind_device(struct iommu_sva *handle);
>>   u32 iommu_sva_get_pasid(struct iommu_sva *handle);
>> +void iommu_sva_invalidate_kva_range(unsigned long start, unsigned 
>> long end);
>>   #else
>>   static inline struct iommu_sva *
>>   iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
>> @@ -1595,6 +1598,7 @@ static inline u32 mm_get_enqcmd_pasid(struct 
>> mm_struct *mm)
>>   }
>>   static inline void mm_pasid_drop(struct mm_struct *mm) {}
>> +static inline void iommu_sva_invalidate_kva_range(unsigned long 
>> start, unsigned long end) {}
>>   #endif /* CONFIG_IOMMU_SVA */
>>   #ifdef CONFIG_IOMMU_IOPF
> 

Thanks,
baolu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ