Message-ID: <62d21545-9e75-41e3-89a3-f21dda15bf16@intel.com>
Date: Wed, 6 Aug 2025 09:34:12 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>, Alistair Popple <apopple@...dia.com>,
Peter Zijlstra <peterz@...radead.org>, Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, Yi Lai <yi1.lai@...el.com>,
iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On 8/6/25 09:09, Jason Gunthorpe wrote:
>>>
>>> You can't do this approach without also pushing the pages to be
>>> freed onto a list and deferring the free till the work. This is
>>> broadly what the normal mm user flow is doing..
>> FWIW, I think the simplest way to do this is to plop an unconditional
>> schedule_work() in pte_free_kernel(). The work function will invalidate
>> the IOTLBs and then free the page.
>>
>> Keep the schedule_work() unconditional to keep it simple. The
>> schedule_work() is way cheaper than all the system-wide TLB invalidation
>> IPIs that have to get sent as well. No need to add complexity to
>> optimize out something that's in the noise already.
> That works also, but now you have to allocate memory or you are
> dead. Is it OK these days, and is it safe in this code, which seems a
> little bit tied to memory management?
>
> The MM side avoided this by putting the list and the rcu_head in the
> struct page.
I don't think you need to allocate memory. A small static structure
that uses page->list and has a lock should do. Logically, something
like this:
struct kernel_pgtable_work {
	struct list_head	list;
	spinlock_t		lock;
	struct work_struct	work;
} kernel_pte_work;

pte_free_kernel()
{
	struct page *page = ptdesc_magic();

	guard(spinlock)(&kernel_pte_work.lock);
	list_add(&page->list, &kernel_pte_work.list);
	schedule_work(&kernel_pte_work.work);
}

work_func()
{
	iommu_sva_invalidate_kva();

	guard(spinlock)(&kernel_pte_work.lock);
	list_for_each_safe() {
		page = container_of(...);
		free_whatever(page);
	}
}
The only wrinkle is that pte_free_kernel() itself still has a pte and
'ptdesc', not a 'struct page'. But there is ptdesc->pt_list, which
should be unused at this point, especially for non-pgd pages on x86.
So, either go over to the 'struct page' earlier (maybe by open-coding
pagetable_dtor_free()?), or just use the ptdesc.
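To make the "just use the ptdesc" option concrete, the sketch might
become something like the below. Same caveats as before: this is
hand-waved pseudocode, untested, and it assumes virt_to_ptdesc() is
usable here and that ptdesc->pt_list really is free at this point:

	pte_free_kernel(struct mm_struct *mm, pte_t *pte)
	{
		struct ptdesc *ptdesc = virt_to_ptdesc(pte);

		guard(spinlock)(&kernel_pte_work.lock);
		list_add(&ptdesc->pt_list, &kernel_pte_work.list);
		schedule_work(&kernel_pte_work.work);
	}

	work_func()
	{
		struct ptdesc *ptdesc, *next;

		iommu_sva_invalidate_kva();

		guard(spinlock)(&kernel_pte_work.lock);
		list_for_each_entry_safe(ptdesc, next,
					 &kernel_pte_work.list, pt_list) {
			list_del(&ptdesc->pt_list);
			pagetable_dtor_free(ptdesc);
		}
	}

That avoids the 'struct page' conversion in pte_free_kernel() entirely
and keeps everything in ptdesc terms until the actual free.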