Message-ID: <c69950ee-660b-4f51-9277-522470d0ce5d@linux.intel.com>
Date: Wed, 27 Aug 2025 18:58:02 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Dave Hansen <dave.hansen@...el.com>, "Tian, Kevin"
<kevin.tian@...el.com>, Jason Gunthorpe <jgg@...dia.com>
Cc: baolu.lu@...ux.intel.com, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Jann Horn <jannh@...gle.com>, Vasant Hegde <vasant.hegde@....com>,
Alistair Popple <apopple@...dia.com>, Peter Zijlstra <peterz@...radead.org>,
Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, "Lai, Yi1" <yi1.lai@...el.com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
"security@...nel.org" <security@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>, vishal.moola@...il.com,
Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On 8/26/2025 10:22 PM, Dave Hansen wrote:
> On 8/25/25 19:49, Baolu Lu wrote:
>>> The three separate lists are needed because we're handling three
>>> distinct types of page deallocation. Grouping the pages this way allows
>>> the workqueue handler to free each type using the correct function.
>>
>> Please allow me to add more details.
>
> Right, I know why it got added this way: it was the quickest way to hack
> together a patch that fixes the IOMMU issue without refactoring anything.
>
> I agree that you have three cases:
> 1. A full-on 'struct ptdesc' that needs its destructor run
> 2. An order-0 'struct page'
> 3. A higher-order 'struct page'
>
> Long-term, #2 and #3 probably need to get converted over to 'struct
> ptdesc'. They don't look _that_ hard to convert to me. Willy, Vishal,
> any other mm folks: do you agree?
>
> Short-term, I'd just consolidate your issue down to a single list.
>
> #1: For 'struct ptdesc', modify pte_free_kernel() to pass information in
> to pagetable_dtor_free() to tell it to use the deferred page table
> free list. Do this with a bit in the ptdesc or a new argument to
> pagetable_dtor_free().
> #2. Just append these to the deferred page table free list. Easy.
> #3. The hackiest way to do this is to just treat the higher-order
>     non-compound page as individual order-0 pages and put them on the
>     deferred page table free list one at a time. The other way to do it
>     is to track down how
> this thing got allocated in the first place and make sure it's got
> __GFP_COMP metadata. If so, you can just use __free_pages() for
> everything.
>
> Yeah, it'll take a couple of patches up front to refactor some things. But
> that refactoring will make things more consistent instead of adding
> complexity to deal with the inconsistency.
Following the suggestions above, I've reworked the code as follows. Does
it look good?
mm/pgtable-generic.c:
#ifdef CONFIG_ASYNC_PGTABLE_FREE
/* a 'struct ptdesc' that needs its destructor run */
#define ASYNC_PGTABLE_FREE_DTOR	BIT(NR_PAGEFLAGS)

static void kernel_pgtable_work_func(struct work_struct *work);

static struct {
	struct list_head list;
	/* protects the above ptdesc list */
	spinlock_t lock;
	struct work_struct work;
} kernel_pgtable_work = {
	.list = LIST_HEAD_INIT(kernel_pgtable_work.list),
	.lock = __SPIN_LOCK_UNLOCKED(kernel_pgtable_work.lock),
	.work = __WORK_INITIALIZER(kernel_pgtable_work.work,
				   kernel_pgtable_work_func),
};

static void kernel_pgtable_work_func(struct work_struct *work)
{
	struct ptdesc *ptdesc, *next;
	LIST_HEAD(page_list);

	spin_lock(&kernel_pgtable_work.lock);
	list_splice_tail_init(&kernel_pgtable_work.list, &page_list);
	spin_unlock(&kernel_pgtable_work.lock);

	iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);

	list_for_each_entry_safe(ptdesc, next, &page_list, pt_list) {
		list_del(&ptdesc->pt_list);
		if (ptdesc->__page_flags & ASYNC_PGTABLE_FREE_DTOR)
			pagetable_dtor_free(ptdesc);
		else
			__free_page(ptdesc_page(ptdesc));
	}
}

void kernel_pgtable_async_free_dtor(struct ptdesc *ptdesc)
{
	spin_lock(&kernel_pgtable_work.lock);
	ptdesc->__page_flags |= ASYNC_PGTABLE_FREE_DTOR;
	list_add(&ptdesc->pt_list, &kernel_pgtable_work.list);
	spin_unlock(&kernel_pgtable_work.lock);

	schedule_work(&kernel_pgtable_work.work);
}

void kernel_pgtable_async_free_page_list(struct list_head *list)
{
	spin_lock(&kernel_pgtable_work.lock);
	list_splice_tail(list, &kernel_pgtable_work.list);
	spin_unlock(&kernel_pgtable_work.lock);

	schedule_work(&kernel_pgtable_work.work);
}

void kernel_pgtable_async_free_page(struct page *page)
{
	spin_lock(&kernel_pgtable_work.lock);
	list_add(&page_ptdesc(page)->pt_list, &kernel_pgtable_work.list);
	spin_unlock(&kernel_pgtable_work.lock);

	schedule_work(&kernel_pgtable_work.work);
}
#endif
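
The caller side for case #1 could then be a one-line switch in
pte_free_kernel() (untested sketch only, assuming the current
asm-generic pte_free_kernel() as the starting point; the #else branch
keeps today's behavior):

include/asm-generic/pgalloc.h (sketch):

static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
#ifdef CONFIG_ASYNC_PGTABLE_FREE
	/* Defer the dtor and free until after the KVA flush in the worker. */
	kernel_pgtable_async_free_dtor(virt_to_ptdesc(pte));
#else
	pagetable_dtor_free(virt_to_ptdesc(pte));
#endif
}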
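
For case #3, if we keep the higher-order allocation non-compound for
now, the free path could queue the pages one at a time as you suggested.
Again just a sketch; the helper name and its call site are made up for
illustration:

/*
 * Sketch: hand a higher-order, non-compound page table to the deferred
 * free list one order-0 page at a time.
 */
static void kernel_pgtable_async_free_pages(struct page *page,
					    unsigned int order)
{
	unsigned long i;

	for (i = 0; i < (1UL << order); i++)
		kernel_pgtable_async_free_page(page + i);
}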
Thanks,
baolu