Message-ID: <62d21545-9e75-41e3-89a3-f21dda15bf16@intel.com>
Date: Wed, 6 Aug 2025 09:34:12 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Lu Baolu <baolu.lu@...ux.intel.com>, Joerg Roedel <joro@...tes.org>,
 Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
 Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
 Vasant Hegde <vasant.hegde@....com>, Alistair Popple <apopple@...dia.com>,
 Peter Zijlstra <peterz@...radead.org>, Uladzislau Rezki <urezki@...il.com>,
 Jean-Philippe Brucker <jean-philippe@...aro.org>,
 Andy Lutomirski <luto@...nel.org>, Yi Lai <yi1.lai@...el.com>,
 iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB
 flush

On 8/6/25 09:09, Jason Gunthorpe wrote:
>>>
>>> You can't do this approach without also pushing the pages to be freed
>>> onto a list and deferring the free till the work. This is broadly what
>>> the normal mm user flow is doing.
>> FWIW, I think the simplest way to do this is to plop an unconditional
>> schedule_work() in pte_free_kernel(). The work function will invalidate
>> the IOTLBs and then free the page.
>>
>> Keep the schedule_work() unconditional to keep it simple. The
>> schedule_work() is way cheaper than all the system-wide TLB invalidation
>> IPIs that have to get sent as well. No need to add complexity to
>> optimize out something that's in the noise already.
> That works also, but now you have to allocate memory or you are
> dead. Is that OK these days, and safe in this code, which seems a
> little bit tied to memory management?
> 
> The MM side avoided this by putting the list and the rcu_head in the
> struct page.
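> 
> (For reference, roughly what that looks like on the MM side: the deferred
> free rides on the rcu_head that is embedded in the ptdesc itself, so there
> is nothing to allocate. A hand-wavy sketch, not the exact upstream code:
> 
> 	static void pte_free_now(struct rcu_head *head)
> 	{
> 		struct ptdesc *ptdesc = container_of(head, struct ptdesc, pt_rcu_head);
> 
> 		pagetable_dtor_free(ptdesc);
> 	}
> 
> 	void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
> 	{
> 		/* pt_rcu_head lives in the ptdesc, so no allocation here. */
> 		call_rcu(&page_ptdesc(pgtable)->pt_rcu_head, pte_free_now);
> 	}
> )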

I don't think you need to allocate memory. A little static structure
that uses the page->list and has a lock should do. Logically something
like this:

static struct kernel_pgtable_work {
	struct list_head list;
	spinlock_t lock;
	struct work_struct work;	/* INIT_LIST_HEAD()/INIT_WORK() at init time */
} kernel_pte_work;

void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
	struct page *page = ptdesc_magic();	/* hand-waving pte -> struct page */

	guard(spinlock)(&kernel_pte_work.lock);

	/* Queue the page; schedule_work() is a no-op if the work is already pending. */
	list_add(&page->list, &kernel_pte_work.list);
	schedule_work(&kernel_pte_work.work);
}

static void kernel_pte_work_func(struct work_struct *work)
{
	struct page *page, *next;

	/* Flush the IOTLBs before any of the queued page table pages gets reused. */
	iommu_sva_invalidate_kva();

	guard(spinlock)(&kernel_pte_work.lock);

	list_for_each_entry_safe(page, next, &kernel_pte_work.list, list) {
		list_del(&page->list);
		free_whatever(page);	/* hand-waving the actual free */
	}
}

The only wrinkle is that pte_free_kernel() itself still has a pte and
'ptdesc', not a 'struct page'. But there is ptdesc->pt_list, which
should be unused at this point, especially for non-pgd pages on x86.

So, either go over to the 'struct page' earlier (maybe by open-coding
pagetable_dtor_free()?), or just use the ptdesc.


