Message-ID: <df5353e2-1d54-476b-90ab-e673686dcc41@linux.intel.com>
Date: Thu, 17 Jul 2025 09:43:19 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>, Dave Hansen <dave.hansen@...el.com>,
Alistair Popple <apopple@...dia.com>, Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, "Tested-by : Yi Lai" <yi1.lai@...el.com>,
iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On 7/16/25 20:08, Jason Gunthorpe wrote:
> On Wed, Jul 16, 2025 at 02:34:04PM +0800, Baolu Lu wrote:
>>>> @@ -654,6 +656,9 @@ struct iommu_ops {
>>>>
>>>> int (*def_domain_type)(struct device *dev);
>>>>
>>>> + void (*paging_cache_invalidate)(struct iommu_device *dev,
>>>> + unsigned long start, unsigned long end);
>>>
>>> How would you even implement this in a driver?
>>>
>>> You either flush the whole iommu, in which case who needs a range, or
>>> the driver has to iterate over the PASID list, in which case it
>>> doesn't really improve the situation.
>>
>> The Intel iommu driver supports flushing all SVA PASIDs with a single
>> request in the invalidation queue.
>
> How? All PASIDs != 0? The HW has no notion of an SVA PASID vs. a
> non-SVA one. This is just flushing almost everything.
The Intel IOMMU driver allocates one dedicated domain ID that is shared
by all SVA domains, so it can flush every cache entry tagged with that
domain ID with a single invalidation request.
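
As a very rough illustration of how the proposed
->paging_cache_invalidate() callback could boil down to one request
(sva_domain_did below is a made-up field name, and the driver would of
course pick whatever queued invalidation descriptor is appropriate):

static void intel_iommu_kva_cache_invalidate(struct iommu_device *dev,
                                             unsigned long start,
                                             unsigned long end)
{
        struct intel_iommu *iommu = container_of(dev, struct intel_iommu, iommu);

        /*
         * All SVA domains share one domain ID in this sketch, so a single
         * domain-selective invalidation covers every SVA PASID; start/end
         * are ignored because the whole domain is flushed.
         */
        qi_flush_iotlb(iommu, iommu->sva_domain_did, 0, 0, DMA_TLB_DSI_FLUSH);
}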
>
>>> If this is a concern I think the better answer is to do a deferred free
>>> like the mm can sometimes do where we thread the page tables onto a
>>> linked list, flush the CPU cache and push it all into a work which
>>> will do the iommu flush before actually freeing the memory.
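
Just to make sure I follow the deferred-free idea, I imagine it would
look roughly like the sketch below (kva_pgtable_batch and friends are
made-up names, not existing APIs). The caller would unhook the page
tables onto the list, flush the CPU TLB, and then schedule the work:

struct kva_pgtable_batch {
        struct work_struct work;
        struct list_head pages;         /* page tables unhooked from the KVA tables */
        unsigned long start, end;
};

static void kva_pgtable_batch_free(struct work_struct *work)
{
        struct kva_pgtable_batch *batch =
                container_of(work, struct kva_pgtable_batch, work);
        struct page *page, *next;

        /* IOMMU flush before the page-table pages are actually freed. */
        iommu_sva_invalidate_kva_range(batch->start, batch->end);

        list_for_each_entry_safe(page, next, &batch->pages, lru) {
                list_del(&page->lru);
                __free_page(page);
        }
        kfree(batch);
}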
>>
>> Is it a workable solution to use schedule_work() to queue the KVA cache
>> invalidation as a work item in the system workqueue? By doing so, we
>> wouldn't need the spinlock to protect the list anymore.
>
> Maybe.
>
> MM is also more careful to pull the invalidation out of some of the
> locks; I don't know what the KVA side is like...
How about something like the following? It compiles but is untested.
struct kva_invalidation_work_data {
        struct work_struct work;
        unsigned long start;
        unsigned long end;
        bool free_on_completion;
};

static void invalidate_kva_func(struct work_struct *work)
{
        struct kva_invalidation_work_data *data =
                container_of(work, struct kva_invalidation_work_data, work);
        struct iommu_mm_data *iommu_mm;

        guard(mutex)(&iommu_sva_lock);
        list_for_each_entry(iommu_mm, &iommu_sva_mms, mm_list_elm)
                mmu_notifier_arch_invalidate_secondary_tlbs(iommu_mm->mm,
                                        data->start, data->end);

        if (data->free_on_completion)
                kfree(data);
}

void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end)
{
        struct kva_invalidation_work_data stack_data;

        if (!static_branch_unlikely(&iommu_sva_present))
                return;

        /*
         * Since iommu_sva_mms is an unbound list, iterating it in an atomic
         * context could introduce significant latency issues.
         */
        if (in_atomic()) {
                struct kva_invalidation_work_data *data =
                        kzalloc(sizeof(*data), GFP_ATOMIC);

                if (!data)
                        return;

                data->start = start;
                data->end = end;
                INIT_WORK(&data->work, invalidate_kva_func);
                /* the work handler is responsible for freeing this */
                data->free_on_completion = true;
                schedule_work(&data->work);
                return;
        }

        stack_data.start = start;
        stack_data.end = end;
        /* on-stack data must not be freed by the handler */
        stack_data.free_on_completion = false;
        invalidate_kva_func(&stack_data.work);
}
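
For context, the intended caller is the kernel TLB flush path from the
patch (the sketch below uses x86's flush_tlb_kernel_range() purely for
illustration, eliding the existing CPU flush logic), which is also why
the helper above has to cope with atomic callers:

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
        /* ... flush the CPU TLB for [start, end) as before ... */

        /* Drop any IOMMU-cached translations for the same KVA range. */
        iommu_sva_invalidate_kva_range(start, end);
}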
Thanks,
baolu