Message-ID: <df5353e2-1d54-476b-90ab-e673686dcc41@linux.intel.com>
Date: Thu, 17 Jul 2025 09:43:19 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>, Dave Hansen <dave.hansen@...el.com>,
Alistair Popple <apopple@...dia.com>, Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, "Tested-by : Yi Lai" <yi1.lai@...el.com>,
iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On 7/16/25 20:08, Jason Gunthorpe wrote:
> On Wed, Jul 16, 2025 at 02:34:04PM +0800, Baolu Lu wrote:
>>>> @@ -654,6 +656,9 @@ struct iommu_ops {
>>>>
>>>> int (*def_domain_type)(struct device *dev);
>>>>
>>>> + void (*paging_cache_invalidate)(struct iommu_device *dev,
>>>> + unsigned long start, unsigned long end);
>>>
>>> How would you even implement this in a driver?
>>>
>>> You either flush the whole iommu, in which case who needs a range, or
>>> the driver has to iterate over the PASID list, in which case it
>>> doesn't really improve the situation.
>>
>> The Intel iommu driver supports flushing all SVA PASIDs with a single
>> request in the invalidation queue.
>
> How? All PASIDs != 0? The HW has no notion of an SVA PASID vs. a
> non-SVA one. This is just flushing almost everything.
The Intel IOMMU driver allocates one dedicated domain ID that is shared
by all SVA domains, so it can flush every cache entry tagged with that
domain ID with a single invalidation request.
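
As a very rough illustration of how the proposed
->paging_cache_invalidate() callback could boil down to one request
(sva_domain_did below is a made-up field name, and the driver would of
course pick whatever queued invalidation descriptor is appropriate):

static void intel_iommu_kva_cache_invalidate(struct iommu_device *dev,
                                             unsigned long start,
                                             unsigned long end)
{
        struct intel_iommu *iommu = container_of(dev, struct intel_iommu, iommu);

        /*
         * All SVA domains share one domain ID in this sketch, so a single
         * domain-selective invalidation covers every SVA PASID; start/end
         * are ignored because the whole domain is flushed.
         */
        qi_flush_iotlb(iommu, iommu->sva_domain_did, 0, 0, DMA_TLB_DSI_FLUSH);
}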
>
>>> If this is a concern I think the better answer is to do a deferred free
>>> like the mm can sometimes do where we thread the page tables onto a
>>> linked list, flush the CPU cache and push it all into a work which
>>> will do the iommu flush before actually freeing the memory.
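
Just to make sure I follow the deferred-free idea, I imagine it would
look roughly like the sketch below (kva_pgtable_batch and friends are
made-up names, not existing APIs). The caller would unhook the page
tables onto the list, flush the CPU TLB, and then schedule the work:

struct kva_pgtable_batch {
        struct work_struct work;
        struct list_head pages;         /* page tables unhooked from the KVA tables */
        unsigned long start, end;
};

static void kva_pgtable_batch_free(struct work_struct *work)
{
        struct kva_pgtable_batch *batch =
                container_of(work, struct kva_pgtable_batch, work);
        struct page *page, *next;

        /* IOMMU flush before the page-table pages are actually freed. */
        iommu_sva_invalidate_kva_range(batch->start, batch->end);

        list_for_each_entry_safe(page, next, &batch->pages, lru) {
                list_del(&page->lru);
                __free_page(page);
        }
        kfree(batch);
}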
>>
>> Is it a workable solution to use schedule_work() to queue the KVA cache
>> invalidation as a work item in the system workqueue? By doing so, we
>> wouldn't need the spinlock to protect the list anymore.
>
> Maybe.
>
> MM is also more careful to pull the invalidation out of some of the
> locks; I don't know what the KVA side is like...
How about something like the following? It compiles but is untested.
struct kva_invalidation_work_data {
        struct work_struct work;
        unsigned long start;
        unsigned long end;
        bool free_on_completion;
};

static void invalidate_kva_func(struct work_struct *work)
{
        struct kva_invalidation_work_data *data =
                container_of(work, struct kva_invalidation_work_data, work);
        struct iommu_mm_data *iommu_mm;

        guard(mutex)(&iommu_sva_lock);
        list_for_each_entry(iommu_mm, &iommu_sva_mms, mm_list_elm)
                mmu_notifier_arch_invalidate_secondary_tlbs(iommu_mm->mm,
                                        data->start, data->end);

        if (data->free_on_completion)
                kfree(data);
}

void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end)
{
        struct kva_invalidation_work_data stack_data;

        if (!static_branch_unlikely(&iommu_sva_present))
                return;

        /*
         * Since iommu_sva_mms is an unbound list, iterating it in an atomic
         * context could introduce significant latency issues.
         */
        if (in_atomic()) {
                struct kva_invalidation_work_data *data =
                        kzalloc(sizeof(*data), GFP_ATOMIC);

                if (!data)
                        return;

                data->start = start;
                data->end = end;
                INIT_WORK(&data->work, invalidate_kva_func);
                /* the work handler is responsible for freeing this */
                data->free_on_completion = true;
                schedule_work(&data->work);
                return;
        }

        stack_data.start = start;
        stack_data.end = end;
        /* on-stack data must not be freed by the handler */
        stack_data.free_on_completion = false;
        invalidate_kva_func(&stack_data.work);
}
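
For context, the intended caller is the kernel TLB flush path from the
patch (the sketch below uses x86's flush_tlb_kernel_range() purely for
illustration, eliding the existing CPU flush logic), which is also why
the helper above has to cope with atomic callers:

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
        /* ... flush the CPU TLB for [start, end) as before ... */

        /* Drop any IOMMU-cached translations for the same KVA range. */
        iommu_sva_invalidate_kva_range(start, end);
}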
Thanks,
baolu