Message-ID: <f58a6825-e53a-4751-97cc-0891052936f1@linux.intel.com>
Date: Wed, 16 Jul 2025 14:34:04 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>, Dave Hansen <dave.hansen@...el.com>,
Alistair Popple <apopple@...dia.com>, Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, "Tested-by : Yi Lai" <yi1.lai@...el.com>,
iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On 7/15/25 20:25, Jason Gunthorpe wrote:
> On Tue, Jul 15, 2025 at 01:55:01PM +0800, Baolu Lu wrote:
>> Yes, the mm list (the struct mm of every process bound to a device) is
>> unbounded and can theoretically grow indefinitely. This results in an
>> unpredictably long critical section.
>
> Every MM has a unique PASID so I don't see how you can avoid this.
>
>> @@ -654,6 +656,9 @@ struct iommu_ops {
>>
>> int (*def_domain_type)(struct device *dev);
>>
>> + void (*paging_cache_invalidate)(struct iommu_device *dev,
>> + unsigned long start, unsigned long end);
>
> How would you even implement this in a driver?
>
> You either flush the whole iommu, in which case who needs a range, or
> the driver has to iterate over the PASID list, in which case it
> doesn't really improve the situation.
The Intel iommu driver supports flushing all SVA PASIDs with a single
request in the invalidation queue. I am not sure whether other IOMMU
implementations support this as well, so you are right: it doesn't
generally improve the situation.
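For illustration, something like the below is what I had in mind on the
Intel side (a rough sketch only; qi_flush_sva_all() is a made-up name
standing in for that single invalidation-queue request, not the real
driver code):

static void intel_paging_cache_invalidate(struct iommu_device *dev,
					  unsigned long start,
					  unsigned long end)
{
	struct intel_iommu *iommu = container_of(dev, struct intel_iommu,
						 iommu);

	/*
	 * A single queued-invalidation request flushes the IOTLB
	 * entries of all SVA PASIDs, so the range is ignored. An
	 * IOMMU without such a request would have to iterate its
	 * PASID list here instead, which, as you say, doesn't
	 * really improve the situation.
	 */
	qi_flush_sva_all(iommu);
}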
>
> If this is a concern I think the better answer is to do a deferred free
> like the mm can sometimes do where we thread the page tables onto a
> linked list, flush the CPU cache and push it all into a work which
> will do the iommu flush before actually freeing the memory.
Would it be workable to use schedule_work() to queue the KVA cache
invalidation as a work item on the system workqueue? That way, we would
no longer need the spinlock to protect the list.
We would then need another interface, perhaps named
iommu_sva_flush_kva_inv_wq(), to guarantee that all flush work has
completed before the pages are actually freed, roughly as sketched
below.
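Something along these lines (an untested sketch; everything except
iommu_sva_flush_kva_inv_wq() is a made-up name here, and
iommu_sva_invalidate_kva_range() stands in for the flush helper in this
patch):

#include <linux/llist.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct kva_inv_range {
	struct llist_node node;
	unsigned long start;
	unsigned long end;
};

static LLIST_HEAD(kva_inv_ranges);

static void kva_inv_work_fn(struct work_struct *work);
static DECLARE_WORK(kva_inv_work, kva_inv_work_fn);

static void kva_inv_work_fn(struct work_struct *work)
{
	struct llist_node *batch = llist_del_all(&kva_inv_ranges);
	struct kva_inv_range *range, *tmp;

	/*
	 * Sleepable context: the bound-mm list can be walked under
	 * a mutex here, so no spinlock in the TLB flush path.
	 */
	llist_for_each_entry_safe(range, tmp, batch, node) {
		iommu_sva_invalidate_kva_range(range->start, range->end);
		kfree(range);
	}
}

/* Called from the kernel TLB flush path; must not sleep. */
void iommu_sva_queue_kva_inv(unsigned long start, unsigned long end)
{
	struct kva_inv_range *range = kmalloc(sizeof(*range), GFP_ATOMIC);

	if (!range)
		return; /* a real version would fall back to a full flush */

	range->start = start;
	range->end = end;
	llist_add(&range->node, &kva_inv_ranges);
	schedule_work(&kva_inv_work);
}

/*
 * Guarantee that all queued invalidations have completed before
 * the caller actually frees the pages.
 */
void iommu_sva_flush_kva_inv_wq(void)
{
	flush_work(&kva_inv_work);
}

The llist avoids taking the spinlock in the flush path entirely, at the
cost of a GFP_ATOMIC allocation there.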
> One of the KPTI options might be easier at that point..
>
> Jason
Thanks,
baolu