Message-ID: <563bd363-d806-4ee5-bcfe-05725055598d@gmail.com>
Date: Tue, 12 Aug 2025 09:17:01 +0800
From: Ethan Zhao <etzhao1900@...il.com>
To: Dave Hansen <dave.hansen@...el.com>, Uladzislau Rezki <urezki@...il.com>,
Baolu Lu <baolu.lu@...ux.intel.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>, Alistair Popple <apopple@...dia.com>,
Peter Zijlstra <peterz@...radead.org>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, Yi Lai <yi1.lai@...el.com>,
iommu@...ts.linux.dev, security@...nel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On 8/11/2025 9:55 PM, Dave Hansen wrote:
> On 8/11/25 02:15, Uladzislau Rezki wrote:
>>> kernel_pte_work.list is a globally shared variable; it would force the
>>> producer pte_free_kernel() and the consumer kernel_pte_work_func() to
>>> operate in serialized timing. On a large system, I don't think you
>>> designed this deliberately 🙂
>>>
>> Sorry for jumping in.
>>
>> Agreed, unless this is never considered a hot path or something that can
>> be really contended. It looks like you could just use a per-CPU llist to
>> drain things.
>
> Remember, the code that has to run just before all this sent an IPI to
> every single CPU on the system to have them do a (on x86 at least)
> pretty expensive TLB flush.
>
It can easily be identified as a bottleneck by multi-CPU stress tests that
involve frequent process creation and destruction, similar to the workload
of a heavily loaded multi-process Apache web server.
Hot or cold path?
> If this is a hot path, we have bigger problems on our hands: the full
> TLB flush on every CPU.
Perhaps not "we": an IPI-driven TLB flush does not seem to be a mechanism
shared by all CPU architectures, at least not on ARM as far as I know.
>
> So, sure, there are a million ways to make this deferred freeing more
> scalable. But the code that's here is dirt simple and self contained. If
> someone has some ideas for something that's simpler and more scalable,
> then I'm totally open to it.
>
> But this is _not_ the place to add complexity to get scalability.
At least, please don't add a bottleneck; how complex would it be to avoid
one?
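
Something along these lines is what I have in mind, just a rough, untested
sketch rather than the posted patch. Only the kernel_pte_work_func() name
echoes this thread; the per-CPU structure, the helper names and the init
path are made up for illustration. A per-CPU llist keeps producers on
different CPUs from contending on a single list head:

#include <linux/init.h>
#include <linux/llist.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

struct pte_free_batch {
	struct llist_head head;		/* lock-free producer side */
	struct work_struct work;	/* drains this CPU's batch */
};

static DEFINE_PER_CPU(struct pte_free_batch, pte_free_batch);

/* Consumer: runs per CPU and touches only its own list. */
static void kernel_pte_work_func(struct work_struct *work)
{
	struct pte_free_batch *b =
		container_of(work, struct pte_free_batch, work);
	struct llist_node *node, *next;

	llist_for_each_safe(node, next, llist_del_all(&b->head))
		__free_page(virt_to_page(node));
}

/*
 * Producer: queue a no-longer-used kernel PTE page for deferred
 * freeing.  The llist node is stored in the page itself, which
 * assumes the page contents may be scribbled on once it is queued;
 * if they must stay intact until the flush, embed the node elsewhere.
 */
static void kernel_pte_defer_free(struct page *page)
{
	struct pte_free_batch *b = get_cpu_ptr(&pte_free_batch);

	if (llist_add(page_address(page), &b->head))
		schedule_work(&b->work);
	put_cpu_ptr(&pte_free_batch);
}

static int __init pte_free_batch_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		INIT_WORK(&per_cpu(pte_free_batch, cpu).work,
			  kernel_pte_work_func);
	return 0;
}
early_initcall(pte_free_batch_init);

Whether the extra per-CPU machinery is worth it on this path is exactly the
trade-off you raised, of course.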
Thanks,
Ethan