Message-ID: <aJoEiajJwlWuXyax@pc636>
Date: Mon, 11 Aug 2025 16:56:09 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Uladzislau Rezki <urezki@...il.com>, Ethan Zhao <etzhao1900@...il.com>,
Baolu Lu <baolu.lu@...ux.intel.com>,
Jason Gunthorpe <jgg@...dia.com>, Joerg Roedel <joro@...tes.org>,
Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jann Horn <jannh@...gle.com>,
Vasant Hegde <vasant.hegde@....com>,
Alistair Popple <apopple@...dia.com>,
Peter Zijlstra <peterz@...radead.org>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, Yi Lai <yi1.lai@...el.com>,
iommu@...ts.linux.dev, security@...nel.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB
flush
On Mon, Aug 11, 2025 at 06:55:52AM -0700, Dave Hansen wrote:
> On 8/11/25 02:15, Uladzislau Rezki wrote:
> >> kernel_pte_work.list is a global shared variable; it would make the
> >> producer pte_free_kernel() and the consumer kernel_pte_work_func()
> >> operate in serialized fashion. On a large system, I don't think you
> >> designed this deliberately 🙂
> >>
> > Sorry for jumping in.
> >
> > Agreed, unless this is never considered a hot path or something that
> > can be really contended. It looks like you could use just a per-cpu
> > llist to drain things.
>
> Remember, the code that has to run just before all this sent an IPI to
> every single CPU on the system to have them do a (on x86 at least)
> pretty expensive TLB flush.
>
> If this is a hot path, we have bigger problems on our hands: the full
> TLB flush on every CPU.
>
> So, sure, there are a million ways to make this deferred freeing more
> scalable. But the code that's here is dirt simple and self-contained. If
> someone has some ideas for something that's simpler and more scalable,
> then I'm totally open to it.
>
You could also look at removing the &kernel_pte_work.lock. Replace it
with llist_add() on the adding side and
llist_for_each_safe(n, t, llist_del_all(&list)) on the removing side,
so you do not need the guard(spinlock) stuff. Unless I am missing
something.
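
Something like the below, as an untested sketch. struct pte_free_item
and pte_free_kernel_defer() are made-up names for illustration; only
kernel_pte_work, pte_free_kernel() and kernel_pte_work_func() come from
the patch under discussion:

#include <linux/llist.h>
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/gfp.h>

/* Hypothetical wrapper so each deferred page carries an llist_node. */
struct pte_free_item {
	struct llist_node node;
	struct page *page;
};

static LLIST_HEAD(kernel_pte_free_list);

static void kernel_pte_work_func(struct work_struct *work);
static DECLARE_WORK(kernel_pte_work, kernel_pte_work_func);

/* Adding side (called from pte_free_kernel()): a lock-free push,
 * no spinlock needed. */
static void pte_free_kernel_defer(struct pte_free_item *item)
{
	llist_add(&item->node, &kernel_pte_free_list);
	schedule_work(&kernel_pte_work);
}

/*
 * Removing side: llist_del_all() detaches the whole list atomically,
 * so the walk below runs with no lock held.
 */
static void kernel_pte_work_func(struct work_struct *work)
{
	struct llist_node *n, *t;

	llist_for_each_safe(n, t, llist_del_all(&kernel_pte_free_list)) {
		struct pte_free_item *item =
			llist_entry(n, struct pte_free_item, node);

		__free_page(item->page);
		kfree(item);
	}
}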
>
> But this is _not_ the place to add complexity to get scalability.
>
OK.
--
Uladzislau Rezki