Message-ID: <8bf303a222ba27f3a86b357db58ee3df3fa7f82e.camel@surriel.com>
Date: Thu, 28 Nov 2024 21:52:48 -0500
From: Rik van Riel <riel@...riel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Peter Zijlstra
	 <peterz@...radead.org>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev, 
	lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org, Ingo Molnar
	 <mingo@...nel.org>, Dave Hansen <dave.hansen@...el.com>, Linus Torvalds
	 <torvalds@...ux-foundation.org>, Mel Gorman <mgorman@...e.de>
Subject: Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression

On Thu, 2024-11-28 at 14:46 -0500, Mathieu Desnoyers wrote:
> 
> I suspect you could use a similar per-cpu data structure per-mm
> to keep track of the pending TLB flush mask, and update it simply
> with load/store to per-CPU data rather than having to cacheline-bounce
> all over the place due to frequent mm_cpumask atomic updates.
> 
> Then you get all the benefits without introducing a window where
> useless TLB flush IPIs get triggered.
> 
> Of course it's slightly less compact in terms of memory footprint
> than a cpumask, but you gain a lot by removing cache line bouncing
> on this frequent context-switch code path.
> 
> Thoughts?

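For concreteness, here is roughly what I read you as proposing;
this is a hypothetical sketch only, and all the names are made up:

/*
 * Hypothetical sketch, not real kernel code: one cacheline-aligned
 * entry per CPU, hanging off the mm (e.g. via alloc_percpu()),
 * updated with plain per-CPU stores instead of atomic mm_cpumask
 * updates. The flusher walks these to build its target set.
 */
struct mm_tlb_track {
	bool active;		/* this mm currently loaded on this CPU */
	u64 flush_gen;		/* last TLB flush generation serviced */
} ____cacheline_aligned_in_smp;

/* hypothetical field in struct mm_struct: */
/*	struct mm_tlb_track __percpu *tlb_track; */
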
The first thought that comes to mind is that we already
have a per-CPU variable (cpu_tlbstate.loaded_mm) indicating
which mm is currently loaded on each CPU.

We could probably just skip sending IPIs to CPUs that do
not have the mm_struct currently loaded.

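Something like the following, say. This is a rough sketch, not
actual kernel code; cpu_tlbstate.loaded_mm exists in
arch/x86/mm/tlb.c, but the helper name here is made up:

static void trim_flush_mask(struct mm_struct *mm, struct cpumask *mask)
{
	int cpu;

	cpumask_copy(mask, mm_cpumask(mm));
	for_each_cpu(cpu, mask) {
		/* Plain racy read; see below for why that is OK. */
		if (READ_ONCE(per_cpu(cpu_tlbstate.loaded_mm, cpu)) != mm)
			cpumask_clear_cpu(cpu, mask);
	}
}
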
This can race against switch_mm_irqs_off() on a CPU that is
switching to the mm at the same time as the TLB flush. That
should be fine, because the switching CPU cannot load TLB
entries from page tables that were already cleared before the
flush was issued.

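The interleaving we have to tolerate looks roughly like this
(simplified; the real switch_mm_irqs_off() also deals with lazy
TLB mode and with PCID reuse, the latter via the tlb_gen checks):

  flusher (CPU A)                   switcher (CPU B)
  ---------------                   ----------------
  clear PTEs
  loaded_mm(B) != mm -> skip IPI
                                    write cpu_tlbstate.loaded_mm = mm
                                    load CR3 with mm->pgd
                                    /* page walks only ever see the
                                       already-cleared PTEs */
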
However, it does mean we cannot safely clear bits in the
mm_cpumask: a race between one CPU clearing a bit and another
CPU setting it would be very hard to catch, unless we can
figure out some clever memory ordering scheme there.
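
The problematic interleaving would be something like this (again
simplified, with made-up ordering):

  flusher (CPU A)                   switcher (CPU B)
  ---------------                   ----------------
                                    cpumask_set_cpu(B, mm_cpumask(mm))
  loaded_mm(B) != mm yet
  cpumask_clear_cpu(B, mm_cpumask(mm))
                                    write cpu_tlbstate.loaded_mm = mm

  CPU B's bit is now lost, and later flushes for this mm would
  never target CPU B even though it is running the mm.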

-- 
All Rights Reversed.
