Message-ID: <8bf303a222ba27f3a86b357db58ee3df3fa7f82e.camel@surriel.com>
Date: Thu, 28 Nov 2024 21:52:48 -0500
From: Rik van Riel <riel@...riel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Peter Zijlstra
	 <peterz@...radead.org>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev, 
	lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org, Ingo Molnar
	 <mingo@...nel.org>, Dave Hansen <dave.hansen@...el.com>, Linus Torvalds
	 <torvalds@...ux-foundation.org>, Mel Gorman <mgorman@...e.de>
Subject: Re: [tip:x86/mm] [x86/mm/tlb]  209954cbc7: 
 will-it-scale.per_thread_ops 13.2% regression

On Thu, 2024-11-28 at 14:46 -0500, Mathieu Desnoyers wrote:
> 
> I suspect you could use a similar per-cpu data structure per-mm
> to keep track of the pending TLB flush mask, and update it simply
> with load/store to per-CPU data rather than have to cacheline-bounce
> all over the place due to frequent mm_cpumask atomic updates.
> 
> Then you get all the benefits without introducing a window where
> useless TLB flush IPIs get triggered.
> 
> Of course it's slightly less compact in terms of memory footprint
> than a cpumask, but you gain a lot by removing cache line bouncing
> on this frequent context switch code path.
> 
> Thoughts ?

The first thought that comes to mind is that we already
have a per-CPU variable indicating which is the currently
loaded mm on that CPU.

We could probably just skip sending IPIs to CPUs that do
not have the mm_struct currently loaded.

This can race against switch_mm_irqs_off() on a CPU
switching to that mm simultaneously with the TLB flush,
which should be fine because that CPU cannot load TLB
entries from previously cleared page tables.

However, it does mean we cannot safely clear bits
out of the mm_cpumask, because a race between clearing
the bit on one CPU, and setting it on another would not
be something we could easily catch at all, unless we
can figure out some clever memory ordering thing there.
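
To make the idea concrete, here is a rough userspace sketch, not
kernel code. The names struct mm, loaded_mm[], sketch_switch_mm()
and sketch_flush_targets() are hypothetical stand-ins for mm_struct,
the per-CPU "currently loaded mm" variable, switch_mm_irqs_off() and
the flush path. It assumes bits in the mask are only ever set, never
cleared, and it does not attempt to model the memory ordering
question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical small machine */

/* Stand-in for mm_struct; cpumask plays the role of mm_cpumask(mm). */
struct mm {
	_Atomic uint64_t cpumask;
};

/* Stand-in for the per-CPU "currently loaded mm" variable. */
static struct mm *_Atomic loaded_mm[NR_CPUS];

/* Stand-in for switch_mm_irqs_off(): set our bit, record the new mm. */
static void sketch_switch_mm(unsigned int cpu, struct mm *next)
{
	assert(cpu < NR_CPUS);
	/* The bit is set here and never cleared anywhere in this sketch. */
	atomic_fetch_or(&next->cpumask, UINT64_C(1) << cpu);
	atomic_store(&loaded_mm[cpu], next);
}

/*
 * Return the CPUs that would get a flush IPI: those with their bit
 * set in the mask AND with this mm currently loaded. Stale bits left
 * behind by CPUs that switched away are simply skipped.
 */
static uint64_t sketch_flush_targets(struct mm *mm)
{
	uint64_t mask = atomic_load(&mm->cpumask);
	uint64_t targets = 0;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		if (atomic_load(&loaded_mm[cpu]) == mm)
			targets |= UINT64_C(1) << cpu;
	}
	return targets;
}
```

If CPU 1 runs mm A and then switches to mm B, its bit stays set in
A's mask, but sketch_flush_targets(&A) no longer includes CPU 1,
because loaded_mm[1] now points at B.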

-- 
All Rights Reversed.
