Message-ID: <1df2caa3-4c5d-4bd9-88bb-66a07bf1eb65@intel.com>
Date: Fri, 8 Nov 2024 12:31:56 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Rik van Riel <riel@...riel.com>, Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, x86@...nel.org,
"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org,
kernel-team@...a.com
Subject: Re: [PATCH] x86,tlb: update mm_cpumask lazily
On 11/8/24 11:31, Rik van Riel wrote:
> On busy multi-threaded workloads, there can be significant contention
> on the mm_cpumask at context switch time.
>
> Reduce that contention by updating mm_cpumask lazily, setting the CPU bit
> at context switch time (if not already set), and clearing the CPU bit at
> the first TLB flush sent to a CPU where the process isn't running.
>
> When a flurry of TLB flushes for a process happen, only the first one
> will be sent to CPUs where the process isn't running. The others will
> be sent to CPUs where the process is currently running.
So I guess it comes down to balancing:

  The cpumask_clear_cpu() happens on every mm switch, which can be
  thousands of times a second. But it's _relatively_ cheap: dozens to a
  couple hundred cycles.

with:

  Skipping the cpumask_clear_cpu() will cause more TLB flushes. It can
  cause at most one extra TLB flush each time a process is migrated
  off a CPU and never returns. This is _relatively_ expensive: on the
  order of thousands of cycles to send and receive an IPI.
Migrations are obviously the enemy here, but they're the enemy for lots
of _other_ reasons too, which is a really nice property.
The only thing I can think of that really worries me is some kind of
forked worker model where before this patch you would have:
* fork()
* run on CPU A
* ... migrate to CPU B
* malloc()/free(), needs to flush B only
* exit()
and after:
* fork()
* run on CPU A
* ... migrate to CPU B
* malloc()/free(), needs to flush A+B, including IPI
* exit()
Where that IPI wasn't needed at *all* before. But that's totally contrived.
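That scenario can be sketched with a toy model. This is just an
illustration of the lazy scheme as I read the patch, not the real
switch_mm()/flush_tlb_mm_range() code: the cpumask is a plain 64-bit
word, and all the names (lazy_switch_in(), lazy_flush(), etc.) are
made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 8

/* Toy mm: a bitmap of CPUs that may hold stale TLB entries, plus
 * where the mm is currently running. */
struct mm {
	uint64_t cpumask;        /* CPUs that may need a TLB flush */
	int running_on[NCPUS];   /* 1 if the mm is running on that CPU */
};

/* Old (eager) scheme for contrast: every switch away from the mm
 * clears the bit -- the contended atomic the patch avoids. */
static void eager_switch_out(struct mm *mm, int cpu)
{
	mm->running_on[cpu] = 0;
	mm->cpumask &= ~(1ULL << cpu);
}

/* Lazy scheme: switch-out leaves the cpumask alone... */
static void lazy_switch_out(struct mm *mm, int cpu)
{
	mm->running_on[cpu] = 0;   /* cpumask untouched: no atomic here */
}

/* ...switch-in sets the bit only if it isn't already set... */
static void lazy_switch_in(struct mm *mm, int cpu)
{
	mm->running_on[cpu] = 1;
	if (!(mm->cpumask & (1ULL << cpu)))
		mm->cpumask |= 1ULL << cpu;
}

/* ...and a flush IPIs every CPU still in the mask, then clears the
 * bits for CPUs where the mm isn't running, so later flushes only
 * target CPUs actually running the process.  Returns the IPI mask. */
static uint64_t lazy_flush(struct mm *mm)
{
	uint64_t ipi_mask = mm->cpumask;
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++) {
		if ((mm->cpumask & (1ULL << cpu)) && !mm->running_on[cpu])
			mm->cpumask &= ~(1ULL << cpu);  /* lazy clear */
	}
	return ipi_mask;
}
```

Running the fork/migrate/malloc sequence above through this model:
run on CPU 0, migrate to CPU 1, then flush. The first flush IPIs
both CPUs (the extra IPI that the eager scheme would have avoided),
but it also clears CPU 0's bit, so a second flush hits only CPU 1.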
So I think this is the kind of thing we'd want to apply to -rc1 and let
the robots poke at it for a few weeks. But it does seem like a sound
idea to me.