[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z0jIsYsuo_9w16tK@localhost.localdomain>
Date: Thu, 28 Nov 2024 14:46:57 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Peter Zijlstra <peterz@...radead.org>, Rik van Riel <riel@...riel.com>
Cc: kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org,
Ingo Molnar <mingo@...nel.org>, Dave Hansen <dave.hansen@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Mel Gorman <mgorman@...e.de>
Subject: Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7:
will-it-scale.per_thread_ops 13.2% regression
On 28-Nov-2024 10:57:35 PM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on:
>
>
> commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
AFAIU, this commit changes the way TLB flushes are inhibited when
context switching away from a mm. This means that one additional TLB
flush is performed to a given CPU even after it has context switched
away from the mm, and only then is the mm_cpumask cleared for that CPU.
This could result in additional TLB flush IPI overhead in specific
scenarios where the IPIs are typically triggered after a thread has
context-switched out.
May I recommend looking into a scheme similar to rseq mm_cid for this ?
We're already adding a per-mm per-cpu data:
mm_struct:
/**
* @pcpu_cid: Per-cpu current cid.
*
* Keep track of the currently allocated mm_cid for each cpu.
* The per-cpu mm_cid values are serialized by their respective
* runqueue locks.
*/
struct mm_cid __percpu *pcpu_cid;
struct mm_cid {
u64 time;
int cid;
int recent_cid;
};
I suspect you could use a similar per-cpu data structure per-mm
to keep track of the pending TLB flush mask, and update it simply with
load/store to per-CPU data rather than have to cacheline-bounce all over
the place due to frequent mm_cpumask atomic updates.
Then you get all the benefits without introducing a window where useless
TLB flush IPIs get triggered.
Of course it's slightly less compact in terms of memory footprint than a
cpumask, but you gain a lot by removing cache line bouncing on this
frequent context switch code path.
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Powered by blists - more mailing lists