Message-ID: <20251022104005.907410538@linutronix.de>
Date: Wed, 22 Oct 2025 14:55:19 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Gabriele Monaco <gmonaco@...hat.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Michael Jeanson <mjeanson@...icios.com>,
Jens Axboe <axboe@...nel.dk>,
"Paul E. McKenney" <paulmck@...nel.org>,
"Gautham R. Shenoy" <gautham.shenoy@....com>,
Florian Weimer <fweimer@...hat.com>,
Tim Chen <tim.c.chen@...el.com>,
Yury Norov <yury.norov@...il.com>
Subject: [patch V2 00/20] sched: Rewrite MM CID management
This is a follow-up to the V1 series, which can be found here:

   https://lore.kernel.org/20251015164952.694882104@linutronix.de

The V1 cover letter contains a detailed analysis of the issues.

TLDR: The CID management is way too complex and adds significant overhead
to scheduler hotpaths.

The series rewrites MM CID management in a simpler way, which focuses on
low overhead in the scheduler while maintaining per task CIDs as long as
the number of threads does not exceed the number of possible CPUs.
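
The rule stated above can be sketched in plain userspace C. This is only an
illustration of the threshold, not code from the series: the enum, the
function name and both parameters are hypothetical, and the real
implementation may apply additional conditions when switching modes.

```c
/*
 * Hypothetical illustration, not taken from the series: pick the CID
 * ownership mode from the thread count and the number of possible CPUs.
 */
enum cid_mode {
	CID_MODE_PER_TASK,	/* every thread keeps its own CID */
	CID_MODE_PER_CPU,	/* CIDs are owned by CPUs instead */
};

static enum cid_mode cid_pick_mode(unsigned int nr_threads,
				   unsigned int nr_possible_cpus)
{
	/* Per task CIDs are kept while threads do not exceed possible CPUs */
	if (nr_threads <= nr_possible_cpus)
		return CID_MODE_PER_TASK;
	return CID_MODE_PER_CPU;
}
```

With 8 possible CPUs, a process with 8 threads would stay in per task mode
in this sketch, while a ninth thread would push it into per CPU mode.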

The series is based on the V5 series of the rseq rewrite:

   https://lore.kernel.org/20251022121836.019469732@linutronix.de/

which is also available from git:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/perf

The series on top of the rseq/perf branch is available from git as well:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rseq/cid

Changes vs. V1:

  - Use num_possible_cpus() instead of nr_cpu_ids. - PeterZ

  - Cache the number of possible CPUs as that is constant after init and
    expose the cached value via num_possible_cpus() instead of calculating
    the same constant weight over and over.

  - Rename cpumask_or_weight() to cpumask_or_and_calc_weight() and use the
    weight helper macro - Yury

  - Fix the bogus condition in the task to CPU fixup - PeterZ

  - Add a transitional state bit which prevents CID space exhaustion when
    switching from per CPU mode to per task mode and fix up the
    corresponding logic all over the place.
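
The caching change in the list above can be illustrated with a small
userspace sketch. It is not the kernel implementation: the bitmap type, its
size and all function names are hypothetical stand-ins. It only shows the
idea of counting the set bits once at init and returning the stored value
afterwards.

```c
#include <stdint.h>

#define NR_WORDS 4	/* hypothetical: room for 256 CPUs */

/* Hypothetical stand-in for a CPU mask */
struct cpu_bitmap {
	uint64_t bits[NR_WORDS];
};

static struct cpu_bitmap possible_mask;		/* fixed after init */
static unsigned int cached_nr_possible;		/* computed once at init */

/* Count the set bits by walking every word of the mask */
static unsigned int bitmap_weight_slow(const struct cpu_bitmap *m)
{
	unsigned int weight = 0;

	for (int i = 0; i < NR_WORDS; i++) {
		uint64_t word = m->bits[i];

		/* Clear the lowest set bit until the word is empty */
		while (word) {
			word &= word - 1;
			weight++;
		}
	}
	return weight;
}

/* Called once: store the mask and cache its weight */
static void init_possible(const struct cpu_bitmap *m)
{
	possible_mask = *m;
	cached_nr_possible = bitmap_weight_slow(&possible_mask);
}

/* Hot path: return the cached constant instead of recounting */
static unsigned int nr_possible_cpus_cached(void)
{
	return cached_nr_possible;
}
```

Because the mask does not change after init in this sketch, every later
call is a single load rather than a walk over the whole bitmap.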
Thanks,
tglx
---
Thomas Gleixner (20):
      sched/mmcid: Revert the complex CID management
      sched/mmcid: Use proper data structures
      sched/mmcid: Cacheline align MM CID storage
      sched: Fixup whitespace damage
      sched/mmcid: Move scheduler code out of global header
      sched/mmcid: Prevent pointless work in mm_update_cpus_allowed()
      cpumask: Introduce cpumask_or_and_calc_weight()
      sched/mmcid: Use cpumask_or_and_calc_weight()
      cpumask: Cache num_possible_cpus()
      sched/mmcid: Convert mm CID mask to a bitmap
      signal: Move MMCID exit out of sighand lock
      sched/mmcid: Move initialization out of line
      sched/mmcid: Provide precomputed maximal value
      sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutex
      sched/mmcid: Introduce per task/CPU ownership infrastrcuture
      sched/mmcid: Provide new scheduler CID mechanism
      sched/mmcid: Provide CID ownership mode fixup functions
      irqwork: Move data struct to a types header
      sched/mmcid: Implement deferred mode change
      sched/mmcid: Switch over to the new mechanism

 include/linux/bitmap.h         |   16
 include/linux/cpumask.h        |   26 +
 include/linux/irq_work.h       |    9
 include/linux/irq_work_types.h |   14
 include/linux/mm_types.h       |  125 ------
 include/linux/rseq.h           |   27 -
 include/linux/rseq_types.h     |   71 +++
 include/linux/sched.h          |   19
 init/init_task.c               |    3
 kernel/cpu.c                   |   15
 kernel/exit.c                  |    1
 kernel/fork.c                  |    7
 kernel/sched/core.c            |  815 +++++++++++++++++++----------------------
 kernel/sched/sched.h           |  396 ++++++++-----------
 kernel/signal.c                |    2
 lib/bitmap.c                   |    6
 16 files changed, 727 insertions(+), 825 deletions(-)