Message-ID: <20250113074231.61638-1-gmonaco@redhat.com>
Date: Mon, 13 Jan 2025 08:42:28 +0100
From: Gabriele Monaco <gmonaco@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: Juri Lelli <juri.lelli@...hat.com>,
Gabriele Monaco <gmonaco@...hat.com>
Subject: [PATCH v4 0/3] sched: Restructure task_mm_cid_work for predictability
This patchset moves the task_mm_cid_work to a preemptible and migratable
context. This reduces the impact of this work on the scheduling latency
of real-time tasks.
The change also makes the recurrence of the work a bit more predictable.
We also add optimisations and fixes to make sure the task_mm_cid_work
works as intended.
The behaviour causing the latency comes from commit 223baf9d17f2
("sched: Fix performance regression introduced by mm_cid"), which
introduced a task work tied to the scheduler tick.
That approach presents two possible issues:
* the task work runs before returning to user space and therefore adds
scheduling latency (of an order of magnitude that is significant on
PREEMPT_RT)
* periodic tasks with a short runtime are less likely to be running
during the tick, hence they might not run the task work at all
Patch 1 allows the mm_cids to be actually compacted when a process
reduces its number of threads. This was not the case before, since the
same mm_cids were reused to improve cache locality. More details in [3].
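
To illustrate the idea behind patch 1, here is a userspace toy model. It
is not the kernel code: struct toy_mm, toy_cid_limit(), toy_cid_get() and
toy_cid_put() are hypothetical names made up for this sketch, and the
limit of min(threads, allowed CPUs) is an assumption based on the
"nr thread/affinity" note in the V1 changelog. A thread keeps its previous
cid for locality only while that cid is below the current limit; otherwise
it takes the lowest free one, so the ids end up compacted.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CIDS 64

/* Toy stand-in for per-mm cid state; NOT the kernel's data structure. */
struct toy_mm {
	bool used[TOY_MAX_CIDS];
	int nr_threads;       /* threads currently alive */
	int nr_cpus_allowed;  /* CPUs in the affinity mask */
};

/* Assumed upper bound for cids: min(threads, allowed CPUs), clamped. */
static int toy_cid_limit(const struct toy_mm *mm)
{
	int limit = mm->nr_threads < mm->nr_cpus_allowed ?
		    mm->nr_threads : mm->nr_cpus_allowed;

	return limit < TOY_MAX_CIDS ? limit : TOY_MAX_CIDS;
}

/* Release a cid previously returned by toy_cid_get(). */
static void toy_cid_put(struct toy_mm *mm, int cid)
{
	assert(cid >= 0 && cid < TOY_MAX_CIDS);
	mm->used[cid] = false;
}

/*
 * Get a cid. Reuse last_cid (cache locality) only if it is free and
 * below the current limit; otherwise fall back to the lowest free cid.
 * Returns -1 if no cid below the limit is free.
 */
static int toy_cid_get(struct toy_mm *mm, int last_cid)
{
	int limit = toy_cid_limit(mm);

	if (last_cid >= 0 && last_cid < limit && !mm->used[last_cid]) {
		mm->used[last_cid] = true;
		return last_cid;
	}
	for (int cid = 0; cid < limit; cid++) {
		if (!mm->used[cid]) {
			mm->used[cid] = true;
			return cid;
		}
	}
	return -1;
}
```

In this model, if a process drops from 4 threads to 2 and the survivors
previously held cids 2 and 3, their next toy_cid_get() calls return cids
below 2 instead of the stale values.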
Patch 2 contains the main changes, removing the task_work on the
scheduler tick and using a delayed_work instead.
Additionally, we terminate the call immediately if we see that no mm_cid
is actually active, which could happen for processes that sleep for a
long time or that have exited but whose mm has not been freed yet.
Patch 3 adds a selftest to validate the functionality of the
task_mm_cid_work (i.e. to compact the mm_cids). The test fails if patch
1 is not applied and is flaky without patch 2. We expect it to always
pass with the entire patchset applied.
Changes since V3 [1]:
* Fixes on the selftest
    * Minor style issues in comments and indentation
    * Use of perror where possible
    * Add a barrier to align threads execution
    * Improve test failure and error handling
Changes since V2 [2]:
* Change the order of the patches
* Merge patches changing the main delayed_work logic
* Improve the selftest to spawn one fewer thread and use the main one instead
Changes since V1 [3]:
* Re-arm the delayed_work at each invocation
* Cancel the work synchronously at mmdrop
* Remove next scan fields and completely rely on the delayed_work
* Shrink mm_cid allocation with number of threads/affinity (Mathieu Desnoyers)
* Add selftest
An overhead comparison is available in [3].
[1] - https://lore.kernel.org/linux-kernel/20241216130909.240042-1-gmonaco@redhat.com/
[2] - https://lore.kernel.org/linux-kernel/20241213095407.271357-1-gmonaco@redhat.com/
[3] - https://lore.kernel.org/linux-kernel/20241205083110.180134-2-gmonaco@redhat.com/
Gabriele Monaco (2):
  sched: Move task_mm_cid_work to mm delayed work
  rseq/selftests: Add test for mm_cid compaction

Mathieu Desnoyers (1):
  sched: Compact RSEQ concurrency IDs with reduced threads and affinity
 include/linux/mm_types.h                      |  23 ++-
 include/linux/sched.h                         |   1 -
 kernel/sched/core.c                           |  66 +------
 kernel/sched/sched.h                          |  32 ++-
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../selftests/rseq/mm_cid_compaction_test.c   | 185 ++++++++++++++++++
 7 files changed, 231 insertions(+), 79 deletions(-)
 create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532
--
2.47.1