Message-Id: <20250521034527.3476332-1-hezhongkun.hzk@bytedance.com>
Date: Wed, 21 May 2025 11:45:27 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: tj@...nel.org,
hannes@...xchg.org,
longman@...hat.com
Cc: cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
muchun.song@...ux.dev,
Zhongkun He <hezhongkun.hzk@...edance.com>
Subject: [PATCH] cpuset: introduce non-blocking cpuset.mems setting option

Setting cpuset.mems in cgroup v2 can trigger memory migration
in the cpuset. This behavior is fine for newly created cgroups,
but it can cause problems for existing cgroups. In our scenario,
modifying cpuset.mems during peak times frequently leads to
noticeable service latency or stuttering.

It is important to have consistent behavior for both cpus and
memory. However, the migration does cause issues at times, so
we would like to have a flexible option.

This idea comes from the non-blocking limit setting option in
the memory controller:
https://lore.kernel.org/all/20250506232833.3109790-1-shakeel.butt@linux.dev/

Signed-off-by: Zhongkun He <hezhongkun.hzk@...edance.com>
---
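For illustration only, the intended userspace usage could look
roughly like the sketch below. It assumes this patch is applied;
on a kernel without it, O_NONBLOCK makes no difference to this
write. The helper name write_mems_nonblock() is hypothetical and
is not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a node list such as "0-1" to a
 * cpuset.mems file that is opened with O_NONBLOCK. With the proposed
 * patch, the kernel would skip page migration for this one write.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_mems_nonblock(const char *path, const char *nodes)
{
	size_t len = strlen(nodes);
	ssize_t written;
	int fd, err = 0;

	/* O_NONBLOCK is the only difference from a regular update. */
	fd = open(path, O_WRONLY | O_NONBLOCK);
	if (fd < 0)
		return -errno;

	written = write(fd, nodes, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -EIO;	/* treat a short write as an error */

	if (close(fd) < 0 && !err)
		err = -errno;

	return err;
}
```

An admin process might call it as
write_mems_nonblock("/sys/fs/cgroup/<group>/cpuset.mems", "0-1"),
assuming cgroup v2 is mounted at /sys/fs/cgroup. Pages allocated
before the write stay where they are.
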
Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
kernel/cgroup/cpuset-internal.h | 6 ++++++
kernel/cgroup/cpuset.c | 7 +++++++
3 files changed, 20 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..d9e8e2a770af 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2408,6 +2408,13 @@ Cpuset Interface Files
a need to change "cpuset.mems" with active tasks, it shouldn't
be done frequently.
+ If cpuset.mems is opened with O_NONBLOCK then the migration is
+ bypassed. This is useful for admin processes that need to adjust
+ cpuset.mems dynamically without blocking. However, previously
+ allocated pages may then lie outside the new cpuset.mems range;
+ their placement can still be changed later by the move_pages()
+ syscall or by NUMA balancing.
+
cpuset.mems.effective
A read-only multiple values file which exists on all
cpuset-enabled cgroups.
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 383963e28ac6..5686bb08c4fe 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -162,6 +162,9 @@ struct cpuset {
/* partition root state */
int partition_root_state;
+ /* Do not migrate memory when modifying cpuset.mems this time */
+ bool skip_migration_once;
+
/*
* number of SCHED_DEADLINE tasks attached to this cpuset, so that we
* know when to rebuild associated root domain bandwidth information.
@@ -227,6 +230,9 @@ static inline int is_sched_load_balance(const struct cpuset *cs)
static inline int is_memory_migrate(const struct cpuset *cs)
{
+ if (cs->skip_migration_once)
+ return 0;
+
return test_bit(CS_MEMORY_MIGRATE, &cs->flags);
}
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 24b70ea3e6ce..f43d7b291cde 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3208,7 +3208,14 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
retval = update_exclusive_cpumask(cs, trialcs, buf);
break;
case FILE_MEMLIST:
+ if (of->file->f_flags & O_NONBLOCK)
+ cs->skip_migration_once = true;
+
retval = update_nodemask(cs, trialcs, buf);
+
+ /* Restore skip_migration */
+ if (cs->skip_migration_once)
+ cs->skip_migration_once = false;
break;
default:
retval = -EINVAL;
--
2.39.5