Message-Id: <20250520031552.1931598-1-hezhongkun.hzk@bytedance.com>
Date: Tue, 20 May 2025 11:15:52 +0800
From: Zhongkun He <hezhongkun.hzk@...edance.com>
To: tj@...nel.org,
hannes@...xchg.org,
longman@...hat.com
Cc: cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
muchun.song@...ux.dev,
Zhongkun He <hezhongkun.hzk@...edance.com>
Subject: [PATCH] cpuset: introduce non-blocking cpuset.mems setting option

Writing to cpuset.mems in cgroup v2 can trigger memory migration
for the tasks in the cpuset. This behavior is fine for newly created
cgroups, but it can cause problems for existing cgroups. In our
scenario, modifying cpuset.mems during peak times frequently leads
to noticeable service latency or stuttering.

It is important to keep the behavior of cpus and memory consistent.
However, because the migration does cause problems at times, we
would like to have a more flexible option.

The idea comes from the non-blocking limit setting option in the
memory controller:
https://lore.kernel.org/all/20250506232833.3109790-1-shakeel.butt@linux.dev/

Signed-off-by: Zhongkun He <hezhongkun.hzk@...edance.com>
---
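For illustration only, and not part of the patch: a minimal userspace
sketch of how an admin process might use the proposed option, assuming
this patch is applied. The helper name write_mems_nonblock() and the
cgroup path in the note below are hypothetical; only open(2) with
O_NONBLOCK followed by write(2) is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a node list such as "0-1" to a
 * cpuset.mems file that is opened with O_NONBLOCK.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_mems_nonblock(const char *path, const char *nodes)
{
	assert(path != NULL && nodes != NULL);

	/* With the patch applied, O_NONBLOCK asks the kernel to skip migration. */
	int fd = open(path, O_WRONLY | O_NONBLOCK);
	if (fd < 0)
		return -errno;

	size_t len = strlen(nodes);
	ssize_t n = write(fd, nodes, len);
	int err = 0;

	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* treat a short write as an error */

	close(fd);
	return err;
}
```

A caller could then run something like
write_mems_nonblock("/sys/fs/cgroup/example/cpuset.mems", "0-1"),
where the path is a made-up example. On a kernel without this patch
the write should still succeed, but the usual migration is not skipped.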
Documentation/admin-guide/cgroup-v2.rst | 7 +++++++
kernel/cgroup/cpuset.c | 11 +++++++++++
2 files changed, 18 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..d9e8e2a770af 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2408,6 +2408,13 @@ Cpuset Interface Files
a need to change "cpuset.mems" with active tasks, it shouldn't
be done frequently.
+ If cpuset.mems is opened with O_NONBLOCK, the migration is
+ bypassed. This is useful for admin processes that need to adjust
+ cpuset.mems dynamically without blocking. However, pages that
+ were already allocated may then lie outside the new cpuset.mems
+ range; their placement may later be changed by the move_pages()
+ syscall or by NUMA balancing.
+
cpuset.mems.effective
A read-only multiple values file which exists on all
cpuset-enabled cgroups.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 24b70ea3e6ce..2a0867e0c6d2 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3208,7 +3208,18 @@ ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
retval = update_exclusive_cpumask(cs, trialcs, buf);
break;
case FILE_MEMLIST:
+ {
+ bool skip_migrate_once = false;
+
+ if ((of->file->f_flags & O_NONBLOCK) &&
+ is_memory_migrate(cs) &&
+ !cpuset_update_flag(CS_MEMORY_MIGRATE, cs, 0))
+ skip_migrate_once = true;
retval = update_nodemask(cs, trialcs, buf);
+ /* Restore the migrate flag */
+ if (skip_migrate_once)
+ cpuset_update_flag(CS_MEMORY_MIGRATE, cs, 1);
+ }
break;
default:
retval = -EINVAL;
--
2.39.5