Message-Id: <20210825105415.1365360-1-nsaenzju@redhat.com>
Date:   Wed, 25 Aug 2021 12:54:15 +0200
From:   Nicolas Saenz Julienne <nsaenzju@...hat.com>
To:     cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Cc:     tj@...nel.org, lizefan.x@...edance.com, hannes@...xchg.org,
        mtosatti@...hat.com, nilal@...hat.com, frederic@...nel.org,
        longman@...hat.com, Nicolas Saenz Julienne <nsaenzju@...hat.com>
Subject: [PATCH] cgroup/cpuset: Avoid memory migration when nodemasks match

Since commit ee9707e8593d ("cgroup/cpuset: Enable memory migration for
cpuset v2"), attaching a process to a different cgroup triggers a memory
migration regardless of whether it is actually needed. Memory migration
is an expensive operation, so bypass it if the nodemasks passed to
cpuset_migrate_mm() are equal.

Note that we're not only avoiding the migration work itself, but also a
call to lru_cache_disable(), which queues an LRU drain work item on
every online CPU and waits for all of them to complete.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>

---

NOTE: This also alleviates hangs I stumbled upon while testing
linux-next on systems with nohz_full CPUs (running latency-sensitive
loads). The memory migration newly imposed by ee9707e8593d never
finishes, as the LRU drain is never scheduled on isolated CPUs.

I tried to follow the user-space call trace. It looks something like this:

  Create new tmux pane, which triggers hostname operation, hangs...
    -> systemd (pid 1) creates new hostnamed process (using clone())
      -> hostnamed process attaches itself to:
         "system.slice/systemd-hostnamed.service/cgroup.procs"
        -> hangs... Waiting for LRU drain to finish on nohz_full CPUs.

As far as CPU isolation is concerned, this calls for a better
understanding of the underlying issues. For example, should the LRU
code be made CPU-isolation aware, or should we deal with it at the
cgroup/cpuset level? In the meantime, I figured this small optimization
is worthwhile on its own.
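
For illustration only, the early return added by the diff below can be
modelled in user space. This is a minimal sketch, not kernel code: every
sketch_* name is hypothetical, the nodemask is reduced to a single 64-bit
word, and the "work queue" is just a counter. It assumes, as in the
kernel function, that the caller hands over one mm reference which must
be dropped on every path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef struct { uint64_t bits; } sketch_nodemask_t;

/* Hypothetical stand-in for mm_struct: only a reference count. */
struct sketch_mm { int users; };

/* Counts how many migrations would have been queued. */
static int sketch_migrations_queued;

/* Models nodes_equal(): true when both masks name the same nodes. */
static bool sketch_nodes_equal(const sketch_nodemask_t *a,
			       const sketch_nodemask_t *b)
{
	return a->bits == b->bits;
}

/* Models mmput(): drops one reference held on the mm. */
static void sketch_mmput(struct sketch_mm *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/*
 * Models the patched cpuset_migrate_mm(): the caller's reference is
 * always consumed, but the expensive path is skipped when the source
 * and destination nodemasks match.
 */
static void sketch_migrate_mm(struct sketch_mm *mm,
			      const sketch_nodemask_t *from,
			      const sketch_nodemask_t *to)
{
	if (sketch_nodes_equal(from, to)) {
		sketch_mmput(mm);	/* nothing to move, just drop the ref */
		return;
	}

	/* Assumption: stands in for allocating and queueing the work item. */
	sketch_migrations_queued++;
	/* Assumption: the work item drops the reference once it has run. */
	sketch_mmput(mm);
}
```

Calling sketch_migrate_mm() with two identical masks leaves
sketch_migrations_queued untouched and still drops the reference, which
is why the real patch has to call mmput() before returning early.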

 kernel/cgroup/cpuset.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 44d234b0df5e..d497a65c4f04 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1634,6 +1634,11 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 {
 	struct cpuset_migrate_mm_work *mwork;

+	if (nodes_equal(*from, *to)) {
+		mmput(mm);
+		return;
+	}
+
 	mwork = kzalloc(sizeof(*mwork), GFP_KERNEL);
 	if (mwork) {
 		mwork->mm = mm;
-- 
2.31.1