Date:   Wed, 25 Aug 2021 12:54:15 +0200
From:   Nicolas Saenz Julienne <nsaenzju@...hat.com>
To:     cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Cc:     tj@...nel.org, lizefan.x@...edance.com, hannes@...xchg.org,
        mtosatti@...hat.com, nilal@...hat.com, frederic@...nel.org,
        longman@...hat.com, Nicolas Saenz Julienne <nsaenzju@...hat.com>
Subject: [PATCH] cgroup/cpuset: Avoid memory migration when nodemasks match

With the introduction of ee9707e8593d ("cgroup/cpuset: Enable memory
migration for cpuset v2"), attaching a process to a different cgroup
triggers a memory migration regardless of whether it's really needed.
Memory migration is an expensive operation, so bypass it if the
nodemasks passed to cpuset_migrate_mm() are equal.

Note that we're not only avoiding the migration work itself, but also a
call to lru_cache_disable(), which triggers and flushes an LRU drain
work on every online CPU.

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>

---

NOTE: This also alleviates hangs I stumbled upon while testing
linux-next on systems with nohz_full CPUs (running latency-sensitive
loads). ee9707e8593d's newly imposed memory migration never finishes, as
the LRU drain is never scheduled on isolated CPUs.

I tried to follow the user-space call trace; it goes something like this:

  Create new tmux pane, which triggers hostname operation, hangs...
    -> systemd (pid 1) creates new hostnamed process (using clone())
      -> hostnamed process attaches itself to:
         "system.slice/systemd-hostnamed.service/cgroup.procs"
        -> hangs... Waiting for LRU drain to finish on nohz_full CPUs.
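
For anyone trying to reproduce this, the attach step in the trace above
amounts to writing a PID into a cgroup.procs file. Here is a minimal
user-space sketch of that step. The helper name attach_pid() is made up
for illustration, and the caller supplies the full path: the cgroup2
mount point is not part of the trace and is typically /sys/fs/cgroup,
which is an assumption about the system at hand.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write a PID into the given cgroup.procs file.
 * On a cgroup v2 hierarchy this moves the process into that cgroup.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int attach_pid(const char *procs_path, pid_t pid)
{
	FILE *f = fopen(procs_path, "w");
	int ret = 0;

	if (!f)
		return -1;

	/* The kernel expects a single decimal PID per write. */
	if (fprintf(f, "%ld\n", (long)pid) < 0)
		ret = -1;

	/* Buffered data is flushed on close, so check this too. */
	if (fclose(f) != 0)
		ret = -1;

	return ret;
}
```

Calling attach_pid() with the cgroup.procs path from the trace and
getpid() should approximate what the hostnamed process does to itself.
On an affected kernel with nohz_full CPUs the write is where I would
expect the hang to show up.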

As far as CPU isolation is concerned, this calls for a better
understanding of the underlying issues. For example, should LRU be made
CPU-isolation aware, or should we deal with it at cgroup/cpuset level?
In the meantime, I figured this small optimization is worthwhile on its
own.
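
To make the reference handling in the early return easier to review,
here is a toy user-space model of the patched control flow. It is only
a sketch: the toy_* names are made up, a nodemask is reduced to a
64-bit bitmap, and the queued migration work is collapsed into a
comment. The point it illustrates is that cpuset_migrate_mm() consumes
the caller's mm reference on every path, so the new shortcut has to
drop it as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct mm_struct: only a user count. */
struct toy_mm {
	int users;
};

/* Toy stand-in for mmput(): drop one reference. */
static void toy_mmput(struct toy_mm *mm)
{
	mm->users--;
}

/*
 * Toy model of the patched cpuset_migrate_mm().
 * Returns true if a migration would be queued, false if it is skipped.
 * Either way the caller's reference on mm is consumed.
 */
static bool toy_migrate_mm(struct toy_mm *mm, uint64_t from, uint64_t to)
{
	/* Models nodes_equal(*from, *to): nothing to move, bail out early. */
	if (from == to) {
		toy_mmput(mm);
		return false;
	}

	/*
	 * The real code queues work here; the work handler migrates the
	 * pages and then drops the reference. Collapsed into one call.
	 */
	toy_mmput(mm);
	return true;
}
```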

 kernel/cgroup/cpuset.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 44d234b0df5e..d497a65c4f04 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1634,6 +1634,11 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
 {
 	struct cpuset_migrate_mm_work *mwork;
 
+	if (nodes_equal(*from, *to)) {
+		mmput(mm);
+		return;
+	}
+
 	mwork = kzalloc(sizeof(*mwork), GFP_KERNEL);
 	if (mwork) {
 		mwork->mm = mm;
-- 
2.31.1
