Message-ID: <20250424024523.2298272-2-libo.chen@oracle.com>
Date: Wed, 23 Apr 2025 19:45:22 -0700
From: Libo Chen <libo.chen@...cle.com>
To: akpm@...ux-foundation.org, rostedt@...dmis.org, peterz@...radead.org,
        mgorman@...e.de, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, tj@...nel.org, llong@...hat.com
Cc: sraithal@....com, venkat88@...ux.ibm.com, kprateek.nayak@....com,
        raghavendra.kt@....com, yu.c.chen@...el.com, tim.c.chen@...el.com,
        vineethr@...ux.ibm.com, chris.hyser@...cle.com,
        daniel.m.jordan@...cle.com, lorenzo.stoakes@...cle.com,
        mkoutny@...e.com, linux-mm@...ck.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH v5 1/2] sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems

When the memory of the current task is pinned to a single NUMA node by a
cgroup (cpuset.mems), there is no point in continuing the rest of the VMA
scan and triggering NUMA hinting page faults: no page can be migrated to
another node, so they are pure overhead. With this change, the unnecessary
PTE updates and hinting page faults no longer occur in this scenario.
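The early-return condition described above can be modelled in user space as follows. This is a minimal sketch, not kernel code: it assumes a nodemask is a 64-bit word with one bit per NUMA node, and `model_nodes_weight()` and `model_skip_numa_scan()` are hypothetical stand-ins for the kernel's `nodes_weight()` and for the `cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1` test the patch adds to `task_numa_work()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a nodemask: one bit per NUMA node, at most 64
 * nodes. The kernel's nodemask_t is a larger bitmap; this is only an
 * illustration.
 */
typedef uint64_t model_nodemask_t;

/* Count the set bits, i.e. the number of NUMA nodes allowed. */
static int model_nodes_weight(model_nodemask_t mask)
{
	int weight = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		weight++;
	}
	return weight;
}

/*
 * True when NUMA hinting scans can be skipped: cpusets are in use and
 * the task may allocate memory from exactly one node, so there is no
 * other node its pages could be migrated to.
 */
static bool model_skip_numa_scan(bool cpusets_enabled,
				 model_nodemask_t mems_allowed)
{
	return cpusets_enabled && model_nodes_weight(mems_allowed) == 1;
}
```

For example, `model_skip_numa_scan(true, 1ULL << 2)` is true (only node 2 is allowed), while `model_skip_numa_scan(true, 0x3)` is false, because nodes 0 and 1 are both allowed and pages could still be migrated between them.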

We have seen up to a 6x improvement on a typical Java workload running on
VMs with memory and CPUs pinned to one NUMA node via cpuset on a two-socket
AArch64 system. With the same pinning, on an Intel platform with 18 cores
per socket, we have seen a 20% improvement in a microbenchmark that creates
a 30-vCPU KVM selftest guest with 4GB of memory, where each vCPU reads 4KB
pages for a fixed number of loops.

Signed-off-by: Libo Chen <libo.chen@...cle.com>
Tested-by: Chen Yu <yu.c.chen@...el.com>
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e43993a4e580..c9903b1b3948 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3329,6 +3329,13 @@ static void task_numa_work(struct callback_head *work)
 	if (p->flags & PF_EXITING)
 		return;
 
+	/*
+	 * Memory is pinned to only one NUMA node via cpuset.mems, naturally
+	 * no page can be migrated.
+	 */
+	if (cpusets_enabled() && nodes_weight(cpuset_current_mems_allowed) == 1)
+		return;
+
 	if (!mm->numa_next_scan) {
 		mm->numa_next_scan = now +
 			msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
-- 
2.43.5
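To check from user space whether a process's memory is confined to one node, one could count the nodes in the `Mems_allowed_list` field of `/proc/self/status`. This is a hedged sketch: the helper names are hypothetical, the parser assumes the kernel's comma-separated range list format (for example `0`, `0-3` or `0,2-3`), the field is only present when the kernel has cpuset support, and `cpusets_enabled()` cannot be observed this way, so a count of 1 only shows that the nodemask half of the new condition holds.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the nodes in a list such as "0", "0-3" or
 * "0,2-3" (a trailing newline is accepted). Returns -1 if malformed.
 */
static int count_nodes_in_list(const char *list)
{
	const char *p = list;
	int count = 0;

	assert(list != NULL);
	while (*p && *p != '\n') {
		char *end;
		long lo = strtol(p, &end, 10);
		long hi = lo;

		if (end == p || lo < 0)
			return -1;	/* no number, or a negative one */
		p = end;
		if (*p == '-') {	/* a range such as "2-3" */
			const char *start = p + 1;

			hi = strtol(start, &end, 10);
			if (end == start || hi < lo)
				return -1;
			p = end;
		}
		count += (int)(hi - lo + 1);
		if (*p == ',')
			p++;		/* next element of the list */
		else if (*p && *p != '\n')
			return -1;	/* unexpected character */
	}
	return count;
}

/*
 * Number of nodes in this process's Mems_allowed_list, or -1 if
 * /proc/self/status cannot be read or has no such field.
 */
static int current_mems_allowed_count(void)
{
	static const char key[] = "Mems_allowed_list:";
	char line[4096];
	int count = -1;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, key, sizeof(key) - 1) == 0) {
			const char *v = line + sizeof(key) - 1;

			while (isspace((unsigned char)*v))
				v++;	/* skip the tab after the colon */
			count = count_nodes_in_list(v);
			break;
		}
	}
	fclose(f);
	return count;
}
```

A process started under a cpuset whose `cpuset.mems` contains a single node would be expected to get 1 from `current_mems_allowed_count()`.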
