Message-ID: <d33b897912b91a118006d83dafc29c6ebe548361.1683033105.git.raghavendra.kt@amd.com>
Date:   Wed, 3 May 2023 07:35:49 +0530
From:   Raghavendra K T <raghavendra.kt@....com>
To:     <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
CC:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Mel Gorman" <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "David Hildenbrand" <david@...hat.com>, <rppt@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Bharata B Rao <bharata@....com>,
        Raghavendra K T <raghavendra.kt@....com>
Subject: [RFC PATCH V1 2/2] sched/numa: Introduce per vma numa_scan_seq

A per-vma scan counter was introduced to aid scanning of disjoint vma
sets in corner cases, but that counter needs to be reset regularly.

The reset is done after each full round of mm scanning, using a per-vma
numa_scan_seq (vma_scan_seq) that follows mm->numa_scan_seq.

Result: With this patch series we recover mmtests'
numa01_THREAD_ALLOC performance, as shown below.

Base 11-apr-next
        w/numascan      w/o numascan    numascan+patch

real    1m33.579s       1m2.042s        1m11.738s
user    280m46.032s     213m38.647s     231m40.226s
sys     0m18.061s       6m54.963s       4m43.174s

In summary: the patch adds back some of the system-time overhead of
scanning disjoint vma sets, but we are still at a large advantage
w.r.t. the base kernel.

Signed-off-by: Raghavendra K T <raghavendra.kt@....com>
---
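Note: as a rough illustration of the reset described in the commit
message, here is a minimal userspace sketch of the check this patch adds
to vma_is_accessed(). The model_* names and struct layouts are
hypothetical stand-ins, not the kernel definitions, and the sketch
leaves out READ_ONCE()/WRITE_ONCE() and the access_pids check that
follows in the real function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mm_struct field used here. */
struct model_mm {
	unsigned int numa_scan_seq;	/* advances after each full mm scan */
};

/* Hypothetical stand-in for the two vma_numab_state fields used here. */
struct model_vma {
	unsigned int vma_scan_seq;	/* mm sequence this vma last synced to */
	unsigned int scan_counter;	/* scans counted in the current round */
};

/*
 * Returns true while the vma is still allowed to be scanned
 * unconditionally. A mismatch between the two sequence numbers means a
 * new round of mm scanning has started, so the counter is reset first.
 * Incrementing scan_counter is left to the caller (assumption: the real
 * increment lives elsewhere in the series).
 */
static bool model_unconditional_scan(struct model_mm *mm,
				     struct model_vma *vma,
				     unsigned int windows)
{
	unsigned int mm_seq;

	assert(mm && vma);
	mm_seq = mm->numa_scan_seq;

	if (vma->vma_scan_seq != mm_seq) {
		vma->vma_scan_seq = mm_seq;
		vma->scan_counter = 0;
	}

	return vma->scan_counter < windows || mm_seq < 2;
}
```

With windows = 2, a vma whose counter has reached 3 in round 5 is no
longer scanned unconditionally, but becomes eligible again once
numa_scan_seq moves to 6, because the counter is reset on the mismatch.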
 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f66e6b4e0620..9c0fc83118da 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -479,6 +479,7 @@ struct vma_numab_state {
 	unsigned long next_scan;
 	unsigned long next_pid_reset;
 	unsigned long access_pids[2];
+	unsigned int vma_scan_seq;
 	unsigned int scan_counter;
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c50dc3893eb..dc011a2a31ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2935,6 +2935,7 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 {
 	unsigned long pids;
 	unsigned int windows;
+	unsigned int mm_seq, vma_seq;
 	unsigned int scan_size = READ_ONCE(sysctl_numa_balancing_scan_size);
 
 	if (scan_size < MAX_SCAN_WINDOW)
@@ -2945,6 +2946,18 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 
 	windows = max(VMA_DISJOINT_SET_ACCESS_THRESH, windows);
 
+	mm_seq = READ_ONCE(current->mm->numa_scan_seq);
+	vma_seq = READ_ONCE(vma->numab_state->vma_scan_seq);
+
+	if (vma_seq != mm_seq) {
+		/*
+		 * One more round of whole mm scan was done. Reset the vma
+		 * scan_counter and sync per vma numa_scan_seq.
+		 */
+		WRITE_ONCE(vma->numab_state->vma_scan_seq,
+					READ_ONCE(current->mm->numa_scan_seq));
+		WRITE_ONCE(vma->numab_state->scan_counter, 0);
+	}
 	/*
 	 * Make sure to allow scanning of disjoint vma set for the first
 	 * few times.
@@ -2954,8 +2967,7 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 	 * This is also done to avoid any side effect of task scanning
 	 * amplifying the unfairness of disjoint set of VMAs' access.
 	 */
-	if (READ_ONCE(vma->numab_state->scan_counter) < windows ||
-		READ_ONCE(current->mm->numa_scan_seq) < 2)
+	if (READ_ONCE(vma->numab_state->scan_counter) < windows || mm_seq < 2)
 		return true;
 
 	pids = vma->numab_state->access_pids[0] | vma->numab_state->access_pids[1];
@@ -3078,6 +3090,8 @@ static void task_numa_work(struct callback_head *work)
 			vma->numab_state->next_pid_reset =  vma->numab_state->next_scan +
 				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
 
+			WRITE_ONCE(vma->numab_state->vma_scan_seq,
+					READ_ONCE(current->mm->numa_scan_seq));
 			WRITE_ONCE(vma->numab_state->scan_counter, 0);
 		}
 
-- 
2.34.1
