Date:   Wed, 1 Feb 2023 13:32:22 +0530
From:   Raghavendra K T <raghavendra.kt@....com>
To:     <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
CC:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Mel Gorman" <mgorman@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "David Hildenbrand" <david@...hat.com>, <rppt@...nel.org>,
        Bharata B Rao <bharata@....com>,
        Disha Talreja <dishaa.talreja@....com>,
        Raghavendra K T <raghavendra.kt@....com>
Subject: [PATCH V2 3/3] sched/numa: Reset the accessing PID information periodically

This helps ensure that only PIDs that have recently accessed a VMA
scan it.

Current implementation:
 Reset the accessing PIDs every (4 * sysctl_numa_balancing_scan_delay)
interval after the initial scan delay period expires. The reset logic
is implemented in the scan path.

Suggested-by: Mel Gorman <mgorman@...hsingularity.net>
Signed-off-by: Raghavendra K T <raghavendra.kt@....com>
---
Some potential ideas for clearing the accessing PIDs:

1) Use a flag to indicate the phase in the life cycle of a VMA and tie it to a timestamp (reuse next_scan or similar)

VMA life cycle

t1         t2         t3                    t4         t5                   t6
|<-  DS  ->|<-  US  ->|<-        CS       ->|<-  US  ->|<-        CS       ->|
Flags indicate whether we are in the DS, US, or CS phase.

DS (delay scan): Initial phase where scanning is avoided for a new VMA
US (unconditional scan): Brief period where scanning is allowed irrespective of which task faulted on the VMA
CS (conditional scan): Longer conditional scanning phase where task scanning is allowed only for VMAs of interest
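
To make idea 1 concrete, here is a minimal user-space sketch of the DS/US/CS state machine. It is not part of this patch: the type and function names, the tick-based time unit, and the us_len/cs_len parameters are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical phase names, taken from the DS/US/CS diagram above. */
enum vma_scan_phase { VMA_DELAY_SCAN, VMA_UNCOND_SCAN, VMA_COND_SCAN };

/* Hypothetical per-VMA state: the phase flag plus the time at which it ends. */
struct vma_phase_state {
	enum vma_scan_phase phase;
	unsigned long phase_end;	/* in ticks; the idea suggests reusing next_scan */
};

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Advance the phase while its deadline has passed: DS -> US -> CS -> US -> ...
 * us_len and cs_len are the assumed lengths of the US and CS phases.
 */
static void vma_phase_update(struct vma_phase_state *s, unsigned long now,
			     unsigned long us_len, unsigned long cs_len)
{
	assert(us_len > 0 && cs_len > 0);
	while (tick_after(now, s->phase_end)) {
		if (s->phase == VMA_UNCOND_SCAN) {
			s->phase = VMA_COND_SCAN;
			s->phase_end += cs_len;
		} else {
			/* Both DS and CS are followed by a US phase. */
			s->phase = VMA_UNCOND_SCAN;
			s->phase_end += us_len;
		}
	}
}

/* Decide whether the current task may scan the VMA in its present phase. */
static bool vma_phase_allows_scan(const struct vma_phase_state *s,
				  bool task_accessed_vma)
{
	switch (s->phase) {
	case VMA_DELAY_SCAN:
		return false;
	case VMA_UNCOND_SCAN:
		return true;
	default:
		return task_accessed_vma;
	}
}
```

A caller would run vma_phase_update() before each scan decision and then consult vma_phase_allows_scan(); only the CS phase looks at whether the task has accessed the VMA.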


2) Maintain a duplicate list of accessing PIDs to keep track of access history, and switch/reset between the two. Use an OR operation during iteration.

 Two lists of PIDs are maintained. At regular intervals the old list is reset and the current list becomes the old list.
At any point in time, the set of PIDs accessing the VMA is determined by ORing list1 and list2.

accessing_pids_list1 <-  current list
accessing_pids_list2 <-  old list
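
A rough user-space sketch of idea 2 follows. It is illustrative only and not part of this patch: the struct, the helper names, and the modulo-based PID-to-bit mapping (standing in for whatever hash the kernel side uses) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define PID_HASH_BITS (8 * sizeof(unsigned long))

/* Hypothetical two-generation tracker; fields follow the list names above. */
struct vma_pid_history {
	unsigned long accessing_pids_list1;	/* current list */
	unsigned long accessing_pids_list2;	/* old list */
};

/* Map a PID to one bit. A plain modulo is an assumption made for brevity. */
static unsigned long pid_to_bit(int pid)
{
	assert(pid >= 0);
	return 1UL << ((unsigned long)pid % PID_HASH_BITS);
}

/* Record an access by pid in the current list. */
static void pid_history_record(struct vma_pid_history *h, int pid)
{
	h->accessing_pids_list1 |= pid_to_bit(pid);
}

/* At each interval: drop the old list, and make the current list the old one. */
static void pid_history_rotate(struct vma_pid_history *h)
{
	h->accessing_pids_list2 = h->accessing_pids_list1;
	h->accessing_pids_list1 = 0;
}

/* A PID counts as accessing the VMA if it appears in either list. */
static bool pid_history_accessed(const struct vma_pid_history *h, int pid)
{
	unsigned long both = h->accessing_pids_list1 | h->accessing_pids_list2;

	return (both & pid_to_bit(pid)) != 0;
}
```

With this layout a PID stays visible for between one and two intervals after its last recorded access, instead of vanishing at the next reset as it does with a single list.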

3) Also maintain a per-VMA numa_seq
Currently numa_seq (how many times we have scanned the entire set of VMAs) is maintained at the mm level.
Having a per-VMA numa_seq (roughly, how many times the current VMA has been considered for scanning) may be helpful
in some contexts (e.g., deciding whether to allow unconditional scanning of a newly created VMA).
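
One way idea 3 could be used is sketched below in user-space C. Nothing here is in this patch: the struct, the helper names, and the threshold VMA_UNCOND_SCAN_PASSES are hypothetical and chosen only to illustrate the newly-created-VMA case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: number of early passes during which any task may scan. */
#define VMA_UNCOND_SCAN_PASSES 2U

/* Hypothetical per-VMA counter of how often the VMA was considered. */
struct vma_seq_state {
	unsigned int numa_seq;
};

/* Call each time the scanner considers this VMA. */
static void vma_seq_note_considered(struct vma_seq_state *s)
{
	s->numa_seq++;
}

/*
 * A young VMA (few passes so far) is scanned unconditionally; afterwards
 * only tasks that have accessed it are allowed to scan.
 */
static bool vma_seq_allows_scan(const struct vma_seq_state *s,
				bool task_accessed_vma)
{
	assert(s != 0);
	return s->numa_seq < VMA_UNCOND_SCAN_PASSES || task_accessed_vma;
}
```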

4) Reset accessing PIDs at regular intervals (current implementation)

t1       t2         t3         t4         t5         t6
|<- DS ->|<-  CS  ->|<-  CS  ->|<-  CS  ->|<-  CS  ->|

The current implementation resets the accessing PIDs every 4 * scan_delay interval after the initial scan delay
time expires. The reset logic is implemented in the scan path.
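
The reset timing can be modelled in user space as follows. This is a simplified sketch rather than the kernel code: time is in abstract ticks instead of jiffies, the struct and helper names are made up for the model, and the mm->numa_scan_seq check from the scan path is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for struct vma_numab, with the field added by this patch. */
struct vma_numab_sketch {
	unsigned long next_scan;
	unsigned long next_pid_reset;
	unsigned long accessing_pids;
};

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Mirrors VMA_PID_RESET_PERIOD: four scan delays. */
static unsigned long pid_reset_period(unsigned long scan_delay)
{
	return 4 * scan_delay;
}

/* On first sight of a VMA: delay scanning, and schedule the first PID reset. */
static void numab_init(struct vma_numab_sketch *n, unsigned long now,
		       unsigned long scan_delay)
{
	assert(scan_delay > 0);
	n->next_scan = now + scan_delay;
	n->next_pid_reset = n->next_scan + pid_reset_period(scan_delay);
	n->accessing_pids = 0;
}

/* In the scan path: clear the PID bits once the reset deadline has passed. */
static bool numab_maybe_reset(struct vma_numab_sketch *n, unsigned long now,
			      unsigned long scan_delay)
{
	if (!tick_after(now, n->next_pid_reset))
		return false;
	n->next_pid_reset += pid_reset_period(scan_delay);
	n->accessing_pids = 0;
	return true;
}
```

For example, with a scan delay of 1000 ticks and a VMA first seen at tick 0, scanning is delayed until tick 1000, the first reset becomes due after tick 5000, and the next one after tick 9000.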

 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 980a6a4308b6..08a007744ea1 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -437,6 +437,7 @@ struct anon_vma_name {
 
 struct vma_numab {
 	unsigned long next_scan;
+	unsigned long next_pid_reset;
 	unsigned long accessing_pids;
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3505ae57c07c..14db6d8a5090 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2928,6 +2928,8 @@ static bool vma_is_accessed(struct vm_area_struct *vma)
 	return vma->numab->accessing_pids & (1UL << active_pid_bit);
 }
 
+#define VMA_PID_RESET_PERIOD (4 * sysctl_numa_balancing_scan_delay)
+
 /*
  * The expensive part of numa migration is done from task_work context.
  * Triggered from task_tick_numa().
@@ -3035,6 +3037,10 @@ static void task_numa_work(struct callback_head *work)
 
 			vma->numab->next_scan = now +
 				msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
+
+			/* First reset happens 4 scan delays after the scan start */
+			vma->numab->next_pid_reset =  vma->numab->next_scan +
+				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
 		}
 
 		/*
@@ -3047,6 +3053,17 @@ static void task_numa_work(struct callback_head *work)
 		if (!vma_is_accessed(vma))
 			continue;
 
+		/*
+		 * Reset accessing PIDs regularly for old VMAs. Resetting after checking
+		 * vma for recent access to avoid clearing PID info before access.
+		 */
+		if (mm->numa_scan_seq &&
+				time_after(jiffies, vma->numab->next_pid_reset)) {
+			vma->numab->next_pid_reset =  vma->numab->next_pid_reset +
+				msecs_to_jiffies(VMA_PID_RESET_PERIOD);
+			vma->numab->accessing_pids = 0;
+		}
+
 		do {
 			start = max(start, vma->vm_start);
 			end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
-- 
2.34.1