Message-ID: <87y30l5jdo.fsf@yhuang-dev.intel.com>
Date: Fri, 26 Jul 2019 15:45:39 +0800
From: "Huang\, Ying" <ying.huang@...el.com>
To: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, Rik van Riel <riel@...hat.com>,
Mel Gorman <mgorman@...e.de>, <jhladky@...hat.com>,
<lvenanci@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH RESEND] autonuma: Fix scan period updating

Hi, Srikar,

Srikar Dronamraju <srikar@...ux.vnet.ibm.com> writes:

> * Huang, Ying <ying.huang@...el.com> [2019-07-25 16:01:24]:
>
>> From: Huang Ying <ying.huang@...el.com>
>>
>> From the commit log and comments of commit 37ec97deb3a8 ("sched/numa:
>> Slow down scan rate if shared faults dominate"), the autonuma scan
>> period should be increased (scanning is slowed down) if the majority
>> of the page accesses are shared with other processes. But in current
>> code, the scan period will be decreased (scanning is speeded up) in
>> that situation.
>>
>> The commit log and comments make more sense. So this patch fixes the
>> code to make it match the commit log and comments. And this has been
>> verified via tracing the scan period changing and /proc/vmstat
>> numa_pte_updates counter when running a multi-threaded memory
>> accessing program (most memory areas are accessed by multiple
>> threads).
>>
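
To make the inversion concrete, here is a minimal user-space sketch of
the old decision logic.  The fault counts are made up for illustration;
the constants mirror NUMA_PERIOD_SLOTS and NUMA_PERIOD_THRESHOLD in
kernel/sched/fair.c:

    #include <stdio.h>

    #define NUMA_PERIOD_SLOTS     10
    #define NUMA_PERIOD_THRESHOLD  7

    int main(void)
    {
            /* Illustrative fault counts: accesses are mostly shared. */
            int local = 50, remote = 50, private = 10, shared = 90;
            int lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);     /* 5 */
            int ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared); /* 1 */

            if (ps_ratio >= NUMA_PERIOD_THRESHOLD)
                    printf("slow down scanning\n");
            else if (lr_ratio >= NUMA_PERIOD_THRESHOLD)
                    printf("slow down scanning\n");
            else
                    /* Taken here: the old code speeds scanning up even
                     * though shared accesses dominate. */
                    printf("speed up scanning\n");
            return 0;
    }
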
>
> Let's split this into 4 modes.
>
> More Local and Private Page Accesses:
> We definitely want to scan slowly, i.e. increase the scan window.
>
> More Local and Shared Page Accesses:
> We still want to scan slowly: we have already consolidated, and there is
> no point in scanning faster. So scan slowly, i.e. increase the scan
> window. (Remember that an access on any active node counts as local!)
>
> More Remote + Private Page Accesses:
> Most likely, the private accesses are going to be local accesses.
>
> In the unlikely event of the private accesses not being local, we should
> scan faster so that the memory and the task consolidate.
>
> More Remote + Shared Page Accesses:
> This means the workload has not consolidated, so we need to scan faster.
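
Condensing those four modes into a sketch (desired_scan_change() is a
hypothetical helper; mostly_local/mostly_private stand for "the
corresponding ratio is at or above NUMA_PERIOD_THRESHOLD"):

    /*
     * Desired response per mode: +1 means lengthen the scan period
     * (scan slower), -1 means shorten it (scan faster).
     */
    static int desired_scan_change(int mostly_local, int mostly_private)
    {
            if (mostly_local)
                    return +1;      /* local+private, local+shared: slow down */
            if (mostly_private)
                    return -1;      /* remote+private: consolidate faster */
            return -1;              /* remote+shared: definitely scan faster */
    }
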
This sounds reasonable.  But

    lr_ratio < NUMA_PERIOD_THRESHOLD

doesn't necessarily indicate more remote accesses: it also holds when
local equals remote.  And if shared accesses dominate in that case, we
should slow the scanning down.  So the logic could be:
    if (lr_ratio >= NUMA_PERIOD_THRESHOLD)
            slow down scanning
    else if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
            if (NUMA_PERIOD_SLOTS - lr_ratio >= NUMA_PERIOD_THRESHOLD)
                    speed up scanning
            else
                    slow down scanning
    } else
            speed up scanning
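
Spelled out against the existing slot/diff arithmetic in
update_task_scan_period(), that would look roughly like the sketch
below; the exact diff formulas in the speed-up branches are one
plausible choice, not something I have measured:

    lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
    sp_ratio = (shared * NUMA_PERIOD_SLOTS) / (private + shared);

    if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
            /* Mostly local: no need to scan fast. */
            int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;

            if (!slot)
                    slot = 1;
            diff = slot * period_slot;
    } else if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
            if (NUMA_PERIOD_SLOTS - lr_ratio >= NUMA_PERIOD_THRESHOLD) {
                    /* Mostly remote and mostly shared: not yet
                     * consolidated, so scan faster. */
                    diff = -(NUMA_PERIOD_THRESHOLD - lr_ratio) * period_slot;
            } else {
                    /* Local ~= remote but mostly shared: already
                     * consolidated, so scan slower. */
                    int slot = sp_ratio - NUMA_PERIOD_THRESHOLD;

                    if (!slot)
                            slot = 1;
                    diff = slot * period_slot;
            }
    } else {
            /* Mostly private but not yet mostly local: scan faster
             * so that the task and its memory consolidate. */
            diff = -(NUMA_PERIOD_THRESHOLD - lr_ratio) * period_slot;
    }
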
Does this follow your idea better?

Best Regards,
Huang, Ying

> So I would think we should go back to the code before commit 37ec97deb3a8.
>
> i.e.:
>
>	int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
>
>	if (!slot)
>		slot = 1;
>	diff = slot * period_slot;
>
>
> No?
>
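
For anyone reading along without the source at hand: a positive diff
lengthens p->numa_scan_period (slower scanning) and a negative one
shortens it; update_task_scan_period() applies it roughly as:

    p->numa_scan_period = clamp(p->numa_scan_period + diff,
                                task_scan_min(p), task_scan_max(p));
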
>> Fixes: 37ec97deb3a8 ("sched/numa: Slow down scan rate if shared faults dominate")
>> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
>> Cc: Rik van Riel <riel@...hat.com>
>> Cc: Peter Zijlstra (Intel) <peterz@...radead.org>
>> Cc: Mel Gorman <mgorman@...e.de>
>> Cc: jhladky@...hat.com
>> Cc: lvenanci@...hat.com
>> Cc: Ingo Molnar <mingo@...nel.org>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> ---
>> kernel/sched/fair.c | 20 ++++++++++----------
>> 1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 036be95a87e9..468a1c5038b2 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1940,7 +1940,7 @@ static void update_task_scan_period(struct task_struct *p,
>> unsigned long shared, unsigned long private)
>> {
>> unsigned int period_slot;
>> - int lr_ratio, ps_ratio;
>> + int lr_ratio, sp_ratio;
>> int diff;
>>
>> unsigned long remote = p->numa_faults_locality[0];
>> @@ -1971,22 +1971,22 @@ static void update_task_scan_period(struct task_struct *p,
>> */
>> period_slot = DIV_ROUND_UP(p->numa_scan_period, NUMA_PERIOD_SLOTS);
>> lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
>> - ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared);
>> + sp_ratio = (shared * NUMA_PERIOD_SLOTS) / (private + shared);
>>
>> - if (ps_ratio >= NUMA_PERIOD_THRESHOLD) {
>> + if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
>> /*
>> - * Most memory accesses are local. There is no need to
>> - * do fast NUMA scanning, since memory is already local.
>> + * Most memory accesses are shared with other tasks.
>> + * There is no point in continuing fast NUMA scanning,
>> + * since other tasks may just move the memory elsewhere.
>
> With this change, I would expect consolidation to take a hit for
> shared page accesses.
>
>> */
>> - int slot = ps_ratio - NUMA_PERIOD_THRESHOLD;
>> + int slot = sp_ratio - NUMA_PERIOD_THRESHOLD;
>> if (!slot)
>> slot = 1;
>> diff = slot * period_slot;
>> } else if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
>> /*
>> - * Most memory accesses are shared with other tasks.
>> - * There is no point in continuing fast NUMA scanning,
>> - * since other tasks may just move the memory elsewhere.
>> + * Most memory accesses are local. There is no need to
>> + * do fast NUMA scanning, since memory is already local.
>
> Comment-wise this makes sense.
>
>> */
>> int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
>> if (!slot)
>> @@ -1998,7 +1998,7 @@ static void update_task_scan_period(struct task_struct *p,
>> * yet they are not on the local NUMA node. Speed up
>> * NUMA scanning to get the memory moved over.
>> */
>> - int ratio = max(lr_ratio, ps_ratio);
>> + int ratio = max(lr_ratio, sp_ratio);
>> diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
>> }
>>
>> --
>> 2.20.1
>>