linux-kernel - Re: [PATCH 5/6] sched/numa: Reset scan rate whenever task moves across nodes

Message-ID: <20180910084808.GE48257@gmail.com>
Date:   Mon, 10 Sep 2018 10:48:08 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rik van Riel <riel@...riel.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 5/6] sched/numa: Reset scan rate whenever task moves
 across nodes


* Srikar Dronamraju <srikar@...ux.vnet.ibm.com> wrote:

> Currently, a task's scan rate is reset when the NUMA balancer migrates
> the task to a different node. If the NUMA balancer initiates a swap, the
> reset applies only to the task that initiates the swap. Similarly, no
> scan rate reset is done if the task is migrated across nodes by the
> traditional load balancer.
> 
> Instead, move the scan rate reset into migrate_task_rq. This ensures
> that a task moved out of its preferred node either gets back to its
> preferred node quickly or finds a new preferred node. Doing so would be
> fair to all tasks migrating across nodes.
> 
> specjbb2005 / bops/JVM / higher bops are better
> on 2 Socket/2 Node Intel
> JVMS  Prev    Current  %Change
> 4     210118  208862   -0.597759
> 1     313171  307007   -1.96825
> 
> 
> on 2 Socket/4 Node Power8 (PowerNV)
> JVMS  Prev     Current  %Change
> 8     91027.5  89911.4  -1.22611
> 1     216460   216176   -0.131202
> 
> 
> on 2 Socket/2 Node Power9 (PowerNV)
> JVMS  Prev    Current  %Change
> 4     191918  196078   2.16759
> 1     207043  214664   3.68088
> 
> 
> on 4 Socket/4 Node Power7
> JVMS  Prev     Current  %Change
> 8     58462.1  60719.2  3.86079
> 1     108334   112615   3.95167
> 
> 
> dbench / transactions / higher numbers are better
> on 2 Socket/2 Node Intel
> count  Min      Max      Avg      Variance  %Change
> 5      11851.8  11937.3  11890.9  33.5169
> 5      12511.7  12559.4  12539.5  15.5883   5.45459
> 
> 
> on 2 Socket/4 Node Power8 (PowerNV)
> count  Min      Max      Avg      Variance  %Change
> 5      4791     5016.08  4962.55  85.9625
> 5      4709.28  4979.28  4919.32  105.126   -0.871125
> 
> 
> on 2 Socket/2 Node Power9 (PowerNV)
> count  Min      Max      Avg     Variance  %Change
> 5      9353.43  9380.49  9369.6  9.04361
> 5      9388.38  9406.29  9395.1  5.98959   0.272157
> 
> 
> on 4 Socket/4 Node Power7
> count  Min      Max      Avg      Variance  %Change
> 5      149.518  215.412  179.083  21.5903
> 5      157.71   184.929  174.754  10.7275   -2.41731
> 
> Signed-off-by: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
> ---
>  kernel/sched/fair.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a5936ed..4ea0eff 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1837,12 +1837,6 @@ static int task_numa_migrate(struct task_struct *p)
>  	if (env.best_cpu == -1)
>  		return -EAGAIN;
>  
> -	/*
> -	 * Reset the scan period if the task is being rescheduled on an
> -	 * alternative node to recheck if the tasks is now properly placed.
> -	 */
> -	p->numa_scan_period = task_scan_start(p);
> -
>  	best_rq = cpu_rq(env.best_cpu);
>  	if (env.best_task == NULL) {
>  		ret = migrate_task_to(p, env.best_cpu);
> @@ -6361,6 +6355,19 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu __maybe_unus
>  
>  	/* We have migrated, no longer consider this task hot */
>  	p->se.exec_start = 0;
> +
> +#ifdef CONFIG_NUMA_BALANCING
> +	if (!p->mm || (p->flags & PF_EXITING))
> +		return;
> +
> +	if (p->numa_faults) {
> +		int src_nid = cpu_to_node(task_cpu(p));
> +		int dst_nid = cpu_to_node(new_cpu);
> +
> +		if (src_nid != dst_nid)
> +			p->numa_scan_period = task_scan_start(p);
> +	}
> +#endif

Please don't add #ifdeffery inside functions, especially not if they do weird flow control like 
a 'return' from the middle of a block.

A properly named inline helper would work, I suppose.

Thanks,

	Ingo
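One way to follow the review comment is to move the NUMA-balancing check into a static inline helper that has an empty stub when CONFIG_NUMA_BALANCING is disabled. That way migrate_task_rq_fair() itself carries no #ifdef and no early return. The sketch below is a userspace mock, not the merged kernel code. The helper name update_scan_period(), the cut-down struct task_struct, the 8-CPU/2-node cpu_to_node() topology, the fixed task_scan_start() value and the PF_EXITING value are all assumptions made so the example compiles on its own. The reset condition itself is taken from the quoted patch.

```c
#include <assert.h>

/* Build-time switch, mimicking the kernel config option. */
#define CONFIG_NUMA_BALANCING 1

/* Mock values, assumed for this sketch only. */
#define PF_EXITING      0x00000004u
#define MOCK_NR_CPUS    8
#define MOCK_SCAN_START 1000u

struct mm_struct { int unused; };

/* Cut-down, hypothetical stand-in for the kernel's task_struct. */
struct task_struct {
	struct mm_struct *mm;        /* NULL for kernel threads */
	unsigned int flags;
	unsigned long *numa_faults;  /* NULL until NUMA faults are tracked */
	unsigned int numa_scan_period;
	int cpu;
};

/* Mock topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < MOCK_NR_CPUS);
	return cpu / 4;
}

static int task_cpu(const struct task_struct *p)
{
	return p->cpu;
}

/* Mock: returns a fixed initial scan period instead of the real calculation. */
static unsigned int task_scan_start(const struct task_struct *p)
{
	(void)p;
	return MOCK_SCAN_START;
}

#ifdef CONFIG_NUMA_BALANCING
/* Hypothetical helper: restart NUMA scanning after a cross-node move. */
static inline void update_scan_period(struct task_struct *p, int new_cpu)
{
	int src_nid, dst_nid;

	/* Kernel threads and exiting tasks are skipped, as in the patch. */
	if (!p->mm || (p->flags & PF_EXITING))
		return;

	/* Only tasks that already have NUMA fault statistics are reset. */
	if (!p->numa_faults)
		return;

	src_nid = cpu_to_node(task_cpu(p));
	dst_nid = cpu_to_node(new_cpu);

	if (src_nid != dst_nid)
		p->numa_scan_period = task_scan_start(p);
}
#else
/* Empty stub so callers need no #ifdef when NUMA balancing is off. */
static inline void update_scan_period(struct task_struct *p, int new_cpu)
{
	(void)p;
	(void)new_cpu;
}
#endif

/* Simplified call site: no #ifdef and no early return in the caller. */
static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
{
	/* ...existing migration bookkeeping elided... */
	update_scan_period(p, new_cpu);
}
```

For a task with fault statistics on CPU 0, migrate_task_rq_fair(&t, 5) resets t.numa_scan_period because CPUs 0 and 5 sit on different mock nodes, while a move to CPU 1 leaves it untouched. The early returns now live only inside the helper, so the caller's control flow stays linear in both configurations.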
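The %Change figures in the quoted benchmark tables are consistent with the relative change of the patched run against the baseline, 100 * (Current - Prev) / Prev. In the dbench tables, the first row of each pair (the one without a %Change entry) appears to be the baseline and the second row the patched run, compared on Avg. The short check below recomputes several of the reported values from the quoted numbers; the function names and the 1e-3 tolerance are illustrative choices, not anything from the original thread.

```c
#include <assert.h>

/* Relative change of the patched run versus the baseline, in percent. */
static double pct_change(double prev, double current)
{
	return (current - prev) / prev * 100.0;
}

/* Nonzero if a and b differ by at most tol. */
static int approx_equal(double a, double b, double tol)
{
	double d = a - b;

	return (d < 0 ? -d : d) <= tol;
}

/* Check one table row against its reported %Change (assumed rounded below 1e-3). */
static void check_row(double prev, double current, double reported)
{
	assert(approx_equal(pct_change(prev, current), reported, 1e-3));
}

/* A selection of rows copied from the quoted tables. */
static void check_quoted_tables(void)
{
	/* specjbb2005, bops/JVM */
	check_row(210118, 208862, -0.597759);   /* 2-node Intel, 4 JVMs */
	check_row(313171, 307007, -1.96825);    /* 2-node Intel, 1 JVM */
	check_row(91027.5, 89911.4, -1.22611);  /* 4-node Power8, 8 JVMs */
	check_row(108334, 112615, 3.95167);     /* 4-node Power7, 1 JVM */

	/* dbench, Avg transactions */
	check_row(11890.9, 12539.5, 5.45459);   /* 2-node Intel */
	check_row(4962.55, 4919.32, -0.871125); /* 4-node Power8 */
	check_row(9369.6, 9395.1, 0.272157);    /* 2-node Power9 */
	check_row(179.083, 174.754, -2.41731);  /* 4-node Power7 */
}
```

A negative value therefore means the patched kernel did worse than the baseline on that row, since both benchmarks are labelled "higher is better".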