linux-kernel - Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Ybzq/A+HS/GxGYha@BLR-5CG11610CF.amd.com>
Date:   Sat, 18 Dec 2021 01:24:36 +0530
From:   "Gautham R. Shenoy" <gautham.shenoy@....com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Mike Galbraith <efault@....de>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when
 SD_NUMA spans multiple LLCs

On Fri, Dec 10, 2021 at 09:33:07AM +0000, Mel Gorman wrote:
[..snip..]

> @@ -9186,12 +9191,13 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>  				return idlest;
>  #endif
>  			/*
> -			 * Otherwise, keep the task on this node to stay close
> -			 * its wakeup source and improve locality. If there is
> -			 * a real need of migration, periodic load balance will
> -			 * take care of it.
> +			 * Otherwise, keep the task on this node to stay local
> +			 * to its wakeup source if the number of running tasks
> +			 * are below the allowed imbalance. If there is a real
> +			 * need of migration, periodic load balance will take
> +			 * care of it.
>  			 */
> -			if (allow_numa_imbalance(local_sgs.sum_nr_running, sd->span_weight))
> +			if (local_sgs.sum_nr_running <= sd->imb_numa_nr)
>  				return NULL;

Considering the fact that we want to check whether or not the
imb_numa_nr threshold is going to be crossed if we let the new task
stay local, this should be

      	      	    	if (local_sgs.sum_nr_running + 1 <= sd->imb_numa_nr)
			   	return NULL;



Without this change, on a Zen3 configured with Nodes Per Socket
(NPS)=4, the lower NUMA domain with sd->imb_numa_nr = 2, has 4 groups
(each group corresponds to a NODE sched-domain), when we run stream
with 8 threads, we see 3 of them being initially placed in the local
group and the remaining 5 distributed across the other 4 groups. None
of these 3 tasks will never get migrated to any of the other 3 groups,
because those others have at least one task.

Eg:

PID 157811 : timestamp 108921.267293 : first placed in NODE 1
PID 157812 : timestamp 108921.269877 : first placed in NODE 1
PID 157813 : timestamp 108921.269921 : first placed in NODE 1
PID 157814 : timestamp 108921.270007 : first placed in NODE 2
PID 157815 : timestamp 108921.270065 : first placed in NODE 3
PID 157816 : timestamp 108921.270118 : first placed in NODE 0
PID 157817 : timestamp 108921.270168 : first placed in NODE 2
PID 157818 : timestamp 108921.270216 : first placed in NODE 3

With the fix mentioned above, we see the 8 threads uniformly
distributed across the 4 groups.

PID 7500 : timestamp 436.156429 : first placed in     NODE   1
PID 7501 : timestamp 436.159058 : first placed in     NODE   1
PID 7502 : timestamp 436.159106 : first placed in     NODE   2
PID 7503 : timestamp 436.159173 : first placed in     NODE   3
PID 7504 : timestamp 436.159219 : first placed in     NODE   0
PID 7505 : timestamp 436.159263 : first placed in     NODE   2
PID 7506 : timestamp 436.159305 : first placed in     NODE   3
PID 7507 : timestamp 436.159348 : first placed in     NODE   0


--
Thanks and Regards
gautham.