linux-kernel - Re: Re: [PATCH 1/5] sched/fair: ignore SIS

Message-ID: <1fc40679-b7c3-24f2-aa27-f1edab71228e@bytedance.com>
Date:   Mon, 5 Sep 2022 22:40:00 +0800
From:   Abel Wu <wuyun.abel@...edance.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Josh Don <joshdon@...gle.com>, Chen Yu <yu.c.chen@...el.com>,
        linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

On 9/2/22 6:25 PM, Mel Gorman wrote:
> For the simple case, I was expecting the static depth to *not* match load
> because it's unclear what the scaling should be for load or if it had a
> benefit. If investigating scaling the scan depth to load, it would still
> make sense to compare it to a static depth. The depth of 2 cores was to
> partially match the old SIS_PROP behaviour of the minimum depth to scan.
> 
>                  if (span_avg > 4*avg_cost)
>                          nr = div_u64(span_avg, avg_cost);
>                  else
>                          nr = 4;
> 
> nr is not proportional to cores although it could be
> https://lore.kernel.org/all/20210726102247.21437-7-mgorman@techsingularity.net/
> 
> This is not tested or properly checked for correctness but for
> illustrative purposes something like this should conduct a limited scan when
> overloaded. It has a side-effect that the has_idle_cores hint gets cleared
> for a partial scan for idle cores but the hint is probably wrong anyway.
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6089251a4720..59b27a2ef465 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>   		if (sd_share) {
>   			/* because !--nr is the condition to stop scan */
>   			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> -			/* overloaded LLC is unlikely to have idle cpu/core */
> -			if (nr == 1)
> -				return -1;
> +
> +			/*
> +			 * Non-overloaded case: Scan full domain if there is
> +			 * 	an idle core. Otherwise, scan for an idle
> +			 * 	CPU based on nr_idle_scan
> +			 * Overloaded case: Unlikely to have an idle CPU but
> +			 * 	conduct a limited scan if there is potentially
> +			 * 	an idle core.
> +			 */
> +			if (nr > 1) {
> +				if (has_idle_core)
> +					nr = sd->span_weight;
> +			} else {
> +				if (!has_idle_core)
> +					return -1;
> +				nr = 2;
> +			}
>   		}
>   	}
>   
>   	for_each_cpu_wrap(cpu, cpus, target + 1) {
> +		if (!--nr)
> +			break;
> +
>   		if (has_idle_core) {
>   			i = select_idle_core(p, cpu, cpus, &idle_cpu);
>   			if ((unsigned int)i < nr_cpumask_bits)
>   				return i;
>   
>   		} else {
> -			if (!--nr)
> -				return -1;
>   			idle_cpu = __select_idle_cpu(cpu, p);
>   			if ((unsigned int)idle_cpu < nr_cpumask_bits)
>   				break;

I spent the last few days testing your patch with three variations
(assuming has_idle_core):

  a) full or limited (2 cores) scan when !nr_idle_scan
  b) whether to clear sds->has_idle_core when a partial scan fails
  c) whether to scale the scan depth with load

Some observations:

  1) Not clearing sds->has_idle_core when a partial scan fails
     always seems to be bad: the hint stays set, so subsequent
     wakeups keep doing partial scans that still fail to find an
     idle core. (The following observations are based on
     clearing has_idle_core even on partial scans.)

  2) An unconditional full scan when has_idle_core is set is not
     good for netperf_{udp,tcp} and tbench4. This is probably
     because the SIS success rate of these workloads is already
     high (netperf ~= 100%, tbench4 ~= 50%, compared with
     hackbench ~= 3.5%), which negates much of the benefit a full
     scan brings.

  3) Scaling the scan depth with load seems good for the hackbench
     socket tests and neutral for the pipe tests. I think this is
     the case you mentioned before: under fast wake-up workloads
     has_idle_core becomes less reliable, so a full scan won't
     always win.

Best Regards,
Abel