Message-ID: <ZRfKKxBzfu+kf0tM@chenyu5-mobl2.ccr.corp.intel.com>
Date:   Sat, 30 Sep 2023 15:11:39 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Mel Gorman" <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Aaron Lu <aaron.lu@...el.com>, Tim Chen <tim.c.chen@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>, <x86@...nel.org>
Subject: Re: [RFC PATCH] sched/fair: Bias runqueue selection towards almost
 idle prev CPU

Hi Mathieu,

On 2023-09-29 at 14:33:50 -0400, Mathieu Desnoyers wrote:
> Introduce the WAKEUP_BIAS_PREV_IDLE scheduler feature. It biases
> select_task_rq towards the previous CPU if it was almost idle
> (avg_load <= 0.1%).

Yes, this is a promising direction IMO. One question: can
cfs_rq->avg.load_avg be used for a percentage comparison?
If I understand correctly, load_avg reflects that more than
one task could have been running on this runqueue, and
load_avg is directly proportional to the load weight of that
cfs_rq. Besides, LOAD_AVG_MAX does not seem to be the maximum
value that load_avg can reach; it is the sum of the series
1024 * (y^0 + y^1 + y^2 + ...)

For example,
taskset -c 1 nice -n -20 stress -c 1
cat /sys/kernel/debug/sched/debug | grep 'cfs_rq\[1\]' -A 12 | grep "\.load_avg"
  .load_avg                      : 88763
  .load_avg                      : 1024

88763 is higher than LOAD_AVG_MAX=47742.
Maybe util_avg could be used for the percentage comparison instead?
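
For reference, here is a small user-space sketch (my own illustration,
not kernel code; the nice -20 weight of 88761 is taken from the
sched_prio_to_weight[] table) of where the ~47742 figure comes from and
why load_avg is not bounded by it:

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* PELT decay factor: contributions halve every 32 periods of 1024us */
	double y = pow(0.5, 1.0 / 32.0);
	/* infinite geometric series 1024 * (y^0 + y^1 + y^2 + ...) */
	double series = 1024.0 / (1.0 - y);

	/*
	 * Prints ~47788; the kernel's LOAD_AVG_MAX=47742 is roughly this
	 * same sum computed with PELT's fixed-point arithmetic.
	 */
	printf("geometric series sum ~= %.0f\n", series);

	/*
	 * load_avg additionally scales with load weight, so a 100% busy
	 * nice -20 task (weight 88761) converges to ~88761, matching the
	 * 88763 observed in the stress example above.
	 */
	printf("busy nice -20 task load_avg ~= %d\n", 88761);
	return 0;
}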

> It eliminates frequent task migrations from almost
> idle CPU to completely idle CPUs. This is achieved by using the CPU
> load of the previously used CPU as "almost idle" criterion in
> wake_affine_idle() and select_idle_sibling().
> 
> The following benchmarks are performed on a v6.5.5 kernel with
> mitigations=off.
> 
> This speeds up the following hackbench workload on a 192 cores AMD EPYC
> 9654 96-Core Processor (over 2 sockets):
> 
> hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100
> 
> from 49s to 32s. (34% speedup)
> 
> We can observe that the number of migrations is reduced significantly
> (-94%) with this patch, which may explain the speedup:
> 
> Baseline:      118M cpu-migrations  (9.286 K/sec)
> With patch:      7M cpu-migrations  (0.709 K/sec)
> 
> As a consequence, the stalled-cycles-backend are reduced:
> 
> Baseline:     8.16% backend cycles idle
> With patch:   6.70% backend cycles idle
> 
> Interestingly, the rate of context switch increases with the patch, but
> it does not appear to be an issue performance-wise:
> 
> Baseline:     454M context-switches (35.677 K/sec)
> With patch:   654M context-switches (62.290 K/sec)
> 
> This was developed as part of the investigation into a weird regression
> reported by AMD where adding a raw spinlock in the scheduler context
> switch accelerated hackbench. It turned out that changing this raw
> spinlock for a loop of 10000x cpu_relax within do_idle() had similar
> benefits.
> 
> This patch achieves a similar effect without the busy-waiting by
> allowing select_task_rq to favor almost idle previously used CPUs based
> on the CPU load of that CPU. The threshold of 0.1% avg_load for almost
> idle CPU load has been identified empirically using the hackbench
> workload.
> 
> Feedback is welcome. I am especially interested to learn whether this
> patch has positive or detrimental effects on performance of other
> workloads.
> 
> Link: https://lore.kernel.org/r/09e0f469-a3f7-62ef-75a1-e64cec2dcfc5@amd.com
> Link: https://lore.kernel.org/lkml/20230725193048.124796-1-mathieu.desnoyers@efficios.com/
> Link: https://lore.kernel.org/lkml/20230810140635.75296-1-mathieu.desnoyers@efficios.com/
> Link: https://lore.kernel.org/lkml/f6dc1652-bc39-0b12-4b6b-29a2f9cd8484@amd.com/
> Link: https://lore.kernel.org/lkml/20230822113133.643238-1-mathieu.desnoyers@efficios.com/
> Link: https://lore.kernel.org/lkml/20230823060832.454842-1-aaron.lu@intel.com/
> Link: https://lore.kernel.org/lkml/20230905171105.1005672-1-mathieu.desnoyers@efficios.com/
> Link: https://lore.kernel.org/lkml/cover.1695704179.git.yu.c.chen@intel.com/
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Valentin Schneider <vschneid@...hat.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Ben Segall <bsegall@...gle.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Swapnil Sapkal <Swapnil.Sapkal@....com>
> Cc: Aaron Lu <aaron.lu@...el.com>
> Cc: Chen Yu <yu.c.chen@...el.com>
> Cc: Tim Chen <tim.c.chen@...el.com>
> Cc: K Prateek Nayak <kprateek.nayak@....com>
> Cc: Gautham R . Shenoy <gautham.shenoy@....com>
> Cc: x86@...nel.org
> ---
>  kernel/sched/fair.c     | 18 +++++++++++++-----
>  kernel/sched/features.h |  6 ++++++
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1d9c2482c5a3..65a7d923ea61 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6599,6 +6599,14 @@ static int wake_wide(struct task_struct *p)
>  	return 1;
>  }
>  
> +static bool
> +almost_idle_cpu(int cpu, struct task_struct *p)
> +{
> +	if (!sched_feat(WAKEUP_BIAS_PREV_IDLE))
> +		return false;
> +	return cpu_load_without(cpu_rq(cpu), p) <= LOAD_AVG_MAX / 1000;

Or
return cpu_util_without(cpu, p) * 1000 <= capacity_orig_of(cpu) ?
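
To make this concrete, an untested sketch of how the util-based variant
could look, keeping the same shape as almost_idle_cpu() in the patch:

/*
 * Untested sketch: treat the previous CPU as "almost idle" when its
 * utilization without the waking task is at most 0.1% of its original
 * capacity, mirroring the 0.1% avg_load threshold used in the patch.
 */
static bool
almost_idle_cpu(int cpu, struct task_struct *p)
{
	if (!sched_feat(WAKEUP_BIAS_PREV_IDLE))
		return false;
	return cpu_util_without(cpu, p) * 1000 <= capacity_orig_of(cpu);
}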

thanks,
Chenyu
