Message-ID: <ZNPedInw6Cgh++Xa@li-05afa54c-330e-11b2-a85c-e3f3aa0db1e9.ibm.com>
Date:   Thu, 10 Aug 2023 00:14:04 +0530
From:   Vishal Chourasia <vishalc@...ux.ibm.com>
To:     Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
Cc:     peterz@...radead.org, vincent.guittot@...aro.org,
        srikar@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
        mingo@...hat.com, dietmar.eggemann@....com, mgorman@...e.de
Subject: Re: [RFC PATCH] sched/fair: Skip idle CPU search on busy system

On Wed, Jul 26, 2023 at 03:06:12PM +0530, Shrikanth Hegde wrote:
> When the system is fully busy, there will not be any idle CPUs.
> In that case, load_balance will be called mainly with the CPU_NOT_IDLE
> type. should_we_balance currently checks for an idle CPU if one
> exists. When the system is 100% busy, there will not be an idle CPU,
> so these idle_cpu checks can be skipped. This avoids fetching those rq
> structures.
> 
> This is a minor optimization for a specific case of 100% utilization.
> 
> .....
> Coming to the current implementation: it is a very basic approach to the
> issue and may not be the best/perfect way to do this. It works only in
> the case of CONFIG_NO_HZ_COMMON. nohz.nr_cpus is a globally available counter
> which tracks idle CPUs; AFAIU there isn't any other such info. If there is,
> we can use that instead. nohz.nr_cpus is atomic, which might be costly too.
> 
> An alternative way would be to add a new attribute to sched_domain and update
> it in the CPU idle entry/exit path per CPU. The advantage is that the check
> can be per env->sd instead of global. Slightly more complicated, but maybe
> better. There could be other advantages at wakeup, e.g. to limit the scan.
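
For illustration, a minimal sketch of what such a per-domain counter could look
like; all names here (sd_idle_info, sd_idle_update, sd_has_idle_cpu) are
hypothetical and not existing kernel code, and a per-domain counter would still
need an atomic or some per-CPU aggregation:

#include <linux/types.h>
#include <linux/atomic.h>

/* Hypothetical per-domain bookkeeping; not an existing sched_domain field. */
struct sd_idle_info {
	atomic_t nr_idle_cpus;	/* CPUs of this domain that are currently idle */
};

/* Would be called from the per-CPU idle entry/exit path. */
static inline void sd_idle_update(struct sd_idle_info *info, bool entering_idle)
{
	if (entering_idle)
		atomic_inc(&info->nr_idle_cpus);
	else
		atomic_dec(&info->nr_idle_cpus);
}

/* should_we_balance() could then test its own domain instead of nohz.nr_cpus. */
static inline bool sd_has_idle_cpu(struct sd_idle_info *info)
{
	return atomic_read(&info->nr_idle_cpus) != 0;
}

With something along these lines, the CPU_NOT_IDLE fast path in
should_we_balance() could check a counter attached to env->sd rather than the
global nohz state.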
> 
> Your feedback would really help. Does this optimization make sense?
> 
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 373ff5f55884..903d59b5290c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10713,6 +10713,12 @@ static int should_we_balance(struct lb_env *env)
>  		return 1;
>  	}
> 
> +#ifdef CONFIG_NO_HZ_COMMON
> +	/* If the system is fully busy, it's better to skip the idle checks */
> +	if (env->idle == CPU_NOT_IDLE && atomic_read(&nohz.nr_cpus) == 0)
> +		return group_balance_cpu(sg) == env->dst_cpu;
> +#endif
> +
>  	/* Try to find first idle CPU */
>  	for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
>  		if (!idle_cpu(cpu))
> --
> 2.31.1
> 
Tested this patch on top of v6.4.

5 runs of stress-ng (100% load) on a system with 16 CPUs, spawning 23 threads
for 60 minutes each.

stress-ng: 16 CPUs, 23 threads, 60 mins

- 6.4.0

| completion time(sec) |      user |        sys |
|----------------------+-----------+------------|
|              3600.05 |  57582.44 |       0.70 |
|              3600.10 |  57597.07 |       0.68 |
|              3600.05 |  57596.65 |       0.47 |
|              3600.04 |  57596.36 |       0.71 |
|              3600.06 |  57595.32 |       0.42 |
|              3600.06 | 57593.568 |      0.596 | average
|          0.046904158 | 12.508392 | 0.27878307 | stddev

- 6.4.0+ (with patch)

| completion time(sec) |      user |         sys |
|----------------------+-----------+-------------|
|              3600.04 |  57596.58 |        0.50 |
|              3600.04 |  57595.19 |        0.48 |
|              3600.05 |  57597.39 |        0.49 |
|              3600.04 |  57596.64 |        0.53 |
|              3600.04 |  57595.94 |        0.43 |
|             3600.042 | 57596.348 |       0.486 | average
|         0.0089442719 | 1.6529610 | 0.072938330 | stddev

The average system time is slightly lower in the patched version (0.486 seconds)
compared to the 6.4.0 version (0.596 seconds). 
The standard deviation for system time is also lower in the patched version
(0.0729 seconds) than in the 6.4.0 version (0.2788 seconds), suggesting more
consistent system time results with the patch.

vishal.c
