Message-ID: <CAKfTPtAm5wTM3TB_s6H=4gs8VmbuFvkHbFMTqn5-ptFPdktHLQ@mail.gmail.com>
Date: Tue, 4 Jun 2024 16:37:01 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org, 
	Chen Yu <yu.c.chen@...el.com>, Vinicius Gomes <vinicius.gomes@...el.com>
Subject: Re: [PATCH] sched/balance: Skip unnecessary updates to idle load
 balancer's flags

On Fri, 31 May 2024 at 22:52, Tim Chen <tim.c.chen@...ux.intel.com> wrote:
>
> We observed that the overhead of trigger_load_balance(), now renamed
> sched_balance_trigger(), rises with a system's core count.
>
> For an OLTP workload running a 6.8 kernel on a 2-socket x86 system
> with 96 cores/socket, we saw that 0.7% of cpu cycles are spent in
> trigger_load_balance(). On older systems with fewer cores/socket, this
> function's overhead was less than 0.1%.
>
> The cause of this overhead is that multiple cpus call
> kick_ilb(flags), each posting the balancing work needed to a common idle
> load balancer cpu. The ilb_cpu's flags field got updated unconditionally
> with atomic_fetch_or().  The atomic reads and writes to ilb_cpu's flags
> cause heavy cache-line bouncing and cpu cycle overhead. This is seen in
> the annotated profile below.
>
>              kick_ilb():
>              if (ilb_cpu < 0)
>                test   %r14d,%r14d
>              ↑ js     6c
>              flags = atomic_fetch_or(flags, nohz_flags(ilb_cpu));
>                mov    $0x2d600,%rdi
>                movslq %r14d,%r8
>                mov    %rdi,%rdx
>                add    -0x7dd0c3e0(,%r8,8),%rdx
>              arch_atomic_read():
>   0.01         mov    0x64(%rdx),%esi
>  35.58         add    $0x64,%rdx
>              arch_atomic_fetch_or():
>
>              static __always_inline int arch_atomic_fetch_or(int i, atomic_t *v)
>              {
>              int val = arch_atomic_read(v);
>
>              do { } while (!arch_atomic_try_cmpxchg(v, &val, val | i));
>   0.03  157:   mov    %r12d,%ecx
>              arch_atomic_try_cmpxchg():
>              return arch_try_cmpxchg(&v->counter, old, new);
>   0.00         mov    %esi,%eax
>              arch_atomic_fetch_or():
>              do { } while (!arch_atomic_try_cmpxchg(v, &val, val | i));
>                or     %esi,%ecx
>              arch_atomic_try_cmpxchg():
>              return arch_try_cmpxchg(&v->counter, old, new);
>   0.01         lock   cmpxchg %ecx,(%rdx)
>  42.96       ↓ jne    2d2
>              kick_ilb():
>
> With instrumentation, we found that 81% of the updates do not result in
> any change in the ilb_cpu's flags.  That is, multiple cpus are asking
> the ilb_cpu to do the same things over and over again, before the ilb_cpu
> has a chance to run NOHZ load balance.
>
> Skip updates to ilb_cpu's flags if no new work needs to be done.
> Such updates do not change ilb_cpu's NOHZ flags.  This requires an extra
> atomic read but it is less expensive than frequent unnecessary atomic
> updates that generate cache bounces.
>
> We saw that on the OLTP workload, cpu cycles from trigger_load_balance()
> (or sched_balance_trigger()) got reduced from 0.7% to 0.2%.

Makes sense. We have seen other variables become bottlenecks in the
scheduler, such as task_group's load_avg or the root domain's overload.

Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>

>
> Signed-off-by: Tim Chen <tim.c.chen@...ux.intel.com>
> ---
>  kernel/sched/fair.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..9ab6dff6d8ac 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11891,6 +11891,13 @@ static void kick_ilb(unsigned int flags)
>         if (ilb_cpu < 0)
>                 return;
>
> +       /*
> +        * Don't bother if no new NOHZ balance work items for ilb_cpu,
> +        * i.e. all bits in flags are already set in ilb_cpu.
> +        */
> +       if ((atomic_read(nohz_flags(ilb_cpu)) & flags) == flags)
> +               return;
> +
>         /*
>          * Access to rq::nohz_csd is serialized by NOHZ_KICK_MASK; he who sets
>          * the first flag owns it; cleared by nohz_csd_func().
> --
> 2.32.0
>
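
For readers outside the kernel tree, the check-before-update idea in the
patch can be sketched in user space with C11 atomics. This is only an
illustration under stated assumptions: demo_flags and demo_kick() are
hypothetical names, not kernel symbols, and the relaxed load is my choice
for the sketch rather than a statement about what atomic_read() guarantees
in the scheduler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ilb_cpu's NOHZ flags word; not kernel code. */
static atomic_uint demo_flags;

/*
 * Request the work items in `want`.
 * Returns true if an atomic OR was actually issued, false if it was
 * skipped because every requested bit was already set.
 */
static bool demo_kick(unsigned int want)
{
	/*
	 * Plain shared read first: if all requested bits are already set,
	 * the read-modify-write would change nothing, so skip it and avoid
	 * taking the cache line exclusive.
	 */
	if ((atomic_load_explicit(&demo_flags, memory_order_relaxed) & want) == want)
		return false;

	/* At least one bit is new: publish it with an atomic OR. */
	atomic_fetch_or(&demo_flags, want);
	return true;
}
```

The early return is racy by design: another caller may set or clear bits
between the load and the OR. The sketch tolerates that because the OR is
idempotent, so the check only filters out redundant writes and never
replaces the atomic update itself.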
