Message-ID: <xhsmh8qe3nj9n.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Mon, 12 Jan 2026 12:49:08 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>, mingo@...nel.org,
 peterz@...radead.org, vincent.guittot@...aro.org,
 linux-kernel@...r.kernel.org
Cc: sshegde@...ux.ibm.com, kprateek.nayak@....com, juri.lelli@...hat.com,
 tglx@...nel.org, dietmar.eggemann@....com, anna-maria@...utronix.de,
 frederic@...nel.org, wangyang.guo@...el.com
Subject: Re: [PATCH v4 3/3] sched/fair: Remove nohz.nr_cpus and use weight
 of cpumask instead

On 12/01/26 10:34, Shrikanth Hegde wrote:
> nohz.nr_cpus was observed as a contended cacheline when running
> enterprise workloads on large systems.
>
> The fundamental scalability challenge with nohz.idle_cpus_mask
> and nohz.nr_cpus is the following:
>
>  (1) nohz_balancer_kick() observes (reads) nohz.nr_cpus
>      (or nohz.idle_cpus_mask) and nohz.has_blocked on every scheduler
>      tick to see whether there is any nohz balancing work to do.
>
>  (2) nohz_balance_enter_idle() and nohz_balance_exit_idle()
>      (through nohz_balancer_kick() via sched_tick()) modify (write)
>      nohz.nr_cpus (and/or nohz.idle_cpus_mask) and nohz.has_blocked.
>
> The characteristic frequencies are the following:
>
>  (1) nohz_balancer_kick() runs at the scheduler (busy) tick frequency
>      on every CPU that has not gone idle. This is a relatively constant
>      frequency in the ~1 kHz range or lower.
>
>  (2) runs at the idle entry/exit frequency on every CPU that goes idle.
>      This is workload dependent, but can easily reach hundreds of kHz for
>      IO-bound loads and high CPU counts, i.e. it can be orders of magnitude
>      higher than (1), in which case a cache miss on every invocation of (1)
>      is almost inevitable. Idle exit also triggers (1) on the CPU that is
>      coming out of idle.
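
A back-of-envelope check of the "almost inevitable" claim, using the low end of the ranges quoted above (the concrete figures are illustrative assumptions, not measurements): with an aggregate idle entry/exit rate of f_idle = 100 kHz writing the shared line, and a tick rate of f_tick = 1 kHz on a busy CPU reading it, the number of remote writes landing between two consecutive reads by that CPU is on average about

```latex
\[
  \frac{f_{\mathrm{idle}}}{f_{\mathrm{tick}}}
  = \frac{100\ \mathrm{kHz}}{1\ \mathrm{kHz}}
  = 100 \gg 1 ,
\]
```

so the reader's copy of the line has almost certainly been invalidated since its previous tick.
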
>
> There are two types of cost arising from these functions:
>
>  (A) scheduler tick cost via (1): this happens on busy CPUs too, and is
>      thus a primary scalability cost. But the rate here is constant and
>      typically much lower than that of (B), hence the absolute benefit to
>      workload scalability will be lower as well.
>
>  (B) idle cost via (2): going-to-idle and coming-from-idle costs are
>      secondary concerns, because they impact power efficiency more than
>      they impact scalability. But in terms of absolute cost this also
>      scales with nr_cpus, and at a much faster rate, and thus may approach
>      and negatively impact system limits such as memory bus/fabric
>      bandwidth.
>
> Note that nohz.idle_cpus_mask and nohz.nr_cpus may appear to reside in the
> same cacheline; however, under CONFIG_CPUMASK_OFFSTACK=y the backing
> storage for nohz.idle_cpus_mask lives elsewhere. With CPUMASK_OFFSTACK=n,
> nohz.idle_cpus_mask and the rest of the nohz fields are in different
> cachelines for typical NR_CPUS=512/2048. Either way, two separate
> cachelines are dirtied on idle entry/exit.
>
> nohz.nr_cpus can be derived from the mask itself, and its users do not
> require an exact value. Removing it means one less cacheline is dirtied
> in the idle entry/exit path, which saves some bus bandwidth w.r.t. those
> nohz functions (approx. 50%). This in turn helps improve enterprise
> workload throughput.
>
> On a system with 480 CPUs, running "hackbench 40 process 10000 loops"
> (average of 3 runs):
> baseline:
>      0.81%  hackbench          [k] nohz_balance_exit_idle
>      0.21%  hackbench          [k] nohz_balancer_kick
>      0.09%  swapper            [k] nohz_run_idle_balance
>
> With patch:
>      0.35%  hackbench          [k] nohz_balance_exit_idle
>      0.09%  hackbench          [k] nohz_balancer_kick
>      0.07%  swapper            [k] nohz_run_idle_balance
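
Summing the three nohz symbols in the profiles quoted above gives a rough sense of the overall effect (a cycles profile is not a direct measure of bus bandwidth, so this is only a sanity check):

```latex
\[
\begin{aligned}
  \text{baseline:}   &\quad 0.81 + 0.21 + 0.09 = 1.11\,\% \\
  \text{with patch:} &\quad 0.35 + 0.09 + 0.07 = 0.51\,\% \\
  \text{reduction:}  &\quad \frac{1.11 - 0.51}{1.11} = \frac{0.60}{1.11} \approx 54\,\%
\end{aligned}
\]
```

This is in the same ballpark as the approx. 50% figure given in the changelog.
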
>
> [Ingo Molnar: scalability analysis changelog]
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>

Reviewed-by: Valentin Schneider <vschneid@...hat.com>

