Message-ID: <20251201183146.74443-1-sshegde@linux.ibm.com>
Date: Tue,  2 Dec 2025 00:01:42 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: mingo@...nel.org, peterz@...radead.org, vincent.guittot@...aro.org,
        linux-kernel@...r.kernel.org, kprateek.nayak@....com
Cc: sshegde@...ux.ibm.com, dietmar.eggemann@....com, vschneid@...hat.com,
        rostedt@...dmis.org, tglx@...utronix.de, tim.c.chen@...ux.intel.com
Subject: [PATCH 0/4] sched/fair: improve nohz fields for large systems

It was noted that on large systems the nohz.nr_cpus cacheline bounces
quite often. Many CPUs do atomic inc/dec and reads on it at the same
time, so the line keeps moving between them.

The gist of the series is to get rid of nr_cpus and instead use the
cpumask, which is always updated alongside it. Functionally it should
serve the same purpose. At worst, one might miss an idle load balance
due to a race. Going by the existing comments, that can happen even today.
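
To illustrate why the counter is redundant, here is a minimal userspace
sketch, not the kernel code. It assumes at most 64 CPUs and models the
idle cpumask as a single atomic word; the names idle_mask,
model_enter_idle, model_exit_idle and model_any_idle are hypothetical.
It only shows that "counter > 0" and "mask is non-empty" answer the same
question. It does not reproduce the cacheline behaviour of a real
multi-word cpumask.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. */
static _Atomic uint64_t idle_mask;

/* CPU goes idle: set its bit. No separate counter is maintained. */
static void model_enter_idle(unsigned int cpu)
{
	atomic_fetch_or(&idle_mask, UINT64_C(1) << cpu);
}

/* CPU leaves idle: clear its bit. */
static void model_exit_idle(unsigned int cpu)
{
	atomic_fetch_and(&idle_mask, ~(UINT64_C(1) << cpu));
}

/* Replaces a "nr_cpus != 0" style test with "mask is non-empty". */
static bool model_any_idle(void)
{
	return atomic_load(&idle_mask) != 0;
}
```

A reader of model_any_idle() can race with a concurrent enter or exit
and see a stale answer, which corresponds to the missed idle load
balance mentioned above.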

The other patches are minor. There are a couple of time checks that
bail out early. Read the variables only after those time checks, to
avoid cache references to them.
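
The reordering can be sketched as below. This is a simplified userspace
model under stated assumptions, not the code in kernel/sched/fair.c:
struct model_state, its fields and model_should_kick are hypothetical,
and the plain "<" comparison ignores the counter wraparound that real
time comparisons have to handle. The shared_reads field exists only to
make the effect visible.

```c
#include <stdbool.h>

/* Hypothetical state for the sketch. */
struct model_state {
	unsigned long next_check;   /* earliest time the check may proceed */
	int shared_flag;            /* stands in for a contended shared field */
	unsigned int shared_reads;  /* counts how often shared_flag is read */
};

static bool model_should_kick(struct model_state *s, unsigned long now)
{
	/* Time check first: bail out before touching the shared field. */
	if (now < s->next_check)
		return false;

	/* Only callers that pass the time check reference shared_flag. */
	s->shared_reads++;
	return s->shared_flag != 0;
}
```

Calls that arrive before next_check return without reading shared_flag
at all, so the shared line is referenced only when the result can
matter.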

There is a series which aims to solve contention by moving to LLC.
https://lore.kernel.org/all/20250904041516.3046-1-kprateek.nayak@amd.com/
Maybe these bits are useful for that too. We could discuss further at
LPC.

Ran "hackbench 100 process 5000 loops", collected perf cycles, and
picked the top nohz functions. The benchmark numbers themselves don't
change by much. Will ask our performance team to collect numbers with
the series.

baseline: tip sched/core at 3eb593560146

   1.01%  [k] nohz_balance_exit_idle
   0.31%  [k] nohz_balancer_kick
   0.05%  [k] nohz_balance_enter_idle

With series:

   0.45%  [k] nohz_balance_exit_idle
   0.18%  [k] nohz_balancer_kick
   0.01%  [k] nohz_balance_enter_idle


Shrikanth Hegde (4):
  sched/fair: Move checking for nohz cpus after time check
  sched/fair: Change likelyhood of nohz nr_cpus check
  sched/fair: Check for blocked task after time check
  sched/fair: Remove atomic nr_cpus and use cpumask instead

 kernel/sched/fair.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

-- 
2.43.0

