Message-Id: <20250821042707.62993-1-adamli@os.amperecomputing.com>
Date: Thu, 21 Aug 2025 04:27:05 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: anna-maria@...utronix.de,
	frederic@...nel.org,
	tglx@...utronix.de,
	mingo@...hat.com,
	peterz@...radead.org,
	juri.lelli@...hat.com,
	vincent.guittot@...aro.org,
	vschneid@...hat.com
Cc: dietmar.eggemann@....com,
	rostedt@...dmis.org,
	bsegall@...gle.com,
	mgorman@...e.de,
	cl@...ux.com,
	linux-kernel@...r.kernel.org,
	patches@...erecomputing.com,
	Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH RESEND 0/2] tick/nohz: CPU cannot enter NOHZ idle balance state

Valentin Schneider suggested resending this patch and copying the
scheduler reviewers [1].

When running llama on an arm64 server, some CPUs *stay* idle while others
are 100% busy. All CPUs are in the 'nohz_full=' CPU list, and
CONFIG_NO_HZ_FULL is set. The server has 192 CPUs and boots with the
kernel option 'nohz_full=0-191'.

The problem is caused by two issues:
1) Some idle CPUs cannot be added to 'nohz.idle_cpus_mask'. This bug
is fixed by the first patch in this series:
"tick/nohz: Fix wrong NOHZ idle CPU state".

2) Even if the idle CPUs are in 'nohz.idle_cpus_mask', no CPU can be
selected to do NOHZ idle load balancing because the conditions in
find_new_ilb() are too strict. This issue is fixed by the patch in [2].

We can see that the idle CPUs are not in nohz.idle_cpus_mask. NOHZ
idle load balancing only considers CPUs in nohz.idle_cpus_mask. The ticks
on the idle CPUs are stopped, so periodic load balancing is not
triggered either. As a result, the idle CPUs are never used and the
imbalance persists.
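
To illustrate why a CPU missing from the mask is never picked, here is a
small userspace sketch. It is not the kernel's find_new_ilb(); the name
toy_pick_ilb_cpu(), the 64-bit mask and the lowest-bit-first policy are
all assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS 64	/* assumption: toy limit, one bit per CPU */

/*
 * Toy model (hypothetical, not kernel code): return the lowest-numbered
 * CPU whose bit is set in idle_cpus_mask, skipping this_cpu, or -1 if
 * the mask offers no candidate.
 */
static int toy_pick_ilb_cpu(uint64_t idle_cpus_mask, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < TOY_NR_CPUS);

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (cpu == this_cpu)
			continue;
		if (idle_cpus_mask & (UINT64_C(1) << cpu))
			return cpu;
	}
	return -1;	/* idle CPUs outside the mask are invisible here */
}
```

In this model, an idle CPU whose bit was never set cannot be returned,
no matter how long it has been idle.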

A CPU is added to nohz.idle_cpus_mask in:
do_idle()
   -> tick_nohz_idle_stop_tick()
      -> nohz_balance_enter_idle()

The call to nohz_balance_enter_idle() depends on the '!was_stopped'
condition. It looks like 'was_stopped' is used to avoid calling
nohz_balance_enter_idle() and setting 'ts->idle_jiffies' more than once.

When the CPU is in nohz_full mode, 'was_stopped' may always be true.
The call path might be:

tick_nohz_full_stop_tick() /* stop tick and set TS_FLAG_STOPPED */
... ...
do_idle()
    -> tick_nohz_idle_stop_tick() /* was_stopped == 1 */

The first patch "Fix wrong NOHZ idle CPU state" makes
nohz_balance_enter_idle() independent of '!was_stopped'. This is safe
because nohz_balance_enter_idle() already checks 'rq->nohz_tick_stopped'
to avoid setting nohz.idle_cpus_mask more than once.
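
The following userspace sketch models the sequence described above. It
is a simplified illustration, not the kernel code or the actual patch:
the toy_* names, the struct fields and the 'gate_on_was_stopped' switch
(true for the old behaviour, false for the behaviour after the first
patch) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; field names only mirror the kernel ones. */
struct toy_cpu {
	bool tick_stopped;	/* models TS_FLAG_STOPPED */
	bool nohz_tick_stopped;	/* models rq->nohz_tick_stopped */
	bool in_idle_mask;	/* models membership in nohz.idle_cpus_mask */
	int enter_idle_calls;	/* how often the mask was really updated */
};

/* Models nohz_balance_enter_idle(): a second call is a no-op. */
static void toy_balance_enter_idle(struct toy_cpu *c)
{
	assert(c);
	if (c->nohz_tick_stopped)
		return;
	c->nohz_tick_stopped = true;
	c->in_idle_mask = true;
	c->enter_idle_calls++;
}

/* Models tick_nohz_full_stop_tick(): stops the tick outside idle. */
static void toy_full_stop_tick(struct toy_cpu *c)
{
	assert(c);
	c->tick_stopped = true;
}

/*
 * Models tick_nohz_idle_stop_tick(). gate_on_was_stopped is a
 * hypothetical switch: true keeps the '!was_stopped' dependency,
 * false drops it.
 */
static void toy_idle_stop_tick(struct toy_cpu *c, bool gate_on_was_stopped)
{
	bool was_stopped;

	assert(c);
	was_stopped = c->tick_stopped;
	c->tick_stopped = true;

	if (!gate_on_was_stopped || !was_stopped)
		toy_balance_enter_idle(c);
}
```

Calling toy_full_stop_tick() and then toy_idle_stop_tick(c, true) leaves
in_idle_mask false, which is the stuck state. With the gate dropped, the
CPU enters the mask, and repeated calls still update it only once
because of the nohz_tick_stopped check.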

The second patch "Trigger warning when CPU in wrong NOHZ idle state"
is for debug only. It is not intended to be merged. The patch can help
to reproduce the bug.

A warning is triggered when the CPU is in this 'wrong' state:
1) tick was already stopped before tick_nohz_idle_stop_tick()
   stops the tick
2) and CPU is not in nohz.idle_cpus_mask
3) and CPU is idle
4) and tick is stopped
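
The four conditions can be read as a single predicate. The sketch below
is a userspace illustration only; struct toy_state and
toy_wrong_nohz_idle_state() are hypothetical names and do not reproduce
the code of the debug patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state checked by the debug warning. */
struct toy_state {
	bool was_stopped;	/* 1) tick already stopped on entry */
	bool in_idle_mask;	/* 2) CPU is in nohz.idle_cpus_mask */
	bool cpu_idle;		/* 3) CPU is idle */
	bool tick_stopped;	/* 4) tick is stopped now */
};

/* True only when all four conditions of the 'wrong' state hold. */
static bool toy_wrong_nohz_idle_state(const struct toy_state *s)
{
	assert(s);
	return s->was_stopped && !s->in_idle_mask &&
	       s->cpu_idle && s->tick_stopped;
}
```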

When the kernel boots on my system, the following warning is triggered:
[   15.536604] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:1230 tick_nohz_idle_stop_tick+0x148/0x160
[   15.550687] Modules linked in:
[   15.553731] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc1-cls-00002-g39cde4c0206e-dirty #109 VOLUNTARY
[   15.580390] pstate: 614000c9 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
<snip>
[   15.703028] Call trace:
[   15.705462]  tick_nohz_idle_stop_tick+0x148/0x160 (P)
[   15.710502]  cpuidle_idle_call+0x118/0x1d0
[   15.714588]  do_idle+0xf4/0x100
[   15.717717]  cpu_startup_entry+0x40/0x50
[   15.721627]  secondary_start_kernel+0xe4/0x128
[   15.732745]  __secondary_switched+0xc0/0xc8

After the first patch, the CPU is added to nohz.idle_cpus_mask, and
NOHZ idle balancing can move tasks to this CPU.

Adam Li (2):
  tick/nohz: Fix wrong NOHZ idle CPU state
  tick/nohz: Trigger warning when CPU in wrong NOHZ idle state

Links
[1]: https://lore.kernel.org/all/xhsmho6sagz7p.mognet@vschneid-thinkpadt14sgen2i.remote.csb/
[2]: https://lore.kernel.org/all/20250819025720.14794-1-adamli@os.amperecomputing.com/

 include/linux/sched/nohz.h | 2 ++
 kernel/sched/fair.c        | 5 +++++
 kernel/time/tick-sched.c   | 8 +++++---
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.34.1

