Message-Id: <20250815065115.289337-1-adamli@os.amperecomputing.com>
Date: Fri, 15 Aug 2025 06:51:13 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: anna-maria@...utronix.de,
	frederic@...nel.org,
	mingo@...nel.org,
	tglx@...utronix.de,
	cl@...two.org
Cc: cl@...ux.com,
	linux-kernel@...r.kernel.org,
	patches@...erecomputing.com,
	Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH 0/2] tick/nohz: CPU cannot enter NOHZ idle balance state

A few CPUs stay idle while others are 100% busy when running llama
on an arm64 server. CONFIG_NO_HZ_FULL is set and all CPUs are in
the nohz_full list.

We can see that the idle CPUs are not in nohz.idle_cpus_mask. NOHZ
idle load balancing only considers CPUs in nohz.idle_cpus_mask. The ticks
on the idle CPUs are stopped, so periodic load balancing is not
triggered on them either. As a result, these CPUs are never used and
the imbalance persists.
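The reasoning above boils down to two ways an idle CPU can pick up work: its own periodic balancing (only while its tick runs) or NOHZ idle balancing done on its behalf (only while it is in nohz.idle_cpus_mask). The sketch below is a hypothetical userspace model of that argument, not kernel code; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model (not kernel code): can an idle CPU still be
 * reached by load balancing?
 */
static bool idle_cpu_can_be_balanced(bool tick_stopped, bool in_idle_cpus_mask)
{
	/* With the tick running, periodic load balancing still fires. */
	bool periodic_balance = !tick_stopped;
	/* Otherwise another CPU must balance on its behalf via NOHZ idle
	 * balancing, which only looks at CPUs in nohz.idle_cpus_mask. */
	bool nohz_idle_balance = in_idle_cpus_mask;

	return periodic_balance || nohz_idle_balance;
}
```

The reported case corresponds to `idle_cpu_can_be_balanced(true, false)`, which returns false: neither path reaches the CPU.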

A CPU is added to nohz.idle_cpus_mask in:
do_idle()
   -> tick_nohz_idle_stop_tick()
      -> nohz_balance_enter_idle()

nohz_balance_enter_idle() is only called when '!was_stopped' holds.
It looks like 'was_stopped' is used to avoid calling
nohz_balance_enter_idle() twice and setting 'ts->idle_jiffies' twice.

When the CPU is in nohz_full mode, 'was_stopped' may always be true.
The call path might be:

tick_nohz_full_stop_tick() /* stop tick and set TS_FLAG_STOPPED */
... ...
do_idle()
    -> tick_nohz_idle_stop_tick() /* was_stopped == 1 */
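The call path above can be replayed in a small hypothetical userspace model. The struct fields only mirror TS_FLAG_STOPPED, rq->nohz_tick_stopped and nohz.idle_cpus_mask; the function bodies are simplified assumptions about the pre-fix control flow, not the actual kernel/time/tick-sched.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; field names only mirror the kernel's. */
struct cpu_model {
	bool tick_stopped;      /* stands in for TS_FLAG_STOPPED */
	bool nohz_tick_stopped; /* stands in for rq->nohz_tick_stopped */
	bool in_idle_cpus_mask; /* stands in for nohz.idle_cpus_mask membership */
};

/* Simplified stand-in for nohz_balance_enter_idle(). */
static void model_balance_enter_idle(struct cpu_model *c)
{
	if (c->nohz_tick_stopped)
		return;
	c->nohz_tick_stopped = true;
	c->in_idle_cpus_mask = true;
}

/* Simplified stand-in for tick_nohz_full_stop_tick(): the tick is
 * stopped while a task is still running, before the CPU goes idle. */
static void model_full_stop_tick(struct cpu_model *c)
{
	c->tick_stopped = true;
}

/* Simplified stand-in for tick_nohz_idle_stop_tick() before the fix:
 * registration for idle balancing is gated on !was_stopped. */
static void model_idle_stop_tick_old(struct cpu_model *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;
	if (!was_stopped)
		model_balance_enter_idle(c);
}
```

Calling `model_full_stop_tick()` and then `model_idle_stop_tick_old()` leaves the tick stopped but the CPU outside the idle mask, which is the state described in the report. Without the nohz_full step the CPU is registered normally.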

The first patch "Fix wrong NOHZ idle CPU state" makes the call to
nohz_balance_enter_idle() independent of '!was_stopped'. This is safe
because nohz_balance_enter_idle() already checks
'rq->nohz_tick_stopped' to avoid setting nohz.idle_cpus_mask twice.
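Why the unconditional call is safe can be illustrated with another hypothetical userspace model. It assumes that only the 'ts->idle_jiffies' update stays behind '!was_stopped' after the fix; that shape is an assumption, not the patch itself. The counters are made up to show that repeated calls add the CPU to the mask only once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; names only mirror the kernel's. */
struct cpu_model {
	bool tick_stopped;        /* stands in for TS_FLAG_STOPPED */
	bool nohz_tick_stopped;   /* stands in for rq->nohz_tick_stopped */
	int mask_set_count;       /* times the CPU was added to nohz.idle_cpus_mask */
	int idle_jiffies_updates; /* times ts->idle_jiffies was recorded */
};

/* Simplified stand-in for nohz_balance_enter_idle(). */
static void model_balance_enter_idle(struct cpu_model *c)
{
	/* The rq->nohz_tick_stopped check makes repeated calls harmless. */
	if (c->nohz_tick_stopped)
		return;
	c->nohz_tick_stopped = true;
	c->mask_set_count++;
}

/* Idle path after the fix (assumed shape): only the idle_jiffies update
 * stays behind !was_stopped; enter_idle is called unconditionally. */
static void model_idle_stop_tick_fixed(struct cpu_model *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;
	if (!was_stopped)
		c->idle_jiffies_updates++;
	model_balance_enter_idle(c);
}
```

Starting from a tick already stopped by the nohz_full path and entering idle twice, the CPU lands in the mask exactly once and the idle_jiffies update is skipped.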

The second patch "Trigger warning when CPU in wrong NOHZ idle state"
is for debugging only and is not intended to be merged. It can help
reproduce the bug.

A warning is triggered when a CPU is in this 'wrong' state:
1) tick was already stopped before tick_nohz_idle_stop_tick()
   stops the tick
2) and CPU is not in nohz.idle_cpus_mask
3) and CPU is idle
4) and tick is stopped
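The four conditions above combine into a single predicate. The sketch below is a hypothetical restatement in C; the debug patch's actual WARN_ON expression and variable names may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate for the four listed conditions; not the
 * debug patch's actual code.
 */
static bool in_wrong_nohz_idle_state(bool was_stopped, bool in_idle_cpus_mask,
				     bool cpu_idle, bool tick_stopped)
{
	return was_stopped &&        /* 1) tick stopped before the idle path ran */
	       !in_idle_cpus_mask && /* 2) not registered for idle balancing */
	       cpu_idle &&           /* 3) CPU is idle */
	       tick_stopped;         /* 4) tick is stopped */
}
```

All four must hold at once; for example, a CPU that is already in nohz.idle_cpus_mask never matches.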

When the kernel boots on my system, the following warning appears:
[   15.536604] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:1230 tick_nohz_idle_stop_tick+0x148/0x160
[   15.550687] Modules linked in:
[   15.553731] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc1-cls-00002-g39cde4c0206e-dirty #109 VOLUNTARY
[   15.580390] pstate: 614000c9 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
<snip>
[   15.703028] Call trace:
[   15.705462]  tick_nohz_idle_stop_tick+0x148/0x160 (P)
[   15.710502]  cpuidle_idle_call+0x118/0x1d0
[   15.714588]  do_idle+0xf4/0x100
[   15.717717]  cpu_startup_entry+0x40/0x50
[   15.721627]  secondary_start_kernel+0xe4/0x128
[   15.732745]  __secondary_switched+0xc0/0xc8

After the first patch, the CPU is added to nohz.idle_cpus_mask, and
NOHZ idle balancing can move tasks to this CPU.

Adam Li (2):
  tick/nohz: Fix wrong NOHZ idle CPU state
  tick/nohz: Trigger warning when CPU in wrong NOHZ idle state

 include/linux/sched/nohz.h | 2 ++
 kernel/sched/fair.c        | 5 +++++
 kernel/time/tick-sched.c   | 8 +++++---
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.34.1

