Message-Id: <20250815065115.289337-1-adamli@os.amperecomputing.com>
Date: Fri, 15 Aug 2025 06:51:13 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: anna-maria@...utronix.de,
frederic@...nel.org,
mingo@...nel.org,
tglx@...utronix.de,
cl@...two.org
Cc: cl@...ux.com,
linux-kernel@...r.kernel.org,
patches@...erecomputing.com,
Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH 0/2] tick/nohz: CPU cannot enter NOHZ idle balance state
A few CPUs stay idle while others are 100% busy when running llama
on an arm64 server. CONFIG_NO_HZ_FULL is set and all CPUs are in
the nohz_full list.
The idle CPUs are not in nohz.idle_cpus_mask, and NOHZ idle load
balancing only considers CPUs in nohz.idle_cpus_mask. Because the ticks
on the idle CPUs are stopped, periodic load balancing is not triggered
on them either. As a result, those CPUs are never used and the
imbalance persists.
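To make the consequence concrete, here is a toy userspace model (not kernel code) of idle-balance target selection restricted to a mask. It assumes a system of at most 64 CPUs with the mask held in a uint64_t; pick_ilb_cpu() and visible_to_ilb() are hypothetical stand-ins, and the real scheduler applies further checks (housekeeping and idleness) before choosing a CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: bit N set means CPU N is in nohz.idle_cpus_mask. */
typedef uint64_t cpumask_model_t;

/*
 * Hypothetical stand-in for idle-load-balancer target selection:
 * only CPUs present in idle_mask are ever considered. Returns the
 * lowest-numbered CPU in the mask, or -1 if the mask is empty.
 */
static int pick_ilb_cpu(cpumask_model_t idle_mask)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (idle_mask & ((cpumask_model_t)1 << cpu))
			return cpu;
	}
	return -1;
}

/* Is this CPU visible to NOHZ idle balancing at all? */
static bool visible_to_ilb(cpumask_model_t idle_mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (idle_mask >> cpu) & 1;
}
```

An idle CPU whose tick is stopped but whose bit was never set is invisible here no matter how long it stays idle, which matches the persistent imbalance described above.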
A CPU is added to nohz.idle_cpus_mask in:
do_idle()
-> tick_nohz_idle_stop_tick()
-> nohz_balance_enter_idle()
The call to nohz_balance_enter_idle() depends on the '!was_stopped'
condition. It looks like 'was_stopped' is used to avoid calling
nohz_balance_enter_idle() and setting 'ts->idle_jiffies' more than once.
When the CPU is in nohz_full mode, 'was_stopped' may always be true.
The call path might be:
tick_nohz_full_stop_tick() /* stop tick and set TS_FLAG_STOPPED */
... ...
do_idle()
-> tick_nohz_idle_stop_tick() /* was_stopped == 1 */
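The sequence above can be reproduced in a small userspace model. This is a sketch, not the kernel source: struct cpu_model and the model_* functions are hypothetical, the boolean fields stand in for TS_FLAG_STOPPED, rq->nohz_tick_stopped and nohz.idle_cpus_mask membership, and the model assumes tick_nohz_stop_tick() always succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-CPU state; field names loosely mirror the kernel's. */
struct cpu_model {
	bool tick_stopped;       /* models TS_FLAG_STOPPED */
	bool nohz_tick_stopped;  /* models rq->nohz_tick_stopped */
	bool in_idle_mask;       /* models membership in nohz.idle_cpus_mask */
	unsigned long idle_jiffies;
};

/* Models nohz_balance_enter_idle(): adds the CPU to the idle mask once. */
static void model_balance_enter_idle(struct cpu_model *c)
{
	assert(c->tick_stopped); /* in this model it is only called with the tick stopped */
	if (c->nohz_tick_stopped)
		return;          /* already accounted for */
	c->nohz_tick_stopped = true;
	c->in_idle_mask = true;
}

/* Models tick_nohz_full_stop_tick(): the tick is stopped while a task runs. */
static void model_full_stop_tick(struct cpu_model *c)
{
	c->tick_stopped = true;
}

/* Models the '!was_stopped' logic in tick_nohz_idle_stop_tick(). */
static void model_idle_stop_tick(struct cpu_model *c, unsigned long jiffies)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;  /* tick_nohz_stop_tick(), assumed to succeed */

	if (!was_stopped && c->tick_stopped) {
		c->idle_jiffies = jiffies;
		model_balance_enter_idle(c);
	}
}
```

A CPU whose tick was running when it went idle ends up in the mask; a nohz_full CPU whose tick was already stopped skips the whole block and never does.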
The first patch, "Fix wrong NOHZ idle CPU state", makes the call to
nohz_balance_enter_idle() independent of '!was_stopped'. This is safe
because nohz_balance_enter_idle() already checks
'rq->nohz_tick_stopped' to avoid setting nohz.idle_cpus_mask twice.
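One plausible shape for that change, again as a userspace model rather than the actual diff (which is not reproduced in this cover letter): nohz_balance_enter_idle() is reached whenever the tick is stopped, while 'idle_jiffies' is still only set on the running-to-stopped transition. All names here are hypothetical, and the model assumes tick_nohz_stop_tick() always succeeds.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_model {
	bool tick_stopped;       /* models TS_FLAG_STOPPED */
	bool nohz_tick_stopped;  /* models rq->nohz_tick_stopped */
	int  idle_mask_adds;     /* times the CPU was added to nohz.idle_cpus_mask */
	unsigned long idle_jiffies;
};

/* Models nohz_balance_enter_idle() with its existing duplicate guard. */
static void model_balance_enter_idle(struct cpu_model *c)
{
	assert(c->tick_stopped);
	if (c->nohz_tick_stopped)
		return;          /* second and later calls are no-ops */
	c->nohz_tick_stopped = true;
	c->idle_mask_adds++;     /* models cpumask_set_cpu(cpu, nohz.idle_cpus_mask) */
}

/* Hypothetical fixed logic: enter idle balancing whenever the tick is stopped. */
static void model_idle_stop_tick_fixed(struct cpu_model *c, unsigned long jiffies)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;  /* in the kernel the tick may fail to stop; not modeled */

	if (c->tick_stopped) {
		if (!was_stopped)
			c->idle_jiffies = jiffies;  /* still only on the stop transition */
		model_balance_enter_idle(c);
	}
}
```

Because the guard on nohz_tick_stopped makes repeated calls harmless, dropping '!was_stopped' from the call site costs nothing and lets the nohz_full case reach the mask.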
The second patch, "Trigger warning when CPU in wrong NOHZ idle state",
is for debugging only and is not intended to be merged. It helps
reproduce the bug.
A warning is triggered when the CPU is in this 'wrong' state:
1) the tick was already stopped before tick_nohz_idle_stop_tick()
tried to stop it
2) and the CPU is not in nohz.idle_cpus_mask
3) and the CPU is idle
4) and the tick is stopped
When the kernel boots on my system, this warning appears:
[ 15.536604] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:1230 tick_nohz_idle_stop_tick+0x148/0x160
[ 15.550687] Modules linked in:
[ 15.553731] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc1-cls-00002-g39cde4c0206e-dirty #109 VOLUNTARY
[ 15.580390] pstate: 614000c9 (nZCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
<snip>
[ 15.703028] Call trace:
[ 15.705462] tick_nohz_idle_stop_tick+0x148/0x160 (P)
[ 15.710502] cpuidle_idle_call+0x118/0x1d0
[ 15.714588] do_idle+0xf4/0x100
[ 15.717717] cpu_startup_entry+0x40/0x50
[ 15.721627] secondary_start_kernel+0xe4/0x128
[ 15.732745] __secondary_switched+0xc0/0xc8
With the first patch applied, the CPU is added to nohz.idle_cpus_mask,
and NOHZ idle balancing can move tasks to it.
Adam Li (2):
tick/nohz: Fix wrong NOHZ idle CPU state
tick/nohz: Trigger warning when CPU in wrong NOHZ idle state
include/linux/sched/nohz.h | 2 ++
kernel/sched/fair.c | 5 +++++
kernel/time/tick-sched.c | 8 +++++---
3 files changed, 12 insertions(+), 3 deletions(-)
--
2.34.1