Message-ID: <8218e149-7718-4432-9312-f97297c352b9@linux.ibm.com>
Date: Wed, 8 Oct 2025 07:41:10 +0530
From: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
To: LKML <linux-kernel@...r.kernel.org>,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>,
Shrikanth Hegde <sshegde@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>, jstultz@...gle.com,
stultz@...gle.com
Subject: [bisected][mainline]Kernel warnings at kernel/sched/cpudeadline.c:219
Greetings!!!
IBM CI has reported kernel warnings while running a CPU hotplug
operation on an IBM Power9 system.
Command to reproduce the issue:
drmgr -c cpu -r -q 1
Git bisect points to the commit below as the first bad commit:
4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c
Traces:
[ 464.306613] ------------[ cut here ]------------
[ 464.306628] WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219
cpudl_set+0x58/0x170
[ 464.306641] Modules linked in: rpadlpar_io(E) rpaphp(E)
nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E)
nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E)
bonding(E) nft_ct(E) tls(E) rfkill(E) nft_chain_nat(E) ip_set(E) hvcs(E)
ibmveth(E) pseries_rng(E) hvcserver(E) vmx_crypto(E) sg(E)
dm_multipath(E) drm(E) dm_mod(E) fuse(E) drm_panel_orientation_quirks(E)
ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E)
ibmvscsi(E) scsi_transport_srp(E)
[ 464.306703] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G E
6.17.0-gfd94619c4336 #1 VOLUNTARY
[ 464.306711] Tainted: [E]=UNSIGNED_MODULE
[ 464.306714] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[ 464.306720] NIP: c0000000002b6ed8 LR: c0000000002b7cb8 CTR:
c0000000002b7df0
[ 464.306725] REGS: c000000002c2f5d0 TRAP: 0700 Tainted: G E
(6.17.0-gfd94619c4336)
[ 464.306730] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 22000228
XER: 00000000
[ 464.306743] CFAR: c0000000002b726c IRQMASK: 3
[ 464.306743] GPR00: c0000000002b7cb8 c000000002c2f870 c000000001df8100
c000000002d6a710
[ 464.306743] GPR04: 000000000000001e 0000006c566f51e0 0000000000000000
c000000002d6adb0
[ 464.306743] GPR08: 00000000ffffffff 0000000000000001 c000000002cac488
0000000000000000
[ 464.306743] GPR12: c0000000030a7000 c000000002fa0000 0000000000000000
0000000000000000
[ 464.306743] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 464.306743] GPR20: c0000009e940ac20 0000006c1aa50360 0000000000000001
0000000000000002
[ 464.306743] GPR24: 0000000000000000 0000000000000000 0000000000000003
c0000009e940ab80
[ 464.306743] GPR28: 000000000000001e 0000006c566f51e0 c000000002d6a710
000000000000001e
[ 464.306804] NIP [c0000000002b6ed8] cpudl_set+0x58/0x170
[ 464.306809] LR [c0000000002b7cb8] dl_server_timer+0x168/0x2a0
[ 464.306815] Call Trace:
[ 464.306818] [c000000002c2f870] [c000000002c2f8c0]
init_stack+0x78c0/0x8000 (unreliable)
[ 464.306828] [c000000002c2f8c0] [c0000000002b7cb8]
dl_server_timer+0x168/0x2a0
[ 464.306835] [c000000002c2f920] [c00000000034df84]
__hrtimer_run_queues+0x1a4/0x390
[ 464.306842] [c000000002c2f9b0] [c00000000034f624]
hrtimer_interrupt+0x124/0x300
[ 464.306849] [c000000002c2fa60] [c00000000002a230]
timer_interrupt+0x140/0x320
[ 464.306856] [c000000002c2fac0] [c000000000009ffc]
decrementer_common_virt+0x28c/0x290
[ 464.306865] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
[ 464.306872] NIP: c0000000001b75d8 LR: c0000000001bf274 CTR:
0000000000000000
[ 464.306877] REGS: c000000002c2faf0 TRAP: 0900 Tainted: G E
(6.17.0-gfd94619c4336)
[ 464.306882] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR: 24000228 XER: 20040000
[ 464.306897] CFAR: 0000000000000000 IRQMASK: 0
[ 464.306897] GPR00: 0000000000000000 c000000002c2fd90 c000000001df8100
0000000000000000
[ 464.306897] GPR04: 0000000000000010 000000002c000040 0000000000000002
0000000000000040
[ 464.306897] GPR08: 0000000000000000 0000000000000310 0000000000000031
0000000000000000
[ 464.306897] GPR12: 00000000d02f71f1 c000000002fa0000 0000000000000000
0000000000000000
[ 464.306897] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 464.306897] GPR20: 0000000000c00000 0000000000000008 0000000000000000
0000000000000000
[ 464.306897] GPR24: 0000000000000000 c000000000000000 c00000000a6e0000
c000000002cad0c0
[ 464.306897] GPR28: 0000000000000001 c0000000022418e0 c0000000022418e8
c0000000022418e0
[ 464.306956] NIP [c0000000001b75d8] plpar_hcall_norets_notrace+0x18/0x2c
[ 464.306962] LR [c0000000001bf274] pseries_lpar_idle.part.0+0x74/0x160
[ 464.306967] ---- interrupt: 900
[ 464.306970] [c000000002c2fd90] [c0000009e940b3b0] 0xc0000009e940b3b0
(unreliable)
[ 464.306984] [c000000002c2fe10] [c0000000000212fc]
arch_cpu_idle+0x4c/0x110
[ 464.306993] [c000000002c2fe30] [c00000000134ddd0]
default_idle_call+0x50/0x140
[ 464.307001] [c000000002c2fe50] [c0000000002b4fdc]
cpuidle_idle_call+0x1ac/0x240
[ 464.307007] [c000000002c2fea0] [c0000000002b5164] do_idle+0xf4/0x1a0
[ 464.307013] [c000000002c2fef0] [c0000000002b5498]
cpu_startup_entry+0x48/0x50
[ 464.307020] [c000000002c2ff20] [c0000000000113cc] rest_init+0xec/0xf0
[ 464.307026] [c000000002c2ff50] [c0000000020052e0] do_initcalls+0x0/0x18c
[ 464.307034] [c000000002c2ffe0] [c00000000000ea9c]
start_here_common+0x1c/0x20
[ 464.307040] Code: 549c06be 7c9f2378 7cbd2b78 7c7e1b78 39494388
5489e8f8 f8010010 f821ffb1 7d2a482a 7d29e436 552907fe 69290001
<0b090000> 490a428d 60000000 e93e0010
[ 464.307060] ---[ end trace 0000000000000000 ]---
[ 464.736380] ------------[ cut here ]------------
[ 464.736397] WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219
cpudl_set+0x58/0x170
[ 464.736408] Modules linked in: rpadlpar_io(E) rpaphp(E)
nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E)
nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E)
bonding(E) nft_ct(E) tls(E) rfkill(E) nft_chain_nat(E) ip_set(E) hvcs(E)
ibmveth(E) pseries_rng(E) hvcserver(E) vmx_crypto(E) sg(E)
dm_multipath(E) drm(E) dm_mod(E) fuse(E) drm_panel_orientation_quirks(E)
ext4(E) crc16(E) mbcache(E) jbd2(E) sr_mod(E) sd_mod(E) cdrom(E)
ibmvscsi(E) scsi_transport_srp(E)
[ 464.736468] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G W E
6.17.0-gfd94619c4336 #1 VOLUNTARY
[ 464.736476] Tainted: [W]=WARN, [E]=UNSIGNED_MODULE
[ 464.736480] Hardware name: IBM,8375-42A POWER9 (architected) 0x4e0202
0xf000005 of:IBM,FW950.80 (VL950_131) hv:phyp pSeries
[ 464.736486] NIP: c0000000002b6ed8 LR: c0000000002b7cb8 CTR:
c0000000002b7df0
[ 464.736491] REGS: c000000002c2f4f0 TRAP: 0700 Tainted: G W E
(6.17.0-gfd94619c4336)
[ 464.736497] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 22000424
XER: 00000000
[ 464.736509] CFAR: c0000000002b726c IRQMASK: 3
[ 464.736509] GPR00: c0000000002b7cb8 c000000002c2f790 c000000001df8100
c000000002d6a710
[ 464.736509] GPR04: 000000000000001f 0000006c700d1304 0000000000000000
c000000002d6adb0
[ 464.736509] GPR08: 00000000ffffffff 0000000000000001 c000000002cac488
0000000000000000
[ 464.736509] GPR12: c0000000030a7000 c000000002fa0000 0000000000000000
0000000000000000
[ 464.736509] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 464.736509] GPR20: c0000009e940ac20 0000006c3442c73b 0000000000000001
0000000000000002
[ 464.736509] GPR24: 0000000000000000 0000000000000000 0000000000000003
c0000009e940ab80
[ 464.736509] GPR28: 000000000000001f 0000006c700d1304 c000000002d6a710
000000000000001f
[ 464.736569] NIP [c0000000002b6ed8] cpudl_set+0x58/0x170
[ 464.736574] LR [c0000000002b7cb8] dl_server_timer+0x168/0x2a0
[ 464.736580] Call Trace:
[ 464.736582] [c000000002c2f790] [c000000002c2f7e0]
init_stack+0x77e0/0x8000 (unreliable)
[ 464.736592] [c000000002c2f7e0] [c0000000002b7cb8]
dl_server_timer+0x168/0x2a0
[ 464.736599] [c000000002c2f840] [c00000000034df84]
__hrtimer_run_queues+0x1a4/0x390
[ 464.736606] [c000000002c2f8d0] [c00000000034f624]
hrtimer_interrupt+0x124/0x300
[ 464.736613] [c000000002c2f980] [c00000000002a230]
timer_interrupt+0x140/0x320
[ 464.736620] [c000000002c2f9e0] [c000000000009ffc]
decrementer_common_virt+0x28c/0x290
[ 464.736627] ---- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c
[ 464.736634] NIP: c0000000001b75d8 LR: c00000000134dfe8 CTR:
0000000000000000
[ 464.736638] REGS: c000000002c2fa10 TRAP: 0900 Tainted: G W E
(6.17.0-gfd94619c4336)
[ 464.736644] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
CR: 22000424 XER: 20040000
[ 464.736659] CFAR: 0000000000000000 IRQMASK: 0
[ 464.736659] GPR00: 0000000000000000 c000000002c2fcb0 c000000001df8100
0000000000000000
[ 464.736659] GPR04: 0000000000000010 000000002c000040 0000000000000002
0000000000000040
[ 464.736659] GPR08: 0000000000000000 0000000000000290 0000000000000029
0000000000000000
[ 464.736659] GPR12: 00000000d02f74a9 c000000002fa0000 0000000000000000
0000000000000000
[ 464.736659] GPR16: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 464.736659] GPR20: 0000000000c00000 0000000000000008 0000000000000000
0000000000000000
[ 464.736659] GPR24: 0000000000000000 0000000000000000 0000006c3469239a
0000000000000001
[ 464.736659] GPR28: c0000009e9419cc0 0000000000000001 c0000000022418e0
c0000000022418e8
[ 464.736717] NIP [c0000000001b75d8] plpar_hcall_norets_notrace+0x18/0x2c
[ 464.736723] LR [c00000000134dfe8] check_and_cede_processor+0x48/0x60
[ 464.736730] ---- interrupt: 900
[ 464.736733] [c000000002c2fcb0] [c0000000026a1080]
init_task+0x0/0x1d00 (unreliable)
[ 464.736741] [c000000002c2fd10] [c00000000134e210]
shared_cede_loop+0x70/0x170
[ 464.736748] [c000000002c2fd50] [c00000000134d830]
cpuidle_enter_state+0x2b0/0x648
[ 464.736756] [c000000002c2fdf0] [c000000000e09f70] cpuidle_enter+0x50/0x80
[ 464.736764] [c000000002c2fe30] [c0000000002ad868] call_cpuidle+0x48/0x90
[ 464.736772] [c000000002c2fe50] [c0000000002b4f94]
cpuidle_idle_call+0x164/0x240
[ 464.736779] [c000000002c2fea0] [c0000000002b5164] do_idle+0xf4/0x1a0
[ 464.736785] [c000000002c2fef0] [c0000000002b549c]
cpu_startup_entry+0x4c/0x50
[ 464.736791] [c000000002c2ff20] [c0000000000113cc] rest_init+0xec/0xf0
[ 464.736797] [c000000002c2ff50] [c0000000020052e0] do_initcalls+0x0/0x18c
[ 464.736804] [c000000002c2ffe0] [c00000000000ea9c]
start_here_common+0x1c/0x20
[ 464.736810] Code: 549c06be 7c9f2378 7cbd2b78 7c7e1b78 39494388
5489e8f8 f8010010 f821ffb1 7d2a482a 7d29e436 552907fe 69290001
<0b090000> 490a428d 60000000 e93e0010
[ 464.736831] ---[ end trace 0000000000000000 ]---
[ 493.843328] Non-volatile memory driver v1.3
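When the same splat repeats, as it does twice above, it can help to tally the warnings by source location before reading each trace in full. The following is a minimal Python sketch and not part of the original report: `unwrap` and `count_warnings` are hypothetical helper names, and the sample text is copied from the traces above. It assumes that any line not starting with a `[` timestamp is a continuation produced by mail wrapping.

```python
import re
from collections import Counter

# Header of a kernel WARNING splat, e.g.
# "WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219 cpudl_set+0x58/0x170"
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)


def unwrap(text):
    """Re-join mail-wrapped log lines.

    Assumption: a line that does not start with '[' (the dmesg timestamp
    prefix) continues the previous line.
    """
    joined = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line
    return joined


def count_warnings(text):
    """Count WARNING splats per (file, line, function)."""
    counts = Counter()
    for line in unwrap(text):
        match = WARN_RE.search(line)
        if match:
            key = (match["file"], int(match["line"]), match["func"])
            counts[key] += 1
    return counts


# Two WARNING headers copied from the report, still mail-wrapped.
sample = """\
[ 464.306628] WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219
cpudl_set+0x58/0x170
[ 464.736397] WARNING: CPU: 0 PID: 0 at kernel/sched/cpudeadline.c:219
cpudl_set+0x58/0x170
"""

for (path, lineno, func), n in count_warnings(sample).items():
    print(f"{n}x {path}:{lineno} in {func}")
```

On a full console log the same function could be fed the output of `dmesg`; the unwrapping step is then harmless because every line already starts with a timestamp.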
Git Bisect logs:
git bisect bad
4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c is the first bad commit
commit 4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c (HEAD)
Author: Peter Zijlstra <peterz@...radead.org>
Date: Tue Sep 16 23:02:41 2025 +0200
sched/deadline: Fix dl_server getting stuck
John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7c429
("sched/deadline: Less agressive dl_server handling").
While debugging it seems there is a chance where we end up with the
dl_server dequeued, with dl_se->dl_server_active. This causes
dl_server_start() to return without enqueueing the dl_server, thus it
fails to run when RT tasks starve the cpu.
When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally return
HRTIMER_NO_RESTART.
This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.
What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.
IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.
This removes all dl_se->server_has_tasks() users, so remove the whole
thing.
Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server
handling")
Reported-by: John Stultz <jstultz@...gle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Tested-by: John Stultz <jstultz@...gle.com>
include/linux/sched.h | 1 -
kernel/sched/deadline.c | 12 +-----------
kernel/sched/fair.c | 7 +------
kernel/sched/sched.h | 4 ----
4 files changed, 2 insertions(+), 22 deletions(-)
# git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [038d61fd642278bab63ee8ef722c50d10ab01e8f] Linux 6.16
git bisect good 038d61fd642278bab63ee8ef722c50d10ab01e8f
# status: waiting for bad commit, 1 good commit known
# bad: [c746c3b5169831d7fb032a1051d8b45592ae8d78] Merge tag
'for-6.18-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect bad c746c3b5169831d7fb032a1051d8b45592ae8d78
# good: [e25079858627916b22c4a789005a90a9fae808d8] Merge branch
'net-better-drop-accounting'
git bisect good e25079858627916b22c4a789005a90a9fae808d8
# bad: [05a54fa773284d1a7923cdfdd8f0c8dabb98bd26] Merge tag
'sound-6.18-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 05a54fa773284d1a7923cdfdd8f0c8dabb98bd26
# bad: [ae28ed4578e6d5a481e39c5a9827f27048661fdd] Merge tag
'bpf-next-6.18' of
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect bad ae28ed4578e6d5a481e39c5a9827f27048661fdd
# bad: [6855f06042ae8d134f96c63feb5dfb3943c6d789] Merge tag
'i2c-for-6.17-rc8' of
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect bad 6855f06042ae8d134f96c63feb5dfb3943c6d789
# good: [3d1e36499e02457f8de0edc9d87783cce97e8677] Merge tag
'gpio-fixes-for-v6.17-rc5' of
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good 3d1e36499e02457f8de0edc9d87783cce97e8677
# good: [86cc796e5e9bff0c3993607f4301b8188095516c] Merge tag 'for-linus'
of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 86cc796e5e9bff0c3993607f4301b8188095516c
# good: [f975f08c2e899ae2484407d7bba6bb7f8b6d9d40] Merge tag
'for-6.17-rc6-tag' of
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good f975f08c2e899ae2484407d7bba6bb7f8b6d9d40
# good: [4ff71af020ae59ae2d83b174646fc2ad9fcd4dc4] Merge tag
'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 4ff71af020ae59ae2d83b174646fc2ad9fcd4dc4
# good: [f26a24662cd2875f82029e28879a20cea212214c] Merge tag
'v6.17rc7-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect good f26a24662cd2875f82029e28879a20cea212214c
# bad: [51a24b7deaae5c3561965f5b4b27bb9d686add1c] Merge tag
'trace-tools-v6.17-rc5' of
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect bad 51a24b7deaae5c3561965f5b4b27bb9d686add1c
# bad: [083fc6d7fa0d974a3663b97c8b0466737a544236] Merge tag
'sched-urgent-2025-09-26' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 083fc6d7fa0d974a3663b97c8b0466737a544236
# good: [2cea0ed9796381b142f46bd8de97bb6b54b1df61] Merge tag
'locking-urgent-2025-09-26' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 2cea0ed9796381b142f46bd8de97bb6b54b1df61
# bad: [a3a70caf7906708bf9bbc80018752a6b36543808] sched/deadline: Fix
dl_server behaviour
git bisect bad a3a70caf7906708bf9bbc80018752a6b36543808
# bad: [4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c] sched/deadline: Fix
dl_server getting stuck
git bisect bad 4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c
# first bad commit: [4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c]
sched/deadline: Fix dl_server getting stuck
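For anyone who wants to re-run the bisection, the log above has been wrapped by the mail client, so the comment lines are split across rows. The following Python sketch, which is not part of the original report, keeps only the `git bisect ...` command lines and reports the last commit marked bad. `replayable` and `last_bad` are hypothetical helper names, and the sample is an excerpt of the log above. The filtered output should be suitable as input to `git bisect replay`, though that has not been verified against this tree.

```python
import re

# A command line in a bisect log: "git bisect start|good|bad|skip [<rev>...]"
BISECT_CMD = re.compile(r"^git bisect (start|good|bad|skip)\b")
BAD_WITH_REV = re.compile(r"^git bisect bad ([0-9a-f]{7,40})$")


def replayable(log_text):
    """Return only the command lines, dropping comments and wrapped text."""
    lines = (ln.strip() for ln in log_text.splitlines())
    return [ln for ln in lines if BISECT_CMD.match(ln)]


def last_bad(commands):
    """Return the revision of the final 'git bisect bad <rev>' command.

    Once a bisection has finished, this is the commit git reports as the
    first bad commit.
    """
    rev = None
    for cmd in commands:
        match = BAD_WITH_REV.match(cmd)
        if match:
            rev = match.group(1)
    return rev


# Excerpt copied from the bisect log in the report, still mail-wrapped.
sample = """\
git bisect start
# good: [038d61fd642278bab63ee8ef722c50d10ab01e8f] Linux 6.16
git bisect good 038d61fd642278bab63ee8ef722c50d10ab01e8f
# bad: [a3a70caf7906708bf9bbc80018752a6b36543808] sched/deadline: Fix
dl_server behaviour
git bisect bad a3a70caf7906708bf9bbc80018752a6b36543808
# bad: [4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c] sched/deadline: Fix
dl_server getting stuck
git bisect bad 4ae8d9aa9f9dc7137ea5e564d79c5aa5af1bc45c
"""

commands = replayable(sample)
print("\n".join(commands))
print("last bad:", last_bad(commands))
```

Writing the filtered commands to a file and passing that file to `git bisect replay` would restore the bisection state without retyping each step.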
If you happen to fix this, please add the tag below.
Reported-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
Regards,
Venkat.