Message-ID: <202509212119.eab661a8-lkp@intel.com>
Date: Sun, 21 Sep 2025 21:29:34 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: [peterz-queue:sched/cleanup] [sched] cfcabf4524:
WARNING:possible_recursive_locking_detected
Hello,
kernel test robot noticed "WARNING:possible_recursive_locking_detected" on:
commit: cfcabf45249df741fa733f41f7dbf98534e31b6b ("sched: Fix do_set_cpus_allowed() locking")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git sched/cleanup
in testcase: locktorture
version:
with the following parameters:
runtime: 300s
test: cpuhotplug
config: x86_64-randconfig-076-20250917
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202509212119.eab661a8-lkp@intel.com
[ 95.960389][ T23]
[ 95.961013][ T23] ============================================
[ 95.961890][ T23] WARNING: possible recursive locking detected
[ 95.962817][ T23] 6.17.0-rc4-00016-gcfcabf45249d #1 Tainted: G T
[ 95.967338][ T23] --------------------------------------------
[ 95.976369][ T23] migration/1/23 is trying to acquire lock:
[ 95.977282][ T23] ffff8883a9dfa198 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.978743][ T23]
[ 95.978743][ T23] but task is already holding lock:
[ 95.979934][ T23] ffff8883a9dfa198 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.981409][ T23]
[ 95.981409][ T23] other info that might help us debug this:
[ 95.982674][ T23] Possible unsafe locking scenario:
[ 95.982674][ T23]
[ 95.984064][ T23] CPU0
[ 95.984613][ T23] ----
[ 95.985206][ T23] lock(&rq->__lock);
[ 95.985896][ T23] lock(&rq->__lock);
[ 95.986590][ T23]
[ 95.986590][ T23] *** DEADLOCK ***
[ 95.986590][ T23]
[ 95.988030][ T23] May be due to missing lock nesting notation
[ 95.988030][ T23]
[ 95.989277][ T23] 3 locks held by migration/1/23:
[ 95.990078][ T23] #0: ffff888175d905b8 (&p->pi_lock){-.-.}-{2:2}, at: __balance_push_cpu_stop (kernel/sched/sched.h:1520 kernel/sched/sched.h:1847 kernel/sched/core.c:8098)
[ 95.991598][ T23] #1: ffff8883a9dfa198 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.993052][ T23] #2: ffffffff8cd48ba0 (rcu_read_lock){....}-{1:3}, at: cpuset_cpus_allowed_fallback (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 kernel/cgroup/cpuset.c:4122)
[ 95.996014][ T23]
[ 95.996014][ T23] stack backtrace:
[ 95.996988][ T23] CPU: 1 UID: 0 PID: 23 Comm: migration/1 Tainted: G T 6.17.0-rc4-00016-gcfcabf45249d #1 PREEMPT
[ 95.996998][ T23] Tainted: [T]=RANDSTRUCT
[ 95.997001][ T23] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 95.997005][ T23] Stopper: __balance_push_cpu_stop+0x0/0x320 <- balance_push (kernel/sched/core.c:8177)
[ 95.997018][ T23] Call Trace:
[ 95.997022][ T23] <TASK>
[ 95.997027][ T23] __dump_stack (lib/dump_stack.c:95)
[ 95.997034][ T23] dump_stack_lvl (lib/dump_stack.c:123)
[ 95.997041][ T23] dump_stack (lib/dump_stack.c:130)
[ 95.997046][ T23] print_deadlock_bug (kernel/locking/lockdep.c:3043)
[ 95.997054][ T23] __lock_acquire (kernel/locking/lockdep.c:?)
[ 95.997062][ T23] ? kvm_sched_clock_read (arch/x86/kernel/kvmclock.c:91)
[ 95.997070][ T23] ? sched_clock_noinstr (arch/x86/kernel/tsc.c:271)
[ 95.997080][ T23] lock_acquire (kernel/locking/lockdep.c:5868)
[ 95.997085][ T23] ? raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.997091][ T23] ? __lock_acquire (kernel/locking/lockdep.c:?)
[ 95.997096][ T23] ? kvm_sched_clock_read (arch/x86/kernel/kvmclock.c:91)
[ 95.997102][ T23] ? sched_clock_noinstr (arch/x86/kernel/tsc.c:271)
[ 95.997109][ T23] ? raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.997114][ T23] _raw_spin_lock_nested (kernel/locking/spinlock.c:378)
[ 95.997121][ T23] ? raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.997127][ T23] raw_spin_rq_lock_nested (kernel/sched/core.c:638)
[ 95.997133][ T23] __task_rq_lock (include/linux/sched.h:2226)
[ 95.997141][ T23] do_set_cpus_allowed (kernel/sched/sched.h:1825 kernel/sched/core.c:2742)
[ 95.997149][ T23] ? cpuset_cpus_allowed_fallback (include/linux/rcupdate.h:331 include/linux/rcupdate.h:841 kernel/cgroup/cpuset.c:4122)
[ 95.997157][ T23] cpuset_cpus_allowed_fallback (kernel/cgroup/cpuset.c:?)
[ 95.997164][ T23] select_fallback_rq (kernel/sched/core.c:?)
[ 95.997171][ T23] __balance_push_cpu_stop (kernel/sched/core.c:8103)
[ 95.997178][ T23] ? __do_trace_sched_move_numa (kernel/sched/core.c:8091)
[ 95.997183][ T23] cpu_stopper_thread (kernel/stop_machine.c:513)
[ 95.997192][ T23] ? cpu_stop_should_run (kernel/stop_machine.c:488)
[ 95.997200][ T23] smpboot_thread_fn (kernel/smpboot.c:?)
[ 95.997210][ T23] ? smpboot_thread_fn (kernel/smpboot.c:?)
[ 95.997218][ T23] kthread (kernel/kthread.c:465)
[ 95.997225][ T23] ? smpboot_unregister_percpu_thread (kernel/smpboot.c:103)
[ 95.997233][ T23] ? __do_trace_sched_kthread_stop_ret (kernel/kthread.c:412)
[ 95.997240][ T23] ret_from_fork (arch/x86/kernel/process.c:154)
[ 95.997247][ T23] ? __do_trace_sched_kthread_stop_ret (kernel/kthread.c:412)
[ 95.997254][ T23] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 95.997263][ T23] </TASK>
[ 155.023652][ C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 57s!
[ 155.059744][ C0] Showing busy workqueues and worker pools:
[ 155.064892][ C0] workqueue events: flags=0x0
[ 155.065729][ C0] pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=3 refcnt=5
[ 155.065748][ C0] in-flight: 9:work_for_cpu_fn BAR(476) ,10:vmstat_shepherd
[ 155.065802][ C0] pending: e1000_watchdog
[ 155.065820][ C0] workqueue events_unbound: flags=0x2
[ 155.069920][ C0] pwq 10: cpus=0-1 node=0 flags=0x4 nice=0 active=1 refcnt=2
[ 155.069941][ C0] pending: crng_reseed
[ 155.069956][ C0] workqueue events_power_efficient: flags=0x82
[ 155.072924][ C0] pwq 9: cpus=0-1 node=0 flags=0x4 nice=0 active=4 refcnt=5
[ 155.072944][ C0] pending: do_cache_clean, 2*neigh_periodic_work, check_lifetime
[ 155.072972][ C0] pwq 10: cpus=0-1 node=0 flags=0x4 nice=0 active=2 refcnt=3
[ 155.072984][ C0] pending: 2*neigh_managed_work
[ 155.072996][ C0] workqueue mm_percpu_wq: flags=0x8
[ 155.078563][ C0] pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
[ 155.078581][ C0] pending: vmstat_update
[ 155.078679][ C0] workqueue ipv6_addrconf: flags=0x6000a
[ 155.081392][ C0] pwq 8: cpus=0-1 flags=0x4 nice=0 active=1 refcnt=4
[ 155.081409][ C0] pending: addrconf_verify_work
[ 155.081427][ C0] pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=57s workers=4 idle: 223 94
[ 155.081475][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
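
For readability, the recursion reported above can be reduced to the following call
chain. This is a sketch assembled from the backtrace only; the annotations are
ours, not kernel source, and bodies are elided:

```c
/*
 * Call chain reconstructed from the lockdep report (sketch only):
 *
 * __balance_push_cpu_stop()          // holds p->pi_lock (#0) and rq->__lock (#1)
 *   select_fallback_rq()
 *     cpuset_cpus_allowed_fallback() // takes rcu_read_lock (#2)
 *       do_set_cpus_allowed()
 *         __task_rq_lock()           // tries to acquire rq->__lock again
 *                                    //  -> "possible recursive locking detected"
 */
```

The second acquisition of the same rq->__lock, while still holding it from
__balance_push_cpu_stop(), is what triggers the lockdep splat.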
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250921/202509212119.eab661a8-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki