Message-ID: <2a033ce4-184f-9e4e-4438-4c4d2e65315a@huaweicloud.com>
Date: Mon, 11 Aug 2025 17:19:13 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Mike Galbraith <efault@....de>, LKML <linux-kernel@...r.kernel.org>
Cc: linux-block <linux-block@...r.kernel.org>, Jens Axboe <axboe@...nel.dk>,
 Thomas Gleixner <tglx@...utronix.de>, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: v6.17-rc1 cpu_hotplug_lock deadlock splat - blk_mq_alloc_queue vs
 fs_reclaim

Hi,

On 2025/08/11 14:00, Mike Galbraith wrote:
> [    7.007387] ======================================================
> [    7.007710] WARNING: possible circular locking dependency detected
> [    7.007951] 6.17.0.g8f5ae30d-master #193 Not tainted
> [    7.008170] ------------------------------------------------------
> [    7.008387] (udev-worker)/708 is trying to acquire lock:
> [    7.008602] ffffffff88675e90 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_inc+0x12/0x30
> [    7.008826]
>                 but task is already holding lock:
> [    7.009244] ffff8a1a9994cb28 (&q->q_usage_counter(io)#68){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
> [    7.009464]
>                 which lock already depends on the new lock.
> 
> [    7.010096]
>                 the existing dependency chain (in reverse order) is:
> [    7.010509]
>                 -> #2 (&q->q_usage_counter(io)#68){++++}-{0:0}:
> [    7.011047]        __lock_acquire+0x550/0xbc0
> [    7.011364]        lock_acquire.part.0+0xa1/0x210
> [    7.011640]        blk_alloc_queue+0x30a/0x350
> [    7.011894]        blk_mq_alloc_queue+0x62/0xd0
> [    7.012152]        scsi_alloc_sdev+0x273/0x3a0 [scsi_mod]
> [    7.012436]        scsi_probe_and_add_lun+0x1e0/0x3f0 [scsi_mod]
> [    7.012833]        __scsi_add_device+0x109/0x120 [scsi_mod]
> [    7.013136]        ata_scsi_scan_host+0x9c/0x1b0 [libata]
> [    7.013398]        async_run_entry_fn+0x2c/0x110
> [    7.013749]        process_one_work+0x21f/0x5b0
> [    7.014072]        worker_thread+0x1ce/0x3c0
> [    7.014346]        kthread+0x119/0x210
> [    7.014639]        ret_from_fork+0x1a6/0x1f0
> [    7.014872]        ret_from_fork_asm+0x11/0x20
> [    7.015216]
>                 -> #1 (fs_reclaim){+.+.}-{0:0}:
> [    7.015689]        __lock_acquire+0x550/0xbc0
> [    7.015921]        lock_acquire.part.0+0xa1/0x210
> [    7.016185]        fs_reclaim_acquire+0x95/0xd0
> [    7.016460]        __kmalloc_cache_node_noprof+0x58/0x460
> [    7.016731]        intel_cpuc_prepare+0x5e/0x1c0
> [    7.017026]        cpuhp_invoke_callback+0x19e/0x650
> [    7.017272]        __cpuhp_invoke_callback_range+0x6d/0xd0
> [    7.017551]        _cpu_up+0xeb/0x230
> [    7.017757]        cpu_up+0xb4/0xd0
> [    7.017962]        cpuhp_bringup_mask+0x58/0x90
> [    7.018168]        bringup_nonboot_cpus+0x6b/0xf0
> [    7.018371]        smp_init+0x2a/0x80
> [    7.018570]        kernel_init_freeable+0x14e/0x1d0
> [    7.018769]        kernel_init+0x1a/0x130
> [    7.018968]        ret_from_fork+0x1a6/0x1f0
> [    7.019166]        ret_from_fork_asm+0x11/0x20
> [    7.019363]
>                 -> #0 (cpu_hotplug_lock){++++}-{0:0}:
> [    7.019766]        check_prev_add+0xe8/0xca0
> [    7.019956]        validate_chain+0x48c/0x530
> [    7.020141]        __lock_acquire+0x550/0xbc0
> [    7.020326]        lock_acquire.part.0+0xa1/0x210
> [    7.020509]        cpus_read_lock+0x40/0xe0
> [    7.020690]        static_key_slow_inc+0x12/0x30
> [    7.020873]        rq_qos_add+0xc7/0x130
> [    7.021074]        wbt_init+0x156/0x1b0
> [    7.021253]        elevator_change_done+0x189/0x1d0
> [    7.021432]        elevator_change+0xeb/0x1a0
> [    7.021610]        elv_iosched_store+0x13d/0x170
> [    7.021788]        kernfs_fop_write_iter+0x14a/0x220
> [    7.021965]        vfs_write+0x213/0x520
> [    7.022139]        ksys_write+0x69/0xe0
> [    7.022311]        do_syscall_64+0x94/0xa10
> [    7.022483]        entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    7.022655]
>                 other info that might help us debug this:
> 
> [    7.023174] Chain exists of:
>                   cpu_hotplug_lock --> fs_reclaim --> &q->q_usage_counter(io)#68
> 
> [    7.023677]  Possible unsafe locking scenario:
> 
> [    7.024021]        CPU0                    CPU1
> [    7.024182]        ----                    ----
> [    7.024338]   lock(&q->q_usage_counter(io)#68);
> [    7.024496]                                lock(fs_reclaim);
> [    7.024653]                                lock(&q->q_usage_counter(io)#68);
> [    7.024813]   rlock(cpu_hotplug_lock);
> [    7.024971]
>                  *** DEADLOCK ***
> 
> [    7.025422] 7 locks held by (udev-worker)/708:
> [    7.025576]  #0: ffff8a1a872ad428 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x69/0xe0
> [    7.025737]  #1: ffff8a1a99733a88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x103/0x220
> [    7.025903]  #2: ffff8a1aa66e7428 (kn->active#23){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x220
> [    7.026069]  #3: ffff8a1a82414380 (&set->update_nr_hwq_lock){.+.+}-{4:4}, at: elv_iosched_store+0xfe/0x170
> [    7.026237]  #4: ffff8a1a9994cd30 (&q->rq_qos_mutex){+.+.}-{4:4}, at: wbt_init+0x141/0x1b0
> [    7.026404]  #5: ffff8a1a9994cb28 (&q->q_usage_counter(io)#68){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
> [    7.026576]  #6: ffff8a1a9994cb60 (&q->q_usage_counter(queue)#68){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x12/0x20
> [    7.026766]
>                 stack backtrace:
> [    7.027385] CPU: 0 UID: 0 PID: 708 Comm: (udev-worker) Not tainted 6.17.0.g8f5ae30d-master #193 PREEMPT(lazy)  51b04e6fd63e725774bb751c4ffe0152e5eaa466
> [    7.027388] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
> [    7.027390] Call Trace:
> [    7.027392]  <TASK>
> [    7.027395]  dump_stack_lvl+0x5b/0x80
> [    7.027399]  print_circular_bug.cold+0x38/0x45
> [    7.027403]  check_noncircular+0x12c/0x150
> [    7.027406]  ? save_trace+0x65/0x1e0
> [    7.027411]  check_prev_add+0xe8/0xca0
> [    7.027415]  validate_chain+0x48c/0x530
> [    7.027419]  __lock_acquire+0x550/0xbc0
> [    7.027423]  lock_acquire.part.0+0xa1/0x210
> [    7.027425]  ? static_key_slow_inc+0x12/0x30
> [    7.027430]  ? rcu_is_watching+0x11/0x40
> [    7.027434]  ? lock_acquire+0xee/0x130
> [    7.027437]  cpus_read_lock+0x40/0xe0
> [    7.027439]  ? static_key_slow_inc+0x12/0x30
> [    7.027441]  static_key_slow_inc+0x12/0x30
> [    7.027444]  rq_qos_add+0xc7/0x130
> [    7.027447]  wbt_init+0x156/0x1b0
> [    7.027450]  elevator_change_done+0x189/0x1d0
> [    7.027454]  elevator_change+0xeb/0x1a0
> [    7.027458]  elv_iosched_store+0x13d/0x170
> [    7.027463]  kernfs_fop_write_iter+0x14a/0x220
> [    7.027467]  vfs_write+0x213/0x520
> [    7.027474]  ksys_write+0x69/0xe0
> [    7.027477]  do_syscall_64+0x94/0xa10
> [    7.027480]  ? do_sys_openat2+0x8a/0xc0
> [    7.027484]  ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    7.027486]  ? lockdep_hardirqs_on+0x78/0x100
> [    7.027488]  ? do_syscall_64+0x139/0xa10
> [    7.027490]  ? find_held_lock+0x2b/0x80
> [    7.027493]  ? mntput_no_expire+0x91/0x470
> [    7.027495]  ? __lock_release.isra.0+0x5d/0x180
> [    7.027507]  ? rcu_is_watching+0x11/0x40
> [    7.027510]  ? mntput_no_expire+0x91/0x470
> [    7.027512]  ? mntput_no_expire+0x91/0x470
> [    7.027514]  ? lock_release+0x86/0x150
> [    7.027517]  ? __x64_sys_close+0x3d/0x80
> [    7.027519]  ? kmem_cache_free+0x13b/0x460
> [    7.027523]  ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    7.027525]  ? lockdep_hardirqs_on+0x78/0x100
> [    7.027527]  ? do_syscall_64+0x139/0xa10
> [    7.027529]  ? do_sys_openat2+0x8a/0xc0
> [    7.027531]  ? kmem_cache_free+0x13b/0x460
> [    7.027533]  ? lock_release+0x86/0x150
> [    7.027535]  ? __seccomp_filter+0x37/0x4e0
> [    7.027541]  ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    7.027543]  ? lockdep_hardirqs_on+0x78/0x100
> [    7.027545]  ? do_syscall_64+0x139/0xa10
> [    7.027548]  ? exc_page_fault+0x116/0x210
> [    7.027552]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [    7.027554] RIP: 0033:0x7f8889921000
> [    7.027557] Code: 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 80 3d 09 ca 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
> [    7.027559] RSP: 002b:00007ffeb10faba8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [    7.027562] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f8889921000
> [    7.027563] RDX: 000000000000000b RSI: 00007ffeb10fae60 RDI: 0000000000000006
> [    7.027564] RBP: 00007ffeb10fae60 R08: 00007f88899fe2a8 R09: 00007ffeb10fac50
> [    7.027565] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000b
> [    7.027567] R13: 000055e71aef5fa0 R14: 00007f88899fdf60 R15: 0000000000000000
> [    7.027573]  </TASK>
> 

This is a known problem:

https://lore.kernel.org/all/20250805171749.3448694-1-nilay@linux.ibm.com/
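
For readers following the splat above, the reported cycle can be reproduced with a small lock-order graph. This is only an illustrative sketch, not how lockdep is implemented: the `find_path` helper and the edge table are hypothetical, and the edges are transcribed by hand from the "existing dependency chain" entries #1 and #2 plus the new acquisition in #0.

```python
from collections import defaultdict


def find_path(edges, src, dst):
    """Depth-first search; return the list of locks from src to dst, or None."""
    stack = [(src, [src])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in edges[node]:
            stack.append((nxt, path + [nxt]))
    return None


# Already-recorded "held -> acquired" orderings, as read from the report.
edges = defaultdict(list)
# 1: fs_reclaim taken by an allocation during CPU bring-up (intel_cpuc_prepare)
edges["cpu_hotplug_lock"].append("fs_reclaim")
# 2: q_usage_counter(io) ordered after fs_reclaim (recorded in blk_alloc_queue)
edges["fs_reclaim"].append("q_usage_counter(io)")

# 0: the new acquisition, cpus_read_lock() from static_key_slow_inc()
# while wbt_init() runs with the queue frozen.
held, wanted = "q_usage_counter(io)", "cpu_hotplug_lock"

# The new edge held -> wanted is unsafe if wanted already leads back to held.
cycle = find_path(edges, wanted, held)
if cycle is not None:
    print("circular dependency:", " --> ".join(cycle + [wanted]))
```

The path found matches the "Chain exists of" line in the report; adding the edge from the frozen queue back to cpu_hotplug_lock is what closes the loop.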

Thanks,
Kuai

