Message-ID: <CABXGCsMDOWXM8SQbmNsiXTh6ej87JKah3Wh_ze2dzG5mO5W98g@mail.gmail.com>
Date: Thu, 21 Aug 2025 11:56:00 +0500
From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To: sunjunchao@...edance.com, axboe@...nel.dk, nilay@...ux.ibm.com,
yukuai3@...wei.com, Ming Lei <ming.lei@...hat.com>, linux-block@...r.kernel.org,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: [REGRESSION] 6.17-rc2: lockdep circular dependency at boot introduced by 8f5845e0743b (“block: restore default wbt enablement”)
Hi,
After commit 8f5845e0743b (“block: restore default wbt enablement”)
I started seeing a lockdep warning about a circular locking dependency
on every boot.
Bisect:
git bisect identified 8f5845e0743bf3512b71b3cb8afe06c192d6acc4 as the
first bad commit.
Reverting this commit on top of 6.17.0-rc2-git-b19a97d57c15 makes the
warning disappear completely.
The warning looks like this:
[ 12.595070] nvme nvme0: 32/0/0 default/read/poll queues
[ 12.595566] nvme nvme1: 32/0/0 default/read/poll queues
[ 12.610697] ======================================================
[ 12.610705] WARNING: possible circular locking dependency detected
[ 12.610714] 6.17.0-rc2-git-b19a97d57c15+ #158 Not tainted
[ 12.610726] ------------------------------------------------------
[ 12.610734] kworker/u129:3/911 is trying to acquire lock:
[ 12.610743] ffffffff899ab700 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_inc+0x16/0x40
[ 12.610760]
but task is already holding lock:
[ 12.610769] ffff8881d166d570 (&q->q_usage_counter(io)#4){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 12.610787]
which lock already depends on the new lock.
[ 12.610798]
the existing dependency chain (in reverse order) is:
[ 12.610971]
-> #2 (&q->q_usage_counter(io)#4){++++}-{0:0}:
[ 12.611246] __lock_acquire+0x56a/0xbe0
[ 12.611381] lock_acquire.part.0+0xc7/0x270
[ 12.611518] blk_alloc_queue+0x5cd/0x720
[ 12.611649] blk_mq_alloc_queue+0x143/0x250
[ 12.611780] __blk_mq_alloc_disk+0x18/0xd0
[ 12.611906] nvme_alloc_ns+0x240/0x1930 [nvme_core]
[ 12.612042] nvme_scan_ns+0x320/0x3b0 [nvme_core]
[ 12.612170] async_run_entry_fn+0x94/0x540
[ 12.612289] process_one_work+0x87a/0x14e0
[ 12.612406] worker_thread+0x5f2/0xfd0
[ 12.612527] kthread+0x3b0/0x770
[ 12.612641] ret_from_fork+0x3ef/0x510
[ 12.612760] ret_from_fork_asm+0x1a/0x30
[ 12.612875]
-> #1 (fs_reclaim){+.+.}-{0:0}:
[ 12.613102] __lock_acquire+0x56a/0xbe0
[ 12.613215] lock_acquire.part.0+0xc7/0x270
[ 12.613327] fs_reclaim_acquire+0xd9/0x130
[ 12.613444] __kmalloc_cache_node_noprof+0x60/0x4e0
[ 12.613560] amd_pmu_cpu_prepare+0x123/0x670
[ 12.613674] cpuhp_invoke_callback+0x2c8/0x9c0
[ 12.613791] __cpuhp_invoke_callback_range+0xbd/0x1f0
[ 12.613904] _cpu_up+0x2f8/0x6c0
[ 12.614015] cpu_up+0x11e/0x1c0
[ 12.614124] cpuhp_bringup_mask+0xea/0x130
[ 12.614231] bringup_nonboot_cpus+0xa9/0x170
[ 12.614335] smp_init+0x2b/0xf0
[ 12.614443] kernel_init_freeable+0x23f/0x2e0
[ 12.614545] kernel_init+0x1c/0x150
[ 12.614643] ret_from_fork+0x3ef/0x510
[ 12.614744] ret_from_fork_asm+0x1a/0x30
[ 12.614840]
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
[ 12.615029] check_prev_add+0xe1/0xcf0
[ 12.615126] validate_chain+0x4cf/0x740
[ 12.615221] __lock_acquire+0x56a/0xbe0
[ 12.615316] lock_acquire.part.0+0xc7/0x270
[ 12.615414] cpus_read_lock+0x40/0xe0
[ 12.615508] static_key_slow_inc+0x16/0x40
[ 12.615602] rq_qos_add+0x264/0x440
[ 12.615696] wbt_init+0x3b2/0x510
[ 12.615793] blk_register_queue+0x334/0x470
[ 12.615887] __add_disk+0x5fd/0xd50
[ 12.615980] add_disk_fwnode+0x113/0x590
[ 12.616073] nvme_alloc_ns+0x7be/0x1930 [nvme_core]
[ 12.616173] nvme_scan_ns+0x320/0x3b0 [nvme_core]
[ 12.616272] async_run_entry_fn+0x94/0x540
[ 12.616366] process_one_work+0x87a/0x14e0
[ 12.616464] worker_thread+0x5f2/0xfd0
[ 12.616558] kthread+0x3b0/0x770
[ 12.616651] ret_from_fork+0x3ef/0x510
[ 12.616749] ret_from_fork_asm+0x1a/0x30
[ 12.616841]
other info that might help us debug this:
[ 12.617108] Chain exists of:
cpu_hotplug_lock --> fs_reclaim --> &q->q_usage_counter(io)#4
[ 12.617385] Possible unsafe locking scenario:
[ 12.617570]        CPU0                    CPU1
[ 12.617662]        ----                    ----
[ 12.617755]   lock(&q->q_usage_counter(io)#4);
[ 12.617847]                                lock(fs_reclaim);
[ 12.617940]                                lock(&q->q_usage_counter(io)#4);
[ 12.618035]   rlock(cpu_hotplug_lock);
[ 12.618129]
*** DEADLOCK ***
[ 12.618397] 7 locks held by kworker/u129:3/911:
#0: ffff8881083ba158 ((wq_completion)async){+.+.}-{0:0}, at: process_one_work+0xe31/0x14e0
[ 12.618692] #1: ffffc900061b7d20 ((work_completion)(&entry->work)){+.+.}-{0:0}, at: process_one_work+0x7f9/0x14e0
[ 12.618906] #2: ffff888109c801a8 (&set->update_nr_hwq_lock){.+.+}-{4:4}, at: add_disk_fwnode+0xfd/0x590
[ 12.619132] #3: ffff8881d166dbb8 (&q->sysfs_lock){+.+.}-{4:4}, at: blk_register_queue+0xdc/0x470
[ 12.619257] #4: ffff8881d166d798 (&q->rq_qos_mutex){+.+.}-{4:4}, at: wbt_init+0x39c/0x510
[ 12.619383] #5: ffff8881d166d570 (&q->q_usage_counter(io)#4){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 12.619640] #6: ffff8881d166d5b0 (&q->q_usage_counter(queue)#4){+.+.}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x16/0x30
[ 12.619913]
stack backtrace:
[ 12.620171] CPU: 6 UID: 0 PID: 911 Comm: kworker/u129:3 Not tainted 6.17.0-rc2-git-b19a97d57c15+ #158 PREEMPT(lazy)
[ 12.620173] Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS 3.30 06/16/2025
[ 12.620174] Workqueue: async async_run_entry_fn
[ 12.620177] Call Trace:
[ 12.620178] <TASK>
[ 12.620179] dump_stack_lvl+0x84/0xd0
[ 12.620182] print_circular_bug.cold+0x38/0x46
[ 12.620185] check_noncircular+0x14a/0x170
[ 12.620187] check_prev_add+0xe1/0xcf0
[ 12.620189] ? lock_acquire.part.0+0xc7/0x270
[ 12.620191] validate_chain+0x4cf/0x740
[ 12.620193] __lock_acquire+0x56a/0xbe0
[ 12.620196] lock_acquire.part.0+0xc7/0x270
[ 12.620197] ? static_key_slow_inc+0x16/0x40
[ 12.620199] ? rcu_is_watching+0x15/0xe0
[ 12.620202] ? __pfx___might_resched+0x10/0x10
[ 12.620204] ? static_key_slow_inc+0x16/0x40
[ 12.620205] ? lock_acquire+0xf6/0x140
[ 12.620207] cpus_read_lock+0x40/0xe0
[ 12.620209] ? static_key_slow_inc+0x16/0x40
[ 12.620210] static_key_slow_inc+0x16/0x40
[ 12.620212] rq_qos_add+0x264/0x440
[ 12.620213] wbt_init+0x3b2/0x510
[ 12.620215] ? wbt_enable_default+0x174/0x2b0
[ 12.620217] blk_register_queue+0x334/0x470
[ 12.620218] __add_disk+0x5fd/0xd50
[ 12.620220] ? wait_for_completion+0x17f/0x3c0
[ 12.620222] add_disk_fwnode+0x113/0x590
[ 12.620224] nvme_alloc_ns+0x7be/0x1930 [nvme_core]
[ 12.620232] ? __pfx_nvme_alloc_ns+0x10/0x10 [nvme_core]
[ 12.620241] ? __pfx_nvme_find_get_ns+0x10/0x10 [nvme_core]
[ 12.620249] ? __pfx_nvme_ns_info_from_identify+0x10/0x10 [nvme_core]
[ 12.620257] nvme_scan_ns+0x320/0x3b0 [nvme_core]
[ 12.620264] ? __pfx_nvme_scan_ns+0x10/0x10 [nvme_core]
[ 12.620271] ? __lock_release.isra.0+0x1cb/0x340
[ 12.620273] ? lockdep_hardirqs_on+0x8c/0x130
[ 12.620275] ? seqcount_lockdep_reader_access+0xb5/0xc0
[ 12.620277] ? seqcount_lockdep_reader_access+0xb5/0xc0
[ 12.620279] ? ktime_get+0x6a/0x180
[ 12.620281] async_run_entry_fn+0x94/0x540
[ 12.620282] process_one_work+0x87a/0x14e0
[ 12.620285] ? __pfx_process_one_work+0x10/0x10
[ 12.620287] ? local_clock_noinstr+0xf/0x130
[ 12.620289] ? assign_work+0x156/0x390
[ 12.620291] worker_thread+0x5f2/0xfd0
[ 12.620294] ? __pfx_worker_thread+0x10/0x10
[ 12.620295] kthread+0x3b0/0x770
[ 12.620297] ? local_clock_noinstr+0xf/0x130
[ 12.620298] ? __pfx_kthread+0x10/0x10
[ 12.620300] ? rcu_is_watching+0x15/0xe0
[ 12.620301] ? __pfx_kthread+0x10/0x10
[ 12.620303] ret_from_fork+0x3ef/0x510
[ 12.620305] ? __pfx_kthread+0x10/0x10
[ 12.620306] ? __pfx_kthread+0x10/0x10
[ 12.620307] ret_from_fork_asm+0x1a/0x30
[ 12.620310] </TASK>
[ 12.628224] nvme0n1: p1
[ 12.628699] nvme1n1: p1 p2 p3
It looks like enabling WBT by default makes wbt_init() → rq_qos_add()
hit static_key_slow_inc(), which takes cpus_read_lock() (i.e.
cpu_hotplug_lock) while the worker already holds q_usage_counter(io)
from freezing the queue. That closes a cycle against the pre-existing
chain from CPU bringup shown above, where cpu_hotplug_lock is held
while an allocation (fs_reclaim) can end up waiting on
q_usage_counter(io).
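For intuition, below is a minimal userspace sketch of that inversion,
with plain pthread mutexes standing in for cpu_hotplug_lock,
fs_reclaim and q_usage_counter. The reduction to ordinary mutexes and
both thread bodies are mine and purely illustrative; the real kernel
primitives are a percpu rwsem, a lockdep annotation and a percpu
refcount.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t cpu_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fs_reclaim       = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t q_usage_counter  = PTHREAD_MUTEX_INITIALIZER;

/* CPU bringup side: cpu_hotplug_lock is held while an allocation
 * (fs_reclaim) may end up waiting on a frozen queue (q_usage_counter). */
static void *cpu_up_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cpu_hotplug_lock);
	pthread_mutex_lock(&fs_reclaim);
	pthread_mutex_lock(&q_usage_counter);
	puts("cpu_up_path: acquired all three locks");
	pthread_mutex_unlock(&q_usage_counter);
	pthread_mutex_unlock(&fs_reclaim);
	pthread_mutex_unlock(&cpu_hotplug_lock);
	return NULL;
}

/* wbt_init() side: the queue is already frozen (q_usage_counter held)
 * when static_key_slow_inc() goes for cpus_read_lock(). */
static void *wbt_init_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&q_usage_counter);
	pthread_mutex_lock(&cpu_hotplug_lock);
	puts("wbt_init_path: acquired both locks");
	pthread_mutex_unlock(&cpu_hotplug_lock);
	pthread_mutex_unlock(&q_usage_counter);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, cpu_up_path, NULL);
	pthread_create(&b, NULL, wbt_init_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Compiled with gcc -pthread, the two paths can, depending on
scheduling, block each other exactly the way the scenario table
above shows.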
Environment / Repro:
Hardware: ASRock B650I Lightning WiFi (NVMe), link to probe below
Kernel: 6.17.0-rc2-git-b19a97d57c15 (self-built)
Repro: occurs deterministically on every boot during NVMe namespace scan
First bad commit: 8f5845e0743bf3512b71b3cb8afe06c192d6acc4
(“block: restore default wbt enablement”) — found by git bisect
Fix/workaround: revert 8f5845e0743b
Attachments:
Full dmesg (with the complete lockdep trace)
.config
Hardware probe: https://linux-hardware.org/?probe=9a6dd1ef4d
Happy to test any proposed patches or additional instrumentation.
Thanks for looking into it.
--
Best Regards,
Mike Gavrilov.