[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ca16370e-d646-4eee-b9cc-87277c89c43c@samsung.com>
Date: Tue, 29 Oct 2024 12:13:35 +0100
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Ming Lei <ming.lei@...hat.com>, Jens Axboe <axboe@...nel.dk>,
linux-block@...r.kernel.org
Cc: Christoph Hellwig <hch@....de>, Peter Zijlstra <peterz@...radead.org>,
Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, Ingo
Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
linux-kernel@...r.kernel.org, Bart Van Assche <bvanassche@....org>
Subject: Re: [PATCH V2 3/3] block: model freeze & enter queue as lock for
supporting lockdep
On 25.10.2024 02:37, Ming Lei wrote:
> Recently we got several deadlock report[1][2][3] caused by
> blk_mq_freeze_queue and blk_enter_queue().
>
> Turns out the two are just like acquiring read/write lock, so model them
> as read/write lock for supporting lockdep:
>
> 1) model q->q_usage_counter as two locks(io and queue lock)
>
> - queue lock covers sync with blk_enter_queue()
>
> - io lock covers sync with bio_enter_queue()
>
> 2) make the lockdep class/key as per-queue:
>
> - different subsystem has very different lock use pattern, shared lock
> class causes false positive easily
>
> - freeze_queue degrades to no lock in case that disk state becomes DEAD
> because bio_enter_queue() won't be blocked any more
>
> - freeze_queue degrades to no lock in case that request queue becomes dying
> because blk_enter_queue() won't be blocked any more
>
> 3) model blk_mq_freeze_queue() as acquire_exclusive & try_lock
> - it is exclusive lock, so dependency with blk_enter_queue() is covered
>
> - it is trylock because blk_mq_freeze_queue() are allowed to run
> concurrently
>
> 4) model blk_enter_queue() & bio_enter_queue() as acquire_read()
> - nested blk_enter_queue() are allowed
>
> - dependency with blk_mq_freeze_queue() is covered
>
> - blk_queue_exit() is often called from other contexts(such as irq), and
> it can't be annotated as lock_release(), so simply do it in
> blk_enter_queue(), this way still covered cases as many as possible
>
> With lockdep support, such kind of reports may be reported asap and
> needn't wait until the real deadlock is triggered.
>
> For example, lockdep report can be triggered in the report[3] with this
> patch applied.
>
> [1] occasional block layer hang when setting 'echo noop > /sys/block/sda/queue/scheduler'
> https://bugzilla.kernel.org/show_bug.cgi?id=219166
>
> [2] del_gendisk() vs blk_queue_enter() race condition
> https://lore.kernel.org/linux-block/20241003085610.GK11458@google.com/
>
> [3] queue_freeze & queue_enter deadlock in scsi
> https://lore.kernel.org/linux-block/ZxG38G9BuFdBpBHZ@fedora/T/#u
>
> Reviewed-by: Christoph Hellwig <hch@....de>
> Signed-off-by: Ming Lei <ming.lei@...hat.com>
This patch landed yesterday in linux-next as commit f1be1788a32e
("block: model freeze & enter queue as lock for supporting lockdep").
In my tests I found that it introduces the following 2 lockdep warnings:
1. On Samsung Exynos 4412-based Odroid U3 board (ARM 32bit), observed
when booting it:
======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc4-00037-gf1be1788a32e #9290 Not tainted
------------------------------------------------------
find/1284 is trying to acquire lock:
cf3b8534 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x30/0x70
but task is already holding lock:
c203a0c8 (&sb->s_type->i_mutex_key#2){++++}-{3:3}, at:
iterate_dir+0x30/0x140
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&sb->s_type->i_mutex_key#2){++++}-{3:3}:
down_write+0x44/0xc4
start_creating+0x8c/0x170
debugfs_create_dir+0x1c/0x178
blk_register_queue+0xa0/0x1c0
add_disk_fwnode+0x210/0x434
brd_alloc+0x1cc/0x210
brd_init+0xac/0x104
do_one_initcall+0x64/0x30c
kernel_init_freeable+0x1c4/0x228
kernel_init+0x1c/0x12c
ret_from_fork+0x14/0x28
-> #3 (&q->debugfs_mutex){+.+.}-{3:3}:
__mutex_lock+0x94/0x94c
mutex_lock_nested+0x1c/0x24
blk_mq_init_sched+0x140/0x204
elevator_init_mq+0xb8/0x130
add_disk_fwnode+0x3c/0x434
mmc_blk_alloc_req+0x34c/0x464
mmc_blk_probe+0x1f4/0x6f8
really_probe+0xe0/0x3d8
__driver_probe_device+0x9c/0x1e4
driver_probe_device+0x30/0xc0
__device_attach_driver+0xa8/0x120
bus_for_each_drv+0x80/0xcc
__device_attach+0xac/0x1fc
bus_probe_device+0x8c/0x90
device_add+0x5a4/0x7cc
mmc_add_card+0x118/0x2c8
mmc_attach_mmc+0xd8/0x174
mmc_rescan+0x2ec/0x3a8
process_one_work+0x240/0x6d0
worker_thread+0x1a0/0x398
kthread+0x104/0x138
ret_from_fork+0x14/0x28
-> #2 (&q->q_usage_counter(io)#17){++++}-{0:0}:
blk_mq_submit_bio+0x8dc/0xb34
__submit_bio+0x78/0x148
submit_bio_noacct_nocheck+0x204/0x32c
ext4_mpage_readpages+0x558/0x7b0
read_pages+0x64/0x28c
page_cache_ra_unbounded+0x118/0x1bc
filemap_get_pages+0x124/0x7ec
filemap_read+0x174/0x5b0
__kernel_read+0x128/0x2c0
bprm_execve+0x230/0x7a4
kernel_execve+0xec/0x194
try_to_run_init_process+0xc/0x38
kernel_init+0xdc/0x12c
ret_from_fork+0x14/0x28
-> #1 (mapping.invalidate_lock#2){++++}-{3:3}:
down_read+0x44/0x224
filemap_fault+0x648/0x10f0
__do_fault+0x38/0xd4
handle_mm_fault+0xaf8/0x14d0
do_page_fault+0xe0/0x5c8
do_DataAbort+0x3c/0xb0
__dabt_svc+0x50/0x80
__clear_user_std+0x34/0x68
elf_load+0x1a8/0x208
load_elf_binary+0x3f4/0x13cc
bprm_execve+0x28c/0x7a4
do_execveat_common+0x150/0x198
sys_execve+0x30/0x38
ret_fast_syscall+0x0/0x1c
-> #0 (&mm->mmap_lock){++++}-{3:3}:
__lock_acquire+0x158c/0x2970
lock_acquire+0x130/0x384
__might_fault+0x50/0x70
filldir64+0x94/0x28c
dcache_readdir+0x174/0x260
iterate_dir+0x64/0x140
sys_getdents64+0x60/0x130
ret_fast_syscall+0x0/0x1c
other info that might help us debug this:
Chain exists of:
&mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(&sb->s_type->i_mutex_key#2);
lock(&q->debugfs_mutex);
lock(&sb->s_type->i_mutex_key#2);
rlock(&mm->mmap_lock);
*** DEADLOCK ***
2 locks held by find/1284:
#0: c3df1e88 (&f->f_pos_lock){+.+.}-{3:3}, at: fdget_pos+0x88/0xd0
#1: c203a0c8 (&sb->s_type->i_mutex_key#2){++++}-{3:3}, at:
iterate_dir+0x30/0x140
stack backtrace:
CPU: 1 UID: 0 PID: 1284 Comm: find Not tainted
6.12.0-rc4-00037-gf1be1788a32e #9290
Hardware name: Samsung Exynos (Flattened Device Tree)
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x88
dump_stack_lvl from print_circular_bug+0x31c/0x394
print_circular_bug from check_noncircular+0x16c/0x184
check_noncircular from __lock_acquire+0x158c/0x2970
__lock_acquire from lock_acquire+0x130/0x384
lock_acquire from __might_fault+0x50/0x70
__might_fault from filldir64+0x94/0x28c
filldir64 from dcache_readdir+0x174/0x260
dcache_readdir from iterate_dir+0x64/0x140
iterate_dir from sys_getdents64+0x60/0x130
sys_getdents64 from ret_fast_syscall+0x0/0x1c
Exception stack(0xf22b5fa8 to 0xf22b5ff0)
5fa0: 004b4fa0 004b4f80 00000004 004b4fa0 00008000
00000000
5fc0: 004b4fa0 004b4f80 00000001 000000d9 00000000 004b4af0 00000000
000010ea
5fe0: 004b1eb4 bea05af0 b6da4b08 b6da4a28
--->8---
2. On QEMU's ARM64 virt machine, observed during system suspend/resume
cycle:
# time rtcwake -s10 -mmem
rtcwake: wakeup from "mem" using /dev/rtc0 at Tue Oct 29 11:54:30 2024
PM: suspend entry (s2idle)
Filesystems sync: 0.004 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.007 seconds)
OOM killer disabled.
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.004 seconds)
======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc4+ #9291 Not tainted
------------------------------------------------------
rtcwake/1299 is trying to acquire lock:
ffff80008358a7f8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x1c/0x28
but task is already holding lock:
ffff000006136d68 (&q->q_usage_counter(io)#5){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&q->q_usage_counter(io)#5){++++}-{0:0}:
blk_mq_submit_bio+0x7c0/0x9d8
__submit_bio+0x74/0x164
submit_bio_noacct_nocheck+0x2d4/0x3b4
submit_bio_noacct+0x148/0x3fc
submit_bio+0x130/0x204
submit_bh_wbc+0x148/0x1bc
submit_bh+0x18/0x24
ext4_read_bh_nowait+0x70/0x100
ext4_sb_breadahead_unmovable+0x50/0x80
__ext4_get_inode_loc+0x354/0x654
ext4_get_inode_loc+0x40/0xa8
ext4_reserve_inode_write+0x40/0xf0
__ext4_mark_inode_dirty+0x88/0x300
ext4_ext_tree_init+0x40/0x4c
__ext4_new_inode+0x99c/0x1614
ext4_create+0xe4/0x1d4
lookup_open.isra.0+0x47c/0x540
path_openat+0x370/0x9e8
do_filp_open+0x80/0x130
do_sys_openat2+0xb4/0xe8
__arm64_compat_sys_openat+0x5c/0xa4
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc_compat+0x20/0x3c
el0_svc_compat+0x44/0xe0
el0t_32_sync_handler+0x98/0x148
el0t_32_sync+0x194/0x198
-> #2 (jbd2_handle){++++}-{0:0}:
start_this_handle+0x178/0x4e8
jbd2__journal_start+0x110/0x298
__ext4_journal_start_sb+0x9c/0x274
ext4_dirty_inode+0x38/0x88
__mark_inode_dirty+0x90/0x568
generic_update_time+0x50/0x64
touch_atime+0x2c0/0x324
ext4_file_mmap+0x68/0x88
mmap_region+0x448/0xa38
do_mmap+0x3dc/0x540
vm_mmap_pgoff+0xf8/0x1b4
ksys_mmap_pgoff+0x148/0x1f0
__arm64_compat_sys_aarch32_mmap2+0x20/0x2c
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc_compat+0x20/0x3c
el0_svc_compat+0x44/0xe0
el0t_32_sync_handler+0x98/0x148
el0t_32_sync+0x194/0x198
-> #1 (&mm->mmap_lock){++++}-{3:3}:
__might_fault+0x5c/0x80
inet_gifconf+0xcc/0x278
dev_ifconf+0xc4/0x1f8
sock_ioctl+0x234/0x384
compat_sock_ioctl+0x180/0x35c
__arm64_compat_sys_ioctl+0x14c/0x16c
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc_compat+0x20/0x3c
el0_svc_compat+0x44/0xe0
el0t_32_sync_handler+0x98/0x148
el0t_32_sync+0x194/0x198
-> #0 (rtnl_mutex){+.+.}-{3:3}:
__lock_acquire+0x1374/0x2224
lock_acquire+0x200/0x340
__mutex_lock+0x98/0x428
mutex_lock_nested+0x24/0x30
rtnl_lock+0x1c/0x28
virtnet_freeze_down.isra.0+0x20/0x9c
virtnet_freeze+0x44/0x60
virtio_device_freeze+0x68/0x94
virtio_mmio_freeze+0x14/0x20
platform_pm_suspend+0x2c/0x6c
dpm_run_callback+0xa4/0x218
device_suspend+0x12c/0x52c
dpm_suspend+0x10c/0x2e4
dpm_suspend_start+0x70/0x78
suspend_devices_and_enter+0x130/0x798
pm_suspend+0x1ac/0x2e8
state_store+0x8c/0x110
kobj_attr_store+0x18/0x2c
sysfs_kf_write+0x4c/0x78
kernfs_fop_write_iter+0x120/0x1b4
vfs_write+0x2b0/0x35c
ksys_write+0x68/0xf4
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc_compat+0x20/0x3c
el0_svc_compat+0x44/0xe0
el0t_32_sync_handler+0x98/0x148
el0t_32_sync+0x194/0x198
other info that might help us debug this:
Chain exists of:
rtnl_mutex --> jbd2_handle --> &q->q_usage_counter(io)#5
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&q->q_usage_counter(io)#5);
lock(jbd2_handle);
lock(&q->q_usage_counter(io)#5);
lock(rtnl_mutex);
*** DEADLOCK ***
9 locks held by rtcwake/1299:
#0: ffff0000069103f8 (sb_writers#7){.+.+}-{0:0}, at: vfs_write+0x1e4/0x35c
#1: ffff000007c0ae88 (&of->mutex#2){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0xf0/0x1b4
#2: ffff0000046906e8 (kn->active#24){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0xf8/0x1b4
#3: ffff800082fd9908 (system_transition_mutex){+.+.}-{3:3}, at:
pm_suspend+0x88/0x2e8
#4: ffff000006137678 (&q->q_usage_counter(io)#6){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
#5: ffff0000061376b0 (&q->q_usage_counter(queue)#2){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
#6: ffff000006136d68 (&q->q_usage_counter(io)#5){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
#7: ffff000006136da0 (&q->q_usage_counter(queue)){++++}-{0:0}, at:
virtblk_freeze+0x24/0x60
#8: ffff0000056208f8 (&dev->mutex){....}-{3:3}, at:
device_suspend+0xf8/0x52c
stack backtrace:
CPU: 0 UID: 0 PID: 1299 Comm: rtcwake Not tainted 6.12.0-rc4+ #9291
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x94/0xec
show_stack+0x18/0x24
dump_stack_lvl+0x90/0xd0
dump_stack+0x18/0x24
print_circular_bug+0x298/0x37c
check_noncircular+0x15c/0x170
__lock_acquire+0x1374/0x2224
lock_acquire+0x200/0x340
__mutex_lock+0x98/0x428
mutex_lock_nested+0x24/0x30
rtnl_lock+0x1c/0x28
virtnet_freeze_down.isra.0+0x20/0x9c
virtnet_freeze+0x44/0x60
virtio_device_freeze+0x68/0x94
virtio_mmio_freeze+0x14/0x20
platform_pm_suspend+0x2c/0x6c
dpm_run_callback+0xa4/0x218
device_suspend+0x12c/0x52c
dpm_suspend+0x10c/0x2e4
dpm_suspend_start+0x70/0x78
suspend_devices_and_enter+0x130/0x798
pm_suspend+0x1ac/0x2e8
state_store+0x8c/0x110
kobj_attr_store+0x18/0x2c
sysfs_kf_write+0x4c/0x78
kernfs_fop_write_iter+0x120/0x1b4
vfs_write+0x2b0/0x35c
ksys_write+0x68/0xf4
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x48/0x110
el0_svc_common.constprop.0+0x40/0xe8
do_el0_svc_compat+0x20/0x3c
el0_svc_compat+0x44/0xe0
el0t_32_sync_handler+0x98/0x148
el0t_32_sync+0x194/0x198
virtio_blk virtio2: 1/0/0 default/read/poll queues
virtio_blk virtio3: 1/0/0 default/read/poll queues
OOM killer enabled.
Restarting tasks ... done.
random: crng reseeded on system resumption
PM: suspend exit
--->8---
Let me know if You need more information about my test systems.
> ---
> block/blk-core.c | 18 ++++++++++++++++--
> block/blk-mq.c | 26 ++++++++++++++++++++++----
> block/blk.h | 29 ++++++++++++++++++++++++++---
> block/genhd.c | 15 +++++++++++----
> include/linux/blkdev.h | 6 ++++++
> 5 files changed, 81 insertions(+), 13 deletions(-)
> ...
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Powered by blists - more mailing lists