[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <451c8746-5260-4be6-b78d-54305c94ef73@vivo.com>
Date: Tue, 16 Jul 2024 17:30:42 +0800
From: YangYang <yang.yang@...o.com>
To: Yu Kuai <yukuai1@...weicloud.com>, "yukuai (C)" <yukuai3@...wei.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] block: fix deadlock between sd_remove & sd_release
On 2024/7/16 17:15, Yu Kuai wrote:
> Hi,
>
> 在 2024/07/16 16:38, Yang Yang 写道:
>> Our test report the following hung task:
>>
>> [ 2538.459400] INFO: task "kworker/0:0":7 blocked for more than 188 seconds.
>> [ 2538.459427] Call trace:
>> [ 2538.459430] __switch_to+0x174/0x338
>> [ 2538.459436] __schedule+0x628/0x9c4
>> [ 2538.459442] schedule+0x7c/0xe8
>> [ 2538.459447] schedule_preempt_disabled+0x24/0x40
>> [ 2538.459453] __mutex_lock+0x3ec/0xf04
>> [ 2538.459456] __mutex_lock_slowpath+0x14/0x24
>> [ 2538.459459] mutex_lock+0x30/0xd8
>> [ 2538.459462] del_gendisk+0xdc/0x350
>> [ 2538.459466] sd_remove+0x30/0x60
>> [ 2538.459470] device_release_driver_internal+0x1c4/0x2c4
>> [ 2538.459474] device_release_driver+0x18/0x28
>> [ 2538.459478] bus_remove_device+0x15c/0x174
>> [ 2538.459483] device_del+0x1d0/0x358
>> [ 2538.459488] __scsi_remove_device+0xa8/0x198
>> [ 2538.459493] scsi_forget_host+0x50/0x70
>> [ 2538.459497] scsi_remove_host+0x80/0x180
>> [ 2538.459502] usb_stor_disconnect+0x68/0xf4
>> [ 2538.459506] usb_unbind_interface+0xd4/0x280
>> [ 2538.459510] device_release_driver_internal+0x1c4/0x2c4
>> [ 2538.459514] device_release_driver+0x18/0x28
>> [ 2538.459518] bus_remove_device+0x15c/0x174
>> [ 2538.459523] device_del+0x1d0/0x358
>> [ 2538.459528] usb_disable_device+0x84/0x194
>> [ 2538.459532] usb_disconnect+0xec/0x300
>> [ 2538.459537] hub_event+0xb80/0x1870
>> [ 2538.459541] process_scheduled_works+0x248/0x4dc
>> [ 2538.459545] worker_thread+0x244/0x334
>> [ 2538.459549] kthread+0x114/0x1bc
>>
>> [ 2538.461001] INFO: task "fsck.":15415 blocked for more than 188 seconds.
>> [ 2538.461014] Call trace:
>> [ 2538.461016] __switch_to+0x174/0x338
>> [ 2538.461021] __schedule+0x628/0x9c4
>> [ 2538.461025] schedule+0x7c/0xe8
>> [ 2538.461030] blk_queue_enter+0xc4/0x160
>> [ 2538.461034] blk_mq_alloc_request+0x120/0x1d4
>> [ 2538.461037] scsi_execute_cmd+0x7c/0x23c
>> [ 2538.461040] ioctl_internal_command+0x5c/0x164
>> [ 2538.461046] scsi_set_medium_removal+0x5c/0xb0
>> [ 2538.461051] sd_release+0x50/0x94
>> [ 2538.461054] blkdev_put+0x190/0x28c
>> [ 2538.461058] blkdev_release+0x28/0x40
>> [ 2538.461063] __fput+0xf8/0x2a8
>> [ 2538.461066] __fput_sync+0x28/0x5c
>> [ 2538.461070] __arm64_sys_close+0x84/0xe8
>> [ 2538.461073] invoke_syscall+0x58/0x114
>> [ 2538.461078] el0_svc_common+0xac/0xe0
>> [ 2538.461082] do_el0_svc+0x1c/0x28
>> [ 2538.461087] el0_svc+0x38/0x68
>> [ 2538.461090] el0t_64_sync_handler+0x68/0xbc
>> [ 2538.461093] el0t_64_sync+0x1a8/0x1ac
>>
>> T1: T2:
>> sd_remove
>> del_gendisk
>> __blk_mark_disk_dead
>> blk_freeze_queue_start
>> ++q->mq_freeze_depth
>> bdev_release
>> mutex_lock(&disk->open_mutex)
>> sd_release
>> scsi_execute_cmd
>> blk_queue_enter
>> wait_event(!q->mq_freeze_depth)
>
> This looks wrong, there is a condition blk_queue_dying() in
> blk_queue_enter().
584 static void __blk_mark_disk_dead(struct gendisk *disk)
585 {
......
591
592 if (test_bit(GD_OWNS_QUEUE, &disk->state))
593 blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
SCSI does not set GD_OWNS_QUEUE, so QUEUE_FLAG_DYING is not set in
this scenario.
Thanks.
>
> Thanks,
> Kuai
>
>> mutex_lock(&disk->open_mutex)
>>
>> This is a classic ABBA deadlock. To fix the deadlock, make sure we don't
>> try to acquire disk->open_mutex after freezing the queue.
>>
>> Signed-off-by: Yang Yang <yang.yang@...o.com>
>> ---
>> block/genhd.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index 8f1f3c6b4d67..c5fca3e893a0 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -663,12 +663,12 @@ void del_gendisk(struct gendisk *disk)
>> */
>> if (!test_bit(GD_DEAD, &disk->state))
>> blk_report_disk_dead(disk, false);
>> - __blk_mark_disk_dead(disk);
>> /*
>> * Drop all partitions now that the disk is marked dead.
>> */
>> mutex_lock(&disk->open_mutex);
>> + __blk_mark_disk_dead(disk);
>> xa_for_each_start(&disk->part_tbl, idx, part, 1)
>> drop_partition(part);
>> mutex_unlock(&disk->open_mutex);
>>
>
Powered by blists - more mailing lists