[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c95b86b-68a0-41f8-a09c-3cb4b06fe61a@oracle.com>
Date: Tue, 26 Nov 2024 12:21:24 +0000
From: John Garry <john.g.garry@...cle.com>
To: Qiang Ma <maqianga@...ontech.com>, James.Bottomley@...senPartnership.com,
martin.petersen@...cle.com, axboe@...nel.dk, dwagner@...e.de,
ming.lei@...hat.com, hare@...e.de
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] scsi: Don't wait for completion of in-flight requests
On 26/11/2024 11:50, Qiang Ma wrote:
> Problem:
> When the system disk uses the scsi disk bus, The main
> qemu command line includes:
> ...
> -device virtio-scsi-pci,id=scsi0 \
> -device scsi-hd,scsi-id=1,drive=drive-virtio-disk
> -drive id=drive-virtio-disk,if=none,file=/home/kvm/test.qcow2
> ...
>
> The dmesg log is as follows::
>
> [ 50.304591][ T4382] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [ 50.377002][ T4382] kexec_core: Starting new kernel
> [ 50.669775][ T194] psci: CPU1 killed (polled 0 ms)
> [ 50.849665][ T194] psci: CPU2 killed (polled 0 ms)
> [ 51.109625][ T194] psci: CPU3 killed (polled 0 ms)
> [ 51.319594][ T194] psci: CPU4 killed (polled 0 ms)
> [ 51.489667][ T194] psci: CPU5 killed (polled 0 ms)
> [ 51.709582][ T194] psci: CPU6 killed (polled 0 ms)
> [ 51.949508][ T10] psci: CPU7 killed (polled 0 ms)
> [ 52.139499][ T10] psci: CPU8 killed (polled 0 ms)
> [ 52.289426][ T10] psci: CPU9 killed (polled 0 ms)
> [ 52.439552][ T10] psci: CPU10 killed (polled 0 ms)
> [ 52.579525][ T10] psci: CPU11 killed (polled 0 ms)
> [ 52.709501][ T10] psci: CPU12 killed (polled 0 ms)
> [ 52.819509][ T194] psci: CPU13 killed (polled 0 ms)
> [ 52.919509][ T194] psci: CPU14 killed (polled 0 ms)
> [ 243.214009][ T115] INFO: task kworker/0:1:10 blocked for more than 122 seconds.
> [ 243.214810][ T115] Not tainted 6.6.0+ #1
> [ 243.215517][ T115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 243.216390][ T115] task:kworker/0:1 state:D stack:0 pid:10 ppid:2 flags:0x00000008
> [ 243.217299][ T115] Workqueue: events vmstat_shepherd
> [ 243.217816][ T115] Call trace:
> [ 243.218133][ T115] __switch_to+0x130/0x1e8
> [ 243.218568][ T115] __schedule+0x660/0xcf8
> [ 243.219013][ T115] schedule+0x58/0xf0
> [ 243.219402][ T115] percpu_rwsem_wait+0xb0/0x1d0
> [ 243.219880][ T115] __percpu_down_read+0x40/0xe0
> [ 243.220353][ T115] cpus_read_lock+0x5c/0x70
> [ 243.220795][ T115] vmstat_shepherd+0x40/0x140
> [ 243.221250][ T115] process_one_work+0x170/0x3c0
> [ 243.221726][ T115] worker_thread+0x234/0x3b8
> [ 243.222176][ T115] kthread+0xf0/0x108
> [ 243.222564][ T115] ret_from_fork+0x10/0x20
> ...
> [ 243.254080][ T115] INFO: task kworker/0:2:194 blocked for more than 122 seconds.
> [ 243.254834][ T115] Not tainted 6.6.0+ #1
> [ 243.255529][ T115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 243.256378][ T115] task:kworker/0:2 state:D stack:0 pid:194 ppid:2 flags:0x00000008
> [ 243.257284][ T115] Workqueue: events work_for_cpu_fn
> [ 243.257793][ T115] Call trace:
> [ 243.258111][ T115] __switch_to+0x130/0x1e8
> [ 243.258541][ T115] __schedule+0x660/0xcf8
> [ 243.258971][ T115] schedule+0x58/0xf0
> [ 243.259360][ T115] schedule_timeout+0x280/0x2f0
> [ 243.259832][ T115] wait_for_common+0xcc/0x2d8
> [ 243.260287][ T115] wait_for_completion+0x20/0x38
> [ 243.260767][ T115] cpuhp_kick_ap+0xe8/0x278
> [ 243.261207][ T115] cpuhp_kick_ap_work+0x5c/0x188
> [ 243.261688][ T115] _cpu_down+0x120/0x378
> [ 243.262103][ T115] __cpu_down_maps_locked+0x20/0x38
> [ 243.262609][ T115] work_for_cpu_fn+0x24/0x40
> [ 243.263059][ T115] process_one_work+0x170/0x3c0
> [ 243.263533][ T115] worker_thread+0x234/0x3b8
> [ 243.263981][ T115] kthread+0xf0/0x108
> [ 243.264405][ T115] ret_from_fork+0x10/0x20
> [ 243.264846][ T115] INFO: task kworker/15:2:639 blocked for more than 122 seconds.
> [ 243.265602][ T115] Not tainted 6.6.0+ #1
> [ 243.266296][ T115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 243.267143][ T115] task:kworker/15:2 state:D stack:0 pid:639 ppid:2 flags:0x00000008
> [ 243.268044][ T115] Workqueue: events_freezable_power_ disk_events_workfn
> [ 243.268727][ T115] Call trace:
> [ 243.269051][ T115] __switch_to+0x130/0x1e8
> [ 243.269481][ T115] __schedule+0x660/0xcf8
> [ 243.269903][ T115] schedule+0x58/0xf0
> [ 243.270293][ T115] schedule_timeout+0x280/0x2f0
> [ 243.270763][ T115] io_schedule_timeout+0x50/0x70
> [ 243.271245][ T115] wait_for_common_io.constprop.0+0xb0/0x298
> [ 243.271830][ T115] wait_for_completion_io+0x1c/0x30
> [ 243.272335][ T115] blk_execute_rq+0x1d8/0x278
> [ 243.272793][ T115] scsi_execute_cmd+0x114/0x238
> [ 243.273267][ T115] sr_check_events+0xc8/0x310 [sr_mod]
> [ 243.273808][ T115] cdrom_check_events+0x2c/0x50 [cdrom]
> [ 243.274408][ T115] sr_block_check_events+0x34/0x48 [sr_mod]
> [ 243.274994][ T115] disk_check_events+0x44/0x1b0
> [ 243.275468][ T115] disk_events_workfn+0x20/0x38
> [ 243.275939][ T115] process_one_work+0x170/0x3c0
> [ 243.276410][ T115] worker_thread+0x234/0x3b8
> [ 243.276855][ T115] kthread+0xf0/0x108
> [ 243.277241][ T115] ret_from_fork+0x10/0x20
>
> ftrace finds that it enters an endless loop, code as follows:
>
> if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
> while (blk_mq_hctx_has_requests(hctx))
> msleep(5);
> percpu_ref_put(&hctx->queue->q_usage_counter);
> }
>
> Solution:
> Refer to the loop and dm-rq in patch commit bf0beec0607d
> ("blk-mq: drain I/O when all CPUs in a hctx are offline"),
> add a BLK_MQ_F_STACKING and set it for scsi, so we don't need
> to wait for completion of in-flight requests to avoid a potential
> hung task.
>
> Fixes: bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are offline")
>
> Signed-off-by: Qiang Ma <maqianga@...ontech.com>
> ---
> drivers/scsi/scsi_lib.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index adee6f60c966..0a2d5d9327fc 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2065,7 +2065,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
> tag_set->queue_depth = shost->can_queue;
> tag_set->cmd_size = cmd_size;
> tag_set->numa_node = dev_to_node(shost->dma_dev);
> - tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
> + tag_set->flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_STACKING;
This should not be set for all SCSI hosts. Some SCSI hosts rely on
bf0beec0607d.
> tag_set->flags |=
> BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
> if (shost->queuecommand_may_block)
Powered by blists - more mailing lists