Message-ID: <1efb3d4d-a05b-2eb2-a140-a76981b662a4@huaweicloud.com>
Date: Wed, 5 Jul 2023 10:28:45 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Yu Kuai <yukuai1@...weicloud.com>,
Marc Hartmayer <mhartmay@...ux.ibm.com>
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com, hch@....de,
chaitanyak@...dia.com, shinichiro.kawasaki@....com,
dgilbert@...erlog.com, jejb@...ux.ibm.com,
martin.petersen@...cle.com, axboe@...nel.dk,
Benjamin Block <bblock@...ux.ibm.com>,
"yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] scsi/sg: don't grab scsi host module reference
Hi,
On 2023/07/05 9:43, Yu Kuai wrote:
> Hi,
>
> On 2023/07/05 1:04, Marc Hartmayer wrote:
>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai
>> <yukuai1@...weicloud.com> wrote:
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
>>> In order to prevent request_queue from being freed before cleaning up
>>> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
>>> debugfs entries leakage") uses scsi_device_get(). However,
>>> scsi_device_get() also grabs a scsi host module reference, so the
>>> module can't be removed.
>>>
>>> It's reported that blktests can't unload scsi_debug after block/001:
>>>
>>> blktests (master) # ./check block
>>> block/001 (stress device hotplugging) [failed]
>>> +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
>>> Running block/001
>>> Stressing sd
>>> +modprobe: FATAL: Module scsi_debug is in use.
>>>
>>> Fix this problem by grabbing a request_queue reference directly, so that
>>> the scsi host module can still be unloaded while the request_queue is
>>> pinned by the sg device.
>>>
>>> Reported-by: Chaitanya Kulkarni <chaitanyak@...dia.com>
>>> Link:
>>> https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
>>>
>>> Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>> ---
>>> drivers/scsi/sg.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>> index 2433eeef042a..dcb73787c29d 100644
>>> --- a/drivers/scsi/sg.c
>>> +++ b/drivers/scsi/sg.c
>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>> int error;
>>> unsigned long iflags;
>>> - error = scsi_device_get(scsidp);
>>> + error = blk_get_queue(scsidp->request_queue);
>>> if (error)
>>> return error;
>>> @@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
>>> out:
>>> if (cdev)
>>> cdev_del(cdev);
>>> - scsi_device_put(scsidp);
>>> + blk_put_queue(scsidp->request_queue);
>>> return error;
>>> }
>>> @@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
>>> */
>>> blk_trace_remove(q);
>>> - scsi_device_put(sdp->device);
>>> + blk_put_queue(q);
>>> write_lock_irqsave(&sg_index_lock, flags);
>>> idr_remove(&sg_index_idr, sdp->index);
>>> --
>>> 2.39.2
>>
>> Hi,
>>
>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>> driver. On s390x you can reproduce the problem as follows (I have not
>> tested on x86):
>>
>> With this patch applied:
>>
>> $ sudo modprobe scsi_debug
>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the
>> scsi_debug device
>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>> [0:0:0:0]
>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No
>> such file or directory
>>
>>
>> Patch reverted:
>>
>
> I haven't figured out the root cause; however, have you tried reverting
> this patch as well?
>
> db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Never mind this; the root cause is that the return value of
blk_get_queue() is checked incorrectly.
This should be fixed by the following patch:
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 89fa046c7158..0d8afffd1683 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1497,9 +1497,10 @@ sg_add_device(struct device *cl_dev)
 	int error;
 	unsigned long iflags;

-	error = blk_get_queue(scsidp->request_queue);
-	if (error)
-		return error;
+	if (!blk_get_queue(scsidp->request_queue)) {
+		pr_warn("%s: get scsi_device queue failed\n", __func__);
+		return -ENODEV;
+	}

 	error = -ENOMEM;
 	cdev = cdev_alloc();
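
For illustration only (not part of the patch): a small userspace sketch of
why the old check misfires, assuming, as the fix above implies, that
blk_get_queue() returns true on success rather than 0. The names
mock_queue, mock_get_queue, add_device_buggy and add_device_fixed are
hypothetical stand-ins, not kernel symbols.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct request_queue; not the kernel type. */
struct mock_queue {
	bool dying;	/* queue is going away, no new references allowed */
	int refs;	/* number of references currently held */
};

/* Assumed contract, mirroring blk_get_queue(): true means a reference was taken. */
static bool mock_get_queue(struct mock_queue *q)
{
	if (q->dying)
		return false;
	q->refs++;
	return true;
}

/*
 * Buggy pattern: the bool is treated as an errno-style code, so a
 * successful get (true == 1) is mistaken for an error and the function
 * bails out while still holding the reference.
 */
static int add_device_buggy(struct mock_queue *q)
{
	int error = mock_get_queue(q);

	if (error)
		return error;
	return 0;
}

/* Fixed pattern: a false return is the failure case and maps to -ENODEV. */
static int add_device_fixed(struct mock_queue *q)
{
	if (!mock_get_queue(q))
		return -ENODEV;
	return 0;
}
```

With a live queue, add_device_buggy() returns 1 and gives up even though
the reference was taken, while add_device_fixed() returns 0. With a dying
queue, the fixed variant returns -ENODEV and the buggy one carries on
without a reference.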
>
> Thanks,
> Kuai
>> $ sudo modprobe scsi_debug
>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>> [0:0:0:0]
>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> Size: 0 Blocks: 0 IO Block: 4096 directory
>> Device: 0,20 Inode: 12155 Links: 3
>> …
>>
>> Any ideas?
>>
>> Marc
>> .
>>
>
> .
>