Message-ID: <8c146a57-4b8d-615c-59ba-66be7c654466@huaweicloud.com>
Date:   Wed, 5 Jul 2023 10:26:00 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     Yu Kuai <yukuai1@...weicloud.com>,
        Benjamin Block <bblock@...ux.ibm.com>,
        Marc Hartmayer <mhartmay@...ux.ibm.com>
Cc:     linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
        yi.zhang@...wei.com, yangerkun@...wei.com, hch@....de,
        chaitanyak@...dia.com, shinichiro.kawasaki@....com,
        dgilbert@...erlog.com, jejb@...ux.ibm.com,
        martin.petersen@...cle.com, axboe@...nel.dk,
        "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH] scsi/sg: don't grab scsi host module reference

Hi,

On 2023/07/05 10:16, Yu Kuai wrote:
> Hi,
> 
> On 2023/07/05 2:51, Benjamin Block wrote:
>> On Tue, Jul 04, 2023 at 07:04:00PM +0200, Marc Hartmayer wrote:
>>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai <yukuai1@...weicloud.com> wrote:
>>>> From: Yu Kuai <yukuai3@...wei.com>
>>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>>> index 2433eeef042a..dcb73787c29d 100644
>>>> --- a/drivers/scsi/sg.c
>>>> +++ b/drivers/scsi/sg.c
>>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>>>       int error;
>>>>       unsigned long iflags;
>>>> -    error = scsi_device_get(scsidp);
>>>> +    error = blk_get_queue(scsidp->request_queue);
>>>>       if (error)
>>>>           return error;
>>
>> Might be interesting as well. Marc showed me a `dmesg` snippet earlier
>> from when the bind fails:
>>
>>    [   15.441817] scsi host2: scsi_eh_2: sleeping
>>    [   15.441899] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
>>    [   15.441907] scsi host2: scsi_debug: version 0191 [20210520]
>>                     dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
>>    [   15.442078] scsi host2: scsi_scan_host_selected: <4294967295:4294967295:18446744073709551615>
>>    [   15.442267] scsi 2:0:0:0: scsi scan: INQUIRY pass 1 length 36
>>    [   15.442286] scsi 2:0:0:0: scsi scan: INQUIRY successful with code 0x0
>>    [   15.442296] scsi 2:0:0:0: scsi scan: INQUIRY pass 2 length 96
>>    [   15.442308] scsi 2:0:0:0: scsi scan: INQUIRY successful with code 0x0
>>    [   15.442317] scsi 2:0:0:0: Direct-Access     Linux    scsi_debug       0191 PQ: 0 ANSI: 7
>>    [   15.442554] scsi 2:0:0:0: Power-on or device reset occurred
>>    [   15.442560] scsi 2:0:0:0: tag#50 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>    [   15.442565] scsi 2:0:0:0: tag#50 CDB: Report supported operation codes a3 0c 01 88 00 00 00 00 00 14 00 00
>>    [   15.442569] scsi 2:0:0:0: tag#50 Sense Key : Unit Attention [current]
>>    [   15.442573] scsi 2:0:0:0: tag#50 Add. Sense: Power on occurred
>>
>> The bind should happen around here somewhere I think.
>>
>>    [   15.472680] sd 2:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
>>    [   15.472703] sd 2:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
>>    [   15.472706] sd 2:0:0:0: scsi scan: REPORT LUN scan
>>    [   15.472709] sd 2:0:0:0: scsi scan: device exists on 2:0:0:0
>>    [   15.492874] sd 2:0:0:0: [sdi] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
>>    [   15.502853] sd 2:0:0:0: [sdi] Write Protect is off
>>    [   15.502856] sd 2:0:0:0: [sdi] Mode Sense: 73 00 10 08
>>    [   15.522819] sd 2:0:0:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
>>    [   15.552773] sd 2:0:0:0: [sdi] Preferred minimum I/O size 512 bytes
>>    [   15.552776] sd 2:0:0:0: [sdi] Optimal transfer size 524288 bytes
>>    [   15.575373] sd 2:0:0:0: [sdi] tag#62 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>>    [   15.575377] sd 2:0:0:0: [sdi] tag#62 CDB: Inquiry 12 01 b9 00 04 00
>>    [   15.575380] sd 2:0:0:0: [sdi] tag#62 Sense Key : Illegal Request [current]
>>    [   15.575383] sd 2:0:0:0: [sdi] tag#62 Add. Sense: Invalid field in cdb
>>    [   15.645749] sd 2:0:0:0: [sdi] Attached SCSI disk
>>
>> But we don't even see the `sg_alloc: dev=...` message that is logged in
>> `sg_alloc()`. And between the change above and the call to `sg_alloc()`,
>> there is only the character device allocation; and if that failed, it
>> would print an error. So either the bind is never even tried, or the new
>> `blk_get_queue()` fails to get a reference.
>> Which is odd, since the only way that could happen is if the queue
>> was marked dying; but we see that the stack is using it for LUN probing
>> in `sd`.
> 
> Yes, if scsi_device_get() works fine but blk_get_queue() fails, it
> seems to me that sg_add_device() can be called while the scsi_device's
> queue is marked dying? That is odd, and I'm not sure it is the case.

Sorry, I got the return convention of blk_get_queue() completely wrong:
it returns true on success and false on failure, not 0 on success and an
errno on failure. So the check in the patch treats a successful call as
an error.
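
To make the mismatch concrete, here is a minimal userspace sketch. It is
not kernel code: struct fake_queue, fake_get_queue() and both
add_device_*() helpers are hypothetical stand-ins that only model the
"bool, true on success" convention described above, and the -ENODEV in
the fixed variant is an assumed errno choice for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; only what the sketch needs. */
struct fake_queue {
	bool dying;	/* models a queue that is being torn down */
	int refs;	/* models the reference count */
};

/* Models the convention: true on success, false on failure. */
bool fake_get_queue(struct fake_queue *q)
{
	if (q->dying)
		return false;
	q->refs++;
	return true;
}

/* Mirrors the check in the patch: the bool is used like an errno. */
int add_device_buggy(struct fake_queue *q)
{
	int error = fake_get_queue(q);	/* true converts to 1 */

	if (error)
		return error;	/* bails out exactly when the get succeeded */
	return 0;		/* "success" on a dying queue, no ref held */
}

/* One possible fix: translate the bool into 0 or a negative errno. */
int add_device_fixed(struct fake_queue *q)
{
	if (!fake_get_queue(q))
		return -ENODEV;	/* assumed errno, for illustration only */
	return 0;
}
```

With a live queue, add_device_buggy() returns 1 and the caller gives up
even though a reference was taken, which would match the symptom that
sg_alloc() is never reached. With a dying queue it returns 0 without
holding any reference.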

Sorry for all the trouble.

Kuai
> 
> Thanks,
> Kuai
>>
>>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>>> driver. On s390x you can reproduce the problem as follows (I have not
>>> tested on x86):
>>>
>>> With this patch applied:
>>>
>>> $ sudo modprobe scsi_debug
>>
>> One more thing maybe worth mentioning: in the kernel configuration we
>> use in the CI we have `sg` built-in. I guess most have it built as a
>> module.
>>
>>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the scsi_debug device
>>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>>> [0:0:0:0]
>>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No such file or directory
>>>
>>>
>>> Patch reverted:
>>>
>>> $ sudo modprobe scsi_debug
>>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>>> [0:0:0:0]
>>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>>    File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>>    Size: 0             Blocks: 0          IO Block: 4096   directory
>>> Device: 0,20    Inode: 12155       Links: 3
>>> …
>>
>> That's all I got from looking at it earlier, so far.
>>
> 
> .
> 
