lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <35f5f7d3-3cbf-b45a-8a3b-eb1a3f34ddec@huawei.com>
Date: Fri, 9 Aug 2024 16:28:29 +0800
From: Yihang Li <liyihang9@...wei.com>
To: Bart Van Assche <bvanassche@....org>,
	<James.Bottomley@...senPartnership.com>, <martin.petersen@...cle.com>
CC: <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<prime.zeng@...wei.com>, <linuxarm@...wei.com>, <liyihang9@...wei.com>
Subject: Re: [PATCH] scsi: sd: Have scsi-ml retry START_STOP errors



On 2024/8/9 1:20, Bart Van Assche wrote:
> On 8/7/24 8:46 PM, Yihang Li wrote:
>> When sending START_STOP commands to resume scsi_device, it may be
>> interrupted by exception operations such as host reset or PCI FLR. Once
>> the command of START_STOP is failed, the runtime_status of scsi device
>> will be error and it is difficult for user to recover it.
> 
> How is the PCI FLR sent to the device? Shouldn't PCI FLRs only be
> triggered by the SCSI LLD from inside an error handler callback? How can
> a PCI FLR be triggered while a START STOP UNIT command is being
> processed? Why can PCI FLRs only be triggered while a START STOP UNIT
> command is being processed and not while any other command is being
> processed?

The PCI FLR mentioned in my description is not sent to the SCSI device,
but to the SCSI host.

When the START STOP UNIT command is submitted to the SCSI device,
the command is sent to the SCSI device through the SCSI host.
If the user triggers the PCI FLR through the sysfs interface or
the host is reset due to an error, the START STOP UNIT command
(other commands are the same) is interrupted. So I think the command
needs to be retried.

> 
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 5cd88a8eea73..29f30407d713 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -4088,9 +4088,20 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>>   {
>>       unsigned char cmd[6] = { START_STOP };    /* START_VALID */
>>       struct scsi_sense_hdr sshdr;
>> +    struct scsi_failure failure_defs[] = {
>> +        {
>> +            .allowed = 3,
>> +            .result = SCMD_FAILURE_RESULT_ANY,
>> +        },
>> +        {}
>> +    };
>> +    struct scsi_failures failures = {
>> +        .failure_definitions = failure_defs,
>> +    };
>>       const struct scsi_exec_args exec_args = {
>>           .sshdr = &sshdr,
>>           .req_flags = BLK_MQ_REQ_PM,
>> +        .failures = &failures,
>>       };
>>       struct scsi_device *sdp = sdkp->device;
>>       int res;
> 
> The above change makes the START STOP UNIT command to be retried
> unconditionally. A START STOP UNIT command should not be retried
> unconditionally.
> 
> Please take a look at the following patch series (posted yesterday):
> https://lore.kernel.org/linux-scsi/20240807203215.2439244-1-bvanassche@acm.org/

I will reconsider the retry of the START STOP UNIT command, and try this patch series as well.

Thanks,

Yihang.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ